\documentclass[11pt]{article}
\usepackage[Mickael]{ammaths}\en
\usealgorithms
\usepackage{aurical}
\begin{document}\nocouleur
\livreteuro{6}{Simulation et échantillonage}{Simulating and sampling}
\vfill
\begin{center}
\includegraphics[height=14cm]{simulating.eps}\\ \smallskip
\textsf{\small \emph{Yucca Muffin}, by Milo Beckman}
\end{center}
\vfill
\begin{center}
\fbox{\parbox{12cm}{\sf\small At the end of this chapter, you should be able to :
\begin{itemize}
\item compute the margin of error and fluctuation interval at $95\%$ confidence for a known probability ;
\item use a fluctuation interval to accept or reject an assumption ;
\item compute the confidence interval at $95\%$ for a sample ;
\item use a confidence interval to estimate a probability ;
\item use the calculator to simulate a random experiment.
\end{itemize}}}
\end{center}
\begin{flushright}
\sf\small Aymar de Saint-Seine and Mickaël Védrine\\ School year 2010/2011
\end{flushright}
\newpage\null\thispagestyle{empty}\newpage% insert blank page
\bigskip
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Margin of error
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\exocpart{Margin of error}
\exoc{\textsf{Estimation and sampling : an election}
% Sampling, estimation, margin of error, confidence interval
% 2 hours
In 2008, the American electors had to choose between the Republican John McCain and the Democrat Barack Obama.
On Election Day, Obama won with 53\%\ of the votes. Of course, this was not known to the candidates before the
election. Surveys were organised by both parties to \emph{estimate} the proportion of electors who wanted to vote for
each candidate. As it's impossible to gather the opinions of all the electors, surveys are carried out on small parts of
the population, called \emph{samples}. We will consider that samples are built randomly.
\probpart{-- Point estimate}
\begin{enumerate}
\item Over a sample of 900 electors, 497 declared that they wanted to vote for Obama. Compute the percentage of
potential Obama electors in this sample.
\item The percentage computed for the sample is a \emph{point estimate} of the actual percentage in the whole
population. Compute the difference between this point estimate and the true value (known only after the election). What
do you think of this estimation ?
\item Ten other surveys were organized over the same period. The size of each sample and the number of potential Obama
electors are given in the table below.
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|}
\hline
Survey & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\
\hline
Size & 895 & 873 & 900 & 885 & 899 & 842 & 878 & 900 & 897 & 892 \\
\hline
Obama electors & 462 & 493 & 501 & 437 & 467 & 447 & 468 & 495 & 488 & 478 \\
\hline
\end{tabular}
\end{center}
\begin{enumerate}
\item Compute the point estimate for each survey. Round the answers to 2DP.
\item How many surveys gave a point estimate equal to the true percentage ?
\item If McCain had only known about the 4th survey, what could he have deduced ?
\item What do you think of the point estimate method ?
\end{enumerate}
\end{enumerate}
\probpart{-- Margin of error}
Point estimation is not a very reliable method. The chances that the sample will yield the true value in the
whole population are very small. Furthermore, there may be important differences between the point estimates obtained
from different samples. This phenomenon is known as \emph{sampling fluctuation}.
To illustrate this, one hundred surveys were simulated with a computer, each one over a sample of 900 people. (To
do so, the percentage of Obama electors in the whole population, $p=0.53$, was used.) The scatterplot below shows the
percentage of potential Obama electors in each survey.
\begin{center}
\psset{xunit=0.1cm,yunit=50cm}
\begin{pspicture}(0,0.47)(101,0.58)
\psaxes[Dx=10,Dy=0.02,Oy=0.48](0,0.48)(101,0.581)
\psdots(1,0.53)(2,0.54)(3,0.49)(4,0.54)(5,0.5)(6,0.51)(7,0.5)(8,0.50)(9,0.54)(10,0.53)(11,0.52)(12,0.51)(13,0.5)(14,
0.5)(15,0.50)(16,0.51)(17,0.52)(18,0.51)(19,0.51)(20,0.51)(21,0.51)(22,0.54)(23,0.54)(24,0.51)(25,0.52)(26,0.53)(27,
0.53)(28,0.50)(29,0.50)(30,0.51)(31,0.52)(32,0.53)(33,0.53)(34,0.50)(35,0.52)(36,0.54)(37,0.52)(38,0.52)(39,0.52)(40,
0.51)(41,0.52)(42,0.53)(43,0.53)(44,0.53)(45,0.55)(46,0.53)(47,0.52)(48,0.53)(49,0.52)(50,0.54)(51,0.49)(52,0.51)(53,
0.54)(54,0.53)(55,0.52)(56,0.52)(57,0.54)(58,0.53)(59,0.52)(60,0.53)(61,0.51)(62,0.51)(63,0.52)(64,0.57)(65,0.52)(66,
0.54)(67,0.5)(68,0.51)(69,0.5)(70,0.52)(71,0.52)(72,0.51)(73,0.53)(74,0.54)(75,0.54)(76,0.52)(77,0.52)(78,0.53)(79,
0.52)(80,0.55)(81,0.55)(82,0.52)(83,0.52)(84,0.52)(85,0.55)(86,0.53)(87,0.52)(88,0.53)(89,0.48)(90,0.5)(91,0.52)(92,
0.5)(93, 0.53)(94,0.54)(95, 0.54)(96,0.5)(97,
0.56)(98,0.54)(99 ,0.51)(100,0.52)
\end{pspicture}
\end{center}
\begin{enumerate}
\item On the scatterplot, show the percentage $p$ of Obama electors in the whole population with a horizontal red line.
How many simulated surveys gave that exact value ?
\item \begin{enumerate}
\item The value $m=\frac{1}{\sqrt{n}}$, where $n$ is the size of a sample, is called the \emph{margin of
error at 95\%\ confidence} for that sample. Compute this value to 3DP.
\item On the graph, show the values $p-m$ and $p+m$ with two horizontal blue lines.
\item How many surveys gave a point estimate included in the interval $[p-m;p+m]$, called the \emph{fluctuation interval at $95\%$ confidence} ?
\item Is the answer to the previous question consistent with the name of the interval ?
\end{enumerate}
\item Compute the margin of error and fluctuation interval at 95\%\ confidence for samples of size $n=25$, then $n=100$ and
$n=500$. What do you notice about the margin of error when the size of the sample increases ?
\end{enumerate}
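As a purely illustrative sketch (the sample size $n=400$ is a hypothetical value, chosen only because its square root is simple, and is not one of the sizes asked for above), the computation could go as follows, keeping $p=0.53$ :
% illustration only: n = 400 is an assumed value, not taken from the exercise
$$m=\frac{1}{\sqrt{400}}=\frac{1}{20}=0.05,\qquad [p-m;p+m]=[0.53-0.05;0.53+0.05]=[0.48;0.58].$$
Under this assumption, about $95\%$ of the samples of size $400$ would be expected to give a point estimate between $48\%$ and $58\%$.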
}
\exoc{\textsf{The French lottery and odd numbers}
% Probabilities, simulation, margin of error, confidence interval (Aymar's add-on : Not very sure having confidence interval in this exercise)
% 2 hours
The principles of the French National Lottery (Loto) are fairly simple. Each player picks six numbers (plus one, that
we won't consider in this exercise) between 1 and 49. On lottery day, six balls are randomly drawn from a machine
containing 49 balls numbered from 1 to 49. The balls are not put back in the machine, so the same number cannot appear
twice in a drawing. The order in which the balls are drawn is irrelevant.
Among the numbers from 1 to 49, there are 25 odd numbers and 24 even numbers. So it seems that the French lottery
favours odd numbers. This is what we will study in this exercise.
\probpart{-- Drawing a single number}
In this part, we consider the random experiment that consists in drawing a single ball from the 49 in the machine.
\begin{enumerate}
\item What is the probability of the drawn number being odd ? Give the result as an irreducible fraction and as an
approximate value to 2DP.
\item Fifty samples, each made of $n=100$ independent drawings of a ball, were simulated with a computer. For each
sample, the proportion of odd numbers was computed. The results of these fifty samples of size $100$ are given below.
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|}
0.44 & 0.52 & 0.50 & 0.44 & 0.51 & 0.41 & 0.44 & 0.40 & 0.57 & 0.50 \\
0.51 & 0.43 & 0.59 & 0.46 & 0.55 & 0.35 & 0.55 & 0.43 & 0.53 & 0.53 \\
0.45 & 0.42 & 0.47 & 0.48 & 0.50 & 0.45 & 0.48 & 0.47 & 0.46 & 0.57 \\
0.52 & 0.55 & 0.53 & 0.46 & 0.45 & 0.44 & 0.45 & 0.48 & 0.51 & 0.46 \\
0.55 & 0.48 & 0.43 & 0.51 & 0.49 & 0.38 & 0.52 & 0.40 & 0.50 & 0.46 \\
\end{tabular}
\end{center}
\begin{enumerate}
\item How many samples showed a proportion equal to the theoretical value to 2DP ?
\item Compute the fluctuation interval at $95\%$ confidence.
\item How many samples showed a proportion inside the fluctuation interval ?
\item Can you find a margin of error at $99\%$ confidence ?
\end{enumerate}
\end{enumerate}
\probpart{-- Drawing six numbers}
In this second part, we consider the random experiment that consists of drawing successively six balls, without putting
them back in the machine. It can be proven that in each drawing of six numbers, there is an average of $3.0612$ odd
numbers, so a proportion $q=\frac{3.0612}{6}\approx 0.51$, or approximately $51\%$.
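One possible justification, sketched here under the assumption that each of the six balls drawn is equally likely to carry any of the 49 numbers, is that each ball is odd with probability $\frac{25}{49}$, so that the average number of odd numbers among the six is
% sketch only: relies on the equal-likelihood assumption stated above
$$6\times\frac{25}{49}=\frac{150}{49}\approx 3.0612.$$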
Fifty samples, each made of $n=100$ independent drawings of six successive balls, were simulated with a computer. For each
sample, the proportion of odd numbers was computed. The results of these fifty samples of size $100$ are given below.
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|}
0.527 & 0.475 & 0.500 & 0.522 & 0.558 & 0.518 & 0.510 & 0.518 & 0.550 & 0.607 \\
0.515 & 0.468 & 0.517 & 0.485 & 0.498 & 0.473 & 0.505 & 0.507 & 0.492 & 0.498 \\
0.508 & 0.408 & 0.563 & 0.612 & 0.542 & 0.497 & 0.508 & 0.498 & 0.500 & 0.535 \\
0.498 & 0.508 & 0.525 & 0.478 & 0.517 & 0.528 & 0.492 & 0.487 & 0.535 & 0.523 \\
0.512 & 0.543 & 0.522 & 0.482 & 0.530 & 0.478 & 0.508 & 0.532 & 0.528 & 0.527 \\
\end{tabular}
\end{center}
\begin{enumerate}
\item Compute the fluctuation interval at $95\%$ confidence.
\item How many samples showed a proportion inside the interval ?
\end{enumerate}
\probpart{-- Probabilities on the number of odd numbers}
The table below shows the probabilities of drawing $k$ odd numbers among the six, for $k$ from $0$ to $6$. Values have
been rounded to 3DP.
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|c|c|}
\hline
Odd numbers & $0$ & $1$ & $2$ & $3$ & $4$ & $5$ & $6$ \\
\hline
Probability & $0.010$ & $0.076$ & $0.228$ & $0.333$ & $0.250$ & $0.091$ & $0.013$ \\
\hline
\end{tabular}
\end{center}
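These values can be recovered, as a sketch, by counting. Assuming that the $\binom{49}{6}$ possible sets of six balls are equally likely, choosing $k$ of the $25$ odd numbers and $6-k$ of the $24$ even numbers gives the probability $P_k$ (a notation introduced here for illustration) of drawing exactly $k$ odd numbers :
% assumption: \binom is available, i.e. amsmath is loaded by the preamble
$$P_k=\frac{\binom{25}{k}\binom{24}{6-k}}{\binom{49}{6}},\qquad\mbox{for instance}\qquad P_0=\frac{\binom{24}{6}}{\binom{49}{6}}=\frac{134\,596}{13\,983\,816}\approx 0.010.$$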
For each of the following sentences, say if it's true or false. Justify each answer with a computation or an
explanation.
\begin{enumerate}
\item It is more likely to draw 4 odd numbers or more than to draw 2 odd numbers or fewer.
\item It is as likely to draw at least 3 odd numbers as to draw at least 3 even numbers.
\item There is more than a $90\%$ chance of drawing at least 2 odd numbers.
\item It is as likely to draw exactly 3 odd numbers as to draw exactly 3 even numbers.
\item There is a $50\%$ chance of drawing as many odd numbers as even numbers.
\item It is more likely to draw no even number than to draw no odd number.
\end{enumerate}
% \probpart{-- A good strategy}
%
% We've seen in this exercise that the French lottery indeed favours slightly odd numbers over even numbers. Still,would
% it be a good strategy to decide playing only odd numbers ?
}
\exoc{\textsf{Using margins of error to make decisions}
% Surveys, sampling, confidence intervals
% 1 hour
\probpart{-- Accepting or rejecting an assumption}
It is known that in the French population, $26\%$ of people are allergic to pollen. In one particular sample of 400 people, 120 suffer from that allergy.
\begin{enumerate}
\item Compute the fluctuation interval at $95\%$ confidence.
\item What is the frequency of allergic individuals in this sample ?
\item Would you consider this sample representative of the French population ?
\end{enumerate}
\probpart{-- Parity in French Region councils}
After the 2004 regional elections in France, the distribution of women and men in four regional councils was as
follows. We consider that these councils are random samples of the local politician population in France.
\begin{center}
\begin{tabular}{|c|c|c|c|}
\hline
& Men & Women & Total \\
\hline
Burgundy & 32 & 25 & 57 \\
\hline
Brittany & 38 & 47 & 85 \\
\hline
Rhône-Alpes & 81 & 76 & 157 \\
\hline
Île-de-France & 103 & 106 & 209\\
\hline
\end{tabular}
\end{center}
\begin{enumerate}
\item Supposing that parity between men and women is real in a regional council, what should be the percentage of women
in that council ?
\item Find the fluctuation interval at $95\%$ confidence for the proportion of women in each council.
\item What do you think of the parity between men and women in the local politician population in France ?
\end{enumerate}
\pagebreak
\probpart{-- A car factory}
In a car factory, vehicles are inspected for flaws of the type ``grainy spots on the hood''. Normally,
$20\%$ of the vehicles have this kind of flaw. While inspecting a random sample of 50 vehicles, it is seen that 13
vehicles have it. Should it be a matter of concern ?
\probpart{-- Rodrigo Partida's case}
In 1970, the Mexican-American Rodrigo Partida was sentenced to eight years in prison. He appealed the judgment,
contending that he was denied due process and equal protection of law because Mexican-Americans were unconstitutionally
underrepresented on the grand jury of Hidalgo County, Texas, which indicted him. He introduced evidence that in
1970, the total population of Hidalgo County was 181,535 persons of which 143,611, or approximately 79.1\%, were
persons of Spanish language or Spanish surname. Next, he presented evidence showing the composition of the
grand jury lists over a period of ten years prior to and including the term of court in which the indictment against
him was returned. Of the 870 persons selected for grand jury duty, only 39.0\%\ were Mexican-Americans. If you were a
judge in the court of appeals, how would you react to these allegations ?
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Confidence Interval
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\exocpart{Confidence interval}
\exoc{In this exercise, we look again at the US 2008 election. We will introduce a better method of estimation, based
on the concept of margin of error. Instead of a simple point estimate, we will build for each sample a \emph{confidence
interval} whose width depends on the margin of error we allow.
We still denote by $p$ the percentage of Obama electors in the whole population (so $p=0.53$). Now, consider a sample of
size $n$ yielding a point estimate $f$ of $p$. We've seen in the previous part that the margin of error at 95\%\
confidence is $m=\frac{1}{\sqrt{n}}$. Indeed, the probability of the point estimate $f$ being in the interval
$\left[p-\frac{1}{\sqrt{n}},p+\frac{1}{\sqrt{n}}\right]$ is approximately equal to $95\%$.
\begin{enumerate}
\item Translate the fact that $f$ belongs to that interval with two inequalities.
\item Prove that the fact that $f$ belongs to that interval is equivalent to the fact that $p$ belongs to the interval
$\left[f-\frac{1}{\sqrt{n}},f+\frac{1}{\sqrt{n}}\right]$.
\end{enumerate}
The interval $\left[f-\frac{1}{\sqrt{n}},f+\frac{1}{\sqrt{n}}\right]$ is called a \emph{$95\%$ confidence interval}.
Intuitively, this means that, knowing $f$ and not $p$, we have a $5\%$ risk of being wrong if we consider that $p$ is in
the interval. But, as $p$ is fixed, it's not really correct to talk about probability. Once the confidence interval is
determined, $p$ is either in it or not !
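For instance, with purely hypothetical values (a sample of size $n=400$ giving $f=0.55$, which is not one of the surveys of this chapter), the interval would be
% illustration only: n = 400 and f = 0.55 are assumed values
$$\left[0.55-\frac{1}{\sqrt{400}},0.55+\frac{1}{\sqrt{400}}\right]=[0.55-0.05,0.55+0.05]=[0.50,0.60].$$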
\begin{enumerate}
\item Find the $95\%$ confidence intervals for the surveys of exercise 1, part A.
\item How many surveys gave a confidence interval including the real value ?
\end{enumerate}
}
\exoc{{\bf\textsf{The referendum on the European constitution}}
\begin{enumerate}
\item The French referendum on the Treaty establishing a Constitution for Europe was held on 29 May 2005 to decide
whether France should ratify the proposed Constitution of the European Union. The question put to voters was: ``Do you
approve the bill authorising the ratification of the treaty establishing a Constitution for Europe?''\\
Below are given the results of some surveys carried out before the referendum.
\begin{center}
\begin{tabular}{|c|c|c|c|}
\hline
Dates & Institute & Size & Proportion of ``no'' \\
\hline
18 and 19 March 2005 & Ipsos & 860 & 0.52 \\
\hline
25 and 26 March 2005 & Ipsos & 944 & 0.54 \\
\hline
1 and 2 April 2005 & Ipsos & 947 & 0.52 \\
\hline
16 and 17 March 2005 & CSA & 802 & 0.51 \\
\hline
23 March 2005 & CSA & 856 & 0.55 \\
\hline
1 and 2 April 2005 & Louis Harris & 1004 & 0.54 \\
\hline
31 March and 1 April 2005 & IFOP & 868 & 0.55 \\
\hline
24 March 2005 & IFOP & 817 & 0.53 \\
\hline
\end{tabular}
\end{center}
\begin{enumerate}
\item Find the $95\%$ confidence interval for each survey.
\item The result was a victory for the ``No'' campaign, with $54.67\%$. A commentator then said that not many surveys
had anticipated such a decisive result. What do you think of that opinion ?
\end{enumerate}
\item The United Kingdom referendum was expected to take place in 2006. Following the rejection of the Constitution by
voters in France in May 2005 and in the Netherlands in June 2005, the referendum was postponed indefinitely.\\
ICM research asked 1,000 voters in the third week of May 2005 ``If there were a referendum tomorrow, would you vote for
Britain to sign up to the European Constitution or not?'' : 57\%\ said no. Find the $95\%$ confidence interval for this
survey. If you were a politician, what would you deduce from this ?
\end{enumerate}}
\exoc{{\bf\textsf{M\&Ms}}
% Samples, margin of error, confidence interval
Josh Madison is a 30-something New Yorker who runs a personal website on the internet. In one of his articles, he states
the following crucial problem.
\bigskip
{\sl I love M\&M's. I'm partial to the plain Milk Chocolate variety, but I've been known to have a Peanut from time to
time
in order to remind myself why I don't like them that much. Often, while eating a pack, I'll wonder how they're made and
how the colors are distributed.\\
After wondering about it a little more, I checked out M\&M's web site. According to it, each package of Milk Chocolate
M\&M's should contain 30\%\ blue, 20\%\ brown, 10\%\ green, 10\%\ orange, 10\%\ red, and 20\%\ yellow M\&M's. I checked
the next few packages of M\&M's that I ate and found that their percentages were not even close to the stated
distribution. In my mind, this sort of confirmed my thoughts about how they produce M\&M's: When they make M\&M's, in
any production run, they produce the stated percentage of each color and then just fill the packs off a conveyor line or
some other weight based method. This would mean that any single package could be way off from the stated percentage; but
analyze the counts over a large number of packages, and they should converge towards the stated percentages.}
\bigskip
Today, we will study this essential problem on a few bags of M\&Ms. First, we will consider each bag as a sample.
\begin{enumerate}
\item Open your bag of M\&Ms and count the number of candies of each color and the total number of candies. Give the
results in an absolute frequency table. As soon as you have the results, give them to the teacher who will gather the
data for the whole class.
\item \begin{enumerate}
\item Compute the confidence interval at $95\%$ for each color of your sample.
\item Consider the blue M\&Ms. For how many bags was the expected percentage in the confidence interval ? What
would you conclude from this result ?
\item Answer the previous question for the other colors.
\end{enumerate}
\item To get a larger sample, we will now use the total numbers of candies of each color in all the bags in the class.
Compute the confidence interval with that sample and compare the experimental results with the
expected values. What would you conclude from that ?
\end{enumerate}
}
\exocpart{Simulations}
\exoc{\textsf{Random walks on an axis}
% Simulation with the calculator, probabilities, algorithm, margin of error, confidence interval
% 2 hours
\begin{center}
\psset{xunit=2cm,yunit=1cm}
\begin{pspicture}(-3.5,0)(3.5,1)
\psaxes(0,0)(-3.5,0)(3.5,0)
\uput[u](0,0.5){\includegraphics[width=0.7cm]{flea.eps}}
\end{pspicture}
\end{center}
A flea is moving along an axis. It starts from the origin and, after each jump, lands one unit to the right or one unit
to the left, randomly and with the same probability.
A sequence of jumps is called a \emph{walk}. For example, if the flea is always jumping to the right, the walk will be
written RRRR. If it alternates between right and left, the walk will be written RLRL.
\probpart{-- Simulations of 4-jump walks}
The ``Random'' or ``Alea'' function on your calculator delivers a random decimal number between $0$ and $1$.
\begin{enumerate}
\item Devise a method to simulate a 4-jump walk using the ``Random'' function.
\item Simulate 25 walks and note the final position of the flea at the end of each walk.
\item What are the possible final positions on the axis ? Explain why some are impossible.
\item Count the number of walks for each final position and show the counts in a table.
\item Add a row to the previous table with the absolute frequencies for the whole class.
\item Compute the relative frequencies for the whole class.
\item Compute the average final position of the flea at the end of a 4-jump walk.
\end{enumerate}
\probpart{-- An algorithm}
\begin{multicols}{2}
A random walk can be described by the algorithm shown on the right-hand side, where the alea function delivers a random
number in the interval $[0,1[$. Parts of the algorithm have been omitted on purpose.
\begin{center}
\begin{minipage}{7cm}\IncMargin{1em}
\begin{algorithm}[H]
\Begin{$0\rightarrow x$ \;
$1\rightarrow i$ \;
\While{$i\leq 4$}{
\eIf{alea < $0.5$}{
$\ldots\ldots\ldots\rule{0pt}{15pt} \rightarrow x$ \;
}{
$\ldots\ldots\ldots\rule{0pt}{15pt} \rightarrow x$ \;
}
$i+1\rightarrow i$\;
}
\KwOut{$x$}}
\end{algorithm}\DecMargin{1em}
\end{minipage}
\end{center}
\end{multicols}
\begin{enumerate}
\item Explain the roles of the integers $x$ and $i$ in this algorithm.
\item Fill in the two incomplete lines.
\item \begin{multicols}{2}
Here are the results of applying the algorithm once. What is the final position of the flea at
the end of this walk ?
\begin{center}
\begin{tabular}{c|c|c|c|c|c}
$i$ & & $1$ & $2$ & $3$ & $4$ \\
\hline
alea & & $0.37$ & $0.01$ & $0.93$ & $0.11$ \\
\hline
$x$ & $0$ & $1$ & $2$ & $1$ & $2$ \\
\hline
\end{tabular}
\end{center}
\end{multicols}
\item Apply the algorithm to create five new walks, using the random function of your calculator and displaying all the
steps of the algorithm as in the example of the previous question.
\item How would you change the algorithm to simulate a $30$-jump walk ?
\end{enumerate}
\probpart{-- Probabilistic study}
In this part, we will use probabilities to study the situation and compare the theoretical results to the frequencies
we found in part A.
\begin{enumerate}
\item Draw a tree to show all the possible 4-jump walks. At the end of each branch, write the final position of the
flea.
\item Use the tree to compute the probability of each final position and give the results in a probability table.
\item Compute the margin of error at $95\%$ confidence for your sample of 25 random walks.
\item For each probability, count in the class how many samples of 25 walks gave a frequency within the margin of error.
\end{enumerate}
}
%\pagebreak
\exoc{\textsf{A birth policy}
% Simulation, algorithm, sampling, margin of error, confidence interval
% 2 hours
\vspace*{-10pt}
\begin{multicols}{2}
A government has decided to impose a strict birth policy. Births in a family must stop as soon as a boy is born or
after the birth of the fourth child.
We consider in this exercise that the probabilities of giving birth to a girl or a boy are equal and that each birth is
independent from the previous births in the same family.
This birth policy can be represented as a tree, where the possible families are boxed.\par
\begin{center}
\begin{tabular}{c@{\hskip 0.1cm}c@{\hskip 0.1cm}c@{\hskip 0.1cm}c@{\hskip 0.1cm}c@{\hskip 0.1cm}c}
& \rnode{n1}{$\emptyset$} & & & & \\[0.5cm]
\rnode{n21}{\fbox{B}} & & \rnode{n22}{G} & & & \\[0.5cm]
& \rnode{n31}{\fbox{GB}} & & \rnode{n32}{GG} & & \\[0.5cm]
& & \rnode{n41}{\fbox{GGB}} & & \rnode{n42}{GGG} & \\[0.5cm]
& & & \rnode{n51}{\fbox{GGGB}} & & \rnode{n52}{\fbox{GGGG}}\\
\end{tabular}
\end{center}
\psset{nodesep=2pt} \ncline{->}{n1}{n21} \ncline{->}{n1}{n22}
\ncline{->}{n22}{n31} \ncline{->}{n22}{n32} \ncline{->}{n32}{n41}
\ncline{->}{n32}{n42} \ncline{->}{n42}{n51} \ncline{->}{n42}{n52}
\end{multicols}
\probpart{-- Simulation and statistical view}
\begin{enumerate}
\item Do you think that this policy will favour boys or girls ? No justification is expected.
\item Devise a method to simulate the composition of a family with the calculator.
\item Simulate and write down the composition of 100 families. Count the number of children per family and show the
results in a table with absolute and relative frequencies.
\item Compute the arithmetic mean $m_4$ and the median $d_4$ for the number of children per family in your sample of
$100$ families.
\item Compute the arithmetic mean $M_4$ and the median $D_4$ for all the families in the class.
\end{enumerate}
\probpart{-- An algorithm}
\begin{multicols}{2}
This process can be described as an algorithm. The output is then a list of digits, with $0$ representing a girl and $1$
representing a boy.
\begin{enumerate}
\item Explain the functions of the whole numbers $x$ and $i$ in this algorithm.
\item Explain the condition ``$x\neq 1$ and $i<4$''. Does it ensure that the algorithm will always stop ?
\item What is the function of the list $L$ in this algorithm ?
\item Explain the notation $L(i)$.
\end{enumerate}
\begin{center}
\begin{minipage}{7cm}\IncMargin{1em}
\begin{algorithm}[H]
\Begin{
Clear list $L$ ; $0\rightarrow x$ ; $0\rightarrow i$ \;
\While{$x\neq 1$ and $i<4$}{
$i+1\rightarrow i$\;
\eIf{alea < $0.5$}{
$0\rightarrow x$\;
}{
$1\rightarrow x$\;
}
$x\rightarrow L(i)$\;
}
\KwOut{$L$}}
\end{algorithm}\DecMargin{1em}
\end{minipage}
\end{center}
\end{multicols}
\begin{multicols}{2}
\begin{enumerate}\setcounter{enumi}{4}
\item Here are the results of applying the algorithm once.
Apply the algorithm to get 5 families, displaying all the
steps of the algorithm as in the example.
\end{enumerate}
\begin{center}
\begin{tabular}{c|c|c|c|c}
$i$ & & $1$ & $2$ & $3$ \\
\hline
alea & & $0.37$ & $0.01$ & $0.93$ \\
\hline
$x$ & $0$ & $0$ & $0$ & $1$ \\
\hline
$L$ & $()$ & $(0)$ & $(0,0)$ & $(0,0,1)$ \\
\hline
\end{tabular}
\end{center}
\end{multicols}
\probpart{-- Probabilistic study}
\begin{enumerate}
\item Copy the tree at the beginning of the exercise and add the probabilities.
\item Compute the probability of each type of family.
\item Show in a table the possible numbers of children and their probabilities. Are these probabilities consistent
with the frequencies found at the end of part A ?
\item Use the table to compute the expected value for the number of children in a family.
\end{enumerate}
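As a reminder, and only as a generic sketch (the symbols $x_k$ and $p_k$ are notations introduced here, not taken from the exercise), if the possible values $x_1,\ldots,x_r$ have probabilities $p_1,\ldots,p_r$, the expected value is
% generic formula; the numerical example below uses assumed values unrelated to the birth policy
$$E=x_1p_1+x_2p_2+\cdots+x_rp_r.$$
For example, a hypothetical variable taking the values $1$ and $2$, each with probability $0.5$, has the expected value $1\times 0.5+2\times 0.5=1.5$.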
\probpart{-- The percentage of girls}
The aim of this part is to study the percentage of girls $g$ induced by this birth policy, and therefore answer the
first question of part A. To do so, we will first use the simulations of part A, and then the probabilities of part C.
\begin{enumerate}
\item Count the number and percentage of girls in your 100 simulated families.
\item From the various values found in the class, what could be the theoretical value of this percentage ?
\item Compute the $95\%$ confidence interval for the percentage of girls in your sample.
\item Of all the confidence intervals in the class, how many include the possible value we found in question 2 ? Does
it validate or invalidate this hypothesis ?
\item Use part C to find the true value of the percentage $g$.
\end{enumerate}
}
\exoc{\textsf{Random walks on a tetrahedron}
% Simulation, probabilities, margin of error, confidence interval
% 2h
\begin{minipage}{10cm}
An ant is walking on the edges of a tetrahedron $ABCD$, starting from vertex $A$. When it gets to a vertex, it randomly
chooses the next edge it will walk on. The aim of this exercise is to study the time it will take for the ant to return
to vertex $A$, assuming that it walks along one edge in exactly 1 minute.
\end{minipage}
\begin{minipage}{5.6cm}
\begin{center}
\psset{unit=0.6cm}
\pspicture(0,0.5)(7,5.5)
\pspolygon(1,2)(5,1)(6,3)(3,5)
\psline(5,1)(3,5)
\psline[linestyle=dashed](1,2)(6,3)
\cput(0.5,2){$A$}
\uput[d](5,1){$B$}
\uput[r](6,3){$C$}
\uput[u](3,5){$D$}
\endpspicture
\end{center}
\end{minipage}
A walk will be written as a succession of vertices, as in the example below :
$$A\rightarrow D\rightarrow B\rightarrow C\rightarrow A.$$
A walk will always start from $A$ and stop as soon as the ant comes back to $A$.
\probpart{-- Simulations}
\begin{enumerate}
\item Devise a method to simulate a random walk.
\item Simulate 25 random walks and count the duration of each one. Gather the data in a table with the absolute
frequency of each duration (from 1 to 20 minutes).
\item Explain the value in the column for 1 minute.
\item Is the duration necessarily less than 20 minutes ?
\item Find out the minimum, maximum, range, mean and median of this data.
\item Carry out the previous computations for all the simulated walks in the class.
\end{enumerate}
\pagebreak
\probpart{-- Probabilistic study}
In this part, we will group the vertices B, C and D, at which the walk doesn't end, into a single state denoted BCD.
Therefore, from vertex A, the only possibility is to go to BCD, while from BCD it's possible to go to A or stay in BCD.
\begin{center}
\psset{xunit=1.0cm,yunit=1.0cm,algebraic=true,dotstyle=*,dotsize=3pt 0,linewidth=0.8pt,arrowsize=3pt
2,arrowinset=0.25}
\begin{pspicture*}(-0.72,-0.4)(7.66,2.68)
\pscircle(0.22,1.04){0.46}
\rput{-0.73}(4.71,1.03){\psellipse(0,0)(0.92,0.47)}
\rput[tl](0.08,1.22){A}
\rput[tl](4.32,1.20){BCD}
\parametricplot{1.0305922595526105}{2.1616224923228797}{
1*2.95*cos(t)+0*2.95*sin(t)+2.36|0*2.95*cos(t)+1*2.95*sin(t)+-0.97}
\parametricplot{4.19943498692105}{5.199866644868201}{1*3.27*cos(t)+0*3.27*sin(t)+2.33|0*3.27*cos(t)+1*3.27*sin(t)+3.47}
\parametricplot{-2.7287925292108506}{2.603954909218935}{1*0.7*cos(t)+0*0.7*sin(t)+6.34|0*0.7*cos(t)+1*0.7*sin(t)+0.92}
\psline{->}(0.97,1.63)(0.68,1.4)
\psline{->}(3.66,0.48)(3.94,0.64)
\psline{->}(5.79,0.49)(5.62,0.76)
\end{pspicture*}
\end{center}
\begin{enumerate}
\item Illustrate this new way of seeing the problem with a simpler graph.
\item Starting from BCD, compute the probabilities to go to A and to stay in BCD. Use these results to put the right
probabilities on the arrows of the previous graph.
\item Show in a probability tree the first four steps of this process.
\item Compute the probabilities of a 2-minute, a 3-minute and a 4-minute walk.
\item Without adding a level to the tree, conjecture a value for the probability of a 5-minute walk. Deduce the
probability of a walk lasting 5 minutes or less.
\end{enumerate}
\probpart{-- Estimation of a percentage}
In this part, we will find an estimation of the percentage $p$ of walks lasting 5 minutes or less.
\begin{enumerate}
\item Compute the percentage of walks lasting 5 minutes or less in your 25 simulated walks.
\item Compute the $95\%$ confidence interval for this percentage in your sample.
\item Does this interval validate the value conjectured in part B ?
\item Compute the $95\%$ confidence interval for this percentage in the sample made of all the simulated walks in the
class.
\item Does this interval validate the value conjectured in part B ?
\end{enumerate}
}
\partie{Homework}
\exoc{Galileo Galilei (15 February 1564 - 8 January 1642) was an Italian physicist, mathematician, astronomer and
philosopher, and without any doubt one of the greatest minds of all time. He was born, the same year as Shakespeare,
in the Italian town of Pisa. He was sent to the University of Pisa to study medicine, but in 1589 became professor of
mathematics (at the age of 25) through the favours of Ferdinando dei Medici, the Grand Duke of Tuscany. During this
period, Galileo was ``ordered'' by the Grand Duke of Tuscany to explain a paradox arising in the experiment of tossing
three dice.
\begin{quote}\Fontskrivan
Why, although there were an equal number of 6 partitions of the numbers 9 and 10, did experience state that the chance
of throwing a total 9 with three fair dice was less than that of throwing a total of 10?
\end{quote}
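For reference, the six partitions of each total into three die faces (ignoring the order of the dice) can be listed as follows. This enumeration is an added illustration and is not part of the quoted text.
% each term is a sum of three faces between 1 and 6, written in increasing order
$$9=1+2+6=1+3+5=1+4+4=2+2+5=2+3+4=3+3+3,$$
$$10=1+3+6=1+4+5=2+2+6=2+3+5=2+4+4=3+3+4.$$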
\pagebreak
\probpart{-- Simulating 1000 throws}
Use a spreadsheet (OpenOffice Calc or Microsoft Excel, for example) to simulate 1000 throws of three fair dice, and count the
frequency of each total. To do so, you may use the following functions :
\begin{description}
\item[{\tt =ALEA.ENTRE.BORNES(1;N)} :] Produces a random integer between 1 and $N$.
\item[{\tt =SOMME(A1:A7)} :] Computes the sum of the numbers in cells A1 to A7, including all cells in
between.
\item[{\tt =NB.SI(A1:A7;k)} :] Counts how many times the value $k$ occurs in the cells A1 to
A7.
\end{description}
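As the paradox concerns the total of three dice, one possible layout is sketched below. The cell addresses are
illustrative assumptions, not a prescribed solution.
\begin{description}
\item[{\tt A1}, {\tt B1}, {\tt C1} :] {\tt =ALEA.ENTRE.BORNES(1;6)} in each cell, one cell per die.
\item[{\tt D1} :] {\tt =SOMME(A1:C1)}, the total of the throw. Copy row 1 down to row 1000.
\item[{\tt F3} :] {\tt =NB.SI(D1:D1000;3)}, the number of throws with a total of 3. Repeat for the totals 4 to 18.
\end{description}
In an English-language spreadsheet these functions are named {\tt RANDBETWEEN}, {\tt SUM} and {\tt COUNTIF}, and the
argument separator may be a comma.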
Give the results of your 1000 simulations in an absolute frequency table. Does it confirm the Duke's statement ?
\probpart{-- Galileo's solution}
Below is an extract from Galileo's answer to the Grand Duke of Tuscany, translated by E. H. Thorne.
\begin{quote}\Fontskrivan
The fact that in a dice-game certain numbers are more advantageous than others has a very obvious reason, i.e. that some
are more easily and more frequently made than others, which depends on their being able to be made up with more variety
of numbers. Thus a 3 and an 18, which are throws which can only be made in one way with 3 numbers (that is, the latter
with 6.6.6 and the former with 1.1.1, and in no other way), are more difficult to make than, e.g. 6 or 7, which can be
made up in several ways, that is, a 6 with 1.2.3 and with 2.2.2 and with 1.1.4, and a 7 with 1.1.5, 1.2.4, 1.3.3, and
2.2.3. Nevertheless, although 9 and 12 can be made up in as many ways as 10 and 11, and therefore they should be
considered as being of equal utility to these, yet it is known that long observation has made dice-players consider 10
and 11 to be more advantageous than 9 and 12. And it is clear that 9 and 10 can be made up by an equal diversity of
numbers (and this is also true of 12 and 11). [...]
Three special points must be noted for a clear understanding of what follows. The first is that that sum of the points
of 3 dice, which is composed of 3 equal numbers, can only be produced by one single throw of the dice: and thus a 3 can
only be produced by the three ace-faces, and a 6, if it is to be made up of 3 twos, can only be made by a single throw.
Secondly: the sum which is made up of 3 numbers, of which two are the same and the third different, can be produced by
three throws: as e.g., a 4 which is made up of a 2 and of two aces, can be produced by three different throws; that is,
when the first die shows 2 and the second and third show the ace, or the second die a 2 and the first and third the ace;
or the third a 2 and the first and second the ace. And so e.g., an 8, when it is made up of 3.3.2, can be produced also
in three ways: i.e. when the first die shows 2 and the others 3 each, or when the second die shows 2 and the first and
third 3, or finally when the third shows 2 and the first and second 3. Thirdly the sum of points which is made up of
three different numbers can be produced in six ways. As for example, an 8 which is made up of 1.3.4. can be made with
six different throws: first, when the first die shows 1, the second 3 and the third 4; second, when the first die still
shows 1, but the second 4 and the third 3; third, when the second die shows 1, and the first 3 and the third 4; fourth,
when the second still shows 1, and the first 4 and the third 3; fifth, when the third die shows 1, the first 3, and the
second 4; sixth, when the third shows 1, the first 4 and the second 3. Therefore, we have so far declared these three
fundamental points; first, that the triples, that is the sum of three-dice throws, which are made up of three equal
numbers, can only be produced in one way; second, that the triples which are made up of two equal numbers and the third
different, are produced in three ways; third, that those triples which are made up of three different numbers are
produced in six ways.
\end{quote}
\begin{enumerate}
\item In the first paragraph, Galileo explains that there is only one way to make a 3, but several to make a 6.
\begin{enumerate}
\item Find in the text how many ways there are to make a 6.
\item In the same manner, find out all the ways to make the numbers 9 and 10.
\item What statement in Galileo's text is then proved ?
\end{enumerate}
\item Explain the second paragraph with the example of the different ways to make a 6, and deduce how many different
throws actually make this value.
\item Compute in the same way the number of different throws that make a 9 and those that make a 10.
\item Write the conclusion of Galileo's paper, explaining to the Grand Duke of Tuscany the solution to this problem.
(Minimum 50 words, maximum 150 words.)
\end{enumerate}
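\medskip\noindent As an illustration of the counting rule on a total that is not asked above, take the four partitions
of 7 listed in Galileo's first paragraph. Each partition is weighted by its number of throws (three when two numbers
are equal, six when the three numbers are different) :
\[
\underbrace{3}_{1.1.5}+\underbrace{6}_{1.2.4}+\underbrace{3}_{1.3.3}+\underbrace{3}_{2.2.3}=15 \mbox{ throws.}
\]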
\probpart{-- The Passe-Dix game}
Passe-dix, also called passage in English, is a game of chance using dice. It is played with three dice. There is always
a banker, and the number of players is unlimited. To win, a player must throw a point above ten (or pass ten -- whence
the name of the game).
\begin{enumerate}
\item Use Galileo's method to compute the number of throws that can give each possible sum. Give the results in an
absolute frequency table.
\item Add to the previous table the probabilities of each sum, given as an irreducible fraction and an approximate
value to 3 DP.
\item Compute the relative frequencies in the 1000 simulated throws of part A and compare them to the probabilities. Are
the values very different ?
\item Compute the probability of ``passing ten''. Is it a fair game ?
\end{enumerate}
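\medskip\noindent A sketch of the conversion from counts to probabilities, assuming that the three dice are fair and
distinguishable : there are $6^3=216$ equally likely throws, so a total made by $N$ throws has probability
$\frac{N}{216}$. Writing $P(3)$ for the probability of a total of 3 (a notation introduced here), which is made only by
1.1.1 :
\[
P(3)=\frac{1}{216}\approx 0.005 .
\]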
}
\raz
\partie{Last year's test}
A random race is going on between a hare\footnote{hare : \emph{lièvre}} and a tortoise\footnote{tortoise :
\emph{tortue}}. It is played by repeatedly rolling a fair four-sided die.
\begin{itemize}
\itemb If a 4 turns up, the hare directly reaches the finish line and wins.
\itemb If a 1, a 2 or a 3 turns up, the tortoise moves towards the finish line and the die is rolled again. The tortoise
wins after moving four times.
\end{itemize}
\probpart{-- Simulation and statistical study}
\begin{enumerate}
\item Devise a method to simulate the game with a calculator.
\item A sample of 140 races has been simulated. The winner of each race is given below, where the letter
H stands for the hare and T for the tortoise.
\begin{center}
\begin{tabular}{c*{19}{@{ -- }c}}%\hline
H&H&H&T&H&H&H&T&T&H&H&T&H&H&H&H&H&T&T&T\\ %\hline % 7 valeurs de T
T&T&H&T&T&T&T&H&H&H&T&T&T&H&T&T&H&H&H&T\\ %\hline% 12 valeurs de T
T&H&H&H&H&T&T&T&T&H&H&H&T&H&H&H&H&H&H&H\\ %\hline% 6 valeurs de T
T&T&T&H&H&H&H&H&T&T&T&T&H&T&H&H&H&T&T&H\\ %\hline% 10 valeurs de T
T&T&H&T&H&H&T&H&T&H&T&T&T&H&T&H&H&H&T&T\\ %\hline% 11 valeurs de T
T&H&H&T&H&H&T&T&H&T&T&H&H&H&H&T&T&H&H&H\\ %\hline% 8 valeurs de T
T&T&H&T&H&H&H&H&H&T&H&H&H&H&H&H&H&H&H&H\\ %\hline% 4 valeurs de T
\end{tabular}
\end{center}
\medskip
\item Copy the table below and fill it out with the absolute and relative frequencies.
\begin{center}
\begin{tabular}{|*{3}{c|}} \hline
Winner & Hare & Tortoise \\ \hline
Absolute frequencies & & \\ \hline
Relative frequencies & & \\ \hline
\end{tabular}
\end{center}
\item Compute the confidence interval at 95\% for the percentage of races won by the hare.
\item Explain in a few words what a 95\% confidence interval is.
\item What is the range\footnote{range of an interval : \emph{amplitude}} of this confidence interval ? How could we
lower it ?
\item According to this sample, do you think that these rules are to the advantage of the hare or the tortoise ?
\end{enumerate}
\probpart{-- An algorithm}
This game can be represented as the following algorithm.
\begin{center}
\begin{minipage}{9cm}\IncMargin{1em}
\begin{algorithm}[H]
\KwIn{$0\rightarrow T$ ; $0\rightarrow H$ \;}
\Begin{
\While{$H\neq 1$ and $T\neq 4$}{
\eIf{alea < $0.25$}{
$1\rightarrow H$ \;
}{
$T+1\rightarrow T$ \;
}
}
\eIf{\ldots}{
\KwOut{``The hare wins.''}
}{
\KwOut{``The tortoise wins.''}
}
}
\end{algorithm}\DecMargin{1em}
\end{minipage}
\end{center}
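\medskip\noindent To see how the loop behaves, here is a trace for three hypothetical values of \emph{alea} (chosen for
illustration, not produced by a calculator), assuming that both $H$ and $T$ start at 0.
\begin{center}
\begin{tabular}{|c|c|c|c|} \hline
Pass & alea & $H$ & $T$ \\ \hline
start & & $0$ & $0$ \\ \hline
1 & $0.62$ & $0$ & $1$ \\ \hline
2 & $0.31$ & $0$ & $2$ \\ \hline
3 & $0.18$ & $1$ & $2$ \\ \hline
\end{tabular}
\end{center}
After the third pass $H=1$, so the condition of the loop is no longer satisfied and the loop ends.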
\begin{enumerate}\setcounter{enumi}{7}
\item Explain the functions of the numbers $H$ and $T$ in this algorithm.
\item Explain the condition ``$H\neq 1$ and $T\neq 4$''. Does it ensure that the algorithm will always stop ?
\item What should be entered in the last ``If'' instruction so that the output will be correct ?
\end{enumerate}
\probpart{-- A probabilistic view}
\begin{enumerate}\setcounter{enumi}{10}
\item Copy and fill out the following probability tree :\\
\begin{center}\scalebox{1}{
\pstree[treemode=R,nodesep=5pt,levelsep=3cm,treefit=loose,treenodesize=.1]{\Tr{$\emptyset$}}
{\pstree{\Tr{H}\taput{$\ldots$}}{}
%{\pstree{\Tr{L}\taput{$\frac14$}}{\Tr{$\cdots$}\taput{$\cdots$}\Tr{$\cdots$}\tbput{$\cdots$}}
%\pstree{\Tr{$T$}\tbput{$\frac34$}}{\Tr{$\cdots$}\taput{$\cdots$}\Tr{$\cdots$}\tbput{$\cdots$}}
%}
\pstree{\Tr{T}\tbput{$\ldots$}}
{\pstree{\Tr{H}\taput{$\ldots$}}{}%{\Tr{$\cdots$}\taput{$\cdots$}\Tr{$\cdots$}\tbput{$\cdots$}}
\pstree{\Tr{T}\tbput{$\ldots$}}{\Tr{$\ldots$}\taput{$\ldots$}\Tr{$\ldots$}\tbput{$\ldots$}}
}
}
}%scalebox
\end{center}
\item At the end of each branch of the tree, write down the winner and the probability of this situation.
\item Deduce that the probability of the hare winning is $\dfrac{175}{256}$.
\item According to this result, do you think that these rules are to the advantage of the hare or the tortoise ?
\item The probability computed in question 13 is not included in the confidence interval from part A. Does it mean that
something is wrong with one of these computations ?
\end{enumerate}
%
\newpage\thispagestyle{empty}\null\newpage
\null\vfill\thispagestyle{empty}
\begin{center}
{\Large\bf\textsf{Glossary}}
\end{center}
\begin{center}
\begin{tabularx}{15cm}{|c|c|X|}
\hline
\textbf{English} & \textbf{French} & \textbf{Explanation} \\
\hline
Survey & Sondage & A method for collecting quantitative information about items in a population. \\\hline
Sample & Échantillon & A subset of a population selected for measurement, observation or
questioning, to provide statistical information about the population. \\\hline
Sampling & Échantillonnage & The process or technique of obtaining a representative sample. \\\hline
Margin of error & Marge d'erreur & An expression of the lack of precision in the results obtained
from a sample. \\\hline
Fluctuation interval & Intervalle de fluctuation & The interval in which the studied parameter should lie, for a given
proportion of samples.\\\hline
Estimate (verb) & Estimer & To calculate roughly, often from imperfect data. \\\hline
Estimate & Estimation & A rough calculation or guess. \\\hline
Estimation & Estimation & The process of making an estimate. \\\hline
Point estimate & Estimation ponctuelle & A single value computed from sample data, used as a ``best guess'' for an
unknown population parameter. \\\hline
Confidence interval & Intervalle de confiance & A particular kind of interval estimate of a
population parameter. \\\hline
Simulate (verb) & Simuler & To model, replicate or duplicate the behaviour, appearance or properties of a system or
environment. \\\hline
Simulation & Simulation & Something which simulates a system or environment in order to predict
actual behaviour. \\
\hline
\end{tabularx}
\end{center}
\bigskip
\begin{center}
\begin{quote}
{\em Aw, people can come up with statistics to prove anything, Kent. Forfty percent of all people know that.}
\hfill (Homer Simpson)
\end{quote}
\bigskip
\begin{quote}
{\em Lottery: A tax on people who are bad at math.}
\hfill (Anonymous)
\end{quote}
\bigskip
\begin{quote}
{\em Do not put your faith in what statistics say until you have carefully considered what they do not say.}
\hfill (William W. Watt)
\end{quote}
\bigskip
\begin{quote}
{\em He uses statistics as a drunken man uses lampposts - for support rather than for illumination.}
\hfill (Andrew Lang)
\end{quote}
\end{center}
\vfill
\end{document}