University of Toronto
July 10, 2023
Consider repeating an experiment many times.
Represent the experiments by a sequence of random variables
\[ X_1, X_2, X_3,... \]
where \(X_i\) is the outcome of the \(i^{th}\) experiment.
We say \(X_1, X_2, X_3,...\) are independent and identically distributed, or iid, if the \(X_i\) are mutually independent and every \(X_i\) has the same distribution, which we will call \(F\). We write
\[ X_1, X_2, X_3,... \overset{iid}{\sim} F \]
How reasonable are these assumptions? When might they be violated?
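As a concrete illustration, here is a minimal R sketch of an iid sequence; taking \(F\) to be the Exponential(1) distribution is an arbitrary choice made only for illustration.

```r
# Simulate n iid experiments; the choice F = Exponential(1) is arbitrary.
set.seed(1)
n <- 10
x <- rexp(n, rate = 1)   # x[i] is the outcome of the i-th experiment
x
```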
Common setting
Suppose \(X_1, ..., X_n\) are \(iid\) random variables with \(\mathbb{E}[X_i]=\mu\) and \(\text{Var}(X_i) = \sigma^2<\infty\), and let \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\) denote the sample mean.
If \(X\) and \(Y\) are random variables and \(a\) is a constant, then
\[ \mathbb{E}[X+Y]= \mathbb{E}[X] + \mathbb{E}[Y] \]
\[ \mathbb{E}[aX] = a\mathbb{E}[X] \]
If \(X\) and \(Y\) are independent random variables and \(a\) is a constant, then
\[ \text{Var}[X+Y]= \text{Var}[X] + \text{Var}[Y] \]
\[ \text{Var}[aX] = a^2\text{Var}[X] \]
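A small simulation sketch of these rules in R; the Normal(0, 2\(^2\)) distribution and the constant \(a = 3\) are arbitrary choices for illustration.

```r
# Simulation check of the expectation and variance rules above, assuming
# (for illustration) X and Y are independent Normal(0, 2^2) and a = 3.
set.seed(1)
m <- 1e6                        # number of simulated pairs
x <- rnorm(m, mean = 0, sd = 2)
y <- rnorm(m, mean = 0, sd = 2)
a <- 3
var(x + y)   # close to Var[X] + Var[Y] = 4 + 4 = 8
var(a * x)   # close to a^2 Var[X] = 9 * 4 = 36
```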
Why is the \(iid\) assumption needed here?
What is the expected value of \(\bar{X}_n\)? \[ \mathbb{E}[\bar{X}_n] = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i] = \mu \]
What is the variance of \(\bar{X}_n\)? \[ \text{Var}[\bar{X}_n] = \frac{1}{n^2}\sum_{i=1}^n \text{Var}(X_i) = \frac{\sigma^2}{n} \]
Averaging repeated experiments yields a smaller variance than a single run of the experiment:
\[ \text{Var}[\bar{X}_n] = \frac{\sigma^2}{n} \leq \sigma^2 = \text{Var}[X_i] \]
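The following R sketch illustrates \(\mathbb{E}[\bar{X}_n]=\mu\) and \(\text{Var}[\bar{X}_n]=\sigma^2/n\), using Uniform(0, 1) observations (an arbitrary choice, with \(\mu = 1/2\) and \(\sigma^2 = 1/12\)).

```r
# The variance of the sample mean shrinks like sigma^2 / n.
# Illustration with X_i ~ Uniform(0, 1), so mu = 1/2 and sigma^2 = 1/12.
set.seed(1)
reps <- 1e4                                   # number of repeated experiments
for (n in c(1, 10, 100)) {
  xbar <- replicate(reps, mean(runif(n)))     # sample mean of n observations
  cat("n =", n,
      " mean(xbar) =", round(mean(xbar), 4),
      " var(xbar) =", round(var(xbar), 5),
      " sigma^2/n =", round((1/12)/n, 5), "\n")
}
```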
How likely is it for a random variable to be outside the interval \(\left( \mathbb{E}[Y]-a, \mathbb{E}[Y]+a \right)\)?
For any distribution with finite variance, most of the probability mass lies within a few standard deviations of the expectation.
Theorem 1 (Chebyshev’s Inequality) Consider a random variable \(Y\) with \(\mathbb{E}[Y]<\infty\) and \(\text{Var}(Y) <\infty\) and a constant \(a>0\). Then
\[ \text{Pr}\left( \lvert Y-\mathbb{E}[Y] \rvert \geq a \right)\leq \frac{1}{a^2}\text{Var}\left( Y\right) \qquad(1)\]
Proof. Let \(f_Y\) be the pdf of \(Y\) and write \(\mathbb{E}[Y]=\mu\). Then
\[ \text{Var}(Y) = \int_{-\infty}^{\infty} (y-\mu)^2 f_Y(y)\,dy \geq \int_{\lvert y-\mu \rvert \geq a} (y-\mu)^2 f_Y(y)\,dy \geq a^2 \int_{\lvert y-\mu \rvert \geq a} f_Y(y)\,dy = a^2\,\text{Pr}\left( \lvert Y-\mu \rvert \geq a \right). \]
Dividing both sides by \(a^2\) gives (1).
Note that
\[ \text{Pr}\left( \lvert Y-\mu \rvert < a\right) = 1-\text{Pr}\left( \lvert Y-\mu \rvert \geq a\right), \]
so taking \(a=k\sigma\) in (1) yields the following equivalent form.
Corollary 1 (also, Chebyshev’s Inequality) Consider a random variable \(Y\) with \(\mathbb{E}[Y] = \mu <\infty\) and \(\text{Var}(Y)=\sigma^2 <\infty\), and a constant \(k>0\). Then
\[ \text{Pr}\left( \lvert Y-\mu \rvert < k\sigma \right)\geq 1-\frac{1}{k^2} \qquad(2)\]
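As an illustration, the following R sketch compares the empirical value of \(\text{Pr}(\lvert Y - \mu \rvert \geq k\sigma)\) with the Chebyshev bound \(1/k^2\) for an Exponential(1) random variable (an arbitrary choice, with \(\mu = \sigma = 1\)).

```r
# Empirical check of Chebyshev's inequality for Y ~ Exponential(1),
# which has mu = 1 and sigma = 1 (an arbitrary illustrative choice).
set.seed(1)
y <- rexp(1e6, rate = 1)
mu <- 1; sigma <- 1
for (k in c(1.5, 2, 3)) {
  actual <- mean(abs(y - mu) >= k * sigma)   # empirical Pr(|Y - mu| >= k*sigma)
  bound  <- 1 / k^2                          # Chebyshev upper bound
  cat("k =", k, " empirical =", round(actual, 4), " bound =", round(bound, 4), "\n")
}
```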
Repeating an experiment more times makes it more likely that the sample mean is close to the expectation.
Theorem 2 (Law of Large Numbers) The sample mean \(\bar{X}_n\) converges in probability to \(\mathbb{E}[X_i]=\mu\), the true mean, provided that \(\text{Var}(X_i) = \sigma^2 < \infty\). This is written as
\[ \bar{X}_n \overset{p}{\longrightarrow} \mu \;\;\; \text{when} \;\;\; n \longrightarrow \infty \]
Proof. Fix \(\varepsilon>0\) and apply Chebyshev’s Inequality to \(\bar{X}_n\) with \(a=\varepsilon\) (equivalently, Corollary 1 with \(k\sigma_{\bar{X}_n}=\varepsilon\)). Since \(\mathbb{E}[\bar{X}_n]=\mu\) and \(\text{Var}(\bar{X}_n)=\sigma^2/n\), this gives
\[ \text{Pr}\left( \lvert \bar{X}_n-\mu \rvert < \varepsilon \right) \geq 1 - \frac{\sigma^2}{n\varepsilon^2} \]
Taking the limit as \(n\rightarrow \infty\) gives
\[ \lim_{n\rightarrow \infty} \text{Pr}\left( \lvert \bar{X}_n-\mu \rvert <\varepsilon \right)=1 \]
which is the definition of convergence in probability.
This derivation relies on the assumption of finite variance.
Versions of the LLN
This version of the Law of Large Numbers (LLN) is more precisely known as the Weak Law of Large Numbers (WLLN).
There is a Strong Law of Large Numbers (SLLN), which states that \[ \bar{X}_n \overset{a.s.}{\longrightarrow} \mu \;\;\; \text{when} \;\;\; n \longrightarrow \infty. \] This type of convergence is called almost sure, and is defined as \[ \text{Pr}\left( \lim_{n\rightarrow \infty} \bar{X}_n=\mu \right)=1 \]
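A short R sketch of the LLN in action, using Exponential(1) observations (an arbitrary choice, with \(\mu = 1\)); the running sample mean settles down near \(\mu\) as \(n\) grows.

```r
# Running sample mean of iid Exponential(1) observations (mu = 1);
# by the LLN it should approach mu as n grows.
set.seed(1)
x <- rexp(1e5, rate = 1)
running_mean <- cumsum(x) / seq_along(x)          # Xbar_n for n = 1, 2, ...
running_mean[c(10, 100, 1000, 10000, 100000)]     # drifts toward mu = 1
```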
Given: Random variable \(X\) with \(\mathbb{E}[X]=\mu\) and \(\text{Var}[X]=\sigma^2\).
Want: Random variable \(Z\) such that \(\mathbb{E}[Z]=0\) and \(\text{Var}[Z]=1\).
How?
\[ Z=\frac{X-\mathbb{E}[X]}{\left(\text{Var}(X)\right)^{1/2}}=\frac{X-\mu}{\sigma} \]
Can we verify that \(\mathbb{E}[Z]=0\) and \(\text{Var}[Z]=1\)?
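A quick check, using the expectation and variance rules stated earlier:
\[ \mathbb{E}[Z]=\frac{\mathbb{E}[X]-\mu}{\sigma}=\frac{\mu-\mu}{\sigma}=0, \qquad \text{Var}[Z]=\frac{\text{Var}(X)}{\sigma^2}=\frac{\sigma^2}{\sigma^2}=1 \]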
Can we standardize the sample mean, \(\bar{X}_n\)?
Use \(\mathbb{E}[\bar{X}_n] =\mu\) and \(\text{Var}[\bar{X}_n] = \frac{\sigma^2}{n}\) to get
\[ Z_n = \frac{\bar{X}_n-\mathbb{E}[\bar{X}_n]}{\left(\text{Var}(\bar{X}_n)\right)^{1/2}} = \sqrt{n}\frac{\bar{X}_n-\mu}{\sigma} \]
Theorem 3 (Central Limit Theorem) Let \(X_1, ..., X_n\) be a sequence of \(iid\) random variables with \(\mathbb{E}[X_i]=\mu\) and \(\text{Var}(X_i) = \sigma^2<\infty\). Then
\[ \underset{n\rightarrow\infty}{\lim} \text{Pr} \left( Z_n\leq z\right)=\Phi(z) \]
where \(Z_n=\sqrt{n}\frac{\bar{X}_n-\mu}{\sigma}\) and \(\Phi(z)\) denotes the CDF of the standard normal distribution.
We say that \(Z_n\) converges in distribution to \(Z\sim N(0,1)\). That is,
\[ Z_n \overset{D}{\longrightarrow}Z \;\; \text{where} \;\; Z\sim N(0,1) \]
or, informally, for large \(n\),
\[ \bar{X}_n \overset{\text{approx.}}{\sim} N\left( \mu,\frac{\sigma^2}{n} \right) \]
The CLT holds regardless of the distribution of the \(X_i\), provided the variance is finite (!)
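The following R sketch illustrates this: standardized means \(Z_n\) of Exponential(1) samples (an arbitrary, clearly non-normal choice with \(\mu = \sigma = 1\)) already track the standard normal CDF closely for \(n = 30\).

```r
# CLT illustration: standardized means of Exponential(1) samples (mu = 1,
# sigma = 1) compared with the standard normal CDF at a few points.
set.seed(1)
n <- 30; reps <- 1e4
zn <- replicate(reps, sqrt(n) * (mean(rexp(n, rate = 1)) - 1) / 1)
for (z in c(-1, 0, 1, 2)) {
  cat("z =", z,
      " empirical Pr(Z_n <= z) =", round(mean(zn <= z), 3),
      " Phi(z) =", round(pnorm(z), 3), "\n")
}
```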
Exercise 1 Let \(X_1, ..., X_n\) represent \(iid\) measurements of the weights of newborn babies. The distribution is unknown, but it has mean \(\mu=7.2\) lbs and variance \(\sigma^2=4.8\) lbs\(^2\).
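By the CLT, probabilities involving the sample mean can be approximated using
\[ \bar{X}_n \overset{\text{approx.}}{\sim} N\!\left(7.2, \frac{4.8}{n}\right), \quad \text{i.e.} \quad \text{Pr}\left(\bar{X}_n \leq x\right) \approx \Phi\!\left(\sqrt{n}\,\frac{x-7.2}{\sqrt{4.8}}\right). \]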
Some helpful values:
[1] 0.2593025
More helpful values:
[1] 1.281552 7.456310
Exercise 2 Let \(X_1, ..., X_n\) be \(iid\) and represent the wait times for customers at a call centre. The distribution is unknown, but it has mean \(\mu=3.6\) minutes and variance \(\sigma^2=2.1\) minutes\(^2\). Let \(T_n\) represent the total wait time for \(n\) customers.
Hint: \(T_n=\sum_{i=1}^n X_i = n\bar{X}_n\)
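Combining the hint with the rules for sums of independent random variables gives \(\mathbb{E}[T_n]=n\mu=3.6\,n\) and \(\text{Var}(T_n)=n\sigma^2=2.1\,n\), so by the CLT
\[ T_n \overset{\text{approx.}}{\sim} N\left(3.6\,n,\; 2.1\,n\right). \]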
Some helpful values:
[1] 0.3788104
More helpful values:
[1] 0.6744898 3.9090900
Exercise 3 Suppose you had to write a test without preparing. Estimate the probability that you would pass if the test consisted of 20 True / False questions.
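One way to set this up in R, treating each answer as an independent guess that is correct with probability \(1/2\) and assuming (the exercise does not specify a threshold) that passing means getting at least half of the 20 questions right:

```r
# Number correct out of 20 when guessing; each question correct with prob 0.5.
n <- 20; p <- 0.5
pass_mark <- 10                      # ASSUMED pass mark: at least half correct
# Exact probability of passing, using the binomial distribution:
1 - pbinom(pass_mark - 1, size = n, prob = p)
# CLT approximation: the proportion correct is approximately N(p, p*(1-p)/n).
1 - pnorm(pass_mark / n, mean = p, sd = sqrt(p * (1 - p) / n))
```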
Some helpful values: