University of Toronto
July 17, 2023
🏆 You win if the outcome of the roll of a die is greater than or equal to 5.
With each die, what is the probability of getting an outcome that is greater than or equal to 5?
6-sided die
20-sided die
Which die would you rather play with?
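These probabilities can be checked with a short sketch (the function name `p_at_least` is illustrative, not from the notes):

```python
from fractions import Fraction

def p_at_least(threshold, sides):
    """Probability that a fair die with `sides` faces rolls >= threshold."""
    favourable = sides - threshold + 1  # outcomes threshold, threshold+1, ..., sides
    return Fraction(favourable, sides)

p6 = p_at_least(5, 6)    # 2/6 = 1/3
p20 = p_at_least(5, 20)  # 16/20 = 4/5
```

Since 4/5 > 1/3, the 20-sided die is the better choice for this game.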
Define a random variable \(W\) such that \[ W = \begin{cases} 1 & \text{if the outcome of the roll is $\geq 5$}\\ 0 & \text{otherwise} \end{cases} \]
What is the distribution of \(W\)? What is the parameter?
Suppose we roll the die and observe \(w=1\). What is the best choice for the parameter of the distribution of \(W\)? Why?
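A grid-search sketch makes the answer concrete: the probability of observing \(w=1\) is the parameter itself, so it is largest when the parameter equals 1.

```python
# Probability of observing w from W ~ Bernoulli(theta):
# theta if w = 1, and 1 - theta if w = 0.
def prob_w(theta, w):
    return theta if w == 1 else 1 - theta

# Scan candidate parameter values on a grid and keep the one
# under which the observed data w = 1 are most likely.
grid = [i / 100 for i in range(101)]
best = max(grid, key=lambda t: prob_w(t, w=1))  # best == 1.0
```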
Definition 1 (Maximum Likelihood Principle) Given a dataset, choose the parameter(s) of interest in a way such that the data are most likely.
If the data \(x_1,...,x_n\) are a realization from a random sample \(X_1,...,X_n\), we can write the probability of observing \(x_1,...,x_n\) for given parameter(s) \(\theta\) as a probability density function \(f(x_1,...,x_n|\theta)\) or, in the discrete case, a probability mass function \(p(x_1,...,x_n|\theta)\).
Definition 2 (Maximum Likelihood Estimation) To estimate \(\theta\), find the value of \(\theta \in \Theta\) at which \(\Pr(X_1=x_1,...,X_n=x_n)\) is maximal.
Since \(X_1,...,X_n\) are independent,
\[ P(X_1=x_1,...,X_n=x_n)=p(x_1|\theta)\times ...\times p(x_n|\theta) \]
Definition 3 (Likelihood function for discrete data) \[ L(\theta)= p(x_1|\theta)\times ...\times p(x_n|\theta) \]
Example 1 Say there are three flavours of chips: plain, BBQ, and ketchup. They have been combined into one bowl with proportions 30/40/30 or, in another bowl, with proportions 10/70/20. If you reach into a bowl and get a plain chip, what is the MLE? What if you get a BBQ chip?
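A sketch of Example 1, treating the bowl labels as the candidate "parameters" (the dictionary layout is illustrative):

```python
# Probability of drawing each flavour under each bowl's mixing proportions.
bowls = {
    "30/40/30": {"plain": 0.30, "BBQ": 0.40, "ketchup": 0.30},
    "10/70/20": {"plain": 0.10, "BBQ": 0.70, "ketchup": 0.20},
}

def mle_bowl(flavour):
    """Return the bowl under which the observed flavour is most likely."""
    return max(bowls, key=lambda b: bowls[b][flavour])

mle_bowl("plain")  # "30/40/30": plain is more likely there (0.30 > 0.10)
mle_bowl("BBQ")    # "10/70/20": BBQ is more likely there (0.70 > 0.40)
```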
Definition 4 (Likelihood function for continuous data) \[ L(\theta)= f(x_1|\theta)\times ...\times f(x_n|\theta) \]
Proof. Since \(P(X_i=x_i)=0\) for continuous random variables, consider a small fixed value \(\epsilon>0\) and choose \(\theta\) such that
\[ P(x_1-\epsilon \leq X_1 \leq x_1+\epsilon, ..., x_n-\epsilon \leq X_n \leq x_n+\epsilon) \]
is maximal. Since \(X_1,...,X_n\) are independent, this equals
\[ \begin{gather*} P(x_1-\epsilon \leq X_1 \leq x_1+\epsilon) \times...\times P(x_n-\epsilon \leq X_n \leq x_n+\epsilon) \\ \approx f(x_1|\theta) \times...\times f(x_n|\theta) \times (2\epsilon)^n \end{gather*} \] Since the value of \(\epsilon\) won’t affect the location of the maximum, we can choose \(\theta\) such that \(f(x_1|\theta)\times ...\times f(x_n|\theta)\) is maximized.
Example 2 Suppose data \(x_1,...,x_n\) are a realization of a random sample \(X_1,...,X_n\) such that \(X_i\sim\text{U}(0,\theta),\;\theta >0\). Find the MLE for \(\theta\).
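A numeric sketch of Example 2 with made-up data: since the \(\text{U}(0,\theta)\) density is \(1/\theta\) on \([0,\theta]\), the likelihood is \(\theta^{-n}\) whenever \(\theta\) covers every observation and \(0\) otherwise, so it is maximized at the sample maximum.

```python
def uniform_likelihood(theta, xs):
    """Likelihood for U(0, theta): theta^(-n) if all observations fit in [0, theta], else 0."""
    if theta < max(xs):
        return 0.0  # some observation is impossible under this theta
    return theta ** (-len(xs))

data = [0.8, 2.1, 1.3]  # hypothetical sample
# L(theta) is zero below max(data) and strictly decreasing above it,
# so the MLE is max(data) = 2.1.
```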
Definition 5 (Log-likelihood function) \[ \ell(\theta)=\ln(L(\theta)) \]
Why?
\(\ln(xy)=\ln(x)+\ln(y)\)
\(\implies\) changes the product of probability density / mass functions to a sum
\(\ln\) is a monotonic increasing function
\(\implies\) does not change the value of \(\theta\) that gives the maximal value
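Both properties can be checked numerically. Below, a hypothetical Bernoulli sample is used: the likelihood (a product) and the log-likelihood (a sum) peak at the same parameter value.

```python
import math

data = [1, 0, 1, 1, 0]  # hypothetical Bernoulli observations: 3 successes in 5 trials

def L(theta):
    """Likelihood: product of Bernoulli probability mass functions."""
    prod = 1.0
    for x in data:
        prod *= theta**x * (1 - theta)**(1 - x)
    return prod

def ell(theta):
    """Log-likelihood: the same product turned into a sum of logs."""
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in data)

grid = [i / 1000 for i in range(1, 1000)]  # avoid 0 and 1, where log is undefined
# max(grid, key=L) and max(grid, key=ell) agree: both give 3/5 = 0.6
```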
Example 3 Consider a random sample \(X_1,...,X_n\) where \(X_i \sim \text{Bernoulli}(\theta)\). Let \(Y=\sum_{i=1}^n X_i\) represent the number of successes and \(Y\sim \text{Binomial}(n,\theta)\). What is the MLE for \(\theta\)?
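One way to sketch Example 3 using the log-likelihood (a second-derivative check, omitted here, confirms this critical point is a maximum): for observed \(y\) successes,
\[ \ell(\theta)=\ln\binom{n}{y}+y\ln(\theta)+(n-y)\ln(1-\theta) \]
Setting the derivative to zero,
\[ \ell'(\theta)=\frac{y}{\theta}-\frac{n-y}{1-\theta}=0 \quad\implies\quad \hat{\theta}=\frac{y}{n} \]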
Example 4 Suppose data \(x_1,...,x_n\) are a realization of a random sample \(X_1,...,X_n\) such that \(X_i\sim\text{N}(\mu,\sigma^2)\). Find MLEs for \(\mu\) and \(\sigma\).
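The standard answers to Example 4 are the sample mean \(\hat{\mu}=\bar{x}\) and \(\hat{\sigma}=\sqrt{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2}\). A numeric sanity check with made-up data, confirming that perturbing either estimate lowers the log-likelihood:

```python
import math

data = [2.0, 3.5, 1.0, 4.5, 3.0]  # hypothetical sample

def loglik(mu, sigma, xs):
    """Normal log-likelihood: sum of the log densities of N(mu, sigma^2)."""
    n = len(xs)
    return (-n * math.log(sigma) - n / 2 * math.log(2 * math.pi)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

# Nudging either estimate away from the closed form decreases the log-likelihood
assert loglik(mu_hat, sigma_hat, data) > loglik(mu_hat + 0.1, sigma_hat, data)
assert loglik(mu_hat, sigma_hat, data) > loglik(mu_hat, sigma_hat + 0.1, data)
```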