Maximum Likelihood Estimation

Sonia Markes

University of Toronto

July 17, 2023

Dice Game

A 6-sided die and a 20-sided die.


🏆 You win if the outcome of the roll of a die is greater than or equal to 5.


Probability of winning

With each die, what is the probability of getting an outcome that is greater than or equal to 5?

6-sided die

20-sided die
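One way to check both answers is to count the winning faces on each die. The sketch below assumes fair dice with faces numbered 1 up to the number of sides; the helper name `win_probability` is made up for illustration.

```python
from fractions import Fraction

def win_probability(sides, threshold=5):
    """P(outcome >= threshold) for a die with faces 1..sides, assumed fair."""
    winning_faces = sum(1 for face in range(1, sides + 1) if face >= threshold)
    return Fraction(winning_faces, sides)

print(win_probability(6))   # 1/3
print(win_probability(20))  # 4/5
```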




Which die would you rather play with?

A model

Define a random variable \(W\) such that \[ W = \begin{cases} 1 & \text{if the outcome of the roll is $\geq 5$}\\ 0 & \text{otherwise} \end{cases} \]

What is the distribution of \(W\)? What is the parameter?


Suppose we roll the die and observe \(w=1\). What is the best choice for the parameter of the distribution of \(W\)? Why?
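One way to reason about this, as a sketch: \(W\) takes only the values 0 and 1, so \(W\sim\text{Bernoulli}(\theta)\) with \(\theta=P(W=1)\). The probability of a single observation \(w\) is

\[ p(w|\theta)=\theta^{w}(1-\theta)^{1-w} \]

which equals \(\theta\) when \(w=1\). If we assume the only candidates are the two fair dice, so that \(\theta\in\{1/3,\,4/5\}\), then \(\theta=4/5\) (the 20-sided die) makes the observed win most likely.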


Definition 1 (Maximum Likelihood Principle) Given a dataset, choose the parameter(s) of interest in a way such that the data are most likely.

If the data \(x_1,...,x_n\) are a realization of a random sample \(X_1,...,X_n\), we can describe how likely it is to observe \(x_1,...,x_n\) for given parameter(s) \(\theta\) using a joint probability density function \(f(x_1,...,x_n|\theta)\), or, in the discrete case, a joint probability mass function \(p(x_1,...,x_n|\theta)\).


Definition 2 (Maximum Likelihood Estimation) To estimate \(\theta\), find the value of \(\theta \in \Theta\) at which \(\Pr(X_1=x_1,...,X_n=x_n)\), computed under parameter value \(\theta\), is maximal.

Likelihood function

Discrete case

Since \(X_1,...,X_n\) are independent,

\[ P(X_1=x_1,...,X_n=x_n)=p(x_1|\theta)\times ...\times p(x_n|\theta) \]

Definition 3 (Likelihood function for discrete data) \[ L(\theta)= p(x_1|\theta)\times ...\times p(x_n|\theta) \]

Properties of \(L(\theta)\)

  • Since \(x_1,...,x_n\) are fixed values, \(p(x_1|\theta)\times ...\times p(x_n|\theta)\) is a function of \(\theta\).
  • The value of the likelihood function is different for different sets of data.
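Both properties can be seen in a minimal sketch that evaluates \(L(\theta)\) for Bernoulli data. The datasets and the function names `bernoulli_pmf` and `likelihood` are hypothetical, chosen only for illustration.

```python
import math

def bernoulli_pmf(x, theta):
    """p(x | theta) for x in {0, 1}."""
    return theta if x == 1 else 1.0 - theta

def likelihood(theta, data, pmf=bernoulli_pmf):
    """L(theta) = p(x_1 | theta) * ... * p(x_n | theta) for fixed data."""
    return math.prod(pmf(x, theta) for x in data)

data_a = [1, 0, 1, 1]   # hypothetical data: three wins in four rolls
data_b = [0, 0, 1, 0]   # hypothetical data: one win in four rolls

# Same theta values, different data, different likelihood values.
for theta in (1 / 3, 4 / 5):
    print(theta, likelihood(theta, data_a), likelihood(theta, data_b))
```

With the data held fixed, `likelihood` is a function of `theta` alone; swapping `data_a` for `data_b` changes its values.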

Example 1 Say there are three flavours of chips: plain, BBQ, and ketchup. They have been combined into one bowl with proportions 30/40/30 or, in another bowl, with proportions 10/70/20. If you reach into a bowl and get a plain chip, what is the MLE? What if you get a BBQ chip?
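A possible way to work through Example 1, assuming the unknown parameter is which bowl was used and that one chip is drawn at random: compare the probability of the observed flavour under each bowl and pick the larger. The bowl labels "A" and "B" are made up for illustration.

```python
# Proportions of each flavour in the two bowls, taken from Example 1.
# The labels "A" and "B" are hypothetical names for the two bowls.
bowls = {
    "A": {"plain": 0.30, "BBQ": 0.40, "ketchup": 0.30},
    "B": {"plain": 0.10, "BBQ": 0.70, "ketchup": 0.20},
}

def mle_bowl(flavour):
    """Return the bowl under which the observed flavour is most likely."""
    return max(bowls, key=lambda bowl: bowls[bowl][flavour])

print(mle_bowl("plain"))  # A
print(mle_bowl("BBQ"))    # B
```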

Continuous case

Definition 4 (Likelihood function for continuous data) \[ L(\theta)= f(x_1|\theta)\times ...\times f(x_n|\theta) \]

Proof. Since \(P(X_i=x_i)=0\) for continuous random variables, consider a small fixed value \(\epsilon>0\) and choose \(\theta\) such that

\[ P(x_1-\epsilon \leq X_1 \leq x_1+\epsilon, ..., x_n-\epsilon \leq X_n \leq x_n+\epsilon) \]

is maximal. Since \(X_1,...,X_n\) are independent, this equals

\[ \begin{gather*} P(x_1-\epsilon \leq X_1 \leq x_1+\epsilon) \times...\times P(x_n-\epsilon \leq X_n \leq x_n+\epsilon) \\ \approx f(x_1|\theta) \times...\times f(x_n|\theta) \times (2\epsilon)^n \end{gather*} \]

Since the value of \(\epsilon\) does not affect the location of the maximum, we can choose \(\theta\) such that \(f(x_1|\theta)\times ...\times f(x_n|\theta)\) is maximized.

Example 2 Suppose data \(x_1,...,x_n\) are a realization of a random sample \(X_1,...,X_n\) such that \(X_i\sim\text{U}(0,\theta),\;\theta >0\). Find the MLE for \(\theta\).
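One possible solution sketch: the density of \(\text{U}(0,\theta)\) is \(f(x|\theta)=1/\theta\) for \(0\leq x\leq \theta\) and \(0\) otherwise, so

\[ L(\theta)= \begin{cases} \theta^{-n} & \text{if $\theta \geq \max(x_1,...,x_n)$}\\ 0 & \text{otherwise} \end{cases} \]

Since \(\theta^{-n}\) is decreasing in \(\theta\), the likelihood is largest at the smallest admissible value, which gives \(\widehat{\theta}_{MLE}=\max(x_1,...,x_n)\). Setting a derivative to zero does not work here, because the maximum sits on the boundary of the region where \(L(\theta)>0\).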

Definition 5 (Log-likelihood function) \[ \ell(\theta)=\ln(L(\theta)) \]


  • \(\ln(xy)=\ln(x)+\ln(y)\)

    \(\implies\) changes the product of probability density / mass functions to a sum

  • \(\ln\) is a monotonic increasing function

    \(\implies\) does not change the value of \(\theta\) that gives the maximal value
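Both points can be checked numerically with a small grid search, sketched here for Bernoulli data. The dataset and the grid of candidate values are assumptions made for illustration.

```python
import math

data = [1, 0, 1, 1, 0, 1, 1, 1]           # hypothetical data: 6 successes in 8 trials
grid = [k / 100 for k in range(1, 100)]   # candidate theta values 0.01, ..., 0.99

def likelihood(theta):
    """Product of Bernoulli probabilities p(x_i | theta)."""
    return math.prod(theta if x == 1 else 1 - theta for x in data)

def log_likelihood(theta):
    """Sum of log-probabilities ln p(x_i | theta)."""
    return sum(math.log(theta if x == 1 else 1 - theta) for x in data)

best_by_likelihood = max(grid, key=likelihood)
best_by_log_likelihood = max(grid, key=log_likelihood)
print(best_by_likelihood, best_by_log_likelihood)  # both 0.75
```

The product in `likelihood` becomes a sum in `log_likelihood`, and both are maximized at the same grid point.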

Example 3 Consider a random sample \(X_1,...,X_n\) where \(X_i \sim \text{Bernoulli}(\theta)\). Let \(Y=\sum_{i=1}^n X_i\) represent the number of successes, so that \(Y\sim \text{Binomial}(n,\theta)\). What is the MLE for \(\theta\)?
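A sketch of one possible derivation, assuming \(0<y<n\) so that the logarithms are finite. Writing \(y=\sum_{i=1}^n x_i\),

\[ L(\theta)=\theta^{y}(1-\theta)^{n-y}, \qquad \ell(\theta)=y\ln(\theta)+(n-y)\ln(1-\theta) \]

Setting the derivative to zero,

\[ \ell'(\theta)=\frac{y}{\theta}-\frac{n-y}{1-\theta}=0 \implies \widehat{\theta}_{MLE}=\frac{y}{n} \]

and \(\ell''(\theta)=-y/\theta^2-(n-y)/(1-\theta)^2<0\), so this is a maximum. Using the binomial pmf of \(Y\) instead multiplies \(L(\theta)\) by \(\binom{n}{y}\), which does not depend on \(\theta\) and therefore gives the same answer.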



  • Given a set of data, the maximum likelihood estimate of \(\theta\) is the value \(t=h(x_1, ...,x_n)\) that maximizes the likelihood function \(L(\theta)\).
  • The maximum likelihood estimator of \(\theta\) is the random variable \(T=h(X_1, ...,X_n)\), obtained by applying the same function \(h\) to the random sample.
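The distinction can be sketched in code for Bernoulli data, where the function \(h\) is the sample proportion. The observed data, the seed, and the value `true_theta` are assumptions used only to run the simulation.

```python
import random

def h(sample):
    """Sample proportion: the MLE of theta for Bernoulli data."""
    return sum(sample) / len(sample)

# Estimate: h applied to one fixed, observed dataset gives a single number.
observed = [1, 0, 1, 1, 0, 1, 1, 1]   # hypothetical data
estimate = h(observed)
print(estimate)  # 0.75

# Estimator: h applied to a random sample is itself a random variable.
rng = random.Random(0)                 # seeded for reproducibility
true_theta = 0.8                       # assumed value, used only to simulate
for _ in range(3):
    sample = [1 if rng.random() < true_theta else 0 for _ in range(8)]
    print(h(sample))                   # depends on the sample drawn
```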


  • Invariance principle: If \(\widehat{\theta}_{MLE}\) is the maximum likelihood estimator of a parameter \(\theta\) and \(g(\theta)\) is an invertible function of \(\theta\), then \(g(\widehat{\theta}_{MLE})\) is the maximum likelihood estimator for \(g(\theta)\).
  • Asymptotically unbiased: Even for an MLE that is biased, as \(n\rightarrow \infty\), \(bias(\widehat{\theta}_{MLE})\rightarrow 0\).
  • Asymptotically minimum variance: Under suitable regularity conditions, in the limit as \(n\rightarrow \infty\), MLEs have the smallest variance among all unbiased estimators.
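Two small illustrations, offered as sketches rather than proofs. For invariance: if \(\widehat{\theta}_{MLE}\) is the MLE of a Bernoulli success probability \(\theta\in(0,1)\), then the MLE of the odds \(g(\theta)=\theta/(1-\theta)\) is

\[ g(\widehat{\theta}_{MLE})=\frac{\widehat{\theta}_{MLE}}{1-\widehat{\theta}_{MLE}} \]

For asymptotic unbiasedness: if \(X_i\sim\text{U}(0,\theta)\), the MLE is \(\max(X_1,...,X_n)\), and

\[ E[\max(X_1,...,X_n)]=\frac{n}{n+1}\,\theta, \qquad bias(\widehat{\theta}_{MLE})=-\frac{\theta}{n+1}\rightarrow 0 \text{ as } n\rightarrow \infty \]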

Example 4 Suppose data \(x_1,...,x_n\) are a realization of a random sample \(X_1,...,X_n\) such that \(X_i\sim\text{N}(\mu,\sigma^2)\). Find MLEs for \(\mu\) and \(\sigma\).
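Setting the partial derivatives of \(\ell(\mu,\sigma)\) to zero leads to \(\widehat{\mu}_{MLE}=\bar{x}\) and \(\widehat{\sigma}^2_{MLE}=\frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^2\), and by invariance \(\widehat{\sigma}_{MLE}\) is the square root of the latter. The sketch below computes both from data; the function name `normal_mle` and the measurements are hypothetical.

```python
import math
import statistics

def normal_mle(data):
    """Return (mu_hat, sigma_hat) for data assumed i.i.d. N(mu, sigma^2)."""
    n = len(data)
    mu_hat = sum(data) / n
    # The MLE of sigma^2 divides by n, not n - 1.
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, math.sqrt(sigma2_hat)

data = [4.9, 5.1, 5.3, 4.7, 5.0]   # hypothetical measurements
mu_hat, sigma_hat = normal_mle(data)
print(mu_hat, sigma_hat)
print(statistics.pstdev(data))     # population standard deviation, same divisor n
```

The divisor \(n\) makes \(\widehat{\sigma}^2_{MLE}\) biased for finite \(n\), but the bias vanishes as \(n\rightarrow\infty\).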