Bayesian Estimation

Sonia Markes

University of Toronto

August 9, 2023

Recall: Introduction to Bayesian Inference

Components

Prior : \(\pi(\theta)\)
Data : \(\mathbf{x}=(x_1,...,x_n)\)
Likelihood : \(f_{\theta}(\mathbf{x})\) or \(f(\mathbf{x}|\theta)\)
Marginal : \(m(\mathbf{x})\)
Posterior : \(\pi(\theta|\mathbf{x})\)

Bayes Rule

\[ \pi(\theta | \mathbf{x})=\frac{f(\mathbf{x}|\theta)\pi(\theta)}{m(\mathbf{x})} \]

The marginal distribution of the data, \(m(\mathbf{x})=\int_{\Theta} f(\mathbf{x}|\theta)\pi(\theta)\,d\theta\), does not depend on \(\theta\), so it acts as a normalizing constant for the posterior.

The posterior of \(\theta\) is therefore proportional to the likelihood times the prior.

\[ \pi(\theta | \mathbf{x}) \propto f(\mathbf{x}|\theta)\pi(\theta) \]
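The proportionality can be checked numerically. The following is a minimal sketch in R, assuming illustrative values (7 successes in 10 Bernoulli trials and a Beta(2, 2) prior, neither taken from the slides): it evaluates likelihood times prior on a grid, normalizes by a Riemann-sum approximation of the marginal, and compares the result with the known conjugate posterior for this model.

```r
# Illustrative values (assumptions, not from the slides).
n <- 10   # number of Bernoulli trials
s <- 7    # number of successes
a <- 2    # Beta prior parameters
b <- 2

# Midpoint grid over the parameter space (0, 1).
width <- 0.001
theta <- seq(width / 2, 1 - width / 2, by = width)

# Unnormalized posterior: likelihood times prior.
unnorm <- theta^s * (1 - theta)^(n - s) * dbeta(theta, a, b)

# Approximate the marginal m(x) by a Riemann sum, then normalize.
m_x <- sum(unnorm) * width
post_grid <- unnorm / m_x

# Conjugate result for this model: Beta(a + s, b + n - s).
post_exact <- dbeta(theta, a + s, b + n - s)
max_abs_diff <- max(abs(post_grid - post_exact))
print(max_abs_diff)
```

The printed difference should be close to zero, which reflects that dividing by the marginal only rescales the curve and does not change its shape.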

Point Estimation

Given data \(x=(x_1,...,x_n)\), realizations of a random sample \(X=(X_1,...,X_n)\), where \(X_i\sim F_{\theta}, \; \theta\in\Theta\), how do we arrive at an estimate \(\widehat{\theta}\)?

Frequentist: Find the value of \(\theta\) that maximizes the log-likelihood \(\ell(\theta)=\log f(\mathbf{x}|\theta)\)

\[ \widehat{\theta}_{MLE}=\arg \max_{\theta\in\Theta} \ell(\theta) \]

Bayesian: Find the posterior distribution \(\pi(\theta|X_1,...,X_n)\), then define the estimator as an appropriate summary of that distribution.

Summaries

Some common choices for summaries:

posterior median: \(\widehat{\theta}=\text{Median}(\theta|X_1,...,X_n)\)
posterior mode: \(\widehat{\theta}=\arg \max_{\theta} \pi(\theta|X_1,...,X_n)\)
posterior mean: \(\widehat{\theta}=\mathbb{E}(\theta|X_1,...,X_n)\)

Example 1 (Bernoulli model with Beta prior) Suppose we have a random sample \(X_1,...,X_n \overset{iid}{\sim} \text{Bernoulli}(\theta)\) and choose prior distribution \(\theta\sim\text{Beta}(a,b)\).

Find the posterior distribution of \(\theta|X_1,...,X_n\).

Find the posterior mean.

Find the posterior median.

Find the posterior mode.
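One way to work through Example 1, writing \(s=\sum_{i=1}^n x_i\) for the number of successes (a symbol introduced here only for brevity):

\[ \pi(\theta|\mathbf{x}) \propto \theta^{s}(1-\theta)^{n-s}\,\theta^{a-1}(1-\theta)^{b-1} = \theta^{a+s-1}(1-\theta)^{b+n-s-1} \]

This is the kernel of a Beta density, so \(\theta|\mathbf{x} \sim \text{Beta}(a+s,\, b+n-s)\). The standard Beta formulas then give

\[ \mathbb{E}(\theta|\mathbf{x})=\frac{a+s}{a+b+n}, \qquad \arg \max_{\theta} \pi(\theta|\mathbf{x})=\frac{a+s-1}{a+b+n-2} \quad \text{(when } a+s>1 \text{ and } b+n-s>1\text{)} \]

The posterior median has no closed form in general and is usually computed numerically, for instance with qbeta(0.5, a + s, b + n - s) in R.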

Example 2 (Location Normal model with Normal prior) Suppose we have a random sample \(X_1,...,X_n \overset{iid}{\sim} \text{N}(\mu, \sigma_0^2)\) with \(\sigma^2_0\) known and choose prior distribution \(\mu \sim \text{N}(\mu_0,\tau_0^2)\).

Find the posterior distribution of \(\mu|X_1,...,X_n\).

Find an estimate for \(\mu\).

Compare with the MLE.
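A sketch of one route through Example 2: multiply likelihood and prior, keep only the terms that involve \(\mu\), and complete the square.

\[ \pi(\mu|\mathbf{x}) \propto \exp\left(-\frac{1}{2\sigma_0^2}\sum_{i=1}^n (x_i-\mu)^2\right)\exp\left(-\frac{(\mu-\mu_0)^2}{2\tau_0^2}\right) \propto \exp\left(-\frac{1}{2}\left(\frac{1}{\tau_0^2}+\frac{n}{\sigma_0^2}\right)\mu^2+\left(\frac{\mu_0}{\tau_0^2}+\frac{n\bar{x}}{\sigma_0^2}\right)\mu\right) \]

This is the kernel of a Normal density with variance \(\left(\frac{1}{\tau_0^2}+\frac{n}{\sigma_0^2}\right)^{-1}\) and mean \(\left(\frac{1}{\tau_0^2}+\frac{n}{\sigma_0^2}\right)^{-1}\left(\frac{\mu_0}{\tau_0^2}+\frac{n}{\sigma_0^2}\bar{x}\right)\). Because a Normal posterior is symmetric and unimodal, its mean, median and mode coincide, so all three summaries give the same estimate. Writing \(w=\frac{1/\tau_0^2}{1/\tau_0^2+n/\sigma_0^2}\) (a weight introduced here for illustration),

\[ \widehat{\mu}=w\,\mu_0+(1-w)\,\bar{x}, \qquad \widehat{\mu}_{MLE}=\bar{x} \]

so the Bayesian estimate shrinks the MLE toward the prior mean, and the two agree in the limit as \(n\to\infty\) or \(\tau_0^2\to\infty\).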

Credible Intervals

Definition

A \(100(1-\alpha)\%\) credible interval for \(\theta\), given a random sample \(X=(X_1,...,X_n)\), is any pair \((L_n, U_n)\) such that

\[ \Pr\left( \left. L_n<\theta<U_n \right| X_1,...,X_n \right) = 1-\alpha \]

If \(q_\alpha\) represents the \(\alpha\)-quantile of the posterior, that is, \[ \int_{-\infty}^{q_\alpha} \pi (\theta|X_1,...,X_n)d\theta=\alpha \] then the following are \(100(1-\alpha)\%\) credible intervals:

\((-\infty,q_{1-\alpha})\)
\((q_{\alpha},\infty)\)
\((q_{\alpha/2},q_{1-\alpha/2})\)

When \(\Theta\) is a proper subinterval of \(\mathbb{R}\) (for example \(\Theta=(0,1)\) or \(\Theta=(0,\infty)\)) rather than \(\Theta = \mathbb{R}\), the “\(\pm \infty\)” are replaced by the corresponding endpoints of \(\Theta\).
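The three constructions can be sketched in R from a posterior quantile function. The example below assumes an illustrative Beta(19, 15) posterior on \(\Theta=(0,1)\), so the infinite endpoints become 0 and 1; the helper post_prob is a hypothetical name introduced only to check the posterior probability of each interval.

```r
# Illustrative posterior (an assumption): Beta(19, 15) on (0, 1).
shape1 <- 19
shape2 <- 15
alpha <- 0.05

# Theta = (0, 1), so -Inf and Inf are replaced by 0 and 1.
lower_sided <- c(0, qbeta(1 - alpha, shape1, shape2))
upper_sided <- c(qbeta(alpha, shape1, shape2), 1)
equal_tailed <- qbeta(c(alpha / 2, 1 - alpha / 2), shape1, shape2)

# Posterior probability assigned to an interval c(lo, hi).
post_prob <- function(interval) {
  pbeta(interval[2], shape1, shape2) - pbeta(interval[1], shape1, shape2)
}

probs <- sapply(list(lower_sided, upper_sided, equal_tailed), post_prob)
print(probs)
```

Each entry of probs should equal \(1-\alpha=0.95\) up to floating-point error, even though the three intervals themselves differ.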

How do we pick a way of constructing a credible interval?

Choose the shortest interval.

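The equal-tailed interval

\[ (q_{\alpha/2},q_{1-\alpha/2}) \]

is the shortest interval if the posterior is unimodal and symmetric. For a skewed unimodal posterior, the shortest interval is the highest posterior density (HPD) interval, which generally differs from the equal-tailed one.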
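When the posterior is skewed, the shortest \(100(1-\alpha)\%\) interval need not be the equal-tailed one. A minimal sketch in R, assuming an illustrative Gamma posterior with shape 12 and rate 10: it searches over the lower tail probability p for the shortest interval of the form \((q_p, q_{p+1-\alpha})\) and compares it with the equal-tailed interval. The function name interval_length is hypothetical.

```r
# Illustrative skewed posterior (an assumption): Gamma(shape = 12, rate = 10).
shape <- 12
rate <- 10
alpha <- 0.05

# Length of the interval (q_p, q_{p + 1 - alpha}) as a function of
# the lower tail probability p.
interval_length <- function(p) {
  qgamma(p + 1 - alpha, shape, rate) - qgamma(p, shape, rate)
}

# Search p in (0, alpha) for the shortest interval with posterior
# probability 1 - alpha.
opt <- optimize(interval_length, interval = c(0, alpha), tol = 1e-8)
p_star <- opt$minimum

shortest <- qgamma(c(p_star, p_star + 1 - alpha), shape, rate)
equal_tailed <- qgamma(c(alpha / 2, 1 - alpha / 2), shape, rate)

print(rbind(shortest, equal_tailed))
print(c(diff(shortest), diff(equal_tailed)))
```

Both intervals carry posterior probability \(1-\alpha\); the search only changes how that probability is split between the two tails.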

Interpretation

Confidence intervals

If many samples of size \(n\) are drawn independently from the same population and an interval is computed from each, then in the long run \(100(1-\alpha)\%\) of the resulting intervals contain the true value of the parameter.

Credible intervals

Given the observed data, the posterior probability that the parameter lies in the interval is \(1-\alpha\).
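The long-run reading of a confidence interval can be illustrated by simulation. The sketch below assumes a Normal model with known standard deviation and hypothetical values for mu_true, sigma0, n and reps (none of them from the slides); it repeatedly draws a sample, forms the usual \(\bar{x} \pm z_{1-\alpha/2}\,\sigma_0/\sqrt{n}\) interval, and records how often the interval contains the true mean.

```r
set.seed(1)

# Hypothetical simulation settings (assumptions).
mu_true <- 2      # true mean, fixed and unknown to the analyst
sigma0 <- 1       # known standard deviation
n <- 20           # sample size
reps <- 10000     # number of repeated samples
alpha <- 0.05

z <- qnorm(1 - alpha / 2)
half_width <- z * sigma0 / sqrt(n)

# For each repeated sample, check whether the interval covers mu_true.
covered <- replicate(reps, {
  x <- rnorm(n, mean = mu_true, sd = sigma0)
  (mean(x) - half_width < mu_true) && (mu_true < mean(x) + half_width)
})

coverage <- mean(covered)
print(coverage)
```

The proportion should be near 0.95. The probability statement here is about the procedure over repeated samples, whereas a credible interval makes a probability statement about the parameter given the one observed sample.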

Example 3 (Location Normal model with Normal prior) Suppose we have a random sample \(X_1,...,X_n \overset{iid}{\sim} \text{N}(\mu, \sigma_0^2)\) with \(\sigma^2_0\) known and choose prior distribution \(\mu \sim \text{N}(\mu_0,\tau_0^2)\).

Compute a \(95\%\) credible interval for \(\mu\).

\[ \mu |\mathbf{x} \sim N \left( \left( \frac{1}{\tau_0^2}+\frac{n}{\sigma_0^2}\right) ^{-1} \left( \frac{\mu_0}{\tau_0^2}+\frac{n}{\sigma_0^2} \bar{x} \right), \left( \frac{1}{\tau_0^2}+\frac{n}{\sigma_0^2}\right) ^{-1} \right) \]
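Since Example 3 gives no numbers, the following R sketch plugs hypothetical values (all of them assumptions) into the posterior above and reads off an equal-tailed \(95\%\) credible interval from the Normal quantiles.

```r
# Hypothetical inputs (assumptions, not from the example).
n <- 25
xbar <- 1.3
sigma0 <- 2   # known sampling standard deviation
mu0 <- 0      # prior mean
tau0 <- 1     # prior standard deviation

# Posterior precision is prior precision plus data precision.
post_var <- 1 / (1 / tau0^2 + n / sigma0^2)
post_mean <- post_var * (mu0 / tau0^2 + n * xbar / sigma0^2)

# Equal-tailed 95% credible interval from the posterior quantiles.
ci <- qnorm(c(0.025, 0.975), mean = post_mean, sd = sqrt(post_var))
print(c(post_mean = post_mean, lower = ci[1], upper = ci[2]))
```

Because the Normal posterior is symmetric and unimodal, this equal-tailed interval is also the shortest \(95\%\) interval.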

Example 4 (Bernoulli model with Beta prior) Let \(X_1,...,X_n \overset{iid}{\sim} \text{Bernoulli}(\theta)\). Choose prior distribution \(\theta\sim\text{Beta}(12,12)\). Suppose we observe \(7\) heads in \(10\) flips of this coin.

Compute a \(95\%\) credible interval for \(\theta\).

\[ \theta|\mathbf{x} \sim \text{Beta}(a+n\bar{x}, b+n(1-\bar{x})) \]

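# Data: 7 heads in n = 10 flips
n <- 10
xbar <- 7/n
# Beta(a, b) prior
a <- 12
b <- a

# 2.5% and 97.5% quantiles of the Beta(a + n*xbar, b + n*(1 - xbar)) posterior
qbeta(c(0.025,0.975), a + n*xbar, b + n*(1-xbar))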

[1] 0.3921530 0.7189338

Example 5 (Exponential model with Gamma prior) Let \(X_1,...,X_n \overset{iid}{\sim} \text{Exp}(\lambda)\) with density \(f_\lambda (x_i)=\lambda e^{-x_i\lambda}\). Use prior distribution \(\lambda \sim\text{Gamma}(\alpha,\beta)\), with shape \(\alpha\) and rate \(\beta\), where \((\alpha,\beta)=(2, 3)\). Suppose we observe a sample of size \(n=10\) whose values sum to \(\sum_{i=1}^n x_i=7\), so that \(\bar{x}=0.7\).

Compute the posterior median and \(95\%\) credible interval for \(\lambda\).
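A sketch of the derivation, assuming the Gamma prior is parameterized by shape \(\alpha\) and rate \(\beta\), so that \(\pi(\lambda)\propto\lambda^{\alpha-1}e^{-\beta\lambda}\). This is the parameterization that the call qgamma(..., alpha + n, beta + n*xbar) uses.

\[ \pi(\lambda|\mathbf{x}) \propto \left(\prod_{i=1}^n \lambda e^{-\lambda x_i}\right)\lambda^{\alpha-1}e^{-\beta\lambda} = \lambda^{\alpha+n-1}e^{-(\beta+n\bar{x})\lambda} \]

This is the kernel of a Gamma density, so

\[ \lambda|\mathbf{x} \sim \text{Gamma}(\alpha+n,\; \beta+n\bar{x}) \]

With \((\alpha,\beta)=(2,3)\), \(n=10\) and \(n\bar{x}=7\) (as xbar <- 7/n implies), the posterior is \(\text{Gamma}(12, 10)\). The posterior median is its \(0.5\)-quantile, and the equal-tailed \(95\%\) credible interval runs from its \(0.025\)-quantile to its \(0.975\)-quantile.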

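# Data: n = 10 observations summing to 7
n <- 10
xbar <- 7/n
# Gamma(alpha, beta) prior (shape alpha, rate beta)
alpha <- 2
beta <- 3

# 2.5%, 50% and 97.5% quantiles of the Gamma(alpha + n, beta + n*xbar) posterior
qgamma(c(0.025,0.5, 0.975), alpha + n, beta + n*xbar)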

[1] 0.6200575 1.1668363 1.9682039