University of Toronto
August 9, 2023
An empirical and iterative process for developing shared knowledge.
We will use Shoemaker’s data of 130 simulated body temperature readings for 65 hypothetical males and 65 hypothetical females.
temp
: temperature in degrees Fahrenheit

sex
: 1 = male, 2 = female

heartrate
: heart rate in beats per minute

Find a \(95\%\) confidence interval for the mean body temperature.
Recall: for a random sample from a normal distribution, the studentized mean satisfies
\[ \frac{\bar{X}_n-\mu}{S_n/\sqrt{n}} \sim t_{n-1} \]
\[ \Pr\left( -t_{n-1,\;\alpha/2} \leq\frac{\bar{X}_n-\mu}{S_n/\sqrt{n}} \leq t_{n-1,\;\alpha/2}\right) = 1-\alpha \]
Quick exercise: Fill in the intermediate steps.
which gives this formula for a confidence interval \[ \left(\bar{x}_n - t_{n-1,\;\alpha/2} \frac{s_n}{\sqrt{n}} ,\; \bar{x}_n + t_{n-1,\;\alpha/2} \frac{s_n}{\sqrt{n}}\right) \]
lower upper
36.734 36.876
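A minimal R sketch of the computation behind this interval, assuming the 130 temperatures (converted to Celsius, to match the output above) are stored in a vector `temp_c`:

alpha <- 0.05
n <- length(temp_c)
xbar <- mean(temp_c)
s <- sd(temp_c)
# Critical value from the t distribution with n - 1 degrees of freedom
tcrit <- qt(1 - alpha/2, df = n - 1)
c(lower = xbar - tcrit * s / sqrt(n),
  upper = xbar + tcrit * s / sqrt(n))

The same interval is returned by `t.test(temp_c)$conf.int`.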
Does the data indicate that the mean body temperature is not \(37^{\circ}\text{C}\)?
A popular summary statistic for addressing hypothesis-based research questions is the p-value.
Idea

If the parameter is what we claim, does it seem possible that we got the data that we did?

Terminology
Assuming that \(\theta_0\) is the true value of \(\theta\), the p-value is the probability of observing the test statistic or something more extreme.
We will define a p-value as \[ p_0={\Pr}_{\theta_0}\left(\left|T(X) \right| \geq \left| t_{obs}\right| \right) \] where \(t_{obs}=T(x)\) is the observed value of the test statistic.
If the claim is true, then the test statistic is an observation from a t-distribution with \(n-1\) degrees of freedom. What is the probability of getting the observed test statistic, or a value that is more extreme?
Interpretation
The p-value is the probability, computed assuming the null hypothesis is true, of obtaining a result equal to or more extreme than the one actually observed.
When we want to make {yes, no} decisions about whether the data supports a certain true value of the parameter, we call the process a statistical hypothesis test.
Very often, p-values are used in statistical hypothesis testing.
Suppose \(X_1, ..., X_n\overset{iid}{\sim} F_\theta\). We want to assess the support for the claim that \(\theta = \theta_0\).
A small p-value can occur because the claim is false, or because the claim is true but we happened to observe data that are unlikely under it.
If we assume the distribution of the data is Normal, then we know that these test statistics have the following distributions:
\[ \frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}}\sim \text{N}\left( 0,1 \right) \quad\text{if }\sigma\text{ is known} \]
\[ \frac{\bar{X}_n-\mu}{S_n/\sqrt{n}}\sim t_{n-1} \quad\text{if }\sigma\text{ is unknown} \]
\[ \frac{\bar{X}_n-\mu}{S_n/\sqrt{n}} \;\dot{\sim} \;\text{N}\left( 0,1 \right) \quad \text{for large }n \]
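For illustration, each case maps directly onto R's distribution functions; the summary numbers below are hypothetical:

# Hypothetical summary statistics
n <- 25; xbar <- 36.8; mu0 <- 37
sigma <- 0.4   # known standard deviation
s <- 0.45      # sample standard deviation
# sigma known: compare to N(0, 1)
z <- (xbar - mu0) / (sigma / sqrt(n))
2 * pnorm(-abs(z))
# sigma unknown: compare to t with n - 1 degrees of freedom
tstat <- (xbar - mu0) / (s / sqrt(n))
2 * pt(-abs(tstat), df = n - 1)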
Rough guidelines for strength of evidence:
Advertisements for a particular chocolate bar claim that there are 5 peanuts in every bar. You eat one of their chocolate bars and find only 2 peanuts, so you buy 49 more chocolate bars. You find that the average number of peanuts per bar is 4.37 and the standard deviation is 1.28. Does the data support the claim in the ad?
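A sketch of the computation in R, using the summary statistics from the ad example (two-sided, testing the claim that the mean is 5):

n <- 50
xbar <- 4.37
s <- 1.28
mu0 <- 5
# Studentized test statistic
tobs <- (xbar - mu0) / (s / sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
2 * pt(-abs(tobs), df = n - 1)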
To describe how well a model fits a set of observations, we use goodness of fit tests.
Suppose we have data \(x=(x_1,...,x_n)\), which are realizations of a sample \(X=(X_1,...,X_n)\). We are interested in the claim that \(\theta=\theta_0\) where \(\theta\) is a parameter of the probability distribution of \(X_i\).
Say we have found the maximum likelihood estimate of \(\theta\), and denote it as \(\widehat{\theta}_{MLE}\).
The likelihood ratio is defined as
\[ \Lambda(\theta_0)=\frac{L\left( \theta_0|X\right)}{L\left( \widehat{\theta}_{MLE}|X\right)} \]
Notice that \(0 \leq \Lambda(\theta_0) \leq 1\), since the likelihood is nonnegative and \(\widehat{\theta}_{MLE}\) maximizes it.
If we assume that the MLE satisfies \(\left. \frac{\partial\ell}{\partial\theta}\right|_{\theta=\widehat{\theta}_{MLE}} = 0\), it can be shown that the sampling distribution of the MLE is asymptotically normal, and hence that for large \(n\),
\[ -2\log \Lambda(\theta_0) \sim \chi_1^2 \]
We say that the likelihood ratio statistic has a chi-squared distribution with one degree of freedom. The p-value for the likelihood ratio statistic is computed as
\[ p_0=\Pr\left( \chi_1^2 > -2\log \Lambda(\theta_0) \right) \]
Suppose we flip a coin 50 times and get 20 heads. Is this evidence that the coin is unfair?
n <- 50
xsum <- 20
# Claim:
p0 <- 0.5
# Estimate:
p_mle <- xsum / n
# Define the likelihood function (binomial, ignoring the constant term)
likelihoodfcn <- function(p){
p^(xsum) * (1-p)^(n-xsum)
}
# Compute the likelihood ratio
LR <- likelihoodfcn(p0) / likelihoodfcn(p_mle)
# Do a likelihood ratio test to check if the data supports the claim
lr <- -2*log(LR)
1 - pchisq(lr, df = 1)
[1] 0.1559
If \(\theta \in \mathbb{R}^d\) is a vector of \(d>1\) category probabilities (so only \(d-1\) components are free, since they sum to 1) and \(\theta = \theta_0\), then for large \(n\),
\[ -2\log \Lambda(\theta_0) \sim \chi_{d-1}^2 \]
Example: The supplemental text shows how to test a claim that digits in a dataset have been generated uniformly.
The degrees of freedom of the chi-squared distribution are given by the difference between the degrees of freedom in the MLE and the degrees of freedom in the hypothesis.
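As a sketch of the digit example, suppose (hypothetically) we count how often each digit 0–9 appears among 200 observations. Under the uniform claim each digit has probability 1/10; the unrestricted MLE has 9 free parameters and the fully specified claim has 0, giving 9 degrees of freedom:

# Hypothetical digit counts (digits 0-9, 200 observations in total)
counts <- c(22, 18, 25, 17, 20, 21, 19, 23, 16, 19)
n <- sum(counts)
# Claim: digits are generated uniformly
p0 <- rep(1/10, 10)
# Unrestricted MLE of the category probabilities
p_mle <- counts / n
# Log likelihood ratio (multinomial; the constant term cancels)
loglr <- sum(counts * log(p0)) - sum(counts * log(p_mle))
lr <- -2 * loglr
# df = 9 = (degrees of freedom in the MLE) - (degrees of freedom in the claim)
1 - pchisq(lr, df = 9)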