Jon's Dev Blog

Stats 3 - Hypothesis Testing I

February 15, 2021

Statistical Experiments

  1. A statistical experiment consists of the following data:
    1. A set $X_1, X_2, \ldots, X_n$ of iid random variables with a common (unknown) distribution $\mathbb{P}$.
    2. A (parametric, identifiable) statistical model $(E, \{\mathbb{P}_\theta : \theta \in \Theta\})$ for $\mathbb{P}$ which is well-specified (i.e., there exists $\theta^\ast \in \Theta$ such that $\mathbb{P} = \mathbb{P}_{\theta^\ast}$).
    3. A partition of $\Theta$ into disjoint sets $\Theta_0$ and $\Theta_1$, which represent the null and alternative hypotheses, respectively.
  2. Note that $\theta^\ast$ is fixed (i.e., non-random). The purpose of a statistical experiment is not to determine its location, but rather to reject the assertion that it lies in $\Theta_0$.
  3. Given an observation $X_1, X_2, \ldots, X_n$, we formulate the null and alternative hypotheses as assertions about $\theta$: we write $H_0 : \theta \in \Theta_0$ and $H_1 : \theta \in \Theta_1$.

Tests and Errors

  1. A hypothesis test is a function $\psi_n : E^n \to \{0,1\}$.

  2. The Type I Error associated to $\psi_n$ is the function

    $$\alpha_{\psi_n}(\theta) = \mathbb{P}_\theta(\psi_n(X_1, X_2, \ldots, X_n) = 1), \qquad \theta \in \Theta_0.$$

    This represents the probability of rejecting the null hypothesis, given that it is true. Note that in most examples, $\Theta_0$ is a singleton (in which case $\alpha_{\psi_n}$ is a single number), but this is not a guarantee.

  3. The Type II Error associated to $\psi_n$ is the function

    $$\beta_{\psi_n}(\theta) = \mathbb{P}_\theta(\psi_n(X_1, X_2, \ldots, X_n) = 0), \qquad \theta \in \Theta_1.$$

    This likewise represents the probability of failing to reject the null hypothesis, given that it is false.

  4. The Power of a statistical test $\psi_n$ is

    $$\pi_{\psi_n} = \inf_{\theta \in \Theta_1} \bigl(1 - \beta_{\psi_n}(\theta)\bigr),$$

    i.e., the worst-case probability, over the alternative, of correctly rejecting the null hypothesis. (It is the infimum over $\Theta_1$, not simply "1 minus the type II error," since $\beta_{\psi_n}$ is a function of $\theta$.)
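To make these definitions concrete, here is a quick Monte Carlo sketch (the function names and parameter choices are my own, not from the lectures): for a Bernoulli model with $H_0 : \theta = 1/2$, we estimate the type I error at the null and the power at a single alternative $\theta = 0.7$ by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rejects(sample, q=1.96):
    # Two-sided test of H0: theta = 1/2 using the asymptotic statistic
    # T_n = 2*sqrt(n)*|X_bar - 1/2| and the N(0,1) quantile q for a 5% level.
    n = len(sample)
    t_n = 2 * np.sqrt(n) * abs(sample.mean() - 0.5)
    return t_n > q

def rejection_rate(theta, n=100, trials=20_000):
    # Monte Carlo estimate of P_theta(psi_n = 1).
    return np.mean([rejects(rng.binomial(1, theta, size=n)) for _ in range(trials)])

type_one = rejection_rate(0.5)     # alpha_psi(1/2): rejecting when H0 is true
power_at_07 = rejection_rate(0.7)  # 1 - beta_psi(0.7): rejecting when H1 holds
print(type_one, power_at_07)
```

At $n = 100$ the estimated type I error hovers near (slightly above) the nominal 5%, while the power at $\theta = 0.7$ is close to 1.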

The Level of a Test

  1. An important point about hypothesis testing: while we always want to strike a balance between type I and type II errors, we will usually specify a Level for our test, which represents the amount of type I error we are willing to tolerate. So in this sense, we will generally favor minimizing type II error, subject to a given level. The level of a test is

    $$\alpha := \sup\{\alpha_{\psi_n}(\theta) \mid \theta \in \Theta_0\}.$$

    We will often say a test rejects at level $\alpha$.

  2. Often, we will not understand the distribution of our test statistic directly, but only its asymptotic distribution (i.e., as the sample size tends to infinity). In this case, we will specify an Asymptotic Level for the test, also denoted

    $$\alpha := \limsup_{n\to\infty} \sup\{\alpha_{\psi_n}(\theta) \mid \theta \in \Theta_0\}.$$
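For instance, assuming the test statistic is asymptotically standard normal under the null (as it is in the examples below), the rejection threshold achieving a given asymptotic level comes straight from the normal quantile function; a minimal sketch with scipy:

```python
from scipy.stats import norm

# For a two-sided test with statistic asymptotically N(0,1) under H0,
# rejecting when |T_n| exceeds q_{alpha/2} gives asymptotic level alpha.
alpha = 0.05
q = norm.ppf(1 - alpha / 2)  # the quantile q_{alpha/2}, roughly 1.96

# Sanity check: the two-sided N(0,1) tail beyond q is exactly alpha.
level = 2 * norm.sf(q)
print(q, level)
```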

The $p$-value of a Test

  1. In general, a test has the form

    $$\psi_{n,\alpha} = \mathbf{1}\{T_n > C_\alpha\},$$

    where $T_n$ is called the test statistic, which in our current examples will take the form

    $$T_n = \sqrt{n}\,(f(\overline{X}_n) - \theta_0),$$

    where $f(\overline{X}_n)$ is an estimator for $\theta_0$, and $T_n$ converges in distribution to something familiar (so far in our examples, this is usually $N(0,\sigma^2)$). What this means is that by choosing $C_\alpha$, we can decide the asymptotic level $\alpha$. Thus, most of the work will be in finding the test statistic $T_n$.

  2. In most examples, the null hypothesis is a singleton $\{\theta_0\}$ (for two-sided tests) or a half-interval such as $[\theta_0, \infty)$ or $(-\infty, \theta_0]$ (for one-sided tests). However, this does not generalize well to more abstract null hypotheses.

  3. The Asymptotic $p$-value of a test $\psi_{n,\alpha}$ is the smallest asymptotic level $\alpha$ at which $\psi_{n,\alpha}$ rejects $H_0$, given an observation $X_1, X_2, \ldots, X_n$. Equivalently, this is the probability (under the nearest point of the null hypothesis) of an event at least as extreme as the observation.

  4. I should note that I have added some of my own interpretation here. The lectures do not make explicit that the $p$-value is meaningless without an observation in hand, but this is the only way I can make sense of the definition: everything else here makes sense as a function on the observation space or the parameter space.

  5. In most straightforward statistical tests, we don't usually use asymptotic levels or $p$-values. Instead, the exact distribution of $T_n$ will be known, e.g., a $t$-distribution or $\chi^2$-distribution.
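As a sketch of the distinction in the last item (the observed statistic and sample size here are made up for illustration), the same observed value yields slightly different two-sided $p$-values depending on whether we use the asymptotic normal distribution or an exact $t$-distribution:

```python
from scipy.stats import norm, t

t_obs = 2.1  # hypothetical observed value of the test statistic
n = 30       # hypothetical sample size

# Asymptotic two-sided p-value, treating T_n as N(0,1) under H0:
p_asym = 2 * norm.sf(abs(t_obs))

# Exact two-sided p-value when T_n is known to follow a t-distribution
# with n - 1 degrees of freedom under H0:
p_exact = 2 * t.sf(abs(t_obs), df=n - 1)

print(p_asym, p_exact)
```

The exact $t$-based $p$-value is a bit larger at small $n$ (heavier tails); the two agree as $n \to \infty$.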

An Example

Recall the kiss example: we observe $n$ people kissing and record whether they turn their heads to the right or left. We model these observations with a Bernoulli distribution, with null hypothesis $\Theta_0 = \{1/2\}$ and alternative hypothesis $\Theta_1 = (0,1/2) \cup (1/2,1)$. Let

$$T_n = \sqrt{n}\cdot\frac{\lvert\overline{X}_n - 1/2\rvert}{\sqrt{(1/2)(1-(1/2))}} = 2\sqrt{n}\,\lvert\overline{X}_n - 1/2\rvert$$

and let

$$\psi_{n,\alpha} = \mathbf{1}\{T_n > q_{\alpha/2}\}.$$

Then by the Central Limit Theorem (assuming the null hypothesis), $\psi_{n,\alpha}$ has asymptotic level $\alpha$:

$$\text{Level} := \lim_{n\to\infty}\sup_{p\in\Theta_0}\alpha_{\psi_{n,\alpha}}(p) = \lim_{n\to\infty}\mathbb{P}_{1/2}(\psi_{n,\alpha} = 1) = \alpha,$$

where the last equality is the CLT.

Suppose $n = 124$ and $\overline{X}_n = 0.645$. Then $T_n = 3.229$. Let $\bar{\alpha}$ be the value such that $q_{\bar{\alpha}/2} = T_n$, so that $\psi_{n,\bar{\alpha}}$ has asymptotic level $\bar{\alpha}$. Then $\bar{\alpha}$ is the $p$-value of this test. Since $T_n \to \lvert N(0,1)\rvert$ in distribution under the null, we can compute the $p$-value as $\bar{\alpha} = 2(1 - \Phi(3.229)) \approx 1.2\times 10^{-3}$.
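The arithmetic can be checked numerically with scipy's normal tail function:

```python
import numpy as np
from scipy.stats import norm

n, x_bar = 124, 0.645
t_n = 2 * np.sqrt(n) * abs(x_bar - 0.5)  # the kiss-example statistic T_n
p_value = 2 * norm.sf(t_n)               # two-sided N(0,1) tail beyond T_n
print(t_n, p_value)
```

This reproduces $T_n \approx 3.229$ and a $p$-value of about $1.2 \times 10^{-3}$, far below conventional levels such as $\alpha = 0.05$, so the test rejects $H_0$.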



Written by Jon Lamar: Machine learning engineer, former aspiring mathematician, cyclist, family person.