Jon's Dev Blog

Stats 4 - Hypothesis Testing II

February 16, 2021

The other hypothesis testing doc covered some extreme basics, and left a lot of questions unanswered (but what about the geometry of error!?). While I probably won't ever have time to explore some of those concerns, this doc will cover some more intermediate topics in hypothesis testing.


A Motivating Example

Suppose we are performing clinical trials, and want to measure the effectiveness of a drug over placebo at reducing cholesterol. Let $X_1, \ldots, X_n$ be the measured drops in cholesterol for the test group and let $Y_1, \ldots, Y_m$ be the measured drops for the control group. Then $X_1, \ldots, X_n$ are iid $N(\Delta_d, \sigma_d^2)$ and $Y_1, \ldots, Y_m$ are iid $N(\Delta_c, \sigma_c^2)$, and our null hypothesis is that $\Delta_d \geq \Delta_c$. Using Slutsky's Lemma and the CLT, we have

$$\frac{\overline{X_n} - \overline{Y_m} - (\Delta_d - \Delta_c)}{\sqrt{\frac{\hat{\sigma}^2_d}{n} + \frac{\hat{\sigma}^2_c}{m}}} \overset{(d)}{\to} N(0,1).$$

However, this requires both sample sizes to go to infinity, and convergence will be extremely slow if they do not do so at the same rate. Since we rarely have a linear relationship between control and test group sizes (for ethical reasons), the asymptotic hypothesis tests of the previous document will not be very useful.
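As a quick numerical sketch of the statistic above (assuming NumPy is available; the group sizes and distribution parameters below are made up for illustration):

```python
import numpy as np

def two_sample_z(x, y):
    """Asymptotic two-sample statistic: difference of sample means over its
    estimated standard error; approximately N(0, 1) for large n and m."""
    n, m = len(x), len(y)
    se = np.sqrt(x.var(ddof=1) / n + y.var(ddof=1) / m)
    return (x.mean() - y.mean()) / se

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)   # hypothetical test group
y = rng.normal(loc=1.0, scale=1.5, size=400)   # hypothetical control group
z = two_sample_z(x, y)                          # compare against N(0,1) quantiles
```

Here both groups have the same true mean, so $z$ should look like a draw from $N(0,1)$.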

Small Sample Sizes

Essentially, the above limit was a statement of the form...

The $\chi^2$ and $t$ Distributions

Set the following notation. Let

$$S_n = \frac{1}{n}\sum_{i=1}^n (X_i - \overline{X_n})^2$$

be the sample variance of a sample of $n$ independent Gaussian random variables with variance $\sigma^2$, and let

$$\tilde{S}_n = \frac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X_n})^2 = \frac{n}{n-1} S_n$$

be the unbiased estimator of the variance.

The $\chi^2$ Distributions

The $\chi^2$ distribution with $d$ degrees of freedom, written $\chi^2_d$, is the distribution of the sum of the squares of $d$ independent samples from $N(0,1)$. Equivalently (and more geometrically), if $Z \sim N(0, I_d)$ is a $d$-dimensional Gaussian random variable with unit variance, then $\lVert Z \rVert_2^2 \sim \chi^2_d$. This provides a geometric reason why the mean increases with $d$: the average squared magnitude of a Gaussian random vector will increase with the number of dimensions (more nonnegative numbers to sum).

Let $V \sim \chi^2_d$. Then

  • $\mathbb{E}[V] = d$ and $Var(V) = 2d$.
  • For large $d$, we have by the CLT that $\chi^2_d \approx N(d, 2d)$.
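These moments are easy to sanity-check by simulation, using the geometric characterization of $\chi^2_d$ as a squared norm (a sketch assuming NumPy; $d = 50$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(42)
d = 50  # degrees of freedom (arbitrary)

# chi^2_d as the squared 2-norm of a d-dimensional standard Gaussian
samples = (rng.standard_normal((100_000, d)) ** 2).sum(axis=1)

mean, var = samples.mean(), samples.var()
# Expect mean close to d and variance close to 2d
```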

Cochran's Theorem

Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$. Then:

  1. $\overline{X_n}$ is independent of $S_n$.
  2. $\frac{n S_n}{\sigma^2} \sim \chi^2_{n-1}$.

The Student's $t$-Distribution

Let $Z$ be standard normal, let $V \sim \chi^2_d$, and assume $Z$ and $V$ are independent. Then the random variable

$$t = \frac{Z}{\sqrt{V/d}}$$

has as its distribution the Student's $t$ distribution with $d$ degrees of freedom.
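This construction can be verified directly by building $t$ samples from their definition and comparing quantiles against SciPy's `t` distribution (a sketch assuming NumPy and SciPy; $d = 5$ is arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
d = 5
z = rng.standard_normal(200_000)    # Z ~ N(0, 1)
v = rng.chisquare(d, size=200_000)  # V ~ chi^2_d, independent of Z
t_samples = z / np.sqrt(v / d)      # t with d degrees of freedom

# Empirical 95th percentile vs. the theoretical t_d quantile
q_emp = np.quantile(t_samples, 0.95)
q_theory = stats.t.ppf(0.95, df=d)
```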

The Student's $t$ Test

This is the first nonasymptotic test we see in this class. One important thing to note is that when we do nonasymptotic hypothesis testing, we cannot escape the fact that we don't know our distribution. This means we always place an assumption on the underlying distribution of the sample. In the case of the Student's $t$-test (note the "Student" in "Student's $t$ test" is there not only for historical reasons, but also to distinguish it from Welch's $t$-test), we assume that our data are Gaussian.

The One Sample Student's $t$-Test

  1. Assume $X_1, \ldots, X_n$ are iid Gaussian $N(\mu, \sigma^2)$.

  2. The null hypothesis is that $\mu = 0$, and the alternate hypothesis is either that $\mu \neq 0$ (for the two-sided test) or $\mu > 0$ (for the one-sided test).

  3. The test statistic is

    $$T_n := \frac{\overline{X_n}}{\sqrt{\tilde{S}_n/n}}.$$

Note that under the null (writing $\mu_0 = 0$), we have

$$T_n = \frac{\sqrt{n}\,\frac{\overline{X_n} - \mu_0}{\sigma}}{\sqrt{\tilde{S}_n/\sigma^2}} \sim \frac{Z}{\sqrt{V}},$$

where $Z \sim N(0,1)$ and $V \sim \chi^2_{n-1}/(n-1)$ are independent (by Cochran's Theorem). Thus, the test statistic follows a $t$ distribution with $n-1$ degrees of freedom, and therefore its quantiles are known.
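The one sample test is short enough to compute by hand and check against SciPy's implementation (a sketch assuming NumPy and SciPy; the sample below is made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(0.5, 1.0, size=20)   # small Gaussian sample (made-up parameters)

n = len(x)
t_n = x.mean() / np.sqrt(x.var(ddof=1) / n)      # T_n, using unbiased variance
p_two_sided = 2 * stats.t.sf(abs(t_n), df=n - 1) # two-sided p-value from t_{n-1}

# Should agree with SciPy's implementation
t_ref, p_ref = stats.ttest_1samp(x, popmean=0.0)
```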

The Two Sample Welch's $t$-Test

Returning to our cholesterol example, we can consider the scenario of testing for the difference of means of two samples using a $t$ distribution. As our null hypothesis is that $\Delta_d \geq \Delta_c$, or equivalently that $\Delta_d - \Delta_c \geq 0$, we have a test statistic of the form

$$T_{m,n} = \frac{\overline{X_n} - \overline{Y_m}}{\sqrt{\frac{\hat{\sigma}^2_d}{n} + \frac{\hat{\sigma}^2_c}{m}}}.$$

Thus, this is a one-sided example of Welch's $t$-test. This is in contrast to the Student's $t$-test, where the test statistic follows a $t$ distribution exactly. In this case, the test statistic is only approximately $t$-distributed, essentially because the denominator involves something that is approximately (and very nearly so) a $\chi^2$ distribution.

Theorem (Welch-Satterthwaite): We have $T_{m,n} \approx t_N$, where

$$N = \frac{(\hat{\sigma}^2_d/n + \hat{\sigma}^2_c/m)^2}{\frac{\hat{\sigma}^4_d}{n^2(n-1)} + \frac{\hat{\sigma}^4_c}{m^2(m-1)}} \geq \min(n,m) - 1.$$
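The Welch-Satterthwaite degrees of freedom can be computed directly from the formula, and the resulting test agrees with SciPy's `ttest_ind` with `equal_var=False` (a sketch assuming NumPy and SciPy; group sizes and variances are made up):

```python
import numpy as np
from scipy import stats

def welch_df(x, y):
    """Welch-Satterthwaite degrees of freedom N for two samples."""
    vx, vy = x.var(ddof=1) / len(x), y.var(ddof=1) / len(y)
    return (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))

rng = np.random.default_rng(5)
x = rng.normal(1.0, 2.0, size=30)   # unequal variances and sizes on purpose
y = rng.normal(0.0, 1.0, size=50)

N = welch_df(x, y)
t_stat, p_val = stats.ttest_ind(x, y, equal_var=False)  # Welch's t-test
```

Note that $N$ is generally not an integer, which is fine: the $t$ distribution is defined for real degrees of freedom.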

Remark: If the variances are known to be equal, the test statistic becomes exactly $t$-distributed, hence the test becomes a two sample Student's $t$-test.

Tests based on MLEs

Briefly, these are some other tests.

Wald's Test

Consider an iid sample $X_1, \ldots, X_n$ with statistical model $(E, \{\mathbb{P}_\theta\}_{\theta \in \Theta})$, where $\Theta \subseteq \mathbb{R}^d$, and let $\theta_0$ be fixed and given. Let $\theta^\ast$ be the true parameter under the model. Consider the null hypothesis $H_0: \theta^\ast = \theta_0$ and let $\hat{\theta}_n$ be the MLE.

If $H_0$ is true, then by the asymptotic normality of the MLE, we have

$$\sqrt{n}\, I(\theta_0)^{\frac{1}{2}} \big(\hat{\theta}_n - \theta_0\big) \overset{(d)}{\to} N(0, I_d).$$

Hence, by plugging the MLE into the Fisher information, we have a test statistic $T_n$ such that

$$\underbrace{n \big(\hat{\theta}_n - \theta_0\big)^T I(\hat{\theta}_n) \big(\hat{\theta}_n - \theta_0\big)}_{T_n} \overset{(d)}{\to} \chi^2_d.$$

Definition: Wald's Test is any test (one or two sided) based on the above test statistic.
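As a concrete one-dimensional sketch (assuming NumPy and SciPy), consider a Bernoulli$(p)$ sample with null $p_0 = 1/2$; the MLE is the sample mean and the Fisher information is $I(p) = 1/(p(1-p))$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.binomial(1, 0.5, size=1000)    # Bernoulli sample; here the null is true

p_hat = x.mean()                        # MLE of p
p0 = 0.5
fisher = 1.0 / (p_hat * (1 - p_hat))    # I(p_hat): plug the MLE into I

# Wald statistic: n (p_hat - p0) I(p_hat) (p_hat - p0), approx chi^2_1 under H0
T_n = len(x) * (p_hat - p0) ** 2 * fisher
p_value = stats.chi2.sf(T_n, df=1)
```

In one dimension the Wald statistic is just the square of the usual $z$ statistic with the plug-in standard error.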

Wald's Test For Implicit Hypotheses

Similar to the above, suppose our null hypothesis is of the form $H_0: g(\theta) = 0$ for some continuously differentiable function $g: \mathbb{R}^d \to \mathbb{R}^k$ (with $k < d$). Suppose an asymptotically normal estimator $\hat{\theta}_n$ is available with asymptotic covariance $\Sigma(\theta) \in \mathbb{R}^{d \times d}$. Let

$$\Gamma(\theta) = \nabla g(\theta)^T \Sigma(\theta) \nabla g(\theta) \in \mathbb{R}^{k \times k}.$$

Then by the Delta method, we have

$$\sqrt{n}\, \Gamma(\theta)^{-\frac{1}{2}} \Big(g(\hat{\theta}_n) - g(\theta)\Big) \overset{(d)}{\to} N(0, I_k).$$

By Slutsky's Theorem, we can plug $\hat{\theta}_n$ into $\Gamma$, hence we have a test statistic $T_n$ of the form

$$\underbrace{n\, g(\hat{\theta}_n)^T \Gamma^{-1}(\hat{\theta}_n)\, g(\hat{\theta}_n)}_{T_n} \overset{(d)}{\to} \chi^2_k.$$

Definition: Wald's Test for Implicit Hypotheses is any test (one or two sided) based on the above test statistic for some function $g$.
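As a sketch (assuming NumPy and SciPy; the data-generating parameters are made up), take $\hat{\theta}_n$ to be the mean of bivariate data and test $H_0: g(\theta) = \theta_1 - \theta_2 = 0$, so $\nabla g = (1, -1)^T$ and $k = 1$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
n = 2000
# Bivariate data whose two component means are equal, so H0 holds
x = rng.multivariate_normal([1.0, 1.0], [[1.0, 0.3], [0.3, 2.0]], size=n)

theta_hat = x.mean(axis=0)              # asymptotically normal estimator
sigma_hat = np.cov(x, rowvar=False)     # plug-in asymptotic covariance Sigma
grad_g = np.array([1.0, -1.0])          # gradient of g(theta) = theta_1 - theta_2
gamma_hat = grad_g @ sigma_hat @ grad_g # Gamma(theta_hat); scalar since k = 1

g_val = theta_hat[0] - theta_hat[1]
T_n = n * g_val * (1.0 / gamma_hat) * g_val   # approx chi^2_1 under H0
p_value = stats.chi2.sf(T_n, df=1)
```

Because the components are correlated, the off-diagonal entries of $\Sigma$ matter: $\Gamma$ here is $\Sigma_{11} + \Sigma_{22} - 2\Sigma_{12}$, not just the sum of the variances.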


Written by Jon Lamar: Machine learning engineer, former aspiring mathematician, cyclist, family person.