The other hypothesis testing doc covered some extreme basics, and left a lot of questions unanswered (but what about the geometry of error!?). While I probably won't ever have time to explore some of those concerns, this doc will cover some more intermediate topics in hypothesis testing.
$t$-Tests
A Motivating Example
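Suppose we are performing clinical trials, and want to measure the effectiveness of a drug over placebo at reducing cholesterol. Let $X_1, \ldots, X_n$ be the measured drops in cholesterol in the test group and let $Y_1,\ldots, Y_m$ be the measured drops in cholesterol in the control group. Assume $X_1,\ldots,X_n$ are iid $N(\Delta_d, \sigma_d^2)$ and $Y_1,\ldots,Y_m$ are iid $N(\Delta_c, \sigma_c^2)$, with the two samples independent, and take as our null hypothesis that $\Delta_d \geq \Delta_c$. Using Slutsky's Lemma and the CLT, we have
$\frac{\overline{X_n} - \overline{Y_m} - (\Delta_d - \Delta_c)}{\sqrt{\frac{\hat{\sigma}_d^2}{n} + \frac{\hat{\sigma}_c^2}{m}}} \xrightarrow[n,m\to\infty]{(d)} N(0,1),$
where $\hat{\sigma}_d^2$ and $\hat{\sigma}_c^2$ are the sample variances of the two groups.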
However, this requires both sample sizes to go to infinity, and convergence will be extremely slow if they do not do so at the same rate. Since the control and test group sizes are rarely proportional to one another (for ethical reasons), the asymptotic hypothesis tests of the previous document will not be very useful here.
Small Sample Sizes
Essentially, the above limit was a statement of the form...
The $\chi^2$ and $t$ Distributions
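Set the following notation. Let
$S_n := \frac{1}{n}\sum_{i=1}^n \left(X_i - \overline{X_n}\right)^2$
be the sample variance of a sample $X_1,\ldots,X_n$ of $n$ independent Gaussian random variables with variance $\sigma^2$, and let
$\tilde{S}_n := \frac{1}{n-1}\sum_{i=1}^n \left(X_i - \overline{X_n}\right)^2 = \frac{n}{n-1}S_n$
be the unbiased estimator of the variance $\sigma^2$.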
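As a quick numerical check of the two estimators, here is a minimal sketch. It assumes $S_n$ is the version that divides by $n$ and $\tilde{S}_n$ the version that divides by $n-1$, which is what Cochran's theorem and the $t$ statistic later in this doc require. The data values are made up for illustration.

```python
import statistics

# Hypothetical sample; the numbers are for illustration only.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)

s_n = statistics.pvariance(x)       # divides by n
s_tilde_n = statistics.variance(x)  # divides by n - 1 (unbiased)

# The two estimators differ only by the factor n / (n - 1).
print(s_n, s_tilde_n, s_n * n / (n - 1))
```

The standard library already distinguishes the two conventions, so no hand-rolled sums are needed.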
The $\chi^2$ Distributions
The $\chi^2$ distribution with $d$ degrees of freedom, written $\chi^2_d$, is the distribution of the sum of the squares of $d$ independent samples from $N(0,1)$. Equivalently (and more geometrically), if $Z \sim N(0, I_d)$ is a $d$-dimensional Gaussian random vector with identity covariance, then $\lVert Z\rVert_2^2\sim \chi^2_d$. This provides a geometric reason why the mean increases with $d$: the average squared magnitude of a Gaussian random vector grows with the number of dimensions (more nonnegative terms to sum).
Let $V \sim \chi^2_d$. Then
 $\mathbb{E}[V] = d$; $Var(V) = 2d$. For large $d$, we have by CLT that
 $\chi^2_d\approx N(d, 2d)$.
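The mean and variance above can be checked by simulation. The sketch below builds $\chi^2_d$ draws directly from the definition (a sum of $d$ squared standard normals); the seed, the choice $d = 10$, and the number of draws are arbitrary assumptions.

```python
import random
import statistics

def chi2_draw(d, rng):
    """One draw from chi^2_d as a sum of d squared N(0,1) samples."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))

rng = random.Random(0)  # fixed seed for reproducibility
d = 10
draws = [chi2_draw(d, rng) for _ in range(20000)]

mean_v = statistics.fmean(draws)    # should be close to d = 10
var_v = statistics.variance(draws)  # should be close to 2d = 20
print(mean_v, var_v)
```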
Cochran's Theorem
Theorem: For $X_1,\ldots,X_n$ iid $N(\mu,\sigma^2)$,
 $\overline{X_n}$ is independent of $S_n$.
 $\frac{nS_n}{\sigma^2} \sim\chi^2_{n-1}$
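A simulation can make the theorem plausible, though it proves nothing. The sketch below assumes $S_n$ is the divide-by-$n$ sample variance and uses arbitrary values for $n$, $\mu$, and $\sigma$. It checks that $nS_n/\sigma^2$ has the mean and variance of $\chi^2_{n-1}$, and that the sample mean and $S_n$ are empirically uncorrelated (a necessary condition for independence, not a sufficient one).

```python
import random
import statistics

rng = random.Random(1)
n, mu, sigma = 5, 1.0, 2.0  # hypothetical values for the simulation
reps = 20000

scaled = []  # n * S_n / sigma^2 for each simulated sample
means = []   # sample mean for each simulated sample
for _ in range(reps):
    x = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(x))
    scaled.append(n * statistics.pvariance(x) / sigma**2)

# chi^2_{n-1} has mean n - 1 = 4 and variance 2(n - 1) = 8.
mean_scaled = statistics.fmean(scaled)
var_scaled = statistics.variance(scaled)

# Empirical covariance between the sample mean and the scaled variance.
m_bar = statistics.fmean(means)
cov = sum((a - m_bar) * (b - mean_scaled)
          for a, b in zip(means, scaled)) / (reps - 1)
print(mean_scaled, var_scaled, cov)  # cov should be close to 0
```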
The Student's $t$-Distribution
Let $Z$ be standard normal, let $V$ be $\chi^2_d$, and assume $Z$ and $V$ are independent. Then the random variable
$T := \frac{Z}{\sqrt{V/d}}$
has as its distribution the Student's $t$ distribution with $d$ degrees of freedom.
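The construction $Z/\sqrt{V/d}$ can be simulated directly. The sketch below (seed, $d = 10$, and sample count are arbitrary assumptions) estimates how often $|T|$ exceeds $2.228$, the tabulated 97.5% quantile of $t_{10}$. The fraction should be near $0.05$. The corresponding standard normal quantile is $1.96$, which shows the heavier tails of the $t$ distribution.

```python
import math
import random

def t_draw(d, rng):
    """One draw of Z / sqrt(V / d), with Z ~ N(0,1) and V ~ chi^2_d independent."""
    z = rng.gauss(0.0, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
    return z / math.sqrt(v / d)

rng = random.Random(2)  # fixed seed for reproducibility
d = 10
draws = [t_draw(d, rng) for _ in range(50000)]

# Fraction of draws beyond the tabulated 97.5% quantile of t_10.
tail = sum(abs(t) > 2.228 for t in draws) / len(draws)
print(tail)  # should be close to 0.05
```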
The Student's $t$ Test
This is the first non-asymptotic test we see in this class. One important thing to note is that when we do non-asymptotic hypothesis testing, we cannot escape the fact that we don't know our distribution. This means we always place an assumption on the underlying distribution of the sample. In the case of the Student's $t$-test (note the "Student" in "Student's $t$-test" is there not only for historical reasons, but also to distinguish from Welch's $t$-test), we assume that our data are Gaussian.
The One Sample Student's $t$-Test

Assume $X_1,\ldots,X_n$ are iid Gaussian $N(\mu, \sigma^2)$.

The null hypothesis is that $\mu=0$, and the alternative hypothesis is either that $\mu\neq 0$ (for the two-sided test), or $\mu > 0$ (for the one-sided test).

The test statistic is
$T_n := \frac{\overline{X_n}}{\sqrt{\tilde{S}_n/n}}.$
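Note that under the null, we have
$T_n = \frac{\sqrt{n}\,\overline{X_n}/\sigma}{\sqrt{\tilde{S}_n/\sigma^2}} = \frac{Z}{\sqrt{V}},$
where $Z\sim N(0,1)$ and $V\sim\chi^2_{n-1}/(n-1)$ are independent (by Cochran's Theorem). Thus, the test statistic follows a $t$ distribution with $n-1$ degrees of freedom and therefore its quantiles are known.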
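A minimal sketch of the one sample test, assuming $\tilde{S}_n$ is the unbiased (divide by $n-1$) sample variance. The function name and the data are hypothetical. The critical value $2.262$ is the tabulated 97.5% quantile of $t_9$, which corresponds to a two-sided test at level 5% with $n = 10$.

```python
import math
import statistics

def one_sample_t(x):
    """Return (T_n, degrees of freedom) for H0: mu = 0."""
    n = len(x)
    s_tilde = statistics.variance(x)  # unbiased sample variance
    t_n = statistics.fmean(x) / math.sqrt(s_tilde / n)
    return t_n, n - 1

# Hypothetical measurements, for illustration only.
x = [0.8, -0.3, 1.2, 0.5, 0.9, -0.1, 1.4, 0.6, 0.2, 1.0]
t_n, df = one_sample_t(x)

# Two-sided test at level 5%: reject when |T_n| exceeds the t_9 quantile.
reject = abs(t_n) > 2.262
print(t_n, df, reject)
```

For other sample sizes the critical value has to be looked up in a $t$ table or computed with a statistics library, since the quantile depends on $n-1$.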
The Two Sample Welch's $t$-Test
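Returning to our cholesterol example, we can consider the scenario of testing for the difference of means of two samples using a $t$ distribution. As our null hypothesis is that $\Delta_d \geq \Delta_c$, or equivalently that $\Delta_d - \Delta_c \geq 0$, we have a test statistic of the form
$T_{m,n} := \frac{\overline{X_n} - \overline{Y_m}}{\sqrt{\frac{\hat{\sigma}_d^2}{n} + \frac{\hat{\sigma}_c^2}{m}}},$
where $\hat{\sigma}_d^2$ and $\hat{\sigma}_c^2$ are the unbiased sample variances of the test and control groups.
Thus, this is a one-sided example of the Welch $t$-test. This is opposed to the Student's $t$-test, where the test statistic follows a $t$ distribution exactly. Here the test statistic only approximately follows a $t$ distribution, because the denominator involves a quantity whose distribution is approximately (and very nearly) a scaled $\chi^2$ distribution.
Theorem (Welch-Satterthwaite): We have $T_{m,n} \approx t_N$, where
$N = \frac{\left(\frac{\hat{\sigma}_d^2}{n} + \frac{\hat{\sigma}_c^2}{m}\right)^2}{\frac{\hat{\sigma}_d^4}{n^2(n-1)} + \frac{\hat{\sigma}_c^4}{m^2(m-1)}}.$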
Remark: If the variances are known to be equal, the denominator can use a pooled variance estimate, and the test statistic then follows a $t$ distribution exactly (with $n+m-2$ degrees of freedom); the test becomes a two sample Student's $t$-test.
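The Welch statistic and the Welch-Satterthwaite degrees of freedom are easy to compute by hand. The sketch below assumes the standard form of both (difference of sample means over $\sqrt{\hat{\sigma}_d^2/n + \hat{\sigma}_c^2/m}$, with unbiased sample variances); the function name and the data are hypothetical.

```python
import math
import statistics

def welch(x, y):
    """Welch statistic and Welch-Satterthwaite degrees of freedom."""
    n, m = len(x), len(y)
    vx = statistics.variance(x) / n  # sigma_hat_d^2 / n
    vy = statistics.variance(y) / m  # sigma_hat_c^2 / m
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(vx + vy)
    dof = (vx + vy) ** 2 / (vx**2 / (n - 1) + vy**2 / (m - 1))
    return t, dof

# Hypothetical cholesterol drops, for illustration only.
drug = [12.0, 15.0, 9.0, 14.0, 11.0, 13.0]
placebo = [8.0, 10.0, 7.0, 9.0]
t_stat, dof = welch(drug, placebo)
print(t_stat, dof)
```

The degrees of freedom are generally not an integer; in practice they are either rounded down or passed as-is to a $t$ quantile routine. Which tail to reject in depends on the direction of the null hypothesis.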
Tests based on MLEs
Briefly, these are some other tests.
Wald's Test
Consider an iid sample $X_1, \ldots, X_n$ with statistical model $(E,\{\mathbb{P}_\theta\}_{\theta\in\Theta})$, where $\Theta\subseteq\mathbb{R}^d$ and let $\theta_0$ be fixed and given. Let $\theta^\ast$ be the true parameter under the model. Consider the null hypothesis $H_0: \theta^\ast = \theta_0$ and let $\hat{\theta}_n$ be the MLE.
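If $H_0$ is true, then by the asymptotic normality of the MLE (a CLT-type result that holds under the usual regularity conditions), we have
$\sqrt{n}\, I(\theta_0)^{1/2}\left(\hat{\theta}_n - \theta_0\right) \xrightarrow[n\to\infty]{(d)} N_d(0, I_d),$
where $I(\theta)$ denotes the Fisher information. Hence, by plugging in the MLE into the Fisher information, we have a test statistic $T_n$ such that
$T_n := n\left(\hat{\theta}_n - \theta_0\right)^\top I(\hat{\theta}_n)\left(\hat{\theta}_n - \theta_0\right) \xrightarrow[n\to\infty]{(d)} \chi^2_d.$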
Definition: Wald's Test is any test (one or two sided) based on the above test statistic.
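As a concrete instance, consider a Bernoulli($p$) model, which is an assumption made here for illustration and is not taken from the text above. The MLE is the sample mean and the Fisher information is $I(p) = 1/(p(1-p))$, so the Wald statistic is $n(\hat{p} - p_0)^2 / (\hat{p}(1-\hat{p}))$ and is compared against a $\chi^2_1$ quantile (the tabulated 95% quantile is $3.841$). The function name and data are hypothetical.

```python
def wald_bernoulli(x, p0):
    """Wald statistic n * (p_hat - p0)^2 * I(p_hat) for a Bernoulli(p) model.

    The MLE is the sample mean and I(p) = 1 / (p * (1 - p)).
    Undefined when p_hat is exactly 0 or 1 (division by zero).
    """
    n = len(x)
    p_hat = sum(x) / n
    return n * (p_hat - p0) ** 2 / (p_hat * (1 - p_hat))

# Hypothetical data: 60 successes out of 100 trials, testing H0: p = 0.5.
x = [1] * 60 + [0] * 40
t_wald = wald_bernoulli(x, 0.5)

# Reject at level 5% when the statistic exceeds the chi^2_1 quantile.
reject = t_wald > 3.841
print(t_wald, reject)
```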
Wald's Test For Implicit Hypotheses
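Similar to above, suppose our null hypothesis is of the form $H_0: g(\theta)=0$ for some continuously differentiable function $g:\mathbb{R}^d\to\mathbb{R}^k$ (with $k<d$). Suppose an asymptotically normal estimator $\hat{\theta}_n$ is available with asymptotic covariance $\Sigma(\theta) \in \mathbb{R}^{d\times d}$. Let
$\Gamma(\theta) := \nabla g(\theta)^\top\, \Sigma(\theta)\, \nabla g(\theta) \in \mathbb{R}^{k\times k},$
where $\nabla g(\theta)\in\mathbb{R}^{d\times k}$ is the gradient of $g$, and assume $\Gamma(\theta)$ is invertible.
Then by the Delta method, we have under $H_0$
$\sqrt{n}\,\Gamma(\theta)^{-1/2}\, g(\hat{\theta}_n) \xrightarrow[n\to\infty]{(d)} N_k(0, I_k).$
By Slutsky's Theorem, we can plug $\hat{\theta}_n$ into $\Gamma$, hence we have a test statistic $T_n$ of the form
$T_n := n\, g(\hat{\theta}_n)^\top\, \Gamma(\hat{\theta}_n)^{-1}\, g(\hat{\theta}_n) \xrightarrow[n\to\infty]{(d)} \chi^2_k.$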
Definition: Wald's Test for Implicit Hypotheses is any test (one or two sided) based on the above test statistic for some function $g$.
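A small worked instance, under assumptions that are not in the text above: observe iid pairs $(X_i, Y_i)$ with independent Bernoulli components, so $\theta = (p_1, p_2)$ with $d = 2$, and test $g(\theta) = p_1 - p_2 = 0$ with $k = 1$. Then $\Sigma(\theta)$ is diagonal with entries $p_j(1-p_j)$, the gradient of $g$ is $(1, -1)^\top$, and the statistic is compared against a $\chi^2_1$ quantile (tabulated 95% value $3.841$). The function name and data are hypothetical.

```python
def wald_equal_proportions(x, y):
    """Wald statistic for H0: p1 - p2 = 0 from n paired Bernoulli observations."""
    n = len(x)
    if len(y) != n:
        raise ValueError("this sketch assumes equal sample sizes")
    p1, p2 = sum(x) / n, sum(y) / n
    grad = (1.0, -1.0)                           # gradient of g(p1, p2) = p1 - p2
    sigma_diag = (p1 * (1 - p1), p2 * (1 - p2))  # diagonal of Sigma(theta_hat)
    # Gamma = grad^T Sigma grad, a scalar here because k = 1.
    gamma = sum(g * g * s for g, s in zip(grad, sigma_diag))
    return n * (p1 - p2) ** 2 / gamma

# Hypothetical data: 70/100 successes versus 50/100 successes.
x = [1] * 70 + [0] * 30
y = [1] * 50 + [0] * 50
t_implicit = wald_equal_proportions(x, y)
print(t_implicit, t_implicit > 3.841)
```

With $k > 1$ the scalar division becomes a solve against the $k \times k$ matrix $\Gamma(\hat{\theta}_n)$, which is where a linear algebra library would come in.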