Jon's Dev Blog

Stats 1 - Basic concepts

February 13, 2021

This begins a series of notes I took while studying for a stats class on edX in 2019. The notes are incomplete, so take everything with a grain of salt; I was simply trying to understand hypothesis testing from a theoretical point of view.

Modes of Convergence

Definitions

  1. The strongest mode of convergence is actually weaker than pointwise convergence. We say a sequence of random variables $X_1, X_2, \ldots$ converges almost surely to $L$ and write $X_n \overset{(a.s.)}{\to} L$ if

    $\mathbb{P}(X_n \to L) = 1,$

    i.e., the set of points at which the sequence does not converge to $L$ is negligible.

  2. Another concept of convergence is convergence in quadratic mean, which is simply $L^2$ convergence in the probability space.

  3. We say a sequence of random variables $X_1, X_2, \ldots$ converges in probability to $L$ and write $X_n \overset{(p)}{\to} L$ if for all $\epsilon > 0$,

    $\mathbb{P}(\lvert X_n - L \rvert > \epsilon) \to 0,$

    where the convergence is that of a sequence of real numbers. Both almost sure convergence and $L^p$ convergence imply convergence in probability (see the simulation sketch after this list).

  4. We say $X_n \to L$ in distribution if the sequence of CDFs $F_{X_n}$ converges pointwise to $F_L$ at every continuity point of $F_L$. Convergence in probability implies convergence in distribution. The converse is true when $L$ is a point mass.

  5. We briefly talk about uniform convergence later in the class, which is simply convergence in the $L^\infty$ space.
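
To make convergence in probability concrete, here is a minimal simulation sketch (in Python with NumPy; my own illustration, not from the course) that estimates $\mathbb{P}(\lvert \overline{X}_n - 1/2 \rvert > \epsilon)$ for sample means of Uniform(0, 1) draws. The estimates should shrink toward zero as $n$ grows, which is exactly the definition with $L = 1/2$.

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_deviation(n, eps=0.05, trials=10_000):
    """Monte Carlo estimate of P(|X_bar_n - 1/2| > eps), where
    X_bar_n is the mean of n iid Uniform(0, 1) draws."""
    samples = rng.uniform(0.0, 1.0, size=(trials, n))
    means = samples.mean(axis=1)
    return np.mean(np.abs(means - 0.5) > eps)

# The estimated probabilities should decrease toward 0 as n grows,
# illustrating convergence in probability to L = 1/2.
for n in [10, 100, 1_000, 10_000]:
    print(n, prob_deviation(n))
```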

Limits of Sequences of Random Variables

  1. Law of Large Numbers: Let $X_1, X_2, \ldots$ be an iid sequence of random variables with finite mean $\mu$. Then $\overline{X}_n \to \mu$ in probability; this is the weak law (due to Khinchin). There is also a strong law of large numbers, which asserts almost sure convergence under the same iid, finite-mean hypotheses.

  2. Central Limit Theorem: Let $X_1, X_2, \ldots$ be an iid sequence of random variables with finite mean $\mu$ and finite variance $\sigma^2$. Then

    $\sqrt{n}(\overline{X}_n - \mu) \to N(0, \sigma^2),$

    where the convergence is in distribution. It is important to note that the CLT holds for multivariate distributions as well, with the obvious generalization to a multivariate Gaussian limit (see the simulation sketch after this list).
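
As a quick sanity check on the CLT, here is a small simulation sketch (again NumPy plus SciPy, my own illustration rather than course material) that standardizes sample means of Exp(1) draws and compares empirical tail probabilities against the $N(0, 1)$ limit.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

n, trials = 500, 100_000
mu, sigma = 1.0, 1.0  # Exp(1) has mean 1 and variance 1

# Standardized sample means: sqrt(n) * (X_bar_n - mu) / sigma
samples = rng.exponential(scale=1.0, size=(trials, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma

# Empirical tail probabilities should be close to the Gaussian ones.
for t in [0.5, 1.0, 2.0]:
    print(t, np.mean(z > t), 1 - norm.cdf(t))
```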

The Delta Method

  1. The Delta Method: Let $X_1, X_2, \ldots$ be a sequence of $d$-dimensional random vectors and suppose

    $\sqrt{n}(\overline{X}_n - \mu) \overset{(d)}{\to} N(0, \Sigma).$

    Then for any continuously differentiable function $g: \mathbb{R}^d \to \mathbb{R}$, we have

    $\sqrt{n}(g(\overline{X}_n) - g(\mu)) \overset{(d)}{\to} N\big(0, (\nabla g(\mu))^T \Sigma \, \nabla g(\mu)\big).$
  2. The most obvious application of the Delta method is in parameter estimation for distributions whose parameter is something other than a simple mean. For example, for an exponential distribution with rate $\lambda$ (so $\mu = 1/\lambda$ and $\sigma^2 = 1/\lambda^2$), the estimator $\hat{\lambda} = 1/\overline{X}_n$ satisfies $\sqrt{n}(\hat{\lambda} - \lambda) \overset{(d)}{\to} N(0, \lambda^2)$, since $g(x) = 1/x$ gives asymptotic variance $g'(\mu)^2 \sigma^2 = \lambda^4 \cdot \lambda^{-2} = \lambda^2$ (see the sketch below). We will see the Delta method again when we derive the maximum likelihood estimator.
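
Here is a minimal simulation sketch (my own, not from the course) checking the exponential example above: the standardized estimator $\sqrt{n}(1/\overline{X}_n - \lambda)$ should have standard deviation close to $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(2)

lam, n, trials = 2.0, 2_000, 50_000

# Draws from Exp(rate=lam); NumPy parameterizes by scale = 1/rate.
samples = rng.exponential(scale=1.0 / lam, size=(trials, n))
lam_hat = 1.0 / samples.mean(axis=1)

# Delta method prediction: sqrt(n) * (lam_hat - lam) ~ N(0, lam^2).
z = np.sqrt(n) * (lam_hat - lam)
print("empirical sd:", z.std())  # should be close to lam = 2.0
```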



Written by Jon Lamar: Machine learning engineer, former aspiring mathematician, cyclist, family person.