
STAT 801 Lecture 8

Reading for Today's Lecture: ?

Goals of Today's Lecture:

Today's notes

Last time we used the Fourier inversion formula to prove the local central limit theorem:

Framework: $X_1,X_2,\ldots$ are iid with mean 0 and variance 1, $T=T_n= n^{-1/2}(X_1+\cdots+X_n) = n^{1/2}\bar{X}$, and $\phi(t)=E(e^{itX_1})$ is the characteristic function of a single $X_i$.

We concluded

\begin{displaymath}E(e^{itT}) = \left[\phi(n^{-1/2}t)\right]^n
\end{displaymath}

We differentiated $\phi$ to obtain
\begin{align*}\phi(0) & = 1
\\
\phi^\prime(0) & = i E(X_1) = 0
\\
\phi^{\prime\prime}(0) & = -E(X_1^2) = -1
\end{align*}

It now follows that
\begin{align*}E(e^{itT}) & \approx [1-t^2/(2n) + o(1/n)]^n
\\
& \to e^{-t^2/2}
\end{align*}

Apply the Fourier inversion formula to deduce

\begin{displaymath}f_T(x) \to \frac{1}{\sqrt{2\pi}} e^{-x^2/2}
\end{displaymath}

which is the density of a standard normal random variable.
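For completeness, the inversion step can be written out as follows (this is the same Fourier inversion formula used last time; justifying the interchange of limit and integral is the technical point):

\begin{displaymath}f_T(x) = \frac{1}{2\pi}\int e^{-itx} E(e^{itT})\, dt
\to \frac{1}{2\pi}\int e^{-itx} e^{-t^2/2}\, dt = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, .
\end{displaymath}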

This proof of the central limit theorem is not terribly general since it requires $T$ to have a bounded continuous density. The usual central limit theorem is a statement about cdfs, not densities; it asserts that

\begin{displaymath}P(T \le t) \to P(Z \le t)
\end{displaymath}

where $Z$ is a standard normal random variable.

Convergence in Distribution

In undergraduate courses we often teach the central limit theorem as follows: if $X_1,\ldots,X_n$ are iid from a population with mean $\mu$ and standard deviation $\sigma$ then $n^{1/2}(\bar{X}-\mu)/\sigma$ has approximately a normal distribution. We also say that a Binomial(n,p) random variable has approximately a N(np,np(1-p)) distribution.
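As a rough numerical illustration of the second statement: if X is Binomial(100,1/2) then np=50 and np(1-p)=25, so

\begin{displaymath}P(X \le 55) \approx P(N(50,25) \le 55) = P\left(Z \le \frac{55-50}{5}\right) = P(Z \le 1) \approx 0.84 \, ,
\end{displaymath}

which is close to the exact binomial probability (roughly 0.86; a continuity correction, using 55.5 in place of 55, comes much closer).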

To make precise sense of these assertions we need to assign a meaning to statements like ``X and Y have approximately the same distribution''. The meaning we want to give is that X and Y have nearly the same cdf, but even here we need some care. If n is a large number, is the N(0,1/n) distribution close to the distribution of $X\equiv 0$? Is it close to the N(1/n,1/n) distribution? Is it close to the $N(1/\sqrt{n},1/n)$ distribution? If $X_n\equiv 2^{-n}$, is the distribution of $X_n$ close to that of $X\equiv 0$?

The answer to these questions depends in part on how close is close enough, so it is partly a matter of definition. In practice the usual sort of approximation we want to make is to say that some random variable $X$, say, has nearly some continuous distribution, like N(0,1). In this case we want to calculate probabilities like $P(X>x)$ and know that this is nearly $P(N(0,1) > x)$. The real difficulty arises in the case of discrete random variables; in this course we will not actually need to approximate a distribution by a discrete distribution.

When mathematicians say two things are close together they mean one of two things: either there is an upper bound on the distance between the two things, or they are talking about taking a limit. In this course we do the latter.

Definition: A sequence of random variables $X_n$ converges in distribution to a random variable $X$ if

\begin{displaymath}E(g(X_n)) \to E(g(X))
\end{displaymath}

for every bounded continuous function g.
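For example, if $X_n \sim N(0,1/n)$ and $X\equiv 0$, write $X_n = Z/\sqrt{n}$ with $Z \sim N(0,1)$; for any bounded continuous $g$,

\begin{displaymath}E(g(X_n)) = E(g(Z/\sqrt{n})) \to g(0) = E(g(X))
\end{displaymath}

by dominated convergence ($g$ is bounded and $g(Z/\sqrt{n}) \to g(0)$ by continuity), so $X_n$ converges in distribution to the constant 0.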

Theorem: The following are equivalent:

1.
$X_n$ converges in distribution to $X$.
2.
$P(X_n \le x) \to P(X \le x)$ for each $x$ such that $P(X=x)=0$ (see the example following the theorem)
3.
The characteristic functions of the $X_n$ converge to the characteristic function of $X$:

\begin{displaymath}E(e^{itX_n}) \to E(e^{itX})
\end{displaymath}

for every real $t$.
These are all implied by

\begin{displaymath}M_{X_n}(t) \to M_X(t) < \infty
\end{displaymath}

for all $\vert t\vert \le \epsilon$ for some positive $\epsilon$.
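Here is an example showing why condition 2 excludes points with $P(X=x)>0$: take $X_n \equiv 2^{-n}$ (one of the questions above) and $X\equiv 0$. Then $E(g(X_n)) = g(2^{-n}) \to g(0) = E(g(X))$ for every bounded continuous $g$, so $X_n$ converges in distribution to $X$; but

\begin{displaymath}P(X_n \le 0) = 0 \not\to 1 = P(X \le 0) \, .
\end{displaymath}

The cdfs fail to converge only at $x=0$, which is precisely a point with $P(X=0)>0$, so condition 2 still holds.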

Now let's go back to the questions I asked. In each case the answer is yes in the limiting sense just defined: whether $X_n \sim N(0,1/n)$, $X_n \sim N(1/n,1/n)$, $X_n \sim N(1/\sqrt{n},1/n)$ or $X_n \equiv 2^{-n}$, the sequence $X_n$ converges in distribution to $X\equiv 0$, so all of these distributions are close to that of $X\equiv 0$ (and to each other). The answer is not very informative, however, because the limit is trivial.

Here is the message you are supposed to take away from this discussion. You do distributional approximations by showing that a sequence of random variables $X_n$ converges to some $X$. The limit distribution should be non-trivial, like say N(0,1). We don't say $X_n$ is approximately N(1/n,1/n) but that $n^{1/2} X_n$ converges to N(0,1) in distribution.
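To spell out the last sentence: if $X_n \sim N(1/n,1/n)$ then $n^{1/2} X_n \sim N(n^{-1/2},1)$, so that

\begin{displaymath}P(n^{1/2} X_n \le x) = P(Z \le x - n^{-1/2}) \to P(Z \le x)
\end{displaymath}

for every $x$; that is, $n^{1/2} X_n$ converges in distribution to N(0,1).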

The Central Limit Theorem

If $X_1, X_2, \cdots$ are iid with mean 0 and variance 1 then $n^{1/2}\bar{X}$ converges in distribution to N(0,1). That is,

\begin{displaymath}P(n^{1/2}\bar{X} \le x ) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2} dy
\, .
\end{displaymath}

Proof: As before

\begin{displaymath}E(e^{itn^{1/2}\bar{X}}) \to e^{-t^2/2}
\end{displaymath}

This is the characteristic function of a N(0,1) random variable so we are done by our theorem.

Edgeworth expansions

Suppose that X is a random variable with mean 0, variance 1 and $E(X^3)=\gamma$. If $\phi$ is the characteristic function of X, then

\begin{displaymath}\phi(t) \approx 1 -t^2/2 -i\gamma t^3/6 + \cdots
\end{displaymath}

keeping one more term. Then

\begin{displaymath}\log(\phi(t)) =\log(1+u)
\end{displaymath}

where

\begin{displaymath}u=-t^2/2 -i \gamma t^3/6 + \cdots
\end{displaymath}

Use $\log(1+u) = u-u^2/2 + \cdots$ to get
\begin{multline*}\log(\phi(t)) \approx
[-t^2/2 -i\gamma t^3/6 +\cdots]
\\
-[-t^2/2 -i\gamma t^3/6 +\cdots]^2/2 +\cdots
\end{multline*}
which rearranged is

\begin{displaymath}\log(\phi(t)) \approx -t^2/2 -i\gamma t^3/6 + \cdots
\end{displaymath}
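The $-u^2/2$ term (and all later terms in the expansion of $\log(1+u)$) has been absorbed into the $\cdots$ because it only contributes at order $t^4$ and beyond:

\begin{displaymath}u^2/2 = t^4/8 + i\gamma t^5/12 + \cdots
\end{displaymath}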

Now apply this calculation to the characteristic function of $T=n^{1/2}\bar{X}$ where $\bar{X}$ is the mean of a sample of size n. Then

\begin{displaymath}\log(\phi_T(t)) \approx -t^2/2 -i E(T^3) t^3/6 + \cdots
\end{displaymath}
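In slightly more detail: since the $X_i$ are iid, $\phi_T(t) = \left[\phi(t/\sqrt{n})\right]^n$, so by the expansion just derived

\begin{align*}\log(\phi_T(t)) & = n \log(\phi(t/\sqrt{n}))
\\
& \approx n\left[ -\frac{t^2}{2n} - \frac{i\gamma t^3}{6 n^{3/2}} + \cdots\right]
\\
& = -t^2/2 - i\gamma t^3/(6\sqrt{n}) + \cdots
\end{align*}

which is the displayed formula with $E(T^3) = \gamma/\sqrt{n}$.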

Remember $E(T^3) = \gamma/\sqrt{n}$ and exponentiate to get

\begin{displaymath}\phi_T(t) \approx e^{-t^2/2} \exp\{-i\gamma t^3/(6\sqrt{n}) + \cdots\}
\end{displaymath}

Because the exponent of the second factor is of order $1/\sqrt{n}$, you can do a Taylor expansion of that exponential around 0 and get

\begin{displaymath}\phi_T(t) \approx e^{-t^2/2} (1-i\gamma t^3/(6\sqrt{n}))
\end{displaymath}

neglecting higher order terms. This approximation to the characteristic function of T can be inverted to get an Edgeworth approximation to the density (or distribution) of T which looks like

\begin{displaymath}f_T(x) \approx \frac{1}{\sqrt{2\pi}} e^{-x^2/2} [1+\gamma
(x^3-3x)/(6\sqrt{n}) + \cdots]
\end{displaymath}
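The polynomial $x^3-3x$ comes from inverting the $t^3$ term; sketched, the calculation uses the fact that multiplying a characteristic function by $(it)^3$ corresponds to $-d^3/dx^3$ on the density side:

\begin{displaymath}\frac{1}{2\pi}\int e^{-itx} (it)^3 e^{-t^2/2}\, dt
= -\frac{d^3}{dx^3}\left[\frac{1}{\sqrt{2\pi}} e^{-x^2/2}\right]
= \frac{x^3-3x}{\sqrt{2\pi}} e^{-x^2/2} \, .
\end{displaymath}

Since $-i\gamma t^3/(6\sqrt{n}) = \gamma (it)^3/(6\sqrt{n})$, inverting the approximation to $\phi_T$ term by term gives the expansion above.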

Remarks:

1.
The error using the central limit theorem to approximate a density or a probability is proportional to $n^{-1/2}$.

2.
This is improved to $n^{-1}$ for symmetric densities, for which $\gamma=0$.

3.
These expansions are asymptotic. This means that the series indicated by $\cdots$ usually does not converge. When n=25, say, including the second term may improve the approximation, while adding the third or fourth term may make it worse.

4.
You can integrate the expansion above for the density to get an approximation for the cdf.
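Carrying out the integration in remark 4 gives the one-term Edgeworth approximation to the cdf; writing $\Phi$ for the standard normal cdf,

\begin{displaymath}P(T \le x) \approx \Phi(x) - \frac{\gamma (x^2-1)}{6\sqrt{n}} \cdot \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, .
\end{displaymath}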

Multivariate convergence in distribution

Definition: $X_n\in R^p$ converges in distribution to $X\in R^p$ if

\begin{displaymath}E(g(X_n)) \to E(g(X))
\end{displaymath}

for each bounded continuous real valued function $g$ on $R^p$.

This is equivalent to either of

Cramér Wold Device: $a^t X_n$ converges in distribution to $a^t X$ for each $a \in R^p$

or

Convergence of characteristic functions:

\begin{displaymath}E(e^{ia^tX_n}) \to E(e^{ia^tX})
\end{displaymath}

for each $a \in R^p$.

Extensions of the CLT


1.
If $Y_1,Y_2,\cdots$ are iid in $R^p$ with mean $\mu$ and variance-covariance matrix $\Sigma$ then $n^{1/2}(\bar{Y}-\mu)$ converges in distribution to $MVN(0,\Sigma)$.


2.
If for each n we have a set of independent mean 0 random variables $X_{n1},\ldots,X_{nn}$ with
\begin{align*}E(X_{ni}) & =0
\\
Var(\sum_i X_{ni}) & = 1
\\
\sum_i E(\vert X_{ni}\vert^3) & \to 0
\end{align*}
then $\sum_i X_{ni}$ converges in distribution to N(0,1). This is the Lyapunov central limit theorem; an example is given after this list.


3.
Replace the third moment condition with

\begin{displaymath}\sum E(X_{ni}^2 1(\vert X_{ni}\vert > \epsilon)) \to 0
\end{displaymath}

for each $\epsilon > 0$; then again $\sum_i X_{ni}$ converges in distribution to N(0,1). This is the Lindeberg central limit theorem.

4.
There are extensions to rvs which aren't independent: the m-dependent central limit theorem, the martingale central limit theorem, the central limit theorem for mixing processes.

5.
Many important random variables are not sums of independent random variables. We handle these with Slutsky's theorem and the $\delta$ method.
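As an illustration of the Lyapunov condition in extension 2, take $X_{ni} = X_i/\sqrt{n}$ where $X_1,X_2,\cdots$ are iid with mean 0, variance 1 and $E(\vert X_1\vert^3) < \infty$. Then

\begin{displaymath}Var\left(\sum_i X_{ni}\right) = n \cdot \frac{1}{n} = 1
\quad\mbox{and}\quad
\sum_i E(\vert X_{ni}\vert^3) = \frac{n E(\vert X_1\vert^3)}{n^{3/2}} = \frac{E(\vert X_1\vert^3)}{\sqrt{n}} \to 0 \, ,
\end{displaymath}

so the ordinary central limit theorem (under this extra third moment assumption) is a special case of the Lyapunov theorem.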

Slutsky's Theorem: If $X_n$ converges in distribution to $X$ and $Y_n$ converges in distribution (or in probability) to $c$, a constant, then $X_n+Y_n$ converges in distribution to $X+c$.

Warning: the hypothesis that the limit of $Y_n$ be constant is essential.

The delta method: Suppose a sequence $Y_n$ of random variables converges to a constant $y$ and that, if we define $X_n = a_n(Y_n-y)$, then $X_n$ converges in distribution to some random variable $X$. Suppose that $f$ is a function differentiable on the range of the $Y_n$. Then $a_n(f(Y_n)-f(y))$ converges in distribution to $f^\prime(y) X$. If $X_n$ is in $R^p$ and $f$ maps $R^p$ to $R^q$ then $f^\prime$ is the $q\times p$ matrix of first derivatives of the components of $f$.
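The reasoning behind the delta method is a one-term Taylor expansion of $f$ about $y$ (a sketch, under the stated assumptions):

\begin{displaymath}a_n(f(Y_n)-f(y)) = f^\prime(y)\, a_n(Y_n-y) + a_n(Y_n-y)\, o(1) \, ,
\end{displaymath}

where the $o(1)$ factor tends to 0 because $Y_n$ converges to $y$; the first term converges in distribution to $f^\prime(y)X$ and the remainder goes to 0 in probability by Slutsky's theorem.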

Example: Suppose $X_1,\ldots,X_n$ are a sample from a population with mean $\mu$, variance $\sigma^2$, and third and fourth central moments $\mu_3$ and $\mu_4$. Then

\begin{displaymath}n^{1/2}(s^2-\sigma^2) \Rightarrow N(0,\mu_4-\sigma^4)
\end{displaymath}

where $\Rightarrow $ is notation for convergence in distribution. For simplicity I define $s^2 = \overline{X^2} -{\bar{X}}^2$.

Take $Y_n =(\overline{X^2},\bar{X})$. Then $Y_n$ converges to $y=(\mu^2+\sigma^2,\mu)$. Take $a_n = n^{1/2}$. Then

$n^{1/2}(Y_n-y)$

converges in distribution to $MVN(0,\Sigma)$ with

\begin{displaymath}\Sigma = \left[\begin{array}{cc} \mu_4-\sigma^4 & \mu_3 -\mu(\mu^2+\sigma^2)\\
\mu_3-\mu(\mu^2+\sigma^2) & \sigma^2 \end{array} \right]
\end{displaymath}

Define $f(x_1,x_2) = x_1-x_2^2$. Then $s^2 = f(Y_n)$. The gradient of $f$ has components $(1,-2x_2)$. This leads to
\begin{multline*}n^{1/2}(s^2-\sigma^2) \approx
\\
n^{1/2}[1, -2\mu]
\left[\begin{array}{c}
\overline{X^2} - (\mu^2 + \sigma^2)
\\
\bar{X} -\mu
\end{array}\right]
\end{multline*}
which converges in distribution to $(1,-2\mu) Y$, where $Y \sim MVN(0,\Sigma)$ is the limit above. This rv is $N(0,a^t \Sigma a)=N(0, \mu_4-\sigma^4)$ where $a=(1,-2\mu)^t$.

Remark: In this sort of problem it is best to learn to recognize that the sample variance is unaffected by subtracting $\mu$ from each $X_i$. Thus there is no loss in assuming $\mu=0$, which simplifies $\Sigma$ and $a$.
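Concretely, with $\mu=0$ the pieces above become

\begin{displaymath}\Sigma = \left[\begin{array}{cc} \mu_4-\sigma^4 & \mu_3 \\
\mu_3 & \sigma^2 \end{array}\right] ,
\qquad a = (1,0)^t, \qquad a^t \Sigma a = \mu_4-\sigma^4 \, ,
\end{displaymath}

which is the variance claimed at the start of the example.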





Richard Lockhart
2000-02-02