
STAT 801 Lecture 7

Reading for Today's Lecture:

Goals of Today's Lecture:

Today's notes

Last time we derived the Fourier inversion formula

\begin{displaymath}f_X(x) = \frac{1}{2\pi} \int_{-\infty}^\infty \phi_X(u) e^{-iux} du
\end{displaymath}

and the moment generating function inversion formula

\begin{displaymath}2\pi i f(x) = \int_{-i\infty}^{i\infty} M(z) e^{-zx} dz
\end{displaymath}

(where the limits of integration indicate a contour integral which runs up the imaginary axis). The methods of complex variables permit this path to be replaced by any contour running up a line like Re(z)=c. (Re(z) denotes the real part of z, that is, x when z=x+iy with x and y real.) The value of c has to be one for which $M(c) < \infty$. Rewrite the inversion formula using the cumulant generating function $K(t) = \log(M(t))$ to get

\begin{displaymath}2\pi i f(x) = \int_{c-i\infty}^{c+i\infty} \exp(K(z)-zx) dz \, .
\end{displaymath}

Along the contour in question we have z=c+iy so we can think of the integral as being

\begin{displaymath}i\int_{-\infty}^\infty \exp(K(c+iy)-(c+iy)x) dy
\end{displaymath}

Now do a Taylor expansion of the exponent:
\begin{displaymath}K(c+iy)-(c+iy)x = K(c)-cx +iy(K^\prime(c)-x) - y^2 K^{\prime\prime}(c)/2+\cdots
\end{displaymath}
Ignore the higher order terms and select a c so that the first derivative

\begin{displaymath}K^\prime(c)-x
\end{displaymath}

vanishes. Such a c is a saddlepoint. We get the formula
\begin{displaymath}2\pi f(x) \approx \exp(K(c)-cx) \int_{-\infty}^\infty \exp(-y^2
K^{\prime\prime}(c)/2) dy
\end{displaymath}
The integral is just a normal density calculation and gives $\sqrt{2\pi/K^{\prime\prime}(c)}$. The saddlepoint approximation is

\begin{displaymath}f(x)\approx\frac{\exp(K(c)-cx)}{\sqrt{2\pi K^{\prime\prime}(c)}}
\end{displaymath}
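To see the recipe in action, here is a minimal numerical sketch (Python with numpy and scipy, which are not part of these notes) for the Gamma($\alpha$) density, whose cumulant generating function is $K(t) = -\alpha\log(1-t)$ for t<1. The saddlepoint equation $K^\prime(c)=x$ is solved numerically and the approximation is compared with the exact density; the shape parameter and the evaluation points are illustrative choices only.

\begin{verbatim}
import numpy as np
from scipy.optimize import brentq
from scipy.stats import gamma

alpha = 3.0  # illustrative shape parameter for the Gamma(alpha) example

def K(t):   # cumulant generating function of Gamma(alpha), valid for t < 1
    return -alpha * np.log(1.0 - t)

def K1(t):  # K'(t)
    return alpha / (1.0 - t)

def K2(t):  # K''(t)
    return alpha / (1.0 - t) ** 2

def saddlepoint_density(x):
    # solve K'(c) = x for the saddlepoint c (closed form is c = 1 - alpha/x,
    # but we solve numerically to mimic the general recipe)
    c = brentq(lambda t: K1(t) - x, -50.0, 1.0 - 1e-10)
    return np.exp(K(c) - c * x) / np.sqrt(2.0 * np.pi * K2(c))

for x in [1.0, 3.0, 6.0]:
    print(x, saddlepoint_density(x), gamma(alpha).pdf(x))
\end{verbatim}

For the Gamma family the approximation differs from the exact density only by a constant factor close to 1 (it is exact after renormalization), so the printed ratio of the two columns is the same at every x.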

Essentially the same idea lies at the heart of the proof of Stirling's approximation to the factorial function:

\begin{displaymath}n! = \int_0^\infty \exp(n\log(x) -x) dx
\end{displaymath}

The exponent is maximized when x=n. For n large we approximate $f(x) = n\log(x) -x$ by

\begin{displaymath}f(x) \approx f(x_0) + (x-x_0) f^\prime(x_0) + (x-x_0)^2
f^{\prime\prime}(x_0)/2
\end{displaymath}

and choose $x_0 = n$ to make $f^\prime(x_0) = 0$. Then

\begin{displaymath}n! \approx \int_0^\infty \exp[n\log(n) - n -
(x-n)^2/(2n)] dx
\end{displaymath}

Substitute $y = (x-n)/\sqrt{n}$ to get the approximation

\begin{displaymath}n! \approx n^{1/2}n^n e^{-n} \int_{-\infty}^\infty e^{-y^2/2} dy
\end{displaymath}

or

\begin{displaymath}n! \approx \sqrt{2\pi} n^{n+1/2}e^{-n}
\end{displaymath}

This tactic is called Laplace's method. Note that I am being very sloppy about the limits of integration; to do the thing properly you have to prove that the integral over x not near n is negligible.
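A quick numerical check of the approximation (a Python sketch; the particular values of n are arbitrary):

\begin{verbatim}
import math

# compare n! with Stirling's approximation sqrt(2*pi) * n^(n + 1/2) * e^(-n)
for n in [1, 5, 10, 20]:
    exact = math.factorial(n)
    stirling = math.sqrt(2.0 * math.pi) * n ** (n + 0.5) * math.exp(-n)
    print(n, exact, stirling, stirling / exact)
\end{verbatim}

The ratio in the last column tends to 1 as n grows; the relative error behaves like 1/(12n).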

Applications of Inversion

1): Numerical calculations

Example: Many statistics have a distribution which is approximately that of

\begin{displaymath}T= \sum \lambda_j Z_j^2
\end{displaymath}

where the $Z_j$ are iid N(0,1). In this case
\begin{align*}E(e^{itT}) & = \prod E(e^{it\lambda_j Z_j^2})
\\
& = \prod (1-2it\lambda_j)^{-1/2} \, .
\end{align*}
Imhof (Biometrika, 1961) gives a simplification of the Fourier inversion formula for

\begin{displaymath}F_T(x) - F_T(0)
\end{displaymath}

which can be evaluated numerically.

Here is how it works:
\begin{align*}F_T(x) - F_T(0) & = \int_0^x f_T(y) dy
\\
& = \int_0^x \frac{1}{2\pi} \int_{-\infty}^\infty
\prod (1-2it\lambda_j)^{-1/2} e^{-ity} \, dt \, dy
\end{align*}
Multiply

\begin{displaymath}\phi(t) = \left[\frac{1}{\prod(1-2it\lambda_j)}\right]^{1/2}
\end{displaymath}

top and bottom by the complex conjugate of the denominator:

\begin{displaymath}\phi(t) =
\left[\frac{\prod(1+2it\lambda_j)}{\prod(1+4t^2\lambda_j^2)}\right]^{1/2}
\end{displaymath}

The complex number $1+2it\lambda_j$ is $r_j e^{i\theta_j}$ where $r_j = \sqrt{1+4t^2\lambda_j^2}$ and $\tan(\theta_j) = 2t\lambda_j$. This allows us to rewrite

\begin{displaymath}\phi(t) = \left[\frac{\prod r_j e^{i\sum\theta_j}}{\prod
r_j^2}\right]^{1/2}
\end{displaymath}

or

\begin{displaymath}\phi(t) =
\frac{
e^{i\sum\tan^{-1}(2t\lambda_j)/2}
}{
\prod(1+4t^2\lambda_j^2)^{1/4}
}
\end{displaymath}

Assemble this to give

\begin{displaymath}F_T(x) - F_T(0) =
\frac{1}{2\pi} \int_{-\infty}^\infty
\frac{
e^{i\theta(t)}
}{
\rho(t)
}
\int_0^x e^{-iyt}dy dt
\end{displaymath}

where

\begin{displaymath}\theta(t) = \sum \tan^{-1}(2t\lambda_j) /2\end{displaymath}

and $\rho(t) = \prod(1+4t^2\lambda_j^2)^{1/4}$. But

\begin{displaymath}\int_0^x e^{-iyt}dy = \frac{e^{-ixt}-1}{-it}
\end{displaymath}

We can now collect the real part of the resulting integral to derive the formula given by Imhof. I won't reproduce the details here.
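The derivation can at least be checked numerically. The sketch below (Python with numpy and scipy; not part of the original notes) integrates the real part of the integrand above over $(0,\infty)$, which suffices because the real part is an even function of t. Since $T\ge 0$ here, $F_T(0)=0$ and the result is the cdf itself; taking all $\lambda_j=1$ makes T a $\chi^2_3$ variable, which provides a check.

\begin{verbatim}
import numpy as np
from scipy.integrate import quad
from scipy.stats import chi2

lam = np.array([1.0, 1.0, 1.0])  # lambda_j; equal weights give a chi-squared check

def phi(t):
    # characteristic function of T = sum lambda_j Z_j^2
    return np.prod((1.0 - 2.0j * t * lam) ** (-0.5))

def cdf_minus_F0(x):
    # real part of phi(t) * (exp(-i x t) - 1)/(-i t) is even in t,
    # so integrate over (0, infinity) and divide by pi instead of 2*pi
    def integrand(t):
        if t == 0.0:
            return x  # limiting value of the integrand at t = 0
        return (phi(t) * (np.exp(-1j * x * t) - 1.0) / (-1j * t)).real
    val, _ = quad(integrand, 0.0, np.inf, limit=200)
    return val / np.pi

for x in [1.0, 3.0, 7.0]:
    print(x, cdf_minus_F0(x), chi2(df=len(lam)).cdf(x))
\end{verbatim}

Imhof's paper rewrites the same integral in terms of $\theta(t)$ and $\rho(t)$ so that only real arithmetic is needed; the complex form is used here simply because it is shorter to code.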

2): The central limit theorem (the version which is called ``local'') can be deduced from the Fourier inversion formula: if $X_1,\ldots,X_n$ are iid with mean 0 and variance 1 and $T=n^{1/2}\bar{X}$ then with $\phi$ denoting the characteristic function of a single X we have
\begin{align*}E(e^{itT}) & = E(e^{in^{-1/2} t\sum X_j})
\\
& = \left[\phi(n^{-1/2}t)\right]^n
\\
& = \left[\phi(0) + n^{-1/2}t\phi^\prime(0)
+ n^{-1}t^2\phi^{\prime\prime}(0)/2 + o(n^{-1})\right]^n
\end{align*}
But now $\phi(0) = 1$ and

\begin{displaymath}\phi^\prime(t) = \frac{d}{dt} E(e^{itX_1}) = iE(X_1e^{itX_1})
\end{displaymath}

So $\phi^\prime(0) = E(X_1) =0$. Similarly

\begin{displaymath}\phi^{\prime\prime}(t) = i^2 E(X_1^2e^{itX_1})
\end{displaymath}

so that

\begin{displaymath}\phi^{\prime\prime}(0) = -E(X_1^2) =-1
\end{displaymath}

It now follows that
\begin{align*}E(e^{itT}) & \approx [1-t^2/(2n) + o(1/n)]^n
\\
& \to e^{-t^2/2}
\end{align*}
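A tiny numerical check of this limit (Python; the values of t and n are arbitrary):

\begin{verbatim}
import math

t = 1.5
for n in [10, 100, 10000]:
    print(n, (1.0 - t ** 2 / (2.0 * n)) ** n, math.exp(-t ** 2 / 2.0))
\end{verbatim}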
With care we can then apply the Fourier inversion formula and get
\begin{align*}f_T(x) & = \frac{1}{2\pi } \int_{-\infty}^\infty e^{-itx}
[\phi(tn^{-1/2})]^n dt
\\
& \to \frac{1}{2\pi } \int_{-\infty}^\infty e^{-itx} e^{-t^2/2} dt
\\
&=\frac{1}{\sqrt{2\pi}} \phi_Z(-x)
\end{align*}
where $ \phi_Z $ is the characteristic function of a standard normal variable Z. Doing the integral we find

\begin{displaymath}\phi_Z(x) = \phi_Z(-x) = e^{-x^2/2}
\end{displaymath}

so that

\begin{displaymath}f_T(x) \to \frac{1}{\sqrt{2\pi}} e^{-x^2/2}
\end{displaymath}

which is the standard normal density.
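The whole argument can be illustrated numerically. Here is a sketch (Python with numpy and scipy; the uniform example and the sample sizes are illustrative choices) that inverts $[\phi(tn^{-1/2})]^n$ for $X_j$ uniform on $[-\sqrt{3},\sqrt{3}]$ (mean 0, variance 1) and compares the resulting density of T at a point with the standard normal density.

\begin{verbatim}
import numpy as np
from scipy.integrate import quad

a = np.sqrt(3.0)  # Uniform(-a, a) has mean 0 and variance a^2/3 = 1

def phi(t):
    # characteristic function of Uniform(-a, a): sin(a t)/(a t)
    return 1.0 if t == 0.0 else np.sin(a * t) / (a * t)

def f_T(x, n):
    # Fourier inversion of the characteristic function of T = sqrt(n) * Xbar
    def integrand(t):
        return (np.exp(-1j * t * x) * phi(t / np.sqrt(n)) ** n).real
    val, _ = quad(integrand, -np.inf, np.inf, limit=200)
    return val / (2.0 * np.pi)

x = 0.5
for n in [3, 10, 30]:
    print(n, f_T(x, n), np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi))
\end{verbatim}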

This proof of the central limit theorem is not terribly general since it requires T to have a bounded continuous density. The usual central limit theorem is a statement about cdfs, not densities, and is

\begin{displaymath}P(T \le t) \to P(Z \le t)
\end{displaymath}

Convergence in Distribution

In undergraduate courses we often teach the central limit theorem as follows: if $X_1,\ldots,X_n$ are iid from a population with mean $\mu$ and standard deviation $\sigma$ then $n^{1/2}(\bar{X}-\mu)/\sigma$ has approximately a normal distribution. We also say that a Binomial(n,p) random variable has approximately a N(np,np(1-p)) distribution.

To make precise sense of these assertions we need to assign a meaning to statements like ``X and Y have approximately the same distribution''. The meaning we want to give is that X and Y have nearly the same cdf, but even here we need some care. If n is a large number, is the N(0,1/n) distribution close to the distribution of $X\equiv 0$? Is it close to the N(1/n,1/n) distribution? Is it close to the $N(1/\sqrt{n},1/n)$ distribution? If $X_n\equiv 2^{-n}$, is the distribution of $X_n$ close to that of $X\equiv 0$?

The answer to these questions depends in part on how close ``close'' needs to be, so it's a matter of definition. In practice the usual sort of approximation we want to make is to say that some random variable X, say, has nearly some continuous distribution, like N(0,1). In that case we want to calculate probabilities like P(X>x) and know that they are nearly P(N(0,1) > x). The real difficulty arises in the case of discrete random variables; in this course we will not actually need to approximate a distribution by a discrete distribution.

When mathematicians say two things are close together they mean one of two things: either they can provide an upper bound on the distance between the two things or they are talking about taking a limit. In this course we do the latter.

Definition: A sequence of random variables $X_n$ converges in distribution to a random variable X if

\begin{displaymath}E(g(X_n)) \to E(g(X))
\end{displaymath}

for every bounded continuous function g.

Theorem 1   The following are equivalent:
1.
$X_n$ converges in distribution to X.
2.
$P(X_n \le x) \to P(X \le x)$ for each x such that P(X=x)=0
3.
The limit of the characteristic functions of $X_n$ is the characteristic function of X:

\begin{displaymath}E(e^{itX_n}) \to E(e^{itX})
\end{displaymath}

for every real t.
These are all implied by

\begin{displaymath}M_{X_n}(t) \to M_X(t) < \infty
\end{displaymath}

for all $\vert t\vert \le \epsilon$ for some positive $\epsilon$.
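As a concrete check of conditions 2 and 3, here is a sketch (Python with numpy and scipy; the choices of p, x and t are arbitrary) for the standardized binomial mentioned earlier, $X_n = ({\rm Binomial}(n,p)-np)/\sqrt{np(1-p)}$, whose limit should be N(0,1).

\begin{verbatim}
import numpy as np
from scipy.stats import binom, norm

p, x, t = 0.3, 0.5, 1.0  # success probability, a cdf point x, a char. function point t

for n in [10, 100, 1000, 10000]:
    mu, sd = n * p, np.sqrt(n * p * (1 - p))
    # condition 2: P(X_n <= x) at a continuity point x of the limit
    cdf_n = binom(n, p).cdf(mu + sd * x)
    # condition 3: E exp(i t X_n), summed over the binomial support
    k = np.arange(n + 1)
    cf_n = np.sum(binom(n, p).pmf(k) * np.exp(1j * t * (k - mu) / sd))
    print(n, cdf_n, norm.cdf(x), abs(cf_n - np.exp(-t ** 2 / 2)))
\end{verbatim}

As n grows the printed cdf values approach the normal cdf and the last column, the distance between the characteristic functions, shrinks toward 0.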

Now let's go back to the questions I asked:

Here is the message you are supposed to take away from this discussion. You do distributional approximations by showing that a sequence of random variables $X_n$ converges to some X. The limit distribution should be non-trivial, like, say, N(0,1). We don't say $X_n$ is approximately N(1/n,1/n) but rather that $n^{1/2} X_n$ converges to N(0,1) in distribution.
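For instance, if $X_n \sim N(1/n,1/n)$ then $n^{1/2}X_n \sim N(n^{-1/2},1)$ and so, for each fixed x,

\begin{displaymath}P(n^{1/2}X_n \le x) = \Phi(x-n^{-1/2}) \to \Phi(x)
\end{displaymath}

(here $\Phi$ denotes the standard normal cdf), which is precisely the statement that $n^{1/2}X_n$ converges to N(0,1) in distribution.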





Richard Lockhart
2000-01-26