


Stat 804

Lecture 14 Notes

Our goal in this lecture is to develop asymptotic distribution theory for the sample autocorrelation function. We let $\rho_k$ and $\hat\rho_k$ be the ACF and estimated ACF respectively.

We begin by reducing the behaviour of $\hat\rho_k$ to the behaviour of $\hat{C}$, the sample autocovariance. Our approach is a standard Taylor expansion.

Large sample theory for ratio estimates

Suppose you have pairs $(X_n,Y_n)$ of random variables with

\begin{displaymath}n^{1/2}(X_n - \mu) \Rightarrow N(0,\sigma^2)
\end{displaymath}

and

\begin{displaymath}n^{1/2}(Y_n-\nu) \Rightarrow N(0,\tau^2)
\end{displaymath}

We study the large sample behaviour of $X_n/Y_n$ under the assumption that $\nu$ is not 0. We will see that the case $\mu=0$ results in some simplifications. Begin by writing

\begin{displaymath}\frac{X_n}{Y_n} = \frac{X_n}{\nu+(Y_n-\nu)} = \frac{X_n}{\nu}
\frac{1}{1+\epsilon_n}
\end{displaymath}

where

\begin{displaymath}\epsilon_n = \frac{Y_n-\nu}{\nu}
\end{displaymath}

Notice that $\epsilon_n \to 0$ in probability. On the event $\vert\epsilon_n\vert < 1$, whose probability tends to 1, we may expand

\begin{displaymath}\frac{1}{1+\epsilon_n} = \sum_{k=0}^\infty (-\epsilon_n)^k
\end{displaymath}

and then write

\begin{displaymath}\frac{X_n}{Y_n} = \frac{X_n}{\nu}\sum_{k=0}^\infty (-\epsilon_n)^k
\end{displaymath}

We want to compute the mean of this expression term by term and the variance by using the formula for the variance of the sum and so on. However, what we really do is truncate the infinite sum at some finite number of terms and compute moments of the finite sum. I want to be clear about the distinction; to do so I give an example. Imagine that $(X_n,Y_n)$ has a bivariate normal distribution with means $\mu,\nu$, variances $\sigma^2/n$, $\tau^2/n$ and correlation $\rho$ between $X_n$ and $Y_n$. The quantity $X_n/Y_n$ does not have a well-defined mean because $\text{E}\left(\vert X_n/Y_n\vert\right) = \infty$. Our expansion is still valid, however. Stopping the sum at $k=1$ leads to the approximation
\begin{align*}\frac{X_n}{Y_n} & \approx \frac{X_n}{\nu} - \frac{X_n(Y_n-\nu)}{\nu^2} \\
& = \frac{\mu}{\nu} + \frac{X_n-\mu}{\nu} - \frac{\mu(Y_n-\nu)}{\nu^2} -
\frac{(X_n-\mu)(Y_n-\nu)}{\nu^2}
\end{align*}
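As a check on this algebra, here is a rough sketch using Python's sympy; the bookkeeping parameter $t$ marks the order of the fluctuations $x = X_n-\mu$ and $y = Y_n-\nu$, and the order $t$ and $t^2$ terms are exactly the ones examined below.

\begin{verbatim}
import sympy as sp

mu, nu, x, y, t = sp.symbols('mu nu x y t')

# Write X_n = mu + t*x and Y_n = nu + t*y; x and y stand for the
# fluctuations X_n - mu and Y_n - nu, and t tracks their order.
ratio = (mu + t * x) / (nu + t * y)

# Expand in powers of t, keeping terms up to t^2 (second order).
expansion = sp.expand(sp.series(ratio, t, 0, 3).removeO())
print(expansion)
# Collecting powers of t gives
#   mu/nu + t*(x/nu - mu*y/nu**2) + t**2*(mu*y**2/nu**3 - x*y/nu**2)
\end{verbatim}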
I now want to look at these terms to decide which are big and which are small. To do so I introduce $O_P$ (``big O in probability'') notation:

Definition: If $U_n$ is a sequence of random variables and $a_n>0$ a sequence of constants then we write

$U_n = O_P(a_n)$

if, for each $\epsilon>0$, there is an $M$ (depending on $\epsilon$ but not on $n$) such that, for every $n$,

\begin{displaymath}P(\vert U_n\vert > M\vert a_n\vert) < \epsilon
\end{displaymath}

The idea is that $U_n=O_P(a_n)$ means that $U_n$ is proportional in size to $a_n$ with the ``constant of proportionality'' being a random variable which is not likely to be too large. We also often have use for notation indicating that $U_n$ is actually small compared to $a_n$.

Definition: We say $U_n=o_P(a_n)$ if $U_n/a_n \to 0$ in probability: for each $\epsilon>0$

\begin{displaymath}\lim_{n\to\infty} P(\vert U_n/a_n\vert > \epsilon) = 0
\end{displaymath}

You can manipulate $O_P$ and $o_P$ notation algebraically with a few rules:

1.
If $b_n$ is a sequence of constants such that $b_n = ca_n$ with $c>0$ then

\begin{displaymath}U_n = O_P(b_n) \Leftrightarrow U_n =O_P(a_n)
\end{displaymath}

We write

$cO_P(a_n) = O_P(a_n)$

2.
If $U_n = O_P(a_n)$ and $V_n = O_P(b_n)$ for two sequences $a_n$ and $b_n$ then

$U_nV_n = O_P(a_nb_n)$

We express this as

$O_P(a_n)O_P(b_n) = O_P(a_nb_n)$

3.
In particular

$b_nO_P(a_n) = O_P(b_na_n)$

4.

\begin{displaymath}O_P(a_n)+O_P(b_n) = O_P(\max(a_n,b_n))
\end{displaymath}

5.
$co_P(a_n) = o_P(a_n)$

6.
$o_P(a_n)O_P(b_n) = o_P(a_nb_n)$ and $o_P(a_n)o_P(b_n) = o_P(a_nb_n)$

7.
In particular $b_no_P(a_n) = o_P(b_na_n)$

8.
$ o_P(a_n)+o_P(b_n) = o_P(\max(a_n,b_n)) $

These notions extend Landau's o and O notation to random quantities.
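As a concrete illustration of the definitions, here is a small simulation sketch in Python (the Exponential(1) example is arbitrary): the sample mean satisfies $\bar{X}_n - 1 = O_P(n^{-1/2})$, so the quantiles of $n^{1/2}\vert\bar{X}_n-1\vert$ settle down as $n$ grows, while $n^{1/4}\vert\bar{X}_n-1\vert$ shrinks to 0, that is, $\bar{X}_n - 1 = o_P(n^{-1/4})$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(804)

# The mean of n iid Exponential(1) variables is distributed as Gamma(n)/n,
# so U_n = X_bar_n - 1 can be simulated directly, even for large n.
for n in [100, 10_000, 1_000_000]:
    U = rng.gamma(shape=n, scale=1.0 / n, size=5000) - 1.0
    q_half = np.quantile(np.sqrt(n) * np.abs(U), 0.95)    # stabilizes: O_P(n^{-1/2})
    q_quarter = np.quantile(n ** 0.25 * np.abs(U), 0.95)  # shrinks:    o_P(n^{-1/4})
    print(n, round(q_half, 3), round(q_quarter, 3))
\end{verbatim}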

Example: In our ratio example we have

\begin{displaymath}X_n = \mu+O_P(n^{-1/2})
\end{displaymath}

and

\begin{displaymath}Y_n = \nu+ O_P(n^{-1/2})
\end{displaymath}

In our geometric expansion

\begin{displaymath}\epsilon_n^k = O_P(n^{-k/2})
\end{displaymath}

Look first at the expansion stopped at $k=1$. We have
\begin{align*}\frac{X_n}{Y_n} -\frac{\mu}{\nu}& \approx
\frac{X_n-\mu}{\nu} - \frac{\mu(Y_n-\nu)}{\nu^2} - \frac{(X_n-\mu)(Y_n-\nu)}{\nu^2}
\\
& = O_P(n^{-1/2}) +O_P(n^{-1/2}) +O_P(n^{-1})
\end{align*}
(The three terms on the RHS of the first line are being described in terms of roughly how big each is.) If we stop at $k=2$ we get
\begin{align*}\frac{X_n}{Y_n} -\frac{\mu}{\nu}& \approx
\frac{X_n-\mu}{\nu} - \frac{\mu(Y_n-\nu)}{\nu^2} - \frac{(X_n-\mu)(Y_n-\nu)}{\nu^2}
+ \frac{\mu(Y_n-\nu)^2}{\nu^3} + \frac{(X_n-\mu)(Y_n-\nu)^2}{\nu^3}
\\
& = O_P(n^{-1/2}) +O_P(n^{-1/2}) +O_P(n^{-1})+O_P(n^{-1})+ O_P(n^{-3/2})
\end{align*}

Keeping only terms of order $O_P(n^{-1/2})$ we find

\begin{displaymath}\frac{X_n}{Y_n} -\frac{\mu}{\nu}=
\frac{X_n-\mu}{\nu} - \frac{\mu(Y_n-\nu)}{\nu^2} +O_P(n^{-1})
\end{displaymath}

We now take expected values and discover that up to an error of order $n^{-1}$

\begin{displaymath}\text{E}(X_n/Y_n) = \mu/\nu
\end{displaymath}

BUT you are warned that what is really meant is simply this: there is a random variable, equal to

\begin{displaymath}\frac{X_n}{Y_n} -\frac{\mu}{\nu}
\end{displaymath}

up to a remainder which is probably proportional in size to $n^{-1}$, whose expected value is 0. For the normal example the remainder term in this expansion, that is, the $O_P(n^{-1})$ term, is probably small but its expected value is not defined.

To keep terms up to order $O_P(n^{-1})$ we have to keep terms out to $k=2$. (In general

\begin{displaymath}X_n \epsilon_n^k = (\mu+O_P(n^{-1/2}))O_P(n^{-k/2})
\end{displaymath}

For $k>2$ this is $o_P(n^{-1})$ but for $k=2$ the $\mu O_P(n^{-1})$ term is not negligible.) If we retain terms out to $k=2$ then we get

\begin{displaymath}\frac{X_n}{Y_n} -\frac{\mu}{\nu}=
\frac{X_n-\mu}{\nu} - \frac{\mu(Y_n-\nu)}{\nu^2} - \frac{(X_n-\mu)(Y_n-\nu)}{\nu^2}
+ \frac{\mu(Y_n-\nu)^2}{\nu^3} +O_P(n^{-3/2})
\end{displaymath}

Taking expected values here, and noting that the first two terms on the right hand side have mean 0, we get

\begin{displaymath}\text{E}\left[\frac{X_n}{Y_n} -\frac{\mu}{\nu}\right] \approx
-\frac{\text{E}\left[(X_n-\mu)(Y_n-\nu)\right]}{\nu^2}
+ \frac{\mu\,\text{E}\left[(Y_n-\nu)^2\right]}{\nu^3}
\end{displaymath}

where the retained terms are of order $n^{-1}$. In the normal case this gives

\begin{displaymath}\text{E}\left[\frac{X_n}{Y_n} -\frac{\mu}{\nu}\right] \approx
\left(\frac{\mu\tau^2}{\nu^3} - \frac{\rho\sigma\tau}{\nu^2}\right)\Big/n
\end{displaymath}

In order to compute the approximate variance we ought to compute the second moment of $X_n/Y_n - \mu/\nu$ and subtract the square of the first moment. Imagine you had a random variable of the form

\begin{displaymath}\sum_{k=1} \frac{W_k}{n^{k/2}}
\end{displaymath}

where I assume that the $W_k$ do not depend on $n$. The mean, taken term by term, would be of the form

\begin{displaymath}\sum_{k=1} \frac{\eta_k}{n^{k/2}}
\end{displaymath}

and the second moment of the form

\begin{displaymath}\sum_{j=1}\sum_{k=1} \frac{ \text{E}(W_jW_k)}{n^{(j+k)/2}}
\end{displaymath}

This leads to a variance of the form

\begin{displaymath}\frac{\text{Var}(W_1)}{n} + \frac{2\,\text{Cov}(W_1,W_2)}{n^{3/2}}
+ O(n^{-2})
\end{displaymath}

Our expansion above gave

\begin{displaymath}W_1 = \frac{n^{1/2}(X_n-\mu)}{\nu} - \frac{\mu n^{1/2}(Y_n-\nu)}{\nu^2}
\end{displaymath}

and

\begin{displaymath}W_2 = - \frac{n^{1/2}(X_n-\mu) n^{1/2}(Y_n-\nu)}{\nu^2}
+
\frac{\mu[ n^{1/2}(Y_n-\nu)]^2}{\nu^3}
\end{displaymath}

from which we get the approximate variance

\begin{displaymath}\left(\frac{\sigma^2}{\nu^2} + \frac{\mu^2 \tau^2}{\nu^4}
- \frac{2\rho\sigma\tau\mu}{\nu^3}\right)\Big/n + O(n^{-3/2})
\end{displaymath}
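Here is a Monte Carlo sketch, in Python, of how one might check the bias and variance approximations in the bivariate normal example; the parameter values are arbitrary choices for illustration. Strictly speaking the exact mean of $X_n/Y_n$ does not exist, but with $\nu$ many standard deviations away from 0 the denominator is essentially never near 0 and the simulated moments track the expansion closely.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(804)

# Illustrative parameter values (arbitrary choices).
mu, nu, sigma, tau, rho, n = 1.0, 2.0, 1.0, 1.5, 0.6, 200
reps = 2_000_000

cov = np.array([[sigma**2, rho*sigma*tau],
                [rho*sigma*tau, tau**2]]) / n
X, Y = rng.multivariate_normal([mu, nu], cov, size=reps).T
R = X / Y

bias_approx = (mu*tau**2/nu**3 - rho*sigma*tau/nu**2) / n
var_approx = (sigma**2/nu**2 + mu**2*tau**2/nu**4
              - 2*rho*sigma*tau*mu/nu**3) / n

print("bias:", R.mean() - mu/nu, "  approx:", bias_approx)
print("var: ", R.var(),          "  approx:", var_approx)
\end{verbatim}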

Now I want to apply these ideas to estimation of $\rho_k$. We make $X_n$ be $\hat{C}(k)$ and $Y_n$ be $\hat{C}(0)$ (and replace $n$ by $T$). Our first order approximation to $T^{1/2}(\hat\rho_k - \rho_k)$ is

\begin{displaymath}A_1 = T^{1/2}(\hat{C}(k) - C(k))/C(0) - T^{1/2}C(k)(\hat{C}(0)-C(0))/C^2(0)
\end{displaymath}

Our second order approximation would be

\begin{displaymath}A_1 - T^{1/2}(\hat{C}(k) -
C(k))(\hat{C}(0)-C(0))/C^2(0) + T^{1/2}C(k)(\hat{C}(0)-C(0))^2/C^3(0)
\end{displaymath}

I now evaluate means and variances in the special case where $\hat{C}$ has been calculated using a known mean of 0. That is

\begin{displaymath}\hat{C}(k) = \frac{1}{T} \sum_{t=0}^{T-1-k} X_t X_{t+k}
\end{displaymath}
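In code this might look as follows (a Python sketch; the function names are just for illustration). The divisor is $T$, not $T-k$, matching the definition above, and that is exactly what produces the bias computed next.

\begin{verbatim}
import numpy as np

def sample_acov(x, k):
    """C_hat(k) = (1/T) * sum_{t=0}^{T-1-k} x_t x_{t+k}, mean known to be 0."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    return np.dot(x[:T - k], x[k:]) / T

def sample_acf(x, k):
    """rho_hat(k) = C_hat(k) / C_hat(0)."""
    return sample_acov(x, k) / sample_acov(x, 0)
\end{verbatim}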

Then

\begin{displaymath}\text{E}(\hat{C}(k) - C(k)) = -kC(k)/T
\end{displaymath}

so

\begin{displaymath}\text{E}(A_1) = -k\rho_k/T^{1/2}
\end{displaymath}

To compute the variance we begin with the second moment of $\hat{C}(k)$, which is

\begin{displaymath}\frac{1}{T^2} \sum_s\sum_t {\rm E}(X_sX_{s+k}X_t X_{t+k})
\end{displaymath}

The expectations in question involve the fourth order product moments of $X$ and depend on the distribution of the $X$'s and not just on $C_X$. However, for the interesting case of white noise, we can compute the expected value. For $k>0$ you may assume that $s<t$ or $s=t$, since the $s>t$ cases can be figured out by swapping $s$ and $t$ in the $s<t$ case. For $s<t$ the variable $X_s$ is independent of all 3 of $X_{s+k}$, $X_t$ and $X_{t+k}$. Thus the expectation factors into something containing the factor ${\rm E}(X_s)=0$. For $s=t$ we get ${\rm E}(X_s^2){\rm E}(X_{s+k}^2)=\sigma^4$, and so the second moment is

\begin{displaymath}\frac{T-k}{T^2}\sigma^4
\end{displaymath}

This is also the variance since, for $k>0$ and for white noise, $C_X(k)=0$.

For $k=0$ and $s<t$ or $s>t$ the expectation is simply $\sigma^4$ while for $s=t$ we get ${\rm E}(X_t^4)\equiv \mu_4$. Thus the variance of the sample variance (when the mean is known to be 0) is

\begin{displaymath}\frac{T-1}{T} \sigma^4 + \mu_4/T - \sigma^4 = (\mu_4-\sigma^4)/T \, .
\end{displaymath}

For the normal distribution the fourth moment $\mu_4$ is given simply by $3\sigma^4$.
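A quick simulation check of this formula, sketched in Python for normal white noise (so $\mu_4=3\sigma^4$ and the predicted variance is $2\sigma^4/T$):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(804)
T, reps, sigma = 200, 50_000, 1.0

X = rng.normal(0.0, sigma, size=(reps, T))   # white noise, mean known to be 0
C0 = (X**2).mean(axis=1)                     # C_hat(0) for each replicate
print("simulated Var(C_hat(0)):", C0.var())
print("theoretical (mu_4 - sigma^4)/T:", (3*sigma**4 - sigma**4) / T)
\end{verbatim}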

Having computed the variance it is usual to look at the large sample distribution theory. For $k=0$ the usual central limit theorem applies to $\sum X_t^2 / T$ (in the case of white noise) to prove that

\begin{displaymath}\sqrt{T}({\hat C}_X(0) -\sigma^2)/\sqrt{\mu_4-\sigma^4} \to N(0,1) \, .
\end{displaymath}

The presence of $\mu_4$ in the formula shows that the approximation is quite sensitive to the assumption of normality.

For $k>0$ the theorem needed is called the $m$-dependent central limit theorem; it shows that

\begin{displaymath}\sqrt{T} {\hat C}_X(k)/\sigma^2 \to N(0,1) \, .
\end{displaymath}

In each of these cases the assertion is simply that the statistic in question divided by its standard deviation has an approximate normal distribution.

The sample autocorrelation at lag k is

\begin{displaymath}{\hat C}_X(k)/{\hat C}_X(0) \, .
\end{displaymath}

For $k>0$ we can apply Slutsky's theorem to conclude that

\begin{displaymath}\sqrt{T}
{\hat C}_X(k)/{\hat C}_X(0) \to N(0,1) \, .
\end{displaymath}

This justifies drawing lines at $\pm 2/\sqrt{T}$ to carry out a test, at approximately the 5% level, of the hypothesis that the $X$ series is white noise based on the $k$th sample autocorrelation.
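For example, a small sketch in Python of how those bands are used with simulated white noise; with $T=500$ and 20 lags, roughly 5% of the lags (about one) should poke outside $\pm 2/\sqrt{T}$ purely by chance.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(804)
T, K = 500, 20
X = rng.normal(size=T)                       # simulated white noise

C0 = np.dot(X, X) / T
rho_hat = np.array([np.dot(X[:T - k], X[k:]) / T for k in range(1, K + 1)]) / C0

band = 2 / np.sqrt(T)
print("lags outside +-2/sqrt(T):", np.flatnonzero(np.abs(rho_hat) > band) + 1)
\end{verbatim}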

It is possible to verify that subtraction of $\bar X$ from the observations before computing the sample covariances does not change the large sample approximations, although it does affect the exact formulas for moments.

When the X series is actually not white noise the situation is more complicated. Consider as an example the model

\begin{displaymath}X_t = \phi X_{t-1} + \epsilon_t
\end{displaymath}

with $\epsilon$ being white noise. Taking

\begin{displaymath}{\hat C}_X(k) = \frac{1}{T} \sum_{t=0}^{T-1-k} X_t X_{t+k}
\end{displaymath}

and using the moving average representation $X_t = \sum_{u \ge 0} \phi^u \epsilon_{t-u}$, we find that

\begin{displaymath}T^2{\rm E}({\hat C}_X(k)^2) = \sum_s\sum_t \sum_{u_1} \sum_{u_2}\sum_{v_1}\sum_{v_2}
\phi^{u_1+u_2+v_1+v_2}
{\rm E}(\epsilon_{s-u_1}\epsilon_{s+k-u_2}
\epsilon_{t-v_1}
\epsilon_{t+k-v_2})
\end{displaymath}

The expectation is 0 unless either all 4 indices on the $\epsilon$'s are the same or the indices come in two pairs of equal values. The first case requires $u_1=u_2-k$ and $v_1=v_2-k$ and then $s-u_1=t-v_1$. The second case requires one of three pairs of equalities: $s-u_1=t-v_1$ and $s-u_2 = t-v_2$, or $s-u_1=t+k-v_2$ and $s+k-u_2 = t-v_1$, or $s-u_1=s+k-u_2$ and $t-v_1 = t+k-v_2$, along with the restriction that the four indices not all be equal. The actual moment is then $\mu_4$ when all four indices are equal and $\sigma^4$ when there are two pairs. It is now possible to do the sum using geometric series identities and compute the variance of ${\hat C}_X(k)$. It is not particularly enlightening to finish the calculation in detail. There are versions of the central limit theorem, called mixing central limit theorems, which can be used for ARMA(p,q) processes in order to conclude that

\begin{displaymath}\sqrt{T} ( {\hat C}_X(k)-C_X(k))/\sqrt{{\rm Var}({\hat C}_X(k))}
\end{displaymath}

has asymptotically a standard normal distribution and that the same is true when the standard deviation in the denominator is replaced by an estimate. To get from this to distribution theory for the sample autocorrelation is easiest when the true autocorrelation is 0.

The general tactic is the $\delta$ method or Taylor expansion. In this case for each sample size $T$ you have two estimates, say $N_T$ and $D_T$, of two parameters. You want distribution theory for the ratio $R_T = N_T/D_T$. The idea is to write $R_T=f(N_T,D_T)$ where $f(x,y)=x/y$ and then make use of the fact that $N_T$ and $D_T$ are close to the parameters they are estimates of. In our case $N_T$ is the sample autocovariance at lag $k$, which is close to the true autocovariance $C_X(k)$, while the denominator $D_T$ is the sample autocovariance at lag 0, a consistent estimator of $C_X(0)$.

Write

\begin{eqnarray*}f(N_T,D_T)& = & f(C_X(k),C_X(0)) \cr
& & + (N_T-C_X(k))D_1f(C_X(k),C_X(0)) \cr
& & + (D_T-C_X(0))D_2f(C_X(k),C_X(0))
+\mbox{remainder}
\end{eqnarray*}


If we can use a central limit theorem to conclude that

\begin{displaymath}(\sqrt{T}(N_T-C_X(k)), \sqrt{T}(D_T-C_X(0)))
\end{displaymath}

has an approximately bivariate normal distribution and if we can neglect the remainder term then

\begin{displaymath}\sqrt{T}(f(N_T,D_T)-f(C_X(k),C_X(0))) = \sqrt{T}({\hat\rho}(k)-\rho(k))
\end{displaymath}

has approximately a normal distribution. The notation here is that $D_j$ denotes differentiation with respect to the $j$th argument of $f$. For $f(x,y) = x/y$ we have $D_1f = 1/y$ and $D_2f = -x/y^2$. When $C_X(k)=0$ the term involving $D_2f$ vanishes and we simply get the assertion that

\begin{displaymath}\sqrt{T}({\hat\rho}(k)-\rho(k))
\end{displaymath}

has the same asymptotic normal distribution as $\sqrt{T}\,{\hat C}_X(k)/C_X(0)$.
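The derivative calculation can be packaged as a small helper, sketched here in Python; the covariance argument is whatever estimate of the joint asymptotic covariance of the numerator and denominator is available, and the function name is just for illustration.

\begin{verbatim}
import numpy as np

def ratio_delta_se(N, D, Sigma, T):
    """Approximate standard error of the ratio N/D by the delta method.

    Sigma is a 2x2 estimate of the asymptotic covariance matrix of
    (sqrt(T)(N_T - .), sqrt(T)(D_T - .)).  The gradient of f(x,y) = x/y
    is (D_1 f, D_2 f) = (1/y, -x/y^2), evaluated at (N, D).
    """
    grad = np.array([1.0 / D, -N / D**2])
    asym_var = grad @ np.asarray(Sigma) @ grad  # variance of sqrt(T)(N/D - true ratio)
    return np.sqrt(asym_var / T)
\end{verbatim}

For $\hat\rho(k)$ one would plug in $\hat{C}(k)$, $\hat{C}(0)$ and an estimate of the joint asymptotic covariance of the two sample autocovariances.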

Similar ideas can be used for the estimated sample partial ACF.

Portmanteau tests

In order to test the hypothesis that a series is white noise using the distribution theory just given, you have to produce a single statistic to base your test on. Rather than pick a single value of $k$, the suggestion has been made to consider a sum of squares or a weighted sum of squares of the ${\hat\rho}(k)$.

A typical statistic is

\begin{displaymath}T\sum_{k=1}^K {\hat\rho}^2(k)
\end{displaymath}

which, for white noise, has approximately a $\chi_K^2$ distribution. (This fact relies on an extension of the previous computations to conclude that

\begin{displaymath}\sqrt{T}({\hat \rho}(1), \ldots , {\hat \rho}(K))
\end{displaymath}

has approximately a standard multivariate normal distribution. This, in turn, relies on computation of the covariance between ${\hat C}(j)$ and $\hat{C}(k)$.)

When the parameters in an ARMA(p,q) model have been estimated by maximum likelihood the degrees of freedom must be adjusted to $K-p-q$. The resulting test is the Box-Pierce test; a refined version which takes better account of finite-sample properties is the Ljung-Box test. S-Plus plots the P-values from these tests for 1 through 10 degrees of freedom as part of the output of arima.diag.
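Here is a sketch of the portmanteau calculation in Python (scipy is used for the $\chi^2$ tail probability). The $T(T+2)/(T-k)$ weights are the usual Ljung-Box finite-sample refinement, and the fitted_params argument would be $p+q$ when an ARMA(p,q) model has been fitted; the function name is just for illustration.

\begin{verbatim}
import numpy as np
from scipy.stats import chi2

def portmanteau(x, K, fitted_params=0):
    """Box-Pierce statistic T * sum_k rho_hat(k)^2 and its Ljung-Box refinement,
    with p-values from a chi-square on K - fitted_params degrees of freedom."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    T = len(x)
    C0 = np.dot(x, x) / T
    rho = np.array([np.dot(x[:T - k], x[k:]) / T for k in range(1, K + 1)]) / C0
    Q_bp = T * np.sum(rho**2)
    Q_lb = T * (T + 2) * np.sum(rho**2 / (T - np.arange(1, K + 1)))
    df = K - fitted_params
    return {"box_pierce": (Q_bp, chi2.sf(Q_bp, df)),
            "ljung_box": (Q_lb, chi2.sf(Q_lb, df))}
\end{verbatim}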





Richard Lockhart
1999-11-01