
Stat 804

Lecture 13 Notes

Non-Gaussian series.

The fitting methods we have studied are based on the Gaussian likelihood. However, the estimates work reasonably well even if the errors are not normal.

Example: AR(1) fit. We fit $X_t - \mu = \rho(X_{t-1} - \mu) + \epsilon_t$ using $\hat\mu =\bar{X}$, which is consistent even for non-Gaussian errors. (In fact

\begin{displaymath}(1-\rho)\sum_0^{T-1}X_t +\rho X_{T-1} -X_0 =(T-1)(1-\rho)\mu + \sum_0^{T-1} \epsilon_t -
\epsilon_0 ;
\end{displaymath}

divide by T and apply the law of large numbers to $\bar{\epsilon}$ to see that $\bar{X}$ is consistent.)
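
Explicitly, dividing by T gives

\begin{displaymath}(1-\rho)\bar{X} + \frac{\rho X_{T-1} - X_0}{T}
= \frac{T-1}{T}(1-\rho)\mu + \bar{\epsilon} - \frac{\epsilon_0}{T} \, ,
\end{displaymath}

so that $\bar{X}\to\mu$ as the boundary terms and $\bar{\epsilon}$ all tend to 0.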

Here is an outline of the logic of what follows. We will assume that the errors are iid with mean 0, variance $\sigma^2$, and finite fourth moment $\mu_4 = \text{E}(\epsilon_t^4)$. We will not assume that the errors have a normal distribution.

1.
The estimates of $\rho$ and $\sigma$ are consistent.

2.
The score function satisfies

\begin{displaymath}T^{-1/2} U(\theta_0) \Rightarrow MVN(0, B)
\end{displaymath}

where

\begin{displaymath}B = \left[\begin{array}{cc}
\frac{1}{1-\rho^2} & 0 \\ 0 & \frac{ \mu_4-\sigma^4}{\sigma^6}
\end{array}\right]
\end{displaymath}

3.
The matrix of second derivatives satisfies

\begin{displaymath}\lim_{T\to \infty} -\frac{1}{T}\frac{\partial U}{\partial\theta}
= \lim_{T\to \infty} -\frac{1}{T}\text{E}\left(\frac{\partial U}{\partial\theta}\right)= A
\end{displaymath}

where

\begin{displaymath}A = \left[\begin{array}{cc}
\frac{1}{1-\rho^2} & 0 \\ 0 & \frac{2}{\sigma^2}
\end{array}\right]
\end{displaymath}

4.
If $\cal I$ is the (conditional) Fisher information then

\begin{displaymath}\lim_{T\to \infty} \frac{1}{T} {\cal I} = A
\end{displaymath}

5.
We can expand $U(\hat\theta)$ about $\theta_0$ and get

\begin{displaymath}T^{1/2}(\hat\theta - \theta_0) = \left[\frac{1}{T} I(\theta_0) \right]^{-1}\left[
T^{-1/2} U(\theta_0)\right] + \text{negligible remainder}
\end{displaymath}

6.
So

\begin{displaymath}T^{1/2}(\hat\theta - \theta) \approx MVN(0,A^{-1} B A^{-1}) = MVN(0,\Sigma)
\end{displaymath}

where

\begin{displaymath}\Sigma = A^{-1} B A^{-1} = \left[\begin{array}{cc}
1-\rho^2 & 0 \\ 0 & \frac{\mu_4-\sigma^4}{4\sigma^2}
\end{array}\right]
\end{displaymath}

7.
So $T^{1/2}(\hat\rho-\rho) \Rightarrow N(0,1-\rho^2)$ even for non-normal errors.

8.
On the other hand, the estimate of $\sigma$ has a limiting distribution which is different for non-normal errors, because it depends on $\mu_4$, which is $3\sigma^4$ for normal errors but something else in general for non-normal errors.
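
Steps 7 and 8 can be checked numerically. Here is a minimal simulation sketch, assuming numpy is available; the centred exponential error law, the sample size, the number of replications, and the variable names are illustrative choices, not part of the development above. It compares the Monte Carlo variances of $T^{1/2}(\hat\rho-\rho)$ and $T^{1/2}(\hat\sigma-\sigma)$ with the limits $1-\rho^2$ and $(\mu_4-\sigma^4)/(4\sigma^2)$.

\begin{verbatim}
# Monte Carlo check of steps 7 and 8 for an AR(1) with non-normal errors.
import numpy as np

rng = np.random.default_rng(0)
rho, sigma, T, nrep = 0.6, 1.0, 2000, 500

rho_hat = np.empty(nrep)
sigma_hat = np.empty(nrep)
for r in range(nrep):
    # centred exponential errors: mean 0, variance sigma^2, mu_4 = 9 sigma^4
    eps = rng.exponential(sigma, size=T) - sigma
    x = np.empty(T)
    x[0] = eps[0] / np.sqrt(1 - rho**2)        # rough stationary start
    for t in range(1, T):
        x[t] = rho * x[t - 1] + eps[t]
    xc = x - x.mean()
    rho_hat[r] = np.sum(xc[1:] * xc[:-1]) / np.sum(xc**2)
    sigma_hat[r] = np.sqrt(np.mean((xc[1:] - rho_hat[r] * xc[:-1])**2))

print("Var of sqrt(T)(rho_hat - rho):", T * rho_hat.var(),
      "  theory:", 1 - rho**2)
mu4 = 9 * sigma**4                              # fourth moment of the centred exponential
print("Var of sqrt(T)(sigma_hat - sigma):", T * sigma_hat.var(),
      "  theory:", (mu4 - sigma**4) / (4 * sigma**2))
\end{verbatim}

For the centred exponential $\mu_4 = 9\sigma^4$, so the limiting variance of $\hat\sigma$ is $2\sigma^2$ rather than the normal theory value $\sigma^2/2$, while the limit for $\hat\rho$ is unchanged.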

Here are the details.

Consistency: One of our many nearly equivalent estimates of $\rho$ is

\begin{displaymath}\hat\rho = \frac{\sum (X_t-\bar{X}) (X_{t-1}-\bar{X})}{\sum(X_t
-\bar{X})^2}
\end{displaymath}

Divide both the top and the bottom by T. You essentially need to prove

\begin{displaymath}T^{-1} \sum(X_t-\mu)(X_{t-1}-\mu) \to C(1)
\end{displaymath}

and

\begin{displaymath}T^{-1} \sum(X_t-\mu)^2 \to C(0)
\end{displaymath}

Each of these is correct and hinges on the fact that these linear processes are ergodic -- long time averages converge to expected values. For these particular averages it is possible to compute means and variances and prove that the mean squared error converges to 0.
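
Combining the two limits and using $C(1) = \rho C(0)$ for the AR(1) gives consistency of $\hat\rho$:

\begin{displaymath}\hat\rho = \frac{T^{-1}\sum (X_t-\bar{X}) (X_{t-1}-\bar{X})}{T^{-1}\sum(X_t-\bar{X})^2}
\to \frac{C(1)}{C(0)} = \rho \, .
\end{displaymath}

Consistency of $\hat\sigma$ follows similarly, since $T^{-1}\sum(X_t - \hat\rho X_{t-1})^2$ is close to $T^{-1}\sum\epsilon_t^2$, which converges to $\sigma^2$.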

Score function: asymptotic normality

The score function is

\begin{displaymath}U(\rho,\sigma) = \left[\begin{array}{c}
\frac{\sum X_{t-1}(X_t - \rho X_{t-1})}{\sigma^2} \\
\frac{\sum (X_t - \rho X_{t-1})^2}{\sigma^3}- \frac{T-1}{\sigma}
\end{array}\right]
\end{displaymath}

If $\rho$ and $\sigma$ are the true values of the parameters then

\begin{displaymath}U(\rho,\sigma) = \left[\begin{array}{c}
\frac{\sum X_{t-1}\epsilon_t}{\sigma^2} \\
\frac{\sum \epsilon_t^2}{\sigma^3}- \frac{T-1}{\sigma}
\end{array}\right]
\end{displaymath}

I claim that $T^{-1/2} U(\rho,\sigma) \Rightarrow MVN(0,B)$. This is proved by the martingale central limit theorem. Technically, you fix an $a \in R^2$ and study $T^{-1/2}a^t U(\rho,\sigma)$, proving that the limit is $N(0,a^t B a)$. I do here only the special cases $a=(1,0)^t$ and $a=(0,1)^t$. The second of these is simply

\begin{displaymath}T^{-1/2} \sum(\epsilon_t^2 - \sigma^2) /\sigma^3
\end{displaymath}

which converges by the usual CLT to $N(0,(\mu_4-\sigma^4)/\sigma^6)$. For $a=(1,0)^t$ the claim is that

\begin{displaymath}T^{-1/2} \sum X_{t-1}\epsilon_t \Rightarrow N(0,C(0)\sigma^2)
\end{displaymath}

because $C(0) = \sigma^2/(1-\rho^2)$.
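
Dividing by the factor $\sigma^2$ that appears in the score turns this limit into the corresponding entry of B:

\begin{displaymath}\text{Var}\left(T^{-1/2}\frac{\sum X_{t-1}\epsilon_t}{\sigma^2}\right)
\to \frac{C(0)\sigma^2}{\sigma^4} = \frac{1}{1-\rho^2} = B_{11} \, .
\end{displaymath}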

To prove this assertion we define, for each T, a martingale $M_{T,k}$, $k=1,\ldots,T$, where

\begin{displaymath}M_{T,k} = \sum_{i=1}^k D_{T,i}
\end{displaymath}

with

\begin{displaymath}D_{T,i} = T^{-1/2} X_{i-1}\epsilon_i
\end{displaymath}

The martingale property is that

\begin{displaymath}\text{E}(M_{T,k+1}\vert \epsilon_{k},\epsilon_{k-1}, \ldots) = M_{T,k}
\end{displaymath}

The martingale central limit theorem (Hall, P. and Heyde, C. C. (1980). Martingale limit theory and its application. New York: Academic Press.) states that

\begin{displaymath}M_{T,T} \Rightarrow N(0,b)
\end{displaymath}

provided that, in probability,

\begin{displaymath}\sum_k D_{T,k}^2 \to b
\end{displaymath}

and provided that an analogue of Lindeberg's condition holds. Here I check only the former condition:

\begin{displaymath}\sum_k D_{T,k}^2 = \frac{1}{T} \sum X_{t-1}^2 \epsilon_t^2 \to
\text{E}(X_0^2 \epsilon_1^2) = C(0)\sigma^2
\end{displaymath}

(by the ergodic theorem, or by computing means and variances).

Second derivative matrix and Fisher information: the matrix of negative second derivatives is

\begin{displaymath}-\frac{\partial U}{\partial\theta} = \left[\begin{array}{cc}
\frac{\sum X_{t-1}^2}{\sigma^2} & \frac{2\sum X_{t-1}(X_t-\rho X_{t-1})}{\sigma^3} \\
\frac{2\sum X_{t-1}(X_t-\rho X_{t-1})}{\sigma^3} & \frac{3\sum (X_t-\rho
X_{t-1})^2}{\sigma^4} -\frac{T-1}{\sigma^2}\end{array}\right]
\end{displaymath}

If you evaluate at the true parameter value and divide by T, then both the matrix and its expected value converge to

\begin{displaymath}A= \left[\begin{array}{cc}
\frac{C(0)}{\sigma^2} & 0 \\ 0 & \frac{2}{\sigma^2}
\end{array}\right]
\end{displaymath}

(Again this uses the ergodic theorem or a variance calculation.)
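
Explicitly, evaluating at the true parameter value (so that $X_t - \rho X_{t-1} = \epsilon_t$),

\begin{displaymath}\frac{1}{T}\frac{\sum X_{t-1}^2}{\sigma^2} \to \frac{C(0)}{\sigma^2}, \qquad
\frac{1}{T}\frac{2\sum X_{t-1}\epsilon_t}{\sigma^3} \to 0, \qquad
\frac{1}{T}\left(\frac{3\sum \epsilon_t^2}{\sigma^4} - \frac{T-1}{\sigma^2}\right)
\to \frac{3}{\sigma^2} - \frac{1}{\sigma^2} = \frac{2}{\sigma^2} \, .
\end{displaymath}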

Taylor expansion: In the next step we need to prove that a random vector has a MVN limit. The usual tactic is the so-called Cramér-Wold device -- you prove that each linear combination of the entries in the vector has a univariate normal limit (this is what fixing $a \in R^2$ above accomplishes). Now $U(\hat\rho,\hat\sigma) = 0$, and Taylor's theorem gives

\begin{displaymath}0=U(\hat\rho,\hat\sigma) = U(\rho,\sigma) +
\left[\frac{\partial U(\theta)}{\partial\theta}\right] (\hat\theta -
\theta)
+R
\end{displaymath}

(Here we are using $\theta^t = (\rho,\sigma)$ and R is a remainder term -- a random variable with the property that

\begin{displaymath}P(\vert\vert R\vert\vert/\vert\vert U(\theta)\vert\vert > \eta)
\to 0
\end{displaymath}

for each $\eta > 0$.) Multiply through by

\begin{displaymath}\left[\frac{\partial U(\theta)}{\partial\theta}\right] ^{-1}
\end{displaymath}

rearrange, and multiply by $T^{1/2}$ to get

\begin{displaymath}T^{1/2}(\hat\theta - \theta) = \left[-T^{-1}\frac{\partial
U(\theta)}{\partial\theta}\right] ^{-1} \left(T^{-1/2}U(\rho,\sigma) +T^{-1/2}R\right)
\end{displaymath}

It is possible with care to prove that

\begin{displaymath}\left[-T^{-1}\frac{\partial
U(\theta)}{\partial\theta}\right] ^{-1}(T^{-1/2}R)
\to 0
\end{displaymath}

Asymptotic normality: This is a consequence of Slutsky's theorem applied to the Taylor expansion together with the results above for the score U and the second derivative matrix. According to Slutsky's theorem the asymptotic distribution of $T^{1/2}(\hat\theta - \theta)$ is the same as that of

\begin{displaymath}A^{-1} (T^{-1/2}U(\rho,\sigma))
\end{displaymath}

which converges in distribution to $MVN(0,A^{-1} B (A^{-1})^t)$. Now since $C(0) = \sigma^2/(1-\rho^2)$,

\begin{displaymath}A^{-1} B (A^{-1})^t = \left[\begin{array}{cc}
1-\rho^2 & 0 \\ 0 & \frac{\mu_4-\sigma^4}{4\sigma^2}
\end{array}\right]
\end{displaymath}
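
Explicitly, since A and B are diagonal, the two diagonal entries are

\begin{displaymath}(1-\rho^2)\cdot\frac{1}{1-\rho^2}\cdot(1-\rho^2) = 1-\rho^2
\qquad\text{and}\qquad
\frac{\sigma^2}{2}\cdot\frac{\mu_4-\sigma^4}{\sigma^6}\cdot\frac{\sigma^2}{2}
= \frac{\mu_4-\sigma^4}{4\sigma^2} \, .
\end{displaymath}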

Behaviour of $\hat\rho$: pick off the first component and find

\begin{displaymath}T^{1/2}(\hat\rho - \rho) \Rightarrow N(0,1-\rho^2)
\end{displaymath}

Notice that this answer is the same for normal and non-normal errors.

Behaviour of $\hat\sigma$: on the other hand

\begin{displaymath}T^{1/2}(\hat\sigma - \sigma) \Rightarrow N(0,(\mu_4-\sigma^4)/(4\sigma^2))
\end{displaymath}

which has $\mu_4$ in it and will match the normal theory limit if and only if $\mu_4 = 3\sigma^4$.
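
For normal errors $\mu_4 = 3\sigma^4$ and the limit reduces to the usual normal theory value:

\begin{displaymath}\frac{\mu_4-\sigma^4}{4\sigma^2} = \frac{3\sigma^4 - \sigma^4}{4\sigma^2} = \frac{\sigma^2}{2} \, .
\end{displaymath}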

More general models: For an ARMA(p,q) model the parameter vector is

\begin{displaymath}\theta = (a_1,\ldots,a_p,b_1,\ldots,b_q,\sigma)^t \, .
\end{displaymath}

In general the matrices B and A are of the form

\begin{displaymath}B=\left[\begin{array}{cc} B_1 & 0 \\ 0 & \frac{\mu_4-\sigma^4}{\sigma^6}
\end{array}\right]
\end{displaymath}

and

\begin{displaymath}A = \left[\begin{array}{cc} A_1 & 0 \\ 0 & \frac{2}{\sigma^2}
\end{array}\right]
\end{displaymath}

where $A_1 = B_1$ and $A_1$ is a function of the parameters $a_1,\ldots,a_p,b_1,\ldots,b_q$ only, the same function for normal and non-normal data.
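
Since $A_1 = B_1$, the limiting covariance matrix has the block form

\begin{displaymath}\Sigma = A^{-1} B (A^{-1})^t = \left[\begin{array}{cc}
A_1^{-1} & 0 \\ 0 & \frac{\mu_4-\sigma^4}{4\sigma^2}
\end{array}\right]
\end{displaymath}

so the coefficient estimates $\hat a_1,\ldots,\hat a_p,\hat b_1,\ldots,\hat b_q$ have the same limiting distribution as in the normal case, while the limit for $\hat\sigma$ again involves $\mu_4$.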

Model assessment.

Having fitted an ARIMA model you get (essentially automatically) fitted residuals $\hat\epsilon$. Most of the fitting methods lead to fewer residuals than there were observations in the original series. Since the parameter estimates are consistent (if the fitted model is correct, of course), the fitted residuals should be essentially the true $\epsilon_t$, which form white noise. We will assess this by plotting the estimated ACF of $\hat\epsilon$ and then checking whether the estimates are all close enough to 0 to pass for white noise.

To judge "close enough" we need asymptotic distribution theory for autocovariance estimates.
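
As a preview of how the check is typically done in practice, here is a minimal sketch, assuming numpy and matplotlib are available (the function name and its arguments are illustrative, not part of these notes). It plots the residual ACF with the crude $\pm 2/\sqrt{T}$ bands appropriate for iid white noise, for which the lag-k sample autocorrelations are approximately $N(0,1/T)$; the refinement of these bands for fitted residuals is what the asymptotic theory referred to above supplies.

\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt

def residual_acf_plot(resid, max_lag=20):
    # Sample autocorrelations of the fitted residuals at lags 1..max_lag.
    resid = np.asarray(resid, dtype=float)
    T = len(resid)
    r = resid - resid.mean()
    c0 = np.sum(r * r) / T
    acf = np.array([np.sum(r[k:] * r[:T - k])
                    for k in range(1, max_lag + 1)]) / (T * c0)
    band = 2.0 / np.sqrt(T)          # approximate 95% band for white noise
    plt.stem(np.arange(1, max_lag + 1), acf)
    plt.axhline(band, linestyle="--")
    plt.axhline(-band, linestyle="--")
    plt.xlabel("lag")
    plt.ylabel("residual ACF")
    plt.show()
    return acf
\end{verbatim}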


Richard Lockhart
1999-10-27