STAT 804: Notes on Lecture 6
The Multivariate Normal Distribution
Definition: $Z \sim N(0,1)$ iff $Z$ has density
$$ \phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} . $$
Definition: $Z = (Z_1, \ldots, Z_p)^t \sim MVN_p(0, I)$ iff $Z$ is a vector
(a column vector for later use) with the
$Z_i$ independent and each $Z_i \sim N(0,1)$.
In this case, the joint density of $Z$ is
$$ f_Z(z) = (2\pi)^{-p/2} \exp\left(-z^t z / 2\right), $$
where the superscript t denotes matrix transpose.
Definition: $X \in R^p$ has a multivariate normal distribution if it
has the same distribution as $AZ + \mu$ for some $\mu \in R^p$,
some $p \times q$ matrix of constants $A$ and $Z \sim MVN_q(0, I)$.
Lemma: The matrix A can be taken to be square with
no loss of generality.
Proof: The simplest proof involves multivariate characteristic
functions: $\phi_X(u) = E(e^{iu^t X})$.
You check that for $Z \sim MVN_q(0,I)$ we have $\phi_Z(u) = \exp(-u^t u/2)$.
Then for $X = AZ + \mu$,
$$ \phi_X(u) = e^{iu^t\mu}\,\phi_Z(A^t u) = \exp\left(iu^t\mu - u^t A A^t u/2\right). $$
Since the result depends only on $\mu$ and $AA^t$, the distribution
is the same for any two $A$ giving the same $AA^t$. In particular, given an
$A$ which is $p \times q$, we need only use ideas of Cholesky decomposition
to find a square $B$ with $BB^t = AA^t$.
I omit a proof.
If the matrix A is singular then X will not have a density. If A is
invertible then we can derive the multivariate normal density
by the change of variables formula:
$$ f_X(x) = f_Z\!\left(A^{-1}(x-\mu)\right)\left|\det A^{-1}\right| . $$
We get
$$ f_X(x) = (2\pi)^{-p/2}\,|\det\Sigma|^{-1/2}
   \exp\!\left(-\tfrac{1}{2}(x-\mu)^t \Sigma^{-1}(x-\mu)\right), $$
where $\Sigma = AA^t$.
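To make the definition concrete, here is a short R-style sketch (my own, not from the original notes) that simulates draws of $X = AZ + \mu$, taking A to be a Cholesky square root of a chosen Sigma; the values of mu and Sigma below are made-up examples.
# Simulate n draws from MVN(mu, Sigma) using X = A Z + mu with A A^t = Sigma.
mu    <- c(1, 2)
Sigma <- matrix(c(2, 0.8,
                  0.8, 1), nrow = 2)
A <- t(chol(Sigma))        # chol() returns upper triangular R with R^t R = Sigma
n <- 1000
Z <- matrix(rnorm(2 * n), nrow = 2)   # each column is an independent MVN(0, I) vector
X <- A %*% Z + mu                     # each column is one draw from MVN(mu, Sigma)
var(t(X))                             # sample covariance should be close to Sigma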
Properties of the MVN distribution
1. If $X \sim MVN_p(\mu, \Sigma)$ and $M$ is a $q \times p$ matrix of constants
   then $MX + b \sim MVN_q(M\mu + b, M\Sigma M^t)$. This follows easily from our
   definition. (You work with the real definition, not densities,
   etc.)
2. All margins are multivariate normal: if
   $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$,
   $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ and
   $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$,
   then $X \sim MVN(\mu, \Sigma)$ implies that $X_1 \sim MVN(\mu_1, \Sigma_{11})$.
   This is really a special case of the previous property.
3. All conditionals are normal: the conditional distribution of $X_1$ given $X_2 = x_2$ is
   $$ MVN\!\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\;
      \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right). $$
   To prove this you have to do algebra with the definition of the conditional
   density. It is possible to give a sensible formula even if $\Sigma_{22}$ is
   singular.
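Property 3 is easy to compute with. The following R-style sketch is my own (the function name cond.mvn and the example numbers are not from the notes); it evaluates the conditional mean and conditional covariance for a given partition.
# Conditional distribution of X1 given X2 = x2 when X ~ MVN(mu, Sigma).
# i1, i2 are the index sets of the two blocks.
cond.mvn <- function(mu, Sigma, i1, i2, x2) {
  S11 <- Sigma[i1, i1, drop = FALSE]
  S12 <- Sigma[i1, i2, drop = FALSE]
  S22 <- Sigma[i2, i2, drop = FALSE]
  cond.mean <- mu[i1] + S12 %*% solve(S22, x2 - mu[i2])
  cond.var  <- S11 - S12 %*% solve(S22, t(S12))
  list(mean = drop(cond.mean), var = cond.var)
}
# Example with made-up numbers:
Sigma <- matrix(c(4, 2, 1,
                  2, 3, 1,
                  1, 1, 2), nrow = 3)
cond.mvn(mu = c(0, 0, 0), Sigma = Sigma, i1 = 1, i2 = 2:3, x2 = c(0.5, -1))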
Partial Correlations
If X is an AR(p) process then
$$ X_t = \sum_{j=1}^{p} a_j X_{t-j} + \epsilon_t . $$
If you hold $X_{t-1}, \ldots, X_{t-p}$ fixed then
$X_t$ is $\epsilon_t$
plus a fixed constant. Since $\epsilon_t$
is independent of $X_s$ for all $s < t$,
we see that given $X_{t-1}, \ldots, X_{t-p}$ the variables
$X_t$ and $X_{t-r}$ for $r > p$ are independent.
This would imply
$$ \mathrm{Cov}(X_t, X_{t-q} \mid X_{t-1}, \ldots, X_{t-q+1}) = 0 $$
for all $q > p$. When the process X is Gaussian our results
above on conditional distributions guarantee that we can
calculate this partial autocovariance from the variance
covariance matrix of the vector
$(X_t, X_{t-q}, X_{t-1}, \ldots, X_{t-q+1})^t$.
If we partition this
vector into a first piece with 2 entries and a second piece
with q-1 entries we find that the conditional variance covariance
matrix of $(X_t, X_{t-q})$ given the others is
$$ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} . $$
In this formula $\Sigma_{11}$ is the 2 by 2 unconditional covariance
matrix of $(X_t, X_{t-q})$, namely,
$$ \Sigma_{11} = \begin{pmatrix} C(0) & C(q) \\ C(q) & C(0) \end{pmatrix} . $$
The matrix $\Sigma_{22}$ is the $(q-1) \times (q-1)$
Toeplitz matrix with
C(0) down the diagonal, C(1) down the first sub and super diagonals,
and so on. The matrix $\Sigma_{12}$ is the $2 \times (q-1)$
matrix
$$ \Sigma_{12} = \begin{pmatrix} C(1) & C(2) & \cdots & C(q-1) \\
                                 C(q-1) & C(q-2) & \cdots & C(1) \end{pmatrix} . $$
Once you have then calculated the resulting $2 \times 2$
matrix $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$
you pick out the off diagonal element; this is the partial
autocovariance. Even when the data are not Gaussian we will be
doing this piece of arithmetic with the autocovariance to define
the partial autocovariance.
To use this idea we define the partial autocorrelation function
to be
$$ \mathrm{PACF}(h) = \mathrm{Corr}(X_t, X_{t-h} \mid X_{t-1}, \ldots, X_{t-h+1}), $$
the off diagonal entry of the conditional covariance matrix divided by the
product of the two conditional standard deviations.
If h=1 then there are no Xs between Xt and Xt-h so
there is nothing to condition on. This makes
PACF(1)=ACF(1).
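The conditioning arithmetic above can be coded directly. Here is an R-style sketch of my own; the function name pacf.theoretical is mine, Cvec is assumed to hold C(0), C(1), ..., and toeplitz() is the base R helper (in S-Plus you may need to build that matrix by hand).
# Partial autocorrelation at lag h from autocovariances; Cvec[j + 1] holds C(j).
pacf.theoretical <- function(Cvec, h) {
  if (h == 1) return(Cvec[2] / Cvec[1])          # PACF(1) = ACF(1)
  S11 <- matrix(c(Cvec[1], Cvec[h + 1],
                  Cvec[h + 1], Cvec[1]), 2, 2)   # cov of (X_t, X_{t-h})
  S22 <- toeplitz(Cvec[1:(h - 1)])               # cov of the h-1 in-between values
  S12 <- rbind(Cvec[2:h], Cvec[h:2])             # cov of (X_t, X_{t-h}) with the rest
  V   <- S11 - S12 %*% solve(S22, t(S12))        # conditional covariance matrix
  V[1, 2] / sqrt(V[1, 1] * V[2, 2])              # conditional correlation = PACF(h)
}
# Example: AR(1) autocovariances C(j) = 0.5^j give PACF(2) = 0.
pacf.theoretical(c(1, 0.5, 0.25), 2)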
Qualitative idea: A plot of the sample partial
autocorrelation function (derived by replacing $C_X$ by $\hat{C}_X$ in the definition of PACF) is examined. For an AR(p) process
this sample plot ought to drop suddenly to 0 for h>p.
AR(1) example
Now I calculate PACF(2) for a mean 0 AR(1) process $X_t = \rho X_{t-1} + \epsilon_t$. We have
$$ C_X(h) = \frac{\sigma_\epsilon^2 \rho^{|h|}}{1-\rho^2} . $$
Let $W = (Y^t, Z)^t$ with
$Y = (X_t, X_{t-2})^t$ and
$Z = X_{t-1}$. Then $(Y, Z)^t$ has
a $MVN_3(0, \Sigma)$ distribution with
$$ \Sigma = \frac{\sigma_\epsilon^2}{1-\rho^2}
   \begin{pmatrix} 1 & \rho^2 & \rho \\ \rho^2 & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix} . $$
Following the partitioning into Y and Z we find
$$ \Sigma_{YY} = \frac{\sigma_\epsilon^2}{1-\rho^2}
   \begin{pmatrix} 1 & \rho^2 \\ \rho^2 & 1 \end{pmatrix}, \qquad
   \Sigma_{YZ} = \frac{\sigma_\epsilon^2}{1-\rho^2}
   \begin{pmatrix} \rho \\ \rho \end{pmatrix} $$
and
$$ \Sigma_{ZZ} = \frac{\sigma_\epsilon^2}{1-\rho^2} . $$
We find
$$ \Sigma_{YY} - \Sigma_{YZ}\Sigma_{ZZ}^{-1}\Sigma_{ZY}
   = \frac{\sigma_\epsilon^2}{1-\rho^2}
     \begin{pmatrix} 1-\rho^2 & 0 \\ 0 & 1-\rho^2 \end{pmatrix}
   = \sigma_\epsilon^2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} . $$
Now to compute the correlation we divide the conditional covariance by the
product of the two conditional SDs, that is:
$$ \mathrm{PACF}(2) = \frac{0}{\sigma_\epsilon \cdot \sigma_\epsilon} = 0 . $$
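A quick numerical check of this calculation (my own, with the made-up values rho = 0.6 and sigma^2 = 1):
# Conditional covariance of (X_t, X_{t-2}) given X_{t-1} for an AR(1).
rho  <- 0.6
Cfun <- function(h) rho^abs(h) / (1 - rho^2)   # AR(1) autocovariance with sigma^2 = 1
SigYY <- matrix(c(Cfun(0), Cfun(2), Cfun(2), Cfun(0)), 2, 2)  # cov of (X_t, X_{t-2})
SigYZ <- matrix(c(Cfun(1), Cfun(1)), 2, 1)                    # cov with X_{t-1}
SigZZ <- Cfun(0)
SigYY - SigYZ %*% t(SigYZ) / SigZZ   # the 2 by 2 identity: off diagonal 0, so PACF(2) = 0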
Consider the problem of predicting $X_t$ knowing
$X_{t-1}, \ldots, X_{t-k}$.
In this course we choose a predictor $\hat{X}_t$
to minimize the mean squared prediction error:
$$ E\left[(X_t - \hat{X}_t)^2\right] . $$
The solution is to take
$$ \hat{X}_t = E(X_t \mid X_{t-1}, \ldots, X_{t-k}) . $$
For multivariate normal data this predictor is the usual regression
solution:
$$ \hat{X}_t = \Sigma_{12}\Sigma_{22}^{-1} (X_{t-1}, \ldots, X_{t-k})^t , $$
where the matrix A is $\Sigma_{12}\Sigma_{22}^{-1}$ in the notation
of the earlier notes. This can be written in the form
$$ \hat{X}_t = \sum_{j=1}^{k} a_j X_{t-j} $$
for suitable constants $a_1, \ldots, a_k$.
Let
$$ \hat{X}_t = \sum_{j=1}^{k} a_j X_{t-j} . $$
The $a_j$
minimize
$$ E\left[\left(X_t - \sum_{j=1}^{k} a_j X_{t-j}\right)^2\right] . $$
Take the derivative with respect to $a_i$ and set the resulting
k quantities equal to 0 giving
$$ \sum_{j=1}^{k} a_j C(i-j) = C(i), \qquad i = 1, \ldots, k, $$
or, dividing by C(0),
$$ \sum_{j=1}^{k} a_j \rho(i-j) = \rho(i), \qquad i = 1, \ldots, k. $$
(These are the Yule Walker equations again!)
It is a fact that in the solution $(a_1, \ldots, a_k)$ of these equations
the last coefficient, $a_k$,
is PACF(k).
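In code, "solve the order-k Yule-Walker equations and keep the last coefficient" looks like the following R-style sketch; the function name pacf.yw is mine and rho is assumed to hold rho(1), ..., rho(k).
# PACF(k) as the last coefficient of the order-k Yule-Walker solution.
pacf.yw <- function(rho) {
  k <- length(rho)
  if (k == 1) return(rho[1])
  R <- toeplitz(c(1, rho[1:(k - 1)]))  # matrix with entries rho(|i - j|)
  a <- solve(R, rho)                   # order-k prediction coefficients
  a[k]                                 # the last one is PACF(k)
}
# For an AR(1) with rho(h) = 0.6^h this gives 0.6 at k = 1 and 0 at k = 2:
pacf.yw(0.6); pacf.yw(c(0.6, 0.36))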
Remark: This makes it look as if you would compute the
first 40 values of the PACF by solving 40 different problems and picking out the
last coefficient in each one, but in fact there is an explicit recursive
algorithm to compute these things.
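One such recursion is the Durbin-Levinson algorithm. The sketch below is my own R-style version, not code from the notes; rho is assumed to hold rho(1), ..., rho(K), and the kth entry of the result is PACF(k).
# Durbin-Levinson recursion: all of PACF(1), ..., PACF(K) in one pass.
pacf.dl <- function(rho) {
  K <- length(rho)
  res <- numeric(K)
  phi <- rho[1]              # order-1 prediction coefficient
  res[1] <- rho[1]
  if (K == 1) return(res)
  for (k in 2:K) {
    num <- rho[k] - sum(phi * rho[(k - 1):1])
    den <- 1 - sum(phi * rho[1:(k - 1)])
    phi.kk <- num / den                         # this is PACF(k)
    phi <- c(phi - phi.kk * rev(phi), phi.kk)   # update coefficients to order k
    res[k] <- phi.kk
  }
  res
}
# Same AR(1) check: pacf.dl(0.6^(1:5)) is 0.6 followed by zeros.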
Estimation of PACF
To estimate the PACF you can either estimate the ACF and do
the arithmetic above with estimates instead of theoretical values
or minimize a sample version of the mean squared prediction error.
The mean squared prediction error is
$$ E\left[(X_t - \hat{X}_t)^2\right] . $$
We take $\hat{X}_t = \sum_{j=1}^{k} a_j X_{t-j}$
in this formula and then replace the expected value
with an average over the data so that we minimize
$$ \sum_{t=k+1}^{T} \left(X_t - \sum_{j=1}^{k} a_j X_{t-j}\right)^2 . $$
This is just a least squares problem regressing the vector
$(X_{k+1}, \ldots, X_T)^t$ on the design matrix whose row for time $t$ is
$(X_{t-1}, \ldots, X_{t-k})$.
The estimate of PACF(k) is the kth entry in the least squares solution $\hat{a}$.
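The least squares calculation can be written out as follows. This is an R-style sketch of my own: the function name pacf.ls and the use of embed() and lm() are not from the notes, and S-Plus users may need to build the lagged design matrix by hand. The series is treated as mean 0, so no intercept is fitted.
# Sample PACF(k): regress X_t on X_{t-1}, ..., X_{t-k} and keep the last coefficient.
pacf.ls <- function(x, k) {
  lags <- embed(x, k + 1)            # row t is (X_t, X_{t-1}, ..., X_{t-k})
  y <- lags[, 1]
  Z <- lags[, -1, drop = FALSE]      # design matrix of lagged values
  a <- coef(lm(y ~ Z - 1))           # no intercept: the series is treated as mean 0
  unname(a[k])
}
# e.g. with the simulated AR(2) series generated in the code below,
# pacf.ls(ar2, 2) should be sizeable while pacf.ls(ar2, 3) should be near 0.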
Some example plots
What follows are plots of 5 series, their fitted ACFs and their fitted
PACFs. The series are:
1. A generated MA(2) series: $X_t = \epsilon_t + 0.8\epsilon_{t-1} - 0.9\epsilon_{t-2}$.
2. A generated AR(2) series: $X_t = X_{t-1} - 0.99 X_{t-2} + \epsilon_t$.
3. A generated AR(3) series: $X_t = 0.8 X_{t-1} - X_{t-2}/3 + (0.8/1.712) X_{t-3} + \epsilon_t$.
4. A series of monthly sunspot counts over a 200 year period.
5. A series of annual rain fall measurements in New York City.
Here is the SPlus code I used to make the following plots. You should note the
use of the function acf which calculates and plots ACF and PACF.
#
# Comments begin with #
#
# Begin by generating the series. The AR series generated are NOT
# stationary. But I generate 10000 values from the recurrence relation
# and use the last 500. Asymptotic stationarity should guarantee that
# these last 500 are pretty stationary.
#
n<- 10000
ep <- rnorm(n)
ma2 <- ep[3:502]+0.8*ep[2:501]-0.9*ep[1:500]
ar2 <- rep(0,n)
ar2[1:2] <- ep[1:2]
for(i in 3:n) {ar2[i] <- ep[i] + ar2[i-1] - 0.99*ar2[i-2]}
ar2 <- ar2[(n-499):n]
ar2 <- ts(ar2)
ar3 <- rep(0,n)
ar3[1:3] <- ep[1:3]
for(i in 4:n) {ar3[i] <- ep[i] + 0.8*ar3[i-1] - ar3[i-2]/3 + (0.8/1.712)*ar3[i-3]}
ar3 <- ar3[(n-499):n]
ar3 <- ts(ar3)
#
# The next line turns on a graphics device -- in this case the
# graph will be made in postscript in a file called ma2.ps. It
# will come out in portrait, not landscape, format.
#
postscript(file="ma2.ps",horizontal=F)
#
# The next line says to put 3 pictures
# in a single column on the plot
#
par(mfcol=c(3,1))
tsplot(ma2,main="MA(2) series")
acf(ma2)
acf(ma2,type="partial")
#
# When you finish a picture you turn
# off the graphics device
# with the next line.
#
dev.off()
#
#
#
postscript(file="ar2.ps",horizontal=F)
par(mfcol=c(3,1))
tsplot(ar2,main="AR(2) series")
acf(ar2)
acf(ar2,type="partial")
dev.off()
#
#
postscript(file="ar3.ps",horizontal=F)
par(mfcol=c(3,1))
tsplot(ar3,main="AR(3) series")
acf(ar3)
acf(ar3,type="partial")
dev.off()
#
#
postscript(file="sunspots.ps",horizontal=F)
par(mfcol=c(3,1))
tsplot(sunspots,main="Sunspots series")
acf(sunspots,lag.max=480)
acf(sunspots,lag.max=480,type="partial")
dev.off()
#
#
postscript(file="rain.nyc1.ps",horizontal=F)
par(mfcol=c(3,1))
tsplot(rain.nyc1,main="New York Rain Series")
acf(rain.nyc1)
acf(rain.nyc1,type="partial")
dev.off()
[Figure: Sunspot data series, ACF and PACF]
[Figure: New York City rainfall series, ACF and PACF]
You should notice:
- The MA series has an ACF which is near 0 for lags over 2 as
predicted theoretically.
- The AR(p) series have complicated ACFs but the PACFs vanish
at lags more than p.
- The sunspot data has significant autocorrelation over many years
but the PACF is pretty small after say 2 years or so. Notice that SPlus
has labelled the lags in years so that the number 2 on the x axis
corresponds to about 24 months. An AR(p) model with p around 24
would then be a possibility.
- The New York rainfall series looks quite a bit like white noise.
Richard Lockhart
1999-10-01