STAT 804: Notes on Lecture 9
Fitting higher order autoregressions
For the model
\[
X_t - \mu = \sum_{j=1}^p a_j (X_{t-j} - \mu) + \epsilon_t
\]
we will use conditional likelihood again. Let $a$ denote the
vector $(a_1, \ldots, a_p)^T$.
Now we condition
on the first $p$ values of $X$ and use
\[
f(X_{p+1}, \ldots, X_T \mid X_1, \ldots, X_p; \mu, a, \sigma)
  = \prod_{t=p+1}^T f(X_t \mid X_{t-1}, \ldots, X_{t-p}; \mu, a, \sigma) .
\]
If we estimate $\mu$ using $\bar X$
we find that we are trying
to maximize
\[
-\sum_{t=p+1}^T \frac{\left( X_t - \bar X - \sum_{j=1}^p a_j (X_{t-j} - \bar X) \right)^2}{2\sigma^2}
  - (T-p) \log\sigma .
\]
To estimate $a$
we then merely minimize the sum of squares
\[
\sum_{t=p+1}^T \left( X_t - \bar X - \sum_{j=1}^p a_j (X_{t-j} - \bar X) \right)^2 .
\]
This is a straightforward regression problem. We regress the response
vector $(X_{p+1} - \bar X, \ldots, X_T - \bar X)^T$
on the design matrix whose row for time $t$ is
$(X_{t-1} - \bar X, \ldots, X_{t-p} - \bar X)$.
An alternative to estimating $\mu$ by $\bar X$
is to define
\[
\beta_0 = \mu \left( 1 - \sum_{j=1}^p a_j \right)
\]
and then recognize that the conditional likelihood
is maximized by regressing the vector $(X_{p+1}, \ldots, X_T)^T$
on the design matrix whose row for time $t$ is
$(1, X_{t-1}, \ldots, X_{t-p})$.
From $\hat\beta_0$
and $\hat a$
we would get an estimate for $\mu$
by
\[
\hat\mu = \frac{\hat\beta_0}{1 - \sum_{j=1}^p \hat a_j} .
\]
Notice that if we put
\[
\hat C(k) = \frac{1}{T} \sum_t (X_t - \bar X)(X_{t+k} - \bar X)
\]
then the entries of $Z^T Z / T$, where $Z$ is the centred design matrix,
are approximately $\hat C(j-k)$,
and the entries of $Z^T Y / T$
are approximately $\hat C(k)$,
so that the normal equations (from least squares)
\[
Z^T Z \hat a = Z^T Y
\]
are nearly the Yule-Walker equations again.
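In code, the whole fit is a few lines. The following is a minimal Python sketch, not from the original notes; the function name `fit_ar_conditional` and the inputs `x` (an array holding $X_1, \ldots, X_T$) and `p` are illustrative:

```python
import numpy as np

def fit_ar_conditional(x, p):
    """Conditional least-squares fit of an AR(p) model.

    Regresses X_t - Xbar on (X_{t-1} - Xbar, ..., X_{t-p} - Xbar)
    for t = p+1, ..., T, as described above.
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()                     # subtract the sample mean
    T = len(xc)
    # Response vector: (X_{p+1} - Xbar, ..., X_T - Xbar)
    y = xc[p:]
    # Design matrix: the row for time t holds the p lagged, centred values
    Z = np.column_stack([xc[p - j : T - j] for j in range(1, p + 1)])
    a_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ a_hat
    sigma2_hat = np.mean(resid ** 2)      # conditional MLE of sigma^2
    return a_hat, sigma2_hat
```

Dividing the resulting normal equations through by $T$ exhibits the near-equivalence to Yule-Walker described above.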
Full maximum likelihood
To compute a full mle of $(\mu, a, \sigma)$
you generally begin by finding preliminary estimates,
say by one of the conditional likelihood
methods above, and then iterate via Newton-Raphson or
some other scheme for numerical maximization of the
log-likelihood.
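As a sketch of what this looks like in practice, the following Python fragment maximizes the full log-likelihood of a mean-zero AR(1) numerically; a general-purpose optimizer (`scipy.optimize.minimize`) stands in for Newton-Raphson, and the conditional least-squares estimate supplies the starting point. The names here are illustrative, not from the notes:

```python
import numpy as np
from scipy.optimize import minimize

def full_mle_ar1(x):
    """Full MLE for a mean-zero AR(1), maximized numerically."""
    x = np.asarray(x, dtype=float)

    def negloglik(theta):
        a, log_sigma = theta
        sigma2 = np.exp(2 * log_sigma)
        if abs(a) >= 1:                   # enforce stationarity
            return np.inf
        # Exact likelihood: stationary density of X_1 is N(0, sigma^2/(1-a^2))
        ll = -0.5 * np.log(sigma2 / (1 - a**2)) \
             - x[0]**2 * (1 - a**2) / (2 * sigma2)
        # Conditional terms for t = 2, ..., T
        resid = x[1:] - a * x[:-1]
        ll += -0.5 * len(resid) * np.log(sigma2) \
              - np.sum(resid**2) / (2 * sigma2)
        return -ll

    # Preliminary (conditional least squares) estimate as starting point
    a0 = np.clip(np.sum(x[1:] * x[:-1]) / np.sum(x[:-1]**2), -0.95, 0.95)
    s0 = np.std(x[1:] - a0 * x[:-1])
    fit = minimize(negloglik, x0=[a0, np.log(s0)])
    return fit.x[0], np.exp(fit.x[1])     # (a_hat, sigma_hat)
```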
Fitting MA(q) models
Here we consider the model with known mean (generally this
will mean we estimate $\mu$ by $\bar X$ and subtract the mean
from all the observations):
\[
X_t = \epsilon_t + \sum_{j=1}^q b_j \epsilon_{t-j} .
\]
In general $X$ has a $MVN_T(0, \Sigma(b, \sigma))$
distribution and, letting $b$
denote the vector of $b_j$'s, we find
\[
\ell(b, \sigma) = -\frac{1}{2} \log\det\Sigma(b, \sigma)
  - \frac{1}{2} X^T \Sigma(b, \sigma)^{-1} X - \frac{T}{2} \log(2\pi) .
\]
Here $X$ denotes the column vector of all the data.
As an example consider $q=1$, so that $\Sigma$ is tridiagonal:
\[
\Sigma_{st} = \begin{cases}
\sigma^2 (1 + b^2) & s = t \\
\sigma^2 b & |s - t| = 1 \\
0 & \text{otherwise.}
\end{cases}
\]
It is not so easy to work with the determinant and inverse of matrices
like this. Instead we try to mimic the conditional inference approach
above but with a twist; we now condition on something we haven't observed
-- $\epsilon_0$.
Notice that
\[
\epsilon_t = X_t - b \epsilon_{t-1} ,
\]
so that given $\epsilon_0$ and the data we can compute
$\epsilon_1, \ldots, \epsilon_T$ recursively.
Now imagine that the data were actually
$(\epsilon_0, X_1, \ldots, X_T)$.
Then the same idea we used for an AR(1) would give
\[
f(\epsilon_0, X_1, \ldots, X_T)
 = f(X_T \mid X_{T-1}, \ldots, X_1, \epsilon_0; b, \sigma) \cdots
   f(X_1 \mid \epsilon_0; b, \sigma) \, f(\epsilon_0; \sigma) .
\]
The parameters are listed in the conditions in this formula merely
to indicate which terms depend on which parameters. For
Gaussian $\epsilon$'s the terms in this likelihood are squares
as usual (plus logarithms of $\sigma$),
leading to
\[
\ell(b, \sigma) = -\sum_{t=0}^T \frac{\epsilon_t^2(b)}{2\sigma^2}
  - (T+1) \log\sigma
\]
(up to constants), where $\epsilon_t(b) = X_t - b \epsilon_{t-1}(b)$.
We will estimate the parameters by maximizing this function
after getting rid of $\epsilon_0$
somehow.

Method A: Put $\epsilon_0 = 0$,
since 0 is
the most probable value, and maximize over $b$ and $\sigma$.
Notice that for large $T$ the coefficients of $\epsilon_0$
in the $\epsilon_t$ are close to 0 for most $t$ (for an invertible
model the coefficient of $\epsilon_0$ in $\epsilon_t$ is $(-b)^t$)
and the remaining few terms are
negligible relative to the total.
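A minimal Python sketch of Method A for $q=1$ (the names are illustrative; a one-dimensional numerical minimizer replaces an analytic solution for $b$):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ma1_residuals(x, b, eps0=0.0):
    """Recursively compute eps_t = X_t - b * eps_{t-1}, starting at eps_0."""
    eps = np.empty(len(x))
    prev = eps0
    for t, xt in enumerate(x):
        prev = xt - b * prev
        eps[t] = prev
    return eps

def fit_ma1_conditional(x):
    """Method A: set eps_0 = 0 and minimize the sum of squares over b."""
    x = np.asarray(x, dtype=float)
    sse = lambda b: np.sum(ma1_residuals(x, b) ** 2)
    fit = minimize_scalar(sse, bounds=(-0.999, 0.999), method="bounded")
    b_hat = fit.x
    sigma_hat = np.sqrt(sse(b_hat) / len(x))   # the eps_0 term drops out
    return b_hat, sigma_hat
```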
Method B: Backcasting is the process of
guessing $\epsilon_0$
on the basis of the data; we replace $\epsilon_0$
in the log likelihood by
$E(\epsilon_0 \mid X_1, \ldots, X_T; b, \sigma)$.
The problem is that this quantity depends on $b$ and $\sigma$.
We will use the EM algorithm to solve this problem.
The algorithm can be applied when we have (real or imaginary)
missing data. Suppose the data we have is X; some other data
we didn't get is Y and Z=(X,Y). It often happens that we can
think of a Y we didn't observe in such a way that the likelihood
for the whole data set Z would be simple. In that case we can
try to maximize the likelihood for X by following a two step
algorithm first discussed in detail by Dempster, Laird and Rubin.
This algorithm has two steps:
1. The E or Expectation step. We ``estimate''
the missing data Y by computing
$E_{\theta_0}(Y \mid X)$.
Technically,
we are supposed to estimate the likelihood function based on Z.
Factor the density of Z as
\[
f_Z = f_{Y \mid X} f_X
\]
and take logs to get
\[
\log f_Z(Z; \theta) = \log f_{Y \mid X}(Y \mid X; \theta) + \log f_X(X; \theta) .
\]
We actually estimate the log conditional density (which is
a function of $\theta$)
by computing
\[
E_{\theta_0} \left( \log f_{Y \mid X}(Y \mid X; \theta) \mid X \right) .
\]
Notice the subscript $\theta_0$
on $E$.
This indicates
that you have to know the parameter to compute the conditional
expectation. Notice too that there is another $\theta$
in the
conditional expectation - the log conditional density has
a parameter in it.
2. The M or Maximization step. We then maximize our estimate of
the log likelihood to get
a new value $\theta_1$
for $\theta$.
Go back to step 1 with this $\theta_1$
replacing $\theta_0$
and iterate.
To get started we need a preliminary estimate. Now look at our
problem. In our case the quantity Y is $\epsilon_0$.
Rather than work with the log-likelihood directly we work with
Y. Our preliminary estimate of Y is 0. We use this value
to estimate $(b, \sigma)$
as above, getting an estimate $(\hat b, \hat\sigma)$.
Then we compute
$E_{\hat b, \hat\sigma}(\epsilon_0 \mid X)$
and
replace $\epsilon_0$
in the log-likelihood above by this
conditional expectation. Then iterate. This process of guessing
$\epsilon_0$ is called backcasting.
Summary
- The log likelihood based on
$(\epsilon_0, X_1, \ldots, X_T)$
is
\[
\ell(b, \sigma) = -\sum_{t=0}^T \frac{\epsilon_t^2(b)}{2\sigma^2}
  - (T+1) \log\sigma .
\]
- Put $\epsilon_0 = 0$
in this formula and estimate $b$
by minimizing
$\sum_{t=1}^T \epsilon_t^2(b)$,
where
$\epsilon_t(b) = X_t - \sum_{j=1}^q b_j \epsilon_{t-j}(b)$
for $t = 1, \ldots, T$.
- Now compute $E_{\hat b, \hat\sigma}(\epsilon_0 \mid X)$.
Box, Jenkins and Reinsel present an algorithm to do so based
on the fact that there are actually two MA representations
corresponding to a given covariance function (the invertible
one and a non-invertible one). For $q=1$ the non-invertible representation
is
\[
X_t = \tilde\epsilon_t + b^{-1} \tilde\epsilon_{t-1} ,
\qquad \operatorname{Var}(\tilde\epsilon_t) = b^2 \sigma^2 ;
\]
this form can be used to carry out the computation of the
conditional expectation.
- Iterate, re-estimating $b$
and recomputing the
backcast value of $\epsilon_0$
if needed; the sketch below puts these steps together.
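The following Python sketch assembles the summary for $q=1$, reusing `ma1_residuals` and `minimize_scalar` from the Method A sketch above. Note one substitution: instead of the Box, Jenkins and Reinsel non-invertible-representation algorithm, the backcast $E(\epsilon_0 \mid X)$ is computed directly from the joint Gaussian distribution of $\epsilon_0$ and $X$, which is equivalent but only practical for moderate $T$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def backcast_eps0(x, b, sigma2):
    """E(eps_0 | X) for an MA(1): eps_0 and X are jointly Gaussian with
    Cov(eps_0, X_1) = b * sigma^2 and Cov(eps_0, X_t) = 0 for t >= 2."""
    T = len(x)
    # Tridiagonal covariance matrix of X
    Sigma = sigma2 * ((1 + b**2) * np.eye(T)
                      + b * (np.eye(T, k=1) + np.eye(T, k=-1)))
    c = np.zeros(T)
    c[0] = b * sigma2                     # Cov(eps_0, X)
    return c @ np.linalg.solve(Sigma, x)

def fit_ma1_backcast(x, n_iter=20):
    """Alternate between re-estimating (b, sigma) and backcasting eps_0."""
    x = np.asarray(x, dtype=float)
    eps0 = 0.0                            # starting guess: most probable value
    for _ in range(n_iter):
        # M step: minimize the sum of squares given the current eps_0
        sse = lambda b: np.sum(ma1_residuals(x, b, eps0) ** 2)
        fit = minimize_scalar(sse, bounds=(-0.999, 0.999), method="bounded")
        b_hat = fit.x
        sigma2_hat = (eps0**2 + sse(b_hat)) / (len(x) + 1)
        # E step: backcast eps_0 under the new estimates
        eps0 = backcast_eps0(x, b_hat, sigma2_hat)
    return b_hat, np.sqrt(sigma2_hat)
```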
Richard Lockhart
1999-10-12