
STAT 870 Lecture 4

Goals of Today's Lecture:

Expected Value

Undergraduate definition of E: integral for absolutely continuous X, sum for discrete X. But $\exists$ rvs which are neither absolutely continuous nor discrete: e.g., a waiting time which is 0 with probability 1/2 and has an exponential distribution with probability 1/2.

General definition of E.

A random variable X is simple if we can write

\begin{displaymath}X(\omega)= \sum_1^n a_i 1(\omega\in A_i)
\end{displaymath}

for some constants $a_1,\ldots,a_n$ and events $A_i$.

Def'n: For a simple rv X we define

\begin{displaymath}E(X) = \sum a_i P(A_i)
\end{displaymath}
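(As with the Lebesgue integral below, one must check that this value does not depend on which representation of X is used.) Example: for any event A the indicator $1(\omega \in A)$ is simple and

\begin{displaymath}E[1(\omega \in A)] = P(A) \, ;
\end{displaymath}

more generally if X takes the value a on A and b on $A^c$ then $E(X) = aP(A) + bP(A^c)$.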

For positive random variables which are not simple we extend our definition by approximation:

Def'n: If $X \ge 0$ (almost surely, $P(X\ge 0) = 1$) then

\begin{displaymath}E(X) = \sup\{E(Y): 0 \le Y \le X, Y \mbox{ simple}\}
\end{displaymath}
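Example: one concrete sequence of approximating simple rvs (used again in the proof of the product theorem below) rounds X down to the nearest multiple of $2^{-n}$ and truncates at n:

\begin{displaymath}X_n = \min\left( n, 2^{-n}\lfloor 2^n X \rfloor \right) \, .
\end{displaymath}

Each $X_n$ is simple, $0 \le X_1 \le X_2 \le \cdots \le X$ and $X_n \to X$, so $E(X) = \lim E(X_n)$ by monotone convergence (below).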

Def'n: We call X integrable if

\begin{displaymath}E(\vert X\vert) < \infty \, .
\end{displaymath}

In this case we define

\begin{displaymath}E(X) = E(\max(X,0)) -E(\max(-X,0))
\end{displaymath}
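Since $\vert X\vert = \max(X,0) + \max(-X,0)$, integrability guarantees that both terms on the right hand side have finite expectation, so the difference is well defined.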

Facts: E is a linear, monotone, positive operator:

1.
Linear: E(aX+bY) = aE(X)+bE(Y) provided X and Y are integrable.

2.
Positive: $P(X\ge 0) = 1$ implies $E(X) \ge 0$.

3.
Monotone: $P(X \ge Y)=1$ and X, Y integrable implies $E(X) \ge E(Y)$.

Major technical theorems:

Monotone Convergence: If $ 0 \le X_1 \le X_2 \le \cdots$ a.s. and $X= \lim X_n$ (which exists a.s.) then

\begin{displaymath}E(X) = \lim_{n\to \infty} E(X_n)
\end{displaymath}

Dominated Convergence: If $\vert X_n\vert \le Y_n$ a.s. and $\exists$ rv X s.t. $X_n \to X$ a.s. and rv Y s.t. $Y_n \to Y$ a.s. with $E(Y_n) \to E(Y) < \infty$ then

\begin{displaymath}E(X_n) \to E(X)
\end{displaymath}

Often used with all $Y_n$ equal to the same rv Y.

Fatou's Lemma: If $X_n \ge 0$ then

\begin{displaymath}E(\liminf X_n) \le \liminf E(X_n)
\end{displaymath}
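Example: take $\Omega = (0,1)$ with P uniform and $X_n(\omega) = n 1(\omega < 1/n)$. Then $X_n \to 0$ a.s. while $E(X_n) = 1$ for every n, so

\begin{displaymath}E(\liminf X_n) = 0 < 1 = \liminf E(X_n) \, ;
\end{displaymath}

the inequality in Fatou's Lemma can be strict, and the domination hypothesis in dominated convergence cannot simply be dropped.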

Theorem: With this definition of E if X has density f(x) (even in ${\Bbb R}^p$ say) and Y=g(X) then

\begin{displaymath}E(Y) = \int g(x) f(x) dx \, .
\end{displaymath}

(This could be a multiple integral.) If X has pmf f then

\begin{displaymath}E(Y) =\sum_x g(x) f(x) \, .
\end{displaymath}

Works even if X has density but Y doesn't.
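Example: if X is uniform on (0,1), so that $f(x) = 1(0 < x < 1)$, and $Y = g(X) = 1(X \le 1/2)$, then Y is discrete even though X has a density, and

\begin{displaymath}E(Y) = \int 1(x \le 1/2) 1(0 < x < 1) dx = 1/2 \, .
\end{displaymath}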

Def'n: $r^{\rm th}$ moment (about origin) of a real rv X is $\mu_r^\prime=E(X^r)$ (provided it exists). Generally use $\mu$ for E(X). The $r^{\rm th}$ central moment is

\begin{displaymath}\mu_r = E[(X-\mu)^r]
\end{displaymath}

Call $\sigma^2 = \mu_2$ the variance.
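Expanding $(X-\mu)^2$ and using linearity gives the usual computing formula

\begin{displaymath}\sigma^2 = E(X^2) - \mu^2 = \mu_2^\prime - \mu^2 \, .
\end{displaymath}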

Def'n: For an ${\Bbb R}^p$ valued rv X, $\mu_X = E(X)$ is the vector whose $i^{\rm th}$ entry is $E(X_i)$ (provided all entries exist).

Def'n: The ($p \times p$) variance-covariance matrix of X is

\begin{displaymath}Var(X) = E\left[ (X-\mu_X)(X-\mu_X)^T \right]
\end{displaymath}

which exists provided each component $X_i$ has a finite second moment. More generally if $X \in {\Bbb R}^p$ and $Y \in {\Bbb R}^q$ both have all components with finite second moments then

\begin{displaymath}\text{Cov}(X,Y) = \text{E}\left[(X - \mu_X) (Y - \mu_Y)^T\right]
\end{displaymath}

We have

\begin{displaymath}\text{Cov}(AX+b,CY+d) = A\text{Cov}(X,Y) C^T
\end{displaymath}

for general (conforming) matrices A, C and vectors b and d.
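In particular, taking Y = X, C = A and d = b, and noting that $Var(X) = \text{Cov}(X,X)$,

\begin{displaymath}Var(AX+b) = A Var(X) A^T \, .
\end{displaymath}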

Moments and probabilities of rare events are closely connected as will be seen in a number of important probability theorems. Here is one version of Markov's inequality (one case is Chebyshev's inequality):
\begin{align*}P(\vert X-\mu\vert \ge t ) &= E[1(\vert X-\mu\vert \ge t)]
\\
&\le E\left[\frac{\vert X-\mu\vert^r}{t^r} 1(\vert X-\mu\vert \ge t)\right]
\\
& \le \frac{E[\vert X-\mu\vert^r]}{t^r}
\end{align*}
The intuition is that if moments are small then large deviations from average are unlikely.
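Taking r = 2 gives Chebyshev's inequality:

\begin{displaymath}P(\vert X-\mu\vert \ge t ) \le \frac{\sigma^2}{t^2} \, .
\end{displaymath}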

Moments and independence

Theorem: If $X_1,\ldots,X_p$ are independent and each $X_i$ is integrable then $X=X_1\cdots X_p$ is integrable and

\begin{displaymath}E(X_1\cdots X_p) = E(X_1) \cdots E(X_p)
\end{displaymath}

Proof: Usual order: simple Xs first, then positive, then integrable.

Suppose each $X_i$ is simple:

\begin{displaymath}X_i = \sum_j x_{ij} 1(X_i = x_{ij})
\end{displaymath}

where the $x_{ij}$ are the possible values of $X_i$. Then
\begin{align*}E(X_1\cdots X_p)
&= \sum_{j_1,\ldots,j_p} x_{1j_1}\cdots x_{pj_p} P(X_1 = x_{1j_1},\ldots,X_p = x_{pj_p})
\\
&= \sum_{j_1,\ldots,j_p} x_{1j_1}\cdots x_{pj_p} P(X_1 = x_{1j_1}) \cdots P(X_p = x_{pj_p})
\\
&= \left[\sum_{j_1} x_{1j_1} P(X_1 = x_{1j_1})\right] \cdots \left[\sum_{j_p} x_{pj_p} P(X_p = x_{pj_p})\right]
\\
&= \prod E(X_i)
\end{align*}

General $X_i \ge 0$: let $X_{i,n}$ be $X_i$ rounded down to the nearest multiple of $2^{-n}$ (to a maximum of n). Each $X_{i,n}$ is simple and $X_{1,n},\ldots,X_{p,n}$ are independent. Thus

\begin{displaymath}{\rm E}(\prod X_{j,n}) = \prod {\rm E}(X_{j,n})
\end{displaymath}

for each n. If

\begin{displaymath}X_n^* = \prod X_{j,n}
\end{displaymath}

then

\begin{displaymath}0 \le X_1^* \le X_2^* \le \cdots
\end{displaymath}

and $X_n^*$ converges to $X^* = \prod X_i$ so that

\begin{displaymath}{\rm E} ( X^*) = \lim {\rm E}(X_n^*)
\end{displaymath}

by monotone convergence. Also by monotone convergence

\begin{displaymath}\lim \prod {\rm E}(X_{j,n}) = \prod {\rm E}(X_j) < \infty
\end{displaymath}

This shows both that $X^*$ is integrable and that

\begin{displaymath}E(\prod X_j) = \prod {\rm E} (X_j)
\end{displaymath}

The general case uses the fact that we can write each Xi as the difference of its positive and negative parts:

\begin{displaymath}X_i = \max(X_i,0) -\max(-X_i,0)
\end{displaymath}

Just expand out the product and use the previous case.
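For p = 2, writing $X_i^+ = \max(X_i,0)$ and $X_i^- = \max(-X_i,0)$ and noting that $X_1^+, X_1^-$ are independent of $X_2^+, X_2^-$,
\begin{align*}E(X_1 X_2) &= E(X_1^+X_2^+) - E(X_1^+X_2^-) - E(X_1^-X_2^+) + E(X_1^-X_2^-)
\\
&= E(X_1^+)E(X_2^+) - E(X_1^+)E(X_2^-) - E(X_1^-)E(X_2^+) + E(X_1^-)E(X_2^-)
\\
&= \left[E(X_1^+) - E(X_1^-)\right]\left[E(X_2^+) - E(X_2^-)\right] = E(X_1)E(X_2) \, .
\end{align*}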

Lebesgue Integration

Lebesgue integral defined much the same way as E.

Borel function f simple if

\begin{displaymath}f(x) = \sum_1^n a_i 1(x \in A_i)
\end{displaymath}

for almost all $x\in {\Bbb R}^p$ and some constants $a_i$ and Borel sets $A_i$ with $\lambda(A_i) < \infty$. For such an f we define

\begin{displaymath}\int f(x) dx = \sum a_i \lambda(A_i)
\end{displaymath}

Again if

\begin{displaymath}\sum a_i 1_{A_i} = \sum b_j 1_{B_j}
\end{displaymath}

almost everywhere and all $A_i$ and $B_j$ have finite Lebesgue measure, you must check that

\begin{displaymath}\sum a_i \lambda(A_i) = \sum b_j \lambda(B_j)
\end{displaymath}
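Example: $f(x) = 1(0 \le x \le 1) + 2 \cdot 1(1 < x \le 3)$ is simple and

\begin{displaymath}\int f(x) dx = 1\cdot\lambda([0,1]) + 2\cdot\lambda((1,3]) = 1 + 4 = 5 \, .
\end{displaymath}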

If $f \ge 0$ almost everywhere and f is Borel define

\begin{displaymath}\int f(x) dx = \sup\{ \int g(y) dy\}
\end{displaymath}

where the sup ranges over all simple functions g such that $0 \le g(x) \le f(x)$ for almost all x. Call $f \ge 0$ integrable if $\int f(x) dx < \infty$.

Call a general f integrable if |f| is integrable and define for integrable f
\begin{multline*}\int f(x) dx = \int \max(f(x),0) dx
\\
-
\int \max(-f(x),0) dx
\end{multline*}

Remark: Again you must check that you have not changed the value of $\int f(x) dx$ for f in either of the previous two categories (simple, or nonnegative).

Facts: $\int$ is a linear, monotone, positive operator:

1.
Linear: provided f and g are integrable

\begin{displaymath}\int [af(x)+bg(x)] dx = a\int f(x) dx
+b\int g(x) dx
\end{displaymath}

2.
Positive: If $f(x) \ge 0$ almost everywhere then $\int f(x) dx
\ge 0$.

3.
Monotone: If $f(x) \ge g(x)$ almost everywhere and f and g are integrable then

\begin{displaymath}\int f(x) dx \ge \int g(x) dx.
\end{displaymath}

Each of these facts is proved first for simple functions then for positive functions then for general integrable functions.

Major technical theorems:

Monotone Convergence: If $ 0 \le f_1 \le f_2 \le \cdots$ almost everywhere and $f= \lim f_n$ (which has to exist almost everywhere) then

\begin{displaymath}\int f(x) dx = \lim_{n\to \infty} \int f_n(x) dx
\end{displaymath}

Dominated Convergence: If $\vert f_n\vert \le g_n$ almost everywhere and there is a Borel function f such that $f_n(x) \to f(x)$ for almost all x and a Borel function g such that $g_n(x) \to g(x)$ for almost all x with $\int g_n(x) dx \to \int g(x) dx < \infty$ then f is integrable and

\begin{displaymath}\int f_n(x) dx \to \int f(x) dx
\end{displaymath}

Fatou's Lemma: If $f_n \ge 0$ almost everywhere then

\begin{displaymath}\int \liminf f_n(x) dx \le \liminf \int f_n(x) dx
\end{displaymath}

Notice the frequent use of almost all or almost everywhere in the hypotheses. In our definition of E, wherever we require a property of the function $X(\omega)$ we can require it to hold only for a set of $\omega$ whose complement has probability 0. In this case we say the property holds almost surely. For instance, the dominated convergence theorem is usually written:

Dominated Convergence: If $\vert X_n\vert \le Y_n$ almost surely (often abbreviated to a.s.) and there is a random variable X such that $X_n \to X$ a.s. and a random variable Y such that $Y_n \to Y$ almost surely with $E(Y_n) \to E(Y) < \infty$ then

\begin{displaymath}E(X_n) \to E(X)
\end{displaymath}

The hypothesis of almost sure convergence can be weakened; I hope to discuss this later in the course.

Multiple Integration: Lebesgue integrals over ${\Bbb R}^p$ defined using Lebesgue measure on ${\Bbb R}^p$. Iterated integrals wrt Lebesgue measure on ${\Bbb R}^1$ give the same answer.

Theorem[Tonelli]: If $f: {\Bbb R}^{p+q} \mapsto {\Bbb R}$ is Borel and $f \ge 0$ almost everywhere then for almost every $x\in {\Bbb R}^p$ the integral

\begin{displaymath}g(x) \equiv \int f(x,y) dy
\end{displaymath}

exists and

\begin{displaymath}\int g(x) dx = \int f(x,y) dx dy
\end{displaymath}

The RHS denotes the (p+q)-dimensional integral defined previously.
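Example: if $X \ge 0$ has density f then applying Tonelli to the nonnegative function $(x,t) \mapsto 1(0 < t < x) f(x)$ gives

\begin{displaymath}\int_0^\infty P(X > t) dt = \int\!\!\int 1(0 < t < x) f(x) dx \, dt = \int x f(x) dx = E(X) \, .
\end{displaymath}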

Theorem[Fubini]: If $f: {\Bbb R}^{p+q} \mapsto {\Bbb R}$ is Borel and integrable then for almost every $x\in {\Bbb R}^p$ the integral

\begin{displaymath}g(x) \equiv \int f(x,y) dy
\end{displaymath}

exists and is finite. Moreover g is integrable and

\begin{displaymath}\int g(x) dx = \int
f(x,y) dx dy \, .
\end{displaymath}
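Example (showing the hypotheses matter): $f(x,y) = (x^2-y^2)/(x^2+y^2)^2$ on $(0,1)^2$ is neither nonnegative nor integrable, and the two iterated integrals disagree:

\begin{displaymath}\int_0^1\!\!\int_0^1 \frac{x^2-y^2}{(x^2+y^2)^2} dy \, dx = \int_0^1 \frac{dx}{1+x^2} = \frac{\pi}{4}
\end{displaymath}

while integrating first over x and then over y gives $-\pi/4$.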


Richard Lockhart
2000-09-26