Understanding the Concept of Conditional Expectation
An elementary conditional expectation such as $\mathrm{E}[X \mid Y=2]$ is a number. If we instead consider $\mathrm{E}[X \mid Y=y]$, we get a number that depends on $y$, that is, a function of $y$. Taking this one step further, we can condition with respect to a $\sigma$-algebra and view the conditional expectation itself as a random variable.
Conditional Probability Function
Let’s consider two discrete random variables $X$ and $Y$.
Let $p(x, y)=\mathrm{P}(X=x, Y=y)$ be the joint probability mass function. Then the marginal probability mass function of $X$ is
\[p_X(x)=\sum_{y \in B} p(x, y),\]
where $B$ is the set of possible values of $Y$.
Similarly, the marginal probability mass function of $Y$ is
\[p_Y(y)=\sum_{x \in A} p(x, y),\]
where $A$ is the set of possible values of $X$.
Then the conditional probability mass function of $X$ given $Y=y$ is
\[p_{X \mid Y}(x \mid y)=\frac{p(x, y)}{p_Y(y)}=\frac{\mathrm{P}(X=x, Y=y)}{\mathrm{P}(Y=y)}, \quad \text{for } y \text{ with } p_Y(y)>0.\]
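To make these definitions concrete, here is a minimal Python sketch (not from the original text) that computes the marginal and conditional probability mass functions from a small, made-up joint pmf.

```python
from collections import defaultdict

# Hypothetical joint pmf p(x, y) = P(X = x, Y = y); the values are made up
# for illustration only and sum to 1.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.25,
    (2, 0): 0.05, (2, 1): 0.10,
}

# Marginal pmf of Y: p_Y(y) = sum over x in A of p(x, y).
p_Y = defaultdict(float)
for (x, y), p in joint.items():
    p_Y[y] += p

# Conditional pmf of X given Y = y: p_{X|Y}(x | y) = p(x, y) / p_Y(y).
def conditional_pmf_X_given_Y(y):
    return {x: p / p_Y[y] for (x, y2), p in joint.items() if y2 == y}

print(conditional_pmf_X_given_Y(1))
# {0: 0.3636..., 1: 0.4545..., 2: 0.1818...}
```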
Conditional Expectation
Elementary Version
The conditional expectation of $X$ given $Y=y$ is defined as
\[\mathrm{E}[X \mid Y=y]=\sum_{x \in A} x p_{X \mid Y}(x \mid y) .\]
Now consider a real-valued function $h$ from $\mathcal{R}$ to $\mathcal{R}$.
The conditional expectation of $h(X)$ given $Y=y$ is
\[\mathrm{E}[h(X) \mid Y=y]=\sum_{x \in A} h(x) p_{X \mid Y}(x \mid y) .\]
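Continuing the hypothetical joint pmf from the previous sketch, the following computes $\mathrm{E}[X \mid Y=y]$ and $\mathrm{E}[h(X) \mid Y=y]$ directly from these definitions.

```python
# E[X | Y = y] = sum over x of x * p_{X|Y}(x | y); passing a function h
# gives E[h(X) | Y = y] in the same way.
def cond_expectation(y, h=lambda x: x):
    pmf = conditional_pmf_X_given_Y(y)   # defined in the previous sketch
    return sum(h(x) * p for x, p in pmf.items())

print(cond_expectation(1))                    # E[X | Y = 1]
print(cond_expectation(1, h=lambda x: x**2))  # E[X**2 | Y = 1]
```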
The conditional expectation of $X$ given $Y$, denoted by $\mathrm{E}[X \mid Y]$, is the function of $Y$ that is defined to be $\mathrm{E}[X \mid Y=y]$ when $Y=y$.
Specifically, let $\delta(x)$ be the function with $\delta(0)=1$ and $\delta(x)=0$ for all $x \neq 0$.
Also, let $\delta_y(Y)=\delta(Y-y)$ be the indicator random variable such that $\delta_y(Y)=1$ if the event $\{Y=y\}$ occurs and $\delta_y(Y)=0$ otherwise.
Then
\(\mathrm{E}[X \mid Y]=\sum_{y \in B} \mathrm{E}[X \mid Y=y] \delta_y(Y)=\sum_{y \in B} \sum_{x \in A} x p_{X \mid Y}(x \mid y) \delta_y(Y) .\)
That is to say: the conditional expectation of $X$ given $Y$ is a random variable; evaluated at an outcome on which $Y=y$, it outputs the conditional expectation $\mathrm{E}[X \mid Y=y]$.
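Continuing the same hypothetical example, the sketch below lists the values that $\mathrm{E}[X \mid Y]$ takes together with their probabilities, which shows that it is itself a discrete random variable.

```python
# E[X | Y] is the function of Y that returns E[X | Y = y] when Y = y.
# Listing its possible values with their probabilities shows that it is
# itself a (discrete) random variable.
B = sorted({y for (_, y) in joint})   # possible values of Y
for y in B:
    print(f"E[X | Y = {y}] = {cond_expectation(y):.4f}  with probability {p_Y[y]:.2f}")
```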
General Version
For a $\sigma$-algebra $\mathcal{G}, \mathrm{E}[X \mid \mathcal{G}]$ is defined to be the random variable that satisfies
- $\mathrm{E}[X \mid \mathcal{G}]$ is $\mathcal{G}$-measurable
- $\int_A X d \mathrm{P}=\int_A \mathrm{E}[X \mid \mathcal{G}] d \mathrm{P}$ for all $A \in \mathcal{G}$.
To understand this definition, consider the $\sigma$-algebra generated by the random variable $Y$ (denoted by $\sigma(Y)$ ).
The condition that $\mathrm{E}[X \mid Y]$ is $\sigma(Y)$-measurable simply means that $\mathrm{E}[X \mid Y]$ is a measurable function of $Y$, i.e., $\mathrm{E}[X \mid Y]=h(Y)$ for some measurable function $h$.
To understand the second condition, we may rewrite it as follows:
\(\mathrm{E}\left[\mathbf{1}_A X\right]=\mathrm{E}\left[\mathbf{1}_A \mathrm{E}[X \mid Y]\right],\)
for all events $A$ in $\sigma(Y)$, where $\mathbf{1}_A$ is the indicator random variable with $\mathbf{1}_A=1$ when the event $A$ occurs and $\mathbf{1}_A=0$ otherwise.
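As a sanity check on this defining property, the following sketch (again continuing the hypothetical discrete example from the earlier sketches) verifies $\mathrm{E}[\mathbf{1}_A X]=\mathrm{E}[\mathbf{1}_A \mathrm{E}[X \mid Y]]$ for the event $A=\{Y=1\}$, which belongs to $\sigma(Y)$.

```python
# Check E[1_A X] = E[1_A E[X | Y]] for A = {Y = 1}, an event in sigma(Y),
# by summing over the joint pmf from the earlier sketch.
y0 = 1  # A = {Y = y0}

lhs = sum(x * p for (x, y), p in joint.items() if y == y0)                    # E[1_A X]
rhs = sum(cond_expectation(y) * p for (x, y), p in joint.items() if y == y0)  # E[1_A E[X|Y]]

print(lhs, rhs)  # both equal 0.45 for this example
```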