An elementary conditional expectation such as $E[X \mid Y=2]$ is a number. If we consider $E[X \mid Y=y]$, it is a number that depends on $y$, i.e., a function of $y$. This observation suggests that we can condition with respect to a σ-algebra and view the conditional expectation itself as a random variable.

Conditional Probability Function

Let’s consider two discrete random variables $X$ and $Y$.
Let $p(x, y)=\mathrm{P}(X=x, Y=y)$ be the joint probability mass function. Then the marginal distribution of $X$ is

\[p_X(x)=\mathrm{P}(X=x)=\sum_{y \in B} p(x, y),\]

where $B$ is the set of possible values of $Y$.
Similarly, the marginal distribution of $Y$ is

\[p_Y(y)=\mathrm{P}(Y=y)=\sum_{x \in A} p(x, y),\]

where $A$ is the set of possible values of $X$.
Then, for $y$ with $p_Y(y)>0$, the conditional probability mass function of $X$ given $Y=y$ is

\[p_{X \mid Y}(x \mid y)=\mathrm{P}(X=x \mid Y=y)=\frac{p(x, y)}{p_Y(y)} .\]
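As a concrete sanity check, here is a minimal Python sketch of these definitions. The joint pmf table below is a made-up assumption chosen only for illustration:

```python
from collections import defaultdict

# Hypothetical joint pmf p(x, y); the table values are assumptions for illustration.
p = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.20,
}

# Marginals: sum the joint pmf over the other variable.
p_X = defaultdict(float)
p_Y = defaultdict(float)
for (x, y), prob in p.items():
    p_X[x] += prob
    p_Y[y] += prob

def p_X_given_Y(x, y):
    """p_{X|Y}(x | y) = p(x, y) / p_Y(y), defined only when p_Y(y) > 0."""
    return p.get((x, y), 0.0) / p_Y[y]

print(p_X_given_Y(1, 0))  # 0.30 / 0.45 ≈ 0.667
```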

Conditional Expectation

Elementary Version

The conditional expectation of $X$ given $Y=y$ is defined as

\[\mathrm{E}[X \mid Y=y]=\sum_{x \in A} x p_{X \mid Y}(x \mid y) .\]

Consider a real-valued function $h$ from $\mathbb{R}$ to $\mathbb{R}$.
The conditional expectation of $h(X)$ given $Y=y$ is

\[\mathrm{E}[h(X) \mid Y=y]=\sum_{x \in A} h(x) p_{X \mid Y}(x \mid y) .\]
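Both sums are straightforward to evaluate numerically. A minimal sketch, reusing the assumed pmf from the previous example:

```python
# Reusing the hypothetical joint pmf (values are assumptions for illustration).
p = {(0, 0): 0.10, (0, 1): 0.20,
     (1, 0): 0.30, (1, 1): 0.15,
     (2, 0): 0.05, (2, 1): 0.20}
A = {0, 1, 2}  # possible values of X

def p_Y(y):
    return sum(p.get((x, y), 0.0) for x in A)

def cond_exp(h, y):
    """E[h(X) | Y = y] = sum over x in A of h(x) * p_{X|Y}(x | y)."""
    return sum(h(x) * p.get((x, y), 0.0) for x in A) / p_Y(y)

print(cond_exp(lambda x: x, 0))      # E[X | Y=0]   = 0.40/0.45 ≈ 0.889
print(cond_exp(lambda x: x**2, 0))   # E[X^2 | Y=0] = 0.50/0.45 ≈ 1.111
```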

The conditional expectation of $X$ given $Y$, denoted by $\mathrm{E}[X \mid Y]$, is the function of $Y$ that is defined to be $\mathrm{E}[X \mid Y=y]$ when $Y=y$.
Specifically, let $\delta(x)$ be the function with $\delta(0)=1$ and $\delta(x)=0$ for all $x \neq 0$.
Also, let $\delta_y(Y)=\delta(Y-y)$ be the indicator random variable such that $\delta_y(Y)=1$ if the event $\{Y=y\}$ occurs and $\delta_y(Y)=0$ otherwise.
Then

\[\mathrm{E}[X \mid Y]=\sum_{y \in B} \mathrm{E}[X \mid Y=y] \delta_y(Y)=\sum_{y \in B} \sum_{x \in A} x p_{X \mid Y}(x \mid y) \delta_y(Y) .\]

That is to say, the conditional expectation of $X$ given $Y$ is a random variable: on every outcome where $Y=y$, it takes the value $\mathrm{E}[X \mid Y=y]$.
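Concretely, in the discrete case $\mathrm{E}[X \mid Y]$ is a deterministic function applied to $Y$. The sketch below (same assumed pmf as before) builds that function and checks the tower property $\mathrm{E}[\mathrm{E}[X \mid Y]]=\mathrm{E}[X]$, which follows from summing the double sum above over $y$:

```python
# E[X | Y] as a deterministic function applied to Y (same assumed pmf).
p = {(0, 0): 0.10, (0, 1): 0.20,
     (1, 0): 0.30, (1, 1): 0.15,
     (2, 0): 0.05, (2, 1): 0.20}
A, B = {0, 1, 2}, {0, 1}  # possible values of X and Y

def p_Y(y):
    return sum(p.get((x, y), 0.0) for x in A)

def E_X_given_Y(y):
    """The random variable E[X | Y], evaluated on the event {Y = y}."""
    return sum(x * p.get((x, y), 0.0) for x in A) / p_Y(y)

# Tower property check: E[E[X | Y]] should equal E[X].
lhs = sum(E_X_given_Y(y) * p_Y(y) for y in B)
rhs = sum(x * prob for (x, _), prob in p.items())
print(lhs, rhs)  # both equal 0.95 for this pmf
```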

General Version

For a sub-$\sigma$-algebra $\mathcal{G}$, $\mathrm{E}[X \mid \mathcal{G}]$ is defined to be the random variable (unique up to almost-sure equality) that satisfies

  1. $\mathrm{E}[X \mid \mathcal{G}]$ is $\mathcal{G}$-measurable
  2. $\int_A X d \mathrm{P}=\int_A \mathrm{E}[X \mid \mathcal{G}] d \mathrm{P}$ for all $A \in \mathcal{G}$.

To understand this definition, consider the $\sigma$-algebra generated by the random variable $Y$ (denoted by $\sigma(Y)$ ).

The condition that $\mathrm{E}[X \mid Y]$ is $\sigma(Y)$-measurable simply means that $\mathrm{E}[X \mid Y]$ is a measurable function of $Y$, i.e., $\mathrm{E}[X \mid Y]=h(Y)$ for some measurable function $h$.

To understand the second condition, we may rewrite it as follows:

\[\mathrm{E}\left[\mathbf{1}_A X\right]=\mathrm{E}\left[\mathbf{1}_A \mathrm{E}[X \mid Y]\right]\]

for all events $A$ in $\sigma(Y)$, where $\mathbf{1}_A$ is the indicator random variable with $\mathbf{1}_A=1$ when the event $A$ occurs and $\mathbf{1}_A=0$ otherwise.
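In the discrete setting this partial-averaging identity can be checked exhaustively, since every event in $\sigma(Y)$ is a union of slices $\{Y=y\}$. A minimal sketch, reusing the assumed pmf from the earlier examples:

```python
from itertools import chain, combinations

# Same assumed joint pmf as in the earlier sketches.
p = {(0, 0): 0.10, (0, 1): 0.20,
     (1, 0): 0.30, (1, 1): 0.15,
     (2, 0): 0.05, (2, 1): 0.20}
A_vals, B_vals = {0, 1, 2}, {0, 1}

def p_Y(y):
    return sum(p.get((x, y), 0.0) for x in A_vals)

def E_X_given_Y(y):
    return sum(x * p.get((x, y), 0.0) for x in A_vals) / p_Y(y)

# Every event in sigma(Y) is a union of slices {Y = y}, so in this finite
# setting it suffices to check every subset S of the values of Y.
for S in chain.from_iterable(combinations(sorted(B_vals), r)
                             for r in range(len(B_vals) + 1)):
    lhs = sum(x * prob for (x, y), prob in p.items() if y in S)  # E[1_A X]
    rhs = sum(E_X_given_Y(y) * p_Y(y) for y in S)                # E[1_A E[X|Y]]
    assert abs(lhs - rhs) < 1e-12, (S, lhs, rhs)
print("partial averaging verified for every A in sigma(Y)")
```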
