Law of Iterated Expectation

What is LIE

The law of iterated expectation (LIE), also called the law of total expectation, states that

\[E[X] = E[E[X \mid Y]]\]

or, in other words, the expected value of the conditional expected value of \(X\) given \(Y\) is the same as the expected value of \(X\).
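To make the statement concrete, here is a minimal sketch that checks the identity on a small discrete joint distribution. The distribution and the helper names (`joint`, `expectation_of_x`, `iterated_expectation_of_x`) are made up for illustration only. Exact fractions are used so the two sides can be compared without rounding error.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X = x, Y = y), chosen only for illustration.
joint = {
    (0, 0): F(1, 10),
    (1, 0): F(3, 10),
    (0, 1): F(2, 10),
    (2, 1): F(4, 10),
}


def expectation_of_x(joint):
    """E[X], computed directly from the joint distribution."""
    return sum(p * x for (x, _), p in joint.items())


def iterated_expectation_of_x(joint):
    """E[E[X | Y]]: average the conditional means E[X | Y = y] over P(Y = y)."""
    # Marginal distribution P(Y = y).
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p

    total = 0
    for y, py in p_y.items():
        # Conditional mean E[X | Y = y] = sum_x x * P(x, y) / P(y).
        cond_mean = sum(p * x for (x, yy), p in joint.items() if yy == y) / py
        total += py * cond_mean
    return total


direct = expectation_of_x(joint)
iterated = iterated_expectation_of_x(joint)
assert direct == iterated  # both sides of the LIE agree
```

Dividing by \(P(Y = y)\) inside the loop and multiplying by it again outside is exactly why the two sides coincide: the marginal cancels, leaving the plain sum over the joint distribution.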

Why should I care

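Well, the LIE is used in the derivation of the Bellman equations of reinforcement learning. Let me show you how. The derivation uses the conditional form of the law, \(E[X \mid Z] = E[E[X \mid Y, Z] \mid Z]\), where \(Z\) is the event \(x_0 = x\) and \(Y\) is the next state \(x_1\). From the definition of the value function (the expected value of the discounted return), we have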

\[\begin{aligned} V^\pi (x) & = E_\pi \left[\sum_{t=0}^{\infty} {\gamma ^ t r(x_t, a_t) \mid x_0 = x} \right]\\ & = r(x, \pi(x)) + E_\pi \left[ \sum_{t=1}^{\infty} {\gamma ^ t r(x_t, a_t) \mid x_0 = x} \right]\\ & = r(x, \pi(x)) + \gamma E_y \left[ E_\pi \left[ \sum_{t=1}^{\infty} {\gamma ^ {t - 1} r(x_t, a_t) \mid x_0 = x, x_1 = y} \right] \right] \quad \color{CornflowerBlue}{\text{(LIE used here)}}\\ & = r(x, \pi(x)) + \gamma \sum_y {P(y \mid x, \pi (x))} E_\pi \left[ \sum_{t=1}^{\infty} {\gamma ^ {t - 1} r(x_t, a_t) \mid x_1 = y} \right] \quad \color{CornflowerBlue}{\text{(by Markov property)}}\\ & = r(x, \pi(x)) + \gamma \sum_y {P(y \mid x, \pi (x))} V^\pi (y) \qquad \blacksquare \end{aligned}\]

These are the Bellman equations for the value function!
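As a rough numerical sanity check, the sketch below builds a tiny two-state MDP under a fixed deterministic policy, approximates \(V^\pi\) straight from its definition as a discounted sum, and then measures how closely the result satisfies the Bellman equation. The rewards, transition probabilities, discount factor, and truncation horizon are all hypothetical values picked for illustration.

```python
# Hypothetical 2-state MDP under a fixed deterministic policy pi.
GAMMA = 0.9

# reward[x] = r(x, pi(x))
reward = [1.0, 0.0]

# P[x][y] = P(y | x, pi(x)); each row sums to 1.
P = [
    [0.5, 0.5],
    [0.2, 0.8],
]


def value_by_definition(x, horizon=500):
    """Approximate V^pi(x) = E[sum_t gamma^t r(x_t, a_t) | x_0 = x].

    The infinite sum is truncated at `horizon` steps (an assumption made
    here so the computation terminates; the neglected tail is of order
    gamma ** horizon).
    """
    n = len(reward)
    dist = [1.0 if s == x else 0.0 for s in range(n)]  # distribution of x_t
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        # Expected reward at time t, weighted by gamma^t.
        total += discount * sum(d * r for d, r in zip(dist, reward))
        # Advance one step: dist_{t+1}(y) = sum_s dist_t(s) * P(y | s, pi(s)).
        dist = [sum(dist[s] * P[s][y] for s in range(n)) for y in range(n)]
        discount *= GAMMA
    return total


V = [value_by_definition(x) for x in range(len(reward))]


def bellman_rhs(x):
    """Right-hand side: r(x, pi(x)) + gamma * sum_y P(y | x, pi(x)) V(y)."""
    return reward[x] + GAMMA * sum(P[x][y] * V[y] for y in range(len(V)))


residuals = [abs(V[x] - bellman_rhs(x)) for x in range(len(V))]
assert max(residuals) < 1e-9  # the definition and the Bellman equation agree
```

The value function here is computed without ever using the Bellman equation, so the small residual is an independent check of the derivation rather than a restatement of it.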