Law of Iterated Expectation
What is LIE
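The law of iterated expectation (LIE), also called the law of total expectation, states that
\[E[X] = E[E[X \mid Y]]\]or, in other words, the expected value of the conditional expectation of \(X\) given \(Y\) is the same as the expected value of \(X\). The inner expectation averages over \(X\) with \(Y\) held fixed, and the outer expectation averages over \(Y\).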
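To make the identity concrete, here is a minimal numerical sketch. The joint distribution and the names `x_values`, `joint`, `e_x`, and `e_of_e` are made up for illustration; any valid joint pmf would do.

```python
import numpy as np

# Hypothetical joint pmf P(X = x_i, Y = y_j); rows index X, columns index Y.
x_values = np.array([0.0, 1.0, 2.0])
joint = np.array([
    [0.10, 0.20],
    [0.30, 0.05],
    [0.15, 0.20],
])

# Direct expectation: E[X] = sum_i x_i P(X = x_i).
p_x = joint.sum(axis=1)
e_x = float(x_values @ p_x)

# Inner expectation: E[X | Y = y_j] = sum_i x_i P(X = x_i, Y = y_j) / P(Y = y_j).
p_y = joint.sum(axis=0)
e_x_given_y = (x_values @ joint) / p_y

# Outer expectation over Y: E[E[X | Y]] = sum_j E[X | Y = y_j] P(Y = y_j).
e_of_e = float(e_x_given_y @ p_y)

print(e_x, e_of_e)
```

Both printed values should agree up to floating-point rounding (about 1.05 for this table), even though the two conditional means \(E[X \mid Y = y_j]\) differ from each other.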
Why should I care
Well, the LIE is used in the derivation of the Bellman equations of reinforcement learning. Let me show you how. From the definition of the value function (the expected value of the discounted return) under a deterministic policy \(\pi\), we have
\[\begin{aligned} V^\pi (x) & = E_\pi \left[\sum_{t=0}^{\infty} {\gamma ^ t r(x_t, a_t) \mid x_0 = x} \right]\\ & = r(x, \pi(x)) + E_\pi \left[ \sum_{t=1}^{\infty} {\gamma ^ t r(x_t, a_t) \mid x_0 = x} \right]\\ & = r(x, \pi(x)) + \gamma E_y \left[ E_\pi \left[ \sum_{t=1}^{\infty} {\gamma ^ {t - 1} r(x_t, a_t) \mid x_0 = x, x_1 = y} \right] \right] \quad \color{CornflowerBlue}{\text{(LIE used here)}}\\ & = r(x, \pi(x)) + \gamma \sum_y {P(y \mid x, \pi (x))} E_\pi \left[ \sum_{t=1}^{\infty} {\gamma ^ {t - 1} r(x_t, a_t) \mid x_1 = y} \right] \quad \color{CornflowerBlue}{\text{(by Markov property)}}\\ & = r(x, \pi(x)) + \gamma \sum_y {P(y \mid x, \pi (x))} V^\pi (y) \quad \color{CornflowerBlue}{\text{(definition of the value function, time shifted by one)}} \qquad \blacksquare \end{aligned}\] Here \(E_y\) is the expectation over the next state \(y\), drawn from \(P(\cdot \mid x, \pi(x))\). These are the Bellman equations for the value function!
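As a sanity check on the derivation, the sketch below compares the value function computed from its definition (a long truncated discounted sum) with the solution of the Bellman equations. The 3-state MDP, the discount `gamma`, and the names `r_pi`, `P_pi`, `v_def`, and `v_bellman` are all hypothetical, chosen only for illustration.

```python
import numpy as np

gamma = 0.9

# Hypothetical 3-state MDP under a fixed deterministic policy pi.
# r_pi[x] = r(x, pi(x)); P_pi[x, y] = P(y | x, pi(x)).
r_pi = np.array([1.0, 0.0, 2.0])
P_pi = np.array([
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
])

# Definition: V(x) = E[sum_t gamma^t r(x_t, a_t) | x_0 = x].
# The expected reward at time t given x_0 = x is (P_pi^t r_pi)[x], so we
# accumulate gamma^t P_pi^t r_pi over a long, truncated horizon.
v_def = np.zeros(3)
term = r_pi.copy()
for _ in range(2000):
    v_def += term
    term = gamma * (P_pi @ term)

# Bellman equations: V = r_pi + gamma P_pi V, i.e. (I - gamma P_pi) V = r_pi.
v_bellman = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)

print(v_def)
print(v_bellman)
```

The two vectors should match to numerical precision. The linear solve is just one convenient way to get the fixed point for a small state space; it is not part of the derivation itself.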