A theoretical analysis of Model-Based Temporal
Difference Learning for Control is given, leading to a proof of
convergence. This work differs from earlier work on the convergence
of Temporal Difference Learning by proving convergence
to the optimal value function. This means that it is not the values of
the current policy that are found, but instead the policy
...