Two Novel On-policy Reinforcement Learning Algorithms based on TD(lambda)-methods

Wiering, M.A.; Hasselt, H. van

(2007) Proceedings of IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL)

(Article in proceedings)

Abstract

This paper describes two novel on-policy reinforcement learning algorithms, named QV(lambda)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(lambda)-methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning ...
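As a rough illustration of the QV-learning idea described in the abstract, the sketch below shows one possible tabular update: the state values V are learned with TD(lambda), and the Q-values are moved toward a target built from V rather than from Q itself. The tabular representation, the accumulating eligibility traces, the function name `qv_lambda_step`, and the parameters `alpha`, `beta`, `gamma`, and `lam` are all assumptions made for this example and are not taken from the paper. ACLA is left out because its description is truncated above.

```python
def qv_lambda_step(V, Q, traces, s, a, r, s_next, done,
                   alpha=0.1, beta=0.1, gamma=0.9, lam=0.8):
    """One hypothetical tabular update for a transition (s, a, r, s_next).

    V:      list of state values, updated in place
    Q:      list of lists, Q[s][a], updated in place
    traces: list of eligibility traces for V, updated in place
    """
    # Bootstrapped target that uses the state value function only.
    target = r + (0.0 if done else gamma * V[s_next])
    # TD error for the state value function.
    delta = target - V[s]

    # Decay all traces, then bump the trace of the visited state
    # (accumulating traces, an assumption of this sketch).
    for i in range(len(traces)):
        traces[i] *= gamma * lam
    traces[s] += 1.0

    # TD(lambda) update of every state value in proportion to its trace.
    for i in range(len(V)):
        V[i] += alpha * delta * traces[i]

    # Q-learning-like update, but the target comes from V instead of max Q.
    Q[s][a] += beta * (target - Q[s][a])

    # Traces do not carry over between episodes.
    if done:
        for i in range(len(traces)):
            traces[i] = 0.0


# Tiny usage example: 3 states, 2 actions, one transition with reward 1.
V = [0.0, 0.0, 0.0]
Q = [[0.0, 0.0] for _ in range(3)]
traces = [0.0, 0.0, 0.0]
qv_lambda_step(V, Q, traces, s=0, a=1, r=1.0, s_next=1, done=False)
```

Because the Q-value target depends only on the reward and V, the update does not take a maximum over next-state actions, which is the sense in which this sketch stays on-policy.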
