Abstract
This paper describes two novel on-policy reinforcement
learning algorithms, named QV(lambda)-learning and the actor-critic
learning automaton (ACLA). Both algorithms learn a state
value function using TD(lambda) methods. The difference between the
algorithms is that QV-learning uses the learned value function
and a form of Q-learning to learn Q-values, whereas ACLA uses
the value function and a learning
...
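
To make the QV-learning half of the description concrete, the following is a minimal tabular sketch, not the authors' implementation. It assumes accumulating eligibility traces for the TD(lambda) value update and a one-step Q-update whose target is built from the learned state values. The function name `qv_lambda_step`, the step sizes `alpha` and `beta`, the discount `gamma`, the trace decay `lam`, and the dictionary-based tables are all illustrative assumptions that do not appear in the abstract. ACLA is not sketched because its update rule is cut off in the text above.

```python
from collections import defaultdict


def qv_lambda_step(V, Q, traces, s, a, r, s_next, done,
                   alpha=0.1, beta=0.1, gamma=0.9, lam=0.8):
    """One tabular QV(lambda)-style update for a transition (s, a, r, s_next).

    Assumptions (not from the abstract): accumulating traces, tabular
    storage, and hypothetical hyperparameter names and defaults.
    """
    # Bootstrapped target built from the state value function V.
    v_next = 0.0 if done else V[s_next]
    target = r + gamma * v_next

    # Q-values move toward the V-based target (no max over next actions).
    Q[(s, a)] += alpha * (target - Q[(s, a)])

    # TD error for the state value function.
    delta = target - V[s]

    # Decay all traces, then increment the trace of the current state.
    for state in list(traces):
        traces[state] *= gamma * lam
    traces[s] += 1.0

    # TD(lambda): every recently visited state shares in the TD error.
    for state, e in traces.items():
        V[state] += beta * delta * e

    # Traces do not carry over between episodes.
    if done:
        traces.clear()
    return delta


if __name__ == "__main__":
    V, Q, traces = defaultdict(float), defaultdict(float), defaultdict(float)
    qv_lambda_step(V, Q, traces, s=0, a=1, r=1.0, s_next=1, done=False)
    qv_lambda_step(V, Q, traces, s=1, a=0, r=2.0, s_next=2, done=True)
    print(dict(V), dict(Q))
```

Keeping separate step sizes for Q and V is a design choice of this sketch; an on-policy agent would still need an exploration rule (for example, Boltzmann or epsilon-greedy selection over `Q`) to choose the action `a`.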