Abstract

Q(lambda)-learning uses TD(lambda)-methods to accelerate Q-learning. The update complexity of previous online Q(lambda)implementations based on lookup-tables is bounded by the size of the
state-action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until
... read more