Abstract
Actor-critic deep reinforcement learning methods have demonstrated strong performance in many challenging decision-making and control tasks, but they also suffer from high sample complexity and overestimation bias. Current research focuses on using underestimation to offset overestimation and on reducing bias through ensemble learning, but these approaches introduce underestimation bias and excessive network costs. In
...