[Submitted on 7 Feb 2021 (v1), last revised 6 Apr 2022 (this version, v5)]

Title: Tactical Optimism and Pessimism for Deep Reinforcement Learning

Abstract: In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control. One of the primary drivers of this improved performance is the use of pessimistic value updates to address function approximation errors, which previously led to disappointing performance. However, a direct consequence of pessimism is reduced exploration, running counter to theoretical support for the efficacy of optimism in the face of uncertainty. So which approach is best? In this work, we show that the most effective degree of optimism can vary both across tasks and over the course of learning. Inspired by this insight, we introduce a novel deep actor-critic framework, Tactical Optimistic and Pessimistic (TOP) estimation, which switches between optimistic and pessimistic value learning online. This is achieved by formulating the selection as a multi-armed bandit problem. We show in a series of continuous control tasks that TOP outperforms existing methods that rely on a fixed degree of optimism, setting a new state of the art in challenging pixel-based environments. Since our changes are simple to implement, we believe these insights can easily be incorporated into a multitude of off-policy algorithms.
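
To make the bandit formulation concrete, here is a minimal sketch of choosing a degree of optimism online. It is not the authors' implementation: the abstract does not state which bandit algorithm, arm values, or feedback signal are used. The sketch assumes an exponential-weights bandit with importance-weighted updates, two hypothetical arms (`-1.0` for a pessimistic target, `0.0` for a neutral one), and a scalar feedback such as the change in episodic return. The names `OptimismBandit` and `value_target` are illustrative only.

```python
import math
import random


class OptimismBandit:
    """Exponential-weights bandit over a discrete set of optimism levels.

    Assumption: the arm values, the learning rate, and the feedback signal
    are illustrative choices, not taken from the paper.
    """

    def __init__(self, arms=(-1.0, 0.0), learning_rate=0.1, seed=0):
        self.arms = list(arms)
        self.learning_rate = learning_rate
        self.log_weights = [0.0] * len(self.arms)
        self.rng = random.Random(seed)
        self.last_index = None

    def probabilities(self):
        # Softmax over log-weights, shifted by the max for numerical stability.
        top = max(self.log_weights)
        exps = [math.exp(w - top) for w in self.log_weights]
        total = sum(exps)
        return [e / total for e in exps]

    def select(self):
        # Sample an arm index according to the current distribution.
        probs = self.probabilities()
        indices = range(len(self.arms))
        self.last_index = self.rng.choices(indices, weights=probs)[0]
        return self.arms[self.last_index]

    def update(self, feedback):
        # Importance-weighted update for the arm that was actually played.
        if self.last_index is None:
            raise RuntimeError("call select() before update()")
        probs = self.probabilities()
        i = self.last_index
        self.log_weights[i] += self.learning_rate * feedback / probs[i]


def value_target(q_mean, q_std, beta):
    """Shift a value estimate by beta times its uncertainty.

    beta < 0 gives a pessimistic target, beta > 0 an optimistic one.
    """
    return q_mean + beta * q_std


if __name__ == "__main__":
    bandit = OptimismBandit()
    previous_return = 0.0
    for episode in range(5):
        beta = bandit.select()
        # Placeholder for running one episode with targets built from beta.
        episode_return = previous_return + (1.0 if beta == 0.0 else 0.2)
        bandit.update(episode_return - previous_return)
        previous_return = episode_return
    print(bandit.probabilities())
```

In this toy loop the feedback is the improvement in return between consecutive episodes, so arms that coincide with larger improvements gain probability mass. A fixed-optimism baseline corresponds to a bandit with a single arm.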

Submission history

From: Theodore Moskovitz
[v1] Sun, 7 Feb 2021 10:28:09 UTC (1,940 KB)
[v2] Tue, 9 Feb 2021 09:29:55 UTC (1,940 KB)
[v3] Mon, 31 May 2021 20:42:19 UTC (9,070 KB)
[v4] Fri, 14 Jan 2022 11:44:29 UTC (3,995 KB)
[v5] Wed, 6 Apr 2022 12:03:06 UTC (3,995 KB)
Paper: Tactical Optimism and Pessimism for Deep Reinforcement Learning, by Ted Moskovitz and 4 other authors
Current browse context: cs.LG
