RUDDER: Return Decomposition for Delayed Rewards (14 Mar 2021)
Elastic Weight Consolidation (21 Aug 2019)
MCMC Methods for Posterior Approximation (02 Apr 2019)
Quantum TD Learning (22 Aug 2018)
Proximal Policy Optimization (13 Jul 2018)
Safe, Multi Agent RL for Autonomous Driving (01 Jul 2018)
Trust Region Policy Optimization (10 Jun 2018)
Deep Deterministic Policy Gradient (15 May 2018)
UCB1, Multi Armed Bandits and Regret (22 Sep 2017)