Rainbow: Combining Improvements in Deep Reinforcement Learning, Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver, 2018. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32 (Association for the Advancement of Artificial Intelligence). DOI: 10.1609/aaai.v32i1.11796 - Presents the complete Rainbow DQN agent, detailing the integration of six extensions to DQN and the performance contribution of each.
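As one concrete example of how Rainbow's components interlock, the sketch below shows Double-Q action selection (see the van Hasselt et al. entry below) applied to the expected values of the return distributions from distributional RL (see the Bellemare et al. entry). This is a minimal NumPy sketch under assumed array shapes and names, not code from the paper.

```python
import numpy as np

def rainbow_action_selection(next_dist_online, z):
    """One integration point in Rainbow: the online network's predicted
    return distributions are reduced to expected Q-values, and the greedy
    action is chosen from those (Double-Q selection); the target network's
    distribution for that action is then used in the Bellman projection.

    next_dist_online: (batch, actions, atoms) probabilities; z: (atoms,) support.
    """
    expected_q = (next_dist_online * z).sum(axis=-1)  # (batch, actions)
    return expected_q.argmax(axis=1)                  # (batch,) greedy actions
```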
A Distributional Perspective on Reinforcement Learning, Marc G. Bellemare, Will Dabney, Rémi Munos, 2017. Proceedings of the 34th International Conference on Machine Learning, Vol. 70 (PMLR) - Introduces distributional reinforcement learning and the C51 algorithm, a key component for modeling return distributions in Rainbow.
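To make the cited idea concrete, here is a minimal NumPy sketch of the categorical (C51) Bellman projection described in this paper; the function name, array shapes, and default hyperparameters (51 atoms on [-10, 10], as used for Atari) are illustrative assumptions, not code from the paper.

```python
import numpy as np

def project_distribution(next_probs, rewards, dones, gamma=0.99,
                         v_min=-10.0, v_max=10.0, n_atoms=51):
    """Categorical Bellman projection: shift the fixed support by
    r + gamma * z, clip to [v_min, v_max], and split each atom's
    probability mass between its two nearest atoms on the fixed grid.

    next_probs: (batch, n_atoms) next-state distribution for the chosen action.
    """
    batch = rewards.shape[0]
    z = np.linspace(v_min, v_max, n_atoms)
    delta_z = (v_max - v_min) / (n_atoms - 1)
    # Shifted, clipped support per transition; terminal states keep only r.
    tz = np.clip(rewards[:, None] + gamma * (1.0 - dones[:, None]) * z,
                 v_min, v_max)
    b = (tz - v_min) / delta_z                     # fractional atom index
    lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)
    target = np.zeros((batch, n_atoms))
    for i in range(batch):
        for j in range(n_atoms):
            if lower[i, j] == upper[i, j]:         # mass lands exactly on an atom
                target[i, lower[i, j]] += next_probs[i, j]
            else:                                   # split mass between neighbors
                target[i, lower[i, j]] += next_probs[i, j] * (upper[i, j] - b[i, j])
                target[i, upper[i, j]] += next_probs[i, j] * (b[i, j] - lower[i, j])
    return target
```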
Prioritized Experience Replay, Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, 2016. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1511.05952 - Describes prioritized experience replay, which replays high-error transitions more frequently to improve sample efficiency; in Rainbow, priorities are derived from the distributional loss.
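A minimal sketch of the paper's proportional prioritization variant, assuming NumPy and a flat priority array rather than the sum-tree used in practice for efficiency; the names and defaults (alpha=0.6, beta=0.4) are illustrative.

```python
import numpy as np

def sample_proportional(priorities, batch_size, alpha=0.6, beta=0.4):
    """Proportional prioritized sampling: P(i) = p_i^alpha / sum_k p_k^alpha,
    with importance-sampling weights w_i = (N * P(i))^-beta that correct
    the bias introduced by non-uniform replay (beta is annealed to 1 in
    the paper)."""
    probs = priorities.astype(float) ** alpha
    probs /= probs.sum()
    idx = np.random.choice(len(priorities), size=batch_size, p=probs)
    weights = (len(priorities) * probs[idx]) ** (-beta)
    weights /= weights.max()  # normalize by max weight, as in the paper
    return idx, weights
```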
Dueling Network Architectures for Deep Reinforcement Learning, Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas, 2016. Proceedings of The 33rd International Conference on Machine Learning, Vol. 48 (PMLR) - Introduces the dueling network architecture, which separates state-value and action-advantage estimation and adapts naturally to distributional settings.
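The core aggregation step of the dueling head can be sketched in a few lines of NumPy; the shapes and names here are assumptions for illustration, not the paper's code.

```python
import numpy as np

def dueling_q_values(value, advantages):
    """Combine the value and advantage streams of a dueling head:
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    value: (batch,) state values; advantages: (batch, actions).
    Subtracting the mean advantage makes the decomposition identifiable,
    since V and A are otherwise only determined up to a constant.
    """
    return value[:, None] + advantages - advantages.mean(axis=1, keepdims=True)
```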
Deep Reinforcement Learning with Double Q-learning, Hado van Hasselt, Arthur Guez, David Silver, 2016. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 30 (Association for the Advancement of Artificial Intelligence). DOI: 10.1609/aaai.v30i1.10295 - Presents Double DQN, a method to mitigate overestimation bias in Q-value estimation, integrated into Rainbow.
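A minimal NumPy sketch of the Double DQN target computation described in the paper; the function and argument names are illustrative assumptions.

```python
import numpy as np

def double_dqn_target(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target: the online network selects the greedy action,
    the target network evaluates it, decoupling selection from evaluation
    to reduce overestimation bias.

    next_q_online, next_q_target: (batch, actions) Q-value estimates.
    """
    best_actions = np.argmax(next_q_online, axis=1)
    next_values = next_q_target[np.arange(len(best_actions)), best_actions]
    # Bootstrapped target; terminal transitions carry no future value.
    return rewards + gamma * (1.0 - dones) * next_values
```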