Centralized Training with Decentralized Execution (CTDE)
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch, 2017. Advances in Neural Information Processing Systems, Vol. 30. DOI: 10.48550/arXiv.1706.02275 - This foundational paper introduces Multi-Agent Deep Deterministic Policy Gradient (MADDPG), a widely used algorithm that showcases the CTDE approach in mixed cooperative-competitive settings: actors execute on local observations only, while critics are trained with access to all agents' observations and actions (see the first sketch after this list).
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson, 2018. Proceedings of the 35th International Conference on Machine Learning (ICML). DOI: 10.48550/arXiv.1803.11485 - This paper presents QMIX, a prominent CTDE algorithm for cooperative MARL that uses a centralized mixing network to factorize the joint action-value function monotonically, so that decentralized greedy execution stays consistent with the centralized training objective (see the second sketch after this list).
Value-Decomposition Networks For Cooperative Multi-Agent Reinforcement Learning, Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel, 2017. arXiv preprint arXiv:1706.05296. DOI: 10.48550/arXiv.1706.05296 - This work introduces Value-Decomposition Networks (VDN), an influential early CTDE approach that factorizes the total Q-value into a sum of individual agent Q-values for cooperative multi-agent tasks, laying the foundation for subsequent methods such as QMIX (see the third sketch after this list).
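The CTDE split that MADDPG exemplifies can be summarized in a few lines: each actor conditions only on its own observation at execution time, while a critic used only during training conditions on every agent's observation and action. The PyTorch sketch below illustrates this structure; the two-agent setup, observation/action sizes, and layer widths are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized actor: sees only this agent's own observation."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),  # deterministic continuous action
        )

    def forward(self, obs):
        return self.net(obs)

class CentralizedCritic(nn.Module):
    """Centralized critic: sees ALL agents' observations and actions.
    Used only during training, so execution remains decentralized."""
    def __init__(self, joint_obs_dim, joint_act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# Hypothetical sizes, for illustration only.
obs_dims, act_dims = [8, 8], [2, 2]
actors = [Actor(o, a) for o, a in zip(obs_dims, act_dims)]
critic = CentralizedCritic(sum(obs_dims), sum(act_dims))

obs = [torch.randn(4, d) for d in obs_dims]          # batch of 4 per agent
acts = [actor(o) for actor, o in zip(actors, obs)]   # decentralized execution
q = critic(torch.cat(obs, dim=-1), torch.cat(acts, dim=-1))  # centralized training signal
print(q.shape)  # torch.Size([4, 1])
```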
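QMIX's key constraint is monotonicity: a mixing network combines per-agent utilities into Q_tot using weights generated by state-conditioned hypernetworks and forced non-negative (here via abs()), so that each agent maximizing its own Q-value also maximizes Q_tot. The sketch below follows the paper's general design, but the embedding size and layer shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QMixer(nn.Module):
    """Monotonic mixing network: dQ_tot/dQ_i >= 0 by construction."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks: the global state generates the mixer's weights/biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        b = agent_qs.size(0)
        # abs() makes the mixing weights non-negative, enforcing monotonicity.
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # Q_tot

mixer = QMixer(n_agents=3, state_dim=16)
q_tot = mixer(torch.randn(4, 3), torch.randn(4, 16))  # batch of 4
print(q_tot.shape)  # torch.Size([4, 1])
```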
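VDN's factorization is the simplest of the three: Q_tot is the plain sum of per-agent utilities, so each agent can act greedily on its own Q_i with no mixing parameters at all. A minimal sketch, with assumed network and problem sizes:

```python
import torch
import torch.nn as nn

class AgentQNet(nn.Module):
    """Per-agent utility network over local observations (discrete actions)."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        return self.net(obs)

# Hypothetical sizes, for illustration only.
n_agents, obs_dim, n_actions = 3, 8, 5
agents = [AgentQNet(obs_dim, n_actions) for _ in range(n_agents)]

obs = torch.randn(4, n_agents, obs_dim)                       # batch of 4
qs = torch.stack([agents[i](obs[:, i]) for i in range(n_agents)], dim=1)
chosen = qs.argmax(dim=-1)                                    # greedy decentralized actions
q_tot = qs.max(dim=-1).values.sum(dim=1)                      # VDN: Q_tot = sum_i Q_i
print(chosen.shape, q_tot.shape)  # torch.Size([4, 3]) torch.Size([4])
```

Because the sum is trivially monotonic in each Q_i, the per-agent argmax equals the joint argmax, which is exactly the property QMIX later generalizes with its learned monotonic mixer.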