Generative Adversarial Networks, Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, 2014. Advances in Neural Information Processing Systems, Vol. 27 (NeurIPS) - The foundational paper introducing GANs, defining the minimax objective, and establishing its connection to the Jensen-Shannon divergence, both of which underlie the non-convergence problems discussed in the section.
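For reference, the minimax objective defined in the paper, with generator $G$, discriminator $D$, data distribution $p_{\mathrm{data}}$, and noise prior $p_z$:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
$$

Under the optimal discriminator this reduces to $C(G) = 2\,\mathrm{JSD}(p_{\mathrm{data}} \,\|\, p_g) - \log 4$, where $p_g$ is the generator's output distribution; the JSD saturates at $\log 2$ when the two distributions barely overlap, which is one source of the vanishing gradients and non-convergence discussed in the section.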
Wasserstein GAN, Martin Arjovsky, Soumith Chintala, Léon Bottou, 2017. Proceedings of the 34th International Conference on Machine Learning (ICML), Vol. 70 - Introduces the Wasserstein distance as an alternative to the JSD, directly addressing the vanishing gradient problem when distributions have negligible overlap, a key challenge detailed in the section.
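For contrast with the JSD, the Wasserstein-1 (Earth Mover's) distance between the real distribution $p_r$ and the generated distribution $p_g$, together with the Kantorovich-Rubinstein dual form that WGAN optimizes over 1-Lipschitz critics $f$ (notation follows the paper):

$$
W(p_r, p_g) = \inf_{\gamma \in \Pi(p_r, p_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\|x - y\|\right] = \sup_{\|f\|_L \leq 1} \; \mathbb{E}_{x \sim p_r}[f(x)] - \mathbb{E}_{x \sim p_g}[f(x)]
$$

Unlike the JSD, $W$ varies smoothly even when $p_r$ and $p_g$ have disjoint supports, so the critic can keep supplying informative gradients to the generator throughout training.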
Improved Techniques for Training GANs, Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen, 2016. Advances in Neural Information Processing Systems, Vol. 29. DOI: 10.48550/arXiv.1606.03498 - Discusses common training instabilities in GANs and proposes several practical heuristics, such as feature matching and minibatch discrimination, to stabilize training, implicitly highlighting the difficulties of non-convergence and mode collapse.
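One concrete example of the paper's heuristics is feature matching, which replaces the generator's usual adversarial loss with a moment-matching objective on the activations $f(\cdot)$ of an intermediate discriminator layer (notation adapted from the paper):

$$
\mathcal{L}_{\mathrm{FM}} = \left\| \mathbb{E}_{x \sim p_{\mathrm{data}}} f(x) - \mathbb{E}_{z \sim p_z} f(G(z)) \right\|_2^2
$$

Matching statistics of the real data, rather than directly maximizing the discriminator's confusion, gives the generator a more stable target and mitigates mode collapse.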
Which Training Methods for GANs Do Actually Converge?, Lars Mescheder, Andreas Geiger, Sebastian Nowozin, 2018. Proceedings of the 35th International Conference on Machine Learning (ICML), Vol. 80 (PMLR) - Provides a theoretical analysis of GAN training dynamics, shedding light on the difficulty of finding saddle points and on the oscillatory behavior encountered with alternating gradient updates.
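As a minimal sketch of the dynamics analyzed there, alternating (or simultaneous) gradient updates on a value function $V(\theta, \phi)$, with generator parameters $\theta$, discriminator parameters $\phi$, and learning rate $\alpha$ (generic notation, not the paper's):

$$
\theta_{t+1} = \theta_t - \alpha \, \nabla_\theta V(\theta_t, \phi_t), \qquad \phi_{t+1} = \phi_t + \alpha \, \nabla_\phi V(\theta_t, \phi_t)
$$

In simple examples such as the paper's Dirac-GAN, the Jacobian of this update's vector field has purely imaginary eigenvalues at the equilibrium, so the iterates orbit the saddle point rather than converge, which is the oscillatory behavior the entry refers to.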