Distilling the Knowledge in a Neural Network, Geoffrey Hinton, Oriol Vinyals, Jeff Dean, 2015. arXiv preprint arXiv:1503.02531. DOI: 10.48550/arXiv.1503.02531. - The foundational paper introducing knowledge distillation, providing the basis for soft targets, which are relevant to discussions of diversity and mode collapse.
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer, 2015. Advances in Neural Information Processing Systems, Vol. 28 (Neural Information Processing Systems Foundation). - Introduces scheduled sampling, a technique to mitigate exposure bias in sequence generation models by gradually using the model's own outputs during training.
Sequence-Level Training with Recurrent Neural Networks, Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba, 2015. arXiv preprint arXiv:1511.06732. DOI: 10.48550/arXiv.1511.06732. - Explores training sequence generation models using reinforcement learning at the sequence level, directly addressing error propagation and the training-inference mismatch.