Denoising Diffusion Probabilistic Models, Jonathan Ho, Ajay Jain, Pieter Abbeel, 2020. Advances in Neural Information Processing Systems, Vol. 33 (Curran Associates, Inc.). DOI: 10.5555/3455702.3455871 - This seminal paper introduced the Denoising Diffusion Probabilistic Models (DDPM) framework, detailing the forward and reverse processes and the simplified training objective, and established the foundation for later diffusion models.
Denoising Diffusion Implicit Models, Jiaming Song, Chenlin Meng, Stefano Ermon, 2021. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2010.02502 - This paper introduced Denoising Diffusion Implicit Models (DDIM), a sampling method that runs inference in far fewer steps while maintaining generation quality, a critical contribution for applications that require efficient sampling.
DiffWave: A Versatile Diffusion Model for Audio Synthesis, Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro, 2021. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2009.09761 - This work was among the first to apply diffusion models successfully to high-fidelity audio waveform generation. By adapting the DDPM framework to 1D audio signals, it demonstrated the approach's promise for vocoding and general audio synthesis.
ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech, Rongjie Huang, Zhou Zhao, Huadai Liu, Jinglin Liu, Chenye Cui, Yi Ren, 2022. Proceedings of the 30th ACM International Conference on Multimedia (ACM). DOI: 10.48550/arXiv.2207.05831 - This research proposed ProDiff, a diffusion-based acoustic model for high-quality text-to-speech that generates mel-spectrograms from text. It accelerates inference by predicting clean data directly and by using knowledge distillation to reduce the number of sampling steps, addressing a key challenge of diffusion models in speech synthesis.