WaveNet: A Generative Model for Raw Audio, Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu, 2016. Proc. INTERSPEECH (ISCA). DOI: 10.21437/Interspeech.2016-169 - Foundational paper on autoregressive neural vocoders; describes the dilated causal convolutions used for high-fidelity audio generation.
WaveGlow: A Flow-based Generative Network for Speech Synthesis, Ryan Prenger, Rafael Valle, Bryan Catanzaro, 2019. Proceedings of Interspeech (ISCA). DOI: 10.21437/Interspeech.2019-2022 - Introduces a flow-based generative network for speech synthesis, a key example of a non-autoregressive vocoder built on normalizing flows.