Overview of Speech Processing Toolkits (ESPnet, NeMo, Coqui)
ESPnet: End-to-End Speech Processing Toolkit, Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, Tsubasa Ochiai, 2018. Interspeech 2018. DOI: 10.21437/Interspeech.2018-1456 - Introduces the ESPnet toolkit, detailing its end-to-end architecture and its use of Kaldi-style recipes for reproducible speech processing.
NeMo: a Toolkit for Conversational AI, Oleksii Kuchaiev, Vitaly Lavrukhin, Boris Ginsburg, Jason Li, Hakan Erdogan, Brian Price, Congzhou Shen, Jianyu Chen, Rakesh Chada, Sameer Badaskar, Jeremy Watts, Sean Narenthiran, Kjell Petersen, Yingtao Mao, Mariya Shmatova, Rafael Mosquera, Marco Matassoni, Michael Katz, Andrew Kelleher, Maciej Ziemba, Misha Khan, Vashishth Sureka, Yang Zhang, Jonathan Safron, Ben Vaisberg, Josh Meyer, Mike Le Beau, Jonathan Rodriguez, Andrii Trush, 2020. Interspeech (ISCA, International Speech Communication Association). DOI: 10.21437/Interspeech.2020-2525 - Presents the NeMo toolkit, emphasizing its modular design, its capabilities for building conversational AI models, and its strong GPU optimization.