Fundamentals of Speech Recognition, Lawrence R. Rabiner, Biing-Hwang Juang, 1993 (Prentice Hall) - A classic textbook providing comprehensive coverage of traditional speech recognition techniques, including the detailed theory and application of MFCCs with GMM-HMM systems.
Convolutional Neural Networks for Speech Recognition, Osama Abdel-Hamid, Abdel-Rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 22 (IEEE). DOI: 10.1109/TASLP.2014.2339736 - A seminal paper demonstrating the effectiveness of convolutional neural networks (CNNs) in ASR. CNN-based systems commonly take log-mel spectrograms as input, and the paper highlights the CNN's ability to learn from local patterns in them.
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky and James H. Martin, 2025 - An authoritative, continuously updated online textbook covering modern speech and language processing, including feature extraction (MFCCs and spectrograms) for deep learning-based ASR.