Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook covering the probability, information theory, and optimization concepts essential for understanding deep learning models, including autoencoders. Chapters 3, 4, and 5 are particularly relevant.
Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006 (Springer) DOI: 10.1007/978-0-387-45528-0 - A classic textbook offering a probabilistic perspective on machine learning, with detailed sections on probability theory and information theory that are important for generative models like VAEs.
Machine Learning Course Notes (CS229), Andrew Ng, Tengyu Ma, 2023 (Stanford University) - Provides detailed course notes on the mathematical foundations of machine learning, including probability, optimization algorithms, and relevant statistical concepts.