GloVe: Global Vectors for Word Representation, Jeffrey Pennington, Richard Socher, Christopher Manning, 2014. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics). DOI: 10.3115/v1/D14-1162 - Presents GloVe, an unsupervised learning algorithm for obtaining vector representations for words, combining global matrix factorization and local context window methods.
Visualizing Data using t-SNE, Laurens van der Maaten and Geoffrey Hinton, 2008. Journal of Machine Learning Research, Vol. 9 - The original paper introducing t-Distributed Stochastic Neighbor Embedding (t-SNE), a non-linear dimensionality reduction technique widely used for visualization.
Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006 (Springer) - A standard machine learning textbook with a thorough treatment of dimensionality reduction techniques, including Principal Component Analysis (PCA), which is fundamental for visualizing high-dimensional data.