BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). DOI: 10.48550/arXiv.1810.04805 - Introduces the BERT model and the pre-train/fine-tune approach for NLP tasks.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NIPS 2017). DOI: 10.48550/arXiv.1706.03762 - Presents the Transformer architecture, the core building block of the Large Language Models discussed in this section.
Hugging Face Transformers Documentation, Hugging Face, 2024. - Official documentation for the Hugging Face Transformers library, providing practical guidance for implementing fine-tuning procedures.