Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, arXiv. DOI: 10.48550/arXiv.1706.03762 - This foundational paper introduces the Transformer architecture that underpins modern LLMs; because a Transformer attends only over a fixed-size input window, interaction with the model is stateless, with no explicit memory mechanism carried between calls.
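To make the fixed-window, stateless behaviour concrete, here is a minimal sketch of the scaled dot-product attention the paper defines, applied to one self-contained context window; the function and variable names (and the toy sizes) are illustrative, not taken from the paper or any particular library.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as defined in the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over keys
    return weights @ V                                     # weighted sum of values

# The model only ever sees the tokens packed into this fixed-size window;
# nothing persists between calls, hence the stateless interaction model.
seq_len, d_model = 8, 16                                   # illustrative sizes
x = np.random.randn(seq_len, d_model)                      # stand-in for token embeddings
out = scaled_dot_product_attention(x, x, x)                # self-attention over the window
print(out.shape)                                           # (8, 16)
```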
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, 2020, NeurIPS 2020. DOI: 10.48550/arXiv.2005.11401 - This paper introduces Retrieval-Augmented Generation (RAG), a method that addresses context window limitations by retrieving relevant information from a knowledge base to augment the LLM's input, thus extending its effective 'memory' for factual recall beyond the immediate conversation.
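A minimal sketch of the retrieve-then-augment loop that RAG describes, assuming a toy hashing-based embedding and an in-memory document store; names such as `embed`, `retrieve`, and `build_prompt` are illustrative, and a real system would use a learned dense retriever and a neural generator rather than these stand-ins.

```python
import numpy as np

# Toy corpus standing in for an external knowledge base.
DOCUMENTS = [
    "The Transformer was introduced in 2017.",
    "RAG augments a generator with retrieved passages.",
    "Context windows bound how much text an LLM sees at once.",
]

def embed(text: str) -> np.ndarray:
    """Hashing-based toy embedding; RAG itself uses a dense retriever (DPR)."""
    vec = np.zeros(64)
    for token in text.lower().split():
        vec[hash(token) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

DOC_VECTORS = np.stack([embed(d) for d in DOCUMENTS])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (dot product of unit vectors)."""
    scores = DOC_VECTORS @ embed(query)
    return [DOCUMENTS[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query: str) -> str:
    """Prepend retrieved passages so the generator's fixed context carries external facts."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What does RAG do?"))
```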