Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NIPS 2017). DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture, which underpins modern Large Language Models and their sequence processing, including the fixed-size input context over which its attention mechanisms operate.
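The paper's central operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The sketch below is a minimal NumPy rendering of that formula, not the authors' implementation; the shapes, variable names, and random inputs are illustrative assumptions.

```python
# Minimal sketch of scaled dot-product attention from the paper:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise similarities, scaled by sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V                              # weighted sum of values

# Illustrative usage: every query position attends to every key position,
# so compute and memory grow quadratically with sequence length -- one reason
# context windows are bounded in size.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```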
Tokenizers, OpenAI, 2024 - Explains how text is converted into tokens for Large Language Models, which is crucial for understanding how a context window's size is measured, and details the various tokenization schemes in use.
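To see concretely that a context window is measured in tokens rather than characters, here is a minimal sketch using tiktoken, OpenAI's open-source tokenizer library. The choice of the cl100k_base encoding is an assumption for illustration; the applicable encoding depends on the model.

```python
# Minimal sketch of tokenization and token counting with tiktoken.
# The encoding name "cl100k_base" is one published BPE scheme; which
# encoding applies in practice depends on the model being used.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Attention is all you need."
tokens = enc.encode(text)

print(tokens)              # list of integer token IDs
print(len(tokens))         # this count, not the character count, is what
                           # a model's context window limit is measured in
print(enc.decode(tokens))  # round-trips back to the original text
```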