Neural Machine Translation of Rare Words with Subword Units, Rico Sennrich, Barry Haddow, and Alexandra Birch, 2016. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Association for Computational Linguistics). DOI: 10.18653/v1/P16-1162 - Introduces Byte-Pair Encoding (BPE) for subword tokenization, a foundational method used in many modern language models like GPT-2, directly relevant to the subword examples discussed.
Tokenizers in the transformers library, Hugging Face team, 2024 - Official documentation for tokenizers within the Hugging Face transformers library, explaining how different tokenization algorithms are loaded and used, directly relevant to the Python code example.