Self-Attention with Relative Position Representations, Peter Shaw, Jakob Uszkoreit, Ashish Vaswani, 2018. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), Association for Computational Linguistics. DOI: 10.18653/v1/N18-2074 - The foundational paper introducing relative position representations into Transformer self-attention, detailing how both the attention scores and the value aggregation are modified (see the sketch after this list).
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NeurIPS 2017). DOI: 10.48550/arXiv.1706.03762 - The original paper that introduced the Transformer architecture, providing the fundamental context for all subsequent positional encoding variations.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu, 2019; published in the Journal of Machine Learning Research, Vol. 21, 2020. DOI: 10.48550/arXiv.1910.10683 - Describes the T5 model, which uses a different but related relative positional encoding scheme (learned scalar biases added to the attention logits, indexed by bucketed relative distances), offering practical insights into implementation in large models.
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky and James H. Martin, 2025 - A comprehensive textbook treatment of the Transformer architecture, including discussion of various positional encoding techniques such as relative position embeddings; Chapter 10 covers Transformers and large language models.
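For quick orientation, the modification described in Shaw et al. (2018) can be sketched as follows. This is a brief summary in (roughly) the paper's notation, not a substitute for its full formulation: x_i is the input at position i, W^Q, W^K, W^V are the usual projection matrices, alpha_ij are the softmax-normalized attention weights, and w^K, w^V are learned embeddings of the relative distance j - i, clipped to a maximum distance k.

```latex
% Attention logits: a relative-position embedding is added to the key term.
e_{ij} \;=\; \frac{x_i W^Q \,\bigl(x_j W^K + a^{K}_{ij}\bigr)^{\top}}{\sqrt{d_z}}

% Value aggregation: a second relative-position embedding is added to the value term.
z_i \;=\; \sum_{j} \alpha_{ij}\,\bigl(x_j W^V + a^{V}_{ij}\bigr)

% Both embeddings are indexed by the clipped relative distance between positions.
a^{K}_{ij} = w^{K}_{\mathrm{clip}(j-i,\,k)}, \qquad
a^{V}_{ij} = w^{V}_{\mathrm{clip}(j-i,\,k)}, \qquad
\mathrm{clip}(x, k) = \max\bigl(-k, \min(k, x)\bigr)
```

T5 (Raffel et al.) takes a simpler route to a similar end: instead of modifying the keys and values, it adds a learned scalar bias, indexed by a bucketed relative distance, directly to each attention logit.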