Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov, 2019. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL). DOI: 10.48550/arXiv.1901.02860 - The original paper introducing the Transformer-XL architecture and its relative positional encoding scheme, including detailed mathematical formulations and implementation strategies.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS) 30. DOI: 10.48550/arXiv.1706.03762 - The seminal paper that introduced the Transformer architecture and its original absolute sinusoidal positional encoding, providing background for subsequent relative encoding methods.