RoFormer: Enhanced Transformer with Rotary Position Embedding, Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu, 2021. arXiv preprint arXiv:2104.09864. DOI: 10.48550/arXiv.2104.09864 - This is the seminal paper introducing Rotary Position Embedding (RoPE), detailing its mathematical foundation and demonstrating its effectiveness in the Transformer architecture.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NIPS 2017). DOI: 10.48550/arXiv.1706.03762 - The foundational paper that introduced the Transformer architecture, providing essential background for understanding positional encoding mechanisms like RoPE.
Self-Attention with Relative Position Representations, Peter Shaw, Jakob Uszkoreit, Ashish Vaswani, 2018. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). DOI: 10.48550/arXiv.1803.02155 - This paper proposes an early method for incorporating relative position representations into self-attention, offering a point of comparison for RoPE's approach.
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov, 2019. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL). DOI: 10.48550/arXiv.1901.02860 - This work introduces a relative positional encoding scheme and segment-level recurrence, offering another perspective on handling sequence length and relative positions in Transformers.