Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (NIPS 2017), DOI: 10.48550/arXiv.1706.03762 - The original research paper that introduced the Transformer architecture and the Scaled Dot-Product Attention mechanism.
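For quick reference, the mechanism this paper defines (its Equation 1, restated here verbatim) is:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimension used for scaling.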
Natural Language Processing with Transformers, Lewis Tunstall, Leandro von Werra, Thomas Wolf, 2022 (O'Reilly Media) - A practical and comprehensive guide to Transformer models, including the underlying attention mechanisms.
MultiheadAttention - PyTorch documentation, PyTorch Core Team, 2024 (PyTorch Foundation) - Official documentation for PyTorch's torch.nn.MultiheadAttention module, relevant for understanding the practical implementation and parameters of attention layers.
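As a companion to that documentation entry, here is a minimal self-attention sketch using torch.nn.MultiheadAttention; the batch size, sequence length, and embedding dimension are illustrative assumptions, not values taken from the docs.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters (d_model=512, 8 heads matches the original paper's base model).
embed_dim, num_heads = 512, 8
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# Dummy input: (batch, sequence length, embedding dim) -- assumed shapes for this sketch.
x = torch.randn(2, 10, embed_dim)

# Self-attention: query, key, and value are all the same tensor.
attn_output, attn_weights = mha(x, x, x)
print(attn_output.shape)   # torch.Size([2, 10, 512])
print(attn_weights.shape)  # torch.Size([2, 10, 10]); weights averaged over heads by default
```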