Historical Context of Sequence Modeling

References

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Daniel Jurafsky and James H. Martin, 2025 (Pearson Education) - A comprehensive textbook covering foundational concepts in natural language processing, including N-gram models and their statistical estimation.
Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997 Neural Computation, Vol. 9, No. 8, pp. 1735–1780 (MIT Press) DOI: 10.1162/neco.1997.9.8.1735 - The original paper introducing Long Short-Term Memory (LSTM) networks, which were designed to address the vanishing gradient problem in RNNs.
Neural Machine Translation by Jointly Learning to Align and Translate, Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, 2015 International Conference on Learning Representations (ICLR) DOI: 10.48550/arXiv.1409.0473 - This paper introduced an attention mechanism for encoder-decoder models, allowing the decoder to focus on the relevant parts of the input sequence when generating each output token.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, 2017 Advances in Neural Information Processing Systems (NeurIPS), Vol. 30 DOI: 10.48550/arXiv.1706.03762 - The seminal paper that introduced the Transformer architecture, which dispenses with recurrence and convolutions in favor of attention mechanisms and enables significant parallelization in sequence modeling.