LangChain Documentation: Data Connection, LangChain Team, 2024 (LangChain) - The official resource for understanding how to use LangChain's document loaders and transformers for data ingestion and preprocessing.
Unstructured: Open-Source Library, Unstructured Team, 2022 (Unstructured) - Documents the unstructured library, which parses documents in many formats and extracts their elements.
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky and James H. Martin, 2025 - A comprehensive textbook covering fundamental concepts in natural language processing, including text processing, normalization, and tokenization.