Data Version Control (DVC) Documentation, Iterative, Inc., 2024 - The official documentation for Data Version Control (DVC), an open-source tool that integrates with Git to version datasets and machine learning models.
Data Management for Machine Learning: A Survey, Ce Zhang, Xin Luna Dong, and Anand P. Rajaraman, 2020, Proceedings of the VLDB Endowment, Vol. 13 (VLDB Endowment), DOI: 10.14778/3400735.3400736 - A survey of data management challenges and techniques in machine learning, covering data preparation, versioning, and lineage, which are central to reproducibility.
lakeFS Documentation, Treeverse, 2024 - The official documentation for lakeFS, an open-source tool that provides Git-like branching and versioning for data lakes, enabling atomic transactions and isolated environments for data experimentation.