Data Governance and Lineage in Distributed RAG Systems
The DAMA Guide to the Data Management Body of Knowledge (DAMA-DMBOK), DAMA International, 2017 (Technics Publications) - This comprehensive guide details data management, encompassing data governance, quality, and metadata management, providing a framework for large-scale data systems.
OpenLineage Specification, LF AI & Data Foundation, 2023 (LF AI & Data Foundation) - Defines an open standard for collecting and managing data lineage metadata from various data systems, making it suitable for complex distributed pipelines.
Data Governance for Machine Learning: A Survey, Shaghayegh Ebrahimi, Marinka Zitnik, Daniel F. M. S. de R. P. E, Peter F. E, 2020, ACM Computing Surveys, Vol. 53 (Association for Computing Machinery (ACM)), DOI: 10.1145/3375883 - A survey of challenges and solutions for data governance in machine learning systems, covering data quality, privacy, and explainability.
DataHub: A Metadata Platform for the Modern Data Stack, Shirshanka Das, John Ma, Pedro Silva, Andy Su, Bo Fu, Hichel Lammas, Kevin Liu, Mark Mamon, Mike Minami, Roy Xue, Sethu Raman, Yingjun Wu, David Lee, 2020, ACM SIGMOD Record, Vol. 49 (Association for Computing Machinery (ACM)), DOI: 10.1145/3444453.3444465 - Describes DataHub's architecture and features as a metadata platform, supporting data discovery, governance, and lineage in large-scale data environments.