C: Capture
Know where all your data comes from and where it lives. The first step toward clean data is a complete inventory of your data assets.
The C.A.T.C.H. Methodology
A systematic approach to identifying and documenting all your data sources.
Collect all sources
Identify and list every source of data in your organization, from spreadsheets and databases to cloud applications and IoT devices.
Aggregate formats
Group your data sources by format (e.g., CSV, JSON, PDF) to understand the diversity of your data landscape.
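As a rough illustration of collecting and grouping, an inventory can start as a list of simple records that is then grouped by format. This is only a sketch: the `DataSource` fields, the `group_by_format` helper, and the sample entries are hypothetical and not part of the methodology itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """One entry in the inventory (field names are illustrative)."""
    name: str
    fmt: str       # e.g. "CSV", "JSON", "PDF"
    location: str  # where the data lives


def group_by_format(sources):
    """Return a dict mapping each format to the sorted names of its sources."""
    groups = defaultdict(list)
    for source in sources:
        # Normalize case so "csv" and "CSV" land in the same group.
        groups[source.fmt.upper()].append(source.name)
    return {fmt: sorted(names) for fmt, names in groups.items()}


# Hypothetical inventory entries, for illustration only.
inventory = [
    DataSource("sales_export", "csv", "shared drive"),
    DataSource("crm_contacts", "JSON", "cloud application"),
    DataSource("sensor_readings", "json", "IoT gateway"),
    DataSource("vendor_contracts", "PDF", "document store"),
]

formats = group_by_format(inventory)
for fmt, names in sorted(formats.items()):
    print(f"{fmt}: {len(names)} source(s) -> {', '.join(names)}")
```

A flat list like this is usually enough to begin with; a spreadsheet or a data catalog can take over once the inventory grows.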
Trace origins
Understand the lineage of your data. Document where it's created, how it's transformed, and who is responsible for it.
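One possible way to document lineage is a small record per dataset that captures where it is created, who owns it, and each transformation it passes through. The `LineageRecord` class and the example values below are assumptions made for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Lineage notes for one dataset (hypothetical structure)."""
    dataset: str
    created_in: str   # system where the data originates
    owner: str        # person or team responsible for it
    transformations: list = field(default_factory=list)

    def add_step(self, description):
        """Append one transformation step and return self for chaining."""
        self.transformations.append(description)
        return self

    def summary(self):
        """Describe origin, owner, and the ordered transformation steps."""
        steps = " -> ".join(self.transformations) or "no transformations recorded"
        return (
            f"{self.dataset}: created in {self.created_in}, "
            f"owned by {self.owner}; {steps}"
        )


# Illustrative example values only.
record = LineageRecord("monthly_revenue", "billing system", "finance team")
record.add_step("currency normalized").add_step("aggregated by month")
print(record.summary())
```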
Connect systems
Map out how data flows between different systems to identify dependencies and potential bottlenecks.
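A data-flow map can be sketched as a list of source-to-destination pairs. From that list it is straightforward to derive each system's upstream dependencies and to flag heavily connected systems as candidate bottlenecks. The system names, the helper functions, and the `min_connections` threshold below are all hypothetical; a high connection count is only a hint that a system deserves a closer look.

```python
from collections import defaultdict

# Hypothetical flows: (source system, destination system).
flows = [
    ("web_shop", "data_warehouse"),
    ("crm", "data_warehouse"),
    ("billing", "data_warehouse"),
    ("data_warehouse", "reporting"),
    ("data_warehouse", "ml_platform"),
]


def upstream_of(flows):
    """Map each destination system to the set of systems feeding it directly."""
    deps = defaultdict(set)
    for src, dst in flows:
        deps[dst].add(src)
    return dict(deps)


def busiest_systems(flows, min_connections=3):
    """Return systems with at least min_connections inbound plus outbound flows."""
    counts = defaultdict(int)
    for src, dst in flows:
        counts[src] += 1
        counts[dst] += 1
    return sorted(name for name, n in counts.items() if n >= min_connections)


dependencies = upstream_of(flows)
candidates = busiest_systems(flows)
print("Feeds into reporting:", sorted(dependencies["reporting"]))
print("Candidate bottlenecks:", candidates)
```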
Highlight gaps
Identify missing data sources or gaps in your data collection processes that could impact your AI initiatives.
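Gap-finding can be approximated by comparing the sources an initiative needs against the sources already in the inventory. The sketch below assumes both lists exist as plain names; the `find_gaps` helper and the example names are hypothetical.

```python
def find_gaps(required, inventoried):
    """Return the required source names that are missing from the inventory."""
    return sorted(set(required) - set(inventoried))


# Illustrative names only.
inventoried = {"sales_export", "crm_contacts", "sensor_readings"}
required_for_initiative = {
    "sales_export",
    "crm_contacts",
    "support_tickets",
    "web_analytics",
}

missing = find_gaps(required_for_initiative, inventoried)
print("Missing sources:", missing)
```

A name-based comparison only catches sources that are absent altogether; gaps inside a source, such as missing fields or time periods, need a separate check.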
Next Step: Label
Once you have a complete inventory of your data, the next step is to organize and categorize it. Learn how to make your data understandable with the Label dimension.