C: Capture

Know where all your data comes from and where it lives. The first step to clean data is a complete inventory of your data assets.

The C.A.T.C.H. Methodology

A systematic approach to identifying and documenting all your data sources.

Collect all sources

Identify and list every source of data in your organization, from spreadsheets and databases to cloud applications and IoT devices.
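As a rough illustration, an inventory can start as a list of structured records rather than free-form notes. The sketch below assumes each source is described by a name, kind, location, and owner. The `DataSource` class, its fields, the `build_inventory` helper, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """One inventory entry (hypothetical fields for illustration)."""
    name: str      # short identifier for the source
    kind: str      # e.g. "spreadsheet", "database", "cloud app", "iot"
    location: str  # where the data lives: path, host, or service
    owner: str     # team or person responsible


def build_inventory(entries):
    """Drop exact duplicates and return the entries sorted by name."""
    return sorted(set(entries), key=lambda s: s.name)


# Sample entries are made up; note the duplicate "crm" record.
inventory = build_inventory([
    DataSource("sales_q1", "spreadsheet", "shared-drive/finance", "Finance"),
    DataSource("crm", "cloud app", "crm.example.com", "Sales"),
    DataSource("orders", "database", "db01/orders", "Engineering"),
    DataSource("crm", "cloud app", "crm.example.com", "Sales"),
])

for source in inventory:
    print(f"{source.name:<10} {source.kind:<12} {source.owner}")
```

A frozen dataclass is hashable, so identical records reported by different teams collapse into one entry.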

Aggregate formats

Group your data sources by format (e.g., CSV, JSON, PDF) to understand the diversity of your data landscape.
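One simple way to start grouping is by file extension. This is only a sketch: it assumes file-based sources, and the `group_by_format` helper and sample paths are hypothetical. Databases, APIs, and device streams have no extension, so for those you would likely record the format explicitly in the inventory instead.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_format(paths):
    """Group paths by lower-cased extension; files without one go under "(none)"."""
    groups = defaultdict(list)
    for p in paths:
        fmt = PurePath(p).suffix.lower().lstrip(".") or "(none)"
        groups[fmt].append(p)
    return dict(groups)


# Hypothetical file list; ".JSON" and ".json" end up in the same group.
files = [
    "exports/customers.csv",
    "api/orders.JSON",
    "contracts/acme.pdf",
    "exports/leads.csv",
    "logs/sensor_feed",
]

for fmt, members in sorted(group_by_format(files).items()):
    print(f"{fmt}: {len(members)}")
```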

Trace origins

Understand the lineage of your data. Document where it's created, how it's transformed, and who is responsible for it.
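Lineage can be captured as a small record per dataset that notes where it was created, what transformation produced it, who owns it, and which datasets it was built from. The sketch below uses a hypothetical `LineageEntry` class, a hypothetical `trace_origins` helper, and made-up datasets. It walks the upstream links back to the original sources and assumes the lineage contains no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class LineageEntry:
    """Where a dataset was created, how it was produced, and who owns it."""
    created_in: str       # system or process that produced the dataset
    transformation: str   # short description of the step applied
    owner: str            # accountable person or team
    upstream: list[str] = field(default_factory=list)  # input dataset names


# Hypothetical lineage records keyed by dataset name.
lineage = {
    "raw_orders": LineageEntry("web shop", "exported nightly", "Engineering"),
    "raw_customers": LineageEntry("CRM", "exported nightly", "Sales"),
    "clean_orders": LineageEntry("ETL job", "deduplicated, currency normalized",
                                 "Data team", ["raw_orders"]),
    "revenue_report": LineageEntry("BI tool", "joined and aggregated by month",
                                   "Finance", ["clean_orders", "raw_customers"]),
}


def trace_origins(name, lineage):
    """Return the root datasets (no upstream) that `name` ultimately derives from.

    Assumes the lineage graph is acyclic; a cycle would recurse forever.
    """
    entry = lineage[name]
    if not entry.upstream:
        return {name}
    roots = set()
    for parent in entry.upstream:
        roots |= trace_origins(parent, lineage)
    return roots


print(sorted(trace_origins("revenue_report", lineage)))
# prints ['raw_customers', 'raw_orders']
```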

Connect systems

Map out how data flows between different systems to identify dependencies and potential bottlenecks.
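A data-flow map can be treated as a directed graph: systems are nodes and each flow is an edge. The sketch below uses hypothetical system names. It finds every system that depends, directly or indirectly, on a given one, and flags the most heavily connected system as a candidate bottleneck. Counting connections is only a rough heuristic assumed here; a fuller analysis would probably also weigh data volume and timing.

```python
from collections import defaultdict

# Hypothetical flows as (source_system, target_system): data moves left to right.
flows = [
    ("CRM", "Data Warehouse"),
    ("Web Shop", "Data Warehouse"),
    ("ERP", "Data Warehouse"),
    ("Data Warehouse", "BI Dashboard"),
    ("Data Warehouse", "ML Pipeline"),
    ("Web Shop", "ERP"),
]

downstream = defaultdict(set)  # system -> systems it feeds directly
degree = defaultdict(int)      # system -> number of flows it touches
for src, dst in flows:
    downstream[src].add(dst)
    degree[src] += 1
    degree[dst] += 1


def affected_by(system):
    """All systems that directly or indirectly depend on `system`'s data."""
    seen, stack = set(), [system]
    while stack:
        for nxt in downstream[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Heuristic: the system touching the most flows is a candidate bottleneck.
bottleneck = max(degree, key=degree.get)
print("Candidate bottleneck:", bottleneck)
print("If ERP fails, affected:", sorted(affected_by("ERP")))
```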

Highlight gaps

Identify missing data sources or gaps in your data collection processes that could impact your AI initiatives.
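Gaps become visible once you compare what an initiative needs against what the inventory actually holds. In this sketch the `required` set, the `inventoried` mapping, the `find_gaps` helper, and the choice to treat a missing owner as a gap are all assumptions for illustration.

```python
# Hypothetical data needed by an AI initiative.
required = {"customer profiles", "order history", "support tickets", "web analytics"}

# Hypothetical inventory: source name -> metadata recorded during Capture.
inventoried = {
    "customer profiles": {"owner": "Sales", "location": "CRM"},
    "order history": {"owner": "Engineering", "location": "orders DB"},
    "web analytics": {"owner": None, "location": "analytics SaaS"},
}


def find_gaps(required, inventoried):
    """Return (required sources not inventoried, inventoried sources with no owner)."""
    missing = sorted(required - set(inventoried))
    unowned = sorted(name for name, meta in inventoried.items() if not meta.get("owner"))
    return missing, unowned


missing, unowned = find_gaps(required, inventoried)
print("Missing sources:", missing)
print("Sources without an owner:", unowned)
```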

Next Step: Label

Once you have a complete inventory of your data, the next step is to organize and categorize it. Learn how to make your data understandable with the Label dimension.
