E: Evaluate
Identify the 'messiness' in your data. Check for erroneous, missing, duplicate, and inconsistent values to understand the scope of your data quality issues.
The S.C.A.N. Methodology
A four-step process (Spot, Check, Assess, Note) for thoroughly examining your data to uncover quality issues.
Spot errors
Look for obvious errors in data entries and formats, such as typos, incorrect data types, or values that fall outside expected ranges.
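As a rough illustration of this step, the sketch below flags values with the wrong type, values outside an expected range, and impossible dates. The sample rows, the field names, and the 0 to 120 age range are assumptions made for the example, not part of any particular dataset.

```python
from datetime import datetime

# Hypothetical sample rows; field names and the 0-120 age range are assumptions.
rows = [
    {"id": "1", "age": "34", "signup": "2023-01-15"},
    {"id": "2", "age": "abc", "signup": "2023-02-30"},
    {"id": "3", "age": "212", "signup": "2023-03-01"},
]

def spot_errors(row):
    """Return a list of human-readable problems found in one row."""
    problems = []
    # Type check, then range check, on the age field.
    try:
        age = int(row["age"])
        if not 0 <= age <= 120:
            problems.append(f"age out of range: {age}")
    except ValueError:
        problems.append(f"age is not an integer: {row['age']!r}")
    # Format check: strptime rejects malformed and impossible dates.
    try:
        datetime.strptime(row["signup"], "%Y-%m-%d")
    except ValueError:
        problems.append(f"invalid date: {row['signup']!r}")
    return problems

# Map each row id to its problems, keeping only rows that have any.
errors = {}
for row in rows:
    found = spot_errors(row)
    if found:
        errors[row["id"]] = found

for row_id, found in errors.items():
    print(row_id, found)
```

Collecting problems per row, rather than stopping at the first one, gives a fuller picture of how widespread each kind of error is.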
Check completeness
Assess the completeness of your records and fields. Identify missing values and determine their potential impact on your analysis.
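One way to put a number on completeness is a per-field missing rate plus a list of incomplete records. The sketch below assumes that `None` and blank strings both count as missing; the records and field names are hypothetical.

```python
# Hypothetical records; treating None and blank strings as missing is an assumption.
records = [
    {"name": "Ada", "email": "ada@example.com", "phone": None},
    {"name": "Bob", "email": "", "phone": "555-0100"},
    {"name": "Cy", "email": "cy@example.com", "phone": "  "},
    {"name": "Di", "email": "di@example.com", "phone": "555-0101"},
]
fields = ["name", "email", "phone"]

def is_missing(value):
    """True for None and for strings that are empty or whitespace only."""
    return value is None or (isinstance(value, str) and value.strip() == "")

# Share of records in which each field is missing.
missing_rate = {
    field: sum(is_missing(r.get(field)) for r in records) / len(records)
    for field in fields
}

# Records with at least one missing field.
incomplete_rows = [
    r["name"] for r in records if any(is_missing(r.get(f)) for f in fields)
]

print(missing_rate)
print(incomplete_rows)
```

A field-level rate shows which columns are unreliable, while the row-level list shows how many records an analysis would lose if incomplete rows were dropped.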
Assess duplication
Identify and quantify duplicate records across your datasets. Duplicates inflate counts and totals, which skews analysis and leads to incorrect conclusions.
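A minimal sketch of quantifying duplicates follows. It assumes that two records are duplicates when their name and email match after trimming whitespace and ignoring case; the choice of key and the sample data are hypothetical and would differ by dataset.

```python
from collections import Counter

# Hypothetical customer records containing near-identical entries.
customers = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ada lovelace ", "email": "ADA@example.com"},
    {"name": "Bob Stone", "email": "bob@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
]

def dedup_key(record):
    """Build a comparison key: lowercase, with whitespace trimmed and collapsed."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)

counts = Counter(dedup_key(r) for r in customers)

# Keys that occur more than once, with their occurrence counts.
duplicate_groups = {key: n for key, n in counts.items() if n > 1}

# Rows beyond the first in each group are the surplus to review.
extra_rows = sum(n - 1 for n in duplicate_groups.values())
duplicate_share = extra_rows / len(customers)

print(duplicate_groups)
print(extra_rows, duplicate_share)
```

Comparing on a normalized key catches duplicates that differ only in case or spacing, which an exact row comparison would miss.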
Note inconsistencies
Document inconsistencies in naming conventions, units of measurement, and categorical values across different data sources.
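To document inconsistent categorical values, one option is to group the raw spellings that collapse to the same normalized form. The sketch below uses a hypothetical list of country values and an assumed normalization (ignore case, periods, and extra whitespace).

```python
from collections import defaultdict

# Hypothetical values observed in a "country" column across several sources.
observed = ["USA", "usa", "U.S.A.", "United States", "Canada", "canada ", "Mexico"]

def normalize(value):
    """Drop periods, ignore case, and collapse surrounding or repeated whitespace."""
    return " ".join(value.replace(".", "").casefold().split())

# Group every raw spelling under its normalized form.
variants = defaultdict(set)
for value in observed:
    variants[normalize(value)].add(value)

# Normalized forms that appear under more than one raw spelling.
inconsistent = {
    key: sorted(forms) for key, forms in variants.items() if len(forms) > 1
}

for key, forms in inconsistent.items():
    print(key, forms)
```

This only finds variants that differ in formatting. Synonyms such as "United States" and "USA" still land in separate groups and would need an explicit mapping table.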
Next Step: Align
After evaluating your data, the next step is to standardize it. Learn how to apply consistent formats and naming conventions with the Align dimension.