Data quality monitoring for Data Ingestion
Ensure that your data sources are pulled correctly
How often do you hear that data received from external sources is wrong, even though it used to be correct?
DQO allows you to monitor your ingestion tables. Detect schema changes, data format changes, missing data or inconsistent delays in the data delivery.
Data format and ranges
Detect data format and data range issues in source data before the data pipeline fails on the transformation steps.
Validity checks such as data format, not-null, data range, or uniqueness checks are defined for each source table. Data quality checks are executed after the source data is loaded into your ingestion tables, and any issues they raise are easy to understand. For example, if the order of columns in a CSV file has changed and the data was loaded into the wrong columns, data format or distinct count checks will detect it.
- Define data format and data range rules for source data
- Run data quality checks at the end of the data ingestion pipeline
- Detect data format issues by watching for unexpected changes to the distinct counts of column values
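The checks listed above can be sketched in plain Python. This is only an illustration of the idea, not DQO's API: the column names (`id`, `order_date`, `amount`, `country`), the date format, the amount range, and the 25% threshold are all assumptions chosen for the example.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed expected format of the order_date column


def is_valid_date(value):
    """Return True when the value parses with the expected date format."""
    try:
        datetime.strptime(value, DATE_FORMAT)
        return True
    except (TypeError, ValueError):
        return False


def validate_rows(rows, min_amount=0.0, max_amount=10_000.0):
    """Count not-null, uniqueness, format, and range violations in loaded rows."""
    issues = {"null_id": 0, "duplicate_id": 0, "bad_date": 0, "out_of_range": 0}
    seen_ids = set()
    for row in rows:
        row_id = row.get("id")
        if row_id is None:
            issues["null_id"] += 1
        elif row_id in seen_ids:
            issues["duplicate_id"] += 1
        else:
            seen_ids.add(row_id)
        if not is_valid_date(row.get("order_date")):
            issues["bad_date"] += 1
        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or not (min_amount <= amount <= max_amount):
            issues["out_of_range"] += 1
    return issues


def distinct_count(rows, column):
    """Number of distinct values found in one column."""
    return len({row.get(column) for row in rows})


def distinct_count_changed(previous, current, max_relative_change=0.25):
    """Flag a distinct count that moved more than the allowed fraction."""
    if previous == 0:
        return current != 0
    return abs(current - previous) / previous > max_relative_change
```

If a CSV load swaps `order_date` and `country`, `validate_rows` reports every row as `bad_date`, and the distinct count of `country` jumps because it now holds dates.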
Schema change detection
Monitor unexpected changes in the table schema that will affect downstream tables. For example, find out when a column size has been increased, which may prevent some rows from loading into downstream tables.
DQO captures the table schema when the table metadata is first retrieved. The column list is hashed to detect changes in the schema. In addition, column data types are compared each time schema checks are performed.
- Detect column data type changes
- Detect column count changes, added or removed columns
- Detect a change to the order of columns that may break SELECT * queries
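The hashing idea described above can be illustrated with a short Python sketch. It is not DQO's implementation: the choice of SHA-256, the `(name, type)` tuple representation of a schema, and the change labels are assumptions made for this example.

```python
import hashlib


def column_list_hash(columns):
    """Hash the ordered list of column names; any rename, add, drop, or reorder changes it."""
    names = [name for name, _ in columns]
    joined = "\x1f".join(names)  # unit separator avoids ambiguous concatenation
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def detect_schema_changes(baseline, current):
    """Compare a captured baseline schema with the current one.

    Both arguments are lists of (column_name, data_type) tuples in table order.
    Returns a list of human-readable change labels (hypothetical names).
    """
    changes = []
    if len(baseline) != len(current):
        changes.append("column_count_changed")

    baseline_names = {name for name, _ in baseline}
    current_names = {name for name, _ in current}
    if baseline_names != current_names:
        changes.append("columns_added_or_removed")
    elif column_list_hash(baseline) != column_list_hash(current):
        # Same set of columns but a different hash means the order changed.
        changes.append("column_order_changed")

    # Data types are compared column by column, independently of the hash.
    baseline_types = dict(baseline)
    for name, data_type in current:
        if name in baseline_types and baseline_types[name] != data_type:
            changes.append(f"type_changed:{name}")
    return changes
```

Treating a column size change such as `VARCHAR(50)` to `VARCHAR(100)` as a type change is a simplification of this sketch; a real check may track length and precision separately.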
Data delays and stale tables
Analyze the growth (increase in the number of rows) and lag of data (latest rows) to detect tables that have not been refreshed recently.
DQO can analyze the timeliness of your data to measure the average lag of data. When your table does not have a timestamp column, DQO can analyze row growth by measuring the number of rows each day and learning the average growth rate of the table.
- Detect tables that have not been refreshed recently
- Find out which tables are refreshed inconsistently from their data sources
- Detect tables that receive less data than the average
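Both approaches, lag measured from the latest timestamp and growth measured from daily row counts, can be sketched as follows. The function names, the one-day staleness limit, and the "half of average growth" threshold are assumptions for illustration, not DQO's actual rules.

```python
from datetime import datetime, timedelta
from statistics import mean


def data_lag(latest_row_timestamp, now):
    """Time elapsed since the most recent row in the table."""
    return now - latest_row_timestamp


def is_stale(latest_row_timestamp, now, max_lag=timedelta(days=1)):
    """A table is considered stale when its lag exceeds the allowed maximum."""
    return data_lag(latest_row_timestamp, now) > max_lag


def daily_growth(row_counts):
    """Row count increase between consecutive daily snapshots."""
    return [later - earlier for earlier, later in zip(row_counts, row_counts[1:])]


def low_growth_days(row_counts, min_fraction_of_average=0.5):
    """Indexes of snapshots whose growth fell below a fraction of the average growth.

    Works without any timestamp column: only daily row counts are needed.
    The average here includes the day being tested, which is a simplification.
    """
    growth = daily_growth(row_counts)
    if not growth:
        return []
    threshold = mean(growth) * min_fraction_of_average
    return [index + 1 for index, value in enumerate(growth) if value < threshold]
```

For daily counts of 1000, 1100, 1200, 1300, and 1310 rows, the last snapshot grew by only 10 rows against an average of 77.5, so `low_growth_days` returns its index.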
Ground Truth Checks
Compare the source data with other trusted sources to ensure that your database shows accurate data.
DQO can run accuracy checks that compare the table with real-world reference data. Just define the other table or a query that returns the same aggregated data (sum, count, etc.) grouped by the same business dimension. You can also compare the data with flat files loaded into the data quality database.
- Compare the data with real-world reference data
- Detect issues at a business-relevant dimension (date, country, city, department, state, etc.)
- Ensure that you really trust the source data and can prove it
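A comparison of this kind can be sketched in Python as an aggregate per business dimension checked against reference totals. The column names (`country`, `revenue`), the 1% tolerance, and the function names are hypothetical; in practice both sides would usually come from SQL queries rather than in-memory rows.

```python
from collections import defaultdict


def sum_by_dimension(rows, dimension, measure):
    """Aggregate a numeric measure per value of a business dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


def compare_with_reference(actual, reference, max_relative_diff=0.01):
    """Return dimension values whose totals differ from the reference.

    Each mismatch maps the dimension value to (actual_total, reference_total).
    A value missing on either side is reported with None for that side.
    """
    mismatches = {}
    for key in sorted(set(actual) | set(reference)):
        actual_total = actual.get(key)
        reference_total = reference.get(key)
        if actual_total is None or reference_total is None:
            mismatches[key] = (actual_total, reference_total)
            continue
        # Guard against division by zero when the reference total is 0.
        denominator = abs(reference_total) if reference_total != 0 else 1.0
        if abs(actual_total - reference_total) / denominator > max_relative_diff:
            mismatches[key] = (actual_total, reference_total)
    return mismatches
```

Grouping by the same dimension on both sides keeps a mismatch actionable: it points to a specific country or date rather than to the table as a whole.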