Last updated: July 22, 2025
How to Detect Data Accuracy Issues? Examples and Best Practices
Data accuracy checks compare an aggregated value in a tested table to the same aggregated value in a reference table to detect differences.
The accuracy checks in DQOps are configured in the accuracy
category of data quality checks.
How data accuracy checks work in DQOps
Data accuracy checks in DQOps run queries with aggregate expressions on two tables: the tested table and the reference table. Both tables must be in the same database.
DQOps stores the measure captured from the tested table in the sensor_readout
table
in an actual_value
column. The other measure captured from the reference table is stored in the expected_value
column.
Aggregate functions
DQOps uses the following list of aggregate functions in SQL queries used in the data quality sensors.
SQL aggregate function | Where to measure it | Description |
---|---|---|
COUNT(*) |
table-level | The number of rows. |
SUM(column) |
column-level | The aggregate sum of all values in a column. |
MIN(column) |
column-level | The minimum value in a column. |
MAX(column) |
column-level | The maximum value in a column. |
AVG(column) |
column-level | The average (mean) value in a column, which is equal to SUM(column) / COUNT(*) . |
COUNT(column) |
column-level | The count of not-null values in a column. Comparing the count of non-null values between up-stream and down-stream tables enables detection of data casting issues, when invalid values were changed to a null value. |
Compared tables
Data accuracy checks in DQOps are designed to compare tables that can be updated in the same transaction. In order to ensure that the comparison is accurate, DQOps runs two aggregate queries on both tables.
Data accuracy check that compares the number of rows in two tables will be demonstrated on two tables that are on the same data lineage. The analyzed table is a staging table, and the reference table is a table in the landing zone on the same database.
SQL queries
DQOps must run the following query on the analyzed table.
SELECT
COUNT(*) AS actual_value
FROM "your_postgresql_database"."staging"."staging_customers"" AS analyzed_table
A similar query must be run on the reference table in the landing zone.
DQOps merges both queries into a single query. A merged query is executed in a single transaction, and a second round-trip to the database is avoided.
SELECT
(SELECT
COUNT(*)
FROM your_postgresql_database.landing.landing_customers AS referenced_table
) AS expected_value,
COUNT(*) AS actual_value
FROM "your_postgresql_database"."staging"."staging_customers"" AS analyzed_table
Configuring data accuracy checks
The configuration of data accuracy checks requires setting two types of parameters.
-
referenced_table
parameter for the data quality sensor, it is a fully qualified name of the reference table -
max_diff_percent
parameter(s) for the data quality rules, which are the alerting thresholds.
Configuring in YAML
An example DQOps YAML file is shown below.
Data quality rule parameters
The data accuracy checks support raising data quality issues at different severity levels. The rule parameter used in the accuracy checks is the maximum allowed error percentage between the measures captured on the analyzed and the reference tables.
DQOps uses the following default configuration.
-
A data quality issue at a
warning
severity level is raised when the error is > 0.0% (the measures differ even slightly) -
A data quality issue at an
error
severity level is raised when the error is > 1.0% -
A data quality issue at a
fatal
severity level is raised when the error is > 5.0%
Data accuracy vs comparison checks
DQOps also supports another type of data accuracy check, designed for comparing tables across data sources.
The table comparison checks are defined in a comparison
category. A table comparison also requires additional configuration.
Please read the manual for comparing tables between data sources in DQOps to learn more.
Table comparison checks also support comparing the tables using an additional GROUP BY clause, to compare groups of rows.
The following table describes the differences between the data quality checks in the accuracy
and the comparison
categories.
Category | Run in a single transaction | Compare data between data sources | Compare with grouping (GROUP BY) |
---|---|---|---|
accuracy |
|||
comparison |
List of accuracy checks at a table level
Data quality check name | Friendly name | Data quality dimension | Description | Standard check |
---|---|---|---|---|
total_row_count_match_percent | Maximum percentage of difference in total row count between tables | Accuracy | A table-level check that compares the row count of the current (tested) table with the row count of another table that is referenced. This check ensures that the difference between the row counts is below the maximum accepted percentage of difference. This check runs an SQL query with an INNER JOIN clause to join another (referenced) table that must be defined in the same database. |
Reference and samples
The full list of all data quality checks in this category is located in the table/accuracy reference. The reference section provides YAML code samples that are ready to copy-paste to the .dqotable.yaml files, the parameters reference, and samples of data source specific SQL queries generated by data quality sensors that are used by those checks.
List of accuracy checks at a column level
Data quality check name | Friendly name | Data quality dimension | Description | Standard check |
---|---|---|---|---|
total_sum_match_percent | The maximum difference in percent between the total sums of this column and the reference column | Accuracy | A column-level check that ensures that the difference between the sum of all values in the tested column and the sum of values in another column in a referenced table is below a maximum accepted percentage of difference. This check runs an SQL query with an INNER JOIN clause to join another (referenced) table that must be defined in the same database. | |
total_min_match_percent | The maximum difference in percent between the minimum value of this column and the reference column | Accuracy | A column-level check that ensures that the difference between the minimum value in the tested column and the minimum value in another column in a referenced table is below a maximum accepted percentage of difference. This check runs an SQL query with an INNER JOIN clause to join another (referenced) table that must be defined in the same database. | |
total_max_match_percent | The maximum difference in percent between the maximum value of this column and the reference column | Accuracy | A column-level check that ensures that the difference between the maximum value in the tested column and the maximum value in another column in a referenced table is below a maximum accepted percentage of difference. This check runs an SQL query with an INNER JOIN clause to join another (referenced) table that must be defined in the same database. | |
total_average_match_percent | The maximum difference in percent between the average value of this column and the reference column | Accuracy | A column-level check that ensures that the difference between the average value in the tested column and the average value of another column in the referenced table is below the maximum accepted percentage of difference. This check runs an SQL query with an INNER JOIN clause to join another (referenced) table that must be defined in the same database. | |
total_not_null_count_match_percent | The maximum difference in percent between the count of not null values of this column and the reference column | Accuracy | A column-level check that ensures that the difference between the count of null values in the tested column and the count of null values in another column in a referenced table is below a maximum accepted percentage of difference. This check runs an SQL query with an INNER JOIN clause to join another (referenced) table that must be defined in the same database. |
Reference and samples
The full list of all data quality checks in this category is located in the column/accuracy reference. The reference section provides YAML code samples that are ready to copy-paste to the .dqotable.yaml files, the parameters reference, and samples of data source specific SQL queries generated by data quality sensors that are used by those checks.
What's next
- Learn how to run data quality checks filtering by a check category name
- Learn how to configure data quality checks and apply alerting rules
- Read the definition of data quality dimensions used by DQOps