Last updated: July 22, 2025
DQOps check_results parquet table schema
The parquet file schema for the check_results table stored in the $DQO_USER_HOME/.data/check_results folder in DQOps.
Table description
The check_results table stores the data quality check results: a copy of the sensor readouts (copied from the sensor_readouts table), evaluated by the data quality rules. This table differs from sensor_readouts by also including the result of the rule evaluation. The most important additional column is 'severity', which indicates whether the check passed (severity 0) or raised a data quality issue with a warning (1), error (2), or fatal (3) severity. The check results are stored in the $DQO_USER_HOME/.data/check_results folder as uncompressed parquet files. The table is partitioned using a Hive-compatible partitioning folder structure. When $DQO_USER_HOME is not configured, it defaults to the folder where DQOps was started (the DQOps user's home folder).
The folder partitioning structure for this table is: c=[connection_name]/t=[schema_name.table_name]/m=[first_day_of_month]/, for example: c=myconnection/t=public.testedtable/m=2023-01-01/.
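For illustration only (this is not part of DQOps itself), such a Hive-compatible folder layout can be read back with any parquet reader that understands Hive partitioning. A minimal sketch in Python, assuming the pyarrow package is installed and the working directory is $DQO_USER_HOME; the connection and month values are example data:

```python
# Minimal sketch: read the Hive-partitioned check_results folder with pyarrow.
# Folder names such as c=myconnection/t=public.testedtable/m=2023-01-01/
# are decoded into the partition columns c, t, and m.
import pyarrow as pa
import pyarrow.dataset as ds

# Declare the partition columns explicitly so their values stay strings.
partitioning = ds.partitioning(
    pa.schema([("c", pa.string()), ("t", pa.string()), ("m", pa.string())]),
    flavor="hive",
)

dataset = ds.dataset(".data/check_results", format="parquet", partitioning=partitioning)

# Partition filters prune whole folders, so only the matching monthly
# partition of one connection is actually read from disk.
table = dataset.to_table(
    filter=(ds.field("c") == "myconnection") & (ds.field("m") == "2023-01-01")
)
print(table.num_rows)
```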
Parquet table schema
The columns of this table are described below.
Column name | Description | Hive data type |
---|---|---|
id | The check result id (primary key); a UUID calculated from the check hash, the time period, and the data stream id. This value identifies a single row. | STRING |
actual_value | The actual sensor value that was captured. | DOUBLE |
expected_value | The expected value. An optional column, used when the sensor also retrieves a comparison value (for accuracy checks). | DOUBLE |
time_period | The time period of the sensor readout (timestamp), using the local time zone of the data source. | TIMESTAMP |
time_period_utc | The time period of the sensor readout (timestamp) as a UTC timestamp. | TIMESTAMP |
time_gradient | The time gradient (daily, monthly) for monitoring checks (checkpoints) and partition checks. It is "milliseconds" for profiling checks. When the time gradient is daily or monthly, the time_period is truncated to the beginning of the time gradient. | STRING |
grouping_level_1 | Data group value at a single level. | STRING |
grouping_level_2 | Data group value at a single level. | STRING |
grouping_level_3 | Data group value at a single level. | STRING |
grouping_level_4 | Data group value at a single level. | STRING |
grouping_level_5 | Data group value at a single level. | STRING |
grouping_level_6 | Data group value at a single level. | STRING |
grouping_level_7 | Data group value at a single level. | STRING |
grouping_level_8 | Data group value at a single level. | STRING |
grouping_level_9 | Data group value at a single level. | STRING |
data_group_hash | The data group hash, calculated from the values of the data group levels. | BIGINT |
data_group_name | The data group name: a concatenation of the data group dimension values, created as [grouping_level_1] / [grouping_level_2] / ... | STRING |
data_grouping_configuration | The name of the named data grouping configuration that was used to run the data quality check. | STRING |
connection_hash | A hash calculated from the connection name (the data source name). | BIGINT |
connection_name | The connection name (the data source name). | STRING |
provider | The provider name, which is the type of the data source. | STRING |
table_hash | The table name hash. | BIGINT |
schema_name | The database schema name. | STRING |
table_name | The monitored table name. | STRING |
table_name_pattern | The table name pattern, used when a data quality check targets multiple tables. | STRING |
table_stage | The stage name of the table. This is free-form text configured at the table level that can identify the layers of a data warehouse or data lake, for example: "landing", "staging", "cleansing", etc. | STRING |
table_priority | The table priority value copied from the table's definition. The table priority can be used to sort tables according to their importance. | INTEGER |
column_hash | The hash of the column. | BIGINT |
column_name | The column for which the results are stored. | STRING |
column_name_pattern | The column name pattern, used when a data quality check targets multiple columns. | STRING |
check_hash | The hash of the data quality check. | BIGINT |
check_name | The data quality check name. | STRING |
check_display_name | The user-configured display name for the data quality check, used when the user wants custom, user-friendly check names. | STRING |
check_type | The data quality check type (profiling, monitoring, partitioned). | STRING |
check_category | The data quality check category name. | STRING |
table_comparison | The name of the table comparison configuration used for a data comparison (accuracy) check. | STRING |
quality_dimension | The data quality dimension name. The popular dimensions are: Timeliness, Completeness, Consistency, Validity, Reasonableness, Uniqueness. | STRING |
sensor_name | The data quality sensor name. | STRING |
time_series_id | The time series id (UUID). Identifies a single time series, which is a combination of the check_hash and the data_group_hash. | STRING |
executed_at | The UTC timestamp when the data sensor was executed. | TIMESTAMP |
duration_ms | The sensor (query) execution duration in milliseconds. | INTEGER |
created_at | The timestamp when the row was created. | TIMESTAMP |
updated_at | The timestamp when the row was last updated. | TIMESTAMP |
created_by | The login of the user who created the row. | STRING |
updated_by | The login of the user who updated the row. | STRING |
severity | The check (rule) severity: 0 (passed), 1 (warning), 2 (error), or 3 (fatal). | INTEGER |
incident_hash | The hash of the matching data quality incident. The value is used to map a failed data quality check to an incident. | BIGINT |
reference_connection | The name of the connection to another data source that contains the reference data used as the expected values for accuracy checks. | STRING |
reference_schema | The schema in another data source that contains the reference data used as the expected values for accuracy checks. | STRING |
reference_table | The table in another data source that contains the reference data used as the expected values for accuracy checks. | STRING |
reference_column | The column in another data source that contains the reference data used as the expected values for accuracy checks. | STRING |
include_in_kpi | A boolean flag that identifies data quality rule results that should be counted in the data quality KPI. | BOOLEAN |
include_in_sla | A boolean flag that identifies data quality rule results that should be counted in the data quality SLA (Data Contract). | BOOLEAN |
fatal_lower_bound | The fatal lower bound, returned by the fatal severity rule. | DOUBLE |
fatal_upper_bound | The fatal upper bound, returned by the fatal severity rule. | DOUBLE |
error_lower_bound | The error lower bound, returned by the error severity rule. | DOUBLE |
error_upper_bound | The error upper bound, returned by the error severity rule. | DOUBLE |
warning_lower_bound | The warning lower bound, returned by the warning severity rule. | DOUBLE |
warning_upper_bound | The warning upper bound, returned by the warning severity rule. | DOUBLE |
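Putting the schema to work, the sketch below (again hypothetical, using the same pyarrow setup and example names as above) lists the most recent failed checks for a single monitored table, i.e. rows where the severity column is 2 (error) or 3 (fatal):

```python
# Hypothetical query: the latest failed data quality checks for one table.
import pyarrow as pa
import pyarrow.dataset as ds

partitioning = ds.partitioning(
    pa.schema([("c", pa.string()), ("t", pa.string()), ("m", pa.string())]),
    flavor="hive",
)
dataset = ds.dataset(".data/check_results", format="parquet", partitioning=partitioning)

# severity >= 2 selects error (2) and fatal (3) results; the t partition
# column narrows the scan to one schema_name.table_name folder.
failed = dataset.to_table(
    columns=["executed_at", "check_name", "check_category", "actual_value", "severity"],
    filter=(ds.field("t") == "public.testedtable") & (ds.field("severity") >= 2),
).to_pandas()

print(failed.sort_values("executed_at", ascending=False).head(10))
```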
What's more
- You can find more information on how the Parquet files are partitioned in the data quality results storage concept.