Last updated: July 22, 2025

DQOps errors parquet table schema

The parquet file schema for the errors table stored in the $DQO_USER_HOME/.data/errors folder in DQOps.

Table description

The data quality execution errors table stores execution errors captured during sensor execution or rule evaluation. Sensor execution errors are error messages received from the data source when the tested table does not exist or the sensor's SQL query is invalid. Rule execution errors are exceptions raised during the Python rule evaluation. The errors table is located in the $DQO_USER_HOME/.data/errors folder, which contains uncompressed parquet files. The table is partitioned using a Hive-compatible partitioning folder structure. When $DQO_USER_HOME is not configured, it defaults to the folder where DQOps was started (the DQOps user's home folder).
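The Hive-compatible layout means the partition keys are encoded in the folder names themselves, so the files can be discovered with a plain glob. A minimal sketch, using only the Python standard library and a throwaway directory tree that mimics the layout (the folder and file names below are illustrative, not produced by DQOps):

```python
from pathlib import Path
import tempfile

# Build a throwaway folder tree mimicking the Hive-style layout of the
# errors table (connection / schema.table / first day of month).
root = Path(tempfile.mkdtemp()) / ".data" / "errors"
for part in ["c=myconnection/t=public.testedtable/m=2023-01-01",
             "c=myconnection/t=public.testedtable/m=2023-02-01"]:
    folder = root / part
    folder.mkdir(parents=True)
    (folder / "data.0.parquet").touch()  # placeholder for a parquet file

# Glob over the partition pattern to find every monthly parquet file.
files = sorted(root.glob("c=*/t=*/m=*/*.parquet"))
for f in files:
    # Decode the Hive partition keys (c, t, m) from the folder names.
    parts = dict(p.split("=", 1) for p in f.parent.relative_to(root).parts)
    print(parts["c"], parts["t"], parts["m"])
```

Engines that understand Hive partitioning (Spark, DuckDB, pyarrow datasets) can read the whole folder directly and expose c, t, and m as virtual columns.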

The folder partitioning structure for this table is: c=[connection_name]/t=[schema_name.table_name]/m=[first_day_of_month]/, for example: c=myconnection/t=public.testedtable/m=2023-01-01/.
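The monthly partition value is the first day of the month of the captured timestamp. A small sketch of building such a partition path (the helper name is illustrative, not part of the DQOps API):

```python
from datetime import date

def errors_partition_path(connection: str, schema: str, table: str, day: date) -> str:
    """Build the Hive-style partition folder path for the errors table.

    The m= partition is the first day of the month of the given date.
    (Helper name is illustrative, not part of the DQOps API.)
    """
    first_of_month = day.replace(day=1)
    return f"c={connection}/t={schema}.{table}/m={first_of_month.isoformat()}/"

print(errors_partition_path("myconnection", "public", "testedtable", date(2023, 1, 17)))
# c=myconnection/t=public.testedtable/m=2023-01-01/
```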

Parquet table schema

The columns of this table are described below.

| Column name | Description | Hive data type |
|-------------|-------------|----------------|
| id | The check result id (primary key); a uuid computed from the check hash, the time period, and the data stream id. This value identifies a single row. | STRING |
| actual_value | The actual sensor value that was captured. | DOUBLE |
| expected_value | The expected value (expected_value). An optional column used when the sensor also retrieves a comparison value (for accuracy checks). | DOUBLE |
| time_period | The time period of the sensor readout (timestamp), using the local time zone of the data source. | TIMESTAMP |
| time_period_utc | The time period of the sensor readout (timestamp) as a UTC timestamp. | TIMESTAMP |
| time_gradient | The time gradient (daily, monthly) for monitoring checks (checkpoints) and partition checks. It is "milliseconds" for profiling checks. When the time gradient is daily or monthly, the time_period is truncated to the beginning of the time gradient period. | STRING |
| grouping_level_1 | The data group value at level 1. | STRING |
| grouping_level_2 | The data group value at level 2. | STRING |
| grouping_level_3 | The data group value at level 3. | STRING |
| grouping_level_4 | The data group value at level 4. | STRING |
| grouping_level_5 | The data group value at level 5. | STRING |
| grouping_level_6 | The data group value at level 6. | STRING |
| grouping_level_7 | The data group value at level 7. | STRING |
| grouping_level_8 | The data group value at level 8. | STRING |
| grouping_level_9 | The data group value at level 9. | STRING |
| data_group_hash | The data group hash; a hash of the data group levels' values. | BIGINT |
| data_group_name | The data group name; a concatenation of the data group dimension values, created from [grouping_level_1] / [grouping_level_2] / ... | STRING |
| data_grouping_configuration | The name of the data grouping configuration that was used to run the data quality check. | STRING |
| connection_hash | A hash calculated from the connection name (the data source name). | BIGINT |
| connection_name | The connection name (the data source name). | STRING |
| provider | The provider name, which is the type of the data source. | STRING |
| table_hash | A hash of the table name. | BIGINT |
| schema_name | The database schema name. | STRING |
| table_name | The monitored table name. | STRING |
| table_name_pattern | The table name pattern, used when a data quality check targets multiple tables. | STRING |
| table_stage | The stage name of the table. This is a free-form text configured at the table level that can identify the layers of a data warehouse or a data lake, for example: "landing", "staging", "cleansing", etc. | STRING |
| table_priority | The table priority value copied from the table's definition. The table priority can be used to sort tables according to their importance. | INTEGER |
| column_hash | A hash of the column name. | BIGINT |
| column_name | The column for which the results are stored. | STRING |
| column_name_pattern | The column name pattern, used when a data quality check targets multiple columns. | STRING |
| check_hash | A hash of the data quality check. | BIGINT |
| check_name | The data quality check name. | STRING |
| check_display_name | The user-configured display name for the data quality check, used when custom, user-friendly check names are preferred. | STRING |
| check_type | The data quality check type (profiling, monitoring, partitioned). | STRING |
| check_category | The data quality check category name. | STRING |
| table_comparison | The name of the table comparison configuration used for a data comparison (accuracy) check. | STRING |
| quality_dimension | The data quality dimension name. The popular dimensions are: Timeliness, Completeness, Consistency, Validity, Reasonableness, Uniqueness. | STRING |
| sensor_name | The data quality sensor name. | STRING |
| time_series_id | The time series id (uuid) that identifies a single time series. A time series is a combination of check_hash and data_group_hash. | STRING |
| executed_at | The UTC timestamp when the data sensor was executed. | TIMESTAMP |
| duration_ms | The sensor (query) execution duration in milliseconds. | INTEGER |
| created_at | The timestamp when the row was created. | TIMESTAMP |
| updated_at | The timestamp when the row was last updated. | TIMESTAMP |
| created_by | The login of the user that created the row. | STRING |
| updated_by | The login of the user that updated the row. | STRING |
| readout_id | The sensor readout id. | STRING |
| error_message | The error message. | STRING |
| error_source | The error source: the component that raised the error (sensor or rule). | STRING |
| error_timestamp | The error timestamp, using the local time. | TIMESTAMP |
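The data_group_name column is described as a concatenation of the grouping level values. A minimal sketch of that concatenation, assuming " / " as the separator (as suggested by the column description) and that empty levels are skipped:

```python
def data_group_name(levels):
    """Concatenate non-empty grouping level values into a data group name.

    Mirrors the data_group_name column description: the values of
    grouping_level_1..grouping_level_9 joined with " / ".
    The separator and the skipping of empty levels are assumptions
    based on the docs, not a verified DQOps implementation detail.
    """
    return " / ".join(v for v in levels if v)

# Two populated grouping levels, the remaining seven left empty.
levels = ["US", "east"] + [""] * 7
print(data_group_name(levels))
# US / east
```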