errors
The data quality execution errors table stores execution errors captured during sensor execution or rule evaluation. Sensor execution errors are error messages received from the data source, for example when the tested table does not exist or the sensor's SQL query is invalid. Rule execution errors are exceptions raised during the Python rule evaluation. The errors table is located in the $DQO_USER_HOME/.data/errors folder, which contains uncompressed parquet files. The table is partitioned using a Hive-compatible partitioning folder structure. When $DQO_USER_HOME is not configured, it defaults to the folder where DQO was started (the DQO user's home folder).
The folder partitioning structure for this table is: c=[connection_name]/t=[schema_name.table_name]/m=[first_day_of_month]/, for example: c=myconnection/t=public.testedtable/m=2023-01-01/.
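
Because the table is stored as plain Hive-partitioned parquet files, it can be read with any parquet-aware tool, without running DQO itself. Below is a minimal sketch using pyarrow and pandas; the $DQO_USER_HOME location is an assumption, and the connection, table, and month values are the example values from above:

```python
from pathlib import Path

import pyarrow.dataset as ds

# Assumed $DQO_USER_HOME location; adjust it to your configured
# DQO user home, or to the folder where DQO was started.
dqo_user_home = Path.home() / ".dqo"

# One partition of the errors table, following the folder structure
# c=[connection_name]/t=[schema_name.table_name]/m=[first_day_of_month]/.
partition = (
    dqo_user_home / ".data" / "errors"
    / "c=myconnection" / "t=public.testedtable" / "m=2023-01-01"
)

# Read the uncompressed parquet files in that partition into a dataframe.
df = ds.dataset(partition, format="parquet").to_table().to_pandas()
print(df.shape)
```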
The columns of this table are described below.
Column name | Description | Data type
---|---|---
id | The check result ID (primary key). It is a UUID computed from the check hash, the time period, and the data group ID. This value identifies a single row. | text
actual_value | The actual sensor value that was captured. | double |
expected_value | The expected value. It is an optional column, used when the sensor also retrieves a comparison value (for accuracy checks). | double
time_period | The time period of the sensor readout (timestamp), using the local time zone of the data source. | local_date_time
time_period_utc | The time period of the sensor readout (timestamp) as a UTC timestamp. | instant |
time_gradient | The time gradient (daily, monthly) for recurring checks (checkpoints) and partition checks. For profiling checks, the value is "milliseconds". When the time gradient is daily or monthly, the time_period is truncated to the beginning of the time gradient. | text
grouping_level_1 | The value of the data grouping dimension at level 1. | text
grouping_level_2 | The value of the data grouping dimension at level 2. | text
grouping_level_3 | The value of the data grouping dimension at level 3. | text
grouping_level_4 | The value of the data grouping dimension at level 4. | text
grouping_level_5 | The value of the data grouping dimension at level 5. | text
grouping_level_6 | The value of the data grouping dimension at level 6. | text
grouping_level_7 | The value of the data grouping dimension at level 7. | text
grouping_level_8 | The value of the data grouping dimension at level 8. | text
grouping_level_9 | The value of the data grouping dimension at level 9. | text
data_group_hash | The data group hash, calculated as a hash of the data group levels' values. | long
data_group_name | The data group name, a concatenated name of the data group dimension values, created from [grouping_level_1] / [grouping_level_2] / ... | text
data_grouping_configuration | The name of the named data grouping configuration that was used to run the data quality check. | text
connection_hash | A hash calculated from the connection name (the data source name). | long |
connection_name | The connection name (the data source name). | text |
provider | The provider name, which is the type of the data source. | text |
table_hash | The table name hash. | long |
schema_name | The database schema name. | text |
table_name | The monitored table name. | text |
table_name_pattern | The table name pattern, in case a data quality check targets multiple tables. | text
table_stage | The stage name of the table. It is a free-form text configured on the table level that could identify the layers of the data warehouse or a data lake, for example: "landing", "staging", "cleansing", etc. | text |
table_priority | The table priority value copied from the table's definition. The table priority could be used for sorting tables by their importance. | integer |
column_hash | The hash of a column. | long |
column_name | The column for which the results are stored. | text |
column_name_pattern | The column name pattern, in case a data quality check targets multiple columns. | text
check_hash | The hash of a data quality check. | long |
check_name | The data quality check name. | text |
check_display_name | The user-configured display name for a data quality check, used when the user wants to use custom, user-friendly data quality check names. | text
check_type | The data quality check type (profiling, recurring, partitioned). | text |
check_category | The data quality check category name. | text |
table_comparison | The name of a table comparison configuration used for a data comparison (accuracy) check. | text |
quality_dimension | The data quality dimension name. The popular dimensions are: Timeliness, Completeness, Consistency, Validity, Reasonableness, Uniqueness. | text |
sensor_name | The data quality sensor name. | text |
time_series_id | The time series ID (UUID), which identifies a single time series. A time series is a combination of the check_hash and the data_group_hash. | text
executed_at | The UTC timestamp when the data sensor was executed. | instant
duration_ms | The sensor (query) execution duration in milliseconds. | integer |
created_at | The timestamp when the row was created. | instant
updated_at | The timestamp when the row was last updated. | instant
created_by | The login of the user that created the row. | text |
updated_by | The login of the user that updated the row. | text |
readout_id | Column that stores the sensor readout ID. | text |
error_message | Column that stores the error message. | text |
error_source | Column that stores the error source, which is the component that raised an error (sensor or rule). | text |
error_timestamp | Column that stores the error timestamp, using the local time zone of the data source. | local_date_time
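
Once a partition is loaded (see the sketch above), these columns can be queried like any other dataframe columns. A short continuation of that sketch: the literal error_source values "sensor" and "rule" are inferred from the column description above, so treat them as assumptions:

```python
# Split errors by the component that raised them; the values "sensor"
# and "rule" are assumed from the error_source column description.
sensor_errors = df[df["error_source"] == "sensor"]
rule_errors = df[df["error_source"] == "rule"]

# List the ten most recent failures with their messages.
recent = (
    df.sort_values("executed_at", ascending=False)
    [["executed_at", "check_name", "error_source", "error_message"]]
    .head(10)
)
print(recent.to_string(index=False))
```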