errors
The data quality execution errors table stores execution errors captured during sensor execution or rule evaluation. Sensor execution errors are error messages received from the data source, for example when the tested table does not exist or the sensor's SQL query is invalid. Rule execution errors are exceptions raised during the Python rule evaluation. The errors table is located in the $DQO_USER_HOME/.data/errors folder, which contains uncompressed parquet files. The table is partitioned using a Hive-compatible partitioning folder structure. When $DQO_USER_HOME is not configured, it defaults to the folder where DQO was started (the DQO user's home folder).
The folder partitioning structure for this table is: c=[connection_name]/t=[schema_name.table_name]/m=[first_day_of_month]/, for example: c=myconnection/t=public.testedtable/m=2023-01-01/.
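
Because the table is stored as plain Hive-partitioned parquet files, it can be read with any parquet-aware tool, without running DQO itself. Below is a minimal sketch using pyarrow and pandas; the $DQO_USER_HOME location is an assumption, and the connection, table, and month values are the example values from above:

```python
from pathlib import Path

import pyarrow.dataset as ds

# Assumed $DQO_USER_HOME location; adjust it to your configured
# DQO user home, or to the folder where DQO was started.
dqo_user_home = Path.home() / ".dqo"

# One partition of the errors table, following the folder structure
# c=[connection_name]/t=[schema_name.table_name]/m=[first_day_of_month]/.
partition = (
    dqo_user_home / ".data" / "errors"
    / "c=myconnection" / "t=public.testedtable" / "m=2023-01-01"
)

# Read the uncompressed parquet files in that partition into a dataframe.
df = ds.dataset(partition, format="parquet").to_table().to_pandas()
print(df.shape)
```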
The columns of this table are described below.
Column name | Description | Data type
---|---|---
id | The check result ID (primary key). It is a UUID computed from the check hash, the time period, and the data group ID. This value identifies a single row. | text
actual_value | The actual sensor value that was captured. | double |
expected_value | The expected value. It is an optional column, used when the sensor also retrieves a comparison value (for accuracy checks). | double
time_period | The time period of the sensor readout (timestamp), using the local time zone of the data source. | local_date_time
time_period_utc | The time period of the sensor readout (timestamp) as a UTC timestamp. | instant |
time_gradient | The time gradient (daily, monthly) for recurring checks (checkpoints) and partition checks. For profiling checks, the value is "milliseconds". When the time gradient is daily or monthly, the time_period is truncated to the beginning of the time gradient. | text
grouping_level_1 | The value of the data grouping dimension at level 1. | text
grouping_level_2 | The value of the data grouping dimension at level 2. | text
grouping_level_3 | The value of the data grouping dimension at level 3. | text
grouping_level_4 | The value of the data grouping dimension at level 4. | text
grouping_level_5 | The value of the data grouping dimension at level 5. | text
grouping_level_6 | The value of the data grouping dimension at level 6. | text
grouping_level_7 | The value of the data grouping dimension at level 7. | text
grouping_level_8 | The value of the data grouping dimension at level 8. | text
grouping_level_9 | The value of the data grouping dimension at level 9. | text
data_group_hash | The data group hash, calculated as a hash of the data group levels' values. | long
data_group_name | The data group name, a concatenated name of the data group dimension values, created from [grouping_level_1] / [grouping_level_2] / ... | text
data_grouping_configuration | The name of the named data grouping configuration that was used to run the data quality check. | text
connection_hash | A hash calculated from the connection name (the data source name). | long |
connection_name | The connection name (the data source name). | text |
provider | The provider name, which is the type of the data source. | text |
table_hash | The table name hash. | long |
schema_name | The database schema name. | text |
table_name | The monitored table name. | text |
table_name_pattern | The table name pattern, in case a data quality check targets multiple tables. | text
table_stage | The stage name of the table. It is a free-form text configured on the table level that could identify the layers of the data warehouse or a data lake, for example: "landing", "staging", "cleansing", etc. | text |
table_priority | The table priority value copied from the table's definition. The table priority could be used for sorting tables by their importance. | integer |
column_hash | The hash of a column. | long |
column_name | The column for which the results are stored. | text |
column_name_pattern | The column name pattern, in case a data quality check targets multiple columns. | text
check_hash | The hash of a data quality check. | long |
check_name | The data quality check name. | text |
check_display_name | The user-configured display name for a data quality check, used when the user wants to use custom, user-friendly data quality check names. | text
check_type | The data quality check type (profiling, recurring, partitioned). | text |
check_category | The data quality check category name. | text |
table_comparison | The name of a table comparison configuration used for a data comparison (accuracy) check. | text |
quality_dimension | The data quality dimension name. The popular dimensions are: Timeliness, Completeness, Consistency, Validity, Reasonableness, Uniqueness. | text |
sensor_name | The data quality sensor name. | text |
time_series_id | The time series ID (UUID), which identifies a single time series. A time series is a combination of the check_hash and the data_group_hash. | text
executed_at | The UTC timestamp when the data sensor was executed. | instant
duration_ms | The sensor (query) execution duration in milliseconds. | integer |
created_at | The timestamp when the row was created. | instant
updated_at | The timestamp when the row was last updated. | instant
created_by | The login of the user that created the row. | text |
updated_by | The login of the user that updated the row. | text |
readout_id | Column that stores the sensor readout ID. | text |
error_message | Column that stores the error message. | text |
error_source | Column that stores the error source, which is the component that raised an error (sensor or rule). | text |
error_timestamp | Column that stores the error timestamp, using the local time zone of the data source. | local_date_time
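
Once a partition is loaded (see the sketch above), these columns can be queried like any other dataframe columns. A short continuation of that sketch: the literal error_source values "sensor" and "rule" are inferred from the column description above, so treat them as assumptions:

```python
# Split errors by the component that raised them; the values "sensor"
# and "rule" are assumed from the error_source column description.
sensor_errors = df[df["error_source"] == "sensor"]
rule_errors = df[df["error_source"] == "rule"]

# List the ten most recent failures with their messages.
recent = (
    df.sort_values("executed_at", ascending=False)
    [["executed_at", "check_name", "error_source", "error_message"]]
    .head(10)
)
print(recent.to_string(index=False))
```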