
Last updated: July 22, 2025

DQOps error_samples parquet table schema

The parquet file schema for the error_samples table stored in the $DQO_USER_HOME/.data/error_samples folder in DQOps.

Table description

The error samples table stores sample column values that failed data quality checks that operate on rows (mostly Validity and Consistency checks). The error samples are stored in the $DQO_USER_HOME/.data/error_samples folder, which contains uncompressed parquet files. The table is partitioned using a Hive-compatible partitioning folder structure. When $DQO_USER_HOME is not configured, it defaults to the folder where DQOps was started (the DQOps user's home folder).
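The folder lookup described above can be sketched in a few lines of Python. This is only an illustration, not DQOps code: the helper name `error_samples_dir` is hypothetical, and the fallback assumes the script is run from the same folder where DQOps was started.

```python
import os
from pathlib import Path


def error_samples_dir(env=None):
    """Return the folder that holds the error_samples parquet files."""
    # Allow a custom mapping to be passed in, mainly to make the function easy to test.
    env = os.environ if env is None else env
    # Assumption: when DQO_USER_HOME is not set, the current working directory
    # is the folder where DQOps was started.
    user_home = env.get("DQO_USER_HOME") or os.getcwd()
    return Path(user_home) / ".data" / "error_samples"
```

Calling `error_samples_dir()` without arguments reads the real environment; passing a dictionary such as `{"DQO_USER_HOME": "/opt/dqo"}` makes the result reproducible.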

The folder partitioning structure for this table is: c=[connection_name]/t=[schema_name.table_name]/m=[first_day_of_month]/, for example: c=myconnection/t=public.analyzedtable/m=2023-01-01/. The date used for monthly partitioning is calculated from the executed_at column value.
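As a rough illustration of this folder layout, the sketch below builds the relative partition path for a single row. The function name `partition_folder` is hypothetical, and the sketch assumes that connection, schema and table names are written to the path unchanged; any escaping that DQOps may apply to special characters is not modeled here.

```python
from datetime import date, datetime


def partition_folder(connection_name, schema_name, table_name, executed_at):
    """Build the relative partition folder for one error sample row."""
    # The monthly partition key is the first day of the month of the timestamp.
    first_day_of_month = date(executed_at.year, executed_at.month, 1)
    return (
        f"c={connection_name}/"
        f"t={schema_name}.{table_name}/"
        f"m={first_day_of_month.isoformat()}/"
    )


print(partition_folder("myconnection", "public", "analyzedtable",
                       datetime(2023, 1, 17, 8, 30)))
# prints: c=myconnection/t=public.analyzedtable/m=2023-01-01/
```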

Parquet table schema

The columns of this table are described below.

| Column name | Description | Hive data type |
|---|---|---|
| id | The check result id (primary key). It is a uuid of the check hash, collected at, sample index and the data grouping id. This value identifies a single row. | STRING |
| collected_at | The time when the error samples were captured. All error samples collected as part of the same error sampling session share the same time. The parquet files are time partitioned by this column. | TIMESTAMP |
| scope | String column that says if the result is for a whole table (the "table" value) or for each data group separately (the "data_group" value). | STRING |
| grouping_level_1 | Data group value at a single level. | STRING |
| grouping_level_2 | Data group value at a single level. | STRING |
| grouping_level_3 | Data group value at a single level. | STRING |
| grouping_level_4 | Data group value at a single level. | STRING |
| grouping_level_5 | Data group value at a single level. | STRING |
| grouping_level_6 | Data group value at a single level. | STRING |
| grouping_level_7 | Data group value at a single level. | STRING |
| grouping_level_8 | Data group value at a single level. | STRING |
| grouping_level_9 | Data group value at a single level. | STRING |
| data_group_hash | The data grouping hash. It is a hash of the data grouping level values. | BIGINT |
| data_group_name | The data grouping name. It is a concatenated name of the data grouping dimension values, created from [grouping_level_1] / [grouping_level_2] / ... | STRING |
| data_grouping_configuration | The data grouping configuration name. It is the name of the named data grouping configuration that was used to run the data quality check. | STRING |
| connection_hash | A hash calculated from the connection name (the data source name). | BIGINT |
| connection_name | The connection name (the data source name). | STRING |
| provider | The provider name, which is the type of the data source. | STRING |
| table_hash | The table name hash. | BIGINT |
| schema_name | The database schema name. | STRING |
| table_name | The monitored table name. | STRING |
| table_stage | The stage name of the table. This is free-form text configured at the table level that can identify the layers of the data warehouse or a data lake, for example: "landing", "staging", "cleansing", etc. | STRING |
| table_priority | The table priority value copied from the table's definition. The table priority can be used to sort tables according to their importance. | INTEGER |
| column_hash | The hash of a column. | BIGINT |
| column_name | The column name for which the results are stored. | STRING |
| check_hash | The hash of a data quality check. | BIGINT |
| check_name | The data quality check name. | STRING |
| check_display_name | The user-configured display name for a data quality check, used when the user wants to use custom, user-friendly data quality check names. | STRING |
| check_type | The data quality check type (profiling, monitoring, partitioned). | STRING |
| time_gradient | The time gradient (daily, monthly) for monitoring checks (checkpoints) and partition checks. It is "milliseconds" for profiling checks. When the time gradient is daily or monthly, the time_period is truncated at the beginning of the time gradient. | STRING |
| check_category | The data quality check category name. | STRING |
| quality_dimension | The data quality dimension name. The popular dimensions are: Timeliness, Completeness, Consistency, Validity, Reasonableness, Uniqueness. | STRING |
| table_comparison | The name of a table comparison configuration used for a data comparison (accuracy) check. | STRING |
| sensor_name | The data quality sensor name. | STRING |
| time_series_id | The time series id (uuid). Identifies a single time series. A time series is a combination of the check_hash and data_group_hash. | STRING |
| result_type | The sample's result data type. | STRING |
| result_string | The sample value when it is a string value. | STRING |
| result_integer | The sample value when it is an integer value. It is a long (64 bit) value that stores all short, integer and long values. | BIGINT |
| result_float | The sample value when it is a numeric value. It is a double value that stores all double, float, numeric and decimal values. | DOUBLE |
| result_boolean | The sample value when it is a boolean value. | BOOLEAN |
| result_date | The sample value when it is a local date value. | DATE |
| result_date_time | The sample value when it is a local date time value. | TIMESTAMP |
| result_instant | The sample value when it is an absolute (UTC timezone) instant. | TIMESTAMP |
| result_time | The sample value when it is a time value. | INTERVAL |
| sample_index | The 1-based index of the collected sample. | INTEGER |
| sample_filter | The sample filtering formula that was used in the where filter. | STRING |
| row_id_1 | Data group value at a single level. | STRING |
| row_id_2 | Data group value at a single level. | STRING |
| row_id_3 | Data group value at a single level. | STRING |
| row_id_4 | Data group value at a single level. | STRING |
| row_id_5 | Data group value at a single level. | STRING |
| executed_at | The UTC timestamp when the data sensor was executed. | TIMESTAMP |
| duration_ms | The sensor (query) execution duration in milliseconds. | INTEGER |
| created_at | The timestamp when the row was created. | TIMESTAMP |
| updated_at | The timestamp when the row was updated. | TIMESTAMP |
| created_by | The login of the user that created the row. | STRING |
| updated_by | The login of the user that updated the row. | STRING |
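
Because the sample value is spread over several typed result_* columns, a reader of this table usually needs to find the column that holds the value for a given row. The sketch below shows one possible way to do that for a row already loaded as a Python dictionary. It assumes that only the column matching the sample's type is populated and the others are null; the helper name `sample_value` is hypothetical and is not part of DQOps.

```python
# The typed value columns of the error_samples table, in the order they are listed above.
RESULT_COLUMNS = (
    "result_string",
    "result_integer",
    "result_float",
    "result_boolean",
    "result_date",
    "result_date_time",
    "result_instant",
    "result_time",
)


def sample_value(row):
    """Return (column_name, value) for the first non-null result_* column."""
    for column in RESULT_COLUMNS:
        value = row.get(column)
        # Compare with None explicitly so that False and 0 count as real values.
        if value is not None:
            return column, value
    # Assumption: a row with no populated result_* column represents a null sample.
    return None, None
```

For example, `sample_value({"result_string": None, "result_integer": 42})` returns `("result_integer", 42)`.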