Last updated: July 22, 2025
DQOps sensor_readouts parquet table schema
The parquet file schema for the sensor_readouts table stored in the $DQO_USER_HOME/.data/sensor_readouts folder in DQOps.
Table description
The data quality sensor readouts table stores the readouts (measures) captured by DQOps sensors before the values are evaluated by the data quality rules. The sensor readouts are stored in the sensor_readouts table, located in the $DQO_USER_HOME/.data/sensor_readouts folder, which contains uncompressed parquet files. The table is partitioned using a Hive-compatible partitioning folder structure. When $DQO_USER_HOME is not configured, it defaults to the folder where DQOps was started (the DQOps user's home folder).
The folder partitioning structure for this table is: c=[connection_name]/t=[schema_name.table_name]/m=[first_day_of_month]/, for example: c=myconnection/t=public.testedtable/m=2023-01-01/.
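The partition folder for a given readout can be derived with a small helper; this is a stdlib-only sketch, and the function name is an assumption, only the c=/t=/m= convention comes from this document:

```python
from datetime import date

def readout_partition_path(connection_name: str, schema_name: str,
                           table_name: str, readout_date: date) -> str:
    """Build the Hive-style partition folder for a sensor readout.

    Partitions follow c=[connection_name]/t=[schema_name.table_name]/m=[first_day_of_month]/.
    """
    first_day_of_month = readout_date.replace(day=1)  # m= is always the 1st of the month
    return (f"c={connection_name}/"
            f"t={schema_name}.{table_name}/"
            f"m={first_day_of_month.isoformat()}/")

print(readout_partition_path("myconnection", "public", "testedtable", date(2023, 1, 15)))
# → c=myconnection/t=public.testedtable/m=2023-01-01/
```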
Parquet table schema
The columns of this table are described below.
Column name | Description | Hive data type |
---|---|---|
id | The check result id (primary key); a uuid computed from the check hash, time period, and data stream id. This value identifies a single row. | STRING |
actual_value | The actual sensor value that was captured. | DOUBLE |
expected_value | The expected value. An optional column used when the sensor also retrieves a comparison value (for accuracy checks). | DOUBLE |
time_period | The time period of the sensor readout (timestamp), using the local time zone of the data source. | TIMESTAMP |
time_period_utc | The time period of the sensor readout (timestamp) as a UTC timestamp. | TIMESTAMP |
time_gradient | The time gradient (daily, monthly) for monitoring checks (checkpoints) and partition checks. It is "milliseconds" for profiling checks. When the time gradient is daily or monthly, the time_period is truncated to the beginning of the time gradient. | STRING |
grouping_level_1 | Data group value at a single level. | STRING |
grouping_level_2 | Data group value at a single level. | STRING |
grouping_level_3 | Data group value at a single level. | STRING |
grouping_level_4 | Data group value at a single level. | STRING |
grouping_level_5 | Data group value at a single level. | STRING |
grouping_level_6 | Data group value at a single level. | STRING |
grouping_level_7 | Data group value at a single level. | STRING |
grouping_level_8 | Data group value at a single level. | STRING |
grouping_level_9 | Data group value at a single level. | STRING |
data_group_hash | The data group hash, computed from the data group levels' values. | BIGINT |
data_group_name | The data group name: a concatenation of the data group dimension values, created as [grouping_level_1] / [grouping_level_2] / ... | STRING |
data_grouping_configuration | The name of the named data grouping configuration that was used to run the data quality check. | STRING |
connection_hash | A hash calculated from the connection name (the data source name). | BIGINT |
connection_name | The connection name (the data source name). | STRING |
provider | The provider name, which is the type of the data source. | STRING |
table_hash | The table name hash. | BIGINT |
schema_name | The database schema name. | STRING |
table_name | The monitored table name. | STRING |
table_name_pattern | The table name pattern, used when a data quality check targets multiple tables. | STRING |
table_stage | The stage name of the table. This is a free-form text configured at the table level that can identify the layers of the data warehouse or data lake, for example: "landing", "staging", "cleansing". | STRING |
table_priority | The table priority value copied from the table's definition. The table priority can be used to sort tables according to their importance. | INTEGER |
column_hash | The hash of a column. | BIGINT |
column_name | The column for which the results are stored. | STRING |
column_name_pattern | The column name pattern, used when a data quality check targets multiple columns. | STRING |
check_hash | The hash of a data quality check. | BIGINT |
check_name | The data quality check name. | STRING |
check_display_name | The user-configured display name for a data quality check, used when the user wants custom, user-friendly data quality check names. | STRING |
check_type | The data quality check type (profiling, monitoring, partitioned). | STRING |
check_category | The data quality check category name. | STRING |
table_comparison | The name of the table comparison configuration used for a data comparison (accuracy) check. | STRING |
quality_dimension | The data quality dimension name. The popular dimensions are: Timeliness, Completeness, Consistency, Validity, Reasonableness, Uniqueness. | STRING |
sensor_name | The data quality sensor name. | STRING |
time_series_id | The time series id (uuid). Identifies a single time series: a combination of the check_hash and data_group_hash. | STRING |
executed_at | The UTC timestamp when the data sensor was executed. | TIMESTAMP |
duration_ms | The sensor (query) execution duration in milliseconds. | INTEGER |
created_at | The timestamp when the row was created. | TIMESTAMP |
updated_at | The timestamp when the row was last updated. | TIMESTAMP |
created_by | The login of the user who created the row. | STRING |
updated_by | The login of the user who last updated the row. | STRING |
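The time_period truncation described for the time_gradient column can be illustrated with a stdlib-only sketch; the helper name is an assumption, only the daily/monthly/"milliseconds" behavior comes from this document:

```python
from datetime import datetime

def truncate_time_period(time_period: datetime, time_gradient: str) -> datetime:
    """Truncate a readout timestamp to the start of its time gradient,
    as described for the time_gradient column."""
    if time_gradient == "daily":
        # daily gradient: truncate to the beginning of the day
        return time_period.replace(hour=0, minute=0, second=0, microsecond=0)
    if time_gradient == "monthly":
        # monthly gradient: truncate to the beginning of the month
        return time_period.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # profiling checks use a "milliseconds" gradient: the timestamp is kept as-is
    return time_period

print(truncate_time_period(datetime(2023, 1, 15, 13, 45), "monthly"))
# → 2023-01-01 00:00:00
```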
What's more
- You can find more information on how the Parquet files are partitioned in the data quality results storage concept.