Last updated: July 22, 2025
DQOps YAML file definitions
The definition of the YAML files used by DQOps to configure data sources, monitored tables, and the activated data quality checks.
TableDailyPartitionedCheckCategoriesSpec
Container of table level daily partitioned checks. Contains categories of daily partitioned checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
volume | Volume daily partitioned data quality checks that verify the quality of every day of data separately | TableVolumeDailyPartitionedChecksSpec | | | |
timeliness | Daily partitioned timeliness checks | TableTimelinessDailyPartitionedChecksSpec | | | |
custom_sql | Custom SQL daily partitioned data quality checks that verify the quality of every day of data separately | TableCustomSqlDailyPartitionedChecksSpec | | | |
uniqueness | Daily partitioned uniqueness checks on a table level | TableUniquenessDailyPartitionChecksSpec | | | |
comparisons | Dictionary of configuration of checks for table comparisons. The key that identifies each comparison must match the name of a data comparison that is configured on the parent table. | TableComparisonDailyPartitionedChecksSpecMap | | | |
custom | Dictionary of custom checks. The keys are check names within this category. | CustomCheckSpecMap | | | |
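The YAML fragment below is a minimal sketch showing where this object sits in a table configuration file, assuming a hypothetical `.dqotable.yaml` with a `created_at` partitioning column; the empty category mappings are placeholders for the specifications documented in the following sections.

```yaml
# Hypothetical table configuration fragment; table and column names are examples only.
apiVersion: dqo/v1
kind: table
spec:
  timestamp_columns:
    partition_by_column: created_at    # column that slices the table into daily partitions
  partitioned_checks:
    daily:                             # TableDailyPartitionedCheckCategoriesSpec
      volume: {}                       # TableVolumeDailyPartitionedChecksSpec
      timeliness: {}                   # TableTimelinessDailyPartitionedChecksSpec
      custom_sql: {}                   # TableCustomSqlDailyPartitionedChecksSpec
      uniqueness: {}                   # TableUniquenessDailyPartitionChecksSpec
      comparisons: {}                  # TableComparisonDailyPartitionedChecksSpecMap
```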
TableVolumeDailyPartitionedChecksSpec
Container of table level date partitioned volume data quality checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily_partition_row_count | Verifies that each daily partition in the tested table has at least a minimum accepted number of rows. The default configuration of the warning, error and fatal severity rules verifies a minimum row count of one row, which ensures that the partition is not empty. | TableRowCountCheckSpec | | | |
daily_partition_row_count_anomaly | Detects outstanding partitions whose volume (the row count) differs too much from the average daily partition size. It uses time series anomaly detection to find the outliers in the partition volume during the last 90 days. | TableRowCountAnomalyStationaryPartitionCheckSpec | | | |
daily_partition_row_count_change | Detects when the partition's volume (row count) change between the current daily partition and the previous partition exceeds the maximum accepted change percentage. | TableRowCountChangeCheckSpec | | | |
daily_partition_row_count_change_1_day | Detects when the partition volume change (increase or decrease of the row count) since yesterday's daily partition exceeds the maximum accepted change percentage. | TableRowCountChange1DayCheckSpec | | | |
daily_partition_row_count_change_7_days | Verifies that the percentage of change in the partition's volume (row count) since seven days ago is below the maximum accepted percentage. Comparing the volume to the value from a week ago overcomes the effect of weekly seasonality. | TableRowCountChange7DaysCheckSpec | | | |
daily_partition_row_count_change_30_days | Verifies that the percentage of change in the partition's volume (row count) since thirty days ago is below the maximum accepted percentage. Comparing the current row count to the value from 30 days ago overcomes the effect of monthly seasonality. | TableRowCountChange30DaysCheckSpec | | | |
custom_checks | Dictionary of additional custom checks within this category. The keys are check names defined in the definition section. The sensor parameters and rules should match the type of the configured sensor and rule for the custom check. | CustomCategoryCheckSpecMap | | | |
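A minimal sketch of the volume category, assuming the `min_count`, `anomaly_percent`, and `max_percent` rule parameters commonly used by these checks; confirm the exact parameter names in the referenced check specifications.

```yaml
partitioned_checks:
  daily:
    volume:
      daily_partition_row_count:
        warning:
          min_count: 1            # assumed parameter: minimum accepted rows per daily partition
      daily_partition_row_count_anomaly:
        warning:
          anomaly_percent: 1.0    # flag the top 1% of unusual partition volumes
      daily_partition_row_count_change_7_days:
        warning:
          max_percent: 10.0       # assumed parameter: maximum accepted change vs. a week ago
```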
TableRowCountAnomalyStationaryPartitionCheckSpec
This check detects anomalies in the day-to-day changes to the table volume (the row count). It captures the row count for each day and compares the row count change (increase or decrease) since the previous day. This check raises a data quality issue when the change is in the top anomaly_percent percentage of the biggest day-to-day changes.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Data quality check parameters | TableVolumeRowCountSensorParametersSpec | | | |
warning | Alerting threshold that raises a data quality warning, which is considered as a passed data quality check | AnomalyPartitionRowCountRuleWarning1PctParametersSpec | | | |
error | Default alerting threshold that raises a data quality error, which is considered as a failed data quality check | AnomalyPartitionRowCountRuleError05PctParametersSpec | | | |
fatal | Alerting threshold that raises a fatal data quality issue, which indicates a serious data quality problem | AnomalyPartitionRowCountRuleFatal01PctParametersSpec | | | |
schedule_override | Run check scheduling configuration. Specifies the schedule (a cron expression) on which the data quality checks are executed by the scheduler. | CronScheduleSpec | | | |
comments | Comments for change tracking. Put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non-tracked comments). | CommentsListSpec | | | |
disabled | Disables the data quality check. Only enabled data quality checks and monitorings are executed. Disable a check when it should not run, but its sensor and rule configuration should be preserved. | boolean | | | |
exclude_from_kpi | Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true to exclude this data quality check from the data quality KPI calculation. | boolean | | | |
include_in_sla | Marks the data quality check as part of a data quality SLA (Data Contract). The data quality SLA is a set of critical data quality checks that must always pass and are considered a Data Contract for the dataset. | boolean | | | |
quality_dimension | Configures a custom data quality dimension name that is different than the built-in dimensions (Timeliness, Validity, etc.). | string | | | |
display_name | Data quality check display name that can be assigned to the check; otherwise the check_display_name stored in the parquet result files is the check_name. | string | | | |
data_grouping | Data grouping configuration name that should be applied to this data quality check. The data grouping is used to group the check's results with a GROUP BY clause in SQL, evaluating the data quality check for each group of rows. Use the name of one of the data grouping configurations defined on the parent table. | string | | | |
always_collect_error_samples | Forces collecting error samples for this check whenever it fails, even if it is a monitoring check that is run by a scheduler, and running an additional query to collect error samples will impose additional load on the data source. | boolean | | | |
do_not_schedule | Disables running this check by the DQOps CRON scheduler. When a check is excluded from scheduling, it can only be triggered from the user interface or by submitting a "run checks" job. | boolean | | | |
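The sketch below shows the check-level fields on an example configuration of this check; the nested `cron_expression` field, the comment fields (`date`, `comment_by`, `comment`), and the `anomaly_percent` value are assumptions to be confirmed in the CronScheduleSpec, CommentsListSpec, and rule parameter sections.

```yaml
# Fragment under partitioned_checks.daily.volume; values are illustrative only.
daily_partition_row_count_anomaly:
  parameters: {}                      # TableVolumeRowCountSensorParametersSpec; no parameters set here
  warning:
    anomaly_percent: 1.0
  schedule_override:
    cron_expression: "0 6 * * *"      # assumed CronScheduleSpec field; run the check daily at 06:00
  comments:
    - date: 2025-07-01T08:00:00       # assumed comment entry fields
      comment_by: data_engineer
      comment: "Enabled volume anomaly detection for daily partitions."
  disabled: false
  exclude_from_kpi: false
  include_in_sla: true
  always_collect_error_samples: false
  do_not_schedule: false
```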
AnomalyPartitionRowCountRuleWarning1PctParametersSpec
Data quality rule that detects anomalies on the row count of daily partitions. The rule identifies the top X% of anomalous values, based on the distribution of the changes using a standard deviation. The rule uses the time window of the last 90 days, but at least 30 historical measures must be present to run the calculation.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
anomaly_percent | The probability (in percent) that the current daily row count is an anomaly because the value is outside the regular range of previous partition volume measures. The default time window of 90 time periods (days, etc.) is used, but at least 30 readouts must exist to run the calculation. | double | | | |
use_ai | Use an AI model to predict anomalies. WARNING: anomaly detection by AI models is not supported in a trial distribution of DQOps. Please contact DQOps support to upgrade your instance to a full DQOps instance. | boolean | | | |
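The rule class names suggest default thresholds of 1%, 0.5%, and 0.1% for the warning, error, and fatal severities; the values in the sketch below are assumptions inferred from those names rather than verified defaults.

```yaml
# Fragment under partitioned_checks.daily.volume; thresholds inferred from the class names.
daily_partition_row_count_anomaly:
  warning:
    anomaly_percent: 1.0    # AnomalyPartitionRowCountRuleWarning1PctParametersSpec
  error:
    anomaly_percent: 0.5    # AnomalyPartitionRowCountRuleError05PctParametersSpec
  fatal:
    anomaly_percent: 0.1    # AnomalyPartitionRowCountRuleFatal01PctParametersSpec
    use_ai: false           # AI-based anomaly detection requires a full (non-trial) DQOps instance
```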
AnomalyPartitionRowCountRuleFatal01PctParametersSpec
Data quality rule that detects anomalies on the row count of daily partitions. The rule identifies the top X% of anomalous values, based on the distribution of the changes using a standard deviation. The rule uses the time window of the last 90 days, but at least 30 historical measures must be present to run the calculation.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
anomaly_percent | The probability (in percent) that the current daily row count is an anomaly because the value is outside the regular range of previous partition volume measures. The default time window of 90 time periods (days, etc.) is used, but at least 30 readouts must exist to run the calculation. | double | | | |
use_ai | Use an AI model to predict anomalies. WARNING: anomaly detection by AI models is not supported in a trial distribution of DQOps. Please contact DQOps support to upgrade your instance to a full DQOps instance. | boolean | | | |
TableTimelinessDailyPartitionedChecksSpec
Container of table level date partitioned timeliness data quality checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily_partition_data_ingestion_delay | Daily partitioned check calculating the time difference in days between the most recent event timestamp and the most recent ingestion timestamp | TableDataIngestionDelayCheckSpec | | | |
daily_partition_reload_lag | Daily partitioned check calculating the longest time a row waited to be loaded; it is the maximum difference in days between the ingestion timestamp and the event timestamp column on any row in the monitored partition | TablePartitionReloadLagCheckSpec | | | |
custom_checks | Dictionary of additional custom checks within this category. The keys are check names defined in the definition section. The sensor parameters and rules should match the type of the configured sensor and rule for the custom check. | CustomCategoryCheckSpecMap | | | |
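A minimal sketch of the timeliness category, assuming a `max_days` rule parameter for both checks; verify the parameter name in the TableDataIngestionDelayCheckSpec and TablePartitionReloadLagCheckSpec sections.

```yaml
partitioned_checks:
  daily:
    timeliness:
      daily_partition_data_ingestion_delay:
        warning:
          max_days: 1.0    # assumed parameter: accepted delay between event and ingestion timestamps
      daily_partition_reload_lag:
        error:
          max_days: 2.0    # assumed parameter: longest accepted reload lag per partition
```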
TableCustomSqlDailyPartitionedChecksSpec
Container of built-in, preconfigured data quality checks on a table level that use custom SQL expressions (conditions).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily_partition_sql_condition_failed_on_table | Verifies that a custom SQL expression is met for each row. Counts the number of rows where the expression is not satisfied, and raises an issue if too many failures were detected. This check is also used to compare values between columns: `{alias}.col_price > {alias}.col_tax`. Stores a separate data quality check result for each daily partition. | TableSqlConditionFailedCheckSpec | | | |
daily_partition_sql_condition_passed_percent_on_table | Verifies that a minimum percentage of rows passed a custom SQL condition (expression). Reference the current table by using tokens, for example: `{alias}.col_price > {alias}.col_tax`. Stores a separate data quality check result for each daily partition. | TableSqlConditionPassedPercentCheckSpec | | | |
daily_partition_sql_aggregate_expression_on_table | Verifies that a custom aggregated SQL expression (MIN, MAX, etc.) is not outside the expected range. Stores a separate data quality check result for each daily partition. | TableSqlAggregateExpressionCheckSpec | | | |
daily_partition_import_custom_result_on_table | Runs a custom query that retrieves the result of a data quality check performed in the data engineering pipeline, whose result (the severity level) is pulled from a separate table. | TableSqlImportCustomResultCheckSpec | | | |
custom_checks | Dictionary of additional custom checks within this category. The keys are check names defined in the definition section. The sensor parameters and rules should match the type of the configured sensor and rule for the custom check. | CustomCategoryCheckSpecMap | | | |
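A sketch of the custom SQL category, reusing the `{alias}.col_price > {alias}.col_tax` expression from the descriptions above; the `sql_condition`, `min_percent`, and `max_count` names are assumptions about the sensor and rule parameters.

```yaml
partitioned_checks:
  daily:
    custom_sql:
      daily_partition_sql_condition_passed_percent_on_table:
        parameters:
          sql_condition: "{alias}.col_price > {alias}.col_tax"   # assumed sensor parameter name
        warning:
          min_percent: 99.0                                      # assumed rule parameter
      daily_partition_sql_condition_failed_on_table:
        parameters:
          sql_condition: "{alias}.col_price > {alias}.col_tax"
        error:
          max_count: 0                                           # assumed rule parameter: no failing rows allowed
```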
TableUniquenessDailyPartitionChecksSpec
Container of table level daily partitioned uniqueness data quality checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily_partition_duplicate_record_count | Verifies that the number of duplicate record values in a table does not exceed the maximum accepted count. | TableDuplicateRecordCountCheckSpec | | | |
daily_partition_duplicate_record_percent | Verifies that the percentage of duplicate record values in a table does not exceed the maximum accepted percentage. | TableDuplicateRecordPercentCheckSpec | | | |
custom_checks | Dictionary of additional custom checks within this category. The keys are check names defined in the definition section. The sensor parameters and rules should match the type of the configured sensor and rule for the custom check. | CustomCategoryCheckSpecMap | | | |
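A sketch of the uniqueness category, assuming a `columns` sensor parameter that lists the columns forming the record key (the column names are hypothetical) and `max_count` / `max_percent` rule parameters.

```yaml
partitioned_checks:
  daily:
    uniqueness:
      daily_partition_duplicate_record_count:
        parameters:
          columns:            # assumed parameter: columns that together identify a record
            - customer_id
            - order_date
        warning:
          max_count: 0
      daily_partition_duplicate_record_percent:
        parameters:
          columns:
            - customer_id
            - order_date
        error:
          max_percent: 1.0
```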
TableComparisonDailyPartitionedChecksSpecMap
Container of comparison checks for each defined data comparison. The name of the key in this dictionary must match the name of a table comparison that is defined on the parent table. Contains the daily partitioned comparison checks for each configured reference table.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
self | | Dict[string, TableComparisonDailyPartitionedChecksSpec] | | | |
TableComparisonDailyPartitionedChecksSpec
Container of built-in comparison (accuracy) checks on a table level that use a defined comparison to identify the reference table and the data grouping configuration. Contains the daily partitioned comparison checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily_partition_row_count_match | Verifies that the row count of the tested (parent) table matches the row count of the reference table. Compares each group of data with a GROUP BY clause on the time period (the daily partition) and all other data grouping columns. Stores the most recent captured value for each daily partition that was analyzed. | TableComparisonRowCountMatchCheckSpec | | | |
custom_checks | Dictionary of additional custom checks within this category. The keys are check names defined in the definition section. The sensor parameters and rules should match the type of the configured sensor and rule for the custom check. | CustomCategoryCheckSpecMap | | | |
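A sketch of the comparisons category, assuming a hypothetical `landing_zone_comparison` defined on the parent table (the dictionary key must match that comparison name) and a `max_diff_percent` rule parameter on the row count match check.

```yaml
partitioned_checks:
  daily:
    comparisons:
      landing_zone_comparison:          # key must match a table comparison configured on the parent table
        daily_partition_row_count_match:
          warning:
            max_diff_percent: 0.0       # assumed parameter: accepted row count difference in percent
```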