
table profiling checks

TableSchemaProfilingChecksSpec

Container of built-in preconfigured schema data quality checks on a table level.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_column_count Detects if the number of columns matches an expected number. Retrieves the metadata of the monitored table, counts the number of columns and compares it to an expected value (an expected number of columns). TableSchemaColumnCountCheckSpec
profile_column_count_changed Detects if the count of columns has changed. Retrieves the metadata of the monitored table, counts the number of columns and compares it to the last known column count, which was captured the last time this data quality check was executed. TableSchemaColumnCountChangedCheckSpec
profile_column_list_changed Detects if new columns were added or existing columns were removed. Retrieves the metadata of the monitored table and calculates an unordered hash of the column names. Compares the current hash to the previously known hash to detect any changes to the list of columns. TableSchemaColumnListChangedCheckSpec
profile_column_list_or_order_changed Detects if new columns were added, existing columns were removed or the columns were reordered. Retrieves the metadata of the monitored table and calculates an ordered hash of the column names. Compares the current hash to the previously known hash to detect any changes to the list of columns or their order. TableSchemaColumnListOrOrderChangedCheckSpec
profile_column_types_changed Detects if columns were added or removed, or if their data types have changed. Retrieves the metadata of the monitored table and calculates an unordered hash of the column names and the data types (including the length, scale, precision, nullability). Compares the current hash to the previously known hash to detect any changes to the list of columns or their types. TableSchemaColumnTypesChangedCheckSpec
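
The difference between the unordered and the ordered hash can be illustrated with a minimal Python sketch. It is not DQO's implementation: the function names, the SHA-256 digest and the newline separator are assumptions chosen only for illustration.

```python
import hashlib


def unordered_column_hash(column_names):
    """Hash that ignores column order (assumed approach: sort the names first)."""
    joined = "\n".join(sorted(column_names))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def ordered_column_hash(column_names):
    """Hash that is sensitive to column order (names are hashed as listed)."""
    joined = "\n".join(column_names)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical column lists captured during two executions of a check.
previous = ["id", "name", "created_at"]
reordered = ["name", "id", "created_at"]
extended = ["id", "name", "created_at", "email"]

# Reordering alone does not change the unordered hash, but changes the ordered one.
list_changed = unordered_column_hash(previous) != unordered_column_hash(reordered)
order_changed = ordered_column_hash(previous) != ordered_column_hash(reordered)
# Adding a column changes the unordered hash as well.
column_added = unordered_column_hash(previous) != unordered_column_hash(extended)
```

In this sketch a reordered table would be reported only by a check that uses the ordered hash, while an added column would be reported by both kinds of checks.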

CommentsListSpec

List of comments.


CustomCheckSpec

Custom check specification. This check is usable only when there is a matching custom check definition that identifies the sensor definition and the rule definition.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
sensor_name Optional custom sensor name. It is a folder name inside the user's home 'sensors' folder or the DQO Home (DQO distribution) home/sensors folder. Sample sensor name: table/volume/row_count. When this value is set, it overrides the default sensor definition defined for the named check definition. string
rule_name Optional custom rule name. It is a path to a custom rule python module that starts at the user's home 'rules' folder. The path should not end with the .py file extension. Sample rule: myrules/my_custom_rule. When this value is set, it overrides the default rule definition defined for the named check definition. string
parameters Custom sensor parameters CustomSensorParametersSpec
warning Alerting threshold that raises a data quality warning that is considered as a passed data quality check CustomRuleParametersSpec
error Default alerting threshold that raises a data quality error (alert) CustomRuleParametersSpec
fatal Alerting threshold that raises a fatal data quality issue which indicates a serious data quality problem CustomRuleParametersSpec
schedule_override Run check scheduling configuration. Specifies the schedule (a cron expression) when the data quality checks are executed by the scheduler. RecurringScheduleSpec
comments Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). CommentsListSpec
disabled Disables the data quality check. Only enabled data quality checks and recurring checks are executed. Disable the check when it should not run, but the configuration of the sensor and rules should be preserved in the configuration. boolean
exclude_from_kpi Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true in order to exclude this data quality check from the data quality KPI calculation. boolean
include_in_sla Marks the data quality check as part of a data quality SLA. The data quality SLA is a set of critical data quality checks that must always pass and are considered as a data contract for the dataset. boolean
quality_dimension Configures a custom data quality dimension name that is different than the built-in dimensions (Timeliness, Validity, etc.). string
display_name Display name that can be assigned to the data quality check. When it is not set, the check_display_name stored in the parquet result files is the check_name. string
data_grouping Name of the data grouping configuration that should be applied to this data quality check. The data grouping is used to group the check's results by a GROUP BY clause in SQL, evaluating the data quality check for each group of rows. Use the name of one of the data grouping configurations defined on the parent table. string
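
The three alerting thresholds (warning, error, fatal) can be read as increasingly severe levels. The sketch below shows one way such thresholds could be evaluated. It is a hypothetical illustration: the `max_value` parameter, the `evaluate_severity` function and the "valid" label are assumptions, because the parameters of a custom rule depend on the rule definition and are not listed in this reference.

```python
# Hypothetical thresholds: each severity level carries its own rule parameters.
# 'max_value' is an assumed parameter name, not part of CustomRuleParametersSpec.
thresholds = {
    "warning": {"max_value": 10},
    "error": {"max_value": 50},
    "fatal": {"max_value": 100},
}


def evaluate_severity(sensor_value, thresholds):
    """Return the highest severity whose threshold is exceeded, or 'valid'."""
    for severity in ("fatal", "error", "warning"):
        params = thresholds.get(severity)
        if params is not None and sensor_value > params["max_value"]:
            return severity
    return "valid"
```

For example, `evaluate_severity(75, thresholds)` returns `"error"` in this sketch, because 75 exceeds the error threshold but not the fatal one. A severity level that is not configured is skipped.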

TableTimelinessProfilingChecksSpec

Container of timeliness data quality checks on a table level.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_data_freshness Calculates the number of days since the most recent event timestamp (freshness) TableDataFreshnessCheckSpec
profile_data_staleness Calculates the time difference in days between the current date and the most recent data ingestion timestamp (staleness) TableDataStalenessCheckSpec
profile_data_ingestion_delay Calculates the time difference in days between the most recent event timestamp and the most recent ingestion timestamp TableDataIngestionDelayCheckSpec
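
The relation between the three timeliness measures can be sketched in Python as follows. This is only an illustration of the definitions above, not the SQL that DQO generates: the function names and the sample timestamps are assumptions.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400.0


def days_between(earlier, later):
    """Time difference between two timestamps, expressed in days."""
    return (later - earlier).total_seconds() / SECONDS_PER_DAY


def timeliness_metrics(now, latest_event, latest_ingestion):
    """Compute freshness, staleness and ingestion delay in days."""
    return {
        # now minus the most recent event timestamp
        "data_freshness": days_between(latest_event, now),
        # now minus the most recent ingestion timestamp
        "data_staleness": days_between(latest_ingestion, now),
        # most recent ingestion timestamp minus most recent event timestamp
        "data_ingestion_delay": days_between(latest_event, latest_ingestion),
    }


# Hypothetical timestamps, as if read from the event and ingestion timestamp columns.
now = datetime(2024, 1, 10, 0, 0, tzinfo=timezone.utc)
latest_event = datetime(2024, 1, 7, 0, 0, tzinfo=timezone.utc)
latest_ingestion = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
metrics = timeliness_metrics(now, latest_event, latest_ingestion)
```

With these sample values the freshness is 3 days, the staleness is 1.5 days and the ingestion delay is 1.5 days, so freshness equals staleness plus ingestion delay.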

CustomCheckSpecMap

Dictionary of custom checks indexed by a check name.


TableAccuracyProfilingChecksSpec

Container of built-in preconfigured accuracy data quality checks on a table level.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_total_row_count_match_percent Verifies that the total row count of the tested table matches the total row count of another (reference) table. TableAccuracyTotalRowCountMatchPercentCheckSpec

RecurringScheduleSpec

Recurring job schedule specification.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
cron_expression Unix style cron expression that specifies when to execute scheduled operations like running data quality checks or synchronizing the configuration with the cloud. string
disabled Disables the schedule. When the value of this 'disabled' field is true, the schedule is stored in the metadata but it is not activated to run data quality checks. boolean
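
A Unix style cron expression consists of five whitespace-separated fields. The sketch below splits a sample expression into named fields so that each position is easier to read. The `describe_cron` helper and the sample expression are assumptions made for illustration, and the helper does not validate the content of the fields.

```python
# Field order of a classic five-field Unix cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(cron_expression):
    """Split a five-field Unix cron expression into named fields."""
    parts = cron_expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# Hypothetical schedule: every day at 08:00.
fields = describe_cron("0 8 * * *")
```

Here `fields["minute"]` is `"0"` and `fields["hour"]` is `"8"`, while the remaining three fields hold the `*` wildcard.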

TableProfilingCheckCategoriesSpec

Container of table level checks that are activated on a table level.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
result_truncation Defines how many advanced profiling results are stored for the table monthly. By default, DQO will use the 'one_per_month' configuration and store only the most recent advanced profiling result executed during the month. By changing this value, it is possible to store one value per day or even store all advanced profiling results. enum one_per_week, all_results, one_per_hour, one_per_month, one_per_day
volume Configuration of volume data quality checks on a table level. TableVolumeProfilingChecksSpec
timeliness Configuration of timeliness checks on a table level. Timeliness checks detect outdated or delayed data by measuring data freshness, staleness and ingestion delay. TableTimelinessProfilingChecksSpec
accuracy Configuration of accuracy checks on a table level. Accuracy checks compare the tested table with another reference table. TableAccuracyProfilingChecksSpec
sql Configuration of data quality checks that are evaluating custom SQL conditions and aggregated expressions. TableSqlProfilingChecksSpec
availability Configuration of the table availability data quality checks on a table level. TableAvailabilityProfilingChecksSpec
schema Configuration of schema (column count and schema) data quality checks on a table level. TableSchemaProfilingChecksSpec
comparisons Dictionary of configuration of checks for table comparisons. The key that identifies each comparison must match the name of a data comparison that is configured on the parent table. TableComparisonProfilingChecksSpecMap
custom Dictionary of custom checks. The keys are check names. CustomCheckSpecMap
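
The effect of the result_truncation setting can be sketched as keeping only the most recent result within each period. The code below is an assumption-laden illustration, not DQO's storage logic: the function name, the use of ISO weeks for 'one_per_week' and the sample results are all hypothetical.

```python
from datetime import datetime

# Maps each truncation mode to a function that derives the period of a timestamp.
PERIOD_KEYS = {
    "one_per_month": lambda ts: (ts.year, ts.month),
    "one_per_week": lambda ts: tuple(ts.isocalendar())[:2],  # ISO year and week
    "one_per_day": lambda ts: (ts.year, ts.month, ts.day),
    "one_per_hour": lambda ts: (ts.year, ts.month, ts.day, ts.hour),
}


def truncate_results(results, mode):
    """Keep the most recent (executed_at, value) pair per period."""
    if mode == "all_results":
        return sorted(results)
    period_of = PERIOD_KEYS[mode]
    latest = {}
    # Iterating in time order lets later results overwrite earlier ones.
    for executed_at, value in sorted(results):
        latest[period_of(executed_at)] = (executed_at, value)
    return sorted(latest.values())


# Hypothetical profiling results: (execution time, sensor value).
results = [
    (datetime(2024, 1, 5, 9), 100),
    (datetime(2024, 1, 5, 17), 110),
    (datetime(2024, 1, 20, 9), 120),
    (datetime(2024, 2, 1, 9), 130),
]
monthly = truncate_results(results, "one_per_month")
daily = truncate_results(results, "one_per_day")
```

In this sketch `monthly` keeps two results (values 120 and 130), while `daily` keeps three (values 110, 120 and 130), because only the two executions on January 5 fall in the same day.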

CustomRuleParametersSpec

Custom data quality rule.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 

TableSqlProfilingChecksSpec

Container of built-in preconfigured data quality checks on a table level that are using custom SQL expressions (conditions).

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_sql_condition_passed_percent_on_table Verifies that a set percentage of rows passed a custom SQL condition (expression). TableSqlConditionPassedPercentCheckSpec
profile_sql_condition_failed_count_on_table Verifies that a set number of rows failed a custom SQL condition (expression). TableSqlConditionFailedCountCheckSpec
profile_sql_aggregate_expr_table Verifies that a custom aggregated SQL expression (MIN, MAX, etc.) is not outside the set range. TableSqlAggregateExprCheckSpec

TableComparisonProfilingChecksSpec

Container of built-in comparison (accuracy) checks on a table level that are using a defined comparison to identify the reference table and the data grouping configuration.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_row_count_match Verifies that the row count of the tested (parent) table matches the row count of the reference table. Compares each group of data with a GROUP BY clause. TableComparisonRowCountMatchCheckSpec
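
Comparing row counts for each group of data can be pictured as comparing two dictionaries of counts keyed by the grouping value. The following sketch is hypothetical: the function name, the sample groups and the choice to treat a group missing from one table as a count of 0 are assumptions, not documented DQO behavior.

```python
def compare_row_counts(tested_counts, reference_counts):
    """Return groups whose row counts differ, as (tested, reference) pairs."""
    mismatches = {}
    for group in sorted(set(tested_counts) | set(reference_counts)):
        tested = tested_counts.get(group, 0)        # assumption: missing group = 0 rows
        reference = reference_counts.get(group, 0)  # assumption: missing group = 0 rows
        if tested != reference:
            mismatches[group] = (tested, reference)
    return mismatches


# Hypothetical per-group row counts, as if produced by a GROUP BY on a country column.
tested_counts = {"US": 100, "DE": 40, "FR": 25}
reference_counts = {"US": 100, "DE": 42, "PL": 7}
mismatches = compare_row_counts(tested_counts, reference_counts)
```

Only the "US" group matches in this example. The other groups are reported together with both counts, which makes it visible whether rows are missing from the tested table or from the reference table.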

TableComparisonProfilingChecksSpecMap

Container of comparison checks for each defined data comparison. The name of the key in this dictionary must match a name of a table comparison that is defined on the parent table.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
access_order boolean
size integer
mod_count integer
threshold integer

TableAvailabilityProfilingChecksSpec

Container of built-in preconfigured table availability data quality checks on a table level.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_table_availability Verifies availability of the table in a database using a simple row count. TableAvailabilityCheckSpec

TableVolumeProfilingChecksSpec

Container of built-in preconfigured volume data quality checks on a table level.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_row_count Verifies that the number of rows in a table is not below the minimum accepted count. TableRowCountCheckSpec
profile_row_count_anomaly_differencing_30_days Verifies that the total row count of the tested table changes at a rate within a percentile boundary during the last 30 days. TableAnomalyDifferencingRowCount30DaysCheckSpec
profile_row_count_anomaly_differencing Verifies that the total row count of the tested table changes at a rate within a percentile boundary during the last 90 days. TableAnomalyDifferencingRowCountCheckSpec
profile_row_count_change Verifies that the total row count of the tested table has changed by a fixed rate since the last readout. TableChangeRowCountCheckSpec
profile_row_count_change_yesterday Verifies that the total row count of the tested table has changed by a fixed rate since the last readout from yesterday. Allows for exact match to readouts from yesterday or past readouts lookup. TableChangeRowCountSinceYesterdayCheckSpec
profile_row_count_change_7_days Verifies that the total row count of the tested table has changed by a fixed rate since the last readout from last week. Allows for exact match to readouts from 7 days ago or past readouts lookup. TableChangeRowCountSince7DaysCheckSpec
profile_row_count_change_30_days Verifies that the total row count of the tested table has changed by a fixed rate since the last readout from last month. Allows for exact match to readouts from 30 days ago or past readouts lookup. TableChangeRowCountSince30DaysCheckSpec
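
The row count change checks compare the current readout with an earlier one. A relative change between two readouts could be computed as in the sketch below. The function name and the sample counts are assumptions, and how the result is compared with the configured rate is defined by the rule of each check, which this reference does not describe.

```python
def row_count_change_percent(previous_count, current_count):
    """Relative change between two readouts, as a percentage of the previous one."""
    if previous_count == 0:
        raise ValueError("previous row count is zero, the relative change is undefined")
    return abs(current_count - previous_count) * 100.0 / previous_count


# Hypothetical readouts: 1000 rows at the last readout, 1080 rows now.
change = row_count_change_percent(1000, 1080)
```

The yesterday, 7 days and 30 days variants would use the same calculation, with `previous_count` taken from the readout captured at the corresponding point in the past.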

CustomSensorParametersSpec

Custom sensor parameters for custom checks.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
filter SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. string
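
Combining the table level filter with the sensor query filter can be sketched as joining both conditions with AND. The helper below is a hypothetical illustration, not the way DQO renders its SQL templates. Wrapping each filter in parentheses is an assumption, made so that an OR inside one filter does not change the meaning of the combined condition.

```python
def combine_filters(table_filter, sensor_filter):
    """Join the non-empty filters with AND, wrapping each one in parentheses."""
    parts = [
        f"({condition.strip()})"
        for condition in (table_filter, sensor_filter)
        if condition and condition.strip()
    ]
    return " AND ".join(parts)


# Hypothetical filters: one defined on the table, one on the sensor parameters.
where_clause = combine_filters("country = 'US'", "status = 'active' OR status = 'new'")
```

The resulting condition is `(country = 'US') AND (status = 'active' OR status = 'new')`. When only one of the filters is set, the sketch returns that single filter in parentheses.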

CommentSpec

Comment entry. Comments are added when a change was made and the change should be recorded in a persisted format.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
comment_by Commented by string
comment Comment text string