column profiling checks
ColumnNumericPercentile25SensorParametersSpec
Column level sensor that finds the 25th percentile of the values in a given column.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
percentile_value | 25th percentile, must equal 0.25 | double | |||
filter | SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. | string |
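The `filter` description above says that the table level filter and the sensor filter are both applied, joined by an AND operator. The following is a minimal Python sketch of that composition. The helper name `combine_filters` is hypothetical (it is not a DQOps function), and wrapping each filter in parentheses is an assumption made here so that an OR inside one filter cannot change the meaning of the other.

```python
from typing import Optional


def combine_filters(table_filter: Optional[str], sensor_filter: Optional[str]) -> Optional[str]:
    """Join the non-empty filters with AND (hypothetical helper, not DQOps code)."""
    parts = [f.strip() for f in (table_filter, sensor_filter) if f and f.strip()]
    if not parts:
        return None
    # Parenthesize each filter so operator precedence inside one filter stays local.
    return " AND ".join(f"({p})" for p in parts)


where_clause = combine_filters("country = 'US'", "amount IS NOT NULL OR amount > 0")
print(where_clause)
```

When only one of the two filters is set, the sketch returns that filter alone, and it returns `None` when neither is configured.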
ColumnSchemaProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level that are checking the column schema.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_column_exists | Checks the metadata of the monitored table and verifies if the column exists. | ColumnSchemaColumnExistsCheckSpec | |||
profile_column_type_changed | Checks the metadata of the monitored column and detects if the data type (including the length, precision, scale, nullability) has changed. | ColumnSchemaTypeChangedCheckSpec |
ColumnNumericPercentile90SensorParametersSpec
Column level sensor that finds the 90th percentile of the values in a given column.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
percentile_value | 90th percentile, must equal 0.9 | double | |||
filter | SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. | string |
ColumnBoolProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level that are checking for booleans.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_true_percent | Verifies that the percentage of true values in a column does not fall below the minimum accepted percentage. | ColumnTruePercentCheckSpec | |||
profile_false_percent | Verifies that the percentage of false values in a column does not fall below the minimum accepted percentage. | ColumnFalsePercentCheckSpec |
ColumnNumericPercentile10SensorParametersSpec
Column level sensor that finds the 10th percentile of the values in a given column.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
percentile_value | 10th percentile, must equal 0.1 | double | |||
filter | SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. | string |
ColumnPercentile90InRangeCheckSpec
Column level check that ensures that the percentile 90 of values in a monitored column is in a set range.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Data quality check parameters | ColumnNumericPercentile90SensorParametersSpec | |||
warning | Alerting threshold that raises a data quality warning that is considered as a passed data quality check | BetweenFloatsRuleParametersSpec | |||
error | Default alerting threshold for a percentile 90 in a column that raises a data quality error (alert). | BetweenFloatsRuleParametersSpec | |||
fatal | Alerting threshold that raises a fatal data quality issue which indicates a serious data quality problem | BetweenFloatsRuleParametersSpec | |||
schedule_override | Run check scheduling configuration. Specifies the schedule (a cron expression) when the data quality checks are executed by the scheduler. | RecurringScheduleSpec | |||
comments | Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). | CommentsListSpec | |||
disabled | Disables the data quality check. Only enabled data quality checks and recurrings are executed. The check should be disabled if it should not work, but the configuration of the sensor and rules should be preserved in the configuration. | boolean | |||
exclude_from_kpi | Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true in order to exclude this data quality check from the data quality KPI calculation. | boolean | |||
include_in_sla | Marks the data quality check as part of a data quality SLA. The data quality SLA is a set of critical data quality checks that must always pass and are considered as a data contract for the dataset. | boolean | |||
quality_dimension | Configures a custom data quality dimension name that is different than the built-in dimensions (Timeliness, Validity, etc.). | string | |||
display_name | Data quality check display name that could be assigned to the check, otherwise the check_display_name stored in the parquet result files is the check_name. | string | |||
data_grouping | Data grouping configuration name that should be applied to this data quality check. The data grouping is used to group the check's result by a GROUP BY clause in SQL, evaluating the data quality check for each group of rows. Use the name of one of data grouping configurations defined on the parent table. | string |
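The `warning`, `error` and `fatal` properties above each hold a range rule. A minimal Python sketch of how three such rules could be evaluated against a sensor readout is shown below. The structure of `BetweenFloatsRuleParametersSpec` is not described in this section, so the `BetweenFloats` class, its field names `from_` and `to`, the inclusive bounds and the "most severe failing rule wins" order are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BetweenFloats:
    """Stand-in for BetweenFloatsRuleParametersSpec; field names are assumed."""
    from_: Optional[float] = None
    to: Optional[float] = None

    def passes(self, value: float) -> bool:
        # A missing bound is treated as "no limit" on that side.
        if self.from_ is not None and value < self.from_:
            return False
        if self.to is not None and value > self.to:
            return False
        return True


def evaluate(value: float,
             warning: Optional[BetweenFloats] = None,
             error: Optional[BetweenFloats] = None,
             fatal: Optional[BetweenFloats] = None) -> str:
    # Check the most severe rule first so the worst failing severity is reported.
    for name, rule in (("fatal", fatal), ("error", error), ("warning", warning)):
        if rule is not None and not rule.passes(value):
            return name
    return "passed"


rules = dict(
    warning=BetweenFloats(0.0, 100.0),
    error=BetweenFloats(0.0, 150.0),
    fatal=BetweenFloats(0.0, 500.0),
)
print(evaluate(120.0, **rules))
```

In this sketch a readout of 120.0 falls outside only the warning range, so the result is a warning, which the table above describes as still counting as a passed check.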
ColumnUniquenessProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level that are checking for uniqueness (distinct and duplicate values).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_distinct_count | Verifies that the number of distinct values in a column does not fall below the minimum accepted count. | ColumnDistinctCountCheckSpec | |||
profile_distinct_percent | Verifies that the percentage of distinct values in a column does not fall below the minimum accepted percent. | ColumnDistinctPercentCheckSpec | |||
profile_duplicate_count | Verifies that the number of duplicate values in a column does not exceed the maximum accepted count. | ColumnDuplicateCountCheckSpec | |||
profile_duplicate_percent | Verifies that the percentage of duplicate values in a column does not exceed the maximum accepted percentage. | ColumnDuplicatePercentCheckSpec |
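The four uniqueness measures above are closely related. The Python sketch below computes them for an in-memory list, under the assumption that nulls are ignored, that the duplicate count is the number of non-null values minus the number of distinct values, and that both percentages use the non-null count as the denominator. The exact SQL used by the sensors is not shown in this section, so treat these definitions as illustrative.

```python
from typing import Any, Dict, Sequence


def uniqueness_metrics(values: Sequence[Any]) -> Dict[str, float]:
    """Distinct/duplicate measures over non-null values (assumed definitions)."""
    non_null = [v for v in values if v is not None]
    total = len(non_null)
    distinct = len(set(non_null))
    duplicates = total - distinct

    def percent(n: int) -> float:
        return 100.0 * n / total if total else 0.0

    return {
        "distinct_count": distinct,
        "distinct_percent": percent(distinct),
        "duplicate_count": duplicates,
        "duplicate_percent": percent(duplicates),
    }


print(uniqueness_metrics(["a", "b", "a", None, "c"]))
```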
ColumnIntegrityProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level that are checking for integrity.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_foreign_key_not_match_count | Verifies that the number of values in a column that do not match values in another table column does not exceed the maximum accepted count. | ColumnIntegrityForeignKeyNotMatchCountCheckSpec | |||
profile_foreign_key_match_percent | Verifies that the percentage of values in a column that match values in another table column does not fall below the minimum accepted percentage. | ColumnIntegrityForeignKeyMatchPercentCheckSpec |
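To illustrate the two integrity measures, here is a minimal Python sketch that looks up each value of a column in a set of reference keys. The function name is hypothetical, and ignoring nulls in the tested column is an assumption; the real checks run as SQL against the other table.

```python
from typing import Any, Iterable, Sequence, Tuple


def foreign_key_metrics(values: Sequence[Any], reference_values: Iterable[Any]) -> Tuple[int, float]:
    """Return (not_match_count, match_percent) for non-null values (assumed definitions)."""
    reference = set(reference_values)
    non_null = [v for v in values if v is not None]
    not_match = sum(1 for v in non_null if v not in reference)
    if not non_null:
        return 0, 100.0
    match_percent = 100.0 * (len(non_null) - not_match) / len(non_null)
    return not_match, match_percent


print(foreign_key_metrics([1, 2, 3, 9, None], [1, 2, 3, 4]))
```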
ColumnPiiProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level that are checking for Personally Identifiable Information (PII).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_valid_usa_phone_percent | Verifies that the percentage of valid USA phone values in a column does not fall below the minimum accepted percentage. | ColumnPiiValidUsaPhonePercentCheckSpec | |||
profile_contains_usa_phone_percent | Verifies that the percentage of rows that contain a USA phone number in a column does not exceed the maximum accepted percentage. | ColumnPiiContainsUsaPhonePercentCheckSpec | |||
profile_valid_usa_zipcode_percent | Verifies that the percentage of valid USA zip code values in a column does not fall below the minimum accepted percentage. | ColumnPiiValidUsaZipcodePercentCheckSpec | |||
profile_contains_usa_zipcode_percent | Verifies that the percentage of rows that contain a USA zip code in a column does not exceed the maximum accepted percentage. | ColumnPiiContainsUsaZipcodePercentCheckSpec | |||
profile_valid_email_percent | Verifies that the percentage of valid email values in a column does not fall below the minimum accepted percentage. | ColumnPiiValidEmailPercentCheckSpec | |||
profile_contains_email_percent | Verifies that the percentage of rows that contain valid emails in a column does not exceed the maximum accepted percentage. | ColumnPiiContainsEmailPercentCheckSpec | |||
profile_valid_ip4_address_percent | Verifies that the percentage of valid IPv4 address values in a column does not fall below the minimum accepted percentage. | ColumnPiiValidIp4AddressPercentCheckSpec | |||
profile_contains_ip4_percent | Verifies that the percentage of rows that contain valid IPv4 address values in a column does not exceed the maximum accepted percentage. | ColumnPiiContainsIp4PercentCheckSpec | |||
profile_valid_ip6_address_percent | Verifies that the percentage of valid IPv6 address values in a column does not fall below the minimum accepted percentage. | ColumnPiiValidIp6AddressPercentCheckSpec | |||
profile_contains_ip6_percent | Verifies that the percentage of rows that contain valid IPv6 address values in a column does not exceed the maximum accepted percentage. | ColumnPiiContainsIp6PercentCheckSpec |
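The PII checks come in two flavours: "valid" checks test whether the whole column value is well formed, while "contains" checks test whether a value appears anywhere inside the text. The Python sketch below shows that difference for emails. The regular expression is deliberately simplified and is an assumption made for illustration; it is not the pattern used by the product, and the helper names are hypothetical.

```python
import re
from typing import Callable, Optional, Sequence

# Simplified email pattern for illustration only (assumption, not the product's pattern).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def percent(values: Sequence[Optional[str]], predicate: Callable[[str], bool]) -> float:
    """Percentage of non-null values that satisfy the predicate."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    return 100.0 * sum(1 for v in non_null if predicate(v)) / len(non_null)


def valid_email_percent(values: Sequence[Optional[str]]) -> float:
    # "Valid": the entire value must be an email address.
    return percent(values, lambda v: EMAIL.fullmatch(v) is not None)


def contains_email_percent(values: Sequence[Optional[str]]) -> float:
    # "Contains": an email address appears somewhere in the value.
    return percent(values, lambda v: EMAIL.search(v) is not None)


sample = ["a@example.com", "contact: b@example.org", "n/a", "x", None]
print(valid_email_percent(sample), contains_email_percent(sample))
```

A "valid" check is typically used on columns that should hold emails, while a "contains" check helps detect emails leaking into free-text columns, which is why the two use opposite threshold directions.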
ColumnPercentile25InRangeCheckSpec
Column level check that ensures that the percentile 25 of values in a monitored column is in a set range.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Data quality check parameters | ColumnNumericPercentile25SensorParametersSpec | |||
warning | Alerting threshold that raises a data quality warning that is considered as a passed data quality check | BetweenFloatsRuleParametersSpec | |||
error | Default alerting threshold for a percentile 25 in a column that raises a data quality error (alert). | BetweenFloatsRuleParametersSpec | |||
fatal | Alerting threshold that raises a fatal data quality issue which indicates a serious data quality problem | BetweenFloatsRuleParametersSpec | |||
schedule_override | Run check scheduling configuration. Specifies the schedule (a cron expression) when the data quality checks are executed by the scheduler. | RecurringScheduleSpec | |||
comments | Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). | CommentsListSpec | |||
disabled | Disables the data quality check. Only enabled data quality checks and recurrings are executed. The check should be disabled if it should not work, but the configuration of the sensor and rules should be preserved in the configuration. | boolean | |||
exclude_from_kpi | Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true in order to exclude this data quality check from the data quality KPI calculation. | boolean | |||
include_in_sla | Marks the data quality check as part of a data quality SLA. The data quality SLA is a set of critical data quality checks that must always pass and are considered as a data contract for the dataset. | boolean | |||
quality_dimension | Configures a custom data quality dimension name that is different than the built-in dimensions (Timeliness, Validity, etc.). | string | |||
display_name | Data quality check display name that could be assigned to the check, otherwise the check_display_name stored in the parquet result files is the check_name. | string | |||
data_grouping | Data grouping configuration name that should be applied to this data quality check. The data grouping is used to group the check's result by a GROUP BY clause in SQL, evaluating the data quality check for each group of rows. Use the name of one of data grouping configurations defined on the parent table. | string |
ColumnProfilingCheckCategoriesSpec
Container of column level, preconfigured checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
nulls | Configuration of column level checks that verify nulls and blanks. | ColumnNullsProfilingChecksSpec | |||
numeric | Configuration of column level checks that verify numeric values. | ColumnNumericProfilingChecksSpec | |||
strings | Configuration of string checks on a column level. | ColumnStringsProfilingChecksSpec | |||
uniqueness | Configuration of uniqueness checks on a column level. | ColumnUniquenessProfilingChecksSpec | |||
datetime | Configuration of datetime checks on a column level. | ColumnDatetimeProfilingChecksSpec | |||
pii | Configuration of Personally Identifiable Information (PII) checks on a column level. | ColumnPiiProfilingChecksSpec | |||
sql | Configuration of SQL checks that use custom SQL aggregated expressions and SQL conditions in data quality checks. | ColumnSqlProfilingChecksSpec | |||
bool | Configuration of boolean checks on a column level. | ColumnBoolProfilingChecksSpec | |||
integrity | Configuration of integrity checks on a column level. | ColumnIntegrityProfilingChecksSpec | |||
accuracy | Configuration of accuracy checks on a column level. | ColumnAccuracyProfilingChecksSpec | |||
datatype | Configuration of datatype checks on a column level. | ColumnDatatypeProfilingChecksSpec | |||
anomaly | Configuration of anomaly checks on a column level. | ColumnAnomalyProfilingChecksSpec | |||
schema | Configuration of schema checks on a column level. | ColumnSchemaProfilingChecksSpec | |||
comparisons | Dictionary of configuration of checks for table comparisons at a column level. The key that identifies each comparison must match the name of a data comparison that is configured on the parent table. | ColumnComparisonProfilingChecksSpecMap | |||
custom | Dictionary of custom checks. The keys are check names. | CustomCheckSpecMap |
ColumnAccuracyProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level that are checking for accuracy.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_total_sum_match_percent | Verifies that the percentage of difference in total sum of a column in a table and total sum of a column of another table does not exceed the set number. | ColumnAccuracyTotalSumMatchPercentCheckSpec | |||
profile_total_min_match_percent | Verifies that the percentage of difference in total min of a column in a table and total min of a column of another table does not exceed the set number. | ColumnAccuracyTotalMinMatchPercentCheckSpec | |||
profile_total_max_match_percent | Verifies that the percentage of difference in total max of a column in a table and total max of a column of another table does not exceed the set number. | ColumnAccuracyTotalMaxMatchPercentCheckSpec | |||
profile_total_average_match_percent | Verifies that the percentage of difference in total average of a column in a table and total average of a column of another table does not exceed the set number. | ColumnAccuracyTotalAverageMatchPercentCheckSpec | |||
profile_total_not_null_count_match_percent | Verifies that the percentage of difference in total not null count of a column in a table and total not null count of a column of another table does not exceed the set number. Stores the most recent captured value for each day when the data quality check was evaluated. | ColumnAccuracyTotalNotNullCountMatchPercentCheckSpec |
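The accuracy checks above all compare one aggregate of the tested column with the same aggregate of a column in another table and express the gap as a percentage. A minimal Python sketch of that calculation follows. The formula (absolute difference divided by the absolute reference value) and the function name are assumptions, since this section does not state how the percentage is derived.

```python
from typing import Optional


def match_percent_difference(tested_total: float, reference_total: float) -> Optional[float]:
    """Percentage difference relative to the reference total (assumed formula)."""
    if reference_total == 0:
        # The relative difference is undefined; report 0 only when both totals are zero.
        return 0.0 if tested_total == 0 else None
    return 100.0 * abs(tested_total - reference_total) / abs(reference_total)


print(match_percent_difference(105.0, 100.0))
```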
ColumnSqlProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level that are using custom SQL expressions (conditions).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_sql_condition_passed_percent_on_column | Verifies that a minimum percentage of rows passed a custom SQL condition (expression). | ColumnSqlConditionPassedPercentCheckSpec | |||
profile_sql_condition_failed_count_on_column | Verifies that the number of rows that failed a custom SQL condition (expression) does not exceed the maximum accepted count. | ColumnSqlConditionFailedCountCheckSpec | |||
profile_sql_aggregate_expr_column | Verifies that a custom aggregated SQL expression (MIN, MAX, etc.) is not outside the set range. | ColumnSqlAggregateExprCheckSpec |
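The custom SQL checks above can be pictured as a condition or an aggregate expression embedded in a query. The sketch below runs that idea against an in-memory SQLite database from Python. The table `orders`, the column `amount` and the condition text are hypothetical, and the query shape is an assumption rather than the SQL that the sensors actually render.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (25.0,), (-3.0,), (40.0,)])

# The custom SQL condition comes from trusted configuration, not from user input.
condition = "amount >= 0"

passed_percent, failed_count = conn.execute(
    f"""
    SELECT 100.0 * SUM(CASE WHEN {condition} THEN 1 ELSE 0 END) / COUNT(*),
           SUM(CASE WHEN NOT ({condition}) THEN 1 ELSE 0 END)
    FROM orders
    """
).fetchone()

# A custom aggregate expression is evaluated the same way and compared with a range.
(max_amount,) = conn.execute("SELECT MAX(amount) FROM orders").fetchone()

print(passed_percent, failed_count, max_amount)
```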
ColumnNumericProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level for numeric values.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_negative_count | Verifies that the number of negative values in a column does not exceed the maximum accepted count. | ColumnNegativeCountCheckSpec | |||
profile_negative_percent | Verifies that the percentage of negative values in a column does not exceed the maximum accepted percentage. | ColumnNegativePercentCheckSpec | |||
profile_non_negative_count | Verifies that the number of non-negative values in a column does not exceed the maximum accepted count. | ColumnNonNegativeCountCheckSpec | |||
profile_non_negative_percent | Verifies that the percentage of non-negative values in a column does not exceed the maximum accepted percentage. | ColumnNonNegativePercentCheckSpec | |||
profile_expected_numbers_in_use_count | Verifies that the expected numeric values were found in the column. Raises a data quality issue when too many expected values were not found (were missing). | ColumnExpectedNumbersInUseCountCheckSpec | |||
profile_number_value_in_set_percent | The check measures the percentage of rows whose value in a tested column is one of values from a list of expected values or the column value is null. Verifies that the percentage of rows having a valid column value does not fall below the minimum accepted percentage. | ColumnNumberValueInSetPercentCheckSpec | |||
profile_values_in_range_numeric_percent | Verifies that the percentage of values from range in a column does not fall below the minimum accepted percentage. | ColumnValuesInRangeNumericPercentCheckSpec | |||
profile_values_in_range_integers_percent | Verifies that the percentage of values from range in a column does not fall below the minimum accepted percentage. | ColumnValuesInRangeIntegersPercentCheckSpec | |||
profile_value_below_min_value_count | The check counts the values in the column that are below the value defined by the user as a parameter. | ColumnValueBelowMinValueCountCheckSpec | |||
profile_value_below_min_value_percent | The check measures the percentage of values in the column that are below the value defined by the user as a parameter. | ColumnValueBelowMinValuePercentCheckSpec | |||
profile_value_above_max_value_count | The check counts the values in the column that are above the value defined by the user as a parameter. | ColumnValueAboveMaxValueCountCheckSpec | |||
profile_value_above_max_value_percent | The check measures the percentage of values in the column that are above the value defined by the user as a parameter. | ColumnValueAboveMaxValuePercentCheckSpec | |||
profile_max_in_range | Verifies that the maximal value in a column is not outside the set range. | ColumnMaxInRangeCheckSpec | |||
profile_min_in_range | Verifies that the minimal value in a column is not outside the set range. | ColumnMinInRangeCheckSpec | |||
profile_mean_in_range | Verifies that the average (mean) of all values in a column is not outside the set range. | ColumnMeanInRangeCheckSpec | |||
profile_percentile_in_range | Verifies that the percentile of all values in a column is not outside the set range. | ColumnPercentileInRangeCheckSpec | |||
profile_median_in_range | Verifies that the median of all values in a column is not outside the set range. | ColumnMedianInRangeCheckSpec | |||
profile_percentile_10_in_range | Verifies that the percentile 10 of all values in a column is not outside the set range. | ColumnPercentile10InRangeCheckSpec | |||
profile_percentile_25_in_range | Verifies that the percentile 25 of all values in a column is not outside the set range. | ColumnPercentile25InRangeCheckSpec | |||
profile_percentile_75_in_range | Verifies that the percentile 75 of all values in a column is not outside the set range. | ColumnPercentile75InRangeCheckSpec | |||
profile_percentile_90_in_range | Verifies that the percentile 90 of all values in a column is not outside the set range. | ColumnPercentile90InRangeCheckSpec | |||
profile_sample_stddev_in_range | Verifies that the sample standard deviation of all values in a column is not outside the set range. | ColumnSampleStddevInRangeCheckSpec | |||
profile_population_stddev_in_range | Verifies that the population standard deviation of all values in a column is not outside the set range. | ColumnPopulationStddevInRangeCheckSpec | |||
profile_sample_variance_in_range | Verifies that the sample variance of all values in a column is not outside the set range. | ColumnSampleVarianceInRangeCheckSpec | |||
profile_population_variance_in_range | Verifies that the population variance of all values in a column is not outside the set range. | ColumnPopulationVarianceInRangeCheckSpec | |||
profile_sum_in_range | Verifies that the sum of all values in a column is not outside the set range. | ColumnSumInRangeCheckSpec | |||
profile_invalid_latitude_count | Verifies that the number of invalid latitude values in a column does not exceed the maximum accepted count. | ColumnInvalidLatitudeCountCheckSpec | |||
profile_valid_latitude_percent | Verifies that the percentage of valid latitude values in a column does not fall below the minimum accepted percentage. | ColumnValidLatitudePercentCheckSpec | |||
profile_invalid_longitude_count | Verifies that the number of invalid longitude values in a column does not exceed the maximum accepted count. | ColumnInvalidLongitudeCountCheckSpec | |||
profile_valid_longitude_percent | Verifies that the percentage of valid longitude values in a column does not fall below the minimum accepted percentage. | ColumnValidLongitudePercentCheckSpec |
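Several of the numeric checks above compare a percentile of the column with a range. The Python sketch below computes a percentile by linear interpolation between the closest ranks and then tests it against a range. The interpolation method is an assumption: databases offer different percentile functions (exact, discrete or approximate), so a real sensor may return a slightly different value. The function names are hypothetical.

```python
from typing import Optional, Sequence


def percentile_cont(values: Sequence[Optional[float]], percentile_value: float) -> Optional[float]:
    """Percentile by linear interpolation over sorted non-null values (assumed method)."""
    data = sorted(v for v in values if v is not None)
    if not data:
        return None
    position = percentile_value * (len(data) - 1)
    lower = int(position)                      # index of the value at or below the position
    upper = min(lower + 1, len(data) - 1)      # neighbouring index, clamped to the last item
    fraction = position - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


def in_range(value: float, range_from: float, range_to: float) -> bool:
    # Inclusive bounds are an assumption of this sketch.
    return range_from <= value <= range_to


column = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p25 = percentile_cont(column, 0.25)
print(p25, in_range(p25, 2.0, 4.0))
```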
ColumnComparisonProfilingChecksSpecMap
Container of comparison checks for each defined data comparison. The name of the key in this dictionary must match the name of a table comparison that is defined on the parent table. Contains the configuration of column level comparison checks. Each column level check container also defines the name of the reference column to which the column is compared.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
access_order | | boolean | |||
size | | integer | |||
mod_count | | integer | |||
threshold | | integer |
ColumnComparisonProfilingChecksSpec
Container of built-in preconfigured column level comparison checks that compare min/max/sum/mean/nulls measures between the column in the tested (parent) table and a matching reference column in the reference table (the source of truth).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_sum_match | Verifies that the percentage difference between the sum of values in a tested column in the parent table and the sum of values in a column in the reference table stays below the defined percentage thresholds. | ColumnComparisonSumMatchCheckSpec | |||
profile_min_match | Verifies that the percentage difference between the minimum value in a tested column in the parent table and the minimum value in a column in the reference table stays below the defined percentage thresholds. | ColumnComparisonMinMatchCheckSpec | |||
profile_max_match | Verifies that the percentage difference between the maximum value in a tested column in the parent table and the maximum value in a column in the reference table stays below the defined percentage thresholds. | ColumnComparisonMaxMatchCheckSpec | |||
profile_mean_match | Verifies that the percentage difference between the mean (average) value in a tested column in the parent table and the mean (average) value in a column in the reference table stays below the defined percentage thresholds. | ColumnComparisonMeanMatchCheckSpec | |||
profile_not_null_count_match | Verifies that the percentage difference between the count of not null values in a tested column in the parent table and the count of not null values in a column in the reference table stays below the defined percentage thresholds. | ColumnComparisonNotNullCountMatchCheckSpec | |||
profile_null_count_match | Verifies that the percentage difference between the count of null values in a tested column in the parent table and the count of null values in a column in the reference table stays below the defined percentage thresholds. | ColumnComparisonNullCountMatchCheckSpec | |||
reference_column | The name of the reference column in the reference table. It is the column to which the current column is compared. | string |
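Because each key in the comparisons dictionary must match a table comparison defined on the parent table, a configuration can be sanity-checked before it is used. The Python sketch below models the dictionary as plain `dict` objects and reports keys with no matching table comparison. The comparison names, the column name and the helper `unknown_comparison_keys` are hypothetical, and the empty `{}` placeholders stand in for check settings that are not described in this section.

```python
from typing import Dict, Iterable, List


def unknown_comparison_keys(column_comparisons: Dict[str, dict],
                            table_comparison_names: Iterable[str]) -> List[str]:
    """Return the keys that do not match any comparison defined on the parent table."""
    known = set(table_comparison_names)
    return sorted(key for key in column_comparisons if key not in known)


# Hypothetical configuration of one column, keyed by comparison name.
column_comparisons = {
    "compare_to_source_of_truth": {
        "reference_column": "total_amount",
        "profile_sum_match": {},
    },
    "compare_to_archive": {
        "reference_column": "total_amount",
        "profile_max_match": {},
    },
}

print(unknown_comparison_keys(column_comparisons, ["compare_to_source_of_truth"]))
```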
ColumnAnomalyProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level for detecting anomalies.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_mean_anomaly_stationary_30_days | Verifies that the mean value in a column changes at a rate within a percentile boundary during the last 30 days. | ColumnAnomalyStationaryMean30DaysCheckSpec | |||
profile_mean_anomaly_stationary | Verifies that the mean value in a column changes at a rate within a percentile boundary during the last 90 days. | ColumnAnomalyStationaryMeanCheckSpec | |||
profile_median_anomaly_stationary_30_days | Verifies that the median in a column changes at a rate within a percentile boundary during the last 30 days. | ColumnAnomalyStationaryMedian30DaysCheckSpec | |||
profile_median_anomaly_stationary | Verifies that the median in a column changes at a rate within a percentile boundary during the last 90 days. | ColumnAnomalyStationaryMedianCheckSpec | |||
profile_sum_anomaly_differencing_30_days | Verifies that the sum in a column changes at a rate within a percentile boundary during the last 30 days. | ColumnAnomalyDifferencingSum30DaysCheckSpec | |||
profile_sum_anomaly_differencing | Verifies that the sum in a column changes at a rate within a percentile boundary during the last 90 days. | ColumnAnomalyDifferencingSumCheckSpec | |||
profile_mean_change | Verifies that the mean value in a column changed at a fixed rate since the last readout. | ColumnChangeMeanCheckSpec | |||
profile_mean_change_yesterday | Verifies that the mean value in a column changed at a fixed rate since the last readout from yesterday. | ColumnChangeMeanSinceYesterdayCheckSpec | |||
profile_mean_change_7_days | Verifies that the mean value in a column changed at a fixed rate since the last readout from last week. | ColumnChangeMeanSince7DaysCheckSpec | |||
profile_mean_change_30_days | Verifies that the mean value in a column changed at a fixed rate since the last readout from last month. | ColumnChangeMeanSince30DaysCheckSpec | |||
profile_median_change | Verifies that the median in a column changed at a fixed rate since the last readout. | ColumnChangeMedianCheckSpec | |||
profile_median_change_yesterday | Verifies that the median in a column changed at a fixed rate since the last readout from yesterday. | ColumnChangeMedianSinceYesterdayCheckSpec | |||
profile_median_change_7_days | Verifies that the median in a column changed at a fixed rate since the last readout from last week. | ColumnChangeMedianSince7DaysCheckSpec | |||
profile_median_change_30_days | Verifies that the median in a column changed at a fixed rate since the last readout from last month. | ColumnChangeMedianSince30DaysCheckSpec | |||
profile_sum_change | Verifies that the sum in a column changed at a fixed rate since the last readout. | ColumnChangeSumCheckSpec | |||
profile_sum_change_yesterday | Verifies that the sum in a column changed at a fixed rate since the last readout from yesterday. | ColumnChangeSumSinceYesterdayCheckSpec | |||
profile_sum_change_7_days | Verifies that the sum in a column changed at a fixed rate since the last readout from last week. | ColumnChangeSumSince7DaysCheckSpec | |||
profile_sum_change_30_days | Verifies that the sum in a column changed at a fixed rate since the last readout from last month. | ColumnChangeSumSince30DaysCheckSpec |
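The change checks above compare the current readout with an earlier one (the previous readout, or the one from yesterday, 7 days ago or 30 days ago). A minimal Python sketch of that comparison is shown below. It assumes that readouts are stored per calendar day, that the earlier readout is looked up exactly `days_back` days before the current one, and that the change is expressed as a percentage of the earlier value. None of these details are stated in this section, and the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def change_percent(readouts: Dict[date, float], today: date, days_back: int) -> Optional[float]:
    """Relative change between today's readout and the one from days_back days earlier."""
    current = readouts.get(today)
    previous = readouts.get(today - timedelta(days=days_back))
    if current is None or previous is None or previous == 0:
        # No earlier readout (or a zero baseline): the change cannot be computed.
        return None
    return 100.0 * abs(current - previous) / abs(previous)


# Hypothetical daily readouts of the mean value in a column.
mean_readouts = {date(2024, 1, 1): 100.0, date(2024, 1, 8): 110.0}

print(change_percent(mean_readouts, date(2024, 1, 8), 7))
print(change_percent(mean_readouts, date(2024, 1, 8), 1))
```

Returning `None` when the earlier readout is missing keeps the sketch from reporting a false change on the first run.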
ColumnDatetimeProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level that are checking for datetime.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_date_values_in_future_percent | Verifies that the percentage of date values in the future in a column does not exceed the maximum accepted percentage. | ColumnDateValuesInFuturePercentCheckSpec | |||
profile_datetime_value_in_range_date_percent | Verifies that the percentage of date values in the range defined by the user in a column does not fall below the minimum accepted percentage. | ColumnDatetimeValueInRangeDatePercentCheckSpec |
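The two datetime measures above can be sketched in Python as follows. The sketch assumes that nulls are ignored, that "in the future" means strictly later than the evaluation time, and that the range bounds are inclusive. The function names are hypothetical, and the evaluation time is passed in explicitly so that the result is reproducible.

```python
from datetime import datetime
from typing import Optional, Sequence


def future_percent(values: Sequence[Optional[datetime]], now: datetime) -> float:
    """Percentage of non-null values that are later than `now`."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    return 100.0 * sum(1 for v in non_null if v > now) / len(non_null)


def in_range_percent(values: Sequence[Optional[datetime]],
                     min_value: datetime, max_value: datetime) -> float:
    """Percentage of non-null values inside the inclusive range."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    return 100.0 * sum(1 for v in non_null if min_value <= v <= max_value) / len(non_null)


now = datetime(2024, 6, 1)
dates = [datetime(2024, 1, 15), datetime(2024, 5, 31), datetime(2024, 7, 1), datetime(2030, 1, 1), None]

print(future_percent(dates, now))
print(in_range_percent(dates, datetime(2024, 1, 1), datetime(2024, 12, 31)))
```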
ColumnPercentile10InRangeCheckSpec
Column level check that ensures that the percentile 10 of values in a monitored column is in a set range.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Data quality check parameters | ColumnNumericPercentile10SensorParametersSpec | |||
warning | Alerting threshold that raises a data quality warning that is considered as a passed data quality check | BetweenFloatsRuleParametersSpec | |||
error | Default alerting threshold for a percentile 10 in a column that raises a data quality error (alert). | BetweenFloatsRuleParametersSpec | |||
fatal | Alerting threshold that raises a fatal data quality issue which indicates a serious data quality problem | BetweenFloatsRuleParametersSpec | |||
schedule_override | Run check scheduling configuration. Specifies the schedule (a cron expression) when the data quality checks are executed by the scheduler. | RecurringScheduleSpec | |||
comments | Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). | CommentsListSpec | |||
disabled | Disables the data quality check. Only enabled data quality checks and recurrings are executed. The check should be disabled if it should not work, but the configuration of the sensor and rules should be preserved in the configuration. | boolean | |||
exclude_from_kpi | Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true in order to exclude this data quality check from the data quality KPI calculation. | boolean | |||
include_in_sla | Marks the data quality check as part of a data quality SLA. The data quality SLA is a set of critical data quality checks that must always pass and are considered as a data contract for the dataset. | boolean | |||
quality_dimension | Configures a custom data quality dimension name that is different than the built-in dimensions (Timeliness, Validity, etc.). | string | |||
display_name | Data quality check display name that could be assigned to the check, otherwise the check_display_name stored in the parquet result files is the check_name. | string | |||
data_grouping | Data grouping configuration name that should be applied to this data quality check. The data grouping is used to group the check's result by a GROUP BY clause in SQL, evaluating the data quality check for each group of rows. Use the name of one of data grouping configurations defined on the parent table. | string |
ColumnPercentile75InRangeCheckSpec
Column level check that ensures that the percentile 75 of values in a monitored column is in a set range.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Data quality check parameters | ColumnNumericPercentile75SensorParametersSpec | |||
warning | Alerting threshold that raises a data quality warning that is considered as a passed data quality check | BetweenFloatsRuleParametersSpec | |||
error | Default alerting threshold for a percentile 75 in a column that raises a data quality error (alert). | BetweenFloatsRuleParametersSpec | |||
fatal | Alerting threshold that raises a fatal data quality issue which indicates a serious data quality problem | BetweenFloatsRuleParametersSpec | |||
schedule_override | Run check scheduling configuration. Specifies the schedule (a cron expression) when the data quality checks are executed by the scheduler. | RecurringScheduleSpec | |||
comments | Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). | CommentsListSpec | |||
disabled | Disables the data quality check. Only enabled data quality checks and recurring checks are executed. Disable the check when it should not run but the configuration of the sensor and rules should be preserved. | boolean | |||
exclude_from_kpi | Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true in order to exclude this data quality check from the data quality KPI calculation. | boolean | |||
include_in_sla | Marks the data quality check as part of a data quality SLA. The data quality SLA is a set of critical data quality checks that must always pass and are considered as a data contract for the dataset. | boolean | |||
quality_dimension | Configures a custom data quality dimension name that is different than the built-in dimensions (Timeliness, Validity, etc.). | string | |||
display_name | Data quality check display name that could be assigned to the check, otherwise the check_display_name stored in the parquet result files is the check_name. | string | |||
data_grouping | Data grouping configuration name that should be applied to this data quality check. The data grouping is used to group the check's result by a GROUP BY clause in SQL, evaluating the data quality check for each group of rows. Use the name of one of data grouping configurations defined on the parent table. | string |
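
The interaction between the percentile sensor and the `warning`, `error` and `fatal` thresholds can be sketched in Python. This is only an illustration under stated assumptions: each threshold is modeled as a hypothetical `(lower, upper)` tuple standing in for a between-floats rule, the percentile uses linear interpolation (one of several common definitions; the SQL engine behind the sensor may use another), and the function names are invented for this example.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical stand-in for a between-floats rule: (lower bound, upper bound).
Range = Tuple[float, float]


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between the closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def evaluate_severity(
    actual: float,
    warning: Optional[Range] = None,
    error: Optional[Range] = None,
    fatal: Optional[Range] = None,
) -> str:
    """Return the highest severity whose range does not contain the value."""
    # Assumption: severities are tested from the most to the least severe.
    for name, bounds in (("fatal", fatal), ("error", error), ("warning", warning)):
        if bounds is not None and not (bounds[0] <= actual <= bounds[1]):
            return name
    return "passed"


p75 = percentile([1, 2, 3, 4, 5, 6, 7, 8, 9], 0.75)
severity = evaluate_severity(p75, warning=(0, 6), error=(0, 8), fatal=(0, 10))
```

With these sample values the 75th percentile falls outside only the narrowest (warning) range, so the sketch reports a warning rather than an error or a fatal issue.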
ColumnStringsProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level that are checking for string.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_string_max_length | Verifies that the length of the longest string in a column does not exceed the maximum accepted length. | ColumnStringMaxLengthCheckSpec | |||
profile_string_min_length | Verifies that the length of the shortest string in a column does not fall below the minimum accepted length. | ColumnStringMinLengthCheckSpec | |||
profile_string_mean_length | Verifies that the mean length of strings in a column does not exceed the accepted mean length. | ColumnStringMeanLengthCheckSpec | |||
profile_string_length_below_min_length_count | Counts the strings in the column that are shorter than the minimum length defined by the user as a parameter. | ColumnStringLengthBelowMinLengthCountCheckSpec | |||
profile_string_length_below_min_length_percent | Measures the percentage of strings in the column that are shorter than the minimum length defined by the user as a parameter. | ColumnStringLengthBelowMinLengthPercentCheckSpec | |||
profile_string_length_above_max_length_count | Counts the strings in the column that are longer than the maximum length defined by the user as a parameter. | ColumnStringLengthAboveMaxLengthCountCheckSpec | |||
profile_string_length_above_max_length_percent | Measures the percentage of strings in the column that are longer than the maximum length defined by the user as a parameter. | ColumnStringLengthAboveMaxLengthPercentCheckSpec | |||
profile_string_length_in_range_percent | Measures the percentage of strings in the column whose length is within the range provided by the user. | ColumnStringLengthInRangePercentCheckSpec | |||
profile_string_empty_count | Verifies that the number of empty strings in a column does not exceed the maximum accepted count. | ColumnStringEmptyCountCheckSpec | |||
profile_string_empty_percent | Verifies that the percentage of empty strings in a column does not exceed the maximum accepted percentage. | ColumnStringEmptyPercentCheckSpec | |||
profile_string_whitespace_count | Verifies that the number of whitespace strings in a column does not exceed the maximum accepted count. | ColumnStringWhitespaceCountCheckSpec | |||
profile_string_whitespace_percent | Verifies that the percentage of whitespace strings in a column does not exceed the maximum accepted percentage. | ColumnStringWhitespacePercentCheckSpec | |||
profile_string_surrounded_by_whitespace_count | Verifies that the number of strings surrounded by whitespace in a column does not exceed the maximum accepted count. | ColumnStringSurroundedByWhitespaceCountCheckSpec | |||
profile_string_surrounded_by_whitespace_percent | Verifies that the percentage of strings surrounded by whitespace in a column does not exceed the maximum accepted percentage. | ColumnStringSurroundedByWhitespacePercentCheckSpec | |||
profile_string_null_placeholder_count | Verifies that the number of null placeholders in a column does not exceed the maximum accepted count. | ColumnStringNullPlaceholderCountCheckSpec | |||
profile_string_null_placeholder_percent | Verifies that the percentage of null placeholders in a column does not exceed the maximum accepted percentage. | ColumnStringNullPlaceholderPercentCheckSpec | |||
profile_string_boolean_placeholder_percent | Verifies that the percentage of boolean placeholder strings in a column does not fall below the minimum accepted percentage. | ColumnStringBooleanPlaceholderPercentCheckSpec | |||
profile_string_parsable_to_integer_percent | Verifies that the percentage of strings parsable to an integer in a column does not fall below the minimum accepted percentage. | ColumnStringParsableToIntegerPercentCheckSpec | |||
profile_string_parsable_to_float_percent | Verifies that the percentage of strings parsable to a float in a column does not fall below the minimum accepted percentage. | ColumnStringParsableToFloatPercentCheckSpec | |||
profile_expected_strings_in_use_count | Verifies that the expected string values were found in the column. Raises a data quality issue when too many expected values were not found (were missing). | ColumnExpectedStringsInUseCountCheckSpec | |||
profile_string_value_in_set_percent | Measures the percentage of rows whose value in the tested column is either null or one of the values from a list of expected values. Verifies that the percentage of rows having a valid column value does not fall below the minimum accepted percentage. | ColumnStringValueInSetPercentCheckSpec | |||
profile_string_valid_dates_percent | Verifies that the percentage of valid dates in a column does not fall below the minimum accepted percentage. | ColumnStringValidDatesPercentCheckSpec | |||
profile_string_valid_country_code_percent | Verifies that the percentage of valid country codes in a column does not fall below the minimum accepted percentage. | ColumnStringValidCountryCodePercentCheckSpec | |||
profile_string_valid_currency_code_percent | Verifies that the percentage of valid currency codes in a column does not fall below the minimum accepted percentage. | ColumnStringValidCurrencyCodePercentCheckSpec | |||
profile_string_invalid_email_count | Verifies that the number of invalid emails in a column does not exceed the maximum accepted count. | ColumnStringInvalidEmailCountCheckSpec | |||
profile_string_invalid_uuid_count | Verifies that the number of invalid UUIDs in a column does not exceed the maximum accepted count. | ColumnStringInvalidUuidCountCheckSpec | |||
profile_string_valid_uuid_percent | Verifies that the percentage of valid UUIDs in a column does not fall below the minimum accepted percentage. | ColumnStringValidUuidPercentCheckSpec | |||
profile_string_invalid_ip4_address_count | Verifies that the number of invalid IPv4 addresses in a column does not exceed the maximum accepted count. | ColumnStringInvalidIp4AddressCountCheckSpec | |||
profile_string_invalid_ip6_address_count | Verifies that the number of invalid IPv6 addresses in a column does not exceed the maximum accepted count. | ColumnStringInvalidIp6AddressCountCheckSpec | |||
profile_string_not_match_regex_count | Verifies that the number of strings not matching the custom regex in a column does not exceed the maximum accepted count. | ColumnStringNotMatchRegexCountCheckSpec | |||
profile_string_match_regex_percent | Verifies that the percentage of strings matching the custom regex in a column does not fall below the minimum accepted percentage. | ColumnStringMatchRegexPercentCheckSpec | |||
profile_string_not_match_date_regex_count | Verifies that the number of strings not matching the date format regex in a column does not exceed the maximum accepted count. | ColumnStringNotMatchDateRegexCountCheckSpec | |||
profile_string_match_date_regex_percent | Verifies that the percentage of strings matching the date format regex in a column does not fall below the minimum accepted percentage. | ColumnStringMatchDateRegexPercentCheckSpec | |||
profile_string_match_name_regex_percent | Verifies that the percentage of strings matching the name regex in a column does not fall below the minimum accepted percentage. | ColumnStringMatchNameRegexPercentCheckSpec | |||
profile_expected_strings_in_top_values_count | Verifies that the top X most popular column values contain all values from a list of expected values. | ColumnExpectedStringsInTopValuesCountCheckSpec | |||
profile_string_datatype_detected | Detects the data type of text values stored in the column. The sensor returns the code of the detected data type of a column: 1 - integers, 2 - floats, 3 - dates, 4 - timestamps, 5 - booleans, 6 - strings, 7 - mixed data types. Raises a data quality issue when the detected data type does not match the expected data type. | ColumnStringDatatypeDetectedCheckSpec |
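
To make the difference between the empty, whitespace, surrounded-by-whitespace and length checks concrete, here is a minimal Python sketch of the metrics such checks would compare against their thresholds. It is an assumption-laden illustration, not the tool's sensor SQL: the function name and result keys are hypothetical, and percentages are computed over non-null values, which may differ from the denominator the real sensors use.

```python
from typing import Dict, Iterable, Optional


def profile_strings(
    values: Iterable[Optional[str]], min_length: int, max_length: int
) -> Dict[str, float]:
    """Compute a few string metrics over the non-null values of a column."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        raise ValueError("no non-null values to profile")
    lengths = [len(v) for v in non_null]
    total = len(non_null)
    in_range = sum(1 for n in lengths if min_length <= n <= max_length)
    return {
        "max_length": max(lengths),
        "min_length": min(lengths),
        "mean_length": sum(lengths) / total,
        # Empty: zero characters. Whitespace: only blanks, but not empty.
        "empty_count": sum(1 for v in non_null if v == ""),
        "whitespace_count": sum(1 for v in non_null if v != "" and v.strip() == ""),
        # Surrounded by whitespace: has content plus leading or trailing blanks.
        "surrounded_by_whitespace_count": sum(
            1 for v in non_null if v.strip() != "" and v != v.strip()
        ),
        "length_in_range_percent": 100.0 * in_range / total,
    }


metrics = profile_strings(["abc", "", "  ", " x ", None, "hello"], 2, 3)
```

A check such as `profile_string_empty_count` would then compare `metrics["empty_count"]` with its maximum accepted count.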
ColumnNumericPercentile75SensorParametersSpec
Column level sensor that finds the percentile 75 in a given column.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
percentile_value | 75th percentile, must equal 0.75 | double | |||
filter | SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. | string |
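
The way the two filters are combined, and the fixed value of `percentile_value`, can be sketched as follows. The helper names are hypothetical, and wrapping each filter in parentheses is a defensive choice made for this example (so that an `OR` inside one filter cannot change the meaning of the other); it is not taken from the tool's query templates.

```python
from typing import Optional


def build_where_clause(table_filter: Optional[str], sensor_filter: Optional[str]) -> str:
    """Join the table-level filter and the sensor filter with AND."""
    parts = [f"({f.strip()})" for f in (table_filter, sensor_filter) if f and f.strip()]
    return "WHERE " + " AND ".join(parts) if parts else ""


def validate_percentile_value(percentile_value: float, expected: float = 0.75) -> None:
    """Reject a percentile_value other than the one this sensor is fixed to."""
    if abs(percentile_value - expected) > 1e-9:
        raise ValueError(f"percentile_value must equal {expected}, got {percentile_value}")


clause = build_where_clause("country = 'US'", "amount > 0")
validate_percentile_value(0.75)
```

When only one of the filters is set, the sketch emits that filter alone, and when neither is set it emits no `WHERE` clause at all.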
ColumnDatatypeProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level that are checking for datatype.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_date_match_format_percent | Verifies that the percentage of date values matching the given format in a column does not fall below the minimum accepted percentage. | ColumnDatatypeDateMatchFormatPercentCheckSpec | |||
profile_string_datatype_changed | Detects that the data type of texts stored in a text column has changed since the last verification. The sensor returns the detected data type of a column: 1 - integers, 2 - floats, 3 - dates, 4 - timestamps, 5 - booleans, 6 - strings, 7 - mixed data types. | ColumnDatatypeStringDatatypeChangedCheckSpec |
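
The data type codes listed above (1 to 7) can be illustrated with a simplified Python detector. Only the code values come from the description; everything else is an assumption made for this sketch: the accepted date and timestamp formats, the boolean literals, the rule that any combination of types counts as mixed, and the function names themselves.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Codes as listed in the check description.
INTEGERS, FLOATS, DATES, TIMESTAMPS, BOOLEANS, STRINGS, MIXED = range(1, 8)

_INT = re.compile(r"[+-]?\d+")
_FLOAT = re.compile(r"[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?")


def _matches_format(text: str, fmt: str) -> bool:
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def classify(text: str) -> int:
    """Classify a single text value (assumed formats, see the lead-in)."""
    value = text.strip()
    if _INT.fullmatch(value):
        return INTEGERS
    if _FLOAT.fullmatch(value):
        return FLOATS
    if _matches_format(value, "%Y-%m-%d"):
        return DATES
    if _matches_format(value, "%Y-%m-%d %H:%M:%S"):
        return TIMESTAMPS
    if value.lower() in ("true", "false"):
        return BOOLEANS
    return STRINGS


def detect_datatype(values: Iterable[Optional[str]]) -> Optional[int]:
    """Return one code for the column, or None when there is nothing to inspect."""
    codes = {classify(v) for v in values if v is not None and v.strip() != ""}
    if not codes:
        return None
    return codes.pop() if len(codes) == 1 else MIXED


def datatype_changed(previous: Optional[int], current: Optional[int]) -> bool:
    """Report a change only when both readouts exist and differ."""
    return previous is not None and current is not None and previous != current
```

For example, a column that previously held only integers (code 1) and now also contains free text would be detected as mixed (code 7), which `datatype_changed(1, 7)` reports as a change.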
ColumnNullsProfilingChecksSpec
Container of built-in preconfigured data quality checks on a column level that are checking for nulls.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
profile_nulls_count | Verifies that the number of null values in a column does not exceed the maximum accepted count. | ColumnNullsCountCheckSpec | |||
profile_nulls_percent | Verifies that the percent of null values in a column does not exceed the maximum accepted percentage. | ColumnNullsPercentCheckSpec | |||
profile_nulls_percent_anomaly_stationary_30_days | Verifies that the null percent value in a column changes at a rate within a percentile boundary during the last 30 days. | ColumnAnomalyStationaryNullPercent30DaysCheckSpec | |||
profile_nulls_percent_anomaly_stationary | Verifies that the null percent value in a column changes at a rate within a percentile boundary during the last 90 days. | ColumnAnomalyStationaryNullPercentCheckSpec | |||
profile_nulls_percent_change | Verifies that the null percent value in a column changed at a fixed rate since the last readout. | ColumnChangeNullPercentCheckSpec | |||
profile_nulls_percent_change_yesterday | Verifies that the null percent value in a column changed at a fixed rate since the last readout from yesterday. | ColumnChangeNullPercentSinceYesterdayCheckSpec | |||
profile_nulls_percent_change_7_days | Verifies that the null percent value in a column changed at a fixed rate since the last readout from last week. | ColumnChangeNullPercentSince7DaysCheckSpec | |||
profile_nulls_percent_change_30_days | Verifies that the null percent value in a column changed at a fixed rate since the last readout from last month. | ColumnChangeNullPercentSince30DaysCheckSpec | |||
profile_not_nulls_count | Verifies that the number of not null values in a column does not fall below the minimum accepted count. | ColumnNotNullsCountCheckSpec | |||
profile_not_nulls_percent | Verifies that the percent of not null values in a column does not fall below the minimum accepted percentage. | ColumnNotNullsPercentCheckSpec |
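
A small Python sketch shows the quantities the null checks work with. The function names are hypothetical, and reading "changed at a fixed rate" as a maximum relative change, in percent, between two readouts is an assumption made for this example.

```python
from typing import Iterable, Optional


def nulls_percent(values: Iterable[Optional[object]]) -> float:
    """Percentage of null values; an empty column is reported as 0.0 here."""
    items = list(values)
    if not items:
        return 0.0
    return 100.0 * sum(1 for v in items if v is None) / len(items)


def not_nulls_count(values: Iterable[Optional[object]]) -> int:
    """Number of values that are not null."""
    return sum(1 for v in values if v is not None)


def change_within_limit(previous: float, current: float, max_percent: float) -> bool:
    """True when the relative change since the previous readout is small enough."""
    if previous == 0:
        # Assumption: with a zero baseline, only an unchanged value passes.
        return current == 0
    return 100.0 * abs(current - previous) / abs(previous) <= max_percent


today = nulls_percent([1, None, 3, None])
stable = change_within_limit(previous=40.0, current=today, max_percent=30.0)
```

Here the null percentage moved from 40% to 50%, a relative change of 25%, which stays within the 30% limit used in the example.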
ColumnMedianInRangeCheckSpec
Column level check that ensures that the median of values in a monitored column is in a set range.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Data quality check parameters | ColumnNumericMedianSensorParametersSpec | |||
warning | Alerting threshold that raises a data quality warning that is considered as a passed data quality check | BetweenFloatsRuleParametersSpec | |||
error | Default alerting threshold for a median in a column that raises a data quality error (alert). | BetweenFloatsRuleParametersSpec | |||
fatal | Alerting threshold that raises a fatal data quality issue which indicates a serious data quality problem | BetweenFloatsRuleParametersSpec | |||
schedule_override | Run check scheduling configuration. Specifies the schedule (a cron expression) when the data quality checks are executed by the scheduler. | RecurringScheduleSpec | |||
comments | Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). | CommentsListSpec | |||
disabled | Disables the data quality check. Only enabled data quality checks and recurring checks are executed. Disable the check when it should not run but the configuration of the sensor and rules should be preserved. | boolean | |||
exclude_from_kpi | Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true in order to exclude this data quality check from the data quality KPI calculation. | boolean | |||
include_in_sla | Marks the data quality check as part of a data quality SLA. The data quality SLA is a set of critical data quality checks that must always pass and are considered as a data contract for the dataset. | boolean | |||
quality_dimension | Configures a custom data quality dimension name that is different than the built-in dimensions (Timeliness, Validity, etc.). | string | |||
display_name | Data quality check display name that could be assigned to the check, otherwise the check_display_name stored in the parquet result files is the check_name. | string | |||
data_grouping | Data grouping configuration name that should be applied to this data quality check. The data grouping is used to group the check's result by a GROUP BY clause in SQL, evaluating the data quality check for each group of rows. Use the name of one of data grouping configurations defined on the parent table. | string |
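
The effect of `data_grouping` on a check such as the median-in-range check can be sketched as a per-group evaluation, which is what a `GROUP BY` clause produces in SQL. The function name, the `(group, value)` row shape and the single `lower`/`upper` range are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Optional, Tuple


def median_in_range_by_group(
    rows: Iterable[Tuple[str, Optional[float]]], lower: float, upper: float
) -> Dict[str, Tuple[float, bool]]:
    """Return {group: (median, passed)} with one result per data group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for group, value in rows:
        if value is not None:  # nulls do not take part in the median
            groups[group].append(value)
    results = {}
    for group, values in groups.items():
        group_median = median(values)
        results[group] = (group_median, lower <= group_median <= upper)
    return results


rows = [("US", 10), ("US", 20), ("US", 30), ("DE", 100), ("DE", 200), ("DE", None)]
results = median_in_range_by_group(rows, lower=0, upper=50)
```

Each group gets its own result, so in this sample the `US` group passes while the `DE` group fails, even though both belong to the same column.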