column profiling checks

ColumnNumericPercentile25SensorParametersSpec

Column level sensor that finds the percentile 25 in a given column.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
percentile_value 25th percentile, must equal 0.25 double
filter SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. string
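
The AND rule described for the filter property can be sketched as below. This is only an illustration: the helper name `combine_filters` is hypothetical and not part of the tool, and wrapping each filter in parentheses is an assumption made here so that an OR inside one filter cannot change the meaning of the other.

```python
from typing import Optional


def combine_filters(table_filter: Optional[str], sensor_filter: Optional[str]) -> Optional[str]:
    """Join the table level filter and the sensor filter with AND.

    Hypothetical helper. Empty or missing filters are skipped, and each
    remaining filter is wrapped in parentheses (an assumption of this sketch).
    """
    parts = [f"({f.strip()})" for f in (table_filter, sensor_filter) if f and f.strip()]
    return " AND ".join(parts) if parts else None


# Both filters are present, so they are joined with AND.
print(combine_filters("country = 'US'", "amount > 0"))
```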

ColumnSchemaProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level that are checking the column schema.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_column_exists Checks the metadata of the monitored table and verifies if the column exists. ColumnSchemaColumnExistsCheckSpec
profile_column_type_changed Checks the metadata of the monitored column and detects if the data type (including the length, precision, scale, nullability) has changed. ColumnSchemaTypeChangedCheckSpec

ColumnNumericPercentile90SensorParametersSpec

Column level sensor that finds the percentile 90 in a given column.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
percentile_value 90th percentile, must equal 0.9 double
filter SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. string

ColumnBoolProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level that are checking for booleans.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_true_percent Verifies that the percentage of true values in a column does not fall below the minimum accepted percentage. ColumnTruePercentCheckSpec
profile_false_percent Verifies that the percentage of false values in a column does not fall below the minimum accepted percentage. ColumnFalsePercentCheckSpec

ColumnNumericPercentile10SensorParametersSpec

Column level sensor that finds the percentile 10 in a given column.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
percentile_value 10th percentile, must equal 0.1 double
filter SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. string

ColumnPercentile90InRangeCheckSpec

Column level check that ensures that the percentile 90 of values in a monitored column is in a set range.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Data quality check parameters ColumnNumericPercentile90SensorParametersSpec
warning Alerting threshold that raises a data quality warning that is considered as a passed data quality check BetweenFloatsRuleParametersSpec
error Default alerting threshold for a percentile 90 in a column that raises a data quality error (alert). BetweenFloatsRuleParametersSpec
fatal Alerting threshold that raises a fatal data quality issue which indicates a serious data quality problem BetweenFloatsRuleParametersSpec
schedule_override Run check scheduling configuration. Specifies the schedule (a cron expression) when the data quality checks are executed by the scheduler. RecurringScheduleSpec
comments Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). CommentsListSpec
disabled Disables the data quality check. Only enabled data quality checks and recurring checks are executed. Disable the check when it should not run but the configuration of the sensor and rules should be preserved. boolean
exclude_from_kpi Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true in order to exclude this data quality check from the data quality KPI calculation. boolean
include_in_sla Marks the data quality check as part of a data quality SLA. The data quality SLA is a set of critical data quality checks that must always pass and are considered as a data contract for the dataset. boolean
quality_dimension Configures a custom data quality dimension name that is different than the built-in dimensions (Timeliness, Validity, etc.). string
display_name Data quality check display name that could be assigned to the check, otherwise the check_display_name stored in the parquet result files is the check_name. string
data_grouping Data grouping configuration name that should be applied to this data quality check. The data grouping is used to group the check's result by a GROUP BY clause in SQL, evaluating the data quality check for each group of rows. Use the name of one of data grouping configurations defined on the parent table. string
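
How a percentile sensor value could be compared against the warning, error and fatal ranges is sketched below. This is a minimal illustration, not the tool's implementation. The assumptions are: the percentile uses linear interpolation (actual database engines may compute percentiles differently), each rule is a pair of lower and upper bounds, and a value outside several ranges is reported at the most severe level. All function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (lower, upper) bounds of a rule; None leaves that side open. Hypothetical shape.
Bounds = Tuple[Optional[float], Optional[float]]


def percentile(values: Sequence[float], fraction: float) -> float:
    """Percentile at `fraction` (0.0 to 1.0) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty column is undefined")
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def in_range(value: float, bounds: Bounds) -> bool:
    """True when the value lies inside the (inclusive) bounds."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)


def evaluate(value: float, warning: Optional[Bounds] = None,
             error: Optional[Bounds] = None, fatal: Optional[Bounds] = None) -> str:
    """Return the most severe level whose range the value falls outside of."""
    for level, bounds in (("fatal", fatal), ("error", error), ("warning", warning)):
        if bounds is not None and not in_range(value, bounds):
            return level
    return "passed"


column = [10, 20, 30, 40, 50]
p90 = percentile(column, 0.9)
print(p90, evaluate(p90, warning=(0, 45), error=(0, 60), fatal=(0, 100)))
```

In this sketch the warning range is the narrowest and the fatal range the widest, so a value that drifts further from the expected range escalates from warning to error to fatal.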

ColumnUniquenessProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level that are checking for uniqueness (distinct and duplicate values).

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_distinct_count Verifies that the number of distinct values in a column does not fall below the minimum accepted count. ColumnDistinctCountCheckSpec
profile_distinct_percent Verifies that the percentage of distinct values in a column does not fall below the minimum accepted percent. ColumnDistinctPercentCheckSpec
profile_duplicate_count Verifies that the number of duplicate values in a column does not exceed the maximum accepted count. ColumnDuplicateCountCheckSpec
profile_duplicate_percent Verifies that the percentage of duplicate values in a column does not exceed the maximum accepted percentage. ColumnDuplicatePercentCheckSpec
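
The four uniqueness measures can be illustrated with a short sketch. The definitions used here are assumptions: null values are ignored, and the duplicate count is the number of non-null values minus the number of distinct values. The function name is hypothetical.

```python
def uniqueness_metrics(values):
    """Compute distinct and duplicate measures over the non-null values."""
    non_null = [v for v in values if v is not None]
    total = len(non_null)
    distinct = len(set(non_null))
    duplicates = total - distinct  # assumed definition of "duplicate values"
    return {
        "distinct_count": distinct,
        "distinct_percent": 100.0 * distinct / total if total else None,
        "duplicate_count": duplicates,
        "duplicate_percent": 100.0 * duplicates / total if total else None,
    }


metrics = uniqueness_metrics(["a", "b", "b", "c", None, "a"])
print(metrics)
```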

ColumnIntegrityProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level that are checking for integrity.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_foreign_key_not_match_count Verifies that the number of values in a column that do not match values in another table column does not exceed the maximum accepted count. ColumnIntegrityForeignKeyNotMatchCountCheckSpec
profile_foreign_key_match_percent Verifies that the percentage of values in a column that match values in another table column does not fall below the minimum accepted percentage. ColumnIntegrityForeignKeyMatchPercentCheckSpec
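
The two integrity measures can be sketched as a lookup of each value in the set of values from the other table's column. This is an illustration only; the function name is hypothetical, and skipping null values is an assumption of the sketch.

```python
def foreign_key_metrics(values, reference_values):
    """Return (not_match_count, match_percent) for non-null values."""
    reference = set(reference_values)
    non_null = [v for v in values if v is not None]
    matched = sum(1 for v in non_null if v in reference)
    not_matched = len(non_null) - matched
    match_percent = 100.0 * matched / len(non_null) if non_null else None
    return not_matched, match_percent


# The value 4 has no counterpart in the reference column.
not_matched, match_percent = foreign_key_metrics([1, 2, 3, 4, None], [1, 2, 3])
print(not_matched, match_percent)
```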

ColumnPiiProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level that are checking for Personally Identifiable Information (PII).

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_valid_usa_phone_percent Verifies that the percentage of valid USA phone values in a column does not fall below the minimum accepted percentage. ColumnPiiValidUsaPhonePercentCheckSpec
profile_contains_usa_phone_percent Verifies that the percentage of rows that contain a USA phone number in a column does not exceed the maximum accepted percentage. ColumnPiiContainsUsaPhonePercentCheckSpec
profile_valid_usa_zipcode_percent Verifies that the percentage of valid USA zip code values in a column does not fall below the minimum accepted percentage. ColumnPiiValidUsaZipcodePercentCheckSpec
profile_contains_usa_zipcode_percent Verifies that the percentage of rows that contain a USA zip code in a column does not exceed the maximum accepted percentage. ColumnPiiContainsUsaZipcodePercentCheckSpec
profile_valid_email_percent Verifies that the percentage of valid email values in a column does not fall below the minimum accepted percentage. ColumnPiiValidEmailPercentCheckSpec
profile_contains_email_percent Verifies that the percentage of rows that contain valid emails in a column does not exceed the maximum accepted percentage. ColumnPiiContainsEmailPercentCheckSpec
profile_valid_ip4_address_percent Verifies that the percentage of valid IP4 address values in a column does not fall below the minimum accepted percentage. ColumnPiiValidIp4AddressPercentCheckSpec
profile_contains_ip4_percent Verifies that the percentage of rows that contain valid IP4 address values in a column does not exceed the maximum accepted percentage. ColumnPiiContainsIp4PercentCheckSpec
profile_valid_ip6_address_percent Verifies that the percentage of valid IP6 address values in a column does not fall below the minimum accepted percentage. ColumnPiiValidIp6AddressPercentCheckSpec
profile_contains_ip6_percent Verifies that the percentage of rows that contain valid IP6 address values in a column does not exceed the maximum accepted percentage. ColumnPiiContainsIp6PercentCheckSpec

ColumnPercentile25InRangeCheckSpec

Column level check that ensures that the percentile 25 of values in a monitored column is in a set range.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Data quality check parameters ColumnNumericPercentile25SensorParametersSpec
warning Alerting threshold that raises a data quality warning that is considered as a passed data quality check BetweenFloatsRuleParametersSpec
error Default alerting threshold for a percentile 25 in a column that raises a data quality error (alert). BetweenFloatsRuleParametersSpec
fatal Alerting threshold that raises a fatal data quality issue which indicates a serious data quality problem BetweenFloatsRuleParametersSpec
schedule_override Run check scheduling configuration. Specifies the schedule (a cron expression) when the data quality checks are executed by the scheduler. RecurringScheduleSpec
comments Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). CommentsListSpec
disabled Disables the data quality check. Only enabled data quality checks and recurring checks are executed. Disable the check when it should not run but the configuration of the sensor and rules should be preserved. boolean
exclude_from_kpi Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true in order to exclude this data quality check from the data quality KPI calculation. boolean
include_in_sla Marks the data quality check as part of a data quality SLA. The data quality SLA is a set of critical data quality checks that must always pass and are considered as a data contract for the dataset. boolean
quality_dimension Configures a custom data quality dimension name that is different than the built-in dimensions (Timeliness, Validity, etc.). string
display_name Data quality check display name that could be assigned to the check, otherwise the check_display_name stored in the parquet result files is the check_name. string
data_grouping Data grouping configuration name that should be applied to this data quality check. The data grouping is used to group the check's result by a GROUP BY clause in SQL, evaluating the data quality check for each group of rows. Use the name of one of data grouping configurations defined on the parent table. string

ColumnProfilingCheckCategoriesSpec

Container of column level, preconfigured checks.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
nulls Configuration of column level checks that verify nulls and blanks. ColumnNullsProfilingChecksSpec
numeric Configuration of column level checks that verify numeric values. ColumnNumericProfilingChecksSpec
strings Configuration of strings checks on a column level. ColumnStringsProfilingChecksSpec
uniqueness Configuration of uniqueness checks on a column level. ColumnUniquenessProfilingChecksSpec
datetime Configuration of datetime checks on a column level. ColumnDatetimeProfilingChecksSpec
pii Configuration of Personally Identifiable Information (PII) checks on a column level. ColumnPiiProfilingChecksSpec
sql Configuration of SQL checks that use custom SQL aggregated expressions and SQL conditions in data quality checks. ColumnSqlProfilingChecksSpec
bool Configuration of booleans checks on a column level. ColumnBoolProfilingChecksSpec
integrity Configuration of integrity checks on a column level. ColumnIntegrityProfilingChecksSpec
accuracy Configuration of accuracy checks on a column level. ColumnAccuracyProfilingChecksSpec
datatype Configuration of datatype checks on a column level. ColumnDatatypeProfilingChecksSpec
anomaly Configuration of anomaly checks on a column level. ColumnAnomalyProfilingChecksSpec
schema Configuration of schema checks on a column level. ColumnSchemaProfilingChecksSpec
comparisons Dictionary of configuration of checks for table comparisons at a column level. The key that identifies each comparison must match the name of a data comparison that is configured on the parent table. ColumnComparisonProfilingChecksSpecMap
custom Dictionary of custom checks. The keys are check names. CustomCheckSpecMap
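
The nesting of categories and checks can be pictured as a dictionary of dictionaries, mirroring how the configuration would look once a YAML file is parsed. The sketch below is illustrative: the category and check names come from the tables on this page, while the rule field names (`max_count`, `from`, `to`) and the threshold values are assumptions, and `configured_checks` is a hypothetical helper.

```python
# Hypothetical in-memory form of a column's profiling checks.
profiling_checks = {
    "uniqueness": {
        "profile_duplicate_count": {"error": {"max_count": 0}},  # assumed rule field
    },
    "numeric": {
        "profile_percentile_25_in_range": {
            "parameters": {"percentile_value": 0.25},
            "warning": {"from": 10.0, "to": 20.0},  # assumed rule fields
        },
    },
}


def configured_checks(categories):
    """List (category, check name) pairs for every configured check."""
    return [(category, check)
            for category, checks in categories.items()
            for check in checks]


for category, check in configured_checks(profiling_checks):
    print(f"{category}.{check}")
```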

ColumnAccuracyProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level that are checking for accuracy.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_total_sum_match_percent Verifies that the percentage of difference in total sum of a column in a table and total sum of a column of another table does not exceed the set number. ColumnAccuracyTotalSumMatchPercentCheckSpec
profile_total_min_match_percent Verifies that the percentage of difference in total min of a column in a table and total min of a column of another table does not exceed the set number. ColumnAccuracyTotalMinMatchPercentCheckSpec
profile_total_max_match_percent Verifies that the percentage of difference in total max of a column in a table and total max of a column of another table does not exceed the set number. ColumnAccuracyTotalMaxMatchPercentCheckSpec
profile_total_average_match_percent Verifies that the percentage of difference in total average of a column in a table and total average of a column of another table does not exceed the set number. ColumnAccuracyTotalAverageMatchPercentCheckSpec
profile_total_not_null_count_match_percent Verifies that the percentage of difference in total not null count of a column in a table and total not null count of a column of another table does not exceed the set number. ColumnAccuracyTotalNotNullCountMatchPercentCheckSpec

ColumnSqlProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level that are using custom SQL expressions (conditions).

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_sql_condition_passed_percent_on_column Verifies that a minimum percentage of rows passed a custom SQL condition (expression). ColumnSqlConditionPassedPercentCheckSpec
profile_sql_condition_failed_count_on_column Verifies that the number of rows that failed a custom SQL condition (expression) does not exceed the maximum accepted count. ColumnSqlConditionFailedCountCheckSpec
profile_sql_aggregate_expr_column Verifies that a custom aggregated SQL expression (MIN, MAX, etc.) is not outside the set range. ColumnSqlAggregateExprCheckSpec
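
The three SQL checks can be illustrated with an in-memory SQLite database from the Python standard library. This is a sketch of the idea, not the query that the tool generates: the table `orders`, its column `amount`, and the condition `amount >= 0` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)",
                 [(10.0,), (25.0,), (-5.0,), (40.0,)])

condition = "amount >= 0"  # hypothetical custom SQL condition

# Percentage of rows that pass the condition and count of rows that fail it.
passed_percent, failed_count = conn.execute(
    f"""
    SELECT 100.0 * SUM(CASE WHEN {condition} THEN 1 ELSE 0 END) / COUNT(*),
           SUM(CASE WHEN NOT ({condition}) THEN 1 ELSE 0 END)
    FROM orders
    """
).fetchone()

# A custom aggregated expression; its result would be compared to a range.
max_amount = conn.execute("SELECT MAX(amount) FROM orders").fetchone()[0]

print(passed_percent, failed_count, max_amount)
conn.close()
```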

ColumnNumericProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level for numeric values.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_negative_count Verifies that the number of negative values in a column does not exceed the maximum accepted count. ColumnNegativeCountCheckSpec
profile_negative_percent Verifies that the percentage of negative values in a column does not exceed the maximum accepted percentage. ColumnNegativePercentCheckSpec
profile_non_negative_count Verifies that the number of non-negative values in a column does not exceed the maximum accepted count. ColumnNonNegativeCountCheckSpec
profile_non_negative_percent Verifies that the percentage of non-negative values in a column does not exceed the maximum accepted percentage. ColumnNonNegativePercentCheckSpec
profile_expected_numbers_in_use_count Verifies that the expected numeric values were found in the column. Raises a data quality issue when too many expected values were not found (were missing). ColumnExpectedNumbersInUseCountCheckSpec
profile_number_value_in_set_percent Measures the percentage of rows whose value in the tested column is one of the values from a list of expected values, or is null. Verifies that the percentage of rows having a valid column value does not fall below the minimum accepted percentage. ColumnNumberValueInSetPercentCheckSpec
profile_values_in_range_numeric_percent Verifies that the percentage of values within the range in a column does not fall below the minimum accepted percentage. ColumnValuesInRangeNumericPercentCheckSpec
profile_values_in_range_integers_percent Verifies that the percentage of values within the range in a column does not fall below the minimum accepted percentage. ColumnValuesInRangeIntegersPercentCheckSpec
profile_value_below_min_value_count Counts the values in the column that are below the minimum value defined by the user as a parameter. ColumnValueBelowMinValueCountCheckSpec
profile_value_below_min_value_percent Measures the percentage of values in the column that are below the minimum value defined by the user as a parameter. ColumnValueBelowMinValuePercentCheckSpec
profile_value_above_max_value_count Counts the values in the column that are above the maximum value defined by the user as a parameter. ColumnValueAboveMaxValueCountCheckSpec
profile_value_above_max_value_percent Measures the percentage of values in the column that are above the maximum value defined by the user as a parameter. ColumnValueAboveMaxValuePercentCheckSpec
profile_max_in_range Verifies that the maximal value in a column is not outside the set range. ColumnMaxInRangeCheckSpec
profile_min_in_range Verifies that the minimal value in a column is not outside the set range. ColumnMinInRangeCheckSpec
profile_mean_in_range Verifies that the average (mean) of all values in a column is not outside the set range. ColumnMeanInRangeCheckSpec
profile_percentile_in_range Verifies that the percentile of all values in a column is not outside the set range. ColumnPercentileInRangeCheckSpec
profile_median_in_range Verifies that the median of all values in a column is not outside the set range. ColumnMedianInRangeCheckSpec
profile_percentile_10_in_range Verifies that the percentile 10 of all values in a column is not outside the set range. ColumnPercentile10InRangeCheckSpec
profile_percentile_25_in_range Verifies that the percentile 25 of all values in a column is not outside the set range. ColumnPercentile25InRangeCheckSpec
profile_percentile_75_in_range Verifies that the percentile 75 of all values in a column is not outside the set range. ColumnPercentile75InRangeCheckSpec
profile_percentile_90_in_range Verifies that the percentile 90 of all values in a column is not outside the set range. ColumnPercentile90InRangeCheckSpec
profile_sample_stddev_in_range Verifies that the sample standard deviation of all values in a column is not outside the set range. ColumnSampleStddevInRangeCheckSpec
profile_population_stddev_in_range Verifies that the population standard deviation of all values in a column is not outside the set range. ColumnPopulationStddevInRangeCheckSpec
profile_sample_variance_in_range Verifies that the sample variance of all values in a column is not outside the set range. ColumnSampleVarianceInRangeCheckSpec
profile_population_variance_in_range Verifies that the population variance of all values in a column is not outside the set range. ColumnPopulationVarianceInRangeCheckSpec
profile_sum_in_range Verifies that the sum of all values in a column is not outside the set range. ColumnSumInRangeCheckSpec
profile_invalid_latitude_count Verifies that the number of invalid latitude values in a column does not exceed the maximum accepted count. ColumnInvalidLatitudeCountCheckSpec
profile_valid_latitude_percent Verifies that the percentage of valid latitude values in a column does not fall below the minimum accepted percentage. ColumnValidLatitudePercentCheckSpec
profile_invalid_longitude_count Verifies that the number of invalid longitude values in a column does not exceed the maximum accepted count. ColumnInvalidLongitudeCountCheckSpec
profile_valid_longitude_percent Verifies that the percentage of valid longitude values in a column does not fall below the minimum accepted percentage. ColumnValidLongitudePercentCheckSpec
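
A few of the numeric measures above can be sketched in plain Python. The function name is hypothetical, and two details are assumptions of this sketch: null values are skipped, and the range bounds are treated as inclusive.

```python
def numeric_profile(values, min_value, max_value):
    """Compute a handful of numeric measures over the non-null values."""
    non_null = [v for v in values if v is not None]
    in_range = sum(1 for v in non_null if min_value <= v <= max_value)
    return {
        "negative_count": sum(1 for v in non_null if v < 0),
        "below_min_count": sum(1 for v in non_null if v < min_value),
        "above_max_count": sum(1 for v in non_null if v > max_value),
        "in_range_percent": 100.0 * in_range / len(non_null) if non_null else None,
    }


profile = numeric_profile([-3, 0, 5, 12, 20, None], min_value=0, max_value=10)
print(profile)
```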

ColumnComparisonProfilingChecksSpecMap

Container of comparison checks for each defined data comparison. The name of each key in this dictionary must match the name of a table comparison that is defined on the parent table. Each entry contains the configuration of column level comparison checks and also defines the name of the reference column to which the column is compared.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
access_order boolean
size integer
mod_count integer
threshold integer

ColumnComparisonProfilingChecksSpec

Container of built-in preconfigured column level comparison checks that compare min/max/sum/mean/nulls measures between the column in the tested (parent) table and a matching reference column in the reference table (the source of truth).

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_sum_match Verifies the percentage of the difference between the sum of values in a tested column in the parent table and the sum of values in a column in the reference table. The difference must be below the defined percentage thresholds. ColumnComparisonSumMatchCheckSpec
profile_min_match Verifies the percentage of the difference between the minimum value in a tested column in the parent table and the minimum value in a column in the reference table. The difference must be below the defined percentage thresholds. ColumnComparisonMinMatchCheckSpec
profile_max_match Verifies the percentage of the difference between the maximum value in a tested column in the parent table and the maximum value in a column in the reference table. The difference must be below the defined percentage thresholds. ColumnComparisonMaxMatchCheckSpec
profile_mean_match Verifies the percentage of the difference between the mean (average) value in a tested column in the parent table and the mean (average) value in a column in the reference table. The difference must be below the defined percentage thresholds. ColumnComparisonMeanMatchCheckSpec
profile_not_null_count_match Verifies the percentage of the difference between the count of not null values in a tested column in the parent table and the count of not null values in a column in the reference table. The difference must be below the defined percentage thresholds. ColumnComparisonNotNullCountMatchCheckSpec
profile_null_count_match Verifies the percentage of the difference between the count of null values in a tested column in the parent table and the count of null values in a column in the reference table. The difference must be below the defined percentage thresholds. ColumnComparisonNullCountMatchCheckSpec
reference_column The name of the reference column in the reference table. It is the column to which the current column is compared. string
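
The comparison of measures between a tested column and its reference column can be sketched as follows. The formula is an assumption: the difference is taken as an absolute value and expressed as a percentage of the reference measure, and a difference equal to the threshold is treated as passing. The function names and the parameter `max_diff_percent` are hypothetical.

```python
from statistics import mean


def percent_difference(tested: float, reference: float) -> float:
    """Absolute difference as a percentage of the reference value (assumed formula)."""
    if reference == 0:
        return 0.0 if tested == 0 else float("inf")
    return 100.0 * abs(tested - reference) / abs(reference)


def compare_columns(tested, reference, max_diff_percent):
    """Compare sum, min, max and mean of two columns against one threshold."""
    measures = {"sum": sum, "min": min, "max": max, "mean": mean}
    results = {}
    for name, measure in measures.items():
        diff = percent_difference(measure(tested), measure(reference))
        results[name] = (diff, diff <= max_diff_percent)
    return results


results = compare_columns([10, 20, 30], [10, 20, 33], max_diff_percent=5.0)
for name, (diff, passed) in results.items():
    print(f"{name}: {diff:.2f}% {'passed' if passed else 'failed'}")
```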

ColumnAnomalyProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level for detecting anomalies.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_mean_anomaly_stationary_30_days Verifies that the mean value in a column changes at a rate within a percentile boundary during the last 30 days. ColumnAnomalyStationaryMean30DaysCheckSpec
profile_mean_anomaly_stationary Verifies that the mean value in a column changes at a rate within a percentile boundary during the last 90 days. ColumnAnomalyStationaryMeanCheckSpec
profile_median_anomaly_stationary_30_days Verifies that the median in a column changes at a rate within a percentile boundary during the last 30 days. ColumnAnomalyStationaryMedian30DaysCheckSpec
profile_median_anomaly_stationary Verifies that the median in a column changes at a rate within a percentile boundary during the last 90 days. ColumnAnomalyStationaryMedianCheckSpec
profile_sum_anomaly_differencing_30_days Verifies that the sum in a column changes at a rate within a percentile boundary during the last 30 days. ColumnAnomalyDifferencingSum30DaysCheckSpec
profile_sum_anomaly_differencing Verifies that the sum in a column changes at a rate within a percentile boundary during the last 90 days. ColumnAnomalyDifferencingSumCheckSpec
profile_mean_change Verifies that the mean value in a column changed at a fixed rate since the last readout. ColumnChangeMeanCheckSpec
profile_mean_change_yesterday Verifies that the mean value in a column changed at a fixed rate since the last readout from yesterday. ColumnChangeMeanSinceYesterdayCheckSpec
profile_mean_change_7_days Verifies that the mean value in a column changed at a fixed rate since the last readout from last week. ColumnChangeMeanSince7DaysCheckSpec
profile_mean_change_30_days Verifies that the mean value in a column changed at a fixed rate since the last readout from last month. ColumnChangeMeanSince30DaysCheckSpec
profile_median_change Verifies that the median in a column changed at a fixed rate since the last readout. ColumnChangeMedianCheckSpec
profile_median_change_yesterday Verifies that the median in a column changed at a fixed rate since the last readout from yesterday. ColumnChangeMedianSinceYesterdayCheckSpec
profile_median_change_7_days Verifies that the median in a column changed at a fixed rate since the last readout from last week. ColumnChangeMedianSince7DaysCheckSpec
profile_median_change_30_days Verifies that the median in a column changed at a fixed rate since the last readout from last month. ColumnChangeMedianSince30DaysCheckSpec
profile_sum_change Verifies that the sum in a column changed at a fixed rate since the last readout. ColumnChangeSumCheckSpec
profile_sum_change_yesterday Verifies that the sum in a column changed at a fixed rate since the last readout from yesterday. ColumnChangeSumSinceYesterdayCheckSpec
profile_sum_change_7_days Verifies that the sum in a column changed at a fixed rate since the last readout from last week. ColumnChangeSumSince7DaysCheckSpec
profile_sum_change_30_days Verifies that the sum in a column changed at a fixed rate since the last readout from last month. ColumnChangeSumSince30DaysCheckSpec
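
The change checks compare the current readout with an earlier one. A minimal sketch of that comparison is shown below. The formula (absolute change as a percentage of the earlier readout), the parameter name `max_percent` and both function names are assumptions made for illustration, and the readout values are invented sample data.

```python
def change_percent(current: float, previous: float) -> float:
    """Absolute change relative to the previous readout, in percent (assumed formula)."""
    if previous == 0:
        return 0.0 if current == 0 else float("inf")
    return 100.0 * abs(current - previous) / abs(previous)


def change_within_limit(current: float, previous: float, max_percent: float) -> bool:
    """True when the change does not exceed the accepted percentage."""
    return change_percent(current, previous) <= max_percent


# Invented sample readouts of a column mean at three earlier points in time.
earlier_readouts = {"yesterday": 200.0, "7_days": 180.0, "30_days": 150.0}
current_mean = 210.0

for label, previous in earlier_readouts.items():
    ok = change_within_limit(current_mean, previous, max_percent=10.0)
    print(f"{label}: {change_percent(current_mean, previous):.1f}% {'ok' if ok else 'too large'}")
```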

ColumnDatetimeProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level that are checking for datetime.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_date_values_in_future_percent Verifies that the percentage of date values in the future in a column does not exceed the maximum accepted percentage. ColumnDateValuesInFuturePercentCheckSpec
profile_datetime_value_in_range_date_percent Verifies that the percentage of date values within the range defined by the user in a column does not fall below the minimum accepted percentage. ColumnDatetimeValueInRangeDatePercentCheckSpec
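
The two datetime measures can be sketched with the standard `datetime` module. The function names are hypothetical; skipping null values and treating the range bounds as inclusive are assumptions of the sketch.

```python
from datetime import datetime, timedelta


def future_percent(values, now):
    """Percentage of non-null values that lie after `now`."""
    non_null = [v for v in values if v is not None]
    future = sum(1 for v in non_null if v > now)
    return 100.0 * future / len(non_null) if non_null else None


def in_range_percent(values, min_date, max_date):
    """Percentage of non-null values between min_date and max_date (inclusive)."""
    non_null = [v for v in values if v is not None]
    inside = sum(1 for v in non_null if min_date <= v <= max_date)
    return 100.0 * inside / len(non_null) if non_null else None


now = datetime(2024, 1, 1)
dates = [now - timedelta(days=1), now + timedelta(days=1),
         now - timedelta(days=10), None, now + timedelta(days=3)]

print(future_percent(dates, now))
print(in_range_percent(dates, now - timedelta(days=30), now))
```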

ColumnPercentile10InRangeCheckSpec

Column level check that ensures that the percentile 10 of values in a monitored column is in a set range.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Data quality check parameters ColumnNumericPercentile10SensorParametersSpec
warning Alerting threshold that raises a data quality warning that is considered as a passed data quality check BetweenFloatsRuleParametersSpec
error Default alerting threshold for a percentile 10 in a column that raises a data quality error (alert). BetweenFloatsRuleParametersSpec
fatal Alerting threshold that raises a fatal data quality issue which indicates a serious data quality problem BetweenFloatsRuleParametersSpec
schedule_override Run check scheduling configuration. Specifies the schedule (a cron expression) when the data quality checks are executed by the scheduler. RecurringScheduleSpec
comments Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). CommentsListSpec
disabled Disables the data quality check. Only enabled data quality checks and recurring checks are executed. Disable the check when it should not run but the configuration of the sensor and rules should be preserved. boolean
exclude_from_kpi Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true in order to exclude this data quality check from the data quality KPI calculation. boolean
include_in_sla Marks the data quality check as part of a data quality SLA. The data quality SLA is a set of critical data quality checks that must always pass and are considered as a data contract for the dataset. boolean
quality_dimension Configures a custom data quality dimension name that is different than the built-in dimensions (Timeliness, Validity, etc.). string
display_name Data quality check display name that could be assigned to the check, otherwise the check_display_name stored in the parquet result files is the check_name. string
data_grouping Data grouping configuration name that should be applied to this data quality check. The data grouping is used to group the check's result by a GROUP BY clause in SQL, evaluating the data quality check for each group of rows. Use the name of one of data grouping configurations defined on the parent table. string

ColumnPercentile75InRangeCheckSpec

Column level check that ensures that the percentile 75 of values in a monitored column is in a set range.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Data quality check parameters ColumnNumericPercentile75SensorParametersSpec
warning Alerting threshold that raises a data quality warning that is considered as a passed data quality check BetweenFloatsRuleParametersSpec
error Default alerting threshold for a percentile 75 in a column that raises a data quality error (alert). BetweenFloatsRuleParametersSpec
fatal Alerting threshold that raises a fatal data quality issue which indicates a serious data quality problem BetweenFloatsRuleParametersSpec
schedule_override Run check scheduling configuration. Specifies the schedule (a cron expression) when the data quality checks are executed by the scheduler. RecurringScheduleSpec
comments Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). CommentsListSpec
disabled Disables the data quality check. Only enabled data quality checks and recurring checks are executed. Disable the check when it should not run but the configuration of the sensor and rules should be preserved. boolean
exclude_from_kpi Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true in order to exclude this data quality check from the data quality KPI calculation. boolean
include_in_sla Marks the data quality check as part of a data quality SLA. The data quality SLA is a set of critical data quality checks that must always pass and are considered as a data contract for the dataset. boolean
quality_dimension Configures a custom data quality dimension name that is different than the built-in dimensions (Timeliness, Validity, etc.). string
display_name Data quality check display name that could be assigned to the check, otherwise the check_display_name stored in the parquet result files is the check_name. string
data_grouping Data grouping configuration name that should be applied to this data quality check. The data grouping is used to group the check's result by a GROUP BY clause in SQL, evaluating the data quality check for each group of rows. Use the name of one of data grouping configurations defined on the parent table. string
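
The relationship between the sensor readout and the three alerting thresholds can be illustrated with a minimal Python sketch. It is not the tool's implementation. The `Between` class with its `low` and `high` bounds, the `percentile_75` helper and the `severity` function are hypothetical names, and the sketch assumes that the most severe rule whose range is violated determines the result. As described above, a warning is still counted as a passed check.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import Optional, Sequence


@dataclass
class Between:
    """Hypothetical stand-in for a between-floats rule; a None bound is open."""
    low: Optional[float] = None
    high: Optional[float] = None

    def contains(self, value: float) -> bool:
        if self.low is not None and value < self.low:
            return False
        if self.high is not None and value > self.high:
            return False
        return True


def percentile_75(values: Sequence[float]) -> float:
    # quantiles(n=4) returns the three quartile cut points; index 2 is the 75th percentile.
    return quantiles(values, n=4, method="inclusive")[2]


def severity(value: float,
             warning: Optional[Between] = None,
             error: Optional[Between] = None,
             fatal: Optional[Between] = None) -> str:
    # Assumption: rules are checked from the most severe to the least severe.
    if fatal is not None and not fatal.contains(value):
        return "fatal"
    if error is not None and not error.contains(value):
        return "error"
    if warning is not None and not warning.contains(value):
        return "warning"
    return "passed"


readout = percentile_75([1, 2, 3, 4, 5])
result = severity(readout,
                  warning=Between(0.0, 3.5),
                  error=Between(0.0, 5.0),
                  fatal=Between(0.0, 10.0))
```

In this example the readout falls outside the warning range only, so the sketch reports a warning rather than an error or a fatal issue.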

ColumnStringsProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level that are checking string values.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_string_max_length Verifies that the length of strings in a column does not exceed the maximum accepted length. ColumnStringMaxLengthCheckSpec
profile_string_min_length Verifies that the length of strings in a column does not fall below the minimum accepted length. ColumnStringMinLengthCheckSpec
profile_string_mean_length Verifies that the mean length of strings in a column does not exceed the accepted mean length. ColumnStringMeanLengthCheckSpec
profile_string_length_below_min_length_count Counts the strings in the column that are shorter than the minimum length defined by the user as a parameter. ColumnStringLengthBelowMinLengthCountCheckSpec
profile_string_length_below_min_length_percent Measures the percentage of strings in the column that are shorter than the minimum length defined by the user as a parameter. ColumnStringLengthBelowMinLengthPercentCheckSpec
profile_string_length_above_max_length_count Counts the strings in the column that are longer than the maximum length defined by the user as a parameter. ColumnStringLengthAboveMaxLengthCountCheckSpec
profile_string_length_above_max_length_percent Measures the percentage of strings in the column that are longer than the maximum length defined by the user as a parameter. ColumnStringLengthAboveMaxLengthPercentCheckSpec
profile_string_length_in_range_percent Measures the percentage of strings in the column whose length is within the range provided by the user. ColumnStringLengthInRangePercentCheckSpec
profile_string_empty_count Verifies that the number of empty strings in a column does not exceed the maximum accepted count. ColumnStringEmptyCountCheckSpec
profile_string_empty_percent Verifies that the percentage of empty strings in a column does not exceed the maximum accepted percentage. ColumnStringEmptyPercentCheckSpec
profile_string_whitespace_count Verifies that the number of whitespace strings in a column does not exceed the maximum accepted count. ColumnStringWhitespaceCountCheckSpec
profile_string_whitespace_percent Verifies that the percentage of whitespace strings in a column does not exceed the maximum accepted percentage. ColumnStringWhitespacePercentCheckSpec
profile_string_surrounded_by_whitespace_count Verifies that the number of strings surrounded by whitespace in a column does not exceed the maximum accepted count. ColumnStringSurroundedByWhitespaceCountCheckSpec
profile_string_surrounded_by_whitespace_percent Verifies that the percentage of strings surrounded by whitespace in a column does not exceed the maximum accepted percentage. ColumnStringSurroundedByWhitespacePercentCheckSpec
profile_string_null_placeholder_count Verifies that the number of null placeholders in a column does not exceed the maximum accepted count. ColumnStringNullPlaceholderCountCheckSpec
profile_string_null_placeholder_percent Verifies that the percentage of null placeholders in a column does not exceed the maximum accepted percentage. ColumnStringNullPlaceholderPercentCheckSpec
profile_string_boolean_placeholder_percent Verifies that the percentage of boolean placeholder strings in a column does not fall below the minimum accepted percentage. ColumnStringBooleanPlaceholderPercentCheckSpec
profile_string_parsable_to_integer_percent Verifies that the percentage of strings parsable to an integer in a column does not fall below the minimum accepted percentage. ColumnStringParsableToIntegerPercentCheckSpec
profile_string_parsable_to_float_percent Verifies that the percentage of strings parsable to a float in a column does not fall below the minimum accepted percentage. ColumnStringParsableToFloatPercentCheckSpec
profile_expected_strings_in_use_count Verifies that the expected string values were found in the column. Raises a data quality issue when too many expected values were not found (were missing). ColumnExpectedStringsInUseCountCheckSpec
profile_string_value_in_set_percent The check measures the percentage of rows whose value in a tested column is one of the values from a list of expected values, or is null. Verifies that the percentage of rows having a valid column value does not fall below the minimum accepted percentage. ColumnStringValueInSetPercentCheckSpec
profile_string_valid_dates_percent Verifies that the percentage of valid dates in a column does not fall below the minimum accepted percentage. ColumnStringValidDatesPercentCheckSpec
profile_string_valid_country_code_percent Verifies that the percentage of valid country codes in a column does not fall below the minimum accepted percentage. ColumnStringValidCountryCodePercentCheckSpec
profile_string_valid_currency_code_percent Verifies that the percentage of valid currency codes in a column does not fall below the minimum accepted percentage. ColumnStringValidCurrencyCodePercentCheckSpec
profile_string_invalid_email_count Verifies that the number of invalid emails in a column does not exceed the maximum accepted count. ColumnStringInvalidEmailCountCheckSpec
profile_string_invalid_uuid_count Verifies that the number of invalid UUIDs in a column does not exceed the maximum accepted count. ColumnStringInvalidUuidCountCheckSpec
profile_string_valid_uuid_percent Verifies that the percentage of valid UUIDs in a column does not fall below the minimum accepted percentage. ColumnStringValidUuidPercentCheckSpec
profile_string_invalid_ip4_address_count Verifies that the number of invalid IP4 addresses in a column does not exceed the maximum accepted count. ColumnStringInvalidIp4AddressCountCheckSpec
profile_string_invalid_ip6_address_count Verifies that the number of invalid IP6 addresses in a column does not exceed the maximum accepted count. ColumnStringInvalidIp6AddressCountCheckSpec
profile_string_not_match_regex_count Verifies that the number of strings not matching the custom regex in a column does not exceed the maximum accepted count. ColumnStringNotMatchRegexCountCheckSpec
profile_string_match_regex_percent Verifies that the percentage of strings matching the custom regex in a column does not fall below the minimum accepted percentage. ColumnStringMatchRegexPercentCheckSpec
profile_string_not_match_date_regex_count Verifies that the number of strings not matching the date format regex in a column does not exceed the maximum accepted count. ColumnStringNotMatchDateRegexCountCheckSpec
profile_string_match_date_regex_percent Verifies that the percentage of strings matching the date format regex in a column does not fall below the minimum accepted percentage. ColumnStringMatchDateRegexPercentCheckSpec
profile_string_match_name_regex_percent Verifies that the percentage of strings matching the name regex in a column does not fall below the minimum accepted percentage. ColumnStringMatchNameRegexPercentCheckSpec
profile_expected_strings_in_top_values_count Verifies that the top X most popular column values contain all values from a list of expected values. ColumnExpectedStringsInTopValuesCountCheckSpec
profile_string_datatype_detected Detects the data type of text values stored in the column. The sensor returns the code of the detected data type of a column: 1 - integers, 2 - floats, 3 - dates, 4 - timestamps, 5 - booleans, 6 - strings, 7 - mixed data types. Raises a data quality issue when the detected data type does not match the expected data type. ColumnStringDatatypeDetectedCheckSpec
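
The detected data type codes listed for profile_string_datatype_detected can be illustrated with a small Python sketch. Only the code numbers and their meanings come from the description above. The detection logic itself is an assumption made for illustration: the function names are hypothetical, dates and timestamps are recognized only in ISO 8601 form, and a column mixing integers and floats is reported as mixed, which may differ from how the actual sensor classifies values.

```python
import re
from datetime import date, datetime
from typing import Iterable, Optional

# Codes as listed in the check description.
DETECTED_TYPE_CODES = {
    1: "integers",
    2: "floats",
    3: "dates",
    4: "timestamps",
    5: "booleans",
    6: "strings",
    7: "mixed data types",
}

_INTEGER = re.compile(r"[+-]?\d+")
_FLOAT = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")


def _code_for_value(text: str) -> int:
    """Classify a single text value (illustrative rules only)."""
    value = text.strip()
    if _INTEGER.fullmatch(value):
        return 1
    if _FLOAT.fullmatch(value):
        return 2
    try:
        date.fromisoformat(value)  # date only, e.g. 2023-01-31
        return 3
    except ValueError:
        pass
    try:
        datetime.fromisoformat(value)  # date and time, e.g. 2023-01-31T10:00:00
        return 4
    except ValueError:
        pass
    if value.lower() in ("true", "false"):
        return 5
    return 6


def detect_datatype_code(values: Iterable[Optional[str]]) -> Optional[int]:
    """Return one code for the whole column, 7 when values disagree, None when empty."""
    codes = {_code_for_value(v) for v in values if v is not None}
    if not codes:
        return None
    return codes.pop() if len(codes) == 1 else 7


def matches_expected(values: Iterable[Optional[str]], expected_code: int) -> bool:
    # A data quality issue would be raised when this returns False.
    return detect_datatype_code(values) == expected_code
```

A column holding "2023-01-31" and "2023-02-01" would yield code 3 under these rules, while adding a value such as "n/a" would turn the result into code 7.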

ColumnNumericPercentile75SensorParametersSpec

Column level sensor that finds the percentile 75 in a given column.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
percentile_value 75th percentile, must equal 0.75 double
filter SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. string
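
The way the two filters are joined can be sketched in Python as follows. The function name `combine_filters` is hypothetical, and wrapping each filter in parentheses is an assumption made here so that an OR inside one filter cannot change the meaning of the combined condition. The source only states that both filters are added, separated by an AND operator.

```python
from typing import Optional


def combine_filters(table_filter: Optional[str],
                    sensor_filter: Optional[str]) -> Optional[str]:
    """Join the table level filter and the sensor filter with AND.

    Empty or missing filters are skipped; None means no WHERE clause is needed.
    """
    parts = [
        f"({f.strip()})"
        for f in (table_filter, sensor_filter)
        if f is not None and f.strip()
    ]
    return " AND ".join(parts) if parts else None


where_clause = combine_filters("country = 'US'", "amount > 0 OR refunded = 1")
```

The resulting condition would then be placed in the WHERE clause of the sensor query that computes the percentile.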

ColumnDatatypeProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level that are checking for datatype.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_date_match_format_percent Verifies that the percentage of date values matching the given format in a column does not fall below the minimum accepted percentage. ColumnDatatypeDateMatchFormatPercentCheckSpec
profile_string_datatype_changed Detects that the data type of texts stored in a text column has changed since the last verification. The sensor returns the detected data type of a column: 1 - integers, 2 - floats, 3 - dates, 4 - timestamps, 5 - booleans, 6 - strings, 7 - mixed data types. ColumnDatatypeStringDatatypeChangedCheckSpec

ColumnNullsProfilingChecksSpec

Container of built-in preconfigured data quality checks on a column level that are checking for nulls.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
profile_nulls_count Verifies that the number of null values in a column does not exceed the maximum accepted count. ColumnNullsCountCheckSpec
profile_nulls_percent Verifies that the percent of null values in a column does not exceed the maximum accepted percentage. ColumnNullsPercentCheckSpec
profile_nulls_percent_anomaly_stationary_30_days Verifies that the null percent value in a column changes at a rate within a percentile boundary during the last 30 days. ColumnAnomalyStationaryNullPercent30DaysCheckSpec
profile_nulls_percent_anomaly_stationary Verifies that the null percent value in a column changes at a rate within a percentile boundary during the last 90 days. ColumnAnomalyStationaryNullPercentCheckSpec
profile_nulls_percent_change Verifies that the null percent value in a column changed at a fixed rate since the last readout. ColumnChangeNullPercentCheckSpec
profile_nulls_percent_change_yesterday Verifies that the null percent value in a column changed at a fixed rate since the last readout from yesterday. ColumnChangeNullPercentSinceYesterdayCheckSpec
profile_nulls_percent_change_7_days Verifies that the null percent value in a column changed at a fixed rate since the last readout from last week. ColumnChangeNullPercentSince7DaysCheckSpec
profile_nulls_percent_change_30_days Verifies that the null percent value in a column changed at a fixed rate since the last readout from last month. ColumnChangeNullPercentSince30DaysCheckSpec
profile_not_nulls_count Verifies that the number of not null values in a column does not fall below the minimum accepted count. ColumnNotNullsCountCheckSpec
profile_not_nulls_percent Verifies that the percent of not null values in a column does not fall below the minimum accepted percentage. ColumnNotNullsPercentCheckSpec
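
The four basic measures behind the count and percent checks above can be sketched in Python. This is an illustration only: the real sensors run as SQL queries against the monitored table, and the names `null_stats`, `within_max` and `at_least_min` are hypothetical. Treating an empty column as 0 percent for both measures is also an assumption.

```python
from typing import Dict, Optional, Sequence


def null_stats(values: Sequence[Optional[object]]) -> Dict[str, float]:
    """Compute null and not null counts and percentages for one column."""
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    not_nulls = total - nulls

    def percent(part: int) -> float:
        # Assumption: an empty column yields 0.0 instead of a division error.
        return 100.0 * part / total if total else 0.0

    return {
        "nulls_count": nulls,
        "nulls_percent": percent(nulls),
        "not_nulls_count": not_nulls,
        "not_nulls_percent": percent(not_nulls),
    }


def within_max(value: float, max_allowed: float) -> bool:
    # Null checks pass while the value does not exceed the maximum.
    return value <= max_allowed


def at_least_min(value: float, min_required: float) -> bool:
    # Not null checks pass while the value does not fall below the minimum.
    return value >= min_required


stats = null_stats([10, None, 30, None])
nulls_ok = within_max(stats["nulls_percent"], 25.0)
not_nulls_ok = at_least_min(stats["not_nulls_count"], 2)
```

With two nulls in four rows, the null percentage is 50, so a maximum of 25 percent is exceeded while a minimum of two not null values is still met.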

ColumnMedianInRangeCheckSpec

Column level check that ensures that the median of values in a monitored column is in a set range.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Data quality check parameters ColumnNumericMedianSensorParametersSpec
warning Alerting threshold that raises a data quality warning that is considered as a passed data quality check BetweenFloatsRuleParametersSpec
error Default alerting threshold for a median in a column that raises a data quality error (alert). BetweenFloatsRuleParametersSpec
fatal Alerting threshold that raises a fatal data quality issue which indicates a serious data quality problem BetweenFloatsRuleParametersSpec
schedule_override Run check scheduling configuration. Specifies the schedule (a cron expression) when the data quality checks are executed by the scheduler. RecurringScheduleSpec
comments Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). CommentsListSpec
disabled Disables the data quality check. Only enabled data quality checks and recurring checks are executed. Disable the check when it should not run but the configuration of the sensor and rules should be preserved. boolean
exclude_from_kpi Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true in order to exclude this data quality check from the data quality KPI calculation. boolean
include_in_sla Marks the data quality check as part of a data quality SLA. The data quality SLA is a set of critical data quality checks that must always pass and are considered as a data contract for the dataset. boolean
quality_dimension Configures a custom data quality dimension name that is different than the built-in dimensions (Timeliness, Validity, etc.). string
display_name Data quality check display name that could be assigned to the check, otherwise the check_display_name stored in the parquet result files is the check_name. string
data_grouping Data grouping configuration name that should be applied to this data quality check. The data grouping is used to group the check's result by a GROUP BY clause in SQL, evaluating the data quality check for each group of rows. Use the name of one of data grouping configurations defined on the parent table. string