TableYaml
ColumnUniquenessDuplicatePercentStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnUniquenessDuplicatePercentSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnStatisticsCollectorsRootCategoriesSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
nulls | Configuration of null values profilers on a column level. | ColumnNullsStatisticsCollectorsSpec | |||
strings | Configuration of string (text) profilers on a column level. | ColumnStringsStatisticsCollectorsSpec | |||
uniqueness | Configuration of profilers that analyse uniqueness of values (distinct count). | ColumnUniquenessStatisticsCollectorsSpec | |||
range | Configuration of profilers that analyse the range of values (min, max). | ColumnRangeStatisticsCollectorsSpec | |||
sampling | Configuration of profilers that collect the column samples. | ColumnSamplingStatisticsCollectorsSpec |
TablePartitionedChecksRootSpec
Container of table level partitioned checks, divided by the time window (daily, monthly, etc.)
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily | Configuration of day partitioned data quality checks evaluated at a table level. | TableDailyPartitionedCheckCategoriesSpec | |||
monthly | Configuration of monthly partitioned data quality checks evaluated at a table level. | TableMonthlyPartitionedCheckCategoriesSpec |
ColumnUniquenessDuplicateCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnUniquenessDuplicateCountSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnStringsStringMeanLengthStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnStringsStringMeanLengthSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
TableComparisonGroupingColumnsPairSpec
Configuration of a pair of columns on the compared table and the reference table (the source of truth) that are joined and used for grouping to perform data comparison of aggregated results (sums of columns, row counts, etc.).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
compared_table_column_name | The name of the column on the compared table (the parent table) that is used in the GROUP BY clause to group rows before compared aggregates (row counts, sums, etc.) are calculated. This column is also used to join (match) results to the reference table. | string | |||
reference_table_column_name | The name of the column on the reference table (the source of truth) that is used in the GROUP BY clause to group rows before compared aggregates (row counts, sums, etc.) are calculated. This column is also used to join (match) results to the compared table. | string |
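As an illustration, one grouping column pair could be written as the fragment below. The two property names come from the table above; the column names `country` and `country_code` are hypothetical.

```yaml
# A single grouping column pair (column names are made up for this sketch).
compared_table_column_name: country          # column on the compared (parent) table
reference_table_column_name: country_code    # matching column on the reference table
```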
TableComparisonConfigurationSpecMap
Dictionary of data comparison configurations between the current table (the parent of this node) and a reference table (the source of truth). Each configuration compares the two tables to measure the accuracy of the data.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
access_order | | boolean | |||
size | | integer | |||
mod_count | | integer | |||
threshold | | integer |
ColumnTypeSnapshotSpec
Stores the column data type captured at the time of the table metadata import.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
column_type | Column data type using the monitored database type names. | string | |||
nullable | Column is nullable. | boolean | |||
length | Maximum length of text and binary columns. | integer | |||
precision | Precision of a numeric (decimal) data type. | integer | |||
scale | Scale of a numeric (decimal) data type. | integer |
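A minimal sketch of a type snapshot, assuming a nullable decimal column. The property names are taken from the table above; the `column_type` value is only an example, because the actual type names depend on the monitored database.

```yaml
# Hypothetical snapshot of a DECIMAL(10, 2) column that allows nulls.
type_snapshot:
  column_type: DECIMAL   # type name as reported by the monitored database
  nullable: true
  precision: 10
  scale: 2
```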
ColumnStringsStringDatatypeDetectStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnStringsStringDatatypeDetectSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
TableYaml
Table and column definition file that defines a list of tables and columns that are covered by data quality checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
api_version | | string | |||
kind | | enum | table, dashboards, source, sensor, check, rule, file_index, settings, provider_sensor | ||
spec | | TableSpec |
ColumnStringsStringMinLengthStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnStringsStringMinLengthSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
TableIncidentGroupingSpec
Configuration of data quality incident grouping on a table level. Defines how similar data quality issues are grouped into incidents.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
grouping_level | Grouping level of failed data quality checks for creating higher level data quality incidents. The default grouping level is by a table, a data quality dimension and a check category (for example, a datatype data quality incident detected on table X in the numeric checks category). | enum | table_dimension_category_type, table_dimension, table, table_dimension_category, table_dimension_category_name | ||
minimum_severity | Minimum severity level of data quality issues that are grouped into incidents. The default minimum severity level is 'warning'. Other supported severity levels are 'error' and 'fatal'. | enum | warning, error, fatal | ||
divide_by_data_group | Create separate data quality incidents for each data group, creating different incidents for different groups of rows. By default, data groups are ignored for grouping data quality issues into data quality incidents. | boolean | |||
disabled | Disables data quality incident creation for failed data quality checks on the table. | boolean |
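The fragment below sketches how these properties might be combined on a table. It assumes the object is placed under the `incident_grouping` property of the table specification; the chosen values are only examples drawn from the enum lists above.

```yaml
# Example override of incident grouping on one table.
incident_grouping:
  grouping_level: table_dimension   # one incident per table and quality dimension
  minimum_severity: error           # issues at 'warning' severity do not open incidents
  divide_by_data_group: true        # separate incidents for each data group
  disabled: false
```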
ColumnRangeMinValueSensorParametersSpec
Column level sensor that finds the minimum value. It works on any data type that supports the MIN function. The returned data type matches the data type of the column (it could return date, integer, string, datetime, etc.).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
filter | SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. | string |
ColumnNullsNullsPercentStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnNullsNullsPercentSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
TableOwnerSpec
Table owner information.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
data_steward | Data steward name | string | |||
application | Business application name | string |
TableVolumeRowCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | TableVolumeRowCountSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnSpec
Column specification that identifies a single column.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
disabled | Disables all data quality checks on the column. Data quality checks will not be executed. | boolean | |||
sql_expression | SQL expression used for calculated fields or when an additional column value transformation (such as a data type conversion) is required before the column can be analyzed in data quality checks. It should be an SQL expression using the SQL language of the analyzed database type. Use replacement tokens {table} to replace the content with the full table name, {alias} to replace the content with the table alias of an analyzed table or {column} to replace the content with the analyzed column name. An example to extract a value from a string column that stores a JSON in PostgreSQL: "{column}::json->'address'->'zip'". | string | |||
type_snapshot | Column data type that was retrieved when the table metadata was imported. | ColumnTypeSnapshotSpec | |||
profiling_checks | Configuration of data quality profiling checks that are enabled. Pick a check from a category, apply the parameters and rules to enable it. | ColumnProfilingCheckCategoriesSpec | |||
recurring_checks | Configuration of column level recurring checks. Recurring checks are data quality checks that are evaluated for each period of time (daily, weekly, monthly, etc.). A recurring check stores only the most recent data quality check result for each period of time. | ColumnRecurringChecksRootSpec | |||
partitioned_checks | Configuration of column level date/time partitioned checks. Partitioned data quality checks are evaluated for each partition separately, raising separate alerts at a partition level. The table does not need to be physically partitioned by date, it is possible to run data quality checks for each day or month of data separately. | ColumnPartitionedChecksRootSpec | |||
statistics | Custom configuration of a column level statistics collector (a basic profiler). Enables customization of the statistics collector settings when the collector is analysing this column. | ColumnStatisticsCollectorsRootCategoriesSpec | |||
labels | Custom labels that were assigned to the column. Labels are used for searching for columns when filtered data quality checks are executed. | LabelSetSpec | |||
comments | Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). | CommentsListSpec |
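As a sketch, a calculated column that reuses the PostgreSQL expression quoted in the `sql_expression` description could be declared as follows. The column name `customer_details` is hypothetical, and the fragment assumes the column is listed in the `columns` dictionary of a table specification.

```yaml
columns:
  customer_details:                                      # hypothetical physical column name
    disabled: false
    # {column} is replaced with the analyzed column name before the query runs.
    sql_expression: "{column}::json->'address'->'zip'"
```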
ColumnStringsStringMaxLengthStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnStringsStringMaxLengthSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnUniquenessDistinctCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnUniquenessDistinctCountSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnSpecMap
Dictionary of columns indexed by a physical column name.
ColumnRangeMinValueStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnRangeMinValueSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
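A possible configuration of this collector on a column is sketched below. It assumes the nesting implied by the other objects on this page (`statistics`, then `range`, then `min_value`); the filter expression and the `amount` column it refers to are made up.

```yaml
# Placed inside a column specification.
statistics:
  range:
    min_value:
      disabled: false
      parameters:
        filter: "amount > 0"   # hypothetical condition, ANDed with the table level filter
```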
ColumnRangeSumValueStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnNumericSumSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
DataGroupingConfigurationSpecMap
Dictionary of named data grouping configurations defined on a table level.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
access_order | | boolean | |||
size | | integer | |||
mod_count | | integer | |||
threshold | | integer |
TableRecurringChecksSpec
Container of table level recurring checks, divided by the time window (daily, monthly, etc.)
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily | Configuration of daily recurring checks evaluated at a table level. | TableDailyRecurringCheckCategoriesSpec | |||
monthly | Configuration of monthly recurring checks evaluated at a table level. | TableMonthlyRecurringCheckCategoriesSpec |
ColumnUniquenessStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
distinct_count | Configuration of the profiler that counts distinct column values. | ColumnUniquenessDistinctCountStatisticsCollectorSpec | |||
distinct_percent | Configuration of the profiler that measures the percentage of distinct column values. | ColumnUniquenessDistinctPercentStatisticsCollectorSpec | |||
duplicate_count | Configuration of the profiler that counts duplicate column values. | ColumnUniquenessDuplicateCountStatisticsCollectorSpec | |||
duplicate_percent | Configuration of the profiler that measures the percentage of duplicate column values. | ColumnUniquenessDuplicatePercentStatisticsCollectorSpec |
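For example, the uniqueness profilers of one column could be switched on and off individually. The sketch assumes this object sits under `statistics.uniqueness` in a column specification, as the `uniqueness` property of ColumnStatisticsCollectorsRootCategoriesSpec suggests.

```yaml
# Placed inside a column specification.
statistics:
  uniqueness:
    distinct_count:
      disabled: false
    distinct_percent:
      disabled: false
    duplicate_count:
      disabled: true    # skipped during profiling
    duplicate_percent:
      disabled: true    # skipped during profiling
```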
ColumnSamplingColumnSamplesStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnSamplingColumnSamplesSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
TableSpec
Table specification that defines data quality tests that are enabled on a table and columns.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
disabled | Disables all data quality checks on the table. Data quality checks will not be executed. | boolean | |||
stage | Stage name. | string | |||
priority | Table priority (1, 2, 3, 4, ...). The tables could be assigned a priority level. The table priority is copied into each data quality check result and a sensor result, enabling efficient grouping of more and less important tables during a data quality improvement project, when the data quality issues on higher priority tables are fixed before data quality issues on less important tables. | integer | |||
filter | SQL WHERE clause added to the sensor queries. Use replacement tokens {table} to replace the content with the full table name, {alias} to replace the content with the table alias of an analyzed table or {column} to replace the content with the analyzed column name. | string | |||
timestamp_columns | Column names that store the timestamps that identify the event (transaction) timestamp and the ingestion (inserted / loaded at) timestamps. Also configures the timestamp source for the date/time partitioned data quality checks (event timestamp or ingestion timestamp). | TimestampColumnsSpec | |||
incremental_time_window | Configuration of the time window for analyzing daily or monthly partitions. Specifies the number of recent days and recent months that are analyzed when the partitioned data quality checks are run in an incremental mode (the default mode). | PartitionIncrementalTimeWindowSpec | |||
default_grouping_name | The name of the default data grouping configuration that is applied on data quality checks. When a default data grouping is selected, all data quality checks run SQL queries with a GROUP BY clause, calculating separate data quality checks for each group of data. The data groupings are defined in the 'groupings' dictionary (indexed by the data grouping name). | string | |||
groupings | Data grouping configurations list. Data grouping configurations are configured in two cases: (1) the data in the table should be analyzed with a GROUP BY condition, to analyze different datasets using separate time series, for example a table contains data from multiple countries and there is a 'country' column used for partitioning. (2) a tag is assigned to a table (within a data grouping level hierarchy), when the data is segmented at a table level (similar tables store the same information, but for different countries, etc.). | DataGroupingConfigurationSpecMap | |||
table_comparisons | Dictionary of data comparison configurations. Data comparison configurations are used for cross data-source comparisons to compare this table (called the compared table) with other reference tables (the source of truth). The reference table's metadata must be imported into DQO, but the reference table could be located on a different data source. DQO will compare metrics calculated for groups of rows (using a GROUP BY clause). For each comparison, the user must specify a name of a data grouping. The number of data grouping dimensions on the parent table and the reference table defined in selected data grouping configurations must match. DQO will run the same data quality sensors on both the parent table (tested table) and the reference table (the source of truth), comparing the measures (sensor readouts) captured from both the tables. | TableComparisonConfigurationSpecMap | |||
incident_grouping | Incident grouping configuration that overrides the default configuration at a table level. Field values that are configured in this object override the default configuration from the connection level. The incident grouping level could be changed or incident creation could be disabled. | TableIncidentGroupingSpec | |||
owner | Table owner information like the data steward name or the business application name. | TableOwnerSpec | |||
profiling_checks | Configuration of data quality profiling checks that are enabled. Pick a check from a category, apply the parameters and rules to enable it. | TableProfilingCheckCategoriesSpec | |||
recurring_checks | Configuration of table level recurring checks. Recurring checks are data quality checks that are evaluated for each period of time (daily, weekly, monthly, etc.). A recurring check stores only the most recent data quality check result for each period of time. | TableRecurringChecksSpec | |||
partitioned_checks | Configuration of table level date/time partitioned checks. Partitioned data quality checks are evaluated for each partition separately, raising separate alerts at a partition level. The table does not need to be physically partitioned by date, it is possible to run data quality checks for each day or month of data separately. | TablePartitionedChecksRootSpec | |||
statistics | Configuration of table level data statistics collector (a basic profiler). Configures which statistics collectors are enabled and how they are configured. | TableStatisticsCollectorsRootCategoriesSpec | |||
schedules_override | Configuration of the job scheduler that runs data quality checks. The scheduler configuration is divided into types of checks that have different schedules. | RecurringSchedulesSpec | |||
columns | Dictionary of columns, indexed by a physical column name. Column specification contains the expected column data type and a list of column level data quality checks that are enabled for a column. | ColumnSpecMap | |||
labels | Custom labels that were assigned to the table. Labels are used for searching for tables when filtered data quality checks are executed. | LabelSetSpec | |||
comments | Comments used for change tracking and documenting changes directly in the table data quality specification file. | CommentsListSpec |
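The fragment below is a rough sketch of a table specification that uses a few of the scalar and nested properties listed above. It assumes the object is stored under the `spec` property of the table file; every value (stage name, filter, owner, column name) is hypothetical.

```yaml
spec:
  disabled: false
  stage: staging                         # hypothetical stage name
  priority: 1
  filter: "{alias}.is_deleted = 0"       # {alias} is replaced with the table alias
  owner:
    data_steward: Jane Doe
    application: Billing
  columns:
    order_id:                            # hypothetical physical column name
      disabled: false
```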
ColumnRangeStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
min_value | Configuration of the profiler that finds the minimum value in the column. | ColumnRangeMinValueStatisticsCollectorSpec | |||
median_value | Configuration of the profiler that finds the median value in the column. | ColumnRangeMedianValueStatisticsCollectorSpec | |||
max_value | Configuration of the profiler that finds the maximum value in the column. | ColumnRangeMaxValueStatisticsCollectorSpec | |||
sum_value | Configuration of the profiler that calculates the sum of values in the column. | ColumnRangeSumValueStatisticsCollectorSpec |
PartitionIncrementalTimeWindowSpec
Configuration of the time window for running incremental partition checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily_partitioning_recent_days | Number of recent days that are analyzed by daily partitioned checks in incremental mode. The default value is 7 days back. | integer | |||
daily_partitioning_include_today | Also analyze today's data by daily partitioned checks in incremental mode. The default value is false, which means that today's and future partitions are not analyzed; only yesterday's partition and earlier daily partitions are analyzed, because today's data could still be incomplete. Change the value to 'true' if the current day should also be analyzed. The change may require configuring the schedule for daily checks correctly, to run after the data load. | boolean | |||
monthly_partitioning_recent_months | Number of recent months that are analyzed by monthly partitioned checks in incremental mode. The default value is 1 month back which means the previous calendar month. | integer | |||
monthly_partitioning_include_current_month | Analyze also this month's data by monthly partitioned checks in incremental mode. The default value is false, which means that the current month is not analyzed and future data is also filtered out because the current month could be incomplete. Set the value to 'true' if the current month should be analyzed before the end of the month. The schedule for running monthly checks should be also configured to run more frequently (daily, hourly, etc.). | boolean |
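A minimal sketch of a customized time window, assuming it is placed under the `incremental_time_window` property of a table specification. The values differ from the defaults described above (7 days, 1 month, both include flags false) purely for illustration.

```yaml
incremental_time_window:
  daily_partitioning_recent_days: 14                  # look back two weeks instead of 7 days
  daily_partitioning_include_today: true              # also analyze today's (possibly incomplete) partition
  monthly_partitioning_recent_months: 1
  monthly_partitioning_include_current_month: false   # keep the default for monthly checks
```

When `daily_partitioning_include_today` is enabled, the schedule for daily checks should run after the data load, as noted in the property description.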
ColumnNullsNullsCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnNullsNullsCountSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnRangeMaxValueSensorParametersSpec
Column level sensor that finds the maximum value. It works on any data type that supports the MAX function. The returned data type matches the data type of the column (it could return date, integer, string, datetime, etc.).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
filter | SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. | string |
ColumnRangeMaxValueStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnRangeMaxValueSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnRecurringChecksRootSpec
Container of column level recurring checks, divided by the time window (daily, monthly, etc.)
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily | Configuration of daily recurring checks evaluated at a column level. | ColumnDailyRecurringCheckCategoriesSpec | |||
monthly | Configuration of monthly recurring checks evaluated at a column level. | ColumnMonthlyRecurringCheckCategoriesSpec |
TableComparisonConfigurationSpec
Identifies a data comparison configuration between a parent table (the compared table) and the target table from another data source (a reference table).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
reference_table_connection_name | The name of the connection in DQO where the reference table (the source of truth) is configured. When the connection name is not provided, DQO will find the reference table on the connection of the parent table. | string | |||
reference_table_schema_name | The name of the schema where the reference table is imported into DQO. The reference table's metadata must be imported into DQO. | string | |||
reference_table_name | The name of the reference table that is imported into DQO. The reference table's metadata must be imported into DQO. | string | |||
compared_table_filter | Optional custom SQL filter expression that is added to the SQL query that retrieves the data from the compared table. This expression must be a SQL expression that will be added to the WHERE clause when querying the compared table. | string | |||
reference_table_filter | Optional custom SQL filter expression that is added to the SQL query that retrieves the data from the reference table (the source of truth). This expression must be a SQL expression that will be added to the WHERE clause when querying the reference table. | string | |||
check_type | The type of checks (profiling, recurring, partitioned) to which this comparison configuration applies. The default value is 'profiling'. | enum | profiling, recurring, partitioned | ||
time_scale | The time scale to which this comparison configuration applies. Supported values are 'daily' and 'monthly' for recurring and partitioned checks or an empty value for profiling checks. | enum | daily, monthly | ||
grouping_columns | List of column pairs from both the compared table and the reference table that are used in a GROUP BY clause for grouping both the compared table and the reference table (the source of truth). The columns are used in the next step of the table comparison to join the results of data groups (row counts, sums of columns) between the compared table and the reference table to compare the differences. | TableComparisonGroupingColumnsPairsListSpec |
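To show how these properties fit together, here is a sketch of one named comparison. It assumes the configuration is an entry of the `table_comparisons` dictionary and that `grouping_columns` is written as a YAML list of column pairs; the comparison name, connection, schema, table and column names are all hypothetical.

```yaml
table_comparisons:
  compare_to_warehouse:                         # hypothetical comparison name
    reference_table_connection_name: warehouse
    reference_table_schema_name: public
    reference_table_name: orders
    compared_table_filter: "status = 'paid'"    # added to the WHERE clause on the compared table
    check_type: recurring
    time_scale: daily
    grouping_columns:
      - compared_table_column_name: country
        reference_table_column_name: country_code
```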
ColumnSamplingStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
column_samples | Configuration of the profiler that collects the column samples. | ColumnSamplingColumnSamplesStatisticsCollectorSpec |
TimestampColumnsSpec
Configuration of timestamp related columns on a table level. Timestamp columns are used for timeliness data quality checks and for date/time partitioned checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
event_timestamp_column | Column name that identifies an event timestamp (date/time), such as a transaction timestamp, impression timestamp, event timestamp. | column_name | |||
ingestion_timestamp_column | Column name that contains the timestamp (or date/time) when the row was ingested (loaded, inserted) into the table. Use a column that is filled by the data pipeline or ETL process at the time of the data loading. | column_name | |||
partition_by_column | Name of the date, datetime or timestamp column used for date/time partitioned data. Partition checks (daily partition checks and monthly partition checks) use this column in a GROUP BY clause in order to detect data quality issues in each partition separately. It should be a DATE type, DATETIME type (using a local server time zone) or a TIMESTAMP type (a UTC absolute time). | column_name |
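A short sketch of the timestamp configuration, assuming it sits under the `timestamp_columns` property of a table specification. The column names `created_at` and `loaded_at` are hypothetical.

```yaml
timestamp_columns:
  event_timestamp_column: created_at       # when the business event happened
  ingestion_timestamp_column: loaded_at    # filled by the data pipeline at load time
  partition_by_column: created_at          # grouped by day or month in partitioned checks
```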
TableComparisonGroupingColumnsPairsListSpec
List of column pairs used for grouping and joining in the table comparison checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
mod_count | | integer |
TableStatisticsCollectorsRootCategoriesSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
volume | Configuration of volume statistics collectors on a table level. | TableVolumeStatisticsCollectorsSpec |
ColumnRangeMedianValueStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnNumericMedianSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnPartitionedChecksRootSpec
Container of column level partitioned checks, divided by the time window (daily, monthly, etc.)
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily | Configuration of day partitioned data quality checks evaluated at a column level. | ColumnDailyPartitionedCheckCategoriesSpec | |||
monthly | Configuration of monthly partitioned data quality checks evaluated at a column level. | ColumnMonthlyPartitionedCheckCategoriesSpec |
ColumnNullsNotNullsPercentStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnNullsNotNullsPercentSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnStringsStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
string_max_length | Configuration of the profiler that finds the maximum string length. | ColumnStringsStringMaxLengthStatisticsCollectorSpec | |||
string_mean_length | Configuration of the profiler that finds the mean string length. | ColumnStringsStringMeanLengthStatisticsCollectorSpec | |||
string_min_length | Configuration of the profiler that finds the minimum string length. | ColumnStringsStringMinLengthStatisticsCollectorSpec | |||
string_datatype_detect | Configuration of the profiler that detects the data type. | ColumnStringsStringDatatypeDetectStatisticsCollectorSpec |
ColumnNullsStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
nulls_count | Configuration of the profiler that counts null column values. | ColumnNullsNullsCountStatisticsCollectorSpec | |||
nulls_percent | Configuration of the profiler that measures the percentage of null values. | ColumnNullsNullsPercentStatisticsCollectorSpec | |||
not_nulls_count | Configuration of the profiler that counts not null column values. | ColumnNullsNotNullsCountStatisticsCollectorSpec | |||
not_nulls_percent | Configuration of the profiler that measures the percentage of not null values. | ColumnNullsNotNullsPercentStatisticsCollectorSpec |
ColumnNullsNotNullsCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnNullsNotNullsCountSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
TableVolumeStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
row_count | Configuration of the row count profiler. | TableVolumeRowCountStatisticsCollectorSpec |
ColumnUniquenessDistinctPercentStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters | Profiler parameters | ColumnUniquenessDistinctPercentSensorParametersSpec | |||
disabled | Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |