Last updated: July 22, 2025
DQOps YAML file definitions
The definition of YAML files used by DQOps to configure the data sources, monitored tables, and the configuration of activated data quality checks.
TableYaml
Table and column definition file that defines a list of tables and columns that are covered by data quality checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
api_version |
DQOps YAML schema version | string | dqo/v1 | ||
kind |
File type | enum | source table sensor provider_sensor rule check settings file_index connection_similarity_index dashboards default_schedules default_checks default_table_checks default_column_checks default_notifications | table | |
spec |
Table specification object with the table metadata and the configuration of data quality checks | TableSpec |
TableSpec
Table specification that defines data quality tests that are enabled on a table and its columns.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
disabled |
Disables all data quality checks on the table. Data quality checks will not be executed. | boolean | |||
stage |
Stage name. | string | |||
priority |
Table priority (1, 2, 3, 4, ...). The tables can be assigned a priority level. The table priority is copied into each data quality check result and a sensor result, enabling efficient grouping of more and less important tables during a data quality improvement project, when the data quality issues on higher priority tables are fixed before data quality issues on less important tables. | integer | |||
filter |
SQL WHERE clause added to the sensor queries. The following replacement tokens are supported: {table} is replaced with the full table name, {alias} with the table alias of the analyzed table, and {column} with the analyzed column name. | string | |||
do_not_collect_error_samples_in_profiling |
Disable automatic collection of error samples in the profiling section. The profiling checks by default always collect error samples for failed data quality checks. | boolean | |||
always_collect_error_samples_in_monitoring |
Always collect error samples for failed monitoring checks. By default, DQOps does not collect error samples when monitoring checks are executed by a scheduler or run from the metadata tree; error samples are collected only when the checks are run from the check editor. | boolean | |||
timestamp_columns |
Column names that store the timestamps that identify the event (transaction) timestamp and the ingestion (inserted / loaded at) timestamps. Also configures the timestamp source for the date/time partitioned data quality checks (event timestamp or ingestion timestamp). | TimestampColumnsSpec | |||
incremental_time_window |
Configuration of the time window for analyzing daily or monthly partitions. Specifies the number of recent days and recent months that are analyzed when the partitioned data quality checks are run in an incremental mode (the default mode). | PartitionIncrementalTimeWindowSpec | |||
default_grouping_name |
The name of the default data grouping configuration that is applied on data quality checks. When a default data grouping is selected, all data quality checks run SQL queries with a GROUP BY clause, calculating separate data quality checks for each group of data. The data groupings are defined in the 'groupings' dictionary (indexed by the data grouping name). | string | |||
groupings |
Data grouping configurations list. Data grouping configurations are configured in two cases: (1) the data in the table should be analyzed with a GROUP BY condition, to analyze different datasets using separate time series, for example a table contains data from multiple countries and there is a 'country' column used for partitioning. (2) a tag is assigned to a table (within a data grouping level hierarchy), when the data is segmented at a table level (similar tables store the same information, but for different countries, etc.). | DataGroupingConfigurationSpecMap | |||
table_comparisons |
Dictionary of data comparison configurations. Data comparison configurations are used for comparisons between data sources to compare this table (called the compared table) with other reference tables (the source of truth). The reference table's metadata must be imported into DQOps, but the reference table may be located in another data source. DQOps will compare metrics calculated for groups of rows (using the GROUP BY clause). For each comparison, the user must specify a name of a data grouping. The number of data grouping dimensions in the parent table and the reference table defined in the selected data grouping configurations must match. DQOps will run the same data quality sensors on both the parent table (table under test) and the reference table (the source of truth), comparing the measures (sensor readouts) captured from both tables. | TableComparisonConfigurationSpecMap | |||
incident_grouping |
Incident grouping configuration with the overridden configuration at a table level. The configured field value in this object will override the default configuration from the connection level. Incident grouping level can be changed or incident creation can be disabled. | TableIncidentGroupingSpec | |||
owner |
Table owner information like the data steward name or the business application name. | TableOwnerSpec | |||
profiling_checks |
Configuration of data quality profiling checks that are enabled. Pick a check from a category, apply the parameters and rules to enable it. | TableProfilingCheckCategoriesSpec | |||
monitoring_checks |
Configuration of table level monitoring checks. Monitoring checks are data quality checks that are evaluated for each period of time (daily, weekly, monthly, etc.). A monitoring check stores only the most recent data quality check result for each period of time. | TableMonitoringCheckCategoriesSpec | |||
partitioned_checks |
Configuration of table level date/time partitioned checks. Partitioned data quality checks are evaluated for each partition separately, raising separate alerts at a partition level. The table does not need to be physically partitioned by date, it is possible to run data quality checks for each day or month of data separately. | TablePartitionedCheckCategoriesSpec | |||
statistics |
Configuration of table level data statistics collector (a basic profiler). Configures which statistics collectors are enabled and how they are configured. | TableStatisticsCollectorsRootCategoriesSpec | |||
schedules_override |
Configuration of the job scheduler that runs data quality checks. The scheduler configuration is divided into types of checks that have different schedules. | CronSchedulesSpec | |||
columns |
Dictionary of columns, indexed by a physical column name. Column specification contains the expected column data type and a list of column level data quality checks that are enabled for a column. | ColumnSpecMap | |||
labels |
Custom labels that were assigned to the table. Labels are used for searching for tables when filtered data quality checks are executed. | LabelSetSpec | |||
comments |
Comments used for change tracking and documenting changes directly in the table data quality specification file. | CommentsListSpec | |||
file_format |
File format specification of the files used as source data. When set, it overrides the file format configured in the connection specification. | FileFormatSpec | |||
advanced_properties |
A key/value dictionary of advanced properties that can be used, for example, to support mapping data to data catalogs. | Dict[string, string] | |||
source_tables |
A list of source tables. This information is used to define the data lineage report for the table. | TableLineageSourceSpecList |
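As an illustration of the table-level properties above, the following is a minimal sketch of the `spec` node of a table file. Only the property names come from the table above; the stage name, the `country` column used in the filter, and the `catalog_id` advanced property are hypothetical values chosen for the example. The file header (schema version and kind) is omitted.

```yaml
# Fragment of a table YAML file; only the "spec" node is shown.
spec:
  stage: landing                 # hypothetical stage name
  priority: 1                    # lower numbers are treated here as more important (assumption)
  # The filter is quoted because a YAML scalar starting with "{" would otherwise
  # be parsed as a flow mapping. The "country" column is hypothetical.
  filter: "{alias}.country = 'US'"
  do_not_collect_error_samples_in_profiling: true
  always_collect_error_samples_in_monitoring: true
  advanced_properties:
    catalog_id: "12345"          # hypothetical key/value pair
```

Properties that are left out keep their default behavior, so a real file usually lists only the settings that need to differ from the defaults.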
TimestampColumnsSpec
Configuration of timestamp related columns on a table level. Timestamp columns are used for timeliness data quality checks and for date/time partitioned checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
event_timestamp_column |
Column name that identifies an event timestamp (date/time), such as a transaction timestamp, impression timestamp, event timestamp. | string | |||
ingestion_timestamp_column |
Column name that contains the timestamp (or date/time) when the row was ingested (loaded, inserted) into the table. Use a column that is filled by the data pipeline or ETL process at the time of the data loading. | string | |||
partition_by_column |
Name of the date, datetime or timestamp column used for date/time partitioned data. Partition checks (daily partition checks and monthly partition checks) use this column in a GROUP BY clause in order to detect data quality issues in each partition separately. It should be a DATE type, DATETIME type (using a local server time zone) or a TIMESTAMP type (a UTC absolute time). | string |
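A short sketch of how the timestamp columns might be configured. The column names `created_at` and `loaded_at` are hypothetical; the property names are the ones listed above.

```yaml
# Fragment of a table YAML file; only the "spec" node is shown.
spec:
  timestamp_columns:
    event_timestamp_column: created_at       # when the event (transaction) happened
    ingestion_timestamp_column: loaded_at    # when the pipeline loaded the row
    partition_by_column: created_at          # column used in GROUP BY by partition checks
```

In this sketch the event timestamp doubles as the partitioning column, which is one possible choice when the table is analyzed by the day on which the events occurred.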
PartitionIncrementalTimeWindowSpec
Configuration of the time window for running incremental partition checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily_partitioning_recent_days |
The number of recent days analyzed by daily partition checks in incremental mode. The default value is the last 7 days. | integer | |||
daily_partitioning_include_today |
Also analyze today's data with daily partition checks in incremental mode. The default value is false, which means that today's partition and future partitions are not analyzed. Only yesterday's partition and earlier daily partitions are analyzed because today's data may still be incomplete. Change the value to 'true' if the current day should also be analyzed. This change may require you to adjust the schedule for daily checks so that the checks run after the data load. | boolean | |||
monthly_partitioning_recent_months |
The number of recent months analyzed by monthly partition checks in incremental mode. The default value is the previous calendar month. | integer | |||
monthly_partitioning_include_current_month |
Also analyze this month's data with monthly partition checks in incremental mode. The default value is false, which means that the current month is not analyzed. Future data is also filtered out because the current month may be incomplete. Change the value to 'true' if the current month should also be analyzed before the end of the month. This change may require you to configure the schedule to run monthly checks more frequently (daily, hourly, etc.). | boolean |
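The incremental time window could be widened as in the sketch below. The values are illustrative only: 14 days and 3 months are arbitrary choices, not recommended settings.

```yaml
# Fragment of a table YAML file; only the "spec" node is shown.
spec:
  incremental_time_window:
    daily_partitioning_recent_days: 14                # analyze the last 14 daily partitions
    daily_partitioning_include_today: true            # requires checks to run after the data load
    monthly_partitioning_recent_months: 3             # analyze the last 3 monthly partitions
    monthly_partitioning_include_current_month: false # keep the default, skip the incomplete month
```

Enabling `daily_partitioning_include_today` only makes sense when the daily checks are scheduled after the load finishes, as the property description notes.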
DataGroupingConfigurationSpecMap
Dictionary of named data grouping configurations defined on a table level.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
self |
Dict[string, DataGroupingConfigurationSpec] |
TableComparisonConfigurationSpecMap
Dictionary of data comparison configurations between the current table (the parent of this node) and a reference table (the source of truth). The tables are compared to measure the accuracy of the data.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
self |
Dict[string, TableComparisonConfigurationSpec] |
TableComparisonConfigurationSpec
Identifies a data comparison configuration between a parent table (the compared table) and the target table from another data source (a reference table).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
reference_table_connection_name |
The name of the connection in DQOps where the reference table (the source of truth) is configured. When the connection name is not provided, DQOps will find the reference table on the connection of the parent table. | string | |||
reference_table_schema_name |
The name of the schema where the reference table is imported into DQOps. The reference table's metadata must be imported into DQOps. | string | |||
reference_table_name |
The name of the reference table that is imported into DQOps. The reference table's metadata must be imported into DQOps. | string | |||
compared_table_filter |
Optional custom SQL filter expression that is added to the SQL query that retrieves the data from the compared table. This expression must be a SQL expression that will be added to the WHERE clause when querying the compared table. | string | |||
reference_table_filter |
Optional custom SQL filter expression that is added to the SQL query that retrieves the data from the reference table (the source of truth). This expression must be a SQL expression that will be added to the WHERE clause when querying the reference table. | string | |||
check_type |
The type of checks (profiling, monitoring, partitioned) to which this check comparison configuration applies. The default value is 'profiling'. | enum | profiling monitoring partitioned | ||
time_scale |
The time scale to which this check comparison configuration applies. Supported values are 'daily' and 'monthly' for monitoring and partitioned checks or an empty value for profiling checks. | enum | daily monthly | ||
grouping_columns |
List of column pairs from both the compared table and the reference table that are used in a GROUP BY clause for grouping both the compared table and the reference table (the source of truth). The columns are used in the next step of the table comparison to join the results of data groups (row counts, sums of columns) between the compared table and the reference table to compare the differences. | TableComparisonGroupingColumnsPairsListSpec |
TableComparisonGroupingColumnsPairsListSpec
List of column pairs used for grouping and joining in the table comparison checks.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
self |
List[TableComparisonGroupingColumnsPairSpec] |
TableComparisonGroupingColumnsPairSpec
Configuration of a pair of columns on the compared table and the reference table (the source of truth) that are joined and used for grouping to perform data comparison of aggregated results (sums of columns, row counts, etc.).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
compared_table_column_name |
The name of the column on the compared table (the parent table) that is used in the GROUP BY clause to group rows before compared aggregates (row counts, sums, etc.) are calculated. This column is also used to join (match) results to the reference table. | string | |||
reference_table_column_name |
The name of the column on the reference table (the source of truth) that is used in the GROUP BY clause to group rows before compared aggregates (row counts, sums, etc.) are calculated. This column is also used to join (match) results to the compared table. | string |
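Putting the comparison objects together, a table comparison might be declared as in the following sketch. The dictionary key `compare_to_reference`, the connection, schema, table and column names, and the filter expressions are all hypothetical; only the property names and enum values are taken from the tables above.

```yaml
# Fragment of a table YAML file; only the "spec" node is shown.
spec:
  table_comparisons:
    compare_to_reference:                        # hypothetical comparison name (dictionary key)
      reference_table_connection_name: erp_db    # omit to use the compared table's connection
      reference_table_schema_name: public
      reference_table_name: orders_reference
      compared_table_filter: "status = 'completed'"   # added to the WHERE clause (compared table)
      reference_table_filter: "status = 'completed'"  # added to the WHERE clause (reference table)
      check_type: monitoring
      time_scale: daily                          # left empty for profiling checks
      grouping_columns:                          # pairs used for GROUP BY and for joining results
        - compared_table_column_name: country
          reference_table_column_name: country_code
        - compared_table_column_name: order_date
          reference_table_column_name: order_day
```

Each pair maps one grouping column of the compared table to its counterpart on the reference table, so both tables are grouped by the same number of columns.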
TableIncidentGroupingSpec
Configuration of data quality incident grouping on a table level. Defines how similar data quality issues are grouped into incidents.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
grouping_level |
Grouping level of failed data quality checks for creating higher level data quality incidents. The default grouping level is by a table, a data quality dimension and a check category (e.g., a datatype data quality incident detected on a table X in the numeric checks category). | enum | table table_dimension table_dimension_category table_dimension_category_type table_dimension_category_name | ||
minimum_severity |
Minimum severity level of data quality issues that are grouped into incidents. The default minimum severity level is 'warning'. Other supported severity levels are 'error' and 'fatal'. | enum | warning error fatal | ||
divide_by_data_group |
Create separate data quality incidents for each data group, creating different incidents for different groups of rows. By default, data groups are ignored for grouping data quality issues into data quality incidents. | boolean | |||
disabled |
Disables data quality incident creation for failed data quality checks on the table. | boolean |
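A table-level override of the incident grouping could look like the sketch below. The chosen values are illustrative; any field that is not set is expected to fall back to the connection-level configuration, as described for the `incident_grouping` property.

```yaml
# Fragment of a table YAML file; only the "spec" node is shown.
spec:
  incident_grouping:
    grouping_level: table_dimension   # one of the enum values listed above
    minimum_severity: error           # ignore warning-level issues when creating incidents
    divide_by_data_group: true        # separate incidents for each data group
    disabled: false
```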
TableOwnerSpec
Table owner information.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
data_steward |
Data steward name | string | |||
application |
Business application name | string |
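For completeness, a sketch of the owner node. Both values are made-up placeholders.

```yaml
# Fragment of a table YAML file; only the "spec" node is shown.
spec:
  owner:
    data_steward: Jane Doe          # hypothetical data steward name
    application: Order Management   # hypothetical business application name
```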
TableMonitoringCheckCategoriesSpec
Container of table level monitoring checks, divided by the time window (daily, monthly, etc.).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily |
Configuration of daily monitoring checks evaluated at a table level. | TableDailyMonitoringCheckCategoriesSpec | |||
monthly |
Configuration of monthly monitoring checks evaluated at a table level. | TableMonthlyMonitoringCheckCategoriesSpec |
TablePartitionedCheckCategoriesSpec
Container of table level partitioned checks, divided by the time window (daily, monthly, etc.)
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily |
Configuration of day partitioned data quality checks evaluated at a table level. | TableDailyPartitionedCheckCategoriesSpec | |||
monthly |
Configuration of monthly partitioned data quality checks evaluated at a table level. | TableMonthlyPartitionedCheckCategoriesSpec |
TableStatisticsCollectorsRootCategoriesSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
volume |
Configuration of volume statistics collectors on a table level. | TableVolumeStatisticsCollectorsSpec | |||
schema |
Configuration of schema statistics collectors on a table level. | TableSchemaStatisticsCollectorsSpec |
TableVolumeStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
row_count |
Configuration of the row count profiler. | TableVolumeRowCountStatisticsCollectorSpec |
TableVolumeRowCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | TableVolumeRowCountSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
TableSchemaStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
column_count |
Configuration of the column count profiler. | TableSchemaColumnCountStatisticsCollectorSpec |
TableSchemaColumnCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | TableColumnCountSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
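The nesting of the table-level statistics collectors described above can be sketched as follows. The example only toggles the `disabled` flag; the `parameters` nodes are left out because their contents are not described in this section.

```yaml
# Fragment of a table YAML file; only the "spec" node is shown.
spec:
  statistics:
    volume:
      row_count:
        disabled: true      # skip the row count profiler for this table
    schema:
      column_count:
        disabled: false     # keep the column count profiler enabled
```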
ColumnSpecMap
Dictionary of columns indexed by a physical column name.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
self |
Dict[string, ColumnSpec] |
ColumnSpec
Column specification that identifies a single column.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
disabled |
Disables all data quality checks on the column. Data quality checks will not be executed. | boolean | |||
sql_expression |
SQL expression used for calculated fields or when an additional column value transformation is required before the column can be used for analysis with data quality checks (data type conversion, transformation). It should be an SQL expression written in the SQL dialect of the analyzed database type. The following replacement tokens are supported: {table} is replaced with the full table name, {alias} with the table alias of the table under analysis, and {column} with the analyzed column name. An example of extracting a value from a string column storing JSON in PostgreSQL: "{column}::json->'address'->'zip'". | string | |||
type_snapshot |
Column data type that was retrieved when the table metadata was imported. | ColumnTypeSnapshotSpec | |||
id |
True when this column is a part of the primary key or a business key that identifies a row. Error sampling captures values of id columns to identify the row where the error sample was found. | boolean | |||
profiling_checks |
Configuration of data quality profiling checks that are enabled. Pick a check from a category, apply the parameters and rules to enable it. | ColumnProfilingCheckCategoriesSpec | |||
monitoring_checks |
Configuration of column level monitoring checks. Monitoring checks are data quality checks that are evaluated for each period of time (daily, weekly, monthly, etc.). A monitoring check stores only the most recent data quality check result for each period of time. | ColumnMonitoringCheckCategoriesSpec | |||
partitioned_checks |
Configuration of column level date/time partitioned checks. Partitioned data quality checks are evaluated for each partition separately, raising separate alerts at a partition level. The table does not need to be physically partitioned by date, it is possible to run data quality checks for each day or month of data separately. | ColumnPartitionedCheckCategoriesSpec | |||
statistics |
Custom configuration of a column level statistics collector (a basic profiler). Enables customization of the statistics collector settings when the collector is analysing this column. | ColumnStatisticsCollectorsRootCategoriesSpec | |||
labels |
Custom labels that were assigned to the column. Labels are used for searching for columns when filtered data quality checks are executed. | LabelSetSpec | |||
comments |
Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments). | CommentsListSpec | |||
advanced_properties |
A key/value dictionary of advanced properties that can be used, for example, to support mapping data to data catalogs. | Dict[string, string] |
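The `columns` dictionary is keyed by the physical column name, and each entry is a column specification as described above. The sketch below uses hypothetical column names; the `sql_expression` value is the PostgreSQL example quoted in the property description.

```yaml
# Fragment of a table YAML file; only the "spec" node is shown.
spec:
  columns:
    order_id:                 # hypothetical column name
      id: true                # captured in error samples to identify the row
    address_json:             # hypothetical text column that stores JSON
      # Quoted because the expression starts with "{".
      sql_expression: "{column}::json->'address'->'zip'"
    legacy_notes:             # hypothetical column excluded from analysis
      disabled: true
```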
ColumnTypeSnapshotSpec
Stores the column data type captured at the time of the table metadata import.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
column_type |
Column data type using the monitored database type names. | string | |||
nullable |
Column is nullable. | boolean | |||
length |
Maximum length of text and binary columns. | integer | |||
precision |
Precision of a numeric (decimal) data type. | integer | |||
scale |
Scale of a numeric (decimal) data type. | integer | |||
nested |
This field is a nested field inside another STRUCT. It is used to identify nested fields in JSON files. | boolean |
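A type snapshot is normally written when the table metadata is imported. The sketch below shows what such entries might look like for a text column and a decimal column; the column names and the database type names `VARCHAR` and `DECIMAL` are assumptions, since the actual type names depend on the monitored database.

```yaml
# Fragment of a table YAML file; only the "spec" node is shown.
spec:
  columns:
    customer_name:            # hypothetical column name
      type_snapshot:
        column_type: VARCHAR  # assumed type name of the monitored database
        nullable: true
        length: 100
    order_total:              # hypothetical column name
      type_snapshot:
        column_type: DECIMAL  # assumed type name of the monitored database
        nullable: false
        precision: 12
        scale: 2
```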
ColumnMonitoringCheckCategoriesSpec
Container of column level monitoring checks, divided by the time window (daily, monthly, etc.).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily |
Configuration of daily monitoring checks evaluated at a column level. | ColumnDailyMonitoringCheckCategoriesSpec | |||
monthly |
Configuration of monthly monitoring checks evaluated at a column level. | ColumnMonthlyMonitoringCheckCategoriesSpec |
ColumnPartitionedCheckCategoriesSpec
Container of column level partitioned checks, divided by the time window (daily, monthly, etc.)
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
daily |
Configuration of day partitioned data quality checks evaluated at a column level. | ColumnDailyPartitionedCheckCategoriesSpec | |||
monthly |
Configuration of monthly partitioned data quality checks evaluated at a column level. | ColumnMonthlyPartitionedCheckCategoriesSpec |
ColumnStatisticsCollectorsRootCategoriesSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
nulls |
Configuration of null values profilers on a column level. | ColumnNullsStatisticsCollectorsSpec | |||
text |
Configuration of text column profilers on a column level. | ColumnTextStatisticsCollectorsSpec | |||
uniqueness |
Configuration of profilers that analyse uniqueness of values (distinct count). | ColumnUniquenessStatisticsCollectorsSpec | |||
range |
Configuration of profilers that analyse the range of values (min, max). | ColumnRangeStatisticsCollectorsSpec | |||
sampling |
Configuration of profilers that collect the column samples. | ColumnSamplingStatisticsCollectorsSpec |
ColumnNullsStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
nulls_count |
Configuration of the profiler that counts null column values. | ColumnNullsNullsCountStatisticsCollectorSpec | |||
nulls_percent |
Configuration of the profiler that measures the percentage of null values. | ColumnNullsNullsPercentStatisticsCollectorSpec | |||
not_nulls_count |
Configuration of the profiler that counts not null column values. | ColumnNullsNotNullsCountStatisticsCollectorSpec | |||
not_nulls_percent |
Configuration of the profiler that measures the percentage of not null values. | ColumnNullsNotNullsPercentStatisticsCollectorSpec |
ColumnNullsNullsCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnNullsNullsCountSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnNullsNullsPercentStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnNullsNullsPercentSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnNullsNotNullsCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnNullsNotNullsCountSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnNullsNotNullsPercentStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnNullsNotNullsPercentSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
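Column-level statistics collectors are nested by category and then by collector name. The following sketch disables the two percentage-based null profilers on a hypothetical `customer_email` column while leaving the count-based profilers untouched; the `parameters` nodes are omitted because their contents are not described in this section.

```yaml
# Fragment of a table YAML file; only the "spec" node is shown.
spec:
  columns:
    customer_email:               # hypothetical column name
      statistics:
        nulls:
          nulls_percent:
            disabled: true
          not_nulls_percent:
            disabled: true
```

The text, uniqueness, range and sampling categories listed for the column statistics root object follow the same category/collector nesting.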
ColumnTextStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
text_max_length |
Configuration of the profiler that finds the maximum text length. | ColumnTextTextMaxLengthStatisticsCollectorSpec | |||
text_mean_length |
Configuration of the profiler that finds the mean text length. | ColumnTextTextMeanLengthStatisticsCollectorSpec | |||
text_min_length |
Configuration of the profiler that finds the minimum text length. | ColumnTextTextMinLengthStatisticsCollectorSpec | |||
text_datatype_detect |
Configuration of the profiler that detects the data type. | ColumnTextTextDatatypeDetectStatisticsCollectorSpec | |||
text_min_word_count |
Configuration of the profiler that finds the estimated minimum word count. | ColumnTextMinWordCountStatisticsCollectorSpec | |||
text_max_word_count |
Configuration of the profiler that finds the estimated maximum word count. | ColumnTextMaxWordCountStatisticsCollectorSpec |
ColumnTextTextMaxLengthStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnTextTextMaxLengthSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnTextTextMeanLengthStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnTextTextMeanLengthSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnTextTextMinLengthStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnTextTextMinLengthSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnTextTextDatatypeDetectStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnDatatypeStringDatatypeDetectSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnTextMinWordCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnTextMinWordCountSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnTextMaxWordCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnTextMaxWordCountSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnUniquenessStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
distinct_count |
Configuration of the profiler that counts distinct column values. | ColumnUniquenessDistinctCountStatisticsCollectorSpec | |||
distinct_percent |
Configuration of the profiler that measures the percentage of distinct column values. | ColumnUniquenessDistinctPercentStatisticsCollectorSpec | |||
duplicate_count |
Configuration of the profiler that counts duplicate column values. | ColumnUniquenessDuplicateCountStatisticsCollectorSpec | |||
duplicate_percent |
Configuration of the profiler that measures the percentage of duplicate column values. | ColumnUniquenessDuplicatePercentStatisticsCollectorSpec |
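As an illustration, a ColumnUniquenessStatisticsCollectorsSpec object could be written as the YAML fragment below. This is a minimal sketch that uses only the property names listed in the table above. The parent key under which the object is stored is not shown, and the choice of which profilers to disable is arbitrary.

```yaml
# Fragment of a ColumnUniquenessStatisticsCollectorsSpec object (parent key omitted).
distinct_count:
  disabled: false      # profiler stays enabled
distinct_percent:
  disabled: false
duplicate_count:
  disabled: true       # illustrative choice: skip this profiler during profiling
duplicate_percent:
  disabled: true
```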
ColumnUniquenessDistinctCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnUniquenessDistinctCountSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnUniquenessDistinctPercentStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnUniquenessDistinctPercentSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnUniquenessDuplicateCountStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnUniquenessDuplicateCountSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnUniquenessDuplicatePercentStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnUniquenessDuplicatePercentSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnRangeStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
min_value |
Configuration of the profiler that finds the minimum value in the column. | ColumnRangeMinValueStatisticsCollectorSpec | |||
median_value |
Configuration of the profiler that finds the median value in the column. | ColumnRangeMedianValueStatisticsCollectorSpec | |||
max_value |
Configuration of the profiler that finds the maximum value in the column. | ColumnRangeMaxValueStatisticsCollectorSpec | |||
mean_value |
Configuration of the profiler that finds the mean value in the column. | ColumnRangeMeanValueStatisticsCollectorSpec | |||
sum_value |
Configuration of the profiler that finds the sum value in the column. | ColumnRangeSumValueStatisticsCollectorSpec |
ColumnRangeMinValueStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnRangeMinValueSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnRangeMinValueSensorParametersSpec
Column-level sensor that finds the minimum value. It works on any data type that supports the MIN function. The returned data type matches the data type of the column (it can return a date, integer, string, datetime, etc.).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
filter |
SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. | string |
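The filter parameter can be illustrated with a short fragment of a ColumnRangeMinValueStatisticsCollectorSpec object. This is a sketch only: the column name `status` and the condition are hypothetical, and the parent key is omitted.

```yaml
# Fragment of a ColumnRangeMinValueStatisticsCollectorSpec object (parent key omitted).
parameters:
  # Hypothetical condition; it is combined with the table-level filter using AND.
  filter: "status = 'active'"
disabled: false
```

If the table-level filter were, for example, `country = 'US'`, the sensor query would only consider rows that satisfy both conditions.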
ColumnRangeMedianValueStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnNumericMedianSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnRangeMaxValueStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnRangeMaxValueSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnRangeMaxValueSensorParametersSpec
Column-level sensor that finds the maximum value. It works on any data type that supports the MAX function. The returned data type matches the data type of the column (it can return a date, integer, string, datetime, etc.).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
filter |
SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. | string |
ColumnRangeMeanValueStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnNumericMeanSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnRangeSumValueStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnNumericSumSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
ColumnSamplingStatisticsCollectorsSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
column_samples |
Configuration of the profiler that collects samples of column values. | ColumnSamplingColumnSamplesStatisticsCollectorSpec |
ColumnSamplingColumnSamplesStatisticsCollectorSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
parameters |
Profiler parameters | ColumnSamplingColumnSamplesSensorParametersSpec | |||
disabled |
Disables this profiler. Only enabled profilers are executed during a profiling process. | boolean |
FileFormatSpec
File format specification for data loaded from physical files in one of the supported formats.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
csv |
CSV file format specification. | CsvFileFormatSpec | |||
json |
JSON file format specification. | JsonFileFormatSpec | |||
parquet |
Parquet file format specification. | ParquetFileFormatSpec | |||
avro |
Avro file format specification. | AvroFileFormatSpec | |||
iceberg |
Iceberg file format specification. | IcebergFileFormatSpec | |||
delta_lake |
Delta Lake file format specification. | DeltaLakeFileFormatSpec | |||
file_paths |
The list of paths to files with data that are used as a source. | FilePathListSpec |
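A FileFormatSpec object might look like the fragment below. This is a hedged sketch: the file paths are hypothetical, the parent key is omitted, and the `csv` entry is left as an empty mapping on the assumption that CsvFileFormatSpec accepts one.

```yaml
# Fragment of a FileFormatSpec object (parent key omitted).
csv: {}                # CsvFileFormatSpec; assumed to accept an empty mapping
file_paths:            # FilePathListSpec, a list of strings
  - /data/landing/orders_2024.csv     # hypothetical path
  - /data/landing/orders_2025.csv     # hypothetical path
```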
FilePathListSpec
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
self |
List[string] |
TableLineageSourceSpecList
List of source tables of the current table, used to build the data lineage report.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
self |
List[TableLineageSource] |
TableLineageSource
Key object that identifies a source table by its connection name, schema name, and table name.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
connection |
Connection name | string | |||
schema |
Schema name | string | |||
table |
Table name | string |
TableLineageSourceSpec
Data lineage specification that identifies one source table of the current table (the table in which this object is stored).
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
source_connection |
The name of a source connection that is defined in DQOps and contains a source table from which the current table receives data. | string | |||
source_schema |
The name of a source schema within the source connection that contains a source table from which the current table receives data. | string | |||
source_table |
The name of a source table in the source schema from which the current table receives data. | string | |||
data_lineage_source_tool |
The name of a source tool from which this data lineage information was copied. This field should be filled when the data lineage was imported from another data catalog or a data lineage tracking platform. | string | |||
properties |
Mapping properties stored as a key/value dictionary. Data lineage synchronization tools that import data lineage mappings from external data lineage sources can use it to store mapping information. | Dict[string, string] | |||
columns |
Configuration of source columns for each column in the current table. The keys in this dictionary are column names in the current table. Each object stored in the dictionary contains a list of source columns. | ColumnLineageSourceSpecMap |
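The table-level properties of a TableLineageSourceSpec could be filled in as shown below. This is a minimal sketch: every value is hypothetical, the `columns` property is left out, and the parent key that holds the object is omitted.

```yaml
# Fragment: one TableLineageSourceSpec object (all values are hypothetical).
source_connection: landing_zone_db
source_schema: staging
source_table: raw_orders
data_lineage_source_tool: example_catalog
properties:                       # Dict[string, string]
  imported_by: nightly_sync
```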
ColumnLineageSourceSpecMap
Dictionary that maps the columns in the current table to their source columns. The keys in this dictionary are the column names in the current table.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
self |
Dict[string, ColumnLineageSourceSpec] |
ColumnLineageSourceSpec
Describes the list of source columns for a column in the current table.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
source_columns |
A list of columns in the source table from which this column receives data. | SourceColumnsSetSpec | |||
properties |
Mapping properties stored as a key/value dictionary. Data lineage synchronization tools that import data lineage mappings from external data lineage sources can use it to store mapping information. | Dict[string, string] |
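A column-level lineage mapping could be sketched as follows. The fragment shows the `columns` property of a TableLineageSourceSpec, keyed by a column of the current table. All column names and property values are hypothetical, and writing `source_columns` as a plain YAML list of names is an assumption about how SourceColumnsSetSpec is serialized.

```yaml
# Fragment: the columns property of a TableLineageSourceSpec (a ColumnLineageSourceSpecMap).
columns:
  customer_name:                 # column in the current table (hypothetical)
    source_columns:              # assumed to serialize as a list of unique names
      - first_name
      - last_name
    properties:                  # Dict[string, string]
      transformation: concat
```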
SourceColumnsSetSpec
A collection of unique names of source columns from which the current column receives data. This information is used to track column-level data lineage.