Last updated: July 22, 2025

DQOps YAML file definitions

Definitions of the YAML files that DQOps uses to configure data sources, monitored tables, and activated data quality checks.

TableYaml

Table and column definition file that defines a list of tables and columns that are covered by data quality checks.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| api_version | DQOps YAML schema version | string | | dqo/v1 | |
| kind | File type | enum | source, table, sensor, provider_sensor, rule, check, settings, file_index, connection_similarity_index, dashboards, default_schedules, default_checks, default_table_checks, default_column_checks, default_notifications | table | |
| spec | Table specification object with the table metadata and the configuration of data quality checks | TableSpec | | | |
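
As a rough illustration of how the three top-level properties fit together, a table file might start like the sketch below. The key names are taken from the table above; the exact spelling should be verified against a file generated by DQOps, and the `disabled` value is only a placeholder for real TableSpec content.

```yaml
# Minimal TableYaml skeleton (a sketch, not a complete file).
# Key names follow the property table above; verify the exact
# spelling against a file generated by DQOps.
api_version: dqo/v1
kind: table
spec:
  # TableSpec properties go here; 'disabled' is shown as a placeholder.
  disabled: false
```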

TableSpec

Table specification that defines data quality tests that are enabled on a table and its columns.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
disabled Disables all data quality checks on the table. Data quality checks will not be executed. boolean
stage Stage name. string
priority Table priority (1, 2, 3, 4, ...). Tables can be assigned a priority level. The table priority is copied into each data quality check result and sensor result, which makes it easy to group tables by importance during a data quality improvement project, so that data quality issues on higher priority tables are fixed before issues on less important tables. integer
filter SQL WHERE clause added to the sensor queries. Use replacement tokens {table} to replace the content with the full table name, {alias} to replace the content with the table alias of an analyzed table or {column} to replace the content with the analyzed column name. string
do_not_collect_error_samples_in_profiling Disable automatic collection of error samples in the profiling section. The profiling checks by default always collect error samples for failed data quality checks. boolean
always_collect_error_samples_in_monitoring Always collect error samples for failed monitoring checks. By default, DQOps does not collect error samples automatically when the checks are executed by a scheduler or by running checks from the metadata tree; error samples are collected only when the checks are run from the check editor. boolean
timestamp_columns Column names that store the timestamps that identify the event (transaction) timestamp and the ingestion (inserted / loaded at) timestamps. Also configures the timestamp source for the date/time partitioned data quality checks (event timestamp or ingestion timestamp). TimestampColumnsSpec
incremental_time_window Configuration of the time window for analyzing daily or monthly partitions. Specifies the number of recent days and recent months that are analyzed when the partitioned data quality checks are run in an incremental mode (the default mode). PartitionIncrementalTimeWindowSpec
default_grouping_name The name of the default data grouping configuration that is applied on data quality checks. When a default data grouping is selected, all data quality checks run SQL queries with a GROUP BY clause, calculating separate data quality checks for each group of data. The data groupings are defined in the 'groupings' dictionary (indexed by the data grouping name). string
groupings Data grouping configurations list. Data grouping configurations are configured in two cases: (1) the data in the table should be analyzed with a GROUP BY condition, to analyze different datasets using separate time series, for example a table contains data from multiple countries and there is a 'country' column used for partitioning. (2) a tag is assigned to a table (within a data grouping level hierarchy), when the data is segmented at a table level (similar tables store the same information, but for different countries, etc.). DataGroupingConfigurationSpecMap
table_comparisons Dictionary of data comparison configurations. Data comparison configurations are used for comparisons between data sources to compare this table (called the compared table) with other reference tables (the source of truth). The reference table's metadata must be imported into DQOps, but the reference table may be located in another data source. DQOps will compare metrics calculated for groups of rows (using the GROUP BY clause). For each comparison, the user must specify a name of a data grouping. The number of data grouping dimensions in the parent table and the reference table defined in the selected data grouping configurations must match. DQOps will run the same data quality sensors on both the parent table (table under test) and the reference table (the source of truth), comparing the measures (sensor readouts) captured from both tables. TableComparisonConfigurationSpecMap
incident_grouping Incident grouping configuration with the overridden configuration at a table level. The configured field value in this object will override the default configuration from the connection level. Incident grouping level can be changed or incident creation can be disabled. TableIncidentGroupingSpec
owner Table owner information like the data steward name or the business application name. TableOwnerSpec
profiling_checks Configuration of data quality profiling checks that are enabled. Pick a check from a category, apply the parameters and rules to enable it. TableProfilingCheckCategoriesSpec
monitoring_checks Configuration of table level monitoring checks. Monitoring checks are data quality checks that are evaluated for each period of time (daily, weekly, monthly, etc.). A monitoring check stores only the most recent data quality check result for each period of time. TableMonitoringCheckCategoriesSpec
partitioned_checks Configuration of table level date/time partitioned checks. Partitioned data quality checks are evaluated for each partition separately, raising separate alerts at a partition level. The table does not need to be physically partitioned by date, it is possible to run data quality checks for each day or month of data separately. TablePartitionedCheckCategoriesSpec
statistics Configuration of table level data statistics collector (a basic profiler). Configures which statistics collectors are enabled and how they are configured. TableStatisticsCollectorsRootCategoriesSpec
schedules_override Configuration of the job scheduler that runs data quality checks. The scheduler configuration is divided into types of checks that have different schedules. CronSchedulesSpec
columns Dictionary of columns, indexed by a physical column name. Column specification contains the expected column data type and a list of column level data quality checks that are enabled for a column. ColumnSpecMap
labels Custom labels that were assigned to the table. Labels are used for searching for tables when filtered data quality checks are executed. LabelSetSpec
comments Comments used for change tracking and documenting changes directly in the table data quality specification file. CommentsListSpec
file_format File format specification of the files used as source data. When set, it overrides the file format from the connection specification. FileFormatSpec
advanced_properties A key/value dictionary of advanced properties that can be used, for example, to support mapping data to data catalogs. Dict[string, string]
source_tables A list of source tables. This information is used to define the data lineage report for the table. TableLineageSourceSpecList
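
The scalar TableSpec properties and the owner object can be combined as in the following sketch. All values (stage name, filter condition, owner names, and the `catalog_asset_id` key) are hypothetical and only illustrate the shape of the configuration; the `owner` fields come from the TableOwnerSpec object described later on this page.

```yaml
# Sketch of a TableSpec fragment; every value here is an assumption.
spec:
  stage: landing                        # hypothetical stage name
  priority: 1                           # 1 = most important table
  # {alias} is replaced with the alias of the analyzed table
  filter: "{alias}.is_deleted = 0"      # hypothetical column
  do_not_collect_error_samples_in_profiling: true
  owner:
    data_steward: Jane Doe              # hypothetical person
    application: Order management       # hypothetical application
  advanced_properties:
    catalog_asset_id: "12345"           # hypothetical key/value pair
```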

TimestampColumnsSpec

Configuration of timestamp related columns on a table level. Timestamp columns are used for timeliness data quality checks and for date/time partitioned checks.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
event_timestamp_column Column name that identifies an event timestamp (date/time), such as a transaction timestamp, impression timestamp, event timestamp. string
ingestion_timestamp_column Column name that contains the timestamp (or date/time) when the row was ingested (loaded, inserted) into the table. Use a column that is filled by the data pipeline or ETL process at the time of the data loading. string
partition_by_column Column name that contains the date, datetime or timestamp column for date/time partitioned data. Partition checks (daily partition checks and monthly partition checks) use this column in a GROUP BY clause in order to detect data quality issues in each partition separately. It should be a DATE type, DATETIME type (using a local server time zone) or a TIMESTAMP type (a UTC absolute time). string
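
A minimal sketch of the timestamp column configuration is shown below. The column names are hypothetical; in a real table they must match existing columns of the types described above.

```yaml
# Sketch: timestamp columns for timeliness and partitioned checks.
# All column names are assumptions.
spec:
  timestamp_columns:
    event_timestamp_column: order_placed_at   # when the event happened
    ingestion_timestamp_column: loaded_at     # when the row was loaded
    partition_by_column: order_date           # DATE column used in GROUP BY
```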

PartitionIncrementalTimeWindowSpec

Configuration of the time window for running incremental partition checks.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
daily_partitioning_recent_days The number of recent days analyzed by daily partition checks in incremental mode. The default value is 7 last days. integer
daily_partitioning_include_today Also analyze today's data in daily partition checks run in incremental mode. The default value is false, which means that today's and future partitions are not analyzed. Only yesterday's partition and previous daily partitions are analyzed because today's data may still be incomplete. Change the value to 'true' if the current day should also be analyzed. This change may require you to configure the schedule for daily checks correctly. The checks must run after the data load. boolean
monthly_partitioning_recent_months The number of recent months analyzed by monthly partition checks in incremental mode. The default value is the previous calendar month. integer
monthly_partitioning_include_current_month Analyze also this month's data by monthly partition checks in incremental mode. The default value is false, which means that the current month is not analyzed. Future data is also filtered out because the current month may be incomplete. Change the value to 'true' if the current month should also be analyzed before the end of the month. This change may require you to configure the schedule to run monthly checks more frequently (daily, hourly, etc.). boolean
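
The incremental time window could be configured as in this sketch. The numbers are illustrative only, chosen to mirror the defaults described above (7 recent days, the previous calendar month, and no analysis of the current day or month).

```yaml
# Sketch: incremental time window for partitioned checks.
# Values are illustrative and mirror the documented defaults.
spec:
  incremental_time_window:
    daily_partitioning_recent_days: 7
    daily_partitioning_include_today: false
    monthly_partitioning_recent_months: 1
    monthly_partitioning_include_current_month: false
```

Setting `daily_partitioning_include_today` to true only makes sense when the daily checks are scheduled after the data load has finished.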

DataGroupingConfigurationSpecMap

Dictionary of named data grouping configurations defined on a table level.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
self Dict[string, DataGroupingConfigurationSpec]

TableComparisonConfigurationSpecMap

Dictionary of data comparison configurations between the current table (the parent of this node) and a reference table (the source of truth). The current table is compared to the reference table to measure the accuracy of the data.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
self Dict[string, TableComparisonConfigurationSpec]

TableComparisonConfigurationSpec

Identifies a data comparison configuration between a parent table (the compared table) and the target table from another data source (a reference table).

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| reference_table_connection_name | The name of the connection in DQOps where the reference table (the source of truth) is configured. When the connection name is not provided, DQOps will find the reference table on the connection of the parent table. | string | | | |
| reference_table_schema_name | The name of the schema where the reference table is imported into DQOps. The reference table's metadata must be imported into DQOps. | string | | | |
| reference_table_name | The name of the reference table that is imported into DQOps. The reference table's metadata must be imported into DQOps. | string | | | |
| compared_table_filter | Optional custom SQL filter expression that is added to the SQL query that retrieves the data from the compared table. This expression must be a SQL expression that will be added to the WHERE clause when querying the compared table. | string | | | |
| reference_table_filter | Optional custom SQL filter expression that is added to the SQL query that retrieves the data from the reference table (the source of truth). This expression must be a SQL expression that will be added to the WHERE clause when querying the reference table. | string | | | |
| check_type | The type of checks (profiling, monitoring, partitioned) that this table comparison configuration applies to. The default value is 'profiling'. | enum | profiling, monitoring, partitioned | | |
| time_scale | The time scale that this table comparison configuration applies to. Supported values are 'daily' and 'monthly' for monitoring and partitioned checks or an empty value for profiling checks. | enum | daily, monthly | | |
| grouping_columns | List of column pairs from both the compared table and the reference table that are used in a GROUP BY clause for grouping both the compared table and the reference table (the source of truth). The columns are used in the next step of the table comparison to join the results of data groups (row counts, sums of columns) between the compared table and the reference table to compare the differences. | TableComparisonGroupingColumnsPairsListSpec | | | |

TableComparisonGroupingColumnsPairsListSpec

List of column pairs used for grouping and joining in the table comparison checks.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
self List[TableComparisonGroupingColumnsPairSpec]

TableComparisonGroupingColumnsPairSpec

Configuration of a pair of columns on the compared table and the reference table (the source of truth) that are joined and used for grouping to perform data comparison of aggregated results (sums of columns, row counts, etc.).

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
compared_table_column_name The name of the column on the compared table (the parent table) that is used in the GROUP BY clause to group rows before compared aggregates (row counts, sums, etc.) are calculated. This column is also used to join (match) results to the reference table. string
reference_table_column_name The name of the column on the reference table (the source of truth) that is used in the GROUP BY clause to group rows before compared aggregates (row counts, sums, etc.) are calculated. This column is also used to join (match) results to the compared table. string
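
Putting the comparison objects together, a table comparison might look like the sketch below. The comparison name (the dictionary key), the connection, schema, table, and column names are all hypothetical; only the property names come from the tables above.

```yaml
# Sketch: one named table comparison with a single grouping column pair.
# All names and values are assumptions.
spec:
  table_comparisons:
    compare_to_reference:                         # hypothetical comparison name
      reference_table_connection_name: reference_dwh
      reference_table_schema_name: public
      reference_table_name: orders_reference
      compared_table_filter: "{alias}.status = 'completed'"
      check_type: monitoring
      time_scale: daily
      grouping_columns:
        # Rows are grouped by these columns on each side, then the
        # groups are matched to compare row counts, sums, and so on.
        - compared_table_column_name: country
          reference_table_column_name: country_code
```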

TableIncidentGroupingSpec

Configuration of data quality incident grouping on a table level. Defines how similar data quality issues are grouped into incidents.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| grouping_level | Grouping level of failed data quality checks for creating higher level data quality incidents. The default grouping level is by a table, a data quality dimension and a check category (e.g. a datatype data quality incident detected on a table X in the numeric checks category). | enum | table, table_dimension, table_dimension_category, table_dimension_category_type, table_dimension_category_name | | |
| minimum_severity | Minimum severity level of data quality issues that are grouped into incidents. The default minimum severity level is 'warning'. Other supported severity levels are 'error' and 'fatal'. | enum | warning, error, fatal | | |
| divide_by_data_group | Create separate data quality incidents for each data group, creating different incidents for different groups of rows. By default, data groups are ignored for grouping data quality issues into data quality incidents. | boolean | | | |
| disabled | Disables data quality incident creation for failed data quality checks on the table. | boolean | | | |
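
A table-level override of the incident grouping could look like this sketch. The chosen values are illustrative: they raise the minimum severity to 'error' and split incidents by data group.

```yaml
# Sketch: overriding the connection-level incident grouping on one table.
# The values are illustrative.
spec:
  incident_grouping:
    grouping_level: table_dimension_category
    minimum_severity: error        # ignore 'warning' level issues
    divide_by_data_group: true     # one incident per data group
    disabled: false
```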

TableOwnerSpec

Table owner information.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
data_steward Data steward name string
application Business application name string

TableMonitoringCheckCategoriesSpec

Container of table level monitoring checks, divided by the time window (daily, monthly, etc.)

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
daily Configuration of daily monitoring checks evaluated at a table level. TableDailyMonitoringCheckCategoriesSpec
monthly Configuration of monthly monitoring checks evaluated at a table level. TableMonthlyMonitoringCheckCategoriesSpec

TablePartitionedCheckCategoriesSpec

Container of table level partitioned checks, divided by the time window (daily, monthly, etc.)

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
daily Configuration of day partitioned data quality checks evaluated at a table level. TableDailyPartitionedCheckCategoriesSpec
monthly Configuration of monthly partitioned data quality checks evaluated at a table level. TableMonthlyPartitionedCheckCategoriesSpec

TableStatisticsCollectorsRootCategoriesSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
volume Configuration of volume statistics collectors on a table level. TableVolumeStatisticsCollectorsSpec
schema TableSchemaStatisticsCollectorsSpec

TableVolumeStatisticsCollectorsSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
row_count Configuration of the row count profiler. TableVolumeRowCountStatisticsCollectorSpec

TableVolumeRowCountStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters TableVolumeRowCountSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

TableSchemaStatisticsCollectorsSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
column_count Configuration of the column count profiler. TableSchemaColumnCountStatisticsCollectorSpec

TableSchemaColumnCountStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters TableColumnCountSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean
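
The nesting of the table level statistics collectors described above can be sketched as follows. Disabling the row count collector here is only an example of where the `disabled` flag sits in the hierarchy.

```yaml
# Sketch: table level statistics collectors.
# The flag values are illustrative.
spec:
  statistics:
    volume:
      row_count:
        disabled: true      # skip the row count profiler
    schema:
      column_count:
        disabled: false     # keep the column count profiler
```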

ColumnSpecMap

Dictionary of columns indexed by a physical column name.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
self Dict[string, ColumnSpec]

ColumnSpec

Column specification that identifies a single column.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
disabled Disables all data quality checks on the column. Data quality checks will not be executed. boolean
sql_expression SQL expression used for calculated fields or when additional column value transformation is required before the column can be used for analysis with data quality checks (data type conversion, transformation). It should be an SQL expression that uses the SQL language of the analyzed database type. Use the replacement tokens {table} to replace the content with the full table name, {alias} to replace the content with the table alias of the table under analysis, or {column} to replace the content with the analyzed column name. An example of extracting a value from a string column storing JSON in PostgreSQL: "{column}::json->'address'->'zip'". string
type_snapshot Column data type that was retrieved when the table metadata was imported. ColumnTypeSnapshotSpec
id True when this column is a part of the primary key or a business key that identifies a row. Error sampling captures values of id columns to identify the row where the error sample was found. boolean
profiling_checks Configuration of data quality profiling checks that are enabled. Pick a check from a category, apply the parameters and rules to enable it. ColumnProfilingCheckCategoriesSpec
monitoring_checks Configuration of column level monitoring checks. Monitoring checks are data quality checks that are evaluated for each period of time (daily, weekly, monthly, etc.). A monitoring check stores only the most recent data quality check result for each period of time. ColumnMonitoringCheckCategoriesSpec
partitioned_checks Configuration of column level date/time partitioned checks. Partitioned data quality checks are evaluated for each partition separately, raising separate alerts at a partition level. The table does not need to be physically partitioned by date, it is possible to run data quality checks for each day or month of data separately. ColumnPartitionedCheckCategoriesSpec
statistics Custom configuration of a column level statistics collector (a basic profiler). Enables customization of the statistics collector settings when the collector is analysing this column. ColumnStatisticsCollectorsRootCategoriesSpec
labels Custom labels that were assigned to the column. Labels are used for searching for columns when filtered data quality checks are executed. LabelSetSpec
comments Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove untracked comments). CommentsListSpec
advanced_properties A key/value dictionary of advanced properties that can be used, for example, to support mapping data to data catalogs. Dict[string, string]

ColumnTypeSnapshotSpec

Stores the column data type captured at the time of the table metadata import.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
column_type Column data type using the monitored database type names. string
nullable Column is nullable. boolean
length Maximum length of text and binary columns. integer
precision Precision of a numeric (decimal) data type. integer
scale Scale of a numeric (decimal) data type. integer
nested This field is a nested field inside another STRUCT. It is used to identify nested fields in JSON files. boolean
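
A columns dictionary with type snapshots might look like the sketch below. The column names and data types are hypothetical; the `sql_expression` value reuses the PostgreSQL JSON example from the ColumnSpec description.

```yaml
# Sketch: two columns indexed by their physical column names.
# Column names and data types are assumptions.
spec:
  columns:
    customer_id:
      id: true                  # part of the business key, captured in error samples
      type_snapshot:
        column_type: INTEGER
        nullable: false
    address:
      # {column} is replaced with the analyzed column name
      sql_expression: "{column}::json->'address'->'zip'"
      type_snapshot:
        column_type: VARCHAR
        length: 4000
        nullable: true
```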

ColumnMonitoringCheckCategoriesSpec

Container of column level monitoring checks, divided by the time window (daily, monthly, etc.)

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
daily Configuration of daily monitoring checks evaluated at a column level. ColumnDailyMonitoringCheckCategoriesSpec
monthly Configuration of monthly monitoring checks evaluated at a column level. ColumnMonthlyMonitoringCheckCategoriesSpec

ColumnPartitionedCheckCategoriesSpec

Container of column level partitioned checks, divided by the time window (daily, monthly, etc.)

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
daily Configuration of day partitioned data quality checks evaluated at a column level. ColumnDailyPartitionedCheckCategoriesSpec
monthly Configuration of monthly partitioned data quality checks evaluated at a column level. ColumnMonthlyPartitionedCheckCategoriesSpec

ColumnStatisticsCollectorsRootCategoriesSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
nulls Configuration of null values profilers on a column level. ColumnNullsStatisticsCollectorsSpec
text Configuration of text column profilers on a column level. ColumnTextStatisticsCollectorsSpec
uniqueness Configuration of profilers that analyse uniqueness of values (distinct count). ColumnUniquenessStatisticsCollectorsSpec
range Configuration of profilers that analyse the range of values (min, max). ColumnRangeStatisticsCollectorsSpec
sampling Configuration of profilers that collect the column samples. ColumnSamplingStatisticsCollectorsSpec

ColumnNullsStatisticsCollectorsSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
nulls_count Configuration of the profiler that counts null column values. ColumnNullsNullsCountStatisticsCollectorSpec
nulls_percent Configuration of the profiler that measures the percentage of null values. ColumnNullsNullsPercentStatisticsCollectorSpec
not_nulls_count Configuration of the profiler that counts not null column values. ColumnNullsNotNullsCountStatisticsCollectorSpec
not_nulls_percent Configuration of the profiler that measures the percentage of not null values. ColumnNullsNotNullsPercentStatisticsCollectorSpec

ColumnNullsNullsCountStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnNullsNullsCountSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnNullsNullsPercentStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnNullsNullsPercentSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnNullsNotNullsCountStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnNullsNotNullsCountSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnNullsNotNullsPercentStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnNullsNotNullsPercentSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean
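
For column level collectors, the same `disabled` flag sits one level deeper, under the column and the collector category. The sketch below assumes a hypothetical column named `comment_text` and turns off two of the null value profilers.

```yaml
# Sketch: customizing null value statistics collectors for one column.
# The column name is an assumption.
spec:
  columns:
    comment_text:
      statistics:
        nulls:
          nulls_count:
            disabled: false
          nulls_percent:
            disabled: true
          not_nulls_percent:
            disabled: true
```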

ColumnTextStatisticsCollectorsSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
text_max_length Configuration of the profiler that finds the maximum text length. ColumnTextTextMaxLengthStatisticsCollectorSpec
text_mean_length Configuration of the profiler that finds the mean text length. ColumnTextTextMeanLengthStatisticsCollectorSpec
text_min_length Configuration of the profiler that finds the min text length. ColumnTextTextMinLengthStatisticsCollectorSpec
text_datatype_detect Configuration of the profiler that detects datatype. ColumnTextTextDatatypeDetectStatisticsCollectorSpec
text_min_word_count Configuration of the profiler that finds the estimated minimum word count. ColumnTextMinWordCountStatisticsCollectorSpec
text_max_word_count Configuration of the profiler that finds the estimated maximum word count. ColumnTextMaxWordCountStatisticsCollectorSpec

ColumnTextTextMaxLengthStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnTextTextMaxLengthSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnTextTextMeanLengthStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnTextTextMeanLengthSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnTextTextMinLengthStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnTextTextMinLengthSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnTextTextDatatypeDetectStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnDatatypeStringDatatypeDetectSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnTextMinWordCountStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnTextMinWordCountSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnTextMaxWordCountStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnTextMaxWordCountSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnUniquenessStatisticsCollectorsSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
distinct_count Configuration of the profiler that counts distinct column values. ColumnUniquenessDistinctCountStatisticsCollectorSpec
distinct_percent Configuration of the profiler that measures the percentage of distinct column values. ColumnUniquenessDistinctPercentStatisticsCollectorSpec
duplicate_count Configuration of the profiler that counts duplicate column values. ColumnUniquenessDuplicateCountStatisticsCollectorSpec
duplicate_percent Configuration of the profiler that measures the percentage of duplicate column values. ColumnUniquenessDuplicatePercentStatisticsCollectorSpec

ColumnUniquenessDistinctCountStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnUniquenessDistinctCountSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnUniquenessDistinctPercentStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnUniquenessDistinctPercentSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnUniquenessDuplicateCountStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnUniquenessDuplicateCountSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnUniquenessDuplicatePercentStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnUniquenessDuplicatePercentSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnRangeStatisticsCollectorsSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
min_value Configuration of the profiler that finds the minimum value in the column. ColumnRangeMinValueStatisticsCollectorSpec
median_value Configuration of the profiler that finds the median value in the column. ColumnRangeMedianValueStatisticsCollectorSpec
max_value Configuration of the profiler that finds the maximum value in the column. ColumnRangeMaxValueStatisticsCollectorSpec
mean_value Configuration of the profiler that finds the mean value in the column. ColumnRangeMeanValueStatisticsCollectorSpec
sum_value Configuration of the profiler that calculates the sum of the values in the column. ColumnRangeSumValueStatisticsCollectorSpec

ColumnRangeMinValueStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnRangeMinValueSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnRangeMinValueSensorParametersSpec

Column level sensor that finds the minimum value. It works on any data type that supports the MIN function. The returned data type matches the data type of the column (it can return a date, integer, string, datetime, etc.).

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
filter SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. string
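
The fragment below is a sketch of how a sensor-level filter might be set on the min_value profiler, assuming the node is a ColumnRangeStatisticsCollectorsSpec. The column name `order_date` and the date literal are hypothetical, and the enclosing key in the table YAML file is not shown.

```yaml
# Illustrative fragment of a ColumnRangeStatisticsCollectorsSpec node.
min_value:
  disabled: false
  parameters:
    # Hypothetical sensor-level filter; `order_date` is an assumed column name.
    filter: "order_date >= '2025-01-01'"
```

If the table also defines its own filter, the description above implies that the sensor query is restricted by both conditions, joined with AND.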

ColumnRangeMedianValueStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnNumericMedianSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnRangeMaxValueStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnRangeMaxValueSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnRangeMaxValueSensorParametersSpec

Column level sensor that finds the maximum value. It works on any data type that supports the MAX function. The returned data type matches the data type of the column (it can return a date, integer, string, datetime, etc.).

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
filter SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. string

ColumnRangeMeanValueStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnNumericMeanSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnRangeSumValueStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnNumericSumSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

ColumnSamplingStatisticsCollectorsSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
column_samples Configuration of the profiler that collects samples of column values. ColumnSamplingColumnSamplesStatisticsCollectorSpec

ColumnSamplingColumnSamplesStatisticsCollectorSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
parameters Profiler parameters ColumnSamplingColumnSamplesSensorParametersSpec
disabled Disables this profiler. Only enabled profilers are executed during a profiling process. boolean

FileFormatSpec

File format specification for data loaded from physical files in one of the supported formats.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
csv CSV file format specification. CsvFileFormatSpec
json JSON file format specification. JsonFileFormatSpec
parquet Parquet file format specification. ParquetFileFormatSpec
avro Avro file format specification. AvroFileFormatSpec
iceberg Iceberg file format specification. IcebergFileFormatSpec
delta_lake Delta Lake file format specification. DeltaLakeFileFormatSpec
file_paths The list of paths to files with data that are used as a source. FilePathListSpec
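
As a rough sketch, a FileFormatSpec node that only lists source files could look like the fragment below. The paths are hypothetical. The format-specific keys (csv, json, parquet, avro, iceberg, delta_lake) take objects whose properties are not covered in this table, so they are left out here.

```yaml
# Illustrative fragment of a FileFormatSpec node.
# Both paths are made-up examples; file_paths is a plain list of strings.
file_paths:
  - /data/landing/orders_2025_06.csv
  - /data/landing/orders_2025_07.csv
```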

FilePathListSpec

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
self List[string]

TableLineageSourceSpecList

List of source tables of the current table, used to build the data lineage report.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
self List[TableLineageSource]

TableLineageSource

Key object that identifies a source table by its connection name, schema name and table name.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
connection Connection name string
schema Schema name string
table Table name string

TableLineageSourceSpec

Data lineage specification that identifies one source table of the current table, which is the table where this object is stored.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
source_connection The name of a source connection that is defined in DQOps and contains a source table from which the current table receives data. string
source_schema The name of a source schema within the source connection that contains a source table from which the current table receives data. string
source_table The name of a source table in the source schema from which the current table receives data. string
data_lineage_source_tool The name of a source tool from which this data lineage information was copied. This field should be filled when the data lineage was imported from another data catalog or a data lineage tracking platform. string
properties A dictionary of mapping properties stored as a key/value dictionary. Data lineage synchronization tools that are importing data lineage mappings from external data lineage sources can use it to store mapping information. Dict[string, string]
columns Configuration of source columns for each column in the current table. The keys in this dictionary are column names in the current table. Each object stored in the dictionary contains a list of source columns. ColumnLineageSourceSpecMap
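
A minimal sketch of a single TableLineageSourceSpec object, using only the table-level properties listed above, might look as follows. Every value (connection, schema, table, tool name and the property key) is hypothetical, and the key under which this object is stored in the table YAML file is not shown.

```yaml
# Illustrative TableLineageSourceSpec object; all values are made up.
source_connection: warehouse
source_schema: staging
source_table: orders_raw
# Filled in only when the lineage was imported from an external tool.
data_lineage_source_tool: example_catalog
properties:
  # Free-form key/value pairs for synchronization tools (hypothetical key).
  imported_by: nightly_sync
```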

ColumnLineageSourceSpecMap

Dictionary that maps the columns in the current table to their source columns. The keys in this dictionary are the column names in the current table.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
self Dict[string, ColumnLineageSourceSpec]

ColumnLineageSourceSpec

Describes the list of source columns for a column in the current table.

The structure of this object is described below

 Property name   Description                       Data type   Enum values   Default value   Sample values 
source_columns A list of source columns in the source table from which this column receives data. SourceColumnsSetSpec
properties A dictionary of mapping properties stored as a key/value dictionary. Data lineage synchronization tools that are importing data lineage mappings from external data lineage sources can use it to store mapping information. Dict[string, string]
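
The column-level mapping can be sketched as the ColumnLineageSourceSpecMap fragment below. All column names are hypothetical, and the fragment assumes that a SourceColumnsSetSpec is written in YAML as a list of column name strings, which this section does not state explicitly.

```yaml
# Illustrative ColumnLineageSourceSpecMap node.
# Keys are column names in the current table (hypothetical names).
order_id:
  source_columns:
    - id              # assumed source column name
total_amount:
  source_columns:     # one target column fed by two source columns
    - net_amount
    - tax_amount
  properties:
    note: "sum of net and tax"   # hypothetical mapping property
```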

SourceColumnsSetSpec

A collection of unique names of source columns from which the current column receives data. This information is used to track column-level data lineage.