DQOps Data Quality Operations Center Documentation

DQOps YAML file definitions

Last updated: July 22, 2025

Definitions of the YAML files that DQOps uses to configure data sources, monitored tables, and activated data quality checks.

TableYaml

Table and column definition file that defines a list of tables and columns that are covered by data quality checks.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`api_version`	DQOps YAML schema version	string		dqo/v1
`kind`	File type	enum	source table sensor provider_sensor rule check settings file_index connection_similarity_index dashboards default_schedules default_checks default_table_checks default_column_checks default_notifications	table
`spec`	Table specification object with the table metadata and the configuration of data quality checks	TableSpec
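
As an illustration, a minimal table file could look like the sketch below. The key names are copied from the property table above. Files written by DQOps may spell the version key differently (for example `apiVersion`), so compare against a file generated by your own installation. The `priority` value is an arbitrary example.

```yaml
# Minimal TableYaml sketch; key names follow the property table above.
api_version: dqo/v1   # DQOps YAML schema version
kind: table           # file type for a table definition
spec:                 # TableSpec object, described in the next section
  priority: 1         # illustrative value only
```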

TableSpec

Table specification that defines data quality tests that are enabled on a table and its columns.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`disabled`	Disables all data quality checks on the table. Data quality checks will not be executed.	boolean
`stage`	Stage name.	string
`priority`	Table priority (1, 2, 3, 4, ...). The tables can be assigned a priority level. The table priority is copied into each data quality check result and a sensor result, enabling efficient grouping of more and less important tables during a data quality improvement project, when the data quality issues on higher priority tables are fixed before data quality issues on less important tables.	integer
`filter`	SQL WHERE clause added to the sensor queries. Use replacement tokens {table} to replace the content with the full table name, {alias} to replace the content with the table alias of an analyzed table or {column} to replace the content with the analyzed column name.	string
`do_not_collect_error_samples_in_profiling`	Disable automatic collection of error samples in the profiling section. The profiling checks by default always collect error samples for failed data quality checks.	boolean
`always_collect_error_samples_in_monitoring`	Always collect error samples for failed monitoring checks. DQOps will not collect error samples automatically when the checks are executed by a scheduler or by running checks from the metadata tree. Error samples are always collected only when the checks are run from the check editor.	boolean
`timestamp_columns`	Column names that store the timestamps that identify the event (transaction) timestamp and the ingestion (inserted / loaded at) timestamps. Also configures the timestamp source for the date/time partitioned data quality checks (event timestamp or ingestion timestamp).	TimestampColumnsSpec
`incremental_time_window`	Configuration of the time window for analyzing daily or monthly partitions. Specifies the number of recent days and recent months that are analyzed when the partitioned data quality checks are run in an incremental mode (the default mode).	PartitionIncrementalTimeWindowSpec
`default_grouping_name`	The name of the default data grouping configuration that is applied on data quality checks. When a default data grouping is selected, all data quality checks run SQL queries with a GROUP BY clause, calculating separate data quality checks for each group of data. The data groupings are defined in the 'groupings' dictionary (indexed by the data grouping name).	string
`groupings`	Data grouping configurations list. Data grouping configurations are configured in two cases: (1) the data in the table should be analyzed with a GROUP BY condition, to analyze different datasets using separate time series, for example a table contains data from multiple countries and there is a 'country' column used for partitioning. (2) a tag is assigned to a table (within a data grouping level hierarchy), when the data is segmented at a table level (similar tables store the same information, but for different countries, etc.).	DataGroupingConfigurationSpecMap
`table_comparisons`	Dictionary of data comparison configurations. Data comparison configurations are used for comparisons between data sources to compare this table (called the compared table) with other reference tables (the source of truth). The reference table's metadata must be imported into DQOps, but the reference table may be located in another data source. DQOps will compare metrics calculated for groups of rows (using the GROUP BY clause). For each comparison, the user must specify a name of a data grouping. The number of data grouping dimensions in the parent table and the reference table defined in the selected data grouping configurations must match. DQOps will run the same data quality sensors on both the parent table (table under test) and the reference table (the source of truth), comparing the measures (sensor readouts) captured from both tables.	TableComparisonConfigurationSpecMap
`incident_grouping`	Incident grouping configuration with the overridden configuration at a table level. The configured field value in this object will override the default configuration from the connection level. Incident grouping level can be changed or incident creation can be disabled.	TableIncidentGroupingSpec
`owner`	Table owner information like the data steward name or the business application name.	TableOwnerSpec
`profiling_checks`	Configuration of data quality profiling checks that are enabled. Pick a check from a category, apply the parameters and rules to enable it.	TableProfilingCheckCategoriesSpec
`monitoring_checks`	Configuration of table level monitoring checks. Monitoring checks are data quality checks that are evaluated for each period of time (daily, weekly, monthly, etc.). A monitoring check stores only the most recent data quality check result for each period of time.	TableMonitoringCheckCategoriesSpec
`partitioned_checks`	Configuration of table level date/time partitioned checks. Partitioned data quality checks are evaluated for each partition separately, raising separate alerts at a partition level. The table does not need to be physically partitioned by date, it is possible to run data quality checks for each day or month of data separately.	TablePartitionedCheckCategoriesSpec
`statistics`	Configuration of table level data statistics collector (a basic profiler). Configures which statistics collectors are enabled and how they are configured.	TableStatisticsCollectorsRootCategoriesSpec
`schedules_override`	Configuration of the job scheduler that runs data quality checks. The scheduler configuration is divided into types of checks that have different schedules.	CronSchedulesSpec
`columns`	Dictionary of columns, indexed by a physical column name. Column specification contains the expected column data type and a list of column level data quality checks that are enabled for a column.	ColumnSpecMap
`labels`	Custom labels that were assigned to the table. Labels are used for searching for tables when filtered data quality checks are executed.	LabelSetSpec
`comments`	Comments used for change tracking and documenting changes directly in the table data quality specification file.	CommentsListSpec
`file_format`	File format specification of the source data. When set, it overrides the file format configured in the connection specification.	FileFormatSpec
`advanced_properties`	A key/value dictionary of advanced properties that can be used, for example, to support mapping data to data catalogs.	Dict[string, string]
`source_tables`	A list of source tables. This information is used to define the data lineage report for the table.	TableLineageSourceSpecList
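
The scalar properties of `TableSpec` might be combined as in the following sketch. The stage name, the `country` column, and the `catalog_id` key are hypothetical values chosen for illustration; only the property names come from the table above.

```yaml
spec:
  disabled: false
  stage: landing                      # hypothetical stage name
  priority: 2
  # Quoted because the value starts with a '{' replacement token,
  # which YAML would otherwise read as the start of a flow mapping.
  filter: "{alias}.country = 'US'"    # hypothetical column and value
  do_not_collect_error_samples_in_profiling: true
  always_collect_error_samples_in_monitoring: true
  advanced_properties:                # Dict[string, string]
    catalog_id: "sales-orders"        # hypothetical key/value pair
```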

TimestampColumnsSpec

Configuration of timestamp related columns on a table level. Timestamp columns are used for timeliness data quality checks and for date/time partitioned checks.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`event_timestamp_column`	Column name that identifies an event timestamp (date/time), such as a transaction timestamp, impression timestamp, event timestamp.	string
`ingestion_timestamp_column`	Column name that contains the timestamp (or date/time) when the row was ingested (loaded, inserted) into the table. Use a column that is filled by the data pipeline or ETL process at the time of the data loading.	string
`partition_by_column`	Column name that contains the date, datetime or timestamp column for date/time partitioned data. Partition checks (daily partition checks and monthly partition checks) use this column in a GROUP BY clause in order to detect data quality issues in each partition separately. It should be a DATE type, DATETIME type (using a local server time zone) or a TIMESTAMP type (a UTC absolute time).	string
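
A sketch of a `timestamp_columns` node is shown below. The column names `created_at` and `loaded_at` are hypothetical; replace them with columns that exist in the monitored table.

```yaml
spec:
  timestamp_columns:
    event_timestamp_column: created_at      # when the event (transaction) happened
    ingestion_timestamp_column: loaded_at   # filled by the pipeline at load time
    partition_by_column: created_at         # DATE, DATETIME or TIMESTAMP column
```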

PartitionIncrementalTimeWindowSpec

Configuration of the time window for running incremental partition checks.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`daily_partitioning_recent_days`	The number of recent days analyzed by daily partition checks in incremental mode. The default value is 7 last days.	integer
`daily_partitioning_include_today`	Also analyze today's data by daily partition checks in incremental mode. The default value is false, which means that today's and future partitions are not analyzed. Only yesterday's partition and previous daily partitions are analyzed because today's data may still be incomplete. Change the value to 'true' if the current day should also be analyzed. This change may require you to configure the schedule for daily checks so that the checks run after the data load.	boolean
`monthly_partitioning_recent_months`	The number of recent months analyzed by monthly partition checks in incremental mode. The default value is the previous calendar month.	integer
`monthly_partitioning_include_current_month`	Analyze also this month's data by monthly partition checks in incremental mode. The default value is false, which means that the current month is not analyzed. Future data is also filtered out because the current month may be incomplete. Change the value to 'true' if the current month should also be analyzed before the end of the month. This change may require you to configure the schedule to run monthly checks more frequently (daily, hourly, etc.).	boolean
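
The sketch below writes out the defaults described in the table explicitly. The value `1` for `monthly_partitioning_recent_months` is an assumption: it is one reading of "the previous calendar month", not a value stated in the table.

```yaml
spec:
  incremental_time_window:
    daily_partitioning_recent_days: 7                   # documented default: last 7 days
    daily_partitioning_include_today: false             # today's partition is skipped
    monthly_partitioning_recent_months: 1               # assumed: previous calendar month
    monthly_partitioning_include_current_month: false   # current month is skipped
```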

DataGroupingConfigurationSpecMap

Dictionary of named data grouping configurations defined on a table level.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`self`		Dict[string, DataGroupingConfigurationSpec]

TableComparisonConfigurationSpecMap

Dictionary of data comparison configurations between the current table (the parent of this node) and a reference table (the source of truth). The tables are compared to measure the accuracy of the data.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`self`		Dict[string, TableComparisonConfigurationSpec]

TableComparisonConfigurationSpec

Identifies a data comparison configuration between a parent table (the compared table) and the target table from another data source (a reference table).

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`reference_table_connection_name`	The name of the connection in DQOps where the reference table (the source of truth) is configured. When the connection name is not provided, DQOps will find the reference table on the connection of the parent table.	string
`reference_table_schema_name`	The name of the schema where the reference table is imported into DQOps. The reference table's metadata must be imported into DQOps.	string
`reference_table_name`	The name of the reference table that is imported into DQOps. The reference table's metadata must be imported into DQOps.	string
`compared_table_filter`	Optional custom SQL filter expression that is added to the SQL query that retrieves the data from the compared table. This expression must be a SQL expression that will be added to the WHERE clause when querying the compared table.	string
`reference_table_filter`	Optional custom SQL filter expression that is added to the SQL query that retrieves the data from the reference table (the source of truth). This expression must be a SQL expression that will be added to the WHERE clause when querying the reference table.	string
`check_type`	The type of checks (profiling, monitoring, partitioned) that this check comparison configuration applies to. The default value is 'profiling'.	enum	profiling monitoring partitioned
`time_scale`	The time scale that this check comparison configuration applies to. Supported values are 'daily' and 'monthly' for monitoring and partitioned checks or an empty value for profiling checks.	enum	daily monthly
`grouping_columns`	List of column pairs from both the compared table and the reference table that are used in a GROUP BY clause for grouping both the compared table and the reference table (the source of truth). The columns are used in the next step of the table comparison to join the results of data groups (row counts, sums of columns) between the compared table and the reference table to compare the differences.	TableComparisonGroupingColumnsPairsListSpec

TableComparisonGroupingColumnsPairsListSpec

List of column pairs used for grouping and joining in the table comparison checks.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`self`		List[TableComparisonGroupingColumnsPairSpec]

TableComparisonGroupingColumnsPairSpec

Configuration of a pair of columns on the compared table and the reference table (the source of truth) that are joined and used for grouping to perform data comparison of aggregated results (sums of columns, row counts, etc.).

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`compared_table_column_name`	The name of the column on the compared table (the parent table) that is used in the GROUP BY clause to group rows before compared aggregates (row counts, sums, etc.) are calculated. This column is also used to join (match) results to the reference table.	string
`reference_table_column_name`	The name of the column on the reference table (the source of truth) that is used in the GROUP BY clause to group rows before compared aggregates (row counts, sums, etc.) are calculated. This column is also used to join (match) results to the compared table.	string
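
Taken together, the comparison objects above could be nested as in this sketch. The comparison name, connection, schema, table, and column names are all hypothetical. The dictionary key under `table_comparisons` is the name of the comparison, and `grouping_columns` is a list of column pairs.

```yaml
spec:
  table_comparisons:
    compare_to_source_of_truth:                   # hypothetical comparison name
      reference_table_connection_name: warehouse  # hypothetical connection
      reference_table_schema_name: public
      reference_table_name: orders_reference
      compared_table_filter: "status = 'closed'"  # added to the WHERE clause
      check_type: monitoring                      # profiling | monitoring | partitioned
      time_scale: daily                           # daily | monthly
      grouping_columns:
        - compared_table_column_name: country
          reference_table_column_name: country_code
```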

TableIncidentGroupingSpec

Configuration of data quality incident grouping on a table level. Defines how similar data quality issues are grouped into incidents.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`grouping_level`	Grouping level of failed data quality checks for creating higher level data quality incidents. The default grouping level is by a table, a data quality dimension and a check category (e.g. a datatype data quality incident detected on a table X in the numeric checks category).	enum	table table_dimension table_dimension_category table_dimension_category_type table_dimension_category_name
`minimum_severity`	Minimum severity level of data quality issues that are grouped into incidents. The default minimum severity level is 'warning'. Other supported severity levels are 'error' and 'fatal'.	enum	warning error fatal
`divide_by_data_group`	Create separate data quality incidents for each data group, creating different incidents for different groups of rows. By default, data groups are ignored for grouping data quality issues into data quality incidents.	boolean
`disabled`	Disables data quality incident creation for failed data quality checks on the table.	boolean
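
A table-level override of the incident grouping might look like the following sketch. The chosen values are illustrative; each is taken from the enum lists in the table above.

```yaml
spec:
  incident_grouping:
    grouping_level: table_dimension_category   # one of the listed enum values
    minimum_severity: error                    # ignore 'warning' level issues
    divide_by_data_group: true                 # one incident per data group
    disabled: false
```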

TableOwnerSpec

Table owner information.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`data_steward`	Data steward name	string
`application`	Business application name	string

TableMonitoringCheckCategoriesSpec

Container of table level monitoring, divided by the time window (daily, monthly, etc.)

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`daily`	Configuration of daily monitoring evaluated at a table level.	TableDailyMonitoringCheckCategoriesSpec
`monthly`	Configuration of monthly monitoring evaluated at a table level.	TableMonthlyMonitoringCheckCategoriesSpec

TablePartitionedCheckCategoriesSpec

Container of table level partitioned checks, divided by the time window (daily, monthly, etc.)

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`daily`	Configuration of day partitioned data quality checks evaluated at a table level.	TableDailyPartitionedCheckCategoriesSpec
`monthly`	Configuration of monthly partitioned data quality checks evaluated at a table level.	TableMonthlyPartitionedCheckCategoriesSpec

TableStatisticsCollectorsRootCategoriesSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`volume`	Configuration of volume statistics collectors on a table level.	TableVolumeStatisticsCollectorsSpec
`schema`		TableSchemaStatisticsCollectorsSpec

TableVolumeStatisticsCollectorsSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`row_count`	Configuration of the row count profiler.	TableVolumeRowCountStatisticsCollectorSpec

TableVolumeRowCountStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	TableVolumeRowCountSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

TableSchemaStatisticsCollectorsSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`column_count`	Configuration of the column count profiler.	TableSchemaColumnCountStatisticsCollectorSpec

TableSchemaColumnCountStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	TableColumnCountSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean
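
The nesting of the table-level statistics collectors described above can be sketched as follows. Disabling the column count profiler here is only an example choice.

```yaml
spec:
  statistics:
    volume:
      row_count:
        disabled: false   # row count profiler stays enabled
    schema:
      column_count:
        disabled: true    # column count profiler is skipped
```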

ColumnSpecMap

Dictionary of columns indexed by a physical column name.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`self`		Dict[string, ColumnSpec]

ColumnSpec

Column specification that identifies a single column.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`disabled`	Disables all data quality checks on the column. Data quality checks will not be executed.	boolean
`sql_expression`	SQL expression used for calculated fields or when additional column value transformation is required before the column can be used for analysis with data quality checks (data type conversion, transformation). It should be an SQL expression that uses the SQL language of the analyzed database type. Use the replacement tokens {table} to replace the content with the full table name, {alias} to replace the content with the table alias of the table under analysis, or {column} to replace the content with the analyzed column name. An example of extracting a value from a string column storing JSON in PostgreSQL: "{column}::json->'address'->'zip'".	string
`type_snapshot`	Column data type that was retrieved when the table metadata was imported.	ColumnTypeSnapshotSpec
`id`	True when this column is a part of the primary key or a business key that identifies a row. Error sampling captures values of id columns to identify the row where the error sample was found.	boolean
`profiling_checks`	Configuration of data quality profiling checks that are enabled. Pick a check from a category, apply the parameters and rules to enable it.	ColumnProfilingCheckCategoriesSpec
`monitoring_checks`	Configuration of column level monitoring checks. Monitoring checks are data quality checks that are evaluated for each period of time (daily, weekly, monthly, etc.). A monitoring check stores only the most recent data quality check result for each period of time.	ColumnMonitoringCheckCategoriesSpec
`partitioned_checks`	Configuration of column level date/time partitioned checks. Partitioned data quality checks are evaluated for each partition separately, raising separate alerts at a partition level. The table does not need to be physically partitioned by date, it is possible to run data quality checks for each day or month of data separately.	ColumnPartitionedCheckCategoriesSpec
`statistics`	Custom configuration of a column level statistics collector (a basic profiler). Enables customization of the statistics collector settings when the collector is analysing this column.	ColumnStatisticsCollectorsRootCategoriesSpec
`labels`	Custom labels that were assigned to the column. Labels are used for searching for columns when filtered data quality checks are executed.	LabelSetSpec
`comments`	Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non tracked comments).	CommentsListSpec
`advanced_properties`	A key/value dictionary of advanced properties that can be used, for example, to support mapping data to data catalogs.	Dict[string, string]
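
A `columns` dictionary is keyed by the physical column name. The sketch below uses three hypothetical columns and reuses the PostgreSQL JSON expression quoted in the `sql_expression` description.

```yaml
spec:
  columns:
    order_id:          # hypothetical physical column name
      id: true         # part of the key; captured with error samples
    address_json:
      # Quoted because the value starts with a '{' replacement token.
      sql_expression: "{column}::json->'address'->'zip'"
    legacy_code:
      disabled: true   # no data quality checks run on this column
```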

ColumnTypeSnapshotSpec

Stores the column data type captured at the time of the table metadata import.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`column_type`	Column data type using the monitored database type names.	string
`nullable`	Column is nullable.	boolean
`length`	Maximum length of text and binary columns.	integer
`precision`	Precision of a numeric (decimal) data type.	integer
`scale`	Scale of a numeric (decimal) data type.	integer
`nested`	This field is a nested field inside another STRUCT. It is used to identify nested fields in JSON files.	boolean
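
For illustration, type snapshots for a decimal column and a text column might be recorded as below. The column names and the type names `NUMERIC` and `VARCHAR` are assumptions; the actual type names depend on the monitored database.

```yaml
spec:
  columns:
    amount:                    # hypothetical decimal column
      type_snapshot:
        column_type: NUMERIC   # assumed database type name
        nullable: true
        precision: 10
        scale: 2
    customer_name:             # hypothetical text column
      type_snapshot:
        column_type: VARCHAR   # assumed database type name
        nullable: false
        length: 100
```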

ColumnMonitoringCheckCategoriesSpec

Container of column level monitoring, divided by the time window (daily, monthly, etc.)

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`daily`	Configuration of daily monitoring evaluated at a column level.	ColumnDailyMonitoringCheckCategoriesSpec
`monthly`	Configuration of monthly monitoring evaluated at a column level.	ColumnMonthlyMonitoringCheckCategoriesSpec

ColumnPartitionedCheckCategoriesSpec

Container of column level partitioned checks, divided by the time window (daily, monthly, etc.)

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`daily`	Configuration of day partitioned data quality checks evaluated at a column level.	ColumnDailyPartitionedCheckCategoriesSpec
`monthly`	Configuration of monthly partitioned data quality checks evaluated at a column level.	ColumnMonthlyPartitionedCheckCategoriesSpec

ColumnStatisticsCollectorsRootCategoriesSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`nulls`	Configuration of null values profilers on a column level.	ColumnNullsStatisticsCollectorsSpec
`text`	Configuration of text column profilers on a column level.	ColumnTextStatisticsCollectorsSpec
`uniqueness`	Configuration of profilers that analyse uniqueness of values (distinct count).	ColumnUniquenessStatisticsCollectorsSpec
`range`	Configuration of profilers that analyse the range of values (min, max).	ColumnRangeStatisticsCollectorsSpec
`sampling`	Configuration of profilers that collect the column samples.	ColumnSamplingStatisticsCollectorsSpec

ColumnNullsStatisticsCollectorsSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`nulls_count`	Configuration of the profiler that counts null column values.	ColumnNullsNullsCountStatisticsCollectorSpec
`nulls_percent`	Configuration of the profiler that measures the percentage of null values.	ColumnNullsNullsPercentStatisticsCollectorSpec
`not_nulls_count`	Configuration of the profiler that counts not null column values.	ColumnNullsNotNullsCountStatisticsCollectorSpec
`not_nulls_percent`	Configuration of the profiler that measures the percentage of not null values.	ColumnNullsNotNullsPercentStatisticsCollectorSpec
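
Column-level collectors are nested under the `statistics` node of a column. A sketch for the nulls category follows, assuming a hypothetical `customer_name` column. Each collector accepts the `disabled` flag described in the sections below.

```yaml
spec:
  columns:
    customer_name:             # hypothetical column name
      statistics:
        nulls:
          nulls_count:
            disabled: false
          nulls_percent:
            disabled: false
          not_nulls_count:
            disabled: true     # skipped in this example
          not_nulls_percent:
            disabled: true     # skipped in this example
```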

ColumnNullsNullsCountStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnNullsNullsCountSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnNullsNullsPercentStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnNullsNullsPercentSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnNullsNotNullsCountStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnNullsNotNullsCountSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnNullsNotNullsPercentStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnNullsNotNullsPercentSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnTextStatisticsCollectorsSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`text_max_length`	Configuration of the profiler that finds the maximum text length.	ColumnTextTextMaxLengthStatisticsCollectorSpec
`text_mean_length`	Configuration of the profiler that finds the mean text length.	ColumnTextTextMeanLengthStatisticsCollectorSpec
`text_min_length`	Configuration of the profiler that finds the minimum text length.	ColumnTextTextMinLengthStatisticsCollectorSpec
`text_datatype_detect`	Configuration of the profiler that detects the data type.	ColumnTextTextDatatypeDetectStatisticsCollectorSpec
`text_min_word_count`	Configuration of the profiler that finds the estimated minimum word count.	ColumnTextMinWordCountStatisticsCollectorSpec
`text_max_word_count`	Configuration of the profiler that finds the estimated maximum word count.	ColumnTextMaxWordCountStatisticsCollectorSpec

ColumnTextTextMaxLengthStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnTextTextMaxLengthSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnTextTextMeanLengthStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnTextTextMeanLengthSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnTextTextMinLengthStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnTextTextMinLengthSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnTextTextDatatypeDetectStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnDatatypeStringDatatypeDetectSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnTextMinWordCountStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnTextMinWordCountSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnTextMaxWordCountStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnTextMaxWordCountSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnUniquenessStatisticsCollectorsSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`distinct_count`	Configuration of the profiler that counts distinct column values.	ColumnUniquenessDistinctCountStatisticsCollectorSpec
`distinct_percent`	Configuration of the profiler that measures the percentage of distinct column values.	ColumnUniquenessDistinctPercentStatisticsCollectorSpec
`duplicate_count`	Configuration of the profiler that counts duplicate column values.	ColumnUniquenessDuplicateCountStatisticsCollectorSpec
`duplicate_percent`	Configuration of the profiler that measures the percentage of duplicate column values.	ColumnUniquenessDuplicatePercentStatisticsCollectorSpec

ColumnUniquenessDistinctCountStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnUniquenessDistinctCountSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnUniquenessDistinctPercentStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnUniquenessDistinctPercentSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnUniquenessDuplicateCountStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnUniquenessDuplicateCountSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnUniquenessDuplicatePercentStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnUniquenessDuplicatePercentSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnRangeStatisticsCollectorsSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`min_value`	Configuration of the profiler that finds the minimum value in the column.	ColumnRangeMinValueStatisticsCollectorSpec
`median_value`	Configuration of the profiler that finds the median value in the column.	ColumnRangeMedianValueStatisticsCollectorSpec
`max_value`	Configuration of the profiler that finds the maximum value in the column.	ColumnRangeMaxValueStatisticsCollectorSpec
`mean_value`	Configuration of the profiler that finds the mean value in the column.	ColumnRangeMeanValueStatisticsCollectorSpec
`sum_value`	Configuration of the profiler that calculates the sum of the values in the column.	ColumnRangeSumValueStatisticsCollectorSpec

ColumnRangeMinValueStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnRangeMinValueSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnRangeMinValueSensorParametersSpec

Column level sensor that finds the minimum value. It works on any data type that supports the MIN function. The returned data type matches the data type of the column (can return date, integer, string, datetime, etc.).

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`filter`	SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator.	string

ColumnRangeMedianValueStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnNumericMedianSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnRangeMaxValueStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnRangeMaxValueSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnRangeMaxValueSensorParametersSpec

Column level sensor that finds the maximum value. It works on any data type that supports the MAX function. The returned data type matches the data type of the column (can return date, integer, string, datetime, etc.).

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`filter`	SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator.	string

ColumnRangeMeanValueStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnNumericMeanSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

ColumnRangeSumValueStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnNumericSumSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean
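
As a rough illustration of the range collectors, the sketch below sets a sensor-level `filter` on `min_value` and `max_value` and disables the median profiler. The parent key `range`, the column name `order_status`, and the filter expression are hypothetical. Only `min_value`, `median_value`, `max_value`, `parameters`, `filter`, and `disabled` are taken from the tables above.

```yaml
# Hypothetical fragment: "range" as the parent key and the column
# "order_status" are assumptions used only for illustration.
range:
  min_value:
    parameters:
      # Combined with the table-level filter using AND
      filter: "order_status = 'completed'"
  max_value:
    parameters:
      filter: "order_status = 'completed'"
  median_value:
    disabled: true
```

If the table also defines its own `filter`, the description of the sensor `filter` implies that the generated query applies both conditions, joined by AND.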

ColumnSamplingStatisticsCollectorsSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`column_samples`	Configuration of the profiler that collects sample values from the column.	ColumnSamplingColumnSamplesStatisticsCollectorSpec

ColumnSamplingColumnSamplesStatisticsCollectorSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`parameters`	Profiler parameters	ColumnSamplingColumnSamplesSensorParametersSpec
`disabled`	Disables this profiler. Only enabled profilers are executed during a profiling process.	boolean

FileFormatSpec

File format specification for data loaded from physical files in one of the supported formats.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`csv`	CSV file format specification.	CsvFileFormatSpec
`json`	JSON file format specification.	JsonFileFormatSpec
`parquet`	Parquet file format specification.	ParquetFileFormatSpec
`avro`	Avro file format specification.	AvroFileFormatSpec
`iceberg`	Iceberg file format specification.	IcebergFileFormatSpec
`delta_lake`	Delta Lake file format specification.	DeltaLakeFileFormatSpec
`file_paths`	The list of paths to files with data that are used as a source.	FilePathListSpec

FilePathListSpec

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`self`		List[string]
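
Because `FilePathListSpec` is a plain list of strings, `file_paths` can be written as an ordinary YAML sequence. The sketch below assumes that the `FileFormatSpec` object is stored under a key named `file_format`. That key and the paths are illustrative and are not confirmed by this section.

```yaml
# Hypothetical fragment: the key "file_format" and both paths are assumptions.
file_format:
  file_paths:
    - /data/landing/orders_2024.csv
    - /data/landing/orders_2025.csv
  # Format-specific settings would go under one of the documented keys:
  # csv, json, parquet, avro, iceberg or delta_lake.
```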

TableLineageSourceSpecList

List of source tables of the current table to build the data lineage report.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`self`		List[TableLineageSource]

TableLineageSource

Key object that identifies a source table by its connection name, schema name, and table name.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`connection`	Connection name	string
`schema`	Schema name	string
`table`	Table name	string

TableLineageSourceSpec

Data lineage specification that identifies one source table of the current table, which is the table where this object is stored.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`source_connection`	The name of a source connection that is defined in DQOps and contains a source table from which the current table receives data.	string
`source_schema`	The name of a source schema within the source connection that contains a source table from which the current table receives data.	string
`source_table`	The name of a source table in the source schema from which the current table receives data.	string
`data_lineage_source_tool`	The name of a source tool from which this data lineage information was copied. This field should be filled when the data lineage was imported from another data catalog or a data lineage tracking platform.	string
`properties`	A dictionary of mapping properties stored as a key/value dictionary. Data lineage synchronization tools that are importing data lineage mappings from external data lineage sources can use it to store mapping information.	Dict[string, string]
`columns`	Configuration of source columns for each column in the current table. The keys in this dictionary are column names in the current table. Each object stored in the dictionary contains a list of source columns.	ColumnLineageSourceSpecMap
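
A single `TableLineageSourceSpec` object could look like the sketch below. All values (connection, schema, table, tool name, and the property key) are made up for illustration. The property names are the ones listed in the table above. Where this object is placed inside the table file is not shown here.

```yaml
# Hypothetical values; only the property names come from this reference.
source_connection: landing_zone
source_schema: raw
source_table: orders
data_lineage_source_tool: example_catalog   # set when lineage was imported
properties:
  imported_by: nightly_sync                 # free-form key/value pairs
```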

ColumnLineageSourceSpecMap

Dictionary that maps each column in the current table to its source columns. The keys in this dictionary are the column names in the current table.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`self`		Dict[string, ColumnLineageSourceSpec]

ColumnLineageSourceSpec

Describes the list of source columns for a column in the current table.

The structure of this object is described below

Property name	Description	Data type	Enum values	Default value	Sample values
`source_columns`	A list of source columns in the source table from which this column receives data.	SourceColumnsSetSpec
`properties`	A dictionary of mapping properties stored as a key/value dictionary. Data lineage synchronization tools that are importing data lineage mappings from external data lineage sources can use it to store mapping information.	Dict[string, string]
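
The column-level mapping can be sketched as a dictionary keyed by the column names of the current table, as described for `ColumnLineageSourceSpecMap`. The sketch assumes that `source_columns` is serialized as a plain YAML list of column names. The column names and the property key are hypothetical.

```yaml
# Hypothetical fragment of the "columns" dictionary.
# Assumption: source_columns is written as a simple list of strings.
columns:
  customer_name:              # column in the current table
    source_columns:
      - first_name            # columns in the source table
      - last_name
    properties:
      transformation: concat  # illustrative key/value pair
  order_id:
    source_columns:
      - id
```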

SourceColumnsSetSpec

A collection of unique names of source columns from which the current column receives data. This information is used to track column-level data lineage.