models
CheckTarget
Enumeration of targets where the check is applied. It is one of "table" or "column".
The structure of this object is described below
Data type | Enum values |
---|---|
string | column table |
CheckType
Enumeration of data quality check types: profiling, monitoring, partitioned.
The structure of this object is described below
Data type | Enum values |
---|---|
string | profiling partitioned monitoring |
CheckTimeScale
Enumeration of time scale of monitoring and partitioned data quality checks (daily, monthly, etc.)
The structure of this object is described below
Data type | Enum values |
---|---|
string | daily monthly |
FieldModel
Model of a single field that is used to edit a parameter value for a sensor or a rule. Describes the type of the field and the current value.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
definition | Field name that matches the field name (snake_case) used in the YAML specification. | ParameterDefinitionSpec |
optional | Field value is optional and may be null; when false, the field is required and must be filled. | boolean |
string_value | Field value for a string field. | string |
boolean_value | Field value for a boolean field. | boolean |
integer_value | Field value for an integer (32-bit) field. | integer |
long_value | Field value for a long (64-bit) field. | long |
double_value | Field value for a double field. | double |
datetime_value | Field value for a date time field. | datetime |
column_name_value | Field value for a column name field. | string |
enum_value | Field value for an enum (choice) field. | string |
string_list_value | Field value for an array (list) of strings. | List[string] |
integer_list_value | Field value for an array (list) of integers, using 64 bit integers. | List[integer] |
date_value | Field value for a date. | date |
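To illustrate, a FieldModel carrying a single integer parameter could serialize to a payload along the lines of the sketch below. The property names come from the table above; the parameter name, the nested ParameterDefinitionSpec fields and the value are hypothetical.

```python
# Minimal sketch of a serialized FieldModel for an integer parameter.
# Only the *_value field matching the parameter's data type is populated.
field_model_example = {
    "definition": {              # ParameterDefinitionSpec (abbreviated, assumed fields)
        "field_name": "min_count",
        "data_type": "integer",
    },
    "optional": False,           # the field is required
    "integer_value": 1,          # value for an integer (32-bit) parameter
}
```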
RuleParametersModel
Model that returns the form definition and the form data to edit parameters (thresholds) for a rule at a single severity level (low, medium, high).
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
rule_name | Full rule name. This field is for information purposes and could be used to create additional custom checks that are reusing the same data quality rule. | string |
rule_parameters | List of fields for editing the rule parameters like thresholds. | List[FieldModel] |
disabled | Disable the rule. The rule will not be evaluated. The sensor will also not be executed if it has no enabled rules. | boolean |
configured | Returns true when the rule is configured (is not null), so it should be shown in the UI as configured (having values). | boolean |
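As a sketch, a configured RuleParametersModel for a single severity level could look like the following; the rule name and the threshold value are hypothetical, while the property names match the table above.

```python
# Hypothetical RuleParametersModel payload for one severity level.
rule_parameters_example = {
    "rule_name": "comparison/min_count",   # illustrative full rule name
    "rule_parameters": [
        {
            "optional": False,
            "integer_value": 100,          # FieldModel holding the threshold value
        }
    ],
    "disabled": False,       # the rule will be evaluated
    "configured": True,      # show the rule as configured in the UI
}
```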
CheckConfigurationModel
Model containing fundamental configuration of a single data quality check.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
connection_name | Connection name. | string |
schema_name | Schema name. | string |
table_name | Table name. | string |
column_name | Column name, if the check is set up on a column. | string |
check_target | Check target (table or column). | CheckTarget |
check_type | Check type (profiling, monitoring, partitioned). | CheckType |
check_time_scale | Check timescale (for monitoring and partitioned checks). | CheckTimeScale |
category_name | Category to which this check belongs. | string |
check_name | Check name that is used in YAML file. | string |
sensor_parameters | List of fields for editing the sensor parameters. | List[FieldModel] |
table_level_filter | SQL WHERE clause added to the sensor query for every check on this table. | string |
sensor_level_filter | SQL WHERE clause added to the sensor query for this check. | string |
warning | Rule parameters for the warning severity rule. | RuleParametersModel |
error | Rule parameters for the error severity rule. | RuleParametersModel |
fatal | Rule parameters for the fatal severity rule. | RuleParametersModel |
disabled | Whether the check has been disabled. | boolean |
configured | Whether the check is configured (not null). | boolean |
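Putting these properties together, an abbreviated CheckConfigurationModel for a column-level daily monitoring check might serialize like the sketch below. All names and the nested rule object are illustrative.

```python
# Hypothetical, abbreviated CheckConfigurationModel payload.
check_configuration_example = {
    "connection_name": "my_warehouse",
    "schema_name": "public",
    "table_name": "fact_sales",
    "column_name": "customer_id",          # present only for column-level checks
    "check_target": "column",              # CheckTarget enum value
    "check_type": "monitoring",            # CheckType enum value
    "check_time_scale": "daily",           # CheckTimeScale enum value
    "category_name": "nulls",
    "check_name": "daily_nulls_count",     # illustrative check name
    "sensor_parameters": [],               # List[FieldModel]
    "error": {                             # RuleParametersModel for the error severity
        "configured": True,
        "disabled": False,
    },
    "disabled": False,
    "configured": True,
}
```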
CheckListModel
Simplistic model that returns a single data quality check, its name and "configured" flag.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
check_category | Check category. | string |
check_name | Data quality check name that is used in YAML. | string |
help_text | Help text that describes the data quality check. | string |
configured | True if the data quality check is configured (not null). When saving the data quality check configuration, set the flag to true for storing the check. | boolean |
CheckContainerListModel
Simplistic model that returns the list of data quality checks, their names, categories and "configured" flag.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
checks | Simplistic list of all data quality checks. | List[CheckListModel] |
can_edit | Boolean flag that decides if the current user can edit the check. | boolean |
can_run_checks | Boolean flag that decides if the current user can run checks. | boolean |
can_delete_data | Boolean flag that decides if the current user can delete data (results). | boolean |
RuleThresholdsModel
Model that returns the form definition and the form data to edit a single rule with all three threshold levels (low, medium, high).
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
error | Rule parameters for the error severity rule. | RuleParametersModel |
warning | Rule parameters for the warning severity rule. | RuleParametersModel |
fatal | Rule parameters for the fatal severity rule. | RuleParametersModel |
MonitoringScheduleSpec
Monitoring job schedule specification.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
cron_expression | Unix style cron expression that specifies when to execute scheduled operations like running data quality checks or synchronizing the configuration with the cloud. | string |
disabled | Disables the schedule. When the value of this 'disabled' field is true, the schedule is stored in the metadata but it is not activated to run data quality checks. | boolean |
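For example, a schedule that runs checks every day at 7:00 AM could be expressed as in the sketch below; the cron expression is only an example.

```python
# Hypothetical MonitoringScheduleSpec: run at 07:00 every day, schedule active.
monitoring_schedule_example = {
    "cron_expression": "0 7 * * *",   # Unix-style cron: minute hour day month weekday
    "disabled": False,                # False keeps the schedule active
}
```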
CheckRunScheduleGroup
The run check scheduling group (profiling, daily checks, monthly checks, etc.), which identifies the configuration of a schedule (cron expression) used to schedule these checks on the job scheduler.
The structure of this object is described below
Data type | Enum values |
---|---|
string | monitoring_monthly profiling partitioned_daily monitoring_daily partitioned_monthly |
EffectiveScheduleLevelModel
Enumeration of possible levels at which a schedule could be configured.
The structure of this object is described below
Data type | Enum values |
---|---|
string | check_override connection table_override |
EffectiveScheduleModel
Model of a configured schedule (on connection or table) or schedule override (on check). Describes the CRON expression and the time of the upcoming execution, as well as the duration until this time.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
schedule_group | Field value for a schedule group to which this schedule belongs. | CheckRunScheduleGroup |
schedule_level | Field value for the level at which the schedule has been configured. | EffectiveScheduleLevelModel |
cron_expression | Field value for a CRON expression defining the scheduling. | string |
disabled | Field value stating if the schedule has been explicitly disabled. | boolean |
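A resolved EffectiveScheduleModel for a daily monitoring check could look like the sketch below; the cron expression is illustrative and the enum values are taken from the enumerations above.

```python
# Hypothetical EffectiveScheduleModel resolved for a daily monitoring check.
effective_schedule_example = {
    "schedule_group": "monitoring_daily",   # CheckRunScheduleGroup enum value
    "schedule_level": "connection",         # EffectiveScheduleLevelModel enum value
    "cron_expression": "0 7 * * *",         # example CRON expression
    "disabled": False,
}
```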
ScheduleEnabledStatusModel
Enumeration of possible ways a schedule can be configured.
The structure of this object is described below
Data type | Enum values |
---|---|
string | not_configured disabled overridden_by_checks enabled |
CommentSpec
Comment entry. Comments are added when a change was made and the change should be recorded in a persisted format.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
date | Comment date and time | datetime |
comment_by | Commented by | string |
comment | Comment text | string |
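A single comment entry might serialize as in the sketch below; the author, timestamp format and text are illustrative.

```python
# Hypothetical CommentSpec entry recording a threshold change.
comment_example = {
    "date": "2024-05-01T09:30:00",     # comment date and time (illustrative ISO 8601 value)
    "comment_by": "data_engineer",     # author (illustrative)
    "comment": "Raised the error threshold after the volume increase.",
}
```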
CommentsListSpec
List of comments.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
self | List of comments. | List[CommentSpec] |
CheckSearchFilters
Target data quality checks filter that identifies which checks on which tables and columns should be executed.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
column | The column name. This field accepts search patterns in the format: 'fk_*', '*_id', 'prefix*suffix'. | string |
column_data_type | The column data type that was imported from the data source and is stored in the columns -> column_name -> type_snapshot -> column_type field in the .dqotable.yaml file. | string |
column_nullable | Optional filter to find only nullable (when the value is true) or not nullable (when the value is false) columns, based on the value of the columns -> column_name -> type_snapshot -> nullable field in the .dqotable.yaml file. | boolean |
check_target | The target type of object to run checks. Supported values are: table to run only table level checks or column to run only column level checks. | CheckTarget |
check_type | The target type of checks to run. Supported values are profiling, monitoring and partitioned. | CheckType |
time_scale | The time scale of monitoring or partitioned checks to run. Supports running only daily or monthly checks. Daily monitoring checks will replace today's value for all captured check results. | CheckTimeScale |
check_category | The target check category, for example: nulls, volume, anomaly. | string |
table_comparison_name | The name of a configured table comparison. When the table comparison is provided, DQOps will only perform table comparison checks that compare data between tables. | string |
check_name | The target check name to run only this named check. Uses the short check name which is the name of the deepest folder in the checks folder. This field supports search patterns such as: 'profiling_*', '*count', 'profiling*_percent'. | string |
sensor_name | The target sensor name to run only data quality checks that are using this sensor. Uses the full sensor name which is the full folder path within the sensors folder. This field supports search patterns such as: 'table/volume/row_*', '*count', 'table/volume/prefix*_suffix'. | string |
connection | The connection (data source) name. Supports search patterns in the format: 'source*', '*_prod', 'prefix*suffix'. | string |
full_table_name | The schema and table name. It is provided as <schema_name>.<table_name>, for example public.fact_sales. The schema and table name accept patterns both in the schema name and table name parts. Sample patterns are: 'schema_name.tab_prefix_*', 'schema_name.*', '*.*', 'schema_name.*customer', 'schema_name.tab*_suffix'. | string |
enabled | A boolean flag to target enabled tables, columns or checks. When the value of this field is not set, the default value of this field is true, targeting only tables, columns and checks that are not implicitly disabled. | boolean |
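Putting the filter fields together, a request that targets daily monitoring null checks on a single table might look like the following sketch; all values are illustrative.

```python
# Hypothetical CheckSearchFilters selecting daily monitoring checks
# in the 'nulls' category on columns of a single table.
check_search_filters_example = {
    "connection": "my_warehouse",
    "full_table_name": "public.fact_sales",
    "check_target": "column",
    "check_type": "monitoring",
    "time_scale": "daily",
    "check_category": "nulls",
    "column": "customer_*",        # column name search pattern
    "enabled": True,
}
```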
DeleteStoredDataQueueJobParameters
Parameters for the "delete stored data" queue job that deletes data from parquet files stored in the DQOps user home's .data directory.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
connection | The connection name. | string |
full_table_name | The schema and table name. It is provided as <schema_name>.<table_name>, for example public.fact_sales. This filter does not support patterns. | string |
date_start | The start date (inclusive) to delete the data, based on the time_period column in Parquet files. | date |
date_end | The end date (inclusive) to delete the data, based on the time_period column in Parquet files. | date |
delete_errors | Delete the data from the errors table. Because the default value is false, this parameter must be set to true to delete the errors. | boolean |
delete_statistics | Delete the data from the statistics table. Because the default value is false, this parameter must be set to true to delete the statistics. | boolean |
delete_check_results | Delete the data from the check_results table. Because the default value is false, this parameter must be set to true to delete the check results. | boolean |
delete_sensor_readouts | Delete the data from the sensor_readouts table. Because the default value is false, this parameter must be set to true to delete the sensor readouts. | boolean |
column_names | The list of column names to delete the data for column level results or errors only for selected columns. | List[string] |
check_category | The check category name, for example volume or anomaly. | string |
table_comparison_name | The name of a table comparison configuration. Deletes only table comparison results (and errors) for a given comparison. | string |
check_name | The name of a data quality check. Uses the short check name, for example daily_row_count. | string |
check_type | The type of checks whose results and errors should be deleted. For example, use monitoring to delete only monitoring checks data. | string |
sensor_name | The full sensor name whose results, checks based on the sensor, statistics and errors generated by the sensor should be deleted. Uses a full sensor name, for example: table/volume/row_count. | string |
data_group_tag | The names of data groups in any of the grouping_level_1...grouping_level_9 columns in the Parquet tables. Enables deleting data tagged for one data source or a subset of results when the group level is captured from a column in a monitored table. | string |
quality_dimension | The data quality dimension name, for example Timeliness or Completeness. | string |
time_gradient | The time gradient (time scale) of the sensor and check results that are captured. | string |
collector_category | The statistics collector category when statistics should be deleted. A statistics category is a group of statistics, for example sampling for the column value samples. | string |
collector_name | The statistics collector name when only statistics are deleted for a selected collector, for example sample_values. | string |
collector_target | The type of the target object for which the basic statistics are deleted. Supported values are table and column. | string |
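Because every delete_* flag defaults to false, only the explicitly enabled data types are removed. A sketch of parameters that delete only check results and sensor readouts for one table over one month is shown below; the connection, table and dates are illustrative.

```python
# Hypothetical DeleteStoredDataQueueJobParameters removing check results and
# sensor readouts for one table, limited to May 2024.
delete_stored_data_example = {
    "connection": "my_warehouse",
    "full_table_name": "public.fact_sales",
    "date_start": "2024-05-01",
    "date_end": "2024-05-31",
    "delete_check_results": True,
    "delete_sensor_readouts": True,
    "delete_errors": False,        # defaults to False, kept explicit for clarity
    "delete_statistics": False,    # defaults to False, kept explicit for clarity
}
```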
CheckTargetModel
Enumeration of possible targets for a check model request result.
The structure of this object is described below
Data type | Enum values |
---|---|
string | column table |
SimilarCheckModel
Describes a single check that is similar to other checks in other check types.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
check_target | The check target (table or column). | CheckTarget |
check_type | The check type. | CheckType |
time_scale | The time scale (daily, monthly). The time scale is optional and could be null (for profiling checks). | CheckTimeScale |
category | The check's category. | string |
check_name | The similar check name in another category. | string |
CheckModel
Model that returns the form definition and the form data to edit a single data quality check.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
check_name | Data quality check name that is used in YAML. | string |
help_text | Help text that describes the data quality check. | string |
sensor_parameters | List of fields for editing the sensor parameters. | List[FieldModel] |
sensor_name | Full sensor name. This field is for information purposes and could be used to create additional custom checks that are reusing the same data quality sensor. | string |
quality_dimension | Data quality dimension used for tagging the results of this data quality check. | string |
rule | Threshold (alerting) rules defined for a check. | RuleThresholdsModel |
supports_grouping | The data quality check supports a custom data grouping configuration. | boolean |
data_grouping_override | Data grouping configuration for this check. When a data grouping configuration is assigned at a check level, it overrides the data grouping configuration from the table level. Data grouping is configured in two cases: (1) the data in the table should be analyzed with a GROUP BY condition, to analyze different groups of rows using separate time series, for example a table contains data from multiple countries and there is a 'country' column used for partitioning. (2) a static data grouping configuration is assigned to a table, when the data is partitioned at a table level (similar tables store the same information, but for different countries, etc.). | DataGroupingConfigurationSpec |
schedule_override | Run check scheduling configuration. Specifies the schedule (a cron expression) when the data quality checks are executed by the scheduler. | MonitoringScheduleSpec |
effective_schedule | Model of configured schedule enabled on the check level. | EffectiveScheduleModel |
schedule_enabled_status | State of the scheduling override for this check. | ScheduleEnabledStatusModel |
comments | Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non-tracked comments). | CommentsListSpec |
disabled | Disables the data quality check. Only enabled checks are executed. Disable the check if it should not run, but the configuration of its sensor and rules should be preserved. | boolean |
exclude_from_kpi | Data quality check results (alerts) are included in the data quality KPI calculation by default. Set this field to true in order to exclude this data quality check from the data quality KPI calculation. | boolean |
include_in_sla | Marks the data quality check as part of a data quality SLA. The data quality SLA is a set of critical data quality checks that must always pass and are considered as a data contract for the dataset. | boolean |
configured | True if the data quality check is configured (not null). When saving the data quality check configuration, set the flag to true for storing the check. | boolean |
filter | SQL WHERE clause added to the sensor query. Both the table level filter and a sensor query filter are added, separated by an AND operator. | string |
run_checks_job_template | Configured parameters for the "check run" job that should be pushed to the job queue in order to start the job. | CheckSearchFilters |
data_clean_job_template | Configured parameters for the "data clean" job that after being supplied with a time range should be pushed to the job queue in order to remove stored results connected with this check. | DeleteStoredDataQueueJobParameters |
data_grouping_configuration | The name of a data grouping configuration defined at a table that should be used for this check. | string |
check_target | Type of the check's target (column, table). | CheckTargetModel |
configuration_requirements_errors | List of configuration errors that must be fixed before the data quality check could be executed. | List[string] |
similar_checks | List of similar checks in other check types or in other time scales. | List[SimilarCheckModel] |
can_edit | Boolean flag that decides if the current user can edit the check. | boolean |
can_run_checks | Boolean flag that decides if the current user can run checks. | boolean |
can_delete_data | Boolean flag that decides if the current user can delete data (results). | boolean |
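A heavily abbreviated CheckModel payload could look like the sketch below; most nested objects are trimmed and the check, sensor and dimension names are illustrative.

```python
# Hypothetical, abbreviated CheckModel payload.
check_model_example = {
    "check_name": "daily_nulls_count",           # illustrative check name
    "quality_dimension": "Completeness",
    "sensor_name": "column/nulls/null_count",    # illustrative full sensor name
    "sensor_parameters": [],                     # List[FieldModel]
    "rule": {                                    # RuleThresholdsModel (abbreviated)
        "error": {"configured": True, "disabled": False},
    },
    "configured": True,
    "disabled": False,
    "can_edit": True,
    "can_run_checks": True,
    "can_delete_data": True,
}
```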
QualityCategoryModel
Model that returns the form definition and the form data to edit all checks within a single category.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
category | Data quality check category name. | string |
comparison_name | The name of the reference table configuration used for a cross table data comparison (when the category is 'comparisons'). | string |
compare_to_column | The name of the column in the reference table that is compared. | string |
help_text | Help text that describes the category. | string |
checks | List of data quality checks within the category. | List[CheckModel] |
run_checks_job_template | Configured parameters for the "check run" job that should be pushed to the job queue in order to start the job. | CheckSearchFilters |
data_clean_job_template | Configured parameters for the "data clean" job that after being supplied with a time range should be pushed to the job queue in order to remove stored results connected with this quality category. | DeleteStoredDataQueueJobParameters |
CheckContainerModel
Model that returns the form definition and the form data to edit all data quality checks divided by categories.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
categories | List of all data quality categories that contain data quality checks inside. | List[QualityCategoryModel] |
effective_schedule | Model of configured schedule enabled on the check container. | EffectiveScheduleModel |
effective_schedule_enabled_status | State of the effective scheduling on the check container. | ScheduleEnabledStatusModel |
partition_by_column | The name of the column that partitioned checks will use for the time period partitioning. Important only for partitioned checks. | string |
run_checks_job_template | Configured parameters for the "check run" job that should be pushed to the job queue in order to start the job. | CheckSearchFilters |
data_clean_job_template | Configured parameters for the "data clean" job that after being supplied with a time range should be pushed to the job queue in order to remove stored results connected with this check container. | DeleteStoredDataQueueJobParameters |
can_edit | Boolean flag that decides if the current user can edit the check. | boolean |
can_run_checks | Boolean flag that decides if the current user can run checks. | boolean |
can_delete_data | Boolean flag that decides if the current user can delete data (results). | boolean |
CheckContainerTypeModel
Model identifying the check type and timescale of checks belonging to a container.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
check_type | Check type. | CheckType |
check_time_scale | Check timescale. | CheckTimeScale |
CheckTemplate
Model depicting a named data quality check that can potentially be enabled, regardless of its position in the hierarchy tree.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
check_target | Check target (table, column) | CheckTarget |
check_category | Data quality check category. | string |
check_name | Data quality check name that is used in YAML. | string |
help_text | Help text that describes the data quality check. | string |
check_container_type | Check type with time-scale. | CheckContainerTypeModel |
sensor_name | Full sensor name. | string |
sensor_parameters_definitions | List of sensor parameter field definitions. | List[ParameterDefinitionSpec] |
rule_parameters_definitions | List of parameter definitions for the threshold (alerting) rule (a single rule, regardless of severity). | List[ParameterDefinitionSpec] |
ProviderType
Data source provider type (dialect type). We will use lower case names to avoid issues with parsing, even though the enum names do not follow the Java naming convention.
The structure of this object is described below
Data type | Enum values |
---|---|
string | snowflake oracle postgresql redshift sqlserver mysql bigquery |
StatisticsCollectorTarget
The structure of this object is described below
Data type | Enum values |
---|---|
string | column table |
StatisticsCollectorSearchFilters
Hierarchy node search filters for finding enabled statistics collectors (basic profilers) to be started.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
collector_name | The target statistics collector name to capture only selected statistics. Uses the short collector name. This field supports search patterns such as: 'prefix*', '*suffix', 'prefix_*_suffix'. In order to collect only the top 10 most common column samples, use 'column_samples'. | string |
sensor_name | The target sensor name to run only data quality checks that are using this sensor. Uses the full sensor name which is the full folder path within the sensors folder. This field supports search patterns such as: 'table/volume/row_*', '*count', 'table/volume/prefix*_suffix'. | string |
collector_category | The target statistics collector category, for example: nulls, volume, sampling. | string |
target | The target type of object to collect statistics from. Supported values are: table to collect only table level statistics or column to collect only column level statistics. | StatisticsCollectorTarget |
connection | The connection (data source) name. Supports search patterns in the format: 'source*', '*_prod', 'prefix*suffix'. | string |
full_table_name | The schema and table name. It is provided as <schema_name>.<table_name>, for example public.fact_sales. The schema and table name accept patterns both in the schema name and table name parts. Sample patterns are: 'schema_name.tab_prefix_*', 'schema_name.*', '*.*', 'schema_name.*customer', 'schema_name.tab*_suffix'. | string |
enabled | A boolean flag to target enabled tables, columns or checks. When the value of this field is not set, the default value of this field is true, targeting only tables, columns and checks that are not implicitly disabled. | boolean |
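A filter that collects column-level sampling statistics for every table in one schema could be sketched as below; the connection name and schema pattern are illustrative.

```python
# Hypothetical StatisticsCollectorSearchFilters for column-level sampling statistics
# on all tables in the 'public' schema.
statistics_collector_filters_example = {
    "connection": "my_warehouse",
    "full_table_name": "public.*",          # table name search pattern
    "target": "column",                     # StatisticsCollectorTarget enum value
    "collector_category": "sampling",
    "enabled": True,
}
```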
ConnectionModel
Connection model returned by the REST API that is limited only to the basic fields, excluding nested nodes.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
connection_name | Connection name. | string |
connection_hash | Connection hash that identifies the connection using a unique hash code. | long |
parallel_runs_limit | The concurrency limit for the maximum number of parallel SQL queries executed on this connection. | integer |
provider_type | Database provider type (required). Accepts: bigquery, snowflake, etc. | ProviderType |
bigquery | BigQuery connection parameters. Specify parameters in the bigquery section. | BigQueryParametersSpec |
snowflake | Snowflake connection parameters. | SnowflakeParametersSpec |
postgresql | PostgreSQL connection parameters. | PostgresqlParametersSpec |
redshift | Redshift connection parameters. | RedshiftParametersSpec |
sqlserver | SqlServer connection parameters. | SqlServerParametersSpec |
mysql | MySQL connection parameters. | MysqlParametersSpec |
oracle | Oracle connection parameters. | OracleParametersSpec |
run_checks_job_template | Configured parameters for the "check run" job that should be pushed to the job queue in order to run all checks within this connection. | CheckSearchFilters |
run_profiling_checks_job_template | Configured parameters for the "check run" job that should be pushed to the job queue in order to run profiling checks within this connection. | CheckSearchFilters |
run_monitoring_checks_job_template | Configured parameters for the "check run" job that should be pushed to the job queue in order to run monitoring checks within this connection. | CheckSearchFilters |
run_partition_checks_job_template | Configured parameters for the "check run" job that should be pushed to the job queue in order to run partitioned checks within this connection. | CheckSearchFilters |
collect_statistics_job_template | Configured parameters for the "collect statistics" job that should be pushed to the job queue in order to run all statistics collectors within this connection. | StatisticsCollectorSearchFilters |
data_clean_job_template | Configured parameters for the "data clean" job that after being supplied with a time range should be pushed to the job queue in order to remove stored results connected with this connection. | DeleteStoredDataQueueJobParameters |
can_edit | Boolean flag that decides if the current user can update or delete the connection to the data source. | boolean |
can_collect_statistics | Boolean flag that decides if the current user can collect statistics. | boolean |
can_run_checks | Boolean flag that decides if the current user can run checks. | boolean |
can_delete_data | Boolean flag that decides if the current user can delete data (results). | boolean |
yaml_parsing_error | Optional parsing error that was captured when parsing the YAML file. This field is null when the YAML file is valid. If an error was captured, this field returns the file parsing error message and the file location. | string |
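An abbreviated ConnectionModel for a PostgreSQL data source could serialize like the sketch below; the connection name and the nested PostgresqlParametersSpec fields (host, database) are assumptions for illustration only.

```python
# Hypothetical, abbreviated ConnectionModel for a PostgreSQL data source.
connection_model_example = {
    "connection_name": "my_warehouse",
    "parallel_runs_limit": 4,
    "provider_type": "postgresql",          # ProviderType enum value
    "postgresql": {                         # PostgresqlParametersSpec (abbreviated, assumed fields)
        "host": "db.example.com",
        "database": "analytics",
    },
    "can_edit": True,
    "can_collect_statistics": True,
    "can_run_checks": True,
    "can_delete_data": True,
}
```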
DqoQueueJobId
Identifies a single job.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
job_id | Job id. | long |
job_business_key | Optional job business key that was assigned to the job. A business key is an alternative, user-assigned unique job identifier used to look up the status of a job by its business key. | string |
parent_job_id | Parent job id. Filled only for nested jobs, for example a sub-job that runs data quality checks on a single table. | DqoQueueJobId |
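A nested job identifier, such as a sub-job that runs checks on a single table, might look like the sketch below; the numeric ids and the business key are illustrative.

```python
# Hypothetical DqoQueueJobId of a child job with a parent job reference.
job_id_example = {
    "job_id": 1745678901234,                            # illustrative numeric job id
    "job_business_key": "nightly-fact-sales-checks",    # optional, user assigned
    "parent_job_id": {"job_id": 1745678900000},         # parent DqoQueueJobId
}
```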