jobs

CollectStatisticsResult

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| executed_statistics_collectors | The total count of all executed statistics collectors. | integer |
| total_collectors_executed | The count of executed statistics collectors. | integer |
| columns_analyzed | The count of columns for which DQOps executed a collector and tried to read the statistics. | integer |
| columns_successfully_analyzed | The count of columns for which DQOps managed to obtain statistics. | integer |
| total_collectors_failed | The count of statistics collectors that failed to execute. | integer |
| total_collected_results | The total number of results that were collected. | integer |

DqoJobStatus

Job status of a job on the queue.

The structure of this object is described below

| Data type | Enum values |
|-----------|-------------|
| string | running<br/>waiting<br/>queued<br/>cancelled<br/>cancel_requested<br/>failed<br/>succeeded |
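
The queue statuses above can be split into terminal states (the job will not change anymore) and in-flight states. A minimal sketch of such a helper; the status strings come from the DqoJobStatus table, while the helper itself is illustrative and not part of any DQOps client:

```python
# Terminal states from the DqoJobStatus enumeration above; a job in one
# of these states will not transition again.
TERMINAL_STATUSES = {"cancelled", "failed", "succeeded"}


def is_terminal(status: str) -> bool:
    """Return True when a DqoJobStatus value is a final state."""
    return status in TERMINAL_STATUSES


print(is_terminal("running"))    # False
print(is_terminal("succeeded"))  # True
```

A polling loop would typically keep re-reading the job status until `is_terminal` returns True.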

CollectStatisticsQueueJobResult

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| job_id | Job id that identifies a job that was started on the DQOps job queue. | DqoQueueJobId |
| result | Optional result object that is returned only when the wait parameter was true and the "collect statistics" job has finished. Contains the summary result of collecting basic statistics, including the number of statistics collectors (queries) that managed to capture metrics about the table(s). | CollectStatisticsResult |
| status | Job status | DqoJobStatus |
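
A sketch of handling a CollectStatisticsQueueJobResult returned by a call made with wait=true. The field names follow the tables above; the JSON values and the shape of DqoQueueJobId are made up for illustration:

```python
import json

# Hypothetical response body; only the property names come from the
# CollectStatisticsQueueJobResult / CollectStatisticsResult tables above.
response_body = json.loads("""
{
  "job_id": {"jobId": 123},
  "result": {
    "executed_statistics_collectors": 12,
    "columns_analyzed": 10,
    "columns_successfully_analyzed": 9,
    "total_collectors_failed": 0,
    "total_collected_results": 48
  },
  "status": "succeeded"
}
""")

# The result object is present only when wait=true and the job finished.
if response_body["status"] == "succeeded" and response_body.get("result"):
    summary = response_body["result"]
    print(f"collected {summary['total_collected_results']} results "
          f"from {summary['columns_analyzed']} columns")
```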

DqoRoot

DQOps root folders in the DQOps user home that may be replicated to a remote file system (uploaded to DQOps Cloud or any other cloud). It is also used as a lock scope.

The structure of this object is described below

| Data type | Enum values |
|-----------|-------------|
| string | settings<br/>_indexes<br/>data_errors<br/>sources<br/>data_incidents<br/>credentials<br/>rules<br/>data_sensor_readouts<br/>data_statistics<br/>sensors<br/>checks<br/>data_check_results<br/>_local_settings |

ParquetPartitionId

Identifies a single partition for hive partitioned tables stored as parquet files.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| table_type | Table type. | DqoRoot |
| connection_name | Connection name. | string |
| table_name | Table name (schema.table). | PhysicalTableName |
| month | The date of the first day of the month that identifies a monthly partition. | date |

DataDeleteResultPartition

Results of the "data delete" job for the monthly partition.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| rows_affected_count | The number of rows that were deleted from the partition. | integer |
| partition_deleted | True if a whole partition (a parquet file) was deleted instead of removing only selected rows. | boolean |

DeleteStoredDataResult

Compiled results of the "data delete".

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| partition_results | Dictionary of partitions that were deleted or updated when the rows were deleted. | Dict[ParquetPartitionId, DataDeleteResultPartition] |

DeleteStoredDataQueueJobResult

Object returned from the operation that queues a "delete stored data" job. The result contains the job id that was started and optionally can also contain a dictionary of partitions that were cleared or deleted if the operation was started with wait=true parameter to wait for the "delete stored data" job to finish.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| job_id | Job id that identifies a job that was started on the DQOps job queue. | DqoQueueJobId |
| result | Optional result object that is returned only when the wait parameter was true and the "delete stored data" job has finished. Contains a list of partitions that were deleted or updated. | DeleteStoredDataResult |
| status | Job status | DqoJobStatus |
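
A sketch of summarizing a DeleteStoredDataResult. The per-partition field names come from the DataDeleteResultPartition table above; the serialized form of the ParquetPartitionId map keys is an assumption, since this section does not show how they are rendered in JSON:

```python
# Hypothetical "delete stored data" result; the map keys are made up.
delete_result = {
    "partition_results": {
        "data_check_results/conn/public.orders/2024-01-01": {
            "rows_affected_count": 120,
            "partition_deleted": False,   # only selected rows removed
        },
        "data_check_results/conn/public.orders/2024-02-01": {
            "rows_affected_count": 300,
            "partition_deleted": True,    # the whole parquet file dropped
        },
    }
}

partitions = delete_result["partition_results"]
total_rows = sum(p["rows_affected_count"] for p in partitions.values())
whole_partitions = sum(1 for p in partitions.values() if p["partition_deleted"])
print(total_rows, whole_partitions)  # 420 1
```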

DqoJobType

Job type that identifies a job by type.

The structure of this object is described below

| Data type | Enum values |
|-----------|-------------|
| string | delete_stored_data<br/>import_schema<br/>repair_stored_data<br/>run_scheduled_checks_cron<br/>synchronize_folder<br/>run_checks_on_table<br/>run_checks<br/>collect_statistics<br/>synchronize_multiple_folders<br/>queue_thread_shutdown<br/>collect_statistics_on_table<br/>import_tables |

FileSynchronizationDirection

Data synchronization direction between a local DQOps Home and DQOps Cloud data quality data warehouse.

The structure of this object is described below

| Data type | Enum values |
|-----------|-------------|
| string | download<br/>upload<br/>full |

SynchronizeRootFolderParameters

Parameter object for starting a file synchronization job. Identifies the folder and direction that should be synchronized.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| folder | | DqoRoot |
| direction | | FileSynchronizationDirection |
| force_refresh_native_table | | boolean |

SynchronizeRootFolderDqoQueueJobParameters

Parameters object for a job that synchronizes one folder with DQOps Cloud.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| synchronization_parameter | | SynchronizeRootFolderParameters |

SynchronizeMultipleFoldersDqoQueueJobParameters

Simple object for starting multiple folder synchronization jobs with the same configuration.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| direction | File synchronization direction, the default is full synchronization (push local changes and pull other changes from DQOps Cloud). | FileSynchronizationDirection |
| force_refresh_native_tables | Force full refresh of native tables in the data quality data warehouse. The default synchronization mode is to refresh only modified data. | boolean |
| detect_cron_schedules | Scans the yaml files (with the configuration for connections and tables) and detects new cron schedules. Detected cron schedules are registered in the cron (Quartz) job scheduler. | boolean |
| sources | Synchronize the "sources" folder. | boolean |
| sensors | Synchronize the "sensors" folder. | boolean |
| rules | Synchronize the "rules" folder. | boolean |
| checks | Synchronize the "checks" folder. | boolean |
| settings | Synchronize the "settings" folder. | boolean |
| credentials | Synchronize the ".credentials" folder. | boolean |
| data_sensor_readouts | Synchronize the ".data/sensor_readouts" folder. | boolean |
| data_check_results | Synchronize the ".data/check_results" folder. | boolean |
| data_statistics | Synchronize the ".data/statistics" folder. | boolean |
| data_errors | Synchronize the ".data/errors" folder. | boolean |
| data_incidents | Synchronize the ".data/incidents" folder. | boolean |
| synchronize_folder_with_local_changes | Synchronize all folders that have local changes. When this field is set to true, there is no need to enable synchronization of single folders because DQOps will decide which folders need synchronization (to be pushed to the cloud). | boolean |
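
A sketch of building a SynchronizeMultipleFoldersDqoQueueJobParameters payload. Only the field names come from the table above; the chosen values and the way the payload is submitted are assumptions:

```python
# Build a "synchronize multiple folders" request body. Instead of
# enabling each per-folder flag, let DQOps pick the folders that have
# local changes via synchronize_folder_with_local_changes.
sync_params = {
    "direction": "full",                 # push local + pull remote changes
    "force_refresh_native_tables": False,  # refresh only modified data
    "detect_cron_schedules": True,       # re-scan yaml files for schedules
    "synchronize_folder_with_local_changes": True,
}

print(sorted(sync_params))
```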

TimeWindowFilterParameters

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| daily_partitioning_recent_days | The number of recent days to analyze incrementally by daily partitioned data quality checks. | integer |
| daily_partitioning_include_today | Analyze also today and later days when running daily partitioned checks. By default, daily partitioned checks will not analyze today and future dates. Setting true will disable filtering the end dates. | boolean |
| monthly_partitioning_recent_months | The number of recent months to analyze incrementally by monthly partitioned data quality checks. | integer |
| monthly_partitioning_include_current_month | Analyze also the current month and later months when running monthly partitioned checks. By default, monthly partitioned checks will not analyze the current month and future months. Setting true will disable filtering the end dates. | boolean |
| from_date | Analyze the data since the given date (inclusive). The date should be an ISO 8601 date (yyyy-MM-dd). The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the beginning date overrides recent days and recent months. | date |
| from_date_time | Analyze the data since the given date and time (inclusive). The date and time should be an ISO 8601 local date and time without the time zone (yyyy-MM-dd HH:mm:ss). The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the beginning date and time overrides recent days and recent months. | datetime |
| to_date | Analyze the data until the given date (exclusive, the given date and the following dates are not analyzed). The date should be an ISO 8601 date (yyyy-MM-dd). The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the end date overrides the parameters to disable analyzing today or the current month. | date |
| to_date_time | Analyze the data until the given date and time (exclusive). The date and time should be an ISO 8601 local date and time without the time zone (yyyy-MM-dd HH:mm:ss). The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the end date and time overrides the parameters to disable analyzing today or the current month. | datetime |
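
A sketch of a TimeWindowFilterParameters payload that pins the analyzed window to the last seven days explicitly. Field names follow the table above; per the descriptions, setting from_date/to_date overrides the recent days/months counters:

```python
from datetime import date, timedelta

# Analyze the last 7 full days: from_date is inclusive, to_date is
# exclusive, and both are ISO 8601 dates (yyyy-MM-dd).
today = date.today()
time_window = {
    "from_date": (today - timedelta(days=7)).isoformat(),
    "to_date": today.isoformat(),
}

print(time_window)
```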

RunChecksResult

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| highest_severity | The highest check severity for the data quality checks executed in this batch. | RuleSeverityLevel |
| executed_checks | The total count of all executed checks. | integer |
| valid_results | The total count of all checks that finished successfully (with no data quality issues). | integer |
| warnings | The total count of all invalid data quality checks that finished raising a warning. | integer |
| errors | The total count of all invalid data quality checks that finished raising an error. | integer |
| fatals | The total count of all invalid data quality checks that finished raising a fatal error. | integer |
| execution_errors | The total number of checks that failed to execute due to some execution errors. | integer |

RunChecksParameters

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| check_search_filters | Target data quality checks filter. | CheckSearchFilters |
| time_window_filter | Optional time window filter, configures the time range that is analyzed or the number of recent days/months to analyze for day or month partitioned data. | TimeWindowFilterParameters |
| dummy_execution | Set the value to true when the data quality checks should be executed in a dummy mode (without running checks on the target systems and storing the results). Only the jinja2 sensors will be rendered. | boolean |
| run_checks_result | The result of running the check, updated when the run checks job finishes. Contains the count of executed checks. | RunChecksResult |

RunChecksOnTableParameters

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| connection | The name of the target connection. | string |
| max_jobs_per_connection | The maximum number of concurrent 'run checks on table' jobs that could be run on this connection. Limits the number of concurrent jobs. | integer |
| table | The full physical name (schema.table) of the target table. | PhysicalTableName |
| check_search_filters | Target data quality checks filter. | CheckSearchFilters |
| time_window_filter | Optional time window filter, configures the time range that is analyzed or the number of recent days/months to analyze for day or month partitioned data. | TimeWindowFilterParameters |
| dummy_execution | Set the value to true when the data quality checks should be executed in a dummy mode (without running checks on the target systems and storing the results). Only the jinja2 sensors will be rendered. | boolean |
| run_checks_result | The result of running the check, updated when the run checks job finishes. Contains the count of executed checks. | RunChecksResult |

StatisticsDataScope

Enumeration of possible statistics scopes. "table" - a whole table was profiled, "data_group" - groups of rows were profiled separately.

The structure of this object is described below

| Data type | Enum values |
|-----------|-------------|
| string | data_group<br/>table |

CollectStatisticsQueueJobParameters

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| statistics_collector_search_filters | Statistics collectors search filters that identify the type of statistics collector to run. | StatisticsCollectorSearchFilters |
| data_scope | The target scope of collecting statistics. Statistics could be collected on a whole table or for each data grouping separately. | StatisticsDataScope |
| dummy_sensor_execution | Boolean flag that enables a dummy statistics collection (sensors are executed, but the statistics results are not written to the parquet files). | boolean |
| collect_statistics_result | The summary of the statistics collection job after it finished. Returns the number of collectors analyzed, columns analyzed, statistics results captured. | CollectStatisticsResult |

CollectStatisticsOnTableQueueJobParameters

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| connection | The name of the target connection. | string |
| max_jobs_per_connection | The maximum number of concurrent 'collect statistics on table' jobs that could be run on this connection. Limits the number of concurrent jobs. | integer |
| table | The full physical name (schema.table) of the target table. | PhysicalTableName |
| statistics_collector_search_filters | Statistics collectors search filters that identify the type of statistics collector to run. | StatisticsCollectorSearchFilters |
| data_scope | The target scope of collecting statistics. Statistics could be collected on a whole table or for each data grouping separately. | StatisticsDataScope |
| dummy_sensor_execution | Boolean flag that enables a dummy statistics collection (sensors are executed, but the statistics results are not written to the parquet files). | boolean |
| collect_statistics_result | The summary of the statistics collection job after it finished. Returns the number of collectors analyzed, columns analyzed, statistics results captured. | CollectStatisticsResult |

ImportSchemaQueueJobParameters

Parameters for the ImportSchemaQueueJob job that imports tables from a database.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| connection_name | | string |
| schema_name | | string |
| table_name_pattern | | string |

ImportTablesQueueJobParameters

Parameters for the ImportTablesQueueJob job that imports selected tables from the source database.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| connection_name | Connection name | string |
| schema_name | Schema name | string |
| table_names | Optional list of table names inside the schema. When the list of tables is empty, all tables are imported. | List[string] |
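
A sketch of an ImportTablesQueueJobParameters payload. The property names come from the table above; the connection, schema, and table names are made up:

```python
# Import only two selected tables from one schema. Per the description
# above, leaving table_names empty would import every table in the schema.
import_params = {
    "connection_name": "my_postgres",   # hypothetical connection name
    "schema_name": "public",
    "table_names": ["orders", "customers"],
}

print(len(import_params["table_names"]))  # 2
```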

RepairStoredDataQueueJobParameters

Parameters for the RepairStoredDataQueueJob job that repairs data stored in user's ".data" directory.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| connection_name | | string |
| schema_table_name | | string |
| repair_errors | | boolean |
| repair_statistics | | boolean |
| repair_check_results | | boolean |
| repair_sensor_readouts | | boolean |

DqoJobEntryParametersModel

Model object returned to UI that has typed fields for each supported job parameter type.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| synchronize_root_folder_parameters | | SynchronizeRootFolderDqoQueueJobParameters |
| synchronize_multiple_folders_parameters | | SynchronizeMultipleFoldersDqoQueueJobParameters |
| run_scheduled_checks_parameters | | MonitoringScheduleSpec |
| run_checks_parameters | | RunChecksParameters |
| run_checks_on_table_parameters | | RunChecksOnTableParameters |
| collect_statistics_parameters | | CollectStatisticsQueueJobParameters |
| collect_statistics_on_table_parameters | | CollectStatisticsOnTableQueueJobParameters |
| import_schema_parameters | | ImportSchemaQueueJobParameters |
| import_table_parameters | | ImportTablesQueueJobParameters |
| delete_stored_data_parameters | | DeleteStoredDataQueueJobParameters |
| repair_stored_data_parameters | | RepairStoredDataQueueJobParameters |

DqoJobHistoryEntryModel

Model of a single job that was scheduled or has finished. It is stored in the job monitoring service on the history list.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| job_id | | DqoQueueJobId |
| job_type | | DqoJobType |
| parameters | | DqoJobEntryParametersModel |
| status | | DqoJobStatus |
| error_message | | string |

DqoJobChangeModel

Describes a change to the job status or the job queue (such as a new job was added).

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| status | | DqoJobStatus |
| job_id | | DqoQueueJobId |
| change_sequence | | long |
| updated_model | | DqoJobHistoryEntryModel |

FolderSynchronizationStatus

Enumeration of statuses that identify the synchronization status for each folder that could be synchronized to DQOps Cloud.

The structure of this object is described below

| Data type | Enum values |
|-----------|-------------|
| string | synchronizing<br/>unchanged<br/>changed |

CloudSynchronizationFoldersStatusModel

Model that describes the current synchronization status for each folder.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| sources | The synchronization status of the "sources" folder. | FolderSynchronizationStatus |
| sensors | The synchronization status of the "sensors" folder. | FolderSynchronizationStatus |
| rules | The synchronization status of the "rules" folder. | FolderSynchronizationStatus |
| checks | The synchronization status of the "checks" folder. | FolderSynchronizationStatus |
| settings | The synchronization status of the "settings" folder. | FolderSynchronizationStatus |
| credentials | The synchronization status of the ".credentials" folder. | FolderSynchronizationStatus |
| data_sensor_readouts | The synchronization status of the ".data/sensor_readouts" folder. | FolderSynchronizationStatus |
| data_check_results | The synchronization status of the ".data/check_results" folder. | FolderSynchronizationStatus |
| data_statistics | The synchronization status of the ".data/statistics" folder. | FolderSynchronizationStatus |
| data_errors | The synchronization status of the ".data/errors" folder. | FolderSynchronizationStatus |
| data_incidents | The synchronization status of the ".data/incidents" folder. | FolderSynchronizationStatus |

DqoJobQueueIncrementalSnapshotModel

Job history snapshot model that returns only changes after a given change sequence.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| job_changes | | List[DqoJobChangeModel] |
| folder_synchronization_status | | CloudSynchronizationFoldersStatusModel |
| last_sequence_number | | long |
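
A sketch of consuming incremental snapshots: merge each batch of job changes into a local job map and remember last_sequence_number for the next request. Only the model field names come from the tables above; how the snapshot is fetched, and the shape of DqoQueueJobId (treated as opaque here), are assumptions:

```python
def apply_snapshot(known_jobs: dict, snapshot: dict) -> int:
    """Merge a DqoJobQueueIncrementalSnapshotModel into a local job map
    and return the sequence number to request changes after next time."""
    for change in snapshot["job_changes"]:
        # Use the job id (opaque here) as the key to the latest model.
        known_jobs[str(change["job_id"])] = change["updated_model"]
    return snapshot["last_sequence_number"]


jobs = {}
snapshot = {
    "job_changes": [
        {"job_id": 42, "status": "running", "change_sequence": 7,
         "updated_model": {"status": "running"}},
    ],
    "last_sequence_number": 7,
}
next_sequence = apply_snapshot(jobs, snapshot)
print(next_sequence)  # 7
```

A caller would pass `next_sequence` when asking for the following incremental snapshot, so only newer changes are returned.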

DqoJobQueueInitialSnapshotModel

Returns the current snapshot of running jobs.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| jobs | | List[DqoJobHistoryEntryModel] |
| folder_synchronization_status | | CloudSynchronizationFoldersStatusModel |
| last_sequence_number | | long |

ImportTablesResult

Result object from the ImportTablesQueueJob table import job that returns the list of tables that have been imported.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| source_table_specs | Table schemas (including column schemas) of imported tables. | List[TableSpec] |

ImportTablesQueueJobResult

Object returned from the operation that queues an "import tables" job. The result contains the job id that was started and optionally can also contain the result of importing tables if the operation was started with the wait=true parameter to wait for the "import tables" job to finish.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| job_id | Job id that identifies a job that was started on the DQOps job queue. | DqoQueueJobId |
| result | Optional result object that is returned only when the wait parameter was true and the "import tables" job has finished. Contains the summary result of importing tables, including table and column schemas of imported tables. | ImportTablesResult |
| status | Job status | DqoJobStatus |

RunChecksQueueJobResult

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| job_id | Job id that identifies a job that was started on the DQOps job queue. | DqoQueueJobId |
| result | Optional result object that is returned only when the wait parameter was true and the "run checks" job has finished. Contains the summary result of the data quality checks executed, including the severity of the most severe issue detected. The calling code (the data pipeline) can decide if further processing should be continued. | RunChecksResult |
| status | Job status | DqoJobStatus |
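
A sketch of how a data pipeline might gate on a RunChecksQueueJobResult returned with wait=true. The severity names and their ordering are an assumption inferred from the warning/error/fatal counters in RunChecksResult; this section does not list the RuleSeverityLevel values:

```python
# Assumed ordering of RuleSeverityLevel values, lowest to highest.
SEVERITY_ORDER = {"valid": 0, "warning": 1, "error": 2, "fatal": 3}


def should_stop_pipeline(run_checks_response: dict,
                         max_allowed: str = "warning") -> bool:
    """Stop downstream processing when the job did not succeed, or when
    the most severe detected issue exceeds the allowed severity level."""
    if run_checks_response["status"] != "succeeded":
        return True
    result = run_checks_response.get("result") or {}
    highest = result.get("highest_severity", "valid")
    return SEVERITY_ORDER.get(highest, 3) > SEVERITY_ORDER[max_allowed]


sample = {"status": "succeeded",
          "result": {"highest_severity": "error", "executed_checks": 10}}
print(should_stop_pipeline(sample))  # True
```

Treating an unknown severity string as the highest level is a deliberately conservative choice for a pipeline gate.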

SynchronizeMultipleFoldersQueueJobResult

Object returned from the operation that queues a "synchronize multiple folders" job. The result contains the job id that was started and optionally can also contain the job finish status if the operation was started with wait=true parameter to wait for the "synchronize multiple folders" job to finish.

The structure of this object is described below

| Property name | Description | Data type |
|---------------|-------------|-----------|
| job_id | Job id that identifies a job that was started on the DQOps job queue. | DqoQueueJobId |
| status | Job status | DqoJobStatus |