jobs
CollectStatisticsResult
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
executed_statistics_collectors | The total count of all executed statistics collectors. | integer |
total_collectors_executed | The count of executed statistics collectors. | integer |
columns_analyzed | The count of columns for which DQOps executed a collector and tried to read the statistics. | integer |
columns_successfully_analyzed | The count of columns for which DQOps managed to obtain statistics. | integer |
total_collectors_failed | The count of statistics collectors that failed to execute. | integer |
total_collected_results | The total number of results that were collected. | integer |
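
For orientation, a minimal client-side sketch of this structure as a plain Python dataclass (an illustration, not the official DQOps client model; the `column_success_ratio` helper is added here only for the example):

```python
from dataclasses import dataclass

@dataclass
class CollectStatisticsResult:
    """Client-side mirror of the CollectStatisticsResult structure (a sketch)."""
    executed_statistics_collectors: int = 0
    total_collectors_executed: int = 0
    columns_analyzed: int = 0
    columns_successfully_analyzed: int = 0
    total_collectors_failed: int = 0
    total_collected_results: int = 0

    def column_success_ratio(self) -> float:
        # Fraction of analyzed columns for which statistics were obtained.
        if self.columns_analyzed == 0:
            return 0.0
        return self.columns_successfully_analyzed / self.columns_analyzed

# Example: statistics were obtained for 38 of 40 analyzed columns.
result = CollectStatisticsResult(columns_analyzed=40, columns_successfully_analyzed=38)
print(f"{result.column_success_ratio():.0%}")  # 95%
```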
DqoJobStatus
The status of a job on the DQOps job queue.
The structure of this object is described below
Data type | Enum values |
---|---|
string | running waiting queued cancelled cancel_requested failed succeeded |
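
When polling a job, the useful distinction is between states that can still change and states that are final. A sketch, assuming that `cancelled`, `failed` and `succeeded` are the terminal states (an interpretation of the values above, not something this page states explicitly):

```python
from enum import Enum

class DqoJobStatus(str, Enum):
    """The job statuses listed above, as a Python enum (a sketch)."""
    RUNNING = "running"
    WAITING = "waiting"
    QUEUED = "queued"
    CANCELLED = "cancelled"
    CANCEL_REQUESTED = "cancel_requested"
    FAILED = "failed"
    SUCCEEDED = "succeeded"

# Assumption: these states are final, so a polling loop can stop on them.
TERMINAL_STATES = {DqoJobStatus.CANCELLED, DqoJobStatus.FAILED, DqoJobStatus.SUCCEEDED}

def is_finished(status: DqoJobStatus) -> bool:
    return status in TERMINAL_STATES

print(is_finished(DqoJobStatus("succeeded")))  # True
```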
CollectStatisticsQueueJobResult
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
job_id | Job id that identifies a job that was started on the DQOps job queue. | DqoQueueJobId |
result | Optional result object that is returned only when the wait parameter was true and the "collect statistics" job has finished. Contains the summary result of collecting basic statistics, including the number of statistics collectors (queries) that managed to capture metrics about the table(s). | CollectStatisticsResult |
status | Job status | DqoJobStatus |
DqoRoot
DQOps root folders in the DQOps user home that may be replicated to a remote file system (uploaded to DQOps Cloud or any other cloud). It is also used as a lock scope.
The structure of this object is described below
Data type | Enum values |
---|---|
string | settings _indexes data_errors sources data_incidents credentials rules data_sensor_readouts data_statistics sensors checks data_check_results _local_settings |
ParquetPartitionId
Identifies a single partition for hive partitioned tables stored as parquet files.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
table_type | Table type. | DqoRoot |
connection_name | Connection name. | string |
table_name | Table name (schema.table). | PhysicalTableName |
month | The date of the first day of the month that identifies a monthly partition. | date |
DataDeleteResultPartition
Results of the "data delete" job for the monthly partition.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
rows_affected_count | The number of rows that were deleted from the partition. | integer |
partition_deleted | True if a whole partition (a parquet file) was deleted instead of removing only selected rows. | boolean |
DeleteStoredDataResult
Compiled results of the "data delete" job.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
partition_results | Dictionary of partitions that were deleted or updated when the rows were deleted. | Dict[ParquetPartitionId, DataDeleteResultPartition] |
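
Since `partition_results` is keyed by `ParquetPartitionId`, a client-side mirror of that key type must be hashable. A sketch with a frozen dataclass, simplifying the nested `PhysicalTableName` and `date` types to strings:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen => hashable, so instances can be dict keys
class ParquetPartitionId:
    table_type: str        # a DqoRoot value, e.g. "data_check_results"
    connection_name: str
    table_name: str        # simplified: "schema.table" instead of PhysicalTableName
    month: str             # simplified: ISO date of the first day of the month

@dataclass
class DataDeleteResultPartition:
    rows_affected_count: int
    partition_deleted: bool

partition_results: dict[ParquetPartitionId, DataDeleteResultPartition] = {
    ParquetPartitionId("data_check_results", "conn", "public.orders", "2023-04-01"):
        DataDeleteResultPartition(rows_affected_count=120, partition_deleted=False),
}

# Total rows removed across all touched partitions.
total_rows = sum(p.rows_affected_count for p in partition_results.values())
print(total_rows)  # 120
```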
DeleteStoredDataQueueJobResult
Object returned from the operation that queues a "delete stored data" job. The result contains the id of the started job and, optionally, a dictionary of partitions that were cleared or deleted if the operation was started with the wait=true parameter to wait for the "delete stored data" job to finish.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
job_id | Job id that identifies a job that was started on the DQOps job queue. | DqoQueueJobId |
result | Optional result object that is returned only when the wait parameter was true and the "delete stored data" job has finished. Contains a list of partitions that were deleted or updated. | DeleteStoredDataResult |
status | Job status | DqoJobStatus |
DqoJobType
Job type that identifies the type of a job on the queue.
The structure of this object is described below
Data type | Enum values |
---|---|
string | delete_stored_data import_schema repair_stored_data run_scheduled_checks_cron synchronize_folder run_checks_on_table run_checks collect_statistics synchronize_multiple_folders queue_thread_shutdown collect_statistics_on_table import_tables |
FileSynchronizationDirection
Data synchronization direction between a local DQOps Home and DQOps Cloud data quality data warehouse.
The structure of this object is described below
Data type | Enum values |
---|---|
string | download upload full |
SynchronizeRootFolderParameters
Parameter object for starting a file synchronization job. Identifies the folder and direction that should be synchronized.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
folder | | DqoRoot |
direction | | FileSynchronizationDirection |
force_refresh_native_table | | boolean |
SynchronizeRootFolderDqoQueueJobParameters
Parameters object for a job that synchronizes one folder with DQOps Cloud.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
synchronization_parameter | | SynchronizeRootFolderParameters |
SynchronizeMultipleFoldersDqoQueueJobParameters
Simple object for starting multiple folder synchronization jobs with the same configuration.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
direction | File synchronization direction, the default is full synchronization (push local changes and pull other changes from DQOps Cloud). | FileSynchronizationDirection |
force_refresh_native_tables | Force full refresh of native tables in the data quality data warehouse. The default synchronization mode is to refresh only modified data. | boolean |
detect_cron_schedules | Scans the yaml files (with the configuration for connections and tables) and detects new cron schedules. Detected cron schedules are registered in the cron (Quartz) job scheduler. | boolean |
sources | Synchronize the "sources" folder. | boolean |
sensors | Synchronize the "sensors" folder. | boolean |
rules | Synchronize the "rules" folder. | boolean |
checks | Synchronize the "checks" folder. | boolean |
settings | Synchronize the "settings" folder. | boolean |
credentials | Synchronize the ".credentials" folder. | boolean |
data_sensor_readouts | Synchronize the ".data/sensor_readouts" folder. | boolean |
data_check_results | Synchronize the ".data/check_results" folder. | boolean |
data_statistics | Synchronize the ".data/statistics" folder. | boolean |
data_errors | Synchronize the ".data/errors" folder. | boolean |
data_incidents | Synchronize the ".data/incidents" folder. | boolean |
synchronize_folder_with_local_changes | Synchronize all folders that have local changes. When this field is set to true, there is no need to enable synchronization of single folders because DQOps will decide which folders need synchronization (to be pushed to the cloud). | boolean |
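
As an illustration, the request body for this job could be assembled as below; the snake_case field names follow the table above, but the exact wire casing and endpoint should be verified against the REST API documentation of your DQOps instance:

```python
import json

# Letting DQOps pick the folders that have local changes means the
# per-folder boolean flags can be omitted entirely.
sync_parameters = {
    "direction": "full",                          # a FileSynchronizationDirection value
    "force_refresh_native_tables": False,
    "detect_cron_schedules": True,
    "synchronize_folder_with_local_changes": True,
}
print(json.dumps(sync_parameters, indent=2))
```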
TimeWindowFilterParameters
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
daily_partitioning_recent_days | The number of recent days to analyze incrementally by daily partitioned data quality checks. | integer |
daily_partitioning_include_today | Also analyze today and later days when running daily partitioned checks. By default, daily partitioned checks do not analyze today or future dates. Setting this to true disables filtering of the end dates. | boolean |
monthly_partitioning_recent_months | The number of recent months to analyze incrementally by monthly partitioned data quality checks. | integer |
monthly_partitioning_include_current_month | Also analyze the current month and later months when running monthly partitioned checks. By default, monthly partitioned checks do not analyze the current month or future months. Setting this to true disables filtering of the end dates. | boolean |
from_date | Analyze the data since the given date (inclusive). The date should be an ISO 8601 date (yyyy-MM-dd). The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the beginning date overrides recent days and recent months. | date |
from_date_time | Analyze the data since the given date and time (inclusive). The date and time should be an ISO 8601 local date and time without the time zone (yyyy-MM-dd HH:mm:ss). The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the beginning date and time overrides recent days and recent months. | datetime |
to_date | Analyze the data until the given date (exclusive; the given date and the following dates are not analyzed). The date should be an ISO 8601 date (yyyy-MM-dd). The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the end date overrides the parameters that disable analyzing today or the current month. | date |
to_date_time | Analyze the data until the given date and time (exclusive). The date and time should be an ISO 8601 local date and time without the time zone (yyyy-MM-dd HH:mm:ss). The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the end date and time overrides the parameters that disable analyzing today or the current month. | datetime |
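
Two hedged examples of how this filter might be populated, using the field names from the table above (serialization simplified to plain dictionaries):

```python
from datetime import date

# Incremental mode: analyze only the last 7 daily partitions, keeping the
# default behavior of excluding today.
recent_window = {
    "daily_partitioning_recent_days": 7,
    "daily_partitioning_include_today": False,
}

# Explicit mode: from_date is inclusive and to_date is exclusive, so this
# window covers April 2023 only and overrides the recent days/months settings.
explicit_window = {
    "from_date": date(2023, 4, 1).isoformat(),  # "2023-04-01"
    "to_date": date(2023, 5, 1).isoformat(),    # "2023-05-01"
}
```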
RunChecksResult
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
highest_severity | The highest check severity for the data quality checks executed in this batch. | RuleSeverityLevel |
executed_checks | The total count of all executed checks. | integer |
valid_results | The total count of all checks that finished successfully (with no data quality issues). | integer |
warnings | The total count of all invalid data quality checks that finished raising a warning. | integer |
errors | The total count of all invalid data quality checks that finished raising an error. | integer |
fatals | The total count of all invalid data quality checks that finished raising a fatal error. | integer |
execution_errors | The total number of checks that failed to execute due to some execution errors. | integer |
RunChecksParameters
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
check_search_filters | Target data quality checks filter. | CheckSearchFilters |
time_window_filter | Optional time window filter, configures the time range that is analyzed or the number of recent days/months to analyze for day or month partitioned data. | TimeWindowFilterParameters |
dummy_execution | Set the value to true when the data quality checks should be executed in a dummy mode (without running the checks on the target systems or storing the results). Only the Jinja2 sensors will be rendered. | boolean |
run_checks_result | The result of running the checks, updated when the "run checks" job finishes. Contains the count of executed checks. | RunChecksResult |
RunChecksOnTableParameters
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
connection | The name of the target connection. | string |
max_jobs_per_connection | The maximum number of concurrent 'run checks on table' jobs that could be run on this connection. Limits the number of concurrent jobs. | integer |
table | The full physical name (schema.table) of the target table. | PhysicalTableName |
check_search_filters | Target data quality checks filter. | CheckSearchFilters |
time_window_filter | Optional time window filter, configures the time range that is analyzed or the number of recent days/months to analyze for day or month partitioned data. | TimeWindowFilterParameters |
dummy_execution | Set the value to true when the data quality checks should be executed in a dummy mode (without running the checks on the target systems or storing the results). Only the Jinja2 sensors will be rendered. | boolean |
run_checks_result | The result of running the checks, updated when the "run checks" job finishes. Contains the count of executed checks. | RunChecksResult |
StatisticsDataScope
Enumeration of possible statistics scopes. "table" - a whole table was profiled, "data_group" - groups of rows were profiled.
The structure of this object is described below
Data type | Enum values |
---|---|
string | data_group table |
CollectStatisticsQueueJobParameters
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
statistics_collector_search_filters | Statistics collectors search filters that identify the type of statistics collector to run. | StatisticsCollectorSearchFilters |
data_scope | The target scope of collecting statistics. Statistics could be collected on a whole table or for each data grouping separately. | StatisticsDataScope |
dummy_sensor_execution | Boolean flag that enables a dummy statistics collection (sensors are executed, but the statistics results are not written to the parquet files). | boolean |
collect_statistics_result | The summary of the statistics collection job after it finished. Returns the number of collectors executed, columns analyzed, and statistics results captured. | CollectStatisticsResult |
CollectStatisticsOnTableQueueJobParameters
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
connection | The name of the target connection. | string |
max_jobs_per_connection | The maximum number of concurrent 'collect statistics on table' jobs that could be run on this connection. Limits the number of concurrent jobs. | integer |
table | The full physical name (schema.table) of the target table. | PhysicalTableName |
statistics_collector_search_filters | Statistics collectors search filters that identify the type of statistics collector to run. | StatisticsCollectorSearchFilters |
data_scope | The target scope of collecting statistics. Statistics could be collected on a whole table or for each data grouping separately. | StatisticsDataScope |
dummy_sensor_execution | Boolean flag that enables a dummy statistics collection (sensors are executed, but the statistics results are not written to the parquet files). | boolean |
collect_statistics_result | The summary of the statistics collection job after it finished. Returns the number of collectors executed, columns analyzed, and statistics results captured. | CollectStatisticsResult |
ImportSchemaQueueJobParameters
Parameters for the ImportSchemaQueueJob job that imports tables from a database.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
connection_name | | string |
schema_name | | string |
table_name_pattern | | string |
ImportTablesQueueJobParameters
Parameters for the ImportTablesQueueJob job that imports selected tables from the source database.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
connection_name | Connection name | string |
schema_name | Schema name | string |
table_names | Optional list of table names inside the schema. When the list of tables is empty, all tables are imported. | List[string] |
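
A sketch of the parameters for a table import, using the documented field names (the connection and table names here are hypothetical):

```python
import_parameters = {
    "connection_name": "sales_dwh",   # hypothetical connection name
    "schema_name": "public",
    # An empty or omitted table_names list would import all tables in the schema.
    "table_names": ["orders", "customers"],
}
```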
RepairStoredDataQueueJobParameters
Parameters for the RepairStoredDataQueueJob job that repairs data stored in the user's ".data" directory.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
connection_name | | string |
schema_table_name | | string |
repair_errors | | boolean |
repair_statistics | | boolean |
repair_check_results | | boolean |
repair_sensor_readouts | | boolean |
DqoJobEntryParametersModel
Model object returned to UI that has typed fields for each supported job parameter type.
The structure of this object is described below
DqoJobHistoryEntryModel
Model of a single job that was scheduled or has finished. It is stored in the job monitoring service on the history list.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
job_id | | DqoQueueJobId |
job_type | | DqoJobType |
parameters | | DqoJobEntryParametersModel |
status | | DqoJobStatus |
error_message | | string |
DqoJobChangeModel
Describes a change to the job status or the job queue (such as a new job was added).
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
status | | DqoJobStatus |
job_id | | DqoQueueJobId |
change_sequence | | long |
updated_model | | DqoJobHistoryEntryModel |
FolderSynchronizationStatus
Enumeration of statuses that identify the synchronization status for each folder that could be synchronized to DQOps Cloud.
The structure of this object is described below
Data type | Enum values |
---|---|
string | synchronizing unchanged changed |
CloudSynchronizationFoldersStatusModel
Model that describes the current synchronization status for each folder.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
sources | The synchronization status of the "sources" folder. | FolderSynchronizationStatus |
sensors | The synchronization status of the "sensors" folder. | FolderSynchronizationStatus |
rules | The synchronization status of the "rules" folder. | FolderSynchronizationStatus |
checks | The synchronization status of the "checks" folder. | FolderSynchronizationStatus |
settings | The synchronization status of the "settings" folder. | FolderSynchronizationStatus |
credentials | The synchronization status of the ".credentials" folder. | FolderSynchronizationStatus |
data_sensor_readouts | The synchronization status of the ".data/sensor_readouts" folder. | FolderSynchronizationStatus |
data_check_results | The synchronization status of the ".data/check_results" folder. | FolderSynchronizationStatus |
data_statistics | The synchronization status of the ".data/statistics" folder. | FolderSynchronizationStatus |
data_errors | The synchronization status of the ".data/errors" folder. | FolderSynchronizationStatus |
data_incidents | The synchronization status of the ".data/incidents" folder. | FolderSynchronizationStatus |
DqoJobQueueIncrementalSnapshotModel
Job history snapshot model that returns only changes after a given change sequence.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
job_changes | | List[DqoJobChangeModel] |
folder_synchronization_status | | CloudSynchronizationFoldersStatusModel |
last_sequence_number | | long |
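
Together with the initial snapshot model below, this enables a change-feed pattern: fetch the full snapshot once, then repeatedly request only the changes after the last seen `last_sequence_number`. A sketch; the endpoint paths are assumptions, so check the Swagger UI of your DQOps instance:

```python
import time
import requests

BASE_URL = "http://localhost:8888"  # hypothetical DQOps instance address

def follow_job_changes() -> None:
    # Hypothetical endpoint paths; verify them against your instance's jobs API.
    snapshot = requests.get(f"{BASE_URL}/api/jobs/jobs").json()
    sequence = snapshot["last_sequence_number"]
    while True:
        incremental = requests.get(
            f"{BASE_URL}/api/jobs/jobchangessince/{sequence}"
        ).json()
        for change in incremental["job_changes"]:
            print(change["job_id"], change["status"])
        sequence = incremental["last_sequence_number"]
        time.sleep(5)  # poll every few seconds
```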
DqoJobQueueInitialSnapshotModel
Returns the current snapshot of running jobs.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
jobs | | List[DqoJobHistoryEntryModel] |
folder_synchronization_status | | CloudSynchronizationFoldersStatusModel |
last_sequence_number | | long |
ImportTablesResult
Result object from the ImportTablesQueueJob table import job that returns the list of tables that have been imported.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
source_table_specs | Table schemas (including column schemas) of imported tables. | List[TableSpec] |
ImportTablesQueueJobResult
Object returned from the operation that queues a "import tables" job. The result contains the job id that was started and optionally can also contain the result of importing tables if the operation was started with wait=true parameter to wait for the "import tables" job to finish.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
job_id | Job id that identifies a job that was started on the DQOps job queue. | DqoQueueJobId |
result | Optional result object that is returned only when the wait parameter was true and the "import tables" job has finished. Contains the summary result of importing tables, including table and column schemas of imported tables. | ImportTablesResult |
status | Job status | DqoJobStatus |
RunChecksQueueJobResult
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
job_id | Job id that identifies a job that was started on the DQOps job queue. | DqoQueueJobId |
result | Optional result object that is returned only when the wait parameter was true and the "run checks" job has finished. Contains the summary result of the data quality checks executed, including the severity of the most severe issue detected. The calling code (the data pipeline) can decide if further processing should be continued. | RunChecksResult |
status | Job status | DqoJobStatus |
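
Because the result carries `highest_severity`, a data pipeline that started the job with wait=true can gate its next steps on it. A sketch; the severity names mirror `RuleSeverityLevel` and the cut-off is a policy choice, not something DQOps mandates:

```python
SEVERITY_ORDER = {"valid": 0, "warning": 1, "error": 2, "fatal": 3}

def should_continue_pipeline(job_result: dict) -> bool:
    """Decide whether a data pipeline may proceed, based on a
    RunChecksQueueJobResult returned with wait=true (a sketch)."""
    if job_result["status"] != "succeeded":
        return False  # the job failed, was cancelled, or is still running
    highest = job_result["result"]["highest_severity"]
    # Stop the pipeline on "fatal"; tolerate warnings and errors here.
    return SEVERITY_ORDER.get(highest, 3) < SEVERITY_ORDER["fatal"]

# Example shape of a finished job result:
example = {
    "status": "succeeded",
    "result": {"highest_severity": "warning", "executed_checks": 24},
}
print(should_continue_pipeline(example))  # True
```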
SynchronizeMultipleFoldersQueueJobResult
Object returned from the operation that queues a "synchronize multiple folders" job. The result contains the id of the started job and, optionally, the job finish status if the operation was started with the wait=true parameter to wait for the "synchronize multiple folders" job to finish.
The structure of this object is described below
Property name | Description | Data type |
---|---|---|
job_id | Job id that identifies a job that was started on the DQOps job queue. | DqoQueueJobId |
status | Job status | DqoJobStatus |
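
Finally, queuing the synchronization job and waiting for it could look like the sketch below; the endpoint path and the wait parameter handling are assumptions to verify against your instance's API:

```python
import requests

BASE_URL = "http://localhost:8888"  # hypothetical DQOps instance address

# Hypothetical endpoint path; wait=true asks DQOps to block until the job finishes.
response = requests.post(
    f"{BASE_URL}/api/jobs/synchronize",
    params={"wait": "true"},
    json={"direction": "full", "synchronize_folder_with_local_changes": True},
)
job_result = response.json()
print(job_result["job_id"], job_result["status"])  # e.g. ... succeeded
```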