Last updated: September 18, 2024

dqo collect command-line command

Reference for the collect command in DQOps, which groups the commands related to collecting statistics and samples.


dqo collect errorsamples

Runs data quality checks that match a given condition and collects error samples.

Description

Run data quality checks on your dataset that match a given condition and capture their error samples. The command outputs a table of results that lists the invalid values found.

Command-line synopsis

$ dqo [dqo options...] collect errorsamples [-deh] [--daily-partitioning-include-today] [-fe] [-fw]
                      [-hl] [--monthly-partitioning-include-current-month]
                      [-c=<connection>] [-cat=<checkCategory>] [-ch=<check>]
                      [-col=<column>] [-ct=<checkType>]
                      [--daily-partitioning-recent-days=<dailyPartitioningRecentDays>]
                      [--from-date=<fromDate>]
                      [--from-date-time=<fromDateTime>]
                      [--from-date-time-offset=<fromDateTimeOffset>]
                      [-m=<mode>]
                      [--monthly-partitioning-recent-months=<monthlyPartitioningRecentMonths>]
                      [-of=<outputFormat>] [-s=<sensor>]
                      [-sc=<scope>] [-t=<table>] [--to-date=<toDate>]
                      [--to-date-time=<toDateTime>]
                      [--to-date-time-offset=<toDateTimeOffset>]
                      [-ts=<timeScale>] [--where-filter=<whereFilter>]
                      [-l=<labels>]... [-tag=<tags>]...
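
As an illustration of the synopsis above, the following sketch assembles a typical invocation that limits error sampling to one connection, one table, and a column pattern. The connection name `sales_dwh` and the table `public.orders` are hypothetical, and the `RUN_DQO` variable is a convention of this example only, not a DQOps feature. By default the script only prints the command so that it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: the connection, table and column values are hypothetical.
connection="sales_dwh"     # hypothetical connection name
table="public.orders"      # hypothetical schema.table name
column="*_id"              # column name pattern, as described for --column

# Keep every option as its own quoted argument so that the shell never
# expands the * in the column pattern against local file names.
set -- dqo collect errorsamples \
    "-c=${connection}" \
    "-t=${table}" \
    "-col=${column}" \
    "-ct=monitoring" \
    "-ts=daily" \
    "-of=CSV"

# Print the assembled command for review.
echo "$*"

# Run it only when explicitly requested, for example: RUN_DQO=1 sh example.sh
if [ "${RUN_DQO:-0}" = "1" ]; then
    "$@"
fi
```

Only options listed in the synopsis are used. Drop or change any filter to widen or narrow the set of checks that are run.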

DQOps shell synopsis

dqo> collect errorsamples [-deh] [--daily-partitioning-include-today] [-fe] [-fw]
                      [-hl] [--monthly-partitioning-include-current-month]
                      [-c=<connection>] [-cat=<checkCategory>] [-ch=<check>]
                      [-col=<column>] [-ct=<checkType>]
                      [--daily-partitioning-recent-days=<dailyPartitioningRecentDays>]
                      [--from-date=<fromDate>]
                      [--from-date-time=<fromDateTime>]
                      [--from-date-time-offset=<fromDateTimeOffset>]
                      [-m=<mode>]
                      [--monthly-partitioning-recent-months=<monthlyPartitioningRecentMonths>]
                      [-of=<outputFormat>] [-s=<sensor>]
                      [-sc=<scope>] [-t=<table>] [--to-date=<toDate>]
                      [--to-date-time=<toDateTime>]
                      [--to-date-time-offset=<toDateTimeOffset>]
                      [-ts=<timeScale>] [--where-filter=<whereFilter>]
                      [-l=<labels>]... [-tag=<tags>]...

Command options

All parameters supported by the command are listed below.

| Command argument | Description | Required | Accepted values |
|---|---|---|---|
| -cat, --category | Check category name (volume, nulls, numeric, etc.) | | |
| -ch, --check | Data quality check name, supports patterns like '*_id' | | |
| -ct, --check-type | Data quality check type (profiling, monitoring, partitioned) | | profiling, monitoring, partitioned |
| -col, --column | Column name, supports patterns like '*_id' | | |
| -c, --connection | Connection name, supports patterns like 'conn*' | | |
| --daily-partitioning-include-today | Also analyze today and later days when running daily partitioned checks. By default, daily partitioned checks do not analyze today and future dates. Setting this option to true disables filtering of the end dates. | | |
| --daily-partitioning-recent-days | The number of recent days to analyze incrementally by daily partitioned data quality checks. | | |
| -tag, --data-grouping-level-tag | Data grouping hierarchy level filter (tag) | | |
| -d, --dummy | Runs data quality checks in a dummy mode: sensors are not executed on the target database, but the rest of the process is performed. | | |
| -e, --enabled | Runs only enabled or only disabled sensors. By default, only enabled sensors are executed. | | |
| -fe, --fail-on-execution-errors | Returns command status code 4 (when called from the command line) if any execution errors were raised during the execution. The default value is true. | | |
| -fw, --file-write | Write command response to a file | | |
| --from-date | Analyze the data since the given date (inclusive). The date should be an ISO 8601 date (yyyy-MM-dd). The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the beginning date overrides recent days and recent months. | | |
| --from-date-time | Analyze the data since the given date and time (inclusive). The date and time should be an ISO 8601 local date and time without the time zone (yyyy-MM-dd HH:mm:ss). The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the beginning date and time overrides recent days and recent months. | | |
| --from-date-time-offset | Analyze the data since the given date and time with a time zone offset (inclusive). The date and time should be an ISO 8601 date and time followed by a time zone offset (yyyy-MM-dd HH:mm:ss). For example: 2023-02-20 14:10:00+02. The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the beginning date and time overrides recent days and recent months. | | |
| -t, --table, --full-table-name | Full table name (schema.table), supports wildcard patterns 'sch*.tab*' | | |
| -hl, --headless | Starts DQOps in a headless mode. When DQOps runs in a headless mode and the application cannot start because the DQOps Cloud API key is missing or the DQOps user home folder is not configured, DQOps will stop silently instead of asking the user to approve the setup of the DQOps user home folder structure and/or log into DQOps Cloud. | | |
| -h, --help | Show the help for the command and parameters | | |
| -l, --label | Label filter | | |
| -m, --mode | Reporting mode (silent) | | silent |
| --monthly-partitioning-include-current-month | Also analyze the current month and later months when running monthly partitioned checks. By default, monthly partitioned checks do not analyze the current month and future months. Setting this option to true disables filtering of the end dates. | | |
| --monthly-partitioning-recent-months | The number of recent months to analyze incrementally by monthly partitioned data quality checks. | | |
| -of, --output-format | Output format for tabular responses | | TABLE, CSV, JSON |
| -sc, --scope | Error sampling scope that is used for tables with data grouping. Error samples can be collected for the whole table, or for each data grouping. By default, error samples are collected for the whole table. | | table, data_group |
| -s, --sensor | Data quality sensor name (sensor definition or sensor name), supports patterns like 'table/validity/*' | | |
| -ts, --time-scale | Time scale for monitoring and partitioned checks (daily, monthly, etc.) | | daily, monthly |
| --to-date | Analyze the data until the given date (exclusive, the given date and the following dates are not analyzed). The date should be an ISO 8601 date (yyyy-MM-dd). The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the end date overrides the parameters to disable analyzing today or the current month. | | |
| --to-date-time | Analyze the data until the given date and time (exclusive). The date and time should be an ISO 8601 local date and time without the time zone (yyyy-MM-dd HH:mm:ss). The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the end date and time overrides the parameters to disable analyzing today or the current month. | | |
| --to-date-time-offset | Analyze the data until the given date and time with a time zone offset (exclusive). The date and time should be an ISO 8601 date and time followed by a time zone offset (yyyy-MM-dd HH:mm:ss). For example: 2023-02-20 14:10:00+02. The analyzed table must have the timestamp column properly configured; it is the column that is used for filtering the date and time ranges. Setting the end date and time overrides the parameters to disable analyzing today or the current month. | | |
| --where-filter | An additional filter, which must be a valid SQL predicate (an SQL expression that returns 'true' or 'false'), that is added to the WHERE clause of the SQL query that DQOps will run on the data source. The purpose of a custom filter is to analyze only a subset of data, for example, when a new batch of records is loaded and the data quality checks are evaluated as a data contract. All the records in that batch must be tagged with the same value, and the predicate passed to find records from that batch would take the form: "{alias}.batch_id = 1". The filter can use the replacement token {alias} to reference the analyzed table. | | |
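
The date range and filter options described above can be combined in one call. The sketch below is a minimal example under stated assumptions: the connection `sales_dwh`, the table `public.orders`, the dates, and the `batch_id` value are all hypothetical, and the `RUN_DQO` variable is a convention of this example only. It shows the inclusive `--from-date`, the exclusive `--to-date`, a quoted `--where-filter` that uses the `{alias}` token, and a check for status code 4, which the option table associates with execution errors.

```shell
#!/bin/sh
# Sketch only: connection, table, dates and batch id are hypothetical.
connection="sales_dwh"
table="public.orders"
from_date="2024-09-01"   # inclusive lower bound (yyyy-MM-dd)
to_date="2024-09-08"     # exclusive upper bound: 2024-09-07 is the last day analyzed
batch_id=1

# Quoting keeps the predicate a single argument and leaves {alias}
# untouched, so that DQOps can replace it with the analyzed table alias.
where_filter='{alias}.batch_id = '"${batch_id}"

set -- dqo collect errorsamples \
    "-c=${connection}" \
    "-t=${table}" \
    "-ct=partitioned" \
    "-ts=daily" \
    "--from-date=${from_date}" \
    "--to-date=${to_date}" \
    "--where-filter=${where_filter}"

# Print the assembled command for review.
echo "$*"

# Run it only when explicitly requested, for example: RUN_DQO=1 sh example.sh
if [ "${RUN_DQO:-0}" = "1" ]; then
    "$@"
    status=$?
    # Status code 4 is documented for --fail-on-execution-errors.
    if [ "$status" -eq 4 ]; then
        echo "Execution errors were raised while collecting error samples." >&2
    fi
    exit "$status"
fi
```

Because `--to-date` is exclusive, a one-week window starting on 2024-09-01 ends with `--to-date=2024-09-08`. Setting explicit dates also overrides the recent-days and recent-months parameters, so those options are omitted here.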