Last updated: September 18, 2024

dqo collect command-line command

Reference for the collect command in DQOps, which groups the commands related to collecting statistics and samples.

dqo collect errorsamples

Runs data quality checks that match a given condition and collects error samples.

Description

Run data quality checks on your dataset that match a given condition and capture their error samples. The command output is a results table that gives insight into the invalid values that were found.
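
As an illustration of the description above, the following sketch collects error samples for daily monitoring checks on a single table and requests CSV output. The connection name `sales_dwh` and the table `public.orders` are hypothetical placeholders, not names from DQOps; the options are the ones listed in the command options table. The script prints the command first and runs it only when `dqo` is found on the PATH.

```shell
#!/bin/sh
# Hypothetical names: replace them with a connection and a table
# registered in your own DQOps instance.
CONNECTION="sales_dwh"
TABLE="public.orders"

# Build the argument list as positional parameters so that every
# option value stays a single argument.
set -- collect errorsamples \
  --connection="$CONNECTION" \
  --full-table-name="$TABLE" \
  --check-type=monitoring \
  --time-scale=daily \
  --output-format=CSV

# Print the command for review, then run it only if dqo is installed.
DQO_COMMAND="dqo $*"
echo "$DQO_COMMAND"
if command -v dqo >/dev/null 2>&1; then
  dqo "$@"
fi
```

The same arguments, without the leading `dqo`, can be typed at the `dqo>` prompt of the DQOps shell.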

Command-line synopsis

$ dqo [dqo options...] collect errorsamples [-deh] [--daily-partitioning-include-today] [-fe] [-fw]
                      [-hl] [--monthly-partitioning-include-current-month]
                      [-c=<connection>] [-cat=<checkCategory>] [-ch=<check>]
                      [-col=<column>] [-ct=<checkType>]
                      [--daily-partitioning-recent-days=<dailyPartitioningRecentDays>]
                      [--from-date=<fromDate>]
                      [--from-date-time=<fromDateTime>]
                      [--from-date-time-offset=<fromDateTimeOffset>]
                      [-m=<mode>]
                      [--monthly-partitioning-recent-months=<monthlyPartitioningRecentMonths>]
                      [-of=<outputFormat>] [-s=<sensor>]
                      [-sc=<scope>] [-t=<table>] [--to-date=<toDate>]
                      [--to-date-time=<toDateTime>]
                      [--to-date-time-offset=<toDateTimeOffset>]
                      [-ts=<timeScale>] [--where-filter=<whereFilter>]
                      [-l=<labels>]... [-tag=<tags>]...

DQOps shell synopsis

dqo> collect errorsamples [-deh] [--daily-partitioning-include-today] [-fe] [-fw]
                      [-hl] [--monthly-partitioning-include-current-month]
                      [-c=<connection>] [-cat=<checkCategory>] [-ch=<check>]
                      [-col=<column>] [-ct=<checkType>]
                      [--daily-partitioning-recent-days=<dailyPartitioningRecentDays>]
                      [--from-date=<fromDate>]
                      [--from-date-time=<fromDateTime>]
                      [--from-date-time-offset=<fromDateTimeOffset>]
                      [-m=<mode>]
                      [--monthly-partitioning-recent-months=<monthlyPartitioningRecentMonths>]
                      [-of=<outputFormat>] [-s=<sensor>]
                      [-sc=<scope>] [-t=<table>] [--to-date=<toDate>]
                      [--to-date-time=<toDateTime>]
                      [--to-date-time-offset=<toDateTimeOffset>]
                      [-ts=<timeScale>] [--where-filter=<whereFilter>]
                      [-l=<labels>]... [-tag=<tags>]...

Command options

All parameters supported by the command are listed below.

Command argument	Description	Accepted values
`-cat` `--category`	Check category name (volume, nulls, numeric, etc.)
`-ch` `--check`	Data quality check name, supports patterns like '*_id'
`-ct` `--check-type`	Data quality check type (profiling, monitoring, partitioned)	profiling monitoring partitioned
`-col` `--column`	Column name, supports patterns like '*_id'
`-c` `--connection`	Connection name, supports patterns like 'conn*'
`--daily-partitioning-include-today`	Also analyze today and later days when running daily partitioned checks. By default, daily partitioned checks do not analyze today or future dates. Setting this flag to true disables filtering of the end dates.
`--daily-partitioning-recent-days`	The number of recent days to analyze incrementally by daily partitioned data quality checks.
`-tag` `--data-grouping-level-tag`	Data grouping hierarchy level filter (tag)
`-d` `--dummy`	Runs data quality checks in a dummy mode. Sensors are not executed on the target database, but the rest of the process is performed.
`-e` `--enabled`	Runs only enabled or only disabled sensors. By default, only enabled sensors are executed.
`-fe` `--fail-on-execution-errors`	Returns a command status code 4 (when called from the command line) if any execution errors were raised during the execution. The default value is true.
`-fw` `--file-write`	Write command response to a file
`--from-date`	Analyze the data since the given date (inclusive). The date should be an ISO 8601 date (yyyy-MM-dd). The analyzed table must have the timestamp column properly configured; this column is used for filtering the date and time ranges. Setting the beginning date overrides recent days and recent months.
`--from-date-time`	Analyze the data since the given date and time (inclusive). The date and time should be an ISO 8601 local date and time without the time zone (yyyy-MM-dd HH:mm:ss). The analyzed table must have the timestamp column properly configured; this column is used for filtering the date and time ranges. Setting the beginning date and time overrides recent days and recent months.
`--from-date-time-offset`	Analyze the data since the given date and time with a time zone offset (inclusive). The date and time should be an ISO 8601 date and time (yyyy-MM-dd HH:mm:ss) followed by a time zone offset, for example: 2023-02-20 14:10:00+02. The analyzed table must have the timestamp column properly configured; this column is used for filtering the date and time ranges. Setting the beginning date and time overrides recent days and recent months.
`-t` `--table` `--full-table-name`	Full table name (schema.table), supports wildcard patterns like 'sch*.tab*'
`--headless` `-hl`	Starts DQOps in a headless mode. When DQOps runs in a headless mode and the application cannot start because the DQOps Cloud API key is missing or the DQOps user home folder is not configured, DQOps will stop silently instead of asking the user to approve the setup of the DQOps user home folder structure and/or log into DQOps Cloud.
`-h` `--help`	Show the help for the command and parameters
`-l` `--label`	Label filter
`-m` `--mode`	Reporting mode (silent)	silent
`--monthly-partitioning-include-current-month`	Also analyze the current month and later months when running monthly partitioned checks. By default, monthly partitioned checks do not analyze the current month or future months. Setting this flag to true disables filtering of the end dates.
`--monthly-partitioning-recent-months`	The number of recent months to analyze incrementally by monthly partitioned data quality checks.
`-of` `--output-format`	Output format for tabular responses	TABLE CSV JSON
`-sc` `--scope`	Error sampling scope that is used for tables with data grouping. Error samples can be collected for the whole table, or for each data grouping. By default, collects error samples for a whole table.	table data_group
`-s` `--sensor`	Data quality sensor name (sensor definition or sensor name), supports patterns like 'table/validity/*'
`-ts` `--time-scale`	Time scale for monitoring and partitioned checks (daily, monthly, etc.)	daily monthly
`--to-date`	Analyze the data until the given date (exclusive, the given date and the following dates are not analyzed). The date should be an ISO 8601 date (yyyy-MM-dd). The analyzed table must have the timestamp column properly configured; this column is used for filtering the date and time ranges. Setting the end date overrides the parameters to disable analyzing today or the current month.
`--to-date-time`	Analyze the data until the given date and time (exclusive). The date and time should be an ISO 8601 local date and time without the time zone (yyyy-MM-dd HH:mm:ss). The analyzed table must have the timestamp column properly configured; this column is used for filtering the date and time ranges. Setting the end date and time overrides the parameters to disable analyzing today or the current month.
`--to-date-time-offset`	Analyze the data until the given date and time with a time zone offset (exclusive). The date and time should be an ISO 8601 date and time (yyyy-MM-dd HH:mm:ss) followed by a time zone offset, for example: 2023-02-20 14:10:00+02. The analyzed table must have the timestamp column properly configured; this column is used for filtering the date and time ranges. Setting the end date and time overrides the parameters to disable analyzing today or the current month.
`--where-filter`	An additional filter, which must be a valid SQL predicate (an SQL expression that returns 'true' or 'false'), that is added to the WHERE clause of the SQL query that DQOps runs on the data source. The purpose of a custom filter is to analyze only a subset of data, for example, when a new batch of records is loaded and the data quality checks are evaluated as a data contract. All the records in that batch must be tagged with the same value, and the predicate passed to find records from that batch would take the form: "{alias}.batch_id = 1". The filter can use the replacement token {alias} to reference the analyzed table.
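
The date range and filter options in the table above can be combined in one call. The sketch below is a minimal example under stated assumptions: the connection `sales_dwh`, the table `public.orders`, and the `batch_id` column are hypothetical placeholders, and the table is assumed to have its timestamp column configured. It samples errors for daily partitioned checks from 2024-09-01 (inclusive) to 2024-09-08 (exclusive), limits the query to one batch with `--where-filter`, and reacts to status code 4, which the table documents for execution errors.

```shell
#!/bin/sh
# Hypothetical values: the connection, table, and batch_id column are
# placeholders for objects in your own environment.
CONNECTION="sales_dwh"
TABLE="public.orders"
WHERE_FILTER="{alias}.batch_id = 1"

# Keep every option as one positional parameter so the filter, which
# contains spaces, is passed to dqo as a single argument.
set -- collect errorsamples \
  --connection="$CONNECTION" \
  --full-table-name="$TABLE" \
  --check-type=partitioned \
  --time-scale=daily \
  --from-date=2024-09-01 \
  --to-date=2024-09-08 \
  --where-filter="$WHERE_FILTER"

# Render the command with each argument quoted, for review or logging.
DQO_COMMAND="dqo"
for arg in "$@"; do
  DQO_COMMAND="$DQO_COMMAND '$arg'"
done
echo "$DQO_COMMAND"

# Run the command only if dqo is installed and remember its status code.
STATUS=0
if command -v dqo >/dev/null 2>&1; then
  dqo "$@" || STATUS=$?
fi

# Status code 4 signals execution errors (see --fail-on-execution-errors).
if [ "$STATUS" -eq 4 ]; then
  echo "dqo reported execution errors (status code 4)" >&2
fi
```

Because `--to-date` is exclusive, this range covers seven days, 2024-09-01 through 2024-09-07. A scheduler or CI job could use the captured `STATUS` value to decide whether to fail the pipeline step.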