Configuring schedules by modifying the YAML file
This section describes how to customize when checks run by setting schedules in the YAML files. Instructions on how to configure schedules using the graphical interface can be found here.
Configuring a schedule at the connection level
To schedule checks for an entire connection, first open the YAML file of that connection. To do this, use the connection edit command, which opens the YAML file in Visual Studio Code. To make working with the YAML file easier, we recommend installing the YAML extension by Red Hat and the Better Jinja extension by Samuel Colvin.
For example, to edit the connection named "testconnection", run the connection edit command for that connection.
To add a schedule to the YAML file, start by including the schedules: parameter at the end of the document. Then, specify the check type you want to run (profiling, recurring_daily, recurring_monthly, partitioned_daily, or partitioned_monthly). Next, define how often the checks should run by adding the cron_expression: parameter under that check type and setting its value in the cron format.
For example, to schedule checks of every type to run every day at 12:00, the YAML file will look like the one provided below:
# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/ConnectionYaml-schema.json
apiVersion: dqo/v1
kind: source
spec:
  provider_type: bigquery
  bigquery:
    source_project_id: bigquery-public-data
    authentication_mode: google_application_credentials
  schedules:
    profiling:
      cron_expression: 0 12 * * *
    recurring_daily:
      cron_expression: 0 12 * * *
    recurring_monthly:
      cron_expression: 0 12 * * *
    partitioned_daily:
      cron_expression: 0 12 * * *
    partitioned_monthly:
      cron_expression: 0 12 * * *
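The check types do not have to share one schedule. The fragment below is a minimal sketch that gives each check type its own cron expression. The times are arbitrary illustrations, not recommended defaults, and the sketch assumes the scheduler accepts standard five-field Unix cron syntax (minute, hour, day of month, month, day of week), including ranges such as 1-5.

```yaml
# Illustrative fragment only: the times below are arbitrary examples.
# Field order: minute hour day-of-month month day-of-week
spec:
  schedules:
    profiling:
      cron_expression: 0 6 1 * *      # 06:00 on the first day of every month
    recurring_daily:
      cron_expression: 0 12 * * *     # every day at 12:00
    recurring_monthly:
      cron_expression: 0 12 1 * *     # 12:00 on the first day of every month
    partitioned_daily:
      cron_expression: 30 4 * * 1-5   # 04:30, Monday through Friday (assumes range support)
    partitioned_monthly:
      cron_expression: 0 0 2 * *      # midnight on the second day of every month
```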
Creating a schedule at the table level
To schedule checks at the table or check level, first open the YAML file of that table.
To do this, use the table edit command, which opens the YAML file in Visual Studio Code. To make working with the YAML file easier, we recommend installing the YAML extension by Red Hat and the Better Jinja extension by Samuel Colvin.
For example, to edit the "crime" table defined in the "austin_crime" schema within the "testconnection" data source, run the table edit command for that table.
To set a schedule for an entire table in the YAML file, begin by adding the schedules_override: parameter before the columns section. Then, specify the check type you want to run (profiling, recurring_daily, recurring_monthly, partitioned_daily, or partitioned_monthly). Next, define how often the checks should run by adding the cron_expression: parameter under that check type and setting its value in the cron format.
For example, to schedule all profiling checks to run every day at 10:00, the YAML file will look like the one provided below:
# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  incremental_time_window:
    daily_partitioning_recent_days: 7
    monthly_partitioning_recent_months: 1
  schedules_override:
    profiling:
      cron_expression: 0 10 * * *
  columns:
    unique_key:
      type_snapshot:
        column_type: INT64
        nullable: true
    address:
      type_snapshot:
        column_type: STRING
        nullable: true
    census_tract:
      type_snapshot:
        column_type: FLOAT64
        nullable: true
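More than one check type can be listed under schedules_override:. The fragment below is a sketch that overrides two check types for a single table; the 07:00 time is an arbitrary example. It assumes that check types not listed here keep the schedule defined at the connection level, which the override naming suggests but this section does not state.

```yaml
# Illustrative fragment only. Assumption: check types that are not listed
# under schedules_override keep the connection-level schedule.
spec:
  schedules_override:
    profiling:
      cron_expression: 0 10 * * *   # profiling checks: every day at 10:00
    partitioned_daily:
      cron_expression: 0 7 * * *    # daily partitioned checks: every day at 07:00
```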
Creating a schedule at the check level
If you want to modify the schedule for a specific check, open the YAML file of the table as described in the previous section.
Then, add the schedule_override: parameter under that check's name. Note that the check-level parameter is singular (schedule_override:), while the table-level parameter is plural (schedules_override:). Next, define how often the check should run by adding the cron_expression: parameter under it and setting its value in the cron format.
For example, to schedule the table-level profiling row_count check to run every day at 09:00, or the column-level profiling nulls_count check to run every day at 08:00, the YAML files will look like the ones provided below:
# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  incremental_time_window:
    daily_partitioning_recent_days: 7
    monthly_partitioning_recent_months: 1
  profiling_checks:
    volume:
      row_count:
        schedule_override:
          cron_expression: 0 9 * * *
        error:
          min_count: 0
  columns:
    unique_key:
      type_snapshot:
        column_type: INT64
        nullable: true
    address:
      type_snapshot:
        column_type: STRING
        nullable: true
    census_tract:
      type_snapshot:
        column_type: FLOAT64
        nullable: true

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  incremental_time_window:
    daily_partitioning_recent_days: 7
    monthly_partitioning_recent_months: 1
  columns:
    unique_key:
      type_snapshot:
        column_type: INT64
        nullable: true
      profiling_checks:
        nulls:
          nulls_count:
            schedule_override:
              cron_expression: 0 8 * * *
            error:
              max_count: 10
    address:
      type_snapshot:
        column_type: STRING
        nullable: true
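Table-level and check-level overrides can presumably appear in the same table file. The fragment below is a sketch that combines the two parameters shown in this section. It assumes that the check-level schedule_override: takes precedence for the check that defines it, while the remaining profiling checks follow the table-level schedules_override:; that precedence is implied by the naming but is not stated here.

```yaml
# Sketch only. Assumption: the check-level schedule_override takes precedence
# over the table-level schedules_override for the check that defines it.
spec:
  schedules_override:
    profiling:
      cron_expression: 0 10 * * *      # other profiling checks: every day at 10:00
  profiling_checks:
    volume:
      row_count:
        schedule_override:
          cron_expression: 0 9 * * *   # this check only: every day at 09:00
        error:
          min_count: 0
```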