ConnectionYaml

MysqlParametersSpec

MySql connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | MySQL host name. Supports also a ${MYSQL_HOST} configuration with a custom environment variable. | string | | | |
| port | MySQL port number. The default port is 3306. Supports also a ${MYSQL_PORT} configuration with a custom environment variable. | string | | | |
| database | MySQL database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | MySQL user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | MySQL database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| options | MySQL connection 'options' initialization parameter. For example setting this to -c statement_timeout=5min would set the statement timeout parameter for this session to 5 minutes. Supports also a ${MYSQL_OPTIONS} configuration with a custom environment variable. | string | | | |
| sslmode | SslMode MySQL connection parameter. | enum | DISABLED, PREFERRED, VERIFY_IDENTITY, VERIFY_CA, REQUIRED | | |
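A `mysql` section built from the properties above might look like the sketch below; the host, database, and credential values are illustrative, not defaults.

```yaml
# Sketch of a mysql section; all values are examples.
mysql:
  host: ${MYSQL_HOST}          # or a literal host name, e.g. db.example.com
  port: "3306"
  database: ${MYSQL_DATABASE}  # dynamic substitution from an environment variable
  user: ${MYSQL_USER}
  password: ${MYSQL_PASSWORD}
  sslmode: PREFERRED           # one of the enum values listed above
```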

ConnectionSpec

Data source (connection) specification.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| provider_type | Database provider type (required). | enum | snowflake, oracle, postgresql, redshift, sqlserver, mysql, bigquery | | |
| bigquery | BigQuery connection parameters. Specify parameters in the bigquery section. | BigQueryParametersSpec | | | |
| snowflake | Snowflake connection parameters. Specify parameters in the snowflake section or set the url (which is the Snowflake JDBC url). | SnowflakeParametersSpec | | | |
| postgresql | PostgreSQL connection parameters. Specify parameters in the postgresql section or set the url (which is the PostgreSQL JDBC url). | PostgresqlParametersSpec | | | |
| redshift | Redshift connection parameters. Specify parameters in the redshift section or set the url (which is the Redshift JDBC url). | RedshiftParametersSpec | | | |
| sqlserver | SQL Server connection parameters. Specify parameters in the sqlserver section or set the url (which is the SQL Server JDBC url). | SqlServerParametersSpec | | | |
| mysql | MySQL connection parameters. Specify parameters in the mysql section or set the url (which is the MySQL JDBC url). | MysqlParametersSpec | | | |
| oracle | Oracle connection parameters. Specify parameters in the oracle section or set the url (which is the Oracle JDBC url). | OracleParametersSpec | | | |
| parallel_runs_limit | The concurrency limit for the maximum number of parallel SQL queries executed on this connection. | integer | | | |
| default_grouping_configuration | Default data grouping configuration for all tables. The configuration may be overridden on the table, column, and check level. Data groupings are configured in two cases: (1) the data in the table should be analyzed with a GROUP BY condition, to analyze different datasets using separate time series, for example when a table contains data from multiple countries and there is a 'country' column used for partitioning; (2) a static dimension is assigned to a table, when the data is partitioned at a table level (similar tables store the same information, but for different countries, etc.). | DataGroupingConfigurationSpec | | | |
| schedules | Configuration of the job scheduler that runs data quality checks. The scheduler configuration is divided into types of checks that have different schedules. | RecurringSchedulesSpec | | | |
| incident_grouping | Configuration of data quality incident grouping. Configures how failed data quality checks are grouped into data quality incidents. | ConnectionIncidentGroupingSpec | | | |
| comments | Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non-tracked comments). | CommentsListSpec | | | |
| labels | Custom labels that were assigned to the connection. Labels are used for searching for tables when filtered data quality checks are executed. | LabelSetSpec | | | |
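Putting a few of these properties together, a ConnectionSpec fragment could look like the sketch below; only `provider_type` is required, and all concrete values (hosts, labels, limits) are illustrative.

```yaml
# Illustrative ConnectionSpec fragment.
provider_type: postgresql
postgresql:
  host: ${POSTGRESQL_HOST}
  port: "5432"
  database: analytics
  user: ${POSTGRESQL_USER}
  password: ${POSTGRESQL_PASSWORD}
parallel_runs_limit: 4   # cap on concurrent SQL queries on this connection
labels:                  # assuming LabelSetSpec serializes as a list of strings
  - prod
  - finance
```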

SqlServerParametersSpec

Microsoft SQL Server connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | SQL Server host name. Supports also a ${SQLSERVER_HOST} configuration with a custom environment variable. | string | | | |
| port | SQL Server port number. The default port is 1433. Supports also a ${SQLSERVER_PORT} configuration with a custom environment variable. | string | | | |
| database | SQL Server database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | SQL Server user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | SQL Server database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| options | SQL Server connection 'options' initialization parameter. For example setting this to -c statement_timeout=5min would set the statement timeout parameter for this session to 5 minutes. Supports also a ${SQLSERVER_OPTIONS} configuration with a custom environment variable. | string | | | |
| disable_encryption | Disable SSL encryption parameter. The default value is false. You may need to disable encryption when SQL Server is started in Docker. | boolean | | | |
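A `sqlserver` section using these properties might look like the sketch below; the values are examples, and `disable_encryption` is shown only because it is the property specific to this provider.

```yaml
# Sketch of a sqlserver section; values are illustrative.
sqlserver:
  host: ${SQLSERVER_HOST}
  port: "1433"
  database: warehouse
  user: ${SQLSERVER_USER}
  password: ${SQLSERVER_PASSWORD}
  disable_encryption: true   # e.g. when SQL Server runs in Docker without TLS
```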

DataGroupingDimensionSpec

Single data grouping dimension configuration. A data grouping dimension may be configured as a hardcoded value or a mapping to a column.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| source | The source of the data grouping dimension value. The default grouping dimension source is a tag. Assign a tag when there are multiple similar tables that store the same data for different areas (countries, etc.). This could be a country name if a table or partition stores information for that country. | enum | tag, column_value | | |
| tag | The value assigned to a data quality grouping dimension when the source is 'tag'. Assign a hardcoded (static) data grouping dimension value (tag) when there are multiple similar tables that store the same data for different areas (countries, etc.). This could be a country name if a table or partition stores information for that country. | string | | | |
| column | Column name that contains a dynamic data grouping dimension value (for dynamic data-driven data groupings). Sensor queries will be extended with a GROUP BY {data group level column name}, sensors (and alerts) will be calculated for each unique value of the specified column. Also a separate time series will be tracked for each value. | column_name | | | |
| name | Data grouping dimension name. | string | | | |
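The two source modes can be contrasted in a short sketch; the tag and column values below are illustrative.

```yaml
# A static (tag) dimension vs. a column-driven dimension; values are examples.
level_1:
  source: tag
  tag: US            # hardcoded value: this table stores US data only
level_2:
  source: column_value
  column: country    # adds GROUP BY country to sensor queries
```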

BigQueryParametersSpec

BigQuery connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| source_project_id | Source GCP project ID. This is the project that has datasets that will be imported. | string | | | |
| billing_project_id | Billing GCP project ID. This is the project used as the default GCP project. The calling user must have a bigquery.jobs.create permission in this project. | string | | | |
| authentication_mode | Authentication mode to the Google Cloud. | enum | json_key_content, json_key_path, google_application_credentials | | |
| json_key_content | JSON key content. Use an environment variable that contains the content of the key as ${KEY_ENV} or a name of a secret in the GCP Secret Manager: ${sm://key-secret-name}. Requires the authentication-mode: json_key_content. | string | | | |
| json_key_path | A path to the JSON key file. Requires the authentication-mode: json_key_path. | string | | | |
| quota_project_id | Quota GCP project ID. | string | | | |
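A `bigquery` section using a JSON key from an environment variable might look like this sketch; the project IDs are placeholders.

```yaml
# Illustrative bigquery section authenticating with JSON key content.
bigquery:
  source_project_id: my-data-project      # project that owns the datasets
  billing_project_id: my-billing-project  # needs bigquery.jobs.create here
  authentication_mode: json_key_content
  json_key_content: ${KEY_ENV}            # or a secret: ${sm://key-secret-name}
```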

RecurringSchedulesSpec

Container of all recurring schedules (cron expressions) for each type of checks. Data quality checks are grouped by type (profiling, whole table checks, time period partitioned checks). Each group of checks can additionally be divided by time scale (daily, monthly, etc.). Each time scale has a different recurring schedule used by the job scheduler to run the checks. These schedules are defined in this object.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| profiling | Schedule for running profiling data quality checks. | RecurringScheduleSpec | | | |
| recurring_daily | Schedule for running daily recurring checks. | RecurringScheduleSpec | | | |
| recurring_monthly | Schedule for running monthly recurring checks. | RecurringScheduleSpec | | | |
| partitioned_daily | Schedule for running daily partitioned checks. | RecurringScheduleSpec | | | |
| partitioned_monthly | Schedule for running monthly partitioned checks. | RecurringScheduleSpec | | | |
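A `schedules` section could look like the sketch below. The inner `cron_expression` field name is an assumption, since the RecurringScheduleSpec object itself is not documented in this section; the cron expressions are illustrative.

```yaml
# Illustrative schedules section; cron_expression is an assumed field name.
schedules:
  profiling:
    cron_expression: "0 1 * * *"    # daily at 01:00
  recurring_daily:
    cron_expression: "0 2 * * *"    # daily at 02:00
  partitioned_daily:
    cron_expression: "0 3 * * *"    # daily at 03:00
```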

ConnectionIncidentGroupingSpec

Configuration of data quality incident grouping on a connection level. Defines how similar data quality issues are grouped into incidents.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| grouping_level | Grouping level of failed data quality checks for creating higher level data quality incidents. The default grouping level is by a table, a data quality dimension and a check category (i.e. a datatype data quality incident detected on a table X in the numeric checks category). | enum | table_dimension_category_type, table_dimension, table, table_dimension_category, table_dimension_category_name | | |
| minimum_severity | Minimum severity level of data quality issues that are grouped into incidents. The default minimum severity level is 'warning'. Other supported severity levels are 'error' and 'fatal'. | enum | warning, error, fatal | | |
| divide_by_data_groups | Create separate data quality incidents for each data group, creating different incidents for different groups of rows. By default, data groups are ignored for grouping data quality issues into data quality incidents. | boolean | | | |
| max_incident_length_days | The maximum length of a data quality incident in days. When a new data quality issue is detected after max_incident_length_days days since a similar data quality issue was first seen, a new data quality incident is created that will capture all following data quality issues for the next max_incident_length_days days. The default value is 60 days. | integer | | | |
| mute_for_days | The number of days that all similar data quality issues are muted when a data quality incident is closed in the 'mute' status. | integer | | | |
| disabled | Disables data quality incident creation for failed data quality checks on the data source. | boolean | | | |
| webhooks | Configuration of Webhook URLs for new or updated incident notifications. | IncidentWebhookNotificationsSpec | | | |
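An `incident_grouping` section spelling out the documented defaults might look like the sketch below; the `mute_for_days` value is illustrative, as no default is stated for it.

```yaml
# Illustrative incident_grouping section.
incident_grouping:
  grouping_level: table_dimension_category  # table + dimension + check category
  minimum_severity: warning
  divide_by_data_groups: false
  max_incident_length_days: 60
  mute_for_days: 30                         # example value; no documented default
```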

OracleParametersSpec

Oracle connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | Oracle host name. Supports also a ${ORACLE_HOST} configuration with a custom environment variable. | string | | | |
| port | Oracle port number. The default port is 1521. Supports also a ${ORACLE_PORT} configuration with a custom environment variable. | string | | | |
| database | Oracle database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | Oracle user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Oracle database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| options | Oracle connection 'options' initialization parameter. For example setting this to -c statement_timeout=5min would set the statement timeout parameter for this session to 5 minutes. Supports also a ${ORACLE_OPTIONS} configuration with a custom environment variable. | string | | | |
| initialization_sql | Custom SQL that is executed after connecting to Oracle. This SQL script can configure the default language, for example: alter session set NLS_DATE_FORMAT='YYYY-DD-MM HH24:MI:SS' | string | | | |
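An `oracle` section might look like the sketch below; the database name is a placeholder, and the initialization SQL reuses the example from the table above.

```yaml
# Sketch of an oracle section; values are illustrative.
oracle:
  host: ${ORACLE_HOST}
  port: "1521"
  database: ORCLPDB1   # illustrative database name
  user: ${ORACLE_USER}
  password: ${ORACLE_PASSWORD}
  initialization_sql: "alter session set NLS_DATE_FORMAT='YYYY-DD-MM HH24:MI:SS'"
```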

IncidentWebhookNotificationsSpec

Configuration of Webhook URLs used for notifications about new or updated incidents. Specifies the URLs of webhooks where the notification messages are sent.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| incident_opened_webhook_url | Webhook URL where the notification messages describing new incidents are pushed using a HTTP POST request. The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| incident_acknowledged_webhook_url | Webhook URL where the notification messages describing acknowledged incidents are pushed using a HTTP POST request. The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| incident_resolved_webhook_url | Webhook URL where the notification messages describing resolved incidents are pushed using a HTTP POST request. The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| incident_muted_webhook_url | Webhook URL where the notification messages describing muted incidents are pushed using a HTTP POST request. The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
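A `webhooks` section wiring all four notification types might look like this sketch; the URLs are placeholders.

```yaml
# Illustrative webhooks section; all URLs are placeholders.
webhooks:
  incident_opened_webhook_url: https://hooks.example.com/incidents/opened
  incident_acknowledged_webhook_url: https://hooks.example.com/incidents/acknowledged
  incident_resolved_webhook_url: https://hooks.example.com/incidents/resolved
  incident_muted_webhook_url: https://hooks.example.com/incidents/muted
```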

PostgresqlParametersSpec

PostgreSQL connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | PostgreSQL host name. Supports also a ${POSTGRESQL_HOST} configuration with a custom environment variable. | string | | | |
| port | PostgreSQL port number. The default port is 5432. Supports also a ${POSTGRESQL_PORT} configuration with a custom environment variable. | string | | | |
| database | PostgreSQL database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | PostgreSQL user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | PostgreSQL database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| options | PostgreSQL connection 'options' initialization parameter. For example setting this to -c statement_timeout=5min would set the statement timeout parameter for this session to 5 minutes. Supports also a ${POSTGRESQL_OPTIONS} configuration with a custom environment variable. | string | | | |
| sslmode | Sslmode PostgreSQL connection parameter. The default value is 'disable'. | enum | allow, prefer, disable, require, verify-full, verify-ca | | |
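A `postgresql` section might look like the sketch below; note the lowercase `sslmode` values, unlike the uppercase MySQL enum. All concrete values are illustrative.

```yaml
# Illustrative postgresql section.
postgresql:
  host: ${POSTGRESQL_HOST}
  port: "5432"
  database: ${POSTGRESQL_DATABASE}
  user: ${POSTGRESQL_USER}
  password: ${POSTGRESQL_PASSWORD}
  options: "-c statement_timeout=5min"  # per-session statement timeout
  sslmode: require
```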

RedshiftParametersSpec

Redshift connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | Redshift host name. Supports also a ${REDSHIFT_HOST} configuration with a custom environment variable. | string | | | |
| port | Redshift port number. The default port is 5432. Supports also a ${REDSHIFT_PORT} configuration with a custom environment variable. | string | | | |
| database | Redshift database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | Redshift user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Redshift database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| options | Redshift connection 'options' initialization parameter. For example setting this to -c statement_timeout=5min would set the statement timeout parameter for this session to 5 minutes. Supports also a ${REDSHIFT_OPTIONS} configuration with a custom environment variable. | string | | | |

SnowflakeParametersSpec

Snowflake connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| account | Snowflake account name, e.g. <account>, <account>-<locator>, <account>.<region> or <account>.<region>.<platform>. Supports also a ${SNOWFLAKE_ACCOUNT} configuration with a custom environment variable. | string | | | |
| warehouse | Snowflake warehouse name. Supports also a ${SNOWFLAKE_WAREHOUSE} configuration with a custom environment variable. | string | | | |
| database | Snowflake database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | Snowflake user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Snowflake database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| role | Snowflake role name. Supports also ${SNOWFLAKE_ROLE} configuration with a custom environment variable. | string | | | |
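A `snowflake` section might look like the sketch below; the account identifier and database name are placeholders following the formats described above.

```yaml
# Illustrative snowflake section.
snowflake:
  account: myaccount.us-east-1   # or <account>, <account>-<locator>, <account>.<region>.<platform>
  warehouse: ${SNOWFLAKE_WAREHOUSE}
  database: ANALYTICS
  user: ${SNOWFLAKE_USER}
  password: ${SNOWFLAKE_PASSWORD}
  role: ${SNOWFLAKE_ROLE}
```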

LabelSetSpec

Collection of unique labels assigned to items (tables, columns, checks) that could be targeted for a data quality check execution.


DataGroupingConfigurationSpec

Configuration of the data groupings that are used to calculate data quality checks with a GROUP BY clause. Data grouping levels may be hardcoded if we have different (but similar) tables for different business areas (countries, product groups). We can also pull data grouping levels directly from the database if a table has a column that identifies a business area. Data quality results for new groups are dynamically identified in the database by the GROUP BY clause. Sensor values are extracted for each data group separately, and a time series is built for each data group separately.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| level_1 | Data grouping dimension level 1 configuration. | DataGroupingDimensionSpec | | | |
| level_2 | Data grouping dimension level 2 configuration. | DataGroupingDimensionSpec | | | |
| level_3 | Data grouping dimension level 3 configuration. | DataGroupingDimensionSpec | | | |
| level_4 | Data grouping dimension level 4 configuration. | DataGroupingDimensionSpec | | | |
| level_5 | Data grouping dimension level 5 configuration. | DataGroupingDimensionSpec | | | |
| level_6 | Data grouping dimension level 6 configuration. | DataGroupingDimensionSpec | | | |
| level_7 | Data grouping dimension level 7 configuration. | DataGroupingDimensionSpec | | | |
| level_8 | Data grouping dimension level 8 configuration. | DataGroupingDimensionSpec | | | |
| level_9 | Data grouping dimension level 9 configuration. | DataGroupingDimensionSpec | | | |
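A `default_grouping_configuration` combining a static tag with a column-driven dimension might look like this sketch; the tag and column names are illustrative.

```yaml
# Example default_grouping_configuration on a connection; values are examples.
default_grouping_configuration:
  level_1:
    source: tag
    tag: EU            # static dimension: this source holds EU data only
  level_2:
    source: column_value
    column: country    # dynamic dimension pulled from the data via GROUP BY
```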

ConnectionYaml

Connection definition for a data source connection that is covered by data quality checks.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| api_version | | string | | | |
| kind | | enum | table, dashboards, source, sensor, check, rule, file_index, settings, provider_sensor | | |
| spec | | ConnectionSpec | | | |
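Putting the pieces together, a minimal complete connection file might look like the sketch below. The `api_version` value and the choice of `kind: source` are assumptions inferred from the property table above and may differ in your installation; all connection values are illustrative.

```yaml
# Minimal ConnectionYaml sketch; api_version value is an assumption.
api_version: dqo/v1
kind: source
spec:
  provider_type: mysql
  mysql:
    host: ${MYSQL_HOST}
    port: "3306"
    database: ${MYSQL_DATABASE}
    user: ${MYSQL_USER}
    password: ${MYSQL_PASSWORD}
```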