
Last updated: July 05, 2025

How to Register Data Sources in DQOps for Data Observability

Read this guide to understand how DQOps stores the connection parameters to monitored data sources, and how to manage credentials in a secure way.

Overview

This guide describes how DQOps stores the configuration of data sources in YAML files, and what configuration options are available.

Adding data sources from the user interface

If you are not interested in the details of the data source configuration, go back to the getting started section and read the guide to adding a data source from the DQOps user interface, or consult the documentation of each data source, which also contains screenshots of the configuration screens for each supported database.

Data sources configuration

DQOps stores the configuration of data sources in YAML files. The files support auto-completion in Visual Studio Code and can be versioned in Git.

DQOps user home folder

For the purpose of this guide, we will assume that DQOps was started in the current folder using the python -m dqops command. All files mentioned in this guide are relative to the current folder, referred to as $DQO_USER_HOME in the examples below.

DQOps YAML files structure

The structure of DQOps YAML configuration files is similar to the structure of Kubernetes specification files. Additionally, the first line references a YAML schema file that is used by Visual Studio Code for code completion, validation, and showing the documentation of checks. The concept of working with YAML files article shows the editing experience in Visual Studio Code.

DQOps YAML file example

The following code example is a fragment of a DQOps data source configuration file, showing the elements common to all DQOps YAML files.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ConnectionYaml-schema.json (1)
apiVersion: dqo/v1
kind: source # (2)!
spec: # (3)!
  provider_type: postgresql
  postgresql:
    host: localhost
    ...
  1. The YAML file schema identifier. Each type of DQOps configuration file points to its own YAML schema, which is used by text editors such as Visual Studio Code for code completion and schema validation.
  2. The type of the file is identified in the kind element.
  3. The spec node contains the connection specification object that describes the data source and its connection parameters.

The first line of DQOps configuration files has an optional link to a YAML schema file. The YAML schema files are used by text editors such as Visual Studio Code for code completion and schema validation.

The kind node identifies the type of DQOps file, and the spec node contains the specification, which is the real configuration. For detailed file reference, consult the reference of the DQOps YAML files schema in the DQOps reference section.

YAML file extensions

DQOps identifies the type of its YAML files by the file extension. The file extensions of files storing the metadata of data sources are listed below.

File name pattern               File purpose
connection.dqoconnection.yaml   Data source connection configuration file with the connection details and credentials.
*.dqotable.yaml                 Monitored table configuration file with the configuration of enabled data quality checks.
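
For illustration, the following is a minimal sketch of a *.dqotable.yaml file. It follows the same apiVersion/kind/spec pattern as the connection file shown earlier; the specific check configured here is only an assumed example, so consult the table configuration reference for the exact structure.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  # a single assumed profiling check that verifies the table is not empty
  profiling_checks:
    volume:
      profile_row_count:
        warning:
          min_count: 1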

Data sources

Data sources can be registered in DQOps using the user interface, or by creating the .dqoconnection.yaml files directly in the data source folder.

Data sources folder

The configuration of data sources and the metadata of all tables are defined in the sources folder, inside the DQOps user home folder, referred to as $DQO_USER_HOME. The example below shows two data sources named prod-landing-zone and prod-data-lake.

$DQO_USER_HOME
├───...
└───sources(1)
    ├───prod-landing-zone(2)
    │   ├───connection.dqoconnection.yaml(3)
    │   └───...
    ├───prod-data-lake
    │   ├───connection.dqoconnection.yaml
    │   └───...
    └───...
  1. The sources folder stores data sources as nested folders.
  2. Each folder inside the sources folder is a connection name to a data source.
  3. Each data source's folder contains a file connection.dqoconnection.yaml which specifies the connection parameters to the data source.

The name of each child folder inside the sources folder is the connection name of a data source. Each data source's folder contains one file named connection.dqoconnection.yaml, which specifies the connection parameters to that data source.

Configuring data sources in YAML files

The data source folder must contain exactly one file named connection.dqoconnection.yaml. An example data source connection file for PostgreSQL is shown below.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ConnectionYaml-schema.json
apiVersion: dqo/v1
kind: source # (1)!
spec:
  provider_type: postgresql # (2)!
  postgresql: # (3)!
    host: localhost
    port: "52289"
    user: test
    password: test
    database: test
  1. The type of the file is identified in the kind element. Data sources use the source kind.
  2. The type of the data source. Use the documentation of the ConnectionSpec object to find the names of other supported data sources, beyond the postgresql connection type shown in this example.
  3. The configuration node for PostgreSQL. Other supported data sources are configured in similar nodes, named after the type of the data source.

Each connection.dqoconnection.yaml file must have two nodes filled in:

  • The provider_type enumeration stores the type of the data source.
  • A node named after the data source type (postgresql in this example) stores the type-safe configuration for that data source, supporting code completion in Visual Studio Code.

The full documentation of the spec element is provided in the ConnectionSpec object reference. Examples of data source specific configurations are located in the data sources section of the documentation.
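
For example, the following sketch shows how a connection file for a different provider follows the same pattern. The bigquery parameter names below are assumptions for illustration only; consult the data sources documentation and the ConnectionSpec reference for the exact fields of each provider.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ConnectionYaml-schema.json
apiVersion: dqo/v1
kind: source
spec:
  provider_type: bigquery
  bigquery: # the node named after the provider type
    source_project_id: my-gcp-project # assumed parameter name, for illustration
    authentication_mode: google_application_credentials # assumed value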

Configuring data sources from the user interface

Each type of data source has its own connection configuration screen in the DQOps user interface. Check out the data sources section to find the data source of interest, and learn the details of configuring your connections from the DQOps UI.

Using credentials and secrets

DQOps supports providing credentials from a separate location, to avoid storing secrets in the connection.dqoconnection.yaml files that are versioned in the source repository.

Referencing environment variables

Credentials are provided to a DQOps instance using environment variables, which are referenced using the ${ENVIRONMENT_VARIABLE_NAME} syntax.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ConnectionYaml-schema.json
apiVersion: dqo/v1
kind: source
spec:
  provider_type: postgresql
  postgresql:
    host: localhost
    port: "52289"
    user: ${MY_POSTGRESS_USER}
    password: ${MY_POSTGRESS_PASSWORD}
    database: test

If you are running DQOps as a Docker container, use the -e parameter of docker run to pass the additional environment variables.
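
If you manage the container with Docker Compose instead, the same variables can be set in the environment section of the service definition. The following is a minimal sketch; the dqops/dqo image name and the 8888 port mapping are assumptions, so adjust both to match your deployment:

services:
  dqops:
    image: dqops/dqo # assumption: use the image and tag of your DQOps deployment
    ports:
      - "8888:8888" # assumption: the port your DQOps instance exposes
    environment:
      MY_POSTGRESS_USER: test # referenced as ${MY_POSTGRESS_USER}
      MY_POSTGRESS_PASSWORD: test # referenced as ${MY_POSTGRESS_PASSWORD}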

Using shared credentials

Credentials and secrets that are shared with DQOps Cloud (and DQOps SaaS instances) are stored in the .credentials folder. The name of the secret is the file name inside the .credentials folder. The following example shows two such secrets, named my_postgress_user.txt and my_postgress_pass.

$DQO_USER_HOME
├───...
├───.gitignore(1)
└───.credentials(2)
    ├───my_postgress_user.txt(3)
    ├───my_postgress_pass
    └───...
  1. The default .gitignore file has a rule to ignore the .credentials folder and all files inside that folder.
  2. The hidden folder for shared credentials. This folder is added to the .gitignore, but is synchronized with DQOps Cloud if you want to use DQOps Cloud and synchronize the metadata between the SaaS environment and your local environment.
  3. The shared credentials are defined as files stored directly in the .credentials folder. The file name extension for credentials does not matter. This example uses a .txt file extension only for clarity.

The whole .credentials folder is added to the .gitignore, ensuring that the credentials are not pushed to the Git source code repository by mistake.
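
For reference, the rule in the default .gitignore that excludes the folder looks similar to the single line below (a sketch; the generated file may contain additional rules):

.credentials/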

The following example shows how to reference the shared credentials using a ${credential://shared_secret_name} value.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ConnectionYaml-schema.json
apiVersion: dqo/v1
kind: source
spec:
  provider_type: postgresql
  postgresql:
    host: localhost
    port: "52289"
    user: ${credential://my_postgress_user.txt}
    password: ${credential://my_postgress_pass}
    database: test

The shared credentials can also be edited in the Configuration section of the DQOps user interface by users holding an Admin or Editor role. Access management is enabled only in the TEAM and ENTERPRISE editions of DQOps.

Shared credentials configuration

Additional connection configuration

This section describes the remaining configuration parameters defined in the connection.dqoconnection.yaml file.

Job parallelism level

DQOps runs data quality checks in parallel, running checks for each table in a separate thread. The limit of parallel jobs per data source is configured in the parallel_jobs_limit parameter.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ConnectionYaml-schema.json
apiVersion: dqo/v1
kind: source
spec:
  provider_type: postgresql
  parallel_jobs_limit: 10  # (1)!
  postgresql:
    host: localhost
    port: "52289"
  ...
  1. The parameter that configures the maximum number of tables that are analyzed in parallel.

DQOps license limits

The limit of parallel jobs supported by DQOps depends on the license level. Consult the DQOps pricing for details.

Data quality check scheduling

The DQOps CRON schedules for running each type of data quality check are configured in the schedules section.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ConnectionYaml-schema.json
apiVersion: dqo/v1
kind: source
spec:
  ...
  schedules:  # (1)!
    profiling:
      cron_expression: 0 12 1 * *  # (2)!
    monitoring_daily:
      cron_expression: 0 12 * * *  # (3)!
    monitoring_monthly:
      cron_expression: 0 12 * * *  # (4)!
    partitioned_daily:
      cron_expression: 0 12 * * *  # (5)!
    partitioned_monthly:
      cron_expression: 0 12 * * *  # (6)!
  1. The scheduling configuration for running data quality checks on this data source.
  2. The CRON schedule for running profiling checks.
  3. The CRON schedule for running daily monitoring checks.
  4. The CRON schedule for running monthly monitoring checks.
  5. The CRON schedule for running daily partition checks.
  6. The CRON schedule for running monthly partition checks.

When a new data source is added to DQOps, the configuration of CRON schedules is copied from the default schedules that are stored in the settings/defaultschedules.dqoschedules.yaml file. The schedules shown in the example above are used as the initial configuration.
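
For illustration, the default schedules file may look similar to the sketch below, assuming it follows the same apiVersion/kind/spec pattern as other DQOps YAML files; the kind value and the schema URL here are assumptions, so verify them against the generated file:

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/DefaultSchedulesYaml-schema.json
apiVersion: dqo/v1
kind: default_schedules # assumption: verify against the generated file
spec:
  profiling:
    cron_expression: 0 12 1 * *
  monitoring_daily:
    cron_expression: 0 12 * * *
  monitoring_monthly:
    cron_expression: 0 12 * * *
  partitioned_daily:
    cron_expression: 0 12 * * *
  partitioned_monthly:
    cron_expression: 0 12 * * *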

Consult the data quality check scheduling manual to see how to configure data quality scheduling from the DQOps user interface.

The full documentation of the schedules element is provided in the DefaultSchedulesSpec object reference.

Data quality issue to incident grouping

DQOps groups similar data quality issues into data quality incidents. The matching method is described in the data quality incidents article.

The following example shows where the configuration is stored in the connection.dqoconnection.yaml file.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ConnectionYaml-schema.json
apiVersion: dqo/v1
kind: source
spec:
  ...
  incident_grouping: # (1)!
    grouping_level: table_dimension_category # (2)!
    minimum_severity: warning
    max_incident_length_days: 60
    mute_for_days: 60 
  1. The incident_grouping node with the configuration of grouping data quality issues to incidents.
  2. The method of grouping data quality issues into incidents.

The full documentation of the incident_grouping element is provided in the ConnectionIncidentGroupingSpec object reference.

Incident notification

When a new data quality incident is detected, or a user changes the status of an incident, DQOps can send notifications. Notifications are sent by email or by calling a webhook, sending a JSON IncidentNotificationMessage using an HTTP POST method. Multiple addresses can be provided, separated by commas.

This configuration overrides the default settings stored in the settings/defaultnotifications.dqonotifications.yaml file with a data source specific configuration, allowing you to send notifications to the data source owner or the right data engineering team.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/ConnectionYaml-schema.json
apiVersion: dqo/v1
kind: source
spec:
  ...
  incident_grouping:
    grouping_level: table_dimension_category
    minimum_severity: warning
    max_incident_length_days: 60
    mute_for_days: 60 
    incident_notification: # (1)!
      incident_opened_addresses: jsmith@company.com,https://my.ticketingsystem.com/on_new_incident_detected # (2)!
      incident_acknowledged_addresses: https://my.ticketingsystem.com/on_incident_confirmed_by_ops_team # (3)!
      incident_resolved_addresses: https://my.ticketingsystem.com/on_incident_resolved # (4)!
      incident_muted_addresses: https://my.ticketingsystem.com/on_incident_muted # (5)!
  1. The incident notification settings node.
  2. The email address or a webhook url where DQOps sends notifications of new data quality incidents that were just detected. Supports both emails and webhook urls. Multiple target addresses should be separated by a comma.
  3. The email address or a webhook url where DQOps sends notifications of data quality incidents that were reviewed and assigned for resolution. Supports both emails and webhook urls. Multiple target addresses should be separated by a comma.
  4. The email address or a webhook url where DQOps sends notifications of data quality incidents that were resolved and the data quality checks can be run again to validate the fix. Supports both emails and webhook urls. Multiple target addresses should be separated by a comma.
  5. The email address or a webhook url where DQOps sends notifications of data quality incidents that were muted because the incident was identified as a low priority or out-of-scope. Supports both emails and webhook urls. Multiple target addresses should be separated by a comma.

The full documentation of the incident_grouping.incident_notification element is provided in the IncidentNotificationSpec object reference.

What's next