
Use cases

We have provided a variety of examples to help you use DQO effectively. These examples use openly available datasets from Google Cloud.

Prerequisite

To use the examples you need:

After installing the Google Cloud CLI, log in to your GCP account by running:

gcloud auth application-default login
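To confirm that the login worked before you run the examples, you can check that an Application Default Credentials file exists. This is a minimal sketch that assumes the standard lookup on Linux and macOS: the file named by GOOGLE_APPLICATION_CREDENTIALS if that variable is set, otherwise application_default_credentials.json in the gcloud configuration directory ($CLOUDSDK_CONFIG, defaulting to ~/.config/gcloud). The function name adc_file is hypothetical, and it only shows that a credentials file is present, not that the credentials are still valid.

```shell
# Hypothetical helper: print the path of the Application Default
# Credentials file that Google client libraries would pick up on
# Linux/macOS, and return non-zero if no such file exists.
adc_file() {
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    # An explicitly configured credentials file takes precedence.
    candidate="$GOOGLE_APPLICATION_CREDENTIALS"
  else
    # Default location written by `gcloud auth application-default login`.
    candidate="${CLOUDSDK_CONFIG:-$HOME/.config/gcloud}/application_default_credentials.json"
  fi
  [ -f "$candidate" ] || return 1
  printf '%s\n' "$candidate"
}

# Usage:
#   adc_file || echo "Not logged in: run gcloud auth application-default login"
```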

Running the use cases

A standard DQO installation comes with a set of examples, which can be found in the examples/ directory. A complete list of the examples, with links to detailed explanations, is at the bottom of this page.

Each example directory contains two configuration files: connection.dqoconnection.yaml, which stores the data source configuration, and a *.dqotable.yaml file, which stores the table and column metadata together with the check configuration.
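Because each example keeps its configuration in these two kinds of files, you can locate the examples from the command line. This is a minimal sketch, assuming only the file names described above; the function name list_examples is hypothetical, and it prints the directory that holds each connection file rather than any particular DQO-defined path.

```shell
# Hypothetical helper: print every directory under the given root that
# contains a connection.dqoconnection.yaml file and has at least one
# *.dqotable.yaml file in it or below it.
list_examples() {
  root="${1:-examples}"
  find "$root" -name connection.dqoconnection.yaml | sort | while IFS= read -r conn; do
    dir=$(dirname "$conn")
    # Keep the directory only if a table configuration file is present.
    if [ -n "$(find "$dir" -name '*.dqotable.yaml' | head -n 1)" ]; then
      printf '%s\n' "$dir"
    fi
  done
}

# Usage, from the DQO installation directory:
#   list_examples examples
```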

You do not need to add the connection manually to run the examples, but if you want to learn how, see the Working with DQO section.

To run the examples, follow the steps below.

  1. Go to the directory where you installed DQO and navigate to one of the examples, for instance examples/data-completeness/number-of-rows-in-the-table-bigquery.

    Start DQO with the command for your operating system.

    On Windows:

    run_dqo

    On macOS and Linux:

    ./run_dqo
    
  2. Create the DQO userhome folder.

    After installation, you will be asked whether to initialize the DQO userhome folder in the default location. Type Y to create the folder.
    The userhome folder stores data locally, such as sensor readouts, check results, and data source configurations. You can learn more about data storage here.

  3. Log in to DQO Cloud.

    To use DQO Cloud features, such as storing data quality definitions and results in the cloud or viewing data quality dashboards, you must create a DQO Cloud account.

    After the userhome folder is created, you will be asked whether to log in to DQO Cloud. After you type Y, you will be redirected to https://cloud.dqo.ai/registration, where you can create a new account, use Google single sign-on (SSO), or log in if you already have an account.

    During the first registration, a unique identification code (API Key) is generated and automatically passed to the DQO application, which stores it in the configuration file.

  4. To execute the checks prepared in the example, run the following command in the DQO Shell:

    check run
    

    You can also run the checks from the graphical interface. Open the DQO User Interface Console at http://localhost:8888.

    Go to the Profiling section and select the table or column mentioned in the example description from the tree view on the left.

    Select the Advanced Profiling tab.

    Run the enabled check by clicking the Run check button.

    Review the results by clicking the Check details button.

  5. After executing the checks, synchronize the results with your DQO Cloud account by running the following command or by clicking the Synchronize button in the upper right corner of the graphical interface.

    cloud sync all
    
  6. You can now review the results on the data quality dashboards as described in the Working with DQO section.
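Step 1 can also be scripted. This is a sketch that assumes, as step 1 describes, that each example directory contains its own run_dqo launcher (macOS and Linux form); the function name run_example and the DQO_HOME variable in the usage line are hypothetical. The remaining steps (userhome creation, DQO Cloud login, check run, cloud sync) still happen interactively inside DQO.

```shell
# Hypothetical helper: start DQO from one of the bundled examples.
#   $1 - directory where DQO is installed
#   $2 - example path relative to examples/
run_example() {
  example_dir="$1/examples/$2"
  if [ ! -x "$example_dir/run_dqo" ]; then
    echo "run_dqo launcher not found in $example_dir" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$example_dir" && ./run_dqo)
}

# Usage (macOS/Linux):
#   run_example "$DQO_HOME" data-completeness/number-of-rows-in-the-table-bigquery
```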

List of the use cases

Here is a complete list of the examples, with links to the documentation sections that describe them in detail.

| Category | Name of the example | Description | Link to the dataset description |
|---|---|---|---|
| Data accuracy | Integrity check between columns in different tables | This example shows how to check the referential integrity of a column against a column in another table using the foreign-key-match-percent check. | Link |
| Data completeness | Number of rows in the table | This example shows how to check that the number of rows in a table does not fall below the minimum accepted count using the row_count check. | Link |
| Data completeness | Number of null values | This example shows how to detect that the number of null values in a column does not exceed the maximum accepted count using the nulls_count check. | Link |
| Data uniqueness | Percentage of duplicates | This example shows how to detect that the percentage of duplicate values in a column does not exceed the maximum accepted percentage using the duplicate_percent check. | Link |
| Data validity | Percentage of valid USA zip codes | This example shows how to detect that the percentage of valid USA zip codes in a column does not fall below a set threshold using the valid_usa_zipcode_percent check. | Link |
| Data validity | Percentage of valid emails | This example shows how to detect that the percentage of valid email values in a column does not fall below a set threshold using the valid_email_percent check. | DQOps dataset |
| Data validity | Percentage of valid latitude and longitude | This example shows how to detect that the percentage of valid latitude and longitude values remains above a set threshold using the numeric_valid_latitude_percent and numeric_valid_longitude_percent checks. | Link |
| Data validity | Percentage of valid IPv4 addresses | This example shows how to detect that the percentage of valid IPv4 addresses in a column does not fall below a set threshold using the valid_ip4_address_percent check. | DQOps dataset |
| Data validity | Percentage of strings matching date regex | This example shows how to detect that the percentage of strings in a column that match the date format regex does not fall below a set threshold using the string_match_date_regex_percent check. | Link |
| Data validity | Percentage of negative values | This example shows how to detect that the percentage of negative values in a column does not exceed a set threshold using the negative_percent check. | Link |
| Data validity | Percentage of valid currency codes | This example shows how to detect that the percentage of valid currency codes in a column does not fall below a set threshold using the string_valid_currency_code_percent check. | DQOps dataset |
| Data validity | Percentage of rows passing SQL condition | This example shows how to detect that the percentage of rows passing a SQL condition does not fall below a set threshold using the sql_condition_passed_percent check. | Link |
| Data validity | Percentage of valid UUIDs | This example shows how to detect that the percentage of valid UUID values in a column does not fall below a set threshold using the string_valid_uuid_percent check. | DQOps dataset |
| Data reasonability | Percentage of values in range | This example shows how to detect that the percentage of values within a set range in a column does not fall below a set threshold using the values_in_range_integers_percent check. | Link |
| Data reasonability | A string not exceeding a set length | This example shows how to check that the length of a string does not exceed the indicated value using the string_max_length check. | Link |
| Data reasonability | Percentage of false values | This example shows how to detect that the percentage of false values remains above a set threshold using the bool_false_percent check. | Link |
| Stability | Table availability | This example shows how to verify that a query can be executed on a table and that the server does not return errors using the table_availability check. | Link |
| Data quality monitoring | Running checks with a scheduler | This example shows how to set different schedules on multiple checks. | Link |
| Data consistency | Percent of rows having a string column value in an expected set | This example shows how to verify that the percentage of strings in a column that belong to an expected set does not fall below a set threshold using the string_value_in_set_percent check. | Link |
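Most checks in the list compare a measured percentage with a threshold in one of two directions: some must not fall below a minimum (for example, valid_usa_zipcode_percent), others must not exceed a maximum (for example, negative_percent). The sketch below only illustrates that arithmetic; check_percent is a hypothetical shell function, not part of DQO, and DQO may compute the denominator differently for a given check (for instance, by excluding null values).

```shell
# Illustration only, not DQO's rule engine: compute the percentage of
# matching rows and compare it with a threshold.
#   $1 - number of rows that match (e.g. valid zip codes)
#   $2 - total number of rows considered
#   $3 - threshold in percent
#   $4 - "min" (must not fall below) or "max" (must not exceed)
check_percent() {
  awk -v n="$1" -v total="$2" -v threshold="$3" -v mode="$4" 'BEGIN {
    if (total == 0) { print "no rows"; exit 2 }
    pct = 100 * n / total
    ok = (mode == "min") ? (pct >= threshold) : (pct <= threshold)
    printf "%.2f%% %s\n", pct, (ok ? "passed" : "failed")
    exit (ok ? 0 : 1)
  }'
}
```

For example, check_percent 95 100 90 min prints 95.00% passed, while check_percent 3 100 1 max prints 3.00% failed and returns a non-zero status.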