
Last updated: July 22, 2025

Sum match data quality checks, SQL examples

A column-level check that compares the sum of the values in the tested column with the sum of the values in a reference column from the reference table. The sums are compared for each group of data: rows are grouped using a GROUP BY clause, and the groups are matched between the tested (parent) table and the reference table (the source of truth).
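The grouping and matching described above can be sketched with Python's built-in sqlite3 module. This is only an illustration, not how DQOps runs the comparison internally: SQLite is used for convenience, and the table names, the sample rows, and the grouped_sums helper are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical tested table and reference table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target_table (country TEXT, state TEXT, target_column REAL);
    CREATE TABLE reference_table (country TEXT, state TEXT, reference_column REAL);
    INSERT INTO target_table VALUES
        ('US', 'CA', 100.0), ('US', 'CA', 50.0), ('US', 'TX', 80.0), ('US', 'NY', 10.0);
    INSERT INTO reference_table VALUES
        ('US', 'CA', 150.0), ('US', 'TX', 100.0);
""")

def grouped_sums(table, column):
    """Return {(country, state): sum} computed with SUM(...) and GROUP BY."""
    query = f'''
        SELECT country, state, SUM(analyzed_table."{column}") AS actual_value
        FROM "{table}" AS analyzed_table
        GROUP BY country, state
        ORDER BY country, state
    '''
    return {(country, state): total for country, state, total in conn.execute(query)}

target_sums = grouped_sums("target_table", "target_column")
reference_sums = grouped_sums("reference_table", "reference_column")

# Groups are matched by their grouping key; unmatched groups are listed separately.
matched = sorted(set(target_sums) & set(reference_sums))
only_in_target = sorted(set(target_sums) - set(reference_sums))

for group in matched:
    print(group, target_sums[group], reference_sums[group])
print("groups missing in the reference table:", only_in_target)
```

Each matched group yields one pair of sums, and it is these pairs that the check compares.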


The sum match data quality check has the following variants for each type of data quality check supported by DQOps.

profile sum match

Check description

Verifies the percentage difference between the sum of values in the tested column of the parent table and the sum of values in a column of the reference table. The difference must be below the defined percentage thresholds.

| Data quality check name | Friendly name | Category | Check type | Time scale | Quality dimension | Sensor definition | Quality rule | Standard |
|---|---|---|---|---|---|---|---|---|
| profile_sum_match | Maximum percentage of difference between sums of compared columns | comparisons | profiling | | Accuracy | sum | diff_percent | |
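As a worked example of the percentage difference, the sketch below compares one pair of sums. The formula (absolute difference divided by the reference sum) and the handling of a zero reference sum are assumptions made for illustration; the actual diff_percent rule in DQOps may differ in detail.

```python
def diff_percent(compared_sum, reference_sum):
    """Percentage difference of compared_sum relative to reference_sum.

    Assumed formula: 100 * |compared - reference| / |reference|.
    """
    if reference_sum == 0:
        # Assumption: identical zero sums match, anything else is a full mismatch.
        return 0.0 if compared_sum == 0 else 100.0
    return 100.0 * abs(compared_sum - reference_sum) / abs(reference_sum)

# The tested column sums to 980 while the reference column sums to 1000.
example = diff_percent(compared_sum=980.0, reference_sum=1000.0)
print(example)
```

With these inputs the difference is 2 percent of the reference sum.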

Command-line examples

The following DQOps command-line examples show how to run or activate the profile sum match data quality check.

Managing profile sum match check from DQOps shell

Activate this data quality check using the check activate CLI command, providing the connection name, table name, check name, and all other filters. The following command activates the warning rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=profile_sum_match --enable-warning

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=profile_sum_match --enable-warning
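To get a feel for which tables a pattern such as schema_prefix*.fact_* would select, the sketch below uses shell-style wildcards from Python's fnmatch module. It assumes that * stands for any run of characters; DQOps applies its own matching rules, which may differ (for example by matching the schema and table parts separately), and the table names are hypothetical.

```python
from fnmatch import fnmatchcase

def matching_tables(pattern, tables):
    """Return the full table names that match a shell-style wildcard pattern."""
    return [name for name in tables if fnmatchcase(name, pattern)]

# Hypothetical schema_name.table_name values found on a connection.
tables = [
    "schema_prefix_sales.fact_orders",
    "schema_prefix_sales.dim_customer",
    "other.fact_orders",
    "schema_prefix_hr.fact_payroll",
]

selected = matching_tables("schema_prefix*.fact_*", tables)
print(selected)
```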

Additional rule parameters are passed using the -Wrule_parameter_name=value option.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=profile_sum_match --enable-warning
                    -Wmax_diff_percent=value

Activate this data quality check using the check activate CLI command, providing the connection name, table name, check name, and all other filters. The following command activates the error rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=profile_sum_match --enable-error

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=profile_sum_match --enable-error

Additional rule parameters are passed using the -Erule_parameter_name=value option.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=profile_sum_match --enable-error
                    -Emax_diff_percent=value

Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the profile_sum_match check on all tables and columns on a single data source.

dqo> check run -c=data_source_name -ch=profile_sum_match

It is also possible to run this check on a specific connection and table. In order to do this, use the connection name and the full table name parameters.

dqo> check run -c=connection_name -t=schema_name.table_name -ch=profile_sum_match

You can also run this check on all tables (and columns) on which the profile_sum_match check is enabled using patterns to find tables.

dqo> check run -c=connection_name -t=schema_prefix*.fact_* -col=column_name_* -ch=profile_sum_match

YAML configuration

The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  table_comparisons:
    compare_to_source_of_truth_table:
      reference_table_connection_name: <source_of_truth_connection_name>
      reference_table_schema_name: <source_of_truth_schema_name>
      reference_table_name: <source_of_truth_table_name>
      check_type: profiling
      grouping_columns:
      - compared_table_column_name: country
        reference_table_column_name: country_column_name_on_reference_table
      - compared_table_column_name: state
        reference_table_column_name: state_column_name_on_reference_table
  columns:
    target_column:
      profiling_checks:
        comparisons:
          compare_to_source_of_truth_table:
            reference_column: source_of_truth_column_name
            profile_sum_match:
              warning:
                max_diff_percent: 0.0
              error:
                max_diff_percent: 1.0
              fatal:
                max_diff_percent: 5.0
      labels:
      - This is the column that is analyzed for data quality issues
    country:
      labels:
      - column used as the first grouping key for calculating aggregated values used
        for the table comparison
    state:
      labels:
      - column used as the second grouping key for calculating aggregated values used
        for the table comparison
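The three max_diff_percent thresholds in the sample configuration can be read as escalating severity levels. The sketch below shows one way such thresholds could be evaluated. It assumes a rule is raised when the measured difference is strictly greater than its max_diff_percent; the severity function and the "passed" label are hypothetical and not part of DQOps.

```python
# Thresholds copied from the sample YAML configuration.
THRESHOLDS = {"warning": 0.0, "error": 1.0, "fatal": 5.0}

def severity(diff_percent, thresholds=THRESHOLDS):
    """Return the highest severity level whose threshold is exceeded."""
    # Check the most severe level first so that it wins over the milder ones.
    for level in ("fatal", "error", "warning"):
        limit = thresholds.get(level)
        if limit is not None and diff_percent > limit:
            return level
    return "passed"

for value in (0.0, 0.5, 3.0, 7.5):
    print(value, severity(value))
```

Under this assumption, a warning threshold of 0.0 means that any nonzero difference between the sums is reported at least as a warning.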
Samples of generated SQL queries for each data source type

For each database engine, the sections below show the Jinja2 template of the sum data quality sensor, followed by the SQL query rendered from that template.

BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "<target_schema>"."<target_table>" AS analyzed_table
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `<target_schema>`.`<target_table>` AS analyzed_table
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM(
    SELECT
        original_table.*
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM  AS analyzed_table
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM (
    SELECT
        original_table.*
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `<target_table>` AS analyzed_table
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `<target_table>` AS analyzed_table
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM(
    SELECT
        original_table.*
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM (
    SELECT
        original_table.*
    FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM(
    SELECT
        original_table.*
    FROM "<target_table>" original_table
) analyzed_table
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `<target_schema>`.`<target_table>` AS analyzed_table
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.[target_column]) AS actual_value
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "<target_schema>"."<target_table>" AS analyzed_table
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM (
    SELECT
        original_table.*
    FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table

daily sum match

Check description

Verifies the percentage difference between the sum of values in the tested column of the parent table and the sum of values in a column of the reference table. The difference must be below the defined percentage thresholds. Stores the most recent captured value for each day when the data quality check was evaluated.

| Data quality check name | Friendly name | Category | Check type | Time scale | Quality dimension | Sensor definition | Quality rule | Standard |
|---|---|---|---|---|---|---|---|---|
| daily_sum_match | Maximum percentage of difference between sums of compared columns | comparisons | monitoring | daily | Accuracy | sum | diff_percent | |
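The daily variant keeps only the most recent captured value for each day. The sketch below illustrates that behavior under stated assumptions: the keep_latest_per_day helper and the sample readouts are hypothetical and do not reflect how DQOps stores results internally.

```python
from datetime import datetime

def keep_latest_per_day(readouts):
    """Return {date: value}, keeping the value of the latest run on each day."""
    latest = {}
    # Process runs in chronological order so that later runs overwrite earlier ones.
    for executed_at, value in sorted(readouts, key=lambda item: item[0]):
        latest[executed_at.date()] = value
    return latest

# Hypothetical (execution time, measured difference in percent) pairs.
readouts = [
    (datetime(2025, 7, 21, 17, 30), 0.4),
    (datetime(2025, 7, 21, 8, 0), 2.5),
    (datetime(2025, 7, 22, 9, 15), 0.0),
]

daily_values = keep_latest_per_day(readouts)
for day, value in daily_values.items():
    print(day, value)
```

Running the check twice on July 21 leaves a single value for that day, the one captured by the later run.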

Command-line examples

The following DQOps command-line examples show how to run or activate the daily sum match data quality check.

Managing daily sum match check from DQOps shell

Activate this data quality check using the check activate CLI command, providing the connection name, table name, check name, and all other filters. The following command activates the warning rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=daily_sum_match --enable-warning

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_sum_match --enable-warning

Additional rule parameters are passed using the -Wrule_parameter_name=value option.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_sum_match --enable-warning
                    -Wmax_diff_percent=value

Activate this data quality check using the check activate CLI command, providing the connection name, table name, check name, and all other filters. The following command activates the error rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=daily_sum_match --enable-error

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_sum_match --enable-error

Additional rule parameters are passed using the -Erule_parameter_name=value option.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_sum_match --enable-error
                    -Emax_diff_percent=value

Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the daily_sum_match check on all tables and columns on a single data source.

dqo> check run -c=data_source_name -ch=daily_sum_match

It is also possible to run this check on a specific connection and table. In order to do this, use the connection name and the full table name parameters.

dqo> check run -c=connection_name -t=schema_name.table_name -ch=daily_sum_match

You can also run this check on all tables (and columns) on which the daily_sum_match check is enabled using patterns to find tables.

dqo> check run -c=connection_name -t=schema_prefix*.fact_* -col=column_name_* -ch=daily_sum_match

YAML configuration

The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  table_comparisons:
    compare_to_source_of_truth_table:
      reference_table_connection_name: <source_of_truth_connection_name>
      reference_table_schema_name: <source_of_truth_schema_name>
      reference_table_name: <source_of_truth_table_name>
      check_type: profiling
      grouping_columns:
      - compared_table_column_name: country
        reference_table_column_name: country_column_name_on_reference_table
      - compared_table_column_name: state
        reference_table_column_name: state_column_name_on_reference_table
  columns:
    target_column:
      monitoring_checks:
        daily:
          comparisons:
            compare_to_source_of_truth_table:
              reference_column: source_of_truth_column_name
              daily_sum_match:
                warning:
                  max_diff_percent: 0.0
                error:
                  max_diff_percent: 1.0
                fatal:
                  max_diff_percent: 5.0
      labels:
      - This is the column that is analyzed for data quality issues
    country:
      labels:
      - column used as the first grouping key for calculating aggregated values used
        for the table comparison
    state:
      labels:
      - column used as the second grouping key for calculating aggregated values used
        for the table comparison
Samples of generated SQL queries for each data source type

For each database engine, the sections below show the Jinja2 template of the sum data quality sensor, followed by the SQL query rendered from that template.

BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "<target_schema>"."<target_table>" AS analyzed_table
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `<target_schema>`.`<target_table>` AS analyzed_table
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM(
    SELECT
        original_table.*
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM  AS analyzed_table
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM (
    SELECT
        original_table.*
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `<target_table>` AS analyzed_table
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `<target_table>` AS analyzed_table
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM(
    SELECT
        original_table.*
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM (
    SELECT
        original_table.*
    FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM(
    SELECT
        original_table.*
    FROM "<target_table>" original_table
) analyzed_table
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `<target_schema>`.`<target_table>` AS analyzed_table
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.[target_column]) AS actual_value
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "<target_schema>"."<target_table>" AS analyzed_table
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM (
    SELECT
        original_table.*
    FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table

monthly sum match

Check description

Verifies that the percentage difference between the sum of values in a tested column in a parent table and the sum of values in a column in the reference table stays below the defined percentage thresholds. Stores the most recent captured value for each month when the data quality check was evaluated.

| Data quality check name | Friendly name | Category | Check type | Time scale | Quality dimension | Sensor definition | Quality rule | Standard |
|---|---|---|---|---|---|---|---|---|
| monthly_sum_match | Maximum percentage of difference between sums of compared columns | comparisons | monitoring | monthly | Accuracy | sum | diff_percent | |
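
As a rough illustration of the comparison described above, the following Python sketch sums a tested column and a reference column per group and computes the percentage difference for each matched group. The function names `sum_by_group` and `sum_diff_percent`, the sample rows, and the choice to measure the difference relative to the reference sum are assumptions made for this example; the actual diff_percent rule in DQOps may compute the value differently.

```python
from collections import defaultdict


def sum_by_group(rows, key_columns, value_column):
    """Sum value_column for each distinct combination of key_columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        totals[key] += row[value_column]
    return dict(totals)


def sum_diff_percent(compared_sum, reference_sum):
    """Percentage difference relative to the reference sum (assumed formula)."""
    if reference_sum == 0:
        # Assumption: two zero sums match, anything else is a full mismatch.
        return 0.0 if compared_sum == 0 else 100.0
    return 100.0 * abs(compared_sum - reference_sum) / abs(reference_sum)


# Hypothetical sample data: the tested (parent) table and the reference table.
compared_rows = [
    {"country": "US", "state": "CA", "amount": 100.0},
    {"country": "US", "state": "CA", "amount": 50.0},
    {"country": "US", "state": "TX", "amount": 80.0},
]
reference_rows = [
    {"country": "US", "state": "CA", "amount": 150.0},
    {"country": "US", "state": "TX", "amount": 100.0},
]

compared = sum_by_group(compared_rows, ["country", "state"], "amount")
reference = sum_by_group(reference_rows, ["country", "state"], "amount")

# Match groups by key and compute the difference for groups present in both.
diffs = {
    key: sum_diff_percent(compared[key], reference[key])
    for key in compared.keys() & reference.keys()
}
print(diffs)
```

In this sample the CA group matches exactly, while the TX group differs by a fifth of the reference sum, so only TX would be reported as a mismatch under these assumptions.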

Command-line examples

The DQOps command-line examples below show how to activate or run the monthly sum match data quality check.

Managing monthly sum match check from DQOps shell

Activate this data quality check using the check activate CLI command, providing the connection name, table name, check name, and all other filters. The following command activates the warning rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=monthly_sum_match --enable-warning

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_sum_match --enable-warning

Pass additional rule parameters using the -Wrule_parameter_name=value option.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_sum_match --enable-warning
                    -Wmax_diff_percent=value

Activate this data quality check using the check activate CLI command, providing the connection name, table name, check name, and all other filters. The following command activates the error rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=monthly_sum_match --enable-error

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_sum_match --enable-error

Pass additional rule parameters using the -Erule_parameter_name=value option.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_sum_match --enable-error
                    -Emax_diff_percent=value

Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the monthly_sum_match check on all tables and columns on a single data source.

dqo> check run -c=data_source_name -ch=monthly_sum_match

To run this check on a specific connection and table, provide the connection name and the full table name.

dqo> check run -c=connection_name -t=schema_name.table_name -ch=monthly_sum_match

You can also run this check on all tables (and columns) on which the monthly_sum_match check is enabled using patterns to find tables.

dqo> check run -c=connection_name -t=schema_prefix*.fact_* -col=column_name_* -ch=monthly_sum_match

YAML configuration

The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  table_comparisons:
    compare_to_source_of_truth_table:
      reference_table_connection_name: <source_of_truth_connection_name>
      reference_table_schema_name: <source_of_truth_schema_name>
      reference_table_name: <source_of_truth_table_name>
      check_type: profiling
      grouping_columns:
      - compared_table_column_name: country
        reference_table_column_name: country_column_name_on_reference_table
      - compared_table_column_name: state
        reference_table_column_name: state_column_name_on_reference_table
  columns:
    target_column:
      monitoring_checks:
        monthly:
          comparisons:
            compare_to_source_of_truth_table:
              reference_column: source_of_truth_column_name
              monthly_sum_match:
                warning:
                  max_diff_percent: 0.0
                error:
                  max_diff_percent: 1.0
                fatal:
                  max_diff_percent: 5.0
      labels:
      - This is the column that is analyzed for data quality issues
    country:
      labels:
      - column used as the first grouping key for calculating aggregated values used
        for the table comparison
    state:
      labels:
      - column used as the second grouping key for calculating aggregated values used
        for the table comparison
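
One way to read the three thresholds in the sample above is sketched below in Python. It assumes that a rule fails when the measured difference exceeds its max_diff_percent and that the most severe failed rule determines the result. The function name `classify_severity` and that precedence are assumptions of this sketch, not a description of the DQOps rule engine.

```python
# Thresholds copied from the sample YAML above.
THRESHOLDS = {"warning": 0.0, "error": 1.0, "fatal": 5.0}


def classify_severity(diff_percent, thresholds=THRESHOLDS):
    """Return the most severe level whose max_diff_percent is exceeded."""
    # Check from most to least severe so the first hit wins.
    for level in ("fatal", "error", "warning"):
        limit = thresholds.get(level)
        if limit is not None and diff_percent > limit:
            return level
    return "passed"


for value in (0.0, 0.5, 2.0, 7.5):
    print(value, classify_severity(value))
```

With a warning threshold of 0.0, any non-zero difference between the sums raises at least a warning under this reading, so the sample configuration only passes when the sums are identical.
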
Samples of generated SQL queries for each data source type

For each database engine, the Jinja2 template of the sum data quality sensor is shown first, followed by the SQL query rendered from it.

BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "<target_schema>"."<target_table>" AS analyzed_table
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `<target_schema>`.`<target_table>` AS analyzed_table
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM(
    SELECT
        original_table.*
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM  AS analyzed_table
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM (
    SELECT
        original_table.*
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `<target_table>` AS analyzed_table
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `<target_table>` AS analyzed_table
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM(
    SELECT
        original_table.*
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM (
    SELECT
        original_table.*
    FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM(
    SELECT
        original_table.*
    FROM "<target_table>" original_table
) analyzed_table
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value
FROM `<target_schema>`.`<target_table>` AS analyzed_table
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.[target_column]) AS actual_value
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM "<target_schema>"."<target_table>" AS analyzed_table
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value
FROM (
    SELECT
        original_table.*
    FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table

daily partition sum match

Check description

Verifies that the percentage difference between the sum of values in a tested column in a parent table and the sum of values in a column in the reference table stays below the defined percentage thresholds. Compares each daily partition (each day of data) between the compared table and the reference table (the source of truth).

| Data quality check name | Friendly name | Category | Check type | Time scale | Quality dimension | Sensor definition | Quality rule | Standard |
|---|---|---|---|---|---|---|---|---|
| daily_partition_sum_match | Maximum percentage of difference between sums of compared columns | comparisons | partitioned | daily | Accuracy | sum | diff_percent | |
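
To make the per-partition comparison concrete, the Python sketch below uses an in-memory SQLite database to sum a column for each day in two tables and then compares the daily sums. The table names `compared_table` and `reference_table`, the sample rows, and the relative-to-reference percentage formula are assumptions for illustration; DQOps generates its own dialect-specific queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE compared_table (date_column TEXT, target_column REAL);
    CREATE TABLE reference_table (date_column TEXT, target_column REAL);
    INSERT INTO compared_table VALUES
        ('2024-01-01 08:00:00', 10.0),
        ('2024-01-01 17:30:00', 15.0),
        ('2024-01-02 09:15:00', 40.0);
    INSERT INTO reference_table VALUES
        ('2024-01-01 00:00:00', 25.0),
        ('2024-01-02 00:00:00', 50.0);
    """
)


def daily_sums(table_name):
    """Sum target_column per day, truncating timestamps to a date."""
    # table_name is a fixed literal in this script, never user input.
    query = f"""
        SELECT date(date_column) AS time_period,
               SUM(target_column) AS actual_value
        FROM {table_name}
        GROUP BY time_period
        ORDER BY time_period
    """
    return dict(conn.execute(query).fetchall())


compared = daily_sums("compared_table")
reference = daily_sums("reference_table")

# Compare each daily partition found in both tables (assumed formula).
daily_diff_percent = {
    day: 100.0 * abs(compared[day] - reference[day]) / abs(reference[day])
    for day in sorted(compared.keys() & reference.keys())
}
print(daily_diff_percent)
```

Each day is evaluated on its own, so a mismatch on one day does not affect the result for the other days.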

Command-line examples

The DQOps command-line examples below show how to activate or run the daily partition sum match data quality check.

Managing daily partition sum match check from DQOps shell

Activate this data quality check using the check activate CLI command, providing the connection name, table name, check name, and all other filters. The following command activates the warning rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=daily_partition_sum_match --enable-warning

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_partition_sum_match --enable-warning

Pass additional rule parameters using the -Wrule_parameter_name=value option.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_partition_sum_match --enable-warning
                    -Wmax_diff_percent=value

Activate this data quality check using the check activate CLI command, providing the connection name, table name, check name, and all other filters. The following command activates the error rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=daily_partition_sum_match --enable-error

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_partition_sum_match --enable-error

Pass additional rule parameters using the -Erule_parameter_name=value option.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_partition_sum_match --enable-error
                    -Emax_diff_percent=value

Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the daily_partition_sum_match check on all tables and columns on a single data source.

dqo> check run -c=data_source_name -ch=daily_partition_sum_match

To run this check on a specific connection and table, provide the connection name and the full table name.

dqo> check run -c=connection_name -t=schema_name.table_name -ch=daily_partition_sum_match

You can also run this check on all tables (and columns) on which the daily_partition_sum_match check is enabled using patterns to find tables.

dqo> check run -c=connection_name -t=schema_prefix*.fact_* -col=column_name_* -ch=daily_partition_sum_match

YAML configuration

The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  timestamp_columns:
    partition_by_column: date_column
  incremental_time_window:
    daily_partitioning_recent_days: 7
    monthly_partitioning_recent_months: 1
  table_comparisons:
    compare_to_source_of_truth_table:
      reference_table_connection_name: <source_of_truth_connection_name>
      reference_table_schema_name: <source_of_truth_schema_name>
      reference_table_name: <source_of_truth_table_name>
      check_type: profiling
      grouping_columns:
      - compared_table_column_name: country
        reference_table_column_name: country_column_name_on_reference_table
      - compared_table_column_name: state
        reference_table_column_name: state_column_name_on_reference_table
  columns:
    target_column:
      partitioned_checks:
        daily:
          comparisons:
            compare_to_source_of_truth_table:
              reference_column: source_of_truth_column_name
              daily_partition_sum_match:
                warning:
                  max_diff_percent: 0.0
                error:
                  max_diff_percent: 1.0
                fatal:
                  max_diff_percent: 5.0
      labels:
      - This is the column that is analyzed for data quality issues
    date_column:
      labels:
      - "date or datetime column used as a daily or monthly partitioning key, dates\
        \ (and times) are truncated to a day or a month by the sensor's query for\
        \ partitioned checks"
    country:
      labels:
      - column used as the first grouping key for calculating aggregated values used
        for the table comparison
    state:
      labels:
      - column used as the second grouping key for calculating aggregated values used
        for the table comparison
Samples of generated SQL queries for each data source type

For each database engine, the Jinja2 template of the sum data quality sensor is shown first, followed by the SQL query rendered from it.

BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value,
    CAST(analyzed_table.`date_column` AS DATE) AS time_period,
    TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    CAST(analyzed_table."date_column" AS DATE) AS time_period,
    toDateTime64(CAST(analyzed_table."date_column" AS DATE), 3) AS time_period_utc
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value,
    CAST(analyzed_table.`date_column` AS DATE) AS time_period,
    TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    time_period,
    time_period_utc
FROM(
    SELECT
        original_table.*,
        CAST(original_table."date_column" AS DATE) AS time_period,
        TIMESTAMP(CAST(original_table."date_column" AS DATE)) AS time_period_utc
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    CAST(analyzed_table."date_column" AS date) AS time_period,
    CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM  AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    time_period,
    time_period_utc
FROM (
    SELECT
        original_table.*,
        CAST(original_table."date_column" AS DATE) AS time_period,
        TO_TIMESTAMP(CAST(original_table."date_column" AS DATE)) AS time_period_utc
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value,
    DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00') AS time_period,
    FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00'))) AS time_period_utc
FROM `<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value,
    DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00') AS time_period,
    FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00'))) AS time_period_utc
FROM `<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    time_period,
    time_period_utc
FROM(
    SELECT
        original_table.*,
        TRUNC(CAST(original_table."date_column" AS DATE)) AS time_period,
        CAST(TRUNC(CAST(original_table."date_column" AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    CAST(analyzed_table."date_column" AS date) AS time_period,
    CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    time_period,
    time_period_utc
FROM (
    SELECT
        original_table.*,
        CAST(original_table."date_column" AS date) AS time_period,
        CAST(CAST(original_table."date_column" AS date) AS TIMESTAMP) AS time_period_utc
    FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    time_period,
    time_period_utc
FROM(
    SELECT
        original_table.*,
        CAST(DATE_TRUNC('day', original_table."date_column") AS DATE) AS time_period,
        CAST((CAST(DATE_TRUNC('day', original_table."date_column") AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
    FROM "<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    CAST(analyzed_table."date_column" AS date) AS time_period,
    CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    CAST(analyzed_table."date_column" AS date) AS time_period,
    TO_TIMESTAMP(CAST(analyzed_table."date_column" AS date)) AS time_period_utc
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value,
    CAST(analyzed_table.`date_column` AS DATE) AS time_period,
    TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.[target_column]) AS actual_value,
    CAST(analyzed_table.[date_column] AS date) AS time_period,
    CAST((CAST(analyzed_table.[date_column] AS date)) AS DATETIME) AS time_period_utc
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
GROUP BY CAST(analyzed_table.[date_column] AS date), CAST(analyzed_table.[date_column] AS date)
ORDER BY CAST(analyzed_table.[date_column] AS date)
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    CAST(analyzed_table."date_column" AS DATE) AS time_period,
    CAST(CAST(analyzed_table."date_column" AS DATE) AS TIMESTAMP) AS time_period_utc
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    time_period,
    time_period_utc
FROM (
    SELECT
        original_table.*,
        CAST(original_table."date_column" AS date) AS time_period,
        CAST(CAST(original_table."date_column" AS date) AS TIMESTAMP) AS time_period_utc
    FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
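
Each query above returns one sum per time_period. A comparison then has to line those rows up with the equivalent rows from the reference table. The sketch below shows one way to do that with plain Python dictionaries. The function name match_period_sums, the sample values, and the choice to report unmatched periods separately are assumptions made for illustration; they are not taken from DQOps.

```python
from typing import Dict, List, Tuple


def match_period_sums(
    compared: Dict[str, float], reference: Dict[str, float]
) -> Tuple[List[Tuple[str, float, float]], List[str]]:
    """Pair sums by time period.

    Returns (matched rows, periods that exist on one side only).
    Hypothetical helper for illustration, not part of DQOps.
    """
    # Periods present in both result sets, in chronological (string) order.
    matched = [
        (period, compared[period], reference[period])
        for period in sorted(compared.keys() & reference.keys())
    ]
    # Periods found in only one of the two result sets.
    unmatched = sorted(compared.keys() ^ reference.keys())
    return matched, unmatched


# Made-up daily sums from the compared table and the reference table.
compared = {"2025-07-01": 120.0, "2025-07-02": 80.0}
reference = {"2025-07-01": 120.0, "2025-07-02": 82.0, "2025-07-03": 10.0}

matched, unmatched = match_period_sums(compared, reference)
print(matched)
print(unmatched)
```

In this made-up data, the two shared days are paired for comparison, while the day that exists only in the reference table is listed separately.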

monthly partition sum match

Check description

Verifies that the percentage difference between the sum of values in a tested column of the parent table and the sum of values in a column of the reference table is below the defined percentage thresholds. Compares each monthly partition (each month of data) between the compared table and the reference table (the source of truth).
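
As a rough illustration of the comparison described above, the sketch below computes the percentage difference between two sums in Python. The function name sum_diff_percent and the choice to express the difference relative to the reference sum are assumptions made for this example; the exact formula used by the diff_percent rule is not shown on this page.

```python
from typing import Optional


def sum_diff_percent(compared_sum: float, reference_sum: float) -> Optional[float]:
    """Return the absolute difference between two sums as a percentage of the reference sum.

    Hypothetical helper, not the DQOps rule: it assumes the difference is
    measured relative to the reference (source of truth) value.
    """
    if reference_sum == 0:
        # A percentage of zero is undefined; treat two zero sums as a 0% difference.
        return 0.0 if compared_sum == 0 else None
    return abs(compared_sum - reference_sum) * 100.0 / abs(reference_sum)


# Example: the compared table sums to 1015, the reference table to 1000.
print(sum_diff_percent(1015.0, 1000.0))
```

Under these assumptions, a compared sum of 1015 against a reference sum of 1000 is a 1.5 percent difference, which would then be checked against the configured max_diff_percent thresholds.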

Data quality check name Friendly name Category Check type Time scale Quality dimension Sensor definition Quality rule Standard
monthly_partition_sum_match Maximum percentage of difference between sums of compared columns comparisons partitioned monthly Accuracy sum diff_percent

Command-line examples

Please expand the section below to see the DQOps command-line examples to run or activate the monthly partition sum match data quality check.

Managing monthly partition sum match check from DQOps shell

Activate this data quality check using the check activate CLI command, providing the connection name, table name, check name, and all other filters. The following command activates the warning rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=monthly_partition_sum_match --enable-warning

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_partition_sum_match --enable-warning

Additional rule parameters are passed using the -Wrule_parameter_name=value option.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_partition_sum_match --enable-warning
                    -Wmax_diff_percent=value

Activate this data quality check using the check activate CLI command, providing the connection name, table name, check name, and all other filters. The following command activates the error rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=monthly_partition_sum_match --enable-error

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_partition_sum_match --enable-error

Additional rule parameters are passed using the -Erule_parameter_name=value option.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_partition_sum_match --enable-error
                    -Emax_diff_percent=value

Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the monthly_partition_sum_match check on all tables and columns on a single data source.

dqo> check run -c=data_source_name -ch=monthly_partition_sum_match

It is also possible to run this check on a specific connection and table. To do this, pass the connection name and the full table name as parameters.

dqo> check run -c=connection_name -t=schema_name.table_name -ch=monthly_partition_sum_match

You can also run this check on all tables (and columns) on which the monthly_partition_sum_match check is enabled using patterns to find tables.

dqo> check run -c=connection_name -t=schema_prefix*.fact_* -col=column_name_* -ch=monthly_partition_sum_match
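
To get a feel for which tables a pattern such as schema_prefix*.fact_* would select, the sketch below applies glob-style matching with Python's standard fnmatch module. Treating the DQOps filter as a plain glob is an assumption made for illustration, because the matching rules are not spelled out on this page. The table names are made up.

```python
from fnmatch import fnmatchcase
from typing import List

# Hypothetical table names in schema.table form.
tables = [
    "schema_prefix_sales.fact_orders",
    "schema_prefix_sales.dim_customer",
    "schema_prefix_hr.fact_payroll",
    "other_schema.fact_orders",
]


def matching_tables(pattern: str, names: List[str]) -> List[str]:
    """Return the names that match a glob-style pattern, case-sensitively."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Assumption: '*' stands for any run of characters, as in shell globs.
selected = matching_tables("schema_prefix*.fact_*", tables)
print(selected)
```

With this sample list, only the two fact tables in schemas that start with schema_prefix are selected.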

YAML configuration

The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  timestamp_columns:
    partition_by_column: date_column
  incremental_time_window:
    daily_partitioning_recent_days: 7
    monthly_partitioning_recent_months: 1
  table_comparisons:
    compare_to_source_of_truth_table:
      reference_table_connection_name: <source_of_truth_connection_name>
      reference_table_schema_name: <source_of_truth_schema_name>
      reference_table_name: <source_of_truth_table_name>
      check_type: profiling
      grouping_columns:
      - compared_table_column_name: country
        reference_table_column_name: country_column_name_on_reference_table
      - compared_table_column_name: state
        reference_table_column_name: state_column_name_on_reference_table
  columns:
    target_column:
      partitioned_checks:
        monthly:
          comparisons:
            compare_to_source_of_truth_table:
              reference_column: source_of_truth_column_name
              monthly_partition_sum_match:
                warning:
                  max_diff_percent: 0.0
                error:
                  max_diff_percent: 1.0
                fatal:
                  max_diff_percent: 5.0
      labels:
      - This is the column that is analyzed for data quality issues
    date_column:
      labels:
      - "date or datetime column used as a daily or monthly partitioning key, dates\
        \ (and times) are truncated to a day or a month by the sensor's query for\
        \ partitioned checks"
    country:
      labels:
      - column used as the first grouping key for calculating aggregated values used
        for the table comparison
    state:
      labels:
      - column used as the second grouping key for calculating aggregated values used
        for the table comparison
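
The three thresholds in the YAML above can be read as an escalating scale. The sketch below shows one way to map a measured difference to a severity level. The function name classify_severity and the rule that a difference strictly greater than max_diff_percent raises that severity are assumptions made for illustration, not the DQOps implementation.

```python
from typing import Dict, Optional

# Thresholds copied from the sample YAML configuration above.
THRESHOLDS: Dict[str, float] = {"warning": 0.0, "error": 1.0, "fatal": 5.0}


def classify_severity(
    diff_percent: float, thresholds: Dict[str, float] = THRESHOLDS
) -> Optional[str]:
    """Return the highest severity whose max_diff_percent is exceeded, or None.

    Hypothetical helper: it assumes an issue is raised when the difference
    is strictly greater than the configured limit.
    """
    # Check the most severe level first so the highest matching level wins.
    for severity in ("fatal", "error", "warning"):
        limit = thresholds.get(severity)
        if limit is not None and diff_percent > limit:
            return severity
    return None


for diff in (0.0, 0.4, 2.5, 7.0):
    print(diff, classify_severity(diff))
```

Under these assumptions, an exact match raises nothing, any non-zero difference up to 1 percent is a warning, a difference above 1 percent is an error, and a difference above 5 percent is fatal.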
Samples of generated SQL queries for each data source type

Expand the section for a database engine to see the Jinja2 template of the sum data quality sensor, followed by the SQL query rendered from it.

BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value,
    DATE_TRUNC(CAST(analyzed_table.`date_column` AS DATE), MONTH) AS time_period,
    TIMESTAMP(DATE_TRUNC(CAST(analyzed_table.`date_column` AS DATE), MONTH)) AS time_period_utc
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    DATE_TRUNC('month', CAST(analyzed_table."date_column" AS DATE)) AS time_period,
    toDateTime64(DATE_TRUNC('month', CAST(analyzed_table."date_column" AS DATE)), 3) AS time_period_utc
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value,
    DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE)) AS time_period,
    TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE))) AS time_period_utc
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    time_period,
    time_period_utc
FROM(
    SELECT
        original_table.*,
        DATE_TRUNC('MONTH', CAST(original_table."date_column" AS DATE)) AS time_period,
        TIMESTAMP(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS DATE))) AS time_period_utc
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
    CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM  AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    time_period,
    time_period_utc
FROM (
    SELECT
        original_table.*,
        SERIES_ROUND(CAST(original_table."date_column" AS DATE), 'INTERVAL 1 MONTH', ROUND_DOWN) AS time_period,
        TO_TIMESTAMP(SERIES_ROUND(CAST(original_table."date_column" AS DATE), 'INTERVAL 1 MONTH', ROUND_DOWN)) AS time_period_utc
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value,
    DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00') AS time_period,
    FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00'))) AS time_period_utc
FROM `<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value,
    DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00') AS time_period,
    FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00'))) AS time_period_utc
FROM `<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    time_period,
    time_period_utc
FROM(
    SELECT
        original_table.*,
        TRUNC(CAST(original_table."date_column" AS DATE), 'MONTH') AS time_period,
        CAST(TRUNC(CAST(original_table."date_column" AS DATE), 'MONTH') AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
    FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
    CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    time_period,
    time_period_utc
FROM (
    SELECT
        original_table.*,
        DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS time_period,
        CAST(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS TIMESTAMP) AS time_period_utc
    FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    time_period,
    time_period_utc
FROM(
    SELECT
        original_table.*,
        CAST(DATE_TRUNC('month', original_table."date_column") AS DATE) AS time_period,
        CAST((CAST(DATE_TRUNC('month', original_table."date_column") AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
    FROM "<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
    CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
    TO_TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS time_period_utc
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.`target_column`) AS actual_value,
    DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE)) AS time_period,
    TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE))) AS time_period_utc
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table.[target_column]) AS actual_value,
    DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1) AS time_period,
    CAST((DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1)) AS DATETIME) AS time_period_utc
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
GROUP BY DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1), DATEADD(month, DATEDIFF(month, 0, analyzed_table.[date_column]), 0)
ORDER BY DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1)
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    TRUNC(CAST(analyzed_table."date_column" AS DATE), 'MM') AS time_period,
    CAST(TRUNC(CAST(analyzed_table."date_column" AS DATE), 'MM') AS TIMESTAMP) AS time_period_utc
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
    SUM({{ lib.render_target_column('analyzed_table')}}) AS actual_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
    SUM(analyzed_table."target_column") AS actual_value,
    time_period,
    time_period_utc
FROM (
    SELECT
        original_table.*,
        DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS time_period,
        CAST(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS TIMESTAMP) AS time_period_utc
    FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
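
For readers who prefer to see the aggregation outside of SQL, the sketch below mimics what the rendered monthly queries compute: each date is truncated to the first day of its month and the values are summed per month. The function name monthly_sums and the sample rows are hypothetical, and the data grouping columns and the WHERE clause are left out.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def monthly_sums(rows: Iterable[Tuple[date, float]]) -> Dict[date, float]:
    """Sum values per month, keyed by the first day of the month (time_period)."""
    totals: Dict[date, float] = defaultdict(float)
    for day, value in rows:
        # Equivalent of DATE_TRUNC('MONTH', date_column) in the rendered SQL.
        time_period = day.replace(day=1)
        totals[time_period] += value
    # Mirror ORDER BY time_period.
    return dict(sorted(totals.items()))


# Made-up (date_column, target_column) rows.
rows = [
    (date(2025, 6, 3), 100.0),
    (date(2025, 6, 28), 50.0),
    (date(2025, 7, 1), 25.0),
]

result = monthly_sums(rows)
print(result)
```

The two June rows collapse into a single entry for 2025-06-01 and the July row gets its own entry for 2025-07-01, which corresponds to one actual_value per time_period in the SQL results.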

What's next