Last updated: July 22, 2025
Invalid usa phone format found data quality checks, SQL examples
This check validates the format of USA phone numbers inside text columns. It counts the number of invalid phone number and raises a data quality issue when too many rows contain phone numbers.
The invalid usa phone format found data quality check has the following variants for each type of data quality checks supported by DQOps.
profile invalid usa phone format found
Check description
Verifies that the number of invalid USA phone numbers in a text column does not exceed the maximum accepted count.
Data quality check name | Friendly name | Category | Check type | Time scale | Quality dimension | Sensor definition | Quality rule | Standard |
---|---|---|---|---|---|---|---|---|
profile_invalid_usa_phone_format_found |
Maximum count of rows containing invalid USA phone number values | patterns | profiling | Validity | invalid_usa_phone_count | max_count |
Command-line examples
Please expand the section below to see the DQOps command-line examples to run or activate the profile invalid usa phone format found data quality check.
Managing profile invalid usa phone format found check from DQOps shell
Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the warning rule with the default parameters.
dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=profile_invalid_usa_phone_format_found --enable-warning
You can also use patterns to activate the check on all matching tables and columns.
dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=profile_invalid_usa_phone_format_found --enable-warning
Additional rule parameters are passed using the -Wrule_parameter_name=value.
Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the error rule with the default parameters.
dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=profile_invalid_usa_phone_format_found --enable-error
You can also use patterns to activate the check on all matching tables and columns.
dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=profile_invalid_usa_phone_format_found --enable-error
Additional rule parameters are passed using the -Erule_parameter_name=value.
Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the profile_invalid_usa_phone_format_found check on all tables and columns on a single data source.
It is also possible to run this check on a specific connection and table. In order to do this, use the connection name and the full table name parameters.
dqo> check run -c=connection_name -t=schema_name.table_name -ch=profile_invalid_usa_phone_format_found
You can also run this check on all tables (and columns) on which the profile_invalid_usa_phone_format_found check is enabled using patterns to find tables.
YAML configuration
The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
columns:
target_column:
profiling_checks:
patterns:
profile_invalid_usa_phone_format_found:
warning:
max_count: 0
error:
max_count: 10
fatal:
max_count: 100
labels:
- This is the column that is analyzed for data quality issues
Samples of generated SQL queries for each data source type
Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the invalid_usa_phone_count data quality sensor.
BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST(analyzed_table.`target_column` AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT match(
toString({{ lib.render_target_column('analyzed_table') }}),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
FROM `<target_schema>`.`<target_table>` AS analyzed_table
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
FROM (
SELECT
original_table.*
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }}
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN
CAST(analyzed_table."target_column" AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
FROM (
SELECT
original_table.*
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
FROM (
SELECT
original_table.*
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
FROM (
SELECT
original_table.*
FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
COALESCE(SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
WHERE {{ lib.render_target_column('original_table') }} IS NOT NULL
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
COALESCE(SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
FROM(
SELECT
original_table.*
FROM "<target_table>" original_table
WHERE original_table."target_column" IS NOT NULL
) analyzed_table
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace({{ lib.render_target_column('analyzed_table') }}, '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace(analyzed_table."target_column", '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE WHEN NOT ({{ lib.render_target_column('analyzed_table') }} REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE WHEN NOT (analyzed_table."target_column" REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
FROM `<target_schema>`.`<target_table>` AS analyzed_table
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} IS NOT NULL AND
REGEXP_SUBSTR(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" IS NOT NULL AND
REGEXP_SUBSTR(
CAST(analyzed_table."target_column" AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
FROM "<target_schema>"."<target_table>" AS analyzed_table
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
FROM (
SELECT
original_table.*
FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table
Expand the Configure with data grouping section to see additional examples for configuring this data quality checks to use data grouping (GROUP BY).
Configuration with data grouping
Sample configuration with data grouping enabled (YAML) The sample below shows how to configure the data grouping and how it affects the generated SQL query.
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
default_grouping_name: group_by_country_and_state
groupings:
group_by_country_and_state:
level_1:
source: column_value
column: country
level_2:
source: column_value
column: state
columns:
target_column:
profiling_checks:
patterns:
profile_invalid_usa_phone_format_found:
warning:
max_count: 0
error:
max_count: 10
fatal:
max_count: 100
labels:
- This is the column that is analyzed for data quality issues
country:
labels:
- column used as the first grouping key
state:
labels:
- column used as the second grouping key
Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the invalid_usa_phone_count sensor.
BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST(analyzed_table.`target_column` AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT match(
toString({{ lib.render_target_column('analyzed_table') }}),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT match(
toString(analyzed_table."target_column"),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }}
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column"
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN
CAST(analyzed_table."target_column" AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT
analyzed_table.`target_column` RLIKE '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table.`target_column`, '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
COALESCE(SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
WHERE {{ lib.render_target_column('original_table') }} IS NOT NULL
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
COALESCE(SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM(
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "<target_table>" original_table
WHERE original_table."target_column" IS NOT NULL
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace({{ lib.render_target_column('analyzed_table') }}, '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace(analyzed_table."target_column", '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE WHEN NOT ({{ lib.render_target_column('analyzed_table') }} REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE WHEN NOT (analyzed_table."target_column" REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.[country] AS grouping_level_1,
analyzed_table.[state] AS grouping_level_2
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
GROUP BY analyzed_table.[country], analyzed_table.[state]
ORDER BY level_1, level_2
,
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} IS NOT NULL AND
REGEXP_SUBSTR(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" IS NOT NULL AND
REGEXP_SUBSTR(
CAST(analyzed_table."target_column" AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
daily invalid usa phone format found
Check description
Verifies that the number of invalid USA phone numbers in a text column does not exceed the maximum accepted count.
Data quality check name | Friendly name | Category | Check type | Time scale | Quality dimension | Sensor definition | Quality rule | Standard |
---|---|---|---|---|---|---|---|---|
daily_invalid_usa_phone_format_found |
Maximum count of rows containing invalid USA phone number values | patterns | monitoring | daily | Validity | invalid_usa_phone_count | max_count |
Command-line examples
Please expand the section below to see the DQOps command-line examples to run or activate the daily invalid usa phone format found data quality check.
Managing daily invalid usa phone format found check from DQOps shell
Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the warning rule with the default parameters.
dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=daily_invalid_usa_phone_format_found --enable-warning
You can also use patterns to activate the check on all matching tables and columns.
dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_invalid_usa_phone_format_found --enable-warning
Additional rule parameters are passed using the -Wrule_parameter_name=value.
Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the error rule with the default parameters.
dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=daily_invalid_usa_phone_format_found --enable-error
You can also use patterns to activate the check on all matching tables and columns.
dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_invalid_usa_phone_format_found --enable-error
Additional rule parameters are passed using the -Erule_parameter_name=value.
Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the daily_invalid_usa_phone_format_found check on all tables and columns on a single data source.
It is also possible to run this check on a specific connection and table. In order to do this, use the connection name and the full table name parameters.
dqo> check run -c=connection_name -t=schema_name.table_name -ch=daily_invalid_usa_phone_format_found
You can also run this check on all tables (and columns) on which the daily_invalid_usa_phone_format_found check is enabled using patterns to find tables.
YAML configuration
The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
columns:
target_column:
monitoring_checks:
daily:
patterns:
daily_invalid_usa_phone_format_found:
warning:
max_count: 0
error:
max_count: 10
fatal:
max_count: 100
labels:
- This is the column that is analyzed for data quality issues
Samples of generated SQL queries for each data source type
Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the invalid_usa_phone_count data quality sensor.
BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST(analyzed_table.`target_column` AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT match(
toString({{ lib.render_target_column('analyzed_table') }}),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
FROM `<target_schema>`.`<target_table>` AS analyzed_table
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
FROM (
SELECT
original_table.*
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }}
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN
CAST(analyzed_table."target_column" AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
FROM (
SELECT
original_table.*
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
FROM (
SELECT
original_table.*
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
FROM (
SELECT
original_table.*
FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
COALESCE(SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
WHERE {{ lib.render_target_column('original_table') }} IS NOT NULL
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
COALESCE(SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
FROM(
SELECT
original_table.*
FROM "<target_table>" original_table
WHERE original_table."target_column" IS NOT NULL
) analyzed_table
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace({{ lib.render_target_column('analyzed_table') }}, '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace(analyzed_table."target_column", '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE WHEN NOT ({{ lib.render_target_column('analyzed_table') }} REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE WHEN NOT (analyzed_table."target_column" REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
FROM `<target_schema>`.`<target_table>` AS analyzed_table
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} IS NOT NULL AND
REGEXP_SUBSTR(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" IS NOT NULL AND
REGEXP_SUBSTR(
CAST(analyzed_table."target_column" AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
FROM "<target_schema>"."<target_table>" AS analyzed_table
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
FROM (
SELECT
original_table.*
FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table
Expand the Configure with data grouping section to see additional examples for configuring this data quality checks to use data grouping (GROUP BY).
Configuration with data grouping
Sample configuration with data grouping enabled (YAML) The sample below shows how to configure the data grouping and how it affects the generated SQL query.
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
default_grouping_name: group_by_country_and_state
groupings:
group_by_country_and_state:
level_1:
source: column_value
column: country
level_2:
source: column_value
column: state
columns:
target_column:
monitoring_checks:
daily:
patterns:
daily_invalid_usa_phone_format_found:
warning:
max_count: 0
error:
max_count: 10
fatal:
max_count: 100
labels:
- This is the column that is analyzed for data quality issues
country:
labels:
- column used as the first grouping key
state:
labels:
- column used as the second grouping key
Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the invalid_usa_phone_count sensor.
BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST(analyzed_table.`target_column` AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT match(
toString({{ lib.render_target_column('analyzed_table') }}),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT match(
toString(analyzed_table."target_column"),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }}
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column"
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN
CAST(analyzed_table."target_column" AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT
analyzed_table.`target_column` RLIKE '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table.`target_column`, '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
COALESCE(SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
WHERE {{ lib.render_target_column('original_table') }} IS NOT NULL
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
COALESCE(SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM(
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "<target_table>" original_table
WHERE original_table."target_column" IS NOT NULL
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace({{ lib.render_target_column('analyzed_table') }}, '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace(analyzed_table."target_column", '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE WHEN NOT ({{ lib.render_target_column('analyzed_table') }} REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE WHEN NOT (analyzed_table."target_column" REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.[country] AS grouping_level_1,
analyzed_table.[state] AS grouping_level_2
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
GROUP BY analyzed_table.[country], analyzed_table.[state]
ORDER BY level_1, level_2
,
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} IS NOT NULL AND
REGEXP_SUBSTR(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" IS NOT NULL AND
REGEXP_SUBSTR(
CAST(analyzed_table."target_column" AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
monthly invalid usa phone format found
Check description
Verifies that the number of invalid USA phone numbers in a text column does not exceed the maximum accepted count.
Data quality check name | Friendly name | Category | Check type | Time scale | Quality dimension | Sensor definition | Quality rule | Standard |
---|---|---|---|---|---|---|---|---|
monthly_invalid_usa_phone_format_found |
Maximum count of rows containing invalid USA phone number values | patterns | monitoring | monthly | Validity | invalid_usa_phone_count | max_count |
Command-line examples
Please expand the section below to see the DQOps command-line examples to run or activate the monthly invalid usa phone format found data quality check.
Managing monthly invalid usa phone format found check from DQOps shell
Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the warning rule with the default parameters.
dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=monthly_invalid_usa_phone_format_found --enable-warning
You can also use patterns to activate the check on all matching tables and columns.
dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_invalid_usa_phone_format_found --enable-warning
Additional rule parameters are passed using the -Wrule_parameter_name=value.
Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the error rule with the default parameters.
dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=monthly_invalid_usa_phone_format_found --enable-error
You can also use patterns to activate the check on all matching tables and columns.
dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_invalid_usa_phone_format_found --enable-error
Additional rule parameters are passed using the -Erule_parameter_name=value.
Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the monthly_invalid_usa_phone_format_found check on all tables and columns on a single data source.
It is also possible to run this check on a specific connection and table. In order to do this, use the connection name and the full table name parameters.
dqo> check run -c=connection_name -t=schema_name.table_name -ch=monthly_invalid_usa_phone_format_found
You can also run this check on all tables (and columns) on which the monthly_invalid_usa_phone_format_found check is enabled using patterns to find tables.
YAML configuration
The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
columns:
target_column:
monitoring_checks:
monthly:
patterns:
monthly_invalid_usa_phone_format_found:
warning:
max_count: 0
error:
max_count: 10
fatal:
max_count: 100
labels:
- This is the column that is analyzed for data quality issues
Samples of generated SQL queries for each data source type
Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the invalid_usa_phone_count data quality sensor.
BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST(analyzed_table.`target_column` AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT match(
toString({{ lib.render_target_column('analyzed_table') }}),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
FROM `<target_schema>`.`<target_table>` AS analyzed_table
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
FROM (
SELECT
original_table.*
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }}
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN
CAST(analyzed_table."target_column" AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
FROM (
SELECT
original_table.*
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
FROM (
SELECT
original_table.*
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
FROM (
SELECT
original_table.*
FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
COALESCE(SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
WHERE {{ lib.render_target_column('original_table') }} IS NOT NULL
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
COALESCE(SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
FROM(
SELECT
original_table.*
FROM "<target_table>" original_table
WHERE original_table."target_column" IS NOT NULL
) analyzed_table
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace({{ lib.render_target_column('analyzed_table') }}, '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace(analyzed_table."target_column", '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE WHEN NOT ({{ lib.render_target_column('analyzed_table') }} REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE WHEN NOT (analyzed_table."target_column" REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
FROM `<target_schema>`.`<target_table>` AS analyzed_table
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} IS NOT NULL AND
REGEXP_SUBSTR(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" IS NOT NULL AND
REGEXP_SUBSTR(
CAST(analyzed_table."target_column" AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
FROM "<target_schema>"."<target_table>" AS analyzed_table
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
FROM (
SELECT
original_table.*
FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table
Expand the Configure with data grouping section to see additional examples for configuring this data quality checks to use data grouping (GROUP BY).
Configuration with data grouping
Sample configuration with data grouping enabled (YAML) The sample below shows how to configure the data grouping and how it affects the generated SQL query.
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
default_grouping_name: group_by_country_and_state
groupings:
group_by_country_and_state:
level_1:
source: column_value
column: country
level_2:
source: column_value
column: state
columns:
target_column:
monitoring_checks:
monthly:
patterns:
monthly_invalid_usa_phone_format_found:
warning:
max_count: 0
error:
max_count: 10
fatal:
max_count: 100
labels:
- This is the column that is analyzed for data quality issues
country:
labels:
- column used as the first grouping key
state:
labels:
- column used as the second grouping key
Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the invalid_usa_phone_count sensor.
BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST(analyzed_table.`target_column` AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT match(
toString({{ lib.render_target_column('analyzed_table') }}),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT match(
toString(analyzed_table."target_column"),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }}
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column"
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN
CAST(analyzed_table."target_column" AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT
analyzed_table.`target_column` RLIKE '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table.`target_column`, '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
COALESCE(SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
WHERE {{ lib.render_target_column('original_table') }} IS NOT NULL
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
COALESCE(SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM(
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "<target_table>" original_table
WHERE original_table."target_column" IS NOT NULL
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace({{ lib.render_target_column('analyzed_table') }}, '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace(analyzed_table."target_column", '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE WHEN NOT ({{ lib.render_target_column('analyzed_table') }} REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE WHEN NOT (analyzed_table."target_column" REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.[country] AS grouping_level_1,
analyzed_table.[state] AS grouping_level_2
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
GROUP BY analyzed_table.[country], analyzed_table.[state]
ORDER BY level_1, level_2
,
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} IS NOT NULL AND
REGEXP_SUBSTR(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" IS NOT NULL AND
REGEXP_SUBSTR(
CAST(analyzed_table."target_column" AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2
FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2
daily partition invalid usa phone format found
Check description
Verifies that the number of invalid USA phone numbers in a text column does not exceed the maximum accepted count.
Data quality check name | Friendly name | Category | Check type | Time scale | Quality dimension | Sensor definition | Quality rule | Standard |
---|---|---|---|---|---|---|---|---|
daily_partition_invalid_usa_phone_format_found |
Maximum count of rows containing invalid USA phone number values | patterns | partitioned | daily | Validity | invalid_usa_phone_count | max_count |
Command-line examples
Please expand the section below to see the DQOps command-line examples to run or activate the daily partition invalid usa phone format found data quality check.
Managing daily partition invalid usa phone format found check from DQOps shell
Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the warning rule with the default parameters.
dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=daily_partition_invalid_usa_phone_format_found --enable-warning
You can also use patterns to activate the check on all matching tables and columns.
dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_partition_invalid_usa_phone_format_found --enable-warning
Additional rule parameters are passed using the -Wrule_parameter_name=value.
Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the error rule with the default parameters.
dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=daily_partition_invalid_usa_phone_format_found --enable-error
You can also use patterns to activate the check on all matching tables and columns.
dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_partition_invalid_usa_phone_format_found --enable-error
Additional rule parameters are passed using the -Erule_parameter_name=value.
Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the daily_partition_invalid_usa_phone_format_found check on all tables and columns on a single data source.
It is also possible to run this check on a specific connection and table. In order to do this, use the connection name and the full table name parameters.
dqo> check run -c=connection_name -t=schema_name.table_name -ch=daily_partition_invalid_usa_phone_format_found
You can also run this check on all tables (and columns) on which the daily_partition_invalid_usa_phone_format_found check is enabled using patterns to find tables.
YAML configuration
The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
timestamp_columns:
partition_by_column: date_column
incremental_time_window:
daily_partitioning_recent_days: 7
monthly_partitioning_recent_months: 1
columns:
target_column:
partitioned_checks:
daily:
patterns:
daily_partition_invalid_usa_phone_format_found:
warning:
max_count: 0
error:
max_count: 10
fatal:
max_count: 100
labels:
- This is the column that is analyzed for data quality issues
date_column:
labels:
- "date or datetime column used as a daily or monthly partitioning key, dates\
\ (and times) are truncated to a day or a month by the sensor's query for\
\ partitioned checks"
Samples of generated SQL queries for each data source type
Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the invalid_usa_phone_count data quality sensor.
BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST(analyzed_table.`target_column` AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
CAST(analyzed_table.`date_column` AS DATE) AS time_period,
TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT match(
toString({{ lib.render_target_column('analyzed_table') }}),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT match(
toString(analyzed_table."target_column"),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
CAST(analyzed_table."date_column" AS DATE) AS time_period,
toDateTime64(CAST(analyzed_table."date_column" AS DATE), 3) AS time_period_utc
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
CAST(analyzed_table.`date_column` AS DATE) AS time_period,
TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
CAST(original_table."date_column" AS DATE) AS time_period,
TIMESTAMP(CAST(original_table."date_column" AS DATE)) AS time_period_utc
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }}
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column"
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
CAST(analyzed_table."date_column" AS date) AS time_period,
CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN
CAST(analyzed_table."target_column" AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
CAST(original_table."date_column" AS DATE) AS time_period,
TO_TIMESTAMP(CAST(original_table."date_column" AS DATE)) AS time_period_utc
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT
analyzed_table.`target_column` RLIKE '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00') AS time_period,
FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00'))) AS time_period_utc
FROM `<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table.`target_column`, '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00') AS time_period,
FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00'))) AS time_period_utc
FROM `<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
TRUNC(CAST(original_table."date_column" AS DATE)) AS time_period,
CAST(TRUNC(CAST(original_table."date_column" AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
CAST(analyzed_table."date_column" AS date) AS time_period,
CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
CAST(original_table."date_column" AS date) AS time_period,
CAST(CAST(original_table."date_column" AS date) AS TIMESTAMP) AS time_period_utc
FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
COALESCE(SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
WHERE {{ lib.render_target_column('original_table') }} IS NOT NULL
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
COALESCE(SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value,
time_period,
time_period_utc
FROM(
SELECT
original_table.*,
CAST(DATE_TRUNC('day', original_table."date_column") AS DATE) AS time_period,
CAST((CAST(DATE_TRUNC('day', original_table."date_column") AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "<target_table>" original_table
WHERE original_table."target_column" IS NOT NULL
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace({{ lib.render_target_column('analyzed_table') }}, '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace(analyzed_table."target_column", '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value,
CAST(analyzed_table."date_column" AS date) AS time_period,
CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE WHEN NOT ({{ lib.render_target_column('analyzed_table') }} REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE WHEN NOT (analyzed_table."target_column" REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
CAST(analyzed_table."date_column" AS date) AS time_period,
TO_TIMESTAMP(CAST(analyzed_table."date_column" AS date)) AS time_period_utc
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
CAST(analyzed_table.`date_column` AS DATE) AS time_period,
TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value,
CAST(analyzed_table.[date_column] AS date) AS time_period,
CAST((CAST(analyzed_table.[date_column] AS date)) AS DATETIME) AS time_period_utc
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
GROUP BY CAST(analyzed_table.[date_column] AS date), CAST(analyzed_table.[date_column] AS date)
ORDER BY CAST(analyzed_table.[date_column] AS date)
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} IS NOT NULL AND
REGEXP_SUBSTR(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" IS NOT NULL AND
REGEXP_SUBSTR(
CAST(analyzed_table."target_column" AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value,
CAST(analyzed_table."date_column" AS DATE) AS time_period,
CAST(CAST(analyzed_table."date_column" AS DATE) AS TIMESTAMP) AS time_period_utc
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
CAST(original_table."date_column" AS date) AS time_period,
CAST(CAST(original_table."date_column" AS date) AS TIMESTAMP) AS time_period_utc
FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Expand the Configure with data grouping section to see additional examples for configuring this data quality checks to use data grouping (GROUP BY).
Configuration with data grouping
Sample configuration with data grouping enabled (YAML) The sample below shows how to configure the data grouping and how it affects the generated SQL query.
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
timestamp_columns:
partition_by_column: date_column
incremental_time_window:
daily_partitioning_recent_days: 7
monthly_partitioning_recent_months: 1
default_grouping_name: group_by_country_and_state
groupings:
group_by_country_and_state:
level_1:
source: column_value
column: country
level_2:
source: column_value
column: state
columns:
target_column:
partitioned_checks:
daily:
patterns:
daily_partition_invalid_usa_phone_format_found:
warning:
max_count: 0
error:
max_count: 10
fatal:
max_count: 100
labels:
- This is the column that is analyzed for data quality issues
date_column:
labels:
- "date or datetime column used as a daily or monthly partitioning key, dates\
\ (and times) are truncated to a day or a month by the sensor's query for\
\ partitioned checks"
country:
labels:
- column used as the first grouping key
state:
labels:
- column used as the second grouping key
Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the invalid_usa_phone_count sensor.
BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST(analyzed_table.`target_column` AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2,
CAST(analyzed_table.`date_column` AS DATE) AS time_period,
TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT match(
toString({{ lib.render_target_column('analyzed_table') }}),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT match(
toString(analyzed_table."target_column"),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2,
CAST(analyzed_table."date_column" AS DATE) AS time_period,
toDateTime64(CAST(analyzed_table."date_column" AS DATE), 3) AS time_period_utc
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2,
CAST(analyzed_table.`date_column` AS DATE) AS time_period,
TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2,
CAST(original_table."date_column" AS DATE) AS time_period,
TIMESTAMP(CAST(original_table."date_column" AS DATE)) AS time_period_utc
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }}
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column"
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2,
CAST(analyzed_table."date_column" AS date) AS time_period,
CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN
CAST(analyzed_table."target_column" AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2,
CAST(original_table."date_column" AS DATE) AS time_period,
TO_TIMESTAMP(CAST(original_table."date_column" AS DATE)) AS time_period_utc
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT
analyzed_table.`target_column` RLIKE '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2,
DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00') AS time_period,
FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00'))) AS time_period_utc
FROM `<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table.`target_column`, '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2,
DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00') AS time_period,
FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00'))) AS time_period_utc
FROM `<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2,
TRUNC(CAST(original_table."date_column" AS DATE)) AS time_period,
CAST(TRUNC(CAST(original_table."date_column" AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2,
CAST(analyzed_table."date_column" AS date) AS time_period,
CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2,
CAST(original_table."date_column" AS date) AS time_period,
CAST(CAST(original_table."date_column" AS date) AS TIMESTAMP) AS time_period_utc
FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
COALESCE(SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
WHERE {{ lib.render_target_column('original_table') }} IS NOT NULL
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
COALESCE(SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2,
time_period,
time_period_utc
FROM(
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2,
CAST(DATE_TRUNC('day', original_table."date_column") AS DATE) AS time_period,
CAST((CAST(DATE_TRUNC('day', original_table."date_column") AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "<target_table>" original_table
WHERE original_table."target_column" IS NOT NULL
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace({{ lib.render_target_column('analyzed_table') }}, '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace(analyzed_table."target_column", '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2,
CAST(analyzed_table."date_column" AS date) AS time_period,
CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE WHEN NOT ({{ lib.render_target_column('analyzed_table') }} REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE WHEN NOT (analyzed_table."target_column" REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2,
CAST(analyzed_table."date_column" AS date) AS time_period,
TO_TIMESTAMP(CAST(analyzed_table."date_column" AS date)) AS time_period_utc
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2,
CAST(analyzed_table.`date_column` AS DATE) AS time_period,
TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.[country] AS grouping_level_1,
analyzed_table.[state] AS grouping_level_2,
CAST(analyzed_table.[date_column] AS date) AS time_period,
CAST((CAST(analyzed_table.[date_column] AS date)) AS DATETIME) AS time_period_utc
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
GROUP BY analyzed_table.[country], analyzed_table.[state], CAST(analyzed_table.[date_column] AS date), CAST(analyzed_table.[date_column] AS date)
ORDER BY level_1, level_2CAST(analyzed_table.[date_column] AS date)
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} IS NOT NULL AND
REGEXP_SUBSTR(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" IS NOT NULL AND
REGEXP_SUBSTR(
CAST(analyzed_table."target_column" AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2,
CAST(analyzed_table."date_column" AS DATE) AS time_period,
CAST(CAST(analyzed_table."date_column" AS DATE) AS TIMESTAMP) AS time_period_utc
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2,
CAST(original_table."date_column" AS date) AS time_period,
CAST(CAST(original_table."date_column" AS date) AS TIMESTAMP) AS time_period_utc
FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
monthly partition invalid usa phone format found
Check description
Verifies that the number of invalid USA phone numbers in a text column does not exceed the maximum accepted count.
Data quality check name | Friendly name | Category | Check type | Time scale | Quality dimension | Sensor definition | Quality rule | Standard |
---|---|---|---|---|---|---|---|---|
monthly_partition_invalid_usa_phone_format_found |
Maximum count of rows containing invalid USA phone number values | patterns | partitioned | monthly | Validity | invalid_usa_phone_count | max_count |
Command-line examples
Please expand the section below to see the DQOps command-line examples to run or activate the monthly partition invalid usa phone format found data quality check.
Managing monthly partition invalid usa phone format found check from DQOps shell
Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the warning rule with the default parameters.
dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=monthly_partition_invalid_usa_phone_format_found --enable-warning
You can also use patterns to activate the check on all matching tables and columns.
dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_partition_invalid_usa_phone_format_found --enable-warning
Additional rule parameters are passed using the -Wrule_parameter_name=value.
Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the error rule with the default parameters.
dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=monthly_partition_invalid_usa_phone_format_found --enable-error
You can also use patterns to activate the check on all matching tables and columns.
dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_partition_invalid_usa_phone_format_found --enable-error
Additional rule parameters are passed using the -Erule_parameter_name=value.
Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the monthly_partition_invalid_usa_phone_format_found check on all tables and columns on a single data source.
It is also possible to run this check on a specific connection and table. In order to do this, use the connection name and the full table name parameters.
dqo> check run -c=connection_name -t=schema_name.table_name -ch=monthly_partition_invalid_usa_phone_format_found
You can also run this check on all tables (and columns) on which the monthly_partition_invalid_usa_phone_format_found check is enabled using patterns to find tables.
YAML configuration
The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
timestamp_columns:
partition_by_column: date_column
incremental_time_window:
daily_partitioning_recent_days: 7
monthly_partitioning_recent_months: 1
columns:
target_column:
partitioned_checks:
monthly:
patterns:
monthly_partition_invalid_usa_phone_format_found:
warning:
max_count: 0
error:
max_count: 10
fatal:
max_count: 100
labels:
- This is the column that is analyzed for data quality issues
date_column:
labels:
- "date or datetime column used as a daily or monthly partitioning key, dates\
\ (and times) are truncated to a day or a month by the sensor's query for\
\ partitioned checks"
Samples of generated SQL queries for each data source type
Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the invalid_usa_phone_count data quality sensor.
BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST(analyzed_table.`target_column` AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
DATE_TRUNC(CAST(analyzed_table.`date_column` AS DATE), MONTH) AS time_period,
TIMESTAMP(DATE_TRUNC(CAST(analyzed_table.`date_column` AS DATE), MONTH)) AS time_period_utc
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT match(
toString({{ lib.render_target_column('analyzed_table') }}),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT match(
toString(analyzed_table."target_column"),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
DATE_TRUNC('month', CAST(analyzed_table."date_column" AS DATE)) AS time_period,
toDateTime64(DATE_TRUNC('month', CAST(analyzed_table."date_column" AS DATE)), 3) AS time_period_utc
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE)) AS time_period,
TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE))) AS time_period_utc
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
DATE_TRUNC('MONTH', CAST(original_table."date_column" AS DATE)) AS time_period,
TIMESTAMP(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS DATE))) AS time_period_utc
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }}
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column"
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN
CAST(analyzed_table."target_column" AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
SERIES_ROUND(CAST(original_table."date_column" AS DATE), 'INTERVAL 1 MONTH', ROUND_DOWN) AS time_period,
TO_TIMESTAMP(SERIES_ROUND(CAST(original_table."date_column" AS DATE), 'INTERVAL 1 MONTH', ROUND_DOWN)) AS time_period_utc
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT
analyzed_table.`target_column` RLIKE '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00') AS time_period,
FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00'))) AS time_period_utc
FROM `<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table.`target_column`, '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00') AS time_period,
FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00'))) AS time_period_utc
FROM `<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
TRUNC(CAST(original_table."date_column" AS DATE), 'MONTH') AS time_period,
CAST(TRUNC(CAST(original_table."date_column" AS DATE), 'MONTH') AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS time_period,
CAST(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS TIMESTAMP) AS time_period_utc
FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
COALESCE(SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
WHERE {{ lib.render_target_column('original_table') }} IS NOT NULL
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
COALESCE(SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value,
time_period,
time_period_utc
FROM(
SELECT
original_table.*,
CAST(DATE_TRUNC('month', original_table."date_column") AS DATE) AS time_period,
CAST((CAST(DATE_TRUNC('month', original_table."date_column") AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "<target_table>" original_table
WHERE original_table."target_column" IS NOT NULL
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace({{ lib.render_target_column('analyzed_table') }}, '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace(analyzed_table."target_column", '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value,
DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE WHEN NOT ({{ lib.render_target_column('analyzed_table') }} REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE WHEN NOT (analyzed_table."target_column" REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
TO_TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS time_period_utc
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE)) AS time_period,
TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE))) AS time_period_utc
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value,
DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1) AS time_period,
CAST((DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1)) AS DATETIME) AS time_period_utc
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
GROUP BY DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1), DATEADD(month, DATEDIFF(month, 0, analyzed_table.[date_column]), 0)
ORDER BY DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1)
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} IS NOT NULL AND
REGEXP_SUBSTR(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" IS NOT NULL AND
REGEXP_SUBSTR(
CAST(analyzed_table."target_column" AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value,
TRUNC(CAST(analyzed_table."date_column" AS DATE), 'MM') AS time_period,
CAST(TRUNC(CAST(analyzed_table."date_column" AS DATE), 'MM') AS TIMESTAMP) AS time_period_utc
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS time_period,
CAST(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS TIMESTAMP) AS time_period_utc
FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc
Expand the Configure with data grouping section to see additional examples for configuring this data quality checks to use data grouping (GROUP BY).
Configuration with data grouping
Sample configuration with data grouping enabled (YAML) The sample below shows how to configure the data grouping and how it affects the generated SQL query.
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
timestamp_columns:
partition_by_column: date_column
incremental_time_window:
daily_partitioning_recent_days: 7
monthly_partitioning_recent_months: 1
default_grouping_name: group_by_country_and_state
groupings:
group_by_country_and_state:
level_1:
source: column_value
column: country
level_2:
source: column_value
column: state
columns:
target_column:
partitioned_checks:
monthly:
patterns:
monthly_partition_invalid_usa_phone_format_found:
warning:
max_count: 0
error:
max_count: 10
fatal:
max_count: 100
labels:
- This is the column that is analyzed for data quality issues
date_column:
labels:
- "date or datetime column used as a daily or monthly partitioning key, dates\
\ (and times) are truncated to a day or a month by the sensor's query for\
\ partitioned checks"
country:
labels:
- column used as the first grouping key
state:
labels:
- column used as the second grouping key
Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the invalid_usa_phone_count sensor.
BigQuery
{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_CONTAINS(
CAST(analyzed_table.`target_column` AS STRING),
r"^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2,
DATE_TRUNC(CAST(analyzed_table.`date_column` AS DATE), MONTH) AS time_period,
TIMESTAMP(DATE_TRUNC(CAST(analyzed_table.`date_column` AS DATE), MONTH)) AS time_period_utc
FROM `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ClickHouse
{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT match(
toString({{ lib.render_target_column('analyzed_table') }}),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT match(
toString(analyzed_table."target_column"),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2,
DATE_TRUNC('month', CAST(analyzed_table."date_column" AS DATE)) AS time_period,
toDateTime64(DATE_TRUNC('month', CAST(analyzed_table."date_column" AS DATE)), 3) AS time_period_utc
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Databricks
{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2,
DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE)) AS time_period,
TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE))) AS time_period_utc
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
DB2
{% import '/dialects/db2.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2,
DATE_TRUNC('MONTH', CAST(original_table."date_column" AS DATE)) AS time_period,
TIMESTAMP(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS DATE))) AS time_period_utc
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
DuckDB
{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }}
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column"
!~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2,
DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
HANA
{% import '/dialects/hana.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN
CAST(analyzed_table."target_column" AS VARCHAR) NOT LIKE_REGEXPR
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2,
SERIES_ROUND(CAST(original_table."date_column" AS DATE), 'INTERVAL 1 MONTH', ROUND_DOWN) AS time_period,
TO_TIMESTAMP(SERIES_ROUND(CAST(original_table."date_column" AS DATE), 'INTERVAL 1 MONTH', ROUND_DOWN)) AS time_period_utc
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
MariaDB
{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT
analyzed_table.`target_column` RLIKE '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2,
DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00') AS time_period,
FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00'))) AS time_period_utc
FROM `<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
MySQL
{% import '/dialects/mysql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_regex(lib.render_target_column('analyzed_table'), '^((((\\\\(\\\\+1\\\\)|(\\\\+1)|(\\\\([0][0][1]\\\\)|([0][0][1]))|\\\\(1\\\\)|(1))[\\\\s.-]?)?(\\\\(?[0-9]{3}\\\\)?[\\\\s.-]?)([0-9]{3}[\\\\s.-]?)([0-9]{4})))$') }}
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table.`target_column`, '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?[0-9]{3}\\)?[\\s.-]?)([0-9]{3}[\\s.-]?)([0-9]{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2,
DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00') AS time_period,
FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00'))) AS time_period_utc
FROM `<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Oracle
{% import '/dialects/oracle.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE({{ lib.render_target_column('analyzed_table') }},
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_LIKE(analyzed_table."target_column",
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[[:space:].-]?)?(\(?\d{3}\)?[[:space:].-]?)(\d{3}[[:space:].-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2,
TRUNC(CAST(original_table."date_column" AS DATE), 'MONTH') AS time_period,
CAST(TRUNC(CAST(original_table."date_column" AS DATE), 'MONTH') AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
PostgreSQL
{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2,
DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Presto
{% import '/dialects/presto.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2,
DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS time_period,
CAST(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS TIMESTAMP) AS time_period_utc
FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
QuestDB
{% import '/dialects/questdb.sql.jinja2' as lib with context -%}
SELECT
COALESCE(SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
WHERE {{ lib.render_target_column('original_table') }} IS NOT NULL
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
COALESCE(SUM(
CASE
WHEN analyzed_table."target_column" !~ '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
THEN 1
ELSE 0
END
), 0) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2,
time_period,
time_period_utc
FROM(
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2,
CAST(DATE_TRUNC('month', original_table."date_column") AS DATE) AS time_period,
CAST((CAST(DATE_TRUNC('month', original_table."date_column") AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "<target_table>" original_table
WHERE original_table."target_column" IS NOT NULL
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Redshift
{% import '/dialects/redshift.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace({{ lib.render_target_column('analyzed_table') }}, '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP_SUBSTR(replace(replace(replace(analyzed_table."target_column", '(', ''), ')', ''), '-', ''), '^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$') IS NOT NULL
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2,
DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
FROM "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Snowflake
{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE WHEN NOT ({{ lib.render_target_column('analyzed_table') }} REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE WHEN NOT (analyzed_table."target_column" REGEXP '^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$')
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2,
DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
TO_TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS time_period_utc
FROM "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Spark
{% import '/dialects/spark.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST({{ lib.render_target_column('analyzed_table') }} AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT REGEXP(
CAST(analyzed_table.`target_column` AS STRING),
"^((((\\(\\+1\\)|(\\+1)|(\\([0][0][1]\\)|([0][0][1]))|\\(1\\)|(1))[\\s.-]?)?(\\(?\\d{3}\\)?[\\s.-]?)(\\d{3}[\\s.-]?)(\\d{4})))$"
) THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.`country` AS grouping_level_1,
analyzed_table.`state` AS grouping_level_2,
DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE)) AS time_period,
TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE))) AS time_period_utc
FROM `<target_schema>`.`<target_table>` AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
SQL Server
{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT {{ lib.render_target_column('analyzed_table') }} LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][-.][0-9][0-9][0-9][-.][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '1 [0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '+1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(+1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '(1)%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
AND NOT analyzed_table.[target_column] LIKE '([0-9][0-9][0-9])-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table.[country] AS grouping_level_1,
analyzed_table.[state] AS grouping_level_2,
DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1) AS time_period,
CAST((DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1)) AS DATETIME) AS time_period_utc
FROM [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
GROUP BY analyzed_table.[country], analyzed_table.[state], DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1), DATEADD(month, DATEDIFF(month, 0, analyzed_table.[date_column]), 0)
ORDER BY level_1, level_2DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1)
Teradata
{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
SELECT
SUM(
CASE
WHEN {{ lib.render_target_column('analyzed_table') }} IS NOT NULL AND
REGEXP_SUBSTR(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value
{{- lib.render_data_grouping_projections('analyzed_table') }}
{{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
SUM(
CASE
WHEN analyzed_table."target_column" IS NOT NULL AND
REGEXP_SUBSTR(
CAST(analyzed_table."target_column" AS VARCHAR(4096)),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) IS NULL THEN 1
ELSE 0
END
) AS actual_value,
analyzed_table."country" AS grouping_level_1,
analyzed_table."state" AS grouping_level_2,
TRUNC(CAST(analyzed_table."date_column" AS DATE), 'MM') AS time_period,
CAST(TRUNC(CAST(analyzed_table."date_column" AS DATE), 'MM') AS TIMESTAMP) AS time_period_utc
FROM "<target_schema>"."<target_table>" AS analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
Trino
{% import '/dialects/trino.sql.jinja2' as lib with context -%}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST({{ lib.render_target_column('analyzed_table') }} AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value
{{- lib.render_data_grouping_projections_reference('analyzed_table') }}
{{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM (
SELECT
original_table.*
{{- lib.render_data_grouping_projections('original_table') }}
{{- lib.render_time_dimension_projection('original_table') }}
FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{{- lib.render_where_clause() -}}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}
SELECT
CAST(SUM(
CASE
WHEN NOT REGEXP_LIKE(
CAST(analyzed_table."target_column" AS VARCHAR),
'^((((\(\+1\)|(\+1)|(\([0][0][1]\)|([0][0][1]))|\(1/)|(1))[\s.-]?)?(\(?\d{3}\)?[\s.-]?)(\d{3}[\s.-]?)(\d{4})))$'
) THEN 1
ELSE 0
END
) AS DOUBLE) AS actual_value,
analyzed_table.grouping_level_1,
analyzed_table.grouping_level_2
,
time_period,
time_period_utc
FROM (
SELECT
original_table.*,
original_table."country" AS grouping_level_1,
original_table."state" AS grouping_level_2,
DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS time_period,
CAST(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS TIMESTAMP) AS time_period_utc
FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
) analyzed_table
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc
What's next
- Learn how to configure data quality checks in DQOps
- Look at the examples of running data quality checks, targeting tables and columns