Last updated: July 22, 2025

Expected texts in top values count data quality checks, SQL examples

A column-level check that counts how many expected text values are among the TOP most popular values in the column. The check will first count the number of occurrences of each column's value and will pick the TOP X most popular values (configurable by the 'top' parameter). Then, it will compare the list of most popular values to the given list of expected values that should be most popular. This check will verify how many supposed most popular values (provided in the 'expected_values' list) were not found in the top X most popular values in the column. This check is helpful in analyzing string columns with frequently occurring values, such as country codes for countries with the most customers.

The expected texts in top values count data quality check has the following variants for each type of data quality checks supported by DQOps.

profile expected texts in top values count

Check description

Verifies that the top X most popular column values contain all values from a list of expected values.

Data quality check name	Friendly name	Category	Check type	Time scale	Quality dimension	Sensor definition	Quality rule	Standard
`profile_expected_texts_in_top_values_count`	Verify that the most popular text values match the list of expected values	accepted_values	profiling		Reasonableness	expected_texts_in_top_values_count	max_missing

Command-line examples

Please expand the section below to see the DQOps command-line examples to run or activate the profile expected texts in top values count data quality check.

Managing profile expected texts in top values count check from DQOps shell

Activate the check with a warning ruleActivate the check with an error ruleRun all configured checks

Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the warning rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=profile_expected_texts_in_top_values_count --enable-warning

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=profile_expected_texts_in_top_values_count --enable-warning

Additional rule parameters are passed using the -Wrule_parameter_name=value.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=profile_expected_texts_in_top_values_count --enable-warning
                    -Wmax_missing=value

Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the error rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=profile_expected_texts_in_top_values_count --enable-error

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=profile_expected_texts_in_top_values_count --enable-error

Additional rule parameters are passed using the -Erule_parameter_name=value.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=profile_expected_texts_in_top_values_count --enable-error
                    -Emax_missing=value

Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the profile_expected_texts_in_top_values_count check on all tables and columns on a single data source.

dqo> check run -c=data_source_name -ch=profile_expected_texts_in_top_values_count

It is also possible to run this check on a specific connection and table. In order to do this, use the connection name and the full table name parameters.

dqo> check run -c=connection_name -t=schema_name.table_name -ch=profile_expected_texts_in_top_values_count

You can also run this check on all tables (and columns) on which the profile_expected_texts_in_top_values_count check is enabled using patterns to find tables.

dqo> check run -c=connection_name -t=schema_prefix*.fact_* -col=column_name_* -ch=profile_expected_texts_in_top_values_count

YAML configuration

The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  columns:
    target_column:
      profiling_checks:
        accepted_values:
          profile_expected_texts_in_top_values_count:
            parameters:
              expected_values:
              - USD
              - GBP
              - EUR
              top: 3
            warning:
              max_missing: 0
            error:
              max_missing: 1
            fatal:
              max_missing: 2
      labels:
      - This is the column that is analyzed for data quality issues

Samples of generated SQL queries for each data source type

Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the expected_texts_in_top_values_count data quality sensor.

BigQuery

Sensor template for BigQueryRendered SQL for BigQuery

{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

ClickHouse

Sensor template for ClickHouseRendered SQL for ClickHouse

{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Databricks

Sensor template for DatabricksRendered SQL for Databricks

{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

DB2

Sensor template for DB2Rendered SQL for DB2

{% import '/dialects/db2.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) AS top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        RANK() OVER(
            ORDER BY top_col_values.total_values DESC) AS top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

DuckDB

Sensor template for DuckDBRendered SQL for DuckDB

{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
             AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

HANA

Sensor template for HANARendered SQL for HANA

{% import '/dialects/hana.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

MariaDB

Sensor template for MariaDBRendered SQL for MariaDB

{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

MySQL

Sensor template for MySQLRendered SQL for MySQL

{% import '/dialects/mysql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Oracle

Sensor template for OracleRendered SQL for Oracle

{% import '/dialects/oracle.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} top_value,
            COUNT(*) total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM
        (
            SELECT
            additional_table.*,
            {{ lib.render_target_column('additional_table') }} top_value
            {{- lib.render_data_grouping_projections('additional_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('additional_table', indentation = '            ') }}
            FROM {{ lib.render_target_table() }} additional_table) analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL actual_value,
    MAX(0) expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
    {{- lib.render_where_clause(table_alias_prefix='original_table') }}) analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX({{ parameters.expected_values | length }}) expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX(3) expected_value
FROM
(
    SELECT
        top_col_values.top_value top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" top_value,
            COUNT(*) total_values
        FROM
        (
            SELECT
            additional_table.*,
            additional_table."target_column" top_value
            FROM "<target_schema>"."<target_table>" additional_table) analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= 3

PostgreSQL

Sensor template for PostgreSQLRendered SQL for PostgreSQL

{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Presto

Sensor template for PrestoRendered SQL for Presto

{% import '/dialects/presto.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value
                FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

QuestDB

Sensor template for QuestDBRendered SQL for QuestDB

{% import '/dialects/questdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT() AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{%- else %}
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT() AS total_values
        FROM
            "<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Redshift

Sensor template for RedshiftRendered SQL for Redshift

{% import '/dialects/redshift.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Snowflake

Sensor template for SnowflakeRendered SQL for Snowflake

{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Spark

Sensor template for SparkRendered SQL for Spark

{% import '/dialects/spark.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

SQL Server

Sensor template for SQL ServerRendered SQL for SQL Server

{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT_BIG(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} {{ lib.render_target_column('analyzed_table') }}
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{% if lib.time_series is not none -%}
GROUP BY time_period, time_period_utc
{%- endif -%}
{%- if (lib.data_groupings is not none and (lib.data_groupings | length) > 0) -%}
    {% if lib.time_series is none %}GROUP BY {% endif -%}
    {%- for attribute in lib.data_groupings -%}
    {{ ', ' if lib.time_series is not none and loop.index == 1 else "" }}top_values.grouping_{{ attribute }}
    {%- endfor -%}
{%- endif -%}

SELECT
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.[target_column] AS top_value,
            COUNT_BIG(*) AS total_values
        FROM
            [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
        WHERE (analyzed_table.[target_column] IS NOT NULL)
        GROUP BY analyzed_table.[target_column]
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Teradata

Sensor template for TeradataRendered SQL for Teradata

{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Trino

Sensor template for TrinoRendered SQL for Trino

{% import '/dialects/trino.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value
                FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Expand the Configure with data grouping section to see additional examples for configuring this data quality checks to use data grouping (GROUP BY).

Configuration with data grouping

Sample configuration with data grouping enabled (YAML) The sample below shows how to configure the data grouping and how it affects the generated SQL query.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  default_grouping_name: group_by_country_and_state
  groupings:
    group_by_country_and_state:
      level_1:
        source: column_value
        column: country
      level_2:
        source: column_value
        column: state
  columns:
    target_column:
      profiling_checks:
        accepted_values:
          profile_expected_texts_in_top_values_count:
            parameters:
              expected_values:
              - USD
              - GBP
              - EUR
              top: 3
            warning:
              max_missing: 0
            error:
              max_missing: 1
            fatal:
              max_missing: 2
      labels:
      - This is the column that is analyzed for data quality issues
    country:
      labels:
      - column used as the first grouping key
    state:
      labels:
      - column used as the second grouping key

Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the expected_texts_in_top_values_count sensor.

BigQuery

Sensor template for BigQueryRendered SQL for BigQuery

{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

ClickHouse

Sensor template for ClickHouseRendered SQL for ClickHouse

{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Databricks

Sensor template for DatabricksRendered SQL for Databricks

{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

DB2

Sensor template for DB2Rendered SQL for DB2

{% import '/dialects/db2.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) AS top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) AS top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

DuckDB

Sensor template for DuckDBRendered SQL for DuckDB

{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
             AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

HANA

Sensor template for HANARendered SQL for HANA

{% import '/dialects/hana.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

MariaDB

Sensor template for MariaDBRendered SQL for MariaDB

{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

MySQL

Sensor template for MySQLRendered SQL for MySQL

{% import '/dialects/mysql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Oracle

Sensor template for OracleRendered SQL for Oracle

{% import '/dialects/oracle.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} top_value,
            COUNT(*) total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM
        (
            SELECT
            additional_table.*,
            {{ lib.render_target_column('additional_table') }} top_value
            {{- lib.render_data_grouping_projections('additional_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('additional_table', indentation = '            ') }}
            FROM {{ lib.render_target_table() }} additional_table) analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL actual_value,
    MAX(0) expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
    {{- lib.render_where_clause(table_alias_prefix='original_table') }}) analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX({{ parameters.expected_values | length }}) expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX(3) expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" top_value,
            COUNT(*) total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2

        FROM
        (
            SELECT
            additional_table.*,
            additional_table."target_column" top_value,
            additional_table."country" AS grouping_level_1,
            additional_table."state" AS grouping_level_2
            FROM "<target_schema>"."<target_table>" additional_table) analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

PostgreSQL

Sensor template for PostgreSQLRendered SQL for PostgreSQL

{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Presto

Sensor template for PrestoRendered SQL for Presto

{% import '/dialects/presto.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2

        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2
                FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

QuestDB

Sensor template for QuestDBRendered SQL for QuestDB

{% import '/dialects/questdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT() AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{%- else %}
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT() AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Redshift

Sensor template for RedshiftRendered SQL for Redshift

{% import '/dialects/redshift.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Snowflake

Sensor template for SnowflakeRendered SQL for Snowflake

{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Spark

Sensor template for SparkRendered SQL for Spark

{% import '/dialects/spark.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

SQL Server

Sensor template for SQL ServerRendered SQL for SQL Server

{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT_BIG(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} {{ lib.render_target_column('analyzed_table') }}
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{% if lib.time_series is not none -%}
GROUP BY time_period, time_period_utc
{%- endif -%}
{%- if (lib.data_groupings is not none and (lib.data_groupings | length) > 0) -%}
    {% if lib.time_series is none %}GROUP BY {% endif -%}
    {%- for attribute in lib.data_groupings -%}
    {{ ', ' if lib.time_series is not none and loop.index == 1 else "" }}top_values.grouping_{{ attribute }}
    {%- endfor -%}
{%- endif -%}

SELECT
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.[target_column] AS top_value,
            COUNT_BIG(*) AS total_values,
            analyzed_table.[country] AS grouping_level_1,
            analyzed_table.[state] AS grouping_level_2
        FROM
            [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
        WHERE (analyzed_table.[target_column] IS NOT NULL)
        GROUP BY analyzed_table.[country], analyzed_table.[state], analyzed_table.[target_column]
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3GROUP BY top_values.grouping_level_1top_values.grouping_level_2

Teradata

Sensor template for TeradataRendered SQL for Teradata

{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Trino

Sensor template for TrinoRendered SQL for Trino

{% import '/dialects/trino.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2

        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2
                FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

daily expected texts in top values count

Check description

Verifies that the top X most popular column values contain all values from a list of expected values. Stores the most recent captured value for each day when the data quality check was evaluated.

Data quality check name	Friendly name	Category	Check type	Time scale	Quality dimension	Sensor definition	Quality rule	Standard
`daily_expected_texts_in_top_values_count`	Verify that the most popular text values match the list of expected values	accepted_values	monitoring	daily	Reasonableness	expected_texts_in_top_values_count	max_missing

Command-line examples

Please expand the section below to see the DQOps command-line examples to run or activate the daily expected texts in top values count data quality check.

Managing daily expected texts in top values count check from DQOps shell

Activate the check with a warning ruleActivate the check with an error ruleRun all configured checks

Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the warning rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=daily_expected_texts_in_top_values_count --enable-warning

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_expected_texts_in_top_values_count --enable-warning

Additional rule parameters are passed using the -Wrule_parameter_name=value.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_expected_texts_in_top_values_count --enable-warning
                    -Wmax_missing=value

Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the error rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=daily_expected_texts_in_top_values_count --enable-error

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_expected_texts_in_top_values_count --enable-error

Additional rule parameters are passed using the -Erule_parameter_name=value.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_expected_texts_in_top_values_count --enable-error
                    -Emax_missing=value

Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the daily_expected_texts_in_top_values_count check on all tables and columns on a single data source.

dqo> check run -c=data_source_name -ch=daily_expected_texts_in_top_values_count

It is also possible to run this check on a specific connection and table. In order to do this, use the connection name and the full table name parameters.

dqo> check run -c=connection_name -t=schema_name.table_name -ch=daily_expected_texts_in_top_values_count

You can also run this check on all tables (and columns) on which the daily_expected_texts_in_top_values_count check is enabled using patterns to find tables.

dqo> check run -c=connection_name -t=schema_prefix*.fact_* -col=column_name_* -ch=daily_expected_texts_in_top_values_count

YAML configuration

The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  columns:
    target_column:
      monitoring_checks:
        daily:
          accepted_values:
            daily_expected_texts_in_top_values_count:
              parameters:
                expected_values:
                - USD
                - GBP
                - EUR
                top: 3
              warning:
                max_missing: 0
              error:
                max_missing: 1
              fatal:
                max_missing: 2
      labels:
      - This is the column that is analyzed for data quality issues

Samples of generated SQL queries for each data source type

Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the expected_texts_in_top_values_count data quality sensor.

BigQuery

Sensor template for BigQueryRendered SQL for BigQuery

{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

ClickHouse

Sensor template for ClickHouseRendered SQL for ClickHouse

{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Databricks

Sensor template for DatabricksRendered SQL for Databricks

{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

DB2

Sensor template for DB2Rendered SQL for DB2

{% import '/dialects/db2.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) AS top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        RANK() OVER(
            ORDER BY top_col_values.total_values DESC) AS top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

DuckDB

Sensor template for DuckDBRendered SQL for DuckDB

{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
             AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

HANA

Sensor template for HANARendered SQL for HANA

{% import '/dialects/hana.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

MariaDB

Sensor template for MariaDBRendered SQL for MariaDB

{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

MySQL

Sensor template for MySQLRendered SQL for MySQL

{% import '/dialects/mysql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Oracle

Sensor template for OracleRendered SQL for Oracle

{% import '/dialects/oracle.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} top_value,
            COUNT(*) total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM
        (
            SELECT
            additional_table.*,
            {{ lib.render_target_column('additional_table') }} top_value
            {{- lib.render_data_grouping_projections('additional_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('additional_table', indentation = '            ') }}
            FROM {{ lib.render_target_table() }} additional_table) analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL actual_value,
    MAX(0) expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
    {{- lib.render_where_clause(table_alias_prefix='original_table') }}) analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX({{ parameters.expected_values | length }}) expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX(3) expected_value
FROM
(
    SELECT
        top_col_values.top_value top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" top_value,
            COUNT(*) total_values
        FROM
        (
            SELECT
            additional_table.*,
            additional_table."target_column" top_value
            FROM "<target_schema>"."<target_table>" additional_table) analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= 3

PostgreSQL

Sensor template for PostgreSQLRendered SQL for PostgreSQL

{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Presto

Sensor template for PrestoRendered SQL for Presto

{% import '/dialects/presto.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value
                FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

QuestDB

Sensor template for QuestDBRendered SQL for QuestDB

{% import '/dialects/questdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT() AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{%- else %}
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT() AS total_values
        FROM
            "<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Redshift

Sensor template for RedshiftRendered SQL for Redshift

{% import '/dialects/redshift.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Snowflake

Sensor template for SnowflakeRendered SQL for Snowflake

{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Spark

Sensor template for SparkRendered SQL for Spark

{% import '/dialects/spark.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

SQL Server

Sensor template for SQL ServerRendered SQL for SQL Server

{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT_BIG(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} {{ lib.render_target_column('analyzed_table') }}
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{% if lib.time_series is not none -%}
GROUP BY time_period, time_period_utc
{%- endif -%}
{%- if (lib.data_groupings is not none and (lib.data_groupings | length) > 0) -%}
    {% if lib.time_series is none %}GROUP BY {% endif -%}
    {%- for attribute in lib.data_groupings -%}
    {{ ', ' if lib.time_series is not none and loop.index == 1 else "" }}top_values.grouping_{{ attribute }}
    {%- endfor -%}
{%- endif -%}

SELECT
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.[target_column] AS top_value,
            COUNT_BIG(*) AS total_values
        FROM
            [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
        WHERE (analyzed_table.[target_column] IS NOT NULL)
        GROUP BY analyzed_table.[target_column]
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Teradata

Sensor template for TeradataRendered SQL for Teradata

{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Trino

Sensor template for TrinoRendered SQL for Trino

{% import '/dialects/trino.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value
                FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Expand the Configure with data grouping section to see additional examples for configuring this data quality checks to use data grouping (GROUP BY).

Configuration with data grouping

Sample configuration with data grouping enabled (YAML) The sample below shows how to configure the data grouping and how it affects the generated SQL query.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  default_grouping_name: group_by_country_and_state
  groupings:
    group_by_country_and_state:
      level_1:
        source: column_value
        column: country
      level_2:
        source: column_value
        column: state
  columns:
    target_column:
      monitoring_checks:
        daily:
          accepted_values:
            daily_expected_texts_in_top_values_count:
              parameters:
                expected_values:
                - USD
                - GBP
                - EUR
                top: 3
              warning:
                max_missing: 0
              error:
                max_missing: 1
              fatal:
                max_missing: 2
      labels:
      - This is the column that is analyzed for data quality issues
    country:
      labels:
      - column used as the first grouping key
    state:
      labels:
      - column used as the second grouping key

Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the expected_texts_in_top_values_count sensor.

BigQuery

Sensor template for BigQueryRendered SQL for BigQuery

{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

ClickHouse

Sensor template for ClickHouseRendered SQL for ClickHouse

{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Databricks

Sensor template for DatabricksRendered SQL for Databricks

{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

DB2

Sensor template for DB2Rendered SQL for DB2

{% import '/dialects/db2.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) AS top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) AS top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

DuckDB

Sensor template for DuckDBRendered SQL for DuckDB

{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
             AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

HANA

Sensor template for HANARendered SQL for HANA

{% import '/dialects/hana.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

MariaDB

Sensor template for MariaDBRendered SQL for MariaDB

{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

MySQL

Sensor template for MySQLRendered SQL for MySQL

{% import '/dialects/mysql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Oracle

Sensor template for OracleRendered SQL for Oracle

{% import '/dialects/oracle.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} top_value,
            COUNT(*) total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM
        (
            SELECT
            additional_table.*,
            {{ lib.render_target_column('additional_table') }} top_value
            {{- lib.render_data_grouping_projections('additional_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('additional_table', indentation = '            ') }}
            FROM {{ lib.render_target_table() }} additional_table) analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL actual_value,
    MAX(0) expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
    {{- lib.render_where_clause(table_alias_prefix='original_table') }}) analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX({{ parameters.expected_values | length }}) expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX(3) expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" top_value,
            COUNT(*) total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2

        FROM
        (
            SELECT
            additional_table.*,
            additional_table."target_column" top_value,
            additional_table."country" AS grouping_level_1,
            additional_table."state" AS grouping_level_2
            FROM "<target_schema>"."<target_table>" additional_table) analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

PostgreSQL

Sensor template for PostgreSQLRendered SQL for PostgreSQL

{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Presto

Sensor template for PrestoRendered SQL for Presto

{% import '/dialects/presto.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2

        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2
                FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

QuestDB

Sensor template for QuestDBRendered SQL for QuestDB

{% import '/dialects/questdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT() AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{%- else %}
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT() AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Redshift

Sensor template for RedshiftRendered SQL for Redshift

{% import '/dialects/redshift.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Snowflake

Sensor template for SnowflakeRendered SQL for Snowflake

{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Spark

Sensor template for SparkRendered SQL for Spark

{% import '/dialects/spark.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

SQL Server

Sensor template for SQL ServerRendered SQL for SQL Server

{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT_BIG(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} {{ lib.render_target_column('analyzed_table') }}
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{% if lib.time_series is not none -%}
GROUP BY time_period, time_period_utc
{%- endif -%}
{%- if (lib.data_groupings is not none and (lib.data_groupings | length) > 0) -%}
    {% if lib.time_series is none %}GROUP BY {% endif -%}
    {%- for attribute in lib.data_groupings -%}
    {{ ', ' if lib.time_series is not none and loop.index == 1 else "" }}top_values.grouping_{{ attribute }}
    {%- endfor -%}
{%- endif -%}

SELECT
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.[target_column] AS top_value,
            COUNT_BIG(*) AS total_values,
            analyzed_table.[country] AS grouping_level_1,
            analyzed_table.[state] AS grouping_level_2
        FROM
            [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
        WHERE (analyzed_table.[target_column] IS NOT NULL)
        GROUP BY analyzed_table.[country], analyzed_table.[state], analyzed_table.[target_column]
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3GROUP BY top_values.grouping_level_1top_values.grouping_level_2

Teradata

Sensor template for TeradataRendered SQL for Teradata

{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Trino

Sensor template for TrinoRendered SQL for Trino

{% import '/dialects/trino.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2

        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2
                FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

monthly expected texts in top values count

Check description

Verifies that the top X most popular column values contain all values from a list of expected values. Stores the most recent captured value for each month when the data quality check was evaluated.

Data quality check name	Friendly name	Category	Check type	Time scale	Quality dimension	Sensor definition	Quality rule	Standard
`monthly_expected_texts_in_top_values_count`	Verify that the most popular text values match the list of expected values	accepted_values	monitoring	monthly	Reasonableness	expected_texts_in_top_values_count	max_missing

Command-line examples

Please expand the section below to see the DQOps command-line examples to run or activate the monthly expected texts in top values count data quality check.

Managing monthly expected texts in top values count check from DQOps shell

Activate the check with a warning ruleActivate the check with an error ruleRun all configured checks

Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the warning rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=monthly_expected_texts_in_top_values_count --enable-warning

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_expected_texts_in_top_values_count --enable-warning

Additional rule parameters are passed using the -Wrule_parameter_name=value.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_expected_texts_in_top_values_count --enable-warning
                    -Wmax_missing=value

Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the error rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=monthly_expected_texts_in_top_values_count --enable-error

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_expected_texts_in_top_values_count --enable-error

Additional rule parameters are passed using the -Erule_parameter_name=value.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_expected_texts_in_top_values_count --enable-error
                    -Emax_missing=value

Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the monthly_expected_texts_in_top_values_count check on all tables and columns on a single data source.

dqo> check run -c=data_source_name -ch=monthly_expected_texts_in_top_values_count

It is also possible to run this check on a specific connection and table. In order to do this, use the connection name and the full table name parameters.

dqo> check run -c=connection_name -t=schema_name.table_name -ch=monthly_expected_texts_in_top_values_count

You can also run this check on all tables (and columns) on which the monthly_expected_texts_in_top_values_count check is enabled using patterns to find tables.

dqo> check run -c=connection_name -t=schema_prefix*.fact_* -col=column_name_* -ch=monthly_expected_texts_in_top_values_count

YAML configuration

The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  columns:
    target_column:
      monitoring_checks:
        monthly:
          accepted_values:
            monthly_expected_texts_in_top_values_count:
              parameters:
                expected_values:
                - USD
                - GBP
                - EUR
                top: 3
              warning:
                max_missing: 0
              error:
                max_missing: 1
              fatal:
                max_missing: 2
      labels:
      - This is the column that is analyzed for data quality issues

Samples of generated SQL queries for each data source type

Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the expected_texts_in_top_values_count data quality sensor.

BigQuery

Sensor template for BigQueryRendered SQL for BigQuery

{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

ClickHouse

Sensor template for ClickHouseRendered SQL for ClickHouse

{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Databricks

Sensor template for DatabricksRendered SQL for Databricks

{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

DB2

Sensor template for DB2Rendered SQL for DB2

{% import '/dialects/db2.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) AS top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        RANK() OVER(
            ORDER BY top_col_values.total_values DESC) AS top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

DuckDB

Sensor template for DuckDBRendered SQL for DuckDB

{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
             AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

HANA

Sensor template for HANARendered SQL for HANA

{% import '/dialects/hana.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

MariaDB

Sensor template for MariaDBRendered SQL for MariaDB

{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

MySQL

Sensor template for MySQLRendered SQL for MySQL

{% import '/dialects/mysql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Oracle

Sensor template for OracleRendered SQL for Oracle

{% import '/dialects/oracle.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} top_value,
            COUNT(*) total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM
        (
            SELECT
            additional_table.*,
            {{ lib.render_target_column('additional_table') }} top_value
            {{- lib.render_data_grouping_projections('additional_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('additional_table', indentation = '            ') }}
            FROM {{ lib.render_target_table() }} additional_table) analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL actual_value,
    MAX(0) expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
    {{- lib.render_where_clause(table_alias_prefix='original_table') }}) analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX({{ parameters.expected_values | length }}) expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX(3) expected_value
FROM
(
    SELECT
        top_col_values.top_value top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" top_value,
            COUNT(*) total_values
        FROM
        (
            SELECT
            additional_table.*,
            additional_table."target_column" top_value
            FROM "<target_schema>"."<target_table>" additional_table) analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= 3

PostgreSQL

Sensor template for PostgreSQLRendered SQL for PostgreSQL

{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Presto

Sensor template for PrestoRendered SQL for Presto

{% import '/dialects/presto.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value
                FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

QuestDB

Sensor template for QuestDBRendered SQL for QuestDB

{% import '/dialects/questdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT() AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{%- else %}
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT() AS total_values
        FROM
            "<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Redshift

Sensor template for RedshiftRendered SQL for Redshift

{% import '/dialects/redshift.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Snowflake

Sensor template for SnowflakeRendered SQL for Snowflake

{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Spark

Sensor template for SparkRendered SQL for Spark

{% import '/dialects/spark.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

SQL Server

Sensor template for SQL ServerRendered SQL for SQL Server

{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT_BIG(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} {{ lib.render_target_column('analyzed_table') }}
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{% if lib.time_series is not none -%}
GROUP BY time_period, time_period_utc
{%- endif -%}
{%- if (lib.data_groupings is not none and (lib.data_groupings | length) > 0) -%}
    {% if lib.time_series is none %}GROUP BY {% endif -%}
    {%- for attribute in lib.data_groupings -%}
    {{ ', ' if lib.time_series is not none and loop.index == 1 else "" }}top_values.grouping_{{ attribute }}
    {%- endfor -%}
{%- endif -%}

SELECT
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.[target_column] AS top_value,
            COUNT_BIG(*) AS total_values
        FROM
            [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
        WHERE (analyzed_table.[target_column] IS NOT NULL)
        GROUP BY analyzed_table.[target_column]
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Teradata

Sensor template for TeradataRendered SQL for Teradata

{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Trino

Sensor template for TrinoRendered SQL for Trino

{% import '/dialects/trino.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY NULL
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value
                FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY top_value
        ORDER BY total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3

Expand the Configure with data grouping section to see additional examples for configuring this data quality checks to use data grouping (GROUP BY).

Configuration with data grouping

Sample configuration with data grouping enabled (YAML) The sample below shows how to configure the data grouping and how it affects the generated SQL query.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  default_grouping_name: group_by_country_and_state
  groupings:
    group_by_country_and_state:
      level_1:
        source: column_value
        column: country
      level_2:
        source: column_value
        column: state
  columns:
    target_column:
      monitoring_checks:
        monthly:
          accepted_values:
            monthly_expected_texts_in_top_values_count:
              parameters:
                expected_values:
                - USD
                - GBP
                - EUR
                top: 3
              warning:
                max_missing: 0
              error:
                max_missing: 1
              fatal:
                max_missing: 2
      labels:
      - This is the column that is analyzed for data quality issues
    country:
      labels:
      - column used as the first grouping key
    state:
      labels:
      - column used as the second grouping key

Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the expected_texts_in_top_values_count sensor.

BigQuery

Sensor template for BigQueryRendered SQL for BigQuery

{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

ClickHouse

Sensor template for ClickHouseRendered SQL for ClickHouse

{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Databricks

Sensor template for DatabricksRendered SQL for Databricks

{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

DB2

Sensor template for DB2Rendered SQL for DB2

{% import '/dialects/db2.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) AS top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) AS top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

DuckDB

Sensor template for DuckDBRendered SQL for DuckDB

{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
             AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

HANA

Sensor template for HANARendered SQL for HANA

{% import '/dialects/hana.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

MariaDB

Sensor template for MariaDBRendered SQL for MariaDB

{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

MySQL

Sensor template for MySQLRendered SQL for MySQL

{% import '/dialects/mysql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Oracle

Sensor template for OracleRendered SQL for Oracle

{% import '/dialects/oracle.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} top_value,
            COUNT(*) total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM
        (
            SELECT
            additional_table.*,
            {{ lib.render_target_column('additional_table') }} top_value
            {{- lib.render_data_grouping_projections('additional_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('additional_table', indentation = '            ') }}
            FROM {{ lib.render_target_table() }} additional_table) analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL actual_value,
    MAX(0) expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
    {{- lib.render_where_clause(table_alias_prefix='original_table') }}) analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX({{ parameters.expected_values | length }}) expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX(3) expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" top_value,
            COUNT(*) total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2

        FROM
        (
            SELECT
            additional_table.*,
            additional_table."target_column" top_value,
            additional_table."country" AS grouping_level_1,
            additional_table."state" AS grouping_level_2
            FROM "<target_schema>"."<target_table>" additional_table) analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

PostgreSQL

Sensor template for PostgreSQLRendered SQL for PostgreSQL

{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Presto

Sensor template for PrestoRendered SQL for Presto

{% import '/dialects/presto.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2

        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2
                FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

QuestDB

Sensor template for QuestDBRendered SQL for QuestDB

{% import '/dialects/questdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT() AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{%- else %}
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT() AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Redshift

Sensor template for RedshiftRendered SQL for Redshift

{% import '/dialects/redshift.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Snowflake

Sensor template for SnowflakeRendered SQL for Snowflake

{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Spark

Sensor template for SparkRendered SQL for Spark

{% import '/dialects/spark.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

SQL Server

Sensor template for SQL ServerRendered SQL for SQL Server

{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT_BIG(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} {{ lib.render_target_column('analyzed_table') }}
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{% if lib.time_series is not none -%}
GROUP BY time_period, time_period_utc
{%- endif -%}
{%- if (lib.data_groupings is not none and (lib.data_groupings | length) > 0) -%}
    {% if lib.time_series is none %}GROUP BY {% endif -%}
    {%- for attribute in lib.data_groupings -%}
    {{ ', ' if lib.time_series is not none and loop.index == 1 else "" }}top_values.grouping_{{ attribute }}
    {%- endfor -%}
{%- endif -%}

SELECT
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.[target_column] AS top_value,
            COUNT_BIG(*) AS total_values,
            analyzed_table.[country] AS grouping_level_1,
            analyzed_table.[state] AS grouping_level_2
        FROM
            [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
        WHERE (analyzed_table.[target_column] IS NOT NULL)
        GROUP BY analyzed_table.[country], analyzed_table.[state], analyzed_table.[target_column]
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3GROUP BY top_values.grouping_level_1top_values.grouping_level_2

Teradata

Sensor template for TeradataRendered SQL for Teradata

{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

Trino

Sensor template for TrinoRendered SQL for Trino

{% import '/dialects/trino.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        RANK() OVER(PARTITION BY top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2

        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2
                FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, top_value
        ORDER BY grouping_level_1, grouping_level_2, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2
ORDER BY grouping_level_1, grouping_level_2

daily partition expected texts in top values count

Check description

Verifies that the top X most popular column values contain all values from a list of expected values. Stores a separate data quality check result for each daily partition.

Data quality check name	Friendly name	Category	Check type	Time scale	Quality dimension	Sensor definition	Quality rule	Standard
`daily_partition_expected_texts_in_top_values_count`	Verify that the most popular text values match the list of expected values	accepted_values	partitioned	daily	Reasonableness	expected_texts_in_top_values_count	max_missing

Command-line examples

Please expand the section below to see the DQOps command-line examples to run or activate the daily partition expected texts in top values count data quality check.

Managing daily partition expected texts in top values count check from DQOps shell

Activate the check with a warning ruleActivate the check with an error ruleRun all configured checks

Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the warning rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=daily_partition_expected_texts_in_top_values_count --enable-warning

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_partition_expected_texts_in_top_values_count --enable-warning

Additional rule parameters are passed using the -Wrule_parameter_name=value.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_partition_expected_texts_in_top_values_count --enable-warning
                    -Wmax_missing=value

Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the error rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=daily_partition_expected_texts_in_top_values_count --enable-error

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_partition_expected_texts_in_top_values_count --enable-error

Additional rule parameters are passed using the -Erule_parameter_name=value.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=daily_partition_expected_texts_in_top_values_count --enable-error
                    -Emax_missing=value

Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the daily_partition_expected_texts_in_top_values_count check on all tables and columns on a single data source.

dqo> check run -c=data_source_name -ch=daily_partition_expected_texts_in_top_values_count

It is also possible to run this check on a specific connection and table. In order to do this, use the connection name and the full table name parameters.

dqo> check run -c=connection_name -t=schema_name.table_name -ch=daily_partition_expected_texts_in_top_values_count

You can also run this check on all tables (and columns) on which the daily_partition_expected_texts_in_top_values_count check is enabled using patterns to find tables.

dqo> check run -c=connection_name -t=schema_prefix*.fact_* -col=column_name_* -ch=daily_partition_expected_texts_in_top_values_count

YAML configuration

The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  timestamp_columns:
    partition_by_column: date_column
  incremental_time_window:
    daily_partitioning_recent_days: 7
    monthly_partitioning_recent_months: 1
  columns:
    target_column:
      partitioned_checks:
        daily:
          accepted_values:
            daily_partition_expected_texts_in_top_values_count:
              parameters:
                expected_values:
                - USD
                - GBP
                - EUR
                top: 3
              warning:
                max_missing: 0
              error:
                max_missing: 1
              fatal:
                max_missing: 2
      labels:
      - This is the column that is analyzed for data quality issues
    date_column:
      labels:
      - "date or datetime column used as a daily or monthly partitioning key, dates\
        \ (and times) are truncated to a day or a month by the sensor's query for\
        \ partitioned checks"

Samples of generated SQL queries for each data source type

Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the expected_texts_in_top_values_count data quality sensor.

BigQuery

Sensor template for BigQueryRendered SQL for BigQuery

{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            CAST(analyzed_table.`date_column` AS DATE) AS time_period,
            TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
        FROM
            `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

ClickHouse

Sensor template for ClickHouseRendered SQL for ClickHouse

{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            CAST(analyzed_table."date_column" AS DATE) AS time_period,
            toDateTime64(CAST(analyzed_table."date_column" AS DATE), 3) AS time_period_utc
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Databricks

Sensor template for DatabricksRendered SQL for Databricks

{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            CAST(analyzed_table.`date_column` AS DATE) AS time_period,
            TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

DB2

Sensor template for DB2Rendered SQL for DB2

{% import '/dialects/db2.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) AS top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        RANK() OVER(
            ORDER BY top_col_values.total_values DESC) AS top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    CAST(original_table."date_column" AS DATE) AS time_period,
    TIMESTAMP(CAST(original_table."date_column" AS DATE)) AS time_period_utc
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

DuckDB

Sensor template for DuckDBRendered SQL for DuckDB

{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            CAST(analyzed_table."date_column" AS date) AS time_period,
            CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
             AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

HANA

Sensor template for HANARendered SQL for HANA

{% import '/dialects/hana.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    CAST(original_table."date_column" AS DATE) AS time_period,
    TO_TIMESTAMP(CAST(original_table."date_column" AS DATE)) AS time_period_utc
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

MariaDB

Sensor template for MariaDBRendered SQL for MariaDB

{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00') AS time_period,
            FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00'))) AS time_period_utc
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

MySQL

Sensor template for MySQLRendered SQL for MySQL

{% import '/dialects/mysql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00') AS time_period,
            FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00'))) AS time_period_utc
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Oracle

Sensor template for OracleRendered SQL for Oracle

{% import '/dialects/oracle.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} top_value,
            COUNT(*) total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM
        (
            SELECT
            additional_table.*,
            {{ lib.render_target_column('additional_table') }} top_value
            {{- lib.render_data_grouping_projections('additional_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('additional_table', indentation = '            ') }}
            FROM {{ lib.render_target_table() }} additional_table) analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL actual_value,
    MAX(0) expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
    {{- lib.render_where_clause(table_alias_prefix='original_table') }}) analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX({{ parameters.expected_values | length }}) expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX(3) expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value top_value,
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" top_value,
            COUNT(*) total_values,
            time_period,
            time_period_utc
        FROM
        (
            SELECT
            additional_table.*,
            additional_table."target_column" top_value,
            TRUNC(CAST(additional_table."date_column" AS DATE)) AS time_period,
            CAST(TRUNC(CAST(additional_table."date_column" AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
            FROM "<target_schema>"."<target_table>" additional_table) analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

PostgreSQL

Sensor template for PostgreSQLRendered SQL for PostgreSQL

{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            CAST(analyzed_table."date_column" AS date) AS time_period,
            CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
            "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Presto

Sensor template for PrestoRendered SQL for Presto

{% import '/dialects/presto.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    CAST(original_table."date_column" AS date) AS time_period,
    CAST(CAST(original_table."date_column" AS date) AS TIMESTAMP) AS time_period_utc
                FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

QuestDB

Sensor template for QuestDBRendered SQL for QuestDB

{% import '/dialects/questdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT() AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{%- else %}
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT() AS total_values,
            CAST(DATE_TRUNC('day', analyzed_table."date_column") AS DATE) AS time_period,
            CAST((CAST(DATE_TRUNC('day', analyzed_table."date_column") AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
            "<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Redshift

Sensor template for RedshiftRendered SQL for Redshift

{% import '/dialects/redshift.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            CAST(analyzed_table."date_column" AS date) AS time_period,
            CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
            "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Snowflake

Sensor template for SnowflakeRendered SQL for Snowflake

{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            CAST(analyzed_table."date_column" AS date) AS time_period,
            TO_TIMESTAMP(CAST(analyzed_table."date_column" AS date)) AS time_period_utc
        FROM
            "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Spark

Sensor template for SparkRendered SQL for Spark

{% import '/dialects/spark.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            CAST(analyzed_table.`date_column` AS DATE) AS time_period,
            TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

SQL Server

Sensor template for SQL ServerRendered SQL for SQL Server

{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT_BIG(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} {{ lib.render_target_column('analyzed_table') }}
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{% if lib.time_series is not none -%}
GROUP BY time_period, time_period_utc
{%- endif -%}
{%- if (lib.data_groupings is not none and (lib.data_groupings | length) > 0) -%}
    {% if lib.time_series is none %}GROUP BY {% endif -%}
    {%- for attribute in lib.data_groupings -%}
    {{ ', ' if lib.time_series is not none and loop.index == 1 else "" }}top_values.grouping_{{ attribute }}
    {%- endfor -%}
{%- endif -%}

SELECT
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.[target_column] AS top_value,
            COUNT_BIG(*) AS total_values,
            CAST(analyzed_table.[date_column] AS date) AS time_period,
            CAST((CAST(analyzed_table.[date_column] AS date)) AS DATETIME) AS time_period_utc
        FROM
            [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
        WHERE (analyzed_table.[target_column] IS NOT NULL)
        GROUP BY CAST(analyzed_table.[date_column] AS date), CAST(analyzed_table.[date_column] AS date), analyzed_table.[target_column]
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3GROUP BY time_period, time_period_utc

Teradata

Sensor template for TeradataRendered SQL for Teradata

{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            CAST(analyzed_table."date_column" AS DATE) AS time_period,
            CAST(CAST(analyzed_table."date_column" AS DATE) AS TIMESTAMP) AS time_period_utc
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Trino

Sensor template for TrinoRendered SQL for Trino

{% import '/dialects/trino.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    CAST(original_table."date_column" AS date) AS time_period,
    CAST(CAST(original_table."date_column" AS date) AS TIMESTAMP) AS time_period_utc
                FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Expand the Configure with data grouping section to see additional examples for configuring this data quality checks to use data grouping (GROUP BY).

Configuration with data grouping

Sample configuration with data grouping enabled (YAML) The sample below shows how to configure the data grouping and how it affects the generated SQL query.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  timestamp_columns:
    partition_by_column: date_column
  incremental_time_window:
    daily_partitioning_recent_days: 7
    monthly_partitioning_recent_months: 1
  default_grouping_name: group_by_country_and_state
  groupings:
    group_by_country_and_state:
      level_1:
        source: column_value
        column: country
      level_2:
        source: column_value
        column: state
  columns:
    target_column:
      partitioned_checks:
        daily:
          accepted_values:
            daily_partition_expected_texts_in_top_values_count:
              parameters:
                expected_values:
                - USD
                - GBP
                - EUR
                top: 3
              warning:
                max_missing: 0
              error:
                max_missing: 1
              fatal:
                max_missing: 2
      labels:
      - This is the column that is analyzed for data quality issues
    date_column:
      labels:
      - "date or datetime column used as a daily or monthly partitioning key, dates\
        \ (and times) are truncated to a day or a month by the sensor's query for\
        \ partitioned checks"
    country:
      labels:
      - column used as the first grouping key
    state:
      labels:
      - column used as the second grouping key

Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the expected_texts_in_top_values_count sensor.

BigQuery

Sensor template for BigQueryRendered SQL for BigQuery

{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2,
            CAST(analyzed_table.`date_column` AS DATE) AS time_period,
            TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
        FROM
            `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

ClickHouse

Sensor template for ClickHouseRendered SQL for ClickHouse

{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            CAST(analyzed_table."date_column" AS DATE) AS time_period,
            toDateTime64(CAST(analyzed_table."date_column" AS DATE), 3) AS time_period_utc
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Databricks

Sensor template for DatabricksRendered SQL for Databricks

{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2,
            CAST(analyzed_table.`date_column` AS DATE) AS time_period,
            TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

DB2

Sensor template for DB2Rendered SQL for DB2

{% import '/dialects/db2.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) AS top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) AS top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2,
    CAST(original_table."date_column" AS DATE) AS time_period,
    TIMESTAMP(CAST(original_table."date_column" AS DATE)) AS time_period_utc
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

DuckDB

Sensor template for DuckDBRendered SQL for DuckDB

{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            CAST(analyzed_table."date_column" AS date) AS time_period,
            CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
             AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

HANA

Sensor template for HANARendered SQL for HANA

{% import '/dialects/hana.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2,
    CAST(original_table."date_column" AS DATE) AS time_period,
    TO_TIMESTAMP(CAST(original_table."date_column" AS DATE)) AS time_period_utc
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

MariaDB

Sensor template for MariaDBRendered SQL for MariaDB

{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2,
            DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00') AS time_period,
            FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00'))) AS time_period_utc
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

MySQL

Sensor template for MySQLRendered SQL for MySQL

{% import '/dialects/mysql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2,
            DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00') AS time_period,
            FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-%d 00:00:00'))) AS time_period_utc
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Oracle

Sensor template for OracleRendered SQL for Oracle

{% import '/dialects/oracle.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} top_value,
            COUNT(*) total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM
        (
            SELECT
            additional_table.*,
            {{ lib.render_target_column('additional_table') }} top_value
            {{- lib.render_data_grouping_projections('additional_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('additional_table', indentation = '            ') }}
            FROM {{ lib.render_target_table() }} additional_table) analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL actual_value,
    MAX(0) expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
    {{- lib.render_where_clause(table_alias_prefix='original_table') }}) analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX({{ parameters.expected_values | length }}) expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX(3) expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value top_value,
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" top_value,
            COUNT(*) total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2
,
            time_period,
            time_period_utc
        FROM
        (
            SELECT
            additional_table.*,
            additional_table."target_column" top_value,
            additional_table."country" AS grouping_level_1,
            additional_table."state" AS grouping_level_2,
            TRUNC(CAST(additional_table."date_column" AS DATE)) AS time_period,
            CAST(TRUNC(CAST(additional_table."date_column" AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
            FROM "<target_schema>"."<target_table>" additional_table) analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

PostgreSQL

Sensor template for PostgreSQLRendered SQL for PostgreSQL

{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            CAST(analyzed_table."date_column" AS date) AS time_period,
            CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
            "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Presto

Sensor template for PrestoRendered SQL for Presto

{% import '/dialects/presto.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2
,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2,
    CAST(original_table."date_column" AS date) AS time_period,
    CAST(CAST(original_table."date_column" AS date) AS TIMESTAMP) AS time_period_utc
                FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

QuestDB

Sensor template for QuestDBRendered SQL for QuestDB

{% import '/dialects/questdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT() AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{%- else %}
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT() AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            CAST(DATE_TRUNC('day', analyzed_table."date_column") AS DATE) AS time_period,
            CAST((CAST(DATE_TRUNC('day', analyzed_table."date_column") AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
            "<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Redshift

Sensor template for RedshiftRendered SQL for Redshift

{% import '/dialects/redshift.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            CAST(analyzed_table."date_column" AS date) AS time_period,
            CAST((CAST(analyzed_table."date_column" AS date)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
            "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Snowflake

Sensor template for SnowflakeRendered SQL for Snowflake

{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            CAST(analyzed_table."date_column" AS date) AS time_period,
            TO_TIMESTAMP(CAST(analyzed_table."date_column" AS date)) AS time_period_utc
        FROM
            "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Spark

Sensor template for SparkRendered SQL for Spark

{% import '/dialects/spark.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2,
            CAST(analyzed_table.`date_column` AS DATE) AS time_period,
            TIMESTAMP(CAST(analyzed_table.`date_column` AS DATE)) AS time_period_utc
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

SQL Server

Sensor template for SQL ServerRendered SQL for SQL Server

{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT_BIG(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} {{ lib.render_target_column('analyzed_table') }}
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{% if lib.time_series is not none -%}
GROUP BY time_period, time_period_utc
{%- endif -%}
{%- if (lib.data_groupings is not none and (lib.data_groupings | length) > 0) -%}
    {% if lib.time_series is none %}GROUP BY {% endif -%}
    {%- for attribute in lib.data_groupings -%}
    {{ ', ' if lib.time_series is not none and loop.index == 1 else "" }}top_values.grouping_{{ attribute }}
    {%- endfor -%}
{%- endif -%}

SELECT
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.[target_column] AS top_value,
            COUNT_BIG(*) AS total_values,
            analyzed_table.[country] AS grouping_level_1,
            analyzed_table.[state] AS grouping_level_2,
            CAST(analyzed_table.[date_column] AS date) AS time_period,
            CAST((CAST(analyzed_table.[date_column] AS date)) AS DATETIME) AS time_period_utc
        FROM
            [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
        WHERE (analyzed_table.[target_column] IS NOT NULL)
        GROUP BY analyzed_table.[country], analyzed_table.[state], CAST(analyzed_table.[date_column] AS date), CAST(analyzed_table.[date_column] AS date), analyzed_table.[target_column]
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3GROUP BY time_period, time_period_utc, top_values.grouping_level_1top_values.grouping_level_2

Teradata

Sensor template for TeradataRendered SQL for Teradata

{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            CAST(analyzed_table."date_column" AS DATE) AS time_period,
            CAST(CAST(analyzed_table."date_column" AS DATE) AS TIMESTAMP) AS time_period_utc
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Trino

Sensor template for TrinoRendered SQL for Trino

{% import '/dialects/trino.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2
,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2,
    CAST(original_table."date_column" AS date) AS time_period,
    CAST(CAST(original_table."date_column" AS date) AS TIMESTAMP) AS time_period_utc
                FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

monthly partition expected texts in top values count

Check description

Verifies that the top X most popular column values contain all values from a list of expected values. Stores a separate data quality check result for each monthly partition.

Data quality check name	Friendly name	Category	Check type	Time scale	Quality dimension	Sensor definition	Quality rule	Standard
`monthly_partition_expected_texts_in_top_values_count`	Verify that the most popular text values match the list of expected values	accepted_values	partitioned	monthly	Reasonableness	expected_texts_in_top_values_count	max_missing

Command-line examples

Please expand the section below to see the DQOps command-line examples to run or activate the monthly partition expected texts in top values count data quality check.

Managing monthly partition expected texts in top values count check from DQOps shell

Activate the check with a warning ruleActivate the check with an error ruleRun all configured checks

Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the warning rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=monthly_partition_expected_texts_in_top_values_count --enable-warning

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_partition_expected_texts_in_top_values_count --enable-warning

Additional rule parameters are passed using the -Wrule_parameter_name=value.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_partition_expected_texts_in_top_values_count --enable-warning
                    -Wmax_missing=value

Activate this data quality using the check activate CLI command, providing the connection name, table name, check name, and all other filters. Activates the error rule with the default parameters.

dqo> check activate -c=connection_name -t=schema_name.table_name -col=column_name -ch=monthly_partition_expected_texts_in_top_values_count --enable-error

You can also use patterns to activate the check on all matching tables and columns.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_partition_expected_texts_in_top_values_count --enable-error

Additional rule parameters are passed using the -Erule_parameter_name=value.

dqo> check activate -c=connection_name -t=schema_prefix*.fact_* -col=column_name -ch=monthly_partition_expected_texts_in_top_values_count --enable-error
                    -Emax_missing=value

Run this data quality check using the check run CLI command by providing the check name and all other targeting filters. The following example shows how to run the monthly_partition_expected_texts_in_top_values_count check on all tables and columns on a single data source.

dqo> check run -c=data_source_name -ch=monthly_partition_expected_texts_in_top_values_count

It is also possible to run this check on a specific connection and table. In order to do this, use the connection name and the full table name parameters.

dqo> check run -c=connection_name -t=schema_name.table_name -ch=monthly_partition_expected_texts_in_top_values_count

You can also run this check on all tables (and columns) on which the monthly_partition_expected_texts_in_top_values_count check is enabled using patterns to find tables.

dqo> check run -c=connection_name -t=schema_prefix*.fact_* -col=column_name_* -ch=monthly_partition_expected_texts_in_top_values_count

YAML configuration

The sample schema_name.table_name.dqotable.yaml file with the check configured is shown below.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  timestamp_columns:
    partition_by_column: date_column
  incremental_time_window:
    daily_partitioning_recent_days: 7
    monthly_partitioning_recent_months: 1
  columns:
    target_column:
      partitioned_checks:
        monthly:
          accepted_values:
            monthly_partition_expected_texts_in_top_values_count:
              parameters:
                expected_values:
                - USD
                - GBP
                - EUR
                top: 3
              warning:
                max_missing: 0
              error:
                max_missing: 1
              fatal:
                max_missing: 2
      labels:
      - This is the column that is analyzed for data quality issues
    date_column:
      labels:
      - "date or datetime column used as a daily or monthly partitioning key, dates\
        \ (and times) are truncated to a day or a month by the sensor's query for\
        \ partitioned checks"

Samples of generated SQL queries for each data source type

Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the expected_texts_in_top_values_count data quality sensor.

BigQuery

Sensor template for BigQueryRendered SQL for BigQuery

{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            DATE_TRUNC(CAST(analyzed_table.`date_column` AS DATE), MONTH) AS time_period,
            TIMESTAMP(DATE_TRUNC(CAST(analyzed_table.`date_column` AS DATE), MONTH)) AS time_period_utc
        FROM
            `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

ClickHouse

Sensor template for ClickHouseRendered SQL for ClickHouse

{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            DATE_TRUNC('month', CAST(analyzed_table."date_column" AS DATE)) AS time_period,
            toDateTime64(DATE_TRUNC('month', CAST(analyzed_table."date_column" AS DATE)), 3) AS time_period_utc
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Databricks

Sensor template for DatabricksRendered SQL for Databricks

{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE)) AS time_period,
            TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE))) AS time_period_utc
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

DB2

Sensor template for DB2Rendered SQL for DB2

{% import '/dialects/db2.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) AS top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        RANK() OVER(
            ORDER BY top_col_values.total_values DESC) AS top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    DATE_TRUNC('MONTH', CAST(original_table."date_column" AS DATE)) AS time_period,
    TIMESTAMP(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS DATE))) AS time_period_utc
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

DuckDB

Sensor template for DuckDBRendered SQL for DuckDB

{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
            CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
             AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

HANA

Sensor template for HANARendered SQL for HANA

{% import '/dialects/hana.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    SERIES_ROUND(CAST(original_table."date_column" AS DATE), 'INTERVAL 1 MONTH', ROUND_DOWN) AS time_period,
    TO_TIMESTAMP(SERIES_ROUND(CAST(original_table."date_column" AS DATE), 'INTERVAL 1 MONTH', ROUND_DOWN)) AS time_period_utc
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

MariaDB

Sensor template for MariaDBRendered SQL for MariaDB

{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00') AS time_period,
            FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00'))) AS time_period_utc
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

MySQL

Sensor template for MySQLRendered SQL for MySQL

{% import '/dialects/mysql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00') AS time_period,
            FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00'))) AS time_period_utc
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Oracle

Sensor template for OracleRendered SQL for Oracle

{% import '/dialects/oracle.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} top_value,
            COUNT(*) total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM
        (
            SELECT
            additional_table.*,
            {{ lib.render_target_column('additional_table') }} top_value
            {{- lib.render_data_grouping_projections('additional_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('additional_table', indentation = '            ') }}
            FROM {{ lib.render_target_table() }} additional_table) analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL actual_value,
    MAX(0) expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
    {{- lib.render_where_clause(table_alias_prefix='original_table') }}) analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX({{ parameters.expected_values | length }}) expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX(3) expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value top_value,
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" top_value,
            COUNT(*) total_values,
            time_period,
            time_period_utc
        FROM
        (
            SELECT
            additional_table.*,
            additional_table."target_column" top_value,
            TRUNC(CAST(additional_table."date_column" AS DATE), 'MONTH') AS time_period,
            CAST(TRUNC(CAST(additional_table."date_column" AS DATE), 'MONTH') AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
            FROM "<target_schema>"."<target_table>" additional_table) analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

PostgreSQL

Sensor template for PostgreSQLRendered SQL for PostgreSQL

{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
            CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
            "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Presto

Sensor template for PrestoRendered SQL for Presto

{% import '/dialects/presto.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS time_period,
    CAST(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS TIMESTAMP) AS time_period_utc
                FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

QuestDB

Sensor template for QuestDBRendered SQL for QuestDB

{% import '/dialects/questdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT() AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{%- else %}
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT() AS total_values,
            CAST(DATE_TRUNC('month', analyzed_table."date_column") AS DATE) AS time_period,
            CAST((CAST(DATE_TRUNC('month', analyzed_table."date_column") AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
            "<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Redshift

Sensor template for RedshiftRendered SQL for Redshift

{% import '/dialects/redshift.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
            CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
            "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Snowflake

Sensor template for SnowflakeRendered SQL for Snowflake

{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
            TO_TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS time_period_utc
        FROM
            "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Spark

Sensor template for SparkRendered SQL for Spark

{% import '/dialects/spark.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE)) AS time_period,
            TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE))) AS time_period_utc
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

SQL Server

Sensor template for SQL ServerRendered SQL for SQL Server

{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT_BIG(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} {{ lib.render_target_column('analyzed_table') }}
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{% if lib.time_series is not none -%}
GROUP BY time_period, time_period_utc
{%- endif -%}
{%- if (lib.data_groupings is not none and (lib.data_groupings | length) > 0) -%}
    {% if lib.time_series is none %}GROUP BY {% endif -%}
    {%- for attribute in lib.data_groupings -%}
    {{ ', ' if lib.time_series is not none and loop.index == 1 else "" }}top_values.grouping_{{ attribute }}
    {%- endfor -%}
{%- endif -%}

SELECT
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table.[target_column] AS top_value,
            COUNT_BIG(*) AS total_values,
            DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1) AS time_period,
            CAST((DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1)) AS DATETIME) AS time_period_utc
        FROM
            [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
        WHERE (analyzed_table.[target_column] IS NOT NULL)
        GROUP BY DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1), DATEADD(month, DATEDIFF(month, 0, analyzed_table.[date_column]), 0), analyzed_table.[target_column]
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3GROUP BY time_period, time_period_utc

Teradata

Sensor template for TeradataRendered SQL for Teradata

{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            TRUNC(CAST(analyzed_table."date_column" AS DATE), 'MM') AS time_period,
            CAST(TRUNC(CAST(analyzed_table."date_column" AS DATE), 'MM') AS TIMESTAMP) AS time_period_utc
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY time_period, time_period_utc, top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Trino

Sensor template for TrinoRendered SQL for Trino

{% import '/dialects/trino.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period
            ORDER BY top_col_values.total_values DESC) as top_values_rank
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS time_period,
    CAST(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS TIMESTAMP) AS time_period_utc
                FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY time_period, time_period_utc, top_value
        ORDER BY time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY time_period, time_period_utc
ORDER BY time_period, time_period_utc

Expand the Configure with data grouping section to see additional examples for configuring this data quality checks to use data grouping (GROUP BY).

Configuration with data grouping

Sample configuration with data grouping enabled (YAML) The sample below shows how to configure the data grouping and how it affects the generated SQL query.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  timestamp_columns:
    partition_by_column: date_column
  incremental_time_window:
    daily_partitioning_recent_days: 7
    monthly_partitioning_recent_months: 1
  default_grouping_name: group_by_country_and_state
  groupings:
    group_by_country_and_state:
      level_1:
        source: column_value
        column: country
      level_2:
        source: column_value
        column: state
  columns:
    target_column:
      partitioned_checks:
        monthly:
          accepted_values:
            monthly_partition_expected_texts_in_top_values_count:
              parameters:
                expected_values:
                - USD
                - GBP
                - EUR
                top: 3
              warning:
                max_missing: 0
              error:
                max_missing: 1
              fatal:
                max_missing: 2
      labels:
      - This is the column that is analyzed for data quality issues
    date_column:
      labels:
      - "date or datetime column used as a daily or monthly partitioning key, dates\
        \ (and times) are truncated to a day or a month by the sensor's query for\
        \ partitioned checks"
    country:
      labels:
      - column used as the first grouping key
    state:
      labels:
      - column used as the second grouping key

Please expand the database engine name section to see the SQL query rendered by a Jinja2 template for the expected_texts_in_top_values_count sensor.

BigQuery

Sensor template for BigQueryRendered SQL for BigQuery

{% import '/dialects/bigquery.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2,
            DATE_TRUNC(CAST(analyzed_table.`date_column` AS DATE), MONTH) AS time_period,
            TIMESTAMP(DATE_TRUNC(CAST(analyzed_table.`date_column` AS DATE), MONTH)) AS time_period_utc
        FROM
            `your-google-project-id`.`<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

ClickHouse

Sensor template for ClickHouseRendered SQL for ClickHouse

{% import '/dialects/clickhouse.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            DATE_TRUNC('month', CAST(analyzed_table."date_column" AS DATE)) AS time_period,
            toDateTime64(DATE_TRUNC('month', CAST(analyzed_table."date_column" AS DATE)), 3) AS time_period_utc
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Databricks

Sensor template for DatabricksRendered SQL for Databricks

{% import '/dialects/databricks.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2,
            DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE)) AS time_period,
            TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE))) AS time_period_utc
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

DB2

Sensor template for DB2Rendered SQL for DB2

{% import '/dialects/db2.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) AS top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value AS top_value,
        top_col_values.time_period AS time_period,
        top_col_values.time_period_utc AS time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) AS top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2,
    DATE_TRUNC('MONTH', CAST(original_table."date_column" AS DATE)) AS time_period,
    TIMESTAMP(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS DATE))) AS time_period_utc
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

DuckDB

Sensor template for DuckDBRendered SQL for DuckDB

{% import '/dialects/duckdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
            CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
             AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

HANA

Sensor template for HANARendered SQL for HANA

{% import '/dialects/hana.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true and lib.data_groupings is not none -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2,
    SERIES_ROUND(CAST(original_table."date_column" AS DATE), 'INTERVAL 1 MONTH', ROUND_DOWN) AS time_period,
    TO_TIMESTAMP(SERIES_ROUND(CAST(original_table."date_column" AS DATE), 'INTERVAL 1 MONTH', ROUND_DOWN)) AS time_period_utc
                FROM "<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

MariaDB

Sensor template for MariaDBRendered SQL for MariaDB

{% import '/dialects/mariadb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2,
            DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00') AS time_period,
            FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00'))) AS time_period_utc
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

MySQL

Sensor template for MySQLRendered SQL for MySQL

{% import '/dialects/mysql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
        FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
        FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2,
            DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00') AS time_period,
            FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_FORMAT(analyzed_table.`date_column`, '%Y-%m-01 00:00:00'))) AS time_period_utc
        FROM
            `<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Oracle

Sensor template for OracleRendered SQL for Oracle

{% import '/dialects/oracle.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} top_value,
            COUNT(*) total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM
        (
            SELECT
            additional_table.*,
            {{ lib.render_target_column('additional_table') }} top_value
            {{- lib.render_data_grouping_projections('additional_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('additional_table', indentation = '            ') }}
            FROM {{ lib.render_target_table() }} additional_table) analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL actual_value,
    MAX(0) expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
    {{- lib.render_where_clause(table_alias_prefix='original_table') }}) analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX({{ parameters.expected_values | length }}) expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) actual_value,
    MAX(3) expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value top_value,
        top_col_values.time_period time_period,
        top_col_values.time_period_utc time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" top_value,
            COUNT(*) total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2
,
            time_period,
            time_period_utc
        FROM
        (
            SELECT
            additional_table.*,
            additional_table."target_column" top_value,
            additional_table."country" AS grouping_level_1,
            additional_table."state" AS grouping_level_2,
            TRUNC(CAST(additional_table."date_column" AS DATE), 'MONTH') AS time_period,
            CAST(TRUNC(CAST(additional_table."date_column" AS DATE), 'MONTH') AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
            FROM "<target_schema>"."<target_table>" additional_table) analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) top_col_values
) top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

PostgreSQL

Sensor template for PostgreSQLRendered SQL for PostgreSQL

{% import '/dialects/postgresql.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
            CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
            "your_postgresql_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Presto

Sensor template for PrestoRendered SQL for Presto

{% import '/dialects/presto.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2
,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2,
    DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS time_period,
    CAST(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS TIMESTAMP) AS time_period_utc
                FROM "your_trino_database"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

QuestDB

Sensor template for QuestDBRendered SQL for QuestDB

{% import '/dialects/questdb.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT() AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections_reference('analyzed_table') }}
    {{- lib.render_time_dimension_projection_reference('analyzed_table') }}
FROM(
    SELECT
        original_table.*
        {{- lib.render_data_grouping_projections('original_table') }}
        {{- lib.render_time_dimension_projection('original_table') }}
    FROM {{ lib.render_target_table() }} original_table
) analyzed_table
{%- else %}
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT_DISTINCT(
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT() AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            CAST(DATE_TRUNC('month', analyzed_table."date_column") AS DATE) AS time_period,
            CAST((CAST(DATE_TRUNC('month', analyzed_table."date_column") AS DATE)) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
            "<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Redshift

Sensor template for RedshiftRendered SQL for Redshift

{% import '/dialects/redshift.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
            CAST((DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS TIMESTAMP WITH TIME ZONE) AS time_period_utc
        FROM
            "your_redshift_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Snowflake

Sensor template for SnowflakeRendered SQL for Snowflake

{% import '/dialects/snowflake.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date)) AS time_period,
            TO_TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table."date_column" AS date))) AS time_period_utc
        FROM
            "your_snowflake_database"."<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Spark

Sensor template for SparkRendered SQL for Spark

{% import '/dialects/spark.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.`target_column` AS top_value,
            COUNT(*) AS total_values,
            analyzed_table.`country` AS grouping_level_1,
            analyzed_table.`state` AS grouping_level_2,
            DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE)) AS time_period,
            TIMESTAMP(DATE_TRUNC('MONTH', CAST(analyzed_table.`date_column` AS DATE))) AS time_period_utc
        FROM
            `<target_schema>`.`<target_table>` AS analyzed_table
        WHERE (analyzed_table.`target_column` IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

SQL Server

Sensor template for SQL ServerRendered SQL for SQL Server

{% import '/dialects/sqlserver.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT_BIG(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} {{ lib.render_target_column('analyzed_table') }}
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    NULL AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{% if lib.time_series is not none -%}
GROUP BY time_period, time_period_utc
{%- endif -%}
{%- if (lib.data_groupings is not none and (lib.data_groupings | length) > 0) -%}
    {% if lib.time_series is none %}GROUP BY {% endif -%}
    {%- for attribute in lib.data_groupings -%}
    {{ ', ' if lib.time_series is not none and loop.index == 1 else "" }}top_values.grouping_{{ attribute }}
    {%- endfor -%}
{%- endif -%}

SELECT
    COUNT_BIG(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table.[target_column] AS top_value,
            COUNT_BIG(*) AS total_values,
            analyzed_table.[country] AS grouping_level_1,
            analyzed_table.[state] AS grouping_level_2,
            DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1) AS time_period,
            CAST((DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1)) AS DATETIME) AS time_period_utc
        FROM
            [your_sql_server_database].[<target_schema>].[<target_table>] AS analyzed_table
        WHERE (analyzed_table.[target_column] IS NOT NULL)
        GROUP BY analyzed_table.[country], analyzed_table.[state], DATEFROMPARTS(YEAR(CAST(analyzed_table.[date_column] AS date)), MONTH(CAST(analyzed_table.[date_column] AS date)), 1), DATEADD(month, DATEDIFF(month, 0, analyzed_table.[date_column]), 0), analyzed_table.[target_column]
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3GROUP BY time_period, time_period_utc, top_values.grouping_level_1top_values.grouping_level_2

Teradata

Sensor template for TeradataRendered SQL for Teradata

{% import '/dialects/teradata.sql.jinja2' as lib with context -%}
{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            {{ lib.render_target_column('analyzed_table') }} AS top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection('analyzed_table', indentation = '            ') }}
        FROM
            {{ lib.render_target_table() }} AS analyzed_table
        {{- lib.render_where_clause(extra_filter = lib.render_target_column('analyzed_table') ~ ' IS NOT NULL', indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            analyzed_table."target_column" AS top_value,
            COUNT(*) AS total_values,
            analyzed_table."country" AS grouping_level_1,
            analyzed_table."state" AS grouping_level_2,
            TRUNC(CAST(analyzed_table."date_column" AS DATE), 'MM') AS time_period,
            CAST(TRUNC(CAST(analyzed_table."date_column" AS DATE), 'MM') AS TIMESTAMP) AS time_period_utc
        FROM
            "<target_schema>"."<target_table>" AS analyzed_table
        WHERE (analyzed_table."target_column" IS NOT NULL)
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

Trino

Sensor template for TrinoRendered SQL for Trino

{% import '/dialects/trino.sql.jinja2' as lib with context -%}

{%- macro extract_in_list(values_list) -%}
    {%- for i in values_list -%}
        {%- if not loop.last -%}
            {{lib.make_text_constant(i)}}{{", "}}
        {%- else -%}
            {{lib.make_text_constant(i)}}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}

{%- macro render_from_subquery() -%}
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        {% if lib.time_series is not none -%}
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        {% endif -%}
        RANK() OVER({{- render_data_grouping('top_col_values', indentation = ' ', partition_by_enabled=true) }}
            ORDER BY top_col_values.total_values DESC) as top_values_rank {{- render_data_grouping('top_col_values', indentation = ' ') }}
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values
            {{- lib.render_data_grouping_projections_reference('analyzed_table', indentation = '            ') }}
            {{- lib.render_time_dimension_projection_reference('analyzed_table', indentation = '            ') }}
        FROM (
                SELECT
                    original_table.*,
                    {{ lib.render_target_column('original_table') }} AS top_value
                    {{- lib.render_data_grouping_projections('original_table') }}
                    {{- lib.render_time_dimension_projection('original_table') }}
                FROM {{ lib.render_target_table() }} original_table
                {{- lib.render_where_clause(extra_filter = lib.render_target_column('original_table') ~ ' IS NOT NULL', table_alias_prefix='original_table') }}
            ) analyzed_table
        {{- lib.render_where_clause(indentation = '        ') }}
        GROUP BY {{ render_grouping_columns() -}} top_value
        ORDER BY {{ render_grouping_columns() -}} total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= {{ parameters.top }}
{%- endmacro -%}

{% macro render_grouping_columns() %}
    {%- if (lib.data_groupings is not none and (lib.data_groupings | length()) > 0) or lib.time_series is not none -%}
        {{ lib.render_grouping_column_names() }} {{- ', ' -}}
    {%- endif -%}
{% endmacro %}

{%- macro render_data_grouping(table_alias_prefix = '', indentation = '', partition_by_enabled = false) -%}

    {%- if partition_by_enabled == true -%}PARTITION BY
        {%- if lib.time_series is not none -%}
            {{" "}}top_col_values.time_period
        {%- elif lib.data_groupings is none -%}
            {{" "}}NULL
        {%- endif -%}
    {%- endif -%}

    {%- if lib.data_groupings is not none and (lib.data_groupings | length()) > 0 -%}
        {%- for attribute in lib.data_groupings -%}
            {{- "" if loop.first and lib.time_series is none and partition_by_enabled else "," -}}
            {%- with data_grouping_level = lib.data_groupings[attribute] -%}
                {%- if data_grouping_level.source == 'tag' -%}
                    {{ indentation }}{{ lib.make_text_constant(data_grouping_level.tag) }}
                {%- elif data_grouping_level.source == 'column_value' -%}
                    {{ indentation }}{{ table_alias_prefix }}.grouping_{{ attribute }}
                {%- endif -%}
            {%- endwith %}
        {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

SELECT
{%- if 'expected_values' not in parameters or parameters.expected_values|length == 0 %}
    MAX(1 + NULL) AS actual_value,
    MAX(0) AS expected_value
    {{- lib.render_data_grouping_projections('analyzed_table') }}
    {{- lib.render_time_dimension_projection('analyzed_table') }}
FROM {{ lib.render_target_table() }} AS analyzed_table
{%- else %}
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ({{ extract_in_list(parameters.expected_values) }}) THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX({{ parameters.expected_values | length }}) AS expected_value
    {%- if lib.time_series is not none -%} {{- "," }}
    top_values.time_period,
    top_values.time_period_utc
    {%- endif -%}
    {{- render_data_grouping('top_values', indentation = lib.eol() ~ '    ') }}
{{ render_from_subquery() }}
{%- endif -%}
{{- lib.render_group_by() -}}
{{- lib.render_order_by() -}}

SELECT
    COUNT(DISTINCT
        CASE
            WHEN top_values.top_value IN ('USD', 'GBP', 'EUR') THEN top_values.top_value
            ELSE NULL
        END
    ) AS actual_value,
    MAX(3) AS expected_value,
    top_values.time_period,
    top_values.time_period_utc,
    top_values.grouping_level_1,
    top_values.grouping_level_2
FROM
(
    SELECT
        top_col_values.top_value as top_value,
        top_col_values.time_period as time_period,
        top_col_values.time_period_utc as time_period_utc,
        RANK() OVER(PARTITION BY top_col_values.time_period, top_col_values.grouping_level_1, top_col_values.grouping_level_2
            ORDER BY top_col_values.total_values DESC) as top_values_rank, top_col_values.grouping_level_1, top_col_values.grouping_level_2
    FROM
    (
        SELECT
            top_value,
            COUNT(*) AS total_values,

                        analyzed_table.grouping_level_1,

                        analyzed_table.grouping_level_2
,
            time_period,
            time_period_utc
        FROM (
                SELECT
                    original_table.*,
                    original_table."target_column" AS top_value,
    original_table."country" AS grouping_level_1,
    original_table."state" AS grouping_level_2,
    DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS time_period,
    CAST(DATE_TRUNC('MONTH', CAST(original_table."date_column" AS date)) AS TIMESTAMP) AS time_period_utc
                FROM "your_trino_catalog"."<target_schema>"."<target_table>" original_table
WHERE (original_table."target_column" IS NOT NULL)
            ) analyzed_table
        GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc, top_value
        ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc, total_values DESC
    ) AS top_col_values
) AS top_values
WHERE top_values_rank <= 3
GROUP BY grouping_level_1, grouping_level_2, time_period, time_period_utc
ORDER BY grouping_level_1, grouping_level_2, time_period, time_period_utc

What's next

Learn how to configure data quality checks in DQOps
Look at the examples of running data quality checks, targeting tables and columns