Sensors
This page lists the sensors available in DQOps, grouped by category, with a brief description of what each sensor does.
Table sensors
accuracy
Sensor name | Description |
---|---|
total_row_count_match_percent | Table level sensor that calculates the percentage difference between the total row count of the tested table and the total row count of the other (reference) table. |
availability
Sensor name | Description |
---|---|
table_availability | Table availability sensor runs a simple table scan query to detect if the table is queryable. This sensor returns 0.0 when no failure was detected or 1.0 when a failure was detected. |
schema
Sensor name | Description |
---|---|
column_count | Table schema data quality sensor that reads the metadata from a monitored data source and counts the number of columns. |
column_list_ordered_hash | Table schema data quality sensor that detects if the list or order of columns has changed on the table. The sensor calculates a hash of the list of column names. The hash value depends on the names of the columns and the order of the columns. |
column_list_unordered_hash | Table schema data quality sensor that detects if the list of columns has changed on the table. The sensor calculates a hash of the list of column names. The hash value depends on the names of the columns, but not on the order of the columns. |
column_types_hash | Table schema data quality sensor that detects if the list of columns has changed or any of the columns has a new data type, length, scale, precision or nullability. The sensor calculates a hash of the list of column names and all components of each column's type (the type name, length, scale, precision, nullability). The hash value does not depend on the order of the columns. |
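
The difference between the ordered and unordered column list hashes can be illustrated with a minimal Python sketch. The `column_list_hash` helper, the separator character and the use of a truncated SHA-256 digest are assumptions made for this example only; the hash function that DQOps actually uses may differ.

```python
import hashlib


def column_list_hash(column_names, ordered=True):
    """Return a 64-bit integer hash of a list of column names.

    Assumption: a SHA-256 digest truncated to 8 bytes is used only for
    illustration. It is not taken from the DQOps implementation.
    """
    # The unordered variant sorts the names first, so the column order
    # no longer influences the hash.
    names = list(column_names) if ordered else sorted(column_names)
    digest = hashlib.sha256("\x1f".join(names).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


before = ["id", "name", "created_at"]
reordered = ["name", "id", "created_at"]

# Reordering the columns changes the ordered hash...
ordered_changed = column_list_hash(before) != column_list_hash(reordered)
# ...but leaves the unordered hash unchanged.
unordered_changed = (
    column_list_hash(before, ordered=False)
    != column_list_hash(reordered, ordered=False)
)
```

Comparing the hash captured on the previous run with the current one is enough to detect a change, without storing the full column list.
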
sql
Sensor name | Description |
---|---|
sql_aggregated_expression | Table level sensor that executes a given SQL expression on a table. |
sql_condition_failed_count | Table level sensor that uses a custom SQL condition (an SQL expression that returns a boolean value) to count rows that do not meet the condition. |
sql_condition_failed_percent | Table level sensor that uses a custom SQL condition (an SQL expression that returns a boolean value) to count the percentage of rows that do not meet the condition. |
sql_condition_passed_count | Table level sensor that uses a custom SQL condition (an SQL expression that returns a boolean value) to count rows that meet the condition. |
sql_condition_passed_percent | Table level sensor that uses a custom SQL condition (an SQL expression that returns a boolean value) to count the percentage of rows that meet the condition. |
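
As a rough illustration of how the four SQL condition sensors relate to each other, the sketch below evaluates one boolean condition against an in-memory SQLite table. The `orders` table, its data and the `amount > 0` condition are hypothetical, and the query is a simplified stand-in rather than the SQL that DQOps renders for a real data source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, -5.0), (3, 0.0), (4, 25.0)],
)

# Hypothetical custom SQL condition: an expression that returns a boolean.
condition = "amount > 0"

query = f"""
SELECT
    SUM(CASE WHEN {condition} THEN 1 ELSE 0 END) AS passed_count,
    SUM(CASE WHEN {condition} THEN 0 ELSE 1 END) AS failed_count,
    COUNT(*) AS total_count
FROM orders
"""
passed_count, failed_count, total_count = conn.execute(query).fetchone()

# The percent variants divide the counts by the total number of rows.
passed_percent = 100.0 * passed_count / total_count
failed_percent = 100.0 * failed_count / total_count
conn.close()
```

The sample data contains no null values, so this sketch does not show how rows with a null condition result are treated.
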
timeliness
Sensor name | Description |
---|---|
data_freshness | Table sensor that runs a query calculating the maximum number of days since the most recent event. |
data_ingestion_delay | Table sensor that runs a query calculating the time difference in days between the most recent transaction timestamp and the most recent data loading timestamp. |
data_staleness | Table sensor that runs a query calculating the time difference in days between the current date and the most recent data loading timestamp (staleness). |
partition_reload_lag | Table sensor that runs a query calculating the maximum difference in days between the ingestion timestamp and the event timestamp of the rows. |
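
The four timeliness measures differ only in which timestamps they compare. The following Python sketch computes them for a handful of rows, assuming each row carries an event timestamp and an ingestion (data loading) timestamp. The sample rows, the fixed `now` value and the `days_between` helper are hypothetical.

```python
from datetime import datetime


def days_between(later, earlier):
    """Return the difference between two timestamps in fractional days."""
    return (later - earlier).total_seconds() / 86400.0


# Hypothetical rows: (event timestamp, ingestion timestamp).
rows = [
    (datetime(2024, 1, 5, 0, 0), datetime(2024, 1, 7, 0, 0)),
    (datetime(2024, 1, 7, 0, 0), datetime(2024, 1, 8, 12, 0)),
]
now = datetime(2024, 1, 10, 0, 0)  # fixed "current time" for reproducibility

most_recent_event = max(event for event, _ in rows)
most_recent_ingestion = max(ingestion for _, ingestion in rows)

# Days since the most recent event.
data_freshness = days_between(now, most_recent_event)
# Days since the most recent data load.
data_staleness = days_between(now, most_recent_ingestion)
# Gap between the most recent event and the most recent data load.
data_ingestion_delay = days_between(most_recent_ingestion, most_recent_event)
# Largest per-row gap between ingestion and event timestamps.
partition_reload_lag = max(
    days_between(ingestion, event) for event, ingestion in rows
)
```
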
volume
Sensor name | Description |
---|---|
row_count | Table sensor that executes a row count query. |
Column sensors
accuracy
Sensor name | Description |
---|---|
total_average_match_percent | Column level sensor that calculates the percentage difference between the average of a column in a table and the average of a column in another table. |
total_max_match_percent | Column level sensor that calculates the percentage difference between the maximum value of a column in a table and the maximum value of a column in another table. |
total_min_match_percent | Column level sensor that calculates the percentage difference between the minimum value of a column in a table and the minimum value of a column in another table. |
total_not_null_count_match_percent | Column level sensor that calculates the percentage difference between the count of not null values of a column in a table and the count of not null values of a column in another table. |
total_sum_match_percent | Column level sensor that calculates the percentage difference between the sum of a column in a table and the sum of a column in another table. |
bool
Sensor name | Description |
---|---|
false_percent | Column level sensor that calculates the percentage of rows with a false value in a column. |
true_percent | Column level sensor that calculates the percentage of rows with a true value in a column. |
datatype
Sensor name | Description |
---|---|
string_datatype_detect | Column level sensor that analyzes all values in a text column and detects the data type of the values. The sensor returns a value that identifies the detected data type of the column: 1 - integers, 2 - floats, 3 - dates, 4 - timestamps, 5 - booleans, 6 - strings, 7 - mixed data types. |
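
The result codes of the data type detection can be illustrated with a simplified Python sketch. The patterns, the accepted date and timestamp formats, and the rule that any mix of detected types yields code 7 are assumptions made for this example; the actual detection rules in DQOps are likely to be broader.

```python
import re
from datetime import datetime


def _parses_as(text, fmt):
    """Return True when the text matches the given strptime format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def classify_value(text):
    """Map a single text value to a data type code (1-6)."""
    if re.fullmatch(r"[+-]?\d+", text):
        return 1  # integer
    if re.fullmatch(r"[+-]?\d+\.\d+", text):
        return 2  # float
    if _parses_as(text, "%Y-%m-%d"):
        return 3  # date (only ISO dates in this sketch)
    if _parses_as(text, "%Y-%m-%d %H:%M:%S"):
        return 4  # timestamp
    if text.lower() in {"true", "false"}:
        return 5  # boolean
    return 6  # any other string


def detect_datatype(values):
    """Return the code shared by all non-null values, or 7 when they differ."""
    codes = {classify_value(v) for v in values if v is not None}
    if not codes:
        return None  # nothing to analyze
    return codes.pop() if len(codes) == 1 else 7


all_integers = detect_datatype(["1", "2", "-3"])
mixed = detect_datatype(["1", "abc"])
```
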
datetime
Sensor name | Description |
---|---|
date_match_format_percent | Column level sensor that calculates the percentage of values that match a given date regex in a column. |
date_values_in_future_percent | Column level sensor that calculates the percentage of rows with a date value in the future, compared with the current date. |
value_in_range_date_percent | Column level sensor that calculates the percentage of date values in a column that fall within a given date range. |
integrity
Sensor name | Description |
---|---|
foreign_key_match_percent | Column level sensor that calculates the percentage of values that match values in a column of another table. |
foreign_key_not_match_count | Column level sensor that calculates the count of values that do not match values in a column of another table. |
nulls
Sensor name | Description |
---|---|
not_null_count | Column level sensor that calculates the number of rows with not null values. |
not_null_percent | Column level sensor that calculates the percentage of not null values in a column. |
null_count | Column level sensor that calculates the number of rows with null values. |
null_percent | Column level sensor that calculates the percentage of rows with null values. |
numeric
Sensor name | Description |
---|---|
expected_numbers_in_use_count | Column level sensor that counts how many expected numeric values are used in a tested column. Finds unique column values from the set of expected numeric values and counts them. This sensor is useful for analyzing numeric columns that have a low number of unique values, when it should be tested whether all possible values from the list of expected values are used in any row. The typical types of tested columns are numeric status or type columns. |
invalid_latitude_count | Column level sensor that counts invalid latitude values in a column. |
invalid_longitude_count | Column level sensor that counts invalid longitude values in a column. |
mean | Column level sensor that calculates the average (mean) of values in a column. |
negative_count | Column level sensor that counts negative values in a column. |
negative_percent | Column level sensor that calculates the percentage of negative values in a column. |
non_negative_count | Column level sensor that counts non-negative values in a column. |
non_negative_percent | Column level sensor that calculates the percentage of non-negative values in a column. |
number_value_in_set_percent | Column level sensor that calculates the percentage of rows for which the tested numeric column contains a value from the list of expected values. Rows with null values are also counted as passing (the sensor assumes that a 'null' is also an expected and accepted value). This sensor is useful for checking that numeric columns that store numeric codes (such as status codes) contain only values from a set of expected values. |
percentile | Column level sensor that finds a percentile value (such as the median) in a given column. |
population_stddev | Column level sensor that calculates population standard deviation in a given column. |
population_variance | Column level sensor that calculates population variance in a given column. |
sample_stddev | Column level sensor that calculates sample standard deviation in a given column. |
sample_variance | Column level sensor that calculates sample variance in a given column. |
sum | Column level sensor that calculates the sum of values in a column. |
valid_latitude_percent | Column level sensor that calculates the percentage of valid latitude values in a column. |
valid_longitude_percent | Column level sensor that calculates the percentage of valid longitude values in a column. |
value_above_max_value_count | Column level sensor that calculates the count of values that are above a given value in a column. |
value_above_max_value_percent | Column level sensor that calculates the percentage of values that are above a given value in a column. |
value_below_min_value_count | Column level sensor that calculates the count of values that are below a given value in a column. |
value_below_min_value_percent | Column level sensor that calculates the percentage of values that are below a given value in a column. |
values_in_range_integers_percent | Column level sensor that calculates the percentage of integer values in a column that fall within a given range. |
values_in_range_numeric_percent | Column level sensor that calculates the percentage of numeric values in a column that fall within a given range. |
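
The two sensors in this table that work with a list of expected values, expected_numbers_in_use_count and number_value_in_set_percent, answer different questions. A minimal Python sketch of both follows, operating on an in-memory list instead of a database column. The function bodies and the sample `status_codes` data are illustrative assumptions, not the DQOps implementation.

```python
def expected_numbers_in_use_count(values, expected_values):
    """Count how many of the expected values occur at least once."""
    used = {v for v in values if v is not None}
    return len(used & set(expected_values))


def number_value_in_set_percent(values, expected_values):
    """Percentage of rows whose value is expected; nulls count as passing."""
    if not values:
        return None  # no rows to evaluate
    expected = set(expected_values)
    passing = sum(1 for v in values if v is None or v in expected)
    return 100.0 * passing / len(values)


# Hypothetical status code column with one null and one unexpected value.
status_codes = [1, 2, 2, None, 5]
expected = [1, 2, 3]

in_use = expected_numbers_in_use_count(status_codes, expected)
in_set_percent = number_value_in_set_percent(status_codes, expected)
```

In this sample, only two of the three expected codes are in use, while four of the five rows (including the null) pass the set membership test.
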
pii
Sensor name | Description |
---|---|
contains_email_percent | Column level sensor that calculates the percentage of rows with a valid email value in a column. |
contains_ip4_percent | Column level sensor that calculates the percentage of rows with a valid IPv4 address value in a column. |
contains_ip6_percent | Column level sensor that calculates the percentage of rows with a valid IPv6 address value in a column. |
contains_usa_phone_percent | Column level sensor that calculates the percentage of values that contain a USA phone number in a column. |
contains_usa_zipcode_percent | Column level sensor that calculates the percentage of values that contain a USA ZIP code in a column. |
range
Sensor name | Description |
---|---|
max_value | Column level sensor that finds the maximum value in a column. |
min_value | Column level sensor that finds the minimum value in a column. |
sampling
Sensor name | Description |
---|---|
column_samples | Column level sensor that retrieves samples of column values. Column value sampling is used in profiling and in capturing error samples for failed data quality checks. |
schema
Sensor name | Description |
---|---|
column_exists | Column level data quality sensor that reads the metadata of the table from the data source and checks if the column name exists on the table. Returns 1.0 when the column was found, 0.0 when the column is missing. |
column_type_hash | Column level data quality sensor that reads the metadata of the table from the data source and calculates a hash of the detected data type (also including the length, scale and precision) of the target column. Returns a 15-16 decimal digit hash of the column data type. |
sql
Sensor name | Description |
---|---|
sql_aggregated_expression | Column level sensor that executes a given SQL expression on a column. |
sql_condition_failed_count | Column level sensor that uses a custom SQL condition (an SQL expression that returns a boolean value) to count rows that do not meet the condition. |
sql_condition_failed_percent | Column level sensor that uses a custom SQL condition (an SQL expression that returns a boolean value) to count the percentage of rows that do not meet the condition. |
sql_condition_passed_count | Column level sensor that uses a custom SQL condition (an SQL expression that returns a boolean value) to count rows that meet the condition. |
sql_condition_passed_percent | Column level sensor that uses a custom SQL condition (an SQL expression that returns a boolean value) to count the percentage of rows that meet the condition. |
strings
Sensor name | Description |
---|---|
expected_strings_in_top_values_count | Column level sensor that counts how many expected string values are among the TOP most popular values in the column. The sensor first counts the number of occurrences of each value in the column and picks the TOP X most popular values (configurable by the 'top' parameter). Then, it compares the list of most popular values to the given list of expected values that should be most popular. The sensor returns the number of expected values that were found within the 'top' most popular column values. This sensor is useful for analyzing string columns that have several very popular values, such as the country codes of the countries with the largest number of customers. The sensor can detect if any of the most popular values (an expected value) is no longer one of the top X most popular values. |
expected_strings_in_use_count | Column level sensor that counts how many expected string values are used in a tested column. Finds unique column values from the set of expected string values and counts them. This sensor is useful for analyzing string columns that have a low number of unique values, when it should be tested whether all possible values from the list of expected values are used in any row. The typical types of columns analyzed using this sensor are currency, country, status or gender columns. |
string_boolean_placeholder_percent | Column level sensor that calculates the percentage of rows with a boolean placeholder string column value. |
string_empty_count | Column level sensor that calculates the number of rows with an empty string. |
string_empty_percent | Column level sensor that calculates the percentage of rows with an empty string. |
string_invalid_email_count | Column level sensor that calculates the number of rows with an invalid email value in a column. |
string_invalid_ip4_address_count | Column level sensor that calculates the number of rows with an invalid IPv4 address value in a column. |
string_invalid_ip6_address_count | Column level sensor that calculates the number of rows with an invalid IPv6 address value in a column. |
string_invalid_uuid_count | Column level sensor that calculates the number of rows with an invalid UUID value in a column. |
string_length_above_max_length_count | Column level sensor that calculates the count of values that are longer than a given length in a column. |
string_length_above_max_length_percent | Column level sensor that calculates the percentage of values that are longer than a given length in a column. |
string_length_below_min_length_count | Column level sensor that calculates the count of values that are shorter than a given length in a column. |
string_length_below_min_length_percent | Column level sensor that calculates the percentage of values that are shorter than a given length in a column. |
string_length_in_range_percent | Column level sensor that calculates the percentage of strings with a length within the indicated range in a column. |
string_match_date_regex_percent | Column level sensor that calculates the percentage of values that match a given date regex in a column. |
string_match_name_regex_percent | Column level sensor that calculates the percentage of values that match a given name regex in a column. |
string_match_regex_percent | Column level sensor that calculates the percentage of values that match a regex in a column. |
string_max_length | Column level sensor that calculates the maximum length of the strings in a column. |
string_mean_length | Column level sensor that calculates the mean length of the strings in a column. |
string_min_length | Column level sensor that calculates the minimum length of the strings in a column. |
string_not_match_date_regex_count | Column level sensor that calculates the number of values that do not match a date regex in a column. |
string_not_match_regex_count | Column level sensor that calculates the number of values that do not match a regex in a column. |
string_null_placeholder_count | Column level sensor that calculates the number of rows with a null placeholder string value in a column. |
string_null_placeholder_percent | Column level sensor that calculates the percentage of rows with a null placeholder string value in a column. |
string_parsable_to_float_percent | Column level sensor that calculates the percentage of rows with a string value that can be parsed to a float. |
string_parsable_to_integer_percent | Column level sensor that calculates the percentage of rows with a string value that can be parsed to an integer. |
string_surrounded_by_whitespace_count | Column level sensor that calculates the number of rows with a string value surrounded by whitespace. |
string_surrounded_by_whitespace_percent | Column level sensor that calculates the percentage of rows with a string value surrounded by whitespace. |
string_valid_country_code_percent | Column level sensor that calculates the percentage of rows with a valid country code string value in a column. |
string_valid_currency_code_percent | Column level sensor that calculates the percentage of rows with a valid currency code string value in a column. |
string_valid_date_percent | Column level sensor that calculates the percentage of valid dates in a monitored column. |
string_valid_uuid_percent | Column level sensor that calculates the percentage of rows with a valid UUID value in a column. |
string_value_in_set_percent | Column level sensor that calculates the percentage of rows for which the tested string (text) column contains a value from the list of expected values. Rows with null values are also counted as passing (the sensor assumes that a 'null' is also an expected and accepted value). This sensor is useful for testing that a string column with a low number of unique values (country, currency, state, gender, etc.) contains only values from a set of expected values. |
string_whitespace_count | Column level sensor that calculates the number of rows with a whitespace string column value. |
string_whitespace_percent | Column level sensor that calculates the percentage of rows with a whitespace string column value. |
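
The steps described for expected_strings_in_top_values_count in the table above (count occurrences, pick the top X values, compare them with the expected values) can be sketched in Python as follows. The function body and the sample `country_codes` data are illustrative assumptions, and the sketch ignores null values and does not define any tie-breaking rule.

```python
from collections import Counter


def expected_strings_in_top_values_count(values, expected_values, top):
    """Count expected values found among the `top` most frequent values."""
    # Step 1: count occurrences of each non-null value.
    counts = Counter(v for v in values if v is not None)
    # Step 2: pick the `top` most popular values.
    top_values = {value for value, _ in counts.most_common(top)}
    # Step 3: compare them with the list of expected values.
    return len(top_values & set(expected_values))


# Hypothetical country code column: US x5, DE x3, FR x2, PL x1.
country_codes = ["US"] * 5 + ["DE"] * 3 + ["FR"] * 2 + ["PL"]

found = expected_strings_in_top_values_count(
    country_codes, expected_values=["US", "DE", "PL"], top=3
)
```

Here the three most popular values are US, DE and FR, so only two of the three expected values are found, which signals that PL is no longer among the top values.
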
uniqueness
Sensor name | Description |
---|---|
distinct_count | Column level sensor that calculates the number of unique non-null values. |
distinct_percent | Column level sensor that calculates the percentage of unique values in a column. |
duplicate_count | Column level sensor that calculates the number of duplicate values in a given column. |
duplicate_percent | Column level sensor that calculates the percentage of rows that are duplicates. |
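
One possible reading of the four uniqueness measures is sketched below in Python. The definitions are assumptions made for this example: null values are ignored, a duplicate is every non-null value beyond the first occurrence of each distinct value, and both percentages use the number of non-null values as the denominator. The exact definitions used by DQOps may differ.

```python
def uniqueness_metrics(values):
    """Return distinct and duplicate counts and percentages for a column."""
    non_null = [v for v in values if v is not None]
    distinct_count = len(set(non_null))
    # Assumption: every repeated occurrence after the first is a duplicate.
    duplicate_count = len(non_null) - distinct_count
    if not non_null:
        return distinct_count, duplicate_count, None, None
    distinct_percent = 100.0 * distinct_count / len(non_null)
    duplicate_percent = 100.0 * duplicate_count / len(non_null)
    return distinct_count, duplicate_count, distinct_percent, duplicate_percent


# Hypothetical column values: "a" appears three times, plus one null.
metrics = uniqueness_metrics(["a", "b", "a", "c", "a", None])
```
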