Rules

In DQO, the data quality rule and the data quality sensor form the data quality check.

A rule is a set of conditions against which sensor readouts are verified, described by a list of thresholds. A basic rule can simply score the most recent data quality result by checking whether the value is above or below a particular threshold, or within an expected range.

Rules evaluate sensor results and assign them severity levels. There are three severity levels in DQO: warning, error, and fatal.

Example of rule

A standard data quality check on a table that counts the number of rows uses a simple "min_count" rule. For example, when the threshold for the error severity level is set to 10 and the table has fewer than 10 rows, a data quality error is raised.
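The relationship between a row count, the min_count thresholds, and the three severity levels can be sketched as follows. This is an illustration only, not DQO's rule engine: the function name `classify_row_count`, the `THRESHOLDS` dictionary, and the threshold values are all hypothetical, and the sketch assumes that the most severe failing level is the one reported.

```python
from typing import Dict, Optional

# Hypothetical min_count thresholds per severity level (illustrative values only).
THRESHOLDS: Dict[str, float] = {"warning": 100, "error": 10, "fatal": 1}


def classify_row_count(row_count: float, thresholds: Dict[str, float]) -> Optional[str]:
    """Return the most severe level whose min_count is not met, or None if all pass."""
    # Check from the most severe level down, so the worst failure is reported.
    for severity in ("fatal", "error", "warning"):
        min_count = thresholds.get(severity)
        if min_count is not None and row_count < min_count:
            return severity
    return None


# A table with 5 rows is below the error threshold (10) but not the fatal one (1).
print(classify_row_count(5, THRESHOLDS))
```

A level with no configured threshold is skipped, so a check that sets only the error threshold never reports a warning or a fatal result in this sketch.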

Below is an example of a Python script that defines the classes and methods for the min_count threshold rule.

min_count.py
from datetime import datetime
from typing import Sequence

# Class that specifies the minimum count parameter for the rule.
class MinCountRuleParametersSpec:
    min_count: float

# Class that represents a historical data point from the sensor.
class HistoricDataPoint:
    timestamp_utc: datetime
    local_datetime: datetime
    back_periods_index: int
    sensor_readout: float

# Class that specifies the time window settings for the rule.
class RuleTimeWindowSettingsSpec:
    prediction_time_window: int
    min_periods_with_readouts: int


# Class that specifies the parameters for running the rule.
class RuleExecutionRunParameters:
    actual_value: float
    parameters: MinCountRuleParametersSpec
    time_period_local: datetime
    previous_readouts: Sequence[HistoricDataPoint]
    time_window: RuleTimeWindowSettingsSpec


# Class that specifies the result of running the rule.
class RuleExecutionResult:
    passed: bool
    expected_value: float
    lower_bound: float
    upper_bound: float

    def __init__(self, passed=True, expected_value=None, lower_bound=None, upper_bound=None):
        self.passed = passed
        self.expected_value = expected_value
        self.lower_bound = lower_bound
        self.upper_bound = upper_bound


# Function that evaluates the rule based on the parameters passed in a RuleExecutionRunParameters object.
def evaluate_rule(rule_parameters: RuleExecutionRunParameters) -> RuleExecutionResult:
    if not hasattr(rule_parameters, 'actual_value'):
        return RuleExecutionResult()

    expected_value = None
    lower_bound = rule_parameters.parameters.min_count
    upper_bound = None
    passed = rule_parameters.actual_value >= lower_bound

    return RuleExecutionResult(passed, expected_value, lower_bound, upper_bound)
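To see how the rule behaves at the threshold, the sketch below calls `evaluate_rule` with two readouts. So that it runs on its own, it restates the function with trimmed stand-in classes that keep only the fields the function reads. The dataclass definitions and the sample values (7 and 10 rows) are assumptions made for illustration, not part of min_count.py.

```python
from dataclasses import dataclass
from typing import Optional


# Trimmed stand-ins for the classes in min_count.py (assumption: only the
# fields that evaluate_rule actually reads are kept).
@dataclass
class MinCountRuleParametersSpec:
    min_count: float


@dataclass
class RuleExecutionRunParameters:
    actual_value: float
    parameters: MinCountRuleParametersSpec


@dataclass
class RuleExecutionResult:
    passed: bool = True
    expected_value: Optional[float] = None
    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None


def evaluate_rule(rule_parameters: RuleExecutionRunParameters) -> RuleExecutionResult:
    # Same logic as the listing above: pass when the readout reaches min_count.
    if not hasattr(rule_parameters, 'actual_value'):
        return RuleExecutionResult()
    lower_bound = rule_parameters.parameters.min_count
    passed = rule_parameters.actual_value >= lower_bound
    return RuleExecutionResult(passed, None, lower_bound, None)


# A table with 7 rows checked against min_count = 10 fails the rule.
failing = evaluate_rule(RuleExecutionRunParameters(7, MinCountRuleParametersSpec(10)))
# A table with exactly 10 rows passes, because the comparison is inclusive (>=).
passing = evaluate_rule(RuleExecutionRunParameters(10, MinCountRuleParametersSpec(10)))

print(failing.passed, failing.lower_bound)
print(passing.passed, passing.lower_bound)
```

The result carries only a lower bound. A minimum-count rule has no single expected value and no upper limit, so both stay `None`.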

Rule categories

Rules are divided into the following categories. A full description of each category and subcategory of rules is available at the link.