Skip to content

Last updated: July 22, 2025

How to detect empty and incomplete columns with examples

Read this guide to learn how to detect empty columns or incomplete columns containing too many null values in a dataset.

The data quality checks that detect empty and incomplete columns are configured in the nulls category in DQOps.

Issues with empty and incomplete columns

DQOps categorizes data quality issues with empty and incomplete columns in the Completeness data quality dimension.

Types of completeness issues

We identify three types of data completeness issues related to the null values in a dataset.

  • Incomplete columns that contain a few null values.

  • Probably incomplete columns that are expected to contain some null values, but the measured percentage of null values is higher than anticipated.

  • Empty columns that have no values. Empty columns were most likely defined to store some information that was never provided.

  • Inconsistently incomplete columns whose percentage of null values changes over time. DQOps detects these types of issues using time series anomaly detection.

The causes of completeness issues

Null values appear in the dataset for several reasons.

  • A human error during a data entry leaves a required field empty.

  • The field is optional, and the process followed by users does not require entering the information.

  • A bug in the data entry form skipped the validation of required fields.

  • A bug in the data transformation or mapping code did not pass the value of a required field downstream.

  • The field was required, but the restriction was lifted later in time

Problems caused by incomplete columns

Null values can cause problems in several data analytics and data transformation areas.

  • Dashboards will show lower numbers because null values are not included in calculations.

  • Some SQL queries that use filters may fail to return any records when a column has any null values. It is caused by Three-Valued Logic used by SQL language to compare values to a null value. For example, an SQL filter: WHERE product_id NOT IN (SELECT nullable_product_id FROM table_with_nulls) will stop working due to a comparison to a null value. This query type is a particular case that forces any SQL-compliant database engine to return no rows. These types of data quality issues are hard to find, especially when a Business Intelligence engine generates the queries.

Incomplete columns

We say that a column is incomplete when it contains some null values. The following example shows the data profiling statistics of a column with over 16% of null values.

Column with null values profiling statistics

Detect incomplete columns with UI

DQOps uses a nulls_count data quality check to count null values. It raises a data quality issue when any null values are found.

The default value of the max_count parameter is 0, which asserts that no null values are present.

Detect incomplete columns with some null values using a data quality check

Incomplete columns error sampling in UI

To assist with identifying the root cause of errors and cleaning up the data, DQOps offers error sampling for this check. You can view representative examples of data that do not meet the specified data quality criteria by clicking on the Error sampling tab in the results section.

Detect incomplete columns with some null values using a data quality check - error sampling

For additional information about error sampling, please refer to the Data Quality Error Sampling documentation.

Detect incomplete columns in YAML

The following example shows a nulls_count check configured in a YAML file.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  columns:
    street_number:
      type_snapshot:
        column_type: STRING
        nullable: true
      monitoring_checks:
        daily:
          nulls:
            daily_nulls_count:
              error:
                max_count: 0

Partially incomplete columns

A partially incomplete column is nullable, and it is acceptable to store some null values in the column, but the column contains too many null values.

We can define a minimum completeness level in many ways:

  • A maximum number of rows containing a null value in the column. We can use the nulls_count data quality check with a higher value of the max_count parameter.

  • A maximum percentage of null values. DQOps has a nulls_percent data quality check for that purpose.

  • A minimum number of rows that must have a non-null value. DQOps has a dedicated data quality check not_nulls_count that also detects empty columns.

  • Or a minimum percentage of non-null values in a column. The data quality check not_nulls_percent supports this case.

Detect partially incomplete column in UI

The nulls_percent check measures the percentage of null values in a column. DQOps supports configuring multiple issue severity levels (warning, error, and fatal) by using a different threshold values.

You can set up multiple thresholds in the advanced Check editor mode. To switch to the Advanced mode, choose the Multiple levels option from the Issue severity level dropdown menu in the simplified Check editor. You can find more information about the Check editor modes in the DQOps user interface overview section.

The following example raises a warning severity issue when the percent of the null value is above 16%. An issue at an error severity level is raised when the percent of null values exceeds 20%.

Detect incomplete columns with a minimum accepted percentage of nulls

Detect partially incomplete column in YAML

The configuration of the nulls_percent check is straightforward in YAML.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  columns:
    street_number:
      type_snapshot:
        column_type: STRING
        nullable: true
      monitoring_checks:
        daily:
          nulls:
            daily_nulls_percent:
              warning:
                max_percent: 16.0
              error:
                max_percent: 20.0 

Detect empty columns

We say a column is empty when it has no values and all rows contain only nulls.

DQOps detects empty columns using the not_nulls_count data quality check with the default configuration.

The not_nulls_count check has a rule parameter min_count that verifies a minimum number of rows containing a value. The default value is 1 row, which finds empty columns not passing that limit.

Detect empty columns in UI

The not_nulls_count check configured with the default settings finds empty columns. The following screen shows a valid column that was not empty.

Detect empty columns in tables with a data quality check

Detect empty columns in UI

The configuration of the not_nulls_count that detects empty columns is shown below.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  columns:
    street_number:
      type_snapshot:
        column_type: STRING
        nullable: true
      monitoring_checks:
        daily:
          nulls:
            daily_not_nulls_count:
              error:
                min_count: 1

Detect a minimum number of non-null values

The configuration of the not_nulls_count check is easy to adapt to detect columns that should have at least a given number of non-null values.

The minimum accepted number of non-null values is configured by setting the min_count parameter to a desired count.

Detect a minimum number of non-null values in UI

The following example shows how to assert that a column contains at least 1 700 000 non-null values.

Detect columns with too little non-null values in a column

Detect a minimum number of non-null values in YAML

The configuration of the not_nulls_count check in a YAML file only uses a different value of the min_count parameter.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  columns:
    street_number:
      type_snapshot:
        column_type: STRING
        nullable: true
      monitoring_checks:
        daily:
          nulls:
            daily_not_nulls_count:
              error:
                min_count: 1700000

Anomalies of data completeness

An unexpected change in the percentage of null value is a noticeable data anomaly. DQOps uses time series anomaly detection to identify these issues. The nulls_percent_anomaly check measures the percentage of null values for each day and raises data quality issues for anomalies.

The nulls_percent_anomaly check supports two methods of operation.

  • A daily monitoring check measures the percentage of all rows in a monitored table that contain null values.

  • A daily partition check analyzes every daily partition. The check raises a data quality issue when the percentage of null values between daily partitions changes. It can happen when the transformation logic in the data pipeline was recently modified, and an invalid transformation has problems with data conversion.

Configuring completeness anomaly detection in UI

The following sample shows how to configure the daily_partition_nulls_percent_anomaly check for detecting null anomalies across daily partitions. The configuration of the daily_nulls_percent_anomaly check that monitors the whole table every day is the same, but the daily_nulls_percent_anomaly check requires 30 days of monitoring before it will show any results.

As shown in the example below, we observed a significant increase in percentage of null values on August 4th,

Detect anomalies in the percentage of null values in a column

Configuring completeness anomaly detection in YAML

The nulls_percent_anomaly check only requires the configuration of the anomaly_percent parameters for each issue severity level.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  timestamp_columns:
    partition_by_column: created_date
  incremental_time_window:
    daily_partitioning_recent_days: 7
    monthly_partitioning_recent_months: 1
  columns:
    street_number:
      type_snapshot:
        column_type: STRING
        nullable: true
      partitioned_checks:
        daily:
          nulls:
            daily_partition_nulls_percent_anomaly:
              warning:
                anomaly_percent: 3.0
              error:
                anomaly_percent: 0.5

Completeness data quality issues

The data quality dashboards for monitoring null values are found in the Data Quality Dimensions -> Completeness folder.

Whole table monitoring dashboards

The Current completeness issues on columns dashboard shows an aggregated view of active data quality issues with empty or incomplete columns.

This dashboard shows only the status of the most recent evaluation of all data quality checks for null values. When the data is fixed in the data source, and the failed data quality check is rerun, the issue will disappear from the dashboard.

Data quality dashboard showing empty and incomplete detected by data quality checks

Partition monitoring dashboards

DQOps has a separate set of data quality dashboards for partitioned data. These dashboards are found in the "Partitions" folder. They show issues for every daily or monthly partition.

The top section of the partition's *Current completeness issues on columns" dashboard shows the data sources, affected tables, and the types of completeness issues.

Partitions with nulls shown on a dashboard - the filters

The next section shows more details about incomplete or empty columns. The status identifies the highest severity issue by color.

List of columns in a partitioned table that have incomplete data shown on a data quality dashboard

Use cases

Name of the example Description
Detect incomplete columns This example shows how to incomplete columns that have too many null values using the nulls_count check.

List of nulls checks at a column level

Data quality check name Friendly name Data quality dimension Description Standard check
nulls_count Maximum count of rows containing null values (incomplete column) Completeness Detects incomplete columns that contain any null values. Counts the number of rows having a null value. Raises a data quality issue when the count of null values is above a max_count threshold.
nulls_percent Maximum percentage of rows containing null values (incomplete column) Completeness Detects incomplete columns that contain any null values. Measures the percentage of rows having a null value. Raises a data quality issue when the percentage of null values is above a max_percent threshold. Configure this check to accept a given percentage of null values by setting the max_percent parameter.
nulls_percent_anomaly Abnormal change in percentage of null values. Measured as a percentile of anomalous measures. Completeness Detects day-to-day anomalies in the percentage of null values. Measures the percentage of rows having a null value. Raises a data quality issue when the rate of null values increases or decreases too much.
not_nulls_count Minimum count of rows containing non-null values Completeness Verifies that a column contains a minimum number of non-null values. The default value of the min_count parameter is 1 to detect at least one value in a monitored column.
not_nulls_percent Maximum percentage of rows containing non-null values Completeness Verifies that a column contains some null values by measuring the maximum percentage of rows that have non-null values. Raises a data quality issue when the percentage of non-null values is above max_percentage, which means that a column that is expected to have null values is The default value of the max_percentage parameter is 0.0, but DQOps supports setting a higher value to verify that the percentage of null values is not above a threshold.
empty_column_found Find an empty column Completeness Detects empty columns that contain only null values. Counts the number of rows that have non-null values. Raises a data quality issue when the count of non-null values is below min_count. The default value of the min_count parameter is 1, but DQOps supports setting a higher number to assert that a column has at least that many non-null values.
nulls_percent_change Maximum percentage of change in the count of null values Completeness Detects relative increases or decreases in the percentage of null values since the last measured percentage. Measures the percentage of null values for each day. Raises a data quality issue when the change in the percentage of null values is above max_percent of the previous percentage.
nulls_percent_change_1_day Maximum percentage of change in the count of null values vs 1 day ago Completeness Detects relative increases or decreases in the percentage of null values since the previous day. Measures the percentage of null values for each day. Raises a data quality issue when the change in the percentage of null values is above max_percent of the previous percentage.
nulls_percent_change_7_days Maximum percentage of change in the count of null values vs 7 day ago Completeness Detects relative increases or decreases in the percentage of null values since the last week (seven days ago). Measures the percentage of null values for each day. Raises a data quality issue when the change in the percentage of null values is above max_percent of the previous percentage.
nulls_percent_change_30_days Maximum percentage of change in the count of null values vs 30 day ago Completeness Detects relative increases or decreases in the percentage of null values since the last month (30 days ago). Measures the percentage of null values for each day. Raises a data quality issue when the change in the percentage of null values is above max_percent of the previous percentage.

Reference and samples

The full list of all data quality checks in this category is located in the column/nulls reference. The reference section provides YAML code samples that are ready to copy-paste to the .dqotable.yaml files, the parameters reference, and samples of data source specific SQL queries generated by data quality sensors that are used by those checks.

What's next