Data Observability for Data Operations

Detect potential issues with data pipelines

How often has invalid data in one table spread across multiple downstream tables, leaving a full refresh as the only fix?

Data Observability is the process of observing the Data Quality metrics of all source and target tables. It serves two goals: detecting Data Quality issues in source tables, and ensuring that the data pipeline has generated target tables that meet the requirements.

Source Data Quality Rules

Monitor the Data Quality rules for source data in one place. Detect issues and instability in data sources before they affect the whole Data Warehouse or Data Lake.

DQO.ai stores the Data Quality definitions for tables as simple YAML files. All Data Quality rules for a source table can be edited in one place, using code completion in the most popular text editors. Just copy a Data Quality definition file and make small changes to monitor the quality of another, similar table (a minimal example follows the list below).

  • Data Quality rules for source tables are easy to define
  • All Data Quality rules for all source tables may be defined in the same way
  • Adding new tables to be observed is as simple as copying a YAML file
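
A minimal sketch of such a definition file is shown below. The table name, check names, and field structure are illustrative assumptions for this example, not the actual DQO.ai file format:

  # Illustrative sketch only: check and field names are assumptions, not the real DQO.ai schema.
  kind: table
  spec:
    target: landing_zone.customer_raw      # hypothetical source table
    checks:
      row_count:
        min_count: 1000                    # alert when a daily load is suspiciously small
    columns:
      customer_id:
        checks:
          nulls_percent:
            max_percent: 0.0               # the key column must never be null
      email:
        checks:
          invalid_email_percent:
            max_percent: 2.0               # tolerate up to 2% malformed emails

To observe another, similar table, copy the file and change the target table name.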

All downstream tables always correct

Detect data transformation issues when the data pipeline generates a target table that does not meet the Data Quality requirements.

Data Quality metrics may also be defined for target tables. DQO.ai will verify that the tables are not missing data and that they meet the requirements every day or after every data load (see the sketch after this list).

  • Detect data pipeline issues by observing target tables
  • Ensure that the target tables meet the requirements every day
  • Release your data pipelines with Data Quality rules monitored by DQO.ai to be sure that your pipelines work as expected
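
A sketch of a target table definition with a verification schedule; the cron field and check names are assumptions for illustration, not the actual DQO.ai syntax:

  # Illustrative sketch only: field names are assumptions.
  kind: table
  spec:
    target: dwh.fact_orders                # hypothetical target table
    schedule: "0 6 * * *"                  # verify every day after the nightly load
    checks:
      row_count:
        min_count: 1                       # the load must not produce an empty table
    columns:
      order_total:
        checks:
          negative_count:
            max_count: 0                   # order totals must never be negative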

Cross-checks across tables

Detect discrepancies between source and target tables to uncover unexpected issues in the data pipelines.

Define summary queries that extract basic metrics from the source and target tables. Compare those metrics to detect discrepancies. A simple example: for each partition, the row count in the target table should not be lower than the row count in the source table (illustrated below the list).

  • Compare summary metrics between related tables in the data lineage
  • Detect missing data at a partition level
  • Detect value mismatches between related tables by comparing additive (aggregable) columns
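
A cross-check could be expressed as a pair of summary queries plus a comparison rule. The structure below is a hypothetical sketch; the kind, query fields, and rule name are illustrative, not the actual DQO.ai syntax:

  # Illustrative sketch only: field names are assumptions.
  kind: cross_check
  spec:
    source_query: >
      SELECT partition_date, COUNT(*) AS row_count
      FROM staging.orders GROUP BY partition_date
    target_query: >
      SELECT partition_date, COUNT(*) AS row_count
      FROM dwh.fact_orders GROUP BY partition_date
    compare:
      metric: row_count
      group_by: partition_date
      rule: target_not_lower_than_source   # the target must keep every source row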

All data up to date

Monitor the data lag (delays) for all tables to detect stale tables that were not refreshed recently.

DQO.ai comes with a verified set of timeliness checks to monitor the data lag (how old the newest row in the table is). DQO.ai will also monitor the completeness of the data to detect missing data ranges, such as a day of data that was never loaded (an example follows the list).

  • Detect tables that were not refreshed recently
  • Detect missing time ranges if an incremental data load missed a few days of data
  • Learn which tables receive updates inconsistently, with a variable delay
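
A sketch of timeliness and completeness checks, assuming hypothetical check names (not the actual DQO.ai syntax):

  # Illustrative sketch only: field names are assumptions.
  kind: table
  spec:
    target: dwh.fact_clicks
    timestamp_column: event_timestamp      # column used to measure the data lag
    checks:
      timeliness:
        max_data_lag_hours: 24             # the newest row must be younger than one day
      completeness:
        no_missing_dates: true             # alert when a whole day of data is absent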

Database Availability and response times

Verify that all tables have data and that the response time of typical queries meets the KPIs.

Define Data Quality checks for Availability. DQO.ai will run simple queries on the database and Data Lake to ensure that tables are present and populated. Define the typical queries that you run from your dashboards to check the database response time (a sketch appears after the list below).

  • Ensure that all tables are available
  • Check that tables are populated with data
  • Monitor the database response time for popular queries to ensure that your real-time dashboards are responsive for your users
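
An availability check could combine a trivial existence query with a response time threshold for a dashboard query. The fields below are illustrative assumptions, not the actual DQO.ai schema:

  # Illustrative sketch only: field names are assumptions.
  kind: table
  spec:
    target: dwh.fact_sales
    checks:
      availability:
        table_accessible: true             # a trivial SELECT must succeed
        min_row_count: 1                   # the table must be populated
      response_time:
        query: "SELECT region, SUM(revenue) FROM dwh.fact_sales GROUP BY region"
        max_duration_ms: 500               # a dashboard query must finish within 0.5 s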

Downstream tables never corrupted

Monitor the Data Quality of source tables to detect issues that may affect downstream tables if the pipelines are not stopped in time. It is better to stop the data pipeline than to run a full refresh later.

Dependencies between source and target tables are defined together with the Data Quality rules. Your data pipeline can simply check whether there are any unresolved Data Quality issues before it loads data that would corrupt the target table (see the example after this list).

  • Data lineage defined with the quality rules
  • Get a list of downstream tables affected by Data Quality issues
  • Track issues across databases and deep data lineage trees
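
A sketch of lineage declared next to the quality rules, so a pipeline can ask whether any upstream table has open issues before loading. All field names here are hypothetical:

  # Illustrative sketch only: field names are assumptions.
  kind: table
  spec:
    target: dwh.fact_sales
    lineage:
      upstream:
        - staging.sales_raw                # the fact table is built from these sources
        - staging.currency_rates
    on_upstream_issue: block_load          # stop the pipeline while issues are unresolved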

Data discrepancies

Detect inconsistent behavior of source tables that could indicate other, hidden issues.

The DQO.ai Data Observability framework will monitor table metrics such as data delays, daily row count changes, or averages of additive columns (fact table measures). An unexpected change (increase or decrease) in these metrics may indicate missing partitions or an incorrect order of data loading when a data pipeline misses some source data (sketched after the list below).

  • Detect possibly missing data by observing row counts
  • Detect issues by observing outliers like a rapid row count decrease
  • Learn about the dynamics of source tables, such as their growth rate
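
Anomaly-style checks on those metrics could be sketched as relative change thresholds; the check names below are assumptions, not the actual DQO.ai syntax:

  # Illustrative sketch only: field names are assumptions.
  kind: table
  spec:
    target: staging.orders
    checks:
      anomaly:
        daily_row_count_change:
          max_relative_change_percent: 30  # alert on a day-to-day jump or drop above 30%
        column_mean_change:
          column: order_total              # a fact table measure should stay stable
          max_relative_change_percent: 20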

No one can understand your data like we do!