Data observability for DevOps
Keep the Data Quality checks along with the data pipeline code
How hard is it to migrate definitions between development, test and production environments?
Data Observability can help. Observe the data sources used by your dashboards and get warned about potential issues before they affect more dashboards.
Source Data Quality Rules
Monitor the Data Quality rules for source data in one place. Detect issues and unstable data sources before they affect the whole Data Warehouse or Data Lake.
DQO.ai stores the Data Quality definitions for tables as simple YAML files. All Data Quality rules for a source table can be edited in one place, with code completion in the most popular text editors. Just copy a Data Quality definition file and make small changes to monitor the quality of another, similar table, as in the sketch below the list.
- Data Quality of source tables is easy to define
- All Data Quality rules for all source tables may be defined in the same way
- Adding new tables to be observed is as simple as copying a YAML file
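For illustration, a per-table definition could look roughly like the sketch below. The file name, field names and check names are assumptions made for this example, not the exact DQO.ai file format.

```yaml
# customers.dqtable.yaml -- illustrative sketch only; the field and check names
# are assumptions, not the exact DQO.ai file format
table:
  target:
    schema_name: landing_zone
    table_name: customers
  checks:
    row_count:
      min_count: 1000            # warn when a daily load is suspiciously small
  columns:
    customer_id:
      checks:
        nulls_percent:
          max_percent: 0.0       # key column must never contain nulls
    email:
      checks:
        nulls_percent:
          max_percent: 5.0       # tolerate a small share of missing emails
```

Copying this file and changing the schema and table names is all it takes to start observing another, similar source table.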
Data Quality Testing
Define Data Quality tests in code. Develop data pipelines following a Test Driven Development approach: develop the pipeline, test it, refactor... and retest after the changes.
Data Quality checks are defined in text files. A developer can follow a test-driven workflow: run the data loading scripts, then run the Data Quality checks, and repeat after every change; see the sketch below the list.
- Data Quality checks defined in code
- Data Quality checks may be instantly executed
- Enable Test Driven Development and Integration Testing for databases and Data Lakes
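As a sketch of that workflow, the expectation for a pipeline's output table can be written first, and the loading script is then developed and rerun until the checks pass. The check and field names below are illustrative assumptions, not the exact DQO.ai format.

```yaml
# fact_sales.dqtable.yaml -- hypothetical expectation for a pipeline output table,
# written before (or alongside) the transformation that populates it
table:
  target:
    schema_name: warehouse
    table_name: fact_sales
  checks:
    row_count:
      min_count: 1               # the pipeline must load at least one row
  columns:
    order_id:
      checks:
        duplicate_count:
          max_count: 0           # the transformation must not duplicate orders
    total_amount:
      checks:
        negative_count:
          max_count: 0           # refactoring must not break the sign convention
```

Run the loading script, run the checks, refactor, and rerun; a failing check immediately points to the expectation that the change has broken.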
Tested DEV -> TEST -> PROD migration
Define Data Quality rules in the dev / test / UAT environments. Run the Data Quality checks after migration to the production environment to ensure that the migration was successful.
Data Quality rules defined in text files are easy to store in the code repository. No deployment is required to update the Data Quality checks. Simply migrate your pipelines to the production environment, run the pipelines and run the DQO.ai Data Quality checks to confirm a successful migration; the sketch below the list shows the idea.
- Manage multiple environments
- Instantly upgrade the Data Quality rules after migrating your pipelines to the production environment
- Define Data Quality tests to be executed after migration
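One way to picture this is a pair of connection definitions, one per environment, while the table-level check files stay identical. The connection file names and fields below are assumptions for illustration, not the exact DQO.ai format.

```yaml
# connections/dwh-dev.connection.yaml -- illustrative sketch only
connection:
  provider: bigquery
  project: analytics-dev
---
# connections/dwh-prod.connection.yaml -- only the target project differs;
# the same table-level YAML check files are reused unchanged after the migration
connection:
  provider: bigquery
  project: analytics-prod
```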
Data Quality Test Versioning
Store the Data Quality rule definitions in a source repository. Track how your Data Quality expectations evolve over time.
DQO.ai Data Quality rules are just YAML files. Store them in the repository like any other code. Create pull requests and compare changes to the rules using standard Git tools, or wire the checks into a CI workflow like the sketch below the list.
- Data Quality rules are easy to version
- Data Quality rules may be released after a peer review (a pull request)
- Check who has changed the Data Quality rules or a data lineage dependency
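Because the rules are plain files in the repository, they fit naturally into continuous integration. The workflow below is a hypothetical GitHub Actions sketch; the `dqo` command and its arguments are placeholders, not the documented DQO.ai CLI.

```yaml
# .github/workflows/data-quality.yml -- hypothetical CI sketch; the "dqo" command
# and its arguments are placeholders, not the documented DQO.ai CLI
name: data-quality-review
on:
  pull_request:
    paths:
      - 'dq/**/*.yaml'                             # run whenever a Data Quality rule changes
jobs:
  run-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Data Quality checks against the test environment
        run: dqo check run --connection dwh-test   # placeholder command
```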
Work with local environments
Verify the quality of the data generated by your data preparation scripts in local environments (local databases) before the changes are merged into the shared environment.
DQO.ai does not need a server to run Data Quality checks. The DQO.ai command line tools can connect to your local database and run the Data Quality checks; a sketch of such a local setup follows the list below.
- Build Data Quality checks without affecting shared environments (like a development database shared by other developers)
- Verify changes to the Data Quality checks locally
- Design and test custom Data Quality checks in isolated environments
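As a sketch of that local setup, a connection definition can point at a database running on the developer's machine and reuse the same table-level check files. The connection fields below are illustrative assumptions, not the exact DQO.ai format.

```yaml
# connections/local-dev.connection.yaml -- illustrative sketch only
connection:
  provider: postgresql
  host: localhost            # a local database; nothing touches the shared environments
  port: 5432
  database: dev_sandbox
  user: developer
```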