Data quality assessment is a process used to identify errors and inconsistencies within a dataset that can negatively impact its overall quality. Poor data quality can significantly disrupt business operations, increase the costs of running data platforms, and even lead to violations of regulatory requirements in sectors like healthcare and finance.
When organizations encounter a high volume of data quality issues, it’s essential to conduct a thorough assessment. This involves verifying the validity of these issues and determining their impact on the organization. The assessment should provide clear answers to management and data owners on whether a data cleansing initiative would effectively address the most critical problems. To conduct a comprehensive data quality assessment, it’s crucial to engage various data stakeholders involved in the data lifecycle – those who provide, manage, and use the data – to gather their feedback and insights.
What is Data Quality Assessment
Data quality assessment is a structured process designed to pinpoint datasets with low quality. Low-quality datasets contain a significant number of invalid records, which reduces their usability for analysis and other business purposes. Each type of problem found in the data is known as a data quality issue.
The primary aim of a data quality assessment project is to uncover these issues and determine if they negatively impact business operations. The assessment concludes by providing data owners with recommendations regarding the identified issues and their severity. These recommendations are typically delivered in the form of a report.
In addition to a list of confirmed errors, the report should also provide a numeric “health score” to quantify the data quality. For example, the report might state: “The b2b_customers table has 80% completeness.” This gives stakeholders a clear and measurable understanding of the dataset’s quality.
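As a minimal sketch, a completeness score like the one above can be computed as the share of non-null values in a column. This assumes a simple null-based definition of completeness; real tools may also count empty strings or placeholder values as missing.

```python
def completeness(values):
    """Completeness as the share of non-null values, expressed as a percentage."""
    if not values:
        return 0.0
    non_null = sum(1 for v in values if v is not None)
    return round(100.0 * non_null / len(values), 1)

# A hypothetical 'email' column from a b2b_customers table:
emails = ["a@x.com", None, "b@y.com", "c@z.com", None]
print(completeness(emails))  # → 60.0
```

Applying the same formula to every column and averaging the results is one simple way to arrive at a table-level health score.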
Steps in the Data Quality Assessment Framework
A data quality assessment framework is a process that should be repeated for each data domain within an organization. The project manager should engage data engineers, data owners, data analysts, and all other relevant stakeholders who can provide feedback on potential data errors. This process typically involves the following six steps:
- Define Goals: Clearly establish the business reasons for undertaking the data quality assessment. This might include addressing data reliability concerns, meeting regulatory compliance requirements, or improving the accuracy of data used for decision-making.
- Determine Scope and Resources: Identify the critical datasets that will be included in the assessment. Assemble a team of data stakeholders, engineers, and data stewards who will participate in the project.
- Set Up Tools: Choose and configure the appropriate data profiling or data quality tools. Ensure the tools can connect to the relevant data sources and that the necessary metadata is available.
- Conduct Data Profiling: Use the selected tools to analyze the datasets. Gather statistics on data values, examine sample data, and perform data quality tests to identify potential errors and inconsistencies.
- Review Errors: Carefully examine the data quality issues flagged by the tools. Collaborate with data owners and consumers to validate these issues and assess their impact on business operations.
- Generate Data Quality Baseline Report: Compile a comprehensive report that summarizes the data quality assessment findings. Include a “health score” for each dataset and a prioritized list of data quality errors that require remediation.
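The data profiling step above typically gathers a few basic statistics per column. A minimal sketch of such a profiler, using only the standard library (column names and thresholds here are illustrative, not from any specific tool):

```python
from collections import Counter

def profile_column(values):
    """Collect basic profiling statistics for a single column."""
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "most_common": Counter(non_null).most_common(1),
    }

# A hypothetical 'country' column:
countries = ["US", "DE", None, "US", "FR", "US"]
stats = profile_column(countries)
print(stats["row_count"], stats["null_count"], stats["distinct_count"])  # → 6 1 3
```

Statistics like these feed directly into the error review step: a high null count or an unexpectedly low distinct count is often the first signal of a data quality issue.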
Examples and Risks
Data quality assessments should prioritize the identification of risks that jeopardize data integrity. These data quality issues can render data unusable for analysis or decision-making. Risk assessments should also aim to confirm suspected data errors and pinpoint recurring issues that require excessive data cleansing efforts.
Data quality tools can automatically detect many common data quality problems, including:
- Duplicate values: For instance, a product list might contain two identical product entries, indicating an error.
- Format errors: This includes issues like incorrect formatting of tax identifiers, phone numbers, or dates.
- Invalid categorical values: Examples include incorrect values in columns containing country or state names.
- Data outliers: These are unusually large or small values that are likely inaccurate, such as a delivery date set far in the future.
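The four categories of checks above can each be sketched in a few lines. This is an illustrative simplification: the reference list of valid countries, the expected date format, and the outlier threshold are all assumed values that a real assessment would take from business rules.

```python
import re
from collections import Counter

VALID_COUNTRIES = {"US", "DE", "FR", "GB"}      # assumed reference list
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")    # expected ISO date format

def find_duplicates(values):
    """Values that appear more than once, e.g. duplicate product entries."""
    return [v for v, n in Counter(values).items() if n > 1]

def find_format_errors(dates):
    """Date strings that do not match the expected format."""
    return [d for d in dates if not DATE_RE.match(d)]

def find_invalid_categories(values):
    """Categorical values outside the accepted reference list."""
    return [v for v in values if v not in VALID_COUNTRIES]

def find_outliers(values, max_allowed):
    """Numeric values above a plausibility threshold, e.g. delivery delay in days."""
    return [v for v in values if v > max_allowed]

print(find_duplicates(["P-1", "P-2", "P-1"]))              # → ['P-1']
print(find_format_errors(["2024-01-31", "31/01/2024"]))    # → ['31/01/2024']
print(find_invalid_categories(["US", "XX"]))               # → ['XX']
print(find_outliers([3, 5, 9999], max_allowed=365))        # → [9999]
```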
By proactively identifying and addressing these types of data quality risks, organizations can ensure their data remains reliable, accurate, and fit for its intended purpose.
How to Pick Data Quality Assessment Tools
Data quality assessments are often conducted using specialized data quality tools. These tools typically offer a combination of features: data profiling, data quality testing, and data quality reporting. While some data teams rely on SQL queries for data profiling, this method can be labor-intensive and require many manual steps. Standalone data profiling tools are also common, but they may provide statistics without comprehensive data quality testing capabilities.
To perform an end-to-end data quality assessment effectively, consider selecting a tool that supports the following features:
- Data Profiling: This functionality enables the collection of data samples and the calculation of essential statistics, such as row counts, distinct counts, and data type distributions.
- Data Quality Testing: The tool should include built-in capabilities to detect common data quality issues, such as duplicates, format errors, invalid values, and outliers.
- Data Quality Reporting: The ability to generate clear and concise data quality reports is crucial. These reports should summarize key findings, including data quality metrics and prioritized lists of errors.
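A reporting feature like the one described above boils down to aggregating check results into a health score and a prioritized error list. The following sketch assumes a simple convention where each check result carries a severity number (lower means more urgent):

```python
def build_report(check_results):
    """Summarize check results into a health score and a prioritized error list.

    check_results: list of (check_name, passed, severity) tuples,
    where a lower severity number means higher priority (an assumed convention).
    """
    total = len(check_results)
    passed = sum(1 for _, ok, _ in check_results if ok)
    failures = sorted((sev, name) for name, ok, sev in check_results if not ok)
    return {
        "health_score": round(100.0 * passed / total, 1) if total else 100.0,
        "errors": [name for _, name in failures],
    }

results = [
    ("no_duplicates", False, 1),
    ("valid_dates", True, 2),
    ("valid_countries", False, 2),
    ("row_count_min", True, 3),
]
report = build_report(results)
print(report["health_score"])  # → 50.0
print(report["errors"])        # → ['no_duplicates', 'valid_countries']
```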
Best Practices for Conducting the Assessment
The success of a data quality assessment hinges on several key factors that should be incorporated into the framework:
- Engage Stakeholders: Actively involve all data stakeholders (owners, engineers, analysts, business users) throughout the assessment process. Conduct interviews and consultations to understand their data quality needs and concerns.
- Establish Standards: Define clear data quality standards, including the specific dimensions to be assessed (completeness, accuracy, consistency, etc.). Use a standardized formula for calculating data quality scores to ensure consistency and comparability across datasets.
- Focus on Actionable Outcomes: The final report should prioritize confirmed data quality issues that require immediate attention. Present the findings in a clear and concise manner, enabling data owners and remediation teams to take effective action.
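One common standardized formula, as suggested in the practices above, is to score every dimension the same way: the percentage of passed checks out of all executed checks. A sketch under that assumption (the check counts below are hypothetical):

```python
def dimension_score(passed, executed):
    """Standardized score applied uniformly to every dimension:
    percentage of passed checks out of executed checks."""
    return round(100.0 * passed / executed, 1) if executed else 100.0

# Hypothetical per-dimension check counts for one dataset:
scores = {
    "completeness": dimension_score(18, 20),  # → 90.0
    "accuracy": dimension_score(7, 10),       # → 70.0
    "consistency": dimension_score(5, 5),     # → 100.0
}
print(scores)
```

Because the same formula is used everywhere, scores remain comparable across datasets and across assessment runs.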
What is the DQOps Data Quality Operations Center
DQOps is a data quality and observability platform designed to monitor data sources and assess a data quality trust score with data quality KPIs. It provides extensive support for configuring data quality checks, applying configuration through data quality policies, detecting anomalies, and managing the data quality incident workflow. As a complete data observability platform, DQOps measures data quality metrics at the table level to track health scores, and it includes over 50 customizable data quality dashboards that report data health from various perspectives.
You can set up DQOps locally or in an on-premises environment to see how it monitors data sources and ensures data quality within a data platform. Follow the DQOps getting started guide in the documentation to install the platform locally and try it out.
You may also be interested in our free eBook, “A step-by-step guide to improve data quality.” The eBook documents our proven process for managing data quality issues and ensuring a high level of data quality over time. This is a great resource to learn about data quality.