Data quality is the perceived level of usefulness of data. To learn what users consider good data quality, ask them to fill out a data quality requirements template that captures their expectations.
When it comes to using data to make smart choices, the quality of that data is critical. Think of it like building a house: you need strong materials to make sure it is safe and lasts a long time. In the same way, good data quality means your information is accurate and can be trusted to support good decisions.
To achieve that quality, start by talking to the people who know the data best. This means both the data owners, who are responsible for the data, and the data users, such as analysts and data scientists, who work with it every day. Data owners can explain what the data is meant to be used for and how important it is. Data users can point out problems they have already seen, so those problems can be fixed and prevented from happening again. By working together, you can keep your data in top shape and use it to make the best choices possible.
The following data quality requirements template can be a good starting point for collecting quality expectations of all users and data stakeholders.
Data stakeholders
The key players in gathering data quality requirements are not just the data owners but also the data team members who work with the data daily to uncover insights. They are often the first to spot any data quality problems and act as the initial support for business users who encounter incorrect results in their dashboards, reports, or AI models due to poor data quality.
Here’s a breakdown of the specific data stakeholders you should involve:
- Data Owners: As the custodians of the data, they possess the deepest understanding of its purpose and structure. They play a crucial role in not only defining data quality expectations but also actively participating in resolving any data quality issues that arise.
- Data Consumers: This group includes data scientists and data analysts who directly utilize the data. Their hands-on experience often provides invaluable insights into potential data quality concerns that might otherwise go unnoticed.
- Data Engineering Teams: If your data undergoes transformations or processing, involving the data engineering teams is essential. They might have encountered challenges during data handling that could shed light on underlying data quality problems.
Data quality requirements collection process
Gathering data quality requirements is a collaborative effort that involves engaging with stakeholders, understanding their perspectives, and documenting their expectations. This process ensures that everyone is on the same page about what constitutes ‘good’ data, paving the way for effective data quality management.
Here’s a breakdown of the typical steps involved:
- Template Distribution: The Data Quality team prepares clear and concise templates (like the Excel template described below) and distributes them to the identified data stakeholders.
- Stakeholder Input: Data owners, consumers, and engineering teams fill out the templates based on their understanding and requirements. Encourage them to provide specific examples and use cases to illustrate their expectations.
- Template Collection: Stakeholders return the completed templates to the Data Quality team.
- Review and Consolidation: The Data Quality team meticulously reviews the submitted templates, consolidates the information, and addresses any ambiguities or inconsistencies. This might involve follow-up discussions with stakeholders to clarify specific points (see the consolidation sketch after this list).
- Data Quality Rule Creation: Based on the gathered requirements, the Data Quality team defines and implements data quality rules to monitor and enforce data integrity.
- Data Quality Review with Data Stakeholders: Once potential data quality issues have been identified, it’s crucial to review and confirm these findings with the relevant stakeholders. This collaborative discussion will help determine the appropriate data cleansing activities and ensure that everyone is aligned on the path forward.
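To make the review and consolidation step more concrete, here is a minimal sketch that merges the column-level sheets from several returned templates and flags columns for which stakeholders provided conflicting expected-value lists. The directory layout, sheet name, and column headers (column_name, expected_values) are assumptions; adjust them to match your own template.

```python
import glob
import pandas as pd

# Load the column-level sheet from every returned template (sheet name is assumed).
frames = []
for path in glob.glob("returned_templates/*.xlsx"):
    df = pd.read_excel(path, sheet_name="Column-Level Requirements")
    df["source_file"] = path  # keep track of which stakeholder provided each row
    frames.append(df)

requirements = pd.concat(frames, ignore_index=True)

# Flag columns where stakeholders provided different expected-value lists,
# so the Data Quality team can follow up and resolve the inconsistency.
conflicts = (
    requirements.groupby("column_name")["expected_values"]
    .nunique()
    .loc[lambda counts: counts > 1]
)
print(conflicts)
```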
Data quality requirements template
A practical and effective approach for gathering data quality requirements is to dedicate one Excel file to each table in your database. This allows for a focused and organized collection of information specific to that table.
The Excel file should ideally consist of two sheets:
- Table-Level Information: This sheet captures the essential metadata about the table. The content of this sheet is described in detail below.
- Column-Level Requirements: This sheet dives deeper into the data quality expectations for each column within the table. The specifics of this sheet are also outlined below.
To ensure consistency and avoid confusion, it’s highly recommended to implement validation rules within the Excel fields. This will guide data stakeholders to provide their input in a standardized format, making subsequent analysis and consolidation much smoother.
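As one way to set this up, the sketch below uses openpyxl to add a drop-down list to a "Table Criticality Level" column in the table-level sheet, so stakeholders can only pick High, Medium, or Low. The file name, sheet title, column layout, and cell range are placeholders, not part of any official template.

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active
ws.title = "Table-Level Information"
ws["A1"] = "Table Name"
ws["B1"] = "Table Criticality Level"

# Restrict the criticality column to a fixed list of allowed values.
criticality = DataValidation(type="list", formula1='"High,Medium,Low"', allow_blank=True)
criticality.error = "Please choose High, Medium, or Low."
criticality.showErrorMessage = True
ws.add_data_validation(criticality)
criticality.add("B2:B200")

wb.save("data_quality_requirements_template.xlsx")
```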
By using this structured Excel template approach, you create a clear and accessible framework for capturing data quality requirements, fostering collaboration among stakeholders, and laying a strong foundation for data quality management.
Table-level summary information
In the first sheet of your Excel template, focus on gathering four key types of information from data owners that are vital for understanding the meaning, criticality, location, and volume of the data source:
Data Owner Fields
This section collects details about the person or team responsible for the data source and establishes the communication channels for reporting data quality issues:
- Business Data Owner Name: The individual or team accountable for the table’s business context and data quality.
- Data Owner’s Department Name: The department responsible for the table.
- Data Steward Name: The person directly managing and maintaining the table’s data quality.
- Notification Address: The email address(es) to which data quality issues should be reported.
Table Details Fields
Here, we capture information about the physical location of the table, its name, type, and its role within the overall data architecture:
- Table Name: The name of the table in the database.
- Database Name: The name of the database where the table resides.
- Schema Name: If applicable, the schema within the database where the table is located.
- Table Type: The type of table (e.g., Fact table, Dimension table, Staging table).
- Table Role: A brief description of the table’s function in the data architecture.
- Description: A more detailed description of the purpose of the table, including special requirements for data quality checks.
Data Criticality Fields
This section assesses whether the table requires special handling due to sensitive data or if it warrants higher priority in data quality checks:
- Table Criticality Level: The table’s importance to business operations (e.g., High, Medium, Low).
- Sensitive Data Status: Indicates if the table contains sensitive data (Present, Possible, Not Acceptable).
- Data Quality Priority: The data owner’s suggested priority of the table’s data quality assessment task in the data quality project’s backlog.
- SLA for Issues: The expected reaction time for resolving data quality issues.
Data Volume Fields
These fields capture the table’s current size and anticipated growth rate, which are crucial for planning data quality checks and avoiding performance bottlenecks:
- Expected Volume: The estimated number of rows currently in the table.
- Daily Volume Increase: The expected daily increase in the number of rows.
- Monthly Volume Increase: The expected monthly increase in the number of rows.
- Days of Volume Increase: The days on which the table is updated, for example, only on working days or once a week.
- Data Quality Check Scheduling: The expected time when the data observability platform can scan the data for data quality issues.
By gathering this table-level information, you establish a solid foundation for understanding the context and importance of the data, enabling you to tailor your data quality efforts effectively.
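To illustrate how the collected table-level entries can be used downstream, here is a minimal sketch of a data structure holding a subset of these fields, plus a rough row-count forecast that can help plan how often full-table checks can run before performance becomes a concern. Field names are illustrative and do not follow any particular tool’s schema.

```python
from dataclasses import dataclass

@dataclass
class TableLevelRequirements:
    """A subset of the table-level fields collected in the first sheet (names are illustrative)."""
    table_name: str
    database_name: str
    table_type: str              # e.g. "Fact table", "Dimension table", "Staging table"
    criticality_level: str       # "High", "Medium", or "Low"
    notification_address: str    # where data quality issues are reported
    expected_volume: int         # estimated current row count
    daily_volume_increase: int   # expected rows added per day

    def expected_rows_after(self, days: int) -> int:
        """Rough row-count forecast used when planning data quality check scheduling."""
        return self.expected_volume + days * self.daily_volume_increase

orders = TableLevelRequirements(
    table_name="fact_orders",
    database_name="sales_dwh",
    table_type="Fact table",
    criticality_level="High",
    notification_address="dq-alerts@example.com",
    expected_volume=25_000_000,
    daily_volume_increase=50_000,
)
print(orders.expected_rows_after(90))  # ~29.5 million rows after one quarter
```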
Column-level requirements
In the second sheet of your Excel template, delve into the specifics of each column that data stakeholders deem crucial for data quality monitoring. This is where you’ll capture their expectations regarding the acceptable values and formats for each column.
Key Points:
- Identify Key Columns: Ask stakeholders to list all the columns they consider important from a data quality perspective. These might be columns used in critical business calculations, reporting, or decision-making processes.
- Data Dictionary References: If a column’s values should adhere to a predefined set of options (like currency codes or department names), request that stakeholders provide either:
  - A link to the relevant data dictionary or reference table.
  - A list of all expected values for that column.
- Regular Expression Patterns: For columns where values must conform to a specific format (such as email addresses or contract numbers), ask stakeholders to provide:
  - A few sample values that exemplify the correct format.
  - If possible, a regular expression pattern that can be used to validate the format automatically.
By capturing these column-level requirements, you empower your data quality efforts with the granular details needed to implement effective data validation and monitoring processes. This proactive approach helps prevent data quality issues from impacting your business operations and ensures that your data remains reliable and trustworthy.
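As an illustration of how these column-level requirements can later be turned into checks, the sketch below validates one column against a reference dictionary and another against a regular expression, returning the share of valid values. The column names, currency list, and contract number pattern are made-up examples, not values from the original template.

```python
import re
import pandas as pd

# Placeholder requirements - in practice these come from the completed template.
EXPECTED_CURRENCIES = {"USD", "EUR", "GBP", "PLN"}
CONTRACT_NUMBER_PATTERN = re.compile(r"^CTR-\d{4}-\d{6}$")

def dictionary_check(values: pd.Series, allowed: set) -> float:
    """Share of non-null values that appear in the reference dictionary."""
    non_null = values.dropna()
    return float(non_null.isin(allowed).mean()) if len(non_null) else 1.0

def pattern_check(values: pd.Series, pattern: re.Pattern) -> float:
    """Share of non-null values that match the required format."""
    non_null = values.dropna().astype(str)
    return float(non_null.map(lambda v: bool(pattern.match(v))).mean()) if len(non_null) else 1.0

sample = pd.DataFrame({
    "currency": ["USD", "EUR", "XYZ"],
    "contract_number": ["CTR-2024-000123", "CTR-24-1", None],
})
print(dictionary_check(sample["currency"], EXPECTED_CURRENCIES))          # 0.67
print(pattern_check(sample["contract_number"], CONTRACT_NUMBER_PATTERN))  # 0.5
```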
Download DQOps data quality requirement templates
The DQOps team has prepared a data quality requirements template document that you can use. Please click the link below to download the file.
What is the DQOps Data Quality Operations Center
DQOps is a data quality platform designed to monitor data and assess the data quality trust score with data quality KPIs. DQOps provides extensive support for configuring data quality checks, applying configuration by data quality policies, detecting anomalies, and managing the data quality incident workflow.
DQOps was designed to be a user-friendly data quality platform that empowers users to carry out data quality assessment projects. DQOps has a rule mining engine that suggests the configuration of data quality checks. It also supports custom data quality checks; its competitive advantage is that these custom checks are reusable and can validate organization-specific data formats, such as contract or invoice numbers.
You can set up DQOps locally or in your on-premises environment to see how it monitors data sources and ensures data quality within a data platform. Follow the DQOps getting started guide in the documentation to install DQOps locally and try it.
You may also be interested in our free eBook, “A step-by-step guide to improve data quality.” The eBook documents our proven process for managing data quality issues and ensuring a high level of data quality over time.