Last updated: July 22, 2025
How to detect data type changes in text columns, with examples
Read this guide to learn how DQOps detect the effective data type in text columns. Data type detection verifies that values are castable to a given data type.
The data quality checks that detect data types in text columns are configured in the datatype
category in DQOps.
Data type detection
The data type detection checks in DQOps are designed to analyze text columns in raw tables before the data pipeline or an ETL process converts the values to a target data type, such as an integer, float, date, or timestamp.
Data type detection prevents issues in the data pipelines.
-
The data loading process can break on the first value that cannot be converted to the target data types. It often happens when a CAST SQL function is used to convert text values to a target data type. The default behavior of most database engines is to stop processing an INSERT or UPDATE command, returning an error code. If that happens and the error is not handled correctly, the batch is not loaded.
-
The data transformation logic can skip rows containing values that were not transformed (converted). It leads to data completeness issues and missing rows.
-
A data transformation logic that uses safe casting will convert values in a wrong format to a null value, leading to incomplete values in the target column.
Physical data type detection
The data type detection checks in the datatype
category analyze values in text columns (string, char, varchar, nvarchar, ...).
They are sensing the desired data type based on the values in the column.
If you want to detect that the physical data type of a column has changed, use the
table schema drift checks in the schema
category.
Mixed data type issues
The following example shows an issue related to storing mixed data types in a text column. The tested column is street_name. We assume that all values are integer values, allowing us to use an INTEGER data type instead of a STRING type. The column statistics show that the most common street numbers are integers, but DQOps detected that the column contains mixed data types.
We can change the table filter to return only rows containing non-numeric values that failed to be converted to an INTEGER data type.
The SQL filter that we used is shown below.
After capturing statistics for the column again, the only column value samples that were captured were non-numeric values.
Data sample
The following sample of the column values shows both integer and non-integer values.
street_number |
---|
1520 |
500 |
1500 |
600 |
915 1/2 |
Data type detections by DQOps
DQOps has two dedicated data quality checks for detecting data types in text columns. The SQL queries in these checks try to convert all values in a column to all supported data types. If all values are convertible to the same data type, DQOps returns the category code of that type. Otherwise, DQOps returns a special data type category code that identifies a mixed data type.
Detected data types
DQOps detects the following categories of data types.
Data type category | Data type identifier (actual_value) |
Description | Sample values |
---|---|---|---|
integers | 1 | Detect integer and long values | 0, 1, -12, 12345 |
floats | 2 | Detect numeric values, but also supports integer values | 1.23, -234.333, 1, 3444 |
dates | 3 | Dates without the time in ISO 8601 format and US format | 19/02/2020, 2020-04-01 |
datetimes | 4 | Date and time values containing both the date and time components | 10/12/2020 0:00:01, 2020-12-10 0:00:01 |
timestamps | 5 | Datetime extended by fraction of second or time zone | 2020-12-10T00:00:01.123456-07:00, 2020-12-10 00:00:01 GMT+01:00 |
booleans | 6 | Placeholders of a true and false values | true, false, TRUE, FALSE, yes, no, YES, NO, y, n, Y, N, t, f, T, F |
texts | 7 | Text values that are not numbers, dates or boolean placeholders | New York, Austin, TX |
mixed | 8 | The values found in the column are mixed including values of multiple types | 1, 43.11, 2020-04-01, true, Austin |
Data type detection checks
DQOps has two data quality checks for data type detection. An assertion check that verifies if all values in a column match a given data type. And a data type change detection that detects if new rows contain values of a different data type.
Assert data type check
The detected_datatype_in_text data quality check analyzes all values in a column and asserts that all values are of an expected data type.
Configure data type detection check in UI
The detected_datatype_in_text data quality check is easy to activate. The parameter of the rule is the expected data type.
Data type detection error sampling in UI
To assist with identifying the root cause of errors and cleaning up the data, DQOps offers error sampling for this check. You can view representative examples of data that do not meet the specified data quality criteria by clicking on the Error sampling tab in the results section.
For additional information about error sampling, please refer to the Data Quality Error Sampling documentation.
Configure data type detection check in YAML
The detected_datatype_in_text check uses the data type category names listed in the table above.
Detect type changed check
The data type change detection check detected_datatype_in_text_changed monitors changes to the data type, comparing the detected data type to the last known detected data type. DQOps uses the historical data quality results for change detection.
Configure type change check in UI
The detected_datatype_in_text_changed check uses parameterless data quality rules to select the severity level for reported issues.
The example above shows the first execution of the check when historical results are not yet present. Hence, the expected_value is missing. When DQOps reruns this check the next day, the expected_value will be the value of the actual_value from the previous day.
Configure type change check in YAML
The detected_datatype_in_text_changed check uses parameterless rules to activate the desired alerting severity level.
Detecting data types across partitions
The data type detection checks also work on partitioned data. A partitioned variant of the check is enabled in the Partitioned Checks section. DQOps will analyze rows for each daily or monthly partition. By using a partitioned version of the data type detection checks, it is possible to detect data type drifts between partitions.
The following example shows that the last five daily partitions contained only integer values in the street_number column, and the detected data type has not changed day-to-day.
List of datatype checks at a column level
Data quality check name | Friendly name | Data quality dimension | Description | Standard check |
---|---|---|---|---|
detected_datatype_in_text | Detect and verify the data type of all values in a text column | Consistency | A column-level check that scans all values in a text column and detects the data type of all values in a monitored column. The actual_value returned from the sensor can be one of seven codes: 1 - integers, 2 - floats, 3 - dates, 4 - datetimes, 5 - timestamps, 6 - booleans, 7 - strings, 8 - mixed data types. The check compares the data type detected in all non-null columns to an expected data type. The rule compares the value using equals and requires values in the range 1..8, which are the codes of detected data types. | |
detected_datatype_in_text_changed | Detect a change of the data type of all values in a text column | Consistency | A column-level check that scans all values in a text column, finds the right data type and detects when the desired data type changes. The actual_value returned from the sensor can be one of seven codes: 1 - integers, 2 - floats, 3 - dates, 4 - datetimes, 5 - timestamps, 6 - booleans, 7 - strings, 8 - mixed data types. The check compares the data type detected during the current run to the last known data type detected during a previous run. For daily monitoring checks, it compares the value to yesterday's value (or an earlier date). For partitioned checks, it compares the current data type to the data type in the previous daily or monthly partition. The last partition with data is used for comparison. |
Reference and samples
The full list of all data quality checks in this category is located in the column/datatype reference. The reference section provides YAML code samples that are ready to copy-paste to the .dqotable.yaml files, the parameters reference, and samples of data source specific SQL queries generated by data quality sensors that are used by those checks.
What's next
- Learn how to run data quality checks filtering by a check category name
- Learn how to configure data quality checks and apply alerting rules
- Read the definition of data quality dimensions used by DQOps