What is Data Quality and the Qualytics 8?
We spent a lot of time dealing with data quality issues before taking a stance and building Qualytics. Through our experiences, we identified and defined the Qualytics 8 as the key dimensions that must be addressed for a trustworthy data ecosystem. We believe that to achieve comprehensive data quality, data should be assessed in 8 main categories.
The Qualytics 8
Qualytics uses these 8 fundamental categories to assess data quality.
Conformity is the alignment of the content to the required standards, schemas, and formats. Do the data values comply with the specified formats? Do the values, formats, and definitions in one data set agree with another data set? These requirements dictate how data should conform to expectations.
At Qualytics, we focus on 3 measures for data’s conformity – data types, schemas, and formats.
Examples of Conformity
Suppose an incoming CSV (text) file contains date and time values that must be loaded as timestamps. If every date/time value parses into the timestamp type that the data store expects, the file passes the conformity test.
Consider a requirement for a product code field where every value must consist of three uppercase letters from A to Z, for instance ABC. A new data point containing lowercase letters, say def, would fail the conformity test.
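The two examples above can be sketched in a few lines of Python. This is only an illustration, not how Qualytics implements its checks: the timestamp layout and the helper names is_timestamp and is_product_code are assumptions made for the example.

```python
import re
from datetime import datetime

# Assumed timestamp layout for the incoming CSV column (hypothetical).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

# Product code rule from the example: exactly three uppercase letters A-Z.
PRODUCT_CODE = re.compile(r"[A-Z]{3}")


def is_timestamp(text: str) -> bool:
    """Return True if the text parses with the assumed timestamp layout."""
    try:
        datetime.strptime(text, TIMESTAMP_FORMAT)
        return True
    except ValueError:
        return False


def is_product_code(text: str) -> bool:
    """Return True if the text is exactly three uppercase letters."""
    return PRODUCT_CODE.fullmatch(text) is not None


print(is_timestamp("2023-01-15 08:30:00"))  # True
print(is_timestamp("15/01/2023"))           # False
print(is_product_code("ABC"))               # True
print(is_product_code("def"))               # False
```

Using fullmatch rather than search matters here: a value such as ABCD contains three uppercase letters but does not conform to the rule as a whole.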
What’s the Point?
Conformity ensures a shared understanding of the data and smooth transformations between data stores. If your data is not aligned with requirements, downstream operations are negatively impacted because nonconforming data may be interpreted differently or processed incorrectly.
A clear example of conformity would be the pattern of an email address. We know that all emails must include these fields to be valid: Name@domain.topleveldomain
Within these fields, certain requirements need to be met to make the email address valid.
- Name – Can include letters A-Z, numbers 0-9, and the special characters +, -, _
- @ – There can only be ONE of this symbol
- Domain – Can include letters A-Z, numbers 0-9, and the special character _
- Top Level Domain – A collection of 2 to 6 letters from A-Z
If a value in an EMAIL field does not comply with this pattern and is still delivered to the target, downstream reports will be affected and incorrect. Setting standards for conformity prevents incorrect data from spreading and helps support confidence in the data. The bottom line is that data needs to be aligned within each data field and across the whole data store to ensure conformity across the data set.
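The email rules listed above translate naturally into a regular expression. The sketch below deliberately mirrors those simplified rules rather than the full email standard, so some real-world addresses (for example, names containing a dot) would be rejected. Accepting lowercase letters and the function name conforms_to_email_pattern are assumptions made for the example.

```python
import re

# One pattern built from the four rules in the list above:
#   name   : letters, digits, and the special characters + - _
#   @      : exactly one
#   domain : letters, digits, and the special character _
#   TLD    : 2 to 6 letters
# Lowercase letters are accepted as well (an assumption for this sketch).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9+\-_]+@[A-Za-z0-9_]+\.[A-Za-z]{2,6}")


def conforms_to_email_pattern(value: str) -> bool:
    """Return True if the whole value fits the simplified email pattern."""
    return EMAIL_PATTERN.fullmatch(value) is not None


print(conforms_to_email_pattern("jane_doe+news@example.com"))  # True
print(conforms_to_email_pattern("jane@@example.com"))          # False
print(conforms_to_email_pattern("jane@example.c"))             # False
```

A value that fails this check can be flagged before it reaches the target, instead of surfacing later as an error in a downstream report.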
How does Qualytics address conformity?
By identifying the expected data types and formatting patterns of your data, Qualytics can determine whether newly inserted records fit expectations, and it can do so by inferring checks from the shapes, patterns, and formats of historic data. Based on the expected data types, checks run to determine whether your data passes its specific conformity criteria. In the example above, the check applied would be a pattern check expressed as a regular expression. Below are some other popular checks that Qualytics uses to confirm the conformity of the data.
- After/Before Date/Time
- Credit Card
- Data Type
- Greater Than / Less Than
- Matches Pattern
- Min/Max String Length
- Not Negative
- Satisfies Query
- Social Security Number
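To make the idea of inferring a check from historic data concrete, here is a minimal sketch of two of the checks listed above: Min/Max String Length, with bounds learned from past values, and Not Negative. It is an illustration under stated assumptions, not the Qualytics implementation; every function name below is hypothetical.

```python
from typing import Sequence, Tuple


def infer_length_bounds(history: Sequence[str]) -> Tuple[int, int]:
    """Learn the shortest and longest string lengths seen in historic values."""
    lengths = [len(value) for value in history]
    return min(lengths), max(lengths)


def passes_length_check(value: str, bounds: Tuple[int, int]) -> bool:
    """Min/Max String Length: the new value must fall within the learned bounds."""
    shortest, longest = bounds
    return shortest <= len(value) <= longest


def passes_not_negative(value: float) -> bool:
    """Not Negative: the value must be zero or greater."""
    return value >= 0


# Historic product codes all have length 3, so the inferred bounds are (3, 3).
bounds = infer_length_bounds(["ABC", "XYZ", "QRS"])
print(bounds)                               # (3, 3)
print(passes_length_check("DEF", bounds))   # True
print(passes_length_check("DEFG", bounds))  # False
print(passes_not_negative(-4.5))            # False
```

Learning bounds from history keeps the check tied to what the data has actually looked like, though a real system would also need to handle empty history and legitimate drift over time.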
Based on your data’s expectations, the Qualytics Data Firewall automatically detects and responds to nonconforming data. While conformity is a common data quality problem, it is vital to businesses’ health to address each of the Qualytics 8.
Qualytics is the complete solution to instill trust and confidence in your enterprise data ecosystem. It seamlessly connects to your databases, warehouses, and source systems, proactively improving data quality through anomaly detection, signaling and workflow. Learn more about the Qualytics 8 factors in our other blogs here – Accuracy, Consistency, Timeliness. Let’s talk about your data quality today. Contact us at email@example.com.