What is Data Quality and the Qualytics 8?
We spent a lot of time dealing with data quality issues before we decided to take a stance and build Qualytics. Going through our experiences, we have ultimately identified and defined the Qualytics 8 as the key dimensions that must be addressed for a trustworthy data ecosystem. We believe that to achieve comprehensive data quality, data should be assessed in 8 main categories.
The Qualytics 8
Qualytics uses these 8 fundamental categories to assess data quality.
What is Data Precision in Data Quality?
Many people confuse data precision with accuracy, so it’s important to understand each term and how the two differ, especially when applied to data quality. Precision is the exactness of a measurement. A highly precise television would reflect minute differences in color with incredibly high pixel resolution. In data quality, precision assesses the depth of detail that is encoded in the data. To sharpen the definition, ask yourself, “How tightly can my data be defined?”
To understand precision, one must also understand accuracy. Accuracy is how closely the overall dataset reflects the truth. Being accurate does not make data precise, and vice versa. The key to success lies in optimizing both to serve data quality remediation.
A basic example of precision can be seen when shooting a basketball. Let’s say a player misses every shot, but every shot hits exactly the front of the rim. Although these shots are inaccurate, since they miss the basket, they are precise, since they hit the same spot every time. This example corresponds to the bottom-right dart board, which depicts precision but not accuracy.
Since the player is consistently inaccurate, they can correct their throws by aiming higher to achieve both accuracy and precision. Qualytics applies this same idea to customers’ data. If the data is precise but inaccurate, it can be adjusted accordingly, and anomalies can then be detected as deviations from that consistent pattern.
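One way to put numbers on the basketball example is to treat accuracy as the average error against the truth (bias) and precision as how tightly the values cluster (spread). The sketch below is only an illustration: the shot distances, the 15-foot target, and the function name bias_and_spread are assumptions made up for this example, not part of the Qualytics platform.

```python
from statistics import mean, pstdev

def bias_and_spread(measurements, true_value):
    """Return (bias, spread) for a list of measurements.

    bias   = mean error against the true value (low bias = accurate)
    spread = population standard deviation (low spread = precise)
    """
    bias = mean(measurements) - true_value
    spread = pstdev(measurements)
    return bias, spread

# Hypothetical shot distances in feet; the basket is assumed to be 15.0 ft away.
shots = [14.2, 14.3, 14.2, 14.3, 14.2]
bias, spread = bias_and_spread(shots, 15.0)

# The shots are tightly grouped (precise) but consistently short (inaccurate),
# so subtracting the bias recenters them on the target.
corrected = [s - bias for s in shots]
print(f"bias={bias:.2f} ft, spread={spread:.2f} ft")
```

Because the spread is small, a single correction fixes every shot at once. If the spread were large, no single adjustment would help.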
Let’s revisit an example from previous blogs in the Qualytics 8 series: collecting data on customers’ mailing addresses. Imagine that all addresses are converted into latitude/longitude coordinates for the purpose of mapping and geospatial analytics. Perhaps you want to know the land zoning of your customers. Maybe one of your customers is Amazon in New York City, located at 40.7533344,-74.0000361. If you only had 2 digits of precision (40.75,-74.00), you would be 5 city blocks away at a park. With 1 digit of precision (40.8,-74.0), you would be at yet another park on the other side of the Hudson River. In either circumstance, this loss of precision gives you an entirely different result, and many geostatistical insights would be unavailable to your business.
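To see how far a rounded coordinate drifts from the original, the distance between the two points can be estimated with the haversine formula. This is a minimal sketch that assumes a spherical Earth with a mean radius of 6,371 km; the helper names haversine_m and error_from_rounding are hypothetical and chosen only for this illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def error_from_rounding(lat, lon, digits):
    """Distance in meters between a point and the same point rounded to `digits` decimals."""
    return haversine_m(lat, lon, round(lat, digits), round(lon, digits))

lat, lon = 40.7533344, -74.0000361  # coordinates from the example above
for digits in (2, 1):
    print(f"{digits} decimal digits -> {error_from_rounding(lat, lon, digits):.0f} m off")
```

Under these assumptions, 2 decimal digits move the point by a few hundred meters and 1 decimal digit moves it by about 5 kilometers, which is consistent with landing several blocks away or across the river.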
How to Check for Data Precision and Fix Anomalies
The real challenge with data precision problems is that they are nearly impossible to undo later. You have to catch the issue as close to the source as possible, before the detail is lost. This means ensuring that database replicas use compatible data types (e.g. a float with about 7 significant digits of precision vs. a double with about 15) and that the values are precisely the same (or differ only within an acceptable tolerance).
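The float-versus-double mismatch is easy to reproduce. The sketch below uses Python's standard struct module to simulate a replica column that stores a value as a 32-bit float, then compares it to the 64-bit source with math.isclose. The tolerance values and the helper names to_float32 and within_tolerance are assumptions for illustration; an acceptable tolerance depends on your own data.

```python
import math
import struct

def to_float32(value: float) -> float:
    """Round-trip a 64-bit Python float through a 32-bit float, as a narrower column would."""
    return struct.unpack("f", struct.pack("f", value))[0]

def within_tolerance(source: float, replica: float, abs_tol: float) -> bool:
    """True if the replica differs from the source by at most abs_tol."""
    return math.isclose(source, replica, rel_tol=0.0, abs_tol=abs_tol)

source_lon = -74.0000361              # stored as a double in the source system
replica_lon = to_float32(source_lon)  # stored as a float in a hypothetical replica

print(source_lon == replica_lon)                         # exact-equality check
print(within_tolerance(source_lon, replica_lon, 1e-5))   # loose tolerance
print(within_tolerance(source_lon, replica_lon, 1e-7))   # tight tolerance
```

The exact-equality check fails because the 32-bit value cannot hold all of the original digits. Whether the difference matters depends on the tolerance you choose, which is why the tolerance should be set from the precision the business use case actually needs.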
Tips to Ensure Precision in Data Quality
Problems with precision in data quality are often discovered in relation to other pieces of data and information, underscoring the point that addressing data quality isn’t a one-stop shop. It requires the combination of multiple key data quality dimensions in coordination with each other.
How Does the Qualytics Platform Address Precision?
So, how does precision relate to data quality and data remediation?
Let’s consider a scenario where data has been incorrectly recorded for years. While the historical data cannot be used on its own to establish the correct values, it’s still possible to monitor for fluctuations and detect anomalies. Although the data is inaccurate, it is precise; these historical values tend to be skewed in the same way every time. Therefore, precision can be used to address the underlying issue by adjusting the data based on the discovered skew. For example, if the data being collected on a form was not accounting for daylight saving time, the data could be “salvaged” by being re-adjusted to account for this anomaly. And if any of the values don’t fall within the skew, alerts for anomalies can be created in order to remediate the data quality discrepancy moving forward.
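The idea of a consistent skew can be sketched in a few lines. The example below is not how the Qualytics platform works internally; it is a simplified illustration that assumes you already have, for each record, the offset in minutes between the recorded time and a trusted reference time. The sample offsets, the 5-minute tolerance, and the function name find_skew_and_outliers are all hypothetical.

```python
from statistics import median

def find_skew_and_outliers(offsets_minutes, tolerance_minutes=5.0):
    """Estimate the typical skew and list the indexes of records that don't follow it.

    The median is used so that a few bad records don't distort the estimate.
    """
    skew = median(offsets_minutes)
    outliers = [
        i for i, offset in enumerate(offsets_minutes)
        if abs(offset - skew) > tolerance_minutes
    ]
    return skew, outliers

# Hypothetical offsets: most records are about one hour off (a missed DST shift),
# while one record has no offset at all and breaks the pattern.
offsets = [60, 61, 59, 60, 60, 0, 60]
skew, outliers = find_skew_and_outliers(offsets)

# Records that follow the skew can be salvaged by subtracting it;
# records that don't are flagged for review instead.
corrected = [o - skew for i, o in enumerate(offsets) if i not in outliers]
print(f"skew={skew} min, outlier indexes={outliers}")
```

The consistent one-hour skew is what makes the data recoverable. The single record that deviates from it is the one worth an alert.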
Qualytics is the complete solution to instill trust and confidence in your enterprise data ecosystem. It seamlessly connects to your databases, warehouses, and source systems, proactively improving data quality through anomaly detection, signaling, and workflow. Learn more about the Qualytics 8 factors in our other blogs here – Accuracy, Coverage, Conformity, Consistency, and Timeliness. Let’s talk about your data quality today. Interested in learning about how the Qualytics data remediation process can help you? Fill out our interest form or email us directly at email@example.com.