The Qualytics 8 – Accuracy

What is Data Quality and the Qualytics 8? 

We spent a lot of time dealing with data quality issues before we decided to take a stance and build Qualytics. Drawing on those experiences, we identified and defined the Qualytics 8 as the key dimensions that must be addressed for a trustworthy data ecosystem. We believe that to achieve comprehensive data quality, data should be assessed in 8 main categories.

The Qualytics 8

Qualytics uses these 8 fundamental categories to assess data quality.

Completeness
availability of required data attributes
Coverage
availability of required data records
Conformity
alignment of content with required standards & schemas
Consistency
how well the data complies with required formats / definitions
Duplication
redundancy of records and/or attributes
Timeliness
currency of content representation as well as whether data is available / can be used when needed
Volumetrics
volume of data per row / column / client / entity over time
Accuracy
relationship of the content with original intent

Accuracy Explained

Data accuracy is often one of the most critical components of a data quality assessment. We like to think of Accuracy as the data’s relationship with its original intent. For a data point to be considered accurate, the values – the content – of the underlying data must be correct and must be represented in a consistent and unambiguous form.

The data’s form is crucial to accuracy. Let’s start with an example: we may encounter date data in our database, such as 09/03/21 and 03/09/2021. Without knowing the intent, the raw data alone is ambiguous: 09/03/21 could mean September 3 or March 9. In this case, the database may have had the intent that dates be stored as MM/DD/YYYY.
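To make the ambiguity concrete, here is a minimal Python sketch that parses the same raw value under two plausible intents. The two format strings are assumptions chosen for illustration; nothing in the raw value tells us which one is right.

```python
from datetime import datetime

raw = "09/03/21"

# Intent A (assumed): month first, as in MM/DD/YY
as_month_first = datetime.strptime(raw, "%m/%d/%y").date()

# Intent B (assumed): day first, as in DD/MM/YY
as_day_first = datetime.strptime(raw, "%d/%m/%y").date()

print(as_month_first.isoformat())  # 2021-09-03
print(as_day_first.isoformat())    # 2021-03-09
```

Both parses succeed, so neither system would raise an error. Only knowledge of the intended format settles which date is accurate.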

The same problem appears when dates become timestamps. We may encounter a data point in the format yyyy-MM-dd’T’HH:mm:ss*SSSZZZZ, and a timestamp in another database technology that stores it as yyyy-dd-MM’T’HH:mm:ss*SSSZZZZ, and the two systems may be integrated in an interoperability workflow. Spot the difference? A simple flip of months and days again creates ambiguity in a simple but crucial data field, with many downstream implications. For the data’s content to make sense and to eliminate ambiguities, we must start with form and understand the intent of the data.
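The sketch below illustrates what can happen when one system writes year-day-month and another reads year-month-day. It is a simplified assumption: the formats omit fractional seconds and time zone, and the `classify` helper is hypothetical.

```python
from datetime import datetime

WRITER_FORMAT = "%Y-%d-%mT%H:%M:%S"  # assumed: year-day-month
READER_FORMAT = "%Y-%m-%dT%H:%M:%S"  # assumed: year-month-day

def classify(original: datetime) -> str:
    """Write with one format, read with the other, and report the outcome."""
    text = original.strftime(WRITER_FORMAT)
    try:
        parsed = datetime.strptime(text, READER_FORMAT)
    except ValueError:
        return "rejected"  # loud failure: easy to notice
    return "correct" if parsed == original else "silently wrong"

print(classify(datetime(2021, 3, 25, 8, 0, 0)))  # rejected
print(classify(datetime(2021, 3, 9, 8, 0, 0)))   # silently wrong
print(classify(datetime(2021, 5, 5, 8, 0, 0)))   # correct
```

The middle case is the dangerous one: any day of 12 or lower parses cleanly into a different, wrong date, so the error never surfaces on its own.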

Where Does Inaccurate Data Come From?

One of the largest sources of inaccurate data is manual data entry. Although human error may not seem like a big deal, a simple mistake can have a huge impact. A single keystroke could be the difference between a company believing a deal earned $100,000 and the reality that it earned only $10,000. Or, going back to our date example, a data entry form could allow users to type either 04/05/2021 or 05/04/2021 into a date field without any checks on entry, potentially causing accuracy issues.
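Checks at the point of entry can catch both kinds of mistake. The following is a minimal sketch, assuming a form that accepts dates only in year-month-day order and flags amounts far from a typical value; the function names, the chosen date format, and the factor of 5 are all hypothetical.

```python
from datetime import date, datetime

def parse_entry_date(text: str) -> date:
    """Accept only year-month-day input, an unambiguous order chosen for this sketch."""
    return datetime.strptime(text, "%Y-%m-%d").date()

def needs_review(amount: float, typical: float, factor: float = 5.0) -> bool:
    """Flag amounts far above or below a typical value (possible extra or missing digit)."""
    return amount > typical * factor or amount < typical / factor

try:
    parse_entry_date("04/05/2021")
except ValueError:
    print("rejected: please enter the date as YYYY-MM-DD")

print(parse_entry_date("2021-05-04"))         # 2021-05-04
print(needs_review(100_000, typical=10_000))  # True
```

A flagged amount is not necessarily wrong. The point is to ask a person to confirm it before it reaches downstream reports.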

As mentioned, inaccurate data also comes from a lack of standardization. The problem occurs when the same type of information is presented in multiple ways. This affects many kinds of data, including dates, street addresses, locations, business names, and customer names.
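As a small illustration of standardization, the sketch below maps several spellings of a business name to one canonical form. The suffix mapping and the `normalize_name` helper are assumptions for this example; a real rule set would be longer and domain-specific.

```python
import re

# Assumed suffix mapping for this sketch; a real list would be longer.
SUFFIXES = {"corp": "corporation", "inc": "incorporated", "co": "company"}

def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace, expand common suffixes."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(SUFFIXES.get(word, word) for word in words)

names = ["ACME Corp.", "Acme Corporation", "  acme   corp "]
print({normalize_name(n) for n in names})  # {'acme corporation'}
```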

Photo by Annie Spratt on Unsplash

Benefits of Data Accuracy

  • Improve Business Decisions
    Accurate data is the key to making business decisions. With accurate information, businesses can generate trusted insights to make reliable choices. Having the right information when making these decisions will benefit the overall growth and performance of the company. In short, better data means better decisions.
  • Maximize Resource Efficiency
    Accurate data lets businesses allocate time and energy towards relevant decision making.
  • Improve Customer Satisfaction
    Having accurate data gives businesses insight into how their customers are feeling and responding to their products or services. This is key to understanding customers’ needs and addressing them.
  • Meet Requirements
    For highly regulated industries, such as healthcare, finance, and government, it is critical that data is accurate. For example, consider a CT scan that was ordered with the wrong date. If the patient never got the scan or got it late, the result could be fatal.

Tips to Ensure Data Accuracy

01
Analyze, Analyze, Analyze

It’s best practice to repeatedly analyze data over time. As data grows, this can become too much for manual review, which is why businesses should adopt a comparison tool to automate this process.

02
Compare

Businesses should dive into their data and compare it for consistency across their various data stores. At Qualytics, we offer exactly that with our product Compare. It works with enhanced profiling data to ensure consistency between two data stores. Compare highlights not only schema differences but also underlying data quality problems, so businesses can focus and optimize their efforts.

03
Check

Businesses should regularly conduct data quality checks to find common issues. It’s especially important to be on the lookout for duplicate and/or incomplete data. Our product, Protect, enables users to continuously check the accuracy of data at rest or in flight.
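For a sense of what the simplest such checks look like, here is a minimal sketch that finds duplicate and incomplete records in a list of dictionaries. The field names and helper functions are hypothetical, and this is not how Protect is implemented.

```python
from collections import Counter

REQUIRED = ("id", "email", "signup_date")  # hypothetical required fields

def find_incomplete(records):
    """Return records missing a required field or holding an empty value."""
    return [r for r in records if any(r.get(f) in (None, "") for f in REQUIRED)]

def find_duplicates(records, key="email"):
    """Return key values that appear in more than one record."""
    counts = Counter(r[key] for r in records if r.get(key))
    return sorted(value for value, n in counts.items() if n > 1)

records = [
    {"id": 1, "email": "a@example.com", "signup_date": "2021-03-09"},
    {"id": 2, "email": "a@example.com", "signup_date": "2021-03-10"},
    {"id": 3, "email": "", "signup_date": "2021-03-11"},
]
print(find_duplicates(records))                     # ['a@example.com']
print([r["id"] for r in find_incomplete(records)])  # [3]
```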

04
Data Quality Assurance Program

Create a detailed plan that assures data quality by evaluating a number of key quality attributes. At Qualytics, we focus on the Q8: completeness, coverage, conformity, consistency, duplication, timeliness, volumetrics, and, of course, accuracy. Things to consider when creating a Data Quality Assurance Plan: data quality objectives, requirements, methods/procedures, structure, and actions users can take to correct quality issues.

How Does the Qualytics Data Firewall Address Accuracy?

Screenshot of Protect Feature of Qualytics’ Firewall

Qualytics Protect trains on the metadata collected from your data stores to learn what accurate values look like for your business. It then derives custom-tailored data quality checks to detect any anomalies in your systems and applies those checks continuously to identify inaccurate data in real time as it enters your data pipelines.
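One simple way to picture a check derived from historical data is a range learned from past values. The sketch below is an illustrative assumption, not a description of how Protect works internally; `derive_range_check` and the three-standard-deviation tolerance are hypothetical.

```python
from statistics import mean, stdev

def derive_range_check(history, tolerance=3.0):
    """Build a check from past values: accept values within
    `tolerance` standard deviations of the historical mean."""
    center, spread = mean(history), stdev(history)
    low, high = center - tolerance * spread, center + tolerance * spread
    return lambda value: low <= value <= high

history = [98, 102, 100, 101, 99, 100]
is_plausible = derive_range_check(history)
print(is_plausible(101))   # True
print(is_plausible(1000))  # False
```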

Pipeline integration examples: Prefect, Airflow, and Airbyte

Upon detection of an accuracy concern, Qualytics Protect can alert your Data Analysts in your favorite collaboration app, initiate automated responses using your Data Engineers’ favorite toolchains, or even quarantine the suspicious data records to prevent them from infecting downstream systems.
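The alert-and-quarantine idea can be sketched in a few lines. This is a generic illustration under stated assumptions, not Qualytics code: `route` is a hypothetical helper, the check is a stand-in for a derived quality check, and `notify` is a stand-in for a real alert channel.

```python
def route(records, check, notify):
    """Split records into accepted and quarantined lists, alerting on each failure."""
    accepted, quarantined = [], []
    for record in records:
        if check(record):
            accepted.append(record)
        else:
            quarantined.append(record)
            notify(f"Quarantined record: {record}")
    return accepted, quarantined

alerts = []
accepted, quarantined = route(
    records=[{"amount": 100}, {"amount": -5}],
    check=lambda r: r["amount"] >= 0,  # stand-in for a derived quality check
    notify=alerts.append,              # stand-in for email, Teams, or Slack
)
print(len(accepted), len(quarantined), len(alerts))  # 1 1 1
```

Keeping quarantined records in a separate list, rather than dropping them, leaves them available for review and correction.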

Automatically. Thoroughly. Continuously.

Alert examples: Email, Teams, and Slack

How Does the Qualytics Data Firewall Address Accuracy of a Data Migration?

Qualytics Compare ensures the accuracy of a data migration by using proprietary algorithms to generate deep profiles of your Source and Target datastores that are then analyzed to identify any inaccuracies that were introduced in the migration process. By employing a sophisticated metadata-backed abstraction layer, Compare can ensure accurate replication of data values even when the Source and Target systems use entirely different database technology. 
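The general idea of comparing profiles rather than raw rows can be shown with a small sketch. It assumes rows are available as dictionaries and uses a deliberately basic profile (row, null, and distinct counts plus min and max); Compare's proprietary algorithms are not public, so `profile` and `diff_profiles` are hypothetical helpers.

```python
def profile(rows, columns):
    """Summarize each column: row count, null count, distinct count, min, max."""
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary[col] = {
            "rows": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(present, default=None),
            "max": max(present, default=None),
        }
    return summary

def diff_profiles(source, target):
    """Return {column: {metric: (source_value, target_value)}} for every mismatch."""
    return {
        col: {m: (source[col][m], target[col][m])
              for m in source[col] if source[col][m] != target[col][m]}
        for col in source
        if source[col] != target[col]
    }

source_rows = [{"id": 1, "day": "2021-03-09"}, {"id": 2, "day": "2021-03-25"}]
target_rows = [{"id": 1, "day": "2021-09-03"}, {"id": 2, "day": None}]
cols = ["id", "day"]
print(diff_profiles(profile(source_rows, cols), profile(target_rows, cols)))
```

Here the `id` column matches, while `day` shows a new null and shifted min and max values, the kind of signal a swapped month and day would leave behind after a migration.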

Cloud Migration

Compare fully supports on-premises to cloud migrations, migrations between datastore vendors, and even migrations from database management systems to raw object stores. Built for “Big Data”, Qualytics Compare can easily handle data migrations of any size.

Intuitively. Efficiently. Correctly.

Qualytics is the complete solution to instill trust and confidence in your enterprise data ecosystem. It seamlessly connects to your databases, warehouses, and source systems, proactively improving data quality through anomaly detection, signaling and workflow. Learn more about the Qualytics 8 factors in our other blogs here – Accuracy, Duplication. Let’s talk about your data quality today. Contact us at hello@qualytics.co.