The Qualytics 8 – Consistency

What is Data Quality and the Qualytics 8? 

We spent a lot of time dealing with data quality issues before taking a stance and building Qualytics. Through our experiences, we have ultimately identified and defined the Qualytics 8 as the key dimensions that must be addressed for a trustworthy data ecosystem. We believe that to achieve comprehensive data quality, data should be assessed in 8 main categories.

The Qualytics 8

Qualytics uses these 8 fundamental categories to assess data quality.

  • Completeness – required fields are fully populated
  • Coverage – availability and uniqueness of expected records
  • Conformity – alignment of the content to the required standards, schemas, and formats
  • Consistency – values are the same across all data stores within the organization
  • Precision – data is at the resolution that is expected (how tightly can you define your data?)
  • Timeliness – data is available when expected
  • Volumetrics – data has the same size and shape across similar cycles
  • Accuracy – data represents the real-world values it is expected to model

Consistency Explained

According to DataCadamia, the definition of consistency with respect to data quality is: “…two data values drawn from separate data sets must not conflict with each other, although consistency does not necessarily imply correctness.”

Data consistency means that values are the same across all data stores within the organization. The enterprise environment is complex: data is extracted, loaded, moved, copied, and stored in various data stores across the board. Consistency encompasses the principles through which we can guarantee that data agrees between those data stores.

Consistency is often confused with Conformity. To clarify the difference between the two, let’s look at an example. Consider a typical banking system where credit card transactions are commonplace. The data that flows in daily from various sources must meet data type constraints – transaction amounts as integers, vendor as a string, zip code as alphanumeric, and so on – this is the data’s conformity. Ensuring this data has the same values across sources is its consistency – the raw incoming record in the credit card operational database is the same as the record in the data warehouse downstream.
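To make that distinction concrete, here is a minimal Python sketch – the record shapes, field names, and type rules are illustrative assumptions, not our actual checks – of a conformity check on a single record versus a consistency check between the operational database and the warehouse:

```python
import re

def check_conformity(record: dict) -> list[str]:
    """Return conformity violations for one credit card transaction record."""
    problems = []
    if not isinstance(record.get("amount"), int):
        problems.append("amount must be an integer")
    if not isinstance(record.get("vendor"), str):
        problems.append("vendor must be a string")
    if not re.fullmatch(r"[A-Za-z0-9 -]+", str(record.get("zip_code", ""))):
        problems.append("zip_code must be alphanumeric")
    return problems

def check_consistency(source_record: dict, warehouse_record: dict) -> list[str]:
    """Return fields whose values differ between the source and the warehouse copy."""
    return [field for field in source_record
            if source_record[field] != warehouse_record.get(field)]

# The record conforms in both stores, but one value drifted downstream.
source = {"txn_id": "T-1001", "amount": 4200, "vendor": "ACME", "zip_code": "30303"}
target = {"txn_id": "T-1001", "amount": 4000, "vendor": "ACME", "zip_code": "30303"}
print(check_conformity(source))           # [] -> the record conforms
print(check_consistency(source, target))  # ['amount'] -> inconsistent across stores
```

A record can pass every conformity rule in both stores and still be inconsistent, which is exactly the gap the second check covers.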

As we take an agile approach to software development, we also take an agile approach to the definition of our guiding principles – we learn and adapt every single day. This is why we decided that Duplication should be part of Consistency. Let’s talk a bit about why, and why it matters.

Duplication is the redundancy of records and/or attributes. There are many sources of data duplication, such as messy data and data ops complexity or misconfiguration. It can also be as simple as duplicate data entry into a source system. Another example is having more than one record per entity, depending on how the information was provided. This occurs when information can be entered in multiple ways, such as addresses that differ only in “drive” versus “dr.” A common scenario dealing with duplicate data is marketers using a CRM tool:

They may find multiple records for the same account, which leads to inaccurate reporting, faulty metrics, and a declining sender reputation.
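As a rough illustration of how such duplicates can be surfaced – this is a simplified sketch, not the Qualytics implementation, and the abbreviation rules and CRM field names are assumptions – free-text fields like street addresses can be normalized before records are compared:

```python
# Normalize free-text address fields so "123 Main Drive" and "123 Main Dr."
# collapse to the same key before grouping records.
ABBREVIATIONS = {"drive": "dr", "street": "st", "avenue": "ave", "road": "rd"}

def normalize_address(address: str) -> str:
    words = address.lower().replace(".", "").replace(",", "").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)

def find_duplicate_accounts(records: list[dict]) -> dict[str, list[dict]]:
    """Group CRM records whose normalized (name, address) keys collide."""
    groups: dict[str, list[dict]] = {}
    for record in records:
        key = record["name"].strip().lower() + "|" + normalize_address(record["address"])
        groups.setdefault(key, []).append(record)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}

crm = [
    {"name": "Acme Corp", "address": "123 Main Drive"},
    {"name": "ACME Corp", "address": "123 Main Dr."},
]
print(find_duplicate_accounts(crm))  # both records land in the same duplicate group
```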

What’s the Point?

  • Wasted Marketing Money – Example: a direct mail campaign that uses duplicate data may cost double, or more, due to sending multiple pieces to the same person.
  • ETL and Labor Costs 
  • KPI/Reporting/Audit Issues – KPIs based on duplicate records report incorrect aggregate numbers
  • Customer Service Engagement

Duplicate data rolls into Consistency because duplication breaks its core principle – data with duplicates is inconsistent between data stores.

How Does Qualytics Address Consistency?

Consistency is very relevant to data validation efforts in large data migrations. Because our Compare product runs a comparison of a source and target data store for equivalency, consistency is a major factor taken into account – duplicate or inconsistent records between the two would fail data validation.
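As a conceptual sketch of what an equivalency check after a migration can look like – this is not how Compare itself is implemented, and the table handling below is assumed – one simple approach compares row counts and order-independent per-column fingerprints between source and target:

```python
# Compare row counts and a per-column fingerprint between two tables.
import hashlib
from collections import Counter

def column_fingerprint(rows: list[dict], column: str) -> str:
    """Hash a column's values (including duplicate counts), independent of row order."""
    counts = Counter(str(row[column]) for row in rows)
    digest = hashlib.sha256()
    for value, count in sorted(counts.items()):
        digest.update(f"{value}:{count};".encode())
    return digest.hexdigest()

def compare_tables(source: list[dict], target: list[dict], columns: list[str]) -> list[str]:
    """Return findings that would fail validation: count mismatches or drifting columns."""
    findings = []
    if len(source) != len(target):
        findings.append(f"row count mismatch: {len(source)} vs {len(target)}")
    for column in columns:
        if column_fingerprint(source, column) != column_fingerprint(target, column):
            findings.append(f"values differ in column '{column}'")
    return findings
```

Because the fingerprint counts duplicate values, a row duplicated in either store surfaces as a mismatch just like an altered value.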

You may be wondering: what about data confidence after the migration? That’s where Protect comes in. The same principles apply to data in flight and at rest for anomaly detection, and they are taken into account as our platform infers data quality rules in its inductive learning process.

How Does Qualytics Address Duplication?

For an organization to prevent duplication from happening, data quality must be addressed across every aspect of the enterprise data ecosystem. De-duplication is the process of finding duplicate data and merging it into the best surviving record. The Qualytics Data Firewall automatically identifies the unique fields in your enterprise data and detects and responds to the introduction of duplicate values in those fields.
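As a simplified sketch of that idea – not the Data Firewall itself; the inference rule and field names below are assumptions – a system can infer which fields look unique from historical data and then flag incoming records that reuse a value in one of those fields:

```python
# Treat a field as "unique" if its historical values never repeat, then flag
# incoming records that reuse a value in any such field.
def infer_unique_fields(history: list[dict]) -> set[str]:
    return {
        field
        for field in history[0]
        if len({row[field] for row in history}) == len(history)
    }

def flag_duplicate_values(history: list[dict], incoming: dict, unique_fields: set[str]) -> list[str]:
    """Return the unique fields whose value in the incoming record already exists."""
    return [
        field
        for field in unique_fields
        if incoming[field] in {row[field] for row in history}
    ]

history = [
    {"customer_id": "C-1", "email": "a@example.com"},
    {"customer_id": "C-2", "email": "b@example.com"},
]
unique_fields = infer_unique_fields(history)  # {'customer_id', 'email'}
print(flag_duplicate_values(history, {"customer_id": "C-3", "email": "a@example.com"}, unique_fields))
# ['email'] – the new record reuses a value in a field expected to be unique
```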

Qualytics is the complete solution to instill trust and confidence in your enterprise data ecosystem. It seamlessly connects to your databases, warehouses, and source systems, proactively improving data quality through anomaly detection, signaling, and workflow. Learn more about the Qualytics 8 factors in our other blogs here – Accuracy, Timeliness, Conformity. Let’s talk about your data quality today. Contact us at hello@qualytics.co.
