Let’s Spend Some Data Quality Time Together

Gartner estimates that poor data quality costs an average organization $13.5 million per year. Drivers of these costs include lack of a common language for business information, independent data maintenance, and multiple versions of the truth.

Associations face similar challenges as they work with data from multiple specialized systems, often including an AMS, event registration, survey management, social media, and email marketing applications. Customer data such as demographics, job function, career stage, and company name is vital to associations as it drives pricing strategies, product offerings, and ultimately customer value.

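To illustrate the impact of data quality, let’s suppose that each customer record contains 50 attributes and that each attribute is inaccurate, independently of the others, with a probability of 5%. The probability that a given record is completely accurate is then: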

$$(1 - 0.05)^{50} \approx 0.077$$

In this example, fewer than 10% of customer records are complete and accurate!
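The arithmetic above can be checked in a few lines of Python. This is only a sketch: it assumes that errors in the attributes are independent of one another, which the example implies but does not state, and the function name is made up for illustration.

```python
def fraction_fully_accurate(num_attributes: int, error_rate: float) -> float:
    """Probability that every attribute in a record is accurate,
    assuming attribute errors are independent of one another."""
    return (1.0 - error_rate) ** num_attributes


if __name__ == "__main__":
    # 50 attributes, each wrong 5% of the time (the figures from the example).
    share = fraction_fully_accurate(50, 0.05)
    print(f"Expected share of fully accurate records: {share:.1%}")
```

Trying other values shows how quickly the share falls as records get wider: halving the error rate helps far more than trimming a few attributes.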

Technology solutions that combine data directly from source systems for analytics are particularly susceptible to the high cost of poor data quality. Complete and accurate data is also needed to leverage many predictive analytics and data mining techniques to ensure accurate data-guided decisions.

Like other business initiatives, successful data quality requires an optimal combination of people, process, and technology to serve as a foundation for successful association analytics.

Process

The activities that occur during the “collect” step of DSK’s methodology address data quality. The data source inventory identifies the location and specifics of the data used to answer business questions, along with reference data such as standard job titles and demographic values. The master data management policy includes rules for acceptable value ranges, confidence thresholds for automatic linking, procedures for adding allowable values, and monitoring strategies. The dictionary of common business terms maps to this information and further communicates a shared understanding of data.
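As a rough illustration of what policy rules for allowable values and acceptable ranges might look like once written down, here is a minimal Python sketch. The field names, allowable values, and ranges are invented for the example and are not taken from any actual policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    """One rule from a hypothetical master data management policy."""
    allowed_values: Optional[frozenset] = None  # reference data, if any
    min_value: Optional[float] = None           # inclusive lower bound
    max_value: Optional[float] = None           # inclusive upper bound


# Illustrative policy only; field names and values are assumptions.
POLICY = {
    "career_stage": FieldRule(
        allowed_values=frozenset({"student", "early", "mid", "senior"})
    ),
    "years_in_field": FieldRule(min_value=0, max_value=70),
}


def validate(record: dict) -> list:
    """Return a list of human-readable policy violations for one record."""
    problems = []
    for name, rule in POLICY.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
            continue
        if rule.allowed_values is not None and value not in rule.allowed_values:
            problems.append(f"{name}: {value!r} is not an allowable value")
        if rule.min_value is not None and value < rule.min_value:
            problems.append(f"{name}: {value} is below {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            problems.append(f"{name}: {value} is above {rule.max_value}")
    return problems


if __name__ == "__main__":
    print(validate({"career_stage": "guru", "years_in_field": -1}))
```

Keeping the rules in one data structure, separate from the checking code, is one way to support the policy's procedure for adding allowable values without touching the validation logic.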

The initial process is iterative, as cleaning and deduplication techniques are applied to historical data while allowable values and thresholds are confirmed. Since the best way to minimize data quality issues is to prevent them at the source, close collaboration with source system groups establishes data entry standards as part of data governance. Ongoing processes incorporate these results to immediately improve data analytics, increase efficiency, and ensure accurate capture of data history for slowly changing dimensions.

Data quality activities also provide immediate benefits through improved operational efficiency by reducing time-consuming and tedious tasks of correcting data and identifying duplicate records.

Technology

Important association data often consists of free-form text, such as company names and job titles, that is created and maintained by customers, sometimes in multiple systems. This data cannot simply be cleaned using basic data queries that rely on exactly matching discrete values.

Leading Extract, Transform, and Load (ETL) tools such as Microsoft SQL Server Integration Services (SSIS) provide important data quality tools that leverage text matching algorithms to clean data.

Data quality features include:

Data Quality Services: Knowledge-driven approach providing interactive and automated techniques to manage data quality using domains.
Fuzzy Lookup: Comparison of values against reference data to create similarity and confidence measures to automatically link data or flag for manual review.
Fuzzy Grouping: Creation of groups of candidate duplicate records assigned probability scores.
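To make the ideas behind fuzzy lookup and fuzzy grouping concrete, here is a small Python sketch. It is not how SSIS implements these features: it substitutes the standard library's `difflib.SequenceMatcher` for the product's matching algorithms, and the reference list, thresholds, and function names are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical reference data and thresholds, for illustration only.
REFERENCE_COMPANIES = ["Acme Corporation", "Globex Incorporated", "Initech"]
AUTO_LINK_THRESHOLD = 0.90  # at or above: link automatically
REVIEW_THRESHOLD = 0.60     # at or above, but below auto-link: manual review


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_lookup(value: str, reference: list) -> tuple:
    """Return (best_match, score, action) for one free-form value."""
    best = max(reference, key=lambda candidate: similarity(value, candidate))
    score = similarity(value, best)
    if score >= AUTO_LINK_THRESHOLD:
        action = "link"
    elif score >= REVIEW_THRESHOLD:
        action = "review"
    else:
        action = "no_match"
    return best, score, action


def fuzzy_group(values: list, threshold: float = 0.85) -> list:
    """Greedily group values that resemble the first member of an existing group."""
    groups = []
    for value in values:
        for group in groups:
            if similarity(value, group[0]) >= threshold:
                group.append(value)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([value])
    return groups


if __name__ == "__main__":
    for raw in ["ACME Corporation", "Acme Corp", "Unrelated Name"]:
        print(raw, "->", fuzzy_lookup(raw, REFERENCE_COMPANIES))
    print(fuzzy_group(["Initech", "initech.", "Globex Incorporated"]))
```

The two thresholds mirror the policy idea of linking automatically only when confidence is high and routing borderline matches to a person. The greedy grouping is deliberately simple; its result can depend on input order, so a production approach would likely need something more robust.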

An added benefit of these tools is that data quality improves at an increasing rate as the knowledge base of domain data and matched values grows, essentially creating a self-learning system. The technology also maintains audit data, which allows data quality to be treated as a business area in its own right and to leverage the same data analytics and Tableau visualization tools as other association business areas.

People

Incorporating data quality as a priority within the data analytics process enhances trust in data, demonstrates tangible benefits, improves efficiency, and strengthens adoption of data analytics. Like data-guided decisions, the core of these benefits is people.

DSK understands the cost of not focusing on data quality. That’s why we include data cleaning as a core part of our proven process to position your association for success with data analytics.
