As we move from the last decade of the 20th century, when the focus was on deploying IT solutions, into the 21st century, the Information Age, we realize how data both enables and cripples companies and industries. The focus of the ERPs, CRMs and various other systems deployed in the 20th century was to integrate systems and collect data. The focus of professionals in the current century is to gather insights from the data collected in those databases and warehouses. As the focus on Data Analytics and Data Science increases, enterprise companies are increasingly realizing the damage caused by poor data quality. The Data Warehousing Institute estimates that data quality problems cost US businesses $600 billion a year.

Organizations can alienate and ultimately lose customers by addressing emails inaccurately or failing to recognize customers when they call. Every lost customer is lost future revenue. It is time for companies to view data as a critical resource and make a corporate commitment to managing data quality: establishing a program with dedicated people, processes, systems and tools to achieve high data quality. Although data quality problems exist in every business unit of a company, this white paper focuses specifically on Customer Data Quality, which has a direct impact on sales and thus revenue.

What is Data Quality?

While we discuss the causes, impacts and solutions for poor data quality, it is important to first define what data quality is and what its constituent attributes are. Data quality is not just data that is free of errors; incorrect data is only one characteristic of bad data. A complete definition of Data Quality comprises the six attributes listed below.

Attributes of Data Quality:

Data quality is not just limited to the accuracy of raw data, but a list of attributes that characterize the quality of data:


1. Accurate: Data should accurately represent reality or a verifiable source.
2. Consistent: Data elements should be consistently defined and understood.
3. Complete: All the necessary data should be present.
4. Valid: All data values should fall within acceptable ranges defined by the business.
5. Real time: Data should be available when needed.
6. Accessible: Data should be easily accessible, understandable and usable.
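Two of these attributes, completeness and validity, lend themselves to simple programmatic checks. The sketch below is a minimal illustration, not a prescribed implementation; the field names, required-field list and country whitelist are hypothetical assumptions:

```python
from datetime import date

# Hypothetical customer record; field names are illustrative only.
record = {
    "name": "Acme Corp",
    "email": "billing@acme.example",
    "country": "US",
    "last_updated": date(2024, 1, 15),
}

REQUIRED_FIELDS = {"name", "email", "country", "last_updated"}  # Complete
VALID_COUNTRIES = {"US", "CA", "GB", "DE"}  # Valid: a business-defined range

def completeness(rec):
    """Fraction of required fields that are present and non-empty."""
    present = sum(1 for f in REQUIRED_FIELDS if rec.get(f))
    return present / len(REQUIRED_FIELDS)

def validity(rec):
    """Fraction of value checks that fall within business-defined ranges."""
    checks = [
        rec.get("country") in VALID_COUNTRIES,
        "@" in (rec.get("email") or ""),
    ]
    return sum(checks) / len(checks)

print(completeness(record))  # 1.0
print(validity(record))      # 1.0
```

In practice each attribute would carry many more rules; the point is that the attributes above can be turned into measurable checks rather than left as abstract goals.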

Customer Data Quality

According to Gartner, the amount of prospect and customer data in a business-to-business organization typically doubles every 12 to 18 months. Further, between 10 and 25 percent of customer and prospect data contains critical errors, ranging from inaccurate demographic data to a missing current disposition. Organizations need to maintain high data quality to ensure the efficient operation of the CRM systems on which they have spent millions of dollars. While CRM systems act as integrators, their output is only as good as the quality of the data they take as input. Maintaining high customer data quality also ensures a meaningful and accurate set of customer metrics. Achieving and maintaining the highly integrated, consistent customer data necessary for CRM success is not an easy task. Poor customer data quality may also lead to inaccurate sales forecasting, which sets inaccurate goals for the entire organization; lost opportunities for new sales, upsell and cross-sell; and incorrect sales compensation allocation, which in turn erodes employee morale. To understand customer data better, it is important to understand the architecture for customer data in an enterprise organization.

Customer Data Architecture

The Customer Data Architecture provides a foundation for collecting, storing and consuming customer information. It defines the systems, databases, data flows and data-consuming systems used in a typical enterprise organization. The Customer Data Architecture shown below is used as a baseline to identify and detail Customer Data Quality problems, and then to identify solutions to those problems.


Sources of Poor Customer Data Quality

Now that we have focused on customer data quality, and briefly on the impacts that poor customer data quality can have, let us look at some common sources of poor customer data quality:

1. Human error factor: Most data pollution issues start with inaccurate data entry. The more manual the data entry process, the more inconsistent, inaccurate and incomplete the data. Since sales is a relationship-driven activity, most data entry into CRM or ERP systems is manual. An added complexity is the resistance of sales agents to new processes and systems.

2. Multi-channel data inconsistency: Since sales organizations focus on making it easier for customers to purchase products and services, B2B sales business units often create multiple touch points to serve the customer. Customer data therefore resides in multiple systems such as CRM, point-of-sale and billing systems. Coupled with an organization's efforts to purchase customer data from, or validate it against, 3rd Party providers, this makes achieving high-quality customer data a nightmare.

3. Multiple data stores: Supporting the multiple source systems are multiple data stores, which have their own integration limitations along with differing data formats, attribute definitions and table structures.

4. Data definition and validation: In all probability, each of the systems identified in the Customer Data Architecture is used by a different team, which implies that each team has its own definition of the data and a different set of validation rules governing data integration. It becomes challenging for any CRM system and sales team to create a consistent customer definition from this chaotic data architecture.

5. Business process breakdowns: While data migration and conversion projects are performed with the good intent of reducing data discrepancies, the ETL tools that move data from one system to another also generate defects. Systems integrators may convert databases from one format to another yet fail to integrate the different business processes. This breakdown becomes another source of data pollution.

6. Dynamically changing dimensions: Mergers and acquisitions, which are commonplace in the enterprise IT world, pose another challenge for data consistency. The intent of an M&A might be to acquire a product or a new demographic, but the challenge of integrating the acquiring company's systems with the acquired company's systems creates a maze of data that is difficult to navigate. Another case that significantly impacts customer data is M&A within the client base. If, for example, company A merges with company B, and both were customers of firm C, it is a huge task for firm C to collate the sales and customer data of A and B to ensure ongoing opportunity mapping and a clear view of future upsell and cross-sell opportunities.
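The multi-channel inconsistency described above often shows up as the same customer recorded differently in each system. A minimal sketch of how normalization can reveal that three records describe one customer is shown below; the system names, field names and normalization rules are illustrative assumptions, not a complete matching algorithm:

```python
# Hypothetical records for the same customer from three systems.
crm = {"name": "ACME Corporation", "phone": "+1 (555) 010-2000"}
pos = {"name": "Acme Corp.", "phone": "555-010-2000"}
billing = {"name": "acme corporation", "phone": "15550102000"}

def normalize_name(name):
    """Lowercase, strip punctuation and common corporate suffixes."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    for suffix in (" corporation", " corp", " inc", " ltd"):
        if cleaned.endswith(suffix):
            cleaned = cleaned[: -len(suffix)]
    return cleaned.strip()

def normalize_phone(phone):
    """Keep digits only and drop a leading country code."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[-10:]  # assumes 10-digit national numbers

records = [crm, pos, billing]
keys = {(normalize_name(r["name"]), normalize_phone(r["phone"])) for r in records}
print(len(keys))  # 1 — all three rows collapse to a single customer key
```

Real-world matching needs fuzzier techniques, but even simple canonicalization like this exposes how much apparent "customer growth" is actually duplication across channels.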

In summary, making it easier for the customer to purchase products in turn poses the risk of diminishing customer data quality for the organization, which increases operational costs and jeopardizes future sales opportunities.

Achieving High Customer Data Quality

The end goal for every enterprise company is to achieve high Customer Data Quality. The more structured the approach to this goal, the sooner it is reached. The recommendations below are the result of years of implementation and consulting effort, using methods that lead to Data Quality success:

1. Creation of a Data Quality Team: Since data inflow and consumption is an ongoing activity with an ever-increasing scope, the Data Quality improvement and maintenance effort should also be ongoing and continuous. It is recommended to identify data owners for customer data who manage a team of data stewards. The data stewards continually propose process, policy and system improvements with the dual goals of improving data quality and work efficiency. Data custodians then implement the changes the data stewards recommend. In most enterprise firms, the data custodians are the IT team, who align the data architecture with the recommendations of the data stewards and owners.

2. Profiling data: Creating a weighted-average metric to measure data quality gives an organization a perspective on the current data quality situation and on its short- and long-term data quality goals. Data quality metrics can be built from the attributes mentioned above: the data should be accurate, consistent, complete, valid, real-time and accessible. For each firm, the importance, availability and requirements for the data against these attributes will differ. Once the data is measured against these metrics, data attributes can be placed in red, yellow and green zones, which further helps identify the sources of data pollution. As an example, a metric table can be created to track customer data quality on an ongoing basis. As more fields are added, the metric calculation becomes more robust and the picture of the current quality of customer data more accurate.
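The weighted-average metric and zoning described above can be sketched as follows. The per-attribute scores, the weights and the zone thresholds are hypothetical; in practice each firm's data stewards would set them:

```python
# Hypothetical per-attribute scores (0-100) and weights (summing to 1.0).
scores = {"accurate": 92, "consistent": 70, "complete": 55,
          "valid": 88, "real_time": 60, "accessible": 95}
weights = {"accurate": 0.30, "consistent": 0.15, "complete": 0.20,
           "valid": 0.15, "real_time": 0.10, "accessible": 0.10}

def quality_score(scores, weights):
    """Weighted-average data quality metric across the six attributes."""
    return sum(scores[a] * weights[a] for a in scores)

def zone(score, red_below=60, green_from=85):
    """Classify a score into red / yellow / green bands."""
    if score < red_below:
        return "red"
    return "green" if score >= green_from else "yellow"

overall = quality_score(scores, weights)
for attr, s in scores.items():
    print(f"{attr:>10}: {s:3d} ({zone(s)})")
print(f"   overall: {overall:.1f} ({zone(overall)})")  # 77.8 (yellow)
```

Here "complete" lands in the red zone, immediately pointing the stewards at missing-field problems as a priority source of data pollution.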


3. Data Quality Remediation: Remediation of poor data quality needs a three-pronged approach: improving quality at the source, improving quality at the storage and consumption level, and remediating pre-existing data pollution. All three are explained below:

a. Improving data at source: Since the biggest source of data quality errors is manual data entry, more automation and digitization is the most likely solution. With more and more automation solutions available in the market, investing in solutions that suggest customer names, addresses, contact information, etc. to sales agents would reduce manual entry and thus reduce inconsistency and inaccuracy in the data.
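A minimal sketch of such entry-time assistance: as an agent types a company name, suggest canonical names from a master list instead of accepting free text. The customer list and matching rule are illustrative assumptions:

```python
# Hypothetical canonical customer master list.
KNOWN_CUSTOMERS = ["Acme Corporation", "Globex Inc", "Initech Ltd"]

def suggest(prefix, candidates=KNOWN_CUSTOMERS, limit=5):
    """Case-insensitive prefix match against the canonical customer list."""
    p = prefix.lower()
    return [c for c in candidates if c.lower().startswith(p)][:limit]

print(suggest("ac"))  # ['Acme Corporation']
```

Because the agent picks from a controlled list rather than retyping the name, every downstream system receives the same canonical spelling.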

b. Improving data at storage: As mentioned before, enterprise IT sales cycles will always comprise multiple sales channels that use multiple systems. The data from these systems is inevitably stored in multiple databases, which may be inconsistent with one another. Changing this situation requires a change to the data architecture: a middleware layer that acts as a data filter between data storage and data-consuming systems. This middleware houses the business rules applicable to the data for each consuming system, depending on its needs. Additionally, a middleware layer avoids the cost of integrating multiple data sources pairwise and the further data mess created by the ETL tools used for those integrations.


c. Improving pre-existing data pollution: While the two steps above improve the situation going forward, it is important to look back at the mistakes of the past and rectify them. Creating a write-back program that determines the sources of polluted data and establishes methodologies to write higher-quality data back to the source, or to the middleware layer, is critical to a long-lasting effect on Data Quality. In addition to a write-back program, a continuous cleanup effort for pre-existing polluted data paves the way to high-quality data.


In conclusion, achieving and then maintaining high-quality customer data is not beyond any company. The more initiative an organization takes in this matter, the higher the rewards it will reap through reduced wastage, increased sales and improved customer loyalty. Technology can help automate most of the recommendations provided. The most effective data quality improvement initiatives are driven by the leaders of the organization, which ensures vision, commitment and drive. Truly, the time to start these initiatives is now!
