Data Management: Getting Clean
By CIOinsight | Posted 08-01-2004
As long as there has been data, there have been errors. But at least in the good old days (way back in the 1980s, say), there was less data to manage. "The main source of customer data was basically the address to ship an item and an address to bill the customer," says Paul Kirby, a research director at AMR Research Inc. Today, data is considerably more complex. In addition to keeping track of vendors, suppliers, inventory and financial records, companies now track customers' buying habits, preferences and a host of other bits of data that make marketers salivate.
All of that drooling, of course, presumes that the numbers are accurate. But what if they're not? This May, Gartner Inc. released a startling statistic: More than 25 percent of critical data used in large corporations is flawed, due to human data-entry error, customer profile changes (such as a change of address) and a lack of proper corporate data standards. The result: soiled statistics, fallacious forecasting and sagging sales. What's more, the research firm says that through 2007 "more than 50 percent of data-warehouse projects will experience limited acceptance, if not outright failure, because they will not proactively address data-quality issues."
In other words, that major data-integration project that you championed for months, and on which you staked your reputation, could ultimately fail, damaging your credibility and giving upper management one more reason to blame the IT department.
While it may seem ironic that companies spend far more time and money analyzing data than ensuring it's accurate, Gartner analyst Ted Friedman isn't surprised. "People become enamored with data-analysis tools and make assumptions that the data is readily available and in good shape," he says. "They gloss over the data issues, and that's a recipe for failure."
Gartner isn't the only research company tracking the data-quality problem. The Data Warehousing Institute estimates that companies lose more than $600 billion every year thanks to poor data. Forty billion of that can be attributed to the consumer packaged goods industry and retail supply chain alone, says Kosin Huang, a senior analyst at The Yankee Group. According to a recent study by Forrester Research Inc., 37 percent of companies cite duplicate and overlapping files as significant data-management problems.
Dirty data can damage every aspect of your business. On the customer-service side, bad and out-of-date information can mean failed marketing promotions and angry customers. In the supply chain, poor product data can cause production bottlenecks and slow down delivery orders to retailers. And if you're a company that has to comply with, say, Wal-Mart Stores Inc.'s RFID mandate, flawed data increases the likelihood of glitches in that rollout, too. "Sure, you can track cases and different pallets," says Huang, "but you may be tracking the wrong thing."
The benefits of squeaky clean data are numerous (and, in many ways, obvious): Better customer data means improved customer service and a decreased risk of failed marketing promotions, not to mention new opportunities to cross- and up-sell. It also increases the accuracy of forecasting, makes your supply chain more efficient, and helps companies comply with federal regulations such as Sarbanes-Oxley.
"There's no doubt that a quality-checking process cannot be skimped," says J. Wadsworth, vice president of data services at MarketTouch, an Alpharetta, Ga.-based direct-marketing company. "If you're not doing it, you're throwing money to the wind."
At Hogan & Hartson LLP, a Washington, D.C.-based law firm with about 1,000 attorneys in roughly 20 offices around the globe, Bill Gregory recognized the need for better data quality when he took over as CIO in June 2003. His law firm had undergone rapid growth in a short period, and customer files and other data needed to be integrated and checked for accuracy. But getting lawyers and senior partners to support the initiative, and agree to own and manage the data, was a challenge. Gregory says the biggest obstacle in rolling out his strategy was getting employees to "understand the implications of their data and visualize how the quality of their data affects business processes. That takes a lot of effort." Gartner's Friedman says this is a common problem. "It's a thorny issue because who's really responsible for quality of data? People abdicate responsibility," he notes. "Many people think data is IT's problem, and no one wants to step up and say it's their responsibility."
But analysts, vendors and CIOs agree: While IT's job is certainly to facilitate a good data-cleansing strategy, it's the business unit that has to own and manage the data. They're the ones who use the data, after all. "IT knows how the systems work, but the business side knows the method and the madness of its own processes," says AMR's Kirby. "The business must be the ultimate judge, but it needs help from IT on a systems level."
Patrick Wise, vice president of advanced technology for Landstar System Inc., a multimillion-dollar transportation company, says it wasn't difficult to convince business managers of this. "They were terribly excited about having a single view of the business, so they were very willing [to own the data]," he says. His finance department sees the value as well, Wise says, adding, "It's a big relief that we have this in place now that Sarbanes-Oxley is an issue."
To ensure that data quality remains high, analysts agree that CIOs should appoint a data steward: a liaison in charge of managing all the data all the time. But they disagree on where the data steward should sit, and on how many of them should be employed. Friedman argues that business units should appoint many data stewards, each responsible for specific data sets. For example, a call-center manager might be tapped as the steward of all customer data in a particular region. That way, the function will be smaller and can be handled by an employee who's already close to the data. "You want to place the accountability as close to the user as possible," he says. Other analysts think companies should appoint an overall data czar who reports to the COO.
Data Quality Methodology
Improving data quality isn't just a desirable (and profitable) goal; it's a process, one that requires input and support from all areas of a business. Creating complementary and parallel processes not only saves time and gets better results (cleaner data), it fosters alignment between IT and the business units IT serves. But remember: The responsibility for data quality ultimately lies with the leaders of the business units, not the IT department.
Source: The Data Warehousing Institute
However the data-steward issue is solved, it's imperative for IT to sit down with members from each business unit in order to understand where the greatest data-quality problems are and decide upon standards for data handling. The goal here is to come to consensus on who owns which data, what types of information should be included in each record, how the data should be entered and how relationships between records should be defined.
At Emerson Process Management, a $3.2 billion operating unit of Emerson, Nancy Rybeck, the division's customer-data warehouse-strategy architect, was tasked with building a data warehouse after an earlier attempt had failed. The division, which produces valves, pressure devices and software for power plants, refineries and food and beverage manufacturers, has 14 subdivisions, more than 10,000 employees and over 100,000 customers around the world. While each division makes separate products, their client base often overlaps, and because each division owned its own data, there was no way to get an overall view of the company's business.
Three years ago, Rybeck began an effort to consolidate and clean the division's customer database. It was a significant challenge, she says. "Conceptually it doesn't sound that difficult, but you can't tell just by looking at the information how these pieces all match up."
To make it happen, Rybeck first spoke to all the business unit heads in order to get a better understanding of their processes, the kind of data they needed, and how that data could best be delivered. Then she created a relational data model to meet those parameters. Finally, Rybeck took in the data from the various units and began the long process of cleaning, consolidating and "de-duplicating," using software from Group 1 Software Inc., and appointing a data steward in each business unit to coordinate the effort. Today, the company is able to look at its customers from many angles. "We have a report that shows all the business we have done with a client, broken down by division. Then we can break it down by world area, then by country, then by site," she says.
Keeping the data clean is a constant effort. The system automatically processes 1.7 million addresses per month, some of which are reviewed by individuals. Software does much of the work, but the reviewing process obviously still requires a lot of manual labor. "Some data requires actual eyes," she says, especially when dealing with information about companies in foreign countries. And finding the resources to make that happen can be tough. "You can't just clean the data in one pass. It takes a manual review in some places, and it might take several passes through the software to get it straight," she says.
It's also important to decide when data gets cleaned, and how often. Some companies deploy cleansing software for one-time events, such as a marketing promotion, so data is cleaned when a particular department needs the information. While analysts admit that many companies still approach data quality this way, nearly all warn that one-stop cleaning is a waste of time. "Data quality degrades," says Friedman. "It has a half-life, like radioactive material, depending on the business activity. In certain situations quality will degrade faster than in others. So you can't cleanse once and stop, you have to do it on a constant basis."
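Friedman's half-life comparison can be turned into rough arithmetic. The sketch below assumes data accuracy decays exponentially, which is an illustrative reading of the metaphor rather than a model Gartner published, and the half-life figures in it are invented placeholders, not measurements.

```python
def fraction_still_accurate(months_elapsed, half_life_months):
    """Share of records still accurate, assuming simple exponential decay."""
    return 0.5 ** (months_elapsed / half_life_months)


# Hypothetical half-lives in months; real values would have to be measured
# for each data set and business activity.
ASSUMED_HALF_LIVES = {"mailing addresses": 36, "contact phone numbers": 24}

for data_set, half_life in ASSUMED_HALF_LIVES.items():
    share = fraction_still_accurate(12, half_life)
    print(f"{data_set}: {share:.0%} still accurate after 12 months")
```

Under this assumption, a data set with a shorter half-life needs a shorter cleansing cycle to stay above any given accuracy target, which is the practical point of cleaning on a constant basis.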
A number of cleansing-software vendors have appeared within the last five years, and their offerings are largely similar. Some focus specifically on CRM applications, while others also offer tools for the supply chain and inventory management. Regardless of which you pick, analysts say data-cleansing software should include several important tools: data profiling, which helps analyze data and find inconsistencies; parsing, which identifies different types of data and puts them in specific fields; standardization, which ensures consistency throughout the data; verification, comparing customer data against a universal master such as the U.S. Postal Service; matching, which links files that are related; and consolidation, which eliminates duplicate entries.
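To make those six functions concrete, here is a minimal Python sketch of each one applied to three invented customer records. It is not how any vendor's product works: the sample records, the abbreviation tables and the KNOWN_ZIPS reference list are all assumptions made for illustration, and the matching rule (exact equality after standardization) is far cruder than commercial matching.

```python
import re
from collections import Counter

# Hypothetical sample records; every value here is invented for illustration.
RAW = [
    "Acme Corp.; 12 Main Street; Springfield; 62701",
    "ACME Corporation; 12 Main St; Springfield; 62701",
    "Globex Inc; 400 Oak Avenue; Shelbyville; ",
]

# Stand-in for a reference master list such as a postal database.
KNOWN_ZIPS = {"62701", "62565"}

SUFFIXES = {"corp": "corporation", "inc": "incorporated", "co": "company"}
STREET_WORDS = {"st": "street", "ave": "avenue", "rd": "road"}


def parse(line):
    """Parsing: split a delimited line into named fields."""
    name, street, city, zip_code = [part.strip() for part in line.split(";")]
    return {"name": name, "street": street, "city": city, "zip": zip_code}


def _normalize(text, table):
    # Lowercase, drop punctuation, then expand known abbreviations.
    words = re.sub(r"[^\w\s]", "", text.lower()).split()
    return " ".join(table.get(word, word) for word in words)


def standardize(rec):
    """Standardization: apply one spelling convention to names and streets."""
    out = dict(rec)
    out["name"] = _normalize(rec["name"], SUFFIXES)
    out["street"] = _normalize(rec["street"], STREET_WORDS)
    out["city"] = rec["city"].lower()
    return out


def profile(records):
    """Profiling: count empty values per field to show where the gaps are."""
    missing = Counter()
    for rec in records:
        for field, value in rec.items():
            if not value:
                missing[field] += 1
    return dict(missing)


def verify(rec):
    """Verification: check the ZIP code against the reference list."""
    return rec["zip"] in KNOWN_ZIPS


def match_key(rec):
    """Matching: records that share this key are treated as one customer."""
    return (rec["name"], rec["street"], rec["zip"])


def consolidate(records):
    """Consolidation: keep the first record seen for each match key."""
    merged = {}
    for rec in records:
        merged.setdefault(match_key(rec), rec)
    return list(merged.values())


parsed = [parse(line) for line in RAW]
clean = [standardize(rec) for rec in parsed]
gaps = profile(clean)
unverified = [rec["name"] for rec in clean if not verify(rec)]
customers = consolidate(clean)
```

In this toy run the two Acme entries collapse into one customer only because standardization happens before matching; run the steps in the other order and the duplicates survive. The Globex record, which has no ZIP code, shows up both in the profile and in the unverified list, the kind of record that would be routed to a human reviewer.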
Meanwhile, realizing that the failure of so many CRM rollouts can be attributed to bad data, many CRM vendors are getting into the data-cleansing game themselves. Their offerings are limited to customer data, however, and are tightly woven into the software, which won't help if you're looking to clean financial files and product records.
In fact, you may find yourself partnering with more than one vendor, depending on the tools they provide. Gregory, of Hogan & Hartson, says his company uses a combination of cleansing tools from DataFlux Corp. and CRM provider Interface Software Inc. "I think they can and should co-exist," he says, "although ultimately there needs to be a central data-management function that is aware of and coordinates use of these types of products."
Patrik Riese, director of CRM at Saab Cars USA Inc., says his data-cleansing initiative uses software from both Firstlogic Inc. and Siebel Systems Inc., depending on which data needs to be cleaned, and how. Firstlogic provides the automaker with census data that helps the company match potential customers with the appropriate dealer in their area. "We used to do this by zip code, but we had all sorts of problems with that method because sometimes zip codes overlap. We have now moved to a much more finite method."
In short, if you're looking to create master data files that can be repurposed for several packaged applications (such as CRM, sales force automation, billing, marketing, etc.), make sure the software addresses that area of the business. Find out which CRM and inventory-management companies potential cleansing vendors have relationships with. Most cleansing companies have now been bought by larger companies (DataFlux was acquired by SAS in 2000, for example, and Trillium Software was recently purchased by Harte-Hanks Inc.), but as always, check the financial status of a vendor before you buy their software.
Data cleansing will cramp your budget, no question. And it's not only the cost of the software, which analysts say can range anywhere between $100,000 and $500,000 depending on company size and the amount of data you need to clean. To that, add any custom tools, and don't forget to tack on the cost of employee labor and training. Rybeck, of Emerson Process Management, says her rollout cost roughly $250,000, while Gregory of Hogan & Hartson says his company has spent roughly $100,000 thus far. And though vendors say installing the software should take from three to six months, it will likely take much longer to actually clean the data. It's a significant commitment, which is why most vendors right now cater to large-cap companies. For smaller companies, says Friedman, an outsourced approach might be the way to go. "A midsize company may not have the skills or the resources to be successful" with an in-house initiative, he says. But regardless of company size, "the most important thing is to think of data quality in a broad way, not just in the context of CRM or data warehousing, but as a business discipline."
Even so, as much money and time as it requires, data cleansing isn't going to be hard to justify once business managers understand the rewards. While companies can't always put a dollar figure on the return on their data-cleansing investment, they can cite better control over, and lower costs for, their marketing campaigns; the ability to cross- and up-sell valuable customers; improved supply chain efficiency; and reduced risk.
Riese at Saab Cars, for example, says his data-cleansing initiative has saved the company tens of thousands of dollars because they can now do customized marketing campaigns in-house instead of hiring a third party. And, he says, more accurate matching of potential customers to dealers has helped the company increase sales. "The entire process of assigning a lead to a dealer and then following up on the back end, we have that nailed now. Last year was the best sales year in Saab's history, and I think what we have done has contributed to that."
And it doesn't take a genius to figure out that better data means better forecasting. "Data quality is the backbone of business intelligence," says Gartner's Friedman. Wise of Landstar says his data-quality rollout, which includes cleaning as well as standardizing (or normalizing) data across the company's six business units, will ultimately enable the company to move to a single-customer-view model. This will not only help the company create greater sales opportunities and a more comprehensive view of the company's top customers, it will also reduce risk. "We provide a service and bill later, so if you're extending credit without a single view of the customer, you don't know what your exposure is." Wise says a single view of his company's financial systems is already complete, and his next target is Landstar's customer database, which has more than 15 years' worth of data on hundreds of thousands of customers, many of which actually belong to the same parent company. Wise won't reveal how much he expects to save, but says "it's a significant strategic decision," adding that his department's data-management team has tripled in size in the past three years alone.
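Wise's point about credit exposure comes down to simple arithmetic once accounts are linked to their parent companies. The Python sketch below is an illustration only, not Landstar's system: the account names, parent links, balances and the $75,000 credit limit are all invented.

```python
from collections import defaultdict

# Hypothetical accounts: (account name, parent company, open receivables in dollars).
ACCOUNTS = [
    ("Acme Freight East", "Acme Holdings", 40_000),
    ("Acme Freight West", "Acme Holdings", 55_000),
    ("Globex Logistics", "Globex Group", 30_000),
]


def exposure_by_parent(accounts):
    """Sum open receivables across every account that shares a parent."""
    totals = defaultdict(int)
    for _name, parent, open_balance in accounts:
        totals[parent] += open_balance
    return dict(totals)


def over_limit(totals, credit_limit):
    """Return the parents whose combined exposure exceeds the limit."""
    return sorted(parent for parent, total in totals.items() if total > credit_limit)


totals = exposure_by_parent(ACCOUNTS)
flagged = over_limit(totals, credit_limit=75_000)
```

Viewed one account at a time, nothing here exceeds the assumed limit. Only when the two Acme accounts are rolled up to their shared parent does the combined exposure stand out, and that rollup is possible only if the parent links in the data are correct.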
For Wadsworth of MarketTouch, the direct marketing company, the return on investment is repeat business. "Our customers wouldn't come back to us if our data were wrong."