Top predictive analytics hurdles: creating the proper statistical algorithms and finding, sanitizing and accessing the needed data.
Implementing predictive analytics requires either a statistical analyst who understands your business and your data, or a team of people who can work together to combine data, business rules and business strategy.
When Yorkshire Water, the water and sewer utility serving the Yorkshire region of England, decided to use predictive analytics to estimate the likelihood of failure in any of its pipes, "we had to create a pretty robust proof of concept first," says Melvin Parkinson, an IT specialist at the utility. "And to do that, we needed to get our operations people together with technical people like myself and people with statistical knowledge."
Yorkshire Water, which provides 2.2 million households and 140,000 businesses with water and sewer services, decided it didn't have sufficient in-house expertise. So it turned to SPSS's consulting services to help implement the firm's data-mining and predictive-analytics software, Clementine.
Predicting the likelihood of a pipe failure would allow the utility to fix the problem before a pipe burst. The utility's operations people already knew what factors increase the likelihood of flooding: the number of cellared properties, the length of sewers, rainfall, minor and major excavation work in the area, age of pipes, and dozens more. But when deciding which parts of the system needed work, most operations people could weigh only about four of those factors at once in making their recommendations.
The work with SPSS's statistical consultants did not always flow smoothly. "It was very time-consuming to explain our business and maintenance methods, and have the consultants program that into statistical form, test it, and then fix things that didn't work well with the program," says Parkinson.
But as a result, Yorkshire Water now has an application that can analyze more than 120 factors, and ranks sections of the utility's system by risk of failure on a weekly basis. "At this point, the system is pretty much self-working, although we are always refining it a bit here and there," says Parkinson.
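Yorkshire Water's application ranks sections of its system by risk of failure using more than 120 factors. The article does not describe the model SPSS built, so the following is only a minimal sketch of the general idea, assuming a logistic-regression-style score: each section's factor values are combined with weights into a failure probability, and sections are sorted from highest to lowest risk. The four factor names, the weights and the bias are all hypothetical; a real model would learn its weights from historical failure records rather than use hand-picked values.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PipeSection:
    section_id: str
    # Hypothetical factor values keyed by factor name; missing factors count as 0.
    factors: dict = field(default_factory=dict)


# Hypothetical weights and bias, chosen by hand for illustration only.
# A real model would fit these to historical pipe-failure data.
WEIGHTS = {
    "pipe_age_decades": 0.40,
    "rainfall_mm_last_week": 0.02,
    "cellared_properties": 0.05,
    "excavation_work_nearby": 0.80,  # 1 if excavation is under way, else 0
}
BIAS = -4.0


def failure_probability(section: PipeSection) -> float:
    """Logistic score: p = 1 / (1 + exp(-(BIAS + sum of weight * factor)))."""
    z = BIAS + sum(w * section.factors.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_by_risk(sections):
    """Return sections ordered from highest to lowest failure probability."""
    return sorted(sections, key=failure_probability, reverse=True)


sections = [
    PipeSection("A", {"pipe_age_decades": 6, "rainfall_mm_last_week": 40,
                      "cellared_properties": 10, "excavation_work_nearby": 1}),
    PipeSection("B", {"pipe_age_decades": 2, "rainfall_mm_last_week": 10}),
    PipeSection("C", {"pipe_age_decades": 4, "rainfall_mm_last_week": 40,
                      "cellared_properties": 20}),
]

ranked = rank_by_risk(sections)
for s in ranked:
    print(s.section_id, round(failure_probability(s), 3))
```

With these made-up inputs the sketch ranks section A first, then C, then B. Supporting more factors only means adding entries to `WEIGHTS`, which is why a model can take in far more inputs than the handful a person can weigh at once.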
Unfortunately, not every organization has the large, robust database Yorkshire Water had. Sometimes building such a database becomes a major obstacle to developing predictions.
Time/Warner Retail Sales and Marketing, a subsidiary of Time Inc., is a magazine distribution company that regularly places about 400 Time Inc. and client magazine titles in 120,000 stores, generating more than $1 billion annually. For years, the division's marketing people made decisions about inventory in each store by poring over dozens of wholesaler reports. That laborious task led the company to consider predictive analytics to help with inventory, pricing, cover design and other decisions. But executives had to contend with the fact that much of the required data was owned and controlled by the wholesalers. Says Dilip Patel, Time/Warner director of BI systems and information management: "Building the database was not an easy task, because we had to deal with five major systems and dozens of smaller systems from mom-and-pop operations."
Time/Warner turned to Insightful Corp.'s S-PLUS Enterprise Server to deliver analytics to marketers, as well as to client publishers. But before it could be implemented, "we had to find common ground among wholesalers' systems," says Patel. The problems ranged from major formatting inconsistencies to relatively trivial but irksome idiosyncrasies. Some wholesalers didn't list stores by their actual retail addresses, for example, but by a description of their locations: helpful to the driver but useless in a database.
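Finding "common ground" among wholesalers' systems largely comes down to normalizing records and matching them against a master list of stores. The following is a minimal sketch under that assumption, not a description of how the work was actually done: it normalizes addresses with a small, hypothetical abbreviation table, looks each record up in a hypothetical master store table, and sets aside records that cannot be matched, such as a store listed only by a description of its location, for manual review.

```python
import re

# Hypothetical abbreviation table; a production system would use a full
# postal-address standard rather than four hand-picked entries.
ABBREVIATIONS = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD", "BLVD": "BOULEVARD"}


def normalize_address(raw: str) -> str:
    """Uppercase, replace punctuation with spaces, collapse whitespace, expand abbreviations."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.upper())
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in cleaned.split()]
    return " ".join(tokens)


def match_to_master(records, master):
    """Split wholesaler records into (matched, unmatched).

    `master` maps a normalized address to a canonical store ID.
    """
    matched, unmatched = [], []
    for rec in records:
        store_id = master.get(normalize_address(rec["address"]))
        if store_id is None:
            unmatched.append(rec)  # needs human review
        else:
            matched.append({**rec, "store_id": store_id})
    return matched, unmatched


# Hypothetical master table and wholesaler records in inconsistent formats.
master = {
    normalize_address("12 Main St."): "S001",
    normalize_address("400 Oak Avenue"): "S002",
}
records = [
    {"wholesaler": "W1", "address": "12 MAIN STREET", "units_sold": 30},
    {"wholesaler": "W2", "address": "400 oak ave", "units_sold": 12},
    {"wholesaler": "W3", "address": "Corner store behind the gas station", "units_sold": 5},
]

matched, unmatched = match_to_master(records, master)
print([r["store_id"] for r in matched])
print([r["wholesaler"] for r in unmatched])
```

The record from `W3` is the kind of idiosyncrasy Patel describes: no amount of string cleanup turns a location description into an address. Exact-key matching like this is the simplest option; a production pipeline would likely add fuzzy matching and human verification.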
Time/Warner appointed Pittsburgh-based Management Science Associates Inc. to collect, standardize and consolidate the wholesaler data. MSA collected data on every Time/Warner publication, as well as on competing titles (most wholesalers, who owned the data, agreed to share it with Time/Warner). The competitive information helped Time/Warner determine inventory for new publications.
MSA is responsible for cleaning, matching and verifying the store data, and for providing a monthly updated store-level sales data reporting system. According to Time/Warner's Patel, the application has delivered an ROI of 282 percent since it was implemented in 2001, thanks primarily to reductions in waste and in the number of publications returned from stores.
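The article does not say how Time/Warner calculated that return. Assuming the conventional definition of ROI, with B the total benefit and C the total cost (symbols introduced here for illustration), a 282 percent figure means the net benefit was 2.82 times the cost, so the total benefit was 3.82 times the cost. The dollar amount below is purely illustrative:

```latex
\[
\text{ROI} = \frac{B - C}{C} \times 100\% = 282\%
\quad\Longrightarrow\quad B - C = 2.82\,C, \qquad B = 3.82\,C
\]
\[
\text{e.g. } C = \$1 \;\Longrightarrow\; B = \$3.82
\]
```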
Do we have sufficient statistical analysis resources in-house?
What data do we have in-house, and how is it stored?