For many companies, the recent explosion in data is not a result of increased business transactions or better use of information and analytics. Rather, it's the result of unmanaged replication. Large email attachments that are broadly distributed, hundreds of extracts from production systems sent nightly to departmental databases, and unclear archive-and-purge processes drive data growth without creating any new information. The value of big data comes from new information and insights, not from copies of existing data, and there are three main steps to getting started down the right path.
The first task is to separate the signal from the noise: begin reducing the noise by locking down and simplifying the data environment with information lifecycle management (ILM), data governance, and master data management. This does not mean waiting to get started on big data; rather, plan to retire two copies of legacy data for every new data source created.
Second, it is critical to identify, even if only broadly, what new information and insights big data can provide and how they will impact the business. We've done a number of case studies that illustrate this in action. Some of the actions you can take include:
Voice of the Customer: Summarize call-center and customer email correspondence nightly with text-mining tools to prioritize top product and service issues and desired features.
Accelerate Analytic Processes: Create a multi-terabyte "analysis-ready" database to support common analytic needs, such as customer marketing segmentation. One company accelerated its go-to-market processes by an order of magnitude with this technique.
Business Event Detection: Design channels to identify important business events during customer interactions and automate responses. At a large insurer, for example, timely, targeted responses to customer behavior based on such identification improved close rates by 20 percent and increased retention by 10 percent.
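To make the Voice of the Customer item concrete, here is a minimal sketch of the nightly summarization idea: tally issue keywords across a batch of customer messages and rank the top problems. The keyword lexicon, category names, and sample messages are hypothetical illustrations, and a production system would use a real text-mining toolkit rather than simple substring matching.

```python
from collections import Counter

# Hypothetical issue categories and the keywords that signal each one.
ISSUE_KEYWORDS = {
    "billing": ["bill", "charge", "invoice", "refund"],
    "shipping": ["ship", "delivery", "late", "lost package"],
    "product_defect": ["broken", "defect", "cracked", "stopped working"],
}

def prioritize_issues(messages):
    """Count how many messages mention each issue category, most common first."""
    counts = Counter()
    for msg in messages:
        text = msg.lower()
        for issue, keywords in ISSUE_KEYWORDS.items():
            if any(kw in text for kw in keywords):
                counts[issue] += 1
    return counts.most_common()

# A tiny sample "nightly batch" of customer messages.
batch = [
    "My invoice shows a double charge this month.",
    "The package arrived late and the item was broken.",
    "Still waiting on a refund for the returned unit.",
]
print(prioritize_issues(batch))  # billing ranks first with 2 mentions
```

Even this crude ranking shows the shape of the output a business team would review each morning: a prioritized list of issue categories, not a pile of raw correspondence.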
Third, define the smallest possible scope for success. Be rigorous in defining the new information that is needed, and then decide whether big data is the only source. If it is, then assess the smallest set of data required to generate that information. Ask questions such as: How much history is needed for trend analysis? How granular does the data need to be?
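The scoping questions above can be answered with back-of-envelope arithmetic before any data is loaded. The sketch below estimates the storage implied by different granularity and history choices; the row volumes and bytes-per-row figures are hypothetical assumptions, not benchmarks.

```python
# Rough sizing sketch: how big does each (granularity, history) choice get?
BYTES_PER_ROW = 200            # assumed average record size
ROWS_PER_DAY = {               # assumed daily row volume at each granularity
    "transaction": 5_000_000,
    "daily_summary": 50_000,
    "monthly_summary": 50_000 / 30,
}

def estimate_gb(granularity, history_days):
    """Back-of-envelope storage estimate in gigabytes."""
    rows = ROWS_PER_DAY[granularity] * history_days
    return rows * BYTES_PER_ROW / 1e9

# Compare three years of history at each granularity.
for gran in ROWS_PER_DAY:
    print(f"{gran:>15}: {estimate_gb(gran, 365 * 3):8.1f} GB for 3 years")
```

Under these assumptions, three years of transaction-level detail costs roughly a terabyte, while daily summaries fit in about 11 GB: exactly the kind of trade-off the "smallest possible scope" question is meant to surface.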