Shedding Light on the Problem of Dark Data
More than 80 percent of enterprise data is considered “dark,” defined as data that is captured and stored—yet never used.
Dark data is data that enterprises capture and store, often as part of their regular business processes, but never put to use. Hidden and usually unstructured, it is expensive to store and secure, yet most companies retain it for compliance reasons.
There are many reasons why dark data goes unused. Some of it holds little value. Much of it does have value, but enterprises haven’t developed the strategy, business processes or IT processes to extract or analyze it.
Dark data contributes to an “analytics deficit”: enterprises have a surplus of data but a shortage of insights from it. That leaves them a latent opportunity to use more of their existing data to better serve their customers, compete and operate.
The costs of dark data include loading, updating, storing and managing unused data—which consumes IT personnel time, storage space and CPU cycles. This time and infrastructure could be better spent on higher-value work.
The most successful enterprises identify and extract value from more of their data assets. They also identify and reduce the cost of managing data with little or limited value.
For example, different parts of a large financial services organization might engage the same customer with multiple services, including mortgages, life insurance and commercial banking. Each division might store customer information that, in isolation, has no direct value, so it goes untouched. Analyzed together, that information could offer a deeper view of the customer and reveal new cross-selling or upselling opportunities.
Identifying historical transaction records and moving them to the cloud and/or Hadoop can reduce storage costs. Correlating those records with other data in Hadoop could also yield new analytics insights.
The first step is to understand whether and how different data sets—databases, tables and columns in a data warehouse—are used, by whom and at what cost. These metrics help IT make better decisions about where to place their data and how to use it.
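As a rough illustration of that first step, the sketch below tallies how often each table is queried, by whom and at what CPU cost, then lists tables that are stored but never queried. It assumes a simple catalog of table sizes and a query log of (user, table, CPU seconds) records. The names `summarize_usage`, `dark_candidates`, `catalog` and `query_log`, along with all of the sample values, are hypothetical and do not come from any particular data warehouse product.

```python
from dataclasses import dataclass, field


@dataclass
class TableUsage:
    """Usage metrics for one table over the period the log covers."""
    storage_gb: float
    query_count: int = 0
    cpu_seconds: float = 0.0
    users: set = field(default_factory=set)


def summarize_usage(catalog, query_log):
    """Return per-table usage metrics.

    catalog: dict of table name -> storage size in GB (hypothetical input).
    query_log: iterable of (user, table, cpu_seconds) tuples (hypothetical input).
    """
    usage = {name: TableUsage(storage_gb=gb) for name, gb in catalog.items()}
    for user, table, cpu_seconds in query_log:
        if table not in usage:
            continue  # ignore log entries for tables missing from the catalog
        entry = usage[table]
        entry.query_count += 1
        entry.cpu_seconds += cpu_seconds
        entry.users.add(user)
    return usage


def dark_candidates(usage):
    """Tables with no queries in the log, largest storage footprint first."""
    unused = [(name, u.storage_gb) for name, u in usage.items() if u.query_count == 0]
    return sorted(unused, key=lambda item: item[1], reverse=True)


# Made-up sample data, for illustration only.
catalog = {"orders": 120.0, "clickstream_raw": 900.0, "legacy_claims": 300.0}
query_log = [
    ("alice", "orders", 4.2),
    ("bob", "orders", 1.3),
    ("alice", "clickstream_raw", 30.0),
]

usage = summarize_usage(catalog, query_log)
candidates = dark_candidates(usage)
```

In this sample, `legacy_claims` is the only table that no one queries, so it is the one candidate for cheaper storage or a closer look at its potential value. A real assessment would also need column-level usage and actual cost figures, which this sketch leaves out.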
At a strategic level, the key is for enterprises to start thinking about how they can make better decisions once they combine the data they already have in new ways.