By Anant Gupta
Despite the sometimes exaggerated hype surrounding big data, the fundamental assertion is true: data—and the decisions driven by data—now represent the next frontier of innovation and productivity.
Some large companies have used emerging technologies to extract significant value from big data. Visa recently announced that increasing the number of attributes it analyzes in each credit card transaction from 40 to 200 has saved the company 6 cents in every $100 worth of transactions. Wal-Mart uses a self-teaching semantic search tool that, honed by the monthly clickstream data of 45 million online shoppers, tailors offerings to online customers and has raised the rate of completed transactions by more than 10 percent.
But for most businesses, the promise of big data is not close to being fulfilled. So what makes extracting value from big data so difficult? Here are five factors.
The Three Vs: Volume, Velocity and Variety
First, big data is often characterized by the three Vs: its tremendous volume, the velocity at which the data needs to be processed and the variety of data types that it encompasses. The first two characteristics are fairly obvious; technology has made it possible to capture increasingly large amounts of information and make it available for analysis in real time.
But mining the value of big data is difficult because it requires simultaneously analyzing various types of information (transactions, log data, social media interactions, machine data, geospatial data, video and audio data, and so on), much of which is unstructured. Traditional business data was typically available in a structured format and could be analyzed automatically, such as a spreadsheet quantifying customer returns of different products at various stores over time. However, much of the value in big data exists in unstructured information, such as the transcript of a chat session between a retail customer and a customer service representative.
Synthesizing unstructured data from numerous sources and extracting relevant information from it can be as much an art as a science.
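To make the contrast concrete, here is a minimal sketch in Python. It assumes a hypothetical returns table and a hypothetical chat transcript; the store names, products and complaint keywords are invented for illustration, and simple keyword matching is a far cruder technique than the text mining a real deployment would need.

```python
import re
from collections import Counter

# Structured data: hypothetical rows of (store, product, units_returned).
returns = [
    ("Store A", "kettle", 4),
    ("Store A", "toaster", 1),
    ("Store B", "kettle", 6),
]

def returns_by_product(rows):
    """Total returned units per product; easy because every field is already labeled."""
    totals = Counter()
    for _store, product, units in rows:
        totals[product] += units
    return dict(totals)

# Unstructured data: a hypothetical chat transcript.
transcript = """\
Customer: The kettle I bought last week is leaking from the base.
Agent: I'm sorry to hear that. Would you like a refund or a replacement?
Customer: A refund, please. The toaster I ordered works fine though.
"""

# Hypothetical vocabularies; a real system would not rely on fixed word lists.
PRODUCTS = {"kettle", "toaster"}
COMPLAINT_WORDS = {"leaking", "broken", "refund", "faulty"}

def complaint_mentions(text):
    """Count products named in customer sentences that also contain a complaint word."""
    mentions = Counter()
    for line in text.splitlines():
        speaker, _, message = line.partition(":")
        if speaker.strip() != "Customer":
            continue
        # Split the customer's message into sentences before matching.
        for sentence in re.split(r"(?<=[.!?])\s+", message):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            if words & COMPLAINT_WORDS:
                for product in words & PRODUCTS:
                    mentions[product] += 1
    return dict(mentions)

print(returns_by_product(returns))
print(complaint_mentions(transcript))
```

The structured table yields its answer with a simple sum. The transcript does not: the customer asks for a refund in a sentence that never names the product, so the keyword approach registers only the first complaint and has no way to link the refund request back to the kettle.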
A Talent Shortage
Many articles, blog posts and opinion pieces have been written about the looming talent gap. Estimates suggest that the U.S. alone faces a shortage of 140,000 to 190,000 people with deep analytical skills, as well as a shortage of 1.5 million analysts and managers who can analyze big data and make decisions based on those findings. Another report predicts that only one-third of the 4.4 million big data jobs created by 2015 will be filled. Unlike traditional analytics, mining big data requires an extremely diverse set of skills, including data visualization, statistics, machine learning and computer programming. Government policy should work to mitigate this talent shortage through forward-looking education and immigration policies.
Flawed Data Governance
Big data is not a substitute for—let alone a solution for—flawed information management practices.
Big data requires rigorous data governance structures. Without them, IT systems that have not been upgraded to handle large volumes of data are likely to collapse under the sheer weight of data being processed. Surveys suggest that business leaders are often more excited about the potential of big data than their IT counterparts, but that might be because of IT executives’ better understanding of the difficult reality of harnessing big data’s value.
A Lack of a Data-Driven Mind Set
Because mind set can be hard to accurately pin down, its power is often underestimated. That is a mistake when it comes to assessing the prerequisites to successful analytics deployment. It is virtually impossible for big data investments to deliver value if business leaders do not have a data-driven mind set—that is, if they do not believe that it is important for decisions to be based on hard numbers rather than gut feel and experience. But once the right mind set takes hold, other positive things will follow. For instance, data-driven business leaders will have a tremendous incentive to treat data, and the IT and analytics professionals who help deliver it in an understandable form, as a strategic asset. And these leaders are more likely to make it a priority to ease the flow of data across organizational silos.
An Absence of Technical Knowledge
Big data represents a convergence of IT and data science. The technologies include Hadoop, which enables the large-scale processing of diverse datasets; R, a programming language for statistics; and in-memory databases, in which data resides in main memory rather than on disk. Data science includes, among many other areas, machine learning and data warehousing. Big data professionals are expected to be familiar with both disciplines, but this combination is rare, despite the training courses that are sprouting up globally.
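As a small illustration of the in-memory idea, the sketch below uses Python's built-in sqlite3 module, which keeps an entire database in main memory when it is opened with the special name ":memory:". SQLite is used here only as a convenient stand-in, not as an example of a big data platform, and the table and figures are hypothetical.

```python
import sqlite3

def average_returns_in_memory(rows):
    """Load (store, units_returned) rows into an in-memory table and average per store."""
    conn = sqlite3.connect(":memory:")  # the database lives in RAM; nothing is written to disk
    try:
        conn.execute("CREATE TABLE returns (store TEXT, units INTEGER)")
        conn.executemany("INSERT INTO returns VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT store, AVG(units) FROM returns GROUP BY store ORDER BY store"
        )
        return cursor.fetchall()
    finally:
        conn.close()  # closing the connection discards the in-memory data

# Hypothetical sample data.
sample = [("Store A", 4), ("Store A", 2), ("Store B", 6)]
result = average_returns_in_memory(sample)
print(result)
```

Because the data never touches disk, queries avoid storage latency, but the contents vanish once the connection closes. Production in-memory systems add persistence and replication to manage that trade-off.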
The emerging applications of big data are alluring, but the barriers to turning its promise into business benefits are daunting, to say the least. For businesses just beginning to explore big data, it is important to consider these challenges and develop well-defined and actionable business objectives before committing staff and financial investments to any big data initiatives.
About the Author
Anant Gupta is the CEO of HCL Technologies.