As big data takes shape and organizations accumulate larger volumes of data, many CIOs are reexamining storage strategies and technologies.
By Samuel Greengard
One of the sobering realities of the digital age is that the volume, velocity and variety of data streaming into and across the enterprise are nearly overwhelming. According to IT consulting and services firm Wipro, more than 2.9 million emails are sent every second, 20 hours of video are uploaded every minute, and more than 50 million tweets are generated every day. In fact, Wipro estimates that total data volumes will grow 44-fold between 2009 and 2020, reaching 35 zettabytes per year.
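To put the Wipro estimate in planning terms, the short Python sketch below works backward from the two quoted figures (44-fold growth and 35 zettabytes in 2020). The 2009 baseline and the yearly rate it prints are derived here, not stated by Wipro, and the calculation assumes smooth compound growth over the 11 years, which is a simplification.

```python
# Figures quoted from the Wipro estimate; everything derived from them
# below is an illustration, not a number published in the article.
GROWTH_FACTOR = 44          # "44-fold" growth
VOLUME_2020_ZB = 35.0       # zettabytes per year in 2020
YEARS = 2020 - 2009         # 11-year span


def implied_baseline(final_volume: float, factor: float) -> float:
    """Starting volume implied by a final volume and a growth multiple."""
    return final_volume / factor


def compound_annual_growth(factor: float, years: int) -> float:
    """Constant yearly rate that produces `factor` growth over `years`."""
    return factor ** (1 / years) - 1


baseline_2009_zb = implied_baseline(VOLUME_2020_ZB, GROWTH_FACTOR)
annual_rate = compound_annual_growth(GROWTH_FACTOR, YEARS)

print(f"Implied 2009 volume: {baseline_2009_zb:.2f} ZB")
print(f"Implied annual growth: {annual_rate:.0%}")
```

Under that assumption, the estimate implies a 2009 starting point of roughly 0.8 zettabytes and growth of about 41 percent a year, which means storage capacity would need to double approximately every two years just to keep pace.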
Amid all of this data activity is the need for faster and more agile storage strategies. "The environment is changing rapidly," states Joshua Greenbaum, principal at Enterprise Applications Consulting. "CIOs must understand that there's not just a need for storage to accommodate growing volumes of data, but also different types of data. In addition to transactional databases, there is sensor data, time series data, data from logs, social media data, audio and video, and much more. All of this data is measured in the billions of records. Some of it is extremely small and incremental while other data sets are huge. It's all over the map."
A Different Set of Demands
Certainly, big data creates very different demands on enterprises and their IT systems. What's more, there's typically little or no benefit from optimizing transactional systems that move data around in monolithic blocks. "They are not engineered to meet the needs of organizations using big data," says Sean Peterson, managing director at Accenture. Instead, Peterson says, CIOs must focus on a storage platform that supports big data at a more modular but multi-petabyte capacity. At the same time, there's a need to address the growing prevalence of hybrid architectures that combine old and new database forms.
Within this emerging data environment, it's critical to understand how an IT infrastructure impacts performance and what types of systems are required to drive business results along with dependable backups, disaster recovery and business continuity. Some organizations benefit by adopting commodity hardware and a so-called "shared nothing" infrastructure, Peterson says. For example, "the use of the commodity platform with shared storage may be right when workloads are smaller, and concerns about storage bottlenecks impairing performance are minimal."
On the other hand, packaged and engineered systems may be appropriate, particularly when time-to-implement is critical. "These solutions may involve more upfront hard costs than decentralized storage," says Peterson. "But because they bundle technology and software, it's possible to get them in place much more quickly and avoid the complexities—and additional costs—associated with implementing Hadoop and connecting hardware and systems."
Greenbaum says that CIOs should ultimately focus on elastic storage because "the type and quantity of data may also vary at different times of the day or week, or during different seasons." Moreover, it's critical to understand how decision-making takes place and when an enterprise requires faster access to data than traditional disk-based storage provides. Within this emerging model, solid-state drives and in-memory appliances play a growing role.
Virtualization and the cloud can create benefits and obstacles, Peterson says. "In many situations," he says, "the right big data platform will consist of clusters of smaller, commodity servers, rather than enterprise-class platforms. That means that storage will be handled locally at the individual server level, rather than centralized and shared." However, none of this means that big data requires replacing existing infrastructure, nor does it eliminate the need for virtualization. It's not a matter of either-or but, rather, of how to use the various technologies in complementary ways and develop data architectures that encompass both.
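For readers who want a concrete picture of storage "handled locally at the individual server level," the Python sketch below models a small shared-nothing cluster. It is an illustration only: the `Node` and `SharedNothingCluster` classes, the node names and the hash-based placement rule are all assumptions made for this example, and it omits the replication and rebalancing that production platforms add.

```python
import hashlib


class Node:
    """One commodity server that owns its own local storage."""

    def __init__(self, name: str):
        self.name = name
        self.local_store = {}  # stands in for the server's local disks

    def put(self, key: str, value: dict) -> None:
        self.local_store[key] = value

    def get(self, key: str):
        return self.local_store.get(key)


class SharedNothingCluster:
    """Routes each record to exactly one node; no storage is shared."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def _owner(self, key: str) -> Node:
        # Hash the key so records spread across nodes (illustrative rule).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key: str, value: dict) -> None:
        self._owner(key).put(key, value)

    def get(self, key: str):
        return self._owner(key).get(key)


# Hypothetical three-server cluster loaded with sample sensor records.
cluster = SharedNothingCluster(["node-a", "node-b", "node-c"])
for i in range(1000):
    cluster.put(f"sensor-{i}", {"reading": i})

for node in cluster.nodes:
    print(node.name, len(node.local_store), "records stored locally")
```

Because every record lives on one server's own disks, adding capacity means adding servers rather than enlarging a central array. The trade-off is that losing a server loses its records unless copies are kept elsewhere, which is one reason backup and disaster recovery remain part of the design.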
Define Your Goals
In the end, success spins a tight orbit around truly understanding the entire data ecosystem and environment—within an enterprise and across business partners and the Internet, Greenbaum says. He believes any big data storage strategy starts with a practical question: What exactly are we trying to accomplish?
"Once you understand the business case for any given task you can begin to design the systems, architecture and interfaces that produce excellent results," Greenbaum concludes. "Applications are the last mile of the problem. When an enterprise has defined its need and requirements, everything else falls into place."
About the Author
Samuel Greengard is a contributing writer for CIO Insight. His previous CIO Insight article is "Six Key Lessons of DevOps."