Networked, dynamic business processes built at a very granular level can produce billions, even trillions, of bytes of data each month. The demands of big data have traditionally outstripped improvements in technology cost/performance. Fortunately, new architectures and approaches have evolved over the last decade that simplify the management of these enormous data volumes, and they are finally being incorporated into the enterprise architectures of many large companies. These include:
Database Appliances and Accelerators: Relational database technology has evolved dramatically over the last decade, allowing terabytes, even petabytes, of data to be loaded and queried quickly and efficiently on a single platform. Database appliances bundle storage, compute, interconnects, and query processing onto a dedicated hardware and software platform optimized for database performance and management. Database accelerators use innovative storage and query optimizations to reduce database size and accelerate complex query performance. Where hardware upgrades on traditional relational databases might improve performance by a factor of two, appliances and accelerators can improve price-performance by a factor of 100. Most important, these technologies simplify management and administration by eliminating the need for expert tuning and configuration.
NoSQL Data Stores: Born on the Internet, Not-Only-SQL (NoSQL) technology was designed from the start to manage enormous, distributed data sets that can be queried in milliseconds. Instead of normalizing data into relational tables that are then joined to produce answers, these stores distribute very large data sets across hundreds or thousands of processors, organized so that related data is stored together. Queries run in parallel across all processors, each returning answers based on its local data. This simple, scalable approach is efficient and flexible: it allows a wide variety of data types to be stored together and sophisticated queries to be run.
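To make the distribute-then-query-in-parallel idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the design of any particular product: the shard count, the `customer` grouping key, the CRC32-based placement, and every function name are hypothetical, and threads on one machine stand in for separate processors.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_SHARDS = 4  # hypothetical; stands in for hundreds or thousands of processors


def shard_for(key: str) -> int:
    """Map a key to a shard with a stable hash so related records land together."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def load(records):
    """Distribute records across shards, grouped by customer (assumed key)."""
    shards = [defaultdict(list) for _ in range(NUM_SHARDS)]
    for rec in records:
        shards[shard_for(rec["customer"])][rec["customer"]].append(rec)
    return shards


def local_total(shard):
    """Each shard answers the query using only its own local data."""
    return {cust: sum(r["amount"] for r in recs) for cust, recs in shard.items()}


def total_by_customer(shards):
    """Send the query to every shard in parallel, then merge the partial answers."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(local_total, shards))
    merged = {}
    for partial in partials:
        # Keys never overlap: all records for one customer live on one shard.
        merged.update(partial)
    return merged


# Records need not share a schema; extra fields are stored as-is.
orders = [
    {"customer": "acme", "amount": 120, "note": "rush"},
    {"customer": "globex", "amount": 75},
    {"customer": "acme", "amount": 30},
    {"customer": "initech", "amount": 200, "items": ["desk", "lamp"]},
]
shards = load(orders)
totals = total_by_customer(shards)
```

Because every record for a given customer sits on a single shard, no join or cross-shard lookup is needed to answer the question; the merge step only combines answers that are already complete.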
Automated Analytics: Harvesting insights from big data requires analytics, and, in most companies, this is the domain of a small number of highly trained specialists. Capturing, cleansing, and combing through terabytes of data is often more art than science, and most analysts will tell you that their manual processes cannot be automated. However, over the last decade, advances in self-learning algorithms, genetic algorithms, and automated testing have produced programs that discover patterns, generate insights, and improve over time. In other words, they learn. These systems might not always outperform their human counterparts, but their automated processes might be the only way to scale to the demands of big data.
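As a rough illustration of how a program can "improve over time," the following Python sketch shows the basic loop of a genetic algorithm: score candidates, keep the better ones, and recombine and mutate them to form the next generation. It is a toy under stated assumptions, not an analytics system. The bit-string candidates, the count-the-ones fitness function, and all constants and function names are hypothetical; a real application would instead score something like a predictive rule or model against data.

```python
import random

GENES = 20            # length of each candidate bit string (hypothetical)
POP_SIZE = 30         # candidates per generation (hypothetical)
GENERATIONS = 40
MUTATION_RATE = 0.02


def fitness(candidate):
    """Toy objective: the number of 1 bits. Stands in for a real quality score."""
    return sum(candidate)


def crossover(a, b, rng):
    """Single-point crossover: the head of one parent joined to the tail of the other."""
    cut = rng.randrange(1, GENES)
    return a[:cut] + b[cut:]


def mutate(candidate, rng):
    """Flip each bit with a small probability; returns a new list."""
    return [bit ^ 1 if rng.random() < MUTATION_RATE else bit for bit in candidate]


def evolve(seed=0):
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]
    history = []  # best fitness seen in each generation
    for _ in range(GENERATIONS):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: POP_SIZE // 2]   # the fitter half may reproduce
        children = [population[0]]              # elitism: keep the best unchanged
        while len(children) < POP_SIZE:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
```

Because the best candidate is always carried forward unchanged, the values in `history` never decrease from one generation to the next, which is the sense in which such a program learns without a human adjusting it.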
This article was originally published on 01-12-2012