By Michael Vizard
While there is a lot of controversy these days about the amount of data that the National Security Agency and other intelligence groups are collecting, analyzing all that data in ways that make it actionable is still a major challenge, regardless of how omnipotent an organization is perceived to be.
Speaking at the recent Security Innovation Network Summit in New York, Dawn Meyerriecks, deputy director for the directorate of science and technology at the Central Intelligence Agency, said that ingesting all of the data the agency requires remains a major challenge. And even once the data is collected, analyzing all of it in real time is next to impossible.
“To watch all the video that currently moves across the Internet in one minute would take five years to watch,” says Meyerriecks. “And we can’t ingest all that data at scale.”
As a result, the CIA is concentrating its research and development investments on analytics applications and systems that would enable the agency to more easily analyze data where it resides as opposed to trying to store it in one central data warehouse, Meyerriecks says.
Most of that research and development activity is being managed through In-Q-Tel, a venture capital firm created by the CIA, and the Intelligence Advanced Research Projects Activity (IARPA), an organization under the Office of the Director of National Intelligence that is modeled on the Defense Advanced Research Projects Agency (DARPA).
Meyerriecks says that specific projects, such as IARPA’s Aggregative Contingent Estimation (ACE), are investigating advanced analytics technologies that would make it easier to analyze data in place. IARPA’s Knowledge Discovery and Dissemination program, meanwhile, is looking into adapter and semantic technologies that would make it easier to discover data and establish meaningful context around it.
While the CIA is clearly operating at a scale that goes beyond the average enterprise, Howard Dresner, chief research officer for Dresner Advisory Services, says the agency is encountering many of the same advanced analytics challenges that IT organizations face as they move deeper into the realm of big data. Even with the use of Hadoop as a framework for storing data, the cost of collecting and correlating massive amounts of data is still enormous.
Those costs would fall if analytics could be applied across federated sources of data, rather than requiring the data to be collected in one place first. “That’s not something anybody is going to solve any time soon,” says Dresner. “They would first have to come up with a standard way to index all the data first.”
Naturally, systems integrators see big data analytics as a significant opportunity. CSC, for example, just acquired Infochimps, a provider of data analytics as a service that aggregates data using an implementation of Hadoop and a NoSQL database.
According to Travis Koberg, director for data services for CSC, the systems integrator expects the world of big data analytics to be federated across applications that will span both on-premises and cloud computing platforms. “We’re trying to build an industrial strength platform for big data,” says Koberg. “But we still believe that most of these applications are going to wind up being federated.”
The degree to which that ultimately happens, however, is anybody’s guess. Right now the pendulum is swinging toward aggregating data in the cloud. But as the cost of aggregating all that data continues to increase, the IT community and the intelligence community alike are clearly looking for breakthroughs that would enable them to analyze massive amounts of data regardless of where the data resides.