Applications: Data Management at work at the National Security Archive
It was once thought that the Internet would be the great equalizer, a place where knowledge could be shared across continents and cultures. Of course, in 2006, we know better: The rapid growth of the Web often makes finding credible information more difficult than it was in the good old days of the Dewey Decimal System.
But at least one organization is fighting the good fight to keep the public informed: The National Security Archive. Located at George Washington University in Washington, D.C., the NSA collects and organizes declassified U.S. government documents and shares them, for free, with anyone who wants them (photocopies cost 15 cents a page). Since its founding in 1985 by a group of journalists who wanted to make the documents they had gathered available to the public, the NSA has amassed nearly 2 million documents. It receives no government funding and gets its $2.3 million budget from private donations and sales of its own publications. It is the largest non-government library of declassified documents in the world, with 200 separate collections of materials.
If you're wondering how a non-profit organization that employs just 22 full-time staffers has the manpower to keep track of 2 million documents from 80 government agencies (and growing, by roughly 50,000 documents per year) and thousands of requests for information from virtually every newsroom in the world, you're not alone. In fact, up until recently, the NSA's entire archival and retrieval operation was paper-based. Each time the NSA requested documents from a particular agency, the documents it received would be batched and assigned a sorting number, and each sorting number could contain literally thousands of documents.
The documents were then filed and stored in a warehouse, their whereabouts entered into a rudimentary database. "It was literally like searching for a needle in a haystack," says Carlos Osorio, information systems manager for the NSA. By 2004, the process had become "completely and utterly overwhelming." Researchers looking for a particular document would have to search through hundreds of boxes, scanning page by page until they found the right data. "Sometimes you would find things entirely by luck," because the data wasn't properly filed, Osorio says. "We would have to order up 20 boxes from the warehouse and search by hand for the right documents."
Finally, in 2005, the NSA installed document management software, called Alchemy, from Bellevue, Wash.-based Captaris Inc. Now, all incoming documents are digitally scanned and stored in a searchable content management system. The change has transformed the way the organization operates. "Now we know where everything is," Osorio says. Instead of waiting weeks or months for documents to be located, "we can have it at the speed of light." Documents can be quickly searched by keyword, saving researchers several hours per day and cutting the cost of photocopying and retrieving data from the organization's warehouse. Osorio estimates the archive has already saved tens of thousands of dollars as a result of the implementation. "Plus," he adds, "it helps us preserve rare documents and make sure we don't misplace them."
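For readers curious about what "searched by keyword" means in practice, here is a minimal sketch of the general idea behind keyword search over scanned text: an inverted index that maps each word to the documents containing it. It is an illustration only, not a description of how Alchemy works internally. The document IDs, the sample text, and the function names (tokenize, build_index, search) are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    # Start with the matches for the first word, then narrow by each later word.
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical sample records standing in for text extracted from scanned pages.
documents = {
    "doc-001": "Cable regarding nuclear technology transfers",
    "doc-002": "Memo on embassy staffing levels",
    "doc-003": "Briefing on nuclear test monitoring",
}

index = build_index(documents)
print(search(index, "nuclear"))
print(search(index, "nuclear technology"))
```

Because the index is built once when documents are scanned, each later lookup touches only the words in the query instead of every page in every box, which is the difference between a keyword search and a search by hand.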
Having all that data at one's fingertips is not just a matter of convenience; sometimes, it can be a matter of national security. "I remember back in 1999 when the New York Times published a story about Chinese scientists being used as spies in the U.S. to gain information about our nuclear technology," Osorio recalls. "We supplied the Times with some documents for that piece. The same day the story was published, the Pentagon called us asking us for copies of our documents. They didn't have them."
But the real value, Osorio says, is the ability to provide important information to the public, and the ability to preserve history. "This technology will help us increase our output and provide a better service to the people," Osorio says. "But it will also help to keep our collective memory sharp."