The Data Revolution Will Be Televised

By Eric Pfeiffer  |  Posted 05-30-2006

John Merlin Williams is executive producer of the University of Michigan's Digital Media Commons, a multimedia facility, where he is working on BlueStream, an ambitious pilot project that searches and analyzes unstructured digital media files. The system uses commercially available software packages, including IBM Corp.'s Content Manager for access control and data modeling, Telestream Inc.'s FlipFactory for transcoding the video, Virage's VideoLogger for media analysis and metadata extraction, and IBM's VideoCharger and RealNetworks Inc.'s Helix Server to handle the streaming media.

The goal of the BlueStream project is to make digital media easier to work with. The technology enables students to analyze the speech tracks within video files—say, Madeleine Albright giving a talk on foreign policy—and, assuming a good-quality recording, automatically identify keywords at a 70 percent accuracy rate. These words become part of easily searchable metadata. "We will even be able to track the voice structure of an individual and match it to existing libraries," Williams says.
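The article doesn't describe BlueStream's internals, but the idea of turning a speech transcript into searchable keyword metadata can be sketched in a few lines. This is a minimal illustration assuming a transcript string has already been produced by speech-to-text; the transcript text, stopword list, and `extract_keywords` helper are all hypothetical, not part of the actual system.

```python
from collections import Counter
import re

# A tiny stopword list for illustration; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "that", "our", "must"}

def extract_keywords(transcript, top_n=5):
    """Return the most frequent non-stopword terms from a speech transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]

# Hypothetical transcript fragment standing in for speech-to-text output.
transcript = ("Foreign policy must balance diplomacy and security. "
              "Our foreign policy rests on diplomacy, alliances, and security guarantees.")

# The extracted terms become part of the file's searchable metadata record.
metadata = {"speaker": "Madeleine Albright", "keywords": extract_keywords(transcript)}
print(metadata["keywords"])
```

Once keywords live in a metadata record like this, a search index needs only to match query terms against the record rather than re-processing the audio.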

Not only can faculty and students search for what Madeleine Albright said, and then listen to her speech via an iPod, laptop or mobile phone, they can also (if privacy concerns can be worked out) search for files that contain Albright's visual image. By combining standard image-recognition software (which analyzes the geometry of a person's face) with metadata tags, the system will allow students to search for a person's name and then pull up all video files with that person's image.
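The combination described here—text metadata tags plus face-recognition scores—amounts to a two-signal search. A minimal sketch, assuming a pre-built index where each video carries tags and a per-person similarity score from the image-recognition step (the index entries, file names, and threshold are invented for illustration):

```python
# Hypothetical index: each video has text tags plus a face-recognition
# similarity score (0.0-1.0) for each known person.
VIDEO_INDEX = [
    {"file": "albright_lecture.mp4",
     "tags": ["Madeleine Albright", "foreign policy"],
     "face_scores": {"Madeleine Albright": 0.92}},
    {"file": "campus_tour.mp4",
     "tags": ["admissions"],
     "face_scores": {"Madeleine Albright": 0.11}},
]

def find_by_person(name, threshold=0.8):
    """Return files whose metadata tags name the person, or whose
    face-recognition score for that person clears the threshold."""
    hits = []
    for entry in VIDEO_INDEX:
        tagged = name in entry["tags"]
        matched = entry["face_scores"].get(name, 0.0) >= threshold
        if tagged or matched:
            hits.append(entry["file"])
    return hits

print(find_by_person("Madeleine Albright"))
```

Either signal alone is enough to return a hit, which is why combining the two widens recall: a file tagged by a librarian surfaces even if the face match is weak, and vice versa.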

"If you look at the daily life of an average American, they are mostly dealing with multimedia, and it's coming from a number of sources," Williams says. "We need to be smart about how we plan to manage this data."

Derek Danois, the president of Berwyn, Pa.-based i3Archive Inc., agrees. His company helps radiologists at healthcare centers such as New York's Columbia Presbyterian and the University of Pennsylvania quickly analyze an ever-increasing number of mammograms—about 35 million in the U.S. each year. The technology compares mammography images against a historical library and red-flags those outside the norm. A summary of the photographic data is extracted, indexed, and then analyzed using pattern recognition to match known cancer patterns against the new image.
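The article doesn't specify how i3Archive's system decides an image is "outside the norm," but the general shape of such a check—compare a summary statistic from the new image against the distribution in the historical library—can be sketched with a simple z-score test. The feature values and threshold below are invented for illustration, not i3Archive's actual method.

```python
import statistics

def flag_outside_norm(historical, new_value, z_threshold=2.0):
    """Red-flag a new image's summary statistic if it falls far outside
    the distribution established by the historical library."""
    mean = statistics.mean(historical)
    stdev = statistics.stdev(historical)
    z_score = abs(new_value - mean) / stdev
    return z_score >= z_threshold

# Hypothetical summary statistic (e.g., a normalized density measure)
# extracted from prior mammograms in the library.
library = [0.30, 0.32, 0.29, 0.31, 0.30, 0.28]

print(flag_outside_norm(library, 0.45))  # far from the norm: flagged
print(flag_outside_norm(library, 0.31))  # within the norm: not flagged
```

In practice the "summary" would be a vector of extracted features and the comparison a trained pattern-recognition model, but the workflow—extract, index, compare against history—is the same.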

Like Williams at the University of Michigan, Danois thinks as much about what the technology can't do as what it can. Often, for example, radiologists will annotate a mammogram with audio comments. At the moment i3Archive doesn't have the capability to analyze these comments. Still, Danois thinks these WAV files are important enough to save and store along with the other data. At a future date, "we can introduce technology that can search it and create metatags," Danois says. "There is so much out there. The first step is to prepare for the unknown."
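The "prepare for the unknown" approach Danois describes amounts to archiving data you can't yet analyze next to the data you can, with empty fields reserved for future analysis. A minimal sketch of such a record, with hypothetical field names and paths (none of this reflects i3Archive's actual schema):

```python
def archive_study(image_id, image_path, audio_annotation_path=None):
    """Build a study record that keeps the radiologist's WAV annotation
    alongside the image data, with metatag fields left empty until
    audio-analysis technology exists to fill them in."""
    return {
        "image_id": image_id,
        "image_path": image_path,
        "audio_annotation": audio_annotation_path,  # stored now, unanalyzed
        "audio_metatags": None,  # placeholder for future speech analysis
    }

# Hypothetical usage: the WAV rides along with the mammogram record.
record = archive_study("study-001", "study-001.dcm", "study-001.wav")
print(record)
```

The design choice is that storage is cheap relative to the cost of discarding data: once a future indexing pass exists, it can walk these records and populate `audio_metatags` without anyone having to re-collect the audio.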