Companies have more information than ever, but they are ill prepared to use it when unexpected situations arise. If companies want their information to be more useful, their information must be more agile, says MIT's Stuart Madnick.
How do you aggregate non-standardized data?
This is where the research we've done on "mediation systems," systems that are able to automatically translate non-standardized data on call, comes in. A mediator is a specific type of middleware that operates between a source of data, such as a database or semi-structured Web site (such as the price of stocks or books), and another piece of software that wants to use the data, called a receiver. The receiver might be a user making a data request or it might be an application program, using a database query language, such as SQL.
The role of mediators is to reconcile differences between the sources and receivers. These differences might relate to format (for example, the receiver wants database records and the source is a semi-structured HTML Web site), or meaning (the source provides prices in millions of Turkish liras and the receiver desires prices in U.S. dollars, with or without any Turkish or local U.S. taxes or shipping costs).
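The format side of that reconciliation can be sketched in a few lines of Python. This is an illustration only, not the MIT system: the class name TableRows, the function html_to_records, and the sample table are all hypothetical, and a real Web page would be far messier than this one. The sketch turns a semi-structured HTML table from a source into the record-style rows a receiver might expect.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # completed rows, each a list of cell strings
        self._row = None      # row currently being read, or None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


def html_to_records(html):
    """Hypothetical mediator step: HTML table in, list of dict records out."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows   # assumes the first row holds column names
    return [dict(zip(header, row)) for row in body]


# Made-up source page fragment, for illustration only.
SAMPLE = """
<table>
  <tr><th>title</th><th>price</th></tr>
  <tr><td>Book A</td><td>14.95</td></tr>
  <tr><td>Book B</td><td>9.50</td></tr>
</table>
"""

records = html_to_records(SAMPLE)
```

The prices come out as plain strings. Nothing in the table says which currency they are in or whether tax is included, which is the meaning side of the problem.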
How unstructured can the data requested be?
Structure and standardization are two different dimensions. Structured data resembles tabular data, like a stock report, whereas unstructured is more like free-flowing text. At MIT, we're working with semi-structured data that is primarily tabular, such as weather reports that have temperature or wind speed, mixed with pictures of suns and moons and clouds. There are people working with natural language processing that can extract structured information from unstructured documents, but we're not working on those systems.
You've worked on mediation systems now for six or seven years. Why haven't they been widely adopted?
Mediation is not as far along as simple aggregation. Today's aggregators are relatively straightforward. If you are comparing book prices in the U.S., the prices are already in U.S. dollars. Mediation becomes more important when you do complex aggregation, where you do more reconciliation and adjustment. The harder kind of mediation, logical or semantic mediation, has not been commercialized, to my knowledge. The research involved is not trivial.
How does your work tie in to Web services, XML and the Semantic Web?
Web services link programs and retrieve data across the Web. It's a technology that may be helpful, but it doesn't mediate. Web services as they exist today do not directly address the issue of semantics. XML is one of the basic Semantic Web technologies, but it's only a first or basic step. XML may have a precise way of indicating $14.95 is the price of a product, but XML itself doesn't tell you if it includes tax or not. There are additional layers in the Semantic Web agenda. Resource Description Framework (RDF) and Web Ontology Language (OWL) are only in a limited number of experimental systems, to my knowledge. They are intended to provide a way to describe what information means: that the $14.95 is a monetary amount, and monetary amounts have currencies, and they may or may not include taxes.
Where does my work fit into this galaxy? I'm working with a team at MIT on Context Interchange (COIN) mediation technology. COIN adds a layer on top of Web services, so that the data exchanged between services arrives in the appropriate format and with the appropriate meaning. The work we're focused on is primarily the issue of information exchange. It's one thing to say that 14.95 is in British pounds and includes the tax; it's another thing to know that Stuart Madnick wants prices in U.S. dollars without taxes, and to convert that 14.95 to U.S. dollars and eliminate the tax. We have developed various tools and techniques to facilitate that. And while doable, it is not to my knowledge the focus of the work on the Semantic Web. In many ways the Semantic Web has much more ambitious goals. We are coming up with more specific solutions to much narrower problems.
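The 14.95 example can be made concrete with a small Python sketch. It is not COIN itself. The names PriceContext and convert, the 20 percent tax rate, and the exchange rate of 1.5 U.S. dollars per pound are all assumptions made up for illustration, and tying a tax rate to a currency is a deliberate simplification. The idea shown is that the source and the receiver each declare their context, and the conversion steps follow from the difference between the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceContext:
    """What a bare number means to one party (hypothetical structure)."""
    currency: str        # currency code, e.g. "GBP" or "USD"
    includes_tax: bool   # whether the stated amount already includes tax


# Illustrative figures only; these are NOT real rates.
TAX_RATE = {"GBP": 0.20, "USD": 0.0}       # assumed tax applied per currency
USD_PER_UNIT = {"GBP": 1.5, "USD": 1.0}    # assumed value of one unit in USD


def convert(amount, source, receiver):
    """Restate `amount` from the source's context in the receiver's context."""
    # Step 1: strip the source's tax, if the source amount includes it.
    net = amount
    if source.includes_tax:
        net = amount / (1 + TAX_RATE[source.currency])

    # Step 2: change currency by going through USD.
    net = net * USD_PER_UNIT[source.currency] / USD_PER_UNIT[receiver.currency]

    # Step 3: add the receiver's tax, if the receiver wants tax included.
    if receiver.includes_tax:
        net = net * (1 + TAX_RATE[receiver.currency])

    return round(net, 2)


# The source quotes pounds with tax; the receiver wants dollars without tax.
source = PriceContext(currency="GBP", includes_tax=True)
receiver = PriceContext(currency="USD", includes_tax=False)
price_for_receiver = convert(14.95, source, receiver)
```

Keeping the contexts as data, separate from the conversion logic, means a new receiver with different preferences only has to declare a new context rather than request a new program.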
If I'm a CIO, why should I care?
CIOs are involved in developing infrastructures for IT organizations. They typically last a long time, and if you haven't planned for how they will change in the future, it's hard to change them later. A CIO can't close his company down and say come back when I'm done. Where semantic technologies are going in the future could have an important impact on the decisions you are making today on your future infrastructure. It does not matter whether you jump to semantic technologies today or tomorrow, but if you think it is coming down the road, the things you do today will make it easier or harder to jump.
What steps should CIOs focus on now if they want to make information more useful?
We've already talked about Web services. The other thing you hear people talk about today is service-oriented architecture (SOA). What both Web services and SOA are all about is how we modularize our systems. If you understand that more semantics in information is coming, you want to divide up your system in a way so it's easy to slide that in as it becomes more available.
Be aware of what's coming, and make modest commitments to experiment with some of these things. Rather than try to change an existing system that people depend on, experiment with a new system that is just being put in and that, at least initially, is not on a critical path, so that the number of people who depend on it is fairly limited. Any place where you aggregate information would be a good place to look. You could pull together information from your shipping system and order-entry system. It doesn't replace your existing system, but it will give you added value.
Semantic technologies won't happen overnight, but when they do they will come quickly.