Expert Voices: MIT's Stuart Madnick on Information Management and the Agile Corporation

By Allan Alter  |  Posted 05-31-2006

Stuart Madnick, a professor of information technology at the Massachusetts Institute of Technology's Sloan School of Management since 1972, has been thinking about how to get more value out of information longer than just about anyone. He began his career creating operating systems and studying software development, then moved on to co-direct or chair MIT's research programs on productivity from information technology and total data quality management. Along the way, he co-founded four high-tech companies. He's enjoyed enough success to buy a 600-year-old castle in Northumbria, England, which he has run as a hotel for 20 years.

CIOs have devoted a great deal of effort to making their IT infrastructure more flexible and agile. Madnick argues their companies' information needs to be agile too. The challenge today, he says, is "to be able to use information for purposes we haven't anticipated before and at speeds far faster than before." Imposing standards and structure on data can't do the job entirely: Too much information is unstructured, and standardizing all the data you have is impractical.

The solution, Madnick believes, lies in "aggregation" and "mediation": finding and pulling together the information someone needs, whatever the source or however the data is structured, and standardizing it on the fly to suit the user's preferences. Imagine a global comparison-shopping Web site that scans vendors' sites for the best price but reports it back in the currency of your choice, with taxes and shipping costs included, and you have an idea of how this might benefit buyers. But this approach could, if successful, also give corporations the means to find answers to all sorts of unanticipated ad hoc queries, improving decision-making and responsiveness to changes in the business environment. Madnick and his fellow researchers at MIT's DARPA-underwritten Context Interchange (COIN) project have been working for six years to develop the necessary technology. An edited version of Executive Editor Allan Alter's interview with Prof. Madnick follows:

Part 2: Bad Answers, Bad Decisions

CIO INSIGHT: Why is making better use of information such a high priority now?

MADNICK: There are a number of factors that are interrelated. One is the issue of competitiveness, spurred on by globalization. The second is productivity. The U.S. economy has made amazing productivity gains, but companies are always under pressure to do more. The third factor is effectiveness. More and more, people don't want to deal with hassles; they want businesses to be effective without being too complex. Regulation is also a factor. All these forces are putting pressure on companies. In all these cases, you can do the job better by using information more effectively.

The amount of information we now have access to is enormous compared to just a decade ago. Think of it as winning the lottery: You've won $200 million. Now, what do you do with it? It's a question we haven't thought about before. What is feasible has changed.

What does making information more useful actually mean?

Running the business better and responding better, being more resilient. The challenge now is to be more agile in using our information, to be able to use it for purposes we haven't anticipated before and at speeds far faster than before.

I'm concerned about what I call "ad hoc" decisions. That is, decisions that are unlike anything you've ever had to make before. The levees are breaking in New Orleans. What do we do? I think CIOs have gotten better at building institutional decision-support systems for decisions we have to make over and over again, like bank loans, but we still have difficulty with the ad hoc ones.

The example I use is when Russia froze all its debt payments six, seven years ago. The chairman of the board of one of our research sponsors, a large financial services company, said, "How much money do we have at stake?" It took them three weeks to get an answer. The financial information existed, but the company had 200 systems scattered around the world, and nothing was in place to feed the information on Russia.

I would say that at best, most companies are fighting the last war. If a case like the Russian one happened to your company, you would build a system to deal with that decision. But what about the next decision that you haven't anticipated? I don't think we're any better able to do that now than we were two decades ago.

But we now have the Internet, we have search engines, we have much more integration between our systems. Isn't it easier to make decisions when unanticipated questions arise?

The kind of information needed to make such decisions is inside a company's databases, not on the general Internet. And the organization's total exposure to Russia is not a question Google can answer. Adopting standards is a non-trivial issue, just from a pragmatic point of view: there's so much inertia and difficulty and cost involved. Organizations go through terrific efforts to standardize data throughout the organization. But what happens? They get acquired by a different company, or they acquire another major company. When you talk about access to internal information, and tying together a whole host of systems in the organization, I would say we're not making a lot of progress.

Part 3: Aggregating the Data

So how do you solve that problem?

One of the areas I've been looking at is what I call "information aggregators." An aggregator pools information from a variety of sources, with or without prior arrangement. A comparison aggregator, for example, looks at online camcorder vendors and gives the range of prices and bells and whistles for comparison purposes. Relation aggregators, like financial account aggregators, take your relationships with Citibank, Bank of America, Schwab and so on, and pull them together. Complementary aggregators take a theme of interest and combine information gathered from different sources to create a complete picture. It might be a particular company if you are an investor, for instance, or real estate information at Zillow.com.

And while a search engine provides you with static Web pages, an aggregator can pull information from dynamic Web pages because it interacts with those sites directly. When searching for TV prices, it goes to an online retailer's Web site, asks for the price of a given TV, gets the answer and reports it back.
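The idea can be sketched in a few lines. This is a minimal illustration, not COIN itself: the vendor lookup functions and prices below are invented stand-ins for the live queries a real aggregator would make against each retailer's site.

```python
# A toy comparison aggregator: ask each "vendor" for the current price
# of a model, collect the quotes, and report the best one.

def vendor_a_price(model):
    # Stand-in for a dynamic query against retailer A's Web site
    return {"TV-42X": 599.00}.get(model)

def vendor_b_price(model):
    # Stand-in for retailer B
    return {"TV-42X": 579.99}.get(model)

def aggregate_prices(model, vendors):
    """Query every vendor for the model and collect the quotes found."""
    quotes = {}
    for name, lookup in vendors.items():
        price = lookup(model)
        if price is not None:
            quotes[name] = price
    return quotes

quotes = aggregate_prices("TV-42X", {"A": vendor_a_price, "B": vendor_b_price})
best = min(quotes, key=quotes.get)
print(best, quotes[best])  # the cheapest vendor and its price
```

The point of the sketch is the shape of the interaction: each source is queried at request time, so the composite answer reflects current, dynamic data rather than a stale crawl.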

Aggregators provide insights from a composite picture that the pieces alone never could. Zillow, for example, provides lots of information about your house: a map of where it is located, the assessed value of your house and neighboring houses, and other details. Max Miles consolidates airline frequent-flyer programs and gives you one consolidated mileage statement. But there are other things it can do which are neat: If you go on a trip with many legs, say Boston-London-Paris-Boston, at the end of the month it can tell you if an airline neglected to give you credit. By putting together the total picture, it can alert you to missing pieces.
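The "missing pieces" check is essentially a set difference between the legs you flew and the legs that were credited. The flight records below are invented for illustration; a real service would pull them from your itineraries and the airlines' statements.

```python
# Compare flown legs against credited legs to flag missing mileage credit.
flown = {("BOS", "LON"), ("LON", "PAR"), ("PAR", "BOS")}   # from itineraries
credited = {("BOS", "LON"), ("PAR", "BOS")}                # from statements

missing = flown - credited   # legs with no mileage credit
print(sorted(missing))
```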

Aggregators could also show how your product pricing compares with your competitors'; people can do it by hand, but if you have 1,000 products and ten competitors, that's 10,000 prices you have to pull together.

I think aggregation has been tremendously under-utilized so far. Ask a CIO how many Web sites exist within the company. Each one is a stovepipe, gathering and disseminating information on a particular issue. Aggregation can create a picture you've never had before.

There are so many issues that run across an organization where the data lies within stovepipes: vendor relationships, profitability, customer relationships. I once heard of a company where one division was giving an award to a supplier as Supplier of the Year, while another division had barred working with that same supplier for poor performance. An internal aggregator could pull together vendor performance information.

But what about the information you have that's stored away in different places? If you actually listed that information, you'd probably be lucky if one percent or so were standardized.

Standardized data means the exact same information in the same format and with the same meaning: for example, having one country code for Brazil, not BR in one system, BRZ in another, and BA in a third. Here's another example of non-standardized data: there are over 40 different standards for geographical coordinates. Even within the military, the U.S. missile command and the U.S. artillery command use different geographical coordinates. If I want to do comparison shopping on the Web, someone has to convert the price information into a consistent form. Does the price include shipping and taxes? What currency is it in?
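The country-code problem above can be illustrated with a simple translation table: each source system's local code maps to one canonical code. The system names and per-system tables here are invented; the canonical codes follow ISO 3166 alpha-2, one common choice of standard.

```python
# Map each system's local country code to a single canonical code.
# The systems and their local codes are hypothetical examples.
CODE_MAPS = {
    "orders_db":    {"BR": "BR", "US": "US"},     # already canonical
    "legacy_erp":   {"BRZ": "BR", "USA": "US"},   # three-letter local codes
    "partner_feed": {"BA": "BR", "US": "US"},     # idiosyncratic local codes
}

def canonical_country(system, local_code):
    """Translate a system-specific code into the canonical country code."""
    return CODE_MAPS[system][local_code]

# All three local spellings of "Brazil" resolve to the same code.
print(canonical_country("orders_db", "BR"),
      canonical_country("legacy_erp", "BRZ"),
      canonical_country("partner_feed", "BA"))
```

The hard part in practice is not the lookup but building and maintaining these tables across hundreds of systems, which is exactly the inertia and cost Madnick describes.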

Part 4: Mediation Systems

How do you aggregate non-standardized data?

This is where our research on "mediation systems," systems that can automatically translate non-standardized data on demand, comes in. A mediator is a specific type of middleware that operates between a source of data, such as a database or semi-structured Web site (listing the prices of stocks or books, say), and another piece of software that wants to use the data, called a receiver. The receiver might be a user making a data request, or it might be an application program using a database query language such as SQL.

The role of mediators is to reconcile differences between sources and receivers. These differences might relate to format (for example, the receiver wants database records and the source is a semi-structured HTML Web site) or to meaning (the source provides prices in millions of Turkish liras and the receiver wants prices in U.S. dollars, with or without Turkish taxes, local U.S. taxes, or shipping costs).
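A semantic reconciliation of this kind can be sketched as a function that converts a price from the source's context (currency, tax treatment) into the receiver's. This is a toy illustration of the idea, not COIN's actual mechanism; the exchange rates and the 8% tax figure are invented.

```python
# Toy context mediation: convert a price from the source's context
# (currency, whether tax is included) to the receiver's context.

RATES_TO_USD = {"TRY": 0.034, "USD": 1.0}  # assumed exchange rates

def mediate(price, src_ctx, dst_ctx):
    """Translate price from the source context to the destination context."""
    value = price
    # Strip source-side tax if it is included but the receiver doesn't want it
    if src_ctx["tax_included"] and not dst_ctx["tax_included"]:
        value /= 1 + src_ctx["tax_rate"]
    # Convert currency, routing through USD
    value *= RATES_TO_USD[src_ctx["currency"]] / RATES_TO_USD[dst_ctx["currency"]]
    return round(value, 2)

# Source: a lira price with 8% tax included; receiver wants tax-free USD.
usd = mediate(1000.0,
              {"currency": "TRY", "tax_included": True, "tax_rate": 0.08},
              {"currency": "USD", "tax_included": False})
print(usd)
```

In a real mediator the contexts are declared once per source and per receiver, and the conversions are derived automatically, rather than being hand-coded per query as in this sketch.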

How unstructured can the data requested be?

Structure and standardization are two different dimensions. Structured data resembles tabular data, like a stock report, whereas unstructured data is more like free-flowing text. At MIT, we're working with semi-structured data that is primarily tabular, such as weather reports that mix temperature and wind speed with pictures of suns and moons and clouds. There are people working on natural language processing that can extract structured information from unstructured documents, but we're not working on those systems.

You've worked on mediation systems for six or seven years now. Why haven't they been widely adopted?

Mediation is not as far along as simple aggregation. Today's aggregators are relatively straightforward. If you are comparing book prices in the U.S., the prices are already in U.S. dollars. Mediation becomes more important when you do complex aggregation, where you do more reconciliation and adjustment. The harder kind of mediation, logical or semantic mediation, has not been commercialized, to my knowledge. The research involved is not trivial.

How does your work tie in to Web services, XML and the Semantic Web?

Web services link programs and retrieve data across the Web. The technology may be helpful, but it doesn't mediate. Web services as they exist today do not directly address the issue of semantics. XML is one of the basic Semantic Web technologies, but it's only a first step. XML may have a precise way of indicating that $14.95 is the price of a product, but XML itself doesn't tell you whether that price includes tax. There are additional layers in the Semantic Web agenda. Resource Description Framework (RDF) and Web Ontology Language (OWL) are used in only a limited number of experimental systems, to my knowledge. They are intended to provide a way to describe what information means: that the $14.95 is a monetary amount, that monetary amounts have currencies, and that they may or may not include taxes.

Where does my work fit into this galaxy? I'm working with a team at MIT on Context Interchange (COIN) mediation technology. COIN adds a layer to Web services, so that the appropriate format and meaning of data are reconciled between services. The work we're focused on is primarily the issue of information exchange. It's one thing to say that 14.95 is in British pounds and includes the tax; it's another thing to know that Stuart Madnick wants prices in U.S. dollars without taxes, and to convert that 14.95 to U.S. dollars and eliminate the tax. We have developed various tools and techniques to facilitate that. And while doable, it is not to my knowledge the focus of the work on the Semantic Web. In many ways, the Semantic Web has much more ambitious goals. We are coming up with more specific solutions to much narrower problems.

If I'm a CIO, why should I care?

CIOs are involved in developing infrastructures for IT organizations. Those infrastructures typically last a long time, and if you haven't planned for how they will change in the future, it's hard to change them later. A CIO can't close the company down and say, come back when I'm done. Where semantic technologies are going in the future could have an important impact on the decisions you are making today about your future infrastructure. It doesn't matter whether you jump to semantic technologies today or tomorrow, but if you think they're coming down the road, the things you do today will make it easier or harder to jump.

What steps should CIOs focus on now if they want to make information more useful?

We've already talked about Web services. The other thing you hear people talk about today is service-oriented architecture (SOA). What both Web services and SOA are about is how we modularize our systems. If you understand that more semantics in information is coming, you want to divide up your systems in a way that makes it easy to slide that in as it becomes available.

Be aware of what's coming, and make modest commitments to experiment with some of these things. Rather than trying to change an existing system that people depend on, experiment with a new system as it's being put in, one that initially sits on a non-critical path, so that the number of people who depend on it is fairly limited. Any place where you aggregate information would be a good place to look. You could pull together information from your shipping system and order-entry system. It doesn't replace your existing systems, but it will give you added value.

Semantic technologies won't happen overnight, but when they do they will come quickly.