For the last half-dozen years, Tim Berners-Lee has been steadfastly promoting a vision of what his baby—the World Wide Web—should look like when it grows up.
He calls it the Semantic Web, so named because of his desire to bring not just order, but “meaning” to the vast amount of information on the Web.
Berners-Lee’s dream is to make it possible for computers to scour the Web and perform complex tasks—the kind that only people can currently handle. And the only way this can happen is if every piece of content on the Web is encoded with metadata that will allow computers to understand the meaning of the information.
Here’s how Berners-Lee envisioned the future in his landmark May 2001 Scientific American article, “The Semantic Web.” The scene describes a woman named Lucy telling her brother Peter that their mother needs new medical care:
Lucy explains to Peter that their mother needs to set up biweekly physical therapy sessions. She says she’ll use her “agent” to schedule the appointments. Using her handheld Web browser, Lucy instructs her Semantic Web agent to find information about the prescribed treatment from the doctor’s agent, scan lists of providers, and check for those with the highest ratings in her mother’s insurance plan within a 20-mile radius of her residence.
The agent then attempts to find available appointment times (provided by the agents of individual doctors, also via the Web) that match Peter’s and Lucy’s schedules. The Semantic Web agent completes all these tasks and, almost instantly, presents Lucy with a plan for her mother’s care.
It’s not quite HAL, from 2001: A Space Odyssey, but it’s close. And therein lies the problem. Computer scientists have had plenty of trouble building artificial intelligence even in highly controlled laboratory environments. Developing an intelligent agent that could operate across the World Wide Web is pure fantasy.
Everyone agrees that bringing more intelligence and order to the Web is the right thing to do. The disagreement comes over how best to do that. Berners-Lee, in his role as director of the World Wide Web Consortium (W3C), is pushing forward with his highly structured approach to the problem—developing complex new standards for posting Web content. Others—exemplified by the likes of search companies and Really Simple Syndication (RSS) publishers—are taking a less concerted yet more pragmatic approach that is paying dividends right away.
These technologies are designed to find relevant information amid the ever-expanding jumble of Web content. By the late 1990s, so much content was being posted to the Web that it was becoming increasingly difficult to find relevant information. Yahoo!’s early attempts to bring order, by having people create directories of Web sites, couldn’t keep up as the Web grew.
And computer search engines, such as those from Inktomi or AltaVista that looked for keywords, were yielding worsening results. With the limited amount of information embedded in Web content, computer search engines were not able to discern the difference between “Cork,” as in County Cork, Ireland; “cork,” as in a cork in a wine bottle; and “cork,” as in a cork oak tree.
Berners-Lee looked at this problem and saw it this way: “Most of the Web’s content today is designed for humans to read, not for computer programs to manipulate meaningfully,” he wrote in Scientific American. His solution was to get the W3C to develop the Resource Description Framework (RDF), RDF Schema and Web Ontology Language (OWL) standards. These standards allow programmers to embed information in Web content that, theoretically, tells a computer everything it needs to know about the meaning of the information: how one word relates to another word, for example, or the classes and subclasses of objects. It’s a bit like XML on steroids.
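To make the idea of classes and subclasses concrete, here is a minimal sketch in plain Python. It is not RDF itself: real RDF identifies things with URIs and has its own serializations, and every name below (CountyCork, BottleStopper and so on) is a made-up illustration. It only shows how subject-predicate-object statements plus a subclass rule could let a program tell County Cork from a cork oak.

```python
# Hypothetical triples: (subject, predicate, object). The names are
# illustrative only and do not come from any real RDF vocabulary.
TRIPLES = {
    ("CountyCork", "type", "County"),
    ("County", "subClassOf", "Place"),
    ("CorkOak", "type", "Tree"),
    ("Tree", "subClassOf", "Plant"),
    ("WineCork", "type", "BottleStopper"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable through subClassOf links."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "subClassOf" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def is_a(thing, cls, triples):
    """True if thing has a declared type that is cls or a subclass of cls."""
    for subject, predicate, obj in triples:
        if subject == thing and predicate == "type" and cls in superclasses(obj, triples):
            return True
    return False


print(is_a("CountyCork", "Place", TRIPLES))  # True
print(is_a("CorkOak", "Place", TRIPLES))     # False
```

The catch, as the critics point out, is that someone has to write all of those statements by hand for every page.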
Semantic Web supporters such as Eric Miller, the person in charge of the W3C’s Semantic Web activities, like to point to some prominent companies that have developed Web sites using RDF. The one most often cited is Nokia, which has encoded the pages on its developer forum. But sites like these are few and far between. RDF and OWL are difficult to learn, time-consuming to use and, in the minds of some experts, don’t even work.
“Inferring metadata doesn’t work,” says Tim Bray, one of the principal developers of XML, who is now technology director at Sun Microsystems. “Google doesn’t infer metadata. It’s a deterministic calculation based on finding links and counting links. Inferring metadata by natural language processing has always been expensive and flaky with a poor return on investment. . . . To this day, I remain fairly unconvinced of the core Semantic Web proposition.” (Bray’s comments come from an interview in the February 2005 issue of Queue, published by the Association for Computing Machinery, or ACM.)
OWL, in particular, has garnered little support in the development community. “It’s dangerous when you see computer scientists use words like ‘ontology,’” cautions Peter O’Kelly, senior analyst with Burton Group. (Merriam-Webster’s Collegiate Dictionary defines ontology as “a branch of metaphysics concerned with the nature and relations of being.”) “There are lots of people that have talked about ontologies for years and have come away very frustrated.”
As Stanford students, Larry Page and Sergey Brin looked at the same problem—how to impart meaning to all the content on the Web—and decided to take a different approach. The two developed sophisticated software that relied on other clues to discover the meaning of content, such as which Web sites the information was linked to. And in 1998 they launched Google.
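The flavor of “finding links and counting links” can be shown with a toy sketch. This is emphatically not Google’s actual algorithm, which the column does not describe in detail; it simply ranks pages by how many other pages link to them. The link graph and the example domains are invented for illustration.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.example": ["c.example", "b.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "a.example"],
}


def inbound_counts(links):
    """Count how many distinct other pages link to each page."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):      # ignore duplicate links on one page
            if target != source:         # ignore links from a page to itself
                counts[target] += 1
    return counts


def rank(links):
    """Order pages by inbound link count, breaking ties alphabetically."""
    counts = inbound_counts(links)
    return sorted(counts, key=lambda page: (-counts[page], page))


print(rank(LINKS))
# ['c.example', 'a.example', 'b.example', 'd.example']
```

Nothing here asks authors to annotate anything. The signal comes from links people were already creating, which is the point of the contrast with RDF and OWL.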
What most differentiates Google’s approach from Berners-Lee’s is that Google doesn’t require people to change the way they post content. As Sergey Brin told InfoWorld’s 2002 CTO Forum, “I’d rather make progress by having computers understand what humans write, than by forcing humans to write in ways that computers can understand.” In fact, Google has not participated at all in the W3C’s formulation of Semantic Web standards, says Eric Miller.
Yet Google’s impact on the Web is so dramatic that it probably makes more sense to call the next generation of the Web the “Google Web” rather than the “Semantic Web.”
Of course, Google’s approach doesn’t solve all the problems Berners-Lee set out to solve with his Semantic Web. But other technologies address some of those challenges as well. Take XML, for example. Its approach to tagging content is less ambitious than that of RDF and OWL, but its very simplicity has meant that XML has become widely adopted by content creators. “XML was an outbreak of common sense,” O’Kelly says.
Many businesses are now using XML metadata to tag content so that computers can readily find, identify, and manipulate the information, much as an intelligent agent would. And RSS feeds, based on XML, allow individuals to have specific content sent directly to them, and they’re one of the hottest new schemes on the Web. (Today there are more than seven million RSS feeds, and the number is growing at the rate of 30,000 to 40,000 per day, according to Bray.)
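Part of the appeal is how little it takes for a program to read such a feed. The sketch below uses Python’s standard XML parser on a made-up RSS 2.0 document; the feed contents and example.com addresses are placeholders, not a real feed.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed, kept tiny for illustration.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <description>A placeholder feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (title, link) pairs for every item in the feed, in document order."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in read_items(FEED):
    print(title, link)
# First post http://example.com/1
# Second post http://example.com/2
```

The tags say only that something is a title or a link, not what it means, and that modesty is exactly why so many publishers have been willing to adopt them.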
The net result of all this activity is that Berners-Lee’s dream for the next stage of the Web is slowly unfolding, just not the way he envisioned. We aren’t likely to end up with a neatly constructed Semantic Web. Instead, it will be more of a patchwork quilt of homegrown solutions and standards. But isn’t that what the Web has always been about?
Eric Nee, a longtime observer of Silicon Valley, has served in a variety of editorial positions at Forbes, Fortune and Upside magazines. His next column will appear in July.