Every company has too much unstructured data to manage.

Content management may be the information equivalent of the Gold Rush: Give workers the tools they need, and they'll find the nugget of knowledge gold buried in a ton of data dirt. The only problem is, in many cases you're the one who's actually helping create the dirt.

Most of the data that companies accumulate isn't created by workers typing names and numbers into database fields. That's structured data, information with clear labels defining exactly what the information is about, making it easier to find and use. It's the unstructured data—unlabeled information with no indication of what it is or how it's to be used—that creates headaches.

Every time someone writes a Word document, types an e-mail message, creates an Adobe Acrobat file, or generates an audio or video file—using all those tools that IT provides—another shovelful of unstructured data is tossed onto the company's information mountain. How much? Analysts estimate that fully 85 percent of all the data in an organization is unstructured—and that the amount of unstructured data in the average business doubles every two months.

Why is that a problem? Suppose you're looking for a company's address. If that information is sitting in a structured database, you simply use a query tool to search for the company's record and the address field appears. Simple. But suppose you're searching the uncharted, unstructured vastness of the Web for that same information nugget. Sure, it's probably somewhere on the company's Web site. But unless the word "address" is sitting nearby on the same HTML page containing the data you're after, a keyword search isn't likely to find it.
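To make the contrast concrete, here is a minimal sketch in Python. Everything in it is assumed for illustration: the company name, the address, the table layout, and the page text are invented, and the `keyword_search` helper is a hypothetical stand-in for a simple keyword search tool, not any real search engine.

```python
import sqlite3

# Structured data: the address lives in a labeled field, so one query finds it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, address TEXT)")
conn.execute(
    "INSERT INTO companies VALUES (?, ?)",
    ("Acme Corp", "12 Main Street, Springfield"),  # invented sample record
)
row = conn.execute(
    "SELECT address FROM companies WHERE name = ?", ("Acme Corp",)
).fetchone()
structured_result = row[0]

# Unstructured data: the same fact sits in free text with no label on it.
pages = {
    "about.html": "Acme Corp was founded in 1950. Visit us at 12 Main Street, Springfield.",
    "news.html": "Acme Corp announces a change of address for its press office.",
}


def keyword_search(pages, keyword):
    """Return the names of pages whose text contains the keyword (case-insensitive)."""
    return [name for name, text in pages.items() if keyword.lower() in text.lower()]


keyword_hits = keyword_search(pages, "address")

print(structured_result)  # the address, straight from the labeled field
print(keyword_hits)       # pages that mention the word "address"
```

In this sketch the database query returns the address directly, while the keyword search returns only the news page, which mentions the word "address" but does not contain the address. The "About" page, which does contain it, is never matched.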

It may not be a big deal to go to the company's Web site, find the "About" page, and write down the address, but it requires a lot more time and effort than typing a few keywords. In fact, IDC research says the average knowledge worker spends 2 1/2 hours a day panning for information nuggets in unstructured sources like Web pages and Word files, even though many of those pages and files may be their own. A year's cost for 1,000 knowledge workers not finding what they need, according to IDC: $6 million.

Welcome to the world of unstructured data.

Ask Your CTO:

How much of our data is unstructured?

Tell Your Business Constituents:

Here's the difference between structured and unstructured data—and here's why getting the information you need is so difficult.

Ask Your Data Guru:

How much of our current efforts are focused on managing information in databases versus helping workers better coordinate and use their unstructured data?

This article was originally published on 05-01-2003