The ‘deep web’: vast and uncharted . . . until now

The World Wide Web has become so large so fast that sophisticated search engines are just scratching the surface of its vast information reservoir, according to a study released July 26.

The 41-page research paper, prepared by a South Dakota company that has developed new software to plumb the internet’s depths, estimates the web is 500 times larger than the maps provided by popular search engines like Yahoo!, AltaVista, and Google.com.

These hidden information mines, well known to the net-savvy, have become a tremendous source of frustration for educators and researchers who can’t find the information they need with a few simple keystrokes.

“These days, it seems like search engines are a little like the weather: Everyone likes to complain about them,” said Danny Sullivan, editor of SearchEngineWatch.com, which analyzes search engines.

For years, the uncharted territory of the internet’s World Wide Web sector has been dubbed the “invisible web.”

BrightPlanet, the Sioux Falls, S.D., start-up behind the July 26 report, describes the terrain as the “deep web” to distinguish it from the surface information captured by internet search engines.

“It’s not an invisible web anymore. That’s what’s so cool about what we are doing,” said Thane Paulsen, BrightPlanet’s general manager.

Many researchers suspected that these underused outposts of cyberspace represented a substantial chunk of the internet, but no one seems to have explored the web’s back roads as extensively as BrightPlanet.

Deploying new software developed over the past six months, BrightPlanet estimates there are now about 550 billion documents stored on the web.

Combined, popular internet search engines index about 1 billion pages. One of the first web search engines, Lycos, had an index of 54,000 pages in mid-1994.

While search engines obviously have come a long way since 1994, they aren’t indexing even more pages because an increasing amount of information is stored in evolving, giant databases set up by government agencies, universities, and corporations.

Search engines rely on technology that generally identifies “static” pages, rather than the “dynamic” information stored in databases.

This means that general-purpose search engines will guide users to the home site that houses a huge database, but finding out what’s in the database requires additional queries.
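
The gap between “static” and “dynamic” content can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions, not how any real search engine or BrightPlanet’s software works: the page names, the `STATIC_PAGES` and `DATABASE` structures, and the `crawl` and `query_database` functions are all hypothetical. It shows a link-following crawler reaching a database’s home page but never the records behind it, which surface only in response to a query.

```python
from collections import deque

# Hypothetical toy web: each static page has some text and outgoing links.
STATIC_PAGES = {
    "home": {"text": "welcome", "links": ["about", "db_home"]},
    "about": {"text": "about this site", "links": []},
    "db_home": {"text": "search our records", "links": []},
}

# Hypothetical database behind "db_home": its records are never linked
# from any page; they appear only as the result of a query.
DATABASE = {
    "rec1": "rainfall data for 1999",
    "rec2": "soil survey for 1998",
}


def crawl(start):
    """Breadth-first, link-following crawl; returns the set of pages reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_PAGES[page]["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def query_database(term):
    """Simulate a dynamic results page: ids of records whose text contains term."""
    return sorted(rid for rid, text in DATABASE.items() if term in text)


indexed = crawl("home")
print(sorted(indexed))             # static pages, including the database's home page
print("rec1" in indexed)           # the record itself is not reachable by links
print(query_database("rainfall"))  # the record appears only when queried
```

In this toy model the crawler’s index ends at `db_home`; getting to the records takes the kind of additional query described above.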

BrightPlanet believes it has developed a solution with software called “LexiBot.”

With a single search request, the technology not only searches the pages indexed by traditional search engines, but delves into the databases on the internet and fishes out the information in them.

The LexiBot isn’t for everyone, BrightPlanet executives concede. For one thing, the software costs money: $89.95 after a free 30-day trial. For another, a LexiBot search isn’t fast. Typical searches will take 10 to 25 minutes to complete, but could require up to 90 minutes for the most complex requests.

“This isn’t for grandma when she is looking for chocolate chip recipes on the internet,” Paulsen said.

The privately held company expects LexiBot to be particularly popular in academic and scientific circles. It also plans to sell its technology and services to businesses.

About 95 percent of the information stored in the deep web is free, according to BrightPlanet.

Several internet veterans who reviewed BrightPlanet’s research were intrigued, but warned that the company’s software could prove overwhelming.

“The World Wide Web is getting to be so humongous that you need specialized engines. A centralized approach like this isn’t going to be successful,” predicted Carl Malamud, co-founder of Petaluma, Calif.-based Invisible Worlds.

Like BrightPlanet, Invisible Worlds is trying to extract more data hidden from search engines, but it is customizing the information.

Malamud calls this process “giving context to the content.”

Sullivan agreed that BrightPlanet’s greatest challenge will be showing businesses and individuals how to effectively deploy the company’s breakthrough.

“No one else has come up with something like this yet, so when they fetch people all this information on the deep web, they are going to have to show people where to dive in. Otherwise, people will just drown.”