Build Your Own Yahoo!

The itch to create your own online portal eventually strikes just about every web searcher, usually after you've built up a collection of a few thousand choice bookmarks or favorites that you'd love to share with the rest of the world. There are several ways to scratch this itch, and to do it properly, you should make sure you have the right tools for the job.

Using web site managers like Homesite or FrontPage is an easy way to build your own directory of web sites, but you'll find that the effort quickly gets out of hand. A better choice is to use directory-building and maintenance software that takes the grunt work out of building your own directory.

These programs provide tools for establishing categories, adding new links, checking and deleting broken links, and all of the other tasks associated with maintaining a directory. And, of course, they include a search engine customized to provide best results for users of your directory.
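
To make those tasks concrete, here is a minimal sketch in Python of what such a program keeps track of: categories, links, removal of dead links, and a simple keyword search. It is not taken from any of the products covered below; the names Link, Directory, add_link, remove_link and search are invented for illustration, and the search is a plain substring match rather than a tuned search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    url: str
    title: str
    description: str = ""


@dataclass
class Directory:
    # Maps a category path such as "Search/Tools" to the links filed under it.
    # (Hypothetical structure, not any vendor's actual storage format.)
    categories: dict[str, list[Link]] = field(default_factory=dict)

    def add_category(self, path: str) -> None:
        # Create the category if it does not exist yet.
        self.categories.setdefault(path, [])

    def add_link(self, path: str, link: Link) -> None:
        self.add_category(path)
        self.categories[path].append(link)

    def remove_link(self, url: str) -> int:
        """Delete a link (for example, a broken one) from every category."""
        removed = 0
        for path, links in self.categories.items():
            kept = [item for item in links if item.url != url]
            removed += len(links) - len(kept)
            self.categories[path] = kept
        return removed

    def search(self, query: str) -> list[tuple[str, Link]]:
        """Return (category, link) pairs whose text contains every query word."""
        words = query.lower().split()
        hits = []
        for path, links in self.categories.items():
            for link in links:
                text = f"{link.title} {link.description}".lower()
                if all(word in text for word in words):
                    hits.append((path, link))
        return hits
```

A directory built this way would be filled with calls such as add_link("Search/Tools", Link(url, title)) and queried with search("perl directory"). Real packages add templates, user submissions and link validation on top of a core like this.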

Most of these programs require some programming knowledge, as well as access to your own web server. While they help produce and maintain high-quality directories, they're not for the technically faint of heart.

Here are four interesting tools for building your own Yahoo-like web directory. Note: I haven't used these programs extensively, so these are not endorsements -- what follows is simply an overview of what's available.

Hyperseek bills itself as the "high-end" search engine and directory software. The program comes with tons of features, and at $749 it is the highest-priced of those covered here. Hyperseek requires Perl version 5.x running on Unix or Windows NT (Unix is preferred). The company recommends at least 32 MB of RAM on your server and sufficient hard drive space (for example, 100,000 entries will take approximately 30 to 40 MB of disk space).
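
The disk space figure works out to roughly 300 to 400 bytes per entry. As a rough planning aid, the sketch below scales the quoted figure to other directory sizes. It assumes storage grows linearly with the number of entries, which the vendor figure does not guarantee, and the function name estimate_disk_mb is made up for this example.

```python
def estimate_disk_mb(entries: int) -> tuple[float, float]:
    """Scale the quoted 30 to 40 MB per 100,000 entries to another size.

    Assumes linear growth, which is a simplification.
    """
    low = entries * 30 / 100_000   # lower bound in MB
    high = entries * 40 / 100_000  # upper bound in MB
    return low, high
```

Under that assumption, a directory of 250,000 entries would need somewhere between 75 and 100 MB.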

Hyperseek offers a scaled down version called iLink as freeware. iLink, in essence, lets you try before you buy, and when you're ready to make the jump from iLink to Hyperseek, your listings will be absorbed easily into the more robust Hyperseek system.

Links 2.0 offers many of the same features as Hyperseek at much lower cost: $150. Although it's designed to be easy to use, installation requires some technical skills and the ability to directly access your web server. Links 2.0 runs on almost any server, including Windows, Unix or Mac, and requires a current version of Perl 5 (version 5.004 or better). The company offers a number of online forums and resources for help with installation, as well as a directory of installers, some of whom will install the program for free.

TurboSeek's design philosophy is to make creating and maintaining directories as easy as possible, even for those who lack programming skills. A TurboSeek license goes for $169, and for an additional $25 the company will install the program on your server. TurboSeek requires a Unix-based host (Windows is not supported at this stage) and Perl 5.002 or greater.

Link Voyager is a remotely hosted directory building and maintenance service. It requires no software to run -- you manage everything through your own browser, and all of the heavy lifting is done on Link Voyager's own servers.

In addition to providing the capability to build and maintain your own directory, Link Voyager offers a unique addition -- a crawler that can help you intelligently populate your directory with minimal effort. Once you've set up your directory structure, chosen keywords appropriate to your topic, and "seeded" the crawler with links that you consider relevant, Link Voyager will use a variety of techniques to seek out and discover related web sites.

It builds a queue of these sites for you to review at your leisure. Adding one of these "found" listings to your directory is as simple as deciding which category it belongs in and making a few mouse clicks.
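
Link Voyager doesn't publish how its crawler decides what is relevant, so the following Python sketch only illustrates the general workflow described above: score candidate pages against your keywords, queue the promising ones, and file each into a category once a person approves it. The keyword-counting score and the function names score, build_review_queue and approve are assumptions made for this example, not the service's actual technique.

```python
from collections import deque


def score(text: str, keywords: list[str]) -> int:
    """Count how many topic keywords appear in a page's text."""
    lowered = text.lower()
    return sum(1 for keyword in keywords if keyword.lower() in lowered)


def build_review_queue(candidates: dict[str, str], keywords: list[str],
                       min_score: int = 1) -> deque:
    """Queue candidate URLs for human review, best keyword match first.

    candidates maps each discovered URL to the text found on that page.
    """
    scored = [(score(text, keywords), url) for url, text in candidates.items()]
    relevant = [pair for pair in scored if pair[0] >= min_score]
    # Highest score first; ties are broken alphabetically by URL.
    relevant.sort(key=lambda pair: (-pair[0], pair[1]))
    return deque(url for _, url in relevant)


def approve(directory: dict[str, list[str]], queue: deque, category: str) -> str:
    """File the next queued URL under the category the reviewer picked."""
    url = queue.popleft()
    directory.setdefault(category, []).append(url)
    return url
```

The point of the queue is that the crawler only proposes; the editor still decides which category, if any, each found site belongs in.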

This capability makes Link Voyager an intriguing tool for competitive intelligence professionals, market researchers, or anyone else who manually searches the web for web sites falling into particular categories. The directory created by Link Voyager needn't be public -- it could serve as your own private, highly focused portal for your specific industry or knowledge area.

Creating and maintaining a web directory isn't for everyone. But if you find the urge to create your own mini-Yahoo just too strong to ignore, check out these directory-building tools. They'll largely free you from the drudgery associated with maintaining a comprehensive, fresh directory, allowing you to focus on what's really important -- delivering the most relevant listings to your users.

More on Google Filetypes

Yesterday's SearchDay described Google's new effort to index non-HTML file types, including Microsoft Word, Excel, and PowerPoint formats, as well as Rich Text Format and PostScript files. While continuing to decline to provide specific numbers, Google spokesperson David Krane offered a bit more insight into how many files of the new types are now included in Google's index:

"We now have more than 35 million non-HTML files in our index. This includes 22 million PDFs, and a number of Microsoft Office documents, Corel WordPerfect files, etc."

According to Krane, the top three non-HTML formats in Google's index are:

AltaVista Plans Listing Enhancements

AltaVista plans to announce a new program that will allow webmasters to enhance their listings by adding logos, icons, custom taglines and text links to URLs. Though the program has not yet been officially announced, a brief summary of the program is available. I'll be following up with AltaVista in the next day or two and will provide more details in SearchDay next week.

About the author

Chris Sherman is a frequent contributor to several information industry journals. He's written several books, including The McGraw-Hill CD ROM Handbook and The Invisible Web: Uncovering Information Sources Search Engines Can't See, co-authored with Gary Price. Chris has written about search and search engines since 1994, when he developed online searching tutorials for several clients. From 1998 to 2001, he was About.com's Web Search Guide.