Pre-Browse The Content Of Any Site Via Look-Ahead Search

Scoping the contents of an information site is a tough task. That is why major search engines have become the most-used gateways to any site's pages. They are, by far, the fastest and easiest way for readers to find out whether your site has any content relevant to their interests.

But major search engines may not provide the best of services if your site has lots of articles on a certain topic, if they are not properly titled, and if your site is not prominently ranked on those major search engines.

So, how can you satisfy your desire to know more about what a site contains without delving into the site itself to see it?

Photo credit: Aleksandar Milosevic

Tom Holt of SurfWax is launching today a new search technology called LookAhead, which may well be the first tool to provide a full-fledged solution for content scoping and look-ahead of content clusters by topic or keyword.

I have taken the time to test drive and discuss this technology with Tom, and have recorded with him a good conversation about what LookAhead is all about.

You can listen to it right here (by clicking the play button here below) or read the full transcript of our conversation that follows.

Listen to the audio of this Good conversation:

Full transcript of conversation with Tom Holt, CEO of Surfwax and the man behind this new pre-search tool: LookAhead.

Robin Good: Hello everyone, here is Robin Good, live from Rome in Italy and today I am together with Tom Holt from SurfWax. Hello, Tom!

Tom Holt - LookAhead: Hello Robin! It is a pleasure to be talking with you.

RG: Well, not many people know Tom yet, and so the big fanfare, the beginning may not call for an immediate applause, but let's give a proper introduction to Tom, as he has been bringing to the Internet some interesting tools. SurfWax is a very interesting search engine with some very interesting characteristics. And, another tool from Tom that I have looked at and reviewed in the past on masternewmedia.org is Nextaris, some kind of newsmastering or RSS remixing filtering and aggregation tool that has also some unique capabilities. So Tom is the man directing and coordinating a small team that is doing actually very cutting-edge work, much ahead of many others, in developing tools that allow you to search and manage information in more effective ways than we are normally capable of.

Tom, would you like to give yourself a little bit more of introduction about who you are and what is your background?

Tom Holt: I had an opportunity to work for several companies here in Silicon Valley when the Valley was still quite young. In 1998 we started working on tools to help folks articulate their search string.

We know people often come to the search box and they say "what term do I enter for this particular site and this particular search?". So we created a concept of what we call focus words, back in 1998, a controlled vocabulary or lexicon. From there, we developed a medical search tool which was highly rated by Search Engine Watch.

We then went on to collaboration tools and came out with, as you said, Nextaris, which is an integrated information exchange site that lets you not only look at a variety of different sources, but also collect information from those sources easily and share that information, photos, whatever.

But what really excites us now is our look-ahead tool. It's a technology we developed numerous years ago as part of our 1998 model. And Look-Ahead, basically is, some people may look upon it as kind of an auto-completion technology, but it's much more than that.

It's a methodology by which websites can go ahead and create a unique lexicon of terms for their site, and as the user starts typing terms, or typing characters, into the search box for that site, help will come up suggesting matching terms and articles appropriate to that site.

So, it's a way to easily and directly get into a site based upon the actual vocabulary of that site.

And what we found is, again, people come to a search box often not knowing what to type in, with Look-Ahead you can start to type, see what the vocabulary and content is on a site, and very easily then navigate to the appropriate pages.
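To make the idea concrete, here is a minimal sketch of prefix matching against a site-specific lexicon. It is not SurfWax's implementation; the function name `look_ahead`, the `limit` parameter and the sample terms are assumptions made purely for illustration.

```python
from bisect import bisect_left

def look_ahead(lexicon, prefix, limit=10):
    """Return up to `limit` lexicon terms that start with `prefix` (hypothetical helper)."""
    terms = sorted(t.lower() for t in lexicon)
    prefix = prefix.lower()
    # In a sorted list, all terms sharing a prefix sit next to each other,
    # so we can jump to the first candidate with a binary search.
    start = bisect_left(terms, prefix)
    matches = []
    for term in terms[start:]:
        if not term.startswith(prefix) or len(matches) == limit:
            break
        matches.append(term)
    return matches

# Illustrative lexicon, not taken from any real site.
site_lexicon = ["bandwidth", "band saw", "banner ads", "mountain lion", "mountain bike"]
suggestions = look_ahead(site_lexicon, "band")
```

Every additional character typed narrows the list, which is the "start to type, see what is there" behaviour described above.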

RG: Well, Tom's really got the scoop right out into the open while I wanted to, you know, warm up the waters a little bit before... but I guess I have to live with that. (Robin laughs) I am just joking, Tom. I'm happy you got the rabbit out of the hat immediately and yes, indeed my curiosity for today's good conversation was really focused on revealing this new service, facility, or tool, as you may want to call it, that really allows you to explore at a glance the content of a site without having to perform an actual search or start browsing the actual content of the site.

I've had the pleasure to spend a little time, a few days ago with Tom online in a real-time session supported by the excellent abilities of GoToMeeting screen-sharing solution, and iVocalize, that we were using at the time. And I was able to see what in practical terms this involves.

I'm going to take back some of my ability to recount this experience from the user standpoint. And what I saw, was on my own site, where Tom created an interesting and useful dummy example about how his technology would work:

On the Google search box that allows my readers to find content on my site, the moment that they would start typing, whatever key word or sentence on some topic or issue or tool, they would get immediately a popup box next to that search box that would reveal all of the articles that I have on that topic.

And the more that they went ahead specifying what they are looking for, the more things they would find without really going into the full search.

So, you get an immediate pre-browsing experience that opens up for you the inside of a website without the need to dive into a full search, explore one article at a time, or browse through the categories.

So I found that very fascinating. And, I want to ask Tom, you said you started this project from ideas that came up in your head in 1998. What was there to prompt you initially to do this type of work, Tom?

Tom Holt: Well, we found, Robin, again, that especially back in 1998, even though search had been on the web for four years or so, people still weren't very conversant with it.

In those days, people probably used just one term. So, for example, if someone was, wanted to find out the name of the cow that kicked over the lantern that started the famous Chicago fire, people would go to a search box and they would type in the word "Chicago".

And they'd obviously get a lot of results, very few of which were relevant.

So we found in 1998 that again, people didn't really know how to articulate their search. And out of that, we found again that if you could see what was available to you, it was almost like WYSIWYG on the graphical side, if you could see what is there on the site before you actually had to launch your search, you would as a user, perhaps recast or rearticulate the search terms you were using.

And so, based upon that input and experience, back again in the late '90's, we realized that we could improve search by helping, as you said perfectly, kind of a pre-browsing, opening up a site to the user, so they could kind of see what was there before they launched the search.

RG: Do you know of any other tool that does something similar?

Tom Holt: Well, I know that the Windows Help facility, when you go to Windows Help for example and you type... when you go to index and you type in a term it will auto-complete. So, there are auto-completion capabilities out there, and there have been for several years.

Search Engine Watch mentioned a couple of sites that have this technology, but we as far as we know are the front-runner in developing this technology. Now, Google has come out about a year after we came out with our technology, they came out with Google Suggest, and that has auto-completion capability to it and so-forth.

But where we want to kind of place the emphasis in the uniqueness is in helping the user see the terminology in the site, the complete terminology.

Often the search engines don't fully crawl a website, or index all the pages and so forth.

We also provide the webmaster or the site owner with the ability to create their own unique vocabulary.

So, for example, if you start typing in the term "mountain lion," perhaps the term mountain lion is nowhere to be found on the site, but in fact the webmaster through our technology has said panther and mountain lion are the same, they are equivalent, so you could type in mountain lion and in fact find the pages with panther on them.

So, the ability to create a unique lexicon as we call it for a website really facilitates the process.
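The equivalence idea can be sketched as a small lookup table that maps what the visitor types to the term the site actually uses. This is only an illustration: the table names, the `resolve` function and the page paths below are hypothetical, not part of LookAhead.

```python
# Hypothetical equivalence table: typed term -> term that appears on the site.
EQUIVALENTS = {"mountain lion": "panther"}

# Hypothetical index: site term -> pages that contain it.
PAGES = {"panther": ["/wildlife/panther-habitat", "/wildlife/panther-tracking"]}

def resolve(term):
    """Return the pages for `term`, following a webmaster-defined equivalence if one exists."""
    term = term.lower()
    canonical = EQUIVALENTS.get(term, term)
    return PAGES.get(canonical, [])
```

With this table in place, `resolve("mountain lion")` and `resolve("panther")` lead to the same pages, even though "mountain lion" never appears on the site.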

RG: Very interesting, so, let's see if I get this straight.

A webmaster or online publisher could license your technology, install it on his or her server, and then establish a vocabulary of reference words that can open up a trove of possibilities for the readers. And those are trigger keywords that, the moment they are typed in the search box, before the user even clicks or presses Enter to get a search results page, are going to prompt the user with all the alternatives on that very topic. Am I correct on this?

Tom Holt: One other point is that initially we're launching this service as a web-based ASP service, so that you, as a webmaster or an online publisher, can simply submit your URL or URLs to our service.

We crawl, index the content on your site. You can download that crawled index, modify it, then reload it as your lexicon.

Or, if you have a thesaurus or lexicon already available, you can go ahead and upload that. So, it is available initially on an ASP basis.

And, yes it does very much do what you've suggested, in terms of what you the author of the lexicon can incorporate, trigger key words, et cetera.

We also do count analysis and so forth, so that a user... I can type in, for example, mountain, "M-O-U-N-T-A-I-N," and I can see that there are a variety of different hits there: mountain lion is four, mountain beautiful is fifteen or whatever. So the point, again, is that the user not only sees what the content is on a site, but can also get some sense of the frequency or the occurrence of those terms within the site.
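A rough sketch of such a count analysis follows. It assumes the count is the number of pages containing a term; the interview does not say whether LookAhead counts pages or individual occurrences. The function names and sample pages are made up for the example.

```python
def count_pages_per_term(pages, terms):
    """For each lexicon term, count how many pages mention it (assumed definition of 'count')."""
    counts = {}
    for term in terms:
        counts[term] = sum(1 for text in pages.values() if term in text.lower())
    return counts

def suggest_with_counts(counts, prefix):
    """Return (term, count) pairs for terms starting with `prefix`, in alphabetical order."""
    prefix = prefix.lower()
    return sorted((term, n) for term, n in counts.items() if term.startswith(prefix))

# Illustrative pages, not real site content.
pages = {
    "/wildlife/1": "A mountain lion was seen near the trail.",
    "/wildlife/2": "Mountain lion tracks in the snow.",
    "/sport/1": "The best mountain bike routes.",
}
counts = count_pages_per_term(pages, ["mountain lion", "mountain bike"])
```

Showing the number next to each suggestion gives the visitor a sense of how much the site has on a topic before any search is launched.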

RG: Yeah, that is very useful. And, what I saw during your previous demonstration, what was quite useful was the fact that the webmaster could completely customize these popup results, this pre-browsing experience, showing either just the titles, the titles with an excerpt or description, or the titles associated with dates or categories, and they can be displayed in any order and in any layout and fashion. Is this still true?

Tom Holt: Yes, very much so. The user, the client as we call them, the webmaster or whoever has the ability to completely customize what we call the "look-ahead dropdown".

You would call it a popup, but in fact it typically drops down below the search box. You can say whether you want it to drop down to the left or to the right of the search box.

Everything is CSS based, so you have the ability to control as you say the color, the columnation of the data, et cetera.

This drop down, actually we have another term we use for it; it's really a search palette.

And Robin, what our vision is, is that we are going to be able to move the world beyond just the old traditional text area and search button, such that the user can, as you said "pre-browse", because the richness of the information in the drop-down, in this search palette as we call it, will be amazing.

All kinds of lexicon relationships, all kinds of iconic representations, et cetera, so that you can do a variety of things within the drop down before you even launch your search.

And, again you can launch your search directly from any of the items in the drop down, or in fact if you select any of the terms in the drop down, that can go to the website's search box where you, the user, can further compose additional aspects to that search string.

So, for example, if you typed in mountain, and you selected mountain lion from the drop down, it could take you directly to a mountain lion page, or it could just populate the search box with "mountain lion," and then you could further modify that.

RG: Cool, what about this vocabulary, this custom lexicon, how difficult is it to set it up, to fill in the information, and to match it with the trigger key-words that are going to be associated with specific contents? Is that a work intensive task?

Tom Holt: The answer is both yes and no.

There's no panacea for building a robust thesaurus or lexicon, but our tool for it is really, we think, amazing, and we call it "Lex-it", kind of like Nike had the old term "Just do it", well, we say "Lex-it".

The beauty of Lex-it is that it will go through your pages, and extract out not just names and places and kind of proper terms, but also extract out concepts, action items, et cetera.

And, what we found is that if you go to somebody and say "you take your website from scratch, and you create a lexicon", they say you know, "get real this is too difficult".

But Lex-it generates this very robust list of extracted concepts and terms which makes the editing much much easier. But, regardless, depending on the size of your site, the developing of a lexicon can be relatively intense.

We figure that the Lex-it technology does 90-95% of the job for you. And therefore, you know, if you've got 100,000 terms, you're still going to have to spend several days attuning that technology.

We've also found in showing this to some search engine optimization experts that they really see the beauty in this and the use of Lex-it alone can help folks look at key word structures across pages.

So it's... I guess that's the long answer, sorry about that, the short answer is: Lex-it definitely helps you create your lexicon, but there is still some labor involved.

RG: So is Lex-it a software component of your service offering or is this a separate tool that one would have to buy?

Tom Holt: It's part of our offering.

Basically there are two steps involved to use Look-ahead on your site.

1) You have to develop a lexicon and if you have an existing thesaurus, or an A-Z index or whatever, a keyword list, perhaps you already have what you need and you can go directly to the second step which is:

2) importing that lexicon into Look-ahead, and setting Look-ahead up on your site, and you're ready to go.

If you don't have an existing lexicon or thesaurus, then you can use Lex-it, and that's available as part of the Look-ahead service.

So, there's two steps involved, you may not need the first step, but if you do, it's all part of the service.

RG: What about my need to associate certain keywords with a group of selected contents that I want to be seen immediately when users on my site type those certain keywords?

Tom Holt: That is addressed by you basically creating the lexicon to accommodate exactly what you said.

If you want certain terminology to come up first, you can put flags in, or count structures in basically that would prioritize those key words.

But what we found so far, again, that even for sites like Hewlett-Packard, which we've done some examples of, where you have hundreds of thousands of pages and the term "digital camera" comes up numerous times, is that through the use of the count field, thereby putting weight on different terms, you the lexicon developer can prioritize what folks see first. But primarily, the terms come up in alphabetical order if you don't take some extra steps.
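The ordering rule described here (weighted terms first, alphabetical otherwise) can be sketched as a two-part sort key. The weights, the extra sample terms and the `order_terms` name are illustrative assumptions; the actual format of the count field is not described in the interview.

```python
def order_terms(entries):
    """Order (term, weight) pairs: higher weight first, ties broken alphabetically."""
    return [term for term, weight in sorted(entries, key=lambda e: (-e[1], e[0]))]

# Hypothetical lexicon entries; only "digital camera" has been given extra weight.
entries = [("digital zoom", 0), ("digital camera", 5), ("digital audio", 0)]
ordered = order_terms(entries)
```

If no weights are set (all zero), the key degenerates to plain alphabetical order, which matches the default behaviour Tom describes.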

RG: Yes, but what I was wondering is: can I say that if people start typing RSS, they should get not only all the articles that talk about RSS but also all the articles that talk about Atom, or other types of content that refer to news feeds in general? Those articles may not actually contain the RSS acronym, so I would have to specifically point to the articles that need to be listed when a specific keyword is typed in.

Tom Holt: What you would do as the lexicon developer or author is you would show RSS goes to these URLs, and then you can also use our "see also" structure that would say it also goes to articles on Atom or on the flip side, if they type in "Atom" it would take them not only to articles on Atom but also to articles that relate to RSS. So you can set it as you want. You're the one that has to say, though, Atom and RSS are in fact related terms.

This goes back to what we developed in 1998 and that was in essence creating synonyms for a variety of terms, helping the user articulate again what they are intending to find. So, if someone back in 1998 in our system typed in the word "stock" S-T-O-C-K, are they looking for soup stock, are they looking for a rifle stock, are they looking for financial stocks? Likewise, you, as the author of a lexicon, can say that RSS and Atom are related, and in fact if someone types in RSS, give them both pages and articles on Atom and RSS.
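A "see also" relation of this kind could be modelled as a second table consulted after the direct matches. The sketch below is an assumption about how such a structure might look, with hypothetical URLs and names; it is not the LookAhead lexicon format.

```python
# Hypothetical lexicon: term -> article URLs chosen by the lexicon author.
TERM_URLS = {"rss": ["/rss-basics"], "atom": ["/atom-intro"]}

# Hypothetical "see also" relations; the author declares them in both directions.
SEE_ALSO = {"rss": ["atom"], "atom": ["rss"]}

def results_for(term):
    """Return URLs for `term` first, then URLs of its related terms, without duplicates."""
    term = term.lower()
    urls = list(TERM_URLS.get(term, []))
    for related in SEE_ALSO.get(term, []):
        for url in TERM_URLS.get(related, []):
            if url not in urls:
                urls.append(url)
    return urls
```

The relation only exists because the lexicon author declared it, which is the point Tom makes: someone has to say that Atom and RSS are related terms.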

RG: Great, thank you for answering that one. If somebody came up and said "but couldn't I do basically the same with Google Suggest on my side?" What would you answer?

Tom Holt: Right now, Google Suggest offers you ten results from the World Wide Web.

So that if your site is not highly ranked by Google, there is a good probability they are not going to find anything specific to your site when you type in bandwidth for example into the Google suggestion box.

That's why we've taken the approach let's go site-specific.

Let's deal directly with the sites.

Because by so doing, you're automatically reducing the overall size of the domain. Hewlett-Packard may have 250,000 pages. That's a much smaller, more focused domain than having to deal with the 8 billion plus pages that Google has indexed.

So, part of our unique offering is we can make... by customizing to your effort, using Lex-it and so forth, you can customize the lexicon so it reflects just your site, and not, in fact, the World Wide Web, as is offered by Google Suggest.

Also, when you the user start typing into the LookAhead box, the drop down we offer can have anywhere from 100 to 500 different items in it. We let you the webmaster control that. So you can offer your user more depth in what they see right away in the drop-down than what Google Suggest offers.

We've found that there's a great kind of discovery or exploration process that goes on with our users: when they start typing in B-A-N-D to find "bandwidth", they see bandwidth used in a variety of ways, and they can scroll up and down inside that drop-down and see alternate expressions and terminology that they may in fact not have thought of. So we offer site-centric use of this technology and a broader, more in-depth exposure to terms by offering more than just ten items.

RG: Yes, indeed, and I was just thinking that maybe your technology would solve an issue that I haven't been able to solve otherwise by using the Google search. My Master New Media site has a number of international language versions, but the Google search does not allow me to separate their results of that indexing process that they do, and so if somebody searches for RSS on my masternewmedia.org English site, they will get within the page results also results for the articles in Russian, Spanish, Italian and other languages if they contain RSS. That's by using the Google search. And I have no way... I have also asked Google to you know, keep the type of service they're offering which is altogether free I must say, and separate the results according to the different language sites. So, I was thinking as you were talking if I could not ideally place your technology on the different language sites and configure it so that when people start typing there they get pre-browsing experience relevant only to the language site they're looking at.

Tom Holt: Are these language sites Master New Media sites, or are they also other sites external to Master New Media?

RG: No, they're all Master New Media sites, and they are under the same domain; they're the same set of content in another language.

Tom Holt: What you would do with LookAhead in that case is, and again the short answer is yes, you could do it: you would in essence crawl the Italian pages in one effort, and you would create a lexicon. You would crawl the English in a different effort. You would crawl the Spanish in a different effort, for example, so that each language site has its own lexicon. For example, if I'm at the masternewmedia.org Spanish version, there would be a search box there. I would load that page and I would start typing in a word in Spanish, and up would come just the terms appropriate to the Spanish version of your site, and those terms would only relate to the Spanish pages, if I understand you correctly. So, yes, you can do that.
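The per-language setup amounts to keeping one lexicon per language edition and consulting only the one that matches the page being viewed. A minimal sketch, with made-up language codes, terms and function name:

```python
# Hypothetical lexicons, one per language edition of the same site.
LEXICONS = {
    "en": ["news feeds", "newsmastering"],
    "es": ["noticias", "novedades"],
    "it": ["notizie"],
}

def suggest(lang, prefix):
    """Suggest terms only from the lexicon of the language edition being browsed."""
    prefix = prefix.lower()
    return sorted(t for t in LEXICONS.get(lang, []) if t.startswith(prefix))
```

Because each search box is tied to a single lexicon, a visitor on the Spanish pages typing "no" sees only Spanish terms, and never the English or Italian ones.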

RG: Great, all right, well, I certainly get more and more interested in being a guinea pig for this in the future possibly. But, let me ask you now, what is in fact your business model around this? You said you were going to provide initially an ASP service, so I guess that would be monthly subscription price, but I gather from your words that you're planning also to deliver a licensed version for people to install on their servers. Can you anticipate anything about the pricing model and cost?

Tom Holt: The ASP model there will be two aspects to the pricing. The first aspect is actually relative to the LookAhead functionality and the drop down.

That is based upon how many searches are done per day on an average basis on your site and the size of your lexicon.

So, for the first aspect of the pricing, obviously a site that has just a few words in the lexicon and processes three million searches a day would have a higher cost than a site that maybe had a hundred thousand words in the lexicon but in fact only had a thousand searches a day.

So Robin, we ask folks to look at our examples to get a handle on how exciting and how well LookAhead works and then to call us, because, each situation is a bit different.

We would like very much to have a published pricelist, but because different sites have different levels of search and different-sized lexicons, we can't publish a real simple pricing matrix.

That's for the LookAhead side.

The Lex-it side, again, depends on how many pages you want crawled. If your site has a thousand pages, that might cost around $100 a month, somewhere in that area, or even less. But if you have, say, 50 or 60 thousand pages, then based upon the content we would have to give you a custom quote.

Right now we're taking the approach that we're going to charge both those folks who want to use Lex-it and those folks that actually want to implement LookAhead on their site and we will negotiate and work out the pricing structure with them.

We hope that within six months we'll have enough of a handle on how to categorize different requests that we can come up with a fixed, published pricing list. But at this juncture, it's going to be on a customized basis.

RG: Can people actually try out this service in some way?

Tom Holt: Yes, they can go to Lookahead.surfwax.com and sign up for a free, no-obligation trial; it's very easy to sign up. That trial will permit them to go ahead and do some sample crawling of their site or any site they want to.

They can download the results of that crawl. They can "massage" that download to create a lexicon and then they can upload it.

They have the ability to create sample pages and instantaneously see what the header results might look like.

They can customize the HTML and the CSS, all as part of the trial.

The trial is available for two weeks, and so basically there are some limits to the trial. They can only import so many terms, they can only crawl so many pages, but they can fully test out the capabilities without obligation or cost.

Tom Holt: We're launching it today, Robin, Monday, September 19, and we sincerely appreciate your interest in all the services we've provided over the last few years, and especially your interest in LookAhead. We're very excited about launching this new, what we believe, genre or paradigm, for a search.

RG: I think you guys indeed have something very interesting under the hood there, and it is really fascinating to see you work. I invite all the listeners who have a large amount of content and base their profit and revenues on effective search by their users to go and experiment with Tom Holt's LookAhead technology, because it's worthwhile knowing there is something like this out there.

From Robin Good, in Rome, Italy, this is all for today. I leave it to you, Tom, for the final formal remarks and hellos. I must thank you really for showing and creating such good and interesting technology. I wish you the very best on making it to the market. Ciao for now!

Tom Holt: Ciao Robin, and thank you again for this opportunity to inform your audience, your readers and listeners, about LookAhead. We look forward to answering their questions and improving the service to really open up some new vistas.