All posts tagged ‘search’

The hits just keep getting killed off. Google is shutting down yet another service — the company’s domain blocking tool, which allowed logged-in users to block unwanted domains from Google’s search results.

Google’s site-blocking tool was originally aimed at “content farm spam,” but Google hasn’t done much with it of late, and it even stopped working for a while, despite being available via a link from your profile.

Now the service is officially gone, replaced by a Chrome add-on that does nearly the same thing. Unfortunately that means the ability to ban sites from Google’s search results is now limited to those using Google’s Chrome web browser. For more on the Chrome add-on see our earlier review.

The bad news about the Chrome extension is that it does client-side filtering, not server-side. That means that if Google returns results from domains you've blocked, those results are simply hidden (sometimes there's even a brief flash of the blocked results before they disappear).

That means you’ll end up with fewer search results than you would with the server-side solution, which filtered out your blocked domains before the results were sent. For example, if there are ten results on the first page and three are from domains you’ve blocked, using the add-on method you’ll only see seven results, whereas the server-side method would have fetched the next three results to show a total of ten.
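The client-side approach boils down to hiding results after the fact rather than asking the server for replacements. A minimal sketch of that logic, with a hypothetical blocklist and result structure (the real extension's internals aren't published here):

```javascript
// Sketch of client-side result filtering, roughly what a browser
// extension might do. The blocklist and result shape are hypothetical.
const blockedDomains = new Set(["experts-exchange.com", "examplefarm.com"]);

// Extract the hostname from a result URL and check it against the blocklist.
function isBlocked(url) {
  const host = new URL(url).hostname.replace(/^www\./, "");
  return blockedDomains.has(host);
}

// Drop blocked results after the page renders. Note the filtered-out
// entries are simply gone -- nothing fetches replacements, which is why
// you end up with fewer than ten results on the page.
function filterResults(results) {
  return results.filter((result) => !isBlocked(result.url));
}
```

A server-side version would instead re-query for more results until the page was full again, which is exactly what the retired account-based tool did.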

If you used the account-based version of the blocking tool, you can head over to your account and grab the list of sites you had blocked. Just add those sites to the Chrome extension and you’ll be back up and running in no time, with not an Experts-Exchange, Quora or W3Schools link to be seen (or whatever you consider search results spam).

Even if you’re pretty good at searching, the majority of your website’s users are probably not. In fact, user experience expert Jakob Nielsen thinks most people are so bad at searching that site-specific search engines would do better to return navigation elements rather than actual search results.

Nielsen's research reveals that while more and more people reach for the search box to find what they're after on a site, few of them "know how to use it." The normally more prosaic Nielsen writes:

It would certainly be nice if schools would get better at teaching kids how to search. But I don’t hold out much hope, because most people have the literary skills of an anteater (I was going to say, “a chimpanzee,” but these animals are too smart for my metaphor). Having new and varied vocabulary words spring from their foreheads wasn’t a survival skill for ice age hunters, so most people today can’t think up good queries without help.

Presumably Nielsen means literacy skills, not literary skills. That’s a pretty harsh critique, but if you’ve ever watched a less web-savvy friend or family member search for something you might be able to relate.

So how do you design your site’s search tool to help these “mediocre searchers” as Nielsen calls them?

Nielsen is critical of instant search suggestions, currently a popular way to help people using search tools. He claims that, while sometimes helpful, auto-complete tools can also be limiting because “users often view the drop-down as a mini-SERP and assume that it lists everything the site carries.”

The better way to do search, according to Nielsen, is to simply return product categories. The example in his report cites Costco, where a search for "television" returns all of the site's TV product categories rather than individual televisions. The category links help users refine their choice and get to the televisions they actually want without having to wade through as many individual results.

It’s important to note that Nielsen is only advocating this sort of redirecting when the search term is “unambiguous and exactly matches the category.” As Nielsen notes, “until people begin to grasp the complexities of search and develop skills accordingly, businesses that take such extra steps to help users find what they need will improve customer success — and the bottom line.”

Imagine a search engine that threw out the web’s top one million sites and then searched what was left. Sounds insane, right? But that’s exactly what Million Short purports to do and the results are, well, interesting.

Million Short seems like a terrible idea. Why would you want to remove the top sites on the web from your search results? In most cases you wouldn’t, but what Million Short offers is a chance to discover sites that just don’t make it to the top of the results from more popular search engines like Google, Bing or even DuckDuckGo.

It could be that these missing sites are just small, or perhaps they don’t use cutthroat SEO tactics to compete for popular terms, or maybe they just cover topics so niche they’re unlikely to rise to the top of any but the most targeted of searches. It could also be that they’re content farms and other worthless pages. Whatever the case, skimming the top million sites off the web just might open your eyes to how narrow your filters (and Google’s) have made your results, and how that’s both good and bad.

As Million Short notes, popularity is not inversely correlated with quality, but when the same popular sites show up over and over in your results you are inevitably missing out on something. And that's what Million Short wants to show you.

It’s important to realize that Million Short is removing the top websites not just the top search results for individual queries. It’s also worth noting that Million Short doesn’t disclose where its search results are from, nor how it calculates the top sites. [Update: Sanjay Arora, founder of Exponential Labs, tells Webmonkey that Million Short is using “the Bing API… augmented with some of our own data” for search results. What constitutes a “top site” in Million Short is determined by Alexa and Million Short’s own crawl data.]

Most of the time, narrowing search results down to trusted, well-known sites, as Google, Bing and other search engines do, is a good thing. To see why, just plug a few programming queries into Million Short and you'll quickly realize just how helpful Stack Overflow, which sits well inside the web's top 1 million sites, has become. At the same time you might discover some unknown blog that will never make the top results in Google and happens to have the answer to exactly your problem. Is that better than the same answer from Stack Overflow? That's up to you.

Million Short does offer some customization options you can use to both cut out the top sites and keep the handful you don’t want to be without. Additionally you can change the limit from the top million to the top 100,000, 10,000, 1,000 or 100 sites. If you decide you love it there is a search engine plugin that will work in Firefox, Chrome and Internet Explorer.

Perhaps the better way to think of Million Short is not as a search engine so much as a discovery engine. Its strength is not answering the specific kinds of queries that Google is forever optimizing its index to handle, but surfacing less well-known sites and exploring the more remote corners of the web that might be lost in other search indexes.

Some content is necessarily dynamic. If your site is just flat HTML files with no database behind them, there's no easy way to build comments, contact forms or built-in search indexes. Luckily the web has a few solutions. For comments there are JavaScript-based services like Disqus or IntenseDebate, and contact forms can be built with Wufoo, but search is a little more difficult.

You could use Google’s Custom Search Engine tools, but then you’ll need to display things on Google’s terms (including a logo). Yahoo has a similar offering, but its results are often sub-par. The lack of search options for static sites led developer Jeff Kreeftmeijer to create Tapir, a JSON search API that indexes content from your site’s RSS feed.

Designed with static publishing systems in mind (like the popular Ruby-based tool Jekyll), Tapir handles search through RSS and JavaScript without the overhead of a database on your own server. Tapir offers a JSON-based API and relies on Tire behind the scenes (which is powered by Elasticsearch, which in turn is powered by Lucene).

To use Tapir all you need to do is write a simple JavaScript-based search form, query the Tapir index for your site and then parse out the results to display for your visitors.
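The query-and-render flow might look something like the sketch below. The endpoint URL and response field names here are assumptions for illustration, not Tapir's documented API, so check the API docs for the real details:

```javascript
// Sketch of a Tapir-style search. The endpoint and response fields
// (permalink, title) are assumptions -- consult Tapir's docs for the
// actual API shape before using this.
const TAPIR_ENDPOINT = "https://example.com/api/search.json"; // hypothetical
const TOKEN = "your-token-here";

// Turn an array of result objects into a simple HTML list.
function renderResults(results) {
  return results
    .map((r) => `<li><a href="${r.permalink}">${r.title}</a></li>`)
    .join("\n");
}

// Query the index and inject the results into the page.
async function runSearch(query) {
  const url =
    `${TAPIR_ENDPOINT}?token=${TOKEN}&query=${encodeURIComponent(query)}`;
  const response = await fetch(url);
  const results = await response.json();
  document.getElementById("search-results").innerHTML = renderResults(results);
}
```

Wire `runSearch` up to your search form's submit handler and the whole thing runs in the visitor's browser, with no database on your server.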

Tapir will parse and store the RSS feed you supply roughly every 15 minutes. For older posts (i.e. posts already long gone from your RSS feed) you’ll need to use the API to send over the data — something of a pain, but at least it’s a one-time pain.

If you'd like to give Tapir a try, just head over to the site, sign up for a token and read through the basic API docs for details on how to implement your search engine. The Tapir website says that sample code and better reference materials are coming soon, along with a jQuery plugin. [Update: As Tapir creator Kreeftmeijer notes in the comments below, the jQuery plugin is now available.]

Google has released its annual zeitgeist report, a look at how the world searched in the last year. The zeitgeist is Google’s record of popular search terms and draws on sources like Google Insights for Search and Google Trends. It’s also a reminder that, in addition to tracking you in the usual creepy ways, Google often reveals some interesting data.

The results are predictably disappointing — despite a year’s worth of events, Chatroulette and Apple’s iPad top the list of most popular searches — but the data visualization Google has created is impressive.

The visualizations combine HTML5 with some fancy JavaScript (which appears to rely on the Dojo framework) to offer maps, bar charts and timelines. The map is particularly cool, plotting out bar graphs of searches by country with an interactive timeline slider to narrow the results by month.

Other views include bar graphs of the top search terms by category. When you click on an individual bar, the graph morphs into a timeline.

There's also a video with some overly nostalgic music that walks you through the top terms of the year. Check it out: