Pathetic to absurd to disheartening in 97 queries.

While we had anecdotal evidence from our customers and the general public that the House search engine is substandard, even the barest of data sets now indicates the degree of antipathy we apparently have for site visitors. After manually searching the most frequent queries on House.gov (see a table of the terms and top-ten results of each), I have arrived at the following initial conclusions:

Most of the 970 results (the top ten for each of the 97 queries) come from the Energy and Commerce Committee’s website from the 107th Congress (2001-2002). Their webmaster at the time ([my current supervisor]) said he did nothing to optimize the site’s pages for the House search engine. Generally speaking, there appears to be no rhyme or reason to how the algorithm determines relevance.

Queries are, in fact, case-sensitive. “Nancy Pelosi” and “nancy pelosi” produce different quantities of results, but the most “relevant” of each query are nearly identical. Commonly used search engines (Google, Yahoo!, etc.) are not case-sensitive.
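
For illustration only, here is a minimal sketch of the case folding that case-insensitive engines typically apply before matching a query. The function name `normalize_query` is hypothetical, and nothing here describes how the House engine works internally.

```python
def normalize_query(query: str) -> str:
    """Fold case and collapse whitespace so equivalent queries compare equal."""
    # casefold() is a more aggressive lower(), intended for caseless matching.
    # split() + join() trims the ends and collapses runs of whitespace.
    return " ".join(query.casefold().split())


# Both spellings reduce to the same normalized form.
print(normalize_query("Nancy Pelosi") == normalize_query("nancy  pelosi"))
```

An engine that normalized queries this way would return the same result count for both spellings, which is the behavior visitors have learned to expect elsewhere.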

Singular and plural forms of a noun are treated identically: “committee” and “committees” produce exactly the same results.

The Pell Grant Underfunding PDF by the Oversight Committee that appears in the top-ten results of most state searches deserves further scrutiny. How this managed to be considered more relevant than any member-generated page in Colorado, Missouri, North Carolina, and Texas is worth determining.

Pages and documents produced by Energy and Commerce, Foreign Affairs, and Ways and Means Committees form a clear plurality of all results. I can proffer no explanation.
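
One way to put a number on a claim like this is to tally the collected result URLs by host. The following is a rough sketch, assuming the top-ten results have been gathered into a list of URLs; the function name `tally_hosts` and the `.example` hostnames are placeholders, not actual House data.

```python
from collections import Counter
from urllib.parse import urlparse


def tally_hosts(result_urls):
    """Count how many result URLs come from each host."""
    return Counter(urlparse(url).netloc.lower() for url in result_urls)


# Placeholder URLs for illustration; these are not real search results.
sample = [
    "http://committee-a.example/hearings/1.pdf",
    "http://committee-a.example/hearings/2.pdf",
    "http://committee-b.example/report.pdf",
    "http://member.example/contact.html",
]

# Print each host with its share of the collected results, largest first.
for host, count in tally_hosts(sample).most_common():
    print(f"{host}: {count} of {len(sample)} ({count / len(sample):.0%})")
```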

There is no apparent weight given to the title of a document. Untitled documents are not ranked any lower in the results.

The search engine’s inability to discern titles of non-HTML documents (PDFs, MS Office docs, etc.) is a glaring drawback. In a screen reader’s links mode, “No Title Provided for This Document” is a sadistic running joke, a more verbose take on the “click here” gag.

The decision to keep House leadership links off the home page of House.gov last year was made in insipience (not mine, thank you). Ten of the tested queries are for House leadership; six are variations on Nancy Pelosi or Speaker of the House, and two of those are among the top-ten most frequent queries. None of the top-ten results for any leadership queries is a page from any leadership site.

I haven’t determined the criteria and percentage yet, but my generous estimate is that 15% of the results are at least relevant in the way that someone who can only write their own name is literate. The number of first-results that are relevant is less than the number of successful Apollo moon landings.

I have not yet codified “valuable” as opposed to “relevant” search results, but we know them when we see them. Of the 97 queries, only two produced a valuable first-result.
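
Once criteria exist, estimates like these could be computed from hand-labeled judgments. The following is a minimal sketch, assuming each query's top-ten results have been marked relevant or not by a reviewer; the function names and the sample labels are invented for illustration and are not the data from this analysis.

```python
def precision_at_k(judgments, k=10):
    """Fraction of the first k results that were judged relevant (True)."""
    top = judgments[:k]
    return sum(top) / len(top) if top else 0.0


def first_result_hits(judgments_by_query):
    """Number of queries whose first result was judged relevant."""
    return sum(1 for j in judgments_by_query.values() if j and j[0])


# Invented judgments for two made-up queries; True means "judged relevant".
labels = {
    "query one": [True] + [False] * 9,
    "query two": [False, False, True] + [False] * 7,
}

for query, judged in labels.items():
    print(query, precision_at_k(judged))
print("relevant first results:", first_result_hits(labels))
```

The same functions would work for “valuable” judgments; only the labeling criteria change.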

An enterprise search engine should provide reporting of this nature automatically, regularly, and with much more robust data (like click-through rates for different results, detailed session data, etc.). The existing search engine does not provide this kind of reporting, and the absence of this information diminishes our ability to make sound information architecture decisions. To paraphrase Samuel Johnson: it is not only dull itself, it is the cause of dullness in others.
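
As a rough idea of what such reporting involves, click-through rate by result position can be derived from a simple event log. This is only a sketch under assumptions: the log format (one `(position, clicked)` pair per result shown) and the function name `click_through_by_position` are hypothetical, and the sample events are made up.

```python
from collections import defaultdict


def click_through_by_position(events):
    """Return {position: clicks / impressions} from (position, clicked) pairs."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for position, was_clicked in events:
        shown[position] += 1
        if was_clicked:
            clicked[position] += 1
    # Every position in `shown` has at least one impression, so no division by zero.
    return {pos: clicked[pos] / shown[pos] for pos in sorted(shown)}


# Made-up events: result position and whether the visitor clicked it.
events = [(1, True), (1, False), (2, False), (2, False), (3, True)]
print(click_through_by_position(events))
```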

What I expected to encounter was a way to address customer complaints through improved metadata and other SEO techniques to compensate for a search engine that doesn’t dig deeply enough. What I discovered is that the current search engine doesn’t merely produce worthless results: it willfully and flagrantly leads site visitors astray. A piquant example of the insurmountable distance between visitors’ expectations and this engine’s results is the search for ‘contact’, the 14th most frequent query and, of the 97, the one with the most results (157,508). Its first 25 results include neither an overall House directory (which doesn’t exist) nor contact information of any kind for any House office. Instead, they feature testimony by staff of 1-800-CONTACTS from a 2002 Energy and Commerce Committee hearing, PDFs from the Foreign Affairs Committee (whoever ‘Dianne’ is should be identified and consulted, since she apparently also knows the secret to breaking into the top search results), and the 9/11 Commission report at #24.

I believe not only that this wisp of foul data crystallizes the argument to replace the existing search engine but also that ceasing its use immediately (removing search from House.gov until a replacement is implemented) would be an improvement and deserves serious consideration. Providing a search form is an affordance of goodwill and a best practice; attaching that form to this search engine is not merely indifferent but malicious. How this search engine was ever considered good enough for government work is an enigma, but that a half-decade of consideration has passed on how best to end its miserable existence is contemptible.

If my candor seems excessive, I recommend attempting the same 97 searches I performed; the mounting frustration from each scatologically defective set of results all but led to a 98th search for a long drink and a short firearm. That site visitors give up on House.gov after one search is my hope. My recommendation is to spare them the experience.

This analysis is an early stage in a planned report of information architecture recommendations for House.gov and Member sites — I intend to parse this data further and research site traffic patterns to draw more conclusions and detailed recommendations. In the meantime, I’m looking forward to your feedback. Thanks for reading.

The preceding email was sent at 02:39 -0400 to all Web Solutions staff.

Just replacing the search technology may not solve the search issues. It sounds like you need a search engine program, a content strategy and a publishing environment that works with both. Not a small project.

You may already know what you need; I just wanted to warn you that there are no silver bullets. First do the hard work of identifying the right content for the top queries, then decide if a new engine is needed or if the existing one is misconfigured, mismanaged or out of date.

I sent this memo and originally posted it in September 2007. The memo advanced a business case for a new search engine at the House, which was originally budgeted as a Google Mini but implemented as a Google Custom Search at the beginning of 2009. There are major issues throughout the House’s web program (the least of which at this point is the lack of a cohesive web program); they are among the reasons I resigned from the House in May 2009. The search engine was misconfigured, mismanaged, and out-of-date. Developing a search engine program, content strategy, and a publishing environment was not a small project; however, it was also not broadly supported, and it was a source of intense frustration.

While the issues here are hardly unique to this branch of the American government, they can be addressed. This particular memo was brash, but site search analytics yielded positive results.