Adding search to a Middleman blog

slightly simpler than Google

We’re going to build a simple, naive search for Middleman blogs: generate a search index at build time, then use that index to run the search itself on the client side.

Building the index

When you type something into Google, it doesn’t go and hit every page on the internet to check for a match. It doesn’t even look at every page it has squirreled away somewhere in the Googleplex. Instead it consults an index of the documents out there, and the index points to the page information. (We all know it’s a lot more complicated than that really, but run with it.)

The first thing we’re going to do is create a very simple version of this index for your site. It lives in a file called source/article.index.json.erb.

Go through all of the articles.

Add metadata for the article into the master map.

Find all of the words in the article by stripping out the HTML tags, lowercasing the text, and splitting it on whitespace.
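The real index is an ERB template that Middleman renders at build time. To make the three steps concrete, here is the same algorithm sketched in JavaScript. The input shape ({ url, title, date, html }) and the output shape are assumptions: an `articles` map keyed by URL (matching the `index.articles[url]` lookup in the search code later on) plus a hypothetical `words` map from each word to the URLs that contain it. `buildArticleIndex` is an illustrative name, not part of Middleman.

```javascript
// Illustrative sketch of the index-building steps (not the ERB template itself).
// Assumed input: [{ url, title, date, html }, ...]
// Assumed output: { articles: { url: metadata }, words: { word: [url, ...] } }
function buildArticleIndex(articles) {
  const index = { articles: {}, words: {} };

  for (const article of articles) {
    // Add metadata for the article into the master map, keyed by URL.
    index.articles[article.url] = { title: article.title, date: article.date };

    // Strip HTML tags (replacing them with spaces so "</p><p>" doesn't glue
    // words together), lowercase, and split on whitespace.
    const words = article.html
      .replace(/<[^>]*>/g, " ")
      .toLowerCase()
      .split(/\s+/)
      .filter((w) => w.length > 0);

    // Record each distinct word once per article.
    for (const word of new Set(words)) {
      if (!Object.prototype.hasOwnProperty.call(index.words, word)) {
        index.words[word] = [];
      }
      index.words[word].push(article.url);
    }
  }
  return index;
}
```

In keeping with the naive approach, punctuation stays attached to words, so "search," and "search" end up as separate entries.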

Loading and querying the index

OK, let’s build this from the ground up. All of this goes into application.js. First we’ll create a function that loads the index only when we need it. We’re going to use a promise here, so if multiple requests come in at the same time only one will go to the server.

Now let’s build a simple search. This is a little complicated, since we need to compute the intersection of the results if the user types in multiple words. Here’s what’s happening:
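A minimal sketch of such a loader, assuming Middleman serves the rendered index at /article.index.json. `makeArticleIndexLoader` and `INDEX_URL` are hypothetical names. The fetch function is passed in, so the same code works with jQuery’s `$.getJSON` (whose result supports the `.done()` call the search code uses) or with a native promise:

```javascript
// Hypothetical path: Middleman renders source/article.index.json.erb to this URL.
const INDEX_URL = "/article.index.json";

// Returns an article_index() function that requests the index at most once.
// Every caller gets the same pending request, so simultaneous searches
// share a single trip to the server.
function makeArticleIndexLoader(fetchJson) {
  let pending = null;
  return function article_index() {
    if (pending === null) {
      const request = fetchJson(INDEX_URL);
      pending = request;
      // If the request fails, forget it so the next search can retry.
      // .then(null, fn) works on both native promises and jQuery Deferreds.
      request.then(null, function () {
        if (pending === request) pending = null;
      });
    }
    return pending;
  };
}

// In the browser, with jQuery loaded:
// var article_index = makeArticleIndexLoader(function (url) { return $.getJSON(url); });
```

Caching the promise rather than the parsed JSON is what lets a second caller attach to a request that is still in flight.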

We create a promise, since we may need to wait for the index to load.

We split the search term into multiple words.

We collect the results of the match_index function for each word.

We compute the intersection of all the results.

We look up the metadata based on the URL.

We resolve the promise with the results.

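    var find_article = function(search) {
      var search_results = $.Deferred();

      article_index().done(function(index) {
        // Split the search on whitespace. Trim both ends first, so a leading
        // space doesn't produce an empty first word.
        var words = search.toLowerCase().replace(/^\s+|\s+$/g, '').split(/\s+/);

        // Look up the matches for each word.
        // Note: $.map flattens arrays returned by its callback, so use $.each and push.
        var full_results = [];
        $.each(words, function(i, word) {
          full_results.push(match_index(word, index));
        });

        var urls = full_results[0];

        // If there are multiple words, keep only the URLs that matched every word.
        // This assumes match_index returns each URL at most once per word.
        if (full_results.length > 1) {
          var url_counts = {};
          $.each(full_results, function(i, set) {
            $.each(set, function(j, url) {
              url_counts[url] = (url_counts[url] || 0) + 1;
            });
          });

          urls = [];
          $.each(url_counts, function(url, count) {
            if (count === full_results.length) {
              urls.push(url);
            }
          });
        }

        // Pull in the metadata for each matching URL
        var results = {};
        $.each(urls, function(i, url) {
          results[url] = index.articles[url];
        });

        search_results.resolve(results);
      }).fail(function(err) {
        // If the index can't be loaded, reject instead of leaving the search pending
        search_results.reject(err);
      });

      return search_results;
    };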
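The search code calls match_index(word, index) to get the URLs for a single word. One possible version follows. It assumes the index carries a hypothetical `words` map from each indexed word to the URLs that contain it, and it chooses prefix matching so partial words match while the user is still typing; an exact lookup (`index.words[word] || []`) would be the simpler alternative.

```javascript
// Hypothetical match_index: URLs of every article containing a word that
// starts with `word`. Assumes index.words maps word -> array of URLs.
function match_index(word, index) {
  if (word === "") return [];

  // A Set keeps each URL once per search word, which the intersection
  // step in find_article relies on when it counts matches per URL.
  const urls = new Set();
  for (const [indexed, pages] of Object.entries(index.words)) {
    if (indexed.startsWith(word)) {
      for (const url of pages) urls.add(url);
    }
  }
  return Array.from(urls);
}
```

Returning each URL only once matters: if "sea" matched both "search" and "searching" on the same page and the URL were returned twice, the intersection count for that page would be inflated.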

Wiring it up

First we need to call our code whenever the user types something into the text area.
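A minimal sketch of that wiring, assuming the metadata for each article includes a `title`. `wireSearch`, `renderResults` and `escapeHtml` are hypothetical helpers. The search function can be anything thenable, such as the jQuery Deferred returned by find_article. Responses to out-of-date queries are ignored, so a slow early search can’t overwrite newer results.

```javascript
// Escape text before inserting it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn the { url: metadata } map from the search into a list of links.
// Assumes each metadata object has a `title`.
function renderResults(results) {
  const items = Object.entries(results).map(
    ([url, meta]) => `<li><a href="${escapeHtml(url)}">${escapeHtml(meta.title)}</a></li>`
  );
  return items.length ? `<ul>${items.join("")}</ul>` : "<p>No matches.</p>";
}

// Run `search` on every input event and render the results into `output`.
function wireSearch(input, output, search) {
  let latest = 0; // id of the most recent query
  input.addEventListener("input", () => {
    const query = input.value.trim();
    const ticket = ++latest;
    if (query === "") {
      output.innerHTML = "";
      return;
    }
    Promise.resolve(search(query)).then((results) => {
      // Only render if no newer query has been typed since this one started.
      if (ticket === latest) output.innerHTML = renderResults(results);
    });
  });
}

// In the browser (element ids are hypothetical):
// wireSearch(document.querySelector("#search"), document.querySelector("#search-results"), find_article);
```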
