It is possible to get the basic syntax highlighting for Ruby files in PhpStorm using the TextMate bundles support plug-in. It’s already included with Webstorm and you don’t need to install it, just make sure it’s enabled in Settings | Plugins.

For older versions of Webstorm TextMate Bundles support will not recognize *.rb files as supported by this bundle. To fix this problem open the file ‘Ruby.tmbundle\Syntaxes\Ruby.plist’ in some text editor, find ‘<key>fileTypes</key>’ section, add ‘<string>rb</string>’ under ‘<array>’

A demonstration, with repeatable steps, of how to quickly fire-up a Hadoop cluster on Amazon EC2, load data onto the HDFS (Hadoop Distributed File-System), write map-reduce scripts in Ruby and use them to run a map-reduce job on your Hadoop cluster. You will not need to ssh into the cluster, as all tasks are run from your local machine. Below I am using my MacBook Pro as my local machine, but the steps I have provided should be reproducible on other platforms running bash and Java.

Fire-Up Your Hadoop Cluster

I choose the Cloudera distribution of Hadoop which is still 100% Apache licensed, but has some additional benefits. One of these benefits is that it is released by Doug Cutting, who started Hadoop and drove it’s development at Yahoo! He also started Lucene, which is another of my favourite Apache Projects, so I have good faith that he knows what he is doing. Another benefit, as you will see, is that it is simple to fire-up a Hadoop cluster……

I love reading fantasy, I’ve even written about some of my favourite fantasy series on this blog. One of the things that I have always found interesting about fantasy literature (besides unworkable economies and unsustainable population densities – I tend to over-analyse when I read :)) was how they come up with the names for all the characters. Large fantasy series often contain hundreds of characters – that’s a lot of names. This line of though naturally led me to think of what I would do if I ever needed to make up a bunch of names and being the software developer that I am the answer was naturally – get my computer to make up the names for me.

If you do a search around the web for name generators you get quite a few results, unfortunately most of those don’t tell you how they do what they do and even that is besides the point since I wasn’t really happy with the results that most of these name generators produce. Either the results are way too random (how about 6 consonants in a row) or they are not random enough with clear traces of human intervention (i.e. choosing from a list of pre-made names). Then I found Chris Pounds excellent name generator page. One of the things that he has on this page is his language confluxer (lc) script so for my first attempt at writing a name generator I decided to basically take his script and clean it up a little bit. There were two reasons for this:

* he uses a pretty clever algorithm for his name generator, it is completely data driven and is therefore able to avoid the 6 consonants/vowels in a row issue while producing output that sounds similar to the data it is based on

* it was a yucky Perl script and nobody wants to work with that (except Perl programmers), so I felt it was my duty to make it a little bit nicer and since I’ve been playing around with Ruby lately, well you get the picture 🙂

The Name Generator Algorithm

As I said the script is completely data driven in that it takes a list of words (names in our case) as input and uses these to produce a bunch of randomised names that hopefully sound similar to the original input. It does the following:

* produces a list of starting letter pairs from the input data (all our names will start with one of these pairs)

* produces a map of which letters can follow which other letters based on the input data

* generates words/names by randomly selecting a starting pair and then appending to the word by randomly choosing a letter from the map based on what the last letter in our new word currently is

* this continues until the word length falls into a particular range (this range is hard-coded in the script)

There are a few more little twists that make this whole thing function but that is the essence of the algorithm.