Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

This post originally authored by Reed Law.

When building a web app, there’s often a mountain of data that I need to move from one format to another. Most web application frameworks provide a mechanism for this called migrations. A migration is a snippet of code that creates a change in the database. Migrations are convenient because they can run inside each environment (e.g. development, staging, and production) and create the same results without having to dump and restore databases.

Usually when I think about doing a one-off data migration, I first think of how easy it is to do in Ruby. Programming in Ruby all day makes me think in Ruby, so I almost automatically come up with something like this:

I needed to create cities for each unique city/state in the zones table. So I mapped each unique city plus state, to an array that then created a new city (with the help of a little regex to split the city and state). But with over forty thousand rows, this expression took several minutes before it overheated my laptop. Turns out this same migration can happen in under a second with a carefully crafted SQL statement. Here it is:

Normally, to figure out the proper syntax for a query like the one above, I have to review the PostgreSQL manual. In this case, the time saved was clearly worth the extra effort. The next time you need a migration hopefully you’ll head for the right tool. After a while, you may begin thinking in SQL.