August 9, 2015

Recently, I was tasked with writing a relatively complex data migration script. The script connects to a MySQL database, queries existing data, and then inserts it into a destination schema. Doing this in Bash would be error prone and quite hard to test. A more modern, expressive language such as Ruby, Scala, or Groovy provides a better solution. We opted for Groovy because some of our team members have a Java background, so there is less friction when it comes to maintenance. This blog post shows how to set up the basic structure of a Groovy script, with Spock for unit testing and Gradle for building.

Groovy CLI

Firstly, we set up a basic script structure with Groovy CLI. Script: data-fix.groovy

The above code can be viewed in this GitHub commit. Next up, we will set up unit testing.
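As a rough illustration (not the code from the commit), a minimal skeleton built on CliBuilder might look like the sketch below. The option names (-h, -H, -u) are hypothetical, and it assumes Groovy 2.5 or later, where CliBuilder is available as groovy.cli.picocli.CliBuilder; the Groovy 2.4 of the time auto-imported groovy.util.CliBuilder with the same option syntax.

```groovy
// Sketch of a data-fix.groovy skeleton; option names are hypothetical.
// Assumes Groovy 2.5+ (picocli-backed CliBuilder).
import groovy.cli.picocli.CliBuilder

// Declare the command-line options the script accepts
def buildCli() {
    def cli = new CliBuilder(usage: 'groovy data-fix.groovy [options]')
    cli.h(longOpt: 'help', 'Show usage information')
    cli.H(longOpt: 'host', args: 1, argName: 'host', 'MySQL host')
    cli.u(longOpt: 'user', args: 1, argName: 'user', 'MySQL user')
    cli
}

def cli = buildCli()
def options = cli.parse(args)   // null if the arguments could not be parsed

if (options == null || options.h) {
    cli.usage()
} else {
    // The migration steps (query the source, insert into the destination) would go here
    println "Migrating data from ${options.host} as ${options.u}"
}
```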

Unit Testing with Spock and Gradle

Spock provides a nice testing framework. I am a fan of its easy mocking syntax and its BDD (Behaviour-Driven Development) style of "given, when, then" blocks. One way to set up Spock for a Groovy project is to use Gradle for building and dependency management.

By default, Gradle's Groovy plugin expects a specific directory structure: src/main/groovy for source code and src/test/groovy for tests. (You can change this structure as described here.) We will move our code into this structure and create an empty test file, ProcessorSpec.groovy, under the src/test/groovy directory.
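A minimal build.gradle for this layout might look like the sketch below. The Groovy and Spock versions are assumptions picked from around the time of writing; newer Gradle releases replace the compile and testCompile configurations with implementation and testImplementation.

```groovy
// build.gradle (sketch): compiles src/main/groovy and runs Spock specs from src/test/groovy
apply plugin: 'groovy'

repositories {
    mavenCentral()
}

dependencies {
    // Assumed versions; Spock artifacts are built against a specific Groovy line (here 2.4)
    compile 'org.codehaus.groovy:groovy-all:2.4.4'
    testCompile 'org.spockframework:spock-core:1.0-groovy-2.4'
}
```

With this in place, gradle test compiles the script and runs any specs under src/test/groovy.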

The Gradle wrapper is a great way to ensure the build runs the same way across different machines. On a machine that does not have Gradle installed, the wrapper first downloads Gradle and then executes the build task. We can set up the Gradle wrapper with this easy command:

Adding libraries

With the basic skeleton done, the next step is to add logic to our script. The script will connect to a MySQL database, so we will add mysql-connector as a dependency. In addition, I like to add logging statements to the flow to help debug the script. We will use @Grab to add these dependencies to data-fix.groovy.

So what went wrong? @Grab uses Grape to manage dependencies, while Gradle has its own dependency management. At this point, we have two options: use Gradle to manage all dependencies and execute the script via Gradle, or mix and match Gradle and Grape (Grape at runtime, Gradle only for testing). Both options have their own merits. I prefer the simplicity of Grape at runtime, so I will continue with the latter.

This approach violates DRY (Don't Repeat Yourself), as dependencies are defined in two places: in the @Grab annotations and in the Gradle dependencies. If you would rather invoke the Groovy script from a Gradle task, have a look at mrhaki's blog post; personally, I found passing the script's command-line options as Gradle properties a bit awkward.
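A sketch of the kind of @Grab header described above. The coordinates and versions are assumptions, and Logback via SLF4J stands in for whichever logging library you prefer. @GrabConfig(systemClassLoader = true) is commonly needed so that java.sql.DriverManager can see a JDBC driver loaded through Grape. With this approach, the same coordinates also have to be declared in build.gradle, which is the DRY trade-off mentioned above.

```groovy
// Top of data-fix.groovy (sketch); coordinates and versions are assumptions.
// systemClassLoader = true lets java.sql.DriverManager find the grabbed JDBC driver.
@GrabConfig(systemClassLoader = true)
@Grapes([
    @Grab('mysql:mysql-connector-java:5.1.36'),
    @Grab('ch.qos.logback:logback-classic:1.1.3')   // pulls in slf4j-api transitively
])
import groovy.sql.Sql
import org.slf4j.LoggerFactory

def log = LoggerFactory.getLogger('data-fix')
log.info('Dependencies resolved via Grape')

// Later in the script, a connection could be opened with, for example:
// def sql = Sql.newInstance(url, user, password, 'com.mysql.jdbc.Driver')
```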

Adding more logic and tests

Simple logic - default to localhost if host is not provided

Now that we have the structure in place, we can add more logic to our script. An easy first one: set the host to the value provided as a parameter, otherwise default to 'localhost'.
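A minimal sketch of that rule as a hypothetical HostResolver helper. It relies on Groovy truth and the Elvis operator (?:), so a missing option value (null or false) or an empty string falls back to 'localhost'.

```groovy
// Hypothetical helper implementing the "default to localhost" rule
class HostResolver {
    static final String DEFAULT_HOST = 'localhost'

    // Groovy truth treats null, false and '' as false,
    // so the Elvis operator falls back to DEFAULT_HOST for any of them
    static String resolve(host) {
        host ?: DEFAULT_HOST
    }
}
```

In ProcessorSpec, the same cases would fit naturally into a Spock data-driven feature method, with the inputs and expected hosts listed in a where: table.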

Summary

As you can see, Groovy is easy to work with and powerful as a scripting language. Combined with unit tests, it gives you confidence that your script does the right thing and is production ready. I truly believe you should unit test everything, including scripts, and the setup above achieves just that.

January 21, 2015

At work, we have the pleasure of using MongoDB. Coming from the SQL world of WHERE clauses and cryptic JOINs, I find MongoDB's JavaScript query files a joy to work with. In this post, I will share some example queries.
As background, we implemented a feature where users can endorse terms, similar to LinkedIn endorsements. There are two concepts: terms and endorsements. A term is simply a piece of text. An endorsement is an association between a term, a user, and the listing that the user is currently endorsing. Here's what they look like:

1. Updating using regular expressions

When we initially went live, there were a number of terms starting with "Authentic", such as "Authentic Italian" and "Authentic Chinese". After a while, we saw little uptake for those terms, so we decided to drop the word "Authentic". The good news is that Mongo supports regular expressions, so the update is quite easy: find all terms starting with "Authentic" and replace that word with an empty string.
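To make the logic concrete, here is a sketch of the same transformation applied to an in-memory list of maps rather than a live collection; the text field name is an assumption. In the mongo shell, the selection step would be a query with a regex such as /^Authentic/, followed by an update of each matching document.

```groovy
// In-memory stand-in for the terms collection; the 'text' field name is hypothetical
def terms = [
    [_id: 1, text: 'Authentic Italian'],
    [_id: 2, text: 'Authentic Chinese'],
    [_id: 3, text: 'Great Service']
]

// Select terms whose text starts with "Authentic" (the same regex a Mongo query would use)...
def matching = terms.findAll { it.text =~ /^Authentic/ }

// ...then replace the leading word, plus any whitespace after it, with an empty string
matching.each { it.text = it.text.replaceFirst(/^Authentic\s*/, '') }
```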

6. Update the DBRef

Scenario: an existing term becomes obsolete, so we want to re-point all endorsements referencing it to a different term. For example, the term "Great Service" is obsolete and should be deleted, and all of its existing endorsements should be updated to "Great or Professional Service".
First, find the terms "Great Service" and "Great or Professional Service", and note down the ObjectId of each.
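The re-pointing step can be sketched on in-memory data as well. Here the two ObjectIds are replaced by hypothetical string ids, and each endorsement's term field mimics a DBRef with $ref and $id keys; the user and listing field names are assumptions. Against MongoDB, the matching condition would be a query on term.$id with the obsolete term's ObjectId.

```groovy
// Hypothetical stand-ins for the two ObjectIds noted down above
def obsoleteTermId = 'oid-great-service'
def replacementTermId = 'oid-great-or-professional-service'

// Endorsements whose 'term' field mimics a DBRef; other field names are hypothetical
def endorsements = [
    [user: 'u1', listing: 'l1', term: ['$ref': 'term', '$id': obsoleteTermId]],
    [user: 'u2', listing: 'l1', term: ['$ref': 'term', '$id': obsoleteTermId]],
    [user: 'u3', listing: 'l2', term: ['$ref': 'term', '$id': replacementTermId]]
]

// Re-point every endorsement that references the obsolete term
def updated = endorsements.findAll { it.term['$id'] == obsoleteTermId }
updated.each { it.term['$id'] = replacementTermId }
```

Once no endorsement references the obsolete term any more, the term itself can be deleted.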

7. Unique grouping from derived values

Scenario: we want to find the unique users contributing on each day. We could run a distinct query on the userId and _created fields; however, _created is a timestamp, not a date. We first need to derive a yyyy-MM-dd day from the _created field and then perform the distinct query on it. Luckily, MongoDB's group() function supports exactly this through its keyf option, a function that computes the grouping key for each document.
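The following sketch mirrors what keyf does, on in-memory data: derive a yyyy-MM-dd key from each _created timestamp, group by it, and keep the distinct userId values per day. Computing days in UTC is an assumption. Note that group() was deprecated in MongoDB 3.4 and removed in 4.2; on current versions, the aggregation framework covers the same need.

```groovy
import java.time.Instant
import java.time.ZoneOffset

// In-memory stand-in for endorsement documents (sample data is hypothetical)
def endorsements = [
    [userId: 'u1', _created: Instant.parse('2015-01-20T09:15:00Z')],
    [userId: 'u1', _created: Instant.parse('2015-01-20T17:40:00Z')],
    [userId: 'u2', _created: Instant.parse('2015-01-20T11:05:00Z')],
    [userId: 'u1', _created: Instant.parse('2015-01-21T08:00:00Z')]
]

// keyf equivalent: derive a yyyy-MM-dd day from the timestamp (UTC assumed)
def dayOf = { Instant ts -> ts.atZone(ZoneOffset.UTC).toLocalDate().toString() }

// Group documents by derived day, then keep the distinct userIds within each day
def usersPerDay = endorsements
    .groupBy { dayOf(it._created) }
    .collectEntries { day, docs -> [(day): docs*.userId.unique()] }

// Number of unique contributors per day
def uniqueUserCounts = usersPerDay.collectEntries { day, users -> [(day): users.size()] }
```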