Category: Ruby

Having written a CIF parser, TSDBExplorer, I found that although the parsing code was correct (thanks, Test Driven Development!), it’s hideously slow.

Using exactly the same set of tests and test data, I’m refactoring a chunk of the code. Whilst it’s quite demotivating to find the majority of your unit tests suddenly failing, being able to review each one and make it pass is absolutely great. I’ve definitely found the right way to code.

To anyone who isn’t a full-time programmer and who “just wants to get on and write code” – how many times have you suffered because you’ve “just got on with it” rather than writing tests and making them pass? Give Test Driven Development a try…

Previous projects of mine were met with a “*sigh* I suppose I need to write some tests, but I want to get on with the code”. TSDB Explorer is the first codebase I’ve written where the majority of code existed only after the unit tests. I have to say, it’s many times more robust, and I’m going to continue the trend in TubeHorus.

Part of me is delighted too at rcov and its “Here’s how much of your code is covered by tests”. There’s a definite warm feeling to be had when you’ve covered nearly all your code with tests, although as @bobtfish said – “That just shows you that your bugs are within your unit tests”. But I can always write more of them to make sure I don’t re-introduce bugs.

A project that I’m working on (OK, it’s TSDBExplorer) generates a metric shedload of database rows. For a record that says “This train runs between 01-01-2011 and 31-05-2011 on Mondays – Fridays”, the code generates a timetable for each day. It takes an age to import, and I hope it’s going to be fantastically quick at querying data.

There’s a big downside with ActiveRecord out-of-the-box – it takes a long time to INSERT a record in to a MySQL database. I left some INSERTs going at 9.50am, and they’d just about finished when I got back from the gym five hours later. 1.2 million rows in five hours is shockingly poor.

activerecord-import appears to solve the problem in the least impact way. To group up your INSERTs, you create new instances of a model object – say, Association. You push these in to an array, and then use the new ‘import’ method to do a mass INSERT.

I am quite happy at 62 minutes to insert 1.2 million rows, including processing, considering it’s an activity that only needs to be done twice, maybe three times a year.