A computational biologist's personal views on new technologies & publications on genomics & proteomics and their impact on drug discovery

Thursday, January 28, 2010

A little more Scala

I can't believe how thrilled I was to get a run-time error today, because it was the first sign I had gotten past the Scala roadblock I mentioned in my previous post. It would have been nicer for the case to just work, but apparently my SAM file was incomplete or corrupt. Moments later, though, it ran correctly on a BAM file. For better or worse, I deserve nearly no credit for this step forward -- Mr. Google found me a key code example.

The problem I faced is that I have a Java class (from the Picard library for reading alignment data in SAM/BAM format) which hands out its records through an iterator. My first few attempts to guess the Scala syntax for walking that iterator just didn't work, so it was off to Google.
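As a rough sketch of the kind of loop that works, here is an explicit hasNext/next walk over a Java iterator. To keep it runnable without Picard on the classpath, a plain java.util.ArrayList of strings stands in for the SAM/BAM reader; that substitution, along with the names IteratorLoop and readAll, is an assumption for illustration and not the code from the original example.

```scala
import java.util.{ArrayList, Iterator => JIterator}

object IteratorLoop {
  // Walk any Java iterator with an explicit hasNext/next loop,
  // collecting what it yields in the order it was seen.
  def readAll(iter: JIterator[String]): List[String] = {
    var seen = List.empty[String]
    while (iter.hasNext) {
      val rec = iter.next()
      seen = rec :: seen
    }
    seen.reverse
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the alignment reader (assumption): a list of read names.
    val records = new ArrayList[String]()
    records.add("read1")
    records.add("read2")
    readAll(records.iterator()).foreach(println)
  }
}
```

With the real library, the iterator would presumably come from the reader object instead of an ArrayList, but the loop itself should look the same.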

Ah, sweet success. But while that's a step forward, it doesn't really play with anything novel that Scala lends me. The example I found it in was actually implementing something richer, which I then borrowed (same imports as before).

First, I define a class which wraps an iterator and defines a foreach method:
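A minimal sketch of such a wrapper, assuming a generic java.util.Iterator in place of the Picard one. The class name IteratorWrapper and the demo object are hypothetical, chosen only for this illustration.

```scala
import java.util.{Iterator => JIterator}

// Hypothetical name: wraps a Java iterator and supplies the foreach
// method that Scala's for loop looks for.
class IteratorWrapper[A](iter: JIterator[A]) {
  def foreach(f: A => Unit): Unit = {
    while (iter.hasNext) f(iter.next())
  }
}

object IteratorWrapperDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in data (assumption) instead of alignment records.
    val names = java.util.Arrays.asList("read1", "read2", "read3")
    // Explicit wrapping; no automatic conversion is involved here.
    new IteratorWrapper(names.iterator()).foreach(println)
  }
}
```

At this point the wrapping still has to be spelled out by hand at every use.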

Second, within the body of my object, I define a rule which allows iterators to be automatically converted to my wrapper object. Now, this sounds powerfully dangerous (and vice versa). A key constraint is that Scala won't do this if there is any ambiguity -- if there is more than one legal conversion to choose from, Scala refuses to pick one. Finally, I rewrite the loop using the foreach construct.
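Put together, the conversion rule and the rewritten loop might look roughly like the sketch below. The names ForeachDemo, iteratorToWrapper and collect are hypothetical, a list of strings again stands in for the alignment records, and the scala.language.implicitConversions import is an assumption about the compiler in use: recent Scala 2 compilers want it, while older ones neither need nor recognize it.

```scala
import scala.language.implicitConversions
import java.util.{Iterator => JIterator}
import scala.collection.mutable.ListBuffer

object ForeachDemo {
  // Wrapper repeated here so the example stands alone.
  class IteratorWrapper[A](iter: JIterator[A]) {
    def foreach(f: A => Unit): Unit = {
      while (iter.hasNext) f(iter.next())
    }
  }

  // The conversion rule: any Java iterator may be promoted to the wrapper
  // when a foreach method is needed and the iterator itself has none.
  implicit def iteratorToWrapper[A](iter: JIterator[A]): IteratorWrapper[A] =
    new IteratorWrapper[A](iter)

  // The for loop desugars to names.foreach(...), which triggers the conversion.
  def collect(names: JIterator[String]): List[String] = {
    val out = ListBuffer[String]()
    for (name <- names) out += name
    out.toList
  }

  def main(args: Array[String]): Unit = {
    val reads = java.util.Arrays.asList("read1", "read2")
    for (name <- reads.iterator()) println(name)
  }
}
```

The conversion only applies inside the scope where iteratorToWrapper is visible, which is part of what keeps the feature from being quite as dangerous as it sounds.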

Is this really better? Well, I think so -- for me. The code is terse but still clear. This also saves a lot of looking up of standard wordy idioms -- for some reason I never quite locked in the standard read-lines-one-at-a-time loop in C# and always had to copy an example.

You can take some of this a bit far in Scala -- the syntax allows a lot of flexibility and some of the examples in the O'Reilly book are almost scary. I probably once would have been inspired to write my own domain specific language within Scala, but for now I'll pass.

Am I taking a performance hit with this? Good question -- I'm sort of trusting that the Scala compiler is smart enough to treat this all as syntactic sugar, but for most of what I do performance is well behind readability and ease of coding & maintenance. Well, until the code becomes painfully slow.

I don't have them in front of me, but I can think of examples from back at Codon where I wanted to treat something like an iterator -- especially a strongly typed one. C# does let you use foreach loops on anything which implements the IEnumerable interface, but it can get tedious to wrap everything up when using a library class which I think should implement IEnumerable but whose designer didn't make it so.

I still have some playing to do, but maybe soon I'll put something together that I didn't have code to do previously. That would be a serious milestone.

About Me

Dr. Robison spent 10 years at Millennium Pharmaceuticals working with various genomics & proteomics technologies & working on multiple teams attempting to apply these throughout the drug discovery process. He spent 2 years at Codon Devices working on a variety of protein & metabolic engineering projects as well as monitoring a high-throughput gene synthesis facility. After a brief bit of consulting, he rejoined the cancer drug discovery field at Infinity Pharmaceuticals in May 2009. In September 2011 he joined Warp Drive Bio, a startup applying genomics to natural product drug discovery. Other recurring characters in this blog are his loyal Shih Tzu Amanda and his teenaged son alias TNG (The Next Generation).
Dr. Robison can be reached via his Gmail account, keith.e.robison@gmail.com
You can also follow him on Twitter as @OmicsOmicsBlog.