Doron's .NET Space

October 7, 2014

Our team has been developing an NLP library, used by Ginger Page, using Scala. Mostly, things have been really great. Scala allows us to move very fast, and to develop the app’s NLP needs in a way that Java never could have. But, as with all things in life, nothing good ever comes free. We’ve noticed that our app is suffering from many GC interruptions, which hurt performance. I went to investigate, and this is my story.

Benchmark on a mobile device

Well, if you’re like us and developing a Scala library to be used with an existing app, it is useful to benchmark your lib externally. Usually, performance bottlenecks that you solve after running a benchmark from the command line will translate to your app, and it is so much easier to benchmark on your PC than on a mobile device. But the word ‘usually’ there is important. The HotSpot JVM and Dalvik (Android’s VM) are quite different beasts, and the memory constraints on your mobile device are also very different.

Therefore, I found it useful to build a standalone app in order to test my library’s performance. It has one button, which I programmed to call my library in different scenarios. It prints out how long it took to run the scenario (Side note: it was surprising to find out that my PC is ~100 times faster than my Nexus 4, which means that every 1 millisecond I measured on my PC is a whopping 100ms on mobile!).
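A minimal sketch of the kind of timing wrapper such a button handler could call. The names PerfTimer and timed are hypothetical and not taken from the actual app; the sketch only assumes that println output is visible in logcat, as described in the next section.

```scala
object PerfTimer {
  // Hypothetical helper: runs `body` once and returns (result, elapsed milliseconds).
  def timed[A](label: String)(body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    // On Android, println output ends up in logcat.
    println(label + " took " + elapsedMs + " ms")
    (result, elapsedMs)
  }
}
```

Each scenario can then be wrapped as PerfTimer.timed("scenario name") { ... }. A single run is noisy, so it is probably worth clicking the button several times before trusting a number.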

Beware of the GC

After I built a demo app, I started clicking my big “TEST PERF” button and watched the printouts in logcat. (You should know that calls to “println” from Scala appear there as Log.i, so it is useful to print out timing information from within your Scala code.) I immediately noticed many messages, similar to this:

This means that the GC kicked in, and paused my app for 181ms, which is A LOT. But hey, you may ask, it seems the GC points out that my heap is 4688K, which is a lot less than my app’s heap size limit! Well, Dalvik doesn’t allocate a very big heap for your app initially. It grows it slowly. And before it grows it, it will attempt to free up memory. Hence, the above depressing message. How do we deal with this? Well, the best thing to do is try to allocate less memory! Let’s try that then.

Analyzing your heap

Using the Android DDMS, or in my case, the built-in IntelliJ plugin for Android, you can dump your device’s heap. I did so after clicking the “TEST PERF” button a couple of times, but before a GC was triggered. This allows me to check out the temporary objects I created in a profiler. I’m using the YourKit profiler, which is a paid one, but I believe the resulting hprof file can be viewed in any decent profiler. IntelliJ helpfully converts the file to the standard format; if you’re a DDMS user, you have to run the hprof-conv command-line tool yourself. I started by checking out the objects which are unreachable from the GC roots: these are the temporary objects that the GC will collect, and my main problem. I came up with a few conclusions, which I think will be useful to any Scala on Android developer.

The Scala tax

Scala loves to create short lived objects. Android and Dalvik don’t like these so much. But not all is lost, Scala is still extremely usable under these conditions, you just have to be careful in how you use it.

Vectors – careful with those

Vector is the default IndexedSeq implementation, and it is a great data structure, in fact our “go to” data structure usually. But you need to be careful with it:

The Vector will allocate an Object[32] array to hold a single item. That’s a waste on our low memory machine. Consider using Arrays instead of Vectors where appropriate to overcome this issue.

Using IndexedSeq() to create an empty vector will create a redundant Object[32] array. You should use IndexedSeq.empty or Vector.empty instead.

Vector.tail is more costly than List.tail, as it creates a new Vector. If you have a recursive function that uses .head and .tail (a common pattern in Scala and other functional languages), consider using a List, or access your vector using the indexer (e.g. myVector(5)).
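To illustrate the last point, here is a small sketch of the same recursion written both ways. The object and method names are made up for this example, and the sum is just a stand-in for whatever your recursive function really computes.

```scala
import scala.annotation.tailrec

object VectorWalk {
  // Head/tail style: every step calls .tail, which builds a new Vector.
  @tailrec
  def sumWithTail(v: Vector[Int], acc: Int = 0): Int =
    if (v.isEmpty) acc else sumWithTail(v.tail, acc + v.head)

  // Index style: same result, but no intermediate vectors are created.
  def sumByIndex(v: Vector[Int]): Int = {
    @tailrec
    def loop(i: Int, acc: Int): Int =
      if (i >= v.length) acc else loop(i + 1, acc + v(i))
    loop(0, 0)
  }
}
```

Both versions return the same value; only the second avoids creating a fresh Vector per element.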

Method chaining – embrace .view

So you have a vector, an array or a list, and you’re doing something like vector.map(…).filter(…).map(…). Bad idea. In this example, you’ll be allocating a temporary vector after each method call. Instead, do a vector.view.map(…).filter(…).map(…).toVector. This will make sure you only allocate one vector, at the end of the chain.
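A minimal sketch of the difference, using made-up data and lambdas. Note that the view itself still allocates a few small wrapper objects, so this is about avoiding the intermediate collections, not about allocating nothing at all.

```scala
object ViewChain {
  val numbers: Vector[Int] = Vector(1, 2, 3, 4, 5, 6)

  // Strict chain: the first map and the filter each build a temporary Vector.
  val strict: Vector[Int] =
    numbers.map(_ * 2).filter(_ % 3 == 0).map(_ + 1)

  // View chain: the steps are applied lazily, and only the final
  // toVector call builds a Vector.
  val viaView: Vector[Int] =
    numbers.view.map(_ * 2).filter(_ % 3 == 0).map(_ + 1).toVector
}
```

Both values end up as Vector(7, 13).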

Boxing is your enemy

Oh man, how I wish the JVM had the CLR’s generics and value types support. Unfortunately, it does not. This means that your beautiful generic function, which can work on any type, will probably cause boxing when used with primitive types. You might see it in your profiler as scala.runtime.BoxesRunTime.boxToInteger(int) or similar calls. Here are some tips on avoiding boxing:

You can try using the @specialized annotation on your generic method or class. It doesn’t work in every case, and in one critical place in my code I had to…

Copy & paste some functions to work directly with an Integer.

Beware of using Option[Int]. It will box your integer. I have a lot of these, and haven’t yet decided the best approach. I might change my code to return -1 as the “None” value, or consider using this alternative Option class.

Some lambda expressions, which are based on classes that are not specialized, will cause your primitive to box. You might want to skip the lambdas in this case.
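As a rough sketch of the “-1 instead of None” idea mentioned above (the function names here are hypothetical, and this is not the code from our library):

```scala
object NoBoxing {
  // Sentinel version: returns -1 for "not found" and allocates nothing.
  def findIndex(xs: Array[Int], target: Int): Int = {
    var i = 0
    while (i < xs.length) {
      if (xs(i) == target) return i
      i += 1
    }
    -1
  }

  // Option version: a hit allocates a Some and boxes the Int inside it.
  def findIndexOpt(xs: Array[Int], target: Int): Option[Int] = {
    val i = findIndex(xs, target)
    if (i >= 0) Some(i) else None
  }
}
```

The sentinel version is less safe, since nothing forces the caller to check for -1, so I would keep it for the hot paths only.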

While instead of for

Scala for loops might not be what you think they are. They are more like C# LINQ expressions than C# for or foreach. They get translated to foreach(), map(), flatMap() and withFilter() calls, with an anonymous function for the loop body. This means that code like:

This can be easily replaced with a while loop that does none of the above:

var i = 0
while (i < s.size) {
  //do stuff
  i += 1
}

Yes, this is uglier, and I don’t recommend doing it everywhere. Just in the places that hurt the most (get called most often).
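For a concrete comparison, here is a sketch with a hypothetical array-summing function written both ways. The for version is rewritten by the compiler into a foreach call on a Range with a closure for the body, which may allocate depending on the Scala version and what the VM can optimize; the while version is a plain loop.

```scala
object LoopStyles {
  // for style: desugars to (0 until xs.length).foreach(i => total += xs(i)).
  def sumWithFor(xs: Array[Int]): Int = {
    var total = 0
    for (i <- 0 until xs.length) total += xs(i)
    total
  }

  // while style: no Range and no closure, just an index variable.
  def sumWithWhile(xs: Array[Int]): Int = {
    var total = 0
    var i = 0
    while (i < xs.length) {
      total += xs(i)
      i += 1
    }
    total
  }
}
```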

Beware of implicit conversions

Sometimes Scala will implicitly wrap your object with another object. Say you’re calling str.indexWhere(c=>c.isUpper). Did you know that your string is being wrapped with a StringOps object, and your char is being wrapped with a RichChar object? This might cause redundant allocations where you least expect them. In one performance-critical place, I ended up replacing an indexWhere call with a static function I created, which skipped both the StringOps conversion and the lambda expression altogether.
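Such a replacement could look roughly like the sketch below. This is not the actual function from our code, and the name indexOfFirstUpper is made up; it only assumes that calling java.lang.Character directly is acceptable for your input.

```scala
object StringScan {
  // Returns the index of the first upper-case char, or -1 if there is none.
  // Uses String.charAt and Character.isUpperCase directly, so no StringOps,
  // RichChar or function object is involved.
  def indexOfFirstUpper(s: String): Int = {
    var i = 0
    while (i < s.length) {
      if (Character.isUpperCase(s.charAt(i))) return i
      i += 1
    }
    -1
  }
}
```

It returns the same index as str.indexWhere(c=>c.isUpper) would, including -1 when nothing matches.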

Scala’s Map gotcha

Scala’s Map.get will wrap your value in an Option. Very convenient, I admit, but sometimes you just can’t afford the extra allocation. This sucks, but I had to use Java’s HashMap in one place to avoid this.
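A minimal sketch of what that can look like, with a made-up map of word counts. The names MapLookup and countOf are hypothetical. Note that java.util.HashMap still stores boxed Integer values; what this avoids is the extra Option object on every lookup.

```scala
import java.util.{HashMap => JHashMap}

object MapLookup {
  val counts = new JHashMap[String, Integer]()
  counts.put("scala", Integer.valueOf(3))

  // Returns the stored count, or `default` when the key is absent.
  def countOf(key: String, default: Int): Int = {
    val boxed = counts.get(key) // null when the key is absent; no Option allocated
    if (boxed == null) default else boxed.intValue()
  }
}
```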

Conclusion & Disclaimer

My suggestions are things you should pay attention to, but you should only apply them where it hurts and after you have noticed them in a profiler. You don’t want to optimize your code prematurely, as you’ll just end up with uglier, not necessarily faster (or more memory efficient) code. Still, you should be very aware of where your code allocates memory when running on Android. Dump your heap, and optimize away. Good luck!