How Facebook dug deep within Android to fix its mobile app

And now the company has trained nearly 500 employees to code for mobile OSes.

MENLO PARK, CA—When Facebook's mobile app began misbehaving on an older version of Android in late 2012, Facebook engineers had to dive deep into Android's code to figure out what was causing the mishap. In a whiteboard session today at Facebook headquarters, mobile engineering director Mike Shaver described how Facebook identified a problem in Android itself and created a workaround for its own app so users wouldn't have to suffer.

At the beginning of the session, Shaver explained that he was particularly fond of the Android platform because it is both opportunistic and open. "The whole point of it is that there's no central power that drives it," Shaver said. "That's one of the things that makes Android so exciting to develop for." Facebook continues to see Android as a viable platform not only because of the number of users Android has around the world, but because it allows Facebook to develop mobile applications that interact directly with the operating system. This isn't always easy, however.

In an official blog post, Facebook notes that there are many challenges to overcome when developing for Android, especially on older versions of the platform. But the company isn't referring to fragmentation issues. Android’s runtime engine, the Dalvik Virtual Machine, kept breaking the Facebook application during the app’s revamp this past December. “In order for the program to do anything, it has to work with Dalvik,” explained Shaver at the closed session. To run smoothly, the application needs to be able to use a large number of small “methods,” the elements that define behavior within the application. But on older iterations of Android like Gingerbread, Facebook's app would crash.

Gingerbread was released and written “long before the arc of Android’s success,” as Shaver put it. So it had some limitations baked into Dalvik and couldn't handle some of the capabilities and features that were essential to the Facebook application. Facebook’s official blog post refers to a specific bug the company had filed that detailed failed app installations. But rather than nix the whole Gingerbread user base altogether, Shaver said the team set out to “look under the hood.”

Here's how Facebook explains it in-depth:

During standard installation, a program called "dexopt" runs to prepare your app for the specific phone it's being installed on. Dexopt uses a fixed-size buffer (called the "LinearAlloc" buffer) to store information about all of the methods in your app. Recent versions of Android use an 8 or 16 MB buffer, but Froyo and Gingerbread (versions 2.2 and 2.3) only have 5 MB. Because older versions of Android have a relatively small buffer, our large number of methods was exceeding the buffer size and causing dexopt to crash.

After a bit of panic, we realized that we could work around this problem by breaking our app into multiple dex files, using the technique described here, which focuses on using secondary dex files for extension modules, not core parts of the app.

However, there was no way we could break our app up this way—too many of our classes are accessed directly by the Android framework. Instead, we needed to inject our secondary dex files directly into the system class loader. This isn't normally possible, but we examined the Android source code and used Java reflection to directly modify some of its internal structures. We were certainly glad and grateful that Android is open source—otherwise, this change wouldn't have been possible.
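The reflection step Facebook describes can be illustrated with a small plain-Java analogy. This is only a sketch under stated assumptions: `ToyLoader`, its `dexPaths` field, and `DexInjectionSketch` are hypothetical names invented for illustration, and they do not mirror the real internal structures of Android's class loader, which the article does not detail. The sketch shows only the general idea of using Java reflection to reach into a private field and extend the list of files an object searches.

```java
import java.lang.reflect.Field;
import java.util.Arrays;

// Hypothetical stand-in for a loader that keeps its search path in a
// private field. This is NOT Android's real class loader.
final class ToyLoader {
    private String[] dexPaths = {"classes.dex"};

    // Returns the current search path as a colon-separated string.
    String describe() {
        return String.join(":", dexPaths);
    }
}

final class DexInjectionSketch {
    // Appends extraPath to ToyLoader's private dexPaths array via reflection.
    static void inject(ToyLoader loader, String extraPath) {
        try {
            Field field = ToyLoader.class.getDeclaredField("dexPaths");
            field.setAccessible(true); // bypass the private modifier
            String[] current = (String[]) field.get(loader);
            String[] extended = Arrays.copyOf(current, current.length + 1);
            extended[current.length] = extraPath;
            field.set(loader, extended); // swap in the longer array
        } catch (ReflectiveOperationException e) {
            // Internal layouts can change between versions, so fail loudly.
            throw new IllegalStateException("unexpected loader layout", e);
        }
    }

    public static void main(String[] args) {
        ToyLoader loader = new ToyLoader();
        inject(loader, "secondary.dex");
        System.out.println(loader.describe()); // classes.dex:secondary.dex
    }
}
```

The catch block hints at why commenters call this approach fragile: code that depends on private field names breaks as soon as the platform renames or restructures them.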

Android’s open nature allowed the team to delve deep into Dalvik and figure out where it had set up the buffer. The buffer was capped at 5MB, so Facebook developers moved a pointer to raise that threshold to 8MB. Facebook sent the patch over to Google for its opinion; within a few days the new version of Facebook's app was approved, and it became usable again for Gingerbread users. The problem did not affect newer versions of Android.
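To see why a fixed cap matters, here is a toy model of a fixed-capacity, append-only buffer in Java. It is a sketch, not Dalvik's LinearAlloc code: the class name `LinearBufferSketch` and the 128-byte per-record cost are assumptions chosen purely for illustration, since the article gives no per-method figure. Only the 5MB and 8MB capacities come from the text.

```java
// Toy model of a fixed-capacity, append-only ("linear") allocator.
final class LinearBufferSketch {
    private final long capacityBytes;
    private long usedBytes = 0;

    LinearBufferSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Reserves size bytes; returns false when the request does not fit.
    boolean allocate(long size) {
        if (size < 0 || usedBytes + size > capacityBytes) {
            return false;
        }
        usedBytes += size;
        return true;
    }

    // Counts how many records of recordBytes fit before the buffer is full.
    static long recordsThatFit(long capacityBytes, long recordBytes) {
        if (recordBytes <= 0) {
            throw new IllegalArgumentException("recordBytes must be positive");
        }
        LinearBufferSketch buffer = new LinearBufferSketch(capacityBytes);
        long count = 0;
        while (buffer.allocate(recordBytes)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long recordBytes = 128; // hypothetical per-method cost, not a Dalvik figure
        System.out.println(recordsThatFit(5 * mb, recordBytes)); // 40960
        System.out.println(recordsThatFit(8 * mb, recordBytes)); // 65536
    }
}
```

Under these made-up numbers, an app with 50,000 records would overflow the 5MB buffer but fit in the 8MB one, which is the shape of the problem the article describes.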

None of this would have been possible if it weren't for Facebook pushing its developers to have “mobile empathy”—the company’s plan to get its employees to put mobile first. At the beginning of 2012, Facebook employed only a small group of developers to work on its core applications, which at the time were really just wrappers of the site’s mobile Web experience. “It wasn't good enough,” stressed Shaver. Facebook then started an intensive training program in July 2012 in an effort to bring its developers up to speed on coding natively on both Android and iOS. The five-day intensive session features eight hours per day of training, taught by Big Nerd Ranch, and those who can last the week can start writing code the following Monday. As an aside, any Facebook employee can take the course. So far, 450 people from various backgrounds have completed the training.

Promoted Comments

Instead, we needed to inject our secondary dex files directly into the system class loader. This isn't normally possible, but we examined the Android source code and used Java reflection to directly modify some of its internal structures. We were certainly glad and grateful that Android is open source—otherwise, this change wouldn't have been possible.

Speaking as someone who does mobile development for a living...

Eek.

Speaking as someone who values good software abstraction and code design...

Barf.

Not only is their solution to the problem hacky, but the problem that they had, off the bat, seems to show that they did a BAD job of organizing their code in the first place. I don't really know what their code looks like, but this seems like code bloat to me. Problem: our astronomical number of methods doesn't fit into this buffer. Solution: Use multiple buffers! Perhaps this is a common solution to a known Android limitation, but I'm less inclined to believe that. If this was such a fundamental problem, why didn't Google address it somehow? Also, how come we haven't heard of more app developers doing insane things like this? It seems more likely to me that the app was just poorly architected to begin with.

Instead, we needed to inject our secondary dex files directly into the system class loader. This isn't normally possible, but we examined the Android source code and used Java reflection to directly modify some of its internal structures. We were certainly glad and grateful that Android is open source—otherwise, this change wouldn't have been possible.

Speaking as someone who does mobile development for a living...

Eek.

Speaking as someone who values good software abstraction and code design...

Barf.

Not only is their solution to the problem hacky, but the problem that they had, off the bat, seems to show that they did a BAD job of organizing their code in the first place. I don't really know what their code looks like, but this seems like code bloat to me. Problem: our astronomical number of methods doesn't fit into this buffer. Solution: Use multiple buffers! Perhaps this is a common solution to a known Android limitation, but I'm less inclined to believe that. If this was such a fundamental problem, why didn't Google address it somehow? Also, how come we haven't heard of more app developers doing insane things like this? It seems more likely to me that the app was just poorly architected to begin with.

You didn't read the FB blog entry. The buffer is in the Dalvik VM not the FB app. This is not a common solution to a known, probably little known, Android limitation - the weblog entry states as much. Google did expand the size of the buffer in newer versions of Android (up from 5MB in pre-Honeycomb to 8-16MB).

We haven't heard of app developers doing insane things like this because most developers wouldn't be yelling "Look at me mom!!! I'm lighting a cigarette at the gas station!!!"

99 Reader Comments

I just got my first Android phone this week (Galaxy S III). Not only is the Facebook mobile app a dramatic improvement over the Windows Phone app, it's in my opinion far superior to Facebook on a desktop browser. The team responsible for this app absolutely are deserving of applause.

And yet it still crashes on average two or three times an hour according to how many times I hear my wife yell expletives at her phone (which I verified are directed at the Facebook app in particular).

I wish the Facebook app wouldn't run in the background 24/7. Not only does the process run constantly, it spawns like 4 services - none of which are necessary when you've disabled all possible notifications.

Instead, we needed to inject our secondary dex files directly into the system class loader. This isn't normally possible, but we examined the Android source code and used Java reflection to directly modify some of its internal structures. We were certainly glad and grateful that Android is open source—otherwise, this change wouldn't have been possible.

I'm not sure how they did it, but their rewrite from using HTML5 to using a native app caused massive performance issues on my old HTC Legend (simply scrolling yields approx 1 frame per 2 seconds half of the time) and it FCs half a dozen times a day.

Instead, we needed to inject our secondary dex files directly into the system class loader. This isn't normally possible, but we examined the Android source code and used Java reflection to directly modify some of its internal structures. We were certainly glad and grateful that Android is open source—otherwise, this change wouldn't have been possible.

Speaking as someone who does mobile development for a living...

Eek.

Speaking as someone who values good software abstraction and code design...

Barf.

Not only is their solution to the problem hacky, but the problem that they had, off the bat, seems to show that they did a BAD job of organizing their code in the first place. I don't really know what their code looks like, but this seems like code bloat to me. Problem: our astronomical number of methods doesn't fit into this buffer. Solution: Use multiple buffers! Perhaps this is a common solution to a known Android limitation, but I'm less inclined to believe that. If this was such a fundamental problem, why didn't Google address it somehow? Also, how come we haven't heard of more app developers doing insane things like this? It seems more likely to me that the app was just poorly architected to begin with.

Seriously though, I'm not begrudging Facebook getting the coverage here. It is still highly amusing to see them learning this stuff in 2013.

Reminds me of the HTC Thunderbolt launch back in 2011. Our error logs suddenly started blowing up one day, and we discovered a new phone had launched at the same time. Interesting. Digging through the source code revealed that some enterprising HTC developer had decided to increase a network buffer size in the OS by about 10x. Worked great for whatever he was doing I'm sure, but crashed a bunch of apps like clockwork.

On one hand, being able to view the source saved our bacon. On the other hand, iOS doesn't let HTC developers run roughshod through their code and generally doesn't have those sorts of issues. Double-edged swords all around.

And yet it still crashes on average two or three times an hour according to how many times I hear my wife yell expletives at her phone (which I verified are directed at the Facebook app in particular).

Facebook's performance probably varies depending on the age and hardware specs of your phone. It runs great on my Galaxy Nexus and my wife's Galaxy S3, but ran like ass on my wife's old HTC Thunderbolt and crashed a lot for her - even recently with their new app. I frequently had to clear the app cache, then uninstall and reinstall the Facebook app, to get it to work once it started acting up. Nothing else helped, not a reboot of the phone, not killing its processes, etc. It sounds like the Facebook app is pretty bloated, but they got it to work on old Android OSes...but just barely. Notice the article doesn't mention how well it performs on old hardware, just that they got it working.

I do have a weird issue on my Galaxy Nexus where Facebook won't load images and is dog slow - but only when connected to my home WiFi. On 4G it works great, or connected to any WiFi other than my home WiFi. I think I need to fire up WireShark one of these days and see if there's something weird going on...

The Facebook app is leaps and bounds beyond where it was a year ago though, like they say in the article. Apps written in custom code on Android tend to perform way better than apps that are just HTML with a wrapper. I'm glad they took the time to optimize for mobile, and to make mobile a priority, since that's the way "kids these days" are accessing Facebook. I probably spend more time on Facebook on my mobile than on my computer too, and I'm 40 years old.

Speaking of problems, the current version...actually, the last few updates kinda broke the app. I noticed that comments are repeating and there are problems when working with images/status updates from other "apps" within FB. For example, when someone posts an image using Instagram and you view it, then try to go back to where you were previously, it just loops on the same page no matter how many times you press the (physical) back button. Of course, the workaround is to click on the menu and select News Feed.

the problem that they had, off the bat, seems to show that they did a BAD job of organizing their code in the first place.

I agree! As we all know, well-designed architectures try to minimize the number of methods as much as possible. 1 giant page-filling method is much better than several small, easy to understand and test methods.

Quote:

Solution: Use multiple buffers!

You didn't get the part where the buffer is actually allocated by dalvik did you? Although to be fair that's pretty much the solution they went with anyhow.

Quote:

If this was such a fundamental problem, why didn't Google address it somehow?

You mean like increasing the buffer size in subsequent release? Oh wait...

You know a better Android Facebook app? m.facebook.com

Other than messages going all FUBAR in the cache after a while, it's massively better - isn't as heavy on the system and doesn't use up all your idle power.

It's bad when your mobile site is honestly better than the native system application.

Instead, we needed to inject our secondary dex files directly into the system class loader. This isn't normally possible, but we examined the Android source code and used Java reflection to directly modify some of its internal structures. We were certainly glad and grateful that Android is open source—otherwise, this change wouldn't have been possible.

Speaking as someone who does mobile development for a living...

Eek.

Speaking as someone who values good software abstraction and code design...

Barf.

Not only is their solution to the problem hacky, but the problem that they had, off the bat, seems to show that they did a BAD job of organizing their code in the first place. I don't really know what their code looks like, but this seems like code bloat to me. Problem: our astronomical number of methods doesn't fit into this buffer. Solution: Use multiple buffers! Perhaps this is a common solution to a known Android limitation, but I'm less inclined to believe that. If this was such a fundamental problem, why didn't Google address it somehow? Also, how come we haven't heard of more app developers doing insane things like this? It seems more likely to me that the app was just poorly architected to begin with.

Speaking as a like-minded programmer, that's exactly what I thought when I was reading the article. I get the impression that the app is a bloated mess.

Being an Android user, I'm not surprised that FB blew up a buffer on older versions of the OS. It manages to somehow be incredibly bloated, using lots of memory for several services and eating up lots of battery on background stuff, while at the same time somehow managing to miss updates and not really function very well at all. How an app can be so busy doing nothing I'll never know. Really should switch to one of the alternatives, but I haven't gotten around to it since I really should flash a new mod first.

Speaking as a like-minded programmer, that's exactly what I thought when I was reading the article. I get the impression that the app is a bloated mess.

So you're defining "bloated" by the number of methods a project defines? And you actually think that's a good, sensible measurement? Really? So what you're saying is that copying and pasting all helper functions from a project into their calling position would somehow reduce the "bloat" of the project. You see the problem with that?

I'm not saying the code is or is not bloated (never reverse engineered the code, and it runs perfectly fine on my nexus 4, although 70MB of memory is on the higher end), but using the number of methods is a ridiculous measurement for "bloat".

500 developers? No wonder they ended up with such a piece of bloatware. It would not surprise me if functionality in some areas is duplicated many times.

Being Java doesn't help either. That language forces you to have more classes and methods than any language I know of.

Java doesn't force anything of the sort. It recommends that you break your application down into small pieces that do limited things for ease of readability, and ease of reusability but it doesn't force you.

Spoken as someone who is currently having to fix a bug in a 2000 line method.

As far as I can tell from past articles, this is how Facebook developers do things. Instead of changing their approach to fit the tools they will alter the tools to fit their (possibly broken) approach. For example on the main site instead of switching from PHP to a compiled language and CGI they instead built a system that translates PHP into C++ and then compiles it for speed. Anybody else would probably just roll some wrapper class libraries and build a next generation of the site directly in a compiled (or VM) language, but Facebook thinks inventing a code-translating compiler is easier.

The funny thing is, this problem was found decades ago. From the paper “Lambda the Ultimate Opcode” by Steele and Sussman (1979):

Quote:

The author of a compiler, for example, might well guess, "No one will ever use more than, say, ten nested DO loops; I'll double that for good measure, and make the nested-DO-table 20 long." Inevitably, someone eventually finds some reason to write 21 nested DO loops, and finds that the compiler overflows its fixed table […] On the other hand, had the compiler writer made the table 100 long or 1000 long, most of the [memory] would be wasted.

It is sad to see how this wisdom fails to be applied, and what damage it does (this fixed sized buffer costs developer hours, which is expensive).

Thank goodness for open source! Otherwise Facebook engineers couldn't have expended such time and effort to fix an ugly bug instead of being truly productive. Bad old iOS is closed source, so such a herculean feat would not have been possible with it--never mind that iOS doesn't need such fixes.

Instead, we needed to inject our secondary dex files directly into the system class loader. This isn't normally possible, but we examined the Android source code and used Java reflection to directly modify some of its internal structures. We were certainly glad and grateful that Android is open source—otherwise, this change wouldn't have been possible.

Speaking as someone who does mobile development for a living...

Eek.

Speaking as someone who values good software abstraction and code design...

Barf.

Not only is their solution to the problem hacky, but the problem that they had, off the bat, seems to show that they did a BAD job of organizing their code in the first place. I don't really know what their code looks like, but this seems like code bloat to me. Problem: our astronomical number of methods doesn't fit into this buffer. Solution: Use multiple buffers! Perhaps this is a common solution to a known Android limitation, but I'm less inclined to believe that. If this was such a fundamental problem, why didn't Google address it somehow? Also, how come we haven't heard of more app developers doing insane things like this? It seems more likely to me that the app was just poorly architected to begin with.

So much this. The fact that a dinky social networking front end is blowing the dalvik buffers sends my bullshit detector off the charts.

As far as I can tell from past articles, this is how Facebook developers do things. Instead of changing their approach to fit the tools they will alter the tools to fit their (possibly broken) approach. For example on the main site instead of switching from PHP to a compiled language and CGI they instead built a system that translates PHP into C++ and then compiles it for speed. Anybody else would probably just roll some wrapper class libraries and build a next generation of the site directly in a compiled (or VM) language, but Facebook thinks inventing a code-translating compiler is easier.

No, they think it is less likely to cause catastrophic problems. The PHP to C approach allows for incremental, minor conversion to faster code. It's not pretty, but then again neither is a PR fiasco. Love it or hate it, Facebook's approach here is about pragmatism, not ease. They are very different things, despite sometimes being mistaken for each other.

Instead, we needed to inject our secondary dex files directly into the system class loader. This isn't normally possible, but we examined the Android source code and used Java reflection to directly modify some of its internal structures. We were certainly glad and grateful that Android is open source—otherwise, this change wouldn't have been possible.

Speaking as someone who does mobile development for a living...

Eek.

Speaking as someone who values good software abstraction and code design...

Barf.

Not only is their solution to the problem hacky, but the problem that they had, off the bat, seems to show that they did a BAD job of organizing their code in the first place. I don't really know what their code looks like, but this seems like code bloat to me. Problem: our astronomical number of methods doesn't fit into this buffer. Solution: Use multiple buffers! Perhaps this is a common solution to a known Android limitation, but I'm less inclined to believe that. If this was such a fundamental problem, why didn't Google address it somehow? Also, how come we haven't heard of more app developers doing insane things like this? It seems more likely to me that the app was just poorly architected to begin with.

Speaking as a developer with 20 years experience my first thought was, "Brilliant. They've managed to avoid redeveloping or branching their app. What a great risk-management based approach to delivering a solution."

There's a whole bunch of Facebook developers, testers and managers who didn't lose sleep over this.

Sure it's a hack, but it's a cost effective hack. Nothing but clever in my book.

For example on the main site instead of switching from PHP to a compiled language and CGI they instead built a system that translates PHP into C++ and then compiles it for speed. Anybody else would probably just roll some wrapper class libraries and build a next generation of the site directly in a compiled (or VM) language, but Facebook thinks inventing a code-translating compiler is easier.

That seems like an even more poorly thought out idea than what they did. Most programmers should not be let anywhere near a low level language. From a business standpoint, it's a lot cheaper to hire 10 wizards and 500 monkeys than it is to hire 500 properly good programmers.

Florence Ion / Florence was a former Reviews Editor at Ars, with a focus on Android, gadgets, and essential gear. She received a degree in journalism from San Francisco State University and lives in the Bay Area.