12 December 2010

StrictMode API for Built-In Performance Monitoring

Back Story

One great thing about Google is “20% time”: spending 20% of your time working on projects outside your main focus area. When I joined Google, I bounced all over the place, often joking that I had seven 20% projects. One project I kept coming back to was Android. I loved its open nature, giving me access to do whatever I wanted, including opening my garage door when I approached my house on my motorcycle. I really wanted it to succeed but I worried about one thing: It wasn’t always super smooth. Animations would sometimes stutter and UI elements weren’t always immediately responsive to input. It was pretty obvious that things were sometimes happening on the wrong thread.

As a heavy SMS user, one of my 20% projects during the Cupcake (Android 1.5) release was speeding up the Messaging app and making it feel smoother. I got the app to a happy state and then continued bouncing between other 20% projects. When the Donut (Android 1.6) release came out, I noticed that a few of my Messaging optimizations had been accidentally broken. I was sad for a bit but then I realized what Android really needed was always-on, built-in, pervasive performance monitoring.

I joined the Android team full-time just over a year ago and spent a lot of time investigating Froyo performance issues, in particular debugging ANRs (those annoying dialogs you get when an application stalls its main thread’s Looper). Debugging ANRs with the tools at hand was painful and boring. There wasn’t enough instrumentation to find the causes, especially when multiple processes were involved (doing Binder or ContentResolver operations to Services or ContentProviders in other processes). There had to be a better way to track down latency hiccups and ANRs...

Enter StrictMode

“I see you were doing 120 ms in a 16 ms zone...”

StrictMode is a new API in Gingerbread which primarily lets you set a policy on a thread declaring what you’re not allowed to do on that thread, and what the penalty is if you violate the policy. Implementation-wise, this policy is simply a thread-local integer bitmask.

By default everything is allowed and it won’t get in your way unless you want it to. The flags you can enable in the thread policy include:

detect disk writes

detect disk reads

detect network usage

on a violation: log

on a violation: crash

on a violation: dropbox

on a violation: show an annoying dialog

In addition, StrictMode has about a dozen hooks around most of the places that hit the disk (in java.io.*, android.database.sqlite.*, etc) and network (java.net.*) which check the current thread’s policy, reacting as you’ve asked.

StrictMode’s powerful part is that the per-thread policies are propagated whenever Binder IPC calls are made to other Services or Providers, and stack traces are stitched together across any number of processes.

Nobody wants to be slow

You might know all the places where your app does disk I/O, but do you know all the places where the system services and providers do? I don’t. I’m learning, but it’s a lot of code. We’re continually working to clarify performance implications in the SDK docs, but I usually rely on StrictMode to help catch calls that inadvertently hit the disk.

Background on disks on phones

Wait, what’s wrong with hitting the disk? Android devices are all running flash memory, right? That’s like a super-fast SSD with no moving parts? I shouldn’t have to care? Unfortunately, you do.

You can’t depend on the flash components or filesystems used in most Android devices to be consistently fast. The YAFFS filesystem used on many Android devices, for instance, has a global lock around all its operations. Only one disk operation can be in-flight across the entire device. Even a simple “stat” operation can take quite a while if you are unlucky. Other devices with more traditional block device-based filesystems still occasionally suffer when the block rotation layer decides to garbage collect and do some slow internal flash erase operations. (For some good geeky background reading, see lwn.net/Articles/353411)

The take-away is that the “disk” (or filesystem) on mobile devices is usually fast, but the 90th percentile latencies are often quite poor. Also, most filesystems slow down quite a bit as they get more full. (See slides from Google I/O Zippy Android apps talk, linked off code.google.com/p/zippy-android)

The “main” Thread

Android callbacks and lifecycle events all typically happen on the main thread (aka “UI thread”). This makes life easier most of the time, but it’s also something you need to be careful of because all animations, scrolls, and flings process their animations by callbacks on the main thread.

If you want to run an animation at 60 fps and an input event comes in (also on the main thread), you have 16 ms to run your code reacting to that input event. If you take longer than 16 ms, perhaps by writing to disk, you’ve now stuttered your animation. Disk reads are often better, but they can also take longer than 16 ms, especially on YAFFS if you’re waiting for the filesystem lock that’s held by a process in the middle of a write.

The network is especially slow and inconsistent, so you should never do network requests on your main thread. In fact, in the upcoming Honeycomb release we’ve made network requests on the main thread a fatal error, unless your app is targeting an API version before Honeycomb. So if you want to get ready for the Honeycomb SDK, make sure you’re never doing network requests on your UI thread. (see “Tips on being smooth” below.)

Enabling StrictMode

The recommended way to use StrictMode is to turn it on during development, learn from it, and turn it off before you ship your app.

That latter form was specifically added so you can target pre-Gingerbread API versions but still easily enable StrictMode using reflection or other techniques. For instance, you could be targeting Donut (Android 1.6) but still use StrictMode if you’re testing on a Gingerbread device or emulator, as long as you use enough Reflection to call StrictMode.enableDefaults().

Watching StrictMode

If you’re using penaltyLog(), the default, just run adb logcat and watch the terminal output. Any violations will be logged to your console, slightly rate-limited for duplicate elimination.

If you want to get fancier, turn on penaltyDropbox() and they’ll be written to the DropBoxManager, where you can extract them later withadb shell dumpsys dropbox data_app_strictmode --print

Tips on being smooth

Our Experience

During Android development we have a new “dogfood” build each day that the whole team uses. Throughout the development of Gingerbread we set up our daily dogfood builds to enable StrictMode logging and upload all found violations for analysis. Every hour a MapReduce job runs and produces an interactive report of all the event loop stalls, their stack traces (including cross-process ones), their latency percentiles, which processes/packages they appear in, etc.

Using the data from StrictMode we fixed hundreds of responsiveness bugs and animation glitches all across the board. We made performance optimizations in the Android core (e.g. system services and providers) so all apps on the system will benefit, as well as fixing up tons of app-specific issues (in both AOSP apps and Google apps). Even if you’re using Froyo today, the recent updates to GMail, Google Maps, and YouTube all benefited from StrictMode data collection gathered on Gingerbread devices.

Where we couldn’t automatically speed up the system, we instead added APIs to make certain patterns easier to do efficiently. For example, there is a new method SharedPreferences.Editor.apply(), which you should be using instead of commit() if you don’t need commit()’s return value. (It turns out almost nobody ever checks it.) You can even use reflection to conditionally use apply() vs. commit() depending on the user’s platform version.

Googlers who switched from Froyo to Gingerbread without seeing all the baby steps between were shocked at how much more responsive the system became. Our friends on the Chrome team then recently added something similar. Of course, StrictMode can’t take all the credit. The new concurrent garbage collector in Gingerbread also greatly reduces latency hiccups.

The Future

The StrictMode API and its capabilities will continue to expand. We have some good stuff lined up for StrictMode in Honeycomb but let us know what else you’d like to see! I’ll be answering questions on stackoverflow.com for questions tagged “strictmode”. Thanks!