Case study: converting a Shiny app to async

In this case study, we’ll work through an application of reasonable complexity, turning its slowest operations into futures/promises and modifying all the downstream reactive expressions and outputs to deal with promises.

Motivation

As a web service increases in popularity, so does the number of rogue scripts that abuse it for no apparent reason.

—Cheng’s Law of Why We Can’t Have Nice Things

I first noticed this in 2011, when the then-new RStudio IDE was starting to gather steam. We had a dashboard that tracked how often RStudio was being downloaded, and the numbers were generally tracking smoothly upward. But once every few months, we’d have huge spikes in the download counts, ten times greater than normal—and invariably, we’d find that all of the unexpected increase could be tracked to one or two IP addresses.

For hours or days we’d be inundated with thousands of downloads per hour, then just as suddenly, they’d cease. I didn’t know what was happening then, and I still don’t know today. Was it the world’s least competent denial-of-service attempt? Did someone write a download script with an accidental while (TRUE) around it?

Our application will let us examine downloads from CRAN for this kind of behavior. For any given day on CRAN, we’ll see what the top downloaders are and how they’re behaving.

Our source data

RStudio maintains the popular 0-Cloud CRAN mirror, and the log files it generates are freely available at http://cran-logs.rstudio.com/. Each day is a separate gzipped CSV file, and each row is a single package download. For privacy, IP addresses are anonymized by substituting each day’s IP addresses with unique integer IDs.

Fortunately for our purposes, there’s no need to analyze these logs at a high level to figure out which days are affected by badly behaved download scripts. This CRAN mirror is popular enough that, according to Cheng’s Law, there should be plenty of rogue scripts hitting it every day of the year.

When the app starts, the “All traffic” tab shows you the number of package downloads per hour for all users vs. whales. In this screenshot, you can see the proportion of files downloaded by the top six downloaders on May 28, 2018. It may not look like a huge fraction at first, but keep in mind, we are only talking about six downloaders out of 52,815 total!

The “Biggest whales” tab simply shows the most prolific downloaders, with their number of downloads performed. Each anonymized IP address has been assigned an easier-to-remember name, and you can also see the country code of the original IP address.

The “Whales by hour” tab shows the hourly download counts for each whale individually. In this screenshot, you can see that the Netherlands’ relieved_snake downloaded at an extremely consistent rate during the whole day, while the American curly_capabara was active only during business hours in Eastern Standard Time. Still others, like colossal_chicken out of Hong Kong, were busy all day but at varying rates.

The “Detail View” has perhaps the most illuminating information. It lets you view every download made by a given whale on the day in question. The x dimension is time and the y dimension is what package they downloaded, so you can see at a glance exactly how many packages were downloaded, and how their various package downloads relate to each other. In this case, relieved_snake downloaded 104 different packages, in the same order, continuously, for the entire day.

Others behave very differently, like freezing_tapir, who downloaded devtools (and only devtools) for the whole day, racking up 19,180 downloads totalling 7.9 gigabytes for that one package alone!

Sadly, the app can’t tell us any more than that. It can’t explain why these downloaders are behaving this way, nor can it tell us their street addresses so that we can send ninjas in black RStudio helicopters to make them stop.

The implementation

Now that you’ve seen what the app does, let’s talk about how it was implemented, then convert it from sync to async.

User interface

The user interface is a pretty typical shinydashboard. It’s important to note that the UI part of the app is entirely agnostic to whether the server is written in the sync or async style; when we port the app to async, we won’t touch the UI at all.

There are two major pieces of input we need from users: what date to examine (this app only lets us look at one day at a time) and how many of the most prolific downloaders to look at. We’ll put these two controls in the dashboard sidebar.

(We set date to two days ago by default, because there’s some lag between when a day ends and when its logs are published.)
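A minimal sketch of a sidebar with these two controls might look like the following. The input IDs date and count match the input$date and input$count names used later in this article; the labels, the title, and the default of 6 downloaders are illustrative assumptions rather than the app's actual source.

```r
library(shiny)
library(shinydashboard)

# Sketch only: labels, title, and the default count are assumptions.
ui <- dashboardPage(
  dashboardHeader(title = "CRAN whales"),
  dashboardSidebar(
    # Default to two days ago, since logs are published with some lag
    dateInput("date", "Date", value = Sys.Date() - 2),
    numericInput("count", "Show top N downloaders:", value = 6, min = 1)
  ),
  dashboardBody()
)
```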

The rest of the UI code is just typical shinydashboard scaffolding, plus some shinydashboard::valueBoxOutputs and plotOutputs. These are so trivial that they’re hardly worth talking about, but I’ll include the code here for completeness. Finally, there’s detailViewUI, a Shiny module that just contains more of the same (value boxes and plots).

Server logic

Based on these inputs and outputs, we’ll write a variety of reactive expressions and output renderers to download, manipulate, and visualize the relevant log data.

The reactive expressions:

data (eventReactive): Whenever input$date changes, the data reactive downloads the full log for that day from http://cran-logs.rstudio.com, and parses it.

whales (reactive): Reads from data(), tallies the number of downloads performed by each unique IP, and returns a data frame of the top input$count most prolific downloaders, along with their download counts.

whale_downloads (reactive): Joins the data() and whales() data frames, to return all of the details of the cetacean downloads.

The whales reactive expression depends on data, and whale_downloads depends on data and whales.

The outputs in this app are mostly either renderPlots that we populate with ggplot2, or shinydashboard::renderValueBoxes. They all rely on one or more of the reactive expressions we just described. We won’t catalog them all here, as they’re not individually interesting, but we will look at some archetypes below.

Improving performance and scalability

While this article is specifically about async, this is a good time to remind you that there are lots of ways to improve the performance of a Shiny app. Async is just one tool in the toolbox, and before reaching for that hammer, take a moment to consider your other options:

Have I used profvis to profile my code and determine what’s actually taking so long? (Human intuition is a notoriously bad profiler!)

Can I perform any calculations, summarizations, and aggregations offline, when my Shiny app isn’t even running, and save the results to .rds files to be read by the app?

Are there any opportunities to cache–that is, save the results of my calculations and use them if I get the same request later? (See memoise, or roll your own.)

Am I effectively leveraging reactive programming to make sure my reactives are doing as little work as possible?

These options are more generally useful than using async techniques because they can dramatically speed up the performance of an app even if only a single user is using it. While it obviously depends on the particulars of the app itself, a few lines of precomputation or caching logic can often lead to 10X-100X better performance. Async, on the other hand, generally doesn’t help make a single session faster. Instead, it helps a single Shiny process support more concurrent sessions without getting bogged down.
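As a rough illustration of the caching option, the sketch below wraps a slow function with memoise::memoise so that repeated calls with the same argument reuse the stored result. The function summarize_day and its Sys.sleep stand-in are hypothetical and not part of cranwhales.

```r
library(memoise)

# Hypothetical slow computation; Sys.sleep stands in for real work.
summarize_day <- function(date) {
  Sys.sleep(0.2)
  paste("summary for", format(as.Date(date)))
}

# Memoised wrapper: results are cached in memory, keyed by the arguments.
summarize_day_cached <- memoise(summarize_day)

# Time the same call twice; the second should be served from the cache.
first  <- system.time(summarize_day_cached("2018-05-28"))[["elapsed"]]
second <- system.time(summarize_day_cached("2018-05-28"))[["elapsed"]]
```

Note that this helps every session, including a lone user, which is exactly the distinction drawn above between caching and async.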

Async can be an essential tool when there is no way around performing expensive tasks (i.e. taking multiple seconds) while the user waits. For example, an app that analyzes any user-specified Twitter profile may get too many unique queries (assuming most people specify their own Twitter handle) for caching to be much help. And applications that invite users to upload their own datasets won’t have an opportunity to do any offline summarizing in advance. If you need to run apps like that and support lots of concurrent users, async can be a huge help.

In that sense, the cranwhales app isn’t a perfect example, because it has lots of opportunities for precomputation and caching that we’ll willfully ignore today so that I can better illustrate the points I want to make about async. When you’re working on your own app, though, please think carefully about all of the different techniques you have for improving performance.

Converting to async

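Converting a synchronous app to async follows three steps. First, identify the slow operations in the app. Second, convert each slow operation into a future. Third, any code that relies on the result of those operations (if any), whether directly or indirectly, now must be converted to promise handlers that operate on the future object.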

In this case, the slow operations are easy to identify: the downloading and parsing that takes place in the data reactive expression can each take several long seconds.

Converting the download and parsing operations into futures turns out to be the most complicated part of the process, for reasons we’ll get into later.

Assuming we do that successfully, the data reactive expression will no longer return a data frame, but a promise object (that resolves to a data frame). Since the whales and whale_downloads reactive expressions both rely on data, those will both also need to be converted to read and return promise objects. And therefore, because the outputs all rely on one or more reactive expressions, they will all need to know how to deal with promise objects.

Async code is infectious like that; once you turn the heart of your app into a promise, everything downstream must become promise-aware as well, all the way through to the observers and outputs.

With that overview out of the way, let’s dive into the code.

In the sections below, we’ll take a look at the code behind some outputs and reactive expressions. For each element, we’ll look first at the sync version, then the async version.

In some cases, these code snippets may be slightly abridged. See the GitHub repository for the full code.

Loading promises and future

I originally used multiprocess, but file downloading inside a future seemed to fail on Mac, so the app uses multisession instead. (I’ve found that it’s usually not worth spending a lot of time trying to figure out why multiprocess doesn’t work for some specific code; instead, just use multisession, since that’s probably going to be the solution anyway.)
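A typical setup for this kind of conversion, assuming the promises and future packages are installed, is sketched below. The choice of multisession follows the note above; the multiprocess plan has since been deprecated in the future package.

```r
library(promises)
library(future)

# Run futures in separate background R sessions on this machine.
plan(multisession)
```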

The data reactive: future() all the things

The next thing we’ll do is convert the data event reactive to use future for the expensive bits. The original code looks like this:
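The snippet itself is not reproduced here, so the following is a reconstruction from the surrounding description rather than the app's verbatim source: an eventReactive that downloads the day's log into a data_cache directory if it is not already there, then parses it, reporting progress with withProgress/setProgress. The URL layout (a year directory, then the date with a .csv.gz suffix), the progress messages, and the use of readr::read_csv are assumptions.

```r
library(shiny)
library(readr)

server <- function(input, output, session) {
  data <- eventReactive(input$date, {
    date <- input$date
    # Assumed URL layout: <base>/<year>/<date>.csv.gz
    url <- sprintf("http://cran-logs.rstudio.com/%s/%s.csv.gz",
                   format(date, "%Y"), format(date))
    path <- file.path("data_cache", paste0(format(date), ".csv.gz"))

    withProgress(value = NULL, {
      # Only download if this day's log is not already cached on disk
      if (!file.exists(path)) {
        setProgress(message = "Downloading data...")
        dir.create("data_cache", showWarnings = FALSE)
        download.file(url, path, mode = "wb")
      }
      setProgress(message = "Parsing data...")
      read_csv(path, progress = FALSE)
    })
  })
}
```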

(Earlier, I said we wouldn’t take advantage of precomputation or caching. That wasn’t entirely true; in the code above, we cache the log files we download in a data_cache directory. I couldn’t bring myself to put my internet connection through that level of abuse, as I knew I’d be running this code thousands of times as I load tested it.)

For now, we’ll lose the withProgress/setProgress reporting, since doing that correctly requires some more advanced techniques that we’ll talk about later. We’ll come back and fix this code later, but for now:
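A first async version can be sketched as follows. As before, the URL layout, the data_cache path, and the use of readr are assumptions, not the app's verbatim source. The slow steps move inside future(), the progress reporting is dropped, and input$date is read before the future() call rather than inside it.

```r
library(shiny)
library(promises)
library(future)
plan(multisession)

server <- function(input, output, session) {
  data <- eventReactive(input$date, {
    # Read reactive values here, in the main process, before future()
    date <- input$date
    # Assumed URL layout: <base>/<year>/<date>.csv.gz
    url <- sprintf("http://cran-logs.rstudio.com/%s/%s.csv.gz",
                   format(date, "%Y"), format(date))
    path <- file.path("data_cache", paste0(format(date), ".csv.gz"))

    # Runs in a worker process; url and path are copied over as globals
    future({
      if (!file.exists(path)) {
        dir.create("data_cache", showWarnings = FALSE)
        download.file(url, path, mode = "wb")
      }
      readr::read_csv(path, progress = FALSE)
    })
  })
}
```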

Pretty straightforward. This reactive now returns a future (which counts as a promise), not a data frame.

Remember that we must read any reactive values (including input) and reactive expressions from outside the future. (You will get an error if you attempt to read one from inside the future.)

At this point, since there are no other long-running operations we want to make asynchronous, we’re actually done interacting directly with the future package. The rest of the reactive expressions will deal with the future returned by data using general async functions and operators from promises.

The whales reactive: simple pipelines are simple

The whales reactive takes the data frame from data, and uses dplyr to find the top input$count most prolific downloaders.
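Going by that description, the synchronous version is presumably a short dplyr pipeline along these lines. The column name ip_id is taken from the pull(data, ip_id) example later in this article, and the count column n is dplyr's default name; treat the rest as a sketch.

```r
library(shiny)
library(dplyr)

server <- function(input, output, session) {
  # `data` is the app's data reactive, assumed to be defined elsewhere
  # in this server function and to return a data frame
  whales <- reactive({
    data() %>%
      count(ip_id) %>%        # downloads per anonymized IP
      arrange(desc(n)) %>%    # most prolific first
      head(input$count)       # keep the top N
  })
}
```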

Since data() now returns a promise, the whole function needs to be modified to deal with promises.

This is basically a best-case scenario for working with promises. The whole expression consists of a single magrittr pipeline. There’s only one object (data()) that’s been converted to a promise. The promise object only appears once, at the head of the pipeline.

When the stars align like this, converting this code to async is literally as easy as replacing each %>% with %...>%:
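As a sketch (same assumed column names: ip_id for the anonymized IP, n for the count), the async version differs from a synchronous dplyr pipeline only in its pipe operator:

```r
library(shiny)
library(promises)
library(dplyr)

server <- function(input, output, session) {
  # `data` is assumed to be defined elsewhere in this server function
  # and to return a promise that resolves to a data frame
  whales <- reactive({
    data() %...>%
      count(ip_id) %...>%
      arrange(desc(n)) %...>%
      head(input$count)
  })
}
```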

The input (data()) is a promise, the resulting output object is a promise, each stage of the pipeline returns a promise; but we can read and write this code almost as easily as the synchronous version!

An example this simple may seem reductive, but this best-case scenario happens surprisingly often, if your coding style is influenced by the tidyverse. In this example app, 59% of the reactives, observers, and outputs were converted using nothing more than replacing %>% with %...>%.

One last thing before we move on. In the last section, I emphasized that reactive values cannot be read from inside a future. Here, we’re using head(input$count) inside a promise-pipeline; since data() is written using a future, doesn’t that mean… well… isn’t this wrong?

Nope—this code is just fine. The prohibition is against reading reactive values/expressions from inside a future, because code inside a future is executed in a totally different R process. The steps in a promise-pipeline aren’t futures, but promise handlers. These aren’t executed in a different process; rather, they’re executed back in the original R process after a promise is resolved. We’re allowed and expected to access reactive values and expressions from these handlers.
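The next reactive to convert is whale_downloads, which joins data() and whales(). A sketch of what the synchronous version and a naive %...>% conversion might look like is shown below; the join key ip_id and the select(-n) step are assumptions. The naive conversion does not work, for the reasons that follow.

```r
library(shiny)
library(promises)
library(dplyr)

server <- function(input, output, session) {
  # `data` and `whales` are assumed to be defined elsewhere in this
  # server function

  # Synchronous version: both reactives return data frames
  whale_downloads_sync <- reactive({
    data() %>%
      inner_join(whales(), "ip_id") %>%
      select(-n)
  })

  # Naive async attempt (broken): whales() is still a promise when it
  # reaches inner_join(), which only understands data frames
  whale_downloads_naive <- reactive({
    data() %...>%
      inner_join(whales(), "ip_id") %...>%
      select(-n)
  })
}
```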

Remember, both data() and whales() now return a promise object, not a data frame. None of the dplyr verbs know how to deal with promises natively (and the same is true for almost every other R function, anywhere in the R universe).

We’re able to use %...>% with promises on the left-hand side and regular dplyr calls on the right-hand side, only because the %...>% operator “unwraps” the promise object for us, yielding a regular object (data frame or whatever) to be passed to dplyr. But in this case, we’re passing whales(), which is a promise object, directly to inner_join, and inner_join has no idea what to do with it.

The fundamental thing to pattern-match on here is that we have a block of code that relies on more than one promise object, and that means %...>% won’t be enough. This is a pretty common situation as well, and occurs in 12% of reactives and outputs in this example app.

Promises: the Gathering

This solution uses the promise gathering pattern, which combines promise_all, %...>%, and with.

The promise_all function gathers multiple promise objects together, and returns a single promise object. This new promise object doesn’t resolve until all the input promise objects are resolved, and it yields a list of those results.

You can make use of this pattern without remembering exactly how these pieces combine. Just remember that the arguments to promise_all provide the promise objects (future(1) and future(2)), along with the names you want to use to refer to their yielded values (x and y); and the code block you put in with() can refer to those names without worrying about the fact that they were ever promises to begin with.
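In its smallest form, and using the x and y names from the paragraph above, the pattern can be sketched as follows. The loop at the end is only there so the example can run outside Shiny, where nothing else drives the event loop; it assumes the later package (a dependency of promises) is available. The names data_df and whales_df in the commented reactive are illustrative.

```r
library(promises)
library(future)
plan(multisession)

total <- NULL

# Gather two promises; with() evaluates the block with x and y bound
# to the resolved values
promise_all(x = future(1), y = future(2)) %...>%
  with({ x + y }) %...>%
  { total <<- . }

# Outside Shiny, run the event loop by hand until the handler has fired
# (or give up after 30 seconds)
deadline <- Sys.time() + 30
while (is.null(total) && Sys.time() < deadline) {
  later::run_now(timeoutSecs = 0.1)
}
print(total)

# Applied to a reactive that needs two promises (names are illustrative):
# whale_downloads <- reactive({
#   promise_all(data_df = data(), whales_df = whales()) %...>% with({
#     data_df %>% inner_join(whales_df, "ip_id") %>% select(-n)
#   })
# })
```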

An async value box output is structurally no different from the whales best-case scenario reactive. One thing worth pointing out is that an async renderValueBox means you return a promise that resolves to a valueBox; you don’t return a valueBox to which you have passed a promise.

The other trick worth noting is the pull verb, which is used to retrieve a specific column of a data frame as a vector (similar to $ or [[). In this case, pull(data, ip_id) is equivalent to data[["ip_id"]]. Note that pull is part of dplyr and isn’t specific to promises.
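For example, an async value box that counts unique downloaders could be sketched like this. The output ID total_downloaders, the caption text, and the format() step are assumptions; the pull(ip_id) step matches the description above.

```r
library(shiny)
library(shinydashboard)
library(promises)
library(dplyr)

server <- function(input, output, session) {
  # `data` is assumed to be defined elsewhere in this server function
  # and to return a promise that resolves to a data frame
  output$total_downloaders <- renderValueBox({
    data() %...>%
      pull(ip_id) %...>%               # column as a plain vector
      unique() %...>%
      length() %...>%
      format(big.mark = ",") %...>%
      valueBox("unique downloaders")   # last stage yields the valueBox
  })
}
```

The render function's return value is the promise produced by the pipeline, and valueBox() is simply its final stage.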

The biggest_whales plot: getting untidy

In a cruel twist of API design fate, one of the cornerstone packages of the tidyverse lacks a tidy API. I’m referring, of course, to ggplot2:

While dplyr and other tidyverse packages are designed to link calls together with %>%, the older ggplot2 package uses the + operator. This is mostly a small aesthetic wart in synchronous code, but it’s a real problem with async, because the promises package doesn’t currently have a promise-aware replacement for + like it does for %>%.

Fortunately, there’s a pretty good escape hatch for %>%, and %...>% inherited it too. Instead of a pipeline stage being a simple function call, you can put a { and } delimited code block, and inside of that code block, you can access the “it” value using a period (.).

The importance of this pattern cannot be overstated! Using %...>% and simple calls alone, you’re restricted to doing pipeline-compatible operations. But %...>% together with a curly-brace code block means your handler code can be any shape or size. Once inside that code block, you have a regular, non-promise value in . (if you even want to use it—sometimes you don’t, as we’ll see later). You can have zero, one, or more statements. You can use the . multiple times, in nested expressions, whatever.
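Applied to a plot output, the curly-brace form might look like the sketch below. The column name ip_name and the axis label are assumptions; whales is assumed to be a reactive that returns a promise of a data frame with a count column n.

```r
library(shiny)
library(promises)
library(ggplot2)

server <- function(input, output, session) {
  # `whales` is assumed to be defined elsewhere in this server function
  output$biggest_whales <- renderPlot({
    whales() %...>% {
      # Inside the block, `.` is the resolved data frame, not a promise
      whale_df <- .
      ggplot(whale_df, aes(ip_name, n)) +
        geom_col() +
        ylab("Downloads on this day")
    }
  })
}
```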

Tip: if you have extensive or complex code to put in a code block, start the block by creating a properly named variable to store the value of .. The reason for this is that . may acquire a different meaning than you intend as you add code to the code block. For example, if a magrittr pipeline starts with ., instead of evaluating the pipeline and returning a value, it creates a function that takes a single argument. So the following code wouldn’t filter the resolved value of whales(), but instead, create an anonymous function that calls filter(n > 1000) on whatever you pass it.
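The pitfall, and the renaming fix, can be reproduced with a small self-contained sketch. The data frame and its values are made up for illustration, promise_resolve stands in for whales(), and the loop at the end only exists to drive the event loop outside Shiny (it assumes the later package is available).

```r
library(promises)
library(dplyr)

# Stand-in for whales(): a promise that resolves to a small data frame
whales_promise <- promise_resolve(
  data.frame(ip_id = 1:3, n = c(500, 1500, 2500))
)

wrong <- NULL
right <- NULL

# Pitfall: a magrittr pipeline that starts with `.` builds a function
# instead of filtering anything
whales_promise %...>% {
  . %>% filter(n > 1000)
} %...>% {
  wrong <<- .
}

# Fix: give `.` a proper name first, then forget that `.` exists
whales_promise %...>% {
  whales_df <- .
  whales_df %>% filter(n > 1000)
} %...>% {
  right <<- .
}

# Drive the event loop until both handlers have run (30 second cap)
deadline <- Sys.time() + 30
while ((is.null(wrong) || is.null(right)) && Sys.time() < deadline) {
  later::run_now(timeoutSecs = 0.1)
}
```

After this runs, wrong holds a function while right holds the filtered data frame.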

There are other ways to work around the above problem as well, but I like this fix because it doesn’t require any thought or care. Just give the . value a new name, and forget the . exists.

For untidy code with a single promise object, just remember: pair a single %...>% with a code block and you should be able to do almost anything.

Revisiting the data reactive: progress support

Now that we have discussed a few techniques for writing async code, let’s come back to our original data event reactive, and this time do a more faithful async conversion that preserves the progress reporting functionality of the original.

First, the withProgress({...}) function cannot be used with async. withProgress is designed to wrap a slow synchronous action, and dismisses its progress dialog when the block of code it wraps is done executing. Since the call to future() will return immediately even though the actual task is far from done, using withProgress won’t work; the progress dialog would be dismissed before the download even got going.

It’s conceivable that withProgress could gain promise compatibility someday, but it’s not in Shiny v1.1.0. In the meantime, we can work around this by using the alternative, object-oriented progress API that Shiny offers. It’s a bit more verbose and fiddly than withProgress/setProgress, but it is flexible enough to work with futures/promises.

Second, progress messages can’t be sent from futures. This is simply because futures are executed in child processes, which don’t have direct access to the browser like the main Shiny process does.

It’s conceivable that future could gain the ability for child processes to communicate back to their parents, but no good solution exists at the time of this writing. In the meantime, we can work around this by taking the one future that does both downloading and parsing, and splitting it into two separate futures. After the download future has completed, we can send a progress message that parsing is beginning, and then start the parsing future.

The single future we wrote earlier has now become a pipeline of promises:

future (download)

send progress message

future (parse)

dismiss progress dialog

Note that neither the R6 call p$set(message = ...) nor the second future() call is tidy, so they use curly-brace blocks, as discussed in the above section about biggest_whales.

The final step of dismissing the progress dialog doesn’t use %...>% at all; because we want the progress dialog to dismiss whether the download and parse operations succeed or fail, we use the regular pipe %>% and finally() function instead. See the relevant section in Working with promises in R to learn more.
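Putting those four steps together, the reworked data reactive might be sketched as follows. The URL layout, the data_cache path, the progress messages, and the use of readr are assumptions rather than the app's verbatim source; Progress$new(), p$set(), p$close(), and finally() are the Shiny and promises calls described above.

```r
library(shiny)
library(promises)
library(future)
plan(multisession)

server <- function(input, output, session) {
  data <- eventReactive(input$date, {
    date <- input$date
    # Assumed URL layout: <base>/<year>/<date>.csv.gz
    url <- sprintf("http://cran-logs.rstudio.com/%s/%s.csv.gz",
                   format(date, "%Y"), format(date))
    path <- file.path("data_cache", paste0(format(date), ".csv.gz"))

    # Object-oriented progress API: create the indicator up front
    p <- Progress$new()
    p$set(value = NULL, message = "Downloading data...")

    # Step 1: download in a worker process
    future({
      if (!file.exists(path)) {
        dir.create("data_cache", showWarnings = FALSE)
        download.file(url, path, mode = "wb")
      }
    }) %...>%
      # Step 2: back in the Shiny process, update the progress message
      { p$set(message = "Parsing data...") } %...>%
      # Step 3: parse in a second future
      { future(readr::read_csv(path, progress = FALSE)) } %>%
      # Step 4: close the dialog whether the pipeline succeeded or failed
      finally(~ p$close())
  })
}
```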

With these changes in place, we’ve now covered all of the changes to the application. You can see the full changes side-by-side via this GitHub diff.

Measuring scalability

It was a fair amount of work to do the sync-to-async conversion. Now we’d like to know if the conversion to async had the desired effect: improved responsiveness (i.e. lower latency) when the number of simultaneous visitors increases.

Load testing with Shiny (coming soon)

At the time of this writing, we are working on a suite of load testing tools for Shiny that is not publicly available yet, but was previewed by Sean Lopp during his epic rstudio::conf 2018 talk about running a Shiny load test with 10,000 simulated concurrent users.

You use these tools to easily record yourself using your Shiny app, which creates a test script; then play back that test script, but multiplied by dozens/hundreds/thousands of simulated concurrent users; and finally, analyze the timing data generated during the playback step to see what kind of latency the simulated users experienced.

To examine the effects of my async refactor, I recorded a simple test script by loading up the app, waiting for the first tab to appear, then clicking through each of the other tabs, pausing for several seconds each time before moving on to the next. When using the app without any other visitors, the homepage fully loads in less than a second, and the initial loading of data and rendering of the plot on the default tab takes about 7 seconds. After that, each tab takes no more than a couple of seconds to load. Overall, the entire test script, including time where the user is thinking, takes about 40 seconds under ideal settings (i.e. only a single concurrent user).

I then used this test script to generate load against the Shiny app running in my local RStudio. With the settings I chose, the playback tool introduced one new “user” session every 5 seconds, until 50 sessions total had been launched; then it waited until all the sessions were complete. I ran this test on both the sync and async versions in turn, which generated the following results.

Sync vs. async performance

In this plot, each row represents a single session, and the x dimension represents time. Each of the rectangles represents a single “step” in the test script, be it downloading the HTML for the homepage, fetching one of the two dozen JavaScript/CSS files, or waiting for the server to update outputs. So the wider a rectangle is, the longer the user had to wait. (The empty gaps between rectangles represent time the app is waiting for the user to click an input; their widths are hard-coded into the test script.)

Of particular importance are the red and pink rectangles, as these represent the initial page load. While these are taking place, the user is staring at a blank page, probably wondering if the server is down. Long waits during this stage are not only undesirable, but surprising and incomprehensible to the user; whereas the same user is probably prepared to wait a little while for a complicated visualization to be rendered in response to an input change.

And as you can see from this plot, the behavior of the async app is much improved in the critical metric of homepage/JS/CSS loading time. The sync version of the app starts displaying unacceptably long red/pink loading times as early as session #15, and by session #44 the maximum page load time has exceeded one minute. The async version at that point is showing 25-second load times, which is far from great, but still a significant step in the right direction.

Further optimizations

I was surprised that the async version’s page load times weren’t even faster, and even more surprised to see that the blue rectangles were just as wide as the sync version. Why isn’t the async version way faster? The sync version does all of its work on a single thread, and I specifically designed this app to be a nightmare for scalability by having each session kick off by parsing hundreds of megabytes of CSV, an operation that is quite expensive. The async version gets to spread these jobs across several workers. Why aren’t we seeing a greater time savings?

Mostly, it’s because calling future(read_csv("big_file.csv")) is almost a worst-case scenario for future and async. read_csv is generally fast, but because the CRAN log files are so big, read_csv("big_file.csv") is slow. The value it returns is a very large data frame, that has now been loaded not into the Shiny process, but a future worker process. In order to return that data frame to the Shiny process, that data must first be serialized (I believe future essentially uses saveRDS for this), transmitted to the Shiny process, and then deserialized; to make matters worse, the transmitting and deserialization steps happen on the main R thread that we’re working so hard to try to keep idle. The larger the data we send back and forth to the future, the more performance suffers, and in this case we’re sending back quite a lot of data.

We can make our code significantly faster by doing more summarizing, aggregation, and filtering inside the future; not only does this make more of the work happen in parallel, but by returning the data in already-processed form, we can have much less data to transfer from the worker process back to the Shiny process. (For example, the data for May 31, 2018 weighs 75MB before optimization, and 8.8MB afterwards.)
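To get a feel for the effect, the sketch below builds a synthetic log (not real CRAN data) and compares the serialized size of the full data frame with that of a per-downloader summary. In an app, the summarizing pipeline would sit inside the future() call so that only the small result travels back to the Shiny process. All column names, values, and sizes here are made up.

```r
library(dplyr)

set.seed(1)
# Synthetic stand-in for one day's log
log_df <- data.frame(
  ip_id   = sample(1:5000, 200000, replace = TRUE),
  package = sample(c("devtools", "dplyr", "ggplot2"), 200000, replace = TRUE),
  size    = sample(1e4:1e6, 200000, replace = TRUE)
)

# Work that could run inside the future: reduce to the top downloaders
top_n_downloaders <- function(df, n_top) {
  df %>%
    count(ip_id) %>%
    arrange(desc(n)) %>%
    head(n_top)
}

summary_df <- top_n_downloaders(log_df, 6)

# Bytes that would have to be serialized and sent back in each case
full_bytes    <- length(serialize(log_df, NULL))
summary_bytes <- length(serialize(summary_df, NULL))
```

The trade-off is that any inputs the summary depends on (a hypothetical n_top read from input$count, say) have to be read before the future() call and passed in, which ties the future more closely to the outputs that consume it.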

Compare all three runs in the image below (the newly optimized version is labelled “async2”). The homepage load times have dropped further, and the calculation times are now dramatically faster than the sync code.

Looking at the “async2” graph, the leading (bottom-left) edge has the same shape as before, as that’s simply the rate at which the load testing tool launches new sessions. But notice how much more closely the trailing (upper-right) edge matches the leading edge! It means that even as the number of active sessions ramped up, the amount of latency didn’t get dramatically worse, unlike with the “sync” and “async” versions. And each of the individual blue rectangles in the “async2” are comparatively tiny, meaning that users never have to wait more than a dozen seconds at the most for plots to update.

This last plot shows the same data as above, but with the sessions aligned by start time. You can clearly see how the sessions are both shorter and less variable in “async2” compared to the others. I’ve added a yellow vertical line at the 10 second mark; if the page load (red/pink) has not completed at this point, it’s likely that your visitor has left in disgust. While “async” does better than “sync”, they both break through the 10 second mark early and often. In contrast, the “async2” version just barely peeks over the line three times.

To get a visceral sense for what it feels like to use the app under load, here’s a video that shows what it’s like to browse the app while the load test is running at its peak. The left side of the screen shows “sync”, the right shows “async2”. In both cases, I navigated to the app when session #40 was started.

Take a look at the code diff for async vs. async2. While the code has not changed very dramatically, it has lost a little elegance and maintainability: the code for each of the affected outputs now has one foot in the render function and one foot in the future. If your app’s total audience is a team of a hundred analysts and execs, you may choose to forgo the extra performance and stick with the original async (or even sync) code. But if you have serious scaling needs, the refactoring is probably a small price to pay.

Let’s get real for a second, though. If this weren’t an example app written for exposition purposes, but a real production app that was intended to scale to thousands of concurrent users across dozens of R processes, we wouldn’t download and parse CSV files on the fly. Instead, we’d establish a proper ETL procedure to run every night and put the results into a properly indexed database table, or RDS files with just the data we need. As I said earlier, a little precomputation and caching can make a huge difference!

Much of the remaining latency for the async2 branch is from ggplot2 plotting. Sean’s talk alluded to some upcoming plot caching features we’re adding to Shiny, and I imagine they will have as dramatic an effect for this test as they did for Sean.

Summing up

With async programming, expensive computations and tasks no longer need to be the scalability killers that they once were for Shiny. Armed with this and other common techniques like precomputation, caching, and load balancing, it’s possible to write responsive and scalable Shiny applications that can be safely deployed to thousands of concurrent users.