Dropbox dives into CoffeeScript

During July’s Hackweek, the three of us rewrote Dropbox’s full browser-side codebase to use CoffeeScript instead of JavaScript, and we’ve been really happy with how it’s been going so far. This is a controversial subject, so we thought we’d start by explaining why.

We’ve heard many arguments against CoffeeScript. Before diving in, we were most concerned about these two:

That it adds extra bloat to iterative development, because each tweak requires recompilation. In our case, we avoided this problem entirely by instrumenting our server code: whenever someone reloads a Dropbox page running on their development server, it compares mtimes between .coffee files and their compiled .js equivalents, and anything needing an update gets compiled. Compilation is imperceptibly fast thanks to jashkenas and team. This means we didn’t need to change our workflow whatsoever, learn a new tool, or run any new background process (no coffee --watch). We just write CoffeeScript, reload the page, loop.

That debugging compiled JS is annoying. It’s not, and the main reason is that CoffeeScript is just JavaScript: it’s designed to be easy to debug, in part by leaving JavaScript semantics alone. We’ve heard many arguments for and against debuggability, and in the end, we convinced ourselves that it’s easy only after jumping in and trying it. We converted about 23,000 lines of JavaScript into CoffeeScript and debugged them in one week without many issues. We took time to test the change carefully, then slowly rolled it out to users. One week after Hackweek had ended, it was fully launched.
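The first concern above, recompiling on page reload, can be sketched in a few lines. This is a minimal Node.js sketch under stated assumptions, not Dropbox’s server code (which the post doesn’t show): each foo.coffee is assumed to compile to foo.js in the same directory, and `compile` is a hypothetical function the caller supplies to turn CoffeeScript source into JavaScript.

```javascript
// Minimal sketch of "recompile stale files on reload".
// Assumptions (hypothetical): foo.coffee compiles to foo.js in the same
// directory, and `compile` is a caller-supplied source-to-source function.
const fs = require("fs");
const path = require("path");

function jsPathFor(coffeePath) {
  return coffeePath.replace(/\.coffee$/, ".js");
}

// A .coffee file is stale if its .js output is missing or older than it.
function isStale(coffeePath) {
  const jsPath = jsPathFor(coffeePath);
  if (!fs.existsSync(jsPath)) return true;
  return fs.statSync(coffeePath).mtimeMs > fs.statSync(jsPath).mtimeMs;
}

// Run on each page reload: recompile only what changed, and report it.
function recompileStale(dir, compile) {
  const rebuilt = [];
  for (const name of fs.readdirSync(dir).sort()) {
    if (!name.endsWith(".coffee")) continue;
    const coffeePath = path.join(dir, name);
    if (isStale(coffeePath)) {
      const source = fs.readFileSync(coffeePath, "utf8");
      fs.writeFileSync(jsPathFor(coffeePath), compile(source));
      rebuilt.push(name);
    }
  }
  return rebuilt;
}
```

Because the check is a couple of stat calls per file, it adds almost nothing to a page reload when nothing has changed.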

Probably the most misleading argument we hear against CoffeeScript goes something like this: If you like Python or Ruby, go for CoffeeScript — it’s really just a matter of syntactic preference. This argument frustrates us, because it doesn’t consider history. Stick with us for a minute:

April 1995: Brendan Eich, a SICP enthusiast, joins Netscape with the promise of bringing Scheme to the browser.

He’s assigned to other projects during his first few months there. Java launches in the meantime and explodes in popularity.

Later in ’95: Scheme is off the table. Upper management tasks Eich with creating a language that is to Java as VBScript is to C++, meant for amateurs doing simple tasks, the idea being that self-respecting pros would be busy cranking out Java applets. In Eich’s words: “JS had to ‘look like Java’ only less so, be Java’s dumb kid brother or boy-hostage sidekick. Plus, I had to be done in ten days or something worse than JS would have happened.” Imagine Brendan Eich, channeling Bruce Campbell, as he battled sleep deprivation to get a prototype out in 10 days, all the while baking his favorite concepts from Scheme and Self into a language that, on the surface, looked completely unrelated. LiveScript is born. It launches with Netscape Navigator 2.0 in September ’95.

December ’95: For reasons that are probably marketing-related and definitely ill-conceived, Netscape changes the name from LiveScript to JavaScript in version 2.0B3.

August ’96: Microsoft launches IE 3.0, the first version to include JavaScript support. Microsoft calls their version “JScript” (presumably for legal reasons).

November ’96: ECMA (now Ecma International) begins standardization. Netscape and Microsoft argue over the name, and the result is an even worse name. Quoting Eich, ECMAScript “was always an unwanted trade name that sounds like a skin disease.”

Especially considering the strange, difficult and rushed circumstances of its origin, JavaScript did many things well: first-class functions and objects, prototypes, dynamic typing, object literal syntax, closures, and more. But is it any surprise that it got a bunch of things wrong too? Just considering syntax, things like: obscuring prototypal OOP behind confusingly classical syntax, the var keyword (forgot var? congrats, you’ve got a global!), automatic type coercion and == vs ===, automatic semicolon insertion woes, the arguments object (which acts like an array except when it doesn’t), and so on. Before any of these problems could be fixed, JavaScript was already built into competing browsers and solidified by an international standards committee. The really bad news is that because browsers evolve slowly, browser-interpreted languages evolve slowly. Introducing new iteration constructs, adding default arguments, slices, splats, multiline strings, and so on is really difficult. Such efforts take years, and require cooperation among large corporations and standards bodies.
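Several of the pitfalls listed above are easy to reproduce in plain JavaScript. The behavior shown is standard ECMAScript and runs as-is in Node.js; the function and property names are ours, chosen for illustration.

```javascript
// 1. Forgetting `var` in sloppy-mode code silently creates a global.
//    Code built with `new Function` is sloppy mode even inside strict files.
function leakGlobal() {
  new Function("leaked = 42;")();
  return globalThis.leaked;
}

// 2. `==` coerces types, and the coercion is not even transitive.
const looseEquality = {
  zeroEqualsEmpty: 0 == "",         // true
  zeroEqualsZeroString: 0 == "0",   // true
  emptyEqualsZeroString: "" == "0", // false
  strictZeroEmpty: 0 === "",        // false: === never coerces
};

// 3. Automatic semicolon insertion ends a bare `return` at the line break,
//    so the object literal below is never returned.
function asiReturn() {
  return
  { ok: true };
}

// 4. `arguments` is array-like but not an Array, so it lacks Array methods.
function argumentsInfo() {
  return {
    isArray: Array.isArray(arguments),
    hasMap: typeof arguments.map === "function",
    copied: Array.prototype.slice.call(arguments), // the usual workaround
  };
}
```

CoffeeScript sidesteps the first three at the language level: it declares variables with `var` for you, compiles `==` (and `is`) to `===`, and doesn’t use semicolons in its own syntax.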

Our point is this: forget CoffeeScript’s influences for a minute. It fixes so many of these syntactic problems, and breaks at least partially free of JavaScript’s slow evolution, that even if you don’t care for significant whitespace, there are plenty of other reasons to use it. Disclaimer: we love Python, and it’s Dropbox’s primary language, so we’re probably biased.

An interesting argument against CoffeeScript comes from Ryan Florence. It seemed plausible to us at first, but didn’t hold up after we thought more about it: (a) human beings process images and symbols faster than words, so (b) verbally readable code isn’t necessarily quicker to comprehend. Florence uses this to argue that (c) while CoffeeScript may be faster to read, JavaScript is probably faster to comprehend. We’d expect cognitive science to provide plenty of evidence in support of (a), including the excellent circle example cited by Florence. (b) is easily proven by counterexample. Making the leap to (c) is where we ended up disagreeing:

CoffeeScript introduces new symbols! For example, (a,b,c) -> ... instead of function (a,b,c) {...}. Along with being shorter to type, we think this extra notation makes code easier to comprehend, similar to how math is often better explained through notation instead of words.

Consider one example where CoffeeScript does in fact swap a symbol for a word: || vs or. Is || really analogous to the circle in Florence’s example, with or being the verbal description of that circle? This needs the attention of a cognitive scientist, but our hunch is || functions more linguistically than it does symbolically to most readers, acting as a stand-in for the word or. So in this case we expect something more like the reverse of the circle example: we think || and or are about equally readable, but would give slight benefit to CoffeeScript’s or, as it replaces a stand-in for or with or itself. Humans are good at mapping meanings to symbols, but there’s nothing particularly or-esque about ||, so we suspect it adds a small amount of extra work to comprehend.
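To make the two notations above concrete, here is roughly what CoffeeScript 1.x emits with the --bare flag (which omits the top-level function wrapper). The CoffeeScript source is in the comments and the function names are illustrative; exact compiler output may differ slightly in formatting.

```javascript
// CoffeeScript source:
//   sum3 = (a, b, c) -> a + b + c
//   pick = (a, b) -> a or b
//
// Approximate compiled JavaScript. Note that `->` becomes `function`, the
// last expression is returned implicitly, and `or` becomes `||`.
var sum3, pick;

sum3 = function(a, b, c) {
  return a + b + c;
};

pick = function(a, b) {
  return a || b;
};
```

The mapping is nearly one-to-one, which is also why stepping through the compiled output in a debugger is rarely confusing.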

We’ll let this comparison speak for itself. We consider it our strongest argument in favor of CoffeeScript.

Statistics

Metric          JavaScript   CoffeeScript
Lines of code   23437        18417
Tokens          75334        66058
Characters      865613       659930

In the process of converting, we shaved off more than 5000 lines of code, a 21% reduction. Granted, many of those lines looked like this:

});
});
}
}

Regardless, fewer lines is beneficial for simple reasons — being able to fit more code into a single editor screen, for example.

Measuring reduction in code complexity is of course much harder, but we think the stats above, especially token count, are a good first-order approximation. There’s much more to say on that subject.
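The reductions implied by the statistics table can be checked with simple arithmetic. This snippet only recomputes them from the published counts; it adds no new data.

```javascript
// Counts from the statistics table (JavaScript before, CoffeeScript after).
const stats = {
  lines: { js: 23437, coffee: 18417 },
  tokens: { js: 75334, coffee: 66058 },
  characters: { js: 865613, coffee: 659930 },
};

// (before - after) / before, as a percentage.
function reductionPercent({ js, coffee }) {
  return ((js - coffee) / js) * 100;
}

for (const [metric, { js, coffee }] of Object.entries(stats)) {
  const pct = reductionPercent({ js, coffee }).toFixed(1);
  console.log(`${metric}: ${js - coffee} fewer (${pct}% reduction)`);
}
// lines: 5020 fewer (21.4% reduction)
// tokens: 9276 fewer (12.3% reduction)
// characters: 205683 fewer (23.8% reduction)
```

Tokens shrank noticeably less than lines or characters: roughly 12%, versus about 21% and 24%.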

In production, we compile and concatenate all of our CoffeeScript source into a single JavaScript file, minify it, and serve it to browsers with gzip compression. The size of the compressed bundle didn’t change significantly pre- and post-coffee transformation, so our users shouldn’t notice anything different. The site performs and behaves as before.
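A size comparison like the one described above can be sketched with Node’s built-in zlib module. This is a minimal sketch, not Dropbox’s build pipeline: minification is deliberately left out (a real pipeline would run a minifier before compressing), and the function names are hypothetical.

```javascript
const zlib = require("zlib");

// Concatenate compiled files in a fixed order. The ";\n" separator keeps a
// file that lacks a trailing semicolon from running into the next one.
function bundle(files) {
  return files.join(";\n");
}

// Size in bytes of the gzip-compressed bundle, roughly what a browser downloads.
function gzipSize(source) {
  return zlib.gzipSync(Buffer.from(source, "utf8")).length;
}

// Compare the compressed size of two sets of compiled files.
function compareBundles(oldFiles, newFiles) {
  const oldSize = gzipSize(bundle(oldFiles));
  const newSize = gzipSize(bundle(newFiles));
  return { oldSize, newSize, changePercent: ((newSize - oldSize) / oldSize) * 100 };
}
```

Comparing compressed rather than raw sizes matters here: repetitive compiler output (such as many `function` and `return` keywords) compresses well, so a larger raw file need not mean a larger download.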

Methodology

Rewriting over 23,000 lines of code in one (hack)week was a big undertaking. To significantly hasten the process and avoid bugs, we used js2coffee, a JavaScript to CoffeeScript compiler, to do all of the repetitive conversion tasks for us (things like converting JS blocks to CS blocks, or JS functions to CS functions). We’d start converting a new JS file by first compiling it individually to CS, then manually editing each line as we saw fit, improving style along the way, and making it more idiomatic. One example: the compiler isn’t smart enough to convert a JS three-clause for into a CS for/in. Instead it outputs a CS while with i++ at the end. We switched each of those to simpler loops. Another example: using string interpolation instead of concatenation in places where it made sense.
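The loop and string cleanup described above looks roughly like this. The CoffeeScript is in comments; each function is a hand-written JavaScript equivalent of the corresponding version (not literal compiler output), and the example is made up.

```javascript
// js2coffee-style output for a three-clause for loop (CoffeeScript):
//   i = 0
//   while i < names.length
//     labels.push "File: " + names[i]
//     i++
function labelsWhileStyle(names) {
  const labels = [];
  let i = 0;
  while (i < names.length) {
    labels.push("File: " + names[i]);
    i++;
  }
  return labels;
}

// Hand-cleaned, idiomatic CoffeeScript:
//   for name in names
//     labels.push "File: #{name}"
// CoffeeScript's `for ... in` walks array *elements* (unlike JavaScript's
// `for...in`, which walks keys), so the JS equivalent uses `for...of`.
function labelsForInStyle(names) {
  const labels = [];
  for (const name of names) {
    labels.push(`File: ${name}`);
  }
  return labels;
}
```

Both produce the same output; the cleaned-up version just drops the manual index bookkeeping and the string concatenation.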

To make sure we didn’t break the site, we used a few different approaches to test:

We built a fuzz tester with Selenium. It takes a random walk across the website looking for exceptions. Give it enough time, and it theoretically should catch ’em all 😉

Tons of manual testing.
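A toy version of the random-walk idea shows the shape of the algorithm. It is not the Selenium tool described above: the site is modeled as a hypothetical map from page names to outgoing links, and `visit` stands in for loading a page and checking for JavaScript exceptions. A seeded PRNG (mulberry32) keeps runs reproducible.

```javascript
// Small deterministic PRNG (mulberry32) so a failing walk can be replayed.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = seed;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Walk `steps` pages starting at `start`, following a random link each time.
// `site` maps page -> array of linked pages; `visit(page)` throws on error.
function randomWalk(site, visit, start, steps, seed) {
  const rand = mulberry32(seed);
  const failures = new Map(); // page -> first error message seen there
  let page = start;
  for (let step = 0; step < steps; step++) {
    try {
      visit(page);
    } catch (err) {
      if (!failures.has(page)) failures.set(page, err.message);
      page = start; // restart from the home page after a failure
      continue;
    }
    const links = site[page] || [];
    page = links.length ? links[Math.floor(rand() * links.length)] : start;
  }
  return failures;
}
```

Longer walks visit more of the site, which is why, given enough time, this approach should eventually reach every page that can be reached by following links.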

Going Forward

Dropbox now writes all new browser-side code in CoffeeScript, and we’ve been loving it. We’ve already written several thousand new lines of coffee since launching in July. Some of the things we’re looking to improve in the future:

Browser support for CoffeeScript source maps, so we can link JavaScript exceptions directly to the source code, and debug CoffeeScript live.

Native CoffeeScript support in browsers, so that during development, we can avoid the compilation to JavaScript altogether.

Thanks

To Brendan Eich and Jeremy Ashkenas for creating two fantastic languages.

Just felt like pointing this out: LOC is widely recognized as a poor indication of anything about your code other than how many LOC it is…

Secondly, I can write code with very few tokens that is less readable than code with a sensible number of tokens. Calculating intermediate values and assigning them to named variables can make algorithms significantly easier to understand. To that point, token count is also a very poor indication of code complexity.

I’m trying to come round to CoffeeScript, but I have trouble when I reach real-world code and see terrible Frankenstein code. Examples are all well and good, but maybe somebody can show me how to write this (jQuery-based snippet) better:

$('#someDiv').html """foo"""

It seems that method chaining in CS, a very useful feature in many, many languages, requires you to write a mix of CS and JS syntax. I’d be very interested in knowing if there is a better way to do the above and, say, how I would write in CS the equivalent of $('#someDiv').show().html("foo"). This mashup of syntax really puts me off and is just about the ugliest thing I’ve ever seen.

SLOC says nothing about code quality or complexity but it says enough about how much time you’ll have to spend reading any given piece of code to go back and update/maintain it. Anyone who says SLOC is meaningless has a severe lack of appreciation for subtlety (read: is probably a very poor programmer).

That’s why I said it’s a “poor indication” as opposed to “meaningless”. If LOC is all you have to go on, you know nothing (other than LOC).

That being said, if two pieces of code _in the same language_ do the same thing in a different number of lines, the shorter one _may_ be better (bearing in mind it’s easy to reduce LOC by not doing intermediate assignments but that may make the code harder to understand).

The difficulty with comparing two different languages (coffeescript may compile to javascript, you can compile C to Assembly, you would never say they are the same language), is that languages are inherently different.

I can write a Python application and a C application that do identical things; the Python version will be many LOC shorter. Does that make the Python application “better”? Absolutely not.

Coffeescript introduces some syntax common to functional languages, particularly list comprehensions, which are inevitably going to significantly reduce LOC. Do they make the code more readable? *shrug* I can read a for loop fine and I really don’t feel it takes me less time to read a list comprehension.
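For readers weighing the comprehension-versus-loop point above, here is a CoffeeScript comprehension (in the comment) next to hand-written JavaScript that computes the same result, once as an explicit loop and once with filter/map. The example is illustrative only.

```javascript
// CoffeeScript comprehension:
//   evenSquares = (n * n for n in nums when n % 2 is 0)

// The explicit loop the comprehension stands in for.
function evenSquaresLoop(nums) {
  const result = [];
  for (let i = 0; i < nums.length; i++) {
    const n = nums[i];
    if (n % 2 === 0) result.push(n * n);
  }
  return result;
}

// The same computation with Array.prototype.filter and map.
function evenSquaresChained(nums) {
  return nums.filter((n) => n % 2 === 0).map((n) => n * n);
}
```

Whether the one-line form reads faster than the loop is a matter of familiarity; the three compute exactly the same thing.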

I don’t think my post particularly bashed coffeescript and I asked if anyone knew how to do method chaining, the thing that really annoys me in coffeescript. Thanks though for trying to disparage me (and posting anonymously to boot). Anyone who says shorter code is, by default, easier to update/maintain has a severe lack of appreciation for subtlety (read: is probably a very poor programmer).

I should also point out you contradicted yourself in your first sentence. “code quality” and “complexity” are the two most important factors in how long you’ll have to spend reading, updating and maintaining code. By your own hand, LOC doesn’t tell you anything about those things, therefore, LOC really doesn’t tell you anything about how long you’ll have to spend reading, updating and maintaining the code…

There are meant to be <p> tags around those “foo”s above, but they weren’t escaped, and editing appears to be broken (I just get a “something went wrong” message; I guess you missed that when you were testing your CoffeeScript move).

So……….. you moved to CoffeeScript to save on keystrokes? Because that’s what your argument is about. Not to save on disk space (actually disk consumption will probably expand). Not because CoffeeScript is easier to learn, because a CoffeeScript expert should know JavaScript too. Not because CoffeeScript is more powerful, because whatever CoffeeScript can achieve, can also be achieved in JavaScript, by definition. Not because CoffeeScript has better support, because clearly it doesn’t, either in terms of tools, resources and support. But to save on keystrokes. The fact you wrote this article in a defensive manner shows you doubted it was the right thing to do.
