How did Twitter handle election tweets? Less Ruby, more Java

As Twitter continues to tweak Ruby performance, it moves more of its server code to the JVM.

On Election Day, as Donald Trump raged on Twitter about revolution, Nate Silver took an opportunity to pitch his book, and an 18-year-old girl vowed to move to Australia "because their president is a Christian and actually supports what he says," Twitter's servers were handling a surge of 327,452 "tweets" per minute, according to Mazen Rawashdeh, Twitter's VP of Infrastructure Operations Engineering. In total, there were 31 million "election-related" posts to Twitter over the course of the day, and the traffic continued to spike periodically, at one point reaching 15,107 tweets per second.
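
For scale, the quoted figures can be converted to per-second rates. This assumes "over the course of the day" means a full 24 hours, which the post does not state:

```latex
\frac{327{,}452\ \text{tweets/min}}{60\ \text{s/min}} \approx 5{,}458\ \text{tweets/s},
\qquad
\frac{15{,}107\ \text{tweets/s}}{5{,}458\ \text{tweets/s}} \approx 2.8,
\qquad
\frac{31 \times 10^{6}\ \text{tweets}}{86{,}400\ \text{s}} \approx 359\ \text{tweets/s}
```

In other words, the one-second peak ran at nearly three times the average rate of the busiest minute.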

Part of the reason Twitter was able to sustain the unprecedented continuous spikes in traffic, Rawashdeh said in Twitter's engineering blog, was a set of changes the company's operations team has been making to Twitter's infrastructure over the past year. These include a move away from the Ruby web scripting language and runtime engine that Twitter was built on, and toward code (a combination of Ruby, Java, and Scala) running on a server-side Java Virtual Machine, while the team continues to tweak Ruby's runtime for more performance.

Twitter has largely run on a modified version of Ruby, called Ruby Enterprise Edition, since 2009. But the Ruby interpreter has put heavy loads on the processors of Twitter's servers, partly because of its "garbage collection," the task of finding and reclaiming memory held by objects the program no longer references. Twitter's developers have continued to optimize Ruby's garbage collection to squeeze more performance out of the runtime, developing their own garbage collector called Kiji. But the company has also begun to move its development efforts away from Ruby and onto Java.
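
On the JVM, this kind of collector overhead is easy to observe. The sketch below (the `GcSnapshot` class and its methods are hypothetical names) uses the standard `java.lang.management` API to read each collector's collection count and accumulated collection time before and after a burst of short-lived allocations, roughly the kind of garbage a busy web service produces. It only illustrates measuring GC cost on a JVM; it says nothing about how Kiji or Ruby EE work internally.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Reads JVM garbage-collector statistics through the standard management API.
final class GcSnapshot {
    // Total collections across all collectors; a count of -1 means "not available" and is skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds, same -1 convention.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) total += millis;
        }
        return total;
    }

    // Allocates many short-lived objects; returns a checksum so the work is not dead code.
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            List<String> scratch = new ArrayList<>();
            scratch.add("tweet-" + i);
            checksum += scratch.get(0).length();
        }
        return checksum;
    }

    public static void main(String[] args) {
        long collectionsBefore = totalCollections();
        long millisBefore = totalCollectionMillis();
        long checksum = churn(5_000_000);
        System.out.println("checksum: " + checksum);
        // May be 0: the JIT can sometimes eliminate allocations that never escape a method.
        System.out.println("collections during churn: " + (totalCollections() - collectionsBefore));
        System.out.println("GC time during churn (ms): " + (totalCollectionMillis() - millisBefore));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
    }
}
```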

In school I actually did a project on the future of server-side processing, and Twitter was one of my main references. To be specific, they use Scala on the back end, which you can think of as a sort of Java extension, since it compiles to JVM bytecode, can use Java's libraries directly, and builds on that existing functionality. All in all it's a pretty cool language to look at and really good for back-end processing, which is why we have some big companies like Twitter moving away from RoR.

Oh dear, that's even worse than "I'm moving to Canada!" since the Canadian PM belongs to a party called the "Conservative Party" (fair mistake, I guess), but the Conservatives are still further left than anyone in the US government.

Because the JVM has had years and years of work put into it by very bright people, and as a result it is in the top tier of VM performance. The CLR is also good, but the JVM runs on Linux and has extensive supporting infrastructure, plus metric assloads of additional libraries if you need them.

To be clear, Twitter is using the JVM, not Java the language.

edit:

On the server side, the JVM will be with us for a long, long time, and a number of up-and-coming languages like Scala (or Clojure), plus projects like JRuby, are leveraging it. Now, Java the language: that's going to be with us for a long, long time as well, but increasingly for existing codebases.

Never thought a company as cool and hip as Twitter would move towards something perceived as stale and uncool like Java. Props to them for seeing through the rhetoric. Personally, I enjoy working with the latest JEE stack; old apps built on it, not so much.

JEE has indeed greatly improved, and Java the language is moving forward (slowly, but still) as well. That said, Twitter isn't using Java.

It's telling that they wanted the technical sophistication of the JVM but chose Scala instead of Java as the language to develop on it with.

And then you pick the country led by an atheist woman. Nice!

The unmarried atheist who is living with her boyfriend, at that.

Unmarried? Living in sin!? AUSTRALIANS HOW COULD YOU LOOK PAST SOMEONE'S RELIGION AND DECIDE YOUR LEADER BASED ON THEIR MERITS!? HOWWWWW!?

Hopefully she left Twitter for good. Probably some no-good atheists working on that there backend.

EDIT: The link above is from 2009 and mentions a peak of 9,000 tweets per minute. This article (2012) reports 327,452 tweets per minute.
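
For a rough sense of the growth, taking both figures as quoted (the 2009 number is a peak; the 2012 number is the election-night surge):

```latex
\frac{327{,}452\ \text{tweets/min}}{9{,}000\ \text{tweets/min}} \approx 36.4
```

That is roughly a 36-fold increase in peak per-minute volume over about three years.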

Yes, they're using the Java VM to run Scala. I'll clarify that above. The important part is that the JVM, instead of Ruby EE, is the core of much of what they're doing. It sounds like the main issue with migration has been that it requires rewriting a lot of legacy components, and their developers have been focused on other areas.

At one point Ruby people LOVED Twitter; now they all hate them, in part because they are crappy programmers who blame Ruby for many of their issues. People who have worked at Twitter have blamed the codebase instead. Just like with Python/Java/Scala/InsertLanguageSlowerThanASMHERE, if you develop a part that needs to go faster, there's a way to write it in C (and in effect you can also write it in raw ASM) to make it go faster. Instead they blame it all on Ruby. Meh.

They blamed the garbage collector and interpreter of Ruby. It's rather hard to work around those things.

Additionally, I don't understand calling a successful company's programmers 'crappy' because they move from a solution that doesn't work to a solution that does.

Interesting. Honestly, though, it is hard to imagine that there's a significant difference in programmer productivity or software quality between Java and Scala. Not that Scala might not be a better language, but unless you're comparing ancient or particularly clunky and generally non-commercial languages, there's not going to be enough difference to matter. Maybe it is a matter of Scala being closer to Ruby in some ways, I don't know. Certainly with our massive Java code base here I'd consider it HIGHLY unlikely the effort to switch languages would ever pay off (given that they're both going to perform effectively the same on the same JVM). Language just isn't that big a factor.

Coding performance-critical code in another language is an ugly hack. C++ is extremely ugly, and C isn't really portable. C had no standard threading support before C11, for example, and even C11's threads are optional. That's a language you want to write your high-performance code in? Sure, it runs fast, but you're effectively tying your codebase down to one OS and one ISA.

CLR-languages also run on Linux, you just use the Microsoft-endorsed Mono instead of the official Windows implementation. If that sounds like it's not ideal then I agree with you, but it works.

Languages that compile to the JVM seem to be the best alternative right now. The JVM runs at roughly half the speed of C++ (Python is about 25 times slower, for comparison) with portability better than anything. That's hard to beat.
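
To illustrate the portability point: threads and thread pools ship with the Java standard library, so concurrent code like the following runs unchanged wherever a JVM does. This is a minimal sketch (the `ParallelSum` class is a hypothetical name) that splits a sum across a fixed-size pool using `ExecutorService`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sums the integers in [0, n) by splitting the range across a fixed thread pool.
final class ParallelSum {
    static long sumRange(long n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long chunk = (n + threads - 1) / threads; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final long start = t * chunk;
                final long end = Math.min(n, start + chunk);
                // Each task sums its own sub-range; the range is empty if start >= end.
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (long i = start; i < end; i++) sum += i;
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get(); // blocks until each task finishes
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(sumRange(1_000_000, cores)); // prints 499999500000
    }
}
```

The pool size comes from `availableProcessors()`, so the same binary adapts to whatever machine it lands on.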

You don't have to switch, as I understand it. You can keep your compiled Java code and just have your new Scala code reference it. At least I think I've read somewhere that compiled Scala code is 100% compatible with Java code.

What we want isn't hipster programmers but running code. As a stakeholder I couldn't care less whether the language that was used is 'cool' or not. In fact, chances are the people who fancy themselves the cutting edge are people I'll regret having on staff later, as they rapidly start focusing on problems of their own creation, then get bored and move on to some other place where they can blue-sky instead of generating billable hours and good metrics.

Of course that's not to say that Scala is bad or anything. However you do have to look at the quality of tooling and other surrounding supporting infrastructure, not just the language and libraries. The difference in productivity between any 2 relatively modern languages which are both suitable to a task is going to be at most a couple %, whereas having a rock solid debugger, profilers, code analyzers, and good design tooling will make a much larger difference, maybe as much as 10%.

Right, I would assume that would be true. My feeling at that point is that if I found a task that particularly benefited from being implemented in Scala (or Clojure or whatever), then I'd write that code in that language. OTOH, for basic garden-variety stuff Java probably suits you as well. That might change over time, of course, and if you're doing a greenfield design you might not want to bother with any legacy language at all if your shiny new one is nimble enough (especially if you can just call Java libraries without any big hassle).

In enterprise space though you often find that there are entire frameworks of things in say Java that you really would want to be able to use (DB abstractions, service layers, etc). A lot of it involves tools and annotations and whatnot, so it does tend to be true that a lot of your more complex LOB stuff is nice to do in Java.

Performance is a hard thing to measure. Perl or Python, for instance, are a good bit slower than C as a general rule, yet if you want to blast through a tight loop doing a bunch of regex processing, you're going to be hard pressed to match the performance of the Perl interpreter for that one specific task. Java is a funny beast in other ways, because modern JVMs are frighteningly good at dynamic optimization these days, something that is simply not possible with straight compiled languages. While Java is probably half as fast on simple synthetic benchmarks, a large network service under high load running in a JVM can kick the ass of something written in C++ after it's been up for a while.
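
The warm-up effect described above can be glimpsed with a crude timing loop. This is a sketch, not a rigorous benchmark (a harness such as JMH would be needed for real measurements), and the `WarmUp` class is a hypothetical name: it runs the same CPU-bound method several times and prints each round's duration. On HotSpot the early rounds often run interpreted or only partly compiled, so later rounds are frequently faster, but the actual numbers depend entirely on the machine and JVM.

```java
// Times repeated runs of one CPU-bound method to expose JIT warm-up.
final class WarmUp {
    // Counts primes below limit by trial division (deliberately simple, CPU-bound).
    static int countPrimes(int limit) {
        int count = 0;
        for (int n = 2; n < limit; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 5; round++) {
            long t0 = System.nanoTime();
            int primes = countPrimes(200_000);
            long micros = (System.nanoTime() - t0) / 1_000;
            // Timings vary by machine and JVM; no particular speedup is guaranteed.
            System.out.println("round " + round + ": " + primes + " primes in " + micros + " us");
        }
    }
}
```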

I think we agree, though: the JVM is a hard thing to beat for a number of reasons. It is VERY scalable; you can run JBoss or OAS in 128 gigs of RAM on a 32-way system and it just works. It is way faster than any of your dynamic interpreters (the P languages), and even if you CAN eke out a bit more performance in theory from a C++ application, you'll spend 10x the money developing and debugging it. Best to just throw more hardware at the problem if it is really that big an issue.

Well, of course stakeholders are focused on results, but it's widely understood that new cutting edge software is often built by the best who use new cutting edge tools.

I believe you are understating the benefits of the Scala ecosystem. This isn't merely a slightly nicer syntax for writing the same old for loops and if checks.

People still write great stuff in C/C++, but most programmers would say that Java/C# enabled them to work at a higher level of abstraction and lent itself to a very different style of development.

Scala brings a similar change. Compare Typesafe's (the company behind Scala) Akka framework to a typical message-queueing system, or compare Typesafe's Slick framework to something like Hibernate or LINQ. I think Typesafe is making much better infrastructure.
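
For readers unfamiliar with Akka: it is built around actors, objects that keep private state and process messages from a mailbox one at a time, so no locks are needed around that state. The following is a hand-rolled sketch of that idea in plain Java using a `BlockingQueue`. It is not Akka's API, and the `CounterActor`, `Add`, `Get`, and `Stop` names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// A minimal actor-style counter: private state, a FIFO mailbox, and a single
// worker thread that is the only code allowed to touch that state.
final class CounterActor {
    // Message types (hypothetical, for this sketch only).
    static final class Add {
        final long amount;
        Add(long amount) { this.amount = amount; }
    }
    static final class Get {
        final CompletableFuture<Long> reply = new CompletableFuture<>();
    }
    static final class Stop {}

    private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
    private long total = 0; // read and written only by the worker thread
    private final Thread worker = new Thread(this::run);

    private CounterActor() {}

    static CounterActor start() {
        CounterActor actor = new CounterActor();
        actor.worker.setDaemon(true); // don't keep the JVM alive if Stop is never sent
        actor.worker.start();
        return actor;
    }

    // Fire-and-forget send: enqueue and return immediately.
    void tell(Object message) {
        mailbox.add(message);
    }

    // Request/reply: send a Get and block until the worker answers.
    long ask() {
        Get get = new Get();
        tell(get);
        return get.reply.join();
    }

    // Worker loop: handle one message at a time, in arrival order.
    private void run() {
        try {
            while (true) {
                Object message = mailbox.take();
                if (message instanceof Add) {
                    total += ((Add) message).amount;
                } else if (message instanceof Get) {
                    ((Get) message).reply.complete(total);
                } else if (message instanceof Stop) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CounterActor counter = CounterActor.start();
        Thread[] senders = new Thread[4];
        for (int i = 0; i < senders.length; i++) {
            senders[i] = new Thread(() -> {
                for (int j = 0; j < 1_000; j++) counter.tell(new Add(1));
            });
            senders[i].start();
        }
        for (Thread sender : senders) sender.join();
        // Every Add was enqueued before this Get, so the reply reflects all of them.
        System.out.println("total = " + counter.ask());
        counter.tell(new Stop());
    }
}
```

Four threads send messages concurrently, yet `total` needs no synchronization, because only the worker thread ever touches it.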

Java is a funny beast in other ways because modern JVMs are frighteningly good at dynamic optimization these days, something that is simply not possible with straight compiled languages.

That's at best half true. Modern compilers and linkers can now do profile-guided optimisation, which boils down to instrumenting the binary, running the compiled code, and optimising based on the results.

It's hideously expensive to do, so much so that Mozilla stopped using it for Firefox because they hit the 32-bit memory limit. Still, it is only done once, not every time you run the program.

Did any of those Ruby EE changes make it back into the Ruby VM or Ruby 2.0? I get the feeling that they stopped improving the Ruby VM long ago.

And they could have helped work on JRuby, which is Ruby running on the JVM. The performance benefits here have little to do with Ruby or Java or Scala at all, but with the JVM. So why didn't they use JRuby instead?

MacRuby and RubyMotion both use LLVM and have the option to compile Ruby. If the VM was always the performance problem, why not run it compiled?

Yeah, I have mixed feelings about the whole abstraction process though. Clearly we'll never progress MUCH in terms of making better software and making it more easily unless we manage to create some level of abstraction. We did it with ASM and then FORTRAN/C/etc and in some degree we've played with 4GLs, relational algebras, and other abstractions, but we always hit a wall with that sort of abstraction at a certain point. It is just too hard to manage the boundary between abstraction and efficiency. What we need I think is PROCESS abstraction instead of language abstraction. That is we need a better way to use our language instead of a more terse language. We need a play instead of a book. Scala (or Eiffel, or whatever) may BE better, make improvements, but I think we're not going to advance much as long as humans are typing on keyboards.

Makes sense. I guess the question is whether it is more work than it is worth. I certainly don't have the resources to do that level of profiling for every weekly build of code. It may well be that things like that will obviate VMs. In a sense VMs are always a hack. Probably not within the lifecycle of my current product!

The whole Twitter story is just about using the right(er) tools for the right job. Ruby, Scala, Erlang, C/C#/C++, ASM, etc. were all created with different goals in mind, and expecting any of them to be perfect for everything is foolish.

One of my clients went from 3 pimped-out servers running at 100% to one running at 10% capacity by moving the core of their application (a game server) from Java to Erlang a few years back. They did the research and figured any Java/JVM-based optimizations would still net them far less than a switch to a more fitting architecture would. The front end is still Rails; the only change there was a move from MySQL to Postgres.

Does that mean Java sucks? Yeah, for the tasks they used it for it did—after a certain amount of active, parallel users was reached. Does that mean that Rails is great? Sure, for the task they use it for, it is—most of the stuff is generated once and then cached anyway.

It would be much nicer if such articles focused more on the specific challenges a company is facing and exactly how a move helped them. That would be much more helpful than "ZOMG after they ditched Ruby, everything went peachy and every developer now gets three blowjobs a day from happy customers!!1one"… Twitter, after all, has pretty specific challenges. And Facebook is running on PHP of all things; is PHP better than Java/the JVM, then?

The head of state of Australia is Her Majesty Queen Elizabeth, who is a Christian and is Supreme Governor of the Church of England, though she delegates many of her responsibilities over Australia to the Governor-General.

British monarchs have been less authoritarian since Charles I came to a nasty end (after a civil war that ravaged the country for years).

I have mixed feelings about the whole abstraction process though. Clearly we'll never progress MUCH in terms of making better software and making it more easily unless we manage to create some level of abstraction. We did it with ASM and then FORTRAN/C/etc and in some degree we've played with 4GLs, relational algebras, and other abstractions, but we always hit a wall with that sort of abstraction at a certain point.

We seem to have trouble learning how to use abstraction effectively. The language features that support abstraction will, if not used carefully, create complication, inefficiency, obfuscation and confusion instead, and I have seen many well-intentioned efforts fall into that trap to some degree.

I think the problem is more fundamental. A highly abstract language simply can't express all the possible choices that people want. I mean you simply HAVE to get down to the nitty gritty if you're going to specify exactly where that button goes and what other buttons get enabled when you click it, etc. We've tried many ways to get rid of the 'noise' part of all this kind of thing, and that can work, but there's just a level of language abstraction that creates more problems than it solves, or you end up with a toy language.