
machaut writes "Twitter, one of the highest-profile Ruby on Rails-backed websites on the Internet, has in the past year started replacing some of their Ruby infrastructure with an emerging language called Scala, developed by Martin Odersky at Switzerland's École Polytechnique Fédérale de Lausanne. Although they still prefer Ruby on Rails for user-facing web applications, Twitter's developers have started replacing Ruby daemon servers with Scala alternatives, and plan eventually to serve API requests, which comprise the majority of their traffic, with Scala instead of Ruby. This week several articles have appeared that discuss this shift at Twitter. A technical interview with three Twitter developers was published on Artima. One of those developers, Alex Payne, Twitter's API lead, gave a talk on this subject at the Web 2.0 Expo this week, which was covered by Technology Review and The Register."

Kidding aside, is this a 'nail' in the coffin of scalable Ruby? Five years ago people were saying the same thing about PHP scaling, but Facebook has done a rather nice job of making it scale. Twitter was supposed to be the poster child for how awesome Ruby and RoR were.

If I want to use any Java software, then I'll use Scala. I see people bashing Scala, saying the languages they know are good enough or that they can just use Jython/JRuby/Groovy, but they clearly know little about Scala.

One thing that's nice about Scala that Java, Jython, JRuby, and Groovy all lack is its powerful type system and pattern matching. Once you get used to good pattern matching like in Scala, SML, OCaml, or Haskell, you won't want to go back. Plus you get all the benefits of running on the JVM at high speed (unlike all the aforementioned JVM languages, except Java itself).
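To make that concrete, here's a minimal sketch of Scala-style pattern matching (the type and method names are my own, purely illustrative):

```scala
// A sealed trait means the compiler knows every subtype and can warn
// about non-exhaustive matches -- one of the type-system perks above.
sealed trait Shape
case class Circle(r: Double) extends Shape
case class Rect(w: Double, h: Double) extends Shape

object PatternDemo {
  // Deconstruct the value and bind its fields in one step.
  def area(s: Shape): Double = s match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  def main(args: Array[String]): Unit =
    println(area(Rect(3, 4))) // prints 12.0
}
```

None of the other JVM languages named above give you compiler-checked exhaustiveness like this.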

Honestly, you should check out Scala before you bash it. It's a very good choice wherever you might choose Java, which is a good choice for the back end. Twitter's developers are smart and experienced. They didn't choose Scala just to be cool. It is a powerful tool that can get the job done in an elegant way.

I find your assertion of differentness being the main reason to use Ruby on Rails to be somewhat offensive let alone uninformed, as it suggests that the multitudes of developers using it are doing so not because of technical merits but because they're buying in to some image of differentness. A cursory examination of the typical Rails project and developer should indicate otherwise. Because you don't find it helpful in your work doesn't mean others don't find it helpful or see real benefits from using the system. Perhaps you should be less dismissive -- you'd find yourself with a lot more interesting stuff to discover!

Scala may be unknown, unused, and under development, but Ruby is over 12 years old and the community that uses it is huge. So it is neither unknown nor unused. (It is, however, under development, as almost all modern languages are.)

Anyone who thinks Ruby [hulu.com] on [amazon.com] Rails [zvents.com] can't [scribd.com] scale [yellowpages.com] is as dogmatic in their anti-hype as the original hypers were. The right tool for the right job and all that.

Maybe they use Scala because writing Java code is painful by comparison. Tons of boilerplate, every checked exception has to be caught or declared in every scope, no pattern matching, no named arguments, and on and on. For people like me, without Scala the JVM wouldn't even be under consideration, though I admit that Java has been more usable since it got generics.
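For a feel of the boilerplate gap, look at what a single Scala case class buys you; the equivalent Java class would need a constructor, getters, equals, hashCode, and toString written by hand (names here are illustrative):

```scala
// One line replaces roughly forty lines of idiomatic Java.
case class User(name: String, age: Int)

object BoilerplateDemo {
  def main(args: Array[String]): Unit = {
    val a = User("alice", 30)
    val b = a.copy(age = 31)        // non-destructive update for free
    println(a == User("alice", 30)) // value equality for free: true
    println(b)                      // readable toString for free: User(alice,31)
  }
}
```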

When the Twitter folks wrote their own message queue, there were very limited options on the market. Seeing as Obie Fernandez has failed to even begin to explain, in technical terms (rather than saying "... made me throw up in my mouth"), what's wrong with their implementation, forgive me if I don't consider this damning evidence.

Moreover, if you're going to reference Basecamp, Campfire, Lighthouse, et al, perhaps you should also reference the ridiculous effort and resources that they expend in scaling Rails?

Rabid fanboyism does Rails and Ruby a disservice. I wouldn't touch that community with a 10 foot pole.

The problem is that most of those compiler/interpreters suck enormously.

Exactly. MRI (Matz' Ruby Interpreter) is known to have some serious scalability issues. Interestingly, one of the main issues with MRI comes from the way gcc compiles the big delegator switch in MRI's core, with a large sparse stack that causes ridiculous memory consumption (and sometimes even leaks). There's a set of 8 patches (the MBARI patchset) that drastically improve the situation. The reduced memory footprint and the much smaller stack also give a noticeable speed increase.

The good news is, these patches are progressively being merged upstream, so it's very likely that future MRI versions will be much better.

This change was driven by the company's need to reliably scale its operation to meet fast-growing tweet rates, which had already reached 5,000 per minute during the Obama inauguration.

In what parallel universe is it difficult to build a message queue capable of handling 83 messages per second? I built a fault-tolerant group message passing system 10 years ago that handled 30,000 messages per second on a dinky machine. Hell, Oracle's built-in message queue system can handle more than 83 messages per second with ACID!
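For perspective: 5,000 tweets per minute is about 83 per second, and even the JDK's stock blocking queue moves messages orders of magnitude faster than that on commodity hardware. A hedged sketch (queue size and names are arbitrary):

```scala
import java.util.concurrent.ArrayBlockingQueue

object QueueDemo {
  // Push n messages through a bounded in-memory queue from a producer
  // thread to the consumer (this thread) and count what arrives.
  def pass(n: Int): Int = {
    val q = new ArrayBlockingQueue[String](1024)
    val producer = new Thread(() => (1 to n).foreach(i => q.put(s"msg-$i")))
    producer.start()
    var received = 0
    while (received < n) { q.take(); received += 1 }
    producer.join()
    received
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    val t0 = System.nanoTime()
    pass(n)
    val secs = (System.nanoTime() - t0) / 1e9
    // Typically finishes in well under a second, i.e. far above 83 msgs/sec.
    println(f"passed $n%d messages in $secs%.2f s")
  }
}
```

Of course this sketch is neither fault-tolerant nor persistent, which is where the real work in a production queue goes.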

I will never, ever, ever understand the engineering choices of the Twitter team.

Traversing giant class hierarchies on every method dispatch is always going to be slow. Even caching is slow, since you will just end up with a copy of the class hierarchy. There's no way around it. This is why C++ and Haskell use "templating" and other similar Turing-complete typing constructs to deal with the issue at compile time.

When you start using abstractions 20 layers deep, and have to climb up a tree for every expression (since everything is an object), you will face major difficulties.

Still, Ruby is a fine language, and 1.9 does offer improvements that bring it up to par with Python, for example.

So, a Ruby application can't scale well vertically -- one can't just upgrade the machine with more CPUs for example.

At the same time, no language inherently prohibits horizontal scaling -- adding more machines on which the application can run in parallel -- provided the application is designed for it.

Twitter could have been designed to permit horizontal scaling. Regrettably, the article didn't say much about this approach. They are improving the vertical scalability of the application by switching to first-class threads (via the JVM), but aren't they eventually going to hit the limits of vertical scaling?!
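The vertical-scaling gain from first-class JVM threads is easy to sketch: real OS threads can occupy multiple CPUs at once, which MRI's green threads (behind a global lock) could not. Illustrative only:

```scala
object ThreadsDemo {
  // Sum 1..n split across k real JVM threads; each thread takes every
  // k-th number. On a multi-core box these run genuinely in parallel.
  def parallelSum(n: Int, k: Int): Long = {
    val partial = new Array[Long](k)
    val threads = (0 until k).map { t =>
      new Thread(() => {
        var s = 0L
        var i = t + 1
        while (i <= n) { s += i; i += k }
        partial(t) = s
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    partial.sum
  }

  def main(args: Array[String]): Unit =
    println(parallelSum(1000000, 4)) // prints 500000500000
}
```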

pageviews do not suddenly get easier to service because that page has a video on it.

No, they get considerably harder. Hulu, if I remember correctly, dynamically alters the bitrate to compensate for slow connections and improve quality on faster ones. Either way, serving the video puts a load on the server regardless of the bitrate.

> If your threading model is nasty or your memory access features too coarse grained or a number of other things get designed wrong at the language level and are by specification broken then you have problems

In Ruby's case, you overcome this by using processes rather than threads.

> On top of that if you don't have competition in your compiler space then it doesn't matter if it's not part of the language spec; any non-scalable parts of the compiler implementation are by default a problem with your language.

At the level we're talking about - 8M users - I think that compilation optimizations/bottlenecks are not the issue. It's about architecture. It doesn't matter if you're writing assembly language, you'll still need to scale out of one machine's memory space. I'm thinking of stuff like Werner Vogels' discussions of CAP [infoq.com].

> A language is more than just its syntax; it's also the available implementation and tools

Yes, and I'd say that there's also the infrastructure surrounding the language - protocols, operating systems, caching tools, etc. Ruby is well suited to integrating with those sorts of things, which is why there are myriad techniques available for scaling Ruby apps.

All that said, I agree with your proposition - that some languages are well suited to some tasks. As some folks here have suggested, Twitter's messaging bus sounds like a perfect Erlang app.

Huh? Are you confused? Both companies made the stupid mistake of using a web scripting language to do backend heavy lifting.
Twitter is fixing that with Scala (leaving RoR on the frontend because it's really good at web frontends).
Facebook already fixed that with Erlang, not PHP (leaving PHP on the frontend because of the technical debt they've accumulated).
http://www.facebook.com/eblog [facebook.com]
BTW, PHP is the worst language ever. That is a fact, not my opinion.

If someone threw the necessary money at Ruby, it could easily match the performance of a modern Smalltalk implementation, as Ruby is essentially Smalltalk in disguise.

You might be interested in StrongTalk [strongtalk.org]. It's a Smalltalk-80 rewrite with optional strong typing and pervasive JIT, meaning that it's incredibly fast and robust. It is, unfortunately, Windows only at the moment, but it's all BSD licensed --- VM, image, source browser and all --- and they're looking for people to help with a Linux port.

That's fine for the vast majority of web applications, but you clearly don't understand the scale of the traffic a site like Twitter receives. What do you do when one database machine, no matter how fast, isn't enough? When your load balancer gets overloaded? How about handling that massive search index?

Even a site as simple as Twitter will present you with problems you never expected once it gets that popular. Starting off with something like you suggested is exactly what got them into the mess they're in now. It must be designed for parallelism and scalability or it will fall over.

That said, if it's properly designed, you can probably make it work in any language, although you can dramatically reduce the number of production machines it takes if you have an efficient compiler/interpreter.

Twitter is not a trivial application to scale, considering the wide disparity in listener-to-follower ratios, that views are dynamically generated by interpolating many-to-many message streams, and that each message persists forever.

As an analogy, it's like managing an IRC server with persistent, full-text-indexed messages, one channel per user, and an unlimited number of users able to join each other's channels. When you join a new user's channel, your chat log is automatically (and quickly) re-woven with messages from that channel according to the relative time series of those messages. And there's a global channel that everyone can watch to see what any user in any channel is saying at any time.

Now do this while avoiding netsplits (i.e. missing messages), allowing retraction of almost any message, recent or historical, and ensuring the channel history (eventually) reflects that change. And handle sudden bursts of activity among unpredictable sets of channels because they're all attending the same conference, or a burst of network-wide high activity because people are watching the World Cup or Obama's inauguration.

The point is that, while the idea is simple, the variability of use and disparity of activity are what make life interesting; the messaging & DB architecture that works well for recent activity, for example, doesn't help with providing reasonable persistent random access to historical messages.

In all, Twitter has gotten a *lot* more reliable over the past several months than it was a year ago.

I totally agree wrt Scala, but that's going too far. I went to a tech talk at Twitter where they were supposed to discuss their experience with Scala. Instead, they changed the subject to ask us whether we liked to format our code with 2 spaces, 4 spaces, or 8 spaces. Oh my god! When others commented that the Scala coding standard already answered that, the developers said it wasn't their style and that they thought they should write their own style guide. It wasn't that Scala's style is bad, just insanely idiotic NIH.

After many requests, they refused to actually talk about what we had come to hear. Instead we got a really crappy presentation on other nonsense and were shown alpha-quality code for a memcached client (since Java's wasn't cool enough). When asked whether they used Spring or a DI framework, Alex stated that he didn't understand what DI meant (it had to be explained) and that he saw no value in it. His coworker stated that their code was so pretty that DI wasn't needed, and was only necessary due to Java's verbosity. Total freakin' morons.

Alex announced his book and admitted that the name was meant to confuse readers and redirect purchases away from Martin's awesome text. So even worse than the garbage talk, he was frank about his lack of integrity.

So no way in hell would I call those Twitter guys good. Perhaps before that "tech talk", but having met them.. no.

As much as I like coding PHP, I completely agree with you. The fact that $arr[0] is the same as $arr['0'] is just insane, and that each function has a different argument order for needle, haystack, replace, etc. is just unbearable... so much so that I already have wrapper functions to make them all consistent...

But unless I find years of time and unlimited money, there is no way I can ditch all my PHP code and go to Ruby or Python or anything else...

Scala also keeps Java's strong static typing and adds functional language features. I don't think it needs any development at all to be adapted for mainstream use.

Scala is a great thing, now what it needs is equally great tooling (i.e. IDE support, including refactoring, on the same level as we have for Java). And it's getting there - there is an Eclipse [scala-lang.org] plugin, and a NetBeans [netbeans.org] one - but it definitely needs more work and polish.

The payoff would be huge, though. Right now, as far as languages go, C# is far more advanced than Java. But Scala is equally more advanced than C# (the only thing on .NET that could compare with Scala is F#, and that's less stable and mature currently... but it will stabilize once VS2010 is released, so there isn't much time). Some significant investment into Scala by one of the big players could help straighten things out a bit (it's good when there are two major competing languages in the same niche, because it forces them to evolve at a more rapid pace, steal each other's features, and learn from each other's mistakes - as Java and C# did for several years, until Java began to stagnate).

About the only thing I don't like about Scala is the generic type erasure [lamp.epfl.ch] that it inherited from Java to maintain class compatibility; but there are workarounds already, and if Scala becomes The Next Language for the Java platform, it may well deprecate erasure and introduce its own reified generics then.
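To see what erasure costs (and the usual workaround), here's a small sketch; `ClassTag` is the modern spelling of what was `Manifest` around the time of this thread:

```scala
import scala.reflect.ClassTag

object ErasureDemo {
  // Erasure: List[Int] and List[String] are the same class at runtime,
  // so you can't overload on them or create an Array[T] from a bare T.
  // A ClassTag smuggles the element type through to runtime.
  def newArray[T: ClassTag](n: Int): Array[T] = new Array[T](n)

  def main(args: Array[String]): Unit = {
    println(List(1).getClass == List("a").getClass)  // true: types erased
    println(newArray[Int](3).getClass.getSimpleName) // int[]: tag kept the type
  }
}
```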

Now, if only someone would pick it up and throw money (and marketing resources) at it. Like, say, Google...

You know, technically there are more recent [bcs.org] Fortran specs. Fortran gets hauled out in the form of FORTRAN77 every time someone wants to talk trash about an old language. However, just like C or C++ or Java, the language has evolved. Fortran 2003 has objects, a pretty cool module system, and basic thread support.

So what you're saying is that language features do affect scalability? Well, that was precisely my point.