
machaut writes "Twitter, one of the highest profile Ruby on Rails-backed websites on the Internet, has in the past year started replacing some of their Ruby infrastructure with an emerging language called Scala, developed by Martin Odersky at Switzerland's École Polytechnique Fédérale de Lausanne. Although they still prefer Ruby on Rails for user-facing web applications, Twitter's developers have started replacing Ruby daemon servers with Scala alternatives, and plan eventually to serve API requests, which comprise the majority of their traffic, with Scala instead of Ruby. This week several articles have appeared that discuss this shift at Twitter. A technical interview with three Twitter developers was published on Artima. One of those developers, Alex Payne, Twitter's API lead, gave a talk on this subject at the Web 2.0 Expo this week, which was covered by Technology Review and The Register."

Kidding aside, is this a 'nail' in the coffin of scalable Ruby? 5 years ago people were saying the same thing about PHP scaling but Facebook has done a rather nice job of making it scale. Twitter was supposed to be the poster child of how awesome Ruby and RoR was.

It never was a nail, except to the Ruby community, which was in denial. Every developer in the world except Ruby fanboys knew Ruby's inherent problem was scalability. Benchmarks showed it, but they would always slant their own benchmarks to show the opposite. But facts are facts, and in the end you can't deny the truth. So this is where we are. The question is: will Ruby fanboys still choose to deny the issues with Ruby, or accept that it does have inherent limitations?

You know, technically there are more recent [bcs.org] Fortran specs. Fortran gets hauled out in the form of FORTRAN77 every time someone wants to talk trash about an old language. However, just like C or C++ or Java, the language has evolved. Fortran 2003 has objects, a pretty cool module system and basic thread support.

So what you're saying is that language features do affect scalability? Well, that was precisely my point.

It depends partly on what your concurrency looks like. Erlang supports one model of concurrency, a lightweight message-passing one with no explicit threads or shared memory. Scala supports that one as well---less efficiently---but also supports standard Java multithreading, which some people find useful for some purposes.
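The lightweight message-passing model can be sketched even in plain Ruby using a Queue per "actor" (a toy illustration only, not how Erlang or Scala actually implement actors):

```ruby
# A toy message-passing sketch: each "actor" owns a Queue as its mailbox
# and communicates only by sending messages, never by touching shared state.
mailbox = Queue.new
replies = Queue.new

worker = Thread.new do
  loop do
    msg = mailbox.pop             # blocks until a message arrives
    break if msg == :stop
    replies.push(msg.upcase)      # respond via a queue, not shared memory
  end
end

%w[hello world].each { |m| mailbox.push(m) }
mailbox.push(:stop)
worker.join
```

The key property is that all coordination happens through the queues, so there is no lock management in user code; Scala's actor library builds the same discipline on top of ordinary JVM threads.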

I don't get this buzzword-bingo bullshit about Twitter (or whatever the fuck site-du-jour is) in regards to concurrency and scalability. this is not a complex application, it's something that you code one afternoon (in Java/PHP) then throw it in a rack full of HTTP server nodes, a load balancer (shit, even RR-DNS will do) and a RAMSAN for the DB. that's it. stop the drama.

That's fine for the vast majority of web applications, but you clearly don't understand the scale of the traffic a site like Twitter receives. What do you do when one database machine, no matter how fast, isn't enough? When your load balancer gets overloaded? How about handling that massive search index?

Even a site as simple as Twitter will present you with problems you never expected once it gets that popular. Starting off with something like you suggested is exactly what got them into the mess they're in now.

That's fine for the vast majority of web applications, but you clearly don't understand the scale of the traffic a site like Twitter receives.

I clearly don't. this type of webapp generally scales well, their problem is that they chose shitty technologies to "power" it. you want fast? code an Apache module that uses libmysql (or whatever DB it is they use). now that's fast.

What do you do when one database machine, no matter how fast, isn't enough?

I believe they can split the DB, add more RAM, cache a lot (a LOT -- cache it in RAM, RAM is cheap), and serve static content.

When your load balancer gets overloaded?

in this scenario the load balancer is the least of your problems. what happens if a switch dies? drives die, PSUs blow up, etc etc. use redundancy.

Twitter is not a trivial application to scale, considering the wide disparity in follower-to-followee ratios, that views are dynamically generated by interpolating many-to-many message streams, and that each message persists forever.

As an analogy, it's like managing an IRC server with persistent, full-text-indexed messages, with one channel per user, where an unlimited number of users can join each other's channels. When you join a new user's channel, your chat log is automatically (and quickly) re-woven with messages from that channel according to the relative time series of those messages. And there's a global channel that everyone can watch to see what any user in any channel is saying at any time.

Now do this, all the while avoiding netsplits (i.e. missing messages), allowing retraction of almost any message, recent or historical, and ensuring the channel history (eventually) reflects that change. And handle sudden bursts of activity among unpredictable sets of channels because they're all attending the same conference, or a burst of network-wide high activity because people are watching the World Cup or Obama's inauguration.

The point is that, while the idea is simple, the variability of use and disparity of activity is what makes life interesting; the messaging & DB architecture that works well for recent activity, for example, doesn't help for having reasonable persistent random-access to historical messages.
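The "interpolating many-to-many message streams" part is, at its core, a timestamp-ordered merge of per-user streams. A hypothetical sketch (users, timestamps, and messages are made up for illustration):

```ruby
# A reader's view is the timestamp-ordered merge of every followed
# user's message stream.
streams = {
  alice: [[1, "hi"], [4, "lunch?"]],
  bob:   [[2, "ok"], [3, "brb"]],
}

timeline = streams
  .flat_map { |user, msgs| msgs.map { |ts, text| [ts, user, text] } }
  .sort_by { |entry| entry.first }
```

Doing this lazily, at Twitter's write volume, against persistent history, is precisely where the naive "one afternoon of coding" framing breaks down.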

In all, Twitter has gotten a *lot* more reliable the past several months than it was a year ago.

> If your threading model is nasty or your memory access features too coarse grained or a number of other things get designed wrong at the language level and are by specification broken then you have problems

In Ruby's case, you overcome this by using processes rather than threads.
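A minimal sketch of that process-per-worker pattern (the model behind Unicorn-style Ruby servers; this is an illustration, not production code): fork OS processes and let the kernel schedule them, sidestepping MRI's green threads entirely.

```ruby
# Fork two workers; each child gets its own interpreter, heap, and lock.
pids = 2.times.map do |i|
  fork do
    exit!(i)   # stand-in for real request-handling work
  end
end

# Reap the children and collect their exit codes.
exit_codes = pids.map { |pid| Process.wait2(pid).last.exitstatus }
```

The trade-off is that processes share nothing by default, so any shared state has to go through an external store or IPC.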

> On top of that if you don't have competition in your compiler space then it doesn't matter if it's not part of the language spec; any non-scalable parts of the compiler implementation are by default a problem

Visual Basic was so stupid that Microsoft could have easily made it almost as fast as C -- if only they had wanted to. I guess they had other priorities.

I must ask one thing: have you ever heard of Self 93? They made it so good fifteen years ago that it topped out at half the speed of C -- a language where everything (!) was a message send and *anything* could be redefined at *any* moment.

If someone threw the necessary money at Ruby, it could easily have the performance of a modern Smalltalk implementation, as Ruby is essentially Smalltalk in disguise.

You might be interested in StrongTalk [strongtalk.org]. It's a Smalltalk-80 rewrite with optional strong typing and pervasive JIT, meaning that it's incredibly fast and robust. It is, unfortunately, Windows only at the moment, but it's all BSD licensed --- VM, image, source browser and all --- and they're looking for people to help with a Linux port.

No, my logic is that Ruby is essentially a member of a well-researched class of languages, for which (the class, not Ruby) high-performance VMs have already been developed (Cincom Smalltalk, Gemstone, Self93, Strongtalk...), but this development was always expensive (this was also the case for Java and .NET, obviously). IMO, your shit(ty?) analogy does not apply here.

Except that most languages have more than one interpreter/compiler, with varying performance in varying environments.
Jython/IronPython/Python all perform a little different.
Ruby 1.9 and MacRuby don't perform the same.

The problem is that most of those compiler/interpreters suck enormously.

Exactly. MRI (Matz' Ruby Interpreter) is known to have some serious scalability issues. Interestingly, one of the main issues with MRI comes from the way gcc compiles the big delegator switch in MRI's core, with a large sparse stack that causes ridiculous memory consumption (and sometimes even leaks). There's a set of 8 patches (the MBARI patchset) that drastically improve the situation. The reduced memory footprint and the much smaller stack also give a noticeable speed increase.

The good news is, these patches are progressively being merged upstream, so it's very likely that future MRI versions will be much better.

And where Facebook and PHP can accomplish this goal for the frontend and the backend (and many other corporate and enterprise PHP sites), Ruby can merely do frontend work. That doesn't seem like scalability to me. That seems very limited. That to me says 'as long as you don't TAX the engine, it can perform'. That's not scaling. That's getting by.

Listen to your own Ruby rhetoric. Ruby itself doesn't have real threading; it has FAUX threads. And PHP was never meant to be used for app development; it's meant for web development. That would be like trying to build an OS in Visual Basic (now watch some idiot try).

Yeah, I understand what they are doing with Scala. And they DID try to do a lot of it with Ruby over the years, and Ruby people touted it over and over while we said it doesn't scale over and over and pointed out how they were removing more and more Ruby.

Ruby does not have a problem scaling. Neither, for that matter, does even Rails. (As the companies that run Basecamp, Campfire, LinkedIn, Lighthouse, and many others will tell you.)

The fact is that the Twitter folks tried to write their own message queue in Ruby [unlimitednovelty.com], when there was absolutely no reason to do so: there were plenty of pre-made message queues already available for Ruby, and already optimized. Not only did they choose to write their own, unnecessarily, they did it badly [obiefernandez.com].

Anyone who thinks Ruby [hulu.com] on [amazon.com] Rails [zvents.com] can't [scribd.com] scale [yellowpages.com] is as dogmatic in their anti-hype as the original hypers were. The right tool for the right job and all that.

pageviews do not suddenly get easier to service because that page has a video on it.

No, they get considerably harder. Hulu, if I remember correctly, dynamically alters the bitrate to compensate for slow connections and improve the quality on faster ones. It also puts a load on the server that's serving the video regardless of the bitrate.

Jane Q. Public: Either you didn't read the comments of that blog or you're spreading FUD. Here is a comment from Alex Payne from that article:

Hoo boy. First of all, I hope you've had a chance to read my general reply to the articles about my Web 2.0 Expo talk [1] and this response to a vocal member of the Ruby community [2]. I sound like a pretty unreasonable guy filtered through the tech press and Reddit comments, but I hope less so in my own words.

Secondly, the quote at the top of your post is from my coworker, Steve Jenson, who's been participating in the discussion on this post.

On JRuby: as Steve said, we can't actually boot our main Rails app on JRuby. That's a blocker. Incidentally, if you know of anyone who has a large JRuby deployment, we'd be interested in that first-hand experience. If you don't, it might be a little early to say it would solve all our problems.

It's also incorrect to say that the way JRuby and Scala make use of the JVM is exactly the same. Much like our other decisions haven't been arbitrary, our decision to use Scala over other JVM-hosted languages was based on investigation.

On our culture: if you'd like to know about how we write code, or how our code has evolved over time, just ask us. We're all on Twitter, of course, but most of the engineers also have blogs and publish their email addresses. There's no need to speculate. Just ask. There's not a "raging debate" internally because we make our engineering decisions like engineers: we experiment, and base our decisions on the results of those experiments.

It's definitely true that Starling and Evented Starling are relatively immature queuing systems. I was eager to get them out of our stack. So, as Steve said, we put all the MQ's you think we'd try through their paces not too long ago, and we knocked one after another over in straightforward benchmarks. Some, like RabbitMQ, just up and died. Others chugged on, but slowly. Where we ran into issues, we contacted experts and applied best practices, but in the end, we found that Kestrel fit our particular use cases better and more reliably. This was not the hypothesis we had going into those benchmarks, but it's what the data bore out.

We get a lot of speculation to the tune of "why haven't those idiots tried x, it's so obvious!" Generally, we have tried x, as well as y and z. Funnily enough, I was actually pushing to get us on RabbitMQ, but our benchmarks showed that it just wouldn't work for us, which is a shame, because it advertises some sexy features.

Personally, I'm extremely NIH-averse; I research open source and commercial solutions before cutting a new path. In the case of our MQ, one of our engineers actually wrote Kestrel in his free time, so it was a bit more like we adopted an existing open source project than rolled our own. Pretty much the last thing we want to be doing is focusing on problems outside our domain. As it so happens, though, moving messages around quickly is our business. I don't think it's crazy-go-nuts that we've spent some time on an MQ.

I hope my colleagues and I have been able to answer some of your questions. As I said, in the future, please consider emailing us so we can share our experience. Then, we can have a public discussion about facts, not speculation. Perhaps, as commenter sethladd suggested, the onus is on us to produce a whitepaper or presentation about our findings so as to stave off such speculation. Time constraints are the main reason why we haven't done so.

As many programmers can tell you, publishing a successful book these days is worth a lot more than just the dollars it brings in directly; there is recognition and reputation, and perhaps a better-paying job down the road. Whether Payne is interested in the latter I do not know, but my point is simply that the book royalties are sometimes the least of the returns for being a successful author.

PHP was a mature environment when Facebook was launched. RoR was (and still is, to a certain extent) a fad environment, popular primarily because of its differentness. People who build sites on a platform because it's the latest thing are less likely to stick with that platform than people who choose a platform that has a solid reputation but is boring. Scala, at a guess, is going to be the next fad platform. Like Ruby, it has some interesting ideas behind it, but it needs a lot of development before we can consider it a stable platform for serious applications, I think.

I find your assertion that differentness is the main reason to use Ruby on Rails to be somewhat offensive, not to mention uninformed, as it suggests that the multitudes of developers using it do so not because of technical merits but because they're buying into some image of differentness. A cursory examination of typical Rails projects and developers should indicate otherwise. Just because you don't find it helpful in your work doesn't mean others don't see real benefits from using it.

RoR was (and still is to a certain extent) a fad environment, popular primarily because of its differentness.

Huh, I generally use it because it has a really good ORM and migrations, and I really like the syntax (coming from Objective-C, it's pretty slick). I also used PHP when I was starting out, but one day it tried to insist that $myArr[0] and $myArr["0"] actually pointed to the same object, and I have refused to deal with it ever since. I also got tired of typing str_sub_case_insensitive_for_real_safe(haystack, needle) -- or is it needle, haystack? And is this one of those prank functions that fails to substitute the value but still returns a value that evals true? Or if I leave out one of those underscores, am I in fact calling a function that behaves almost exactly the same way but fails under difficult-to-reproduce circumstances? Maybe they've fixed this and the other sundry atrocities? Maybe they've stopped trying to make it into Perl, as compiled by a C++ compiler, and tried fashioning it into an actual dynamic language? I know, I know, some people like PHP, but I think arguments for the superiority of PHP over Ruby (or Python or Scala or Lisp or WebObjects or Perl 6 or really anything else) are going to rest completely on the skills of the Zend interpreter writers, and almost never on the quality/readability/maintainability of the code or the ease of the development process. You can write good, safe code in PHP, that is true, but it isn't very ergonomic.
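For contrast with PHP's key coercion, a quick check in Ruby (illustrative only):

```ruby
# In a Ruby Hash, the Integer 0 and the String "0" are distinct keys,
# so no silent collision occurs.
h = {}
h[0]   = "integer key"
h["0"] = "string key"
```

Both entries coexist, whereas PHP would have coerced the string "0" to the integer key 0 and overwritten the first value.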

You know, RoR is really good at replacing those old Paradox and FMP database systems. I can see how Facebook might prefer PHP, but people trying to replace little inventory/business-process systems generally only need to support a few dozen users, and don't have an army of developers to keep things running. The Universe is big enough to accommodate both the utility of Ruby on Rails and the Twitter developers' stupidity.

As much as I do like coding PHP, I completely agree with you. The fact that $arr[0] is the same as $arr['0'] is just insane, and that each function has a different order of needle, haystack, replace, etc. is just unbearable... so much so that I already have wrapper functions to make them all the same...

But unless I find years of time and unlimited money there is no way I can ditch all my php code and go to ruby or python or anything else...

I'm not even sure if that was what it was. I think the problem I was having on top of that was that I would pop something off the head of $arr and sometimes it would be $arr[0] and sometimes it would be $arr[1], and this is crazy, but I dug in and, lo and behold, I was actually storing values in the array at ["0"] and ["1"], and since arrays and hashes are the same thing in PHP (this is the real sin), it did what it thought was right.

Scala has the significant advantage that it's built on Java and interoperable with Java. Scala source code compiles directly into .class files. You get the speed of the JVM (which is acceptably quick these days), the ability to easily call Java APIs from within Scala, and the ability to run your Scala code on any machine with the JVM.

It's popular to dislike Java, and even as a well paid Java developer I'm not a huge fan of the language. But Java still is extremely common, and you can even write Java code for your Scala code to use while you're learning Scala.

Scala also keeps Java's strong static typing and adds functional language features. I don't think it needs any development at all to be adapted for mainstream use.

On the other hand, as a C++ developer I found learning Java to be child's play. The learning curve from Java to Scala, for me at least, is noticeably steeper. If anything kneecaps Scala I suspect it will be the barrier to entry, not the language itself.

Scala also keeps Java's strong static typing and adds functional language features. I don't think it needs any development at all to be adapted for mainstream use.

Scala is a great thing, now what it needs is equally great tooling (i.e. IDE support, including refactoring, on the same level as we have for Java). And it's getting there - there is an Eclipse [scala-lang.org] plugin, and a NetBeans [netbeans.org] one - but it definitely needs more work and polish.

The payoff would be huge, though. Right now, as far as languages go, C# is far more advanced than Java. But Scala is equally more advanced than C# (the only thing on .NET that could compare with Scala is F#, and that's less stable and mature currently).

While Facebook uses PHP where Twitter uses Rails, Facebook uses a plethora of languages to make the whole system work. So Twitter really isn't "going to Scala" any more than Facebook is "going to Erlang." Which is to say that they use the best tool for the job, not one tool for every job.

Huh? Are you confused? Both companies made the stupid mistake of using a web scripting language to do backend heavy lifting.
Twitter is fixing that with Scala (leaving RoR on the frontend because it's really good at web frontends).
Facebook already fixed that with Erlang, not PHP (leaving PHP on the frontend because of the technical debt they've accumulated).
http://www.facebook.com/eblog [facebook.com]
BTW, PHP is the worst language ever. That is a fact, not my opinion.

Scala may be unknown, unused, and under development, but Ruby is over 12 years old and the community that uses it is huge. So it is neither unknown nor unused. (It is, however, under development, as most all modern languages are.)

Anyone watching which companies are growing during the recession... should get their info from a more complete source.

Anyone looking at how the latest high-volume services are building infrastructure on the Web... I'll give you that.

Neil Gaiman [twitter.com] -- Who? (Turns out to be some author.) "**Update! I just wrote another page!"

Demi Moore [twitter.com] -- Who cares?

The President of the United States [twitter.com] -- (though, to be fair, his status hasn't been updated recently)

I would prefer something with more

Wow, I just realized what an angry old man I must sound like there. I didn't mean to be that harsh. I just don't like Twitter, and spending time speaking about its technical *wonders* seems like a waste of time, since I assume it's about 3 lines of code (or should be), and mimics most high school coding students' first projects.

Wow, there I go again. So angry. I just quit caffeine, you must understand!

Nothing personal intended here, but not only do you sound like an angry old man, but calling Ruby a "fad language" makes you sound like an ignorant, angry old man. Ruby has been around for over 12 years and is still one of the fastest growing languages around.

I'm confused - how does a language "scale"? Can you suddenly have 1 billion items in your array instead of 100 million? If that is indeed the case then like the parent said: it is crappy coding and/or design.

I'm fairly confident you can do twitter in whatever language you want and if designed properly it will scale in the proper sense of the word.

"Scalability" is a multi-faceted term. Most people think of "vertical scaling of the servers" when they hear the term "scalable". Which is to say that the code can handle a higher transaction load on a beefier server.

But there is quite a bit more to scalability than that. There's horizontal scalability of the servers, i.e., does the software support plugging more boxes in to handle a greater load? Then there's development scalability.

I read between the lines that you call C or C++ solid code, and if I'm not mistaken, you will find that the kids are doing Scala because the code is more solid. Scala benefits from a typing system close to OCaml's, which makes Scala code very, very solid -- especially if you keep away from Java specifics (such as nullable objects) in your code and take special care when interacting with Java libs that may use them.

If I'm mistaken and you're not talking about C/C++, I hope you are not talking about dynamic languages, which offer no guarantee whatsoever; you know, as a developer I enjoy spending my time working on the business side of my application -- and on how to make it scalable -- rather than on low-level specifics and on testing whether every pointer is null before dereferencing it. A type system that does this for me (which Scala's or ML's parametrized Option type allows) is bliss.

Now, I'm not going to enumerate every language under the sun to see what code you call solid; I guess your answer would be that code is solid whatever language it's written in. In the end, it all comes down to binary instructions, right? The question is: how many guarantees do the tools give you? In the case of Scala's compiler, it gives you a lot AND offers you a very enjoyable, lightweight yet powerful syntax.
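The Option idea can be sketched even in Ruby -- a toy illustration only (real Scala/ML Options are richer, and the `Maybe` name here is made up): wrap a possibly-missing value so callers go through one API instead of nil-checking at every use site.

```ruby
# A minimal Option-style wrapper: map skips the block when empty,
# get_or_else supplies a fallback, and no caller ever touches nil.
Maybe = Struct.new(:value) do
  def map
    value.nil? ? self : Maybe.new(yield(value))
  end

  def get_or_else(default)
    value.nil? ? default : value
  end
end

present = Maybe.new(21).map { |v| v * 2 }.get_or_else(0)
missing = Maybe.new(nil).map { |v| v * 2 }.get_or_else(0)
```

The difference in Scala is that the compiler forces you to handle the empty case, which is the guarantee being praised above.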

Or just JRuby, which has proven itself quite able to hold up under load. From what I understand, they had issues with their message queue, which could have benefited from JRuby's native threads. But we are all speculating on what is really going on at Twitter.

And what does scala have over say erlang for concurrency and performance?

What?
Scala is Java mixed with OCaml -- you get an extremely powerful typing system, but it feels like a "dynamic language" such as Python or Ruby. With the performance of Java.
Under the hood, it's a brand-new language, very different from all of those: it merges functional and object-oriented programming. Yet, for the regular programmer, it feels like Ruby... until he gets used to more powerful features and learns how to design more complex libraries as embedded DSLs. All that while running on the JVM.

The combination of static typing, type inference, and concise syntax for higher-order functions gives you a lot that Groovy and Java just fall short on.

In Java's case, it's the simple fact that you can't reasonably implement HOFs like filter, map, etc. without using anonymous inner classes. If you use anonymous inner classes, you end up with a ratio of something like 8:1 of boilerplate to relevant expressions. Believe me, I've tried it. If you want to see just how baroque it gets, take a look at functionaljava.org [functionaljava.org].
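For contrast, the same filter-then-map pipeline as higher-order functions with blocks in Ruby (Scala reads almost identically); this is what pre-lambda Java needed an anonymous inner class per step to express:

```ruby
# select (filter) then map as higher-order functions -- the one-liner that
# costs pages of anonymous-inner-class boilerplate in old Java.
evens_squared = (1..10).select(&:even?).map { |n| n * n }
```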

Twitter's developers care more about being cool and hip and using the latest tool so that they remain popular, than they do about having a site that stays up 7 days a week.

Exactly. Scalability problems arise from poor implementation, not from language choices. Scalable platforms have been implemented in the past with PHP, ASP, Perl, C, Java, and I'm sure with Ruby, Python, or your favorite new language. Twitter is a massive-scale site, they should be looking at deep engineering, not a buzzword platform that promises easy scalability for dummies.

Scala may help them alleviate problems they've hit in the Rails framework. What will help them with the problems they hit in Scala?

Seriously speaking, when you cannot have long lived processes due to memory leaks, you're having a language/platform problem. When you only have green threads, you're having a language/platform problem. Can you architect around it? Probably, but it may be sub-optimal, and bottom line is, why bother when there are better tools available?

According to the rebuttals in the comments of the blog post in one of my sibling posts here, part of Twitter's scalability problem was poor implementation of the Ruby interpreter. Lots of small objects cause the heap to get fragmented and eventually it runs out of memory. Java interpreters have better GC and you can swap out different GC algorithms in some of them.

Why does everyone assume the people at Twitter are a bunch of newbies who don't know about deep engineering? Is it just because their analysis didn't lead them to your preferred buzzword?

As the joke I posted above insinuates, Scala runs on top of the Java platform. And unlike Ruby, it focuses on the use of the platform's features. So the platform is more than tested enough. Why they feel the need to use Scala rather than straight-up Java is one of life's great mysteries. But for now, their platform should be fine.

Whether the code they write is scalable and holds up under loads or not is an entirely different topic.

From the article:

"Scala is different from other concurrent languages in that it contains no language support for concurrency beyond the standard thread model offered by the host environment. Instead of specialized language constructs we rely on Scala's general abstraction capabilities to define higher-level concurrency models. In such a way, we were able to define all essential operations of Erlang's actor-based process model in the Scala library."

Maybe they use Scala because writing Java code is painful by comparison. Tons of boilerplate, every exception has to be caught in every scope, no pattern matching, no named arguments, and on and on. For people like me, without Scala the JVM wouldn't even be under consideration, though I admit that Java has been more usable since it got generics.

In the real world circa 2009, I routinely port C++ code to Java, where it runs almost as fast (usually within 20% or so) as the C++, sometimes even a bit faster if the algorithms are not very cache-friendly.

The JVM is very fast, especially if you're using it in a production environment with the right flags set.

If I want to use any Java software then I'll use Scala. I see people bashing Scala, saying the languages they know are good enough or they can just use jython/jruby/groovy, but they clearly know little about Scala.

One thing that's nice about Scala, which Java, Jython, JRuby, and Groovy all lack, is its powerful type system and pattern matching. Once you get used to good pattern matching like in Scala, SML, OCaml, or Haskell, you won't want to go back. Plus you get all the benefits of running on the JVM at high speed (unlike all the aforementioned JVM languages, except Java itself).
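Ruby itself later gained structural pattern matching (the case/in form, added in Ruby 2.7, well after this discussion), which gives a feel for the destructuring being praised here; the examples below are made up for illustration:

```ruby
# case/in destructures arrays, hashes, and types in one construct,
# binding variables as it matches (requires Ruby 2.7 or newer).
def describe(value)
  case value
  in [x, y]              then "pair of #{x} and #{y}"
  in {name: String => n} then "record named #{n}"
  in Integer => i        then "integer #{i}"
  else                        "something else"
  end
end
```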

Honestly, you should check out Scala before you bash it. It's a very good choice wherever you might choose Java, which is a good choice for the back end. Twitter's developers are smart and experienced. They didn't choose Scala just to be cool. It is a powerful tool that can get the job done in an elegant way.

This blog post takes the attitude that Twitter didn't move to Scala because ROR had a problem, but because the in-house messaging system Twitter created performed poorly. The author does not work at Twitter but many of the Twitter developers (including Alex Payne) respond in the comments. I found the article to be very interesting and the comments even more so. They give a sense of how much research Twitter did before this change.