‣ “I have yet to find any examples of websites that have heavy traffic and stream media that run from a Ruby on Rails platform, can you suggest any sites that will demonstrate that the ruby platform is stable and reliable enough to use on a commercial level?”

‣ “We are concerned about the long-term viability of Ruby on Rails as a development language and environment.”

‣ “How easily can a ruby site be converted to another language? (If for any reason we were forced to abandon ruby at some point in the future or I can’t find someone to work with our code?).”

‣ “My company has some concerns on whether or not Ruby on Rails is the right platform to deploy on if we have a very large scale app.”

‣ What is a “scalable” application? ‣ What are some hardware layouts? ‣ Where do you get the hardware? ‣ How do you pay for it? ‣ Where do you put it? ‣ Who runs it? ‣ How do you watch it? ‣ What do you need relative to an application? ‣ What are the commonalities of scalable web architectures? ‣ What are the unique bottlenecks for Ruby on Rails applications? ‣ What's the best way to start so you can make sure everything scales? ‣ What are the common mistakes?

But are these really Ruby or Rails specific?

They have to do with designing and then running scalable “internet” applications

A Sysadmin’s view ‣ Ruby on Rails is simply one part ‣ Developers have to understand Rails horizontally (of course, otherwise they couldn’t write the application) ‣ Developers ideally understand the vertical stack ‣ It can get complicated fast and it’s easy to overengineer

What do you do with 1000s of physical machines? 100s of TB of storage? In 4 facilities on 2 continents?

The 10% rule ‣ Google’s earnings release: ‣ "Other cost of revenues, which is comprised primarily of data center operational expenses, as well as credit card processing charges, increased to $307 million, or 10% of revenues, in the fourth quarter of 2006, compared to $223 million, or 8% of revenues, in the third quarter."

The 10% rule ‣ A common rule of thumb I tell people is to target their performance goals in application design and coding so that their infrastructure (not including people) is ≤10% of an application’s revenue.

The 10% rule ‣ Meaning if you’re making $1.2 million a year off of an online application, then you should be in the area of spending $120,000/year or $10,000/month on servers, storage and bandwidth. ‣ And from the other way around, if you’re spending $10,000 a month on these same things, then you know where to push your revenue to.
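The arithmetic on this slide can be written down as a small Ruby sketch. The method names are made up for illustration, and integer division is used, so odd amounts are truncated to whole dollars.

```ruby
# Sketch of the 10% rule: infrastructure (servers, storage, bandwidth,
# not including people) should cost at most 10% of application revenue.
INFRA_PERCENT = 10

# Monthly infrastructure budget implied by an annual revenue figure.
def monthly_infra_budget(annual_revenue)
  annual_revenue * INFRA_PERCENT / (100 * 12)
end

# Annual revenue you should be pushing toward for a given monthly spend.
def revenue_target(monthly_infra_spend)
  monthly_infra_spend * 12 * 100 / INFRA_PERCENT
end

puts monthly_infra_budget(1_200_000) # the slide's $1.2 million example
puts revenue_target(10_000)          # the same rule read the other way around
```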

Or maybe this is just a cost. It used to be for me.

A joyent.net node (-ish)

Whatever you do ‣ Keep it simple ‣ Standardize, Standardize, Standardize ‣ Try and use open technologies

Some of my rules ‣ Virtualization, virtualization, virtualization ‣ Separating hardware components ‣ Keep the hardware setup simple ‣ Things should add up ‣ Configuration management and distributed control ‣ Pool and split ‣ Understand what each component can do as a maximum and a minimum

What are the cost breakpoints? ‣ Including people costs ‣ It’s generally cheaper at the $20,000-$30,000/month spending level to do it in-house. *Assuming you or at least one of your guys knows what they’re doing.

Why we started with Dells ‣ Responsive ‣ They put us in touch with different leasing companies and arrangements ‣ They shipped ‣ We were a Dell/EMC shop (even with Solaris running on them)

Why we ended up using Sun ‣ The rails (literally the rack rails) ‣ RAS ‣ Hot-swappable components ‣ Energy efficient ‣ True ALOM/iLOM that works with console ‣ Often cheapest per CPU, per GB RAM ‣ Often cheapest in TCO ‣ We’re on Solaris (there are some assurances there)

Lease if you can ‣ Generally it’s about 10-50% down ‣ And can be “ok” interest rate wise: 8-18% ‣ Do FMV where you turn over the systems at year 2 ‣ How do you do it? Demonstrate that you have the cash otherwise and push your vendor.

Comparison ‣ Total $6500 for 20 systems in a rack on a lease. ‣ “2850s” at The Planet or Rackspace: $900-1200 each ($18,000 - $24,000/month) ‣ DIY: does require a more involved human or two of them (that could use up the difference; a great sysadmin/racker is $100K+)
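A rough Ruby sketch of this comparison follows. It assumes the $6500 lease total is a monthly figure (the slide does not say so explicitly) and prices a sysadmin at a flat $100,000 a year, so the output is only an order-of-magnitude check.

```ruby
SYSTEMS           = 20
LEASE_PER_MONTH   = 6_500    # whole rack on a lease (assumed to be per month)
HOSTED_LOW        = 900      # per hosted system, per month
HOSTED_HIGH       = 1_200
SYSADMIN_PER_YEAR = 100_000  # "a great sysadmin/racker is $100K+"

# Yearly amount DIY saves over hosted servers, after paying for sysadmins.
# A negative result means the people cost has used up the difference.
def annual_savings(hosted_per_system, sysadmins: 0)
  hosted = SYSTEMS * hosted_per_system * 12
  diy    = LEASE_PER_MONTH * 12 + sysadmins * SYSADMIN_PER_YEAR
  hosted - diy
end

(0..2).each do |n|
  low  = annual_savings(HOSTED_LOW, sysadmins: n)
  high = annual_savings(HOSTED_HIGH, sysadmins: n)
  puts "#{n} sysadmin(s): DIY saves $#{low} to $#{high} per year"
end
```

Under these assumptions one extra sysadmin still leaves DIY ahead, while two roughly cancel the savings, which is the point of the last bullet.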

What do you run on it?

The Key

What are the patterns of deployment? Lessons learned

Ruby ‣ I like that Ruby is process-based ‣ I actually don’t think it should ever be threaded ‣ I think it should focus on being asynchronous and event-based on a per-process basis ‣ I think it should be loosely coupled ‣ What does a “VM” do then: it manages LWPs ‣ This is Erlang versus Java

So how do you run a Rails process? ‣ FCGI ‣ Mongrel (event-driven) ‣ JRuby in GlassFish

LDAP ‣ Hierarchical database ‣ Great for parent-child modeled data ‣ We use it for all authentication, user databases, DNS ... ‣ Basically as much as we can

Why?

The multi-master replication is amazing when you’ve been living in MySQL and PostgreSQL lands

Sina ‣ “With over 230 million registered users, over 42 million long-term paid users for special services, and over 450 million peak daily hits, Sina is one of the largest Web portals and a leading online media and value-added information service provider in China.” ‣ 12 Sun Fire T1000 servers running Solaris 10 and the Sun Java System Directory Server.

Pay attention to how you store your files

A story

Hashed directory structures ‣ Never more than 10K files / subdirs in a single directory (I aim for a max of 4K or so..) ‣ Keep it simple to implement / remember ‣ Don't get carried away and nest too deeply, that can hurt performance too

A couple of approaches

The 16x256

‣ Pre-create 16 top level dirs, 256 subdirs each which gives you 4096 "buckets". ‣ Keeping to the 10K per bucket rule, that's about 41M "things" you can put into this structure (4096 x 10,000). Go to 256 x 256 if you're big and/or want to keep the number of things in the buckets lower.
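The bucket arithmetic is easy to check with a few lines of Ruby. This is only a sketch; the per-bucket limits tried here (1,000, the 4K target and the 10K ceiling) are picked for illustration. At 10K per bucket the 16 x 256 layout holds roughly 41 million things, and at about 1,000 per bucket roughly 4 million.

```ruby
# Total number of things a two-level bucket layout can hold.
def capacity(top_level_dirs, subdirs, per_bucket)
  top_level_dirs * subdirs * per_bucket
end

[[16, 256], [256, 256]].each do |top, sub|
  [1_000, 4_000, 10_000].each do |per_bucket|
    total = capacity(top, sub, per_bucket)
    puts "#{top} x #{sub} at #{per_bucket} per bucket: #{total} things"
  end
end
```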

‣ How do you decide where to put stuff? ‣ Pick randomly from 1 to 16 and from 1 to 256. Store path in the profile. What's it look like: userid=76340 fspath=/data/12/245/76340/file1,file2,etc.. ‣ You get nice even distribution, but the downside is that you can't "compute" the directory path from the thing's ID.
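A minimal Ruby sketch of the random placement might look like this. The method name, the `/data` root and the in-memory `profiles` hash are stand-ins for illustration; in a real application the path would be saved on the user's profile record.

```ruby
TOP_LEVEL_DIRS = 16
SUBDIRS        = 256

# Pick a random bucket for a new user and build the directory path.
# The result has to be stored, since it cannot be recomputed from the id.
def random_fspath(userid, root: "/data", rng: Random.new)
  top = rng.rand(1..TOP_LEVEL_DIRS)
  sub = rng.rand(1..SUBDIRS)
  File.join(root, top.to_s, sub.to_s, userid.to_s)
end

profiles = {}                                  # stand-in for the profile store
profiles[76340] = { fspath: random_fspath(76340) }
puts profiles[76340][:fspath]                  # something like /data/12/245/76340
```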

The Hasher

‣ Idea is to compute the FS path from something you already know. ‣ Big plus is that anything you write that needs to access the FS doesn't need to look up the path in a database. ‣ Dubious value since you probably had to look up the object/thing you're doing this for in the database anyways.. but you get the idea...

‣ Example: Use the userid to form the multi-level "hash" into the filesystem. Take for example the first two digits as your top level directory, the second two as the subdirectories. So sticking with our userid above we'd get a path like: ‣ /data/76/34/76340

‣ Downside is you can end up building stupid logic around the thing to handle low ids (where does user "46" go?) or end up padding stuff, all of which is ugly.
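Here is a small Ruby sketch of the digit-based approach, including the padding needed for low ids. The method name and the five-digit pad width are assumptions made for the example, not part of the original scheme.

```ruby
# Build a path from the leading digits of the id: first two digits for the
# top-level directory, next two for the subdirectory. Low ids are left-padded
# with zeros (width is an assumed value) so they still have four digits to use.
def digit_fspath(userid, root: "/data", width: 5)
  padded = userid.to_s.rjust(width, "0")
  File.join(root, padded[0, 2], padded[2, 2], userid.to_s)
end

puts digit_fspath(76340) # /data/76/34/76340
puts digit_fspath(46)    # /data/00/04/46, the "ugly" padded case
```

The padded case shows the problem from the slide: every id below 100 lands under /data/00/00, so low ids bunch together unless more logic is added.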

‣ A fancier alternative to this is using something like an MD5 hash (which you probably also already have for sessions). That works well, is easy to implement, tends to give you better distribution "for free", and looks sexy to boot: # echo "76340" | md5 outputs e7ceb3e68b9095be49948d849b44181f, which gives us: /data/e7/ceb/76340
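The MD5 variant can be sketched in Ruby with the standard library's `Digest::MD5`. The method name and `/data` root are illustrative. Note one assumption: this hashes the bare id, whereas the shell `echo` above appends a newline before hashing, so the digests will not match unless the newline is added here too.

```ruby
require "digest/md5"

# Derive the directory from the hex MD5 of the id: first two hex characters
# for the top level, the next three for the subdirectory (the same 2 + 3
# split as the /data/e7/ceb/... example).
def md5_fspath(userid, root: "/data")
  digest = Digest::MD5.hexdigest(userid.to_s)
  File.join(root, digest[0, 2], digest[2, 3], userid.to_s)
end

puts md5_fspath(76340)
```

Hex digests only contain 0-9 and a-f, so nothing in the path needs escaping.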

Downsides of the MD5-style ‣ Distribution is still unpredictable ‣ Watch your crypt()-style implementation because it might output characters you need to escape! ‣ You can't compute it in your head

But ‣ The attractiveness of using some sort of computed hash will mostly depend on what sort of ID structure you already have, or are planning to use. ‣ Some are very friendly to simple hashing, some are not. ‣ So think “friendly”

Jamis does something like this ‣ http://www.37signals.com/svn/archives2/id_partitioning.php

DNS

‣ Don’t forget about it. ‣ Always surprising how little people know about DNS servers ‣ Federation by DNS is an easy way to split your customers into pods.
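One way to picture federation by DNS: each customer gets a hostname that is a CNAME to the pod serving them, so moving a customer is a DNS change. The Ruby sketch below only generates zone-file lines. The customer names, pod names, `example.com` domain and TTL are all hypothetical.

```ruby
# Hypothetical customer-to-pod assignments. An explicit table (rather than a
# computed one) lets a single customer be moved without touching the others.
POD_OF_CUSTOMER = {
  "acme"    => "pod1",
  "globex"  => "pod2",
  "initech" => "pod1"
}.freeze

# Zone-file CNAME line pointing a customer's hostname at its pod.
def cname_record(customer, pod, domain: "example.com", ttl: 300)
  "#{customer}.#{domain}. #{ttl} IN CNAME #{pod}.#{domain}."
end

POD_OF_CUSTOMER.each { |customer, pod| puts cname_record(customer, pod) }
```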

‣ Use DNS ‣ Great load balancers ‣ Event-driven mongrels ‣ A relational database isn’t the only datastore: we use LDAP, J-EAI, file system too ‣ A Rails process should only be doing Rails ‣ Static assets should be coming from static servers ‣ Go layer 7 where you can: a Rails process should only be doing one controller ‣ Federate and separate as much as you can

Scaling a Rails Application from the Bottom Up - soso.io
