Monday, August 27, 2007

ActiveResource: A Rails Scalability Solution

ActiveResource looks very much like ActiveRecord, but appearances can be deceiving. It's not supposed to be a distinct object brokering system, just a client API for REST web services. ActiveResource's gem_server blurb says "think ActiveRecord for web services." That's very, very different from "a plug-and-play replacement for the pivotal role that ActiveRecord plays in Rails". The logical conclusion is that you shouldn't use ActiveResource as a distributed network version of ActiveRecord. Resources are different from objects, after all. But what the hell. Let's be bad guys.

The killer app for ActiveResource isn't Amazon S3 clients. We already have an Amazon S3 library. It isn't Flickr mashups either. The killer app for ActiveResource is scalability - the major bugbear of Rails apps, or at least, the major flaw in Rails' reputation.

ActiveResource gives you a kick-ass scalability solution. Consider that with Rails, Web apps are now as cheap and easy to create as shell scripts. As anyone with a background in economics will grasp immediately, it's inevitable that this low cost means people will start to throw Web apps at problems as casually as they currently throw shell scripts at problems, and this is exactly what's starting to happen. Certainly, with Rails, a Web architecture which requires two Web apps is no big deal.

An emerging strategy for Rails scalability is to pair ActiveRecord server apps on the back end with ActiveResource client apps on the front end. Your ActiveResource app sees the public Web. Your ActiveRecord app serves no HTML at all. Instead, it simply supplies its client app with data via REST. It's almost like an exponential version of MVC. Your MVC database app acts as a model, and your MVC customer-facing app serves as both controller and view. MVC to the second power.
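To make the split concrete, here is a minimal sketch in plain Ruby using only the standard library. It is not ActiveResource itself: `BackEndApp`, `RemotePerson`, and the `/people/:id` path are hypothetical stand-ins, and the "network" is a lambda so the sketch runs without a server. With the real library, the front-end class would instead subclass `ActiveResource::Base` and point `self.site` at the back-end app.

```ruby
require 'json'

# Hypothetical stand-in for the back-end (ActiveRecord) app: it owns the data
# and answers REST-style paths with serialized records, never HTML.
class BackEndApp
  def initialize(rows)
    @rows = rows # { id => attributes hash }, standing in for a database table
  end

  # Handles "GET /people/:id" and returns [status, body].
  def get(path)
    match = path.match(%r{\A/people/(\d+)\z})
    row = match && @rows[match[1].to_i]
    row ? [200, JSON.generate(row)] : [404, '{}']
  end
end

# Hypothetical stand-in for a front-end model: it has no database connection,
# only a transport that reaches the back end.
class RemotePerson
  class << self
    attr_accessor :transport # anything that responds to #call(path)
  end

  attr_reader :attributes

  def self.find(id)
    status, body = transport.call("/people/#{id}")
    raise "person #{id} not found (HTTP #{status})" unless status == 200
    new(JSON.parse(body))
  end

  def initialize(attributes)
    @attributes = attributes
  end

  def name
    attributes['name']
  end
end

back_end = BackEndApp.new({ 1 => { 'id' => 1, 'name' => 'Dr. Gonzo' } })
# In a real deployment this lambda would be an HTTP request to another host.
RemotePerson.transport = ->(path) { back_end.get(path) }

person = RemotePerson.find(1)
puts person.name
```

Because the front end only ever sees the transport, either side can be moved to more machines without the other noticing, which is the whole appeal of the arrangement.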

You can add new servers for the customer-facing side without scaling the database side - or vice versa. It's also easy to independently load-balance the front and back end apps. Decoupling makes your deployment more flexible.

I'm not a deployment expert, but I have this architecture from people who are. This may become a dominant use case for ActiveResource in the future. Not just because it gives you a powerful way to address Rails' scaling problems, but also because the only people using ActiveResource are Rails developers. DHH is an amazing evangelist, but he's a terrible ambassador. ActiveResource will probably only get users from within the Rails community for the time being. This makes it even more likely that people will want to use it as a transparent network proxy to ActiveRecord. It's just what most ActiveResource users will expect.

With that in mind, I might as well say here what I said at the Rails Edge mini-conference and in blog posts the other day. At my current project we have a small but useful extension to ARes. We've implemented an ActiveRecord-style find(:include), so you can do ARes finds with association classes included. It was actually very easy to code. The tricky part is packaging it. It may have to become two plugins, one for the client side and one for the server side. It mostly uses code from Edge Rails which makes #to_xml more robust on the server side, but it also patches ARes slightly on the client side.

Be warned, this architecture has a downside. It's not just that ActiveResource doesn't work with form handlers or validation or anything else with the same ease and grace as ActiveRecord. If you've ever heard war stories (or horror stories) from EJB developers, you know that duplicating model code for two sides of a network is always a pain in the ass. Fortunately, the best practice strategy for EJB development is to write Ruby code to auto-generate your EJB files. I use best practice in the sense that it's a practice, and the best one, not in the usual "nobody ever got fired for choosing IBM" sense of the term. I'm not trying to suggest, for instance, that EJB developers have ever actually heard of it. This obscure best practice was defined in a book called Code Generation In Action. The author, Jack Herrington, included EJB examples because he had to. He wanted to make it a Ruby book, but his publishers wouldn't let him, because at the time, the market didn't exist. So it's a Ruby book disguised as a general programming book which just happens to have Ruby in it. And it has the best solution for EJB ever written.

As one of the first Ruby books available which didn't require learning Japanese, I suspect Code Generation In Action had a major influence on the design of Rails. I found out about it from DHH's blog, and certainly code generation is one of Rails' core strategies. Ruby makes code generation so easy that duplicating model code for two sides of a network doesn't have to be any kind of inconvenience at all. Although I currently don't have any code to address this, we may need it at my current project sooner or later, and if I were to lay out any kind of ActiveResource roadmap to make the framework more useful for people who want to use it this way, automating duplicate model generation would be a major milestone.
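As a rough idea of what automating duplicate model generation could look like, here is a small sketch using ERB from the standard library. Everything in it is an assumption made for illustration: the `SCHEMA` hash is written by hand rather than read from a real back-end app, `generate_client_model` is a hypothetical helper, and the site URL is a placeholder. The generated text is only printed, never loaded, so the sketch runs without ActiveResource installed.

```ruby
require 'erb'

# Hypothetical description of the server-side models. In practice this might
# be derived from the back-end app's schema; here it is written by hand.
SCHEMA = {
  'Person'  => %w[name email],
  'Address' => %w[street city]
}.freeze

CLIENT_MODEL_TEMPLATE = ERB.new(<<~'TEMPLATE')
  # Generated file, do not edit by hand.
  class <%= class_name %> < ActiveResource::Base
    self.site = "<%= site %>"
    # Attributes served by the back end: <%= attributes.join(', ') %>
  end
TEMPLATE

# Returns the Ruby source for one client-side model as a string.
def generate_client_model(class_name, attributes, site)
  CLIENT_MODEL_TEMPLATE.result_with_hash(
    class_name: class_name, attributes: attributes, site: site
  )
end

SCHEMA.each do |class_name, attributes|
  puts generate_client_model(class_name, attributes, 'http://backend.example')
  puts
end
```

A real generator would write each string to a file under the client app's models directory and be rerun whenever the back-end schema changes.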

Update: in practice there are very serious roadblocks inherent to this architecture. You lose so much by replacing ActiveRecord with ActiveResource that it's almost like coding Web apps without a framework at all. The architecture's very handy, but it doesn't come for free.

16 comments:

ActiveResource's biggest hurdle is that you're essentially putting HTTP lipstick on a scalable-messaging pig and thus vying for market share in industries and implementation situations where money is probably no object (finance, military, etc.) but covering your ass is. ActiveResource still has nowhere near the track record of JMS or, to a lesser extent, MSMQ, and doesn't have custom ActiveResource implementations by IBM, Oracle, TIBCO, et al. like JMS does. Those vendor implementations allow people in those industries to go home and sleep, because in the morning they have someone to call and yell at if something is wrong.

To be fair, I don't really blame DHH for not pushing ActiveResource. Not because it's not worth his time; it's just going to be a message that is either preached to the choir (the Kool-Aid-drinking Rails community) or falls on the deaf ears of 'big-iron' industry types.

Which comes to my next big point. Rails needs to have a big iron backer. Sun was Java's big iron backer, and, although it's not directly related to this context, IBM was Linux's big iron backer.

The point is that for Rails to ever make the crossover from web apps (whether they be simple or complex) to serious global usage in mission-critical infrastructure, we need to get some large company to say "Yes, we support Rails, and here is why."

No way. Big iron considered harmful. Absolutely central to Rails' philosophy is the idea that making those people happy is not enough fun to be worth doing. I absolutely agree. For me the Rails Envy guys have it all wrong. Rails vs .NET is a silly question. I mean, they're funny, definitely, but from my point of view that kind of programming is so not fun that if it was the only type of work available I'd move into another field. It's not Rails vs. PHP, it's Rails vs. drawing and painting. There's always been programming which was satisfying for its own sake. Rails is a way you can do that kind of programming in the daytime, while paying your bills.

Anyway, that would change with big iron involved. Big iron is the devil in Rails' worldview, and I'm absolutely down with that. The devil. Fundamentally the enemy. Rails doesn't need big iron, and if it encounters big iron, it should either attack or run away. Big iron would be so utterly poisonous to the Rails culture that whatever benefit an endorsement might give would not be worth it.

Also, in a sense, Rails already has big iron backers. Sun's already gotten behind Rails, by funding the JRuby project, and Jeff Bezos from Amazon has invested in the company which created it. But these big iron backers have nothing to contribute but money. All the vision comes from the community.

Your comparison of Java to Rails is way off, also. Sun wasn't Java's big iron backer - Sun was the creator of Java. Without Sun Java would not have existed. And there are huge stretches of Java which are just hideously disgusting. Pretty much all the good stuff in Java came from Josh Bloch, who was brought in to patch up the mess that Sun had made. Java is really not something to aspire to or to want to be like.

Also I think if it were lipstick on a pig it'd be HTTP lipstick on an object-brokering pig. Scalable messaging is a whole nother ball of wax.

I think you're way off on every point. But what's really weird about your comment is that you didn't specify what kind of benefit an endorsement would have in the first place. It's not even clear what the payoff in being endorsed by a "big iron backer" is supposed to be. It sounds like you're saying the big advantage is that it makes it easy to get crappy customers. But crappy customers are not good to get. A crappy customer is like food that's gone bad. You should avoid it no matter how hungry you are.

Firstly, I agree with Giles, stacking ActiveResource against JMS is a bit of a specious comparison. They serve two different needs.

Next, Big iron considered harmful indeed. I think a lot of companies are waking up to this idea as well. The biggest of the iron, Google, led the way here, showing that you could beat big vendor solutions with a combination of open-source and roll-your-own software.

The CYA you get from signing a contract with IBM or Oracle or whomever usually isn't worth the price. You spend a lot and get very little. If something breaks, it's one of two things: 1) You have a well-known problem, and you could probably find the solution in their knowledge base or with a bit of googling as fast as or faster than you can get an answer from their support organization. 2) You have an obscure problem, and if it's a monolithic product like Oracle you are tied to their patch cycle, which runs at their speed, not yours. Maybe their support can come up with a workaround, and maybe they don't make you upgrade your support contract to get it.

That may seem like a jaded perspective, but it has been typical of my experience with big iron support. Open source community software is almost always a better solution IMHO. Even with something like MySQL or Tomcat you can often diagnose the problem and patch it as fast as or faster than you can get a fix from a vendor. And if you can't do it you can find someone in the community who can, and maybe you have to pay them, but it's still probably less than your support deal. Also, there's all that money you saved on a license.

Exactly. I take it as a given that open source is superior to licensing.

Also, don't forget that the big iron backers had armies of people hacking away at these problems for a decade without once producing Ruby on Rails. Even the EJB solution I describe passed under their radar.

This discussion has VERY little to do with the subject of my post, however. One of the assumptions of my post is that open source is always superior in every case. Another assumption is that big iron is considered harmful. It's a good thing to question your assumptions from time to time but for me personally that time is not now. Seriously, I'm sorry, but if you don't agree with those assumptions, that's a totally different conversation, and I personally don't have time to have it. I'm WAY busy. Whether true or false, those assumptions are the foundation of my post. The big iron assumption is practically the foundation of my whole blog.

True. I was writing and realized that my comment was about two different things, big iron and ActiveResource as a scalability pattern, so I split it up, and since I'm cynical and jaded, the big iron part came more easily.

Back on-topic:

Another great thing about ActiveResource is that it frees you from having to use ActiveRecord at all. You can still take advantage of the fun and (development) speed of Rails if you can't, or don't want to use ActiveRecord for some reason. The Rails resource conventions are pretty simple and should be easy to implement with any back-end you like.

I work on an open source sharded database project, HiveDB, and we are using the same web service -> ORM pattern with great success. Right now, HiveDB is just a Java library, but in the next couple of releases we are going to be pushing a lot of code from the Cafepress implementation down into the open source project and it will become a full webservice-as-data-access-layer solution.

It would be nice to have one Active(?) model class that could use ActiveRecord or ActiveResource as a persistence strategy to alleviate the duplicode problem. I wonder if there is a way to combine the ActiveRecord and ActiveResource hierarchies without the code getting too gnarly. The only thing I've done with ActiveResource so far is a little playing around with the Highrise wrapper, but I wonder if you could generate ActiveResource calls dynamically like ActiveRecord finders? If so you could have a model class that delegates to either an ActiveResource or ActiveRecord based on configuration. Just thinking out loud now, it might be a dumb idea. But if there was a convention around the controller method for finding one object by another's id then it seems like it wouldn't be too hard. Maybe something like ":resource/:id/:related_object_type" (generalized from Highrise feed urls).
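For illustration, the proposed URL convention and the idea of generating calls dynamically could be sketched like this. `ResourcePathBuilder` is a hypothetical class, not part of ActiveResource, and mapping `find_by_<attribute>` onto a query string is an assumption about how a back end might expose such lookups. It only builds request paths; it makes no HTTP calls.

```ruby
require 'uri'

# Hypothetical sketch: turn ActiveRecord-style dynamic finders into REST
# request paths. Nothing here is part of ActiveResource.
class ResourcePathBuilder
  def initialize(collection)
    @collection = collection # e.g. "people"
  end

  # ":resource/:id/:related_object_type", as proposed in the comment above.
  def related(id, related_type)
    "/#{@collection}/#{id}/#{related_type}"
  end

  # find_by_<attribute>(value) becomes a query against the collection.
  def method_missing(name, *args)
    attribute = name.to_s[/\Afind_by_(\w+)\z/, 1]
    return super unless attribute && args.length == 1
    "/#{@collection}?#{URI.encode_www_form({ attribute => args.first })}"
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?('find_by_') || super
  end
end

people = ResourcePathBuilder.new('people')
puts people.related(42, 'notes')
puts people.find_by_name('Dr. Gonzo')
```

A model class that delegates by configuration could hand paths like these to an HTTP client in one mode and translate the same finder names into database queries in the other.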

Maybe this is the point where AR dynamically generating getters and setters from the table becomes too much magic. It's only "too much" in the sense that it causes problems when you want to swap in ARes for it.

The problem with ARes, as you said, is that you don't know what attributes an object has until you've actually pulled the resource from the web. How do you handle it in forms? Use method_missing and accept every call? Obviously you have the problem of typos and no real checking until the AR server blows up.

So perhaps it's time to start defining the attributes inside the model itself. You have a single model that doesn't know anything about persistence, but it defines all the business logic - including basic getters and setters. Then you can swap it with some persistence strategy. For now this is either AR or ARes, but obviously that opens it up to whatever else you want.
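A minimal sketch of that idea, in plain Ruby: the model declares its own attributes and validation, and a separate object handles storage. `User`, `MemoryStore`, and the `save`/`fetch` interface are all hypothetical names chosen for the example; an ActiveRecord-backed or ActiveResource-backed store would need to expose the same two methods.

```ruby
# Hypothetical sketch: the model declares its own attributes and business
# logic, and is handed a persistence strategy instead of inheriting one.
class User
  ATTRIBUTES = %i[name email].freeze
  attr_accessor(*ATTRIBUTES)

  def initialize(attributes = {})
    attributes.each do |key, value|
      # public_send raises NoMethodError for a typo such as :nmae, so the
      # mistake surfaces on the client rather than on the server.
      public_send("#{key}=", value)
    end
  end

  def to_h
    ATTRIBUTES.to_h { |attr| [attr, public_send(attr)] }
  end

  # Business logic lives here, independent of where the data is stored.
  def valid?
    !name.to_s.strip.empty?
  end
end

# One possible strategy: an in-memory store. A database-backed or HTTP-backed
# strategy would expose the same two methods.
class MemoryStore
  def initialize
    @rows = {}
  end

  def save(id, model)
    return false unless model.valid?
    @rows[id] = model.to_h
    true
  end

  def fetch(id, model_class)
    row = @rows[id]
    row && model_class.new(row)
  end
end

store = MemoryStore.new
store.save(1, User.new(name: 'Dr. Gonzo', email: 'gonzo@example.com'))
puts store.fetch(1, User).name
```

Because the accessors are explicit, a form helper can ask the model which attributes exist before any resource has been fetched, which is the gap described in the comment above.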

Ultimately I think the problem lies in the ActiveRecord pattern itself. It's very useful and there's no denying that the Rails implementation is slick. But it couples business logic to persistence logic, and that makes it tough when you want to use some other persistence mechanism. I don't have PoEAA on hand, but basically Fowler says that AR is awesome when you can accept that coupling, but as soon as you start to get a bit more complicated you'll need to look elsewhere.

I think you're onto something there, but I think it's possible to make ARes map to ActiveRecord a lot more gracefully than it does right now. My current project is under a pretty tight deadline, but if I had the time to do it right, I think I'd have the same code which generates rake db:schema:dump auto-generate explicit attr_accessors or something similar on ARes models, and do it in a way compatible with what ARes does with method definition now. It's definitely **possible**, and I think with Ruby it might even end up elegant enough to be worth doing.

- Object brokering can be done via messaging systems. In fact, when you're talking about highly distributed systems across the world, that's what they rely on.

- I'm not sure if you had a bad day or what, but since you felt the need to have an attack fest on my post, let me just give you my main opinion.

1. We are not the only smart people who have access to computers

2. Other smart people who work in big buildings for companies with nice logos might see what we're doing over here and say 'that is a good idea'

3. Those people might start spreading and using Rails more and *gasp* CONTRIBUTE in the SAME design aesthetic that has gotten Rails this far.

4. People who work at "crappy companies" can enjoy software that was written by developers who were able to take advantage of all the nice things.

That's it. We are all WAY busy, so I'm not sure why you assumed I was looking for an ideological discussion. I like all the assumptions about this blog; I have many of the same. I am just stating what I have observed in industry.

Thanks for your blog, it's great btw.

But we're not the only ones doing cool stuff, and to think that our way and only our way is absolutely best now and will be absolutely best in the future is dangerous.

Or maybe "mixin some persistence strategy"? It seems like that would be the Ruby way to do it: make AR and ARes and A* persistence mixins. It would preserve at least some of the semantics of AR. Person.find_by_name('Dr. Gonzo') just reads so well. There is an Amazon web services plugin for AR that operates in a similar fashion, loads_from_amazon.
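Here is one way the mixin idea might look, as a sketch only. `MemoryPersistence` and its `attribute` macro are hypothetical; a real mixin would talk to a database or a web service rather than an in-memory array, and this is not how ActiveRecord or ActiveResource are actually structured.

```ruby
# Hypothetical persistence mixin. A real one would talk to a database or a web
# service; this one keeps records in an array held by the including class.
module MemoryPersistence
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def records
      @records ||= []
    end

    # Declares attributes on the model and defines a find_by_<attribute>
    # class method for each one.
    def attribute(*names)
      names.each do |name|
        attr_accessor name
        define_singleton_method("find_by_#{name}") do |value|
          records.find { |record| record.public_send(name) == value }
        end
      end
    end
  end

  def save
    self.class.records << self
    self
  end
end

class Person
  include MemoryPersistence # swap this mixin to change where records live
  attribute :name

  def initialize(name)
    self.name = name
  end
end

Person.new('Dr. Gonzo').save
puts Person.find_by_name('Dr. Gonzo').name
```

The model keeps the finder syntax regardless of which mixin is included, so switching strategies would be a one-line change as long as every mixin honors the same small interface.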

"Obviously you have the problem of typos and no real checking until the AR server blows up."

Yeah, the validation code would definitely need to be moved up into the persistence agnostic model class.

Aside: Every time I use words like agnostic I wonder how it would fit into the notion of opinionated software?

Hey Nima, nothing personal, I'm just very very firmly opinionated against big iron.

@Britt - I'll have to check out loads_from_amazon. I think Rails is opinionated about ARec and database-agnostic - you can combine opinionated with agnostic; in fact, choosing what to have opinions about is probably a very important step. In practical terms, everything I'm doing which attempts to treat ARes as a swap-in for ARec requires a **LOT** of steps to get it to work - manually combining date params, manually opening classes and/or objects and assigning accessor methods, manual this, manual that, etc. I think when I get time to think it through the changes will be extractable into something tidy, but so many of the Rails freebies are currently missing that it's almost like developing Web apps without a framework at all. I definitely hope I can plunder something from loads_from_amazon, because ARes just doesn't plug in effortlessly for ARec at all. Even [] needs to be defined on it, and the attributes stuff in ARes' method_missing is much less useful than it is in ARec, because it can't automatically handle associations or attributes that aren't passed over the network (e.g. user passwords).
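As an example of one of those manual steps, here is a sketch of combining date params by hand. Rails date helpers submit a date as several keys of the form `born_on(1i)`, `born_on(2i)`, `born_on(3i)`, which ActiveRecord reassembles for you. The helper name `combine_date_params` and the `born_on` attribute are made up for the example, and the sketch only handles three-part integer dates.

```ruby
require 'date'

# Collapses Rails-style multiparameter keys such as "born_on(1i)",
# "born_on(2i)", "born_on(3i)" into a single Date per attribute.
# Hypothetical helper; only handles three-part integer dates.
def combine_date_params(params)
  combined = {}
  parts = Hash.new { |hash, key| hash[key] = {} }

  params.each do |key, value|
    if (match = key.match(/\A(\w+)\((\d)i\)\z/))
      # match[1] is the attribute name, match[2] the position (1, 2 or 3).
      parts[match[1]][match[2].to_i] = value.to_i
    else
      combined[key] = value
    end
  end

  parts.each do |attribute, pieces|
    combined[attribute] = Date.new(pieces.fetch(1), pieces.fetch(2), pieces.fetch(3))
  end
  combined
end

form = {
  'name' => 'Dr. Gonzo',
  'born_on(1i)' => '1937', 'born_on(2i)' => '7', 'born_on(3i)' => '18'
}
p combine_date_params(form)
```

The cleaned-up hash can then be passed to a resource's constructor in place of the raw form params.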

This was touched on in several comments above, but to use ActiveResource as a mere wrapper for ActiveRecord is a recipe for trouble -- you lose a lot, and gain only a little. Instead, we're looking at ActiveResource as a slick but simple way of providing a unified interface for resources whose data sources do not lend themselves to ActiveRecord -- LDAP, legacy systems, etc. -- or better yet, when a model aggregates data from multiple sources.

There is a huge need for this in the enterprise, where -- unless you've signed your soul away to a single vendor (and even then...) -- a whole menagerie of heterogeneous systems need to talk. ActiveResource is simple enough to act as the common language. We're now in the process of putting our corporate user directory behind AR, providing a single interface for accessing a "User" model sourced from all over the place (ActiveDirectory LDAP, HR, voicemail system....). And because it's just HTTP and a bunch of standardized URLs, just about anything can talk to it.