Orion Henry on Heroku, Doozer and Paxos, Ruby


Bio: Orion Henry has been a principal architect and founder at Heroku since its inception in 2007. He designed much of the company's infrastructure and pioneered its use of Erlang. Orion has founded several technology companies in the last decade as well as run a software consultancy in Los Angeles. He graduated from UC San Diego in 1997 with a degree in Computer Science.


The Erlang Factory is an event that focuses on Erlang, the computer language designed to support distributed, fault-tolerant, soft-real-time applications with requirements for high availability and high concurrency. The main part of the Factory is the conference: a two-day collection of focused subject tracks with an enormous opportunity to meet the best minds in Erlang and network with experts in all its uses and applications.

Nice to meet you. My name is Orion Henry. I’m one of the three founders of Heroku, the Ruby Platform as a Service that launched in late 2007 and early 2008 as part of a Y Combinator batch and was acquired by Salesforce this past December/January. I’m also a big Ruby, Go, and Erlang advocate, here at the Erlang Factory.

We are trying to provide a product that is different from hosting: it is not hosting in the same way that an automobile is not a horse and carriage. We are trying to make that entire idea obsolete and provide users with a powerful, integrated cloud platform that automatically takes care of all the basic maintenance and futzing you deal with in any kind of hosting environment. Much in the same way that Amazon, when it came out with EC2, virtualized away a lot of what you had to deal with on physical servers and physical hardware, we are virtualizing away a lot of what has to do with the operating system and the organization of your application. With our platform, all you really need to deal with is your code and the deployment of that code, which is the highest level of abstraction currently available for deploying and running code in production.

I believe the term that most properly fits us is PaaS: platform as a service. We are not Software as a Service, which would be providing email as a service like Gmail or CRM as a service like Salesforce; those qualify as SaaS. If we were providing virtualized VPS or EC2-like services, that would be infrastructure as a service. We provide something very different from both of those: a platform for running and deploying your code. It isn’t servers and it isn’t running software.

On our platform, a Ruby application gets a Postgres database by default for you to store your data in, and you can use that as a queue if you want to. We also have a wide catalogue of add-ons, so if you want email services, New Relic, or a queuing system, these are available in our add-on catalogue. Some of them are made internally by Heroku; others are made by third-party vendors who run their services in the EC2 cloud and can therefore provide you low-latency access to their queues and other services.

We don’t do any automatic scaling on the platform at this time. We provide users with the ability to independently scale whatever processes they need, because each application has a very different scaling profile. It would be very dumb of us to naively throw 100 processes at an application the moment it comes under load, because for all we know that would cause more contention and more locking in your database and make things worse, not better. And given that you are paying for these dynos, scaling them up when you may not need them is unwise. A lot of our users either have a good sense of when the load is going to go up and preemptively raise the number of dynos they need (or upgrade to a larger database if they need one), or, in some cases, have monitors on their code that notice “we’re under too much load and it’s time to allocate more resources” and make an API call into our system requesting more dynos in an automated fashion. That’s how it currently works.
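The “monitor on your code” approach boils down to a small decision function that runs before the scaling API call. Below is a minimal Go sketch of that decision only, assuming the monitor can observe some backlog number (queued jobs, in-flight requests) and knows roughly how much one dyno can absorb. The function name, the per-dyno capacity, and the min/max bounds are all hypothetical, and the actual Heroku API request is deliberately left out rather than guessed at.

```go
package main

import "fmt"

// desiredDynos is a hypothetical helper: given the current backlog and how
// much one dyno can absorb, it returns how many dynos to request, clamped to
// [minDynos, maxDynos]. The upper bound matters: throwing unlimited processes
// at load can add database contention instead of relieving it.
func desiredDynos(backlog, perDyno, minDynos, maxDynos int) int {
	if perDyno <= 0 {
		return minDynos
	}
	// Ceiling division: a backlog of 250 at 100 per dyno needs 3 dynos.
	n := (backlog + perDyno - 1) / perDyno
	if n < minDynos {
		n = minDynos
	}
	if n > maxDynos {
		n = maxDynos
	}
	return n
}

func main() {
	current := 2
	want := desiredDynos(250, 100, 1, 10)
	if want != current {
		// A real monitor would make its scaling API call here.
		fmt.Printf("scale web dynos from %d to %d\n", current, want)
	}
}
```

Keeping the decision separate from the API call makes it easy to test the policy (and its ceiling) without touching production.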

With the launch of the Cedar stack, NodeJS is now officially supported on our platform. We did an experimental launch with it, kind of for fun, about a year ago, but it wasn’t very useful because our HTTP stack was still doing HTTP 1.0: connections were downgraded to it as they went through nginx at one point and Varnish at another. So long polling, chunked responses, and the other things people enjoy didn’t work on the platform, and there was not much benefit at that time. We have since upgraded our stack to have all the features people are looking for, so you can now deploy NodeJS apps. There is a proper slug compiler that pulls in dependencies and libraries and adds them to your application appropriately, and we give you all the routing you need for your NodeJS apps.

JavaScript is a much maligned and misunderstood language, but in fact it’s a very light abstraction over the reactor model of concurrency. Honestly, I didn’t understand that the first couple of hundred times I wrote JavaScript; I just thought it was a completely insane decision to make you pass functions in as arguments to get every result. But once you understand the reactor approach to concurrency, it is a very valid strategy, and JavaScript is a pretty good implementation of it. Really, though, the reason NodeJS is an excellent choice is that there is a lot of energy and momentum behind it right now; a lot of bleeding-edge people are enjoying it and playing with it at the moment. It is also still a relatively small community that isn’t yet fragmented, so we can provide a single solution for NodeJS users that will make all of them happy. If instead we went with something generations older, say C++ or Fortran support, there are so many different ways people use those languages, so many different libraries, and so many different deployment techniques that we would never be able to make anyone happy. As we add languages to the platform, picking newer languages with a smaller base and less fragmentation obviously makes more sense for exploring our polyglotism.
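The reactor idea described above (start I/O, hand over a callback, and let a single loop run callbacks as results arrive) can be sketched in Go. This is a toy model under stated assumptions, not how Node or V8 is implemented: `Loop`, `FetchUpper`, and `Run` are hypothetical names, and “I/O” is simulated by a goroutine that uppercases a string.

```go
package main

import (
	"fmt"
	"strings"
)

// Loop is a toy reactor: work is started elsewhere, completions are posted
// to one channel, and a single goroutine runs each callback in turn, so
// callback code never runs concurrently with other callback code.
type Loop struct {
	events  chan func()
	pending int // outstanding operations; touched only by the loop goroutine
}

func NewLoop() *Loop { return &Loop{events: make(chan func(), 16)} }

// FetchUpper stands in for a non-blocking I/O call: the work happens in
// another goroutine and the result is delivered to cb on the loop goroutine.
func (l *Loop) FetchUpper(input string, cb func(string)) {
	l.pending++
	go func() {
		result := strings.ToUpper(input) // pretend this was slow I/O
		l.events <- func() { cb(result) }
	}()
}

// Run dispatches completion callbacks until no operations are outstanding.
// Callbacks may start more operations, just like nested callbacks in Node.
func (l *Loop) Run() {
	for l.pending > 0 {
		cb := <-l.events
		l.pending--
		cb()
	}
}

func main() {
	loop := NewLoop()
	for _, w := range []string{"ruby", "node"} {
		loop.FetchUpper(w, func(s string) { fmt.Println("callback got", s) })
	}
	loop.Run()
}
```

Because all callbacks run on one goroutine, they can share state without locks, which is the same trade-off that makes single-threaded JavaScript workable under heavy I/O.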

Yes, Node does a good job, at least for I/O-bound tasks, of exploiting concurrency within a single V8 process. But just like with a Ruby process, if you need more than one process you can simply scale the number of backends up and down to handle whatever it is you are doing.

Yes, we certainly considered offering JRuby, and at some point in the future we might. We don’t have a hard roadmap for what other versions of Ruby or what other languages we might offer, but that is certainly one we are excited about providing to people.

We’ve gone through three stacks now: Aspen was our original, Bamboo was the de facto stack for the last year and a half or so, and Cedar is what we just launched last week, so that’s A, B, C. We went through and found a lot of Ruby-specific parts of our stack and cleaned them up into a more general process-management abstraction. We’ve improved the tools for launching and running processes, and we’ve created this thing called “Procfile”, which is basically a very general way to describe the processes that make up your app: how to run them, what they’re called, and how many of them there are. In Bamboo we said, “Listen, this is the way your app works: you have web and background processes; for web you use Thin, and for background processes you can use DJ (Delayed Job),” and figured that would cover most people’s needs. Truth be told, there are a lot of other ways you might want to construct an app. You may not want to use Thin; you might want to use Goliath. You might not want to use DJ; you might want to use Beanstalk or a RabbitMQ consumer. Now with Procfile you can enumerate all the different types of processes you need, scale them independently, and specify exactly how you want them to be launched. If you are working locally, you can use the foreman gem to spin all these processes up and test on your laptop. You can also have foreman export Upstart scripts, so if you want to take your app and its Procfile to production without using Heroku, you can generate all the Upstart scripts necessary to do it. And if you deploy to Heroku, we’ll read your Procfile and determine all your different process types, and you can say “I need three high-priority queues, five low-priority queues, and ten web workers.” All of that is easily done and easily scaled in our cloud environment once the Procfile is in place.
Procfile, I would say, is probably the key cornerstone of this new system, where Heroku is becoming a platform not just for Thin and DJ but for general process management, however you assemble the processes that make up your application. There are a lot of other things that come with it, including the move from one free dyno to free dyno hours, which will become more interesting later on as we give you tools to play with your idling behavior and things like that, and an upgraded HTTP stack that gives people better control over HTTP requests, their SSL endpoints, and so on.
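To make the Procfile idea concrete, here is a minimal Go sketch of reading one, assuming the one-process-per-line `name: command` format. The accepted name characters, the comment handling, the `worker.rb` script, and the `scale` map are illustrative assumptions; this is not Heroku’s or foreman’s actual parser.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// procLine matches "name: command". The allowed name characters are an
// assumption for this sketch, not a formal spec.
var procLine = regexp.MustCompile(`^([A-Za-z0-9_-]+):\s*(.+)$`)

// ParseProcfile maps each process type to the command that launches it.
// In this sketch, blank lines and lines starting with '#' are skipped.
func ParseProcfile(src string) (map[string]string, error) {
	procs := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(src))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		m := procLine.FindStringSubmatch(line)
		if m == nil {
			return nil, fmt.Errorf("line %d: expected \"name: command\"", n)
		}
		procs[m[1]] = m[2]
	}
	return procs, sc.Err()
}

func main() {
	// Hypothetical Procfile: Thin for web plus two queue workers driven by
	// a made-up worker.rb script.
	src := `web: bundle exec thin start -p $PORT
worker_high: bundle exec ruby worker.rb high
worker_low: bundle exec ruby worker.rb low`
	procs, err := ParseProcfile(src)
	if err != nil {
		panic(err)
	}
	// Desired counts per type: "three high, five low, ten web".
	scale := map[string]int{"web": 10, "worker_high": 3, "worker_low": 5}
	for name, cmd := range procs {
		fmt.Printf("%s x%d: %s\n", name, scale[name], cmd)
	}
}
```

The point of the format is that process types are just names: swapping Thin for Goliath or DJ for a RabbitMQ consumer only changes the command on one line, and each name can be scaled on its own.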

We currently officially support Ruby and NodeJS, but there is a lot of flexibility in the Procfile model. I’ve been looking online and have seen a bunch of people who found new ways to run and deploy languages and types of applications we don’t officially support yet. I’ve seen Python and Django apps running on our platform, I’ve seen Clojure apps running on our platform [Editor's note: Heroku now officially supports Clojure], and I’ve seen Go apps running on our platform, so it’s pretty exciting to see what people can do. There is a lot of flexibility there, and I suspect we’ll see a lot more interesting things come out of the platform now that there is so much more ability to adjust what you deploy to it.

Our core process management infrastructure is still written predominantly in Ruby. We use Erlang for our routing infrastructure: the HTTP router, the syslog router, and the TCP router are all written in Erlang, which gives us a great deal of control, reliability, and concurrency. Erlang is an excellent language when you need to build very robust, highly demanding, and very concurrent applications. But the central logic that launches the Cedar processes, the API calls, the web interface, and the command line interface are all written in Ruby. Go is the third language we use, and it shows up in Doozer, the open source Paxos implementation that Blake and Keith recently launched and that we are starting to integrate into our platform.

Well, Paxos solves highly available, distributed consistency; it sits in a different corner of the CAP table than most data stores. A bunch of academic papers came out describing this algorithm for distributed consensus, where you have a bunch of nodes and any time there’s a mutation of state they all vote on it. You have to have a quorum of members available in order to change the state. If you get isolated, you basically fall into read-only mode until you can resync with the rest of the quorum; you might get bumped out as the master node, with another node nominated in your place, and you find that out when you resync. If you need no single point of failure and you need consistent data, Paxos is the only algorithm that provides that. We couldn’t find any good open source implementations of the Paxos algorithm, which is a shame, because having no single point of failure is something everybody wants. There are certain operational and performance trade-offs you need to make to get that kind of consensus, but the utility provided is extraordinary. One of the biggest keys to the success of Google’s infrastructure is that they implemented Paxos internally, via a piece of software known as Chubby, and that’s one of the reasons they have been able to produce such a highly reliable, robust infrastructure for all of their things. We were looking for a good Paxos implementation because, however many single points of failure we removed from our platform, there was always one left somewhere that we could only push into a smaller and smaller area. We wanted to remove it entirely, and we realized that to get there we needed Paxos. Blake and Keith did a fantastic job researching and implementing the algorithm; it’s open source now, and everybody gets to benefit from it.
I’m envious of all the new startups that get to build things with access to something as cool as Doozer. It’s a great tool, and the world is a better place now that it’s out.
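The voting and quorum behavior described above can be illustrated with a minimal, in-memory sketch of single-decree Paxos in Go. It assumes synchronous calls, unique positive ballot numbers per proposal, and an `up` flag to simulate unreachable nodes; it is a teaching sketch of the textbook algorithm, not Doozer’s implementation, which also handles networking, a replicated log, and recovery.

```go
package main

import "fmt"

// Acceptor holds the state each voter keeps in single-decree Paxos.
type Acceptor struct {
	promised    int    // highest ballot promised so far (0 = none)
	acceptedN   int    // ballot at which a value was accepted (0 = none)
	acceptedVal string // value accepted at acceptedN
	up          bool   // false simulates an unreachable node
}

// Prepare (phase 1): promise to ignore lower ballots and report any
// previously accepted value.
func (a *Acceptor) Prepare(n int) (ok bool, accN int, accVal string) {
	if !a.up || n <= a.promised {
		return false, 0, ""
	}
	a.promised = n
	return true, a.acceptedN, a.acceptedVal
}

// Accept (phase 2): accept the value unless a higher ballot was promised.
func (a *Acceptor) Accept(n int, v string) bool {
	if !a.up || n < a.promised {
		return false
	}
	a.promised, a.acceptedN, a.acceptedVal = n, n, v
	return true
}

// Propose runs both phases with ballot n. It returns the chosen value, or
// ok=false when no majority answered (an isolated minority cannot write).
func Propose(acceptors []*Acceptor, n int, v string) (string, bool) {
	quorum := len(acceptors)/2 + 1
	promises, bestN := 0, 0
	for _, a := range acceptors {
		if ok, accN, accVal := a.Prepare(n); ok {
			promises++
			// Safety rule: adopt the highest-ballot value already accepted.
			if accN > bestN {
				bestN, v = accN, accVal
			}
		}
	}
	if promises < quorum {
		return "", false
	}
	accepts := 0
	for _, a := range acceptors {
		if a.Accept(n, v) {
			accepts++
		}
	}
	return v, accepts >= quorum
}

func main() {
	nodes := []*Acceptor{{up: true}, {up: true}, {up: true}}
	v, ok := Propose(nodes, 1, "db1 is master")
	fmt.Println("ballot 1 chose:", v, ok)
	v, ok = Propose(nodes, 2, "db2 is master")
	fmt.Println("ballot 2 chose:", v, ok) // the already chosen value survives
	nodes[1].up, nodes[2].up = false, false
	_, ok = Propose(nodes, 3, "db3 is master")
	fmt.Println("ballot 3 with one node up:", ok)
}
```

The two rules that make this safe are visible in the code: nothing changes without a majority, and a new proposer must carry forward any value a majority may already have chosen.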

Go is a good choice. It’s a low-level language, more similar to C than most higher-level languages, and it has a really robust concurrency model. Goroutines communicate over channels in a way that is actually remarkably similar to Erlang’s messaging system. It is a slightly different design, though, and the fundamental difference is that in Erlang you send a message asynchronously and block to receive it, whereas in Go, on an unbuffered channel, the sender blocks too until a receiver takes the message. It’s a slightly different trade-off, and you get different issues with contention and backed-up queues of data, but honestly they are a lot more similar than most people realize, especially in the way they approach concurrency. It’s a cool language; Blake and Keith got very excited about it when it first came out and wanted to do something interesting with it, and I think it was a good choice. Erlang would have been another fantastic language to build it in, but Go turned out to be an excellent choice and got the project done.
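The send-side blocking difference can be shown directly. In this Go sketch, `trySend` (a name introduced here for illustration) uses `select` with a `default` case to report whether a send would complete without waiting. An unbuffered channel is a rendezvous, so the send cannot complete with no receiver; a buffered channel behaves a bit more like an Erlang mailbox, except that it is bounded and the sender blocks once it fills.

```go
package main

import "fmt"

// trySend reports whether a send on ch completes immediately, without
// waiting for a receiver or for buffer space.
func trySend(ch chan string, msg string) bool {
	select {
	case ch <- msg:
		return true
	default:
		return false
	}
}

func main() {
	// Unbuffered: a send is a rendezvous, so with no receiver it would block.
	rendezvous := make(chan string)
	fmt.Println("unbuffered send ok:", trySend(rendezvous, "hello"))

	// Buffered: closer to an Erlang mailbox, but bounded. Sends succeed
	// until the buffer is full, then the sender would block (backpressure).
	mailbox := make(chan string, 2)
	for i := 1; i <= 3; i++ {
		fmt.Printf("buffered send %d ok: %v\n", i, trySend(mailbox, "msg"))
	}

	// Receiving blocks until a message arrives, as with Erlang's receive.
	go func() { rendezvous <- "from goroutine" }()
	fmt.Println("received:", <-rendezvous)
}
```

The bounded buffer is where the “backed-up queues” trade-off shows up: Go pushes back on the sender, while an Erlang mailbox keeps growing until the receiver catches up.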

Given that you need consensus from multiple nodes to make any kind of mutation of state, you can get very good read performance on the Paxos nodes, but write performance suffers as a result in the case of Doozer. You can get hundreds of writes per second, versus the tens or hundreds of thousands of writes per second you might normally get on a system that doesn’t need to check with all of its buddies every time it wants to change state. Normally you take a multi-tiered approach. For instance, if you have master-slave replication with a database such as Postgres, MySQL, or Redis, you can use a Doozer cluster, a Paxos cluster, to figure out who the master is at any given time. If you want to know whether it’s time to promote a new master, you can decide that reliably through consensus, so you don’t get those horrible problems where a network partition happens, half the nodes decide “it’s time to promote a new master” and half don’t, and your data ends up all over the place. So you use it to coordinate other, less consistent data sources and nodes; that’s a good way to take advantage of its robustness without limiting yourself to a couple of hundred writes per second.
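The master-promotion pattern can be sketched as a compare-and-set against a consistent store: each replica reads the master key and its revision, then tries to write itself in only if nobody changed the key in the meantime. In the Go sketch below, `Store`, `CompareAndSet`, `TryPromote`, and the `/db/master` key are hypothetical, and the mutex-guarded map merely stands in for a consensus-backed service; it is not Doozer’s client API.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical stand-in for a consensus-backed key/value store.
// Each key has a revision; CompareAndSet only succeeds if the caller saw the
// latest revision, which is what keeps the election below to one winner.
type Store struct {
	mu   sync.Mutex
	vals map[string]string
	revs map[string]int
}

func NewStore() *Store {
	return &Store{vals: map[string]string{}, revs: map[string]int{}}
}

// Get returns the current value and revision of key (revision 0 = unset).
func (s *Store) Get(key string) (string, int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.vals[key], s.revs[key]
}

// CompareAndSet writes val only if key is still at revision rev.
func (s *Store) CompareAndSet(key string, rev int, val string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.revs[key] != rev {
		return false
	}
	s.vals[key] = val
	s.revs[key] = rev + 1
	return true
}

// TryPromote lets a replica claim the master key only if the master record
// it observed (at revision rev) is still the current one.
func TryPromote(s *Store, node string, rev int) bool {
	return s.CompareAndSet("/db/master", rev, node)
}

func main() {
	s := NewStore()
	s.CompareAndSet("/db/master", 0, "db1")

	// db1 looks dead; two replicas read the same revision and race to take over.
	_, rev := s.Get("/db/master")
	var wg sync.WaitGroup
	for _, node := range []string{"db2", "db3"} {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			if TryPromote(s, n, rev) {
				fmt.Println(n, "promoted")
			}
		}(node)
	}
	wg.Wait()
	master, _ := s.Get("/db/master")
	fmt.Println("master is", master)
}
```

Only the election goes through the slow, consistent store; the database’s own heavy write traffic never touches it, which is the multi-tiered split described above.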

How NoSQL buzzword-compliant are we? Well, I don’t know about that. We use Redis pretty heavily; we’re big fans of Redis. We now have Doozer, which you could argue is a NoSQL data store of our own creation. We are not leaning heavily on any other NoSQL data stores, not because they’re not awesome (there are a lot of really good ones out there) but because there hasn’t been a really keen fit. At the same time, we are also very SQL-friendly: with our Shogun add-on for Heroku we run probably the biggest SQL-database-as-a-service system on the planet, and we have some fantastic SQL experts on our team. Ryan came out with Queue Classic, which provides a really nice low-latency, evented queuing system on top of Postgres tables; it’s way more efficient than any other implementation of that I have ever seen. When it comes to data, Heroku is a great place in general, SQL or NoSQL.

I do, though not as much as I used to. I have a lot of responsibility now dealing with the company as a larger entity: managing people and providing guidance on the direction of the company. But I would go crazy if I couldn’t code. I am an engineer at heart, so I have to go in and write code; I need my maker hours to stay sane.

A lot of things have improved. Ruby 1.9.2 is a much better implementation of the language than we have had in the past. A couple of years ago there was a lot of fragmentation around the Ruby VM: it was very clear the VM was not as good as it could be, and you had at least five or six teams all heading off in their own directions doing a lot of interesting work. There was Rubinius, which was funded by Engine Yard and pioneered the whole RubySpec process, which was fantastic; JRuby provided a lot of integration with the Java world and a good threading model for people who needed it. But things seem to be consolidating now on the 1.9.2 implementation, and it’s nice to see that fragmentation come down. The RubyGems system was kind of a mess when it first got started, and the ability to publish and install gems was a bit spotty, but since Gemcutter came out that’s been fantastic, and it now serves as the canonical gem source. Bundler has come around, and now there’s finally an excellent app-specific way to specify and install gem dependencies, which is another thing the language really needed. The syntax has also gained a lot of nice sugar in the 1.9 release. I’m very happy with Ruby and the strides it has made as a language in the last couple of years, and that growth has really helped Heroku too. When we first made Heroku, we had to have a forked version of Rails installed on all of our runtimes so that we could do things slightly differently than Rails wanted to, in order to run in our cloud environment. We’ve been able to submit a lot of those changes to the community and to the Rails developers and get them incorporated, so as it stands now there is almost nothing we have to do to a Rails app for it to run well, because a lot of the lessons we learned on our platform have been integrated into Rails and into Ruby.

1.9.2 is the default, and I believe it is the only Ruby VM currently installed. There may be support for other Ruby VMs down the road, but as of this release 1.9.2 is where it’s at. We are hoping the community will really start standardizing on the 1.9.2 VM; that consolidation would help everyone.