
Automated deployment systems: push vs. pull

I've been immersed in the world of automated deployment systems for quite a while. Because I like Python, I've been using Fabric, but I also dabbled in Puppet. When people are asked about alternatives to Puppet in the Python world, many mention Fabric, but in fact these two systems are very different. Their main difference is the topic of this blog post.

Fabric is what I consider a 'push' automated deployment system: you install Fabric on a server, and from there you push deployments by running remote commands via ssh on a set of servers. In the Ruby world, an example of a push system is Capistrano.

The main advantages of a 'push' system are:

control: everything is synchronous, and under your control. You can see right away if something went wrong, and you can correct it immediately.

simplicity: in the case of Fabric, a 'fabfile' is just a collection of Python functions that copy files over to a remote server and execute commands over ssh on that server; it's all very easy to set up and run
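To make the "just a collection of functions" point concrete, here is a minimal push-style sketch. It uses only the Python standard library and is not Fabric's API. It assumes key-based ssh access and that `ssh` and `scp` are on the PATH. The host names, paths and the `put`/`run`/`deploy` helpers are hypothetical. The `runner` parameter lets you dry-run the commands instead of executing them.

```python
import shlex
import subprocess
from typing import Callable, List, Sequence

# A runner receives an argv list; the default executes it and raises on failure.
Runner = Callable[[List[str]], None]


def run_local(argv: List[str]) -> None:
    subprocess.run(argv, check=True)


def put(host: str, local_path: str, remote_path: str, runner: Runner = run_local) -> None:
    """Copy a file to a remote host with scp (hypothetical helper)."""
    runner(["scp", local_path, f"{host}:{remote_path}"])


def run(host: str, command: str, runner: Runner = run_local) -> None:
    """Execute a shell command on a remote host over ssh (hypothetical helper)."""
    runner(["ssh", host, command])


def deploy(hosts: Sequence[str], tarball: str, runner: Runner = run_local) -> None:
    """Push a release tarball to each host in turn and unpack it (hypothetical layout)."""
    for host in hosts:
        put(host, tarball, "/tmp/release.tar.gz", runner)
        run(host, "tar xzf /tmp/release.tar.gz -C /srv/app", runner)


if __name__ == "__main__":
    # Dry run: record the commands instead of executing them.
    commands: List[List[str]] = []
    deploy(["web1", "web2"], "release.tar.gz", runner=commands.append)
    for argv in commands:
        print(shlex.join(argv))
```

Because every step is synchronous and `check=True` stops at the first failing command, you find out about a problem right away. That is the "control" advantage described above.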

The main disadvantages of a 'push' system are:

lack of full automation: it's not usually possible to boot a server and have it configure itself without some sort of client/server protocol, which push systems don't generally support (see 'pull' systems below for that)

lack of scalability: when you're dealing with hundreds of servers, a push system starts showing its limits, unless it makes heavy use of threading or multi-processing
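The threading mitigation mentioned above can be sketched with the standard library's `concurrent.futures`. The work is I/O-bound because each thread mostly waits on an ssh process. This is an illustrative sketch, not Fabric's own mechanism. The host names and the `max_workers` default are assumptions. Exit codes are collected rather than raised, so one bad host does not stop the rest.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# A runner receives an argv list and returns the process exit code.
Runner = Callable[[List[str]], int]


def ssh_runner(argv: List[str]) -> int:
    return subprocess.run(argv).returncode


def run_everywhere(
    hosts: Sequence[str],
    command: str,
    runner: Runner = ssh_runner,
    max_workers: int = 20,
) -> Dict[str, int]:
    """Run the same command on many hosts concurrently; map host -> exit code."""

    def one(host: str) -> Tuple[str, int]:
        return host, runner(["ssh", host, command])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(one, hosts))


if __name__ == "__main__":
    results = run_everywhere(["web1", "web2"], "uptime")
    failed = [host for host, code in results.items() if code != 0]
    print("failed hosts:", failed)
```

Even with threads, the single machine driving the deployment still holds every connection open at once. That is why this helps up to a point rather than removing the limit.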

Puppet is what I consider a 'pull' automated deployment system (to be more precise, it is a configuration management system). In such a system, you have a server which acts as a master, and clients which contact the master to find out what they need to do, thus pulling their configuration information from the master. In Puppet, configuration files are called manifests. They are written in a specific language and they are declarative, i.e. they tell each client what to do, not how to do it. The Puppet client software running on each server knows how to interpret the manifest files and how to translate them into actions specific to the operating system of that server. For example, you specify in your manifest file that you want a user created; you don't need to say 'run the adduser command on server X'. Other examples of 'pull' deployment/configuration management systems are bcfg2 (Python), Chef (Ruby) and slack (Perl). A newcomer in the Python world is a port of Chef called kokki (it looks like it's still very much in its infancy, but I hope the author will continue to actively develop it).
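In Puppet the user example would be written roughly as `user { 'deploy': ensure => present, }`. The Python sketch below illustrates the declarative idea; it is not how Puppet is implemented. A client compares the declared state with what already exists and turns only the difference into commands for its own OS family. The manifest format, the `USER_ADD` table and the OS family names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical per-OS command templates for the "user" resource type.
USER_ADD: Dict[str, List[str]] = {
    "debian": ["adduser", "--disabled-password", "--gecos", "", "{name}"],
    "redhat": ["useradd", "-m", "{name}"],
}


def plan(
    manifest: List[Dict[str, str]], os_family: str, existing_users: Set[str]
) -> List[List[str]]:
    """Translate declarative resources into OS-specific commands, skipping
    anything that is already in the desired state."""
    actions: List[List[str]] = []
    for resource in manifest:
        if resource["type"] != "user":
            raise ValueError(f"unsupported resource type: {resource['type']}")
        if resource["name"] in existing_users:
            continue  # already present: nothing to do
        template = USER_ADD[os_family]
        actions.append([arg.format(name=resource["name"]) for arg in template])
    return actions


if __name__ == "__main__":
    manifest = [{"type": "user", "name": "deploy"}]
    print(plan(manifest, "debian", existing_users={"root"}))
```

On a Unix client, `existing_users` could come from the standard `pwd` module (`{p.pw_name for p in pwd.getpwall()}`). Running the plan twice is then harmless, because the second run finds nothing left to do.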

The main advantages of a 'pull' system are:

full automation capabilities: it is possible, and indeed advisable, to fully automate the configuration of a newly booted server using a 'pull' deployment system (for details on how I've done it with Puppet, see this post)

increased scalability: in a 'pull' system, clients contact the server independently of each other, so the system as a whole is more scalable than a 'push' system
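Below is a rough sketch of the client side of a pull system. Each client independently asks the master for its own configuration and applies it. It first waits a random delay, similar in spirit to Puppet's `splay` setting, so that machines booted together do not all hit the master at the same moment. The master URL, the JSON response format and the `apply` callback are hypothetical; real Puppet agents speak their own protocol.

```python
import json
import random
import socket
import time
import urllib.request
from typing import Any, Callable, Dict

MASTER_URL = "http://master.example.com:8000/config/"  # hypothetical endpoint

Config = Dict[str, Any]


def fetch_config(hostname: str) -> Config:
    """Ask the (hypothetical) master for this host's configuration."""
    with urllib.request.urlopen(MASTER_URL + hostname, timeout=10) as resp:
        return json.load(resp)


def pull_once(
    hostname: str,
    fetch: Callable[[str], Config],
    apply: Callable[[Config], None],
    splay_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Config:
    # Random delay spreads out clients that start at the same time.
    sleep(random.uniform(0, splay_seconds))
    config = fetch(hostname)
    apply(config)
    return config


def pull_forever(interval: float = 1800.0, splay_seconds: float = 300.0) -> None:
    """Agent loop: pull and apply periodically. A real client would install
    packages, write files, etc. in apply; print is a placeholder."""
    hostname = socket.gethostname()
    while True:
        pull_once(hostname, fetch_config, print, splay_seconds)
        time.sleep(interval)
```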

The main disadvantages of a 'pull' system are:

proprietary configuration management language: with the notable exception of Chef, which uses pure Ruby for its configuration 'recipes', most other pull systems use their own proprietary way of specifying the configuration to be deployed (Puppet's language looks like a cross between Perl and Ruby, while bcfg2 uses...gasp...XML); this turns out to be a pretty big drawback, because if you're not using the system on a daily basis, you're guaranteed to forget it (as happened to me with Puppet)

scalability is still an issue: unless you deploy several master servers and keep them in sync, that one master will start getting swamped as you add more and more clients and thus will become your bottleneck
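If you do run several masters, each client still has to know which one to talk to. One simple approach is to hash each client's hostname. This gives every master a stable share of the fleet without a central lookup. It is an assumption for illustration, not a feature of any particular tool. Keeping the masters' manifests in sync is a separate problem and is not shown here.

```python
import hashlib
from typing import Sequence


def pick_master(hostname: str, masters: Sequence[str]) -> str:
    """Deterministically map a client to one of several masters."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return masters[int.from_bytes(digest[:8], "big") % len(masters)]


if __name__ == "__main__":
    masters = ["master1", "master2", "master3"]  # hypothetical host names
    for host in ["web1", "web2", "db1"]:
        print(host, "->", pick_master(host, masters))
```

Plain modulo hashing reshuffles most clients when a master is added or removed. Consistent hashing avoids that, at the cost of a bit more code.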

My particular preference is to use a 'pull' system for the initial configuration of a server, including all the packages necessary to deploy my application (for example tornado). For the actual application code deployment, I prefer to use a 'push' system, because it gives me more control over how exactly I do the deployment. I can take a server out of the load balancer, deploy, test, then put it back, rinse and repeat.
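The rolling deployment described above (out of the load balancer, deploy, test, back in, repeat) can be sketched as a plain loop. The four callbacks are hypothetical placeholders for whatever your load balancer and deployment scripts actually provide. The design choice here is to stop at the first failed smoke test and leave that server out of rotation, so the remaining servers keep serving the old version.

```python
from typing import Callable, Iterable, List

Step = Callable[[str], None]


def rolling_deploy(
    servers: Iterable[str],
    remove_from_lb: Step,
    deploy: Step,
    smoke_test: Callable[[str], bool],
    add_to_lb: Step,
) -> List[str]:
    """Deploy to one server at a time; return the servers that were updated."""
    done: List[str] = []
    for server in servers:
        remove_from_lb(server)
        deploy(server)
        if not smoke_test(server):
            # Leave the broken server out of rotation and halt the rollout.
            raise RuntimeError(f"smoke test failed on {server}; stopping rollout")
        add_to_lb(server)
        done.append(server)
    return done
```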

In discussions with Holger Krekel at PyCon, I realized that execnet might be a good replacement for Fabric for my needs. It already provides remote command execution via ssh, and an rsync-like file transfer protocol. All it needs is a small library of functions on top to do common system administration tasks, such as running commands via sudo. I also want to look into kokki as a replacement for Puppet in my deployment architecture.
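Here is a hypothetical first cut at such a helper library. To stay self-contained, it builds commands for plain `ssh` rather than using execnet's API, and the function names are made up. The one subtle point is quoting: ssh joins its arguments into a single string for the remote shell, so the command is quoted with `shlex.join`. `sudo -n` makes sudo fail instead of prompting, which assumes passwordless sudo is configured on the remote host.

```python
import shlex
import subprocess
from typing import List, Optional


def remote_argv(host: str, command: str) -> List[str]:
    """argv for running a shell command on a remote host over ssh."""
    return ["ssh", host, command]


def sudo_argv(host: str, command: str, user: Optional[str] = None) -> List[str]:
    """argv for running a command via sudo on a remote host, optionally as another user."""
    prefix = ["sudo", "-n"] + (["-u", user] if user else [])
    # Quote everything so the remote shell sees exactly one sh -c argument.
    return remote_argv(host, shlex.join(prefix + ["sh", "-c", command]))


def sudo(host: str, command: str, user: Optional[str] = None) -> str:
    """Run the command and return its stdout; raise if it exits non-zero."""
    result = subprocess.run(
        sudo_argv(host, command, user), check=True, capture_output=True, text=True
    )
    return result.stdout
```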

A parting thought: my colleague Dan Mesh suggested using a queuing mechanism for the client-server protocol in a 'pull' system. In fact, I am becoming more and more convinced that as far as scalability is concerned, when in doubt, use a queuing mechanism. In this deployment architecture, the master would post tasks to be done by a specific client to a central queue. The client would check the queue periodically for a task assigned to it, execute it, then send a report back to the server when done. Of course, you need to worry about authentication in this scenario, but it seems that it would solve a lot of the scalability issues that both push and pull systems exhibit. Who knows, we may build it at Evite and open source it...so stay tuned ;-)
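The flow just described (the master posts per-client tasks, clients poll, execute and report back) can be modelled in a few lines. In this sketch an in-process `queue.Queue` stands in for a real message broker. `TaskBroker`, `Task`, `Report` and `client_tick` are hypothetical names, and authentication is deliberately left out.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Task:
    client: str
    command: str


@dataclass
class Report:
    client: str
    command: str
    ok: bool


class TaskBroker:
    """Stand-in for a central queue: one task queue per client plus a shared report queue."""

    def __init__(self) -> None:
        self._tasks: Dict[str, "queue.Queue[Task]"] = {}
        self.reports: "queue.Queue[Report]" = queue.Queue()

    def _queue_for(self, client: str) -> "queue.Queue[Task]":
        return self._tasks.setdefault(client, queue.Queue())

    def post(self, task: Task) -> None:
        """Called by the master."""
        self._queue_for(task.client).put(task)

    def poll(self, client: str) -> Optional[Task]:
        """Called periodically by each client; returns None when there is no work."""
        try:
            return self._queue_for(client).get_nowait()
        except queue.Empty:
            return None


def client_tick(broker: TaskBroker, client: str, execute: Callable[[str], bool]) -> bool:
    """One polling cycle: take a task if there is one, run it, report the result."""
    task = broker.poll(client)
    if task is None:
        return False
    broker.reports.put(Report(client, task.command, execute(task.command)))
    return True
```

The master never connects to clients directly. It only writes to the queue, and each client decides when to pick up work, which is where the scalability benefit comes from.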


Comments

Good compare/contrast between the two different approaches. I've actually separated the two categories of tools not by "push" vs "pull" but rather orchestration vs configuration management. Orchestration provides a centralized coordination of multiple steps in the distributed environment. Configuration management drives each host to a particular state based on the specification (be it proprietary domain-specific or general purpose language).

You seem to suggest that with Fabric you can't bootstrap a system, but this is exactly what I am doing. Using Rackspace Cloud, I've been deploying Django sites in a matter of minutes after first getting access to the fresh VM.

"you install Fabric on a server"

I don't have Fabric on my server. Only on my desktop.

Am I doing something horribly wrong, or have I just misunderstood your post?

Dougal -- thanks for the comment. When I said that you can't bootstrap a system with Fabric, I was referring to a fully automated bootstrapping/application deployment scenario, where you just boot up a VM and have the boot mechanism configure it so it contacts a master server and 'pulls' its configuration information from there, then proceeds to configure itself with no further intervention from you. Note that you didn't run any commands on it, other than booting it up (for more details, see http://agiletesting.blogspot.com/2009/09/bootstrapping-ec2-images-as-puppet.html)

In your scenario, you boot up a VM, then from your desktop you 'push' commands to it via Fabric, telling the VM to install the appropriate packages, then your own application. This works well up to a point, and I don't recommend changing this procedure if it works for you. However, it starts to break down when you're dealing with hundreds of VMs.

So at the end of the day it's a question of scale, which is why the word 'scalability' appears so often in my blog post.

In any case, I am a big fan of Fabric myself, so I'm glad it works well for you too.

I actually think these push systems are nice complements to Puppet et al - use them for ad-hoc management when necessary, and for workflow-based tasks that Puppet doesn't (yet) support, and use Puppet for the core configurations.

For the record, Puppet can be used in something more like a push mode, but it's still not ad-hoc: you can push the code and trigger the run manually, but it's still going to be model-driven and tend to be a comprehensive configuration.

For me the bigger question is: are you making any kind of one-time change, like triggering a deployment, or are you declaring the overall state of the system? Eventually Puppet will excel at the deployment triggers, too, but for now it's better to use its declarative tools in combination with an ad-hoc tool like Fabric or Capistrano.

Luke -- thanks for the comment, I fully agree with you that push and pull tools can be successfully used in a complementary fashion. The Puppet + Fabric combo has been working well for me. As I said, a gripe I have w/ Puppet is that its configuration language is proprietary and easy to forget, unless you use it on a daily basis.

I did a bit of research into config management systems, and while most of them are pull, I did come across SmartFrog from HP Research, which was a peer-to-peer push/pull system and looked very interesting.

I use Fabric for most deployment functions, but it falls short on real config management. Currently I use buildout, due to a fairly small number of servers, but for scaling up I'll certainly look at SmartFrog or possibly Puppet again.

It seems like comparing Fabric to Capistrano is more apt than comparing it to Puppet. Deployment tools like Fabric and Capistrano are procedure oriented, whereas configuration management tools are more "state achievement" oriented. Yes, states are achieved by running procedures but the configuration management system should encapsulate them. I think ultimately you can have configuration management replace procedures; chef-deploy is a good example because it allows you to regard the code revision deployed as part of the configuration state.

I like what EngineYard has done (apparently RightScale does it too, they copied it): they use a message system (nanite) with Chef to tell the edges they need to update. That may make the server vulnerable to the thundering herd problem that all pull systems have, but it allows one to "push" in a scale-free manner. Multiple hosts updating simultaneously will swamp the configuration management system, but I think your point about putting a queue in to moderate a scaled-up environment and alleviate that is a good one.

Ian -- thanks for the comment. I did say that Fabric is in the same category as Capistrano, your 'procedure oriented' category, my 'push' category. You're making good points. Thanks also for the link to nanite. It seems similar to what I had in mind when I was referring to a queuing mechanism, but at the same time it also seems very complicated. As they say themselves, 'it has a lot of moving parts'. I would aim for something simpler.

Volcane -- I am aware of mcollective, it's a bit too complicated and proprietary for my taste. But I am following your project with interest and I am definitely getting new ideas from it, so great job with it!

R.I. Pienaar -- don't take my criticism too hard ;-) I was referring most of all to the RPC-based language that I'd have to learn if I were to write new clients and agents:

http://marionette-collective.org/simplerpc/

In general, learning somebody's framework's language turns me off, because my brain has limited capacity. I find that the more levels of indirection I need to learn, the harder it gets to remember everything I need for my job....

That's why I prefer Chef to Puppet, because at least I know everything is pure Ruby as opposed to somebody's 'proprietary' DSL.

All this being said, I do understand that you can do a lot with mcollective out of the box, without worrying about SimpleRPC etc.

OK, I see what you mean, gotta be said though it's not a DSL or custom language.

It's just Ruby with a few helpers; the only DSL-like bit is the description files, and they are 100% optional. But I agree that getting to know the design principles behind even a native Ruby framework can be a challenge.

You're perfectly right that deployments on a large number of servers will require multi-threading, but that's hardly rocket science. KwateeSDCM is a fairly simple and straightforward tool that does just that.

Hi, I'm new to deployment automation and am searching for deployment automation tools, including server provisioning. I went through your post on push vs. pull systems and found it interesting. Can you suggest such tools for Java and Java EE deployment automation? I have shortlisted a few tools, but can you suggest how Puppet would be useful for my case?
