I have a decent amount of experience at this point with puppet, both from using it to manage the infrastructure running Fedora and from setting it up at a pretty large scale at HubSpot. But in a new gig, I decided it was worth rounding myself out a bit and giving chef a try. Not out of any deep-seated dislike of puppet, but there are a few pieces that I’ve continued to run up against which are a little grating, and so I figured it was worth broadening my horizons. The nice thing is that both are fairly successful open source communities and realistically, as long as you’re using one of these systems, you probably can’t go that wrong, and you can always switch in the future.

Side-note: I’ve also been playing with Michael DeHaan’s new project, ansible, which is also interesting. But I don’t think it’s mature enough to use for a production environment yet, and I was mostly interested in it as a better remote execution layer as opposed to another full-fledged config management tool. But yeah. It’s there. It’s interesting. I’ll probably write more about it later.

With a little bit of chef time under my belt, I have to say that I’m not struck by drastic differences. The terminologies are different, the DSL used on the config side is a bit different but they act pretty similarly and you can get either of them to do what you want. That said, there are a few things (good and bad) that I’ve noticed about chef and figured I’d share for others who are looking at deciding for themselves. Note that a few of the things in the dislikes section may well just be me missing something and being a n00b… suggestions welcome!

Things I’ve Liked

Hosted Chef is a very very nice option to have. Props to the Opscode team for building an infrastructure to run the server side for you and especially for making the barrier to entry nearly zero by letting you manage up to five hosts for free. Given some of my headaches around running a puppetmaster previously, I’m glad not to have to pull together everything to run a chef server

Knife is actually pretty cool. I was skeptical before using it but it does a pretty nice job of encapsulating a lot of common tasks for you

Knife gets really cool with the addition of the ec2 plugin. Launch servers, register them with hosted chef and have them ready to go. I’ve built all of the surrounding bits and as the environment I’m dealing with grows, I think I’ll grow out of being able to use knife ec2 effectively, but it’s great for an easy starting point

Chef solo seems to work okay and have a few niceties over a master-less puppet setup but I didn’t spend much time with masterless puppet, so it’s probably just that I didn’t find the related nice pieces

Things I’ve Disliked / Been Annoyed By

The package support in the Fedora/CentOS/RHEL universe is pretty poor. I realize that all the cool kids use Ubuntu these days but tons of server infrastructures don’t. Todd does a great job with the puppet (+ ecosystem) packages for Fedora and EPEL. Would love to see someone do similar for all of the Chef stuff

A lot of the cookbooks that are out there and published are Ubuntu specific. Even the ones which strive to work across distros often end up coercing the Fedora universe to look more like Debian. Which isn’t necessarily a path I want to go down

Probably just a side effect of this, but a lot of cookbooks use things which aren’t the standard init system (e.g., depending on runit)

knife-ec2 makes you think you can get away with using it, but I keep tripping across things it doesn’t support, which makes me consider abandoning it

Like many people, we use Jenkins at work as our continuous integration server and we require that all changes that are committed go through being built in CI before they can get deployed. Yesterday, someone asked if we could add another jenkins slave to try to reduce the amount of time spent waiting on builds. While the slaves are fully puppetized and so it’s not much work to bring an additional slave online, my own anecdotal experience made me think that we weren’t really held up often in a way that additional slaves would help. I had a vague memory of some graphs within jenkins so eventually found them but didn’t really find them that enlightening. The scale is funky, it’s a weird exponential moving average and I just didn’t find it that easy to get any insight from them.

So last night, I sat down and wrote a quick little script to run via cron, pull some statistics and throw them into graphite. Already, with less than a day of data, I can see that we end up with a few periods of about ten minutes where having more executors could help, and that those periods are correlated with someone committing to one of the projects at the base of our dependency tree. So that gives us a much better idea of whether or not the cost of an additional machine is worth the few minutes that we’d be able to save in those cases.

Since it didn’t look like anyone else had done anything along these lines yet, I put the code up on github. There are a lot more stats that could be pulled out via the jenkins api, this is really just a starting point for what I needed today.
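For anyone who wants the general shape of the idea without digging through the repo, here is a minimal sketch. It is not the code I put up on github, just an illustration under a few assumptions: the two hostnames are placeholders, the executor counts come from the `totalExecutors` and `busyExecutors` fields of Jenkins' `/computer/api/json`, the queue length comes from the `items` list in `/queue/api/json`, and graphite is listening for its plaintext protocol on the default port 2003. The metric names and helper functions are made up for the example.

```python
import json
import socket
import time
from urllib.request import urlopen

# Placeholder hosts: assumptions for this sketch, not real servers.
JENKINS_URL = "http://jenkins.example.com"
GRAPHITE_HOST = "graphite.example.com"
GRAPHITE_PORT = 2003  # default port for graphite's plaintext protocol


def fetch_json(path):
    """Fetch and decode one endpoint of the Jenkins JSON API."""
    with urlopen(JENKINS_URL + path) as resp:
        return json.load(resp)


def build_metrics(computer, queue, prefix="jenkins"):
    """Turn the parsed API responses into a dict of metric name -> value."""
    total = computer.get("totalExecutors", 0)
    busy = computer.get("busyExecutors", 0)
    return {
        prefix + ".executors.total": total,
        prefix + ".executors.busy": busy,
        prefix + ".executors.idle": total - busy,
        prefix + ".queue.length": len(queue.get("items", [])),
    }


def format_lines(metrics, timestamp):
    """Render metrics as plaintext lines: '<name> <value> <timestamp>'."""
    return "".join(
        "%s %s %d\n" % (name, value, timestamp)
        for name, value in sorted(metrics.items())
    )


def send_to_graphite(payload):
    """Send the rendered lines to graphite over a plain TCP connection."""
    with socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT), timeout=10) as sock:
        sock.sendall(payload.encode("ascii"))


def main():
    """One polling pass: read Jenkins, push to graphite."""
    computer = fetch_json("/computer/api/json")
    queue = fetch_json("/queue/api/json")
    payload = format_lines(build_metrics(computer, queue), int(time.time()))
    send_to_graphite(payload)
```

To use something like this from cron, add the usual `if __name__ == "__main__": main()` at the bottom and schedule it once a minute. Keeping the fetching, the metric building and the sending in separate functions means the interesting part (which numbers you derive) can be tried out against canned JSON without touching a real Jenkins or graphite.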

I spent last week out in California for the O’Reilly Velocity Conference. It was in Santa Clara, which I hadn’t been to and frankly, I would be perfectly happy to not return. Parts of California are nice, Santa Clara is an office building wasteland. No good food options, nothing really going on, etc. But I was there for a conference and not for other stuff, so it sufficed.

The conference was actually very good. It has been a few years since I’ve been to a conference between grad school, my daughter being born, and being at a startup where conferences weren’t the priority. But it was good to get back to it. Had a lot of good hallway conversations with people about things that are relevant to us and saw a lot of good presentations. And Velocity is especially relevant to me at this point as it was all about various web performance and operations stuff. Where, unsurprisingly, there’s a lot of cool stuff going on.

I mostly kept to the more operations-y tracks just because they map better to what I’m currently working on. I’ve come away with a bunch of things to look into and posted a whole bunch of choice quotes over on Twitter, but a few takeaways boiled down for here would include

If you’re using a public cloud provider, plan for things to fail. Build your systems expecting it and you’ll have less pain.

HubSpot is doing an awesome job with post-mortems. DanM actually posted a great blog post over on our dev blog about things we’ve learned from doing a lot of them.

Everyone complains and focuses on javascript performance but that’s misguided. The bottleneck is the DOM. Interestingly, none of the browser guys talked about that apparently

DevOps has mostly been about putting developers into ops (hi!) but also needs to be about putting ops into dev

Web performance has been very successful in tying itself to business metrics. Weirdly, operations has overall been less successful at that

There’s a lot of work going on to help with debugging and working on webapps for mobile platforms. Very cool.

None of those are particularly earth shattering revelations, but still good to see/hear.

Also, on Tuesday night I did a talk for the Ignite track. So 5 minutes, 20 slides, auto-advancing. My topic was “Just Too Late” and was largely around some things I’ve discovered transitioning into a role where I’m doing more ops stuff and the fact that I feel like I get to things too late. But then turning it around and showing that’s not really so. Stay tuned for a longer blog post on the topic. But the talk went really well. It was fun, a lot of positive feedback and was good for me to get back to it. Looking forward to submitting some (full-length) proposals for talks for some conferences later this year.

I also had a few thoughts on the way conferences have changed since I last went to one

Twitter really is a pretty big game changer. Lots of conversation on twitter during the conference about which sessions were good, useful tidbits from sessions, etc. I actually felt that the experience was pretty strongly enhanced by it

Conference wireless still sucks. But you can get decent data now for devices and avoid the use of the conference wireless entirely. This made it easier to stay on twitter during the conference

An iPad (or other tablet) is a pretty perfect device for looking at stuff during a conference. It sits on your lap so you can just check it sporadically, the battery lasts all day, you can get data from a cellular provider, and it’s reasonably fast.

Anyway, good time was had. Thanks to all the people that I met and chatted up. And hopefully it won’t be as long before I make it to another conference

Instead of working on the product which is front and center to all of our customers or even working on the free tools at grader.com that millions of people use, I’m now instead focused quite a bit on various infrastructure related things for us. Obviously, I’ve done some of that all along, but at this point, it’s my primary job.

It’s a lot of fun. We are heavy users of EC2 and some of the other Amazon services. We also are using Rackspace Cloud some. And I wouldn’t be surprised if we add another provider in the future. So there is a challenge in making all of these environments look the same for the rest of our dev team as well as our on call folks. We’re also working to make it so that we can easily continue to scale out as our compute needs increase. All the sorts of things that I’ve spent some time thinking about over the years, but there’s nothing theoretical here: we’re really deploying and managing (and everything else) a pretty large distributed system. We are using a fair bit of open source stuff in addition to building some stuff ourselves. The first thing was obviously ami-creator but there’s more to come almost certainly. In addition, we’ll probably be doing some work and submitting some patches to improve some of the tools and things that we use as it makes sense to do so.

And as we are growing like crazy, I’m looking to hire some people to join my team to help us get even more things done. If I were writing a job description it would probably include bits and pieces like Linux administration, python, puppet, probably devops (as it’s something that’s in mind), cloud automation (… even though I still hate the word cloud), release and build tooling, monitoring, and more. Sound interesting? Drop me a line and let’s talk.

I’ve been having to build some new CentOS images to be used with EC2 for work recently. I went into it thinking that it shouldn’t be too big of a deal. I know that some work had been going on in this area and Fedora 14 is now available on EC2, so I figured I could convince the same toolchain to work.

Unfortunately, I was pretty disappointed with my options.

Do some building by hand on an actual instance, then do the bundling and upload off of the running instance.

Some of the ThinCrust stuff initially looked promising, but it seems like it’s largely unmaintained these days and the ec2 conversion bits didn’t really work at this point. I was able to get my initial images this way, but mostly by having a wrapper shell script of doom that made me sad.

There’s always the rPath tools, but I wanted to stick to something more native and fully open source

The new kid on the block is apparently BoxGrinder but I found it to be quite over-complicated and not that robust. I’m sorry, but generating your own format that you then transform into a kickstart config and even run through appliance-creator via exec from your ruby tool just felt wrong. No offense, but it just felt like a lot more than I wanted to deal with

So, I sat down and spent an evening hacking and have the beginnings of a working ami-creator.
It’s pretty straight-forward and uses all of the python-imgcreate stuff that’s used to build Fedora live images. Your input is a kickstart config and out the other side pops an image that you can bundle and upload to EC2.

Thus far, I’ve tested it to build CentOS 5 and Fedora 14 images. I’m sure there are some bugs but at this point, it’s worth getting it out for more people to play with. Hopefully it’s something that’s a lot simpler and more accessible for people to build images and I think it will also fit in a lot better with having Fedora release engineering building the EC2 images in Fedora 15 if they want.

One of the big outstanding pieces that I still want to add is the necessary bits to be able to (optionally) go ahead and upload and register as an AMI with your EC2 account. But release early, release often.

I’ve been wanting to play with tumblr, so I’ve set up a new blog for my bike blogging to try it out. Check it out for exciting race reports, some video and probably some other random thoughts on cyclocross as I begin my inaugural season of cyclocross racing.

There might be some other reorganization and moving around here as well in the future when I have a little bit of spare time. Which, since I’m racing cross, might not be for a few months

The more I see it, the more I want to just completely see the usage of the word “cloud” go away. While it’s somewhat of a cliche to say so, it’s a term that has a very hazy and non-concrete meaning. So whenever you start to use it, you immediately end up in the “well, what is a cloud” discussion. And thus, I have a set of suggestions for those places where you might have wanted to use the word “cloud” to instead use something which actually has meaning.

If you’re using cloud to refer to EC2, use EC2 instead. It’s concrete and it means very real things about your deployment and scaling models as well as how you’re managing your infrastructure.

If you’re using cloud to refer to some service which runs over the Internet, either refer to the service or just say the Internet. You don’t store your mail “in the cloud”, you host it with Google Apps. You don’t back up “to the cloud”, you have your backups stored over the Internet with Mozy or Carbonite.

If you’re using cloud to refer to the idea of some hosted application platform, just say the platform. You don’t run your python app “in the cloud”, you run it on AppEngine (or something else).

If you’re using cloud to mean that you are using virtualization and have some management stack on top of it, then please just say you’re running in a virtualized environment.

If you’re using cloud to refer to having your server infrastructure hosted in a virtualized environment by someone else, again, just say you’re running in a virtualized environment.

If you’re using cloud to refer to a “visible mass of little drops of water or frozen crystals suspended in the atmosphere”, then congratulations, you can continue to use the word cloud. And thanks to Wikipedia for the definition

Following this simple idea will let you avoid the otherwise impossible to avoid discussion of the semantics of the word “cloud” and what you happen to mean about it and how you might be wrong and … This then means you’ll be that much closer to achieving whatever goal you hoped to achieve as you’ll spend less time talking and more time doing. And as an added benefit, you’ll avoid getting grumpy emails from me about the fact that you’ve used such a terribly over-used and under-meaninged term.

As a road rider and racer, my cycling season tends to wind down about this time. If I were to start racing cyclocross, I’d extend it out, but for now, I’m staying out of that. The past two years, I’ve marked the end of my season with racing at the Jamestown Classic down in Rhode Island. This year, a combination of the fact that I really kind of needed to work that day and that my fitness wasn’t really where it should have been for a race led to me skipping it. Now I’m a little bummed that I did, but c’est la vie. I’ve spent most of the past six weeks generally riding just for fun and without any real training goals in mind, although I have been watching the power numbers on my shiny new powertap out of curiosity.

Looking back on the season, it was one that was both successful on some fronts and utterly not on others. I did a good job of keeping up a good base training routine through the winter but then ended up doing little in the way of racing over the course of the spring and summer. First it was waiting for the new bike, then it was being busy, then the weather sucked, then I got hit by a car, then travel, and then the season was over. Even though I didn’t race much, I felt like I was a lot better prepared for the races I did do and that my fitness was higher as a result of the base training through last winter.

So I think it’s now time to start easing myself back into a bit more of a routine in preparation for the winter of base training. I picked up a new, significantly quieter trainer to replace the freebie I had been using. Last winter, I was able to do trainer time in the evenings, but with my current schedule that seems unlikely, so I’m going to start getting up a little earlier to get time in before the ride into work.

I set it up last night and had the first ride on it this morning, and it’s pretty nice: quieter than the old one and seems a bit smoother as well. I’ve got a pretty good setup to start with to be able to watch DVDs or online video. I’m then streaming the audio to my iPhone with AirPhones so that I don’t have to have a long headphone cable or worry about turning up the speakers really loud. Today was watching some TV via Hulu and then a Spinervals DVD. For the latter, though, I need some better music. What do other people listen to as a good upbeat playlist for time on the trainer or even general race warmup, etc?