small, sharp, composable

The previous incarnation of the project unnecessarily aimed at holism. We were building up an entire toolchain of snowflakes to help you understand the top-level availability and performance of your website. We were experimenting with dashboards and long-term storage of data. There was talk of building monitoring and alerting components, as if we could do it better.

It soon became clear that this was a mistake. Instead of trying to own the entire experience, the scope should be reduced to what I know how to do best: take measurements and make it easy to integrate those outputs into existing telemetry ecosystems. Today, it is perfectly reasonable for a company to invest in a SaaS product such as Librato or to roll its own in-house solution based on Graphite and Grafana. Context-rich logs are made searchable via services like Papertrail or in-house via Heka, Elasticsearch, and Kibana. If canary is going to provide value, it needs to be easy to integrate into such environments.

v1 takes a step towards correcting the project's course and offers a way forward for future experimentation.

what's in v1?

The v1 release contains a core set of interfaces along with a CLI tool, canary. Using it looks like so:
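
(The original post embedded a terminal session here. The one below is a reconstruction: the invocation matches the description that follows, but the output format is illustrative rather than verbatim.)

    $ canary http://www.canary.io
    2014-07-21T16:35:09Z http://www.canary.io 200 156.964ms true
    2014-07-21T16:35:10Z http://www.canary.io 200 134.352ms true
    2014-07-21T16:35:11Z http://www.canary.io 200 141.210ms true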

As you can see, we ask the tool to monitor a single website (http://www.canary.io), and the results are emitted to STDOUT. The canary tool is meant to be used in a similar fashion to ping - it gives you quick insight into the basic availability and performance of your target.

This release also introduces two interfaces, Sampler and Publisher, which are described in the docs. I believe these will reduce friction for future expansion.
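
To give a feel for their shape, here is a rough Go sketch of what such interfaces might look like; the authoritative definitions live in the project docs, and the type and method names below are illustrative assumptions, not the actual API:

    // Illustrative sketch only - not the project's actual definitions.
    package canary

    import "time"

    // Target describes a site to be measured.
    type Target struct {
        URL string
    }

    // Sample holds the result of a single measurement.
    type Sample struct {
        StatusCode int
        Latency    time.Duration
    }

    // Sampler takes a single measurement against a Target.
    type Sampler interface {
        Sample(t Target) (Sample, error)
    }

    // Publisher emits a measurement to some destination, e.g. STDOUT or Librato.
    type Publisher interface {
        Publish(t Target, s Sample, sampleErr error) error
    }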

what's coming in v2?

v2 is an intermediary release that will introduce the canaryd command. canaryd is similar to canary, but is capable of monitoring multiple sites and receives its configuration via a JSON manifest. A representative manifest can be found here.
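
Since the linked manifest isn't reproduced here, a hypothetical one might look like the following (the field names are my guess, not the actual schema):

    {
      "targets": [
        { "name": "canary", "url": "http://www.canary.io" },
        { "name": "github", "url": "https://github.com" }
      ]
    }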

After v2, I'll build other publishers, beginning with one for Librato.

A roadmap and deprecation

The initial roadmap is housed here and contains a short list of goals that I need to reach in order to scratch a personal itch. Once those have been completed, the project should be in good shape for further improvement.

At this time I also plan to deprecate canaryio/canaryd, canaryio/sensord, and canaryio/meta. The repositories will remain intact, but all issues will be closed and the READMEs will be updated accordingly. Anyone is welcome to fork and run with those projects as they see fit, but they will no longer be supported by me.

I will also be shutting down the existing api.canary.io and watch.canary.io sites since the core project is now heading in a much simpler direction. I am very grateful for all of the community support, and am especially thankful to Rackspace for hosting us up to this point and to the talented Jeremy Green for all the time he spent on watch.canary.io. Thank you very much for helping make this experiment possible.

I've been hacking around on AWS for five years now, and am starting to compile a list of tips and tricks to help make the most of things. I'm hoping to use this series of posts to help me clarify my own thinking, and if you find any of this helpful, even better.

When working in a bare metal environment, I'd likely provision a new box and hook it up to a Chef Server or Puppet Master. I'd spend most of my time thinking about my on-instance configuration management, as that matters quite a bit since these boxes are going to hang around for a long time.

This isn't the right way to approach EC2. In the AWS model, you want your EC2 instances to be stateless and thrown away often. You should be pushing state onto dedicated services such as Heroku Postgres or Amazon RDS / DynamoDB. Your compute instances should be as thin as possible, focusing as much of their power as possible on the task at hand.

It's useful to think of EC2 instances as nothing more than dynamic compute containers awaiting instruction. For example, take a look at this small ruby script:
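
(The script from the original post is lost; what follows is a reconstruction in the same spirit: a boot-time script that turns a fresh instance into a single-purpose app runner. The URL, package names, and start command are hypothetical.)

    #!/usr/bin/env ruby
    require "fileutils"

    # hypothetical release tarball and install location
    APP_URL = "https://example.com/releases/myapp.tar.gz"
    APP_DIR = "/srv/myapp"

    # install the few host-level libs the app needs (hypothetical package list)
    system("apt-get update && apt-get install -y ruby build-essential") ||
      abort("package install failed")

    # fetch and unpack the app tarball
    FileUtils.mkdir_p(APP_DIR)
    system("curl -sL #{APP_URL} | tar xz -C #{APP_DIR}") ||
      abort("fetch failed")

    # hand the instance over to its single task
    Dir.chdir(APP_DIR)
    exec("./bin/start")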

The Good Parts

at this point, we're not required to use more complex configuration management - a shell script is just fine

we stand a fair chance at being able to run any app we want - just give us a tarball and, provided the right libs are installed on the host, we're set

Room for Improvement

Things that immediately jump to mind:

what if we want to configure this app - what then?

how do we know that it booted okay?

how do I maintain a fleet of these with minimal overhead?

what if I need a load balancer?

what if I want to run a lot of different apps, with minimal overhead?

I don't like installing packages at boot time

how do we reliably support apps that are not ruby?

In Summary

The more you treat EC2 instances like bare metal computers, the more you'll hate your life. This is a proven law, embedded deep within the universe. I'm pretty sure it was featured in a recent Dan Brown novel. Treat them like single-task compute instances, and you'll find yourself working with something more like Lego blocks. Start treating complexity as a smell (as God intended), and watch your life improve.

Future posts will likely dig into improvements and higher level concerns. I'll probably also go all hipster on you and demonstrate how Docker could be applied here.

This morning, I upgraded the public version of canaryd to the 1.0.0 branch. The big public-facing benefit here is websocket support for streaming down per-check measurements, rather than having to continuously poll the measurements endpoint.
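
As an illustration, any generic websocket client can subscribe to such a stream; the path and payload below are hypothetical, not the documented endpoint:

    $ wscat -c "wss://api.canary.io/checks/<check-id>/measurements"
    < {"t":"2014-08-01T12:00:00Z","status_code":200,"latency_ms":134.2}
    < {"t":"2014-08-01T12:00:01Z","status_code":200,"latency_ms":141.7}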