Miscellany from a philomath


Last Tuesday (the 29th), I attended Span Conf: "Span is a practical developer conference about building scalable systems." It was the first one, and was mostly organised by Matt Revell of Couchbase.

It was an interesting day. Attendance was around 130, with a single track, and I attended most of the talks before having to head off to fly home. Fortunately, the final two talks were recorded, so I'll hopefully get a chance to see them soon.

First up was Richard Conway (@azurecoder). His talk started with the work Microsoft has put into getting Hadoop to run on Windows (specifically in Azure), but the last quarter covered Spark. The syntax for transformations is a considerable improvement over Hadoop's, though he did say that performance still lags a bit behind the more highly tuned Hadoop; they're working on it tho'.

The quick-fire approach continued with Javier Ramirez (@supercoco9) on "API Analytics with Redis and BigQuery". The Redis part fell on the floor somewhere, so this was mostly about being able to parse 1TB of data a second, and how Google's BigQuery works. This was probably more interesting to BigData folks than to me.

Next up was Alex Dean (@alexcrdean) with "Why Your Company Needs a Unified Log", which, for me at least, heated things up a bit. Alex founded Snowplow, the folks behind "Snowplow Analytics", whose main project/product is https://github.com/snowplow/snowplow.

He covered the history of Business Analytics, from traditional Data-Warehousing via nightly ETL, right up to today’s must-have-more-data-right-now!

I took advantage of the conference discount and bought Alex’s book, and I’ve enjoyed what’s there so far…

We continued with Richard Astbury (@richorama), who presented Microsoft's "Orleans" project, which implements "Silos" and "Grains". A Grain is created per entity to be tracked, and runs in a clustered container (a Silo). During the talk, I kept thinking about how similar this sounded to Stateless Session EJBs and Erlang's processes; after the talk, someone referred to it as Erlang.NET, and indeed Robert Virding asked why they didn't just use Erlang. It was actually an interesting talk for anybody wanting to run a lightweight process per entity (psmgr?), with a defined lightweight process lifecycle, including things like "suspend".

The example cited was the online support for Halo 4, where there was a Grain per player recording the player's stats; unlike previous versions, it didn't fall over on launch day.

Next up was Stephane Maldini (@smaldini) on "Reactive Micro-services With Reactor". This was a bit rambling, but covered Reactor, a "foundation for asynchronous applications on the JVM" and an alternative to the Microsoft-supported Reactive Extensions and reactive event processing. It was interesting, tho' it could perhaps have been a bit more focussed on practical uses. He touched on reactively generating "back pressure", a topic we'd return to later.

We only had a couple of lightning talks after lunch. First up was a chap from IBM talking about Bluemix; the eye-catching demo combining Node-RED (https://github.com/node-red/node-red) with cloud translation and cloud "question answering" services was impressive. Whether the people who would be most comfortable using visual programming techniques like that would actually do so is debatable, but it looked good tho'.

The second lightning talk was from Steve Alexander (@now_talking), about how organisations should be divided along Dunbar's-number lines (http://en.wikipedia.org/wiki/Dunbar%27s_number). It was a brief talk, delivered in Steve's own style, and an interesting wee diversion; for a better summary see this tweet.

Next up was Phil Wills' (@philwills) talk "Scaling TheGuardian.com with Scala, Elasticsearch and more". I'd previously watched Graham Tackley's talk on this, so Phil's talk wasn't quite as new to me. The main lessons were "separate your services" and "don't allow a slow service to bring down the rest of your site"; it was more a look at the evolution of the Guardian site over the past twelve years (while Phil's been there).

Following on from that was Michael Nitschinger (@daschl), talking about "Building A Reactive Database Driver On The JVM". At the start, Matt Revell made a point of saying that it wasn't a "Vendor Talk" (Michael works for Couchbase, who provided a lot of the organisational manpower), and to be honest that was a disservice to Michael's talk. It was a really interesting look at how they rebuilt the Couchbase Java driver to support both synchronous and asynchronous styles (synchronous calls on top of the async layer), how they managed to build "back pressure" into the driver, and the underlying architecture of something that most folk would naively think was "just a driver". This was really quite interesting, and the slides are here.
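"Back pressure", which several talks touched on, boils down to bounding the buffering between a fast producer and a slower consumer, so the producer is made to wait (or fail fast) instead of buffering without limit. A toy Python sketch of the idea (the names are mine, and this is nothing like the actual Couchbase driver internals):

```python
import queue

# A bounded buffer between producer and consumer: once it's full,
# the producer experiences back pressure instead of growing memory.
buf = queue.Queue(maxsize=2)

def try_submit(item):
    # Fail fast when the consumer is behind; a real driver might
    # instead block, retry, or signal the caller to slow down.
    try:
        buf.put_nowait(item)
        return True
    except queue.Full:
        return False

# The first two submissions fit; the rest are rejected until the
# consumer drains the buffer.
accepted = [try_submit(n) for n in range(4)]
```

The interesting design choice is what to do at the limit: blocking pushes the pressure back up the call stack, while rejecting makes the caller decide.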

Next up was "Staying Agile In The Face Of The Data Deluge" from Martin Kleppmann (@martinkl). Coincidentally, I'd purchased his upcoming book (still in its early stages) a couple of weeks ago, and his talk was fascinating. It was about servicing requests (not necessarily HTTP requests) using stream processing, primarily building on top of Kafka (for volume), and using the Apache Samza project (which he's a committer on) for doing it. Again, familiar topics came up: "back pressure", streaming transformations, event histories.

I can recommend watching the video of this when it's available; he's uploaded his slides already (the final slide, with references, is well worth a look).

The final talk I got to see before I left was Elliot Murphy (@sstatik) on "Safeguarding sensitive data in the cloud". I'm pleased that Heroku covers most of his recommendations, but this was another really interesting talk. He compared data security to the food industry, where people who don't follow basic hygiene rules are considered negligent, and where even doing the simple things (washing your hands etc.) saves a lot of people from illness; in data security, a lot of people don't follow the "basics". He covered "hand washing" in data-security terms, and gave some useful guidelines on changing the culture around the security of data. Again, a really interesting talk, and the slides are here.

Unfortunately I had to cut and run before the last two talks, including Robert Virding talking about the history of Erlang, but I’ll watch out for the videos.

It was a nice opportunity to catch up with some old Canonical friends, and hopefully, they’ll find the energy to run it again next year.

After almost seven and a half years, I changed jobs from Canonical to Heroku.

I'd been at Canonical for more than three and a half times longer than any previous job. It took a long time to get bored, and I'm not even sure I did, but I grew frustrated with too many things and was looking for something new.

Fortunately, an ex-colleague from Canonical popped up with a daring rescue, for which I'm very grateful, and just over three months later I find myself getting my mind blown by how Heroku does things… lots of new things, too many services to count, and a vibrancy that had long since left Canonical.

Ruby, Erlang, Go, Splunk, instant deployments, fantastic support data-gathering tools: the list of new and cool things at Heroku could go on and on. Right now I'm still a bit bewildered and trying to find my feet, but it's exhilarating.

Note: this app uses the Plug module, with Cowboy to do the web serving, and Jazz to encode/decode JSON.


defmodule ExMastro.JenkinsNotifications do
  import Plug.Conn
  use Plug.Router

  plug Plug.Parsers, parsers: [ExMastro.Plugs.Parsers.JSON]
  plug :match
  plug :dispatch

  def init(options) do
    options
  end

  post "/jenkins/notifications" do
    conn = Plug.Conn.fetch_params(conn)
    # We have the server in conn.params["server"]
    # and the rest of the build in conn.params[:json]
    send_resp(conn, 201, conn.params[:json]["name"])
  end

  match _ do
    send_resp(conn, 404, "oops")
  end
end

The "plug Plug.Parsers" line is the interesting one here. Plugs are stackable, which in practice means they get called in order to handle HTTP requests; as long as each plug returns a Plug.Conn structure and doesn't call send_resp, the pipeline continues to the next plug.
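That stacking behaviour can be illustrated outside Elixir too. Here's a hypothetical Python analogy, where plain dicts stand in for Plug.Conn (none of these names are real Plug API):

```python
def run_plugs(conn, plugs):
    # Call each plug in order with the conn; a plug may transform it.
    # Once a response has been sent, stop calling further plugs,
    # mirroring how a Plug pipeline halts after send_resp.
    for plug in plugs:
        conn = plug(conn)
        if conn.get("resp_sent"):
            break
    return conn

def parse_json(conn):
    # A "parser" plug: annotate the conn, then pass it along.
    conn["params"] = {"parsed": True}
    return conn

def respond(conn):
    # A terminal plug: send a response, halting the pipeline.
    conn["resp_sent"] = True
    conn["status"] = 201
    return conn

def never_reached(conn):
    # Demonstrates that plugs after a response are skipped.
    conn["oops"] = True
    return conn

result = run_plugs({}, [parse_json, respond, never_reached])
```

Running it, `never_reached` is skipped because `respond` marked the response as sent.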

That line puts a parsing Plug at the top of the stack. Parsers are expected to check whether or not they can process the content of the request, and if so, they return a tuple of:

{:ok, body_params, conn} (where body_params is a map of things to put into the conn.params structure, and conn is the Plug.Conn)

This code should look for errors in the body (from the parsing plug), and do something better…

The parser itself (ExMastro.Plugs.Parsers.JSON, not shown here) uses pattern matching to pick up on "application/json" content, then attempts to read the body from the Plug.Conn record, handling errors or parsing the body as JSON; note the ! convention, meaning it parses and raises if it can't parse the JSON.
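The ! convention (raise on failure, rather than returning an error value) has a rough parallel in most languages. A hypothetical Python sketch of the two styles, not the actual parser code:

```python
import json

def decode(body):
    # Non-raising style: return a tagged result,
    # like Elixir's {:ok, ...} / {:error, ...} tuples.
    try:
        return ("ok", json.loads(body))
    except ValueError as exc:
        return ("error", str(exc))

def decode_strict(body):
    # The "!" style: raise immediately on invalid JSON, letting a
    # supervisor (or the caller) deal with the resulting crash.
    return json.loads(body)
```

The trade-off is the usual one: the tagged result forces every caller to handle failure explicitly, while the raising variant keeps the happy path clean and pushes recovery elsewhere.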

That raise would cause the service to crash and burn, but we can set up a supervision tree to restart it.

I've been using Bazaar for over six years. Distributed version control was, for me, a huge improvement over what we used previously, Subversion, which was itself a huge improvement over CVS.

Twice, we’ve thought about branching, and twice, we’ve ended up with the same basic strategy.

We want to have active development on trunk, a staging deployment, mainly to test code with a production-like setup (and data volumes), and a production release.

The production release has been scheduled as often as weekly (and we've sometimes released more frequently than that), but releases have sometimes been a month or more apart (after we got out of initial feature development).

What we ended up with was three main branches, named for their deployment environments, with trunk, production and staging branches.

Show-stopper defects found in staging are fixed in staging, and the fix merged back to trunk, same for production.

Trunk is almost always deployable, but, we don’t currently use feature-flags to hide “part-baked” functionality in production, and we rarely use long-lived feature branches to develop features against.

All features/fixes are branched from trunk, and merged right back to trunk once they've been reviewed. This approach can sometimes lead to "super-branches": huge branches which incorporate everything from the infrastructure to the UI, and which are problematic to review. I can definitely recommend keeping branches below 200 lines of diff for ease of reviewing.

Instead of feature-flags, we’ve tended to build from the bottom-up, so…put in the infrastructure to support the UI, and then add the UI at the end.

This hasn't always worked well, mainly because it's often only on seeing the UI that a user's requirements solidify… and that can mean some rework. We've got round that by having pre-merge-to-trunk customer acceptance demos, but even this can be a bit tricky, as the user doesn't get to use the software themselves; generally they're getting a screen-shared demo.

We rarely have show-stopper-fix-in-production bugs; most issues are things that can make their way through the trunk-staging-production process.

We switched a couple of months ago from a failing Scrum-like process to a mostly-Kanban process.

Why did we switch?

We were attempting Scrum, but in reality our team is too small, and we have a lot of new tasks coming in. These were tasks that couldn't really be put off, and they were pushing out our burndown a lot; our sprints ended with a scramble to get features landed, and the day before the end invariably had several reviews in the queue.

When we were committing to a set of tasks, our customers weren't, so every period ended with a rush to get stuff ready for the bi-weekly demo, and many of the "we also spent a fair bit of time doing this" items weren't being recorded.

Another big failing was that we had our bi-weekly iteration planning meeting late on a Friday afternoon (UK time), so after a while, when questions needed to be asked, we didn't ask them… because it made the end of our week that little bit closer.

Also, there was too much "throw it over the wall" from both the customer and the developers. The demo was failing (as Scrum demos are somewhat likely to) because developers don't demo "bugs"; we failed to identify the stories of most value to the customer early; and the customer didn't use the product until very late in the development cycle, so we didn't get into a good iterative pattern.

Personally, the goal (and the gratification) of developing software is people using it, and I believe we left that too late, for various reasons.

So, Scrum just wasn’t working very well.

What were we hoping to gain?

We wanted better recording of tasks. We had a prescribed story list but lots of interrupt items, and we discovered the interrupt items weren't being recorded very well, which makes commitments to stories within a sprint hard.

We wanted to shorten the release cadence. It's of course possible to disconnect sprints and releases in a Scrum process, but when things are focussed on the end-of-sprint demo, the expectation is that we're demoing the stuff that will be available in the release, so we naturally fell into bi-weekly releases.

Tightening the release period means shorter feedback loops, but this has to be understood as a way of changing things rapidly.

Kanban in practice

Since we moved to Kanban, we've mostly brought our releases into a regular weekly cadence, but as the classic "pigs and chickens" story for Scrum illustrates, without commitment from all parties this is hard. "Common purpose" is way underestimated in companies; different teams have different agendas, and even if they need a feature, that doesn't mean they'll have time to actually shepherd that feature through to completion.

We brought in fairly strict Work-in-Progress (WIP) limits for our lanes, and we've incrementally added lanes to reflect our process better. There's resistance to adding lanes (which IMO is good), but we've gone from three to eight lanes since we started.

We've been using LeanKit Kanban, which has proved "ok" at best; the UI is not great, but functionally it works.

We don't have WIP limits for some lanes, specifically our release lanes, but we've found that including an "External" lane has proved useful for keeping an eye on tasks that depend on someone outside our team. We check in with the tasks in that lane regularly to ensure they're not being forgotten, and we record in the "Blocked" field what it is we're waiting on, to ensure that it's clearly defined (or at least as clearly defined as it can be).

We have better recording of our interrupt tasks; I'm now confident that we're recording pretty much every task we spend time on. This proves useful in identifying repetitive tasks, as well as the more obvious use of allowing someone to pick up tasks when someone's off on holiday or sick.

We started off overfilling the "Not Started" lane, which made prioritisation harder; it's demonstrably easier to prioritise a small number of items than a large list. We tended to have 15-20 items in "Not Started", with items regularly being added at the top of the lane, so the bottom items were never getting anywhere.

We're now keeping only 6-10 items in the "Not Started" lane, clearly prioritised, and with useful input from stakeholders on the prioritisation. This tightens the feedback loop on bugs in particular: if they're at the top of the "Not Started" lane, they'll be worked on very soon.

We've had problems with incomplete cards; they're problematic because they block the "In Progress" lane for an indeterminate time while we work out what to do with them. Now we're fairly strict about ensuring that, before a card can be moved from the Icebox, it's well enough understood that it can be worked on. This is an ideal opportunity for knowledge sharing, ensuring that the card can be implemented by anybody in the team, as well as an opportunity to discuss the card before it's taken up by a single developer. One of the big problems of remote working is that you can't really have "over-the-desk" conversations, bouncing ideas around; the spontaneity is lost when you have to start a Google Hangout, so fixes tend to be made in isolation. Properly capturing the conversation around a card as it's moved into "Not Started" is essential, tho'.

Conclusion

The jury's still out on Kanban. There's a misconception that Kanban is a process, when it's really a visualisation of your workflow; if your workflow is unclear or faulty, Kanban won't magically fix it. It doesn't define a workflow itself, but it provides tools that make it easy to identify bottlenecks in your existing workflow.

The options for lanes are virtually infinite, and you have to create lanes that reflect your current workflow, identify the problems using gathered data (and this is one area where LeanKit really shines), come up with solutions, and then repeat the process…

The significance of the friction that WIP limits generate has to be understood properly: it's there to force discussions, and it should be healthy friction. In a team with good "common purpose", hitting the WIP limit for a lane is the siren that says "There's a blockage here, what can I do to unblock it?". If you're unable to unblock it, then a wider conversation is needed; extending the WIP limit without this conversation would be a mistake, but an easy one to make.

One of the other tenets of Kanban is continuous improvement, and experimentation is encouraged; with such a loose mechanism, it's easy to increase the WIP limit in a lane for a period and assess the implications for the other workflow stages.

We've recently tried to limit the number of cards in the "Not Started" lane. This has meant that we've run out of items a couple of times (we just get together and move some items over), but the upside is that we've had better prioritisation.

I hibernated over winter, not doing much around the house, but as the weather’s picked up (and with the imminent visit of my mum and dad for the first time), I’ve started to push my projects forward.

I bought a 5m reel of 5050 RGB LEDs, which are destined for under-cupboard lighting in the kitchen. To drive the strips, I've been experimenting with JeeNodes, and specifically an LED Node v2a.

These are basically mini 3V Arduinos with built-in radio transceivers, which makes communication between them easy enough; you can send in-memory structures fairly trivially with the RF12 library.

I’ve hooked it up to a PIR sensor, and that controls the lighting (the colour of the lighting is controlled separately) via a ZMQ connected service.

I’ve got 5 Jeenodes running (and the kits for a couple more) and two Raspberry Pis providing the horsepower, along with some other bits and pieces.

I've also implemented calendar event broadcasting (coming from Google calendars), which was fun to do. It essentially pulls calendar events for a list of calendars and emits the start/end of those events, and services can be written to listen for those events and respond.

The current task is building a ZMQ service that talks to my node-rfxtrx library, which will then be connected to the front-end I'm building; at that point, I'll be happy… for now.

I've been working a fair bit with Go recently; my employer has selected it as the language for developing services, and we're looking at ways in which Go can solve our problems.

I gave a talk last month looking at the syntax of Go, compared to Python.

Go turns out to be fast, easily compiled, and mostly easily understood. The mantra "Don't communicate by sharing memory; share memory by communicating." is essential to understanding why Go is great to work with.

Goroutines and channels are great for synchronising processes… but wait: I've been a Python programmer for a long time, and it would be remiss of me to neglect Python's built-in libraries.

We have the venerable threading library. Threading is not great; it does the job (mainly slowing your Python application down), and beyond quick demos you need to do a lot to make it safe to work with.

Python 2.6 (released in 2008) brought with it the multiprocessing library, which has Queues and Pipes that are really not that far removed from Go's channels, and with a tiny bit of sugar they can prove quite useful. It gets round Python's GIL by starting separate interpreter processes, so they don't all contend for a single GIL.
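As a sketch of that similarity, here's a hypothetical producer/consumer using multiprocessing.Queue; the worker plays the role a goroutine reading from a channel would play (the function names are mine):

```python
from multiprocessing import Process, Queue

def worker(jobs, results):
    # Receive numbers until the sentinel arrives, send back squares;
    # roughly "for n := range jobs" in Go terms.
    while True:
        n = jobs.get()
        if n is None:
            break
        results.put(n * n)

def run_demo():
    jobs, results = Queue(), Queue()
    p = Process(target=worker, args=(jobs, results))
    p.start()
    for n in [1, 2, 3]:
        jobs.put(n)
    jobs.put(None)  # sentinel: no more work (Go would close the channel)
    out = [results.get() for _ in range(3)]
    p.join()
    return out

if __name__ == "__main__":
    print(run_demo())
```

The sentinel is the clunky part: Go lets you close a channel, whereas with multiprocessing.Queue you signal end-of-stream yourself.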

After hibernating for most of the winter (certainly as far as home projects go), the recent spring-like weather (it's stopped being so cold and miserable) has awakened fresh interest.

I spent some time investigating lots of options, and Stian Eikeland's approach looks good to me.

Basically, it's a collection of micro-services joined using a ZeroMQ "bus". I've reworked some of his code into CoffeeScript classes and extended the number of services.

Some services push data to a Broker, and others subscribe to messages from the Broker.

I've written my own Cosm service, which looks for sensor messages and pushes the data to Cosm using my Cosm API, and today I completed the first draft of a Google Calendar event publisher, which can publish calendar events to the bus…

To fetch the events from Google Calendar, it requires read-only access to the private calendar feed; it calculates a 24-hour period, fetches the events in that window, then interprets them and publishes them to the bus.
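The windowing part of that flow can be sketched with stdlib datetime. This is a hypothetical illustration (my names throughout); the real publisher is CoffeeScript, talks to the Google Calendar feed, and publishes over the ZeroMQ bus:

```python
from datetime import datetime, timedelta, timezone

def day_window(now=None):
    # The service fetches events falling in a 24-hour window from "now".
    now = now or datetime.now(timezone.utc)
    return now, now + timedelta(hours=24)

def events_in_window(events, start, end):
    # Keep events that overlap the window, and emit a (kind, title, when)
    # tuple for each start and end; listeners on the bus react to these.
    emitted = []
    for title, ev_start, ev_end in events:
        if ev_start < end and ev_end > start:
            emitted.append(("start", title, ev_start))
            emitted.append(("end", title, ev_end))
    return emitted
```

A listener service would then subscribe to these "start"/"end" messages and, say, change the lighting when an event begins.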