The true life story of a kid from Bribie Island (I’ve been there!) running a marathon in Antarctica, via being a touring musical comedian, doing things like this:

This book is an interesting and light read, and came kindly recommended by Michael Carden, who pretty much insisted I take the book off him at a cafe. I don’t regret reading it and would recommend it to people looking for a light autobiography for a rainy (and perhaps cold) evening or two.

Oh, and the Scared Weird Little Guys of course are responsible for this gem…

This book is highly recommended and now I really want to go for a run.

After 20 incredible years as part of a musical comedy duo, Scared Weird Little Guys, Rusty Berther found himself running a marathon in Antarctica. What drove him to this? In this hilarious and honest account of his life as a Scared Weird Little Guy, and his long journey attempting an extreme physical and mental challenge at the bottom of the world, Rusty examines where he started from, and where he just might be going to.

I’m doing the Linux Foundation Kubernetes Fundamentals course at the moment, and I was very disappointed in the chapter on Ingress Controllers. To be honest it feels like an afterthought: there is no lab, and the provided examples don’t work if you re-type them into Kubernetes (you can’t copy and paste, of course, just to add to the fun).

I found this super annoying, so I thought I’d write up my own notes on how to get nginx working as an Ingress Controller on Kubernetes.

First off, the nginx project has excellent installation resources online on GitHub. The only wart with their instructions is that they changed the labels used on the pods for the ingress controller, which means the validation steps in the document don’t work until that is fixed. This is reported in a GitHub issue, and there is also a proposed fix which pre-dates that issue and so isn’t linked to it.

That bit is mostly explained by the Linux Foundation course. Well, the course author links to the GitHub page at least, and then you just read the docs. The bit that isn’t well explained is how to set up ingress for a pod. This is partially because kubectl doesn’t have a command line to do this yet; you have to POST an API request to get it done instead.
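The course’s own example request isn’t reproduced here, so the following is only a minimal sketch of what such a POST could look like from Python using the standard library. It assumes the current `networking.k8s.io/v1` Ingress schema (clusters of this vintage used an older beta API group with different field names), an API server reachable through `kubectl proxy` on its default port 8001, and a Service named `ghost` on Ghost’s default port 2368. The function names, host, and service details are illustrative assumptions, not taken from the course.

```python
import json
import urllib.request


def ingress_manifest(name, host, service, port, namespace="default"):
    """Build an Ingress object routing every path on `host` to one Service."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {
                        "service": {"name": service, "port": {"number": port}},
                    },
                }]},
            }],
        },
    }


def post_ingress(manifest, api_base="http://127.0.0.1:8001"):
    """POST the manifest to the API server (assumes `kubectl proxy` is running)."""
    namespace = manifest["metadata"]["namespace"]
    url = f"{api_base}/apis/networking.k8s.io/v1/namespaces/{namespace}/ingresses"
    request = urllib.request.Request(
        url,
        data=json.dumps(manifest).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `post_ingress(ingress_manifest("ghost", "ghost.10.244.2.13.nip.io", "ghost", 2368))` would create a rule for that hostname, assuming the proxy is up. Depending on how the controller was installed, you may also need to set `spec.ingressClassName` so the nginx controller picks the object up.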

This causes the nginx configuration to get re-created inside the nginx pod by magic pixies. Now, assuming you have a route from your desktop to 10.244.2.13, you can just go to http://ghost.10.244.2.13.nip.io in a browser and you should be greeted by the default front page for the ghost installation (which turns out to be a publishing platform, who knew?).
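The hostname trick works because nip.io is a wildcard DNS service: a name like `ghost.10.244.2.13.nip.io` resolves to the address embedded in it, so one controller IP can front many virtual hosts, and nginx then picks the backend by matching the request’s Host header against each ingress rule. The sketch below only mimics the resolution step locally for the dotted form (nip.io accepts other notations, which this ignores); `nip_io_address` is a hypothetical helper, not part of any library.

```python
import ipaddress


def nip_io_address(hostname):
    """Return the IPv4 address embedded in a dotted nip.io name."""
    labels = hostname.lower().rstrip(".").split(".")
    # Need at least four address labels followed by "nip" and "io".
    if len(labels) < 6 or labels[-2:] != ["nip", "io"]:
        raise ValueError(f"not a dotted nip.io hostname: {hostname!r}")
    # IPv4Address raises (a ValueError subclass) if the labels aren't an address.
    return str(ipaddress.IPv4Address(".".join(labels[-6:-2])))
```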

To clean up the ingress, you can use the normal “get”, “describe”, and “delete” verbs that you use for other things in kubectl, with the object type of “ingress”.
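If you’d rather script the cleanup, here is a small sketch that builds those kubectl command lines. `ingress_command` and `run` are hypothetical helpers and the `default` namespace is an assumption; as a design choice it refuses to build a `delete` without an explicit name.

```python
import subprocess

VERBS = ("get", "describe", "delete")


def ingress_command(verb, name=None, namespace="default"):
    """Build a kubectl command line for one of the usual verbs on ingress objects."""
    if verb not in VERBS:
        raise ValueError(f"unsupported verb: {verb!r}")
    if verb == "delete" and name is None:
        raise ValueError("refusing to delete without an explicit ingress name")
    argv = ["kubectl", verb, "ingress"]
    if name is not None:
        argv.append(name)
    return argv + ["--namespace", namespace]


def run(argv):
    """Run a command and return its standard output."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```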

What’s missing from the ONAP community: an open design process
Thu, 30 Aug 2018 01:43:07 +0000

I’ve been thinking a fair bit about ONAP and its future releases recently. This is in the context of trying to implement a system for a client which is based on ONAP. It’s really hard though, because it’s difficult to determine how various components of ONAP are intended to work or interoperate.

It took me a while, but I’ve realised what’s missing here…

OpenStack has an open design process. If you want to add a new feature to Nova for example, the first step is you need to write down what the feature is intended to do, how it integrates with the rest of Nova, and how people might use it. The target audience for that document is not only the Nova development team, but also people who operate OpenStack deployments.

ONAP has no equivalent that I can find. So for example, they say that in Casablanca they are going to implement an “AAI Enricher” to ease lookup of data from external systems in their inventory database, but I can’t find anywhere where they explain how the integration between arbitrary external systems and ONAP AAI will work.

I think ONAP would really benefit from a good hard look at their design processes and how approachable they are for people outside their development teams. The current use case proposal process (videos, conference talks, and PowerPoint presentations) just isn’t great for people who are trying to figure out how to deploy their software.

Learning from the mistakes that even big projects make
Fri, 24 Aug 2018 04:35:08 +0000

OpenStack is an orchestration system for setting up virtual machines and other associated virtual resources such as networks and storage on clusters of computers. At a high level, OpenStack is just configuring existing facilities of the host operating system; there isn’t really a lot of difference between OpenStack and a room full of system admins frantically resolving tickets requesting that virtual machines be set up. The only real differences are scale and predictability.

To do its job, OpenStack needs to be able to manipulate parts of the operating system which are normally reserved for administrative users. This talk is the story of how OpenStack has done that thing over time, what we learnt along the way, and what I’d do differently if I had my time again. Lots of systems need to do these things, so even if you never use OpenStack hopefully there are things to be learnt here.

That said, someone I respect suggested last weekend that good conference talks are actionable. A talk full of OpenStack war stories isn’t actionable, so I’ve spent the last week re-writing this talk to hopefully be more of a call to action than just an interesting story. I apologise for any mismatch between the original proposal and what I present here that might therefore exist.

Back to the task at hand though: providing control of virtual resources to untrusted users. OpenStack has gone through several iterations of how it thinks this should be done, so perhaps it’s illustrative to start by asking how other similar systems achieve this. There are lots of systems that have a requirement to configure privileged parts of the host operating system. The most obvious example I can think of is Docker. How does Docker do this? Well… it’s actually not all that pretty. Docker presents its API over a unix domain socket by default in order to limit control to local users (you can of course configure this). So to provide access to Docker, you add users to the docker group, which owns that domain socket. The Docker documentation warns that “the docker group grants privileges equivalent to the root user”. So that went well.
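One way to see the Docker situation for yourself is to look at which group owns the socket and what its mode bits allow. This is a minimal sketch: `socket_access_summary` is a hypothetical helper, and `/var/run/docker.sock` is Docker’s usual default path, which your install may have changed. On a typical install you would expect the `docker` group to have read/write access, which is exactly the root-equivalent grant the documentation warns about.

```python
import grp
import os
import stat


def socket_access_summary(path="/var/run/docker.sock"):
    """Report which group owns a socket and who its mode bits let read/write it."""
    st = os.stat(path)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:  # gid with no entry in the group database
        group = str(st.st_gid)
    mode = st.st_mode
    return {
        "group": group,
        "group_rw": bool(mode & stat.S_IRGRP and mode & stat.S_IWGRP),
        "other_rw": bool(mode & stat.S_IROTH and mode & stat.S_IWOTH),
    }
```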

Docker is really an example of the simplest way of solving this problem — by not solving it at all. That works well enough for systems where you can tightly control the users who need access to those privileged operations — in Docker’s case by making them have an account in the right group on the system and logging in locally. However, OpenStack’s whole point is to let untrusted remote users create virtual machines, so we’re going to have to do better than that.

The next level up is to do something with sudo. The way we all use sudo day to day, you allow users in the sudo group to become root and execute any old command, with a configuration entry that probably looks a little like this:

# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL

Now that config entry is basically line noise, but it says “allow members of the group called sudo, on any host, to run any command as any user or group, including root”. You can of course embed this into your Python code using subprocess.call() or similar. On the security front, it’s possible to do a little bit better than a “nova can execute anything” entry. For example:

%sudo ALL=/bin/ls

This says that the sudo group on all hosts can execute /bin/ls with any arguments. OpenStack never actually specified the complete list of commands it executed. That was left as a job for packagers, which of course meant it wasn’t done well.
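The subprocess route mentioned above can be sketched like this. It uses `subprocess.run()` rather than `subprocess.call()`, and `as_root`/`run_as_root` are hypothetical helper names; whether a given command actually succeeds is decided entirely by the sudoers entries, such as the `/bin/ls` one above.

```python
import subprocess


def as_root(argv):
    """Prefix argv with sudo; -n makes sudo fail rather than prompt for a password."""
    return ["sudo", "-n", *argv]


def run_as_root(argv):
    """Run argv via sudo; raises CalledProcessError if sudo or the command fails."""
    return subprocess.run(as_root(argv), check=True, capture_output=True, text=True)
```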

So there’s our first actionable thing: if you assume that someone else (packagers, the ops team, whoever) is going to analyse your code well enough to solve the security problem that you can’t be bothered solving, then you have a problem. Now, we weren’t necessarily deliberately punting here. It’s obvious to me how to grep the code for commands run as root to add them to a sudo configuration file, but that’s unfair. I wrote some of this code, so I am much closer to it than a system admin who just wants to get the thing deployed.
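To make the grep approach concrete, here is a rough sketch. It assumes a hypothetical helper called as `execute('cmd', ..., run_as_root=True)`, a `nova` group, and binaries in `/usr/bin`; none of those details are taken from the OpenStack code. Its blind spot is instructive: the regular expression stops at the first closing parenthesis, so any call with a nested call in its arguments is silently missed, which is exactly why it is unfair to expect deployers to do this reliably.

```python
import re

# Hypothetical call convention: execute('cmd', arg, ..., run_as_root=True)
_ROOT_CALL = re.compile(r"""execute\(\s*['"]([^'"]+)['"][^)]*run_as_root=True""")


def commands_run_as_root(source):
    """Return the sorted, de-duplicated command names run as root in source."""
    return sorted(set(_ROOT_CALL.findall(source)))


def sudoers_lines(commands, group="nova", bindir="/usr/bin"):
    """One sudoers rule per command; group and bindir are assumptions."""
    return [f"%{group} ALL = (root) NOPASSWD: {bindir}/{cmd}" for cmd in commands]
```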

We can of course do better than just raw sudo. Next we tried a thing called rootwrap, which was mostly an attempt to provide a better boundary around exactly what commands you can expect an OpenStack binary to execute. So for example, maybe it’s ok for me to read the contents of a configuration file specific to a virtual machine I am managing, but I probably shouldn’t be able to read /etc/shadow or whatever. We can do that by doing something like this:

sudo nova-rootwrap /etc/nova/rootwrap.conf /bin/ls /etc

Where nova-rootwrap is a program which takes a configuration file and a command line to run. The contents of the configuration file are used to determine if the command line should be executed.
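Here is a toy version of that check, assuming a filter file loosely modelled on rootwrap’s `name: FilterClass, executable, user` lines. `CommandFilter` allows any arguments; `PathPrefixFilter` is a made-up class for this sketch that only allows arguments resolving under a given directory. This is not oslo.rootwrap’s implementation, just an illustration of the boundary it draws.

```python
import configparser
import os

# Toy filter file. PathPrefixFilter is invented for this sketch.
FILTERS = """
[Filters]
ls: CommandFilter, /bin/ls, root
cat: PathPrefixFilter, /bin/cat, root, /var/lib/nova/instances
"""


def load_filters(text):
    """Parse each filter line into [kind, executable, user, *extra]."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return [[part.strip() for part in value.split(",")]
            for _name, value in parser.items("Filters")]


def resolve_command(filters, argv):
    """Return the command to run as root, or None if no filter allows argv."""
    for kind, executable, _user, *extra in filters:
        if os.path.basename(argv[0]) != os.path.basename(executable):
            continue
        if kind == "CommandFilter":
            # Any arguments are fine, but always run the configured binary.
            return [executable, *argv[1:]]
        if kind == "PathPrefixFilter":
            prefix = os.path.join(os.path.realpath(extra[0]), "")
            args = argv[1:]
            # realpath collapses "../" tricks before the prefix comparison.
            if args and all(os.path.realpath(a).startswith(prefix) for a in args):
                return [executable, *args]
    return None
```

Note that an allowed command is rebuilt around the configured executable, so the caller never gets to choose which binary actually runs.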

Now we can limit the sudo configuration file to only needing to be able to execute nova-rootwrap. I thought about putting in a whole bunch of slides about exactly how to configure rootwrap, but then I realised that this talk is only 25 minutes and you can totally google that stuff.

So instead, here’s my second actionable thing… Is there a trivial change you can make which will dramatically improve security? I don’t think anyone would claim that rootwrap is rocket science, but it improved things a lot — deployers didn’t need to grep out the command lines we executed any more, and we could do things like specify what paths we were allowed to do things in. Are there similarly trivial changes that you can make to improve your world?

But wait! Here’s my third actionable thing as well: what are the costs of your design? Some of these are obvious. For example, with this design executing something with escalated permissions causes us to pay to fork a process. In fact it’s worse with rootwrap, because we pay to fork, start a python interpreter to parse a configuration file, and then fork again for the actual binary we wanted in the first place. That cost adds up if you need to execute many small commands, for example when plugging in a new virtual network interface. At one point we measured this for network interfaces and the costs were in the tens of seconds per interface.
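A quick way to feel this cost is to compare starting a fresh Python interpreter (roughly the middle step of the rootwrap path) against writing a byte in-process. This sketch leaves out sudo and the second fork, so it understates the real overhead, and the absolute numbers depend entirely on your machine; `compare` and the other names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
import time


def per_call(fn, n):
    """Average wall-clock seconds per call of fn over n calls."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n


def spawn_interpreter():
    # Stand-in for "fork and start a python interpreter" (the rootwrap step).
    subprocess.run([sys.executable, "-c", "pass"], check=True)


def compare(n=10):
    """Return (seconds per spawned interpreter, seconds per in-process write)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)

    def write_in_process():
        with open(path, "w") as f:
            f.write("0")

    try:
        return per_call(spawn_interpreter, n), per_call(write_in_process, n)
    finally:
        os.remove(path)
```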

There is another cost though which I think is actually more important. The only way we have with this mechanism to do something with escalated permissions is to execute it as a separate process. This is a horrible interface and forces us to do some really weird things. Let’s check out some examples…

Which of the following commands are reasonable?

shred -n3 -sSIZE PATH
touch PATH
rm -rf PATH
mkdir -p PATH

These are just some examples; there are many others. The first is probably the most reasonable. It doesn’t seem wise to me for us to implement our own data shredding code, so using a system command for that seems reasonable. The other examples are perhaps less reasonable, and the rm one is particularly scary to me. But none of these are the best example…

Some commentary first. This code existed in the middle of a method that does other things. It’s one of five command lines that method executes. What does it do?

It’s actually not too bad. Using root permissions, it writes a zero to the multicast_snooping sysctl for the network bridge being set up. It then checks the exit code and raises an exception if it’s not 0 or 1.
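The code being described isn’t reproduced here. As a hypothetical reconstruction of its shape only: pipe a `0` through `tee` into the bridge’s sysfs file via the rootwrap invocation shown earlier, and treat exit codes 0 and 1 as acceptable. The function name, and the `prefix` and `sysfs_root` parameters (added so it can be tried without root against a scratch directory), are assumptions.

```python
import os
import subprocess

# Assumed prefix, mirroring the rootwrap invocation shown earlier.
ROOTWRAP = ("sudo", "nova-rootwrap", "/etc/nova/rootwrap.conf")


def disable_multicast_snooping_cli(bridge, prefix=ROOTWRAP, sysfs_root="/sys/class/net"):
    """Hypothetical: write 0 to the bridge's multicast_snooping file via tee."""
    path = os.path.join(sysfs_root, bridge, "bridge", "multicast_snooping")
    # With the default prefix: fork sudo, start python for rootwrap, then fork
    # tee, all to write a single byte.
    result = subprocess.run([*prefix, "tee", path], input="0",
                            capture_output=True, text=True)
    if result.returncode not in (0, 1):
        raise RuntimeError(f"tee exited {result.returncode}: {result.stderr.strip()}")
```

Notice that tolerating exit code 1 also hides a missing bridge, since `tee` exits with 1 when it cannot open the file.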

That said, it’s also horrid. In order to write a single byte to a sysctl as root, we are forced to fork, start a python process, read a configuration file, and then fork again, for an operation that in some situations might need to happen hundreds of times for OpenStack to restart on a node.

This is how we get to the third way that OpenStack does escalated permissions. If we could just write python code that ran as root, we could write this instead:
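Assuming the calling code already has the necessary privileges, a minimal sketch of that in-process version could be as short as this. The name and the `sysfs_root` parameter (there only so it can be exercised against a scratch directory) are hypothetical; a failure now surfaces as an ordinary `OSError` instead of an exit code to interpret.

```python
import os


def disable_multicast_snooping(bridge, sysfs_root="/sys/class/net"):
    """Write 0 to the bridge's multicast_snooping attribute, in-process.

    Assumes the caller already holds the privileges needed to write to sysfs.
    """
    path = os.path.join(sysfs_root, bridge, "bridge", "multicast_snooping")
    with open(path, "w") as f:
        f.write("0")
```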

It’s not perfect, but it’s a lot cheaper to execute and we could put it in a method with a helpful name like “disable multicast snooping” for extra credit. Which brings us to…

Hire Angus Lees and make him angry. Angus noticed this problem well before the rest of us. We were all lounging around basking in our own general cleverness. What Angus proposed is that instead of all this forking and parsing and general mucking around, that we just start a separate process at startup with special permissions, and then send it commands to execute.

He could have done that with a relatively horrible API, for example just sending command lines down the pipe and getting their responses back to parse, but instead he implemented a system of python decorators which let us call a method which is marked up as saying “I want to run as root!”.
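To illustrate the idea (and only the idea), here is a toy version: a decorator registers functions, a helper process is forked once at startup, and decorated calls are sent down a pipe for the helper to run. This is not the oslo.privsep API. The helper here runs with the same permissions as its parent, whereas the real thing starts it with escalated permissions; every name below is hypothetical, and it relies on the `fork` start method, so it won’t work on Windows.

```python
import functools
import multiprocessing

_ENTRYPOINTS = {}  # name -> original function, copied into the helper at fork time
_helper = None     # (pipe to helper, helper process) once started


def entrypoint(func):
    """Hypothetical decorator: calls to func are executed by the helper process."""
    _ENTRYPOINTS[func.__qualname__] = func

    @functools.wraps(func)
    def proxy(*args, **kwargs):
        conn, _proc = _helper
        conn.send((func.__qualname__, args, kwargs))
        ok, value = conn.recv()
        if not ok:
            raise RuntimeError(f"helper failed: {value}")
        return value

    return proxy


def _serve(conn):
    # Helper loop. In a real privsep daemon, this is the process holding privileges.
    while (message := conn.recv()) is not None:
        name, args, kwargs = message
        try:
            conn.send((True, _ENTRYPOINTS[name](*args, **kwargs)))
        except Exception as exc:  # report the error back rather than dying
            conn.send((False, repr(exc)))


def start_helper():
    """Fork the helper once, after every entrypoint has been registered."""
    global _helper
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=_serve, args=(child_conn,), daemon=True)
    proc.start()
    _helper = (parent_conn, proc)


def stop_helper():
    global _helper
    conn, proc = _helper
    conn.send(None)  # tells _serve to leave its loop
    proc.join()
    _helper = None


@entrypoint
def write_sysctl(path, value):
    """Would run with privileges in the helper; here it just writes a file."""
    with open(path, "w") as f:
        f.write(value)
    return len(value)
```

Call `start_helper()` once after all entrypoints are defined, because the forked helper only knows about functions registered before the fork. After that, `write_sysctl(path, "0")` looks like a normal function call but runs in the helper.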

So here’s the destination in our journey, how we actually do that thing in OpenStack now:

The decorator before the method definition is a bit opaque, but basically says “run this thing as root”, and the rest is a method which can be called from anywhere within our code.

There are a few things you need to do to set up privsep, but I don’t have time in this talk to discuss the specifics. Effectively you need to arrange for the privsep helper to start with escalated permissions, and you need to move the code which will run with one of these decorators to a sub path of your source tree to stop other code from accidentally being escalated. privsep is also capable of running with more than one set of permissions, and will start a helper for each set. That’s what this decorator is doing, specifying what permissions we need for this method.

And here we land at my final actionable thing. Make it easy to do the right thing, and hard to do the wrong thing. Rusty Russell used to talk about this at linux.conf.au when he was going through a phase of trying to clean up kernel APIs — its important that your interfaces make it obvious how to use them correctly, and make it hard to use them incorrectly.

In the example used for this talk, having command lines executed as root meant that the prevalent example of how to do many things was a command line. So people started doing that even when they didn’t need escalated permissions — for example calling mkdir instead of using our helper function to recursively make a set of directories.

We’ve cleaned that up, but we’ve also made it much much harder to just drop a command line into our code base to run as root, which will hopefully stop some of this problem recurring in the future. I don’t think OpenStack has reached perfection in this regard yet, but we continue to improve a little each day and that’s probably all we can hope for.

privsep can be used for non-OpenStack projects too. There’s really nothing specific about most of OpenStack’s underlying libraries in fact, and there are probably things there which are useful to you. In fact the real problem is working out what is where because there’s so much of it.

One final thing: privsep makes it possible to specify the exact permissions needed to do something. For example, setting up a network bridge probably doesn’t need “read everything on the filesystem” permissions. We originally did that, but stepped back to using a single escalated permission set that maps to what you get with sudo, because working out what permissions a single operation needed was actually quite hard. We were trying to lower the barrier for entry for doing things the right way. I don’t think I really have time to dig into that much more here, but I’d be happy to chat about it sometime this weekend or on the Internet later.

So in summary:

Don’t assume someone else will solve the problem for you.

Are there trivial changes you can make that will drastically improve security?

Think about the costs of your design.

Hire smart people and let them be annoyed about things that have always “just been that way”. Let them fix those things.

Make it easy to do things the right way and hard to do things the wrong way.

city2surf 2018 was yesterday, so how did the race go? First off, thanks to everyone who helped out with my fund raising for the Black Dog Institute — you raised nearly $2,000 AUD for this important charity, which is very impressive. Thanks for everyone’s support!

city2surf is 14 km, with 166 meters of vertical elevation gain. For the second year running I was in the green start group, which is for people who have previously finished the event in less than 90 minutes. There is one start group before this, red, which is for people who can finish in less than 70 minutes. In reality I think it’s unlikely that I’ll ever make it to red; it would require me to shave about 30 seconds per kilometre off my time to just scrape in, and I think that would be hard to do.

Training for city2surf last year I tore my right achilles, so I was pretty much starting from scratch for this year’s event: at the start of the year I could run about 50 meters before I had issues. Luckily I was referred to an excellent physiotherapist who has helped me build back up safely; I highly recommend Cameron at Southside Physio Therapy if you live in Canberra.

Overall I ran a lot in training for this year — a total of 540 kilometres. I was also a lot more consistent than in previous years, which is something I’m pretty proud of given how cold winters are in Canberra. Cold weather, short days, and getting sick seem to always get in the way of winter training for me.

On the day I was worried about being cold while running, but that wasn’t an issue. It was about 10 degrees when we started and maybe a couple of degrees warmer than that at the end. The maximum for the day was only 16, which is cold for Sydney at this time of year. There was a tiny bit of spitting rain, but nothing serious. Wind was the real issue — it was very windy at the finish, and I think if it had been like that for the entire race it would have been much less fun.

That said, I finished in 76:32, which is about three minutes faster than last year and a personal best. Overall, an excellent experience and I’ll be back again.
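As a quick check of the “about 30 seconds per kilometre” estimate, using the 14 km course, the under-70-minute cutoff for red, and this year’s 76:32 finish:

```python
def pace(total_seconds, distance_km):
    """Average pace in seconds per kilometre."""
    return total_seconds / distance_km


def mmss(seconds):
    """Format seconds as m:ss."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


COURSE_KM = 14
FINISH = 76 * 60 + 32    # 76:32
RED_CUTOFF = 70 * 60     # red start group: under 70 minutes

my_pace = pace(FINISH, COURSE_KM)         # 328.0 s/km, i.e. 5:28 per km
red_pace = pace(RED_CUTOFF, COURSE_KM)    # 300.0 s/km, i.e. 5:00 per km
gap = my_pace - red_pace                  # 28.0 s/km still to find
```

So the gap works out to 28 seconds per kilometre, consistent with the estimate.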

This proposal was submitted for pyconau 2018. It wasn’t accepted, but given I’d put the effort into writing up the proposal I’ll post it here in case it’s useful some other time. The oblique references to OpenStack are because PyCon had an “anonymous” review system in 2018, and I was avoiding saying things which directly identified me as the author.

OpenStack and Kubernetes solve very similar problems. Yet they approach those problems in very different ways. What can we learn from the different approaches taken? The differences aren’t just technical, though; there are some interesting social differences too.

OpenStack and Kubernetes solve very similar problems – at their most basic level they both want to place workloads on large clusters of machines, and ensure that those placement decisions are as close to optimal as possible. The two projects even have similar approaches to the fundamentals – they are both orchestration systems at their core, seeking to help existing technologies run at scale instead of inventing their own hypervisors or container run times.

Yet they have very different approaches to how to perform these tasks. OpenStack takes a heavily centralised and monolithic approach to orchestration, whilst Kubernetes has a less stateful and more laissez-faire approach. Some of that is about early technical choices and the heritage of the projects, but some of it is also about hubris and a desire to tightly control. To be honest I lived the OpenStack experience so I feel I should be solidly in that camp, but the Kubernetes approach is clever and elegant. There’s a lot to like on the Kubernetes side of the fence.

It’s increasingly common that at some point you’ll encounter one of these systems, as neither seems likely to go away in the next few years. Understanding some of the basics of their operation is therefore useful, as well as being interesting at a purely theoretical level.

This proposal was submitted for pyconau 2018. It was accepted, but hasn’t been presented yet. The oblique references to OpenStack are because PyCon had an “anonymous” review system in 2018, and I was avoiding saying things which directly identified me as the author.

Since 2011, I’ve worked on a large Open Source project in python. It kind of got out of hand – 1000s of developers and millions of lines of code. Yet despite being well resourced, we made the same mistakes you make in those tiny scripts you whip up to solve a small problem. Come learn from our fail.

This talk will use the privilege separation daemon that the project wrote to tell the story of decisions that were expedient at the time, and how we regretted them later. In a universe in which you can only run commands as root via sudo, dd’ing from one file on the filesystem to another seems almost reasonable. Especially if you ignore that the filenames are defined by the user. Heck, we shell out to “mv” to move files around, even when we don’t need escalated permissions to move the file in question.
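For contrast, here is a minimal sketch of moving a file without shelling out to `mv` or escalating at all, while also refusing user-supplied names that resolve outside a base directory. `move_within` is a hypothetical helper, and the containment check is one common approach rather than what the project actually shipped.

```python
import os
import shutil


def move_within(base_dir, src_name, dst_name):
    """Move src_name to dst_name, both relative to base_dir, without shelling out.

    Raises ValueError if either name resolves outside base_dir (for example
    "../x" or an absolute path), since the names may come from users.
    """
    base = os.path.realpath(base_dir)

    def resolve(name):
        path = os.path.realpath(os.path.join(base, name))
        if os.path.commonpath([base, path]) != base:
            raise ValueError(f"{name!r} resolves outside {base}")
        return path

    src, dst = resolve(src_name), resolve(dst_name)  # check both before moving
    shutil.move(src, dst)
    return dst
```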

While we’ll focus mainly on the security apparatus because it is the gift that keeps on giving, we’ll bump into other examples along the way as well. For example, how we had pluggable drivers, but you had to turn them on by passing in python module paths. So what happens when we change the interface the driver is required to implement and you have a third party driver? The answer isn’t good. Or how we refused to use existing Open Source code from other projects through a mixture of hubris and licensing religion.
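A small sketch of a driver loader that at least fails loudly: it imports the class named by a dotted module path and checks that the methods the current interface requires are present, so a stale third party driver is rejected at startup rather than at first use. `load_driver` and the method names are hypothetical.

```python
import importlib


def load_driver(dotted_path, required_methods):
    """Import 'package.module.ClassName' and check it implements the interface."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    cls = getattr(importlib.import_module(module_name), class_name)
    missing = [name for name in required_methods
               if not callable(getattr(cls, name, None))]
    if missing:
        raise TypeError(f"{dotted_path} is missing: {', '.join(missing)}")
    return cls
```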

On a strictly technical front, this is a talk about how to do user space privilege separation sensibly. Although we should probably discuss why we also chose in the last six months to not do it as safely as we could.

For a softer technical take, the talk will cover how doing things right was less well documented than doing things the wrong way. Code reviewers didn’t know the anti-patterns, which were common in the code base, so they made weird assumptions about what was ok or not.

On a human front, this is about herding cats: developers with external pressures from their various employers, skipping steps because it was expedient, and throwing automation in front of developers because having a conversation as adults is hard. Ultimately we ended up being close to stalled before we were “saved” from an unexpected direction.

In the end I think we’re in a reasonable place now, so I certainly don’t intend to give a lecture about doom and gloom. Think of us more as a light-hearted object lesson.

So let me be clear here: I don’t think it’s a bad thing that Microsoft bought GitHub. No one is forcing you to use their services; in fact, they make it trivial to stop using them. So what’s the big deal?

Writing this down here because it took me a while to figure out for myself…

ONAP OOM deploys ONAP using Kubernetes, which effectively means Docker images at the moment. It needs to fetch a lot of Docker images, so there is a convenient script provided to pre-pull those images to make install faster and more reliable.
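For illustration only, here is roughly what a more flexible pre-pull can look like: deduplicate the image list, run `docker pull` for each image in parallel, and report the images that failed. It is not the OOM script or the one proposed in review 30169; `prepull`, its parameters, and any image names you pass it are assumptions, and the `puller` argument exists so it can be exercised without Docker.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def docker_pull(image):
    """Pull one image with the docker CLI; returns the exit code."""
    return subprocess.run(["docker", "pull", image],
                          capture_output=True, text=True).returncode


def prepull(images, puller=docker_pull, workers=4):
    """Pull each distinct image in parallel; return the images that failed."""
    unique = sorted(set(images))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(zip(unique, pool.map(puller, unique)))
    return [image for image, code in results.items() if code != 0]
```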

The script in the OOM codebase isn’t very flexible, so Jira issue OOM-655 was filed for a better script. The script was covered in code review 30169. Disappointingly, the code reviewer there doesn’t seem to have actually read the Jira issue or the code before abandoning the patch, which isn’t very impressive.

So how do you get the nicer pre-pull script?

It’s actually not too hard once you know the review ID. Just do this inside your OOM git clone:

$ git review -d 30169

You might be prompted for your gerrit details because the ONAP gerrit requires login. Once git review has run, you’ll be left sitting in a branch from when the review was uploaded that includes the script:

$ git branch
master
* review/james_forsyth/30169

Now just rebase that to bring it in line with master and get on with your life:

$ git rebase master

You’re welcome. I’d like to see the ONAP community take code reviews a bit more seriously, but ONAP seems super corporate (even compared to OpenStack), so I’m not surprised that they haven’t done a very good job here.