So Monty and Sean have recently blogged about about the structures (1, 2) they think may work better for OpenStack. I like the thrust of their thinking but had some mumblings of my own to add.

Firstly, I very much like the focus on social structure and needs – what our users and deployers need from us. That seems entirely right.

And I very much like the getting away from TC picking winners and losers. That was never an enjoyable thing when I was on the TC, and I don’t think it has made OpenStack better.

However, the thing that picking winners and losers did was that it allowed users to pick an API and depend on it. Because it was the ‘X API for OpenStack’. If we don’t pick winners, then there is no way to say that something is the ‘X API for OpenStack’, and that means that there is no forcing function for consistency between different deployer clouds. And so this appears to be why Ring 0 is needed: we think our users want consistency in being able to deploy their application to Rackspace or HP Helion. They want vendor neutrality, and by giving up winners-and-losers we give up vendor neutrality for our users.

Thats the only explanation I can come up with for needing a Ring 0 – because its still winners and losers (e.g. picking an arbitrary project) keystone, grandfathering it in, if you will. If we really want to get out of the role of selecting projects, I think we need to avoid this. And we need to avoid it without losing vendor neutrality (or we need to give up the idea of vendor neutrality).

One might say that we must pick winners for the very core just by its, but I don’t think thats true. If the core is small, many people will still want vendor neutrality higher up the stack. If the core is large, then we’ll have a larger % of APIs covered and stable granting vendor neutrality. So a core with fixed APIs will be under constant pressure to expand: not just from developers of projects, but from users that want API X to be fixed and guaranteed available and working a particular way at [most] OpenStack clouds.

Ring 0 also fulfils a quality aspect – we can check that it all works together well in a realistic timeframe with our existing tooling. We are essentially proposing to pick functionality that we guarantee to users; and an API for that which they have everywhere, and the matching implementation we’ve tested.

To pull from Monty’s post:

“What does a basic end user need to get a compute resource that works and seems like a computer? (end user facet)

What does Nova need to count on existing so that it can provide that. “

He then goes on to list a bunch of things, but most of them are not needed for that:

We need Nova (its the only compute API in the project today). We don’t need keystone (Nova can run in noauth mode and deployers could just have e.g. Apache auth on top). We don’t need Neutron (Nova can do that itself). We don’t need cinder (use local volumes). We need Glance. We don’t need Designate. We don’t need a tonne of stuff that Nova has in it (e.g. quotas) – end users kicking off a simple machine have -very- basic needs.

Consider the things that used to be in Nova: Deploying containers. Neutron. Cinder. Glance. Ironic. We’ve been slowly decomposing Nova (yay!!!) and if we keep doing so we can imagine getting to a point where there truly is a tightly focused code base that just does one thing well. I worry that we won’t get there unless we can ensure there is no pressure to be inside Nova to ‘win’.

So there’s a choice between a relatively large set of APIs that make the guaranteed available APIs be comprehensive, or a small set that that will give users what they need just at the beginning but might not be broadly available and we’ll be depending on some unspecified process for the deployers to agree and consolidate around what ones they make available consistently.

In sort one of the big reasons we were picking winners and losers in the TC was to consolidate effort around a single API – not implementation (keystone is already on its second implementation). All the angst about defcore and compatibility testing is going to be multiplied when there is lots of ecosystem choice around APIs above Ring 0, and the only reason that won’t be a problem for Ring 0 is that we’ll still be picking winners.

How might we do this?

One way would be to keep picking winners at the API definition level but not the implementation level, and make the competition be able to replace something entirely if they implement the existing API [and win hearts and minds of deployers]. That would open the door to everything being flexible – and its happened before with Keystone.

Another way would be to not even have a Ring 0. Instead have a project/program that is aimed at delivering the reference API feature-set built out of a single, flat Big Tent – and allow that project/program to make localised decisions about what components to use (or not). Testing that all those things work together is not much different than the current approach, but we’d have separated out as a single cohesive entity the building of a product (Ring 0 is clearly a product) from the projects that might go into it. Projects that have unstable APIs would clearly be rejected by this team; projects with stable APIs would be considered etc. This team wouldn’t be the TC : they too would be subject to the TC’s rulings.

We could even run multiple such teams – as hinted at by Dean Troyer one of the email thread posts. Running with that I’d then be suggesting

IaaS product: selects components from the tent to make OpenStack/IaaS

PaaS product: selects components from the tent to make OpenStack/PaaS

CaaS product (containers)

SaaS product (storage)

NaaS product (networking – but things like NFV, not the basic Neutron we love today). Things where the thing you get is useful in its own right, not just as plumbing for a VM.

So OpenStack/NaaS would have an API or set of APIs, and they’d be responsible for considering maturity, feature set, and so on, but wouldn’t ‘own’ Neutron, or ‘Neutron incubator’ or any other component – they would be a *cross project* team, focused at the product layer, rather than the component layer, which nearly all of our folk end up locked into today.

Lastly Sean has also pointed out that we have large N N^2 communication issues – I think I’m proposing to drive the scope of any one project down to a minimum, which gives us more N, but shrinks the size within any project, so folk don’t burn out as easily, *and* so that it is easier to predict the impact of changes – clear contracts and APIs help a huge amount there.

Since its very early days subunit has had a single model – you run a process, it outputs test results. This works great, except when it doesn’t.

On the up side, you have a one way pipeline – there’s no interactivity needed, which makes it very very easy to write a subunit backend that e.g. testr can use.

On the downside, there’s no interactivity, which means that anytime you want to do something with those tests, a new process is needed – and thats sometimes quite expensive – particularly in test suites with 10’s of thousands of tests.Now, for use in the development edit-execute loop, this is arguably ok, because one needs to load the new tests into memory anyway; but wouldn’t it be nice if tools like testr that run tests for you didn’t have to decide upfront exactly how they were going to run. If instead they could get things running straight away and then give progressively larger and larger units of work to be run, without forcing a new process (and thus new discovery directory walking and importing) ? Secondly, testr has an inconsistent interface – if testr is letting a user debug things to testr through to child workers in a chain, it needs to use something structured (e.g. subunit) and route stdin to the actual worker, but the final testr needs to unwrap everything – this is needlessly complex. Lastly, for some languages at least, its possibly to dynamically pick up new code at runtime – so a simple inotify loop and we could avoid new-process (and more importantly complete-enumeration) *entirely*, leading to very fast edit-test cycles.

So, in this blog post I’m really running this idea up the flagpole, and trying to sketch out the interface – and hopefully get feedback on it.

Taking subunit.run as an example process to do this to:

There should be an option to change from one-shot to server mode

In server mode, it will listen for commands somewhere (lets say stdin)

On startup it might eager load the available tests

One command would be list-tests – which would enumerate all the tests to its output channel (which is stdout today – so lets stay with that for now)

Another would be run-tests, which would take a set of test ids, and then filter-and-run just those ids from the available tests, output, as it does today, going to stdout. Passing somewhat large sets of test ids in may be desirable, because some test runners perform fixture optimisations (e.g. bringing up DB servers or web servers) and test-at-a-time is pretty much worst case for that sort of environment.

Another would be be std-in a command providing a packet of stdin – used for interacting with debuggers

So that seems pretty approachable to me – we don’t even need an async loop in there, as long as we’re willing to patch select etc (for the stdin handling in some environments like Twisted). If we don’t want to monkey patch like that, we’ll need to make stdin a socketpair, and have an event loop running to shepard bytes from the real stdin to the one we let the rest of Python have.

What about that nirvana above? If we assume inotify support, then list_tests (and run_tests) can just consult a changed-file list and reload those modules before continuing. Reloading them just-in-time would be likely to create havoc – I think reloading only when synchronised with test completion makes a great deal of sense.

Would such a test server make sense in other languages? What about e.g. testtools.run vs subunit.run – such a server wouldn’t want to use subunit, but perhaps a regular CLI UI would be nice…

Just saw http://sny.no/2014/04/dbts and I feel compelled to note that distributed bug trackers are not new – the earliest I personally encountered was Aaron Bentley’s Bugs everywhere – coming up on it’s 10th birthday. BE meets many of the criteria in the dbts post I read earlier today, but it hasn’t taken over the world – and I think this is in large part due to the propogation nature of bugs being very different to code – different solutions are needed.

XXXX: With distributed code versioning we often see people going to some effort to avoid conflicts – semantic conflicts are common, and representation conflicts extremely common.The idions

Note that only one of these things – the commit to fix the bug – happens in the same code tree as the code; and that the commit fixing it may be delayed by many things before the fix is available to users. Also note that conceptually conflicts can happen in any of those fields except 1).

Anyhow – my humble suggestion for tackling the conflicts angle is to treat all changes to a bug as events in a timeline – e.g. adding a tag ‘foo’ is an event to add ‘foo’, rather than an event setting the tags list to ‘bar,foo’ – then multiple editors adding ‘foo’ do not conflict (or need special handling). Collaboratively edited fields would be likely be unsatisfying with this approach though – last-writer-wins isn’t a great story. OTOH the number of people that edit the collaborative fields on any given bug tend to be quite low – so one could defer that to manual fixups.

Further, as a developer wanting local access to my bug database, syncing all of these things is appealing – but if I’m dealing with a million-bug bug database, I may actually need the ability to filter what I sync or do not sync with some care. Even if I want everything, query performance on such a database is crucial for usability (something git demonstrated convincingly in the VCS space).

Lastly, I don’t think distributed bug tracking is needed – it doesn’t solve a deeply burning use case – offline access would be a 90% solution for most people. What does need rethinking is the hugely manual process most bug systems use today. Making tools like whoopsie-daisy widely available is much more interesting (and that may require distributed underpinnings to work well and securely). Automatic collation of distinct reports and surfacing the most commonly experienced faults to developers offers a path to evidence based assessment of quality – something I think we badly need.

I feel like I’m taking a big personal risk writing this, even though I know the internet is large and probably no-one will read this .

So, dear reader, please be gentle.

As we grow – as people, as developers, as professionals – some lessons are are hard to learn (e.g. you have to keep trying and trying to learn the task), and some are hard to experience (they might still be hard to learn, but just being there is hard itself…) I want to talk about a particular lesson I started learning in late 2008/early 2009 – while I was at Canonical – sadly one of those that was hard to experience.

At the time I was one of the core developers on Bazaar, and I was feeling pretty happy about our progress, how bzr was developing, features, community etc. There was a bunch of pressure on to succeed in the marketplace, but that was ok, challenges bring out the stubborn in me . There was one glitch though – we’d been having a bunch of contentious code reviews, and my manager (Martin Pool) was chatting to me about them.

I was – as far as I could tell – doing precisely the right thing from a peer review perspective: I was safeguarding the project, preventing changes that didn’t fit properly, or that reduced key aspects- performance, usability – from landing until they were fixed.

However, the folk on the other side of the review were feeling frustrated, that nothing they could do would fix it, and generally very unhappy. Reviews and design discussions would grind to a halt, and they felt I was the cause. [They were right].

And here was the thing – I simply couldn’t understand the issue. I was doing my job; I wasn’t angry at the people submitting code; I wasn’t hostile; I wasn’t attacking them (but I was being shall we say frank about the work being submitted). I remember saying to Martin one day ‘look, I just don’t get it – can you show me what I said wrong?’ … and he couldn’t.

Canonical has a 360′ review system – every 6 months / year (it changed over time) you review your peers, subordinate(s) and manager(s), and they review you. Imagine my surprise – I was used to getting very positive reports with some constructive suggestions – when I scored low on a bunch of the inter-personal metrics in the review. Martin explained that it was the reviews thing – folk were genuinely unhappy, even as they commended me on my technical merits. Further to that, he said that I really needed to stop worrying about technical improvement and focus on this inter-personal stuff.

Two really important things happened around this time. Firstly, Steve Alexander, who was one of my managers-once-removed at the time, reached out to me and suggested I read a book – Getting out of the box – and that we might have a chat about the issue after I had read it. I did so, and we chatted. That book gave me a language and viewpoint for thinking about the problem. It didn’t solve it, but it meant that I ‘got it’, which I hadn’t before.

So then the second thing happened – we had a company all hands and I got to chat with Claire Davis (head of HR at Canonical at the time) about what was going on. To this day the sheer embarrassment I felt when she told me that the broad perception of me amongst other teams managers was – and I paraphrase a longer, more nuance conversation here – “technically fantastic but very scary to have on the team – will disrupt and cause trouble”.

So, at this point about 6 months had passed, I knew what I wanted – I wanted folk to want to work with me, to find my presence beneficial and positive on both technical and team aspects. I already knew then that what I seek is technical challenges: I crave novelty, new challenges, new problems. Once things become easy, it call all too easily slip into tedium. So at that time my reasoning was somewhat selfish: how was I to get challenges if no-one wanted to work with me except in extremis?

I spent the next year working on myself as much as specific projects: learning more and more about how to play well with others.

In June 2010 I got a performance review I could be proud of again – I was – in no way – perfect, but I’d made massive strides. This journey had also made huge improvements to my personal life – a lot of stress between Lynne and I had gone away. Shortly after that I was invited to apply for a new role within Canonical as Technical Architect for Launchpad – and Francis Lacoste told me that it was only due to my improved ability to play well with others that I was even considered. I literally could not have done the job 18 months before. I got the job, and I think I did pretty well – in fact I was awarded an internal ‘Spotlight on Success’ award for what we (it was a whole Launchpad team team effort) achieved while I was in that role.

So, what did I change/learn? There’s just a couple of key changes I needed to make in myself, but a) they aren’t sticky: if I get overly tired, ye old terrible Robert can leak out, and b) there’s actually a /lot/ of learnable skills in this area, much of which is derived – lots of practice and critical self review is a good thing. The main thing I learnt was that I was Selfish. Yes – capital S. For instance, in a discussion about adding working tree filter to bzr, I would focus on the impact/risk on me-and-things-I-directly-care-about: would it make my life harder, would it make bzr slower, was there anything that could go wrong. And I would spend only a little time thinking about what the proposer needed: they needed support and assistance making their idea reach the standards the bzr community had agreed on. The net effect of my behaviours was that I was a class A asshole when it came to getting proposals into a code base.

The key things I had to change were:

I need to think about the needs of the person I’m speaking to *and not my own*. [Thats not to say you should ignore your needs, but you shouldn't dwell on them: if they are critical, your brain will prompt you].

There’s always a reason people do things: if it doesn’t make sense, ask them! [The crucial conversations books have some useful modelling here on how and why people do things, and on how-and-why conversations and confrontations go bad and how to fix them.]

Ok so this is all interesting and so forth, but why the blog post?

Firstly, I want to thank four folk who were particularly instrumental in helping me learn this lesson: Martin, Steve, Claire and of course my wife Lynne – I owe you all an unmeasurable debt for your support and assistance.

Secondly, I realised today that while I’ve apologised one on one to particular folk who I knew I’d made life hard for, I’d never really made a widespread apology. So here it is: I spent many years as an ass, and while I didn’t mean to be one, intent doesn’t actually count here – actions do. I’m sorry for making your life hell in the past, and I hope I’m doing better now.

Lastly, if I’m an ass to you now, I’m sorry, I’m probably regressing to old habits because I’m too tired – something I try to avoid, but it’s not always possible. Please tell me, and I will go get some sleep then come and apologise to you, and try to do better in future.

I’ve transitioned to a new key – announcement here or below. If you’ve signed my key in the past please consider signing my new key to get it integrated into the web of trust. Thanks!

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1,SHA256
Sun, 2013-10-13
Time for me to migrate to a new key (shockingly late - sorry!).
My old key is set to expire early next year. Please use my new key effective
immediately. If you have signed my old key then please sign my key - this
message is signed by both keys (and the new key is signed by my old key).
old key:
pub 1024D/FBD3EB8E 2002-07-20
Key fingerprint = 9222 8732 859D 25CC 2560 B617 867B F9A9 FBD3 EB8E
new key:
pub 4096R/AAC0E286 2013-10-13
Key fingerprint = 8244 0CEA B440 83C7 9431 D2CC 298E 9A19 AAC0 E286
The new key is up on the keyservers, so you can just pull it from there.
- -Rob
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEARECAAYFAlJZ8FEACgkQhnv5qfvT644WxACfWBoKdVW+YDrMR1H9IY6iJUk8
ZC8AoIMRc55CTXsyn3S7GWCfOR1QONVhiQEcBAEBCAAGBQJSWfBRAAoJEInv1Yjp
ddbfbvgIAKDsvPLQil/94l7A3Y4h4CME95qVT+m9C+/mR642u8gERJ1NhpqGzR8z
fNo8X3TChWyFOaH/rYV+bOyaytC95k13omjR9HmLJPi/l4lnDiy/vopMuJaDrqF4
4IS7DTQsb8dAkCVMb7vgSaAbh+tGmnHphLNnuJngJ2McOs6gCrg3Rb89DzVywFtC
Hu9t6Sv9b0UAgfc66ftqpK71FSo9bLQ4vGrDPsAhJpXb83kOQHLXuwUuWs9vtJ62
Mikb0kzAjlQYPwNx6UNpQaILZ1MYLa3JXjataAsTqcKtbxcyKgLQOrZy55ZYoZO5
+qdZ1+wiD3+usr/GFDUX9KiM/f6N+Xo=
=EVi2
-----END PGP SIGNATURE-----

Python 3 recently introduced a nice feature – subtests. When I was putting subunit version 2 together I tried to cater for this via a heuristic approach – permitting the already known requirement that some tests which are reported are not runnable be combined with substring matching to identify subtests.

However that has panned out poorly, when I went to integrate this with testr the code started to get fugly.

So, I’m going to extend the StreamResult API to know about subtests, and issue a subunit protocol bump – to 2.1 – to add a new field for labelling subtest events. My plan is to make this build a recursive tree structure – that is given test “test_foo” with subtest “i=3″ which the Python subtest code would identify as “test_foo (i=3)”, they should be identified in StreamResult as test_id “test_foo (i=3)” and parent_test_id “test_foo”. This can then nest arbitrarily deep if test runners decide to do that, and the individual runnability becomes up to the test runner, not testrepository / subunit / StreamResult.

I have a complete implementation of the StreamResult API up as a patch for testtools. Thats 2K LOC including comeprehensive tests.

Similarly, I have an implementation of a StreamResult parser and emitter for subunit. Thats 1K new LOC including comprehensive tests, and another 500 lines of churn where I migrate all the subunit filters to v2.

pdb debugging works through subunit v2, permitting dropping into a debugger to work. Yay.

With no further tuning I was able to get 2Gbps doing linear file copies via Samba, which I suspect is rather pushing the limits of my circa 2007 home server – I’ll investigate futher to identify where the bottlenecks are, but the networking itself I suspect is ok – netperf got me 6.7Gbps in a trivial test.

StreamResult, covered in my last few blog posts, has panned out pretty well.

Until that is, that I sat down to do a serialised version of it. It became fairly clear that the wire protocol can be very simple – just one event type that has a bunch of optional fields – test ids, routing code, file data, mime-type etc. It is up to the recipient at the far end of a stream to derive semantic meaning, which means that encoding a lot of rules (such as a data packet can have either a test status or file data) into the wire protocol isn’t called for.

If the wire protocol doesn’t have those rules, Python parsers that convert a bytestream into StreamResult API calls will have to manually split packets that have both status() and file() data in them… this means it would be impossible to create many legitimate bytestreams via the normal StreamResult API.

That seems to be an unnecessary restriction, and thinking about it, having a very simple ‘here is an event about a test run’ API that carries any information we have and maps down a very simple wire protocol should be about as easy to work with as the current file or status API.

Most combinations of file+status parameters is trivially interpretable, but there is one that had no prior definition – a test_status with no test id specified. Files with no testid are easily considered as ‘global scope’ for their source, so perhaps test_status should be treated the same way? [Feedback in comments or email please]. For now I’m going to leave the meaning undefined and unconstrained.

So I’m preparing a change to my patchset for StreamResult to:

Drop the file() method altogether.

Add file_bytes, mime_type and eof parameters to status().

Make the test_id and test_status parameters to status() optional.

This will make the API trivially serialisable (both to JSON or protobufs or whatever, or to the custom binary format I’m considering for subunit), and equally trivially parsable, which I think is a good thing.