Tag Archives: Development

Since I posted the “Free Fall” development post, I’ve been thinking a bit about the pros and cons of this type of off-release development.

The OpenStack Swift project does not do free fall because they are in a constant "ship ready" state for the project and only loosely follow the broader OpenStack release track. My team at Dell also has minimal free fall development because we have a more frequent release clock and choose to have the team focus together through dev/integrate/harden cycles as much as possible.

From a Lean/Agile/CI perspective, I would work to avoid hidden development where possible. New features are introduced by split test (they are in the code, but not active for most users) so that all changes are incremental. That means that refactoring, re-architecture, and new capabilities appear less disruptively. While this approach appears to take more effort in the short term, my experience is that it accelerates delivery because we are less likely to over-develop code.
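The split-test idea above can be sketched with a simple feature flag: the new code path ships in the tree but stays dark for most users, activated for a small slice via a stable hash. The feature name, user IDs, and rollout percentage here are illustrative assumptions, not anything from a specific system.

```python
# Minimal feature-flag sketch: new code is in the build but only
# active for a deterministic percentage of users.
import hashlib

def flag_enabled(feature: str, user_id: str, rollout_pct: int) -> bool:
    """Deterministically bucket a user into 0..99 and compare to the rollout."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_pct

def render_dashboard(user_id: str) -> str:
    # The refactored path lives in the code but is dark for most users.
    if flag_enabled("new-dashboard", user_id, rollout_pct=5):
        return "new dashboard"
    return "old dashboard"

# At 0% nobody sees the new path; at 100% everybody does;
# 5% is one incremental step between those extremes.
assert not flag_enabled("new-dashboard", "alice", 0)
assert flag_enabled("new-dashboard", "alice", 100)
```

Because the bucketing is a hash rather than a random draw, a given user sees a consistent experience as the rollout percentage ramps up.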

Unfortunately, free fall development has the opposite effect. Having code that appears in big blocks is contrary to best practices in my opinion. Further, it rewards groups that work asynchronously.

While I think that OpenStack benefits from free fall work, I think that it is ultimately counter-productive.

I’ve been watching a pattern emerge on the semiannual OpenStack release cycles for a while now. There is a hidden but crucial development phase that accelerates projects faster than many observers realize. In fact, I believe that substantial work is happening outside of the “normal” design cycle during what I call “free fall” development.

Understanding when the cool, innovative stuff happens is essential to getting (and giving) the most from OpenStack.

The published release cycle looks like a 6 stage ballistic trajectory. Launching at the design summit, the release features change and progress the most in the first 3 milestones. At the apogee of the release, maximum velocity is reached just as we start having to decide which features are complete enough to include in the release. Since many are not ready, we have to jettison (really, defer) partial work to ensure that we can land the release on schedule.

I think of the period where we lose potential features as free fall because things can go in any direction. The release literally reverses course: instead of expanding, it is contracting. This process is very healthy for OpenStack. It favors code stability and "long" hardening times. For operators, this means that the code stops changing early enough that we have more time to test and operationalize the release.

But what happens to the jettisoned work? In free fall, objects in motion stay in motion. The code does not just disappear! It continues on its original upward trajectory.

The developers who invested time in the code do not simply take a 3 month sabbatical, nor do they stop their work and start testing the code that was kept. No, after the short in/out sorting pause, the free fall work continues onward with rockets blasting. The challenge is that it is now getting outside of the orbit of the release plan and beyond the radar of many people who are tracking the release.

The consequence of this ongoing development is that developers (and the features they are working on) show up at the summit with 3 extra months of work completed. It also means that OpenStack starts each release cycle with a bucket of operationally ready code. Wow, that’s a huge advantage for the project in terms of delivered work, feature velocity and innovation. Even better, it means that the design summit can focus on practical discussions of real prototypes and functional features.

Unfortunately, this free fall work has hidden costs:

It is relatively hidden because it is outside of the normal release cycle.

It makes true design discussions less productive because the implemented code is more likely to make the next release cycle.

Integration for the work is postponed because it continues before branching.

Teams that are busy hardening a core feature can be left out of work on the next iteration of the same feature

Forking can make it hard to capture bugs caught during hardening

I think OpenStack greatly benefits from free fall development; consequently, I think we need to acknowledge and embrace it to reduce its costs. A more explicit mid-release design synchronization when or before we fork may help make this hidden work more transparent.

I think that part of the confusion is how difficult it is for each category of cloud user to see the challenges and issues faced by the other classes of user.

We see this in spades during internal PaaS discussions. People with development backgrounds have a fundamentally different concept of PaaS benefits. In many cases, those same benefits (delegation to a provider for core services like databases) are considered disadvantages by the other class of user (you want someone else to manage what?!).

Ultimately, the applications are at the core of any XaaS conversation and define what "type" of cloud needs to be consumed.

One of the most consistent comments I hear about cloud applications is that it fundamentally changes the way applications are written. I’m not talking about the technologies, but the processes and infrastructure.

Since our underlying assumption for a cloud application is that node failure is expected, our development efforts need to build in that assumption before any code is written. Consequently, cloud apps should be written directly on cloud infrastructure.

In old school development, I would have all the components for my application on my desktop. That’s necessary for daily work, but does not give me a warm fuzzy for success in production.

Today’s scale production environments involve replicated data with synchronization lags, shared multi-writer memcache, load balancers, and mixed code versions. There is no way that I can simulate that on my desktop! There is no way I can fully anticipate how that will behave all together!

The traditional alternative is to wait. Wait for QA to try and find bugs through trial and error. Or (more likely) wait for users to discover the problem post deployment.

My alternative is to constantly deploy the application to a system that matches production. As a bonus, I then attack the deployment with integration tests and simulators.
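As a minimal sketch of "attacking the deployment with integration tests", the smoke test below checks a health endpoint on a freshly deployed app. The `/health` endpoint and its JSON shape are assumptions for illustration; to keep the sketch self-contained it stands up a local stub server in place of a real deployment.

```python
# Smoke-test sketch: deploy, then immediately verify the service answers.
# A stub HTTP server stands in for the deployed application here.
import http.server
import json
import threading
import urllib.request

class HealthHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical /health endpoint returning a JSON status.
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet

# Port 0 asks the OS for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/health"
with urllib.request.urlopen(url) as resp:
    status = json.load(resp)["status"]
server.shutdown()
print(status)
```

In a real pipeline the URL would point at the production-like environment, and this check would run on every deployment rather than waiting for QA or users to find the breakage.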

If you’re thinking that is too much effort then you are not thinking deeply enough. This model forces developers to invest in install and deployment automation. That means that you will be able to test earlier in the cycle. It means you will be able to fix issues more quickly. And that you’ll be able to ship more often. It means that you can involve operations and networking specialists well before production. You may even see more collaboration between your development, quality, and operations teams.

Forget about that last one – if those teams actually worked together you might accidentally ship product on time. Gasp!

I had an interesting argument recently in a very crowded meeting – maybe we were all getting that purple meeting haze, but it started to take on all the makings of a holy war (so I knew it would make a good blog post).

We were discussing an API for interacting with a server cloud and the API was intentionally very abstracted. Specifically, you could manage a virtual server but you could not see which host was providing the resources. The vendor wanted to hide the raw resources from API consumers. This abstraction was good; it made the API simpler and gave the provider flexibility in how it implemented the backend. The API abstraction made the underlying system opaque.

So far it was all rainbows, unicorns and smiling yellow hatted yard gnomes.

Then I wanted to know if it was possible to relate information between the new API and the existing resource-transparent API. Why would I want to do that? I was interested in the 5% case where we needed to get information about the specific resources that were assigned. For example, when setting up redundant database replication, we want to make sure that the replicas are not assigned to the same physical host.
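The anti-affinity check above can be sketched in a few lines, assuming the resource-transparent API can tell us which physical host backs each virtual server (modeled here as a plain dict; the instance and host names are made up for illustration):

```python
# Anti-affinity sketch: redundant replicas must not share a physical host.
def placement_ok(replica_hosts: dict) -> bool:
    """True when no two replicas map to the same physical host."""
    hosts = list(replica_hosts.values())
    return len(hosts) == len(set(hosts))

# Hypothetical mappings from the resource-transparent API.
good = {"db-primary": "host-a", "db-replica": "host-b"}
bad  = {"db-primary": "host-a", "db-replica": "host-a"}

print(placement_ok(good), placement_ok(bad))
```

The point is that this check needs data the abstracted API deliberately hides, which is exactly why the two APIs need some way to correlate resources.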

More importantly, I do not want the vendor to clutter their new abstracted API with stuff to handle these odd ball use cases. Calling them 5% use-cases is deceptive: they are really in the hugely diverse bucket of use-cases outside of the 95% that are handled nicely by the abstraction. Trying to crowbar in these extra long-tail use-cases will make the API unwieldy for the intended audience.

Someone else in the meeting disagreed with the premise of my question and wanted me to explain it. In answer, I used the tautology “Abstractions are useful, until they are not.”

The clearest example of this concept is the difference between Rails ActiveRecord and Hibernate. Both are excellent object-relational (OR) abstractions. They make the most general cases (select * from foo where ID = bar) quick and easy. But they are radically different at the edge of the abstraction. ActiveRecord expects that programmers will write directly to the database’s native SQL to handle the 5% exceptions. Hibernate added the complex and cumbersome HQL on top of its abstraction layer. HQL is nearly as complex as (in some cases, more complex than) the SQL language that it tries to abstract. For Hibernate, this is really an anti-abstraction that’s no longer useful.
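An analogous sketch in Python (the post’s examples are Ruby and Java) shows the ActiveRecord-style pattern: a tiny helper covers the common "select by id" case, and the edge case drops straight to raw SQL instead of growing the abstraction into a query mini-language. The table and data are invented for illustration.

```python
# ORM-escape-hatch sketch: abstract the 95% case, use raw SQL for the 5%.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])

def find(table: str, row_id: int):
    # The abstraction: the general case (select * from foo where id = ?).
    # Table names come from trusted code here, never from user input.
    return conn.execute(f"SELECT * FROM {table} WHERE id = ?",
                        (row_id,)).fetchone()

# The 5% case: rather than invent an HQL-like layer, just write SQL.
edge = conn.execute(
    "SELECT name FROM foo WHERE name LIKE 'a%' ORDER BY name").fetchall()

print(find("foo", 2), edge)
```

The design choice mirrors the ActiveRecord position: the abstraction stays small because the escape hatch is the underlying language itself, not a second abstraction stacked on top.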

Over stretching an abstraction encourages the wrong behaviors and leads to overly complex and difficult to maintain APIs. When you reach the edge of an abstraction, it’s healthy to peek under the covers. Chances are that you’re doing something wrong or something unique enough that you’ve outgrown the abstraction.

And that’s enough for now because blog posts are useful, until they are not.