Building the right thing, building it right, fast

Who I am

My name is Jakub Holy and I’m a software craftsmanship enthusiast and a (mainly JVM-based) developer since ~ 2005, consultant, and occasionally a project manager, working currently with TeliaSonera AS in Norway. More under About and #opinion.

Jez Humble: Four Principles of Low-Risk Software Releases – how to make your releases safer by making them incremental (versioned artifacts instead of overwritting, expand & contract DB scripts, versioned APIs, releasing to a subset of customers first), separating software deployment from releasing it so that end-users can use it (=> you can do smoke tests, canary releasing, dark launching [feature in place but not visible to users, already doing something]; includes feature toggles [toggle on only for somebody, switch off new buggy feature, …]), delivering features in smaller batches (=> more frequently, smaller risk of any individual release thanks to less stuff and easier roll-back/forward), and optimizing for resiliance (=> ability to provision a running production system to a known good state in predictable time – crucial when stuff fails).

The Game of Distributed Systems Programming. Which Level Are You? (via Kent Beck) – we start with a naive approach to distributed systems, treating them as just a little different local systems, then (painfully) come to understand the fallacies of distributed programming and start to program explicitely for the distributed environment leveraging asynchronous messaging and (often functional) languages with good support for concurrency and distribution. We suffer by random, subtle, non-deterministic defects and try to separate and restrict non-determinism by becoming purely functional … . Much recommended to anybody dealing with distributed systems (i.e. everybody, nowadays). The discussion is worth reading as well.

Shapes Don’t Draw – thought-provoking criticism of inappropriate use of OOP, which leads to bad and inflexible code. Simplification is OK as long as the domain is equally simple – but in the real world shapes do not draw themselves. (And Trades don’t decide their price and certainly shouldn’t reference services and a database.)

The framework sorts the issues facing leaders into five contexts defined by the nature of the relationship between cause and effect. Four of these—simple, complicated, complex, and chaotic—require leaders to diagnose situations and to act in contextually appropriate ways. The fifth—disorder—applies when it is unclear which of the other four contexts is predominant.

Et spørsmål om kompleksitet (Norwegian). Key ideas mixed with my own: Command & control management in the traditional Ford way works very well – but only in stable domains with clear cause-and-effect relationships (i.e. the Simple context of Cynefin). But many tasks today have lot of uncertanity and complexity and deal with creating new, never before seen things. We try to lead projects as if they were automobile factories while often they are more like research – and researchers cannot plan when they will make a breakthrough. Most of the new development of IT systems falls into the Complex context of Cynefin – there is lot of uncertanity, no clear answers, we cannot forsee problems, and have to base our progress on empirical experience and leverage emergence (emergent design, ..).

The Economics of Developer Testing – a very interesting reflection on the cost and value of testing and what is enough tests. Tests cost to develop and maintain (and different tests cost differently, the more complex the more expensive). Not having tests costs too – usually quite a lot. To find the right ballance between tests and code and different types of tests we must be aware of their cost and benefits, both short & long term. Worth reading, good links. (Note: We often tend to underestimate the cost of not having good tests. Much more then you might think.)

Quotes

I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence (I suspect this level of confidence is high compared to industry standards, but that could just be hubris). If I don’t typically make a kind of mistake (like setting the wrong variables in a constructor), I don’t test for it. I do tend to make sense of test errors, so I’m extra careful when I have logic with complicated conditionals. When coding on a team, I modify my strategy to carefully test code that we, collectively, tend to get wrong.

Different people will have different testing strategies based on this philosophy, but that seems reasonable to me given the immature state of understanding of how tests can best fit into the inner loop of coding. Ten or twenty years from now we’ll likely have a more universal theory of which tests to write, which tests not to write, and how to tell the difference. In the meantime, experimentation seems in order.

Sometimes “vagrant destroy” fails with an exception from the depths of the virtualbox Ruby gem or vagrant up freezes for a long time only to fail with SSH connection failure message. Here are some tips how to solve such problems.

I was fortunate to attend Kent Beck’s lecture summarizing his experiences and thoughts regarding efficient software design. Traditionally there have been two schools of thought about design: Predictive design, trying to design everything upfront (and making lot of wrong decisions) and reactive design, where any design is only done if it is absolutely necessary for implementing a feature (thus developing often on top of an insufficient design). Kent tried hard to discover such a design method that really delivers on the promises of both while avoiding their failures. This method is based on evolving design frequently in small, safe steps and focusing on learning while following some key best practices. It doesn’t really matter what scope of design we are are speaking about, the method and principles are the same whether you’re redesigning a class or a complex system.

“Throughput is improving as machines get stronger. However, there is a sweet-spot, a point where adding hardware doesn’t help performance. The sweet spot is around the XL machine, which reaches a [max] throughput of around 7000 tpm.” (transactions per minute => ~ 110 tx/sec)

Disclaimer: No banchmark proves anything generally applicable, it’s always necessary to run one’s own production load and measure that to see how in reality a DB performs for one’s actual needs.

Notes

The number of concurrent connections is by default derived from the memory, namely 150 for a small 1.5GB instance and 650 for a large 7.5GB instance. According to one expert it’s completely OK to set it to 1000 connections without regard to memory; MySQL should handle it.