Data-themed articles, essays, and studies

Only The Simple

There is a definite Cult of Complexity in analysis and software engineering – an often-unstated philosophy that complexity implies sophistication, intelligence, and general cachet. That cult is telegraphed by terms like “advanced” analytics – as if the complement to “advanced” were somehow elementary or remedial.

There is a certain buzz that comes with creating something not easily understood by others. But like a recitation of Twinkle Twinkle Little Star in ancient Greek, such work demonstrates knowledge without communicating anything.

And communication is where it’s at. Without communication even the best and most important results won’t be understood, much less believed. Complexity gets in the way of communication: it obscures; it’s a turn-off to those who might benefit from what we’ve done; and really it telegraphs insecurity and indifference more than intelligence and cachet.

It’s more difficult, if sometimes less appreciated, to solve a complex problem simply and elegantly. Still, all really good solutions – and really great ideas – are fundamentally as simple as they can be. Some complexity we do need, but much of it we do not.

Pride might drive us to overly complex solutions, but another driver is economic. What better job security is there than a complex model? The computations alone can keep many people and computers busy for years. I once attended a conference where the proposed remedy for a complex, poorly-performing fluid-flow model was to add more model parameters – the idea being that with enough degrees of freedom, things would ultimately be OK. But as experienced analysts already know (guess how I was making myself unpopular that day…) many parameters will let us fit anything, while predicting little or nothing. I now frequently see empirical models that are untested for predictive robustness. To the initiated, such models are instantly recognizable – they are so clearly more complex than the essential phenomena they purport to represent, they just couldn’t be predictive. And it turns out they aren’t…
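The many-parameters point can be made concrete with a minimal sketch in Python. This is not the fluid-flow model from that conference; everything in it is an assumption chosen for illustration: the straight-line truth y = 2x + 1, the noise level, and the function names. It fits the same noisy data twice, once with a two-parameter line and once with a polynomial that has one parameter per data point (a Lagrange interpolant), then scores both on fresh data they have not seen.

```python
import random


def fit_line(xs, ys):
    """Least-squares straight line: two parameters (slope and intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every point: one parameter per data point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def rmse(model, xs, ys):
    """Root-mean-square error of a model's predictions."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def run_trial(seed, noise=0.5):
    """Fit both models to noisy samples of y = 2x + 1; score on fresh data."""
    rng = random.Random(seed)

    def sample(xs):
        # Assumed truth plus Gaussian noise (illustrative values only).
        return [2.0 * x + 1.0 + rng.gauss(0.0, noise) for x in xs]

    train_x = [i / 9 for i in range(10)]         # 10 evenly spaced points
    test_x = [(i + 0.5) / 9 for i in range(9)]   # unseen points in between
    train_y, test_y = sample(train_x), sample(test_x)

    results = {}
    for name, fit in (("line", fit_line), ("interpolant", fit_interpolant)):
        model = fit(train_x, train_y)
        results[name] = (rmse(model, train_x, train_y),
                         rmse(model, test_x, test_y))
    return results


for name, (train_err, test_err) in run_trial(seed=0).items():
    print(f"{name:12s} train RMSE = {train_err:.3f}   test RMSE = {test_err:.3f}")
```

The interpolant reproduces its training data exactly, which is the sense in which enough parameters will fit anything. Its error on the unseen points is typically much larger than the line's, because it has fitted the noise rather than the phenomenon. Changing the seed changes the numbers, but on average not the conclusion.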

Complexity also has its own inertia – anyone who has written software can testify that without due care, before we know it there are 100, 500, or 1000 lines of tangled code representing our pathway through the problem at hand. Errors are inevitable in such situations, because we can’t stand back and assess what the code really does. Vendors who tout solutions that can be crafted in “days” are pulling our leg – you can make a mess in days, but getting to a comprehensible solution usually takes longer.

The real beauty of engineering or analysis comes when we take some crazy and complex thing, and render it to a form simple enough for all of us to understand. The elegance of a simple solution? Now that’s advanced.

In the physics community the elegant solution has traditionally been the most highly prized – people like Einstein, Landau, Dirac, Feynman, Maxwell, and Teller, to name just a few, were celebrated for creating simple, elegant theories that enhanced comprehension. We don’t have to operate on their level to emulate their intent, or to agree that simplicity and elegance are the ultimate goals and real prizes in any data analysis.

Elegance and simplicity are not really luxuries – if our result is unexpected or undesirable, we’ll need to convince an unhappy audience – who may not like math, or mathematicians – why our result is actually correct. I’ve learned the hard way. In that scenario, only the simple results survive. From simplicity we get to comprehension, from comprehension we get to adoption, and from adoption we get the chance to make an impact.