Data-themed articles, essays, and studies

What Doesn’t Change

Ask any physicist: when you can identify something in a dynamic system that is invariant and unchanging, it’s been a good day. It could be energy, momentum, or a particle count that is conserved – invariants constrain the system and simplify analysis.

By analogy, one of my favorite programming features is the constant declaration – particularly when it is available in class or function declarations. In many current languages – particularly “scripting” languages – it isn’t possible to know what a function might change without reading its text (or, just possibly, the documentation), as altering objects outside the function’s declaration is common and even expected.

Constant declarations, as part of strict function declarations (roughly, declarations that constrain the function to work only on well-typed objects named in its signature), do three things that are really great:

They indicate what will and will not be changed.

They make it impossible to alter something we shouldn’t (at least not without an explicit and readily-detected override).

They force us to think about the actions a function will take, and as a side benefit, help us create short and focused functions.

By contrast, what we currently experience is a lot more like this:

Functions become little more than code containers, with no implied contract regarding their actions on the outside world.

Side effects are common, and often difficult to detect even in code reviews.

Functions become a lot longer than they should be, as there is little to constrain them. Longer functions are harder to reuse, harder to maintain, harder to analyze, harder to document.

Functions often end up doing several things, not just one, which again makes everything about working with them more difficult.

I’ve known very disciplined programmers who deal with these issues, but the methodologies of discipline are usually personal and difficult to reproduce. More often, to meet the pressures of practical problems, code is written to meet current contingencies, and these issues ultimately surface.

I appreciate many vendors, as well as the creativity of my programmer friends – but where I often disagree with both is the question of what constitutes an ideal programming language for data-related work.

Scripting languages propagate because of their simplicity, the rapid early progress they enable (which sells tools), and their specialty features.

Unfortunately, I think that’s backwards. As a community we tend to write too much, too quickly, without an appropriate eye towards future consequences, and then pay the price later on. What allows our work to scale easily to practical and complex problems are structured- and object-oriented programming constructs – they encourage us to think about the entities and actions we are crafting, and how someone else might use them. I was asked by a vendor product manager what they could do to improve their scripting language, and my answer was “Change it to C++.” (His look of distaste suggested that I had not given the expected answer, which was probably to add some new feature to what they already had. But he didn’t ask that; he asked what I would change…)

I’ve heard the argument that languages like C#/C++/Java are not practical, and would discourage customers from using a particular tool or environment. Well, I’m not sure who they’re talking with, but a small thing like learning a language won’t slow down the programmers I know. In fact, many developers already use one of these languages, which already have the key features promoting scalability. We still have to use the language properly, but better language support would be a start toward better outcomes in data- and web-related development. I hear you – I’m dreaming, and common-currency languages are part of what won’t change. But it would be nice – anyone for a C++-to-R translator?