Abstractions, The Costs

Why do we strive to abstract the data?

After some years of wondering, I still do not have an exact answer to that question. But I can share what I have found and learnt so far.

First, that answer is the source of our thirst for tunable, descriptive, and flexible notation in our programming languages. Of course, the relationship between the two is, to a great extent, mechanical and in some cases mathematical; the drive behind it is not.

Second, notations are meant for modeling intentions, not data. It is not possible to give data a different meaning, only different interpretations; conceptually, data is immutable. The mistaken belief that we model the data as well also stems from that answer.

How can we understand this question better?

Cognitive Load

Cognitive load can be described with a juggling analogy: it is the number of balls (items) one can keep in the air (in mind) at once, and that number is limited. Cognitive Load is the currency of the land of programmers.

Most importantly, the amount of Cognitive Load a team can bear is limited. We have to choose where to spend it: on tools or on the problem at hand.

Every day we spend some of our $CL (Cognitive Load) babysitting our tools and environment. There are also unintended complexities in our code base that hinder our daily onboarding. Onboarding should be part of our daily workflow: it is not just about bringing in new recruits, it is also about warming up to your own code every day.

I cannot recall how I wrote a piece of code, in the same sense that I cannot recall how I drew a cartoon: it is not hard to remember the process, but it is hard to remember the decisions and the reasons behind each line.

Failing to manage our $CL account results in different kinds of imbalance; an unbalanced work/life situation is one of them, and a costly one for programmers and companies alike.

Is it possible to devise general enough solutions to cover most of our cases?

The Case of Go

One might argue that by not employing a more descriptive language, we might end up with too many conventions in our code base and/or our infrastructure and deployment.

First, and unfortunately, solving a problem does not mean understanding it (at least in programming). Unless you are working on a grand theory, it is always case-specific, because we are facing and solving other people's problems, not rigid abstract concepts.

Second, it is better to have problem-specific tools that keep things in check than complex syntax and semantics that surprise us at the worst possible moment.

Go is meant to produce maintainable code bases for large teams. It also results in maintainable code bases for a single developer. It does not provide many language features, and it is very verbose at times. Despite my personal love for things like pattern matching (pattern matching is to fields what an interface is to methods), I have to admit that verbosity is the best form of simplicity. Virtually anybody can handle a decade-old, verbose piece of code.

Take errors in Go, for example. Go does not treat errors as exceptions, simply because they are not exceptions: they are the other expected outcome. Of course, Go does provide a way to express exceptions, and when it faces one, it panics (because it was exceptional and unexpected).

This Go code loads the config (simplest form, from a discussion on Elixir Forum):
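A minimal sketch of config loading in this style, assuming a JSON file and a hypothetical `Config` type with a single `Port` field. Every step that can fail returns an error value, and the caller deals with it in the same linear flow as the happy path:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Config is a hypothetical configuration shape used only for illustration.
type Config struct {
	Port int `json:"port"`
}

// parseConfig decodes raw JSON bytes. A malformed document is an expected
// outcome, so it is returned as an error value, not raised.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("parse config: %w", err)
	}
	return cfg, nil
}

// loadConfig reads the file at path and parses it. Each step that can
// fail hands its error back to the caller, right where it happens.
func loadConfig(path string) (Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return Config{}, fmt.Errorf("read config: %w", err)
	}
	return parseConfig(data)
}

func main() {
	cfg, err := loadConfig("config.json") // hypothetical file name
	if err != nil {
		// The error path sits right next to the happy path.
		fmt.Println("using defaults:", err)
		cfg = Config{Port: 8080}
	}
	fmt.Println("port:", cfg.Port)
}
```

Note that `panic` does not appear: a missing or malformed file is treated as an expected outcome here, not as an exception.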

Let's see the same code in another language (I like Elixir very much; that is why it is used for demonstration purposes!):

Or even more concisely:

Where has the error handling gone? Where is it taking place?

That is the main drawback of treating errors as exceptions. Implicit error handling (a.k.a. exception handling) creates two totally different, unrelated execution contexts. One holds the clear, linear happy path; the other holds a totally non-linear, spaghetti graph of logic flow. Meanwhile, anybody can read and understand Go code ten years from now without knowing much Go.

I even learnt to stop using channels to represent lazy computations and started to use the scanner pattern more and more. While channels are part of Go's notation, you cannot feed them synchronously, which makes them unsuitable for APIs most of the time: they leak, or force, concurrency without it being a requirement of that specific bit of logic up front.
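A minimal sketch of the scanner pattern for a lazy computation, modeled on the `Scan`/`Err` shape of the standard library's `bufio.Scanner`. The `FibScanner` type and its methods are hypothetical. Values are computed only when the caller asks for the next one, and no goroutine or channel is involved:

```go
package main

import "fmt"

// FibScanner lazily produces Fibonacci numbers on demand.
// The caller drives it synchronously: Scan to advance, Value to read,
// Err to check for failure once Scan returns false.
type FibScanner struct {
	a, b  uint64
	limit int // how many values to produce in total
	count int // how many values have been produced so far
	value uint64
	err   error
}

func NewFibScanner(limit int) *FibScanner {
	return &FibScanner{a: 0, b: 1, limit: limit}
}

// Scan computes the next value and reports whether one was produced.
func (s *FibScanner) Scan() bool {
	if s.err != nil || s.count >= s.limit {
		return false
	}
	s.value = s.a
	s.a, s.b = s.b, s.a+s.b
	s.count++
	return true
}

// Value returns the value produced by the most recent Scan.
func (s *FibScanner) Value() uint64 { return s.value }

// Err reports the first error encountered. It is always nil in this
// sketch, but scanners over real sources (files, sockets) need it.
func (s *FibScanner) Err() error { return s.err }

func main() {
	s := NewFibScanner(10)
	for s.Scan() {
		fmt.Print(s.Value(), " ")
	}
	if err := s.Err(); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println()
}
```

Returning a `<-chan uint64` instead would force the implementation to start a goroutine to feed the channel, and force the caller to drain it or leak that goroutine, even though nothing here needs concurrency.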

Also, structs should be considered temporary tools, not final concepts. They can be seen as named closures, or more exactly, as shared, mutable, named closures. Their main function is to provide protocol/interface descriptions, like protobuf files: they transfer the data into your environment, the programming language in use. They have no meaning by themselves. Of course, they can be used for implementing logic or services, but when facing data, they should be considered just descriptions.

Keeping things simple, while not easy, is essential. The whole Unix OS is built upon just three abstractions: Process, File, and Socket (and a socket is actually a special case of a file). Simplicity also helps with understanding and reasoning about the code. In case you have not noticed, let me remind you that bugs are valid code with no compile-time errors, and they have passed all the tests we wrote! So bringing more complexity into our problem is obviously counterproductive; it is developer experience versus product requirements. It is better to be creative about the problem and its solution than about our tools.

So should we stop abstracting things? Don’t we need more powerful tools?

Abstract What You Need

Abstract what you need, not what you provide. Simplify what you provide and try not to hide things.

For example, if a logger for debugging is needed, then all we need is a function:

This is just a generalized form of "accept interfaces, return concrete types". Sometimes it makes sense to accept simpler language constructs than interfaces, like a function or a map, unless the code's evolution demands otherwise as it ages. And if specific logging levels are needed inside a package, an interface should be defined inside that package.

We do not care what the actual underlying type of the logger is.
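When a package needs specific logging levels, a minimal sketch might look like the following. The `logger` interface, `Importer`, `NewImporter`, and `memLogger` are all hypothetical names. The interface lives in the package that consumes it and lists only the two levels that package uses; the constructor accepts that interface and returns a concrete `*Importer`:

```go
package main

import (
	"fmt"
	"strings"
)

// logger is declared by the package that needs it and lists only the
// levels this package actually uses.
type logger interface {
	Debugf(format string, args ...any)
	Errorf(format string, args ...any)
}

// Importer depends on the small interface above, not on any concrete
// logging library.
type Importer struct {
	log logger
}

// NewImporter accepts an interface and returns a concrete type.
func NewImporter(l logger) *Importer { return &Importer{log: l} }

// Import counts non-blank records, logging as it goes.
func (im *Importer) Import(records []string) int {
	ok := 0
	for _, r := range records {
		if strings.TrimSpace(r) == "" {
			im.log.Errorf("skipping empty record")
			continue
		}
		im.log.Debugf("imported %q", r)
		ok++
	}
	return ok
}

// memLogger is one possible implementation: it keeps messages in memory.
type memLogger struct{ lines []string }

func (m *memLogger) Debugf(format string, args ...any) {
	m.lines = append(m.lines, "DEBUG "+fmt.Sprintf(format, args...))
}

func (m *memLogger) Errorf(format string, args ...any) {
	m.lines = append(m.lines, "ERROR "+fmt.Sprintf(format, args...))
}

func main() {
	ml := &memLogger{}
	im := NewImporter(ml)
	fmt.Println(im.Import([]string{"a", " ", "b"}))
	fmt.Println(strings.Join(ml.lines, "\n"))
}
```

Any logging library whose logger type happens to have these two methods satisfies `logger` implicitly; otherwise a few lines of adapter code are enough.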

If understanding one thing requires understanding other things, then they are tightly coupled (which is bad). By abstracting what you need, you can prevent complexity from leaking into the current scope of the code. Go provides a nice mechanism for doing this: interfaces.

Cluttering the code base with unnecessary, made-up concepts and complex notation will not help. If an obsession with flexible or descriptive notation stops you from being productive, you might consider changing your tool or programming language. But here comes the question: does your final product demand that you do so?

Why do we strive to abstract the data?

Because we do not understand the data, we try to "make it more descriptive" in order to understand it "better". We abstract to understand. We need to name things to make them known. We overload data with unnecessary metadata. We try to know the data, to make it known to our environment (our programming language), to have a context in which to understand it.

But that abstraction is just a temporary thing, not a mandatory part of the data. If it were, then a problem solved in Python would be impossible to solve in Clojure or Go. And that is very important.

The actual generalization must happen at the level of understanding, not when and where the abstraction takes place. It is not a̶b̶s̶t̶r̶a̶c̶t̶ ̶-̶>̶ ̶u̶n̶d̶e̶r̶s̶t̶a̶n̶d̶ ̶-̶>̶ ̶s̶i̶m̶p̶l̶i̶f̶y̶, which leaks too many overloaded "helper" concepts into the code. It is simplify -> understand -> abstract. Treating abstraction as a tool for understanding will always fail, because our understanding of the problem grows over time. Hard-coded abstractions, born of a premature and delusional excitement about understanding the problem, do not grow that easily, and by then they may have already made our code base unmaintainable.

We must not abstract to understand. We must abstract our understanding.