Post navigation

Getters

In the Java world property accessors are traditionally prefixed with “get” and “set”, the Java bean convention:

person.getFirstName()

Code becomes more pleasant to read if you omit the “get” prefix:

person.firstName()

Of course, you can do this only if you don’t use a framework that depends on the convention to recognize properties via reflection (like some OR mappers, for example).

What about setters? I rarely write setters anymore. If you design your classes as immutable types you don’t need setters. Even if your class has mutable state you probably want to control this state via methods more specific to the domain of the problem. Also, the more you apply the tell, don’t ask principle the less you will find the need for getters.

Brevity vs. verbosity

There were times when it was common to see mass variable declarations like the following at the beginning of a function:

int i, j, k, l, m, n;
float a, b, c, u, v, x, y, z;

Fortunately, times have changed for the better, and most programmers are aware that descriptive naming is important. However, some programmers do over-compensate. Length of an identifier is not a virtue by itself.

The Objective-C Cocoa framework is famous for overly long method names:

[array objectAtIndex:index]

Parts of Objective-C were inspired by Smalltalk. But in Smalltalk the same method is called at:

[array at:index]

This is a reasonably sufficient name for such a common functionality in programming.

Here’s another example: If the concept of a measurement station is very prevalent in the domain of your project then it’s ok to call instances just station instead of measurementStation if it’s the only kind of station in the domain.

Yes, the IDE does auto-complete long names. However, readability of the code decreases if the reader has to scan the same long-winded names over and over again:

Often you can find names that are more to the point than longer descriptions, e.g. acquire instead of takeOwnershipOf. (source)

Hungarian notation and friends

The famous Hungarian notation is no longer in widespread use. However, there are variations of it that I would recommend against as well for the sake of readability. For example, bookList or bookArray can be simply books. Another variation would be conventions like myField or m_field for member variables. If you need these notations to determine the origin of a variable, then your scopes are probably too big, i.e. your methods, functions or classes are too long. Additionally, IDEs and editors for programmers can highlight these different scopes anyway. Other examples for unnecessary Hungarian-style notation are IFoo for interfaces, EFoo for enums or the infamous FooImpl.

Screaming constants

There is really no need for constants and enum values to constantly SCREAM at you and other readers. This SCREAMING_CASE convention has its origin in C, where constants used to be defined as macros when the const keyword wasn’t introduced yet, and it later found its way into other programming language ecosystems. Names for constants and enum values are not more important than other identifiers and don’t have to be spelled differently. Try it, you will enjoy the newfound silence in your code.

Conclusion

These are some tips to improve readability of code through better names. Some of these tips go against traditional conventions, so you should discuss them with your team before applying them. Consistency within an existing code base can definitely be more important. But if you have the freedom you should definitely give them a try.

Most programmers like freedom. So there are many means of hiding implementations in modern programming languages, e.g. interfaces in Java, header files in C/C++ and visibility modifiers like private and protected in most object-oriented languages. Even your ordinary functions or public class interface gives you the freedom to change the implementation without needing to touch the clients. Evolvability in this sense means you can change and refine your implementations without requiring others, namely clients of your code, to change.

Changing the class interface or function signatures within a project is often possible and feasible, at least if you have access to all client code and use powerful refactoring tools. If you published your code as a library or do not want to break all client code or forcing them to adapt to your changes you have to consider your interface code to be fixed. This takes away some of your precious freedom. So you have to design your interfaces carefully with evolability in mind.

Some programming languages implement the uniform access principle (UAP) that eases evolvability in that it allows you to migrate from public attributes to properties/method calls without changing the clients: Read and write access to the attribute uses the same syntax as invoking corresponding methods. For clarification an example in Python where you may start with a class like:

Now the nice thing is: The above client code still works without changes!

Scala uses a similar and quite concise mechanism for implementing the UAP wheres .NET provides some special syntax for properties but still migration from public fields easily possible.

So in languages supporting the UAP you can start really simple with public attributes holding the plain value without worrying about some potential future. If you later need more sophisticated stuff like caching, computation of the value, validation or even remote retrieval you can add it using language features without touching or bothering clients.

Unfortunately some powerful and widespread languages like Java and C++ lack support for UAP. Changing a public field to a more complex property means the introduction of getter and setter methods and changing all clients. Therefore you see, especially in Java, many data classes littered with trivial getter and setter pairs doing nothing interesting and introducing unnecessary bloat to maintain the evolvability of the code.

What’s so great about unnamed namespaces?

Back when I still used them, my code would usually evolve gradually through a few different “stages of visibility”. The first of these stages was the unnamed-namespace. Later stages would either be a free-function or a private/public member-function.

Lets say I identify a bit of code that I could reuse. I refactor it into a separate function. Since that bit of code is only used in that compile unit, it makes sense to put this function into an unnamed namespace that is only visible in the implementation of that unit.

Okay great, now we have reusability within this one compile unit, and we didn’t even have to recompile any of the units clients. Also, we can just “Hack away” on this code. It’s very local and exists solely to provide for our implementation needs. We can cobble it together without worrying that anyone else might ever have to use it.

This all feels pretty great at first. You are writing smaller functions and classes after all.

Whole class hierarchies are defined this way. Invisible to all but yourself. Protected and sheltered from the ugly world of external clients.

What’s so bad about unnamed namespaces?

However, there are two sides to this coin. Over time, one of two things usually happens:

1. The code is never needed again outside of the unit. Forgotten by all but the compiler, it exists happily in its seclusion.
2. The code is needed elsewhere.

Guess which one happens more often. The code is needed elsewhere. After all, that is usually the reason we refactored it into a function in the first place. Its reusability. When this is the case, one of these scenarios usually happes:

1. People forgot about it, and solve the problem again.
2. People never learned about it, and solve the problem again.
3. People know about it, and copy-and-paste the code to solve their problem.
4. People know about it and make the function more widely available to call it directly.

Except for the last, that’s a pretty grim outlook. The first two cases are usually the result of the bad discoverability. If you haven’t worked with that code extensively, it is pretty certain that you do not even know that is exists.

The third is often a consequence of the fact that this function was not initially written for reuse. This can mean that it cannot be called from the outside because it cannot be accessed. But often, there’s some small dependency to the exact place where it’s defined. People came to this function because they want to solve another problem, not to figure out how to make this function visible to them. Call it lazyness or pragmatism, but they now have a case for just copying it. It happens and shouldn’t be incentivised.

A Bug? In my code?

Now imagine you don’t care much about such noble long term code quality concerns as code duplication. After all, deduplication just increases coupling, right?

But you do care about satisfied customers, possibly because your job depends on it. One of your customers provides you with a crash dump and the stacktrace clearly points to your hidden and protected function. Since you’re a good developer, you decide to reproduce the crash in a unit test.

Only that does not work. The function is not accessible to your test. You first need to refactor the code to actually make it testable. That’s a terrible situation to be in.

What to do instead.

There’s really only two choices. Either make it a public function of your unit immediatly, or move it to another unit.

For functional units, its usually not a problem to just make them public. At least as long as the function does not access any global data.

For class units, there is a decision to make, but it is simple. Will using preserve all class invariants? If so, you can move it or make it a public function. But if not, you absolutely should move it to another unit. Often, this actually helps with deciding for what to create a new class!

Note that private and protected functions suffer many of the same drawbacks as functions in unnamed-namespaces. Sometimes, either of these options is a valid shortcut. But if you can, please, avoid them.

Change is in the nature of software development. Most difficult aspects of the craft revolve around dealing with change. How does one keep software extensible? How do you adapt to new business requirements?

With experience comes the intuition that some kind of changes are more volatile than other changes. For example, it is often safer to add a new function or type to an application than change an existing one.

This is because adding something new means that it is not already strongly connected to the rest of the application. Or at least that’s the assumption. You have yet to decide how the new component interacts with the rest of the application. Usually this is done by a, preferably small, incision in the innards of your software. The first change, the adding, should not break anything. If anything, the small incision should be the only dangerous aspect of the change.

This is as very important concept: adding should not break things! This is so important, I want to give it a name:

The Rule of Additive Changes

Adding something to a well-designed software system should not break existing functionality. Exceptions should be thoroughly documented and communicated.

Systems should always be designed and tought so that the rule of additive changes holds. Failure to do so will lead to confusing surprises in the best cases, and well hidden bugs in worse cases.

Inheritance

Quoting from Wikipedia:

“If S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program”

This relies on subtyping as an additive change: S works at least as good as any T, so it is an extension, an addition. You should therefore design your systems in a way that the Liskov Substition Principle, and therefore the rule of additive changes, both hold: An addition of a new type in a hierarchy cannot break anything.

Whitelists vs. Blacklists

Blacklists will often violate the rule of additive changes. Once you add a new element to the domain, the domain behind the blacklist will change as well, while the domain behind a whitelist will be unaffected. Ultimately, both can be what you want, but usually, the more contained change will break less – and you can still change the whitelist explicitly later!

Note that systems that filter classes from a hierarchy via RTTI or, even more subtle, via ask-interfaces, are blacklists. Those systems can break easily when new types are introduces to a hierarchy. Extra care needs to be taken to make sure the rule of addition holds for these systems.

Introspection and Reflection

Without introspection and reflection, programs cannot know when you are adding a new type or a new function. However, with introspection, they can. Any additive change can also be an incision point. Therefore, you need to be extra careful when designing systems that use introspection: They should not break existing functionality for adding something.

For example, adding a function to enable a specific new functionality is okay. A common case of this would be to adding a function to a controller in a web-framework to add a new action. This will not inferfere with existing functionality, so it is fine.

On the other hand, adding a member to a controller should not disable or change functionality. Adding a special member for “filtering” or some kind of security setting falls into this category. You think you’re merely adding something, but in fact you are modifying. A system that relies on such behavior therefore violates the rule of additive changes. Decorating the member is a much better alternative, as that makes it clear that you are indeed modifying something, which might break existing functionality.

Not all languages or frameworks provide this possibility though. In that case, the only alternative is good communication and documentation!

Refactoring

Many engineers implicitly assume that the rule of additive changes holds. In his book “Working Effectively With Legacy Code”, Micheal Feathers proposes the sprout and wrap techniques to change legacy software. The underlying technique is the same for both: formulating a potentially breaking change as mostly additive, with only a small incision point. In the presence of systems that do not follow the rule of additive changes, such risk minimization does not work at all. For example, adding additional function can break a system that relies heavily on introspection – which goes against all intuition.

Conclusion

This rule is not a new concept. It is something that many programmers have in their head already, but possibly fractured into lots of smaller guidelines. But it is one overarching concept and it needs a name to be accessible as such. For me, that makes things a lot clearer when reasoning about systems at large.

Don’t be too alarmed by the title. Functions are immortal concepts and there’s nothing wrong with a getter method. Except when you write code under the rules of the Object Calisthenics (rule number 9 directly forbids getter and setter methods). Or when you try to adhere to the ideal of encapsulation, a cornerstone of object-oriented programming. Or when your code would really benefit from some other design choices. So, most of the time, basically. Nobody dies if you write a getter method, but you should make a concious decision for it, not just write it out of old habit.

One thing the Object Calisthenics can teach you is the immediate effect of different design choices. The rules are strict enough to place a lot of burden on your programming, so you’ll feel the pain of every trade-off. In most of your day-to-day programming, you also make the decisions, but don’t feel the consequences right away, so you get used to certain patterns (habits) that work well for the moment and might or might not work in the long run. You should have an alternative right at hands for every pattern you use. Otherwise, it’s not a pattern, it’s a trap.

Some alternatives

Here is an incomplete list of common alternatives to common patterns or structures that you might already be aware of:

if-else statement (explicit conditional): You can replace most explicit conditionals with implicit ones. In object-oriented programming, calling polymorphic methods is a common alternative. Instead of writing if and else, you call a method that is overwritten in two different fashions. A polymorphic method call can be seen as an implicit switch-case over the object type.

else statement: In the Object Calisthenics, rule 2 directly forbids the usage of else. A common alternative is an early return in the then-block. This might require you to extract the if-statement to its own method, but that’s probably a good idea anyway.

for-loop: One of the basic building blocks of every higher-level programming language are loops. These explicit iterations are so common that most programmers forget their implicit counterpart. Yeah, I’m talking about recursion here. You can replace every explicit loop by an implicit loop using recursion and vice versa. Your only limit is the size of your stack – if you are bound to one. Recursion is an early brain-teaser in every computer science curriculum, but not part of the average programmer’s toolbox. I’m not sure if that’s a bad thing, but its an alternative nonetheless.

setter method: The first and foremost alternative to a state-altering operation are immutable objects. You can’t alter the state of an immutable, so you have to create a series of edited copies. Syntactic sugar like fluent interfaces fit perfectly in this scenario. You can probably imagine that you’ll need to change the whole code dealing with the immutables, but you’ll be surprised how simple things can be once you let go of mutable state, bad conscience about “wasteful” heap usage and any premature thought about “performance”.

Keep in mind that most alternatives aren’t really “better”, they are just different. There is no silver bullet, every approach has its own advantages and drawbacks, both shortterm and in the long run. Your job as a competent programmer is to choose the right approach for each situation. You should make a deliberate choice and probably document your rationale somewhere (a project-related blog, wiki or issue tracker comes to mind). To be able to make that choice, you need to know about the pros and cons of as much alternatives as you can handle. The two lamest rationales are “I’ve always done it this way” and “I don’t know any other way”.

An alternative for get

In this blog post, you’ll learn one possible alternative to getter methods. It might not be the best or even fitting for your specific task, but it’s worth evaluating. The underlying principle is called “Tell, don’t Ask”. You convert the getter (aka asking the object about some value) to a method that applies a function on the value (aka telling the object to work with the value). But what does “applying” mean and what’s a function?

A function is defined as a conversion of some input into some output, preferably without any side-effects. We might also call it a mapping, because we map every possible input to a certain output. In programming, every method that takes a parameter (or several of them) and returns something (isn’t void) can be viewed as a function as long as the internal state of the method’s object isn’t modified. So you’ve probably programmed a lot of functions already, most of the time without realizing it.

In Java 8 or other modern object-oriented programming languages, the notion of functions are important parts of the toolbox. But you can work with functions in Java since the earliest days, just not as convenient. Let’s talk about an example. I won’t use any code you can look at, so you’ll have to use your imagination for this. So you have a collection of student objects (imagine a group of students standing around). We want to print a list of all these students onto the console. Each student object can say its name and matriculation number if asked by plain old getters. Damn! Somebody already made the design choice for us that these are our duties:

Iterate over all student objects in our collection. (If you don’t want to use a loop for this you know an alternative!)

Ask each student object about its name and matriculation number.

Carry the data over to the console object and tell the console to print both informations.

But because this is only in our imagination, we can go back in imagined time and eliminate the imagined choice for getters. We want to write our student objects without getters, so let’s get rid of them! Instead, each student object knows about their name and matriculation number, but cannot be asked directly. But you can tell the student object to supply these informations to the only (or a specific) method of an object that you give to it. Read the previous sentence again (if you’ve not already done it). That’s the whole trick. Our “function” is an object with only one method that happens to have exactly the parameters that can be provided by the student object. This method might return a formatted string that we can take to the console object or it might use the console itself (this would result in no return value and a side effect, but why not?). We create this function object and tell each student object to use it. We don’t ask the student object for data, we tell it to do work (Tell, don’t Ask).

In this example, the result is the same. But our first approach centers the action around our “main” algorithm by gathering all the data and then acting on it. We don’t feel pain using this approach, but we were forced to use it by the absence of a function-accepting method and the presence of getters on the student objects. Our second approach prepares the action by creating the function object and then delegates the work to the objects holding the data. We were able to use it because of the presence of a function-accepting method on the student objects. The absence of getters in the second approach is a by-product, they simply aren’t necessary anymore. Why write getters that nobody uses?

We can observe the following characteristics: In a “traditional”, imperative style with getters, the data flows (gets asked) and the functionality stays in place. In a Tell, don’t Ask style with functions, the data tends to stay in place while the functionality gets passed around (“flows”).

Weighing the options

This is just one other alternative to the common “imperative getter” style. As stated, it isn’t “better”, just maybe better suited for a particular situation. In my opinion, the “functional operation” style is not straight-forward and doesn’t pay off immediately, but can be very rewarding in the long run. It opens the door to a whole paradigm of writing sourcecode that can reveal inherent or underlying concepts in your solution domain a lot clearer than the imperative style. By eliminating the getter methods, you force this paradigm on your readers and fellow developers. But maybe you don’t really need to get rid of the getters, just reduce their usage to the hard cases.

So the title of this blog post is a bit misleading. Every time you write a getter, you’ve probably considered all alternatives and made the informed decision that a getter method is the best way forward. Every time you want to change that decision afterwards, you can add the function-accepting method right alongside the getters. No need to be pure or exclusive, just use the best of two worlds. Just don’t let the functions die (or never be born) because you “didn’t know about them” or found the style “unfamiliar”. Those are mere temporary problems. And one of them is solved right now. Happy coding!

Many modern programming languages offer a way declare variables without an explicit type if the type can be inferred, either dynamically or statically. Many also allow for variables to be explicitly defined with a type. For example, Scala and C# let you omit the explicit variable type via the var keyword, but both also allow defining variables with explicit types. I’m coming from the C++ world, where “auto” is available for this purpose since the relatively recent C++11. However, people are still debating whether you should actually use it.

Pros

Herb Sutter popularised the almost-always-auto style. He advocates that using more type inference is good because it is roughly equivalent to programming against interfaces instead of implementations. He says that “Overcommitting to explicit types makes code less generic and more interdependent, and therefore more brittle and limited.” However, he also mentions that you might sometimes want to use explicit types.

Now what exactly is overcommiting here? When is the right time to use explicit types?

Cons

Opponents to implicit typing, many of them experienced veterans, often state that they want the actual type visible in the source code. They don’t want to rely on type inference being right. They want the code to explicitly state what’s going on.

At first, I figured that was just conservatism in the face of a new “scary” feature that they did not fully understand. After all, IDEs can usually infer the type on-the-fly and you can hover on a variable to let it show you the type.

For C++, the function signature is a natural boundary where you often insert explicit types, unless you want to commit to the compile time and physical dependency cost that comes with templates. Other languages, such as Groovy, do not have this trade-off and let you skip explicit types almost everywhere. After working with Groovy/Grails for a while, where the dominant style seems to be to omit types whereever possible, it dawned on me that the opponents of implicit typing have a point. Not only does the IDE often fail to show me the inferred type (even though it still works way more often than I would have anticipated), but I also found it harder to follow and modify code that did not mention explicit types. Seemingly contrary to Herb Sutter’s argument, that code felt more brittle than I had liked.

Middle-ground

As usual, the truth seems to be somewhere in the middle. I propose the following rule for when to use explicit types:

Explicit typing for domain-types

Implicit typing everywhere else

Code using types from the problem domain should be as specific as possible. There’s no need for it to be generic – it’s actually counter-productive, as otherwise the code model would be inconsistent with model of the problem domain. This is also the most important aspect to grok when reading code, so it should be explicit. The type is as important as the action on it.

On the other hand, for pure-fabrication types that do not respresent a concept in the domain, the action is important, while the type is merely a means to achieve this action. Typically, most of the elements from a language’s standard library fall into this category. All your containers, iterators, callables. Their types are merely implementation details: an associative container could be an array, or a hash-map or a tree structure. Exchanging it rarely changes the meaning of the code in the problem domain – it just changes its performance characteristics.

Containers will occasionally contain domain-types in their type. What do you do about those? I think they belong in the “everywhere else” catergory, but you should be take extra care to name the contained type when working with it – for example when declaring the variable of the for-each loop on it, or when inserting something into it. This way, the “collection of domain-type” aspect will become clear, but the specific container implementation will stay implicit – like it should.

In the previous part, I’ve shown my guidelines for setting up compilation units. When writing simple application code with C++11, either classes or free-functions should be your main building blocks. Therefor, in this part, I will focus on what to look out for when writing class declarations.

While templates can be very useful, they do not scale well as the code base gets larger. Metaprogramming or other niche styles have their places, too, but I like to look at those as a means to create language extensions rather than principal implementation tools.

Avoid inline implementations

…especially in header files. It can be tempting to write classes solely in the header file. In fact, it has almost become a sign of quality for parts of C++ code to be header only. But this scales badly in most cases, and evolving such a code-base will result in a dramatic explosion of compile times. Always splitting classes into a declaration and definition acts as a first-level compile- firewall and dependency-breaker. Users of your class no longer need to worry about changes in the implementation of the member functions of that class. Note that those changes are often indirect: a change only affects a class that is used in the implementation of your class’ member functions. By splitting the declaration and definition, users of your class do not have to be recompiled.

But why stop at the compiler? The same argument holds for programmers. If you start to split interface and implementation on this level, you automatically provide ‘reader-firewalls’ as well. By just providing a clean header file, you are giving readers sort of a manual for your class. No need to look at the implementation at all, if the interface is well-defined.

Inline code definition is also the main reason against excessive use of templates. Yes, they grant a lot of flexibility, but you pay a hefty price which needs to be justified by an enormous reduction of complexity elsewhere. In general, templates are a bit too powerful for their own good, which is why they need extra moderation.

Always declare implicit functions

Implicitly declared functions seem comfortable, but they have a few implications that are hard to understand. First of, if an implicit function gets generated for your class, it will be generated as inline. This means that the implementation becomes a dependency to all users of your class. This can have very subtle effects such as this:

On the surface, it looks like there should be no dependency (other than the name) on MyEntry when including this header. But there is!
The destructor is not declared so it will get generated – as inline. Because deletion of a vector requires the held type to be complete, any place that needs to be able to destruct a MyEntryManager also needs to know how to destruct MyEntry, which is not intended at all. Remember there’s a total of six functions that can be implicitly generated! Because of that, there are analogous problems for copy-construction, assignment, move-construction and move-assignment.

To avoid these problems, either delete the function explicitly in the header, default it in the implementation file, or actually implement it. You rarely need to do the latter, so I advise to default all the ones you need, and delete the rest:

This has another nice side effect because the vector-template gets instantiated into that object file and does not “bloat” all use-sites.

Exactly one public function and one private data section per class

..starting with the public section. This is where you address the next programmer that has to read your class. And it should be the only place for him to look.

I avoid private member functions because they cannot be tested easily and can add hidden compile-time dependencies to a project. Why should a user of your class recompile if you change an implementation detail? For small and trivial implementation helpers, the unnamed-namespace in the implementation file is a much better place. If those helpers become larger or more complex, it is a better idea to implement them in a collaborating class, which can be tested and reused.

Protected member functions split your interface to two parts, one exclusively for derived classes and one for everyone (including derived classes). This is very rarely needed, and in almost all of those cases, a separate interface will scale better (although it is slightly harder to implement).

Either an interface or an implementation

So far, I have left inheritance out of the picture and only talked about concrete classes. Inheritance is actually rarely needed, composition often suffices. But if it is needed, make sure that a class is either concrete and final (implementations), or has a complete and minimal set of pure-virtual member functions (interfaces). This will result in shallow hierarchies and easily understood interfaces. Remember that inheritance is not a tool for sharing code from the classes you implement, but for the code using those classes – i.e. where the Liskov Substition Principle holds.

Now it gets really easy to implement new classes in the hierarchy: Just implement all the functions in the interface. No more questioning whether to leave the default behaviour or override. You will also automatically tend towards clearer separation of components – things that need to be polymorphic move to the interface, other functionality merely uses it.

This pattern is useful even when polymorphy is not needed. Such small interfaces devoid of any implementation detail can act as another compiler firewall. Collaborators can work with just the interface and do not have to be recompiled when the implementation changes. Also, the interface can be implemented for mock or fake objects in testing.

Conclusion

This concludes the second part of the series. I originally intended it to be about how to write a whole class, but that would have been too much to digest for one post. I am well aware that some of these guidelines can stir quite the controversy in the C++ community. For example, declaring the implicit functions seems to be in conflict with the recently popular rule of zero. Scott Meyers had similar concerns, but does not quite touch the inline aspect.

For me personally, these guidelines have helped tremendously, especially when scaling to bigger code-bases. But as before, I am curious what others are thinking about this!