In class today, my professor was discussing how to structure a class. The course primarily uses Java and I have more Java experience than the teacher (he comes from a C++ background), so I mentioned that in Java one should favor immutability. My professor asked me to justify my answer, and I gave the reasons that I've heard from the Java community:

Safety (especially with threading)

Reduced object count

Allows certain optimizations (especially for garbage collector)

The professor challenged my statement by saying that he'd like to see some statistical measurement of these benefits. I cited a wealth of anecdotal evidence, but even as I did so, I realized he was right: as far as I know, there hasn't been an empirical study of whether immutability actually provides the benefits it promises in real-world code. I know it does from experience, but others' experiences may differ.

So, my question is, have there been any statistical studies done on the effects of immutability in real-world code?

I don't have any answers here, but I feel I must point out that this is not always true: there are some types of classes this works for, and some for which it very much doesn't. It takes no stretch of imagination to realize that for some types of classes, immutability will actually increase the object count. Not to mention that sometimes it just wouldn't make sense. That said, the other benefits would still remain.
–
AviD Sep 29 '09 at 15:27

Reduced object count? Why would that be? I'd expect the opposite, since on immutable datastructures every operation creates a new object.
–
sepp2k Sep 29 '09 at 15:29

Interesting question! I agree that from my experience, immutability is extremely useful, though all of my evidence is also anecdotal. Another benefit that you didn't mention is that code that uses immutable objects is generally easier to reason about.
–
Laurence Gonsalves Sep 29 '09 at 15:30

sepp2k: I assume he's referring to the fact that you don't need to make defensive copies.
–
Laurence Gonsalves Sep 29 '09 at 15:31

@sepp2k, the argument for reduced object count is that there will be less defensive copying. Consider String: if it were mutable, every method taking one would have to make a copy.
–
Yishai Sep 29 '09 at 15:32

9 Answers

I would point to Item 15 in Effective Java. The value of immutability is in the design (and it isn't always appropriate - it is just a good first approximation) and design preferences are rarely argued from a statistical point of view, but we have seen mutable objects (Calendar, Date) that have gone really bad, and serious replacements (JodaTime, JSR-310) have opted for immutability.

+1: awesome link :) And I completely agree: I've never seen anyone say "statistics prove this design is superior to this other design", but I have seen a lot of "such and such is a time-tested best practice in most applications".
–
Juliet Oct 1 '09 at 15:57

The biggest advantage of immutability in Java, in my opinion, is simplicity. It becomes much simpler to reason about the state of an object, if that state cannot change. This is of course even more important in a multi-threaded environment, but even in simple, linear single-threaded programs it can make things far easier to understand.
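The answer's point about simplicity can be sketched with a minimal immutable class (the Point class here is hypothetical, not taken from the thread): all fields are final, there are no setters, and "modifying" operations return new instances.

```java
// A minimal immutable class: final class, private final fields, no setters.
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

    // "Mutation" returns a new instance; the original is untouched,
    // so any code holding a reference to it can never be surprised.
    public Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = p.translate(3, 4);
        System.out.println(p.x() + "," + p.y()); // original unchanged: 1,2
        System.out.println(q.x() + "," + q.y()); // new instance: 4,6
    }
}
```

Because p can never change after construction, reasoning about any code that holds a reference to it is purely local: there is no "might be set to something else later" case to consider.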

For what it's worth, the ability of compilers to optimize immutable objects is well-documented. Off the top of my head:

The Haskell compiler performs deforestation (also called short-cut fusion), where it transforms the expression map f . map g into map (f . g). Since Haskell functions are pure and its data immutable, these expressions are guaranteed to produce equivalent output, but the fused version traverses the list once instead of twice and never allocates the intermediate list.

Common subexpression elimination, where we convert x = foo(12); y = foo(12) into temp = foo(12); x = temp; y = temp;, is only possible if the compiler can guarantee that foo is a pure function. To my knowledge, the D compiler can perform substitutions like this using its pure and immutable keywords. If I remember correctly, some C and C++ compilers (GCC, for instance, with __attribute__((pure))) will aggressively optimize calls to functions marked pure.
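To make the purity requirement concrete, here is a hand-performed version of that rewrite sketched in Java (the foo and impureFoo functions are hypothetical): hoisting the repeated call into a temporary is only a safe transformation when the function has no side effects.

```java
public final class CsePure {
    // Pure: the result depends only on the argument,
    // so foo(12) always equals foo(12).
    static int foo(int n) { return n * n + 1; }

    // Impure: each call observes and updates mutable state,
    // so replacing two calls with one changes behavior.
    static int counter = 0;
    static int impureFoo(int n) { return n + (counter++); }

    public static void main(String[] args) {
        // Safe CSE by hand: compute once, reuse the temporary.
        int temp = foo(12);
        int x = temp, y = temp;     // same as x = foo(12); y = foo(12);

        // Unsafe for impureFoo: consecutive calls return different values.
        int a = impureFoo(12);
        int b = impureFoo(12);

        System.out.println(x == y); // true
        System.out.println(a == b); // false
    }
}
```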

So long as we don't have mutable state, a sufficiently smart compiler can execute linear blocks of code across multiple threads with a guarantee that we won't corrupt the state of variables seen by another thread.
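Java itself offers a small taste of this through parallel streams: assuming the input data is immutable, elements can be farmed out to multiple worker threads with no risk of one task corrupting another's view of the data. A minimal sketch:

```java
import java.util.List;

public final class ParallelImmutable {
    public static void main(String[] args) {
        // List.of(...) produces an immutable list; no thread can modify it.
        List<Integer> nums = List.of(1, 2, 3, 4, 5, 6, 7, 8);

        // Each element is mapped independently, possibly on different
        // worker threads; with no shared mutable state, there is
        // nothing to lock and nothing to corrupt.
        int sumOfSquares = nums.parallelStream()
                               .mapToInt(n -> n * n)
                               .sum();

        System.out.println(sumOfSquares); // 204
    }
}
```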

Regarding concurrency, the pitfalls of concurrency using mutable state are well-documented and don't need to be restated.

Sure, this is all anecdotal evidence, but that's pretty much the best you'll get. The immutable vs mutable debate is largely a pissing match, and you are not going to find a paper making a sweeping generalization like "functional programming is superior to imperative programming".

At most, you'll probably find that you can summarize the benefits of immutable vs mutable in a set of best practices rather than as codified studies and statistics. For example, mutable state is the enemy of multithreaded programming; on the other hand, mutable queues and arrays are often easier to write and more efficient in practice than their immutable variants.

It takes practice, but eventually you learn to use the right tool for the job, rather than shoehorning your favorite pet paradigm into a project.

I think your professor is being overly stubborn (probably deliberately, to push you to a fuller understanding). Really, the benefits of immutability are not so much what the compiler can do with optimisations, but that it's much easier for us humans to read and understand. A variable that is guaranteed to be set when the object is created, and guaranteed not to change afterwards, is much easier to grok and reason about than one which holds this value now but might be set to some other value later.

This is especially true with threading, in that you don't need to worry about processor caches and monitors and all that boilerplate that comes with avoiding concurrent modifications, when the language guarantees that no such modification can possibly occur.
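A small sketch of that guarantee (the Config class here is hypothetical): in Java, an object whose fields are all final can be handed to any number of reader threads after construction without locks, volatile, or defensive copies.

```java
public final class SharedConfig {
    // All fields final: after construction, every thread is guaranteed
    // to see these values without any synchronization.
    static final class Config {
        final String host;
        final int port;
        Config(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Config config = new Config("example.org", 443);

        // Many threads read the same instance concurrently;
        // no monitors, no defensive copies, no possible data race.
        Thread[] readers = new Thread[4];
        for (int i = 0; i < readers.length; i++) {
            readers[i] = new Thread(() ->
                System.out.println(config.host + ":" + config.port));
            readers[i].start();
        }
        for (Thread t : readers) t.join();
    }
}
```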

And once you express the benefits of immutability as "the code is easier to follow", it feels a bit sillier to ask for empirical measurements of productivity increases vis-a-vis "easier-to-followness".

On the other hand, the compiler and HotSpot can probably perform certain optimisations based on knowing that a value can never change; like you, I have a feeling that this takes place and is a good thing, but I'm not sure of the details. It's a lot more likely that there will be empirical data for the types of optimisation that can occur, and how much faster the resulting code is.

What would you objectively measure? GC and object count could be measured with mutable/immutable versions of the same program (although how typical that would be would be subjective, so this is a pretty weak argument). I can't imagine how you could measure the removal of threading bugs, except maybe anecdotally by comparison with a real world example of a production application plagued by intermittent issues fixed by adding immutability.

Consider a loop that feeds rows to a stats object s, calling s.count() for each row, and which should finally print something like "Processed 536.21 rows/s". How do you plan to implement count() with an immutable object? Even if you use an immutable value object for the counter itself, s can't be immutable, since it would have to replace the counter object inside of itself. The only way out would be:

s = s.count();

which means copying the state of s on every iteration of the loop. While this can be done, it surely isn't as efficient as incrementing an internal counter.

Moreover, most people would fail to use this API correctly, because they would expect count() to modify the state of the object instead of returning a new one. So in this case, immutability would create more bugs.
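Since the answer's original code sample is not shown in this excerpt, here is a hedged reconstruction of the two designs it contrasts (the Stats classes are hypothetical): a mutable counter incremented in place, versus an immutable one whose result must be reassigned on every call.

```java
public final class Counters {
    // Mutable design: count() increments internal state in place.
    static final class MutableStats {
        private int count;
        void count() { count++; }
        int total() { return count; }
    }

    // Immutable design: count() allocates and returns a new object;
    // the caller must remember to keep the result (s = s.count()).
    static final class ImmutableStats {
        private final int count;
        ImmutableStats(int count) { this.count = count; }
        ImmutableStats count() { return new ImmutableStats(count + 1); }
        int total() { return count; }
    }

    public static void main(String[] args) {
        MutableStats m = new MutableStats();
        for (int i = 0; i < 3; i++) m.count();

        ImmutableStats s = new ImmutableStats(0);
        for (int i = 0; i < 3; i++) s = s.count(); // easy to forget the "s ="

        System.out.println(m.total()); // 3
        System.out.println(s.total()); // 3
    }
}
```

The immutable version allocates one object per row processed, which is the efficiency objection the answer is making; whether that cost matters in practice is exactly the kind of question the original poster wants measured.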

Most of the points you raised could be solved by proper documentation of the Stats class.
–
Kevlar Sep 29 '09 at 15:55

I certainly wouldn't expect a method called "count" to modify the state of the object it was invoked on. I'd expect it to count something and return the result. If you're going to have side-effects, use a name that makes it obvious. eg: "increment" would probably be a better choice in this case.
–
Laurence Gonsalves Sep 29 '09 at 16:11

@Aaron: To back up Kevlar's point, your example doesn't seem any more confusing than the way string.Replace(...) returns a new object rather than mutating the string in place. Moreover, functional programming style discourages the use of variables, so you probably wouldn't see "s = s.count()" too often. It's much more idiomatic to use fold_left: let res = List.fold_left (fun s x -> s.process(x)) processor items; res.count()
–
Juliet Sep 29 '09 at 16:31

I needed a simple example. As for functional programming: Most people can't reprogram their brain to get used to it. So the question remains how useful immutables can/should be in an imperative language. My opinion: Immutables are nice to solve a couple of cases but for others, they are more problem than solution.
–
Aaron Digulla Sep 29 '09 at 16:39

As other comments have claimed, it would be very, very hard to collect statistics on the merits of immutable objects, because it would be virtually impossible to find control cases - pairs of software applications which are alike in every way, except that one uses immutable objects and the other does not. (In nearly every case, I would claim that one version of that software was written some time after the other, and learned numerous lessons from the first, and so improvements in performance will have many causes.) Any experienced programmer who thinks about this for a moment ought to realize this. I think your professor is trying to deflect your suggestion.

Meanwhile, it is very easy to make cogent arguments in favor of immutability, at least in Java, and probably in C# and other OO languages. As Yishai states, Effective Java makes this argument well. So does the copy of Java Concurrency in Practice sitting on my bookshelf.

Immutable objects allow code which wants to share an object's value to do so by sharing a reference. Mutable objects, by contrast, allow code which wants to share an object's identity to do so by sharing a reference. Both kinds of sharing are essential in most applications. If one doesn't have immutable objects available, it's possible to share values by copying them into either new objects or objects supplied by the intended recipient of those values. Getting by without mutable objects is much harder. One could somewhat "fake" mutable objects by saying stateOfUniverse = stateOfUniverse.withSomeChange(...), but that requires that nothing else modify stateOfUniverse while its withSomeChange method is running [precluding any sort of multi-threading]. Further, if one were e.g. trying to track a fleet of trucks, and part of the code was interested in one particular truck, that code would have to look the truck up in a table of trucks any time the truck might have changed.

A better approach is to subdivide the universe into entities and values. Entities would have changeable characteristics but an immutable identity, so a storage location of e.g. type Truck could continue to identify the same truck even as the truck itself changes position, loads and unloads cargo, etc. Values would generally not have a particular identity, but would have immutable characteristics. A Truck might store its location as a value of type WorldCoordinate. A WorldCoordinate that represents 45.6789012N 98.7654321W would continue to do so as long as any reference to it exists; if a truck at that location moved slightly north, it would create a new WorldCoordinate to represent 45.6789013N 98.7654321W, abandon the old one, and store a reference to the new one.
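That entity/value split can be sketched in Java (the class and method names are illustrative, not taken from the answer): the Truck keeps a stable identity while swapping in immutable WorldCoordinate values.

```java
public final class Fleet {
    // Value: immutable characteristics, no meaningful identity.
    static final class WorldCoordinate {
        final double lat;
        final double lon;
        WorldCoordinate(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Entity: a stable identity with mutable state; "moving" means
    // abandoning one immutable value object and referencing a new one.
    static final class Truck {
        private WorldCoordinate position;
        Truck(WorldCoordinate start) { this.position = start; }
        WorldCoordinate position() { return position; }
        void moveTo(WorldCoordinate dest) { this.position = dest; }
    }

    public static void main(String[] args) {
        Truck truck = new Truck(new WorldCoordinate(45.6789012, -98.7654321));
        WorldCoordinate before = truck.position();

        // The truck (identity) persists; its position (value) is replaced.
        truck.moveTo(new WorldCoordinate(45.6789013, -98.7654321));

        System.out.println(before.lat);           // old value object unchanged
        System.out.println(truck.position().lat); // new position
    }
}
```

Any code holding a Truck reference always sees the truck's current state, while any code holding a WorldCoordinate can rely on it never changing, which is exactly the division of labor the answer describes.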

It is generally easiest to reason about code when everything encapsulates either an immutable value or an immutable identity, and when the things which are supposed to have an immutable identity are mutable. If one didn't want to use any mutable objects outside a variable stateOfUniverse, updating a truck's position would require something like:
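The answer breaks off at this point. A plausible completion, sketched under the assumption of a withTruckPosition-style method (hypothetical, inferred from the surrounding argument), is an update that must rebuild and replace the entire world snapshot just to move one truck:

```java
import java.util.HashMap;
import java.util.Map;

public final class Universe {
    // Immutable snapshot of the whole world: truck id -> position string.
    static final class StateOfUniverse {
        private final Map<String, String> truckPositions;

        StateOfUniverse(Map<String, String> positions) {
            this.truckPositions = Map.copyOf(positions); // defensive, immutable
        }

        // Every update copies the table and returns a brand-new snapshot;
        // the old universe remains valid for anyone still holding it.
        StateOfUniverse withTruckPosition(String truckId, String position) {
            Map<String, String> copy = new HashMap<>(truckPositions);
            copy.put(truckId, position);
            return new StateOfUniverse(copy);
        }

        String positionOf(String truckId) {
            return truckPositions.get(truckId);
        }
    }

    public static void main(String[] args) {
        StateOfUniverse stateOfUniverse =
            new StateOfUniverse(Map.of("truck-1", "45.6789012N 98.7654321W"));

        // Updating one truck's position means replacing the whole universe.
        stateOfUniverse =
            stateOfUniverse.withTruckPosition("truck-1", "45.6789013N 98.7654321W");

        System.out.println(stateOfUniverse.positionOf("truck-1"));
    }
}
```

This illustrates the answer's complaint: without mutable entities, every part of the program interested in one truck must route every update through, and re-read from, the single stateOfUniverse reference.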