A young co-worker who was studying OO asked me why every object is passed by reference, unlike primitive types or structs. It is a common characteristic of languages such as Java and C#.

I couldn't find a good answer for him.

What are the motivations for this design decision?
Were developers of these languages tired of having to create pointers and typedefs every time?

Are you asking why Java and C# have you pass parameters by reference instead of by value, or by reference instead of by pointer?
–
robert Mar 11 '11 at 15:29

@Robert: is there, at a high level, any difference between "by reference" and "by pointer"? Do you think I should change the title to something like "Why are objects always references?"?
–
Gustavo Cardoso Mar 11 '11 at 16:09

@Anto: A Java reference is in all ways identical to a properly used C pointer (properly used: not type-cast, not set to invalid memory, not set by a literal).
–
Zan Lynx Mar 11 '11 at 20:24


Also, to be really pedantic, the title is incorrect (at least as far as .NET is concerned). Objects are NOT passed by reference; references are passed by value. When you pass an object to a method, the reference value is copied to a new reference within the method body. I think it's a shame that "objects are passed by reference" has entered the list of common programmer quotes when it is incorrect and leads to a poorer understanding of references for new programmers starting out.
–
SecretDeveloper Mar 11 '11 at 21:04

@S.Lott: No, it makes sense in OO terms to pass by reference because, semantically, you don't want to make copies of objects. You want to pass the object rather than a copy of it. If you're passing by value, it breaks the OO metaphor somewhat because you've got all these clones of objects being generated all over the place that don't make sense at a higher level.
–
intuited Mar 11 '11 at 20:23

@Gustavo: I think we are arguing the same point. You mention the semantics of OOP and refer to the metaphor of OOP as additional reasons to my own. It seems to me that the creators of OOP languages made them the way they did to "minimize memory consumption" and "save on CPU time".
–
Tim Mar 14 '11 at 11:54

We've eliminated the redundant copy, but now we've introduced another problem: we've created an object on the heap that won't get automatically destroyed. We have to deal with it ourselves:

MyClass* someObj = getNewObject();
// ... use someObj ...
delete someObj;

Knowing who is responsible for deleting an object allocated in this way is something that can only be communicated by comments or by convention. It easily leads to memory leaks.

Lots of workarounds have been suggested to solve these two issues: return value optimisation (in which the compiler is smart enough not to create the redundant copy when returning by value), passing a reference into the method (so the function fills in an existing object rather than creating a new one), and smart pointers (so that the question of ownership is moot).

The Java/C# creators realised that always returning object by reference was a better solution, especially if the language supported it natively. It ties into a lot of other features the languages have, such as garbage collection, etc.

Return-by-value is bad enough, but pass-by-value is even worse when it comes to objects, and I think that was the real problem they were trying to avoid.
–
Mason Wheeler Mar 11 '11 at 15:48

For sure you have a valid point. But the OO design problem that @Mason pointed out was the real motivation for the change. There was no point in keeping the distinction between reference and value when you only ever want to use the reference.
–
Gustavo Cardoso Mar 11 '11 at 16:21

Many other answers have good info. I'd like to add one important point about cloning that's only been partially addressed.

Using references is smart. Copying things is dangerous.

As others have said, in Java there is no natural "clone". This is not just a missing feature. You never want to just willy-nilly* copy (whether shallow or deep) every property in an object. What if that property was a database connection? You can't just "clone" a database connection any more than you can clone a human. Initialization exists for a reason.

Deep copies are a problem of their own - how deep do you really go? You definitely couldn't copy anything that is static (including any Class objects).

So for the same reason why there is no natural clone, objects that are passed as copies would create insanity. Even if you could "clone" a DB connection - how would you now ensure that it is closed?

* See the comments - By this "never" statement, I mean an auto-clone that clones every property. Java didn't provide one, and it's probably not a good idea for you as a user of the language to create your own, for the reasons listed here. Cloning only non-transient fields would be a start, but even then you'd need to be diligent about defining transient where appropriate.

I have trouble understanding the jump from good objections to cloning in certain conditions to the statement that it is never needed. And I have encountered situations where an exact duplicate was needed, where no static functions were involved, and no I/O or open connections could be at issue... I understand the risks of cloning, but I can't see the blanket "never".
–
Inca Mar 11 '11 at 18:50


@Inca - You may be misunderstanding me. Intentionally implemented clone is fine. By "willy-nilly" I mean copying all properties without thinking about it -- without purposeful intent. The Java language designers forced this intent by requiring user-created implementation of clone.
–
NickC Mar 11 '11 at 19:11

Using references to immutable objects is smart. Making simple values like Date mutable, and then creating multiple references to them isn't.
–
kevin cline Jul 3 '12 at 0:38

@NickC: The main reason "cloning things willy-nilly" is dangerous is that languages/frameworks like Java and .NET don't have any means of indicating declaratively whether a reference encapsulates mutable state, identity, both, or neither. If a field contains an object reference that encapsulates mutable state but not identity, cloning the object requires that the object holding the state be duplicated, and a reference to that duplicate stored in the field. If the reference encapsulates identity but not mutable state, the field in the copy must refer to the same object as in the original.
–
supercat Aug 30 '12 at 20:44

Objects are always referenced in Java. They are never passed around themselves.

One advantage is that this simplifies the language. A C++ object can be represented as a value or a reference, creating a need to use two different operators to access a member: . and ->. (There are reasons why this can't be consolidated; for example, smart pointers are values that are references, and have to keep those distinct.) Java only needs ..

Another reason is that polymorphism has to be done by reference, not value; an object treated by value is just there, and has a fixed type. It's possible to screw this up in C++.

Also, Java can change what assignment and copying mean by default. In C++, assignment is a (more or less deep) member-wise copy, while in Java it's a simple reference copy, with .clone() and the like for when you actually need a copy.

Sometimes it becomes really ugly when you have to use '(*object)->'
–
Gustavo Cardoso Mar 11 '11 at 15:56


It's worth noting that C++ distinguishes between pointers, references and values. SomeClass* is a pointer to an object. SomeClass& is a reference to an object. SomeClass is a value type.
–
Ant Mar 11 '11 at 15:59

I have already asked @Robert this on the initial question, but I'll ask it here too: isn't the difference between * and & in C++ just a low-level technical thing? At a high level, aren't they semantically the same?
–
Gustavo Cardoso Mar 11 '11 at 16:17


@Gustavo Cardoso: The difference is semantic; on a low technical level they're generally identical. A pointer points to an object, or is NULL (a defined bad value). Unless const, its value can be changed to point to other objects. A reference is another name for an object, cannot be NULL, and cannot be reseated. It's generally implemented by simple use of pointers, but that's an implementation detail.
–
David Thornley Mar 11 '11 at 16:22

Your initial statement about C# objects being passed by reference is not correct. In C#, objects are reference types, but by default they are passed by value just like value types. In the case of a reference type, the "value" that is being copied as a pass-by-value method parameter is the reference itself, so changes to properties inside a method will be reflected outside the method scope.

However, if you were to re-assign the parameter variable itself inside a method, you will see that this change is not reflected outside the method scope. In contrast, if you actually pass a parameter by reference using the ref keyword, this behavior works as expected.

The designers of Java and similar languages wanted to apply the "everything is an object" concept. And passing data by reference is very quick and doesn't consume much memory.

Additional extended boring comment

Although those languages (Java, Delphi, C#, VB.NET, Vala, Scala, PHP) use object references, the truth is that object references are pointers to objects in disguise. The null value, the memory allocation, the copying of a reference without copying the entire data of an object: all of these are the behaviours of object pointers, not of plain objects!

In Object Pascal (not Delphi) and C++ (not Java, not C#), an object can be declared as a statically allocated variable, and also as a dynamically allocated variable through the use of a pointer (an "object reference" without the syntactic sugar). Each case uses its own syntax, and there is no way to get confused as in Java "and friends". In those languages, an object can be passed either by value or by reference.

The programmer knows when pointer syntax is required and when it is not, but in Java and similar languages this is hidden, which is confusing.

Before Java existed or became mainstream, many programmers learnt O.O. in C++ without pointers, passing by value or by reference as required. When they switched from learning to business applications, they commonly used object pointers. The Qt library is a good example of that.

When I learnt Java, I tried to follow the "everything is an object" concept, but got confused while coding. Eventually I said "OK, these are objects dynamically allocated through a pointer, with the syntax of statically allocated objects", and I had no trouble coding again.

Because otherwise, the function would have to be able to automatically create a (necessarily deep) copy of any kind of object that is passed to it, and usually it can't guess how to make one. So you would have to define a copy/clone implementation for all of your objects/classes.

Copying primitive types is trivial, it usually translates to one machine instruction.

Copying objects is not trivial, the object can contain members that are objects themselves. Copying objects is expensive in CPU time and memory. There are even multiple ways of copying an object depending on the context.

Passing objects by reference is cheap and it also becomes handy when you want to share/update the object information between multiple clients of the object.

Complex data structures (especially those that are recursive) require pointers. Passing objects by reference is just a safer way of passing pointers.

Java and C# do take control over low-level memory away from you. The "heap" where the objects you create reside lives its own life; for instance, the garbage collector reaps objects whenever it sees fit.

Since there is a separate layer of indirection between your program and that heap, the two ways to refer to an object, by value and by pointer (as in C++), become indistinguishable: you always refer to objects "by pointer" to somewhere in the heap. That's why such a design approach makes reference semantics the default for assignment and parameter passing: Java, C#, Ruby, et cetera.

The above only concerns imperative languages. In the languages mentioned above the control over the memory is passed to the runtime, but the language design also says "hey, but actually, there is the memory, and there are the objects, and they do occupy the memory". Functional languages abstract even further, by excluding the concept of "memory" from their definition. That's why pass-by-reference doesn't necessarily apply to all of the languages where you don't control the low-level memory.

Because Java was designed as a better C++, and C# was designed as a better Java, and the developers of these languages were tired of the fundamentally broken C++ object model, in which objects are value types.

Two of the three fundamental principles of object-oriented programming are inheritance and polymorphism, and treating objects as value types instead of reference types wreaks havoc with both. When you pass an object to a function as a parameter, the compiler needs to know how many bytes to pass. When your object is a reference type, the answer is simple: the size of a pointer, same for all objects. But when your object is a value type, it has to pass the actual size of the value. Since a derived class can add new fields, this means sizeof(derived) != sizeof(base), and polymorphism goes out the window.

The output of this program is not what it would be for an equivalent program in any sane OO language, because you can't pass a derived object by value to a function expecting a base object, so the compiler creates a hidden copy constructor and passes a copy of the Parent part of the Child object, instead of passing the Child object like you told it to do. Hidden semantic gotchas like this are why passing objects by value should be avoided in C++ and is not possible at all in almost every other OO language.

Very good point. However, I focused on return problems as working around them takes quite a bit of effort; this program can be fixed with the addition of a single ampersand: void foo(Parent& a)
–
Ant Mar 11 '11 at 15:54

The answer is in the name (well, almost anyway). A reference (like an address) just refers to something else; a value is another copy of something else. I'm sure someone has probably mentioned something to this effect, but there will be circumstances in which one and not the other is suitable (memory safety vs. memory efficiency).
It's all about managing the memory, memory, memory...... MEMORY!
:D

In OO Programming, you may create a larger Derived class from a Base one, and then pass it to functions expecting a Base one. Pretty trivial eh ?

Except that the size of a function's argument is fixed and determined at compile time. You can argue all you want; executable code is like this, and languages need to be executed at some point or another (purely interpreted languages are not limited by this...)

Now, there is one piece of data that is well-defined on a computer: the address of a memory cell, usually expressed as one or two "words". It's visible either as pointers or references in programming languages.

So in order to pass objects of arbitrary length, the simplest thing to do is to pass a pointer/reference to the object.

This is a technical limitation of OO Programming.

But since for large types, you generally prefer passing references anyway to avoid copying, it's not generally considered a major blow :)

There is one important consequence, though: in Java or C#, when passing an object to a method, you have no idea whether your object will be modified by the method or not. That makes debugging and parallelization that much harder, and this is the issue functional languages and referential transparency are trying to address --> copying is not that bad (when it makes sense).