Move Semantics in C++11, Part 1: A New Way of Thinking About Objects

Not every resource transfer is a copy operation. In many programming tasks, the resource only moves from one object to another, emptying the source object in the process. C++11 formalizes such transfers as move semantics, a new paradigm that makes code more efficient and models real-world situations more accurately, as Danny Kalev explains in this two-part series.

Copying objects is an operation that takes a source and a target, yielding two independent objects with identical states. However, many real-life entities don't behave like that—you can move them, but not copy them. For example, when you relocate to a new home, you just end up having a new home after the relocation—not two homes with identical residents and furniture. (Even if you own multiple houses, you can only be in one of them at a time. Moving to a new home doesn't clone you.) In this series, I'll introduce you to move semantics and discuss the formal definitions of copying and moving objects in C++.

The Semantics of Copying

Graphical user interface (GUI) operating systems let you select objects and cut or copy them to a new destination. Cutting moves the original object to a new destination, whereas copying creates a new object, leaving the source object in its original place. This is the classic difference between moving and copying operations. Let's look at other file-oriented operations to pinpoint the differences between move and copy operations.

Suppose you're downloading an MP3 clip from iTunes. At the end of the process you have a local copy of the clip, while the iTunes server still retains its original file. These two files have identical and independent states. Therefore, downloading a file from iTunes is a copy operation. The state of an object is defined as the set of its non-static data members' values. In other words, an object's state is its value. Among other things, an object's state indicates which resources it owns. For example, a string object that allocates memory has a non-null pointer member and a counter that keeps track of the buffer's size. Similarly, an MP3 file's state is the set of values of its data bytes.

For this discussion, it's crucial to understand what it means for two objects to have both identical and independent states:

Identical states. Two objects o1 and o2 have identical states if each non-static data member in o1 has the same value as does its corresponding data member in o2.

Independent states. Two identical objects have independent states if changing the state of one object doesn't affect the state of the other object. For example, if iTunes deleted or altered its file after you downloaded it, your private copy would remain intact, and vice versa.

Let's apply the state independence principle to two strings:

char s1[]="abc";
char s2[]="abc";

The states of s1 and s2 are identical. Furthermore, modifying the state of s1 doesn't affect the state of s2. Hence, the strings have independent states, as the following listing shows:

s1[0]='d';
cout<<s1<<'\t'<<s2<<endl;

The output of the statement above is as follows:

dbc abc

In certain cases, you can have two or more objects whose states are identical but not independent. For example, the smart pointer std::shared_ptr shares its resource among all of its instances. Changing the resource through one of the instances affects every other instance automatically. Therefore, what appears to be a copy operation of shared_ptr objects isn't a pure copy operation, because the resulting objects don't have independent states.
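
The following sketch illustrates states that are identical but not independent. It assumes a std::shared_ptr that owns a std::string; the function name shared_state_demo and the payload "abc" are hypothetical, chosen only for illustration:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical helper: returns the string seen through p2 after the
// shared resource has been modified through p1.
std::string shared_state_demo()
{
    std::shared_ptr<std::string> p1 = std::make_shared<std::string>("abc");
    std::shared_ptr<std::string> p2 = p1; // looks like a copy, but both own the same string

    assert(p1.use_count() == 2);          // two owners, one resource

    (*p1)[0] = 'd';                       // modify the resource through p1
    return *p2;                           // p2 observes the change
}
```

Calling shared_state_demo() returns "dbc" rather than "abc", because p1 and p2 never had independent states.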

Now that we've defined pure copy semantics, let's look at another real-world scenario. Suppose you're visiting a gallery of paintings, and you're so impressed that you decide to purchase one of the paintings for your living room. Is purchasing an original painting a copy operation? No. After your purchase, that object is in the same state as before the purchase, although it has moved from the gallery to your home. Let's try to imagine a scenario in which purchasing a painting would count as copying. Instead of selling you the original painting, suppose the gallery offered to sell you a replica, made on demand from the original painting. In that case, the purchasing operation would indeed be a copy operation, as it would generate a new object with the same content, while leaving the original painting on the gallery wall.

TIP

Another move operation was involved in that transaction: money moved from your bank account to the artist's.

Back to strings. A trivial example of moving strings might look like this:
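
A minimal sketch of such a move, assuming C++11's std::string and the library function std::move (neither has been introduced in the text so far). The names s1 and s2 follow the discussion; move_demo is a hypothetical wrapper:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: moves s1 into a newly created s2 and returns s2.
std::string move_demo()
{
    std::string s1 = "abc";
    std::string s2 = std::move(s1); // s2 takes over the state that s1 had

    assert(s2 == "abc");            // only the target has the original state

    // s1 is still a valid object, but its value is unspecified at this point,
    // so the sketch assigns a new value before reading it again.
    s1 = "reused";
    assert(s1 == "reused");

    return s2;
}
```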

This move operation transferred the ownership of the memory buffer from s1 to s2. Consequently, you have only one object with the original state, except that the owning object is now s2 rather than s1. As with copying, moving involves two objects, but the end result is that only the target object has the desired state. After the move operation, the source object's state is unspecified. In general, you shouldn't assume that a moved-from object has retained its pre-move state.

NOTE

We'll revisit the issue of a moved-from object's state in part 2 of this series.

Time to Move

The topic of move semantics is interesting, but why would you want your programming language to support it? Because copying objects can be an expensive operation. It may require a lot of memory and a large number of CPU cycles, especially when large objects such as images, video clips, or census data files are involved. You may be surprised to learn that C++ copies objects silently in many cases. For example, take a function that returns an object by value:
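
A minimal sketch of such a function, using the names concat, res, one, two, and s from the discussion below. The parameter types and the use of std::string are assumptions, and concat_demo is a hypothetical wrapper added only to make the call site complete:

```cpp
#include <cassert>
#include <string>

// Returns the concatenation of s1 and s2 by value.
std::string concat(const std::string& s1, const std::string& s2)
{
    std::string res = s1 + s2; // local object that holds the result
    return res;                // returned to the caller by value
}

// Hypothetical call site for the statement discussed in the text.
std::string concat_demo()
{
    std::string one("ab"), two("cd");
    std::string s = concat(one, two);
    assert(s == "abcd");
    return s;
}
```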

Consider this question: How many copies of the local object res are created and destroyed when the statement string s=concat (one, two); is executed? Too many, that's for sure! When concat() returns, it copy-constructs a temporary string object on the caller's stack, and the local string res is destroyed. Next, the implementation copy-constructs s using the temporary string as its argument. Finally, the temporary is destroyed. You need two copy constructions and two destructor calls to copy the content of res to s! This overhead is certainly unnecessary, because its sole purpose is to move the content of res to s.

The creators of C++ became aware of this problem long ago. They even designed a specific optimization technique to eliminate spurious copies (and destructor calls), known as named return value optimization (NRVO) or, more generally, return value optimization (RVO). With RVO, the compiler rewrites the code of concat() so that, instead of returning an object by value, the function conceptually takes a third reference argument to the target object.

NOTE

Return value optimization is discussed in section 12.8, paragraph 31 of the C++11 standard. The standard is protected by copyright laws, and therefore is not freely available online. You can purchase a digital copy of the final standard, or examine a free working draft (PDF) that's nearly identical to the finished version.

The revised concat() writes the result to the target object directly, as shown here:
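
One way to picture that rewrite by hand is sketched here. The parameter name target and the wrapper concat_rvo_demo are illustrative assumptions. A real compiler constructs the result directly in the caller's storage rather than assigning to an existing object, so this is only an approximation of the idea:

```cpp
#include <cassert>
#include <string>

// Conceptual rewrite: the caller supplies the target object, and the
// function writes the result into it instead of returning by value.
// Assumes target is a distinct object from s1 and s2.
void concat(const std::string& s1, const std::string& s2, std::string& target)
{
    target = s1;
    target += s2;
}

// Hypothetical call site: s receives the result with no returned temporary.
std::string concat_rvo_demo()
{
    std::string one("ab"), two("cd");
    std::string s;
    concat(one, two, s);
    assert(s == "abcd");
    return s;
}
```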

RVO is an optional optimization that the compiler can apply only in specific cases, so it doesn't eliminate the root problem; that is, the creation of spurious copies when all you really need is to move a value.

In Quest of Perfect Forwarding

The challenge is to design a perfect forwarding mechanism that lets you pass values directly to the target, without introducing temporary copies along the way. When thinking of such a mechanism, the notion of reference variables springs to mind. References are an efficient vehicle for accessing objects without copying them. However, the traditional non-const references of C++ can bind only to lvalues; that is, named objects that can appear on the left side of an assignment expression. Temporaries, literals, and other objects that don't support pure copying (such as auto_ptr) can't bind safely to such references. A reference to const can bind to an rvalue, but it doesn't let you modify the object it refers to, so you can't use it to take over that object's resources.

Enter Rvalue References

About a decade ago, members of the standards committee started to experiment with a new type of reference variable known as an rvalue reference. Unlike traditional reference variables (now called lvalue references), rvalue references can safely bind to rvalues. Syntactically, an rvalue reference looks like this:

T&& rref;
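
Like any reference, an rvalue reference must be initialized when it is declared. The sketch below shows two such bindings; the function name rvalue_ref_demo and the variable names are hypothetical:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: binds rvalue references to a literal and to a
// temporary object, then returns the modified int.
int rvalue_ref_demo()
{
    int&& rref = 42;                         // binds to a temporary created from the literal
    rref += 1;                               // the referenced temporary can be modified

    std::string&& sref = std::string("abc"); // binds to a temporary string
    assert(sref == "abc");                   // the temporary lives as long as sref does

    // int& lref = 42;                       // error: a non-const lvalue reference
    //                                       // can't bind to an rvalue
    return rref;
}
```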

Two new canonical member functions were also added to C++11:

A move constructor is the move counterpart of a copy constructor. Instead of copying a source object into a target, a move constructor pilfers the resources of the source, moving them to the target. Consequently, after a move construction the source object is left in an unspecified state, and the target object becomes the exclusive owner of those resources. A move constructor has this signature:

C::C(C&& );

A move assignment operator is the move counterpart of a copy assignment operator. It pilfers the resources of the source object, transferring them to *this. A move assignment operator has this signature:

C& C::operator=(C&&);
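
To make the two signatures concrete, here is a minimal sketch of a class that defines both. The class Buffer, its members, and buffer_demo are hypothetical names used only for illustration, and std::move is assumed as the way to turn a named object into an rvalue:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

class Buffer {
public:
    explicit Buffer(const char* text)
        : data_(nullptr), size_(std::strlen(text))
    {
        data_ = new char[size_ + 1];
        std::memcpy(data_, text, size_ + 1);
    }

    // Move constructor: pilfer the source's buffer, then empty the source.
    Buffer(Buffer&& other) : data_(other.data_), size_(other.size_)
    {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    // Move assignment operator: release our own buffer, then pilfer the source's.
    Buffer& operator=(Buffer&& other)
    {
        if (this != &other) {
            delete[] data_;
            data_ = other.data_;
            size_ = other.size_;
            other.data_ = nullptr;
            other.size_ = 0;
        }
        return *this;
    }

    ~Buffer() { delete[] data_; } // deleting a null pointer is harmless

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    char* data_;
    std::size_t size_;
};

// Hypothetical usage: one move construction followed by one move assignment.
std::size_t buffer_demo()
{
    Buffer a("abc");
    Buffer b(std::move(a));  // move construction: b now owns a's buffer
    assert(a.data() == nullptr && b.size() == 3);

    Buffer c("xy");
    c = std::move(b);        // move assignment: c now owns b's buffer
    assert(b.size() == 0);
    return c.size();
}
```

This particular class chooses to leave the source empty; that is a design decision of the sketch, not something the signatures themselves require.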

The early specifications of move operations assumed that after a move operation the source object was left in an unusable state. The only thing you could do with it was to call its destructor. This draconian policy was later relaxed; in C++11, a moved-from Standard Library object is left in a valid but unspecified state. In practice, that state is often "empty," similar to that of a default-constructed object, but you shouldn't rely on it unless the type documents it.

Move Constructors in Action

Technically, you can write a move constructor without using rvalue references. In fact, for a long time several Standard Library containers have been doing just that. Let's see how a move constructor of a simplified string class would work. Assume that a string class has two data members: a pointer to the memory buffer, and a size counter. Moving merely assigns these two data members to the corresponding members of the target, and it sets the source's data members to zeroes:
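
A minimal sketch of that idea follows, assuming a hypothetical class SimpleString whose constructor takes a non-const lvalue reference instead of an rvalue reference. The member names buf_ and size_ and the function simple_string_demo are also illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class SimpleString {
public:
    explicit SimpleString(const char* text)
        : buf_(nullptr), size_(std::strlen(text))
    {
        buf_ = new char[size_ + 1];
        std::memcpy(buf_, text, size_ + 1);
    }

    // Moving "constructor" without rvalue references: it takes the source
    // by non-const reference, takes over its two data members, and zeroes them.
    SimpleString(SimpleString& src) : buf_(src.buf_), size_(src.size_)
    {
        src.buf_ = nullptr;
        src.size_ = 0;
    }

    // Assignment is disabled to keep the sketch short.
    SimpleString& operator=(const SimpleString&) = delete;

    ~SimpleString() { delete[] buf_; }

    const char* data() const { return buf_; }
    std::size_t size() const { return size_; }

private:
    char* buf_;        // pointer to the memory buffer
    std::size_t size_; // size counter
};

// Hypothetical usage: what looks like a copy actually empties the source.
std::size_t simple_string_demo()
{
    SimpleString s1("abc");
    SimpleString s2(s1);
    assert(s1.data() == nullptr && s1.size() == 0);
    return s2.size();
}
```

Note that the expression SimpleString s2(s1) reads like an ordinary copy even though it silently empties s1.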

This leaves us with a new dilemma: should string classes use copy semantics or move semantics? Clearly, using move semantics is more efficient. However, in certain cases you really want two independent copies of a string. Part 2 of this series will explain how to choose between copy and move semantics. You'll learn how to use C++11 rvalue references to define the canonical move member functions, namely the move constructor and the move assignment operator. Finally, you'll overload function sets that take rvalue references and lvalue references.