I heard a recent talk by Herb Sutter who suggested that the reasons to pass std::vector and std::string by const & are largely gone. He suggested that writing a function such as the following is now preferable:
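
The listing that followed this sentence did not survive in this copy. A minimal sketch consistent with the names used in the next paragraph (inval, return_val) might look like this; the function name do_something and the body are assumptions for illustration only:

```cpp
#include <string>
#include <utility>

// Hypothetical reconstruction: only the names inval and return_val
// come from the question text; the body is illustrative.
std::string do_something(std::string inval)
{
    std::string return_val;
    // ... do stuff with inval; here we simply take over its contents ...
    return_val = std::move(inval);
    return return_val;  // moved (or elided) on return, not deep-copied
}
```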

I understand that the return_val will be an rvalue at the point the function returns and can therefore be returned using move semantics, which are very cheap. However, inval is still much larger than the size of a reference (which is usually implemented as a pointer). This is because a std::string has various components including a pointer into the heap and a member char[] for short string optimization. So it seems to me that passing by reference is still a good idea.

I think the best answer to the question is probably to read Dave Abrahams's article about it on C++ Next. I'd add that I see nothing about this that qualifies as off-topic or not constructive. It's a clear question, about programming, to which there are factual answers.
– Jerry Coffin, Apr 19 '12 at 15:26

Fascinating, so if you're going to have to make a copy anyway, pass-by-value is likely faster than pass-by-reference.
– Benj, Apr 19 '12 at 15:53

@Sz. I am sensitive to questions being falsely categorized as duplicates and closed. I do not remember the details of this case and have not re-reviewed them. Instead I am simply going to delete my comment on the assumption that I made a mistake. Thank you for bringing this to my attention.
– Howard Hinnant, Jul 14 at 15:00

@HowardHinnant, thank you very much, it's always a precious moment when one comes across this level of attentiveness and sensibility, it's so refreshing! (I'll delete mine then, of course.)
– Sz., Jul 14 at 15:07

13 Answers
Let's say I have function A which calls function B, which calls function C. And A passes a string through B and into C. A does not know or care about C; all A knows about is B. That is, C is an implementation detail of B.

Let's say that A is defined as follows:

void A()
{
    B("value");
}

If B and C take the string by const&, then it looks something like this:
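
The listing is missing at this point in this copy. A rough sketch of the situation, assuming C has been changed to store the string in a variable called m_str (the name a later comment uses; making it a file-scope variable is a simplification for illustration):

```cpp
#include <string>

// Assumed for illustration: C keeps the string in a file-scope m_str.
static std::string m_str;

void C(const std::string& str)
{
    // C can only copy here: str is const, so it cannot be moved from.
    m_str = str;
}

void B(const std::string& str)
{
    C(str);  // passes the reference straight through, no copy yet
}

void A()
{
    B("value");  // a temporary std::string is created for this call
}
```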

Now suppose C needs to store the string somewhere. Hello, copy constructor and potential memory allocation (ignore the Short String Optimization (SSO)). C++11's move semantics are supposed to make it possible to remove needless copy-constructing, right? And A passes a temporary; there's no reason why C should have to copy the data. It should just abscond with what was given to it.

Except it can't. Because it takes a const&.

If I change C to take its parameter by value, that just causes B to do the copy into that parameter; I gain nothing.

So if I had just passed str by value through all of the functions, relying on std::move to shuffle the data around, we wouldn't have this problem. If someone wants to hold on to it, they can. If they don't, oh well.
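
A sketch of that by-value chain, under the same assumption that C stores the string in a variable named m_str (declared at file scope here purely for illustration). Note that each hand-off needs an explicit std::move, because a named parameter is an lvalue inside the function:

```cpp
#include <string>
#include <utility>

static std::string m_str;  // assumed storage, for illustration only

void C(std::string str)
{
    m_str = std::move(str);  // move assignment: takes the buffer, no copy
}

void B(std::string str)
{
    C(std::move(str));  // explicit move: str is an lvalue inside B
}

void A()
{
    // One std::string is constructed from the literal, then handed
    // along by a move construction and a move assignment.
    B("value");
}
```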

Is it more expensive? Yes; moving into a value is more expensive than using references. Is it less expensive than the copy? Not for small strings with SSO. Is it worth doing?

When you say that moving into a value is more expensive than using references, that's still more expensive by a constant amount (independent of the length of the string being moved) right?
– Neil G, Apr 19 '12 at 19:42

@NeilG : Do you understand what "implementation-dependent" means? What you're saying is wrong, because it depends on if and how SSO is implemented.
– ildjarn, Apr 19 '12 at 21:50

@ildjarn: In order analysis, if the worst case of something is bound by a constant, then it's still constant time. Is there not a longest small string? Doesn't that string take some constant amount of time to copy? Don't all smaller strings take less time to copy? Then, string copying for small strings is "constant time" in order analysis — despite small strings taking varying amounts of time to copy. Order analysis is concerned with asymptotic behaviour.
– Neil G, Apr 19 '12 at 21:56

@NeilG : Sure, but your original question was "that's still more expensive by a constant amount (independent of the length of the string being moved) right?" The point I'm trying to make is, it could be more expensive by different constant amounts depending on the length of the string, which gets summed up as "no".
– ildjarn, Apr 19 '12 at 21:59

Why would the string be moved from B to C in the by value case? If B is B(std::string b) and C is C(std::string c) then either we have to call C(std::move(b)) in B or b has to remain unchanged (thus 'unmoved from') until exiting B. (Perhaps an optimizing compiler will move the string under the as-if rule if b isn't used after the call but I don't think there is a strong guarantee.) The same is true for the copy of str to m_str. Even if a function parameter was initialized with an rvalue, it is an lvalue inside the function and std::move is required to move from that lvalue.
– Pixelchemist, Jul 16 '15 at 11:04

No. Many people take this advice (including Dave Abrahams) beyond the domain it applies to, and simplify it to apply to all std::string parameters. Always passing std::string by value is not a "best practice" for any and all arbitrary parameters and applications, because the optimizations these talks/articles focus on apply only to a restricted set of cases.

If you're returning a value, mutating the parameter, or taking the value, then passing by value could save expensive copying and offer syntactical convenience.

As ever, passing by const reference saves much copying when you don't need a copy.

Now to the specific example:

However inval is still quite a lot larger than the size of a reference (which is usually implemented as a pointer). This is because a std::string has various components including a pointer into the heap and a member char[] for short string optimization. So it seems to me that passing by reference is still a good idea. Can anyone explain why Herb might have said this?

If stack size is a concern (and assuming this is not inlined/optimized), return_val + inval > return_val -- IOW, peak stack usage can be reduced by passing by value here (note: oversimplification of ABIs). Meanwhile, passing by const reference can disable the optimizations. The primary reason here is not to avoid stack growth, but to ensure the optimization can be performed where it is applicable.

The days of passing by const reference aren't over -- the rules are just more complicated than they once were. If performance is important, you'll be wise to consider how you pass these types, based on the details you use in your implementations.

These functions are implemented in a separate compilation unit in order to avoid inlining. Then:
1. If you pass a literal to these two functions, you will not see much difference in performance. In both cases, a string object has to be created.
2. If you pass another std::string object, foo2 will outperform foo1, because foo1 will do a deep copy.
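
The definitions of foo1 and foo2 are missing from this copy. Going by the description above (foo1 deep-copies, foo2 does not), a plausible sketch is the following; the signatures and bodies are assumptions, not the answer's actual benchmark code:

```cpp
#include <string>

// Assumed: foo1 takes its argument by value, so an lvalue argument is copied.
bool foo1(std::string v)
{
    return v.empty();
}

// Assumed: foo2 takes a const reference, so an lvalue argument is not copied.
bool foo2(const std::string& v)
{
    return v.empty();
}
```

With a literal argument both calls have to build a std::string first; with an existing std::string only foo1 pays for a copy.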

What's more relevant is what's happening inside the function: would it, if called with a reference, need to make a copy internally that can be omitted when passing by value?
– leftaroundabout, Apr 19 '12 at 15:58

@leftaroundabout Yes, of course. My assumption is that both functions are doing exactly the same thing.
– BЈовић, Apr 19 '12 at 16:02

That's not my point. Whether passing by value or by reference is better depends on what you're doing inside the function. In your example, you're not actually using much of the string object, so reference is obviously better. But if the function's task were to place the string in some struct or to perform, say, some recursive algorithm involving multiple splits of the string, passing by value might actually save some copying, compared to passing by reference. Nicol Bolas explains it quite well.
– leftaroundabout, Apr 19 '12 at 18:55

To me "it depends on what you do inside the function" is bad design - since you are basing the signature of the function on the internals of the implementation.
– Hans Olsson, Dec 9 '16 at 16:48

Might be a typo, but the last two literal timings have 10x fewer loops.
– TankorSmash, Feb 23 '17 at 19:20

@KeithThompson The Guideline quote (Don’t copy your function arguments. Instead, pass them by value and let the compiler do the copying.) is COPIED from that page. If that's not clear enough, I can't help. I don't fully trust compilers to make the best choices. I'd rather be very clear about my intents in the way I define function arguments. #1 If it's read-only, it's a const ref&. #2 If I need to write it or I know it gets out of scope... I use a value. #3 If I need to modify the original value, I pass by ref&. #4 I use pointers * if an argument is optional so I can nullptr it.
– CodeAngry, Aug 23 '13 at 19:39

I'm not taking sides on the question of whether to pass by value or by reference. My point is that you advocate passing by reference in some cases, but then cite (seemingly to support your position) a guideline that recommends always passing by value. If you disagree with the guideline, you might want to say so and explain why. (The links to cpp-next.com aren't working for me.)
– Keith Thompson, Aug 23 '13 at 19:53

@KeithThompson: You're mis-paraphrasing the guideline. It is not to "always" pass by value. To summarize, it was "If you would have made a local copy, use pass by value to have the compiler perform that copy for you." It's not saying to use pass-by-value when you weren't going to make a copy.
– Ben Voigt, Apr 27 '15 at 22:37

If you change this to take the string by value then you'll end up moving or copying the parameter, and there's no need for that. Not only is copy/move likely more expensive, but it also introduces a new potential failure; the copy/move could throw an exception (e.g., allocation during copy could fail) whereas taking a reference to an existing value can't.
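
For illustration, a function that only reads its argument might look like the following sketch (the name is_printable is hypothetical and not taken from the original answer). Nothing in it needs a private copy, so taking a const reference avoids both the copy and the chance of it throwing:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical read-only function: it inspects s but never needs its own copy.
bool is_printable(const std::string& s)
{
    return std::all_of(s.begin(), s.end(), [](unsigned char c) {
        return std::isprint(c) != 0;
    });
}
```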

If you do need a copy then passing and returning by value is usually (always?) the best option. In fact I generally wouldn't worry about it in C++03 unless you find that extra copies actually cause a performance problem. Copy elision seems pretty reliable on modern compilers. I think people's skepticism and insistence that you have to check your table of compiler support for RVO is mostly obsolete nowadays.

In short, C++11 doesn't really change anything in this regard except for people that didn't trust copy elision.

In C++17, we have basic_string_view<?>, which brings us down to basically one narrow use case for std::string const& parameters.

The existence of move semantics has eliminated one use case for std::string const& -- if you are planning on storing the parameter, taking a std::string by value is more optimal, as you can move out of the parameter.

If someone called your function with a raw C "string" this means only one std::string buffer is ever allocated, as opposed to two in the std::string const& case.
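
A small sketch of that storing case, using a hypothetical Person class (not from the original answer). When called with a raw C string, the only buffer allocated is the one for the parameter, which is then moved into the member:

```cpp
#include <string>
#include <utility>

// Hypothetical class that stores its argument.
class Person {
public:
    // By value: the caller's argument is copied or moved into name,
    // and name's buffer is then moved into the member.
    explicit Person(std::string name) : name_(std::move(name)) {}

    const std::string& name() const { return name_; }

private:
    std::string name_;
};
```

Had the constructor taken std::string const&, Person("...") would first build a temporary std::string and then copy it into name_, which is the two-buffer case described above.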

However, if you don't intend to make a copy, taking by std::string const& is still useful in C++14.

With std::string_view, so long as you aren't passing said string to an API that expects C-style '\0'-terminated character buffers, you can more efficiently get std::string-like functionality without risking any allocation. A raw C string can even be turned into a std::string_view without any allocation or character copying.
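
As a rough sketch (C++17 or later; the helper count_char is a made-up name for illustration), a read-only function taking std::string_view accepts literals, std::string objects and substrings alike without allocating:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: counts occurrences of ch without owning the text.
std::size_t count_char(std::string_view text, char ch)
{
    std::size_t n = 0;
    for (char c : text) {
        if (c == ch) {
            ++n;
        }
    }
    return n;
}
```

A call such as count_char("banana", 'a') builds only a pointer-and-length view over the literal; no std::string is created.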

At that point, the use for std::string const& is when you aren't copying the data wholesale, and are going to pass it on to a C-style API that expects a null terminated buffer, and you need the higher level string functions that std::string provides. In practice, this is a rare set of requirements.

I appreciate this answer – but I do want to point out that it does suffer (as many quality answers do) from a bit of domain-specific bias. To wit: “In practice, this is a rare set of requirements”… in my own development experience, these constraints – which seem abnormally narrow to the author – are met, like, literally all the time. It’s worth pointing this out.
– fish2000, Sep 6 '18 at 18:37

@fish2000 To be clear, for std::string to dominate you don't just need some of those requirements but all of them. Any one or even two of those is, I'd admit, common. Maybe you do commonly need all 3 (like, you are doing some parsing of a string argument to pick which C API you are going to pass it wholesale to?)
– Yakk - Adam Nevraumont, Sep 6 '18 at 19:00

@Yakk-AdamNevraumont It is a YMMV thing – but it’s a frequent use-case if (say) you are programming against POSIX, or other APIs where C-string semantics are, like, the lowest common denominator. I should say really that I love std::string_view – as you point out, “A raw C string can even be turned into a std::string_view without any allocation or character copying” which is something worth remembering to those who are using C++ in the context of such API usage, indeed.
– fish2000, Sep 7 '18 at 16:15

The pass-by-value solution requires only one overload but costs an extra move construction when passing lvalues and xvalues. This may or may not be acceptable for any given situation. Both solutions have advantages and disadvantages.

std::string is a standard library class. It already is both moveable and copyable. I don't see how this is relevant. The OP is asking more about the performance of move vs. references, not the performance of move vs. copy.
– Nicol Bolas, Apr 19 '12 at 16:19

This answer counts the number of moves and copies a std::string will undergo under the pass-by-value design described by both Herb and Dave, vs passing by reference with a pair of overloaded functions. I use the OP's code in the demo, except for substituting in a dummy string to shout-out when it is getting copied/moved.
– Howard Hinnant, Apr 19 '12 at 16:35

std::string is not Plain Old Data (POD), and its raw size is not the most relevant thing ever. For example, if you pass in a string which is above the length of SSO and allocated on the heap, I would expect the copy constructor to not copy the SSO storage.

This is recommended because inval is constructed from the argument expression, and thus is always moved or copied as appropriate; there is no performance loss, assuming that you need ownership of the argument. If you don't, a const reference could still be the better way to go.

Interesting point about the copy constructor being smart enough not to worry about the SSO if it's not using it. Probably correct, I'm going to have to check if that's true ;-)
– Benj, Apr 19 '12 at 15:34

@Benj: Old comment I know, but if SSO is small enough copying it unconditionally is faster than doing a conditional branch. For example, 64 bytes is a cache line and can be copied in a really trivial amount of time. Probably 8 cycles or less on x86_64.
– Zan Lynx, Aug 15 '13 at 18:58

Even if the SSO is not copied by the copy constructor, an std::string<> is 32 bytes that are allocated from the stack, 16 of which need to be initialized. Compare this to just 8 bytes allocated and initialized for a reference: It is twice the amount of CPU work, and it takes up four times as much cache space that won't be available to other data.
– cmaster, Jun 19 '18 at 22:10

Oh, and I forgot to talk about passing function arguments in registers; that would bring the stack usage of the reference down to zero for the last callee...
– cmaster, Jun 19 '18 at 22:11

There is a pitfall not mentioned in any of the other answers here: if you pass a string literal to a const std::string& parameter, it will pass a reference to a temporary string, created on-the-fly to hold the characters of the literal. If you then save that reference, it will be invalid once the temporary string is deallocated. To be safe, you must save a copy, not the reference. The problem stems from the fact that string literals are of type const char[N] and must be converted to std::string.
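
The answer's listing is missing from this copy. A sketch of the two classes, using the names WidgetBadRef and WidgetSafeCopy from the comment that follows (the bodies are assumed for illustration):

```cpp
#include <string>

// Stores a reference: dangles if the argument was a temporary.
struct WidgetBadRef {
    explicit WidgetBadRef(const std::string& s) : ref(s) {}
    const std::string& ref;
};

// Stores a copy: owns its own characters, safe with temporaries.
struct WidgetSafeCopy {
    explicit WidgetSafeCopy(const std::string& s) : copy(s) {}
    std::string copy;
};

// WidgetBadRef bad("literal");     // compiles, but bad.ref dangles once
//                                  // the temporary std::string is destroyed
// WidgetSafeCopy good("literal");  // fine: good.copy outlives the temporary
```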

This is a different issue and WidgetBadRef doesn't need to have a const& parameter to go wrong. The question is if WidgetSafeCopy just took a string parameter would it be slower? (I think the copy temporary to member is certainly easier to spot)
– Superfly Jon, Mar 10 '17 at 10:53

@JustinTime: thank you; I removed the incorrect final sentence claiming, in effect, that std::string&& would be a universal reference.
– circlepi314, Apr 18 '17 at 23:04

@circlepi314 You're welcome. It's an easy mix-up to make, it can sometimes be confusing whether any given T&& is a deduced universal reference or a non-deduced rvalue reference; it probably would've been clearer if they introduced a different symbol for universal references (such as &&&, as a combination of & and &&), but that would probably just look silly.
– Justin Time, Apr 19 '17 at 21:50

The benchmarks show that passing std::strings by value, in cases where the function will copy it in anyway, can be significantly slower!

This is because you are forcing it to always make a full copy (and then move into place), while the const& version will update the old string which may reuse the already-allocated buffer.
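
A sketch of the two setter styles being compared, using a hypothetical Employee class (the names are illustrative; whether the buffer is actually reused depends on the library implementation and on the existing capacity):

```cpp
#include <string>
#include <utility>

// Hypothetical class with two setter styles for comparison.
class Employee {
public:
    // By const&: copy assignment may reuse name_'s existing buffer.
    void set_name_ref(const std::string& n) { name_ = n; }

    // By value: an lvalue argument is always copied into n first
    // (possibly allocating), then move-assigned into name_.
    void set_name_val(std::string n) { name_ = std::move(n); }

    const std::string& name() const { return name_; }

private:
    std::string name_;
};
```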

See his slide 27: For “set” functions, option 1 is the same as it always was. Option 2 adds an overload for rvalue reference, but this gives a combinatorial explosion if there are multiple parameters.

It is only for “sink” parameters where a string must be created (not have its existing value changed) that the pass-by-value trick is valid. That is, constructors in which the parameter directly initializes the member of the matching type.

If you want to see how deep you can go in worrying about this, watch Nicolai Josuttis’s presentation and good luck with that (“Perfect — Done!” n times after finding fault with the previous version. Ever been there?)

His advice boils down to only using value parameters for a function f that takes so-called sink arguments, assuming you will move construct from these sink arguments.

This general approach only adds the overhead of a move constructor for both lvalue and rvalue arguments compared to an optimal implementation of f tailored to lvalue and rvalue arguments respectively. To see why this is the case, suppose f takes a value parameter, where T is some copy and move constructible type:

void f(T x) {
    T y{std::move(x)};
}

Calling f with an lvalue argument will result in a copy constructor being called to construct x, and a move constructor being called to construct y. On the other hand, calling f with an rvalue argument will cause a move constructor to be called to construct x, and another move constructor to be called to construct y.

In general, the optimal implementation of f for lvalue arguments is as follows:

void f(const T& x) {
    T y{x};
}

In this case, only one copy constructor is called to construct y. The optimal implementation of f for rvalue arguments is, again in general, as follows:

void f(T&& x) {
    T y{std::move(x)};
}

In this case, only one move constructor is called to construct y.

So a sensible compromise is to take a value parameter and have one extra move constructor call for either lvalue or rvalue arguments with respect to the optimal implementation, which is also the advice given in Herb's talk.
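
The counts above can be checked with an instrumented type. This is only a sketch: the counters copy_count and move_count are made-up names, and T here is a stand-in that does nothing but count its constructions:

```cpp
#include <utility>

int copy_count = 0;  // hypothetical counters for illustration
int move_count = 0;

// Stand-in type that records how it was constructed.
struct T {
    T() = default;
    T(const T&) { ++copy_count; }
    T(T&&) noexcept { ++move_count; }
};

// Same shape as the by-value f above.
void f(T x)
{
    T y{std::move(x)};  // always one move construction
    (void)y;
}
```

Calling f(a) with an lvalue a should record one copy and one move; calling f(std::move(a)) should record two moves. A prvalue argument such as f(T{}) can have the construction of x elided, so it may show fewer.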

As @JDługosz pointed out in the comments, passing by value only makes sense for functions that will construct some object from the sink argument. When you have a function f that copies its argument, the pass-by-value approach will have more overhead than a general pass-by-const-reference approach. The pass-by-value approach for a function f that retains a copy of its parameter will have the form:

void f(T x) {
    T y{...};
    ...
    y = std::move(x);
}

In this case, there is a copy construction and a move assignment for an lvalue argument, and a move construction and move assignment for an rvalue argument. The optimal case for an lvalue argument is:

void f(const T& x) {
    T y{...};
    ...
    y = x;
}

This boils down to an assignment only, which is potentially much cheaper than the copy constructor plus move assignment required for the pass-by-value approach. The reason for this is that the assignment might reuse existing allocated memory in y, and therefore prevent (de)allocations, whereas the copy constructor will usually allocate memory.

For an rvalue argument, the optimal implementation of f that retains a copy has the form:

void f(T&& x) {
    T y{...};
    ...
    y = std::move(x);
}

So, only a move assignment in this case. Passing an rvalue to the version of f that takes a const reference costs a copy assignment instead of a move assignment. So relatively speaking, the version of f taking a const reference is preferable as the general implementation in this case.
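
The assignment case can be counted the same way. In this sketch the target y is passed in by reference rather than created inside the function, purely so the example is observable from outside; the counter names and the functions f_value and f_cref are hypothetical:

```cpp
#include <utility>

// Hypothetical counters for illustration.
int copy_ctor = 0, move_ctor = 0, copy_assign = 0, move_assign = 0;

// Stand-in type that records each special member call.
struct T {
    T() = default;
    T(const T&) { ++copy_ctor; }
    T(T&&) noexcept { ++move_ctor; }
    T& operator=(const T&) { ++copy_assign; return *this; }
    T& operator=(T&&) noexcept { ++move_assign; return *this; }
};

// Pass-by-value version that assigns into an existing object y.
void f_value(T x, T& y) { y = std::move(x); }

// Pass-by-const-reference version.
void f_cref(const T& x, T& y) { y = x; }
```

For an lvalue argument, f_value performs a copy construction plus a move assignment, while f_cref performs a single copy assignment, which for a type like std::string has the chance to reuse the memory y already owns.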

So in general, for the optimal implementation, you will need to overload or use some kind of perfect forwarding as shown in the talk. The drawback of overloading on the value category of the argument is a combinatorial explosion in the number of overloads required, depending on the number of parameters of f. Perfect forwarding has the drawback that f becomes a function template, which prevents making it virtual, and results in significantly more complex code if you want to get it 100% right (see the talk for the gory details).
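
A minimal sketch of the perfect-forwarding variant, assuming a hypothetical Contact class with a single std::string member (C++14 or later for std::enable_if_t). This is a simplified version and not the full treatment from the talk:

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical class with a forwarding setter.
class Contact {
public:
    // S deduces to an lvalue reference type for lvalue arguments and to a
    // non-reference type for rvalues; std::forward preserves that category,
    // so lvalues are copy-assigned and rvalues are move-assigned.
    template <class S,
              class = std::enable_if_t<
                  std::is_assignable<std::string&, S&&>::value>>
    void set_name(S&& n)
    {
        name_ = std::forward<S>(n);
    }

    const std::string& name() const { return name_; }

private:
    std::string name_;
};
```

The cost is visible in the signature: set_name is now a template, so it cannot be virtual and has to live in a header.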

The problem is that "const" is a non-granular qualifier. What is usually meant by "const string ref" is "don't modify this string", not "don't modify the reference count". There is simply no way, in C++, to say which members are "const". They either all are, or none of them are.

In order to hack around this language issue, STL could allow "C()" in your example to make a move-semantic copy anyway, and dutifully ignore the "const" with regard to the reference count (mutable). As long as it was well-specified, this would be fine.

Since STL doesn't, I have a version of a string that const_casts<> away the reference counter (no way to retroactively make something mutable in a class hierarchy), and - lo and behold - you can freely pass cmstring's as const references, and make copies of them in deep functions, all day long, with no leaks or issues.

Since C++ offers no "derived class const granularity" here, writing up a good specification and making a shiny new "const movable string" (cmstring) object is the best solution I've seen.
