string_ref: a non-owning reference to a string

Overview

References to strings are very common in C++ programs, but
often the callee doesn't care about the exact type of the object
that owns the data. Three things generally happen in this case:

The callee takes std::string and insists that callers copy the
data if it was originally owned by another type.

The callee takes two parameters, a char* and a length (or just
a char*, assuming 0-termination). This reduces the readability
and safety of calls and loses any helper functions the original
type provided.

The callee is rewritten as a template and its implementation
is moved to a header file. This can increase flexibility if the
author takes the time to code to a weaker iterator concept, but it
can also increase compile time and code size, and can even
introduce bugs if the author misses an assumption that the
argument's contents are contiguous.

Google and LLVM have independently implemented a string-reference type to
encapsulate this kind of argument. string_ref is implicitly
constructible from const char* and std::string. It
provides most of the const member operations from
std::string to ease conversion. This paper follows Chromium
in extending string_ref to basic_string_ref<charT, traits>
(Chromium omits traits). We provide typedefs to parallel the four
basic_string typedefs.
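
As a rough illustration of the call-site difference, the sketch below uses C++17's std::string_view as a stand-in for the proposed string_ref. The alias, count_spaces, and the two caller functions are hypothetical names chosen for this example and are not part of this paper.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Assumption: std::string_view (C++17) stands in for the proposed string_ref.
using string_ref = std::string_view;

// Hypothetical callee: it only reads characters, so it does not care
// which type owns them.
std::size_t count_spaces(string_ref s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ' ') ++n;
    }
    return n;
}

// Hypothetical callers: neither call copies the character data.
std::size_t from_literal() { return count_spaces("a b c"); }
std::size_t from_string(const std::string& owner) { return count_spaces(owner); }
```

The callee stays out of line in a source file, and callers holding either a literal or a std::string use the same spelling.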

Operations on string_ref apply to the characters in the
string, and not the pointers that refer to the characters. This introduces
the possibility that the underlying characters might change while a
string_ref referring to them is in an associative container,
which would break the container, but we believe this risk is worthwhile
because it matches existing practice and matches user intentions more
often.
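
A minimal sketch of what "operations apply to the characters" means in practice, again assuming std::string_view as a stand-in for string_ref. The function names are hypothetical.

```cpp
#include <set>
#include <string>
#include <string_view>

// Assumption: std::string_view (C++17) stands in for the proposed string_ref.
using string_ref = std::string_view;

// Two references into different buffers compare equal when the
// characters they refer to are equal.
bool same_contents_different_buffers() {
    std::string a = "key";
    std::string b = "key";
    string_ref ra = a, rb = b;
    return ra == rb && ra.data() != rb.data();
}

// An ordered set of references is ordered by contents, so a lookup with
// a reference into another buffer succeeds. Mutating `stored` while the
// set holds a reference to it would silently break the set's ordering.
bool found_by_contents() {
    std::string stored = "key";
    std::set<string_ref> keys{string_ref(stored)};
    std::string probe = "key";
    return keys.count(string_ref(probe)) == 1;
}
```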

Both Google's and LLVM's string_ref types extend
the interface from std::string to provide some
helpful utility functions:

Inventions in this paper

Google's StringPiece provides as_string and
ToString methods to convert to std::string. LLVM's
StringRef provides both a str() explicit
conversion and an implicit operator std::string(). Since this
paper builds on top of C++11, we provide an explicit
conversion operator.
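
The effect of making the conversion explicit can be sketched with a heavily simplified class. string_ref_sketch and make_copy are hypothetical and only model the conversion operator, not the proposed interface.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, heavily simplified stand-in for string_ref.
class string_ref_sketch {
public:
    string_ref_sketch(const char* p, std::size_t n) : ptr_(p), len_(n) {}

    // Explicit: copying into an owning std::string must be spelled out.
    explicit operator std::string() const { return std::string(ptr_, len_); }

private:
    const char* ptr_;
    std::size_t len_;
};

std::string make_copy(string_ref_sketch r) {
    // std::string s = r;  // would not compile: no implicit conversion
    return std::string(r);  // OK: the copy is visible at the call site
}
```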

Google's and LLVM's string_ref types provide a subset of
std::string's searching operations, but they do provide
pos arguments to specify where to start the search. Because string_ref::substr is
much cheaper than string::substr, this paper removes the
pos argument entirely.

None of the existing classes have constexpr methods.

Bikeshed!

What do we call this class?

string_piece

string_range

string_view

sub_string

Modifications vs std::string

The interface of string_ref is similar to, but not exactly
the same as, the interface of std::string. In general, we want
to minimize differences between std::string and
string_ref so that users can go back and forth between the two
often. This section justifies the differences whose utility we think
outweighs that general rule.

Additions

We should consider adding these methods to std::string in
C++17, but we can't modify std::string in a TS, so this paper
doesn't propose such changes.

remove_prefix() and
remove_suffix() make
it easy to parse strings using string_ref. They could both
be implemented as non-member functions (e.g. str.remove_prefix(n)
is equivalent to str = str.substr(n)), but it seems useful to
provide the simplest mutators as member functions. Note that other
traversal primitives need to be non-members so that they're
extensible, which may argue for pulling these out too.
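
A small sketch of parsing by shrinking the reference, assuming std::string_view (which has remove_prefix and remove_suffix members) as a stand-in for string_ref. trim_spaces and drop_front are hypothetical helpers.

```cpp
#include <cstddef>
#include <string_view>

// Assumption: std::string_view (C++17) stands in for the proposed string_ref.
using string_ref = std::string_view;

// Strip leading and trailing spaces by shrinking the reference itself;
// no characters are copied or modified.
string_ref trim_spaces(string_ref s) {
    while (!s.empty() && s.front() == ' ') s.remove_prefix(1);
    while (!s.empty() && s.back() == ' ') s.remove_suffix(1);
    return s;
}

// The non-member formulation of remove_prefix(n) in terms of substr.
// Precondition: n <= s.size().
string_ref drop_front(string_ref s, std::size_t n) {
    return s.substr(n);
}
```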

starts_with and ends_with are common queries on
strings. The non-member equivalents produce calls that are somewhat
ambiguous between starts_with(haystack, needle) vs
starts_with(needle, haystack), while
haystack.starts_with(needle) is the only English reading of
the member version.
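
One possible way to express these queries as members, written for C++17 on top of std::string_view. The wrapper type ref_with_queries is hypothetical and exists only to show the member call syntax.

```cpp
#include <string_view>

// Hypothetical wrapper around a std::string_view stand-in for string_ref.
struct ref_with_queries {
    std::string_view chars;

    // True if the referenced characters begin with `needle`.
    bool starts_with(std::string_view needle) const {
        return chars.size() >= needle.size() &&
               chars.substr(0, needle.size()) == needle;
    }

    // True if the referenced characters end with `needle`.
    bool ends_with(std::string_view needle) const {
        return chars.size() >= needle.size() &&
               chars.substr(chars.size() - needle.size()) == needle;
    }
};
```

A call such as path.ends_with(".txt") reads in only one direction, which is the point made above about the member form.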

Removals

copy: std::string::copy copies characters out of the string,
not into it, so it is not well named. Users can always use
std::copy instead.
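
A sketch of the std::copy replacement, assuming std::string_view as a stand-in for string_ref. copy_out is a hypothetical helper that truncates to the destination's capacity.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>

// Assumption: std::string_view (C++17) stands in for the proposed string_ref.
using string_ref = std::string_view;

// Copy at most N characters out of `src` into `dst` and return the
// number of characters copied. No null terminator is written.
template <std::size_t N>
std::size_t copy_out(string_ref src, std::array<char, N>& dst) {
    const std::size_t n = std::min(src.size(), N);
    std::copy(src.begin(), src.begin() + n, dst.begin());
    return n;
}
```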

pos and n parameters to methods have been
removed from string_ref. std::string needs
these parameters because std::string::substr is an
expensive (copying and sometimes allocating) operation. However, these
are always integral parameters, so the compiler can't check that their
order is correct, and readers often have a hard time telling which
argument is which. Because string_ref::substr is cheap,
we insist users call it instead of passing its arguments to other
functions.
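
A sketch of a search that starts at an offset without a pos parameter, assuming std::string_view as a stand-in for string_ref. find_from is a hypothetical helper; note that the index returned by the search on the substring is relative to that substring and has to be translated back.

```cpp
#include <cstddef>
#include <string_view>

// Assumption: std::string_view (C++17) stands in for the proposed string_ref.
using string_ref = std::string_view;

// Find `needle` at or after offset `from`. Returns an index relative to
// the whole of `haystack`, or string_ref::npos if there is no match.
std::size_t find_from(string_ref haystack, std::size_t from, char needle) {
    if (from > haystack.size()) return string_ref::npos;
    const std::size_t rel = haystack.substr(from).find(needle);
    return rel == string_ref::npos ? rel : from + rel;
}
```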

?.2 Class template basic_string_ref [strings.string_ref]

A string-like object that refers to a const, sized piece of
memory owned by another object.

We provide implicit constructors so users can pass in a const
char* or a std::string wherever a
string_ref is expected.

It is expected that user-defined string-like types will define an
implicit conversion to string_ref (or another appropriate
instance of basic_string_ref)
to interoperate with functions that need to read strings.
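
A sketch of such a user-defined conversion, assuming std::string_view as a stand-in for string_ref. byte_text, length_of, and length_of_text are hypothetical names for this example.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Assumption: std::string_view (C++17) stands in for the proposed string_ref.
using string_ref = std::string_view;

// Hypothetical user-defined string-like type that owns its characters.
class byte_text {
public:
    explicit byte_text(const char* s) {
        while (*s) chars_.push_back(*s++);
    }

    // Implicit conversion, so a byte_text can be passed wherever a
    // string_ref is expected. The result is valid only while *this lives
    // and is not modified.
    operator string_ref() const {
        return string_ref(chars_.data(), chars_.size());
    }

private:
    std::vector<char> chars_;
};

// A reader that knows nothing about byte_text.
std::size_t length_of(string_ref s) { return s.size(); }

std::size_t length_of_text(const byte_text& t) { return length_of(t); }
```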

Unlike std::strings and string literals, data()
may return a pointer to a buffer that is not null-terminated. Therefore it
is typically a mistake to pass data()
to a routine that takes just a const charT* and expects a
null-terminated string.
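
The hazard can be sketched as follows, assuming std::string_view as a stand-in for string_ref. Both function names are hypothetical. The unsafe variant is only well defined when the underlying buffer happens to contain a null character somewhere at or after the end of the reference.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Assumption: std::string_view (C++17) stands in for the proposed string_ref.
using string_ref = std::string_view;

// Safe: copy into a std::string first; c_str() is null-terminated.
std::size_t safe_c_length(string_ref s) {
    const std::string owned(s);
    return std::strlen(owned.c_str());
}

// Unsafe in general: strlen ignores s.size() and keeps reading until it
// finds a null character in whatever buffer s points into.
std::size_t unsafe_c_length(string_ref s) {
    return std::strlen(s.data());
}
```

For a reference to the first five characters of the literal "hello world", the safe variant reports 5 while the unsafe one reads through to the literal's terminator and reports 11.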

?.2.5 basic_string_ref modifiers [strings.string_ref.modifiers]

?.2.6 basic_string_ref string operations [strings.string_ref.ops]

Unlike std::string, string_ref provides no whole-string methods with position or length parameters. Instead, users should use the substr() method to create the character sequence they're actually interested in, and use that.

?.2.6.2 comparisons [strings.string_ref.ops.comparison]

Determines the effective length
rlen of the strings to compare as the
smallest of size() and other.size(). The
function then compares the two strings by calling
traits::compare(data(), other.data(),
rlen).

Returns:

The nonzero result if the result of the
comparison is nonzero. Otherwise, returns a value as indicated in the following table:
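
A minimal sketch of this comparison for char, assuming that the tie-break on length follows std::basic_string::compare (negative when this string is shorter, zero when the sizes are equal, positive when it is longer). compare_sketch is a hypothetical free function, and std::string_view stands in for string_ref.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical free-function rendering of the comparison for char.
int compare_sketch(std::string_view self, std::string_view other) {
    using traits = std::char_traits<char>;
    // Effective length: the common prefix both strings can be compared over.
    const std::size_t rlen = std::min(self.size(), other.size());
    const int r = traits::compare(self.data(), other.data(), rlen);
    if (r != 0) return r;  // the characters decide
    // Assumption: sizes break the tie as in std::basic_string::compare.
    if (self.size() < other.size()) return -1;
    if (self.size() > other.size()) return 1;
    return 0;
}
```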

?.2.8 basic_string_ref inserter [strings.string_ref.io]

Behaves as a formatted output function
(27.7.3.6.1). After constructing a sentry object, if
this object returns true when converted to a value of
type bool, determines padding as described in
22.4.2.2.2, then inserts the resulting sequence of characters
seq as if by calling os.rdbuf()->sputn(seq,
n), where n is the larger
of os.width() and str.size(); then calls
os.width(0).
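
The observable behavior can be sketched with std::string_view, whose stream inserter is also a formatted output function. This is an analogy rather than the proposed implementation, and both helper names are hypothetical.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <string_view>

// Insert `s` into a fresh stream with the given field width and return
// the characters produced. Default adjustment pads on the left.
std::string padded(std::string_view s, int width) {
    std::ostringstream os;
    os << std::setw(width) << s;
    return os.str();
}

// After one insertion the stream's width is back to 0.
bool width_is_reset_after_insert(std::string_view s) {
    std::ostringstream os;
    os << std::setw(10) << s;
    return os.width() == 0;
}
```

When the width is smaller than the string's size, no padding is added and the string is not truncated.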

?.3 Numeric conversions [strings.conversions.numeric]

[Note: The functions below mirror C++11's
stox() functions on strings. Because
string_ref can cheaply change to refer to a smaller
substring, we should also add a more convenient interface that advances
the string_ref as it parses an integer out, with the rough
signature stox_consume(string_ref&, int
base=10). However, failure is expected in parsing functions, so
it's not appropriate to raise an exception when no integer is available to
be parsed. Existing practice is to use a signature like bool
stox_consume(string_ref&, T& result, int
base=10), but this requires a temporary integral variable to store
the result in. If we get an optional<T> type in a TS, a
better signature along the lines of optional<T>
stox_consume(string_ref&, int base=10) becomes
available. To avoid precluding the optional signature, I
haven't included any of the consume variants in this
paper. — end note]
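
A rough sketch of the optional-returning shape discussed in the note, written against C++17 with std::optional, std::string_view as a stand-in for string_ref, and std::from_chars doing the parsing. The name stoi_consume is hypothetical. Unlike the stox() family, std::from_chars does not skip leading whitespace or accept a leading '+', so this is not a drop-in equivalent.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical consume-style parser: on success, returns the integer and
// advances `s` past the characters that were parsed; on failure, returns
// an empty optional and leaves `s` unchanged. No exception is thrown.
std::optional<int> stoi_consume(std::string_view& s, int base = 10) {
    if (s.empty()) return std::nullopt;
    int value = 0;
    const char* first = s.data();
    const char* last = first + s.size();
    const auto [ptr, ec] = std::from_chars(first, last, value, base);
    if (ec != std::errc()) return std::nullopt;  // no digits, or out of range
    s.remove_prefix(static_cast<std::size_t>(ptr - first));
    return value;
}
```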