
1. Introduction and motivation

sizeof and offsetof currently let programmers inspect the layout of data structures at a
resolution of one byte. When a data structure needs to be compact, bit fields let
programmers declare members that take less than one byte, which saves memory, but sizeof and
offsetof cannot be applied to bit fields. The proposed bit_sizeof and bit_offsetof keywords
would let code inspect the size and location of individual bits within structures.

To interact directly with memory that contains data structures with bit fields, current
C++ programs must fall back on verbose manual allocation of bits. Consider the following
example program, which works in C++14 and illustrates the messy and error-prone bit layout
that current C++ programs need in order to know where member variables are stored:
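As a minimal sketch of this style, assume a hypothetical Foo with four small fields (flag, kind, count, and dirty are illustrative names, not taken from the original program) stored in a single std::uint8_t. Every shift and width is chosen by hand, and the helpers getBits and setBits are likewise hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: read or replace `width` bits starting at bit `shift`,
// counting from the least significant bit of the byte.
inline unsigned getBits(std::uint8_t byte, unsigned shift, unsigned width) {
    return (byte >> shift) & ((1u << width) - 1u);
}

inline std::uint8_t setBits(std::uint8_t byte, unsigned shift, unsigned width,
                            unsigned value) {
    assert(value < (1u << width) && "value does not fit in the field");
    const unsigned mask = ((1u << width) - 1u) << shift;
    return static_cast<std::uint8_t>((byte & ~mask) | ((value << shift) & mask));
}

// Hypothetical Foo with a hand-allocated layout: the only record of where each
// field lives is this table of shifts and widths.
struct Foo {
    std::uint8_t storage = 0;

    static constexpr unsigned flagShift = 0, flagWidth = 1;
    static constexpr unsigned kindShift = 1, kindWidth = 2;
    static constexpr unsigned countShift = 3, countWidth = 4;
    static constexpr unsigned dirtyShift = 7, dirtyWidth = 1;

    bool flag() const { return getBits(storage, flagShift, flagWidth) != 0; }
    void setFlag(bool v) { storage = setBits(storage, flagShift, flagWidth, v); }
    unsigned kind() const { return getBits(storage, kindShift, kindWidth); }
    void setKind(unsigned v) { storage = setBits(storage, kindShift, kindWidth, v); }
    unsigned count() const { return getBits(storage, countShift, countWidth); }
    void setCount(unsigned v) { storage = setBits(storage, countShift, countWidth, v); }
    bool dirty() const { return getBits(storage, dirtyShift, dirtyWidth) != 0; }
    void setDirty(bool v) { storage = setBits(storage, dirtyShift, dirtyWidth, v); }
};

// Nothing checks that the fields do not overlap; this only checks the total.
static_assert(Foo::dirtyShift + Foo::dirtyWidth <= 8, "Foo must fit in one byte");
```

Changing any one width means updating every later shift by hand, which is the kind of bookkeeping that makes this approach error-prone.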

The Foo structure should clearly use bit fields, since it packs many small member variables
into one byte; however, because the program needs to know where each of those bits is
located in memory, it currently has to allocate them by hand. If the bit_sizeof and
bit_offsetof keywords were added to C++, the code could use bit fields properly:
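A sketch of a bit-field version, assuming four hypothetical fields named flag, kind, count, and dirty packed into one byte. The lines using bit_sizeof and bit_offsetof appear only as comments, because those keywords are the proposal and do not compile with any current compiler; the offset in that comment is a likely value, not a guaranteed one:

```cpp
#include <cassert>

// Hypothetical Foo using bit fields: the compiler allocates the bits.
struct Foo {
    unsigned char flag : 1;
    unsigned char kind : 2;
    unsigned char count : 4;
    unsigned char dirty : 1;
};

// Code that needs the layout, such as a JIT, could then ask the compiler
// instead of duplicating shifts by hand. PROPOSED syntax, not valid C++ today:
//
//   constexpr auto countShift = bit_offsetof(Foo, count);  // likely 3
//   constexpr auto countWidth = bit_sizeof(Foo::count);    // 4
//   static_assert(bit_sizeof(Foo) == CHAR_BIT, "");        // if the ABI packs Foo into one byte

// Assigning to a bit field silently keeps only the low bits, so a range check
// is still useful; with the proposal, the literal 4 could be bit_sizeof(Foo::count).
inline void setCount(Foo& foo, unsigned value) {
    assert(value < (1u << 4) && "count is a 4-bit field");
    foo.count = value;
}
```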

JIT compilers need to find the locations of members of compact structures: when generating
code that accesses such a structure, the instruction written into the instruction buffer
depends on where the desired bit is located within its byte. Development of such a compiler
motivated the abandonment of bit fields in a change to WebKit in https://trac.webkit.org/changeset/166465/trunk/Source/WebCore/rendering/style/RenderStyle.h, and other structures use manual bit allocation for similar reasons. A memory allocator
that pre-initializes memory for structures with bit fields would also benefit from knowing
where the bit fields are located within those structures. More generally, being able to
determine the location and size of bit fields precisely would enable more efficient code
to be written in C++.
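To make the JIT requirement concrete, here is a minimal sketch of what such a compiler would derive from a field's bit offset and width: the byte to address and the mask to encode in the emitted instruction. BitLocation and locateBits are hypothetical names; the sketch assumes bits are numbered from the least significant bit of each byte and does not handle fields that straddle a byte boundary:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical result type: where a bit field lives, at byte granularity.
struct BitLocation {
    std::size_t byteOffset;  // byte to address, relative to the object start
    unsigned shift;          // position of the field's lowest bit in that byte
    unsigned mask;           // mask selecting the field's bits in that byte
};

// Converts a bit offset and width (the values bit_offsetof and bit_sizeof
// would report) into what a JIT encodes in, e.g., a byte test instruction.
inline BitLocation locateBits(std::size_t bitOffset, unsigned width) {
    BitLocation loc;
    loc.byteOffset = bitOffset / CHAR_BIT;
    loc.shift = static_cast<unsigned>(bitOffset % CHAR_BIT);
    // This sketch only supports fields contained in a single byte.
    assert(width >= 1 && loc.shift + width <= CHAR_BIT);
    loc.mask = ((1u << width) - 1u) << loc.shift;
    return loc;
}
```

With the proposal, both arguments could come from bit_offsetof and bit_sizeof instead of constants maintained by hand.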

2. Behavior of bit_sizeof and bit_offsetof

bit_sizeof is an operator that yields the size in bits of its operand: the declared width
for a bit-field member, and the size of the operand's type otherwise. bit_offsetof is an
operator that yields the number of bits between the beginning of the structure and the
beginning of the member. Consider the following illustrative example:
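The declarations that the following discussion refers to can be sketched as below. This is a hypothetical reconstruction: the member types, and the choice of making StaticMember a static member function, are assumptions picked to agree with the results stated for each expression. The static_asserts check, with today's sizeof, the byte-level counterparts of some of those statements:

```cpp
// Hypothetical reconstruction of the illustrative example.
struct A {
    unsigned B : 5;              // bit_sizeof(instance.B) == 5
    unsigned C : 3;              // bit_sizeof(A::C) == 3
    char D;                      // bit_sizeof(A::D) == CHAR_BIT
    void Method();               // bit_sizeof(A::Method) is invalid
    static void StaticMember();  // assumed static member function; invalid operand
};

struct EmptyStruct {};

A instance;
A* parentPointer = &instance;
char fiveChars[5];
void staticFunction() {}

// Byte-level counterparts that already compile today.
static_assert(sizeof(*parentPointer) == sizeof(A), "same type, same size");
static_assert(sizeof(EmptyStruct) > 0, "complete objects have nonzero size");
static_assert(sizeof(fiveChars) == 5, "five one-byte elements");
static_assert(sizeof(&staticFunction) == sizeof(void (*)()), "function pointer");
```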

bit_sizeof(instance.B) should return 5, and bit_sizeof(A::C) should return 3.
bit_sizeof(A::D) should return CHAR_BIT, because bit_sizeof can also be used with members
that are not bit fields. bit_sizeof(A) should return the number of bits in A including
padding, similar to [N4296] 5.3.3.2.

bit_sizeof instance, without parentheses, should be a unary-expression form corresponding
to the unary-expression form of sizeof in [N4296] 14.6.2.3. [N4296] 5.3.1 defines
sizeof...(identifier), which counts the number of elements in a parameter pack of a
variadic template, but such a form would not make sense for bit_sizeof. As described in
[N4296] 5.1.1.13.3, bit_sizeof(A::D+42ull) should return the size in bits of the type of
the contained expression, in this case the number of bits in an unsigned long long. As in
[N4296] 5.3.3.3, bit_sizeof(&staticFunction) should return the number of bits in a
function pointer, but bit_sizeof(staticFunction) is invalid.

As in [N4296] 3.9.1.10, bit_sizeof(std::nullptr_t) should equal bit_sizeof(void*). As in
[N4296] 5.3.3.1, bit_sizeof(char), bit_sizeof(signed char), and bit_sizeof(unsigned char)
should all equal CHAR_BIT. As in [N4296] 5.3.3.2, bit_sizeof(*parentPointer) should equal
bit_sizeof(A), bit_sizeof(EmptyStruct) should be greater than 0, and bit_sizeof(fiveChars)
should be 5*CHAR_BIT.

A new definition is necessary to link sizeof and bit_sizeof: bit_sizeof(uintptr_t) should
be CHAR_BIT*sizeof(uintptr_t), and the same should hold for every type that is not a bit
field. bit_sizeof(A::Method) and bit_sizeof(A::StaticMember) are invalid, like the
corresponding sizeof expressions. As in [N4296] 5.1.1.5, class E { int a[bit_sizeof(*this)]; };
should be invalid, because it would need to determine the size of an incomplete type.
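For operands that are not bit fields, the proposed link between the two operators can be emulated in today's C++. The sketch below uses a hypothetical variable template named bit_sizeof_v and checks several of the equalities listed above. Such a template cannot express bit_sizeof(instance.B) or bit_sizeof(A::C), since a bit field's declared width is not part of its type, which is why the proposal needs a language feature:

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical library emulation: for any type that is not a bit field,
// bit_sizeof(T) == CHAR_BIT * sizeof(T).
template <class T>
constexpr std::size_t bit_sizeof_v = CHAR_BIT * sizeof(T);

static_assert(bit_sizeof_v<char> == CHAR_BIT, "");
static_assert(bit_sizeof_v<signed char> == CHAR_BIT, "");
static_assert(bit_sizeof_v<unsigned char> == CHAR_BIT, "");
static_assert(bit_sizeof_v<std::nullptr_t> == bit_sizeof_v<void*>, "");
static_assert(bit_sizeof_v<char[5]> == 5 * CHAR_BIT, "");
static_assert(bit_sizeof_v<std::uintptr_t> == CHAR_BIT * sizeof(std::uintptr_t), "");

// Expression form via decltype; 'a' + 42ull stands in for A::D + 42ull,
// assuming D is a char. The result has type unsigned long long.
static_assert(bit_sizeof_v<decltype('a' + 42ull)> == bit_sizeof_v<unsigned long long>, "");
```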

bit_offsetof(A, B) would return 0 if B is at the beginning of A in memory. bit_offsetof(A, C)
could return 5, because C would likely begin 5 bits after the beginning of A in memory.
bit_offsetof(A, D) also works with members that are not bit fields and would likely return
CHAR_BIT, depending on how the compiler lays out the members of A. A compiler implementer
would need to make sure that bit_offsetof returns the correct offsets in the presence of
vtable pointers. Zero-length bit fields cannot be operands of bit_sizeof or bit_offsetof
because they have no name, but they can still influence the values bit_offsetof returns for
other members, because they move the next bit field to the next allocation unit boundary.
Like offsetof, noexcept(bit_offsetof(A, C)) should always be true. As with offsetof
([N4296] footnote 195), bit_offsetof would be required to return the correct bit offsets
even if operator& is overloaded. These requirements matter less for bit_offsetof, because
using & or std::addressof to take the address of a bit field remains invalid.
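What bit_offsetof and bit_sizeof would compute at compile time can be approximated at run time today: zero an object, set one member to all ones, and scan its bytes. The sketch assumes a hypothetical A with B and C as 5- and 3-bit unsigned bit fields and D as an unsigned char, a hypothetical Z containing a zero-length bit field, and a bit numbering that counts from the least significant bit of each byte (the proposal does not fix a numbering). probeBits and BitSpan are illustrative names:

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

struct A {
    unsigned B : 5;
    unsigned C : 3;
    unsigned char D;
};

// The unnamed zero-length bit field cannot be an operand, but it moves Y to
// the next allocation unit and therefore changes Y's bit offset.
struct Z {
    unsigned X : 3;
    unsigned : 0;
    unsigned Y : 3;
};

struct BitSpan {
    std::size_t offset;  // index of the lowest set bit, or SIZE_MAX if none
    std::size_t width;   // number of set bits
};

// Zero an object, let `setAllOnes` fill one member with ones, then report
// which bits of the object representation are set. std::addressof is used so
// that an overloaded operator& could not interfere.
template <class T, class Setter>
BitSpan probeBits(Setter setAllOnes) {
    T object;
    std::memset(std::addressof(object), 0, sizeof object);
    setAllOnes(object);
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, std::addressof(object), sizeof object);
    BitSpan span{SIZE_MAX, 0};
    for (std::size_t i = 0; i < sizeof(T) * CHAR_BIT; ++i) {
        if ((bytes[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u) {
            if (span.offset == SIZE_MAX) span.offset = i;
            ++span.width;
        }
    }
    return span;
}
```

For example, probeBits<A>([](A& a) { a.C = 7; }) reports a width of 3; on GCC and Clang for x86-64 it typically reports an offset of 5. Unlike the proposed keywords, this probe only works at run time and depends on the assumed bit numbering.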

3. std::bit_size_t

The return type of bit_sizeof and bit_offsetof should be specified.
I propose one of three options:

Option 1: std::size_t. This matches the return type of sizeof and offsetof, and std::size_t
is commonly used for counting. However, it presents a problem for very large structures,
whose size in bits may exceed the largest value a std::size_t can hold. Consider the
following code:

If the return type of bit_sizeof or bit_offsetof were std::size_t, this otherwise valid
code would have to be declared ill-formed. Separately, a program that iterates over every
bit in the entire address space using a std::size_t counter will overflow after covering
only 1/CHAR_BIT of the bits; this is an existing problem that this proposal leaves
untouched.

Option 2: a new type, std::bit_size_t, able to hold the maximum number of addressable bits.
For example, on a 32-bit system with CHAR_BIT of 8, std::size_t could be a 32-bit integer
because there will never be more than 2^32 bytes in memory, but std::bit_size_t could be a
64-bit integer so that it can hold the maximum possible bit offset of 2^35 - 1 (2^32 bytes
times 8 bits per byte, minus one). As another example, a 64-bit system with a maximum
virtual address space of 2^48 bytes and CHAR_BIT of 8 could use a 64-bit integer for
std::bit_size_t, because that can hold the maximum possible value of 2^51 - 1.
Code would likely convert between std::size_t and std::bit_size_t often, but on systems
where both are typedefs for the same underlying integer type, no static_cast would be
necessary. An implementer might choose to make std::bit_size_t the same type as
std::vector<bool>::size_type.

Option 3: a new strongly typed integer, such as a class with an explicit operator
std::size_t() or some other explicit way to convert to and from std::size_t. Making the
conversion explicit would prevent programmers from accidentally converting between the two
integer types, which could differ. If such a class were created, a corresponding class
wrapping a std::size_t byte count could be provided for converting to the integer type
used for counting bits.
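The trade-offs between the three options can be sketched in today's C++. The code assumes std::uint64_t as a hypothetical bit_size_t, which is wide enough for both example systems above, and a hypothetical BitCount class for the strongly typed option; neither name is part of any standard library:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Option 1 problem: a byte count converts to a bit count in std::size_t only
// if it is at most SIZE_MAX / CHAR_BIT.
constexpr bool bitCountFitsInSizeT(std::size_t bytes) {
    return bytes <= SIZE_MAX / CHAR_BIT;
}

// Option 2: a hypothetical bit_size_t that is wide enough for any bit count
// on the example systems (at most 51 bits are needed there).
using bit_size_t = std::uint64_t;

// Option 3: a hypothetical strongly typed bit count. Conversions to and from
// std::size_t are explicit, so byte and bit counts cannot be mixed silently.
class BitCount {
public:
    constexpr explicit BitCount(bit_size_t bits) : bits_(bits) {}

    // Converts a byte count; asserts that the result fits in 64 bits.
    static BitCount fromBytes(std::size_t bytes) {
        assert(bytes <= UINT64_MAX / CHAR_BIT && "bit count would overflow");
        return BitCount(static_cast<bit_size_t>(bytes) * CHAR_BIT);
    }

    constexpr bit_size_t value() const { return bits_; }

    // Possibly narrowing conversion back to std::size_t; the caller must know
    // that the value fits.
    constexpr explicit operator std::size_t() const {
        return static_cast<std::size_t>(bits_);
    }

private:
    bit_size_t bits_;
};
```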

4. bit_offsetof: macro or keyword?

offsetof is currently defined as a macro. Common uses of offsetof can be emulated with a
macro that subtracts addresses and has no special interaction with the compiler. If
operator& is overloaded, however, the macro needs something like std::addressof, which has
special behavior in the compiler, in order to behave correctly and comply with footnote 195
of [N4296]. Implementations can instead defer to a compiler builtin; for example, the libc++
implementation of the offsetof macro is just

    #define offsetof(t, d) __builtin_offsetof(t, d)
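The address-subtraction emulation mentioned above can be sketched as a function template that works on a real object at run time (it is not a constant expression). offsetBySubtraction, Hostile, and Holder are hypothetical names. std::addressof keeps the hostile operator& from interfering, and requiring non-const lvalue references makes passing a bit field a compile error, since such a reference cannot bind to a bit field:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Offset of `member` within `object`, in bytes, found by subtracting addresses.
// M must be non-const so that M& cannot bind to a bit field (a const reference
// would silently bind to a temporary copy instead).
template <class T, class M>
std::size_t offsetBySubtraction(T& object, M& member) {
    static_assert(!std::is_const<M>::value, "pass a non-const object");
    const auto* base = reinterpret_cast<const unsigned char*>(std::addressof(object));
    const auto* field = reinterpret_cast<const unsigned char*>(std::addressof(member));
    assert(field >= base);
    return static_cast<std::size_t>(field - base);
}

// A member type whose operator& does not return its address.
struct Hostile {
    int value;
    Hostile* operator&() { return nullptr; }
};

struct Holder {
    char tag;
    Hostile member;
};
```

Calling offsetBySubtraction on a bit-field member does not compile, which is the same obstacle bit_offsetof faces: it cannot be built on address subtraction.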

bit_offsetof could be specified as a macro to match the definition of offsetof, but it
would need special behavior, both because there is no way to subtract the addresses of bit
fields and because it must behave correctly even when operator& is overloaded. Since it
needs special behavior anyway, it may be simpler to define bit_offsetof as a new keyword,
like bit_sizeof and sizeof.

5. Revision History

r0: This was presented to LEWG at the 2017 meeting in Kona. LEWG sees this as static reflection, so the Reflection SG is a better venue.