Thoughts from a C++ library developer.

Implementation Challenge flag_set: Type-safe, hard-to-misuse bitmask

Sometimes when writing an API you need to pass various flags to a function.
For example, when opening a file you can pass information like whether it is opened for reading or writing,
in binary mode, or for appending at the end.
And often those flags can be combined arbitrarily.

Usually you’d implement that by using a bitmask:
each flag is a bit in an integer,
and flags are set, reset and toggled with bitwise operations.
However, the naive implementation isn’t very good:
I’ll explain why and show you how to do it better.

Bitmask

An enum is used to define the actual flag values.
Each flag is represented by one bit,
so the enumerators are assigned powers of two.
And as the enumerators implicitly convert to int, you can use bitwise operations on them directly:
a | b is an integer with the bits of both a and b set, meaning flag a and flag b.
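
A minimal sketch of that classic approach, with placeholder enumerators a, b and c:

```cpp
// Classic bitmask: each enumerator is assigned a distinct power of two.
enum flags
{
    a = 1 << 0, // 0b001
    b = 1 << 1, // 0b010
    c = 1 << 2  // 0b100
};

// The enumerators implicitly convert to int, so | works out of the box,
// but the result is an int, not a flags.
constexpr int a_and_b = a | b; // 0b011

// Checking for a flag is a bitwise and with the combination.
constexpr bool has_b = (a_and_b & b) != 0;
constexpr bool has_c = (a_and_b & c) != 0;
```

Note that a | b is already a plain int here, which is one of the drawbacks discussed next.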

However, this approach has multiple drawbacks.
For starters, classic C enums aren’t scoped and convert to int at every opportunity.
Also, after you’ve combined two flags, you don’t have an object of the flag type anymore
but an int, so you lose type safety.

We can fix those problems by using C++11’s enum class.
But because it prevents the implicit conversion to the underlying integer type,
it also prevents using the bitwise operators.
We’d have to overload all of them individually.
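
To give an idea of the amount of boilerplate, here is a sketch for a single hypothetical enum class flags with a fixed underlying type; the helper to_int() is made up for this example, and only three of the operators are shown:

```cpp
#include <type_traits>

enum class flags : unsigned
{
    a = 1u << 0,
    b = 1u << 1,
    c = 1u << 2
};

// Helper: the underlying integer of a flags value.
constexpr std::underlying_type<flags>::type to_int(flags f)
{
    return static_cast<std::underlying_type<flags>::type>(f);
}

// Every operator has to be written out by hand, for every flag enum.
constexpr flags operator|(flags lhs, flags rhs)
{
    return static_cast<flags>(to_int(lhs) | to_int(rhs));
}

constexpr flags operator&(flags lhs, flags rhs)
{
    return static_cast<flags>(to_int(lhs) & to_int(rhs));
}

constexpr flags operator~(flags f)
{
    return static_cast<flags>(~to_int(f));
}

// ... plus ^, |=, &=, ^= and so on.
```

Note that flags::a | flags::b is again of type flags, so the type no longer tells a single flag from a combination.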

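Bitwise operations also aren’t very intuitive:
to clear a flag you have to write set &= ~a, and forgetting the ~ still compiles but clears every flag except a.
It would be nice if there were a better API to set a flag,
or if it were somehow possible to prevent this kind of misuse.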

So let’s do exactly that.

The general idea

As plain old C enums aren’t very safe, we want to use an enum class,
but then we need to overload the operators.
Doing that by hand for every enum is too much work, so they have to be generated automatically
for the enums we want to use as flags.

And when generating the operators with some kind of magic,
we can think a little more outside the box.
There is no need to return the enum directly from the bitwise operators;
in fact, we shouldn’t.
If we return a different type to represent a combination of multiple flags,
we can write functions that accept only a single flag
and functions that accept a combination of flags,
and the compiler will remind us if we make a mistake.

So let’s have a flag container, a flag_set.
This type stores which flags are set and which aren’t.
Like the enum itself, it can store that in an integer,
where each bit represents one flag.

To set multiple flags at once you | them together, and to reset multiple flags you & their complements, as in set &= ~a & ~b.
It would be an error, however, to write a & b,
as this is always 0 for two individual, different flags.
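
Using plain integers and hypothetical single-bit constants a, b and c, the idioms and the pitfall look like this:

```cpp
#include <cstdint>

// Hypothetical single-bit flags.
constexpr std::uint8_t a = 1u << 0;
constexpr std::uint8_t b = 1u << 1;
constexpr std::uint8_t c = 1u << 2;

constexpr std::uint8_t all = a | b | c;

// Set several flags: | them together, then | the result into the set.
constexpr std::uint8_t set_ab = std::uint8_t(0u | (a | b));

// Reset several flags: & the complements together, then & the result into the set.
constexpr std::uint8_t reset_ab = std::uint8_t(all & (~a & ~b));

// The mistake: & of two different single flags is always 0.
constexpr std::uint8_t oops = a & b;
```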

With that we can identify two concepts:
a flag combination and a flag mask.
A flag combination is either an individual enumerator or several of them |ed together.
You can use a flag combination to set, toggle and check for flags.
A flag mask is a complemented flag combination.
You can & masks together and use them to clear flags.

With that in mind we can define two different types, flag_combo and flag_mask.
Like flag_set, they are containers of flags,
but they also carry semantic information.
flag_set’s operator&= can then be overloaded to accept only a flag_mask,
so code like set &= a won’t compile,
making it impossible to make that mistake.
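
A stripped-down sketch of this idea; the bare-bones flag_combo, flag_mask and flag_set below (and the helpers combo_of() and can_and_assign) are hypothetical simplifications, not the implementation developed later. The detection trait confirms at compile time that set &= ~a is accepted while set &= a is not:

```cpp
#include <type_traits>
#include <utility>

enum class flags { a, b, c };

// Bare-bones stand-ins: both just wrap the bits, the type carries the meaning.
struct flag_combo { unsigned bits; };
struct flag_mask { unsigned bits; };

constexpr flag_combo combo_of(flags f) { return {1u << static_cast<unsigned>(f)}; }

// The complement of a flag is a mask, not a flag or a combination.
constexpr flag_mask operator~(flags f) { return {~combo_of(f).bits}; }

class flag_set
{
public:
    flag_set& operator|=(flag_combo c) { bits_ |= c.bits; return *this; }
    // Clearing only makes sense with a mask, so that is all &= accepts.
    flag_set& operator&=(flag_mask m) { bits_ &= m.bits; return *this; }

    unsigned to_int() const { return bits_; }

private:
    unsigned bits_ = 0;
};

// Detects whether `set &= rhs` compiles for a given right-hand side type.
template <typename Rhs, typename = void>
struct can_and_assign : std::false_type {};

template <typename Rhs>
struct can_and_assign<Rhs, decltype(void(std::declval<flag_set&>() &= std::declval<Rhs>()))>
    : std::true_type {};

static_assert(can_and_assign<flag_mask>::value, "set &= ~a compiles");
static_assert(!can_and_assign<flags>::value, "set &= a is rejected");
```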

But what if you truly want to write set &= a?
Let’s look at the semantic meaning of “misusing” the operators:

set |= ~a: set everything except a

set &= a: clear everything except a

set ^= ~a: toggle everything except a

(set & ~a) != 0: check for any flag except a
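
The same four operations on a plain integer, as a sketch with hypothetical flags a, b and c; the extra & all keeps the result restricted to bits that correspond to actual flags, since the raw complement also sets the unused upper bits:

```cpp
#include <cstdint>

// Hypothetical flags a, b, c as bits of a plain integer.
constexpr std::uint8_t a = 1u << 0;
constexpr std::uint8_t b = 1u << 1;
constexpr std::uint8_t c = 1u << 2;
constexpr std::uint8_t all = a | b | c;

// Start from a set where only b is set.
constexpr std::uint8_t start = b;

// set |= ~a: set everything except a (limited to existing flags).
constexpr std::uint8_t set_except_a = std::uint8_t((start | ~a) & all);
// set &= a: clear everything except a.
constexpr std::uint8_t clear_except_a = std::uint8_t(start & a);
// set ^= ~a: toggle everything except a (limited to existing flags).
constexpr std::uint8_t toggle_except_a = std::uint8_t((start ^ ~a) & all);
// (set & ~a) != 0: is any flag other than a set?
constexpr bool any_except_a = (start & ~a) != 0;
```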

So swapping the concepts around is useful if you have many flags and want to do something for all of them except one (or few).
This is reasonable, so it should be allowed.
It is not the normal behavior, however, so it should be more explicit.

We can easily write a function combo() that takes a mask and returns the corresponding combination,
and a function mask() that does the opposite.
Then the above behavior is still possible; it just requires writing set &= mask(a).

Implementation

flag_set_impl

All three types flag_set, flag_combo and flag_mask basically have the same implementation.
All three need to store multiple flags as bits in an integer.

As the three types share a common behavior, but it is very important that they are three distinct types,
flag_set_impl has a Tag parameter.
It is just a dummy, but two instantiations with different tags are two different types,
which allows overloading on them.

We’ll store the bits in an integer; select_flag_set_int gives us that integer type.
It is the smallest unsigned integer type that has at least as many bits as the enum has flags.
The implementation just uses specializations, nothing too interesting.
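
A sketch of such a selection; unlike the implementation described above it chains std::conditional instead of writing specializations, and the alias select_flag_set_int_t is an assumption for convenience:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Picks the smallest unsigned integer type with at least Size bits.
// (This sketch chains std::conditional instead of writing specializations.)
template <std::size_t Size>
struct select_flag_set_int
{
    static_assert(Size <= 64, "more than 64 flags are not supported");

    using type = typename std::conditional<
        Size <= 8, std::uint_least8_t,
        typename std::conditional<
            Size <= 16, std::uint_least16_t,
            typename std::conditional<Size <= 32, std::uint_least32_t,
                                      std::uint_least64_t>::type>::type>::type;
};

template <std::size_t Size>
using select_flag_set_int_t = typename select_flag_set_int<Size>::type;
```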

This approach doesn’t work for enums with more than 64 flags,
but I’ll solve that when it comes up.

One of the other problems I wanted to prevent is making a mistake when assigning the values to the enum flags.
It can be prevented by simply keeping the default values.
But then an enumerator is no longer the corresponding mask directly,
but the index of its bit.
The mask is easily created by shifting 1 left by that index,
which is what mask() does.
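
As a sketch, here mask() is a hypothetical free function template parameterized on the integer type, whereas in the text it is a member of flag_set_impl:

```cpp
#include <cstdint>
#include <type_traits>

// Default values: a == 0, b == 1, c == 2 are bit indices, not masks.
enum class flags { a, b, c };

// Turns a bit index into the integer with only that bit set.
// IntType stands in for the type select_flag_set_int would pick.
template <typename IntType, typename Enum>
constexpr IntType mask(Enum e)
{
    return static_cast<IntType>(
        IntType(1) << static_cast<typename std::underlying_type<Enum>::type>(e));
}
```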

Note that “mask” here doesn’t mean flag_mask; it is just the integer in which all the other bits are “masked” out.

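We’ll add two named constructors.
One returns a flag_set_impl where no flags are set, the other one where all of them are.
The second is more interesting: we can’t return the maximum value of the integer directly,
as we might not use all of its bits.
If the upper bits were 1s, all_set() wouldn’t be equal to a | b | ... ,
as the upper bits of that combination are 0s.
So we’ll shift 1 left by the number of flags N and subtract 1, which leaves exactly the lowest N bits set.
If the enum uses all bits of the integer, this needs a little care: shifting by the full width of the (promoted) integer is undefined behavior, even though unsigned overflow in arithmetic is well-defined.
Shifting 2 left by N - 1 instead gives the same result without ever shifting by the full width: for N equal to the width the value wraps around to 0 in the unsigned type, and subtracting 1 yields all ones.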
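
A sketch of the all_set() bit pattern as a hypothetical free function all_set_bits(). Since shifting 1 by the full width of the integer would be undefined behavior, this version shifts 2 by N - 1 instead, so it also works when the enum uses every bit:

```cpp
#include <cstddef>
#include <cstdint>

// Integer with the lowest Size bits set, i.e. every flag of an enum with
// Size flags. Shifting 2 by Size - 1 never shifts by the full width; when
// Size equals the width, the value wraps to 0 in IntType and 0 - 1 wraps
// around to all ones. Requires Size >= 1.
template <typename IntType, std::size_t Size>
constexpr IntType all_set_bits()
{
    return static_cast<IntType>(static_cast<IntType>(IntType(2) << (Size - 1)) - 1u);
}
```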

We’ll also add two regular constructors; they aren’t interesting,
apart from the fact that they should be explicit.

Next are the important member functions to set, clear and toggle a single bit.
They’re all straightforward and make use of the private constructor taking int_type.
Note that they don’t modify the object in place;
instead they return a new flag_set_impl, which allows them to be constexpr under C++11’s rules, where a constexpr function body is essentially a single return statement.
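
A simplified sketch of this style, fixed to one hypothetical enum and an 8-bit integer and without the Tag parameter; each function is a single return statement, so it would also be valid under C++11 constexpr rules:

```cpp
#include <cstdint>

enum class flags { a, b, c };

class flag_set_impl
{
public:
    using int_type = std::uint8_t;

    // Named constructor: no flag set.
    static constexpr flag_set_impl none_set() noexcept { return flag_set_impl(int_type(0)); }

    // Each function returns a modified copy instead of changing *this.
    constexpr flag_set_impl set(flags f) const noexcept
    {
        return flag_set_impl(int_type(bits_ | mask(f)));
    }
    constexpr flag_set_impl reset(flags f) const noexcept
    {
        return flag_set_impl(int_type(bits_ & ~mask(f)));
    }
    constexpr flag_set_impl toggle(flags f) const noexcept
    {
        return flag_set_impl(int_type(bits_ ^ mask(f)));
    }

    constexpr bool is_set(flags f) const noexcept { return (bits_ & mask(f)) != 0; }
    constexpr int_type to_int() const noexcept { return bits_; }

private:
    // The private constructor from the raw integer.
    explicit constexpr flag_set_impl(int_type bits) noexcept : bits_(bits) {}

    static constexpr int_type mask(flags f) noexcept
    {
        return int_type(1u << static_cast<unsigned>(f));
    }

    int_type bits_;
};
```

Because nothing is modified in place, calls chain naturally, e.g. flag_set_impl::none_set().set(flags::a).set(flags::c).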

Other member functions not shown are a toggle_all(), to_int() and is_set(),
as well as bitwise_or(), bitwise_and() and bitwise_xor().
They’re all constexpr and not in-place and simply forward to the corresponding bitwise operations.

Note that the entire interface of this class is an implementation detail.

flag_combo and flag_mask

As the tag type we use an on-the-fly struct declaration, as its only purpose is to make the types distinct.
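
A sketch of the tag mechanism with a deliberately minimal flag_set_impl (only a constructor and to_int()); combo_tag and mask_tag are declared on the fly in the alias templates and never defined:

```cpp
#include <type_traits>

// One implementation for all three types; Tag only serves to make them distinct.
template <typename Enum, typename Tag>
class flag_set_impl
{
public:
    explicit constexpr flag_set_impl(unsigned bits) noexcept : bits_(bits) {}
    constexpr unsigned to_int() const noexcept { return bits_; }

private:
    unsigned bits_;
};

// The tags are declared on the fly in the template argument list and never
// defined: only their identity matters.
template <typename Enum>
using flag_combo = flag_set_impl<Enum, struct combo_tag>;

template <typename Enum>
using flag_mask = flag_set_impl<Enum, struct mask_tag>;

enum class flags { a, b, c };

static_assert(!std::is_same<flag_combo<flags>, flag_mask<flags>>::value,
              "same implementation, distinct types");
```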

The only thing the user should know about is the bitwise operations, which we overload like this:

We can | two flag_combo objects as well as a combo with an enumerator;
the result is a flag_combo.

We can & two flag_mask objects, yielding a mask.

We can ~ a flag_combo or an enumerator, yielding a mask.

We can ~ a flag_mask, yielding a combo.

We can also compare two masks or two combos for equality, as well as a combo with an enumerator.

The implementation is very straightforward with the given interface, as are the mask() and combo()
conversions.
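
A sketch of these overloads for a single hypothetical enum flags with three flags; flag_combo and flag_mask are reduced to plain structs holding the bits, and combo_of() and all_bits are helpers made up for this example. The complements only keep the bits of actual flags, for the same reason as in all_set():

```cpp
enum class flags { a, b, c };

// Minimal stand-ins for flag_set_impl<flags, Tag> with two different tags.
struct flag_combo { unsigned bits; };
struct flag_mask { unsigned bits; };

// Only the three lowest bits correspond to flags.
constexpr unsigned all_bits = (1u << 3) - 1u;

constexpr flag_combo combo_of(flags f) { return {1u << static_cast<unsigned>(f)}; }

// combo | combo, combo | enumerator and enumerator | combo yield a combo.
constexpr flag_combo operator|(flag_combo l, flag_combo r) { return {l.bits | r.bits}; }
constexpr flag_combo operator|(flag_combo l, flags r) { return l | combo_of(r); }
constexpr flag_combo operator|(flags l, flag_combo r) { return combo_of(l) | r; }

// mask & mask yields a mask.
constexpr flag_mask operator&(flag_mask l, flag_mask r) { return {l.bits & r.bits}; }

// ~combo and ~enumerator yield a mask; ~mask yields a combo.
constexpr flag_mask operator~(flag_combo c) { return {~c.bits & all_bits}; }
constexpr flag_mask operator~(flags f) { return ~combo_of(f); }
constexpr flag_combo operator~(flag_mask m) { return {~m.bits & all_bits}; }

// Equality for combos (also with an enumerator) and for masks.
constexpr bool operator==(flag_combo l, flag_combo r) { return l.bits == r.bits; }
constexpr bool operator==(flag_combo l, flags r) { return l == combo_of(r); }
constexpr bool operator==(flag_mask l, flag_mask r) { return l.bits == r.bits; }

// The explicit conversions keep the bits and only change the meaning.
constexpr flag_mask mask(flag_combo c) { return {c.bits}; }
constexpr flag_combo combo(flag_mask m) { return {m.bits}; }
```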

flag_set

flag_set is the important type for the user,
who shouldn’t need to worry much about the other ones.
It uses flag_set_impl as a member, and all functions simply forward to it.

flag_set provides the straightforward named member functions
set(), reset() and toggle(), as well as set_all(), reset_all() and toggle_all().
Unlike the ones of flag_set_impl, they work in place, as that’s more convenient for the user,
and set() also has an overload taking a bool value.
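
A simplified sketch of these in-place member functions for one hypothetical enum with three flags, storing the bits in a std::uint8_t directly instead of a flag_set_impl member; it assumes C++14, where constexpr member functions may modify the object, which the example() function uses:

```cpp
#include <cstdint>

enum class flags { a, b, c };

class flag_set
{
public:
    constexpr flag_set() noexcept : bits_(0) {}

    constexpr void set(flags f) noexcept { bits_ |= mask(f); }
    // Overload taking the desired state of the flag.
    constexpr void set(flags f, bool value) noexcept
    {
        if (value)
            set(f);
        else
            reset(f);
    }
    constexpr void reset(flags f) noexcept { bits_ &= static_cast<std::uint8_t>(~mask(f)); }
    constexpr void toggle(flags f) noexcept { bits_ ^= mask(f); }

    constexpr void set_all() noexcept { bits_ = all_bits(); }
    constexpr void reset_all() noexcept { bits_ = 0; }
    constexpr void toggle_all() noexcept { bits_ ^= all_bits(); }

    constexpr bool is_set(flags f) const noexcept { return (bits_ & mask(f)) != 0; }

private:
    // Three flags: only the lowest three bits are used.
    static constexpr std::uint8_t all_bits() noexcept { return 0x7; }
    static constexpr std::uint8_t mask(flags f) noexcept
    {
        return static_cast<std::uint8_t>(1u << static_cast<unsigned>(f));
    }

    std::uint8_t bits_;
};

// Usage: a sequence of in-place modifications, evaluated at compile time.
constexpr flag_set example()
{
    flag_set s;
    s.set(flags::a);
    s.set(flags::c, true);
    s.toggle(flags::a);
    s.set(flags::b, false);
    return s;
}
```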

It can also be created from a flag combination (i.e. a flag_combo or an enumerator),
as well as assigned from one.

The constraint also involves the traits, which I’ll get back to later; apart from that, it simply checks whether the argument is either the enum itself
or a flag_combo<Enum>.
This simple SFINAE ensures that the conversion works for a | b but not for ~a.
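
A sketch of the constrained constructor and assignment; the helper enable_if_flag_combo is hypothetical, and the flag_set_traits check mentioned above is left out:

```cpp
#include <type_traits>

enum class flags { a, b, c };

// Minimal stand-ins for the combination and mask types.
template <typename Enum, typename Tag>
struct flag_set_impl { unsigned bits; };

struct combo_tag;
struct mask_tag;
template <typename Enum> using flag_combo = flag_set_impl<Enum, combo_tag>;
template <typename Enum> using flag_mask = flag_set_impl<Enum, mask_tag>;

// Hypothetical helper: valid only if T is the enum itself or flag_combo<Enum>.
template <typename T, typename Enum>
using enable_if_flag_combo = typename std::enable_if<
    std::is_same<T, Enum>::value || std::is_same<T, flag_combo<Enum>>::value>::type;

template <typename Enum>
class flag_set
{
public:
    // Implicit conversion from a combination, but not from a mask.
    template <typename T, typename = enable_if_flag_combo<T, Enum>>
    constexpr flag_set(const T& combo) noexcept : bits_(to_bits(combo))
    {}

    template <typename T, typename = enable_if_flag_combo<T, Enum>>
    flag_set& operator=(const T& combo) noexcept
    {
        bits_ = to_bits(combo);
        return *this;
    }

    constexpr unsigned to_int() const noexcept { return bits_; }

private:
    static constexpr unsigned to_bits(Enum e) noexcept
    {
        return 1u << static_cast<unsigned>(e);
    }
    static constexpr unsigned to_bits(const flag_combo<Enum>& c) noexcept { return c.bits; }

    unsigned bits_;
};
```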

flag_set also provides the compound bitwise operations:
|= and ^= are constrained like the constructor, while &= requires a flag_mask,
catching a potential mistake as I wanted.

A little bit more interesting are the non-compound operators.
We can use identical overloads for operator|, operator^ and operator&,
each returning the new flag_set,
but then we’d miss one:
using operator& to check whether bits are set.
This operator& takes a flag combination, not a mask,
and should return bool.

But this is trivial to add, as flag combinations and flag masks are two distinct types.
Unlike other implementations, I can thus get rid of the conversion to bool that flag_set would otherwise need.
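
A sketch of the two operator& overloads, again with minimal stand-in types; here checking with a combination means “is any of its flags set”, mirroring the integer idiom (set & combo) != 0:

```cpp
#include <type_traits>

enum class flags { a, b, c };

// Minimal stand-ins: combo and mask are distinct types over the same bits.
struct flag_combo { unsigned bits; };
struct flag_mask { unsigned bits; };

constexpr flag_combo combo_of(flags f) { return {1u << static_cast<unsigned>(f)}; }

struct flag_set
{
    unsigned bits;
};

// set & mask: clears bits and returns a new set, like | and ^ do.
constexpr flag_set operator&(flag_set s, flag_mask m) { return {s.bits & m.bits}; }

// set & combo: a check, so it returns bool and flag_set needs no bool conversion.
constexpr bool operator&(flag_set s, flag_combo c) { return (s.bits & c.bits) != 0; }
constexpr bool operator&(flag_set s, flags f) { return s & combo_of(f); }

static_assert(std::is_same<decltype(flag_set{} & flags::a), bool>::value,
              "checking yields bool");
```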

Automatically generating the overloads for the enum

We’ve done almost everything, but one last piece is missing:
there are still no bitwise operations for the enum itself;
all we have overloaded so far are the ones taking at least one of our own types (flag_combo, flag_mask or flag_set).

flag_set_impl also needs to know how many flags are in an enum,
in order to select the integer type and implement the all_set() constructor.

We can solve two problems at once by introducing the flag_set_traits.
This is a class template that can be specialized for your own types,
i.e. enums.
It must provide a static constexpr function size() that returns the number of flags in the enum,
used by the flag_set_impl.

And it can also be used to “generate” the bitwise operations.
We can’t overload them directly, as we don’t know the type of the enum yet.
So all we can do is write them as templates in the global namespace.

The global namespace is needed because ADL doesn’t help us here,
as we don’t know the namespace of the flag enum.

But then every type would suddenly have an operator~,
which could be a better match than the one it actually provides!

This is clearly a bad idea,
so instead we can constrain the templates.
We can use SFINAE to enable them only if the type is an enum with specialized flag_set_traits.
Then they only apply where we actually want them.
Detecting a specialization isn’t hard either,
we can simply require that every specialization inherits from std::true_type
and check flag_set_traits<Enum>::value.
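
A sketch of the traits-based opt-in and two of the constrained global operator templates; the namespace app, the stand-in result types, bit_of() and the detection trait has_complement are hypothetical, and the simple all-bits computation assumes fewer than 32 flags:

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Primary template: by default a type is not a flag enum.
template <typename Enum>
struct flag_set_traits : std::false_type
{};

namespace app
{
    enum class flags { a, b, c };
    enum class other { x, y };
} // namespace app

// Opting in: specialize outside of app, inheriting from std::true_type.
template <>
struct flag_set_traits<app::flags> : std::true_type
{
    static constexpr std::size_t size() noexcept { return 3; }
};

// Minimal stand-ins for the result types.
template <typename Enum> struct flag_combo { unsigned bits; };
template <typename Enum> struct flag_mask { unsigned bits; };

// Valid only for enums with a flag_set_traits specialization.
template <typename Enum>
using enable_flag_ops = typename std::enable_if<
    std::is_enum<Enum>::value && flag_set_traits<Enum>::value>::type;

template <typename Enum>
constexpr unsigned bit_of(Enum e) noexcept { return 1u << static_cast<unsigned>(e); }

// Global templates: ~flag yields a mask, flag | flag yields a combination.
template <typename Enum, typename = enable_flag_ops<Enum>>
constexpr flag_mask<Enum> operator~(const Enum& e) noexcept
{
    return {~bit_of(e) & ((1u << flag_set_traits<Enum>::size()) - 1u)};
}

template <typename Enum, typename = enable_flag_ops<Enum>>
constexpr flag_combo<Enum> operator|(const Enum& lhs, const Enum& rhs) noexcept
{
    return {bit_of(lhs) | bit_of(rhs)};
}

// Detects whether ~T is a valid expression.
template <typename T, typename = void>
struct has_complement : std::false_type {};

template <typename T>
struct has_complement<T, decltype(void(~std::declval<T>()))> : std::true_type {};
```

The detection trait shows the point of the constraint: an unrelated enum such as app::other does not suddenly get an operator~.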

Now this still isn’t a nice solution, as it is still a global operator template,
but there are no nice solutions here.
The only alternative besides “do it manually” is a macro.

We need to create a mask when building the complement of a flag,
and a combination when we | two flags together.

Automatically using a correct flag_set_traits

The approach with the flag_set_traits works and is non-intrusive.
It is a little bit ugly, however:
When you define your enum you’ll have to close the namespace,
open the namespace of the flag_set_traits,
specialize it, and then open the original one again,
if you need to add anything else.

It would be better if the default flag_set_traits worked on its own.
This can be done as well, at the cost of making it intrusive.
The default flag_set_traits can check whether the argument is an enum
and whether it has a special enumerator, namely _flag_set_size.
If that’s the case, it inherits from std::true_type
and uses _flag_set_size as the return value of size();
otherwise it inherits from std::false_type.
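
A sketch of such a default, using a detection idiom for the _flag_set_size enumerator; the names in namespace detail and the example enums are hypothetical, and make_void stands in for C++17’s std::void_t:

```cpp
#include <cstddef>
#include <type_traits>

namespace detail
{
    // Stand-in for C++17's std::void_t.
    template <typename...>
    struct make_void { using type = void; };

    // True if Enum is an enum with an enumerator named _flag_set_size.
    template <typename Enum, typename = void>
    struct has_flag_set_size : std::false_type {};

    template <typename Enum>
    struct has_flag_set_size<Enum, typename make_void<decltype(Enum::_flag_set_size)>::type>
        : std::is_enum<Enum> {};

    template <typename Enum, bool = has_flag_set_size<Enum>::value>
    struct default_flag_set_traits : std::false_type {};

    template <typename Enum>
    struct default_flag_set_traits<Enum, true> : std::true_type
    {
        static constexpr std::size_t size() noexcept
        {
            return static_cast<std::size_t>(Enum::_flag_set_size);
        }
    };
} // namespace detail

// Works out of the box for enums with _flag_set_size; can still be specialized.
template <typename Enum>
struct flag_set_traits : detail::default_flag_set_traits<Enum>
{};

// Usage: no specialization needed, just the extra last enumerator.
enum class file_flags { read, write, binary, append, _flag_set_size };
enum class color { red, green, blue };
```

Because the enumerators keep their default values, _flag_set_size is automatically equal to the number of flags before it.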

Conclusion

We’ve now created a way to implement flags simply by writing the following code:

enum class flags { a, b, c, …, _flag_set_size };

There is no need to assign powers of two,
no need to use a macro or overload operators.
It just works out of the box.

Furthermore, it uses the type system to give the bitwise operations semantic information,
so that the compiler can catch common mistakes when the operators are misused.
But unless the user deliberately wants to make the “mistake”,
they don’t need to care, as the types involved are hidden away.