The tarjan-coloring bridge is a significant departure from the old bridge. Its design is based
on the Tarjan Strongly Connected Component algorithm.
SCC links are calculated as part of the Tarjan DFS, on the second scan that calculates the low index and
forms the components.
The algorithm inspects all children after they have been processed, which enables us to do backward
propagation of the reachable SCCs.
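The low-index computation and the "inspect children after they are processed" step can be sketched as below. This is a minimal recursive illustration on a toy adjacency matrix, not the bridge's actual code: the Graph type, MAX_NODES, and all function names are hypothetical, and recursion is used only for brevity.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 16 /* hypothetical bound for this toy graph */

typedef struct {
    int n;
    unsigned char edge[MAX_NODES][MAX_NODES]; /* edge[v][w] != 0 means v -> w */
    int index[MAX_NODES];   /* DFS discovery index, -1 while unvisited */
    int low[MAX_NODES];     /* lowest index reachable from this node */
    int scc[MAX_NODES];     /* component id, -1 until assigned */
    int stack[MAX_NODES];   /* nodes whose component is still open */
    unsigned char on_stack[MAX_NODES];
    int sp, next_index, scc_count;
} Graph;

static void graph_init(Graph *g, int n) {
    assert(n > 0 && n <= MAX_NODES);
    memset(g, 0, sizeof *g);
    g->n = n;
    for (int i = 0; i < n; i++) {
        g->index[i] = -1;
        g->scc[i] = -1;
    }
}

static void graph_add_edge(Graph *g, int from, int to) {
    g->edge[from][to] = 1;
}

static void tarjan_visit(Graph *g, int v) {
    g->index[v] = g->low[v] = g->next_index++;
    g->stack[g->sp++] = v;
    g->on_stack[v] = 1;

    for (int w = 0; w < g->n; w++) {
        if (!g->edge[v][w])
            continue;
        if (g->index[w] < 0) {
            tarjan_visit(g, w);
            /* The child is fully processed: propagate its low index backward. */
            if (g->low[w] < g->low[v])
                g->low[v] = g->low[w];
        } else if (g->on_stack[w]) {
            /* Back edge into a component that is still open. */
            if (g->index[w] < g->low[v])
                g->low[v] = g->index[w];
        }
    }

    /* v is the root of a component: pop every member off the stack. */
    if (g->low[v] == g->index[v]) {
        int id = g->scc_count++;
        int w;
        do {
            w = g->stack[--g->sp];
            g->on_stack[w] = 0;
            g->scc[w] = id;
        } while (w != v);
    }
}

/* Returns the number of strongly connected components found. */
static int tarjan_scc(Graph *g) {
    for (int v = 0; v < g->n; v++)
        if (g->index[v] < 0)
            tarjan_visit(g, v);
    return g->scc_count;
}
```

For a graph with the cycle 0 -> 1 -> 2 -> 0 plus the edge 2 -> 3, tarjan_scc reports two components: one holding nodes 0, 1 and 2, and one holding node 3 alone.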
In order to avoid the xref mess of the old algorithm, we color each component based on the sum of all
reachable components (colors). This means no matter how many non-bridge objects are in between two bridges,
they will all receive the same color and be represented in the final output as a single node in the SCC
graph.
This allows for a reasonably compact output graph, as it will only include colors with no bridges when
a single object can reach multiple bridges.
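The coloring idea can be illustrated with a small sketch in which a color is simply a bitmask of the bridges reachable from an object. This is an assumption made for illustration only: the ColorGraph type, the bitmask representation, and the function names are hypothetical, and the input is assumed to be acyclic already (that is, SCCs have been collapsed).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_OBJS 16 /* hypothetical bound for this toy graph */

typedef struct {
    int n;
    unsigned char edge[MAX_OBJS][MAX_OBJS]; /* edge[v][w] != 0 means v -> w */
    int bridge_id[MAX_OBJS];  /* -1 for a non-bridge, else the bridge's bit position */
    uint32_t color[MAX_OBJS]; /* bitmask of reachable bridges (assumed encoding) */
    unsigned char done[MAX_OBJS];
} ColorGraph;

static void color_graph_init(ColorGraph *g, int n) {
    assert(n > 0 && n <= MAX_OBJS);
    memset(g, 0, sizeof *g);
    g->n = n;
    for (int i = 0; i < n; i++)
        g->bridge_id[i] = -1;
}

/* Color of v: a bridge is its own singleton color; a non-bridge gets the
 * union of its children's colors. Assumes the graph has no cycles. */
static uint32_t color_of(ColorGraph *g, int v) {
    if (g->done[v])
        return g->color[v];

    uint32_t mask = 0;
    if (g->bridge_id[v] >= 0) {
        mask = UINT32_C(1) << g->bridge_id[v];
    } else {
        for (int w = 0; w < g->n; w++)
            if (g->edge[v][w])
                mask |= color_of(g, w);
    }
    g->color[v] = mask;
    g->done[v] = 1;
    return mask;
}
```

With a chain bridge A -> x -> y -> z -> bridge B, the three non-bridge objects x, y and z all get the color {B} and would collapse into one node. An extra non-bridge object pointing at both x and A gets the color {A, B}, which is the case where a color with no bridge of its own has to appear in the output.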
That said, the basic algorithm received a series of optimizations:
1) Flag-based merging, a great idea from Mark. It bounds color and xref merging to be linear in the number of
colors found.
2) Object and color bucket allocation. Instead of using expandable arrays we use fixed-size buckets that
form a linked list. This allows us to use direct pointers instead of indexes and avoids the expensive expansions.
Another benefit is that it reduces the work done by sgen's malloc, which can be really slow.
3) Replace the object hash table with header patching. We tag objects with both pinning and forwarded bits and
store a pointer in the lock word. This eliminates the hash table, a big source of perf issues. Patching objects
back at the end of bridge processing is much cheaper than freeing all the hash table entries.
4) Color deduplication. It's possible to produce duplicated colors due to mutually unreachable paths that
both point to the same set of bridges. A cache is introduced to reduce this for colors that point to 2 or 3 others.
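The bucket allocation from optimization 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the bridge's allocator: Entry, Bucket, BucketList, BUCKET_CAPACITY, and the use of calloc/free are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUCKET_CAPACITY 4 /* hypothetical, kept tiny so the sketch spills quickly */

typedef struct Entry {
    int value; /* stand-in payload for an object or color record */
} Entry;

typedef struct Bucket {
    struct Bucket *next;
    size_t used;
    Entry entries[BUCKET_CAPACITY];
} Bucket;

typedef struct {
    Bucket *head; /* first bucket, where iteration would start */
    Bucket *tail; /* bucket currently being filled */
} BucketList;

/* Hands out the next free entry. A new bucket is linked in when the current
 * one is full, so existing entries never move and pointers to them stay valid. */
static Entry *bucket_list_alloc(BucketList *list) {
    assert(list != NULL);
    if (!list->tail || list->tail->used == BUCKET_CAPACITY) {
        Bucket *b = calloc(1, sizeof *b);
        if (!b)
            return NULL;
        if (list->tail)
            list->tail->next = b;
        else
            list->head = b;
        list->tail = b;
    }
    return &list->tail->entries[list->tail->used++];
}

/* Releases every bucket in one pass over the linked list. */
static void bucket_list_free(BucketList *list) {
    Bucket *b = list->head;
    while (b) {
        Bucket *next = b->next;
        free(b);
        b = next;
    }
    list->head = list->tail = NULL;
}
```

Because a full bucket is never reallocated, a pointer returned by bucket_list_alloc stays valid until bucket_list_free runs. A growable array gives no such guarantee, which is why it forces the use of indexes.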
The way we build colors allows us to fix the hub object problem (SetupDoubleFan) in the future.
Experimental results with sgen-bridge-pathologies show this implementation to be 2-3x faster than the new one.

This was previously used to not store counters that were not going to be dumped, but because we can now sample them, we have to make sure that they are stored, even if we do not plan to dump or sample them.