Feature #5392

Symbol GC

I looked more into Symbol GC. The biggest problem is that IDs are not VALUEs. My outburst at RubyConf was based on my stupid assumption that they were -- I was trying to attack the problem using WeakRefs.

If IDs were VALUEs and Symbols were allocated like any other Object, the existing GC mark and root machinery (including C stack root scans) would take care of it, with an additional sweep of the global_symbol lookup tables.

However, the remaining issue is IDs stored in globals. No matter what, IDs stored in C globals will need to be rb_gc_register_address(VALUE*) roots -- this means CRuby API/contract changes.

Adding a standalone ID mark table and a rb_gc_mark_id() function will not fix the problem of lone IDs on the C stack.

What was the original reason to distinguish Symbol IDs from Object VALUEs, besides making lexer tokens simple to map?
Would changing IDs to be allocated VALUE objects simplify internals anyway? This change could also allow Anonymous Symbols and Anonymous Methods.

How would you ensure identity? Do a search on every Symbol creation? Keep a hash map?

Unless I misunderstand your question, we would ensure identity with the same mechanism that exists now: a String->Symbol hash map. The difference is that the hash map would be pruned of dead Symbols during the GC sweep. If available, WeakRefs and RefQueues would reduce the cost.
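
To make the idea concrete, here is a minimal standalone sketch in plain C of an intern table that is pruned at sweep time. It is not CRuby code: the Symbol struct, TABLE_SIZE, and the sym_intern/sym_mark/sym_sweep/sym_count functions are hypothetical names made up for illustration, and the mark bit stands in for whatever the real GC mark phase would set on a reachable Symbol object.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 64            /* hypothetical fixed bucket count */

/* Hypothetical heap-allocated symbol; not CRuby's actual layout. */
typedef struct Symbol {
    char *name;
    bool marked;                 /* set by the mark phase when reachable */
    struct Symbol *next;         /* bucket chain */
} Symbol;

static Symbol *table[TABLE_SIZE];

static unsigned hash_name(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33u + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Return the unique Symbol for name, creating it on first use.
 * Identity comes from the table lookup, as with the existing map. */
Symbol *sym_intern(const char *name) {
    unsigned h = hash_name(name);
    for (Symbol *s = table[h]; s; s = s->next)
        if (strcmp(s->name, name) == 0) return s;

    Symbol *s = malloc(sizeof *s);
    if (!s) return NULL;
    size_t len = strlen(name) + 1;
    s->name = malloc(len);
    if (!s->name) { free(s); return NULL; }
    memcpy(s->name, name, len);
    s->marked = false;
    s->next = table[h];
    table[h] = s;
    return s;
}

/* Stand-in for the GC mark phase reaching this symbol from a root. */
void sym_mark(Symbol *s) { s->marked = true; }

/* Sweep: unlink and free every unmarked symbol, clear the mark on
 * survivors for the next cycle. Returns the number of symbols freed. */
size_t sym_sweep(void) {
    size_t freed = 0;
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        Symbol **link = &table[i];
        while (*link) {
            Symbol *s = *link;
            if (s->marked) {
                s->marked = false;
                link = &s->next;
            } else {
                *link = s->next;
                free(s->name);
                free(s);
                freed++;
            }
        }
    }
    return freed;
}

/* Number of symbols currently in the table. */
size_t sym_count(void) {
    size_t n = 0;
    for (unsigned i = 0; i < TABLE_SIZE; i++)
        for (Symbol *s = table[i]; s; s = s->next) n++;
    return n;
}
```

Interning the same string twice returns the same pointer, so identity holds while the Symbol is alive. A Symbol that nothing marked is dropped from the table at sweep, and a later sym_intern of the same string simply allocates a fresh one. This also shows why roots matter: anything that holds a Symbol without being scanned (a C global that was never registered, for example) would be left with a dangling pointer after the sweep.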