I propose that we eliminate the new "f" suffix and just make the compiler smart enough to see literal strings with .freeze the same way.

So this code:

str = "mystring".freeze

Would be equivalent in the compiler to this code:

str = "mystring"f

And the fstring table would still be used to return pooled instances.

IMPLEMENTATION NOTES:

The fstring table already exists on master and would be used for these pooled strings. An open question is whether the compiler should forever optimize "str".freeze to return the pooled version or whether it should check (inline-cache style) whether String#freeze has been replaced. I am ok with either, but the best potential comes from ignoring String#freeze redefinitions...or making it impossible to redefine String#freeze.
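As a rough illustration of the inline-cache option, the guard could look something like the sketch below. The names ORIGINAL_STRING_FREEZE, LITERAL_POOL, string_freeze_unmodified? and freeze_literal are hypothetical, invented for this example (they are not MRI internals), and a plain Hash stands in for the fstring table.

```ruby
# Hypothetical sketch, not MRI internals: remember the built-in definition
# of String#freeze so that a later redefinition can be detected.
ORIGINAL_STRING_FREEZE = String.instance_method(:freeze)

# Plain Hash standing in for the fstring table (assumption for illustration).
LITERAL_POOL = {}

# True while String#freeze is still the definition captured above.
def string_freeze_unmodified?
  String.instance_method(:freeze) == ORIGINAL_STRING_FREEZE
end

# Stand-in for what the compiler might emit for "literal".freeze.
def freeze_literal(literal)
  if string_freeze_unmodified?
    # Fast path: hand back the pooled instance, creating it on first use.
    LITERAL_POOL[literal] ||= literal.dup.freeze
  else
    # Slow path: ordinary dispatch, so a user-defined #freeze is honored.
    literal.dup.freeze
  end
end

a = freeze_literal("str")
b = freeze_literal("str")
puts a.equal?(b) # the same pooled object while #freeze is untouched
```

Ignoring redefinitions entirely would amount to always taking the fast path and dropping the guard.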

BONUS BIKESHEDDING:

If we do not want to overload the existing .freeze method in this way, we could follow suggestions in http://bugs.ruby-lang.org/issues/8977 to add a new "frozen" method (or some other name) that the compiler would understand.

If it were "frozen", the following two lines would be equivalent:

str = "mystring".frozen
str = "mystring"f

In addition, using .frozen on any string would put it in the fstring table and return that pooled version.
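A minimal sketch of that pooling behavior, assuming a hypothetical FStringTable module with a frozen helper (neither exists in Ruby). A plain Hash is used here, so unlike the real fstring table its entries are strongly referenced and never collected.

```ruby
# Hypothetical stand-in for the VM's fstring table (illustration only).
module FStringTable
  TABLE = {}

  # Returns the pooled frozen string whose contents equal +str+,
  # adding a frozen copy to the table on first sight.
  def self.frozen(str)
    TABLE[str] ||= str.dup.freeze
  end
end

a = FStringTable.frozen(String.new("mystring"))
b = FStringTable.frozen(String.new("mystring"))

puts a.equal?(b) # both calls hand back the one pooled object
puts a.frozen?
```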

I also propose one alternative method name: the unary ~ operator.

There is no ~ on String right now, and it has no meaning for strings that we'd be overriding. So the following two lines would be equivalent:

str = ~"mystring"
str = "mystring"f

JUSTIFICATION:

Making the compiler aware of normal method-based String freezing has the following advantages:

It will parse in all versions of Ruby.

It will be equivalent in all versions of Ruby other than the fstring pooling.

It extends neatly to Array and Hash; the compiler can see Array or Hash with literal elements and return the same object.

History

If we do not want to overload the existing .freeze method in this way, we could follow suggestions in http://bugs.ruby-lang.org/issues/8977 to add a new "frozen" method (or some other name) that the compiler would understand.

I am leaning toward #frozen if we want a new name... as in "give me the frozen version of this string". I know that there was some concern that "frozen" was too similar to "freeze" in http://bugs.ruby-lang.org/issues/8977 but it still feels like the best name.

If we can't do a new name that refers to freezing, I'd rather just stick with .freeze.

I am happy to see another discussion on this; I feel "str"f is just a hack.

I strongly agree and think #freeze is the right name.

On the aesthetics side, I personally dislike prefix/suffix forms. They feel like u'str' in Python, which just makes me think the language does not support the right strings by default. I kind of like %f{ ... }, but I think #freeze fits Ruby even better; it is a bit long, but at least its semantics are clear.

I propose that we eliminate the new "f" suffix and just make the compiler
smart enough to see literal strings with .freeze the same way.

So this code:

str = "mystring".freeze

Would be equivalent in the compiler to this code:

str = "mystring"f

And the fstring table would still be used to return pooled instances.

This is a great idea IMHO. The backwards compatibility is a huge win
and I think this is the best idea so far regarding frozen strings.

IMPLEMENTATION NOTES:

The fstring table already exists on master and would be used for these
pooled strings. An open question is whether the compiler should
forever optimize "str".freeze to return the pooled version or whether
it should check (inline-cache style) whether String#freeze has been
replaced. I am ok with either, but the best potential comes from
ignoring String#freeze redefinitions...or making it impossible to
redefine String#freeze.

Initially (a few minutes ago), I thought it'd be better to inline-cache
to minimize surprise/keep compatibility. And maybe spew a loud warning on
String#freeze redefinition.

But thinking about this more, string literals are already special.
String#initialize is already ignored for string literals, so perhaps
#freeze could be unredefinable as well.

If we do not want to overload the existing .freeze method in this way,
we could follow suggestions in http://bugs.ruby-lang.org/issues/8977
to add a new "frozen" method (or some other name) that the compiler
would understand.

I think having only .freeze is better (especially for compatibility) and
a new .frozen method would be of minimal benefit.
(But you know Ruby far better than I do)

I also propose one alternative method name: the unary ~ operator.

There is no ~ on String right now, and it has no meaning for strings that we'd be overriding. So the following two lines would be equivalent:

I have another idea: what about wrapping strings in double backquotes or accents?

"this is an interpolable string"
'this is an uninterpolable string'
``this is a frozen string`` => double backquote
´this is also a frozen string´ => simple accent

Accents look elegant, but I don't know if they are cumbersome on some keyboard layouts, and I am not sure if they are limited to UTF-8. The double backquote seems easy to add to the parser and feels pretty natural.

Whichever character(s) you like the most (string, ´string´, string, ~string~, \string), I like the idea of using them as a wrapper rather than adding a special symbol before or after the string definition.

The double pipe can be tricky if you want to set a frozen string as the default value of a block argument.
The double ^ conflicts with the xor operator.
The double ~ conflicts with the complement operator.
The double backslash seems fancy, like an opposite of regex, but the backslash can also be used to break lines and escape characters, so it is probably more difficult to parse. You tell me.

I know nothing about the parser internals, but the double backquote feels like a string and seems reasonably easy to implement. It seems the best option.

This idea can simply coexist with the .freeze method. It is a bit of syntactic sugar.

I am actually very concerned about compiler tricks with freeze because they lead to non-obvious code.

x = "hello".freeze
y = "hello".freeze
x.object_id # => 10
x.object_id == y.object_id

I don't think you should ever rely on this to be true, since it won't be on older Ruby impls or impls that don't yet have #freeze optimizations. Even Java, with its interned Strings, strongly discourages ever using object identity to compare strings. IDEs even flag it as a warning.
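A short sketch of the portable way to compare: value equality holds on every Ruby, while object identity depends on whether the implementation pools frozen literals. The variable names same_value and same_object are only for illustration.

```ruby
x = "hello".freeze
y = "hello".freeze

# Content comparison: does not depend on any pooling optimization.
same_value = (x == y)

# Identity comparison: the result varies with the Ruby version and
# implementation, so code should not branch on it.
same_object = x.equal?(y)

puts "same value:  #{same_value}"
puts "same object: #{same_object} (implementation-dependent)"
```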

a = "hello"
a.object_id # => 100
a.freeze
a.object_id # => 100 (must be 100)
...
So the way #freeze operates then depends on where it is being executed. I dislike that.
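For reference, the existing contract for a string held in a variable can be checked directly: #freeze freezes the receiver in place and returns that same object. This small sketch uses String.new only to guarantee a fresh, unfrozen string.

```ruby
a = String.new("hello") # a fresh, unfrozen string
id_before = a.object_id

result = a.freeze

puts result.equal?(a)           # #freeze returned the receiver itself
puts a.object_id == id_before   # identity is unchanged by freezing
puts a.frozen?
```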

Much prefer just adding #frozen, we can implement it sort of cleanly in 2.0 (except for GC hooking) and simply alias #freeze in 1.9 and earlier.

So here's the same question I asked in the #frozen feature: why can't #freeze just use the fstring table?

fstrings will GC and clear themselves from that table

large strings put into the table will take up no more space than if they were not frozen

So #freeze could do what you suggest here and always use the fstring table. In the "literal".freeze case, the compiler could do additional magic to go to the table immediately.

This question applies equally to "str"f logic. I'm not sure what the answer is, because I don't know how frozen strings in @charliesome's patch interact with C extensions.

Actually, it occurred to me that the interaction with C extensions is even simpler: if a string leaks out to C exts, it's no worse than any string leaking out. The only difference is that the C ext would still have a reference while the fstring table does not. So I think it's no worse than current interaction with strings.

So here's the same question I asked in the #frozen feature: why can't
#freeze just use the fstring table?

That would be an interesting experiment. After all, it is #freeze and
not #freeze!, so maybe we have some leverage there.

I think we do. The worst case scenario is that while referenced we have more entries in the table, which may include strings that become "shady" and pass out to C exts. But those strings would stay alive under the current definition of "shady" and even under older Ruby versions with a purely conservative GC the effects are no worse.

So basically:

If the string is long lived normally, it will take up X bytes for its lifetime.

If the string gets stored in the fstring table, it will last no longer than it would without the fstring table.

If the string is short-lived, it will have a bit more overhead for dealing with the fstring table, but very little: hash calculation and table management at most.

I don't doubt your analysis, but I don't think it's any worse with #freeze using fstring table. It's just multiplied by the number of strings that get frozen. Critical failure * N is still a critical failure.

There are 3 things being discussed here, and I think it is fairly important that we split them out:

1) Parser optimisation for "string".freeze

2) Unconditionally have #freeze return a pooled string

3) Change the semantics of #freeze so it amends the current object and operates like .NET / Java intern does.

1) is completely doable with little side-effects. My caveat is that if #1 is the only thing done, the semantics for #freeze depend on the invocation. That said, this is minor. I totally accept that and prefer "string".freeze to "string"f.

2) without 3) really scares me.

Imagine the odd semantics:

a = "hello"
a.freeze # freezes one RVALUE in memory and returns a different RVALUE
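The surprise can be simulated in plain Ruby. In this sketch, PoolingFreeze is a hypothetical stand-in for a #freeze that always returns the pooled string without changing object identity; it is not an existing API, and a plain Hash plays the role of the fstring table.

```ruby
# Hypothetical model of "#freeze always returns the pooled string".
module PoolingFreeze
  TABLE = {}

  # Freezes +str+ in place, then returns whichever equal string
  # reached the table first.
  def self.call(str)
    str.freeze
    TABLE[str] ||= str
  end
end

first  = String.new("hello")
second = String.new("hello")

PoolingFreeze.call(first)            # first becomes the pooled instance
returned = PoolingFreeze.call(second)

puts second.frozen?          # the receiver was frozen...
puts returned.equal?(second) # ...but a different object came back
puts returned.equal?(first)
```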

As to 3), I don't think it can be implemented in MRI. If an RVALUE is moved in memory, MRI has to crawl the heap and rewrite every RVALUE that holds a ref to it, because it does not keep track of this internally.

There are 3 things being discussed here, I think it is fairly important we split them out.

Parser optimisation for "string".freeze

1) is completely doable with little side-effects. My caveat is that if #1 is the only thing done, the semantics for #freeze depend on the invocation. That said, this is minor. I totally accept that and prefer "string".freeze to "string"f.

It's a part of byte-code optimization, not parser. Since we have done
it already for several methods, no problem there.

Unconditionally have #freeze return a pooled string

Change the semantics of #freeze so it amends the current object and operates like .NET / Java intern does.

2) without 3) really scares me.

Imagine the odd semantics:

a = "hello"
a.freeze # freezes one RVALUE in memory and returns a different RVALUE

As to 3), I don't think it can be implemented in MRI. If an RVALUE is moved in memory, MRI has to crawl the heap and rewrite every RVALUE that holds a ref to it, because it does not keep track of this internally.

It seems like everyone agrees that "string".freeze is a better choice
than adding incompatible syntax now. That was the original proposal in this
issue.

Should we remove "string"f on master and replace it with charliesome's
patch for "string".freeze? Or do we want to bikeshed a shorter name?

It occurred to me that there's already "string".b which returns a binary
string. Should we consider "string".f which is similar to "string"f syntax
but is just a normal method?

I think we're in agreement that we want the method format rather than the
"f" suffix, so it's just a matter of deciding if we want a different method
name for the new compiler-aware method.

Yes. I feel like regexen have suffixes because of decades of Perl
precedent, but they (suffixes) don't belong anywhere else.

For a method, I feel like #freeze is the better name, my only question is:
is anyone monkeypatching it (and therefore will be bitten by this
optimisation)? I doubt it, but we should still ask. The same question would
have to be asked of the new method; I think there's more chance of #f being
used in the wild than an overridden #freeze.

Also in favour of #freeze: it gives existing code a boost without any
modification.

There are 3 things being discussed here, I think it is fairly important we split them out.

Parser optimisation for "string".freeze

1) is completely doable with little side-effects. My caveat is that if #1 is the only thing done, the semantics for #freeze depend on the invocation. That said, this is minor. I totally accept that and prefer "string".freeze to "string"f.

It's a part of byte-code optimization, not parser. Since we have done
it already for several methods, no problem there.

So can we move this optimization to the parser instead? I think Sam means:

# optimized literal by parser. This may use pooled string since
# the string never existed in ObjectSpace before this line of code:
"string".freeze

Change the semantics of #freeze so it amends the current object and operates like .NET / Java intern does.

2) without 3) really scares me.

Imagine the odd semantics:

a = "hello"
a.freeze # freezes one RVALUE in memory and returns a different RVALUE

Yes, this scares me if a.freeze made a different RVALUE

As to 3), I don't think it can be implemented in MRI. If an RVALUE is moved in memory, MRI has to crawl the heap and rewrite every RVALUE that holds a ref to it, because it does not keep track of this internally.

I have added #9042 and #9043 for removing the "f" suffix and adding the #f method, respectively.

I'm starting to lean toward making #f be the only magic form, so nobody can complain that we're adding incompatible syntax ("f" suffix) or changing the semantics of an existing method (#freeze optimization).

I'm starting to lean toward making #f be the only magic form, so nobody can complain that we're ... changing the semantics of an existing method (#freeze optimization).

I don't get this argument. Optimized String#freeze doesn't really change semantics in any real way. I'm happy to just ignore anyone that complains about optimizing #freeze on a string literal.

There's only one way this could possibly affect any Ruby code - and that's if the code inspects the object_id of literal strings that it immediately calls #freeze on. I'd say any code that breaks due to this was already fairly brittle anyway.

Not directly, but I wasn't able to come up with performance benefits
from my patch for https://bugs.ruby-lang.org/issues/8998
However, I don't have real apps which depend on hash/string-keys
performance.

I like this idea because:
- No syntax change
- Semantics change ("literal".freeze.object_id is always the same),
but I can't imagine apps that rely on this behavior.
- Apart from the 2nd point, there is no compatibility issue.