Yep. chain uses a voldemort type, map does not.
Georgi: try copying the chain implementation into your local file, but
instead of having a static internal struct, move it to a module-level
struct (you probably have to repeat the template parameters, but not the
constraints). See if it produces a similar slimming effect.
-Steve

We definitely need to fix Voldemort types. Walter and I bounced a few
ideas during DConf.
1. Do the cleartext/compress/hash troika everywhere (currently we do it
on Windows). That should help in several places.
2. For Voldemort types, simply generate a GUID-style random string of
e.g. 64 bytes. Currently the name of the type is composed out of its
full scope, with some exponentially-increasing repetition of substrings.
That would compress well, but it's just a lot of work to produce and
then compress the name. A random string is cheap to produce and adequate
for Voldemort types (you don't care for their name anyway...
Voldemort... get it?).
I very much advocate slapping a 64-long random string on all Voldemort
returns and calling it a day. I bet Liran's code will get a lot quicker
to build and smaller to boot.
Andrei
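To illustrate the trade-off Andrei describes: a nested Voldemort name is huge but highly repetitive, so it compresses well, yet you still pay to build it; a random or hashed id is constant-size and cheap. A Python sketch (the nesting pattern and depth are made up for illustration):

```python
import hashlib
import secrets
import zlib

# Model a Voldemort-style mangled name: each wrapping level repeats the
# inner name, so the string grows exponentially with nesting depth.
name = "Result"
for depth in range(12):
    name = f"foo!({name}).foo({name})"

compressed = zlib.compress(name.encode())
random_id = secrets.token_hex(32)  # 64 hex chars, cheap to produce
digest = hashlib.md5(name.encode()).hexdigest()

print(len(name))        # tens of thousands of characters
print(len(compressed))  # the repetition compresses very well
print(len(random_id))   # always 64
print(len(digest))      # always 32
```

The point being: both the compressed name and the random id end up tiny, but only the random id avoids materializing the giant string in the first place.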


This may be crazy, but I solved this problem by simply moving the struct
outside the function. What about a lowering that does this for you?
Example:
auto foo(T)(T t) if (someConstraints)
{
    static struct Result
    {
        T t;
    }
    return Result(t);
}
Changes to:
private struct foo_Result(T) if (someConstraints)
{
    T t;
}
auto foo(T)(T t) if (someConstraints)
{
    alias Result = foo_Result!T; // inserted by compiler
    return Result(t);
}
Or even better:
template foo(T) if (someConstraints)
{
    struct Result
    {
        T t;
    }
    auto foo(T t)
    {
        return Result(t);
    }
}
What this does is lower the number of times the template parameters (and
function arguments) appear in the type name to 1. This gets rid of the
exponential nature of the symbol name.
This doesn't work for voldemorts with closures, but maybe it can somehow
be reconstructed so the context is included in the generated struct.
-Steve
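Steve's lowering can be modeled in miniature. The Python sketch below uses made-up name-building rules (not real D mangles) to contrast how long the type name gets when the template argument is embedded twice per level versus once:

```python
# Toy model of symbol growth (the name shapes are hypothetical).
# Nested struct: the argument appears twice per level (instantiation
# plus parameter list). Lowered struct: the argument appears once.

def nested(t):
    return f"foo!({t}).foo({t}).Result"

def lowered(t):
    return f"foo_Result!({t})"

a = b = "int"
for _ in range(10):
    a, b = nested(a), lowered(b)

print(len(a))  # exponential: roughly doubles every level
print(len(b))  # linear: grows by a constant per level
```

Ten levels deep, the nested form is already tens of thousands of characters while the lowered form stays in the low hundreds.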

Consider using a 'static struct'. A non-static struct will
include a hidden member that points to the stack frame of
foo(), i.e. "produces smaller code".

Cue me going to the language reference to look up static
structs and reading
"A struct can be prevented from being nested by using the static
attribute, but then of course it will not be able to access
variables from its enclosing scope."
Not a fan of that description...


This solution works awesomely, actually. It even produces smaller code
than moving the struct completely outside.
I'm going to use this mechanism in iopipe to get my voldemorts back :)

Consider using a 'static struct'. A non-static struct will include a
hidden member that points to the stack frame of foo(), i.e. "produces
smaller code".

Yes, I did use static in my actual code. In fact, I found a bug because
of this (one of my voldemort functions was allocating a closure because
I had thought I was using a member but was actually using a function
parameter).
But this doesn't fix the symbol size problem. The other mechanism does.
-Steve

Using 64 character random strings will make symbolic debugging
unpleasant.

Using 6.4 megabyte strings already makes symbolic debugging
unpleasant.
The one thing that worries me about random strings is that it
needs to be the same across all builds, or you'll get random
linking errors when doing package-at-a-time or whatever (dmd
already has some problems like this!). But building a gigantic
string then compressing or hashing it still sucks... what we need
is an O(1) solution that is still unique and repeatable.


Clearly the reason for building such a gigantic string was some
sort of repetition. Detect the repetition on-the-fly to avoid
processing all of it. This way the generated name is already
compressed.
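Repetition detection during mangling might look like the following sketch (Python for illustration; the `Q`-style back-reference notation here is made up, not D's actual scheme):

```python
# Sketch: emit each identifier once; afterwards, emit a short
# back-reference to its first position instead of repeating the text.
def mangle(parts):
    seen = {}
    out = []
    for p in parts:
        if p in seen:
            out.append(f"Q{seen[p]}")   # hypothetical back-ref notation
        else:
            seen[p] = len(out)
            out.append(f"{len(p)}{p}")  # <length><identifier>, D-style
    return "".join(out)

print(mangle(["foo", "Result", "foo", "foo", "Result"]))
# -> 3foo6ResultQ0Q0Q1
```

Because repeats cost a couple of characters instead of the full identifier, the emitted name never blows up, and no giant intermediate string has to be built and then compressed.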


Good point. Using a SHA1 derived from the string instead of a
GUID is IMHO better. It has the advantage of repeatability, is
even shorter, and is not very expensive to generate.

You could simply add a "trivial" version of the struct +
enclosing function name (i.e. once, without repetitions) before or
after the random character string. This way you would know which
struct it's referring to, it's unique, and you still avoid
generating a 5-exabyte symbol name just to
compress/hash/whatever it.

Unfortunately, the PR doesn't cure the root cause, but only
provides linear 45-55% improvement, which doesn't scale with
the exponential growth of the symbol size:

case             time to compile   du -h   strings file | wc -c
NG-case before   0m19.708s         339M    338.23M
NG-case after    0m27.006s         218M    209.35M
OK-case before   0m1.466s          16M     15.33M
OK-case after    0m1.856s          11M     9.28M

For more info on the measurements:
https://github.com/dlang/dmd/pull/5793#issuecomment-220550682

IMO, the best way forward is:
+ The compiler should lower voldemort types, according to the
scheme that Steve suggested
(http://forum.dlang.org/post/nhkmo7$ob5$1 digitalmars.com)
+ After that, during symbol generation (mangling) if a symbol
starts getting larger than some threshold (e.g. 800 characters),
the mangling algorithm should detect that and bail out by
generating some unique id instead. The only valuable information
that the symbol must include is the module name and location
(line and column number) of the template instantiation.


Location info shouldn't be used. This will break things like
interface files and dynamic libraries.


Well... template-heavy code doesn't play well with header files and
dynamic libraries. Most of the time templates are used for the
implementation of an interface, but template types such as ranges
are unusable in function signatures. That's why they're called
voldemort types - because they're the ones that cannot/must not
be named.
Instead, dynamic libraries should use stable types such as
interfaces, arrays and function pointers, which don't have the
aforementioned symbol size problems.
Since we're using random numbers for symbols (instead of the
actual names), it would not be possible for such symbols to be
part of an interface, because a different invocation of the
compiler would produce different symbol names. Such symbols
should always be an implementation detail, and not part of an
interface. That's why location info would play no role, except
for debugging purposes.


Rene
How do you expect the compiler to know the exact return type,
only by looking at this signature:
auto transmogrify(string str);
A possible implementation might be this:
auto transmogrify(string str)
{
    return str.map!someFunc.filter!otherFunc.joiner();
}
or something completely different.


The only issue is code bloat. It also happens with separate
compilation, because the compiler can't know if the template has
already been instantiated in a different compilation unit.
Only within a single compilation unit/invocation can the compiler
reliably see all the places where the same template instance is
used. In that case, the actual mangled name shouldn't matter,
AFAIU.

This is a fallacy. I don't think so, at all, when the baseline is an
extremely long string. In any given scope there are at most a handful of
Voldemort types at play, and it's easy to identify which is which. We've
all done so many times with IPs, GUIDs of objects, directories, etc.
etc. -- Andrei

This is a fallacy. I don't think so, at all, when the baseline
is an extremely long string.

I agree with Andrei.
I solved it this way
https://github.com/ldc-developers/ldc/pull/1445:
"Hashed symbols look like this:
_D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
ddemangle gives:
one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
Meaning: this symbol is defined in module one.two.three on line
34. The identifier is foo and is contained in the struct or class
Result."
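For readers unfamiliar with the mangling, the shape of that hashed symbol can be reconstructed with a few lines of Python (a sketch of the format only; the real implementation lives in the LDC PR above, and the function names here are made up):

```python
import hashlib

def len_prefix(parts):
    # D mangles each identifier as <length><identifier>
    return "".join(f"{len(p)}{p}" for p in parts)

def hashed_symbol(module_parts, line, huge_mangle, inner_parts):
    # Replace the huge middle of the mangle with "_" + md5, but keep
    # the module, line number, and innermost identifiers readable.
    digest = "_" + hashlib.md5(huge_mangle.encode()).hexdigest()
    parts = module_parts + [f"L{line}", digest] + inner_parts
    return "_D" + len_prefix(parts) + "Z"

sym = hashed_symbol(["one", "two", "three"], 34,
                    "(stand-in for an enormous mangle)",
                    ["Result", "foo"])
print(sym)  # _D3one3two5three3L3433_<32 hex digits>6Result3fooZ
```

The result is fixed-length no matter how large the original mangle was, and ddemangle can still show the module, line, and identifier.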


This is nice. How difficult would it be to rework it into a PR for dmd?
-- Andrei


I can work on it, but only if it will not result in a long debate
afterwards (!!!).
One obstacle is the hasher itself: I am not going to implement
one myself! In the LDC PR, I used LLVM's MD5 hasher and Phobos's
MD5 hasher. Perhaps it is better to use a faster hasher (I have
no expertise on that; Murmur?), so we will have to carry our own
copy of a good hasher implementation. Or perhaps the speed limit
is memory access and hash algorithm speed doesn't matter.
I made the hashing optional, with a symbol length threshold.
Getting rid of the variable threshold would be good, such that
the (few) large symbols in Phobos are hashed too and all will
work fine. Perhaps 1k is a good threshold.


I can work on it, but only if it will not result in a long debate
afterwards (!!!).

Thanks. I'll get back to you on that.


I don't see a need for hashing something. Would a randomly-generated
string suffice?
Andrei


I don't see a need for hashing something. Would a randomly-generated
string suffice?

[...]
Wouldn't we want the same symbol to be generated if we call a
Voldemort-returning function with the same compile-time arguments (but
not necessarily runtime arguments) multiple times from different places?
This is likely not an issue when compiling all sources at once, but may
be a problem with incremental compilation.
T
--
"Hi." "'Lo."

I don't see a need for hashing something. Would a
randomly-generated string suffice?

Naïve question: is a (randomly-)?generated _anything_ required?
I've probably missed something, but it seems like line noise just
because we feel we must. I guess there's a possibility that
there would be multiple matches on the same line with the same
object and identifier... Stick the column in there too and call
it a day?
-Wyatt

I've probably missed something, but it seems like line noise
just because we feel we must.

Line and file are not unique, because the same template would
generate different functions for different arguments.
Template arguments are a part of the identifier... and that can
really blow up the size, because the arguments may themselves
have template arguments, which have template arguments, etc., yet
each unique argument could be creating an entirely different
function.

I don't see a need for hashing something. Would a
randomly-generated string suffice?

Hashing produces reproducible unique values. Random will not be
reproducible and may not even be unique.

The question is: is it actually good for them to be reproducible?
The very idea behind voldemort types is that you don't reference
them directly in any way; they are just an implementation detail.
To me it makes sense to apply that to debugging too (debugging of
deeply chained template types isn't really very usable anyway).

1. Yes, because the same type can be generated from multiple
compilation units (the linker removes the duplicates - if they're
not duplicates, the linker won't remove them).
2. Having the compiler produce different .o files on different
runs with the same inputs is pretty eyebrow-raising, and makes it
harder to debug the compiler, and harder to have a test suite for
the compiler.
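The reproducibility point is easy to demonstrate; a quick Python sketch (md5 chosen arbitrarily as the digest, the symbol text is made up):

```python
import hashlib
import secrets

# The same over-long symbol, "mangled" in two independent compiler runs.
symbol = "fun!(fun!(fun!(int).Result).Result).Result" * 100

# A hash is a pure function of the input: both runs agree, so object
# files from separately compiled units still link together.
run1 = hashlib.md5(symbol.encode()).hexdigest()
run2 = hashlib.md5(symbol.encode()).hexdigest()
print(run1 == run2)   # True

# A random id differs on every run: a second compilation unit would
# emit a reference the first unit's object file doesn't define.
rand1 = secrets.token_hex(16)
rand2 = secrets.token_hex(16)
print(rand1 == rand2)  # False (with overwhelming probability)
```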

2. having the compiler produce different .o files on different runs
with the same inputs is pretty eyebrow raising, and makes it harder to
debug the compiler, and harder to have a test suite for the compiler.

Yeah, I think hashing is the way to go. Random strings just raise more
issues than they solve.
T
--
In theory, software is implemented according to the design that has been
carefully worked out beforehand. In practice, design documents are written
after the fact to describe the sorry mess that has gone on before.


It would make binary comparison of libraries and executables
difficult, which troubles me, as comparing hashes is a cornerstone
of binary distribution security: you can check that a precompiled
binary is legit by recompiling it under the same conditions and
comparing the two. It would be way harder if random components
were added.


My recollection is that successively compiled binaries are rarely
directly comparable, because of timestamps embedded by compilers.
So I have to ask: are there standard tools to understand enough
of the ELF binary format (or whatever, for the given platform) to
compare everything except the timestamps?


I just looked at an ELF format document, and didn't see any
reference to timestamps. Maybe I'm confusing that with
compression formats like what gzip produces.


I like your approach. As I said earlier, it would be best if we
can prevent the generation of long symbols in the first place,
because that would improve compilation times significantly.
Walter's PR slows down compilation by 25-40% according to my
tests. I expect that compilation would be faster if the whole
process were skipped altogether.

As I said earlier, it would be best if we can prevent the
generation of long symbols in the first place, because that
would improve compilation times significantly.

From what I've observed, generating the long symbol name itself
is fast. If we avoid the deep type hierarchy, then I think indeed
you can expect compile time improvement.

Walter's PR slows down compilation by 25-40% according to my
tests. I expect that compilation would be faster if the whole
process were skipped altogether.

MD5 hashing slowed down builds by a few percent for Weka (note:
LDC's machine codegen is slower than DMD's, so
percentage-wise...), which can then be compensated for using
PGO ;-) /+ <-- shameless PGO plug +/


IIUC, your scheme works like this:
1. DMDFE creates a mangled symbol name.
2. Create an MD5 hash of the symbol and use the hash instead of
the full name.
If a minimal change in Georgi's almost trivial program w.r.t. LoC
(compared to e.g. Weka's huge codebase) increases compilation
time from 1.5sec to 27sec, I can't imagine how much slower it
would be for a larger project.
We should cure the root cause. Generating uselessly large symbols
(even if hashed after that) is part of that problem, so I think
it should never be done once they start growing beyond e.g. 500
bytes.
The solution that Steven showed is exactly what the compiler
should do. Another reason why the compiler should do it is
because often voldemort types capture outer context (local
variables, alias parameters, delegates, etc.), which makes it
very hard for the user to manually extract the voldemort type out
of the function.


I see two separate issues that I think should be handled
independently:
1) Exponential growth of the symbol name with voldemort types.
I like Steven's solution where the compiler lowers the struct
outside of the method.
2) Long symbol names in general, which could arise even without
voldemort types involved, especially with chaining multiple
algorithms.
I like Johan Engelen's solution in LDC for symbols longer than a
threshold. For symbols shorter than the threshold, I think
Walter's compression algorithm could be used to gain 40-60%
reduction and still retain the full type information.
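The threshold approach in 2) is simple to sketch in Python (the threshold value and the `_hashed_` prefix are made up; the thread floats thresholds around 500-1000 bytes):

```python
import hashlib

THRESHOLD = 1000  # hypothetical cut-off

def emit(mangled):
    if len(mangled) <= THRESHOLD:
        return mangled  # short symbols keep full type information
    # Long symbols collapse to a fixed-size, reproducible digest.
    return "_hashed_" + hashlib.md5(mangled.encode()).hexdigest()

short = "_D3one3foo6ResultZ"
huge = "_D" + "6Result" * 5000 + "Z"

print(emit(short) == short)       # True: left untouched
print(len(emit(huge)))            # 40, no matter how big the input
print(emit(huge) == emit(huge))   # True: reproducible across runs
```

Only the pathological symbols pay the hashing cost; everything else stays debuggable as before.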

1) Exponential growth of symbol name with voldemort types.
I like Steven's solution where the compiler lowers the struct
outside of the method.

Was talking to Walter on the phone and he just had one of those
great ideas: encode in the function mangle that it returns
"auto". I thought that was fantastic. Would that work? -- Andrei

This is a very interesting idea. I see one problem though: the
real issue is not just the return type but also the types of the
input parameters to a function. So even if the return type of a
method is encoded as "auto", the moment the result of that method
is passed as a parameter to another template method, the long
type name will resurface. I do not think the type of the input
parameters could be encoded as "auto", since the different
instances of a template method would clash in names.
In essence, the problem that should be solved is the long names
of types.


I keep having it go through my head: if we had to hash the
result, I'd prefer to hash the source (minus comments &
whitespace, and after replacements have been done) to come up
with a reproducible value. This of course is if long names or
compression won't do the job.

Was talking to Walter on the phone and he just had one of
those great ideas: encode in the function mangle that it
returns "auto". I thought that was fantastic. Would that work?
-- Andrei

Equivalent to not mangling return type at all. But this leaves
you with 2^^n growth, still exponential. Also is there any
evidence that compression is faster than hashing?

Would it be practical to use type-erasure in the middle of a call
chain? It would cut 2^^n right at the knees at the possibly
negligible runtime cost.
It would impose a burden on the client (as I don't think a
compiler should do it on its own - or even if it could), but if
it's really Pareto distributed then it shouldn't be that
difficult to find the hot spots.

He said that that won't happen any longer, the growth was
because of the return type. Is that correct?

https://issues.dlang.org/show_bug.cgi?id=15831#c4
Looks like the growth is due to the fact that the voldemort type
is in the scope of a function, and the function is fun!(T).fun(T)
- hence each level multiplies by 2. And the return type is already
not part of the mangled name - that would cause a circular
dependency: you would need the voldemort type's mangling to
generate it.
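That per-level doubling is easy to check with a toy model (Python; the name shape is a stand-in for the real mangle):

```python
# Each level embeds the previous Voldemort type twice: once in the
# template instantiation fun!(T) and once in the parameter list fun(T).
def next_level(t):
    return f"fun!({t}).fun({t}).Result"

t = "int"
lengths = []
for _ in range(8):
    t = next_level(t)
    lengths.append(len(t))

print(lengths)
print(lengths[-1] / lengths[-2])  # approaches 2: length grows ~2^n
```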


Just to make sure I understand: do you agree or disagree that there's no
more exponential growth if we encode "auto return" in the mangling? Thx!
-- Andrei


Not sure if someone has definitively answered before, but no, it
does not. I think this would shrink the growth from 3^n to 2^n.
The exponential growth happens because of the repetition of the
template types.
If you look at the example in the bug report, there are no return
types anywhere. They exist in the mangled names, but are not
relevant to the FQN or symbol resolution. The return type only
plays a factor in preventing linking against other files that
expect a certain return type. In fact, this may be a reason NOT
to shortcut the return mangle (if you had compiled against foo
one day, and foo's internal voldemort type changed the next, it
would still link, even though the type may have changed).
Indeed, D does not overload based on return type, ever.
-Steve

In fact, this may be a reason to NOT shortcut the return mangle
(if you had compiled against foo one day, and foo's internal voldemort
type changes the next, it would still link, even though the type may
have changed).

Actually, this is not true. It will only fail to link if the name of the
internal type changes, or you return some other type :(
-Steve


To clarify, I think this is still a good idea, but only if the type is
defined internally (no reason to repeat the entire function signature of
the function you are defining). But if you specify auto return and
return int, it should still be mangled as returning int.
-Steve

It does factor into type matching when a function pointer is used, for
example.

Yes, that may be the only time the return type is factored in, but
practically speaking, a function overload set that overloads solely on
return type is not directly callable (you will get an ambiguity error),
so there is little point in creating such a situation.
-Steve

Equivalent to not mangling the return type at all. But this leaves
you with 2^^n growth, still exponential. Also, is there any
evidence that compression is faster than hashing?

Depends on implementation and algorithm. However, even the
weakest compression settings can yield huge initial compression
benefits. In normal text a reduction of 2:1 is expected. The
benefit is that compressed output still contains all its
information, while hashing reduces the size to a fixed length
but throws away all that information, keeping only a unique
identity.
LZO is a compressor/decompressor that prioritizes speed and can
be used in real-time applications for very little cost to add
compression to any application. I tinkered with some of its
options a while back and it did okay, not as good as zlib, but
that's to be expected. Using zlib on its fastest
setting will likely yield good results too.
http://www.oberhumer.com/opensource/lzo/

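The trade-off described above is easy to see concretely. This is a sketch in Python (not from the thread; the repetitive input string is made up to resemble the Voldemort names discussed later): compression on its fastest setting shrinks a repetitive symbol while staying lossless, whereas hashing always yields a fixed size but discards the contents.

```python
import hashlib
import zlib

# A highly repetitive mangled-style name, shaped like the Voldemort
# examples in this thread (hypothetical, for illustration only).
name = ("_D3mod__T3fooT" + "S__T3fooTiZ3fooFiZ4Vold" * 8 + "Z4Vold").encode()

compressed = zlib.compress(name, level=1)    # fastest zlib setting
digest = hashlib.sha256(name).hexdigest()    # always 64 hex chars

print(len(name), len(compressed), len(digest))

# Compression is lossless: the original name can be recovered.
assert zlib.decompress(compressed) == name
```

The point made above holds: the compressed form still identifies the symbol uniquely and reversibly, while the hash is a fixed-size fingerprint only.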

The underlying symbols are still growing exponentially. Nobody
has the RAM for that, regardless of what size the resultant
binary is.
Compressing the symbol names is a bandage. The compiler needs a
new kidney.


My understanding is that the encoding "auto" return in the mangling
makes any exponential growth disappear. Is that the case? -- Andrei


No:
auto foo(T)(T x)
{
    struct Vold { T payload; }
    return Vold(x);
}
foo(5)
_D3mod10__T3fooTiZ3fooFNaNbNiNfiZS3mod10__T3fooTiZ3fooFiZ4Vold
foo(5).foo
_D3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFNaNbNiNfS3mod10__T3fooTiZ3fooFiZ4VoldZS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4Vold
foo(5).foo.foo
_D3mod94__T3fooTS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4VoldZ3fooFNaNbNiNfS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4VoldZS3mod94__T3fooTS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4VoldZ3fooFS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4VoldZ4Vold
Lengths: 62 | 174 | 398
Just dropping the return types to a single character ($) shrinks
the names, but it doesn't solve the core of the problem. Still
exponential:
foo(5)
_D3mod1010__T3fooTiZ3fooFNaNbNiNfiZ($)
foo(5).foo
_D3mod38__T3fooT(S3mod10__T3fooTiZ3fooFiZ4Vold)Z3fooFNaNbNiNf(S3mod10__T3fooTiZ3fooFiZ4Vold)Z{$}
foo(5).foo.foo
_D3mod94__T3fooT{S3mod38__T3fooT(S3mod10__T3fooTiZ3fooFiZ4Vold)Z3fooF(S3mod10__T3fooTiZ3fooFiZ4Vold)Z4Vold}Z3fooFNaNbNiNf{S3mod38__T3fooT(S3mod10__T3fooTiZ3fooFiZ4Vold)Z3fooF(S3mod10__T3fooTiZ3fooFiZ4Vold)Z4Vold}Z$
Lengths: 36 | 90 | 202
Note: the part inside () is the return type of the first. The
part in {} is the return type of the second. I left those in for
illustrative purposes.
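The recurrence behind those lengths can be reproduced with a toy model (a Python sketch; the prefixes are simplified stand-ins for the real mangling above, so the exact lengths are hypothetical): the Voldemort type's name embeds its argument type twice (template argument and function parameter), so it roughly doubles per level, and eliding the return type shrinks the constants but leaves the growth exponential.

```python
# Toy model of the mangled names in the example above.

def vold(n):
    """Name of the Voldemort type after n applications of foo, from int."""
    t = "i"
    for _ in range(n):
        # The struct's name embeds the argument type twice, so it doubles.
        t = f"S__T3fooT{t}Z3fooF{t}Z4Vold"
    return t

def symbol(n, mangle_return=True):
    """Full function symbol; optionally elide the return type as '$'."""
    arg = vold(n - 1) if n > 1 else "i"
    ret = vold(n) if mangle_return else "$"
    return f"_D3mod__T3fooT{arg}Z3fooFNaNbNiNf{arg}Z{ret}"

for n in (1, 2, 3, 4):
    print(n, len(symbol(n, True)), len(symbol(n, False)))
```

Running this shows the same pattern as the measured lengths: eliding the return type roughly halves each name, but both series keep doubling at every level of nesting.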

Walter's PR slows down the compilation by
25-40% according to my tests. I expect that compilation would be faster
if the whole process were skipped altogether.

The compressor only kicks in if the string is longer than 64 bytes. I set it
pretty low in order to have it kick in often, and hopefully flush out any bugs
in it.
For a production version, the minimum size should be set substantially larger,
based on testing to provide a reasonable balance.
That should speed it up a lot. Also, the algorithm is a bit naively
implemented; I bet it could be sped up substantially.
Hashing isn't algorithmically cheap, either.


From the measurements shown the work seems Pareto distributed - the
major time spender is the few long symbols, not the many short symbols.
There are a few long symbols that dominate everything else by an order
of magnitude. The one percenters! :o) -- Andrei

I also don't think compression should be a performance issue. I
heard that some compression algorithms are as fast as the data
comes in, so fast enough for our purpose.
The reason I chose hashing is simplicity and a guaranteed small
symbol size. I wasn't sure whether a 5MB symbol would compress to
a reasonable size. I wanted symbols to be readable; so less than,
say, 80 chars.
For people interested in finding out whether mangling changes
will improve compile speed performance: forget about linking and
mangling, and just assign symbol names like "a", "b", "c", etc.,
and see what happens.

We definitely need to fix Voldemort types. Walter and I bounced
a few ideas during DConf.
1. Do the cleartext/compress/hash troika everywhere (currently
we do it on Windows). That should help in several places.
2. For Voldemort types, simply generate a GUID-style random
string of e.g. 64 bytes. Currently the name of the type is
composed out of its full scope, with some
exponentially-increasing repetition of substrings. That would
compress well, but it's just a lot of work to produce and then
compress the name. A random string is cheap to produce and
adequate for Voldemort types (you don't care for their name
anyway... Voldemort... get it?).
I very much advocate slapping a 64-character random string on all
Voldemort returns and calling it a day. I bet Liran's code
will get a lot quicker to build and smaller to boot.
Andrei
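Andrei's proposal amounts to something like the following (a Python sketch with hypothetical names; a real implementation would live in the compiler's mangler). One caveat worth noting: a purely random name differs on every compilation, which hurts reproducible builds and cross-object linking, so a content hash of the full scoped name may be the safer variant of the same idea.

```python
import hashlib
import secrets

def voldemort_name_random():
    # Andrei's suggestion: a GUID-style 64-char random string.
    # Cheap to produce, but different on every compilation.
    return "__vold_" + secrets.token_hex(32)   # 64 hex chars

def voldemort_name_hashed(full_scope_name: str):
    # Deterministic variant: hash the (possibly huge) scoped name down
    # to a fixed 64 chars, so separate compilations agree on the symbol.
    return "__vold_" + hashlib.sha256(full_scope_name.encode()).hexdigest()

huge = "3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3foo" * 100
print(len(voldemort_name_hashed(huge)))  # fixed length, however deep the nesting
```

Either way the symbol length becomes a constant, which is the whole point: nobody cares what a Voldemort type is called.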

I have an idea for reproducible, less-than-exponential-time
mangling, although I don't know how actionable it is.
Leave mangling as is, but pretend that you are mangling something
different, for example when the input is
foo!(boo!(bar!(baz!(int))), bar!(baz!(int)), baz!(int))
pretend that you are mangling
foo!(boo!(bar!(baz!(int))), #1, #2)
Where #1 and #2 are special symbols that refer to stuff that was
**already in the name**, particularly:
#1: bar!(baz!(int))
#2: baz!(int)
Now, this is a reduction of the exponential computation time
only if you can detect repetitions before you go inside them; for
example, in the case of #1, detect the existence of bar!(baz!(int))
without looking inside it and seeing baz!(int). Do you think it
could be done?
You would also need a deterministic way to assign those symbols
#1, #2, #3... to the stuff in the name.
Compression can also be done at the end if necessary to reduce
the polynomial growth.
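The scheme above is essentially the substitution mechanism the Itanium C++ ABI uses for its mangling (S_, S0_, S1_, ...). A sketch in Python (hypothetical representation: instantiations as nested tuples; the #k numbering here counts every emitted sub-term, so it differs slightly from the example's #1/#2): walk the instantiation left to right and, whenever a sub-instantiation has been seen before, emit a back-reference instead of recursing into it.

```python
def mangle(node):
    """Mangle a nested template instantiation with back-references.

    node is either a plain type name (str) or a tuple (name, args).
    Repetition is detected *before* descending: a dict lookup on the
    whole subtree replaces it with #k, so the repeated subtree is
    never traversed again.
    """
    seen, out = {}, []

    def walk(n):
        if n in seen:                  # already emitted: cheap lookup,
            out.append(f"#{seen[n]}")  # no recursion into the subtree
            return
        seen[n] = len(seen)            # number terms in first-emitted order
        if isinstance(n, str):
            out.append(n)
        else:
            name, args = n
            out.append(name + "!(")
            for i, a in enumerate(args):
                if i:
                    out.append(", ")
                walk(a)
            out.append(")")

    walk(node)
    return "".join(out)

baz = ("baz", ("int",))
bar = ("bar", (baz,))
boo = ("boo", (bar,))
print(mangle(("foo", (boo, bar, baz))))
# → foo!(boo!(bar!(baz!(int))), #2, #3)
```

The "detect before descending" requirement is met here because the whole subtree is the dictionary key; with hash-consed AST nodes in a compiler, that lookup is O(1) per node rather than proportional to the subtree's printed size.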


So if I understand correctly, you tried implementing something
like this, but it didn't help much even with size of mangled
names. Are you sure you were testing on the pathological case
(exponential stuff), rather than a bad one? Assuming your
experiment is correct, something interesting is happening, and I
will be observing you guys finding out what it is. :)


This is what LZW compression does, except that LZW does it in general rather
than just for identifiers.
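For illustration, here is a minimal LZW encoder in Python (a sketch, not the compiler's compressor): the dictionary starts with single characters and grows with every new phrase, so repeated substrings of the instantiation collapse into single codes, much like the #1/#2 references above but without knowing anything about identifiers.

```python
def lzw_encode(s):
    """Minimal LZW: emit dictionary codes; repeats become single codes."""
    dictionary = {chr(i): i for i in range(256)}  # start with all bytes
    w, codes = "", []
    for c in s:
        if w + c in dictionary:
            w += c                                # extend current phrase
        else:
            codes.append(dictionary[w])           # emit longest known phrase
            dictionary[w + c] = len(dictionary)   # learn the new phrase
            w = c
    if w:
        codes.append(dictionary[w])
    return codes

name = "foo!(boo!(bar!(baz!(int))), bar!(baz!(int)), baz!(int))"
codes = lzw_encode(name)
print(len(name), len(codes))  # repeated substrings collapse into fewer codes
```

The difference the reply below points out still applies: LZW must stream over the full expanded input, so it reduces the size but not the exponential work of producing the name in the first place.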


That's true, but a general compression algorithm requires a
stream of chars as input, so to compress something of exponential
size you still need exponential runtime in order to iterate
(while compressing) on the exponential input.
The idea was to avoid such exponential iteration in the first
place by doing some sort of caching.
My thinking is that if you only reduce size but keep exponential
runtime, you are going to have to revisit the issue in a few
months anyway when people start using things you enabled more
heavily.
Anyway I wish you good luck with all this!

Yep. chain uses voldemort type, map does not.
Georgi: try copying chain implementation into your local file,
but instead of having a static internal struct, move it to a
module-level struct (you probably have to repeat the template
parameters, but not the constraints). See if it produces a
similar slimming effect.
-Steve

Yes,
making a local copy of chain and moving the structure outside of
the method solved the problem and reduced the code size.
The stripped size even shrank compared to the version that did
not experience the huge increase. Stripped: 7.4MB -> 5.5MB.
Should we actually change the implementation in Phobos to be like
this?
In the company I work for we are very sensitive to code size,
so anything that can be done about this in Phobos will help with
adoption.
I am still curious though why the code size increased so
dramatically with such a small change. Both versions return a
result from chain and actually the one that does not experience
the increase is more complex (includes the one that does).
In my actual code I have multiple calls to chain on top of the
result of this and the code size does not increase dramatically.
I am also concerned by the fact that ldc on mac just could not
compile the source which produced the big binary (entering an
infinite loop).
Thanks

Yep. chain uses voldemort type, map does not.
Georgi: try copying chain implementation into your local file, but
instead of having a static internal struct, move it to a module-level
struct (you probably have to repeat the template parameters, but not
the constraints). See if it produces a similar slimming effect.

Yes,
Making a local copy of chain and moving the structure outside of the
method solved the problem and reduced the code size.

Thanks, that confirms the bug reports are for the same issue.

The stripped size even shrank compared to the version that did not
experience the huge increase. Stripped: 7.4MB -> 5.5MB.
Should we actually change the implementation in Phobos to be like this?

I'd much prefer we fix the compiler to handle voldemort types in a more
sane way rather than try and fix the code to deal with compiler
deficiencies. However, if this isn't something that's easily done (or
done within a decent time frame), we may look at that solution.

In the company I work for we are very sensitive to code size, so
anything that can be done about this in Phobos will help with adoption.
I am still curious though why the code size increased so dramatically
with such a small change. Both versions return a result from chain and
actually the one that does not experience the increase is more complex
(includes the one that does).

Each call to a voldemort wrapping function (one that wraps one type into
another "voldemort" type, or internal struct) increases the symbol size
at a rate of 3^n. Non-voldemort types do not suffer from this, because
the symbol size increases at just n.

In my actual code I have multiple calls to chain on top of the result of
this and the code size does not increase dramatically.
I am also concerned by the fact that ldc on mac just could not compile
the source which produced the big binary (entering an infinite loop).

I suspect the issue is with the linker. In my toy experiments, the
compiler works just fine and produces an object file. It's the linker
that has issues parsing those humongous symbols.
-Steve


Thanks very much for helping with these measurements! -- Andrei

Just some more numbers:
On the version with the local chain, which is 5.5MB:
# strings test.local_choose|wc -l
4170
# strings test.local_choose|wc -c
5194012
Which means that of the 5.5MB total binary size, 4.95MB is
strings.
Inspecting the content of the strings, over 98% are symbol names.
Is there a way to not have all the symbol names as strings even
after the binary was stripped?
I think that some compression for the symbol names is a good idea.
I agree though that O(n) or better for the voldemort types is
also needed.