When compiling the attached program with g++ 3.3, the compiler takes about 80 MB
of main memory on Intel/x86. When compiling it with g++ 3.4, the compiler takes
more than 400 MB and eventually crashes (probably because the Linux kernel kills
the process when the system runs out of memory).
Since the standard libraries differ between 3.3 and 3.4, I am providing two
preprocessed files. (The testcase is random_test.cpp from the Boost random number library.)
g++ -v rt-3.3.ii
[...]
Configured with: ../gcc-3.3/configure --prefix=/usr/local --enable-threads
--enable-shared
Thread model: posix
gcc version 3.3
(ok)
/opt/exp/gcc-3.4/bin/g++ -v rt-3.4.ii
[...]
g++: Internal error: Killed (program cc1plus)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

I think the problem is that 3.4 is not able to collect garbage while instantiating the
templates. Calling ggc_collect while instantiating the templates, at the right level, I
get
{GC 95280k -> 45466k}
which shows that it gets rid of about half of the memory, but the compiler crashes right after doing that.
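For reference, here is a minimal sketch of the kind of change I experimented with; the work-list loop and the instantiate_decl call are simplified assumptions, not the actual cp/pt.c code:

/* Simplified illustration only, not the real instantiate_pending_templates.
   The idea is to run the collector between instantiations, once the
   just-finished decl is reachable from a GC root again.  */
static void
instantiate_pending_templates_sketch (void)
{
  while (pending_templates)              /* work list of TREE_LISTs */
    {
      tree inst = TREE_VALUE (pending_templates);
      pending_templates = TREE_CHAIN (pending_templates);

      instantiate_decl (inst, /*defer_ok=*/0);

      /* Collect at "the right level": only when nothing half-built is
         live solely on the C stack.  */
      ggc_collect ();
    }
}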

Created attachment 5425[details]
broken patch
This patch is broken, but really it is not the patch itself that is broken: rather,
the C++ front end keeps references to trees only on the stack or in registers, with no
references in variables visible to the GC.
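In other words, something like the following pattern (an assumed illustration, not the actual front-end code) is what breaks: GCC's collector only scans registered roots such as GTY(())-marked statics, never the C stack or registers.

/* The collector can see this root...  */
static GTY(()) tree rooted_list;

static void
build_something (void)
{
  /* ...but not this local: T is referenced only from the stack.  */
  tree t = make_node (TREE_LIST);

  /* If a collection happens here, T looks dead and may be freed...  */
  ggc_collect ();

  /* ...so attaching it to a root afterwards is too late.  */
  rooted_list = tree_cons (NULL_TREE, t, rooted_list);
}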

Subject: Re: [3.4 Regression] memory consumption for heavy template instantiations tripled since 3.3
>
> ------- Additional Comments From pinskia at gcc dot gnu dot org 2003-12-19 08:43 -------
> It goes up to about >480MB on powerpc-apple-darwin, then drops to around 250MB.
I can get it down to about 30MB at -O0; for unit-at-a-time, however, we still need
250MB, which is the size of all the templates instantiated together. I don't
think we can reduce this any further for 3.4, and it is no longer a regression;
in the future we may make trees more compact. This testcase also has
interesting runtime properties; Mark may want to look at the
for_each_template_param_r problem.
Honza

Subject: Re: [3.4 Regression] memory consumption for heavy template instantiations tripled since 3.3
For the record, here is a profile of the run. A lot of the overhead is coming
from quadratic behaviour in templates and friends.
Honza

The only memory leak I had was from shorten_branches in final.c, which I have a fix for
now, but that does account for the 60M difference between GC and real allocated
memory (even though I suspect there are still a large number of pages allocated because
live GC data is spread all over them). Also malloc only accounts for 20M.

Subject: Re: [3.4/3.5 Regression] memory consumption for heavy template instantiations tripled since 3.3
>
> ------- Additional Comments From pinskia at gcc dot gnu dot org 2004-01-27 16:35 -------
> The only memory leak I had was from shorten_branches in final.c, which I have a fix for
> now, but that does account for the 60M difference between GC and real allocated
> memory (even though I suspect there are still a large number of pages allocated because
> live GC data is spread all over them). Also malloc only accounts for 20M.
I have additional patches in testing that cut this to roughly 118MB.
There is still room for improvement, as we really should be releasing
memory during the compilation stage, which we don't (the parsed
program after template instantiation is slightly over 60MB of GGC memory).
We also burn a lot of unnecessary memory in the C++ parser during name
lookup; I am probably not going to address this, as I simply don't
understand the issue at all.
Honza

Here are the results for -O0, now that PR 18683 is fixed:
cp/lex.c:716 (copy_decl) 1087604: 0.3% 0: 0.0% 5906492:10.6% 0: 0.0% 56404
cp/pt.c:3978 (coerce_template_parms) 41586524: 9.6% 0: 0.0% 136540: 0.2% 3865680: 7.4% 1138236
Though we do create a lot here:
cp/parser.c:278 (cp_lexer_new_main) 0: 0.0% 22585856:36.1% 0: 0.0% 6332928:12.1% 5
That is mostly the ggc_realloc of the buffer holding all the tokens; maybe there is a better
way of allocating this buffer, as it seems we create a lot of overhead because of it.
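The growth pattern behind those numbers looks roughly like this (an assumed sketch; the names are illustrative, not the actual parser.c code):

/* The token buffer lives in GC memory and is doubled with ggc_realloc
   whenever it fills up; each doubling may copy the whole buffer and
   leave the old power-of-two allocation behind as garbage.  */
static cp_token *buffer;
static size_t buffer_length = CP_LEXER_BUFFER_SIZE;
static size_t n_tokens;

static void
add_token (cp_token token)
{
  if (n_tokens == buffer_length)
    {
      buffer_length *= 2;
      buffer = ggc_realloc (buffer, buffer_length * sizeof (cp_token));
    }
  buffer[n_tokens++] = token;
}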

The initial CP lexer buffer size is 10000:
#define CP_LEXER_BUFFER_SIZE 10000
That came in with the lex-all-ahead patch from Matt and Zack,
on 2004-09-20 (parser.c rev. 1.250 for the CVS history diggers)
but it seems a bit low to me if you're going to lex the whole
file up front. I would not be surprised if average C++ code
with lots of templates has several hundred thousand tokens... Let
me see:
- preprocessed source for generate.ii from PR 8361, blank and
pound lines stripped: 36200 lines
- an average of 7 tokens per line in the first 500 lines; let's
assume that's a reasonable average for the whole file (it's
easy to instrument g++ to get the exact number of tokens, if
you want more accurate numbers ;-)
That makes it >250,000 tokens for this file.
Since we double the buffer each time, we have:
10,000 + 20,000 + 40,000 + 80,000 + 160,000 + 320,000 = 630,000
That is the number of tokens we allocate room for in total, with no
ggc_collect in the middle. With ggc-page, which has power-of-2
based page sizes, it's safe to assume that each previous buffer
is too small to be grown in place, so a full new buffer is allocated
and the old one is memcpy-ed into it. With checking off,
we ggc_free the old buffer, but with checking enabled we don't,
so after the whole lexing process finishes, we keep around
a buffer of ~380,000*sizeof(cp_token); that's roughly 10MB
of memory we can't reclaim until the first ggc_collect call.
Maybe the buffer should not be in GC memory at all? We know the
exact lifetime of the buffer, and as far as I can tell we never
ggc_collect while it is live. According to the comments for
cp_lexer, "Tokens are never added to the cp_lexer after it is
created." So it may be cheaper to have the buffer xmalloc-ed,
and memcpy-ed to a buffer in GC space just before saving it
in the new cp_lexer object.
So, two suggestions for a person who wants to make g++ a little
faster here:
- make CP_LEXER_BUFFER_SIZE larger. To make it use pages more
efficiently, look for some ratio of pagesize/(sizeof (cp_token));
- see whether the buffer in parser.c:cp_lexer_new_main can be moved out of GC
space as suggested above (a rough sketch of that idea follows below).
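Here is a rough sketch of the second suggestion (assumed code, not a tested patch; next_token is a stand-in for the real lexing call, the rest uses the existing xmalloc/xrealloc/ggc_alloc helpers): lex into plain heap memory, then move the finished token array into GC space in one shot when the cp_lexer is created.

static cp_token *
lex_all_tokens (size_t *n_tokens_out)
{
  size_t alloc = CP_LEXER_BUFFER_SIZE;
  size_t n = 0;
  cp_token *buf = xmalloc (alloc * sizeof (cp_token));
  cp_token *gc_buf;
  cp_token token;

  /* Grow an ordinary heap buffer while lexing the whole file.  */
  while (next_token (&token))
    {
      if (n == alloc)
        {
          alloc *= 2;
          buf = xrealloc (buf, alloc * sizeof (cp_token));
        }
      buf[n++] = token;
    }

  /* Copy exactly what we need into GC memory and free the scratch
     buffer; only the GC-visible copy is kept by the new cp_lexer.  */
  gc_buf = ggc_alloc (n * sizeof (cp_token));
  memcpy (gc_buf, buf, n * sizeof (cp_token));
  free (buf);

  *n_tokens_out = n;
  return gc_buf;
}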

Subject: Re: memory consumption for heavy template instantiations tripled since 3.3
"steven at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:
[...]
| Maybe the buffer should not be in GC memory at all? We know the
| exact lifetime of the buffer, and as far as I can tell we never
| ggc_collect while it is live. According to the comments for
| cp_lexer, "Tokens are never added to the cp_lexer after it is
| created." So it may be cheaper to have the buffer xmalloc-ed,
| and memcpy-ed to a buffer in GC space just before saving it
| in the new cp_lexer object.
Your analysis makes sense to me. I never quite understood the
addiction to GC-allocated memory throughout the compiler.
-- Gaby

(In reply to comment #36)
> The initial CP lexer buffer size is 10000:
The same amount of garbage is also generated for PR 8361.
Also note I could not compile this source again because of the use of long double, which causes an ICE
on ppc-darwin, but that has been fixed already.

Subject: Re: memory consumption for heavy template instantiations tripled since 3.3
"pinskia at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:
| cp/tree.c:827 (ovl_cons) 11464712: 3.2% 0: 0.0% 660240: 1.4% 1732136: 5.2% 433034
|
| Hmm, OVERLOAD trees take 3% of the garbage, which seems too big,
| though I don't know how long the OVERLOAD trees are; I might add
| something to count that.
It is not uncommon to have large overload sets in C++ -- that is what
people do when they discover that they can overload in the literal
sense ;-)
-- Gaby
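For what it's worth, the reason large overload sets show up under ovl_cons is that each function added to a set allocates one OVERLOAD node in GC memory; a simplified illustration (not the real name-lookup code):

/* A set of N overloaded functions ends up as a chain of N OVERLOAD
   nodes, one GC allocation per declaration.  */
static tree
add_to_overload_set (tree new_fn, tree old_set)
{
  return ovl_cons (new_fn, old_set);
}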