Call me a ricer, but I did give gcc-4.8.0-rc1 a try.
I had enabled lto with gcc-4.7 and wanted to see how it has developed. I'm pleased to say that things got a lot better, and I would now consider it no more insane than -O3. The reason is mainly that -O3 causes runtime breakage, whereas lto usually fails at build time. I'm doing this on my testing system, so unfortunately it won't get as much everyday testing.
My main interest in lto is space savings: as an embedded developer, if you have to push an update over GPRS you're thankful for every byte saved.

I did make a backup of my /var/db/pkg so I can have a close look at the sizes, but that has to wait until next weekend. It won't tell me the space savings of lto either, because I'm mixing things up a bit: I'm comparing stuff compiled with gcc-4.7 (mostly with lto) against stuff compiled with gcc-4.8 and lto. But this is just a quick and dirty test.

If you want to try it yourself, copy gcc-4.8.0_alpha20121216.ebuild from the toolchain overlay to gcc-4.8.0_rc20130316.ebuild in your local overlay.

I could imagine very bad results for packages not obeying LDFLAGS, because this can result in a completely unoptimized binary.

Quote:

I could imagine very bad results for packages not obeying LDFLAGS, because this can result in a completely unoptimized binary.

Are you sure? My understanding was that .o files are optimized with -flto anyway, and that only a second optimization takes place at link time. Has this changed in gcc-4.8? That would be bad...

It didn't change in gcc-4.8. From the gcc manpage:

Quote:

Additionally, the optimization flags used to compile individual files are not necessarily related to those used at link time.

However, the example they provide there doesn't optimize during the compile phase. Then again, some optimizations (e.g. inlining) are quite pointless during compilation when you are going to do lto anyway, so ideally you wouldn't optimize during compilation at all and would do it all at link time; but if LDFLAGS are ignored, you end up without any optimization.

Quote:

Additionally, the optimization flags used to compile individual files are not necessarily related to those used at link time.

I read this as: with -c -flto the optimization parameters (i.e. C{XX,}FLAGS) are applied, and they are applied again for linking (i.e. LDFLAGS). So if you specify the same optimizations in C{XX,}FLAGS as in LDFLAGS you might get redundant optimizations (inlining decisions might change, as you mentioned), but even if LDFLAGS are ignored the result should not be worse than without -flto.

I've done some measurements compiling wget. Wget is rather small (its unoptimized binary is not even 500k), so the results may differ for other sources.

As a reference, 100% is compiled with gcc-4.7.2 without optimization, which is 471192 bytes and took 4.62 (user) seconds to build.

First of all, "mv", you're right: if you don't optimize with lto during linking, you're still left with the other optimizations you enabled during compilation. The compile-time overhead for adding the lto information is about 10%.
What confuses me more is that it makes a size difference of about 6% whether I compile and link with "-Os -flto" or omit "-Os" at compile time. How does that conform to the man page?

For -O2 the size stayed about the same between 4.7 and 4.8; for -O3, "-O2 -flto" and any combination of {-O0,-Os} and {,-flto} it shrank by about 1%, and for "-O3 -flto" it shrank by 2%.

Simply optimizing for size gives about 74% of the reference size (349808 bytes) at 207% of the compile time.
To get down to the smallest size of 70% (332784 bytes) I had to compile and link with "-Os -flto", which took 345% of the original time. That's 28% more time than -O3 would take.

So all in all there seem to be improvements with gcc-4.8. There are no behaviour changes in lto compared to 4.7, but the quality has improved. If there is a size improvement of one percent on average, that would be great (it seems small, but don't expect wonders from updating a mature compiler).

Quote:

What I'm more confused about is that it makes a size difference of about 6% whether I compile and link with "-Os -flto" or omit "-Os" at compile time. How does that conform to the man page?

There are two possible explanations, but I don't know which is true:
The first explanation is trivial: the optimization flags used for compiling are supposed to be stored in the .o files and re-used in the linking phase. At least that was the original plan, but the implementation was buggy or incomplete; only some flags seemed to be stored. It might be that this has now been fixed (I think it was one of the causes of many lto issues).
The other explanation is less trivial: it might be that some optimization used for -Os is not idempotent, i.e. if it is applied to already-optimized code it can optimize even more (or maybe produce worse code). Hence, if the code was already optimized during compilation and is optimized again, you get a different result than if it is optimized only once.

You could try "-fno-fat-lto-objects". This makes sure that no compiled binary code is contained in the object files at all, and it should speed up the compile process somewhat.

Of course, only an lto-aware linker can then link the object files into an executable binary. If after that applying "-Os" twice results in a different binary size than applying it only at link time, you should ask on the gcc-help mailing list and possibly file a bug ^^

Quote:

You could try "-fno-fat-lto-objects". This makes sure that no compiled binary code is contained in the object files at all :-)

Except for the intermediate code, of course, which is the part being optimized. I would be very surprised if this option changed anything besides the size of the .o files (and perhaps of .a libraries): the linker removes all other binary code anyway in the LTO phase.

Quote:

I would be very surprised if this option would change anything besides the size of .o files (and perhaps of .a libraries): The linker will remove all other binary code anyway in the LTO phase.

This is not the first surprise I had with lto and I'm quite sure it won't be the last either.

The build time goes down dramatically; it's now about the same as plain -Os without lto. But the sizes are exactly the same as in the first lto build, and it's still the case that the resulting binary is smaller when -Os is specified during both compile and link.

But -fno-fat-lto-objects didn't work out of the box: I had to use the "gcc-" wrappers for "ar", "ranlib" and "nm", or else I got tons of unresolved symbols. Same problem when using gold. My binutils are built with --enable-gold and --enable-plugins.

Quote:

This is not the first surprise I had with lto and I'm quite sure it won't be the last either.

Nothing you wrote is surprising, provided you used a program whose build system internally uses .a libraries (which I thought you wouldn't do for a first test). As expected and documented, the option just omits the emission of the processor code, which is not used during lto, to the lto files; the ideal case currently works out of the box only if you link .o files alone. If you also link .a files (or first even generate .a files with ar), you need plugins and the gold linker.
The speed and size difference for gold comes from the fact that with gold the whole program, including the .a files, is optimized, while otherwise the .a data is just linked in with only the first optimization phase applied.
Concerning the explanation I gave earlier, both cases are still possible.

I also activated LTO globally, but with the 201212xx alpha version of GCC 4.8, and for a full kde-desktop. It worked quite well after disabling LTO for about 30-50 packages. I'll try to test it again for the rc.

I got my ebuild for testing from the hardened-development overlay.

I was surprised by the huge speedup; I didn't expect to get lto optimization almost for free.
The wrappers are documented in the gcc man page, so no surprise there.
You do not need the gold linker to link against archives of no-fat lto objects, but the linker needs plugin support; that refers to liblto_plugin.so.
I chose wget as my benchmark because it's a single program in plain C, but not as highly optimized as busybox.

Regarding "-fno-fat-lto-objects": my main concern is that during some builds the optimization flags may be stripped in the final link phase (depending on the toolchain and/or portage, I just don't know enough about it), which would result in a binary with no optimization at all, so I have refrained from using it. An lto-aware toolchain should not use any optimization during the compile phase, but only during the final link phase (btw, did you ever try -flto=4 or similar on a multicore box? It works quite well), saving quite an amount of compilation time. But there are still some issues, as the double-size-optimized binary of firetwister shows; I still think it's a kind of bug ^^

Which AFAIK means, in practice, that you do need the gold linker unless you have written your own linker.

The wiki thinks so:
http://gcc.gnu.org/wiki/LinkTimeOptimization
But I don't think so.
1. Using gold results in a slightly smaller binary; if gold were used implicitly, the binary size should be the same. (I know we've seen some mysteries regarding sizes, though.)
2. Renaming ld.gold still results in a successful link, unless -fuse-ld=gold (a 4.8 feature) is specified.
3.

man gcc wrote:

-fuse-linker-plugin
Enables the use of a linker plugin during link-time optimization. This option relies on plugin support in the linker, which is available in gold or in GNU ld 2.21 or newer.

drhirsch wrote:

Regarding "-fno-fat-lto-objects": My main concern is that during some builds the optimization flags may be stripped during the final link phase ... which will result in a binary with no optimization at all

Don't worry about that. Even the simplest program will fail to link, so this is a great way of detecting bad build scripts.
You can override it on a per-package basis in your no-lto.conf with -ffat-lto-objects.
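For reference, the per-package override could look something like this (the file names follow the no-lto.conf naming used in this thread; the exact flag contents are an example, not anyone's actual config):

```
# /etc/portage/env/no-lto.conf (example contents)
# Re-enable fat objects for packages whose build systems
# mishandle slim LTO objects; alternatively, drop -flto here.
CFLAGS="${CFLAGS} -ffat-lto-objects"
CXXFLAGS="${CXXFLAGS} -ffat-lto-objects"

# /etc/portage/package.env
dev-libs/libaio no-lto.conf
```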

drhirsch wrote:

did you ever try -flto=4 or similar on a multicore box

I'm using -flto=2 during linking. It's only a dual core, and the main problem is that it has only 2 gigs of RAM; since I don't use swap, I frequently get "virtual memory exhausted" errors.

Quote:

I also activated LTO globally, but with the 201212xx alpha version of GCC 4.8, and for a full kde-desktop. It worked quite well after disabling LTO for about 30-50 packages. I'll try to test it again for the rc.

I got my ebuild for testing from the hardened-development overlay.

Having a hardened setup, I used the same source for gcc-4.8 (rc2, now the released version). Did you get the following error after a successful install?

Code:

Failed to set XT_PAX markings -re for:
/var/tmp/portage/sys-devel/gcc-4.8.0/image//usr/libexec/gcc/x86_64-pc-linux-gnu/4.8.0/cc1
Failed to set XT_PAX markings -re for:
/var/tmp/portage/sys-devel/gcc-4.8.0/image//usr/libexec/gcc/x86_64-pc-linux-gnu/4.8.0/cc1plus
Executables may be killed by PaX kernels.

There were 3-4 packages that didn't build with lto, although they worked with gcc-4.7.2-r1. Also, 4 just failed to build: x11-apps/luit, sys-libs/db-4.8.30, net-libs/webkit-gtk-1.10.2-r300, net-wireless/gnome-bluetooth-3.6.1.

Also, I can't build the latest hardened-sources-3.8.4 with genkernel; there is the following error

The gcc from hardened-development didn't compile for me; it died fairly early with some undefined symbol, I forget which. When I omitted all the gentoo/hardened patches it worked.

Concerning the failure of hardened-sources to compile, this is not so surprising: they build a rather complex plugin for gcc, and that plugin interface has probably changed. With new versions of hardened-sources you can configure the kernel so that this plugin is not used; if you switch it off, the kernel should compile. You then have a less hardened kernel, of course.

Code:

*
* LTO support is still experimental and unstable.
* Any bugs resulting from the use of LTO will not be fixed.
*
* If you have issues with packages unable to locate libstdc++.la,
* then try running 'fix_libtool_files.sh' on the old gcc versions.
* You might want to review the GCC upgrade guide when moving between
* major versions (like 4.2 to 4.3):
* http://www.gentoo.org/doc/en/gcc-upgrading.xml
>>> Auto-cleaning packages...

Interesting discussion about such a bleeding-edge technique; I've also been experimenting with it for some weeks.

I was just wondering why none of you had listed the

Code:

dev-libs/libaio no-lto.conf

package in your package.env file, which has caused me lots of trouble.

The tricky thing about it, for me, was that it compiles fine with LTO, but as a consequence my akonadi-server no longer worked, so my kmail stopped working, and I also wasn't able to recompile my mariadb package, which had worked fine until then.
I presume it would be the same if you use mysql for akonadi.

I finished rebuilding my system; lto isn't looking as good as in the first test anymore.
There are only 4 packages that don't build with gcc-4.8 at all, which is really good:

Code:

dev-python/pygobject
kde-base/kopete
sys-apps/paludis
x11-apps/luit

Did anybody compile these with 4.8? Or is this just another lto bug?

I've seen many more problems with libraries that build fine but whose dependent packages then fail with undefined references.
Libreoffice should build with lto (EXTRA_ECONF="--enable-lto"), but it needs 6 gigs of RAM (RAM, not swap!). I wasn't able to build the qt and intel drivers for lack of RAM either.

Randy Andy, I can confirm your problem with libaio, though I probably wouldn't have noticed it for some time on that machine.

So if you compare the size of the first and the last list, you see that the improvements don't look like such a big step forward anymore. One could compile the last list with gcc-4.7, so only the common failures would be left. However, my aim wasn't to get the most packages optimized with lto, but to see whether it's prime-time ready, and frankly: NO, it still isn't, though I think it's still better than -O3.
Having more packages that compile fine but then let their dependencies fail at link time (like libXfixes, libasyncns, coin, orbit, libxdg-basedir, libical, dotconf), or even fail at runtime (like libaio, the only one so far, so better than 4.7), is worse than simply having a package fail to link. Still, I think there should be something like an lto overlay for the brave testers.
As for text size reduction, there have been some improvements. Unfortunately, I realized I forgot to save the size of /usr/lib64/debug halfway through the merge, so the size of /usr/lib64 isn't accurate. Comparing the space savings of those directories for the second half of the merge, however, the savings seem to be evenly distributed, so the numbers shouldn't be far off.

/usr/bin 90%
/usr/lib64/ 94%
/usr/libexec/ 100%
/bin 102%
/lib64 100%
----
Overall the size of the binaries is only 95% of the size of the binaries compiled with gcc-4.7 and lto

My gcc bug report about the different sizes can now be explained: -O0 disables local optimization, and optimization in general, so you have to optimize with at least -O1 at compile time. It's similar to specifying -O0 together with -finline-functions, which does nothing.

You already got them with gcc-4.7, though they didn't work for me back then. What the wrappers do is probably best explained by the script I provided in post 9.
IMHO you shouldn't do anything with those wrappers except use them as you would use their non-wrapped counterparts. But that's where the hassle is: you have to convince all the build systems out there to use the wrapped ones instead of the originals. That's why I submitted the bug for eselect/binutils-config, which should take care of that system-wide.

I see now. They are there and they work, at least for 4.8; they are simply not called when required.
I'll wait until gcc-4.8 hits portage and then try using -fno-fat-lto-objects globally.
Thank you!