As a follow-up to Waldo Bastian's analysis of KDE startup times, Leon Bottou has implemented an inspired hack to improve the startup of C++ programs under GNU/Intel systems. "Waldo Bastian's document demonstrates that the current g++ implementation generates lots of expensive run-time relocations. This translates into the slow startup of large C++ applications (KDE, StarOffice, etc.). The attached program "objprelink.c" is designed to reduce the problem. Expect startup times 30-50% faster." Update: 08/01 4:52 AM by N: Consult Leon's objprelink page for some great details and up-to-date information on this hack as well as on the prelinker mentioned by Bero. Thanks to freekde for the tip-off.

If I understand correctly, Leon's hack works around the problem by adding a level of indirection - a stub - to each function in a class's virtual table, and changing references to the function to point to the new stub instead -- thereby eliminating a whole lot of symbol lookups and relocations.
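
To make the stub idea concrete, here is a toy model in C++. It is not objprelink's code and does not touch object files; the `Resolver`, `Stub`, `paint` and `resize` names are invented for illustration. It only counts how many symbol lookups a hypothetical loader performs when every virtual-table slot is resolved at startup, compared with slots that refer to a shared stub which resolves its target on first call.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: NOT objprelink's code. All names here are hypothetical.
using Fn = int (*)();

int paint() { return 1; }
int resize() { return 2; }

// Stand-in for the dynamic linker's symbol resolution; counts every lookup.
struct Resolver {
    std::map<std::string, Fn> symbols{{"paint", paint}, {"resize", resize}};
    std::size_t lookups = 0;
    Fn resolve(const std::string& name) {
        ++lookups;
        return symbols.at(name);
    }
};

// Scheme A: every slot of every virtual table is resolved at startup.
std::size_t eager_lookups(Resolver& r, std::size_t n_classes) {
    std::vector<std::vector<Fn>> vtables;
    for (std::size_t i = 0; i < n_classes; ++i)
        vtables.push_back({r.resolve("paint"), r.resolve("resize")});
    return r.lookups;
}

// A stub resolves its target the first time it is called, then reuses it.
struct Stub {
    std::string name;
    Fn target = nullptr;
    explicit Stub(std::string n) : name(std::move(n)) {}
    int call(Resolver& r) {
        if (!target) target = r.resolve(name);
        return target();
    }
};

// Scheme B: virtual tables refer to local stubs, so building the tables
// needs no symbol lookup at all. The program below only ever calls paint().
std::size_t lazy_lookups(Resolver& r, std::size_t n_classes) {
    Stub paint_stub("paint"), resize_stub("resize");
    std::vector<Stub*> table{&paint_stub, &resize_stub};
    std::vector<std::vector<Stub*>> vtables(n_classes, table);
    int sum = 0;
    for (std::size_t i = 0; i < 3 && i < n_classes; ++i)
        sum += vtables[i][0]->call(r);  // extra indirection on every call
    assert(n_classes < 3 || sum == 3);
    return r.lookups;
}
```

With 100 classes, the eager scheme performs 200 lookups before the program does anything, while the stub scheme performs a single lookup for the one function that is actually called. The price is the extra hop through the stub on each call.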

Check out Leon's email for the exact juicy details and for the Intel/GCC-specific C code of the program you will need to process object files before linking. One possible downside of this optimization is that virtual function invocations may now be slower due to the extra indirection involved.

And of course, no matter how brilliant the hack, we are still working around faults in the GNU linker. Apparently some work is going on in that area as well, as can be seen in this email from Jakub Jelinek.

Comments

I am also bothered by KDE's not-so-fast performance. I recompiled the KDE packages but didn't see any improvement. But when I recompiled Linux Kernel 2.4-2 on RH 7.1, I saw a 40% improvement in KDE + its apps.

There are several ways to use mod_perl, but most of the time you will use it to cache the compiled form of your Perl script, so that the next time it is requested it is simply called again. That repeated call is then handled by mod_perl like a function call.

No, you are lazy... all the instructions are on Leon's webpage, including how to compile the objprelink.c file. And about being stupid: even you can copy and paste the gcc line into a konsole and copy the resulting objprelink executable into /bin, /usr/bin/ or /usr/local/bin, wherever you want it.

I tried it on Mandrake 7.2, glibc 2.1.3, with the newest binutils (2.11.0.8) and libelf (0.7). It wouldn't compile (some missing declarations, STV_DEFAULT and others). After some playing to include these declarations from binutils, it compiled and runs. prelink with the n option (dry run) seems to work fine, but if I want to prelink for real it bails out with something like: no space for dynamic.
Whatever that means...

I think it could be made to work, but prolly has no real priority since everyone goes to 2.2...

Hey, then it wasn't due to 2.1.3, since I managed to compile it on 2.1.3 but also get this .dynamic error (and thought it was due to glibc). If it happens in 2.2 as well, it must be something else? Maybe I'll send a mail to the author this evening.

Already people are reporting great speedups with this hack. Everyone seems in favor of including it in KDE 2.2. What does this mean for KDE Init and for distributions with prelinking already? Is it still worthwhile?

seems this trick has lots of advantages (speed especially), so why not always compile kde with the speed improvements from now on?

KDE is better than any other WM, EXCEPT when launching applications (it's so slow!).

If we can improve KDE's speed by up to 50%, then all new releases should be tuned like this (I really dunno why all of a sudden KDE is capable of being so fast and why nobody discovered this or focused on it before).

While we are talking about speed, has there been any improvement in image rendering/decoding? Last time I checked (2.1.1), Konqueror and Pixie were unusable as thumbnail viewers because of the horrible preview speed. I believe they both use the same libs (Qt or KDE core?) for this. Wouldn't there be a big performance boost for the whole environment if it were optimised (or at least for Konqueror and Pixie)?

I'm not sure if it got into 2.1 or not, but Pixie's thumbnail manager has supported load on demand for quite some time, which is extremely fast when browsing existing thumbnails. I've also just implemented load on demand for mimetype data as well, so you can enter a directory of > 2000 thumbnailed images (I took all my photos and made a bunch of copies ;-) and start browsing any thumbnail essentially immediately. It used to take around 5-6 seconds, not bad but this is even better. It's faster than anything else I've been able to compare it to, both on Linux and Windows. A new version should be released in about a week. If load on demand wasn't implemented in KDE 2.1, I strongly suggest you upgrade. You'll get a new UI and other goodies as well.

Hi mosfet,
I was not commenting on the speed of viewing the existing thumbnails, sorry if that was unclear.
It is the speed at which KDE handles pics: if you, for example, click on a JPEG image in konq you can see it gradually appearing, but in GQview, for example, it's displayed immediately. I don't know what causes it; could it be kio? But I seem to remember a thread on the mailing list concerning poor performance in the KDE image libs, no optimised ASM code for instance.
I'll check out Pixie as soon as possible. Will the new release be based on KDE 2.2?

Well, you said thumbnails, so you were pretty unclear ;-) You're seeing it slower in Konq because it's incrementally loading and rendering it. Good for web based images, bad for local files. Use a different component for viewing images, not the HTML widget (which is what you have it set to ;-).

This was never an issue with Pixie, which never did incremental loading (it'd be nice to add for remote images, tho). I don't think you used it much... As far as ASM and other things for loading images, that won't help at all. The main bottleneck in loading images is disk. It could help for things like smoothscaling thumbnails, but 2/3 of the time is spent in disk I/O (I checked), so not much. The "poor performance" of KDE/Qt image loading is mostly people not knowing what they are talking about. For example, Qt and imlib both call libgif in essentially the same way, same for libjpeg, libpng, etc... for loading data.

I can't imagine that looking up even a few tens of thousands of symbols in a symbol table should make any appreciable difference in program startup time for a properly implemented symbol table.

While this is a neat hack, it sounds to me that the problem is not with g++ but with the data structures that the runtime system uses for relocation (linear search?). Probably that should get fixed, and that would speed things up generally, not just in this special case.
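
To put rough numbers on that argument, here is a small C++ sketch comparing a linear scan with a hashed table. It is an illustration under made-up assumptions, not a model of the actual runtime linker: the symbol names ("sym0", "sym1", ...) and both helper functions are hypothetical, and `std::unordered_map` merely stands in for whatever hash structure a loader might use.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Illustration only; the symbol names are invented for the example.
std::vector<std::string> make_symbols(std::size_t n) {
    std::vector<std::string> names;
    names.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        names.push_back("sym" + std::to_string(i));
    return names;
}

// Resolve every symbol once by scanning the table from the start.
// Returns the total number of string comparisons performed.
std::size_t linear_comparisons(const std::vector<std::string>& table) {
    std::size_t comparisons = 0;
    for (const std::string& wanted : table) {
        for (const std::string& candidate : table) {
            ++comparisons;
            if (candidate == wanted) break;
        }
    }
    return comparisons;
}

// Resolve every symbol once through a hash table.
// Returns how many symbols were found (one probe each on average).
std::size_t hashed_resolved(const std::vector<std::string>& table) {
    std::unordered_map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < table.size(); ++i)
        index.emplace(table[i], i);
    std::size_t resolved = 0;
    for (const std::string& wanted : table)
        if (index.find(wanted) != index.end()) ++resolved;
    return resolved;
}
```

For n symbols the linear scan needs n(n+1)/2 comparisons, so 1,000 symbols cost 500,500 comparisons, while the hashed version does about one probe per symbol. Whether the real loader's cost comes from the lookup structure or from the sheer number of relocations is exactly the open question in this thread.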

Well, the startup time of a non-prelinked binary, at least on my machine, is mostly filled with hard drive seek tests (dual 450 MHz PIII, kernel.org linux-2.4.5). I wonder whether prelinking doesn't streamline some disk accesses at the same time by coincidence, maybe just by not referring to the pages which don't need to be accessed at startup. I imagine that if the relocation tables are spread around the binary, there will be a decent amount of seeking at startup, just to get the right pages in.

Well, my assumption is that linux does some memory<->disk mapping of binaries' pages, if it doesn't then I'm obviously wrong.

Would linear search be sooo slow, given that there really aren't that many symbols to look up (I doubt it's tens of thousands)? I imagine that a typical symbol table would be - well, the one in libc-2.2 is about 2k symbols. Maybe that really goes up to tens of thousands for kde+qt apps??? :-(

So Gnome (version 1.2.1) had about the same amount of relocations as KDE1 (1.1.2). The big hit came with KDE2 (2.1.1/2). This is, so I think, directly related to the DCOP stuff and all the other things going on in the background. Look at kwrite from KDE2 starting:

holle@chaos:~/.p > LD_DEBUG=statistics /opt/kde2/bin/kwrite
01151: number of relocations: 51023
01152: number of relocations: 1466
01152: number of relocations: 46994
DCOPServer up and running.

Now that is a lot ...
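
For anyone who wants to total such output, here is a small C++ helper. It is only a sketch: `total_relocations` is a made-up name, and it assumes each relevant line contains the text "number of relocations:" followed by a count, as in the three lines above. Note that the lines carry different process IDs, so the sum spans more than one process.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: sums the counts on every line that contains
// "number of relocations:" and ignores all other lines.
long total_relocations(const std::string& output) {
    const std::string marker = "number of relocations:";
    std::istringstream lines(output);
    std::string line;
    long total = 0;
    while (std::getline(lines, line)) {
        std::size_t pos = line.find(marker);
        if (pos == std::string::npos) continue;  // e.g. "DCOPServer up and running."
        // std::stol skips the leading space before the number.
        total += std::stol(line.substr(pos + marker.size()));
    }
    return total;
}
```

Fed the kwrite output quoted above, it adds 51023 + 1466 + 46994 and returns 99483.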

I think we need to streamline the API a little bit. Make more use of inline functions and try to get rid of function duplicates i.e. two functions doing mainly the same thing.
Maybe we can come up with a late binding feature like Python has, where a function's code gets bound at the very moment it is used and not earlier (and, in Python's case, again and again ...)

Using inlining for methods is not a good idea for a C++ library. Great for apps, bad for libs. Think of what happens when you try to change the implementation later. I needed to change some kstyle* stuff a while ago and couldn't. Argh.
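
The concern about changing an inlined implementation later can be sketched in a single C++ file. This is a toy model, not KDE code: every name in it (`lib::margin_v1`, `exported_margin`, `App`, `upgrade_library`) is invented. Swapping a function pointer stands in for installing a new shared library, and a hard-coded method body stands in for code that the compiler copied into the application from an inline method in the old header.

```cpp
// Toy model only; all names are hypothetical.
namespace lib {
using Impl = int (*)(int);

int margin_v1(int width) { return width + 2; }  // original implementation
int margin_v2(int width) { return width + 4; }  // later, changed implementation

// What the installed shared library currently exports.
Impl exported_margin = margin_v1;
}  // namespace lib

struct App {
    // Out-of-line method: every call goes through the library's export,
    // so the application picks up whatever version is installed.
    int via_library(int width) const { return lib::exported_margin(width); }

    // Inline method: the v1 body was baked into the application binary
    // when it was compiled against the old header.
    int via_inlined_copy(int width) const { return width + 2; }
};

// Stands in for upgrading the shared library without recompiling the app.
void upgrade_library() { lib::exported_margin = lib::margin_v2; }
```

After `upgrade_library()`, `via_library(10)` returns 14 while `via_inlined_copy(10)` still returns 12: the application keeps running the old inlined body until it is rebuilt.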

Sorry, KStyle has no inline methods... doh! KThemeStyle does, but that is not called by any other applications, only dynamically loaded by the theme engine. Either way, inline methods are very common in libs (look at Qt: grep inline *h | wc --lines gives you 697 occurrences).

Write a new style plugin based off of KThemeStyle, it's a plugin that provides the theme engine, remember. Those headers are included in the KDE libraries simply so people could derive from them, but no one ever did (people wrote very few styles period).

If you really do have a style you can release it today.

Your change would have also almost certainly required private and protected member and data changes anyway, so it still doesn't make an argument against inline methods, which are used hundreds of times in both KDE and Qt.
Should we dump private and protected members as well?

A) This does not prevent you from making a new KStyle or KThemeStyle, as you claimed. I made very sure you can do anything you want with the plugin mechanism and saying BC prevents you from doing anything is just incorrect.

B) KDE headers currently include 1,797 instances of inline methods. They are a very good way to optimize code and are the equivalent of #define macros in C. Dropping them isn't what I'd recommend to any developer, unless you like unneeded method call overhead.