Our recent poll (courtesy KDE.com) on the upcoming KDE 2.2 suggests that the area of
greatest concern for KDE users is speed -- at this time, out of 3,463 votes, over 24% consider speed the most important issue for developers to address. Waldo Bastian, who developed the kdeinit speed hack among other things, has written a paper entitled "Making C++ ready for the desktop", in which he analyzes the various startup phases of a C++ program. Noting that one component of linking -- namely, library relocations -- is currently slow, he offers some suggestions for optimizations. An interesting read.


Comments

Hmm... I surely am very glad to know Waldo and I'm honoured to call myself his friend. Yet, it should be a matter of pride for us to avoid the kind of remarks you made. Waldo *did publish* his study, hoping that *all* of the Linux community will profit/help/contribute. If an inventive Gnome developer takes Waldo's notes, has a stroke of genius and finds a marvelous solution to our speed problem, I'm sure we will *all* profit. And it goes like this for everything else in Free Code. And that's why we like it so much, even if we don't always realize this actual reason.

I would like to thank Waldo for his constant attention to KDE's performance. He is one of the main reasons KDE 2's performance is at least equal to KDE 1's, despite KDE 2 being at least three times more complex.

I see. I used to use KDE all the time. I don't at the moment, although it's not for any advocacy reasons. Comments like yours don't make me want to come running back (nor will similar comments from Gnome advocates make me proud).

First off, I AM glad that Waldo is working on KDE. GNOME does not use C++, and had he been working on GNOME, the article (if one had been written) would have dealt with C rather than C++. Since KDE is a big part of my life, I am happy that a person with this much insight and intelligence is working on making it better. This was not meant as a KDE-GNOME thing; I mentioned GNOME because they use C and not C++...

Actually, if you read the article, there are a number of things that would help Gnome as much as they would help KDE. For one, the dirty pages created when libraries are relocated take up valuable memory and increase the amount of I/O required to load something. I/O is never fast, so it's something everyone wants to avoid if possible. A more optimized relocator in ld.so would help not only KDE but also Gnome, as the memory footprint could very well shrink for both projects.

If you have 213 dirty pages per application, every task (except one) is about 800k bigger than it could have been. Not only that, you also have to load another 800k of code from disk, which adds to the start-up time if you were unlucky enough to have the content expire from the cache (and on small or low-memory systems, that's likely the case). If you assume that the average consumer drive can do 5-10 MB/sec reading off the platters, that's already a tenth of a second wasted right there. Having more free memory pages for cache never hurt anyone.
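A quick back-of-envelope check of those numbers (a sketch; the 4 KB page size and the 8 MB/s disk rate are assumptions, not measurements):

```python
# back-of-envelope cost of 213 dirty pages per application (sketch;
# the 4 KB page size and 8 MB/s disk rate are assumed, not measured)
PAGE_SIZE = 4096          # bytes, typical for x86
DIRTY_PAGES = 213         # figure quoted in the comment above

dirty_bytes = DIRTY_PAGES * PAGE_SIZE
dirty_kb = dirty_bytes // 1024            # 852 KB, i.e. "about 800k"

DISK_RATE = 8 * 1024 * 1024               # 8 MB/s, mid-range of 5-10 MB/s
load_seconds = dirty_bytes / DISK_RATE    # roughly a tenth of a second

print(dirty_kb, round(load_seconds, 2))
```

So the "about 800k" and "a tenth of a second" figures in the comment are consistent with each other.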

No, C has the same problem, just less of it. Just remember that kmail has about 60,000 relocations, while the (fun) GTK game freeciv has a mere 1,172.

But Windows does fix it, so even Qt Professional, say, with nearly the same number of relocations, will start fast on Windows. That's only because Windows's dynamic linker isn't very versatile, while Unix's is very versatile (and therefore slower).

You mostly get to choose between a fast program and a small one. C++ tends toward small, given a good style of code reuse: that means more function calls, and more time spent making them.
-- But that doesn't mean you can't write fast code in C++.

AFAIK, Windows DLLs have a "preferred address". If the dynamic linker finds that the preferred address is unallocated and there's enough room to map the DLL at that address, it is loaded there and no relocation is needed. Otherwise, it loads the DLL at a different address and relocates it.

I think this is the best thing to do: when a library is linked, its preferred address has to be chosen somehow, taking into account the shared objects that the library depends on.
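That decision can be sketched as a toy layout pass: give a library and everything it depends on non-overlapping preferred bases, so that nothing needs relocating when they all load together (all names, sizes, and the base address here are hypothetical):

```python
# toy "preferred address" assignment: lay out a library and its
# dependencies so their preferred ranges never overlap (all names,
# sizes and the base address below are hypothetical)
def assign_preferred(libs, base=0x40000000, align=0x10000):
    """libs: list of (name, size_in_bytes) in dependency order."""
    layout = {}
    addr = base
    for name, size in libs:
        layout[name] = addr
        # round the next base up to the alignment boundary
        addr = (addr + size + align - 1) & ~(align - 1)
    return layout

deps = [("libqt.so", 0x300000), ("libkdecore.so", 0x120000), ("libkdeui.so", 0x180000)]
layout = assign_preferred(deps)
for name, addr in layout.items():
    print(name, hex(addr))
```

The catch the thread goes on to discuss: this only pays off if the loader can actually honour the preferred address at run time.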

DLLs do have a preferred address. However, as soon as a DLL has to be relocated, it can't be shared anymore, because Windows has no concept of Position Independent Code (PIC): all function calls inside the DLL have to be changed when it is relocated.

Linux (and every other Unix I know of) does support PIC: it is essential to the ELF binary format, AFAIK. This means relocation isn't expensive: the pages of the library won't have to be touched, which keeps them shareable.

The thing both the Windows and Unix dynamic linkers have to do is resolve the calls made by an application to a library (or lib -> lib): the app calls a fixed stub address, and the dynamic linker resolves that stub into a call to the dynamically linked function.
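That resolution step can be done by hand with dlopen/dlsym, which is roughly what the lazy stub does for you on the first call. A sketch using Python's ctypes as a stand-in for the C API (assumes a Linux-like system where libm can be found):

```python
import ctypes
import ctypes.util

# resolve a dynamically linked function by hand -- dlopen + dlsym,
# roughly what the linker's lazy stub does on the first call (sketch;
# assumes a Linux-like system where the math library is available)
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
cos = libm.cos                      # the "dlsym" step: look up the symbol
cos.restype = ctypes.c_double
cos.argtypes = [ctypes.c_double]

print(cos(0.0))   # the call itself is now just an indirect jump  -> 1.0
```

The cost Waldo's paper worries about is this lookup multiplied by tens of thousands of symbols.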

The problem with KDE is that the sheer number of function calls exposes the inefficiency of the dynamic linker.

Loading a library at a preferred address may speed up this process. It may break ELF, though.

I don't understand -- are all calls in PIC code, even inside a single DLL, done indirectly through a PLT table? Are there no relative call instructions in assembler!? Yet Windows simply modifies the DLL code, replacing all addresses? A text segment that's writable? Ugh!

Excellent paper. I had always wondered about the performance of dynamic linking with C++, and library caching in general.

While the "kdeinit speed hack" is called a hack, it actually sounds like the right way to do it. What better way to keep the libraries loaded? It's possible that certain libraries could even be preloaded, like the filemanager components (IMO, the only application that really requires instantaneous loading).

On a side note, the only other real speed problem in KDE is Konqueror's re-rendering of content. It takes a long time to load large pages, like a huge message board. But the re-rendering is the killer when you've finished reading a post and you click "back". The same goes for pages with many images (like Konq's thumbnail view). Perhaps these final page renderings should be cached somehow. Does anyone know how Netscape and IE go back and forth so fast between already-visited views?

I completely agree with what you said. The filemanager/browser is one of the applications that really should start as fast as possible. Going back and forward in the history is also somewhat slow at present. Dirk Mueller is doing some nice optimizations on khtml at the moment, so I have the feeling this will be solved/fixed somehow. For me the main problem is drawing the page on my screen: on my fast hardware (not under load at the time) I can very often actually see the page (slashdot, for example) being 'swept' onto my screen from top to bottom. I really don't have a clue what is causing this behaviour (hardware? Qt? khtml?) :(

I wasn't sure if you were serious here or not...
Surely you are aware that Konqueror has a cache just like other browsers? It has even been enhanced in 2.2 with auto-syncing and an offline-viewing mode.
You set it on the web-browsing proxy page. (Don't ask me! It's probably on the proxy page because a cache IS a proxy, to all intents and purposes.)

And the slowness you are talking about has nothing to do with rendering. Konq is acknowledged as having one of the fastest renderers in the world, beating even IE. The delay you are talking about is just your slow net connection downloading the hundreds of entries in a post list. Once downloaded, it renders almost instantly because it is mainly text.

God, I wish people would read their manuals. There are so many badly set up Linux systems out there, and everyone blames KDE rather than their own lack of interest in setting things up properly. Now that would give more of a speed increase than optimising the whole dynamic loader!

While Konq may render a page faster than other browsers (and it certainly does), and it caches the content, it does not cache the "render".

I was recently visiting a large SuSE forum that took almost a minute to load. This is tolerable, but it immediately became a problem when I read a comment and then clicked "back". I had to wait another minute as Konq re-rendered the forum. My solution? Create a split view and drag links to comments into the other view. A better solution? Konqueror should have a method for rapidly rendering previously viewed content. Perhaps storing the final rendered canvas into memory for the current session.
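The per-session render cache suggested above could be as simple as a small, bounded map from URL to the rendered result. A minimal sketch (the string payload is just a stand-in for whatever khtml would actually store, e.g. a layout tree or canvas):

```python
from collections import OrderedDict

# minimal per-session "render cache" sketch: keep the last few rendered
# pages keyed by URL, evicting the least recently used one when full
# (the string payload stands in for a real rendered canvas/layout tree)
class RenderCache:
    def __init__(self, max_entries=8):
        self.max_entries = max_entries
        self._pages = OrderedDict()

    def put(self, url, rendered):
        self._pages[url] = rendered
        self._pages.move_to_end(url)            # newest entry goes last
        if len(self._pages) > self.max_entries:
            self._pages.popitem(last=False)     # drop the oldest entry

    def get(self, url):
        page = self._pages.get(url)
        if page is not None:
            self._pages.move_to_end(url)        # "back" hit: refresh recency
        return page

cache = RenderCache(max_entries=2)
cache.put("forum", "<rendered forum>")
cache.put("post", "<rendered post>")
print(cache.get("forum"))   # hit: no re-render needed on "back"
```

Bounding the cache matters because a rendered page can be far larger than its HTML; a handful of entries already covers the common back-and-forth pattern.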

And don't worry, I read my manuals. I'm a programmer, after all. I might even want to contribute to solve this problem, but I'm hesitant since Dirk didn't accept one of my other patches.

It just pisses me off that Konq can't do it yet. I love working in KDE, but once you have used KDE for a few days and you fire up Windows again (because my online banking does not work under Konq), you realise just how slow KDE is in general. I am able to load IE, go to Google, perform a search and be redirected to the first result on the list before Konq has even started.

The reason Windows GUIs are, in general, faster is that most graphics cards implement GDI calls in hardware. Try disabling hardware acceleration in your graphics settings and you'll see exactly what I mean.

PalmOS devices have been doing something like this for a while, owing to their limited memory and CPU horsepower. I believe iSilo (judging by its speed) does this. The open-source Plucker for PalmOS documents a file format that is essentially a prerendered web page, which saves both CPU and memory. Prerendering does not have to mean saving the entire bitmap: a compressed format that stores prerendered bitmaps, preparsed text and previously made formatting decisions would save both space and time, and could be used in place of the raw HTML and JPEG files.

It can be done. It's not a Windows-specific feature either, as some replies to your post have suggested. Have you tried Opera? Its back/forward cache performance is lightning fast, way ahead of anything else on Linux. If you play with its settings, you will find a "cache rendered images" setting as well. So yes, something much like what you suggested can be, and has been, done on Linux.

A simple strace on konsole shows more than 200 failed opens (ENOENT), i.e. it fails to find the dynamic dependencies 200+ times. That translates into as many open/access/stat calls, which obviously slows down startup. I just tried adjusting the links to the shared objects so that all required dependencies are resolved on the first try. Believe me, it speeds things up to a good extent. Can't anything be done to hardcode the paths in the binaries at compile time? Also, ld.so.preload-ing libc and libqt seems to be a good idea for "KDE-only" users.
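Those failed opens are just the dynamic linker probing each directory on its search path in turn; the effect is easy to mimic (a sketch -- the directory list below is hypothetical, not ld.so's actual search order):

```python
import os

# mimic ld.so probing its search path: every directory that doesn't
# hold the library is one failed open/stat, i.e. one of the ENOENTs
# that strace shows (sketch; this directory list is hypothetical)
def find_so(name, search_path):
    misses = []
    for d in search_path:
        candidate = os.path.join(d, name)
        if os.path.exists(candidate):
            return candidate, misses
        misses.append(candidate)            # one ENOENT per miss
    return None, misses

path, misses = find_so("libfoo.so.1", ["/opt/kde/lib", "/usr/local/lib", "/usr/lib"])
print(len(misses))   # each miss is a wasted syscall at start-up
```

Putting the right directory first (or hardcoding the path) collapses all those misses into a single successful open.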

Basically, "kdeinit" is a daemon initialized with LD_BIND_NOW, which causes it to be loaded entirely, all at once (rather than as the parts are needed); then we dlopen the program, fork() kdeinit, and enter the program.

With LD_BIND_NOW, the relocations of all the KDE libraries (and that's a lot of relocations!) are done only once, and they won't be done again on that fork().
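The scheme can be sketched with a plain fork(): pay the expensive setup once in the parent, and every forked child inherits it copy-on-write instead of repeating it (a sketch; expensive_init merely stands in for loading and relocating the KDE libraries):

```python
import os

# kdeinit-style sketch: do the expensive work once, then fork() per
# application so the result is inherited copy-on-write, not redone
# (expensive_init stands in for loading/relocating the KDE libraries)
def expensive_init():
    return {"libs": ["kdecore", "kdeui", "khtml"]}

state = expensive_init()        # done once, in the "kdeinit" daemon

pid = os.fork()
if pid == 0:
    # child ("the application"): state is already in place, no re-init
    os._exit(0 if "khtml" in state["libs"] else 1)

_, status = os.waitpid(pid, 0)
print(os.WEXITSTATUS(status))   # 0: the child saw the preloaded state
```

The real kdeinit adds the dlopen of the application's library on top of this, but the fork() is what makes the one-time relocation work pay off for every subsequent app.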