After the unexpectedly successful attempt to speed up 64-bit Gentoo installations using a modified approach to prelinking (see http://forums.gentoo.org/viewtopic-t-494955-highlight-.html for more details), where multiple cases of considerable speed gain have been reported, I was eager to find a similar solution for 32-bit systems.

I did this although it was not clear to me whether such a solution could possibly exist at all. The problem here was the limited virtual address space of the IA-32 architecture.

However, after some experimentation I found a satisfactory result: My script was able to prelink all of my shared libraries to different, unique base addresses.

I wrote that script as a wrapper around the prelink command. The script checks whether my new scheme actually can be applied, and applies it if possible.

If the new scheme does not work for a system, the script falls back to the standard Gentoo prelinking method, which always works but is less efficient. It also remembers that fallback for future runs of the tool, in order to avoid futile reattempts (wasting your time). Once the fallback has occurred, my script is just another way to perform the standard Gentoo prelinking commands.

Which means you can use it anyway: Either it works better than the standard prelink approach, or it does the same.

And even in fallback mode there is a little advantage compared to using the prelink binary directly: Usually you can run the script without any arguments at all (except perhaps for -v / --verbose), because the script will use the right switches for the prelink command automatically.

Using the script, I was able to successfully prelink all shared libraries to unique base addresses on one of my boxes. On this box I use IceWM as a window manager, so there are far fewer SOs installed than on a full-blown KDE system.

On the other (heavily bloated) box the new approach failed, and so it performed a fallback to the standard Gentoo method.

Interestingly, the new approach actually seemed to work on the bloated box also. But when I checked /etc/prelink.conf, I noticed that for some reason the kde-3.5 libraries were not among the set of search paths used by prelink!

So I added /usr/kde/3.5/lib to PRELINK_PATH in /etc/env.d/99local manually, ran env-update to propagate my changes into /etc/prelink.conf, and finally re-ran my tool. Only then did it fail.

So it seems the Gentoo devs have excluded the KDE libraries from prelinking in the standard configuration.

That is, as long as you do not manually add the KDE libraries to the prelink search path (as I did), my new prelinking scheme will typically also work on KDE systems, despite the fact that KDE is too large to be fully prelinked on a 32-bit processor: KDE will just be skipped when prelinking, as it always has been when following the standard Gentoo prelinking guide.

If you actually want to prelink KDE, there are 2 choices for you:

Purchase a 64-bit processor and recompile your system in 64-bit native mode, yielding a full-blown 64-bit Gentoo with a multi-exabyte virtual address space for every single process. Then you can easily add the KDE paths to /etc/env.d/99local, and prelink will certainly never run out of virtual addresses.

Or just stay with 32 bit and also add the KDE-paths to /etc/env.d/99local. But expect fallback to the rather inefficient Gentoo standard method then. On the other hand, a KDE with partial prelinking is still better than a KDE without any prelinking at all. It can never be as effective as on a 64-bit box, but a moderate speed gain can still be expected.

Before finally presenting the latest version of my guide, a big "thank you" goes to all the people in this thread who have contributed to this guide by testing or suggesting valuable additions.

If you have a 64-bit system and are also using KDE, add even more paths to the colon-separated list of prelink search paths in /etc/env.d/99local. If you are using kde-3.5, add the following path to PRELINK_PATH in that file: /usr/kde/3.5/lib. (Replace 3.5 with your KDE version.) Or even better, just do an ls /usr/kde to see which KDE versions are actually installed, and add the lib paths for all of them to PRELINK_PATH.

Run

Code:

env-update

Run my script prelink-system (you can find it at the end of this article). Hint: Run the script with the --help option to learn more about its options.

Run my script again as

Code:

prelink-system --verbose

each time your libraries might have been updated, such as after an emerge. This will prelink any new or updated libraries as well.
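
As a rough sketch of the PRELINK_PATH step above: the following shell function collects the lib directory of every KDE version found below /usr/kde and prints them as a colon-separated list. The function name kde_prelink_paths is hypothetical and not part of prelink-system; the optional argument for a different root directory is only there to make testing easy.

```shell
# Hypothetical helper: print a colon-separated list of the lib directories
# of all KDE versions installed below a root directory (default: /usr/kde).
kde_prelink_paths() {
    root=${1:-/usr/kde}
    paths=
    for dir in "$root"/*/lib; do
        [ -d "$dir" ] || continue        # glob matched nothing, or not a directory
        paths=${paths:+$paths:}$dir      # append with ':' unless the list is still empty
    done
    printf '%s\n' "$paths"
}
```

The printed list can then be appended by hand to PRELINK_PATH in /etc/env.d/99local before running env-update.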

I have also written a second script show-writable-code-segments which lists all writable code segments which currently exist in memory.

Based on the naive assumption that relocated libraries might remain writable after they have been relocated, this would then print a list of loaded libraries where prelinking was not effective.

Which means, the shorter that list is, the better.

However, I am not sure how correct this naive assumption is. If the dynamic loader resets the memory protection of relocated code pages back to read-only after relocation, this won't work. If you know more about this, please let me know.
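
To illustrate the idea (this is not the actual show-writable-code-segments script, just a minimal sketch under the same naive assumption): on Linux, each line of /proc/<pid>/maps lists an address range, its permissions, and the mapped file. The hypothetical function below reads such lines from standard input and prints every file that has a mapping which is both writable and executable.

```shell
# Hypothetical sketch: read /proc/<pid>/maps-style lines from stdin and print
# the unique file paths of mappings that are writable AND executable.
# Field 2 holds the permissions (e.g. "rwxp"), field 6 the mapped path.
# Paths containing spaces are cut off at the first space by this simple parser.
writable_code_segments() {
    awk '$2 ~ /^.wx/ && $6 ~ /^\// { print $6 }' | sort -u
}

# Possible usage (as root, to see the maps of all processes):
#   cat /proc/[0-9]*/maps 2>/dev/null | writable_code_segments
```

Anonymous mappings and pseudo entries such as [stack] or [heap] are skipped because their path field does not start with a slash.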

If you want to revert what my script has done, run the following command:

Code:

prelink -au && prelink -amR

This should restore the default behaviour of the Gentoo prelink guide.

prelink-system can operate in two different modes, "full" and
"incremental", as specified using the command line options. The
default mode is "incremental", unless the script is run for the first
time, in which case "full" is the default.

In full mode, each and every library and executable within prelink's
search path will be processed. This takes a long time, but the job
will be done thoroughly. Because of its long running time, full mode
should only be used once a month or so, or after reaching some sort
of "system installation milestone", such as recompiling large parts of
the system.

In incremental mode, only new or updated libraries and executables
within prelink's search path will be examined, and thus there is a
very small chance that the result will be less optimal than in full mode.

However, incremental mode is much faster than full mode, and can
therefore be run after every "emerge" installation operation or on a
daily basis without performance concerns.

Note: The effect of prelinking will only be as good as the settings of
PRELINK_PATH and PRELINK_PATH_MASK which can be customized in your
/etc/env.d/99local configuration file.

I suggest adding the following lines to that configuration file,
unless you know what you are doing and see a compelling reason not to
do so:

# Open a new file for reading.
# If already in use, the old handle will be closed first.
# Returns the new file handle.
sub open {
    my ($self, $filename) = @_;
    $self->DESTROY;    # close a previously opened handle, if any
    open my $fh, '<', $filename or die "Cannot open '$filename': $!";
    $self->{filename} = $filename;
    return $self->{fh} = $fh;
}

In order for prelinking to be effective, all binaries and libraries should be assigned a different, unique address range.

This will allow the loader to just map the code segments of those binaries and libraries into the address space of a process and run them without any changes. That is, no relocations are necessary.

It will also allow sharing those mapped pages between processes, because it's always the same data, mapped at the same virtual address within all address spaces.

This will enable you, for instance, to start 100 copies of your favorite KDE word processor, but there will only be a single instance of the word processor's code pages in memory, which will be shared among all those 100 instances.

Relocation is never an issue for binaries that use SOs (I use "binaries" here to mean executables other than shared libraries), only for the SOs themselves. This is because all binaries are typically loaded at the same address anyway. That is not a problem, because initially each binary is "alone" in the address space of its process, so no address conflicts can occur before SOs are loaded.

In contrast to 64-bit processors, where it is not a problem to assign each and every binary and library in the system its own unique virtual address range, 32-bit processors have a rather limited overall virtual address space. So it's not possible to map each binary and library to its own unique memory range, because we would run out of virtual addresses.

However, as you may have noted from what I wrote first, it is not actually necessary to assign each binary and library its own address range: It is sufficient if all the libraries are assigned different address ranges. It does not hurt if the binaries use address ranges that conflict with those of other binaries, as long as they do not conflict with any of the SOs. (It does not hurt because normally there is only a single binary in every address space, whereas multiple SOs will be mapped into the same address space.)

So, the first important thing my script does is prelinking all objects, binaries as well as libraries, using the --conserve-memory switch of prelink. This may be suboptimal for libraries, but is good enough for binaries and will allow the binaries to share the same address ranges. I also disable address randomization in this phase to make it more likely for each binary to get the same base address.

After that, a second prelinking pass is performed. This time, only the libraries are relocated, and now without the --conserve-memory switch. This should assign a unique address range to each library, but leave the binaries relocated as they are. Address randomization is also enabled, because it's a nice security feature and won't hurt in this phase either.
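
The order of the two passes, plus the fallback, can be sketched roughly as follows. This is a simplified illustration, not the actual prelink-system script, which does more (option handling, remembering the fallback). The function names and the PRELINK variable are hypothetical; the option sets are the ones quoted in this thread (--conserve-memory for the first pass, --force --libs-only --all --random for the second, and prelink -avm as the fallback).

```shell
# PRELINK can be overridden, e.g. PRELINK=echo prints the commands
# instead of running them (handy for a dry run).
PRELINK=${PRELINK:-prelink}

prelink_two_pass() {
    # Pass 1: everything (binaries and libraries), packed tightly,
    # without address randomization.
    $PRELINK --all --force --conserve-memory || return 1
    # Pass 2: libraries only, each in its own address range,
    # with randomized base addresses.
    $PRELINK --all --force --libs-only --random || return 1
}

prelink_with_fallback() {
    if ! prelink_two_pass; then
        # Typically "Could not find virtual address slot": fall back to
        # the conventional method of the Gentoo prelink guide.
        echo "unique library addresses not possible, falling back" >&2
        $PRELINK -avm
    fi
}
```

Running prelink_with_fallback with PRELINK=echo shows the commands that would be executed without touching any file.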

If we are lucky, the total code requirements of all SOs will still fit into the 2 or 3 gigabyte address range available to 32 bit processes, without wasting precious virtual address space for also assigning unique address ranges to binaries.

The success of this approach therefore depends on whether the combined code segment sizes of all SOs in the system exceed 2 or 3 gigabytes.
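
For a very rough feeling of where a system stands, one can add up the file sizes of all shared libraries and compare the total against the limit. This is only an approximation under stated assumptions: file size is not the same as the mapped segment size, and prelink needs additional room for alignment. The function name fits_in_address_space is hypothetical.

```shell
# Hypothetical sketch: sum the byte counts read from stdin (one number per
# line) and report whether the total stays within a limit given in GiB
# (default: 3).
fits_in_address_space() {
    limit_gib=${1:-3}
    awk -v gib="$limit_gib" '
        { total += $1 }
        END {
            limit = gib * 1024 * 1024 * 1024
            verdict = (total <= limit) ? "fits" : "too large"
            printf "%.0f bytes of %.0f: %s\n", total, limit, verdict
        }'
}

# Possible usage with GNU find and stat (directories are just examples):
#   find /lib /usr/lib -type f -name '*.so*' -exec stat -c %s {} + \
#       | fits_in_address_space 3
```

A total that is far below the limit is a good sign; a total close to or above it suggests that the fallback will be needed.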

This does not seem to be the case on my box, so the approach works there.

But more testing on different systems is needed in order to see whether it works equally well on other installations.

Last edited by Guenther Brunthaler on Mon Sep 11, 2006 2:27 pm; edited 6 times in total

Joined: 06 Jan 2006 | Posts: 1160 | Location: in bed in front of the computer

Posted: Fri Sep 08, 2006 3:32 pm Post subject:

Interesting. I'm gonna check it on my laptop as soon as possible.

I got this (3rd phase):

Code:

...
prelink: /usr/local/bin/bjfilteri255: Could not find one of the dependencies
prelink: /opt/cxoffice/bin/wineserver: Could not find one of the dependencies
prelink: Could not find virtual address slot for /usr/lib/openoffice/program/libsfx680li.so
ERROR: Could not prelink --force --libs-only --all --random
Use /usr/local/sbin/prelink-system --help for help.

prelink: Could not find virtual address slot for /usr/lib/openoffice/program/libsfx680li.so

Thank you very much! This was exactly what I was waiting for: Now I know that prelink does not silently fail if it is running out of virtual addresses, but actually gives an error.

With that knowledge, I was able to add fallback code to my script, which reverts to the conventional

Code:

prelink -avm

method of the Gentoo Prelink guide in such cases. My article now contains the updated script.

Unfortunately for you, that also means my script cannot help you: The combined size of all your shared objects alone (not counting the remaining binaries) exceeds the maximum virtual address size provided by Linux to 32 bit processes; so there is no way of assigning a different address range to each and every shared library.

In order to revert your system to the previous prelinking stage, execute the following commands:

Code:

prelink -au && prelink -afvm

This should restore the prelinkage state of your system to that of the standard Gentoo prelinking guide.

It seems that you are one of those Gentoo users who actually need a 64-bit processor with its large virtual address space. Then prelinking will no longer be a problem. (As it seems, prelinking is the killer argument for purchasing a 64-bit processor. Forget about its wider integers. Forget about it being faster. The mere fact that it allows unrestricted prelinking can start up apps like KDE faster by a factor of 2, and also save a lot of RAM because code pages can be shared to a much higher degree between processes.)

However, there may still be a solution for your 32-bit box: I know the Linux kernel provides some choices about the memory layout, which directly affect how much virtual address space is available to applications.

So, if you are lucky and use one of the other memory models, the increased virtual address space might be large enough for all your libraries to be prelinked with my script without having to return to the rather inefficient standard settings.

Anyway, thank you for your help. I will update my script in such a way that it reverts to the standard settings automatically if an error like the one you encountered occurs.

Posted: Fri Sep 08, 2006 9:40 pm Post subject:

After the error, I didn't automatically revert to the old prelink state; instead I tried out how the partially completed result behaves. The result is much better than before. At least it feels faster.

This machine really needs a cleanup. Once that is done, the new prelink scheme might still work.

I'm gonna try the script on another 32-bit machine soon. That one has just been set up, so there are not many unnecessary apps on it. (Just waiting for OO to compile.)

_________________
Nature does not hurry, yet everything is accomplished.
Lao Tzu

Posted: Sat Sep 09, 2006 3:46 pm Post subject:

On the other machine with kde, oo, firefox, and thunderbird prelink went ok. The difference is that this machine has less bloat. I hope to get the first one to prelink too, just a cleanup is needed.

Thanks for the script, it worked very nicely for a long time. However, lately it switched to low-efficiency mode. My question: Is it possible to completely revert prelinking and restore full-efficiency mode (after removing some dirs or packages)?

Btw: I noticed that some time ago during a prelink update a prelink script was added to my cron jobs. Does this interfere with your prelink-system script or doesn't it matter at all? I've removed it but think it has already been executed once or twice. Fortunately, I didn't experience any negative effects.

So, if you are lucky and use one of the other memory models, the increased virtual address space might be large enough for all your libraries to be prelinked with my script without having to return to the rather inefficient standard settings.

Maybe I'll try this too. You're talking about the High Memory support option (1GB/4GB/64GB), aren't you? My config is currently set at 4GB, I could try 64GB and see if it improves anything.

Is it possible to completely revert prelinking and restore full-efficiency mode (after removing some dirs or packages)?

Yes. Just run

Code:

prelink-system --full

This will undo any previous prelinkage in its first phase, and then retry full-efficiency mode.

Small_Penguin wrote:

Btw: I noticed that some time ago during a prelink update a prelink script was added to my cron jobs.

Yes, I noted this too and just modified my copy of that newly installed cron file to do just nothing. (Of course I could just have deleted it, but making it a no-op script instead has the advantage that future reinstallations won't recreate the original cron file, thanks to Gentoo's configuration file protection.)

Small_Penguin wrote:

Does this interfere with your prelink-system script or doesn't it matter at all?

While there cannot be any catastrophic consequences when using that cron job together with my script, I do not recommend actually doing this.

While the cron job's approach is safe, it does not have the advantages of my script's full-efficiency mode with its unique .so base addresses.

Regarding the usage of prelinking in cron jobs from a conceptual point of view, I prefer running prelink-system manually after emerging some package, because there is little use updating .so file base addresses when no new .so files have been installed at all...

But if a cron job is desired, it should not be a problem running my script from within a cron job instead.

However, care should be taken to avoid running multiple instances of my script or the underlying "prelink"-command at the same time, as this might result in race conditions in the prelink algorithm. (I have not checked whether this is actually true. Perhaps "prelink" uses lock files internally and is thus safe to be run at any time. But I would not count on it.)
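
One simple way to keep two runs from overlapping is a lock directory, since mkdir either creates the directory or fails atomically. The following is a minimal sketch, not part of prelink-system; the function name run_exclusively, the lock path in the usage comment and the exit code 75 are arbitrary choices. Note that a killed run leaves the lock behind, which then has to be removed by hand.

```shell
# Hypothetical sketch: run a command only if no other instance holds the lock.
# Usage: run_exclusively LOCKDIR COMMAND [ARGS...]
run_exclusively() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another instance seems to be running ($lockdir exists)" >&2
        return 75
    fi
    status=0
    "$@" || status=$?      # run the command and remember its exit status
    rmdir "$lockdir"       # release the lock
    return "$status"
}

# Possible usage, e.g. from a cron job (lock path is an assumption):
#   run_exclusively /var/lock/prelink-system.lock prelink-system
```

This only protects runs that go through the wrapper; a plain prelink started by another cron job would not notice the lock.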

But I'm afraid a 64GB mode won't help prelink on a 32-bit processor, because 32-bit addressing limits the virtual address space of any process to at most 4 GB, no matter how much RAM is actually installed and available in total to different processes.

And as prelink operates on virtual addresses, no memory model beyond 4 GB can actually help prelink.

What I actually wanted to refer to in my original posting were the kernel options that control how the virtual address space is partitioned between kernel and user space. That's the "Memory split" option in the "Processor type and features" section of the kernel configuration.
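
To make the numbers concrete: a 32-bit address covers 2^32 bytes, that is 4 GiB, and the kernel reserves part of that range in every process. The small function below (its name is hypothetical) computes what remains for user space for a given kernel share; with the common split of 1 GiB for the kernel, 3 GiB remain for the binary and all its SOs.

```shell
# Hypothetical sketch: user-space share (in GiB) of a 32-bit virtual
# address space when the kernel reserves KERNEL_GIB gibibytes of it.
user_space_gib() {
    kernel_gib=$1
    total_gib=$(( 4294967296 / 1073741824 ))   # 2^32 bytes / 1 GiB = 4
    echo $(( total_gib - kernel_gib ))
}
```

For example, user_space_gib 1 corresponds to a 3 GiB user space and user_space_gib 2 to a 2 GiB user space, which is where the "2 or 3 gigabytes" mentioned earlier comes from.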

I have got a largish desktop box (gnome + kde + misc other stuff) that would fail with prelink -aR.
But ever since I compiled the 2.6.19 kernel with "64 bit Memory and IO resources (EXPERIMENTAL)" option turned on, prelink succeeds with these parameters.
Now "High Memory support" set to 64GB forces that same CONFIG_RESOURCES_64BIT flag to be set, so it should have the same effect plus enabling Intel PAE switching.

Now it is true that I did not test it synthetically, but rather on the live desktop system; still, I doubt my libs suddenly shrank to below 4 GB.

I am using a fairly recent 32-bit dual-core Pentium 4 chip, so it must support PAE. Setting CONFIG_RESOURCES_64BIT alone seems sufficient, though.

Also konqueror goes down from 0.1 second real load time to 0.01s. It's 10 times faster when prelinked.

Yes, I noted this too and just modified my copy of that newly installed cron file to do just nothing. (Of course I could just have deleted it, but making it a no-op script instead has the advantage that future reinstallations won't recreate the original cron file, thanks to Gentoo's configuration file protection.)

Good idea.

Quote:

But ever since I compiled the 2.6.19 kernel with "64 bit Memory and IO resources (EXPERIMENTAL)" option turned on, prelink succeeds with these parameters.
Now "High Memory support" set to 64GB forces that same CONFIG_RESOURCES_64BIT flag to be set, so it should have the same effect plus enabling Intel PAE switching.

Confirmed. I've enabled the experimental 64bit memory and IO resources option, and prelink-system now runs in full efficient mode. Thanks.

When I run 'prelink -au && prelink -afR', I don't get any errors. Does that mean it works fine for me? How can I check that everything is fine? Normally, in order to check if a binary is prelinked, I use