My script looks for packages that require the failed package to be built before proceeding. For instance, continuing with my example above:

if xbitmaps were to fail, the script would perform "equery depends x11-misc/xbitmaps", producing these packages as dependents:
x11-base/xorg-server-1.1.1
x11-apps/xsetroot-1.0.1
x11-libs/openmotif-2.2.3-r9

These aren't automatically removed from the emerge queue. Instead, emerge -p is run for each of them. If xbitmaps really had to be built first, Portage would include xbitmaps in that emerge -p output. However, since I already have an old version of xbitmaps installed, Portage doesn't require the new version to be merged before proceeding with these emerges. Thus, the script only removes those packages which genuinely cannot be built until the failed package succeeds.
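For illustration, here is a minimal Python sketch of that filtering step. The parsing heuristic is mine, not the script's actual code, and the function names are made up; filter_dependents shells out to emerge, so it only makes sense on a Gentoo box:

```python
import re
import subprocess

def needs_failed_pkg_first(pretend_output: str, failed_pkg: str) -> bool:
    """Return True if emerge -p schedules the failed package for building.

    failed_pkg is a category/name like "x11-misc/xbitmaps"; emerge -p
    prints scheduled builds as lines such as
    "[ebuild  N    ] x11-misc/xbitmaps-1.1.0".
    """
    pattern = re.compile(r"^\[ebuild[^\]]*\]\s+" + re.escape(failed_pkg) + r"-\d")
    return any(pattern.search(line) for line in pretend_output.splitlines())

def filter_dependents(dependents, failed_pkg):
    """Keep only the dependents that really need failed_pkg built first."""
    kept = []
    for pkg in dependents:
        out = subprocess.run(["emerge", "-p", pkg],
                             capture_output=True, text=True).stdout
        if needs_failed_pkg_first(out, failed_pkg):
            kept.append(pkg)  # must be dropped from the queue with failed_pkg
    return kept
```

Dependents whose emerge -p output does not mention the failed package stay in the queue, exactly as described above.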

I hope I explained this okay.

Makes perfect sense, thanks!
It seems your approach is stronger than just asking what depends on the failed package, i.e. by using the -p flag. Why can't we just use that approach in the bash part of Guenther's script?
@Guenther - thanks for all the work! The simulator results were interesting.
Would we be able to use the algorithm count_zero outlines? (i.e. if APPB depends on FAILED_APP, do an emerge --pretend of APPB to see if it would actually require FAILED_APP to be emerged first, and only if so move APPB to the failed list.)


We could - if only Portage dependency information was to be trusted!

It's still the same problem: emerge --pretend also uses Portage's dependency information, which might be wrong. It's not only emerge --emptytree which is affected.

To be fair: Actually, it's not Portage's fault at all!

It's not Portage that makes the mistake in such cases; the ebuilds simply provide incorrect dependency information.

That is, they specify more dependencies than they really need.

Or - even worse - they do not specify dependencies which they actually do require.

In the first case, nothing disastrous happens (more packages are compiled than necessary), unless that incorrect dependency leads to a circular dependency graph.

In the second case, the missing dependency will go undiscovered as long as the actually required package already happens to be installed by coincidence (or rather due to the dependency requirements of other packages which are already installed).

The algorithm in my script will handle both cases, because it is not really dependent on Portage's dependency information. While Portage's information will be used, it actually only serves as a speed-up for the algorithm.

So the problem is clearly in the ebuilds.

Unfortunately, incorrectly specified dependencies are not necessarily easy to detect.

It's rarely as simple as a package saying: "I depend on this and that package, period."

Rather there are typically conditional dependencies involved: "If this USE-flag is set, and that USE flag is not set, then I'll require that first dependency. Otherwise, if that third USE flag is set, and ..."

Furthermore, dependencies can be triggered not only by USE flags, but also by other kinds of conditions, such as package version ranges. In fact, the full power of the shell can be used to evaluate conditions, which means the complexity of a dependency specification is potentially unlimited.

And if that was not complicated enough, the complexity is further increased by the fact that ebuilds can be inherited from other ebuilds, including their dependencies.

Summing up, dependency specifications can get pretty complex and error-prone in ebuilds, and one should not be too surprised if some ebuilds contain incorrect dependency information as a consequence.

function insert(pkg, revdep)
    if pkg is not in buildlist
        if pkg has deps
            foreach listofdeps(pkg) as dep
                if dep not in revdep
                    insert(dep, revdep + pkg)
        add pkg to buildlist

example:

let a, b, c, d some packages
a depends on b
b depends on d
c depends on d
d depends on a

so we have a cycle a>b>d>a, and an innocent package c.

we run install(a c), here's the running pass, done by hand:

Code:

insert(a) is called
  a not in buildlist
  pkg has dep b
  b not in revdep
  insert(b, a) is called
    b not in buildlist
    pkg has dep d
    d not in revdep
    insert(d, a b) is called
      d not in buildlist
      pkg has dep a
      a is in revdep, a ignored
      add d to buildlist
      buildlist is d
      return
    add b to buildlist
    buildlist is d b
    return
  add a to buildlist
  buildlist is d b a
  return
insert(c) is called
  pkg has dep d
  d in buildlist, ignored
  add c to buildlist
  buildlist is d b a c
  return
build d b a c

now run install(c a), you will end up with a different order of a, b and d (it's: b a d c). which one is right? hell if I know. they may even be both right or both wrong. all the fault of a package seemingly not related to the cyclic dependency of a b d. so when you do emerge world, the order of the content of any package of the world file influences the order of cyclic dependencies. now, it's insane to try to twist around a thousand unrelated packages just to build things in the right order.
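for reference, here's the algorithm above in runnable python (my own transcription, not count_zero's actual script); it reproduces both hand traces:

```python
def make_insert(deps):
    """deps maps a package to the list of packages it depends on."""
    buildlist = []

    def insert(pkg, revdep=()):
        if pkg in buildlist:
            return
        for dep in deps.get(pkg, []):
            if dep not in revdep:          # skip edges that would close a cycle
                insert(dep, revdep + (pkg,))
        buildlist.append(pkg)

    def install(*pkgs):
        for pkg in pkgs:
            insert(pkg)
        return buildlist

    return install

deps = {'a': ['b'], 'b': ['d'], 'c': ['d'], 'd': ['a']}
print(make_insert(deps)('a', 'c'))   # ['d', 'b', 'a', 'c']
print(make_insert(deps)('c', 'a'))   # ['b', 'a', 'd', 'c']
```

the two calls give different orders for a, b and d, exactly as described: the argument order of unrelated packages changes where the cycle gets broken.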

what the algorithm does inside, is building an acyclic oriented graph, and move recursively inside it, building the buildlist along the path.

with graph theory, it is provable that:
1. each package not in a dep cycle will have all of its deps in the right order
2. it is impossible to solve circular deps with the given information, as it results in an infinite loop (thus the passing of revdep, which eventually breaks the n-cycle at n-1 steps)

the only way to solve cyclicness is to have more information. this specific information is the extra bit of contextual knowledge we have over the machine:
- linux-headers do not need to be compiled, so they can be installed before gcc
- you already have some other gcc that can build glibc, so glibc depends on gcc in a less important manner than gcc depends on glibc
this will result in the instinctive order of linux-headers glibc gcc. so how did we solve cyclicness in the end? we didn't. cyclicness is unsolvable. what we added is more information to the graph, rendering it somehow acyclic. what we did is make one path prevail over another on the graph. we said one dependency was 'less important', so in effect we weighted the graph. by assigning a weight, a value, a priority to each dependency of a package, we can sum them while walking the graph, and take the path with the bigger (smaller, whatever distinctive) priority, thus favoring an order.
this way, install(a c) and install(c a) will yield the same, and right, order of a b and d.
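a toy sketch of that weighting idea (my own code, not anything portage actually does): run a Kahn-style topological sort, and whenever only cyclic packages remain, drop the lowest-weight edge among them:

```python
def build_order(deps):
    """deps maps pkg -> {dep: weight}; higher weight = harder dependency.

    Kahn-style topological sort. When every remaining package still has
    an unsatisfied dep (i.e. a cycle), break it at the weakest edge.
    """
    deps = {p: dict(d) for p, d in deps.items()}   # private copy, we mutate it
    order = []
    remaining = set(deps)
    while remaining:
        ready = [p for p in sorted(remaining)
                 if all(d not in remaining for d in deps[p])]
        if not ready:
            # cycle: find and drop the lowest-weight edge among remaining pkgs
            pkg, dep = min(((p, d) for p in remaining
                            for d in deps[p] if d in remaining),
                           key=lambda e: deps[e[0]][e[1]])
            del deps[pkg][dep]
            continue
        order.extend(ready)
        remaining -= set(ready)
    return order

deps = {'gcc':           {'glibc': 10},
        'glibc':         {'linux-headers': 10, 'gcc': 1},   # weak: old gcc works
        'linux-headers': {}}
print(build_order(deps))   # ['linux-headers', 'glibc', 'gcc']
```

the weak glibc-on-gcc edge is the one that gets dropped, yielding the instinctive order regardless of argument order.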

so, "if only Portage dependency information was to be trusted" is a bit harsh
in the end, emerge does a great job dealing with dependencies. it just lacks information.

Last edited by Lloeki on Wed Oct 04, 2006 2:13 pm; edited 1 time in total

Well, this is all really informative. In terms of implementation, it comes down to doing the best we can. In this case I reckon that means using emerge --pretend, as that's the best info we have. After all, it takes into account the user's current USE flag set, and gives a list of what we'll actually need to pull in, as opposed to what we might.

I prefer the algorithm in count_zero's script so I reckon I'll try building the list (for ordering) with Guenther's script and then doing the actual install with count_zero's.

One question: what does portage do if it gets a circular dependency like you both outline? It's just that I've never seen it happen. It's not hard to detect such a situation in any case.

I don't know, seeing how emerge -pve gcc and emerge -pve glibc give out the same result. Anyway, emerge has no other choice than to break the cycle, whether it does it haphazardly or willingly at some place is beyond my knowledge: a look at the source would be much helpful to understand the black magic, but I'm fairly new (albeit learning) to python.

Guenther, have you considered python as a language for your script? this is really a great language (even if I miss curly braces: they really help readability, all the more with pair highlighting in IDEs), and one certainly has to worry if an upgrade breaks python, since it will break portage too... so yes, relying on perl is not the thing to do, but python may be a good choice.

Well, yes and no: I have no experience with python, while I have much experience with Perl.

Another language to learn... though learning a new language itself is usually the smallest obstacle: learning the associated libraries/APIs and runtime environment is the real challenge.

For instance JAVA: The core language is very small and can easily be learned in a couple of hours (especially with experience in C++). But the wealth of available runtime libraries surely needs considerable time to master.

And I'm afraid, the same will hold for python, which also provides numerous runtime-libraries for the programmer to use.

Being a Perl guy, I admit python has a smaller, easier-to-learn core language, a more concise syntax, and generally looks more elegant than Perl.

I especially like its generators and the fact that all of its runtime libraries are based on exception handling.

But there are also drawbacks.

When first trying python on a Windows host, I encountered many serious problems in its UNICODE support, rendering it almost useless for internationalized applications. (Perhaps they have fixed that since then?)

I also dislike its habit of prefixing/postfixing special-purpose functions (such as constructors) with underscores. To me, it's rather a hint at which features the language designers forgot to think about in the first place than a well-designed syntactic construct.

Another issue is its rather idiosyncratic string-quotation mechanisms, which I also consider not the most elegant solution to the problem.

Finally, I like to make heavy use of anonymous functions in Perl.

And the lambda functions of python are so crippled that they are not really a replacement for that.

Summing up, I consider python to be good, but not excellent.

BTW, I think JavaScript is a largely underestimated language (regarding its potential).

It has amicably stolen many of the best features of Perl, but avoided inheriting Perl's bloat as well. It has a cleaner syntax than Perl, closer to C or JAVA, and allows easier integration into host applications than most other scripting languages.

And in comparison to other popular scripting languages like Lua, JavaScript provides much more expressive power.

I would really be happy to see JavaScript as a standard shell language instead of the Bourne shell.

But reality is different...

Lloeki wrote:

so yes, relying on perl is not the thing to do, but python may be a good choice.

Agreed.

But to be honest: I'm just too lazy to learn a new scripting language, only to improve a script which already works and does its job.

Also, this whole problem only arose after another improvement was added to the script: interruption-free operation.

Before that, the script simply stopped as soon as it encountered the first failing package, and the user was free to decide whether to edit the script file and remove any packages (such as dependent packages, a.k.a. "reverse dependencies") from it.

And finally, we still have count zero's excellent script!

I therefore suggest the following approach:

For the common cases, where the emerge -auDN world at the beginning of my guide worked without any problems, the current version of my script will do fine. This will usually be the case for installations where administrators run "emerge --sync && emerge -auDN world" in regular intervals anyway in order to keep their systems up-to-date.

For more specific cases, such as partially inconsistent systems, or if rebuilding some packages can be expected to fail in advance, use count zero's excellent script, which uses my script only as a dependency-list generator backend.

For the hardest cases where nothing else works, use BadPenguin's method of doing a complete stage-1 reinstall.

Hi all,
I ran the program (script) after having gone through the necessary preliminaries according to the
guide.
The list to be recompiled contained some 800 packages.
I know this is a lot, but sometimes you try something once and leave it in. Pure laziness on my part, and having 250 GB of which only 8% is used.
Some people wondered how long it would take:
from Oct 2, 21:17 to Oct 5, 02:48.
My box: Athlon XP @ 1447 MHz, 1 GB RAM, IDE HD.
I had a problem after reboot:
The kernel was compiled with gcc 3.4, the rest with gcc 4.1, so X didn't work.
I recompiled the kernel and then did

Code:

module-rebuild rebuild

When the nvidia-driver finished compiling there was a message on the screen:

and X still couldn't start.
I ran emerge --sync and a new nvidia driver was available: 1.0-8774.
I emerged that one, same error message.
Apparently something like this has happened to people before, and they always solved it by emerging a newer driver.
For me this didn't work, as you can see.
In desperation I downloaded the same driver from the nvidia site as a self-extracting file and ran it. No error messages and, lo and behold, it worked!
The kernel module is now located in a different folder.
I would like to know if I did anything wrong, and if so, what?
Gerard.

By the way, the generated script will do this in a recoverable way:
It can be aborted at any time by you, and will continue where it left off
when you re-run it. (The package where the script was interrupted will
have to be compiled again from its beginning, though.)

Isn't there a way to avoid this (The package where the script was interrupted will
have to be compiled again from its beginning)?

The reason for the error message was quite simple: the NVIDIA drivers I was using at that time were outdated, so a driver update was required in order for the NVIDIA drivers to work with the new kernel. (The "unknown symbol" problems arise from the fact that the NVIDIA drivers want to use old kernel functions which have been removed from newer versions of the kernel.)

So it's neither a problem of Portage nor my script; you just need to update your NVIDIA drivers.

Gerard van Vuuren wrote:

I ran emerge --sync and a new nvidia driver was available: 1.0-8774.
I emerged that one, same error message.

I'm not using binary NVIDIA drivers on the box where I'm writing this posting, so I can't check. But I remember there was an important change in the NVIDIA driver packages a couple of months ago: Before the change, there were two different ebuilds, one for OpenGL/GLX and one for the NVIDIA kernel driver. And since the change, a new ebuild has been provided (with a different package name) which combined both ebuilds into a single one.

Or was it exactly the opposite of what I've written here? Can't remember.

Anyway, look for those new ebuilds. The NVIDIA Hardware Acceleration Guide has also been updated accordingly, I presume.

Gerard van Vuuren wrote:

For me this didn't work, as you can see.

Because you only updated your old driver; but you need to unmerge it and install the completely new driver.

Gerard van Vuuren wrote:

No error messages and, lo and behold, it worked!
The kernel module is now located in a different folder.

One more remark:
Now when it boots X doesn't start until eth0 is finished.
It used to start before the starting of eth0 was visible on VT1.
Gerard

That could be the result of some bug fix in the service startup scripts.

Unless you have modified your X startup files to pass the -nolisten tcp option to the X server, the X server will listen on port 6000 for inbound connection requests from remote applications which want to use your display. (The xauth cookies should normally ensure that only authenticated remote applications can get access to your display.)

But in order to listen on network interfaces, the network drivers must have been brought up first. Which means the network has to start before X.

BTW, I always use -nolisten tcp in my X startup scripts, as I am not using X via network connections other than through SSH tunnels (which bypass the port 6000 connection method anyway; i.e. that port 6000 listening is not required at all for SSH X11 forwarding).

Note that X connections via port 6000 are also totally insecure, because they are not encrypted and thus only make sense (if at all) within a completely trusted LAN.

Isn't there a way to avoid this (The package where the script was interrupted will
have to be compiled again from its beginning)?

Unfortunately, not when using emerge.

The whole emerge/ebuild machinery is essentially nothing more than a bunch of scripts which run a package's Makefile with the right arguments and environment variable settings.

However, it is sometimes possible to restart a Makefile's actions from "within the middle" if the emerge's actions are done manually.

That is, in order to emerge some package xxx, instead of doing a

Code:

emerge xxx

you can do the following:

Code:

ebuild $(equery which xxx) compile

If there is an error during the compile, you can go into the ebuild's working directory (usually /var/tmp/portage/xxx/work) and run make again there (after fixing the problem), which should continue compiling the remaining source files.

As soon as the compilation finished successfully, you need to run the following commands to complete the job:

Code:

ebuild $(equery which xxx) qmerge

which will install the compiled binaries into your live filesystem

Code:

ebuild $(equery which xxx) clean

which will remove the build directory

Code:

emerge --noreplace xxx

which will register the package in the world list of installed packages

Code:

emerge --ask --clean xxx

which will remove old versions of the package

Admittedly, this assumes the Makefile of the package has been written in a clean way, which will usually be the case. But there might also be some packages where the above approach will not work; in such cases it is not possible to restart a make "from the middle".

That could be the result of some bug fix in the service startup scripts.

you might just be right, there's that:

Code:

$ cat /etc/init.d/xdm
(snip)
depend() {
    need localmount

    # this should start as early as possible
    # we can't do 'before *' as that breaks it
    # (#139824) Start after ypbind and autofs for network authentication
    after bootmisc readahead-list ypbind autofs openvpn gpm netmount
    before alsasound net.lo

    # Start before X
    use acpid hald xfs
}
(snip)

so, if you have netmount, which evidently depends on net, x will start after netmount, and thus after net.
you can adjust some things in /etc/conf.d/rc, namely RC_NET_STRICT_CHECKING, but this might badly affect some things.
or remove netmount from startup altogether, or, if x doesn't really need netmount but you mount some remote folder, remove netmount from the after line.

Last weekend, I used the script to update my system to the new GCC and everything seems to be working fine! I especially like the hands-free aspect, and it's definitely a time saver. No more baby-sitting portage with "--skipfirst" to get around the troublesome packages.

Although I did manage to find one package that would trip up the script: wargus. When you emerge it, the ebuild halts while waiting for the game CD. Luckily it was at the end of the compilation run and I noticed it. I didn't have the game CD handy, however, so I altered the state file of your script to skip ahead a package. Perhaps a "--skip-first" option would be handy after all?

I'm not sure how to handle this from within a batch script. If there were a --batch switch for emerge or ebuild, I would use it.

I could redirect the output of yes as the standard input for the ebuilds, or just send newline characters.

But that won't help in such cases, because operator action (such as inserting CDs) is actually required; not just pressing Enter.

Doogman wrote:

so I altered the state file of your script to skip ahead a package.

Which is the right thing to do in such cases.

Doogman wrote:

Perhaps a "--skip-first" option would be handy after all?

That's a good point - I'll consider it. But skipping is not exactly what is wanted in this scenario: if a package is skipped by simply incrementing the index in the state file, it is considered to be "compiled and up to date". This means the admin has to keep track of such skipped packages and manually re-emerge them later.

Perhaps I should create another package list, where manually skipped packages will be recorded as a reference for the administrator later?

However, until I implement something, incrementing the state file entry manually is the easiest way to skip a package.

I'll also post a description of the state file contents to this forum.

Motivated by a posting from Doogman, I decided to publish a description of the state file which is maintained automatically by my script, for the reference of those who want to hack the script or modify its behaviour.

The state file has the same basename and filesystem location as the generated script, but with the string ".state" appended to the basename and a dot (".") prepended to it.

The state file is a normal UNIX format text file.

The format is simple:

The first line contains the state file version number. This version number must match the internal version number of the script which is in control of the state file. The script will consider the state file to be outdated if the version numbers do not match, and will ignore its contents.

The remaining lines of the state file have only a specific meaning in the context of a specific version number in the first line.

In the following, there is a description of the state file contents for state file version number 1.13:

Line 1: The version number as stated above.

Line 2: The GCC version which is currently used for compilation. If this does not match the actual default compiler version, the state file contents will be ignored and recompilation of all packages will restart from the beginning. This is a safeguard for situations where a compiler version update is made before the script has finished its execution, which should normally not be the case.

Line 3: A line containing 3 unsigned integer values, separated by a single space character. The values have the following meaning:

Progress counter. This is the item index number of the last package emerged successfully. When re-running the script after an interruption, it will skip all packages with item index numbers less than or equal to this value. This means, an effect similar to emerge's --skipfirst can be emulated by just manually incrementing this state file value before re-running the script.

The number of packages recompiled successfully in the current pass of the script.

The number of failing packages in the current pass of the script. If this number is nonzero when the current pass of the script is done, another pass will be run for the failing packages. But this will only be done if at least one package has been rebuilt successfully in the last pass. This constraint has been added to eliminate the possibility of infinite loops.

The package index numbers mentioned above refer to a specific package name and version each.

The associations are directly listed in the generated script file in lines starting with the word item.

The format of those lines is simple:

item <index> <package_name_and_version>

where <index> is the item index number, and <package_name_and_version> is the full package name and package version.
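For those who want to script against the state file, here is a small Python sketch of a parser for the version 1.13 layout described above (the sample contents are made up; skip_current_package emulates the manual --skipfirst trick by bumping the progress counter):

```python
def parse_state_file(text):
    """Parse the version 1.13 state file layout described above.

    Line 1: state file version, line 2: gcc version, line 3: three
    space-separated unsigned integers (progress index, successes, failures).
    """
    lines = text.splitlines()
    version = lines[0].strip()
    gcc_version = lines[1].strip()
    progress, ok, failed = (int(n) for n in lines[2].split(" "))
    return {"version": version, "gcc": gcc_version,
            "progress": progress, "succeeded": ok, "failed": failed}

def skip_current_package(state):
    """Emulate emerge's --skipfirst by incrementing the progress counter."""
    state["progress"] += 1
    return state

sample = "1.13\n4.1.1\n42 40 2\n"           # hypothetical state file contents
state = parse_state_file(sample)
print(state["progress"])                     # 42
print(skip_current_package(state)["progress"])   # 43
```

A real tool would of course also verify the version number in line 1 before trusting the rest, just as the script itself does.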

(Sorry, this might be in a wrong thread, but I was reinstalling gcc and after that using Guenther's script..)

Ok, I have a really weird problem: emerge --pretend --update --deep --newuse world wanted to recompile gcc 4.1.1 with the "GTK" flag, so I did that. My system is built with the 2006.1/desktop profile, so there was no gcc upgrade, just a reinstall. After rebooting, I noticed that ntfs-3g mounts are no longer working; I get an error message:

"Error: Volume name could not be converted to current locale: Invalid or incomplete multibyte or wide character"

It still mounts the volume, but "ls -l -a" doesn't show anything, not even the dot-directories. The volume is not corrupted, I checked it in Windows. Also kernel's read-only ntfs mount works fine.

And here are two other symptoms that are perhaps related (they appeared at the same time):

It was not easy, but I got it done. I am not sure exactly why, but gcc-config-1.3.14 did not fully activate gcc-4.1, although it did say gcc-4.1 was the default. The problem was that parts of kde were still looking in the /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.6 folder for libraries and complaining that the ABI was wrong. After installing eselect-compiler-2.0.0_rc2-r1 (which said that gcc-3.4.6 was the default) and using it to select gcc-4.1, everything worked. I did have this script rebuild my packages again just to be sure, and all but 5 packages emerged:

I have just used Guenther's script for a second time to recompile my entire system (once for gcc 4.1/gentoo 2006.1 and another time for glibc 2.5). Whilst it was compiling I developed the following script to find out how long it would take to finish.

Code:

#!/bin/bash
# Script to find out how long it will take to finish recompiling
# all the packages in Guenther Brunthaler's recompile-remaining-packages
# script (http://forums.gentoo.org/viewtopic-t-494331.html)
#
# Gary Macindoe
# 29-Oct-2006

export recompile_script="/root/recompile-remaining-packages"

genlop_c="$(mktemp)" || exit 1
export genlop_c

genlop -nc > "${genlop_c}"

# Functions to add dates together

# parse_date <string>
# Takes a lingual date as the first argument and parses
# it into environment variables.
# The variables set by this function are:
# SECS: the number of seconds
# MINS: the number of minutes
# HOURS: the number of hours
# DAYS: the number of days
# WEEKS: the number of weeks
# YEARS: the number of years
# The function returns 0 on success, 1 on failure.
function parse_date () {
    local number token

# add_dates <days1> <hours1> <minutes1> <seconds1> <days2> <hours2> <minutes2> <seconds2>
# Takes two dates and adds them. Carries over seconds, minutes and hours to minutes, hours and days.
# The variables set by this function are:
# TOTAL_DAYS: the total number of days
# TOTAL_HOURS: the total number of hours
# TOTAL_MINS: the total number of minutes
# TOTAL_SECS: the total number of seconds
# Returns 0 on success, 1 on failure.
function add_dates () {
    if [ $# -ne 8 ];
    then
        return 1
    fi