This script will clean out any tarballs in your /usr/portage/distfiles directory where a newer one is proven to exist. As the presence of a newer tarball for a given piece of software usually implies that you have emerged a newer version, it also implies that you don't need the tarball for the older version sitting around on your disk simply wasting space! No more ... they will be cleaned up.

Notes: this script runs in a "pretend" mode by default, where files that would be deleted are displayed, but not actually deleted. To clean out the files, pass the --nopretend parameter and those old source files will be wiped up quicker than a flash! It only operates on the following types of file: .tar.gz, .tar.bz2, .tgz. Of course, I won't be held responsible if it doesn't work as expected!!! But I've tested it and it works just fine for me, and of course it will not delete files unless you explicitly tell it to. If you find it useful, do check back because (as usual) I will edit the post when I make improvements to the script. Code follows:

Would delete: AfterStep-1.8.10 in favour of AfterStep-1.8.8
All these are needed for X
Would delete: X420src-1 in favour of X420src-2
Would delete: X420src-2 in favour of X420src-3
freetype 1.3.1 is still needed
Would delete: freetype-1.3.1 in favour of freetype-2.0.8
still default compiler
Would delete: gcc-2.95.3 in favour of gcc-3.0.3
Maybe other baddies too. If I used this, though, my poor dialup would be working overtime

Not really, just things I haven't done yet. The way it checks version numbers is flawed (it needs to break into major/minor numbers and calculate properly), and there are various other things to be done. Basically, it needs more intelligence - I will do it when I get the time, and the script will get a bit larger. I trust you will find the quick hack introduced into the script useful, which allows you to make it ignore certain files - that should tide you over until the next revision.
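To illustrate the version problem: comparing "1.8.10" and "1.8.8" as plain strings puts 1.8.10 first, which would explain the AfterStep line reported above. Here is a minimal Python sketch (not the script itself) of splitting a distfile name and comparing the version numerically. The function names are made up for illustration, and it assumes the version is the last '-' separated field, so revision suffixes such as -r1 and tags such as _rc1 are not handled properly.

```python
import re

# The three suffixes the script is said to operate on.
SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz")


def split_distfile(filename):
    """Split 'AfterStep-1.8.10.tar.gz' into ('AfterStep', '1.8.10').

    Returns None when the suffix is unsupported or no version is found.
    """
    for suffix in SUFFIXES:
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            break
    else:
        return None
    # Assumption: the version is the last '-' separated field
    # and starts with a digit.
    name, sep, version = stem.rpartition("-")
    if not sep or not version[:1].isdigit():
        return None
    return name, version


def version_key(version):
    """Turn '1.8.10' into (1, 8, 10) so components compare as numbers."""
    return tuple(int(part) for part in re.findall(r"\d+", version))
```

With this, version_key("1.8.10") sorts after version_key("1.8.8"), whereas the raw strings sort the other way round.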

Quote:

Reimplement it in Python and it could become part of emerge

Thanks, but I think Perl is a much better language than Python - I don't usually touch Python. On the other hand, it's a simple program (for now) so how hard could it be?

Ah, coming closer to making this work properly now. I knocked up a proper version checking system for my new script which unmerges old builds lying around on the system. Just sending this post to let the watchers know that the script here will soon be adapted so it works perfectly (without that nasty masking kludge)!

This is a really neat script. I'm basically posting here so I'll get an email when you post the version that you're currently working on. I did a cursory search and I have a few distfiles that would be eliminated with your current script that I wouldn't like to get rid of just yet.

Thanks for the comments. I would appreciate it if people could send me a list of those packages (by private message, no need to clutter up this area with postings) which should be protected from deletion (some have been discussed here such as the XFree86 collection, freetype and so on). I would like to classify them into two categories:

* Packages which have logically grouped files (i.e. X420src-1, X420src-2, X420src-3). In this case we don't mind if an older version is deleted (e.g. XFree 4.1) but the script needs to treat the collection of files as one.

* Packages which should be left because some packages still depend on them (freetype 1.x, gcc 2.95 and so on).

Ideally, I want to make the script more SLOT-aware so that it can automatically determine if a distfile is still relied upon by other builds present on a user's system, and avoid deleting it. That might take a little while though ... if anyone understands the slots system very well then by all means get in touch. I think perhaps a workable way of doing it would be to scan through the files in /var/cache/edb/dep looking for occurrences of a package which *must* be equal to version so and so. If it is present then leave the distfile alone. Then again, that might be flawed and perhaps there is a much better way ...
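As a rough idea of what that scan might look for, here is a small Python sketch that pulls exact-version pins out of a DEPEND-style string. It assumes such a pin is written as '=category/name-version', and it says nothing about how the files under /var/cache/edb/dep are actually laid out; you would have to feed it the dependency strings yourself. The function name and the category in the usage note are made up, and wildcard pins ending in '*' are ignored.

```python
import re

# Assumption: an exact-version dependency looks like '=category/name-version',
# optionally followed by a '-rN' revision. Wildcard pins are not matched.
EXACT_ATOM = re.compile(
    r"^=(?P<category>[\w+.-]+)/(?P<name>[\w+.-]+?)-(?P<version>\d[\w.]*(?:-r\d+)?)$"
)


def pinned_versions(depend_string):
    """Return {(name, version)} for every '=' atom in a DEPEND-style string."""
    pins = set()
    for token in depend_string.split():
        match = EXACT_ATOM.match(token)
        if match:
            pins.add((match.group("name"), match.group("version")))
    return pins
```

For example, pinned_versions("=dev-libs/aux_prog-1.0 >=sys-libs/glibc-2.1") yields only the aux_prog pin, since the glibc entry is a minimum version rather than an exact one.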

Wouldn't it be better to just find what packages are unprotected and delete those distfiles? I know emerge -cp package can list multiple versions of package as "protected." It seems the functionality to determine if something is safe to remove is already built into emerge, so why duplicate it?

I think you're missing the point. Firstly, the script is about removing distfiles, not about any form of package management per se. The objective originally was this: to remove distfiles that are superseded by those of later versions (as a result of newer versions of software). Such distfiles are completely useless in that they will never be used again unless, for some reason, you want to specifically build an older version of the package which still has an ebuild available. As it stands, it has absolutely nothing to do with what you have emerged or not (as evidenced by the complete absence of the emerge command in the entire script).

As for the idea posed in my last comment, I'll explain with an example:

1. Let's say you install cool_prog-r1 which has dependencies on aux_prog-1.0 and another_lib-1.5 to build
2. Later on your Portage tree has updated ebuilds for aux_prog-1.5 and another_lib-2.0. You emerge them. In the meantime the older versions remain on your system because cool_prog-r1 has *explicit* dependencies (expressed in the ebuild) on the older versions and the newer ones would break this particular package. This is what I understand SLOTS are for; otherwise Portage would simply break all kinds of software.
3. Because you emerged the newer versions of aux_prog and another_lib the newer distfiles are present, as well as the old ones. Fine.
4. You run my script to clean out older distfiles. It sees that aux_prog-1.0 is older than aux_prog-1.5 and that another_lib-1.5 is older than another_lib-2.0 so it deletes the older versions.
5. You decide you want to recompile cool_prog-r1 with some different optimisations or settings. Now Portage has to go and download the older versions of aux_prog and another_lib for *nothing*!!! It's just a waste of bandwidth.

Do you see what I'm getting at now? The idea is a bandwidth-saving measure. Because we often like to recompile our packages and because many packages are dependent on very specific versions of other packages, the idea is merely to avoid deleting distfiles which would be used if you recompiled a build you already have on your system. IMO, the rationale behind this is sensible: if you've already compiled something, chances are that you may want to do it again in the future (not necessarily because a new version came out). Whereas the script as it stands will just delete any older version of a distfile blindly.

Quote:

It seems the functionality to determine if something is safe to remove

Like I said before, my script doesn't remove packages. It removes the files containing the source code from which they were built. This comment is moot.

So the logic would be:
1) Aha, I see a distfile that is older
2) But wait a moment, the user has a package currently emerged which just happens to have a DEPEND or RDEPEND line referencing a package to which this older distfile is related. The user might want to rebuild it at a later stage and not appreciate the emerge process having to get that distfile again - so let's skip that one.
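In Python terms, the two steps might be sketched like this. It is only an illustration of the idea, not the script: select_deletions is a made-up name, and the protected set is assumed to have been worked out already from the dependencies of whatever is installed.

```python
def select_deletions(distfiles, protected):
    """Pick older distfiles to delete, skipping protected (name, version) pairs.

    distfiles: list of (name, version_tuple), e.g. ("aux_prog", (1, 0)).
    protected: set of (name, version_tuple) that installed packages still
               reference and might need again on a rebuild.
    """
    # Find the newest version seen for each name.
    newest = {}
    for name, version in distfiles:
        if name not in newest or version > newest[name]:
            newest[name] = version

    doomed = []
    for name, version in distfiles:
        if version == newest[name]:
            continue  # never delete the newest copy
        if (name, version) in protected:
            continue  # step 2: a rebuild would have to fetch this again
        doomed.append((name, version))
    return sorted(doomed)
```

Using the earlier example, with aux_prog 1.0 and another_lib 1.5 in the protected set, only unrelated older distfiles would be selected; with an empty protected set the function falls back to the current "delete anything older" behaviour.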

None of this functionality is in Portage. Portage doesn't have a facility to remove old distfiles. Such a facility saves you from tediously trawling through the distfiles by hand to save disk space, trying to guess which ones are completely redundant, and judging by some of the comments I got, it is obviously something useful. I cannot see how emerge -cp is of any use whatsoever in achieving this goal, as it doesn't list dependencies. It only lists packages which are safe for removal because newer versions have been emerged. I'm not interested in that, because I am not trying to rewrite Portage here ....

Furthermore, why would you want to delete a distfile just because it isn't emerged on your system at the time? I have plenty of downloaded distfiles which aren't installed on my Gentoo box (I regularly use the -f parameter). That doesn't mean I won't do it tomorrow, or that I won't want to burn these distfiles on a CD to save time at another location. And it wouldn't be very nice if you used a shared distfiles directory from a server (as I do). I have Gnome distfiles in my share now which I don't intend to install on my main box yet, but maybe I want to from another. Having said that, it could be useful in some cases as an optional parameter. The problem is then you have to reliably map distfile names <-> package names (they're not necessarily the same and one package might have more than one distfile related to it - such as X). Those are the sort of problems I was hinting at. If you know how to do this cleanly and efficiently then I would like to know ...

Quote:

The problem is then you have to reliably map distfile names <-> package names (they're not necessarily the same and one package might have more than one distfile related to it - such as X). Those are the sort of problems I was hinting at. If you know how to do this cleanly and efficiently then I would like to know ...

You could easily use /usr/portage/category/package/files/pack-ver.digest to accomplish this. For each version of the package it contains the actual LIST of the files associated with that package. No more guessing which files XFree 4.2 consists of...
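A quick Python sketch of reading such a list, in case it helps. It assumes each line of the digest file has four whitespace-separated fields (hash type, checksum, filename, size); check a real digest file before relying on that. The function name is made up for illustration.

```python
def distfiles_from_digest(digest_text):
    """Return the distfile names listed in a digest file's contents.

    Assumption: each non-empty line has the form
    '<hash type> <checksum> <filename> <size>'.
    Lines that do not have exactly four fields are skipped.
    """
    names = []
    for line in digest_text.splitlines():
        fields = line.split()
        if len(fields) == 4:
            names.append(fields[2])
    return names
```

Reading one digest per installed version would give the package -> distfiles mapping directly, so grouped files such as the X420src set could be treated as one unit.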

Very good script! I'm also posting here to get a notice when you post the new version of your script.

~Progster

Yup, I've been emailed about this too! I started, but I'm afraid it's on hold at the moment, lost the blasted code from before so I had to start again. I will get this done one day (hopefully before the second coming) ...