Over time, any system builds up cruft - files and directories that don't belong to any package.
This script tries to list all the cruft on a system with as few false positives as possible, to help you keep your system in good working order.
Note that plenty of packages drop extra files all over the place; any extra help is appreciated.
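To make the idea concrete, here is a minimal sketch of the general technique (not the actual script): collect every path recorded in the packages' CONTENTS files, list what is really on disk, and print the difference. The function name list_cruft and its two arguments are made up for illustration, and it assumes each CONTENTS line has the form "type path ...", so paths containing spaces will be mishandled.

```shell
# Sketch only: print files under a tree that no CONTENTS file claims.
# list_cruft is a hypothetical helper, not part of the real script.
#   $1  directory tree to scan
#   $2  directory holding the per-package CONTENTS files
list_cruft() {
    local tree="$1" db="$2" owned found
    owned="$(mktemp)"
    found="$(mktemp)"
    # Second field of each CONTENTS line is assumed to be the path.
    find "$db" -name CONTENTS -exec cat {} + \
        | awk '{ print $2 }' | LC_ALL=C sort -u > "$owned"
    # Everything that actually exists (regular files and symlinks).
    find "$tree" \( -type f -o -type l \) -print | LC_ALL=C sort -u > "$found"
    # Lines only in the first list: on disk but owned by nothing.
    LC_ALL=C comm -23 "$found" "$owned"
    rm -f "$owned" "$found"
}
```

Like the real script, this only prints a list; it never touches the filesystem.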

NOTE: This is not the whole script. I got bored of updating it in two places, so this is just a sample. Follow the link below for the actual script.

#
# name: python_version
# desc: run without arguments and it will export the version of python
# currently in use as $PYVER
#
python_version() {
local tmpstr
python=${python:-/usr/bin/python}
tmpstr="$(${python} -V 2>&1 )"
export PYVER_ALL="${tmpstr#Python }"

# Files and directory trees to omit, ordered alphabetically.
# If a package drops files or directories in more than one place, move its
# definitions to the appropriate stanza. ldconfig symlinks go in the last
# stanza. Put large lists of single files next to the CONTENTS listing code.
PRUNE="
/boot
/dev

Seems pretty straightforward. All you have to do is add files or directories to the ignore section for some of your own scripts or whatever. Is there a way to actually remove those files with this script as well? Maybe pass a --real flag or something to do the damage?

It is actually quite useful - I just added parsing of /var and it picked up a load of cached man pages I no longer have the originals of, which is nice. I think since I started writing it I've saved maybe 200MB of hard disk space, which isn't bad.

Basically, you can't - except by deleting it and seeing what breaks.
Generally files which are left over from upgrades will be safe to delete.
This is of course a work in progress as it has only been tested on a few systems, which is why I would appreciate help to eliminate false positives.

If you have genlop installed, you can use my strategy: look at the mtime on the offending files and then run genlop -l | grep "Jan 26 10" (for instance) to see if they landed just before a particular ebuild finished merging. If that ebuild is the latest version of a package on your system then my script needs a new entry; otherwise it's a holdover from an old version and is probably safe to delete.
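A rough sketch of that genlop strategy, in case it helps. The helper names mtime_pattern and merges_near are made up here, date -r FILE is the GNU form, and the "Mon DD HH" pattern is only an assumption based on the grep example above; adjust the format string if your genlop pads days differently.

```shell
# Sketch only: turn a file's mtime into a grep pattern like "Jan 26 10".
# mtime_pattern and merges_near are hypothetical helper names.
mtime_pattern() {
    # GNU date: -r FILE prints the file's last modification time.
    LC_ALL=C date -r "$1" '+%b %d %H'
}

# Show the merges genlop logged in the same hour as the file's mtime.
merges_near() {
    genlop -l | grep -- "$(mtime_pattern "$1")"
}
```

For example, merges_near /etc/some.conf would list the ebuilds that finished merging in the hour that file was last modified.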

I think this is something that really should be implemented within emerge (if it isn't already; I didn't check). Simply go through ALL not installed packages and remove all existing files that are linked to them. Feature request?

Simply go through ALL not installed packages and remove all existing files that are linked to them.

Well, *that* wouldn't work - one problem of a source distro is that there's no way to know what a package will install without actually merging it.

But I'm waiting to get this a bit more finely tuned before getting it into portage, which is why I would appreciate people testing it. Remember, it doesn't actually do anything to your filesystem, so it is safe to run and see what it outputs.

Last edited by ecatmur on Sun Mar 28, 2004 3:40 pm; edited 1 time in total

After running this, in addition to a whole lot of other stuff, it's telling me to delete most of my .conf files and pretty much all of nessus. I've had this system up for exactly 2 days now, so I doubt this is accurate?

If it's listing .conf files, that's a potential problem... can you post or pm me what they are, and which packages they pertain to?
If it's listing part of nessus, then I imagine nessus is being very ill-behaved and dumping files all over your filesystem. I say this because the nessus ebuild lacks a postinst section, implying any extraneous files are created by nessus when it is run.

I could be wrong, though... but I won't know unless you tell me what the files it lists in error are.

If it's listing .conf files, that's a potential problem... can you post or pm me what they are, and which packages they pertain to?

I ran the above script, and I got a few config files in the list, which is of concern. They are:
1. all config files in webmin
2. /etc/lilo.conf (I am using lilo)
3. /etc/nessus/nessusd.conf
4. /etc/mplayer.conf
5. /etc/proftpd/proftpd.conf

I found these by filtering the result for config files. I wonder whether some of the other files listed by the script are needed by some active package on my system.

OK... sorted lilo, nessus and proftpd. I don't have webmin so I've just put in a rule to ignore the whole of /etc/webmin and /var/webmin - if this can be improved on let me know.
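For anyone wondering what such an ignore rule amounts to, here is a minimal sketch using find's -prune. It is not how the real script is written; list_unpruned is a made-up name, and the prune paths are given relative to the root being scanned.

```shell
# Sketch only: list files under a root, skipping whole directory trees.
# list_unpruned is a hypothetical helper, not part of the real script.
#   $1     root to scan
#   $2...  trees to ignore, relative to the root (e.g. /etc/webmin)
list_unpruned() {
    local root="$1" p
    shift
    # Replace each prune path in "$@" with "-path ROOT/PATH -prune -o".
    # The for loop expands "$@" once, so rewriting it inside is safe.
    for p in "$@"; do
        set -- "$@" -path "${root}${p}" -prune -o
        shift
    done
    find "$root" "$@" -type f -print
}
```

With a real root this would be called as list_unpruned "" /etc/webmin /var/webmin, although scanning from / like that also walks /proc and friends unless they are pruned too.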

Is this a bug with the script or a bug with the ebuilds?

Don't most ebuilds install files in /etc/ with knowledge of where they came from? I think rather than make the script exclude these programs, you should use the script to report bugs on the files that don't keep track of what they dump in /etc/ no?

Great script, ecatmur. I have just run it and got rid of quite a lot of stuff left over by portage, especially in /etc. I still do not understand why /etc is not cleaned after uninstalling a package. It can be useful sometimes if you tweak config files, uninstall the package and then realize you want it back, but most of the time it just leaves garbage, especially when upgrading.

The first two files are really important; they should not be removed. /etc/kernels is the directory where genkernel stores the configs of the kernels it built. I think most people want to keep that.
Your script also lists log files in the /var/log directory; I would not consider those files cruft.

All of these files are active, except /var/log/mail*, but I think those are due to net-mail/ssmtp being installed on my system (it does not run, but it is a requirement for at). I know about these files: /var/log/emerge_fix-db.log is the log file created when running fixpackages (needed for people with many binary packages), and /var/log/privoxy/* are due to privoxy being run. I do not know about the other files, but they are being used. I am using the sysklogd-1.4.1-r10 log daemon.

Also, most of these files above exist with .0.bz2, .1.bz2, ..., .6.bz2 extensions. These are probably created by /etc/cron.daily/syslog.cron script. Your script should probably take this into account.
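One way the script could take this into account, as a sketch only: treat NAME.N.bz2 as belonging to NAME whenever the live log NAME still exists. The function name is_rotated_copy is made up, and the single-digit pattern just mirrors the .0.bz2 to .6.bz2 range mentioned above.

```shell
# Sketch only: succeed if FILE looks like a rotated copy (NAME.N.bz2)
# of a log NAME that still exists. is_rotated_copy is a hypothetical name.
is_rotated_copy() {
    local f="$1" base
    case "$f" in
        *.[0-9].bz2) base="${f%.[0-9].bz2}" ;;  # strip the ".N.bz2" suffix
        *) return 1 ;;
    esac
    [ -e "$base" ]
}
```

A rotated file whose base log has disappeared would still be reported, which seems like the right behaviour for logs of packages that are long gone.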

The script also lists files in /var/run/, such as /var/run/gpm.pid; these files contain the PIDs of daemons running on my system. Therefore the /var/run/ directory should probably be excluded.

OK... sorted lilo, nessus and proftpd. I don't have webmin so I've just put in a rule to ignore the whole of /etc/webmin and /var/webmin - if this can be improved on let me know.

Is this a bug with the script or a bug with the ebuilds?

The script can always be improved. IIRC about Webmin, it does loads of weird stuff so it'd be hard to keep track of what is actually supposed to be in /etc/webmin and /var/webmin. However if someone could write a piece of code (pref. bash, though perl or python is OK too) to list the files that are supposed to be there that'd be useful.

Quote:

Don't most ebuilds install files in /etc/ with knowledge of where they came from? I think rather than make the script exclude these programs, you should use the script to report bugs on the files that don't keep track of what they dump in /etc/ no?

If I did that, I'd have no time to actually maintain the script (or do anything else). Some programs spew files around without good reason, but for quite a lot there's no real alternative; also of course there are config files the admin has to create - these belong to the package but aren't in its contents list.

Better handling of pidfiles - it guesses pidfiles for started services both by appending .pid to their service name and by grepping the service file for start-stop-daemon. Still, there may be some pidfiles I've missed - but for instance, kdm drops a pidfile there which you don't want to remain there after deinstalling kde, so I'm not excluding the whole of /var/run.
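Roughly, the two guesses described above look like the following sketch. This is an illustration rather than the code in the script: guess_pidfiles is a made-up name, the run directory default is an assumption, and it only catches --pidfile arguments that sit on the same line as start-stop-daemon.

```shell
# Sketch only: print candidate pidfiles for one init script.
# guess_pidfiles is a hypothetical helper, not part of the real script.
#   $1  path to the service's init script
#   $2  run directory (assumed default: /var/run)
guess_pidfiles() {
    local initscript="$1" rundir="${2:-/var/run}" name
    name="${initscript##*/}"
    # Guess 1: <service name>.pid in the run directory.
    printf '%s\n' "${rundir}/${name}.pid"
    # Guess 2: any --pidfile argument on a start-stop-daemon line.
    grep 'start-stop-daemon' "$initscript" 2>/dev/null \
        | grep -oE -- '--pidfile[[:space:]=]+[^[:space:]]+' \
        | sed -E 's/^--pidfile[[:space:]=]+//'
}
```

Pidfiles written by the daemon itself, without going through start-stop-daemon, are the ones a guess like this will miss.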

Unfortunately I don't have any fonts.list files on my system so I don't know where they turn up. Could you post where they are found on your system?

You got me thinking here: how does portage currently deal with emerge unmerge?

Doesn't it keep a sort of diff file log for things that are created as emerge goes about the file system?

Next question: if it does, why the heck doesn't emerge unmerge remove these things then?

The great majority of the files a script like this catches are files that were created after the install, i.e. files that are needed during operation of the software package. For example, a game you install will create a file containing high scores for all users on a machine. The same goes for any file you change after install: unmerge looks at timestamps and removes only files whose timestamps are consistent with the package install time. I do not like this behavior very much, because if you, for example, touch a binary (or any file), it will be left behind after unmerge.
Then there is the special case of kernel sources, where you have to perform the compile explicitly. Since the compile produces new files, unmerging the sources will not remove the directory containing them; this has to be done manually.
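The timestamp check described above can be imitated with a small sketch. It assumes that a package's CONTENTS file has lines of the form "obj PATH MD5 MTIME" and that paths contain no spaces; the function names are made up, stat -c %Y is the GNU form, and this is not portage's own code.

```shell
# Sketch only: compare a file's current mtime with the one recorded
# at merge time. Both function names are hypothetical.

# Print the mtime recorded for PATH ($2) in CONTENTS file ($1).
recorded_mtime() {
    awk -v p="$2" '$1 == "obj" && $2 == p { print $NF; exit }' "$1"
}

# Succeed if the file was modified after the merge, i.e. an unmerge
# based on timestamps would probably leave it behind.
# Returns 2 if the file is not listed at all.
mtime_changed_since_merge() {
    local recorded actual
    recorded="$(recorded_mtime "$1" "$2")"
    [ -n "$recorded" ] || return 2
    actual="$(stat -c %Y "$2")"   # GNU stat: mtime in seconds since epoch
    [ "$recorded" != "$actual" ]
}
```

Running this over a package's CONTENTS before unmerging would show in advance which of its files have been touched since the install.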

So, I think it is mainly because a packaging system can only monitor the files belonging to a package while installing it; any files created at run time are files of unknown origin to the packaging system.

I like the script. You just helped me clean about half a gig of stuff off my drive. But it's generating a list of over 16,000 files which do not need deleting, especially a lot of stuff in /etc, some stuff in /usr/bin, lots and lots of webmin and usermin stuff and lots of stuff in /var.