The new version 9.0 of squash_dir now also supports sys-fs/unionfs-fuse which is new in the portage tree.

One advantage of this is that now you do not get any problems with new kernels if the aufs2 patch is not yet ported: The existing in-kernel fuse support suffices for unionfs-fuse (this was already the case for funionfs, but funionfs is not in the official portage tree).

Since I could not find the download address in this script immediately, here it is once more: initscripts

Hi,
I tried your script for my portage tree (squashfs + unionfs-fuse) and I'm a little bit surprised that rsync tries to delete a lot of files with the "_HIDDEN~" ending. Each run of emerge --sync / eix-sync produces more of these file deletions....

This is a side effect of unionfs-fuse: This tool seems to mirror the directory .unionfs from the writable branch. This is not good: If somebody knows a way to avoid this, please let me know. I checked the unionfs-fuse manpage for a corresponding option but did not find one (I did not yet inspect the source or other documents).

So far, the only workaround which I know is to add --exclude=/.unionfs to the PORTAGE_RSYNC_OPTS.
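A minimal sketch of this workaround, assuming the option is set in /etc/make.conf. It uses PORTAGE_RSYNC_EXTRA_OPTS (the variable named in the squash_dir documentation, as mentioned later in this thread) so that Portage's default rsync options stay untouched; adapt it if you already set that variable.

```shell
# /etc/make.conf (fragment)
# Keep rsync away from the .unionfs directory that unionfs-fuse
# mirrors from the writable branch.
PORTAGE_RSYNC_EXTRA_OPTS="--exclude=/.unionfs"
```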

squash_dir now has a new name and has become a package of its own: You can find the current tarball on the same webpage as previously (but under a new name reflecting the current version). Please copy that tarball into your $DISTDIR so that the ebuild will later not download it again. In the tarball you will also find a corresponding ebuild and INSTALL instructions (no matter whether you want to use the ebuild or not).

Please also read the ChangeLog:

The most important change is that squash_dir will now create the squashfile and clear the original directory when the squashfile did not exist! This feature, requested previously in the thread, is now implemented, but please be aware that it is a dangerous feature if you start with a wrong configuration (although several sanity checks are made, of course).

The only problem I have in my environment (AMD64 with 32bit-chroot using the same /usr/portage) is that unionfs-fuse doesn't work very reliably. In such a case /usr/portage is no longer accessible and I have to restart /etc/init.d/squash_portage. This happens one or two times per day without any error messages in syslog. It looks like nfs when the server disappears.

The only problem I have in my environment (AMD64 with 32bit-chroot using the same /usr/portage)

How do you do that? Using mount --bind? If this mount --bind happens before the (re)start of /etc/init.d/squash_*, the mounting in that script will of course have no effect in the chroot. I am calling

Code:

mount --make-shared /

(in another initscript) before calling mount --bind. This should cause the mount to propagate to the chroot although it may lead to some other problems to make / shared... On the other hand, I used the script only with aufs2: Maybe mounts by fuse are not propagated despite --make-shared, I did not try now. If this is the case, I don't know how your problem could be solved.

I know when it happens (due to a cronjob checking /usr/portage periodically)

Only checking? It does not mount/umount something or perhaps restart some initscripts (perhaps openrc thinks, squash_* should be restarted, for some reason)? Did you try mount --make-shared / as I suggested? (You would need to do this and the mount --bind before starting squash_*)
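The ordering described above can be sketched as follows. This is only an illustration: the chroot path /chroot32 and the helper names run and bind_portage_into_chroot are made up, and the snippet would have to run as root before /etc/init.d/squash_* is started. Whether fuse mounts propagate this way is untested, as noted above.

```shell
# Hypothetical chroot location; adjust to your setup.
CHROOT=${CHROOT:-/chroot32}

# run CMD...: execute the command, or only print it if DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Make / a shared mount first, then bind /usr/portage into the chroot,
# so that later mounts on /usr/portage can propagate to the chroot.
bind_portage_into_chroot() {
    run mount --make-shared / &&
    run mount --bind /usr/portage "$CHROOT/usr/portage"
}
```

Setting DRY_RUN=1 before calling bind_portage_into_chroot only prints the two mount commands, which is useful for checking the paths first.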

I suggest you try the scripts from the page of one of my previous posts; they do not use illegal things like exit in bad places (if it does not work with baselayout-1, please report here). On the same page you will also find a working aufs2 live ebuild. Do not use aufs with current kernels.

thanks mv, i am making progress. I have emerged your squash_dir package and am following the README here. I actually did get portage to squash and mount, but I got an error regarding aufs. So I need to patch the kernel for aufs2 and load the module. Correct?

And the masked packages, like autoconf etc, those are OK right?

I will follow up with results, and a tested install script for the impatient. (if results are 100%)

I am rolling this up through a few clean kvms to test and verify. Kernel patching does not comfort me. Ultimately it will live on a certain kvm, that will export the dir over NFS for every machine on the network. as described http://en.gentoo-wiki.com/wiki/Sharing_Portage_over_NFS

(6 copies of portage is killing me.. 1 copy, squashed, and stored in one file for archiving will be a big improvement .)

I actually did get portage to squash and mount, but I got an error regarding aufs. So I need to patch the kernel for aufs2 and load the module. Correct?

Yes. Alternatively you can try to set ORDER="unionfs-fuse aufs" in /etc/conf.d/squash_foo, in which case unionfs-fuse is attempted first. The advantage is that this does not need a kernel patch: only unionfs-fuse must be installed and the fuse module (which is in the mainline kernel) must be loaded. However, although the required space is roughly the same, it may be slower than aufs2. Moreover, there is the problem with the .unionfs directory mentioned some posts ago (you can also find the workaround there, which is also described in the documentation: Search for PORTAGE_RSYNC_EXTRA_OPTS), and maybe some other problems with chroot described also in previous posts.
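As a sketch, the setting looks like this in the config file, together with a small optional check that the running kernel knows fuse. The helper name fuse_supported is made up for illustration; it only reads /proc/filesystems (or a file given as argument) and belongs in an interactive shell, not in the conf.d file.

```shell
# /etc/conf.d/squash_foo (fragment): try unionfs-fuse first, then aufs
ORDER="unionfs-fuse aufs"

# fuse_supported [FILE]: succeed if "fuse" is listed as a filesystem type
# in FILE (default /proc/filesystems). If it fails, try "modprobe fuse".
fuse_supported() {
    grep -qw fuse "${1:-/proc/filesystems}"
}
```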

Quote:

And the masked packages, like autoconf etc, those are OK right?

It is just a question of time until these become stable. I have been using them for months without problems. Only in the recent version can some standard things be done cleanly in a documented manner with autoconf.

thanks mv, it is working well. I have one 45 meg file, mounted over loopback and it is fast. I say this should be built in as the default for /usr/portage once aufs etc gets stable. Thanks for all the hours you spent on this, I can tell.
For reference and the impatient, here are the exact commands I used to install mv's script for /usr/portage. I've run through these 3 times so it should work fine for a 2.6.31 install as of 2010-01-30. I am not saying paste these in and go; read through them and step one line at a time. set -e and set -u are also your friends.

# and append "-aufs" to local version
sed -i 's/CONFIG_LOCALVERSION=\".*\"/CONFIG_LOCALVERSION=\"-aufs\"/' .config

# build and install kernel with minimal patches for aufs module
make
make modules_install
make install

# load new kernel (and pwn anyone who blindly pasted script)
reboot

After booting into the -aufs kernel:

Code:

# build and install the new aufs2 module
cd ~/aufs2-standalone.git/
make

# is there a better way to get the ko file in the right spot? I don't really know if this is the right technique.. ?
mkdir -p /lib/modules/2.6.31-gentoo-r6-aufs/kernel/fs/aufs/
cp aufs.ko /lib/modules/2.6.31-gentoo-r6-aufs/kernel/fs/aufs/

# refresh modules
depmod -a

# load aufs module - hope this works.. if so it should be successful (you may already have loop loaded)
modprobe aufs
modprobe loop
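On the question in the comments above about getting aufs.ko into the right spot: one option, sketched here under assumptions, is to derive the target directory from uname -r instead of hard-coding the kernel version. The helper name install_module and its optional root argument are made up for illustration; the directory layout is the one used in the commands above.

```shell
# install_module FILE [ROOT]: copy a module file to
# ROOT/lib/modules/<running kernel>/kernel/fs/aufs/ (ROOT defaults to empty, i.e. /).
install_module() {
    dest="${2:-}/lib/modules/$(uname -r)/kernel/fs/aufs"
    mkdir -p "$dest" && cp "$1" "$dest/"
}

# As root, after "make" in ~/aufs2-standalone.git/:
#   install_module aufs.ko && depmod -a
```

This only works as intended when run from the kernel the module was built for, which is the case after the reboot above.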

i'm going to let this go for a week, no problems so far, and if still none next week I will experiment with squashing /usr/src

to get this to export over nfs, i had to edit ~/aufs2-standalone.git/config.mk to have CONFIG_AUFS_EXPORT = y and CONFIG_AUFS_INO_T_64 = y (2nd one is only amd64). And after rebuilding and reinstalling the aufs.ko, I had to set ...,fsid=2000) in /etc/exports for /usr/portage options .. where 2000 is a unique export ID

The ebuild to squash_dir is now available on the mv overlay which can be installed with layman: You might have to do

Code:

layman -f

first to get the most current list of overlays, and then

Code:

layman -a mv

will install the corresponding overlay. It is recommended to put the line

Code:

mv

into /etc/eix-sync.conf (you have to create this file if you have not done so earlier) and to use

Code:

eix-sync

instead of eix --emerge. This way, you will always get the newest versions of the ebuild in case of updates (usually, updates will not be announced here). Of course, instead of the line mv you might also want to use * to update all layman ebuilds.
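The steps above can be sketched as a small shell snippet. The helper name add_overlay_to_eix_sync and its optional file argument are made up for illustration; the layman and eix-sync calls are the ones from this post and appear only as comments, since they need root and network access.

```shell
# add_overlay_to_eix_sync NAME [FILE]: append the line NAME to FILE
# (default /etc/eix-sync.conf) unless an identical line is already there.
# The file is created if it does not exist yet.
add_overlay_to_eix_sync() {
    conf=${2:-/etc/eix-sync.conf}
    touch "$conf" || return 1
    grep -qxF -e "$1" "$conf" || printf '%s\n' "$1" >> "$conf"
}

# As root:
#   layman -f
#   layman -a mv
#   add_overlay_to_eix_sync mv
#   eix-sync
```

Calling the helper twice leaves a single mv line in the file, so it is safe to rerun.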

Hi I have sometimes a problem with the squash_dir (portage) runscript. The stopping fails with an error ... rc-status shows that squash_portage is still running.. so it is no longer possible to start or restart the squashfs portage tree without manual intervention. It is also not possible to stop because it fails again and again...

This is strange; I cannot produce this here: Recent versions of squash_dir (certainly since 10.3, probably also earlier ones) should even after such an error try to umount the other directories, too. Of course, this means anyway that the "stop" will fail, but everything should be umounted (if it can be umounted).

Quote:

3) start, restart produce this output

Code:

/etc/init.d/squash_portage start
* WARNING: squash_portage has already been started

This is clear and due to a slight misconception of openrc (and I suppose, the same for baselayout-1). The problem is: What should squash_dir do in such a case? Returning error status 0 and claiming that everything is ok is probably not appropriate (squash_dir received an error for umounting which can have all sorts of unexpected consequences). On the other hand, if the error status is nonzero, openrc will automatically assume that squash_dir was not stopped - there is no possibility from within the script to say "Something strange happened, but I stop anyway".
The solution which openrc wants in such a case (if you are sure that the stop was successful) is that you call

Quote:

/etc/init.d/squash_portage zap

However, once more: It is strange that squash_portage only wants to umount /usr/portage and after the failure does not try to umount the other directories - I cannot see from the code (in current versions) why this could happen, and I also cannot produce this behaviour.
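For the manual intervention, a cautious sequence could look like the sketch below: check what is still mounted, umount it by hand, and only then zap. The helper names is_mounted and cleanup_and_zap are made up for illustration, and the two directories are the ones from this thread; this is not code from squash_dir itself.

```shell
# is_mounted DIR [MOUNTS]: succeed if DIR appears as a mount point
# in MOUNTS (default /proc/mounts).
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# cleanup_and_zap: hypothetical helper, run as root after a failed "stop".
# Umounts the union first, then the read-only squashfs, then resets the
# service state so that "start" works again.
cleanup_and_zap() {
    for dir in /usr/portage /usr/portage.readonly; do
        if is_mounted "$dir"; then
            umount "$dir" || return 1
        fi
    done
    /etc/init.d/squash_portage zap
}
```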

I use the "bleeding edge" from your overlay and currently have version 10.5 (2010/05/20) installed together with two slightly different vanilla 2.6.34 64bit kernels on two different hosts. And it's reproducible on both! And it happens nearly every time I stop it.

I get always this "umount: /usr/portage.readonly: device is busy." message and when I execute "umount /usr/portage.readonly" immediately after "squash_portage stop" it works and /usr/portage.readonly is no longer mounted. I have absolutely no idea why /usr/portage.readonly is always busy in my environments....

PS:
And it's not the mount problem ( https://bugs.gentoo.org/show_bug.cgi?id=304443 ) because I no longer do a bind-mount for my 32bit-chroot. My workaround for the 32bit-chroot is to copy the whole 40MB portage.sqfs-file to the chroot directory in the chroot32-startscript. This is at the moment the only way to avoid the mentioned file system corruption...

that the sys-fs/unionfs-fuse-0.25_alpha9999 doesn't run stable in my environment.

It seems that high load (e.g. using eix-sync) kills the unionfs-fuse stuff and then /usr/portage is empty....
As a consequence of this I ended up several times with a 4kb portage.sqfs file this evening....
and a busy /usr/portage.readonly.

I get always this "umount: /usr/portage.readonly: device is busy." message and when I execute "umount /usr/portage.readonly" immediately after "squash_portage stop" it works and /usr/portage.readonly is no longer mounted.

I suppose that there is not much the script can do about this. (Maybe inserting a sleep before line 592 helps?). However, the strange thing is that if you call squash_portage stop afterwards it should attempt again to umount /usr/portage.readonly: If this is successful you probably see no message for this, but if you retry again it cannot be successful: Either at the second or the third call you should see a message that /usr/portage.readonly cannot be umounted (either for some reason in the second call or in the third call because it is already umounted).

Quote:

that the sys-fs/unionfs-fuse-0.25_alpha9999 doesn't run stable in my environment.

Please report this upstream to unionfs-fuse. For manual testing and reporting: This is what "squash_portage start" should actually execute in your case:

Upstream is just reading here and would like to see a detailed problem description.

ok the unionfs-fuse issue: (happens with mv's overlay version and the new 0.24 version, so I switched back to 0.23)
A simple eix-sync/emerge --sync, which executes rsync, ends with an empty /usr/portage. According to mtab it is still mounted as unionfs-fuse, so from my point of view unionfs-fuse must be dead in some way, because the underlying readonly squashfs (/usr/portage.readonly) is still accessible. I'm not able to give more information because there are NO error messages in the logs and no kernel oops (only the rsync error messages that it is not able to write/read some stuff to/from /usr/portage).

Quote:

I suppose that there is not much the script can do about this. (Maybe inserting a sleep before line 592 helps?).

The workaround for this strange umount behaviour helps!!! First I tried 10 seconds, then I reduced the sleep to 1 second and it works.
I tested and verified it this evening more than 10 times on both hosts with a one second sleep in line 592. The error occurs now only when I comment out the "sleep 1" line.
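Instead of a fixed sleep, the same idea could be written as a short retry loop, so that the delay is only paid when the first attempt reports "busy". This is a sketch and not what squash_dir does; the helper name retry and the RETRIES and DELAY variables are made up for illustration.

```shell
# retry CMD...: run CMD up to RETRIES times (default 3), sleeping DELAY
# seconds (default 1) between attempts. Succeeds as soon as CMD succeeds.
retry() {
    tries=${RETRIES:-3}
    while :; do
        "$@" && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] || return 1
        sleep "${DELAY:-1}"
    done
}

# e.g., as root:
#   retry umount /usr/portage.readonly
```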

Are you sure it's 0.24 and not 0.25alpha? 0.24 is mostly a bugfix release to 0.23. But 0.25 simplifies path building (unionfs internal) and handles MAX_PATH_LEN better. But this change might have introduced bugs.
However, I'm a Debian user and only found that thread here, as I had been curious what Gentoo is using unionfs-fuse for. So that means I am not familiar with what you are doing.
If you want to help me to debug it, I'm sure we will have a solution as soon as I know what is going on.

So could you please describe what you are doing and what exactly the failure is?

From mv's post I can see how you run unionfs, but then it starts to become unclear. Is there a "busy" problem on umount? And/or you cannot access the union anymore?

If possible, please compile unionfs-fuse with "-DDEBUG" and then run it in the foreground using the "-d" flag and capture the output.

Presently there are no syslogs, as it would deadlock on writing syslogs if /var is a union served by unionfs-fuse. So syslogs are disabled until the ring-buffer+syslog threads branch is ready (almost done and will be merged with 0.25 final).

The second line should free the squash-filesystem (the second alternative in this line should actually not occur and is only there as a fallback for some versions of util-linux with loop-aes patch).
The problem seems to be that in the moment when the first umount returns, the corresponding directory /usr/portage.readonly might still be reported as busy: The sleep I suggested is between the two lines. Of course, I can hardcode the sleep in the script, but it appears to me like a hack: I would suppose that the first umount should not return until all filesystems in the union are freed (and thus no longer busy).

Edit: Inserted also flock and option -i (who knows: maybe this causes the problem? However, without it, I had problems with aufs2). The flock appears necessary since umount does not seem to be thread-safe, and with openrc it might happen that several instances of the script (for separate directories) run in parallel.
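To illustrate the serialization idea only (this is not the actual code of the script): flock from util-linux can wrap an external command so that parallel instances take turns. The lock file path and the helper name locked are assumptions made up for this sketch.

```shell
# Hypothetical lock file; any path writable by root and shared by all
# instances of the script would do.
LOCKFILE=${LOCKFILE:-/var/lock/squash_dir.lock}

# locked CMD...: run an external command while holding an exclusive
# lock on LOCKFILE; a second caller waits until the first has finished.
locked() {
    flock "$LOCKFILE" "$@"
}

# e.g., as root:
#   locked umount -i /usr/portage
```

Note that flock executes the command directly, so shell functions cannot be passed to locked.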