Is there a way to atomically replace a squashfs image file that is already mounted?
In my case, /mnt/cdrom/squashfs.img is mounted on /mnt/livecd and /usr is a symlink to /mnt/livecd/usr, so all commands and binaries come from /mnt/livecd. Every running daemon in the system has its executable pages backed by /mnt/livecd.
I want to replace /mnt/cdrom/squashfs.img without killing those daemons. I cannot unmount (it wouldn't even be allowed, since the filesystem is in use), because even the pages belonging to the 'mount' program and the libraries it depends on can be paged out. I also cannot simply overwrite squashfs.img and remount, because between the 'cp' and the 'mount -o remount ...' some daemon may have asked for a file block which may not even be a valid block number in the new image.
It's pretty much like pulling the disk out from underneath the filesystem, but because the filesystem is read-only in this case, it should be doable.
Is this possible?
Thanks
PS: The use case is something like this:
Boot a Gentoo-based distro from /mnt/cdrom/squashfs.img (on a simple 4 GB FAT32 partition at the beginning of the disk) mounted on /mnt/livecd, just like a live CD.
Build updates into /mnt/cdrom/squashfs.new.img on one powerful Gentoo-based system.
Replace the backing store for /mnt/livecd and point it to /mnt/cdrom/squashfs.new.img.
Keep toggling between the two for subsequent updates (keeping one to roll back to in case of a failed update).
Use the same image on all Linux systems, with a small update shell script which wgets the new squashfs image and switches to it (see the sketch below).
Maintenance Nirvana!
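To make the idea concrete, here is a rough sketch of the update side of that script (all names and URLs are made up for illustration; the final "switch" step is exactly the part I don't know how to do safely while the image is mounted):

  #!/bin/sh
  set -e
  SLOT_A=/mnt/cdrom/squashfs.a.img
  SLOT_B=/mnt/cdrom/squashfs.b.img
  # 'current' records which slot is in use, so the other one is free to overwrite.
  if [ "$(cat /mnt/cdrom/current)" = "a" ]; then
      TARGET=$SLOT_B; NEXT=b
  else
      TARGET=$SLOT_A; NEXT=a
  fi
  wget -O "$TARGET" http://updates.example.org/squashfs.img
  echo "$NEXT" > /mnt/cdrom/current   # boot scripts mount whichever slot is named here
  # ...and then somehow repoint the live /mnt/livecd mount at $TARGET
  # without a reboot, which is the question I'm asking above.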

On 22/01/15 02:28, long.wanglong wrote:
> Hi,
>
> I have encountered a kernel hung task while running stability and stress tests.
>
> test scenarios:
> 1) The kernel hung-task settings are the following:
> hung_task_panic = 1
> hung_task_timeout_secs = 120
> 2) The rootfs type is squashfs (read-only).
>
> The test forks many child processes, and each process allocates memory.
> When there is no free memory left in the system, the OOM killer is triggered, and then the kernel reports a hung task (after about five minutes).
> The reason for the hung task is that some processes stay in D state for 120 seconds.
>
> When there is no free memory in the system, many processes are in D state; they enter D state via the kernel path `squashfs_cache_get() ---> wait_event()`.
> The backtrace is:
>
> [ 313.950118] [<c02d2014>] (__schedule+0x448/0x5cc) from [<c014e510>] (squashfs_cache_get+0x120/0x3ec)
> [ 314.059660] [<c014e510>] (squashfs_cache_get+0x120/0x3ec) from [<c014fd1c>] (squashfs_readpage+0x748/0xa2c)
> [ 314.176497] [<c014fd1c>] (squashfs_readpage+0x748/0xa2c) from [<c00b7be0>] (__do_page_cache_readahead+0x1ac/0x200)
> [ 314.300621] [<c00b7be0>] (__do_page_cache_readahead+0x1ac/0x200) from [<c00b7e98>] (ra_submit+0x24/0x28)
> [ 314.414325] [<c00b7e98>] (ra_submit+0x24/0x28) from [<c00b043c>] (filemap_fault+0x16c/0x3f0)
> [ 314.515521] [<c00b043c>] (filemap_fault+0x16c/0x3f0) from [<c00c94e0>] (__do_fault+0xc0/0x570)
> [ 314.618802] [<c00c94e0>] (__do_fault+0xc0/0x570) from [<c00cbdc4>] (handle_pte_fault+0x47c/0x1048)
> [ 314.726250] [<c00cbdc4>] (handle_pte_fault+0x47c/0x1048) from [<c00cd928>] (handle_mm_fault+0x164/0x218)
> [ 314.839959] [<c00cd928>] (handle_mm_fault+0x164/0x218) from [<c02d4878>] (do_page_fault.part.7+0x108/0x360)
> [ 314.956788] [<c02d4878>] (do_page_fault.part.7+0x108/0x360) from [<c02d4afc>] (do_page_fault+0x2c/0x70)
> [ 315.069442] [<c02d4afc>] (do_page_fault+0x2c/0x70) from [<c00084cc>] (do_PrefetchAbort+0x2c/0x90)
> [ 315.175850] [<c00084cc>] (do_PrefetchAbort+0x2c/0x90) from [<c02d3674>] (ret_from_exception+0x0/0x10)
>
> When a task is already exiting because of the OOM killer, the next OOM kill will select the same task.
> So if the OOM killer first selects a task (A) that is in D state (the task ignores the exit signal because it is in D state),
> then the next OOM kill will also target task A. In this scenario, the OOM killer never frees any memory.
>
> With no free memory, many processes sleep in squashfs_cache_get(); after about 2 minutes the hung-task detector fires and the system panics.
> Because of the interaction between the OOM killer and squashfs, this problem is easy to reproduce on a heavily loaded system.
>
> Is this a problem with squashfs or with the OOM killer? Can anyone give me some good ideas about this?
This is not a Squashfs issue; it is a well-known problem with
the OOM killer trying to kill tasks which are slow to exit (because
they are in D state). Just google "OOM hung task" to see how long this
issue has been around.
The OOM killer is worse than useless in embedded systems because
its behaviour is unpredictable and can leave a system in a
zombified or half-zombified state. For this reason many
embedded systems disable the OOM killer entirely, ensure
there is adequate memory, and back that up with a watchdog which reboots
a hung system.
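For illustration only (these are the usual knobs, with values picked out of
thin air rather than taken from this thread), that setup tends to look
something like:

  # Strict overcommit: allocations fail with ENOMEM up front instead of
  # the OOM killer being invoked later.
  sysctl -w vm.overcommit_memory=2
  sysctl -w vm.overcommit_ratio=80

  # Or, if the OOM killer cannot be avoided, panic and let the system reboot
  # rather than limping on half-zombified:
  sysctl -w vm.panic_on_oom=1
  sysctl -w kernel.panic=10   # reboot 10 seconds after a panic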
Phillip
>
> Best Regards
> Wang Long
>

On Thu, Sep 18, 2014 at 6:08 AM, Phillip Lougher
<phillip@...> wrote:
>
> Yes, I was intending to ask Guan, Xin for a signed-off patch formatted so that
> it could be applied to git directly. But on re-reviewing it I noticed a Signed-off-by line,
> which is the essential requirement; it still needs a bit of manual intervention
> to get git to attribute the author correctly, because it lacks any From: information and
> there is no date information. I guess I can produce that, because the author and date
> are fairly obvious, but I never like doing more than that, because then you're
> in the realms of inventing stuff that wasn't there.
Sorry for being casual ... Here is the patch in a more complete format:
---------- >8 ----------
Author: Guan, Xin <guanx.bac@...>
Date: Sat Sep 13 13:15:26 2014 +0200
Fix 2GB-limit of the is_fragment(...) function.
Applies to squashfs-tools 4.3.
Reported-by: Bruno Wolff III <bruno@...>
Signed-off-by: Guan, Xin <guanx.bac@...>
--- squashfs4.3/squashfs-tools/mksquashfs.c 2014-05-13 00:18:20.000000000 +0200
+++ squashfs4.3.new/squashfs-tools/mksquashfs.c 2014-09-13 13:15:25.817160603 +0200
@@ -2055,7 +2055,7 @@ struct file_info *duplicate(long long fi
inline int is_fragment(struct inode_info *inode)
{
- int file_size = inode->buf.st_size;
+ off_t file_size = inode->buf.st_size;
/*
* If this block is to be compressed differently to the
---------- >8 ----------
Guan

On 18/09/14 04:28, Bruno Wolff III wrote:
> On Thu, Sep 18, 2014 at 03:55:31 +0100,
> Phillip Lougher <phillip@...> wrote:
>>
>> You should not be getting leaks with Valgrind. I frequently run Valgrind,
>> and in the last release of Mksquashfs/Unsquashfs I removed all the (minor)
>> leaks that Valgrind identified - this was part of the massive buffer space
>> hardening of the code.
>
> I'll see if they still happen after I apply the race condition patch, and if so, check that they aren't coming from a Fedora library.
>
>> ==27610== LEAK SUMMARY:
>> ==27610== definitely lost: 0 bytes in 0 blocks
>> ==27610== indirectly lost: 0 bytes in 0 blocks
>> ==27610== possibly lost: 4,352 bytes in 16 blocks
>> ==27610== still reachable: 66,263,321 bytes in 13,645 blocks
>> ==27610== suppressed: 0 bytes in 0 blocks
>> ==27610== Rerun with --leak-check=full to see details of leaked memory
>
> Mostly I got something like the above. Occasionally I got numbers other than 0 for definitely lost and indirectly lost. I haven't run with --leak-check=full yet to see where the problem is reported.
>
When you do get these outputs, send the report to me (and the input filesystem if
it's not too large). My aim is to eliminate all the leaks. There may be leaks
remaining in obscure pathways, but, to my knowledge, the remaining leaks
are in error pathways where it is known Mksquashfs will be aborting.
>> The failure strongly suggests there is a rare race condition in fragment
>> writing in Mksquashfs. Reviewing the code changes in the latest release
>> of Mksquashfs around fragment writing, I discovered I have inadvertently
>> put back a race condition fixed in 2009. This race condition could
>> generate the filesystem corruption seen.
>
> Hopefully that was what I was seeing.
>
>> I have pushed the fix to git, and the commit is here:
>
> Thanks, I'll try this out over the weekend.
>
> I noticed that the incorrect type patch (for the 2+ GB file issue) hasn't shown up in master yet. I just wanted to make sure that it eventually makes it there; I'm not actually waiting on it.
>
Yes, I was intending to ask Guan, Xin for a signed-off patch formatted so that
it could be applied to git directly. But on re-reviewing it I noticed a Signed-off-by line,
which is the essential requirement; it still needs a bit of manual intervention
to get git to attribute the author correctly, because it lacks any From: information and
there is no date information. I guess I can produce that, because the author and date
are fairly obvious, but I never like doing more than that, because then you're
in the realms of inventing stuff that wasn't there.
> Thanks.

On Thu, Sep 18, 2014 at 03:55:31 +0100,
Phillip Lougher <phillip@...> wrote:
>
>You should not be getting leaks with Valgrind. I frequently run Valgrind,
>and in the last release of Mksquashfs/Unsquashfs I removed all the (minor)
>leaks that Valgrind identified - this was part of the massive buffer space
>hardening of the code.
I'll see if they still happen after I apply the race condition patch, and
if so, check that they aren't coming from a Fedora library.
>==27610== LEAK SUMMARY:
>==27610== definitely lost: 0 bytes in 0 blocks
>==27610== indirectly lost: 0 bytes in 0 blocks
>==27610== possibly lost: 4,352 bytes in 16 blocks
>==27610== still reachable: 66,263,321 bytes in 13,645 blocks
>==27610== suppressed: 0 bytes in 0 blocks
>==27610== Rerun with --leak-check=full to see details of leaked memory
Mostly I got something like the above. Occasionally I got numbers other
than 0 for definitely lost and indirectly lost. I haven't run with
--leak-check=full yet to see where the problem is reported.
>The failure strongly suggests there is a rare race condition in fragment
>writing in Mksquashfs. Reviewing the code changes in the latest release
>of Mksquashfs around fragment writing, I discovered I have inadvertently
>put back a race condition fixed in 2009. This race condition could
>generate the filesystem corruption seen.
Hopefully that was what I was seeing.
>I have pushed the fix to git, and the commit is here:
Thanks, I'll try this out over the weekend.
I noticed that the incorrect type patch (for the 2+ GB file issue) hasn't
shown up in master yet. I just wanted to make sure that it eventually
makes it there; I'm not actually waiting on it.
Thanks.

On 18/09/14 00:11, Bruno Wolff III wrote:
> On Sun, Sep 14, 2014 at 11:17:27 +0200,
> Guan Xin <guanx.bac@...> wrote:
>> On Sun, Sep 14, 2014 at 5:29 AM, Bruno Wolff III <bruno@...> wrote:
>>>
>>> I have seen it happen again on a different (also old) machine. Both machines
>>> are i686 and both times it was in the lz4 test. With this being intermittent
> it could have potentially been a problem for a while, since I don't
>>> normally run the test many times. So far I haven't seen it happen on an
>>> x86_64 machine. All of these machines have at least 2 CPUs.
>>>
>>
>> Can you run mksquashfs and unsquashfs under valgrind?
>
> I have tried that and occasionally get warnings about direct and indirect
> memory leaks.
You should not be getting leaks with Valgrind. I frequently run Valgrind,
and in the last release of Mksquashfs/Unsquashfs I removed all the (minor)
leaks that Valgrind identified - this was part of the massive buffer space
hardening of the code.
Typically, this is the output you should get from Valgrind for Mksquashfs:
==27610==
==27610== HEAP SUMMARY:
==27610== in use at exit: 66,267,673 bytes in 13,661 blocks
==27610== total heap usage: 21,827 allocs, 8,166 frees, 413,849,219 bytes allocated
==27610==
==27610== LEAK SUMMARY:
==27610== definitely lost: 0 bytes in 0 blocks
==27610== indirectly lost: 0 bytes in 0 blocks
==27610== possibly lost: 4,352 bytes in 16 blocks
==27610== still reachable: 66,263,321 bytes in 13,645 blocks
==27610== suppressed: 0 bytes in 0 blocks
==27610== Rerun with --leak-check=full to see details of leaked memory
No leaks reported - the 4,352 "possibly lost" is constant across all
sizes of filesystem compressed, and hence not particularly worrisome.
But, in general, for highly multi-threaded programs like Mksquashfs and
Unsquashfs, Valgrind (or more specifically memcheck, the default tool)
has major limitations, because it serialises all the threads and only
runs one thread at a time. Hence, if the corruption is due to a bad interaction
between threads, it will not show anything. Helgrind is better for
multi-threaded programs, but it is not specifically a memory checker.
I tend to use Valgrind (memcheck) for identifying leaks, Helgrind for
locking issues in multi-threaded code, and the built-in
glibc MALLOC_CHECK_ memory checking, which uses a special version of
malloc (see man mallopt) that works without serialising the threads.
I also occasionally use "Electric Fence" (http://linux.die.net/man/3/efence),
which is similar and sometimes useful.
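For reference, the invocations look roughly like this (the Mksquashfs
arguments are just placeholders):

  # memcheck leak checking (serialises the threads):
  valgrind --leak-check=full mksquashfs testdir test.img -comp lz4

  # Helgrind for locking/race issues:
  valgrind --tool=helgrind mksquashfs testdir test.img -comp lz4

  # glibc malloc consistency checks, threads run in parallel as normal
  # (3 = print a diagnostic and abort on heap corruption):
  MALLOC_CHECK_=3 mksquashfs testdir test.img -comp lz4

  # Electric Fence via LD_PRELOAD (the library name varies by distro):
  LD_PRELOAD=libefence.so mksquashfs testdir test.img -comp lz4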
>
> I am also having trouble getting the issue to show up again, with or without
> valgrind. I only saw three or four instances (though at least one on each
> of two machines) so far. I have done some kernel updates since and that
> might affect performance or potentially there could be a kernel bug (these
> are 3.17 rc kernels). I'll keep trying for a while and may try downgrading
> the kernel to see if it might be kernel related.
I have run some tests using the test script. On a modern quad-core system
running a 686 install under virtualisation (2 cores), the bug did not show up
in over 2000 runs. On an old 686 Atom-based EeePC, however, it has shown
up once in ~2000 runs. These are running Debian 7.x (Wheezy) and hence
an older kernel (3.2).
The failure strongly suggests there is a rare race condition in fragment
writing in Mksquashfs. Reviewing the code changes in the latest release
of Mksquashfs around fragment writing, I discovered I have inadvertently
put back a race condition fixed in 2009. This race condition could
generate the filesystem corruption seen.
I have pushed the fix to git, and the commit is here:
https://git.kernel.org/cgit/fs/squashfs/squashfs-tools.git/commit/?id=de03266983ceb62e5365aac84fcd3b2fd4d16e6f
The patch is also appended below.
Phillip
From de03266983ceb62e5365aac84fcd3b2fd4d16e6f Mon Sep 17 00:00:00 2001
From: Phillip Lougher <phillip@...>
Date: Thu, 18 Sep 2014 01:28:11 +0100
Subject: [PATCH] mksquashfs: fix rare race in fragment waiting in filesystem
finalisation
Fix a rare race condition in fragment waiting when finalising the
filesystem. This is a race condition that was initially fixed in 2009,
but inadvertently re-introduced in the latest release when the code
was rewritten.
Background:
When finalising the filesystem, the main control thread needs to ensure
all the in-flight fragments have been queued to the writer thread before
asking the writer thread to finish, and then writing the metadata.
It does this by waiting on the fragments_outstanding counter. Once this
counter reaches 0, it synchronises with the writer thread, waiting until
the writer thread reports no outstanding data to be written.
However, the main thread can race with the fragment deflator thread(s)
because the fragment deflator thread(s) decrement the fragments_outstanding
counter and release the mutex before queueing the compressed fragment
to the writer thread, i.e. the offending code is:
fragments_outstanding --;
pthread_mutex_unlock(&fragment_mutex);
queue_put(to_writer, write_buffer);
In extremely rare circumstances, the main thread may see the
fragments_outstanding counter is zero before the fragment
deflator sends the fragment buffer to the writer thread, and synchronise
with the writer thread, and finalise before the fragment has been written.
The fix is to ensure the fragment is queued to the writer thread
before releasing the mutex.
Signed-off-by: Phillip Lougher <phillip@...>
---
squashfs-tools/mksquashfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/squashfs-tools/mksquashfs.c b/squashfs-tools/mksquashfs.c
index 87b7d86..f1fcff1 100644
--- a/squashfs-tools/mksquashfs.c
+++ b/squashfs-tools/mksquashfs.c
@@ -2419,8 +2419,8 @@ void *frag_deflator(void *arg)
write_buffer->block = bytes;
bytes += compressed_size;
fragments_outstanding --;
- pthread_mutex_unlock(&fragment_mutex);
queue_put(to_writer, write_buffer);
+ pthread_mutex_unlock(&fragment_mutex);
TRACE("Writing fragment %lld, uncompressed size %d, "
"compressed size %d\n", file_buffer->block,
file_buffer->size, compressed_size);
--
1.7.10.4

On Sun, Sep 14, 2014 at 11:17:27 +0200,
Guan Xin <guanx.bac@...> wrote:
>On Sun, Sep 14, 2014 at 5:29 AM, Bruno Wolff III <bruno@...> wrote:
>>
>> I have seen it happen again on a different (also old) machine. Both machines
>> are i686 and both times it was in the lz4 test. With this being intermittent
> it could have potentially been a problem for a while, since I don't
>> normally run the test many times. So far I haven't seen it happen on an
>> x86_64 machine. All of these machines have at least 2 CPUs.
>>
>
>Can you run mksquashfs and unsquashfs under valgrind?
I have tried that and occasionally get warnings about direct and indirect
memory leaks.
I am also having trouble getting the issue to show up again, with or without
valgrind. I only saw three or four instances (though at least one on each
of two machines) so far. I have done some kernel updates since and that
might affect performance or potentially there could be a kernel bug (these
are 3.17 rc kernels). I'll keep trying for a while and may try downgrading
the kernel to see if it might be kernel related.

There's enough interest in LZ4-compressed squashfs that I received an
unsolicited patch adding support to squashfuse. Googling turns up at least
a couple of people using squashfuse to mount LZ4 squashfs images. Obviously
it would be best if they could use in-kernel support as well.
Please let us know when you submit LZ4 support to mainline, and I'll be
sure to support the effort.
-V
On 13/09/14 12:42, Guan Xin wrote:
> Dear Squashfs Developer(s),
>
> I've been using LZ4 compressed squashfs with Linux-3.14.x for a little while
> (using a modified version of Phillip's patch, see attached).
> The decompression speed is fair (much less CPU time than LZO).
>
I have an updated LZ4 patch for 3.14 in a repository, waiting for when
people ask for it (*). Now that it's been asked for a number of times,
I may make the repository public, and provide a link to it.
> The compression speed of LZ4-HC is 5 times that of LZO,
> and more than 20 times without -Xhc. This is magnificent!
> Ref: http://thread.gmane.org/gmane.linux.file-systems/76409
>
> So ... Is there any plan to push it to mainline?
>
Yes. I failed to mainline it last year due to lack of
interest. This was disappointing: I asked for interested parties to come
forward, but got silence, apart from negative responses. Because of the way
mainlining works, this killed the attempt.
The wise/experienced response thereafter is not to carry on regardless,
ignoring the responses, as that is a sure-fire way of never getting the
feature mainlined. Instead, I have played a waiting game, and have waited
for people to realise they do want it.
Once there is a sufficient number of people publicly asking for it
to be mainlined, and I can count on them to publicly support the
next mainlining attempt, I will try again. This hopefully should
happen in the next couple of months, perhaps if things go well the
next merge window. It all depends on how many people ask for it.
(*) Not making the refreshed patches for 3.14 publicly available is
deliberate. This means people can't silently download the new patches,
and instead have to ask me for them - and in the process this allows me to
gauge the level of interest.
Phillip
> Regards,
> Guan

On Mon, Sep 15, 2014 at 11:32:36 +0200,
Guan Xin <guanx.bac@...> wrote:
>
>In addition, I don't think somebody's doubts about mainlining squashfs-lz4
>should be taken into account if he doesn't use squashfs himself.
>Non-users' negative opinions should only be considered when they can show
>that the feature is harmful, not when they simply don't find it helpful.
I was speaking as the maintainer of squashfs-tools for Fedora. I don't
remember any RFE bugs requesting lz4 support. However, I don't read the
Fedora kernel bug reports, and since lz4 has been available in Fedora's
squashfs-tools since a couple of days after 4.3 was released, it's possible
someone filed a bug against the kernel rather than against
squashfs-tools.
Also, I think the most important use of squashfs under Fedora is for
live images (which is where my personal interest is) and I don't think
lz4 is a good fit there.
However if the mainline kernel does get lz4 support for squashfs, it will
become available in Fedora pretty quickly (in rawhide almost immediately)
and regular users will be able to use it. In contrast if it doesn't get
into the mainline kernel, the Fedora kernel team will NOT include it
in the Fedora kernel and it won't be easily available to Fedora users.

On Mon, Sep 15, 2014 at 3:02 AM, Phillip Lougher
<phillip@...> wrote:
>
> (*) Not making the refreshed patches for 3.14 publicly available is
> deliberate. This means people can't silently download the new patches,
> and instead have to ask me for them - and in the process this allows me to
> gauge the level of interest.
>
When I was debugging mksquashfs yesterday I came across this thread:
http://www.pclinuxos.com/forum/index.php/topic,127240.0.html?PHPSESSID=nnqk8vm30gc5i8fcikfl7bl532
Obviously, there are more people interested in, and wishing to use, lz4
than you have on record.
In addition, I don't think somebody's doubts about mainlining squashfs-lz4
should be taken into account if he doesn't use squashfs himself.
Non-users' negative opinions should only be considered when they can show
that the feature is harmful, not when they simply don't find it helpful.
Regards,
Guan

On Mon, Sep 15, 2014 at 02:02:43 +0100,
Phillip Lougher <phillip@...> wrote:
>
>Once there is a sufficient number of people publicly asking for it
>to be mainlined, and I can count on them to publicly support the
>next mainlining attempt, I will try again. This hopefully should
>happen in the next couple of months, perhaps if things go well the
>next merge window. It all depends on how many people ask for it.
While I can't say I have Fedora users clamoring for lz4 support for
squashfs, we do support it in squashfs-tools now and will support it
in the kernel as soon as it is in mainline.

On Sun, Sep 14, 2014 at 16:58:31 +0200,
Guan Xin <guanx.bac@...> wrote:
>On Sun, Sep 14, 2014 at 11:17 AM, Guan Xin <guanx.bac@...> wrote:
>>
>> Can you run mksquashfs and unsquashfs under valgrind?
>>
>
>Apart from that, you can also do a "mount test" (as it is called in
>your script) for lz4.
>If both unsquashfs and the mount test fail, the problem is probably in mksquashfs;
>if unsquashfs fails and the mount test is OK, the problem is probably in unsquashfs.
I'm going to try to figure out whether this is a mksquashfs or an unsquashfs
issue. I need to change the script a bit to help with that. I have
seen this happen again, so I think there is a real problem somewhere.
All instances have been with lz4, so the issue is likely tied to that.
I haven't used valgrind previously, but I might be able to figure it out.

On 14/09/14 04:29, Bruno Wolff III wrote:
> On Sat, Sep 13, 2014 at 16:06:52 -0500,
> Bruno Wolff III <bruno@...> wrote:
>>
> Or it might have been the CPU overheating.
LZ4 has the least CPU overhead of the compressors. If CPU overheating
were the cause you would expect it to happen with XZ compression. This
makes overheating a less probable cause.
>>
>> I ran the script again with the same data directory and I didn't see the
>> error. The machine I use is old and running the squashfs test and doing
>> a compare of two 3GB files at the same time might have resulted in a
>> transient failure. I am going to do some more testing and on different
>> machines and see if I can get it to happen again.
>
> I have seen it happen again on a different (also old) machine. Both machines
> are i686 and both times it was in the lz4 test. With this being intermittent
> it could have potentially been a problem for a while, since I don't normally
> run the test many times. So far I haven't seen it happen on an x86_64
> machine. All of these machines have at least 2 CPUs.
OK, while two occurrences are not conclusive, it is certainly suggestive that
there's something up with LZ4 compression on older machines (and not
elsewhere; it is the combination of LZ4 and older machines which is important).
Additionally, given your previous debug output showed filenames like
"combined-zero-4K-4-urandom-1", it is also plausible to assume we're
hitting a "perfect storm" of expensive-to-compress uncompressible
data, combined with a slow CPU, while using the LZ4 compressor.
Now there are two curious things about this "perfect storm". First,
expensive-to-compress uncompressible data on a slow CPU causes
data to queue up in memory waiting to be compressed. Lots of data
queued up increases the likelihood that, if something is corrupting data,
some of the queued-up data will become corrupted (random data corruption
almost never shows up on fast CPUs because there's no data waiting in
memory to become corrupted).
The second curious thing about uncompressible data is that it tends to cause
the compressor to produce more output than the source data, i.e. the
compressed data is larger than the original data! Both LZO and LZ4
suffer from this, and they have a function that calculates the worst-case
expansion, i.e. how much larger the output data could be than the
source data.
Now Mksquashfs compresses from a source buffer to a destination buffer.
The destination buffer is always a maximum size of block_size. Given
a source buffer of block_size the resultant output could overflow
the destination buffer when uncompressible data is handled.
The LZO API is particularly poor in this respect: it provides no
way to limit the output to a maximum size of block_size. Because of this
the LZO wrapper code has to use a "bounce buffer", an intermediate buffer
sized to the worst-case expansion of the compressed output - if the resultant
output is <= block_size it is then copied to the destination buffer,
otherwise the original source data is used (guaranteed to be <= block_size).
This is particularly annoying, but it is the only way to ensure the
LZO compressor does not overflow the destination buffer.
The normal LZ4 compressor API behaves in exactly the same way. But, it
provides a special limited output function which is "guaranteed" to never
overflow the maximum specified.
The LZ4 wrapper code uses this function to compress directly into the
destination buffer, i.e.
if(hc)
res = LZ4_compressHC_limitedOutput(src, dest, size, block_size);
else
res = LZ4_compress_limitedOutput(src, dest, size, block_size);
The "perfect storm" scenario here with LZ4 compression, a slow CPU and
uncompressible data suggests this function is writing beyond the end
of the destination buffer even though it's supposed not to. Given we have
lots of data queued up in memory this is corrupting other data. Now if
it corrupted waiting input data, this would result in a corrupted
filesystem, but not in a decompression failure (the decompression won't
fail it simply would decompress corrupted data). So it has to corrupt
either already compressed destination buffers, or destination buffers
currently being compressed on other threads. But this is exactly what
an over-run would corrupt as long as there's multiple CPUs.
Of course there's no real evidence this is happening, but, it is the
most plausible scenario that matches all the facts.
I'll do some testing to see if I can reproduce this issue.
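The loop I'll be using is roughly the following (sizes and paths are
illustrative, loosely following what the Fedora test script does):

  mkdir -p data
  dd if=/dev/zero    of=data/zero    bs=1M count=512
  dd if=/dev/urandom of=data/urandom bs=1M count=512   # uncompressible data
  i=0
  while mksquashfs data test.img -comp lz4 -noappend &&
        rm -rf out && unsquashfs -d out test.img &&
        diff -r data out
  do
      i=$((i+1)); echo "run $i OK"
  done
  echo "failure after $i good runs"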
Phillip
>
> If it keeps happening fairly regularly I should be able to tell if it
> is a problem when the img is being compressed or uncompressed.
>

On 13/09/14 12:27, Guan Xin wrote:
> Hi Bruno,
>
> First of all, thank you very much for pointing this out!
> Just yesterday I made my first backup with squashfs (instead of tar),
> one that can trigger this bug. You saved my data!
>
> This looks like a 2GB-limit problem rather than the one described
> in the original report. The following one-line patch hopefully fixes it.
Yes, this looks like the cause of the bug. It will cause
Mksquashfs/Unsquashfs/kernel-code to behave in exactly the reported way.
It is a serious bug but, for what it's worth, it will not manifest
on all files >= 2GB, only on files of exactly 2GB (in actual fact
on any file size that is an exact multiple of the block size and has bit 31 set).
There is an interesting subtlety here: even in the exactly-2GB case
Mksquashfs will generate a filesystem with all the content
included; it is the Unsquashfs/kernel-code which fails to decode this
unexpected filesystem.
The is_fragment() function makes a choice: it decides whether the
last block is stored as a normal block or as a fragment. When it
malfunctions it decides to store the last block as a fragment; in effect
it behaves as if the -always-use-fragment option has been set. Now,
even though we didn't want the last block to be stored as a fragment, if
it does get stored as a fragment it shouldn't affect the correctness of the
filesystem produced - and it doesn't.
The corner case is where the last block is a whole block (block_size bytes) and
Mksquashfs stores that block_size block as a fragment. Later, Unsquashfs and
the kernel code calculate the size of the fragment as
file_size & (block_size - 1), on the basis that a fragment is always a non
block size remainder. In the case that Mksquashfs has stored a block_size
fragment this produces a zero-size fragment, with the result that the entire
last block is zero filled (the code always zero fills to the end of the
block: if the fragment was 500 bytes, the code will zero fill from byte 501 ->
block_size).
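To make the arithmetic concrete (default 128K block size; any block-multiple
file size with bit 31 set behaves the same, 2GB is simply the smallest example):

  $ file_size=$((2 * 1024 * 1024 * 1024))   # 2147483648; wraps to -2147483648 in a 32-bit int
  $ block_size=$((128 * 1024))
  $ echo $(( file_size & (block_size - 1) ))   # fragment bytes computed on the read side
  0

so the "fragment" is treated as holding no data at all, and the whole last
block gets zero filled.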
Of course the Unsquashfs/kernel code is entirely correct: we should never
have a fragment a whole block_size in size. But the one small
upshot of this is that Mksquashfs does not actually lose the data, and if people
did hit this bug, the data could be recovered.
Phillip
>
> Fix 2GB-limit of the is_fragment(...) function.
> Applies to squashfs-tools 4.3.
> Signed-off-by: Guan, Xin <guanx.bac@...>
> --- squashfs4.3/squashfs-tools/mksquashfs.c 2014-05-13 00:18:20.000000000 +0200
> +++ squashfs4.3.new/squashfs-tools/mksquashfs.c 2014-09-13 13:15:25.817160603 +0200
> @@ -2055,7 +2055,7 @@ struct file_info *duplicate(long long fi
>
> inline int is_fragment(struct inode_info *inode)
> {
> - int file_size = inode->buf.st_size;
> + off_t file_size = inode->buf.st_size;
>
> /*
> * If this block is to be compressed differently to the
>
> Best regards,
> Guan
>
> On Fri, Sep 12, 2014 at 7:52 PM, Bruno Wolff III <bruno@...> wrote:
>> I received a bug report for an unsquashed file not matching the original:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1141206
>> I was able to reproduce the issue in f21 (not yet released) both with
>> the normal Fedora package and with mksquashfs and unsquashfs built from
>> the squashfs master repo. So this bug appears to still be in the
>> development version of squashfs-tools. (Or perhaps there is a gcc or
>> library issue in Fedora.)
>>
>> To reproduce the problem run:
>> yes 3 | dd of=testdata bs=1k count=2M
>> mksquashfs testdata testdata.img -b 1048576
>> unsquashfs testdata.img
>> cmp testdata squashfs-root/testdata
>>
>> The reporter claims that using the -no-fragments option works around the
>> problem, but I haven't tested that.

On 12/09/14 18:52, Bruno Wolff III wrote:
> I received a bug report for an unsquashed file not matching the original:
> https://bugzilla.redhat.com/show_bug.cgi?id=1141206
> I was able to reproduce the issue in f21 (not yet released) both with
> the normal Fedora package and with mksquashfs and unsquashfs built from
> the squashfs master repo. So this bug appears to still be in the
> development version of squashfs-tools. (Or perhaps there is a gcc or
> library issue in Fedora.)
>
> To reproduce the problem run:
> yes 3 | dd of=testdata bs=1k count=2M
> mksquashfs testdata testdata.img -b 1048576
> unsquashfs testdata.img
> cmp testdata squashfs-root/testdata
>
> The reporter claims that using the -no-fragments option works around the
> problem, but I haven't tested that.
>
I have just come back from a vacation (without internet access), and I have
only just seen this. As there has already been a lot of activity, and it's late
here (gone midnight), I'll briefly respond where appropriate to the
other emails.

On Sun, Sep 14, 2014 at 11:17 AM, Guan Xin <guanx.bac@...> wrote:
>
> Can you run mksquashfs and unsquashfs under valgrind?
>
Apart from that, you can also do a "mount test" (as it is called in
your script) for lz4.
If both unsquashfs and the mount test fail, the problem is probably in mksquashfs;
if unsquashfs fails and the mount test is OK, the problem is probably in unsquashfs.
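A minimal version of such a mount test would be something like this (paths
illustrative; needs root and a kernel with squashfs lz4 support):

  mkdir -p /mnt/sqtest
  mount -t squashfs -o loop,ro test.img /mnt/sqtest
  diff -r testdir /mnt/sqtest   # compare against the original source directory
  umount /mnt/sqtest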
Guan

On Sun, Sep 14, 2014 at 5:29 AM, Bruno Wolff III <bruno@...> wrote:
>
> I have seen it happen again on a different (also old) machine. Both machines
> are i686 and both times it was in the lz4 test. With this being intermittent
> it could have potentially been a problem for a while, since I don't
> normally run the test many times. So far I haven't seen it happen on an
> x86_64 machine. All of these machines have at least 2 CPUs.
>
Can you run mksquashfs and unsquashfs under valgrind?
FYI, I ran the script on a 2-core/2-thread x86 Linux, and on a 4c/8t
x86_64, both without error. This, of course, doesn't guarantee it's
bug-free.
Guan

On Sat, Sep 13, 2014 at 16:06:52 -0500,
Bruno Wolff III <bruno@...> wrote:
>
>Or it might have been the CPU overheating.
>
>I ran the script again with the same data directory and I didn't see the
>error. The machine I use is old and running the squashfs test and doing
>a compare of two 3GB files at the same time might have resulted in a
>transient failure. I am going to do some more testing and on different
>machines and see if I can get it to happen again.
I have seen it happen again on a different (also old) machine. Both machines
are i686 and both times it was in the lz4 test. With this being intermittent
it could have potentially been a problem for a while, since I don't normally
run the test many times. So far I haven't seen it happen on an x86_64
machine. All of these machines have at least 2 CPUs.
If it keeps happening fairly regularly I should be able to tell if it
is a problem when the img is being compressed or uncompressed.
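One rough way to split that (names illustrative): build the image once and
extract it several times.

  mksquashfs testdir test.img -comp lz4 -noappend
  for i in 1 2 3 4 5; do
      rm -rf out
      unsquashfs -d out test.img
      diff -r testdir out || echo "run $i: mismatch"
  done

If the extracted tree differs from run to run, unsquashfs is the likely
suspect; if it is wrong in the same way every time, the bad data was
probably written by mksquashfs.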

On Sat, Sep 13, 2014 at 22:45:25 +0200,
Guan Xin <guanx.bac@...> wrote:
>On Sat, Sep 13, 2014 at 10:26 PM, Bruno Wolff III <bruno@...> wrote:
>
>Looks like you are using this test script:
>https://fedoraproject.org/wiki/QA:Testcase%20squashfs-tools%20compression
>which is highly distro-specific.
>Could you please provide a simpler / more generic testcase?
>Thanks!
>This may be a bug in unsquashfs, which is unimportant;
>or it may be in mksquashfs, in which case it is a big problem.
Or it might have been the CPU overheating.
I ran the script again with the same data directory and I didn't see the
error. The machine I use is old and running the squashfs test and doing
a compare of two 3GB files at the same time might have resulted in a
transient failure. I am going to do some more testing and on different
machines and see if I can get it to happen again.

On Sat, Sep 13, 2014 at 10:26 PM, Bruno Wolff III <bruno@...> wrote:
>
> Your fix may be causing a problem for LZ4. I had a regression test fail with
> a version of squashfs-tools with your patch.
>
> Testing unmounted extract using lz4 compression.
> Parallel unsquashfs: Using 2 processors
> 188 inodes (2316 blocks) to write
>
> [=| ] 38/2316
> 1%
> lz4 uncompress failed with error code -12842
> Extract test failed for lz4 compression.
> ... (snip)
Looks like you are using this test script:
https://fedoraproject.org/wiki/QA:Testcase%20squashfs-tools%20compression
which is highly distro-specific.
Could you please provide a simpler / more generic testcase?
Thanks!
This may be a bug in unsquashfs, which is unimportant;
or it may be in mksquashfs, in which case it is a big problem.
Guan