Hi all
Thought I'd put on record that my setup has recently started to exhibit
what looks to me like TRIM/gc related SSD thrashing.
Setup is:
* Acer Travelmate 8172
* Crucial RealSSD C300 64G
* Gentoo, current kernel 3.1.1-pf
* 6 GPT partitions
  - sda1-2, LUKS encrypted, ROOT btrfs multiple devices,
    noatime,nodiratime,compress=lzo,ssd
  - sda3-4, LUKS encrypted, HOME btrfs multiple devices,
    noatime,nodiratime,compress=lzo,ssd
  - sda5, no encryption, PUB btrfs
  - sda6, LUKS encrypted, SWAP
The multiple partitions were made back when I was under the impression
that they would help dm-crypt use multiple cores. Milan has since
replied on the list that multiple partitions haven't been necessary for
good multicore performance for a number of recent kernel versions. It
is too much effort to redo the layout right now, so I'm leaving that
for a more suitable time.
I believe I started with 2.6.38, something like 7 months ago. I did not
create a swap partition at the time; it is a recent addition, made by
shrinking sda5 by about 5GB. Because of a battery.ko LED trigger related
kernel BUG on resume that is still unfixed (and bko is still down),
btrfs recently got into a bad state where writing data to certain areas
of HOME caused kernel BUGs.
So I rsync'd /home and recreated that btrfs. Apart from a few other
rare'ish btrfs BUGs with pre-3.1 kernels, which I've worked around by
mounting with an even older kernel version i.e. off systemrescuecd, this
setup has worked well.
But in the last few days, with uptime running into a few weeks and the
machine going through some pretty heavy desktop workloads, I noticed
that the UI would occasionally grind to a halt while the HDD (well, SSD
now) light flashed frantically for a while. Mouse, keyboard, nothing
responds; sound plays until the buffer runs out; the network eventually
disconnects as well. When the SSD light stops, the machine returns to
its normal, responsive state.
It got progressively worse though, to the point where the stalls would
last up to a minute or so. The frantic drive light seemed to indicate
that I have probably hit some kind of garbage collection limit, but
OTOH I could be totally wrong. Which is also why I'm posting, perhaps a
smart person knows better. My guess is that with the SSD filled by these
encrypted partitions, and no discards reaching the drive through
dm-crypt, it has no blocks it knows to be free, and that is bothering
the gc mechanism?
So I went ahead, upgraded cryptsetup to 1.4.1 and added
--allow-discards to the options used when opening the LUKS devices. In
the 5+ hours of uptime since the reboot, no stalling or thrashing has
happened yet, but the workload also hasn't reached the point of using
swap yet, according to free.
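
For anyone wanting to verify that the flag actually took effect, here is
a minimal sketch. The mapping name "root1" is hypothetical (pick whatever
your setup uses), and I'm assuming the usual dmsetup table layout, where
a dm-crypt target opened with discards enabled lists allow_discards
among its optional parameters. As far as I know this needs cryptsetup
1.4 or newer and a 3.1 or newer kernel.

```shell
#!/bin/sh
# Sketch only. "root1" is a hypothetical mapping name; /dev/sda1 is the
# first ROOT partition from the setup above.
#
# Opening the container with discards passed through (run as root):
#   cryptsetup luksOpen --allow-discards /dev/sda1 root1

# Reads `dmsetup table` output on stdin and succeeds if a crypt target
# line carries the allow_discards optional parameter.
has_allow_discards() {
    grep -q ' crypt .* allow_discards'
}

# Usage (run as root):
#   dmsetup table root1 | has_allow_discards && echo "discards enabled"
```

If the check fails after opening with --allow-discards, the first things
I'd suspect are an older kernel or the initramfs opening the device
without the flag.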
swapon also seems to support discard, so I enabled that, too:
    -d, --discard
        Discard freed swap pages before they are reused, if the swap
        device supports the discard or trim operation. This may improve
        performance on some Solid State Devices, but often it does not.
        The /etc/fstab mount option discard may be also used to enable
        discard flag.
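
As an illustration of the fstab route mentioned in that man page text, a
swap entry could look roughly like the fragment below. The device path
/dev/mapper/swap is an assumption, not my actual mapping name. Since the
swap here sits on dm-crypt, I'd expect the discards to reach the SSD
only if that mapping was itself opened with --allow-discards.

```
# /etc/fstab fragment (sketch; /dev/mapper/swap is a hypothetical name)
/dev/mapper/swap   none   swap   sw,discard   0 0
```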
Like the subject says, some excited situation monitoring and praying is
going on in the meantime. Let's see how this whole thing holds up. Hope
you've enjoyed the story.
--
Leho Kraav, M.Sc.
http://leho.kraav.com