On Mon, Nov 26, 2012 at 03:17:45PM -0800, Zach Brown wrote:
> > This function efficiently counts the number of bits in a block of
> > memory.
>
> Would it be worth the annoying build- and run-time machinery to detect
> and use the -msse4.2 __builtin_popcount() gcc intrinsic?

I thought about doing it, but I was in a bit of a hurry implementing
this patch set, and I wasn't even sure how to correctly implement the
build- and run-time machinery (i.e., detecting whether the gcc you're
compiling with supports __builtin_popcount, and implementing a
run-time fallback if the CPU doesn't support the popcount instruction ---
which by the way isn't properly part of SSE 4.2; it has its own
separate CPUID bit, IIRC).  Is there some userspace application
licensed under LGPLv2 which does this cleanly from which I could
borrow code?

I suppose I should first check and see how much difference it makes
with a hard-coded use of __builtin_popcount().  If it makes a sufficiently
large improvement, it's probably worth the hair of implementing the
fallback machinery.
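
For illustration only, here is a minimal sketch of what such run-time
machinery could look like on x86 with gcc.  It is not code from
e2fsprogs; the names popcount32_portable, popcount32_hw, cpu_has_popcnt
and pick_popcount32 are hypothetical.  It assumes a gcc (or compatible
compiler) that provides <cpuid.h> and the target("popcnt") function
attribute, and it falls back to portable C everywhere else.  POPCNT is
reported in CPUID leaf 1, ECX bit 23, which is separate from the
SSE 4.2 bit (ECX bit 20).

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_POPCNT_DISPATCH 1	/* hypothetical macro name */
#endif

/* Portable fallback: clear the lowest set bit until none remain. */
static unsigned int popcount32_portable(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;
		n++;
	}
	return n;
}

#ifdef HAVE_X86_POPCNT_DISPATCH
/* POPCNT code generation is enabled for this one function only. */
__attribute__((target("popcnt")))
static unsigned int popcount32_hw(uint32_t v)
{
	return (unsigned int)__builtin_popcount(v);
}

/* CPUID leaf 1: POPCNT is ECX bit 23 (SSE 4.2 is ECX bit 20). */
static int cpu_has_popcnt(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ecx >> 23) & 1;
}
#endif

typedef unsigned int (*popcount32_fn)(uint32_t);

/* Choose an implementation once; callers keep the returned pointer. */
static popcount32_fn pick_popcount32(void)
{
#ifdef HAVE_X86_POPCNT_DISPATCH
	if (cpu_has_popcnt())
		return popcount32_hw;
#endif
	return popcount32_portable;
}
```

A caller would invoke pick_popcount32() once at startup and use the
returned pointer in the bitmap loop, so the CPUID check is not repeated
per word.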
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Mon, Nov 26, 2012 at 08:45:05PM -0500, Theodore Ts'o wrote:
> I suppose I should first check and see how much difference it makes
> with a hard-coded use of __builtin_popcount().  If it makes a sufficiently
> large improvement, it's probably worth the hair of implementing the
> fallback machinery.

I did some quick benchmarking, and the difference it makes when
checking 4TB's worth of bitmaps is negligible:

	slow popcount: 0.2623
	fast popcount: 0.0700

For 128TB's worth of bitmaps, the time difference is:

	slow popcount: 8.0185
	fast popcount: 2.2066

I measured running e2fsck on an empty 128TB file system, and that took
202 CPU seconds (assuming all of the fs metadata blocks are in cache),
so with this optimization we would save at most 3%. (For comparison,
using an unmodified 1.42.6 e2fsck, it burned 392.7 CPU seconds.)

My conclusion is that using __builtin_popcount() is a nice-to-have, and
if someone sends me patches I'll probably take them as an optimization,
but it's not super high priority for me.
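
For readers who want to reproduce this kind of comparison, the sketch
below shows one possible "slow" and one possible "fast" way to count
the set bits in a block of memory.  These are assumptions for
illustration, not the routines that were benchmarked above; the names
count_bits_slow and count_bits_fast are hypothetical.  Note that unless
the file is built with -mpopcnt (or an -march setting that implies it),
gcc typically lowers __builtin_popcountll() to a software routine
rather than the hardware instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "Slow" variant: examine every bit of every byte in turn. */
static unsigned long count_bits_slow(const void *addr, size_t nbytes)
{
	const unsigned char *p = addr;
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < nbytes; i++) {
		unsigned char c = p[i];

		while (c) {
			total += c & 1;
			c >>= 1;
		}
	}
	return total;
}

/* "Fast" variant: 64 bits at a time through the gcc builtin. */
static unsigned long count_bits_fast(const void *addr, size_t nbytes)
{
	const unsigned char *p = addr;
	unsigned long total = 0;

	while (nbytes >= sizeof(uint64_t)) {
		uint64_t w;

		/* memcpy avoids unaligned loads and aliasing problems */
		memcpy(&w, p, sizeof(w));
		total += (unsigned long)__builtin_popcountll(w);
		p += sizeof(w);
		nbytes -= sizeof(w);
	}
	/* Remaining tail bytes, one at a time. */
	while (nbytes--)
		total += (unsigned long)__builtin_popcount(*p++);
	return total;
}
```

Both functions return the same count for any buffer, so either can be
dropped into a timing loop over a bitmap of the desired size.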
- Ted

On Tue, Nov 27, 2012 at 12:16:17AM -0500, Theodore Ts'o wrote:
> I did some quick benchmarking, and the difference it makes when
> checking 4TB's worth of bitmaps is negligible:
>
> slow popcount: 0.2623
> fast popcount: 0.0700
>
> For 128TB's worth of bitmaps, the time difference is:
>
> slow popcount: 8.0185
> fast popcount: 2.2066
>
> I measured running e2fsck on an empty 128TB file system, and that took
> 202 CPU seconds (assuming all of the fs metadata blocks are in cache),
> so with this optimization we would save at most 3%.  (For comparison,
> using an unmodified 1.42.6 e2fsck, it burned 392.7 CPU seconds.)

Nice, thanks for taking the time to get numbers.

> My conclusion is that using __builtin_popcount() is a nice-to-have, and
> if someone sends me patches I'll probably take them as an optimization,
> but it's not super high priority for me.

Agreed. I'll chuck it at the end of my fun-projects-some-day list as
well, but getting it right for all the platforms that e2fsprogs
supports.. meh :).
- z

On Mon, Nov 26, 2012 at 08:45:05PM -0500, Theodore Ts'o wrote:
> build- and run-time machinery (i.e., detecting whether the gcc you're
> compiling with supports __builtin_popcount, and implementing a
> run-time fallback if the CPU doesn't support the popcount instruction ---
> which by the way isn't properly part of SSE 4.2; it has its own
> separate CPUID bit, IIRC).

*nod*

> Is there some userspace application licensed under LGPLv2 which does
> this cleanly from which I could borrow code?

Not that I know of, off the top of my head. I think I'd first check the
usual crypto suspects :).
- z

On Tue, Nov 27, 2012 at 09:50:23AM -0800, Zach Brown wrote:
> Agreed. I'll chuck it at the end of my fun-projects-some-day list as
> well, but getting it right for all the platforms that e2fsprogs
> supports.. meh :).

It's not strictly necessary to get things right for all platforms; we
already have some accelerations using asm statements which only work
on one platform --- although I didn't bother to make 64-bit
set/clear/test bit optimizations for x86, mainly because I didn't
think it was worth it, especially on modern CPUs.  (And with the
red/black tree backend for bitmaps, the asm bitops are even less
important.)
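
As a point of reference, plain C versions of set/clear/test bit with a
64-bit bit number are only a few lines each, which is part of why asm
variants buy little on modern hardware.  The sketch below is
illustrative only: the function names are hypothetical (they are not
the e2fsprogs routines), and it assumes bit 0 is the least significant
bit of the first byte of the bitmap.

```c
#include <stdint.h>

/* Set bit number nr in the bitmap starting at addr. */
static void set_bit64(uint64_t nr, void *addr)
{
	unsigned char *p = addr;

	p[nr >> 3] |= (unsigned char)(1u << (nr & 7));
}

/* Clear bit number nr in the bitmap starting at addr. */
static void clear_bit64(uint64_t nr, void *addr)
{
	unsigned char *p = addr;

	p[nr >> 3] &= (unsigned char)~(1u << (nr & 7));
}

/* Return 1 if bit number nr is set, 0 otherwise. */
static int test_bit64(uint64_t nr, const void *addr)
{
	const unsigned char *p = addr;

	return (p[nr >> 3] >> (nr & 7)) & 1;
}
```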
- Ted