On 11/30/2010 10:16 PM, Paul Eggert wrote:
> Invoke MAX_MERGE(total, level) with level == 15.
> 2 << level yields 65536, and 65536 * 65536 overflows to zero.
I managed to reproduce this bug on a (faked) host with
32768 processors, using a command like this:
seq 1000000000 | sort --parallel=32768 -S 10G
The result was a floating point exception (actually, a division
by zero) and 'sort' crashed.
However, the bug is timing dependent and is very hard to
reproduce. I tried many more times to reproduce it, and
they all failed.
This proved to my satisfaction that it is a real bug, though,
so I pushed the following patch.
>From 1561c2b228d93a049e527824e14ad4fe8c256b52 Mon Sep 17 00:00:00 2001
From: Paul Eggert <address@hidden>
Date: Wed, 1 Dec 2010 21:50:00 -0800
Subject: [PATCH] sort: fix bug on 64-bit hosts with at least 32768 processors
* src/sort.c (MAX_MERGE): Avoid integer overflow when on a machine
with (say) 32-bit int and 64-bit size_t and when level == 15.
Without this fix, on such a machine with 32768 or more processors,
the level computation could overflow on large input, and this
would result in division by zero.
---
src/sort.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/src/sort.c b/src/sort.c
index 1aa1eb4..5c368cd 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -107,7 +107,7 @@ struct rlimit { size_t rlim_cur; };
/* Maximum number of lines to merge every time a NODE is taken from
the MERGE_QUEUE. Node is at LEVEL in the binary merge tree,
and is responsible for merging TOTAL lines. */
-#define MAX_MERGE(total, level) ((total) / ((2 << level) * (2 << level)) + 1)
+#define MAX_MERGE(total, level) (((total) >> (2 * ((level) + 1))) + 1)
/* Heuristic value for the number of lines for which it is worth
creating a subthread, during an internal merge sort, on a machine
--
1.7.2