Latest revision as of 07:54, 13 December 2009

Each program should be implemented the same way as this Icon program.
The sum-file benchmark measures line-oriented I/O and string conversion.

Each program should:

* read integers from stdin, one line at a time
* print the sum of those integers

Correct output for this 6KB input file is:

500

Programs should use built-in line-oriented I/O functions rather than custom code. No line will exceed 128 characters, including the newline. Reading one line at a time, the programs should run in constant space.

Ideally, we could get this benchmark up to the first rank. Clean is currently the number one language for this benchmark, at 2.74 seconds[1], and anything Clean can do, Haskell should be able to do as well!
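As a rough illustration of the requirements above (not any submitted entry), a line-at-a-time loop using only the standard getLine and isEOF might look like the sketch below. The names step and loop are made up for this example, and it assumes every line holds exactly one integer that read can parse.

```haskell
import System.IO (isEOF)

-- Add the integer on one input line to the running total.
-- Assumption: the line holds exactly one integer; read fails otherwise.
step :: Int -> String -> Int
step acc line = acc + read line

-- Read stdin one line at a time, forcing the total at each step
-- so that no chain of unevaluated additions builds up.
loop :: Int -> IO Int
loop acc = do
  eof <- isEOF
  if eof
    then return acc
    else do
      line <- getLine
      let acc' = step acc line
      acc' `seq` loop acc'

main :: IO ()
main = loop 0 >>= print
```

Going through String and read keeps the code simple, but it is not expected to be fast.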

Those guys tell us these benchmarks don't favor C and then impose a limit on line length? What's the purpose of that if not to allow the use of C's getline() primitive (in both senses of the word)? And if we're picky, all submitted programs are incorrect, as they assume the sum fits into a machine word, but this assumption is unwarranted. This again favors C, which lacks arbitrary precision integers. -- UdoStenzel

This uses the fast, strict loop from the illegal/strict entry, but a
chunk-wise lazy reader from the current lazy bytestring entry.
It is the most efficient entry in any language, but it was rejected: "NOT ACCEPTED: should use built-in line-oriented I/O functions rather than custom-code"[2]
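The entry itself is not reproduced here. A minimal sketch of the general idea (a strict accumulator loop over lazily read chunks) might look like the following. It assumes the bytestring package's Data.ByteString.Lazy.Char8 module and its readInt function; the name sumLines and the policy of skipping unparsable lines are assumptions of this sketch, not taken from the rejected program.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Sum the integers found at the start of each line of a lazy ByteString.
-- The accumulator is forced with seq on every step, so the loop is strict
-- even though the input arrives lazily, chunk by chunk.
sumLines :: L.ByteString -> Int
sumLines = go 0
  where
    go acc s
      | L.null s  = acc
      | otherwise = case L.readInt s of
          -- Parsed a number: add it and step over the following character
          -- (normally the newline).
          Just (n, rest) ->
            let acc' = acc + n
            in acc' `seq` go acc' (L.drop 1 rest)
          -- No number here (assumption: such lines are skipped):
          -- drop everything up to and including the next newline.
          Nothing -> go acc (L.drop 1 (L.dropWhile (/= '\n') s))

main :: IO ()
main = L.getContents >>= print . sumLines
```

Whether something like this counts as "built-in line-oriented I/O" is exactly the point of contention in the rejection quoted above.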

A short alternative (though performance isn't great with read).
However, it's the only one-line entry in any language, it's very
Haskellish, and GHC does an excellent job compiling the foldr into a
tight loop. The Int constraint doesn't seem to change performance much,
but causes less code to be generated.

main =print.foldr((+).read)(0::Int).lines=<<getContents

Since this is an accumulation, wouldn't foldl' work better?

main =print.foldl' (+) (0::Int) . map read . lines =<< getContents

-- UdoStenzel

I tried that tweak and the speed was the same
-- ChrisKuklewicz

Other options

main =print.sum.map read.lines=<<getContents
main =print.foldl((.read).(+))(0::Int).lines=<<getContents

It isn't entirely valid, but as has been pointed out, neither are any of the other entries.
Like Chris, I couldn't tell any difference between foldl and foldl', nor did restricting to Ints make any impact.
(Other test data may differ)

This is (with -O2) about three times slower than the current best entry, which, if it scales up, will put it in the middle of the pack, close to OCaml bytecode and CMUCL.

An improved version: this uses about 0.6 of the heap (according to -prof). Replacing sum . map with a foldr is the key. Also, why the need for `valid'? -- Don

Right, I didn't bother with heap - and I guess I expected sum to be strict enough (isn't it?). valid's raison d'être is the possibility of non-numeric characters (whitespace) in the file. Perhaps the spec excludes that possibility? -k

Nice and concise! I'd suggest renaming sumify, but perhaps str2int is better than my read variant? Does the ::Int buy you much? I really like to avoid the overflow bug, even if not relevant for the test data set. -k
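To make the overflow question concrete, here is a small sketch comparing a fixed-width Int sum with an arbitrary-precision Integer sum. The names sumInt and sumInteger are invented for this example and do not come from any entry. With GHC, the Int version wraps around silently once the total exceeds maxBound, while the Integer version does not.

```haskell
import Data.List (foldl')

-- Fixed-width accumulator: fast, but wraps silently on overflow.
sumInt :: String -> Int
sumInt = foldl' (+) 0 . map read . lines

-- Arbitrary-precision accumulator: no overflow, at some cost in speed.
sumInteger :: String -> Integer
sumInteger = foldl' (+) 0 . map read . lines

main :: IO ()
main = do
  input <- getContents
  print (sumInteger input)
```

For the benchmark's test data the two should agree; the difference only shows up on inputs whose total does not fit in a machine word.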