]]>By: stevehttp://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/comment-page-1/#comment-36
Wed, 03 Oct 2007 21:09:22 +0000http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-36Hi Pete, yes, MacBook Pro disk I/O throughput is best case about 45 MB/sec from some figures I’ve seen, which I think would put us in the 4-5 sec range for reading this data, best case. If I run the Ruby solution on the full dataset, the first time it takes 7.5-8 secs, but the second time it takes 2.2-2.5 secs, which I believe shows the caching effects. But if we’re getting cached data, then I think that reinforces my point about the Erlang code not being I/O bound.

Your second paragraph above has some very good insights, and your final paragraph is right on the money, IMO. I’d be really interested in seeing your benchmark results (I assume they’ll be on your website?), and needless to say Tim’s T2 results should be quite interesting as well.

]]>By: Pete Kirkhamhttp://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/comment-page-1/#comment-35
Wed, 03 Oct 2007 19:59:48 +0000http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-35I do think on that particular measure you’re reading from cache rather than physical disk; on my laptop it takes 8.5s if you clear the cache, 0.4s if you don’t. Most laptop drives are around 30MBps; a bit more if 10,000 rpm. Flash drives are twice that, but excel at seek time.

However, that sort of number may actually be representative of the target system – if you assume a transfer rate in the 2GBps ballpark (Sun’s Thumper gives 2GBps to memory), and the T2′s 1.4GHz core, that gives only 0.7 CPU cycles (per hardware thread) to process each byte of data, so even the C++ code (which takes around 3 cycles per byte on my laptop) would be CPU limited rather than IO limited if you don’t spread the load over the available hardware threads. There’s more estimates at the link I put as website.

If it’s a language war then it’s between erlang and ruby; I’m trying to find out what the VM of either language should be doing to solve this problem, so am benchmarking to find where the costs are if you write close-to-the-metal code to solve it. I wouldn’t write a log file extraction script in bit-twiddly C++; I’d use Perl or XSLT. You really shouldn’t have to, but performing experiments to help think about where the bottlenecks may be is useful.