I have been doing some transfer rate tests among hosts
with Broadcom BCM5752 or BCM5708
(tg3 Linux driver),
Intel 82541PI
(e1000 Linux driver),
and Realtek RTL-8169
(r8169 Linux driver) chipsets (more details
on these tests sometime later).
During these I have been quite confused by the behaviour of
the Broadcom BCM5752 with
jumbo frames,
until I discovered that officially it does not support them, but
in practice handles them in transmission only. In addition, the
RTL-8169 supports jumbo frames, but only up to 7000B instead of
the more common 9000B.
Anyhow the BCM5752 has various forms of network processing
offloading, including TCP segmentation offloading, which means
that it can handle small frames on the wire fairly efficiently,
while the RTL-8169 does not, so jumbo frame support is rather
more useful for the latter.
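As an aside, whether a given NIC and driver combination offers such offloads can be inspected and toggled with ethtool (eth0 below is a placeholder interface name):

```shell
# List the offload settings of the interface; TCP segmentation
# offloading appears as "tcp-segmentation-offload: on/off".
ethtool -k eth0
# Offloads can also be switched, for example enabling TSO:
ethtool -K eth0 tso on
```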
Also, taking advantage of the ability of the BCM5752 to
transmit but not receive jumbo frames is easy: change the
relevant routes to have an MTU value higher than 1500, but
set or leave the advertised MSS value under 1500 (1460, that
is 1500 minus 40 for the headers, even though specifying
advmss 1460 explicitly is redundant).
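A minimal sketch of such a route change with iproute2, assuming a hypothetical eth0 interface and 192.168.1.0/24 subnet:

```shell
# Hypothetical interface and subnet: mtu raises the size of frames
# we transmit on this route, while advmss keeps the MSS we advertise
# at the standard value, so the peer still sends us 1500B frames.
ip route replace 192.168.1.0/24 dev eth0 mtu 9000 advmss 1460
```

Note that the interface MTU itself would also need raising (for example with `ip link set eth0 mtu 9000`) for the larger route MTU to be usable.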

My impression is that there are very few commodities,
as often even very similar products have important
differences, which may or may not matter to everybody, but
matter to someone. One of the latest illustrations is the
large differences among 2.5" hard disc drives
as to sequential small block performance, which is
quite useful to have. The other tests also show
remarkable differences (around 25%), and in different
ways, as different onboard firmware
reacts differently to different usage patterns. For
example
multithreaded reading and writing
shows a fourfold difference in performance
between fastest and slowest.

From an interesting review of the new
AMD X2 5500
there are some
impressive numbers about memory latency differences
between current AMD and
Intel CPUs.
Latency in main
RAM
still matters, especially
for systems with smaller caches, and less so the larger the
cache. As to this, Intel has the lead, thanks to their superior
capital base that enables investing in better process technology
which results in more onchip cache, and Intel are clearly
driving the memory market towards memory with high latencies and
high bandwidths, which is the combination that best feeds the
large caches of their CPU chips via their high latency memory
buses.
SDR memory had
latencies of a few cycles,
DDR of half a dozen
cycles,
DDR2 of a dozen
cycles, and now
DDR3 of a couple
dozen cycles. A little known detail is that the intrinsic speed
of a memory cell has improved very little over the past decade
or two (a mere doubling, from around 100MHz to 200MHz), and all
transfer rate increases have come from higher degrees of
pipelining and parallelism at the integrated circuit level.
Unfortunately pipelining and parallelism only work well in the
aggregate, as they involve those ever higher latencies already
mentioned, especially noticeable for random accesses to memory.
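The point that cycle counts grew while absolute latency barely moved can be illustrated with a back-of-the-envelope calculation; the module names and CAS figures below are assumed-typical examples, not numbers from the review:

```python
# Assumed-typical modules per generation:
# (module, CAS latency in I/O clock cycles, I/O clock in MHz).
parts = [
    ("SDR-133",   3,  133),
    ("DDR-400",   3,  200),
    ("DDR2-800",  6,  400),
    ("DDR3-1600", 11, 800),
]

for name, cl, mhz in parts:
    # Latency in nanoseconds is cycles divided by the clock rate.
    ns = cl / mhz * 1000.0
    print(f"{name}: CAS {cl} at {mhz}MHz is about {ns:.1f}ns")
```

The access time in nanoseconds stays in the same narrow band across generations, even as the cycle count climbs with the I/O clock.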
Intel seem to be driving RAM to become what used to be
called bulk store in the mainframe era, a cheap vast
repository of seldom used data that can be recalled to main
memory faster than from disk. The level 2 cache is the new main
memory. In other words, RAM is no longer meant to be that random
access, and is meant to be a commodity, where the performance
and value reside in the onchip memory. It is no coincidence that
Intel exited the RAM market long ago, and have invested
massive capital in processes that allow ever greater amounts of
onchip memory.