Joe Landman wrote:
> Hi Tahir:
>> Tahir Malas wrote:
>>> Since the memory cost of our system will dominate other costs, we can
>>> afford to move to dual-core technology. However, the following
>>> questions arise:
>>> 1. Will it be worth it? And can we gain any advantages over single-core
>>> with the not-so-good scalability of our parallel programs?
> It depends upon the code. If your code requires very low latency, the
> benefit of dual core nodes is that you have 4 interconnected cores
> (think of them as individual processors) connected over a very high
> speed low latency interface. If this is well coupled to the rest of
> the system through an external low latency interface (Infinipath, IB,
> Myrinet, etc), and your code is latency sensitive, then dual core
> could be a substantial win for you. If your code simply hammers on
> memory bandwidth, then it is possible in some cases for it to be a
> liability relative to single core. Some cases (weather codes)
> demonstrated something like this here in the recent past.
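The latency argument above can be illustrated with the common Hockney (alpha-beta) message cost model, t(n) = alpha + n/beta. This model is a standard way to reason about it, not something proposed in the thread. The sketch below assumes hypothetical latency and bandwidth values (they are placeholders, not measurements of any interconnect) and shows that small messages are dominated by latency while large ones are dominated by bandwidth.

```python
# Hockney-style point-to-point cost model: t(n) = alpha + n / beta.
# alpha: per-message latency in seconds; beta: bandwidth in bytes/second.
# All parameter values below are hypothetical placeholders.


def message_time(n_bytes: int, alpha_s: float, beta_bytes_per_s: float) -> float:
    """Estimated time to send one message of n_bytes."""
    return alpha_s + n_bytes / beta_bytes_per_s


def latency_fraction(n_bytes: int, alpha_s: float, beta_bytes_per_s: float) -> float:
    """Share of the total message time that is pure latency."""
    return alpha_s / message_time(n_bytes, alpha_s, beta_bytes_per_s)


if __name__ == "__main__":
    # Hypothetical links: a fast on-node path and a slower network path.
    links = {
        "on-node (hypothetical)": (1e-6, 1e9),
        "network (hypothetical)": (5e-6, 1e9),
    }
    for name, (alpha, beta) in links.items():
        for n in (8, 1024, 1_000_000):
            t_us = message_time(n, alpha, beta) * 1e6
            share = latency_fraction(n, alpha, beta)
            print(f"{name:24s} {n:>9d} B: {t_us:9.2f} us, latency share {share:.0%}")
```

A code that exchanges many small messages sits in the latency-dominated regime, which is where moving communication onto a faster on-node path helps most.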
It is probably worth pointing out here that the latency being referred to
is network latency. Latency to memory is >>worse<< when cache coherency is
turned on in single- or dual-core SMP configurations (something like
~100 ns vs ~55 ns). I am assuming that a single-chip dual-core will have
to use a cache-coherent memory reference protocol and so be slower. While
it is true that scalability limited by network latency may improve, a
bandwidth-intensive application may suffer in a cache-coherent context
because of the overhead added by longer memory reference latencies (how
much depends on how amenable the code is to prefetching).
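To see why the ~55 ns vs ~100 ns difference matters, here is a back-of-envelope sketch of the bandwidth a code achieves when its memory references are dependent (for example pointer chasing), so misses cannot overlap. It assumes 64-byte cache lines; the `outstanding` parameter is a crude stand-in for how many misses prefetching or independent loads keep in flight at once.

```python
# Latency-bound effective bandwidth: each cache miss brings in one line,
# and at most `outstanding` misses are in flight at a time.
# Assumptions: 64-byte cache lines; no other overlap or contention effects.
CACHE_LINE_BYTES = 64  # assumed cache line size


def latency_bound_bandwidth(latency_ns: float,
                            line_bytes: int = CACHE_LINE_BYTES,
                            outstanding: int = 1) -> float:
    """Effective bandwidth in GB/s (1e9 bytes/s); bytes per ns equals GB/s."""
    return outstanding * line_bytes / latency_ns


if __name__ == "__main__":
    for label, ns in (("~55 ns latency", 55.0), ("~100 ns latency", 100.0)):
        for k in (1, 4):
            bw = latency_bound_bandwidth(ns, outstanding=k)
            print(f"{label}, {k} miss(es) in flight: {bw:.2f} GB/s")
```

With one miss in flight the longer latency cuts achieved bandwidth nearly in half; more misses in flight (which is roughly what prefetching buys) raise the ceiling, which is why prefetch-friendly codes suffer less.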
>>> 2. Another question is whether dual-core technology brings any
>>> advantages for the efficient use of the large amount of memory that
>>> we will utilize?
>>> 3.
> Not really an advantage or a disadvantage. With single core, your
> aggregate memory bandwidth is N(cores) * Bandwidth of one of the
> memory busses. With dual core, it is (N(cores)/2) * Bandwidth of one
> of the memory busses. This may or may not be an issue for your code.
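The arithmetic above can be written out as a short sketch. It assumes each socket has its own memory bus shared by all cores on that socket; the 16-core node count and the 6.4 GB/s bus figure are hypothetical, chosen only for illustration.

```python
# Aggregate memory bandwidth when each socket owns one memory bus and the
# cores on that socket share it. Core count and bus bandwidth are hypothetical.


def aggregate_bandwidth(n_cores: int, cores_per_socket: int, bus_bw_gbs: float) -> float:
    """Total bandwidth = number of sockets * bandwidth of one memory bus."""
    if n_cores % cores_per_socket:
        raise ValueError("n_cores must be a multiple of cores_per_socket")
    sockets = n_cores // cores_per_socket
    return sockets * bus_bw_gbs


if __name__ == "__main__":
    BUS_BW_GBS = 6.4  # hypothetical per-socket bus bandwidth
    n = 16            # hypothetical total core count
    single = aggregate_bandwidth(n, 1, BUS_BW_GBS)  # N(cores) * B
    dual = aggregate_bandwidth(n, 2, BUS_BW_GBS)    # (N(cores)/2) * B
    print(f"single core: {single:.1f} GB/s total, {single / n:.1f} GB/s per core")
    print(f"dual core:   {dual:.1f} GB/s total, {dual / n:.1f} GB/s per core")
```

For the same core count, the dual-core layout halves the per-core share of memory bandwidth, which is why a bandwidth-bound code may see this as a liability.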
Or, to put it more simply, it is limited by the number of on-chip memory
controllers on the board (not to mention their clock and the speed/type
of your DIMMs).
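The dependence on DIMM speed can be sketched as theoretical peak = transfer rate x bus width x channels, summed over controllers. The configuration below (two controllers, each dual-channel 64-bit DDR-400) is a hypothetical example; sustained bandwidth in practice is lower than this peak.

```python
# Theoretical peak memory bandwidth from DIMM transfer rate and channel count.
# Assumes 64-bit (8-byte) channels; the example configuration is hypothetical.


def channel_peak_gbs(transfers_mts: float, bus_width_bytes: int = 8) -> float:
    """Peak bandwidth of one channel in GB/s (1e9 bytes/s)."""
    return transfers_mts * 1e6 * bus_width_bytes / 1e9


def node_peak_gbs(controllers: int, channels_per_controller: int, transfers_mts: float) -> float:
    """Peak bandwidth summed over all memory controllers on the board."""
    return controllers * channels_per_controller * channel_peak_gbs(transfers_mts)


if __name__ == "__main__":
    # Hypothetical board: 2 on-chip controllers, dual-channel DDR-400 each.
    print(f"per channel: {channel_peak_gbs(400):.1f} GB/s")
    print(f"per board:   {node_peak_gbs(2, 2, 400):.1f} GB/s")
```

Adding cores without adding controllers leaves this number unchanged, which is the point being made above.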
rbw
AHPCRC