This chip should be sampling now: "Suzhou PowerCore Technology Co. Ltd plans to deliver working versions of its CP1 (below) this summer. It will come in 8-, 10 and 12-core versions, each with a 64K data cache and 32K instruction cache. Like the IBM chip, the CP1 has 96MB of L3 cache, 115GB/s memory bandwidth and can support 32 lanes of PCIe 3.0."

As has been argued about a million times by the x86 crowd on this very forum: benchmarks running in cache are cheating.

When you see GFLOP/s being 20 times the number of GB/s (20 instructions run for each byte fetched or stored in memory) you gotta ask yourself, where did the data come from?

IBM has a much more realistic performance per bandwidth number there.
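To put a number on the "where did the data come from" question, here is a rough sketch of the flops-per-byte arithmetic. The 115 GB/s figure is the one quoted above for the CP1; the function names and the choice of a double-precision `y[i] = a*x[i] + y[i]` loop as the streaming example are my own assumptions for illustration, not anything from a vendor benchmark.

```python
def flops_per_byte(gflops: float, gbytes_per_s: float) -> float:
    """Ratio of a quoted GFLOP/s figure to a quoted GB/s figure."""
    return gflops / gbytes_per_s


def memory_bound_gflops(intensity: float, gbytes_per_s: float) -> float:
    """GFLOP/s ceiling for code that must stream all its data from memory."""
    return intensity * gbytes_per_s


BANDWIDTH_GBS = 115.0  # memory bandwidth quoted for the CP1 above

# Assumed example kernel: y[i] = a*x[i] + y[i] in double precision.
# 2 flops per element; 24 bytes moved (read x, read y, write y).
STREAM_INTENSITY = 2 / 24

ceiling = memory_bound_gflops(STREAM_INTENSITY, BANDWIDTH_GBS)
print(f"streaming ceiling at {BANDWIDTH_GBS} GB/s: {ceiling:.2f} GFLOP/s")

# A headline figure 20 times the bandwidth number implies 20 flops per byte,
# which a streaming loop like the one above cannot reach from memory.
print(f"flops per byte at 2300 GFLOP/s: {flops_per_byte(2300.0, BANDWIDTH_GBS):.0f}")
```

Anything far above that streaming ceiling has to be getting its operands from cache or registers, which is the whole point of the complaint.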

_________________
This week's pet peeve: Using "voltage" instead of "potential", which leads to inventing new words like "amperage" instead of "current" (I, measured in A) or possibly "charge" (ampere-hours, Ah, or coulombs, C). Sometimes I don't even know what people mean.

(e6500 systems still have the highest CoreMark per core/MHz and CoreMark/watt, unless I'm mistaken. And the T4240 is the most powerful single SoC device. Too bad we have to wait for Blender benchmarks, heh...)

UPDATE: The most powerful (in CoreMark) single CPU might be the Intel Xeon E5 2687W at 3.4 GHz: http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%20E5-2687W.html (150 W power)

UPDATE: And the Core i7 860 result is about 1/3 of the T4240's...

Last edited by KimmoK on 11-Jun-2015 at 03:00 PM.

Take the CoreMark per core, divide by the frequency of the E5 (3400 MHz), and multiply by the frequency of the T4 (1800 MHz). Result: 13239.1535294. That's pretty badly beaten by the T4 with its 15656.13. The Intel way of doing multithreading really does not impress me. On P4 they were getting 20% extra, now they're getting 20% extra.

While Freescale manages 70-80% extra when running two threads.

But I guess it matters nothing to the consumer when they manage to clock it so much higher.
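For anyone who wants to redo the clock normalization, a minimal sketch follows. It assumes the score scales linearly with frequency, which ignores memory latency and is only a rough approximation. The function and constant names are made up for illustration; the two numbers are the ones given in this post.

```python
def scale_to_clock(score_per_core: float, from_mhz: float, to_mhz: float) -> float:
    """Linearly rescale a per-core score from one clock frequency to another."""
    return score_per_core / from_mhz * to_mhz


def relative_lead(a: float, b: float) -> float:
    """How far a is ahead of b, as a fraction of b."""
    return a / b - 1.0


# Figures from the post: the E5 per-core CoreMark already rescaled from
# 3400 MHz to 1800 MHz, and the T4 per-core CoreMark at 1800 MHz.
E5_SCALED_TO_1800 = 13239.1535294
T4_PER_CORE = 15656.13

lead = relative_lead(T4_PER_CORE, E5_SCALED_TO_1800)
print(f"T4 per-core lead at equal clock: {lead:.1%}")
```

To rescale some other chip, call `scale_to_clock(per_core_score, its_mhz, 1800)` and compare against the T4 figure in the same way.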


There is a guy on the Real World Tech forums with a POWER8 machine. He says he ran multiple benchmarks, and in single thread POWER8 is about as fast as Sandy Bridge. Only in a multithreaded environment can it compare to Haswell (IBM designed POWER8 primarily for multiple threads). The Haswell core is much more sophisticated, with greater IPC and a better vector unit. Skylake Xeons will probably have SMT4 (4 threads per core), and Knights Hill is confirmed to have SMT4 (Knights Hill uses a highly modified Silvermont Atom core). We'll see what POWER9 brings, but so far Intel is kicking butt (and at a much lower TDP).

Quote:

As has been argued about a million times by the x86 crowd on this very forum: benchmarks running in cache are cheating.

IBM's POWER8 has 96MB of cache, whereas Intel's Xeon E5v3 has only 40MB. So IBM is cheating...

Quote:

When you see GFLOP/s being 20 times the number of GB/s (20 instructions run for each byte fetched or stored in memory) you gotta ask yourself, where did the data come from?

By using the cache more? Floating-point code tends to be more linear and localized, so it makes better use of caches. That's also the reason why the SIMD paradigm became so important and widespread: it's common to perform the same operations on local data.

Quote:

IBM has a much more realistic performance per bandwidth number there.

With such a big cache it should run in cache more often than Intel's chip. Cheating?

@olegil

Quote:

olegil wrote: @KimmoK

Take the CoreMark per core, divide by the frequency of the E5 (3400 MHz), and multiply by the frequency of the T4 (1800 MHz). Result: 13239.1535294. That's pretty badly beaten by the T4 with its 15656.13. The Intel way of doing multithreading really does not impress me. On P4 they were getting 20% extra, now they're getting 20% extra.

While Freescale manages 70-80% extra when running two threads.

But I guess it matters nothing to the consumer when they manage to clock it so much higher.

Don't bother. All benchmarks are invalid here unless they show that Power is a superior solution. The benchmark of choice here seems to be CoreMark, which is made for embedded solutions, and those have quite different needs than desktop programs.

But I'll also report some data about the Avoton family, which is comparable to the T4240, as I stated before (sorry for the typo).

Intel® Atom™ processor C2750 (8 cores, 2.4 GHz, 20 W): SPECint_rate_base2006 (8 copies) = 102