Re slide 7:
- irrespective of whether the compiler can auto-parallelize the SPECint code, if the benchmark includes stuff that probes how well the cores can access shared data, that 'stuff' needs to be optimized for the target hardware.
- the slide's horizontal axis is labelled 'threads', with the implication that each thread is running on a dedicated core.
- the figures given for the A57 indicate that 4 cores give about 3 times the performance of a single core.
- this could indicate excessive latency in the interconnect that links the cores to memory etc.
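To put rough numbers on that "4 cores, about 3x" figure, here is a minimal sketch in Python. It assumes the slide's approximate 3x speedup on 4 cores and uses Amdahl's law as a lumped model, so the "non-scaling fraction" it reports covers everything that fails to scale (serial code, interconnect latency, memory contention) without saying which one is to blame. The function names are mine, purely for illustration.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


def non_scaling_fraction(speedup, cores):
    """Invert Amdahl's law, S = 1 / (s + (1 - s) / n), to solve for s.

    s lumps together every effect that does not scale with core count;
    it does not identify the cause.
    """
    return (cores / speedup - 1.0) / (cores - 1.0)


def amdahl_speedup(fraction, cores):
    """Speedup predicted by Amdahl's law for a given non-scaling fraction."""
    return 1.0 / (fraction + (1.0 - fraction) / cores)


if __name__ == "__main__":
    # Approximate figures read off the slide: 4 cores, about 3x one core.
    s = non_scaling_fraction(3.0, 4)
    print(f"efficiency at 4 cores: {parallel_efficiency(3.0, 4):.2f}")
    print(f"implied non-scaling fraction: {s:.3f}")
    # Extrapolation only; the slide gives no 8-core figure.
    print(f"predicted speedup at 8 cores: {amdahl_speedup(s, 8):.2f}")
```

Under this model a 3x result on 4 cores corresponds to 75% efficiency and a non-scaling fraction of about 11%, which would cap 8 cores at roughly 4.5x. That is an extrapolation from one approximate data point, not a measurement.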
Re performance, claimed vs realized: @wsw1982, the marketing guys will always try to claim the biggest figure they think they can get away with. But behind the scenes, the engineers had BETTER have an EDA environment that lets them run the actual benchmark code on a simulation of a complete test chip (albeit rather slowly...), so they already know the reality, within the accuracy limits of the simulation environment (e.g. DDR response time modelling). If they don't have this capability, some people may get some 'surprises' when they start to evaluate the real Si...

Medfield is already highly competitive with the best that ARM's licensees can field, and that's on Intel's older 32nm LP process.
Silvermont is a fundamentally new Atom microarchitecture on a new 22nm LP process, and it's been tweaked to the last femtofarad by the Intel Army.
The combined Haswell and Silvermont Intel avalanche is coming in 2013 and I can't wait to see how the market and the competition respond!

I won't judge the ARM performance until a real product exists. Just a year ago, plenty of people on EE Times argued that, "based on benchmarks", the ARM A9 was 2 to 4 times faster than the then-current Atom, and they were soon proven wrong by comparisons between real mobile products. Don't get me wrong, I don't mean those people are incapable, but the results from ARM were just misleading bluff. They also declared that the Atom could not be used in a smartphone until 22nm, when Medfield was already in mass production. What a reputation.

Power aside, don't forget that ARM parts are made for switch and MP blades and dense implementations.
Four nodes per blade at 16 cores, on the workload, places 32-bit ARM parts (memory limited as they are), and certainly any 64-bit part, in Xeon DP/MP performance territory. Benchmark the compute module, not the individual processor. Mike Bruzzone at Camp Marketing

ARM is moving really fast. We have kept up with it through ARM9, ARM11, Cortex-A8 and A9, but now it is moving to the A50!
Yet each ARM platform finds its way into a different use.

Some benchmarks in SPECint2006 can be auto-parallelized by the compiler, which has some effect on the overall score.
Not sure about SPECint2000: it was retired a long time ago (even SPECint2006 was supposed to be retired this year). But it's probably possible to play tricks to make SPECint2000 results look better.
It's annoying that ARM uses a long-retired benchmark. SPECint2000 working set sizes are substantially smaller than those of SPECint2006, so just having a large cache (which was expensive 5-10 years ago) would give a nice performance boost, e.g. http://cpudb.stanford.edu/visualize/performance_by_freq_and_cache