Breaking: AMD shows die of Orochi, a 32nm 8-core Bulldozer-based CPU

We don't know much about it, but at the first annual Global Technology Conference hosted by GLOBALFOUNDRIES, AMD's Chekib Akrout showed the first images of the upcoming Orochi core processor:

Here is what we know for sure about the upcoming Orochi processor: it will be AMD's second 32nm product, following the upcoming Llano Fusion part, and it uses four Bulldozer modules that together provide 8 processing cores and 8 threads by way of AMD's unique alternative to SMT.

If you haven't read details about the new Bulldozer core and what it has to offer, definitely check out our recent preview of the processor based on information revealed at the Hot Chips conference last month.

Nothing else was shared about the Orochi CPU in particular, but we thought the hardware porn was worth the mention!

A good way to sum up Bulldozer is “slimmed
down, but double wide”. For each traditional core, AMD has instituted a
dual ALU design with robust floating point and SSE units. Each core
can handle two threads, like SMT, but it actually has separate execution
units, each of which processes its own thread without sharing execution
resources.

Each unit features a single fetch and decode stage. The decode
stage consists of four units, but we do not yet know their inner
workings. In the previous K7/K10.5 generations of parts, there were
three complex decode units. On the Intel side, Core 2 and Nehalem
have three simple decode units and a single complex one. AMD also did
not cover subjects such as macro-ops and macro-op fusion. AMD has
beefed up its decode stage significantly, though. It simply had to,
because the decoder is now feeding dual integer schedulers and a floating
point scheduler that feeds 2 x 128-bit FMACs and MMX units.

Fetch, decode, floating point/SSE, and the L2 cache are the
shared components. Since most workloads are integer-based, AMD doubled
the integer units. These 128-bit packed integer pipes are a step above
what was offered in the Phenom II. In theory, there should be a
sizeable per-clock increase in integer and floating point apps on
Bulldozer over the Phenom II. With more heavily threaded workloads,
we should see dramatic improvements in performance. Each integer
core features its own L1 D-cache. AMD has again not clarified how much
L1 or L2 cache there is for each discrete unit, or L3 cache sizes for
the entire processor.
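
To make the shared-versus-dedicated split a little more concrete, here is a minimal Python sketch of the layout described above. It is only an illustration built from the numbers in this article (four modules, two integer cores per module, one thread per integer core). The class and function names are our own invention, not anything AMD has published, and cache sizes are left out because AMD has not disclosed them.

```python
from dataclasses import dataclass, field

# Resources the article lists as shared by both integer cores in a module.
SHARED_PER_MODULE = ("fetch", "decode", "floating point/SSE", "L2 cache")
# Resources the article describes as private to each integer core.
PRIVATE_PER_CORE = ("integer scheduler", "integer execution units", "L1 D-cache")


@dataclass(frozen=True)
class BulldozerModule:
    """Hypothetical model of one module; field names are illustrative."""
    integer_cores: int = 2       # each integer core runs one thread
    fmac_units: int = 2          # the shared FP scheduler feeds 2 FMACs
    fmac_width_bits: int = 128   # each FMAC is 128 bits wide


@dataclass(frozen=True)
class Chip:
    """Hypothetical model of a chip built from identical modules."""
    modules: int
    module: BulldozerModule = field(default_factory=BulldozerModule)

    @property
    def cores(self) -> int:
        return self.modules * self.module.integer_cores

    @property
    def threads(self) -> int:
        # One thread per integer core in this model.
        return self.cores

    def instances(self, resource: str) -> int:
        """How many copies of a resource exist across the whole chip."""
        if resource in SHARED_PER_MODULE:
            return self.modules
        if resource in PRIVATE_PER_CORE:
            return self.cores
        raise ValueError(f"unknown resource: {resource}")


orochi = Chip(modules=4)
print(orochi.cores, orochi.threads)    # 8 8
print(orochi.instances("decode"))      # 4, one decode stage per module
print(orochi.instances("L1 D-cache"))  # 8, one per integer core
```

The point of the model is simply that shared resources scale with the module count while dedicated ones scale with the core count, which is why an 8-core Orochi would carry only four fetch/decode front ends and four floating point units.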