Intel used to be an ARM architecture licensee until 2006, when it sold its XScale division to Marvell. Intel had grown too large, too defocused, and in turn its core business had suffered. Don’t be confused, the focus wasn’t to be shifted back to desktop, but rather back to x86.

It wouldn’t be until 2008 that Intel would reveal its more focused strategy unto the world: Atom.

Intel's Atom processor core

While ARM and its licensees played off Atom as not being remotely threatening, all of them knew that it was only a matter of time. Publicly they reasserted ARM’s dominance in the market. Four billion ARM chips shipped last year alone. Intel sold on the order of tens of millions of Atoms. But privately, the wheels were in motion.

ARM inked a deal with Globalfoundries, AMD’s manufacturing arm, to bring ARM based SoCs to the fab. This gives ARM the sort of modern manufacturing it needs to compete with Intel. The second thing that’s changed is ARM licensees are now much more eager to talk about their architectures and what makes them special.

ARM offers two licensing arrangements to its partners: a processor license or an architecture license. A processor license allows the partner to take an ARM designed core and implement it in their SoC. An architecture license allows the partner to take an ARM instruction set and use it in their own processor. The former is easier to implement, while the latter allows the licensee the ability to optimize the architecture for its specific needs.

The Palm Pre - Powered by ARM

Companies like Samsung and TI hold ARM processor licenses. The Cortex A8 used in the iPhone 3GS (Samsung) and the Palm Pre (TI) is licensed directly from ARM. Marvell however has been an ARM architecture licensee for the past 5 years.

It’s an ARMADA

Marvell is introducing a fleet of new SoCs (system on a chip) and the brand is called ARMADA. Get it?

Marvell is introducing four series of ARMADA and their target markets are below:

SoC

Market

ARMADA 100 Series

eBook Readers, digital photo frames, portable NAV devices, etc...

ARMADA 500 Series

MIDs, netbooks

ARMADA 600 Series

Smartphones, MIDs

ARMADA 1000 Series

Blu-ray players, TVs, set-top boxes

These are SoCs so they’ve got CPU, GPU, I/O and networking all included on a single chip. The entire ARMADA line is built on TSMC’s 55nm process. The 100 is super low performance, useful in eBook readers, digital photo frames, IP cameras, etc... The 1000 is a multi-core version of the 100 with additional blocks designed for Blu-ray players, digital TVs and HD set-top boxes.

Both the 100 and 1000 are based on Marvell’s Sheeva PJ1 ARM core. This core uses the ARMv5 instruction set like the ARM9 processor, but performance-wise it should be comparable to an ARM11 implementation.

It’s a single issue in-order core with data forwarding support. The core is a hybrid between the original Marvell CPU team and the XScale CPU team that Marvell acquired in 2006. The pipeline depth is between 5 and 8 stages depending on the instruction group.

The core has two separate ALUs (simple single cycle and complex two cycle), a load/store unit and a multiply unit. The ARMv5 instruction set doesn’t explicitly require floating point so there’s a separate coprocessor for all fp operations. Integer SIMD is handled through a separate Wireless MMX2 unit.

Marvell wouldn’t reveal die sizes but indicated that the PJ1 is comparable to ARM11 based designs in both size and power characteristics.

The more interesting SoCs are in the ARMADA 500 and 600 families. They use the Sheeva PJ4 core, Marvell’s answer to the Cortex A8.

The ARM Cortex A8 is an in-order, dual issue microprocessor with a 13-stage integer pipeline clocked at around 600 - 800MHz today. Marvell’s PJ4 core implements the same ARMv7 instruction set, but the architecture is much different. It’s still an in-order, dual issue core but the integer pipeline is 6 - 9 stages depending on the instruction.

The shorter pipeline apparently doesn’t come at the expense of clock speed. Through the use of some custom logic Marvell is able to deliver clock speeds greater than 1GHz.

Both L1 and L2 caches are supported, just like the Cortex A8.

The biggest issue I can see with Marvell’s PJ4 is that it doesn’t support ARM’s NEON SIMDfp instruction set. Marvell argues that Wireless MMX2 penetration is higher than NEON. Given the limited use of Cortex A8 in the market today, I don’t see a lack of NEON compatibility as a major issue for now but it could be one down the road depending on developer uptake.

On paper the PJ4 would appear to have much higher IPC and clock speed than the Cortex A8. Marvell was unwilling to share any power or performance data at this time, so it remains to be seen exactly how well Marvell’s architecture competes in the real world but on paper, at a high level, it looks good.

It's interesting with nvidia, they say their new GPU is fully c++ compliant right? So they bypass the whole instruction set issue by just realizing everything is written in a higher language and then compiled anyway, so if they have a compiler and equal capabilities and their GPU-also-CPU is in PC's anyway.. well it's almost like the real thing.
Reply