Multiprocessor attacks tasks with DSP clusters

San Mateo, Calif. - Cradle Technologies Inc., one of the few surviving proponents of large-scale multiprocessing for media convergence applications, is announcing availability of its first silicon offering. The 3400 chip, comprising eight DSP cores, four proprietary general-purpose RISC cores and a cluster of I/O processing resources, represents the initial, base-level implementation of the Cradle architecture.

The architecture is aimed at videoconferencing, HDTV set-tops, high-resolution scanner/printers and other applications that require intense, stream-driven digital processing combined with modest amounts of control traffic. Rather than taking a conventional approach to these applications, with either a massive DSP chip or an ASIC with application-specific data paths, Cradle attacks the computing problem with clusters of DSP devices on a single chip.

A Cradle DSP cluster includes four multistream processors (MSPs), as the company calls them, that share a modest-sized but fast local memory: 64 kbytes of data memory and 32 kbytes of instruction memory. The memory may be configured as cache or as local scratchpad and is managed by a RISC-based memory transfer engine (MTE), a sort of DMA controller on steroids. Each MSP, in turn, includes a general-purpose RISC core and two DSP cores. Each DSP has its own small local RAM and address-generation capability.

The initial implementation includes one such cluster, although Cradle envisions multiple clusters in future implementations. In addition to the computing cluster, the chip includes an I/O processing cluster with two more general-purpose RISC cores, two MTEs and its own block of program and data memory.
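The core counts follow directly from that hierarchy, and a short tally makes the arithmetic explicit. The sketch below is illustrative only: the class and field names are invented for this example and are not Cradle identifiers, and the counts are simply those quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MSP:
    """One multistream processor, as described in the article."""
    risc_cores: int = 1
    dsp_cores: int = 2


@dataclass(frozen=True)
class ComputeCluster:
    """The DSP cluster: four MSPs sharing local memory (names are hypothetical)."""
    msps: tuple = (MSP(), MSP(), MSP(), MSP())
    data_mem_kbytes: int = 64
    instr_mem_kbytes: int = 32


@dataclass(frozen=True)
class IOCluster:
    """The I/O processing cluster: two RISC cores and two MTEs."""
    risc_cores: int = 2
    mtes: int = 2


def core_totals(compute: ComputeCluster, io: IOCluster) -> dict:
    """Add up the cores across the compute cluster and the I/O cluster."""
    dsp = sum(m.dsp_cores for m in compute.msps)
    compute_risc = sum(m.risc_cores for m in compute.msps)
    return {
        "dsp": dsp,
        "compute_risc": compute_risc,
        "io_risc": io.risc_cores,
        "risc_total": compute_risc + io.risc_cores,
    }


TOTALS = core_totals(ComputeCluster(), IOCluster())
```

With one compute cluster this gives the eight DSP cores and four RISC cores cited for the 3400, plus the two RISC cores that belong to the I/O cluster.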

Chip input and output is meant to be nearly as flexible as the computing resources. General-purpose programmable I/O pins, backed by small FIFOs and control state machines, link to the I/O processing cluster to offer general-purpose I/O for data sources and destinations. These pins can also be used to cascade devices to increase the level of parallelism in computations. Since the pins are implemented as CMOS 3.3-volt I/O pads, they are not suitable for low-voltage signaling but are otherwise flexible. A dedicated SDRAM controller is provided, so this critical interface is not left up to the programmable resources. The SDRAM interface has its own FIFO, with reordering capability to optimize bank accesses.
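Cradle has not described how the SDRAM FIFO reorders requests, so the following is only a generic sketch of the idea: grouping pending accesses that target the same bank and row so that fewer row activations are needed. The `Request` type and both function names are hypothetical, and the sketch ignores read/write ordering hazards that a real controller would have to respect.

```python
from collections import namedtuple

# Hypothetical request record: which SDRAM bank and row an access targets.
Request = namedtuple("Request", "bank row")


def count_row_activations(requests):
    """Count how often a bank must open a row other than the one it has open."""
    open_rows = {}
    activations = 0
    for req in requests:
        if open_rows.get(req.bank) != req.row:
            activations += 1
            open_rows[req.bank] = req.row
    return activations


def reorder_by_open_row(requests):
    """Group requests by (bank, row), keeping first-seen order between groups."""
    groups = {}
    for req in requests:
        groups.setdefault((req.bank, req.row), []).append(req)
    return [req for group in groups.values() for req in group]


PENDING = [Request(0, 1), Request(0, 2), Request(0, 1), Request(1, 5), Request(0, 2)]
REORDERED = reorder_by_open_row(PENDING)
```

In this toy queue the original order needs five row activations, while the grouped order needs three.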

All of these resources (the DSP cluster, the I/O cluster, the SDRAM controller and a collection of semaphores and synchronization devices) are interconnected through a shared global on-chip bus.

Cradle is quick to point out that this is not a general-purpose computing architecture. Nor is it a variant on symmetric multiprocessing or a scaled-down massively parallel supercomputer. It is an architecture specifically intended for applications in which data arrives in ordered streams, requires stream-oriented signal processing without a high frequency of context changes and is readily decomposed into substreams by exploiting inherent parallelism in the data.
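The kind of decomposition Cradle has in mind can be sketched in a few lines. The example below is an assumption-laden illustration, not Cradle code: it splits a block of samples into contiguous substreams, applies an independent point-wise operation to each, and reassembles the result in order. The function names, the gain-and-clip operation and the use of a thread pool as a stand-in for the DSP cores are all choices made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_substreams(samples, n):
    """Split a sequence into n contiguous chunks of nearly equal size."""
    base, extra = divmod(len(samples), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(samples[start:start + size])
        start += size
    return chunks


def scale_and_clip(chunk, gain=2, limit=255):
    """Point-wise operation: each output depends on one input sample only."""
    return [min(limit, s * gain) for s in chunk]


def process_stream(samples, n_workers=8):
    """Process substreams independently, then reassemble them in order."""
    chunks = split_into_substreams(samples, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(scale_and_clip, chunks))
    return [s for chunk in results for s in chunk]
```

Because the operation here is point-wise, the substreams share no state. Filters that look at neighboring samples would need overlapping chunk boundaries, which this sketch does not handle.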

All such architectures thrive or die by the ability of their programming tools to fit real applications into the preconceptions of the architects. Appropriately, Cradle has invested a lot of energy in its development tools. A base set of tools provides C-level programmability for the various engines, based on ANSI C and GNU libraries. The open-source eCos operating system is provided as well. Perhaps more important, the company has developed a cycle-accurate full-chip simulator and an interlocked set of debugging tools that allow developers not only to observe code execution on the engines but also to watch the interaction of the clusters across the global bus.

A second level of software aims the new silicon at two specific application areas. Cradle has assembled royalty-free application building blocks for these two initial target applications: video processing and printer image processing. The blocks include video codecs, a TCP/IP stack, RTP, CCD calibration and image enhancement algorithms.

The chip comes in two versions: the ECE 3400 comes with the video communications software package and the MPE 3400 comes with the printer processing package. In either case, the silicon is the same.

The chip is implemented in a 150-nanometer process and the processor cluster runs on a 220-MHz clock. The global bus clock is set at 300 MHz. At this speed Cradle rates the chip at a peak computational rate of 28 billion 8-by-8-bit multiply-accumulate operations per second (28 gigaMACs). The chip is rated at 3 watts typical at 1.2 volts. The chip, with either software package, is priced under $50 in large quantities, with parts available now.
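A back-of-the-envelope check shows what those ratings would imply per core. The calculation below assumes that all of the peak MAC throughput comes from the eight DSP cores running at the 220-MHz cluster clock, and it divides typical power by peak throughput; neither assumption is stated by Cradle, so the results are rough indications rather than specifications.

```python
# Figures quoted in the article.
PEAK_MACS_PER_SEC = 28e9      # peak 8-by-8-bit MACs per second
DSP_CORES = 8                 # two DSP cores in each of four MSPs
CLUSTER_CLOCK_HZ = 220e6      # processor cluster clock
TYPICAL_POWER_W = 3.0         # typical power at 1.2 volts


def macs_per_dsp_per_cycle(peak, cores, clock_hz):
    """Implied MACs each DSP core must complete per clock cycle (assumption:
    the DSP cores alone deliver the peak rate at the cluster clock)."""
    return peak / (cores * clock_hz)


def picojoules_per_mac(power_w, peak):
    """Energy per MAC if typical power coincided with peak throughput."""
    return power_w / peak * 1e12


MACS_PER_CYCLE = macs_per_dsp_per_cycle(PEAK_MACS_PER_SEC, DSP_CORES, CLUSTER_CLOCK_HZ)
PJ_PER_MAC = picojoules_per_mac(TYPICAL_POWER_W, PEAK_MACS_PER_SEC)
```

Under those assumptions each DSP core would need to sustain about 16 narrow MACs per cycle, at roughly 107 picojoules per MAC.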