Newisys has developed a chip that will allow the Opteron processor from Advanced Micro Devices to compete in the upper reaches of the server market.

Horus, the name of the Newisys chip, will let computer makers put together servers containing more than eight processors. Newisys also is developing Horus-based servers containing eight, 16 and 32 processors, it revealed at the Hot Chips conference, which took place at Stanford University this week.

The ability to create servers with large numbers of chips will enable AMD to enter the market for large, multiprocessor computers. Not many of these computers get sold every year, but they carry a hefty price tag. Pricing for a 32-processor Superdome server from Hewlett-Packard starts at just below $93,000. Participating in this market also comes with a whiff of prestige.

Currently, no more than eight Opteron chips can be linked together through HyperTransport links to make a single server. Most manufacturers, however, don't make eight-way Opteron servers, opting instead to sell one-, two- or four-processor machines.

Some supercomputers are built with Opteron chips, such as Red Storm, but these generally consist of two- and four-processor boxes lashed together in a giant cluster.

The processor ceiling on Opteron exists because of the problem of cache coherency, said Nathan Brookwood, an analyst at Insight 64. Data is stored in main memory as well as in a pool of memory embedded in each chip, called a cache. After a chip fetches data from memory, it needs to check the caches of all the other chips in the computer to ensure that another chip isn't holding a more recent copy of that data. In Opteron computers, the fetching chip sends a signal to each of the other chips to ensure coherency.

"The problem is: How do you make sure that the most recent value is in play?" Brookwood said. The cross traffic isn't much of a problem in two- and four-processor servers, or standard clusters of four-processor boxes, but it starts to become a problem in eight-processor boxes.
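The arithmetic behind that cross traffic can be sketched in a few lines of Python. This is a simplified illustration, not a description of the actual Opteron protocol: it assumes each memory fetch triggers exactly one signal to every other chip, and the function names are made up for the example.

```python
def probes_per_fetch(num_chips: int) -> int:
    """Signals one chip sends to check every other chip's cache (assumed: one each)."""
    if num_chips < 1:
        raise ValueError("a server needs at least one chip")
    return num_chips - 1


def probes_if_every_chip_fetches_once(num_chips: int) -> int:
    """Total signals on the links if each chip performs a single fetch."""
    return num_chips * probes_per_fetch(num_chips)


if __name__ == "__main__":
    # Server sizes mentioned in the article: two-, four- and eight-processor boxes.
    for n in (2, 4, 8):
        print(n, probes_per_fetch(n), probes_if_every_chip_fetches_once(n))
```

Under that assumption, the total grows roughly with the square of the processor count: 2 signals for a two-way box, 12 for a four-way and 56 for an eight-way.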

Horus addresses the problem by acting as a cache monitor for the four processors on a given board. In a 16-processor Opteron server, for example, cache coherency would be accomplished by cross traffic between the four Horus chips assigned to the four quads of processors.
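A rough Python sketch shows why grouping processors into quads could cut the traffic. It rests on assumptions that go beyond what Newisys disclosed: that a fetching chip signals its three quad-mates plus its local Horus, that the local Horus then sends one message to each other Horus, and that a remote Horus can answer without signaling the processors in its own quad. The function names are hypothetical.

```python
QUAD_SIZE = 4  # from the article: one Horus chip monitors four processors


def flat_broadcast_signals(num_chips: int) -> int:
    """Hypothetical baseline: the fetching chip signals every other chip directly."""
    return num_chips - 1


def quad_monitor_signals(num_chips: int) -> int:
    """Messages per fetch under the assumed quad-plus-monitor scheme."""
    if num_chips < QUAD_SIZE or num_chips % QUAD_SIZE != 0:
        raise ValueError("chip count must be a positive multiple of the quad size")
    quads = num_chips // QUAD_SIZE
    local_peers = QUAD_SIZE - 1   # the other processors in the same quad
    to_local_monitor = 1          # one message to the quad's own monitor chip
    between_monitors = quads - 1  # assumed: one message to each other monitor
    return local_peers + to_local_monitor + between_monitors


if __name__ == "__main__":
    # Configurations above the eight-processor ceiling mentioned in the article.
    for n in (16, 32):
        print(n, flat_broadcast_signals(n), quad_monitor_signals(n))
```

In this model a fetch in a 16-processor server costs 7 messages instead of 15, and in a 32-processor server 11 instead of 31. The real savings depend on details of the Horus design that the article does not cover.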