Is Intel’s ‘Penryn’ Chip Hiding Graphics Support?

This site may earn affiliate commissions from the links on this page. Terms of use.

It might not make much difference to the end user, but a tiny instruction in Intel’s upcoming SSE-4 instruction set included in the “Penryn” processor might be the first step on Intel’s road toward enhancing its CPUs with graphics capabilities.

Much of the discussion of “Penryn” (Intel’s first 45-nm CPU, probably due by the end of the year) here at the Microprocessor Forum rehashed details Intel had previously disclosed, such as the addition of a Deep Power Down state and the inclusion of new SSE-4 multimedia instructions.

One of those instructions, a “streaming load instruction,” might be Intel’s answer to the general-purpose GPU (GPGPU) proposals being put forth by AMD’s ATI division as well as Nvidia, Intel executives said. AMD, meanwhile, has put forward a strategy to bring the graphics processor and general purpose CPU together by 2009, a strategy it calls “Fusion”.

The instruction, part of Intel’s new SSE-4 instruction set, gives special priority to graphics data, allowing it to bypass the normal CPU cache.

Put simply, Intel promises that Penryn will be a dual-core family with a smaller, 107-sq.-mm die, an additional 2 Mbytes of L2 cache for a total of 6 Mbytes, 47 new SSE-4 instructions, enhanced Dynamic Acceleration technology, and other microarchitectural enhancements to improve instructions-per-clock (IPC) performance. The Penryn family will also include quad-core products with about 12 Mbytes of cache.

Intel described each of these technologies in March, although executives added a few more details on each.

To briefly recap:

Deep Power Down technology: Each processor core contains a voltage regulator sensor that monitors the CPUs. Upon receiving what’s known as an “MWAIT Level 6” request, the CPU flushes the level-1 cache and saves its state, then the level-2 caches. The chip checks that there isn’t any inbound clock or DMA traffic, then enters the “leakage off” state.

Dynamic Acceleration Technology: When a dual-core chip encounters a single-threaded application, the other core sits idle. In that case, the first core can enter a “frequency boost” state where the clock speed is ramped up beyond its rated speed, or overclocked. The core remains in the accelerated state for a “thermally significant” amount of time, making sure that the chip isn’t damaged by increasing the clock frequency.

VT-x: Intel’s hardware support for virtualization, which is built around the virtual machine control structure (VMCS). When a virtual machine is run on a Penryn-class chip, the hardware hides the entry/exit virtualization commands from the software, accelerating the context switches by 25 to 75 percent, according to Intel.

SSE-4: While the specific instructions will primarily be of interest to developers, Intel highlighted four areas where the SSE-4 instructions will be useful: dot products, for 3D content creation; motion estimation; finding the best sum-of-absolute-differences result, a “branchy” operation that usually requires several lines of code; and the streaming load instruction. The architecture also includes a “super shuffle” engine, used to process SSE data formatting more efficiently, and a radix-16 divider that’s roughly twice as fast as the divider in the previous architecture. The improved motion estimation uses the performance of the CPU to look for motion across the bulk of the image, not just on a per-pixel basis.
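To make the motion-estimation item concrete, here is a minimal scalar sketch in C of the underlying computation: the sum of absolute differences (SAD) between a 4x4 block and several candidate positions in a reference row, followed by a search for the smallest result. It is an illustration only, not Intel’s code. The names sad_4x4, sad_match and best_sad_offset are hypothetical, and the 4x4 block size is an assumption. SSE4.1 is generally documented as adding instructions aimed at this pattern (MPSADBW for the sums, PHMINPOSUW for the minimum), which would replace the loops and the compare-and-branch below.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between two 4x4 pixel blocks.
   Each block has its own stride (bytes per image row). */
static unsigned sad_4x4(const uint8_t *a, size_t a_stride,
                        const uint8_t *b, size_t b_stride)
{
    unsigned sum = 0;
    for (size_t y = 0; y < 4; y++) {
        for (size_t x = 0; x < 4; x++) {
            int diff = (int)a[y * a_stride + x] - (int)b[y * b_stride + x];
            sum += (unsigned)abs(diff);
        }
    }
    return sum;
}

/* Hypothetical result type: horizontal offset of the best match and its SAD. */
typedef struct {
    size_t offset;
    unsigned sad;
} sad_match;

/* Try num_offsets horizontal positions in the reference image and keep the
   one with the lowest SAD. The comparison inside the loop is the "branchy"
   part of the search. The reference rows must hold at least
   num_offsets + 3 pixels. */
static sad_match best_sad_offset(const uint8_t *cur, size_t cur_stride,
                                 const uint8_t *ref, size_t ref_stride,
                                 size_t num_offsets)
{
    sad_match best = { 0, UINT_MAX };
    for (size_t off = 0; off < num_offsets; off++) {
        unsigned sad = sad_4x4(cur, cur_stride, ref + off, ref_stride);
        if (sad < best.sad) {
            best.offset = off;
            best.sad = sad;
        }
    }
    return best;
}
```

A video encoder would run a search like this for every block of every frame, which is why folding the inner loops into a handful of instructions matters.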

The streaming load instruction is a 16-byte aligned load instruction. But interestingly, the results are held in a temporary stream buffer that bypasses the normal cache hierarchy, a high-priority expressway that other data types haven’t received. Intel identified the streaming-load instruction as ideal for GPU-CPU sharing, as well as imaging.
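For developers, the streaming load described above is, as far as public SSE4.1 documentation goes, the MOVNTDQA instruction, exposed to C compilers as the _mm_stream_load_si128 intrinsic. The sketch below shows how a copy loop might use it. The function name stream_copy is hypothetical, and the code falls back to memcpy when the compiler is not targeting SSE4.1. One caveat worth hedging: the cache-bypassing behavior is documented as applying to write-combining memory, such as a mapped graphics buffer. On ordinary memory, the instruction is expected to act like a regular aligned load, so this example demonstrates the usage rather than the speedup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>  /* SSE4.1 intrinsics, including _mm_stream_load_si128 */
#endif

/* Copy 'bytes' bytes from src to dst, 16 bytes at a time.
   Both pointers must be 16-byte aligned and 'bytes' must be a multiple
   of 16, matching the alignment requirement of the streaming load. */
static void stream_copy(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst % 16) == 0);
    assert(((uintptr_t)src % 16) == 0);
    assert(bytes % 16 == 0);

#if defined(__SSE4_1__)
    __m128i *d = (__m128i *)dst;
    /* The cast drops const because some compiler headers declare the
       intrinsic with a non-const pointer parameter. */
    __m128i *s = (__m128i *)src;
    for (size_t i = 0; i < bytes / 16; i++) {
        __m128i v = _mm_stream_load_si128(s + i);  /* streaming load */
        _mm_store_si128(d + i, v);                 /* ordinary aligned store */
    }
#else
    /* Fallback for builds without SSE4.1 enabled. */
    memcpy(dst, src, bytes);
#endif
}
```

With GCC or Clang, the intrinsic path is compiled in only when SSE4.1 is enabled, for example with -msse4.1.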

“This is an interesting instruction, as it opens the door to new areas of collaboration between CPU and the GPU,” said Stephen Fischer, the lead Penryn architect at Intel, during a presentation here. The instruction improves the read buffer from the GPU to the CPU by a factor of eight, he said.

When asked at a lunch panel whether the instruction was a response to AMD’s “Fusion,” Fischer replied, “I could see where people would say that.”

At this point, very little can be said about Intel’s strategy of merging graphics and general-purpose CPU cores. For years, Intel has pushed its integrated chipsets, combining graphics and general-purpose CPU core logic, as the most cost-effective solution for mainstream PCs. The company is also rumored to be planning a re-entry into the standalone graphics market. Meanwhile, companies like Nvidia and ATI have begun positioning their graphics chips as arrays of CPU cores that were designed for graphics but can also be applied to a very limited subset of general-purpose computing.

Kevin Krewell, a former In-Stat analyst now working in technical marketing for Nvidia, declined to comment specifically on the new instruction. “Any enhancement to the x86 architecture is good for us, as it enhances the interfaces between the CPU and the GPU,” he said.

The problems that the GPU providers are trying to solve have already been solved by the CPU, Fischer said. “It helps potentially tip the balance toward the CPU,” he said.

Nathan Brookwood, an analyst with Insight64, cautioned that the impact of any of the new instructions won’t be felt for at least a year. “It might benefit ‘Nehalem,’” he said. “Intel will probably cite some benchmarks that show an improvement. But for the typical end user, it won’t have much immediate impact.”
