Intel, Microsoft describe parallel progress

SAN FRANCISCO, Calif. -- Intel and Microsoft are taking small steps on the long road to creating a new parallel programming model for tomorrow's multicore processors. The companies discussed their progress in separate presentations at the Intel Developer Forum here.

Microsoft talked about its vision for adding new layers to its system software stack as well as point extensions it is making to its .Net environment. Intel discussed new planned extensions to the x86 instruction set and showed progress on Ct, extensions to the C++ language aimed at supporting greater parallelism.

From the advent of computing, software got a free ride as Moore's Law drove serial processors to greater performance levels. But growing problems of power leakage in microprocessors have driven a shift to putting more cores on a die, forcing a historic transition to a parallel programming model yet to be invented, said David Callahan, who leads Microsoft's Parallel Computing Initiative, announced late last year.

Microsoft and Intel are backing various academic research initiatives to help plow the way forward. At IDF, they shared some of the progress from their internal corporate teams.

As if this job was not ambitious enough, Microsoft hopes to use the parallel shift to enable advances in computer interfaces.

"This is really about a new set of natural and immersive experiences we want to deliver," said Callahan. "The parallel computing shift is just a sort of accident along the way."

The underlying software plumbing needs an overhaul before such work can begin. Callahan said tomorrow's system software will be much more layered into separate elements including new runtime environments that sit in a user space below application libraries and above hypervisors and the core operating system kernel.

The runtime environments will act as schedulers, working cooperatively with hypervisors that map virtual to physical resources and OSes that manage access to physical hardware. "This represents a re-factoring of traditional OS services," said Callahan.

The aim is to better handle the growing number of competing requests in multicore environments. Even today's PCs host a "terrifying number" of processes running in parallel, creating sequential-processing bottlenecks and losses in data locality, he said.

Microsoft will expose its runtime layer to third parties including Intel because it expects there will be a need for many kinds of interoperable software abstractions from different vendors to serve different application types. Tomorrow's software also calls for improved techniques in cooperative scheduling, better thread-level performance and enhanced message passing.
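Callahan's picture of runtime layers acting as cooperative schedulers can be sketched in miniature. The toy below is my own illustration, not Microsoft's design: Python generators stand in for tasks, and each `yield` is a voluntary preemption point at which a task hands control back to the scheduler rather than being forcibly preempted.

```python
from collections import deque

def scheduler(tasks):
    """Round-robin cooperative scheduler: runs each task until its
    next voluntary yield point, then moves on to the next task."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task to its next yield
            ready.append(task)  # still alive: requeue it
        except StopIteration:
            pass                # task finished; drop it

def worker(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield  # cooperative yield: hand control back to the scheduler

log = []
scheduler([worker("A", 2, log), worker("B", 2, log)])
print(log)  # tasks interleave: [('A', 0), ('B', 0), ('A', 1), ('B', 1)]
```

The key property, as in the runtime layer Callahan describes, is that scheduling policy lives in user space and the tasks cooperate with it instead of fighting it.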

"There are a deep set of changes before you can even get to rebuilding libraries and rewriting apps," said Callahan.

"This is an ambitious shift, and this is just their first cut at it," said Michael McCool, chief scientist at RapidMind (Waterloo, Ontario) which sells parallel programming tools for the x86 and other processors.

"Initially they have done some of the obvious things supporting parallel tasks, but I haven't seen anything about efforts to abstract data," McCool added.

Tomorrow's parallel programming model will need new categories for sorting data so it can be marshaled into appropriate locations in cache at the right time, said McCool. He noted that Intel's latest high-speed processor interconnect significantly reduces latency, but if the wrong data appears in cache, latency can shoot up dramatically.
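McCool's locality point can be demonstrated with an ordinary nested array. Both traversals below compute the same sum, but the row-major one visits elements in the order they sit in memory, while the column-major one strides across rows, repeatedly pulling the "wrong data" toward the processor. (The example is mine, not McCool's; in CPython the cache effect is muted by interpreter overhead, but the access-pattern principle is the one he describes.)

```python
import math
import random

N = 300
matrix = [[random.random() for _ in range(N)] for _ in range(N)]

def sum_row_major(m):
    # Consecutive accesses walk along each row: friendly to cache lines.
    return sum(m[i][j] for i in range(len(m)) for j in range(len(m[0])))

def sum_col_major(m):
    # Each access jumps to a different row: poor spatial locality.
    return sum(m[i][j] for j in range(len(m[0])) for i in range(len(m)))

# Same data, same result -- only the traversal order differs.
assert math.isclose(sum_row_major(matrix), sum_col_major(matrix),
                    rel_tol=1e-9)
```

In a compiled language on large arrays, the column-major version can run several times slower for exactly the cache-miss reason McCool cites.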

Parallel is "Distributed". No new magic escept more "Cores" on a device. Our design has two 32-Bit Processors on a single FPGA from Actel - the AX2000. The on-chip memory is used for 1024 Instructions and 1024 Operands - Harvard Architecture. Our Operating System is called FOPS [File Oriented Programming System). Each basic function of the Language has an ALP76 assigned. Each processing statement has several calls to various functions. These are distributed [parallel] and is possible to execute the entire statement in one "fell swoop" as we used to call it. However if there is a function waiting for a "call"; a DDP bit says wait for interrupt. Any processor not activated is sound asleep with no clocks applied. We are not heavy with registers in our pipeline - only four, but they can be executing in parallel - and have transparent Jump, Jump and Store Return [Call], and Conditional Jumps. Both of our processors; the ALP76 [32-Bit] and ERIN76 [64-Bit] use Schematic Design vs HDL. Decided to stay with the old school here and one of eight is some sort of Jump anyway. Questions - ask at---
Richard E. Hartney, President
Erin Greene & Associates LLC
Email: rhartney1@cox.net

Parallel processing is just another way of saying "distributed processing." We are currently dissecting a software operating system called FOPS (File Oriented Programming System) that was designed in 1972. At the same time, to keep costs to a minimum, we have designed a computer called ALP76 (A Language Processor), an emulation of the Honeywell DDP516 (which had one accumulator, one B-register and one index register) expanded to 32 bits. Most present-day engineers don't remember why registers were added: a hardware stack is, and was, faster than core memory. New instructions have been added to complement data handling, such as Clear/Set/Test Bit and Load/Store Byte. Our task is rather simple, as FOPS currently has only 9,976 instructions. (I counted them!) Subroutine lengths are less than 512 instructions. We are targeting the Actel AX2000-1 FPGA, containing two processors using on-chip memory for instructions and operands in a modified Harvard architecture. It is software-programmable with any combination of series, parallel or orthogonal instruction execution, defined during boot power-on load. We have not decided whether to make this feature programmable on the fly; gut feel says yes. Each of the FOPS operators shown below has its own processor assigned. Punctuation marks are grouped.

10. THE FOPS LEXICON

The FOPS language consists of vocabulary words, variable markers and punctuation marks, which are used to form legal strings called statements for execution by the system. The language is file-, form-, page- and part-oriented as used by the application programmer. The system programmer uses an assembler, not a compiler, for the entire system; compilers are, to my mind, very inefficient. The hardware uses schematic input, not HDL, which, again, to my mind is not very efficient.
COMMANDS: PRINT, PRIORITY, DISPLAY, DEMAND, SET, DELETE, COMPILE, WAIT, TO, DO, DONE, SEND, RECEIVE, (Reserved)

OPERATORS: FILE, PAGE, PART, FORM, + (plus), - (minus), = (equals), => (greater), <= (less), READER, LIGHTPEN, STEP, VIDEO, & (concatenation operator)

MODIFIERS: IN FORM, FOR, IF, ON, ENQUE, SYSTEM

VARIABLES: LP1, LP2, LP3, LP4, TIME, CLOCK, BLINK, UNBLINK, $XX$ (1), T1, T2

VARIABLE MARKERS:
?X? - literal marker
- value marker
[X] - numerical position marker

PUNCTUATION MARKS:
/ - step marker
: - range-of-values marker
; - statement terminator
, - list delimiter
@ - keyboard input terminator
. (center dot) - subfile marker
* - comment marker

(1) All 4-character variables starting and ending with $ are reserved for error handling and initialization.
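As a reader's sketch of how a lexicon like the one above might drive a scanner: the classifier below is my own guess at the token categories, and the sample statement is invented for illustration, not taken from any FOPS documentation.

```python
# Hypothetical token classifier based on the FOPS lexicon posted above.
COMMANDS = {"PRINT", "PRIORITY", "DISPLAY", "DEMAND", "SET", "DELETE",
            "COMPILE", "WAIT", "TO", "DO", "DONE", "SEND", "RECEIVE"}
OPERATORS = {"FILE", "PAGE", "PART", "FORM", "READER", "LIGHTPEN",
             "STEP", "VIDEO", "+", "-", "=", "=>", "<=", "&"}
MODIFIERS = {"IN FORM", "FOR", "IF", "ON", "ENQUE", "SYSTEM"}
PUNCTUATION = {"/": "step marker", ":": "range marker",
               ";": "statement terminator", ",": "list delimiter",
               "@": "keyboard input terminator", "*": "comment marker"}

def classify(token):
    """Return the lexical category of a single (pre-split) token."""
    if token in COMMANDS:
        return "command"
    if token in OPERATORS:
        return "operator"
    if token in MODIFIERS:
        return "modifier"
    if token in PUNCTUATION:
        return "punctuation (%s)" % PUNCTUATION[token]
    if token.startswith("?") and token.endswith("?"):
        return "literal marker"
    return "unknown"

# Invented sample statement; token boundaries are guesses.
tokens = ["PRINT", "FILE", "IN FORM", "?X?", ";"]
print([(t, classify(t)) for t in tokens])
```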
I, too, will accept comments, and thanks for accepting mine.
Richard E. Hartney, President
Erin Greene & Associates LLC
Business Information Management Systems
Email: rhartney1@cox.net

My hunch is that for a parallel programming model to get broad adoption, it will have to be a relatively modest change. Learning a new language, or substantially restructuring an existing app with millions of lines of code, is too high a barrier to overcome on the timescales involved here. 8-core processors are around the corner, after all.
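That modest-change argument is what OpenMP-style incremental parallelism banks on: the serial code survives almost intact. A Python analogue of the same idea (my own illustration, not from the e-book below):

```python
# Incremental parallelism: the serial loop and its parallel replacement
# share the same per-item function, so the restructuring is minimal.
from concurrent.futures import ThreadPoolExecutor

def work(n):
    return n * n  # stand-in for an expensive per-item computation

data = range(8)

# Serial version: the code you already have.
serial = [work(n) for n in data]

# Parallel version: same logic, one structural change.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(work, data))

assert serial == parallel  # identical results, modest rewrite
```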
(Here's an e-Book on multicore programming - covers various programming models including OpenMP, Intel's TBB, Pthreads, Cilk++, MPI.
http://www.cilk.com/multicore-e-book)