From 2009 to 2020: A history of developments in programmability

Predictions come easy, but from the demise of the FPGA to the emergence of 32-core processors, DSPs are showing some strong trends, and I think it is possible to divine what will happen in the next few years as we move towards the next order of magnitude increase in computational efficiency.

Editor's note: This is the second of a multi-part 2020 Vision series outlining what the future may hold, as viewed by technologists within Texas Instruments. Click here for part 1.

Predicting the future is primarily an act of the imagination. However, digital signal processors are showing some strong trends and I think it is possible to predict what will happen in the next few years as we move towards the next order of magnitude increase in computational efficiency.

Here are my thoughts on the next 12 years.

2009: Multicore is here. With the spread of SoC architectures, single-core CPU devices have become the exception rather than the rule.

2012: Network-on-Chip (NoC) arrives. A NoC is a high-performance device that is really a grouping of processing islands connected by packet-based, point-to-point asynchronous communication highways.
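
To make the packet-based model concrete, here is a toy C sketch of point-to-point messaging between two processing islands. The noc_packet_t layout and the noc_send/noc_recv primitives are invented for this sketch; the one-slot mailbox merely stands in for a real asynchronous routing fabric.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Toy NoC packet: a destination island, a payload length, and the
     * payload itself, routed by the fabric between islands. */
    typedef struct {
        uint16_t dest;
        uint16_t len;
        uint8_t  payload[32];
    } noc_packet_t;

    /* One point-to-point link modeled as a single-slot mailbox; a real
     * fabric would route packets asynchronously among many islands. */
    static noc_packet_t mailbox;

    static void noc_send(const noc_packet_t *pkt) { mailbox = *pkt; }
    static void noc_recv(noc_packet_t *pkt) { *pkt = mailbox; }

    int main(void)
    {
        noc_packet_t out = { .dest = 3, .len = 5 };
        memcpy(out.payload, "hello", 5);
        noc_send(&out);   /* island 0 puts a packet on the fabric */

        noc_packet_t in;
        noc_recv(&in);    /* island 3 pulls the packet off again  */
        printf("island %u received %u bytes\n",
               (unsigned)in.dest, (unsigned)in.len);
        return 0;
    }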

2010"2015: Component-based software. The number of cores on a device is still fairly modest, and individual software components are developed for a single computational cluster by "component developers" and then "assembled" onto a multi-core system. Development tools for this methodology improve steadily as virtualization of hardware through middleware is driven by efforts such as the SCA (Software Communications Architecture) for SDR (software-defined radio). Auto generation of glue code between components becomes the norm.

2015"2020: Single program multiple data (SPMD). The component-based approach begins to fail as the number of cores reaches 32. Turning to techniques used in high-performance computing (HPC), the embedded software community develops the SPMD approach where a program can be compiled to run over multiple cores. While initially requiring explicit description of the communication flow, pragmas are now employed to enable the parallel nature of algorithms to be exploited by a variety of multi-core devices.

2015: The Death of the FPGA. An important footnote in the history of programmability is the demise of the FPGA. Small multicore CPUs consume significantly less power and provide a richer set of mapping options for complex algorithms and communication patterns than the distributed fabric of ALUs and LUTs that makes up an FPGA.

2020: The CPU disappears. Spreading functionality across multiple CPUs drastically simplifies the silicon overhead on each CPU, and hardware-based OS support manages NoC traffic efficiently. Programmers are unaware of the communication between CPUs and can develop and debug code without having to know which individual execution units are involved. Programming follows the overall flow of data rather than its individual parts.

The range of devices available in 2020 will be about the same as it is in 2009. In 2020, embedded DSPs will still be a heterogeneous combination of CPUs and accelerators. Even though programmers are unaware of the individual devices when programming, it will still be true that some devices perform certain tasks much better than others.

Since much of the value of an SoC lies in the careful choice of peripherals, CPU and DSP manufacturers differentiate themselves by providing the best combination of IP blocks and the best ways of connecting them. In the end, the quality of development tools and application software support will determine the first-tier players.

About the author
Alan Gatherer is the CTO for the High Performance Multicore Processors group at TI and is responsible for all strategic development of TI's digital baseband modems for 3G wireless infrastructure. Since joining TI in 1993, he has worked on various digital modem technologies including cable modem, ADSL and 3G handset and basestation modems. In addition, he holds 60 patents and is author of the book "The Application of Programmable DSPs in Mobile Communications."

The following are some of the missteps taken in the design of computer architectures that need to be undone:
1. Adoption of the Von Neumann Architecture.
This led to the following problems:
a. The use of machine code as the means of programming, rather than patching together the hardware in real time using patch cords (somewhat like patching calls manually at a manual telephone exchange).
b. This soon gave rise, despite Von Neumann's vehement opposition to "wasting" valuable computer time on noncomputational tasks, to the development of programs that could translate programs composed on punched cards or paper tape (then translated to ASCII or EBCDIC code) from assembler mnemonics into object (machine) code.
c. This led to freezing of the hardware architecture in the interests of forward compatibility of the hardware with all the advances then being made in software design:
i. Development of program "Monitors" for supervising the running of jobs
ii. Development of Autocoder programs in the US that converted assembly language to object (machine) code.
iii. Development of Autocode programs in the UK that converted "high level" languages to machine code.
iv. The development of languages such as FORTRAN, COBOL, and ALGOL that allowed the computer to be programmed in these languages instead of the painful lower-level assembly language or machine code.
Here are the problems associated with these developments:
Since hardware design was beyond the ken of these new software developers, they got into the habit of encapsulating the hardware in cocoons of software that hid its hardness from sight and swept it under the carpet: machine language was hidden by assemblers, assembly language by high-level-language compilers, and so on. All of these remained vestigial artifacts of computation, and, being software designers, they did not try to get rid of them.
v. Hardware innovation was limited to making the Von Neumann architecture faster. This led to skyrocketing complexity while the essential architecture stagnated, and to a band-aid approach to speed improvement: pipelining, branch prediction and instruction prefetching, data and instruction caches, translation lookaside buffers, virtual memory hierarchies, loop unrolling, register renaming, and on and on, giving smaller and smaller benefits with greater and greater complexity.
vi. The invention of pointers and data structures, all based on binary objects stored in memory. A slight error in pointer arithmetic could start code execution from a place not corresponding to an instruction boundary, or displace a data access from its intended frame of reference, leading to program runaway; a sketch of such a slip follows this list. At the same time, reuse of the hardware ensured that debugging was difficult, as all the evidence of what was happening was continually being overwritten.
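
As a concrete illustration of the pointer hazard in item vi, here is a minimal C sketch, invented purely for illustration, in which an off-by-one loop bound walks a pointer past the end of its buffer:

    #include <stdio.h>

    int main(void)
    {
        int samples[4] = {10, 20, 30, 40};
        int *p = samples;

        /* Bug: the loop bound should be 4, not 5. The fifth iteration
         * dereferences one element past the end of the array, which is
         * undefined behavior. A write through such a stray pointer could
         * corrupt neighboring code or data, and execution could "run
         * away" from any intended instruction boundary. */
        for (int i = 0; i < 5; i++)
            printf("sample %d = %d\n", i, *(p + i));

        return 0;
    }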