Extreme FORTH

By Stephen Pelc, September 24, 2008

Multicore processors, FORTH programming, and the relationship between software and silicon

Stephen Pelc is the managing director of MicroProcessor Engineering, a provider of hardware, software, and firmware development tools. Stephen confesses to having programmed in DIBOL, Fortran, and Algol-60 amongst other languages.

In the article "Modern Forth," I focused on the impact of modern Forth compiler design on current register-oriented CPUs. In this article, I examine the relationship between software and silicon, and discuss a search for simplicity to improve performance and reduce chip size and power consumption.

For those of you whose software life is based around C and other "Pasgol" languages, shift your perspective of Forth and start thinking of it as a two-stack silicon machine. Forth compilers for conventional CPUs just map this model onto a register-oriented model. We will also see how to map the C virtual machine (VM) onto a two-stack VM.

Chips designed to run Forth well have been produced for more than 20 years, including the Novix NC4000, the Harris/Intersil RTX2000, and Silicon Composers SC32. There has been a flurry of cores for implementation in FPGAs, including MicroCore. Today, the state of the art is the 40-core SEAforth processor from IntellaSys. (Later in this article, I look at the C18 core and interconnects used in the SEAforth chips.) But first, I examine changes to the canonical Forth VM to achieve the goals of performance and code size.

Revisiting the Forth Virtual Machine

We will look again at the Forth and C VMs, see where the Forth VM is weak, and discover how to adjust it to improve execution of both languages. This leads to some understanding of why the IntellaSys C18 core is as it is.

Figure 1

The canonical Forth virtual machine is weak in several areas:

It does not execute C well, which is important for commercial exploitation of general-purpose silicon stack machines. C requires a frame pointer for access to local variables and buffers in main memory. The two stacks are not in addressable memory.

It is weak for DSP operations, which restricts performance in embedded applications without changes to the VM or increased compiler complexity.

Without index operations, dealing with complex data structures is cumbersome, especially when a base address is passed as an argument to a word/function.

DSP operations often require three or four parameters to be manipulated. For example:

source address, destination address and length,

first source address, second source address, destination address and length.

Canonical Forth requires ugly source code to deal with these situations. Several silicon implementations provide index and scratch registers, and others have provided more access to the top of the return stack. Using the top of the return stack as a loop counter has been common for a long time, for example in the FOR ... NEXT loop structure.
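To give a feel for the stack juggling involved, here is a rough Python sketch of a three-parameter copy (source address, destination address, and length) written with only data-stack operations. The function name, the use of a Python list as memory, and the choice to count only OVER and SWAP as shuffles are assumptions made for illustration. It is not taken from any particular Forth system.

```python
def copy_cells_canonical(mem, src, dst, length):
    """Copy `length` cells using only data-stack operations.

    Returns the number of pure stack-shuffling operations (OVER, SWAP)
    executed; these move values around but do no useful work themselves.
    """
    stack = [src, dst]                    # ( src dst )
    shuffles = 0
    for _ in range(length):               # loop count kept off the data stack
        stack.append(stack[-2])           # OVER  ( src dst src )
        shuffles += 1
        stack.append(mem[stack.pop()])    # @     ( src dst x )
        stack.append(stack[-2])           # OVER  ( src dst x dst )
        shuffles += 1
        addr = stack.pop()                # !     ( src dst )
        mem[addr] = stack.pop()
        stack[-1] += 1                    # 1+    ( src dst+1 )
        stack[-1], stack[-2] = stack[-2], stack[-1]   # SWAP ( dst+1 src )
        shuffles += 1
        stack[-1] += 1                    # 1+    ( dst+1 src+1 )
        stack[-1], stack[-2] = stack[-2], stack[-1]   # SWAP ( src+1 dst+1 )
        shuffles += 1
    del stack[-2:]                        # 2DROP ( )
    return shuffles
```

In this model, each cell copied costs four shuffle operations in addition to the fetch, the store, and the two increments.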

The Forth community has long talked about TOS (top of data stack), NOS (next/second on data stack), and TOR (top of return stack). These are not quite enough for DSP operations. Chuck Moore's current silicon includes A and B registers, which are used both as index registers and for scratch storage. Efficient execution of C requires a frame pointer, and a spare index register is always useful. We end up with the model in Figure 2.

Figure 2

The A and B registers are used as scratch locations and for stepping through memory using auto-increment and auto-decrement addressing modes. The X and Y index registers have base+index addressing and can be used as frame and thread-local storage pointers. The X and Y registers are important for general-purpose CPUs, but they are not implemented in the IntellaSys C18 core.
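The register model described above can be sketched as a toy Python class. This is only an illustrative model, not the C18 instruction set: the class name, method names, and the simplified loop (which runs exactly `len` times with the count held on top of the return stack) are all assumptions.

```python
class StackMachine:
    """Toy model of the extended two-stack VM (hypothetical, for illustration)."""

    def __init__(self, memory_size=64):
        self.mem = [0] * memory_size
        self.data = []   # data stack; TOS is the last element
        self.ret = []    # return stack; TOR is the last element
        self.a = 0       # A register: address or scratch
        self.b = 0       # B register: address or scratch
        self.x = 0       # X register: base for base+index addressing
        self.y = 0       # Y register: second base register

    def fetch_a_inc(self):
        """Push mem[A], then auto-increment A."""
        self.data.append(self.mem[self.a])
        self.a += 1

    def store_b_inc(self):
        """Pop TOS into mem[B], then auto-increment B."""
        self.mem[self.b] = self.data.pop()
        self.b += 1


def block_copy(vm):
    """( src dst len -- ) Copy len cells from src to dst."""
    count = vm.data.pop()
    vm.b = vm.data.pop()      # destination address goes to B
    vm.a = vm.data.pop()      # source address goes to A
    vm.ret.append(count)      # loop counter lives in TOR
    while vm.ret[-1] > 0:
        vm.fetch_a_inc()      # no OVER or SWAP needed inside the loop
        vm.store_b_inc()
        vm.ret[-1] -= 1
    vm.ret.pop()              # discard the exhausted counter
```

Once the three parameters have been moved into A, B, and TOR, the loop body is just a fetch and a store, and the data stack is left free for actual data.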

The impact of the A and B registers can be seen in this biquad filter implementation by Gary Bergstrom for a 16-bit embedded system. Gary commented on the previous article about Forth's return stack not getting in the way of parameters:

This has to be one of the most underrated points in Forth. Factoring words in Forth is natural and the lack of return addresses interspersed with the data allows this to be very efficient. In most languages you can't factor to the degree that you can in Forth without having severe run-time speed consequences. You can't keep passing data to lower and lower layers without building new stack frames, with the same data repeated in them, again and again.

In this example, the A and B registers are set up by the words A and B in BIQUAD. These registers are now parameters to the lower layers with no parameter passing overhead. Use of these registers has removed the need for local variables while permitting additional factorisation. They have also considerably reduced stack manipulation in both the source code and the compiled code. Because parameter passing is efficient, what would be inline code in other languages is encapsulated as factors, which in turn reduces code size. The importance of code density will become apparent in the next section.
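The idea of registers acting as implicit parameters to lower-level factors can be illustrated with a small Python sketch. This is not Gary Bergstrom's biquad code. It is a hypothetical four-term dot product, and every name in it (`Machine`, `mac_step`, `mac4`, `dot4`) is invented for the illustration.

```python
class Machine:
    """Minimal model: memory, a data stack, and the A and B registers."""

    def __init__(self, mem):
        self.mem = list(mem)
        self.data = []
        self.a = 0
        self.b = 0


def mac_step(vm):
    """( acc -- acc' ) Add mem[A] * mem[B] to TOS, then step A and B."""
    vm.data[-1] += vm.mem[vm.a] * vm.mem[vm.b]
    vm.a += 1
    vm.b += 1


def mac4(vm):
    """( acc -- acc' ) Four multiply-accumulate steps; takes no address arguments."""
    for _ in range(4):
        mac_step(vm)


def dot4(vm, xs_addr, ys_addr):
    """Top-level word: load A and B once, then let the factors use them."""
    vm.a = xs_addr
    vm.b = ys_addr
    vm.data.append(0)     # initial accumulator on the data stack
    mac4(vm)
    return vm.data.pop()
```

Only the accumulator travels on the data stack. The two addresses are set once at the top level and are visible to every factor below without being passed or copied.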

The X and Y registers above show their worth in larger systems for indexed addressing into structures in memory. They will be used in a conventional Forth system to access local variables and buffers, and to provide a pointer to thread-local storage. One of them will be used as a frame pointer by a C compiler.
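A minimal Python sketch of X used as a frame pointer follows. It assumes, purely for illustration, that frames grow downward from the top of memory, that the caller's X is saved on the return stack, and that locals are addressed as X plus a small offset. The names `FrameMachine`, `enter`, `leave`, and `sum_of_squares` are hypothetical.

```python
class FrameMachine:
    """Toy two-stack machine with an X register used as a frame pointer."""

    def __init__(self, size=32):
        self.mem = [0] * size
        self.data = []    # data stack
        self.ret = []     # return stack
        self.x = size     # frame pointer; frames grow downward from the top

    def enter(self, n_locals):
        """Save the caller's frame pointer and reserve n_locals cells."""
        self.ret.append(self.x)
        self.x -= n_locals

    def leave(self):
        """Release the current frame by restoring the caller's frame pointer."""
        self.x = self.ret.pop()

    def local_fetch(self, offset):
        """Push mem[X + offset] (base+index addressing)."""
        self.data.append(self.mem[self.x + offset])

    def local_store(self, offset):
        """Pop TOS into mem[X + offset]."""
        self.mem[self.x + offset] = self.data.pop()


def sum_of_squares(vm):
    """( p q -- p*p+q*q ) using two frame-resident locals."""
    vm.enter(2)
    vm.local_store(1)                     # q -> local 1
    vm.local_store(0)                     # p -> local 0
    vm.local_fetch(0)
    vm.local_fetch(0)
    vm.data.append(vm.data.pop() * vm.data.pop())   # p*p
    vm.local_fetch(1)
    vm.local_fetch(1)
    vm.data.append(vm.data.pop() * vm.data.pop())   # q*q
    vm.data.append(vm.data.pop() + vm.data.pop())   # sum
    vm.leave()
```

Because the locals live in addressable memory, a C compiler targeting such a machine could also take their addresses, which is not possible for values held only on the hardware stacks.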

These changes to the Forth VM improve code density and performance in Forth. They also permit two-stack machines to run C efficiently. A more in-depth look at this VM will appear in the EuroForth 2008 conference proceedings and on the EuroForth conference website in October 2008.
