Squeezing Forth for more performance with INLINE[ ]

When the Forth language first came onto the world stage it was implemented as something called an "indirect threaded code" language. This is a very clever way to make a language that fit a lot of stuff in a small space. You can see that with the big Forth systems that are running on the TI-99. An interpreter, a compiler, a text editor and a virtual memory disk system typically can fit in 16K using this approach.

The secret to this code density is that every routine is identified by just one address. The secret to making it run quickly is creating a little routine to read those addresses and do the code associated with them as fast as possible. On the TMS9900 that little routine is 3 instructions of assembly language so it's pretty fast. But as we have seen in the benchmarks with GCC and assembly language code there is a price penalty with threaded code that can range from 2.5 to as much as 10X or more in the worst cases.

It turns out that ITC Forth spends about 50% of the time running those three instructions that read the address which sometimes us called the inner interpreter. (as opposed to the "outer" text interpreter)

At the bottom of every Forth system are a pile of little routines written in assembler that do all the real work. They are simple things that read the contents of a memory address or add two numbers together. The code is all there and it is normally just called by the inner interpreter. Each routine ends with a call back to the inner interpreter.

Wouldn't it be handy if we could use all that code the way a conventional compiler does and copy the routines into memory all in row and eliminate the inner interpreter?

This little piece of Forth code does just that. It reads a string of Forth words that are assembly language words and strings the code together as one long routine. This is not as space efficient, but it is about 2.4 times faster!

Here is an example that takes one number on the stack and performs a pile of operations on it.

Very nice, indeed! A version of this would work quite nicely for fbForth 2.0 words in the user dictionary in RAM. However, I am afraid it would not work very well for code (ALC) words in the fbForth 2.0 resident dictionary (RD). Though the cfa and pfa of virtually every word in the RD are in ROM bank 0, their bodies are scattered over all four ROM banks, which would not yield to a linear copy of their machine language.