Meta

64-bit FORTH

Before I describe a 64-bit FORTH version, I need to explain something about the more-established, general 32-bit version of this low-level language. The 32-bit FORTH had an accepted method of storing 64-bit, so-called ‘double-width’ numbers, in two positions on the stack, with the most-significant word ‘on top’, and the less-significant word in second position from the ‘top’. Correspondingly, ‘normal’ 32-bit FORTH possesses special operators that can either perform full, double-width arithmetic, which treats two consecutive stack-positions as defining a single number, or mixed-width arithmetic, in which two single-width numbers can lead to a double-width product, or by which a double-width number can be divided by a single-width, to arrive at a single-width quotient, and optionally, also to arrive at a single-width modulus / remainder.

This is a fashion in which 32-bit CPUs have generally been able to perform 64-bit arithmetic, partially. And if the reader is not familiar with how this is accessible under FORTH, I can suggest This External Article as a source of reference.

But, if the reader has installed the 64-bit GNU FORTH, which is also just called ‘gforth’ under Linux, then I should call to his attention, that now, each stack-position is capable of holding a 64-bit number, and that all the operators on those numbers are possible, which would otherwise be available for 32-bit numbers, with no special naming.

The following is a small text-session-clip, that illustrates how this works:

So the ‘/mod’, the ‘lshift’ and the ‘+’ operators are spelled exactly as they would have been for 32-bit FORTH, but operate on potential 64-bit numbers. ‘gforth’ still preserves the double-width operators with the special naming, but in this case, double-width actually means 128-bit ! Its implementation of the standard Fetch operator, which is still named ‘@’, now fetches a 64-bit value from RAM. And I have already documented this slight incompatibility in This Earlier Posting.

If we can assume that our source-code is to be compiled on 64-bit FORTH, we can just perform 64-bit operations on single stack-positions, at will.

It should also be noted that in FORTH comments and stack-traces, the topmost stack-position is written on the right-hand side of the textual list. The apparent negative number above, in the second stack position from the logical top, after ‘ m* .s ‘ , is the result of the most-significant bit of that word being a (1) and not a (0). By convention, in signed integers, this will trigger that a negative number is meant, using two’s complement. And this is still the case, in hexadecimal. But, because this word is the less-significant of the two listed, its most-significant bit will no longer be the most-significant, after it has been combined with the other word, thereby again forming a positive number when printed out as a single 128-bit, signed, integer.

(Edit 07/31/2017 : )

One fact which I have blatantly ignored in my own coding, was that the way in which I chose to separate a single numeric value into two bit-fields – through a modulus-division – is not the most efficient in terms of how many CPU-clock-cycles it consumes. A preferable way to do the same thing, is by using ‘rshift’, and then masking.

The reason for this is the fact that when a CPU is instructed to left-shift or right-shift a binary register-content, doing so takes up about 1 clock-cycle per bit-position shifted. What people may not realize, is that although addition and subtraction can easily be performed in one step by logic-circuits, multiplication and division may not be, assuming a generic, general-purpose CPU. To multiply two 64-bit numbers, actually means to perform 64 additions optionally, each depending on the value of one bit. And to divide a 64-bit number by another, actually means to perform 64 subtractions optionally, each depending on the outcome of a comparison. Maybe for 32-bit or 16-bit registers, we don’t care. But by the time we’re using 64-bit numbers, it penalizes our CPU-load twice as much.

What some people might therefore ask would be, ‘Why does Dirk use this simplistic modulus-division?’ And the answer is just that: It’s easier to write, and consumes fewer words or instructions. Hence, code that has been optimized for size, has not been optimized for speed, and vice-versa.

But as I wrote, the second form will execute several times as fast as the first, while also being the longer to write – or to store as part of a Dictionary.

A type of statement which might roll off some people’s tongues easily, is that a compiler – or a programming language – has been ‘Optimized for Size And Speed’. Unfortunately, those would be stereotyped words.

FORTH is generally optimized to use the smallest-possible Dictionary, which takes precedence over how fast it executes. Sometimes but not always, a smaller Dictionary also leads to faster execution.