This page contains Pisto's opinions. Everything should be taken as suggestions, and not as "The Truth".

Let's Do Better Than Binary Computers

Why all these pages about ternary computers? Because we believe ternary computers can be better than binary computers. And as of this date (October 21st, 2009), almost everything about them still has to be designed and engineered. This is the right time to make the right choices.

At Least, Let's Not Repeat the Same Errors

My main concern is about the tryte, the triad and the tribble. Honestly, I do not think these concepts should be anything more than typedefs. Let me explain why.

In the binary world, we commonly use bytes as the addressable unit. While this was very useful for 16-bit operating systems and file systems many years ago, it is already unnecessary with 64-bit operating systems and file systems, since 2^64 addressable units of RAM or disk space is more than what we can install in even the largest clusters (Top 500). At the same time, we already know that Microsoft and Intel are working on future 128-bit computers. With 2^128 addressable units, that is far more than will ever be needed in practice (128-bit storage: are you high?). I mean, at this point, why not just give an address to each bit in RAM or in a storage volume? Well, in the binary world, it is too late...

When I want to isolate a single bit within a byte, I need the byte, a mask, a shift operator and the "AND" operator just to know if the bit is "0" or "1". This is clearly not optimal.
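To illustrate, here is that mask-shift-AND dance as a Python sketch (the same sequence applies in C or C++):

```python
# Extracting one bit from a byte requires a shift and a mask,
# just to learn whether that single bit is 0 or 1.
def get_bit(byte, position):
    return (byte >> position) & 1  # shift the bit down, then AND with a mask

flags = 0b00100100
print(get_bit(flags, 2))  # 1
print(get_bit(flags, 3))  # 0
```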

A "bool" variable (in C++) actually occupies a whole byte, while a single bit would have been enough.

Finally, I am tired of mixing Mbps (for bits) and MBps (for bytes) when talking about the bandwidth of USB 2/3, SATA 3/6 Gbps or a network connection. This is not the biggest problem, but a factor of 8 can become problematic if we choose the wrong unit.

In the previous example, I want to show that the ultimate integers should be made of 1 through 90 trits. Programmers should be free to choose exactly the resolution they need, not merely to pick among 2, 3, 6, 12 and the other multiples of 2 and 3 trits. We need full flexibility: if someone wants 47-trit integers, they should be available. Here is the recursive syntax to declare an integer:

int<(const int<5> n)> var_name [= value];

In other words, the length of an integer is itself a positive const int<5> integer, in the range [1, 90].
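As a quick sanity check, a Python sketch (assuming balanced ternary, where n trits cover the range ±(3^n − 1)/2) confirms that 5 trits are enough for the length field:

```python
def balanced_range(n):
    """Largest magnitude an n-trit balanced ternary integer can hold."""
    return (3 ** n - 1) // 2

print(balanced_range(5))  # 121, so a const int<5> easily covers [1, 90]
```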

I know, 90 trits seems like a lot. But why 90? First of all, it is to do better than 64 bits (about 42 trits) and 128 bits (about 82 trits). Then, it is because we also have to plan for Single Instruction, Multiple Data (SIMD), even if we are not there yet. I will talk about that later. But the question could be: why not 96? 100? 108? 120? etc. I would respond: the more trits we have, the more energy future processors will consume. So, we have to choose what is just good enough. I believe 90 trits is good enough for integers.
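Those equivalences are easy to verify with a Python sketch (again assuming balanced ternary, where n trits cover ±(3^n − 1)/2):

```python
def trits_needed(bits):
    """Smallest n such that n balanced trits cover the full range of an
    unsigned 'bits'-bit integer, i.e. (3**n - 1) // 2 >= 2**bits - 1."""
    n = 1
    while (3 ** n - 1) // 2 < 2 ** bits - 1:
        n += 1
    return n

print(trits_needed(64))   # 42
print(trits_needed(128))  # 82
```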

In future ternary processors, there must be 90-trit regular registers. To load an integer, we need to provide the address of the integer and its length, as well as the instruction to load it. When loading a number, we can pad the most significant trits with zeros to fill the register, if necessary.

In the previous example, there is a static cast which is now easier to write: sc<type>(value).

Finally, I would like instructions to add and/or subtract three integers at the same time. In other words, ternary C++ would have these operators: "operator+ +()", "operator+ -()", "operator- +()" and "operator- -()". So, instead of doing two additions to sum three numbers, let's have instructions that sum and/or subtract three integers at once. Look at this example:

             PP      (carries)
1st number:   PP
2nd number:   OP +
3rd number:   PP +
             -----
             POO

Why not? It seems possible.
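We can verify the example with a small Python sketch (P = +1, O = 0, N = −1, most significant trit written first, as in the sum above):

```python
TRIT = {'P': 1, 'O': 0, 'N': -1}

def trits_to_int(s):
    """Balanced ternary string, most significant trit first."""
    value = 0
    for c in s:
        value = value * 3 + TRIT[c]
    return value

# PP + OP + PP = 4 + 1 + 4 = 9 = POO
total = trits_to_int('PP') + trits_to_int('OP') + trits_to_int('PP')
print(total, trits_to_int('POO'))  # 9 9
```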

Now, what should we do for trytes, triads and tribbles? Here is the answer in a future stdint.h file:

Pointers are declared with this syntax: ptr<type> ptr_name [= value]. With this notation, there is no ambiguity about the exact type of the listed pointers, as there is in today's C++:

int* ptr1, ptr2; // vs
int *ptr1, *ptr2; // ?

It is also possible to cast a pointer to an integer, on the condition that we use enough trits. The sizeof() function becomes very useful for this task. Finally, the static cast "sc<>()" automatically determines the right destination type of the cast. There is a similar concept in C++0x.

The downside of such long addresses is that all tree structures will need much more memory to hold all the pointers.

Now, what about distributed memory? Is it possible to store two components in a single integer? For example, is it possible to store an ID, which identifies the host or the peripheral that owns the requested data, together with an address? The answer is yes. We just need to use a universal code for the ID, reserve the value "0" for the local regular virtual memory space, and use the remaining trits for the address. Example (little endian):

TT ttttt.. = 90 trits
ID Address

This is just a suggestion... ;-) At least, it remains flexible if, one day, we have hundreds of thousands of computers that share their memory.

Floating Point Numbers

Floating point numbers should be able to store an integer of 90 trits. Furthermore, we must improve on the current float/double model. Here is the solution, illustrated in the following C++ example:

The syntax is "float<(const int<5> m), (const int<5> n)>", where m sets the number of trits for the exponent, and n the number of trits for the mantissa. Yes, the exponent's size comes first: it is just a convention, since the exponent should also be stored first (and in little endian, like the mantissa).

A float is made of two integers, each of 1 through 90 trits (you choose!). Both the exponent and the mantissa can be positive or negative.

As in binary floats and doubles, all trits of the mantissa must be pushed toward the most significant trit. The most significant trit gives the sign. For example, -9.0 is encoded as -1.0*3^2. Here are some other rules, illustrated with a float<4, 8> (all trits written in little endian):

Negative infinity: PPPP NNNNNNNN

Value zero : OOOO OOOOOOOO

Positive infinity: PPPP PPPPPPPP

Not a Number : xxxx xxxxxxxO (a normalized mantissa's most significant trit must be P or N; an O there signals NaN, unless all the x'es are O's, which encodes zero)
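These rules can be exercised with a Python sketch of a decoder for the float<4, 8> layout. This is my own toy interpretation, not part of the original proposal: the exponent is read as a plain balanced integer, and the normalized mantissa's most significant trit carries weight 3^0.

```python
TRIT = {'P': 1, 'O': 0, 'N': -1}

def decode_float(exp, man):
    """Decode a ternary float given as two trit strings,
    both written little endian (least significant trit first)."""
    # Exponent: an ordinary balanced ternary integer.
    e = sum(TRIT[c] * 3 ** i for i, c in enumerate(exp))
    # Mantissa: normalized, so the most significant trit (last one
    # in little endian) has weight 3**0 and carries the sign.
    n = len(man)
    m = sum(TRIT[c] * 3.0 ** (i - (n - 1)) for i, c in enumerate(man))
    return m * 3.0 ** e

# -9.0 = -1.0 * 3**2: exponent NPOO (= 2), mantissa OOOOOOON (= -1.0)
print(decode_float('NPOO', 'OOOOOOON'))  # -9.0
```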

One more important detail: floating point numbers are loaded into a special kind of register, called "Multimedia Registers". These have more trits than the regular registers, which have "only" 90 trits: multimedia registers have 360 trits (2 pairs of 90-trit integers). They are used for floating point computations, operations on complex numbers, and SIMD. The two halves of these registers (180 and 180 trits) are accessible individually.

Compressed/Compacted Floats - The cfloat Data-type

It is a compromise between a small exponent with a long mantissa and a long exponent with a small mantissa. For numbers on the order of ±1, you want a lot of resolution in the mantissa. At the same time, you accept less resolution for very big and very small numbers (exponent much greater or smaller than 0). And you want all that with the same total number of trits, without specifying a fixed number of trits for the exponent and the mantissa. The solution? A cfloat, or compressed/compacted float.

This time, the smallest cfloat has 4 trits. The biggest one has 90 trits. And it is quite easy to cast from one size to the other.

Now, why a minimum of 4 trits? Because we have to encode both the exponent and the mantissa. Ok. One more question: how is a cfloat encoded? The exponent comes first, and it is the component that is encoded with a universal code. This is why the exponent should always be stored before the mantissa, and why those two components should be stored in little endian. Finally, depending on the universal code, we need up to 3 trits to encode the values "1" or "-1". After that, we need at least one more trit to store the mantissa normally. So, 3+1 is 4.

Because the exponent is compressed, it needs to be decompressed before doing any operation with the value. So, the cfloat can be copied into a multimedia register (it takes a maximum of 90 trits at this point). Then, it can be decompressed to recreate the full exponent and to separate the mantissa. When all operations are done, the exponent gets compressed again, and the remaining free trits are used to store the most significant trits of the mantissa. Yes, it takes more time, but that is the price of the savings in memory space and/or bandwidth.

In the middle of a computation, it is also possible to cache an intermediate result as a regular float. This can speed things up a lot, and it keeps a maximum of precision throughout the process. Only the final result is converted back to a cfloat for storage purposes.

Native Support for Complex Numbers

My little $30 Sharp calculator can do complex operations. Ternary computers should be able to do such operations too.

The cplx type follows the same pattern as the regular float. In fact, a complex number is made of two identical float<m, n>. So, the syntax for the complex type is similar: cplx<(const int<5> m), (const int<5> n)> var_name [= value];.

As we can see in the previous C++ code, we can write a complex constant with two floats and the letter "i" in the middle to separate the real and imaginary components (-1.0i0.0, -1.0i, -1i, -1i-0.0, etc.).

The sqrt() function, as well as many other scientific functions, may have a special version for complex numbers. So, sqrt(float<2,4>(-1)) will fail, but sqrt(cplx<2,4>(-1)) will succeed. In the C++ code, the cast type is implicit, since the sqrt() function will "transfer" the needed type.
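Today's languages already make exactly this distinction; here is the binary-world equivalent as a Python sketch:

```python
import math
import cmath

# The real square root of a negative number fails...
try:
    math.sqrt(-1.0)
except ValueError as err:
    print("real sqrt failed:", err)

# ...while the complex version succeeds.
print(cmath.sqrt(-1.0))  # 1j
```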

Casting from float to cplx should be easy, since m and n are the same in float<m,n> and cplx<m,n>: sc<cplx<>>(floatValue). Casting from an integer to a complex number needs more care, since m, the length of the exponent, has to be large enough: in a float, all trits are shifted toward the most significant trit, and the exponent is set accordingly. So, even if the mantissa has the length of the integer, there is still an implicit shift to conform to the specification of the float data-type. Therefore, m must be larger than log3(n).
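That condition can be checked with a small Python sketch (balanced ternary assumed: normalizing an n-trit integer requires an exponent that can reach n − 1):

```python
def min_exponent_trits(n):
    """Smallest m whose balanced range (3**m - 1) // 2 reaches n - 1,
    the largest shift needed when normalizing an n-trit integer."""
    m = 1
    while (3 ** m - 1) // 2 < n - 1:
        m += 1
    return m

print(min_exponent_trits(90))  # 5: a float<5, 90> can hold any int<90>
```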

The real and imaginary components are accessible with the value.r and value.i syntax respectively.

A complex value is loaded in a multimedia register (360 trits). By the way, many cast operations are done in this kind of register (int to float, float to cplx, float to cfloat, and vice-versa).

The same goes for compressed complex numbers: one complex number made of two cfloats of the same size (n trits each).

SIMD - Single Instruction, Multiple Data

Here, I am not going to list all the possible instructions. The only thing I have to explain is why I have limited integers to 90 trits, and SIMD/multimedia registers to 360 trits.

As I said, we must do better than what people have done in the binary world. Then, we must know how much is good enough. Finally, we have to minimize the number of trits as much as possible, because any computation consumes energy.

In the SIMD world, we place many similar integer/float/complex numbers in an array of N trits. What is important here is that N must let me choose the number of integer/float/complex numbers I want, according to the size of those numbers. In other words, if I want to compute 10 numbers at once, N must be divisible by 10. Why 10? Because we like to write numbers in base 10, so chances are that some problems will have arrays containing a number of items that is a multiple of 10.

At first, like probably many of you, I had chosen 81 trits as the maximum. But 81 would mean 324 trits for a whole SIMD register, and 324 = 2*2*3*3*3*3. This is very flexible for powers of 3, but not very flexible for powers of 2 or for any other prime number. 96 trits is not better: 96 * 4 = 384 = 2*2*2*2*2*2*2*3. 108 trits may be better: 108 * 4 = 432 = 2*2*2*2*3*3*3, but 108 trits seems like a lot. Finally, 90 trits seems to be the best choice: 90 * 4 = 360 = 2*2*2*3*3*5. It is the best compromise between prime factor flexibility and energy economy. Here are the possible SIMD instructions with 360 trits:

360 int<1> or bool

180 int<2> or float<1,1> (if we want to allow that much float in a CPU)

120 int<3> or float<...> (same comment)

90 int<4> or float<...> or cplx<1,1> (if we want to allow that much cplx in a CPU)

72 int<5> or ...

60 int<6> or ...

45 int<8> or ...

40 int<9> or ...

36 int<10> or ...

30 int<12> or ...

24 int<15> or ...

20 int<18> or ...

18 int<20> or ...

15 int<24> or ...

12 int<30> or ...

10 int<36> or ...

9 int<40> or ... (it is possible to do 3 operator+ +() at once)

8 int<45> or ...

6 int<60> or ... (it is possible to do 2 operator+ +() at once)

5 int<72> or ...

4 int<90> or ...

2 float<90,90>

1 cplx<90,90>
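This list can be double-checked with a short Python sketch: the lane counts are exactly the divisors of 360 whose integer widths stay within the 90-trit limit, and, unlike 324, 360 also offers lane counts divisible by 10:

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

# Lane counts for integers: widths 360 // d must not exceed 90 trits.
lanes = [d for d in divisors(360) if 360 // d <= 90]
print(lanes)
# 324 (the 81-trit option) has no factor of 5, so no lane count
# is ever divisible by 10:
print([d for d in divisors(324) if d % 10 == 0])  # []
```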

Yes, I know. Asking for such flexibility in SIMD processors is probably asking for the moon, but this list is mainly here to justify the choice of 90 trits as the limit for integers.

Conclusion

I hope you enjoyed (and understood) my suggestions. I am well aware that this is actually science fiction. Even if none of this comes to life, at least it was my pleasure to write the previous sections.