The following was a proposal I wrote proposing an
improvement to the brnch prediction in a new ICL mainframe. The main
scheme was what is now called a two-bit Smith branch predictor, but
the interesting bit is at the end where I show how the branch
prediction can be improved using only one bit per hash entry by using
random numbers.

An area like this I used random numbers for is when
gathering statistics on program running. It would be daft to do this
nowadays with memory so cheap but once upon a time an extra 16k
really meant something. The idea is that instead of using 4-byte
integers for the count you use only a single byte, but the value is
only incremented with a certain probability that goes down the higher
the number in the byte. Thus the exact number isn't kept only a
approximation to the log of the number.

Probably the neatest application of probability I know
of that is still of use in computing is to form a directory
fingerprint so one can normally avoid checking it for an entry which
isn't present. Hmm..., I just did a search using Vivisimo and I can't
find any reference to the technique so perhaps it ain't so useful any
more :(

D McQuillan
BRA 4OPAW x2633
25 Sept '79

Re: Jump Prediction on S4

The jump prediction described in the new PSD about the pipeline
for S$ predicts that a conditional jump will jump the same way as it
did last. I also suppose that there is no clearing of the jump
prediction table - a jump address hashes to a place in a table in
which a single bit is kept giving the prediction.

The jump prediction could be very significantly improved I think
by having 2-bits per entry and only changing a prediction if it is
wrong twice in succession. What is wrong with having one bit is:

a) In repeated loop the first time round is predicted wrong each
time - thus the prediction is wrong twice per invocation of the loop
(either explicitly in an outer loop or implicitly in a repeated
subroutine call).

b) Single executions of contrary jumps in straight sequences of
code have an undue disruptive influence when they hash to the same
places as repeated jumps.

There are a number of alternatives which work better in these
circumstances. My favourite is a two bit method. One bit (j)holds the
result of the last jump and another (p) hold the prediction for the
next jump hashing to the location. The prediction is changed if it is
wrong and differed from the last jump bit also. The action is shown
in the table below

prediction=p

j

jump is

success

next p

j=jump

0

0

0

Y

0

0

0

0

1

N

0

1

0

1

0

Y

0

0

0

1

1

N

1

1

1

0

0

N

0

0

1

0

1

Y

1

1

1

1

0

N

1

0

1

1

1

Y

1

1

Prediction % right

Repeat Loop size

Random prob of jump

2

3

4

5

1/3

1/4

PSD Method

0%

33%

50%

60%

56%

62%

2-bit method

50%

67%

75%

80%

59%

69%

Formulae:

PSD method repeated loop size N

% right = (N-2)/N

2-bit method

% right = (N-1)/N

PSD method random prob of jump

% right = 1-2g

2-bit method

% right = (1-2g-2g2)/(1-g)

Where g = p(1-p)
and p = probability of successful jump

Difference repeated loop size N = 1/N

Difference random prob f jump = (g-4g2)/(1-g) approx= p
when p small

Other interesting methods are:-

The 1-bit random method – the prediction bit is only changed
with probability p when it is wrong.. The following shows the % of
correct predictions:

Repeat Loop size

2

3

4

5

p=1/2

33%

43%

53%

61%

p small

50%

55%

62%

68%

The probabilities are the same as the PSD method for random
probabilities if jumping. The only disadvantage is the method takes
longer to settle down and it is not as good as the 2-bit method.

> 2-bit methods. If more than 2 bits are used then better
prediction can be done for the random probability of jump case
approaching the best case of % prediction right = max(p,1-p). The
pattern for small loops can be recognized so that if

true false false true false false

is encountered a prediction of true is made and so a 100% success
rate achieved for such loops.

An 8-bit method might be

New 8-bit = old 8 bit shift left 1 and
put in new jump at the right

so the 8 bits contain a historical record of the last 8 jumps. A
256 entry table would be used prediction indexed by the 8-bit value
with values like