In a land far far away (meaning some 10-20 years ago) it used to be very important to do all kinds of extreme optimizations to get games and other performance-sensitive code to run at a decent speed. Often, one important optimization was to remove expensive arithmetic operations like multiplications and — heaven forfend — divisions. Anyone who was around then tends to have amassed a lot of “tricks” for this, which these days are really just meaningless trivia occupying valuable brain real-estate. I’m hoping that writing “my” tricks down might help me purge them from memory. Hey, it’s worth a try!

Multiplies through table lookup
On certain old 8-bit machines, such as the Apple II (whose 6502 has no multiply instruction), you would do multiplies either “longhand style” in code, or through table lookups. Table-based multiplication would either use a table of squares, exploiting the identity a*b = ((a+b)^2 - (a-b)^2)/4 (so a single table of f(x) = x^2/4 and two lookups suffice), or a pair of log/exp tables, computing a*b as exp(log(a) + log(b)).

With the table of squares, either the numbers had to be swapped so that a was the larger of the two, to avoid a negative index in the second table lookup, or the table had to be set up to handle negative indices.

Rotation in 3 multiplies
Rotating a 2D point (x, y) by an angle θ is normally done with 4 multiplies and 2 additions, via x' = x * cos(θ) - y * sin(θ) and y' = x * sin(θ) + y * cos(θ). The same operation can be done in 3 multiplies, 3 additions, and two trigonometric calls (as well as a division by two, which was usually handled by folding it into the trig table):
t = y + x * tan(θ/2)
x' = x - t * sin(θ)
y' = x' * tan(θ/2) + t

Here the relevant identities are sin(θ) = (1 + cos(θ)) * tan(θ/2) and cos(θ) = 1 - sin(θ) * tan(θ/2). Furthermore, this trick is really just an application of the fact that a rotation can be seen as repeated shears (see Alan Paeth’s article “A fast algorithm for general raster rotation” on page 179 of Graphics Gems).

Multiplication of two 3×3 matrices
The standard way of multiplying two general 3×3 matrices takes 27 multiplies and 18 additions. We can get this down to 24 multiplies using Winograd’s algorithm:
A[i] = sum(k=1,n/2) a[i,2k-1]*a[i,2k]
B[j] = sum(k=1,n/2) b[2k-1,j]*b[2k,j]
c[i,j] = sum(k=1,n/2){(a[i,2k-1]+b[2k,j]) * (a[i,2k]+b[2k-1,j])} - A[i] - B[j]
Because n = 3 is odd, the leftover term a[i,n]*b[n,j] must also be added to each c[i,j], which gives 3 + 3 + 9 + 9 = 24 multiplies in total (at the cost of extra additions).

It is possible to perform a 3×3 matrix multiplication in 23 multiplies (and many more additions) using algorithms by Laderman [link], and Johnson and McLoughlin [link1][link2].

Others
OK, so there are some others too, such as quaternion multiplication in 8 multiplications and cross products in 5 multiplications, but this is what I had time for right now. What other old-school multiplication tricks did I leave out? Comments invited!

Pierre Terdiman said,

The log/exp tables were indeed quite useful: not only was it faster, it also took a fixed number of cycles, unlike MULS and MULU on a 68000. So using those log/exp tables was pretty much the only way to do 3D in “fullscreen” on the Atari ST, for example. (Basically it was not possible to use multiplications and divisions in this mode.)

Another similar trick was to rotate the bounding box of a 3D mesh and compute N vectors along its rotated X, Y and Z axes using interpolation. Those N vectors are stored in 3 small LUTs; the object’s coordinates are then used to index those LUTs, and the three vectors are added together to produce the “correct” rotated 3D point. Since the interpolation was almost free, it effectively turned the 9 muls into 9 adds, as I wrote in a small article a loooong time ago. It was very effective on the 68000, and we used it pretty much everywhere :)