I am looking into doing karatsuba multiplication at evaluation points 0,
+1, and infinity. We usually evaluate in -1 instead of +1.
The main advantage of evaluating in +1 is that it makes it more straight-
forward to avoid side channel leakage. (Currently, we completely avoid
all o(n^2) multiplication algorithms in side channel sensitive contexts.)
The disadvantage is that one needs to handle one more bit for some inter-
mediate operands. For the sake of side channel silence, these bits need
explicit representation as 0 or 1, as one cannot allow for any conditional
execution.
My first project is to write some mpn_addmul_2 assembly functions for
machines with slow 64b x 64b -> 128b hardware. E.g., all ARM designed
Arm64 cores have a throughput of 1/7 per cycle for that operation, while
plain operations has a throughput of 2 to 3 per cycle. If things pan out,
we might be able to reach a addmul accumulation performance of 5.25 cycles
per limb. Very far from great, but less awful than 7.
We might want to consider offering a mpn_sec_mul with better medium and
large size operand support. It is currently pretty naive:
mpn_sec_mul (mp_ptr rp,
mp_srcptr ap, mp_size_t an,
mp_srcptr bp, mp_size_t bn,
mp_ptr tp)
{
mpn_mul_basecase (rp, ap, an, bp, bn);
}
(But at least it is prepared for the future with its itch/scratch
interface!)
--
Torbjörn
Please encrypt, key id 0xC8601622