tg at gmplib.org (Torbjörn Granlund) writes:
There is no UMULHI instruction. UMULH is our 64b x 64b -> 64b high-half
instruction.
The other instructions have 32-bit operands.
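
To make the UMULH semantics concrete, here is a small C reference model. It is only a sketch: the function names are made up for illustration, and it relies on unsigned __int128, which is a GCC/Clang extension rather than standard C.

```c
#include <stdint.h>

/* Reference model of UMULH: the upper 64 bits of the unsigned
   64 x 64 -> 128 product. Uses unsigned __int128, a GCC/Clang
   extension. The name umulh_model is hypothetical. */
static inline uint64_t umulh_model(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* The matching lower 64 bits, which an ordinary 64-bit MUL yields. */
static inline uint64_t mul_lo_model(uint64_t a, uint64_t b)
{
    return a * b; /* wraps modulo 2^64 */
}
```

A full 128-bit product therefore takes one MUL and one UMULH on the same operands.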
It might be worth noting that multiply instructions with "long" in
their names do narrower multiplications than those without it. The long
ones have at least one 32-bit operand.
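
As an illustration of the "long" forms, the following C sketch models UMULL and UMADDL, both of which take 32-bit sources and produce a 64-bit result. The function names are hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Model of UMULL Xd, Wn, Wm: unsigned 32 x 32 -> 64 widening multiply. */
static inline uint64_t umull_model(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b; /* widen first so the product is not truncated */
}

/* Model of UMADDL Xd, Wn, Wm, Xa: 64-bit accumulator plus a
   32 x 32 -> 64 product; the sum wraps modulo 2^64. */
static inline uint64_t umaddl_model(uint32_t a, uint32_t b, uint64_t acc)
{
    return acc + (uint64_t)a * b;
}
```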
It is not clear what we should do about ARM inc's arm64 GMP performance.
My approach with karatsuba might not be the best one; we have cortex-a15
neon code which runs at 1.3 c/l for addmul_2 on 32-bit limbs; since a
64-bit limb product amounts to four 32-bit ones, this corresponds to 5.2
c/l for an arm64 addmul_1. This matches my ideal karatsuba code (which
stays clear of neon).
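
The factor of four between 1.3 and 5.2 c/l is consistent with building one 64 x 64 product out of four 32 x 32 products. The sketch below shows that plain schoolbook decomposition in C. It is not the karatsuba code mentioned above, nor neon code; the function name is hypothetical.

```c
#include <stdint.h>

/* 64 x 64 -> 128 unsigned product assembled from four 32 x 32 -> 64
   partial products (schoolbook, not karatsuba). */
static void mul_64x64_schoolbook(uint64_t a, uint64_t b,
                                 uint64_t *hi, uint64_t *lo)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    uint64_t p00 = a0 * b0; /* weight 2^0  */
    uint64_t p01 = a0 * b1; /* weight 2^32 */
    uint64_t p10 = a1 * b0; /* weight 2^32 */
    uint64_t p11 = a1 * b1; /* weight 2^64 */

    /* Sum of the 2^32 column; three 32-bit values cannot overflow 64 bits. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```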
So perhaps the way forward is using neon, with all the tribulations?
PS. I imply no critique of neon; it is a very fine set of
instructions. It just hurts to do bignum using SIMD, even well-designed
SIMD like neon.
--
Torbjörn
Please encrypt, key id 0xC8601622