Jzsimd - Mxu Instructions List

Analysing the mxu_as AWK script, I count 58 basic SIMD instructions, each with its own opcode name. Of these, 12 expand into 48 instructions, 3 expand into another 48, and 1 expands into 8, because they take fixed arguments (WW, LW, HW, XW, AA, AS, SA, SS, etc.) that partly change the behaviour of the basic instruction. So we may consider that there are 146 SIMD instructions in total.
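For anyone who wants to check that count, here is the arithmetic as a small C sketch. It only uses the figures from the post above; the constant and function names are made up for the illustration.

```c
#include <assert.h>

/* Figures quoted from the post above; nothing here comes from a datasheet. */
enum {
    BASIC_OPCODES    = 58,  /* distinct opcode names found in mxu_as */
    GROUP_A_OPCODES  = 12,  /* opcodes that expand ...               */
    GROUP_A_VARIANTS = 48,  /* ... into 48 instructions in total     */
    GROUP_B_OPCODES  = 3,
    GROUP_B_VARIANTS = 48,
    GROUP_C_OPCODES  = 1,
    GROUP_C_VARIANTS = 8
};

/* Total instruction count once the fixed-argument variants are counted. */
static int mxu_total_instructions(void)
{
    int expanding = GROUP_A_OPCODES + GROUP_B_OPCODES + GROUP_C_OPCODES; /* 16 */
    assert(expanding <= BASIC_OPCODES);
    int plain = BASIC_OPCODES - expanding;                               /* 42 */
    return plain + GROUP_A_VARIANTS + GROUP_B_VARIANTS + GROUP_C_VARIANTS;
}
```

So 42 opcodes stay as they are, and the other 16 account for 48 + 48 + 8 = 104 variants.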

What I find interesting is that they've added not only SIMD instructions but also a lot of extensions to improve the rather poor memory addressing modes MIPS has. The SIMD instructions themselves appear to be only 32 bits wide, so not extremely useful outside of a few applications.

Indeed, most of the SIMD instructions operate on 8 x 8-bit, 4 x 16-bit or 2 x 32-bit values through two registers, XD and XA, used as input/output.
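To make the lane idea concrete, here is a rough C sketch of what "packed" arithmetic on one 32-bit register looks like. It is only a software illustration: the function name add_lanes is hypothetical, and wrap-around on overflow is an assumption, since the thread does not say whether the real instructions wrap or saturate.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 32-bit words lane by lane. Each lane wraps on its own, so a carry
   never leaks into the neighbouring lane (assumed behaviour, not taken from
   any MXU documentation). */
static uint32_t add_lanes(uint32_t a, uint32_t b, unsigned lane_bits)
{
    assert(lane_bits == 8 || lane_bits == 16 || lane_bits == 32);
    if (lane_bits == 32)
        return a + b;                       /* a single 32-bit lane */

    uint32_t mask = (1u << lane_bits) - 1u; /* 0xFF or 0xFFFF */
    uint32_t result = 0;
    for (unsigned shift = 0; shift < 32; shift += lane_bits) {
        uint32_t sum = ((a >> shift) & mask) + ((b >> shift) & mask);
        result |= (sum & mask) << shift;    /* drop the carry out of the lane */
    }
    return result;
}
```

With 16-bit lanes, 0x0001FFFF + 0x00010001 gives 0x00020000, where a plain 32-bit add would give 0x00030000. A register pair then holds twice as many lanes as a single register.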

pastebin: I have jz_mxu.h, but the C version is incomplete (only 34 instructions have a C counterpart). So I guess some reverse-engineering is necessary to find out what the other 112 instructions do.

Damn, you're a hawk, responding before my edit went through. Anyway, there are some 2 x 32-bit ones, so I retracted what I said. Also, the extended addressing modes appear to apply only to loading/storing to the new register set. Of course, not everything can be clear from your information and that C source alone.

How do we use these extra instructions in GCC, for example? Does GCC even support these extensions?

Guessing from what you've said, I imagine there is no datasheet available.

Very curious as I'd like to make use of some of these if appropriate.

Nope, neither gcc nor as is aware of them. They use a hack: the AWK script mxu_as is run on a .s file as a preprocessor and transforms each of those instructions into a .word 0bXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.
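To show the shape of what such a preprocessor emits, here is a small C sketch that turns an already-encoded 32-bit instruction word into that kind of directive. The function name is hypothetical and the sketch deliberately does not encode any MXU instruction, since the bit layouts are not given in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write ".word 0b" followed by the 32 bits of `insn`, most significant bit
   first. `out` must have room for at least 41 bytes (8 + 32 + NUL). */
static void format_word_directive(uint32_t insn, char *out)
{
    static const char prefix[] = ".word 0b";
    size_t n = sizeof prefix - 1;           /* 8 characters, without the NUL */

    assert(out != NULL);
    memcpy(out, prefix, n);
    for (int bit = 31; bit >= 0; bit--)
        out[n++] = ((insn >> bit) & 1u) ? '1' : '0';
    out[n] = '\0';
}
```

The assembler then just sees raw data in the instruction stream, which is why neither gcc nor as needs to know the mnemonics.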

Of course, we could imagine adding them to binutils and gcc. I think the minimum would be to expose them through __asm, define the media registers as v8qi, v4hi, v2hi and v2si types, and let gcc allocate them for us. Intrinsics are a little more complicated, I think.

Like the standard MIPS general-purpose registers, the media registers include a special register, XR0, hardwired to 0. So it should be possible to do scalar operations: "D32ADD XR2, XR0, XR1, XR0, SS" would be equivalent to "XR2 = -XR1". So if you need to work with 3D integer vectors, the fourth output register can be XR0.
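Here is a tiny software model of that reading of D32ADD, as a sketch only. The operand order (first and fourth operands are outputs, second and third are inputs) and the meaning of the AA/AS/SA/SS patterns are inferred from the examples in this thread, not from a datasheet; the type and function names are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { XR_COUNT = 16 };                     /* XR0..XR15 in this model */
typedef enum { PAT_AA, PAT_AS, PAT_SA, PAT_SS } AddSubPattern;
typedef struct { uint32_t xr[XR_COUNT]; } MxuModel;

/* Writes to XR0 are dropped, so XR0 always reads as 0. */
static void write_xr(MxuModel *m, unsigned idx, uint32_t value)
{
    assert(idx < XR_COUNT);
    if (idx != 0)
        m->xr[idx] = value;
}

/* Assumed semantics: xa = xb (+/-) xc and xd = xb (+/-) xc, where the first
   pattern letter selects the operation for xa and the second for xd. */
static void d32add(MxuModel *m, unsigned xa, unsigned xb, unsigned xc,
                   unsigned xd, AddSubPattern p)
{
    assert(xb < XR_COUNT && xc < XR_COUNT);
    uint32_t b = m->xr[xb], c = m->xr[xc];  /* read inputs before any write */
    uint32_t first  = (p == PAT_AA || p == PAT_AS) ? b + c : b - c;
    uint32_t second = (p == PAT_AA || p == PAT_SA) ? b + c : b - c;
    write_xr(m, xa, first);
    write_xr(m, xd, second);
}
```

In this model, d32add(&m, 2, 0, 1, 0, PAT_SS) leaves XR2 holding 0 - XR1, and the second result is simply thrown away because it targets XR0.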

I was wondering about two things:

- XR16 is only accessible through S32M2I and S32I2M, that is, you need a GPR to access the content of XR16. What is the purpose of that register?
- Some instructions, like MAD and ACC, accumulate the result of an operation into the output registers. For instance, "D32ACC XR3, XR1, XR2, XR4, AS" means "XR3 += XR1 + XR2; XR4 += XR1 - XR2". What happens if the same register is used for both outputs? Something like "XR3 += XR1 + XR2; XR3 += XR1 - XR2" is surely impossible, as both updates should be done in parallel.
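The D32ACC example can be written down as a small C model, sketched here under the same reading of the operands as in the bullet. The names are hypothetical, and the model refuses the aliased-output case instead of guessing, because what the hardware does there is exactly the open question.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ACC_AA, ACC_AS, ACC_SA, ACC_SS } AccPattern;

/* Assumed semantics, taken from the example in the thread:
   xa += xb (+/-) xc and xd += xb (+/-) xc, with XR0 never written.
   `xr` is a software copy of XR0..XR15. */
static void d32acc(uint32_t xr[16], unsigned xa, unsigned xb, unsigned xc,
                   unsigned xd, AccPattern p)
{
    assert(xa < 16 && xb < 16 && xc < 16 && xd < 16);
    /* Both outputs in the same register: behaviour unknown, so not modelled. */
    assert(xa != xd || xa == 0);

    uint32_t b = xr[xb], c = xr[xc];        /* both results use the old inputs */
    uint32_t first  = (p == ACC_AA || p == ACC_AS) ? b + c : b - c;
    uint32_t second = (p == ACC_AA || p == ACC_SA) ? b + c : b - c;
    if (xa != 0) xr[xa] += first;
    if (xd != 0) xr[xd] += second;
}
```

With XR1 = 10, XR2 = 3, XR3 = 100 and XR4 = 200, d32acc(xr, 3, 1, 2, 4, ACC_AS) gives XR3 = 113 and XR4 = 207 in this model.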

Have you seen this one, which comments the 60 SIMD instructions for the Jz47xx MIPS core:
http://gitorious.org/~jz4740/linux_jz4740/jz_mxu_doc/blobs/master/jz_mxu_doc.c
The comments are in Chinese, but they can be translated with Google.