First, I tried looking at the assembly generated from some C code that performs division on different data types.
But the assembly code is quite difficult to understand (even with the -O1 option).
There is always a call to the ___udivdi3 / ___floatundisf functions.

All I want is to see some samples of division written purely in SH4 assembly.

I tried the Renesas site, but there is only general information about the registers (FR0..FR15 / FPUL); even the programming manual only describes the instructions.

Here are the types of division I want to do in assembly:

float MakeDiv(uint64 n, int d)
{
return n / d;
}

float MakeDiv(int n, float d)
{
return n / d;
}

float MakeDiv(float n, int d)
{
return n / d;
}

Besides, I wonder how to load a floating-point constant (for example PI: 3.1415926535897932384626433832795) into an FRx register in assembly?

Integer and floating-point (fp) division are completely different to the CPU. The compiler uses internal functions to handle integer division because it has to be done with special step-by-step instructions/algorithms; calling a function avoids repeating a large chunk of code at every division. The FPU, on the other hand, has a single instruction that performs division on fp numbers.

In C/C++, mixing fp and integer operands brings in promotion/demotion of types, so integers will usually be converted to fp by way of the fpul register.
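As a rough illustration of these conversion rules in plain C (the function names are made up for this sketch, and uint64_t from <stdint.h> stands in for the uint64 type in the question): when both operands are integers, the division stays an integer division and only the quotient is converted to float, whereas a float operand forces the integer operand to be converted first.

```c
#include <stdint.h>

/* Both operands are integers: d is converted to uint64_t (a negative d
   would wrap to a large unsigned value), the division is a truncating
   64-bit integer division, and only the quotient is converted to float. */
float div_u64_int(uint64_t n, int d)
{
    return n / d;
}

/* One operand is float: n is converted to float first, then a
   floating-point division is performed. */
float div_int_float(int n, float d)
{
    return n / d;
}
```

So div_u64_int(7, 2) yields 3.0f (the fraction is lost before the conversion), while div_int_float(7, 2.0f) yields 3.5f.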

The following code shows the usual operations when compiled even with -O1 (sh-elf-gcc -O1 -S fpudiv.c)

First, to load a constant floating-point value like PI directly into a floating-point register,
the value must be converted to the IEEE 754 single-precision 32-bit format.
There are online converters and source code for this, for example:
https://www.h-schmidt.net/FloatConverter/IEEE754.html
Then simply hard-code the converted value in the assembly code.

Thus the value 1078530000 (40490FD0 hex) in the source code above is the IEEE 754 single-precision 32-bit encoding of PI truncated to 3.14159 (the full value 3.1415926535897932384626433832795 would round to 1078530011, 40490FDB hex).
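For anyone who prefers to check such constants locally rather than with an online converter, here is a minimal C sketch. float_bits is a hypothetical helper name, and the code assumes the host's float is an IEEE 754 single-precision value, which is the case on common desktop compilers.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float.
   Assumes float is IEEE 754 binary32 on the host. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copy the bytes, no numeric conversion */
    return u;
}
```

With this, float_bits(3.14159f) gives 0x40490FD0 (1078530000 decimal), and float_bits(3.14159265358979f) gives 0x40490FDB.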

That answers my last question.

Division between a 32-bit integer and a 32-bit float seems easy once a few things are understood:

1) FPUL is used to transfer values from the integer world to the floating-point world.
2) FLOAT is the instruction that converts a 32-bit integer to the IEEE 754 single-precision 32-bit format.
3) Parameter registers in the floating-point world behave like the integer ones: FR4 = first parameter, FR5 = second parameter, and FR0 holds the result.
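The three points above can be mirrored step by step in C. This is only a sketch to show the order of operations: the function names and the local variable called fpul are made up, and the register names in the comments (rN, frM, frD) are placeholders rather than actual compiler output.

```c
#include <stdint.h>

/* float MakeDiv(int n, float d) */
float div_int_float_steps(int32_t n, float d)
{
    int32_t fpul = n;            /* lds   rN,fpul   : integer world -> FPUL */
    float   fn   = (float)fpul;  /* float fpul,frM  : 32-bit int -> float   */
    return fn / d;               /* fdiv  frD,frM   : result goes to fr0    */
}

/* float MakeDiv(float n, int d) */
float div_float_int_steps(float n, int32_t d)
{
    int32_t fpul = d;            /* lds   rN,fpul                           */
    float   fd   = (float)fpul;  /* float fpul,frM                          */
    return n / fd;               /* fdiv  frM,frD   : result goes to fr0    */
}
```

Both variants need only one conversion and one fdiv, which is why no helper function is called for them.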

I do not understand the code for the uint64 version, though: there are no floating-point instructions or registers...
It calls two functions and uses the r6 register. I imagine r4 and r5 hold the uint64 and r6 the integer.
But r4 and r5 are overwritten by moving the r1 and r0 values into them...

r0-r3 and fr0-fr3 are always used for return values, so when a function returns, the result will always be in those registers. r4-r7 and fr4-fr7 are always used for parameters, so if the parameters for a function we are calling (the callee) are already where they need to be, we can just branch to it. Floating-point numbers trade precision for range, so they can hold much larger numbers, though only approximately; even though a uint64 needs two integer registers, its value can still fit in one floating-point register, unless you need double precision, in which case the result will be in dr0 (i.e. fr0 + fr1).
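To make the precision-versus-range trade-off concrete, here is a small C sketch (through_float is a hypothetical helper, and the host's float is assumed to be IEEE 754 single precision). It converts a 64-bit integer to float and back; the round trip only preserves values that fit in float's 24 significant bits or are otherwise exactly representable.

```c
#include <stdint.h>

/* Convert v to a 32-bit float and back again.
   Only call this with values well below 2^64, since converting an
   out-of-range float back to uint64_t is undefined in C. */
uint64_t through_float(uint64_t v)
{
    volatile float f = (float)v;  /* volatile forces a real 32-bit float */
    return (uint64_t)f;
}
```

For example, 16777216 (2^24) and 1099511627776 (2^40) survive unchanged, but 16777217 comes back as 16777216 because the lowest bit does not fit.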

Integer division is harder than floating point because it has to be exact. This usually requires operating on each and every bit, as you would on paper with long division. There are a number of different algorithms for doing this, but they all take many instructions, so rather than duplicate a long block of code, the compiler just calls the internal integer division function for ( unsigned long long ), __udivdi3.
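For illustration, the paper-style long division mentioned above can be written as a shift-and-subtract loop in C. This is a sketch under assumptions: udiv64_sketch is a made-up name, the divisor must be non-zero, and the real __udivdi3 in libgcc is not necessarily implemented this way.

```c
#include <stdint.h>

/* Shift-and-subtract (restoring) division of two unsigned 64-bit values.
   Returns the quotient; the remainder is n - quotient * d.
   The caller must ensure d != 0. */
uint64_t udiv64_sketch(uint64_t n, uint64_t d)
{
    uint64_t q = 0;   /* quotient, built one bit at a time       */
    uint64_t r = 0;   /* running remainder                       */

    for (int i = 63; i >= 0; i--) {
        uint64_t carry = r >> 63;          /* bit lost by the shift below  */
        r = (r << 1) | ((n >> i) & 1);     /* bring down the next bit of n */
        if (carry || r >= d) {             /* divisor fits: subtract it    */
            r -= d;
            q |= (uint64_t)1 << i;         /* and set this quotient bit    */
        }
    }
    return q;
}
```

The loop runs 64 times with several operations per step, which gives an idea of why the compiler prefers a single shared routine over inlining the sequence at each division.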

In the first example, we are just passing along our parameters to the __udivdi3 function because they are already where they need to be. That function will use them as needed and return the result in r0+r1 so we don't care about them anymore.

Since a ( long long ) is stored in two integer registers and the fpul register is only 32 bits wide, converting a ( long long ) has to be done manually (we can't use lds r0, fpul). Therefore, we pass the result from __udivdi3, which is in r0+r1, to __floatundisf, which needs it in r4+r5. When __floatundisf returns, the result is already in fr0, so we can just return to the function that called us. Its address was saved on the stack, using r15 (the stack pointer), at the start of our function. Every time a function is going to call another function, it needs to save its own return address on the stack.
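One way to picture such a manual conversion is to handle the two 32-bit halves separately and combine them. The C sketch below is only an assumption about the general idea, not the actual __floatundisf code from libgcc: u64_to_float_sketch is a made-up name, and going through double can differ from a correctly rounded conversion by one unit in the last place for some large inputs.

```c
#include <stdint.h>

/* Convert an unsigned 64-bit value to float via its two 32-bit halves.
   Illustrative only; not guaranteed to round identically to the
   compiler's own uint64 -> float conversion for every input. */
float u64_to_float_sketch(uint64_t v)
{
    uint32_t hi = (uint32_t)(v >> 32);   /* upper register of the pair */
    uint32_t lo = (uint32_t)v;           /* lower register of the pair */

    /* Each 32-bit half converts exactly to double; 4294967296.0 is 2^32. */
    double wide = (double)hi * 4294967296.0 + (double)lo;
    return (float)wide;
}
```

For values that are exactly representable, such as 3 or 2^40, the result matches a plain (float) cast.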

So you can see you had the right idea with the second example.

It can be hard to follow because the compiler reorders the instructions for optimal performance, but just remember that when a branch is taken on SH4 CPUs, the instruction right after the branch (the delay slot) is usually executed before the branch takes effect, so: