
Hello, I will mainly talk about the Dreamcast version here.
I use very little of lib pvr; the code is either new or a modified version of the lib pvr code.

The lib can load a 3D model and read .tar archives (the equivalent of a .zip without compression), as well as pcx, png and bmp images.
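To illustrate why an uncompressed .tar is convenient on a console, here is a minimal sketch of looking up a file inside a tar image that is already in memory. It relies only on the standard tar header layout (512-byte blocks, name at offset 0, octal size at offset 124). The names `tar_octal` and `tar_find` are hypothetical and are not taken from the library.

```c
#include <stddef.h>
#include <string.h>

#define TAR_BLOCK 512

/* Parse an octal ASCII field such as the 12-byte size field of a tar header. */
static size_t tar_octal(const unsigned char *field, size_t len)
{
    size_t i = 0, value = 0;
    while (i < len && field[i] == ' ')
        i++;                                   /* skip optional padding */
    while (i < len && field[i] >= '0' && field[i] <= '7')
        value = value * 8 + (size_t)(field[i++] - '0');
    return value;
}

/* Return a pointer to the data of `name` inside the tar image, or NULL.
   The file size is written to *out_size on success. */
const unsigned char *tar_find(const unsigned char *tar, size_t tar_size,
                              const char *name, size_t *out_size)
{
    size_t pos = 0;
    /* A header block starting with a zero byte marks the end of the archive. */
    while (pos + TAR_BLOCK <= tar_size && tar[pos] != '\0') {
        const unsigned char *hdr = tar + pos;
        size_t size = tar_octal(hdr + 124, 12);
        if (strncmp((const char *)hdr, name, 100) == 0) {
            if (pos + TAR_BLOCK + size > tar_size)
                return NULL;                   /* truncated archive */
            *out_size = size;
            return hdr + TAR_BLOCK;            /* data follows the header */
        }
        /* Skip the header plus the data rounded up to whole blocks. */
        pos += TAR_BLOCK + (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
    }
    return NULL;
}
```

Because nothing is compressed, the returned pointer can be used directly without a decompression buffer.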

Currently it can display 18-19k triangles per frame.
It can display 26k triangles per frame if I double-buffer the vertices (rendering on the Dreamcast is done in parallel, so a double buffer is needed to avoid data collisions). However, that takes a lot of VRAM, and it is unlikely that you will display that many 3D models in an actual game, which has its own code to run.
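As a rough illustration of the double-buffer idea, here is a generic ping-pong sketch: the CPU fills one vertex area while the other one is being rendered, and the two are swapped each frame. The struct and function names are hypothetical and this is not the library's actual memory layout.

```c
#include <stddef.h>

#define VERTEX_BUFFER_COUNT 2

typedef struct {
    void  *buffers[VERTEX_BUFFER_COUNT]; /* two equally sized vertex areas */
    size_t bytes_each;                   /* size of one area */
    int    fill_index;                   /* area the CPU writes this frame */
} VertexDoubleBuffer;

/* Area the CPU may write into during the current frame. */
void *vdb_fill_target(const VertexDoubleBuffer *vdb)
{
    return vdb->buffers[vdb->fill_index];
}

/* Area the renderer reads from during the current frame. */
void *vdb_render_source(const VertexDoubleBuffer *vdb)
{
    return vdb->buffers[vdb->fill_index ^ 1];
}

/* Call once per frame, after rendering of the old area has finished. */
void vdb_swap(VertexDoubleBuffer *vdb)
{
    vdb->fill_index ^= 1;
}

/* Memory cost: twice the size of a single vertex area. */
size_t vdb_total_bytes(const VertexDoubleBuffer *vdb)
{
    return vdb->bytes_each * VERTEX_BUFFER_COUNT;
}
```

The last function shows the trade-off mentioned above: the collision-free scheme doubles the memory reserved for vertices.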

So I think the best option is a 30 fps version if you want to display 26k triangles per frame (that also spares me the double vertex buffer), and it leaves room for skeletal animation with 26k triangles ^^

As for display speed, I think I am close to the limit: according to my tests, I currently spend 45 cycles on the vertex calculations, 55 cycles on the indices (so per triangle) and 20 cycles on the transfer.

I tested on my console (with an SD card) :

LXDREAM:

Some example code (not finished; I still have to hide more of its internal workings):

this is pretty cool!
the z-near clipping and lighting are not implemented yet? or did i miss it? i only took a quick look so far, i really need to adjust to your coding style first. wouldn't it be better to put the assembly stuff in separate *.s or *.S files instead of inline asm?

I finally added my 2D functions; writing the same code four times in slightly different ways is a bit painful.
This results in:

It took me a lot of time because I wanted to handle 8bpp / 4bpp on each platform.
The Dreamcast is quite limited on this point: it only has 1024 color palette entries (let's say that compression is not really great at this level).
On PC, 8bpp is converted to RGBA (but if you use effects on the palette, they will not work on PC ah ah xD).
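A minimal sketch of what converting 8bpp to RGBA at load time can look like, assuming a 256-entry palette of packed 32-bit colors. The function name `expand_8bpp_to_rgba` is hypothetical, not the library's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Replace every 8-bit palette index by the 32-bit color it refers to. */
void expand_8bpp_to_rgba(const uint8_t *indices, size_t pixel_count,
                         const uint32_t palette[256], uint32_t *rgba_out)
{
    for (size_t i = 0; i < pixel_count; i++)
        rgba_out[i] = palette[indices[i]];
}
```

Once the image is expanded, the pixels no longer refer to the palette, so changing palette entries afterwards has no effect on the picture. That is why palette effects are lost with this approach.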

I added backface culling on PS1, PC and Dreamcast.
On Dreamcast, there is no cross product in hardware, I made 2 inner products + subtraction :p
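For readers unfamiliar with the test, here is a generic sketch of screen-space backface culling. Only the z component of the cross product of two projected edges is needed, which is two products and one subtraction. This is written in plain C under the assumption that counter-clockwise winding marks a front face; it is not the author's SH4 code.

```c
typedef struct { float x, y; } Vec2;

/* Returns nonzero when the projected triangle (a, b, c) is back-facing,
   assuming counter-clockwise winding marks a front face and y points up. */
int is_backface(Vec2 a, Vec2 b, Vec2 c)
{
    float ux = b.x - a.x, uy = b.y - a.y;   /* edge a -> b */
    float vx = c.x - a.x, vy = c.y - a.y;   /* edge a -> c */
    float area2 = ux * vy - uy * vx;        /* z of the cross product */
    return area2 <= 0.0f;                   /* clockwise or degenerate */
}
```

With the opposite winding convention or a y-down screen, the sign of the comparison flips.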

I hope to finish the camera handling soon.
Viewing frustum culling will be easy to do, since I already have my 3D box.
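A common way to cull with a 3D box is to test it against the six frustum planes. The sketch below assumes an axis-aligned box and planes whose normals point into the frustum; all type and function names are hypothetical and not the library's.

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 min, max; } Box3;
typedef struct { Vec3 n; float d; } Plane;   /* inside when n.p + d >= 0 */

/* Nonzero when the whole box lies on the negative side of the plane. */
int box_outside_plane(const Box3 *b, const Plane *p)
{
    /* Pick the box corner that lies furthest along the plane normal. */
    Vec3 v;
    v.x = p->n.x >= 0.0f ? b->max.x : b->min.x;
    v.y = p->n.y >= 0.0f ? b->max.y : b->min.y;
    v.z = p->n.z >= 0.0f ? b->max.z : b->min.z;
    return p->n.x * v.x + p->n.y * v.y + p->n.z * v.z + p->d < 0.0f;
}

/* Conservative test: returns 0 only if the box is fully outside one plane. */
int box_in_frustum(const Box3 *b, const Plane planes[6])
{
    for (int i = 0; i < 6; i++)
        if (box_outside_plane(b, &planes[i]))
            return 0;
    return 1;
}
```

The test can keep a few boxes that are actually outside near the frustum corners, which is usually acceptable because it never rejects a visible box.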

I have not updated GitHub yet. I think I will compile the library for each platform first and update the source code later.
So I think I am close to a stable version; once that is the case, only skeletal animation will remain ^^

First of all, congratulations on your project and great work. It is immense and at the same time super interesting and useful.

I do not want to bother you too much; I just have some questions, first about this:
"On Dreamcast, there is no cross product in hardware, I made 2 inner products + subtraction: p"

1) That one drove me crazy XD, and also made me very curious, so I did a little research. And yes, it's incredible but true! Even the PS1's GTE has a function / macro for "Outter_product", which costs 6 cycles. It is true that the lighting operations ("Inner_product") cost much more. But my question is: are these kinds of operations really useful in 3D graphics? Their main use is to obtain the vector orthogonal to two vectors, for example to calculate the normal vector of a face. But don't those vectors already come with the 3D information of the geometry?

2) If the SH4 takes 4/7 cycles for a "dot product", could this alternative operation cost two or three times as much as on the PS1's GTE?

On the other hand, I would like to suggest adding the Sega Saturn to your library. I see that you like 2D games, and the SS was "the 2D machine of the 32-bit era". The homebrew scene is also very alive now and emulation has improved a lot. I see you are comfortable with the DC and SH asm, so the SS will feel familiar. It is true that its VDPs are very unusual, but a mind eager for challenges will enjoy it!

1) So, yes, the PS1 has an instruction about it, it serves for the "Backface culling"

2) I didn't understand the comparison (But it seems to me that the PS1 has a dot product).

Otherwise, Saturn assembly does not worry me, and the SH2 is well known; it is better than the SPC700 or the VU, for which there is not much documentation.

For optimization, I think the best is to redo SGL, the code does not seem to be optimizable without knowing the context.
But the function seems to me to copy data more than it computes.
Moreover, in my lib, the reason I handle loading the 3D model myself is also to avoid any conversion during the game.

Especially if SGL = Saturn OpenGL, there is a problem.
That is why I do not implement OpenGL on Dreamcast or PS2; the only console I know that can have OpenGL without performance loss is the GameCube / Wii.

"1) So, yes, the PS1 has an instruction about it, it serves for the 'Backface culling'"

It's true, I discovered it just before seeing your answer. This instruction costs the GTE 8 cycles. According to my calculations, on the SS it would be 15 cycles. I suppose that on the DC, since both are SH processors, it will be similar but in fewer cycles. It's strange that the DC does not have an operation for it. Update: an outer product could cost 3 cycles on the DC.

"2) I didn't understand the comparison (But it seems to me that the PS1 has a dot product)."

Yes, it does, as on the SS with the SH2 or the SCU-DSP. It is not an instruction that computes a dot product, but a MAC (mult + add). In reality the GTE takes approx. 2 cycles per MAC, although the total calculation of a lighting function takes between 11 and 44 cycles. The functions for lighting:
NCDS
NCDT
NCCS
NCCT
CDP
DC
NCS
NCT

The SS's SH2 does a dot product in 6 cycles with 2 MACs. It is similar on the DC, where the SH4 does it with one instruction. What is not clear is the exact equivalence with the GTE lighting functions. That is where my question was going.
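To make the MAC idea concrete, here is a plain C model of a 3-component dot product built from multiply-accumulate steps, assuming 16-bit integer components and a 32-bit accumulator. It only shows the arithmetic and says nothing about cycle counts on the GTE, SH2 or SH4; the name `dot3_mac` is hypothetical.

```c
#include <stdint.h>

/* Dot product of two 3-component vectors as three multiply-accumulates. */
int32_t dot3_mac(const int16_t a[3], const int16_t b[3])
{
    int32_t acc = 0;                       /* the accumulator register */
    for (int i = 0; i < 3; i++)
        acc += (int32_t)a[i] * b[i];       /* one MAC: multiply, then add */
    return acc;
}
```

A dedicated inner product instruction performs the same sum of products in a single operation instead of a sequence of MACs.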

"For optimization, I think the best is to redo SGL, the code does not seem to be optimizable without knowing the context."

Yes, both the programmer of http://www.jo-engine.org/ and the one behind Sonic Z-treme think the same. I think using SBL as a base would be a mistake. The pity is that there have been several attempts by great programmers of the scene, but none of them was finished. A shame.

"Especially if SGL = Saturn OpenGL, there is a problem.
For this I do not create OpenGL on Dreamcast or PS2, the only console I know that can have OpenGL without performance loss is the GameCube / Wii."

A correction: SGL is not OpenGL; it is simply what the Sega AM team named its libraries, which were mainly for 3D, unlike SBL, which was more general. It is true that OpenGL without a well-optimized driver generates very annoying bottlenecks.

Thanks for answering so quickly. Anyway, if you feel up to it, on the jo-engine forum and on segaxtreme you can find a few active SS users. Some programmers there even need help with SH assembler, as I said.

Regards,

David Gámiz Jiménez

Last edited by corvusdeux on Fri Dec 14, 2018 7:48 am, edited 3 times in total.

"It's true, I discovered it just before seeing your answer. This instruction costs the GTE 6 cycles. According to my calculations, on the SS it would be 9 cycles. I suppose that on the DC the SH will be similar, but in fewer cycles. It's strange that it does not have an operation for it."

SH or MIPS are general processors, the advantage of MIPS is that we could add a coprocessor.
So on PS1, the coprocessor is the GTE and on PS2 the VU0

"Yes, it does, as on the SS with the SH2 or the SCU-DSP. It is not an instruction that computes a dot product, but a MAC (mult + add). But the GTE makes 3 MACs in one cycle, although the total calculation of the function takes between 11 and 44 cycles. The functions for illumination:"

"The SS does a dot product in 4 cycles, but with four MACs. It is similar on the DC, where the SH4 does it with one instruction. What is not clear is the exact equivalence with the GTE lighting functions. That is where my question was going."

It is difficult to compare SH2 and SH4, SH4 being superscalar, so theoretically everything can be in '1 cycle' and execute 2 instruction at the same time

"I did not understand you well. The SPC700 is the SNES sound chip, and the VU? Is there not much information about these, or about the SH?"

I meant that SH2 will not be a problem for me and it looks a lot like M68000.

I'm going to look into Saturn dev, even if the SDKs are not very good: with SaturnSDK I cannot compile GCC (it takes all my RAM lol), and libyaul gives too many compilation errors for the examples (and at the end it tells me that 'elf32-sh' is missing).

"1) That one drove me crazy XD, and also made me very curious, so I did a little research. And yes, it's incredible but true! Even the PS1's GTE has a function / macro for 'Outter_product', which costs 6 cycles. It is true that the lighting operations ('Inner_product') cost much more. But my question is: are these kinds of operations really useful in 3D graphics? Their main use is to obtain the vector orthogonal to two vectors, for example to calculate the normal vector of a face. But don't those vectors already come with the 3D information of the geometry?"

outer product and cross product are not the same. or i didn't completely understand what you meant here...

these vectors normally do not come with the 3D information of geometries. yes, you can include them, but that is normally not the case. i would include them if i wanted flat shaded lighting. normally you use vertex normals, not face normals, to do gouraud/phong shading, and it is very expensive to calculate (correct) vertex normals on the fly. calculating face normals is *okay* in most cases (= much much simpler than vertex normals).
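For reference, a minimal sketch of computing a face normal on the fly with a cross product of two triangle edges. The result is left unnormalized (its length is twice the triangle area); normalizing it would be a separate step. The names are hypothetical and counter-clockwise winding is assumed.

```c
typedef struct { float x, y, z; } Vec3;

static Vec3 vec3_sub(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

/* Each component is a difference of two products. */
static Vec3 vec3_cross(Vec3 u, Vec3 v)
{
    Vec3 r = {
        u.y * v.z - u.z * v.y,
        u.z * v.x - u.x * v.z,
        u.x * v.y - u.y * v.x
    };
    return r;
}

/* Unnormalized normal of triangle (a, b, c), counter-clockwise winding. */
Vec3 face_normal(Vec3 a, Vec3 b, Vec3 c)
{
    return vec3_cross(vec3_sub(b, a), vec3_sub(c, a));
}
```

A correct vertex normal, by contrast, needs the normals of every face sharing that vertex, which is why it is much more work to compute at run time.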

matrix rotations use cross products a lot. or you do quaternions and convert to/from matrices every time. physics code... i really miss a hardware cross-product instruction on the dreamcast's sh4... it should have been included

as interesting as the hardware is, i wouldn't want to do 3D on it
this whole quad-drawing (= more like sprite stretching) is kinda offputting. it's an awesome 2D machine, and (still) the best looking games (even by today's standards) are in 2D or 2D with some 3D effects.
PlayStation 1 and Saturn 3D is just plain ugly. It was ugly even back then. The Nintendo 64's is much better, but still ugly. Its textures don't wobble all over the screen, but they are soo blurry. I skipped this whole console generation more or less and played on PC (the PC years 96-2002 were awesome) and came back to consoles with the DC and later the GameCube. I didn't find the DC that awesome technically (coming from PC, it wasn't), but some games got me interested in it.

however...
i find most of the stuff in GameHut's Saturn videos really interesting. The stuff about DSP, object fading/transparency and reflection effects is very neat. I appreciate the tech and skills behind it. Sonic R is still ugly though

NOTE: Sorry, I had to correct my previous post; in the rush I did not do the calculations properly.
Thanks for your quick and complete response @Kannagi:
"SH or MIPS are general processors, the advantage of MIPS is that we could add a coprocessor.
So on PS1, the coprocessor is the GTE and on PS2 the VU0"
I did not know the SH was limited when it comes to having a coprocessor inside. Anyway, Sony's solution seems great: parallelism and the same clock frequency. It is true that you have to be careful when making them work together, but Sony's SDK makes everything much easier.

"It is difficult to compare SH2 and SH4, SH4 being superscalar, so theoretically everything can be in '1 cycle' and execute 2 instruction at the same time"
I found this; it is quite useful for seeing which instructions they share and which they do not, and what they cost: http://www.shared-ptr.com/sh_insns.html

"I meant that SH2 will not be a problem for me and it looks like M68000."
Good!

Thanks @b0b, what you are saying is very interesting.
It is possible that the cross product or outer product is not implemented because you can do the same, or something very similar, with dot product, matrix-vector multiplication, add, mult, sub and div.
On the DC you can do an outer product in 3 cycles, which is not bad at all. On the SS it will cost 25 cycles if a single SH2 does it, half of that (12 cycles) if you use both SH2s, and about a third (approx. 8 cycles) if you use the SCU.

EDIT: It is true that unit vertex normals are more useful data than face normals. Well, then store this data in the geometry file; it is not a lot of information, 12 bytes per vertex, and these machines do not handle high-polycount scenes.
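The 12 bytes per vertex figure corresponds to three 32-bit floats. A small sketch of the storage cost, assuming 4-byte floats with no padding; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* One unit normal stored as three 32-bit floats: 3 * 4 = 12 bytes. */
typedef struct { float x, y, z; } VertexNormal;

/* Extra storage needed to keep one normal per vertex in the model file. */
size_t normal_bytes(size_t vertex_count)
{
    return vertex_count * sizeof(VertexNormal);
}
```

For example, a model with 1000 vertices needs 12000 additional bytes for its normals under these assumptions.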

On the other hand, regarding the 3D of the 32-bit machines versus the others, as they say in my country: "Like tastes, there are colors"
For me personally, it is a challenge, and a way to do justice to this wonderful 2D / 3D machine by pushing it to its limit. The PS1 is for me the example to follow, which is entirely possible, seeing examples such as Sonic R back in its day.