As you planned it, your idea for doing H-T (half-transparency) without overdraw is another stroke of genius on your part.

After thinking about it a little, I agree with you that this number of draw calls can be a problem. In a way it reminds me of Doom and Defcon 5; I do not know whether any other SS games draw by "lines". Both run quite badly on SS, similar to or worse than on 3DO, while the PSX versions are much better than either. It may be that draw calls are easier to optimize on PSX because the system is simpler than the SS, or because of better-optimized tools, although I doubt those were available that year even on PSX. Finally, it is true that both are ports with a lot of room for improvement on SS, and that they came from earlier systems, which does not help.

All in all, I think it is a very good solution for elements such as crystals, square shadows... in short, simple shapes made of a quad or two triangles. The problem of mixing it with the VDP2 would still remain to be solved. But as a sub-function within a larger function that simplifies this "subject" on SS, it would be ideal. Really, it is what should have been done back in the day.

From what I see in the image you shared, and from the theory in the tutorial, it is a technique aimed squarely at closed scenes, with rooms, and with the triangle as the basic primitive.

I understand that it may be difficult for you to implement the BSP tree both for the SS primitive (the quad) and for open levels like these Sonic ones.

I can think of a few suggestions, to see what you think:

1) Try to convert, or force, each triangle into an SS quad. That makes it easier to replace it with a predefined texture, which works better, and it avoids the overdraw that comes from using forced triangles on SS.

2) Try to have the geometry designed so that it produces squarer divisions. For example, the X-shaped plant, which sits at 45º to the ground quad, produces divisions that follow that angle. If the plant were at 90 degrees, the divisions would be squarer.

3) As for the textures that look bad, I think it would be complicated to automate this, that is, to cut the textures automatically so that new, smaller textures are created that fit each division better. Of course, all of this uses more VDP1 VRAM.

4) I do not know whether it would be possible to use RockinB's flat UV mapping code, which I found to be quite fast in my tests, on these triangular primitives. It would avoid generating new textures, although you would still have the redraw problem caused by deformation.

5) Use a triangle with a mask as an alternative to forcing triangles. But I do not know how you would translate the vertex coordinates to this "triangle" without loading the transformation engine or the BSP engine itself.

6) Finally, I do not know whether you would use LOD by distance. Perhaps these zones divided into "uglier" triangles are not seen with the closest LOD.

Great work, and keep it up!

I continue with my research and analysis work for my entries. On the other hand I hope to finish my 3D engine proposal soon and share it with everyone.

Quake maps do work with an OKish draw distance, but have a very high polygon and vertex count. Still, 30 fps with Sonic and some 920 drawn polygons on screen isn't too bad considering all the overdraw.

I made some little progress on the BSP tree, but there are many problems to solve as nothing found online mentions quads or small polygons, so it might take a while to make it all work.

Again great job XL2!

It is getting closer and closer to Sonic R's nearly 1300 at 30 FPS. You are definitely pushing closer and closer to the limits of the DSP (transformation and lighting) and the two SH2s, along with the management of VDP1 calls, VDP1 VRAM and the sound system, not to mention physics, collisions and AI. That is, all the traffic over the B-Bus passing through the SCU. Amazing!

I am increasingly aware that this is one of the keys to REAL optimization on the SS. In fact, by comparison with the PSX, it is also part of the key to that system's good performance. And having a profiler made it really easy to expose all the bottlenecks in an engine or program.

I've been on vacation these days, and I have not been able to see your SAGE build until today.

Great job! I would ignore most of the reactions, especially the unfounded ones. You have done what no one else did. THANK YOU!

Reviewing your work in particular:

1. A very smart implementation of Sonic's shadow using VDP1 transparency. You avoid the VDP1 redraw defect, and you take care that, when the shadow covers another primitive, that primitive uses a color LUT or RGB from VDP1 color VRAM so that the transparency blends correctly.

2. Very nice, well-implemented particle effects in this latest build. Later on, if you have performance to spare and find a new and original way, you could use "some" type of SS transparency on them. Then they would be perfect.

3. I see that the metal effect (Sonic and enemies) changes according to the background. It's a totally original effect!

4. I see that you reach peaks of 750 elements on screen while keeping 30 FPS! Absolutely brutal!

5. I see that you still have a lot of main RAM, VDP1 VRAM and VDP2 Pattern RAM available. And I see that you have used almost 90% of the VDP2 Color RAM.

6. I also see signs of the sound DSP being used! 100% of memory used! :D

7. I loved the controls. The problem, from my point of view, is that the game needs some kind of tutorial, because it is not the typical Sonic that a typical user would expect. The fall and death areas are numerous, and Sonic's speed can be deadly XD. In any case, from my point of view the controls respond superbly. As a crazy idea, how would you feel about the camera always staying behind Sonic?

Well, one solution that I'm currently experimenting with:
- Use the framebuffer from last frame.
- If for each pixel the bit 15 is 0, it's a palette code. Just draw a line using a CRAM palette with a transparency ratio.
- If the MSB is 1, use VDP1 half-transparency.

Since you set the z distance, it won't create many sorting issues and would keep artifacts low.

The main issue is reading from the framebuffer, it's just slow.

Like I mentioned, you can also use Gouraud shading, so these fake polygons could still look nice.

I'm not sure it could be done with textures because the width must be a multiple of 8 (I hate this limitation).
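The per-pixel decision described above can be sketched in a few lines. This is only an illustration of the logic, not the engine's code: the function names, the method labels and the idea of grouping pixels into horizontal runs (one "line" per run) are assumptions made for the example, and the real thing would read the VDP1 framebuffer instead of a Python list.

```python
MSB_MASK = 0x8000  # bit 15 of a 16-bit framebuffer pixel


def classify_pixel(pixel):
    """Return 'rgb' if bit 15 is set, otherwise 'palette'."""
    return "rgb" if pixel & MSB_MASK else "palette"


def plan_scanline(prev_frame_row):
    """Group one row of last frame's pixels into runs.

    Each run is (start_x, length, method), where method is a
    hypothetical label for the transparency path to use.
    """
    runs = []
    for x, pixel in enumerate(prev_frame_row):
        if classify_pixel(pixel) == "rgb":
            method = "vdp1_half_transparency"
        else:
            method = "cram_ratio"
        if runs and runs[-1][2] == method:
            start, length, _ = runs[-1]
            runs[-1] = (start, length + 1, method)  # extend current run
        else:
            runs.append((x, 1, method))  # start a new run
    return runs


row = [0x0012, 0x0013, 0x801F, 0xFFFF, 0x0001]
print(plan_scanline(row))
```

Each run would then become one line draw with the matching transparency method, which is where the large number of draw calls comes from.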

Forgive me, XL2, I meant to reply to this answer of yours earlier.

I have a similar idea to solve the problem. I thought you could create a function that knows whether a part of an element (normal, scaled or distorted sprite, line or polyline) is on top of another VDP1 element or not. That is, the function knows whether a vertex of an element is above a VDP1 element or above the VDP2. If it is above VDP1, use VDP1 CC (color calculation) half-transparency; when it is above the VDP2, use palette transparency shared with VDP2 Color RAM.

There would still be the problem that certain elements, when they sit on the border, would not blend correctly.

In your idea something similar would happen, but only at the line level.

Maybe my idea is "faster" because it would be done in the transformation stage. By comparing shared coordinates, it would not need to read the framebuffer as your idea does.
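To make the coordinate comparison concrete, here is a minimal sketch. It assumes screen-space vertices are already available from the transformation stage and uses a plain bounding-box overlap test as a stand-in for "a vertex is above an element"; the function names and the two method labels are hypothetical.

```python
def bounding_box(vertices):
    """Axis-aligned screen-space box (x0, y0, x1, y1) of a quad."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """True if two (x0, y0, x1, y1) boxes share at least one point."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def pick_transparency(transparent_quad, vdp1_quads_behind):
    """Choose a blend path from coordinates only, no framebuffer reads."""
    box = bounding_box(transparent_quad)
    for quad in vdp1_quads_behind:
        if boxes_overlap(box, bounding_box(quad)):
            return "vdp1_half_transparency"
    return "vdp2_palette_ratio"


glass = [(10, 10), (20, 10), (20, 20), (10, 20)]
wall = [(15, 15), (40, 15), (40, 40), (15, 40)]
print(pick_transparency(glass, [wall]))
```

Because one method is chosen for the whole element, a quad that only partly covers a VDP1 element still blends incorrectly on the uncovered part, which is the border problem mentioned above.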

Also, it may be useful to use both ideas in combination.

Who knows, maybe even together with the Burning Rangers trick.

Depending on the situation, one idea or the other may give the better quality/performance trade-off.

Yeah, but the solution is to subdivide the map further. The PS1 had that feature in the sdk from day one. The Saturn, as usual, doesn't and can't really do it. I'm trying to find a way to clip textures in vram to subdivide it in 4, but I'm not sure it will work.

These days I have been looking into what you told me, that the SCU-DSP would not be useful for real-time tessellation. Looking at the PSX SDK, it seems to use the GTE for it, but I'm not sure it is used for everything. I mean, for tessellation I think the GTE is used to recalculate the UV coordinates of the new polygon or polygons, but the dividing itself is done on the CPU. In a technical document a developer asks whether it is better to divide on the GTE or on the CPU, and the answer is the CPU. We can apply that to the SS, because the SCU-DSP and the GTE are very similar: mainly additions and multiplications for dot products. And the SH2 has a division instruction. Unless divisions can be done on the GTE and the SCU-DSP as multiplications by fractional numbers?
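On the closing question, the arithmetic itself is straightforward: dividing by d is the same as multiplying by 1/d, and 1/d can be stored as a fixed-point number. The sketch below only shows that arithmetic. The 16.16 format, the table of precomputed reciprocals and all names are assumptions for the example; whether this is practical on the SCU-DSP or GTE is exactly the open question.

```python
FRAC_BITS = 16          # assumed 16.16 fixed-point format
ONE = 1 << FRAC_BITS


def to_fixed(value):
    """Convert a float to 16.16 fixed point."""
    return int(round(value * ONE))


def fixed_mul(a, b):
    """Multiply two 16.16 numbers and renormalize the result."""
    return (a * b) >> FRAC_BITS


def reciprocal_table(max_divisor):
    """table[d] holds 1/d in 16.16; index 0 is unused."""
    return [0] + [ONE // d for d in range(1, max_divisor + 1)]


def divide_by_table(numerator_fixed, divisor, table):
    """Replace numerator / divisor with one multiply by 1/divisor."""
    return fixed_mul(numerator_fixed, table[divisor])


table = reciprocal_table(8)
quotient = divide_by_table(to_fixed(6.0), 4, table)
print(quotient / ONE)
```

The trade-off is precision: reciprocals that are not exact in binary (1/3, for example) give a result that is off by a few least-significant bits, and the table costs memory, which the DSP has little of.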

Well, it's a very creative solution on your part. I think that if this routine can make content creation easier, it could be put to good use for specific things. For large crystals, for example.

As for making transparency work on both VDP1 and VDP2, I still believe you should look at how Burning Rangers does it. I think it's the best solution that uses the SS pipeline.

The most I have been able to find out is that BR alternates two drawing "spaces" in the VDP1. In the first, only the opaque elements are drawn, with their texture or color. In the second, the transparent elements are drawn, plus, in opaque black, the parts of the elements that cover the transparent ones.

The first space is drawn with "full" system clipping and the second at half. For example, for a final output resolution of 320x240 non-interlaced, with VDP1 system clipping at 319x239, the second space will be 160x120. It is then sent to the VDP2 as NBG1 in 16-bit color, using color calculation of that layer over the VDP1. The elements drawn by the VDP1 will then look transparent over the VDP1 and VDP2 at the same time.

All for "one" frame. BR reaches stable peaks of 20 FPS.

Problems that still exist:

1) The VDP1 elements do not blend with each other. Could we use VDP1 H-T, solving the redraw problems by using only elements that are not deformed vertically, such as scaled sprites, normal sprites or distorted sprites used as billboards? Or new tricks like yours, XL2.

2) Could we manage to render the second space at full resolution?

3) In Burning Rangers the final transparency layer is on top of everything, including the UI. Could we somehow avoid this problem? For example, by creating a mask over those parts, or by using a VDP2 layer for the UI.

Objective: make a complete sun lens flare effect that works over VDP1 and VDP2, built from 3D VDP1 elements.

What's better, dithering, or over-transparency? In my opinion, I would choose something that is real vdp1 transparency, and not the same screen door method that's been used since the genesis. The designers never had the intention of the output being precise, but with s-video, I would rather stick to real transparency, simply because it gives the Saturn a chance of having transparency. Oh, yeah, and I think the n64 can only do 50% transparency because mario in sm64 uses dithering when using a secret warp, so we shouldn't get too ahead of ourselves.

The N64 has REAL alpha blending; it is a Silicon Graphics patent. The PSX and SS do not have alpha blending; they have hardwired half-transparency. The documentation states all of this clearly. We are not inventing theories or data... there were already enough lies in the past: the story of millions of polygons on screen at 60 FPS. Let's stick to the technical data, clearly documented. Please.

Half-transparency is REAL and totally useful on the SS. The key is to achieve the best way of using it without caveats: no redraw, on 3D primitives, and blending correctly across the whole pipeline (VDP1+VDP2).

An example to start with, as a real approach: Burning Rangers.

In my analysis table we have 65 titles that use VDP1 CC half-transparency. We can research the advantages and disadvantages in each case (geometry, layer, quantity, pixel area, color type and VRAM pool, texture size, color calculation used...) and reach the right conclusions for the objective. In the same way, there is another column for analyzing VDP2 semi-transparency... Feel free to do research and share!

What if we just ask an sh-2 to make a distorted version of the shadow to be displayed as a sprite with affine transformation?

Everything is possible. But are you aware of the programming implications? Time, effort and knowledge. Right now we are at an early stage of SS homebrew. With the public official SDKs or Jo-engine, all the things you propose are a long way off.

To understand the problem of transparency and redraw on the SS, we have to dig deeper into our technical knowledge of the SS graphics pipeline.

Tessellating a quad will not help reduce overdraw or the redrawing of pixels with VDP1 H-T.

In addition, using VDP1 H-T (no Gouraud + H-T) comes with other issues and properties:
- If the quad is very deformed, it will take up to 6 drawing cycles (according to the documentation).
- Ideally, if it does not redraw any pixel, up to 2 cycles (this is speculation on my part, since it is in essence the same as Gouraud).
- Color types are restricted for it to work well within VDP1.
- It will never see the VDP2.
- Simple to program.
- Unlimited blend layers.
- 1 level of transparency: 50% blend.

On the other hand, if we use VDP2 transparency with VDP1 sprites:
- It has no redraw problem.
- It is done in one cycle.
- Color types are restricted for it to work well within VDP2.
- It will never see the VDP1.
- More complicated to program.
- Up to 2 blend layers: 1 real transparency, the other the MSB shadow function with a lot of restrictions. In the end, 1 effective transparency layer.
- Up to 32 levels of transparency.
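A small numeric sketch shows why redraw is the core problem for VDP1 half-transparency. It assumes the 50% blend is a per-channel average of source and destination on 15-bit RGB pixels (bit 15 set, then 5 bits each of blue, green, red) and truncates the result; the exact rounding of the hardware is not something this example claims to reproduce, and the function names are made up.

```python
def unpack555(pixel):
    """Split a 15-bit RGB pixel into (r, g, b), 5 bits each."""
    return pixel & 0x1F, (pixel >> 5) & 0x1F, (pixel >> 10) & 0x1F


def pack555(r, g, b):
    """Build an RGB pixel with the MSB (bit 15) set."""
    return 0x8000 | (b << 10) | (g << 5) | r


def half_blend(src, dst):
    """50% blend: average source and destination per channel."""
    sr, sg, sb = unpack555(src)
    dr, dg, db = unpack555(dst)
    return pack555((sr + dr) >> 1, (sg + dg) >> 1, (sb + db) >> 1)


white = pack555(31, 31, 31)
black = pack555(0, 0, 0)

once = half_blend(white, black)   # pixel written one time
twice = half_blend(white, once)   # same pixel hit again by the same quad

print(unpack555(once), unpack555(twice))
```

A pixel that a distorted quad writes twice ends up blended against its own first write, so it comes out brighter than its neighbours. That is the visible artifact, and it is why the VDP2 path, which blends once per screen pixel, does not have it.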

Unless you want to hack his account and steal his code, he won't share Sonic R's code, Sega owns it.

The scu dsp is super hard because of all the restrictions around it, it has its own assembly language, you need to dma data to its own ram, it has no division unit, you need to either scu dma data somewhere or fetch it with the sh2, it has very little ram and it's running at half the clock rate of the sh2. It's probably faster 99% of the time to just use the sh2.

To corvusd, the quad count isn't that relevant. A draw command takes something like 70 cycles minimum. The rest depends on the texture size, the drawn pixels, the color calculation functions used, etc.

So lots of small polygons won't hurt performances that much, except maybe on the cpu side.

The key is to reduce overdraw. Sonic R used a pvs, so it didn't eliminate overdraw but that's fast enough.

Slavedriver engine games reduced the overdraw to a minimum, but it came at heavy cost for the cpu, with a complex portal system.

In Sonic Z-Treme, I still don't have a pvs so there is lot of overdraw, but thanks to the mipmapping and lod, it's still very fast, drawing something like 1000 quads at 30 fps in some situations. In some scenes, the quad count is low (300-400), but so many quads are merged that it would otherwise be maybe 1200 quads.

But the overdraw is still what's preventing even more polygons on screen, more than the cpu.

So using the scu dsp wouldn't have such an impact right now.
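The point that pixels matter more than the command count can be illustrated with a toy estimate. Only the "something like 70 cycles minimum" per command comes from the post; the linear per-pixel term, the `cycles_per_pixel` parameter and the function name are assumptions, and real VDP1 timing also depends on texture size and color calculation, as noted above.

```python
MIN_CYCLES_PER_COMMAND = 70  # "something like 70 cycles minimum"


def estimate_cycles(drawn_pixels_per_quad, cycles_per_pixel):
    """Rough cost: fixed overhead per command plus a per-pixel term.

    drawn_pixels_per_quad: one drawn-pixel count per draw command.
    cycles_per_pixel: caller-supplied guess, not a documented figure.
    """
    counts = list(drawn_pixels_per_quad)
    overhead = MIN_CYCLES_PER_COMMAND * len(counts)
    return overhead + cycles_per_pixel * sum(counts)


many_small = estimate_cycles([64] * 1000, cycles_per_pixel=1)
few_large = estimate_cycles([1024] * 300, cycles_per_pixel=1)
print(many_small, few_large)
```

Under these assumptions, 1000 small quads cost less than 300 large ones, because the per-pixel term dominates once quads cover many pixels. Overdraw adds to that pixel total without adding anything visible.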

Good! This "70 cycles" value is my nightmare. I cannot understand it, or the rest of the formula from the SoE tutorial. Likewise, if it is possible to write a formula, it is possible to calculate the maximum amount of data the VDP1 can take. I really think that is very difficult, for the SS but equally for the PSX. I am convinced that the only thing that really helped optimize for the PSX GPU was the Performance Analyzer, which extracts and displays a great amount of data, giving a better view of all the data flow in the system with a focus on the GPU draw state.

EDIT: About the SCU-DSP

Quote

SEGA SATURN TECHNICAL BULLETIN #SOA-10
Saturn SCU DSP Demonstration Program

The DSP sample program performs 3D point transformation, i.e. it multiplies a 4x3 homogeneous matrix by an arbitrary list of 3-element vectors (the fourth element of each vector is presumed to be 1). The program attempts to take full advantage of the parallelism built into the DSP, and the transformation matrix, the input points, and the output points are transferred using the SCU’s DMA capability. The sample code performs point transformations roughly a third faster than the equivalent code written in SH2 assembly language, even allowing for the time spent transferring data into and out of the DSP’s memory. It is hoped that this program is general and useful enough to be used in an actual development environment.

Quote

SEGA SATURN TECHNICAL BULLETIN #SOA-8
Saturn SCU DSP Tutorial

1.2 Advantages of Using the DSP
The DSP is a highly specialized processor intended to efficiently calculate sums of products, as when performing matrix and vector calculations such as 3D point transformations or lighting calculations. When performing the sorts of tasks for which it was designed, the DSP can be faster than the SH2, because it can load operands for one calculation, perform a second calculation, and store the results of a third calculation in parallel. It can also perform a 32x32 multiply, yielding a 48-bit result, in a single cycle. The DSP gains an additional advantage when performing fixed-point calculations, since, when it stores its results to its data RAM, it can store either the lower or the upper 32 bits of its 48-bit accumulator, whereas the SH2 must take time to explicitly reformat the results of fixed-point calculations by using the “xtrct” instruction.

1.3 Disadvantages of Using the DSP
The DSP runs at half the clock speed of the SH2, so, while the DSP can multiply in a single cycle, that cycle is twice as long as one of the SH2’s cycles. The DSP doesn’t have much memory, and the memory it does have is not mapped onto the system bus, which means that the DSP must continually take time to copy its data between its own data RAM and the SH2’s work RAM. The DSP is difficult to program. A routine that could be coded in SH2 assembly language in half an hour might take half a day to write, debug, and fully optimize on the DSP.

6. Parallelism
The DSP’s two main functional units (the ALU and the multiplier) can operate in parallel, as can its four buses and its four banks of data RAM. As a result, the DSP can execute up to six instructions in a single cycle, including any or all of the following: one ALU instruction, an instruction to load the RX or P register, the MOV MUL, P instruction, an instruction to load the RY register or the accumulator, either the MOV ALU, A instruction or the CLR A instruction, and a D1-bus instruction. These are the only instructions that can be used in parallel with each other; other instructions require a cycle of their very own.
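For reference, the point transformation that bulletin SOA-10 describes (a 4x3 homogeneous matrix applied to 3-element vectors whose fourth element is presumed to be 1) comes down to three sums of products per point. The sketch below shows that math only; storing the matrix as three rows of four values, with the translation in the last column, is an assumption of this example and not necessarily the layout the sample program uses.

```python
def transform_points(matrix, points):
    """Apply a 3-row by 4-column transform to (x, y, z) points.

    Each output component is row[0]*x + row[1]*y + row[2]*z + row[3],
    i.e. a dot product with the implicit fourth element equal to 1.
    """
    out = []
    for x, y, z in points:
        out.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + row[3]
            for row in matrix
        ))
    return out


# Identity rotation plus a translation of (1, 2, 3).
matrix = [
    (1, 0, 0, 1),
    (0, 1, 0, 2),
    (0, 0, 1, 3),
]
print(transform_points(matrix, [(10, 20, 30), (0, 0, 0)]))
```

Every component is a sum of products, which is the kind of work the tutorial says the DSP was designed for, and each point is independent of the others, which is what makes it a candidate for running asynchronously to the SH2.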

I totally agree with you on all the rest. But what if we offload from the SH2 the calculations that the SCU-DSP can do asynchronously? We could then use the slave SH2 with J. Burton's code to reduce the overdraw. How? Part of my idea is the following algorithm sketch, according to these criteria:

Requirements:
- Use J. Burton's software rasterizer to rasterize part of the "distorted sprites", converting them into normal sprites with the exact pixels, like the PSX. In my estimate, up to 200 quads or 400 triangles, depending on the R of the initial screen.
- HSS always enabled for distorted or scaled sprites of 32x32 or larger.
- Do not use textures larger than 64x64, or keep them to a minimum on screen.
- Use Pre-clipping Enabled for quads outside the System Clipping Coordinates.
- Use Pre-clipping Disabled for quads that are always inside the System Clipping Coordinates, wholly or for the most part.
- Always use User Clipping Coordinates.
- Use User Local Clipping in zones such as interiors, houses, tunnels, caves, etc.
- Use End Code in textures with masked or transparent areas.
- Use Transparent Pixel only for distorted sprites with texture.
- Using End Code Draw and Transparent Pixel Draw is the most optimal, but with fewer colors.

Now, let's follow a typical view-frustum clipping case:

A) Near Clipping Zone:

1) If the quad forms an angle between 90 and 45deg:
 a) Draw with VDP1 without mipmap.
 b) Gouraud is drawn.

2) If the quad forms an angle less than 45deg with the camera view:
 a) Draw with VDP1 with level 2 of mipmap.
 b) Gouraud is drawn.

3) If the quad forms an angle less than 30deg with the camera view:
 a) Change the textured quad to a flat polygon with a precalculated base color (representative of the entire texture).
 b) Gouraud is not drawn.

2) If the quad forms an angle less than 45deg with the camera view:
 a) Change the textured quad to a flat polygon with a precalculated base color (representative of the entire texture).
 b) Gouraud is not drawn. Flat lighting (palette CLUT luminance levels).

3) If the quad forms an angle less than 30deg with the camera view:
 a) Raster on the slave SH2 with a precalculated flat color (representative of the entire texture).
 b) Gouraud is not drawn. No lighting.

C) Far Clipping Zone:

1) If the quad forms an angle between 90 and 45deg:
 a) Draw with VDP1, changed to a flat polygon with a precalculated base color (representative of the entire texture).
 b) Gouraud is not drawn. Flat lighting (palette CLUT luminance levels).

2) If the quad forms an angle less than 45deg with the camera view:
 a) Raster on the slave SH2 with a precalculated flat color (representative of the entire texture).
 b) Gouraud is not drawn. No lighting.

3) If the quad forms an angle less than 30deg with the camera view:
 a) Raster on the slave SH2 with a precalculated flat color (representative of the entire texture).
 b) Gouraud is not drawn. No lighting.
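The zone and angle rules above amount to a lookup table, sketched here under several assumptions. The angle bands are read as 45 to 90, 30 to 45 and below 30 degrees. The second pair of items numbered 2) and 3), between zones A and C, is assumed to belong to a middle zone whose heading and first item do not appear in the post, so that entry is left out of the table. All names and labels are hypothetical.

```python
def angle_band(angle_deg):
    """Map the quad-to-view angle (0..90 degrees) to one of three bands."""
    if angle_deg >= 45:
        return "45-90"
    if angle_deg >= 30:
        return "30-45"
    return "<30"


# (zone, band) -> (renderer, surface, lighting)
RENDER_PLAN = {
    ("near", "45-90"): ("vdp1", "texture, no mipmap", "gouraud"),
    ("near", "30-45"): ("vdp1", "texture, mipmap level 2", "gouraud"),
    ("near", "<30"): ("vdp1", "flat base color", "no gouraud"),
    # ("middle", "45-90") is not given in the post, so it is omitted.
    ("middle", "30-45"): ("vdp1", "flat base color", "flat (CLUT luminance)"),
    ("middle", "<30"): ("sh2_slave_raster", "flat base color", "none"),
    ("far", "45-90"): ("vdp1", "flat base color", "flat (CLUT luminance)"),
    ("far", "30-45"): ("sh2_slave_raster", "flat base color", "none"),
    ("far", "<30"): ("sh2_slave_raster", "flat base color", "none"),
}


def choose_render_mode(zone, angle_deg):
    """Return (renderer, surface, lighting), or None if unspecified."""
    return RENDER_PLAN.get((zone, angle_band(angle_deg)))


print(choose_render_mode("near", 60))
print(choose_render_mode("far", 10))
```

Keeping the rules in a table rather than nested conditions makes it cheap to retune the thresholds per scene, and the selection itself is a per-quad lookup that could run alongside the transformation step.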

I am convinced that this way we can push overdraw optimization to the limit.

Well... about the copyright: it is possible that it does not belong to TT alone, but also to SEGA. In any case, J. Burton is not the whole of TT... and this code is very valuable. If you want, try to contact him and see whether he would share it with the community. Let's go!