162k polygons on screen at once at 28 fps (1152 x 864 pixels) with Xith3D and JOGL in a real-world example... Well, that is impressive.

Um, well, yes and no. It may be impressive, but it has nothing to do with Xith or JOGL.

Counting polygons rendered is, in and of itself, a pretty much meaningless measure in today's world, as all rendering is done completely on the graphics card on any reasonably modern system. Even transform and lighting occurs on the card. All that is going to be the same regardless of HOW the data gets to the card.

The real issues are in the manipulation of those polygons before they are rendered: the complexity and number of the state changes, how well polygons are culled, and how well they are ordered to minimize those state changes.

Forget about screen fill rate; that's early-'90s nonsense. By the time polygons reach screen space, it's out of the software's hands. More meaningful measures are things like the total number of polygons in the world model being rendered and the percentage culled.

Got a question about Java and game programming? Just new to the Java Game Development Community? Try my FAQ. It's likely you'll learn something!

I understand your point that the engine's preparation of what it will give to the graphics card is what matters most (see id's smart engines). Even once the engine has culled away all the polygons that don't need to be drawn, there are still so many ways to feed the data to OpenGL with all those extensions. If the engine doesn't use a good one, it's still useless.

It's the first time I've seen 162k polygons at 28 fps and a high-res screen with Java - in a real-world example where the whole universe has many, many more objects and polygons. Well, since Xith is the rendering engine, it all has to do with Xith, doesn't it? Since Xith uses OpenGL, it has to do with JOGL, too. We could try to use a Mesa-based Xith. ;-)

Xith does impress me because it's a nice high-level 3D engine and it's fast (according to Yuri and David it will become even faster in the future): David's real-world examples prove it.

However, I won't quote a "SimpleCubeTest" again, because that is meaningless indeed.

The single largest state change is a texture switch. This generally involves a hardware pipeline flush (you get that with most state changes) and a texture cache flush (nasty).

Screen fill rate can be an issue, but only if you are covering the whole screen 10 times with alpha-blended polys. Most cards won't like doing that at all.

Triangle throughput is ONLY a factor if you are having to transform in software, and then you will hit a bus-bandwidth limitation of around 1 million vertices sent to the card per second. Aggressive stripping (using degenerate tris to connect strips so a single object is one strip) is well advised, preferably using something like the NVidia stripper, which has optimisations based on the size of the TnL vertex cache. However, this vastly inflates the tri throughput, as 25% of the tris are now zero-size (and should be rejected by the hardware).
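The strip-chaining trick mentioned above can be sketched in plain Java. This is a minimal illustration of connecting multiple triangle strips with degenerate (repeated-index) triangles; the class and method names are my own, not from any stripper library, and the parity fix mirrors the common convention that a strip's winding flips on odd index positions.

```java
import java.util.ArrayList;
import java.util.List;

// Chains several triangle strips into one index list by inserting
// degenerate triangles (repeated indices), as the post describes.
// The optional extra repeat keeps winding parity correct when the
// accumulated index count would leave the next strip starting at an
// odd position. Illustrative sketch only.
public class StripJoiner {

    public static List<Integer> join(List<int[]> strips) {
        List<Integer> out = new ArrayList<>();
        for (int[] strip : strips) {
            if (!out.isEmpty()) {
                out.add(out.get(out.size() - 1)); // repeat last index of previous strip
                out.add(strip[0]);                // repeat first index of next strip
                if (out.size() % 2 == 1) {
                    out.add(strip[0]);            // parity fix: preserve winding
                }
            }
            for (int v : strip) out.add(v);
        }
        return out;
    }

    // Counts zero-area triangles (any repeated index within a 3-window);
    // these are the tris the hardware should reject for free.
    public static int degenerateCount(List<Integer> strip) {
        int n = 0;
        for (int i = 0; i + 2 < strip.size(); i++) {
            int a = strip.get(i), b = strip.get(i + 1), c = strip.get(i + 2);
            if (a == b || b == c || a == c) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Integer> joined = join(List.of(new int[]{0, 1, 2, 3},
                                            new int[]{4, 5, 6, 7}));
        System.out.println(joined);
        System.out.println(degenerateCount(joined) + " degenerate tris added");
    }
}
```

Joining two 4-index strips here yields one 10-index strip with 4 degenerate triangles out of 8, which is exactly the inflation of the reported triangle throughput that the post warns about.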

For best performance:

a) Strip your models.
b) Batch by texture (a decent 3D engine should do this for you).
c) Avoid sorted transparencies if you can - they have to be drawn in depth order, so they cannot batch by texture, and the fill rate is halved (on a good day).
d) Cull whole objects to the view frustum in software, but don't bother culling tris in software - the hardware can do this much faster than you.
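Point (b) amounts to ordering draw calls so the renderer rebinds textures as rarely as possible. Here is a minimal sketch, assuming a hypothetical `DrawCall` with an integer texture handle; no real engine or GL API is involved:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrates batching by texture: sorting draw calls by texture id
// collapses interleaved texture switches into one bind per texture.
// "textureId" and "meshId" are invented stand-ins for whatever handles
// an engine actually uses.
public class TextureBatcher {

    public record DrawCall(int textureId, int meshId) {}

    // Number of texture binds needed if the calls are issued in this order.
    public static int bindCount(List<DrawCall> calls) {
        int binds = 0, bound = -1;
        for (DrawCall c : calls) {
            if (c.textureId() != bound) {
                binds++;
                bound = c.textureId();
            }
        }
        return binds;
    }

    public static List<DrawCall> sortByTexture(List<DrawCall> calls) {
        List<DrawCall> sorted = new ArrayList<>(calls);
        sorted.sort(Comparator.comparingInt(DrawCall::textureId));
        return sorted;
    }

    public static void main(String[] args) {
        List<DrawCall> calls = List.of(
                new DrawCall(1, 0), new DrawCall(2, 1), new DrawCall(1, 2),
                new DrawCall(3, 3), new DrawCall(2, 4), new DrawCall(1, 5));
        System.out.println("unsorted binds: " + bindCount(calls));
        System.out.println("sorted binds:   " + bindCount(sortByTexture(calls)));
    }
}
```

Six interleaved calls over three textures cost six binds unsorted but only three once batched, which is why (c) hurts: depth-sorted transparencies forbid exactly this reordering.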

As a side note for (c), drawing transparent objects (particularly particles) as additive is great, as you don't need to sort by depth. Addition is a commutative operation; alpha blending is not. This lets you draw your large particle systems all in one go without worrying about sorting or texture switching.

d) Cull whole objects to the view frustum in software, but don't bother culling tris in software - the hardware can do this much faster than you.

- Dom

I guess you meant to say "don't bother culling tris in software - unless that object is really big". If not, then show me how a 2k x 2k plane can be rendered faster - without culling unnecessary tris - compared to getting rid of unwanted tris procedurally.

Simple answer: split the model up into manageable sections. For hardware TnL, you want batches > 200 tris. However, 2k x 2k = 4 million, which is a little large to throw at the hardware. You should break it into something like 40x40 blocks (1600 tris - a nice batch size), cull those to the frustum/distance, and send them as single objects.

I presume you mean a terrain - in which case, after splitting it into square areas, you can then have lower-detail versions (20x20, etc.) and converged versions to deal with level of detail (4 '40x40' blocks become a single '20x20'). If you don't deal with this, then you have the problem of 'what if someone stood on a mountain right in the corner of my terrain and tried to look across the whole distance'.
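The block-plus-LOD scheme above can be sketched as a pure selection function. The 40/20 block sizes follow the post; the distance thresholds and the far-cull distance are invented for illustration:

```java
// Sketch of the terrain block/LOD selection described above: each
// fixed-size block is assigned a mesh resolution by camera distance,
// or culled entirely when too far. Thresholds are illustrative
// assumptions, not tuned values.
public class TerrainLod {

    // Cells per side for a block at a given camera distance.
    public static int lodFor(double distance) {
        if (distance < 200)  return 40; // full detail: ~1600-cell batch
        if (distance < 600)  return 20; // converged: quarter the cells
        if (distance < 1500) return 10;
        return 0;                       // beyond draw distance: culled
    }

    // Total cells submitted for a set of blocks at these distances.
    public static int cellsSubmitted(double[] blockDistances) {
        int total = 0;
        for (double d : blockDistances) {
            int side = lodFor(d);
            total += side * side;
        }
        return total;
    }

    public static void main(String[] args) {
        // One block per distance band: near, mid, far, beyond the horizon.
        double[] distances = {100, 400, 1000, 2000};
        System.out.println(cellsSubmitted(distances) + " cells submitted");
    }
}
```

Four blocks that would cost 4 x 1600 = 6400 cells at full detail drop to 2100 once distant blocks use converged meshes, which is what makes the "look across the whole terrain from a mountain" case survivable.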

You can get away with a small subset of your scene being dynamically generated (ROAM, progressive meshes, morph targets, etc.), but any tris sent this way get hit by the bus bandwidth. Whole models do not have this problem. For example:

hardware reading a model:

AGP RAM -> Hardware

V. fast, limit is AGP bus (4x you would hope)

Software dealing with it:

RAM -> CPU -> RAM (AGP if lucky) -> Hardware

Note the two traversals of the CPU bus - one up, one down. This will use significant CPU resources that could have been avoided. For a start, you won't be able to send 2 million vertices from memory to the CPU and back within a single 20ms frame.
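A back-of-envelope calculation makes the claim concrete. The 32 bytes per vertex (position, normal, one UV set) is an illustrative assumption, not a figure from the post:

```java
// Back-of-envelope check of the bus-bandwidth claim above: software
// transform moves every vertex across the CPU bus twice per frame.
// The 32 bytes/vertex layout is an illustrative assumption.
public class BusBudget {

    public static double megabytesPerFrame(int vertices, int bytesPerVertex,
                                           int busTraversals) {
        return (double) vertices * bytesPerVertex * busTraversals
                / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // 2 million vertices, read up to the CPU and written back down.
        double mb = megabytesPerFrame(2_000_000, 32, 2);
        System.out.println(mb + " MB of vertex traffic per frame");
        // At 50 fps (20ms frames) that is ~6 GB/s of pure vertex
        // traffic, before textures or anything else - far beyond what
        // an early-2000s memory bus has spare, hence the hard cap on
        // software-transformed geometry.
    }
}
```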

That is a really simple answer - I mean a simple solution - but it wouldn't really work with terrain, unless one wants to go with a precomputed tile system. But in any case, I was just saying that your point d) was too generic. In my system I am culling by tris and the performance is not bad. But I agree with the rest of what you said.

Sometimes sorting front to back for opaque geometry is more important than texture sorting. If you have enough texture memory on the GPU, the fill-rate benefit of front-to-back sorting can be bigger than the penalty of texture thrashing.

That is true, particularly if the distant objects have multi-texturing. The latest cards also have z-buffer optimisations that can speed the z-test up a lot. Older cards (early GeForces, ATI 7500 or earlier) won't benefit much, as the z-test is concurrent with the texture fetch, and keeping multiple pixel pipelines in sync prevents the early-out. However, if you are in a state with large overdraw like this, you would be wise to look at techniques such as portals and basic occlusion comparisons to cull objects behind large opaque objects.

The simplest occlusion method is to have several visibility spheres on an object - large ones encompassing the whole object for frustum checking, and smaller 'inner' spheres representing the opaque regions. If an object's 'outer' sphere, when rendered, lies within a closer object's 'inner' sphere, then it is occluded, so you can cull it. You can use any shape, and there are articles on this (check Graphics Gems and Gamasutra). The most useful application of this type of system is rendering city scenes. In that case, buildings want an inner cuboid for occlusion, and cars/pedestrians check their cull spheres against it. Very fast, and it saves you even sending the model data to the hardware. Spend a little effort early to save the card a lot of effort later.
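One way to implement the sphere-in-sphere test is angularly, from the camera's point of view: the object is hidden if it is farther away than the occluder and its angular extent fits inside the occluder's inner sphere's angular extent. This is a sketch under those assumptions; all names are invented, and a production version would use the camera-relative shapes the post describes:

```java
// Sketch of sphere-based occlusion culling: an object's bounding
// ("outer") sphere is culled when it lies entirely within the angular
// extent of a nearer object's opaque ("inner") sphere, as seen from
// the camera. Illustrative only.
public class SphereOcclusion {

    public static boolean occluded(double[] cam, double[] objC, double objR,
                                   double[] occC, double occInnerR) {
        double dObj = dist(cam, objC);
        double dOcc = dist(cam, occC);
        // Must be behind the occluder; neither sphere may contain the camera.
        if (dObj <= dOcc || objR >= dObj || occInnerR >= dOcc) return false;
        double angObj = Math.asin(objR / dObj);       // object's angular radius
        double angOcc = Math.asin(occInnerR / dOcc);  // occluder's angular radius
        double between = angleBetween(cam, objC, occC);
        return between + angObj <= angOcc;            // fully inside -> hidden
    }

    static double dist(double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1], dz = b[2] - a[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    static double angleBetween(double[] cam, double[] p, double[] q) {
        double[] u = {p[0] - cam[0], p[1] - cam[1], p[2] - cam[2]};
        double[] v = {q[0] - cam[0], q[1] - cam[1], q[2] - cam[2]};
        double dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
        double len = Math.sqrt(u[0]*u[0] + u[1]*u[1] + u[2]*u[2])
                   * Math.sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
        return Math.acos(Math.max(-1.0, Math.min(1.0, dot / len)));
    }

    public static void main(String[] args) {
        double[] cam = {0, 0, 0};
        double[] occ = {0, 0, 5}; // a building's opaque inner sphere, r = 2
        // A small object directly behind the building is culled...
        System.out.println(occluded(cam, new double[]{0, 0, 10}, 0.5, occ, 2.0));
        // ...but one off to the side peeks past it and stays visible.
        System.out.println(occluded(cam, new double[]{5, 0, 10}, 0.5, occ, 2.0));
    }
}
```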

It all depends on your situation and what you are rendering, so unfortunately there is no universal method that is guaranteed for all cases, but I would say that texture batching is the most generally useful optimisation for PC cards.

Oh - and if you can, pack multiple small (non-tiling) textures onto larger packed pages, as this saves you a large amount of batching. You can cut down from 200+ individual textures to 20-30 pages and save massively on batching overhead and state changes. (Figures taken from memory of a published game I worked on.)
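The page-packing idea can be sketched with a naive shelf packer. A real pipeline would pack offline and remap the models' UVs onto the page; this sketch, with invented names and sizes, just counts how many pages a set of square textures needs:

```java
import java.util.Collections;
import java.util.List;

// Sketch of packing many small non-tiling textures onto larger pages,
// as suggested above, using a naive left-to-right shelf packer.
// Every page that replaces a pile of individual textures is that many
// fewer binds per frame.
public class TexturePacker {

    // How many pageSize x pageSize pages hold the given square textures.
    public static int pagesNeeded(List<Integer> sides, int pageSize) {
        int pages = 1, x = 0, y = 0, shelfH = 0;
        for (int s : sides) {
            if (x + s > pageSize) {   // current shelf full: start a new one
                x = 0;
                y += shelfH;
                shelfH = 0;
            }
            if (y + s > pageSize) {   // page full: start a new page
                pages++;
                x = 0; y = 0; shelfH = 0;
            }
            x += s;
            shelfH = Math.max(shelfH, s);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Sixteen 64x64 textures fit exactly on one 256x256 page...
        System.out.println(pagesNeeded(Collections.nCopies(16, 64), 256));
        // ...while a seventeenth spills onto a second page.
        System.out.println(pagesNeeded(Collections.nCopies(17, 64), 256));
    }
}
```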

If you have enough texture memory on the GPU, the fill-rate benefit of front-to-back sorting can be bigger than the penalty of texture thrashing.

That's a pretty big 'if' though. Most games use plenty of textures, but whether they suffer from enough overdraw that front-to-back rendering would help is debatable.

Of course, Xith has stencil shadows built in, doesn't it? So before doing your proper texturing and lighting passes you need to create a perfect z-buffer anyway, so you get to sort by material and still get the benefit of early z-fail.

While we're on the topic of optimisation, has anyone thought of adding the use of a HOM (hierarchical occlusion map) to Xith (perhaps adapted from http://www.jpct.net/jpct.htm) to get accurate occlusion queries without clogging up the graphics card?

It's the first time I've seen 162k polygons at 28 fps and a high-res screen with Java - in a real-world example where the whole universe has many, many more objects and polygons. Well, since Xith is the rendering engine, it all has to do with Xith, doesn't it? Since Xith uses OpenGL, it has to do with JOGL, too.

Fallacious reasoning. If the scene never changes, then all the work done by Xith OR JOGL happens before the first frame. Frame counting is thus meaningless.

I'm not surprised this is the first time you've seen this kind of rate, as video cards keep getting better. In the purest case it has nothing to do with Java, Xith or JOGL.

In reality, in a real app like Magicosm there ARE changes going on, so frame rate does have some meaning, but the poly count is still pretty much irrelevant. The number of state changes and the amount of texture data that has to be moved across the bus are both likely to be significant. The amount of work culled out is also likely to be significant. But the "total polys in the scene" just isn't a terribly important measure in and of itself, and tells you nothing beyond the ability of your graphics card.

I've noticed that the FPS of a test scene (200k polys) halves when I switch the polygon mode from filled to line mode - no matter whether I do it manually or with Xith's nice Renderoption.setOption(Option.ENABLE_WIREFRAME_MODE, true);

Since I don't plan to go for wireframe in the end, I don't mind too much. ;-)

However, maybe a HW/OpenGL expert would like to explain why wireframe is slower than filled mode, please? When, some++ years ago, I implemented a polygon raster-fill routine on the good old Amstrad CPC (8-bit), it was the other way round.

However, maybe a HW/OpenGL expert would like to explain why wireframe is slower than filled mode, please? When, some++ years ago, I implemented a polygon raster-fill routine on the good old Amstrad CPC (8-bit), it was the other way round.

I'm not an NVidia expert, so this is a guess, but I wouldn't be surprised to find out they had optimized the usual case for the hardware (textured triangles) and not the line case, which is a lot more unusual.

Is it doing aliased or anti-aliased lines? Properly anti-aliasing lines can be some work.

Still Jeff, what people are looking for when they come here is a sense of whether Xith3D is up to the challenge of rendering commercial quality scenes with typical polygon / texture / shading loads. Other than pointing them at current usages it is hard to quantify the performance.

The questions will never go away. They don't really want to know how many polygons per second; they really want to know "can it do what I need it to do as fast as I need it to". But this question is impossible to answer without a detailed spec defining what "they need", and even then it would be difficult to give them the answer they seek. What they are hoping for is some metric which can roughly approximate the answer and give them confidence that the proposed solution is real and not imagined.

The other thing is that users of Xith3D and other engines don't generally understand rendering designs and engines, and don't want to... and there is a suspicion that any open-source engine would fall short of commercial strength. So they come here looking for information and they get anecdotal answers... no hard numbers. You can't fault people for wanting hard numbers, even if they don't know that the hard numbers don't really answer the question they want to ask.

We showed the same demo at Quakecon 2 years ago and were told repeatedly that it compared favorably to Quake 2.

Sounds good. Though I've been visiting java.sun.com for years (irregularly), I've unfortunately never seen impressive 3D stuff there. Neither did any of my friends or colleagues. I installed Java3D 3-4 years ago and played with the examples, and while they were nice, I couldn't impress anybody with them. Has Java3D since been optimized for game usage, new OpenGL extensions or such? I don't know. The last thing I read here some months ago was that Java3D had been frozen.

Now I found Xith3d, asked some questions here, got very positive answers by the developers, then fed Xith3d with some 200 k poly test scenes and showed it to a few friends and they said: That's cool.

Usually, if you ask some 3d artist to do some 3d models for a 3d action game the first question he asks is: low or high poly models, how many polys allowed, and such.

Still Jeff, what people are looking for when they come here is a sense of whether Xith3D is up to the challenge of rendering commercial quality scenes with typical polygon / texture / shading loads. Other than pointing them at current usages it is hard to quantify the performance.

Said very sensibly.

Quote

they really want to know "can it do what I need it to do as fast as I need it to". But this question is impossible to answer without a detailed spec defining what "they need", and even then it would be difficult to give them the answer they seek. What they are hoping for is some metric which can roughly approximate the answer and give them confidence that the proposed solution is real and not imagined.

Couldn't say it better.

Quote

and there is a suspicion that any open-source engine would fall short of commercial strength.

That's true, too. Too many people still think open source isn't as good as commercial software, though in many cases the opposite is true. As a long-time open-source user I know this well. (I couldn't work anymore without Gawk, jEdit, MAME, OpenOffice, ...)

Quote

So they come here looking for information and they get anecdotal answers... no hard numbers.

Line mode on all PC cards is terrible, because the driver actually draws each triangle's edges as 6 tris, making 3 rectangles 1 pixel wide each. It's not 6 times slower, as you save on fill rate and texture lookups, but it is pretty shocking nevertheless. The same behaviour happens on ATI, NVidia and Matrox cards, last time I looked.

As an aside, I used to write commercial graphics engines, and static scenes of 200k tris are on the high side. For Xbox (GeForce 3/4 level), we aimed for around 100k tris on screen at any time (50 fps), split 50/50 between static background and dynamic lit objects. The major frame issues are due to skinned characters, morph targets, multi-textures, particle effects and shadows. These can easily take over half a frame alone, so you need to get your scene rendering in less than half a frame if you can. And don't forget your UI - font rendering alone can be a pain, as it's a large fill-rate transparency with very low poly counts.

As an aside, I used to write commercial graphics engines, and static scenes of 200k tris are on the high side.

I see.

Quote

(..) skinned characters, morph targets, multi-textures, particle effects, & shadows. These can easily take over 1/2 a frame alone, so you need to get your scene rendering in less than half a frame if you can.

Oh yes. It's just the beginning. However since I use Xith3d it's much more fun now compared to the direct OpenGL way. :)

Still Jeff, what people are looking for when they come here is a sense of whether Xith3D is up to the challenge of rendering commercial-quality scenes with typical polygon / texture / shading loads. Other than pointing them at current usages it is hard to quantify the performance. [snip] What they are hoping for is that there is some metric which can roughly approximate the answer and give them some confidence that the proposed solution is real and not imagined.

Agreed. My point is simply that polys rendered on screen, and the frames per second of those polys, is NOT a useful metric today. There was a time, when renderers were in software, when it was. But today it's a totally pointless measure. It's worth educating people on why.

Quote

The other thing is that users of Xith3D and other engines don't generally understand rendering designs and engines, and don't want to... and there is a suspicion that any open-source engine would fall short of commercial strength. So they come here looking for information and they get anecdotal answers... no hard numbers. You can't fault people for wanting hard numbers, even if they don't know that the hard numbers don't really answer the question they want to ask.

"There are three kinds of lies. Lies, damn lies, and statistics." ~Mark Twain~

The fact of the matter is that numbers are no "harder" than the anecdotes. One or two simple numbers just aren't going to tell the story, as you pointed out as well.

So in fact, the BEST measure turns out to BE anecdotal, where those anecdotes are as close as possible to your intended application.

P.S. And remember, 50% of Americans graduated in the bottom half of their high-school class.

Is there already a date when the speed optimization will start? Or is it too early to ask for such a thing?

I personally think that for now the major goal is to add functionality to Xith3D, so it is at least at the same level [of functionality] as Java3D. Then we switch to aggressive optimizations.

BTW, some people have already started working on performance enhancements, so we have progress in this area as well.

Regarding the performance question in general, I see the major speed-ups coming from strategic engine design rather than from code fine-tuning [which does not mean we are allowed to write poor code].

Now, on performance tests: I think we should have as many tests as possible, at least to figure out the problematic points in the design and implementation. This will help to enhance the entire engine, as well as to write performance tips, how-tos etc.

Yuri

(still resurrecting old threads).

Yuri, if you ever come here again: I don't even ask you to optimize it yourself, but please tell us what can be done...
