Many people here think that mapped objects will increase the performance of their application. In this thread I want to show you an example of how to use standard Java classes for a decent speed-up, which wouldn't be possible with mapped objects (at least I guess so).

NOTE: This is NO offense to anybody. I just think people are focusing too much on mapped objects. Please prove me wrong on anything I write; that's what the discussion forum is for!

First, I'd like to ask you to read my personal definitions of structs and mapped objects, which I posted in this topic, just so we have a common base. Sorry for the cross-linking.

Ok, here we go.

I wanted to pick a real-world example for games, so I thought the vertices of a triangle mesh might be a good one, since you have to put them into a Buffer in order to send them to the graphics card.

Let's say a vertex consists of a position, a normal, a tangent and a 2D texture coordinate, so we can do normal/parallax mapping - I assume the binormal is generated in the shader. Further, there has to be a performance benefit to using mapped objects, so let the data be dynamic, because we do something like software skinning. However, the texture coordinates are static, as they should be for most types of application.

First, I'll try to imagine how a Vertex class might look as a MappedObject.

To keep it simple I just use fields and assume that accessing them manipulates or reads from a buffer. (Since there are lots of different possible implementations described on this board, please comment on how to change the following.)

This is where, in my opinion, the first problem occurs: on the application side a vertex is a logical unit, but for the graphics card you have to split up static and dynamic data. Well, you don't have to, but I'm sure everyone will agree that sending static data (here the texture coordinates) over the bus every single frame really decreases performance!

class DynamicVertex extends MappedObject {
    Vector3 pos, norm, tan;
}

DynamicVertex[] dynamicVerts;
Vector2[] staticVerts; // texture coordinates

OK, it should be possible to create both from the same underlying ByteBuffer the vertices are mapped to, but then your dynamic data isn't packed tightly anymore. Although it is possible to adjust the stride parameter, e.g. for glVertexPointer, OpenGL performs much better with tight data. However, the main problem is that you have to send the whole buffer down to the graphics card. The only proper solution is to have two Buffers, one for the static and one for the dynamic data.
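As a sketch of that two-buffer split (all names hypothetical, using plain java.nio instead of any mapped-object API): the static texture coordinates live in their own buffer and are uploaded once, while only the tightly packed dynamic data (9 floats per vertex) is refilled each frame.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class VertexBuffers {
    static final int DYNAMIC_FLOATS = 3 + 3 + 3; // pos, norm, tan
    static final int STATIC_FLOATS  = 2;         // texture coordinate

    final FloatBuffer dynamicData; // refilled every frame
    final FloatBuffer staticData;  // filled once, never resent

    VertexBuffers(int vertexCount) {
        dynamicData = newDirectFloatBuffer(vertexCount * DYNAMIC_FLOATS);
        staticData  = newDirectFloatBuffer(vertexCount * STATIC_FLOATS);
    }

    static FloatBuffer newDirectFloatBuffer(int floats) {
        // direct + native order, as OpenGL bindings expect
        return ByteBuffer.allocateDirect(floats * 4)
                         .order(ByteOrder.nativeOrder())
                         .asFloatBuffer();
    }

    // Called per vertex, once per frame: only dynamic attributes cross the bus.
    void putDynamicVertex(float px, float py, float pz,
                          float nx, float ny, float nz,
                          float tx, float ty, float tz) {
        dynamicData.put(px).put(py).put(pz)
                   .put(nx).put(ny).put(nz)
                   .put(tx).put(ty).put(tz);
    }
}
```

With this layout the dynamic buffer stays tight (stride 0), so no per-attribute stride tricks are needed on the glVertexPointer side.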

Now my question is: can an object be mapped to two different buffers? I guess not, at least not while maintaining their promised performance.

Summing up, from my understanding there is no way to have two different Buffers (static and dynamic) and an object mapped to both - here, the Vertex class with its positions, normals and tangents mapped to the first buffer and the texture coordinates mapped to the second. IMHO, not having a single Vertex class makes for somewhat ugly code.

So far, I focused on a possible limitation of mapped objects, from now on I'll try to explain how Java's standard classes (reference types) can be used to increase performance.

What astonishes me most is that Java guys still think in a C/C++ manner. I know a lot of people complain that classes in Java, in contrast to C#, can only be reference types. Compared to C/C++, an array of a class is an array of pointers:
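A tiny illustration of what that "array of pointers" buys you: two array slots can reference the same instance, so a modification through one slot is visible through the other.

```java
class Vector3 {
    float x, y, z;
    Vector3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

class ReferenceDemo {
    static float demo() {
        Vector3 shared = new Vector3(1, 2, 3);
        Vector3[] verts = { shared, shared }; // two slots, one instance
        verts[0].x = 42;                      // modify through the first slot
        return verts[1].x;                    // same object, so 42.0
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 42.0
    }
}
```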

As the graphics guys among you know, some of the vertex data relies on face information, like the normals (smoothing groups/hard edges) or materials (different textures/colors). Therefore you have to split a vertex whenever two neighbouring faces have either

- a hard edge, resulting in different normals for the same position
- different materials, resulting in different texture coordinates for the same position

Since the tangent depends on both the normals and the texture coordinates, it will be different for the same position if one of the above cases is true.

Most implementations, however, don't use the information that the positions of the duplicated vertices are the same. The same is true for the normals split by different materials.

Since all of these are reference types (arrays of pointers), they point to the same data (e.g., positions[i] == vertex[j].pos is true for all vertices j that were duplicated from position i according to the face materials and hard edges).

All the vector arrays are duplicate-free, which saves you from doing multiple modifications. This means you save modifications of the positions whenever faces that refer to a vertex referencing this position have different face materials or hard edges. The same goes for the normals.
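A minimal sketch of the technique, with hypothetical names: the duplicate-free positions array is transformed once, and the (possibly duplicated) vertices pick up the change through their shared references when the tight dynamic buffer is filled each frame.

```java
import java.nio.FloatBuffer;

class Vector3 {
    float x, y, z;
    Vector3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

class Vertex {
    final Vector3 pos;  // shared by all duplicates of this position
    final Vector3 norm; // also shared, unless split by a hard edge
    Vertex(Vector3 pos, Vector3 norm) { this.pos = pos; this.norm = norm; }
}

class Mesh {
    final Vector3[] positions; // duplicate-free
    final Vertex[] verts;      // several verts may reference one position

    Mesh(Vector3[] positions, Vertex[] verts) {
        this.positions = positions;
        this.verts = verts;
    }

    // Stand-in for skinning/morphing: each unique position is transformed
    // exactly once, no matter how many verts reference it.
    void translateAll(float dx, float dy, float dz) {
        for (Vector3 p : positions) { p.x += dx; p.y += dy; p.z += dz; }
    }

    // Once per frame: copy per-vertex data tightly into the dynamic buffer.
    void fill(FloatBuffer buf) {
        buf.clear();
        for (Vertex v : verts) {
            buf.put(v.pos.x).put(v.pos.y).put(v.pos.z)
               .put(v.norm.x).put(v.norm.y).put(v.norm.z);
        }
        buf.flip();
    }
}
```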

Of course the benefit depends on the data; actually there is none if a mesh has only a single material and no hard edges. For my models, the number of position transformations usually reduces to ~2/3. You say saving 1/3 isn't much? Keep in mind that, for example, software skinning usually transforms a position 1-4 times. Further, there might be other modifications as well (morph targets used to simulate muscle contraction, ...). With all that, you can speed up modification by, say, about a factor of 2 (double the number of FPS, if this is your bottleneck), which isn't bad IMHO.

Please tell me if this technique would be possible with mapped objects. If not, I fear losing my 2x speed-up just to remove the copy operation to the buffer, which IMHO is not that remarkable a cost.

This brings me to my conclusion, which is about the bottleneck. I'll never complain that copying values to a buffer hurts the performance of my application, as long as I'm not sure whether there are other possible optimizations which have a greater impact.

Further, I benchmarked the second technique for the 2x speed-up (a real app, not micro sh#$). Again, note this was NOT a comparison between code with struct code and without; it was just about saving modifications or not! I only asked whether this optimisation would be possible with structs/mapped objects.

Then Java has a problem (...) This is caused by the random-access style of fetching these objects from RAM, which are scattered all over the place. With a buffer it's pretty much a sequential read. So don't blame Java; it occurs in all languages.

The first half deals with dividing mapped objects into static and dynamic data, which seems problematic to me. The second half shows an optimization technique which I doubt is possible with mapped objects.

I think you should try and work through the actual memory accesses to understand the issue first.

Well, short answer: which memory access are you thinking of, accessing Java fields or putting data into buffers? I only use standard field access, and that is by all means fast enough for me. The only exception is putting the dynamic data into a buffer once per frame. Are you talking about this kind of memory access? If so, please tell me why simply putting data into a buffer can have a great impact on performance.

One important thing that mapping to ByteBuffers deals with is how it interfaces to I/O .. both I/O to native code and network or disk structures. Being able to set the Byte Order is important... being able to map to Direct Byte Buffers so the C code can have efficient access to the data is also significant.
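For reference, the mechanics being described boil down to something like this (a minimal sketch): a direct ByteBuffer in the machine's native byte order, which native code can access without copying or byte-swapping.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class NativeBufferDemo {
    static ByteBuffer createNativeBuffer(int bytes) {
        // direct: allocated outside the Java heap, so C code can address it
        // nativeOrder(): e.g. LITTLE_ENDIAN on x86, so no swapping is needed
        return ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
    }
}
```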

Maybe I got something wrong, but are you explaining why buffers in general are needed? I never doubted that; I just argued that mapped objects, which fetch their data directly from a buffer, aren't that great IMHO.

I'm not against an automatic, efficient put/get mechanism from an object (struct) to a buffer. On the contrary, that would be nice, because some bounds checking could be eliminated and it would make the code somewhat cleaner. On the other hand, still nobody has told me about a situation in a real app where copying the data to a buffer would be the bottleneck, so there should be no hurry, even for this type of optimization.

On the other hand, still nobody has told me about a situation in real app, where copying the data to a buffer would be the bottleneck. so there should no hurry, even for this type of optimization.

I wrote a GIS mapping engine from scratch. To make it perform fast, I stored the mapping data directly in memory. We're talking about hundreds of millions of objects' worth of data. First of all, I can't even hold that many objects in a 32-bit JVM, because their headers alone would cost me a gigabyte of memory. Second, copying that data around just to use it would be pure insanity. Third, having that many objects in memory thrashes the GC every time it goes to do a full scan.

So, to get around these problems, I load the data into DirectByteBuffers which I wrap with flyweight classes. I think structs is a terrible name for this; it'd be better off being called FlyweightView or some such thing. The whole point of this is that you're avoiding the creation of huge numbers of stateful objects. Flyweight is a well-known and understood name for it.
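A minimal sketch of such a flyweight view, assuming fixed-size records of two floats (all names hypothetical): one wrapper instance is repositioned over the records of the buffer instead of allocating millions of objects.

```java
import java.nio.ByteBuffer;

// Flyweight view over fixed-size records in a ByteBuffer: a single instance
// is re-pointed at any record instead of creating one object per record.
class PointView {
    static final int SIZE = 8; // two floats: x, y
    private ByteBuffer buf;
    private int base;

    PointView moveTo(ByteBuffer buf, int index) {
        this.buf = buf;
        this.base = index * SIZE;
        return this;
    }

    float x()        { return buf.getFloat(base); }
    float y()        { return buf.getFloat(base + 4); }
    void  x(float v) { buf.putFloat(base, v); }
    void  y(float v) { buf.putFloat(base + 4, v); }
}
```

While iterating millions of records, the same PointView is reused, so there are no per-record object headers and no GC pressure.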

If I weren't bothered with a bajillion other things right now, I would just write the bytecode transformations for it so there would be an end to the discussion. /sigh

First of all, sorry for the late reply - I missed the post updates.

Can you please tell me a bit about the structure of your hundreds of millions of objects? It's difficult to answer properly without knowing them. Further, I would like to know how you manipulate the data before sending it to the graphics card, because this is obviously the point where some kind of Java 'object' representation (in your case, FlyweightView) comes into play. Finally, please tell me the commands (OpenGL) you use for updating the geometry.

I'm pretty sure that with this information I can explain my point of view more easily.

Being able to set the Byte Order is important... being able to map to Direct Byte Buffers so the C code can have efficient access to the data is also significant.

Specifically for interacting with huge gobs of C++ data. Passing the data through a DirectByteBuffer with a strongly typed flyweight facade means great speed with very little abstraction penalty.

I was wondering why people seem to think that I ever doubted the performance benefits of using DirectBuffers. What I'm really asking is the following:

Say I have a direct buffer whose native ordering is little endian, and further you wrap a flyweight object (sliding window or whatever) around it. Let's say it has x, y, z properties for a 3D vector. Now you're performing a certain operation, e.g. a scalar multiplication, on all your objects. But since the scalar is a standard Java type (big endian), the two factors have different endianness. Now my question is: can the Java VM calculate with them efficiently?

Riven is correct; if you expect high performance on wrong-endian data, you've got an error in brain space 0 and need to reevaluate your career choice. High performance, btw., is only half the story; the other half is a clean and uncluttered (i.e. not error-prone) way of accessing legacy data structures directly with as little effort as possible.

In case of intel CPUs, this means LITTLE_ENDIAN, but Java types are all BIG_ENDIAN.

As a result, your mapped object, sliding window or whatever will have a LITTLE_ENDIAN-ordered buffer as its backing data. Further, doing computations involving Java's primitive types (BIG_ENDIAN) results in a mix of both endians. Now I asked whether the Java VM can handle this efficiently.

In case of intel CPUs, this means LITTLE_ENDIAN, but Java types are all BIG_ENDIAN.

Java "the language" is big endian - Java "the virtual machine" is not.

That is, to say, Java types look and feel as if they are big endian to you (the programmer), but they are stored in your machine's memory in the native byte order of your machine. If that were not the case, I couldn't even begin to imagine the kinds of performance hits Java would suffer when performing even the simplest of operations.

Language features (like the bitwise operators) that make Java feel like it is big endian are just there to help you write portable code.
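A small demonstration of that point: a float read from a LITTLE_ENDIAN buffer is an ordinary float once it has been loaded, so mixing it with Java literals in arithmetic costs nothing extra; the declared byte order only matters when bytes are moved in or out of the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    static float scaledFromLittleEndian(float value, float scalar) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putFloat(0, value);           // stored little endian in the buffer
        // Once loaded, the float is endian-less: plain arithmetic applies,
        // even though 'scalar' came from ordinary Java code.
        return buf.getFloat(0) * scalar;
    }
}
```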
