I did it! I managed to get my GPU skinning working!!! =D My small test featuring Bob the dwarf is now running very nicely!

Features:
- Individually animated Bobs! They can be at different animation frames, or even entirely different animations, though I only have one animation to use at the moment...
- Instancing! Each part of Bob (head, helmet, lamp, etc.) is drawn with a single OpenGL command no matter how many instances I have.
- Bone interpolation is done on the CPU and the results are uploaded per instance into a VBO. In my vertex shader this buffer is then accessed through a Texture Buffer Object (TBO).
- Instance positions / model matrices are uploaded to a VBO and are marched over per instance using GL33.glVertexAttribDivisor(index, 1).
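For anyone curious what the per-instance matrix setup looks like, here's a rough sketch (not my actual code; the buffer and index names are made up). The model matrices get packed column-major into a FloatBuffer for the instance VBO, and the LWJGL attribute setup with GL33.glVertexAttribDivisor is shown in comments since it needs a live GL context:

```java
import java.nio.FloatBuffer;

public class InstanceData {
    // Pack N column-major 4x4 model matrices tightly into one buffer,
    // ready for glBufferData into the per-instance VBO.
    // (In practice LWJGL wants a direct buffer, e.g. from BufferUtils.)
    static FloatBuffer packModelMatrices(float[][] matrices) {
        FloatBuffer buf = FloatBuffer.allocate(matrices.length * 16);
        for (float[] m : matrices) {
            buf.put(m); // each m is 16 floats, column-major
        }
        buf.flip();
        return buf;
    }

    // The upload + attribute setup then looks roughly like this:
    //   GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, instanceVbo);
    //   GL15.glBufferData(GL15.GL_ARRAY_BUFFER, buf, GL15.GL_STREAM_DRAW);
    //   for (int col = 0; col < 4; col++) { // a mat4 occupies 4 attribute slots
    //       GL20.glEnableVertexAttribArray(baseIndex + col);
    //       GL20.glVertexAttribPointer(baseIndex + col, 4, GL11.GL_FLOAT,
    //                                  false, 16 * 4, col * 4 * 4);
    //       GL33.glVertexAttribDivisor(baseIndex + col, 1); // advance once per instance
    //   }
}
```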

Sadly this program is still CPU-bottlenecked: my GPU can process around 2.5x the instances my CPU can interpolate bones for. The above screenshot runs with 600 instances of Bob and 16xQ CSAA (= 8x MSAA + 8 coverage samples) enabled, since the anti-aliasing doesn't affect performance due to the CPU bottleneck, and it runs smoothly at 60-61 FPS. With threading (and less anti-aliasing) this could be improved to twice the FPS, which would let me have over 1000 instances of Bob at the same time! I believe the ultimate solution though is OpenCL. That way I can just upload all the animation frame data on startup and interpolate bones for each instance on the GPU. This would offload everything to the GPU, and I estimate it would run at around 120-150 FPS with no CPU load at all.

I've been working on the same game for half a year now, so my answer is "Yes, it will". It's an RTS, but I won't show any screenshots or anything since I'm not 100% sure yet. I won't announce any specific information about it either, since I don't like the pressure of having said "I'm gonna release this game in x months"...

Nah, OpenCL is probably great, but you're already very familiar with shaders and the OpenGL pipeline. Why learn something new when you can do it with something you already have? You probably don't have the infrastructure ready in your codebase for OpenCL either.

PS: I don't want to argue against learning new things, of course ^^. I was just thinking from a development-process point of view.

I think OpenCL is better than OpenGL for this, if only because it makes a lot more sense to read frame data from one buffer to fill another buffer with the per-instance data than to emulate the whole process with shaders, texture objects and transform feedback. OpenCL is meant for general-purpose computing (= bone interpolation in my book), OpenGL is meant for graphics (= skinning).

Hehe, I just switched to a better slerp function which uses a threshold to skip the expensive trigonometric functions when the interpolated angle is too small, and got a 3-4x speedup in CPU performance. xD Now the CPU and GPU are almost equally busy, but it's almost impossible not to be fragment-limited. Bone interpolation (CPU) and skinning (GPU) performance is at 2000 instances at 60 FPS, but if they're actually going to cover more than a pixel or so per instance (or if I want MSAA) I'll have to reduce the number of instances to around 1500. Anyway, the point is that I've pretty much maxed out the performance gain from instancing. I'm pushing 2 million triangles per frame with skinning, and I haven't even done any heavy optimizations yet. Well, I guess I won't be needing OpenCL for a while then... Off to actually being able to load 3D models other than Bob! xD
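For the curious, the threshold trick looks roughly like this (a sketch, not my exact code; quaternions are [x, y, z, w] arrays and the 0.9995 cutoff is just a typical value):

```java
public class Quat {
    // Slerp between unit quaternions a and b (stored as [x, y, z, w]).
    // When the inputs are nearly parallel, the acos/sin path is skipped
    // and a plain lerp + renormalize is used instead, which is both
    // cheaper and numerically safer for tiny angles.
    static float[] slerp(float[] a, float[] b, float t) {
        float dot = a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
        float sign = 1f;
        if (dot < 0f) { // negate one end so we take the short way around
            dot = -dot;
            sign = -1f;
        }
        float w0, w1;
        if (dot > 0.9995f) {
            // tiny angle: plain lerp is accurate enough here
            w0 = 1f - t;
            w1 = t;
        } else {
            float theta = (float) Math.acos(dot);
            float s = (float) Math.sin(theta);
            w0 = (float) (Math.sin((1f - t) * theta) / s);
            w1 = (float) (Math.sin(t * theta) / s);
        }
        w1 *= sign;
        float[] r = {
            w0*a[0] + w1*b[0], w0*a[1] + w1*b[1],
            w0*a[2] + w1*b[2], w0*a[3] + w1*b[3]
        };
        // renormalize (the lerp branch drifts slightly off the unit sphere)
        float inv = (float) (1.0 / Math.sqrt(r[0]*r[0] + r[1]*r[1] + r[2]*r[2] + r[3]*r[3]));
        for (int i = 0; i < 4; i++) r[i] *= inv;
        return r;
    }
}
```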

I can't say I understand exactly how quaternions work, but I do understand the theory of slerp and how it interpolates along the surface of a sphere... Like I said, this is lightning fast, so I don't see any need to optimize it further at the moment... xD

The trig and inverse-trig functions aren't strictly needed. There are tons of possible implementations. The problem, if you will, with the fastest versions is that they require pre-computation (so multiple usages of the start & end points + auxiliary data) and/or some added constraints (like a max angle between the end points, only forward-moving 't', and/or a fixed-step 't'). I'm assuming you don't want to bother with any of that. I do have a really old untested version without any constraints that I could pull out and test.

WRT the trig look-up table... the problem is that the relative error is huge for small angles, and we're mostly interested in small angles. (Well, not really, but that's the way most animation data works out in practice.)
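To put numbers on that: with a nearest-entry sine table, small inputs snap to the sin(0) = 0 entry, so the relative error hits 100% exactly in the range skinning rotations tend to live in. A quick sketch (the 1024-entry size is arbitrary):

```java
public class SinLut {
    static final int N = 1024;
    static final float[] TABLE = new float[N];
    static {
        for (int i = 0; i < N; i++) {
            TABLE[i] = (float) Math.sin(2.0 * Math.PI * i / N);
        }
    }

    // Nearest-entry lookup for x in [0, 2*pi).
    static float lutSin(double x) {
        int i = (int) Math.round(x / (2.0 * Math.PI) * N) % N;
        return TABLE[i];
    }

    // |lut - exact| / |exact|
    static double relError(double x) {
        return Math.abs((lutSin(x) - Math.sin(x)) / Math.sin(x));
    }
}
```

Any angle smaller than half the table step (about 0.003 radians here) maps to the zero entry, so the relative error there is a full 100%, while mid-range angles are approximated to a fraction of a percent.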

Man, why do people insist on making easy stuff hard. SLERP (as a primitive) is freaking awesome.

One easy thing you could do is lose the normalization. The resultant quaternion will be very near unit length, so the normalization can be replaced by a single step of some renormalization method (like Newton-Raphson). You'll trade a sqrt & divide for a couple of multiplies.
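Concretely, one Newton-Raphson step for 1/sqrt(s) with an initial guess of 1 collapses to inv = 1.5 - 0.5*s, which is plenty accurate when s is already near 1 (as it is for a lerp between unit quaternions). A sketch:

```java
public class FastNorm {
    // Renormalize a quaternion [x, y, z, w] that is already close to unit
    // length, using one Newton-Raphson step for 1/sqrt(s) with initial
    // guess y0 = 1:  y1 = y0 * (1.5 - 0.5 * s * y0 * y0) = 1.5 - 0.5 * s.
    // No sqrt, no divide: just multiplies and adds.
    static float[] fastRenormalize(float[] q) {
        float s = q[0]*q[0] + q[1]*q[1] + q[2]*q[2] + q[3]*q[3];
        float inv = 1.5f - 0.5f * s;
        return new float[]{ q[0]*inv, q[1]*inv, q[2]*inv, q[3]*inv };
    }
}
```

For an input that is off unit length by ~1e-3, the result is off by only ~1e-6, i.e. the error is roughly squared by the single step.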
