
You've laboured away all semester on your 3D masterpiece.
It's packed full of interesting features, clever behaviours, gorgeous geometry
and tasty bitmaps. Java3D shouldn't get any sweeter; this is what the Wedge
was built for. But there's just one thing ... the performance. Your magnificent
application crawls along at a snail's pace. It's real-time alright, but only
if you think in geological terms. The universe itself will end before your app!
It chugs. It bites. It blows. It stinks.

So how do you crank up the speed? In this article we'll
run through a number of general optimisation techniques for all Java programs
as well as providing some hints about how to deal specifically with Java3D.

Before we start, let's get one common misconception out
of the way. Many people think that Java is inherently slow. It isn't. There
are plenty of examples of Java running just as fast as the equivalent C++. And
it's not just on contrived benchmarks or specialised numerical code: full-blown
Java applications can be as quick as native compiled code if you are careful
to avoid some obvious bottlenecks. So tempting as it may be to blame the Java
Virtual Machine (JVM), the problem is almost certainly elsewhere. The trick
is to avoid excessive use of the more expensive features of Java. A little careful
optimisation and your sedentary Java code will be buzzing on a caffeine high!

One other thing before we start:

"More computing sins are committed in the name
of efficiency (without necessarily achieving it) than for any other single
reason - including blind stupidity." - W.A. Wulf

I'm not sure who Mr Wulf is, but that's some sound advice
he's offering. Someone who I have heard of is Donald
Knuth. Don isn't known for his work on 3D visualisation, but he's still
got heaps of street cred so it's worth listening when
he says:

"We should forget about small efficiencies, say
about 97% of the time: premature optimization is the root of all evil."
- Donald Knuth

The root of all evil. Strong words, but the message from
both these lads is that optimisation should come second - program correctness
must always come first. Make your program work, then make it work faster.

That said, it's easy to go too far the other way, and
treat optimisation as one of the final steps in debugging. Although you shouldn't
optimise too early, performance is a Fundamental Requirement of any real
time application. That's Requirement with a capital R and like all other requirements
it must be considered carefully in the design, not left until the last minute.
In fact most of the techniques described in this article relate as much to design
as they do to implementation. Java doesn't give you the same opportunities as
C++ when it comes to ultra fine code tweaking. Optimising Java code usually
means working at a higher level to guarantee that your object structure and
class hierarchies are efficient. The bottom line: design an efficient system,
build it, debug it, then profile and optimise it.

Enough with the generalisations. In the words of Maverick
and Goose in the 80's classic Top
Gun, "I feel the need, the need for speed"...


Basic Java Optimisation Hints

1. Minimise object creation and use of Strings

The Java programming style encourages you to create lots of little objects
which don't hang around for terribly long

Especially true when working with strings

How many objects in the following code? ...

Perhaps the cardinal performance sin in Java is to create
too many objects. The Java programming style encourages you to create lots of
little objects which don't hang around for terribly long. This is especially
true when working with strings, where the compiler will magically create many
little objects for you. But it's bad practice, and it can really hurt your
performance. For example, take a guess how many objects are used by the following
code snippet:

1. Minimise object creation and use of Strings

Big Objects...

chew up virtual memory

swamp your valuable memory bandwidth

pollute your CPU cache

force the garbage collector to do more work

One big issue: garbage collection

The answer is twelve (check out StringBuffer if you want to learn more) [1].
Twelve objects for a tiny little trace message! Allocating memory on the heap
for just one of those objects is actually quite costly. It may not look like
much, but the new statement is an expensive operation.
Java objects aren't small either. The minimum size depends on what JVM you're
running. If you're very lucky you might get away with a minimum of 18 - 20 bytes,
but more typically the minimum object size is 40 bytes. And remember that's
the minimum overhead the JVM requires ... actual data will be extra! Big objects
are bad because they chew up virtual memory, swamp your valuable memory bandwidth
(the most limited hardware resource in current machines), pollute your CPU cache
and force the garbage collector to do more work. With our little snippet above,
all 12 objects have to be allocated on the heap, those big 40 byte headers have
to be filled out with all the information the JVM requires, then the data is
copied in, they get used briefly, and then they're discarded. But of course
they don't go away immediately. Oh no, they hang around until the garbage collector
finally catches them.

And all that takes time.
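
As a hedged illustration of the StringBuffer idea mentioned above (a sketch, not the article's own snippet): building a trace message with the + operator lets the compiler create temporary buffers and strings for you, whereas one reusable buffer keeps the per-message allocation down to the finished String. The class and method names are hypothetical, and StringBuilder is used here as the unsynchronised sibling of StringBuffer.

```java
// Hypothetical helper: formats trace messages with one reusable buffer.
class TraceFormatter {
    // Allocated once and recycled for every message.
    private final StringBuilder buffer = new StringBuilder(64);

    String format(String label, int frame, int objects) {
        buffer.setLength(0);               // reset in place, no new buffer
        buffer.append(label)
              .append(" frame=").append(frame)
              .append(" objects=").append(objects);
        return buffer.toString();          // one new String for the finished message
    }

    public static void main(String[] args) {
        TraceFormatter trace = new TraceFormatter();
        System.out.println(trace.format("render", 1, 2));
    }
}
```

The formatter is not thread-safe, so each thread that logs would need its own instance.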

In a loop, a simple Object creation becomes a set of object creations

In a tight loop, even a handful of unnecessary object
creations can blow out a big chunk of memory and cost you a lot of time. What
kind of tight loop? Well how about the rendering loop called every frame in
your Java3D application.

Example of loop : the rendering loop called every frame in your Java3D application

Be very careful about how and when you create objects (new and String use)

Pre-allocate and recycle your objects: get the memory once and then reuse it

"setIdentity" to get a clean Matrix

The answer is to be very careful about how and
when you create objects. Take particular note of where you use new
and where you work with strings. Where possible you should pre-allocate and
recycle your objects: get the memory once and then reuse it. If you always work
with a fixed number of objects each frame, don't allocate them every frame; allocate
them once before the first frame has started. For Java3D code Point, Matrix and
Vector are ideal candidates for this treatment [2]. Working in 3D inevitably
involves matrix and vector manipulation. If you allocate your matrices every frame
then you'll burn memory, waste cache and force the garbage collector to do much
more work (see tip #4). It is much cheaper to call setIdentity than to create a
new object every time you need a fresh matrix.
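
A minimal sketch of the pre-allocate-and-recycle pattern follows. To stay self-contained it uses a tiny hypothetical Mat4 class rather than the real Java3D matrix types; only the setIdentity idea is taken from the text, and the Renderer class and its frame method are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-in for a Java3D-style 4x4 matrix (row-major).
class Mat4 {
    final double[] m = new double[16];

    Mat4() { setIdentity(); }

    // Resets this matrix in place instead of allocating a fresh one.
    void setIdentity() {
        Arrays.fill(m, 0.0);
        m[0] = m[5] = m[10] = m[15] = 1.0;
    }

    void setTranslation(double x, double y, double z) {
        m[3] = x; m[7] = y; m[11] = z;
    }
}

class Renderer {
    // Allocated once, before the first frame.
    private final Mat4 scratch = new Mat4();

    // Called every frame: no 'new' inside the loop.
    double[] frame(double x, double y, double z) {
        scratch.setIdentity();
        scratch.setTranslation(x, y, z);
        return scratch.m;   // same backing array every frame
    }
}
```

The trade-off is aliasing: the caller has to finish with the returned values before the next frame overwrites them.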


2. Take a good look at your method call chains

General rule of thumb: static methods are cheapest, then final methods, then
instance methods, then interface methods and finally synchronized methods.

Methods. Great things aren't they? A lot of fun. But method
calls in Java are a bit more involved than in other languages and in the wrong
circumstances they can become quite expensive. How expensive? Well that depends
in part on your JVM but mostly on the type of method. As a general rule of thumb
static methods are cheapest, then final methods, then instance methods, then
interface methods and finally synchronized methods:

static < final < instance < interface < synchronized

And the difference is not trivial: in one popular JVM
interface methods take three times longer to call than static methods declared
in a class. In the same JVM synchronized methods are almost seven times slower
than statics!

For a single method call the cost is still pretty minimal, but chaining calls
together (methodA() calls methodB() calls methodC()) mounts up the cost

Long call chains are a common feature of Java's event model

Now for a single method call the cost is still pretty
minimal. But once you start chaining them together (methodA()
calls methodB() calls methodC())
the costs mount up. Long call chains are a common feature of Java's event model:
"this listener registers with that component and when it receives event
X it passes on event Y". This is known as delegation and it can be an elegant
technique to simplify your design. However, if you delegate your event handling
too much you get long call sequences. If such a sequence is called every time
the user twitches the mouse, then the costs can blow right out.

Sometimes you don't have any choice about the type of
method you use. If you're writing an AWT event listener you simply have to implement
one of the well-defined interfaces. But for your own code you should design
the shortest possible call chains for the performance critical code. The crucial
design tips are:

understand the costs of different types of method

use static and final where it makes sense but without compromising generality or reuse

use abstract classes instead of interfaces if you can

keep call chains as short as possible

use recursion with great care

always make your private methods final


3. Thread synchronisation is expensive

Multithreading is good to improve user interface reactivity, but...

Acquiring locks to guarantee thread safety is slow

If you intend to make extensive use of Java's threads you probably ought to
grab a couple of good textbooks

AWT uses a couple of threads, Java3D uses lots of threads

You don't have to lock everything : identify a minimal model for synchronisation

This one is common sense really: acquiring locks to guarantee
thread safety is slow (see tip #2). Concurrent programming is a major subject in
its own right and if you intend to make extensive use of Java's threads you
probably ought to grab a couple of good textbooks. Even if you don't create any
threads of your own, the Java libraries use their own: AWT uses a couple of
threads, Java3D uses lots of threads. You may not be alone... But remember, just
because your application has multiple threads doesn't mean you have to lock
everything. An important part of designing a concurrent system is to determine
what data structures each thread will touch, and so identify a minimal model for
synchronisation. Look carefully at the Java class libraries too, especially the
data containers (see tip #5), some of which are thread-safe while others are not.
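
One way to picture a "minimal model for synchronisation" is a single small hand-off point between threads, sketched below. The SharedPosition class is hypothetical: it assumes one thread updates a position and another reads it, and it holds the lock only for the few assignments that copy the values.

```java
// Hypothetical shared state: only the hand-off itself is synchronised.
class SharedPosition {
    private final Object lock = new Object();
    private double x, y, z;

    // Writer thread: hold the lock just long enough to store three values.
    void set(double nx, double ny, double nz) {
        synchronized (lock) {
            x = nx; y = ny; z = nz;
        }
    }

    // Reader thread: copy into a caller-supplied array, then work on the copy unlocked.
    void copyInto(double[] out) {
        synchronized (lock) {
            out[0] = x; out[1] = y; out[2] = z;
        }
    }
}
```

Passing in a pre-allocated array also keeps this path free of object creation, in the spirit of tip #1.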


4. Collect your own garbage, with care

Java programs run nicely most of the time, but now and again they grind to
a halt... that's garbage collection time !

Automatic memory management

A common performance complaint with Java programs is that
they run nicely most of the time, but now and again they grind to a halt. As
often as not the culprit is the garbage collector. Automatic memory management
is a great feature, but it comes at a price. Collecting that garbage is a slow
and difficult task and although most modern JVMs try to minimise the costs,
when the collector-man comes knocking you will know about it. So what do you
do?

Avoid garbage collection in the first place by reusing your objects

Do this when things are quiet

Well obviously the best thing is to avoid garbage collection
in the first place by reusing your objects (see tip #1). The alternative is to
schedule the garbage collector yourself, at a time which suits you best. You can
manually invoke the collector at any time by calling System.gc(). The call is
only a request and its precise effect will vary from JVM to JVM, but in general
it will clean out the memory so that the collector won't need to run for a while.
If you do this when things are quiet it can save you some grief when things
get busy.
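
A rough sketch of scheduling the collector during quiet spells is shown below. The GcScheduler class, the notion of a "quiet" frame and the threshold are all assumptions made for illustration; the only real API used is System.gc(), which the JVM treats as a request.

```java
// Hypothetical per-frame hook: request a collection only after a run of quiet frames.
class GcScheduler {
    private final int quietFramesNeeded;
    private int quietFrames = 0;

    GcScheduler(int quietFramesNeeded) {
        this.quietFramesNeeded = quietFramesNeeded;
    }

    // Call once per frame; returns true if a collection was requested.
    boolean endOfFrame(boolean sceneIsQuiet) {
        quietFrames = sceneIsQuiet ? quietFrames + 1 : 0;
        if (quietFrames >= quietFramesNeeded) {
            quietFrames = 0;
            System.gc();   // a request only: the JVM decides what actually happens
            return true;
        }
        return false;
    }
}
```

How you decide that a frame is quiet (no user input, nothing animating) is application specific, so measure before and after to confirm it actually smooths your frame rate.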

Java3D : Usability studies : it is better to have a slow but constant frame
rate than a fast but variable one

If you regularly schedule the garbage collector you can average out the costs

This technique is especially useful for Java3D applications.
Most usability studies have concluded that for interactive applications it is
better to have a slow but constant frame rate than a fast but variable one.
Anyone who's played Quake will know there's nothing more annoying than a game
where everything is ticking over nicely, then just as the action hots up the
frame rate bombs out. Averaging 60 frames per second (fps) sounds great, but
if that average varies from 20fps to 80fps it will be more annoying to users
than if you simply maintain a constant 30fps. So if you regularly schedule the
garbage collector you can average out the costs. Sure the highs won't be quite
as high, but then the lows won't be quite as low either. (We'll see how this
same observation affects behaviours in tip #12 and tip #15).

5. Use arrays [] for small collections of objects

Casting and a slow runtime type check

For very small, simple collections of objects or primitive types (up to 6 - 8
items) you are much better off using arrays.

In the java.util package you'll find a bunch of nice container classes for
managing groups of objects. LinkedList, Set, Map and TreeSet are all very
convenient, but they aren't necessarily high performance. To access the contents
of these containers you typically use a separate Iterator object. That means more
objects, chained method calls and other speed sapping overheads. Also, the
containers all store their contents as type Object, which means casting and a slow
runtime type check with each access. For very small, simple collections of objects
or primitive types you are much better off using arrays.

It's important to emphasise that this is only true for
small collections: say to a maximum of 6 - 8 objects. When you have a large
number of objects a HashMap
or some form of tree can be extremely efficient. Obviously you should analyse
the complexity of your algorithm and choose a data structure which makes sense.
But for small collections it will always be hard to beat an array.
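
As a small sketch of the array approach, assuming a hypothetical set of at most eight light intensities: the values live in a fixed primitive array and are walked with a plain index loop, so there is no Iterator object and no casting.

```java
// Hypothetical example: a handful of light intensities kept in a plain array.
class SmallCollection {
    private final float[] intensities = new float[8]; // fixed capacity, primitives only
    private int count = 0;

    // Returns false when full, so the caller can switch to a bigger structure.
    boolean add(float value) {
        if (count == intensities.length) {
            return false;
        }
        intensities[count++] = value;
        return true;
    }

    float total() {
        float sum = 0f;
        for (int i = 0; i < count; i++) {   // index loop: no Iterator, no casts
            sum += intensities[i];
        }
        return sum;
    }
}
```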


6. Be afraid of Reflection and Serialization

Java's funkiest features are also its greatest bottlenecks

Reflection is the ability to introspect a class and dynamically work out what
methods and fields it has

Invoking a method through reflection is approximately one thousand times (1000x)
slower than a normal method call

Serialisation is the ability to take a group of objects and dump them out
into an array of bytes

Very handy for loading and saving data, and also useful for sharing data between
machines (Java RMI)... But spectacularly slow

Some of Java's funkiest features are also its greatest
bottlenecks. Reflection is the ability to introspect a class and dynamically
work out what methods and fields it has. It can be quite useful when working
with JavaBeans and sometimes gets used in event processing code. But it really
bites when it comes to performance. Invoking a method through reflection is
approximately one thousand times (1000x) slower than a normal method call. That's
three orders of magnitude! Better go and put the kettle on, we could be here
for a while...
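
To make the difference concrete, here is a sketch that calls the same method directly and through java.lang.reflect. It does not attempt to measure the slowdown quoted above; it only shows the extra run-time work (name lookup, argument boxing, access checks) that the reflective path involves. The ReflectionDemo class and its twice method are hypothetical.

```java
import java.lang.reflect.Method;

class ReflectionDemo {
    public static int twice(int n) {
        return 2 * n;
    }

    // Direct call: resolved by the compiler.
    static int direct(int n) {
        return twice(n);
    }

    // Reflective call: method looked up by name and invoked with boxed arguments.
    static int reflective(int n) {
        try {
            Method m = ReflectionDemo.class.getMethod("twice", int.class);
            return (Integer) m.invoke(null, n);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(direct(21) == reflective(21));
    }
}
```

If reflection is unavoidable, looking up the Method object once and keeping it around at least avoids repeating the lookup on every call.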

Serialisation is the ability to take a group of objects
and dump them out into an array of bytes. Very handy for loading and saving
data, and also useful for sharing data between machines. Serialisation is used
extensively in Java RMI to pass parameters back and forth. A great convenience,
but also a great way to burn CPU cycles. Serialisation is spectacularly slow!
Gob-smackingly inefficient, so use it only if you want your application to be
gob-smackingly unresponsive.

In general you should treat any of Java's more abstract
features as performance bottlenecks. Dynamic class loading, JDBC, parsing XML
documents, LDAP directory access, CORBA networking: it's all great stuff but
none of it was designed by speed freaks. Handle with care.

N.B.: Java™ Remote Method Invocation (RMI) enables the programmer to create
distributed Java technology-based applications, in which the methods of remote
Java objects can be invoked from other Java virtual machines, possibly on
different hosts.


7. Never ignore Exceptions

Okay, okay, we've all done it. As a quick and dirty way
to make a piece of code compile you ignore the exceptions and end up with something
like:

try {
    dodgeyMethodCall();
} catch (Exception e) {
    ; // exception silently ignored
}

It keeps the compiler happy while you get on with worrying about the rest
of your algorithm.

That dodgy method, full of bugs, could be throwing exceptions all the time and
you'd never know.

Common fix is to dump the exception out to the System.err stream

But if you're running a Wedge application in full screen mode you probably
won't be looking at the console output very much

It keeps the compiler happy while you get on with worrying
about the rest of your algorithm. Trouble is, that dodgy method could be throwing
exceptions all the time and you'd never know. That's bad for all sorts of obvious
reasons, but also because exception handling is expensive and so your performance
will take a hammering. The cost of the method invocation blows out, an Exception
object is created (and then ignored), and who knows what knock on effects will
occur if dodgeyMethodCall does part but not all of what it should.

The common fix is to dump the exception out to the System.err
stream. That's better, but you can fall into the same problem if you don't actually
bother checking the text output of your code as it's running. This may seem
unlikely, but if you're running a Wedge application in full screen mode you
probably won't be looking at the console output very much. That makes it easy
to miss important exception traces.

So the lesson is never ignore exceptions: either in your
code, or when they're reported to you in your console.
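
One possible shape for a handler that does not swallow the exception is sketched below. The failure counter is an assumption added for illustration (something an application could show on screen when nobody is watching the console), and dodgeyMethodCall is given a hypothetical body so the example runs on its own.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class ExceptionHandling {
    private static final Logger LOG = Logger.getLogger("app");

    // Assumed extra: a visible count, useful when the console is hidden.
    static int failures = 0;

    // Hypothetical method that sometimes fails.
    static void dodgeyMethodCall(boolean fail) {
        if (fail) {
            throw new IllegalStateException("something went wrong");
        }
    }

    static void callSafely(boolean fail) {
        try {
            dodgeyMethodCall(fail);
        } catch (IllegalStateException e) {   // catch the narrowest type you can handle
            failures++;
            LOG.log(Level.WARNING, "dodgeyMethodCall failed", e);
        }
    }
}
```

By default java.util.logging writes to System.err, so this still relies on someone reading the output unless the counter (or a file handler) is surfaced elsewhere.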


8. Go native ... but only if you have a really good reason

You might want to do some really low level, grimy optimisation work without
having to worry about garbage collectors, strong type checking and the other
elegant abstractions Java offers

You should only look to native code if you know exactly what you're doing
and why. Think cost-benefit analysis.

Moving data in and out of the JVM's garbage collected memory space is not
free. Calling Java code from native code is slow, and you still have to worry
about thread safety

Native code brings with it major development headaches: you lose portability

You're much better off concentrating on algorithmic improvements

When all else fails you can use the Java
Native Interface (JNI) to jump out to native, compiled code. "Hang
on a minute" you might say, "isn't this tantamount to admitting that
Java is slow after all?". Err ... well no it isn't quite, but it
is an admission that you might want to do some really low level, grimy optimisation
work without having to worry about garbage collectors, strong type checking
and the other elegant abstractions Java offers. This is especially true if you're
one of those sick-o types who hand code tight SIMD assembly routines to crunch
through the inner-most loop of an image processing function or a scientific
computation. Don't believe the hype about what modern compilers can do - nothing
beats hand-crafted assembler!

This is, of course, your absolute last resort. Let's
be really clear on this point: you should only look to native code if you know
exactly what you're doing and why. Think cost-benefit analysis. Profile your
application extensively, understand precisely how much time each routine uses
and account for every precious CPU cycle. Then weigh up any performance boost
against the costs associated with going native. Moving data in and out of the
JVM's garbage collected memory space is not free. Calling Java code from native
code is slow, and you still have to worry about thread safety. Once all this
is weighed up you may not see a speed gain with native code. Worse still, native
code brings with it major development headaches: you lose portability; you
have to deal with the seedy underbelly of the JVM; debugging becomes a nightmare.
In my experience you're much better off concentrating on algorithmic improvements
since native code will only buy you a percentage or two in the margins.

In fact the reason I've included this tip in the article
is to try and convince you not to go native. Some gung-ho types dive
into native code at the first hint of performance trouble. But a "rush
of blood to the head" is not part of many good design methodologies. Less
of a recipe for success, most of the time it's either blind enthusiasm or blind
panic! My advice is go talk to someone else before ordering that copy of "x86
for Dummies".


Java3D Specific Hints

9. Set up your Canvas3D with care

To roll your own Java3D initialisation code, you need to get a Canvas3D

If you work with Tiwi
you won't have to worry about this one, but if you decide to roll your own Java3D
initialisation code take care with the Canvas3D.
The crucial step is to make sure you use a GraphicsConfigTemplate3D
when you create your GraphicsConfiguration.
The wrong way to do things is as follows:


10. Play by Java3D's rules

The most obvious thing to affect the performance of Java3D is the scene graph
you create

The most obvious thing to affect the performance of Java3D
is the scene graph you create. Java 3D recommends a number of basic things you
can do to your scene graph in the interests of efficiency. These are all pretty
obvious if you've read the documentation, but as a refresher remember to:

The first two won't make a massive difference to the performance
of your code [3]
but they're good programming practices to follow. Tight specification of bounds
is more important, and can really improve picking and collision detection. Think
a little more carefully when it comes to the bounds on Behaviour
nodes. Java3D tries to encourage you to minimise behaviour bounds, but in tip
#12 and tip #15 we'll look at why this
isn't necessarily good advice.


11. Collapse chains of transforms

Dangers of having too many transforms in a scene

Every position and orientation of every object in a scene is specified with
one or more TransformGroups, so inevitably they are going to be one of the most
common nodes you use.

Java3D has to multiply together all the transforms from that leaf back to
the root of the scene.

Where you can, combine a sequence of transformations together into a single
TransformGroup.

Java3D does this automatically (compile a branch), but only for transformations
which can't be read or written to.

Dear old TransformGroup,
what a trusty friend it is. But sometimes your friends can lead you into bad
ways. Mother always said not to take sweets from strangers, but she never mentioned
the dangers of having too many transforms in a scene. Every position and orientation
of every object in a scene is specified with one or more TransformGroups, so
inevitably they are going to be one of the most common nodes you use. Trouble
is, to render a bit of geometry at one of the leaves in your scene, Java3D has
to multiply together all the transforms from that leaf back to the root of the
scene. If you have long chains of transforms that can start to cost a bit. Perhaps
Mother doesn't know best after all.

Where you can, it pays to combine a sequence of transformations
together and so collapse a long chain down to a single TransformGroup. Java3D
does this automatically when you compile a branch of your scene graph, but only
for transformations which can't be read or written to. If you need to update
the positions of objects it's far better to concentrate all those updates into
one, or a small number, of transform nodes.
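
The arithmetic behind collapsing a chain can be sketched with plain 4x4 matrices. The TransformChain helper below is hypothetical and stands in for Java3D's own transform classes: it multiplies a root-to-leaf chain into one matrix up front, which is the value you would then place in a single transform node.

```java
// Hypothetical 4x4 helper in row-major order; not the Java3D API.
class TransformChain {
    static double[] translation(double x, double y, double z) {
        return new double[] {
            1, 0, 0, x,
            0, 1, 0, y,
            0, 0, 1, z,
            0, 0, 0, 1 };
    }

    // Returns a * b (b is applied to a point first, then a).
    static double[] multiply(double[] a, double[] b) {
        double[] r = new double[16];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                double sum = 0;
                for (int k = 0; k < 4; k++) {
                    sum += a[row * 4 + k] * b[k * 4 + col];
                }
                r[row * 4 + col] = sum;
            }
        }
        return r;
    }

    // Collapses a chain, listed root first, into a single matrix.
    static double[] collapse(double[]... chain) {
        double[] result = translation(0, 0, 0); // identity
        for (double[] t : chain) {
            result = multiply(result, t);
        }
        return result;
    }
}
```

This setup code allocates freely because it is meant to run once, not every frame; per-frame updates would then touch only the one collapsed matrix.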


12. Combine behaviours and schedule them for consistent performance

The Java3D documentation tells you to set tight bounding volumes on behaviours
so that they only run when they're absolutely required

Recall from tip #4 that it is better to have a constant frame-rate than a
highly variable one.

The solution is to concentrate all your code into a small number of intelligent
Behaviour nodes with large (or infinite) scheduling bounds.

Behaviour nodes are the smarts in your application. The interesting bits. By
using Java3D's interpolators and other behaviours in novel ways you can wire some
clever logic into your scene without having to write a single line of code. The
Java3D documentation tends to encourage this, and it also tells you to set tight
bounding volumes on behaviours so that they only run when they're absolutely
required (see tip #10). Wiring logic directly into a scene seems like a pretty
cool idea at first and many new Java3D programmers take this approach to heart.
Having tight scheduling bounds also seems like a cool optimisation, since only
those behaviours that are actually visible get run.

So the temptation is to try to build everything out of
the existing behaviours, and write very simple little behaviours to plug any
gaps. It seems like a win-win situation. The Java3D scheduler has plenty of
flexibility in running the minimum number of behaviours based on the visible
area of the scene. You win too because by keeping your behaviours simple you
get plenty of opportunity to reuse them in other applications. But not everything
that seems like a good idea actually turns out to be so. Fortran seemed like
a good idea at the time. So did the Leyland P.76.

What have I got against behaviours? Essentially it comes
back to the point in tip #4 that it is better
to have a constant frame-rate than a highly variable one. Turning behaviours
on and off all the time is a great way to guarantee inconsistent frame-rate.
Sure it improves things for one or two individual frames, but the overall effect
is more fluctuation in performance. This is a bad thing. It is also inefficient
to use the Java3D behaviour scheduler to arrange what bits of your code should
run when. To decide if a behaviour should run, Java3D has to check the bounding
volume of the behaviour against the visible volumes of the scene. This means
mapping the behaviour volume into world space (remember those long transform
chains in tip #11) and then intersecting it
with the view volume. Every behaviour, every frame.

The solution is to concentrate all your code into a small
number of intelligent Behaviour nodes with large (or infinite) scheduling bounds.
That way you get more consistent frame rates, and when it comes to turning on
and off certain pieces of code you are almost always in a better position to
make that decision than the Java3D scheduler (usually without having to map
all sorts of complex volumes through a chain of coordinate transformations).


13. Minimise your reliance on collision detection, or do your own

You're walking along quietly minding your own business,
head stuck in a paper, oblivious to the world around you when smack!
you walk into a lamp post. Collision detection - it can get you in the physical
world, why not the virtual world too!

The most common form of physical modelling done in a virtual environment

Java3D provides a mechanism to do it automatically for you

Like all forms of physical modelling, collision detection is expensive to
perform and the more objects in your scene the more complex it becomes

Java3D's collision detection has some nasty limitations and is slow too

Okay for detecting basic two-object collisions in very simple scenes

If you plan to use it, reduce the number of objects that can collide to the
barest minimum

If collision detection is a big part of your application, build your own.

A good source of information is actually the computer games industry

Collision detection is perhaps the most common form of
physical modelling done in a virtual environment. Because it is such an important
part of physical interaction Java3D provides a mechanism to do it automatically
for you. But, like all forms of physical modelling, collision detection is expensive
to perform and the more objects in your scene the more complex it becomes. Java3D's
collision detection system has some fairly nasty limitations and even the odd
outright bug. It's slow too. It may be okay for detecting basic two-object
collisions in very simple scenes but don't rely on it for anything even remotely
complex. If you plan to use it, reduce the number of objects that can collide
to the barest minimum.

If collision detection is a big part of your application
you will probably need to do your own collision detection. This is not a trivial
task, but fortunately collision detection is a well studied problem. A good
source of information is actually the computer games industry. Modern games
have very demanding requirements for physical modelling. Collisions are the basis
of all the interaction in games such as Quake, Unreal and Half-life. Grab a
good game programming text, or hit any of the online resources (http://www.gamasutra.com
is a ripper) to learn more. But be warned, writing a general purpose collision
detector is not for the faint of heart!
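
The article does not prescribe a particular algorithm, so the following is only one common starting point, offered as an assumption: a bounding-sphere overlap test, which is cheap enough to run on many object pairs before any exact geometry check. The class and parameter names are hypothetical.

```java
// Hypothetical first-pass test: do two bounding spheres overlap?
class BoundingSphereTest {
    static boolean collide(double ax, double ay, double az, double ar,
                           double bx, double by, double bz, double br) {
        double dx = ax - bx;
        double dy = ay - by;
        double dz = az - bz;
        double reach = ar + br;
        // Compare squared distances to avoid a square root.
        return dx * dx + dy * dy + dz * dz <= reach * reach;
    }
}
```

Taking plain doubles rather than vector objects keeps the test free of allocation, which matters if it runs for many pairs every frame.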


14. Don't burn time in system callbacks / don't try to run everything at full
frame-rate

Don't burn large amounts of time in your event listening methods or in a behaviour's
processStimulus
method

If you do start burning serious amounts of time in a callback the AWT event
queue will start to fill or Java3D will start to fall behind in its processing.

This effect can snowball

This tip is as relevant to general Java programming as
it is to Java3D: don't burn large amounts of time in your event listening methods
or in a behaviour's processStimulus
method. The thread that calls your listener method is not yours to do with as
you see fit: it's an AWT or Java3D thread that has important work to do elsewhere.
If you do start burning serious amounts of time in a callback the AWT event
queue will start to fill or Java3D will start to fall behind in its processing.
This effect can snowball. For example, if you take too long to handle one event
by the time you've finished there may be another three events waiting for you,
then another seven, and so on. Like garbage collection (see tip
#4) this can lead to inconsistent frame rates.

Start being lazy : do not try to do everything at full frame-rate

run different bits of processing code every alternate frame

run your complex processing code in a separate, lower priority thread:
"decouple the job of rendering from that of updating your application"

The solution is to start being lazy. That's lazy in the
sense of lazy evaluation, not lazy as in "wearing your socks inside-out
instead of washing them". Different kind of lazy.

The basic idea is not to try to do everything at full
frame-rate. One simple way to do this is to run different bits of processing
code every alternate frame - move objects one frame, check for collisions the
next frame. A better solution is to run your complex processing code in a separate,
lower priority thread and so decouple the job of rendering from that of updating
your application. By making processing code asynchronous to rendering code, you
get snappy visuals without crippling what your application can do. Just remember
to take care how you synchronise your threads (see tip
#3).
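
Here is one possible shape for that decoupling, sketched in plain Java (Java 8 or later assumed). The updater thread builds a fresh immutable snapshot for every step and publishes it through an AtomicReference; the renderer just reads whatever snapshot is newest. WorldState, DecoupledUpdate and the trivial step function are invented for the example, and the sketch assumes there is only one updater thread.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable snapshot of application state.
final class WorldState {
    final long tick;
    final double x;

    WorldState(long tick, double x) {
        this.tick = tick;
        this.x = x;
    }
}

class DecoupledUpdate {
    // Latest completed state; the renderer only ever reads this reference.
    final AtomicReference<WorldState> latest =
            new AtomicReference<>(new WorldState(0, 0.0));

    // Stand-in for a slow simulation step.
    static WorldState step(WorldState s) {
        return new WorldState(s.tick + 1, s.x + 0.5);
    }

    // Starts a lower-priority updater that runs the given number of steps.
    Thread startUpdater(final int steps) {
        Thread t = new Thread(() -> {
            for (int i = 0; i < steps; i++) {
                // Single writer, so read-then-set is safe here.
                latest.set(step(latest.get()));
            }
        }, "updater");
        t.setPriority(Thread.NORM_PRIORITY - 1);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        DecoupledUpdate app = new DecoupledUpdate();
        Thread updater = app.startUpdater(100);
        // A real renderer would call app.latest.get() once per frame here.
        updater.join();
        WorldState s = app.latest.get();
        System.out.println(s.tick + " " + s.x);  // prints "100 50.0"
    }
}
```

Because each snapshot is immutable, the renderer never sees a half-updated state and no locks are needed on the read side.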

Java3D Specific Hints

15. Don't be afraid to step outside Java3D

A conclusion from the 3 previous examples

The trap that many first-time developers fall into is to do everything with the
scene graph

Understand the limits of Java3D

Update the major state of your application in a separate thread

Calculate your transformation matrices

Perform your own collision detection

Interface with the scene graph through a small number of Behaviour
nodes

The last three tips have all really been leading us in
the same direction. The message is simple: don't be afraid to work outside Java3D.
It is great for presenting results and the scene graph does impose a degree
of structure on your application, but it doesn't diminish the need for a good
design. The trap that many first-time developers fall into is to do everything
with the scene graph and then attempt to build up the extra functionality as a
bunch of different behaviours. But that's like designing the GUI before you know
what your application does!

So the last tip is simply this: understand the limits
of Java3D, and if it can't do everything you want, don't be afraid of building
major parts of your application outside its scope. Update the major state of
your application in a separate thread, calculate your own transformation matrices,
perform your own collision detection, and interface with the scene graph
through a small number of Behaviour nodes.
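
To give a flavour of what "your own collision detection" might look like, here is a small sketch that lives entirely outside the scene graph and uses no Java3D classes at all. SphereCollider is a hypothetical application-side class; bounding spheres are just one simple choice of collision volume, not something the scene graph requires.

```java
// Hypothetical application-side collision volume, independent of Java3D.
class SphereCollider {
    final double x, y, z, radius;

    SphereCollider(double x, double y, double z, double radius) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.radius = radius;
    }

    // Two spheres overlap when the distance between their centres is no
    // more than the sum of the radii. Comparing squared values avoids a sqrt.
    boolean overlaps(SphereCollider other) {
        double dx = x - other.x;
        double dy = y - other.y;
        double dz = z - other.z;
        double reach = radius + other.radius;
        return dx * dx + dy * dy + dz * dz <= reach * reach;
    }

    public static void main(String[] args) {
        SphereCollider a = new SphereCollider(0, 0, 0, 1);
        SphereCollider b = new SphereCollider(1.5, 0, 0, 1);
        SphereCollider c = new SphereCollider(5, 0, 0, 1);
        System.out.println(a.overlaps(b));  // prints "true"
        System.out.println(a.overlaps(c));  // prints "false"
    }
}
```

The application can run checks like this in its own update thread and only touch the scene graph when something actually needs to move or change.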

Java 3D™ API Collateral 1.2.1 Performance Guide

I - Introduction

The Java 3D™ API was designed with high performance 3D graphics as a primary
goal. Since this is a new API, many of its performance features are not well known.
This document presents the performance features of Java 3D in a number of ways.
It describes the specific APIs that were included for performance. It describes
which optimizations are currently implemented in Java 3D 1.2.1. And, it describes
a number of tips and tricks that application writers can use to improve the
performance of their application.

II - Performance in the API

There are a number of things in the API that were
included specifically to increase performance. This section examines a few
of them.

 Capability bits
Capability bits are the application's way of describing its intentions to the
Java 3D implementation. The implementation examines the capability bits to
determine which objects may change at run time. Many optimizations are possible
with this feature.

 Compile
There are two compile methods in Java 3D 1.2.1. They are in the BranchGroup
and SharedGroup classes. Once an application calls
compile(), only those attributes of objects that have their capability bits
set may be modified. The implementation may then use this information
to "compile" the data into a more efficient rendering format.

 Bounds
Many Java 3D objects require bounds to be associated with them. These objects
include Lights, Behaviors, Fogs, Clips, Backgrounds, BoundingLeafs,
Sounds, and Soundscapes. The purpose of these bounds is to
limit the spatial scope of the specific object. The implementation
may quickly disregard the processing of any objects that are out of the spatial
scope of a target object.

 Unordered Rendering
All state required to render a specific object in Java 3D is completely defined
by the direct path from the root node to the given leaf. That means that leaf
nodes have no effect on other leaf nodes, and therefore may be rendered in
any order. There are a few ordering requirements for direct descendants of
OrderedGroup nodes or transparent objects. But, most leaf nodes may be reordered
to facilitate more efficient rendering.

 Appearance Bundles
A Shape3D node has a reference to a Geometry and an Appearance. An Appearance
NodeComponent is simply a collection of other NodeComponent references that
describe the rendering characteristics of the geometry. Because the Appearance
is nothing but a collection of references, it is much simpler and more efficient
for the implementation to check for rendering characteristic changes when
rendering. This allows the implementation to minimize state changes in the
low level rendering API.

III - Current Optimizations in Java 3D 1.2.1

This section describes a number of optimizations that
are currently implemented in Java 3D 1.2.1. Other optimizations will be implemented
as the API matures. The purpose of this section is to help application programmers
focus their optimizations on things that will complement the current optimizations
in Java 3D.

 Hardware
Java 3D uses OpenGL and Direct3D as its low
level rendering APIs. It relies on the underlying
OpenGL and Direct3D drivers for its low level rendering acceleration. Using
a graphics display adapter that offers OpenGL or Direct3D acceleration is
the best way to increase overall rendering performance in Java 3D.

 Compile
In the Java 3D 1.2
release, no compile optimizations were implemented. The following compile
optimizations are implemented in the Java 3D 1.2.1 release:

Scene graph flattening:
TransformGroup nodes that are neither readable nor writable are collapsed
into a single transform node.

Combining Shape3D nodes: Non-writable
Shape3D nodes that have the same appearance attributes and are under the
same TransformGroup (after flattening) are combined, internally, into a
single Shape3D node that can be rendered with less overhead.

 State Sorted Rendering
Since Java 3D allows for unordered rendering for most leaf nodes, the implementation
sorts all objects to be rendered on a number of rendering characteristics.
The characteristics that are sorted on are, in order, Lights, Texture, Geometry
Type, Material, and finally localToVworld transform. The only exception to
this is any child of an OrderedGroup node. There is no state sorting for those
objects.
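
The payoff of state sorting can be illustrated with a toy model in plain Java (Java 8 or later assumed). This is not Java 3D's internal code: RenderItem and StateSortDemo are hypothetical, only two of the sort keys named above (texture, then material) are modelled, and a "state change" is simply counted whenever consecutive items differ in either key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical render item described by two integer state ids.
class RenderItem {
    final int texture;
    final int material;

    RenderItem(int texture, int material) {
        this.texture = texture;
        this.material = material;
    }
}

class StateSortDemo {
    // Counts how often texture or material changes between neighbours.
    static int stateChanges(List<RenderItem> items) {
        int changes = 0;
        for (int i = 1; i < items.size(); i++) {
            RenderItem prev = items.get(i - 1);
            RenderItem cur = items.get(i);
            if (prev.texture != cur.texture || prev.material != cur.material) {
                changes++;
            }
        }
        return changes;
    }

    // Returns a copy sorted by texture first, then by material.
    static List<RenderItem> sorted(List<RenderItem> items) {
        List<RenderItem> copy = new ArrayList<>(items);
        copy.sort(Comparator.comparingInt((RenderItem r) -> r.texture)
                .thenComparingInt(r -> r.material));
        return copy;
    }

    public static void main(String[] args) {
        List<RenderItem> items = Arrays.asList(
                new RenderItem(1, 1), new RenderItem(2, 1), new RenderItem(1, 1),
                new RenderItem(2, 2), new RenderItem(1, 1));
        System.out.println(stateChanges(items));          // prints "4"
        System.out.println(stateChanges(sorted(items)));  // prints "2"
    }
}
```

Sorting is only legal because the items may be drawn in any order, which is why children of an OrderedGroup miss out on it.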

 View Frustum Culling
The Java 3D implementation implements view frustum culling. The view frustum
cull is done when an object is processed for a specific Canvas3D. This cuts
down on the number of objects needed to be processed by the low level graphics
API.

 Multithreading
The Java 3D API was designed with multithreaded environments in mind. The
current implementation is a fully multithreaded system. At any point in time,
there may be parallel threads running performing various tasks such as visibility
detection, rendering, behavior scheduling, sound scheduling, input processing,
collision detection, and others. Java 3D is careful to limit the number of
threads that can run in parallel based on the number of CPUs available.

IV - Tips and Tricks

This section presents a number of tips and tricks for
an application programmer to try when optimizing their application. These
tips focus on improving rendering frame rates, but some may also help overall
application performance. A number of these optimizations will eventually be
handled directly by the Java 3D implementation.

 Move Object vs. Move ViewPlatform
If the application simply needs to transform the entire scene, transform the
ViewPlatform instead. This changes the problem from transforming every object
in the scene into only transforming the ViewPlatform.

 Capability bits
Only set them when needed. Many optimizations can be done when they are not
set. So, plan out application requirements and only set the capability bits
that are needed.

 Bounds and Activation Radius
Consider the spatial extent of various leaf nodes in the scene and assign
bounds accordingly. This allows the implementation to prune processing on
objects that are not in close proximity. Note, this does not apply to geometric
bounds. Automatic bounds calculations for geometric objects are fine.

 Change Number of Shape3D Nodes
In the current implementation there is a certain amount of fixed overhead
associated with the use of the Shape3D node. In general, the fewer Shape3D
nodes that an application uses, the better. However, combining Shape3D nodes
without factoring in the spatial locality of the nodes to be combined can
adversely affect performance by effectively disabling view frustum culling.
An application programmer will need to experiment to find the right balance
of combining Shape3D nodes while leveraging view frustum culling. The compile
optimization that combines shape nodes will do this automatically, when possible.

 Geometry Type and Format
Most rendering hardware reaches peak performance
when rendering long triangle strips. Unfortunately,
most geometry data stored in files is organized as independent triangles or
small triangle fans (polygons). The Java 3D utility package includes a stripifier
utility that will try to convert a given geometry
type into long triangle strips. Application programmers should experiment
with the stripifier to see if it helps with their specific data. If not, any
stripification that the application can do will help. Failing that, note that
most rendering hardware can process a long list of independent triangles faster
than a long list of single-triangle triangle fans. The stripifier in the Java
3D utility package will be continually updated to provide better stripification.
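
One reason strips help is simple vertex arithmetic: n independent triangles need 3n vertices, while a single strip of n triangles needs only n + 2, because every triangle after the first reuses two vertices from its predecessor. The hypothetical helper below just does that arithmetic; it does not call the Java 3D stripifier and says nothing about how well real data can be stripified.

```java
// Hypothetical helper comparing vertex counts for two geometry layouts.
class StripMath {
    // Independent triangles send three vertices per triangle.
    static int independentVertices(int triangles) {
        return 3 * triangles;
    }

    // One strip: three vertices for the first triangle, one for each extra.
    static int stripVertices(int triangles) {
        return triangles == 0 ? 0 : triangles + 2;
    }

    public static void main(String[] args) {
        int n = 100;
        System.out.println(independentVertices(n));  // prints "300"
        System.out.println(stripVertices(n));        // prints "102"
    }
}
```

For long strips the vertex count approaches a third of the independent-triangle figure, which is the best case; data that only forms short strips gains much less.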

 Sharing Appearance/Texture/Material NodeComponents
To assist the implementation in efficient state sorting, and allow more shape
nodes to be combined during compilation, applications can help by sharing
Appearance/Texture/Material NodeComponent objects when possible.

 Geometry by reference
Using geometry by reference reduces the memory needed to store a scene graph,
since Java 3D avoids creating a copy in some cases. However, using this feature
prevents Java 3D from creating display lists (unless the scene graph is compiled),
so rendering performance can suffer in some cases. It is appropriate if memory
is a concern or if the geometry is writable and may change frequently. The
interleaved format will perform better than the non-interleaved formats, and
should be used where possible. In by-reference mode, an application should
use arrays of native data types; referring to TupleXX[] arrays should be avoided.

 Texture by reference and Y-up
Using texture by reference and Y-up format may reduce the memory needed to
store a texture object, since Java 3D avoids creating a copy in some cases.
Currently, Java 3D will not make a copy of the texture image for the following
combinations of BufferedImage format and ImageComponent format (byReference
and Yup should both be set to true):

 Application Threads
The built-in thread support in the Java language is very powerful, but can
be deadly to performance if it is not controlled. Applications need to be
very careful in their thread usage. There are a few things to be careful
of when using Java threads. First, try to use them in a demand driven fashion.
Only let the thread run when it has a task to do. Free running threads can
take a lot of CPU cycles from the rest of the threads in the system - including
Java 3D threads. Next, be sure the priority of each thread is appropriate.
Most Java Virtual Machines will enforce priorities aggressively.
Too low a priority will starve the thread and too high a priority will starve
the rest of the system. If in doubt, use the default thread priority. Finally,
see if the application thread really needs to be a thread. Would the task
that the thread performs be all right if it only ran once per frame? If so,
consider changing the task to a Behavior that wakes up each frame.
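
A demand driven thread can be sketched in plain Java (Java 8 or later assumed) with a blocking queue: the worker is parked inside take() whenever there is nothing to do, so it uses no CPU between tasks, and its priority is left at the default. DemandDrivenWorker and its method names are hypothetical, and this is only one of several ways to arrange it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker that runs only when a task is waiting.
class DemandDrivenWorker {
    // Sentinel task that tells the worker to exit.
    private static final Runnable STOP = new Runnable() {
        public void run() { }
    };

    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
    private final Thread thread;

    DemandDrivenWorker() {
        thread = new Thread(() -> {
            try {
                while (true) {
                    Runnable task = tasks.take();  // blocks until work arrives
                    if (task == STOP) {
                        return;
                    }
                    task.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demand-worker");
        // Priority is deliberately left at the default.
        thread.start();
    }

    void submit(Runnable task) {
        tasks.add(task);
    }

    // Lets queued tasks finish, then stops the worker.
    void stopAndWait() throws InterruptedException {
        tasks.add(STOP);
        thread.join();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        DemandDrivenWorker worker = new DemandDrivenWorker();
        for (int i = 0; i < 3; i++) {
            worker.submit(() -> counter.incrementAndGet());
        }
        worker.stopAndWait();
        System.out.println(counter.get());  // prints "3"
    }
}
```

Contrast this with a free running loop that polls a flag: the polling version burns cycles even when idle, which is exactly the behaviour the guide warns against.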

 Java 3D Threads
Java 3D uses many threads in its implementation, so it also needs to implement
the precautions listed above. In almost all cases, Java 3D manages its threads
efficiently. They are demand driven with default priorities. There are a few
cases that don't follow these guidelines completely.

 Behaviors
One of these cases is the Behavior scheduler when there are pending WakeupOnElapsedTime
criteria. In this case, it needs to wake up when the minimum WakeupOnElapsedTime
criterion is about to expire. So, application use of WakeupOnElapsedTime
can cause the Behavior scheduler to run more often than might be necessary.

 Sounds
The final special case for Java 3D threads is the Sound subsystem. Due to
some limitations in the current sound rendering engine, enabling sounds
causes the sound engine to potentially run at a higher priority than other
threads. This may adversely affect performance.

 Threads in General
There is one last comment to make on threads in general. Since Java 3D is
a fully multithreaded system, applications may see significant performance
improvements by increasing the number of CPUs in the system. For an application
that does strictly animation, two CPUs should be sufficient. As more
features are added to the application (Sound, Collision, etc.), more CPUs
could be utilized. Note: When running in the Solaris environment, be sure
that native threads are enabled. Green threads will not take advantage of
multiple CPUs.

 Switch Nodes for Occlusion Culling
If the application is a first person point of view application, and the environment
is well known, Switch nodes may be used to implement simple occlusion culling.
The children of the switch node that are not currently visible may be turned
off. If the application has this kind of knowledge, this can be a very useful
technique.

 Switch Nodes for Animation
Most animation is accomplished by changing the transformations that affect
an object. If the animation is fairly simple and repeatable, the flip-book
trick can be used to display the animation. Simply put all the animation frames
under one switch node and use a SwitchValueInterpolator on the switch node.
This increases memory consumption in favor of smooth animations.

 Switch nodes under Writable Transforms
Switch nodes that are descendants of writable TransformGroup nodes can incur
extra cost associated with updating the vworld bounds and localToVworld transforms
of all children (not just those that are switched on). This is one more reason
why it is better to rotate the viewer than the entire scene graph (see "Move
Object vs. Move ViewPlatform").

 Link/SharedGroup versus cloneTree
Using multiple Link nodes pointing to a shared subgraph (SharedGroup) can
have a performance penalty over a shallow clone of the scene graph. To create
a shallow clone of the scene graph, use cloneTree
without duplicating the node components. Restrict
the use of Link/SharedGroup to those cases where you really need the kind
of sharing that it provides.

 OrderedGroup Nodes
OrderedGroup and its subclasses are not as high performing as the unordered
group nodes. They disable any state sorting optimizations that are possible.
If the application can find alternative solutions, performance will improve.

 LOD Behaviors
For complex scenes, using LOD Behaviors can improve performance by reducing
geometry needed to render objects that don't need high level of detail. This
is another option that increases memory consumption for faster render rates.
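
The selection logic behind a distance-based level of detail scheme is straightforward, and can be sketched in plain Java without any Java 3D classes. LodSelector below is a hypothetical helper, not the Java 3D LOD API: it assumes the threshold distances are sorted in ascending order and that level 0 is the most detailed model.

```java
// Hypothetical level-of-detail chooser based on viewer distance.
class LodSelector {
    // Returns 0 for the nearest band, up to thresholds.length for
    // anything beyond the last threshold. Thresholds must be ascending.
    static int selectLevel(double distance, double[] thresholds) {
        for (int i = 0; i < thresholds.length; i++) {
            if (distance <= thresholds[i]) {
                return i;
            }
        }
        return thresholds.length;
    }

    public static void main(String[] args) {
        double[] thresholds = {10.0, 50.0, 200.0};
        System.out.println(selectLevel(5.0, thresholds));    // prints "0"
        System.out.println(selectLevel(30.0, thresholds));   // prints "1"
        System.out.println(selectLevel(500.0, thresholds));  // prints "3"
    }
}
```

With three thresholds the application has to keep four versions of the model around, which is where the extra memory consumption comes from.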

 Picking
If the application doesn't need the accuracy of geometry based picking, use
bounds based picking. For more accurate picking
and better picking performance, use PickRay instead of PickCone/PickCylinder
unless you need to pick lines or points. PickCanvas with a tolerance of 0 will
use PickRay for picking.

Conclusions

Java doesn't compile down to assembly language, but that doesn't mean it is
a slow pig.

Modern JVMs : just-in-time compilers and other sophisticated techniques

Much richer set of abstractions than C++ : it is possible to write inefficient
code if you're careless

Algorithmic improvements often yield the biggest gains and they are not specific
to any one language or API

Java doesn't compile down to assembly language, but that
doesn't mean it is a slow pig. Modern JVMs use just-in-time compilers and other
sophisticated techniques to bring the performance up to that of C++. But because
Java also supports a much richer set of abstractions than C++ it is possible
to write inefficient code if you're careless. The secret is to design your application
for performance right from the start, and to profile and optimise it once it's
working correctly.

Footnotes

[1] Actually the answer can be worse than that. Different Java compilers treat
strings in different ways but you would be fortunate indeed to get away with
less than 12 objects. You could be stung for as many as 18 objects by some
compilers.

[2] Take a look at one of the Matrix classes and you'll see things are worse
than they first seem. Each matrix is stored in an array of primitive types. For
example Matrix3f is stored in a float[9] array. So when you create a matrix
you're actually creating two objects: the Matrix object and the array. Ow!

[3] On all but the most recent releases of Java3D (1.2.1) compiling branches
doesn't actually do anything!