Posted
by
timothy
on Saturday August 11, 2012 @11:20PM
from the also-it's-delicious dept.

An anonymous reader writes "Today the source code to the Rootbeer GPU Compiler was released as open source on github. This work allows for a developer to use almost any Java code on the GPU. It is free, open source and highly tested. Rootbeer is the most full featured translator to convert Java Bytecode to CUDA. It allows arbitrary graphs of objects to be serialized to the GPU and the GPU kernel to be written in Java." Rootbeer is the work of Syracuse University instructor Phil Pratt-Szeliga.
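To make the "GPU kernel written in Java" idea concrete, here is a minimal sketch of what a Rootbeer-style kernel looks like. The real library provides a `Kernel` interface whose `gpuMethod()` body the Rootbeer compiler translates to CUDA; the interface and serial runner below are stand-ins defined locally so the example is self-contained, and the names are illustrative rather than the library's exact API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for a Rootbeer-style Kernel interface: the body of
// gpuMethod() is what would be translated to a CUDA kernel.
interface Kernel {
    void gpuMethod();
}

// Each kernel instance squares one element of a shared array,
// so the instances are fully independent of one another.
class SquareKernel implements Kernel {
    private final int[] data;
    private final int index;

    SquareKernel(int[] data, int index) {
        this.data = data;
        this.index = index;
    }

    public void gpuMethod() {
        data[index] = data[index] * data[index];
    }
}

public class RootbeerSketch {
    // Stand-in for the runtime: on a GPU each kernel would run in its
    // own thread; here we simply run them serially on the CPU.
    static void run(List<Kernel> kernels) {
        for (Kernel k : kernels) {
            k.gpuMethod();
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        List<Kernel> kernels = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            kernels.add(new SquareKernel(data, i));
        }
        run(kernels);
        System.out.println(Arrays.toString(data)); // [1, 4, 9, 16]
    }
}
```

The point of the object-graph serialization mentioned above is that each kernel object (here, its `data` reference and `index` field) gets copied to GPU memory before launch and copied back afterwards.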

Used intelligently by a skilled programmer, Java can deliver great results and provide exactly the sort of cross platform capabilities it was designed for. Used by idiots and/or kids who just earned that undergrad CS degree, it tends to provide less.

Usually there is only a problem when your application server (I'm looking at you, stinky WebSphere) relies on a particular vendor's implementation of Java (e.g. IBM's JDK). With recent Sun/Oracle Java Runtime Environments (that is, Java 6, which is around five years old, and Java 7), applications usually run flawlessly (at least, mine do, and I'm doing all the usual Java funky stuff: radar, road-sign and head-tracking hardware device control, networking, web, 3D graphics, rich clients, etc.).

As a current practitioner in the field, I wonder whether your experience is from a long time ago, and polluted by software that was built on Microsoft's awful (and deliberately incompatible, the subject of the famous lawsuit they lost) Java implementation from back then.

Guess what: I'm not coding for it, just using apps from major corps. Those fun management apps that on some kit you must use, because the CLI will only set up basic IP settings; any real config requires some Java monstrosity. Just a few weeks back, a Java update broke some buttons in the latest Supermicro IP-KVM Java app and in the Cisco ASDM launcher (though the web-launched version works). I would not call Cisco some fly-by-night company.

Here, from GitHub, is the short presentation. [github.com] This is very impressive. It finds parallelism automatically, at least for simple cases: over 50x performance improvement on matrix multiply and a naive Fourier transform (not FFT), both of which have very simple inner loops. It's not clear how it does on less obvious problems.
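The matrix-multiply benchmark is the easy case the comment above alludes to: each output cell depends only on one row of A and one column of B, and no two iterations write the same location, so a translator can map one GPU thread to each (i, j) pair. A plain Java version of that inner loop, for illustration:

```java
public class MatMul {
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length, k = b.length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++) {       // every (i, j) iteration is
            for (int j = 0; j < m; j++) {   // independent: a natural GPU thread
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    sum += a[i][p] * b[p][j];
                }
                c[i][j] = sum;              // each thread writes its own cell
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        double[][] c = multiply(a, b);
        System.out.println(c[0][0] + " " + c[0][1]); // 19.0 22.0
        System.out.println(c[1][0] + " " + c[1][1]); // 43.0 50.0
    }
}
```

This is exactly the shape of loop that parallelizes trivially; whether the translator does as well on loops with cross-iteration dependencies is the open question.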

Programs don't magically become faster when they are run on GPUs. Linear algebra algorithms, for example, work really well on CPUs, but if ported one-to-one to a GPU (as this would do), their performance is just abysmal. Usually one needs a specially designed algorithm that can actually exploit the massive parallelism of a GPU without getting stuck synchronizing or doing other kinds of communication. GPUs really like doing the same operation on independent data, which is basically what happens when rendering an image; they are not really designed for operations that need information from all the other data, or from neighbouring data in a grid. Just because something works on a GPU does not mean it's efficient, so the performance could be much worse on a GPU.

Also, balancing CPU and GPU usage is even harder (maybe impossible?), as you cannot predict what kind of system your software will run on. So usually, these days, the CPU feeds the GPU with data (with the current Tesla cards only 1 core per GPU; this changes to 32 in the Kepler generation) and does some processing that can't be done on the GPU, but the two do not share any kind of workload.
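The contrast the comment above draws can be shown in two tiny loops (a sketch for illustration only): an element-wise map touches independent data and is ideal for one GPU thread per element, while a prefix sum, as naively written, makes each element depend on the previous one, a serial chain that must be redesigned (e.g. as a parallel scan) before it runs well on a GPU.

```java
import java.util.Arrays;

public class AccessPatterns {
    // GPU-friendly: the same operation on independent elements.
    static void scale(double[] x, double factor) {
        for (int i = 0; i < x.length; i++) {
            x[i] *= factor; // no iteration reads another iteration's result
        }
    }

    // GPU-unfriendly as written: x[i] depends on x[i-1], so the
    // iterations form a serial dependency chain.
    static void prefixSum(double[] x) {
        for (int i = 1; i < x.length; i++) {
            x[i] += x[i - 1];
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4};
        scale(a, 2.0);
        System.out.println(Arrays.toString(a)); // [2.0, 4.0, 6.0, 8.0]
        prefixSum(a);
        System.out.println(Arrays.toString(a)); // [2.0, 6.0, 12.0, 20.0]
    }
}
```

A one-to-one translation of the second loop would run correctly on a GPU but gain nothing, which is the "works but isn't efficient" point made above.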

I don't know how the H.264 codec is structured, or whether encoding can see performance gains at all. However, I really doubt that x264 could just be ported, as it relies heavily on CPU-specific features (SSE, etc.), which is quite different from the much higher-level bytecode that Java produces.