Those of you who attended the South West show, or have read about our stuff on the news sites, will know that Geminus now has the ability to decode JPEGs for those apps which call SpriteExtend. The main reason for doing this is that I've always been frustrated by how slowly SpriteExtend decodes and renders large JPEGs.

If you have large images from digital cameras and you view them in applications like !Thump, !Paint, !Draw, !SwiftJPEG, or use them as backdrops, then you'll know that it takes many seconds for a full-screen JPEG to render. In fact it spends so long decoding and rendering the JPEG that audio playback will cease with apps like DigitalCD because of the way the OS is structured (callbacks do not occur during the rendering).

Geminus can render JPEGs about 3 times faster than the SpriteExtend code in RISC OS 5. The latest code can also transform/rotate JPEGs, which is something Acorn never implemented in their code:

(Redraw glitch - the white pixel run - included for free, to prove that this is the first (nearly-)working code!)

This feature is very nearly complete. Michael Drake has very kindly run hundreds of images through the decoder, and I've tested hundreds myself; there's one known glitch, to be fixed shortly. It should be available for all RISC OS machines, since the code is faster than SpriteExtend on the A9home and RISC OS 4.02 too, though the speed-up isn't as dramatic as on RISC OS 5, because RISC OS Ltd has already improved SpriteExtend's decoding.

Transformed/rotated JPEGs are new to all versions of RISC OS as far as I know.

Finally, a small disclaimer: Geminus hasn't been released for non-IYONIX machines previously (because the currently-released features require an NVIDIA PCI card), so I'll need to do a fair bit of testing to assure myself that it is properly stable on those machines too. It does work; in fact it's running on the A9home at the moment.

Last edited by admin on Wed Mar 22, 2006 6:14 pm, edited 2 times in total.

Running on RiscPC. JPEG feature only, now capable of running without the other Geminus features such as acceleration and emulated screen modes etc. Also implemented 16bpp code (until now only the 32bpp mode support was complete because that's the expected depth on the IYONIX pc). I'm not sure that 8bpp will ever be implemented, but we'll see. 8bpp is a lot more complicated, and a lot less useful for viewing JPEGs anyway.

<= 8bpp will be implemented, but only with a very simple algorithm; no support for dithering or error diffusion at the moment. I've decided to include this much so that something is visible for transformed/rotated JPEGs in <=8bpp modes because in this case we can't simply step aside and let SpriteExtend do the work.
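
To give a feel for what a "very simple" approach without dithering or error diffusion might look like, here is a minimal sketch that maps each pixel to its nearest palette entry. It is not the Geminus code: the function names and the four-entry palette are hypothetical, chosen only for illustration.

```python
def nearest_palette_index(rgb, palette):
    """Return the index of the palette entry closest to rgb (squared RGB distance)."""
    r, g, b = rgb
    best_index = 0
    best_dist = None
    for i, (pr, pg, pb) in enumerate(palette):
        dist = (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2
        if best_dist is None or dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def quantise_row(row, palette):
    """Map every pixel independently; no error is carried to neighbouring pixels."""
    return [nearest_palette_index(pixel, palette) for pixel in row]


# Hypothetical palette, purely for illustration (not a real RISC OS palette).
DEMO_PALETTE = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (255, 255, 255)]
```

Because each pixel is treated on its own, smooth gradients will band visibly; that is the price of skipping dithering.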

Right, all known issues fixed, I've frozen the code. Barring any major problems in absolutely-last-minute-final-final-testing, the JPEG code should be on sale very soon. I've tested it on everything from ARM610 through StrongARM to XScale, even ensuring that the code adapts itself to the CPU that it finds, so if it finds itself running on an ARM610/710 it won't try to use the ARMv4 long-multiply instructions, for example.
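
As a rough illustration of the "adapts itself to the CPU" idea, the sketch below picks a multiply routine once at start-up. Everything here is an assumption made for the example: the CPU names are matched as plain strings, `umull_native` merely stands in for a single long-multiply instruction, and the fallback builds the same 64-bit result from 16-bit partial products. The real code is ARM assembler and may well be organised differently.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical list: CPUs on which the long-multiply instructions are avoided.
CPUS_WITHOUT_LONG_MULTIPLY = {"ARM610", "ARM710"}


def umull_native(a, b):
    """Stand-in for one long-multiply instruction: returns (low word, high word)."""
    product = (a & MASK32) * (b & MASK32)
    return product & MASK32, product >> 32


def umull_fallback(a, b):
    """32x32 -> 64 multiply built from four 16x16 partial products."""
    a_lo, a_hi = a & 0xFFFF, (a >> 16) & 0xFFFF
    b_lo, b_hi = b & 0xFFFF, (b >> 16) & 0xFFFF
    lo_lo = a_lo * b_lo          # each partial product fits in 32 bits
    mid = a_lo * b_hi + a_hi * b_lo
    hi_hi = a_hi * b_hi
    product = lo_lo + (mid << 16) + (hi_hi << 32)
    return product & MASK32, (product >> 32) & MASK32


def select_multiply(cpu_name):
    """Choose the routine once, so the inner loops never test the CPU type."""
    if cpu_name in CPUS_WITHOUT_LONG_MULTIPLY:
        return umull_fallback
    return umull_native
```

Selecting the routine up front keeps the per-pixel path free of CPU checks, at the cost of maintaining two code paths.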

I do not, however, recommend trying to view 3MB panoramic JPEGs (that decompress to a whopping 440MB in 32bpp mode!) on your ARM610 RiscPC. It will work, but you may end up drinking a lot of tea waiting.
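
For a sense of scale, the arithmetic behind those figures can be checked quickly. This sketch assumes "MB" means 2**20 bytes; the image dimensions are not given, so only the total pixel count and the compression ratio are derived.

```python
MB = 1024 * 1024  # assumption: "MB" in the figures above means 2**20 bytes


def pixels_for(decompressed_bytes, bits_per_pixel):
    """Number of pixels that occupy the given number of bytes at this depth."""
    return decompressed_bytes * 8 // bits_per_pixel


def bytes_for(pixel_count, bits_per_pixel):
    """Size of the same image if it were held at a different depth."""
    return pixel_count * bits_per_pixel // 8


pixels = pixels_for(440 * MB, 32)         # roughly 115 million pixels
size_16bpp_mb = bytes_for(pixels, 16) / MB  # the same image at 16bpp
ratio = (440 * MB) / (3 * MB)             # uncompressed size over file size
```

So the test image is on the order of 115 million pixels, and the JPEG file is roughly 147 times smaller than the 32bpp bitmap it expands to.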

Interestingly, that particular (admittedly somewhat extreme) test image is unviewable on my 128MB Windows NT box because of the way that its software works, attempting to decompress to a bitmap before rendering anything. There is much to be said for the approach taken by Geminus and SpriteExtend, of decompressing on-the-fly. I can now scroll around the same image quite happily on a 128MB IYONIX with substantially less horse power.
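
The difference between the two approaches can be sketched as follows. This is only a toy model: `decode_strips` is a hypothetical stand-in that yields fake pixel data a few scanlines at a time, where a real decoder would produce strips of decoded JPEG blocks. The point is that the renderer never holds more than one strip, whatever the image height.

```python
def decode_strips(width, height, strip_rows=8):
    """Hypothetical decoder stand-in: yields (first_row, rows) one strip at a time."""
    for top in range(0, height, strip_rows):
        rows = min(strip_rows, height - top)
        # Fake pixel data: every pixel in a row holds that row's number.
        yield top, [[y] * width for y in range(top, top + rows)]


def render_on_the_fly(width, height, clip_top, clip_bottom, plot_row):
    """Plot rows inside [clip_top, clip_bottom); return the most rows held at once."""
    peak_rows = 0
    for top, strip in decode_strips(width, height):
        peak_rows = max(peak_rows, len(strip))
        for offset, row in enumerate(strip):
            y = top + offset
            if clip_top <= y < clip_bottom:
                plot_row(y, row)
        # The strip goes out of scope here; nothing accumulates.
    return peak_rows
```

A decode-to-bitmap viewer would instead need `width * height` pixels resident before plotting anything, which is what defeats a 128MB machine on a 440MB image.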

With modern processors being very fast computationally and relatively slower at I/O transfers, something that's true even of our somewhat computationally-challenged RISC OS machines, the actual JPEG decoding becomes almost free.

For example, the decoding logic in Geminus can decode and write 33MB of pixel data to the screen every second on an IYONIX pc. I've never achieved more than 58MB/s to screen just plotting a sprite using the CPU; a task that requires no computation, merely the transfer of pixel data. The OS sprite plotting code manages just 23MB/s.

Of course, avid readers of my scribblings (yes, both of you!) will know that using a DMA channel allows a significantly higher throughput; just over 80MB/s in fact. Obviously I wondered whether the DMA channel could be pressed into use when plotting JPEGs too, but the prototype failed to yield any significant performance increase.
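
To put those rates side by side, here is a small calculation of how long one full screen takes at each of the quoted figures. The 1600x1200 32bpp screen is a hypothetical example (no resolution is stated above), "MB" is assumed to mean 2**20 bytes, and 80 is used for the "just over 80MB/s" DMA figure.

```python
MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes

# Rates quoted above for the IYONIX pc, in MB/s.
RATES = {
    "Geminus JPEG decode and plot": 33,
    "best CPU sprite plot": 58,
    "OS sprite plot": 23,
    "DMA": 80,
}


def seconds_to_fill(width, height, bits_per_pixel, mb_per_second):
    """Time to write one full screen of pixel data at the given rate."""
    screen_bytes = width * height * bits_per_pixel // 8
    return screen_bytes / (mb_per_second * MB)


# Hypothetical screen size, for illustration only.
for name, rate in RATES.items():
    print(f"{name}: {seconds_to_fill(1600, 1200, 32, rate):.2f}s per screen")
```

On these numbers the JPEG path, decoding included, already beats the OS sprite plotter, which does no decoding at all.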

This may be because of the overheads involved in quickly adapting Geminus's existing DMA code to a task for which it was never intended. A more efficient approach would render the JPEG scanlines to a couple of fixed-size, known-location DMA buffers and quickly program the DMA channel to output to screen in parallel, without any need to translate logical to physical addresses. Time will tell whether I get around to trying this properly and achieving yet further speed increases. Scaled plotting, although faster than the RISC OS 5 SpriteExtend, still leaves quite a bit to be desired. We'll see....
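
The "couple of fixed-size buffers" idea is essentially ping-pong buffering, and its control flow can be sketched like this. It is untried as a real implementation: `FakeDMAChannel` is a hypothetical stand-in whose transfer only completes when `wait()` is called, which models a copy that is still in flight while the CPU decodes the next scanline.

```python
class FakeDMAChannel:
    """Hypothetical DMA stand-in: the copy to screen completes at wait()."""

    def __init__(self, screen):
        self.screen = screen
        self.pending = None

    def start(self, buffer, y):
        assert self.pending is None, "previous transfer still in flight"
        self.pending = (buffer, y)

    def wait(self):
        if self.pending is not None:
            buffer, y = self.pending
            self.screen[y] = list(buffer)
            self.pending = None


def plot_with_ping_pong(decode_scanline, height, width, dma):
    """Decode into one buffer while the other is being output."""
    # Two buffers allocated once; in the scheme described above their
    # physical addresses would be known in advance, so no translation is needed.
    buffers = [[0] * width, [0] * width]
    for y in range(height):
        buf = buffers[y & 1]
        decode_scanline(y, buf)  # CPU work overlaps the previous transfer
        dma.wait()               # previous scanline must be out before restarting
        dma.start(buf, y)
    dma.wait()                   # flush the final scanline
```

The buffer being refilled is always the one whose transfer finished on the previous iteration, so the CPU never overwrites data that is still being output.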