I don't have my profiling results handy. It was last year, so the hardware was not that old. So I don't really trust your results. I will believe that half-pel and the deblocking filter are high on the list, since they really suck up a lot of raw memory bandwidth. But Huffman decoding being that fast even when done badly, and YUV->RGB (720p?) in Java only using 5%? Even that's surprising (but perhaps not on faster modern CPUs).

What profiling tool did you use, and how long did you collect stats for? Note that I added some of my own timing code, since these days I am finding it hard to get an accurate profiler. Even then, I replace the "slowdown" areas with a no-op to see whether it really changes the timings.
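A minimal sketch of that kind of hand-rolled timing, assuming the decoder can be split into a per-frame stage you suspect and "the rest". `StageTimer`, `timeFrames` and `savedFraction` are hypothetical names, not anything in Jheora. It times a run with `System.nanoTime()`, then reruns with the suspect stage replaced by a no-op and reports how much of the total time disappeared.

```java
import java.util.function.IntConsumer;

/** Hypothetical helper (not part of Jheora): hand-rolled timing with a no-op swap. */
final class StageTimer {
    private StageTimer() {}

    /** Calls work.accept(frameIndex) for every frame; returns elapsed wall-clock nanoseconds. */
    static long timeFrames(int frames, IntConsumer work) {
        long start = System.nanoTime();
        for (int f = 0; f < frames; f++) {
            work.accept(f);
        }
        return System.nanoTime() - start;
    }

    /**
     * Times "rest + suspect" against "rest" alone (suspect replaced by a no-op)
     * and returns the fraction of the full run's time the suspect stage accounts for.
     */
    static double savedFraction(int frames, IntConsumer rest, IntConsumer suspect) {
        IntConsumer full = f -> {
            rest.accept(f);
            suspect.accept(f);
        };
        IntConsumer withoutSuspect = rest; // the suspect stage becomes a no-op

        // Warm-up passes so the JIT has compiled both variants before we measure.
        timeFrames(frames, full);
        timeFrames(frames, withoutSuspect);

        long withStage = timeFrames(frames, full);
        long withoutStage = timeFrames(frames, withoutSuspect);
        if (withStage <= 0) {
            return 0.0; // clock too coarse to say anything
        }
        return (withStage - withoutStage) / (double) withStage;
    }
}
```

If a profiler says a method takes 80% of the time but `savedFraction` comes back near 0.05, the stopwatch is the one to believe. Bear in mind that skipping a decoder stage can change the data later stages see, so treat the number as an estimate.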

I have tested it a few times already and the results stay the same. I am using the NetBeans built-in profiler. I recalibrated it and ran the decoder for 100 frames (since it was too slow to run for longer). Here's a screenshot of the results:

Yes, that's what I thought. The method that's taking all the time... is the method that calls the iDCT (hence DCT in the name). After scanning the code, I'd bet dollars to cents that the iDCT is really what's taking a lot of the time (along with the deblocking filter). Note that both can be moved to OpenGL for big speedups.

The problems I have had with profiling have been with the NetBeans and jvisualvm profilers. I know they do a bad job, because I can skip calling a method that takes 80% of the time according to some profiling results and the program doesn't speed up at all. Also, I think anything that runs for less than 2 minutes *does not* give a good picture of hot spots under the server VM.

But we will see.

** missed important words

I have no special talents. I am only passionately curious.--Albert Einstein

Just talked to some of the Theora guys on IRC. They suggested that profiling the full Cortado would give pretty messy results, since it's multithreaded with a bunch of complicated locks. Also, different bit rates would change where they expect the CPU to spend its time. What bit rate is the source we are talking about here?

Okay, so you guys don't like the NetBeans profiler. Can you suggest another one that is good and free? I would run the test again with it. I am just using the NetBeans one since it's easy to use and integrates with my project, and it seemed to work fine for everything I used it for. I found this post with profiling results for the Theora C version, and they seem similar to mine: http://osdir.com/ml/multimedia.ogg.theora.devel/2004-02/msg00078.html

Quote

The method thats taking all the time... is the method that calls the iDCT (Hence DCT in the name)

It doesn't look like a DCT to me. I found where it is used, in ExpandBlock, and the comment says this:

Quote

/* Fractional pixel reconstruction. */ /* Note that we only use two pixels per reconstruction even for the diagonal. */
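To make that comment concrete, here is a minimal sketch of two-pixel fractional-pixel prediction for one 8x8 block. The class and method names are hypothetical (this is not Jheora's `ExpandBlock` code), the truncating `(a + b) >> 1` average is an assumption about the rounding, and how the decoder picks the two source offsets from the motion vector is not shown. It does illustrate why this stage is bandwidth-heavy: every predicted pixel reads two reference pixels.

```java
/** Hypothetical sketch, not Jheora's ExpandBlock: two-pixel fractional-pixel prediction. */
final class HalfPel {
    private HalfPel() {}

    /**
     * Writes an 8x8 predicted block into dst by averaging two reference samples per pixel.
     * off1 and off2 are the top-left offsets of the two source blocks in ref, e.g.
     * off2 = off1 + 1 for a horizontal half-pel vector, off2 = off1 + stride for a vertical one,
     * and off2 = off1 + stride + 1 for a diagonal one (still only two pixels, as the comment says).
     */
    static void predict8x8(byte[] ref, int off1, int off2, int stride,
                           byte[] dst, int dstOff, int dstStride) {
        for (int y = 0; y < 8; y++) {
            int row1 = off1 + y * stride;
            int row2 = off2 + y * stride;
            int out = dstOff + y * dstStride;
            for (int x = 0; x < 8; x++) {
                int a = ref[row1 + x] & 0xFF; // bytes are signed in Java; mask to 0..255
                int b = ref[row2 + x] & 0xFF;
                dst[out + x] = (byte) ((a + b) >> 1); // truncating average (assumed rounding)
            }
        }
    }
}
```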

Quote

just talked to some of the theora guys on irc. It was suggested that testing the full Cortado would give pretty messy profiling results as its multi threaded with a bunch of complicated locks.

Okay, but I am not using Cortado itself; I am not allowed to use it since it's under the GPL. I am just using the Jheora library that comes with it. Also, I profiled with the video decode function as the root method, so even if those locks were there, their effects would not be included in the results.

He's using a Mac, and Java on the Mac probably doesn't optimize as well as Sun's, which might explain the differences. Also, like you said, maybe it's the profiler. I don't have the YourKit profiler, but I am going to get it tomorrow and test this again.

Okay so you guys don't like the NetBeans profiler, can you suggest me another one that is good and doesn't cost money?

Even with money. No. If you find one, let me know.

To put it simply, we are doing millions and billions of operations per second on a complicated three-tier cache system + instruction cache + branch prediction + out-of-order execution. Even changing the order of operations changes the performance. Adding profiling code *changes* the profile, and in Java, with the JIT compiling code on the fly, this is even worse. Basically, taking the measurement changes the measurement so much that the measurement is simply false. Like I said, the profiler claimed that 80% of the time was spent in a method, yet even with the method *commented out* the run time didn't change by more than 5%. IO is even harder, since slowing everything down with instrumentation code doesn't slow down the IO. So when profiling, IO looks many times cheaper, relative to everything else, than it really is.

The best way to use profilers (I use jvisualvm, hprof and -Xprof) is to compare their results and check them. I check by moving the problem around to make some things worse, i.e. if my OpenGL code is fill-limited, a higher resolution should make it go a lot slower...
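That "make it worse on purpose" check can be automated along these lines. This is a sketch under assumptions: `ScalingCheck`, `medianNanos` and `slowdown` are hypothetical names, and `work` stands in for whatever stage you suspect, taking a size such as a pixel count. If a stage really is fill- or bandwidth-limited, four times the pixels should make it roughly four times slower; if the ratio stays near 1, the bottleneck is somewhere else.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

/** Hypothetical helper: checks whether a stage's run time scales the way a hypothesis predicts. */
final class ScalingCheck {
    private ScalingCheck() {}

    // Results are accumulated here so the JIT cannot discard the work as dead code.
    private static volatile int sink;

    /** Median wall-clock nanoseconds of work.applyAsInt(size) over the given number of runs. */
    static long medianNanos(IntUnaryOperator work, int size, int runs) {
        long[] times = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink += work.applyAsInt(size);
            times[r] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[runs / 2];
    }

    /** Time at factor * baseSize divided by time at baseSize (after one warm-up call each). */
    static double slowdown(IntUnaryOperator work, int baseSize, int factor, int runs) {
        sink += work.applyAsInt(baseSize);          // warm-up
        sink += work.applyAsInt(baseSize * factor); // warm-up
        long base = Math.max(1, medianNanos(work, baseSize, runs));
        long scaled = medianNanos(work, baseSize * factor, runs);
        return scaled / (double) base;
    }
}
```

Using the median of several runs rather than a single timing keeps one GC pause or JIT recompilation from dominating the ratio.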

In this case we also have timing loops and locks.

The Theora/Jheora guys think that at low bit rates the iDCT won't be a problem, because you only have one or two non-zero coefficients. But at high bit rates they expect both the iDCT and Huffman decoding to hurt. Their profiling results, however, were showing a huge chunk of work in the YUV2RGB path, and that matches experience in both Java and C. In fact, the Firefox decoder is apparently using GLSL for YUV2RGB now.
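A scalar Java YUV->RGB loop looks roughly like this, which shows why the path is expensive without SIMD: several multiplies, shifts and clamps per pixel, for about 0.9 million pixels (1280 x 720) per 720p frame. This is a sketch assuming 4:2:0 planes with even dimensions and no row padding, using the common BT.601 limited-range integer coefficients; the class name is hypothetical and this is not the converter Jheora or Cortado actually ships.

```java
/** Hypothetical sketch: scalar 4:2:0 YUV to packed ARGB, BT.601 limited-range integer math. */
final class Yuv420ToRgb {
    private Yuv420ToRgb() {}

    private static int clamp(int v) {
        return v < 0 ? 0 : (v > 255 ? 255 : v);
    }

    /**
     * Converts a frame to 0xAARRGGBB ints. Assumes even width/height and tightly
     * packed planes (luma stride = width, chroma stride = width / 2).
     */
    static int[] convert(byte[] yPlane, byte[] uPlane, byte[] vPlane, int width, int height) {
        int[] argb = new int[width * height];
        int chromaWidth = width / 2;
        for (int j = 0; j < height; j++) {
            for (int i = 0; i < width; i++) {
                int c = (yPlane[j * width + i] & 0xFF) - 16;
                int ci = (j / 2) * chromaWidth + (i / 2); // one chroma sample per 2x2 luma block
                int d = (uPlane[ci] & 0xFF) - 128;
                int e = (vPlane[ci] & 0xFF) - 128;
                int r = clamp((298 * c + 409 * e + 128) >> 8);
                int g = clamp((298 * c - 100 * d - 208 * e + 128) >> 8);
                int b = clamp((298 * c + 516 * d + 128) >> 8);
                argb[j * width + i] = 0xFF000000 | (r << 16) | (g << 8) | b;
            }
        }
        return argb;
    }
}
```

Offloading exactly this per-pixel arithmetic to a fragment shader is what the GLSL approach buys.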

Note that one of the main reasons I expect the iDCT to be high is experience. The second is back-of-the-envelope calculations (bandwidth/FLOPS). The iDCT in C (asm, in fact) is fast because that's what MMX was designed for. Java, however, does not have this, so this is one area where "Java is slow" is in fact true (the same goes for YUV2RGB).
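For the back-of-the-envelope part: a straightforward separable 8x8 iDCT costs 8 x 8 x 8 = 512 multiply-adds per pass, so 1,024 per block. A 720p 4:2:0 frame has 1280 x 720 x 1.5 / 64 = 21,600 blocks, which at 24 fps is 518,400 blocks per second, or about 531 million multiply-adds per second if every block went through this naive version. Production decoders typically use a fast integer factorization that needs far fewer operations, and many blocks are skipped or DC-only, so treat that figure as an upper bound. The sketch below is a floating-point reference (hypothetical class name, not Jheora's iDCT) plus the DC-only shortcut that makes low bit rates cheap.

```java
import java.util.Arrays;

/** Hypothetical reference sketch: naive separable 8x8 iDCT (orthonormal), not Jheora's code. */
final class ReferenceIdct {
    private ReferenceIdct() {}

    // COS[x][u] = C(u)/2 * cos((2x+1) * u * pi / 16), with C(0) = 1/sqrt(2), otherwise 1.
    private static final double[][] COS = new double[8][8];

    static {
        for (int x = 0; x < 8; x++) {
            for (int u = 0; u < 8; u++) {
                double cu = (u == 0) ? Math.sqrt(0.5) : 1.0;
                COS[x][u] = 0.5 * cu * Math.cos((2 * x + 1) * u * Math.PI / 16.0);
            }
        }
    }

    /** in: coefficients in[v * 8 + u]; returns samples out[y * 8 + x]. 1,024 multiply-adds. */
    static double[] idct8x8(double[] in) {
        double[] tmp = new double[64];
        double[] out = new double[64];
        // Pass 1: 1-D iDCT of each row (u -> x): 8 rows * 8 outputs * 8 terms = 512.
        for (int v = 0; v < 8; v++) {
            for (int x = 0; x < 8; x++) {
                double s = 0;
                for (int u = 0; u < 8; u++) s += COS[x][u] * in[v * 8 + u];
                tmp[v * 8 + x] = s;
            }
        }
        // Pass 2: 1-D iDCT of each column (v -> y): another 512.
        for (int x = 0; x < 8; x++) {
            for (int y = 0; y < 8; y++) {
                double s = 0;
                for (int v = 0; v < 8; v++) s += COS[y][v] * tmp[v * 8 + x];
                out[y * 8 + x] = s;
            }
        }
        return out;
    }

    /** Shortcut when only the DC coefficient is non-zero: the block is flat, value dc / 8. */
    static double[] idctDcOnly(double dc) {
        double[] out = new double[64];
        Arrays.fill(out, dc / 8.0);
        return out;
    }
}
```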

So, is this 720p? What bit rate? And without profiling, do you get real time? Note that the C decoder uses less than 12% CPU for 720p24 on my two-year-old system.
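To answer "real time or not" without a profiler, it is enough to compare the average decode time per frame with the frame budget. A sketch, assuming you can call a single-frame decode in a loop (`decodeOneFrame` and the class name are hypothetical): at 24 fps the budget is 1000 / 24, about 41.7 ms per frame, so 12% of it corresponds to roughly 5 ms per frame. This measures wall-clock time on one thread, which is not the same as the CPU percentage an OS monitor reports on a multi-core machine.

```java
/** Hypothetical helper: compares measured decode time with the real-time frame budget. */
final class RealTimeBudget {
    private RealTimeBudget() {}

    /** Milliseconds available per frame at the given frame rate (24 fps -> about 41.7 ms). */
    static double budgetMillis(double fps) {
        return 1000.0 / fps;
    }

    /** Fraction of the frame budget used; 1.0 means exactly real time, above 1.0 means too slow. */
    static double load(double avgDecodeMillis, double fps) {
        return avgDecodeMillis / budgetMillis(fps);
    }

    /** Average wall-clock milliseconds per call; run this without a profiler attached. */
    static double averageMillis(Runnable decodeOneFrame, int frames) {
        long start = System.nanoTime();
        for (int i = 0; i < frames; i++) {
            decodeOneFrame.run();
        }
        return (System.nanoTime() - start) / 1e6 / frames;
    }
}
```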

java-gaming.org is not responsible for the content posted by its members, including references to external websites,
and other references that may or may not have a relation with our primarily
gaming and game production oriented community.
inquiries and complaints can be sent via email to the info‑account of the
company managing the website of java‑gaming.org