
radeon video acceleration

08-29-2009, 10:37 AM

I would like to know the status of the radeon driver for getting any kind of video processing offloaded from the CPU to the GPU. It seems kind of silly to have to have a high end CPU to do this kind of work when there is a perfectly good GPU sitting there doing nothing. I find no major problem in decoding videos up to 1280x720, but when trying to decode 1920x1080, the CPU gets pegged and the playback becomes quite choppy. This is with an X2-3800 and a Radeon 3650.

No decoding acceleration or motion compensation for the moment. Probably won't be until Gallium3D.

Comment

First, I guess I should make sure you are already making use of the existing video processing, ie going through the Xv interface to offload scaling, colour space conversion and filtering to the GPU. If you're not running with accelerated Xv today that should definitely be the first step.
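To make concrete what Xv actually offloads: colour space conversion is per-pixel arithmetic like the following. This is an illustrative fixed-point BT.601 sketch (the function name is invented, not code from any driver), showing the kind of work that is much better done on the GPU than on the CPU.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 range of an 8-bit channel. */
static uint8_t clamp8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v); }

/* Convert one video-range BT.601 YCbCr pixel to 8-bit RGB using
 * fixed-point coefficients scaled by 256.  Illustrative only. */
void ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
                  uint8_t *r, uint8_t *g, uint8_t *b)
{
    int c = (int)y - 16, d = (int)cb - 128, e = (int)cr - 128;
    *r = clamp8((298 * c + 409 * e + 128) >> 8);
    *g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8);
    *b = clamp8((298 * c + 516 * d + 128) >> 8);
}
```

Xv lets the X server run this conversion (plus scaling and filtering) on the video card for every pixel of every frame, which is why it should be the first thing to verify before worrying about decode acceleration.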

Re: offloading the remaining video processing ("decode acceleration"), there are two things that have to happen first:

1. Either developers need to be willing to write all of the acceleration code in a hardware-dependent way (as was done for EXA and Xv) or a suitable framework needs to be implemented.

2. A decision needs to be made about how to hook the acceleration code into the playback stack. This is the more significant obstacle IMO. There are a number of decode APIs which offer multiple entry points including ones which map well onto generic GPU capabilities (eg starting with MC) but I don't believe anyone has looked at modifying an existing decode stack to hook into one of those lower level entry points for HD decode.
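For context on what "starting with MC" means: motion compensation is essentially predicting each block from a reference frame at a motion-vector offset and adding a decoded residual. A toy sketch of that stage (illustrative only, not taken from any decoder):

```c
#include <stdint.h>

#define BLK 8  /* 8x8 block, as in MPEG-2 */

/* Predict an 8x8 block from the reference frame at (x+mvx, y+mvy) and add
 * the residual.  This is the stage an MC-level decode entry point would
 * hand off to the GPU, where it maps naturally onto texture sampling. */
void mc_block(const uint8_t *ref, int stride,
              int x, int y, int mvx, int mvy,
              const int16_t *residual, uint8_t *dst)
{
    for (int j = 0; j < BLK; j++) {
        for (int i = 0; i < BLK; i++) {
            int pred = ref[(y + mvy + j) * stride + (x + mvx + i)];
            int v = pred + residual[j * BLK + i];
            dst[j * BLK + i] = v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
        }
    }
}
```

An MC-level entry point leaves bitstream parsing, entropy decoding and IDCT on the CPU and moves only this prediction-plus-residual step (and later stages) to the GPU.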

It might seem that using a pre-existing slice-level API is the obvious approach, but that would mean implementing a lot of complex decode functionality in software in the driver, since the first implementations are likely to focus on what can readily be done with shaders, and that implies the dividing line between CPU and GPU work sits lower than slice level.

Given that, the approach that seems to make the most sense is to hook into an existing open source decode library and add hooks to either use an MC-level decode API or to add the shader code directly to the library using an API like Gallium3D. I haven't looked at the existing decode libraries to see how hard it would be to hook in an MC-level decode API (eg VA-API) but if that did turn out to be relatively clean (ie if the VA-API interface mapped cleanly onto the code in the decode library) then it might be feasible to implement something without waiting for Gallium3D.
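One way to picture "adding hooks" to a decode library: the library could expose a small table of MC-level callbacks for a driver to fill in, keeping its existing software path as a fallback. The names and structure below are invented for illustration and do not reflect the real VA-API interface:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MC-level hook table a decode library could expose.
 * A driver fills these in; unset hooks mean "use the software path". */
typedef struct mc_accel_ops {
    /* Upload the decoded residual coefficients for one macroblock. */
    int (*submit_residual)(void *ctx, int mb_x, int mb_y, const int16_t *coeffs);
    /* Queue a motion-compensated prediction from a reference surface. */
    int (*submit_mv)(void *ctx, int mb_x, int mb_y, int mvx, int mvy, int ref_idx);
    /* Flush queued macroblocks to the GPU and render the frame. */
    int (*end_frame)(void *ctx);
} mc_accel_ops;

/* Library-side dispatch: hand the macroblock to the hooks if present,
 * otherwise signal the caller to take the existing software path. */
static int decode_macroblock(const mc_accel_ops *ops, void *ctx,
                             int mb_x, int mb_y,
                             const int16_t *coeffs, int mvx, int mvy)
{
    if (ops && ops->submit_residual && ops->submit_mv) {
        if (ops->submit_residual(ctx, mb_x, mb_y, coeffs))
            return -1;
        return ops->submit_mv(ctx, mb_x, mb_y, mvx, mvy, 0);
    }
    return -1; /* no acceleration available: fall back to software */
}

/* Example stub backend that just counts submissions, standing in for a
 * real driver implementation. */
static int stub_calls;
static int stub_residual(void *ctx, int x, int y, const int16_t *c)
{ (void)ctx; (void)x; (void)y; (void)c; stub_calls++; return 0; }
static int stub_mv(void *ctx, int x, int y, int mvx, int mvy, int ref)
{ (void)ctx; (void)x; (void)y; (void)mvx; (void)mvy; (void)ref; stub_calls++; return 0; }
```

The point of the "is VA-API a clean fit" question above is exactly whether the library's internal data flow lines up with a hook table like this, or whether a lot of restructuring would be needed first.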

The "most likely to happen" approach is implementing decode acceleration over Gallium3D, since that provides a relatively vendor-independent low level interface for using the 3D engine. Once the "classic mesa" implementation for 6xx/7xx 3D is stabilized I think you will see focus shift almost immediately to porting that code across to a Gallium3D driver. This approach (implementing Gallium3D first then building decode acceleration on top) is what most of the community developers seem to be favoring today.

HW information to implement shader-based decode acceleration has been available for ~9 months on 6xx/7xx and ~18 months for earlier GPUs, so it's probably fair to say this is not a top priority for other users interested in becoming developers. In the meantime, if you have a multicore CPU there are multithreaded implementations of the current decode stack available and they seem to help a lot.
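On those multithreaded decode stacks: the usual model (assumed here for illustration, not actual decoder code) is that independent slices of a frame are farmed out to worker threads, so a dual-core CPU can roughly halve per-frame decode time. A toy sketch of the pattern:

```c
#include <pthread.h>
#include <stdint.h>

#define WIDTH   64
#define HEIGHT  32
#define NSLICES 4  /* one worker per slice */

struct slice_job {
    uint8_t *frame;
    int first_row, last_row;   /* rows [first_row, last_row) */
};

/* Stand-in for real slice decoding: each worker fills its own rows.
 * Slices touch disjoint rows, so no locking is needed. */
static void *decode_slice(void *arg)
{
    struct slice_job *job = arg;
    for (int y = job->first_row; y < job->last_row; y++)
        for (int x = 0; x < WIDTH; x++)
            job->frame[y * WIDTH + x] = (uint8_t)y;
    return NULL;
}

/* Split the frame into NSLICES independent jobs and join the workers. */
void decode_frame_mt(uint8_t *frame)
{
    pthread_t tid[NSLICES];
    struct slice_job jobs[NSLICES];
    int rows = HEIGHT / NSLICES;
    for (int s = 0; s < NSLICES; s++) {
        jobs[s].frame = frame;
        jobs[s].first_row = s * rows;
        jobs[s].last_row = (s + 1) * rows;
        pthread_create(&tid[s], NULL, decode_slice, &jobs[s]);
    }
    for (int s = 0; s < NSLICES; s++)
        pthread_join(tid[s], NULL);
}
```

Real H.264 streams complicate this (slices are not always independent, and frame-level threading adds latency), which is why the multithreaded branches help "a lot" rather than scaling perfectly.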

Comment

I haven't looked at the existing decode libraries to see how hard it would be to hook in an MC-level decode API (eg VA-API)

Well, I actually meant XvMC with the "MC" I mentioned earlier. I got the impression there already is a state tracker for it in Gallium3D, and we just need a working r600g driver to tap into it. ^^ Actual decoding acceleration over VA-API or whatever else will require quite a lot more work though...

Comment

Definitely I'm using Xv. Not a newb here.
I assume that by "multithreaded implementations of the current decode stack" you are referring to ffmpeg-mt. I have had a look at that, and it did help, but at this point I've had to resort to dropping the $15 for a CoreAVC license. Even with that it's still struggling, but at least the video is watchable.

I have to admit that most of your post went way over my head. I am a computer engineer myself, but I have no experience at all in graphics driver or video processing development. From what I can gather, though, it seems there is a while to wait yet.


Comment

Are you saying we might see UVD, i.e. bitstream acceleration in opensource drivers?

I'm saying "we don't know yet, so assume the answer is no unless/until you hear otherwise". In the meantime, decode acceleration with shaders is moving ahead. Even if we opened up UVD tomorrow we would need shader-based decode acceleration anyways, since only the more recent GPUs (everything after the original HD2900) include the UVD block.

I recall there being some limitations on XvMC. Going straight to what I care about and need the stack to provide (note: according to reports on the web, VDPAU with nvidia does this): postprocessing of the decoded video frames, needed to support current mplayer's implementation of subtitles, OSD, etc. Does XvMC even allow this?

I'm pretty sure that all of those features existed before VDPAU came along, and that code exists to implement them using existing APIs such as OpenGL.

XvMC has all kinds of limitations including being designed around MPEG-2 standards -- the reason for doing XvMC first is simply because a lot of the code is already there. This allows the developers to concentrate on getting a Gallium3D driver working to complete the stack. Once XvMC-over-Gallium3D is running the GPU-specific work will be largely done, and support for other APIs and video standards will be much easier to add.

The fad now is mobility: how does the power draw compare when using UVD versus using shaders? Well, the library is a wrapper for various implementations, and we already know nvidia's implementation (mostly) works. We're just THAT eager to see other implementations, working on hardware unaffected by Bumpgate.

The quick answer is "we'll know for sure when the code is written", but I expect shader-based decode will use more power and CPU than UVD-based decode. The important question is whether it will use enough extra power to really matter for most users, and I suspect the answer is "no".

You make a good point here. You shouldn't spend more than 50 bucks if all you want is to watch HD content. I think the problem is with people who spent 150 or more and want to get the most out of their hardware.

I have been recommending something a bit more powerful than the very low end products to make sure the GPU has enough shader power for decode acceleration, ie going for something like an HD2600/HD3650 just to be safe -- at least until the shader-based decode stack is running well for most users.

The rv710 has 2X the shader power of the rv610/620, so that advice may no longer be relevant.

Comment
