For more information, join the team or subscribe to the mailing list at the bottom of the Launchpad page:
http://launchpad.net/~hybrid-graphics-linux
If you are new, please join the team by clicking on the "Join Team" link on the right of the Launchpad page. It's important to have as many users in the community as possible in order to request appropriate support.

Friday, 26 March 2010

Write a VDPAU state tracker for Gallium. In Gallium, state trackers implement APIs and generate card-independent shaders. This allows supporting multiple cards with a single piece of code. As part of Summer of Code 2008, a student successfully implemented g3dvl, a video layer for decoding XvMC on top of Gallium. This project separated the video decoding code into two parts: a common vl part, and an XvMC frontend. The result allowed hardware-accelerated MPEG-2 playback on all Gallium-supported hardware. The purpose of this project is to build upon this code and add H.264 and VDPAU support. This requires improving the vl code for features that differ between MPEG-2 (H.262) and H.264, and adding a new VDPAU frontend. Furthermore, not all H.264 features can be implemented using shaders (such as CABAC), so integrating CPU-based and GPU-based acceleration stages together will be another challenge. Ultimately, the purpose of this project is the production of a fully open source chain for hardware video decoding on any Gallium-supported GPU.

Open Source PRIME multi-gpu support

This SoC project involves taking the nvidia Optimus technology and implementing the same technology in the open source driver stack. Some work has already been done to flesh this out in http://airlied.livejournal.com/71734.html, but more work needs to be done on integrating it into the kernel and X.org stacks, along with some client-side method of picking which apps should be started where. The possibility of rendering 2D apps could also be investigated.

As stated in the description, David Airlie has already done a bit of work in this area, so the time is now ripe for an enthusiastic student to apply for the project and spend time experimenting with Linux/Xorg code to implement cool new Linux features for the newest line of GPU hardware on the market. All this while earning good money from Google! Who said open-source software doesn't pay?!

If you are an enthusiastic student interested in hybrid graphics features in Linux, contact David now! If you are not a student but know someone who would be interested, spread the word :-) If you are none of the above, but would like this project to happen, write your encouraging words as a comment to this post!

We are looking for Linux users with Nvidia Optimus-enabled laptops willing to provide debugging information for Open Source PRIME multi-gpu support features being worked on. Please join the team and send an email to the mailing list specifying your laptop model.

You can check the model, version and graphics card details of your laptop with this command:

Friday, 12 March 2010

It seems that NVIDIA Optimus support for Linux is picking up pace, and David Airlie has started to work on some proof-of-concept code for multi-GPU rendering in Linux. Since Phoronix posted about it first, we link to their post here, followed by David's original post:

Last month NVIDIA introduced Optimus as a way for dual-GPU notebooks to seamlessly switch between the two GPUs and to offload the rendering workload to the other graphics processor. This is somewhat similar to NVIDIA's SLI and ATI/AMD's CrossFire for splitting the rendering workload across multiple GPUs, but it has its differences. David ended up developing a proof-of-concept similar to NVIDIA's Optimus that he is calling "Prime" and it works with Intel and ATI GPUs.

David's goals with Prime are to allow a second GPU to render 3D applications onto the screen of the first GPU, with it being configurable by the client, and just to handle the rendering side. This work isn't as simple as his vga_switcheroo implementation, as it required changes to the Linux kernel and the Graphics Execution Manager (GEM), the DRI2 protocol, the X Server and DRI2 modules, and then the actual Linux hardware drivers.

All of this code has already been published as a proof-of-concept, but David shares on his blog that he's unlikely to personally take this work further by upstreaming the code. He has been successful though in using this code to offload the rendering work from an Intel IGP that's driving a display to a discrete ATI graphics processor.

Right now Intel and ATI hardware is supported, but NVIDIA GPUs could be supported too. This work depends upon a system using DRI2 (albeit with these out-of-tree patches) and a compositing manager must be running. David also shares, "To make this as good as Windows we need to seriously re-architect the X server + drivers. At the moment you can't load an X driver without having a screen to attach it to, I don't really want a screen for the slave driver, however I still have to have one all setup and doing nothing and hopefully not getting in the way. We'd need to separate screen + drivers a lot better. Having some sort of dynamic screens would probably fall out of this work if someone decides to actually do it."

It would be wonderful if this work on Prime could be continued and work its way upstream, or if someone takes the reins from David to continue on with this GPU offloading work for open-source drivers. First, though, it may make more sense to focus on getting decent performance out of a single GPU before dealing with multi-GPU excitement.

THIS IS A PROOF OF CONCEPT - it's not going to be upstream unless someone else dedicates their life to it (btw anyone know anyone in ASUS?)

So NVIDIA unveiled their Optimus GPU selection solution for Windows 7, so I decided to see what it would take to implement something similar under DRI. I've named it PRIME for obvious reasons.

Goals:
1. Allow a second GPU to render 3D apps onto the screen of the first, pickable from the client side.
2. Just target the rendering side; I'm assuming the GPU power up/down is similar to what was done for the older switching method.

Restrictions + limitations:
1. Must have compositing manager running
2. Must have second screen configured for slave card (doesn't need to be used)

The kernel requirements were simple, we needed a way to share a memory managed object between two kernel device drivers. The kernel has a GEM namespace per device, however this isn't good enough to share with other devices, so I introduced a new PRIME namespace with two ioctls. One ioctl allows the master device to associate a device buffer handle with a name in the prime namespace, and the other allows the slave device to associate a prime namespace handle with a buffer. When the master creates a prime buffer the kernel associates the list of pages with the handle, and when the slave looks up the same handle it retrieves the list of pages and fakes up a TTM buffer populated with those pages as backing store. I've added the concept of a slave object to TTM to allow for this.
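
To make the two-ioctl flow concrete, here is a minimal C sketch of what such a naming interface could look like. All struct, field and ioctl names below are hypothetical illustrations of the paragraph above, not the actual interface from the proof-of-concept patches (nor the dma-buf based PRIME ioctls that later went upstream).

    /* Hypothetical sketch: the master names a GEM buffer in a global PRIME
     * namespace, and the slave resolves that name into a local handle backed
     * by the same pages. Names and ioctl numbers are invented. */
    #include <stdint.h>
    #include <sys/ioctl.h>

    struct drm_prime_name_buffer {
            uint32_t handle;   /* in:  GEM handle on the master device */
            uint32_t name;     /* out: name in the global PRIME namespace */
    };

    struct drm_prime_open_buffer {
            uint32_t name;     /* in:  PRIME namespace name from the master */
            uint32_t handle;   /* out: slave-side handle; TTM fakes up a
                                *      "slave object" backed by the master's
                                *      page list */
    };

    #define DRM_IOCTL_PRIME_NAME_BUFFER  _IOWR('d', 0xb0, struct drm_prime_name_buffer)
    #define DRM_IOCTL_PRIME_OPEN_BUFFER  _IOWR('d', 0xb1, struct drm_prime_open_buffer)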

From the X server point of view, a recent change to the DRI2 layer allowed for multiple device driver names to be associated with a DRI2 endpoint. The client can request either a DRI or VDPAU device name currently. I firstly extended the DRI2 protocol to add a new buffer type, called PRIME, and added a hack to mesa's glx loader to request the prime driver if an environment variable was specified.
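
On the client side the loader hack can be as small as an environment-variable check; the sketch below only guesses at its shape, and the variable name and driver-type strings are assumptions rather than the actual Mesa code.

    #include <stdlib.h>

    /* Hypothetical sketch of the Mesa GLX loader hack: if the user sets an
     * environment variable, ask the X server for the PRIME driver name
     * instead of the normal DRI2 driver for that screen. */
    static const char *
    dri2_driver_type_to_request(void)
    {
            if (getenv("LIBGL_PRIME") != NULL)   /* variable name is an assumption */
                    return "prime";              /* new DRI2 PRIME driver type */
            return "dri";                        /* default behaviour */
    }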

This was the messiest bit and still requires a lot of change. First up I added an interface for the drivers to register as PRIME master and slaves. Intel driver registers as master, radeon as slave for my demo. We store these in an array. When a client connects and requests the prime driver, we mark the drawable and redirect the dri2 buffer creation requests to the slave screen driver. Also the drm authentication is sent to both kernel drms. It then hooks the swapbuffers command where it does a region copy, and redirects this to the slave driver, and damages the pixmap in the master driver. Now the "interesting" part: my original implementation simply grabbed the window pixmap at the dri2 create buffers time, however there is an ordering issue with compositing; this pixmap is pre-composite redirection so isn't actually the pixmap you want to tell the kernel to bind to both gpus. This turned out to function badly, I could see gears all stretched over the front buffer.
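
A rough idea of the master/slave registration described above, with invented names (the real interface in the proof-of-concept patches may look quite different):

    /* Hypothetical sketch of a PRIME registration array in the X server. */
    #define PRIME_MAX_DEVICES 2

    typedef struct PrimeScreen {
            int screen_index;  /* X screen the driver attached to */
            int is_master;     /* 1 = master (Intel), 0 = slave (radeon) */
    } PrimeScreen;

    static PrimeScreen prime_screens[PRIME_MAX_DEVICES];
    static int prime_screen_count;

    /* Called by a DDX driver at screen init time to declare its PRIME role. */
    static void
    prime_register_screen(int screen_index, int is_master)
    {
            if (prime_screen_count < PRIME_MAX_DEVICES) {
                    prime_screens[prime_screen_count].screen_index = screen_index;
                    prime_screens[prime_screen_count].is_master = is_master;
                    prime_screen_count++;
            }
    }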

So a quick coke + chocolate break later, I had enough sugar to bash out the hack that now exists. DRI2 calls the slave driver copy region callback, which checks if the drawable pixmap is on the same screen; if it's not, it checks if we've marked the pixmap as a prime pixmap (i.e. one that belongs to the master). If it is, it swaps in the slave's copy, otherwise it calls back into DRI2. This callback calls the Intel driver to make the buffer object backing the pixmap shareable, and returns the handle, then calls into radeon with the handle to create a new pixmap pointing at the shared buffer object. Once all that is done, radeon copies the back buffer to the shared front pixmap, we return, damage is posted and the compositor grabs the window pixmap and displays it.
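
Condensed into C, the decision path in the slave driver's copy-region hook might look roughly like this; every type and function name here is a stand-in invented for illustration, not the real X server or DDX API:

    typedef struct Pixmap   Pixmap;    /* stand-ins for the real X server types */
    typedef struct Drawable Drawable;

    extern Pixmap *drawable_pixmap(Drawable *draw);
    extern int     pixmap_screen(const Pixmap *pix);
    extern int     pixmap_is_prime(const Pixmap *pix);     /* marked when the prime client connected */
    extern Pixmap *prime_slave_pixmap(const Pixmap *pix);  /* slave's copy of a shared pixmap */
    extern void    dri2_prime_share(Drawable *draw);       /* master makes its BO shareable,
                                                              slave wraps it in a new pixmap */

    /* Called from the slave driver's copy region callback: pick the pixmap
     * the slave (radeon) should copy the back buffer into. */
    static Pixmap *
    slave_copy_target(Drawable *draw, int slave_screen)
    {
            Pixmap *pix = drawable_pixmap(draw);

            if (pixmap_screen(pix) == slave_screen)
                    return pix;                        /* ordinary same-screen drawable */

            if (pixmap_is_prime(pix))
                    return prime_slave_pixmap(pix);    /* already shared: use the slave copy */

            dri2_prime_share(draw);                    /* share the master's buffer object */
            return prime_slave_pixmap(pix);            /* then use the newly created copy */
    }

After the copy, damage is posted on the master's pixmap so the compositor picks up the new frame, as described above.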

So does it work? On my blistering fast test system with X + xcompmgr running, glxgears was going at 150fps from the r200 PCI card. Hopefully I can get some time on a faster system or one of the dual laptops.

Caveats:
- When a window manager is running the gears get all corrupted; this looks like the clipping and/or stride matching between the drivers isn't correct. I suspect something with reparenting and decorations, I'm not enough of an X guru to understand this yet, hopefully one of the other hackers can fill me in. Also before it gets reparented and redirected a frame can land on the real front buffer, again clipping should take care of this, but isn't working yet. I need to work out how clipping and that stuff works in X/DRI2. - talk to ppl about clipping then JDI.
- Once a client has connected as a prime, we don't tear it down properly, so later clients can end up marked as prime. - work out some sort of resources to turn stuff off
- Reference counting on the pages in the kernel is iffy, currently i915 ups the page list refcount but never drops it. - solution JDI
- hardcoded /dev/dri paths in dri2 for slave device - solution JDI
- radeon driver could in theory be a prime master - solution JDI
- nouveau could support prime master/slave also. - solution nouveau guys JDI
- requires an ugly second screen in xorg.conf to load the slave driver. Can we have a 0 sized screen or maybe a rootless second screen? - solution: rearchitect X server to allow drivers without screens (6m-1yr work)
- pageflipping needs to be hacked off in intel driver. - work out and then JDI

Where is the video? Once I get it working with a window manager on a useful machine I might do a video of two gears going.

Where now? Well this is a purely academic exercise so far; after a week of kernel fighting I decided to do something new and cool. To make this as good as Windows we need to seriously re-architect the X server + drivers. At the moment you can't load an X driver without having a screen to attach it to, I don't really want a screen for the slave driver, however I still have to have one all setup and doing nothing and hopefully not getting in the way. We'd need to separate screen + drivers a lot better. Having some sort of dynamic screens would probably fall out of this work if someone decides to actually do it.

The kernel bits aren't as ugly as I thought, but I'm not sure if upstreaming them is a good idea without the other bits. The refcounting definitely needs work, as does the cleanup when clients exit.

DRI2 needs some more changes, I might try and flesh it out a bit more and then talk to krh about a sane interface.

I'm probably going to get a forced task switch quite soon, so I might just get to having this running on a W500 or T500 before dropping it for 6 months, so if anyone wants a neat project to play with and has the hw, feel free to try and take this on.

ASUS, feel free to send me one of the real Optimus laptops and I'll get the nouveau guys hooked up and try and RE the nvidia DMA engine.

Thursday, 11 March 2010

Optimus is a way of taking over the processing of DirectX/OpenGL calls the moment they're made. Optimus works by leaving Intel's display driver to display the image on the screen while actively monitoring everything that is happening in relation to displaying that image. The library of application profiles inside nVidia's driver will automatically react and switch to the discrete GPU as soon as it detects an application where nVidia's GPU would do much better than the integrated graphics.

Using NVIDIA's Optimus technology, when the discrete GPU is handling all the rendering duties, the final image output to the display is still handled by the Intel integrated graphics processor (IGP). In effect, the IGP is only being used as a simple display controller, resulting in a seamless, flicker-free experience with no need to reboot. When less critical or less demanding applications are run, the discrete GPU is powered off and the Intel IGP handles both rendering and display calls to conserve power and provide the highest possible battery life.

The hardware component of Optimus-capable GPUs is the "Optimus Copy Engine", a pipeline parallel to the 3D Engine. What the Copy Engine does is take the finalized rendered frame created by the 3D Engine and copy the contents from on-board memory to system memory, which is then picked up by Intel's IGP and displayed on a frame-by-frame basis. The Optimus Copy Engine is a new alternative to traditional DMA (Direct Memory Access) transfers between the GPU framebuffer memory and the main memory used by the IGP.

In the Microsoft Windows world, Optimus technology leverages Windows 7's ability to allow two independent graphics drivers to be active at the same time. The standard Intel graphics driver is used along with the NVIDIA driver because both display adapters operate independently. Looking within the Windows Device Manager, you'll see two display adapters listed even if Optimus has turned the GPU off.

The Linux community now needs people who are going to figure out how to activate two graphics drivers at the same time, how to switch the mux between the integrated graphics and the discrete card, and, in nvidia/nvidia configurations, how to access the discrete ROM.

Although the open-source Nouveau community has been very active since their merge into the mainline kernel, nobody seems to have shown interest in getting Optimus hardware to work in Linux. From the users' point of view, we already have Linux users with Optimus laptops willing to provide useful debugging information via the hybrid-graphics-linux Launchpad group: https://launchpad.net/~hybrid-graphics-linux

Hopefully things will get better soon, and the usual lag it takes for Linux to implement features in the graphics world will be shorter than usual for Optimus-enabled laptops and desktops.

MSI is showing 2 Optimus notebooks. The new MSI F Series is designed to be professional, slim and powerful, and the FX400 and the FX600 both feature an NVIDIA GeForce 310M GPU with NVIDIA Optimus technology:

And in NVIDIA’s Cebit demo rooms we’re showing several of the above ASUS notebooks, plus 2 additional Optimus notebooks - the Medion Akoya P6622 (15.6”) with GeForce 310M and the recently announced Acer Aspire One 532g (10”) with next generation NVIDIA ION. Optimus is on a roll.

Wednesday, 3 March 2010

Explaining automatic graphics switching and the benefits thereof can be a somewhat dry affair. You have to tell people about usability improvements and battery life savings and whatnot... it's much more fun if you just take a nice big engineering board, strap the discrete GPU on its own card and insert an LED light for the viewer to follow. NVIDIA has done just that with its Optimus technology -- coming to a laptop or Ion 2-equipped netbook near you -- and topped it off by actually pulling out the GPU card when it wasn't active, then reinserting it and carrying on with its use as if nothing had happened. This was done to illustrate the fact that Optimus shuts down the GPU electrically, which is that little bit more energy efficient than dropping it into an idle state. Shimmy past the break to see the video.