Meta

Tag: CUDA

In this earlier posting, I had written about the fact that the project was risky, to switch from the open-source ‘Nouveau’ graphics drivers, which are provided by a set of packages under Debian / Linux that contain the word ‘Mesa’, to the proprietary ‘nVidia’ drivers. So risky, that for a long time I faltered at doing this.

Well just this evening I made the switch. Under Debian / Stretch – aka Debian 9, this switch is relatively straightforward to accomplish. What we do is to switch to a text-session, using <Ctrl>+<Alt>+F1, and then kill the X-server. From there, we essentially just need to give the command (as root):

apt-get install nvidia-driver nvidia-settings nvidia-xconfig

Giving this command essentially allows the Debian package-managers to perform all the post-install steps, such as black-listing the Nouveau drivers. One should expect that this command has much work as its side-effects, as it pulls in quite a few dependencies.

When I gave this command the first time, apt-get suggested additional packages to me, which I wrote down on a sheet of paper. And then I answered ‘No’ to the question of whether or not to proceed (without those), so that I could add all the suggested packages onto a new command-line.

(Update 05/05/2018 :

The additional, suggested packages which I mentioned above, offer the ‘GLVND’ version of GLX. With nVidia, there are actually two ways to deliver GLX, one of which is an nVidia-centered way, and the other of which is a generic way. ‘GLVND’ provides the generic way. It’s also potentially more-useful, if later-on, we might want to install the 32-bit versions as well.

However, if we fail to add any other packages to the command-line, then, the graphics-driver will load, but we won’t have any OpenGL capabilities at all. Some version of GLX must also be installed, and my package manager just happened to suggest the ‘GLVND’ packages.

Without OpenGL at all, the reader will be very disappointed, especially since even his desktop-compositing will not be running – at first.

The all-nVidia packages, which are not the ‘GLVND’ packages, offer certain primitive inputs from user-space applications, which ‘GLVND’ does not implement, because those instructions are not generically a part of OpenGL. Yet, certain applications do exist, which require the non-‘GLVND’ versions of GLX to be installed, and I leave it up to the reader to find out which packages do that – if the reader needs them – and to write their names on a sheet of paper, prior to switching drivers.

It should be noted, that once we’ve decided to switch to either ‘GLVND’- or the other- version of GLX, trying to change our minds, and to switch to the other version, is yet another nightmare, which I have not even contemplated so far. I’m content with the ‘GLVND’- GLX version. )

(Edited 04/30/2018 :

There is one aspect to installing up-to-date nVidia drivers which I should mention. The GeForce GTX460 graphics card does not support 3rd-party frame-buffers. These 3rd-party frame-buffer drivers would normally allow, <Ctrl>+<Alt>+F1, to show us not only a text-session, but one with decent resolution. Well, with the older, legacy graphics-chips, what I’d normally do is to use the ‘uvesafb’ frame-buffer drivers, just to obtain that. With modern nVidia hardware and drivers, this frame-buffer driver is incompatible. It even causes crashes, because with it, essentially, two drivers are trying to control the same hardware.

Just this evening, I tried to get ‘uvesafb’ working one more time, to no avail, just as it does work on the computer I name ‘Phoenix’. )

So the way it looks now for me, the text-sessions are available, but only in very low resolution. They only exist for emergencies now.

But this is the net result I obtained, after I had disabled the ‘uvesafb’ kernel module again:

So what this means in practice, is that I have OpenGL 4.5 on the computer named ‘Plato’ now, as well as having a fully-functional install of ‘OpenCL‘ and ‘CUDA‘, contrarily to what I had according to this earlier posting.

Therefore, GPU-computing will not just exist in theory for me now, but also in practice.

And this displays, that the graphics card on that machine ‘only’ possesses 224 cores after all, not the 7×48 which I had expected earlier, according to a Windows-based tool – no longer installed.

Two of the subjects which I like to blog about, are direct-rendering and Linux graphics drivers.

Well in This Earlier Posting, I had essentially written, that on the Debian 9 , Debian /Stretch computer I name ‘Plato’, I have the ‘Mesa’ Drivers installed, and that therefore, that computer cannot benefit from OpenCL, massively-parallel GPU-computing.

What may confuse some readers about this is the fact that elsewhere on the Internet, there is speak about ‘Nouveau’ Drivers, but less so about Mesa Drivers.

‘Mesa’, which I referred to, is a Debian set of meta-packages, that is all open-source. It installs several drivers, and selects the drivers based on which graphics hardware we may have. But, because ‘Plato’ does in fact have an nVidia graphics card, the Mesa package automatically selects the Nouveau drivers, which is one of the drivers it contains. Hence, when I wrote about using the Mesa Drivers, I was in fact writing about the Nouveau Drivers.

One of the reasons I have to keep using these Nouveau Drivers, is the fact that presently, ‘Plato’ is extremely stable. There would be some performance-improvements if I was to switch to the proprietary drivers, but making the transition can be a nightmare. It involves black-lists, etc..

Another reason for me to keep using the Nouveau Drivers, is the fact that unlike how it was years ago, today, those drivers support real OpenGL 3, hardware-rendering. Therefore, I’m already getting partial benefit from the hardware-rendering which the graphics card has, while using the open-source driver.

The only two things which I do not get, is OpenCL or CUDA computing capabilities, as Nouveau does not support that. Therefore, anything which I write about that subject, will have to remain theoretical for now.

I suppose that on my laptop ‘Klystron’, because I have the AMD chip-set more-correctly installed, I could be using OpenCL…

Also, ‘Plato’ is not fully a ‘Kanotix’ system. When I installed ‘Plato’, I borrowed a core system from Kanotix, before Kanotix was ready for Debian / Stretch. This means that certain features which Kanotix would normally have, which make it easier to switch between graphics drivers, are not installed on ‘Plato’. And that really makes the idea daunting, to try to switch…

One of the subjects which many programmers have been studying, is not only, how to write highly parallel code, but how to write that code for the GPU, since the GPU is also the most-readily-available highly-parallel processor. In fact, any user with a powerful graphics card, may already have the basis to program using CUDA or using OpenCL.

I had written an earlier posting, in which I ended up trying to devise a way, by which the compiler of the synthesized C or C++, would detect that each variable is being used as ‘rvalues’ or ‘lvalues’ in different parts of a loop, and by which the compiler would then choose, to allocate a local register, allocate a shared register, or to make a local copy of a value once provided in a shared register.

According to what I think I’ve learned, this thinking was erroneous, simply because a CUDA or an OpenCL compiler, does not take this responsibility off the hands of the coder. In other words, the coder needs to declare explicitly and each time, whether a variable is to be allocated in a local or a shared register, and must also keep track of how his code can change the value in a shared register, from other threads than the current thread, which may produce errors in how the current thread computes.

But, a command which CUDA offers, and which needs to exist, is a ‘__syncthreads()’ function, which suspends the current thread, until all the threads running in one core-group have executed the ‘__sycnthreads()’ instruction, after which point, they may all resume again.

One fact which disappoints about the real ‘__syncthreads()’ instruction is, that it offers little in the way of added capabilities. One thing which I had written this function may do however, is actually give the CPU a chance to run briefly, in a way not obvious to CUDA code.

But then there exist capabilities which a CUDA or an OpenCL programmer might want, which have no direct support from the GPU, and one of those capabilities might be, to lock an arbitrary object, so that the current thread can perform some computation which reads the object – after having obtained a lock on it – and which then writes changes to the object, before giving up its lock on it.

I am writing about the big graphics cards which power-users and gamers install into their PCs, which have a special bus-slot, and which cost as much money in themselves, as some computers cost.

The way those are organized physically, they possess one or more GPU, and DDR Graphics RAM, which loosely correspond to the CPU and RAM on the motherboard of your PC.

The GPU itself contains registers, which are essentially of two types:

Per-core, and

Shared

When coding shaders for 3D games, the GPU-registers do not fulfill the same function, as addresses in GRAM. The addresses in Graphics RAM typically store texture images, vertex arrays in their various formats, and index buffers, as well as frame-buffers for the output. In other words, the GRAM typically stores model-geometry and 2D or 3D images. The registers on the GPU are typically used as temporary storage-locations, for the work of shaders, which are again, separately loaded onto the GPUs, after they are compiled by the device-drivers.

A major feature which the designers of graphics cards have given them, is to extend the system memory of the PC onto the graphics card, in such a way that most of its memory actually has hardware-addresses as well.

This might not include the GPU-registers that are specific to one core, but I think does include shared GPU-registers.