After releasing OpenGL ES 3.1 on Monday, the Khronos Group today announced a handful of other specifications for 3D graphics and GPU computation, including WebCL.

Just as WebGL defines a JavaScript API for OpenGL ES 2.0 3D graphics, WebCL defines a JavaScript API for OpenCL 1.1 parallel computation. If WebCL receives the same kind of industry adoption that WebGL has, it will enable mobile developers to perform tasks such as physics calculations in games and image processing, all accelerated by the GPU or even by multiple CPU cores.

As with WebGL, WebCL is a fairly low-level API. It works the same way as OpenCL does on the desktop: OpenCL routines need to be written in OpenCL's special C-derived language. This will make it familiar to existing OpenCL developers and enable the reuse of existing OpenCL code, but it's unlikely to be something that traditional Web developers are familiar with.
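
For readers who haven't seen it, that kernel language really is just C with a few extra qualifiers and built-in functions. A trivial, illustrative kernel that adds two arrays element by element might look like the following; in WebCL, as in OpenCL, source like this is handed to the runtime as a string and compiled at run time:

    /* Illustrative OpenCL C kernel: element-wise addition of two float arrays.
       Each work-item handles the single element identified by get_global_id(0). */
    __kernel void vector_add(__global const float* a,
                             __global const float* b,
                             __global float* result,
                             const unsigned int count)
    {
        size_t i = get_global_id(0);
        if (i < count)               /* guard in case the global size was rounded up */
            result[i] = a[i] + b[i];
    }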

For WebGL, the solution has been twofold. First, high-level libraries such as three.js have been developed, making 3D programming more approachable. Second, systems such as Emscripten and asm.js have been used to enable traditional 3D programs (including those using Unreal Engine and, soon, Unity) written in languages such as C++ to be compiled in such a way that they will run directly within the browser.

Khronos anticipates that a comparable range of companion technologies will be developed to similarly make WebCL usable.

WebCL does not encompass the full OpenCL spec. Certain OpenCL features (relating to memory access and usage, for example) can have security implications, especially when used incorrectly. These features have been limited or removed in WebCL to help ensure that Web content cannot compromise system security.

Not everyone wants to write OpenCL code using the special OpenCL language. Competing GPU-based compute systems, including both Microsoft's C++ AMP and Nvidia's CUDA-compatible Thrust, offer a more streamlined development model, where GPU-based functions can be more or less seamlessly integrated with conventional CPU-based code written in high-level C++.

Aware of this, Khronos has produced SYCL, an API specification that gives C++ compiler developers the tools to integrate their libraries and compilers with OpenCL. With SYCL, compilers will be able to offer the same kind of single-source development, where CPU and GPU code live in the same source files, as found in C++ AMP.
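
As a rough sketch of what that single-source style looks like under the provisional SYCL spec (the header name, namespaces, and exact API shown here are illustrative and may change before the spec is finalized), the device code is just an ordinary C++ lambda living in the same file as the host code:

    #include <CL/sycl.hpp>   // header per the provisional spec; assumed here
    #include <vector>

    void scale(std::vector<float>& data, float factor) {
        using namespace cl::sycl;
        queue q;                                          // grab a default OpenCL device
        buffer<float, 1> buf(data.data(), range<1>(data.size()));
        q.submit([&](handler& cgh) {
            auto acc = buf.get_access<access::mode::read_write>(cgh);
            // The lambda below is the "GPU code": the SYCL toolchain extracts it
            // and builds an OpenCL kernel from it, so there is no separate kernel
            // source string to manage.
            cgh.parallel_for<class scale_kernel>(range<1>(data.size()),
                                                 [=](id<1> i) { acc[i] *= factor; });
        });
    }   // the buffer's destructor copies the results back into data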

The third and final spec announced today is EGL 1.5. EGL is a little-known specification that's quietly found a role on mobile platforms, including Android. The EGL API defines how software initializes its access to OpenGL ES and OpenCL and how it coordinates between OpenGL and the windowing/display environment. The new version improves interoperability between OpenGL and OpenCL and adds 64-bit support.
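
In practice, the EGL calls an application makes before it can issue any OpenGL ES commands look roughly like this (a minimal sketch; error handling and native-window surface creation are omitted):

    #include <EGL/egl.h>

    /* Minimal sketch: set up an OpenGL ES 2.0 context through EGL. */
    EGLContext create_gles2_context(void)
    {
        EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
        eglInitialize(dpy, NULL, NULL);                    /* start up EGL on this display */

        const EGLint config_attribs[] = {
            EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,       /* we want a GLES 2.0-capable config */
            EGL_NONE
        };
        EGLConfig config;
        EGLint num_configs;
        eglChooseConfig(dpy, config_attribs, &config, 1, &num_configs);

        const EGLint ctx_attribs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
        EGLContext ctx = eglCreateContext(dpy, config, EGL_NO_CONTEXT, ctx_attribs);

        /* A window surface would normally be created from a native window here and
           bound with eglMakeCurrent(dpy, surface, surface, ctx) before any drawing. */
        return ctx;
    }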

The two big desktop platforms, Windows and OS X, don't (currently) use EGL; instead, those platforms have their own extension APIs (WGL and CGL, respectively) to manage these tasks. The X Window System on Unix similarly has an extension called GLX; however, the new Wayland windowing system, which is being developed to replace X, uses EGL.

This sounds interesting. I wish I knew what the hell it means in layman's terms.

CPUs are good at the general-purpose tasks that applications require of them. GPUs are good at raw number crunching. Web JavaScript runs on the CPU. This is a proposed framework for letting Web JavaScript run on the GPU -- which would give web software greatly increased number-crunching abilities.

You would think that at this point in time, Intel would add instructions making CPUs more adept at whatever is being done on GPUs. I can understand a GPU being better at a matrix operation given the task of 3D imaging, but what in a GPU makes it good for crypto over a CPU?

> You would think that at this point in time, Intel would add instructions making CPUs more adept at whatever is being done on GPUs. I can understand a GPU being better at a matrix operation given the task of 3D imaging, but what in a GPU makes it good for crypto over a CPU?

SIMD/SIMT... meaning I can take a single set of instructions and run it over multiple hardware threads. So on Xbox One I could run the same routine on 768 cores, or 12 routines on 64 cores each. CPUs are more adept at handling control flow, all those if/then/else/switch-type statements where the program jumps around from line to line, whereas on a GPU you need to run all the instructions regardless.
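
A rough way to picture that: in a SIMT kernel like the illustrative one below, work-items in the same group that disagree about the branch end up executing both sides, with the non-matching lanes masked off, whereas a CPU core simply jumps over the untaken path.

    __kernel void classify(__global const float* in, __global float* out)
    {
        size_t i = get_global_id(0);
        /* Divergent branch: on a SIMT device, a group whose work-items take
           different paths here pays for both paths, one after the other. */
        if (in[i] > 0.0f)
            out[i] = in[i] * 2.0f;
        else
            out[i] = 0.0f;
    }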

> You would think that at this point in time, Intel would add instructions making CPUs more adept at whatever is being done on GPUs. I can understand a GPU being better at a matrix operation given the task of 3D imaging, but what in a GPU makes it good for crypto over a CPU?

There's a lot of it? Essentially, GPUs are good at massively parallel tasks that require the same operations to be done on a lot of elements without many branches. A GPU specializes by having tens to hundreds of small, limited cores that can handle these limited problems. The only way to make a CPU good at these problems is to make it a GPU, which would sacrifice its performance on the general computing tasks CPUs are good at. In between are many-core chips like the Xeon Phi, which use fewer, but still many, legitimate CPU cores to process more in parallel than a multicore CPU, but provide sub-Atom levels of single-thread performance.

> This sounds interesting. I wish I knew what the hell it means in layman's terms.

Guess it means it'll be easier to turn your rig into a Bitcoin bot?

More likely, for someone else to turn your rig into a Bitcoin miner.

Just imagine a Bitcoin bank that allows you to visit their site, set up an account, and leave your browser parked while you mine as part of a pool. Your bounty could go directly into your account with them, or be transferred to another wallet. If done well, the entire thing could be easy enough for your grandmother to use.

I'd love to see compilers smart enough to utilize GPGPU when it made sense, but I can't see it happening anytime soon. GPU hardware is great for doing massively parallel calculations, but its memory (or buffer) is a long way away from the CPU. Data transfers to and from GPU memory can take much longer than accesses to system memory and are less predictable. So someone who wants to write effective code needs to figure out how and when data ought to be transferred from the CPU/system RAM to the GPU, how to execute their kernel on this data, and then how and when to transfer the data back to the system. Things are further complicated when you potentially have multiple compute devices on the system, each of which might have its own characteristics.

It's actually fairly straightforward to find the available devices and their capabilities, shuttle data back and forth, and execute kernels on the GPU using OpenCL--no compiler extension is going to make it much simpler unless it can figure out how to execute all of those tasks in a near-optimal way for most systems without the programmer specifying the exact behavior. I think the problem probably won't become more tractable until GPUs become more integrated with the CPU. Certainly, I don't spend too much time worrying about exactly when the CPU is going to move stuff from one cache to the next--when current GPU capability looks more like SIMD than computation on a distant device, then it's reasonable to leave the work to the compiler, since the consequences of getting it not-quite-right aren't all that big. But today, even a good human programmer can't always guess the best strategy for shuffling data around without compiling, running, and testing the program. A compiler doesn't stand a chance of doing better.
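
For anyone curious, the device-discovery half of that really is short; the host-side calls look roughly like this (a sketch with no error checking; the header is <OpenCL/opencl.h> on OS X):

    #include <stdio.h>
    #include <CL/cl.h>

    /* Sketch: list every OpenCL device on the system with its global memory size. */
    void list_devices(void)
    {
        cl_platform_id platforms[8];
        cl_uint num_platforms;
        clGetPlatformIDs(8, platforms, &num_platforms);

        for (cl_uint p = 0; p < num_platforms; ++p) {
            cl_device_id devices[8];
            cl_uint num_devices;
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &num_devices);

            for (cl_uint d = 0; d < num_devices; ++d) {
                char name[256];
                cl_ulong mem;
                clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
                clGetDeviceInfo(devices[d], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(mem), &mem, NULL);
                printf("%s: %llu bytes of global memory\n", name, (unsigned long long)mem);
            }
        }
    }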

I was really intrigued by the Nvidia CUDA solution a number of years ago. The next year they changed the license such that it wouldn't work when a competing GPU was installed in the system. My Linux box at the time ran a cheap ATI card for video. It was fine, and I was looking at adding two Nvidia cards to focus on CUDA work only. Well, their change was good for me because it saved me buying their cards.

This has the potential to release the power of the GPU for some calculations in Web-based solutions. I would never allow such a thing to happen over the Internet, but an intranet solution has some possibilities.

> I was really intrigued by the Nvidia CUDA solution a number of years ago. The next year they changed the license such that it wouldn't work when a competing GPU was installed in the system. My Linux box at the time ran a cheap ATI card for video. It was fine, and I was looking at adding two Nvidia cards to focus on CUDA work only. Well, their change was good for me because it saved me buying their cards.
>
> This has the potential to release the power of the GPU for some calculations in Web-based solutions. I would never allow such a thing to happen over the Internet, but an intranet solution has some possibilities.

Just go with OpenCL. Vendor lock-in sucks way too hard (as you discovered) and OpenCL is much more widely applicable.

As of OS X v10.7, OpenCL developers can enqueue work coded as OpenCL kernels to Grand Central Dispatch (GCD) queues backed by OpenCL compute devices. You can use GCD with OpenCL to:

Investigate the computational environment in which your OpenCL application is running. Specifically, you can learn which devices in the system would be best for performing particular OpenCL computations and operations:

You can find out about the computational power and technical characteristics of each OpenCL-capable device in the system. See “Discovering Available Compute Devices.”

GCD can suggest which OpenCL device(s) would be best for running a particular kernel.

You can obtain recommendations about how to configure kernels. For example, you can get the suggested optimal size of the workgroup for each kernel on any particular device. See “Notes.”

Enqueue the kernel.

Synchronize work between the host and OpenCL devices and synchronize work between devices. Your host can wait on completion of work in all queues (see “Using GCD To Synchronize A Host With OpenCL”) or one queue can wait on completion of another queue (see “Synchronizing Multiple Queues”).

While this is still a ways from compiler-level optimization, I don't see that as being feasible (or even useful) at the compiler level, since different computing components on a system have different power requirements, and each compute unit has its own highly optimized ways of handling very specific data/instruction sets. I still think it's appropriate to do this type of logic at runtime and figure out which available compute devices are best to use in which situation (e.g., for a laptop, plugged in vs. running on battery). Having an OS-level service to handle this dispatching and queuing of computational units seems like a very elegant solution.
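
For reference, dispatching a kernel through a GCD queue on OS X looks roughly like the sketch below. It assumes a kernel named square in a .cl file that Xcode has compiled into the square_kernel wrapper (the kernel name and signature are illustrative), and it omits all error handling:

    #include <OpenCL/opencl.h>
    #include "square.cl.h"            // header Xcode generates from square.cl (assumed)

    void square_on_gpu(float* data, size_t n)
    {
        // Ask GCD for a queue backed by any available GPU.
        dispatch_queue_t queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_GPU, NULL);

        // Copy the input into memory the device can see.
        void* mem = gcl_malloc(sizeof(float) * n, data, CL_MEM_COPY_HOST_PTR);

        dispatch_sync(queue, ^{
            cl_ndrange range = { 1, {0, 0, 0}, {n, 0, 0}, {0, 0, 0} };
            square_kernel(&range, (cl_float*)mem, (cl_float*)mem);   // run in place
            gcl_memcpy(data, mem, sizeof(float) * n);                // copy results back
        });

        gcl_free(mem);
        dispatch_release(queue);
    }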

> The two big desktop platforms, Windows and OS X, don't (currently) use EGL; instead, those platforms have their own extension APIs (WGL and CGL, respectively) to manage these tasks. The X Window System on Unix similarly has an extension called GLX; however, the new Wayland windowing system, which is being developed to replace X, uses EGL.

Maybe so, but in general, nobody writing for Windows (or OS X) is writing GL ES, but rather full OpenGL. As such, most Windows devs have to do the stupid double context creation to use OpenGL with all the extensions. I wouldn't be entirely surprised to see EGL eventually extended to make it a better fit for these platforms.