This patch makes the cl_nv_device_attribute_query extension work with the
latest NVIDIA CUDA Toolkit 3.1. Tested on Ubuntu 10.04.
Currently, the inclusion of the extension header is commented out. In any
case, I have used the cl_ext.h header provided by the official NVIDIA
installer labeled "CUDA Toolkit for Ubuntu Linux 9.10".
All the best,
Paolo
--
Paolo Simone Gasparello

Hi all,
PyOpenCL finally has its own numpy.ndarray work-alikes. This has just
landed in the git tree. The code is a port of PyCUDA's equivalent
facilities, and like that code, it is built on machinery to evaluate
arbitrary element-wise expressions. (See [1] for a description.)
I will add documentation as soon as I have time; for now, you may read
the source in pyopencl/array.py. Tests are available in
test/test_array.py.
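The element-wise machinery is essentially a kernel-source generator: you give it an argument list and a per-element statement, and it emits the OpenCL C for you. A minimal sketch of the idea (this is NOT the actual pyopencl.elementwise API; all names below are hypothetical illustrations):

```python
# Minimal sketch of element-wise kernel generation, in the spirit of
# pycuda.elementwise / pyopencl.array. NOT the actual PyOpenCL API;
# the function and kernel names here are made up for illustration.

def make_elementwise_source(arguments, operation, name="elwise_kernel"):
    """Generate OpenCL C source for an element-wise operation.

    arguments: C-style argument list, e.g. "float *z, float *x, float *y"
    operation: per-element statement, e.g. "z[i] = x[i] + y[i]"
    """
    return """
__kernel void %(name)s(%(arguments)s, const unsigned int n)
{
    // one work-item per element, guarded against overshoot
    unsigned int i = get_global_id(0);
    if (i < n)
        %(operation)s;
}
""" % {"name": name, "arguments": arguments, "operation": operation}

src = make_elementwise_source("float *z, float *x, float *y",
                              "z[i] = x[i] + y[i]", "add_vectors")
```

Feeding the generated source to a program build then gives you one compiled kernel per distinct expression.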
Unrelatedly, I'm not sure if I mentioned this: Since last Friday,
PyOpenCL git supports and wraps OpenCL 1.1 if it is available at compile
time. (It will of course still compile against OpenCL 1.0
implementations.)
Looks like we'll have a pretty packed 0.92 release. :) I'll cook up a
release candidate soon.
Andreas
[1] http://documen.tician.de/pycuda/array.html#module-pycuda.elementwise

On Sat, 26 Jun 2010 12:30:17 -0600, Cyrus Omar <cyology(a)gmail.com> wrote:
> I think it makes more sense to force both global and local size to be
> keyword arguments, as folks are used to the order of positional
> arguments in a call being the same as those in a definition and it's
> not clear on first glance why e.g. a.shape is being passed instead.
> Similarly with queue, these are sort of meta-arguments and should
> probably be distinguished as such. This seems more clear:
>
> sum(a, b, dest, queue=queue, global_size=a.shape, local_size=(256,))
>
> Perhaps shorter names could be used (gsize, lsize, q) to minimize
> typing? Or I wrote some extensions that implicitly pass a default
> queue around and allow kernels to specify how to calculate sizes in
> their definitions to mostly eliminate the issue as well.
I'm -1 on your proposal, since it's a lot of typing.
Here's a counterproposal, partially based on how the C++ wrapper does
things:
bound_sum = sum.bind(queue, None, (256,))
bound_sum(a, b, dest, global_size=a.shape)
I.e. create an instance of a 'BoundKernel' type, which represents a
binding of a kernel to a queue, and which may also hold defaults for
local and global size, all of which *can* be overridden by kwargs at
invocation time. This provides the clear separation of meta- and actual
arguments that you're after.
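To make the counterproposal concrete, here is a rough pure-Python sketch of what such a 'BoundKernel' might look like (illustrative only; the names and call signature are made up, not an actual PyOpenCL API):

```python
# Rough sketch of the proposed BoundKernel idea. Illustrative only:
# names and signature are made up, not the actual PyOpenCL API.

class BoundKernel(object):
    """Binds a kernel to a queue, with optional default sizes.
    Both sizes *can* be overridden by kwargs at invocation time."""

    def __init__(self, kernel, queue, global_size=None, local_size=None):
        self.kernel = kernel
        self.queue = queue
        self.global_size = global_size
        self.local_size = local_size

    def __call__(self, *args, **kwargs):
        # kwargs override the bound defaults
        gsize = kwargs.pop("global_size", self.global_size)
        lsize = kwargs.pop("local_size", self.local_size)
        return self.kernel(self.queue, gsize, lsize, *args, **kwargs)


# Stub "kernel" that records how it was invoked, to show the mechanics:
def fake_kernel(queue, gsize, lsize, *args):
    return (queue, gsize, lsize, args)

bound_sum = BoundKernel(fake_kernel, "queue0", local_size=(256,))
result = bound_sum("a", "b", "dest", global_size=(1024,))
# result == ("queue0", (1024,), (256,), ("a", "b", "dest"))
```

The point is that the queue and the sizes live on the binding object, so the actual invocation carries only the kernel's real arguments.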
Opinions? (Just to be clear--whatever gets decided here will be
backward-compatible, although it might deprecate current behavior.)
Andreas
PS: It'd be good if we had a consensus on this by Tuesday, because
that's when I'll teach the pyopencl tutorial at SciPy'10. :)

I was trying to build PyOpenCL on my default Snow Leopard install.
I booted into 64-bit mode.
Built Boost as in the how-to.
Built PyOpenCL as in the how-to.
Everything seems to compile and run fine, but upon importing pyopencl in my
Python interpreter I get the familiar:
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyopencl
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.6/site-packages/pyopencl/__init__.py", line 3, in
<module>
import pyopencl._cl as _cl
ImportError: dlopen(/Library/Python/2.6/site-packages/pyopencl/_cl.so, 2):
Symbol not found: __ZN5boost6python17error_already_setD1Ev
Referenced from: /Library/Python/2.6/site-packages/pyopencl/_cl.so
Expected in: flat namespace
in /Library/Python/2.6/site-packages/pyopencl/_cl.so
>>> exit()
I've seen some of these used to help diagnose this:
rhaynes$ otool -L /Library/Python/2.6/site-packages/pyopencl/_cl.so
/Library/Python/2.6/site-packages/pyopencl/_cl.so:
/System/Library/Frameworks/OpenCL.framework/Versions/A/OpenCL (compatibility
version 1.0.0, current version 1.0.0)
/usr/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version
7.9.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version
123.0.0)
rhaynes$ otool -L /Users/rhaynes/pool/lib/libboost_python.dylib
/Users/rhaynes/pool/lib/libboost_python.dylib:
libboost_python.dylib (compatibility version 0.0.0, current version 0.0.0)
/System/Library/Frameworks/Python.framework/Versions/2.6/Python
(compatibility version 2.6.0, current version 2.6.1)
/usr/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version
7.9.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version
125.2.0)
rhaynes$ lipo -info `which python`
Architectures in the fat file: /usr/bin/python are: x86_64 i386 ppc7400
I was using the default install of Python because the install wiki didn't
advise against it. For some reason I cannot get the link to the complete
build of Python, NumPy, SciPy, etc. I'm not opposed to building it all from
scratch, but I'd prefer to be lazy...
I'm doing the tutorials at scipy conf this coming week. Any help I could
get to have this running would be much appreciated.
Cheers,
Ryan

Hi all,
I've made a (compatible) change to a fairly central pyopencl interface,
in the hopes of making everybody's life easier--but this also means that
nearly every PyOpenCL program in existence now contains code that is
considered deprecated: it will result in warnings from PyOpenCL 0.92
onward and will stop working in 0.94.
The change involves Kernel.__call__, where I've moved 'local_size' from
keyword argument to third positional argument. Luckily, old and new
usage can be safely disambiguated.
Before:
kernel(queue, h_c.shape, d_c_buf, d_a_buf, d_b_buf,
local_size=(block_size, block_size))
After:
kernel(queue, h_c.shape, (block_size, block_size),
d_c_buf, d_a_buf, d_b_buf)
I find the new interface better because (a) it groups related arguments
together and (b) it more easily deals with *args-style argument passing
(at least as long as we're targeting Python 2.x).
The reason for the change is that local_size will be passed to nearly
every call to this interface, so the keyword-arg position was
inconvenient. This change is painful now, but keeping this annoying
interface around would have continued to be painful forever. That's why
I decided to make the change.
The change just landed in git. Verbose warnings are given for each case
of deprecated use. All code in pyopencl itself was updated to the new
usage.
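The disambiguation works because in the old usage the third positional argument was always a kernel argument (e.g. a buffer), never a tuple of integers. A rough sketch of what such a check might look like (hypothetical helper, not PyOpenCL's actual implementation):

```python
# Hypothetical sketch of how old- and new-style Kernel.__call__ usage
# could be told apart; not PyOpenCL's actual implementation.

def split_call_args(args, kwargs):
    """Return (local_size, kernel_args), accepting both calling styles."""
    if args and isinstance(args[0], tuple) and \
            all(isinstance(n, int) for n in args[0]):
        # new style: third positional argument is the local size tuple
        return args[0], args[1:]
    # old style: local_size passed as a keyword argument (deprecated)
    return kwargs.get("local_size"), args

# new style
lsize, kargs = split_call_args(((16, 16), "d_c", "d_a", "d_b"), {})

# old (deprecated) style
lsize2, kargs2 = split_call_args(("d_c", "d_a", "d_b"),
                                 {"local_size": (16, 16)})
```

Both calls yield the same local size and the same kernel arguments, which is what makes the deprecation period painless.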
If you have comments, questions, or suggestions, please do speak up now,
*before* this change becomes part of a release.
Andreas

In Windows 7 x64 on a 64-bit machine (Mac Pro running Boot Camp), I am trying
to install PyOpenCL. I can successfully compile Boost 1.43 from source, and
run the setup.py in PyOpenCL with appropriate (I think) options in
site-config, using Visual Studio 9. I get no errors. After installing
PyOpenCL in site-packages, I get the following error on trying to import it:
>>> import pyopencl
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyopencl\__init__.py", line 3, in <module>
import pyopencl._cl as _cl
ImportError: No module named _cl
I tried running Dependency Walker on the pyopencl\_cl.pyd file and found it
looking for both boost_python-vc90-mt-1_43.dll and IEShims.dll, so I threw
these in the pyopencl directory and it still does not import (same error).
Dependency Walker lists two kinds of errors:
Error: At least one module has an unresolved import due to a missing export
function in an implicitly dependent module.
Error: Modules with different CPU types were found.
I don't understand the former. For the latter, all dependencies are 64-bit
libraries except for MSVCR90.dll, MSVCP90.dll, the boost_python dll
mentioned above, python26.dll, and the _cl.pyd I am examining. For the next
step in getting this to work, I don't know which of this is relevant. Should
I abandon trying to make it work in a 64-bit OS? Has anyone ever made it
work in 64-bit Windows?
Thanks,
Rick
--
Richard C. Gerkin
Postdoctoral Fellow, Dept. of Biological Sciences
Carnegie Mellon University

Hi Paolo,
On Wed, 23 Jun 2010 12:31:13 +0200, Paolo Simone Gasparello <djgaspa(a)gmail.com> wrote:
> Hello everybody,
> I made the ctypes casting work in C++ code, so there is no longer any
> need to do that casting explicitly in Python code. (patch attached)
>
> I also wrote a simple example working on Linux (I have used GLX APIs and
> flags) showing the usage of GL interop. It draws the sine function using
> coordinates that are generated directly by the video hardware.
>
> I hope this will be useful :)
Thanks for your hard work in getting this far! I've added your fixes and
your example to pyopencl git.
Andreas

Il giorno sab, 19/06/2010 alle 15.04 -0400, Andreas Kloeckner ha
scritto:
> Hi Paolo,
>
> On Tue, 15 Jun 2010 12:02:07 +0200, Paolo Simone Gasparello
> <djgaspa(a)gmail.com> wrote:
> > Hi guys, has anyone gotten OpenGL interoperability to work?
> > With the following patch I'm able to create a context sharing OpenGL
> > buffers.
>
> Applied, thanks. (And I'm kind of embarrassed that you had to fix
> these two...) :)
>
> > Where fun_double is a kernel included in the file program.cl that
> > simply does each coord *= 2.
> >
> > Are there better ways to do this? I tried to perform the cast to
> > c_void_p in C++ code using boost::python, but I was not able to make
> > the code work.
>
> I took a look, and it doesn't seem like ctypes has much in the way of
> C-accessible interface. One can of course emulate the Python code in
> C++, but I'm not sure that's necessary. If you'd like guidance on how
> to do that, let me know.
>
> Lastly, would you mind posting a fully functional example so that I
> can add it to the examples/ directory shipped with PyOpenCL? I think that
> could be rather helpful for people trying to follow in your footsteps.
>
> Thanks for your work!
> Andreas
Hello everybody,
I made the ctypes casting work in C++ code, so there is no longer any need
to do that casting explicitly in Python code. (patch attached)
I also wrote a simple example working on Linux (I have used GLX APIs and
flags) showing the usage of GL interop. It draws the sine function using
coordinates that are generated directly by the video hardware.
I hope this will be useful :)
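For readers following along: the fun_double kernel mentioned in the quoted message is described only as "does each coord *= 2", so here is a guess at its shape. The actual program.cl is attached to the original message and may well differ:

```python
# A guess at what a kernel like fun_double might look like, given only
# the description "does each coord *= 2"; the real program.cl attached
# to the original message may differ.

FUN_DOUBLE_SRC = """
__kernel void fun_double(__global float *coords)
{
    int i = get_global_id(0);
    coords[i] *= 2;  // double each coordinate in the shared GL buffer
}
"""
```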
--
Paolo Simone Gasparello <djgaspa(a)gmail.com>

Hi all,
if you have been wondering why the matrix-multiply example shipped with
PyOpenCL shows sub-standard performance on Nvidia hardware, wonder no
longer. In anticipation of next week's SciPy conference, I've finally
fixed that, and it turned out to be (d'oh!) bank conflicts. Which is
odd, since the example was (at some point) derived from Nvidia's own SDK
example. Anyway, for me, matmul performance on the same hardware is now
comparable between CL and CUDA.
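For anyone curious what the classic fix for this looks like: the usual trick is to pad the local-memory tile by one column, so that consecutive rows start in different memory banks and column-wise accesses no longer collide. A sketch of the relevant kernel fragment (illustrative only, not the actual kernel shipped in the PyOpenCL example):

```python
# Illustrative sketch of the classic local-memory padding trick used to
# avoid bank conflicts in a tiled matrix multiply; NOT the actual
# kernel shipped with PyOpenCL's example.

TILE = 16  # assumed work-group tile size

KERNEL_FRAGMENT = """
// Padding the second dimension by one (TILE+1) staggers each row's
// start across memory banks, so column accesses no longer conflict.
__local float A_tile[%(tile)d][%(tile)d + 1];
__local float B_tile[%(tile)d][%(tile)d + 1];
""" % {"tile": TILE}
```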
Just thought I'd let you know.
Happy hacking,
Andreas