All of CUDA’s supported vector types, such as float3 and long4, are
available as numpy data types within this class. These
numpy.dtype instances have field names of x, y, z, and w
just like their CUDA counterparts. They will work both for parameter passing
to kernels as well as for passing data back and forth between kernels and
Python code. For each type, a make_type function is also provided (e.g.
make_float3(x,y,z)).
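
These vector dtypes behave like numpy structured (record) dtypes. Below is a numpy-only sketch of what pycuda.gpuarray.vec.float3 and its make_float3 helper look like conceptually; the stand-in names and the exact field layout here are illustrative assumptions, not the PyCUDA implementation:

```python
import numpy as np

# Hypothetical stand-in for pycuda.gpuarray.vec.float3: three packed
# 32-bit floats with field names x, y, z, mirroring CUDA's float3.
float3 = np.dtype([("x", np.float32), ("y", np.float32), ("z", np.float32)])

def make_float3(x, y, z):
    """Toy analogue of a make_float3(x, y, z) constructor."""
    return np.array((x, y, z), dtype=float3)

v = make_float3(1.0, 2.0, 3.0)
print(v["x"], v["y"], v["z"])  # fields are accessible by name
print(float3.itemsize)         # 12 bytes: three packed float32 values
```

Because the fields are named x, y, z (and w for 4-component types), such values can be passed to kernels expecting the corresponding CUDA vector type.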

A numpy.ndarray work-alike that stores its data and performs its
computations on the compute device. shape and dtype work exactly as in
numpy. Arithmetic methods in GPUArray support the
broadcasting of scalars (e.g. array+5).

The allocator argument is a callable that, when called with the number
of bytes to be allocated, returns an object that can be cast to an
int representing the address of the newly allocated memory.
Note that both pycuda.driver.mem_alloc() and
pycuda.tools.DeviceMemoryPool.alloc() conform to this interface.
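
Any callable with that shape can serve as an allocator. The toy below satisfies the interface in pure Python; it hands out fake addresses and performs no real device allocation, and all names are illustrative only:

```python
class FakeAllocation:
    """Toy allocation object: castable to int, yielding its start address."""
    def __init__(self, address):
        self.address = address
    def __int__(self):
        return self.address

class BumpAllocator:
    """Toy allocator: called with a byte count, returns an object whose
    int() value is the (fake) address of the newly 'allocated' memory."""
    def __init__(self, base=0x1000):
        self.next_addr = base
    def __call__(self, nbytes):
        alloc = FakeAllocation(self.next_addr)
        self.next_addr += nbytes  # bump the fake address cursor
        return alloc

alloc = BumpAllocator()
a = alloc(256)
b = alloc(64)
print(int(a), int(b))  # 4096 4352
```

A real allocator (such as one drawing from a memory pool) follows the same call/return contract, which is why mem_alloc and DeviceMemoryPool.alloc are interchangeable here.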

Transfer the contents of self into ary or a newly allocated
numpy.ndarray. If ary is given, it must have the same
shape and dtype. If it is not given,
pagelocked specifies whether the new array is allocated
page-locked.

Transfer the contents of self into ary or a newly allocated
numpy.ndarray. If ary is given, it must have the right
size (not necessarily shape) and dtype. If it is not given,
a page-locked array is newly allocated.

Due to alignment requirements, the effective texture bind address may be
different from the requested one by an offset. This method returns this
offset in units of self’s data type. If allow_offset is False, a
nonzero value of this offset will cause an exception to be raised.

Due to alignment requirements, the effective texture bind address may be
different from the requested one by an offset. This method returns this
offset in units of self’s data type. If allow_offset is False, a
nonzero value of this offset will cause an exception to be raised.

New in version 0.93.

As of this writing, CUDA textures do not natively support double-precision
floating point data. To remedy this deficiency, PyCUDA contains a workaround,
which can be enabled by passing True for allow_double_hack. In this case,
texture access in your kernel code must go through the corresponding
fp_tex* helper functions provided by pycuda-helpers.hpp (e.g.
fp_tex1Dfetch) instead of the plain tex1Dfetch family.

Make a new, uninitialized GPUArray having the same properties
as other_ary. The dtype and order attributes allow these aspects to
be set independently of their values in other_ary. For order, “A”
means retain Fortran-ordering if the input is Fortran-contiguous, otherwise
use “C” ordering. The default order, “K”, tries to match the strides
of other_ary as closely as possible.

Make a new, zero-initialized GPUArray having the same properties
as other_ary. The dtype and order attributes allow these aspects to
be set independently of their values in other_ary. For order, “A”
means retain Fortran-ordering if the input is Fortran-contiguous, otherwise
use “C” ordering. The default order, “K”, tries to match the strides
of other_ary as closely as possible.

Make a new, ones-initialized GPUArray having the same properties
as other_ary. The dtype and order attributes allow these aspects to
be set independently of their values in other_ary. For order, “A”
means retain Fortran-ordering if the input is Fortran-contiguous, otherwise
use “C” ordering. The default order, “K”, tries to match the strides
of other_ary as closely as possible.

Return an array of the given shape, filled with random values of dtype
in the range [0, 1).

Note

The use case for this function is “I need some random numbers.
I don’t care how good they are or how fast I get them.” It uses
a pretty terrible MD5-based generator and doesn’t even attempt
to cache generated code.

If you’re interested in a non-toy random number generator, use the
CURAND-based functionality below.

Warning

The following classes use random number generators that run on the GPU.
Each thread uses its own generator, and creating these generators requires
more resources than subsequently generating random numbers. Experiments
suggest that the maximum number of active generators on Tesla devices
(compute capability 1.x) is 256, while Fermi devices allow creating
1024 generators without any problems. If you run into trouble creating
objects of class PseudoRandomNumberGenerator or QuasiRandomNumberGenerator,
decrease the number of created generators
(and therefore the number of active threads).

A pseudorandom sequence of numbers satisfies most of the statistical properties
of a truly random sequence but is generated by a deterministic algorithm. A
quasirandom sequence of n-dimensional points is generated by a deterministic
algorithm designed to fill an n-dimensional space evenly.
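
To make the distinction concrete, here is a tiny pure-Python sketch, independent of PyCUDA: the base-2 Van der Corput sequence is a classic one-dimensional quasirandom (low-discrepancy) sequence whose points fill [0, 1) evenly, whereas a pseudorandom stream merely imitates statistical randomness.

```python
def van_der_corput(n, base=2):
    """n-th element of the base-b Van der Corput low-discrepancy sequence:
    reflect the base-b digits of n about the radix point."""
    q, denom = 0.0, 1.0
    while n:
        n, rem = divmod(n, base)
        denom *= base
        q += rem / denom
    return q

seq = [van_der_corput(i) for i in range(8)]
print(seq)  # [0.0, 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
```

Note how each new point lands in the largest remaining gap; this even-filling property is what makes quasirandom sequences attractive for integration, at the cost of the points being visibly non-random.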

Creates a GPUArray object with the given shape and dtype,
fills it with Poisson-distributed pseudorandom values
with lambda equal to lambda_value, and returns the newly created object.
dtype must be a 32-bit unsigned int.

Evaluating involved expressions on GPUArray instances can be
somewhat inefficient, because a new temporary is created for each
intermediate result. The functionality in the module pycuda.elementwise
contains tools to help generate kernels that evaluate multi-stage expressions
on one or several operands in a single pass.
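
As a rough illustration of what a generated elementwise kernel does (this is a pure-numpy emulation, not the PyCUDA API): a multi-stage expression such as z = a*x + b*y is evaluated once per index i in a single pass, instead of materializing a*x and b*y as temporary arrays first.

```python
import numpy as np

def lin_comb_single_pass(a, x, b, y):
    """Emulates an elementwise kernel computing z[i] = a*x[i] + b*y[i]
    in one pass over the data (on a GPU, each i would map to one thread)."""
    z = np.empty_like(x)
    for i in range(len(x)):
        z[i] = a * x[i] + b * y[i]  # the whole expression, evaluated per i
    return z

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])
print(lin_comb_single_pass(2.0, x, 3.0, y))  # [32. 64. 96.]
```

Writing `2*x + 3*y` directly on GPUArray instances produces the same values, but allocates an intermediate array for each partial product; the generated kernel avoids that.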

Invoke the generated scalar kernel. The arguments may either be scalars or
GPUArray instances.

If range is given, it must be a slice object and specifies
the range of indices i for which the operation is carried out.

If slice is given, it must be a slice object and specifies
the range of indices i for which the operation is carried out,
truncated to the container. Also, slice may contain negative indices
to index relative to the end of the array.

If stream is given, it must be a pycuda.driver.Stream object,
where the execution will be serialized.
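
The truncation and negative-index behavior described for slice follows Python's own slice semantics. A small pure-Python sketch (illustrative only, not PyCUDA code) of resolving a slice to the concrete index range the operation would cover:

```python
def normalize(slc, length):
    """Resolve a slice against a container of the given length,
    truncating out-of-range bounds and mapping negative indices."""
    start, stop, step = slc.indices(length)
    return range(start, stop, step)

# Bounds beyond the container are truncated to it:
print(list(normalize(slice(2, 100), 5)))  # [2, 3, 4]
# Negative indices count from the end of the array:
print(list(normalize(slice(-3, -1), 5)))  # [2, 3]
```

The kernel then performs its operation only for the indices i in the resulting range.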

Generate a kernel that takes a number of scalar or vector arguments
(at least one vector argument), performs the map_expr on each entry of
the vector argument and then the reduce_expr on the outcome of that.
neutral serves as an initial value. preamble offers the possibility
to add preprocessor directives and other code (such as helper functions)
to be added before the actual reduction kernel code.

Vectors in map_expr should be indexed by the variable i. reduce_expr
uses the formal values “a” and “b” to indicate two operands of a binary
reduction operation. If you do not specify a map_expr, “in[i]” – and
therefore the presence of only one input argument – is automatically
assumed.

dtype_out specifies the numpy.dtype in which the reduction is
performed and in which the result is returned. neutral is
specified as a float or integer formatted as a string. reduce_expr and
map_expr are specified as string-formatted operations, and arguments
is specified as a string formatted as a C argument list. name specifies
the name under which the kernel is compiled, and keep and options are passed
unmodified to pycuda.compiler.SourceModule. preamble is specified
as a string of code.
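
The map/reduce structure can be emulated in plain Python to show the semantics (this sketch is not the PyCUDA API): map_expr is applied per index i, and reduce_expr folds the results pairwise starting from neutral. For a dot product, map_expr would be "x[i]*y[i]", reduce_expr "a+b", and neutral "0":

```python
from functools import reduce

def reduction(map_fn, reduce_fn, neutral, *vectors):
    """Emulates ReductionKernel semantics: map per index, then reduce."""
    n = len(vectors[0])
    mapped = (map_fn(i, *vectors) for i in range(n))
    return reduce(reduce_fn, mapped, neutral)

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
dot = reduction(lambda i, x, y: x[i] * y[i],  # map_expr: "x[i]*y[i]"
                lambda a, b: a + b,           # reduce_expr: "a+b"
                0.0,                          # neutral: "0"
                x, y)
print(dot)  # 32.0
```

On the GPU the reduction is performed in parallel rather than left to right, which is why reduce_expr must be a binary operation with a true neutral element.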

Generates a kernel that can compute a prefix sum
using any associative operation given as scan_expr.
scan_expr uses the formal values “a” and “b” to indicate two operands of
an associative binary operation. neutral is the neutral element
of scan_expr, obeying scan_expr(a, neutral) == a.

dtype specifies the type of the arrays being operated on.
name_prefix is used for kernel names to ensure recognizability
in profiles and logs. options is a list of compiler options to use
when building. preamble specifies a string of code that is
inserted before the actual kernels.
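
Conceptually (pure-Python sketch, not the PyCUDA API), the generated kernel computes, for every position i, the fold of all elements up to i under scan_expr, starting from neutral:

```python
def inclusive_scan(data, scan_fn, neutral):
    """Emulates an inclusive scan: out[i] = fold of data[0..i] under scan_fn."""
    out, acc = [], neutral
    for item in data:
        acc = scan_fn(acc, item)  # scan_expr(a, b) with a = running value
        out.append(acc)
    return out

# Prefix sum: scan_expr "a+b" with neutral "0".
print(inclusive_scan([3, 1, 4, 1, 5], lambda a, b: a + b, 0))  # [3, 4, 8, 9, 14]
# Any associative operation works, e.g. a running maximum:
print(inclusive_scan([3, 1, 4, 1, 5], max, float("-inf")))     # [3, 3, 4, 4, 5]
```

The GPU version evaluates the same recurrence with a parallel algorithm, which is why scan_expr must be associative and neutral must satisfy scan_expr(a, neutral) == a.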