Python code

2to3 in setup.py

Only changed files will be re-converted when setup.py is called a second
time, making development much faster.

Currently, this seems to handle all of the necessary Python code
conversion.

Not all of the 2to3 transformations are appropriate for all files.
In particular, 2to3 seems quite trigger-happy in replacing, e.g.,
unicode with str, which causes problems in defchararray.py.
For files that need special handling, add entries to
tools/py3tool.py.
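The only-reconvert-changed-files behaviour can be sketched with a simple mtime check (a hypothetical helper for illustration, not the actual tools/py3tool.py code):

```python
import os
import tempfile

def needs_reconversion(src, dst):
    # Re-run 2to3 only when the converted copy is missing or
    # older than the source file.
    return (not os.path.exists(dst)
            or os.path.getmtime(dst) < os.path.getmtime(src))

# Demonstration with two temporary files:
d = tempfile.mkdtemp()
src = os.path.join(d, "module.py")
dst = os.path.join(d, "module_converted.py")
open(src, "w").close()
print(needs_reconversion(src, dst))   # True: nothing converted yet
open(dst, "w").close()
os.utime(src, (1000, 1000))           # make the source older...
os.utime(dst, (2000, 2000))           # ...than the converted copy
print(needs_reconversion(src, dst))   # False: conversion is up to date
```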

numpy.compat.py3k

There are some utility functions needed for 3K compatibility in
numpy.compat.py3k -- they can be imported from numpy.compat:

bytes, unicode: bytes and unicode constructors

asbytes: convert string to bytes (no-op on Py2)

asbytes_nested: convert strings in lists to Bytes

asunicode: convert string to unicode

asunicode_nested: convert strings in lists to Unicode

asstr: convert item to the str type

getexception: get current exception (see below)

isfileobj: detect Python file objects

strchar: character for Unicode (Py3) or Strings (Py2)

open_latin1: open file in the latin1 text mode

More can be added as needed.
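As a sketch of what a few of these helpers do on Py3 (hypothetical re-implementations for illustration; the real ones live in numpy.compat):

```python
def asbytes(s):
    # Encode text to bytes (latin1, as numpy.compat uses); bytes pass through.
    if isinstance(s, bytes):
        return s
    return s.encode("latin1")

def asunicode(s):
    # Decode bytes to text; text passes through unchanged.
    if isinstance(s, bytes):
        return s.decode("latin1")
    return s

def asbytes_nested(x):
    # Recursively convert strings inside lists/tuples to bytes.
    if isinstance(x, (list, tuple)):
        return [asbytes_nested(item) for item in x]
    return asbytes(x)

print(asbytes("abc"))                  # b'abc'
print(asunicode(b"abc"))               # 'abc'
print(asbytes_nested(["a", ["b"]]))    # [b'a', [b'b']]
```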

numpy.f2py

F2py is ported to Py3.

Bytes vs. strings

At many points in Numpy, bytes literals are needed. These can be created via
numpy.compat.asbytes and asbytes_nested.

numpy.loadtxt et al

These routines are difficult to duck-type to read both Unicode and
Bytes input.

I assumed they are meant for reading Bytes streams -- this is probably
by far the more common use case with scientific data.
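Reading a Bytes stream of numeric data is straightforward on Py3, since int() and float() accept ASCII bytes (a minimal illustration, not the loadtxt implementation):

```python
import io

# A bytes stream, as an open('f', 'rb') file object would provide:
stream = io.BytesIO(b"1.0 2.0\n3.0 4.0\n")

# Iterating a bytes stream yields bytes lines; float() accepts them directly.
rows = [[float(token) for token in line.split()] for line in stream]
print(rows)   # [[1.0, 2.0], [3.0, 4.0]]
```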

Cyclic imports

Python 3 is less forgiving about cyclic imports than Python 2. Cycles
need to be broken to have the same code work both on Python 2 and 3.
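A common way to break such a cycle is to defer one of the imports into the function that needs it. A self-contained sketch with two hypothetical modules:

```python
import os
import sys
import tempfile
import textwrap

# Two hypothetical modules forming a cycle: pkg_a imports pkg_b at top
# level, while pkg_b defers its import of pkg_a into the function body.
# The deferred import is what breaks the cycle on both Py2 and Py3.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "pkg_a.py"), "w") as f:
    f.write(textwrap.dedent("""\
        import pkg_b

        def greet():
            return "a-" + pkg_b.name()

        def name():
            return "A"
        """))
with open(os.path.join(tmpdir, "pkg_b.py"), "w") as f:
    f.write(textwrap.dedent("""\
        def name():
            import pkg_a   # deferred: pkg_a is fully initialized by now
            return "b-" + pkg_a.name()
        """))

sys.path.insert(0, tmpdir)
import pkg_a

print(pkg_a.greet())   # a-b-A
```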

C Code

NPY_PY3K

A #define in config.h, defined when building for Py3.

private/npy_3kcompat.h

Convenience macros for Python 3 support:

PyInt -> PyLong on Py3

PyString -> PyBytes on Py3

PyUString -> PyUnicode on Py3 and PyString on Py2

PyBytes -> PyString on Py2

PyUnicode_ConcatAndDel, PyUnicode_Concat2

Py_SIZE et al., for older Python versions

npy_PyFile_Dup, etc. to get FILE* from Py3 file objects

PyObject_Cmp, convenience comparison function on Py3

NpyCapsule_* helpers: PyCObject on Py2, PyCapsule on Py3

Any new ones that need to be added should be added in this file.

ob_type, ob_size

These use Py_SIZE, etc. macros now. The macros are also defined in
npy_3kcompat.h for the Python versions that don't have them natively.

Py_TPFLAGS_CHECKTYPES

Python 3 no longer supports type coercion in arithmetic.

Py_TPFLAGS_CHECKTYPES is now always enabled, so the C-level nb_*
methods unconditionally receive arguments of whatever types are
passed.

This will, however, affect Python-level code: previously, if you
inherited from a Py_TPFLAGS_CHECKTYPES-enabled class that implemented
a __mul__ method, the same __mul__ method would also be called when
a __rmul__ was required, but with swapped arguments (see
Python/Objects/typeobject.c:wrap_binaryfunc_r). On Python 3,
arguments are swapped only if both are of the same (sub-)type, and
otherwise things fail.

This means that ndarray-derived subclasses must now implement all
relevant __r*__ methods, since they can no longer automatically
fall back to ndarray code.

PyNumberMethods

The structures have been converted to the new format:

number.c

scalartypes.c.src

scalarmathmodule.c.src

The slots nb_divide, nb_long, nb_oct, nb_hex, and nb_inplace_divide
have gone away. The slot nb_int is what nb_long used to be, nb_divide
is now nb_floor_divide, and nb_inplace_divide is now
nb_inplace_floor_divide.

PyBuffer (provider)

Py3 introduces the PEP 3118 buffer protocol as the only buffer
protocol, so we must implement it.

The exporter parts of the PEP 3118 buffer protocol are currently
implemented in buffer.c for arrays, and in scalartypes.c.src
for generic array scalars. The generic array scalar exporter, however,
doesn't currently produce format strings, which needs to be fixed.

Some code also stops working when bf_releasebuffer is defined. Most
importantly, PyArg_ParseTuple("s#", ...) refuses to return a buffer
if bf_releasebuffer is present. For this reason, the buffer interface
for arrays is currently implemented without defining bf_releasebuffer
at all. This forces us to do some additional work.
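On the consumer side, the standard library's memoryview shows what a PEP 3118 exporter provides, including the format string the generic array-scalar exporter still lacks (an illustration using the stdlib array module, not Numpy):

```python
import array

# array.array exports the PEP 3118 buffer interface:
buf = array.array("d", [1.0, 2.0, 3.0])
m = memoryview(buf)

print(m.format)    # 'd' -- the struct-style format string
print(m.itemsize)  # 8
print(m.nbytes)    # 24
print(m.shape)     # (3,)
```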

There are a couple of places that need further attention:

VOID_getitem

In some cases, this returns a buffer object on Python 2. On Python 3,
there is no stand-alone buffer object, so we return a byte array instead.

multiarray.int_asbuffer

Converts an integer to a void* pointer -- in Python.

Should we just remove this for Py3? It doesn't seem like it is used
anywhere, and it doesn't sound very useful.

PyBuffer (consumer)

There are two places in which we may want to be able to consume buffer
objects and cast them to ndarrays:

multiarray.frombuffer, i.e., PyArray_FromBuffer

frombuffer returns only arrays of a fixed dtype. It does not
make sense to support PEP 3118 at this location, since not much
would be gained -- the backward-compatibility functions using the
old array interface still work.

So no changes needed here.

multiarray.array, i.e., PyArray_FromAny

In general, we would like to handle PEP 3118 buffers in the same way
as __array_interface__ objects. Hence, we want to be able to cast
them to arrays already in PyArray_FromAny.

Hence, PyArray_FromAny needs additions.

There are a few caveats in allowing PEP 3118 buffers in
PyArray_FromAny:

bytes (and str on Py2) objects offer a buffer interface that
specifies them as a 1-D array of bytes.

Previously, PyArray_FromAny has cast these to 'S#' dtypes. We don't
want to change this, since doing so would cause problems in many places.

We do, however, want to allow other objects that provide 1-D byte arrays
to be cast to 1-D ndarrays and not 'S#' arrays -- for instance, 'S#'
arrays tend to strip trailing NUL characters.

What is currently done in PyArray_FromAny is the following:

The presence of the PEP 3118 buffer interface is checked before the
array interface. If it is present and the object is not a bytes
object, it is used to create a view on the buffer.

We also check in discover_depth and _array_find_type for PEP 3118
buffers, so that:

array([some_3118_object])

handles the object similarly to an ndarray.

However, again, bytes (and unicode) have priority and will not be
handled as buffer objects.

This amounts to a possible semantic change: array(buffer) will no
longer create an object array array([buffer], dtype='O'), but will
instead expand to a view on the buffer.

PyBuffer (object)

Since Py3 has a native buffer object, memoryview, the newbuffer and
getbuffer functions are removed from multiarray: their functionality
is taken over by the new memoryview object.
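memoryview covers what newbuffer/getbuffer provided: a zero-copy view on an object's buffer (a stdlib illustration):

```python
data = bytearray(b"hello world")
view = memoryview(data)    # zero-copy view on the underlying buffer

# Writes through the view modify the original object:
view[0:5] = b"HELLO"
print(bytes(data))         # b'HELLO world'
print(view.readonly)       # False: bytearray is a writable buffer
```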

PyString

There is no PyString in Py3; everything is either Bytes or Unicode.
Unicode is also preferred in many places, e.g., in __dict__.

PyUString (defined in npy_3kcompat.h): PyString in Py2, PyUnicode in Py3

PyUnicode: UCS in Py2 and Py3

In many cases the conversion only entails replacing PyString with
PyUString.

PyString is currently defined as PyBytes in npy_3kcompat.h, to make
things build. This definition will be removed when Py3 support is
finished.

Where *_AsStringAndSize is used, more care needs to be taken, as
encoding Unicode to Bytes may be needed. If this cannot be avoided,
the encoding should be ASCII, unless there is a very strong reason to
do otherwise. In particular, I don't believe we should silently fall
back to UTF-8 -- raising an exception may be a better choice.

Exceptions should use PyUnicode_AsUnicodeEscapeString -- this should
result in an ASCII-clean string that is appropriate for the exception
message.
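The Python-level analogue of this policy: escape non-ASCII text for messages rather than silently re-encoding (illustration only):

```python
name = "ångström"

# Strict ASCII encoding raises rather than guessing an encoding:
try:
    name.encode("ascii")
except UnicodeEncodeError as exc:
    print("refused:", type(exc).__name__)

# For error messages, an ASCII-clean escaped form is safe:
safe = name.encode("unicode_escape").decode("ascii")
print(safe)   # \xe5ngstr\xf6m
```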

Some specific decisions that have been made so far:

descriptor.c: dtype field names are UString

At some places in Numpy code, there are some guards for Unicode field
names. However, the dtype constructor accepts only strings as field names,
so we should assume field names are always UString.

descriptor.c: field titles can be arbitrary objects.
If they are UString (or, on Py2, Bytes or Unicode), insert to fields dict.

descriptor.c: dtype strings are Unicode.

descriptor.c: datetime tuple contains Bytes only.

repr() and str() should return UString

comparison between Unicode and Bytes is not defined in Py3

Type codes in numerictypes.typeInfo dict are Unicode

Func name in errobj is Bytes (should be forced to ASCII)

PyUnicode

PyUnicode in Py3 is pretty much as it was in Py2, except that it is
now the only "real" string type.

In Py3, Unicode and Bytes are not comparable, i.e., 'a' != b'a'. Numpy
comparison routines were changed to act the same way, leaving
comparison between Unicode and Bytes undefined.
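The Py3 behaviour being matched:

```python
# Equality between text and bytes is always False on Py3:
print("a" == b"a")    # False

# Ordering comparisons between them are not defined at all:
try:
    "a" < b"a"
except TypeError as exc:
    print("TypeError:", exc)
```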

Fate of the 'S' dtype

On Python 3, the 'S' dtype will still be Bytes.

However, str and str_ are equivalent to unicode_ on Py3, so the str_
scalar type is Unicode.

PyInt

There is no longer a limited-range integer type in Py3. It makes no
sense to inherit Numpy ints from the Py3 int.

Currently, the following is done:

Numpy's integer types no longer inherit from Python integer.

int is taken to be dtype-equivalent to NPY_LONG

ints are converted to NPY_LONG

PyInt methods are currently replaced by PyLong, via macros in npy_3kcompat.h.

Dtype decision rules were changed accordingly, so that Numpy
understands that Py3 ints translate to NPY_LONG as far as dtypes are
concerned.

array([1]).dtype will be the default NPY_LONG integer.
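The underlying Py3 change: there is a single, arbitrary-precision int type, so Numpy's fixed-width integers cannot sensibly inherit from it (illustration):

```python
import sys

big = 2 ** 100                 # far beyond any C long
print(isinstance(big, int))    # True: one unbounded int type
print(big > sys.maxsize)       # True: no overflow into a separate 'long'
```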

Divide

The Divide operation is no more.

Calls to PyNumber_Divide were replaced by FloorDivide or TrueDivide,
as appropriate.

The PyNumberMethods entry is #ifdef'd out on Py3, see above.
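The replacement pattern at the Python level, with operator.truediv and operator.floordiv mirroring PyNumber_TrueDivide and PyNumber_FloorDivide:

```python
import operator

print(operator.truediv(7, 2))    # 3.5 -- PyNumber_TrueDivide
print(operator.floordiv(7, 2))   # 3   -- PyNumber_FloorDivide

# On Py3, '/' is always true division and '//' is floor division:
print(7 / 2, 7 // 2)             # 3.5 3
```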

tp_compare, PyObject_Compare

The compare method has vanished, and is replaced with richcompare.
We just #ifdef the compare methods out on Py3.

New richcompare methods were implemented for:

flagsobject.c

On the consumer side, we have a convenience wrapper in npy_3kcompat.h
providing PyObject_Cmp also on Py3.
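A pure-Python equivalent of such a cmp() convenience wrapper (Py2's builtin cmp() is gone on Py3; sketch for illustration):

```python
def cmp(a, b):
    # Returns -1, 0, or 1, like Py2's builtin cmp().
    # Relies on bool arithmetic: (a > b) and (a < b) are 0 or 1.
    return (a > b) - (a < b)

print(cmp(1, 2), cmp(2, 2), cmp(3, 2))   # -1 0 1
```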

Pickling

The ndarray and dtype __setstate__ methods were modified to be
backward compatible with Py2 pickles on Py3: they need to accept a
Unicode endian character and Unicode data, since that is what a Py2
str is unpickled to on Py3.

An encoding assumption is required for backward compatibility: the user
must do

loads(f, encoding='latin1')

to successfully read pickles created by Py2.
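For example, a protocol-0 pickle as Py2 would write for the str '\xe9' loads on Py3 only with an explicit encoding (hand-crafted pickle bytes for illustration):

```python
import pickle

# What Py2's pickle.dumps('\xe9') (protocol 0) produces:
py2_data = b"S'\\xe9'\np0\n."

# Without encoding='latin1', the Py2 str bytes could not round-trip:
obj = pickle.loads(py2_data, encoding="latin1")
print(obj == "\xe9")   # True: the Py2 str came back as latin1-decoded text
```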

Module initialization

The module initialization API changed in Python 3.1.

Most Numpy modules are now converted.

PyTypeObject

The PyTypeObject of py3k is binary compatible with the py2k version and the
old initializers should work. However, there are several considerations to
keep in mind.

Because the first three slots are now part of a struct, some compilers
issue warnings if they are initialized in the old way.

The compare slot has been made reserved in order to preserve binary
compatibility while the tp_compare function went away. The
tp_richcompare function has replaced it, and we need to use that slot
instead. This will likely require modifications in the searchsorted
functions and generic sorts that currently use the compare function.

The previous numpy practice of initializing the COUNT_ALLOCS slots was
bogus. They are not supposed to be explicitly initialized and were out of
place in any case because an extra base slot was added in python 2.6.

Because of these facts it is better to use #ifdefs to bring the old
initializers up to py3k snuff rather than just fill the tp_richcompare
slot. They also serve to mark the places where changes have been
made. Note that explicit initialization can stop once none of the
remaining entries are non-zero, because zero is the default value that
variables with non-local linkage receive.

PyFile

Most importantly, in Py3 there is no way to extract a FILE* pointer
from the Python file object. There are, however, new PyFile_* functions
for writing and reading data from the file.

Compatibility wrappers that return a dup-ed fdopen file pointer are
in private/npy_3kcompat.h. This causes more flushing to be necessary,
but it appears there is no alternative solution. The FILE pointer so
obtained must be closed with fclose after use.
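The same dup-and-fdopen pattern at the Python level (npy_PyFile_Dup does this in C; note the flush before duplicating the descriptor):

```python
import os
import tempfile

f = tempfile.TemporaryFile()
f.write(b"data")
f.flush()                      # push buffered data down to the OS fd

fd = os.dup(f.fileno())        # independent descriptor, shared offset
g = os.fdopen(fd, "rb")        # second file object; closed separately
g.seek(0)
content = g.read()
print(content)                 # b'data'

g.close()                      # close the duplicate...
f.close()                      # ...and the original
```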

READONLY

The RO alias for READONLY is no more.

Uses of it were replaced by READONLY, which is also present on Py2.

PyOS

Deprecations:

PyOS_ascii_strtod -> PyOS_string_to_double;
curiously enough, PyOS_ascii_strtod is not only deprecated but also
causes segfaults

PyInstance

There are some checks for PyInstance in common.c and ctors.c.

Currently, PyInstance_Check is just #ifdef'd out for Py3. This is,
possibly, not the correct thing to do.

PyCObject / PyCapsule

The PyCObject API is removed in Python 3.2, so we need to rewrite it
using PyCapsule.