These last months, I was busy filling the
https://pythoncapi.readthedocs.io/ website with random notes. Many
discussions occurred on this list and python-dev, but I was only able
to make the most simple and least controversial changes in Python
upstream. I didn't write a PEP because CPython had a governance
crisis. Since a new Steering Council has been elected, it's time to
see how concrete PEPs can be written.

There is also an ongoing discussion about embedded Python and the
Python initialization API, but I'm scared by this topic, so I don't
even propose to write a new PEP which would supersede PEP 432 :-)
https://bugs.python.org/issue22213

== PEP A: Ecosystem of C extensions in 2019 ==

Discuss cffi, Cython, PyQt's usage of the stable ABI, the CPython C
API, etc. The goal is not to solve any problem, but mostly to list
the existing options.

It sounds like an unusual PEP, but I think that a PEP is needed since
the same discussions have happened multiple times.

This PEP can describe the different kinds of "C extensions" and maybe
suggest which tools are best for each kind. cffi doesn't cover all
cases, the C API isn't always the right answer, etc.

I'm not sure the "good" vs "bad" categorization will lead to very
productive discussions. I think Steve's categorization proposal is more
likely to bridge the various opinions on the subject.

> PEP C: Plan to enhance the existing C API

Sounds ok, though of course it depends on what the resulting
proposal looks like ;-)

> PEP D: Completely new C API

Sounds interesting too, if more adventurous.

I agree, these all sound like worthwhile PEPs.

Hopefully once we have some categorisations for where the current APIs
are at, it will be easier to discuss the various proposals.

In particular, I think PEP D will attract a lot of different ideas,
and being able to compare them on an equal footing will be very
important.

(And specifically on PEP A - I hear a lot of people say "check the top
10/20/30 C extensions", but I don't actually know what they are? Even
just a list of them would be great! And I bet 10 of them are the ones
included in CPython ;) )

A dozen or so entries from the top 100 that include
binary extensions:

We really should make a distinction between projects using the C API
only indirectly, through some other tool like Cython, and packages
using the C API directly. Although the split is not perfect: some
Cython projects still use C API calls or implement some functionality
in manually-written C extensions.

Indeed, but even doing that level of investigation requires picking a
set of popular packages to investigate :)

I'll also note that this will only pick up projects using the C API
that are themselves distributed using PyPI. It won't pick up:

* Linux projects that are only shipped as distro packages (I'm
squatting solv, rpm, and dnf on PyPI because they're not
pip-installable, but it would be potentially disastrous if "sudo pip
install solv rpm dnf" actually did anything).

I think the PyHandle idea has the best chance of producing a good
end result. I suspect PEP C doesn't go far enough to solve the
problems for alternative Python implementations. They really want
PyObject to be an opaque handle-like object. Trying to make the
existing C-API work like that seems like a nearly impossible task.

The PyHandle layer can be implemented as a separate project. That
gives the freedom to tinker without upsetting people. It will take
some missteps and revisions until the API becomes polished. You
don't want to make those mistakes inside the CPython repo.

As the PyHandle API evolves, I would imagine we would have a lot of
ideas about what PEP C should entail. Ideally the APIs defined by
PEP C would be the ones you need to implement PyHandle for CPython.
I think trying to do PEP C before PEP D is the wrong way around.

An initial goal would be to make the PyHandle layer be a replacement
for the limited API. I.e. make it so that any extension currently
using the limited API could switch to it.

> As the PyHandle API evolves, I would imagine we would have a lot of
> ideas about what PEP C should entail. Ideally the APIs defined by
> PEP C would be the ones you need to implement PyHandle for CPython.
> I think trying to do PEP C before PEP D is the wrong way around.

I guess it depends on what "enhancements" people are thinking about. But
there definitely is a chance that PEP D can help inform PEP C.

> An initial goal would be to make the PyHandle layer be a replacement
> for the limited API. I.e. make it so that any extension currently
> using the limited API could switch to it.

As long as people don't expect a 1:1 correspondence between the APIs,
the idea that extensions currently using the limited API should be
able to switch to the PEP D API with a rewrite makes sense as a good
goal to me.

> I think the PyHandle idea has the best chance of producing a good
> end result. I suspect PEP C doesn't go far enough to solve the
> problems for alternative Python implementations. They really want
> PyObject to be an opaque handle-like object. Trying to make the
> existing C-API work like that seems like a nearly impossible task.

Why do you think so? It looks like a relatively simple change to me,
mostly just replacing "Py_INCREF(obj)" with "obj = Py_INCREF(obj)" in
user code, and then cleaning up a couple of corner cases here and
there.

It would obviously break the world, but that's the whole point, right?

> Why do you think so? It looks like a relatively simple change to me,
> mostly just replacing "Py_INCREF(obj)" with "obj = Py_INCREF(obj)"
> in user code, and then cleaning up a couple of corner cases here and
> there.

I don't understand why you think Py_INCREF() would have to return a
new pointer. That doesn't seem necessary to me or too relevant to
making PyObject and PyTypeObject opaque types. I think the
challenge is for all the extension module code that looks inside
those structs, e.g.

ob->ob_type

or

Py_TYPE(ob)->tp_something

You have to provide APIs that replace all those struct member
accesses. When I said "nearly impossible", maybe that's overstating
the effort. Making PyObject opaque actually doesn't seem too bad.
ob_refcnt is only accessed in a few places in the CPython source
code. ob_type is accessed in a lot more but it is not too terrible
to replace those references with Py_TP(ob) (a version of Py_TYPE that
casts the arg to PyObject *). I did that for my tagged pointer
experiment and it wasn't too bad.
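
For illustration, such a helper can be tiny (a sketch):

/* like Py_TYPE(), but accepts any object pointer type */
#define Py_TP(ob) Py_TYPE((PyObject *)(ob))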

Making PyTypeObject opaque seems vastly more difficult. Almost
every extension type is defined as a static PyTypeObject structure.
They would have to convert to using something like
PyType_FromSpec(). To make things easier, I think you could have a
function like PyType_FromSpec() that took a traditional PyTypeObject
static structure and returned an opaque PyTypeObject pointer. It
would copy over the information from the static structure. That
way, you decouple the layout of the internal PyTypeObject from what
was allocated statically by the extension module. Doing it that
way, the module source doesn't have to change much. Basically,
change

if (PyType_Ready(&MyType) < 0) {
    ...
}

to:

if ((MyType = PyType_CreateFromDef(&MyTypeDef)) == NULL) {
    ...
}

That also solves the problem with some types being heap allocated
and some statically allocated. PyType_CreateFromDef() would always
return a heap allocated object.

> I don't understand why you think Py_INCREF() would have to return a
> new pointer. That doesn't seem necessary to me or too relevant to
> making PyObject and PyTypeObject opaque types.

Interesting. I actually don't understand what PyTypeObject has to do with
this. :)

Py_INCREF() needs to return a new handle (whether that's a pointer or not
is an unimportant detail for now).

>> ob_type is accessed in a lot more but it is not too terrible to
>> replace those references with Py_TP(ob) (a version of Py_TYPE that
>> casts the arg to PyObject *). I did that for my tagged pointer
>> experiment and it wasn't too bad.

> Interesting. I actually don't understand what PyTypeObject has to do
> with this. :)

In CPython, Py_TYPE() is fast because it returns a borrowed
reference to the PyTypeObject. With a PyObject API built on top of
the pyref handle API, you can't have borrowed references. So,
extension modules, instead of:
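
PyTypeObject *tp = Py_TYPE(ob);  /* borrowed reference, no cleanup */

have to write something like this sketch (using the
PyObject_GetType() name mentioned below):

PyTypeObject *tp = PyObject_GetType(ob);  /* new reference */
/* ... use tp ... */
Py_DECREF((PyObject *)tp);                /* must be released */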

That's a fair bit slower for something that is done very often.
Also, the PyTypeObject struct for each object might not exist in the
runtime, and so PyObject_GetType() has to allocate memory for it and
all the sub-structures, and fill in the slots with appropriate data.

In CPython, Py_TYPE() is super cheap because the PyTypeObject
structure is already there and all filled in. Making other Python
runtimes emulate that PyTypeObject structure could be burdensome.

To relieve that, we can provide APIs that do the same things without
requiring a PyTypeObject structure with a specific layout. E.g. to
check the type:
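
A sketch of what such an API could look like (Py_CheckType() is a
hypothetical name):

/* asks the runtime whether ob is an instance of the given type;
   no assumptions about object or type struct layout */
if (Py_CheckType(ob, list_type)) {
    ...
}

instead of comparing Py_TYPE(ob) == &PyList_Type, which bakes the
ob_type offset and a static type object into the extension.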

> I understand that "obj->ob_type" might be problematic (although that
> does not even seem sure yet), but what is the problem with
> "tp->tp_something"?

You are forcing all Python runtimes that want to support the C API
to have the same memory layouts for type objects. It is a poor
design that the source code for extensions ties the implementation
to a certain structure layout. They should be decoupled.

Certainly we would still have fast type checks. They just won't be
done by making the extension module assume a PyObject is a structure
that has an ob_type pointer at a certain offset and that the ob_type
pointer is the same for every single object of that type. Other
runtimes might implement things differently. It is possible to
build an API that abstracts over that. Since we can use C99 inline
functions now, that would be an obvious way to do it.

> Py_INCREF() needs to return a new handle (whether that's a pointer
> or not is an unimportant detail for now).

I don't follow. Why does it have to return a new handle? Py_INCREF
should mutate the object. Are you thinking of some kind of
immutable handle type?

Something that doesn't require refcounting, yes. Could be a pointer or an
index into some object ID mapping array. It would mean that pointer
inequality doesn't rule out object identity anymore, because multiple
handles could point to the same object, but it would provide an alternative
to reference counting because each handle would be a single unique reference.

Your proposal of making it a pointer to a refcount would be a way to keep
it backwards compatible – if that's wanted. It's not the only option, though.
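
A sketch of what such a handle API could look like (names
hypothetical):

pyhandle_t h2 = PyHandle_Dup(h1);  /* new, unique reference to the
                                      same underlying object */
PyHandle_Close(h1);                /* releases only this reference */
/* h1 == h2 may be false even though both referred to one object */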

> You are forcing all Python runtimes that want to support the C API
> to have the same memory layouts for type objects.

Not necessarily. I'm just suggesting to keep the current vtable set (a.k.a.
slots) to allow for fast protocol usages. That doesn't mean it's the
"memory layout for type objects", especially not the one that CPython
itself is tied to internally. PyTypeObject currently mingles multiple
things, at least a) being an object itself, b) allowing for pointer type
tests, c) describing the type/configuration/behaviour of an object and d)
providing access to protocols. The different use cases could be separated.

Regarding d, I am curious to know, given that the set of entities in
protocols is limited, do you think abstracting away the protocol
access with API functions could provide the same properties as a
vtable without having the vtable itself become API?

This would benefit systems that choose not to have a direct pointer
from instances to their class or vtable. For them, as even big
applications tend to have only on the order of tens of thousands of
types, encoding the type of an object in a whole 64-bit pointer
wastes space. Instead, the instance type is represented as a small
integer ID, leaving the rest of the header (which, ideally, is no
more than a word) for other metadata.
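
A sketch of such a compressed header (field sizes purely
illustrative):

#include <stdint.h>

typedef struct {
    uint32_t type_id;  /* small index into a runtime type table,
                          instead of a 64-bit PyTypeObject pointer */
    uint32_t flags;    /* GC state, hash cache bits, etc. */
} object_header_t;     /* a single 64-bit word in total */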

> I think the PyHandle idea has the best chance of producing a good
> end result. I suspect PEP C doesn't go far enough to solve the
> problems for alternative Python implementations. They really want
> PyObject to be an opaque handle-like object. Trying to make the
> existing C-API work like that seems like a nearly impossible task.

In https://pythoncapi.readthedocs.io/ I proposed solutions to get a
smooth transition towards a better C API without starting from
scratch or breaking backward compatibility.

I agree that it doesn't solve all problems, and that CPython would be
the first one to benefit from this.

> The PyHandle layer can be implemented as a separate project. That
> gives the freedom to tinker without upsetting people. It will take
> some missteps and revisions until the API becomes polished. You
> don't want to make those mistakes inside the CPython repo.

I proposed to add a new opt-in C API (basically, the current C API
with minor changes) directly in the master branch of Python, but I
have been asked to write a PEP for that. That would be PEP C.

A separate project is not a good long-term solution. If we decide to
enhance the API and add a new opt-in API, it should be easy to use
and experiment with. There is always the problem of reaching the
critical mass needed to make a project successful.

> As the PyHandle API evolves, I would imagine we would have a lot of
> ideas about what PEP C should entail. Ideally the APIs defined by
> PEP C would be the ones you need to implement PyHandle for CPython.
> I think trying to do PEP C before PEP D is the wrong way around.
>
> An initial goal would be to make the PyHandle layer be a replacement
> for the limited API. I.e. make it so that any extension currently
> using the limited API could switch to it.

Even if PEP D makes your applications 10x faster, I don't believe
that we will ever be able to get rid of the current C API. Again, see
the transition from Python 2 to Python 3. Ten years later, some
people are only now discussing how to start migrating their code
base. And a lot of code will stay on Python 2 forever.

That's why I consider that we need to work on PEP C and PEP D in
parallel, but also on PEP A to show the other existing solutions ;-)

I'm not sure about the exact timeline. We can draft PEP C right now,
but wait until we get enough feedback to take a decision. We might
wait until PEP D has made progress, if you prefer.

Victor

Night gathers, and now my watch begins. It shall not end until my death.

I chatted with Armin a little about his PyHandle idea and we came up
with a possible refinement. I hope I can explain it accurately.

One of the key problems with PyPy implementing the CPython API is
that it doesn't have space for a reference count field inside its
internal memory storage for objects. So, when a CPython API returns
a PyObject*, where can PyPy store the reference count? The problem
makes passing objects back and forth over the CPython API expensive
for PyPy. That's my understanding anyhow.

You can define a new API using opaque object handles and that solves
the problem for PyPy. They don't need to emulate reference counting
and can just have a global table of open object handles. The
problem is, how do you convert existing CPython extensions to use
this new API? They still want to do reference counting.

Here is a sketch of the API. Introduce a new, lower level API that
works with object handles. Call them pyref_t. The object handle
API doesn't implement reference counting. So, it is cheap for PyPy
to implement it. Passing objects back and forth using the handle
API is cheaper. To make it easy for existing extension modules or
ones that want to use reference counting, provide a PyObject layer
on top of the handle API. E.g.
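
Roughly, a sketch (the exact layout is a guess, implied by the
description below):

typedef struct {
    pyref_t ref;           /* the opaque handle introduced above */
    Py_ssize_t ob_refcnt;  /* refcount lives here, outside the VM */
} PyObject;

PyObject *PyObject_FromPyRef(pyref_t ref);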

Calling PyObject_FromPyRef() allocates a new one of these
structures. When the reference count goes to zero, the handle is
closed and the PyObject memory is freed.

To solve the non-opaque PyObject/PyTypeObject issue, I think you
could have a source code option that turns on opaque types. PyPy
has already implemented (mostly) compatible PyObject and
PyTypeObject structures. With the option off, they do what they do
now. In that case, PyObject_FromPyRef() has to return a non-opaque
PyObject structure and it needs to have an ob_type slot that points
to a non-opaque PyTypeObject structure.

If you turn the source option for opaque types on, something like:

#define Py_OPAQUE_PYOBJECT 1
#include <Python.h>

Things could get more efficient when you use your extension
with PyPy. I.e. PyObject_FromPyRef() is faster because it doesn't
need to fill in the ob_type pointer. Obviously your extension would
not compile if it tries to look inside PyObject structs.

Implementing this handle layer for CPython should be quite easy.
PyObject_FromPyRef() can just be a typecast from pyref_t to
PyObject *. No extra piece of memory needs to be allocated because
the pyref object already has space for the reference count. In debug
builds, CPython should check that the handle API is used correctly
(e.g. add a field to keep track that handles are properly closed).
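
For CPython it could be as simple as this sketch:

typedef struct _object *pyref_t;  /* a handle is just the pointer */
#define PyObject_FromPyRef(ref) ((PyObject *)(ref))
/* no allocation needed: the refcount is already in the object */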

Extensions can use a mix of the new pyref handle-based API and the
old PyObject-based API, and get conversion functions between them.
Additionally, this approach would work even if we don't support
every detail of the old C API.

> Here is a sketch of the API. Introduce a new, lower level API that
> works with object handles. Call them pyref_t. The object handle
> API doesn't implement reference counting. So, it is cheap for PyPy
> to implement it. Passing objects back and forth using the handle
> API is cheaper. To make it easy for existing extension modules or
> ones that want to use reference counting, provide a PyObject layer
> on top of the handle API. E.g.

Yes, this is an example of what PyPy would allocate when you create
a PyObject from a pyref. In this case, it is an example of what
gets allocated if you have the opaque PyObject flag turned on. The
non-opaque version would need to have a compatible structure layout,
e.g. something like:
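
(a guess at the layout, matching today's PyObject offsets):

typedef struct {
    Py_ssize_t ob_refcnt;   /* same first two fields as today */
    PyTypeObject *ob_type;
    pyref_t ref;            /* the underlying handle */
} PyObject;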

I guess they are similar. As I understand, if the limited API is
turned on, PyTypeObject becomes opaque but PyObject is not. My
Py_OPAQUE_PYOBJECT would also make PyObject opaque. I suppose
that's not a big difference because PyObject is actually not too
hard to make opaque, PyTypeObject is the tricky one.

If we are overhauling the API, I think there should be a separate
option to toggle ABI stability. If you turn it off, functions can
become inlined. As an extension author, I can just use
PyList_GetSize(o). The person compiling the extension can decide
they would rather get those small functions inlined and lose the ABI
stability. If you are distributing a pre-built extension on PyPI,
you probably want the ABI stability flag on (and pay the performance
hit).
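
A sketch of that toggle (the Py_STABLE_ABI flag name is
hypothetical; PyList_GetSize() is the name used above):

#ifdef Py_STABLE_ABI
PyAPI_FUNC(Py_ssize_t) PyList_GetSize(PyObject *o);  /* real call */
#else
static inline Py_ssize_t PyList_GetSize(PyObject *o)
{
    return Py_SIZE(o);  /* inlined; ties the binary to the layout */
}
#endif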

> Implementing this handle layer for CPython should be quite easy.
> PyObject_FromPyRef() can just be a typecast from pyref_t to
> PyObject *. No extra piece of memory needs to be allocated because
> the pyref object already has space for the reference count. In
> debug builds, CPython should check that the handle API is used
> correctly (e.g. add a field to keep track that handles are properly
> closed).

It is possible, today, to treat PyObject as an opaque handle if you
do not stray far from the limited API. (PyPy is less restricted than
that.) In my experience, this kind of handle can be a pair of a
pointer to an object and a reference count, with the PyObject
pointing to that pair. These pairs would be stored together with
other handles in a dense array, something that is easy to allocate
from and for the garbage collector to visit. The reference count
field does add a word of overhead, but that is offset by not storing
reference counting metadata in the rest of your heap objects.
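
A sketch of that representation (names illustrative):

typedef struct {
    void *target;        /* the VM-internal object */
    Py_ssize_t refcnt;   /* reference count for this handle */
} handle_pair_t;

/* pairs live in a dense array that is easy to allocate from and
   cheap for the GC to visit; a PyObject* points at one pair */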

An interesting property of PyObject relative to your proposal here is that
PyObject is a direct pointer to an object. This means code expects to be
able to compare a PyObject for identity equality using == as one would do
for any other object in C. To ensure that every PyObject has this
property, a mapping from an object to its unique handle must be done when
passing it between Python and C. Different implementation techniques for
this mapping will make the lookup faster or slower.

This relates to an interesting consequence of something like PyHandle. If
PyHandles are not mapped one-to-one to an object, identity comparisons will
need to go through a function call. Furthermore, a compatibility scheme
such as converting a PyHandle to a PyObject would be more complicated than
a simple wrapping of the PyHandle as the resulting PyObject would not be
identity equal to any other PyObject* referring to the same object.

There are a lot of design considerations and experience with handles
in other languages that can inform a design for CPython. For
example, references in Java's JNI are most commonly implemented as a
handle that indirectly references an object. As such, a user of JNI
must be careful to compare references using the IsSameObject
predicate instead of an ordinary == compare in C. Despite JNI being
>20 years old, this remains counterintuitive and is a common source
of bugs, as you can infer from this Android SDK guide.
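
For example, in plain JNI C code:

/* wrong: may be false even when both references denote the same
   object */
if (ref1 == ref2) { ... }

/* right: ask the VM */
if ((*env)->IsSameObject(env, ref1, ref2)) { ... }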

Another lesson we can learn from JNI is that all of the bugs
associated with file descriptors apply to handles. Because
references to things in memory are more common than file
descriptors, these bugs occur a lot more often. A good
implementation of JNI will avoid using stack
addresses or dense integers as a handle value because it is too hard to
ensure those values are not stale and do not alias to something that
shouldn’t belong to you. Therefore, a good implementation typically avoids
recycling references and obfuscates their values using some form of
encryption. This adds to the overhead of using a reference and the
complexity of implementing JNI.

Because of all of the accumulated experience with handles in other systems,
I think CPython is positioned to do much better than its predecessors.
Having a PyHandle prototype as a third-party extension for experimentation
purposes will go even further to help avoid making subtle mistakes that
affect developers for decades to come.

> If PyHandles are not mapped one-to-one to an object, identity
> comparisons will need to go through a function call. Furthermore,
> a compatibility scheme such as converting a PyHandle to a PyObject
> would be more complicated than a simple wrapping of the PyHandle as
> the resulting PyObject would not be identity equal to any other
> PyObject* referring to the same object.

Good point. I was thinking implicitly that PyHandles would not be
mapped one-to-one. However, if CPython makes PyHandle just a type
cast from PyObject, people are going to do pointer compares and then
be surprised that their extension breaks with other runtimes.

> There are a lot of design considerations and experience with
> handles in other languages that can inform a design for CPython.
> For example, references in Java's JNI are most commonly implemented
> as a handle that indirectly references an object.

Thank you for bringing up JNI. I was vaguely aware of it but after
doing some reading last night, I see it solves many of the same
problems we are trying to solve.

> As such, a user of JNI must be careful to compare references using
> the IsSameObject predicate instead of an ordinary == compare in C.
> Despite JNI being >20 years old, this remains counterintuitive and
> is a common source of bugs, as you can infer from this Android SDK
> guide.

So, what's your opinion on that choice? Not requiring a one-to-one
mapping for the handles makes things easier for the runtime but the
API is harder to use correctly. Should we follow the JNI model or
should we pay the cost to get one-to-one mapping?

If the runtime also pays the memory cost to keep a reference count
in the handle table, I think I see how we could just make PyObject
be the opaque handle.

> A good implementation of JNI will avoid using stack addresses or
> dense integers as a handle value because it is too hard to ensure
> those values are not stale and do not alias to something that
> shouldn’t belong to you.

Interesting. If you give up on binary compatibility, you could have
a debug build option that enables encryption of handle values.
Disable that for better performance in release builds. Maybe that's
poor software engineering though (like not having array bounds
checking turned on by default).

> Because of all of the accumulated experience with handles in other
> systems, I think CPython is positioned to do much better than its
> predecessors.

We better study those systems then. I think it would be best to not
be too creative and stick to a design that has been proven to work.
JNI looks to be a goldmine of ideas (not that we have to make all
the same decisions).

Do you have suggestions for other native interfaces that should be
studied? It looks like CoreCLR has something but it seems to
require using C++. At least, the exception handling uses C++
features. E.g.

> So, what's your opinion on that choice? Not requiring a one-to-one
> mapping for the handles makes things easier for the runtime but the
> API is harder to use correctly. Should we follow the JNI model or
> should we pay the cost to get one-to-one mapping?
>
> If the runtime also pays the memory cost to keep a reference count
> in the handle table, I think I see how we could just make PyObject
> be the opaque handle.

Note that moving reference counts to a separate table is something
we've previously discussed doing for CPython itself, with the two
most notable problems being:

* the possible performance hit of the extra pointer dereference in
Py_INCREF and Py_DECREF (et al; sketched below)

* any third party code that's accessing ob_refcnt directly
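
A sketch of that extra dereference (ob_refcnt_ptr is a hypothetical
field):

/* today: the count is embedded in the object */
#define Py_INCREF(op) (((PyObject *)(op))->ob_refcnt++)

/* with a separate table: one extra load on every INCREF/DECREF */
#define Py_INCREF(op) ((*((PyObject *)(op))->ob_refcnt_ptr)++)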

The complete unknown from a performance perspective is the potential
CPU cache management impact of switching from scattered writes to a
lot of different memory blocks to frequent writes to a particularly
hot memory page (although we know up front that the centralised table
will be far more copy-on-write friendly without any need for
gc.freeze() shenanigans).

I suspect as long as we are using reference counting GC for CPython
(which is realistically probably forever), the sensible choice will
be to have the counts located with the object. However, IMHO, the
extension API should not force the VM into that design.

> The complete unknown from a performance perspective is the
> potential CPU cache management impact of switching from scattered
> writes to a lot of different memory blocks to frequent writes to a
> particularly hot memory page (although we know up front that the
> centralised table will be far more copy-on-write friendly without
> any need for gc.freeze() shenanigans).

I was watching an interesting video today talking about the cost of
scattered memory access on modern hardware:

https://youtu.be/TJHgp1ugKGM?t=1564

The benchmarks on the Samsung tablet are interesting. CPython is
very sparse (like Java/C# in the presentation, but worse yet). If
you want to look at the refcnt for many objects (e.g. during cyclic
GC pass), it would help to put them in a contiguous array. However,
in normal execution, you are already going to have the object data in
cache and so it makes sense to have the refcnt there too.

> Is this an argument for not introducing a PyHandle data type? Do
> you think it is better just to make PyObject work as handles?

To answer your question somewhat obliquely, I believe that it is possible
to make the C-API more amenable to alternative Python implementations with
incremental changes to the C-API that could be absorbed by third-party code
over a series of releases. I also believe that improving PyObject does
not preclude providing a better abstraction like a PyHandle. I
suspect much of the API work needed to make PyObject better would be
required to make even PyHandle possible.

> Not requiring a one-to-one mapping for the handles makes things
> easier for the runtime but the API is harder to use correctly.
> Should we follow the JNI model or should we pay the cost to get
> one-to-one mapping?

I think a lot of details would have to be considered to make the right
decision. For example, instances of types created in the C-API can be
allocated outside of the Python heap. Would that feature be preserved?
You could keep it with PyObject* but it might be harder to do with PyHandle.

> To answer your question somewhat obliquely, I believe that it is
> possible to make the C-API more amenable to alternative Python
> implementations with incremental changes to the C-API that could be
> absorbed by third-party code over a series of releases. I also
> believe that improving PyObject does not preclude providing a
> better abstraction like a PyHandle. I suspect much of the API work
> needed to make PyObject better would be required to make even
> PyHandle possible.

IMHO we should attempt both approaches:

* Make the PyObject structure "more" opaque. Either make it fully
opaque and break the Python world in a flag day (haha, that would be
funny!), or add a new opt-in C API (using a C #define or whatever).
I worked on the opt-in approach. This approach doesn't work with old
Python versions, which cannot be modified anymore, so it cannot start
before Python 3.8.

* Add a fully new PyHandle API which would be compatible with Python
2.7-3.8.

In parallel, more and more C API changes are being pushed into Python
3.8 to make some structures opaque (PyInterpreterState). That's the
first option, the one I called "break the world", and the most risky.

It's way too early to bet which approach will work in the long term.

To be even more clear, all approaches have exactly the same goal: a
more opaque C API that hides as many implementation details as
possible. The long term goal should be to hide "all" implementation
details. I'll let you try to define what is an implementation detail
and what is not. Is the CPython GC behaviour an implementation detail
(destroying an object as soon as its reference count reaches zero)?
... These are hard questions :-)

PyPy developers know better than me that CPython is full of subtle
implementation details. For example, I was very surprised to learn
that Python creates a ".0" local variable for list comprehensions :)

> [...] instances of types created in the C-API can be allocated
> outside of the Python heap. Would that feature be preserved? You
> could keep it with PyObject* but it might be harder to do with
> PyHandle.

I can only speak for myself but I would like to kill off non-heap
allocated types. That's not easy because nearly every extension
module that defines a type does so using a static structure for the
new type (not heap allocated). Some discussion here:

https://bugs.python.org/issue35810

We have PyType_FromSpec() but converting an extension module to use
it is non-trivial. I was wondering if we can make an easier change.
Can we just make a version of PyType_Ready() that copies the static
structure into a heap allocated type and then returns that? Then,
fixing the extension modules is pretty easy. Instead of:

PyType_Ready(&MyType);

you do:

MyType = PyType_FromStatic(&MyTypeDef);

The related thing I would like to change is to force all PyObject
structures to be allocated by CPython memory allocators. Aside from
statically allocated types, I believe that is already the case for
objects with the GC flag. The object memory has to come from
_PyObject_GC_New() or _PyObject_GC_NewVar(). That is not the case
for non-GC objects, as far as I'm aware. At least, it was the case
years ago that extension types could use their own malloc
implementation, for example.

There are some legitimate reasons to want to use a special
allocator. However, I don't think those are good enough reasons for
what we are giving up for supporting that. I suspect most people
are not even aware that is a thing. I'm not sure it even works
anymore. When we implemented obmalloc, it was a considerable
challenge to keep it working, as I recall.

BTW, in the above example, "MyType" is nearly always a static
variable in the extension module. All those static PyObject
variables are a contributing factor to making CPython shutdown
complicated and flaky. Look at Py_FinalizeEx() if you are brave.
It is a pile of doggy dodo. Dirty hacks on hacks, slow, and it
doesn't really work correctly. Instead of keeping a static variable
reference to the type, you can add the new type to the globals of
the new extension module, e.g.
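
Something like this sketch (PyModule_AddObject() steals a reference
on success):

if (PyModule_AddObject(module, "MyType", (PyObject *)MyType) < 0) {
    Py_DECREF(MyType);
    /* report the error */
}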

That way, you don't have an extra ref to MyType keeping it alive
longer than the module it is contained within. If you need quick
access to the type object, it really should be stored in the
per-interpreter module data.

>> [...] instances of types created in the C-API can be allocated
>> outside of the Python heap. Would that feature be preserved? You
>> could keep it with PyObject* but it might be harder to do with
>> PyHandle.

> I can only speak for myself but I would like to kill off non-heap
> allocated types. That's not easy because nearly every extension
> module that defines a type does so using a static structure for the
> new type (not heap allocated). Some discussion here:
> [SNIP]
> The related thing I would like to change is to force all PyObject
> structures to be allocated by CPython memory allocators.

I don't agree.

To be at all useful, I think your last sentence needs to be "force all
PyObject structures to be allocated by the single CPython memory
allocator for the current runtime". That means we don't need to store
the deallocator function for each object, and can simply pass the memory
blocks to a known allocator (even if that's been switched out at runtime
startup, it won't have changed in the meantime).

However, in the context of features like NVRAM, GPU/CPU contexts, and
even subinterpreters and subprocesses, I think there's a huge advantage
in having objects know how to deallocate themselves. Without this,
there's no way to support these more advanced concepts transparently.
IMHO, that would be missing a huge opportunity.

(Of course, if Py_DECREF somehow became a per-object/per-class virtual
function, then this becomes trivial. Even now, the dealloc function is
per-type, and I don't think we'd gain anything by removing that, while
what we gain from increasing it to be per-object could be significant.)
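
For context, the final Py_DECREF already dispatches per-type today,
roughly:

if (--op->ob_refcnt == 0)
    Py_TYPE(op)->tp_dealloc(op);  /* per-type deallocator */

so a per-object hook would push that dispatch one level further down.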

> To be at all useful, I think your last sentence needs to be "force
> all PyObject structures to be allocated by the single CPython
> memory allocator for the current runtime".

I think you don't need to have a single allocator. My vision is
that allocating and deallocating PyObject memory is the
responsibility of the Python VM. It might use specialized
allocators for different purposes, for example.

> That means we don't need to store the deallocator function for each
> object, and can simply pass the memory blocks to a known allocator
> (even if that's been switched out at runtime startup, it won't have
> changed in the meantime).

It is up to the Python VM to decide how that's done. The VM might
still store a deallocator function per type, like what is currently
done.

> However, in the context of features like NVRAM, GPU/CPU contexts,
> and even subinterpreters and subprocesses, I think there's a huge
> advantage in having objects know how to deallocate themselves.
> Without this, there's no way to support these more advanced
> concepts transparently. IMHO, that would be missing a huge
> opportunity.

Does it help if the PyObject can have a pointer to memory allocated
in these different ways? It seems to me that allows most of the
benefits but still allows the Python VM to GC PyObject memory in an
efficient way. So a Python extension type can still allocate some
extra memory associated with instances of it, and there is a dealloc
method called by the VM to clean it up again. Just the memory for
the PyObject itself must be allocated and deallocated by the VM
itself.

Maybe that is not flexible enough to do what you want. It adds
another layer of indirection. I'm glad you brought up those cases
because the new API should support those kinds of things.
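
A sketch of that split (names illustrative):

typedef struct {
    PyObject_HEAD   /* allocated and freed by the VM */
    void *data;     /* extension-allocated: NVRAM, GPU memory, ... */
} MyObject;

/* the type's dealloc frees only *data*; the VM frees the MyObject */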

> Does it help if the PyObject can have a pointer to memory allocated
> in these different ways? It seems to me that allows most of the
> benefits but still allows the Python VM to GC PyObject memory in an
> efficient way. So a Python extension type can still allocate some
> extra memory associated with instances of it, and there is a
> dealloc method called by the VM to clean it up again. Just the
> memory for the PyObject itself must be allocated and deallocated by
> the VM itself.

An advantage to keeping the PyObject_HEAD in the regular Python heap and
having a separate pointer to off-heap memory is that memory ordering for
the PyObject_HEAD always works as expected. For example, memory for frame
buffers is often in a write-combining area that does not respect ordering,
allowing the PyObject_HEAD fields to be observed in an inconsistent state.

This affects a Python that wishes to have better shared-memory
concurrency support (even if it's just within its runtime), as not
being able to make assumptions about writes to a PyObject_HEAD can
slow down common operations.