[Python-Dev] A Subtle Bug in Class Initializations

On Mon, Aug 6, 2018 at 8:11 PM Eddie Elizondo <eelizondo at fb.com> wrote:
>
> Background:
>
> Through the implementation of an alternate runtime I've been poking around some of the class initialization routines and I found out that there was a subtle bug with PyType_Ready and the header initializer PyVarObject_HEAD_INIT.
>
>
>
> Looking through the codebase, I couldn't really find any pattern of when the type should be defined within PyVarObject_HEAD_INIT. Sometimes it was initialized to NULL (or 0) and PyType_Type (let's ignore Py_True and Py_False from now).
>
>
>
> From PyType_Ready it turns out that setting the value PyType_Type is never actually needed outside of PyType_Type and PyBaseObject type. This is clear from the code:
>
> if (Py_TYPE(type) == NULL && base != NULL)
>
> Py_TYPE(type) = Py_TYPE(base);
>
>
>
> Given that any PyTypeObject's base is of type PyType_Type, setting PyVarObject_HEAD_INIT(&PyType_Ready) is superfluous. Therefore, setting all static PyTypeObjects to their ob_type to NULL should be a safe assumption to make.
>
>
>
> Uninitialized Types:
>
> A quick s/PyVarObject_HEAD_INIT(&PyType_Type/PyVarObject_HEAD_INIT(NULL/ shows that some objects do need to have their ob_type set from the outset, violating the previous assumption. After writing a quick script, I found that out of the ~300 PyVarObject_HEAD_INIT present in CPython, only these 14 types segfaulted:
>
>
>
> PyByteArrayIter_Type
>
> PyBytesIter_Type
>
> PyDictIterKey_Type
>
> PyDictIterValue_Type
>
> PyDictIterItem_Type
>
> PyClassMethod_Type
>
> PyAsyncGen_Type
>
> PyListIter_Type
>
> PyListRevIter_Type
>
> PyODictIter_Type
>
> PyLongRangeIter_Type
>
> PySetIter_Type
>
> PyTupleIter_Type
>
> PyUnicodeIter_Type
>
>
>
>
>
> Bug:
>
> It turns out that these PyTypeObjects are never initialized through PyType_Ready. However, they are used as fully initialized types. It is by pure chance that the work without having to call the initializer on them. This though is undefined behavior. This not only might result in a weird future bug which is hard to chase down but also, it affects other runtimes as this behavior depends on implementation details of CPython.
>
>
>
> This is a pervasive pattern that should be removed from the codebase and ideally extensions should follow as well.
>
>
>
> Solution:
>
> Here are my proposed solutions in order from less controversial to most controversial. Note that all of them I already tried in a local branch and are working:
>
>
>
> 1) Initialize all uninitialized types.
>
>
>
> Example: https://github.com/eduardo-elizondo/cpython/commit/bc53db3cf4e5a6923b0b1afa6181305553faf173
>
> 2) Move all PyVarObject_HEAD_INIT to NULL except PyType_Type, PyBaseObject_Type and the booleans.
>
>
>
> 3) Special case the initialization of PyType_Type and PyBaseObject_Type within PyType_Ready to now make all calls to PyVarObject_HEAD_INIT use NULL. To enable this a small change within PyType_Ready is needed to initialize PyType_Type PyBaseObject:
Coincidentally, I just wrote a long-ish blog post explaining in
technical details why PyVarObject_HEAD_INIT(&PyType_Type) pretty much
cannot work, at least for extension modules (it is not a problem in
the core library), on Windows (my post was focused on Cygwin but it is
a problem for Windows in general):
http://iguananaut.net/blog/programming/windows-data-import.html
The TL;DR is that it's not possible on Windows to initialize a struct
with a pointer to some other data that's found in another DLL (i.e.
&PyType_Type), unless it happens to be a function, as a special case.
But &PyType_Type obviously is not, so thinks break.
So I'm +1 for requiring passing NULL to PyVarObject_HEAD_INIT,
requiring PyType_Ready with an explicit base type argument, and maybe
(eventually) making PyVarObject_HEAD_INIT argumentless.