I was looking at fromiter again today with an eye toward extending it to
accept iterators of sequences instead of just iterators of scalars. For
example:
    fromiter(([x, x // 2, x + 5] for x in range(1000)), dtype=int)
This would result in a shape-(1000,3) array. At first glance, at least,
this looks straightforward: one would simply have to deduce the size of
the sequence for the given dtype, and I imagine I can enlist existing
numpy machinery to do that for me without a problem. But enough about
that; I won't be able to try this till next week, and these things are
often not as easy as they appear. The real reason I'm writing is this
comment, which explains why object arrays are disallowed in fromiter
(multiarraymodule.c::PyArray_FromIter):
    /* We would need to alter the memory RENEW code to decrement any
       reference counts before just throwing away the memory.
    */
This doesn't seem right. The array that we would be RENEWing is a bunch
of PyObject*s. The reference counts don't reside there, but in the
objects themselves. When we do the RENEW, we don't want the reference
counts to change at all. The one tricky case is running out of memory:
I'm not certain that the current setup deals correctly with reference
counts in that case, although it appears likely to work, since ret->data
should still point to a valid chunk of memory and decreffing ret should
result in the subsequent deallocation of all the stored objects.
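As a rough illustration of the point about where the counts live, here is
a small CPython-only sketch. It uses a plain Python list as a stand-in for
the buffer of PyObject*s (this is an analogy, not the numpy code path, and
refcount_across_growth is a name I made up for the example). Growing the
list reallocates its pointer buffer many times, yet the stored object's
reference count should stay put:

```python
import sys


def refcount_across_growth(n_appends=10_000):
    """Return obj's refcount before storing it, after storing it,
    and after the container's pointer buffer has been regrown."""
    obj = object()
    before = sys.getrefcount(obj)

    items = [obj]  # the list now owns exactly one reference to obj
    stored = sys.getrefcount(obj)

    for _ in range(n_appends):
        # Each append may resize the list's internal array of pointers;
        # the pointers are moved, the objects themselves are not touched.
        items.append(None)
    grown = sys.getrefcount(obj)

    return before, stored, grown


before, stored, grown = refcount_across_growth()
print(stored - before, grown - stored)
```

If the reasoning above is right, storing the object adds one reference and
the repeated reallocation adds none.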
So, it looks like objects should either just work, or should work with a
minimal amount of tweaking. However, it's possible that I'm getting
rusty at Python extension writing (or more to the point, reading). Does
anyone remember if this check was added to address a specific problem?
If so, do you also remember what it was? I suppose I can track back
through the revision history if no one remembers, but I figured I'd try
the lazy approach first and ask about it.
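For anyone who wants to play with the iterator-of-sequences idea in the
meantime, here is a minimal sketch of one way to emulate it with the
existing scalar-only fromiter. The helper name fromiter_rows is
hypothetical, and the approach (peek at the first row to get the width,
flatten, then reshape) is just my assumption about the simplest
workaround, not how a real implementation would necessarily do it:

```python
import itertools

import numpy as np


def fromiter_rows(rows, dtype):
    """Hypothetical helper: build a 2-D array from an iterator of
    equal-length sequences using the scalar-only np.fromiter."""
    rows = iter(rows)
    try:
        first = list(next(rows))
    except StopIteration:
        # No rows at all: nothing to deduce a width from.
        return np.empty((0, 0), dtype=dtype)

    width = len(first)
    if width == 0:
        raise ValueError("rows must contain at least one element")

    # Flatten the first row plus all remaining rows into one scalar stream.
    flat = np.fromiter(
        itertools.chain(first, itertools.chain.from_iterable(rows)),
        dtype=dtype,
    )
    # reshape raises ValueError if the total length is not a multiple of
    # width; ragged rows whose lengths happen to add up are NOT detected.
    return flat.reshape(-1, width)


a = fromiter_rows(([x, x // 2, x + 5] for x in range(1000)), dtype=int)
print(a.shape)
```

This pays for an extra pass through itertools, which is part of why doing
it inside fromiter itself seems attractive.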
-tim