On Wed, 2004-06-30 at 15:57, Tim Hochberg wrote:
> I spent some time seeing what I could do in the way of speeding up
> wxPoint_LIST_helper by tweaking the numarray code. My first suspect was
> _universalIndexing by way of _ndarray_item. However, due to some
> new-style machinations, _ndarray_item was never getting called. Instead,
> _ndarray_subscript was being called. So, I added a special case to
> _ndarray_subscript. This sped things up by 50% or so (I don't recall
> exactly). The code for that is at the end of this message; it's not
> guaranteed to be 100% correct; it's all experimental.
>
> After futzing around some more I figured out a way to trick python into
> using _ndarray_item. I added "type->tp_as_sequence->sq_item =
> _ndarray_item;" to _ndarray new.
I'm puzzled why you had to do this. You're using Python-2.3.x, right?
There's conditionally compiled code which should be doing this
statically. (At least I thought so.)
> I then optimized _ndarray_item (code
> at end). This halved the execution time of my arbitrary benchmark. This
> trick may have horrible, unforeseen consequences so use at your own risk.
Right now the sq_item hack strikes me as somewhere between completely
unnecessary and too scary for me! Maybe if python-dev blessed it.
The _ndarray_item optimization itself looks good to me.
> Finally I commented out the __del__ method in numarraycore. This resulted
> in an additional speedup of 64% for a total speed up of 240%. Still not
> close to 10x, but a large improvement. However, this is obviously not
> viable for real use, but it's enough of a speedup that I'll try to see
> if there's any way to move the shadow stuff back to tp_dealloc.
FYI, the issue with tp_dealloc may have to do with which mode Python is
compiled in, --with-pydebug, or not. One approach which seems like it
ought to work (just thought of this!) is to add an extra reference in C
to the NumArray instance __dict__ (from NumArray.__init__ and stashed
via a new attribute in the PyArrayObject struct) and then DECREF it as
the last part of the tp_dealloc.
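
The ordering behind that idea can be modeled in plain C without the
Python headers. This is only a toy sketch: ToyDict, ToyArray, dict_extra
and the toy_* functions are hypothetical stand-ins, not numarray or
CPython API, and it assumes the underlying problem is that the instance
__dict__ can be released before the shadow cleanup in tp_dealloc runs.

```c
#include <stdlib.h>

/* Toy stand-in for a reference-counted Python dict (hypothetical type). */
typedef struct {
    int refcnt;
} ToyDict;

/* Toy stand-in for PyArrayObject; dict_extra models the proposed new field. */
typedef struct {
    ToyDict *dict;        /* the instance __dict__ as the interpreter sees it */
    ToyDict *dict_extra;  /* extra owned reference stashed from __init__ */
} ToyArray;

/* Observations recorded so the ordering can be checked afterwards. */
static int dict_freed = 0;
static int dict_alive_during_cleanup = 0;

/* Drop one reference; free the dict when the count reaches zero. */
static void toy_decref(ToyDict *d)
{
    if (--d->refcnt == 0) {
        dict_freed = 1;
        free(d);
    }
}

ToyArray *toy_array_new(void)
{
    ToyArray *a = malloc(sizeof *a);
    ToyDict *d = malloc(sizeof *d);
    if (!a || !d) {
        free(a);
        free(d);
        return NULL;
    }
    d->refcnt = 1;        /* reference held by the instance */
    a->dict = d;
    d->refcnt++;          /* the extra reference (the INCREF done in __init__) */
    a->dict_extra = d;
    return a;
}

void toy_array_dealloc(ToyArray *a)
{
    /* Assumed failure mode: the instance's own reference goes away early. */
    toy_decref(a->dict);
    a->dict = NULL;

    /* Shadow-attribute cleanup would run here; the dict is still valid
       because dict_extra keeps it alive. */
    dict_alive_during_cleanup = !dict_freed;

    /* Last step of dealloc: drop the extra reference. */
    toy_decref(a->dict_extra);
    free(a);
}
```

Calling toy_array_dealloc(toy_array_new()) leaves dict_alive_during_cleanup
set and dict_freed set, i.e. the dict outlives the cleanup step and is
still released exactly once at the end.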
> In summary:
>
> Version              Time    Rel Speedup    Abs Speedup
> Stock                0.398   ----           ----
> _ndarray_item mod    0.192   107%           107%
> del __del__          0.117   64%            240%
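
The speedup columns follow from the timings if "speedup" is read as
(old time / new time - 1) * 100, which is an assumption about how the
figures were derived; it does reproduce all three. A minimal sketch,
with speedup_percent and print_speedup_table as made-up helper names:

```c
#include <stdio.h>

/* Percent speedup going from `before` to `after` seconds:
   (before / after - 1) * 100. */
double speedup_percent(double before, double after)
{
    return (before / after - 1.0) * 100.0;
}

/* Print the relative (vs. previous row) and absolute (vs. stock) speedups
   for the timings quoted in the table. */
void print_speedup_table(void)
{
    const char *names[] = { "Stock", "_ndarray_item mod", "del __del__" };
    const double times[] = { 0.398, 0.192, 0.117 };
    int i;

    printf("%-20s %6.3f\n", names[0], times[0]);
    for (i = 1; i < 3; i++) {
        printf("%-20s %6.3f %6.0f%% %6.0f%%\n", names[i], times[i],
               speedup_percent(times[i - 1], times[i]),
               speedup_percent(times[0], times[i]));
    }
}
```

Calling print_speedup_table() from a main() prints one row per version.
Under this reading the relative column compares each row with the one
before it and the absolute column compares it with the stock timing.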
>
> There were a couple of other things I tried that resulted in additional
> small speedups, but the tactics I used were too horrible to reproduce
> here. The main one of interest is that all of the calls to
> NA_updateDataPtr seem to burn some time. However, I don't have any idea
> what one could do about that.
Francesc Alted had the same comment about NA_updateDataPtr a while ago.
I tried to optimize it then but didn't get anywhere. NA_updateDataPtr()
should be called at most once per extension function (more is
unnecessary but not harmful) but needs to be called at least once as a
consequence of the way the buffer protocol doesn't give locked
pointers.
> That's all for now.
>
> -tim
Well, be picking out your beer.
Todd
>
> static PyObject*
> _ndarray_subscript(PyArrayObject* self, PyObject* key)
> {
>     PyObject *result;
> #ifdef TAH
>     /* Fast path: an exact int key bypasses _universalIndexing. */
>     if (PyInt_CheckExact(key)) {
>         long ikey = PyInt_AsLong(key);
>         long offset;
>         if (NA_getByteOffset(self, 1, &ikey, &offset) < 0)
>             return NULL;
>         if (!NA_updateDataPtr(self))
>             return NULL;
>         return _simpleIndexingCore(self, offset, 1, Py_None);
>     }
> #endif
> #if _PYTHON_CALLBACKS
>     result = PyObject_CallMethod(
>         (PyObject *) self, "_universalIndexing", "(OO)", key, Py_None);
> #else
>     result = _universalIndexing(self, key, Py_None);
> #endif
>     return result;
> }
>
> static PyObject *
> _ndarray_item(PyArrayObject *self, int i)
> {
> #ifdef TAH
>     /* Fast path: compute the byte offset directly from the index. */
>     long offset;
>     if (NA_getByteOffset(self, 1, &i, &offset) < 0)
>         return NULL;
>     if (!NA_updateDataPtr(self))
>         return NULL;
>     return _simpleIndexingCore(self, offset, 1, Py_None);
> #else
>     /* Stock path: box the index and go through _universalIndexing. */
>     PyObject *result;
>     PyObject *key = PyInt_FromLong(i);
>     if (!key) return NULL;
>     result = _universalIndexing(self, key, Py_None);
>     Py_DECREF(key);
>     return result;
> #endif
> }