I've been fiddling with ideas for GSoC related to SciPy and I wanted
to run this by people on the list.
David C. and others often complain that C and Fortran code is
an order of magnitude harder to maintain than Python/Cython code.
Thus, would there be interest in a proposal that included rewriting
Damian Eads' excellent scipy.spatial.distance and scipy.cluster.vq in
Cython?
I've already been scoping this out, as I had wanted to add output
matrix functionality (letting the caller pass in a preallocated result
array) to scipy.spatial.distance.pdist and
scipy.spatial.distance.cdist, which would make scenarios where
distances are recomputed frequently (as in some sort of tracking
application) much less memory-intensive.
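To make the output-matrix idea concrete, here is a rough sketch in plain NumPy. It assumes Euclidean distance only, and `cdist_euclidean_into` is a hypothetical name I made up for illustration, not an existing SciPy function. The point is just that the caller owns the result array and it gets overwritten on every call instead of being reallocated.

```python
import numpy as np


def cdist_euclidean_into(XA, XB, out):
    """Write pairwise Euclidean distances between rows of XA and XB into `out`.

    Hypothetical helper (not part of SciPy). `out` must be a preallocated
    float array of shape (len(XA), len(XB)); it is overwritten and returned.
    """
    XA = np.asarray(XA, dtype=float)
    XB = np.asarray(XB, dtype=float)
    if out.shape != (XA.shape[0], XB.shape[0]):
        raise ValueError("out must have shape (len(XA), len(XB))")
    for i in range(XA.shape[0]):
        diff = XB - XA[i]                           # small temporary, shape (len(XB), d)
        out[i] = np.sqrt((diff * diff).sum(axis=1))  # stored into the existing row
    return out


# Tracking-style loop: one buffer, reused for every frame.
rng = np.random.default_rng(0)
buf = np.empty((3, 4))
for _ in range(5):
    tracked = rng.random((3, 2))     # e.g. current object positions
    detections = rng.random((4, 2))  # e.g. new detections in this frame
    cdist_euclidean_into(tracked, detections, buf)
```

Only the per-row temporaries are allocated inside the loop; the full distance matrix is allocated once up front.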
Also at the back of my mind has been implementing some of the tricks
found in the literature for speeding up k-means (optimized versions
that take advantage of the triangle inequality, for instance; "online"
k-means, by which I mean updating the means with the contribution of
each data point sequentially as opposed to considering them all at
once). I'd also like to see the addition of exemplar-based methods
such as k-centers and the relatively new affinity propagation (there
is a reference implementation of the latter which would be unsuitable
for direct translation from MATLAB due to licensing, so I'd be
proposing a clean-room implementation).
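As a rough illustration of what I mean by "online" k-means, here is a minimal sketch of the sequential update, assuming squared Euclidean distance and a simple running-mean rule. The function name `online_kmeans_update` and the `counts` bookkeeping array are my own placeholders, not anything that exists in scipy.cluster.vq.

```python
import numpy as np


def online_kmeans_update(centroids, counts, x):
    """Fold one data point into its nearest centroid, in place.

    Hypothetical helper for illustration only.
    centroids : (k, d) float array of current means (modified in place)
    counts    : (k,) int array, number of points each mean has absorbed
    Returns the index of the centroid that was updated.
    """
    x = np.asarray(x, dtype=float)
    # Nearest centroid by squared Euclidean distance.
    j = int(np.argmin(((centroids - x) ** 2).sum(axis=1)))
    counts[j] += 1
    # Running-mean update: new_mean = old_mean + (x - old_mean) / n
    centroids[j] += (x - centroids[j]) / counts[j]
    return j


# Feed points one at a time instead of considering them all at once.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
counts = np.array([1, 1])
for point in [[2.0, 0.0], [10.0, 12.0], [1.0, 1.0]]:
    online_kmeans_update(centroids, counts, point)
```

Each point touches only one centroid, so the cost per observation is one nearest-centroid search plus a constant-time update.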
Any feedback or additional suggestions would be welcome.
Thanks,
David