Konrad Hinsen <hinsen at cnrs-orleans.fr> writes:
> Computational and notational efficiency are rather well separated,
> fortunately. Both the current dot function and an hypothetical matrix
Yes, the only thing they have in common is that both are currently
unsatisfactory (for matrix operations) in numpy, at least for my needs.
Although I've solved my most pressing performance problems by patching Numeric
[1], I'm obviously interested in a more official solution (i.e. one that is
maintained by others :)
[...] [order changed by me]
>a.schmolck at gmx.net (A.Schmolck) writes:
> > My impression is that the best path also very much depends on the what the
> > feature aspirations and divisions of labor of numpy/numarray and scipy are
^^^^^^^
Darn, I made a confusing mistake -- this should read _future_.
> > going to be. For example, scipy is really aimed at scientific users, which
> > need performance, and are willing to buy it with inconvenience (like the
>
> I see the main difference in distribution philosophy. NumPy is an
> add-on package to Python, which is in turn used by other add-on
> packages in a modular way. SciPy is rather a monolithic
> super-distribution for scientific users.
>
> Personally I strongly favour the modular package approach, and in fact
> I haven't installed SciPy on my system for that reason, although I
> would be interested in some of its components.
[...]
> The same approach as for XML could be used: a slim-line version in the
> standard distribution that could be replaced by a high-efficiency
> extended version for those who care.
[...]
I personally agree with all your points above. If you have a look at our
"dotblas" patch mentioned earlier (see [1]), you will find that it aims to
provide exactly that: have dot run anywhere without hassle, but run (much)
faster if the user is willing to install ATLAS.
My main concern was that the discussion should shift away a bit from syntactic
and implementation details towards which audiences and which needs
numpy/numarray and scipy are supposed to address and, in this light, how best
to strike the balance between convenience for users and maintainers, speed and
bloat, generality and efficiency, etc.
As an example, adding the dotblas patch [1] to Numeric is, I think, more
convenient for the users, granting a few assumptions for the sake of the
argument (like that it actually works :). It gives users who have ATLAS better
performance, and those who don't won't notice any difference (or at least
shouldn't). It is, however, inconvenient for the maintainers. Whether one
should bother including it, in this or some other way, depends not only on the
obvious question of whether there is a better way to achieve what it does for
both groups (like creating a dedicated Matrix class), but also on what numpy is
really supposed to achieve. I'm not entirely clear on that. For example, I
don't know how many numpy users deeply care about their matrix multiplications
for big (1000x1000) matrices being 40 times faster.
The monolithic approach is not entirely without its charms (remember Python's
"batteries included" jingle). Apart from convenience factors, it also has the
not inconsiderable advantage that people use _one_ standard module for a
certain thing, rather than 20 different solutions. This certainly helps to
improve code quality, not least because someone goes to the trouble of
deciding what merits inclusion in the "Big Thing", possibly urging changes,
but at least almost certainly taking more time for evaluation than an
individual programmer who just wants to get a certain job done. It also makes
life easier for module writers: they can rely on certain stuff being around
(and don't have to reinvent the wheel, another potential improvement to code
quality). As such, it makes life easier for maintainers, as does the scipy
commandment that you have to install ATLAS/LAPACK, full stop (and if it
doesn't run on your machine, well, at least it works fast for some people, and
that might well be better than working slowly for everyone in this context).
So, I think what's good really depends on what you're aiming at, that's why
I'd like to know what users and developers think about these matters.
My points regarding scipy and numpy/numarray were just one attempt at
interpreting what these respective libraries try to be, should be, or could
become. Now, not being a developer for either of them (I've only submitted a
few minor patches to scipy), I'm not in a particularly good position to venture
such interpretations, but I hoped that it would provoke other, more
knowledgeable people to share their opinions and insights on this matter (as
indeed you did).
> I'd love to have efficient matrices without having to install the
> whole SciPy package!
Welcome to the linear algebra lobby group ;) Yep, that would be nice, but my
impression was that the scipy folks are currently more concerned about
performance issues than the numpy/numarray folks, and I could live with either
package providing what I want.
Ideally, I'd like to see a slim core numarray without any frills (and more
streamlined to behave like standard Python containers, e.g. in its indexing and
type/cast behavior) for the Python core; something more capable and efficient
for numerics (including matrices!) as a separate package (like the XML example
you quote); and then maybe a bigger pre-bundled collection of (ideally rather
modular) numerical libraries for really hard-core scientific users (maybe in
the spirit of xemacs-packages and sumo tarballs: no bloat if you don't need
it, plenty of features in an instant if you do).
Anyway, is there at least general agreement that there should be some new and
wonderful matrix class (plus supporting libraries) somewhere (rather than
souping up array)?
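To make the "dedicated matrix class rather than souped-up array" idea concrete, here is a toy sketch. The class name `Matrix` and its behaviour (`*` as the matrix product, `.T` as transpose) are illustrative assumptions, not an existing Numeric, numarray or SciPy API; a real version would hand storage and the product itself to an efficient backend such as a BLAS-backed dot.

```python
class Matrix:
    """Toy 2-D matrix where `*` means the linear-algebra product.

    HYPOTHETICAL illustration only: storage is a list of row lists and
    the product is a naive pure-Python loop.
    """

    def __init__(self, rows):
        rows = [list(r) for r in rows]
        if not rows or not rows[0] or any(len(r) != len(rows[0]) for r in rows):
            raise ValueError("rows must be non-empty and of equal length")
        self.rows = rows

    @property
    def shape(self):
        return (len(self.rows), len(self.rows[0]))

    @property
    def T(self):
        """Transpose (zip(*rows) yields the columns as tuples)."""
        return Matrix(zip(*self.rows))

    def __mul__(self, other):
        # Matrix * Matrix -> matrix product; Matrix * scalar -> scaling.
        if isinstance(other, Matrix):
            if self.shape[1] != other.shape[0]:
                raise ValueError("shapes %s and %s not aligned"
                                 % (self.shape, other.shape))
            cols = other.T.rows
            return Matrix([[sum(x * y for x, y in zip(row, col)) for col in cols]
                           for row in self.rows])
        return Matrix([[x * other for x in row] for row in self.rows])

    def __rmul__(self, other):
        # Only reached for scalar * Matrix; scaling commutes.
        return self * other

    def __eq__(self, other):
        return isinstance(other, Matrix) and self.rows == other.rows

    def __repr__(self):
        return "Matrix(%r)" % (self.rows,)
```

With `a = Matrix([[1, 2], [3, 4]])` and `b = Matrix([[5, 6], [7, 8]])`, `(a * b).rows` is `[[19, 22], [43, 50]]`, whereas an elementwise product of the same two arrays would be `[[5, 12], [21, 32]]`. Keeping the two meanings of `*` in separate types, instead of overloading the array type, is the point of the separate-class approach.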
alex
Footnotes:
[1] patch for faster dot product in Numeric
http://www.scipy.org/Members/aschmolck
--
Alexander Schmolck Postgraduate Research Student
Department of Computer Science
University of Exeter
A.Schmolck at gmx.net    http://www.dcs.ex.ac.uk/people/aschmolc/