I haven't fully waded through all the various replies and to this
thread. I plan to do that and send a reply on specific points later.
This is message is more of a historical, motivational or possibly
philosophical nature.
First off, NumPy has used the term "broadcast" to mean the same thing
since its inception and changing the terminology now is asking for
confusion. *In the context of this mailing list *,I think we should use
"broadcast" in the numpy sense and use appropriate qualifiers when
referring to how other array packages practice broadcasting. Referring
to broadcasting as "shape-preserving broadcasting" or some such doesn't
seems to make things any clearer and adds a bunch of excess verbiage. In
any event, I plan to omit any "broadcast" qualifiers here.
The following understanding was formed by using and occasionally helping
with development of NumPy since it was developed in 1995 or thereabouts.
That doesn't mean that my understanding aggrees with the primary
developers of the time, I may misremember things and my recollections
are likely tinged by the experience I've had with NumPy in the interim.
So, don't take this as definitive, but perhaps it will help provide some
insight into what NumPy's broadcasting is supposed to be.
Let's first dispense with the padding of dimensions. As I recall, this
was a way to make matrix like operations easier. This was way before
there was a matrix class and by defining padding in this way 1-D vectors
could generally be treated as column vectors. Row vectors still needed
to be 2-D (1xN), but they tended to be less frequent, so that was less
of a burden. Or maybe I have that backwards, in any event they were put
there to to facilitate matrix-like uses of numpy arrays. Given that
there is a matrix class at this point, I doubt I would automagically pad
the dimensions if I were designing numpy from scratch now. Since the
dimension padding is at least partly historical accident and since it is
in some sense orthogonal to the main point of numpy's broadcasting I'm
going to pretend it doesn't exist for the rest of this discussion.
At it's core broadcasting is about adjusting the shapes of two arrays so
that they match. Consider an array 'A' and an array 'B' with shaps (3,
Any) and (Any, 4). Here, 'Any' means that the given dimension of the
array is unspecified and can take on any value that is convenient for
functions operating on the array. If we add 'A' and 'B' together we'd
like the two 'Any' dimensions to stretch appropriately so that the
result was an array of shape (3, 4). Similarly adding and array of shape
(3, 4) to an array of shape (Any, 4) should work and produce an array of
shape (3, 4). So far, this is pretty straightforward; I believe, it also
bears a fair amount of resemblance to Sasha's 0-stride ideas.
The complicating factor is that there wasn't a good way to spell 'Any'
at the time. Or maybe we were lazy. Or maybe there was some other reason
that I'm forgetting. In any event, we ended up spelling 'Any' as '1'.
That means that there's no way to distinguish between a dimension that's
of length-1 for some legitimate reason and one that is that length just
for stretchability. It would be an interesting experiment to see how
things would work with no padding and with an explicit 'Any' value
available for dimensions. However, it's probably too much work and would
result in too many backwards compatibility problems for NumPy proper.
[Half baked thoughts on how to do this though: newaxis would produce a
new axis with length -1 (or some other marker length). This would be
treated as length-1 axes are treated now. However, length-1axes would no
longer broadcast. Padding would be right out.]
In summary, the platonic ideal of broadcasting is simple and clean. In
practice it's more complicated for two reasons. First, padding the
dimensions.I believe that this is mostly historical baggage. The second
is the conflation of '1' and 'Any' (a name that I made up for this
message, so don't go searching for it). This may be an hostorical
accident and/or implementation artifact, but there may actually be some
more practical reasons behind this as well that I am forgetting.
Hopefully that is mildly informative,
Regards,
-tim