Array broadcasting can avoid copying large arrays in memory and avoid excessive looping over arrays, both of which can be very inefficient.
There are certain cases, particularly with JIT compilers, where looping can be faster than array broadcasting.
Often the clean, clear code enabled by array broadcasting outweighs those possible edge-case gains.
As always, if you’re concerned about a particular case, make a simple benchmark comparing broadcasting with a loop; for Python, try Numba.
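A minimal benchmark sketch in Python with NumPy follows. The array sizes and the function names `scale_broadcast` and `scale_loop` are hypothetical, chosen only for illustration. Numba is left out so the sketch needs only NumPy; a Numba `@njit` version of the loop could be added as a third case.

```python
import timeit

import numpy as np

# Hypothetical test sizes: each column of a 1000x500 matrix is scaled
# by the matching element of a 1x500 row vector.
A = np.random.default_rng(0).random((1000, 500))
b = np.random.default_rng(1).random((1, 500))


def scale_broadcast(A, b):
    # b (1x500) is stretched across all rows of A without copying b.
    return A * b


def scale_loop(A, b):
    # Explicit Python loop over rows: same result, more interpreter overhead.
    out = np.empty_like(A)
    for i in range(A.shape[0]):
        out[i, :] = A[i, :] * b[0, :]
    return out


# Check that both versions agree before timing them.
assert np.allclose(scale_broadcast(A, b), scale_loop(A, b))

for name, func in [("broadcast", scale_broadcast), ("loop", scale_loop)]:
    seconds = timeit.timeit(lambda: func(A, b), number=20)
    print(f"{name}: {seconds / 20 * 1e3:.2f} ms per call")
```

Timings depend on the machine and array sizes, so run it with sizes close to your real workload.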

Example

If A has size 4x5 and B has size 1x5, then A.*B, A+B, etc. just work without bsxfun(), provided that in each dimension the sizes either match or one of them is 1.
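For comparison, a sketch of the same 4x5 and 1x5 case in NumPy, with arbitrary example values. `np.broadcast_shapes` (NumPy 1.20 or later) applies the "match or be 1" rule to shapes without allocating any arrays.

```python
import numpy as np

A = np.arange(20.0).reshape(4, 5)  # shape (4, 5)
B = np.arange(1.0, 6.0).reshape(1, 5)  # shape (1, 5)

# B's singleton first dimension is stretched to match A's 4 rows,
# like Matlab/Octave A.*B and A+B, with no bsxfun() or repmat() needed.
prod = A * B
total = A + B
print(prod.shape, total.shape)  # (4, 5) (4, 5)

# Each dimension must match or be 1; broadcast_shapes applies that rule.
print(np.broadcast_shapes((4, 5), (1, 5)))  # (4, 5)
```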

This is important for clarity and conciseness of syntax.
The need for bsxfun() is one factor that drove me to use Python instead of Matlab almost entirely.

Fix errors

If instead A is 5x4, A.*B fails because the second dimensions (4 and 5) neither match nor are 1:

A.*B

error: operator *: nonconformant arguments (op1 is 5x4, op2 is 1x5)

Transposing B to 5x1 makes the sizes compatible:

A.*B.'
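The same mistake and fix in NumPy, as a sketch with arbitrary values. NumPy raises a `ValueError` instead of Octave's "nonconformant arguments" error, and `B.T` plays the role of Matlab's `B.'` (it does not conjugate).

```python
import numpy as np

A = np.ones((5, 4))  # 5x4, as in the Octave error above
B = np.arange(1.0, 6.0).reshape(1, 5)  # 1x5 row

try:
    A * B  # second dimensions are 4 and 5: neither matches nor is 1
except ValueError as err:
    print("ValueError:", err)

# B.T has shape (5, 1): rows now match, and the singleton column stretches to 4.
C = A * B.T
print(C.shape)  # (5, 4)
```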

Always use .' to transpose, because ' takes the Hermitian (complex conjugate) transpose.
Remember that Matlab copies memory on transpose, which is expensive for large arrays.
NumPy transposes are “free”, O(1), since they are just a view into the original array with the strides swapped; no data is copied.
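A small NumPy sketch of both points, using arbitrary example arrays: `.T` is the plain transpose (like `.'`), the conjugate transpose must be asked for explicitly, and `.T` returns a view whose strides are swapped rather than a copy.

```python
import numpy as np

z = np.array([[1 + 2j, 3 - 1j]])  # 1x2 complex row

# NumPy's .T is the plain transpose, like Matlab's .'
zt = z.T
# The Hermitian (conjugate) transpose, like Matlab's ', is explicit.
zh = z.conj().T
print(zt.ravel())
print(zh.ravel())

# .T returns a view: no data is copied, only shape and strides are swapped.
big = np.zeros((1000, 2000))
bigT = big.T
print(bigT.base is big)  # True
print(big.strides, bigT.strides)  # (16000, 8) (8, 16000)
```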

Minimum versions

Matlab and Octave versions earlier than these required bsxfun().

Program        minimum version   date
Matlab         R2016b            Sept. 2016
Octave         3.6.0             Feb. 2012
Python/Numpy                     < 2006

Fortran array broadcasting

Fortran 90 brought the spread() function, which allows efficient array broadcasting, at the price of replicating the array in memory.
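To make the memory trade-off concrete, here is a NumPy sketch (not Fortran) contrasting explicit replication, roughly what `spread(b, dim=1, ncopies=4)` produces for a length-5 `b`, with a broadcasting view. Using `np.repeat` as the spread analogue is my own choice of illustration.

```python
import numpy as np

b = np.arange(1.0, 6.0)  # length-5 vector

# Explicit replication, analogous to Fortran spread(b, dim=1, ncopies=4):
# a new 4x5 array is allocated and b is copied into every row.
spread_like = np.repeat(b[np.newaxis, :], 4, axis=0)

# Broadcasting view: same 4x5 values, but the row stride is 0,
# so every row reads from b's single copy in memory (read-only view).
view = np.broadcast_to(b, (4, 5))

print(np.array_equal(spread_like, view))  # True
print(spread_like.strides, view.strides)  # (40, 8) (0, 8)
```

The zero stride is what lets broadcasting avoid the copy that replication requires.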