We show that there is a simple (approximately radial) function on $\reals^d$,
expressible by a small 3-layer feedforward neural network, which cannot be
approximated by any 2-layer network to better than a certain constant accuracy,
unless the network's width is exponential in the dimension. The result holds for
virtually all known activation functions, including rectified linear units,
sigmoids and thresholds, and formally demonstrates that depth -- even if
increased by 1 -- can be exponentially more valuable than width for standard
feedforward neural networks. Moreover, compared to related results in the
context of Boolean functions, our result requires fewer assumptions, and the
proof techniques and construction are very different.
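In schematic form, the separation can be read as follows, where the width
bound $c\,e^{cd}$, the input distribution $\mu$, and the accuracy constant
$\delta$ are illustrative placeholders rather than the paper's exact
quantities:
\[
\exists\, g:\reals^d\to\reals \ \text{expressible by a 3-layer network of width } \mathrm{poly}(d)
\ \text{ such that }\
\inf_{N\in\mathcal{N}_2(c\,e^{cd})} \ \mathbb{E}_{x\sim\mu}\!\big[(N(x)-g(x))^2\big] \;\ge\; \delta,
\]
where $\mathcal{N}_2(w)$ denotes the class of 2-layer networks of width at
most $w$ (notation introduced here for illustration only).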