In a regression problem, one is given a multidimensional random vector X, whose components are called predictor variables, and a random variable Y, called the response. A regression surface describes a general relationship between X and Y. Projection pursuit regression (PPR) is a nonparametric regression technique that has been successfully applied to high-dimensional data. It approximates the regression surface by a sum of empirically determined univariate functions of linear combinations of the predictors. Projection pursuit learning (PPL) formulates PPR as a two-layer feedforward neural network. The smoothers in PPR are nonparametric, whereas those in PPL are based on Hermite functions up to some predefined highest order R. We demonstrate that PPL networks in their original form do not have the universal approximation property for any finite R, and thus cannot converge to the desired function even with an arbitrarily large number of hidden units. However, by including a bias term in each linear projection of the predictor variables, PPL networks attain the universal approximation property, independent of the exact choice of R. Experiments show that this modification increases the rate of convergence with respect to the number of hidden units, improves generalization performance, and makes the network less sensitive to the setting of R. Finally, we apply PPL to chaotic time series prediction and obtain results superior to those of the cascade-correlation architecture.
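The following is a minimal sketch of the forward pass of a PPL-style network as described above: each hidden unit projects the predictors onto a direction, optionally adds a bias, and passes the result through a smoother that is a linear combination of Hermite functions up to order R. Several details are assumptions made for illustration and are not taken from the paper: the Hermite functions use the standard orthonormal (physicists') normalization, and the names `hermite_functions`, `ppl_forward`, `a`, `theta`, `c`, and `beta` are hypothetical. Training (fitting directions, biases, and coefficients) is not shown.

```python
import math


def hermite_functions(z, R):
    """Return [h_0(z), ..., h_R(z)] using the three-term recurrence.

    Assumption: orthonormal Hermite functions in the physicists' convention,
    h_0(z) = pi**(-1/4) * exp(-z**2 / 2); the paper's normalization may differ.
    """
    h = [math.pi ** -0.25 * math.exp(-0.5 * z * z)]
    if R >= 1:
        h.append(math.sqrt(2.0) * z * h[0])
    for n in range(1, R):
        # h_{n+1} = sqrt(2/(n+1)) * z * h_n - sqrt(n/(n+1)) * h_{n-1}
        h.append(math.sqrt(2.0 / (n + 1)) * z * h[n]
                 - math.sqrt(n / (n + 1)) * h[n - 1])
    return h


def ppl_forward(x, units, R):
    """Evaluate the sum over hidden units of beta * g(a . x + theta).

    Each unit is a dict (hypothetical layout) with keys:
      'a'     projection direction, same length as x
      'theta' bias added to the projection (0.0 gives the bias-free form)
      'c'     R + 1 Hermite coefficients defining the smoother g
      'beta'  output-layer weight
    """
    y = 0.0
    for u in units:
        z = sum(ai * xi for ai, xi in zip(u["a"], x)) + u["theta"]
        h = hermite_functions(z, R)
        g = sum(cr * hr for cr, hr in zip(u["c"], h))
        y += u["beta"] * g
    return y


# Illustrative values only; they are not taken from the paper.
units = [
    {"a": [0.6, 0.8], "theta": 0.5, "c": [1.0, -0.3, 0.2], "beta": 1.5},
    {"a": [1.0, 0.0], "theta": 0.0, "c": [0.4, 0.1, 0.0], "beta": -0.7},
]
prediction = ppl_forward([0.2, -1.0], units, R=2)
```

Setting `theta` to 0.0 in every unit corresponds to the original bias-free form, so the same function can be used to compare the two variants for a given R.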