Neural Network general question

1. Since the usual method of training a BPNN means normalizing inputs to accommodate the sigmoid function's limited output range, how does one accurately model a continuous function whose solutions fall outside that range? I've got a few answers in my mind, but it's not really clear to me just yet. That is, suppose I want to model the relationship between, say, product demand and price. Do I have to normalize the NN input / target values for training and then "scale" the outputs after I run the NN so the outputs are meaningful in their original context (i.e., they look like real prices, not normalized values)?

2. Is it legitimate to mix real and discrete inputs? Citing the product demand example, suppose the price of a product in a particular area is dependent upon two things: the month of the year and the local demand in terms of quantity. It would seem most sensible to normalize the quantities as a series of real (continuous) values, and the months as discrete integers (vectors in a binary or one-of-N format). Thus, the inputs would consist of a real-value input for demand, and a series of inputs for a vector of 1's and 0's to represent the month.

Comments

Typically, continuous outputs are scaled so that the historic distribution of the output variable fits within the output node's range. Most often some linear scaling is used.
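A minimal sketch of that kind of linear scaling, assuming a min-max mapping into a margin inside the sigmoid's (0, 1) range (the margin of [0.1, 0.9] and the sample prices are illustrative choices, not prescribed values):

```python
# Linear (min-max) scaling of a continuous target into the sigmoid's
# output range, plus the inverse mapping used to recover real prices.

def fit_linear_scaler(values, lo=0.1, hi=0.9):
    """Map the historic min/max of `values` onto [lo, hi].

    A margin inside (0, 1) is often used so the sigmoid never has to
    saturate to reproduce the extremes of the training data.
    """
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    def scale(x):
        return lo + (hi - lo) * (x - vmin) / span
    def unscale(y):
        return vmin + span * (y - lo) / (hi - lo)
    return scale, unscale

# Example: historic prices used to fit the scaler
prices = [12.0, 15.5, 20.0, 18.25]
scale, unscale = fit_linear_scaler(prices)
scaled = [scale(p) for p in prices]      # targets for training, in [0.1, 0.9]
restored = [unscale(s) for s in scaled]  # network outputs mapped back to prices
```

The same `unscale` function is what you would apply to the network's outputs at prediction time so they read as real prices again.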

Mixing numeric (whether integers or reals) and categorical variables is fine. Categorical variables are most often represented as dummy (0/1) variables. See the sections called "How should categories be encoded?" and "Why not code binary inputs as 0 and 1?" in Part 2 of the Usenet comp.ai.neural-nets FAQ for more on this:
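A toy sketch of the mixed encoding described in question 2, assuming a one-of-N month code alongside a normalized real-valued demand (the normalization constant `demand_max` is an illustrative assumption):

```python
# Build a mixed input vector: 12 dummy (0/1) inputs for the month in
# one-of-N format, followed by one real-valued input for scaled demand.

def encode_month(month):
    """month in 1..12 -> one-of-N binary vector of length 12."""
    vec = [0.0] * 12
    vec[month - 1] = 1.0
    return vec

def make_input(month, demand, demand_max=10_000.0):
    # demand_max is an assumed scale chosen to keep the real input in [0, 1]
    return encode_month(month) + [demand / demand_max]

x = make_input(month=3, demand=2_500.0)
# x has 13 components: a 1.0 in the March slot, 0.0 elsewhere, then 0.25
```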

In the intervening time, I'd kinda discovered what you've described. I also stumbled upon a paper that described using the actual target values for error correction, rather than scaled targets. The authors claim that doing so increases the network's accuracy and reduces the number of epochs needed for convergence (their proposed algorithm performs the scaling inherently during error correction, so it doesn't have to be done separately).

Apart from that, I'm also looking at coding an ART-2 network in an effort to do pattern recognition on variables that affect construction prices. From what I understand about ART networks, I was thinking perhaps I could use the ART-2 network to sort the tens of thousands of price data points I have available into x categories (supposing it reduces to, say, a few hundred categories), and then use those categories (i.e., the forward weight vectors of the network) to train a standard BP MLP.

The reason I propose that is because I get the impression a BP MLP is better at prediction than an ART network but, of course, requires complete retraining with the addition of new inputs. By combining them, I can avoid the "curse of dimensionality" by using the ART network to reduce the data to an exponentially smaller number of categories, which can then be used to retrain the BP MLP for prediction...
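To make the two-stage idea concrete, here is a rough sketch. A real ART-2 network is not implemented here; a crude "leader" clusterer with a vigilance-like distance threshold stands in for the categorization stage, and all names, thresholds, and sample data are illustrative assumptions, not the actual design:

```python
# Stage 1 stand-in for ART-2: assign each pattern to the first existing
# prototype within a vigilance-like distance threshold, or create a new
# prototype. The prototypes play the role of the category weight vectors.

def leader_cluster(patterns, vigilance=0.1):
    """Returns (prototypes, labels): one prototype per category,
    and a category label for each input pattern."""
    prototypes, labels = [], []
    for p in patterns:
        for i, proto in enumerate(prototypes):
            dist = sum((a - b) ** 2 for a, b in zip(p, proto)) ** 0.5
            if dist <= vigilance:
                labels.append(i)
                break
        else:
            prototypes.append(list(p))
            labels.append(len(prototypes) - 1)
    return prototypes, labels

# Reduce many raw price patterns to a handful of category prototypes.
patterns = [[0.1, 0.2], [0.12, 0.19], [0.9, 0.8], [0.88, 0.82]]
prototypes, labels = leader_cluster(patterns)
# Stage 2 (not shown): train the BP MLP on the few prototypes / category
# assignments instead of the tens of thousands of raw patterns, and only
# retrain when the clusterer creates a new category.
```

The design point being illustrated: the MLP's training set shrinks from the raw data to the category prototypes, which is what limits the retraining cost when new data arrives.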

That's the theory, anyway. Thanks for the response. I didn't think I'd see one.