"Phoenix Project" Neural Network Data
The data used in the first series of experiments with neural
networks is drawn from work done at Johnson Space Center in 1974
on Skylab photographs of the Phoenix metropolitan area.
The photographs were re-photographed with a circular aperture
corresponding to a 2-km diameter circle on the ground, spaced so
that the area originally photographed was covered by 433 of these
circular "events".
Each "event" photograph was then processed through equipment
which produced and photographed a Fraunhofer Diffraction Pattern
(FDP). Two distinct sampling geometries were used to record the
light intensity patterns of these FDPs: a radial wedge sampling
and an annular ring sampling. The resulting 195 data points for
each event were the basis of further analysis.
During the 1974 project, 57 "features" were empirically
extracted from the FDP data points. A variety of methods were
used to identify classification groupings for the original events.
These groupings were reduced from 96 to 5 land use classes, and
Fisher discriminant tests were developed and refined to yield
better than 90% classification accuracy, using as few as 9 of the
features as input.
For the neural network experiments, 292 of the total events
were selected to provide coverage of the 5 land use classes.
The classes, and the number of events representing each, are:
Class            # Events
_____________    ________
R(esidential)        31
F(arm)               75
M(ountain)           29
W(ater)              31
U(rban)             126
                    _____
                    292
The data format for all files on these two volumes is that of
NeuralWorks input (.nna) files. The first part of each record
contains the floating point numbers which are the event data
points, followed by an event ID number. The second part of the
record contains the desired network output values, in this case a
5-vector of binary digits, followed by the letter code for the land
use class (see table above) and an event number matching the ID in
the middle of the record. (The order of these last two items may
vary from file to file: event number then land use class, or vice
versa.) Because the records are longer than one text line, each is
split across multiple lines. NeuralWorks prefixes every line after
the first within a record with an "&" character. The files are all
ASCII.
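The continuation-line convention described above can be undone with a
few lines of code. The following is a minimal Python sketch, not part of
NeuralWorks: the function name join_records and the sample values are
hypothetical, and the only rule it relies on is the one stated above
(every line after the first in a record starts with "&").

```python
def join_records(lines):
    """Merge physical text lines into logical records.

    Assumes, per the description above, that every line after the
    first within a record starts with an "&" character.
    """
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("&"):
            if not records:
                raise ValueError("continuation line before any record")
            # Drop the "&" marker and append the rest to the open record.
            records[-1] += " " + line[1:].strip()
        else:
            records.append(line)
    return records


# Hypothetical two-record sample; the values are made up for illustration
# and are much shorter than the real records.
sample = [
    "0.12 0.50 0.73",
    "& 0.91 17",
    "& 0 0 0 0 1 U 17",
    "0.33 0.25 0.10",
    "& 0.08 18 1 0 0 0 0 R 18",
]
records = join_records(sample)
for record in records:
    print(record)
```

Each returned string holds one complete record on a single line, which
makes the later split into input, comment, and output portions easier.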
NOTE ON THE FORMAT STRUCTURE FOR THE RECORDS: These data sets
have been converted from an earlier version of NeuralWorks which
separated the input and output data into distinct record types.
Each could carry comments after the data portion; the event ID and
the land use class are comments. The conversion routine neither
removes comments nor inserts the current comment delimiter, "!". The
data files you have thus have a comment (Event ID) in the middle of
each record, separating the input portion from the output portion.
When developing and specifying your network, you must take this into
account in the I/O Parameters section. Be sure to specify that the
input begins in column 2, and the output in column 57, for the
55-feature data (54 features plus an Event ID, which puts the output
data in column 57).
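For readers processing the files outside NeuralWorks, the record layout
just described can be taken apart by token count. This is a minimal
Python sketch under stated assumptions: the record has already been
joined onto one line, fields are separated by whitespace, and no extra
field precedes the first feature. It does not model how NeuralWorks
numbers columns. The function name split_record and the sample record
are hypothetical.

```python
def split_record(record, n_inputs=54, n_outputs=5):
    """Split one joined record into inputs, event ID, outputs, and labels.

    Assumed layout: n_inputs data values, the event ID (an embedded
    comment), n_outputs binary output values, then the land use letter
    and the event number in either order.
    """
    tokens = record.split()
    expected = n_inputs + 1 + n_outputs + 2
    if len(tokens) != expected:
        raise ValueError(f"expected {expected} tokens, got {len(tokens)}")

    inputs = [float(t) for t in tokens[:n_inputs]]
    event_id = tokens[n_inputs]  # the comment in the middle of the record
    start = n_inputs + 1
    outputs = [int(float(t)) for t in tokens[start:start + n_outputs]]

    # The two trailing comments may appear in either order.
    tail = tokens[start + n_outputs:]
    land_use = next(t for t in tail if t.isalpha())
    event_no = next(t for t in tail if not t.isalpha())
    return {
        "inputs": inputs,
        "event_id": event_id,
        "outputs": outputs,
        "land_use": land_use,
        "event_no": event_no,
    }


# Made-up record with only 4 inputs so the example stays readable;
# the real feature files would use the default n_inputs=54.
parsed = split_record("0.12 0.50 0.73 0.91 17 0 0 0 0 1 17 U", n_inputs=4)
print(parsed["land_use"], parsed["outputs"])
```

Comparing event_id with event_no for every record is a cheap check that
the input and output portions were paired correctly.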
This diskette contains both the raw data for the 292 sample
events (the 195 light intensity data points from the FDPs) and
the feature data (features 1-54 only) for the same events.
The feature data sets contain the original values, without
normalization, and are presented as two data sets: the 148-event
subset used for network training, and the 144-event subset used
for recall and generalization testing.
Feature Files    Content
_____________    ___________________
PhTrain55.nna    148 training events
PhTest55.nna     144 recall events
These have the following frequency distributions of the 5 land
use classes:
                 PhTrain    PhTest
                 _______    ______
R(esidential)       14        17
F(arm)              40        35
M(ountain)          13        16
W(ater)             16        15
U(rban)             65        61
                   ___       ___
                   148       144
Also on the diskette are the raw FDP data point sets.
These contain, for each event, the 100 radial wedge geometry
points, normalized to a scale of 0 - 1000, and the 95 annular ring
geometry points, also normalized to a scale of 0 - 1000.
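As an illustration of how the 195 values per event divide between the
two sampling geometries, here is a minimal Python sketch. It assumes the
wedge values are stored first and the ring values second, following the
order in which they are listed above; the files themselves should be
checked before relying on that. The function name split_fdp_points and
the sample values are hypothetical.

```python
N_WEDGE = 100  # radial wedge points per event (from the text above)
N_RING = 95    # annular ring points per event (from the text above)


def split_fdp_points(points):
    """Split one event's 195 FDP values into (wedge, ring) lists.

    Assumption: wedge values come first, followed by ring values.
    """
    if len(points) != N_WEDGE + N_RING:
        raise ValueError(
            f"expected {N_WEDGE + N_RING} points, got {len(points)}"
        )
    # Both geometries are described as normalized to the range 0 - 1000.
    if any(not 0 <= p <= 1000 for p in points):
        raise ValueError("value outside the 0 - 1000 normalized range")
    return list(points[:N_WEDGE]), list(points[N_WEDGE:])


# Synthetic values for illustration only, not taken from the data files.
event = [float(i) for i in range(195)]
wedge, ring = split_fdp_points(event)
print(len(wedge), len(ring))
```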
Again, the files have the format described above. The files
are:
File Name         Contents
______________    __________________________________
PhTrain195.NNA    100 training events
PhTest195.NNA      77 recall events
                  115 events not used in this series
Their land use class frequency distributions are:
                 PhTrain    PhTest    NotUsed
                 _______    ______    _______
R(esidential)       18        13         0
F(arm)              21        22        32
M(ountain)          20         9         0
W(ater)             20        11         0
U(rban)             21        22        83
                   ___       ___       ___      ___
                   100        77       115      292
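The class counts in the tables of this document can be cross-checked
against one another. The short Python sketch below copies the counts
from those tables and confirms that both partitions (feature files and
raw files) add up to the same 292-event class distribution. The
dictionary and function names are hypothetical labels chosen for the
sketch.

```python
CLASSES = ["R", "F", "M", "W", "U"]

# Overall distribution of the 292 selected events.
overall = {"R": 31, "F": 75, "M": 29, "W": 31, "U": 126}

# Feature data sets (PhTrain55 / PhTest55).
feature_sets = {
    "PhTrain55": {"R": 14, "F": 40, "M": 13, "W": 16, "U": 65},
    "PhTest55": {"R": 17, "F": 35, "M": 16, "W": 15, "U": 61},
}

# Raw FDP data sets (PhTrain195 / PhTest195 / events not used).
raw_sets = {
    "PhTrain195": {"R": 18, "F": 21, "M": 20, "W": 20, "U": 21},
    "PhTest195": {"R": 13, "F": 22, "M": 9, "W": 11, "U": 22},
    "NotUsed": {"R": 0, "F": 32, "M": 0, "W": 0, "U": 83},
}


def class_totals(partition):
    """Sum the per-class counts over every subset in a partition."""
    return {c: sum(s[c] for s in partition.values()) for c in CLASSES}


for name, partition in (("feature", feature_sets), ("raw", raw_sets)):
    totals = class_totals(partition)
    status = "matches" if totals == overall else "differs from"
    print(f"{name} partition {status} the overall distribution")
```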