I recently sat down with Leon Ginsburg, CEO of Sphere Software. We chatted about “How Artificial Intelligence Is Transforming Industrial Markets”. The original article is linked here, but to make it available to readers who may not use LinkedIn, I have included the text below.

Published on April 23, 2019
Leon Ginsburg
Artificial Intelligence (AI) is transforming the landscape of many industries and the industrial markets are no exception. At the end of March, Sphere Software sponsored a TechDebate in Chicago to discuss AI’s impact on labor markets. Adam McElhinney, chief of machine learning and AI strategy at Uptake Technologies, brought his years of experience and insights to the discussion.

After the TechDebate, I had the opportunity to talk further with Adam about the topic.

Let’s have part of our conversation within the context of Uptake’s business. Uptake serves industrial markets; what are the typical differences in operations twenty years ago compared to today?

ADAM: Uptake is an industrial AI company, and our forte is helping customers manage big machines that cost a lot of money when they fail or when they are not being utilized to their full potential. For example, we help railway companies manage their locomotives. If a locomotive fails on the tracks, it costs a lot of money.

Repair costs are high. But the bigger cost is the impact on operations because other locomotives can’t just drive around the failed unit. Everything is on hold until the unit is up, running, and out of the way. There is a good chance that this lapse in operation will rack up additional labor costs, or maybe the railway will be charged a penalty for a late shipment. The common thread of Uptake’s business is providing asset performance management (APM) for companies that operate big, expensive machinery whose failure can have significant costs.

Our technology also helps operate wind turbines for Berkshire Hathaway Energy. Its wind farm operators log in to our software to see wind turbine performance, the location of technicians, the turbine maintenance schedule, how much revenue each turbine is generating, and many other insights. But the secret sauce of our business is how we inject AI to make insights more powerful.

Twenty years ago, you mostly saw companies and users running machines to failure, then fixing them. Industrial enterprises then started doing preventative maintenance largely based on their expertise, intuition, or guidelines from the manufacturer.

Then industry started adopting what I call optimized preventative maintenance, where companies run optimizations and financial calculations to find the balance between performing maintenance tasks before they are truly needed, which adds unnecessary expense, and performing the maintenance too late, which leads to equipment failure.

Now we’re at the point where we can look at the underlying health of a machine using sensor data and make dynamic, real-time optimizations on the health of that machine.

Are some industry segments more open to applying AI than others?

ADAM: Some companies are much more open than others to applying AI, but we also see industry-wide differences. Companies that have higher margins, and thus large costs when there is downtime, and that treat operations as one of their core differentiators tend to be more open to AI implementation than other companies are.

For example, the oil and gas industry is rapidly adopting AI. It’s been highly sophisticated, using advanced software and mathematical modeling in exploration for decades. We’re now seeing the industry start to leverage AI for production and distribution as well as for refining management. Energy segments outside of oil and gas, like renewable and thermal energy, have also been open to adopting AI.

When you look across the industrial market, it’s hard to think of a company or sector that isn’t harmed by equipment failure, even when operating under tighter margins. Why are some sectors not as open to applying AI?

ADAM: AI implementation requires foundational data. I continue to be shocked by the number of industries and companies that still use paper-based records. Obviously, without large sets of digital data, there is nothing to fuel the AI.

Some companies don’t see operations as a differentiator; therefore, maximizing operations may not be a top priority for their time and monetary investments. I believe maximizing operational efficiencies is critical for most every sector and every company, and organizations will come to this conclusion when it is best for the health of their business.

Are there clear differentiations between companies that are adopting AI and machine learning, versus those that aren’t?

ADAM: Absolutely. At Uptake we work to quantify the exact value being delivered to our customers. Our machine learning algorithms operate on a closed loop that helps them get smarter over time; for instance, we can show our executive sponsor the number of value-creating events triggered by our software.

To learn more about rapidly evolving artificial intelligence and other technology advancements, watch for upcoming TechDebates being held around the world.

Our company, Sphere Software, is a sponsor and organizer of the TechDebates initiative and events globally.

Recently, I had the opportunity to be a guest on the Rob’s Reliability podcast with Robert Kalwarowsky. Besides the conversation being an absolute blast, Rob asked a bunch of great questions about artificial intelligence and how it can be applied to the field of reliability analysis and asset performance management.

The full podcast is here. I hope to soon get time to type up notes to share on the site. Stay tuned!

Back in 2012, I wrote a short post describing some of the issues I was having doing sampling in R using the function written by Ananda. Ananda actually commented on my post, directing me to an updated version of the code he has written, which is greatly improved.

However, I still feel that there is room for improvement. A common example: during the model building process, you often wish to split data into testing, training and validation data sets. This should be extremely easy and done in a standardized way.

Thus, I have set out to simplify this process. My goal is to write a function in R to split data frames that:

1. Allows the user to specify any number of splits and the size of each split (the splits need not all be the same size).

2. Allows the user to specify the names of the resulting split data sets.

3. Provides default values for the most common use case.

4. Just “works”. Ideally this means that the function is intuitive and can be used without requiring the user to read any documentation. This also means that it should have minimal errors, produce useful error messages and protect against unintended usage.

5. Is written in a readable and well-commented manner. This should facilitate debugging and extending functionality, even if this means performance is not 100% optimal.

I have written this code as part of my package that is in development called helpRFunctions, which is designed to make R programming as painless as possible.

The function takes just a few arguments:

1. df: The data frame to split.

2. pcts: Optional. The percentage of observations to put into each bucket.

3. set.names: Optional. What to name the resulting data sets. This must be the same length as the pcts vector.

4. seed: Optional. Define a seed to use for sampling. Defaults to NULL, which is just the normal random number generator in R.

The function then returns a list containing data frames named according to the set.names argument.
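As a rough illustration of that interface, here is a minimal sketch in Python rather than R (the helpRFunctions code itself is not reproduced here). The function name `split_data`, the 60/20/20 default split, and the default set names are assumptions made for the example, and `pcts` is taken to be fractions that sum to 1.

```python
import random


def split_data(rows, pcts=(0.6, 0.2, 0.2),
               set_names=("training", "testing", "validation"), seed=None):
    """Randomly partition rows into named subsets sized by pcts."""
    if len(pcts) != len(set_names):
        raise ValueError("set_names must be the same length as pcts")
    if any(p < 0 for p in pcts) or abs(sum(pcts) - 1.0) > 1e-9:
        raise ValueError("pcts must be non-negative and sum to 1")

    rng = random.Random(seed)  # seed=None falls back to system randomness
    shuffled = list(rows)
    rng.shuffle(shuffled)

    splits, start, cumulative = {}, 0, 0.0
    for i, (name, pct) in enumerate(zip(set_names, pcts)):
        cumulative += pct
        # The last bucket takes whatever remains, so no row is dropped to rounding.
        is_last = i == len(pcts) - 1
        end = len(shuffled) if is_last else round(cumulative * len(shuffled))
        splits[name] = shuffled[start:end]
        start = end
    return splits


parts = split_data(range(100), seed=1)
print({name: len(subset) for name, subset in parts.items()})
```

Returning a dictionary keyed by set name mirrors the named list described above; every input row lands in exactly one subset.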

At work, we have a mailing list where a riddle is announced every Friday, and participants are then invited to submit answers to the riddle on Tuesday.

A few weeks ago, there was an interesting riddle presented.

Riddle: On an 8 x 8 chessboard, define two squares to be neighbors if they share a common side. Some squares will have two neighbors, some will have three, and some will have four. Now suppose each square contains a number subject to the following condition: The number in a square equals the average of the numbers of all its neighbors. If the square with coordinates [1, 1] (i.e. a corner square) contains the number 10, then find (with proof) all possible values that the square with coordinates [8, 8] (i.e. opposite corner) can have?

After some consideration, I found an easy proof by contradiction. However, I am trying to learn a new programming language, specifically Mathematica. I thought that attempting to solve this riddle by developing a system of equations in Mathematica would be good practice.

The Mathematica code is located below and also posted on my Github account. By the way, the answer is 10.
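The same system-of-equations idea can also be sketched in Python. This is an illustration only, not a translation of the Mathematica code; the helper names are made up for the example, and exact rational arithmetic from the standard `fractions` module is assumed so that no rounding enters the result. The sketch writes one averaging equation per square, adds the condition that the corner square holds 10, and solves by Gauss-Jordan elimination.

```python
from fractions import Fraction

N = 8  # board size


def neighbors(r, c):
    """Squares that share a side with (r, c) on the N x N board."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return [(r + dr, c + dc) for dr, dc in steps
            if 0 <= r + dr < N and 0 <= c + dc < N]


def build_system():
    """Augmented matrix: one averaging equation per square plus x[0][0] = 10."""
    rows = []
    for r in range(N):
        for c in range(N):
            nbrs = neighbors(r, c)
            row = [Fraction(0)] * (N * N + 1)
            row[r * N + c] = Fraction(len(nbrs))  # k * x = sum of the k neighbors
            for nr, nc in nbrs:
                row[nr * N + nc] = Fraction(-1)
            rows.append(row)
    corner = [Fraction(0)] * (N * N + 1)
    corner[0], corner[-1] = Fraction(1), Fraction(10)
    rows.append(corner)
    return rows


def solve(rows):
    """Gauss-Jordan elimination; raises unless the solution exists and is unique."""
    n = len(rows[0]) - 1
    for col in range(n):
        pivot = next((i for i in range(col, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            raise ValueError("system does not have a unique solution")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]
        for i in range(len(rows)):
            if i != col and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[col])]
    if any(row[-1] != 0 for row in rows[n:]):
        raise ValueError("system is inconsistent")
    return [rows[i][-1] for i in range(n)]


values = solve(build_system())
print(values[N * N - 1])  # opposite corner: prints 10
```

Because the elimination finds a pivot in every column, the solution is unique: every square, including the opposite corner, must hold 10.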

Today I am delighted to be presenting at the BrightTalk Analytics Summit. I will be presenting on some new open-source tools that are available for doing analytics. In particular, I will be focusing on Python Scikit-Learn, PyStats, Orange, Julia and Octave.

However, when I ran this function on my data, I received an error that R ran out of memory. Therefore, I had to create my own stratified sampling function that would work for large data sets with many groups.

After some trial and error, the key turned out to be sorting based on the desired groups and then computing counts for those groups. The procedure is extremely fast, taking only .18 seconds on a large data set. I welcome any feedback on how to improve!

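stratified_sampling <- function(df, id, size) {
  # df is the data to sample from
  # id is the column to use for the groups to sample
  # size is the count you want to sample from each group

  # Order the data based on the groups
  df <- df[order(df[, id], decreasing = FALSE), ]
  # Assumed completion (per the description above): count each group's rows,
  # then sample up to `size` row offsets inside each contiguous block
  counts <- as.vector(table(df[, id]))
  starts <- cumsum(c(0, head(counts, -1)))
  pick <- function(i) starts[i] + sample.int(counts[i], min(size, counts[i]))
  df[unlist(lapply(seq_along(counts), pick)), ]
}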

I wanted to share some new code with you. The code below allows a user to specify a range of data and then the code will output the standardized values (mean=0 and standard deviation 1) for each of the columns. This can be a big time saver over Excel’s standardize function, which requires the user to input the mean and standard deviation and only standardizes one cell at a time. Also, this allows the user to specify specifically how they want the standard deviation calculated.
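The calculation itself can be sketched in a few lines of Python (an illustration only, not the code from this post). The function name and the `sample` flag are assumptions for the example; the flag switches between the sample (n-1) and population (n) standard deviation.

```python
import statistics


def standardize_columns(table, sample=True):
    """Return column-wise z-scores for a list of equal-length numeric rows."""
    spread_fn = statistics.stdev if sample else statistics.pstdev
    scaled_columns = []
    for column in zip(*table):  # transpose rows into columns
        mean = statistics.mean(column)
        spread = spread_fn(column)
        if spread == 0:
            raise ValueError("a constant column cannot be standardized")
        scaled_columns.append([(x - mean) / spread for x in column])
    return [list(row) for row in zip(*scaled_columns)]  # back to rows


data = [[1, 10], [2, 20], [3, 30]]
for row in standardize_columns(data):
    print(row)
```

Each output column has mean 0 and, under the chosen definition, standard deviation 1.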

Real estate is undoubtedly one of the most important components of the modern economy. In particular, owning a house has long typified the American dream, and a residential home is the largest asset for a majority of Americans. The housing market had swelled 12.5% from 1990 to 200, eventually totaling 119.6 million units. Along with this growth in the housing market has come an even greater growth in demand for the accurate assessment of housing prices.

In response to this demand, many websites now offer users the ability to get a free and quick estimate of their home’s value by inputting several variables such as the home’s location, size, and features. However, the accuracy of these estimates is unknown, and the formulas used for estimation are often kept proprietary. Despite this, the internet has allowed an unprecedented number of real estate transactions to be tracked, often with details regarding the home’s age, size, location, and features.
These data sets are currently being used in the world of financial economics, which has long been interested in real estate valuation. Researchers have employed a variety of statistical techniques in an attempt to accurately predict the values of properties. The bulk of the research has relied on multivariate regression to assign weights to the various features of the home. However, these techniques have known limitations, and new statistical techniques such as neural networks are being tested as replacements.
The accuracy of these neural networks in comparison to traditional multivariate methods is still in debate; however, research is showing a growing niche where neural network models are able to outperform traditional techniques.
II. Background

There are three main methods of valuing real estate. One method is the discounted cash flow method, which is primarily used in the valuation of commercial properties. This method calculates the present value of all the income that one expects to receive from the property. However, this method has several flaws. First, the discount rate used is highly subjective, and various complex statistical techniques have been developed for calculating it. Further, the discount rate assumes that interest rates will stay constant. Interest rates can be highly volatile, and even small changes in the interest rate can have large effects on the property valuation. Lastly, the discounted cash flow model encounters circular logic if the purchase of the property is to be funded with a loan: the value of the asset is needed to calculate the weighted average cost of capital (WACC), yet it is that value we seek to find.
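To make the sensitivity to the discount rate concrete, here is a minimal present-value sketch. The cash flows are hypothetical figures chosen for illustration, not data from this paper, and the function name is an assumption.

```python
def present_value(cash_flows, discount_rate):
    """Discount end-of-year cash flows (year 1, 2, ...) back to today."""
    return sum(cf / (1 + discount_rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


# Hypothetical property: 50,000 of net income per year for ten years,
# plus a 1,000,000 sale at the end of year ten.
flows = [50_000] * 9 + [1_050_000]

for rate in (0.06, 0.07, 0.08):
    print(f"discount rate {rate:.0%}: value {present_value(flows, rate):,.0f}")
```

Moving the rate by a single percentage point shifts the valuation noticeably, which is why the subjectivity of the discount rate matters so much.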

Another method for real estate valuation is the cost method. This method takes the value of the land as if it were vacant, plus the depreciated value of any structures occupying the land. However, these methods are very limited in their practicality, especially when applied to large-scale valuation of residential properties.

Undoubtedly the most popular technique for real estate valuation is the market comparison approach. In a traditional market comparison approach, a property’s value is determined by finding other recently sold properties with similar physical characteristics, called comparables, or ‘comps’ for short. Then, weights are applied to the various features of the house and adjusted to approximate the target house as closely as possible.

Several practical issues arise from this method. First of all, there may not be a large number of comparable properties that have been sold recently. This is particularly true if the property is very different from the average properties in the area, or has some unique characteristic that does not allow for comparison. In addition, by omitting many other sales from the analysis, the appraiser may be losing valuable information. In a similar vein, typically only a small number of comparable properties are available. This number may not be enough to assume a statistically normal distribution, and thus price estimates may be flawed.

Furthermore, the differences in the properties may be quite significant. There are five main differences that need to be accounted for in this method: physical, location, market conditions, terms of financing, and conditions of sale. For example, the prime rate, the rate that is largely responsible for the rate a homebuyer will pay for a loan, dropped 2.5% in the first six months of 2001. Thus any property sales that were more than 1 or 2 months old would need to be adjusted to reflect this new rate.

Lastly, the weighting assigned to each of the various comparable properties is often subject to scrutiny and debate. The accuracy of these weights is highly relevant to the fair pricing of the home.

In order to counteract these limitations, appraisers have begun to rely on statistical techniques such as multivariate regression. In a typical appraisal, the prices of a large number of sold properties are regressed against the characteristics that are seen as influencing the prices of the properties. The defining work on this is the book Real Estate Valuation Theory published by the Appraisal Institute and American Real Estate Society.

In more recent years, appraisers have begun to rely on geographic information systems (GIS) to assemble databases of properties that incorporate geographic information. Common geographic variables include distance to major metropolitan areas, crime rates, distance to public transportation, etc. An important early study by Wyatt incorporated other buildings, car parks, footpaths, boundary of the urban areas, tree preservation orders, open spaces, conservation areas, water features, road edges and centre-lines and railway lines and stations into his model of commercial real estate prices.

However, these multiple variable regressions (MVR) have several limitations. First, for small sample sizes, multivariate regressions have proven to be extremely poor estimators of prices. Therefore, in rural areas or for properties that do not have many properties suitable for comparison, neural networks (NN) hold great promise.

In addition, many characteristics of the house are highly inter-correlated. This leads to the possibility of multicollinearity. Further, the data used for the comps is often taken from different times. Thus, heteroskedasticity arises, further diminishing the accuracy of the estimates. Lastly, in an increasingly complex economy, assumptions of linearity seem to be less and less feasible.

In an effort to obtain more accurate sales predictions, some researchers have turned to neural networks. The results of using neural networks versus multivariate regressions are contradictory and scattered. Studies by researchers such as Tsukuda and Baba, Nguyen and Cripps, and Hanson all demonstrate effective ways of using neural networks for valuation. However, seemingly just as many researchers have found MVR superior to NN.

III. Explanation of the Neural Network System

To understand the neural network model, one must understand its motivation for creation. The neural network model is a statistical technique modeled off the way the brain processes data. A non-rigorous definition of neural networks is given by Gurney as follows:

A neural network is an interconnected assembly of simple processing elements, units or nodes, whose functionality is loosely based on the animal neuron. The processing ability of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns.

The goal of the method was to create a non-linear and non-parametric statistical technique. Nonparametric statistics are those that do not make any underlying assumptions, such as normality, about the population distribution. A major weakness of traditional parametric statistical techniques is that one must assume the underlying probability distribution. Non-parametric statistics are not bound by this constraint.

In order to better understand neural networks, it is helpful to have a little background knowledge on biological neurons, on which our model is based. For our purposes, biological neurons can be broken down into four simple parts. First are the dendrites. Dendrites are responsible for receiving incoming signals from other neurons. Their structure typically resembles the branches of a tree, allowing them to interact with many different neurons in the brain. The second element is the soma, or cell body. It is the largest part of the cell and is responsible for the cell’s vital functions. Next, the axon is a single trunk-like structure that extends from the soma. It serves to carry the impulse outward to the next part of the neuron, the pre-synaptic terminals. The pre-synaptic terminals branch out from the axon, connecting the neuron with other neurons and transmitting the impulse when applicable.

There are two types of impulses that the neurons transmit, inhibitory and excitatory. When a neuron receives an impulse in the dendrites, it is then transmitted to the cell body. If the strength of the impulse is greater than some value, it is said to be excitatory and the cell continues to transfer the impulse. Conversely, if the strength of the impulse is less than some value, the impulse is not transmitted and it is labeled as inhibitory.

The artificial neuron used for our computation follows a similar form. The first artificial neuron was the threshold logic unit (TLU), developed by McCulloch and Pitts. A neuron receives inputs and the inputs are weighted in some fashion. If the summed value of the weighted inputs exceeds a certain threshold level, the exiting impulse is given a value of one. If not, the impulse is given a value of zero and does not continue.
It is the arrangement of multiple neurons of this type that gives us a neural ‘network’. The arrangement of the neurons can take many forms, but in its simplest form it is a series of inputs, which converge onto a hidden layer, continuing on to an outer layer of neurons, which then produces the outputs.
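A threshold logic unit and a tiny network built from several of them can be sketched as follows. The weights and thresholds are hand-picked for illustration (they are not taken from this paper), and the function names are assumptions. A single unit can compute logical AND or OR; arranging three units in two layers computes XOR, which no single unit can.

```python
def tlu(inputs, weights, threshold):
    """Output 1 when the weighted sum of the inputs exceeds the threshold, else 0."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation > threshold else 0


def xor_network(a, b):
    """Two inputs -> hidden layer of two TLUs -> one output TLU."""
    hidden_or = tlu((a, b), weights=(1, 1), threshold=0.5)
    hidden_and = tlu((a, b), weights=(1, 1), threshold=1.5)
    # Fire when OR is on but AND is off.
    return tlu((hidden_or, hidden_and), weights=(1, -1), threshold=0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```

The hidden layer here plays the role described above: the inputs converge onto it, and its outputs feed the final neuron.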

The network described above is referred to as a feed forward network. However, these networks can take many shapes and have nodes that not only send signals forward, but may send signals laterally and in some cases backward. More details on various structures will be given later.

The weights given to each impulse by the neuron are an important source of their computing strength. These weights need not necessarily be static. Rather, the weights may be altered by a set of known data. One extremely useful process to alter these weights is called supervised learning. In this process, the series of known data is passed through the neural network and the weights are adjusted according to a learning rule, to provide the desired outputs. This process is typically repeated many times until the weights very accurately produce the desired outputs. It is then the hope that when the network receives data it has not seen before, that it will be able to correctly identify the pattern and produce the correct outputs.
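One simple learning rule of this kind is the perceptron rule, sketched below for a single threshold unit. It is not the training algorithm used by the package in this paper; the function names, learning rate, and epoch count are assumptions chosen for the example. Each known observation is passed through the unit, and the weights are nudged in proportion to the error between the desired and actual output.

```python
def predict(weights, bias, inputs):
    """Threshold unit: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


def train_perceptron(samples, epochs=20, rate=1.0):
    """Repeatedly pass the known data through the unit and adjust the weights."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0, or +1
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Known data: the logical AND pattern.
training_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(training_data)

for inputs, target in training_data:
    print(inputs, "target", target, "output", predict(weights, bias, inputs))
```

After a handful of passes the weights stop changing, because every training pattern is already classified correctly.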

As one may notice, neural networks are not a very strongly defined statistical technique, which gives them the flexibility to be applied to a very wide range of areas. Neural network models have been applied in diverse fields such as civil engineering, pavement crack analysis, soil and water retention and many other seemingly unrelated areas. The diversity of the applications speaks to the utility of the method.

IV. Data Set
For this paper, data on 133 real estate transactions, covering the period from 3/20/2006 to 2/28/2007, was gathered from the website Zillow.com. Zillow is a free website designed to give consumers a quick estimate of their homes’ values. In order to calculate these estimates, Zillow maintains a database of all home transactions for a particular region, going back to different periods, depending on the location. The transactions recorded are from the northwest suburbs of Chicago, primarily in the 60010, 60047, and 60042 zip codes. In addition, the database lists the address of the home, the number of bedrooms, the number of bathrooms, the size of the home in square feet, the size of the lot in square feet, the price the home sold for, the date the home was sold on, and the age of the home. The variables for the size of the home and the size of the lot, both in square feet, have been divided by 1000 to better condition the numbers to be squared and cubed. Similarly, the variable for the age of the house has been divided by 10. These will be the variables used in our model to predict housing prices.
The data was randomly broken down into 79 “training” observations and 54 “validation” observations. This ratio of training observations to validation observations is consistent with previous literature. Each model will be examined for its ability to predict the 54 validation housing prices.

Variable Name    Description
Price            Listed in US Dollars
BD               Number of Bedrooms
BA               Number of Bathrooms
SIZE             (Square Footage of the House)/1000
LOT              (Square Footage of the Lot)/1000
AGE              (Number of years since the house was built)/10
LN(VAR)          The natural log of the specified variable
SQ(VAR)          The specified variable squared
CU(VAR)          The specified variable cubed
Q2               Dummy Variable indicating whether or not the house was sold in the second quarter of 2006
Q3               Dummy Variable indicating whether or not the house was sold in the third quarter of 2006
Q4               Dummy Variable indicating whether or not the house was sold in the fourth quarter of 2006
Q5               Dummy Variable indicating whether or not the house was sold in the first quarter of 2007
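The variable definitions above can be sketched as a small feature-building routine. This is an illustration under stated assumptions: the function names and input layout are hypothetical, the age is supplied directly in years, and the LN, SQ, and CU transforms are generated for SIZE, LOT, and AGE alike even though each regression uses only some of them.

```python
import math
from datetime import date


def quarter_dummies(sale_date):
    """Q2-Q4 flag quarters of 2006 and Q5 flags Q1 2007; Q1 2006 is the baseline."""
    quarter = (sale_date.month - 1) // 3 + 1
    index = (sale_date.year - 2006) * 4 + quarter  # 1 = Q1 2006, ..., 5 = Q1 2007
    return {f"Q{k}": int(index == k) for k in (2, 3, 4, 5)}


def build_features(house_sqft, lot_sqft, age_years, sale_date):
    """Scale the raw measurements and derive the transformed variables."""
    base = {"SIZE": house_sqft / 1000, "LOT": lot_sqft / 1000, "AGE": age_years / 10}
    features = dict(base)
    for name, value in base.items():
        features[f"SQ({name})"] = value ** 2
        features[f"CU({name})"] = value ** 3
        if value > 0:  # the log is undefined at zero, e.g. for a brand-new house
            features[f"LN({name})"] = math.log(value)
    features.update(quarter_dummies(sale_date))
    return features


example = build_features(2500, 12000, 30, date(2006, 8, 15))
for name, value in example.items():
    print(name, value)
```

Dividing by 1000 and 10 before squaring and cubing keeps the transformed values in a modest range, which is the conditioning step described in the data section.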

As the graphs show, there is a lot of variability amongst the different variables and their effects on price. This is most likely due in large part to the fact that our model does not control for any geographic variables. Anyone who knows the Chicago area can tell you that the closer one gets to the city, the higher housing prices are. Despite this, these complicated relationships are exactly what neural networks are designed to predict, and thus this high variability should be a good testing ground for a non-linear model.
V. Multivariate Regression Models
In accordance with previous literature, several multivariate regression models were tested. Previous literature has hypothesized that real estate data may follow a semi-log, or a log-log pattern. In addition, research has shown that age, property size, and house size may have a squared or even cubic relationship.

Lastly, there is a large possibility for heteroskedasticity to occur because the data is pulled from different time periods; thus all the models are repeated using White’s model for heteroskedasticity. The “Un-Whitened” model formats are given below.

The model was then back tested using the 54 validation observations. The success or failure of the model was based upon its ability to achieve a low forecasting error.

Forecasting Error = |Actual Housing Price - Predicted Housing Price| / Actual Housing Price
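As a sketch of how this measure can be computed over the validation set (the function names and the two example prices are made up for illustration):

```python
def forecasting_error(actual, predicted):
    """Absolute prediction error as a fraction of the actual price."""
    return abs(actual - predicted) / actual


def mean_forecasting_error(actual_prices, predicted_prices):
    """Average the forecasting error over all validation observations."""
    errors = [forecasting_error(a, p) for a, p in zip(actual_prices, predicted_prices)]
    return sum(errors) / len(errors)


# Two hypothetical validation homes.
actual = [200_000, 100_000]
predicted = [150_000, 110_000]
print(mean_forecasting_error(actual, predicted))
```

A mean forecasting error of 0.25 therefore means the predictions miss the actual sale price by 25% on average.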

Mean Forecasting Error of the Multiple Variable Regression Models

Linear    Semi-Log  Log-Log   Age       Lot       Size
0.254316  0.235697  0.259342  0.258416  0.252312  0.250122

Age,Lot,Size  Age       Lot       Size      Age,Lot,Size
0.254224      0.261117  0.253886  0.249586  0.260392

It is somewhat surprising that the Semi-Log model has the lowest mean forecasting error. In order to predict with a semi-log model, one must take the exponential of the predicted prices. This is a non-linear transformation and therefore introduces bias. However, in this case it appears that this bias is overcome by the increased predictive power of the model.
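To see where that bias comes from, here is a small simulated illustration (synthetic numbers, not the housing data). If the errors of the log-price regression average zero, exponentiating the fitted value implicitly uses exp of the average error, while the average price depends on the average of exp(error), and the latter is never smaller. The error spread of 0.3 and the seed are arbitrary assumptions.

```python
import math
import random

rng = random.Random(0)  # arbitrary seed for repeatability
log_errors = [rng.gauss(0.0, 0.3) for _ in range(10_000)]  # assumed error spread

mean_log_error = sum(log_errors) / len(log_errors)
naive_factor = math.exp(mean_log_error)  # what exp(prediction) implicitly uses
true_factor = sum(math.exp(e) for e in log_errors) / len(log_errors)

print(f"exp(mean error)  = {naive_factor:.4f}")
print(f"mean(exp(error)) = {true_factor:.4f}")
```

The gap between the two factors is the retransformation bias: taking the exponential of a semi-log prediction tends to understate the expected price.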

The lack of variability between the 11 models is striking. The difference in mean forecasting error between the highest and the lowest models is only .025. However, given a larger sample size, one would expect this to increase as more subtle relationships between the independent variables are teased out.

The distribution of the forecasting errors for the semi-log model is shown below. The model does an impressive job: 32 of the observations have a forecasting error of less than .2 and 46 of the observations have a forecasting error of less than .40.

VI. Neural Network Model Explanation
In order to contrast the performance of the two statistical techniques, a traditional feed forward neural network was utilized. In a feed forward network, the data is separated into two sets, the training set and the validation set. As stated previously, our data was randomly broken down into a training set of 79 observations and a validation set of 54 observations. Then, the network is ‘trained’, meaning the training data is passed through the network and the initial parameters are adjusted in some manner, for a specified number of iterations. Then the validation set is used to check the accuracy of the model specified by the training. In this analysis, a sigmoid activation function was used for its simplicity and ease of differentiation.

Sigmoid function
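Assuming the sigmoid here is the standard logistic function, f(x) = 1 / (1 + e^(-x)), its ease of differentiation comes from the identity f'(x) = f(x)(1 - f(x)). A minimal sketch (the function names are illustrative):

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_derivative(x):
    """The derivative reuses the function value: f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 4), round(sigmoid_derivative(x), 4))
```

Because the derivative needs only the already-computed output of the neuron, weight updates during training are cheap to evaluate.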

The number of hidden layers, the number of neurons per hidden layer, and the number of training iterations to conduct are open to debate. Previous research on neural networks for real estate prediction has shown that one hidden layer is typically sufficient for accurate assessment. Accordingly, one hidden layer is used in our model. There are a few ‘rules of thumb’ for the other parameters, but due to the low number of observations in our sample, numerous combinations were tested.

The neural network package utilized for this analysis automatically sets the initial values to random numbers within the data range. This results in quicker training than a pure random initialization.

Neural Network Model Specifications

Model Number  Hidden Layers  Neurons Per Hidden Layer
1             1              1
2             1              2
3             1              3
4             1              4
5             1              5
6             1              6
7             1              7
8             1              8
9             1              9
10            1              10
11            1              15
12            1              20
13            1              25

These models were all run with 5, 10, 15, 20, 25, 30, 50 and 75 training iterations. One of the difficulties of the neural network model is estimating how many training iterations to use. If one underestimates the number needed, the results are not as accurate as they could be. Conversely, if one uses too many training iterations, the function may be over-fitted to the data. In turn, this over-fitting causes a loss of accuracy when new data is run through the model.

To compare the accuracy of the models, the validation data was run through each model and the mean forecasting error was computed.

Mean Forecasting Error of Various Neural Network Models

Model = model number; Neurons = number of neurons on one hidden layer; N iter = N training iterations.

Model  Neurons  5 iter    10 iter   15 iter   20 iter   25 iter   30 iter   50 iter   75 iter
1      1        0.277338  0.277165  0.277342  0.277027  0.277411  0.276779  0.2761    0.2761
2      2        0.283716  0.277502  0.280274  0.285198  0.28401   0.282454  0.302764  0.301875
3      3        0.267825  0.273145  0.300541  0.304717  0.317211  0.329043  0.35199   0.350984
4      4        0.293282  0.343108  0.338736  0.360779  0.371395  0.406882  0.429371  0.425458
5      5        0.319706  0.382284  0.385009  0.380813  0.376205  0.382051  0.384963  0.388435
6      6        2.06627   9.448     1963.7    2052.98   2361.01   399139    3436.7    4133.77
7      7        0.312132  0.287282  0.299599  0.309019  0.330474  0.349585  0.493541  0.635953
8      8        0.352379  0.393616  0.635656  0.959402  1.59276   0.538184  1.07871   6.10403
9      9        1.17289   1.57346   1.39654   93.6565   749.619   638.547   256.935   33.1817
10     10       0.297179  0.328543  0.354833  0.412025  0.461227  0.48286   0.703548  0.918779
11     15       0.33348   0.460503  0.489845  0.788403  0.917985  1.20471   1.13428   1.17958
12     20       0.417438  0.530489  0.692084  0.883574  1.16352   1.33265   1.39361   1.48593
13     25       0.657161  2.37524   3.20335   4.61429   5.74316   7.12753   11.5672   13.0967

Forecasting error (FE) ranges: FE < 0.3; 0.3 < FE < 0.6; 0.6 < FE < 2; 2 < FE < 10; FE > 10

As one can see, the neural network model presented here has the best results with a relatively low number of neurons per hidden layer and low training iterations. In fact, the model with the lowest mean forecasting error has three neurons per hidden layer and only 5 training iterations. This is not entirely surprising, due to the fact that our entire data set was only 133 observations, with nine explanatory variables. If one were to add more observations and more explanatory variables, then the number of training iterations and the number of neurons per hidden layer, and possibly even the number of hidden layers, would likely need to be increased. Interestingly, there is very little variability in mean forecasting error between models 1-5. Despite the parameters ranging from 1-5 neurons and 5-75 training iterations, the mean forecasting error varies by a scant .16.
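The selection described above amounts to a small grid search over the results table. A minimal sketch, using the table's figures for models 1-5 only (the dictionary layout and variable names are just for illustration):

```python
ITERATIONS = (5, 10, 15, 20, 25, 30, 50, 75)

# Mean forecasting errors for models 1-5, copied from the results table.
# Keys are model numbers, which for these models equal the neuron count.
ERRORS = {
    1: (0.277338, 0.277165, 0.277342, 0.277027, 0.277411, 0.276779, 0.2761, 0.2761),
    2: (0.283716, 0.277502, 0.280274, 0.285198, 0.28401, 0.282454, 0.302764, 0.301875),
    3: (0.267825, 0.273145, 0.300541, 0.304717, 0.317211, 0.329043, 0.35199, 0.350984),
    4: (0.293282, 0.343108, 0.338736, 0.360779, 0.371395, 0.406882, 0.429371, 0.425458),
    5: (0.319706, 0.382284, 0.385009, 0.380813, 0.376205, 0.382051, 0.384963, 0.388435),
}

# Flatten the table into (error, model, iterations) so min() compares errors first.
grid = [(error, model, iters)
        for model, row in ERRORS.items()
        for iters, error in zip(ITERATIONS, row)]

best_error, best_model, best_iters = min(grid)
spread = max(grid)[0] - best_error

print(best_model, best_iters, best_error)  # 3 5 0.267825
print(round(spread, 2))                    # 0.16
```

Selecting on validation error in this way is what keeps the choice honest: the training error alone would keep rewarding more neurons and more iterations.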

For models 6-13, there appears to be massive over-training taking place, resulting in ridiculous price predictions. The absurdity of these estimates underscores the importance of choosing the correct parameters when constructing a neural network. Interestingly, there are some shockingly precise estimates that occur within this range. In particular, model 10 with 5 training iterations and model 7 with 10 and 15 training iterations have mean forecasting errors below .30. This is most likely due to the randomness utilized by the neural network in selecting the initial training parameters and is not repeatable. Again, this serves as a cautionary example of the utilization of neural networks. One must be concerned not only with the neural network predictions for a particular data set, but also with whether the network's performance could be repeated in different circumstances.

The chart below shows the distribution of the forecasting errors for the model with the lowest mean forecasting error, model 13. As is apparent in the chart, the model is quite effective for the majority of the predictions. However, several outliers are undoubtedly pulling the mean forecasting error much higher. In fact, approximately 28 of the predicted prices have a forecasting error of less than 0.20, and an impressive 46 of the predicted prices have a forecasting error of less than 0.40. This contrasts with the drastic increases in forecasting error for predictions 50 and onward. Once again, this high variability is most likely a result of the geographic regions from which the data was taken, but it nonetheless illustrates how a neural network handles predictions for data with high variability.
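
As a rough illustration of how a few outliers can pull the mean forecasting error well above the typical error, the sketch below summarizes a small set of errors. The definition used, |predicted − actual| / actual, is an assumption, and the prices are made-up numbers rather than values from the study.

```python
from statistics import mean, median

def forecasting_errors(actual, predicted):
    """Absolute relative error per observation (assumed definition)."""
    return [abs(p - a) / a for a, p in zip(actual, predicted)]

def summarize(errors, thresholds=(0.2, 0.4)):
    """Mean, median, and counts of errors below each threshold."""
    ordered = sorted(errors)
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "counts_below": {t: sum(e < t for e in ordered) for t in thresholds},
    }

# Hypothetical prices: four close predictions and one large miss.
actual = [100.0, 200.0, 150.0, 80.0, 120.0]
predicted = [110.0, 190.0, 150.0, 84.0, 360.0]

errors = forecasting_errors(actual, predicted)
summary = summarize(errors)
print(summary)
```

Here four of the five errors fall below 0.2, yet the single large miss puts the mean far above the median, which is the pattern described for the outliers above.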

VII. Comparison of the Two Models
In our analysis, it seems clear that the Semi-log MVREG model is the most accurate model for predicting real estate prices. The difference in mean forecasting error between the best multivariable regression model and the best neural network model was approximately 0.032. Additionally, the chart below shows the distribution of forecasting errors for the two models.
The Semi-Log model beats NN Model 13 in almost every observation. In the low to middle spectrum of the forecasting error, however, NN Model 13 competes closely with the Semi-Log model, even showing superiority in some cases over the range 37-49. The largest difference between the two models is that the Semi-Log model seems to do a better job of dealing with outliers than NN Model 13. This is undoubtedly where a large portion of the difference in mean forecasting error is created.
Additionally, the computational requirements of the neural network models further hinder their practicality. Despite the small sample size used in this analysis, it took a 3.2 GHz Pentium IV with Hyper-Threading approximately 22 minutes to run the 104 neural network models, or about 0.21 minutes per model. By contrast, the 11 multivariate regression models ran almost instantly.

The ambiguity of a neural network model also presents a problem. The difference between the highest and lowest forecasting error among the multivariate regression models was 0.025. The difference among the neural network models was over 11. The neural network models were admittedly grossly overtrained, but there is no comparable potential for error in a multivariate regression model.

The greater potential for error in a neural network model stems from the fact that the operator must specify many more parameters than with the regression model. The table below summarizes the decisions that must be made to construct the two different models.

Multivariable Regression Model: Data Transformations (e.g., log, semi-log, squared, etc.); Significance Level

Neural Network Model: Model Structure (Feed-Forward, Radial Basis, Hopfield, etc.); Hidden Layers; Number of Neurons per Hidden Layer; Number of Training Iterations; Activation Function; Initial Training Values

Although there is research suggesting values for these parameters in the neural network model, there are no hard rules, and experimentation is required. One may also ask whether, once the appropriate parameters of a neural network model are found, those parameters will remain optimal with different data. If a large-scale real estate company were deciding whether to purchase several properties, the NN model it had previously relied on might not give accurate price predictions for this new set of possible purchases. In contrast, the parameters for the multivariable regression model are far fewer and have long-established ranges (e.g., a p<0.05 significance level for variables).

Another concern with neural network models is variation among neural network programs. Because they are very computationally intensive, neural network models tend to be particularly sensitive to programming intricacies. Every neural network program comes with a large manual that details the construction of the software, and studies have shown that different programs can produce different results from the same inputs.

In contrast, the reliability of most regression software is well known. Results are much less sensitive to differences between programs because fewer computations are required. Additionally, the software is available to a wider range of users, particularly as a result of the data analysis pack that is embedded in all new versions of Microsoft Excel. Neural network software must be purchased separately, often at significant cost.

VIII. Conclusion

Our data set showed a neural network model to be inferior to a multivariable regression. There is greater potential for error due to the increased number of user-specified parameters. Additionally, the repeatability of the results among various computer programs is disputable. Lastly, the computational requirements make the NN models take longer to run than the MVRs. However, there is still much work to be done in the area of neural networks. Neural network models should be studied using much larger data sets than the one used in this study. Additionally, rules for establishing the parameters of the neural network model need to be created and rigorously tested. Further, the performance of neural networks with various levels of sufficiency of variables needs to be evaluated. Lastly, our study emphasizes the fact that whatever prediction model is used, there must be variables to account for geographic differences.

That being said, the regression model was first contemplated in 1885 and is still being refined today. In contrast, the neural network model as a computational tool came into being in 1949. Thus, it is still an infant statistical technique, in need of much more refinement before it finds large-scale application. Given the ever-increasing complexity of the problems being studied by researchers and the exponential increases in computing power and technology, it is unlikely that neural networks will cease to be a promising area of study.

The complex nature of real estate valuation will certainly provide a plethora of applications for NN models. Once the technique evolves from its current state as something of an art form into a well-defined statistical technique, we will see much more accurate real estate valuations.

Many thanks to Joris Claassen who provided feedback on how to better optimize this code. Please find the updated code below.

Hey guys,

As I continue my quest to become a master VBA programmer, I would like to share a piece of code with you that I recently constructed. This code was the result of Excel Solver not being able to generate a satisfactory solution to my optimization problem. Thus, I wanted to run a sensitivity analysis to see how various initial values would affect my target cell. This code allows the user to specify a target cell, a set of change cells, a range of lower bounds (inclusive) and a range of upper bounds (inclusive), as well as a precision level. The code then tests, one by one, the sensitivity of the target cell to each of your change cells and generates graphs demonstrating the output. I would appreciate any feedback on the code!