Name: A random sample of Wake County, North Carolina residential real estate plots
Type: Random Sample
Size: N = 100, 11 variables
Descriptive Abstract:
The information for this data set was taken from a Wake County, North Carolina real estate database.
Wake County is home to Raleigh, the capital of North Carolina, and to Cary. These cities are the
fifteenth and eighth fastest growing cities in the USA, respectively, helping Wake County become
the ninth fastest growing county in the country. Wake County's population has grown 31.18%
since 2000 and currently stands at approximately 823,345 residents.
This data set includes 100 randomly selected residential properties in the Wake County registry,
identified by their real estate ID numbers. For each selected property, 11 variables are recorded.
These variables include year built, square feet, assessed land value, and zip code, among others.
Sources:
Wake County, via http://services.wakegov.com/realestate/, on 3-25-08
Variable Descriptions:
ID # - the county-given identification number for the selected plot
Year Built - the listed year in which the structure was built (by year)
Sq. Ft. - the area of the floor plan (in square feet)
Story - how many stories the structure has (in stories)
Acres - how many acres are included in the plot (in acres)
No. Baths - the number of bathrooms at the residence (in bathrooms)
Fireplaces - the number of fireplaces in the residence (in fireplaces)
Total $ - the total assessed value of the property (in dollars)
Land $ - the assessed value of the land (in dollars)
Building $ - the assessed value of the building (in dollars)
Zip - the zip code of the property
Empty cells represent a value not included in the property record
Story Behind the Data:
With Wake County nationally ranked for its growth in recent years, large databases of public
data on its properties are becoming more readily available. Dr. Woodard uses these databases in
one of the courses he teaches, through a CAUSEweb.org activity, because they supply many
variables, such as those listed above, that are suitable for correlation analysis. This data set
was collected as a tool for showing results and comparing them with students' data sets collected
in the same manner.

Special Notes:
This data set was not compiled from the first 100 randomly generated real estate identification
numbers. Approximately 140 numbers were tried in order to obtain this set of 100; the numbers
not included either belonged to non-residential plots or did not correspond to an existing record.
The real estate ID numbers, which ranged between approximately 1 and 200000, were randomly
generated using Microsoft Excel. All the data were found on the Wake County website and were not
altered in any way.
Dr. Woodard has posted an activity on CAUSEweb.org in which students collect their
own version of this data set. A PDF version of this activity can be found at
http://www.causeweb.org/repository/Realestate/Realestate.pdf
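The sampling procedure described in these notes can be sketched in code for students who want to
reproduce it outside of Excel. This is only a minimal sketch under stated assumptions: the original
IDs were generated in Microsoft Excel, not Python, and the function name draw_sample, the
is_residential callback (a stand-in for looking up each ID by hand on the county website), and the
choice to skip repeated IDs are all hypothetical.

```python
import random

def draw_sample(is_residential, target=100, low=1, high=200000, seed=None):
    """Draw random real estate IDs until `target` usable records are found.

    `is_residential` is a hypothetical callback that stands in for the manual
    lookup: it should return True only when the ID exists and refers to a
    residential plot. Skipping repeated IDs is an assumption of this sketch.
    """
    rng = random.Random(seed)
    accepted = []
    tried = set()
    while len(accepted) < target:
        candidate = rng.randint(low, high)  # both endpoints are included
        if candidate in tried:
            continue  # do not look up the same ID twice
        tried.add(candidate)
        if is_residential(candidate):
            accepted.append(candidate)
    # The second value shows how many IDs had to be tried in total.
    return accepted, len(tried)
```

The second return value plays the role of the "approximately 140 numbers" mentioned above: it
counts every ID that was tried, including the ones that were rejected.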
Pedagogical Notes:
The most prominent statistical feature of this data set is the presence of a natural outlier,
the property with real estate ID number 78570. This property is an outlier in two ways
that can easily be seen graphically, which helps students visualize the effect an
outlier has on regression lines. It includes 39.38 acres, while no other entry has more than 2 acres.
This acreage pushes its land value and total value to over 4.75 million dollars,
far larger than the values of the other plots. Students can use this outlier to examine the
impact of an outlier on regression and on correlation. Students can also be asked to identify
the reason or reasons why this entry is an outlier.
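The effect of a single outlier on correlation and on the regression slope can be illustrated with a
short computation. The sketch below is not part of the original activity. The rows in toy_rows are
made-up numbers, NOT values from the data set; only the ID 78570 and its 39.38 acres come from the
description above, and the dollar value attached to it is a placeholder.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def with_and_without(rows, outlier_id):
    """rows are (id, acres, total_value) tuples; compare fits with and without one ID."""
    trimmed = [row for row in rows if row[0] != outlier_id]
    results = {}
    for label, subset in (("all", rows), ("trimmed", trimmed)):
        xs = [row[1] for row in subset]
        ys = [row[2] for row in subset]
        slope, _ = least_squares_line(xs, ys)
        results[label] = {"r": pearson_r(xs, ys), "slope": slope}
    return results

# HYPOTHETICAL rows for illustration only: (ID, acres, total value in dollars).
toy_rows = [
    (1, 0.20, 150000),
    (2, 0.35, 310000),
    (3, 0.50, 180000),
    (4, 0.90, 260000),
    (5, 1.50, 220000),
    (78570, 39.38, 4800000),  # acreage from the description; dollar value is a placeholder
]

summary = with_and_without(toy_rows, outlier_id=78570)
for label, stats in summary.items():
    print(label, round(stats["r"], 3), round(stats["slope"]))
```

With rows like these, one extreme point produces a correlation close to 1 even though the
remaining points show only a weak relationship, which is the pattern students are asked to look
for in the real data.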
Simple linear regression can be used to determine which of the variables are good predictors
of total value. Students can be asked to graph each variable against total value; for example,
they can graph square feet versus total value, examine the correlation coefficient and the
fitted regression equation, and compare them with those for the other variables. Multiple
regression can be used to investigate which sets of variables are good predictors of total value;
for example, Year Built, Sq. Ft. and Land $ do quite well when the million dollar homes are removed.
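A multiple regression of this kind could be sketched as follows. This is a hedged illustration,
not the method used in the course: the function names fit_ols and drop_million_dollar_homes are
hypothetical, the rows are assumed to be dictionaries keyed by the variable names listed above,
and the cutoff of 1,000,000 on Total $ is an assumed reading of "million dollar homes". The fit
solves the normal equations directly, which is adequate for a small teaching example; a
statistics package would be the more robust choice for real work.

```python
def drop_million_dollar_homes(rows, threshold=1_000_000):
    """Keep only rows whose 'Total $' is below the threshold (an assumed cutoff)."""
    return [row for row in rows if row["Total $"] < threshold]

def fit_ols(rows, predictors, response):
    """Least-squares fit of response on predictors via the normal equations.

    Returns (coefficients, r_squared); coefficients[0] is the intercept and
    the rest follow the order of `predictors`.
    """
    X = [[1.0] + [float(row[p]) for p in predictors] for row in rows]
    y = [float(row[response]) for row in rows]
    k = len(predictors) + 1

    # Augmented matrix [X'X | X'y] of the normal equations.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[col])]
    coefficients = [A[i][k] / A[i][i] for i in range(k)]

    # R-squared compares residual variation with total variation in y.
    fitted = [sum(c * v for c, v in zip(coefficients, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefficients, 1.0 - ss_res / ss_tot
```

A call such as fit_ols(drop_million_dollar_homes(rows), ["Year Built", "Sq. Ft.", "Land $"],
"Total $") would then return the fitted coefficients and R-squared for the model mentioned above,
assuming rows with empty cells have already been set aside.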
References:
http://services.wakegov.com/realestate/, on 11-2-08
Submitted By:
Dr. Roger Woodard
Professor/Head of North Carolina State University Undergraduate Dept.
Jason Leone
NCSU Junior in Statistics
jtleone@ncsu.edu