CS 251: Lab #5

Fitting Models

The goal of this lab is to start the process of adding a new data
analysis capability to your GUI. In particular, you will be
implementing simple linear regression of two variables--one
independent and one dependent--and plotting the results.

Tasks

In the lab, your goal should be to execute a linear regression on two
variables and then plot the result on the screen.

Give yourself a new working directory and copy your display, viewing,
analysis and data python files into it. It's best to start with copies
and modify them from there.

In your Display class, add fields for holding (a) the graphical
objects associated with a linear regression (i.e. one or more tk Line
objects), and (b) the endpoints of the regression line in normalized
data space (e.g. a numpy matrix or a 2-D list). You can initialize the
first field as an empty list and the endpoints field as None.

In your Display class, add a menu item to the Command menu that calls
the handleLinearRegression function, which you will create in the next
step.

In your Display class, create a method
(e.g. handleLinearRegression). The method should let the
user select the variables to fit and then display them on the main
screen. It should have the following steps, each of which you may
want to do in a separate function.

Create a dialog class that lets the user select
an independent (x) variable and a dependent (y) variable. If you
want to let the user also pick variables for color and size, that
is up to you. The dialog window needs to return at least two
headers from your numeric data: the independent and dependent
variables for analysis. If the user selects Cancel, the process
should terminate and the existing display should not change.

Clear the existing points from the window.

Clear any existing data fits or models from the
window. This should delete any objects in your linear regression
objects list.

Reset the view to the default position.

Update the axes.

Call a buildLinearRegression function that
creates the canvas line object to show the linear regression fit
graphically.

Start by creating the dialog window and make sure it returns two
headers. Then write the buildLinearRegression function (see next step
for details), then go back and deal with clearing the points of any
existing plots, clearing any prior linear regression fit, and
resetting the view.
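
The order of the steps above can be sketched as follows. This is only
a control-flow outline, not a full Display class: the class name, the
objects field, clear_items, reset_view, update_axes, and the
choose_variables callback are all hypothetical stand-ins for whatever
your own code uses. The only thing assumed about the canvas is a
tk-style delete(item_id) method.

```python
class RegressionFlowSketch:
    """Control-flow sketch only; all names here are illustrative."""

    def __init__(self, canvas):
        self.canvas = canvas              # anything with a tk-style delete(item_id)
        self.objects = []                 # hypothetical: canvas ids of data points
        self.regression_lines = []        # canvas ids of regression line objects
        self.regression_endpoints = None  # endpoints in normalized data space

    def clear_items(self, items):
        """Delete each canvas item in the list, then empty the list in place."""
        for item in items:
            self.canvas.delete(item)
        del items[:]

    def handleLinearRegression(self, choose_variables):
        """choose_variables() returns (x_header, y_header), or None on Cancel."""
        selection = choose_variables()
        if selection is None:
            return False                  # Cancel: leave the display untouched

        x_header, y_header = selection
        self.clear_items(self.objects)           # clear the existing points
        self.clear_items(self.regression_lines)  # clear any prior fit
        self.regression_endpoints = None
        self.reset_view()
        self.update_axes()
        self.buildLinearRegression(x_header, y_header)
        return True

    # Placeholders standing in for methods your Display class provides.
    def reset_view(self):
        pass

    def update_axes(self):
        pass

    def buildLinearRegression(self, x_header, y_header):
        pass
```

Checking for Cancel before anything is cleared is what keeps the
existing display unchanged when the user backs out of the dialog.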

Create the buildLinearRegression function. This function should do the
following.

Extract the two columns selected by the user from the Data
object. Make the independent variable the X column and the dependent
variable the Y column. Normalize the columns separately, using the
normalization function from your analysis library.

Add a third column of zeros to the matrix. Use np.zeros and np.hstack.

Add a fourth column of ones to the matrix. You need to store this
matrix in your self.datapts field, or whatever field you used to
store the data in your buildPoints function from last week.
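
As a rough illustration of the matrix-building steps, the sketch below
turns an N x 2 array of (x, y) values into an N x 4 array of
homogeneous points. The normalize_columns helper is only a stand-in
for the function in your analysis library, and it assumes min-max
normalization of each column to the range [0, 1]. The function names
and sample values are made up for the example.

```python
import numpy as np

def normalize_columns(columns):
    """Min-max normalize each column separately to [0, 1] (stand-in helper)."""
    columns = np.asarray(columns, dtype=float)
    mins = columns.min(axis=0)
    ranges = columns.max(axis=0) - mins
    ranges[ranges == 0] = 1.0  # avoid dividing by zero for constant columns
    return (columns - mins) / ranges

def build_homogeneous_points(xy):
    """Return an N x 4 array with columns [x, y, 0, 1]."""
    normalized = normalize_columns(xy)
    n = normalized.shape[0]
    zeros = np.zeros((n, 1))  # third column: z = 0 for a 2-D plot
    ones = np.ones((n, 1))    # fourth column: homogeneous coordinate
    return np.hstack((normalized, zeros, ones))

# Made-up sample data: three (x, y) rows.
raw = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
pts = build_homogeneous_points(raw)
```

In your own code the result would be stored in self.datapts (or the
equivalent field) rather than in a local variable.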

Build the vtm, multiply it by the data points, and then create the
ovals to plot the data on the screen. This should make a 2-D plot
of the two variables, with the independent variable along the
x-axis. At this point, you should be able to test your function to
see if it makes the 2D data plot. If you did it right, the
translations, rotations, and scales should all still work as
expected.

Use the scipy.stats.linregress (import scipy.stats) function to
calculate the linear regression of the independent and dependent
variables. Note, you will need to get the unnormalized data from
your Data object in order for this to work properly. The linear
regression must occur in original data space, not the normalized
data space. Store all of the outputs of the linregress function.
See the linregress documentation for the full list of return values.
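
A minimal sketch of the call is shown below. It assumes the
unnormalized columns may come back from your Data object as N x 1
arrays or matrices, so they are flattened to 1-D before being passed
to linregress. The sample values and the as_1d helper are invented
for illustration.

```python
import numpy as np
import scipy.stats

def as_1d(column):
    """Flatten an N x 1 column (array or matrix) into a 1-D float array."""
    return np.asarray(column, dtype=float).ravel()

# Made-up unnormalized data, shaped like N x 1 columns.
x_col = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_col = np.array([[1.1], [2.9], [5.2], [6.8], [9.0]])

# linregress returns slope, intercept, r-value, p-value, and the
# standard error of the slope; keep all of them.
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(
    as_1d(x_col), as_1d(y_col))
```

Squaring r_value gives the R^2 value of the fit.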

Get the range of the independent (x) and dependent (y) variables
(use your analysis.data_range function).

Make the endpoints of the linear regression line fit. Note that
these endpoints need to end up in normalized data space, while the
linear regression model is in unnormalized data space. In
normalized space, the x values of the endpoints will be 0.0 and
1.0. If the slope is m and the y-intercept is b, and the x and y
data ranges are [xmin, xmax] and [ymin, ymax], then the y values of
the endpoints in normalized data space will be:

((xmin * m + b) - ymin) / (ymax - ymin)
((xmax * m + b) - ymin) / (ymax - ymin)

Multiply the line endpoints by the vtm and then make a tk line
object out of the two endpoints. Make it a color that will stand
out relative to the data points.
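
The endpoint calculation and the multiplication by the vtm can be
sketched as follows. The sketch evaluates the fitted line at xmin and
xmax in original data space, then min-max normalizes the resulting y
values using the y range. It assumes each range is available as a
(min, max) pair and that points are stored one per row. The vtm below
is an arbitrary stand-in, not the matrix your View class would build,
and the function name and numbers are made up.

```python
import numpy as np

def regression_endpoints(slope, intercept, x_range, y_range):
    """Return a 2 x 4 array of line endpoints in normalized data space."""
    xmin, xmax = x_range
    ymin, ymax = y_range
    y_span = ymax - ymin
    # Evaluate the fit at xmin and xmax, then normalize by the y range.
    y0 = ((xmin * slope + intercept) - ymin) / y_span
    y1 = ((xmax * slope + intercept) - ymin) / y_span
    return np.array([[0.0, y0, 0.0, 1.0],
                     [1.0, y1, 0.0, 1.0]])

# Made-up fit and ranges for illustration.
endpoints = regression_endpoints(2.0, 1.0, (0.0, 10.0), (0.0, 25.0))

# Stand-in view transformation: scale by 400, flip y, and translate.
vtm = np.array([[400.0, 0.0, 0.0, 50.0],
                [0.0, -400.0, 0.0, 450.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])

# Points are rows, so transpose before and after multiplying.
screen = (vtm @ endpoints.T).T
x0, y0 = screen[0, 0], screen[0, 1]
x1, y1 = screen[1, 0], screen[1, 1]
# With a tk Canvas, these four values could be passed to
# canvas.create_line(x0, y0, x1, y1, fill="red").
```

Keeping the normalized endpoints separate from the screen
coordinates lets the line be redrawn from the stored endpoints
whenever the view changes.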

Your program should somehow communicate the linear regression
coefficients to the user as part of the GUI. You could do this by
making a tk.Label object and putting text into it giving the
slope, intercept, and R-value for the fit.
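
One possible way to prepare the text for such a label is sketched
below. The function name and the layout of the string are
illustrative choices, not requirements. The resulting string could
then be given to a tk.Label through its text option.

```python
def format_fit_text(slope, intercept, r_value):
    """Build a one-line summary of the fit for display in the GUI."""
    return (f"slope: {slope:.3f}   intercept: {intercept:.3f}   "
            f"R: {r_value:.3f}   R^2: {r_value ** 2:.3f}")

# Made-up values for illustration.
summary = format_fit_text(2.0, 1.0, 0.5)
```

Showing both R and R^2 makes it easier to compare your output with a
reference value stated in either form.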

In addition to testing along the way, test out your system now
using this data file. It contains two
variables, linearly related with some noise. Your fit should give a
slope of 1.995, an intercept of 1.012, and an R^2 value of 0.792.
An example plot is shown below.

When you are finished with the lab, go ahead and continue with
the project.