How to determine if two R2 values (R-squared) are significantly different

New Member

I have fit two different functions (f1, f2) to some data. These functions have different forms (see below), in that they have different numbers of parameters. But the data are the same, and the number of predictor/dependent variables is the same.

Here, V is the dependent variable (/predictor). All other occurrences are parameters that are fit by a least-squares method. f1 is a sigmoid with 2 parameters; f2 is a double sigmoid with 7 parameters.

I obtain a R2 for each fit; one is \(R^2_{f1}=0.99975\) and one is \(R^2_{f2}=0.995\).

I want to know which function the data fits best, i.e. is one of these R2 statistically greater than the other? Seemingly they are almost indistinguishable, but in other cases they may differ more (e.g. R2=0.95 vs 0.9). Can I do a test that tells me which function I should use? Ultimately, this is what I want to know. Whether this is decided by the R2 value or not doesn't matter (in fact I may be wrong thinking that this is what I should use).

If I am using the Rsquare to determine which function (f1 or f2) to use, I have done some reading and think I need to do something as in the following webpage:

Basically, for regression analysis you can determine whether two different R2 values generated from two different regression models (typically with different numbers of predictor variables) are statistically different by calculating an F statistic. Say we have two models m1 and m2, with m2 the larger; then you can calculate the F stat:

\( F = \frac{(R^2_{m2} - R^2_{m1}) / (df_{m2} - df_{m1})}{(1 - R^2_{m2}) / (n - df_{m2} - 1)} \)

where \(df_{m1}\) and \(df_{m2}\) are the numbers of predictor variables in the two models, n is the number of observations, and the \(R^2\) values are the r-squareds for each of the models (m1 and m2). You obviously cannot use this analysis if the number of predictor variables is the same in the two models.
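For concreteness, here is how I understand that calculation would be sketched in Python, with made-up illustrative numbers (SciPy's F distribution gives the p-value):

```python
from scipy.stats import f as f_dist

def nested_f_test(r2_small, r2_big, df_small, df_big, n):
    """F test for comparing two nested linear regression models.

    df_* = number of predictor variables in each model;
    n = number of observations.
    """
    numerator = (r2_big - r2_small) / (df_big - df_small)
    denominator = (1.0 - r2_big) / (n - df_big - 1)
    F = numerator / denominator
    # Upper-tail probability of the F distribution = p-value of the test
    p = f_dist.sf(F, df_big - df_small, n - df_big - 1)
    return F, p

# Hypothetical example: R2 improves from 0.50 to 0.55 when going
# from 2 to 4 predictors, with n = 100 observations.
F, p = nested_f_test(0.50, 0.55, 2, 4, 100)
```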

My problem is that I have the same number of predictor variables for my functions f1 and f2. The difference between f1 and f2 is that the number of parameters to be fit differs. Do I set \(df_{m1}\) and \(df_{m2}\) to the numbers of parameters used in the functions (2 and 7, respectively)?

Cookie Scientist

Hi MG, The F-ratio approach is only valid for comparing linear models that are nested. Because your models are neither linear nor nested, there's no good reason to think this will work well.

I think the most common thing to do in a situation like this would be to compare the models' information criterion statistics, such as AIC or BIC, and pick the model that has the lower value -- but note that this is technically a bit different from comparing the two models on which has the higher \(R^2\) value. So this may or may not work for you, depending on how committed you are to using \(R^2\) as the basis for comparison, or whether you just want to know more generally/vaguely which model is more "consistent with the data."

What I personally would probably do here is use a bootstrap approach. In each iteration, sample the rows of the dataset with replacement, fit both models to the resampled data, compute their \(R^2\) values, and then take their difference. Do this enough times and you'll have a bootstrap sampling distribution of the difference in \(R^2\) values. Now check to see where the null value of 0 lies in this distribution.
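A minimal sketch of that bootstrap in Python, assuming SciPy's curve_fit; the f1 and f2 below are hypothetical placeholder forms (a 2-parameter sigmoid and a 7-parameter double sigmoid), so substitute your actual model functions:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import expit  # numerically stable sigmoid

# Placeholder model forms -- replace with the real f1 and f2.
def f1(V, a, b):
    return expit((V - a) / b)

def f2(V, a1, b1, c1, a2, b2, c2, d):
    return c1 * expit((V - a1) / b1) + c2 * expit((V - a2) / b2) + d

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def boot_delta_r2(x, y, n_boot=2000, seed=0):
    """Bootstrap distribution of R^2(f1) - R^2(f2)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    deltas = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # sample rows with replacement
        xb, yb = x[idx], y[idx]
        try:
            p1, _ = curve_fit(f1, xb, yb, p0=[0.0, 1.0], maxfev=5000)
            p2, _ = curve_fit(f2, xb, yb,
                              p0=[-1, 1, 1, 1, 1, 1, 0], maxfev=5000)
        except (RuntimeError, ValueError):
            continue  # fit failed to converge on this resample; skip it
        deltas.append(r_squared(yb, f1(xb, *p1)) - r_squared(yb, f2(xb, *p2)))
    return np.array(deltas)
```

The returned array is the bootstrap sampling distribution of \(\Delta R^2\), which you can then inspect to see where 0 falls.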

New Member

Thank you very much for your informative and speedy replies. This is not something I am 100% confident in, so I would appreciate your help with the below.

Firstly, the bootstrap approach.... Am I correct in thinking that what you are suggesting is to take my dataset (say, 30 datapoints) and create lots of sub-datasets by deletion and 'replacement'? (Replacing what with what? Do you replace missing data with mean data?) And then conducting both fits to this data. Then I calculate \( \Delta R^2 \) and draw its distribution. I then look at the distribution, see where 0 lies and look to see if it is +ve or -ve. If it's positively skewed, then one function fit is better than the other, say. My issue is that this still sounds quite subjective? At the end of this process, I am still having to interpret a distribution by eye.

After doing some reading, perhaps the best approach would be looking at this "Akaike Information Criterion" (or Bayesian), however I am not very familiar with calculating this. The equation I have found is:

\( AIC = -2\ln(\mathrm{likelihood}) + 2K \)

where "likelihood" is the probability of the data given a model and K is the number of free parameters in the model.

The input to the criterion that I am not clear about is the "likelihood". What exactly is this and how do I compute it? From some reading I think it is somehow related to the error from the least-squares fit, but I am not sure. Is this value something that could be "spat out" by something like MATLAB?

Cookie Scientist

Am I correct in thinking that what you are suggesting is to take my dataset (say, 30 datapoints) and create lots of sub-datasets by deletion and 'replacement'? (Replacing what with what? Do you replace missing data with mean data?)

You build each resampled dataset like so: first, randomly draw 1 row from the original dataset. Next, randomly draw another row from the original dataset and add it to the first row that you drew. (Note that there is a possibility that you will draw the same row twice, and this is okay; this is what I meant by "sampling with replacement.") Do this 30 times until you have built up a new dataset that is the same size as the original, but is made up of a random selection of rows, including some that are duplicated and some that are missing. And then you compute \(\Delta R^2\) on this resampled dataset just like you said.

I then look at the distribution, see where 0 lies and look to see if it is +ve or -ve. If it's positively skewed, then one function fit is better than the other, say. My issue is that this still sounds quite subjective? At the end of this process, I am still having to interpret a distribution by eye.

You can be more precise by doing things like (a) finding the 2.5% and 97.5% quantiles of the distribution, i.e., the middle 95%, and seeing whether 0 lies in this interval, or (b) computing the proportion of the distribution that falls below 0 (or above 0, depending on which direction you computed the \(R^2\) differences). Method (a) is called the percentile bootstrap confidence interval, and method (b) is kind of like a p-value, but not exactly.
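Both summaries are one-liners with NumPy, assuming `deltas` holds the bootstrapped \(\Delta R^2\) values:

```python
import numpy as np

def summarize_bootstrap(deltas):
    """Percentile CI and proportion below zero for bootstrapped differences."""
    lo, hi = np.percentile(deltas, [2.5, 97.5])  # middle 95% of the draws
    zero_inside = lo <= 0.0 <= hi                # (a) percentile bootstrap CI
    prop_below = np.mean(deltas < 0)             # (b) p-value-like proportion
    return lo, hi, zero_inside, prop_below
```

If `zero_inside` is False, the middle 95% of the bootstrap distribution excludes 0, which is the bootstrap analogue of a significant difference at the 5% level.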

I don't personally use MATLAB, but I am pretty confident there will be some way to have the MATLAB nonlinear regression function that you're using spit out the log-likelihood, or possibly -2*log(likelihood), in which case it may be called the "deviance." Some googling for terms like "matlab aic model comparison" should help.
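For what it's worth, under the usual assumption of i.i.d. Gaussian errors in a least-squares fit, \(-2\ln(\mathrm{likelihood})\) works out (up to an additive constant that cancels when comparing models fit to the same data) to \(n \ln(\mathrm{RSS}/n)\), so you can compute a comparable AIC directly from the residual sum of squares. A sketch (conventions for the constant term and for whether K includes the error variance vary between packages, so only compare AICs computed the same way):

```python
import numpy as np

def aic_from_rss(rss, n, k):
    """AIC for a least-squares fit assuming i.i.d. Gaussian errors.

    n*ln(RSS/n) equals -2*ln(likelihood) up to an additive constant
    that cancels when comparing models fit to the same data.
    """
    return n * np.log(rss / n) + 2 * k

# Hypothetical numbers: compare f1 (2 parameters) and f2 (7 parameters)
# fit to the same n = 30 points; prefer the model with the smaller AIC.
n = 30
aic_f1 = aic_from_rss(0.5, n, 2)
aic_f2 = aic_from_rss(0.4, n, 7)
```

In this made-up example, f2's modest reduction in RSS does not pay for its 5 extra parameters, so f1 gets the lower AIC.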