Questions tagged [regression]

I have made two predictions on some data, one with KNeighborsRegressor and another with RandomForestRegressor, and have scored them.
I would now like to combine both models to make a prediction.
I have found online that you can do this by using VotingClassifier.
Here is the code that I currently...
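VotingClassifier combines classifiers. For regressors, scikit-learn provides the analogous VotingRegressor, which fits each listed estimator and averages their predictions. The sketch below makes two assumptions: synthetic data from make_regression stands in for the question's data, and both models use default-style hyperparameters.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the real data (assumption)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# VotingRegressor fits each listed regressor and averages their predictions
ensemble = VotingRegressor(
    estimators=[
        ("knn", KNeighborsRegressor(n_neighbors=5)),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ]
)
ensemble.fit(X_train, y_train)
print("ensemble R^2:", ensemble.score(X_test, y_test))

# With the default (equal) weights, the ensemble output is the plain mean
knn_pred = ensemble.named_estimators_["knn"].predict(X_test)
rf_pred = ensemble.named_estimators_["rf"].predict(X_test)
manual_mean = np.mean([knn_pred, rf_pred], axis=0)
print(np.allclose(ensemble.predict(X_test), manual_mean))
```

To favor one model over the other, pass `weights=[...]` to VotingRegressor. Equal weights are used here only as a starting point.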

I'm working with the lung dataset in the survival package in R. I have split the data into 80% and 20% and built the coxph model using the 80% portion. I would like to test the model on the remaining 20% to predict the disease status. What code would achieve this? Is there a function in the survival package...

In an Rcpp project I would like to be able to either call an R function (the cobs function from the cobs package, to do a concave spline fit) or call the Fortran code that it relies on (the cobs function uses quantreg's rq.fit.sfnc function to fit the constrained spline model, which in turn relies on...

I am using statsmodels (open to other Python options) to run a linear regression. My problem is that I need the regression to have no intercept, to constrain the coefficients to the range (0,1), and also to have them sum to 1.
I tried something like this (for the sum of 1, at least):
from statsmodels.formula....
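Here is a minimal sketch of one non-statsmodels option, assuming SciPy is acceptable. It minimizes the sum of squared residuals with scipy.optimize.minimize (method SLSQP). Bounds handle the per-coefficient range, and an equality constraint handles the sum. The data are synthetic. The bounds are the closed interval [0, 1], since an optimizer cannot enforce a strictly open interval (0, 1).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Synthetic data (assumption): y is a weighted mix of 3 columns plus noise
X = rng.normal(size=(200, 3))
true_w = np.array([0.2, 0.5, 0.3])
y = X @ true_w + rng.normal(scale=0.05, size=200)


def sse(w):
    """Sum of squared residuals of the no-intercept model y ~ X @ w."""
    r = y - X @ w
    return r @ r


n = X.shape[1]
result = minimize(
    sse,
    x0=np.full(n, 1.0 / n),  # feasible start: equal weights
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n,  # each coefficient in [0, 1]
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # weights sum to 1
)
w_hat = result.x
print(w_hat, w_hat.sum())
```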

I have trained a linear regression model with sklearn for a 5-star rating, and it's good enough. I used Doc2vec to create my vectors and saved that model. Then I saved the linear regression model to another file. What I'm trying to do is load the Doc2vec model and linear regression model and t...

I have an assignment and it asks me to:
Improve the performance of the models from the previous step with
hyperparameter tuning and select a final optimal model using grid
search based on a metric (or metrics) that you choose. Choosing an
optimal model for a given task (comparing multiple regressor...
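One way to structure this, offered only as a sketch: run GridSearchCV once per candidate regressor with the same scoring metric and cross-validation scheme, then keep the model with the best cross-validated score. The models, grids, synthetic data, and the choice of negative RMSE as the metric are illustrative assumptions and are not part of the assignment.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the assignment's dataset (assumption)
X, y = make_regression(n_samples=200, n_features=8, noise=15.0, random_state=0)

# Candidate regressors and illustrative hyperparameter grids
candidates = {
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "forest": (
        RandomForestRegressor(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # Same metric and 5-fold CV for every model; sklearn maximizes scores,
    # so RMSE is expressed as its negative
    search = GridSearchCV(model, grid, scoring="neg_root_mean_squared_error", cv=5)
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_, search.best_estimator_)

best_name = max(results, key=lambda k: results[k][0])
print(best_name, results[best_name][:2])
```

The best score is chosen on the same folds used for tuning, so the selected model would still need to be evaluated on a held-out test set.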

I am running the following in Stata:
eststo: ivregress 2sls y (x=z) control [aw=weight], cluster(cluster) first
esttab using file.tex, b(%9.3f) se(%9.3f) r2(%9.8f) replace
This produces a publication-style table for 2nd stage.
However, how can I produce the same table for the 1st stage? I need coefficien...

I have 3 columns of data in a dataframe (data) with no headers.
The 1st and 2nd column are the independent variables and the 3rd column is the dependent variable.
I have to fit a polynomial of order 3 in the independent variables.
I did:
dm

I am trying to understand how auto.arima() with linear regression vs. lm() works.
My assumption, which seems not to be true, is that when you use auto.arima() and specify xreg, a linear model is fit to the overall series, and then an ARMA model is used to further fit the residuals. I get th...

I am trying to conduct a stepwise logistic regression in R with a dichotomous DV. I have researched the step() function, which uses AIC to select a model and essentially requires a null and a full model. Here's the syntax I've been trying (I have a lot of IVs, but the N is 100,000+):
Full = gl...

I've been following a machine learning course on Coursera. So far, most of the linear regression models introduced use variables whose numerical values are positively correlated with the output.
Input: square feet of the house
Output: house price.
However, I'm trying to implement a m...
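The rest of this question is cut off, but the premise can be checked directly. Ordinary least squares places no sign restriction on the weights, so a feature that is negatively correlated with the output simply receives a negative coefficient. Here is a small sketch with a hypothetical feature (distance from the city center) and synthetic prices:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical feature that is negatively related to the output
distance_km = rng.uniform(1, 30, 200)
price = 500 - 8 * distance_km + rng.normal(0, 10, 200)

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones_like(distance_km), distance_km])
(intercept, slope), *_ = np.linalg.lstsq(A, price, rcond=None)
print(intercept, slope)  # the fitted slope is negative, near the true -8
```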

I am trying to calculate a linear regression of Y = C - A on x = ['Plate X', 'Plate Y', 'Field X'], grouped by Drum and Plate. An additional question: how can I save the results to a file, preferably CSV?
Is the pandas package sufficient for this task, or is another package needed?
Thank you
There is...
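pandas plus NumPy is enough for the coefficients and the CSV export. statsmodels would additionally give standard errors and p-values. The sketch below assumes the column names from the question and uses a synthetic frame built so that the true coefficients are known. The output filename is hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 40
# Synthetic frame using the column names from the question (assumption)
df = pd.DataFrame({
    "Drum": np.repeat([1, 2], n // 2),
    "Plate": np.tile(np.repeat(["P1", "P2"], n // 4), 2),
    "Plate X": rng.normal(size=n),
    "Plate Y": rng.normal(size=n),
    "Field X": rng.normal(size=n),
    "A": rng.normal(size=n),
})
df["C"] = df["A"] + 1.0 + 2.0 * df["Plate X"] - df["Plate Y"] + 0.5 * df["Field X"]

x_cols = ["Plate X", "Plate Y", "Field X"]


def fit_group(g):
    """Least-squares fit of Y = C - A on the x columns plus an intercept."""
    X = np.column_stack([np.ones(len(g)), g[x_cols].to_numpy()])
    y = (g["C"] - g["A"]).to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["intercept"] + x_cols)


# One row of coefficients per (Drum, Plate) pair
results = df.groupby(["Drum", "Plate"])[x_cols + ["A", "C"]].apply(fit_group)
results.to_csv("regression_by_drum_plate.csv")
print(results)
```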

I want to interpret the regression model weights in a model where the input data has been pre-processed with PCA. In reality, I have 100s of input dimensions which are highly correlated, so I know that PCA is useful. However, for the sake of illustration I will use the Iris dataset.
The sklearn code...
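The Iris code itself is truncated, so this sketch rests on two assumptions: petal width is used as a continuous target, and two components are kept. With PCA(whiten=False), the transform is Z = (X - mean_) @ components_.T. Weights b fitted on Z therefore map back to the original features as w = components_.T @ b, with a matching shift of the intercept.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

X_all, _ = load_iris(return_X_y=True)
# Illustrative target (assumption): predict petal width from the other three columns
y = X_all[:, 3]
X = X_all[:, :3]

pca = PCA(n_components=2)
Z = pca.fit_transform(X)  # Z = (X - pca.mean_) @ pca.components_.T
reg = LinearRegression().fit(Z, y)

# y_hat = (X - mean) @ components_.T @ b + c = X @ w + (c - mean @ w)
w = pca.components_.T @ reg.coef_
intercept = reg.intercept_ - pca.mean_ @ w

print(dict(zip(["sepal length", "sepal width", "petal length"], w)))
```

When fewer components than features are kept, w is restricted to the span of the retained components. It should therefore be read as the effect implied by that reduced model, not as an unconstrained per-feature effect.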

When I do ridge regression using sklearn in Python, the coef_ output gives me a 2D array. According to the documentation it is (n_targets, n_features).
I understand that features are my coefficients. However, I am not sure what targets are. What is this?
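In scikit-learn, "targets" are the output columns of y, not the features. Each row of coef_ belongs to one target, and each column belongs to one feature. If y is passed as a column vector of shape (n_samples, 1), Ridge treats it as a single target but still returns a 2D coef_ of shape (1, n_features). Here is a small sketch on random data:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # 50 samples, 4 features

y_1d = X @ np.array([1.0, -2.0, 0.5, 3.0])
y_col = y_1d.reshape(-1, 1)             # same target as a single column
Y_two = np.column_stack([y_1d, -y_1d])  # two separate targets

shape_1d = Ridge().fit(X, y_1d).coef_.shape    # (4,): one coefficient per feature
shape_col = Ridge().fit(X, y_col).coef_.shape  # (1, 4): one row because y was 2D
shape_two = Ridge().fit(X, Y_two).coef_.shape  # (2, 4): one row per target
print(shape_1d, shape_col, shape_two)
```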

I'm using an rlm model like this.
fit=rlm(log(y) ~ x + z)
z is a list whose entries are all 1. I get the error: Error in rlm.default(x, y, weights, method = method, wt.method = wt.method, : 'x' is singular: singular fits are not implemented in 'rlm'
Is it equivalent to use fit=rlm(log(y) ~ x + 1) in...

I'm trying to run one of MLlib algorithms, namely LogisticRegressionWithLBFGS on my database.
This algorithm takes the training set as LabeledPoint. Since LabeledPoint requires a double label (LabeledPoint(double label, Vector features)) and my database contains some null values, how can I solve...

I am encountering a very strange problem in .NET regression testing. I have a test method that fails when I run the complete test suite, but passes when run individually.
What could be the reason behind this? I double-checked that the other test methods have no effect...

I'm trying to print a series of logistic regressions in statsmodels but am unsure how to print the results to something other than the console screen. I've created a function that runs the regressions where data is the dataset, and the other variables are a series of lists of dummy variable labels f...

I'm using the same data but different Python libraries to calculate the coefficient of determination R^2. Using scipy.stats and sklearn yields different results.
What is the reason behind this behavior?
# Using stats.linregress
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)...
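Without the rest of the snippet this is a guess, but two causes are common. First, linregress returns Pearson's r (r_value), not R^2. Second, sklearn's r2_score expects its arguments in the order (y_true, y_pred). For a simple least-squares line with an intercept, r_value**2 and r2_score(y, y_pred) coincide. Here is a sketch on synthetic data:

```python
import numpy as np
from scipy import stats
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + rng.normal(0, 4.0, 100)

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
y_pred = slope * x + intercept

r2_linregress = r_value**2        # r_value is Pearson's r; square it for R^2
r2_sklearn = r2_score(y, y_pred)  # argument order is (y_true, y_pred)
r2_swapped = r2_score(y_pred, y)  # swapped order generally gives a different value
print(r2_linregress, r2_sklearn, r2_swapped)
```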

I am running a large number of linear regressions, and for each regression I would like to save the adjusted R squared and the degrees of freedom, each in a separate file.
The code below does this perfectly for the adjusted R squared, and I can add the value name of the list to the file (so I can ide...

I have been all over Google trying to find a good function/package to perform multivariate regression (i.e. predict multiple continuous variables given another set of multiple continuous variables).
I wish to use something like fitlm(), since that also gives me p-value statistics and R squared stati...

My nonlinear mixed-effects model regresses body mass (bm) on age. I would like to consider that brood is nested within year, but as a brood can only occur in one of the seven years in the dataset, the random effects of year and brood should be crossed.
In Pinheiro & Bates (2000): ‘Mixed-Effe...

In linear regression I've always seen the situation where I have many features and I use them to predict a single output, for example
f1 f2 f3 f4 --> y1
f1 f2 f3 f4 --> y2
and so on...
I want to know if there is a method where the predicted value (i.e. y1) is actually a vector rather than a single value.
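This setup is usually called multi-output (or multivariate) regression. Ordinary least squares with a matrix of targets is equivalent to fitting one linear model per output column, and scikit-learn's LinearRegression accepts a 2D y directly. Here is a sketch with synthetic data in which each sample has three outputs (the weights are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
F = rng.normal(size=(100, 4))  # features f1..f4
B = rng.normal(size=(4, 3))    # hypothetical true weights for 3 outputs
Y = F @ B + 0.5                # each row of Y is a 3-element target vector

model = LinearRegression().fit(F, Y)  # y may be 2D: (n_samples, n_outputs)
print(model.coef_.shape)              # (n_outputs, n_features) = (3, 4)
y_vec = model.predict(F[:1])          # a single prediction is a length-3 vector
print(y_vec)
```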

For robust linear regression, I want to implement an M-estimator with the Geman-McClure loss function.
M-estimators are presented in this document, and Geman-McClure can be found on page 13.
To solve the minimization problem, iteratively reweighted least squares is recommended. How c...
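Here is a sketch of IRLS for this loss. It assumes the common parameterization rho(u) = (u^2 / 2) / (1 + u^2), whose influence function psi(u) = u / (1 + u^2)^2 gives the weight w(u) = 1 / (1 + u^2)^2. The MAD-based scale estimate, the tuning constant c = 1 and the OLS starting point are further assumptions of this sketch, so check them against the document the question cites. The loss is non-convex, so the result can depend on the starting point.

```python
import numpy as np


def geman_mcclure_irls(X, y, c=1.0, n_iter=50, tol=1e-8):
    """Robust linear fit by IRLS with the Geman-McClure loss.

    rho(u) = (u**2 / 2) / (1 + u**2) on scaled residuals u = r / (c * sigma);
    the IRLS weight is w(u) = psi(u) / u = 1 / (1 + u**2) ** 2.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from ordinary least squares
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust scale from the median absolute deviation (assumption)
        sigma = max(1.4826 * np.median(np.abs(r - np.median(r))), 1e-12)
        u = r / (c * sigma)
        w = 1.0 / (1.0 + u**2) ** 2
        # Weighted least-squares step: scale each row by sqrt(w)
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Usage on synthetic data: a straight line with five gross outliers
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 50)
y[[5, 15, 25, 35, 45]] += 50.0
X = np.column_stack([np.ones_like(x), x])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_gm = geman_mcclure_irls(X, y)
print("OLS:", beta_ols)  # intercept pulled upward by the outliers
print("Geman-McClure:", beta_gm)
```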

I have a 10 Hz time series measured by a fast instrument and a 1 minute time series measured by a slow reference instrument. The data consists of a fluctuating meteorological parameter. The slow reference instrument is used to calibrate the fast instrument measurements. Both time series are synchron...
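The question is cut off, so this only sketches one common approach under stated assumptions. First, block-average the 10 Hz series over the same 1-minute intervals as the reference. Next, fit a linear calibration (reference ~ gain * fast + offset). Finally, apply that calibration to the 10 Hz data. The sketch assumes that both clocks are aligned and that the reference values are 1-minute means. All data below are synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in (assumption): 30 minutes of a slowly drifting signal at 10 Hz
t_fast = pd.date_range("2024-01-01", periods=30 * 60 * 10, freq="100ms")
truth = pd.Series(20 + np.cumsum(rng.normal(0, 0.01, len(t_fast))), index=t_fast)
fast = 0.9 * truth + 1.5 + rng.normal(0, 0.2, len(t_fast))  # biased, noisy fast sensor
slow = truth.resample("1min").mean()                        # 1-minute reference means

# Average the 10 Hz data over the same 1-minute blocks as the reference
fast_1min = fast.resample("1min").mean()
paired = pd.concat({"fast": fast_1min, "ref": slow}, axis=1).dropna()

# Linear calibration: ref ~ gain * fast + offset, then apply it at full resolution
gain, offset = np.polyfit(paired["fast"], paired["ref"], deg=1)
fast_calibrated = gain * fast + offset
print(gain, offset)
```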

I am having to tackle a problem that far exceeds my current Python programming skills. I am having difficulty combining different modules (csv reader, numpy etc.) into a single script. My data contains a large list of weather variables across time (with minute resolution) for many days. My object...

I am using GAM (generalized additive models) for my dataset. This dataset has 32 observations, with 6 predictor variables and a response variable (namely power).
I am using the gam() function of the mgcv package to fit the models. Whenever I try to fit a model, I get the following error message:
Error in ga...

So here’s the game plan. I’m trying to take the data set below (it will be a structure object) and fit a curved regression model to it.
Then, I’d like to take the slope (i.e. the first derivative value for each x) at each point, and save the data table with that slope information in its own co...
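Here is a sketch in Python, with two assumptions: a pandas DataFrame stands in for the structure object, and a cubic polynomial serves as the curved model. np.polyfit fits the curve, and np.polyder differentiates the fitted polynomial. np.polyval then evaluates that derivative at every x to fill a new slope column.

```python
import numpy as np
import pandas as pd

# Hypothetical data table with x and y columns (stands in for the structure object)
x = np.linspace(0, 5, 30)
table = pd.DataFrame({"x": x, "y": 2 + 0.5 * x - 0.3 * x**2 + 0.1 * x**3})

# Fit a cubic as the "curved" model (degree 3 is an assumption)
coeffs = np.polyfit(table["x"], table["y"], deg=3)

# Differentiate the fitted polynomial and evaluate dy/dx at every x
slope_coeffs = np.polyder(coeffs)
table["slope"] = np.polyval(slope_coeffs, table["x"].to_numpy())
print(table.head())
```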

I am trying to fit an SVM regression model using the scikit-learn package, but it is not working as I expect.
Could you please help me find the error? The code that I would like to use is:
from sklearn.svm import SVR
import numpy as np
X = []
x = np.arange(0, 20)
y = [3, 4, 8, 4, 6, 9, 8, 12,...
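The snippet is truncated, so the actual error can't be confirmed. Two causes are frequent with SVR. One is passing a 1D x where scikit-learn expects a 2D array of shape (n_samples, n_features). The other is using the default C and gamma on unscaled inputs, which can give an almost flat fit. The sketch below addresses both. The y values are made up because the question's list is cut off, and C=100 is an illustrative choice, not a tuned value.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

x = np.arange(0, 20)
# Made-up targets: the question's y list is truncated
y = 0.5 * x + np.sin(x)

# scikit-learn expects a 2D feature matrix of shape (n_samples, n_features)
X = x.reshape(-1, 1)

# Standardize the input and raise C so the RBF fit is not forced toward a flat line
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.1))
model.fit(X, y)
y_fit = model.predict(X)
print("training R^2:", model.score(X, y))
```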