Tagged Questions

Refers to general procedures that attempt to determine the generalizability of a statistical result. Cross-validation arises frequently in the context of assessing how a particular model fit predicts future observations. Methods for cross-validation usually involve withholding a random subset of the ...
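The tag description above can be illustrated with a minimal k-fold sketch; the dataset, model, and fold count here are arbitrary placeholders, not part of any question on this page:

```python
# Minimal k-fold cross-validation sketch: each fold is withheld once
# while the model is fit on the remaining data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000)

# cv=5 gives five held-out score estimates; their mean summarizes
# how the fit is expected to predict future observations.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```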

I have a large data set with over 700,000 examples and I tried to (binary) classify the data set with Naive Bayes and Random Forest. The task was carried out in Python with scikit-learn.
The data ...
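Comparing the two classifiers the snippet mentions usually comes down to cross-validating both on the same folds. A hedged sketch with synthetic data standing in for the (unseen) 700,000-example set:

```python
# Compare Naive Bayes and Random Forest on identical CV folds.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

results = {}
for name, clf in [("Naive Bayes", GaussianNB()),
                  ("Random Forest", RandomForestClassifier(n_estimators=50,
                                                           random_state=0))]:
    scores = cross_val_score(clf, X, y, cv=5)  # same 5-fold split for both
    results[name] = scores.mean()
    print(f"{name}: {results[name]:.3f}")
```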

I'm currently using the train() function in the caret package to run 10-fold repeated cv on a random forest model. I would also like to explore other statistical and machine learning models for use ...

I don't have very good knowledge of statistical testing, so please accept my apology if my question sounds stupid. I have developed two methods for computing the discomfort level of communities. One is static ...

I have a question regarding how to measure the performance of a model when the distribution of instances per class is limited.
In my scenario I have five data sets $\mathcal{D}_{1}, \mathcal{D}_{2}, ...

I'm working on a data set that contains used (value = 1; animal locations) and random locations (value = 0). I'm using logistic regression to assess non-random habitat selection. I have 6 continuous ...

When applying cross-validation to a sufficiently large dataset (say 6,000 observations), is there a recommended ratio for splitting the data into training and testing/validation sets? I have seen ...
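For the split-ratio question above, an 80/20 split is one common convention (not the only defensible one); a minimal sketch with 6,000 placeholder rows:

```python
# 80/20 train/test split of a 6,000-row dataset.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(6000).reshape(-1, 1)  # placeholder features
y = np.zeros(6000)                  # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(len(X_train), len(X_test))  # 4800 1200
```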

I have a conceptual problem understanding how to cross-validate stepwise logistic regression. Each time the training set is divided, it is very likely that different features are chosen based on the ...
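The usual resolution to the concern above is to put feature selection inside the cross-validation loop, so each training split selects its own features. A sketch using `SelectKBest` as a stand-in for stepwise selection (the question's actual selection method):

```python
# Feature selection re-fit inside every CV fold via a Pipeline,
# so the held-out fold never influences which features are chosen.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=5)),   # stand-in for stepwise selection
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=10)
print(scores.mean())
```

The cross-validated score then estimates the performance of the whole procedure (selection plus fitting), even though the selected feature set may differ across folds.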

I am trying to use time series neural network to predict future values. I have time series data from 2010-2014 and I need to predict the values from 2015-2020 using time series neural network. I am ...

As far as I've seen, opinions tend to differ about this. Best practice would certainly dictate using cross-validation (especially if comparing RFs with other algorithms on the same dataset). On the ...

I would like to find a good pair of predictors out of about 400 available pairs. To do this I am using LOO cross validation. Since there are so many pairs available, don't I run into the issue that ...
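The worry hinted at above is real: picking the best of ~400 pairs by their LOO scores makes the winning score an optimistically biased estimate, so a further held-out set is needed. The LOO mechanics themselves are simple; a sketch on a small placeholder dataset:

```python
# Leave-one-out CV: each observation is held out once, giving one
# 0/1 accuracy score per observation for a classifier.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=50, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=LeaveOneOut())
print(len(scores))  # one score per observation: 50
```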

I'm trying to solve a bio-medical image segmentation problem using a binary classifier and then a spatial smoothing (assuming continuous regions). I have:
Training set of 10 3D scans, a total of ~30 ...

I'm evaluating a grid of tuning parameters using caret with metric="ROC" for cross-validation. Is there any simple way to use as metric the area under the curve for a specified interval of the ROC ...

I have 2000 observations in a dataset with features and a binary-class outcome. I split the dataset into two sets for split-sample validation. I use 80% to train the model and internally perform cross ...

When a dataset is given and it is divided into N parts, training a CART on N-1 parts and testing it on the remaining part (and doing that N times, i.e. for each possible leave-out), one ends up with N ...
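The N-fold procedure the last snippet describes yields N per-fold error estimates, which are most commonly averaged into a single figure. A sketch with a decision tree (scikit-learn's CART implementation) and placeholder data:

```python
# N-fold CV on a CART-style decision tree: collect one test error
# per leave-out, then average the N estimates.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

errors = []
for train_idx, test_idx in kf.split(X):
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X[train_idx], y[train_idx])
    errors.append(1 - tree.score(X[test_idx], y[test_idx]))  # fold error rate

print(np.mean(errors))  # the usual summary of the N estimates
```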