How to Save Your Machine Learning Model and Make Predictions in Weka

After you have found a well-performing machine learning model and tuned it, you must finalize the model so that you can make predictions on new data.

In this post you will discover how to finalize your machine learning model, save it to file and load it later in order to make predictions on new data.

After reading this post you will know:

How to train a final version of your machine learning model in Weka.

How to save your finalized model to file.

How to load your finalized model later and use it to make predictions on new data.

Let’s get started.

Photo by Nick Kenrick, some rights reserved.

Tutorial Overview

This tutorial is broken down into 4 parts:

Finalize Model where you will discover how to train a finalized version of your model.

Save Model where you will discover how to save a model to file.

Load Model where you will discover how to load a model from file.

Make Predictions where you will discover how to make predictions for new data.

The tutorial provides a template that you can use to finalize your own machine learning algorithms on your data problems.

We are going to use the Pima Indians Onset of Diabetes dataset. Each instance represents medical details for one patient, and the task is to predict whether the patient will have an onset of diabetes within the next five years. There are 8 numerical input variables, all with varying scales. You can learn more about this dataset on the UCI Machine Learning Repository. Top results are on the order of 77% accuracy.

We are going to finalize a logistic regression model on this dataset, both because it is a simple algorithm that is well understood and because it does very well on this problem.


1. Finalize a Machine Learning Model

Perhaps the most neglected task in a machine learning project is how to finalize your model.

Once you have gone through all of the effort to prepare your data, compare algorithms and tune them on your problem, you actually need to create the final model that you intend to use to make new predictions.

Finalizing a model involves training the model on the entire training dataset that you have available.

1. Open the Weka GUI Chooser.

2. Click the “Explorer” button to open the Weka Explorer interface.

3. Load the Pima Indians onset of diabetes dataset from the data/diabetes.arff file.

Weka Load Pima Indians Onset of Diabetes Dataset

4. Click the “Classify” tab to open up the classifiers.

5. Click the “Choose” button and choose “Logistic” under the “functions” group.

6. Select “Use training set” under “Test options”.

7. Click the “Start” button.

Weka Train Logistic Regression Model

This will train the chosen Logistic regression algorithm on the entire loaded dataset. It will also evaluate the model on the entire dataset, but we are not interested in this evaluation.
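If you prefer to script this step, the same finalization can be done with Weka’s Java API. The sketch below is a minimal example, not the tutorial’s method; the file path and classifier follow this tutorial, and it assumes weka.jar is on the classpath:

```java
import weka.classifiers.functions.Logistic;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FinalizeModel {
    public static void main(String[] args) throws Exception {
        // Load the entire training dataset (path as used in this tutorial)
        Instances data = DataSource.read("data/diabetes.arff");
        // The class attribute is the last column in the diabetes dataset
        data.setClassIndex(data.numAttributes() - 1);

        // Train the final logistic regression model on all available data
        Logistic model = new Logistic();
        model.buildClassifier(data);

        // Print the learned coefficients
        System.out.println(model);
    }
}
```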

It is assumed that you have already estimated the performance of the model on unseen data using cross-validation as part of selecting the algorithm you wish to finalize. It is this estimate, prepared previously, that you can report when you need to inform others about the skill of your model.

Now that we have finalized the model, we need to save it to file.

2. Save Finalized Model To File

Continuing on from the previous section, we need to save the finalized model to a file on your disk.

This is so that we can load it up at a later time, or even on a different computer in the future and use it to make predictions. We won’t need the training data in the future, just the model of that data.

You can easily save a trained model to file in the Weka Explorer interface.

1. Right click on the result item for your model in the “Result list” on the “Classify” tab.

2. Click “Save model” from the right click menu.

Weka Save Model to File

3. Select a location and enter a filename such as “logistic”, then click the “Save” button.

Your model is now saved to the file “logistic.model”.

It is in a binary format (not text) that can be read again by the Weka platform. As such, it is a good idea to note down the version of Weka you used to create the model file, just in case you need the same version of Weka in the future to load the model and make predictions. Generally, this will not be a problem, but it is a good safety precaution.
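For completeness, saving can also be done programmatically. Weka ships a SerializationHelper utility for this; below is a minimal sketch, assuming a trained model object as in the previous step, that writes the same kind of Java-serialized model file:

```java
import weka.classifiers.functions.Logistic;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class SaveModel {
    public static void main(String[] args) throws Exception {
        // Re-create the finalized model as in the previous step
        Instances data = DataSource.read("data/diabetes.arff");
        data.setClassIndex(data.numAttributes() - 1);
        Logistic model = new Logistic();
        model.buildClassifier(data);

        // Serialize the trained model to disk in binary form
        SerializationHelper.write("logistic.model", model);
    }
}
```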

You can close the Weka Explorer now. The next step is to discover how to load up the saved model.

3. Load a Finalized Model

You can load saved Weka models from file.

The Weka Explorer interface makes this easy.

1. Open the Weka GUI Chooser.

2. Click the “Explorer” button to open the Weka Explorer interface.

3. Load any dataset; it does not matter which. We will not be using it, we just need to load a dataset to get access to the “Classify” tab. If you are unsure, load the data/diabetes.arff file again.

4. Click the “Classify” tab to open up the classifiers.

5. Right click on the “Result list” and click “Load model”, then select the model saved in the previous section, “logistic.model”.

Weka Load Model From File

The model will now be loaded into the explorer.
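Loading a saved model programmatically is a one-liner with the same utility. A sketch, assuming the “logistic.model” file from the previous section exists on disk:

```java
import weka.classifiers.Classifier;
import weka.core.SerializationHelper;

public class LoadModel {
    public static void main(String[] args) throws Exception {
        // Deserialize the model saved earlier; cast to the generic Classifier type
        Classifier model = (Classifier) SerializationHelper.read("logistic.model");
        System.out.println(model);
    }
}
```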

We can now use the loaded model to make predictions for new data.

Weka Model Loaded From File Ready For Use

4. Make Predictions on New Data

We can now make predictions on new data.

First, let’s create some pretend new data. Make a copy of the file “data/diabetes.arff” and save it as “data/diabetes-new-data.arff”.

Open the file in a text editor.

Find the start of the actual data in the file with the @data on line 95.

We only want to keep 5 records. Move down 5 lines, then delete all the remaining lines of the file.

The class value (output variable) that we want to predict is on the end of each line. Delete each of the 5 output variables and replace them with question mark symbols (?).

Weka Dataset For Making New Predictions

We now have “unseen” data with no known output for which we would like to make predictions.
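The same mock dataset can be prepared programmatically by copying the first five records and marking their class values as missing. A sketch using the Weka Java API (file names follow this tutorial):

```java
import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;

public class MakeUnseenData {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/diabetes.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Copy only the first 5 records
        Instances unseen = new Instances(data, 0, 5);

        // Replace each known class value with a missing value ("?")
        for (int i = 0; i < unseen.numInstances(); i++) {
            unseen.instance(i).setClassMissing();
        }

        // Write the mock "new" data back out as ARFF
        ArffSaver saver = new ArffSaver();
        saver.setInstances(unseen);
        saver.setFile(new File("data/diabetes-new-data.arff"));
        saver.writeBatch();
    }
}
```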

Continue on from the previous part of the tutorial where we already have the model loaded.

1. On the “Classify” tab, select the “Supplied test set” option in the “Test options” pane.

Weka Select New Dataset On Which To Make New Predictions

2. Click the “Set” button, click the “Open file” button on the options window and select the mock new dataset we just created with the name “diabetes-new-data.arff”. Click “Close” on the window.

3. Click the “More options…” button to bring up options for evaluating the classifier.

4. Uncheck the information we are not interested in, specifically:

“Output model”

“Output per-class stats”

“Output confusion matrix”

“Store predictions for visualization”

Weka Customized Test Options For Making Predictions

5. For the “Output predictions” option click the “Choose” button and select “PlainText”.

6. Right click on the list item for your loaded model in the “Results list” pane.

7. Select “Re-evaluate model on current test set”.

Weka Re-evaluate Loaded Model On Test Data And Make Predictions

The predictions for each test instance are then listed in the “Classifier Output” pane, specifically in the middle column of the results, with predictions like “tested_positive” and “tested_negative”.
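The whole prediction step can also be scripted. Below is a minimal sketch that loads the saved model and the mock new data, then prints one predicted label per instance (file names follow this tutorial):

```java
import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class MakePredictions {
    public static void main(String[] args) throws Exception {
        // Load the unseen data and the finalized model
        Instances unseen = DataSource.read("data/diabetes-new-data.arff");
        unseen.setClassIndex(unseen.numAttributes() - 1);
        Classifier model = (Classifier) SerializationHelper.read("logistic.model");

        // Classify each instance and map the numeric prediction to its label
        for (int i = 0; i < unseen.numInstances(); i++) {
            double pred = model.classifyInstance(unseen.instance(i));
            String label = unseen.classAttribute().value((int) pred);
            System.out.println("Instance " + (i + 1) + ": " + label);
        }
    }
}
```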

You could choose another output format for the predictions, such as CSV, which you could later load into a spreadsheet like Excel. Below is an example of the same predictions in CSV format.

Weka Predictions Made on New Data By a Loaded Model

More Information

The Weka Wiki has some more information about saving and loading models, as well as making predictions, that you may find useful.

Hey, thank you very much for your help!
Just a side note for those who have problems doing the exact same thing as you described using .csv input files: the above description is perfect for .arff, but in my case (with .csv) it made predictions for the first 112 lines only and stopped for no reason. Transforming the input (training and test data) solved that problem.
I am looking forward to more tutorials from you 🙂

Thanks for your good work. Please, I need your assistance: I am working on crime data and I am new to using Weka. I have used Weka to divide my dataset into both test and training sets, both in CSV format, but the system complains whenever I choose a classifier (such as Bayes or KNN) and load the test dataset.

Hello,
Please, should the training dataset and test dataset be in the same format? If yes, why then does Weka complain of an incompatible test set? Also, is it the test data that we are converting back to plain text?

Thanks for the tutorial. I have a question: why is the number of instances unknown? And how can I evaluate the accuracy of the prediction? I mean, I need to see the number of correctly classified instances and so on…

Hi,
I need to train a model on one genre (blog data) and test on another genre (hotel reviews). I trained a model by 1. applying the StringToWordVector filter (changing some settings of the filter), 2. attribute selection, 3. running the Logistic classifier with the “use training set” option, and 4. saving the model. Now I am confused about the test file: should I apply all of these steps up to 3 on the test file as well? By doing this, my training and test file attributes are different, but in the same format.
Should my training file attributes and test file attributes be exactly the same? If yes, can I copy the attributes from the training file (from the top down to @data) and paste them into my test file; is that correct?
If the training and test file attributes can be different, then there is an error: “Data used to train model and test set are not compatible. Would you like to automatically wrap the classifier in ‘InputMappedClassifier’?” What does this mean? If I choose Yes, what will it do?
Sorry sir, I have many questions. I have explored a lot and am still confused. Any help would be great.
Thank you

I have built a logistic regression model in Weka and want to be able to identify what the predictions were for each specific data point. The output I currently have does not allow me to match the predictions to the individual instances.

Hi Jason
Great article. I followed the steps you suggested and I am applying Random Forest classifier. I have the same set of attributes for the training and test set. However in the stage where I predict for unknown data, it ignores all the instances. Below is the message I get in the classifier output:
=== Summary ===

This blog is really helpful. Can you please suggest how I can build a UI application on top of the model using Python, where users can enter data manually and it will give a result like positive or negative?

Thanks Jason, this is super helpful. Do you know if there is a way to save particular multilayer perceptron configurations? I’m running the perceptron classifier and set GUI to true in order to tinker with it, but I can’t for the life of me figure out how to save the tinkered configuration so that I can reuse it. I’ve looked everywhere.

Hmm, that correctly saved the usual parameters like Num Epochs, Learning Rate, etc., but it didn’t save the particular perceptron GUI tweaks, say, ones where I connect and disconnect certain nodes to other certain nodes by hand using the perceptron GUI.

Did I miss a step, or is there something else I’m supposed to do that’s unique to allowing it to save changes made in the GUI?

Thanks for the tutorial. I am new to Weka and machine learning. The tutorial helped a lot. Just wanted to know how to judge the predicted value for a particular instance? Is the prediction done in order?

Hi Jason. Thank you for the good tutorial. Is that all there is to making predictions using WEKA? I mean,
a) Choose the appropriate Model (i.e Classifier)
b) Run it on the Supplied Test Set
c) Save the Model
d) Load and dataset in WEKA Explorer just to have access to the Classifier tab
e) Load your Model
f) Open the new file, and finally
g) Re-evaluate the model on the new file for your predictions.

Say I am trying to further tune and test the algorithms, and I have separate test and training sets, which contain different distribution of the instances so that I can choose to mimic real world distribution or keep it 50/50 and see which option gives me better accuracy with the test set (that will have real-world-like distribution). I would not like to save many models, naturally. Could I then re-evaluate without saving it, skipping to step four as soon as I finish cross-validation with the training set?

thank you very much for great tutorials, Dr. Brownlee. They help me a lot in my final project at school.

I would like to perform this kind of predictive modeling techniques at work, but we work with very large data sets (millions of tuples) so my question is – would Weka be able to handle very large data sets?
Weka seems very easy and user friendly tool.

Hi jason,
I’m Lina and I read each tutorial step above, but I am still confused: if we use totally new data as a test set, can it run properly? The example above shows you using 5 of the same records to predict the class.

Hello. Thanks for the tutorial.
My question is:
Is it possible to perform Cross-validation or Split-percentage in data loaded from a model?
Or if I want to perform any of those two, I have necessarily to load the corresponding training dataset and build a new model for them?

Thanks for the answer.
I have the following situation:
I use a dataset “training.arff” and a classifier, say RandomForest, to generate a model “model1.model”; then I save it.
If I want to evaluate a testing set with it, I load “model1.model” and use the option “reevaluate model on current test set”. Everything is ok until that point.

But if I want to validate my model, I find that there’s no direct way to use CV or split directly over the data used from model1.model. I have necessarily to reload “training.arff”, use CV, and see how it says “building model for training data”, meaning that it is generating another model.

I was wondering if it was possible to validate generated models.
Again, thank you for your feedback

Hi, thanks for your informative article.
I have a query about the indexes of the test data instances chosen by Weka at the time of cross-validation. How can I get the index of the test data that is being tested?

In the main data file, the first few instances are:
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa

Hi, thank you for the wonderful tutorial.
I am using csv instead of arff.
When I supply test set with 145 true and 70 false instances (in that order), the result is shown only for 145 instances. It doesn’t calculate the result for the 70 instances.
If the set is randomly ordered, the result is shown only for the first few instances with same true/false value. For e.g., if the first ten instances are false, and 11th is true, the result (and confusion matrix) is only calculated for the first ten instances.
Please help.

Hello there Jason. I have been following some of your tutorials on here for some time. Glad to see you still answer questions. Mine is regarding the test set. I made the class value of all the instances in the @data region “?” like in the example, but why is the result of my model’s classification like this?

// Create the empty dataset "sample" with above attributes
Instances sample = new Instances("sample", attributes, 0);
// Make position the class attribute
sample.setClassIndex(classAttribute.index());
// Create empty instance with five attribute values
Instance inst = new DenseInstance(2);
// Set instance values
inst.setValue(text, "What is this are you kidding me 1 2 3 4");
// Set instance's dataset to be the dataset "race"
inst.setDataset(sample);
// Set class as missing so we can predict
inst.setClassValue(0); // When I set class as missing, the filter not working at all.
sample.add(inst);

I want to add one more column to the .arff file which I do not want to be used by classifier, but which I want to be present on the prediction output, it is just kind of name for each instance which I need to have in the output – how would I go about it in Explorer?
Thanks a lot.

I also have this problem. I created the model, but my test dataset has some extra attributes, and Weka uses InputMappedClassifier; however, it retrains the model instead of just testing it. Why is this happening?

Thank you.
I was able to predict the model.
One more doubt.
I’m working with prediction of evasion in distance education.
A model created with RandomForest, during training, obtained accuracy of 90.01% and F-measure of 0.906. But when I use the model to make predictions, it classifies all instances as evasion (YES). The database used to test the model is similar to the one used in the training. I have already reviewed the databases and repeated the entire process, but there was no change in the results. Do you have any idea how I could solve it?

Hello Jason,
Thank you for sharing this knowledge with us.
I have a question. I am a beginner in WEKA and I have gone through the steps above. I have read that the test set should have the same number of attributes as the training set, but how would that be possible, given that I am working on tweets and the word vectors would be different?