Parallel multiclass logistic regression.

Author(s)

Siddharth Gopal, Yiming Yang

IP Agreement

Refer to http://nyc.lti.cs.cmu.edu/software/

Readme

This README file describes how to use the parallel MLR package.

WARNING: The program has not been tested since some recent modifications! If the dataset or parameters do not read into memory correctly, the Distributed Cache section of the code may need to be modified. [Emailing me is a better option.]

ACKNOWLEDGEMENTS

LBFGS.java and Msrch.java are implementations of Limited-Memory BFGS and the associated line search by robert_dodier@yahoo.com.

PACKAGE DESCRIPTION

This tool trains a regularized multiclass logistic regression model for a large number of classes.

It is especially useful when the parameters of all classes cannot be held in memory.

THIS TOOL ASSUMES THE DATASET FITS IN MEMORY. [I plan to extend this code to datasets that do not fit in memory in the near future, but have no ideas as of yet.]
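For concreteness, here is a small self-contained sketch (not taken from the package; all names are illustrative) of the objective that a regularized multiclass logistic regression minimizes: the multinomial negative log-likelihood plus an L2 penalty, where the penalty weight plays the role of the lambda parameter described below.

```java
// Hypothetical sketch: evaluate the L2-regularized multiclass logistic
// regression objective on a tiny hand-made dataset.
public class MlrObjectiveSketch {
    // w[k][d]: weight vector for class k; x[i][d]: feature vector of
    // example i; y[i]: true class label of example i.
    static double objective(double[][] w, double[][] x, int[] y, double lambda) {
        int n = x.length, numClasses = w.length;
        double loss = 0.0;
        for (int i = 0; i < n; i++) {
            double[] scores = new double[numClasses];
            double max = Double.NEGATIVE_INFINITY;
            for (int k = 0; k < numClasses; k++) {
                double s = 0.0;
                for (int d = 0; d < x[i].length; d++) s += w[k][d] * x[i][d];
                scores[k] = s;
                if (s > max) max = s;
            }
            // log-sum-exp, shifted by the max score for numerical stability
            double sumExp = 0.0;
            for (int k = 0; k < numClasses; k++) sumExp += Math.exp(scores[k] - max);
            // negative log-likelihood of the true class
            loss += (max + Math.log(sumExp)) - scores[y[i]];
        }
        // L2 regularization term, scaled by lambda
        double reg = 0.0;
        for (double[] wk : w) for (double v : wk) reg += v * v;
        return loss + 0.5 * lambda * reg;
    }

    public static void main(String[] args) {
        // 3 classes, 2 features, 2 examples
        double[][] w = {{0.1, -0.2}, {0.0, 0.3}, {-0.1, 0.1}};
        double[][] x = {{1.0, 2.0}, {0.5, -1.0}};
        int[] y = {0, 2};
        System.out.println(objective(w, x, y, 1.0));
    }
}
```

Note that the normalizer (the log-sum-exp term) couples all classes, which is what makes naive parallelization across classes nontrivial for large class spaces.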

Dataset Format

Please use the Converter tool associated with the MulticlassClassifier package to convert your dataset to the appropriate binary format.

Things to tweak

Here are some parameters to tweak for good performance and convergence:

Regularization parameter: gc.iterativemlr-train.lambda

Total number of iterations to run: iterativemlr-train.iterations

The accuracy of the inner L-BFGS optimization. A heuristic is implemented in lines 194 to 196 of TrainingDriver.java. This is by no means a 'recommended' strategy; please consider writing your own for your dataset.

Testing a classifier

Please use the Testing tool associated with the MulticlassClassifier package to evaluate a trained classifier on a test set (the test set must first be converted to the same binary format as the training data).