Conference Review Session

Modified Logistic Regression: An Approximation to SVM and Its Applications in Large-Scale Text Categorization
ICML-2003, Jian Zhang

Logistic Regression (LR) has been widely used in statistics for many
years, and has recently received extensive study in the machine learning
community due to its close relations to Support Vector Machines (SVM)
and AdaBoost. In this paper, we use a modified version of LR to approximate
the optimization of SVM by a sequence of unconstrained optimization
problems. We prove that our approximation will converge to SVM, and
propose an iterative algorithm called "MLR-CG" which uses Conjugate
Gradient as its inner loop. A multiclass version, "MMLR-CG", is
obtained after simple modifications. We compare MLR-CG with
SVM_light over different text categorization collections, and show that
our algorithm is much more efficient than SVM_light when the number of
training examples is very large. Results for the multiclass version
MMLR-CG are also reported.
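The core idea — approximating the SVM objective by a sequence of smooth, unconstrained problems — can be illustrated with the standard observation that a suitably scaled logistic loss converges pointwise to the SVM hinge loss. The sketch below is an assumption-laden illustration of that convergence, not the paper's exact modified-LR formulation; the parameter `gamma` and the function names are hypothetical.

```python
import math

def hinge(z):
    """SVM hinge loss on the margin z = y * (w . x)."""
    return max(0.0, 1.0 - z)

def scaled_logistic(z, gamma):
    """Scaled logistic loss (1/gamma) * log(1 + exp(gamma * (1 - z))).

    Smooth and differentiable everywhere, so it can be minimized by
    unconstrained methods such as Conjugate Gradient; as gamma grows
    it approaches the hinge loss pointwise."""
    t = gamma * (1.0 - z)
    if t > 30:
        # Numerically stable branch: log(1 + exp(t)) = t + log(1 + exp(-t))
        return (t + math.log1p(math.exp(-t))) / gamma
    return math.log1p(math.exp(t)) / gamma

# The smooth surrogate tracks the hinge loss closely for large gamma.
for z in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(z, hinge(z), scaled_logistic(z, gamma=50.0))
```

Because the surrogate is smooth, each problem in the sequence can be solved with Conjugate Gradient, which is the role the inner loop plays in the MLR-CG algorithm described above.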

Exploration and Exploitation in Adaptive Filtering Based on Bayesian Active Learning
ICML-2003, Yi Zhang

In the task of adaptive information filtering, a system receives a
stream of documents but delivers only those that match a person's
information need. As the system filters, it also refines its knowledge
of the user's information need based on relevance feedback from the
user. Delivering a document thus has two effects: i) it satisfies the
user's information need immediately, and ii) it helps the system better
satisfy the user in the future by improving its model of the user's
information need. The traditional approach to adaptive information
filtering fails to recognize and model this second effect.

We propose Utility Divergence as the measure of model quality. Unlike
the model quality measures used in most active learning methods, utility
divergence is represented on the same scale as the filtering system's
target utility function. Thus it is meaningful to combine the expected
immediate utility with the model quality, and to quantitatively manage
the trade-off between exploitation and exploration. The proposed
algorithm is implemented for setting the filtering system's
dissemination threshold, a major problem for adaptive filtering systems.
Experimental results on TREC-9 and TREC-10 filtering data will be
reported. We will also discuss the relationship between Utility
Divergence and other active learning algorithms.
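The key point of the abstract — that exploration value and immediate utility can be combined because both sit on the same utility scale — can be sketched with a toy delivery rule. This is not the paper's Utility Divergence measure; the linear utility (a credit per relevant delivery, a penalty per non-relevant one, loosely in the style of TREC linear-utility metrics), the `model_uncertainty` input, and all names here are illustrative assumptions.

```python
def delivery_score(p_relevant, credit=2.0, penalty=1.0,
                   model_uncertainty=0.0, exploration_weight=1.0):
    """Toy score for deciding whether to deliver a document.

    immediate: expected utility of delivering now, under a linear
        utility (credit for relevant, -penalty for non-relevant).
    exploration: stand-in for the expected future benefit of the
        relevance feedback, scaled by current model uncertainty.
    Both terms are on the utility scale, so they simply add, which is
    what makes the exploitation/exploration trade-off quantitative."""
    immediate = p_relevant * credit - (1.0 - p_relevant) * penalty
    exploration = exploration_weight * model_uncertainty
    return immediate + exploration

# With a confident model, a borderline document is not worth delivering;
# with an uncertain model, the feedback value can justify delivery.
print(delivery_score(0.3, model_uncertainty=0.0))
print(delivery_score(0.3, model_uncertainty=0.5))
```

In effect, the dissemination threshold is lowered early on (when the user model is uncertain and feedback is valuable) and rises as the model converges.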

We present a formal analysis of popular text classification
methods, focusing on their loss functions, whose minimization is
essential to the optimization of those methods, and whose
decomposition into the training-set loss and the model complexity
enables cross-method comparisons on a common basis from an
optimization point of view. Those methods include Support Vector
Machines, Linear Regression, Logistic Regression, Neural Network,
Naive Bayes, K-Nearest Neighbor, Rocchio-style and Multi-class
Prototype classifiers. Theoretical analysis (including our new
derivations) is provided for each method, along with evaluation
results for all the methods on the Reuters-21578 benchmark corpus.
Using linear regression, neural networks and logistic regression
as examples, we show that properly tuning the balance between
the training-set loss and the complexity penalty has a
significant impact on the performance of a classifier. In linear
regression, in particular, tuning the complexity penalty
yielded a result (measured using macro-averaged F1) that outperformed
all text categorization methods previously evaluated on that benchmark
corpus, including Support Vector Machines.
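The training-set-loss-plus-complexity-penalty decomposition described above is easiest to see in regularized linear regression, where the trade-off has a closed form. The sketch below is a generic ridge-regression illustration on synthetic data (all data and parameter values are assumptions, not the paper's experimental setup); it shows how the complexity penalty lambda shrinks the weight vector, which is the quantity being tuned.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 (training-set loss plus
    complexity penalty). Closed form: w = (X'X + lam*I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=20)

# Increasing lam trades training-set fit for a simpler (smaller-norm)
# model; picking lam is exactly the balance-tuning discussed above.
for lam in (0.0, 1.0, 100.0):
    w = ridge_fit(X, y, lam)
    print(lam, float(np.linalg.norm(w)))
```

In practice the penalty weight would be chosen by validation-set performance (e.g. macro-averaged F1 in the text categorization setting) rather than fixed by hand.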