Abstract

Binary classification tasks are among the most important ones in the field of machine learning. One prominent approach to such tasks is the support vector machine, which aims at finding a hyperplane that separates the two classes well, i.e., such that the induced distance between the hyperplane and the patterns is maximized. In general, sufficient labeled data is needed in such classification settings to obtain reasonable models. However, labeled data is often rare in real-world learning scenarios, while unlabeled data can be obtained easily. For this reason, the concept of support vector machines has also been extended to semi- and unsupervised settings: in the unsupervised case, one aims at finding a partition of the data into two classes such that a subsequent application of a support vector machine leads to the best overall result. Similarly, given both a labeled and an unlabeled part of the data, semi-supervised support vector machines favor decision hyperplanes that lie in a low-density area induced by the unlabeled training patterns, while still taking the labeled part into account. The associated optimization problems for both the semi- and the unsupervised case, however, are of combinatorial nature and, hence, difficult to solve. In this work, we present efficient implementations of simple local search strategies for (variants of) both cases that are based on matrix update schemes for the intermediate candidate solutions. We evaluate the performance of the resulting approaches on a variety of artificial and real-world data sets. The results indicate that our approaches can successfully incorporate unlabeled data. (The approach for the unsupervised case was originally proposed by Gieseke et al. (2009). The derivations presented in this work are new and subsume the previous ones (for the unsupervised setting) as a special case.)

as new objective value. The matrix \(\overline{{\bf K}} = {\bf D} \widetilde{{\bf K}} {\bf D}\) has (at most) r non-zero eigenvalues. To compute them efficiently, we make use of the following derivations: let \({\bf B}{\bf B}^{\rm T}\) be the Cholesky decomposition of the matrix \(({\bf K}_{R,R})^{-1}\) and let \({\bf U} \varvec{\Upsigma} {\bf V}^{\rm T}\) be the thin singular value decomposition of \({\bf B}^{\rm T}{\bf K}_{R}{\bf D}\). The r non-zero eigenvalues of

$$ \overline{{\bf K}} = {\bf D} \widetilde{{\bf K}} {\bf D} $$

can then be obtained from \({\varvec{\Upsigma}^2 \in {\mathbb R}^{r\times r}}\), and the matrix \({{\bf V} \in {\mathbb R}^{n \times r}}\) consists of the corresponding eigenvectors (we have \({\bf U}^{\rm T}{\bf U} = {\bf I}\), see below). By assuming that these non-zero eigenvalues are the first r elements of the matrix \({\varvec{\Uplambda} \in {\mathbb R}^{n \times n}}\) of eigenvalues (of \(\widetilde{{\bf K}}\)), we have \([\varvec{\Uplambda}]_{i,i} = 0\) for \(i=r + 1,\ldots, n\); hence, the remaining eigenvectors (with eigenvalue 0) do not have to be computed for the evaluation of (13). To sum up, \({\bf y}^{\rm T}{\bf D}{\bf V}\) can be updated in \(\mathcal{O}(r)\) time per single coordinate flip. Further, all preprocessing matrices can be obtained in \(\mathcal{O}(n r^2)\) runtime (in practice and up to machine precision) using \(\mathcal{O}(n r)\) space.
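To make the preprocessing step concrete, the following numpy sketch computes the r non-zero eigenvalues and the associated eigenvectors along the lines described above. It assumes the low-rank form \(\widetilde{{\bf K}} = {\bf K}_{R}^{\rm T} ({\bf K}_{R,R})^{-1} {\bf K}_{R}\) with \({\bf K}_{R} \in {\mathbb R}^{r \times n}\) and encodes the diagonal matrix D by a vector d; the function name and these exact conventions are our assumptions for illustration, not part of the original derivation.

```python
import numpy as np

def nonzero_eigs(K_R, K_RR, d):
    """Non-zero eigenvalues/eigenvectors of D K~ D (hypothetical helper).

    Assumes K~ = K_R^T (K_RR)^{-1} K_R with K_R of shape (r, n),
    K_RR of shape (r, r) positive definite, and D = diag(d).
    """
    # B B^T = (K_RR)^{-1}: with the Cholesky factor L of K_RR we can
    # take B = L^{-T}, since L^{-T} L^{-1} = (L L^T)^{-1} = (K_RR)^{-1}.
    L = np.linalg.cholesky(K_RR)
    B = np.linalg.inv(L).T
    # D K~ D = (B^T K_R D)^T (B^T K_R D) = V Sigma^2 V^T, so a thin SVD
    # of the (r, n) matrix B^T K_R D yields the sought eigenpairs.
    M = B.T @ (K_R * d)            # K_R * d scales column j of K_R by d[j]
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return s ** 2, Vt.T            # eigenvalues in Sigma^2, eigenvectors (n, r)
```

With V at hand, the quantity \({\bf y}^{\rm T}{\bf D}{\bf V}\) can be maintained incrementally under single coordinate flips, which is where the \(\mathcal{O}(r)\) update cost per flip stems from.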

Appendix 2: Matrix calculus

For completeness, we summarize some basic definitions and theorems of the field of matrix calculus that may be helpful when reading the paper. The following definitions and facts are taken from [19] and [16].

Definition 1

(Positive (Semi-)Definite Matrices) A symmetric matrix \({{\bf M}\in\mathbb{R}^{m\times m}}\) is said to be positive definite if

$$ {\bf x}^{\rm T}{\bf M}{\bf x}>0 $$

holds for all \({{\bf x}\in{\mathbb R}^{m}}\) with \({\bf x}\neq{\bf 0}\), and positive semidefinite if \({\bf x}^{\rm T}{\bf M}{\bf x}\geq 0\) holds for all \({{\bf x}\in{\mathbb R}^{m}}\).

We use the notations \({\bf M} \succ 0\) and \({\bf M} \succeq 0\) if M is positive definite or positive semidefinite, respectively. It is straightforward to derive that if \({{\bf M}_1,\ldots,{\bf M}_p\in\mathbb{R}^{m\times m}}\) are positive definite matrices and \({\alpha_1, \ldots, \alpha_p \in {\mathbb R}}\) are positive coefficients, then

$$ \alpha_1{\bf M}_1+\ldots+\alpha_p{\bf M}_p $$

(23)

is positive definite as well, i.e., any positive linear combination of positive definite matrices is positive definite ([19], pp. 396–398). A lower triangular matrix is a matrix whose entries above the diagonal are zero.
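As a quick numerical illustration of this fact, the following sketch (with randomly generated matrices; not part of the original text) checks that a positive combination of positive definite matrices has strictly positive eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random symmetric positive definite matrices
# (A A^T + eps * I is positive definite for any A).
A1, A2 = rng.standard_normal((2, 5, 5))
M1 = A1 @ A1.T + 1e-6 * np.eye(5)
M2 = A2 @ A2.T + 1e-6 * np.eye(5)

# A positive linear combination is again positive definite.
C = 0.3 * M1 + 1.7 * M2
print(np.linalg.eigvalsh(C).min() > 0)   # True
```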

Fact 1

(Cholesky Decomposition) Every symmetric positive definite matrix \({{\bf M}\in\mathbb{R}^{m\times m}}\) can be factorized as

$$ {\bf M}={\bf N}{\bf N}^{\rm T}, $$

(24)

where \({{\bf N}\in\mathbb{R}^{m\times m}}\) is a lower triangular matrix whose diagonal entries are strictly positive. This factorization is known as the Cholesky decomposition.

The Cholesky decomposition of an m × m matrix can be obtained in \(\mathcal{O}(m^3)\) time (in practice and up to machine precision, see [16], pp. 141–145).
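For illustration, the factor N can be computed, e.g., with numpy's built-in routine (a minimal sketch):

```python
import numpy as np

M = np.array([[4.0, 2.0],
              [2.0, 3.0]])          # symmetric positive definite

N = np.linalg.cholesky(M)           # lower triangular factor
print(np.allclose(N @ N.T, M))      # True: M = N N^T
print(np.all(np.diag(N) > 0))       # True: strictly positive diagonal
```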

Definition 2

(Orthogonal Matrix) A matrix \({{\bf M}\in\mathbb{R}^{m\times m}}\) is called orthogonal if

$$ {\bf M}^{\rm T}{\bf M}={\bf M}{\bf M}^{\rm T}={\bf I}, $$

i.e., if the inverse \({\bf M}^{-1}\) of M equals its transpose \({\bf M}^{\rm T}\).
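A standard example is a two-dimensional rotation matrix, as the following minimal check illustrates:

```python
import numpy as np

t = 0.7                              # arbitrary rotation angle
M = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

print(np.allclose(M.T @ M, np.eye(2)))      # True: M^T M = I
print(np.allclose(np.linalg.inv(M), M.T))   # True: M^{-1} = M^T
```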

Fact 2

(Singular Value Decomposition) A matrix \({{\bf M}\in{\mathbb R}^{m\times n}}\) can be written in the form

$$ {\bf M}={\bf U}\varvec{\Upsigma}{\bf V}^{\rm T}, $$

(25)

where \({{\bf U}\in{\mathbb R}^{m\times m}}\) and \({{\bf V}\in{\mathbb R}^{n\times n}}\) are orthogonal, and where \({\varvec{\Upsigma}\in{\mathbb R}^{m\times n}}\) is a diagonal matrix with non-negative entries. The decomposition is called the singular value decomposition (SVD) of M.

The values on the diagonal of \(\varvec{\Upsigma}\) are called the singular values of M; they are usually arranged in descending order, i.e., \({[\varvec{\Upsigma}]}_{1,1}\geq \ldots \geq{[\varvec{\Upsigma}]}_{p,p}\) with p = min(n, m).
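In numpy, the (full) SVD and the ordering of the singular values can be inspected as follows (a minimal sketch):

```python
import numpy as np

M = np.arange(12, dtype=float).reshape(4, 3)

# Full SVD: U is (4, 4), Vt is (3, 3); s holds the min(4, 3) = 3
# singular values in descending order.
U, s, Vt = np.linalg.svd(M)
print(s)                                 # non-negative, descending

Sigma = np.zeros((4, 3))                 # rebuild the (4, 3) diagonal matrix
np.fill_diagonal(Sigma, s)
print(np.allclose(U @ Sigma @ Vt, M))    # True: M = U Sigma V^T
```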

Fact 3

(Thin Singular Value Decomposition) The thin or economy-size singular value decomposition of \({{\bf M}\in{\mathbb R}^{m\times n}}\) with m ≥ n is of the form

$$ {\bf M}={\bf U}\varvec{\Upsigma}{\bf V}^{\rm T}, $$

(26)

where \({{\bf U}\in{\mathbb R}^{m\times n}}\) has orthonormal columns (i.e., \({\bf U}^{\rm T}{\bf U}={\bf I}\)), \({{\bf V}\in{\mathbb R}^{n\times n}}\) is orthogonal, and \({\varvec{\Upsigma}\in{\mathbb R}^{n\times n}}\) is a diagonal matrix with non-negative entries.

Note that the thin singular value decomposition of a matrix \({{\bf M} \in{\mathbb R}^{m\times n}}\) with m ≥ n can be computed in \(\mathcal{O}(mn^2)\) time (in practice and up to machine precision, see [16], p. 239).
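In numpy, the thin variant corresponds to full_matrices=False (a minimal sketch):

```python
import numpy as np

M = np.arange(12, dtype=float).reshape(4, 3)     # m = 4 >= n = 3

# Thin SVD: U is (4, 3) with orthonormal columns, Vt is (3, 3).
U, s, Vt = np.linalg.svd(M, full_matrices=False)
print(U.shape, s.shape, Vt.shape)                # (4, 3) (3,) (3, 3)
print(np.allclose(U.T @ U, np.eye(3)))           # True: U^T U = I
print(np.allclose(U @ np.diag(s) @ Vt, M))       # True: M = U Sigma V^T
```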

Fact 4

(Eigendecomposition) If \({{\bf M}\in\mathbb{R}^{m\times m}}\) is symmetric, then it can be factorized as

$$ {\bf M}={\bf V}\varvec{\Uplambda}{\bf V}^{\rm T}, $$

(27)

where \({{\bf V}\in\mathbb{R}^{m\times m}}\) is an orthogonal matrix containing the eigenvectors of M and \(\varvec{\Uplambda}\) is a diagonal matrix containing the corresponding eigenvalues ( [19], p. 107).

Note that if the nonzero eigenvalues are stored in the first r diagonal entries of \(\varvec{\Uplambda}, \) then (analogously to the economy-sized singular value decomposition) the matrix M can be written as in (27) but with \({{\bf V}\in\mathbb{R}^{m\times r}}\) and \({\varvec{\Uplambda}\in\mathbb{R}^{r\times r}}\).
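The compact (rank-r) form can be obtained numerically by discarding the (numerically) zero eigenvalues, e.g. (a sketch):

```python
import numpy as np

# A symmetric 4 x 4 matrix of rank 2.
A = np.arange(8, dtype=float).reshape(4, 2)
M = A @ A.T

w, V = np.linalg.eigh(M)             # eigenvalues in ascending order
r = int(np.sum(w > 1e-10))           # numerical rank (here: 2)
Vr, wr = V[:, -r:], w[-r:]           # keep the r non-zero eigenpairs

# Compact eigendecomposition: M = Vr diag(wr) Vr^T
print(np.allclose(Vr @ np.diag(wr) @ Vr.T, M))   # True
```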

Fact 5

(SVD and Eigendecomposition) We have the following relationship between the SVD and the eigendecomposition: if (25) or (26) is the SVD of \({{\bf M}\in\mathbb{R}^{m\times m}}\), then

$$ {\bf M}^{\rm T}{\bf M}={\bf V}\varvec{\Upsigma}^{2}{\bf V}^{\rm T} $$

is the eigendecomposition of \({\bf M}^{\rm T}{\bf M}\). Here, the eigenvalues of the matrix \({\bf M}^{\rm T}{\bf M}\) are the squares of the singular values of M. Note that an analogous relationship also holds between the economy-sized decompositions.
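This relationship is easy to verify numerically (a minimal sketch):

```python
import numpy as np

M = np.random.default_rng(1).standard_normal((4, 4))

_, s, _ = np.linalg.svd(M)             # singular values, descending
w = np.linalg.eigvalsh(M.T @ M)        # eigenvalues, ascending

# The eigenvalues of M^T M are the squared singular values of M.
print(np.allclose(np.sort(s ** 2), w))  # True
```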

Fact 6

(Further Matrix Properties) If \({{\bf M}\in\mathbb{R}^{m\times m}}\) is a (symmetric) positive definite matrix and \({\bf M}={\bf V}\varvec{\Uplambda}{\bf V}^{\rm T}\) is its eigendecomposition, then

$$ [\varvec{\Uplambda}]_{i,i} > 0 \quad {\rm for}\ i=1,\ldots,m, $$

that is, the eigenvalues of positive definite matrices are strictly positive real numbers ([19], p. 398). From this, it follows that all positive definite matrices are invertible and their inverse matrices are also positive definite. Moreover, we have

$$ {\bf M}^{-1}={\bf V}\varvec{\Uplambda}^{-1}{\bf V}^{\rm T}. $$
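Both properties, and the inversion via the eigendecomposition, can be checked numerically (a sketch with a randomly generated positive definite matrix):

```python
import numpy as np

A = np.random.default_rng(2).standard_normal((4, 4))
M = A @ A.T + 1e-3 * np.eye(4)        # symmetric positive definite

w, V = np.linalg.eigh(M)
print(np.all(w > 0))                  # True: strictly positive eigenvalues

# Inverting the eigenvalues inverts the matrix; the inverse is
# again positive definite.
M_inv = V @ np.diag(1.0 / w) @ V.T
print(np.allclose(M_inv, np.linalg.inv(M)))      # True
print(np.all(np.linalg.eigvalsh(M_inv) > 0))     # True
```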