"... The problem of maximizing a concave function f(x) in a simplex S can be solved approximately by a simple greedy algorithm. For given k, the algorithm can find a point x(k) on a k-dimensional face of S, such that f(x(k)) ≥ f(x∗) − O(1/k). Here f(x∗) is the maximum value of f in S. This algorithm an ..."

The problem of maximizing a concave function f(x) over a simplex S can be solved approximately by a simple greedy algorithm. For a given k, the algorithm finds a point x(k) on a k-dimensional face of S such that f(x(k)) ≥ f(x*) − O(1/k), where f(x*) is the maximum value of f in S. This algorithm and its analysis were known previously, and are related to problems in statistics and machine learning, such as boosting, regression, and density mixture estimation. In other work, coming from computational geometry, the existence of ε-coresets was shown for the minimum enclosing ball problem, by means of a simple greedy algorithm. Similar greedy algorithms, which are special cases of the Frank-Wolfe algorithm, were described for other enclosure problems. Here these results are tied together, stronger convergence results are reviewed, and several coreset bounds are generalized or strengthened.
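A minimal sketch of this greedy (Frank-Wolfe) scheme, assuming f is smooth and concave and accessible only through its gradient; the function names, step rule, and test objective below are illustrative choices, not taken from the paper:

```python
import numpy as np

def greedy_simplex_max(grad, n, k):
    """Greedy (Frank-Wolfe) maximization of a smooth concave f over the
    probability simplex in R^n.

    Each step moves toward the single best vertex, so after k steps x has
    at most k+1 nonzero coordinates (it lies on a k-dimensional face),
    and f(x) >= f(x*) - O(1/k)."""
    x = np.zeros(n)
    x[0] = 1.0                       # start at a vertex of the simplex
    for t in range(1, k + 1):
        g = grad(x)
        i = int(np.argmax(g))        # vertex e_i maximizing the linearization
        gamma = 2.0 / (t + 2)        # standard Frank-Wolfe step size
        x *= (1.0 - gamma)
        x[i] += gamma                # x <- (1 - gamma) x + gamma e_i
    return x

# Example: maximize f(x) = -||x - c||^2, whose maximizer over the simplex is c.
c = np.array([0.5, 0.3, 0.2])
x = greedy_simplex_max(lambda x: -2.0 * (x - c), 3, 500)
```

Because each iteration adds at most one new vertex to the support of x, the iterate itself certifies the "small face" part of the guarantee.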

"... Following recent work of Clarkson, we translate the coreset framework to the problems of finding the point closest to the origin inside a polytope, finding the shortest distance between two polytopes, Perceptrons, and soft- as well as hard-margin Support Vector Machines (SVM). We prove asymptoticall ..."

Following recent work of Clarkson, we translate the coreset framework to the problems of finding the point closest to the origin inside a polytope, finding the shortest distance between two polytopes, Perceptrons, and soft- as well as hard-margin Support Vector Machines (SVM). We prove asymptotically matching upper and lower bounds on the size of coresets, showing that ε-coresets of size ⌈(1 + o(1))E*/ε⌉ always exist as ε → 0, and that this is best possible. The crucial quantity E* is what we call the excentricity of a polytope, or of a pair of polytopes. Additionally, we prove linear convergence speed of Gilbert's algorithm, one of the earliest known approximation algorithms for polytope distance, and generalize both the algorithm and the proof to the two-polytope case. Interestingly, our coreset bounds also imply that, for the first time, we can prove matching upper and lower bounds on the sparsity of Perceptron and SVM solutions.
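Gilbert's algorithm for the one-polytope case (the point of conv(P) closest to the origin) can be sketched as follows; the vertex selection and exact line search are the classic ingredients, while the iteration cap and tolerance here are illustrative:

```python
import numpy as np

def gilbert(points, iters=1000):
    """Gilbert's algorithm: approximate the point of conv(points) closest
    to the origin. Each step picks the vertex minimizing <p, x> (the best
    improving direction) and line-searches on the segment from x to it."""
    P = np.asarray(points, dtype=float)
    x = P[0].copy()                      # current iterate, always in conv(P)
    for _ in range(iters):
        p = P[int(np.argmin(P @ x))]     # vertex with smallest inner product
        d = x - p
        denom = d @ d
        if denom <= 1e-15:               # x coincides with the chosen vertex
            break
        # minimize ||(1 - t) x + t p||^2 over t in [0, 1]
        t = np.clip((x @ d) / denom, 0.0, 1.0)
        x = x - t * d                    # = (1 - t) x + t p
    return x

# Segment from (1, -1) to (1, 1): the closest point to the origin is (1, 0).
x = gilbert([[1.0, -1.0], [1.0, 1.0]])
```

The vertices the algorithm touches are exactly the coreset candidates: the final iterate is a convex combination of few input points.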

...arized offset), in the case that all points have the same norm. In this case the smallest enclosing ball problem is equivalent to finding the distance of one polytope from the origin. In another work, [15] directly used coresets to solve the problem of separating two polytopes by a hyperplane passing through the origin, but this is again equivalent to a one-polytope distance problem. Both approaches ar...

Discriminative techniques, such as conditional random fields (CRFs) or structure-aware maximum-margin techniques (maximum-margin Markov networks (M³N), structured output support vector machines (S-SVM)), are state-of-the-art in the prediction of structured data. However, to achieve good results these techniques require complete and reliable ground truth, which is not always available in realistic problems. Furthermore, training either CRFs or margin-based techniques is computationally costly, because the runtime of current training methods depends not only on the size of the training set but also on properties of the output space to which the training samples are assigned. We propose an alternative model for structured output prediction, Joint Kernel Support Estimation (JKSE), which is rather generative in nature as it relies on estimating the joint probability density of samples and labels in the training set. This makes it tolerant against incomplete or incorrect labels and also opens the possibility of learning in situations where more than one output label can be considered correct. At the same time, we avoid typical problems of generative models as we do not attempt to learn the full joint probability distribution, but model only its support in a joint reproducing

"... Active learning methods often achieve improved performance using fewer labels compared to passive learning methods. A variety of practically successful active learning algorithms use a passive learning algorithm as a subroutine, and the essential role of the active component is to construct data set ..."

Active learning methods often achieve improved performance using fewer labels compared to passive learning methods. A variety of practically successful active learning algorithms use a passive learning algorithm as a subroutine, and the essential role of the active component is to construct data sets to feed into the passive subroutine. This general idea is appealing for a variety of reasons, as it may be able

"... We present a streaming model for large-scale classification (in the context of ℓ2-SVM) by leveraging connections between learning and computational geometry. The streaming model imposes the constraint that only a single pass over the data is allowed. The ℓ2-SVM is known to have an equivalent formula ..."

We present a streaming model for large-scale classification (in the context of the ℓ2-SVM) by leveraging connections between learning and computational geometry. The streaming model imposes the constraint that only a single pass over the data is allowed. The ℓ2-SVM is known to have an equivalent formulation in terms of the minimum enclosing ball (MEB) problem, and an efficient algorithm based on the idea of core sets exists (CVM) [Tsang et al., 2005]. CVM learns a (1+ε)-approximate MEB for a set of points and yields an approximate solution to the corresponding SVM instance. However, CVM works in batch mode, requiring multiple passes over the data. This paper presents a single-pass SVM based on the minimum enclosing ball of streaming data. We show that the MEB updates for the streaming case can be easily adapted to learn the SVM weight vector in a way similar to using online stochastic gradient updates. Our algorithm performs polylogarithmic computation per example and requires very small, constant storage. Experimental results show that, even in such restrictive settings, we can learn efficiently in just one pass and obtain accuracies comparable to other state-of-the-art SVM solvers (batch and online). We also give an analysis of the algorithm and discuss some open issues and possible extensions.
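The flavor of a single-pass MEB update can be illustrated with the classic streaming rule of Zarrabi-Zadeh and Chan (to our knowledge a 3/2-approximation); this is a simple stand-in for, not a reproduction of, the paper's own update:

```python
import numpy as np

def streaming_meb(stream):
    """One-pass enclosing-ball maintenance. When a point falls outside the
    current ball, grow the radius by half the slack and shift the center
    toward the point by the same amount; the new ball then covers both the
    old ball and the new point, using O(d) storage total."""
    it = iter(stream)
    c = np.asarray(next(it), dtype=float).copy()   # center starts at first point
    r = 0.0
    for p in it:
        p = np.asarray(p, dtype=float)
        d = np.linalg.norm(p - c)
        if d > r:                    # point outside current ball
            shift = (d - r) / 2.0
            r += shift
            c += (shift / d) * (p - c)
    return c, r

# Three points: the final ball is centered at (1, 0) with radius 1.
c, r = streaming_meb([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]])
```

Each update costs O(d), which is what makes per-example work small enough for the streaming setting described above.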

"... We propose a new view of active learning algorithms as optimization. We show that many online active learning algorithms can be viewed as stochastic gradient descent on non-convex objective functions. Variations of some of these algorithms and objective functions have been previously proposed withou ..."

We propose a new view of active learning algorithms as optimization. We show that many online active learning algorithms can be viewed as stochastic gradient descent on non-convex objective functions. Variations of some of these algorithms and objective functions have been previously proposed without noting this connection. We also point out a connection between the standard min-margin offline active learning algorithm and non-convex losses. Finally, we discuss and show empirically how viewing active learning as non-convex loss minimization helps explain two previously observed phenomena: certain active learning algorithms achieve better generalization error than passive learning algorithms on certain data sets (Schohn and Cohn, 2000; Bordes et al., 2005) and on other data sets many active learning algorithms are prone to local minima (Schütze et al., 2006).

"... This paper introduces the Furthest Hyperplane Problem (FHP), which is an unsupervised counterpart of Support Vector Machines. Given a set of n points in R d, the objective is to produce the hyperplane (passing through the origin) which maximizes the separation margin, that is, the minimal distance b ..."

This paper introduces the Furthest Hyperplane Problem (FHP), an unsupervised counterpart of Support Vector Machines. Given a set of n points in R^d, the objective is to produce the hyperplane (passing through the origin) which maximizes the separation margin, that is, the minimal distance between the hyperplane and any input point. To the best of our knowledge, this is the first paper achieving provable results regarding FHP. We provide both lower and upper bounds for this NP-hard problem. First, we give a simple randomized algorithm whose running time is n^O(1/θ²), where θ is the optimal separation margin. We show that its exponential dependency on 1/θ² is tight, up to sub-polynomial factors, assuming SAT cannot be solved in sub-exponential time. Next, we give an efficient approximation algorithm. For any α ∈ [0, 1], the algorithm produces a hyperplane whose distance from at least a 1 − 3α fraction of the points is at least α times the optimal separation margin. Finally, we show that FHP does not admit a PTAS by presenting a gap-preserving reduction from a particular version of the PCP theorem.

...nd possibly faster whether approximating p ∈ conv(S), or estimating the distance ∆ to within a factor of two. In the context of support vector machines (SVM) (see [3] for applications), Har-Peled et al. [20] use coresets to give an approximation algorithm; see also Zimak [51]. We mention these because we feel that the Triangle Algorithm, despite some similarities with existing algorithms or their analysis...

"... We present a simple, first-order approximation algorithm for the support vector classification problem. Given a pair of linearly separable data sets and ɛ ∈ (0, 1), the proposed algorithm computes a separating hyperplane whose margin is within a factor of (1 − ɛ) of that of the maximum-margin separa ..."

We present a simple, first-order approximation algorithm for the support vector classification problem. Given a pair of linearly separable data sets and ε ∈ (0, 1), the proposed algorithm computes a separating hyperplane whose margin is within a factor of (1 − ε) of that of the maximum-margin separating hyperplane. We discuss how our algorithm can be extended to nonlinearly separable and inseparable data sets. The running time of our algorithm is linear in the number of data points and in 1/ε. In particular, the number of support vectors computed by the algorithm is bounded above by O(ζ/ε) for all sufficiently small ε > 0, where ζ is the square of the ratio of the distances between the farthest and closest points in the two data sets. Furthermore, we establish that our algorithm exhibits linear convergence. We adopt the real number model of computation in our analysis.

...e solved using direct methods. Therefore, previous research on solution approaches has either focused on decomposition methods (see, e.g., [21, 22, 15, 29]) or on approximation algorithms (see, e.g., [16, 14, 7, 11]). In this paper, we take the latter approach and aim to compute a separating hyperplane whose margin is a close approximation to that of the maximum-margin separating hyperplane. Given ε ∈ (0, 1), an...

"... Given n points in a d dimensional Euclidean space, the Minimum Enclosing Ball (MEB) problem is to find the ball with the smallest radius which contains all n points. We give two approximation algorithms for producing an enclosing ball whose radius is at most ɛ away from the optimum. The first requir ..."

Given n points in a d-dimensional Euclidean space, the Minimum Enclosing Ball (MEB) problem is to find the ball with the smallest radius which contains all n points. We give two approximation algorithms for producing an enclosing ball whose radius is at most ε away from the optimum. The first requires O(ndL/√ε) effort, where L is a constant that depends on the scaling of the data. The second is an O*(ndQ/√ε) approximation algorithm, where Q is an upper bound on the norm of the points. This is in contrast with coreset-based algorithms, which yield an O(nd/ε) greedy algorithm. Finding the Minimum Enclosing Convex Polytope (MECP) is a related problem wherein a convex polytope of a fixed shape is given and the aim is to find the smallest magnification of the polytope which encloses the given points. For this problem we present O(mndL/ε) and O*(mndQ/ε) approximation algorithms, where m is the number of faces of the polytope. Our algorithms borrow heavily from convex duality and recently developed techniques in non-smooth optimization, and are in contrast with existing methods which rely on geometric arguments. In particular, we specialize the excessive gap framework of Nesterov [19] to obtain our results.
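The coreset-based greedy algorithm this abstract contrasts against is, in essence, the Badoiu-Clarkson iteration: repeatedly step the center toward the current farthest point. A minimal sketch, with the standard step sizes and iteration count from that line of work:

```python
import numpy as np

def bc_meb(P, eps):
    """Badoiu-Clarkson greedy MEB: roughly 1/eps^2 iterations, each
    stepping the center toward the current farthest point with a
    shrinking step, yield a (1+eps)-approximate enclosing ball; the
    farthest points touched along the way form the coreset."""
    P = np.asarray(P, dtype=float)
    c = P[0].copy()
    for i in range(1, int(np.ceil(1.0 / eps**2)) + 1):
        far = P[np.argmax(np.linalg.norm(P - c, axis=1))]  # farthest point
        c += (far - c) / (i + 1)        # shrinking step toward it
    r = np.linalg.norm(P - c, axis=1).max()
    return c, r

# Two points at distance 2: the optimal ball has radius 1.
c, r = bc_meb([[-1.0, 0.0], [1.0, 0.0]], eps=0.2)
```

Each iteration is a full pass over the n points, which is where the O(nd/ε)-type cost mentioned above comes from.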

...readily be established. We omit details for brevity. 5 Applications to Machine Learning The connection between the MEB problem and SVMs has been discussed in a number of publications [Clarkson, 2008, Har-Peled et al., 2007, Gärtner and Jaggi, 2009]. Practical algorithms using coresets were also proposed in Tsang et al. [2007] and Tsang et al. [2005]. In all these cases, our improved MEB algorithm can be plugged in as ...