Title

Author

Document Type

Dissertation

Date of Degree

Spring 2016

Degree Name

PhD (Doctor of Philosophy)

Degree In

Computer Science

First Advisor

Kasturi Varadarajan

Abstract

In this document, we consider coresets and total sensitivity for shape fitting problems. The shape fitting problems of primary interest are: (1) the (j, k) projective clustering problem, and (2) the circle fitting problem in the plane. In (j, k) projective clustering, we are given a finite set of points P in d-dimensional Euclidean space, and the goal is to find a shape, which is a k-tuple of j-flats (affine j-subspaces), that best fits P. In the circle fitting problem, given an input point set P ⊂ ℝ², the goal is to find a circle that best fits P. In L1-fitting, the cost of fitting P to a shape F is defined as Σ_{p∈P} dist(p, F), where dist(p, F) is the cost of assigning p to F, while in L∞-fitting, the cost is max_{p∈P} dist(p, F). We focus on L1-fitting.
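To make the two cost functions concrete, here is a minimal sketch for the circle fitting case. It assumes that dist(p, F) is the Euclidean distance from p to the circle; the abstract only calls it "the cost of assigning" a point to a shape, so this choice and the function names are illustrative assumptions.

```python
import math

def dist_to_circle(p, center, radius):
    """Assumed dist(p, F): Euclidean distance from p to the circle itself (not the disk)."""
    return abs(math.hypot(p[0] - center[0], p[1] - center[1]) - radius)

def l1_cost(points, center, radius):
    """L1-fitting cost: the sum of the per-point distances."""
    return sum(dist_to_circle(p, center, radius) for p in points)

def linf_cost(points, center, radius):
    """L-infinity fitting cost: the largest per-point distance."""
    return max(dist_to_circle(p, center, radius) for p in points)

if __name__ == "__main__":
    pts = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0), (0.0, -1.0)]
    # Fit the unit circle centered at the origin to the four points.
    print(l1_cost(pts, (0.0, 0.0), 1.0), linf_cost(pts, (0.0, 0.0), 1.0))
```

The L1 cost aggregates the error over all points, whereas the L∞ cost is determined by the single worst-fitting point.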

A coreset is a compact representation of the input point set. For a shape fitting problem, a coreset for a point set P is a weighted point set with the property that the cost of fitting the coreset to a shape F approximates the cost of fitting P to F, for every shape F in the family of shapes. Coresets of small (e.g., constant) cardinality are of interest, because one can afford to run off-the-shelf, perhaps computationally expensive, algorithms to solve the geometric optimization problem on the coreset, and a good solution for the coreset is guaranteed to be good for the original input as well. Depending on whether the fitting problem is L1-fitting or L∞-fitting, the coreset is called an L1 coreset or an L∞ coreset, respectively.
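One common way to formalize "approximates" for L1-fitting is the following relative-error condition. The error parameter ε, the weight function w, and the name cost are notation assumed here for illustration; the abstract does not spell the definition out.

```latex
\[
  \mathrm{cost}(P, F) = \sum_{p \in P} \mathrm{dist}(p, F),
  \qquad
  \mathrm{cost}(S, F) = \sum_{s \in S} w(s)\,\mathrm{dist}(s, F),
\]
\[
  \bigl|\mathrm{cost}(S, F) - \mathrm{cost}(P, F)\bigr|
  \;\le\; \varepsilon \cdot \mathrm{cost}(P, F)
  \quad \text{for every shape } F \text{ in the family.}
\]
```

Under this reading, a weighted set S satisfying the condition would be called an ε-coreset of P.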

One way to obtain a small coreset is via non-uniform sampling, using the framework of [30]. Given a point set P, the “importance” of each point p ∈ P is quantified by its sensitivity σ_P(p), and the total sensitivity of P is the sum of the sensitivities of its points, 𝔖_P = Σ_{p∈P} σ_P(p). It is shown that if one samples the point set P according to the probability distribution induced by the sensitivities, one obtains coresets of size roughly O(𝔖_P²).
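The sampling step can be sketched as follows. This is a minimal illustration, not the construction of [30]: it assumes the sensitivities (or upper bounds on them) are already available as positive numbers, and the function name, the sample size m, and the inverse-probability reweighting are choices made for this sketch.

```python
import random

def sensitivity_sample(points, sensitivities, m, seed=0):
    """Draw m points with probability proportional to sensitivity.

    Each sampled point p gets weight total / (m * sensitivity(p)), so the
    weighted cost of the sample is an unbiased estimate of the cost of the
    full point set for any fixed shape. Sensitivities must be positive.
    """
    rng = random.Random(seed)
    total = sum(sensitivities)  # total sensitivity of the point set
    idx = rng.choices(range(len(points)), weights=sensitivities, k=m)
    return [(points[i], total / (m * sensitivities[i])) for i in idx]

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    sens = [0.2, 0.2, 0.2, 0.9]  # hypothetical values; the outlier is "important"
    for p, w in sensitivity_sample(pts, sens, m=3):
        print(p, round(w, 3))
```

Points with high sensitivity are sampled more often but carry smaller weights, which keeps the estimate of the fitting cost unbiased while controlling its variance.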

The total sensitivity of a shape fitting problem quantifies the complexity of the shapes, and it is the main object studied in this thesis. We briefly summarize the main results below.

We establish a connection between L∞ coresets and L1 coresets. In particular, we show that shape fitting problems with small L∞ coresets also have small L1 coresets. This connection allows us to use existing work on L∞ coresets to obtain small L1 coresets for the aforementioned shape fitting problems (variants of (j, k) projective clustering, and circle fitting). Consequently, we obtain the first near-linear algorithm for integer (j, k) projective clustering in high dimension.

We show that the total sensitivity of a shape fitting problem in ℝ^d depends on the intrinsic dimension of the shapes. For many shape fitting problems, the shapes are low-dimensional: for example, in (j, k) projective clustering, each shape is a union of k j-flats, and each k-tuple of j-flats is contained in a subspace of dimension O(jk). This fact allows us to obtain a dimension-reduction type result for the (j, k) projective clustering problem. Specifically, for integer (j, k) projective clustering, the upper bound on the total sensitivity is improved from O((log n)^{f(d,j,k)}) to O((log n)^{f(j,k)}), where f(j, k) is a function depending only on j and k, and no longer on the possibly large d.

For the circle fitting problem, we obtain a coreset of size O((log n)²), using the connection between L∞ coresets and L1 coresets. We also show that the circle fitting problem does not admit a coreset of size o(log n). In particular, we construct a point set P such that any 1/100-coreset of P has size Ω(log n).

Public Abstract

In this document, we study coresets for shape fitting problems. Shape fitting problems include various optimization problems encountered in machine learning, computer vision, image processing, computational metrology, and other areas. For such problems, exact algorithms are usually either not known or computationally expensive. The idea of a coreset is to obtain a small subset, a so-called “succinct representation”, of the original input that faithfully captures the characteristics of the input, and then to solve the same optimization problem on this smaller input (the coreset).

Depending on how one quantifies how well a shape approximates the input point set, there are L∞ and L1 shape fitting problems. Coresets for L∞ shape fitting problems have been proven to be very successful and influential in obtaining fast approximation algorithms for a wide variety of geometric approximation problems. Inspired by that, we study coresets for L1 shape fitting problems.

We obtain coresets for shape fitting problems such as k-clustering and subspace approximation (from machine learning), k-line fitting (from computer vision), and a more general problem known as (j, k) projective clustering. In addition, for the circle fitting problem, which arises in computational metrology, we obtain a small coreset and also show a lower bound on the size of any coreset. These results on coresets allow us to obtain fast approximation algorithms for the corresponding shape fitting problems.