A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-04022012-093720

Type of Document

Dissertation

Author

Sekmen, Ali Safak

Author's Email Address

ali.sekmen@vanderbilt.edu

URN

etd-04022012-093720

Title

Subspace Segmentation and High-Dimensional Data Analysis

Degree

PhD

Department

Mathematics

Advisory Committee

Advisor Name, Title

Akram Aldroubi, Committee Chair

Alexander Powell, Committee Member

Douglas Hardin, Committee Member

Larry Schumaker, Committee Member

Mitchell Wilkes, Committee Member

Keywords

motion segmentation

subspace segmentation

union of subspaces

Date of Defense

2012-03-27

Availability

unrestricted

Abstract

This thesis developed theory and associated algorithms to solve the subspace segmentation problem. Given a set of data W={w_1,...,w_N} in R^D that comes from a union of subspaces, we focused on determining a nonlinear model of the form U = ∪_{i in I} S_i, where {S_i}_{i in I} is a set of subspaces, that is nearest to W. The model is then used to classify W into clusters. Our first approach is based on the binary reduced row echelon form of the data matrix. We prove that, in the absence of noise, this approach can find the number of subspaces, their dimensions, and an orthonormal basis for each subspace S_i. We provide a comprehensive analysis of the theory and determine its limitations and strengths in the presence of outliers and noise. Our second approach is based on nearness to local subspaces. It handles noise effectively, but it works only in special cases of the general subspace segmentation problem (i.e., subspaces of equal and known dimensions). This approach is based on the computation of a binary similarity matrix for the data points. A local subspace is first estimated for each data point. Then, a distance matrix is generated by computing the distances between the local subspaces and the points. The distance matrix is converted to the similarity matrix by applying a data-driven threshold. The problem is thereby transformed into the segmentation of subspaces of dimension 1 instead of subspaces of dimension d. The algorithm was applied to the Hopkins 155 Dataset and generated the best results to date.
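
The steps of the second approach (local subspace per point, distance matrix, thresholded binary similarity matrix) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names `binary_similarity` and `group_identical_rows` are hypothetical, and several details are assumptions the abstract does not specify: points are scaled to unit length, neighbours are chosen by largest absolute cosine, the local subspace is taken from a truncated SVD, the "data-driven threshold" is replaced by a simple largest-gap rule, and the final clustering just groups identical rows of the similarity matrix, which is only reasonable for noise-free data.

```python
import numpy as np


def binary_similarity(W, d, k, threshold=None):
    """Build a binary similarity matrix from a D x N data matrix W.

    d is the assumed common subspace dimension and k the number of
    neighbours per point (k + 1 >= d is required). Both are caller inputs.
    """
    # Assumption: scale every point to unit length before comparing.
    X = W / np.linalg.norm(W, axis=0, keepdims=True)
    n = X.shape[1]
    dist = np.zeros((n, n))
    for i in range(n):
        # Assumption: neighbours are the k + 1 points with the largest
        # |cosine| to x_i (this normally includes x_i itself).
        nbrs = np.argsort(-np.abs(X.T @ X[:, i]))[: k + 1]
        # Local subspace: span of the top-d left singular vectors.
        U, _, _ = np.linalg.svd(X[:, nbrs], full_matrices=False)
        B = U[:, :d]
        # Row i: distance of every point to the local subspace of x_i.
        dist[i] = np.linalg.norm(X - B @ (B.T @ X), axis=0)
    dist = np.maximum(dist, dist.T)  # symmetrise
    if threshold is None:
        # Placeholder rule (an assumption): cut at the midpoint of the
        # largest gap between consecutive sorted distances.
        s = np.sort(dist, axis=None)
        j = int(np.argmax(np.diff(s)))
        threshold = 0.5 * (s[j] + s[j + 1])
    return (dist <= threshold).astype(int)


def group_identical_rows(S):
    """Label points by identical similarity rows (noise-free simplification)."""
    seen = {}
    labels = []
    for row in S:
        key = tuple(row.tolist())
        labels.append(seen.setdefault(key, len(seen)))
    return np.array(labels)


# Toy data: five points on each of two orthogonal planes in R^4.
t = np.array([0.1, 0.5, 0.9, 1.3, 1.7])
z = np.zeros_like(t)
W = np.hstack([
    np.vstack([np.cos(t), np.sin(t), z, z]),
    np.vstack([z, z, np.cos(t), np.sin(t)]),
])
S = binary_similarity(W, d=2, k=2)
labels = group_identical_rows(S)
print(labels)
```

In this toy case all points of one subspace share the same row of `S`, so the rows fall into one-dimensional groups, which mirrors the abstract's reduction to subspaces of dimension 1. With noisy data the rows would no longer match exactly, and a more robust clustering step would be needed in place of `group_identical_rows`.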