About Me:

My research interests include Optimization, Differential Geometry, and Deep Learning.

I'm currently working to develop new strategies for learning the structure of deep networks, and to better understand the interplay between a network's structure and its effectiveness in particular tasks.
In addition, I'm interested in identifying intrinsic geometric properties of machine learning problems and using them to develop representations that are useful for tasks such as transfer learning and dimensionality reduction.

Besides these interests, I have worked alongside Professor Darby Dyar to automatically construct optimized preprocessing techniques for interpreting spectroscopic data, such as the data collected by the Mars Curiosity rover.

Optimization on Subspace Manifolds

In many learning algorithms, data is projected onto a low-dimensional subspace. If the subspace is chosen well, this process can remove redundant features and simplify the overall learning process. Often, it's sufficient to generate the subspace using Principal Component Analysis (PCA), but this is certainly not the only choice. Depending on the application, a better approach may be to optimize the subspace using a loss function that captures the real goal of the problem.

Optimizing a variable that represents a subspace is nontrivial because the set of all subspaces is not a Euclidean space. Instead, it forms a manifold called the Grassmannian. As a result, a variable that initially represents a subspace may no longer be valid if it is updated using standard gradient-based approaches. Fortunately, there are specialized Riemannian optimization methods that solve this problem by incrementally moving along the surface of the manifold. This allows the loss to be minimized without having to constrain the variable explicitly.
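
To make the idea concrete, here is a minimal sketch of Riemannian gradient descent on the Grassmannian, written with NumPy. It is only an illustration under stated assumptions: the loss (the negative variance captured by the subspace, so the minimizer is the PCA subspace), the QR-based retraction, the step size, and the function name riemannian_gradient_descent are all choices made for this example rather than anything specific to the methods discussed on this page.

```python
import numpy as np

def riemannian_gradient_descent(C, k, step=0.05, n_iter=500, seed=0):
    """Minimize f(Y) = -trace(Y^T C Y) over k-dimensional subspaces.

    Y is an n x k matrix with orthonormal columns that represents its span.
    All parameter values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    # Start from a random subspace (orthonormalize a random matrix).
    Y, _ = np.linalg.qr(rng.standard_normal((n, k)))
    for _ in range(n_iter):
        egrad = -2.0 * C @ Y                 # ordinary (Euclidean) gradient
        rgrad = egrad - Y @ (Y.T @ egrad)    # project onto the tangent space at Y
        # Take a step, then map back onto the manifold by re-orthonormalizing.
        Y, _ = np.linalg.qr(Y - step * rgrad)
    return Y

# Toy covariance whose two leading directions are the first two axes.
C = np.diag([5.0, 4.0, 1.0, 0.5, 0.1])
Y = riemannian_gradient_descent(C, k=2)
P = Y @ Y.T  # projector onto the learned subspace
print(np.round(P, 3))
```

A plain gradient step would generally destroy the orthonormality of Y. The projection and re-orthonormalization are what keep each iterate a valid representative of a subspace.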

While it's useful to have a convenient way of optimizing over subspaces, there are other more general constraints that can be useful in practice.

For example, suppose some data is composed of signals from two distinct processes.
We might ask if a specific feature we observe is attributable to the first process or the second one. Given appropriate domain knowledge, we could create a pair of subspaces, and optimize them to span the features generated by each process. However, if the subspaces are learned separately, there is nothing to stop a feature from being included in both of them. To ensure that each feature is only contained in one of the subspaces, there must not be any overlap between the subspaces. Thus, to implement an approach like this, we would need to optimize over pairs (or more generally, collections) of mutually orthogonal subspaces, which is not possible using the Grassmannian manifold.
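
The overlap problem is easy to see numerically. The sketch below compares two subspaces generated independently with two subspaces obtained by splitting the columns of a single orthonormal basis. Splitting one basis is just a simple way to guarantee mutual orthogonality for this illustration; the helper names are hypothetical, and the example is not meant to reproduce the construction used in our paper.

```python
import numpy as np

def random_orthonormal_basis(n, k, rng):
    """Return an n x k matrix with orthonormal columns."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return Q

def overlap(A, B):
    """Frobenius norm of A^T B, which is zero exactly when
    the two spans are orthogonal to each other."""
    return np.linalg.norm(A.T @ B)

rng = np.random.default_rng(0)
n = 6

# Two subspaces produced separately: nothing prevents them from overlapping.
A = random_orthonormal_basis(n, 2, rng)
B = random_orthonormal_basis(n, 3, rng)

# Two subspaces produced jointly: split one orthonormal basis into blocks.
Q = random_orthonormal_basis(n, 5, rng)
A_joint, B_joint = Q[:, :2], Q[:, 2:5]

print("independent overlap:", overlap(A, B))
print("joint overlap:      ", overlap(A_joint, B_joint))
```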

Recently, my collaborators and I proposed the partitioned subspace manifold, which generalizes the Grassmannian and captures the geometry of these constraints.
We've also derived Riemannian optimization methods for the manifold, making it easy for users to apply these constraints in their applications. So far, we have applied this approach to multiple-dataset analysis and to domain adaptation, and have found that the manifold offers several interesting and promising characteristics when setting up an optimization problem. You can read more about this work in our paper, "A Manifold Approach to Learning Mutually Orthogonal Subspaces".


Baseline Removal for Spectroscopic Data

One of the many useful tools onboard the NASA Mars Rover Curiosity is the ChemCam instrument, which uses Laser-Induced Breakdown Spectroscopy (LIBS) to obtain data describing the chemical composition of the Martian surface. Each LIBS sample is a high-dimensional signal that is transmitted to Earth, where it can be analyzed. In LIBS, as in many other areas of spectroscopy, the shape, size, and distribution of peaks in spectral data are of central interest because they encode properties of the sample that are useful for prediction. Unfortunately, spectral data is often corrupted by physical phenomena that introduce a smoothly varying continuum, or baseline, into the signal. The problem of correcting for these effects is known as baseline removal.
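
As a small illustration of what baseline removal looks like in practice, the sketch below applies one classic strategy, iterative polynomial fitting, to a synthetic spectrum. Everything here is an assumption made for the example: the synthetic peaks and baseline, the polynomial degree, the iteration count, and the function name iterative_polynomial_baseline. It is not the system described on this page, only one of the many existing methods of this kind.

```python
import numpy as np

def iterative_polynomial_baseline(y, degree=2, n_iter=100):
    """Estimate a smooth baseline by repeatedly fitting a polynomial and
    clipping the working signal to the fit, so peaks are gradually ignored."""
    x = np.linspace(-1.0, 1.0, y.size)  # rescaled channel axis
    work = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        fit = np.polynomial.Polynomial.fit(x, work, degree)(x)
        work = np.minimum(work, fit)    # suppress anything above the fit
    return fit

# Synthetic spectrum (assumed for illustration): two peaks on a curved baseline.
channels = np.arange(1000)
true_baseline = 50.0 + 0.02 * channels - 1e-5 * channels**2
peaks = (10.0 * np.exp(-0.5 * ((channels - 300) / 5.0) ** 2)
         + 6.0 * np.exp(-0.5 * ((channels - 700) / 5.0) ** 2))
spectrum = true_baseline + peaks

estimated = iterative_polynomial_baseline(spectrum)
corrected = spectrum - estimated
print("mean baseline error:", np.mean(np.abs(estimated - true_baseline)))
```

Even in this simple method, the degree and the number of iterations have to be chosen by hand, and a poor choice can distort the peaks.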

Over several decades, a large number of methods have been proposed that solve this problem with varying degrees of success, but selecting and tuning the best method for a given task is tedious and time-consuming. It is therefore desirable to automate the search for the ideal baseline removal method and its parameters.
Working with Professor Darby Dyar, we designed a system that generates novel baseline removal methods optimized for the particular problem a scientist might be working on.
Our initial investigations showed that existing methods share many common subtasks, such as locating peaks in the spectrum. Our approach recombines these subtasks in a variety of ways to discover the baseline removal method that performs best at a given task, as specified by a user-provided task objective function.
To determine the best method and parameters, we employ global optimization techniques to efficiently rule out configurations that are unlikely to perform well.
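
The overall shape of such a search can be sketched in a few lines. The example below is a deliberately simplified stand-in: it uses an exhaustive grid search instead of a global optimizer, a single family of baseline methods (iterative polynomial fitting) instead of recombined subtasks, and a hypothetical objective that compares the corrected spectrum against known synthetic peaks. In a real application the objective would be supplied by the user, for instance the error of a downstream prediction model.

```python
import itertools
import numpy as np

def polynomial_baseline(y, degree, n_iter):
    """One candidate method: iterative polynomial fitting with clipping."""
    x = np.linspace(-1.0, 1.0, y.size)
    work = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        fit = np.polynomial.Polynomial.fit(x, work, degree)(x)
        work = np.minimum(work, fit)
    return fit

def search(y, objective, degrees, iteration_counts):
    """Score every configuration and return the best (lower is better)."""
    best_score, best_config = None, None
    for degree, n_iter in itertools.product(degrees, iteration_counts):
        corrected = y - polynomial_baseline(y, degree, n_iter)
        score = objective(corrected)
        if best_score is None or score < best_score:
            best_score = score
            best_config = {"degree": degree, "n_iter": n_iter}
    return best_score, best_config

# Synthetic data (assumed for illustration).
channels = np.arange(1000)
peaks = 10.0 * np.exp(-0.5 * ((channels - 300) / 5.0) ** 2)
spectrum = 50.0 + 0.02 * channels - 1e-5 * channels**2 + peaks

def objective(corrected):
    """Hypothetical task objective: RMS error against the known peaks."""
    return float(np.sqrt(np.mean((corrected - peaks) ** 2)))

degrees = (1, 2, 3, 4)
iteration_counts = (1, 10, 100)
best_score, best_config = search(spectrum, objective, degrees, iteration_counts)
print(best_config, best_score)
```

Exhaustive search is only feasible because this configuration space is tiny. With many methods and continuous parameters, evaluating every configuration quickly becomes too expensive, which is why it matters to rule out unpromising configurations early.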