TOC Seminar '16-'17

A New Approach to Distribution Testing

Daniel Kane - UCSD

Monday, August 29, 2016 - 1:00pm to 2:30pm

Pierce 213

We study the problem of determining whether or not a discrete distribution has a given property from a small number of samples. We present a new technique in this field that operates by reducing many such problems in a black-box way to a simple L^2 tester.

We show how this new technique can recover simple proofs of several known, optimal results and how it can be extended to provide optimal solutions to a number of previously unsolved problems.
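The abstract does not spell out the L^2 tester itself, but for orientation, here is a minimal sketch of the classic collision-based uniformity tester, which implicitly estimates the L^2 norm of the unknown distribution (background only, not the reduction from the talk):

```python
from itertools import combinations

def collision_uniformity_test(samples, domain_size, epsilon):
    """Accept iff the empirical collision rate is close to 1/n, the
    minimum possible value, attained by the uniform distribution.
    The collision rate estimates the L^2 norm ||p||_2^2."""
    m = len(samples)
    collisions = sum(1 for a, b in combinations(samples, 2) if a == b)
    rate = collisions / (m * (m - 1) / 2)
    # Uniform has ||p||_2^2 = 1/n; by Cauchy-Schwarz, a distribution at
    # L^1 distance >= epsilon from uniform has ||p||_2^2 >= (1 + epsilon^2)/n,
    # so we threshold halfway between the two.
    threshold = (1 + epsilon ** 2 / 2) / domain_size
    return rate <= threshold
```

With enough samples the two cases separate with high probability; the sample complexity of such testers (here roughly sqrt(n)/epsilon^2) is exactly the kind of quantity the talk's black-box reductions aim to optimize.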

Language Edit Distance, (min,+)-Matrix Multiplication & Beyond

Barna Saha - UMass Amherst

Monday, September 26, 2016 - 1:00pm to 2:30pm

MD 123

The language edit distance is a significant generalization of two basic problems in computer science: parsing and string edit distance computation. Given a context-free grammar, it asks for the minimum number of insertions, deletions and substitutions required to convert a given input string into a valid member of the language. In 1972, Aho and Peterson gave a dynamic programming algorithm that solves this problem in time cubic in the string length. Despite the problem's vast number of applications, in over forty years there has been no improvement over this running time.
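For reference, the string edit distance that this problem generalizes is computed by the textbook quadratic dynamic program (a standard sketch, not from the talk; language edit distance with a grammar generating a single string reduces to it):

```python
def edit_distance(s, t):
    """Classic O(|s|*|t|) DP for insertions, deletions and substitutions."""
    m, n = len(s), len(t)
    # dp[i][j] = edit distance between prefixes s[:i] and t[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match/substitution
    return dp[m][n]
```

The Aho-Peterson algorithm plays an analogous DP over grammar nonterminals and substring spans, which is where the extra factor of n (and the cubic running time) comes from.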

Computing the (min,+)-product of two n by n matrices in truly subcubic time is an outstanding open question, as it is equivalent to the famous All-Pairs-Shortest-Paths problem. Even when the matrices have entries bounded in [1,n], obtaining a truly subcubic (min,+)-product algorithm would be a major breakthrough in computer science.
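For concreteness, the (min,+)-product is easy to state in code; the open question is whether the naive cubic loop below can be beaten by a truly subcubic algorithm (this sketch is illustrative, not from the talk):

```python
def min_plus_product(A, B):
    """Naive cubic (min,+)-product: C[i][j] = min_k (A[i][k] + B[k][j]).
    With A = B = a graph's weighted adjacency matrix, iterating this
    product computes all-pairs shortest paths."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

Unlike the ordinary matrix product, min does not admit subtraction, which is why fast matrix multiplication techniques do not apply directly.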

In this presentation, I will explore the connection between these two problems, which led us to develop the first truly subcubic algorithms for the following problems: (1) language edit distance; (2) RNA folding, a basic computational biology problem and a special case of language edit distance computation; (3) stochastic grammar parsing, fundamental to natural language processing; and (4) (min,+)-product of integer matrices with entries bounded by n^(3-ω-c), where c > 0 is any constant and ω is the exponent of fast matrix multiplication, widely believed to be 2.

Time permitting, we will also discuss highly efficient linear-time approximation algorithms for language edit distance for important subclasses of context-free grammars.

Extension Complexity of Independent Set Polytopes

Mika Göös - Harvard

Monday, October 3, 2016 - 1:00pm to 2:30pm

MD 123

We exhibit an n-node graph whose independent set polytope requires extended formulations of size exponential in Omega(n/log n). Previously, no explicit examples of n-dimensional 0/1-polytopes were known with extension complexity larger than exponential in Theta(sqrt(n)). Our construction is inspired by a relatively little-known connection between extended formulations and (monotone) circuit depth.

We use this technique to prove an \Omega(n(\log n/\log\log n)^2) cell-probe lower bound for the dynamic 2D weighted orthogonal range counting problem (2D-ORC) with n/poly\log n updates and n queries; the bound holds even for data structures with \exp(-\tilde{\Omega}(n)) success probability. This result not only proves the highest amortized lower bound to date, but is also tight in the strongest possible sense, as a matching upper bound can be obtained by a deterministic data structure with worst-case operational time. This is the first demonstration of a "sharp threshold" phenomenon for dynamic data structures.

Our broader motivation is that cell-probe lower bounds for exponentially small success probability facilitate reductions from dynamic to static data structures. As a proof of concept, we show that a slightly strengthened version of our lower bound would imply an \Omega((\log n/\log\log n)^2) lower bound for the static 3D-ORC problem with O(n\log^{O(1)}n) space. Such a result would give a near-quadratic improvement over the highest known static cell-probe lower bound, and break the long-standing \Omega(\log n) barrier for static data structures.

Joint work with Omri Weinstein.

Thin Spanning Trees and Their Algorithmic Applications

A spanning tree of a graph G is called epsilon-thin if it contains at most an epsilon fraction of the edges of each cut in that graph. Is there a function f:(0,1)→ℤ such that every f(epsilon)-edge-connected graph has an epsilon-thin spanning tree?
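For small graphs, the thinness condition can be checked directly by brute force over all cuts (an illustrative sketch only; the check is exponential in the number of vertices):

```python
from itertools import combinations

def is_epsilon_thin(n, edges, tree_edges, epsilon):
    """Check that a spanning tree (given as a subset of the edge list)
    contains at most an epsilon fraction of the edges of every cut
    (S, V \ S) of the graph on vertices 0..n-1. Brute force over all
    cuts, so usable only for tiny graphs."""
    for size in range(1, n // 2 + 1):  # S and its complement give the same cut
        for S in combinations(range(n), size):
            S = set(S)
            cut = [e for e in edges if (e[0] in S) != (e[1] in S)]
            tree_cut = [e for e in tree_edges if (e[0] in S) != (e[1] in S)]
            if cut and len(tree_cut) > epsilon * len(cut):
                return False
    return True
```

For example, the star spanning tree of K4 contains all 3 edges of the cut around its center, so it is 1-thin but not 0.9-thin; higher edge connectivity is what makes thinner trees plausible.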

I will talk about our journey in search of such thin trees, their applications concerning traveling salesman problems, and unexpected connections to graph sparsification and the Kadison-Singer problem.

Discrepancy Algorithms beyond Partial Coloring

Nikhil Bansal - Eindhoven University of Technology

Monday, November 14, 2016 - 1:00pm to 2:30pm

MD 123

The partial coloring method is one of the most powerful and widely used methods in combinatorial discrepancy. I will describe some recent algorithmic techniques, based on strong SDP relaxations and controlled random walks, that allow one to go beyond the partial coloring barrier.

Given the close connection between discrepancy and rounding algorithms, these methods also give new general-purpose rounding techniques that unify and refine various previous methods.

Based on joint works with Daniel Dadush, Shashwat Garg and Viswanath Nagarajan.

Learning Theory in the age of Neural Networks: Results and Challenges

Amit Daniely - Google Research

Monday, December 12, 2016 - 1:00pm to 2:30pm

MD 123

I will discuss the extent to which learning theory, as we know it today, can explain modern machine learning practice.

First, I will show that assuming that certain random SAT problems are hard, no efficient algorithm can learn DNF formulas. The picture arising from this and other results is that it is hard to learn non-linear function classes.

In effect, linear function classes are the main tool that learning theory currently has for providing guarantees on learning. In light of that, in the second part of the talk I will present very recent work, which associates a linear function class to a network architecture and shows that modern neural network (NN) algorithms are guaranteed to learn a function that is at least as good as the best function in that class. This result provides the first polynomial-time and distribution-free guarantees on modern NN learning algorithms (namely, SGD on all network weights), and applies to a relatively rich family of network architectures.

Lastly, I will describe an experimental study that demonstrates that NN algorithms often learn functions that are better than the best function in the associated linear class. This implies that our guarantees are still far from fully explaining the power of NN learning. I will end the talk with a short discussion on how better understanding could possibly be achieved.

I will assume no prior knowledge of learning theory.

Based on joint work with Roy Frostig, Vineet Gupta and Yoram Singer, and with Nati Linial and Shai Shalev-Shwartz.

Fixed-parameter dynamic algorithms

Fixed-parameter algorithms and kernelization are two powerful methods to solve NP-hard problems. Yet, so far those algorithms have been largely restricted to static inputs.

In this talk I will discuss fixed-parameter algorithms and kernelizations for fundamental NP-hard problems with dynamic inputs.

We consider a variety of parameterized graph and hitting set problems (such as k-Vertex Cover, k-Feedback Vertex Set, etc.) with parameter k that are known to have f(k)n^{1+o(1)} time algorithms on inputs of size n, and address the question of whether there is a data structure that supports small updates (such as edge/vertex/set/element insertions and deletions) with an update time of g(k)n^{o(1)}; such an update time would be essentially optimal. Update and query times independent of n are particularly desirable. We obtain such dynamic algorithms for many important problems, and complement these with conditional and unconditional lower bounds.
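For reference, the static f(k)-per-node baseline for k-Vertex Cover is the classic bounded search tree (a textbook sketch, not the dynamic data structure from the talk):

```python
def vertex_cover_at_most_k(edges, k):
    """Classic O(2^k * m) branching for k-Vertex Cover: pick any
    uncovered edge (u, v); some endpoint must be in the cover, so branch
    on including u or v and recurse with budget k - 1."""
    if not edges:
        return True
    if k == 0:
        return False
    u, v = edges[0]
    rest_u = [e for e in edges if u not in e]  # edges not covered by u
    rest_v = [e for e in edges if v not in e]  # edges not covered by v
    return (vertex_cover_at_most_k(rest_u, k - 1)
            or vertex_cover_at_most_k(rest_v, k - 1))
```

The dynamic question is whether such f(k)-type dependence can be preserved while each edge insertion or deletion costs only g(k)n^{o(1)}, rather than rerunning the branching from scratch.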

I'll overview these results and give some of their additional applications.

* Statistical query algorithms (Kearns, 1993) are algorithms that, instead of drawing random samples from an input distribution D over a domain X, have access to an SQ oracle for D. Given a real-valued query function g over X, the oracle returns an estimate of the expectation of g on a sample chosen randomly from D.
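A minimal simulation of such an oracle from sample access (an illustrative sketch; the sampler interface and the tolerance-to-sample-count conversion are assumptions, not part of the formal model):

```python
def sq_oracle(dist_sampler, g, tolerance):
    """Simulate a statistical query oracle for a distribution D, given
    only sample access via dist_sampler(): return the empirical mean of
    g over ~1/tolerance^2 samples, which with high probability is within
    O(tolerance) of E_{x~D}[g(x)] when g takes values in [0, 1]."""
    num_samples = int(1.0 / tolerance ** 2) + 1
    total = sum(g(dist_sampler()) for _ in range(num_samples))
    return total / num_samples
```

The point of the model is the converse direction: any algorithm phrased purely in terms of such queries tolerates noise automatically, and lower bounds against SQ algorithms rule out a broad class of approaches at once.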

Based on joint works with C. Guzman, W. Perkins and S. Vempala.

A (1+epsilon)-Approximation for Makespan Scheduling with Precedence Constraints using LP Hierarchies

Thomas Rothvoss - University of Washington

Monday, February 27, 2017 – 1:15pm to 2:15pm.

Pierce Hall 213

In a classical scheduling problem, one has n unit-size jobs with a precedence order, and the goal is to find a schedule of these jobs on m identical machines so as to minimize the makespan. It is one of the four remaining open problems from the book of Garey & Johnson whether this problem is NP-hard for m = 3.

We prove that for any fixed epsilon > 0 and m, an LP-hierarchy lift of the time-index LP with a slightly super-poly-logarithmic number of rounds provides a (1 + epsilon)-approximation. For example, the Sherali-Adams hierarchy suffices. This implies an algorithm that yields a (1 + epsilon)-approximation in almost quasi-polynomial time. The previous best approximation algorithms guarantee only a 2-approximation for large m.

This is joint work with Elaine Levey.

Computational Efficiency and Robust Statistics

Ilias Diakonikolas - University of Southern California

Monday, March 20, 2017 – 1:15pm to 2:15pm.

Pierce Hall 213

We consider the following basic problem: Given corrupted samples from a high-dimensional Gaussian, can we efficiently learn its parameters? This is the prototypical question in robust statistics, a field that took shape in the 1960s with the pioneering works of Tukey and Huber. Unfortunately, all known robust estimators are hard to compute in high dimensions. This prompts the following question: Can we reconcile robustness and computational efficiency in high-dimensional learning?

In this work, we give the first efficient algorithms for robustly learning a high-dimensional Gaussian that are able to tolerate a constant fraction of corruptions. Our techniques also yield robust estimators for several other high-dimensional models, including Bayesian networks, and various mixture models.
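The tension is easy to see even in one dimension, where the sample mean breaks under corruption while the median does not (a toy illustration with a hypothetical corruption model; the talk's point is precisely that achieving such robustness efficiently in high dimensions is much harder):

```python
import random
import statistics

def corrupted_gaussian_sample(n, frac_corrupt, mu=0.0, outlier=100.0):
    """n samples from N(mu, 1), with a frac_corrupt fraction replaced
    by adversarial outliers."""
    m = int(n * frac_corrupt)
    return [random.gauss(mu, 1.0) for _ in range(n - m)] + [outlier] * m

random.seed(0)
data = corrupted_gaussian_sample(10000, 0.05)
naive = statistics.fmean(data)    # dragged toward the outliers (~5 here)
robust = statistics.median(data)  # barely affected by 5% corruption
```

In high dimensions, naive coordinate-wise analogues of the median incur error growing with the dimension, and classical dimension-independent estimators like Tukey's median are NP-hard to compute; closing that gap is the subject of the talk.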

The talk will be based on joint works with (different subsets of) G. Kamath, D. Kane, J. Li, A. Moitra, and A. Stewart.

Ben Rossman - University of Toronto

Monday, April 10, 2017 - 1:15pm to 2:15pm

Maxwell Dworkin 223

I will present recent results on the power of formulas vs. circuits in the bounded-depth setting. In joint work with Srikanth Srinivasan, we obtain a nearly optimal separation between the AC^0[⊕] formula vs. circuit size of Approximate Majority functions. In other work, we show that the AC^0 circuit (resp. formula) size of the H-Subgraph Isomorphism problem is tied to the treewidth (resp. treedepth) of the pattern graph H. The latter formula-size lower bound relies on a new “excluded-minor approximation” of treedepth (joint with Ken-ichi Kawarabayashi). The former is based on an improved construction of low-degree approximating polynomials for AC^0[⊕] formulas.

We show that there exist binary locally testable codes (for all rates) and locally correctable codes (for low rates) with rate and distance approaching the Gilbert-Varshamov bound (which is the best rate-distance tradeoff known for general binary error-correcting codes). Our constructions use a number of ingredients: Thommesen's random concatenation technique, the Guruswami-Sudan-Indyk strategy for list-decoding concatenated codes, the Alon-Edmonds-Luby distance amplification method, and the local list-decodability and local testability of Reed-Muller codes. Interestingly, this seems to be the first time that local testability is used in the construction of locally correctable codes.
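For reference, the Gilbert-Varshamov bound itself is a simple formula: binary codes of relative distance delta exist with rate approaching R = 1 - H(delta), where H is the binary entropy function. A quick sketch:

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def gv_rate(delta):
    """Gilbert-Varshamov rate for relative distance delta in [0, 1/2]:
    the best rate-distance tradeoff known for general binary codes."""
    return 1.0 - binary_entropy(delta)
```

The result above says this tradeoff can now be approached by codes that are additionally locally testable (and, at low rates, locally correctable), not just by random or greedy constructions.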

Incidence geometry, rank bounds for design matrices, and applications

Shubhangi Saraf - Rutgers University

Monday, May 1, 2017 – 1:15pm to 2:15pm.

Maxwell Dworkin 223

The classical Sylvester-Gallai theorem states the following: Given a finite set of points in the Euclidean plane, if the line through every pair of points passes through a third point, then all points must be collinear. Thus, the result shows that many `local' linear dependencies imply a `global' bound on the dimension of the entire set of points. Variations of these questions have been well studied in additive combinatorics and incidence geometry. In the last few years, techniques from these areas have proven to be very useful in several structural results in theoretical computer science, in areas such as arithmetic complexity as well as coding theory. In this talk I will survey some of these connections as well as highlight some of the proof techniques (such as rank bounds for design matrices). I will also talk about a recent result which gives a linear lower bound for the number of ordinary lines (lines through exactly two points) determined by a point set spanning 3-dimensional complex space.
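The Sylvester-Gallai hypothesis is easy to test by brute force on small integer point sets (an illustrative sketch, not a technique from the talk):

```python
from itertools import combinations

def collinear(p, q, r):
    """Cross-product test: integer points p, q, r lie on one line."""
    return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])

def has_sg_property(points):
    """Check whether every line through two of the points contains a
    third. By Sylvester-Gallai, any real planar set with this property
    must be entirely collinear."""
    for p, q in combinations(points, 2):
        if not any(collinear(p, q, r) for r in points if r != p and r != q):
            return False
    return True
```

A pair with no third point on its line spans an "ordinary line"; counting how many ordinary lines a non-collinear set must determine is exactly the quantitative question addressed in the complex 3-dimensional setting above.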

Based on joint works with Abdul Basit, Zeev Dvir, Neeraj Kayal, Avi Wigderson and Charles Wolf.

Two approaches to (Deep) Learning with Differential Privacy

Kunal Talwar - Google Research

Monday, May 8, 2017 – 1:15pm to 2:15pm.

Maxwell Dworkin 223

Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowd-sourced and contain sensitive information. The models should not expose private information in these datasets. Differential Privacy is a standard privacy definition that implies a strong and concrete guarantee on protecting such information.

In this talk, I'll outline two recent approaches to training deep neural networks while providing a differential privacy guarantee, along with some new analysis tools we developed in the process. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
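The two approaches are not named in the abstract; one widely known technique in this space is noisy gradient descent, where per-example gradients are clipped and Gaussian noise is added before each update. A minimal sketch on a generic parameter vector (all hyperparameters here are hypothetical, and this omits the privacy accounting that the analysis tools address):

```python
import random

def dp_sgd_step(w, per_example_grads, clip_norm, noise_multiplier, lr):
    """One differentially private SGD step: clip each per-example
    gradient to L2 norm <= clip_norm (bounding any one example's
    influence), average, add Gaussian noise calibrated to the clipping
    norm, and take a gradient step."""
    def clip(g):
        norm = sum(x * x for x in g) ** 0.5
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        return [x * scale for x in g]

    clipped = [clip(g) for g in per_example_grads]
    n = len(clipped)
    avg = [sum(g[i] for g in clipped) / n for i in range(len(w))]
    noisy = [a + random.gauss(0.0, noise_multiplier * clip_norm / n)
             for a in avg]
    return [wi - lr * gi for wi, gi in zip(w, noisy)]
```

The privacy guarantee comes from composing the noise added at every step over the whole training run, which is where tighter accounting of the cumulative privacy loss pays off directly in model quality.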