Verification and Testing

Ensuring that programs are correct is crucial for software development. The focus of this research is to build tools and techniques that ensure a program meets its specification.

The Q program verifier brings together state-of-the-art tools to enable automated verification of programs written in a variety of programming languages, leveraging symbolic search techniques. Q stands out from competing tools in its support for analyzing concurrent, multi-core, and distributed systems. The Yogi project pioneered the combination of static analysis with testing to prove safety properties of sequential programs. Seal is a research prototype that infers and summarizes the potential side effects of a method in a C# program. It is scalable enough to consume entire .NET libraries and supports sophisticated C# features such as LINQ, delegates, event handlers, and exceptions. We also work on applying machine learning techniques to make program verification efficient and scalable.
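As a toy illustration of the idea behind combining testing with static analysis (this is a didactic sketch, not the Yogi algorithm), the fragment below first runs concrete tests looking for a counterexample to a safety property, and, when no test fails, falls back on a coarse interval analysis to prove the property outright. All function names here are our own.

```python
# Toy combination of testing and static analysis for the safety
# property "f(x) is never negative on the interval [lo, hi]".

def f(x):
    # Program under analysis.
    return x * x + 1

def check_by_testing(lo, hi, samples=100):
    """Dynamic phase: run concrete inputs, hunting for a counterexample."""
    step = max(1, (hi - lo) // samples)
    for x in range(lo, hi + 1, step):
        if f(x) < 0:
            return ("unsafe", x)      # concrete failing input found
    return ("unknown", None)          # testing alone cannot prove safety

def check_by_intervals(lo, hi):
    """Static phase: abstract f over [lo, hi] using interval arithmetic.
    x*x lies in [0, max(lo*lo, hi*hi)] when 0 is in [lo, hi]."""
    sq_lo = 0 if lo <= 0 <= hi else min(lo * lo, hi * hi)
    return ("safe", None) if sq_lo + 1 >= 0 else ("unknown", None)

def verify(lo, hi):
    """Testing finds bugs cheaply; static analysis proves their absence."""
    verdict, witness = check_by_testing(lo, hi)
    if verdict == "unsafe":
        return verdict, witness
    return check_by_intervals(lo, hi)

print(verify(-1000, 1000))   # ('safe', None): the interval phase proves it
```

The division of labor mirrors the combination described above: dynamic execution is cheap and yields real counterexamples, while the static phase supplies the proof when no counterexample exists.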

Scalable Distributed Systems

Building systems that can scale seamlessly under load is an art. Scalability is a concern that pervades every layer of the system stack, including languages, runtimes, and storage. Unfortunately, scalability often conflicts with other desirable properties such as consistency and security. The focus of this research is to rethink programming models, storage systems, and tools with scalability as the primary goal.

The CScale project explores new designs for distributed databases. As part of this project, we have developed a new replication protocol for building consistent and resilient distributed data structures. With Cipherbase, we are rethinking the design of databases and database abstractions with security as a first-class concern; we are building a compiler that securely migrates existing database applications to public cloud platforms. Perforator is a tool for costing and optimizing queries running on distributed query engines such as Hive and DryadLINQ.

Probabilistic Programming

Probabilistic programs are ordinary programs with two added constructs: (1) the ability to draw values at random from distributions, and (2) the ability to condition the values of variables in a program through observations. Models from diverse application areas such as computer vision, coding theory, cryptographic protocols, biology, and reliability analysis can be written as probabilistic programs.
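A minimal sketch of the two constructs in Python (the names `sample_bernoulli`, `observe`, and `Reject` are our own, not those of any particular probabilistic programming language): the program flips two fair coins, conditions on at least one coming up heads, and asks about the first coin.

```python
import random

class Reject(Exception):
    """Raised when an observation fails: this run is discarded."""
    pass

def sample_bernoulli(p):
    """Construct (1): draw a random value from a Bernoulli(p) distribution."""
    return random.random() < p

def observe(condition):
    """Construct (2): condition the program on `condition` holding."""
    if not condition:
        raise Reject()

def program():
    c1 = sample_bernoulli(0.5)
    c2 = sample_bernoulli(0.5)
    observe(c1 or c2)    # we observed that at least one coin was heads
    return c1            # query: was the first coin heads?
```

Running `program` many times and keeping only the runs that survive the observation yields samples from the conditional distribution; here Pr[c1 = heads | c1 or c2] = 2/3.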

Probabilistic inference is the problem of computing an explicit representation of the probability distribution implicitly specified by a probabilistic program. We explore the deep connections this research area has with programming languages and software engineering, including language design and static and dynamic analysis. These ideas form the basis of the tool R2.

Concurrency Control

(Joint work with collaborators at Tel-Aviv University, Technion, and Stanford University.)

A key challenge in writing concurrent programs is concurrency control: ensuring that concurrent accesses and modifications to shared mutable state do not interfere with each other in undesirable ways. Atomic sections are a simple way to declaratively specify a desired correctness criterion in the presence of concurrency: an atomic section is a code fragment that must appear to execute in isolation, with no interference from other parallel computation. Our work has focused on atomic sections in various settings, from both an analysis/verification perspective and a synthesis perspective. The problems we have looked at include the following. (a) Converting a sequential data structure into a linearizable data structure. (A linearizable data structure essentially guarantees that each of its methods is atomic.) (b) Converting a linearizable data structure into a transactional data structure: one that provides additional hooks enabling clients to perform a sequence of operations atomically. (c) Composing transactional data structures.
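As a minimal sketch of problem (a), the fragment below turns a sequential counter into a linearizable one by guarding every method with a single lock, so each method appears to take effect atomically while the lock is held. This coarse-grained locking is the simplest correct construction; the research problem is to achieve the same guarantee with finer-grained or lock-free synchronization.

```python
import threading

class SequentialCounter:
    """Sequential data structure: correct only without concurrent callers."""
    def __init__(self):
        self.value = 0
    def increment(self):
        v = self.value        # read-modify-write: a race under concurrency
        self.value = v + 1
    def get(self):
        return self.value

class LinearizableCounter:
    """Every method runs under one lock, so each method is atomic:
    the simplest way to make the sequential counter linearizable."""
    def __init__(self):
        self._inner = SequentialCounter()
        self._lock = threading.Lock()
    def increment(self):
        with self._lock:
            self._inner.increment()
    def get(self):
        with self._lock:
            return self._inner.get()

counter = LinearizableCounter()
threads = [threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.get())   # 8000: no increments are lost
```

A transactional version, as in problem (b), would additionally expose begin/commit hooks so a client could group several `increment` and `get` calls into one atomic sequence; a single method-level lock is not enough for that.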