Talk Abstracts:

Abstract:
We show that the state reachability problem under TSO is decidable but highly complex (non-primitive recursive) even for finite-state programs (whereas it is PSPACE-complete for the SC model). This is due to the fact that, roughly speaking, store buffers have the power of lossy FIFO channels. Then, we consider this problem under the assumption of a bounded number of context switches. We provide a translation that takes as input a concurrent program P and produces a program P' such that running P' under SC yields the same set of reachable states as running P under TSO with at most K context switches for each thread, for any fixed K. Basically, we show that it is possible to use 2K additional copies of the shared variables of P as local variables to simulate the store buffers. This translation allows us to transfer existing results and analysis/verification techniques from SC to TSO for the same class of programs.
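
To make the store-buffer intuition concrete, here is a toy Python model of TSO (our own illustration, not the paper's construction): each thread buffers its stores in a private FIFO queue that is drained to shared memory at arbitrary points, and a read consults the thread's own buffer before memory. This suffices to exhibit the classic non-SC outcome of Dekker's idiom.

```python
from collections import deque

class TSOThread:
    """Toy model of one thread's FIFO store buffer under TSO."""
    def __init__(self, memory):
        self.memory = memory          # shared memory: var -> value
        self.buffer = deque()         # pending (var, value) stores, oldest first

    def write(self, var, value):
        # Stores go into the buffer, not directly to shared memory.
        self.buffer.append((var, value))

    def read(self, var):
        # Reads see the newest buffered store to var, else shared memory.
        for v, val in reversed(self.buffer):
            if v == var:
                return val
        return self.memory[var]

    def flush_one(self):
        # The memory system drains the buffer in FIFO order, at any time.
        if self.buffer:
            var, val = self.buffer.popleft()
            self.memory[var] = val

# Dekker-style reordering: both threads can read 0 under TSO.
mem = {"x": 0, "y": 0}
t1, t2 = TSOThread(mem), TSOThread(mem)
t1.write("x", 1); r1 = t1.read("y")   # t1's store to x is still buffered
t2.write("y", 1); r2 = t2.read("x")   # t2's store to y is still buffered
assert (r1, r2) == (0, 0)             # impossible under SC, allowed under TSO
```

Because both stores are still sitting in their buffers, both loads return 0, an outcome no SC interleaving allows.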

This talk is based on works with: Mohamed-Faouzi Atig, Sebastian Burckhardt, Madan Musuvathi, and Gennaro Parlato.

Abstract:
This is a two-part talk on finding and fixing bugs in concurrent programs. In the first part, I will present Poirot, a new tool for finding bugs in concurrent programs. Instead of focusing on abstractions and proofs that try to show the absence of bugs, Poirot looks at restrictions (under-approximations) of programs that try to show the presence of bugs. By appropriately defining these restrictions, we can get decidable problems that can be solved efficiently, while still capturing most of the interesting behaviours. Poirot uses a combination of under-approximations: (1) context-switch bounding, which helps deal with concurrency; and (2) recursion bounding, which helps deal with sequential complications. Read more here: http://research.microsoft.com/en-us/projects/poirot/
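
As a toy illustration of context-switch bounding (our own sketch, not Poirot's implementation), the following enumerates exactly the final states reachable within a global budget of context switches; threads are lists of pure steps over a shared state:

```python
def explore(threads, state, cur=None, switches_left=1, results=None):
    """Enumerate final states reachable with at most `switches_left`
    context switches in total. Each thread is (ops, pc), where ops is a
    list of pure steps state -> state and pc is the program counter."""
    if results is None:
        results = set()
    if all(pc == len(ops) for ops, pc in threads):
        results.add(state)
        return results
    for i, (ops, pc) in enumerate(threads):
        if pc == len(ops):
            continue                              # thread i has finished
        cost = 0 if cur in (None, i) else 1       # switching threads costs 1
        if cost > switches_left:
            continue                              # schedule exceeds the bound
        nxt = list(threads)
        nxt[i] = (ops, pc + 1)
        explore(nxt, ops[pc](state), i, switches_left - cost, results)
    return results

inc = lambda s: s + 1
dbl = lambda s: s * 2
threads = [([inc, inc], 0), ([dbl], 0)]
print(explore(threads, 0, switches_left=1))   # {2, 4}: only two schedules fit
print(explore(threads, 0, switches_left=2))   # {2, 3, 4}: one more interleaving
```

Raising the bound admits more interleavings, so the reachable-state set grows monotonically toward the full (unbounded) set.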

Poirot can also suggest fixes to the bugs that it finds in concurrent programs in the form of atomic block annotations that need to be added into the code. An atomic block protects a piece of code and ensures that it runs without interruption from other threads. In recent work, various different ways have been proposed to realize atomic blocks in real programs, including the use of transactional memory or locks, making it possible to include atomic blocks as part of language syntax. Atomic blocks are a convenient way of expressing synchronization in concurrent programs, and often lead to easier-to-understand programs compared to ones with locks. In our work, given a buggy concurrent program, we infer atomic block annotations that rule out all bugs. In order to permit the maximum amount of concurrency, we ensure that the atomic blocks enclose the least amount of code possible.

This is joint work with Shaz Qadeer, Shuvendu Lahiri, and Saurabh Joshi.

Abstract:
Memory models for shared-memory concurrent programming languages typically guarantee sequential consistency (SC) semantics for datarace-free (DRF) programs, while providing very weak or no guarantees for non-DRF programs.
In effect, programmers are expected to write only DRF programs, which are then executed with SC semantics. With this in mind, we propose a novel scalable solution for dataflow analysis of concurrent programs, which is proved to be sound for DRF programs with SC semantics. We use the synchronization structure of the program to propagate dataflow information among threads without having to consider all interleavings explicitly.
Given a dataflow analysis that is sound for sequential programs and meets certain criteria, our technique automatically converts it to an analysis for concurrent programs.

Abstract:
We present an approach for automatic verification and fence inference
in concurrent programs running under relaxed memory models.
Verification under relaxed memory models is a hard problem.
Given a finite state program and a safety specification, verifying
that the program satisfies the specification under a sufficiently relaxed
memory model is undecidable. For stronger models, the problem
is decidable but has non-primitive recursive complexity.
In this work, we focus on models that have store-buffer based
semantics, e.g. SPARC TSO and PSO. We use abstract interpretation
to provide an effective verification procedure for programs
running under this type of models. Our main contribution is a family
of novel partial-coherence abstractions, specialized for relaxed
memory models, which partially preserve information required for
memory coherence and consistency. We use our abstractions to automatically
verify programs under relaxed memory models. In addition,
when a program violates its specification but can be fixed by
only adding fences, our approach can automatically infer a correct
fence placement that is optimal under the abstraction. We implemented
our approach in a tool called BLENDER and applied it to
verify and infer fences in several concurrent algorithms.

Abstract:
Lock-freedom ensures the overall progress of a concurrent program, and wait-freedom ensures progress of each and every thread in the program. In this talk we present new algorithms with progress guarantees for basic data structures: in particular, a lock-free balanced tree and a wait-free queue. Wait-freedom provides a stronger guarantee, but wait-free algorithms are typically less efficient. For example, the proposed wait-free queue is three times slower than the lock-free queue of Michael and Scott. To deal with this problem, we describe a methodology for making wait-free algorithms as fast as their lock-free counterparts. All proposed algorithms have been implemented and have gone through some basic testing. We welcome any suggestions on how to deal with the verification of such advanced concurrent algorithms.
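
For reference, the lock-free queue of Michael and Scott mentioned above can be sketched as follows. This is a Python sketch in which hardware compare-and-swap is simulated with a lock (CPython offers no CAS on attributes); a real implementation uses atomic CAS instructions.

```python
import threading

class Node:
    __slots__ = ("value", "next")
    def __init__(self, value=None):
        self.value, self.next = value, None

class MSQueue:
    """Sketch of the Michael-Scott queue with a dummy head node."""
    def __init__(self):
        dummy = Node()
        self.head = self.tail = dummy
        self._cas_lock = threading.Lock()

    def _cas(self, obj, field, old, new):
        with self._cas_lock:                      # stand-in for hardware CAS
            if getattr(obj, field) is old:
                setattr(obj, field, new)
                return True
            return False

    def enqueue(self, value):
        node = Node(value)
        while True:
            tail = self.tail
            nxt = tail.next
            if nxt is None:                       # tail really is last
                if self._cas(tail, "next", None, node):
                    self._cas(self, "tail", tail, node)   # swing tail forward
                    return
            else:                                 # help a lagging enqueuer
                self._cas(self, "tail", tail, nxt)

    def dequeue(self):
        while True:
            head, tail = self.head, self.tail
            nxt = head.next
            if nxt is None:
                return None                       # queue is empty
            if head is tail:                      # tail is lagging: help it
                self._cas(self, "tail", tail, nxt)
                continue
            if self._cas(self, "head", head, nxt):
                return nxt.value                  # old head was the dummy
```

Every failed CAS means some other thread made progress, which is exactly the lock-free (but not wait-free) guarantee.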

Abstract:
We present a technique for automatically making a data-structure (or library) thread-safe by guaranteeing that the operations of the data-structure (or library) execute atomically. Our technique exploits fine-grained locking, especially in the presence of recursive data structures. Manual fine-grained locking is challenging and error-prone, and manually locking recursive data structures is even more challenging, especially when the heap structure is allowed to change dynamically. We introduce domination locking, a new locking protocol that can be used to guarantee atomicity and deadlock freedom in the presence of heap-allocated recursive data structures. Domination locking generalizes known locking protocols for recursive data structures and, in the specific case where the shape of the data structure is guaranteed to be a forest in between operations, domination locking gives rise to a particularly efficient protocol we term eager forest-locking. We present a simple technique for automatically enforcing eager forest-locking using reference counting, and show that eager forest-locking provides performance close to hand-crafted locking for several challenging examples.
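
The flavor of fine-grained locking on forest-shaped data can be seen in the classic hand-over-hand (lock-coupling) scheme for a sorted list, sketched below in Python. This is a standard textbook protocol used here for illustration, not the paper's domination-locking protocol itself.

```python
import threading

class LNode:
    def __init__(self, key, nxt=None):
        self.key, self.next = key, nxt
        self.lock = threading.Lock()

class FineGrainedList:
    """Sorted set as a linked list with hand-over-hand locking: a thread
    holds at most two node locks at a time while traversing."""
    def __init__(self):
        self.head = LNode(float("-inf"), LNode(float("inf")))  # sentinels

    def insert(self, key):
        pred = self.head
        pred.lock.acquire()
        curr = pred.next
        curr.lock.acquire()
        try:
            while curr.key < key:          # release behind, acquire ahead
                pred.lock.release()
                pred, curr = curr, curr.next
                curr.lock.acquire()
            if curr.key != key:            # not already present
                pred.next = LNode(key, curr)
        finally:
            pred.lock.release()
            curr.lock.release()

    def keys(self):
        out, node = [], self.head.next
        while node.next is not None:       # stop before the +inf sentinel
            out.append(node.key)
            node = node.next
        return out
```

Because all threads acquire locks in list order, deadlock is impossible, and disjoint parts of the list can be updated concurrently.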

Abstract:
The distributed systems community has made enormous progress over the
past few decades designing specialized systems that scale to handle
millions of users and petabytes of data. However, combining
individual systems into composite applications that are scalable---not
to mention reliable, secure, and easy to maintain---remains an
enormous challenge. This is where programming models should be able
to help: good programming models are designed to facilitate composing
large applications from smaller components, and for reasoning about
the behavior of applications modularly. However, there appears to be
relatively little work on defining appropriate linguistic foundations
for large-scale distributed systems.

In this talk, I argue that the Actor model of concurrency is a sound foundation for reasoning about the compositional behavior of distributed systems, particularly those in which concurrency arises reactively---in response to external or internal events, signals, or messages; or dynamically---as a consequence of variations in input structure. Applications that exhibit one or both of these properties include web applications, transaction processing, workflow, stream/event processing, and certain data mining algorithms. I will illustrate some of the ideas using examples from the Thorn programming language, which is being developed jointly by IBM and Purdue University.
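
A minimal sketch of the Actor model's core discipline (not Thorn syntax; the names are ours): each actor owns its state, and the only way to interact with it is via asynchronous messages processed one at a time from a mailbox.

```python
import queue, threading

class CounterActor:
    """Minimal actor: private state, a mailbox, and one thread that
    processes messages strictly sequentially."""
    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0                               # never shared directly
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, msg, reply_to=None):
        self._mailbox.put((msg, reply_to))            # asynchronous send

    def _run(self):
        while True:
            msg, reply_to = self._mailbox.get()       # one message at a time
            if msg == "inc":
                self._count += 1
            elif msg == "get":
                reply_to.put(self._count)             # reply via a channel
            elif msg == "stop":
                return

# Usage: all concurrency is confined to message passing.
actor = CounterActor()
for _ in range(5):
    actor.send("inc")
reply = queue.Queue()
actor.send("get", reply)
print(reply.get())   # 5
actor.send("stop")
```

Since each actor handles messages sequentially, there are no data races on its state, which is what makes compositional reasoning tractable.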

Abstract:
Concurrent Separation Logic with Permissions, developed by O'Hearn, Bornat et al, is able to represent sophisticated transfer of ownership and permissions between processes. We demonstrate how these ideas can be used to reason about fine-grained concurrent programs which do not employ explicit synchronization operations to control interference but cooperatively manipulate memory cells so that interference is avoided. Reasoning about such programs is challenging and appropriate logical tools are necessary to carry out the reasoning in a reliable fashion. We argue that Concurrent Separation Logic with Permissions provides such tools. We illustrate the logical techniques by presenting the proof of a concurrent garbage collector originally studied by Dijkstra et al.

Abstract:
Building distributed applications that scale seamlessly with load often requires adopting idioms such as data partitioning and structuring computation into loosely coupled services. While these practices achieve scale, they expose programmers to the pitfalls of distribution such as system failures and management of distributed state. In this talk, we introduce a new programming abstraction called molecular transactions that simplifies programming scalable distributed systems. A molecular transaction is a fault tolerant composition of conventional ACID transactions with user-defined compensating actions.

Abstract:
Developing highly responsive distributed systems applications can be very challenging. Such applications often have to maintain multiple replicas of state and manage consistency across them. Programmers have to reason about what view of state end users can observe, how to deal with concurrent updates that cannot all be committed, and how such conflicts should be reported to end users. We present a new programming model, Guesstimate, that abstracts away the complexity involved in maintaining replicated state. On each machine, Guesstimate maintains a replicated copy (called "current guesstimate") of the state so that operations on shared state can be executed locally without any blocking, while also guaranteeing that eventually all machines agree on the sequence of operations executed. Under such an execution model, it is possible for an operation that successfully updated the guesstimated state to fail when an attempt is made to (atomically) commit the operation on all machines. Guesstimate provides language features that let programmers define application-specific notions of conflicts and express how such conflicts should be dealt with. In this talk I will describe the programming model in detail and share our experience building collaborative distributed applications with this model.
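
The execution model can be sketched as follows (a simplified single-process Python illustration with invented names, not the Guesstimate API): operations update the local guesstimate immediately and are re-validated against the master state at commit time, where conflicts surface.

```python
class Replica:
    """Toy replica: optimistic local updates, validated again at commit."""
    def __init__(self, master):
        self.master = master             # authoritative shared state (a dict)
        self.guess = dict(master)        # local guesstimated copy
        self.pending = []                # operations awaiting commit

    def apply(self, op):
        ok = op(self.guess)              # optimistic local update, no blocking
        if ok:
            self.pending.append(op)
        return ok

    def commit(self):
        failed = []
        for op in self.pending:
            if not op(self.master):      # op re-checks its precondition
                failed.append(op)        # conflict: surfaced to the app
        self.pending.clear()
        self.guess = dict(self.master)   # resynchronize the guesstimate
        return failed

def reserve(seat):
    """Example operation: reserve a seat only if it is still free."""
    def op(state):
        if state.get(seat) is None:
            state[seat] = "taken"
            return True
        return False
    return op

master = {}
r1, r2 = Replica(master), Replica(master)
assert r1.apply(reserve("A1"))       # succeeds locally
assert r2.apply(reserve("A1"))       # also succeeds locally, no blocking
assert r1.commit() == []             # first commit wins
assert len(r2.commit()) == 1         # conflict detected only at commit
```

The failed operation is what an application-specific conflict handler would inspect and report to the end user.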

Abstract:
We propose to specify the correctness of a program's
parallelism using a sequential version of the program with controlled
nondeterminism. Such a nondeterministic sequential specification
allows (1) the correctness of parallel interference to be verified
independently of the program's functional correctness, and (2) the
functional correctness of a program to be understood and verified on a
sequential version of the program, one with controlled nondeterminism
but no interleaving of parallel threads. We identify a number of
common patterns for writing nondeterministic sequential
specifications. We apply these patterns to specify the parallelism
correctness for a variety of parallel Java benchmarks, even in cases
when the functional correctness is far too complex to feasibly
specify. We describe a sound runtime checking technique to validate
that an execution of a parallel program conforms to its
nondeterministic sequential specification. The technique uses a novel
form of conflict-serializability checking to identify, for a given
parallel execution of a test program, an equivalent nondeterministic
sequential execution. Our experiments show a significant reduction in
the number of false positives versus traditional
conflict-serializability in checking for parallelization bugs.

Abstract:
A long held belief is that sequential consistency (SC) is too
expensive as it restricts many compiler and hardware
optimizations. I will demonstrate recent research evidence that
shows otherwise --- SC is indeed achievable with acceptable
performance overhead with careful engineering of the compiler and the
hardware. I will also argue that relaxed memory models not only expose
a complicated programming model to the users but can be
slower than SC for programs we care about.

Abstract:
Concurrent algorithm designers often find it difficult to avoid expensive synchronization, in particular store-load ordering and atomic operations. This talk presents conditions under which it is impossible to design algorithms that avoid such strong synchronization patterns. The identified conditions impact operations on many common data types and problems, such as FIFO queues, LIFO stacks, counters, sets, work queues, and mutual exclusion. The identification of conditions for strong synchronization opens the door for tradeoffs between synchronization overheads and the strength of specification of abstract data types. Strong synchronization may be avoided by specification relaxations such as limiting concurrency, limiting the API, and relaxing determinism.

Abstract:
Replicating shared data is a fundamental mechanism in large-scale distributed systems, but it suffers from an inherent tension between scalability and data consistency. Eventual consistency sidesteps the
(foreground) synchronisation bottleneck, but remains ad hoc, error-prone, and difficult to prove correct.

We present a promising new approach that is simple, scales almost indefinitely, and provably ensures eventual consistency: a CRDT is a data type that satisfies some simple properties, viz. that its concurrent operations commute, or that its states form a semi-lattice.
Any CRDT provably converges, provided all replicas eventually receive all operations. A CRDT requires no synchronisation: an update can execute immediately, irrespective of network latency, faults, or disconnection; it is highly scalable and fault-tolerant.
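
As a concrete example of the semilattice flavor, here is the standard state-based grow-only counter (G-Counter), one of the simplest CRDTs; the Python sketch is ours:

```python
class GCounter:
    """State-based grow-only counter CRDT: states form a semilattice under
    pointwise max, so merges commute and all replicas converge."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}                     # replica_id -> local increments

    def increment(self):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Pointwise maximum: commutative, associative, and idempotent,
        # so replicas may exchange states in any order, any number of times.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

# Two replicas update independently, then exchange state in either order.
a, b = GCounter("a"), GCounter("b")
a.increment(); a.increment()
b.increment()
a.merge(b); b.merge(a)
assert a.value() == b.value() == 3          # convergence
```

Because merge is idempotent, delivering the same state twice is harmless, which is what makes the approach robust to faults and message duplication.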

The approach is necessarily limited since any task requiring consensus is out of reach. Nonetheless, many interesting and useful data types can be designed as a CRDT. We previously published the Treedoc CRDT, a sequence data type suited to concurrent editing tasks (as in a p2p wiki). This talk presents a portfolio of generally-useful, non-trivial, composable CRDTs, including variations on counters, registers, sets, maps (key-value stores), graphs and sequences. This research is part of a systematic and principled study of CRDTs, to discover their power and limitations, and to better understand the underlying mechanisms and requirements. The challenges ahead include scaling garbage collection and integrating occasional non-commuting operations.

This is joint work with Marek Zawirski of LIP6, Nuno Preguiça of Universidade Nova de Lisboa, and Carlos Baquero, of Universidade do Minho. The research is supported by ANR grant ConcoRDanT (ANR-10-BLAN-0208), a Google Research Award 2009, and a Google European Doctoral Fellowship 2010.

Abstract:
Traditional transactional memory systems suffer from overly conservative conflict detection, yielding so-called false conflicts, because they are based on fine-grained, low-level read/write conflicts. In response, the recent trend has been toward integrating various abstract data-type libraries using ad-hoc methods of high-level conflict detection. These proposals have led to improved performance, but the lack of a unified theory has led to confusion in the literature.

We clarify these recent proposals by defining a generalization of transactional memory in which a transaction consists of coarse-grained (abstract data-type) operations rather than simple memory read/write operations. We provide semantics for both pessimistic (e.g. transactional boosting) and optimistic (e.g. traditional TMs and recent alternatives) execution. We show that both are included in the standard atomic semantics, yet find that the choice imposes different requirements on the coarse-grained operations: pessimistic execution requires operations to be left-movers, while optimistic execution requires them to be right-movers. Finally, we discuss how the semantics applies to numerous TM implementation details discussed widely in the literature.

Abstract:
Synchronisation constructs such as locks and barriers are often treated as primitive in program logics such as separation logic. The result has been a large number of custom logics, each addressing a single construct. This has two problems: first, each logic is incompatible with all the others, and second, logics have no way of justifying the specifications of constructs against the implementations. We have proposed a higher-order separation logic that is expressive enough to prove specifications for synchronisation constructs, both widely-used (locks) and bespoke (compiler-inserted barriers). I will illustrate our logic through a lock specification, and show how this specification can be justified against a spinlock implementation.

Abstract:
Verifying concurrent algorithms that manipulate unbounded pointer-linked
data structures is a challenging problem because it involves reasoning about
spatial and temporal interaction between concurrently running threads.
We suggest a novel approach for verification of concurrent algorithms which
separates the reasoning about the algorithmic insight from the verification
of its implementation.

In this talk, we exemplify our approach by presenting a proof of safety,
linearizability, and progress of a highly concurrent optimistic set
algorithm. The key step in our proof is the Hindsight Lemma,
which allows a thread to infer the existence of a global state in which
its operation can be linearized based on limited local atomic observations
about the shared state.

The Hindsight Lemma allows us to avoid one of the most complex
and non-intuitive steps in reasoning about highly concurrent linearizable
algorithms: considering the linearization point of an operation to be in
a different thread than the one executing it.

The Hindsight Lemma assumes that the algorithm maintains certain
simple invariants which are resilient to interference, and which
can themselves be verified using purely thread-local proofs. As a
consequence, the lemma allows us to unlock a perhaps-surprising
intuition: a high degree of interference makes non-trivial highly
concurrent algorithms in some cases much easier to verify than
less concurrent ones.

The talk is based on joint work with Peter O'Hearn, Martin Vechev,
Eran Yahav and Greta Yorsh, published in PODC'10, and on ongoing
research with Richard Bornat, Peter O'Hearn, and Hongseok Yang.

Abstract:
We address the problem of testing concurrent programs that use
concurrent data structure libraries. While concurrent libraries help
programmers exploit parallel hardware, to use such libraries correctly,
programmers must reason about concurrent interactions. We
present an effective technique for testing the linearizability of client
code built using concurrent data structures. Our method is based on
modular testing of client code in the presence of a specialized adversarial
environment. The adversary uses library semantics, such
as commutativity specifications, to simulate operations on the concurrent
data structure. These operations are simulated concurrently
with client operations by using an adversarial scheduler. We implemented
our approach in a tool called COLT, and evaluated its
effectiveness on a range of real-world concurrent Java programs.
Using COLT, we discovered 50 linearizability violations in Apache
Tomcat, Cassandra, MyFaces Trinidad, and other applications.

Abstract:
A code generator is a program that takes a model in a high-level modeling language as input, and outputs a program that captures the behaviour of the model. Code generators are becoming critical with the increasing use of model-based development for safety-critical systems.
We have taken a multipronged approach, involving both testing and proving, for the verification of code generators. We have demonstrated an effective technique for test-case generation for code generators, which takes into account not just the syntax but also the semantics of the modeling language. We have also developed a translation validation technique that can be applied to complex modeling languages like Simulink/Stateflow.

Abstract:
The semantics of concurrent and nondeterministic systems has generated
a plethora of process equivalences and preorders, many of which are based on the
notion of bisimulation. In a previous paper [SEFM:06] we proposed
a parametrised form for bisimulations which may be used as a general
framework for analysing systems separately under several independent
criteria such as correctness and performance. Of particular interest,
are the various inheritance properties which allow one to conclude
properties about bisimilarities by simply studying properties of the
underlying system of observable events.

Equally interesting are the
properties that we may infer about the relationships between observables
from the properties of the bisimilarity.
At a very general level, in this paper, we present a characterisation
of when a parametrised bisimilarity may be an equivalence or a preorder
relation on the universe of processes. We extend the theory to
the so-called "up-to bisimilarities".

While we certainly cannot claim that our generalisation unifies all
the previous literature on the subject, several bisimulation based
relations may be reformulated as parametrised bisimilarities,
with small changes or modifications in the original formulation, which
do not in any way alter the spirit of the original formulation.
The examples we have chosen may be broadly classified as examples of
interleaving, non-interleaving, time-sensitive, cost-sensitive and
location-sensitive bisimilarities. We also show that the standard
notion of simulation may also be described as a parametrised bisimulation.

Abstract:
Graphics processing units (GPUs), traditionally used for graphics rendering, have emerged as general-purpose, massively parallel computation devices, and are turning out to be a viable alternative to expensive supercomputers due to their extremely high cost-performance ratio. A GPU follows a SIMD architecture, consisting of a large array of processing elements that are capable of performing at the petaFLOP scale. Due to this impressive cost-performance ratio, various research and industry groups have started exploiting GPUs in several engineering and scientific domains, such as numerical linear algebra and the simulation of physical bodies. However, a major bottleneck in the adoption of such hardware is that existing applications are often not readily deployable. For efficient utilization of the hardware, it is necessary to re-engineer the existing body of programs for a GPU, which is not a trivial task. First, one needs to identify the appropriate parts of the program that can be parallelized. It may also be necessary to rewrite the application using a completely different algorithm in order to exploit the underlying hardware. The underlying architecture itself, such as the memory hierarchy of a GPU, may force the programmer to adopt a different communication mechanism between the CPU and the device.
Our research goal is to explore these challenges and create tools that help programmers take appropriate decisions quickly during reengineering. In this talk we first share our initial experience in porting a well-known CFD solver framework to a Tesla GPU. Next, we give an overview of an approach to identify data-parallel parts of a program using a static analysis technique.

Abstract:
In languages with distributed heap data structures, the type system typically conveys only coarse locality information: whether a reference is local or possibly remote. Often, what is of interest to the optimizing compiler or the user is finer-grained information, such as whether two remote references point to objects in the same partition of the distributed heap. We propose a dependent type system, called place types, to capture such fine-grained locality information. The type of each reference identifies the heap partition, called a place, that the reference points to. This type system is expressive enough to statically type several commonly used distributed data structures. It also accommodates a "dynamic" type when no suitable static type exists. We show how to embed this type system as type annotations in the X10 programming language. We also present a type inference algorithm for saving programmers from the burden of inserting type annotations. The type inference has been inspired by the use of type inference in functional programming languages such as ML. We have implemented the type inference algorithm in the X10 compiler. Experiments with small programs, but with the full set of relevant language features, have been successful in automatically finding the expected place types in these programs.
(Joint work with Ras Bodik, Vivek Sarkar and Vijay Saraswat.)

Abstract:
Building applications that are responsive and can exploit parallel hardware poses an important challenge. In particular, enabling applications to execute various tasks in parallel can be difficult if those tasks exhibit read and write conflicts. To simplify this problem, we introduce a novel programming construct called "revisions", which is inspired by revision control systems commonly used by development teams. Revisions are forked and joined much like asynchronous tasks. However, rather than accessing global shared data directly (and thereby risking data races or atomicity violations), all revisions execute on a (conceptual) copy of the shared state, a "global mutable snapshot" so to speak. Any changes performed in a revision apply to that snapshot only, until the revision is joined, at which point the changes become globally effective.
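
The fork/join discipline can be sketched as follows (a highly simplified Python illustration of the idea, not the actual language design; we model only a merge in which the joined revision's writes win):

```python
class Revision:
    """Toy revision: forking copies the (conceptual) snapshot; writes stay
    local until join, when the forked revision's writes are applied."""
    def __init__(self, state):
        self.state = state
        self._written = set()                # variables this revision wrote

    def fork(self):
        return Revision(dict(self.state))    # conceptual copy of shared state

    def set(self, var, value):
        self.state[var] = value              # visible only in this revision
        self._written.add(var)

    def get(self, var):
        return self.state[var]

    def join(self, child):
        for var in child._written:           # only the child's writes land
            self.state[var] = child.state[var]

main = Revision({"x": 0, "y": 0})
child = main.fork()
child.set("x", 1)                # local to the child until join
main.set("y", 2)                 # local to main; the child never sees it
assert child.get("y") == 0       # snapshot isolation between fork and join
main.join(child)
assert main.get("x") == 1 and main.get("y") == 2
```

Note that no data race can arise: each revision only ever touches its own snapshot, and conflicts are resolved deterministically at join.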

Abstract:
A key difficulty in concurrent program verification is the large number of interleavings of fine-grain actions. The typical correctness argument makes use of a separation of concerns. At each phase of the correctness argument, one worries only about interference due to concurrency, or the sequential behavior of a code block. We present a proof system, QED, that builds on this intuition to combat the "many interleavings" problem. In QED, we generalize Lipton's reduction and the idea of commutativity of actions to be more semantic, less syntactic. A key computational advantage is that QED proof steps have locally-checkable antecedents. QED proofs apply reduction and abstraction iteratively and transform a program to consist of larger-grain actions. We show that this approach greatly simplifies proofs of assertions, atomicity and linearizability, even in optimistically concurrent programs.

Abstract:
We propose computational, declarative definitions of the concepts of weak and strong trust relations between interacting agents, and trust domains of trust-related agents in distributed systems. Our definitions yield computational complexity results for deciding potential and actual trust relationships and membership in trust domains, as well as a positive (negative) compositionality result for strong (weak) trust domains. Our defining principle for weak and strong trust is (common) belief in and knowledge of agent correctness, respectively.

Abstract:
A future is a lightweight program abstraction that expresses a simple yet useful form of fork-join parallelism. The expression future(e) declares that e can be evaluated concurrently with the future's continuation. The expression touch(p), where p is a placeholder returned by evaluating a future, blocks until the result of evaluating e is known. Safe futures provide additional deterministic guarantees on the concurrent execution of the future with its continuation, ensuring that all data dependencies found in the original (non-future-annotated) program are respected.
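
In Python, the two constructs map naturally onto executor futures. The sketch below shows plain (unsafe) futures only; the determinism guarantees of safe futures are not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()

def future(thunk):
    # future(e): evaluate e concurrently; returns a placeholder p.
    return pool.submit(thunk)

def touch(p):
    # touch(p): block until the result of evaluating e is known.
    return p.result()

p = future(lambda: sum(range(10)))   # runs alongside the continuation
# ... continuation work happens here, concurrently with the future ...
print(touch(p))                      # 45
```

A safe-future implementation would additionally check that the future and its continuation do not violate the sequential program's data dependencies, e.g. by tracking reads and writes as in the analysis described above.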

Earlier work on safe futures and related deterministic parallelism abstractions has considered their semantics in the context of an otherwise sequential program. When safe futures are integrated within an explicitly concurrent program (e.g., to extract additional concurrency from within a sequential thread of control), the safety conditions necessary to guarantee determinacy of their computations are substantially more complex.

In this talk, I discuss the issues that arise when trying to formulate notions of correctness of deterministic execution that take place within an inherently non-deterministic environment. To do so, we consider a specification and dynamic analysis that enforces determinism of safe futures in an ML-like core language equipped with dynamic thread creation and first-class references.

Our analysis enforces a new isolation property that ensures deterministic execution among safe futures and their continuations while still allowing non-isolated non-deterministic execution everywhere else. The analysis is defined via a lightweight capability-based dependence tracking mechanism that serves as a compact representation of an effect history.

Abstract:
Barrelfish is a new research operating system being built from scratch in a collaboration between ETH Zurich in Switzerland and Microsoft Research Cambridge in the UK. We are exploring how to structure an OS for future multi-core systems. We are motivated by two closely related trends in hardware design: first, the rapidly growing number of cores, which leads to a scalability challenge, and, second, the increasing diversity in computer hardware, requiring the OS to manage and exploit heterogeneous hardware resources.
As part of the project we are revisiting the interface between applications and the operating system, in terms of how applications invoke system services, in terms of how applications express their resource requirements to the system, and in terms of how the system decides how to allocate cores and resources to the diverse mix of software running on a multi-core desktop system.
In this talk I'll introduce the system architecture that we're exploring with Barrelfish, and I'll discuss some of the challenges and opportunities it offers in terms of how programmers write efficient, correct code.