Systems Lunch

The UMass CS Systems Lunch talk series is Fridays at 12:00pm in room CS 142. Everyone interested in systems is welcome to attend. In addition to providing a free lunch and a chance for systems folks to meet and chat, Systems Lunch is an opportunity to hear exciting talks by visitors as well as to learn about projects going on in the College.

Take two languages, Julia and Fortress, designed to solve the same problem with the same mechanisms, and compare the approaches that led one to be adopted by a growing number of domain scientists and the other to be discontinued. Can the two designs be reconciled? Can we somehow turn the beast that is into the beauty that could have been?

This talk is a snapshot of our investigations into Julia, with more open questions than definitive answers. Together, we will review the language’s design, a pragmatic effort driven by use cases. In contrast, Fortress followed a more principled, formally grounded approach that aimed for type soundness rather than adoption. We will marvel at the efficacy of Julia’s compiler — a one-trick pony that happens to work in practice. In comparison, Fortress was limited to a JVM-based interpreter with no clear path to high performance. The last portion of the talk will be devoted to our study of Julia’s multiple-dispatch feature, and in particular of the subtype relation that is at its heart.
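The multiple-dispatch idea at the heart of Julia can be sketched in a few lines. This is only a toy illustration of the semantics (method selection by the most specific applicable signature); Julia's actual dispatcher and subtype relation handle parametric types, unions, and variance, none of which appear here, and all names below are our own.

```python
# Toy multiple dispatch: pick the most specific method whose declared
# parameter types match the runtime types of ALL arguments.

class Number: pass
class Real(Number): pass
class Integer(Real): pass

# A method table: (type, type) -> implementation.
methods = [
    ((Number, Number), lambda x, y: "generic add"),
    ((Integer, Integer), lambda x, y: "fast integer add"),
]

def dispatch(x, y):
    applicable = [(sig, f) for sig, f in methods
                  if isinstance(x, sig[0]) and isinstance(y, sig[1])]

    def at_least_as_specific(a, b):
        return all(issubclass(s, t) for s, t in zip(a, b))

    # The most specific signature is the one with the fewest applicable
    # signatures at least as specific as it (only itself, if unambiguous).
    sig, f = min(applicable, key=lambda m: sum(
        at_least_as_specific(other[0], m[0]) for other in applicable))
    return f(x, y)

print(dispatch(Integer(), Integer()))  # fast integer add
print(dispatch(Real(), Integer()))     # generic add
```

A real implementation must also detect ambiguity (two applicable methods, neither more specific), which this sketch silently resolves by `min`.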

Jan Vitek loves to engineer software, sometimes teaches software engineering, often hacks single-file LaTeX documents, always complains about Git, and never used Coq. He is partly to blame for the artifact evaluation process in SIGPLAN conferences. He has an office at Northeastern University.

Years ago, D started modestly as an improved offering in the realm of systems programming languages, sharing a good deal of philosophy with C and C++. With time, however, D became a very distinct language with unique features (and, of course, its own mistakes).

One angle of particular interest has been D’s ability to perform compile-time introspection. Artifacts in a D program can be “looked at” during compilation. Coupled with a full-featured compile-time evaluation engine and with an ability to generate arbitrary code during compilation, this has led to a number of interesting applications.

This talk shares early experience with using these features of the D language. Design by Introspection is a proposed programming paradigm that assembles designs by performing introspection on components during compilation.
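A rough Python analogue of Design by Introspection, with the obvious caveat that D performs this inspection at compile time (via `static if` and `__traits`) whereas Python can only approximate it at runtime. All class and function names below are our own illustrative inventions, not D or Phobos APIs.

```python
# A composed component inspects its building block and only wires in the
# capabilities the block actually exposes.

class MinimalStore:
    def __init__(self, data):
        self.data = data
    def get(self, k):
        return self.data[k]

class CachingStore(MinimalStore):
    """A richer component that also supports bulk prefetching."""
    def __init__(self, data):
        super().__init__(data)
        self.prefetched = []
    def prefetch(self, keys):
        self.prefetched.extend(keys)

def make_reader(store):
    # Introspect the component: use prefetch only if the store provides it.
    if hasattr(store, "prefetch"):
        def read_many(keys):
            store.prefetch(keys)
            return [store.get(k) for k in keys]
    else:
        def read_many(keys):
            return [store.get(k) for k in keys]
    return read_many

reader = make_reader(CachingStore({"a": 1, "b": 2}))
print(reader(["a", "b"]))  # [1, 2]
```

In D, the branch not taken would simply never be compiled, so the richer design costs nothing when the capability is absent.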

Bio:

Andrei Alexandrescu is a researcher, software engineer, and author. He wrote three best-selling books on programming (Modern C++ Design, C++ Coding Standards, and The D Programming Language) and numerous articles and papers on wide-ranging topics from software engineering to language design to Machine Learning to Natural Language Processing. Andrei holds a PhD in Computer Science from the University of Washington and a BSc in Electrical Engineering from University “Politehnica” Bucharest. He is Vice President of the D Language Foundation. http://erdani.com

Rust is a new systems-programming language that is becoming increasingly popular. It aims to combine C++’s focus on zero-cost abstractions with numerous ideas that emerged first in academia, most notably affine and region types (‘ownership and borrowing’) and Haskell’s type classes (‘traits’). One of the key goals for Rust is that it can be used as a ‘drop-in’ replacement for C or C++: it does not require a runtime or garbage collector, and you can even choose to forgo the standard library. In this talk, I’ll give a brief overview of the core concepts at the heart of Rust. I’ll show how the same core concepts that allow us to avoid a garbage collector *also* turn out to support efficient and data-race-free parallel programming. I’ll also touch on some of our experiences at Mozilla, where we have been replacing large chunks of Firefox’s C++ code with Rust.

Bio:

Nicholas Matsakis is a senior researcher at Mozilla Research and a member of the Rust core team. He has been working on Rust since 2011 and did much of the initial work on its type system and other core features. He did his undergraduate study at MIT, graduating in 2001, and later obtained a PhD in 2011, working with Thomas Gross at ETH Zurich. He also spent several years at DataPower Technology, a startup since acquired by IBM, working on the JIT compiler and networking runtime.

In this talk, I will present our approach for investigating how machine learning models leak information about the individual data records on which they were trained. My focus will be on the fundamental membership inference attack: given a data record and black-box access to a model, determine if the record was in the model’s training dataset. I will demonstrate how to build a successful inference attack against different classification models, e.g., those trained by commercial “machine learning as a service” providers such as Google and Amazon.
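The signal such attacks exploit can be shown with a deliberately overfit toy model. The real attack trains shadow models and an attack classifier; the sketch below (entirely our own construction) only illustrates the underlying observation that models behave differently on training members than on unseen records.

```python
import math

def train_1nn(train):
    """A 1-nearest-neighbor 'model': maximally overfit to its training set."""
    def predict_confidence(x):
        d = min(abs(x - t) for t, _ in train)
        return math.exp(-d)   # confidence 1.0 on exact training points
    return predict_confidence

def membership_guess(model, x, threshold=0.99):
    # Black-box attack: guess 'member' when the model is suspiciously confident.
    return model(x) >= threshold

train = [(0.0, "a"), (1.0, "b"), (2.0, "a")]
model = train_1nn(train)
print(membership_guess(model, 1.0))   # True  -- a training record
print(membership_guess(model, 1.5))   # False -- an unseen record
```

A well-regularized model narrows the confidence gap between members and non-members, which is why overfitting is so closely tied to membership leakage.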
Bio:
Reza Shokri is a postdoctoral researcher at Cornell University. His research focuses on data and computational privacy for a variety of applications, from location-based services and recommender systems to web search and machine learning. His work on quantifying location privacy was recognized as a runner-up for the annual Award for Outstanding Research in Privacy Enhancing Technologies (PET Award). Recently, he has focused on privacy-preserving generative models for synthetic data, and privacy in machine learning. More information: www.shokri.org

Monday, November 7, 2016

Speaker: Gilles Muller (INRIA)
Host: Emery Berger
Title: Safe multicore scheduling in a Linux cluster environment
Modern clusters rely on servers containing dozens of cores that have high infrastructure cost and energy consumption. Therefore, it is economically essential to exploit their full potential at both the application and the system level. While today’s clusters typically rely on Linux, recent research shows that the Linux scheduler suffers from performance bugs that lead to core under-usage. The consequences are wasted energy, poor infrastructure usage, and longer service response times. The fundamental source of such performance bugs is that the Linux scheduler, being monolithic, has become too complex. In this talk, we present ongoing work on the Ipanema project, which proposes to switch to a set of simple schedulers, each tailored to a specific application. Our vision raises scientific challenges in terms of easily developing schedulers and proving them safe. The key to our approach is designing a Domain-Specific Language for multicore kernel scheduling policy development, along with associated verification tools and a Linux run-time system.
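The shape of a "simple scheduler tailored to one application" can be sketched as a small, self-contained pick-next policy. Nothing below is Ipanema's actual DSL syntax; this is our own illustration of why expressing a policy in isolation is easier to write and reason about than patching a monolithic scheduler.

```python
import heapq

class ShortestRemainingFirst:
    """Policy: always run the task with the least remaining work."""
    def __init__(self):
        self.queue = []   # min-heap of (remaining work, task)
    def enqueue(self, task, remaining):
        heapq.heappush(self.queue, (remaining, task))
    def pick_next(self):
        # The whole policy is this one rule; a verifier only has to check
        # properties like "never idle while the queue is non-empty".
        return heapq.heappop(self.queue)[1] if self.queue else None

sched = ShortestRemainingFirst()
sched.enqueue("compile", 40)
sched.enqueue("interactive", 2)
print(sched.pick_next())  # interactive
```

The Ipanema DSL compiles such policies into kernel code together with machine-checked safety properties such as work conservation.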

Monday, October 24, 2016

Speaker: Tim Kraska (Brown)
Host: Gerome Miklau
Title: Interactive Data Science
Unleashing the full potential of Big Data requires a paradigm shift in the algorithms and tools used to analyze data, towards more interactive systems with highly collaborative and visual interfaces. Ideally, a data scientist and a domain expert should be able to make discoveries together by directly manipulating, analyzing, and visualizing data on the spot, instead of having week-long back-and-forth interactions between them. Current systems, such as traditional databases or more recent analytical frameworks like Hadoop or Spark, are ill-suited for this purpose. They were designed neither to be interactive nor to support the special requirements of visual data exploration. Similarly, most machine learning algorithms are not able to provide initial answers at “human speed” (i.e., sub-second), nor are existing methods sufficient to convey the impact of various risk factors, such as those caused by incompleteness within the data or (implicit) multi-hypothesis testing.
In this talk, I will present our vision of a new approach for conducting interactive exploratory analytics and explain why integrating the aforementioned features requires a complete rethinking of the full analytics stack, from the interface to the “guts”. I will present recent results towards this vision, including our novel interface, analytical engine, and index structure, and outline what challenges are still ahead of us.
Bio:
Tim Kraska is an Assistant Professor in the Computer Science department at Brown University. Currently, his research focuses on Big Data management systems for modern hardware and new types of workloads, especially interactive analytics. Before joining Brown, Tim spent 3 years as a PostDoc in the AMPLab at UC Berkeley, where he worked on hybrid human-machine database systems and cloud-scale data management systems. Tim received his PhD from ETH Zurich under the supervision of Donald Kossmann. He was awarded an NSF CAREER Award (2015), an Air Force Young Investigator award (2015), a Swiss National Science Foundation Prospective Researcher Fellowship (2010), a DAAD Scholarship (2006), a University of Sydney Master of Information Technology Scholarship for outstanding achievement (2005), the University of Sydney Siemens Prize (2005), two VLDB best demo awards (2015 and 2011), and an ICDE best paper award (2013).

Monday, October 3, 2016

Speaker: Harry Xu (UC Irvine)
Host: Emery Berger
Title: Marrying Generational GC and Region Techniques for High-Throughput, Low-Latency Big Data Memory Management
Most “Big Data” systems are written in managed languages such as Java, C#, or Scala. These systems suffer from severe memory problems due to the massive volumes of objects created to process input data. Allocating and deallocating a sea of data objects puts a severe strain on existing garbage collectors (GC), leading to high memory management overhead and reduced performance. We have developed a series of techniques at UC Irvine to tackle this problem. In this talk, I will first talk about Facade (ASPLOS’15), a compiler and runtime system that can statically bound the number of data objects created in the heap. Next, I will talk about our recent work on Yak (OSDI’16), a new hybrid garbage collector that splits the managed heap into a control space and a data space, and uses a generational GC and a region-based technique, respectively, to manage them.
Bio:
Guoqing (Harry) Xu is an assistant professor at UC Irvine. He is broadly interested in programming languages and (distributed, operating, and runtime) systems. His recent interests center on (1) how to exploit language/compiler techniques to build scalable Big Data systems and (2) how to build Big Data systems to parallelize and scale sophisticated program analyses. He publishes broadly in PL, systems, and SE conferences such as SOSP/OSDI, ASPLOS, PLDI, and OOPSLA, and is an author of several papers awarded or nominated for distinguished paper awards.
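The region idea behind Yak can be sketched in a few lines: data objects created while processing one input epoch are allocated into a region and then deallocated wholesale when the epoch ends, instead of being traced object-by-object by a generational collector. The class below is our own toy, not Yak's runtime.

```python
class Region:
    """A bump-style region: allocate freely, then free everything at once."""
    def __init__(self):
        self.objects = []
    def alloc(self, obj):
        self.objects.append(obj)
        return obj
    def free_all(self):
        n = len(self.objects)
        self.objects.clear()   # one cheap bulk deallocation, no tracing
        return n

epoch = Region()
for rec in ["r1", "r2", "r3"]:
    epoch.alloc({"record": rec})
print(epoch.free_all())  # 3
```

The hybrid design keeps the generational GC for long-lived control objects, whose lifetimes do not follow epoch boundaries.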

Friday, September 30, 2016

Speakers: Joe Gibbs Politz (Swarthmore/UCSD) and Ben Lerner (Northeastern University)
Host: Arjun Guha
Title: Pyret: Language Design for #CS4All
Computer science for all requires that computing education simultaneously achieve equity, rigor, and scale. Bootstrap (www.bootstrapworld.org) integrates computing education with disciplines like math, physics, and data science, while also supporting traditional computer science curricula. These interdisciplinary curricula enable equitable access to, and success in, both subjects for all students in grades 6-16.
In this talk, we discuss how the Pyret programming language (www.pyret.org), which is the host language for the new Bootstrap curricula, supports these efforts. Its design has been informed by the How to Design Programs computing curriculum, a scaffolded, rigorous, multi-representational technique for program design. We discuss the technical challenges induced by making it run on stock Web browsers on commodity machines. We also present novel features in the language to support both data processing and reactive computation, which make the curricula both outward-looking and engaging to a broad student population. Finally, as a full-featured language with a traditional syntax and semantics, these multiple entry points into Pyret lead to an authentic programming experience for students.
Bios:
Joe Gibbs Politz is starting as an Assistant Teaching Professor of computer science at UC San Diego in January of 2017. Previously, he taught at Swarthmore College and received his PhD from Brown University. He studies programming languages, computer science education, and Web programming.
Benjamin Lerner is a lecturer at Northeastern University’s College of Computer and Information Sciences. He earned his undergraduate degree at Yale University and his PhD at the University of Washington in Seattle. He is currently developing Pyret, a new programming language aimed at teaching introductory programming. In the past, Prof. Lerner worked for Microsoft and MSR, and has taught for several summers at the Johns Hopkins Center for Talented Youth program.

November 2, 2015

Speaker: Ethan Heilman (Boston University)
Host: Amir Houmansadr
Title: Eclipse Attacks on Bitcoin’s Peer-to-Peer Network
We present eclipse attacks on bitcoin’s peer-to-peer
network. Our attack allows an adversary controlling a sufficient
number of IP addresses to monopolize all connections to and from a
victim bitcoin node. The attacker can then exploit the victim for
attacks on bitcoin’s mining and consensus system, including
N-confirmation double spending, selfish mining, and adversarial forks
in the blockchain. We take a detailed look at bitcoin’s peer-to-peer
network, and quantify the resources involved in our attack via
probabilistic analysis, Monte Carlo simulations, measurements and
experiments with live bitcoin nodes. Finally, we present
countermeasures, inspired by botnet architectures, that are designed
to raise the bar for eclipse attacks while preserving the openness and
decentralization of bitcoin’s current network architecture.
Project Website: http://cs-people.bu.edu/heilman/eclipse/
Bio:
Ethan Heilman is a PhD student in Boston University’s Computer Science Department and a member of the security research group BUSec. He is advised by Sharon Goldberg, and has done research on novel attacks on hash functions, differential cryptanalysis, Intelligent Transit Systems, and cache-based side-channel attacks. He broke the SHA3 contestant Spectral Hash. His current focus is on the RPKI and Bitcoin. Prior to graduate school, Ethan worked as a software engineer at the Broad Institute, where he wrote microbial bioinformatics annotation software. He also worked as a software developer at two successful startups, Pubget and Jumptap. In his free time, he writes games, experiments with web application technology, and blogs about security.
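The flavor of the probabilistic analysis can be shown with a back-of-the-envelope Monte Carlo simulation. The paper's actual model is far more detailed (address buckets, eviction, restarts); the sketch below, with all parameters our own, only asks: if a victim fills its outgoing connections by sampling addresses uniformly, and the attacker controls a fraction p of the candidate addresses, how often are all of them attacker-owned?

```python
import random

def eclipse_probability(p, conns=8, trials=100_000, seed=1):
    """Estimate P(all `conns` outgoing connections go to the attacker)."""
    rng = random.Random(seed)
    wins = sum(all(rng.random() < p for _ in range(conns))
               for _ in range(trials))
    return wins / trials

# With p = 0.9 the analytic answer is 0.9**8, about 0.43.
print(eclipse_probability(0.9))
```

Even this toy shows why the attack needs many IP addresses: the success probability falls off as p**conns.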

October 19, 2015

Speaker: Remco Chang (Tufts)
Host: Alexandra Meliou
Title: Big Data Visual Analytics: A User-Centric Approach
Modern visualization systems often assume that the data can fit within the computer’s memory. With such an assumption, visualizations can quickly slice and dice the data and help the users examine and explore it in a wide variety of ways. However, in the age of Big Data, the assumption that data can fit within memory no longer applies. One critical challenge in designing visual analytics systems today is therefore to allow users to explore large and remote datasets at interactive rates. In this talk, I will present our research on approaching this problem in a user-centric manner. In the first half of the talk, I will present preliminary work with the database group at MIT on developing a big data visualization system based on the idea of predictive prefetching and precomputation. In the second half of the talk, I will present mechanisms and approaches for performing prefetching that are based on users’ past interaction histories and their perceptual abilities.
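One simple way to prefetch from interaction history, sketched here under our own assumptions (the talk's system is more sophisticated), is a first-order Markov model over the "tiles" a user visits: predict the most likely successor of the current tile and fetch it before it is requested.

```python
from collections import Counter, defaultdict

class Prefetcher:
    def __init__(self):
        self.successors = defaultdict(Counter)  # tile -> next-tile counts
        self.last = None
    def observe(self, tile):
        if self.last is not None:
            self.successors[self.last][tile] += 1
        self.last = tile
    def predict(self):
        # Prefetch the most frequent successor of the current tile.
        counts = self.successors[self.last]
        return counts.most_common(1)[0][0] if counts else None

p = Prefetcher()
for tile in ["A", "B", "A", "B", "A", "C", "A", "B"]:
    p.observe(tile)
print(p.predict())  # history says "A" usually follows "B"
```

The user-centric twist in the talk is that predictions can also weigh perceptual factors, not just raw transition counts.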
Bio:
Remco Chang is an Assistant Professor in the Computer Science Department at Tufts University. He received his BS from Johns Hopkins University in 1997 in Computer Science and Economics, his MSc from Brown University in 2000, and his PhD in computer science from UNC Charlotte in 2009. Prior to his PhD, he worked for Boeing developing real-time flight tracking and visualization software, followed by a position at UNC Charlotte as a research scientist. His current research interests include visual analytics, information visualization, and human-computer interaction. His research has been funded by NSF, DHS, MIT Lincoln Lab, and Draper. He has received best paper, best poster, and honorable mention awards at InfoVis, VAST, CHI, and VDA. He is currently an associate editor of the ACM Transactions on Interactive Intelligent Systems (TiiS) and Human Computation journals, and he has served on program committees and in organizational roles for leading conferences such as InfoVis, VAST, and CHI. He received the NSF CAREER Award in 2015.

September 28, 2015

Speaker: Don Porter (Stonybrook)
Host: Emery Berger
Title: BetrFS: Write-Optimization in a Kernel File System
Write-optimized data structures (WODS) are a promising building
block for storage systems because they have the potential
to strictly dominate the performance of B-trees and other common
on-disk data structures. In particular, WODS can dramatically improve
performance of both small, random writes and large, sequential scans.
This talk will introduce the basics of WODS, including the Bε
tree, and then will describe BetrFS, the first in-kernel file system
built with a WODS. This work contributes a combination of
kernel-level techniques to leverage write-optimization in the
VFS layer and data structure-level enhancements to meet the
requirements of a POSIX-style file system. BetrFS outperforms
widely-used file systems, such as ext4 and xfs, on many benchmarks,
sometimes by orders of magnitude.
The BetrFS project is an ongoing effort by an increasingly large team of contributors from Stony Brook, Rutgers, MIT, and Tokutek/Percona. More information, including source code, is available at betrfs.org.
Bio:
Don Porter is an Assistant Professor and Kieburtz Young Scholar of
Computer Science at Stony Brook University. Porter’s research
interests broadly involve improving efficiency and security of
computer systems. Porter earned a Ph.D. and M.S. from The University
of Texas at Austin, and a B.A. from Hendrix College. He has received
awards including the NSF CAREER Award and the Bert Kay Outstanding
Dissertation Award from UT Austin.

April 27, 2015

Speaker: Bryan Ford (Yale/EPFL)
Host: Emery Berger
Title: Warding off timing attacks in Deterland
The massive parallelism and resource sharing embodied in today’s cloud business model not only exacerbate the security challenge of timing channels but also undermine the viability of defenses based on resource partitioning. This work proposes hypervisor-enforced timing mitigation to control timing channels in cloud environments. This approach closes “reference clocks” internal to the cloud by imposing a deterministic view of time on guest code, and uses timing mitigators to pace I/O and rate-limit potential information leakage to external observers. Our prototype hypervisor implementation is the first system that can mitigate timing-channel leakage across full-scale existing operating systems such as Linux and applications written in arbitrary languages. Mitigation incurs a varying performance cost, depending on workload and tunable leakage-limiting parameters, but this cost may be justified for security-critical cloud applications and data.
Bio:
Bryan Ford currently leads the Decentralized/Distributed Systems (DeDiS) research group at Yale University, but will be moving to EPFL in Lausanne, Switzerland in July 2015. Ford’s work focuses broadly on building secure systems, touching on many particular topics including secure and certified OS kernels, parallel and distributed computing, privacy-preserving technologies, and Internet architecture. He has received the Jay Lepreau Best Paper Award at OSDI, and multiple grants from NSF, DARPA, and ONR, including the NSF CAREER award. His pedagogical achievements include PIOS, the first OS course framework leading students through development of a working, native multiprocessor OS kernel. Prof. Ford earned his B.S. at the University of Utah and his Ph.D. at MIT, while researching topics including mobile device naming and routing, virtualization, microkernel architectures, and touching on programming languages and formal methods.
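The core of the mitigation idea can be shown with a one-line model (our simplification, not Deterland's implementation): release outputs only at fixed quantized instants, so an external observer learns at most which quantum boundary a response crossed, not the secret-dependent compute time.

```python
def mitigated_release(compute_times, quantum=10):
    """Map each true completion time to the next quantum boundary."""
    return [((t // quantum) + 1) * quantum for t in compute_times]

# Two secret-dependent runs (7 vs. 9 time units) become indistinguishable:
print(mitigated_release([7, 9]))   # [10, 10]
```

The tunable quantum is exactly the leakage-limiting parameter the abstract mentions: a larger quantum leaks less but adds more latency.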

It is increasingly important for applications to protect sensitive data. Unfortunately, privacy and security policies for information flow are difficult to manage because of their global nature. We propose a policy-agnostic programming model in which the programmer implements information flow policies separately from the other functionality. In this talk, I describe Jeeves, a programming language for automatically enforcing information flow policies. In Jeeves, the programmer associates information flow policies with sensitive data and relies on the runtime to propagate the policies. Policies may refer to the program state and the output channel; the runtime guarantees that values may flow to viewers only if the policies permit. We have formalized the dynamic semantics of Jeeves, proven a non-interference guarantee, and implemented Jeeves as embeddings in Scala and Python. I will describe the Jeeves programming model, our recent focus on building a Jeeves web framework, and our experiences deploying an application written using our web framework. By shifting more responsibility to the language runtime and web framework, Jeeves reduces opportunities for the programmer errors that lead to information leaks.
Bio:
Jean Yang is a final-year Ph.D. student at MIT, advised by Armando Solar-Lezama. Her Ph.D. thesis is on Jeeves, a language for automatically enforcing information flow policies for security and privacy. She graduated from Harvard University in 2008. Her work on verifying the Verve operating system won Best Paper Award at PLDI in 2009. In 2009 she started Graduate Women at MIT, an institute-wide group that now has two annual conferences, a mentoring program, and over 1800 members.

A properly encapsulated data structure can be revised for refactoring without affecting the behaviors of clients of the data structure. Encapsulation ensures that clients are “representation independent”, that is, they are independent of particular choices of data structure representations. Modular reasoning about data structure revisions in heap-manipulating programs, however, is a challenge because encapsulation in the presence of shared mutable objects is difficult to ensure for a variety of reasons:
(a) Pointer aliasing can break encapsulation and invalidate data structure invariants.
(b) Representation independence (RI) is nontrivial to guarantee in a generic manner, without recourse to specialized disciplines such as ownership.
(c) Mechanical verification of RI using theorem provers is nontrivial because it requires relational reasoning between two different data structure representations. Such reasoning lies outside the scope of most modern verification tools.
We address these challenges by reasoning in Region Logic (RL), a Hoare logic augmented with state-dependent “modifies” specifications based on simple notations for object sets, termed “regions”. RL uses ordinary first-order logic assertions to support local reasoning and also the hiding of invariants on encapsulated state, in ways suited to verification using SMT solvers. By using relational assertions, the logic can reason about behavior preservation of data structure refactorings, even in settings where full functional pre/post specifications are absent. The key ingredient behind such reasoning is a new proof rule that embodies representation independence.
A verifier based on the non-relational part of RL has been used in the verification of implementations of the Observer and Composite design patterns and their clients. We further expect RL to be useful in proving noninterference-style properties in the context of information flow security.
Work in progress with David A. Naumann and Mohammad Nikouei (Stevens Institute of Technology).
Bio: Anindya Banerjee is currently a Program Director in the CCF Division of the CISE Directorate at the National Science Foundation, where he participates in the Software and Hardware Foundations (SHF), REU Sites, and Cyberphysical Systems (CPS) programs and leads the Exploiting Parallelism and Scalability (XPS) program. He is on leave from the IMDEA Software Institute, Madrid, Spain, where he is a full professor. Anindya’s research spans: programming languages and program verification; high-assurance concurrent, distributed, and networked systems; cyber-security, specifically the modular verification and certification of software systems against security policies; program logics and semantics; program analysis and abstract interpretation.

Nov. 17, 2014

Speaker: Tony Printezis, Twitter
Host: Emery Berger
Title: Use of the JVM at Twitter: a Bird’s Eye View
Twitter’s infrastructure consists of a large number of services that run on a variety of managed runtime systems, starting originally with Ruby, but moving to a mix of Scala and Java. Targeting the JVM allows developers at Twitter to quickly develop and deploy reliable code. Automated memory management, in particular, improves productivity tremendously, especially in a team-oriented and fast-paced environment.
With these benefits, however, come some challenges too. The way code is executed on the JVM makes it very difficult for developers to understand what has gone wrong and how to fix it when problems arise. Additionally, many profiling tools are not Scala-aware, which sometimes makes their output challenging to interpret. Finally, the sheer volume of data that Twitter’s services handle every day, coupled with stringent latency requirements, stresses the JVM to its limits.
The talk will cover:
– Overview of how services are deployed and monitored at Twitter.
– Challenges of the use of the JVM in an environment like Twitter.
– Benefits of using a custom build of the JVM, with features developed in-house.
Bio:
Tony Printezis is a member of the Performance / Core Runtime group (aka the VM Team) in the Infrastructure organization at Twitter. He has over 15 years of (mainly Java) virtual machine implementation experience, with a special focus on memory management. Most of his projects have involved improving the performance, scalability, responsiveness, parallelism, concurrency, monitoring, and visualization of garbage collectors. He was one of the designers (and the tech lead) of the Garbage-First GC and the original implementer of the widely used CMS GC. Before Twitter, he worked at Adobe (on the eventually canceled next-generation Flash effort), the Java organization at Oracle and Sun Microsystems (where he contributed to the garbage collectors of the Java HotSpot virtual machine), and Sun Microsystems Laboratories (where he did garbage collection-related research). He holds a PhD and a BSc (Hons) in Computing Science, both from the University of Glasgow in Scotland. In his spare time he’s been known to carry unreasonably heavy camera equipment to take pictures of birds (of the big, aluminum / carbon fiber kind)… and he also cooks.

Nov. 3, 2014

Speaker: Armando Solar-Lezama, MIT
Host: Emery Berger
Title: Constraint-based synthesis beyond automated programming
This talk will describe how constraint-based synthesis can provide a unifying framework to attack problems as diverse as program optimization, automated grading, and the production of verified code. The talk will focus on the Sketch synthesis platform and show how it can be applied to each of the aforementioned domains. For program optimization, synthesis allows us to raise the level of abstraction of a program, making it possible to apply aggressive optimizations that can even change the algorithmic complexity of a piece of code. In automated grading, synthesis allows us to improve upon the traditional method of grading programs based on a test suite by determining the set of modifications needed to make the program correct. Finally, synthesis can aid the development of verified code by deriving code that is easier to prove correct.
Bio: Armando Solar-Lezama is an associate professor at MIT, where he leads the Computer-Aided Programming Group. His research interests include software synthesis and its applications, as well as high-performance computing, information flow security, and probabilistic programming.
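The Sketch workflow, at its smallest, looks like this: the programmer writes a template with a "hole", and the synthesizer searches for a hole value that makes the program satisfy its specification on all checked inputs. Sketch itself solves the constraints with a SAT-based CEGIS loop, not brute-force enumeration; the sketch below is only our illustration of the problem statement.

```python
def template(hole):
    # Candidate program: x -> (x << hole), intended to compute x * 8.
    return lambda x: x << hole

def spec(f):
    # Specification checked on a finite set of inputs.
    return all(f(x) == x * 8 for x in range(16))

def synthesize(holes=range(32)):
    for h in holes:
        if spec(template(h)):
            return h
    return None

print(synthesize())  # 3, since x << 3 == x * 8
```

The same template-plus-spec framing underlies the optimization and grading applications: only the shape of the template and the specification change.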

A couple of years ago, Gang Tan and I set out to prove the correctness of the “checker” for Google’s Native Client, a service that allows native binary code to run in the context of the Chrome browser. The checker is supposed to ensure a basic sandboxing policy on the code, so any bug could result in a major security vulnerability.

It turns out that proving the correctness of the checker wasn’t the hard part. Rather, the correctness statement depends upon a model of execution for binary code, and building a faithful model of the x86 in something like Coq is, er, “challenging” to say the least. I’ll describe some of these challenges, how we tackled some of them, and what still remains.
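To make concrete what such a checker must enforce, here is a highly simplified sketch over a made-up toy ISA (the real Native Client checker validates actual x86 bytes): every memory access must stay inside the sandbox, and every jump target must be bundle-aligned so no hidden instruction sequence can be reached.

```python
BUNDLE = 4  # toy alignment unit (NaCl uses 32-byte bundles)

def check(program, code_size=64):
    """program: address -> (opcode, operand) for a toy two-instruction ISA."""
    for addr, (op, arg) in program.items():
        if op == "jmp":
            # Jumps must land on a bundle boundary inside the sandbox.
            if arg % BUNDLE != 0 or not (0 <= arg < code_size):
                return False
        elif op == "store":
            # Stores are confined to the sandbox's address range.
            if not (0 <= arg < code_size):
                return False
    return True

good = {0: ("store", 8), 4: ("jmp", 0)}
bad = {0: ("jmp", 6)}   # misaligned jump target
print(check(good), check(bad))  # True False
```

The proof obligation the talk describes is then that any program accepted by `check` really cannot escape the sandbox under the formal execution model, which is where the faithful x86 model becomes unavoidable.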

Computer systems research spans sub-disciplines that include embedded systems, programming languages, networking, and operating systems. In this talk, my contention is that a number of structural factors inhibit quality systems research. Symptoms of the problem include unrepeatable and unreproduced results as well as results that are either devoid of meaning or that measure the wrong thing. I will illustrate the impact of these issues on our research output with examples from the development and empirical evaluation of the Schism real-time garbage collection algorithm that is shipped with the FijiVM — a Java virtual machine for embedded and mobile devices.

I will argue that our field should foster the repetition of results, independent reproduction, and rigorous evaluation. I will outline some baby steps taken by several computer science conferences. In particular, I will focus on the introduction of Artifact Evaluation Committees (AECs) at ECOOP, OOPSLA, PLDI, and soon POPL. The goal of the AECs is to encourage authors to package the software artifacts that they used to support the claims made in their paper and to submit these artifacts for evaluation. AECs were carefully designed to provide positive feedback to authors who take the time to create repeatable research.

Storage-class memory (SCM) technologies such as phase-change memory, spin-transfer torque MRAM, and memristors promise the performance and flexibility of DRAM with the persistence of flash and disk. In this talk, I will discuss two interfaces to persistent data stored in SCM.

First, I will talk about Mnemosyne, which is a system that exposes storage-class memory directly to applications in the form of persistent regions. With only minor hardware changes, Mnemosyne supports consistent in-place updates to persistent data structures and performance up to 10x faster than current storage systems.

Second, I will talk about how to build file systems for storage-class memory. While standard storage devices rely on the operating system kernel for protected shared access, SCM can use virtual-memory hardware to protect access from user-mode programs. This enables application-specific customization of file system policies and interfaces. I will describe the design of the Aerie file system for SCM, which provides flexible, high-performance access to files.
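The "persistent region with in-place updates" interface can be approximated today with a memory-mapped file. The file layout and function below are our own stand-ins, not Mnemosyne's or Aerie's APIs, and a real SCM system must also order its stores for crash consistency, which this sketch ignores.

```python
import mmap
import os
import struct
import tempfile

path = os.path.join(tempfile.mkdtemp(), "region")
with open(path, "wb") as f:
    f.write(b"\x00" * 8)   # an 8-byte "persistent region" holding a counter

def bump(path):
    """Update the counter in place through the mapped region."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 8) as m:
            n = struct.unpack("<Q", m[:8])[0]
            m[:8] = struct.pack("<Q", n + 1)   # in-place update, no write() call
            m.flush()
            return n + 1

bump(path)
print(bump(path))  # 2: the first update persisted across re-mapping
```

With true SCM the mapping would be backed by persistent memory directly, eliminating both the page cache and the block-device detour that make this pattern slow on disk.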

Over the past several decades of compiler research, there have been great successes in automatically enhancing locality for regular programs, which operate over dense matrices and arrays. Tackling locality in irregular programs, which operate over pointer-based data structures such as trees and graphs, has been much harder, and has mostly been left to ad hoc, application specific methods. In this talk, I will describe efforts by my group to automatically improve locality in a particular class of irregular applications, those that traverse trees. The key insight behind our approach is an abstraction of data structure traversals as operations on vectors. This abstraction lets us design transformations, predict their behavior and determine their correctness. I will present two specific transformations we are developing, “point blocking” and “traversal splicing,” and show that they can deliver substantial performance improvements when applied to several real-world irregular kernels.
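The "traversals as operations on vectors" idea can be sketched as follows (our own simplification of point blocking): instead of traversing the tree once per query point, which re-streams the tree through the cache for every point, carry a whole block of points through each node and partition it at every split, so each node is touched once per block.

```python
class Node:
    def __init__(self, split=None, left=None, right=None):
        self.split, self.left, self.right = split, left, right
        self.visits = 0   # how many times this node was loaded

def blocked_search(node, points, out):
    if node is None or not points:
        return
    node.visits += 1                      # one load serves the whole block
    if node.split is None:                # leaf: all remaining points land here
        out.extend((p, node) for p in points)
        return
    blocked_search(node.left, [p for p in points if p < node.split], out)
    blocked_search(node.right, [p for p in points if p >= node.split], out)

leaf_a, leaf_b = Node(), Node()
root = Node(split=10, left=leaf_a, right=leaf_b)
out = []
blocked_search(root, [1, 5, 12, 20], out)
print(root.visits)  # 1: one visit to the root serves all four points
```

Point-by-point traversal would have visited the root four times here; with realistic trees and blocks, that difference is the locality win.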

“Every day, almost 300 bugs appear…far too many for only the Mozilla programmers to handle” –Mozilla developer, 2005

Software quality is a pernicious problem. Although 40 years of software engineering research has provided developers considerable debugging support, actual bug repair remains a predominantly manual, and thus expensive and time-consuming, process. I will describe GenProg, a technique that uses metaheuristic search strategies to automatically fix software bugs using only a program’s source code and existing test suite. My empirical evidence demonstrates that GenProg can quickly and cheaply fix a large proportion of real-world bugs in large open-source C programs. I will also briefly discuss the atypical search space of the automatic program repair problem, and the ways it has challenged assumptions about software defects and how to fix them.

Like shared-memory multi-threaded programs, event-driven programs such as client-side web applications are susceptible to data races that are hard to reproduce and debug. Yet, these races may cause serious damage (e.g. JavaScript crashes, lost e-mails, broken UI).

Building a race detector that can find harmful races in this setting is particularly challenging due to: (i) heavy use of ad-hoc synchronization, leading to an overwhelming number of false positives; (ii) complex interaction between a large number of events, which can render current analyzers impractical; and (iii) the need to precisely capture the happens-before relation, which is assembled from a diverse set of sources.

In this talk, I will present EventRacer, a dynamic race detector that addresses these challenges and finds real bugs in web applications. We focus on the key points that made EventRacer possible.

We first present a scalable algorithm that uses graph connectivity based on chain-decomposition to find races in long executions. This algorithm significantly outperforms existing state-of-the-art detectors, in both space and time. We then define and show how to find uncovered races — a special class of races that are not affected by user-written ad-hoc synchronization. Uncovered races are key to reducing the number of false positives reported by the tool. We finally present an evaluation of our approach on a set of widely used websites, demonstrate that harmful races are widespread, and show how they could negatively affect user experience.
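As a baseline for the happens-before reasoning above, here is the classic vector-clock formulation (the talk's contribution, chain decomposition, is a more scalable encoding of the same relation; this sketch and its names are ours): two accesses to the same variable race if at least one is a write and neither access's clock happens-before the other's.

```python
def leq(a, b):
    """Vector-clock happens-before: a <= b pointwise."""
    return all(x <= y for x, y in zip(a, b))

def races(access_a, access_b):
    """An access is (variable, is_write, vector_clock)."""
    (_, write_a, clock_a), (_, write_b, clock_b) = access_a, access_b
    if not (write_a or write_b):
        return False          # two reads never race
    # Race iff the accesses are unordered by happens-before.
    return not leq(clock_a, clock_b) and not leq(clock_b, clock_a)

# Handler 1 writes x at clock [1,0]; a concurrent handler 2 reads x at [0,1].
print(races(("x", True, [1, 0]), ("x", False, [0, 1])))  # True
```

The cost of this representation, O(number of events) per clock, is exactly what chain decomposition attacks in long browser executions.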

Rust is a new programming language targeting systems-level applications. Rust offers a level of control over performance similar to that of C++, but guarantees type soundness, memory safety, and data-race freedom. One of Rust’s distinguishing features is that, like C++, it supports stack allocation and does not require the use of a garbage collector.

This talk covers the basics of Rust. We show how Rust’s type system guarantees memory safety, and how these same techniques can be generalized to provide data-race freedom.