The goal of this summer school was to provide a forum for learning and discussing all aspects of computer and communication security, including foundational topics as well as cutting-edge research on network security, database security, and program security. The summer school also touched on topics in data security and user privacy. Theoreticians and practitioners from academia, government, and industry from across the globe conducted tutorials, presented lectures, and participated in panel discussions on open problems that are related to security and privacy concerns in society.

The school focused on building a strong foundation for students who are new to the field by balancing formal methods with practical aspects of ensuring security and privacy. Specific topics included the formalization of security and privacy properties, practical aspects of deploying security solutions across networks and federated systems (including clouds), analysis of programs for information flow security, logic for access control, network security issues such as intrusion detection, identification and prevention of malware, formal analysis of cryptographic protocols and implementations, and important privacy-preserving techniques.

Talk Abstracts

Organizations that collect, use and share personal information have to ensure that they do so in a manner that complies with privacy regulations and respects privacy promises made to the information subjects. A recent survey from Deloitte and the Ponemon Institute recognizes this problem as one of the greatest challenges facing organizations today. I will report on work from my group over the last few years that addresses this problem.

First, we propose PrivacyLFP, a richly expressive first-order logic and signature for the specification of privacy policies, with support for self-reference, purposes of uses and disclosures, and real-time provisions and obligations. We present the first complete formalization of all transmission-related clauses of HIPAA and GLBA using this logic.

Second, we develop an iterative algorithm for enforcing policies expressed in this logic. The algorithm, which we name "reduce", addresses two fundamental challenges in compliance checking that arise in practice. First, in order to be applicable to realistic policies, reduce operates on policies expressed in a first-order logic that allows restricted quantification over infinite domains. We build on ideas from logic programming to identify the restricted form of quantified formulas. The logic can, in particular, express all 84 disclosure-related clauses of the HIPAA Privacy Rule, which involve quantification over the infinite set of messages containing personal information. Second, since audit logs are inherently incomplete (they may not contain sufficient information to determine whether a policy is violated or not), reduce proceeds iteratively: in each iteration, it provably checks as much of the policy as possible over the current log and outputs a residual policy that can only be checked when the log is extended with additional information. We prove correctness, termination, and time- and space-complexity results for reduce. We implement reduce and optimize the base implementation using two heuristics for database indexing that are guided by the syntactic structure of policies. The implementation is used to check simulated audit logs for compliance with the HIPAA Privacy Rule. Our experimental results demonstrate that the algorithm is fast enough to be used in practice.
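To make the iterative idea concrete, here is a toy Python sketch of residual-policy checking over incomplete logs. It is illustrative only: the `Atom` class, the atom names, and the flat-list policy representation are invented for this example, and the real reduce algorithm operates on quantified first-order formulas, not lists of atoms.

```python
# Toy sketch of iterative residual-policy checking in the spirit of
# "reduce" (not the actual algorithm): a policy is a list of atomic
# checks; each pass evaluates the atoms the current log can decide
# and returns the undecidable remainder as a residual policy.

class Atom:
    def __init__(self, name, holds):
        self.name = name      # which log fact this atom needs
        self.holds = holds    # predicate over that fact's value

def check_iteration(policy, log):
    """Evaluate each atom against the (possibly incomplete) log.

    Returns (verdict, residual): verdict is False if some atom is
    provably violated, True if all atoms are decided and satisfied,
    and None if undecided atoms remain in the residual.
    """
    residual = []
    for atom in policy:
        if atom.name not in log:            # log lacks the needed fact
            residual.append(atom)           # defer to a later iteration
        elif not atom.holds(log[atom.name]):
            return False, []                # definite violation
    if residual:
        return None, residual               # re-check on a richer log
    return True, []

# Example: a disclosure is compliant if the recipient is a covered
# entity AND (a subjective atom) the purpose is treatment.
policy = [
    Atom("recipient_covered", lambda v: v is True),
    Atom("purpose_is_treatment", lambda v: v == "treatment"),
]

# First pass: the log only records the recipient, so the subjective
# purpose atom survives into the residual policy.
verdict, residual = check_iteration(policy, {"recipient_covered": True})
```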

Since certain predicates in privacy policies are subjective (capturing, for example, beliefs and purposes), they cannot be automatically checked and will always remain in the residual policy; determining their truth values requires external oracles. In practice, these predicates are checked by human auditors. We develop the first principled learning-theoretic foundation for such audits. Our model takes pragmatic considerations into account, in particular, the periodic nature of audits, a budget that constrains the number of actions that the defender can inspect, and a loss function that captures the economic impact of detected and missed violations on the organization. We develop an efficient audit mechanism that provably minimizes regret (a learning concept) for the auditor. The mechanism learns from experience to guide the auditing effort.
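The regret-minimization idea can be illustrated with a standard multiplicative-weights update. This is a textbook sketch, not the audit mechanism from the talk: candidate budget allocations act as "experts", and the auditor shifts weight toward allocations that would have incurred low loss.

```python
# Minimal multiplicative-weights sketch of regret minimization (the
# general learning idea behind such audit mechanisms; the mechanism in
# the talk is more involved).  Each expert is a candidate allocation
# of the inspection budget; low-loss experts gain weight over rounds.

def update_weights(weights, losses, eta=0.5):
    """One round of multiplicative weights: penalize lossy experts,
    then renormalize so the weights remain a distribution."""
    new = [w * (1 - eta * l) for w, l in zip(weights, losses)]
    total = sum(new)
    return [w / total for w in new]

weights = [1 / 3] * 3          # three candidate audit strategies
rounds = [                     # per-round loss of each strategy
    [0.9, 0.1, 0.5],
    [0.8, 0.2, 0.6],
    [0.7, 0.1, 0.4],
]
for losses in rounds:
    weights = update_weights(weights, losses)
# Weight concentrates on strategy 1, the consistently low-loss expert.
```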

Major Internet, phone, and web application providers are all, for the most part, in bed with the US government. They all routinely disclose their customers' communications and other private data to law enforcement and intelligence agencies. Worse, firms like Google and Microsoft specifically log data in order to assist the government, while AT&T and Verizon are paid $1.8 million per year to provide the FBI with real-time access to customer communications records. How many government requests do major ISPs receive for their customers' communications each year? How many do they comply with? How many do they fight? How much do they charge for the surveillance assistance they provide? Who knows. Most companies have a strict policy of not discussing such topics.

You might assume that the law gives companies very little wiggle room – when they are required to provide data, they must do so. This is true. However, companies have a huge amount of flexibility in the way they design their networks, in the amount of data they retain by default, the emergency circumstances in which they share data without a court order, and the degree to which they fight unreasonable requests.

The differences in the privacy practices of the major players in the telecommunications and Internet applications market are significant: Some firms retain identifying data for years, while others retain no data at all; some voluntarily provide the government access to user data (Verizon even argued in court that it has a free speech right to give the NSA access to calling records), while other companies refuse to disclose data without a court order; some companies charge government agencies when they request user data, while others disclose it for free. For an individual later investigated by the government, the data retention practices adopted by their phone company or email provider can significantly impact their freedom.

Unfortunately, although many companies claim to care about end-user privacy, and some even claim to compete on their privacy features, none seem willing to compete on the extent to which they assist or resist the government in its surveillance activities. Because information about each firm’s practices is not publicly known, consumers cannot vote with their dollars and pick the service providers that best protect their privacy.

This talk will pierce the veil of secrecy surrounding these practices. Based upon a combination of Freedom of Information Act requests, off-the-record conversations with industry lawyers, and investigative journalism, the practices of many of these firms will be revealed.

Can you hear me now? What we know about law enforcement surveillance of Internet and mobile communications
Christopher Soghoian (Indiana University)

Wiretaps, at least in Hollywood, often involve FBI agents hiding in an unmarked van outside a suspect’s home, crouched over a set of headphones as they listen to telephone calls taking place inside. Similarly, the seizure of digital evidence often involves a pre-dawn raid by a team of armed agents, who later emerge from the target’s home with computers, documents, and various types of storage media. In the movies, law enforcement agents obtain the evidence themselves, usually at great personal risk.

While these investigative methods look great on the big screen, they are largely a relic of the past – from an era before modern telecommunications providers, cloud computing and mobile phones. These days, the police or FBI can obtain most of the data they need from the comfort and safety of their own desks, with a few clicks of a mouse, a fax, or a phone call to a telecommunications or Internet service provider. The actual collection of evidence is now increasingly performed by the same companies that consumers rely on to transmit and store their phone calls, emails and documents.

Such third-party-facilitated surveillance has become a routine tool for law enforcement agencies in the United States. There are likely hundreds of thousands of such requests per year. Unfortunately, there are few detailed statistics documenting the use of many modern surveillance methods. As a result, the true scale of law enforcement surveillance, although clearly vast, remains largely shielded from public view.

In this talk, I examine the existing electronic surveillance reporting requirements and the reports that have been created as a result. Some of these have been released to the public, but many have only come to light as a result of Freedom of Information Act requests or leaks by government insiders. I also examine several law enforcement surveillance methods for which there are no legally mandated surveillance reports. While the information I will present is by no means complete, it does at least indicate the likely scale of such activities.

This talk introduces the compelled certificate creation attack, in which government agencies may compel a certificate authority to issue false SSL certificates that can be used by intelligence agencies to covertly intercept and hijack individuals' secure Web-based communications. Although we do not have direct evidence that this form of active surveillance is taking place in the wild, we show how products already on the market are geared and marketed towards this kind of use, suggesting such attacks may occur in the future, if they are not already occurring. Finally, we introduce a lightweight browser add-on that detects and thwarts such attacks.
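One plausible detection heuristic along these lines can be sketched as follows. The caching scheme, the function names, and the cross-jurisdiction rule are illustrative assumptions for this sketch, not necessarily the add-on's actual logic.

```python
# Toy sketch of compelled-certificate detection: remember the issuing
# CA (and its country) first observed for each site, and warn when a
# later connection presents a certificate from a different issuer in a
# different jurisdiction -- a possible compelled certificate.  The
# heuristic and its names are illustrative, not the add-on's code.

cache = {}   # hostname -> (issuer, issuer_country) first observed

def check_certificate(host, issuer, country):
    """Return 'ok' on first sight or a consistent issuer; return
    'warn' when the issuer changes across jurisdictions."""
    if host not in cache:
        cache[host] = (issuer, country)     # trust on first use
        return "ok"
    old_issuer, old_country = cache[host]
    if issuer != old_issuer and country != old_country:
        return "warn"                       # cross-jurisdiction change
    return "ok"
```

A same-country issuer change (e.g., an ordinary certificate renewal with a different CA) is deliberately not flagged in this sketch, which keeps the false-alarm rate low at the cost of missing domestic compelled certificates.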

The browser has become the de facto platform for everyday computation. Among the many potential attacks that target or exploit browsers, vulnerabilities in browser extensions have received relatively little attention. Currently, extensions are vetted by manual inspection, which does not scale well and is subject to human error. In this paper, we present VEX, a framework for highlighting potential security vulnerabilities in browser extensions by applying static information-flow analysis to the JavaScript code used to implement extensions. We describe several patterns of flows as well as unsafe programming practices that may lead to privilege escalations in Firefox extensions. VEX analyzes Firefox extensions for such flow patterns using high-precision, context-sensitive, flow-sensitive static analysis. We analyze thousands of browser extensions, and VEX finds six exploitable vulnerabilities, three of which were previously unknown. VEX also finds hundreds of examples of bad programming practices that may lead to security vulnerabilities. We show that compared to current Mozilla extension review tools, VEX greatly reduces the human burden for manually vetting extensions when looking for key types of dangerous flows.
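The source-to-sink flow pattern described above can be illustrated with a toy taint-propagation pass. VEX itself performs high-precision, context- and flow-sensitive analysis of real JavaScript; the three-statement mini-language here is invented purely for illustration.

```python
# Toy illustration of the flow patterns VEX looks for: taint from an
# untrusted source (e.g., web page content) propagates through
# assignments, and a flow into a dangerous sink (e.g., eval) is a
# potential privilege escalation.

SOURCES = {"window.content"}     # untrusted, attacker-influenced
SINKS = {"eval"}                 # privilege-escalation sink

def find_flows(stmts):
    """stmts: list of ('assign', dst, src) or ('call', fn, arg).
    Returns the (tainted_arg, sink) flows found."""
    tainted = set(SOURCES)
    flows = []
    for stmt in stmts:
        if stmt[0] == "assign":
            _, dst, src = stmt
            if src in tainted:
                tainted.add(dst)            # taint propagates to dst
        elif stmt[0] == "call":
            _, fn, arg = stmt
            if fn in SINKS and arg in tainted:
                flows.append((arg, fn))     # source-to-sink flow
    return flows

program = [
    ("assign", "text", "window.content"),   # read page-controlled data
    ("assign", "code", "text"),
    ("call", "eval", "code"),               # evaluate with privileges
]
```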

I will describe an algorithm for automated worm detection that we first deployed in software as part of the EarlyBird system at UCSD, and later as part of a chipset running at 20 Gbps. The algorithm works by identifying certain behavioral characteristics of worm-like content: while these characteristics are simple, the challenge is to implement the algorithm at high speeds. Worm detection requires learning content strings characteristic of worms; if time permits, I will also describe the corresponding algorithm for detecting such strings in packet payloads at high speeds. This is based on an OSDI 2004 paper called "Automated Worm Fingerprinting".
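The behavioral idea can be sketched in a few lines: a content string is suspicious when it is both highly prevalent and widely dispersed across source and destination addresses. This toy version keeps exact in-memory counts, and the thresholds and class structure are illustrative; the deployed system uses sketching and sampling to run at line rate.

```python
# Simplified sketch of EarlyBird-style behavioral worm detection:
# flag a payload once it is (a) seen often and (b) seen between many
# distinct sources and destinations.  Counts here are exact; the real
# system approximates them with sketches to keep up with line rate.

from collections import defaultdict

class WormDetector:
    def __init__(self, prevalence_threshold=3, dispersion_threshold=3):
        self.prevalence = defaultdict(int)   # times content was seen
        self.sources = defaultdict(set)      # distinct senders
        self.dests = defaultdict(set)        # distinct receivers
        self.pt = prevalence_threshold
        self.dt = dispersion_threshold

    def observe(self, payload, src, dst):
        """Record one packet; return True if payload looks worm-like."""
        self.prevalence[payload] += 1
        self.sources[payload].add(src)
        self.dests[payload].add(dst)
        return (self.prevalence[payload] >= self.pt
                and len(self.sources[payload]) >= self.dt
                and len(self.dests[payload]) >= self.dt)
```

The address-dispersion requirement is what separates a spreading worm from merely popular content: a file downloaded repeatedly from one server is prevalent but not dispersed.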

Maintaining correct access control to shared resources such as file servers, wikis, and databases is an important part of enterprise network management. A combination of many factors, including high rates of churn in organizational roles, policy changes, and dynamic information-sharing scenarios, can trigger frequent updates to user permissions, leading to potential inconsistencies. We present Baaz, a distributed system that monitors updates to access control metadata, analyzes this information to alert administrators about potential security and accessibility issues, and recommends suitable changes. Baaz detects misconfigurations that manifest as small inconsistencies between a user's permissions and those of the user's peers, and prevents integrity and confidentiality vulnerabilities that could lead to insider attacks. In a deployment of our system on an organizational file server that stored confidential data, we found 10 high-level security issues that impacted 1639 out of 105682 directories. These were promptly rectified.
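The flavor of inconsistency Baaz detects can be illustrated with a toy access-control-list check. The user names, threshold, and ACL representation are invented for this sketch; the real system analyzes enterprise access-control metadata at scale and also recommends fixes.

```python
# Toy version of a Baaz-style misconfiguration check: a user whose
# permission set differs only slightly from another user's (but is
# not identical) is a candidate misconfiguration -- either an
# accessibility issue (missing access) or a security issue (extra
# access).  Representation and threshold are illustrative.

def find_outliers(acl, max_diff=1):
    """acl: {user: set of resources}.  Flag pairs of users whose
    permission sets differ by at most max_diff entries but are not
    identical."""
    issues = []
    users = sorted(acl)
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            diff = acl[u] ^ acl[v]          # symmetric difference
            if 0 < len(diff) <= max_diff:
                issues.append((u, v, sorted(diff)))
    return issues

acl = {
    "alice": {"hr", "payroll", "reports"},
    "bob":   {"hr", "payroll", "reports"},
    "carol": {"hr", "payroll"},   # missing "reports" relative to peers
}
```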

User authentication is a fundamental security primitive which is used in a variety of electronic transactions in the world today. In this lecture, I will present some of the basic techniques of user authentication and discuss the pros and cons of each technique. I will then discuss some emerging applications in the developing world for which user authentication is a key building block. The main focus of the presentation will be on branchless banking systems, designed to extend financial services in remote areas, and on building secure and easy-to-use authentication tools for these systems. We will also discuss the UID project of the Government of India and the research challenges associated with implementing secure user authentication in the manner that UID envisages.

Intrusion Prevention Systems (IPSs) are devices available on the market that check packets entering a protected network and can drop packets containing certain forbidden strings. While the string-matching and regular-expression-matching aspects of IPSs are challenging, a surprisingly complex task is handling evasion using a standard technique called normalization. I will show that the worst-case complexity of handling normalization is very large and will examine some extra assumptions that allow more efficient algorithms. This is based on a SIGCOMM 2005 paper called "Detecting Evasion Attacks at High Speeds without Reassembly".
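The evasion problem can be illustrated with a toy example: a forbidden string split across TCP segments defeats naive per-packet matching, and full reassembly, sketched below, is exactly the expensive step the talk's algorithms try to avoid. The segment representation is invented for illustration.

```python
# Toy illustration of evasion by fragmentation: an attacker splits a
# forbidden string across TCP segments so that no single packet
# contains it.  Scanning each packet in isolation misses the string;
# reassembling the byte stream (the costly normalization step) finds
# it.  Overlap policy here (later data wins) is one of several choices
# real normalizers must make.

def reassemble(segments):
    """segments: list of (offset, bytes); later data wins overlaps."""
    buf = bytearray()
    for off, data in segments:
        end = off + len(data)
        if end > len(buf):
            buf.extend(b"\x00" * (end - len(buf)))   # grow buffer
        buf[off:end] = data
    return bytes(buf)

FORBIDDEN = b"/etc/passwd"

def scan(segments):
    """Returns (per_packet_hit, reassembled_hit)."""
    per_packet = any(FORBIDDEN in data for _, data in segments)
    stream = FORBIDDEN in reassemble(segments)
    return per_packet, stream

# The forbidden string straddles the segment boundary:
segments = [(0, b"GET /etc/pa"), (11, b"sswd HTTP/1.0")]
```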

Research in computer security has historically advocated Design for Security, the principle that security must be proactively integrated into the design of a system. While examples exist in the research literature of systems that have been designed for security, there are few examples of such systems deployed in the real world. Economic and practical considerations force developers to abandon security and focus instead on functionality and performance, which are more tangible than security. As a result, large bodies of legacy code often have inadequate security mechanisms. Security mechanisms are added to legacy code on-demand using ad hoc and manual techniques, and the resulting systems are often insecure.

This talk advocates the need for techniques to retrofit systems with security mechanisms. In particular, it focuses on the problem of retrofitting legacy code with mechanisms for authorization policy enforcement. It introduces a new formalism, called fingerprints, to represent security-sensitive operations. Fingerprints are code templates that represent accesses to security-critical resources, and denote key steps needed to perform operations on these resources. This talk develops both fingerprint mining and fingerprint matching algorithms.

Fingerprint mining algorithms discover fingerprints of security-sensitive operations by analyzing source code. This talk presents two novel algorithms that use dynamic program analysis and static program analysis, respectively, to mine fingerprints. The fingerprints so mined are used by the fingerprint matching algorithm to statically locate security-sensitive operations. Program transformation is then employed to statically modify source code by adding authorization policy lookups at each location that performs a security-sensitive operation.
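As a rough illustration of fingerprint matching and retrofitting (the operation names and the subsequence-based matching here are simplifications invented for this sketch; real fingerprints are richer code templates mined by program analysis):

```python
# Toy sketch of fingerprint matching and retrofitting: a fingerprint
# is an ordered sequence of operations on a security-critical
# resource; matching locates it in code, and retrofitting inserts an
# authorization check before the operation begins.

def matches(fingerprint, code_ops):
    """Does code_ops contain fingerprint as an ordered subsequence?"""
    it = iter(code_ops)
    return all(op in it for op in fingerprint)

def insert_hooks(code_ops, fingerprints):
    """Return the op list with an authorization lookup inserted
    before the first op of each matched fingerprint."""
    starts = {fp[0] for fp in fingerprints if matches(fp, code_ops)}
    guarded = []
    for op in code_ops:
        if op in starts:
            guarded.append("check_authorization")   # retrofit hook
        guarded.append(op)
    return guarded

# Hypothetical fingerprint for a file-write operation:
FP_FILE_WRITE = ("open_file", "write_file")
code = ["parse_request", "open_file", "write_file", "close_file"]
```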

These techniques have been applied to three real-world systems. These case studies demonstrate that techniques based upon program analysis and transformation offer a principled and automated alternative to the ad hoc and manual techniques that are currently used to retrofit legacy software with security mechanisms. Time permitting, we will talk about other problems in the context of retrofitting legacy code for security. I will also indicate where ideas from model-checking have been used in this work.

In recent years, viruses and worms have started to pose threats at Internet scale in an intelligent, organized manner, enrolling millions of unsuspecting and unprepared PC owners in spamming, denial-of-service, and phishing activities. In January 2007, Vint Cerf stated that "of the 600 million computers currently on the Internet, between 100 and 150 million were already part of these botnets." A botnet is a network of malware-infected machines that are under the control of one attacker. The fundamental cause of the current situation is the limitations inherent in current detection technologies. Commercial virus scanners have low resilience to new attacks because malware writers continuously seek to evade detection through the use of obfuscation. Any malware-detection technique that can counter these attacks must be able to (1) identify malicious code under the cover of obfuscation and (2) provide some guarantee for the detection of future malware.

In my talk, I present a new approach to the detection of malicious code that addresses these requirements by taking into account the high-level program behavior without an increase in false positives. The cornerstone of this approach is a formalism called malspecs (i.e., specifications of malicious behavior) that incorporates instruction semantics to gain resilience to common obfuscations. Experimental evaluation demonstrates that our behavior-based malware-detection algorithm can detect variants of malware due to their shared malicious behaviors, while maintaining a relatively low run-time overhead (a requirement for real-time protection). Additionally, the malspec formalism enables reasoning about the resilience of a detector. In this context, I present a strategy for proving the soundness and completeness of detection algorithms. In this talk, I will also discuss a mining algorithm for specifications of malicious behavior.

Online advertising is a major economic force in the Internet today, funding a wide variety of websites and services. Today’s deployments, however, erode privacy and degrade performance as browsers wait for ad networks to deliver ads. This paper presents Privad, an online advertising system designed to be faster and more private than existing systems while filling the practical market needs of targeted advertising: ads shown in web pages; targeting based on keywords, demographics, and interests; ranking based on auctions; view and click accounting; and defense against click-fraud. Privad occupies a point in the design space that strikes a balance between privacy and practical considerations. This paper presents the design of Privad, and analyzes the pros and cons of various design decisions. It provides an informal analysis of the privacy properties of Privad. Based on microbenchmarks and traces from a production advertising platform, it shows that Privad scales to present-day needs while simultaneously improving users’ browsing experience and lowering infrastructure costs for the ad network. Finally, it reports on our implementation of Privad and deployment of over two thousand clients.