Columbia Engineering Stars at Sigmetrics 2014

Jun 17 2014 | By Holly Evarts

A number of professors and graduate students at the Engineering School are presenting papers at this year’s ACM SIGMETRICS 2014, the annual flagship conference of the ACM’s (Association for Computing Machinery) Special Interest Group on Measurement and Evaluation. As announced earlier today, Computer Science Professor Jason Nieh and his team won a best paper award for their development of PlayDrone, a new tool that reveals critical security problems in Google Play. Held this year in Austin, TX, from June 16-20, SIGMETRICS is widely considered the world’s leading conference on performance analysis and modeling. The conference is highly selective and has featured many groundbreaking papers and presentations on performance analysis, measurement, and modeling of computer systems.

“Columbia is the most represented institution, with five papers and a poster co-authored by our students and faculty,” says Augustin Chaintreau, assistant professor of computer science and a co-author of one of the papers. “With 40 papers accepted out of 238 submissions, one could even say that, if acceptance were purely left to chance, this might happen only once in 7,000 years!”

The papers, which include one from a new professor joining Columbia Engineering this summer, are:

This paper describes PlayDrone, a tool developed by the paper’s authors that uses various hacking techniques to circumvent Google security, successfully downloading all Google Play apps and recovering their sources. Using PlayDrone, the researchers demonstrate a crucial security problem in Google Play, the official app store for millions of Android users. PlayDrone scales by simply adding more servers and is fast enough to crawl Google Play on a daily basis, downloading more than 1.1 million Android apps and decompiling over 880,000 free applications. The team discovered a wealth of new information about the content in Google Play, including a critical security problem: developers often store their secret keys (credentials similar to usernames and passwords) in their apps’ software, and these can then be extracted by anyone and used to maliciously steal user data or resources from service providers such as Amazon and Facebook. These vulnerabilities can affect users even if they are not actively running the Android apps.
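To give a rough sense of the kind of leak described above, the sketch below scans decompiled source text for credential-like strings. This is not PlayDrone’s actual detection code: the AWS access-key-ID format is publicly documented, while the generic pattern and the function name `scan_source` are simplified assumptions for illustration.

```python
import re

# Illustrative patterns for credentials that may appear verbatim in
# decompiled app source. The AWS access-key-ID format (AKIA + 16
# uppercase alphanumerics) is public; the generic pattern is a
# simplified stand-in for other hard-coded API keys.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"(?i)\b(?:api|secret)[_-]?key\s*[:=]\s*[\"']([A-Za-z0-9/+=_-]{16,})[\"']"
    ),
}

def scan_source(text):
    """Return (pattern_name, matched_string) pairs found in one source file."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((name, match))
    return hits
```

Because decompiled code is plain text, even a scan this simple can surface keys that were never meant to ship inside an app, which is why the paper recommends keeping such credentials server-side.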

John Kymissis and Gil Zussman

Movers and Shakers: Kinetic Energy Harvesting for the Internet of Things

This paper focuses on the availability and properties of human and object motion energy in commonplace “Internet of Things” scenarios. The researchers examined the properties of common human motions, such as walking, running, and bicycling, using a 40-participant human-motion dataset originally collected for activity recognition purposes. They also studied the kinetic energy available from normal human routines, based on over 200 hours of acceleration data (the collected dataset is available via CRAWDAD). The study demonstrates unexpectedly low energy availability associated with some high-amplitude periodic object motions. It also describes and evaluates energy allocation algorithms for wearable energy harvesting devices that take into account practical design considerations. The paper was recently featured in MIT Technology Review.

Retransmissions represent a primary failure recovery mechanism in modern communication systems. Similarly, processor sharing (PS) is a commonly used scheduling policy that guarantees fair resource allocation among multiple users. In this study, the researchers identified a new source of instability in modern communication systems: sharing in the presence of retransmissions. In particular, they show that PS-based scheduling leads to complete instability when jobs need to restart after failures, no matter how lightly the system is loaded. Their results suggest that first-come-first-serve (FCFS) scheduling should be preferred in such systems in order to eliminate these instabilities.

We use social media increasingly often, but is social media efficient? How can its efficiency be defined? Can it be scientifically measured and mathematically explained? While several related studies rely on proprietary data and/or experiments with real subjects that are prohibitively costly, this paper offers the first answer to these questions using publicly available datasets, making the study reproducible. The researchers posit that the key lies in the ability of social media users to filter information. Using 11 datasets totaling millions of tweets that mention URLs, they show that users who post less frequently disproportionately pick what is most popular. By choosing to follow users with different levels of posting activity, one can therefore receive either mostly blockbusters or more niche items. Since the same phenomenon appears across different social media, the researchers analyzed whether a single cause is compatible with all of these observations: they found that a simple model incorporating users’ and bloggers’ self-interests reproduces this efficient information diffusion.

Javad Ghaderi

Serving Content with Unknown Demand: the High-Dimensional Regime

Sharayu Moharir (The University of Texas at Austin); Javad Ghaderi (The University of Texas at Austin; joining Columbia Engineering’s Department of Electrical Engineering July 1); Sujay Sanghavi (The University of Texas at Austin); Sanjay Shakkottai (The University of Texas at Austin)

Two trends have emerged in the setting of large-scale content delivery networks like YouTube. First, there has been a sharp rise not just in the volume of data, but also in the number of content types, such as individual videos. Second, the popularity and demand for most of this content are uneven and ephemeral; in many cases, a particular content type, such as a specific video, becomes popular for a short interval of time after which the demand disappears. Naturally, the storage and content replication strategy (what content should be stored on each of the servers) forms an important part of the system architecture. This paper looks at efficient content storage policies in such high-dimensional settings. The study’s key result is that, in contrast to classical settings, a “learn-and-optimize” approach, which first estimates demands and then uses those estimates to design optimal content placement policies, is strictly suboptimal. The researchers propose simple adaptive strategies that outperform any content placement strategy based on the “learn-and-optimize” approach.