To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to ... [more ▼]

To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to provide a framework for dissecting malware and locating malicious program fragments within app code in order to build a comprehensive dataset of malicious samples. Towards addressing this need, we propose in this work a tool-based approach called HookRanker, which provides ranked lists of potentially malicious packages based on the way malware behaviour code is triggered. With experiments on a ground truth of piggybacked apps, we are able to automatically locate the malicious packages from piggybacked Android apps with an accuracy@5 of 83.6% for such packages that are triggered through method invocations and an accuracy@5 of 82.2% for such packages that are triggered independently. [less ▲]

Nowadays, a significant portion of the total energy consumption is attributed to the buildings sector. In order to save energy and protect the environment, energy consumption in buildings must be more ... [more ▼]

Nowadays, a significant portion of the total energy consumption is attributed to the buildings sector. In order to save energy and protect the environment, energy consumption in buildings must be more efficient. At the same time, buildings should offer the same (if not more) comfort to their occupants. Consequently, modern buildings have been equipped with various sensors and actuators and interconnected control systems to meet occupants’ requirements. Unfortunately, so far, Building Automation Systems data have not been well-exploited due to technical and cost limitations. Yet, it can be exceptionally beneficial to take full advantage of the data flowing inside buildings in order to diagnose issues, explore solutions and improve occupant-building interactions. This paper presents a plug-and-play and holistic data mining framework named PHoliData for smart buildings to collect, store, visualize and mine useful information and domain knowledge from data in smart buildings. PHoliData allows non technical experts to easily explore and understand their buildings with minimum IT support. An architecture of this framework has been introduced and a prototype has been implemented and tested against real-world settings. Discussions with industry experts have suggested the system to be extremely helpful for understanding buildings, since it can provide hints about energy efficiency improvements. Finally, extensive experiments have demonstrated the feasibility of such a framework in practice and its advantage and potential for buildings operators. [less ▲]

In this work, we investigate the practice of patch construction in the Linux kernel development, focusing on the differences between three patching processes: (1) patches crafted entirely manually to fix ... [more ▼]

In this work, we investigate the practice of patch construction in the Linux kernel development, focusing on the differences between three patching processes: (1) patches crafted entirely manually to fix bugs, (2) those that are derived from warnings of bug detection tools, and (3) those that are automatically generated based on fix patterns. With this study, we provide to the research community concrete insights on the practice of patching as well as how the development community is currently embracing research and commercial patching tools to improve productivity in repair. The result of our study shows that tool-supported patches are increasingly adopted by the developer community while manually-written patches are accepted more quickly. Patch application tools enable developers to remain committed to contributing patches to the code base. Our findings also include that, in actual development processes, patches generally implement several change operations spread over the code, even for patches fixing warnings by bug detection tools. Finally, this study has shown that there is an opportunity to directly leverage the output of bug detection tools to readily generate patches that are appropriate for fixing the problem, and that are consistent with manually-written patches. [less ▲]

Android malware is now pervasive and evolving rapidly. Thousands of malware samples are discovered every day with new models of attacks. The growth of these threats has come hand in hand with the ... [more ▼]

Android malware is now pervasive and evolving rapidly. Thousands of malware samples are discovered every day with new models of attacks. The growth of these threats has come hand in hand with the proliferation of collective repositories sharing the latest specimens. Having access to a large number of samples opens new research directions aiming at efficiently vetting apps. However, automatically inferring a reference ground-truth from those repositories is not straightforward and can inadvertently lead to unforeseen misconceptions. On the one hand, samples are often mis-labeled as different parties use distinct naming schemes for the same sample. On the other hand, samples are frequently mis-classified due to conceptual errors made during labeling processes. In this paper, we analyze the associations between all labels given by different vendors and we propose a system called EUPHONY to systematically unify common samples into family groups. The key novelty of our approach is that no a-priori knowledge on malware families is needed. We evaluate our approach using reference datasets and more than 0.4 million additional samples outside of these datasets. Results show that EUPHONY provides competitive performance against the state-of-the-art. [less ▲]

App repackaging is a common threat in the Android ecosystem. To face this threat, the literature now includes a large body of work proposing approaches for identifying repackaged apps. Unfortunately ... [more ▼]

App repackaging is a common threat in the Android ecosystem. To face this threat, the literature now includes a large body of work proposing approaches for identifying repackaged apps. Unfortunately, although most research involves pairwise similarity comparison to distinguish repackaged apps from their “original” counterparts, no work has considered the threat to validity of not being able to discover the true original apps. We provide in this paper preliminary insights of an investigation into the Multi-Generation Repackaging Hypothesis: is the original in a repackaging process the outcome of a previous repackaging process? Leveraging the Androzoo dataset of over 5 million Android apps, we validate this hypothesis in the wild, calling upon the community to take this threat into account in new solutions for repackaged app detection. [less ▲]

in Abstract book of the 4th IEEE/ACM International Conference on Mobile Software Engineering and Systems (MobileSoft 2017) (2017, May)

To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to ... [more ▼]

To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers are increasingly required to have a deep understanding of malware. There is thus a need to provide a framework for dissecting malware and locating malicious program fragments within app code in order to build a comprehensive dataset of malicious samples. Towards addressing this need, we propose in this work a tool-based approach called HookRanker, which provides ranked lists of potentially malicious packages based on the way malware behaviour code is triggered. With experiments on a ground truth set of piggybacked apps, we are able to automatically locate the malicious packages from piggybacked Android apps with an accuracy of 83.6% in verifying the top five reported items. [less ▲]

The Android packaging model offers adequate opportunities for attackers to inject malicious code into popular benign apps, attempting to develop new malicious apps that can then be easily spread to a ... [more ▼]

The Android packaging model offers adequate opportunities for attackers to inject malicious code into popular benign apps, attempting to develop new malicious apps that can then be easily spread to a large user base. Despite the fact that the literature has already presented a number of tools to detect piggybacked apps, there is still lacking a comprehensive investigation on the piggybacking processes. To fill this gap, in this work, we collect a large set of benign/piggybacked app pairs that can be taken as benchmark apps for further investigation. We manually look into these benchmark pairs for understanding the characteristics of piggybacking apps and eventually we report 20 interesting findings. We expect these findings to initiate new research directions such as practical and scalable piggybacked app detection, explainable malware detection, and malicious code location. [less ▲]

As the concept of Internet of Things (IoT) develops, buildings are equipped with increasingly heterogeneous sensors to track building status as well as occupant activities. As users become more and more ... [more ▼]

As the concept of Internet of Things (IoT) develops, buildings are equipped with increasingly heterogeneous sensors to track building status as well as occupant activities. As users become more and more concerned with their privacy in buildings, explicit sensing techniques can lead to uncomfortableness and resistance from occupants. In this paper, we adapt a sensing by proxy paradigm that monitors building status and coarse occupant activities through agglomerative clustering of indoor temperature movements. Through extensive experimentation on 86 classrooms, offices and labs in a five-story school building in western Europe, we prove that indoor temperature movements can be leveraged to infer latent information about indoor environments, especially about rooms' relative physical locations and rough type of occupant activities. Our results evidence a cost-effective approach to extending commercial building control systems and gaining extra relevant intelligence from such systems. [less ▲]

The Android packaging model offers ample opportunities for malware writers to piggyback malicious code in popular apps, which can then be easily spread to a large user base. Although recent research has ... [more ▼]

The Android packaging model offers ample opportunities for malware writers to piggyback malicious code in popular apps, which can then be easily spread to a large user base. Although recent research has produced approaches and tools to identify piggybacked apps, the literature lacks a comprehensive investigation into such phenomenon. We fill this gap by 1) systematically building a large set of piggybacked and benign apps pairs, which we release to the community, 2) empirically studying the characteristics of malicious piggybacked apps in comparison with their benign counterparts, and 3) providing insights on piggybacking processes. Among several findings providing insights, analysis techniques should build upon to improve the overall detection and classification accuracy of piggybacked apps, we show that piggybacking operations not only concern app code but also extensively manipulates app resource files, largely contradicting common beliefs. We also find that piggybacking is done with little sophistication, in many cases automatically, and often via library code. [less ▲]

in Abstract book of the 16th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) (2017)

App updates and repackaging are recurrent in the Android ecosystem, filling markets with similar apps that must be identified and analysed to accelerate user adoption, improve development efforts, and ... [more ▼]

App updates and repackaging are recurrent in the Android ecosystem, filling markets with similar apps that must be identified and analysed to accelerate user adoption, improve development efforts, and prevent malware spreading. Despite the existence of several approaches to improve the scalability of detecting repackaged/cloned apps, researchers and practitioners are eventually faced with the need for a comprehensive pairwise comparison to understand and validate the similarities among apps. This paper describes the design of SimiDroid, a framework for multi-level comparison of Android apps. SimiDroid is built with the aim to support the understanding of similarities/changes among app versions and among repackaged apps. In particular, we demonstrate the need and usefulness of such a framework based on different case studies implementing different analysing scenarios for revealing various insights on how repackaged apps are built. We further show that the similarity comparison plugins implemented in SimiDroid yield more accurate results than the state-of-the-art. [less ▲]

Context: State-of-the-art works on automated detection of Android malware have leveraged app descriptions to spot anomalies w.r.t the functionality implemented, or have used data flow information as a ... [more ▼]

Context: State-of-the-art works on automated detection of Android malware have leveraged app descriptions to spot anomalies w.r.t the functionality implemented, or have used data flow information as a feature to discriminate malicious from benign apps. Although these works have yielded promising performance, we hypothesize that these performances can be improved by a better understanding of malicious behavior. Objective: To characterize malicious apps, we take into account both information on app descriptions, which are indicative of apps’ topics, and information on sensitive data flow, which can be relevant to discriminate malware from benign apps. Method: In this paper, we propose a topic-specific approach to malware comprehension based on app descriptions and data-flow information. First, we use an advanced topic model, adaptive LDA with GA, to cluster apps according to their descriptions. Then, we use information gain ratio of sensitive data flow information to build so-called “topic-specific data flow signatures”. Results: We conduct an empirical study on 3691 benign and 1612 malicious apps. We group them into 118 topics and generate topic-specific data flow signature. We verify the effectiveness of the topic-specific data flow signatures by comparing them with the overall data flow signature. In addition, we perform a deeper analysis on 25 representative topic-specific signatures and yield several implications. Conclusion: Topic-specific data flow signatures are efficient in highlighting the malicious behavior, and thus can help in characterizing malware. [less ▲]

Context: Static analysis exploits techniques that parse program source code or bytecode, often traversing program paths to check some program properties. Static analysis approaches have been proposed for ... [more ▼]

Context: Static analysis exploits techniques that parse program source code or bytecode, often traversing program paths to check some program properties. Static analysis approaches have been proposed for different tasks, including for assessing the security of Android apps, detecting app clones, automating test cases generation, or for uncovering non-functional issues related to performance or energy. The literature thus has proposed a large body of works, each of which attempts to tackle one or more of the several challenges that program analysers face when dealing with Android apps. Objective: We aim to provide a clear view of the state-of-the-art works that statically analyse Android apps, from which we highlight the trends of static analysis approaches, pinpoint where the focus has been put, and enumerate the key aspects where future researches are still needed. Method: We have performed a systematic literature review (SLR) which involves studying 124 research papers published in software engineering, programming languages and security venues in the last 5 years (January 2011 - December 2015). This review is performed mainly in five dimensions: problems targeted by the approach, fundamental techniques used by authors, static analysis sensitivities considered, android characteristics taken into account and the scale of evaluation performed. Results: Our in-depth examination has led to several key findings: 1) Static analysis is largely performed to uncover security and privacy issues; 2) The Soot framework and the Jimple intermediate representation are the most adopted basic support tool and format, respectively; 3) Taint analysis remains the most applied technique in research approaches; 4) Most approaches support several analysis sensitivities, but very few approaches consider path-sensitivity; 5) There is no single work that has been proposed to tackle all challenges of static analysis that are related to Android programming; and 6) Only a small portion of state-of-the-art works have made their artefacts publicly available. Conclusion: The research community is still facing a number of challenges for building approaches that are aware altogether of implicit-Flows, dynamic code loading features, reflective calls, native code and multi-threading, in order to implement sound and highly precise static analyzers. [less ▲]

Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this ... [more ▼]

Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this paper, we present Code voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch problem when dealing with free-form code search queries. Our approach leverages common developer questions and the associated expert answers to augment user queries with the relevant, but missing, structural code entities in order to improve the performance of matching relevant code examples within large code repositories. To instantiate this approach, we build GitSearch, a code search engine, on top of GitHub and StackOverflow Q\&A data. We evaluate GitSearch in several dimensions to demonstrate that (1) its code search results are correct with respect to user-accepted answers; (2) the results are qualitatively better than those of existing Internet-scale code search engines; (3) our engine is competitive against web search engines, such as Google, in helping users complete solve programming tasks; and (4) GitSearch provides code examples that are acceptable or interesting to the community as answers for StackOverflow questions. [less ▲]

in The 15th International Symposium on Intelligent Data Analysis (2016, October)

The abundance of time series data in various domains and their high dimensionality characteristic are challenging for harvesting useful information from them. To tackle storage and processing challenges ... [more ▼]

The abundance of time series data in various domains and their high dimensionality characteristic are challenging for harvesting useful information from them. To tackle storage and processing challenges, compression-based techniques have been proposed. Our previous work, Domain Series Corpus (DSCo), compresses time series into symbolic strings and takes advantage of language modeling techniques to extract from the training set knowledge about different classes. However, this approach was flawed in practice due to its excessive memory usage and the need for a priori knowledge about the dataset. In this paper we propose DSCo-NG, which reduces DSCo’s complexity and offers an efficient (linear time complexity and low memory footprint), accurate (performance comparable to approaches working on uncompressed data) and generic (so that it can be applied to various domains) approach for time series classification. Our confidence is backed with extensive experimental evaluation against publicly accessible datasets, which also offers insights on when DSCo-NG can be a better choice than others. [less ▲]

in The 32nd International Conference on Software Maintenance and Evolution (ICSME) (2016, October)

As Android becomes a de-facto choice of development platform for mobile apps, developers extensively leverage its accompanying Software Development Kit to quickly build their apps. This SDK comes with a ... [more ▼]

As Android becomes a de-facto choice of development platform for mobile apps, developers extensively leverage its accompanying Software Development Kit to quickly build their apps. This SDK comes with a set of APIs which developers may find limited in comparison to what system apps can do or what framework developers are preparing to harness capabilities of new generation devices. Thus, developers may attempt to explore in advance the normally “inaccessible” APIs for building unique API-based functionality in their app. The Android programming model is unique in its kind. Inaccessible APIs, which however are used by developers, constitute yet another specificity of Android development, and is worth investigating to understand what they are, how they evolve over time, and who uses them. To that end, in this work, we empirically investigate 17 important releases of the Android framework source code base, and we find that inaccessible APIs are commonly implemented in the Android framework, which are further neither forward nor backward compatible. Moreover, a small set of inaccessible APIs can eventually become publicly accessible, while most of them are removed during the evolution, resulting in risks for such apps that have leveraged inaccessible APIs. Finally, we show that inaccessible APIs are indeed accessed by third-party apps, and the official Google Play store has tolerated the proliferation of apps leveraging inaccessible API methods. [less ▲]

in Proceedings of 21st IEEE International Conference on Emerging Technologies and Factory Automation ETFA 2016 (2016, September 06)

Unmanned Aerial Vehicles are currently investigated as an important sub-domain of robotics, a fast growing and truly multidisciplinary research field. UAVs are increasingly deployed in real-world settings ... [more ▼]

Unmanned Aerial Vehicles are currently investigated as an important sub-domain of robotics, a fast growing and truly multidisciplinary research field. UAVs are increasingly deployed in real-world settings for missions in dangerous environments or in environments which are challenging to access. Combined with autonomous flying capabilities, many new possibilities, but also challenges, open up. To overcome the challenge of early identification of degradation, machine learning based on flight features is a promising direction. Existing approaches build classifiers that consider their features to be correlated. This prevents a fine-grained detection of degradation for the different hardware components. This work presents an approach where the data is considered uncorrelated and, using machine learning techniques, allows the precise identification of UAV’s damages. [less ▲]

in The 31st IEEE/ACM International Conference on Automated Software (ASE) (2016, September)

We demonstrate the benefits of DroidRA, a tool for taming reflection in Android apps. DroidRA first statically extracts reflection-related object values from a given Android app. Then, it leverages the ... [more ▼]

We demonstrate the benefits of DroidRA, a tool for taming reflection in Android apps. DroidRA first statically extracts reflection-related object values from a given Android app. Then, it leverages the extracted values to boost the app in a way that reflective calls are no longer a challenge for existing static analyzers. This is achieved through a bytecode instrumentation approach, where reflective calls are supplemented with explicit traditional Java method calls which can be followed by state-of-the-art analyzers which do not handle reflection. Instrumented apps can thus be completely analyzed by existing static analyzers, which are no longer required to be modified to support reflection-aware analysis. The video demo of DroidRA can be found at https://youtu.be/-HW0V68aAWc [less ▲]

Unmanned Aerial Vehicles are currently investigated as an important sub-domain of robotics, a fast growing and truly multidisciplinary research field. UAVs are increasingly deployed in real-world settings ... [more ▼]

Unmanned Aerial Vehicles are currently investigated as an important sub-domain of robotics, a fast growing and truly multidisciplinary research field. UAVs are increasingly deployed in real-world settings for missions in dangerous environments or in environments which are challenging to access. Combined with autonomous flying capabilities, many new possibilities, but also challenges, open up. To overcome the challenge of early identification of degradation, machine learning based on flight features is a promising direction. Existing approaches build classifiers that consider their features to be correlated. This prevents a fine-grained detection of degradation for the different hardware components. This work presents an approach where the data is considered uncorrelated and, using machine learning techniques, allows the precise identification of UAV’s damages. [less ▲]

In widely used mobile operating systems a single vulnerability can threaten the security and privacy of billions of users. Therefore, identifying vulnerabilities and fortifying software systems requires ... [more ▼]

In widely used mobile operating systems a single vulnerability can threaten the security and privacy of billions of users. Therefore, identifying vulnerabilities and fortifying software systems requires constant attention and effort. However, this is costly and it is almost impossible to analyse an entire code base. Thus, it is necessary to prioritize efforts towards the most likely vulnerable areas. A first step in identifying these areas is to profile vulnerabilities based on previously reported ones. To investigate this, we performed a manual analysis of Android vulnerabilities, as reported in the National Vulnerability Database for the period 2008 to 2014. In our analysis, we identified a comprehensive list of issues leading to Android vulnerabilities. We also point out characteristics of the locations where vulnerabilities reside, the complexity of these locations and the complexity to fix the vulnerabilities. To enable future research, we make available all of our data. [less ▲]