NEWS

About CSUR

ACM Computing Surveys (CSUR) publishes comprehensive, readable tutorials and survey papers that give guided tours through the literature and explain topics to those who seek to learn the basics of areas outside their specialties. These carefully planned and presented introductions are also an excellent way for professionals to develop perspectives on, and identify trends in, complex technologies. Recent issues have covered image understanding, software reusability, and object and relational database topics.

Network Functions Virtualization (NFV) and Software-Defined Networking (SDN) are new paradigms moving networks toward open software running on general-purpose network hardware. While NFV aims at virtualizing network functions and deploying them onto general-purpose hardware, SDN makes networks programmable by separating the control and data planes. NFV and SDN are complementary technologies capable of providing one network solution: SDN can provide connectivity between Virtual Network Functions (VNFs) in a flexible and automated way, whereas NFV can use SDN as part of a service function chain. Many studies propose NFV/SDN architectures for different environments, and researchers have been trying to address reliability, performance, and scalability problems through different architectural designs. This Systematic Literature Review (SLR) focuses on integrated NFV/SDN architectures and has the following goals: i) to investigate and provide an in-depth review of the state of the art of NFV/SDN architectures, ii) to synthesize their architectural designs, and iii) to identify areas for further improvement. More broadly, this SLR aims to encourage researchers to advance the current stage of development (i.e., the state of the practice) of integrated NFV/SDN architectures, as well as to shed some light on future research efforts and their challenges.

Pilot-Job systems play an important role in supporting distributed scientific computing. With the increasing importance of task-level parallelism in high-performance computing, Pilot-Job systems are also witnessing adoption beyond traditional domains. Notwithstanding their growing impact on scientific research, there is no agreed-upon definition of a Pilot-Job system and no clear understanding of the underlying abstraction and paradigm. Pilot-Job implementations have proliferated with no shared best practices or open interfaces and little interoperability. Ultimately, this is hindering the realization of the full impact of Pilot-Jobs by limiting their robustness, portability, and maintainability. This paper offers a comprehensive analysis of Pilot-Job systems, critically assessing their motivations, evolution, properties, and implementation. The three main contributions of this paper are: (i) an analysis of the motivations and evolution of Pilot-Job systems; (ii) an outline of the Pilot abstraction, its distinguishing logical components and functionalities, its terminology, and its architecture pattern; and (iii) the description of core and auxiliary properties of Pilot-Job systems and the analysis of seven exemplar Pilot-Job implementations. Together, these contributions illustrate the Pilot paradigm, its generality, and how it helps to address some challenges in distributed scientific computing.

The main achievements of spatio-temporal modelling in the field of Geographic Information Science over the past three decades are surveyed. This article offers an overview of: (i) the origins and history of Temporal Geographic Information Systems (T-GIS); (ii) relevant spatio-temporal data models proposed; (iii) the evolution of spatio-temporal modelling trends; and (iv) an analysis of the future trends and developments in T-GIS. It also presents some current theories and concepts that have emerged from the research performed, as well as a summary of the current progress and the upcoming challenges and potential research directions for T-GIS. One relevant result of this survey is the proposed taxonomy of spatio-temporal modelling trends, which classifies 186 modelling proposals surveyed from more than 1400 articles.

Stress is a major concern in daily life that imposes significant and growing health and economic costs on society every year. Stress and driving are a dangerous combination that can lead to life-threatening situations, as a large number of road traffic crashes occur every year due to driver stress. In addition, the rate of many general health issues caused by work-related chronic stress in drivers who work in public and private transport is greater than in many other occupational groups. Therefore, an in-car early warning system for driver stress is needed to continuously predict dangerous driving situations and pro-actively alert the driver, from the perspective of safe and comfortable driving. With recent developments in ambient intelligence, such as sensing technologies, pervasive devices, context recognition, and communications, it is becoming feasible to unobtrusively measure combinations of different sensed modalities to recognise driver stress automatically. This survey reviews the most recent research on automatic driver stress detection based on different sensors and data. The computational techniques that have been used in this domain for data analysis are investigated. The important methodological issues that hinder the implementation of such a system are discussed, and future research directions are offered.

Reproducibility is widely considered to be an essential requirement of the scientific process.
However, a number of serious concerns have been raised recently, questioning whether
today's computational work is adequately reproducible. In principle, it should be possible
to specify a computation to sufficient detail that anyone should be able to reproduce it exactly.
But in practice, there are fundamental, technical, and social barriers to doing so.
The many objectives and meanings of reproducibility are discussed within the context of scientific computing.
Many technical barriers to reproducibility are described, extant approaches are surveyed, and open areas of research are identified.

It is essential to find new ways of enabling experts in different disciplines to collaborate more efficiently in the development of ever more complex systems, under increasing market pressures.
One possible solution for this challenge is to use a heterogeneous model-based approach where different teams can produce their conventional models and carry out their usual mono-disciplinary analysis, but in addition, the different models can be coupled for simulation (co-simulation), allowing the study of the global behavior of the system.
Due to its potential, co-simulation is being studied in many different disciplines but with limited sharing of findings.
Our aim with this work is to summarize, bridge, and enhance future research in this multidisciplinary area.
We provide an overview of co-simulation approaches, research challenges, and research opportunities, together with a detailed taxonomy covering different aspects of the state of the art of co-simulation and a classification of work from the past five years.
The main research needs identified are: finding generic approaches for modular, stable and accurate coupling of simulation units; and expressing the adaptations required to ensure that the coupling is correct.

Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains, including climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differs from the relational data for which computational approaches have been developed in the data mining community over multiple decades, in that both spatial and temporal attributes are available in addition to the actual measurements/attributes. The presence of these attributes introduces additional challenges that need to be dealt with. Approaches for mining spatio-temporal data have been studied for over a decade in the data mining community. In this article we present a broad survey of this relatively young field of spatio-temporal data mining. We discuss different types of spatio-temporal data and the relevant data mining questions that arise in the context of analyzing each of these datasets. Based on the nature of the data mining problem studied, we classify the literature on spatio-temporal data mining into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining. We discuss the various forms of spatio-temporal data mining problems in each of these categories.
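
As a minimal illustration of the clustering category, the sketch below implements an ST-DBSCAN-style algorithm in plain Python: a point's neighbourhood is constrained by both a spatial radius and a temporal window, so events that co-locate but occur far apart in time fall into separate clusters. All names, data, and thresholds are invented for illustration; this is not a method from any specific surveyed paper.

```python
import math

def st_neighbors(points, i, eps_space, eps_time):
    """Indices of points within both a spatial and a temporal radius of point i."""
    x, y, t = points[i]
    out = []
    for j, (xj, yj, tj) in enumerate(points):
        if j == i:
            continue
        if math.hypot(x - xj, y - yj) <= eps_space and abs(t - tj) <= eps_time:
            out.append(j)
    return out

def st_dbscan(points, eps_space, eps_time, min_pts):
    """Minimal ST-DBSCAN: clusters require density in space *and* time."""
    labels = [None] * len(points)          # None = unvisited, -1 = noise
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = st_neighbors(points, i, eps_space, eps_time)
        if len(seeds) < min_pts:
            labels[i] = -1                 # noise (may be re-claimed as border)
            continue
        labels[i] = cid
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid            # border point: claimed, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cid
            more = st_neighbors(points, j, eps_space, eps_time)
            if len(more) >= min_pts:       # core point: expand the cluster
                queue.extend(more)
        cid += 1
    return labels

# Two events in the same neighbourhood but hours apart -> separate clusters.
pts = [(0, 0, 0), (1, 0, 0), (0, 1, 1), (0, 0, 100), (1, 1, 101), (0, 1, 101)]
print(st_dbscan(pts, eps_space=2.0, eps_time=5.0, min_pts=2))
```

A purely spatial DBSCAN would merge all six points into one cluster; adding the temporal window splits them into two, which is exactly the extra structure spatio-temporal attributes introduce.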

Until not long ago, manually capturing and storing provenance from scientific experiments were constant concerns for scientists. With the advent of computational experiments (modeled as scientific workflows) and Scientific Workflow Management Systems, produced and consumed data, as well as the provenance of a given experiment, are automatically managed, so capturing and storing provenance in this context is no longer a major concern. As with several existing big data problems, the bottom line is now how to analyze the large amounts of provenance data generated by workflow executions and how to extract useful knowledge from these data. In this context, this article surveys the current state of the art in provenance analytics by presenting the key initiatives that have been taken to support provenance data analysis. We also contribute by proposing a taxonomy to classify elements related to provenance analytics.

The huge increase in the number of digital music tracks has created a need for automated tools to extract useful information from those tracks. As this information has to be extracted from the contents of the music, the field is known as Content-Based Music Information Retrieval (CB-MIR). Over the past two decades, several research outcomes have been observed in the area of CB-MIR, so there is a need to consolidate and critically analyze the research findings in order to evolve future research directions. In this survey article, various tasks of content-based music information retrieval and their applications are critically reviewed. In particular, the article focuses on eight MIR-related tasks: vocal/non-vocal segmentation, artist identification, genre classification, raga identification, query-by-humming (QBH), emotion recognition, instrument recognition and music clip annotation. The article elaborates the signal processing techniques used to extract useful features for the specific tasks mentioned above and discusses their strengths as well as weaknesses. This paper also points to some general research issues in CB-MIR and probable approaches towards solutions that would help improve the efficiency of existing CB-MIR systems.

Positional data from small and mobile GPS receivers has become ubiquitous and allows for many new applications such as road traffic or vessel monitoring as well as Location Based Services. To make these applications possible for which information on location is more important than ever, streaming spatial data needs to be managed, mined and used intelligently. This paper provides an overview of previous work in this evolving research field and discusses different applications as well as common problems and solutions. The conclusion indicates promising directions for future research.

The gap is widening between the processor clock speed of end-system
architectures and network throughput capabilities. It is now physically
possible to provide single-flow throughput of up to 100 Gbps, and
400 Gbps will soon be possible. Most current research into high-speed
data networking focuses on managing expanding network capabilities
within datacenter Local-Area Networks (LANs) or efficiently
multiplexing millions of relatively small flows through a Wide-Area
Network (WAN). However, datacenter hyper-convergence places
high-throughput networking workloads on general-purpose hardware, and
distributed High-Performance Computing (HPC) applications require
time-sensitive, high-throughput end-to-end flows (also referred to as
elephant flows) to occur over WANs. For these applications, the
bottleneck is often the end-system, and not the intervening network.
Since the problem of the end-system bottleneck was uncovered, many
techniques have been developed which address this mismatch with varying
degrees of effectiveness. In this survey, we describe the most
promising techniques, beginning with network architectures and NIC
design, continuing with operating-system and end-system architectures, and
concluding with clean-slate protocol design.

As applications and operating systems are becoming more complex, the last decade has seen the rise of many tracing tools all across the software stack. This paper presents a hands-on comparison of modern tracers on Linux systems, both in user space and kernel space. The authors implement microbenchmarks that not only quantify the overhead of different tracers, but also sample fine-grained metrics that unveil insights into the tracers' internals and show the cause of each tracer's overhead. Internal design choices and implementation particularities are discussed, which helps to understand the challenges of developing tracers. Furthermore, this analysis aims to help users choose and configure their tracers based on their specific requirements in order to reduce their overhead and get the most out of them.
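
The flavour of microbenchmark described above can be sketched with Python's built-in `sys.settrace` hook: the per-event callback is exactly where a tracer's overhead accumulates, so timing a workload with and without the hook, while counting the events delivered, hints at the cause of the overhead. This is an illustrative sketch, not one of the paper's benchmarks.

```python
import sys
import time

def workload(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

events = {"call": 0, "line": 0, "return": 0}

def tracer(frame, event, arg):
    # Count every event the interpreter reports; this per-event callback
    # is precisely where tracer overhead comes from.
    if event in events:
        events[event] += 1
    return tracer          # returning the tracer enables line-level tracing

t0 = time.perf_counter()
workload(50_000)
untraced = time.perf_counter() - t0

sys.settrace(tracer)
t0 = time.perf_counter()
workload(50_000)
traced = time.perf_counter() - t0
sys.settrace(None)

print(f"untraced: {untraced:.4f}s  traced: {traced:.4f}s  events: {events}")
```

The event counts show why the overhead scales with the workload: one `call` and `return`, but on the order of two `line` events per loop iteration, each paying the cost of a Python-level callback.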

Networks are used to represent relationships between entities in many complex systems, spanning from online social networks to biological cell development and brain activity. Modeling these relationships presents various challenges. In many cases, relationships between entities are unambiguously known: are two users friends in a social network? Do two researchers collaborate on a published paper? Do two road segments in a transportation system intersect? These relationships are unambiguous and directly observable in the system in question. In many other cases, relationships between nodes are not directly observable and must be inferred: does one gene regulate the expression of another? Do two animals who physically co-locate have a social bond? Who infected whom in a disease outbreak?
Existing approaches use specialized knowledge from their home domains to infer networks and to measure the goodness of an inferred network for a specific task. However, current research lacks a rigorous validation framework that employs standard statistical validation. In this survey, we examine how network representations are learned from non-network data, the variety of questions and tasks asked of these data over several domains, and validation strategies for measuring the inferred network's capability of answering questions about the original system of interest.

Software testing activities account for a considerable portion of systems development cost and, for this reason, many studies have sought to automate these activities. Test data generation has a high cost-reduction potential (especially for complex-domain systems), since it can decrease human effort. Although several studies have been published on this subject, review articles covering this topic usually focus only on specific domains. This article presents a systematic mapping aiming at providing a broad, albeit critical, overview of the literature on test data generation using genetic algorithms. The selected studies were categorized by the software testing technique (structural, functional or mutation testing) for which test data were generated and by the proposed modifications to genetic algorithms. The most used evaluation metrics and software testing techniques were identified. The results showed that genetic algorithms have been successfully applied to simple test data generation, but are rarely used to generate complex test data such as images, videos, sounds, and three-dimensional models. From these results, we discuss some challenges and opportunities for research in this area.
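
A minimal sketch of the core idea, not any specific surveyed approach: a genetic algorithm evolves integer inputs toward a hard-to-hit branch using a branch-distance fitness, and, as in memetic variants, the fittest individual is refined by a deterministic local search. The function under test and all parameters below are invented for illustration.

```python
import random

def function_under_test(a, b):
    # Branch that random testing rarely hits: b must equal exactly 2*a.
    if b == 2 * a and a > 500:
        return "target"
    return "other"

def fitness(ind):
    # Branch-distance style fitness: 0 means the target branch is exercised.
    a, b = ind
    return abs(b - 2 * a) + max(0, 501 - a)

def hillclimb(ind, steps=5000):
    # Deterministic local refinement (memetic style): greedy best-neighbour descent.
    best = ind
    for _ in range(steps):
        a, b = best
        neighbours = [(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1),
                      (a + 1, b + 2), (a - 1, b - 2)]
        cand = min(neighbours, key=fitness)
        if fitness(cand) >= fitness(best):
            break
        best = cand
    return best

def evolve(pop_size=40, generations=60, seed=7):
    rng = random.Random(seed)
    pop = [(rng.randint(0, 1000), rng.randint(0, 2000)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]          # truncation selection + elitism
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            a, b = p1[0], p2[1]                 # one-point crossover
            if rng.random() < 0.3:              # gene-wise mutation
                a += rng.randint(-15, 15)
            if rng.random() < 0.3:
                b += rng.randint(-15, 15)
            children.append((a, b))
        pop = parents + children
    return min(pop, key=fitness)

best = hillclimb(evolve())
print(best, function_under_test(*best))
```

The fitness function encodes how far an input is from satisfying each branch condition, which is what lets the search follow a gradient instead of guessing blindly.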

Web application providers have been migrating their applications to cloud data centers, attracted by the emerging cloud computing paradigm. One of the appealing features of cloud is elasticity. It allows cloud users to acquire or release computing resources on demand, which enables web application providers to auto-scale the resources provisioned to their applications under dynamic workload in order to minimize resource cost while satisfying Quality of Service (QoS) requirements. In this paper, we comprehensively analyze the challenges that remain in auto-scaling web applications in clouds and review the developments in this field. We present a taxonomy of auto-scaling systems according to the identified challenges and key properties. We analyze the surveyed works and map them to the taxonomy to identify the weaknesses in this field. Moreover, based on the analysis, we propose new future directions.
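
A rule-based threshold policy is the simplest form such auto-scalers take. The sketch below, with invented thresholds and workload figures, shows one control-loop tick that scales out under high CPU utilisation and scales in under low utilisation, within fixed bounds.

```python
def autoscale(instances, cpu_utilisation,
              scale_out_at=0.70, scale_in_at=0.30,
              min_instances=1, max_instances=20, step=1):
    """Return the new instance count for one control-loop tick."""
    if cpu_utilisation > scale_out_at:
        return min(instances + step, max_instances)   # scale out, capped
    if cpu_utilisation < scale_in_at:
        return max(instances - step, min_instances)   # scale in, floored
    return instances                                  # within the dead zone

# Simulate a workload spike followed by a quiet period.
n = 2
trace = [0.50, 0.80, 0.90, 0.85, 0.60, 0.20, 0.10]
history = []
for util in trace:
    n = autoscale(n, util)
    history.append(n)
print(history)
```

The dead zone between the two thresholds is what prevents oscillation; more sophisticated policies surveyed in this field replace the fixed rule with predictive models, but the control loop itself keeps this shape.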

Context: Software development process measurement is essential to reach predictable performance and high capability processes. Software process measurement provides support for better understanding, evaluation, management and control of the development process and project, as well as the resulting product. Measurement enables organizations to recognize, improve, and predict the quality and performance of their processes, which places organizations in a better position to make appropriate and informed decisions as early as possible during the development process. Objective: This study aims to understand the measurement of the software development process, to identify relevant studies, to create a classification scheme based on the identified studies, and then to map such studies into the scheme so as to answer the research questions. Method: Systematic mapping is the selected research methodology for this project. Results: A total of 419 studies are included and classified into four groups with respect to their focus and into three groups based on publishing date. Conclusion: Project effort and productivity are the most frequently measured attributes, followed by process maturity. GQM and CMMI are the main methods used in the studies, whereas Agile and Lean development and Small and Medium-Sized Enterprises are the most frequently identified research contexts.

The recent diversity of storage demands has revealed various shortcomings of traditional RDBMSs, which in turn has led to the
emergence of a new trend of complementary non-relational data management solutions, known as NoSQL (Not only SQL). This
survey mainly aims at presenting the work that has been conducted with regard to four closely related concepts of NoSQL stores:
data model, consistency model, data partitioning and replication. For each concept, its different protocols, and for each protocol,
its corresponding features, strengths and drawbacks are explained. Furthermore, various implementations of each protocol are
exemplified and crystallized through a collection of representative academic and industrial NoSQL technologies. The rationale
behind each design decision along with some corresponding extensions and improvements are discussed. Finally, we disclose
some existing challenges in developing effective NoSQL stores, which need attention from the research community, application
designers and architects.

Network-enabled sensing and actuation devices are key enablers to connect real-world objects to the cyber world. The Internet of Things (IoT) uses these network-enabled devices and communication technologies to allow connectivity and integration of physical objects (Things) from the real world into the data-driven digital world (the Internet). Enormous amounts of dynamic IoT data are collected from Internet-connected devices. IoT data, however, often consists of multi-variate streams that are heterogeneous, sporadic, multi-modal and spatio-temporal. IoT data can be disseminated with different granularities and have diverse structures, types and qualities. Dealing with the data deluge from heterogeneous IoT resources and services imposes challenges on the indexing, discovery and ranking mechanisms needed to build applications that require on-line access and retrieval of IoT data. However, the existing IoT data indexing and discovery approaches are complex (usually based on formal and logical methods) or centralised, which hinders their scalability. The primary objective of this paper is to provide a holistic overview of the state of the art on indexing, discovering and ranking IoT data. We discuss on-line analysis and fast responses to complex queries. The paper aims to pave the way for researchers to design, develop, implement and evaluate techniques and approaches for future on-line large-scale distributed IoT applications and platforms.

While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Thus, efficient computational methods for condensing and simplifying data are becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind and the challenges of graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.
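
One elementary summarization strategy is to collapse structurally equivalent nodes, i.e. nodes with identical neighbour sets, into supernodes. The sketch below (illustrative only, not a method from the survey) does this for a small undirected graph; all names are invented.

```python
from collections import defaultdict

def summarize(edges):
    """Collapse nodes with identical neighbour sets into supernodes."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    groups = defaultdict(list)                 # neighbour set -> member nodes
    for node in sorted(adj):
        groups[frozenset(adj[node])].append(node)
    supernode_of = {}
    for members in groups.values():
        label = "+".join(map(str, members))    # supernode named after members
        for m in members:
            supernode_of[m] = label
    # Project original edges onto supernodes, deduplicating.
    superedges = {tuple(sorted((supernode_of[u], supernode_of[v])))
                  for u, v in edges}
    return supernode_of, superedges

# A star: all leaves share the same neighbourhood {hub} and collapse together.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
supernode_of, superedges = summarize(edges)
print(supernode_of)
print(superedges)
```

Three edges shrink to one superedge, which conveys the intuition behind graph summarization: trade exact structure for a far smaller description that preserves the pattern.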

Contemporary mobile devices are the result of an evolution process in which computational and networking capabilities have been continuously pushed to keep pace with constantly growing workload requirements. This has allowed devices such as smartphones and tablets to perform increasingly complex tasks, up to the point of efficiently replacing traditional options such as desktop computers and notebooks. However, these devices are more prone to theft, compromise, and exploitation for attacks and other malicious activity, mainly due to their portability and size. The need to investigate the aforementioned incidents resulted in the creation of the Mobile Forensics (MF) discipline. MF, a sub-domain of Digital Forensics (DF), specializes in extracting and processing evidence from mobile devices in such a way that attacking entities and actions are identified and traced. Beyond its primary research interest in accurate evidence acquisition from mobile devices, MF has recently expanded its scope to encompass organized and advanced evidence representation and the analysis of entities' behavior. The current paper presents the research conducted within the MF ecosystem during the last six years. Moreover, it identifies the gaps and highlights the differences from past research directions. Lastly, it addresses challenges and open issues in the field.

In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing CNN ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, an evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.

Computational creativity seeks to understand computational mechanisms that can be characterized as creative. Creation of new concepts is a central challenge for any creative system. In this paper, we outline different approaches to concept creation and then review conceptual representations relevant to concept creation. The conceptual representations are organized in accordance with two important perspectives on the distinctions between them. One distinction is between symbolic, spatial and connectionist representations. The other is between descriptive and procedural representations. These two distinctions are orthogonal. Additionally, conceptual representations used in particular creative domains, i.e. language, music, image and emotion, are reviewed separately. For each representation reviewed, we cover the inference it affords, the computational means of building it, and its application in concept creation.

Over the past decades, researchers have proposed different Intrusion Detection approaches to deal with the increasing number and complexity of threats to computer systems. In this context, Random Forest models have provided notable performance in applications to behaviour-based Intrusion Detection Systems. Specificities of the Random Forest model are used to provide classification, feature selection, and proximity metrics. This work provides a comprehensive review of the general basic concepts related to Intrusion Detection Systems, including taxonomies, attacks, data collection, modelling, evaluation metrics and commonly used methods. It also provides a survey of Random Forest based methods applied in this context, considering the particularities involved in these models. Finally, some open questions and challenges are posed, combined with possible directions for dealing with them, which may guide future work in the area.

This survey presents multidimensional scaling (MDS) methods and their real-world applications. MDS is an exploratory, multivariate data analysis technique, increasingly popular, that tries to represent higher-dimensional data in a lower-dimensional space. The input to an MDS analysis is a measure of the dissimilarity or similarity of the objects under observation. Once the MDS technique is applied to the measured dissimilarities or similarities, it produces a spatial map in which dissimilar objects are far apart while similar objects are placed close to each other. In this survey paper, MDS is described in a fairly comprehensive fashion, explaining the basic notions of classical MDS and how MDS can help analyze multidimensional data. Various specialized MDS-based models are then described in a more mathematical way.
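
Classical (Torgerson) MDS can be sketched compactly: square the distances, double-centre them, and read coordinates off the top eigenvectors of the resulting matrix. In the dependency-free sketch below the eigenvectors are extracted by power iteration with deflation; the four-point example uses distances taken from a unit square, which a 2-D embedding can reproduce exactly. The code is illustrative, not a production implementation.

```python
import math

def classical_mds(D, dim=2, iters=100):
    """Classical (Torgerson) MDS: double-centre squared distances, then
    extract the top eigenvectors by power iteration with deflation."""
    n = len(D)
    sq = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    # B = -1/2 * J * D^2 * J  (double centering)
    B = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    coords = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        v = [1.0 if i == k else 0.0 for i in range(n)]   # deterministic start
        for _ in range(iters):
            w = matvec(B, v)
            norm = math.sqrt(sum(x * x for x in w)) or 1e-12
            v = [x / norm for x in w]
        Bv = matvec(B, v)
        lam = sum(v[i] * Bv[i] for i in range(n))         # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        for i in range(n):                                # deflate B
            for j in range(n):
                B[i][j] -= lam * v[i] * v[j]
    return coords

# Pairwise distances of the four corners of a unit square; classical MDS
# reproduces them exactly (up to rotation/reflection).
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
D = [[math.dist(p, q) for q in pts] for p in pts]
X = classical_mds(D)
err = max(abs(math.dist(X[i], X[j]) - D[i][j])
          for i in range(4) for j in range(4))
print(f"max distance error: {err:.2e}")
```

The recovered map places the dissimilar (diagonal) corners far apart and adjacent corners close together, which is exactly the spatial-map behaviour the abstract describes.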

Automatic machine-based Facial Expression Analysis (FEA) has witnessed substantial progress in the past few decades motivated by its importance in psychology, security, health, entertainment and human computer interaction. However, the vast majority of current studies are based on non-occluded faces collected in a controlled laboratory environment, and automatic expression recognition from partially occluded faces remains a largely unresolved field, particularly in real-world scenarios. In recent years, increasing efforts have been directed at investigating techniques to handle partial occlusion for FEA. This survey provides a comprehensive review of the recent advances in dataset creation, algorithm development, and investigations of the effects of occlusion, which are crucial in system design and evaluations. It also outlines existing challenges in overcoming partial occlusion and discusses possible opportunities in advancing the technology. To the best of our knowledge, it is the first FEA survey dedicated to occlusion and devoted to serve as a starting point to promote future work.

The goal of privacy metrics is to measure the degree of privacy enjoyed by users in a system and the amount of protection offered by privacy-enhancing technologies.
In this way, privacy metrics contribute to improving user privacy in the digital world.
The diversity and complexity of privacy metrics in the literature make an informed choice of metrics challenging. As a result, redundant new metrics are proposed frequently, and privacy studies are often incomparable. In this survey we alleviate these problems by structuring the landscape of privacy metrics. To this end, we explain and discuss a selection of over eighty privacy metrics and introduce a categorization based on the aspect of privacy they measure, their required inputs, and the type of data that needs protection. In addition, we present a method for choosing privacy metrics based on eight questions that help identify the right privacy metrics for a given scenario, and we highlight topics where additional work on privacy metrics is needed. Our survey spans multiple privacy domains and can be understood as a general framework for privacy measurement.
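
As one concrete example of such a metric, k-anonymity measures the size of the smallest equivalence class induced by the quasi-identifier attributes of a released table: every individual is hidden among at least k records. The sketch below, with invented data, computes it.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns."""
    classes = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return min(classes.values())

# A generalized release: zip codes and ages are coarsened before publication.
released = [
    {"zip": "537**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "537**", "age": "20-29", "diagnosis": "cold"},
    {"zip": "537**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "537**", "age": "30-39", "diagnosis": "ulcer"},
    {"zip": "537**", "age": "30-39", "diagnosis": "flu"},
]
print(k_anonymity(released, ["zip", "age"]))
```

The same table can score differently depending on which attributes are treated as quasi-identifiers, which is precisely why a metric's required inputs matter when categorizing and choosing among privacy metrics.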

This article presents a comprehensive survey on parallel I/O, an important field for High Performance Computing because of the historic gap between processing power and storage latencies, which causes application performance to be impaired when accessing or generating large amounts of data. As the available processing power and amount of data increase, I/O remains a central issue for the scientific community. In this survey, we present background concepts from which everyone could benefit. Moreover, through a comprehensive study of publications from the most important conferences and journals in a five-year time window, we discuss the state of the art of I/O optimization approaches, access pattern extraction techniques, and performance modeling, in addition to general aspects of parallel I/O research. Through this approach, we aim to identify the general characteristics of the field and the main current and future research topics.

Activities of a clinical staff in healthcare environments must regularly be adapted to new treatment methods, medications and technologies. This constant evolution requires the monitoring of the workflow, or the sequence of actions from actors involved in a procedure, to ensure quality of medical services. In this context, recent advances in sensing technologies, including Real-time Location Systems (RTLS) and Computer Vision, enable high-precision tracking of actors and equipment. The current state-of-the-art about healthcare workflow monitoring typically focuses on a single technology and does not discuss its integration with others. Such an integration can lead to better solutions to evaluate medical workflows. This study aims to fill the gap regarding the analysis of monitoring technologies with a systematic literature review about sensors for capturing the workflow of healthcare environments. Its main scientific contribution is to identify both current technologies used to track activities in a clinical environment and gaps on their combination to achieve better results. We also propose a taxonomy to classify work regarding sensing technologies and methods. Our review did not identify proposals that combine data obtained from RTLS and Computer Vision sensors. We conclude that a multimodal analysis is more flexible and could yield better results.

While cloud computing has brought paradigm shifts to computing services, researchers and developers have also found some problems inherent to its nature, such as bandwidth bottlenecks, communication overhead, and location blindness. The concept of fog/edge computing has therefore been coined to extend services from the core in cloud data centers to the edge of the network. In recent years, many systems have been proposed to better serve ubiquitous smart devices closer to the user. This paper provides a complete and up-to-date review of edge-oriented computing systems by encapsulating relevant proposals regarding their architecture features, management approaches, and design objectives.

Activity recognition aims to provide accurate and timely information on people's activities by leveraging sensory data available in today's sensor-rich environments. Activity recognition has become an emerging field in the areas of pervasive and ubiquitous computing. A typical activity recognition technique processes data streams produced by sensing platforms such as mobile sensors, on-body sensors, and/or ambient sensors. This paper surveys the two overlapping research areas of activity recognition and data stream mining, reviewing the adaptation capabilities of activity recognition techniques in streaming environments. Broad categories of techniques are identified based on the different features of both data streams and activity recognition. The pros and cons of the algorithms in each category are analysed, and possible directions for future research are indicated.
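To make the streaming setting concrete, here is a minimal, hypothetical sketch (not drawn from any surveyed technique) of window-based activity recognition over a sensor stream: a fixed-size window slides over incoming readings, and a toy variance threshold stands in for a real classifier. All thresholds and data are made up for illustration.

```python
import statistics
from collections import deque

def classify_window(samples):
    """Toy classifier: high variance suggests movement (hypothetical threshold)."""
    return "active" if statistics.pstdev(samples) > 1.0 else "idle"

def recognise(stream, window_size=5):
    """Slide a fixed-size window over a sensor stream, one label per full window."""
    window = deque(maxlen=window_size)
    labels = []
    for reading in stream:
        window.append(reading)
        if len(window) == window_size:
            labels.append(classify_window(window))
    return labels

# Synthetic accelerometer-magnitude stream: quiet period, then vigorous movement.
stream = [0.1, 0.2, 0.1, 0.15, 0.1, 5.0, -4.0, 6.0, -5.0, 4.5]
print(recognise(stream))  # first label "idle", last label "active"
```

A real streaming technique would additionally adapt the model as the data distribution drifts, which is exactly the concern at the intersection of the two areas surveyed here.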

The size of Linked Data is growing fast, so a Linked Data management system must be able to deal with increasing amounts of data. Even though physically handling Linked Data in a relational table is possible, querying a giant triple table becomes very costly due to the multiple nested joins required for typical queries. In addition, the heterogeneity of Linked Data poses entirely new challenges to database systems. This article provides a comprehensive study of the state of the art in storing and querying RDF data. In particular, we focus on data storage techniques, indexing strategies, and query execution mechanisms. We also provide a classification of existing systems and approaches, an overview of the various benchmarking efforts in this context, and a discussion of some of the open problems in this domain.
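To illustrate why a single triple table is costly to query, here is a toy sketch in Python: each triple pattern of a conjunctive query becomes another self-join (here, a nested scan) over the same table, so a three-pattern query scans the table three times per candidate binding. All names and data are made up for illustration.

```python
# A triple table stores every RDF statement as one (subject, predicate, object) row.
triples = [
    ("alice", "worksFor", "acme"),
    ("alice", "knows",    "bob"),
    ("bob",   "worksFor", "acme"),
    ("carol", "worksFor", "initech"),
]

def coworker_acquaintances(table):
    """Answer the pattern:  ?x knows ?y . ?x worksFor ?c . ?y worksFor ?c
    Each pattern is one more self-join over the same triple table."""
    results = []
    for (x, p1, y) in table:                      # pattern 1: ?x knows ?y
        if p1 != "knows":
            continue
        for (x2, p2, c) in table:                 # pattern 2: ?x worksFor ?c
            if p2 != "worksFor" or x2 != x:
                continue
            for (y2, p3, c2) in table:            # pattern 3: ?y worksFor ?c
                if p3 == "worksFor" and y2 == y and c2 == c:
                    results.append((x, y, c))
    return results

print(coworker_acquaintances(triples))  # [('alice', 'bob', 'acme')]
```

Real RDF stores avoid this blow-up with techniques such as property tables, vertical partitioning, and exhaustive triple indexes, which are among the storage and indexing strategies this survey covers.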

The Internet has undergone dramatic changes in the past 15 years, and now forms a global communication platform that billions of users rely on for their daily activities. While this transformation has brought tremendous benefits to society, it has also created new threats to online privacy, such as omnipotent governmental surveillance. As a result, public interest in systems for anonymous communication has drastically increased. In this work, we survey previous research on designing, developing, and deploying systems for anonymous communication. Our taxonomy and comparative assessment provide important insights about the differences between the existing classes of anonymous communication protocols.

Many networking research activities depend on the availability of network captures. Even outside academic research, there is a need to share network captures, for example to cooperate on threat assessments or for debugging. However, most network captures cannot be shared due to privacy concerns.
There have been many advances in the understanding of anonymisation and cryptographic methods, which have changed the perspective on the effectiveness of many anonymisation techniques. These advances, combined with the increase in computational power, may also have made it feasible to perform anonymisation in real time. This could make it easier to collect and distribute network captures, both for research and for other applications.
This article surveys the literature over the period 1998 -- 2015 on network traffic anonymisation techniques and implementations. The aim is to provide an overview of the current state of the art, and to highlight how advances in related fields have shed new light on anonymisation and pseudonymisation methodologies. The few currently maintained implementations are also reviewed. Lastly, we identify future research directions to enable easier sharing of network traffic, which in turn can enable new insights in network traffic analysis.
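As a concrete, hypothetical illustration of pseudonymisation (a sketch for intuition, not a technique endorsed by the surveyed work), the snippet below maps IPv4 addresses to consistent pseudonyms with a keyed HMAC: flows involving the same host stay linkable across a capture, while the original addresses are hidden from anyone without the key.

```python
import hmac
import hashlib
import ipaddress

SECRET = b"per-dataset secret key"  # hypothetical key; would be rotated per capture set

def pseudonymise_ip(addr: str) -> str:
    """Map an IPv4 address to a deterministic pseudonym address.
    The same input always yields the same output (so flows remain correlatable),
    but the mapping cannot be reversed without the key. Note: this simple scheme
    is NOT prefix-preserving, so subnet structure is destroyed."""
    digest = hmac.new(SECRET, addr.encode(), hashlib.sha256).digest()
    return str(ipaddress.IPv4Address(int.from_bytes(digest[:4], "big")))

a = pseudonymise_ip("192.0.2.1")
assert a == pseudonymise_ip("192.0.2.1")  # deterministic across the whole capture
print("192.0.2.1 ->", a)
```

Exactly because the mapping is deterministic, such pseudonyms remain vulnerable to the injection and fingerprinting attacks that the literature reviewed here discusses; the attack/defence trade-off is what makes real-time anonymisation a research question rather than a solved problem.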

Many security and software testing applications require checking whether certain properties of a program hold for every possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor that bypasses a program's authentication. One approach would be to test the program using different, possibly random, inputs. However, as the backdoor may only be hit by very specific program workloads, automated exploration of the space of possible inputs is of the essence. Symbolic execution provides an elegant solution to the problem by systematically exploring many possible execution paths at the same time, without necessarily requiring concrete inputs. Rather than operating on fully specified input values, the technique abstractly represents them as symbols and resorts to constraint solvers to construct actual instances that would cause property violations. Symbolic execution has been incubated in dozens of tools developed over the last four decades, leading to major practical breakthroughs in a number of prominent software reliability applications. The goal of this survey is to provide an overview of the main ideas, challenges, and solutions developed in the area, distilling them for a broad audience.
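As a toy illustration of the idea (with a hand-enumerated set of path constraints and a brute-force "solver" standing in for a real constraint solver such as an SMT engine), consider a program whose backdoor is guarded by two branches: `if x > 100: if 2*x == 246: return "backdoor"`. Symbolic execution treats `x` as a symbol, collects the branch conditions along each path, and asks a solver for a concrete witness input per path.

```python
def explore(paths, domain=range(-1000, 1001)):
    """For each path (list of constraints on symbolic input x, plus its outcome),
    brute-force a concrete witness value that satisfies all constraints.
    A real symbolic executor would derive the paths automatically and use an
    SMT solver instead of enumerating a small integer domain."""
    witnesses = {}
    for constraints, outcome in paths:
        witnesses[outcome] = next(
            (v for v in domain if all(c(v) for c in constraints)), None
        )
    return witnesses

# Hand-enumerated path constraints for the two-branch program above:
paths = [
    ([lambda x: x > 100, lambda x: 2 * x == 246], "backdoor"),
    ([lambda x: x > 100, lambda x: 2 * x != 246], "ok (inner branch false)"),
    ([lambda x: x <= 100],                        "ok (outer branch false)"),
]

print(explore(paths))  # the "backdoor" path is reachable with x = 123
```

Random testing would hit the backdoor only with probability about one in the size of the input space, whereas the solver constructs `x = 123` directly from the path constraints, which is exactly the advantage the abstract explanation above describes.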

Intrusion alert analysis is an attractive and active topic in the area of intrusion detection and prevention systems (IDPS). In recent decades, many research communities have worked in this field, producing a large volume of research and giving rise to various research areas. However, there has been no systematic and up-to-date review of the work within the field. The main objective of this paper is to provide a taxonomy of research fields in intrusion alert analysis and a reference guide for researchers who want to enter this area. To this end, a systematic mapping study (SMS) of 433 high-quality research works was conducted. Using keyword clustering, ten different research topics were identified in the field of intrusion alert analysis, which can be grouped into three broad categories: pre-processing, processing, and post-processing. A brief description is provided for these groups and their related topics, together with some useful analyses based on data extracted from the research works. The results show that the processing group contains most of the research works and has recently shifted toward heterogeneous correlation. The post-processing group is newer than the others and has only recently drawn the attention of research communities and security administrators.