Abstract 1: Social Networks such as Twitter are often used for disseminating and collecting information during natural disasters. The potential for its use in Disaster Management has been acknowledged. However, more nuanced understanding of the communications that take place on social networks are required to more effectively integrate this information into the processes within disaster management. The type and value of information shared should be assessed, determining the benefits and issues, with credibility and reliability as known concerns. Mapping the tweets in relation to the modelled stages of a disaster can be a useful evaluation for determining the benefits/drawbacks of using data from social networks, such as Twitter, in disaster management.A thematic analysis of tweets’ content, language and tone during the UK Storms and Floods 2013/14 was conducted. Manual scripting was used to determine the official sequence of events, and classify the stages of the disaster into the phases of the Disaster Management Lifecycle, to produce a timeline. Twenty- five topics discussed on Twitter emerged, and three key types of tweets, based on the language and tone, were identified. The timeline represents the events of the disaster, according to the Met Office reports, classed into B. Faulkner’s Disaster Management Lifecycle framework. Context is provided when observing the analysed tweets against the timeline. This illustrates a potential basis and benefit for mapping tweets into the Disaster Management Lifecycle phases. Comparing the number of tweets submitted in each month with the timeline, suggests users tweet more as an event heightens and persists. Furthermore, users generally express greater emotion and urgency in their tweets.This paper concludes that the thematic analysis of content on social networks, such as Twitter, can be useful in gaining additional perspectives for disaster management. It demonstrates that mapping tweets into the phases of a Disaster Management Lifecycle model can have benefits in the recovery phase, not just in the response phase, to potentially improve future policies and activities. Abstract2: The current execution of privacy policies, as a mode of communicating information to users, is unsatisfactory. Social networking sites (SNS) exemplify this issue, attracting growing concerns regarding their use of personal data and its effect on user privacy. This demonstrates the need for more informative policies. However, SNS lack the incentives required to improve policies, which is exacerbated by the difficulties of creating a policy that is both concise and compliant. Standardization addresses many of these issues, providing benefits for users and SNS, although it is only possible if policies share attributes which can be standardized. This investigation used thematic analysis and cross- document structure theory, to assess the similarity of attributes between the privacy policies (as available in August 2014), of the six most frequently visited SNS globally. Using the Jaccard similarity coefficient, two types of attribute were measured; the clauses used by SNS and the coverage of forty recommendations made by the UK Information Commissioner’s Office. Analysis showed that whilst similarity in the clauses used was low, similarity in the recommendations covered was high, indicating that SNS use different clauses, but to convey similar information. The analysis also showed that low similarity in the clauses was largely due to differences in semantics, elaboration and functionality between SNS. Therefore, this paper proposes that the policies of SNS already share attributes, indicating the feasibility of standardization and five recommendations are made to begin facilitating this, based on the findings of the investigation.

As our world becomes increasingly interconnected, diseases can spread at a faster and faster rate. Recent years have seen large-scale influenza, cholera and ebola outbreaks and failing to react in a timely manner to outbreaks leads to a larger spread and longer persistence of the outbreak. Furthermore, diseases like malaria, polio and dengue fever have been eliminated in some parts of the world but continue to put a substantial burden on countries in which these diseases are still endemic. To reduce the disease burden and eventually move towards countrywide elimination of diseases such as malaria, understanding human mobility is crucial for both planning interventions as well as estimation of the prevalence of the disease. In this talk, I will discuss how various data sources can be used to estimate human movements, population distributions and disease prevalence as well as the relevance of this information for intervention planning. Particularly anonymised mobile phone data has been shown to be a valuable source of information for countries with unreliable population density and migration data and I will present several studies where mobile phone data has been used to derive these measures.

The talks are by EA Draffan, Nawar Halabi, Gareth Beeston and Neil Rogers. In 6m40s and 20 slides, each member of Bay 13 will introduce themselves, explaining their background and research interests, so those in WAIS can put a name to a face, and chat after the event if there are common interests.

Abstract Big data nowadays is a fashionable topic, independently of what people mean when they use this term. But being big is just a matter of volume, although there is no clear agreement in the size threshold. On the other hand, it is easy to capture large amounts of data using a brute force approach. So the real goal should not be big data but to ask ourselves, for a given problem, what is the right data and how much of it is needed. For some problems this would imply big data, but for the majority of the problems much less data will and is needed. In this talk we explore the trade-offs involved and the main problems that come with big data using the Web as case study: scalability, redundancy, bias, noise, spam, and privacy. Speaker Biography Ricardo Baeza-Yates Ricardo Baeza-Yates is VP of Research for Yahoo Labs leading teams in United States, Europe and Latin America since 2006 and based in Sunnyvale, California, since August 2014. During this time he has lead the labs in Barcelona and Santiago de Chile. Between 2008 and 2012 he also oversaw the Haifa lab. He is also part time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra, in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor and before founder and Director of the Center for Web Research at the Dept. of Computing Science of the University of Chile (in leave of absence until today). He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. Before he obtained two masters (M.Sc. CS & M.Eng. EE) and the electronics engineer degree from the University of Chile in Santiago. He is co-author of the best-seller Modern Information Retrieval textbook, published in 1999 by Addison-Wesley with a second enlarged edition in 2011, that won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society and in 2012 he was elected for the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished ex-alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences and since 2010 is a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.

Abstract: Big Data has been characterised as a great economic opportunity and a massive threat to privacy. Both may be correct: the same technology can indeed be used in ways that are highly beneficial and those that are ethically intolerable, maybe even simultaneously. Using examples of how Big Data might be used in education - normally referred to as "learning analytics" - the seminar will discuss possible ethical and legal frameworks for Big Data, and how these might guide the development of technologies, processes and policies that can deliver the benefits of Big Data without the nightmares. Speaker Biography: Andrew Cormack is Chief Regulatory Adviser, Jisc Technologies. He joined the company in 1999 as head of the JANET-CERT and EuroCERT incident response teams. In his current role he concentrates on the security, policy and regulatory issues around the network and services that Janet provides to its customer universities and colleges. Previously he worked for Cardiff University running web and email services, and for NERC's Shipboard Computer Group. He has degrees in Mathematics, Humanities and Law.

The proliferation of Web-based learning objects makes finding and evaluating online resources problematic. While established Learning Analytics methods use Web interaction to evaluate learner engagement, there is uncertainty regarding the appropriateness of these measures. In this paper we propose a method for evaluating pedagogical activity in Web-based comments using a pedagogical framework, and present a preliminary study that assigns a Pedagogical Value (PV) to comments. This has value as it categorises discussion in terms of pedagogical activity rather than Web interaction. Results show that PV is distinct from typical interactional measures; there are negative or insignificant correlations with established Learning Analytics methods, but strong correlations with relevant linguistic indicators of learning, suggesting that the use of pedagogical frameworks may produce more accurate indicators than interaction analysis, and that linguistic rather than interaction analysis has the potential to automatically identify learning behaviour.

This talk will present an overview of the ongoing ERCIM project SMARTDOCS (SeMAntically-cReaTed DOCuments) which aims at automatically generating webpages from RDF data. It will particularly focus on the current issues and the investigated solutions in the different modules of the project, which are related to document planning, natural language generation and multimedia perspectives. The second part of the talk will be dedicated to the KODA annotation system, which is a knowledge-base-agnostic annotator designed to provide the RDF annotations required in the document generation process.

As ubiquitous systems have moved out of the lab and into the world the need to think more systematically about how there are realised has grown. This talk will present intradisciplinary work I have been engaged in with other computing colleagues on how we might develop more formal models and understanding of ubiquitous computing systems. The formal modelling of computing systems has proved valuable in areas as diverse as reliability, security and robustness. However, the emergence of ubiquitous computing raises new challenges for formal modelling due to their contextual nature and dependence on unreliable sensing systems. In this work we undertook an exploration of modelling an example ubiquitous system called the Savannah game using the approach of bigraphical rewriting systems. This required an unusual intra-disciplinary dialogue between formal computing and human- computer interaction researchers to model systematically four perspectives on Savannah: computational, physical, human and technical. Each perspective in turn drew upon a range of different modelling traditions. For example, the human perspective built upon previous work on proxemics, which uses physical distance as a means to understand interaction. In this talk I hope to show how our model explains observed inconsistencies in Savannah and ex- tend it to resolve these. I will then reflect on the need for intradisciplinary work of this form and the importance of the bigraph diagrammatic form to support this form of engagement. Speaker Biography Tom Rodden Tom Rodden (rodden.info) is a Professor of Interactive Computing at the University of Nottingham. His research brings together a range of human and technical disciplines, technologies and techniques to tackle the human, social, ethical and technical challenges involved in ubiquitous computing and the increasing used of personal data. He leads the Mixed Reality Laboratory (www.mrl.nott.ac.uk) an interdisciplinary research facility that is home of a team of over 40 researchers. He founded and currently co-directs the Horizon Digital Economy Research Institute (www.horizon.ac.uk), a university wide interdisciplinary research centre focusing on ethical use of our growing digital footprint. He has previously directed the EPSRC Equator IRC (www.equator.ac.uk) a national interdisciplinary research collaboration exploring the place of digital interaction in our everyday world. He is a fellow of the British Computer Society and the ACM and was elected to the ACM SIGCHI Academy in 2009 (http://www.sigchi.org/about/awards/).

Community capacity is used to monitor socio-economic development. It is composed of a number of dimensions, which can be measured to understand the possible issues in the implementation of a policy or the outcome of a project targeting a community. Measuring community capacity dimensions is usually expensive and time consuming, requiring locally organised surveys. Therefore, we investigate a technique to estimate them by applying the Random Forests algorithm on secondary open government data. This research focuses on the prediction of measures for two dimensions: sense of community and participation. The most important variables for this prediction were determined. The variables included in the datasets used to train the predictive models complied with two criteria: nationwide availability; sufficiently fine-grained geographic breakdown, i.e. neighbourhood level. The models explained 77% of the sense of community measures and 63% of participation. Due to the low geographic detail of the outcome measures available, further research is required to apply the predictive models to a neighbourhood level. The variables that were found to be more determinant for prediction were only partially in agreement with the factors that, according to the social science literature consulted, are the most influential for sense of community and participation. This finding should be further investigated from a social science perspective, in order to be understood in depth.

Many of the most successful and important systems that impact our lives combine humans, data, and algorithms at Web Scale. These social machines are amalgamations of human and machine intelligence. This seminar will provide an update on SOCIAM, a five year EPSRC Programme Grant that seeks to gain a better understanding of social machines; how they are observed and constituted, how they can be designed and their fate determined. We will review how social machines can be of value to society, organisations and individuals. We will consider the challenges they present to our various disciplines.

Nothing lasts forever. The World Wide Web was an essential part of life for much of humantiy in the early 21st century, but these days few people even remember that it existed. Members of the Web Science research group will present several possible scenarios for how the Web, as we know it, could cease to be. This will be followed by an open discussion about the future we want for the Web and what Web Science should be doing today to help make that future happen, or at least avoid some of the bad ones.

Abstract: There is a lot of hype around the Internet of Things along with talk about 100 billion devices within 10 years time. The promise of innovative new services and efficiency savings is fueling interest in a wide range of potential applications across many sectors including smart homes, healthcare, smart grids, smart cities, retail, and smart industry. However, the current reality is one of fragmentation and data silos. W3C is seeking to fix that by exposing IoT platforms through the Web with shared semantics and data formats as the basis for interoperability. This talk will address the abstractions needed to move from a Web of pages to a Web of things, and introduce the work that is being done on standards and on open source projects for a new breed of Web servers on microcontrollers to cloud based server farms. Speaker Biography -Dave Raggett : Dave has been involved at the heart of web standards since 1992, and part of the W3C Team since 1995. As well as working on standards, he likes to dabble with software, and more recently with IoT hardware. He has participated in a wide range of European research projects on behalf of W3C/ERCIM. He currently focuses on Web payments, and realising the potential for the Web of Things as an evolution from the Web of pages. Dave has a doctorate from the University of Oxford. He is a visiting professor at the University of the West of England, and lives in the UK in a small town near to Bath.

A project to identify metrics for assessing the quality of open data based on the needs of small voluntary sector organisations in the UK and India. For this project we assumed the purpose of open data metrics is to determine the value of a group of open datasets to a defined community of users. We adopted a much more user-centred approach than most open data research using small structured workshops to identify users’ key problems and then working from those problems to understand how open data can help address them and the key attributes of the data if it is to be successful. We then piloted different metrics that might be used to measure the presence of those attributes. The result was six metrics that we assessed for validity, reliability, discrimination, transferability and comparability. This user-centred approach to open data research highlighted some fundamental issues with expanding the use of open data from its enthusiast base.

WAISfest is an opportunity to explore an area of research that isn't part of your day-to-day job, for 3 days. It's kinda like your Google 20% time. At the kick off session, a set of themes will be presented, and you get to choose which group to work with. Then for a few working days, you get to work on this challenge, before presenting what you've achieved at the end.