Abstract

In this chapter we propose a framework for collecting messages from social networks, organizing them into a database, and querying them by specifying content-based, geographic, and temporal conditions, with the aim of detecting periodic and aperiodic events. Our proposal could be a basis for developing context-aware services: for example, a service that identifies congested streets and their rush hours by analyzing the messages that queuing drivers periodically send on social media, and that reports these critical spatio-temporal situations to help other drivers plan alternative routes. Specifically, we rely on a focused crawler to periodically collect social network messages related to the contents of interest, and on an original geo-temporal clustering algorithm to explore the geo-temporal distribution of the messages. The clustering algorithm can be customized to identify aperiodic and periodic events at global or local scale, based on the specified geographic and temporal query conditions.

Introduction

The widespread use of social media on smartphones and other smart devices equipped with GPS sensors is fostering a new era of spatio-temporal computing applications.

The US Red Cross has reported that US citizens increasingly rely on social media and mobile devices to get information on ongoing critical situations, such as traffic jams and the spread of pandemics, to seek assistance and safety information, and to report their health and safety status during or after emergencies (Adam et al., 2012).

New location-based and context-aware services can exploit social information sources for a wide range of applications in the smart city context, such as leisure recommendation, healthcare and safety, disaster management, and the identification of periodic critical situations.

Such services can exploit the content provided by social media users, in the form of free text, pictures, and video, together with the timestamp and the geolocation acquired by the GPS sensor of their device, to identify events that occurred in specific regions on specific dates and, depending on the kind of event, to trigger either planning or alert responses.

In this chapter, we propose a framework for exploring events by querying a collection of social media messages gathered by a focused crawler and organized into a database. The queries let the user specify spatio-temporal conditions in order to filter, and then analyze, periodic and aperiodic events at global or local scale. Our proposal could be a basis for developing context-aware services: for example, a service that identifies congested streets and their rush hours by analyzing the messages that queuing drivers periodically send on social media, and that reports these critical spatio-temporal situations to help other drivers plan alternative routes.

Specifically, we rely on a focused crawler to periodically collect social network messages related to contents that may be of interest to specific categories of users. Examples are urban planners, who might need to know the streets in their city where traffic jams occur most often; territorial administrators and managers, who might need to identify the periods when critical situations such as floods usually occur in the territory they are responsible for; and cultural operators, who might want to plan a musician's tour in the regions and periods of the year most likely to attract a large audience.

To explore the collected messages, we propose a query framework consisting of two consecutive phases, which let the user drive an exploration aimed at verifying an “a priori” hypothesis he/she has about an event of interest.

In the first phase, the user formulates queries that specify content, spatial, and temporal conditions on the textual indexes and on the metadata of the messages, i.e., their geotags and timestamps, in order to select a subset of interesting messages from the collection. For example, an urban planner of “Bangkok” may want to select messages dealing with “traffic jams” in his/her city.
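The first-phase filter can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the `Message` record, the `filter_messages` function, the substring-based content match, the sample messages, and the rough bounding box around Bangkok are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    # Hypothetical record: text plus the geotag and timestamp metadata.
    text: str
    lat: float
    lon: float
    timestamp: datetime


def filter_messages(messages, keywords, bbox, start, end):
    """Keep messages that satisfy content, spatial and temporal conditions.

    bbox is (min_lat, min_lon, max_lat, max_lon); start and end bound the
    timestamp (inclusive). Content matching is a plain case-insensitive
    substring test, a simplification of a real textual index.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    selected = []
    for m in messages:
        text = m.text.lower()
        content_ok = any(k.lower() in text for k in keywords)
        spatial_ok = min_lat <= m.lat <= max_lat and min_lon <= m.lon <= max_lon
        temporal_ok = start <= m.timestamp <= end
        if content_ok and spatial_ok and temporal_ok:
            selected.append(m)
    return selected


# Rough, illustrative bounding box around Bangkok (an assumption).
BANGKOK_BBOX = (13.5, 100.3, 14.0, 100.9)

# Made-up sample messages, not data from the chapter.
messages = [
    Message("Stuck in a traffic jam again", 13.73, 100.57, datetime(2015, 3, 2, 8, 15)),
    Message("Great concert tonight", 13.75, 100.50, datetime(2015, 3, 2, 21, 0)),
    Message("Traffic jam near the station", 45.46, 9.19, datetime(2015, 3, 2, 8, 30)),
]

hits = filter_messages(
    messages,
    keywords=["traffic jam"],
    bbox=BANGKOK_BBOX,
    start=datetime(2015, 3, 1),
    end=datetime(2015, 3, 31),
)
print(len(hits))  # 1: only the first message meets all three conditions
```

In a real deployment the three conditions would more plausibly be evaluated by the database through its textual and spatial indexes rather than by a linear scan.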

In the second phase, the user specifies criteria that drive an original geo-temporal analysis of the selected messages in order to verify an “a priori” hypothesis. For example, the urban planner may hypothesize that traffic jams in “Bangkok” occur periodically at specific hours of the day, but may not know in which streets and at which hours. His/her analysis criterion could then be a periodic geo-temporal distance, which makes it possible to identify the most crowded hours of the day in each street.
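One plausible reading of a periodic temporal distance is a circular distance computed modulo the period, so that two messages sent at the same hour on different days count as close. The sketch below is an assumption made for illustration, not the chapter's definition; the function names and the 24-hour default period are hypothetical.

```python
from datetime import datetime


def linear_time_distance(t1, t2):
    """Ordinary (aperiodic) distance between two timestamps, in hours."""
    return abs((t1 - t2).total_seconds()) / 3600.0


def periodic_time_distance(t1, t2, period_hours=24.0):
    """Circular distance in hours between two timestamps, modulo the period.

    With a 24-hour period, 08:00 on Monday and 08:30 on the following
    Monday are only half an hour apart.
    """
    remainder = linear_time_distance(t1, t2) % period_hours
    return min(remainder, period_hours - remainder)


a = datetime(2015, 3, 2, 8, 0)
b = datetime(2015, 3, 9, 8, 30)

print(linear_time_distance(a, b))    # 168.5 (one week and half an hour)
print(periodic_time_distance(a, b))  # 0.5
```

Changing `period_hours` (for example to 168 for a weekly period) would let the same function capture other regularities.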

Similarly, an Italian cultural operator who is planning a musician's tour may be interested in selecting messages dealing with music events in Italy and in finding out in which cities and in which seasons those events had the greatest resonance. In this case, he/she may formulate a query that selects messages dealing with entertainment events in Italy and then ask to group them in order to identify the cities and months in which the most popular music events took place.

The grouping is performed by a geo-temporal clustering algorithm that can be flexibly customized to use a specific geo-temporal distance measure in order to identify aperiodic or periodic events at global or local scale.
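The idea of a clustering step with a pluggable temporal distance can be sketched as follows. This is not the chapter's original algorithm, which is not described in this section: it is a simple threshold-linkage stand-in (connected components under distance thresholds), and the function names, the thresholds `eps_km` and `eps_hours`, and the sample points are assumptions introduced for illustration.

```python
import math
from datetime import datetime


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def linear_hours(t1, t2):
    """Aperiodic temporal distance in hours."""
    return abs((t1 - t2).total_seconds()) / 3600.0


def daily_periodic_hours(t1, t2):
    """Temporal distance in hours modulo a 24-hour period."""
    r = linear_hours(t1, t2) % 24.0
    return min(r, 24.0 - r)


def cluster(points, eps_km, eps_hours, time_distance):
    """Group (lat, lon, timestamp) points into connected components.

    Two points are linked when they are within eps_km in space and within
    eps_hours under the chosen time_distance. Returns lists of indices.
    """
    assigned = [False] * len(points)
    clusters = []
    for seed in range(len(points)):
        if assigned[seed]:
            continue
        assigned[seed] = True
        members, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            for j in range(len(points)):
                if assigned[j]:
                    continue
                close_in_space = haversine_km(points[i][:2], points[j][:2]) <= eps_km
                close_in_time = time_distance(points[i][2], points[j][2]) <= eps_hours
                if close_in_space and close_in_time:
                    assigned[j] = True
                    members.append(j)
                    frontier.append(j)
        clusters.append(sorted(members))
    return clusters


# Made-up points: three morning messages on consecutive days and one
# evening message, all on roughly the same street.
points = [
    (13.730, 100.570, datetime(2015, 3, 2, 8, 0)),
    (13.731, 100.571, datetime(2015, 3, 3, 8, 10)),
    (13.730, 100.569, datetime(2015, 3, 4, 7, 50)),
    (13.730, 100.570, datetime(2015, 3, 2, 20, 0)),
]

print(cluster(points, 1.0, 1.0, daily_periodic_hours))  # [[0, 1, 2], [3]]
print(cluster(points, 1.0, 1.0, linear_hours))          # [[0], [1], [2], [3]]
```

With the periodic distance, the three morning messages merge into one recurring event; with the linear distance, each message stays separate because the days are too far apart. Swapping the distance function is the only change between the two runs.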

A proof-of-concept framework has been implemented for the Twitter social media platform, and its use has produced some examples of geo-temporal analysis.

First of all, the chapter introduces the context and the background of the approach; then, it outlines the schema of the proposed exploratory workflow. Subsequently, the three phases are formalized and, finally, examples of application of the exploratory workflow to the Twitter information source are illustrated. The conclusion summarizes the main achievements.

Key Terms in this Chapter

Focused Crawler: A web crawler that visits the Web pages on the Internet and fetches only those that deal with specific topics of interest.

Content, Spatial, and Temporal Query Conditions: Content conditions are generally specified by either natural-language terms or keyword categories, and define the topics that the items of interest must deal with; spatial conditions, also called geographic conditions, specify the (geographic) area of interest where the items must be located; temporal conditions specify the desired creation timestamp or time range of the items of interest.

Social Media: A generic term identifying online applications and practices that people adopt to share text, images, video and audio on the Web.

Spatial Index: A type of extended index built on a spatial attribute of the items in a spatial database so as to optimize access by spatial queries.

Geo-Temporal Clustering: A clustering process that automatically partitions a set of items based on the similarity of both their geographic attribute values and their timestamps.

Cluster: A set of items that share some common properties, identified by an unsupervised algorithm; also known as a group or container.

Clustering: An unsupervised machine learning technique that automatically partitions a set of items, each described by a set of features, into disjoint groups called clusters. Also known as an unsupervised learning mechanism or a data mining technique.

Query: A request for information submitted by a user to a database in order to retrieve items of interest, i.e., records or documents. It consists of selection conditions that the items in the database must satisfy in order to be retrieved and judged relevant to the user request.

Geo-Temporal Analysis: A process that characterizes subsets of items in a geographic database, for example events, with respect to the similarity of their locations and timestamps.

Periodic and Aperiodic Event: An event is something that occurs at a specific time or over a span of time, possibly in a given locality or geographic area, and that interests or involves many people. A periodic event is characterized by a regular time occurrence, for example each day at the same hour, each month, each season, every 10 years, and so on. An aperiodic event can happen just once or may occur several times without a specific time regularity.