An Approach to Data Extraction and Visualisation for Wireless Sensor Networks

Ever since Descartes introduced planar coordinate systems, visual representations of data have become a widely accepted way of describing scientific phenomena. Modern advances in measurement and instrumentation have required increasingly sophisticated visual representations, to ensure that scientists can quickly and accurately interpret increasingly complex data. Most recently, wireless sensor networks (WSNs) have emerged as a technology capable of collecting a vast amount of data over space and time. The sheer volume of the data makes it difficult for humans to turn it into meaningful insights. This presents a number of challenges for developers of visualisation techniques which seek to “map” the data sensed by a network. Visualisation techniques help to turn large amounts of raw data into credible visual information such as graphs, charts, or maps, which can assist in understanding the meaning of that data. In this paper we propose a map as a suitable data visualisation and extraction tool. We aim to develop an in-network distributed information extraction and visualisation service. Such a service would greatly simplify the production of higher-level information-rich representations suitable for informing other network services and the delivery of field information visualisation.

Mohammad Hammoudeh, Robert Newman, Sarah Mount
School of Computing and IT, University of Wolverhampton, Wolverhampton, UK
Email: {m.h.h, r.newman, s.mount}@wlv.ac.uk

I. INTRODUCTION

The main objective of a wireless sensor network (WSN) is to provide users with access to the information of interest from data gathered by spatially distributed sensors. In real-world applications, WSNs are often deployed at high density to ensure full coverage of the monitored phenomena. These networks are expected to generate very large amounts of data. As the sensor network scales in size, so does the amount of sensed data that must be collected by the network, processed, and presented to the user. The data produced, and the form in which they are structured, vary widely. Scientists need tools to reconcile these differences. Human interpretation of this data will require extensive use of visualisation tools. Scientific visualisation techniques can support many aspects of data understanding and analysis. Data visualisation is becoming increasingly important in WSNs as it enables end-users to best utilise the data collected by the sensor network.

In this paper we propose a map as a suitable visualisation and data extraction tool for WSNs. A map is a visual representation of an area. Although most commonly used to depict geography, maps may represent any space, real or imagined, without regard to context or scale, such as the distribution of weather data [1]. A map of sensed data could be overlaid on a geographic map to enable an observer to view the data for a specific area.

This paper is organised as follows. Section II discusses related work. Section III presents the characteristics of sense data. In Section IV we discuss visualisation challenges in WSNs. In Section V we introduce the benefits of visualising sense data. In Section VI we discuss the advantages of the map data format. We present the implementation of the mapping service in Section VII and evaluate its performance in Section VIII. Section IX concludes the work.

II. RELATED WORK

Within the WSN field, mapping applications found in the literature are ultimately concerned with the problem of mapping measurements onto a model of the environment. Hellerstein et al. [2] propose to construct isobar maps in sensor networks. They show how in-network merging of isobars could help reduce the amount of communication. Furthermore, [3] proposes an efficient data-collection scheme, and the building of contour maps, for event monitoring and network-wide diagnosis in centralised networks. Solutions such as Distributed Mapping have been proposed for the general mapping domain [2]. However, many solutions are limited to particular applications and constrained by unreliable assumptions. The grid alignment in [2], for example, is one such assumption.

In the wider literature, mapping has been sought as a useful tool for network diagnosis and monitoring [3], power management [4], and jammed-area detection [5]. For instance, contour maps were found to be an effective solution to the pattern matching problem that works for resource-limited networks [6]. As opposed to resolving these types of isolated concerns, in the work proposed here the WSN is expected not only to produce map type responses to queries but also to make best use of the data supporting the maps for more effective routing, further intelligent data aggregation and information extraction, power scheduling and other network processes. These are examples of specific instances of the mapping problem and, as such, motivate the development of a generic distributed, in-network mapping framework, furthering the area of research by moving beyond the limitations of the centralised approaches.

III. CHARACTERISTICS OF SENSE DATA

Data acquired from a WSN is imperfect in nature. This imperfection is due to physical constraints on node deployment and data collection, noisy environments, and device measurement errors, among other factors. Moreover, data collected by different sensors may vary in quality depending on physical characteristics such as distance from the sensed phenomenon, node modality, or the noise model of individual sensors [7].

In densely deployed WSNs, sensor readings are usually highly correlated in the space domain [8]. Additionally, the nature of the physical phenomenon introduces temporal correlation between the successive readings of each sensor node. Data gathered from a WSN is often characterised by significant redundancy. Many sensor networks are densely deployed with high node redundancy to deal with connectivity and coverage problems caused by node failure. However, dense deployment causes neighbouring sensor nodes to have highly overlapping sensing regions. Consequently, multiple nodes are likely to detect and communicate data packets about common phenomena, because each node observes the region of overlap independently of its neighbours. Data aggregation techniques aim at reducing redundant data transmissions. Although data aggregation results in fewer transmissions, it introduces considerable delay in delivering the data to its final destination. Data from nearer sources may have to be held back at an intermediate node in order to be aggregated with data coming from sources that are farther away. In addition, within such networks, delays may be caused by hop-by-hop retransmission, scheduled data communication, queueing delays, propagation through the environment, and other factors.

In many WSN application areas, such as medical or surveillance applications, the accuracy of acquired data is often crucial. However, sensed data is often inaccurate and erroneous [9]. This inaccuracy may be a result of faulty sensor readings, internal errors in sensor nodes, or network delays, among other reasons. The deployment of a larger number of sensor nodes provides potential for greater accuracy in the information gathered. The ability to effectively increase the sensing quality without necessarily increasing data transmissions will increase the reliability of the information for the end user application. Some schemes, such as [10], trade data accuracy, which typically increases with the amount of data transmitted, for energy efficiency. Data aggregation also increases the accuracy of gathered data and exploits data redundancy to compensate for node failures [10], [9]. Depending on the accuracy bounds required for a specific application, a node may need to communicate some of its information to the sink that is incorporated in the model so that the accuracy bounds are met.

Another characteristic of sampled sense data is the distribution of the sampled source data. The distribution of data is usually specified in terms of location and pattern [11]. The location is specified in terms of Cartesian coordinates (x, y, z), while the pattern falls into one of two categories: regular, when data is gathered from sensing nodes deployed on a grid; and irregular, when sensing nodes are deployed randomly. Much sensor network data belongs to the irregular category. It is sometimes necessary to know the density of the data as well as its distribution. The network density is defined as the number of nodes per unit area.

IV. WHY VISUALISATION OF SENSE DATA IS CHALLENGING

Visualisation of sensor network data is challenging. The large amount of data collected from deployed sensor networks arrives in “bursty” mode. This makes it difficult to process all the data in a timely manner so that it can be used as an input to visualisation systems [12]. Furthermore, most scientific visualisation techniques require data to include connectivity information, which is not provided by a scattered data set. Hence, highly efficient visualisation schemes operating directly on raw scattered data are necessary [13].

In large-scale WSNs it is a non-trivial task to visualise the observed phenomena given the problems of sparse, inaccurate, high density, and irregularly distributed data, in addition to the limited physical resources. Therefore, we aim to define methods for real-time visualisation and analysis that are suitable for sensor network data. This includes the specification of a modular scattered data interpolation package for implementing suitable interactive viewers for time-varying data.

Practically, a point measured on a surface represents the environmental conditions over an area of a certain size, hence it should be possible to generate a complete view of the observation field of a WSN using a discrete set of observation points. The sampling interval over an unknown surface is defined according to the desired degree of accuracy and fidelity. The problem is how to adequately represent the observed phenomena with a limited number of elevation points, that is, what sampling interval to use with an unknown surface. To select between different sampling strategies when data is acquired from a sensor network, many factors must be considered, including: the application, the level of accuracy, the nature of the sampled data, the nature of the monitored environment, and node distribution and density. Data acquisition strategy is of high importance in sensor networks and is directly related to routing and single node capabilities. When sampled data is used to visualise an observed phenomenon, the results are only as good as the sample [11].

In WSNs, it is becoming increasingly important to have a live picture of the changing environmental variables. This can be achieved if data about the phenomenon is sampled at a suitable rate. In live data representation applications, where a client is interested in the current picture of the environment, it is important to collect readings from the network at a rate as close as possible to the rate of change of the monitored environmental variable. In other applications, such as temperature and pressure monitoring, when the measured phenomenon changes linearly over time and space, the sampling rate can be reduced so that node resources are used effectively.
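Section IV argues that a complete view of the observation field can be generated from a discrete set of scattered observation points. As an illustration only, the following minimal Python sketch applies inverse-distance weighting in the global form of Shepard's method, the interpolation method this paper adopts for the mapping service, to two-dimensional node readings. The function names `shepard_interpolate` and `interpolate_grid`, the tuple layout of the samples, and the default exponent `u = 2` are assumptions made for this example; they are not taken from the paper's implementation, which runs in-network and considers only the data points that are significant for the result.

```python
import math


def shepard_interpolate(point, samples, u=2.0):
    """Estimate the value at `point` from scattered readings.

    point   -- (x, y) location to interpolate
    samples -- list of ((x, y), z) pairs: node position and reading
    u       -- positive exponent controlling smoothness (assumed default)
    """
    if u <= 0:
        raise ValueError("the exponent u must be positive")
    px, py = point
    numerator = 0.0
    denominator = 0.0
    for (x, y), z in samples:
        d = math.hypot(px - x, py - y)  # Euclidean distance to the sample
        if d == 0.0:
            # The point coincides with a node: return its reading directly.
            return z
        weight = d ** (-u)              # weight falls off as distance grows
        numerator += weight * z
        denominator += weight
    if denominator == 0.0:
        raise ValueError("at least one sample is required")
    return numerator / denominator


def interpolate_grid(samples, xs, ys, u=2.0):
    """Build a map as a list of rows, one row per y value in `ys`."""
    return [[shepard_interpolate((x, y), samples, u) for x in xs] for y in ys]


# Two hypothetical nodes with readings 1.0 and 3.0.
readings = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0)]
print(shepard_interpolate((1.0, 0.0), readings))  # midway between them: 2.0
```

Because every sample contributes to every interpolated point, this global form costs time proportional to the number of samples per map cell, which is one reason a deployed service might restrict the sum to nearby nodes.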

V. BENEFITS OF SENSE DATA VISUALISATION

Data visualisation in WSNs can bridge the gap between the physical and logical worlds by taking the information gathered from the physical world and communicating it to the end-user in a compact and often easy to understand way. Data visualisation helps to deal with this flood of information, integrating the human in the data analysis process. The main advantages of applying visualisation techniques in wireless sensor networks are:

1) Visualisation could help in managing the huge amount of data coming from a sensor network. Visual data exploration can easily deal with large, highly non-homogeneous and noisy data sets.
2) Maximisation of useful information return. Visualisation is a fundamental tool to communicate information in a compact and easy to understand way. It allows the user to gain insight into the data, drawing conclusions and directly interacting with the data. Visualisation not only helps to answer questions that the user has, but also elicits questions that the user had not thought of before.
3) Interactive visualisations benefit from dynamic queries, which are a valuable tool to explore data.
4) More reliable information than is possible from individual sources. Although individual devices have limited resources, the true value of sensor network systems comes from the emergent behaviour that arises when data from many places in the system is combined into a meaningful presentation [14]. A picture conveys far more data than a human can absorb by looking at log files or textual data.
5) Detection of higher-order relationships between different sensors. Relationships that are completely hidden without visualisation can become apparent.
6) More efficient data and information representation. Visualisation reduces analysis and response times. Going through thousands of lines of point data is slower than looking at a few graphs of the same data.
7) Visual data examination does not require deep understanding of complex mathematical or statistical algorithms. Visualisation techniques provide a qualitative overview useful for further quantitative analysis [15].

VI. SENSE DATA VISUALISATION: MAPS

Visual formats, such as maps, can be easily understood by people, possibly from different communities, and thus allow them to derive conclusions based on a substantial understanding of the available data. This understanding gained from maps fulfils the ultimate goal of sensor network deployments, which is not only to gather the data from the spatially distributed sensor nodes, but also to convey and translate the data for scientists to analyse and study. Visualisation of data collected from a sensor network in a map format is one way to display the distribution of the data attributes in a real-world map in ways that are intuitive and easy to talk and reason about. A map is intuitive and easy to understand as it provides an interface for visualising dynamic data from a sensor network. It provides a higher-level information-rich representation which was found suitable for informing other network services and the delivery of field information visualisation. This information-rich representation satisfies the various requirements of the sensor network system's end users. The map gives a low cost interface used to target queries for generating detailed maps from a subset of the sensors in the network.

Maps are effective for understanding the spatial distribution of environmental features, since humans can use their natural interpretation capabilities to understand colours, patterns, and spatial relevance. These capabilities suggest the importance of expression methods, such as how to represent spatial data on a map. Maps can be either static or dynamic and allow data representation in two-dimensional or three-dimensional space. They allow the user to infer the actual sizes of, and distances between, objects. The unique visualisation and analysis benefits offered by maps make them more visually communicative: they imply distributions and states, provide information about spatial patterns, and imply the association of diverse phenomena. Users can zoom in or out to show more or less detail. Finally, representations such as maps allow the extraction of information that cannot be obtained by looking at sensor readings separately, and are more efficient to compute in both time and energy. For instance, maps may capture trends or correlations among sense data, and missing data, where there is no operating sensor, can be interpolated using these spatial and temporal correlations among sensor readings.

VII. MAPPING SERVICES FOR WSNS

Leading directly from Section VI, which shows the usefulness of visualising sense data in a map format, we investigate the development of methods for map construction and maintenance services within a WSN, focusing on service cost, complexity, computational load, storage, communication requirements, robustness to packet loss and node failures, and network density.

We propose a new network service: map generation. Map generation is essentially a problem of interpolation from sparse and irregular points. This service allows the production of maps of arbitrary level of detail upon requests injected into the network by the user, or pre-programmed as responses to network events. The service should be entirely based on in-network processing and would be applied to flat, computationally homogeneous networks. Given a set of known data points representing the nodes' perception of a given measurable parameter of the phenomenon, what is the most likely complete and continuous map of that parameter? In the work proposed here the WSN is expected not only to produce map type responses to queries but also to make use of the data supporting the maps for more effective routing, further intelligent data aggregation and information extraction, power

scheduling and other network processes. Just as clustering, routing and aggregation allow for more sophisticated and efficient use of the network resources, a mapping service would support other network services and make many more applications possible with little extra effort.

The proposed implementation of the distributed mapping service contains four modules, as seen in Figure 1: Application, Interpolation, In-network Processing and Routing.

Figure 1. Architecture of the distributed in-network mapping service.

The Routing module is an essential module responsible for data communication. Routing proved to be a key issue as the research developed. In this paper, we use a hierarchical routing algorithm called MuMHR [16]. The defined routing procedure builds the hierarchy and establishes the path between sensing nodes and their respective cluster-heads to enable data transmission. MuMHR is an improvement over LEACH [17]. It relaxes some of the assumptions made by LEACH, such as single hop communication. The main objective of the MuMHR protocol is to provide substantially energy-efficient and robust communication. Energy efficiency is achieved by load balancing at two levels: (1) at the network level, which involves traffic multiplexing over multiple paths; (2) at the cluster level, by rotating the cluster-heads at a given time interval. This prevents energy depletion resulting from constantly using the same path for transmission or from particular nodes being cluster-heads for a long duration. The multi-path feature is used not only for load balancing but also when path failures occur. When a path fails, an alternative path can be used immediately, which allows the protocol to adapt dynamically to failures without delays or degradation in the quality of service. At cluster set-up time, one or more nodes are chosen as cluster-head backup node(s). A backup cluster-head node substitutes for the cluster-head in some failure cases or when the current cluster-head decides to reduce its participation in the protocol because its energy level approaches a certain threshold value. For instance, if the current cluster-head decides to hand its role to the backup node, it notifies the respective node and forwards to it the necessary information, such as the backup nodes list, to avoid a complete cluster set-up phase.

The role of the In-network Processing module is to process raw data received from the various cluster-head nodes in the network. It applies filtering to all the received data to reduce redundancy resulting from overlapping cluster coverage. Moreover, the In-network Processing module manages incremental update messages and merges them into a single transaction. It also defines two interfaces, for the Interpolation and Application modules, through which it provides access to the cached data in a suitable format.

All mapping applications use the Interpolation module as a building block to generate maps. The Interpolation module accesses the In-network Processing module to obtain the available mapping or update data. In this paper, we use the Shepard interpolation algorithm [18]. Shepard interpolation is simple and intuitive. It is suitable for large-scale wireless sensor networks because it reduces communication overhead by only considering data points which are significant for the interpolation results. Shepard defines a continuous function as a weighted average of the data, in which the weight of each data point is inversely related to its distance from the interpolated location. This method exploits the intuitive sense that things that are close to each other are more likely to be similar. Shepard's expression for globally modelling a surface is:

$$
f_1(P) =
\begin{cases}
\dfrac{\sum_{i=1}^{N} (d_i)^{-u} z_i}{\sum_{i=1}^{N} (d_i)^{-u}} & \text{if } d_i \neq 0 \text{ for all } D_i \ (u > 0) \\[2ex]
z_i & \text{if } d_i = 0 \text{ for some } D_i
\end{cases}
\qquad (1)
$$

where $d_i$ is the standard distance metric from an interpolation point $P$ to the point numbered $i$ in the set of $N$ known points, and $z_i$ is the known value at point $i$. The exponent $u$ is used to control the smoothness of the interpolation. As the distance between the interpolation location $P$ and the measured sample point $D_i$ increases, the weight of that sample point decreases as the inverse power $d_i^{-u}$. As $P$ approaches a data point $D_i$, $d_i$ tends to zero and the $i$th terms in both the numerator and denominator exceed all bounds while the other terms remain bounded, so $f_1(P)$ tends to $z_i$, consistent with the second case of (1).

Finally, the Application module contains the user defined applications, such as path-finding or isopleth maps. The Application module also has direct access to the In-network Processing module to get raw data if required. Figure 1 shows the mapping service architecture and the interaction between its four modules.

VIII. EXPERIMENTAL EVALUATION

The efficiency of the proposed mapping service in terms of the quality of the produced map was demonstrated using the following experiment, which simulates distributed execution on a real life data-set. As an example of the quality obtainable, and what might be expected from a sensor network, the

following maps, derived from a series presented by Clarke and Swaze [19], are presented. The maps generated by the mapping service are compared to the original Fe map taken from [19].

This algorithm was implemented using a mapping API with an in-house simulation software, “Dingo” [20], which is a fork of the “SenSor” project [21]. Dingo has proven to be not only easy to use, but also powerful enough to model and simulate the behaviour of the mapping service at various design stages. It provides an easy way to develop system models, enabling users to quickly manipulate hardware elements and achieve the desired results without having to build a full hardware prototype.

Figure 2 is derived directly from [19] and shows the distribution of iron minerals around Cuprite, Nevada. This map represents an area 2 km on a side, and provides a very detailed account of the mineral distribution in that area. This image has been used as the basis for a simulation of the results that might be expected from a sensor network sensing for evidence of the same chemicals. Using the Dingo simulator, a network of sensors was randomly distributed over the 2 km square, and the value of the image in Figure 2 at each sensor's position was used as the output of that sensing device. The simulated sensor network was programmed to produce a map using the Shepard interpolation method. The interpolated map of iron minerals generated by the mapping service (Figure 3) is clearly of poorer quality than the original, which could have been obtained using an orbiting imaging spectrometer. However, the interpolated terrain is clearly similar to the real surface. Determining exactly how close the similarity is, and what the algorithmic limits to the accuracy of the representation are, is a problem we are currently investigating. Consider, though, that the information used to reconstruct the surface in Figure 3 is just 3000 points, while the original is recorded by satellite spectroscopy, which is a hard target to hit. The satellite spatial resolution is about 17 metres pixel spacing. Taking the position of the nodes into account as extra information, the reconstruction is built using less than 3% of the original data.

Figure 2. Fe distribution around Cuprite, NV [19].

Figure 3. Interpolated map of Fe distribution around Cuprite, NV produced using 3000 sensor nodes.

To highlight this potential use of an in-network mapping service, we present an implementation of isopleth generation. The results of generating isopleths based on this data are shown in Figures 4 and 5. These contours were generated in the 0.4 to 1.2 micron spectral region with a threshold of 1. Comparing the isopleths based on the actual Fe distribution map (Figure 4) with those based on the interpolated Fe distribution map (Figure 5) shows that the two are visually similar.

IX. CONCLUSION AND FUTURE WORK

In this paper we show that visualisation of sense data gathered from networks of wireless sensors is a challenging problem in several regards. We discuss these challenges and propose a suitable data extraction and visualisation framework. We also explain why a map is a suitable data presentation format and propose an implementation of the mapping service for wireless sensor networks. We have identified applications, such as isopleth generation, that the service would make simple to develop, and challenges that need to be met before such a service is feasible. Finally, we examine the applicability of Shepard interpolation in the reconstruction of a parameter map from sparsely sampled data.

This paper is not the result of a completed project, but the exposition of the start of one. We feel that this area of research is pertinent to modern wireless sensor networks, and in this paper we have taken initial steps towards exploring it. The problems and challenges described in Section IV are the opportunities we intend to take and the lines we intend to follow.

REFERENCES

[1] Wikipedia, “Map,” 2007, [Online; accessed 26-November-2007]. [Online]. Available: http://en.wikipedia.org/wiki/Map
[2] D. Estrin, “Reflections on wireless sensing systems: From ecosystems to human systems,” in Radio and Wireless Symposium, 2007, pp. 1–4.
[3] X. Meng, T. Nandagopal, L. Li, and S. Lu, “Contour maps: monitoring and diagnosis in sensor networks,” Comput. Netw., vol. 50, no. 15, pp. 2820–2838, 2006.
[4] R. Tynan, G. O’Hare, D. Marsh, and D. O’Kane, “Interpolation for wireless sensor network power management,” in International Workshop on Wireless and Sensor Networks (WSNET-05). IEEE Press, June 2005.
[5] A. D. Wood, J. A. Stankovic, and S. H. Son, “Jam: A jammed-area mapping service for sensor networks,” in RTSS ’03: Proceedings of the 24th IEEE International Real-Time Systems Symposium. Washington, DC, USA: IEEE Computer Society, 2003, p. 286.