collaborative sensor data management

Monthly Archives: August 2017

Smartphones have become increasingly successful in monitoring everything. The data collected by a smartphone provides sensor data that was unimaginable just months ago. All smartphones have sensors in the form of a camera, microphone, GPS, gyroscope, accelerometer, compass and proximity sensors, and it does not end there. Everyday apps use sensors to collect data as well.

Sensor Data

Sensor data is the output of a device that detects and responds to some type of input from the physical environment. The output may be used to provide information or input to another system or to guide a process. Like a template for data.

In the early days of sensors, a pattern would have to be recognized in order to cause the sensor to activate a process. One example is an elevator door that reopens as soon as it senses an object in the way of the closing door. The oil light that comes on in your car when you are low on oil is another.

With nanotechnology comes smaller sensors. Smarter sensors. Sensors that are able to collect data on just about everything. So not only will a heart sensor let a doctor know when the heart is beating, but data collected from sensors can help predict when it may stop.

Just like the elevator door that stops closing because its sensors detected a person while the doors were closing, that same sensor can now measure how many people rode the elevator, wirelessly diagnose itself of trouble and send data to the elevator maintenance company to schedule service.

How many sensors does a person deal with on an average day? Think about it.

Your smartphone is a sensor network. Depending on what apps you have installed on your smartphone, that phone is seeing, hearing, and reporting on its surroundings. But to whom?

We know a big search engine company that uses data from its phones to gain semantic language skills. Did you ever use Google Voice? When someone left a voicemail, Google would attempt to transcribe it into text. When the user played the voicemail, Google would highlight the words in real time to indicate what it did not understand. So you corrected it, and in doing so taught Google the human language in different tones, timbres and accents.

The little sensor in your phone called a microphone got what it needed.

Sensors have helped us get out of jams in the past.

But what do we do with this data? Throw it back in the cloud and use it only when we need it? There are many uses for this data, including but not limited to health and surveillance.

In 2012 Dilshan Silva proposed a place where data collected from sensors could be stored and studied, and also a place to collaborate, share and even make new friends.

What are the Features of WikiSensing for Sensor Data Management

Earlier I gave you the easy definition and metaphors for sensor data management. So now let's get started.

Virtual Sensors – What are they?

A virtual sensor is a sensor that is not physically deployed at a certain location but uses the data streams of nearby sensors to obtain a sensor reading. Virtual sensors can be implemented by selecting a set of contributing sensor data streams, either by using the web interface or the application services. These aggregations are linear operations that produce a single value or a data stream. This is an extremely useful feature that provides sensor readings in the case where no physical sensors are present at a specific location and even where combining a set of high quality sensors can lead to a higher accuracy reading.

The existing sensor data is used to create this conceptual item of a “virtual sensor”. This requires the knowledge and experience of the collaborating users, for example, knowledge of geographical locations or of the reliability factors of the sensor devices. The knowledge of the collaborating sources is used to annotate sensors so that they can be combined to create an aggregated virtual sensor in a logical manner. Virtual sensors are most useful in cases where a user requests a sensor reading (temperature or pollution level) where a physical sensor is not deployed, as well as in situations where a low quality sensor is physically deployed but the aggregation of a set of high quality sensors located nearby may lead to a more accurate reading.

Figure 3 illustrates a scenario where several physical sensor devices are combined to create a virtual sensor. During stage 1 of this process the collaborating users are involved in annotating the sensor streams with geographical information and sensor meta-data such as reliability, precision and accuracy. This information is recorded in the wiki. Stage 2 involves the users selecting the physical sensors that would contribute to the virtual sensor. The readings of the virtual sensors can either be persistent or calculated dynamically. The query involved in aggregating the data streams for the virtual sensor can be updated to increase or reduce the scope of the sensor or modify the valid window size.
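The two stages can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the WikiSensing implementation: the `SensorStream` and `VirtualSensor` classes, their field names and the plain averaging of the most recent readings are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class SensorStream:
    """A physical sensor stream with a user-supplied annotation (hypothetical fields)."""
    stream_id: str
    readings: list            # readings in time order, newest last
    reliability: float = 0.0  # stage 1: annotation recorded by collaborators


@dataclass
class VirtualSensor:
    """Aggregates the latest readings of its contributing streams."""
    name: str
    contributors: list = field(default_factory=list)
    window: int = 1  # number of most recent readings taken from each stream

    def add(self, stream):
        # Stage 2: a user selects a physical stream as a contributor.
        self.contributors.append(stream)

    def reading(self):
        # Dynamic reading: computed on request from the contributing streams.
        values = [v for s in self.contributors for v in s.readings[-self.window:]]
        return mean(values)


s1 = SensorStream("S1", [20.0, 21.0], reliability=8)
s2 = SensorStream("S2", [22.0, 23.0], reliability=6)
vs = VirtualSensor("south-kensington-temp", window=1)
vs.add(s1)
vs.add(s2)
print(vs.reading())  # average of 21.0 and 23.0
```

Changing `window` or the list of contributors corresponds to widening or narrowing the scope of the virtual sensor as described above.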

Figure 3. Collaborating sensors to create virtual sensors.

The components that deal with the creation of the virtual sensors are contained in the data management module. The functionality spans from querying for deployed sensors, registering a virtual sensor, to selecting the contributing sensors. The API web services component is used to connect the deployed sensors to acquire the sensor reading into the system. These readings are processed and stored in the database via the data access component.

4.2. API Web Services

WikiSensing supports a list of API web services that can be used by external platforms to automatically connect sensor devices to the system. These services can be categorized as inputs, such as submitting data, or as queries that produce an output. The services to create environments, data streams, triggers and smartphones, and to add data points to existing streams, submit data to the system. The querying services include listing user environments, outputting sensor streams in formats such as XML or JSON, and obtaining the minimum, maximum and current values of a data stream.

To access the web services the user is required to obtain a reference to the API. The following example code snippet written in C# illustrates obtaining a WikiSensing service reference.

The API web services are written into the web services component, which resides in the application server and communicates with the database layer first through the data management component and then through the data access component.

4.3. Collaboration

The success of a collaborative system is centered on the usability and the organization of the information. In order for users to collaborate, the system must contain the required structure to enable sharing of knowledge and information. The aim is to use a Wiki infrastructure and the challenge is to make it successful for collaboration of sensor information.

WikiSensing has enabled collaboration through the use of a combination of Wiki and web pages that enables users to add annotations, comments and sensor meta-data to the sensor information. The collaboration layer sits on top of the Data management layer as depicted in Figure 4. For example, the users can annotate and comment on the sensor environments, sensor meta-data and the data streams that are managed in the sensor data management layer.

Figure 4. The WikiSensing Information Layers.

There are two interfaces that an online collaborator can use to provide information. The first interface is the Wiki pages that are automatically created for each sensor and its deployed environment, where the users can provide annotations and comments on this data. The second is the WikiSensing web pages that directly correspond to the information stored in the underlying database. These web pages are used, for example, to register new sensors, create virtual sensors, and add or update sensor meta-information. Online collaboration is enabled in the form of Wiki pages on the client that are hosted using a MediaWiki instance deployed in the application layer. These wiki pages contain the information that the users add or update. All updates are logged and access to the Wiki articles is controlled by the user management component.

4.4. Trust and Conflict Management

Trust management in WikiSensing is based on a rating scheme that is calculated using the following information:

The meta-data of the sensors to calculate the sensor reliability rating.

Aggregated reading of nearby sensors to calculate the distance of the sensor readings.

User credentials and contributions to calculate the user reliability rating.

The information to calculate the sensor and user reliability ratings is taken from the sensor meta-data and user information. For example, a sensor with properties such as good accuracy and precision would have a higher reliability rating than a weaker sensor. A standard method is initially set to define the sensor properties, as illustrated in the Wiki page for sensor meta-data in Figure 9. This page contains a list of attributes, such as accuracy and sensitivity, that can be used as metrics to calculate this rating. A set of general rules is then used to evaluate this information in order to obtain a standardized rating that is used throughout the system. The ratings are on a scale of 1 to 10. It is common practice to update (recalculate) these ratings with the addition of new information. Once requested by the users, these ratings are recorded in the corresponding Wiki pages. For example, the sensor reliability rating is listed in the wiki page that contains the meta-data of a sensor.

The distance of the sensor readings is calculated using the data streams stored in the system. For instance, if the users want to check the trustworthiness of a particular data stream, they can do this by aggregating several nearby data streams and obtaining the difference between the aggregated reading and the actual sensor reading. The difference of the readings and the list of selected sensors are recorded in the Wiki page of the relevant sensor.
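As a small illustration of this check, the sketch below compares one sensor reading with the mean of nearby readings. The function name, the use of a plain mean as the aggregate and the use of the absolute difference are assumptions made for the example; the text only states that a difference is obtained.

```python
from statistics import mean


def reading_distance(sensor_reading, nearby_readings):
    """Absolute difference between a reading and the mean of nearby readings."""
    aggregated = mean(nearby_readings)
    return abs(sensor_reading - aggregated)


nearby = [21.0, 22.0, 20.0]            # hypothetical nearby temperature readings
print(reading_distance(29.0, nearby))  # sensor far from its neighbours: 8.0
print(reading_distance(21.0, nearby))  # sensor in agreement: 0.0
```

A large distance does not prove that the sensor is wrong, but it flags the stream for closer inspection of its annotations and meta-data.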

These ratings and values (Figure 5) can be compared by the users to assess the trustworthiness of the information. The ratings can then be further used to manage conflict that exist between data streams or conflict between user annotations.

The Manage collaboration component includes the sub modules that contain the operations to assess trustworthiness and manage conflicts. For example, the sensor reliability rating is calculated by obtaining the sensor meta-data. Once the information is acquired and the reliability rating is calculated, it is stored with the sensor data. This information is then automatically recorded into the MediaWiki by the application logic framework.

5. Case Studies

In this section we present four different case studies that illustrate the use and the functionality of the system. The first case study describes how information is organized in WikiSensing. The second shows how multiple sensor data streams are aggregated. The third scenario is focused on creating virtual sensors using the system, and the last case study demonstrates the manner in which trustworthiness is assessed in WikiSensing.

5.1. Case Study 1: Organizing Sensor Information

Stage 1: Registering an Environment for a sensor in the system

The first mandatory step for registering sensors is to create an environment that the sensor is deployed in. This information (Table 1) includes location descriptions, for example, name of the city, street and country as well as geographical coordinates which are the longitude and latitude that can be selected using the provided Google map.

The user is also encouraged to provide a feed description that contains the type of sensor, for instance, GUSTO [26] sensors. Users can create private feeds, which are only visible to them, as well as public feeds, which are accessible by the other users of WikiSensing.

Stage 2: Registering the data streams of a sensor

A sensor in an environment can measure several attributes and produce multiple data streams (Figure 6), for example, a sensor with data streams that measures NO, NO2, SO2 and ozone (GUSTO sensors). The data streams are representations of a physical or virtual sensor that is deployed at a particular location.

Once an environment (deployed sensor) has been defined and a data stream (a sensor can have multiple data streams) is attached to it, data points can be added. The data stream information can be viewed graphically as illustrated by Figure 7.

The measurement units for a data stream can either be selected from a predefined list or be explicitly specified by a user. When defining a new unit of measurement the user is required to provide a conversion function to a base unit.
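One way to picture a user-defined unit with its conversion function is the sketch below. The registry `TO_BASE`, the helper names and the choice of Celsius as the base unit are assumptions for illustration only; they are not taken from WikiSensing.

```python
# Hypothetical registry: unit name -> function converting a value to the base unit.
TO_BASE = {
    "Celsius": lambda v: v,                          # base unit in this sketch
    "Fahrenheit": lambda v: (v - 32.0) * 5.0 / 9.0,  # predefined unit
}


def register_unit(name, to_base):
    """Register a user-defined unit together with its conversion function."""
    TO_BASE[name] = to_base


def to_base_unit(value, unit):
    """Convert a value expressed in `unit` to the base unit."""
    return TO_BASE[unit](value)


# A user adds a new unit by supplying its conversion to the base unit.
register_unit("Kelvin", lambda v: v - 273.15)
print(to_base_unit(212.0, "Fahrenheit"))  # 100.0
print(to_base_unit(300.15, "Kelvin"))
```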

A wiki page is created automatically and the provided information is recorded (Figure 8). This page contains a description of the sensor environment followed by the sensor details and information of the sensor data streams. The system also automatically links the environment with a page that contains the relevant sensor meta-information (Figure 9). This Wiki page lists the sensor meta-data and the features of the sensor that can be updated by collaborating users. The user is able to create a new sensor meta-data Wiki page in the case where a corresponding page does not exist. These Wiki pages are automatically updated when the corresponding information on the system are modified by the user.

Once the location information is provided the user can then connect the sensor data streams to the system via the web service layer. This is done by obtaining a web service reference of the WikiSensing web service through any programming platform.

The Wiki page displayed in Figure 9 shows the inclusion of referencing the information added to the page by the user. In this example the user annotates a GUSTO (Generic Ultraviolet Sensor Technologies and Observations) sensor by referencing a research paper [26].

Stage 3: Query the sensor data streams

The following is a sample query that averages the readings of a single sensor for a window size of 1 hour. The construct WIKI_WINDOW indicates a time window for the sensor readings specified in hours which selects the readings within an hour prior to the execution time.

    SELECT Average (Value)
    FROM Environment, Datastream, DataPoint
    WHERE environmentId = '14'
    AND streamType = 'NO2'
    WIKI_WINDOW 1h
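To make the effect of the time window concrete, the following Python sketch computes the same kind of result over an in-memory list of data points. The dictionary keys mirror the names used in the query, but the data layout, the function name and the sample values are assumptions; this is not how WikiSensing evaluates the query internally.

```python
from datetime import datetime, timedelta
from statistics import mean


def windowed_average(points, environment_id, stream_type, now, window_hours=1):
    """Average the values recorded within `window_hours` before `now`."""
    start = now - timedelta(hours=window_hours)
    values = [
        p["value"]
        for p in points
        if p["environmentId"] == environment_id
        and p["streamType"] == stream_type
        and start <= p["time"] <= now
    ]
    return mean(values) if values else None


now = datetime(2012, 4, 11, 12, 0)  # hypothetical execution time
points = [
    {"environmentId": "14", "streamType": "NO2", "time": now - timedelta(minutes=10), "value": 40.0},
    {"environmentId": "14", "streamType": "NO2", "time": now - timedelta(minutes=50), "value": 44.0},
    {"environmentId": "14", "streamType": "NO2", "time": now - timedelta(hours=2), "value": 90.0},
    {"environmentId": "14", "streamType": "SO2", "time": now - timedelta(minutes=5), "value": 7.0},
]
print(windowed_average(points, "14", "NO2", now))  # 42.0
```

The reading taken two hours earlier and the SO2 reading are both excluded, so only the two NO2 values inside the window are averaged.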

Stage 4: Registering a sensor network in the system

A sensor network is a group of (usually homogeneous) sensors deployed at multiple locations providing data streams that can be aggregated to obtain a set of combined sensor readings.

Creating a sensor network involves two main steps. First registering the sensor network details that are listed in Table 2. The Sensor Network references the sensor environment through the Sensor Network Id. A Wiki page is automatically created for every sensor network that gets created.

Stage 5: Registering sensors to a sensor network

Firstly the user has to create the set of sensors individually by repeating the steps (1 to 6) of case study 1, specifying the sensor network id. This links the sensors with the sensor network. The relevant sensor network Wiki page is then updated with this information.

Stage 6: Query sensor data in a sensor network

The following sample query aggregates a set of sensors that belong to the same sensor network.

    SELECT Average (Value)
    FROM Environment, Datastream, DataPoint
    WHERE sensorNetwork = 'SN-1'
    AND streamType = 'NO2'
    WIKI_WINDOW 1h

Figure 6. The multiplicity between the environment and its data streams.

Figure 7. WikiSensing graphical view of sensor data streams.

Figure 8. Wiki pages that record the sensor and data stream information.

Table 1. The list of fields involved in registering sensors in WikiSensing.

Table 2. The list of fields to register a sensor network.

5.2. Case Study 2: The Aggregation of Multiple Data Streams

Stage 1: View sensor data streams

When the users log in to WikiSensing they are able to view a list of sensors or sensor networks that were created by themselves as well as all the public sensors and sensor networks.

When the user, for example, requires obtaining the average temperature reading of South Kensington, London the relevant sensor data streams are aggregated to produce the output. Importantly, the system checks if the data streams are compatible for aggregation. If compatible they must then be checked for other disparities as data streams produced by different sensor devices may have different characteristics, for instance different output frequencies or different units of measurements.

Stage 2: Convert to a single unit of measurement

Firstly, if the units of measurements are different, WikiSensing automatically converts the values of the data streams to the unit of measure that is used by the majority of the data streams. If there are the same numbers of data streams with different units the system would then use a default unit of measurements. These rules are suppressed if the user specifies a unit of measurements in the query using the WIKI_UOM construct.
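The selection rule can be sketched as below. The function name, the `requested` parameter (standing in for WIKI_UOM) and the choice of Celsius as the default unit are assumptions made for this example.

```python
from collections import Counter

DEFAULT_UNIT = "Celsius"  # assumed system default for this sketch


def target_unit(stream_units, requested=None):
    """Pick the unit that the aggregated values are converted to."""
    if requested is not None:  # an explicit unit overrides the rules
        return requested
    counts = Counter(stream_units).most_common()
    top_count = counts[0][1]
    leaders = [unit for unit, n in counts if n == top_count]
    # A tie between units falls back to the default unit.
    return leaders[0] if len(leaders) == 1 else DEFAULT_UNIT


print(target_unit(["Fahrenheit", "Fahrenheit", "Celsius"]))    # Fahrenheit
print(target_unit(["Fahrenheit", "Kelvin"]))                   # Celsius
print(target_unit(["Kelvin", "Kelvin"], requested="Celsius"))  # Celsius
```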

Stage 3: Sample different frequencies of data streams

There are two policies to handle disparity of frequency among data streams. The first policy samples the time frames of the data stream to fit the stream with the largest time interval. Table 3 illustrates this by combining the first stream's readings at 10:27:30 and 10:28:00 into a single time frame of 10:28:00 so that it can be accurately mapped to the frequency of the second data stream. This policy is applied when the user explicitly specifies the WIKI_SAMPLE_STREAM construct in the query.

The second, or default, policy is applied when the user does not specify any construct in the query. It averages each data stream individually, disregarding the differences in frequency.
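The first policy can be illustrated with the sketch below, which folds a 30-second stream into one-minute frames labelled by the end of each frame. Averaging the readings inside a frame, the function name and the sample values are assumptions; the text only says that the readings are combined.

```python
from datetime import datetime, timedelta
from statistics import mean


def sample_to_interval(readings, interval):
    """Combine readings into frames of length `interval`, labelled by frame end.

    `readings` is a list of (timestamp, value) pairs.
    """
    frames = {}
    epoch = datetime(1970, 1, 1)
    for ts, value in readings:
        # Round the timestamp up to the end of its frame.
        offset = (ts - epoch) % interval
        frame_end = ts if not offset else ts + (interval - offset)
        frames.setdefault(frame_end, []).append(value)
    return [(end, mean(vals)) for end, vals in sorted(frames.items())]


day = datetime(2012, 4, 11)
stream_1 = [  # hypothetical values at 30-second spacing
    (day.replace(hour=10, minute=27, second=30), 20.0),
    (day.replace(hour=10, minute=28, second=0), 22.0),
    (day.replace(hour=10, minute=28, second=30), 24.0),
    (day.replace(hour=10, minute=29, second=0), 26.0),
]
for end, value in sample_to_interval(stream_1, timedelta(minutes=1)):
    print(end.time(), value)
```

The readings at 10:27:30 and 10:28:00 land in the 10:28:00 frame, which can then be paired with the one-minute stream.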

Stage 4: Aggregate Queries

The following query outputs the average temperature reading at South Kensington, London. The WIKI_PROPORTION construct is used to indicate that the aggregated values must be based on the weighted mean of the specified attributes. The WIKI_LOCATION construct selects records within a specified location or at the given geographical coordinates. This query can be further extended using the WIKI_RADIUS construct, which selects records within a radius (specified in meters) of the specified location or coordinates. The WIKI_SAMPLE_STREAM construct samples the data streams to match the stream with the largest time interval (Table 3).

    SELECT Average (Value)
    FROM Environment, Datastream, DataPoint
    WHERE sensorType = 'Temperature'
    WIKI_LOCATION (Coordinates OR 'South Kensington London')
    WIKI_RADIUS 10
    WIKI_WINDOW 1 h
    WIKI_UOM Celsius
    WIKI_PROPORTION ON (DISTANCE, TIME)
    WIKI_SAMPLE_STREAM

The user has the option to specify this query as continuous query with the construct WIKI_CONTINUE_FOR <time interval in hours>. This enforces the query to continuously run for the specified time period.

The queries explained so far fetch data from two sources: the meta-data of the sensors and their geographic information come from a relational database, while the sensor readings come from a non-relational database.

Table 3. The sampling of the frequency of multiple data streams.

5.3. Case Study 3: Creating a Virtual Sensor

Virtual sensors can be created when there is no physical sensor deployed at a specific location. This is useful when users require the aggregation of several data streams to be persistent.

Stage 1: The search phase

The users can either view the WikiSensing map or query to check the locations of the physically deployed sensors. Figure 10 illustrates an instance of the WikiSensing map followed by an example query that would select certain sensors in a specific location.

    SELECT EnvId, StreamId
    FROM Environment, Datastream
    WHERE sensorType = 'SO2'
    WIKI_LOCATION = Coordinates OR Location_name

Stage 2: Registering the details of a new virtual sensor

If the user requires a sensor reading from a particular location where a sensor is not physically deployed, the user can create a virtual sensor in the system. Its geographical details are specified in the same way as when registering a physical sensor in case study 1, with the exception that the domain field must be set to ‘Virtual’. In addition, users can specify the Virtual Sensor Reading to be either persistent or dynamic.

There are two categories of virtual sensors, the persistent virtual sensors which store the aggregated readings and the virtual sensors that generate readings dynamically. The readings of persistent virtual sensors can be traced for the origins of the contributing sensor data streams. For example, in the case where a doubt exists about a virtual sensor reading, this can be audited as the readings are recorded. In contrast dynamic virtual sensors produce their reading on request, and their output is generated by aggregating the data streams in real time.

Stage 3: Select and record the contributing sensors

The user can select nearby sensors that contribute to the newly created virtual sensor (Figure 11). The user has the option to add more sensors or remove any existing contributing sensors from the virtual sensor.

The sensors that contribute to a virtual sensor are recorded in a virtual sensor map table, whose fields are listed in Table 4. The optimize column is updated when the user explicitly requests the selected contributing sensors list to be optimized. The system updates this column with persistent virtual sensor identities that are already created using a subset of the selected sensors. The aim is to reduce the database reads using existing virtual sensor data streams that are already formulated.

The system provides an aggregated sensor reading (of the contributing sensors) as the reading for the virtual sensor. The following query is an example that aggregates readings for a virtual sensor.

    SELECT Average (Value)
    FROM Environment, Datastream, DataPoint
    WHERE EnvironmentIdentity IN {The set of contributing sensor environments}
    AND sensorType = 'Temperature'
    WIKI_RADIUS 10
    WIKI_WINDOW 1
    WIKI_UOM Celsius
    WIKI_PROPORTION ON (DISTANCE, TIME)
    WIKI_SAMPLE_STREAM

In accordance with Figure 11, streams S1 and S2 are selected to contribute towards the virtual sensor. If the construct WIKI_PROPORTION ON is set to the distance and time, the data streams are averaged based on the weighted mean on each of these attributes. This is calculated using the following formula:

$$\bar{X}_w = \frac{\sum w\,x}{\sum w}$$

where $\bar{X}_w$ is the weighted arithmetic mean, $x$ stands for the values of the items and $w$ is the weight of each item.
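A short worked example of the weighted mean follows. The readings and distances for S1 and S2 are invented, and the use of inverse distance as the weight is an assumption; the text does not state how the weights are derived from distance and time.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical streams S1 and S2: (temperature reading, distance in meters).
s1, s2 = (20.0, 2.0), (26.0, 4.0)
weights = [1.0 / s1[1], 1.0 / s2[1]]           # assumed inverse-distance weights
print(weighted_mean([s1[0], s2[0]], weights))  # 22.0
```

With weights 0.5 and 0.25 the result is (0.5 × 20 + 0.25 × 26) / 0.75 = 22, closer to the reading of the nearer stream S1 than the unweighted mean of 23.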

The aggregation query that is responsible for obtaining virtual sensor readings is stored in the virtual sensor query table (Table 5). Users are able to update and save (validated before saving) these queries.

When the user completes the registration of a virtual sensor a Wiki page is automatically created and the information recorded (Figure 13). The Wiki page gets automatically updated when a user modifies the composition of the virtual sensor.

Figure 10. The WikiSensing map illustrating the deployment of sensors.

5.4. Case Study 4: Assessing Trustworthiness

WikiSensing is accessible to any online user, and in most cases the sensor data needs to be assessed for trustworthiness. The information that needs to be assessed is the sensor data streams and the annotations provided by the collaborating users. The trustworthiness of this information can be used in managing conflicts between different sources.

There are various public sensor data streams available in the system. The users can view these data streams and their annotations with the purpose of understanding the information or with the intention of using it in their own analysis.

Stage 1: Identifying the information

When a user concentrates on a specific location to obtain a temperature sensor reading, the user may find that there are two sensors deployed at that same location measuring the same attribute. The two sensors provide conflicting temperature readings, as in the following example.

Temperature sensor 1 deployed at South Kensington station: 29 Celsius

Temperature sensor 2 deployed at South Kensington station: 21 Celsius

The user also discovers that there are two annotations provided by separate users on a data stream that contradict each other.

User 1: time stamp: 11/4/2012, the sensor ID 25 does not produce an accurate outcome of the temperature as it is located near a functioning refrigerator.

User 2: time stamp: 10/4/2012, the sensor ID 25 is the primary sensor to obtain temperature readings for the William Penney building at Imperial College London.

Stage 2: Calculate credibility of information

The user now has to make a decision in selecting a specific data stream of the two sensors as well as to know which annotation is valid. Hence the user can select the functionality for checking the credibility of sensors from the system.

When the check credibility functionality is executed the following ratings are automatically calculated.

Use the meta-data of the sensors to cross reference the capabilities of the sensors. In this case the sensor meta-information such as the sensitivity and the accuracy data are used to decide on which sensor is superior. This information is used to calculate the sensor reliability rating that can be used to compare different sensors.

Aggregated readings of nearby sensors are obtained using the sensor readings of nearby sensors. This is compared with the readings of the sensors that need to be assessed for trustworthiness or to resolve a conflict. The output is the distance of the sensor readings.

To resolve conflicts between annotations provided by different users, the system takes into account the previous updates committed by those users, as well as background information such as qualifications and experience, and calculates a user reliability rating.

Stage 3: The comparison

The sensor reliability rating, the distance of the sensor readings and the user reliability rating are used to create a single rating known as the Credibility Rating. This value is calculated by averaging the values obtained in the previous stage (Table 6). Users can compare these ratings to assess the trustworthiness of the information. In the case of managing conflicts, the ratings of the conflicting sources are compared with each other.
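The averaging step can be sketched as follows. The sketch assumes that all three inputs have already been expressed on the same 1 to 10 scale, including a score derived from the distance of the sensor readings; how that conversion is done is not specified in the text, and the function name and sample ratings are hypothetical.

```python
from statistics import mean


def credibility_rating(sensor_reliability, reading_score, user_reliability):
    """Average of the three ratings, each assumed to be on a 1 to 10 scale."""
    return mean([sensor_reliability, reading_score, user_reliability])


# Hypothetical ratings for the two conflicting temperature sensors.
sensor_1 = credibility_rating(sensor_reliability=4, reading_score=3, user_reliability=5)
sensor_2 = credibility_rating(sensor_reliability=8, reading_score=9, user_reliability=7)
print(sensor_1, sensor_2)
```

In this made-up case the second sensor receives the higher rating, so its data stream would be preferred when the conflict is resolved.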

Stage 4: The policy

Data streams, users and user annotations that are assessed through this process are annotated with this rating to help future users obtain a better understanding of the trustworthiness of this information.

The policy is that the credibility rating is a reflection on the trustworthiness of the information and therefore can be used to manage conflicts.

Stage 5: Log information in Wiki

This information is recorded in Wiki pages in order to obtain a trace or log of the type of method followed and the information that was used to assess the trustworthiness and to manage conflicts.

Table 6. Calculate the credibility rating.

6. Experimental Evaluation

The experimental evaluation is designed to understand the attributes that affect the performance of the virtual sensors. The evaluation is based on different strategies that can be followed for aggregation queries and the storage for virtual sensor readings. The goal is to have an efficient methodology leading towards quicker responses to end users.

6.1. Improving the Performance of Aggregate Queries

We present two scenarios to demonstrate the methodology used by WikiSensing to improve the performance of aggregate queries. The performance is based on the response time of the queries and the improvement of the response time is a reflection of the decrease of the number of data base reads. The aim here is to identify strategies that reduce the number of database reads. A virtual sensor is an aggregation of one or more sensor data streams. The aggregate function takes a set of data streams and produces a single value that summarizes the information contained in the selected data streams [27]. In the case of virtual sensors that are persistent, it can record the results of the aggregation in the database.

6.1.1. Scenario 1: Aggregate Sensor Data Streams to Create Virtual Sensors that Fully Overlap with Other Virtual Sensors

Consider a scenario where a virtual sensor is already created using a set of sensors (Virtual sensor 1 in Figure 14(a)). A naïve strategy and the WikiSensing strategy are analyzed when the requirement for a second virtual sensor (Virtual sensor 2) arises. Firstly, a naïve strategy creates the new virtual sensor by including all the required contributing data streams in the aggregate query (Figure 14(a)). This does not consider the fact that the fully overlapping virtual sensor 1 is a complete subset of virtual sensor 2. In contrast, WikiSensing takes this fact into account and creates virtual sensor 2 by using the information in virtual sensor 1 (Figure 14(b)).

As the information of Virtual sensor 1 is persistent and cached the time involved in obtaining the result is expected to be less than a single database read. The aim of this strategy is to use existing persistent virtual sensors that are subsets of the newly created virtual sensor in order to reduce the number of data base reads. The trade-off using this strategy is the extra cost of storing the sensor readings. Hence it is important to identify the situations where persistent storage is suitable.

6.1.2. Scenario 2: Aggregate Sensor Data Streams to Create Virtual Sensors that do not Fully Overlap with Other Virtual Sensors

Figure 15 depicts the requirement of a new sensor when the contributing streams do not fully overlap an existing virtual sensor (Virtual sensor 1). While a naïve strategy would create the new virtual sensor from scratch with all contributing sensors, WikiSensing uses the existing virtual sensor 1 and combines it with the other exclusive sensor streams. Similar to the first scenario, the readings of virtual sensor 1 can be taken from the cache and the rest of the readings can be fetched from the database.
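The idea of reusing existing persistent virtual sensors can be sketched as a small planning step. The function name, the data layout and the greedy choice of the largest fitting virtual sensor first are assumptions made for this illustration; the text does not describe the selection procedure used by WikiSensing.

```python
def plan_reads(required, persistent_virtual_sensors):
    """Return (reused virtual sensors, streams still to be read from the database).

    `required` is the set of stream ids for the new virtual sensor;
    `persistent_virtual_sensors` maps a virtual sensor id to its set of stream ids.
    """
    remaining = set(required)
    reused = []
    # Greedily reuse the largest existing virtual sensors that fit entirely.
    for vs_id, streams in sorted(
        persistent_virtual_sensors.items(), key=lambda kv: len(kv[1]), reverse=True
    ):
        if streams and streams <= remaining:
            reused.append(vs_id)
            remaining -= streams
    return reused, remaining


existing = {"VS1": {"S1", "S2", "S3", "S4"}}   # hypothetical virtual sensor 1
needed = {"S1", "S2", "S3", "S4", "S5"}        # streams for virtual sensor 2
reused, remaining = plan_reads(needed, existing)
print(reused, remaining)  # ['VS1'] {'S5'}
```

Here the naïve strategy would read five streams, whereas the plan reads the stored result of VS1 plus the single exclusive stream S5.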

6.2. Experimental Setup and Benchmark

The version of the WikiSensing system that is used for the experiment is implemented as a complete working system hosted on an IIS server running on a Windows Server 2008 virtual machine in the IC-Cloud platform [28]. A test emulator that implements the Siege benchmark [29] is used to send requests and runs in another Linux CentOS 5.4 virtual machine in the IC-Cloud. Siege is a regression testing and benchmarking utility that measures the performance of web applications and services.

The workload of the application tested obtains readings from physical sensors and virtual sensors that were created from a set of sensor data streams. The test emulator is run for a specific period of time and continuously generates a sequence of interactions that are initiated by multiple active sessions. After an interaction is completed, the emulator waits for a random interval before initiating the next interaction to simulate user’s thinking time. Each experimental trial session is carried out for 300 seconds and three separate experiments are carried out. We are testing the performance by obtaining random readings from sensor data streams.

The first experiment measures the response times of a physical sensor by increasing the number of users accessing it. We use window sizes of 10 and 1,000 for a maximum of 1,000 simulated users.

The second experiment involves a single client accessing virtual sensor readings. This is further divided into 2 trials where we test with window sizes of 10 and 1,000 sensor readings. Each trial is tested with different workloads: the naïve approach and the WikiSensing strategies based on a 100%, 80%, 50% and 20% overlap of sensors.

The third experiment has the same parameters as the previous one, except the fact that it is tested using multiple simulated users with active sessions. The first trial simulates 100 clients concurrently accessing the system with the gradual increase of the contributing sensors. The second trial gradually increases the number of clients that access a virtual sensor created with 50 sensor data streams.

The test emulator based on the Siege benchmark outputs the response time for each experimental scenario. The emulator makes an HTTP request for a web page that invokes a web service function. The response time is measured from the start of the invocation until the function returns a value and that value is loaded into the web page. The times of the individual executions are summed and averaged to obtain a uniform reading.
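The summing and averaging step described above can be sketched in a few lines. This is a minimal illustration, not the Siege emulator itself: the names `timed_call` and `average_response_time` are hypothetical, and the `request` callable stands in for the HTTP request that invokes the web service function.

```python
import time
from statistics import mean
from typing import Callable, List


def timed_call(request: Callable[[], object]) -> float:
    """Return the elapsed wall-clock time of one request, in milliseconds."""
    start = time.perf_counter()
    request()  # stands in for the HTTP request that invokes the web service
    return (time.perf_counter() - start) * 1000.0


def average_response_time(request: Callable[[], object], runs: int) -> float:
    """Time `runs` executions of `request` and return their mean in milliseconds."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    samples: List[float] = [timed_call(request) for _ in range(runs)]
    return mean(samples)
```

Averaging over many executions smooths out one-off delays, which is why a single timed request is not used as the reported figure.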

6.2.1. Experiment 1: Measuring Response Time of a Physical Sensor Accessed by an Increasing Number of Clients

We test the response time of obtaining readings from a physical sensor as the number of users increases. This increases the number of concurrent users that access a single sensor stream, with window sizes of 10 and 1,000.

The number of concurrent clients is increased from 250 to 1,000. The response time R(t) depends on the number of concurrent users (X) and the window size (Y), R(t) = f(X,Y), according to the graph (Figure 16).

Figure 16. Response times for querying a single physical sensor by increasing the number of clients.

6.2.2. Experiment 2: Measuring Response Time of Virtual Sensors Accessed by a Single Client with Respect to the Increase of the Contributing Sensor Data Streams

We measure the response time for obtaining an aggregate reading of a virtual sensor as the number of contributing sensors increases. The aggregate reading is a combined or averaged single value of the contributing sequential data streams. The experiment tests a single client accessing the virtual sensor's reading while the number of contributing sensors is gradually increased from 10 to 140. The workloads are the naïve approach, where all records are fetched from the database; 100% overlapping, where the information is picked from the server cache; and 80%, 50% and 20% overlapping, where the data is fetched directly from the database.

Virtual sensor readings are cached when the user makes a request for that sensor. If the data is not cached it is then fetched from the database. Overlapping is dealt with in WikiSensing as illustrated in Figure 15(b). For example, if the overlapping is 80% for a virtual sensor it obtains the overlapped portion using a single database read (or directly from the cache if the information is cached) and gets the rest (20%) of the reading from the other data streams.
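The read pattern described above can be illustrated with a small sketch. It is an assumption-laden model rather than the WikiSensing implementation: `CountingStore`, `read_group` and `fetch_virtual_sensor` are hypothetical names, the database is replaced by an in-memory dictionary, and the overlapped portion is assumed to be retrievable with one read or from the cache.

```python
from typing import Dict, FrozenSet, List, Set


class CountingStore:
    """Hypothetical stand-in for the database; counts how many reads occur."""

    def __init__(self, readings: Dict[str, List[float]]) -> None:
        self._readings = readings
        self.reads = 0

    def read_stream(self, stream_id: str) -> List[float]:
        """One database read for a single data stream."""
        self.reads += 1
        return list(self._readings[stream_id])

    def read_group(self, stream_ids: FrozenSet[str]) -> List[float]:
        """One database read that returns a whole overlapped group."""
        self.reads += 1
        return [v for s in sorted(stream_ids) for v in self._readings[s]]


def fetch_virtual_sensor(
    required: Set[str],
    overlapped: Set[str],
    store: CountingStore,
    cache: Dict[FrozenSet[str], List[float]],
) -> List[float]:
    """Fetch the overlapped part once (cache first), then the remaining streams."""
    shared = frozenset(required & overlapped)
    values: List[float] = []
    if shared:
        if shared not in cache:
            cache[shared] = store.read_group(shared)  # single read for the overlap
        values.extend(cache[shared])
    for stream_id in sorted(required - shared):
        values.extend(store.read_stream(stream_id))  # one read per remaining stream
    return values
```

With five required streams of which four overlap (80%), the first request costs two reads and a repeated request costs one, whereas passing an empty `overlapped` set reproduces the naïve approach of one read per stream.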

We used two trials, with window sizes of 10 (Figure 17(a)) and 1,000 (Figure 17(b)). The aim of changing the window size is to vary the number of sensor readings that are selected for an aggregate query. For instance, a window size of 10 selects the 10 most up-to-date sensor readings for the aggregate query.
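A windowed aggregate of this kind can be sketched as follows. The function name `windowed_average` is hypothetical, averaging is used as the example aggregation, and each stream is assumed to be ordered from oldest to newest reading.

```python
from statistics import mean
from typing import List, Sequence


def windowed_average(streams: Sequence[Sequence[float]], window: int) -> float:
    """Average the `window` most recent readings of every contributing stream."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent: List[float] = []
    for readings in streams:
        # Slicing from the end keeps at most `window` of the newest readings.
        recent.extend(readings[-window:])
    if not recent:
        raise ValueError("no readings available")
    return mean(recent)
```

A larger window pulls more readings per stream into the aggregate, which is consistent with the higher response times reported for a window size of 1,000.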

Figure 17. Comparing the response times for querying a single virtual sensor (a) With a window size of 10. (b) With a window size of 1,000.

The response times for both scenarios with a 100% overlap (fetched from the database and from the cache) were constant throughout the experiment, at 30 and 10 milliseconds respectively. With a window size of 10, the response time of a single virtual sensor is in the range of 60 to 20 milliseconds for the naïve, 80%, 50% and 20% overlapping workloads. With a window size of 1,000, the response time of a single virtual sensor is in the range of 110 to 30 milliseconds for the respective workloads.

The response time for the virtual sensor readings R(t) depends on the number of contributing sensors (X) and the window size (Y), R(t) = f(X,Y). When comparing the results of the two window sizes, the different strategies respond in a similar fashion. The main difference is that the response time increases when using a window size of 1,000. The response time of the 50% overlapped workload at 140 sensors (window size 10) is 370 milliseconds. This response time increases when the overlapping is reduced, owing to the increase in the number of database reads. Thus the decrease of overlapped sensors constitutes a 60% change of the response time. The same situation prevails with a window size of 1,000.

6.2.3. Experiment 3: Measuring Response Time of Virtual Sensors Accessed by 100 Concurrent Clients with Respect to the Increase of the Contributing Sensor Data Streams

This test simulates a case where a popular virtual sensor is accessed by many users. In the first trial we measure the response time of an aggregate reading of a virtual sensor with 100 clients accessing the same set of data concurrently. The second trial records the response time while increasing the number of clients from 10 to 50 and keeping the number of contributing sensor data streams constant at 50. In both trials we use a window size of 10. This experiment mainly tests the responsiveness and the scalability of the system. The graph in Figure 18 shows the bottlenecks that arise when fetching data in the scenarios where the overlapping does not exceed 50%. The scenarios with a 100% overlap, fetched from the database and from the memory cache, returned constant response times of 30 and 10 milliseconds throughout this experiment.

Figure 18. Response times for querying a single virtual sensor (a) Increasing the number of contributing sensors with 100 concurrent users (b) Increasing the number of users with 50 sensors.

The test emulator times out due to memory limitations under the traditional naïve strategy when the number of sensors exceeds 50, as depicted by the graph in Figure 18(a). The overlapping strategy followed by WikiSensing clearly scales better than the traditional approach, as its response times are comparatively lower.

The response time for the virtual sensor readings R(t) depends on the number of contributing sensors (X), the window size (Y) and the number of concurrent users (Z), R(t) = f(X,Y,Z). As the data access intensifies with 100 concurrent users, the response time tends to increase and performance diminishes in the strategies with 50% or less overlapping. From these experiments we can conclude that the response time for virtual sensor readings is:

where N is the required number of sensors, O is the overlapped number of sensors, d(t) is the time to fetch a record from the database, c(t) is the time to fetch a record from the cache, and a(t) is the time to process the aggregation. The other factors that affect the response time of such an HTTP request are the performance of the browser, the speed of the Internet connection, the local network traffic, the load on the remote host, and the structure and format of the web page requested [30]. Taking the time cost of all these factors as X, the total response time is R(t) + X.
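One way to read the variables defined above is as a simple cost model. The sketch below is an assumption, not the authors' stated formula: it supposes that each non-overlapped sensor costs one database read, that the overlapped portion costs a single cache fetch (or a single database read when it is not cached), and that the aggregation cost is added once. The function name `estimated_response_time` is hypothetical.

```python
def estimated_response_time(
    n: int, o: int, d: float, c: float, a: float, cached: bool = True
) -> float:
    """Estimate R(t) from N, O, d(t), c(t) and a(t) under the stated assumptions."""
    if not 0 <= o <= n:
        raise ValueError("overlapped sensors must be between 0 and n")
    # Assumption: the whole overlapped portion is fetched in one operation.
    shared_cost = 0.0
    if o > 0:
        shared_cost = c if cached else d
    # Assumption: every remaining sensor needs its own database read.
    return (n - o) * d + shared_cost + a
```

Under this reading, the naïve approach is the case O = 0, where the database cost grows linearly with N, and a 100% overlap is the case O = N, where the cost stays constant regardless of N.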

7. Summary and Discussion

The WikiSensing system at present can be used with sensors that produce data streams. At this stage the system does not support other formats such as images, text messages or audio or video files (security cameras, audio sensors). We realize the significance of widening the scope in order to store different types of media as a future enhancement of WikiSensing.

A major goal of sensor data management is that the data can be used by different applications that provide users with useful information. For example, consider an application that reads pollution data from a sensor data management system and warns asthmatic users, or a heartbeat ECG monitoring application that alerts its users when values reach certain thresholds. These are examples of applications that can use WikiSensing as a repository for their sensor data and make use of the API services it provides. Such applications usually compare readings against a prescribed threshold value. We aim to support more complicated applications that depend on the readings of multiple sensors and require several parameters for decision making. For instance, consider an application that outputs the disturbance noise level of a room at a particular location. To provide an accurate reading, this application requires information on the noise levels of the traffic, the distance to bus stops, the type of traffic that passes through (heavy or light vehicles), the thickness of the double glazing of the windows, the elevation of the room, the reliability of the sensors and so on. WikiSensing currently stores sensor geographical data and a variety of sensor meta-information, but its goal is to store more domain-specific information in order to support complicated applications such as the one just mentioned.
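The simple threshold comparison that such applications perform can be sketched as below. The function name `readings_over_threshold` and the `(timestamp, value)` tuple layout are illustrative assumptions; how the readings are obtained from the WikiSensing API is outside the sketch.

```python
from typing import Iterable, List, Tuple

Reading = Tuple[str, float]  # (timestamp, value); layout assumed for illustration


def readings_over_threshold(
    readings: Iterable[Reading], threshold: float
) -> List[Reading]:
    """Return the readings whose value exceeds the prescribed threshold."""
    return [(stamp, value) for stamp, value in readings if value > threshold]
```

An application would raise an alert whenever the returned list is non-empty; the multi-parameter applications discussed above would need to combine several such inputs rather than a single comparison.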

In terms of openness to online collaboration, WikiSensing facilitates creating virtual sensors and updating information, and allows users to provide feedback or comments on the existing sensor data. A current limitation of the system lies in the granularity of commenting on this data. The feedback functionality of WikiSensing enables users to add comments to a particular data stream, but it does not let them annotate a specific point of a graph that represents a sensor reading. Such annotations would help users understand more about sensor data streams and can be incorporated into the functionality in the future.

Calibration details are deemed important in understanding the quality of sensor measurements. We plan to incorporate calibration details as a separate entity in the data model linked to the sensor data points as it relates to periods of measurement of a sensor. This information can be a part of the sensor meta-data that can be used as a metric in assessing the trustworthiness of sensors.

The system currently supports a single trustworthiness score. However, this score can be broken down into several aspects of a sensor, such as its reliability, accuracy or calibration. The system can provide an ontology containing various attributes and their relationships with each other. For instance, the reliability of a sensor may not be the same when it is located outdoors as when it is placed indoors. Hence the ontology could contain definitions for both these scenarios, as the same attribute may have different values depending on the circumstance. The work described by [31] is of particular interest as it presents information on developing ontologies for heterogeneous sensors. We can also use the JCGM VIM [32] standard terminology for selecting sensor attributes in designing the ontology.
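A score split by aspect and by deployment context could be represented roughly as follows. This is a data-structure sketch only, not an ontology and not part of the existing system: `TrustProfile`, the aspect and context names, the 0 to 1 score range and the use of a plain mean as the overall score are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TrustProfile:
    """Per-aspect trust scores, each of which may differ by context."""

    # aspect name -> context name -> score in [0, 1]
    scores: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def set_score(self, aspect: str, context: str, value: float) -> None:
        if not 0.0 <= value <= 1.0:
            raise ValueError("score must be between 0 and 1")
        self.scores.setdefault(aspect, {})[context] = value

    def overall(self, context: str) -> float:
        """Mean of all aspect scores recorded for the given context."""
        values = [by_ctx[context] for by_ctx in self.scores.values() if context in by_ctx]
        if not values:
            raise ValueError(f"no scores recorded for context {context!r}")
        return sum(values) / len(values)
```

For example, recording a reliability of 0.9 indoors and 0.6 outdoors lets the same sensor report different overall scores for the two placements.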

The functions currently supported by the system are based on simple aggregations such as summing and averaging. Users are restricted from defining their own functions and executing them at the centralized server, as this may pose performance and security issues. To avoid these problems, the system API allows users to obtain the data streams and perform complex functions locally. The research work by [33] can be considered when enhancing the system to support complex aggregation functions.

8. Conclusions and Future Work

8.1. Conclusions

This paper has introduced a new collaborative approach for sensor data management known as WikiSensing. It has presented an architectural design and described the implementation details for a collaborative sensor data management system. The advantage of WikiSensing lies in incorporating online collaboration into sensor data management. Online collaboration is used in WikiSensing to annotate, update and share sensor information as well as to create virtual sensors. The virtual sensor concept is an extremely useful feature that provides sensor readings using existing sensor data streams. The main challenges in sensor data management and online collaboration are due to the large amounts of sensor data and the inability to demonstrate the trustworthiness of the shared information. This research has addressed some of these challenges towards developing a successful collaborative sensor data management system.

We anticipate that the convergence of online collaboration with sensor data management can enable better use and understanding of the vast amounts of sensor information. Further, the effort required is considerably lower due to the collaborative nature of the system and the involvement of users with experience and knowledge of sensors and their deployments.

8.2. Future Work

We plan to concentrate on improving the response time of the extensively used aggregate queries as well as implementing a mechanism to trace the development of the virtual sensors. From an analytical perspective we are working on building a wiki analytical layer for the sensor and wiki data that can mark up the information using a universal methodology. In the short term our future work will focus on the following aspects.

An important future development would be to trace the modifications of virtual sensors. Hence we plan to extend the data model in order to maintain a record of changes applied to virtual sensors. A potential source for this information could be the updates applied to the virtual sensor network and the virtual sensors query entities. The work done by [34] highlights the challenges in managing historical sensor information and can be used as the basis for this development.

We hope to reduce the response time of aggregate operations by using MapReduce in MongoDB [35] for batch processing of data. This is similar to Apache Hadoop (hadoop.apache.org), which uses distributed processing of large data sets across clusters of computers. MapReduce in MongoDB processes the input from a collection and outputs it to a collection. This can be used for the aggregation queries, especially when they involve combining a large number of data streams. This relates to the work of [36], which proposes a scalable platform for network log analysis that targets fast aggregation and agile querying.
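The map and reduce steps for an averaging query can be illustrated in plain Python. This sketch does not use MongoDB or its API; it only shows the pattern, and the record fields `sensor_id` and `value` as well as the function names are assumptions made for the example.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Mapping, Tuple

Pair = Tuple[float, int]  # (running sum, running count)


def map_reading(record: Mapping[str, object]) -> Tuple[str, Pair]:
    """Map step: emit the sensor id as key and (value, 1) as the partial result."""
    return str(record["sensor_id"]), (float(record["value"]), 1)


def reduce_pairs(left: Pair, right: Pair) -> Pair:
    """Reduce step: combine two partial (sum, count) results for the same key."""
    return left[0] + right[0], left[1] + right[1]


def mapreduce_average(records: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Group mapped pairs by key, reduce each group, then finalize to an average."""
    grouped: Dict[str, List[Pair]] = defaultdict(list)
    for record in records:
        key, pair = map_reading(record)
        grouped[key].append(pair)
    averages: Dict[str, float] = {}
    for key, pairs in grouped.items():
        total, count = reduce(reduce_pairs, pairs)
        averages[key] = total / count
    return averages
```

Carrying a (sum, count) pair instead of a running average keeps the reduce step associative, so partial results from different batches can be combined in any order.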

Our main objective is to analyze the gathered sensor data further, with the goal of helping users obtain useful insights. For example, it could be useful to know whether there is a relationship between the temperature of the environment and the pollution levels of NO or ozone, or between the noise levels and the prevailing traffic. The data in the system must therefore be transformed into a suitable format for further use, and the system must provide suitable functionality. This can be supported by adding a new layer to our architecture.
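One simple way to check for such a relationship between two aligned series of readings is the Pearson correlation coefficient. The sketch below is only an example of the kind of analysis meant here, not a feature of the system; the function name `pearson` is hypothetical and the two series are assumed to be sampled at the same times.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)
```

A value near 1 or -1 suggests a strong linear relationship between, say, temperature and ozone readings, while a value near 0 suggests none; correlation alone does not establish a causal link.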

The proposed new layer (highlighted in Figure 19) would enable the existing data and information to be formatted and annotated based on a standard markup. The functionality of this tier would be able to extract and use the information from the Wiki pages created as a result of online collaborations. The goal is to annotate this information so that it can be further analyzed, thereby increasing the chances of obtaining useful insights from this rich set of underlying sensor data.

Figure 19. The proposed new layer for organizing the sensor data and user annotations.

Acknowledgments

We would like to thank our colleagues in the Discovery Science Group at Imperial College London for their help and support. In particular we would like to acknowledge insightful discussion and comments on the manuscript by Orestis Tsinalis. We would also like to acknowledge research funding received through the EPSRC, without which this work would not have been possible, under Research Grant EP/H042512/1, “Elastic Sensor Networks: Towards Attention-Based Information Management in Large-Scale Sensor Networks”, and Research Grant EP/I038837/1, “Digital City Exchange”. This work also benefited from the support for the GuangDong Innovation Team on Cloud Computing from the GuangDong provincial government of China.

Fair Usage

The purpose of this site is to keep the domain and idea of WikiSensing alive. The volunteers that work on this website will try to collect studies and research on sensor data management collaboration.