One of the most prominent and comprehensive ways of data collection in sensor
networks is to periodically
extract raw sensor readings. This way of data collection enables complex
analysis of data, which may not be
possible with in-network aggregation or query processing. However, this
flexibility in data analysis comes at
the cost of increased power consumption. In this paper, we introduce selective sampling
for energy-efficient periodic
data collection in sensor networks. The main idea behind selective sampling is
to use a dynamically changing
subset of nodes as samplers such that the sensor readings of sampler nodes are
directly collected, whereas
the values of non-sampler nodes are predicted through the use of probabilistic
models that are locally and
periodically constructed in an in-network manner. Selective sampling can be
effectively used to increase
the network lifetime while keeping the quality of the collected data high, in
scenarios where either the spatial
density of the network deployment is superfluous relative to the required
spatial resolution for data analysis
or a certain amount of data quality can be traded off in order to decrease the
overall power consumption of the
network. Our selective sampling approach consists of three main mechanisms.
First, sensing-driven cluster
construction is used to create clusters within the network such that nodes with
close sensor readings are
assigned to the same clusters. Second, correlation-based sampler selection and
model derivation is used to
determine the sampler nodes and to calculate the parameters of probabilistic
models that capture the spatial
and temporal correlations among sensor readings. Last, selective data
collection and model-based prediction
is used to minimize the number of messages used to extract data from the
network. A unique feature of our
selective sampling mechanisms is the use of localized schemes, as opposed to
protocols requiring global
information, to select and dynamically refine the subset of sensor nodes
serving as samplers and the model-based
value prediction for non-sampler nodes. Such runtime adaptations create a data
collection schedule
that is self-optimizing in response to changes in energy levels of nodes and
environmental dynamics.
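
To make the idea of model-based prediction concrete, the following is a minimal sketch, not the method developed in this paper. It assumes a single sampler node and a single non-sampler node in the same cluster, and it assumes that their readings are jointly Gaussian, so that the non-sampler value can be predicted as the conditional mean given the sampler reading. The function names fit_pair_model and predict_non_sampler, and the example readings, are hypothetical.

```python
from statistics import mean


def fit_pair_model(sampler_history, target_history):
    """Estimate model parameters from paired past readings (illustrative assumption:
    one sampler, one non-sampler, jointly Gaussian readings)."""
    mu_s = mean(sampler_history)
    mu_t = mean(target_history)
    # Unnormalized variance and covariance; the normalizers cancel in the ratio.
    var_s = sum((s - mu_s) ** 2 for s in sampler_history)
    cov_st = sum(
        (s - mu_s) * (t - mu_t)
        for s, t in zip(sampler_history, target_history)
    )
    # With a constant sampler history there is no usable correlation,
    # so fall back to predicting the non-sampler's historical mean.
    slope = cov_st / var_s if var_s > 0 else 0.0
    return mu_s, mu_t, slope


def predict_non_sampler(model, sampler_reading):
    """Conditional mean of the non-sampler reading given the sampler reading."""
    mu_s, mu_t, slope = model
    return mu_t + slope * (sampler_reading - mu_s)


# Hypothetical temperature readings from two nearby nodes.
model = fit_pair_model([20.0, 21.0, 22.0, 23.0], [20.5, 21.5, 22.5, 23.5])
estimate = predict_non_sampler(model, 24.0)
```

In this sketch only the sampler transmits its reading at collection time, and the non-sampler value is reconstructed from the fitted parameters, which is the message saving that selective sampling aims for.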