Human Mobility Modeling at Metropolitan Scales

Sibren Isaacman, Richard Becker, Ramón Cáceres, Margaret Martonosi,
James Rowland, Alexander Varshavsky, Walter Willinger
Princeton University, Princeton
AT&T Labs, Florham Park, NJ, USA
Mobisys 2012
Junction 101.07.09
Introduction
• Human mobility model
• Mobile sensing, opportunistic networking, urban planning,
ecology, and epidemiology
• Aim to produce accurate models of how large populations
move within different metropolitan areas
• Generate sequences of locations and associated times that
capture how individuals move between important places in their
lives, such as home and work.
• Aggregate the movements of many such individuals to reproduce
human densities over time at the geographic scale of
metropolitan areas.
• Take into account how different metropolitan areas exhibit
distinct mobility patterns due to differences in geographic
distributions of homes and jobs, transportation infrastructures
and other factors
Shortcomings of previous work
• Random motion -> unrealistic motion
• Lack of memory about recurring movement patterns
• Lack of spatiotemporal realism about population densities
• No consideration of diverse populations and geographies
• Small geographic area (e.g. campus)
• Too universal -> fail to adapt to different geographic areas
Sources
• Call Detail Records (CDRs)
• Creating human mobility models from such data
• Maintained by cellular network operators
• Information -> Sporadic samples of the approximate locations of
the phone’s owner
• The time a voice call was placed or a text message was received
• The identity of the cell tower with which the phone was associated
• Difficulties:
• It is unclear whether the associated locations correspond to home, work, or other
important places for particular cellphone users.
• Both the spatial and the temporal granularity of CDR data are quite coarse
• Spatially, the granularity of cell-tower spacing
• Temporally, only generated when phones are actively involved
Overall view
• WHERE modeling approach (Work and Home Extracted Regions)
• Human movement (important locations, commute distances .. )
=> probability distributions
=> generate synthetic CDRs for an arbitrary number of synthetic people
Parameters for mobility modeling
• Human mobility is tightly coupled to the geography of the city
in which people live.
• => The model should take into account both the area geography and
individual user mobility patterns.
• Spatial information: important locations
• Spatiotemporal information: hourly population densities
• Temporal Information: calling patterns
Spatial: Important Locations
• A full 60% of mobility can be accounted for by just the top two
cellphone towers with which a user is associated.
• Important locations: home and work
• Probability distribution
• home locations: Home
• CommuteDistance : d
• Work locations: Work
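The three spatial distributions above can be combined to place one synthetic user. The sketch below is a minimal illustration, assuming that a work location is chosen among places lying roughly at the sampled commute distance d from home, weighted by the Work distribution. The function name `sample_home_work`, the tolerance `tol`, the fallback rule, and the toy coordinates are all hypothetical and not taken from the slides.

```python
import math
import random

def sample_home_work(home_dist, work_dist, commute_dist, rng, tol=0.5):
    """Pick a home, a commute distance d, then a work place about d from home.

    home_dist / work_dist: dict mapping (x, y) -> positive probability weight.
    commute_dist: list of (distance, weight) pairs.
    tol: hypothetical tolerance for "about d away".
    """
    homes = list(home_dist)
    home = rng.choices(homes, weights=[home_dist[h] for h in homes])[0]
    dists = [d for d, _ in commute_dist]
    d = rng.choices(dists, weights=[w for _, w in commute_dist])[0]
    # Candidate work places lie roughly on a circle of radius d around home.
    ring = [w for w in work_dist if abs(math.dist(home, w) - d) <= tol]
    if not ring:
        # Fallback (assumption): use the work place whose distance is closest to d.
        ring = [min(work_dist, key=lambda w: abs(math.dist(home, w) - d))]
    work = rng.choices(ring, weights=[work_dist[w] for w in ring])[0]
    return home, d, work

rng = random.Random(0)
home_dist = {(0, 0): 0.7, (0, 5): 0.3}
work_dist = {(3, 4): 0.5, (10, 0): 0.5}
commute_dist = [(5.0, 0.6), (10.0, 0.4)]
home, d, work = sample_home_work(home_dist, work_dist, commute_dist, rng)
```

Sampling the distance before the work place keeps the commute-length distribution under direct control, which is one plausible reason to model it separately from Work.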
Spatiotemporal:
Hourly population densities
• A heavily residential area is likely to be more populated at night
• A commercial district is likely to be more populated during the day
• Hourly population density: simply reflects the probability of people
being at a particular location at a particular time (Hourly)
• Assumption: the spatial density of telephone calls is approximately
equivalent to the spatial density of people
[Figure: population density, 7 to 8 p.m. on weekdays, over a 3-month period]
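Under the stated assumption that call density approximates population density, the Hourly distribution can be estimated by counting calls per location within each hour and normalizing. This is a minimal sketch; the input layout (a list of `(hour, location)` pairs) and the name `hourly_density` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def hourly_density(cdrs):
    """cdrs: iterable of (hour, location) pairs, one per call.

    Returns {hour: {location: probability}}, where the probabilities
    for each hour sum to 1.
    """
    counts = defaultdict(Counter)
    for hour, loc in cdrs:
        counts[hour][loc] += 1
    return {
        hour: {loc: n / sum(c.values()) for loc, n in c.items()}
        for hour, c in counts.items()
    }

hourly = hourly_density([(19, "A"), (19, "A"), (19, "B"), (3, "B")])
```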
Temporal: calling patterns
• A user’s daily call volume characteristics from distribution:
PerUserCallsPerDay ~ N(mean, sd)
• Temporal patterns of when those calls are made:
• 2 classes (clustered by the k-means method)
• How many calls each user makes during each hour of the day
• 24-dimensional vector
• Hourly call probability distribution: CallTime
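The two temporal distributions can be sampled together to decide how many calls a synthetic user makes in a day and when. The sketch below assumes that the normal draw is rounded and clamped at zero, which the slides do not specify; `sample_call_hours` and the toy `call_time` vector are hypothetical.

```python
import random

def sample_call_hours(mean, sd, call_time, rng):
    """Draw one day's call hours for a synthetic user.

    mean, sd: parameters of PerUserCallsPerDay ~ N(mean, sd).
    call_time: 24 weights, the hourly call probabilities (CallTime)
               of the class the user belongs to.
    """
    # Assumption: round the normal draw and clamp negatives to zero calls.
    n_calls = max(0, round(rng.gauss(mean, sd)))
    hours = rng.choices(range(24), weights=call_time, k=n_calls)
    return sorted(hours)

rng = random.Random(1)
# Toy class that calls only at 9am and 6pm.
call_time = [0.0] * 24
call_time[9] = 0.5
call_time[18] = 0.5
hours = sample_call_hours(mean=4, sd=1.5, call_time=call_time, rng=rng)
```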
Algorithm for model generation
• WHERE2 => Two-place model: Work and Home
• Makes use of the fact that most people spend the majority of
their time either at home or at work.
WHERE2: Work and Home
• Users’ movement: occurs based on synthetic CDRs
representing calls made at different locations at different
times
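A two-place user can be turned into synthetic CDRs by deciding, for each call time, whether the call is placed from home or from work. The sketch below assumes the choice is weighted by the Hourly densities of the two places at that hour; this weighting, the default-to-home rule, and the name `where2_trace` are assumptions of the example rather than details given on the slide.

```python
import random

def where2_trace(home, work, call_hours, hourly, rng):
    """Emit synthetic CDRs (hour, location) for one two-place user.

    hourly: {hour: {location: density}}. For each call, home and work are
    weighted by their densities at that hour (an assumption of this sketch).
    """
    trace = []
    for hour in call_hours:
        p_home = hourly.get(hour, {}).get(home, 0.0)
        p_work = hourly.get(hour, {}).get(work, 0.0)
        if p_home + p_work == 0:
            place = home  # no evidence for this hour: default to home
        else:
            place = rng.choices([home, work], weights=[p_home, p_work])[0]
        trace.append((hour, place))
    return trace

rng = random.Random(2)
hourly = {3: {"H": 1.0}, 12: {"W": 0.8, "H": 0.2}}
trace = where2_trace("H", "W", [3, 12, 12], hourly, rng)
```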
Extension to Additional Places
• WHERE3: increasing the number of important locations
• Tradeoff: model complexity <-> fidelity of the synthetic trace
Evaluation
• Earth Mover’s Distance (EMD)
• A “good” synthetic trace has the synthetic user population
distributed in a very similar way in space as the real trace, at any
point during the day.
• A measure for comparing two spatial probability distributions
• Attempt to find the minimum amount of energy required to transform one
probability distribution into another.
• This energy is given by the “amount” of probability to be moved and the
“distance” to move it.
[Figure: “distance”, the linear mapping used in EMD]
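The "amount times distance" idea is easiest to see in one dimension, where EMD has a closed form: the mass carried across each bin boundary is the running difference between the two histograms. The sketch below covers only this simplified 1-D case with equally spaced bins; comparing two spatial distributions over a metropolitan area would need a general transportation solver, which is not shown. The name `emd_1d` is hypothetical.

```python
from itertools import accumulate

def emd_1d(p, q, spacing=1.0):
    """EMD between two histograms over the same equally spaced bins on a line."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must carry the same total mass")
    # The running surplus after each bin is the mass that must cross to the next bin.
    carried = accumulate(a - b for a, b in zip(p, q))
    return spacing * sum(abs(c) for c in carried)

# All mass moves two bins to the right: 1.0 unit of mass times 2 bins.
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0
```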
Evaluation
• Comparison models
• Random Waypoint (RWP)
• Each user selects a random destination from all possible destinations in the
area to be simulated.
• Once a destination is selected, the user moves at a random velocity toward
the destination.
• When the selected destination is reached, the user waits for a random
amount of time and then selects a new destination and new velocity to begin
the procedure.
• Weighted Random Waypoint (WRWP)
• The destination is chosen from a location probability distribution
• Here, use the distribution “Hourly”
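The only difference between the two baselines is how the next destination is drawn, which the following sketch makes explicit. The speed and pause ranges are hypothetical placeholders, and `next_waypoint` is an illustrative name; in WRWP the weights would come from the Hourly distribution.

```python
import random

def next_waypoint(locations, rng, weights=None):
    """Choose the next leg of a waypoint walk.

    weights=None  -> RWP: every location is equally likely.
    weights given -> WRWP: locations are drawn from a probability distribution.
    """
    destination = rng.choices(locations, weights=weights)[0]
    speed = rng.uniform(1.0, 30.0)   # hypothetical speed range
    pause = rng.uniform(0.0, 60.0)   # hypothetical pause range at the destination
    return destination, speed, pause

rng = random.Random(3)
rwp_leg = next_waypoint(["A", "B", "C"], rng)
wrwp_leg = next_waypoint(["A", "B", "C"], rng, weights=[0.1, 0.8, 0.1])
```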
Evaluation
• Sources for input probability distributions
• Real call detail records (CDRs)
• ZIP codes within a 50-mile radius of the LA and NY centers
• Billing addresses lie within the metropolitan regions of interest
• Phone ID, starting time, duration, cell tower locations
• Data Validation (CDR can accurately represent the mobility patterns)
• The number of sampled phones in each ZIP code is proportional to the population of
that ZIP code
• The maximum pairwise distance between any 2 cell towers contacted by a phone in
one day is a close approximation for how far the phone’s owner traveled that day.
• Applying certain clustering and regression techniques produces accurate estimates of
important locations in people’s lives, in particular home and work.
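The maximum-pairwise-distance approximation of daily travel can be computed directly from the towers a phone contacted in one day. This is a minimal sketch that assumes towers are given as `(latitude, longitude)` pairs in degrees and uses the haversine great-circle formula with a mean Earth radius of 3958.8 miles; the function names are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def daily_range(towers):
    """Maximum pairwise distance among the towers contacted in one day."""
    unique = set(towers)
    return max((haversine_miles(a, b) for a, b in combinations(unique, 2)),
               default=0.0)

day = [(40.0, -74.0), (41.0, -74.0), (40.0, -74.0)]
print(round(daily_range(day), 1))
```

A phone seen at a single tower has a daily range of zero, which is why the function falls back to `0.0` when there are no pairs to compare.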
Evaluation
• Sources for input probability distributions
• Census Data
• Need to buy CDRs!
• Publicly available data regarding home and work locations, as well as
commute distances, for large populations
• Provide little or no information about the hourly probabilities of a given
location
• Make an assumption about the hourly distributions
• Combination of public data and CDRs
• home, work, and commute distances come from the census
• Other distributions are drawn from CDR data
Evaluation
• Artificial Test Cases
• To reason about the expected behavior of the model and its
strengths and weaknesses
• Two locations:
• The model must be able to place synthetic users at home and work locations
and move them according to time of day
• Since we emulate CDRs with discrete call locations, the synthetic users must
exist only at these locations
• The entire world is populated by users that move predictably between two
locations at highly regimented intervals.
• From 7am to 7pm on weekdays, all of the probability is concentrated in a
single “work” location. At all other times, the probability is clustered in a
second “home” location.
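The regimented schedule of this test case can be written as a small function that gives the expected location for any time. The sketch assumes weekdays are numbered 0 (Monday) to 4 (Friday) and treats "7am to 7pm" as the half-open interval of hours 7 through 18; both conventions and the name `ideal_location` are assumptions.

```python
def ideal_location(weekday, hour):
    """Expected location in the two-location test world.

    weekday: 0 = Monday .. 6 = Sunday; hour: 0..23.
    """
    at_work = weekday < 5 and 7 <= hour < 19
    return "work" if at_work else "home"

# Expected location hour by hour for a Monday.
monday = [ideal_location(0, h) for h in range(24)]
```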
Validation
• Two locations test case:
• Ideal result: the probability is a spike at the home location at 6am, and a spike at the work location at 9am.
• RWP clearly differs greatly, because it is given so little input information regarding spatial or temporal patterns.
• WRWP doesn’t have sufficient temporal information to distinguish how mobility probabilities should differ during the “home” and “work” phases.
• WHERE2 is able to correctly model the input distribution.
Evaluation
• Artificial Test Cases
• To reason about the expected behavior of the model and its
strengths and weaknesses
• If multiple possible works exist for a home, the model must select a realistic
one.
• Comparison model: has a single user travel to each of the four possible
locations, resulting in a significantly worse EMD.
• WHERE2 detects that users make calls only from one of two locations and
thus correctly positions users.
Validation: Large Scale Real
Data
• Modeling based on real CDRs
• WRWP improves over the original RWP by a factor of 4
• WHERE outperforms WRWP by an additional 20%
• WHERE3 improves further (in NY, by an additional 3 miles of accuracy)
Validation: Large Scale Real
Data
• Modeling based on Census Data
• Using all-public data from the US Census
• With average error of 8 miles in NY
• Using hybrid of census data with some CDR information
• An average error of 6.8 miles
• Using all-CDR information in WHERE3
• Reduces average error to some 3 miles
Example Uses
• Daily range: maximum distance a person travels in one
day
• serves as an important metric for verifying correctness of the
generated models
• The EMD metric measures the aggregate behavior of the users,
but daily range displays the results at a per-user granularity.
[Figure: daily range distributions, shown at the (2, 25, 50, 75, 98) percentiles]
• WHERE2: median is within 0.8 miles of the true value.
• WHERE3: 0.7 miles error => adding a third point to the model provides very
little benefit for this use.
• Comparing NY and LA: LA with only 1 mile error, indicating applicability across
cities with very different mobility patterns and geographic characteristics.
Example Uses
• Message Propagation: relevant in opportunistic
networking
• Social contacts, epidemiology and data carrying
• Simulator for epidemic routing
• As users meet, they exchange all messages (0.5 radius)
• Delivery percentage and message delay rate
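The exchange rule of epidemic routing can be sketched as a single simulation step: any two users closer than the contact radius end up with the union of their message buffers. The data layout, the name `epidemic_step`, and the planar distance are assumptions for illustration; the radius of 0.5 is taken from the slide, and its unit is left unspecified here as it is there.

```python
import math

def epidemic_step(positions, buffers, radius=0.5):
    """One time step: every pair of users within `radius` swaps all messages.

    positions: {user: (x, y)}; buffers: {user: set of message ids}.
    The buffers dict is updated in place and also returned.
    """
    users = sorted(positions)
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if math.dist(positions[u], positions[v]) <= radius:
                merged = buffers[u] | buffers[v]
                buffers[u] = set(merged)
                buffers[v] = set(merged)
    return buffers

positions = {"a": (0.0, 0.0), "b": (0.3, 0.0), "c": (5.0, 5.0)}
buffers = {"a": {"m1"}, "b": set(), "c": {"m2"}}
epidemic_step(positions, buffers)
```

Running this step over successive positions from a synthetic trace and recording when each message first reaches its destination would yield the delivery and delay figures the slide mentions.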
Example Uses
• Hypothetical Cities: what-if scenario regarding mobility in
NY
• Create parameterized model of cities and user behavior patterns
• City planner to experiment with the effects of modifications that
are being considered
• 10% of the people whose original work locations were in the
borough of Manhattan were instead given work locations that
matched their home locations.
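The what-if scenario above amounts to editing the work assignment of a random subset of users before generating traces. The sketch below is one way to express it, assuming users are dictionaries with `home` and `work` keys and that a caller-supplied `borough_of` function maps a location to a borough name; all of these names are hypothetical.

```python
import random

def relocate_work(users, borough_of, rng, borough="Manhattan", fraction=0.10):
    """Return a copy of `users` in which `fraction` of those working in
    `borough` now work at their home location."""
    out = [dict(u) for u in users]
    affected = [u for u in out if borough_of(u["work"]) == borough]
    for u in rng.sample(affected, round(fraction * len(affected))):
        u["work"] = u["home"]
    return out

# Toy lookup table standing in for a real location-to-borough mapping.
boroughs = {"M1": "Manhattan", "B1": "Brooklyn", "Q1": "Queens"}
users = ([{"home": "Q1", "work": "M1"} for _ in range(20)]
         + [{"home": "Q1", "work": "B1"} for _ in range(5)])
what_if = relocate_work(users, boroughs.get, random.Random(4))
```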
Conclusion
• Human mobility modeling
• Capture the motion of individuals among important places in
their lives
• Aggregating that motion to reproduce human densities over time
at the scale of a metropolitan area.
• Accounting for differences between metropolitan areas.
• Refinements
• To produce not only sequences of locations with associated times,
but also routes taken between those locations.
• CDRs and census tables are too coarse to provide route information
• Sequences of cell towers passed along the way