We provide a unique data set derived from trajectories of raw GPS position fixes (each consisting of a latitude, a longitude, a time stamp, and the vehicle speed and driving direction recorded at that time). The data is made available by HERE Technologies and originates from a large fleet of probe vehicles that recorded their movements in multiple culturally and socially diverse metropolitan areas around the world over the course of an entire year. The full data that we share with the scientific community is based on an unprecedented number of over 10^11 probe points, corresponding to over 300,000 frames, which at a play rate of 24 frames/s would give in excess of three hours of data-movie footage.

Specifically, the aggregation procedure involves the following steps:

Spatial tessellation of the study area: The study area is tessellated into a regular grid of cells. We select all probe points that fall within one of the cells and were recorded in the selected time interval.

Aggregation of probe points: Probe points are grouped based on their spatial and temporal attributes, i.e., the grid cell their location falls into and the 5 minute time bin within which their time stamp belongs.

Core channels: For each group, i.e., each grid cell and 5 minute time bin, we compute the volume, the mean speed, and the direction of traffic.

Generation of video frames: The encoded values are stored in a tensor of the form (t, h, w, c), where t is the number of individual 5 minute time bins, i.e., the number of frames, h and w denote the height and width of the frame in grid cells, and c stands for the number of data channels, with c = 3 when using only a single channel for each traffic state feature.
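The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used to build the data set: the function name aggregate_probes, its parameters, the degree-based cell size, the channel order (volume, mean speed, mean heading), and the use of a circular mean for direction are all hypothetical choices made for the example. Plain nested lists stand in for the (t, h, w, c) tensor to keep the sketch dependency-free.

```python
import math
from collections import defaultdict


def aggregate_probes(points, lat0, lon0, cell_deg, h, w, t0, n_bins, bin_s=300):
    """Bin probe points into a (t, h, w, c) nested list with c = 3.

    points: iterable of (lat, lon, timestamp_s, speed, heading_deg).
    lat0, lon0: south-west corner of the grid (assumed layout).
    cell_deg: cell edge length in degrees (assumed unit).
    bin_s: time-bin length in seconds (300 s = 5 minutes).
    """
    # Running sums per (time bin, row, col): count, speed, sin and cos of heading.
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0.0])
    for lat, lon, ts, speed, heading in points:
        row = math.floor((lat - lat0) / cell_deg)
        col = math.floor((lon - lon0) / cell_deg)
        tb = math.floor((ts - t0) / bin_s)
        if not (0 <= row < h and 0 <= col < w and 0 <= tb < n_bins):
            continue  # outside the study area or the selected time interval
        acc = sums[(tb, row, col)]
        theta = math.radians(heading)
        acc[0] += 1
        acc[1] += speed
        acc[2] += math.sin(theta)
        acc[3] += math.cos(theta)

    # Dense tensor; cells without any probe point stay at zero.
    frames = [[[[0.0, 0.0, 0.0] for _ in range(w)] for _ in range(h)]
              for _ in range(n_bins)]
    for (tb, row, col), (n, speed_sum, sin_sum, cos_sum) in sums.items():
        # Circular mean, so that headings of 350 and 10 degrees average to ~0.
        mean_heading = math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0
        frames[tb][row][col] = [float(n), speed_sum / n, mean_heading]
    return frames
```

With this layout, frames[k] is the k-th video frame and frames[k][i][j] holds the three channel values for grid cell (i, j).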

Hello,
I can confirm that the red channel (colour value) is proportional to what could loosely be called the volume, so let me explain what I mean by that. It is the number of “probe points” (GPS coordinates with time stamps) received from a partial collection of all our sources, capped at a minimum and a maximum level, normalized over the year, and aggregated in space-time windows of roughly 100 m x 100 m and 5 min. All these capped numbers are then mapped proportionally to the interval [1,255], after which they are rounded to the nearest integer colour value (between 1 and 255). Note that some of the underlying probes generating this data might emit “probe points” at different time intervals; the procedure above only counts the points that arrive during a given time interval.
Hope that helps. Thanks for your interest.
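To make the mapping described in the reply above concrete, here is a minimal sketch. The cap levels lo and hi are hypothetical parameters (the actual values are not stated in the post), and the sketch assumes that "mapped proportionally" means an affine map sending lo to 1 and hi to 255, with ties rounded upwards.

```python
import math


def encode_volume(count, lo, hi):
    """Map a raw probe-point count to an integer colour value in [1, 255].

    lo, hi: hypothetical minimum and maximum cap levels (not given in the post).
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Cap the raw count at the minimum and maximum levels.
    capped = min(max(count, lo), hi)
    # Assumed affine map: lo -> 1, hi -> 255.
    scaled = 1 + (capped - lo) * 254 / (hi - lo)
    # Round to the nearest integer; floor(x + 0.5) is used instead of
    # round(), which rounds exact halves to the nearest even integer.
    return int(math.floor(scaled + 0.5))
```

Under these assumptions, a count at or below lo maps to 1 and a count at or above hi maps to 255, so the value 0 is never produced by the mapping itself.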

I have several questions about the dataset. Since the time bin is 5 min and this competition expects models to predict the next 15 min, I assume we are asked to build a video prediction model that predicts the next 3 frames. I am wondering how long the input sequence is. Except for the movie footage, do we have access to other types of data, such as date, time (e.g. 4 pm), or a population map?

Hi gnosisyu, thanks for your interest in the competition. As you can see from the test data set, we provide data for the 60 min preceding each prediction point in time. It is, of course, up to you to decide how much of that information you actually feed into your model.

With regard to your second question, we only provide the traffic data. If you choose to incorporate additional data sources, you are of course more than welcome to do so.
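As a rough illustration of what this exchange implies for windowing: with 5 min bins, 60 min of history corresponds to 12 input frames, and a 15 min horizon (the figure assumed in the question) corresponds to 3 target frames. The helper below is a hypothetical sketch, not part of any competition tooling; split_window and its parameters are names introduced here for the example.

```python
BIN_MIN = 5  # length of one time bin (one frame) in minutes


def split_window(frames, t_pred, history_min=60, horizon_min=15):
    """Return (inputs, targets) around the prediction point t_pred.

    frames: sequence of frames indexed by time bin.
    t_pred: index of the first frame to be predicted.
    horizon_min: assumed prediction horizon, taken from the question above.
    """
    n_in = history_min // BIN_MIN    # 60 min -> 12 frames
    n_out = horizon_min // BIN_MIN   # 15 min -> 3 frames
    if t_pred - n_in < 0 or t_pred + n_out > len(frames):
        raise IndexError("window does not fit inside the frame sequence")
    inputs = frames[t_pred - n_in:t_pred]
    targets = frames[t_pred:t_pred + n_out]
    return inputs, targets
```

Passing a smaller history_min gives a shorter input sequence, which matches the remark that participants decide how much of the provided history to feed into their model.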