Visualizing 200K Moving Job Requests on Thumbtack in R

At Thumbtack, millions of customers each year, across the entire nation, seek the help of hundreds of thousands of service providers (“pros”) to get jobs done. Analysts and economists at Thumbtack can use the marketplace activity data generated by these interactions to help drive efficiency in the marketplace and to understand where we can do better to help serve our customers and pros.

In this post, we showcase a project completed by our analytics team, in which we visualized how Americans are moving across the nation. Specifically, we visualized a sample of approximately 200,000 long-distance (50+ mile) move requests that customers made through Thumbtack over the past few years.

Below, you can see the resulting map visualization, where each point represents either a move request origin or destination, and each line represents the move connecting those two points. The move lines are great circle arcs, which trace the shortest path between two points along the Earth's surface. Because the Earth is roughly spherical, these arcs generally appear curved rather than straight on a flat map projection. They also do not reflect the actual route taken by any individual who undertook a move.
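To make the great circle step concrete, here is a minimal base R sketch of how the intermediate points of one arc could be computed. It is not the code used for the map: the function name great_circle_arc and the sample coordinates (roughly San Francisco and New York City) are illustrative assumptions, and the Earth is treated as a perfect sphere. Packages such as geosphere offer ready-made equivalents.

```r
# Great-circle interpolation between two (lon, lat) points, in base R.
# Assumes a perfectly spherical Earth; inputs and outputs are in degrees.
# Hypothetical helper for illustration; antipodal points are not handled.
great_circle_arc <- function(lon1, lat1, lon2, lat2, n = 50) {
  to_rad <- pi / 180
  # Convert a (lon, lat) pair to a unit vector in 3D.
  to_xyz <- function(lon, lat) {
    c(cos(lat * to_rad) * cos(lon * to_rad),
      cos(lat * to_rad) * sin(lon * to_rad),
      sin(lat * to_rad))
  }
  a <- to_xyz(lon1, lat1)
  b <- to_xyz(lon2, lat2)
  # Central angle between the two points (clamped to guard against rounding).
  omega <- acos(max(-1, min(1, sum(a * b))))
  if (omega < 1e-12) {
    # Origin and destination coincide: return the same point n times.
    return(data.frame(lon = rep(lon1, n), lat = rep(lat1, n)))
  }
  frac <- seq(0, 1, length.out = n)
  # Spherical linear interpolation between the two unit vectors.
  wa <- sin((1 - frac) * omega) / sin(omega)
  wb <- sin(frac * omega) / sin(omega)
  pts <- outer(wa, a) + outer(wb, b)  # n x 3 matrix of points on the sphere
  data.frame(
    lon = atan2(pts[, 2], pts[, 1]) / to_rad,
    lat = asin(pmax(-1, pmin(1, pts[, 3]))) / to_rad
  )
}

# Approximate coordinates for San Francisco and New York City.
arc <- great_circle_arc(-122.42, 37.77, -74.01, 40.71, n = 50)
head(arc)
```

The arc starts and ends at the two input points but bows northward in between, which is why the move lines look curved on a flat longitude/latitude plot.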

As the map makes clear, Thumbtack customers live across the United States, with their move request origins and destinations closely matching the distribution of the U.S. population. There are even job requests for moves beyond the contiguous United States, to Alaska, Hawaii and Puerto Rico. Even without an underlying map, the points themselves approximately outline the shape of the U.S. landmass.

In the second version, below, we zoom in on the contiguous United States to better display the primary moving patterns visible on the map. Unsurprisingly, we see that most moves happen between major population hubs. The main moving “corridors” are between San Francisco/Los Angeles and the East Coast cities, and between Florida and the two coasts. Although harder to see on this map, the Boston-Washington DC corridor, inclusive of New York City and Philadelphia, is also popular for moves. Cities in Texas, Seattle/Portland, Minneapolis, Chicago and Atlanta are major origins or destinations for moves as well.

In future posts, we also plan to write about popular move destinations and origins, looking at which cities are most or least popular among people who are moving and which cities had the largest population losses or gains.

Making Of

In making this map, we were inspired by similar visualizations created at Airbnb and Facebook. In particular, we used an approach similar to the one behind Airbnb’s map of trips: we built the map with the ggplot2 package in R, using only line plot (“geom_line”) and scatterplot (“geom_point”) layers.
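As a rough illustration of that layering, the sketch below draws two made-up moves with geom_line and their endpoints with geom_point. The data frame, the column names (move_id, lon, lat), and the styling choices are all assumptions for this example, not the team's actual data or settings.

```r
library(ggplot2)

# Hypothetical toy data: each move is a short sequence of (lon, lat) points
# that share a move_id. Real arcs would have many more points per move.
arcs <- data.frame(
  move_id = rep(c("a", "b"), each = 3),
  lon = c(-122.42, -98.00, -74.01, -118.24, -100.00, -80.19),
  lat = c(37.77, 41.80, 40.71, 34.05, 31.50, 25.76)
)

# First and last point of each move: the origins and destinations.
endpoints <- arcs[c(1, 3, 4, 6), ]

p <- ggplot() +
  # One line per move; the group aesthetic keeps moves from being joined together.
  geom_line(data = arcs, aes(x = lon, y = lat, group = move_id), alpha = 0.3) +
  # Origins and destinations drawn as small points on top of the lines.
  geom_point(data = endpoints, aes(x = lon, y = lat), size = 0.5) +
  # Drop axes, grid, and labels so only the moves remain.
  theme_void()

# print(p) renders the plot on the active graphics device.
```

One design note: geom_line connects points in order of their x value, which works when longitude changes steadily along a move. For arcs that double back in longitude, geom_path, which connects points in row order, may be the safer choice.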

One difficulty with building the visualization in ggplot2 was rendering speed. With hundreds of thousands of points, a single plot could take an hour or more to render, especially at higher resolutions. To speed things up, we split the dataset into chunks of 10,000 move requests, made the plot background transparent, and rendered each chunk separately. Because rendering time appeared to grow much faster than linearly with the number of moves (closer to exponentially), limiting each plot to 10,000 moves kept every individual render fast. At the end, since each plot had a transparent background, we could simply overlay the individual layers and merge them into one image. This was easy to do programmatically with the magick package in R.
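The chunk-and-overlay workflow described above could look roughly like the following sketch. It is a scaled-down illustration under stated assumptions, not the original script: the moves data frame is randomly generated, the column names and the render_chunk helper are hypothetical, the chunk size is 10 instead of 10,000 so the example runs in seconds, and straight lines stand in for great circle arcs.

```r
library(ggplot2)
library(magick)

# Hypothetical toy input: one row per move with origin and destination coordinates.
set.seed(1)
n_moves <- 25
moves <- data.frame(
  move_id = seq_len(n_moves),
  lon_o = runif(n_moves, -125, -67), lat_o = runif(n_moves, 25, 49),
  lon_d = runif(n_moves, -125, -67), lat_d = runif(n_moves, 25, 49)
)

# The post used chunks of 10,000 moves; 10 keeps this sketch quick.
chunk_size <- 10
chunks <- split(moves, ceiling(seq_len(nrow(moves)) / chunk_size))

# Render one chunk to a PNG with a transparent background.
render_chunk <- function(chunk, path) {
  # Reshape to long form: two rows (origin, destination) per move.
  long <- data.frame(
    move_id = rep(chunk$move_id, 2),
    lon = c(chunk$lon_o, chunk$lon_d),
    lat = c(chunk$lat_o, chunk$lat_d)
  )
  p <- ggplot(long, aes(x = lon, y = lat)) +
    geom_line(aes(group = move_id), alpha = 0.3) +
    geom_point(size = 0.5) +
    # Identical fixed limits for every chunk so the layers line up when stacked.
    coord_cartesian(xlim = c(-125, -67), ylim = c(25, 49), expand = FALSE) +
    theme_void() +
    theme(plot.background = element_rect(fill = "transparent", colour = NA))
  ggsave(path, p, width = 4, height = 2.5, dpi = 100, bg = "transparent")
  path
}

paths <- file.path(tempdir(), sprintf("chunk_%02d.png", seq_along(chunks)))
for (i in seq_along(chunks)) render_chunk(chunks[[i]], paths[i])

# Read all chunk images as a stack of layers and merge them into one image.
layers <- image_read(paths)
merged <- image_mosaic(layers)
out <- file.path(tempdir(), "moves_merged.png")
image_write(merged, path = out, format = "png")
```

The key detail is that every chunk is drawn with the same canvas size and axis limits. Without that, ggplot2 would fit each plot to its own data range and the layers would no longer align when overlaid.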

This was just one example of a quick project the Thumbtack analytics team worked on. There are many more fascinating data problems to solve at Thumbtack and we are always hiring analysts and data scientists. Join us!