Using Uber Ridership Data to Compare Cities and Neighborhoods

You're hopefully already familiar with Uber, the app-based, on-demand private driver service that launched in San Francisco in June 2010. Although we've only been around for two years, we've moved a lot of people in that time.

On the surface we look like "just a car company," but under the hood we're a technology company that strives to use our data to make our customer experience the best possible. One of the ways we do that is by figuring out where people want to go and when. The first issue we ran into was finding an easy, intuitive way to break a city into discrete places. Mathematically this isn't necessary, but for communicating the data it's very important.

As a brain guy, I have to think in terms of both space and time: what's happening in the brain and when. At Uber, the temporal patterns in our demand tell us about the neighborhoods of the cities we serve.

For example, this is what San Francisco's demand looks like, broken down by hour of week:

Now compare that to New York:

Right away you can see that something is different. The differences may not look huge, but they're significant: our ridership in New York is more heavily skewed toward weekdays, whereas San Francisco demand jumps up on weekends.
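To make "demand broken down by hour of week" concrete, here is a minimal sketch of how a curve like the ones described above could be built from raw trip timestamps. It assumes the timestamps are already in each city's local time, and it normalizes counts to shares of weekly demand so that cities of very different sizes can be compared. The post doesn't say how its curves were actually normalized, and the function names and trips below are hypothetical.

```python
from datetime import datetime

import numpy as np

HOURS_PER_WEEK = 7 * 24  # 168 bins; bin 0 is Monday 00:00-00:59


def hour_of_week(ts: datetime) -> int:
    """Map a local-time timestamp to an hour-of-week bin in 0..167."""
    return ts.weekday() * 24 + ts.hour  # weekday(): Monday=0 ... Sunday=6


def demand_profile(timestamps) -> np.ndarray:
    """Share of trips in each hour-of-week bin (sums to 1 if there are any trips)."""
    counts = np.zeros(HOURS_PER_WEEK)
    for ts in timestamps:
        counts[hour_of_week(ts)] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def weekend_share(profile: np.ndarray) -> float:
    """Fraction of weekly demand falling on Saturday or Sunday (bins 120..167)."""
    return float(profile[5 * 24:].sum())


# Hypothetical trips: two Monday-morning rides, two weekend late-night rides.
trips = [
    datetime(2012, 1, 2, 9, 0),    # Monday
    datetime(2012, 1, 2, 9, 30),   # Monday
    datetime(2012, 1, 7, 23, 15),  # Saturday
    datetime(2012, 1, 8, 1, 5),    # Sunday
]
profile = demand_profile(trips)
print(weekend_share(profile))  # 0.5
```

Comparing `weekend_share` across two cities' profiles is one simple way to put a single number on the weekday/weekend skew between New York and San Francisco.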

Of course, now that we've been in business for two years and are still growing like crazy, we can get much more granular. Instead of looking at differences between cities, we can start to look at differences between neighborhoods, 271 of them across nine major U.S. cities, to be exact:

Now that we can get more fine-grained, we can begin to observe some pretty clear neighborhood-by-neighborhood differences. Again, a nerd-picture is worth a thousand nerd-words, so have a look at two neighborhoods in San Francisco, the Mission and the Financial District:

Check out how daily demand in the Mission peaks later in the day (after work hours), whereas demand in the Financial District peaks toward the end of the workday. The big difference, of course, is that the Mission has a lot more demand on Saturdays.

Now look at how San Francisco's Financial District compares to New York's Financial District:

I love this stuff. San Francisco's Financial District is more Manhattan-like than it is San Francisco-like.

In fact, we can quantify how city-like or not city-like any given neighborhood is. That is, we can ask, "How San Francisco-like is the Mission, really?" and "How much more like New York than San Francisco is the Financial District?"

And we can do this for every neighborhood. What do we find?

Each city has "stereotypical" neighborhoods that very closely match the flow of the city as a whole, while some neighborhoods just don't seem to belong to their home city at all. They're outliers.

Of course, some neighborhoods have more demand and thus contribute more to the overall city demand, which would inflate their similarity to the city. One way to correct for this is to correlate each neighborhood's demand curve with its city's demand curve after removing that neighborhood's own rides from the city total. Once I do that, I find that the most stereotypically "like" neighborhood for each Uber city is:

San Francisco: North Beach

New York: Chelsea

Seattle: Capitol Hill

Chicago: Near North Side

Boston: Back Bay - Beacon Hill

D.C.: Dupont Circle

L.A.: Mid-City West

In contrast, the most stereotypically "unlike" neighborhood for each Uber city is:

San Francisco: Crocker Amazon

New York: Washington Heights

Seattle: South Park

Chicago: Montclare

Boston: West Roxbury

D.C.: Deanwood

L.A.: Southeast L.A.
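The leave-one-out correction used for these "like" and "unlike" rankings can be sketched as follows. This is an illustration under stated assumptions, not the analysis behind the lists above: it assumes a per-city matrix of raw hour-of-week trip counts (one row per neighborhood), uses Pearson correlation as the similarity measure, and the function name, neighborhood names, and toy curves are all hypothetical.

```python
import numpy as np


def leave_one_out_similarity(counts: np.ndarray) -> np.ndarray:
    """
    counts: (n_neighborhoods, 168) raw trip counts per hour-of-week bin,
    for the neighborhoods of a single city.

    Returns one Pearson r per neighborhood: its curve correlated with the
    city curve rebuilt from every *other* neighborhood, so a busy
    neighborhood can't inflate its own score.
    """
    city_total = counts.sum(axis=0)
    scores = np.empty(counts.shape[0])
    for i, row in enumerate(counts):
        rest = city_total - row  # city demand with neighborhood i removed
        scores[i] = np.corrcoef(row, rest)[0, 1]
    return scores


# Toy city built from two hypothetical demand shapes.
hours = np.arange(168)
hod = hours % 24                      # hour of day
weekday = hours < 120                 # Mon-Fri occupy bins 0..119
commute = (weekday & ((hod == 8) | (hod == 17))).astype(float)
late_night = ((hod == 22) | (hod == 23)).astype(float)

names = ["Hub", "Office Park", "Bar Strip"]
counts = np.vstack([6 * commute + 6 * late_night, 6 * commute, 6 * late_night])

scores = leave_one_out_similarity(counts)
most_like = names[int(np.argmax(scores))]    # "Hub"
most_unlike = names[int(np.argmin(scores))]  # "Office Park"
```

Here "Hub" mirrors the rest of the toy city exactly, so it scores highest. Note that `np.corrcoef` returns `nan` for a neighborhood with perfectly flat demand, so real data may need such rows filtered out first.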

We can also extract "types" of demand curves: are there neighborhoods that are more active on weekends and others that are clearly workweek hotspots? One simple mathematical technique for identifying stereotyped patterns in data is principal component analysis (PCA). The details aren't too important, so let's jump straight to the results: two "types" of demand curves account for 93 percent of the variance in overall demand. Here's what they look like:

Essentially, you've got one rising demand curve that peaks in the evenings and on Friday and Saturday nights (red), and one workday/workweek curve that diminishes on weekends (blue). We can then ask, for each city, which neighborhood is the most "weekend-like" and which is the most "weekday-like," that is, how strongly each neighborhood correlates with each of these two curves.
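For readers who want the mechanics, here is a minimal PCA sketch using NumPy's SVD, followed by the correlation of each neighborhood with each component. It assumes one 168-bin hour-of-week curve per neighborhood and centers each hour-of-week column before the SVD; the post doesn't specify its preprocessing, and the function names and toy data are hypothetical. The sign of each principal component is arbitrary, so a "weekend" component can come out flipped and may need its sign reversed before interpretation.

```python
import numpy as np


def demand_components(profiles: np.ndarray, k: int = 2):
    """
    PCA of neighborhood demand curves via SVD.
    profiles: (n_neighborhoods, 168), one demand curve per row.
    Returns the top-k components (k, 168) and the fraction of total
    variance each one explains.
    """
    centered = profiles - profiles.mean(axis=0)  # center each hour-of-week bin
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt[:k], explained[:k]


def component_correlations(profiles: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Pearson r of every neighborhood curve with every component: shape (n, k)."""
    r = np.empty((profiles.shape[0], components.shape[0]))
    for i, curve in enumerate(profiles):
        for j, comp in enumerate(components):
            r[i, j] = np.corrcoef(curve, comp)[0, 1]
    return r


# Toy data: each hypothetical neighborhood mixes a workday shape and an
# evening/weekend shape in different proportions.
hours = np.arange(168)
hod = hours % 24
workday = ((hours < 120) & (hod >= 8) & (hod < 18)).astype(float)
evening = ((hod >= 18) | (hours >= 120)).astype(float)
weights = np.array([[4, 1], [3, 3], [1, 4], [2, 1], [1, 2]], dtype=float)
profiles = weights @ np.vstack([workday, evening])  # (5, 168)

components, explained = demand_components(profiles, k=2)
r = component_correlations(profiles, components)
```

Picking a city's most "weekend-like" or "weekday-like" neighborhood then amounts to taking the argmax of the matching column of `r` over that city's rows, once each component's sign is fixed. Because this toy data is built from exactly two shapes, two components explain essentially all of its variance; real demand data would leave some variance unexplained.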

So if we could build the perfect "party city" consisting only of the neighborhoods from each city that correlate most with the weekend curve, this is what it would look like:

San Francisco: North Beach

New York: SoHo

Seattle: First Hill

Chicago: Near North Side

Boston: South Boston

D.C.: Dupont Circle

L.A.: Santa Monica

And now, again in contrast, the lame all-work/no-play city would be:

San Francisco: Financial District

New York: Garment District

Seattle: Overlake

Chicago: O'Hare

Boston: East Boston

D.C.: Deanwood

L.A.: Westchester

But this only looks at how neighborhoods relate to cities. What about how they relate to one another? With 271 neighborhoods, we're talking about 36,585 pairwise correlations, which is messy to display. So I've pared the data down to just the strongest relationships, which you can play around with by clicking the image below.
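The pair count comes from choosing 2 of 271 neighborhoods: 271 × 270 / 2 = 36,585. A minimal sketch of computing every pairwise correlation and keeping only the strongest is below. It assumes one demand curve per row, and the 0.9 cutoff, function name, and toy curves are hypothetical, since the post doesn't say how "strongest" was defined.

```python
import numpy as np


def strongest_pairs(profiles: np.ndarray, names, threshold: float = 0.9):
    """Return (name_a, name_b, r) for every neighborhood pair with r >= threshold."""
    r = np.corrcoef(profiles)  # (n, n) matrix of Pearson r between rows
    rows, cols = np.triu_indices(len(names), k=1)  # each unordered pair once
    keep = r[rows, cols] >= threshold
    return [(names[i], names[j], float(r[i, j])) for i, j in zip(rows[keep], cols[keep])]


n = 271
print(n * (n - 1) // 2)  # 36585 unique pairs

# Hypothetical curves: "a" and "b" share a shape, "c" is its mirror image.
toy = np.array([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]], dtype=float)
pairs = strongest_pairs(toy, ["a", "b", "c"])
```

Taking only the upper triangle of the correlation matrix skips the diagonal (every neighborhood correlates perfectly with itself) and counts each pair once.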
