Abstract

Computer vision methods that quantify the perception of urban environments are increasingly being used to study the relationship between a city's physical appearance and the behavior and health of its residents. Yet the throughput of current methods is too limited to quantify the perception of cities across the world. To tackle this challenge, we introduce a new crowdsourced dataset containing 110,988 images from 56 cities, and 1,170,000 pairwise comparisons provided by 81,630 online volunteers along six perceptual attributes: safe, lively, boring, wealthy, depressing, and beautiful. Using this data, we train a Siamese-like convolutional neural architecture, which learns from a joint classification and ranking loss, to predict human judgments of pairwise image comparisons. Our results show that crowdsourcing combined with neural networks can produce urban perception data at a global scale.
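To make the training signal concrete: a Siamese architecture scores both images of a pair with the same shared network, and a ranking loss rewards assigning the higher score to the image the volunteer chose. The sketch below is a minimal, framework-free illustration of such a pairwise ranking term (a RankNet-style logistic loss on the score difference); it is not the paper's exact joint classification-and-ranking formulation, and the function name and signature are illustrative.

```python
import math

def pairwise_ranking_loss(score_left, score_right, left_wins):
    """Logistic loss on the difference of two scores produced by a
    shared (Siamese) scoring network.

    A simplified stand-in for the joint classification/ranking loss
    described in the abstract: the loss is small when the image the
    volunteer picked receives the higher perceptual score.
    """
    diff = score_left - score_right
    # P(left image is judged higher) under a logistic model of the comparison.
    p_left = 1.0 / (1.0 + math.exp(-diff))
    target = 1.0 if left_wins else 0.0
    eps = 1e-12  # guard against log(0)
    return -(target * math.log(p_left + eps)
             + (1.0 - target) * math.log(1.0 - p_left + eps))
```

Because both scores come from one shared network, minimizing this loss over many crowdsourced comparisons yields a single scoring function whose outputs rank any image, including pairs never shown to volunteers.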

Keywords

Perception · Attributes · Street view · Crowdsourcing

Electronic supplementary material

The online version of this chapter (doi:10.1007/978-3-319-46448-0_12) contains supplementary material, which is available to authorized users.
