I collected 200,000 data records from a major accommodation website in the United States and will compare this data with external factors such as crime data, population, economic growth, tourism popularity and so on.

The rental data include, among other things, the price and the coordinates. The next step will be to analyze these coordinates. Does anybody know a good website where I can access external factors with coordinates?
Thank you for your help.

Hi Severin - I removed the name of the website, because scraping it is against its ToS. If you obtained the data in another way, then please indicate so in the question. Greetings from Zürich
– philshem♦ Sep 23 '16 at 8:44

1 Answer

If you found non-aggregate data based on latitude/longitude, you'd still need an algorithm to map each of your points to the nearest measurement coordinates, which is not trivial. You are unlikely to find US-wide "incident-level" data; most data will be aggregated to some geographical region and time frame.
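To give an idea of what "map your point to the nearest measurement coordinates" involves, here is a minimal brute-force sketch using the haversine great-circle distance. The station names and coordinates are made up for illustration, and `nearest_measurement` is a hypothetical helper, not part of any library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_measurement(point, measurements):
    """Return (key, distance_km) of the measurement closest to point.

    point is (lat, lon); measurements maps a key to (lat, lon).
    Brute force: every lookup scans all measurements.
    """
    lat, lon = point
    best = min(measurements,
               key=lambda k: haversine_km(lat, lon, *measurements[k]))
    return best, haversine_km(lat, lon, *measurements[best])


# Made-up measurement locations and one made-up rental coordinate
stations = {"A": (40.71, -74.00), "B": (34.05, -118.24), "C": (41.88, -87.63)}
rental = (40.73, -73.99)
station_id, dist_km = nearest_measurement(rental, stations)
```

Scanning every measurement for each of 200k records gets slow once there are many measurement points; a spatial index would be the usual next step, which is part of why this is not trivial.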

For that reason, I think most demographic, economic, environmental and crime data can only be mapped via aggregates like zip code, city or municipality, voting district, county and state. You can create a simple mapping table between each of your latitude/longitude combinations and the zip code, municipality, etc. it belongs to. For 200k records, you probably can't use Google Maps Reverse Geocoding (also due to its license).
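If you have boundary polygons for the regions, such a mapping table can be built offline without any API. The sketch below uses a plain ray-casting point-in-polygon test. The two square "regions" and their names are invented for illustration; real boundary polygons would have to be obtained separately, and `build_mapping_table` is a hypothetical helper.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test; polygon is a list of (lat, lon) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def build_mapping_table(coords, regions):
    """Map each unique (lat, lon) to the first region containing it, or None."""
    table = {}
    for lat, lon in set(coords):
        table[(lat, lon)] = next(
            (name for name, poly in regions.items()
             if point_in_polygon(lat, lon, poly)),
            None,
        )
    return table


# Invented square regions, for illustration only
regions = {
    "zip_A": [(0, 0), (0, 10), (10, 10), (10, 0)],
    "zip_B": [(0, 10), (0, 20), (10, 20), (10, 10)],
}
coords = [(5, 5), (5, 15), (5, 5), (50, 50)]
table = build_mapping_table(coords, regions)
```

Deduplicating the coordinates first means each distinct location is resolved only once, however many records share it.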

Whatever API service you use, make sure you understand the quotas, and that you also have a good strategy for importing all the data to a local database (or file archive). This way you can do the reverse geocoding once for all your records, over a period of days or weeks, and then not need any more queries.
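One way to structure this is a small local cache that is filled a bit at a time. The sketch below uses SQLite from the Python standard library. The `lookup` argument stands in for whatever reverse-geocoding call you end up using, and `max_per_run` and `pause_s` are assumed placeholders for your service's actual quota, so treat this as an outline rather than a finished importer.

```python
import sqlite3
import time


def open_cache(path="geocode_cache.sqlite"):
    """Open (or create) the local cache of already-resolved coordinates."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS geocode ("
        "lat REAL, lon REAL, region TEXT, PRIMARY KEY (lat, lon))"
    )
    return conn


def geocode_all(conn, coords, lookup, max_per_run=1000, pause_s=1.0):
    """Resolve coordinates missing from the cache; return the number of calls.

    Stops once max_per_run lookups were made, so the script can simply be
    re-run the next day until everything is cached.
    """
    calls = 0
    for lat, lon in coords:
        cached = conn.execute(
            "SELECT 1 FROM geocode WHERE lat = ? AND lon = ?", (lat, lon)
        ).fetchone()
        if cached:
            continue
        if calls >= max_per_run:
            break
        region = lookup(lat, lon)  # the only place the external API is hit
        conn.execute("INSERT INTO geocode VALUES (?, ?, ?)", (lat, lon, region))
        conn.commit()  # commit per row so an interrupted run loses nothing
        calls += 1
        time.sleep(pause_s)
    return calls


def fake_lookup(lat, lon):
    """Stand-in for a real reverse-geocoding call (hypothetical)."""
    return "region_%d" % int(lat)


conn = open_cache(":memory:")  # in-memory database just for this demo
coords = [(40.71, -74.00), (34.05, -118.24), (40.71, -74.00)]
first_run = geocode_all(conn, coords, fake_lookup, max_per_run=1, pause_s=0)
second_run = geocode_all(conn, coords, fake_lookup, max_per_run=10, pause_s=0)
cached_rows = conn.execute("SELECT COUNT(*) FROM geocode").fetchone()[0]
```

Here the first run stops at its quota of one call, and the second run only queries the coordinate that is still missing.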

Once you have your latitude/longitude mapped to geographical regions, there are tons of resources at data.gov that you can start joining to your original dataset. (Don't forget about seasonal differences!)
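The join itself can be as simple as a lookup keyed on region and time period. In this sketch every field name and value is invented for illustration; keying on (zip, month) rather than zip alone is one way to keep seasonal differences from being averaged away.

```python
def left_join(rentals, external):
    """Attach the external aggregate for each rental's (zip, month), if any."""
    joined = []
    for rental in rentals:
        key = (rental["zip"], rental["month"])
        # Rentals without a matching aggregate are kept, with external=None
        joined.append({**rental, "external": external.get(key)})
    return joined


# Invented rental records, already mapped to a zip code
rentals = [
    {"id": 1, "zip": "10001", "month": 7, "price": 150},
    {"id": 2, "zip": "10001", "month": 1, "price": 90},
    {"id": 3, "zip": "99999", "month": 7, "price": 60},
]

# Invented external aggregates keyed by (zip, month)
external = {
    ("10001", 7): {"visitors": 1200},
    ("10001", 1): {"visitors": 400},
}

joined = left_join(rentals, external)
```

Keeping unmatched rentals in the result makes it easy to see how much of the dataset the external source actually covers.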

Sidenote - for those using zip codes - check out this research that shows how zip codes masked the contaminated water crisis in Flint, Michigan.

Their ZIP code data included people who appeared to live in Flint and receive Flint water but actually didn't, making the data much less accurate than it appeared.