Data Scientist in NYC who loves scuba diving

new project

NYC Open Data Portal

Working Through CSV

Oil-burning boilers are one of the largest sources of air pollution in NYC. This project checks how complete the oil boiler dataset from the NYC Open Data Portal is and maps it in a geographic visualization. The original data is small (~22MB, 8K rows) and, appropriately enough, dirty.

Steps

Get the data from the remote URL using the NYC Open Data API and save it to a local file, rows.csv.

Data cleaning: The imported data has problems with both cleanliness and completeness. First, drop the rows at the end of the dataframe in which every value is NaN. Then parse the ‘Owner Address’ column to extract clean latitude and longitude values.

Save the cleaned data, with latitude and longitude, to a new file, oil.csv.

Load the cleaned CSV data into a new dataset on CartoDB for geographic visualization.
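
The first three steps can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the function names (fetch_csv, clean_boilers, run) are hypothetical, the download URL is left as a parameter because it is not given here, and the regular expression assumes that the ‘Owner Address’ text carries its coordinates as a "(lat, lon)" pair in parentheses, which may not match the real export. The sample rows are made up only to show the behaviour.

```python
import urllib.request

import pandas as pd

# Assumption: coordinates appear inside the address text as "(lat, lon)".
COORD_RE = r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)"


def fetch_csv(url, dest="rows.csv"):
    """Step 1: download the CSV export from `url` and save it locally."""
    urllib.request.urlretrieve(url, dest)
    return dest


def clean_boilers(df, address_col="Owner Address"):
    """Step 2: drop all-NaN rows, then pull latitude/longitude out of the address."""
    # Keep only rows that have at least one non-missing value.
    df = df.dropna(how="all").copy()
    # The two capture groups come back as columns 0 and 1.
    coords = df[address_col].str.extract(COORD_RE)
    # Rows without a match stay NaN instead of raising an error.
    df["latitude"] = pd.to_numeric(coords[0], errors="coerce")
    df["longitude"] = pd.to_numeric(coords[1], errors="coerce")
    return df


def run(url):
    """Steps 1-3 end to end; `url` is a placeholder supplied by the caller."""
    raw = pd.read_csv(fetch_csv(url))
    clean_boilers(raw).to_csv("oil.csv", index=False)  # Step 3


# Made-up sample rows: one with coordinates, one without, one fully empty.
sample = pd.DataFrame({
    "Owner": ["A CORP", "B LLC", None],
    "Owner Address": [
        "1 MAIN ST\nNEW YORK, NY 10001\n(40.75, -73.99)",
        "2 SIDE AVE, BRONX, NY",
        None,
    ],
})
cleaned = clean_boilers(sample)
print(cleaned[["Owner", "latitude", "longitude"]])
```

Rows whose address has no recognisable coordinate pair keep NaN in both new columns, which makes it easy to count how incomplete the location data is before uploading oil.csv to CartoDB.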