
As I build out CityGraph, I’ve run into the question of which mapping libraries and services to use and why. My purposes are focused on overlaying various types and representations of datasets on (mostly) city-level maps, and modifying those visuals according to user interaction. Here’s what I’ve learned:

Why not Google Maps?

From the start, I narrowed my decision down to Mapbox and Mapzen because they have more robust data visualization APIs and are based on OpenStreetMap. To its credit, I believe Google Maps has better and more reliable data than OpenStreetMap, but I feel it is important to run an open-data-based service on open mapping data and open source libraries. Additionally, my purposes are heavily focused on data visualization and interactivity, and Google Maps’ lackluster datavis APIs would leave me relying on something like Leaflet, which doesn’t take advantage of the excellent WebGL features that Mapbox’s and Mapzen’s libraries have.

Mapbox and Mapzen

Between Mapbox’s and Mapzen’s rendering libraries and data services/APIs, the choice comes down to your use cases. Mapbox has the superior rendering libraries: the Mapbox GL libraries work across the web, iOS, and Android. Mapzen has a WebGL renderer, but their mobile library is still in its early stages of development. Mapbox seems like the smart choice here.

With respect to data access and API usage, the situation becomes more complicated. If you’re building a commercial application with Mapbox, you have to start out with Mapbox’s Premium plan, which runs at $499/month. If you’re a business with any revenue at all, this is almost certainly worth it, and you can negotiate a higher-tier plan if you exceed the Premium plan’s rate limits. However, if you aren’t ready to start with the Mapbox Premium plan, Mapzen may be the better choice, because they allow commercial apps to use their free tier. If you don’t care about commercial mapping licensing or supporting thousands of users, then either service’s free tier APIs will almost certainly suit your needs. Mapzen’s rate limits for their free tier are incredibly generous, more so than Mapbox’s, and you can grow your application to support many users before even having to worry about upgrading. It seems their pricing plans are still under development, but I can’t imagine their prices settling any higher than those of Mapbox.

An Ideal Compromise

Ultimately, I decided to go with Mapbox’s libraries for their better cross-platform support and feature-completeness; however, for mapping data and APIs, I chose Mapzen’s services. Every aspect of Mapzen’s stack, from routing to geocoding to tile generation and serving, is open source. So in theory, if you wanted to host your own rate-unlimited Mapzen instance, you could (though it would likely be far more expensive than simply paying Mapbox or Mapzen for their services). And if either service were ever shut down, you could still run your own instances of Mapzen’s open source software and get the same functionality. Luckily, Mapbox’s libraries make it easy to use Mapzen’s services. If you have the revenue to do so and aren’t worried about a shutdown, paying for Mapbox’s APIs may be the simpler decision. However, Mapzen’s open source approach is inviting and reassuring, and its compatibility with Mapbox’s web and mobile rendering libraries gives me the best of both worlds.
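To make the pairing concrete, here is a minimal sketch of how a Mapbox GL style document can point at vector tiles served by someone other than Mapbox. The style is plain JSON, so it is built here as a Python dictionary. The tile URL, the source name "osm", the "roads" source layer, and the maxzoom value are all assumptions for illustration; substitute whatever your tile provider actually documents.

```python
import json


def build_style(tile_url_template: str, source_layer: str = "roads") -> dict:
    """Return a minimal Mapbox GL style (spec version 8) with one vector source.

    tile_url_template is a {z}/{x}/{y} URL template from your tile provider.
    source_layer is assumed to be a layer name present in those tiles.
    """
    return {
        "version": 8,
        "sources": {
            # "osm" is an arbitrary name that the layers below refer to.
            "osm": {
                "type": "vector",
                "tiles": [tile_url_template],
                "maxzoom": 16,  # assumed; check the provider's documentation
            }
        },
        "layers": [
            {
                "id": "background",
                "type": "background",
                "paint": {"background-color": "#f8f4f0"},
            },
            {
                "id": "roads",
                "type": "line",
                "source": "osm",
                "source-layer": source_layer,
                "paint": {"line-color": "#888888", "line-width": 1},
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder endpoint, not a real service.
    style = build_style("https://tiles.example.com/all/{z}/{x}/{y}.mvt?api_key=YOUR_KEY")
    print(json.dumps(style, indent=2))
```

The resulting JSON can be handed to a Mapbox GL renderer as its style, which is what keeps the rendering library and the data service independent of each other.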

James Webb Space Telescope and Astronomy

JWST goes well into the infrared
Launch Autumn/winter 2018 — lots of things that can go wrong, but these engineers are awesome.
Science proposals start November 2017.
Routine science observations start six months after launch.
Compared to next-gen observatories, JWST is an old school telescope. We can bring it into the 21st century with better tools for research.
Coordination of development tools with Astropy developers.
Watch the clean room live on the WebbCam (ha!).

Open Source Hardware in Astronomy

hardware.astronomy
Bringing the open hardware movement to astronomy
1) Develop low(er) cost astronomical instruments
2) Invest undergrads in the development (helps keep costs low).
3) Make hardware available to broader community
4) Develop an open standard for hardware in astronomy

Citizen Science with the Zooniverse: turning data into discovery (Oxford)

Crowdsourcing has been proven effective at dealing with large, messy data in many cases across different fields.
Amateur consensus agrees with experts 97% of the time (experts agree with each other 98% of the time), and the remaining 3% of cases are deemed “impossible” even by experts.
Create your own zooniverse!
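
The idea of an amateur consensus can be sketched as a simple majority vote over volunteer labels. This is only an illustration and not a description of how the Zooniverse actually aggregates classifications; the function name and the 0.8 agreement threshold are assumptions.

```python
from collections import Counter


def consensus(labels, min_agreement=0.8):
    """Return (label, agreement) for the most common volunteer label.

    If the share of volunteers choosing that label is below min_agreement,
    the label is returned as None so the item can be sent to an expert.
    """
    if not labels:
        return None, 0.0
    label, count = Counter(labels).most_common(1)[0]
    agreement = count / len(labels)
    return (label if agreement >= min_agreement else None), agreement


if __name__ == "__main__":
    votes = ["spiral", "spiral", "spiral", "spiral", "elliptical"]
    print(consensus(votes))
```

Items that fall below the threshold are the natural candidates for the “impossible” pile that even experts struggle with.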

Gaffa tape and string: Professional hardware hacking (in astronomy)

Spectra with fiber optic cables on a focal plane.
Move the cables to new locations.
Use a ring-magnet and piezoelectric movement to move “Starbugs” around — messy, inefficient.
Prototyped a vacuum solution that worked fine! This is now the final design.
Hacking/lean prototypes/live demos are effective in showing and proving results to people. Kinks can be ironed out later, but faith is won in showing something can work.

Open Science with K2

Science is woefully underfunded.
Qatar World Cup ($220 billion) vs. Kepler mission ($0.6 billion)
Open science disseminates research and data to all levels of society.
We need more than a bunch of papers on the ArXiv.
Zooniverse promotes active participation.
K2 mission shows the impact of extreme openness.
Kepler contributed immensely to science, but it was closed.
Large missions are too valuable to give exclusively to the PI team — don’t build a wall.
Proprietary data slows down science, misses opportunities for limited-lifetime missions, blocks early-career researchers, and reduces diversity by favoring rich universities.
People are afraid of getting scooped, but we can have more than one paper.
Putting work on GitHub is publishing, and getting “scooped” is actually plagiarism.
K2 is basically a huge hack — using solar photon pressure to balance an axis after K1 broke.
Open approach: no proprietary data, funding other groups to do the same science, requires large programs to keep data open.
K2 vs K1: The broken spacecraft, with a 5x smaller budget, has more authors and more publications, and more of those authors are early-career researchers because all the data is open. 2x increase, and a fairer representation of the astro community.
Call to action: question restrictive policies and proprietary periods. Question the idea of one paper for the same dataset or discovery. Don’t fear each other as competition — fear losing public support.
The next mission will have open data from Day 0 thanks to K2.

The promise of Open Data has drawn most major US cities to implement some sort of program making city data available online and easily accessible to the general public. Citizen hackers, activists, news media, researchers, and more have all made use of the data in novel ways. However, these uses have largely been more information-based than action-based, and there remains work to be done in using Open Data to drive decisions in government and policy-making at all levels, from local to federal. Below I present some of the challenges and opportunities in making use of Open Data in more meaningful ways.

Challenges

Standardization and Organization

Open Data is dirty data. There is no set standard between different cities for how data should be formatted, and even similar datasets within a city are often not interoperable. Departments at all levels of government often act independently in publishing their data, so even if most datasets are available from the same repository (e.g. Socrata), their organization and quality can differ significantly. Without a cohesive set of standards between cities, it is difficult to adapt applications built for one city to others.
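
As a small illustration of what the lack of standards costs, here is a sketch of the per-city mapping layer an application ends up maintaining just to read the same kind of record from two cities. The city names, column names, and common schema are all hypothetical.

```python
# Hypothetical per-city field mappings: source column -> common column.
FIELD_MAPS = {
    "city_a": {
        "Created Date": "opened_at",
        "Complaint Type": "category",
        "Incident Zip": "zip_code",
    },
    "city_b": {
        "open_dt": "opened_at",
        "type": "category",
        "location_zipcode": "zip_code",
    },
}


def normalize(record: dict, city: str) -> dict:
    """Rename a city's columns to the common schema; missing fields become None."""
    mapping = FIELD_MAPS[city]
    out = {common: record.get(source) for source, common in mapping.items()}
    out["city"] = city
    return out


if __name__ == "__main__":
    print(normalize({"open_dt": "2015-01-03", "type": "Pothole"}, "city_b"))
```

Every new city means another hand-written entry in that table, which is exactly the work a shared standard would remove.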

Automation

The way data is uploaded and made accessible must be improved. Datasets are often frozen and uploaded in bulk, so that when someone downloads a dataset, they download it for a particular period in time, and if they want newer data, they must either wait until it is released or find the bulk download for the newer data. This involves more human effort both in uploading the data and in downloading and processing it. Instead, new data should be made immediately accessible as a stream, with old data going back as far in time as it is available. This allows someone to access exactly as much data as they need without the hassle of combing through multiple datasets, and it removes the curator’s need to constantly compile and update newer datasets.
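
Some portals already allow something close to this. As a rough sketch, assuming a Socrata-style endpoint that accepts the SoQL parameters $where, $order, and $limit, a client can ask only for rows newer than the last one it has seen. The domain, dataset ID, and timestamp column below are placeholders, not a real dataset.

```python
from urllib.parse import urlencode


def incremental_query_url(base_url, timestamp_field, since_iso, limit=1000):
    """Build a query URL for rows whose timestamp is later than since_iso.

    base_url is the dataset's JSON endpoint; timestamp_field is the name of
    a timestamp column in that dataset (both are assumptions of this sketch).
    """
    params = {
        "$where": f"{timestamp_field} > '{since_iso}'",
        "$order": f"{timestamp_field} ASC",
        "$limit": limit,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder endpoint and column name.
    print(incremental_query_url(
        "https://data.example.gov/resource/abcd-1234.json",
        "created_date",
        "2015-01-01T00:00:00",
    ))
```

A client would store the timestamp of the newest row it received and pass it as since_iso on the next request, so nobody has to wait for or hunt down a new bulk file.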

Accessibility

Compared to the amount of data that the government stores, very little of it is digital and very little of what is digital is publicly available. The filing cabinet should not be a part of the government storage media. Making all data digital from the start makes it simpler to analyze and release. Finally, much of the data the government releases is in awkward formats such as XLSX and PDF that are not easily machine-readable. If the data is not readily available and easily accessible, it in effect does not exist.

Transparency

Most of the publicly accessible records that the government has are not readily available unless requested under the Freedom of Information Law (FOIL). The transparency argument of Open Data could be taken to a completely new level of depth and thoroughness if information at all levels of government were made digitally available as soon as it was generated. Law enforcement records, public meetings, political records, judicial records, finance records, and any other operation of government that can be publicly audited by its people should be digitally available to the public from the moment it is entered into a government system.

Private Sector Data

Companies such as Uber and Airbnb have come to collect immense amounts of data on transportation and real estate, areas that have historically been subject to government regulation. Agreements should be reached with private companies to allow governments to access as much data as is necessary to ensure proper regulation of these utilities. This data should in turn be added to the public record along with official government data on these utilities.

Opportunities

Analytic Technologies

Policy-making should be actively informed by the nature of a constituency. Data-driven decision making is much hyped, but making it a reality requires software that easily and quickly gives decision-makers the information they need. From the city to the federal level, governments should have dashboards that summarize information on all aspects of citizens’ lives. These dashboards can contain information about traffic, pollution, crime, utilities, health, finance, education, and more. Lots of this data already exists within governments, and surely there exist some dashboards that analyze and visualize these properties individually, but to combine all available data on the population of a city can give significantly more insight into a decision than any one of these datasets alone.

Predictive Technologies

Governments have data going far back into history. Cities like New York have logged every service request for years, and that data is readily available digitally. Using the right statistical analysis on periodic data like heating requests, cities can start to predict which buildings might be at risk for heating violations in the winter, and can address such issues before they happen. The same can be applied to potholes, graffiti, pollution issues, and essentially any city-wide phenomenon that occurs regularly. More precise preventative measures can be taken with more confidence, and eventually, the 311 call itself could become unnecessary.
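
A very crude starting point for the heating example is to flag buildings that generate complaints winter after winter. The sketch below is a counting heuristic, not a statistical model, and the input format, the choice of winter months, and the two-season threshold are all assumptions.

```python
from collections import defaultdict
from datetime import date

WINTER_MONTHS = {11, 12, 1, 2, 3}  # assumed heating season


def winter_season(d: date):
    """Return the starting year of the winter that d falls in, or None."""
    if d.month not in WINTER_MONTHS:
        return None
    # January to March belong to the winter that started the previous year.
    return d.year if d.month >= 11 else d.year - 1


def at_risk_buildings(complaints, min_seasons=2):
    """Rank buildings by the number of distinct winters with a complaint.

    complaints is an iterable of (building_id, date) pairs. Only buildings
    with complaints in at least min_seasons winters are returned.
    """
    seasons = defaultdict(set)
    for building_id, day in complaints:
        season = winter_season(day)
        if season is not None:
            seasons[building_id].add(season)
    ranked = sorted(seasons.items(), key=lambda item: (-len(item[1]), item[0]))
    return [b for b, s in ranked if len(s) >= min_seasons]


if __name__ == "__main__":
    sample = [
        ("A", date(2013, 12, 1)),
        ("A", date(2014, 11, 20)),
        ("B", date(2014, 1, 3)),
    ]
    print(at_risk_buildings(sample))
```

A real system would want to account for building size, weather, and reporting bias, but even a ranking like this gives inspectors somewhere to start before the first cold snap.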

Future Outlook

These ideas have the potential to radically change the way we engage with our cities and our politics. We can make decisions based unambiguously on what is happening in the world, and we can refine those decisions based on measured changes in the world over time. A population can know exactly if its citizens are getting healthier, safer, and smarter, and how to aid in these pursuits. Areas of governance that need more attention and potential approaches will become increasingly obvious as more information is combined and analyzed in meaningful ways. Decisions and their outcomes can be made with more confidence based on a more rigorous process. By making the most of Open Data, we can go beyond interesting information and begin to drive political action that directly benefits our cities, states, and nation.

A Note on Privacy

All of the ideas presented above have serious implications for the privacy of individuals and populations. These ideas have only considered the best-case uses of data in our society. Whether a government is analyzing granular data or data on a population in bulk, care must be taken to respect the privacy of its citizens. There is ongoing dialogue about how to balance data collection and privacy, and it is essential that governments and citizens take part in this dialogue as new technologies are developed and our societies become more data-driven.