At a time when many of us seek ways to apply our skills in the service of the greater social good, some people are actually doing it. In late 2016, Jonathon Morgan, CEO of Austin, Texas-based startup New Knowledge, created Data for Democracy, a loosely formed coalition of data experts, analysts, engineers, project managers, scientists, and more. The organization tasks itself with the rather noble aim of solving real-world problems. Data for Democracy (or “D4D,” as the kids call it) crowd-sources its attacks on challenges like big-city traffic optimization, international refugee migration forecasting, and more. A recent contest collaboration with KDNuggets tasked entrants with devising algorithms to detect “fake news.” I spoke with Jonathon about D4D’s mission, community, and opportunity.

TOPH WHITMORE: Tell me about Data for Democracy. Who’s involved, and what kind of work are you doing?

JONATHON MORGAN: Data for Democracy is a volunteer collective. We’re about a thousand people right now, including data scientists, software engineers, and other technologists. There’s about a dozen active projects—some are community-led. For instance, there’s a group of folks who’ve been collecting election results data dating back to 1988 at the county level. That involves calling secretaries of state in different states around the country, collecting that data however they produce it, and going through the often-manual process of cleaning up that data and packaging it in a way that other people can use.

There are also projects where we partner with an existing organization. We’re working on a data model with the city of Boston so that we can ultimately produce an application that Boston citizens can engage with to experiment with how traffic fatalities can be reduced across the city. We’re also working with the Internal Displacement Monitoring Center (IDMC) on a project to understand the flow of refugees internally within a country based on conflict or a natural disaster. It’s a wide range of projects, which is important with a group this size. But almost everything is community-driven, community-led. Everybody’s a volunteer. We’ve been active for about three months.

TW: So Data for Democracy is composed of volunteers—What’s your mission or charter? What brings these volunteers together?

JM: The mission is broad. We are a community using data and technology to work on meaningful social impact projects, full stop. As for its genesis, there seemed to be a sense in the technology community, and in particular the data science community, that had been growing for some time: that the community needed to understand and discover its civic responsibilities.

Perhaps because this latest election was fairly polarizing, I think people on both sides of the aisle want to be more engaged: they want to be participating and organizing, building community, participating in the democratic process, making sure that their voice is heard in the discussion. That typically hasn’t been a role the technology community has played. It’s a moment in which people have a lot of passion and excitement and enthusiasm for this type of engagement, so we wanted to make a space where people could gather, organize, and meet others who were feeling the same sense of responsibility, and find worthwhile projects to dedicate their time and energy to.

TW: Do you serve a political aim? Or is Data for Democracy non-political?

JM: We don’t serve a political aim. There’s people in the D4D community from both sides of the political spectrum. We have volunteers who consider themselves Tea-Party Republicans collaborating with people who worked with Hillary for America. The thing that holds everybody together is a belief in the power of technology and data to have a positive impact on the way that our cities and ultimately our states and country are run. That’s a pretty powerful thing.

TW: Are your volunteers primarily Americans working on American projects? Or is it more international than that?

JM: We’re fairly international, though everybody’s operating in English. Our volunteers skew toward the U.S., Canada, the U.K., and Australia, but there are also more Europeans interested in working on projects that have more of an international focus. I mentioned the large group that’s working on understanding the flow of refugees inside of countries: It’s a fairly specific humanitarian objective, and the volunteers are partnering with the Internal Displacement Monitoring Center (IDMC), which focuses on this kind of internal migration. It’s probably 80/20, with 80% of the community in the U.S., but even on U.S.-specific projects like the one with the city of Boston, the intention is to take the model and process and adapt them to the transportation and mobility data available around the country, and ultimately around the world.

TW: What skills do the volunteers bring to these projects?

JM: A wide variety, under the larger umbrella of data science and technology. There’s folks with data-engineering backgrounds, machine-learning, statistics, software engineering, infrastructure operations. There are people that make the plumbing of all of our software and data applications work, there are communications folks who focus on the story-telling, people who focus on data visualizations, even a few folks who are more product and project managers—They tend to be good organizers for the projects and for the general community.

With a community this large, we have to think deliberately about the mechanics of the community: how you join, how to hook you up with the right projects, how to make sure you don’t feel lost. With projects like this, it’s a little bit like wandering into a big city and trying to figure out where to stay for the night. It can be a little bit daunting if nobody is there to grab your hand. It’s somewhere in-between an open-source software project and an academic research project, like those two worlds coming together.

TW: You beat me to the open-source community analogy. Walk me through the project management model. How do the projects get determined? Who leads them?

JM: The projects come from two places. First, someone in the community will have an idea—something that would be interesting to work on. We have a space in the community for those sorts of conversations, and if a handful of people are also excited about that idea, then they run off and do it.

Second, somebody from outside the community might have an idea, hear that we exist, and then approach us for collaborators and help executing on the idea. Our work with the Internal Displacement Monitoring Center is a good example: The IDMC obviously has deep expertise in understanding immigration law, but its members are not technologists and data scientists. Nevertheless, they have important data needs that we can help with.

So far, every project has started with a small core of people—one, two, or three—who have expressed passion for delivering it, and have time and energy to devote to it. I tap them on the shoulder and say “Hey it looks like you’re excited about this, how about you assume the responsibility for leading it, organizing it, setting deliverables, making sure that people understand what this project is about and how to get involved with it.”

So far, that model is working. There aren’t a lot of good working models for collaborative research, like there are for collaborative software development, for example. The people who end up being project leads are so essential to this process: They document objectives, needed skills, the sorts of people who can add value, and the specific, bite-size tasks they can engage in to give something back to the project.

TW: How many projects are you working on? What does delivery look like?

JM: Right now, there’s about a dozen active projects with multiple delivery points. In a sense, there’s no such thing as done. In the election transparency project, the first deliverable was to document county-level election results back to 1988 for all of the counties in the United States. That was a big marker. Once the volunteers produced that data set, they published it via a partner platform, Data.World. They made that data available to the public. That’s a big deliverable, but it’s just step one.

Next there’s the modeling process to understand what economic or socioeconomic factors might have caused certain counties to flip in any given election year, and what the underlying mechanics of that might be. That requires a lot of statistics. The team is close to having models that explain at least some of that phenomenon. The deliverable after that will be reports or, in our case, blog posts, where we communicate findings and implications of those findings. Along the way, we’re generating artifacts that can be used by other data scientists and software engineers. Everything we work on we publish as open-source projects.

TW: How is Data for Democracy funded?

JM: We don’t have corporate sponsors. A handful of technology providers have offered their products to the community to use for free. Data.World is a data publishing and collaboration platform; many of their staff are community members and have supported projects in addition to offering use of their platform. Eventador is a streaming data platform that’s been helpful in data acquisition and processing. Mode Analytics is an analytics and dashboard platform that we’ve been using for data exploration and visualization. And Domino Data Lab is a collaborative research platform which we’ve been utilizing as well.

TW: How can someone reading this get involved with Data for Democracy?

JM: For an individual, just let us know. We have a couple steps to get you into the community, understand where you might want to contribute, where your skills might sync up with active projects.

For an organization, it’s the same process, but we’ll talk about how Data for Democracy can be useful to the organization. The city of Boston had a very clear idea—we’re working with their data and analytics team, so they had a specific project idea that was appropriate for data science and technology. Then we can frame a project and offer it to the community to see who’s interested in working on it.

TW: Any other projects to highlight?

JM: We’re sponsoring a data visualization project with KDNuggets. The goal is to debunk a false statement using data visualizations as a story-telling tool. It’s a nice way to counter the rhetoric we heard over the course of the last election. People say we’re in this post-factual environment. As data scientists, we have a real responsibility to right that ship. It’s an interesting idea for a contest, trying to get people to think about how they can clearly communicate a fact so that it’s interpretable, makes sense, and is sticky.

TW: Data for Democracy just hit a thousand volunteers. How important is that milestone?

JM: It signals that this is an important movement for the technology community. This isn’t just a response to the election, this is something that the community needs. This sense of civic engagement and responsibility is a real thing. This is a foundational shift in the way technologists see themselves.

TW: Where do you go from here? What comes next?

JM: There’s always more work to be done. It means making sure that we’re collaborating with partners that can use this kind of help in furthering their mission. When we have the data sets that we’re creating and the models that we’re producing, we’re making sure that we communicate that to the outside world in the broader community…that we’re participating in the national discussion about the kind of discourse that we want our country to have. It means continuing to improve our community so it’s easy for people to get involved, there’s always something for them to do, and that we’re making it a place that’s welcoming and positive and accepting and full of energy, which is what it is right now.

About Toph Whitmore

Toph Whitmore is a Blue Hill Research principal analyst covering the Big Data, analytics, marketing automation, and business operations technology spaces. His research interests include technology adoption criteria, data-driven decision-making in the enterprise, customer-journey analytics, and enterprise data-integration models. Before joining Blue Hill Research, Toph spent four years providing management consulting services to Microsoft, delivering strategic project management leadership. More recently, he served as a marketing executive with cloud infrastructure and Big Data software technology firms. A former journalist, Toph's writing has appeared in GigaOM, DevOps Angle, and The Huffington Post, among other media. Toph resides in North Vancouver, British Columbia, Canada, where he is active in the local tech startup community as an angel investor and corporate advisor.