Manifesto Data Practices Explained by Patrick McGarry

Patrick McGarry, Head of Community at data.world, details the “Manifesto for Data Practice Workshop” held at this year’s Driven conference.

Krista: So, thanks, Patrick, for sitting down with me. So you’re one of the authors of the data practices manifesto. Can you tell me a little bit about that?

Patrick McGarry: Yeah, definitely. So, the data practices manifesto is actually one of the things that came out of an event that we put together last November. We decided that we wanted to get together a lot of the visionary leaders in the data science community: semantics, data visualization, data journalism, a number of different portions of the open data and data science community. And basically put them in a room and shake it to see what happened. And out of that came a number of things, the most notable of which was the data practices manifesto. We were looking at the umbrella of the data science industry and profession as a whole and how it was growing. We noticed that it was very siloed and there weren’t a lot of best practices that were being broadly adopted. And so we looked at a number of different ways that we could help effect change. And one of those was looking back at the open source world and the software development world and how the agile manifesto and agile methodologies in practice really flipped waterfall software development on its head and changed, to a broad degree, how software development was done.

Patrick McGarry: And we wanted to do something very much the same, and so the data practices manifesto is our agile manifesto, except focused on the data ecosystem. It was a very interesting event, and I think the manifesto is going to lead to a lot of other things, exercises like the workshop we have here, and it’s all community driven. All of those leaders and then the community got behind the first version and helped us reach the version where it is now, and I’m sure that it will only continue to grow.

Patrick McGarry: Yeah, so the … One of the interesting things that came out of that agile world was that a lot of consultancies and a lot of education and change were driven through starting at that point, and so we’re looking to do something very much the same with the data ecosystem. And so in that, we’re looking to introduce some things like this workshop. This workshop blends in the agile portion, but we’re also taking cues from design thinking workshops like IBM’s … What was it? The design thinking field guide. And they were looking at things like the IDEO Method Cards and a number of different things that have really worked in software development and design and other avenues. So we’re hoping to kind of meld all that together to really help people understand data teamwork and work together more effectively.

Patrick McGarry: If you look at Gartner and Forrester and some of the different studies that have been done, the chief data officer, one of the newest additions to the C-suite, is really facing some challenges around “How do I create a data-driven culture within my organization?” And the estimate is that anywhere between 50% and 60% of them are either failing or going to fail. And so we’re just trying, as a community, to come together to enable the leaders in the data industry, but also practitioners, to have better tools at their disposal.

Krista: In your first answer, you also talked about open versus closed data. Well, more about open data. So, tell me what the difference is between open and closed data and why should we care?

Patrick McGarry: Sure. I think I alluded to this a little bit before, but a lot of the things that are happening, the discussions that are occurring in the data ecosystem right now, are very much … Well, they’re very similar to what had come before in the open source world. My background is very heavily rooted in the open source world. And so it’s been interesting to watch the level of discussion. The visceral reactions now are very similar to those of 15 years ago or so in the open source world, when you said, “Hey, you should share your source code. There’s all this return on investment that can happen.” The immediate, visceral response was, “No, no. My source code is for me. Like, I don’t share that. That’s my IP.” And it took a long time for that discussion to kind of evolve to the point where people were finally realizing what they could get out of sharing their source code.

Patrick McGarry: And I think the ultimate realization of that, obviously, is over $7 billion for a single property like GitHub. You know, software development has definitely embraced the open source paradigm. At least to a large degree. And in open data there’s, I think, a very similar parallel that we can follow. You know, we can learn from what came before with open source so, thankfully, the discussions won’t take as long. But I think the impact that you can have from sharing open data can be just as great, if not greater. Sir Tim Berners-Lee, who has been preaching the semantic web, he loves to give this talk. I’ve seen it a few different ways. And he talks about, “Oh, hey, I invented this internet thing. It was pretty cool. It had some impact.” But he says that if we take the same lessons of linking documents together, which gave us the web, and apply them to linking data together, which could give us the semantic web, it could have more than 10X the impact on the world if we do that the right way.

Patrick McGarry: And so we realize that there will be times when you don’t want to have open data. Personally identifiable information, personal health information, financial data: there are some things that always will be private, just inherently because of the type of data it is. But we think that there is also a lot of data that could be open, either to the community and to the world or even just internally, ’cause we’re seeing that there are still a lot of silos within organizations, especially when it comes to data. There’s a lot of dark data that people just can’t find easily and can’t utilize because it’s in different formats. And so we’re really trying to bring the industry together to create more openness, whether that’s just openness internally within your organization, so that you can raise the bar on data literacy across the organization, or whether it’s open to the world, so that you can share and get that same open source kind of ROI.

Krista: Well, thank you for coming to Driven and for sitting down with me and thank you for doing your workshop. It was really fun.