Friday, 25 November 2016

Podcast Interview with Craig Taverner, Neo Technology

The interview below was long overdue - but very much worth the wait. For the past couple of years, the Neo4j community has been brewing on a really interesting add-on capability to integrate GIS-style, spatial querying capabilities into Neo4j. It's such a great and natural fit - and one of the driving forces behind this in the community has always been this global citizen called Craig Taverner. Craig has been in the ecosystem for years - first as a community member, then as a commercial customer, and now as an employee in Neo's Swedish engineering team. So about time we had a chat:

Here's the transcript of our conversation:

RVB: 00:02.785 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here we are again, recording another Neo4j Graphistania podcast session. And today I'm joined by one of my colleagues actually, in the Neo4j engineering team, Craig Taverner. Hi Craig.

CT: 00:20.685 Hi, Rik.

RVB: 00:21.379 Hey, good to have you on the podcast. Thanks for joining me.

CT: 00:23.946 Thank you.

RVB: 00:24.823 Craig, you've had an interesting journey with Neo4j, first as a community member, then as a customer of Neo4j, and now as one of our engineering team leads. So, why don't you tell us a little bit more about you and your relationship to the wonderful world of graphs?

CT: 00:43.121 Yeah. It's an interesting story. I guess I should say that my background is in Geo-science, but when I had the opportunity to move to Sweden about 17 years ago to work in Telecoms, I got involved in software for engineering telecom networks and optimising telecom networks, and entrepreneurial work starting companies. And I did that for about a decade and a half. My last company, we were building modeling software for doing GIS modeling, mapping the modeling of telecom networks and the data coming from telephones as well. And that's where I got involved in graphs, because it's a very graph-like domain. There's a lot of interconnections, relationships, relationships between phones and the network, between people, between the services, between the signals. There's a lot of complexity there that is really natural for graphs. I had an opportunity to meet Emil back in 2009, I think, and he tried to convince me that Neo4j would be the right database for our product, and I said--

RVB: 01:53.857 He does that with everyone [chuckles].

CT: 01:55.749 Yeah. And he didn't succeed, but he did [chuckles]. He put the seed in my mind. He proved to me that this was a cool idea, but I told him we already were on MySQL and it was working fine for us. And that was it. But a couple of months later, I was faced with a database refactoring problem, and I needed to enhance the data model. And I was sitting there thinking, "This would've been easier in Neo4j." I remembered what Emil had said about the way things worked. It was the whiteboard friendliness that really sold me. And I put my most junior developer on the project, and said, "Listen, could you just model this up in Neo4j?" And this guy was a guy that always did Google searches for sample code, and even back in 2009, he managed to find all of the examples that he needed from the community, and got it done really quickly. And--

RVB: 02:52.413 No way?!

CT: 02:52.457 --that taught me that the product was more mature and the community was more mature than I had anticipated. And it was the junior guy who did it. So I was sold, and within a month we ported the whole product over and that was the beginning of the story for me.

RVB: 03:07.634 Oh, wow. And is this also when you got involved in the Neo4j Spatial add-on to Neo4j?

CT: 03:15.740 Yes. Once we ported the telecom model, we had a need to visualise it in our map - we had a map user interface - and so I built a simple quadtree as a tree structure in the graph itself, and then presented that at a conference at the end of the year, when I got in contact with Tobias, one of the other engineers in Neo4j.
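
[Editor's illustration, not part of the conversation.] For readers unfamiliar with the structure Craig mentions, here is a minimal in-memory sketch of a point quadtree in Python. It is not Craig's implementation and does not use Neo4j at all; the class name `QuadTree`, its fields, and the `capacity` and `max_depth` parameters are all hypothetical choices made for this example. In a graph database, each tree node would presumably be stored as a node and each parent-child link as a relationship.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadTree:
    """Hypothetical point quadtree covering [x_min, x_max) x [y_min, y_max)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    capacity: int = 4      # points a leaf holds before it splits
    depth: int = 0
    max_depth: int = 8     # stops endless splitting on duplicate points
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadTree"]] = None

    def contains(self, p: Point) -> bool:
        return self.x_min <= p[0] < self.x_max and self.y_min <= p[1] < self.y_max

    def insert(self, p: Point) -> bool:
        """Insert a point; returns False if it lies outside this node's box."""
        if not self.contains(p):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return True
            self._split()
        # Exactly one child's half-open box contains the point.
        return any(child.insert(p) for child in self.children)

    def _split(self) -> None:
        """Turn this leaf into an inner node with four quadrant children."""
        mx = (self.x_min + self.x_max) / 2
        my = (self.y_min + self.y_max) / 2
        boxes = [
            (self.x_min, self.y_min, mx, my),
            (mx, self.y_min, self.x_max, my),
            (self.x_min, my, mx, self.y_max),
            (mx, my, self.x_max, self.y_max),
        ]
        self.children = [
            QuadTree(*b, self.capacity, self.depth + 1, self.max_depth) for b in boxes
        ]
        old_points, self.points = self.points, []
        for q in old_points:
            any(child.insert(q) for child in self.children)

    def query(self, x_min: float, y_min: float, x_max: float, y_max: float) -> List[Point]:
        """Return all stored points inside the inclusive query rectangle."""
        if (x_max < self.x_min or x_min >= self.x_max
                or y_max < self.y_min or y_min >= self.y_max):
            return []  # query rectangle does not overlap this node
        found = [p for p in self.points
                 if x_min <= p[0] <= x_max and y_min <= p[1] <= y_max]
        for child in self.children or []:
            found.extend(child.query(x_min, y_min, x_max, y_max))
        return found
```

A map view would call `query` with the visible bounding box, so only the subtrees that overlap the viewport are visited.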

CT: 03:39.846 Yeah. And he'd done a similar thing, and we started brainstorming, and this led to a collaboration between my company and Neo Technology. So for 2010 and '11, we collaborated to build a Neo4j Spatial data modeling library, which actually is a very, very rich GIS platform for doing quite complicated geographical analysis as data models within the graph. And that was really, really great. Then, of course, I focused on my own company for a few years, and we moved our markets to Asia and worked there for a while, but when the entire company got shifted to Asia I came back to Neo. So I was out of graphs for a while, and in 2014 I switched to actually being an employee and working with the engineering team in Malmö. And that's been really great, because it's been fantastic being a customer, and now to actually get into the insides and really see what's going on deep inside the product, that's fantastic as well.

RVB: 04:41.772 Super. So what have you been working on most recently within Neo4j engineering up there?

CT: 04:47.747 Well, I've been with the company for over two years, and most of the time I was working in Cypher as an engineer in Cypher or as a team lead for Cypher, but for more than the last half year, say eight months or so, I've been the team leader for security. We've been building the first fully-featured security model for Neo4j with multiple users and roles, and all the things you would expect of a security model, which we're now releasing in 3.1.

RVB: 05:16.252 Which is something that I know a lot of our customers are looking forward to, so thank you so much for taking that on. Really great. So you've mentioned a couple of things already, Craig, but what was it that really attracted you and that made you get into graphs most? Is it that whiteboard friendliness or the flexibility? What stands out for you, if you don't mind talking about that a little bit?

CT: 05:43.820 I think the strongest thing for me has been the whiteboard friendliness, and the fact that you can really understand your data model so much better when you work with a database that is so similar, in a way, to how anyone thinks about the data model. In my case, with telecom, it was completely natural, and then also with maps. And the map side I think is a particular passion of mine. I've been involved in GIS in many different ways in the past, but when it comes to graphs, the synergy is enormous, and it reminds me of the fact that so much of graph theory actually came out of mapping analysis of GIS. So it's a passion for me to actually see Neo4j get more involved in maps again. Even though we built that map system back in 2010 and '11, that's very external to the product. Now we're looking at building graph capabilities, spatial capabilities into the product itself. And that's going to be super exciting. I'm hoping we're going to get into that more and more quite soon. We've done a little bit in the last year, but if we're lucky, I think it's going to be something we can see some more of in the future.

RVB: 07:01.147 It's really cool to see how it got picked up really quickly in the APOC developments. There's a couple of really nice procedures that allow for much easier access, I think, to the spatial libraries, right?

CT: 07:15.296 What we did is two things. And I collaborated with Michael on this, of course. In APOC we did geocoding only, but outside of APOC, with Michael's support as well, we built a series of procedures on top of the old spatial library, the one that I mentioned before. So that library has been revamped and polished up a little bit now for the 3.X series of Neo4j using procedures, making it far more accessible from Cypher than it was before. So I think that's going to help the market a lot, because up until this point, using the spatial library from Cypher has been difficult and in fact buggy due to the difference in the way Cypher interacts with indexes, and the way the old library was designed to interact with indexes coded back in the Java API days. So I think this is going to help a lot. The library does have some limitations, some performance issues as well. Although I don't think this is going to make it applicable to all markets, I think it's going to open up the markets enough that we will get the feedback we need, which will help us design the built-in version of spatial for the future, which is going to be fast. It won't have any of those performance problems or any other issues.

RVB: 08:31.149 I can only confirm that it's been one of those domains where there's been a lot of customer use cases as well. If you look at one of our big customers in Europe, like TomTom, they've been doing quite a bit of work on Neo4j already, sort of like confirming that their maps are effectively graphs. There's a lot of interest in it, and I think the APOC work has already made it a lot more usable for non-programmers like myself. I can use the spatial library now, which I couldn't do before. It's pretty simple.

CT: 09:08.059 Well, that's fantastic to hear.

RVB: 09:09.356 Yeah. So Craig, where is this going [chuckles]? What does the future hold, both for the Neo4j developments that you are working on now, but also for things like spatial, maybe even for our industry? What do you think is around the bend?

CT: 09:29.325 Well, around the bend there's so many things you could say. All speculation, of course, when you look into the future. But I could say a few things about spatial in particular, because we have real buy-in I believe, from the company for a certain element of spatial: the location-based search, distance calculations, point data, which is something that almost anyone in any industry is likely to end up needing. So we see a very large demand for that. And that's something that I think we're going to see coming in very soon, with high quality and high-end performance. But I think there is an interesting thing that we should consider, and that is the whole element of graphs in spatial. If you look at industry leaders in this area, like Oracle, and PostGIS, and others, they are doing some very advanced graph analysis on relational back-ends. What they do is they pull data out of relational stores, build complex graphs in memory, do the graph analysis in memory, and then either present the results to the user or save them back into the tables.
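
[Editor's illustration, not part of the conversation.] To make "location-based search" and "distance calculations" on point data concrete, here is a small Python sketch using the standard haversine great-circle formula. It is only an assumption about what such a feature computes, not Neo4j's implementation; the function names `haversine_km` and `within_distance`, and the mean Earth radius of 6371 km, are choices made for this example.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = math.radians(b[0] - a[0])
    dlam = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def within_distance(origin: LatLon, places: Dict[str, LatLon], radius_km: float) -> List[str]:
    """Names of all places within radius_km of origin, nearest first."""
    hits = [(haversine_km(origin, pos), name) for name, pos in places.items()]
    return [name for dist, name in sorted(hits) if dist <= radius_km]
```

This version scans every candidate point; a database would normally pair the distance function with a spatial index so that only nearby points are checked.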

CT: 10:37.086 There is an opportunity there, a sweet spot for a native graph database with the index-free adjacency that Neo4j has, to be able to do that far more efficiently and scale to much, much larger sizes, without having to have the same RAM requirements and CPU requirements that the other databases have. But we're talking about something well beyond the current plans for Neo4j Spatial, but something that I think will be a market changer, a disruptive changer there, because no one else can do it the way that we'll be able to do it. So I'm still looking forward to that kind of a disruptive change in the market further down the line. I don't know if we're talking about one or two years, or five years. I'm not sure, but it's something that could be really massive. This also relates to something that's separate from spatial, but if we talk to the customers that use these advanced spatial features, they are interested in things like time-versioned graphs based on MVCC and other techniques. And I can imagine the product going in that direction anyway, not just because of my interest in spatial, other people's interest in spatial, but we see other markets interested in time-versioned graphs and other aspects like that with really complex indexes, that actually turn it into high-performance data warehouses as well. Then you start to overlap the OLTP and OLAP areas as well, which I think is also an area that the company is interested in.

RVB: 12:07.815 It's funny that you mentioned that, the time-versioning bit. I'm sure you've read some of the work that Ian Robinson did on that, and some of our customers have presented on it, but it's really, really cool and I couldn't agree more, it's one of those things that lots of people have been showing an interest in. So Craig, thank you so much for coming online. You know that we want to keep these podcasts fairly short, although we could talk about these things for at least another hour, probably more of a beer conversation. But [chuckles] thank you so much for coming online. We'll put the transcription up on the different websites, and I'm sure that if people want to reach out, they will. So thank you so much.