
Friday, 17 March 2017

Just before ending the week, I thought I would publish another great episode of our Graphistania podcast. Ever since the launch of Neo4j 3.1, I had been wanting to do an episode about the new Neo4j clustering architecture. It's innovative, new and a great piece of engineering - we just had to sing its praises :) ... So who better to invite back to the podcast than Alistair Jones, one of the lead engineers at Neo Technology behind the effort.

Here's the transcript of our conversation:

RVB: 00:02.563 Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology. And here I am again recording the second podcast of this year. I know it's only two months into the year, so I've been slacking, but I [laughter]--

AJ: 00:15.346 You've been picking up the pace again.

RVB: 00:16.082 Yeah, picking up the pace again. And for the second episode, I have invited a returning guest to our podcast, and that's my friend and colleague, Alistair Jones, from the Neo Technology engineering team. Hi, Alistair.

AJ: 00:28.380 Hi, Rik.

RVB: 00:29.089 Hey, thank you for making the time. I know you're a busy man these days. Alistair, the reason I invited you back is that I know you've been hard at work in the engineering team on some of the really big new features in Neo4j. 3.1 was released at GraphConnect San Francisco last year. Or, no, it was actually announced there and released a little bit later, but one of the biggest new features in Neo4j 3.1 was the new clustering architecture, right?

AJ: 01:03.036 Yep.

RVB: 01:03.119 And that was what you and your team were working on?

AJ: 01:05.054 Yeah, it was a really big thing for us. I've been working on this new clustering architecture for nearly two years. As you know, Neo4j is a clustered database designed to run over multiple servers, and we've had clustering in Neo for six or seven years. This is the biggest change we've ever made to it, by miles. It's a huge upgrade of all the technology around the clustering.

RVB: 01:40.760 Wow. I remember that in version 1.8 it was ZooKeeper that was doing some of the work.

AJ: 01:46.629 Yeah, we had a small change in the 1.9 release back in the day.

RVB: 01:54.312 Back in the day, yes.

AJ: 01:55.544 This is a much bigger release in 3.1.

RVB: 02:00.387 So what's it all about?

AJ: 02:01.882 So the first part of it is catching up. The world around us has moved on, and one of the great things about Neo is that we can take research from academia and actually apply it. People publish reasonably recent work in academic papers and blog about it; we read all of that, and some of those ideas we can put into the product fairly quickly. This time, for us, it was implementing the Raft protocol, which is a consensus algorithm. What that means is getting agreement between participants, computers in this case. We--

RVB: 02:52.316 Members of the cluster, right?

AJ: 02:53.396 Yeah. So, in this case, it's the different servers in the cluster reaching consensus, when the servers themselves and the communication between them are potentially unreliable. So you need to account for that unreliability in the design. Now, we know a little bit about consensus algorithms because previously, back in that 1.9 release, we implemented Paxos. At the time, that was the state-of-the-art thing to do. Raft, you could argue, is at some theoretical level the same thing, but it's much more clearly structured. Raft is--

RVB: 03:38.960 You mean a consensus protocol, right?

AJ: 03:40.209 Yeah, yeah, exactly. So it's from Diego Ongaro, who's the lead researcher in this area, and it's really impressive how it's described.

NOTE: Diego reacted to this part of the podcast with a super-cool tweet:

It's actually designed to be simple to understand and to explain. And that makes it really good to implement, because you can be very clear about what you've done and see the direction you've gone in. So we've changed from one consensus algorithm to another.

RVB: 04:13.411 Yep. Which is a big change [crosstalk].

AJ: 04:14.871 Which is a big change, but architecturally it's totally different, because previously we were using Paxos only to agree on the membership of the cluster. That's actually a very small amount of data: not that many servers, and they don't come and go that often. Now we're using Raft for every single transaction in the database. Every single node, relationship and property you create in the database goes through the Raft protocol, so you've got consensus across the cluster. What that means is that every single change is agreed to by a majority of the cluster, so no matter what happens in terms of loss of connectivity or failure of a minority of the servers, the cluster as a whole still agrees on what the state is as you move forward, so--
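To make "agreed to by a majority" concrete, here is a minimal sketch of the majority arithmetic behind a Raft-style commit. It is a simplification, not Neo4j's implementation: real Raft also requires the entry to come from the current leader's term, and the function names here are hypothetical.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of members that forms a majority of the cluster."""
    if cluster_size < 1:
        raise ValueError("cluster must have at least one member")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return cluster_size - majority(cluster_size)


def is_committed(acknowledged_by: set[str], members: set[str]) -> bool:
    """Simplified rule: an entry counts as committed once a majority of members hold it."""
    return len(acknowledged_by & members) >= majority(len(members))


for n in (3, 5, 7):
    print(n, majority(n), tolerated_failures(n))
# 3 2 1
# 5 3 2
# 7 4 3

cores = {"core1", "core2", "core3"}
print(is_committed({"core1", "core2"}, cores))  # True
print(is_committed({"core1"}, cores))           # False
```

This is why a three-server core survives one failure and a five-server core survives two: any two majorities overlap in at least one member, so a committed change cannot be lost when a minority goes away.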

RVB: 05:06.995 Sounds a bit like open heart surgery to me.

AJ: 05:09.230 Yeah, it's quite a major change, but it's actually really nice. Once you've got that super-solid foundation, you can build a whole load of things on top of it. It's the most reliable we could make it, and it stores every single transaction in a replicated log across all members of the cluster. As the membership changes, that's agreed through the protocol as well, so you always know which servers were members at any point, and which ones were allowed to [inaudible] transactions and get them committed. So the whole thing is very tightly integrated into the core of the clustering.

RVB: 05:52.520 So I would never claim that I understand everything about it, but what I've read is that it's very different architecturally in terms of-- previously we had masters and slaves, now we talk about cores and edges, right?

AJ: 06:05.636 Yeah. The second part of this is that we were aiming for much larger clusters than people had previously been running with Neo. Neo's been around for a long time, and previously people used to think of 3, 5 or 10 servers as a large database cluster. Now people want to run hundreds of servers, and we have customers and users running 200 servers in a database cluster. We want to be able to go higher than that, and the design of the consensus algorithm we were using before, or perhaps of the membership, put a sort of limit on the-- It was hard to get to that scale.

AJ: 06:55.530 And the reason is that all of the servers had to be aware of each other and what they were doing at any stage to basically make sure that they hadn't disappeared. So that led to heartbeats going from every server to every other server, and that ultimately gets very expensive when you have a large number of servers. It also gets very difficult when you're committing across the majority of the servers because you have to wait for a large number of them to come back before you can say, "Yes, this is now safely committed."
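A rough back-of-the-envelope sketch of why this gets expensive. It assumes, purely for illustration, one heartbeat message per ordered pair of servers per round in the all-to-all design, versus a single leader pinging everyone else; the real protocols send more (and different) traffic than this.

```python
def majority(n: int) -> int:
    """Acknowledgements needed before a write counts as safely committed."""
    return n // 2 + 1


def all_to_all_heartbeats(n: int) -> int:
    """Messages per round if every server pings every other server."""
    return n * (n - 1)


def leader_heartbeats(n: int) -> int:
    """Messages per round if only one leader pings the other n - 1 servers."""
    return n - 1


for n in (3, 10, 200):
    print(f"{n} servers: all-to-all={all_to_all_heartbeats(n)}, "
          f"leader-only={leader_heartbeats(n)}, acks to commit={majority(n)}")
# 3 servers: all-to-all=6, leader-only=2, acks to commit=2
# 10 servers: all-to-all=90, leader-only=9, acks to commit=6
# 200 servers: all-to-all=39800, leader-only=199, acks to commit=101
```

The last column shows the second problem: even with leader-only heartbeats, a 200-member consensus group would have to wait for 101 acknowledgements on every commit, which is the motivation for keeping the consensus group small.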

AJ: 07:30.229 So just having one huge cluster of Raft servers is not a good design for hundreds or thousands of servers. So we came up with a new architecture, and what we do now is divide the cluster into two groups. We mark some of the servers as being in what we call the core. Core servers participate in Raft, and they are about safety: they're about storing your data durably. Secondly, we have a potentially much larger group of read replicas. These are servers for running your queries on, and--

RVB: 08:14.895 Read queries, not write queries.

AJ: 08:16.298 Yeah, yeah, read queries. So you don't have to worry about safety here, and the idea is that they're disposable: you can scale them up and down, and when your web traffic is high at a certain time of day, you have more and more of them.

RVB: 08:30.725 Just have more of them, yep.

AJ: 08:30.721 [inaudible] your cloud instances when it's quieter, so you can adapt to the shape of your traffic with the read replicas. What's interesting is that we're dedicating more servers to reading. Why does that make sense? How does it help a database to have more read-only servers? Surely you need them more for writing. Well, that's because of the shape of graph data. I'd show you, because it's a nice slide [laughter], but this is audio only. On that slide, which shows how we see people do stuff with graphs, what you notice is that the write [inaudible] updates tend to be quite small.

RVB: 09:17.172 Local [crosstalk]?

AJ: 09:17.605 Yeah, very, very local. Like two or three nodes and relationships, up to maybe 100 things in a transaction. On the read side, though - the whole point of graphs is to be really fast, and people go a long way - they traverse along the graph in a read transaction, so they're touching hundreds of thousands of relationships in one transaction. Now, that's very fast, but it still takes resources: it takes memory bandwidth, and it takes CPU to run these queries. And that's what people are really hammering their graph with, thousands of these very big queries. That's an enormous computational load. We want to spread it across a lot of servers, and this is a way to do it - have loads of read replicas that can handle that traffic for you. So it really helps you in those kinds of [inaudible] applications. It's an architecture very specific to the type of system that we're building.
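The core/read-replica split can be pictured with a toy client-side router: small write transactions go to the core leader, and the heavy read traffic is spread over the read replicas. This is a hypothetical sketch with made-up server names and simple round-robin; the actual Neo4j drivers fetch routing information from the cluster and apply their own load-balancing rules.

```python
from itertools import cycle


class ClusterRouter:
    """Hypothetical router: writes go to the core leader, reads rotate over replicas."""

    def __init__(self, leader: str, read_replicas: list[str]) -> None:
        if not read_replicas:
            raise ValueError("need at least one read replica")
        self.leader = leader
        self._replicas = cycle(read_replicas)  # simple round-robin over replicas

    def route(self, access_mode: str) -> str:
        if access_mode == "WRITE":
            return self.leader  # only the core accepts writes
        if access_mode == "READ":
            return next(self._replicas)  # spread expensive traversals around
        raise ValueError(f"unknown access mode: {access_mode}")


router = ClusterRouter("core1", ["replica1", "replica2", "replica3"])
print([router.route("READ") for _ in range(4)])  # ['replica1', 'replica2', 'replica3', 'replica1']
print(router.route("WRITE"))                     # core1
```

Scaling read capacity then just means adding entries to the replica list, which matches the "disposable" replicas described above.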

RVB: 10:13.028 Pretty cool. So, as I understand it, the core servers are all about safety and about writing to the graph, and the edge servers are all about reading. Is there any downside to this? Is it good news all the way around, or are there some things that we should take care with?

AJ: 10:34.656 So there's one challenge here for people when they're deploying these types of applications: transactions are pushed out from the core to the read replicas, and there's some delay in that happening. It's very small, but there is some delay. People call this eventual consistency, and it's something that we're aware of. Lots of modern web systems get into this kind of eventual consistency situation. An example that could catch you out: say you're a user and you create an account, or you make a booking. That's a write transaction; it updates the graph. Then you refresh your page, or you try another operation that's read-only, and maybe you hit a read replica that hasn't quite seen your update yet. As a user, it appears as if the thing you just did has disappeared, like you've gone back in time. There's a bit of a--
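The "gone back in time" effect can be reproduced with a toy model of a core and a lagging read replica. Everything here is an in-memory simulation with hypothetical names; it only illustrates the replication delay, not how Neo4j ships transactions.

```python
class Server:
    """Toy server: key/value state plus the id of the last applied transaction."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}
        self.last_tx_id = 0

    def apply(self, tx_id: int, key: str, value: str) -> None:
        self.data[key] = value
        self.last_tx_id = tx_id


core = Server("core")
replica = Server("replica")
pending = []  # committed on the core, not yet shipped to the replica

# The user makes a booking: a write transaction committed on the core.
core.apply(1, "booking:42", "confirmed")
pending.append((1, "booking:42", "confirmed"))

# The page refresh is routed to the replica before replication catches up.
stale_read = replica.data.get("booking:42")
print(stale_read)  # None: the booking seems to have vanished

# Replication eventually delivers the transaction.
for tx in pending:
    replica.apply(*tx)
pending.clear()
print(replica.data.get("booking:42"))  # confirmed
```

Nothing is lost, the replica is just behind for a moment, which is exactly the window in which a read-your-own-writes anomaly can happen.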

RVB: 11:45.015 It's the [crosstalk] read-your-own-writes problem.

AJ: 11:46.385 Yeah, I can't really-- So what we did at the same time is add a whole new feature, which became the name of the whole clustering architecture. That's why we call it causal clustering: because we added a feature called causal consistency.

RVB: 12:08.446 Tell me more about that, because I don't know what that means [laughter].

AJ: 12:10.836 Okay, Rik. So, causal consistency. Again, it comes from research; there's some academic and industry work in this area, but it's not very commonly implemented. There are only a handful of other implementations out there. What it's about is representing what has causally happened in the user's application: the causes and effects of the changes that you've made.

AJ: 12:45.088 Practically, it's very easy to use. When you update the graph, or touch the graph in any way, the database can give you a bookmark. This bookmark represents the latest thing that you've changed or the latest thing that you've seen in the database. Then, when you make another request to any other server in the cluster, you can supply that bookmark, and the database will make sure that its state is at least as up to date as the bookmark represents. The bookmark is just a little string that comes back through your database driver into your application code. You can store it in your application server, you can hold on to it temporarily while you make another request, or you can send it all the way back to the client, to your web browser or mobile device, and route it back, ultimately, to the database.
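A minimal sketch of the bookmark idea, under stated assumptions: the bookmark here is a hypothetical `tx:<id>` string (real Neo4j bookmarks are opaque strings you should not parse), and the replica "waits" by pulling missing transactions synchronously, where a real server would block until replication catches up or time out. The point is the rule: serve a read only once local state is at least as new as the bookmark.

```python
from typing import Optional


def make_bookmark(tx_id: int) -> str:
    return f"tx:{tx_id}"  # hypothetical format, for illustration only


def parse_bookmark(bookmark: str) -> int:
    prefix, _, tx = bookmark.partition(":")
    if prefix != "tx" or not tx.isdigit():
        raise ValueError(f"not a bookmark: {bookmark!r}")
    return int(tx)


class Leader:
    def __init__(self) -> None:
        self.log: list[tuple[str, str]] = []  # committed writes, in commit order

    def write(self, key: str, value: str) -> str:
        self.log.append((key, value))
        return make_bookmark(len(self.log))  # transaction ids start at 1


class Replica:
    def __init__(self, leader: Leader) -> None:
        self.leader = leader
        self.data: dict[str, str] = {}
        self.applied = 0  # id of the last transaction applied locally

    def pull(self, up_to: int) -> None:
        """Apply leader log entries up to and including transaction `up_to`."""
        up_to = min(up_to, len(self.leader.log))
        for key, value in self.leader.log[self.applied:up_to]:
            self.data[key] = value
        self.applied = max(self.applied, up_to)

    def read(self, key: str, bookmark: Optional[str] = None) -> Optional[str]:
        required = parse_bookmark(bookmark) if bookmark else 0
        if self.applied < required:
            # Stand-in for "wait until replication has delivered this transaction".
            self.pull(required)
        return self.data.get(key)


leader = Leader()
replica = Replica(leader)
bookmark = leader.write("booking:42", "confirmed")
print(bookmark)                              # tx:1
print(replica.read("booking:42"))            # None: no bookmark, stale read allowed
print(replica.read("booking:42", bookmark))  # confirmed
```

Because the bookmark is a plain string, an application server can hand it to the browser and receive it back on the next request without keeping any session state of its own.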

RVB: 13:46.373 So that basically ensures that the client of the database always takes into account everything that it has [crosstalk]?

AJ: 13:54.484 Yeah, it prevents you from going back in time--

RVB: 13:56.751 Ah, yeah, that's it.

AJ: 13:56.890 --is what it does. And it supports a totally stateless architecture for everything between the user and the database. The database is storing state; why should you need to store it anywhere else? So this is [inaudible]. For your sessions, you don't need to worry about sophisticated routing. Just have stateless application servers, pass your bookmark around, and you get causal consistency. That's the idea.

RVB: 14:28.017 Wow.

AJ: 14:28.699 And we've tried to make this even easier to use by building in some of the primitives. The passing backwards and forwards and the keeping track of things are built into the database drivers. So in 3.0, we introduced--

RVB: 14:48.561 Right. And so the new version of the driver supports this bookmarking--

AJ: 14:51.515 Exactly, yeah.

RVB: 14:52.721 --and that gives us the causal consistency.

AJ: 14:54.338 The causal consistency, yeah. Exactly.

RVB: 14:56.476 So let's talk a little bit about the future. What's coming up? What are you working on now, and what keeps you up at night, and [laughter]--?

AJ: 15:03.404 Yeah. Well, [crosstalk]. It follows on logically from where we are now. The next stage is about how people actually deploy this stuff. These days, it's not just a cluster of servers running a database; it's servers across multiple data centres and multiple regions, around the country and all around the world. The cloud has made geographic distribution very easy, and we're taking account of that kind of deployment in the product. So what we're going to do is make the clustering aware of data centres and how they're organised, and allow the client to give hints about the best way to serve it. That means you can do your reads from a server that's very close to you, with low latency, and you can support fault tolerance across data centres when one of them goes away, or explicitly recover in a disaster recovery zone. All of these different operational scenarios. So--
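Data-centre-aware routing with client hints could look something like the following. This is a hypothetical sketch of the general idea only (prefer a healthy server in the client's own region, then in hinted fallback regions, then anywhere); the names, regions and policy are invented for illustration and are not the Neo4j configuration or driver API.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class ServerInfo:
    name: str
    region: str
    healthy: bool = True


def choose_read_server(servers: Sequence[ServerInfo], client_region: str,
                       fallback_regions: Iterable[str] = ()) -> ServerInfo:
    """Prefer the client's region, then hinted fallback regions, then any healthy server."""
    healthy = [s for s in servers if s.healthy]
    if not healthy:
        raise RuntimeError("no healthy servers available")
    for region in (client_region, *fallback_regions):
        for server in healthy:
            if server.region == region:
                return server
    return healthy[0]


servers = [
    ServerInfo("replica-eu-1", "eu-west"),
    ServerInfo("replica-us-1", "us-east"),
    ServerInfo("replica-ap-1", "ap-south"),
]
print(choose_read_server(servers, "eu-west").name)  # replica-eu-1 (low latency)

# The EU data centre goes away; the client's hint says to fail over to ap-south.
eu_down = [ServerInfo("replica-eu-1", "eu-west", healthy=False), servers[1], servers[2]]
print(choose_read_server(eu_down, "eu-west", fallback_regions=["ap-south"]).name)  # replica-ap-1
```

The same shape covers the scenarios mentioned above: local reads for latency, and an explicit fallback order for cross-data-centre fault tolerance.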

RVB: 16:24.512 Is that something that's coming up in the next couple of versions of Neo4j or--?

AJ: 16:27.024 Yeah, yeah. In the next couple of versions, that's the stuff that's going on. And, again, it's meant to be seamless all the way through the driver, so you write your application once for Neo4j on your laptop, and then it should move forward [inaudible].

AJ: 16:52.896 Yeah. So I always miss the visualisation. I try to devote my spare time to get back into it every now and then, so--

RVB: 17:03.147 Very cool. Well, thank you so much for spending your time, Alistair. I mean, we want to keep these podcasts fairly short, but I'm sure we'll include a bunch of links to the documentation and the blog post that we wrote about this topic. I really appreciate you making the time, and look forward to seeing what's up next.

Monday, 6 March 2017

Last month I had one of those cool encounters of the graph kind at the Belgian Beerfest that we have organised a couple of times in the last few years on the occasion of FOSDEM - the amazing open source conference that takes place in Brussels every year. This year, I got talking to a fellow countryman who has been doing some amazing work on integrating the Drupal content management system with Neo4j - something that has a lot of potential in a lot of areas, I think. So - we just HAD TO have a chat :) ...

Here's the transcript of our conversation:

RVB: 00:03.346 Hello, everyone. My name is Rik, Rik Van Bruggen from Neo Technology. And here I am again, for the third time in two days - this is wonderful, I'm on a roll here - recording another podcast for our Neo4j Graphistania podcast. And today I have a fellow Belgian on the other side of this Skype call, and that's Kristof Van Tomme from Pronovix. Hi, Kristof.

KVT: 00:27.466 Good morning Rik. How are you?

RVB: 00:29.593 I'm really well, and I hope the Skype gods bear with us, because we've had some trouble in the past couple of minutes, but I'm sure it will be fine. Hey, Kristof, we met each other at the FOSDEM conference, which was a great experience, and I loved the Beerfest afterwards [laughter]. But yeah, you told me about some really great stuff that you guys are doing with graph databases. So, first of all, let's start from the beginning: who are you, what do you do, and what's your relationship to the wonderful world of graphs?

KVT: 01:07.202 So I'm a bit of a weird duck because I'm actually a bioengineer who ended up in IT through a biotech startup that did research in schizophrenia. It's a whole other life. But I got involved in the Drupal community a little over 10 years ago when we started making websites for biotech companies.

KVT: 01:38.557 Yes, Drupal, the open source content management system. The other really good Belgian product after beer and chocolate [laughter]. I got really strongly involved in that community 10 years ago. I helped organise one of the big European conferences, and then we built a consultancy around that. Then, about five years ago, I got really excited about documentation, specifically about reusing documentation: how to deliver it, and how to reuse bits and pieces to build deliverables that can easily be reused across different channels. And that's how I got excited about graph databases, and about Neo in particular.

RVB: 02:32.949 When you say documentation, you mean technical documentation for software, right?

KVT: 02:44.417 So that's how I got involved in-- One of our colleagues, a long time ago, I think six years ago or something, started playing with graph databases, and he actually built a first connector between Drupal and Neo. He was like, "Kristof, I did this thing, and I'm really excited about graph databases, and I think it's cool. Can we do something with this?" And I was like, "I have no idea." So that was the first Neo connector for Drupal, and then it kind of died, because technically it was there, but there were no further implementations, I was not sold, and people didn't figure out how to use it. But then, because of the documentation work, I started seeing what you would use a graph database for, and that's when I got really excited.

RVB: 03:46.370 Super cool. Speaking of documentation, I don't know if you know this, but this is where Neo4j started as well, as an open source project 15 years ago, Viking hackers in a garage. They were all about content management at the time as well, because they were working for a media company that was managing digital assets. So it's funny that there's this convergence, or link, between the two worlds, right? So what is the use case all about? How does it work?

KVT: 04:20.696 So I've been thinking-- I've got this DITA, which is another of those words. It's a standard that's fairly popular in the technical writing community for writing reusable documentation; it's an XML standard. Some people scratch their heads when they hear about it, and other people are raving mad about it. In the DITA community, I've been doing talks about content management systems and open source and things like that. About two years ago, I started thinking about personalisation and embedding information. What I dream about is this: instead of having a manual, having a documentation system that knows who you are and serves you the right information when you need it. I did a talk about that at a DITA conference, I think it was the one in Europe, and I was thinking, "So how would you do that?" Then I started thinking that it probably wouldn't really work with a relational database, because you need to start collecting a whole lot of information and analysing it for patterns. And that's how I started thinking about Neo, and about graph databases more generally.

RVB: 05:48.382 So as a personalisation engine for documentation, right? You wouldn't need to search for documentation as much, but you would have a recommended set of documentation served to you semi-automatically.

KVT: 06:04.195 Yeah. So it's the idea that, for example, you're in an application, you're in a web app, and you can't find that one damn button that you know is somewhere--

RVB: 06:16.996 We've all been there.

KVT: 06:18.043 Yeah, we've all been there. So you're clicking around, going through settings and, I don't know, connections, and you keep going in circles because you can't find the damn button. At that point, the system would say, "This looks a lot like what people do when they're looking for this thing," and you would get a little pop-up saying, "Are you maybe looking for this?" Similarly, if you're using a certain feature and doing something really weird, and other people have done the same and then went to the documentation and found some other feature, you could shortcut that, skip a few jumps in that graph, and immediately serve them the information they're looking for. So it's about analysing the patterns of behaviour people show inside a web application just before they go to a documentation site, and then serving them the documentation that people normally end up finding after doing that particular thing. That's one of the really cool things I would like to do.
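A very small sketch of the idea, under loud assumptions: the "graph" is an in-memory map from a user's last few actions to the documentation pages people opened next, weighted by how often that happened. The class, action names and page names are hypothetical; a real version would store actions, users and pages as nodes and relationships in Neo4j and could use much richer pattern matching than this frequency count.

```python
from collections import Counter, defaultdict
from typing import Optional


class BehaviourGraph:
    """Hypothetical graph: edges from a recent action pattern to the docs people opened next."""

    def __init__(self, window: int = 3) -> None:
        self.window = window  # how many recent actions form a "pattern"
        self.edges: defaultdict[tuple[str, ...], Counter] = defaultdict(Counter)

    def record_session(self, actions: list[str], doc_opened: str) -> None:
        """Remember that this action pattern was followed by opening `doc_opened`."""
        pattern = tuple(actions[-self.window:])
        self.edges[pattern][doc_opened] += 1

    def suggest(self, recent_actions: list[str]) -> Optional[str]:
        """Return the page most often opened after the same pattern, if any."""
        counts = self.edges.get(tuple(recent_actions[-self.window:]))
        if not counts:
            return None
        doc, _ = counts.most_common(1)[0]
        return doc


graph = BehaviourGraph(window=2)
graph.record_session(["open_settings", "open_connections", "open_settings"], "docs/export-button")
graph.record_session(["home", "open_connections", "open_settings"], "docs/export-button")
graph.record_session(["open_connections", "open_settings"], "docs/connections")

# A new user going in circles gets a "Are you maybe looking for this?" hint.
print(graph.suggest(["profile", "open_connections", "open_settings"]))  # docs/export-button
print(graph.suggest(["home"]))                                          # None
```

Following an edge from the current pattern straight to a documentation page is the "skip a few jumps in that graph" shortcut described above.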

RVB: 07:32.834 Yeah, I understand. So why is that such a good use case for a graph database? Is that because of the pattern recognition, or what's the secret sauce?

KVT: 07:44.973 So it's the pattern recognition. I think CMSs are really good at storing similarly structured information, because most CMSs use SQL databases and those are pretty good at that: you build up a content model and then reuse it over and over again. But recognising behaviour is not something we normally do in the CMS space. We have some very basic things, like recommendations based on the content and shared keywords, but behaviour analysis is not something you normally find in a CMS. So for that, we need different technology, because in a SQL database you would have to do so many joins just to figure out what's going on that I don't think it would make sense to do it that way. Ideally, it would be a system where you don't have to program everything, but that can eventually start looking for patterns on its own. You build a graph of interactions and content, a graph that combines those two, and do things with that. So yeah.

RVB: 09:04.602 So where are you guys with this? How far along that path are you? I know you've done some prototyping already, right?

KVT: 09:11.567 Yeah. So we are very, very early. Our main business right now is developer portals. A year and a half ago we started working with Apigee, which is now part of Google; they have a developer portal that we customise for their customers. We built this whole business around documentation, specifically API documentation, so that's where our core focus is right now. The AI and personalised documentation is something we're doing research on. What we've done so far is build a connector between Drupal and Neo - I did a talk about that at FOSDEM - and that was--

RVB: 10:02.991 I went to that one, yeah.

KVT: 10:04.291 Yeah. That talk was not just about this use case. It was about what you could do if you combine a CMS and a graph database, looking at it from an added-value perspective rather than a replacement perspective. Because I know that in the DO community people are like, "Just get rid of the stupid SQL databases [laughter]. They're worthless, and graph databases can do everything so much better." I think--

RVB: 10:37.056 That's a pipe dream in my opinion.

KVT: 10:38.368 Probably. You could build a CMS on a graph database, and I think that could work. But there's so much existing technology already, with a large number of extensions and huge communities, that it makes more sense to create an add-on instead of a replacement, because if you replace it, you have to rewrite everything.

RVB: 11:05.120 I couldn't agree more.

KVT: 11:06.124 Yeah. So I think there are two aspects to this. One is the sweet spot for Neo in the CMS community, and that could be recommendation and pattern finding. But there's also the inverse: what if you were to put an open source CMS like Drupal in front of a graph database and use it as an interface to manipulate the graph, and maybe to add some structured objects into your graph? Then you could use the CMS to build reports about those objects, and the graph to find out which ones to put into your reports. So that was what my talk was about.

RVB: 11:57.922 Well, you've already touched on my last question, which is what does the future hold [laughter]? What could we do in the future? And I know that we'll be doing some meet-ups together and I'm really looking forward to those, but where does this go, Kristof? What's in your crystal ball?

KVT: 12:21.174 So I love thinking about the future. I really love Kevin Kelly's book, The Inevitable. In that book he talks about-- I think this is the basic pattern that got me thinking about this, too. He talks about flowing, and it's a very interesting concept: we're moving from an Internet where we used to have documents, to an Internet where we have pages today, to one where we'll have flows of information tomorrow. We're moving away from having an object that's structured and has a context - a manual context, a book context, or a document context, where you put all the information in the context of the rest of the book, into a very rigid structure. That's how we used to do things. That's how books and manuals were built, even before the printing press was invented. What the Internet and search engines have been doing is moving us towards pages, where you can just dive into any document, any book, and find the one page where a certain concept is explained. So you can just jump in; you don't have to read the whole book to understand something. That's where we are today. But I think the next step in this process, and it's also what Kevin Kelly talks about, is flows: a flow of information that's much more personalised. We'll be constantly dipping in and out of these information flows around us, which serve us the documentation we need at a certain time to do what we need to do, and which are aware of our context, so that we don't have to adjust to the context of the documentation, but the documentation adjusts to our own personal context. And I think-- yeah?

RVB: 14:31.872 So what I'm hearing is you see this graph database integration and everything that you guys are building as a means to that end, to get there somewhere, somehow, to get closer to it.

KVT: 14:44.560 Yeah. So we have a first customer with whom I've been discussing this concept, and they're a SaaS company. What I imagine is that we could track users, the administrators, as they interact with the software, and then serve them content this way: you look at their whole experience inside your tool, and serve them the information they need to interact better and get more value out of your system. The way I describe it is going from the context of the manual to the context of the one person, the single user, and how they are interacting with the system. There's a lot of work to get there [laughter]. But I think we can take baby steps and start with a first implementation: build a graph of behaviour, of how people interact with the documentation and with the tools it documents, and then use that to start recommending content. And yeah, I'm really excited about it. We started a mailing list about it at one of the meet-ups where I was presenting. We actually had one of the people who worked on Clippy, years and years ago at Microsoft, who was also really excited about the idea. Because I think this is actually what Clippy wanted to do, or wanted to be, but it was not possible. And I think graph databases could be the piece of technology that enables the dream of Clippy [laughter].

RVB: 16:40.452 Well, I think on that bombshell [laughter], I think that's a great time to kind of wrap up this podcast. Thank you so much for coming online, Kristof, and we'll be publishing some more details around your work and also the talks that you've been doing with the transcription of the podcast so people can read up about it. And I look forward to seeing you at one of our meet-ups, right? Because we'll be doing some community work together in the next couple of months as well. So really looking forward to that.