Friday, 18 May 2018

Podcast Interview with Iryna Feuerstein, Prodyna

Finally. It seems I am getting increasingly bad at publishing great podcast episodes that I actually recorded a while ago. This is one of them: I had a fantastic chat with Iryna Feuerstein from Prodyna some weeks ago. She has done some amazing talks on Neo4j and on related subjects (for example, her work on toxicogenomics). So I am very happy to get this episode out there - and hope you will enjoy!

Here's the transcript of our conversation:

RVB: 00:00:02.736 Hello, everyone. My name is Rik Van Bruggen from Neo4j, and here I am on this wonderful Tuesday evening recording another podcast with someone from Germany, and that's Iryna Feuerstein from PRODYNA. Hi, Iryna.

RVB: 00:00:36.457 Fantastic. Iryna, why don't you introduce yourself to our podcast listeners. I know you've been working for PRODYNA. That's a Neo4j partner in Germany and elsewhere in Europe actually. You've been working for them for some time and involved in graphs. Right?

IF: 00:00:54.081 Yeah. So I'm a software engineer at PRODYNA. You have already had a podcast series with Darko Križić who is our CTO. So for everyone who wants to know more what PRODYNA's doing with graphs, you can also check this episode. Yeah. But I'm working for PRODYNA since five years, and since about four years I'm also working with Neo4j and, especially the last two years, very intensively in production projects. Yeah, that's it about me.

RVB: 00:01:34.087 Wow. And what kinds of projects have you been working on? How did you get into graphs, and what's the story there?

IF: 00:01:42.952 So there are two points from which I came to the graphs. The one is my studies in mathematics, which was just completed last year, and my specialization was graph theory. But it was very interesting but also very theoretical. On the other side, I started working with Neo4j and find it really cool that I can use the theoretical knowledge also into practice. Currently I'm working on a big project using Neo4j as a knowledge graph. We are trying to implement semantic search on the basis of a big, huge amount of documents, technical documentation, research papers, and different-- yeah, really, really big amount of documents to make a better search and also to rank better the results for the users.

RVB: 00:03:25.922 Very cool. Excellent. All right. And I heard that you did a talk at the conference recently about German regulations and stuff like that. Can you tell us a little bit more about that?

IF: 00:03:40.274 Yes, that's right. So I took part at the JavaLand. It's quite a big conference in Germany organized by the Java user communities, and I was talking about the usage and data and analytics with graphs. So how can you analyze your data? How can you manage your data with graphs? And the idea was to find some use case which is not so obvious. So there are some use cases which are about paths, about relationships. But if you are thinking about regulations and about laws at the first moment, nobody is thinking about graphs. But you can still do such cool things. And also semantic search, find similar documents just by comparing the norms or keywords, the sets of keywords which they are using. And so the graph which I used for demonstration contains German laws connected between them and also some verdicts and judgments from German courts, which are available publicly. And I just extracted the mentions of the norms in those documents and connected them to the norms already imported in the graph. And then I was able to find similar documents, to find norms which are used more frequently together, and do a lot of other cool stuff. Yeah. So graphs can be used really in a lot of use cases and not only in the obvious ones for networks.
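
[Editor's sketch: the idea of finding similar documents by comparing the sets of norms they mention can be illustrated with a Jaccard similarity over those sets. This is not Iryna's actual implementation; the document names, norm names, and helper functions below are all made up for illustration.]

```python
def jaccard(a: set, b: set) -> float:
    """Norms cited by both documents divided by norms cited by either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical data: each court decision maps to the set of norms it mentions.
mentions = {
    "verdict_1": {"norm_A", "norm_B", "norm_C"},
    "verdict_2": {"norm_A", "norm_C", "norm_D"},
    "verdict_3": {"norm_E"},
}


def most_similar(doc: str, mentions: dict) -> list:
    """Rank all other documents by how much their cited norms overlap with doc's."""
    scores = [
        (other, jaccard(mentions[doc], norms))
        for other, norms in mentions.items()
        if other != doc
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("verdict_1", mentions)
```

[Here `ranking` lists `verdict_2` first, because it shares two of the four norms that the two verdicts mention between them. In a graph database the same comparison would follow the "mentions" relationships from each document to its norms.]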

RVB: 00:05:30.979 That sounds fantastic. That sounds fantastic. I mean, we've had quite a few people talking about laws and legislation, but also verdicts because they always reference one another. Right? They reference--

IF: 00:05:43.216 Yes, that's right.

RVB: 00:05:44.505 A legal document is typically a very much interlinked document that references a lot of this. It's like an academic paper almost, right, where you have these references all over. Right?

IF: 00:05:55.720 Exactly. And you can use graph algorithm like PageRank, for example, to find the more meaningful documents or the more important or somehow central norms or regulations.
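
[Editor's sketch: a minimal, plain-Python PageRank by power iteration, to show how references between documents can surface the "central" norms. The tiny reference graph, the damping factor of 0.85, and the iteration count are illustrative assumptions, not data or settings from the talk.]

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Power-iteration PageRank over a dict mapping node -> list of referenced nodes."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small "teleport" share each round.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Pass this node's rank on to the documents it references.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node that references nothing spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical reference graph: verdicts cite norms (and sometimes each other).
references = {
    "verdict_1": ["norm_A", "norm_B"],
    "verdict_2": ["norm_A"],
    "verdict_3": ["norm_A", "verdict_1"],
}

ranks = pagerank(references)
most_central = max(ranks, key=ranks.get)
```

[In this toy graph `most_central` is `norm_A`, the norm that all three verdicts reference.]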

RVB: 00:06:09.320 Sure. Yeah. So maybe we can put some links to that talk and then your presentation in the transcription of the podcast later on for our listeners. But maybe I can ask you the most important question, really. Why [laughter]? Why did you get into this, Iryna? What attracts you to graphs? What's your summary there?

IF: 00:06:35.308 Quite often at conferences or meet-ups which I'm also attending a lot, other participants are asking me, "So please tell me which is the use case where you know, so here you need the graph?" And I say, "It's easier for me to tell you which use case is not suitable for graphs because there are so many, and graphs are about optimization tasks. They're about planning, about paths and relationships, about reasoning in the graphs." So there are really a lot of use cases which can be modeled, and they can be made more easy and not so complex. So that's also the work of almost everyone, mathematician or mathematical practitioner, to solve some problem from a real life, from the real world, which is very complicated. And all they are trying is to somehow model this world in a formula, to find a solution. And this modeling is getting easier if you can use graphs.

RVB: 00:07:51.606 Fantastic. I couldn't agree more. But you raised an interesting point, Iryna. Where would you not use a graph? Have you given that more thought?

IF: 00:08:00.934 So I probably won't use a graph database for saving, I don't know, big binary data like images. So if I have really a lot of images, I just save, preserve them maybe in a free storage. I can link them in a graph database with some keywords to quickly find them again. That is actually something we are quite often doing in our projects, linking some PDF documents, some images in the graph in order to find them quickly. Yeah. But I also won't save these images in a relational database. That's also something which is not suitable for that. Actually, every use case you can solve with the relational database is also solvable by the graph database, [and even more?].

RVB: 00:08:55.553 Yeah. Absolutely. I mean, I actually had a conversation today with a user about this myself. And the way I always think about it is if you have a very, very rare case where you don't have connections, where you don't have relationships, where your entities are just on their own, then yeah, why would you use a graph? Right? If it's not connected, then that would be stupid. But that's so rare. Right [laughter]?

RVB: 00:09:28.862 It's very difficult to find it. I mean, the closest thing I guess is things like time series data, but even then a timestamp or a log entry is related to something. Right?

IF: 00:09:40.818 Yeah. Even then, they are related between them because events are happening during the time, because some other event happened beforehand. So, even then, you can use a lot of graph algorithms for your problems.

RVB: 00:09:55.859 So [damn it?], they are everywhere [laughter].

IF: 00:09:59.331 Yeah, that's right [laughter].

RVB: 00:10:01.142 Absolutely. All right. Well, Iryna, why don't we talk a little bit about the future. What does that hold for you at PRODYNA but maybe also for your studies? Where do you see this wonderful world of graphs going for our industry maybe even?

IF: 00:10:20.199 So if you take a look at the past, the graph theory arose about 300 years ago; probably founded by Leonhard Euler with his seven-bridges problem. And in the math history relations, it's quite a short term because mathematicians took more than 300 years to solve the Fermat theorem. So 300 years, it's not much [laughter]. But in the relations of our 21st Century digital revolution, it's an epoch. It's a lot of time. So the first graph algorithms were developed before the computers came into our life. But for quite a long time, graph theory was theoretical, mathematical studies and field which-- yeah. So the people who started computer science they knew, yes, you could use graphs to solve some kind of problems. But nobody in the industry was really using them. And with the database Neo4j, you brought graphs to a broad audience and made them tractable in a software practice. That's something very great. And now, I think, in the future, it would-- so both the theory and the industry will profit from each other because now you have those theoretical problems like the minimum cut problem, which is now a problem of the industry. You want to partition your graphs because you'll have a lot of data, and it then has to be made quickly. And I think that would be also a big push for the research, for the theoretical research, to find even better algorithms to make them even quicker and more robust and more simple also, more intuitive. Yeah. And for the industry, we have still a lot of theory which can be implemented and used for the everyday problems.

RVB: 00:12:56.532 Yeah. That's a great place to end our recording. I want to thank you one more time, Iryna. It was great chatting with you, and I look forward to seeing some more of your work, and of your presentations, and your studies maybe even.

IF: 00:13:18.020 Thank you.

RVB: 00:13:18.649 And then no doubt we'll meet again in this bright, graphy future of ours.