@rotnroll666 Just discovered the nice #arc42 based documentation of biking2. Thanks for providing such nice example. Got the hint from @DevBoost that #arc42 might be something helpful. The biking2 example helps a lot!

This refers to arc42 by example. There will be a second edition of this book soon, with new contributions from Gernot and new, Ralf.

My example in this book, biking2 is still up to date. I used this now several times at customers and every time, I also recommended my approach of generating the documentation as part of the build process. You see the live output here.

I’m using the Asciidoctor Maven plugin. That means they are an essential part of the build and aren’t left to die in a Confluence or whatever.

Even more, I use maven-resources-plugin to copy them into the output dir from which they are put into the final artifact.

While the documentation is strictly structured by the ideas of arc42, here come’s the fun parts:

I’m using jQAssistant to continuously verify my personal quality goals with that project: jQAssistant is a QA tool, which allows the definition and validation of project specific rules on a structural level. It is built upon the graph database Neo4j.
jQAssistant integrates also in your build process. It first analyzes your sources and dependencies and writes every information it can discover through various plugins into an embedded Neo4j instance. That is the scan step. In the following analyze step, it verifies the data agains build-in or custom rules. You write custom rules as Cypher queries. Now the interesting fact: Those concept and rules can be written in Asciidoctor as well. This one, concepts_structure.adoc, declares my concept of a config and support package. Those concepts are executed in Neo4j and add labels to certain nodes. The label is then used in this rule (structure.adoc):

“Give me all the packages of the main artifact but the config and the summary package that depend on other packages that are not contained in the package itself.” If that query returns an entry, the rule is violated. I use that rule to enforce horizontal, domain specific slices.

Now, the very same, executable rule becomes part of my documentation (see building block view) by just including them. Isn’t that great?

I also include specific information from my class files in the docs. Asciidoctor allows including dedicated part of all text files. For example, I use package-info.java files like this directly in the docs: “bikes”. I did this to a much bigger extend in this repo, find the linked articles there. I love Asciidoctor.

Last but not least, I use Spring REST docs in my unit tests. Spring REST docs combines hand-written documentation written with Asciidoctor and auto-generated snippets produced with Spring MVC Test. A simple unit test like this gets instrumented by a call to document like show in this block. This describes expectations about parameters and returned values. If they aren’t there in the request or response, the test fails. So one cannot change the api without adapting documentation. You might have guess: The generated Asciidoctor snippet as included again in the final docs, you find it here.

I started working on the documentation in early 2016, after a great workshop with Peter Hruschka and Gernot Starke. This is now 2 years ago and the tooling just got better and better.

Whether you are writing a monolithic application like my example, micro services or desktop applications. There’s no reason not to document your application.

Here are some more interesting projects, that have similar goals:

Oliver Drotbohms Moduliths. Oliver strives to create great monoliths by enforcing structure in code through rules and integrates with other tools like jQAssistant as well.

There is docToolchain, that describes all the tooling needed for creating living docs.

Also, as an added bonus, there’s DocGist, that creates rendered Asciidoctor files for you to share. Thanks Michael for the tip.

]]>https://info.michael-simons.eu/2018/12/05/documentation-as-code-code-as-documentation/feed/5Passion.https://info.michael-simons.eu/2018/11/30/passion/
https://info.michael-simons.eu/2018/11/30/passion/#respondFri, 30 Nov 2018 22:57:46 +0000https://info.michael-simons.eu/?p=2919In “my” industry people speaking a lot of doing things “with passion or not at all.” I’m wonder, are we aware of what passion meant?

The word itself means both in Greek and Latin “to suffer”, which fits very well with the some of the crazy work ethics out there.

I for myself try to stay with the “good feelings” that arose with from the Stoic and might add to them: Engagement instead of passion. I’ll always do my best, but I will not suffer. Or at least, I try not to.

Just a late night thought.

]]>https://info.michael-simons.eu/2018/11/30/passion/feed/0Modeling a domain with Spring Data Neo4j and OGMhttps://info.michael-simons.eu/2018/11/02/modeling-a-domain-with-spring-data-neo4j-and-ogm/
https://info.michael-simons.eu/2018/11/02/modeling-a-domain-with-spring-data-neo4j-and-ogm/#commentsFri, 02 Nov 2018 12:11:03 +0000https://info.michael-simons.eu/?p=2895This is the forth post in this series and I want to keep it short and simple.

A domain can be modeled in many ways and so can databases. As long as I deal with them, I always preferred the approach: Database (model) first. Usually, data is much longer around than applications and I don’t want my first application instance or version define the model for all eternity.

Using an Object-Graph-Mapper or Object-Relational-Mapper can be slightly danger. One tends to write down some class hierarchy and just let the tool do its magic. In the end, there are schemes that sometimes are very hard to read for humans. The danger might be a bit smaller with an OGM as hierarchies and connections map quite nicely onto a graph, but still, I don’t want that to be the default.

My domain can be summarized with a few sentences:

There are Artists, that might be highlighted as Bands or Solo Artists

Bands have Member

Bands are founded in and solo artists are born incountries

Sometimes Artists are associated with other Artists

Albums are released by Artists in a year which is part of a decade

Albums contain multiple tracks that have been played several times in a month of a given year

I spare you the logical ER-diagram for a relational database here and jump straight to the nodes. I highlighted all the important “classes” and their relations.

Modelling data with Neo4j feels a lot like modeling on a whiteboard. And actually, it really is: The whiteboard model ends up being the physical model in the end with Neo4j.

Neo4j is a property graph database. It stores Nodes with one or more Labels, their Relationships with a type among each other and properties for both nodes and relationships.

A label starts with a colon and is usually written with a initial upper case letter, i.e. :Artist and :Album, the type of a relationship is written with a colon and than all uppercase, :RELEASED_BY and properties in camel case, without a colon, i.e. name and firstName. The above list translates in my application to a model like this:

I really find it fascinating how that model reads: Pretty much the same as my verbal description.

How to model this with Neo4-OGM and Spring Data Neo4j? You might want to recap the previous post to get an idea of the moving parts.

Value objects that happen to be persisted

In my domain the most simple objects are probably the year and the decade of the year. Those objects are value objects, they don’t have an identity. A year with a given value is as good as another instance with the same value.

I did model both of them to use them further on in aggregates, for example in the relationship RELEASED_IN, but I don’t see providing dedicated repositories for them. They only have a meaning in connection with other nodes. Things to notice here are: I followed a naming convention: All classes that are mapped to something inside the graph ends in “Entity”. Thus I have to use label attribute of @NodeEntity (or the default attribute) to specify a “nice” label, i.e. @NodeEntity(label = "Year"). I use @Index for completeness. One can configure Neo4j-OGM to automatically create indexes, but TBH, I prefer to create them by hand. There’s also one outgoing relationship from year to decade. A year is part of a decade: @Relationship("PART_OF").

You also notice that I didn’t model any of the other outgoing relationships from the year: Like all the albums released in this year, all the months with play counts in that year or the foundation years of band.

While all of Neo4j, Neo4j-OGM and Spring Data Neo4j could map relationships many levels deep, I don’t think it’s wise from an application performance point of view. I’d rather explicitly select the stuff I need.

A common base class for aggregates

I’m known for my dislike of having common base entities:

I like it a lot. Still having a hard time to understand why introducing "Base entities". I often see this, mainly for an ID column and some auditing. Heck, I even added one to a Spring Data Neo4j example myself to see if our code works as advertised.

For this project however, I included one for several reasons: To not pester every other entity with the technical id, to audit interesting entities and plain simple, to have an example that actually uses our (Spring Data Neo4j) support of Spring Data’s auditing in inheritance scenarios. This is what the class looks like:

While it’s often preferable to extend such a repository interface not from the concrete store, I do think it’s better having the concrete store at hand in the case of Neo4j. While the concrete implementation brings a lot of CRUD method one doesn’t need all the time, it also brings in overloaded versions of them that take the depth into account as well.

To mitigate the large surface of repository methods, it’s often a good idea to reduce the repositories visibility to a minimum.

I don’t see a problem using such an entity and repository directly from a controller, for example like this:

One use case is definitely “give me all the albums having a specific main genre.” To implement such a case, I rather access that sub collection from the owning site of the relationship. Here, the albums:

Complex aggregates

The Artist is a complex thing. It exists in three different forms: An unspecified artist, solo artist and as bands.

While Neo4j-OGM allows you to add a list of labels to your domain and thus allowing one entity to be mapped to several labels, I don’t like that approach.

Bands and solo artists have quite different attributes, as you can see in the sources linked above and I don’t want them to mix up.

By declaring the Artist class with @NodeEntity("Artist") and the band, which extends from it, with @NodeEntity("Band") and solo artist accordingly, bands and solo artists are stored with this two labels. Polymorphic queries works to some extend with a repository for the base entity, but as Neo4j-OGM applies schema based loading, stuff can be missing from the result.

As you see: No setters for the member and a getter that returns an unmodifiable list. Thus adding (and removing) members goes only through the band. The modified band is then returned. As Neo4j-OGM and Spring Data Neo4j don’t do dirty tracking and don’t save things automatically at the end of a transaction, we have to take care here. Again, I recommend a service layer:

To close this up, one final example: When entities are being deleted through Neo4j-OGM, it deletes only relationships, not the target nodes of the relationships. You have to decide wether you want “dangling” nodes in your database or not. Sometimes this is ok, sometimes not. As of today, Neo4j itself has no foreign key constraint on the relationship. And how so? It’s complete ok that a node exists for its own.

In my domain here however, albums without an artist and tracks without albums serve no purpose. To delete them when I delete an artist, I do this again through a service. The session in the following snippet is the autowired OGM session. It’s completely ok to access it. Spring Data Neo4j takes care that it participates in ongoing transactions:

Yes, there is Cypher hidden away in a class. Sometimes there are compromises to be taken, and this one is a comprise that’s ok for me. There’s also JCypher, maybe that would be something to try out in the future.

With all the things here in this post, it’s easy to write a nice application, that deals not only with CRUD, but already presents all the interesting associations:

The complete application is available on GitHub as “bootiful music”. It has some rough edges, also ops wise, but the repository along with the posts of this series should help to get you started.

I’d like to thank Michael a lot for the idea of this query, which results in nice micro genres or categories:

]]>https://info.michael-simons.eu/2018/11/02/modeling-a-domain-with-spring-data-neo4j-and-ogm/feed/1No silver bullets here: Accessing data stored in Neo4j on the JVMhttps://info.michael-simons.eu/2018/10/29/accessing-data-stored-in-neo4j-on-the-jvm/
https://info.michael-simons.eu/2018/10/29/accessing-data-stored-in-neo4j-on-the-jvm/#commentsMon, 29 Oct 2018 12:16:57 +0000https://info.michael-simons.eu/?p=2865In the previous post I presented various ways how to get data into Neo4j. Now that you have a lot of connected data and it’s attributes, how to access, manipulate, add to them and delete them?

I’m working with and in the Spring ecosystem quite a while now and for me the straight answer is – without much surprises – just use the Spring Data Neo4j module if you work inside the Spring ecosystem. But to surprise of some, there’s more than just Spring there outside.

In this blog post I walk you through

Using the Neoj4 Java-Driver directly

Creating an application based on Micronaut, which went 1.0 GA these days, the Neo4j Java-Driver and Neo4-OGM

A full blown Spring Boot application using Spring Data Neo4j

Before we jump right into some of the options you as an application developer have to access data inside Neo4j, we have to get a clear idea of some of the building blocks and moving parts involved. Let’s get started with those.

Building blocks and moving parts

Neo4j Java-Driver

The most important building block for access Neo4j on the JVM is possibly the Neo4j Java Driver. The Java driver is open source and is available on Github under the Apache License. This driver uses the binary “Bolt” protocol.

You can think of that driver as analogue to a JDBC driver that available for a relational database. Neo4j also offers drivers for different languages based on the Bolt protocol.

As with Java’s JDBC driver, there’s a bit of ceremony involved when working with this driver. First you have to acquire a driver instance and then open a session from which you can query the database:

With that code, one connects against the database and retrieves the names of all artists, I imported in my previous post. What I omitted here is the fact that the driver does connection pooling and one should not open and close it immediately. Instead, you would have to write some boiler plate code to handle this.

There are some important things to notice here: The code speaks of a driver. That is org.neo4j.driver.v1.Driver. The session is also from the same package: org.neo4j.driver.v1.Session. Those both are types from the driver itself. You have to know this things, because those terms will pop up later again. Neo4j-OGM, the object graph mapper, also speaks about drivers and session, but those are completely different things.

Most of the time however, people in the Java ecosystem prefer nominal typing over structural typing and want to map “all the things database” to objects of some kind. Let’s not get into bikeshedding here but just accept things as they are. Given a database model where a musical artist has multiple links to different wikipedia sites, represented like this (I omitted getter and setter for clarity):

To fill such a model directly by interacting purely with the driver, you’ll have to do something like this: A driver session get’s opened, than we write a query in Neo4j’s declarative graph query language called Cypher, execute and map all the returned records and nodes:

(This code is part of my example how to interact with Neo4j from a Micronaut application, find its source here and the whole application here.)

While this works, it’s quite an effort: For a simple thing (one root aggregate, the artist, with some attributes), a query that is not that simple anymore and a lot of manual mapping. The query makes good use of a standardized multiset (the collect-statement), to avoid having n+1 queries or deduplication of things on the client site, but all this mapping is kinda annoying for a simple READ operation.

Enter

Neo4j-OGM

Neo4j-OGM stands for Object-Graph-Mapper. It’s on the same level of abstraction as JPA/Hibernate are for relational databases. There’s extensive documentation: Neo4j-OGM – An Object Graph Mapping Library for Neo4j. An OGM maps nodes and relationships in the graph to objects and references in a domain model. Object instances are mapped to nodes while object references are mapped using relationships, or serialized to properties. JVM primitives are mapped to node or relationship properties.

Given the example from above, we only have to add a handful of simple annotations to make our domain usable with Neo4j-OGM:

Notice @NodeEntity on the classes, @Relationship on the attribute wikipediaArticles of the ArtistEntity-class and some technical details, mainly @Id @GeneratedValue, needed to map Neo4j's internal, technical ids to instances of the classes and vice-versa.

@NodeEntity and @Relationship are used not only to mark the classes and attributes as something to store in the graph, but also to specify labels to be used for the nodes and names for the relationship.

Quite a different, right? Dealing with the driver, the driver's session and Cypher has been abstracted away. Take note that the above session attribute is not a Driver's session, but OGM's session. This is a bit confusing when you start using those things.

Again, this code is part of my example how to interact with Neo4j from a Micronaut application. The complete source of the above is here and the whole application here.

To be fair, Neo4j-OGM needs to be configured as well. This is done in it's simplest form with a drivers instance and a list of packages that contains domain entities as described above, for example like this:

The driver instance in the example above is instantiated by Micronaut. With Micronaut's configuration support, it would have been manually configured as in the very first example.

In a Spring Boot application, Spring Boot takes care of the driver and Spring Data Neo4j creates the OGM session and deals with transactions, among other things:

Spring Data Neo4j

Let's start with quoting Spring Data:

Spring Data’s mission is to provide a familiar and consistent, Spring-based programming model for data access while still retaining the special traits of the underlying data store. It makes it easy to use data access technologies, relational and non-relational databases, map-reduce frameworks, and cloud-based data services.

That goes so far, that Craig Walls is fairly correct when he says, that many stores "are mostly the same from a Spring Data perspective":

It’d be tricky to cover ALL NoSQL options and since they’re mostly the same from a Spring Data perspective, I focused on a couple that I could also cover reactive with.

Spring Data Neo4j has some specialities, but on a superficial level, the above statement is correct.

Spring Data depends on the Spring Framework and given that, it's kinda hard to get it to work in environments other than Spring. If you're however using Spring Framework already, I wouldn't think twice to add Spring Data to the mix, regardless whether I have to deal with a relational database or Neo4j.

Given the entity ArtistEntity above, one can just declare a repository as this:

There is no need to add an implementation for that interface, this is done by Spring Data. Spring Data also wires up a Neo4j-OGM session that is aware of Spring transactions.

From an application developers point you don't have to deal with mapping, opening and closing sessions and transactions any longer, but only with one single "repository" as abstraction over a set of given entities.

Please be aware that the idea behind Spring Data and its repository concept is not having a repository for each entity there is, but only for the root aggregates. To quote Jens Schauder: "Repositories persist and load aggregates. An aggregate is a cluster of objects that form a unit, which should always be consistent. Also, it should always get persisted (and loaded) together. It has a single object, called the aggregate root, which is the only thing allowed to touch or reference the internals of the aggregate." (see Spring Data JDBC, References, and Aggregates).

In my "music" example, I deal with albums released in a given year. The release year is an integral part of the album and it would be weird having an additional repository for it.

So what are the specialities of Spring Data Neo4j? First of all, in the pure Neo4-OGM example you might have noticed the single, lone "1". That specifies the fetch depth in which entities should be loaded. Depending on how entities are modeled, you could ran in the problem, that you fetch your whole graph with hone single query. Specifying the depth means specifying how deep relationships should be fetch. The repository method can be declared analog:

and so on. I have seen some interesting finder methods here and there. While this is technically possible, I would recommend using the @Query annotation on the method name, write down the query myself and chose a method name that corresponds to the business.

Different abstraction levels

At this point it should be clear, that Neo4j Java-Driver, Neo4j-OGM and Spring Data act on different abstraction levels:

In your application, you have to decide which level of abstraction you need. You can come along way with direct interaction with the driver, especially for all kind of queries that facilitates your database for more than simple CRUD operations. However, I don't think that you want to deal with all the cruft of CRUD yourself throughout your application.

When to use what?

All three abstractions can execute all kind of Cypher queries. If you want to deal with result sets and records yourself and don't mind mapping stuff as you go along, use the Java driver. It has the least overhead. Not mapping stuff to fixed objects has the advantage that you can freely traverse relationships in your queries and use the results as needed.

As soon as you want to map nodes with the same labels and their relationship to other nodes more often than not, you should consider Neo4j-OGM. It takes away the "boring" mapping code from you and helps you to concentrate on your domain. Also, Neo4j-OGM is not tied to Spring. I didn't write application outside the Spring ecosystem for quite a while now. For this post, I needed an example where I don't have Spring, so I came up with the Micronaut demo, that uses both plain Java-Driver access and OGM access. Depending on what you want to achieve, you can combine both approaches: Mapping the boring stuff with Neo4j-OGM, handling "special" results yourself.

If you're writing an application in the Spring-Eco-System and decided for OGM, please also add Spring Data Neo4j to the mix. While it doesn't put any further abstraction layer on the mapping itself and thus is not slowing things down, it takes away the burden dealing with the session and transaction from you.

I do firmly believe that Spring Data Neo4j is the most flexible solution.

Start with a simple repository, relying on the CRUD methods

If necessary, declare your queries with @Query

To differentiate between write and read models, execute writes through mapped @NodeEntities and reads through read-only @QueryResults

Write a custom repository extension and interact directly with the Neo4j-OGM or Neo4j Java-Driver session

By declaring this additional method on the repository, I know have mapped a simple Cypher query that does complex thinks (Here match all albums that contain a specific track and all the relationships of that album and return that all apart from the other tracks) to my entity. I benefit from SDNs mapping and have all the queries in one place.

In my domain, I didn't model the track as part of the album. Those tracks should be explicitly read and not all the time. I therefore added an additional class, called AlbumTrack. Again, accessors omitted for brevity:

Notice the @QueryResult annotation. This is special to Spring Data Neo4j. It marks this as a class that is instantiated from arbitrary query result but doesn't have a lifecycle. It then can be used as in a declarative query method, similar to the first one:

while this query is indeed much simpler as the first one, it's important to be able to do such things for designing an application that performs well. Think about it: Is it really necessary to have all the relations to all other possible nodes at hands all the time?

In the end, you might have guess it: There are no silver bullets. There are situations where an approach close to the database is more appropriate than another, sometimes a higher abstraction level is better. Whatever you chose, try not be to dogmatic.

All the examples are part of my bootiful music project, more specifically, the "knowledge" submodule. With the building blocks described here, you can develop an web application that is used for reading and writing data.

The example application uses a simple, server side rendered approach for the frontend, but Spring Data Neo4j plays well with Spring Data Rest and that makes many different approaches possible.

In the next installment of this series, we have a look at the concrete domain modeling with Spring Data Neo4j.

]]>https://info.michael-simons.eu/2018/10/29/accessing-data-stored-in-neo4j-on-the-jvm/feed/4How to get data into Neo4j?https://info.michael-simons.eu/2018/10/12/how-to-get-data-into-neo4j/
https://info.michael-simons.eu/2018/10/12/how-to-get-data-into-neo4j/#respondFri, 12 Oct 2018 15:00:55 +0000https://info.michael-simons.eu/?p=2811This is the second post in the series of From relational databases to databases with relations. Check the link to find all the other entries, too. The source code for everything, including the relational dataset is on GitHub: https://github.com/michael-simons/bootiful-music.The post has also been featured in This Week in Neo4j.

To feel comfortable in a talk, I need to have a domain in front of me I’m familiar with. Neo4j just announced the Graph Gallery which is super nice to explore Graphs, but I wanted to have something of my own and I keep continue using the idea of tracking my musical habits. In the end I want to have knowledge base in addition to my chart applications (both linked above).

There are several ways to get data into your Neo4j database. One is a simple Cypher statement like this

CREATE (artist:Artist {name: 'Queen'}) RETURN artist

which creates a node with the label Artist and a property name for one of my favorite bands of all time. I can use the MERGE statement, which ensures the existence of a whole pattern (note only a node), too. More on that later, though. In any case, that would be a real effort todo manually. Therefor, let’s have a look how to import data. I found several options:

I find it quite fascinating in how many ways CSV data can be processed from within Neo4j itself. It reads through CSV and provides each row in a way that you can interact with from a Cypher statement like it comes from the graph itself. Those CSV files can be put into a dedicated import folder in the database instance but can also be retrieved from a URL. You can come a long way with that import if you already have CSV or a willing to massage your data a bit so that it fits a plain structure. Rik did this with a beer related data, check it out here: Fun with Beer – and Graphs in Part 1: Getting Beer into Neo4j.

JSON is an option if you want to work agains the many nice APIs out there, but my data sits in a relational database. It looks like this:

This tables store all tracks I have listened to, their artists and genres and in a time series table all plays. If you read the previous post closely, you’ll notice that this database, dubbed statsdb is only in NF2. There is no separate relation for the albums of an artist. They are stored with the tracks. I actually forgot why I modeled it that way. I vaguely remember that I was enjoyed that album names are not necessarily unique and I could find a good business primary key. Anyway, the schema is still running and gives me each month something like that which btw takes only three SQL queries:

In my property graph model I wanted to have separate albums nodes, though, so I had to massage and aggregate my data a bit before creating nodes. As I didn’t want to do this based on CSV, I looked at all the options JDBC. First, Neo4j ETL Tool:

Neo4j ETL Tool

The Neo4j ETL Tool can be downloaded through Neo4j Desktop and needs a free activation key. The link guides you through all the steps necessary to connect the tool to both the Graph database as well as the relational database. The nice thing here: You don’t have to install a driver for popular databases like MySQL or PostgreSQL as it comes with our tool.

I ran the tool agains my database and stopped here:

The ETL Tool recognized my tables and also the foreign keys and proposed a node and relationship structure. This is nice, but doesn’t fit exactly what I want. The relationships can easily be renamed and so can the node labels, but I don’t want the structure. I could just ran this and than transform everything via Cypher again, but that feels weird.

Getting all the artists is simple. It’s basically a “select * from artists” and the corresponding Cypher. This is what Apoc does:

Importing data with JDBC into Neo4j

APOC is not only the name of a technician in the famous Movie “The Matrix”, but also an acronym for “Awesome Procedures on Cypher”. Neo4j is extensible via custom, stored procedures and functions much like PostgreSQL and Oracle. I have been a friend of those for years, I even read and generated Microsoft Excel files from within an Oracle Database. Now I realize, that Michael Hunger does the same for Neo4j

Anyway. APOC offers “Load JDBC” and Michael has a nice introduction at hand.

To use APOC you have to find the plugin folder of your Neo4j installation. Download the APOC release matching your database version from the above GitHub link and add it to the plugin folder. APOC doesn’t distribute the JDBC drivers itself, those have to be installed as well. As my stats db is PostgreSQL, I grabbed the PostgreSQL JDBC Driver and put it into the plugin folder as well.

Things are super easy from there on. With the following Cypher statement, the Neo4j instances connects agains the PostgreSQL instance, executes a select * from artists and returns each row. YIELD is a subclause to CALL and selects all nodes, relationships or properties returned from the procedure for further processing as bound variables in a Cypher query. I then used them to merge all artists. In case they already existed, I updated some auditing attributes. The associations between some artists have already been there, so merge didn’t remove that:

The merge-clause is really fascinating. It matches a pattern and the merge is applied to the full pattern, meaning it must exist as a whole. In the case above, the pattern is simple, it’s just a single node. If you want to create or update an album released by an artist and use something like this MERGE (album:Album {name: $someAlbumName}) - [:RELEASED_BY] -> (artist:Artist {name: $someNewArtistName}), the statement will always create a new artist node, regardless wether the artist already exists or not as the whole pattern is checked. You have to do those things in separate steps like this:

First, merge the artist. Then, merge the album and the relationship to the existing artist node. For me, coming from with SQL background, this feels unusual at first, much like I can use something in an imperative AND declarative way.

Now, the above example is still not what I want. apoc.load.jdbc basically gives me the whole table and I have to do every post-processing in afterwards. Luckily, that second parameter can also be a full SQL statement. Now, to create a graph structure like above (minus the tracks though), I did something like this. Behold, all the SQL and Cypher in one statement:

Read: Bind the JDBC url and a SQL statement to variables. The sql selects a distinct set of all artists and album, including genre and year, thus denormalizing my relational schema by joining the artists and genres. The resulting tuples are then used to create decades and years, merge artists, genres and albums and finally relating them to each other. I was deeply impressed after I ran that.

First: It’s amazing how well Neo4j integrates with other stuff and secondly, it’s actually pretty readable I think.

In my talk, I will dig deeper in the Cypher used. Also, the statements for generating the track data in the above Graph are bit more complex (you find them here). But in the end, I could have stopped here. And probably, I would have done so, if I had started with APOC in the beginning.

But me being the mad scientist I am, started somewhere else:

Writing your own tool

I did this right after my first look at the ETL tool. Neo4j can be extended with procedures and functions. Procedures stream a result with several attributes, functions return single values. When I joined Neo4j, my dear colleague Gerrit impressed me with knowledge about that stuff and I wanted to keep up with his pace, so I saw a good learning opportunity here.

One does not have to extend from a dedicated class or have to implement an interface for writing a Neo4j stored procedure. It’s enough to annotate the methods that should be exposed with either @Procedure or @UserFunction. It’s necessary that org.neo4j:neo4j is a provided dependency on the class path.

Neo4j provides detailed information how to create a package that just can be dropped into Neo4js plugin folder, find the instructions here. They worked as promised.

What I did was to apply one of my favorite libraries again, run jOOQ against my stats database, generating a type safe DSL dedicated to my domain, packaged it up and than used it in a Neo4j stored procedure. In the following listing you’ll find that I use Neo4js @Context to get hold of the graph database service in which the procedure is running, pass in user name, password and JDBC url to my procedure. Those are used to connect against PostgresSQL (try (var connection = DriverManager.getConnection(url, userName, password);) {} (yeah, that’s Java 11 code), open a Neo4j transaction and than just executing my SQL and manually creating nodes.

The complete sources for that exercise are in the etl module of my project. As interesting as the stored procedure itself is the integration test, that needs both a Neo4j instance and PostgreSQL. For the first I use our own test harness for the later one, Testcontainers.

Conclusion

As much as the last exercise was fun and I learned a ton about extending Neo, working directly with the engine and so on, it’s not only slower (probably due to context switches and the relatively large transaction), but it’s hard to read, by magnitudes. As with a relational database, declarative languages like Cypher and SQL beat explicit programming hand down. If you ever have a need to migrate or aggregate and duplicate data from a relational database into Neo4j, have a look at apoc.load.jdbc. Select the stuff you need in one single context switch and then process as needed, either in a single transaction or also in batches and parallel.

For my music project, I’ll probably keep the ETL Tool around as I like the structure I created for the tool itself, but will rewrite the process from relational to graph based on APOC and plain SQL and Cypher statements, if it ever sees production of any kind.

Overall verdict: It took me a while to find a graph model I like and from which I am convinced I can actually draw some conclusions about the music I like, but after the graph started to resonate with me, it felt very natural and enjoyable.

https://info.michael-simons.eu/2018/10/12/how-to-get-data-into-neo4j/feed/0From relational databases to databases with relationshttps://info.michael-simons.eu/2018/10/11/from-relational-databases-to-databases-with-relations/
https://info.michael-simons.eu/2018/10/11/from-relational-databases-to-databases-with-relations/#commentsThu, 11 Oct 2018 19:32:46 +0000https://info.michael-simons.eu/?p=2807In the summer of 2018, I joined Neo4j. This seems odd at first, my “love” for relational databases and SQL is known. I didn’t have this slide in my jOOQ presentation for two years now without reason. But: Looking at the jOOQ and SQL talks from the perspective of early 2000s, they also seemed odd at first. Back then, I never thought doing that much with databases. That changed a lot.

My experience is that all the data your deal with usually has a much longer lifetime than any of your applications sitting on top of that. Knowing one or more database management systems is essential. Being able to query them even more: What Neo4j and relational databases have in common: A great, declarative way to tell them which data to return and not how to return them. In the case of a relational database, this is obviously SQL, which had quite the renaissance for a few years now. Neo4j’s lingua franca is Cypher.

I get to play with Cypher a lot, but in the end this is not what I am working on at Neo4j. My work is focused on our Object Graph Mapper, Neo4j-OGM and the related Spring Data Module, Spring Data Neo4j. We have written about that a bit on medium. Given my experience with Spring, Spring Data and Spring Boot, the role suddenly makes much more sense.

People who entered the IT-conference circus may know the merry-go-round (or should I say “trap”?) of “talks stress me out a lot” – “hey, this is great, just enter another CfP”. I fell for it again and proposed a talk with the above title to several conferences. Now, I have to come up with something.

What are we talking about here?

The domain will be music. I have been tracking my musical habits for more than 10 years now at my side project Daily Fratze and I enjoy looking back to what I listened like this months but 5 years ago.

The guy on the left, Edgar F. Codd invented the relational model back in the 1970s. A relation in this model doesn’t describe a relation like the one between two people or a musician associated with a band who released a new track. A relation in the relational model is a table itself. Foreign keys between tables ensure referential integrity, but cannot define relations themselves. I put down some thoughts a while back in this German deck. This is what a relation looks like in a relational database:

One kind of sport to do with relational databases is the process of normalizing data. There are several normal forms. Their goal is to keep a database redundancy free, for several reasons. Back in the 1970, disk space being one of them. First normal form (1NF): All attributes should be atomic. 2NF is 1NF plus no functional dependencies on parts of any candidate keys. That is: There must not be a pair of attributes appearing twice in a relation’s tuples. 3NF forbids transitive dependencies (“Nothing but the key, so help me Codd”) and it gets complicated from there on. I have to say though, that normalization up to 3NF is still relevant today in a relational systems, at least if you’re a friend of (strong) consistent data.

Why is this? Relations in NF can be queried in many, many ways. Each query send via SQL returns a new relation, what’s the problem? It depends. In a strictly analytical use case, there’s often not a problem. Recreating object hierarchies however, joining things back together, is. A handful of joins is not hard to understand, even without a tool, but self referential joins or a sheer, huge amount, is. It also gets increasingly hard on the database management system.

This is where graphs can come into play. Graphs are another mathematical concept, this time from Graph theory. Sometimes
people call a chart graph by coincident, but this is wrong. A graph is a set of objects with pairs of objects being related. In mathematical terms, those objects are vertices and the relations between them, edges. We call them nodes and relationships in the Neo4j database.

Neo4j is a Property Graph. A property graph adds labels and properties to both nodes and relationships:

One takeaway from that post is: Neo4j is referred to as a native graph database because it efficiently implements the property graph model down to the storage level. Or in technical terms: Neo4j employs so called index-free adjacency, which is the most efficient means of processing data in a graph because connected nodes physically point to each other in the database. This obliterates the needs for complex joins, either directly or via intersection tables. One just can tell the database to retrieve all nodes connected to another node.

This is not only super nice for simple aggregations of things, but especially for many graph algorithms.

So what has this todo with my talk? My SQL talk was all about doing analytics. That is, retrieving data like in this image from a relational database with build-in analytic functions. Computing running totals, differences from previous windows and so on (read more here).

First of all I’m gonna analyze how to create a graph structure from the very same dataset I used in the SQL talk. There are different tools out there with different approaches. I’ve chosen a technique that resonated for various reasons with me. As I want to enrich the existing dataset, I’ll model a domain around it, with Java, putting Neo4j-OGM to use. I’ll show how Spring Data Neo4j helps me not having to deal with a lot of cruft. In the end, I’ll show that I can build my own music recommendation engine based on 10 years of tracking my musical habits by applying some of the queries and algorithms possible with Neo4j.

]]>https://info.michael-simons.eu/2018/10/11/from-relational-databases-to-databases-with-relations/feed/3Validate nested Transaction settings with Spring and Spring Boothttps://info.michael-simons.eu/2018/09/25/validate-nested-transaction-settings-with-spring-and-spring-boot/
https://info.michael-simons.eu/2018/09/25/validate-nested-transaction-settings-with-spring-and-spring-boot/#respondTue, 25 Sep 2018 10:00:32 +0000https://info.michael-simons.eu/?p=2799The Spring Framework has had an outstanding, declarative Transaction management for years now.
The configurable options maybe overwhelming at first, but important to accommodate many different scenarios.

Three of them stick out: propagation, isolation and to some lesser extend, read-only mode (more on that a bit later)

propagation describes what happens if a transaction is to be opened inside the scope of an already existing transaction

isolation determines among other whether one transaction can see uncommitted writes from another

read-only can be used as a hint when user code only executes reads

I wrote “to some lesser extend” regarding read-only as read-only transactions can be a useful optimization in some cases, such as when you use Hibernate. Some underlying implementations treat them as hints only and don’t actually prevent writes. For a full description of things, have a look at the reference documentation on transaction strategies.

Note: A great discussion on how setting read-only to true can affect performance in a positive way with Spring 5.1 and Hibernate 5.3 can be find in the Spring Ticket SPR-16956.

Some of the transactions settings are contradicting in case of nested transaction scenarios. The documentation says:

By default, a participating transaction joins the characteristics of the outer scope, silently ignoring the local isolation level, timeout value, or read-only flag (if any).

This service here is broken in my perception. It explicitly declare a read-only transaction and than calls a save on a Spring Data repository:

This can be detected by using a PlatformTransactionManager that supports validation of existing transactions. The JpaTransactionManager does this as well as Neo4jsNeo4jTransactionManager (both extending AbstractPlatformTransactionManager).

To enable validation for JPA’s transaction manager in a Spring Boot based scenario, just make use of the provided PlatformTransactionManagerCustomizer interface. Spring Boots autoconfiguration calls them with the corresponding transaction manager:

In the unlikely scenario you’re not using Spring Boot, you can always let Spring inject an EntityManagerFactory or for example Neo4j’s SessionFactory and create and configure the corresponding transaction manager yourself. My Neo4j tip does cover that as well.

If you try to execute the above service now, it’ll fail with an IllegalTransactionStateException indicating “save is not marked as read-only but existing transaction is”.

The question if a validation is possible arose in a discussion with customers. Funny enough, even working with Spring now for nearly 10 years, I never thought of that and always assumed it would validate those automatically but never tested it. Good to have learned something new, again.

]]>https://info.michael-simons.eu/2018/09/25/validate-nested-transaction-settings-with-spring-and-spring-boot/feed/0Donating to Médecins Sans Frontières (Ärzte ohne Grenzen)https://info.michael-simons.eu/2018/09/01/donating-to-medecins-sans-frontieres-arzte-ohne-grenzen/
https://info.michael-simons.eu/2018/09/01/donating-to-medecins-sans-frontieres-arzte-ohne-grenzen/#respondSat, 01 Sep 2018 08:01:32 +0000https://info.michael-simons.eu/?p=2790Some weeks ago, my friends Judith and Christian, who write great Steampunk, Fantasy and in the recent time science fiction books for whom I wrote this little Kotlin app had a good idea:

We just decided to donate all sales revenues from #Shardland and #Scherbenland from June to August 2018 to Medecins Sans Frontières / Doctors Without Borders @MSF for their emergency help for refugees in Lybia, Europe and on the Mediterranean:https://t.co/7TKpL6tuT2

Keeping my promises, here are the numbers: My share of royalties of the arc42 by example book has been 112,15€ from June to August. Gernot, Stefan and I split revenues, so that is only my share. Gernot himself already donates through Leanpubs causes. I don’t have numbers from my publisher for Spring Boot Book. I therefore decided to round this number to 250€:

Me and my family have been incredibly lucky the last years and I’ more than happy that one can give. We live in Germany and despite some idiots on various (social) media, it’s really fortunate to live here. Health care is working, social system as well. We don’t have war but clean water, food and everything. We should not forget that this is by far not self-evident for many, many people on this planet-

]]>https://info.michael-simons.eu/2018/09/01/donating-to-medecins-sans-frontieres-arzte-ohne-grenzen/feed/0On becoming a Java Championhttps://info.michael-simons.eu/2018/08/20/on-becoming-a-java-champion/
https://info.michael-simons.eu/2018/08/20/on-becoming-a-java-champion/#respondMon, 20 Aug 2018 19:34:50 +0000https://info.michael-simons.eu/?p=2778July 2018 marks a personal highlight of mine. Just a bit after Rabea brought the news to our very own EuregJUG, the Java Champions account send this tweet out:

My name along this Java illuminaries. When I started this blog here more than twelve years ago, that was something I never even dreamed of. In 2006 the Java Champions program already existed but me, just been from university and vocal training for about 4 years or so, had no clue at all.

While getting my feet wet, I took inspiration from many of the people in the program, from their code, blog posts and talks. Not knowing that I would be working with them later in my life, even being direct colleagues with one of the founders like Eberhard Wolff. Even better: I’m lucky enough to call some of them my friends.

Java Champion was a long shot. I’m hopefully not the worst software engineer and architect out there, but I’m very far from knowing all the things. Quite the contrary. Did I know what a Java bridge method is until Gunnar brought this up on twitter? Do I know much about theoretical computer science? Hell no. I’m very sure that many people who I find very inspiring could forget more than I ever knew and still know more than I do.

So I must have done something else right and I’m happy with that. I just want to write some points down that might help others on their way:

On growth

Somewhen back in autumn 2014 I got Prokura in my company. I’m still not aware of an English word for that, but it means something along the lines that I can make and execute business decisions. My Prokura was only restricted in the way that I could not have closed the company.

Looking back, it was my bosses saying “we trust you to run this thing and also, this is our way of saying here, you’re explicitly technical lead, too.” Sadly, it didn’t come with a manual. At this point I was already more than 12 years at ENERKO INFORMATIK. I had grown in this time, but mostly on many technical levels. Things I learned include SQL, PL/SQL, XML, XSLT, Java (obviously), Spring (more obvious), we did Groovy at some point, not speaking of all the Swing based stuff I wrote and AFAIK ENERKO still runs an Oracle Database Dictionary based ORM I invented.

But could I lead a team? That was hard for me for several reasons. The company didn’t have had much fluctuation (and still hasn’t), so one did basically “grew up” with one another and it’s a weird situation if one person suddenly changes, either internally or externally.

Back then, I somewhen added the title line of a Nick Cave song to my personal site: “You’ve got to just, Keep on pushing, Keep on pushing, Push the sky away.” That has been my motto for quite some time.

I needed to grow beyond technical, “hard skills”. I tried and surprisingly, the feedback I got after leaving ENERKO INFORMATIK in 2017 was better than the impression I had of myself. I managed to hire two new engineers and they are still with the company which makes me quite happy.

In the end, I still felt I didn’t manage to achieve to anything. It’s weird how self perception and perception from others diverge. I left the company and I am working now at Neo4j, where I work in a small team with Gerrit Meier on Neo4j OGM and Spring Data and I couldn’t be happier with it.

Things come with a price

Most of the things “Spring” I learned on working with and on Daily Fratze, a personal photo site I’m running since 2005. I really love that stuff and I learned so much by developing it. But: It ruined my sleep in the days between 2010 and 2013 (2013 was the year my 2nd kid was born), it made parts of my family time with the my 1st kid very hard. I wanted to “finish” stuff and basically didn’t do anything else in my spare time.

Partially the same happened during late 2014 and early 2015, coincident with the events described unter “Growth” when I developed biking.michael-simons.eu and a lot more of Spring Boot related stuff inside my company. I was at home but I wasn’t there. I came back from the office, ate, and went into my office.

At some point, I needed to step back and also visited a doctor to help me to cope with sleep issues and dark moods.

We have a good family life most of the time, though. My wife always kept my back and I tried to be awake early every morning to be there for my kids. It’s by far not self evident that someone tolerates a partner that uses every minute awake when the kids are in their beds for coding, reading technical stuff and so on.

I gave a lot of talks since the end of 2015 with some success. The talks I enjoyed the most have been on Spring Boot and database related stuff. All the things I can explain in the middle of the night. However: I was and I still am super nervous at least a week before a talk. That feeling doesn’t seem to go away. Doing those talks is much more work for me than writing things (for example a book as explained in this post).

On mentoring

I was astonished that during the last 3 years or so many people came to me and thanked me for inspiration. That’s one of the reasons I’m trying to write this down here. Something like being a Java Champion or also success in general is most often not something that happens in a void, without help and support.

Everything I did and keep on doing: I would not be able to do this without having support in my life, a growing sense of what is good for my health (both mind and body) and without having had good mentors in my life.

There was my boss Rainer Barluschke at ENERKO INFORMATIK who taught me that there’s so much more than technical problems to solve. That it’s worth going a detour if the end result fits. Who even introduced me to some topics that seems to be more of the esoteric kind back than, like spiritual growth.

My ex-colleague Silke Böhler, who challenged me in JavaLand 2015 with some good food for thoughts and later on with a line “work is fun, but has to be taken seriously anyway…”. Apart from that: It has been the little things that last and helped along the way.

I already mentioned the support of my family, but also a kid can be a mentor. It’s hard to describe, but having someone near me that most of the time is in a good mood compared to myself, helps on focussing and accentuating the good things.

Summing things up…

Don’t give up trying to reach your goals because other peoples success seems to be so easily achieved. In the end, people of a group engage in the same game, but start with different preconditions. Look for opportunities where you are

Allowed to learn

Be able to fulfill a meaningful task, with all given due diligence and seriousness

Be part of a team, IT is not a single players sport

And also

Find a good mentor

Become a mentor: Pass on what you learned

Keep interest in other things outside your job… Not everything is related to IT

Right now, I feel at peace with myself for the first time in about 5 years. Going over the midlife crises? Who knows… I’m thankful that I actually could push my sky away by magnitudes and the last year will be a year I will always remember.

For the near future I’m super thrilled to work on cool stuff with Neo4j and Spring Data, with having the latest release of Neo4j OGM 3.1.1 just out of the door and see what happens next.

I made some innuendo to some people (Ralf ) that I do have ideas for a next book and as a spoiler: If put this together, I’ll try to bring this post here, something along that one and some other ideas into a form that might be worth reading for more people.

]]>https://info.michael-simons.eu/2018/08/20/on-becoming-a-java-champion/feed/0Spring Boots Configuration Metadata with Kotlinhttps://info.michael-simons.eu/2018/07/15/spring-boots-configuration-metadata-with-kotlin/
https://info.michael-simons.eu/2018/07/15/spring-boots-configuration-metadata-with-kotlin/#respondSun, 15 Jul 2018 19:14:14 +0000https://info.michael-simons.eu/?p=2756Last week I decided to raffle a copy of my book (see Twitter) and I wrote a small Spring Boot Command Line Runner to raffle the retweeting winner as one does (see raffle-by-retweet, feel free to reuse this).

I wrote the application in Kotlin. Notice the use of @ConfigurationProperties in my application:

The lateinit attributes are not as nice as I want them to be, but I heard support for data-classes is coming. Anyway. A super useful thing with those configuration property classes are the metadata that can be generated for your IDE of choice, see Configuration Metadata. In a Java application it’s enough to add org.springframework.boot:spring-boot-configuration-processor as compile time dependency respectively as annotationProcessor dependency in a Gradle build.

For Kotlin, you have to use the kotlin-kapt-plugin. It takes care of annotation processing in Kotlin and Spring Boots annotation processor has to be declared in its scope like this: