In the last couple of years, I developed all of my codes in the way that all members and statics of every class were sorted by Eclipse. I recently joined a new project and it was surprising to me how much the other members of the team were against sorting the members. As always, I tried to understand their motivation and find the differences in our opinions.

Their main reason was the following:

I want to put the functions and variables that belong together next to each other, so I can see them together.

I saw their point. However, I did not agree with it and I could not convince them about my right. I told them:

If you used Git and code review before accepting pull requests, you would find it very annoying if non-fucntional code changes would divert your attention. By non-functional, I mean changing the order of functions or doing some code formatting.

That was not enough. How do you explain the benefit of code reviews and the drawbacks of non-functional changes in the code to someone, who has never used Git neither code reviews?

I still felt that there was something else, too. What did I miss in the argument? Why do I not care if the functions that are called from each other are not close to each other in the source code?

You can call me uppish, but in every development team in my life, I was much faster at code investigation than others. It had one reason. While the other programmers started to analyze the code as a book, I wanted to know only what I had to.

Imagine you are looking for the reason why a label has a specific value on the screen. While the others see the code snippet below:

I see only the following:

You see the difference? I care only about the code I need to know. After I find this code snippet, I will push CTRL+ALT+H to see where the current function is called from. I do not care if the caller function is before or behind this one. I will jump there and continue my investigation.

I look around myself, and see that most of the developers do not even know the most basic keyboard shortcuts of Eclipse:

CTRL+SHIFT+T: Open a type

F3: jump to definition

CTRL+T: Type hierarchy in a pop-up. This is very useful during an investigation if we want to know where a method of an interface is implemented.

CTRL+ALT+H: Call hierarchy. Where is this variable used or where is this method called from?

CTRL+L: Jump to line. Very useful if we have a stacktrace and we want to start investigating the issue by line number.

I might have missed a couple of useful shortcut keys. What I wanted to express is that the ones who do code investigations without these, cannot call them experts. Their effectivity is way beyond the ones who know the tricks.

As soon as you start reading the code effectively, you will not care about the grouping of variables and functions at all. However, it will be very annoying if you see that others spend much energy on grouping, instead of concentrating on the business logic they want to implement. If you start thinking of grouping, you waste time and effectivity. You will never be as effective as others.

So why do I care about sorting the members at all? Why should I sort the members if it does not matter during code investigation? Because I want to make only those changes visible in my commits that have actual functional meaning. Without automated-sorting and very strict automated code-formatting in my IDE, I should spend a high amount of energy on reviewing my own commits before pushing them for code review.

We are happy to announce the first public release of Blobstore components. Blobstore allows the programmer to read and write binary large objects within atomic transactions.

Everit Blobstore has in-memory and JDBC-based implementations. While the in-memory implementation can be used to write unit tests, the other one supports several relational database types: Apache Derby, Hsqldb, MySQL, PostgreSQL, Microsoft SQLServer, Oracle

Introduction

There are many libraries out there that contain reusable code, but only a few of them persist data in relational database. Even if they do so, they behave normally as standalone project. It is hard to use their tables in custom queries of the project that embeds them.

Our goal is to create a technology stack that allows to embed logically related table sets into separate modules. There are many use-cases in which reusable persistent data structure can be defined. With a well defined API, complicated queries can be eased.

A reusable module with relational database tables must meet the following requirements:

The module must provide an API that makes it possible to easily create or modify the records of the tables

It must be possible to reference the tables with foreign keys from other modules

The module should provide API to easily extend SQL queries

During a project many parts can be separated that can be implemented in a reusable way. This post shows some practical use-cases and the way how they can be implemented.

Localization

Imagine that you have a User entity with the following fields:

userId

name

birth_date

country_code

Task: Show a filtered and paginated list of users on a website. The list should be ordered by the localized country name

Low-performance solution

Select all the users that meet the filter criteria. If we have 1 million results, than download all of them into memory.

The SQL above looks difficult. Re-implementing this query at each place is almost impossible. It is time to find a technology that supports Object-Oriented SQL queries to be able to separate complex query parts to reusable functions. Some solutions that allow this:

It would be hard to write down the COALESCE logic every place where we want to have localized data as part of the result set. Let’s make a function called createLocalizedCoalesce(DSLExpression<String> key, Locale locale). With that function, it will be easy to write the robust query:

Building a query with the complex localized_data logic is not that difficult anymore.

Initialize database schema

Defining tables, views, versions

We chose QueryDSL to write queries. However, we need a tool that can populate the database schema.

We investigated lots of the technologies that supports creating and modifying tables and Liquibase won. In our solution, each module contains a liquibase changelog file that defines the database schema. In the localization module it looks like the following:

Initializiation of the database

Our stack is based on OSGi and Declarative Services. Using the right modules and components we are able initialize the database before the business logic is activated. This goal can be achieved by the configuration of the following components:

Database initialization

First of all, we need a JDBC driver that registers a DataSourceFactory as an OSGi service

picks up the DataSource service that was registered by Everit – Commons DBCP component

tries to initialize/validate the database with the specified Liquibase changelog file

if the initialization/validation is successful, the DataSource object will be registered again as an OSGi service but with different service properties

Based on service properties, business components can pick up the DataSource that was registered by Liquibase DataSource component

With a configuration like this, we can be sure that the database is up-to-date when the business components are activated.

How does the Liquibase DataSource component know, where it should find the changelog file?

Each bundle that contains one or more Liquibase changelog files. They must also provide one or moreliquibase.schemacapability. The capability can have the following attributes:

name: The “name” of the schema that identifies the changelog file. By “schema” I mean the identifier of a logicaly related set of tables, views and sequences. The meaning of “schema” capability attribute is not the same as a database schema.

resource: The path of the changelog file within the bundle

custom attributes: Any custom attribute may be specified. They can be useful if we want to allow special filters on the consumer side (e.g.: version, dbtype, …)

In case of localization, the capability MANIFEST header looks like the following:

Everit – Liquibase DataSource component should be configured to pull this capability. To do that, simply specify the following at the liquibase.schema configuration property: org.everit.osgi.localization.schema.

It is possible to add directives in the configuration of the Liquibase DataSource component. E.g.: org.everit.osgi.localization.schema;filter:=(version>=1.0.0)

The consumer bundle must have a Require-Capability MANIFEST header that refers to the capability offered by the producer bundle. By doing that, a wire will be created between the bundles. Our solution finds inclusions based on these wires.

the include tag must define the path with the “eosgi:” prefix and add the name of the liquibase.schema capability as value instead of the path

If we want to use the Localization module in our project, we should include the liquibase changelog in the following way in the changelog of the application:

<include file="eosgi:org.everit.osgi.localization.schema"/>

And specify the following requirement in the MANIFEST of the application bundle:

The wiring and inclusion above works transitively. E.g.: If you include authorization tables in your application, you include its dependencies transitively as well. As resource is included in authorization, the resource table will be created/validated during the application initialization.

Generating the QueryDSL metadata classes

Writing QueryDSL metadata classes manually would be a hard task. There is a tool called LQMG that generates the Metadata classes from the Liquibase capabilities for us. This tool has the following steps:

Steps of code generation

Starts an embedded OSGi container (Equinox)

Deploys the specified bundles. Those bundles should be specified that contain liquibase capabilities.

Starts an embedded H2 server

Based on the specified liquibase capabilities, initializes the embedded H2 database. This is done in the same way as Liquibae DataSource works

Generates the QueryDSL metadata classes from the H2 databsae

It is also possible to specify renaming rules in LQMG to generate class names and properties that are easier to read in the source code.

If we use lqmg-maven-plugin, the dependencies of the projects will be handled as the OSGi bundles.

Resource

The Resource project is already available at GitHub. Resource has no own functionality. It is simply a table that has a generated id. Resource is often used in other modules to wire their functionality to each other.

Databsae Sequence could have been used as well, but we wanted to let the developers find out if any other module uses the same resource id. In case a foreign key references the resource record, it cannot be deleted until all module allows it.

Authorization

The authorization module has already been implemented by Everit based on JPA-OSGi technology stack. As we are changing our technology stack, it will be available soon based on Liquibase-QueryDSL at GitHub.

To have a reusable authorization module, we can use the following relational schema:

permission table

target_resource: Resources which other resources can run action on. It might be a document that a user can open and edit or a user who can be deleted or modified

action: The action of the permission. E.g.: If the authorized resource was a user and the target resource was a document, actions could be ‘open’, ‘edit’, ‘delete’, …

permission_inheritance table

It is possible to define inheritance between authorized resources. The child resource inherits all of the permissions of the parent. A parent could be a user group, role or organization, while a child could be a user or user group. The table is abstract enough to let the designer of the application decide, how to describe roles, groups, users and the relation between them.

Functionality of authorization

First of all, the authorization module contains functions that make it possible to manipulate permissions between resources and the permission inheritance DAG. It also contains a permissionChecker that can

calculate the inheritance and provide it from cache next time

provide permission information for an action between two resources

from the database

from cache

And last but not least, the authorization component provides functionality to extend QueryDSL based queries with permission restrictions. Imagine the following query:

SELECT ...
FROM document d
JOIN document.attachment a;

We can define permissions on every entity that contains a foreign key to the resource table. Every entities can be found in a SQL statement behind FROM and JOIN parts may be authorized or target resource. In this case we can define permissions on documents and on their attachments. We define the following function:

The function generates the following predicate (in QueryDSL of course):

EXISTS(SELECT 1 FROM permission p
WHERE p.authorized_resource_id IN (?, ...)
AND p.target_resource_id = d.resource_id
AND action IN (?, ...))

By using QueryDSL and the function above, it is easy to extend every queries with authorization restrictions.

Full-text search

This is more a concept for us at the moment. We realized that full-text searching together with permission checks cannot be implemented effectively if the index is outside of the database. We saw huge document management systems that did the following:

Doing a full-text search with a technology like Lucene

Checking permissions on each record one-by-one

Showing twenty records on the user interface from the middle of the result set

And saw them die on requests that had larger result sets🙂.

Every modern DBMS has built in full-text search support:

H2-Lucene

MySQL – Sphinx

PostgreSQL – TSearch2

MSSQL – Some Microsoft stuff🙂

Oracle – Oracle Text

All of these allow to have FTS as part of a complex SQL statement. We believe that by using the FTS and the authorization modules together the two logic can be migrated into one SQL statement easily. By doing that, only the number of final results have to be transported from the database.

Other use-cases

After the developer starts to design projects with this concept, he/she will find many reusable parts. Some ideas: historical tables with data mining functions, audit records, currencies with exchange rates, geolocations, gtfs, …