As you may have noticed, I am an avid user of Elasticsearch and have already written some river plugins for indexing different data sources such as Amazon S3 buckets or Google Drive.

As a river plugin developer, I often find myself having to test many combinations of Elasticsearch versions and configurations, and each time I have to reinstall fresh copies of my plugins. Also, as a river user, I have to create many rivers, and finding the correct configuration is often a trial-and-error process with regard to the content to index, data freshness, indexing and request performance, mapping issues and so on…

This led me to the point where I grew tired of the long curl commands needed to configure all this stuff, and I decided to react ! Today I’m introducing a new Elasticsearch plugin I’ve written to make my life easier (and maybe yours if you’re using rivers too ;-)): Sluice !

Features

So what’s Sluice ? As stated above, Sluice is an Elasticsearch plugin whose goal is to help you manage your rivers : it simplifies the installation of the required plugins and also helps you set up and tune your rivers. The idea with Sluice is that you no longer have curl commands to type : just install the Sluice plugin and then use its simple user interface.

Sluice is hosted on GitHub and installs as a regular plugin by typing the following command in a shell :

You can see here that Sluice checks which of the supported river plugins are installed and offers a simple way to install the missing ones. Just click the Install link and it takes care of retrieving and setting up the Amazon S3 river plugin, for example. You then just need to restart your ES node.

In the dedicated section, you can also list all the river instances created in your ES cluster. For now, you can only edit existing rivers, not remove them.

Finally, it offers a convenient way to add a new river. The configuration attributes of the river are grouped together, each with a clear explanation of its meaning and supported format.

Easy, no ? For the moment, supported River plugins are :

Amazon S3 River plugin,

Google Drive River plugin

Limitations

Sluice has only a first release named 0.0.1 and it’s far from being feature complete !
The current limitations are :

It has been a long time since my last post, but rest assured I have not been devoid of thoughts since then (have you seen the blog title ? ;-)). Just a lack of time and energy to write things down…

Today I resume blogging with a Spring Roo plugin I finished last week. For those who don’t know Spring Roo : it’s a productivity tool that helps you bootstrap a Spring application within seconds. And although the excitement seems to be more around Spring Boot these days, I find Roo to be a valuable tool in a developer’s toolbox… Anyway, Roo comes with many plugins allowing you to choose your persistence layer and APIs : typically JPA based or MongoDB based.

Some months ago I started this plugin, which allows you to have a persistence layer based on Elasticsearch. The idea here is to have your domain objects directly persisted into an Elasticsearch index and – thanks to the conventions of Roo – to quickly get a CRUD service layer and scaffolded web screens generated for you. After a little contribution to Spring Data for Elasticsearch (here), the plugin was on its way and is now hosted here on GitHub.

Twitter example development

The plugin is not yet released to the official Spring Roo repository, so installation is a bit tedious… The README.md on GitHub explains how to do it, so I won’t delve into this part. Instead, I propose to walk through in more detail the Twitter example that is used to illustrate the plugin commands.

So let’s start with a brand new project. In a new directory, start a Roo shell and create a new project with this command :

project --topLevelPackage com.github.lbroudoux.es

This produces a bunch of configuration files, as shown by the screenshot above. The next thing to do is to activate the Elasticsearch layer plugin for Roo and set it up to use an ES node that is not local to the JVM and is hosted on localhost:9300. You do this with this line :

elasticsearch setup --local false --clusterNodes localhost:9300

Configuration files are generated for you, dependencies (on spring-data-elasticsearch) are added and the Spring version is updated to the required one. The following step is to tell Roo that you want a Tweet domain object backed by Elasticsearch. This is done through this new variation of the entity command available in Roo :

entity elasticsearch --class ~.domain.Tweet

The Tweet domain Java class is generated along with its AspectJ ITD. You can now enrich your domain class with fields such as author and content, which should be limited to 140 characters in length. This is done with the following commands in Roo :

Nothing more to say here : Tweet class is modified. Next step is more interesting : it’s here that you’re asking the plugin to generate a Spring Data repository layer for persisting Tweets into ES. This is done by :

You can see that a new TweetRepository interface has been generated and that an ITD triggering an Elasticsearch implementation proxy is also present. We now have to create a CRUD service layer on top of our repository, and it’s done simply using this command :

service --interface ~.service.TweetService --entity ~.domain.Tweet

The TweetService interface and its implementations are generated so that they use the repository we generated earlier to persist and retrieve Tweet instances. Finally, in order to easily test and check the resulting application, we have to set up a web layer and generate scaffolded screens for our domain objects. This is done by running these 2 commands in sequence :

web mvc setup
web mvc all --package ~.web

And a bunch of web resources, controllers and configuration files are now present in our application. Development is done !

Twitter example execution

We now want to execute all of this in order to properly test our app (yes : Roo offers many ways to unit and integration test your app, but a screen is more expressive, at least for a blog post ;-)).

First, in a terminal, start your Elasticsearch node on localhost. Default command will do the job, you don’t need extra configuration :

bin/elasticsearch

Then, in the terminal where you were working with the Roo shell, exit the shell and launch the Tomcat plugin to run your app. This is done with this Maven command :

mvn tomcat:run

After Tomcat has started up, you can open a browser at http://localhost:8080/es. You’ll get this screen, which is the default home page of the application.

From there, you can access a page allowing you to create new Tweets with the fields we have added to our domain class.

Persistence works fine and you’ll see, by checking the icons, that all the services are there for showing, updating, finding and deleting Tweets.

Twitter example validation

You might then tell me : “Ok, ok… Stuff is persisted, but how do you know that it is persisted into the Elasticsearch node ?”. A simple way is to check on the ES side using the Marvel monitoring solution (I highly recommend that you install it if not already done !). So open a new browser tab at http://localhost:9200/_plugin/marvel/ and check the “Cluster Overview” dashboard.

Now, you see our first tweet has really been persisted into Elasticsearch !

Conclusion

So I have shown you how to write a full-blown Spring application that :

Persists and retrieves its domain objects in Elasticsearch,

Is properly architected with a repository layer and a service layer,

Presents a basic administrative web frontend,

in no more than 9 lines of Roo commands ! Wouah !

Far beyond this basic persistence stuff, we’re able – as developers – to easily build cool apps using the powerful indexing and querying features of Elasticsearch. Just consider this tutorial as a quick start and think about : full-text search, geo queries, analytics and aggregations on various fields of your Tweets … it is all close at hand !

As I blogged yesterday, I recently discovered a limitation in the Elasticsearch architecture regarding the isolation of plugins. The fact is that every plugin and its libraries are added to the same Java ClassLoader during startup, and thus all the plugins share resources and class definitions.

Observation

I encountered this while developing and testing 2 plugins : one for indexing documents stored on Google Drive ; the other for indexing documents stored on Amazon S3. Unfortunately, each one pulls in Apache httpclient through its Maven dependencies : version 4.0.1 is used by the Google SDK and version 4.1 is used by the Amazon SDK.

So when you start Elasticsearch with both, you end up with a beautiful exception as follows :

What happens here ? Both plugins are loaded and the Google Drive river seems to be loaded first. As you can see here, its libraries are added to the ClassLoader first. So the 4.0.1 definition of org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager comes first and is the one later resolved by the classes referencing it. During its init phase, the Amazon plugin tries to use this class but needs the 4.1 definition, which holds the new <init>()V method (the no-argument constructor) !

Enhancement

As an enhancement proposal, I’ve forked the Elasticsearch repository here and reworked the classloading scheme of plugins. You can now force the loading of plugins into dedicated, isolated classloaders that try to resolve requested classes using the plugin libraries first and then the main classloader.

Although I’ve run tests with some other plugins (twitter, head, attachment, fsriver) and seen no regression, I thought it would be safer to add a feature toggle to activate this. Plugin isolation is thus only done if the plugin.isolate settings flag is set to true (either from the YAML configuration file or from the command line).
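To make the "plugin libraries first, then the main classloader" rule concrete, here is a minimal Java sketch. It is not the code of the fork : the class name PluginFirstClassLoader, the forPlugin and canLoad helpers and the way the toggle is read from a system property are all assumptions made for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Illustrative plugin-first classloader; all names here are hypothetical. */
final class PluginFirstClassLoader extends URLClassLoader {

    PluginFirstClassLoader(URL[] pluginJars, ClassLoader mainLoader) {
        super(pluginJars, mainLoader);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                try {
                    // 1. Look into the plugin's own libraries first.
                    loaded = findClass(name);
                } catch (ClassNotFoundException notInPlugin) {
                    // 2. Fall back to the main classloader (JDK and host classes).
                    loaded = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    /** Returns an isolated loader when the toggle is on, the shared main loader otherwise. */
    static ClassLoader forPlugin(URL[] pluginJars, ClassLoader mainLoader, boolean isolate) {
        return isolate ? new PluginFirstClassLoader(pluginJars, mainLoader) : mainLoader;
    }

    /** Small helper telling whether a loader can resolve a class name. */
    static boolean canLoad(ClassLoader loader, String className) {
        try {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the toggle is read from a system property named like the -D flag.
        boolean isolate = Boolean.getBoolean("es.plugin.isolate");
        ClassLoader loader = forPlugin(new URL[0], ClassLoader.getSystemClassLoader(), isolate);
        System.out.println("isolated=" + isolate + ", String loadable=" + canLoad(loader, "java.lang.String"));
    }
}
```

With one such loader per plugin, two plugins can each resolve their own version of a shared library, while classes that a plugin does not ship still come from the main classloader.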

The result is shown below : when started with the -Des.plugin.isolate=true property, dedicated classloaders are used, making the use of conflicting plugins a breeze :

So, your company uses Amazon S3 as a storage backend for internal documentation ? Or you’re running a Web application where users can upload and share files and content backed by S3 ? Now, you want/have/need to have the whole stuff indexed and searchable using a “mind blowing search engine” (say Elasticsearch ;-)) ? Well, the solution might be the Amazon S3 River plugin for ES released today.

Main features

So what does this plugin do ? Here are the features for this first release :

Connect to your S3 bucket using AWS Credentials,

Scan only changes from last scan for better efficiency,

Filter documents based on folder path (no restriction on the depth level, you can use such path as Work/Archives/2012/Project1/docs/),

Filter documents to include using wildcard expressions, such as *.doc or *.pdf,

Filter documents to exclude, also using wildcard expressions, such as *.avi or *.zip (of course, exclusions are computed first),

Support MS Office, OpenOffice, Google documents and many other formats (full list here),

Support scan frequency configuration,

Support bulk indexing for optimization
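The include/exclude rule from the list above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the plugin’s actual code : the DocumentFilter class is hypothetical, wildcards are limited to * and ?, and an empty include list is assumed to mean "accept everything that is not excluded".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative include/exclude filter; a hypothetical class, not the plugin's own. */
final class DocumentFilter {
    private final List<Pattern> includes;
    private final List<Pattern> excludes;

    DocumentFilter(List<String> includes, List<String> excludes) {
        this.includes = compile(includes);
        this.excludes = compile(excludes);
    }

    /** Translates a simple wildcard expression ('*' and '?') into a regular expression. */
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    private static List<Pattern> compile(List<String> wildcards) {
        List<Pattern> patterns = new ArrayList<>();
        for (String wildcard : wildcards) {
            patterns.add(toPattern(wildcard));
        }
        return patterns;
    }

    /** Exclusions are evaluated first, then inclusions. */
    boolean accepts(String fileName) {
        for (Pattern exclude : excludes) {
            if (exclude.matcher(fileName).matches()) {
                return false;
            }
        }
        if (includes.isEmpty()) {
            return true; // assumption: no include pattern means "include all"
        }
        for (Pattern include : includes) {
            if (include.matcher(fileName).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DocumentFilter filter = new DocumentFilter(
                Arrays.asList("*.doc", "*.pdf"), Arrays.asList("*.avi", "*.zip"));
        System.out.println(filter.accepts("report.pdf"));
        System.out.println(filter.accepts("holidays.avi"));
    }
}
```

Evaluating exclusions first means that a file matching both an include and an exclude pattern is always rejected.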

The project

The project is naturally hosted on GitHub here : https://github.com/lbroudoux/es-amazon-s3-river. The plugin is installable as a standard Elasticsearch plugin using the bin/plugin -install command. Everything you need for installation and configuration should be available on the project front page.

Restriction

As a disclaimer : while developing this plugin, I discovered an Elasticsearch limitation in the fact that loaded plugins are not isolated from each other and share the same resources (this is because plugin libraries are added to the main ClassLoader, as you can see here). As a consequence, using this new plugin in conjunction with the previously released Google Drive River plugin is not possible (the Amazon and Google libraries use conflicting versions of Apache http-client). I’ll tackle this subject in the forthcoming days if I find enough time.

As usual, do not hesitate to give me your feedback through comments on this post, issues on the GitHub project or tweets (@lbroudoux) !

I blogged some weeks ago about a test run I did with Elasticsearch and Kibana3 (now just Kibana, the ‘3’ has been dropped since ;-)). And the fact is that it was so much fun and so pleasant to work with them that I wanted to go further and start digging into Elasticsearch.

After a few days of scratching my head and looking around the ES plugin ecosystem, I got the idea of writing a Google Drive river to actually learn from the trenches. So I am happy to announce the 1st release of this Elasticsearch plugin that allows you to index the content of a Google Drive with ES !

Main features

So what does this plugin do ? Here are the features for this first release :

Connect to Google Drive in ‘offline’ mode (no need to be connected to your Google account, just to authorize the plugin to do so) using OAuth 2,

Scan only changes from last scan for better efficiency,

Filter documents based on folder path (only 1 level for the moment),

Filter documents to include using wildcard expressions, such as *.doc or *.pdf,

Filter documents to exclude, also using wildcard expressions, such as *.avi or *.zip (of course, exclusions are computed first),

Support MS Office, OpenOffice, Google documents and many other formats (full list here),

Support scan frequency configuration,

Support bulk indexing for optimization

The project

The project is naturally hosted on GitHub here : https://github.com/lbroudoux/es-google-drive-river. The plugin is installable as a standard Elasticsearch plugin using the bin/plugin -install command. Everything you need for installation and configuration should be available on the project front page.

Some features are still missing and some may be improved, but the basics should work well and fast. Want to give it a try ? Or help with some ideas, tests or contributions ? Do not hesitate to give me your feedback. I’ll keep on digging into Elasticsearch in the forthcoming weeks, months … who knows !?

Beyond the relevancy of the speakers and the products, an Elasticsearch extension called Kibana3 was briefly introduced and – although marked as an alpha release – it totally astonished me ! Kibana3 is an extension designed for real-time analytics of data stored in Elasticsearch. It allows full customization of dashboards and is so easy to use that it can almost be put into the hands of business people…

Some weeks later I found some time for a test run and, although things went well, I thought it would be useful to write a kind of “How to” or “Quickstart” for Kibana3. Here it is.

The setup

Install and run Elasticsearch

Download Elasticsearch from http://www.elasticsearch.org (as I rechecked everything while writing this post, I chose the 0.90.0 release, which wasn’t out when I first tested this… so everything should also run fine on the 0.20.6 release I picked previously). Just extract the archive into a target directory and simply run the following :

Congratulations ! You are now running an Elasticsearch cluster with one node ! That is basically everything you need for a basic setup, because every interaction with the node – from administration to the client APIs – is done through REST APIs over HTTP. That means a simple curl command does the job.
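Since everything goes through HTTP, the same kind of call can be made from any language. Here is a minimal Java sketch that reads the cluster health endpoint. It assumes a node listening on localhost:9200 ; the ClusterHealthClient class and its helper names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Illustrative REST call against a running node; class and method names are hypothetical. */
final class ClusterHealthClient {

    /** Builds the URL of the cluster health endpoint for a given node. */
    static String healthUrl(String host, int port) {
        return "http://" + host + ":" + port + "/_cluster/health";
    }

    /** Performs a plain HTTP GET and returns the response body as text. */
    static String get(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        try (InputStream body = connection.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = body.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toString("UTF-8");
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: an Elasticsearch node is running locally on the default HTTP port.
        System.out.println(get(healthUrl("localhost", 9200)));
    }
}
```

The response is a JSON document describing the state of the cluster, which is the same thing curl would print for that URL.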

Anyway, before going further, we’d like to add an administration console to our cluster (cause having some GUI doesn’t hurt after all) and we need to feed our node with data. For that, we are going to install 2 plugins :

elasticsearch-head, a web frontend plugin for easy interaction with the cluster and its nodes,

the Twitter river plugin, which feeds our node with data coming from Twitter

Now just restart your node by killing the running elasticsearch process and launching another one, then point your browser to http://localhost:9200/_plugin/head/ ; you should now have access to the web frontend.

Install and run Kibana3

As said in the introduction, Kibana3 is an Elasticsearch plugin hosted by Elasticsearch itself and dedicated to analytics : it provides the means to dynamically build any dashboard on top of an ES index (the data store). The best way to retrieve the product is to clone the GitHub repository like this :

As the Kibana3 documentation states, it’s ‘just’ a bunch of static HTML and JavaScript resources that can be put on any reachable web server. For testing convenience, Kibana3 embeds a little Node.js server that you can run if you’re lazy like me :

You can now check http://localhost:8000/index.html with your web browser and should see a default dashboard appearing with a bunch of red panels announcing errors… We’re going to fix that in the next section.

The dashboard creation

Before starting to actually create a dashboard, we need data ! Remember, we have installed the Twitter river plugin : we are going to connect to the Twitter public stream to retrieve such data. In order to complete the following step, you need a valid Twitter account.

The following command lets us create a Twitter connection, specifying some trendy keywords ;-) Just substitute the placeholders with your Twitter account name and password and you’re done.

Let’s now go back to the default Kibana3 dashboard in your web browser. We are going to change some parameters to make it a decent dashboard. The first thing to change is the “Timepicker” widget, which is used to define the data store the dashboard is based on.

14-may update

For the lazy ones (;-)) who only want to see the result without building the dashboard, I’ve posted the JSON export here as a Gist : https://gist.github.com/lbroudoux/5579650. It’s easily importable into Kibana.

Edit this widget’s settings and change the time field as follows :

and then the index patterns as follows :

You should already have a decent dashboard as below (I’ve also changed the dashboard title and the time resolution to see many green bars on the histogram).

You can experiment with “Zoom In” and “Zoom Out” on the histogram and see their effect on the timepicker widget. You can also draw a rectangular zone on the histogram in order to zoom into that time period. Typing keywords into the Query input field also has a dynamic effect on the searched records and the histogram.

When moving down the page, you see a table widget that still has errors. Its goal is to display excerpts of the records found. Edit this widget’s parameters as follows to make it correctly display your tweets :

You can see that we reference here the different fields found in a Twitter message coming from the public stream (information on the available fields can be found through the Head web frontend when browsing indexes and looking at stored documents).

Note that we can also modify the layout of widgets by editing row parameters. For example, we’re switching the table and fields widgets to suit our preferences. The fields widget is indeed very convenient for adding new fields to the table view. The screenshot below shows the result obtained after such a switch.

The last thing I’ll show you here is the addition of a new Kibana3 widget to your dashboard. We are now going to display a map showing the location of our Twitter users in the “Events” row. Open this row’s settings editor and select “map” in the new panel dropdown list. Then you’ll have to tell which field is used to get this information ; in the case of tweets the field is “place.country_code”. The setting is shown below :

Don’t forget to click on the “Create Panel” button before closing the editor ! The map now displays on your row. Finally, after having evenly distributed the widgets on the row, you may reach the following result :

The map widget is also clickable and can be used to drill down into the data previously selected using the query filter and/or the timepicker filter. Quite impressive !

Conclusion

If I succeeded in my demonstration, you have seen that using Kibana3 can be quite easy once you understand the basic customization steps. Kibana3 looks like a very promising tool in this new area of big data, data scientists and data miners that has appeared in recent years.

Some features might still be missing (like a complete integration with Elasticsearch index or document type catalogs, security around data consultation or dashboard sharing, etc…) before it can be deployed in the enterprise world. However, the premises are already there, with the ability to store Kibana3 dashboards in Elasticsearch itself and the recent posts on how to secure an Elasticsearch cluster (see http://dev.david.pilato.fr/?p=241 for French readers).

I think that Kibana3 being hosted under the Elasticsearch umbrella may be a guarantee of seeing this extension developed and enhanced in the near future. In my humble opinion, this can be a big asset on Elasticsearch’s business card.

Things ran well while I was writing that post, but after some weeks and tests by colleagues, I realized that Acceleo had some limitations that made this build setup hard to make portable. To summarize : when it comes to referencing modules coming from other projects, Acceleo uses multiple forms of paths : relative paths when built dynamically by the IDE, platform:/ paths when exported as a plugin and absolute jar paths when built via Maven (our case).

The limitations

To illustrate, here is an excerpt of the entityFile.emtl module you may find in my sample project. The reference to my own local Maven repository location makes it hard to be portable !

When it came to deployment on our projects (in my company, for my day job), these limitations did not really bother us because the setup of development and CI machines was standardized and we were sure that every local Maven repository had the same location. I finally put this problem behind me and forgot about it …

… until Dave’s comments !

Last week, Dave commented on this blog post (see his comments), reminding me that this issue was left unsolved but still deserves some interest … While Dave is pursuing a pure Java solution, I’m showing in this new post a pure Maven workaround, so let’s go.

A Maven workaround

The principle of this workaround is the following : since referencing other jar archives in the EMTL files makes the build non portable, stop using multiple jar archives and use only one uber jar in which the referenced paths are relative !

I know that this sounds weird as Maven promotes fine-grained and atomic artifacts with transitive dependency resolution and so on … but it also opens the way to different forms of artifacts when running/deploying in a constrained environment, through the notion of assembly. That is exactly our situation : we’ve got a constrained running environment so we’re going to use an assembly.

The explanation takes place in 3 steps.

Replacing references into EMTL files

The first step is to deal with the referenced jar paths placed in the EMTL files by the Acceleo compiler. The goal is to replace them with relative paths. For this, we can use the Replacer plugin in the build of the Acceleo module that references other modules.

This configuration basically tells Maven to activate the plugin during the prepare-package phase and to process every emtl file, replacing the given regular expression denoting an absolute jar path with this relative path.

the value given for the replacement depends on the package you use for the files of the current Acceleo module (com.github.lbroudoux.acceleo.uml.java.jpa in this case),

the value is the same for every EMTL file because the sample project follows the Acceleo best practice in terms of package naming : each package containing a generator is at the same depth from the root (not following that best practice makes this workaround non applicable as is – the configuration of the replacer might be trickier !)
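To see what such a replacement does to a reference, here is a small Java sketch of the same idea, outside of Maven. Everything specific in it is an assumption made for illustration : the shape of the absolute reference (jar:file:…jar!/), the regular expression, the sample paths and the EmtlPathRelativizer class are not taken from the actual Replacer configuration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative rewrite of absolute jar references into relative ones; names are hypothetical. */
final class EmtlPathRelativizer {

    // Assumption: absolute references look like jar:file:/some/path/archive.jar!/
    private static final Pattern ABSOLUTE_JAR = Pattern.compile("jar:file:[^\"!]*\\.jar!/");

    /** One "../" per package segment, to climb from the module's package up to the root. */
    static String relativePrefix(String packageName) {
        StringBuilder prefix = new StringBuilder();
        for (String ignored : packageName.split("\\.")) {
            prefix.append("../");
        }
        return prefix.toString();
    }

    /** Replaces every absolute jar prefix found in the content with the relative prefix. */
    static String relativize(String emtlContent, String packageName) {
        String replacement = Matcher.quoteReplacement(relativePrefix(packageName));
        return ABSOLUTE_JAR.matcher(emtlContent).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String reference = "href=\"jar:file:/home/me/.m2/repository/a/b/1.0/b-1.0.jar!/org/example/common/util.emtl#/0\"";
        System.out.println(relativize(reference, "org.example.files"));
    }
}
```

The prefix only depends on the depth of the package, which is why a single replacement value works when all generator packages sit at the same depth.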

Creating a flattened uber assembly

The next step is to create an archive that will contain :

the EMTL and class files of our current Acceleo module (the one reworked during step 1),

the EMTL and class files of the generators we depend on (they come from Maven dependencies)

All these resources should be flattened : put together into a single package hierarchy, within a single jar file, so that it is still usable as a library.

In order to do that, we start by declaring a configuration for the Maven assembly plugin in the pom.xml of the Acceleo module that references other modules (check the sample project) :

This configuration tells Maven to activate the assembly during the package phase (so after prepare-package) and to refer to the descriptor present in the assembly.xml file. This is a new file that you just have to create in the project root folder. Its content is the following :

The important part here is to specify that our assembly will use an uber qualifier/classifier for its resulting archive and that the module’s own artifact and the dependency artifacts should be listed in the inclusions.

From now on, when running mvn install on this Acceleo module, Maven should produce 2 artifacts : the main one that we already had and the new uber one holding all the EMTL resources, flattened and with relative paths. That new artifact is attached as a secondary artifact to your build process.

Using this new archive for generation

The last step is to modify the application that integrates Acceleo generators during its own build process : we should now tell it to use the uber jar we produced in the previous step. This modification is simply done by editing the pom.xml of your application and adding a classifier.

I have spent the last weeks evaluating SOADesigner (see http://marketplace.obeonetwork.com/module/soa) as a complementary solution to the traditional Enterprise Architecture Management suite we are using at my day job. One of our goals when deciding to use this suite was to minimize the gap between architecture analysis and implementation by generating and managing SOA assets such as WSDL and XSD artifacts. Obviously we did not succeed, and so we evaluated another way to get the job done…

SOADesigner is based on Eclipse tooling and involves many Eclipse Modeling initiative technologies. It provides a bunch of EMF metamodels related to information system management in general and SOA in particular, so that the models built on top of it can be used by tools like Acceleo to generate text artifacts.

The purpose of this blog post is to introduce the Acceleo generators I have written for producing WSDL and XSD artifacts from SOADesigner models. The generators – still a work in progress – have been open sourced and put on GitHub. You can find them here https://github.com/lbroudoux/InformationSystem-generators and I’ll explain later how to use them.

As an introduction and to set the scene, here are some screenshots of the kind of diagrams and concepts you may work with in SOADesigner.

Exchange model design

This first one covers the design of the exchange model that will be used for the specification of service interfaces. The elements of such a model are called DTOs (for Data Transfer Objects) and may be initialized from Entity elements. DTOs are organized into Categories – a notion roughly equivalent to a package – within a DTO Registry.

Service model design

This second diagram deals with the specification of Services within a Component. A Service may hold many operations through its interface, which can be detailed in terms of input and output specifications. You can see here that we’re quite close to the SOA / WebServices terminology, apart from the missing fault specification (but there’s a feature request on its way ;-)).

Generators specifications

The design specifications of the generators are the following :

generate 1 XSD artifact per Category or sub-Category holding DTOs,

use the parent system name, category name and version to produce distinct file name,

generate 1 WSDL artifact per Service holding Operations,

make the WSDL artifact hold only the service related datatypes and reference reusable one from XSD,

use the service name and version to produce distinct file name

As an example, on the nonRegressionModel.is model that is embedded in the test modules of the Git repository, we achieve the following results in terms of artifact generation :

Generators features and usage

The currently supported features of the generators are as follows :

usage of descriptions put into models to annotate artifacts with documentation,

usage of multiplicity information to generate the corresponding XSD occurrence specifications,

correct import of an XSD within another XSD or a WSDL,

correct usage of the different namespaces during inclusions and reuse,

support of inheritance between DTOs,

support of composition and references between DTOs
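As an illustration of the multiplicity feature from the list above, here is how a model multiplicity can be turned into XSD minOccurs / maxOccurs attributes. This is a plain Java sketch, not an excerpt of the Acceleo templates : the XsdOccurrence class, the use of -1 for an unbounded upper bound and the xs prefix are assumptions.

```java
/** Illustrative mapping from a model multiplicity to XSD occurrence attributes; hypothetical class. */
final class XsdOccurrence {

    /** Assumption: an unbounded upper bound ("*") is encoded as -1, as in UML/EMF models. */
    static final int UNBOUNDED = -1;

    /** Returns the minOccurs/maxOccurs attributes matching a [lower..upper] multiplicity. */
    static String occurs(int lower, int upper) {
        if (lower < 0 || (upper != UNBOUNDED && upper < lower)) {
            throw new IllegalArgumentException("Invalid multiplicity: " + lower + ".." + upper);
        }
        String max = (upper == UNBOUNDED) ? "unbounded" : Integer.toString(upper);
        return "minOccurs=\"" + lower + "\" maxOccurs=\"" + max + "\"";
    }

    /** Builds a complete element declaration for a DTO property. */
    static String element(String name, String type, int lower, int upper) {
        return "<xs:element name=\"" + name + "\" type=\"" + type + "\" " + occurs(lower, upper) + "/>";
    }

    public static void main(String[] args) {
        System.out.println(element("author", "xs:string", 1, 1));
        System.out.println(element("tags", "xs:string", 0, UNBOUNDED));
    }
}
```

A [0..*] property thus becomes an optional, repeatable element, while a [1..1] property becomes a mandatory single element.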

If you would like to give them a try, for now you’ll have to git clone the repository (I have not yet released them as a plugin) and import plugins/com.github.lbroudoux.acceleo.soa.contracts into your Eclipse workspace. Then you’ll have to create a new Acceleo launcher referencing a fresh model and the com.github.lbroudoux.acceleo.soa.contracts.main.GenerateAll class as the Acceleo generator class.

Obviously, this assumes you have previously installed SOADesigner, as mentioned here, on an Eclipse setup, so that you have the designers but also a complete Acceleo environment set up.

I’ve tried over the last few weeks to find the best way to write automated tests for my Acceleo generators and have come to some thoughts and findings that may be interesting.

The first resource I found on the subject was this Tumblr post from Stephane Begaudeau about how to unit test Acceleo templates and queries. Even though Stephane’s work was close to completion, there was still work to be done on the Acceleo API.

Also I realized that my needs were much closer to integration testing : follow a “black box” approach and have the ability to automatically launch non regression tests on a blueprint model when my generators change. Thanks to my previous findings on launching Acceleo from Maven, I would have the base infrastructure to do this.

3 levels of integration testing

So the use-case I’ll try to cover is the following : I have Acceleo generators that produce a bunch of Java and XML files from a model and I’m going to write tests to check that the produced files are correct. I quickly realized that these checks may be done at 3 different levels.

Byte code level

The first level is the byte code one and is available for the Java part of my use-case (you can of course extend this to any compiled language). This first level is interesting because it can be quickly achieved : just use the compiler and the reflection APIs provided by the JDK to check that methods are generated with the correct parameters, that fields are present and so on …
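A minimal sketch of such reflection-based checks could look like this. The nested Book class only stands in for a compiled generated class, and the BytecodeLevelChecks helper names are made up for the example ; a real test would load the classes compiled from the generated sources instead.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Illustrative byte code level checks using reflection; all names are hypothetical. */
final class BytecodeLevelChecks {

    /** Stand-in for a class produced by a generator and then compiled. */
    static class Book {
        private String title;
        private int pages;

        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
    }

    /** Checks that a private field with the given name and type exists. */
    static boolean hasPrivateField(Class<?> type, String name, Class<?> fieldType) {
        try {
            Field field = type.getDeclaredField(name);
            return Modifier.isPrivate(field.getModifiers()) && field.getType() == fieldType;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    /** Checks that a public method with the given name, return type and parameters exists. */
    static boolean hasPublicMethod(Class<?> type, String name, Class<?> returnType, Class<?>... params) {
        try {
            Method method = type.getDeclaredMethod(name, params);
            return Modifier.isPublic(method.getModifiers()) && method.getReturnType() == returnType;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasPrivateField(Book.class, "title", String.class));
        System.out.println(hasPublicMethod(Book.class, "setTitle", void.class, String.class));
    }
}
```

Inside a JUnit test case, the two helpers would simply be wrapped into assertions.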

However, this method has some limitations :

the code you produce has to compile ! This may seem odd, but in large projects you may have many generators producing only parts of the whole puzzle, and it may be difficult to make every generated unit compile without pulling in a lot of dependencies. Also, sometimes you may want your generated code not to compile in order to force the developer to write something clever ;-)

compilation is a destructive process ! Some elements found in the sources disappear when transformed into byte code … How can you retrieve parameter names, check the presence of javadoc or ensure that annotations are all there ?

Syntax tree level

The second level answers these limitations by providing a representation of the tree of directives found in the source files. This is generally called an “Abstract Syntax Tree” (AST), which may be produced and visited using APIs. The most famous one is the DOM API, which represents the AST of an XML document, but analogous tools exist for Java and other compiled languages (we’ll see 2 of them in the next section).
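For the XML part, the DOM API shipped with the JDK is enough to write tree-level checks. Here is a small sketch ; the XmlTreeChecks class and the sample entity/field document are invented for the example and do not come from the generators.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrative tree-level checks on a generated XML file; all names are hypothetical. */
final class XmlTreeChecks {

    /** Parses XML text into a DOM tree, failing if it is not well-formed. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML document", e);
        }
    }

    /** Counts the elements with the given tag name anywhere in the tree. */
    static int countElements(Document document, String tagName) {
        return document.getElementsByTagName(tagName).getLength();
    }

    /** Returns an attribute of the first element with the given tag name, or null if none exists. */
    static String attributeOfFirst(Document document, String tagName, String attribute) {
        NodeList nodes = document.getElementsByTagName(tagName);
        if (nodes.getLength() == 0) {
            return null;
        }
        return ((Element) nodes.item(0)).getAttribute(attribute);
    }

    public static void main(String[] args) {
        Document document = parse("<entity name=\"Book\"><field name=\"title\"/><field name=\"pages\"/></entity>");
        System.out.println(countElements(document, "field"));
        System.out.println(attributeOfFirst(document, "entity", "name"));
    }
}
```

The checks are expressed on the structure of the document, so they do not break when whitespace or attribute order changes in the generated file.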

Using this AST allows us to check details that disappear after the compilation step. It also allows us to verify that some statements (assignments, loops, switches, …) comply with the coding rules/conventions. Sadly enough, an AST is “Abstract”, which means that some details are still missing, depending on the capabilities of the available tooling. Sometimes, it is necessary to go deeper, to the third level…

String level

This last level is the obvious but tedious one : work at the string level (because after all everything we produce is text !). The kind of tools employed is more primitive : pattern and regexp matchers, string comparison and file readers. Every language offers its shortcuts for that (personally, I love Groovy grep, file and =~ operator a lot ;-))
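As a string-level example in Java, the sketch below looks for protected user code areas in a generated source and returns their identifiers. The marker comments ("Start of user code" / "End of user code"), the regular expression and the ProtectedAreaCheck class are assumptions made for illustration ; adapt them to whatever your templates actually emit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative string-level check on generated text; names and markers are assumptions. */
final class ProtectedAreaCheck {

    // Assumption: a protected area starts with "// Start of user code <id>"
    // and ends with "// End of user code".
    private static final Pattern AREA = Pattern.compile(
            "//\\s*Start of user code[ \\t]+([^\\r\\n]*?)[ \\t]*\\R[\\s\\S]*?//\\s*End of user code");

    /** Returns the identifiers of all the protected areas found, in order of appearance. */
    static List<String> protectedAreaIds(String generatedSource) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = AREA.matcher(generatedSource);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String source = "class Book {\n"
                + "    // Start of user code imports\n"
                + "    // End of user code\n"
                + "}\n";
        System.out.println(protectedAreaIds(source));
    }
}
```

This kind of check sees comments and layout, which the two previous levels cannot, but it is also the most sensitive to formatting changes in the templates.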

Possible solutions for Java

The syntax tree level is maybe the most ignored one, so I’m going to detail it a bit more … After a quick Google search, 2 solutions seem to be available.

JDT stands for Java Development Tools and it’s the module responsible for all the Java-related things in the Eclipse platform. It is used by the project management module, editors, code style checkers, etc. in the IDE.

The main class in JDT that has to be used for our purpose is ASTVisitor. Its usage is quite analogous to Javaparser’s, except for the initialization part, which is a little bit trickier :

Then, build yourself a blueprint model including all the generation cases you may allow and follow my previous posts in order to automatically enable the generation from this model using Maven.

Be sure that Maven launches the Acceleo generation before the execution of your JUnit tests. For that, I recommend that you bind the generation to the generate-test-resources phase of your build lifecycle.

Sample project

I mentioned above the sample project I’ve committed to GitHub. In this project, you will find :

A blueprint Uml model testModel.uml at project root that uses the generators from my previous posts,

A Maven pom.xml file that generates files into the /target/test-resources directory,

A JUnit test case using Javaparser in /src/test/java/com/github/lbroudoux/acceleo/uml/java/jpa/files,

A JUnit test case using Eclipse JDT in /src/test/java/com/github/lbroudoux/acceleo/uml/java/jpa/files,

Sample visitors for Javaparser and JDT in /src/main/java/com/github/lbroudoux/japa|jdt

Conclusion

Javaparser and Eclipse JDT are great tools for going into the details of the source code and allow checking many things that disappear with compilation. However, I have found limitations in both regarding the support of line and block comments that are not Javadoc :

JDT fully ignores them and gives only information on the start and end lines of blocks (useless !),

Javaparser tries to handle them, but a bug in its parser makes it lose the starting point of line comments if they are not followed by a Java instruction (a ‘;’ character is enough)

This feature would have been useful for checking – for example – that my Acceleo templates were actually providing some protected areas for code to be inserted !

The whole tool chain (AST + JUnit + Maven + Acceleo) makes non-regression checks on generators a breeze, especially if you plug them into a Continuous Integration server (such as Jenkins) so that the checks are triggered by any modification of the Maven module containing your generators !

Let me know if it helps some of you … Do not hesitate to send me feedback or other ideas on generator test automation !