Reporting Tales: Pentaho Reporting Tips and Tricks
Thomas (https://www.on-reporting.com/)

A couple of days ago, prijip contributed a thermometer chart to Pentaho Reporting. Thank you, it’s good to see that our product is useful!

Thermometer charts are one of the many variations of a typical “traffic light” indicator and are great if you have to monitor a single KPI for critical levels. The typical chart shows you three levels of ‘hotness’: normal levels show you everything is fine, critical levels require your attention, and above that, let’s just panic.

As it is implemented, the thermometer chart is great when you need to make the indicator big, bold and hard to miss. The thermometer’s color will change according to the value level reached. And unlike the traffic-light function that changes background colors on text, this chart is an easy-to-interpret symbol that is intuitively understood by everyone.

The chart requires only a single value to work, but be aware that due to the nature of the charting framework in our reporting engine, it needs at least a single row of data to receive at least one “items-advanced” processing event. Even if the value comes from a parameter or a calculation, you have to have at least one row of data.
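If the KPI value comes from a parameter, a one-row dummy data-source is enough to trigger that event. Here is a minimal sketch (the query name and the value are made up for illustration):

import org.pentaho.reporting.engine.classic.core.MasterReport;
import org.pentaho.reporting.engine.classic.core.TableDataFactory;
import org.pentaho.reporting.engine.classic.core.util.TypedTableModel;

private static MasterReport createKpiReport()
{
  // Guarantee at least one row of data so the chart collector receives
  // an "items-advanced" event. The KPI value itself can still come from
  // a parameter or a calculation.
  final TypedTableModel model = new TypedTableModel();
  model.addColumn("value", Double.class);
  model.addRow(42.5);

  final MasterReport report = new MasterReport();
  report.setDataFactory(new TableDataFactory("kpi-query", model));
  report.setQuery("kpi-query");
  return report;
}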

If you have more than one row of data per grouping, the chart collector will aggregate all values together as one big sum.

The chart itself is almost self-explanatory and quite easy to configure: provide the ranges, provide proper colours (in case you don’t like the off-the-shelf green, yellow and red ranges) and some labels, and voilà, a thermometer.

Personally, I don’t find the limited selection of units – just Kelvin, Celsius and Fahrenheit – thrilling. But unless you are monitoring temperature, you can simply disable the unit display, and the thermometer becomes suitable for all sorts of value-level monitoring.

Needless to say: if we have not covered your favourite property, this chart type – like all charts – can be customized via scripts.

The thermometer chart and many more goodies will be part of the upcoming Pentaho 5.3 release.

* * *

I’m not exactly a social media user, so I am rather surprised that every now and then I get a question via these channels.

“@PentahoReport could you explain to me, how to use the pentaho sdk with java to replace the datasource defined within the prpt file?” – PhilmacFLy (@PhilmacFLy), December 16, 2014

There are two ways to change a PRPT file. One, the ugly way, is to crack open the ZIP structure and to mess with the XML files contained in there. I won’t support this atrocity with documentation. The second way, the good way, is to use the reporting API to make your changes. It is clean, and it will ensure that your report will be valid. And best of all – it’s unit-testable.

This will be a code-heavy post, so let’s talk about the setup first, to keep the main body simple.

I assume you have a project set up that has all the reporting libraries ready and that contains all jars for every data-source you are using in your report. Normally that means you are bootstrapping from the SDK’s “sample-use-full” module.

Now let’s create a base harness for our task. A simple Java class with a “public static void main” method will do.
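A minimal sketch of that harness (the file names are placeholders; processReport follows below):

import java.io.File;

import org.pentaho.reporting.engine.classic.core.ClassicEngineBoot;

public class DataSourceReplacer
{
  public static void main(final String[] args) throws Exception
  {
    // Boot the reporting engine once before using any of its APIs.
    ClassicEngineBoot.getInstance().start();

    processReport(new File("original-report.prpt"), new File("modified-report.prpt"));
  }
}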

The processReport method will take two parameters: first, the source file, which should be a valid file name pointing to a PRPT file, and second, a target file name to which the processed report will be written. I personally do not like overwriting the source; it makes recovering from errors rather hard.

Let’s add the “processReport” method next. I still consider this boilerplate code, as all it does is parse the report, hand off the MasterReport object to the actual method that does all the work, and then write the modified MasterReport into a PRPT file.
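Since the original listing is not reproduced here, the following is a reconstruction from memory using the ResourceManager and BundleWriter APIs; double-check the exact signatures against your SDK version:

import org.pentaho.reporting.engine.classic.core.MasterReport;
import org.pentaho.reporting.engine.classic.core.modules.parser.bundle.writer.BundleWriter;
import org.pentaho.reporting.libraries.resourceloader.Resource;
import org.pentaho.reporting.libraries.resourceloader.ResourceManager;

private static void processReport(final File source, final File target) throws Exception
{
  // Parse the PRPT bundle into a MasterReport object.
  final ResourceManager resourceManager = new ResourceManager();
  resourceManager.registerDefaults();
  final Resource resource = resourceManager.createDirectly(source, MasterReport.class);
  final MasterReport report = (MasterReport) resource.getResource();

  // Hand off to the method that does the actual work ...
  final MasterReport modifiedReport = modifyReport(report);

  // ... and write the result into a new PRPT file.
  BundleWriter.writeReportToZipFile(modifiedReport, target);
}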

Again, the code should be rather self-explanatory. The first few lines parse the report, next we hand it off to some other method to manipulate it, and whatever we get back is written out into a new PRPT file. I omitted proper exception handling to keep the code readable – if it crashes, it will burn, wild but beautiful.

Now finally, the meat: manipulating reports. First, something simple – let’s not do anything at all and just return the report. This effectively copies the report from the source to the target file.
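In code, the do-nothing version is a one-liner:

private static MasterReport modifyReport(final MasterReport report)
{
  // Returning the report unchanged simply copies it from source to target.
  return report;
}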

Data-factories are stored on a report. A master-report can have sub-reports, which themselves can have data-factories defined. Of course, sub-reports can have other sub-reports, which have their own data-factories, and so on.

A report can contain multiple data-factories by using a “CompoundDataFactory”. Reports created with PRD always use a compound-factory – it makes the code a lot easier and adds almost no overhead.

To deal with the complexities of nested subreports, we use a "StructureVisitor" to traverse the report definition for us. On each report we encounter (the master-report and all sub-reports) we now check for data-factories we are interested in.

There are two ways to retrieve a data-factory here (the first is sketched below):
(1) processAllDataSources – use this if you want to modify them all or don’t know which data-factory is your target. It iterates over all data-factories stored on that particular report and lets you modify each one in the “handleDataSource” method.
(2) processSingleDataSource – this method expects the name of a query and will try to locate the first data-factory that claims to be able to handle that query. If your report has many data-factories but you want to modify only a particular one, this method is yours.
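The original listing used the engine’s visitor infrastructure; as a simpler stand-in, the sketch below recurses through the element tree manually and only implements processAllDataSources (processSingleDataSource is omitted). Treat the engine calls as remembered, not verified:

import org.pentaho.reporting.engine.classic.core.AbstractReportDefinition;
import org.pentaho.reporting.engine.classic.core.CompoundDataFactory;
import org.pentaho.reporting.engine.classic.core.Element;
import org.pentaho.reporting.engine.classic.core.Section;
import org.pentaho.reporting.engine.classic.core.SubReport;

private static void processAllDataSources(final AbstractReportDefinition report) throws Exception
{
  // Normalizing guarantees a CompoundDataFactory, even if the report
  // stores a single plain data-factory.
  final CompoundDataFactory existing = CompoundDataFactory.normalize(report.getDataFactory());
  final CompoundDataFactory rebuilt = new CompoundDataFactory();
  for (int i = 0; i < existing.size(); i++)
  {
    // handleDataSource returns either the original factory or a replacement.
    rebuilt.add(handleDataSource(existing.getReference(i)));
  }
  report.setDataFactory(rebuilt);

  // Recurse into the element tree to find all sub-reports.
  processSection(report);
}

private static void processSection(final Section section) throws Exception
{
  for (int i = 0; i < section.getElementCount(); i++)
  {
    final Element element = section.getElement(i);
    if (element instanceof SubReport)
    {
      processAllDataSources((SubReport) element);
    }
    else if (element instanceof Section)
    {
      processSection((Section) element);
    }
  }
}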

Now, enough of standard code – let’s solve a real problem.

I want to replace the local file-based HSQL data-source definition in those reports that have one with the proper JNDI data-source. We all know: if you have SQL data-sources and don’t want to change your reports whenever your database server changes, you have to use JNDI connections. But what we know is not always what we do, right?

So let’s replace the “handleDataSource” method with one that finds all SQL data-factories and, if a data-factory uses the local sample data, replaces its connection with the JNDI reference.
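A hedged sketch of that replacement; the getter names are quoted from memory, and both the “hsqldb” URL check and the “SampleData” JNDI name are assumptions to adjust to your environment:

import org.pentaho.reporting.engine.classic.core.DataFactory;
import org.pentaho.reporting.engine.classic.core.modules.misc.datafactory.sql.ConnectionProvider;
import org.pentaho.reporting.engine.classic.core.modules.misc.datafactory.sql.DriverConnectionProvider;
import org.pentaho.reporting.engine.classic.core.modules.misc.datafactory.sql.JndiConnectionProvider;
import org.pentaho.reporting.engine.classic.core.modules.misc.datafactory.sql.SQLReportDataFactory;

private static DataFactory handleDataSource(final DataFactory dataFactory)
{
  if (dataFactory instanceof SQLReportDataFactory == false)
  {
    return dataFactory; // not a SQL data-source, leave it untouched
  }

  final SQLReportDataFactory sqlDataFactory = (SQLReportDataFactory) dataFactory;
  final ConnectionProvider provider = sqlDataFactory.getConnectionProvider();
  if (provider instanceof DriverConnectionProvider == false)
  {
    return dataFactory; // already JNDI (or something exotic), leave it untouched
  }

  final DriverConnectionProvider driverProvider = (DriverConnectionProvider) provider;
  if (String.valueOf(driverProvider.getUrl()).contains("hsqldb") == false)
  {
    return dataFactory; // some other database, leave it untouched
  }

  // Replace the file-based sample database with the matching JNDI
  // definition, carrying all queries over to the new factory.
  final JndiConnectionProvider jndiProvider = new JndiConnectionProvider();
  jndiProvider.setConnectionPath("SampleData");
  final SQLReportDataFactory replacement = new SQLReportDataFactory(jndiProvider);
  for (final String queryName : sqlDataFactory.getQueryNames())
  {
    replacement.setQuery(queryName, sqlDataFactory.getQuery(queryName));
  }
  return replacement;
}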

For a good set of samples on how to inspect and report on the use of certain features, including the use of fields, have a look at the report-designer’s inspections. These little helpers also use the AbstractStructureVisitor system to check each element and collect data, which they then report to the user.

* * *

I recently was asked on Twitter how Pentaho Reporting can be translated. Translating our software is sadly not as easy as I’d wish it to be. Prepare for a larger project.

On the positive side of it: We use standard Java methods to provide the translations, so it’s not overly arcane and tools exist to assist with the translation process.

On the negative side: The translations are stored using the standard Java methods. That means they are held in various properties files within the source code and are tedious to locate. And if you are not a developer, wrapping them up into a ZIP file can be an equally tedious task.

So let’s cut through it.

The Guide to translating Pentaho Reporting and the Report Designer

Translation basics

Translatable resources are stored in a set of text files that follow a common naming pattern. Each file name consists of a common base name (“messages”), an optional suffix specifying language and location, and finally a file type (always “.properties”).

In the Pentaho Reporting project, we use the common name “messages” for all bundles. A message bundle file that has no language or location suffix is called the base resource. In Pentaho Reporting, this resource contains the US-English texts.

All files contain the text as key-value pairs in the format:

translation.key=My User Interface Text

To provide translations for a new language, all you need to do is copy the base file (messages.properties) to the name that matches your target language and start translating the values (everything after the “=”). The language and optional locale are specified as two-letter codes. For instance, for Portuguese the language code is “pt”, thus the suffix is “_pt” and the full filename is “messages_pt.properties”. For the Brazilian dialect of Portuguese the suffix would be “_pt_BR”, and thus the full name would be “messages_pt_BR.properties”.

If there is already a translation file for your language, all you need to do is to add the missing keys to the file.

The file must be saved with ISO-8859-1 encoding. Any character that cannot be expressed in that encoding must be written as a Unicode escape sequence. See the Wikipedia article on properties files for an easy explanation.
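For example, a Russian translation of a (made-up) key would be stored with escapes like this:

# Russian translation of "Save", written as Unicode escape sequences:
save.button=\u0421\u043E\u0445\u0440\u0430\u043D\u0438\u0442\u044C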

Tip: When translating these files, it can be helpful to use a Java development environment for the task; I recommend “IntelliJ IDEA”. If a properties file has more than one translation (using the naming rules lined out above), it presents the contents of all languages next to each other, making it easy to cross-reference the translation with the English original.

Translating Pentaho Reporting is no small task; the project is big and has a lot of parts. I am happy to integrate any translation, no matter whether it encompasses every translatable text or a single element. Over time we may be able to build a full set of translations, but I’m sure it won’t happen overnight. We have a lot of text, and a lot of catching up to do.

* * *

In my last post about creating a local build environment, I showed how you can set up a local CI server to validate your local commits.

When working with open source on GitHub, contributions usually come in the form of pull requests. When such a request comes in, it is usually wise to validate the code. But hey, validating commits is hard work when you have to do it manually. You have to pull in the change, merge it locally, build it, run the tests, maybe even sniff-test the final application to catch errors the unit tests can’t catch (for example GUI errors). It’s hard and complex, and so every now and then we skip these tests, accept the pull request blindly and deal with the fallout afterwards.

Why not automate those steps?

Technically, a pull request is a branch in your server side repository, created automatically by the GitHub systems. A valid branch should build and should run all tests without errors.

So now that we have a build infrastructure in place, let’s use it to build and validate all pull requests for us.

Jenkins comes with a pull-request builder plugin for that job. I was never able to make it work – it requires some GitHub API magic to update the status of the request on GitHub, and does not seem to run without it. But why use something complicated when pull requests are just branches?

A slight word of warning: Pentaho uses non-unique “TRUNK-SNAPSHOT” references for its internal libraries. When a project is built, it always pulls in the latest version of that library, and not the version that was current when that pull-request was made. Therefore it is highly unlikely that very old pull requests build correctly now.

Project set up

This set up requires that you have configured the system to run local builds already. If not, please read the previous posting in this series before you continue.

Begin by creating a new free-style project. Give it a name, and click continue to enter the configuration screen.

1. Configure the clean-up rules

First, let’s make sure we never run out of disk space. As every build can be quite large (between 600 MB and 1 GB), we never want to have more than two builds around.

2. Define a parameter for the pull-request ID

The build will be parameterized, so that we can choose which pull request to build. Building will be a manual process (you don’t want random people submitting code that then gets executed on your machine without your explicit consent).

Create a new string-parameter “PR_ID” and give it a nice description and maybe a default value of a known-good pull request (so that you can test this set up easily).

We will reference this PR_ID parameter in the next step.

3. Configure the Git Checkout rules

Configure the checkout from GitHub as usual. However, this time we are not going to monitor or build the “master” branch. Instead we are checking against the pull requests filed in GitHub.

GitHub stores all pull requests in a special section named “refs/pull”. For each pull request it creates a new sub-entry for the pull-request ID, which contains two pointers: “head” (the actual code) and “merge” (the merge commit for the pull request).

Change the “Refspec” parameter to “+refs/pull/*:refs/remotes/origin/pr/*”. This tells Jenkins to monitor the remote references under “pull/*” (which is where GitHub deposits pull request data) and to drop it into the local branch reference “remotes/origin/pr/*”.

Next, specify that we want to build the branch “origin/pr/${PR_ID}/head”. The ${PR_ID} is a reference to your job-parameter that you defined above.

And last but not least: make sure you clean the working directory and that you check out into the “code” sub-directory.

As the pull requests are triggered manually, you do not need to configure any of the build triggers.

4. Provide the Ant and Ivy configuration to point to our local Artifactory

5. Invoke Ant to build the project

Configure your Ant build step to run the tests (to validate that the new code does not break anything) and to publish locally. The artefacts created by the pull-request build should not be published to a shared repository. If you do, you start mixing artefacts from different branches into the same repository, which is a recipe for subtle bugs.

6. Archive your artefacts to make them available for manual testing

Unlike in the previous builds, I prefer to let Jenkins archive the final assembly. This allows me to download and install the Report Designer for some quick manual testing (to see that nothing big is broken – or at least not more broken than before).

That’s it. Hit “Save” and you will be able to build pull requests. Whenever you select “Build now” you will be asked for a pull-request ID and Jenkins will happily start building this particular pull request for you.

* * *

When you have finished writing a feature or bug-fix, the big moment of truth arrives: did I break something?

Of course, the obvious errors should be validated in the local unit-tests. And all programmers run unit and integration tests every time they make a change. Yeah!

I know I don’t. I should, but it’s a quick fix and what bad could ever happen? So I need a system to help me stay clean – I need a CI server that automatically picks up my changes so that I don’t have to worry about skipping tests any more. I want it automated, I want it to work in the background, and I want it fast.

Pentaho Reporting is a library that is embedded in many places – most importantly for readers of this blog, in the Pentaho Report Designer and the Pentaho BI-Server. Sometimes changes have far-reaching consequences – and not good ones! Checking method parameters more strictly can cause errors to pop up everywhere. Removing a seemingly unused library or renaming an internal method can break the downstream compile process in rather surprising ways.

This is the second part of a three-part series on how to manage your local builds better. Today we will set up a CI server to continuously validate your local commits. To work best, this server requires you to have an Artifactory repository running, so that you get results faster.

Contents of this series

In the first part of this series, we created a faster build by setting up an Artifactory server as a strong cache. This cut the build time from over 60 minutes to roughly 15 minutes. Although this is not exactly lightning fast, it is fast enough to give you feedback on your work before you have moved on to the next task.

This post will automate the testing, so that every commit you make is validated. I will also introduce you to an advanced set of override properties for subfloor that will speed up your build process even more.

And in the third part, we will look at how to set up a proper end-to-end integration test for the Pentaho BI-Server assembly, so that any change you make can be validated against the full stack.

Continuous Integration explained

Continuous integration is a method to automatically build software whenever there have been any changes to the system or the code. CI acts as a second line of defence to inform you of any errors you make as early as possible.

A CI system is only as good as your suite of automated tests. If your code does not use any automated tests, you gain only the bare minimum from a CI system: You will know when code that you publish does not compile.

With respect to tests, I tend to fall between the test-driven-development, 100%-code-coverage crowd and the ‘pragmatic’ no-tests (if there are errors, a customer will scream) crowd. I subscribe to the design-driven-testing camp, which tests whatever is needed to validate the use-cases, but does not worry too much about testing code on the extreme ends of the spectrum. Likewise, for me code coverage is a nice tool, but not a yardstick to beat up code with. I write tests the test-driven way for each bug I fix (failing test first, fix second, passing test as a result), and for features I test the user story via unit tests.

Pentaho Reporting comes with two test suites: A fast suite of tests that runs in a few minutes or so, and a long running test suite that validates report rendering and thus falls into the integration tests spectrum.

A CI server only automates stuff that you should be able to do equally well on the command line.

Defining a build process for automation

When you build any modern software, you usually use one of the many build scripting systems that exist out there. For old folks from the C camp, this may be make; for .NET it is MSBuild; and for Java it is Ant, Maven or any of the modern alternatives.

Our project is already pre-configured to make it as easy as possible to set up a CI server. To build the full reporting project with all libraries, just invoke

ant continuous-local

Once this has finished without errors, invoke

ant longrun-test

to run all tests. This target will resolve all libraries again, and thus cannot be run without first running continuous-local to publish all artefacts.

(You can find a list of callable targets in the first posting of this series.)

If you get an “OutOfMemoryError”, you will have to increase the memory of your Ant installation by setting the environment property “ANT_OPTS” to “-Xmx1024m -XX:MaxPermSize=512m”.

Our CI server will invoke the same two commands, whenever it detects changes in the repository it monitors.

Installing Jenkins as CI-Server

I will use Jenkins as the CI server of our choice. Jenkins is easy to install, and has barely any requirements on the host system – as long as you can run Java, you will be fine.

When downloading Jenkins, do make sure you download the latest stable release, as the development versions can be outright broken from time to time. Download the latest stable native package for your operating system and install it.

Jenkins on Windows

If you install Jenkins on Windows, the installer will register the server as the “Local System” user, with permission to do any evil operation it likes. For a CI server that is a bit dangerous, so please follow this guide to run Jenkins as a normal user instead. This will also make it easier for you to configure build tools without having to fight the permission scheme of Windows.

Jenkins will run on port 8080, which may conflict with other servers you either have running or will run during development. I strongly recommend changing the port to an unused one, like port 28080. Last but not least, the Jenkins installer happily mixes the installation files with your job configuration, which is bad. Ideally, the configuration data sits in a separate directory that you can reach easily.

On Windows, I tend to keep all build files close to the root of the disk, as Windows does not like long path names. Create a “build/jenkins” directory on C: (or any other drive). We will use this as Jenkins’ workspace. On Unix or a Mac, I tend to use “/opt/ci-data” for the same purpose.

Edit the “jenkins.xml” file (usually in C:\Program Files (x86)\Jenkins) and point it to the new directory. The file should look similar to this one at the end:

<!--
Windows service definition for Jenkins
To uninstall, run "jenkins.exe stop" to stop the service, then "jenkins.exe uninstall" to uninstall the service.
Both commands don't produce any output if the execution is successful.
-->
<service>
<id>jenkins</id>
<name>Jenkins</name>
<description>This service runs Jenkins continuous integration system.</description>
<env name="JENKINS_HOME" value="C:/build/jenkins"/>
<!--
if you'd like to run Jenkins with a specific version of Java, specify a full path to java.exe.
The following value assumes that you have java in your PATH.
-->
<executable>%BASE%\jre\bin\java</executable>
<arguments>-Xrs -Xmx256m -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle -jar "%BASE%\jenkins.war" --httpPort=28080</arguments>
<!--
interactive flag causes the empty black Java window to be displayed.
I'm still debugging this.
<interactive />
-->
<logmode>rotate</logmode>
<onfailure action="restart" />
</service>

Basic Setup

Once Jenkins is installed, start (or restart if already running) the Jenkins service and point your browser to http://localhost:28080 to configure Jenkins.

First, we must make sure that Jenkins only uses stable plugins. Plugins are updated regularly, and not all updates are free of bugs. To ensure you have a working system, we must change Jenkins’ update feed to the stable feed.

This way you ensure you get the proper update notifications for LTS and LTS-compatible plugins instead of the latest-and-greatest. After you do this, you may need to remove the contents of ${JENKINS_HOME}/updates to ensure that Jenkins shows the correct updates for the LTS stream. (If you followed the instructions above, JENKINS_HOME is either “C:\build\jenkins” or “/opt/ci-data”.)

We now have to install some additional plugins to teach Jenkins how to handle Git repositories. Go to Manage Jenkins->Manage Plugins->Available Plugins, select the “git plugin” and ask Jenkins to install it on the next restart.

Now do the same to install the following plugins:

config file provider plugin – Ability to provide configuration files (e.g. settings.xml for maven, XML, groovy, custom files,…) loaded through the UI which will be copied to the job workspace. We will use this plugin to provide our ivysettings and common build properties to the ant process.

Parameterized Trigger plugin – This plugin lets you trigger new builds when your build has completed, with various ways of specifying parameters for the new build. We will need that to safely trigger the integration tests on the correct GIT revision.

After Jenkins is back up, let’s configure a few tools. To access Git repositories, Jenkins needs a valid Git installation. And to build a Java project with Ant, Jenkins needs to know where to find Ant and a valid JDK.

I assume that these tools are already installed on your system (otherwise you would have had a hard time getting the sources and building the project so far), so let’s point Jenkins at them.

If you use a Unix system, Jenkins should be able to pick these tools up automatically. On Windows, you will have to point Jenkins in the right direction.

First, specify the installation location for your JDK. For Pentaho Reporting, this must be a JDK 1.7 installation. Click the “Add JDK” button and you will see input fields for Name and JAVA_HOME.

Next, point Jenkins to the path to your Git command. Click “Add Git” to see the input fields. On Unix and sufficiently recent MacOS versions, the defaults should be fine. On Windows, point the “Path to Git executable” to the Git.exe in your GIT installation.

And last but not least, let Jenkins install its own Ant installation.

So we are ready – let’s start with setting up the first project.

Prepare shared configuration files

Next, we configure some central configuration files that we need later for the build process. As I explained in the first post of this series, ivy is rather slow when resolving artefacts, and a strong cache helps.

For a safe CI build, it is important to resolve artefacts freshly from a known-to-be-safe source. The ivy cache can easily be tainted, both by parallel resolve processes and when Ivy gets interrupted in its work.

Therefore we will rely on our local Artifactory server to keep build times down.

On the Jenkins front page, select “Manage Jenkins” to access the system configuration. Now that you have the “config file provider plugin” installed, you should see a new entry called “Managed files” in the list of configuration options.

Click on it to set up the shared configuration files. Now, in the menu on the left hand side, select the “Add a new Config” option and select the “Simple XML file” option.

In the next dialog, ignore the “ID” field – it is some internal identifier that probably should never have been visible here.

Fill in the Name as “standard-ivy-settings.xml” and the contents from the ivysettings.xml file from your home directory. This file is also available on GitHub.

Hit submit and select the “Add a new Config” option and this time select the “Custom file” option. Once again ignore the “ID” field, and fill out the Name field as “standard-reporting-build-properties.properties“.

Now you have all shared files set up and are ready to configure the actual project.

Setting up a Pentaho Reporting Project

Create a new project by clicking “New Item”, and selecting to “Build a free-style software project”. Give it the name “pentaho-reporting”. This project will build the reporting engine and will publish all artefacts into a private local repository.

The next screen that comes up will be the configure project screen. We will go through the configuration from top to bottom.

In short, this is what we are going to do:

We set up the build to poll your local working copy of the Pentaho Reporting project for changes every 2 minutes and to build the project automatically if needed. To make sure the build does not interrupt your work (and that your work does not interrupt the build), we will set up a local ivy repository cache within the workspace of Jenkins. As Git’s cleanup and checkout routines can be a bit funny when others put data into their workspace, we keep all sources in a separate directory.

1. Discard old builds to preserve disk-space

Tick the checkbox in front of “Discard Old Builds” to delete old project artefacts. When building, we are mostly interested in the JUnit test results, and keep only the latest copy around for eventual manual testing. Click the “Advanced ..” button to see all configuration options, and select “Max # of builds to keep with artifacts”.

2. Configure the Source Code Management.

Select “Git” as your source code management tool. If you don’t see Git here, check that you installed the correct plugin and that you restarted your Jenkins server.

Enter the local path to your current working directory as Repository URL. These are the checkout sources on which you work normally. We let the CI server monitor your local commits and let it build the software in the background.

As you access your sources locally, you don’t need any credentials. Next add some additional behaviours to clean out the workspace and to check out the code into a sub-directory within the workspace. Add the “Checkout to a sub-directory” behaviour and set the “Local subdirectory for repo” setting to “code”.

You can set up a specific branch that this CI server should monitor. If you leave “Branches to build” at its default value of “*/master”, it will monitor commits to your “master” branch. I usually set this to the current feature branch I am working on. Alternatively, you could point this to a “ci” branch and push or rebase your latest changes to that branch whenever you want to trigger a build.

3. Configure the build triggers.

This is simple: Tick the “Poll SCM” check-box and set the schedule to “*/2 * * * *”. This tells Jenkins to look for new changes every 2 minutes.

4. Prepare the Build Environment

To ensure that builds are valid, I usually take extra care to clean out the workspace. This makes sure that left-overs from previous builds do not affect the current build.

Check the “Delete workspace before build starts” check-box. To see the exclude options, hit the “Advanced..” button.

We delete everything except the “code” directory (git’s cleanup step takes care of that) and the provided configuration directory (we can guarantee the content of that one).

Next, we provide some configuration files for the build. We prepared these files earlier on. Set up the ivy-settings-file and set its “Target” to “bin/conf/ivysettings.xml”.

Second, set up the reporting-build-properties and set the “Target” for those to “bin/conf/.pentaho-reporting-build-settings.properties”

We will configure the Ant build to pick up these files instead of using the files from your home directory. This again shields you and your build from unwanted interactions.

5. Finally, configure the actual build

Click the “Add build step” button and choose the “Invoke Ant” step. As usual, next select “Advanced..” to see all options.

Set the “Ant version” to the Ant installation you configured earlier on. Like we did on the command line, we will execute the “continuous-local-junit” target. As we checked out the source code into a sub-directory, we have to tell Ant how to find its entry point by entering “code/build.xml” into the “Build file” text box.

Now set some properties for the build. These tell Ant how to find its configuration and configure Ivy to store all downloaded and published artefacts within the Jenkins workspace.

# Point to the parent report's ivy cache so that we resolve
# against the last successful build from there
ivy.default.ivy.user.dir=${WORKSPACE}/bin/ivy
# Point the build towards our configuration files.
user.home=${WORKSPACE}/bin/conf

And finally: we will have to give Ant a bit more memory than usual. The Pentaho Reporting build process is complex and involves a lot of work, so set the “Java Options” to “-XX:MaxPermSize=256M -Xmx1024m” to avoid “OutOfMemory” errors.

6. Collect artefacts and test results

As a last step for now, we keep the final report-designer ZIP file as an artefact, and collect and aggregate the test results from all modules into one nice-looking report on the Jenkins project page.

Now hit the “Save” button on the bottom and start your first build by choosing the “Build Now” link in the top left menu.

Congratulations, you now have a local CI server watching your commits and building and validating your project for you.

* * *

When developing bug-fixes or new features for software I write (in the context of this blog: Pentaho Reporting), I tend to follow a test-supported development model. It’s not the pure “write tests first” approach of the test-driven crowd, but a workable approximation of the main idea behind it: any code written should have some automatic validation in place to make sure it works as intended, today and in the future.

The number one prerequisite to make that work is a fast feedback loop between me writing the code and an automated system to tell me I messed up again.

In this series of three posts I am going to demonstrate how to set up a local build environment that greatly speeds up the build process and that automatically validates all builds for you.

Contents of this series

In this first post I will introduce some necessary changes to the build scripts that allow us to manage the build configuration from a central location and then to use this to speed up the process.

In the second post I will show you how to set up a CI server and how to feed it with your changes, so that it builds your work for you the moment you push the changes to a shared (possibly private) git repository.

And finally, in the third post, I will show you how to create a build chain to assemble a BI-Server and how you can create and run integration tests against this assembly. This then provides you the means to run both Java JUnit-tests and JavaScript Jasmine tests against a fully deployed BI-server to make sure that everything still works.

Why does the build process need improving?

When you start developing today with the infrastructure provided by Pentaho, this development approach is nearly impossible. Pentaho’s CI servers only build the official branches of the various repositories. So any code is only automatically tested after I have pushed it into the main line that may become the next release.

I could – of course – manually test the code. Well, my problem is that normally I am quite god-damn sure that my code is all correct. Call it trigger-happy. So I need a system that validates my work regardless of my opinion.

Let’s agree that this is not safe. I need to be able to automatically validate changes before I start the pull request, or at the latest after the pull request is created but before it is merged. Ideally I should be able to validate the code automatically in a tight loop, with early feedback within minutes to no more than one hour at worst, instead of the 8+ hours that are the norm today.

And finally: today the Maven artefacts produced by the official build process are near useless, thanks to every dependency being marked optional. So let’s fix that as well.

The fixes in this post are nearly non-intrusive. Instead of forcing these changes into the build process, I chose to make the overrides optional, to give you the choice to activate them as needed. If you’re doing a one-off build, you won’t see them, and you build with a process as similar as possible to the Pentaho build servers. But if you need it, you can now tweak the build without having to maintain a separate fork of the sources.

The only mandatory changes to the build files load additional properties from the home directory and inject a set of new Ant targets into the build without altering the behaviour of Pentaho’s subfloor build system. Without any user-defined overrides, this system behaves as well or as badly as before – but now you have a choice to change that for your local build whenever and however you want.

How to fix the build process in three easy steps

The changes I introduce to the build can be organized into three stages:

1. Prepare the build to allow us to override settings and to inject tasks.

2. Make the local publish process work faster by setting up a local Artifactory repository as a proxy.

3. Actually fix the build by overriding some subfloor targets with better alternatives.

Stage 1: Prepare the build (already done for you in the reporting projects)

Since the day we moved Pentaho Reporting to GitHub for the version 5.0 release, the build files contain the ability to define build properties in a central configuration file in your home directory.

This will already allow you to replace things like the location of the “ivyconfig.xml” file, tweak config options or even to replace the common build file (also known as subfloor) with your own version.

However, while overriding configuration options is safe, replacing subfloor is not. If there are updates to the build process, you would have to manually keep in sync with them, effectively maintaining your own fork. Subfloor is complex enough to make this work potentially dangerous.

If replacing subfloor is a sword, then the next change is the equivalent of a scalpel:

With this change, we inject an Ant include file before we finally load subfloor. When Ant loads build files, it allows multiple build files to define the same target. However, once a target is defined, subsequent build files cannot override this declaration.

When Ant loads a build file with imports, it first fully defines all targets in that file before it processes the imports. The first file loaded is always your build.xml file, followed by the imports in the order they appear in the file. So by putting the import for the “reporting-shared.xml” file before the subfloor import, we can replace subfloor targets without having to alter subfloor itself.
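A sketch of what the top of such a build.xml looks like under this scheme (the file locations follow the convention described here, not verbatim project sources):

<project name="some-reporting-module" default="continuous-local-junit">
  <!-- Imported first: its targets win over same-named subfloor targets. -->
  <import file="../build-res/reporting-shared.xml"/>
  <!-- Imported second: defines only the targets not already defined above. -->
  <import file="build-res/subfloor.xml"/>
</project>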

I keep the reporting-shared.xml file in a central directory (build-res) at the root of the repository. This way, if there are changes necessary, only one file has to be changed.

The file is simple and only provides a common set of targets to make it easier to build the whole of Pentaho Reporting in an automated way (or on the command line).

It defines the following targets:

continuous-local: builds the project, runs the normal unit tests with code coverage and publishes the artefacts to the local ivy cache. This takes a hell of a long time, as cobertura is slow on large projects.

continuous-local-junit: Builds the project as above, but only runs standard JUnit tests. This is the normal target for validating changes quickly.

continuous-local-testless: Builds the project without running tests. Use this if you just need the binaries and if you have a trustworthy CI server to run the actual tests.

continuous-junit: Builds the project with unit tests without code coverage. The artefacts are published to a Maven repository. You will need your own Maven repository server for that and you will have to configure ivy to resolve against this server as well.

longrun-test: Runs integration-level tests with code coverage. These tests take an extremely long time to run and require that you have published the artefacts either locally or to a Maven server, as the build will depend on some of these artefacts.

longrun-junit: Runs the integration level tests with normal JUnit. This is considerably faster than the cobertura runs.

In addition to these targets, Pentaho’s subfloor also defines some useful targets for CI and command line environments:

continuous: Builds the project with cobertura test-coverage and publishes artefacts to a Maven repository server. As with all cobertura targets, prepare to wait for a hell of a long time.

continuous-testless: Builds the project without running any tests, and publishes the artefacts to a Maven repository. I do not recommend this. If you publish shared artefacts, take the time to at least minimally validate them by running the unit-tests. Otherwise you may just introduce a bunch of hard to trace bugs in an otherwise trusted source.

With those changes in place, we now have a well-defined build environment on which to build.

If you want to adapt this process to other Pentaho projects, it should be painless as long as they do not stray too far from the standard build process. I successfully patched the BI-Server and the server plugins with this process – but have not tried the same with either Mondrian or Kettle, as neither of those projects needs to be built locally to get a working pentaho-bi-server assembly on demand.

Just checking out the Pentaho Reporting project and running

ant continuous-local

will now build all the modules and finally create the finished Pentaho Report Designer in the “designer/report-designer-assembly-dist” directory.

On my machine (i7-3537U) this takes 68 minutes from a freshly cleaned cache and 17 minutes from a populated cache that is up to date with Pentaho’s latest builds. Normally the build times are somewhere in between these two extremes, as rebuilt snapshot artefacts need to be downloaded again.

Stage 2: Speed up the local builds by installing Artifactory as a local proxy

When you profile the build, either by just looking at it or by using an ant-profiler plugin, you will see that most of the time during the build is spent resolving libraries and copying artefacts around.

Let’s fix that.

When Ivy resolves artefacts, it follows a simple process: For release versions, it tries to find any server that contains the release artefacts. And for snapshot versions it contacts all repository servers to find the latest version of all snapshot releases.

So if you have one server in your resolver list, Ivy contacts one server. If you have 10 servers, Ivy contacts all 10 servers and then chooses one to download artefacts from. It does this for each artefact separately. So if you need to download 100 artefacts, Ivy will possibly make 10 x 100 connections to remote servers and then possibly another 100 connections to download the artefacts. Each connection attempt takes time to answer. And even though the bandwidth available to us has improved, the response time has not. It is not uncommon to wait 500 ms for a server to return a response. At 1000 requests, that makes 500 seconds (or roughly 8 minutes) of plain waiting time.

So let’s fix the network problem first, by installing a local proxy. This way, requests for cached artefacts are guaranteed to be answered in less than 10 milliseconds, and with virtually no upper limit on the bandwidth used to download them.

1. Install Artifactory as your local proxy server

First, download and install Artifactory. Download the ZIP version from the JFrog download pages. Unzip this file into a directory, for instance into your HOME directory. Then all you need is to start it via artifactory.bat (windows) or artifactory.sh (Unix, Mac).

This assumes that you have JAVA_HOME defined as environment variable and have it pointing to a JDK 1.7 installation, so that the startup scripts can find a valid Java installation.

Then access your server via “http://localhost:8081” (note: this is different from the usual 8080 port used for servlet containers). The predefined administrator account uses the username “admin” and the password “password”.

Now that Artifactory is running, we need to configure it a bit.

2. Configure Artifactory to know about the Pentaho Repository

Log in as admin and click on the “Admin” tab.

On the side, click on “Repositories” to bring up the repository configuration.

On the repositories configuration page, locate the section labelled “Remote Repositories”. Click the “New” button to add a new remote repository.

Give this repository the name “pentaho-public”. Now set the repository URL to “http://repository.pentaho.org/artifactory/pentaho”. Make sure the “Handle Releases” and “Handle Snapshot” options are selected.

Pentaho maintains a second repository of artifacts that we need during the build process, but which have never been published on a public Maven repository, and for artefacts where the public Maven copy is broken.

As before, configure a new remote repository with the identifier “pentaho-third-party” and the URL “http://repository.pentaho.org/artifactory/third-party”. Make sure the “Handle Releases” and “Handle Snapshot” options are selected here as well.

3. Configure the remote repositories.

Artifactory acts as a caching proxy. For each request we send to the server, the server will contact all configured repositories to find the best artefact for us. As with plain Ivy: the more repositories we have to ask, the more time we waste.

Locate the “Virtual Repositories” section and select the repository named “remote-repos”.

Remove all repositories from the list of active repositories until you have only “repo1” left.

And finally, locate the “pentaho-public” repository you just created and add it to the list of active repositories. Now do the same with the “pentaho-third-party” repository as well.

Then reorder the configured remote repositories so that “repo1” is the last element in the list, as shown in the screenshot.

Congratulations: you now have an Artifactory server that can serve as your local proxy. Let’s tell the build process about it!

4. Create an ivysettings.xml file.

Ivy knows about remote servers via its ivysettings.xml file. The Pentaho projects come with a settings file that points to the public Pentaho servers. Now that we have our own proxy, we want to use that one instead.

The file contains sensible defaults to allow you to access a configurable Maven repository. It is based on the Pentaho ivysettings.xml file, but also updates the caching strategy to be safe when run in a CI environment. And last but not least, it separates this build from the default configuration used by Pentaho’s default settings, so that we minimize any conflicts.

Now we need to define the necessary overrides to make the build use this file.

# Build override for Pentaho Reporting
# Fail the build if a test fails
# Fail the build if a error occurs
junit.haltonfailure=true
junit.haltonerror=true
ivy.settingsurl=file:///${user.home}/ivysettings.xml
ivy.repository.resolve=http://localhost:8081/artifactory/libs-snapshot
# Used later during publish to a maven server
ivy.repository.id=libs-snapshot
ivy.repository.publish=http://localhost:8081/artifactory/libs-snapshot

This file first changes the defaults for unit tests to a safer option. Now, if there is any kind of failure during the tests, the build process will fail. This may seem radical, but if failing tests are a bad thing, then failing silently and hoping that a human will notice is even worse.

Now build the whole reporting project again, via

ant continuous-local

The first time you run this, your Artifactory server will reach out to the public Maven and Pentaho Server to download all artefacts. Any subsequent access will be cached and will be much faster.

And last, but equally important: if the Pentaho server goes down, your Artifactory server will continue to serve artefacts and will periodically check whether Pentaho’s server has come up again. As I am writing this, the Pentaho server seems to deliver read timeouts instead of artefacts, but the local cache holds up against it.

A clean-cache build – same as above – now takes 45 minutes, and the fully cached build, at 15 minutes, slightly beats the pure-Ivy build. More importantly, thanks to the stronger caching promises made by Artifactory, you will now always hover closer to the 15 minutes than to the 45 or 68 minutes of a cold-cache build.

This Artifactory server will see more use later during the CI builds in part 2 of this series. The integration tests require this server to be running as Ivy and Maven do not communicate well without a server to translate their requests.

Advanced bonus content

To quickly configure your Artifactory server, use this prepared config descriptor. Go to the Admin tab in Artifactory and locate the “Config Descriptor” option on the side (under the “Advanced” Category). Then copy the contents of this Gist into the text box and hit save. This configures your Artifactory server immediately with all the correct settings. But a big warning: This will overwrite any other configuration you may have made before. Use with care.

* * *

When we work with text on a computer, we usually seem to take it for granted that whatever we write is stored by the system exactly as we write it. If you are old enough to remember DOS and code-pages, then you also remember the fun one can have when trying to exchange documents across regional borders.

Each region has its own set of characters it considers important. Everyone can agree on the first 7 bits of the character-set encodings, but move beyond that and you are in no-man’s land.

Now it is 2014, and all this was relevant 20 years ago. So why bother?

It matters, because old standards never die. In Java, some files used by standard functions have prescribed encodings. For instance, property files used by Java code must always be encoded as ISO-8859-1. Property files used by GWT code must be encoded as UTF-8. Source files can have any encoding, and the compiler will select the encoding based on your system default.

So for property files, you have to keep two separate copies in case you share code, and you have to tell your editor that these file types may be encoded differently based on how you, the human operator, want to use them. So there is a default – but the default sucks.

But worst of all: the compiler accepts everything and uses a system-dependent default. So unless you specify the encoding manually, your local compile result can differ greatly from the result produced by a different machine. Now try to debug that!
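The fix is to pin the encoding in the build script; with Ant’s javac task, for instance, it is a single attribute:

<!-- Compile with a fixed source encoding, so that every machine
     produces identical class files regardless of its locale. -->
<javac srcdir="source" destdir="bin/classes" encoding="US-ASCII"/>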

As part of the Pentaho Reporting 5.0 release, I normalized all the source code of Pentaho Reporting to the safe US-ASCII character set. Remember, all character sets in use agree on the lower 7 bits of their character range, which corresponds to the US-ASCII set of characters. Therefore, no matter where you are in the world and no matter what machine you use, the source code will look and behave the same everywhere.

As a result, I ‘fixed’ the salt-string for our password obscuration, which contained the Unicode character ‘SECTION SIGN’ (U+00A7). Apparently, when you use ISO-8859-1 as the encoding for source code, this character works just fine. But our CI and release build machine actually uses UTF-8 as its default encoding for source code, where this character must be encoded as a two-byte sequence.
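The effect is easy to demonstrate in isolation (a standalone snippet, not the actual obscuration code):

import java.nio.charset.StandardCharsets;

public class SectionSignDemo
{
  public static void main(final String[] args)
  {
    // The section sign (U+00A7) is a single byte in ISO-8859-1 but a
    // two-byte sequence in UTF-8, so the raw bytes of a source literal
    // depend on the encoding the compiler assumes for the file.
    final String salt = "\u00A7";
    System.out.println(salt.getBytes(StandardCharsets.ISO_8859_1).length); // prints 1
    System.out.println(salt.getBytes(StandardCharsets.UTF_8).length);      // prints 2
  }
}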

Therefore, the release of Pentaho Reporting 5.0 decodes and encodes passwords differently than the previous 4.8 release did. Now, if you were always good and enterprise-ready, you would use JNDI data-sources and none of this would matter to you. But some users want or need to access databases that are not defined via JNDI – and thus store passwords in the PRPT file.

All these reports broke when running on 5.0, with an “Invalid password” error reported by the database.

Testing with a local build, however, showed no error. The reports ran fine.

Only thanks to a detailed bug-report with a great sample of what causes the breakage was I able to make the connection and ultimately narrow it down to a variation in the binary files produced by the build machine. Thank god that Pentaho Reporting is part of our open-source offering and that no code obfuscation was used in that build. That way I was able to decompile the class file and see where the restored source code differed from the sources we have in Git.

Lessons learned: (1) don’t assume your sources reflect your binary files; (2) machine-dependent defaults suck as much today as they did 20 years ago; and (3) never assume that the user provides sensible settings – use the safest option possible instead. In our case, that means encoding all files as US-ASCII, with plenty of escape sequences for characters outside of that range. It may be ugly, but it is guaranteed to work on every machine regardless of the developer’s defaults.

* * *

In the upcoming version 5.1, Pentaho Reporting will ship with (for this version) experimental support for printing bidirectional text. Bi-directional text processing enables us to print Arabic, Hebrew and other non-Latin scripts.

Support for Arabic text had been on our backlog for a decade. Thanks to the work done by Nortal, and Marian Androne in particular, we now have a new text-processing sub-system that relies on the JDK’s TextLayout class to process complex text. The TextLayout class handles all the line-breaking calculations, while some additional helper code around it adds stronger rich-text features, including embedded images, to the mix.

Why experimental?

The new text layouting system is a large chunk of code and came comparatively late in the development process for version 5.1. We have maybe a month of development time left until we are supposed to finalize the next release. The AWT itself is platform-dependent and next to impossible to unit-test properly. The results of any font and text processing are heavily dependent on your JDK version and vendor, the operating system and its configuration, and the available fonts.

To properly test the new code, we will need a considerable amount of time, as that testing will be either manual or will require a whole new approach to testing to insulate ourselves from the platform specifics.

Oh, and no one at Pentaho seems to speak Arabic or Hebrew to validate the rather critical correctness of the BiDi text processing. As glyphs flow together and – to the untrained eye – insignificant dots and lines can alter the meaning of the printed text, I personally do not feel confident vouching for the correctness of the output without more tests.

The text might be fine, or a misspelling might start a (national/corporate/marital) war. So let’s play it safe for now.

How to enable Arabic Text support.

You can enable Arabic text processing on a per-report basis by setting the attribute “common::complex-text” to “true”. If you want to enable this globally, add the configuration setting “org.pentaho.reporting.engine.classic.core.layout.fontrenderer.ComplexTextLayout=true” to the “classic-engine.properties” file in the root of your classpath.
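If you prefer to enable it from code, setting the report attribute should look roughly like this; I am quoting the namespace constant from memory, so treat it as an assumption and verify it against your engine version:

import org.pentaho.reporting.engine.classic.core.AttributeNames;

// "common::complex-text", as shown in PRD, maps to the core attribute namespace;
// report is your parsed MasterReport instance.
report.setAttribute(AttributeNames.Core.NAMESPACE, "complex-text", Boolean.TRUE);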

Once the complex text processing is enabled, you can use a couple of new styles in your reports. (These styles can be set regardless of the complex-text processing setting – but they will have no effect if the old text processor is used.)

When using Arabic or any other non-Latin text, it is critical to NEVER EVER EVER! use any of the built-in fonts (“Serif”, “SansSerif”, “Monospaced”, “Dialog”) or your export to PDF will produce invalid output. The PDF specification does not support non-Latin text for these fonts and will fail silently.

Apparently, in 1985, when the PDF specifications were made, no one could have foreseen that Arabic people would want to use computers for printing texts in their native language.

Once complex text processing is enabled, you can control the default flow of the text via a new style property named “text-direction”. This style is inherited, so to define a preference for Right-To-Left text processing for the whole report, it is sufficient to define this style only once on the master-report object.

By using the text-layout class, we now also gained the ability to break text within words. The new inheritable style property “word-break” allows you to control this feature. If not defined, this defaults to “true” (breaks only at word-boundaries), just like in the previous versions.

Reporting Bugs

Please help us to squash all remaining bugs in this new feature by giving it a try. And if you happen to be a native or fluent speaker of a Right-to-Left language, we would love to hear whether we print everything correctly.

* * *

While working on one of my many side-projects, I stumbled across a very complete and understandable explanation of the various garbage-collector algorithms used inside the JDK and their performance impacts. If you want to tickle the most out of your BI-server or Carte server installations, the article “Java Garbage Collectors distilled” at the Mechanical Sympathy blog is well worth your time.

* * *

Ever felt that getting the Pentaho BI-server to spit out CSV or Excel files should be faster? As unbelievable as it sounds, with case PRD-4921 closed, we now have up to 5 (in words: five!) times faster exports.

It’s one of those occasions where talking about customer problems creates a wacky idea that falls on fruitful ground.

Many customers use the interactive reporting tools (Pentaho Interactive Reporting, Saiku Reporting or the ugly ad-hoc reporting (WAQR)) to get data out of the data warehouse into Excel or CSV files. Such reports are usually rather simple list reports, with no fancy charting or complex layout structures. However, the reporting engine does not know that; its pessimistic nature always assumes the worst.

The Pentaho Reporting engine allows an insane degree of freedom: via custom report functions, it lets you reconfigure the report on the fly while the report is running. But with that freedom, we can no longer make any assumptions about how a report will look in the next row. Thus the engine, and the layout subsystem, assume nothing and (apart from a bit of caching of reusable bits) recalculate everything from scratch.

Which takes time.

With PRD-4921, I added a fast mode to some of the export types (stream CSV, stream HTML and XLS/XLSX). The new exporters check whether the report uses only ‘safe’ features, and if so, switch to a template-based output instead of running the full layouting.

A report is safe if it does not contain any of the following items:

inline subreports. They are evil, as they can appear anywhere and can be of any complexity.

crosstabs. They are a complex layout and can’t be easily condensed into templated sections.

functions that listen for page events, other than the standard page function and page-of-pages function. During fast mode, we don’t generate page events, and thus these won’t output correct values. I am willing to ignore page functions, as data exports are less concerned with page numbers.

any layout-processor function other than the row-banding function. These functions exist to rewrite the report, which stops us from making assumptions about the report’s structure.

If a report is not safe, the engine falls back to the normal, slow mode. You then just have to wait a bit longer to get your data, but you won’t get sudden service interruptions.

For fast reports, the engine produces a template for each root-level band. If the style of a band changes over time (as a result of style-expressions), we produce a template for each unique combination of styles the reporting engine encounters.

Once the engine has a valid template, it can skip all layouting on all subsequent rows of data and just fill the data into the template’s placeholders. The resulting output is exactly the same as the slow output – minus the waiting time.

So how does this system perform? Here is what my system produces using a 65k-row report (to stay within the limits imposed by Excel 97) with 35 columns of data exported. The report has no groups; it is just one big fat stream of data. All times are given in seconds.

Export       5.1 with fix   5.0 with fix   5.0 GA
Fast-CSV              4.5            5.4        -
CSV                  25.8           24.5     24.8
Fast-XLS             11.7           11.3        -
XLS                  53.2           51.3    213.4
Fast-XLSX            31.3           37.7        -
XLSX                 86.0           82.8    232.4
Fast-HTML            10.0           11.1        -
HTML                 42.9           43.5     44.9
PDF                  66.7           69.2     66.4

As you can see from the data, the fix gave a 4-to-5-times speed-up for HTML and CSV exports. The Excel exports were extra slow in 5.0 (and 4.x); a few fixes in the layout handling and the Excel-specific exports gave the ‘normal’ mode a speed-up of 3 to 4 times. On top of that, we now have the fast mode, which adds another 2-3 times more raw speed.

Not bad for one week of frantic coding, I guess.

Go grab the 5.1 CI builds to give it a go. You will need an updated BI-Server reporting plugin to make the BI-server (and thus the ad-hoc reporting tools) pick up the change.

The 5.0 branch does not have those changes, so don’t even try the CI builds for it. As the 5.0 codeline is in lock-down for bug-fixes only, these performance improvements will take a while to go in, as we don’t want to introduce regressions that break systems in production.