JEE and test automation

In my current assignment, I had the opportunity to discuss with Data Warehouse (DWH) experts how the DWH integrates with the rest of the information system. I noticed that not all stakeholders (including Data Warehouse professionals) use the same vocabulary.
During the discussions, people used terms such as “Data Warehouse”, “Data Mart”, “ODS”, “Data Lake” and so on. Some of these terms were used interchangeably, which did not help to follow the discussion. As I was not familiar with several of them, I decided to do my homework and come up with a small glossary to provide a common ground for further discussions.

Disclaimer: I am not an expert in the field; I only tried to come up with a couple of definitions to establish a common ground for further discussions. Experts in the field, please help me improve this!

Business Intelligence (B.I.)

Business intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information. It allows business users to make informed business decisions with real-time data that can put a company ahead of its competitors.

Boris Evelson - Forrester

Business Intelligence Systems usually (but not always) rely on a Data Warehouse to provide information out of raw operational data.
The following diagram shows the relationships between the different levels involved in making a decision.
Information emerges from consolidated data, thus helping the user to improve her knowledge of a given subject. Using this knowledge, she can then make an informed decision.
How B.I. helps Decision making

Business Intelligence Systems have the following properties:

They leverage raw heterogeneous operational data

They enable multi-dimensional information and operations on it

They are driven by the business

They have to be performant and must not interfere with daily operations

Data Warehouse (DWH)

DWHs are central repositories of integrated data from one or more disparate sources. Their purpose is to organize and homogenize data into information. Users can then turn this information into knowledge and therefore make informed decisions (see B.I.).

There are three main approaches to building a data warehouse.

William Inmon’s approach

According to William Inmon, who originally coined the term “Data Warehouse”, a data warehouse has the following properties:

Subject oriented: This implies that data is organized around the business and not around the sources. For instance, several accounting data sources are consolidated into one accounting data warehouse. The purpose is to let information emerge out of data.

Integrated: Coming from different sources, data must be standardized to enable consistency and thus let information emerge. For instance, customer identification must be normalized across the different sources.

Non-volatile: Once in the data warehouse, data must not be altered. Data therefore remains available for future comparison.

Time-variant: Changes made to data over time are tracked. For instance, each and every change to a customer's country of residence is tracked.

Inmon's model follows a top-down approach. First, a complete (enterprise-wide) Data Warehouse (DW) is created in third normal form (3NF: avoiding duplication and enforcing referential integrity) and then, if required, datamarts (DM) are provisioned out of the DW. Datamarts in Inmon's model are in 3NF, from which the OLAP cubes are built.
For Inmon, data quality and coherency are paramount, hence the 3NF in the DW and the DMs.

The Inmon Top-Down model

Ralph Kimball’s approach

According to Kimball, another prominent figure in this field, a “Data Warehouse is a system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making.” This definition does not contradict Inmon's properties. The difference lies in the architecture.

Kimball's model follows a bottom-up approach. First, some Datamarts (DM) emerge, directly sourced from OLTP (Online Transaction Processing) systems; they usually follow the company's processes and organization.
The Datamarts are either in 3NF (OLAP cubes are built on top of them) or de-normalized star schemas.

The Kimball Bottom-Up model

The Hybrid model - A typical architecture

Both of these approaches have their pros and cons. Kimball's model is easy to start with because of its bottom-up approach: you can start small and scale up eventually. Moreover, the ROI is usually better with Kimball's model.
On the downside, this approach makes it difficult to create re-usable data structures and operations (extraction) for different datamarts. Finally, you may end up with consistency problems.
On the other hand, Inmon's approach is structured and easier to maintain, at the cost of being rigid and more expensive.

Real-life DWH implementations often end up using a hybrid architecture. The following architecture relies on these building blocks:

a 3NF DWH with full history to enable the creation of unanticipated datamarts

datamarts that rely either on 3NF or on the star schema for better performance

A typical Hybrid Architecture

Data Mart (DMT)

A datamart is essentially a basic building block of the data warehouse.
It is a subject-oriented subset of a Data Warehouse. A data mart does not necessarily imply the presence of a multi-dimensional technology such as OLAP, nor does it necessarily imply the presence of summarized numerical data.

Operational Data Store (ODS)

An operational data store (ODS) is a building block of the Data Warehouse used for immediate reporting with operational data. An ODS contains lightly transformed and lightly integrated operational data within a (rather) short time window. The ODS is typically used when looking for specific events (e.g., settling a banking movement or looking for a specific operation). The full history is available in the DWH.

OLAP Cubes

OLAP Cubes are multidimensional arrays of data coming from a relational database. They enable operations such as slicing and dicing (projection), drill-down/up, and roll-up. A datamart relying on a star schema provides equivalent functionality. However, in a cube every projection/aggregation is pre-computed (which makes it possible to discover new patterns), whereas in a star schema only some projections/aggregations (the ones you already know are interesting) are pre-computed.

Conclusion

Business Intelligence is a set of practices/methodologies that turn raw data into decisions. Data warehouses, data marts and cubes are building blocks used to build Business Intelligence systems.

Last month I decided to add a touch of microservices to the JEE course I teach at the University of Geneva.
I ended up with a couple of microservices and as expected I came across the challenge of their integration testing.
This post is not specifically about microservices. It rather focuses on the JEE integration testing experience I gathered while building the microservices. A dedicated blog post will follow on my journey through the building of microservices with JEE.

A short description of the architecture

From a technological perspective, I am using JEE 7 on Wildfly with MySQL. Therefore, my microservices are WARs composed of 1-2 EJBs plus a RESTful service that exposes the logic. Typically, my microservices are composed of 4-5 classes of at most 150 lines of code each. On my laptop, a microservice deploys in less than 5 seconds. I mainly need to test EJBs and their database calls. I also want to test the integration between microservices.

Why Docker?

As it serves as an example for a course, I want it to be super easy to install/re-install 20 times if necessary. Furthermore, I want fast-paced deployment. To that end, Docker is a great tool because I do not have to bother about which laptop/computer the students work on (provided they can run Docker Toolbox). Moreover, all the tools/middleware I am using for the course are already packaged as Docker images.

Building a Docker image for a Wildfly Integration Test Server

As mentioned previously, I propose to use Wildfly + MySQL as the runtime environment. However, at test time, I do not want to start both an application server and a database. More importantly, I want to get a fresh database for each and every test suite. Furthermore, I would like to rely on a simpler setup for the application server. For instance, I prefer to use an in-memory H2 database instead of MySQL. I also do not want to bother with LDAP/JAAS configuration, clustering, etc.
Of course, the datasource name, the realm name and, more generally, all the resources required by the microservices must be present under their production names.

One of Wildfly's great features is its ability to be configured using the command line.
The first step is to configure a data source relying on H2 that has the same name as the production one.
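
The original snippet is not reproduced here; hereafter is a minimal sketch of what such a CLI configuration could look like (the datasource and JNDI names are assumptions):

# Minimal sketch: create an in-memory H2 datasource under the production JNDI name.
$JBOSS_HOME/bin/jboss-cli.sh --connect <<EOF
data-source add --name=MyAppDS --jndi-name=java:jboss/datasources/MyAppDS --driver-name=h2 --connection-url="jdbc:h2:mem:test;DB_CLOSE_DELAY=-1" --user-name=sa --password=sa
EOF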

Then, let's configure a realm. This is helpful when integration tests rely on principals and verify the security.
The file jee7-demo-realm-users.properties (resp. jee7-demo-realm-roles.properties) defines the users (resp. the roles) of the realm.
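
The original configuration is not shown; the following is a minimal sketch assuming a legacy properties-based security realm backed by the two files above (realm name and paths are assumptions):

/core-service=management/security-realm=jee7-demo-realm:add()
/core-service=management/security-realm=jee7-demo-realm/authentication=properties:add(path=/opt/jboss/wildfly/customization/jee7-demo-realm-users.properties, plain-text=true)
/core-service=management/security-realm=jee7-demo-realm/authorization=properties:add(path=/opt/jboss/wildfly/customization/jee7-demo-realm-roles.properties)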

The next step is to enhance the official Wildfly image, called jboss/wildfly:latest, with the specific configurations required for integration testing. The following Dockerfile describes how to build the image that we will use for testing.
First, it adds the configuration files ./config_wildfly.sh, ./jee7-demo-realm-roles.properties and ./jee7-demo-realm-users.properties to the /opt/jboss/wildfly/customization/ directory of the image. Then we tell Docker to run the configuration script config_wildfly.sh and to do some cleanup. After that, it records the state as a new image.
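
As the original Dockerfile is not reproduced here, the following sketch illustrates the steps just described (the cleanup path is an assumption):

# Minimal sketch of the Dockerfile described above.
FROM jboss/wildfly:latest
ADD config_wildfly.sh jee7-demo-realm-users.properties jee7-demo-realm-roles.properties /opt/jboss/wildfly/customization/
RUN /opt/jboss/wildfly/customization/config_wildfly.sh && \
    rm -rf /opt/jboss/wildfly/standalone/configuration/standalone_xml_history/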

Now we can build an image called jee7-test-wildfly using the following command (in the directory where the Dockerfile lives):

docker build --no-cache --rm -t jee7-test-wildfly .

The following command runs the image we just built, exposes (and maps) ports 8080, 9990 and 8787, and mounts the local directory /Users/XXXXXXXXXX/tmp/docker-deploy on the image's /opt/jboss/wildfly/standalone/deployments/ directory. This is, of course, the directory in which the integration tests are to be deployed.
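
A minimal sketch of that command (the container name and the management binding flags are assumptions):

docker run -d --name jee7-test \
  -p 8080:8080 -p 9990:9990 -p 8787:8787 \
  -v /Users/XXXXXXXXXX/tmp/docker-deploy:/opt/jboss/wildfly/standalone/deployments/ \
  jee7-test-wildfly \
  /opt/jboss/wildfly/bin/standalone.sh -b 0.0.0.0 -bmanagement 0.0.0.0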

Now that we have a container running an application server with an in-memory database and a simple realm, we can configure the test harness.

How to configure Arquillian

Arquillian is a JEE integration testing framework. It allows testing JEE components such as EJBs or web services.
The first step is to tell Arquillian where the Wildfly server lives. The following arquillian.xml file states that the integration tests should deploy on a Wildfly container that listens at 192.168.99.100 (the Docker container address) on port 9990 (the administration port). Furthermore, it declares the admin username and password.
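
The file itself is not reproduced here; a minimal sketch for the WildFly remote container adapter could look as follows (qualifier and credentials are assumptions):

<?xml version="1.0" encoding="UTF-8"?>
<arquillian xmlns="http://jboss.org/schema/arquillian">
    <container qualifier="wildfly-docker" default="true">
        <configuration>
            <property name="managementAddress">192.168.99.100</property>
            <property name="managementPort">9990</property>
            <property name="username">admin</property>
            <property name="password">admin</property>
        </configuration>
    </container>
</arquillian>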

A simple integration Test

Let us now write a simple integration test. First, we must tell Arquillian that it is in charge of running the test (@RunWith(Arquillian.class)).

@RunWith(Arquillian.class)
public class StudentServiceImplTest {
    ...
}

To test a given component (let's say an EJB), Arquillian expects a well-formed Java artifact (either a JAR or a WAR).
The following test is composed of a package containing the EJB under test (ch.demo), an empty beans.xml to enable CDI and, finally, a persistence.xml to enable JPA.
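
A minimal sketch of such a @Deployment method (the test persistence unit file name is an assumption):

@Deployment
public static JavaArchive createDeployment() {
    return ShrinkWrap.create(JavaArchive.class)
            .addPackage("ch.demo")
            // Empty beans.xml enables CDI in the archive.
            .addAsManifestResource(EmptyAsset.INSTANCE, "beans.xml")
            // Test persistence unit enables JPA.
            .addAsManifestResource("test-persistence.xml", "persistence.xml");
}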

The previous JAR, as well as the tests, is packaged as a WAR and deployed on the application server declared in the arquillian.xml file. The following test injects the EJB under test, which implements the StudentService interface, and tests its add method.
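
A minimal sketch of that test (the Student constructor is an assumption; getAll() appears later in this tutorial series):

@EJB
StudentService studentService;

@Test
public void shouldAddAStudent() {
    // Exercise the injected EJB against the in-memory database.
    studentService.add(new Student("John", "Doe"));
    Assert.assertEquals(1, studentService.getAll().size());
}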

Maven, IDE Integration and Coverage

Finally, let me add that it is possible to get coverage data from Arquillian by enabling extensions. In the following, it enables jacoco. This produces coverage data that can be used by the Eclipse EclEmma plugin.
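
A minimal sketch of that extension declaration in arquillian.xml (the includes pattern is an assumption):

<extension qualifier="jacoco">
    <property name="includes">ch.demo.*</property>
</extension>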

Conclusion

Docker and Arquillian provide a nice and seamless way to do JEE integration testing. Nevertheless, I had a hard time at the beginning because Arquillian's error handling in case of an undeployable test archive is not very good. In this case, make sure that you package your test correctly (in the method annotated with @Deployment). In particular, double-check beans.xml, web.xml and the JAR/WAR structure. It really helped me to unzip the deployed test archive to figure out what went wrong in my code.

I recently started to use Docker. It is a great tool that significantly increases a developer's productivity. However, I regularly encounter disk space problems when developing new images. Indeed, I sometimes end up with dangling images and containers. Hereafter is a simple script that cleans up most of them.
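
As the script itself is not reproduced here, a minimal sketch could look like this:

#!/bin/sh
# Remove all stopped containers.
docker rm $(docker ps -a -q -f status=exited)
# Remove all dangling (untagged) images left over by intermediate builds.
docker rmi $(docker images -q -f dangling=true)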

Recently I came across the following problem: how to propagate information from one enterprise application to another in a transparent manner? Transparent meaning without changing the API, that is, without adding transversal information to the services' parameters. The typical use case is to propagate information such as the language, applicative security role information (not the JAAS roles), or the session-id. Moreover, I would like the information to be “request scoped” and to be automatically cleaned up at the end of the request. This is important to avoid memory leaks and to enforce isolation for security reasons. Let me add that I currently work with JEE6 on WebSphere 8.5.5.

I did some research among blogs and forums and I found the following solutions:

1) Passing information in a thread-local. This consists in putting information in a Map stored in a ThreadLocal. Although this solution is very simple to implement, the information is only transmitted within the current thread. This means that @Asynchronous calls will not get access to the information. Similarly, it is not transmitted through RMI calls that span several JVMs, as is usually the case in distributed applications. (A minimal sketch of this solution, and of solution 3, follows this list.)

2) Using JNDI. This solution does not suffer from the above limitations, as it is distributed by nature, but scoping must be implemented on top of the existing JNDI implementation. I think that it may be possible to implement something like a custom CDI scope, but it seems rather complex.

3) TransactionSynchronizationRegistry (TSR). This alternative is well documented in [1]. This solution works on any JEE application server. It looks great at first sight, but it does not support any use case in which different transactions (or no transaction) are involved. This invalidates any information sharing before a transaction has started, when a transaction has been suspended, or when a new transaction is started. Again, I can imagine that it would be possible to propagate the content using interceptors, but that is too much plumbing code for me.

4) Work Area Service (WAS). This is basically the IBM implementation of solution 2 with scoping. The documentation is clear and it seems easy to implement. Of course, the main drawback is that it is vendor-specific. IBM started a JSR a long time ago, but it was dropped.
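
To make this more concrete, hereafter are minimal sketches of solutions 1 and 3; class and member names are mine, not from any framework. First, the thread-local variant:

import java.util.HashMap;
import java.util.Map;

public final class RequestContext {

    private static final ThreadLocal<Map<String, Object>> CONTEXT =
            new ThreadLocal<Map<String, Object>>() {
                @Override
                protected Map<String, Object> initialValue() {
                    return new HashMap<String, Object>();
                }
            };

    public static void put(final String key, final Object value) {
        CONTEXT.get().put(key, value);
    }

    public static Object get(final String key) {
        return CONTEXT.get().get(key);
    }

    // Must be called at the end of the request (e.g., in a servlet filter):
    // application server threads are pooled, so leftovers would leak into other requests.
    public static void clear() {
        CONTEXT.remove();
    }
}

Then the TransactionSynchronizationRegistry variant, which only works while a transaction is active:

@Stateless
public class LanguageAwareService {

    @Resource
    private TransactionSynchronizationRegistry tsr;

    public void doBusiness() {
        // Visible to any code running within the same transaction.
        tsr.putResource("language", "fr");
        // ...possibly later, in another bean participating in the same transaction:
        String language = (String) tsr.getResource("language");
    }
}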

Let us now enumerate several criteria to make a decision about which way to go:

1) Support of asynchronous calls: during a request, it may be necessary to dispatch the processing among several threads, and I would like the shared information to be accessible by any thread involved in this request.

2) Is (automatically) “request scoped”: if the shared information is not automatically collected at the end of the request, we may end up with memory leaks. Manual collection is never a good option.

3) Support for remote calls: for a given request, we may end up calling several services (EJBs) on other servers, and I would like automatic propagation of the information among the cluster nodes.

4) Performance: to be useful, the information sharing must be ubiquitous and therefore it must be cheap in terms of resources.

5) Vendor independence: as far as possible, an application must rely on known and portable APIs such as JEE. Locking the application to a specific vendor is, in my opinion, essentially a maintenance problem, even though migrating from one application server to another happens only rarely.

Solution     | Async | Scope | RMI | Vendor Indep.
------------ | ----- | ----- | --- | -------------
Thread-Local | X     | OK    | X   | OK
JNDI         | OK    | X     | OK  | OK
TSR          | X     | OK    | OK  | OK
WAS          | OK    | OK    | OK  | X

As you can see, there is no silver bullet here. I went for the vendor-specific solution. It can be nicely encapsulated to isolate the dependency on vendor-specific code. Furthermore, several application servers have similar mechanisms, so it can be adapted. Here is why it was not possible in my setup to use the other alternatives:

The Thread-local solution is not acceptable because it does not support Remote Method Invocations that span several virtual machines.

The JNDI solution requires implementing the scoping mechanism. This can be tricky and it is definitely not my area of expertise.

The TransactionSynchronizationRegistry is JEE compliant, but it requires heavy machinery to support asynchronous calls as well as transaction suspension and re-creation (REQUIRES_NEW, NOT_SUPPORTED, NEVER). Basically, it does not work unless there is one and only one transaction throughout the request.

[1] Adam Bien. HOW TO PASS CONTEXT IN STANDARD WAY - WITHOUT THREADLOCAL. http://www.adam-bien.com/roller/abien/entry/how_to_pass_context_in

[2] Adam Bien. HOW TO PASS CONTEXT BETWEEN LAYERS WITH THREADLOCAL AND EJB 3.(1). http://www.adam-bien.com/roller/abien/entry/how_to_pass_context_with

In this post, I look at the different kinds of objects used for test purposes. By this, I mean objects that are used to make a test run. This article focuses on component testing, a.k.a. unit testing (I do not like the term unit testing because it is too often confused with the technology behind it, e.g., JUnit or TestNG).
Although there already exist a great number of resources on the subject, it was very difficult for me to understand the differences between the different kinds of test objects. This is partly due to the fact that different authors use different terms for the same object and the same term for different objects [1]. To be as didactic as possible, I also chose to add some blocks of code. Please note that these blocks are only here for the sake of clarity. This is not the way I would recommend to do stubbing, faking, and mocking. Consider Mockito and PowerMockito for that; they are amazing tools for that purpose and deserve a post of their own to discuss good practices.
This post is in no way an exhaustive state of the art; I only tried to select the terms that, in my opinion, are clear and consensual enough. To that end, I used a number of sources that can be found in the bibliography section.

Here are the main reasons to use different objects during the test phase and in production:

Performance: the actual object contains slow algorithms and heavy computations that may impair the test performance. A test should always be fast, so as not to discourage regular runs and therefore to identify problems as soon as possible. The worst case is the one in which the developer must deploy and run the entire application to test a single use case.

State: sometimes the situation under test happens rarely. This is the case for instances that occur with a low probability, such as race conditions, network failures, etc.

Non-determinism: this is the case for components that interact with the real world, such as sensors.

The actual object does not exist: for instance, another team is working on it and it is not yet ready.

To instrument the actual dependency: for instance, to spy on the calls of the CUT to one of its dependencies.

Test doubles

Test double is the generic term that groups all the categories of objects that are used to fulfill one or several of the previous requirements.
The term was coined by Gerard Meszaros in [2].
In rough terms, test doubles look like the actual object they double. They satisfy, to different extents, the original interface and propose a sub-set of the behaviors expected by the specification. This helps to isolate the problem and reduce the double implementation to the strict minimum.

There exist different kinds of test doubles for different purposes. They have in common that they can be used instead of the actual component without syntactically breaking the contract.

The next figure describes a simple test setup that does not use test doubles. To test the Component Under Test (CUT), the test setup uses its actual dependencies (another component). The setup phase is trivial as there is nothing to do. The exercise phase calls the CUT with the proper parameters (direct inputs); the CUT in turn calls its dependency (indirect outputs). AnotherComponent returns its result to the CUT (indirect inputs), which uses it to complete the work and finally returns the overall result (direct outputs). The terms “direct inputs”, “indirect outputs”, and so on come from [2].

Overview of a test setup

Now let us say that “AnotherComponent” is either too complex, not yet implemented, or has a non-deterministic behavior. In those cases, it is easier to use another implementation of “AnotherComponent” that behaves exactly as expected for a specific scenario.

Hereafter is a simple example to illustrate the rest of the post. The class CUTImpl, which realizes the contract CUT, implements the component under test. The CUT uses a component that realizes the interface AnotherComponent.
For the sake of clarity, the following example injects the dependencies through the constructor.
To improve loose coupling, it would also be possible to rely on a dependency injection framework.

package ch.demo.business.service;

public class AnotherComponentImpl implements AnotherComponent {

    public Integer inc(Integer param) {
        if (param == null) {
            throw new IllegalArgumentException("Param must be not null!");
        } else if (param == Integer.MAX_VALUE) {
            throw new IllegalStateException("Incrementing MAX_VALUE will result in overflow!");
        } else {
            return param + 1;
        }
    }
}

The following test uses real implementations of the different components.
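
The original listing is not reproduced here; a minimal sketch could be (the CUT signature is inferred from the later examples):

package ch.demo.business.service;

public class CUTTest {

    @Test
    public void testIncWithRealImplementation() {
        // Uses the real AnotherComponentImpl: no test double involved.
        CUT cut = new CUTImpl(new AnotherComponentImpl());
        Assert.assertEquals("inc(3) != 4", 4, cut.inc(3, 1));
    }
}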

Dummy objects

Dummy objects are meant to satisfy compile-time checks and runtime execution. Dummies do not take part in the test scenario.
Some method signatures of the CUT may require objects as parameters. If neither the test nor the CUT cares about these objects, we may choose to pass in a Dummy Object. This can be a null reference, an empty object, or a constant. Dummy objects are passed around (to dependencies, for instance) but never actually used. Usually they just fill parameter lists. They are meant to replace input/output parameters of the components the CUT interacts with.

In the current example, the parameter delta of the doBusiness method can be set to null or to any Integer value without interfering with the test, as sketched below. Of course, this might be different for another test.
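
A minimal sketch (the doBusiness signature and expected result are assumptions):

@Test
public void testDoBusinessWithADummyDelta() {
    // delta (second parameter) is a dummy: this scenario never uses it.
    Assert.assertEquals("unexpected result", 4,
            new CUTImpl(new AnotherComponentImpl()).doBusiness(3, null));
}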

Stub objects

Stub objects provide simple answers to the CUT's invocations. A stub does not answer scenarios that are not foreseen by the current test. In other terms, it is a simplified fake object. Stub objects may trigger paths in the CUT that would otherwise not be executed.

The next figure presents a test that relies on a test stub. First, the test case sets up a stub object. This object responds to the expected CUT invocations in order to enact a given scenario. This is very useful to check indirect inputs with seldom-seen values.

Test setup that uses a test stub

Back to the example: the following program illustrates how to use a stub to check specific indirect inputs.
This stub shows that the CUT relies on the fact that AnotherComponent does not return null, as it
would otherwise raise a NullPointerException.

package ch.demo.business.service;

public class CUTTest {

    public void testIncWhenAnotherComponentReturnsNull() {
        // Without any modification of the CUT implementation, this would raise an exception.
        Assert.assertEquals("inc(3) != 4", 4, new CUTImpl(new AnotherComponentStub()).inc(3, 1));
    }
}
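
For completeness, here is what such a stub could look like (the original listing is not shown; this is an assumption):

package ch.demo.business.service;

public class AnotherComponentStub implements AnotherComponent {

    @Override
    public Integer inc(final Integer param) {
        // Canned answer that enacts the seldom scenario under test.
        return null;
    }
}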

Fake objects

Fake objects have working implementations, but they usually simplify some behaviors. This makes them not suitable for production. The idea is that the object actually exhibits some real behavior, but not all of it. While a Fake Object is typically built specifically for testing, it is not used as either a control point or an observation point by the test. The most common reasons for using fake objects are that the real component is not available yet, is too slow, or cannot be used during tests because of side effects.

Test setup that uses a fake

The following fake simulates most of the behaviors except for the limits (Integer.MAX_VALUE, null, etc.).
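
The original listing is not reproduced here; a minimal sketch could be:

package ch.demo.business.service;

public class AnotherComponentFake implements AnotherComponent {

    @Override
    public Integer inc(final Integer param) {
        // Working but simplified: no handling of null or Integer.MAX_VALUE.
        return param + 1;
    }
}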

Mock objects

A mock partially implements the interface and provides a way to verify that the calls to the mock object conform to the specification.
Mock objects are pre-programmed with expectations that form a specification of the calls they are expected to receive.
In fact, mocks are a certain kind of stub or fake. However, the additional feature mock objects offer on top of acting as simple stubs or fakes is that they provide a flexible way to specify more directly how your function under test should actually operate. In this sense, they also act as a kind of recording device: they keep track of which of the mock object's methods are called, with what kind of parameters, and how many times.

Whenever the assertions are made on the fake object and not on the CUT, then it is a mock.

Test setup that uses a mock

The following example uses Mockito to provide easy mocking. Note that the final assertions (Mockito.verify) check whether the mock was called with the given parameters. In other words, we check that the CUT did not filter the input parameter.

package ch.demo.business.service;

public class CUTTest {

    @Mock
    AnotherComponent ac;

    @InjectMocks
    CUT cut = new CUTImpl();

    public void testIncWhenAnotherComponentReturnsNull() {
        Mockito.when(ac.inc(Integer.MAX_VALUE)).thenReturn(Integer.MAX_VALUE + 1);
        Mockito.when(ac.inc(3)).thenReturn(3);
        Mockito.when(ac.inc(123)).thenReturn(124);

        Assert.assertEquals("inc(Integer.MIN_VALUE) != Integer.MIN_VALUE + 1",
                Integer.MIN_VALUE + 1, cut.inc(Integer.MIN_VALUE, 1));
        Assert.assertEquals("inc(3) != 4", 4, cut.inc(3, 1));
        Assert.assertEquals("inc(123) != 124", 124, cut.inc(123, 1));

        // Verifies that the method inc of AnotherComponent was called with parameter Integer.MAX_VALUE.
        Mockito.verify(ac).inc(Matchers.eq(Integer.MAX_VALUE));
        // Verifies that the inc method has been called three times.
        Mockito.verify(ac, Mockito.times(3)).inc(anyInt());
    }
}

Test Spy

According to Meszaros [2], a test spy is basically a recorder that is able to save the interactions between the CUT and the spy for later verification.

Test setup that uses a test spy

On the other hand, Mockito considers that a spy is a real implementation in which you change only some specific behaviors. Instead of specifying every behavior one by one, you take an existing object that does most of it and you only change very specific behaviors, as sketched below.
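
A minimal sketch of the Mockito flavor, building on the example classes above:

AnotherComponent spy = Mockito.spy(new AnotherComponentImpl());

// Only the behavior for 3 is overridden; every other call hits the real implementation.
Mockito.doReturn(100).when(spy).inc(3);

Assert.assertEquals(Integer.valueOf(100), spy.inc(3));
Assert.assertEquals(Integer.valueOf(5), spy.inc(4));

// The spy also recorded the interactions for later verification.
Mockito.verify(spy).inc(4);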

Conclusion

To sum up:

A dummy is just there to enable compilation and is not supposed to take part in the test.

A stub provides canned answers to the CUT's invocations in order to enact a specific scenario.

A fake is a partial implementation that can be used either in a component test or in a deployed setting.

A mock is a partial implementation that enables asserting on the component interactions.

A spy is either a recorder for later use or a proxy on a real implementation that is used to override some specific behaviors.

Introduction

In this post, I would like to discuss a number of definitions around the testing activity. Having these definitions in mind helps to organize this crucial activity. In a previous post, I discussed the difference between verification and validation. If the difference is not clear to you, please have a look at it prior to reading this post.

Let me start with the definition of testing. Software testing helps to measure the quality of software in terms of defects. It is crucial to understand that
“testing shows the presence, not the absence of bugs” [1].

This comes from the fact that exhaustive testing is not possible due to a phenomenon called state space explosion [2]. The idea is that exhaustive testing would require a structure in memory that remembers all the tested states of the system, a state of the system being the concatenation of its variables. For instance, let us take a program that has two variables:
- an integer (4 bytes = 32 bits)
- an array of ASCII characters of length 10 (10 bytes = 80 bits)
The number of states to explore is 2^112 ≈ 5×10^33 (remember that the number of atoms in the observable universe is about 10^80) and, at 14 bytes (112 bits) per state, the required amount of memory would be about 7.2×10^22 terabytes. Although many optimizations can be applied to a brute-force approach [1,2], the problem remains huge. Therefore, exhaustive testing is not an option.

Another important point about defects is to understand from where they originate.
A (software) defect originates in a human mistake (e.g., a misunderstanding) that produces a fault (i.e., a defect, a bug). Under certain circumstances, the faulty code will end up doing something unexpected with respect to the user requirements. This is called a failure.

To sum up, there is the following causality chain: Mistake -> Fault -> Failure.

This demonstrates that testing is not only a matter of detecting failures; it can be done earlier. Of course, the earlier the defect is detected, the cheaper it is to address. For instance, informing the developer about the business may avoid a mistake. Using automated code checkers may detect some faults.

Testing dimensions

Testing can be characterized in terms of dimensions. These dimensions help to categorize the test types.

What: This dimension describes the objectives of the tests. Test objectives vary from one approach to another.
Usually the objectives are the verification or the validation of functional (e.g., portfolio performance) and non-functional (e.g., performance, security) requirements.

How: This defines how the test objective is achieved. For instance, tests can be static or dynamic, in isolation or in integration, with or without knowledge of the implementation.

When: Tests can be executed at different moments of the development process. For instance, component testing can be done very early in the development process, while user acceptance tests can only be performed when the software is ready for prime time.

Who: Different kinds of tests are run by different people (e.g., developers, testers, end-users). For instance, component testing can be done by programmers, while user acceptance tests are performed by end-users.

Testing level

Testing levels have been addressed in a number of publications, blog posts and talks [3], [4], [5]. Testing levels describe test types by their quantity and by when they occur in the software lifecycle. At the base, tests are done early in the development and extensively. The higher the level, the later the test occurs in the lifecycle. Moreover, while lower levels are usually handled by the software supplier, higher levels tend to be performed by the customer.

Testing Levels

Static testing

This sort of testing does not require executing the code. Tools crawl the code and look for patterns that can lead to faults. Examples of such tools are FindBugs and PMD. This kind of testing is especially useful to detect complex mistakes involving thread-safety or typing.

Unit testing

Test objects are isolated components (classes, packages, programs, …). To promote isolation, test objects such as stubs, fakes or mocks can be used; for more information see Fakes, Stubs, Dummies, Mocks and all that. These tests happen during development and discovered bugs are fixed right away. Therefore, the management overhead is minimal. Verification of both functional and non-functional requirements can be addressed.

Integration testing

Integration testing (a.k.a. assembly testing) verifies the integration between several components. At this level, some components can still be faked to ease deployment and isolation.
Verification of both functional and non-functional requirements can be addressed.

API testing

This is the first test level that addresses validation instead of verification. It tests the software using its contracts (APIs). This is pure black-box testing, usually done through web services. Tools such as SoapUI are very good at testing the software's API and semantics.

GUI testing

This level acts on the graphical user interface. An example of such a tool is Selenium.

System testing

This test level targets the system as a whole, with every internal and external component.
Verification of both functional and non-functional requirements can be addressed.

Acceptance testing

This level validates, usually with the end-users, that the system fulfills the user requirements. Both functional and non-functional requirements can be addressed.

Bibliography

[1] Dijkstra (1969). J.N. Buxton and B. Randell, eds, Software Engineering Techniques, April 1970, p. 16. Report on a conference sponsored by the NATO Science Committee, Rome, Italy, 27–31 October 1969. Possibly the earliest documented use of the famous quote.

I hereby start a series of tutorials on the major (IMHO) JEE 6 features. As a lecturer, I teach the JEE stack at the University of Geneva. This tutorial covers most of the topics that I address during the EJB, CDI, and JPA lessons.

As the EJB 3.1 specification is rather well written, I encourage people to read it or at least to look at specifics when needed.

To support this tutorial you can find a JEE6-Demo on GitHub. It contains a simple enterprise application that demonstrates many of the
aspects I discuss hereafter. The first step of this tutorial is to prepare the environment:

1) Install GlassFish and integrate it into your favorite development environment (Eclipse in my case). You can get GlassFish from here. Grab the zip version and install it by unzipping the archive.

2) Run GlassFish and check that it works. Locate the script named asadmin in the bin directory, start it and, at the prompt, execute the command start-domain. This starts the GlassFish instance.
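
For reference, a minimal sketch of these commands (the installation path is an assumption):

cd glassfish3/glassfish/bin
./asadmin start-domain
./asadmin start-database   # starts the bundled Derby instance used below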

Now that the database is started, it is time to populate it using the database creation scripts as well as some data.
In the project, under the EJB components, there is an SQL file that initializes the database structure and inserts some data.

Derby provides a command line tool called ij to connect to the database and to execute SQL scripts:
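
A minimal sketch of such an ij session (JDBC URL, credentials and script name are assumptions):

java -cp $DERBY_HOME/lib/derbyclient.jar:$DERBY_HOME/lib/derbytools.jar org.apache.derby.tools.ij
ij> connect 'jdbc:derby://localhost:1527/demo;user=app;password=app';
ij> run 'init-database.sql';
ij> exit;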

The connection pool provides a… pool of connections that is managed by the application server. A major benefit is that the developer does not have to open and close connections.
The server opens n connections and allocates them on demand. It ensures that there will not be 1000 open connections to the database, thus improving performance.
Then, we create a datasource that uses the previous connection pool:
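
A minimal sketch with asadmin (the pool, database and JNDI names are assumptions):

asadmin create-jdbc-connection-pool \
  --datasourceclassname org.apache.derby.jdbc.ClientDataSource \
  --restype javax.sql.DataSource \
  --property serverName=localhost:portNumber=1527:databaseName=demo:user=app:password=app \
  demo-pool
asadmin create-jdbc-resource --connectionpoolid demo-pool jdbc/demoDS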

The tutorial is organized in three parts. First, in this post, we will look at the EJB stack. In following posts, we will cover both the Context and Dependency Injection and the Java Persistence API.

EJBs

The EJB API provides useful non-functional services to Java Enterprise Applications, such as transactionality, security, pooling, and thread-safety.
Server-side components that implement this API and wrap business logic are called Enterprise Java Beans (EJBs). While the first versions were infamously known for their boilerplate code as well as their low testability,
EJB 3.x is rather easy to use thanks to a design paradigm called Convention over Configuration. The idea is to specify only the unconventional aspects rather than everything. This dramatically reduces the boilerplate code and improves readability.

Furthermore, the massive usage of annotations instead of XML descriptors greatly improves productivity. In the sequel, I mostly rely on annotations. Nevertheless, every annotation has its counterpart in a configuration descriptor called ejb-jar.xml.

In case there are two different values for the same property, remember that the one from the ejb-jar.xml always overrides the annotation.

Enterprise Java Beans come in two kinds:

Session Beans: components whose execution is triggered by a client call. The call can be local (within the JVM) or remote (from another JVM).

Message Beans: execution is triggered by a message and the business logic is processed asynchronously. Please note that, since JEE6, there are other (simpler) ways to process logic asynchronously. Nevertheless, Message Beans
remain very useful to implement bus and/or publish-subscribe patterns and to enforce loose coupling between the client and the server component.

Session Beans

As mentioned previously, session beans (merely) react to client invocations. The bean can either take the client session into account (thus sharing a state between multiple client invocations) or treat each call as unrelated.
In the first case, the EJB is called Stateful, while in the latter it is called Stateless.

Stateless Session Beans

Let us start with a Stateless bean. The following Java class describes a Stateless EJB.
The most important part is the annotation @Stateless at the top of the class.
This marks the class as an EJB. When the container instantiates this class, it knows that
it is a so-called Managed Bean. This means that the container manages the bean's lifecycle (e.g., instantiation, passivation). Furthermore, as an EJB, it has access to the container services such as security and transactionality.

The following snippet describes a Stateless Session bean that computes a grade distribution. The method is secured and only authorized for clients that have the role user, through the @RolesAllowed({ "user" }) annotation.

@Stateless
public class StudentServiceJPAImpl implements StudentService {
    ...
    @Override
    @RolesAllowed({ "user" }) // This is role-based security.
    public final Integer[] getDistribution(final int n) {
        numberOfAccess++;
        Integer[] grades = new Integer[n];
        for (Student s : this.getAll()) {
            grades[(s.getAvgGrade().intValue() - 1) / (TOTAL / n)]++;
        }
        return grades;
    }
    ...
}

Please note that the previous EJB implements an interface. An EJB can be local (invokable from within the container),
remote (invokable from another JVM) or both. In the previous case, it is a local interface and therefore the bean is only
invokable locally.

@Local
public interface StudentService extends Serializable {

    /**
     * @return an array that contains the distribution of the grades. It
     *         partitions the grades in "n" parts.
     * @param n : number of parts
     */
    int[] getDistribution(final int n);
}

Adding remote invocation to the implementation amounts to adding an interface annotated with @Remote.
Of course, the interface may be different and may expose different services locally and remotely.

@Remote
public interface StudentServiceRemote extends Serializable {

    /**
     * @param lastname
     *            of the student
     * @return the student with the given lastname.
     */
    Student getStudentByLastName(String lastname);
}

The implementation must be changed as follows to be remotely invokable:
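
A minimal sketch of that change (the class simply implements the remote interface in addition to the local one):

@Stateless
public class StudentServiceJPAImpl implements StudentService, StudentServiceRemote {
    ...
}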

To use this EJB, we need client code. The easiest possible interaction is either via a JAX-RS service or via a servlet.
In the JEE6-Demo, under the JEE6-WEB project, the JAX-RS facade is called StudentFacade.java. It implements a JAX-RS facade over the StudentService.

The method getDistribution() uses the student service that is injected via @EJB into the facade. Dependency injection is now a first-class citizen in the JEE6 stack.

@ApplicationScoped // This sets the scope to application level
@Path("/studentService")
public class StudentServiceFacade implements Serializable {
    ...
    @EJB
    StudentService studentService;

    @GET
    @Produces({ "application/xml", "application/json" })
    @Path("distribution")
    public Response getDistribution() {
        Integer[] distrib = studentService.getDistribution(10);
        return Response.ok(new DistributionDto(distrib)).build();
    }
}

The following snippet illustrates the injection of a Stateless EJB into a Servlet.

@WebServlet("/student")
public class StudentServlet extends HttpServlet {

    private static final long serialVersionUID = 1L;

    @EJB
    StudentService studentService;

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.getOutputStream().println(
                "There are #" + studentService.getAll().size() + " students in the database");
    }

    @Override
    protected void doPost(HttpServletRequest arg0, HttpServletResponse arg1)
            throws ServletException, IOException {
        this.doGet(arg0, arg1);
    }
}

As you can see, implementing EJBs (and invoking them) is rather easy. So far, we have not talked much about transactionality and security, so you might wonder why to use EJBs for such simple business services. The answer is pooling and thread-safety. As the container manages a pool of EJBs, it handles the invocation queue and you do not have to manage reentrance and thread-safety.

Remember that there is a one-to-many relationship between a stateless session bean and its clients. In other words, one instance may serve different clients. Hence, there must be no field that represents a client state in a Stateless EJB, because there is no guarantee that a given client will get the same SLSB across two invocations. Nevertheless, within a given invocation, the client is guaranteed to be the only user.

Consider the following exercises.

Exercise 1: Add a new method to the StudentService service that computes the standard deviation and use this method in the JAX-RS facade.

Exercise 2: Add a counter that is incremented each and every time a method is invoked. Print out the thread-id (and if possible the session-id) and explain why the so-called stateless beans must be without state. Add a rest service to interface this method.

Transactionality

Transactions are important concepts when dealing with enterprise applications. It is of vital importance that, whenever something goes wrong, the transaction is rolled back. This preserves the ACID properties. Managing database transactions with JDBC is complex, to say the least. Remember that we often have to manage transactions over several databases and even between databases and messaging queues. If the processing of a message leads to a database exception, we might want to put the message back in the queue.

Luckily for us, JEE 6 manages (almost) everything by itself. By default, a bean is transactional and each method either reuses a running transaction or creates a new one if necessary.

By default the transaction (or transaction context) is propagated to other business methods that are called from within a transactional method.

In JEE, there are two types of transaction management:

Bean Managed: in this case, the bean is annotated with @TransactionManagement(TransactionManagementType.BEAN). In this mode, the developer must manage the transaction himself.

Container Managed: this is the default behavior and it corresponds to marking the bean with @TransactionManagement(TransactionManagementType.CONTAINER). This mode is called Container Managed Transaction Demarcation. The idea is that every public method is transactional and is marked as REQUIRED.

Because the previous snippet relies on the default convention, it is equivalent to:

@MessageDriven(mappedName = "MyQueue")
@TransactionManagement(TransactionManagementType.CONTAINER)
@TransactionAttribute(TransactionAttributeType.REQUIRED)
public class StudentRegistrationService implements MessageListener {

    @Inject // Let's ignore this for the moment
    private transient Logger logger;

    @PersistenceContext
    EntityManager em;

    @Resource // Let's ignore this for the moment
    private MessageDrivenContext mdbContext;

    public void onMessage(Message inMessage) {
        TextMessage msg = null;
        try {
            ...
            msg = (TextMessage) inMessage;
            logger.info("MESSAGE BEAN: Message received: " + msg.getText());
            logger.info(Thread.currentThread().getName());
            ...
        } catch (JMSException e) {
            // With container-managed transactions, ask the container to roll back
            // (a JTA entity manager does not allow em.getTransaction()).
            mdbContext.setRollbackOnly();
            ...
        }
    }
}

In other terms, it uses container-managed transaction demarcation (@TransactionManagement(TransactionManagementType.CONTAINER)).
Moreover, each and every public method either reuses an existing transaction or creates a new one if none has been started (@TransactionAttribute(TransactionAttributeType.REQUIRED)).

The @TransactionAttribute annotation can also be put directly on the method. Valid options are:

REQUIRED: if there is an existing transaction then it reuses it, otherwise it creates a new one.

REQUIRES_NEW: it always creates a new transaction

NEVER: if there is an existing transaction then it fails

MANDATORY: if there is an existing transaction then it reuses it, otherwise it fails

NOT_SUPPORTED: If there is a transaction then it suspends it and it does not propagate the transaction to other business methods, otherwise it does nothing.

SUPPORTS: If a transaction exists then it works as for REQUIRED, otherwise it works as for NOT_SUPPORTED.

Consider the following exercises.

Exercise 3: Test the different types of transaction attribute and observe their respective behavior.

Exercise 4: Are the snippets 1 and 2 semantically equivalent?

Exercise 5: Mix the two types of transaction management.

Security

Another great feature of EJBs is their support for declarative authorization and their integration with the JAAS framework. JAAS is the Java Authentication and Authorization Service.
It provides unified services to access user directories for user authentication and authorization. As the client is already authenticated, at the EJB level only authorization matters.

After authentication, a principal (e.g., user name, role) is set in the context. This principal is used for authorization.

The main annotation is @RolesAllowed, followed by a list of authorized roles. As an example, look at the method getDistribution of StudentServiceJPAImpl.java. The method is only allowed for clients that belong to the group/role user.

Sometimes, a method of an EJB is not executed in the context of an authenticated user, or the authenticated user does not have enough credentials. If it is still important to run the method, you can use the @RunAs annotation. This basically overrides the current security context (and the associated principal), similarly to sudo on UNIX.

In any EJB, it is possible to get the current client session and therefore the principal:

@Resource
SessionContext ctx;

public void doSomething() {
    // Obtain the principal.
    Principal principal = ctx.getCallerPrincipal();
    // Obtain the name.
    String name = principal.getName();
    // Is the caller an admin?
    ctx.isCallerInRole("admin");
}

Exercise 6: Change the method getDistribution to only allow clients that belong to the admin group. Log in as a user and as an admin, and check the servlet as well as the JAX-RS service. What do you observe?

Exercise 7: Add a new Role to the application, and use it. Warning, some configurations are vendor specific.

Exercise 8: Create a service that uses @RunAs to upgrade the credentials of a standard user in order to call another service that requires the admin level.

Callbacks and Interceptors

Interceptors are used to enhance business method invocations and the beans' lifecycle events. Interceptors implement the AOP paradigm.

Interceptors are of two kinds:
- Lifecycle event interceptors, such as @PostConstruct, @PostActivate, @PrePassivate, and @PreDestroy, which make it possible to add logic when the bean is created, activated, passivated, or destroyed.
- Call-based interceptors, which rely on the @AroundInvoke annotation.

In the first case, adding the annotation before the method declaration is enough. It will then be called when the lifecycle event occurs.
In the latter case, in addition to the annotation the developer must configure the ejb-jar.xml file.

Here is a simple interceptor that prints out the time consumed by a method call:
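
The original listing is not reproduced here; a minimal sketch, consistent with the PerformanceEJBInterceptor referenced below, could be:

package ch.demo.business.interceptors;

import javax.interceptor.AroundInvoke;
import javax.interceptor.InvocationContext;

public class PerformanceEJBInterceptor {

    @AroundInvoke
    public Object measure(final InvocationContext ctx) throws Exception {
        long start = System.currentTimeMillis();
        try {
            // Proceed to the next interceptor or to the actual business method.
            return ctx.proceed();
        } finally {
            System.out.println(ctx.getMethod().getName() + " took "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }
}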

In addition to the interceptor declaration, it must be activated by means of either an annotation on the target EJB or via the ejb-jar.xml file. If several interceptors are declared, the order of declaration is the order of execution: the last one is the most nested interceptor.

In the following snippet, the interceptor is applied to all methods whose name matches get* in EJBs whose name matches Student*. It is also possible to specify the parameters' types in case of overloading.

<assembly-descriptor>
    <!-- Default interceptor that will apply to all methods for all beans in the deployment -->
    <interceptor-binding>
        <ejb-name>Student*</ejb-name>
        <method>
            <method-name>get*</method-name>
        </method>
        <interceptor-class>ch.demo.business.interceptors.PerformanceEJBInterceptor</interceptor-class>
    </interceptor-binding>
    ...
</assembly-descriptor>

Exercise 9: Using the annotation-based approach, add an interceptor to all public methods of StudentServiceJPAImpl.java that increments an overall (common to all clients) usage counter.

Exercise 10: Do the same, but using the ejb-jar.xml.

Timers

Some services are not directly linked to a client session. This is the case for batch processes that must run every x hours. To that end, EJB 3.1 provides the annotation @Schedule to specify how often a service method must be called. This is similar to the CRON feature of UNIX systems.

The following example runs the method doSomethingUseful every 30 minutes.

@Schedule(minute = "*/30", hour = "*")
public void doSomethingUseful() {
    ...
}

Exercise 11: Add a scheduler that runs a method every minute to print out the number of students in the database.

Exercise 12: @Schedule is often used in conjunction with @RunAs. Why?

Asynchronous calls

Asynchronous calling means that control is returned to the caller before the process has completed. Take, for instance, a batch job that loads 1000 customer descriptions from one database, processes them and finally saves them in another database. From a performance point of view, it is better to split the process into x batches and then to wait until everything is finished. Firstly, if the asynchronous call is made to another EJB, it is possible to leverage the pool to process in parallel without managing multi-threading and race conditions. Secondly, as the different calls may use different database connections, it limits the database bottleneck.

To do this, JEE provides the @Asynchronous annotation. Any method that returns either nothing (void) or an instance of type Future<T> can be run asynchronously.

The Future interface exposes the method get, which blocks until the job is over.
The following class exposes the method processJob that runs asynchronously.
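
The original listing is not reproduced here; a minimal sketch could be:

@Stateless
public class JobService {

    @Asynchronous
    public Future<String> processJob(final String name) {
        // Long-running work would go here.
        return new AsyncResult<String>("Job " + name + " done");
    }
}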

The method processJobs starts the method processJob 10 times and puts the resulting Future<String> instances in an array. Then it iterates over the array and waits until all calls have returned.
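
A minimal sketch of that caller (names are assumptions):

public void processJobs() throws Exception {
    List<Future<String>> results = new ArrayList<Future<String>>();
    for (int i = 0; i < 10; i++) {
        // Returns immediately; the work happens in the EJB pool.
        results.add(jobService.processJob("job-" + i));
    }
    for (Future<String> result : results) {
        // get() blocks until the corresponding job is over.
        System.out.println(result.get());
    }
}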

Exercise 13: Create an EJB that invokes an asynchronous method 100 times. The asynchronous method should wait a random time between 5 and 10 seconds before returning the actual waiting time. Collect all the waiting times and compute the average.

Singleton Beans

Singleton beans (@Singleton) are a special kind of EJB that exists only in one instance in the container. This is, for instance, useful to initialize the application. Thus, @Singleton can be associated with @Startup to start the EJB during the container's startup.

As there is only one instance, the bean is meant to be shared. Therefore, it is important to specify the locking policy. There are two types of locking:
- Container-managed concurrency management (@ConcurrencyManagement(ConcurrencyManagementType.CONTAINER)). This is the default behavior and it is highly recommended to stick to it. Annotating a method (or the class) with @Lock(LockType.READ) means that the method can safely be accessed concurrently. @Lock(LockType.WRITE) requires the calls to the method to be serialized. Methods are @Lock(LockType.WRITE) by default.
- Bean-managed concurrency management (@ConcurrencyManagement(ConcurrencyManagementType.BEAN)). This requires the developer to use synchronized and volatile to achieve correct concurrency.

By default, the class is marked as @Lock(LockType.WRITE); thus EVERY CALL is serialized. This is probably not the expected behavior (at least most of the time) and produces a huge bottleneck. Make sure to set the proper lock policy on the class and to only put @Lock(LockType.WRITE) where needed.

The following bean initializes a shared variable during container startup and locks modifications to only one thread (client) at a time.
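
The original listing is not reproduced here; a minimal sketch could be:

@Singleton
@Startup
public class PostalCodeCache {

    private List<String> postalCodes = new ArrayList<String>();

    @PostConstruct
    void init() {
        // Load the shared data during the container's startup.
        postalCodes = loadPostalCodesFromDatabase();
    }

    @Lock(LockType.READ)
    public List<String> getPostalCodes() {
        return postalCodes;
    }

    @Lock(LockType.WRITE)
    public void refresh() {
        // Only one thread (client) at a time may modify the shared state.
        postalCodes = loadPostalCodesFromDatabase();
    }

    private List<String> loadPostalCodesFromDatabase() {
        // Hypothetical helper: would fetch the codes via JPA/JDBC.
        return new ArrayList<String>();
    }
}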

Exercise 14: Write a singleton that reads the postal code from the database during startup, update them every 30 minutes. Take care of the concurrency aspects.

Stateful Session Beans

Unlike stateless session beans, stateful session beans maintain a state across several client invocations. This is because there is a one-to-one relationship between a client and a stateful session bean.

The following code snippet describes a stateful session bean that maintains a counter across several invocations.
A stateful session bean is annotated with @Stateful. As it maintains a session state, it is important to set a timeout; otherwise, the number of open sessions would grow and consume the server's memory.
The timeout is defined using the @StatefulTimeout annotation.

Because there is no instance sharing among several clients, stateful session beans are more resource-demanding than stateless session beans. Therefore, they should be used with great care.

@Stateful
@StatefulTimeout(unit = TimeUnit.MINUTES, value = 30)
public class StudentStatisticsServiceImpl implements StudentStatisticsService {

    private Long numberOfAccess = 0L;

    @Override
    public String getStatistics() {
        return "The student service has been invoked #" + numberOfAccess + " in this session";
    }

    public void count() {
        numberOfAccess++;
    }
}

As a stateful session bean is not per se linked to an HTTP session, the application client code must remember which instance of the EJB is dedicated to which client.
When used in conjunction with a servlet, stateful beans are therefore often put in the HTTP session for further reuse.

@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {

    StudentStatisticsService statistics = (StudentStatisticsService) request.getSession()
            .getAttribute(StudentStatisticsService.class.toString());

    if (statistics == null) {
        try {
            InitialContext ctx = new InitialContext();
            statistics = (StudentStatisticsService) ctx.lookup(
                    "java:global/JEE6-EAR/JEE6-EJB-0.0.1-SNAPSHOT/StudentStatisticsServiceImpl");
            request.getSession().setAttribute(StudentStatisticsService.class.toString(), statistics);
        } catch (NamingException e) {
            throw new RuntimeException(e);
        }
    }
    statistics.count();
    response.getOutputStream().println(
            "There are #" + studentService.getAll().size() + " students in the database");
    response.getOutputStream().println(statistics.getStatistics());
    response.getOutputStream().println(studentService.getStatistics());
}

Either the EJB is looked up with JNDI and then put into the HTTP session or, when using CDI, the link is made based on the caller's scope.
The class StudentServiceFacade.java describes how the injection automatically links the correct instance of a stateful session bean to a given client based on the @SessionScoped annotation.

Exercise 15: Show that two sessions share the same overall count but not the same per session count.

As the container does not share the stateful instances, it might run out of memory; in that case, it will try to save the state of the bean to disk (or to a database). This operation is called Passivation. Afterwards, when the client comes back and requires its stateful instance, the container loads it from the disk. This operation is called Activation.

Some operations, such as initializing/cleaning up a file, a socket connection, or a database connection, can be performed before passivation and/or after activation using the
@PrePassivate and @PostActivate annotations.

Consider the following exercises:

Exercise 16: Stop the server and observe that the bean has been passivated.

Exercise 17: Start the server and observe that the bean has been activated.

Whenever the user logs out and thus invalidates the HTTP session, you might want to clean up the EJB state and release the resources. To achieve this, the client must call a method annotated with @Remove. As soon as the client call is over, the container can discard the bean. The idea is simply to get ahead of the timeout.

Exercise 18: Implement two methods annotated with @Remove with two different behaviors. Check that both will discard the current instance. Hint: you can use @PreDestroy to check the bean's destruction.

Message Driven Beans

Message-driven beans (MDBs) are very useful for asynchronous processing and for decoupling the client from the processor. MDBs implement a typical publish/subscribe architecture.

There are two kinds of destinations an MDB can listen to:

Topics: pure publish/subscribe semantics. A new message will be processed by every listener.

Queues: closer to load balancing; once a message is consumed by one of the listeners, it
is removed from the queue.

Although JEE 6 provides another mechanism for asynchronous processing (asynchronous session bean invocations via @Asynchronous), MDBs remain the best alternative if you do not want the client to be aware of the asynchronous logic and of the processor.

The idea is that there is a queue (or a topic) in between. Therefore, the client has no knowledge whatsoever of the message processor's contract or semantics.
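
As an illustration, a minimal message processor could look like the sketch below (the class name and the mappedName binding are assumptions; how a queue is bound to an MDB is partly application-server specific):

import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

@MessageDriven(mappedName = "jms/MyQueue")
public class MyQueueProcessor implements MessageListener {

    @Override
    public void onMessage(Message message) {
        // The producer never sees this class: the queue decouples them.
        try {
            if (message instanceof TextMessage) {
                System.out.println("Processing: " + ((TextMessage) message).getText());
            }
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}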

The first step is to configure a Queue Connection Factory as well as a Queue in the application server.

The following servlet gets the connection factory as well as the queue injected. As always with dependency injection, the same result can be achieved using JNDI lookups.
For each request to the servlet, 100 messages are generated and put in the queue called MyQueue.
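
The servlet listing itself is not reproduced here; the following is a minimal sketch of what it could look like, assuming the JMS 1.1 and Servlet 3.0 APIs (the JNDI names jms/MyQueueConnectionFactory and jms/MyQueue are illustrative):

import java.io.IOException;
import javax.annotation.Resource;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/produce")
public class MessageProducerServlet extends HttpServlet {

    // The connection factory and the queue configured in the application server.
    @Resource(mappedName = "jms/MyQueueConnectionFactory")
    private ConnectionFactory connectionFactory;

    @Resource(mappedName = "jms/MyQueue")
    private Queue queue;

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        Connection connection = null;
        try {
            connection = connectionFactory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            // Generate 100 messages per request, as described above.
            for (int i = 0; i < 100; i++) {
                producer.send(session.createTextMessage("Message #" + i));
            }
            response.getOutputStream().println("100 messages sent to MyQueue");
        } catch (JMSException e) {
            throw new ServletException(e);
        } finally {
            if (connection != null) {
                try {
                    connection.close();
                } catch (JMSException ignored) {
                    // Nothing sensible to do on close failure.
                }
            }
        }
    }
}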

Rich object model vs. anemic object model is a long-running debate. While the latter encourages simple and stupid objects with little or no business logic in them, the rich object model advocates a clean object design with inheritance, polymorphism and so on.
The anemic object model is very popular among JEE practitioners because, in the past, the specification did not provide any means to invoke services from within business objects. Therefore, the anemic pattern uses so-called “managers” that maintain references to other “managers”. A direct benefit is the clear separation of concerns between the different kinds of objects: basically, it splits processing and data. As this is anti-object-oriented, the abstract design of such a system is often very different from the actual implementation.

The portfolio example

As an example, let us take a portfolio that contains a set of financial positions. A financial position can be either a set of stocks or an amount in a given currency. To evaluate the actual portfolio value, we go through the positions and, for each of them, ask the QuoteService for the current stock quote or the CurrencyService for the current value of the given currency.
The next figure presents the “ideal” design.

An object oriented class diagram of the Portfolio management component.

To achieve this, one needs to access services from within business objects. Since EJB 3.1, Contexts and Dependency Injection (CDI) provides such a mechanism via the @Inject annotation. The only requirement is that the object that requires the service, as well as the service to inject, are so-called “managed beans”. The catch is that not all objects are meant to be managed. Furthermore, having managed lists of objects is very tricky, to say the least. Fortunately, the EJB 3.1 and, more specifically, the CDI 1.0 specification provide a way to solve this.
In CDI, the main component is the bean manager. This manager keeps track of the beans to inject via @Inject and other means. Instead of relying on annotations to provide injection, it is possible to use the good old Service Locator pattern. CDI 1.0 exposes the bean manager on JNDI with the name java:comp/BeanManager.

import java.util.Set;
import javax.enterprise.context.spi.CreationalContext;
import javax.enterprise.inject.spi.Bean;
import javax.enterprise.inject.spi.BeanManager;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class ServiceLocator {

    @SuppressWarnings("unchecked")
    public static <T> T getInstance(final Class<T> type) {
        T result = null;
        try {
            // Access the current naming context.
            InitialContext ctx = new InitialContext();
            // Resolve the bean manager.
            BeanManager manager = (BeanManager) ctx.lookup("java:comp/BeanManager");
            // Retrieve all beans of that type.
            Set<Bean<?>> beans = manager.getBeans(type);
            Bean<T> bean = (Bean<T>) manager.resolve(beans);
            if (bean != null) {
                CreationalContext<T> context = manager.createCreationalContext(bean);
                if (context != null) {
                    result = (T) manager.getReference(bean, type, context);
                }
            }
        } catch (NamingException e) {
            throw new RuntimeException(e);
        }
        return result;
    }
}

The client code is very simple: it consists of calling the ServiceLocator with the desired interface, as sketched below.
For the sake of clarity, I did not show the ServiceLocator that takes a qualifier in addition to the interface. To add this feature, look at the getBeans(Type beanType, Annotation... qualifiers) method.
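
For instance, a sketch of a business object resolving a service through the locator could look like this (QuoteService and its getQuote method are assumptions based on the portfolio example, not part of the original listing):

// Hypothetical business object from the portfolio example.
public class StockPosition {

    private final String symbol;
    private final long quantity;

    public StockPosition(String symbol, long quantity) {
        this.symbol = symbol;
        this.quantity = quantity;
    }

    public double getValue() {
        // The position is a plain object, not a managed bean, yet it can
        // resolve the CDI-managed service through the locator.
        QuoteService quoteService = ServiceLocator.getInstance(QuoteService.class);
        return quoteService.getQuote(symbol) * quantity;
    }
}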

Some thoughts on the Law of Demeter

Let me be clear: I do not recommend this approach everywhere. It is very important not to mix object responsibilities. Furthermore, in order to respect the Law of Demeter, a business object must not directly call something outside of the current component. Calls to other components should always go through so-called consumers in order to keep component boundaries clear.
For instance, putting too much intelligence into JPA entities that can be detached and serialized may cause problems on the client side.

Conclusion

In this post, I showed a solution to consume services that are exposed via the CDI BeanManager. These services can be pure POJOs or EJBs.
Nevertheless, this approach must be used with great care, as it can blur component boundaries and responsibilities.

Multi-tenancy is a recurrent non-functional requirement. Indeed, many important IT systems are meant to be shared among multiple tenants. The data are often distributed over several databases or schemas, for several reasons:

Security: The data belong to different customers and some level of isolation is required;

Performance: Distributing the data over multiple systems may help to keep performance issues under control;

Legacy: Sometimes, old and new systems must coexist for a (long) time;

Maintainability: A database or a schema can be updated without putting the rest of the application at risk.

Although data are distributed, the application code should remain tenant-agnostic. Furthermore, choosing between the different tenants is often done at runtime based on credentials (e.g. user Joe has access to customer AAA while user Jane sees the data of customer BBB). Java EE 7 will address this problem and much more but, in the meantime, here is the approach I use to address it with EJB 3.1 and JPA 2.0.

Overall architecture

First, let me start with the overall architecture as described below.

Multi-tenancy architecture with several datasources

In the above figure, the database is organized in schemas, with one application server datasource (DS) per schema and one persistence unit (PU) per datasource.
It is also possible to use only one datasource and to discriminate between schemas by setting the <property name="openjpa.jdbc.Schema" value="TenantX" /> property for each persistence unit (PU); this sets the default schema of the PU.
Here is a persistence.xml file that provides one persistence unit per tenant.
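
The original file is not reproduced here; a minimal sketch, assuming OpenJPA, one JTA datasource per tenant, and illustrative datasource names, could look like this:

<persistence xmlns="http://java.sun.com/xml/ns/persistence" version="2.0">
    <persistence-unit name="Tenant1" transaction-type="JTA">
        <provider>org.apache.openjpa.persistence.PersistenceProviderImpl</provider>
        <jta-data-source>jdbc/Tenant1DS</jta-data-source>
    </persistence-unit>
    <persistence-unit name="Tenant2" transaction-type="JTA">
        <provider>org.apache.openjpa.persistence.PersistenceProviderImpl</provider>
        <jta-data-source>jdbc/Tenant2DS</jta-data-source>
    </persistence-unit>
</persistence>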

The following code has been tested with OpenJPA, but there is nothing specific to this implementation apart from the <provider> tag in the persistence.xml file.

The basic idea is that, instead of using @PersistenceContext, we inject our “own” multi-tenant entity manager wrapper.
Then, at runtime, the multi-tenant entity manager loads, from JNDI, the persistence context that corresponds to the current user context.
Please note that this only works for JTA-based persistence units. Otherwise, the persistence context is not container-based and therefore not exposed to JNDI. Moreover, without JTA, we lose container-based transaction demarcation.

Let us start with the client code; in other words, how to use the multi-tenant entity manager.

Client Code

Here is the client code. In order to preserve thread-safety and transactionality, data access objects are EJBs (@Stateless, @Stateful, @Singleton). The presented solution uses an entity manager that is wrapped and then injected using @Inject or @EJB. Thread-safety, transactionality and performance are guaranteed by the EJB 3.1 and JPA 2.0 specifications, as explained in the section Thread-safety and Transactionality. As shown below, the MultiTenantEntityManagerWrapper delegates to a real entity manager and implements the EntityManager interface. Therefore, its use is very similar to that of a normal EntityManager injected via @PersistenceContext.
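
As a sketch, a tenant-agnostic data access object could look like this (the Student entity and the save method are assumptions):

import javax.ejb.EJB;
import javax.ejb.Stateless;

@Stateless
public class StudentDao {

    // Injected like any other EJB; no @PersistenceContext needed.
    @EJB
    private MultiTenantEntityManagerWrapper entityManager;

    public void save(Student student) {
        // Used exactly like a plain EntityManager; the wrapper resolves
        // the persistence context that matches the caller's tenant.
        entityManager.persist(student);
    }
}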

The Multi-Tenant EntityManager EJB

The MultiTenantEntityManagerWrapper simply wraps the entity manager that corresponds to the current user context. The trick is to configure it as an EJB in order to benefit from XML configuration via ejb-jar.xml. An alternative would be to use the @PersistenceContexts and @PersistenceContext annotations; the main drawback is that, for each new tenant, not only persistence.xml and ejb-jar.xml but also the Java code would have to be changed.

The JNDI context that is linked to the current request is injected into the MultiTenantEntityManagerWrapper using the @Resource annotation.
As no new InitialContext is created, the overhead is not significant. Actually, the @PersistenceContext annotation does the exact same thing, except that it is not specific to the user context. The MultiTenantEntityManagerWrapperImpl implements the delegate pattern, which allows it to be used (almost) transparently in client code.
The main difference is the use of @Inject or @EJB instead of @PersistenceContext in the client code.

Using the session context that is specific to the caller bean (and thus the caller request/session) enables transparent support for thread-safety, security and transactionality.

The method getMultiTenantEntityManager of the MultiTenantEntityManagerWrapperImpl extracts, from JNDI, the EntityManager that corresponds to the current request (we will see later how it was put there). To that end, getMultiTenantEntityManager first extracts the principal from the current EJB context (SessionContext). Then, the tenant that corresponds to the current user is used to build the JNDI name of the corresponding entity manager. MultiTenantEntityManagerWrapperImpl simply delegates every call to this request-specific EntityManager.

import java.security.Principal;
import javax.annotation.Resource;
import javax.ejb.SessionContext;
import javax.ejb.Stateless;
import javax.persistence.EntityManager;

@Stateless
public class MultiTenantEntityManagerWrapperImpl implements MultiTenantEntityManagerWrapper {

    private static final String JNDI_ENV = "java:comp/env/persistence/";

    @Resource
    SessionContext context;

    private EntityManager getMultiTenantEntityManager() {
        // Extract the name of the current user.
        Principal p = context.getCallerPrincipal();
        // Look up the tenant name for the current user.
        // This part is application specific.
        Users u = Users.getUser(p.getName());
        // Produces either TENANT1 or TENANT2.
        String tenantName = u.getSite().toString();
        String jndiName = new StringBuffer(JNDI_ENV).append(tenantName).toString();
        // Look up the entity manager in the current JNDI context.
        EntityManager manager = (EntityManager) context.lookup(jndiName);
        if (manager == null) {
            throw new RuntimeException("Tenant unknown");
        }
        return manager;
    }

    // The delegates
    @Override
    public void persist(Object entity) {
        getMultiTenantEntityManager().persist(entity);
    }

    @Override
    public <T> T merge(T entity) {
        return getMultiTenantEntityManager().merge(entity);
    }

    @Override
    public void remove(Object entity) {
        getMultiTenantEntityManager().remove(entity);
    }

    ...
}

Now let us see how to put the entity manager references in JNDI.
In order to avoid a lot of annotations (one per tenant), and therefore to be able to handle a large number of tenants, I propose to use the ejb-jar.xml file to configure the EJB instead of the @PersistenceContext annotation. The MultiTenantEntityManagerWrapper is configured as a stateless EJB. The persistence contexts are simply exposed to JNDI with the following pattern: java:comp/env/persistence/TENANTX. For more information, please look at the EJB 3.1 specification, chapter 16.11.1.
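
A minimal sketch of such an ejb-jar.xml for two tenants could look like this (the reference and unit names follow the pattern described here; the remainder is an assumption):

<ejb-jar xmlns="http://java.sun.com/xml/ns/javaee" version="3.1">
    <enterprise-beans>
        <session>
            <ejb-name>MultiTenantEntityManagerWrapperImpl</ejb-name>
            <session-type>Stateless</session-type>
            <persistence-context-ref>
                <persistence-context-ref-name>persistence/TENANT1</persistence-context-ref-name>
                <persistence-unit-name>Tenant1</persistence-unit-name>
            </persistence-context-ref>
            <persistence-context-ref>
                <persistence-context-ref-name>persistence/TENANT2</persistence-context-ref-name>
                <persistence-unit-name>Tenant2</persistence-unit-name>
            </persistence-context-ref>
        </session>
    </enterprise-beans>
</ejb-jar>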

<persistence-unit-name>Tenant1</persistence-unit-name> is the name of the PU as defined in the persistence.xml file. <persistence-context-ref-name>persistence/TENANT1</persistence-context-ref-name> defines the name under which the entity manager is exposed via JNDI.

Thread-safety and Transactionality

As this is compliant with both the EJB 3.1 and JPA 2.0 specifications, thread-safety and transactionality are guaranteed by the container. For more details, please look at the EJB 3.1 specification,
chapters 16.10 and 16.11, and the JPA 2.0 specification, chapter 7.6. Of course, the wrapper has to be an EJB in order to have access to the current JNDI context without having to create it.
Furthermore, because the EntityManager is not per se thread-safe (JPA 2.0, chapter 7.2), the serialization of invocations that the container provides for EJBs is essential to the thread-safety aspect (EJB 3.1, chapter 4.10.13).

Conclusion

In this post, I showed how to leverage standard EJB 3.1 and JPA 2.0 features to provide multi-tenancy. The presented approach is thread-safe, preserves transactionality and does not induce
a significant overhead.

A couple of days ago, I had a discussion with a developer about the notion of web conversation;
more precisely, about its utility. During this discussion, we ran into some basic misconceptions
that, IMHO, no web developer should hold. Conversations (or flows) are good features of recent
frameworks (Spring, JSF) because they allow information to be saved across several user requests
without putting it into the session. For instance, this can be very useful for wizards. Putting
this information into the session requires manual maintenance, and in particular manual cleanup.
This is usually error-prone and should be avoided. The problem is that the user rarely logs out
before closing the browser. Thus, HttpSession.invalidate() does not get a chance to be called
and the session remains active until the timeout occurs (usually 20 minutes later).

The person I was talking with thought that it is not necessary to care about this. In his view,
the garbage collector will take care of it, and it can be forced. We discussed the matter a bit
deeper, and here are some myths that I think must be debunked.

First assumption: I can force the garbage collection

This is not true. According to the Java documentation:

Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects in order to make the memory they currently occupy available for quick reuse. When control returns from the method call, the Java Virtual Machine has made a best effort to reclaim space from all discarded objects.

System.gc() suggests garbage collection, it does not enforce it. It signals that you
would like the garbage collector to do its job, but there is no guarantee whatsoever. It would only
be possible to enforce garbage collection by writing your own garbage collector, and you cannot
expect all garbage collectors of all possible VMs to act similarly. Therefore, relying on this
assumption is not portable.
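
A tiny experiment illustrates the point; note that the output is free to show no change at all, which is precisely what “best effort” means:

public class GcHint {

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();

        byte[] garbage = new byte[10 * 1024 * 1024];
        garbage = null; // the array is now unreachable

        System.out.println("Free before hint: " + runtime.freeMemory());
        System.gc(); // a suggestion, not a command
        System.out.println("Free after hint:  " + runtime.freeMemory());
    }
}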

Second assumption: If I close the browser, the session and its objects are collected

Even if it were possible to enforce garbage collection, it would be of no use as long as there
remains a single reference to the objects to collect. And that is precisely the case here:
closing the browser does not mean anything to the server. It would, in theory, be possible to
imagine an Ajax call that traps the close event of the browser and calls
HttpSession.invalidate(). This is quite complex to do in a cross-browser manner and gives no
guarantee. Therefore, the data attached to the session will be kept in memory until the session
timeout. This is precisely the beauty of conversation scopes: the developer just tells when the
conversation starts and when it ends. Usually, users do not stop in the middle of a conversation,
they tend to finish it. At least, more often than they click on logout.

Third assumption: Anyway, this does not represent a huge amount of memory.

Let us take a simple example: a standard business application that, at some point in the business
process, requires a tree for shop selection. This kind of interaction requires several client-server
round-trips and therefore several requests. A temptation would be to put the tree in the session
scope. Let us say that 10,000 (logical) users log into the application, for instance from 09:00 am
to 09:30 am. If each session requires 100 KB (trees are big, even with lazy loading), we end up
with 10,000 × 100 KB ≈ 1 GB of memory for the trees alone.
As most users do not click on logout, you rely on the timeout to free these objects.

Conclusion

Using request scope or conversation scope instead of session scope is good practice, as it frees you
from cleaning up the session manually. The code is therefore cleaner and more efficient.
This can be done with CDI's @ConversationScoped (typically with JSF) or with Spring Web Flow's conversations.
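
To close, here is a minimal sketch of a conversation-scoped wizard bean using CDI (the bean and its methods are illustrative, not taken from a real application):

import java.io.Serializable;
import javax.enterprise.context.Conversation;
import javax.enterprise.context.ConversationScoped;
import javax.inject.Inject;
import javax.inject.Named;

@Named
@ConversationScoped
public class ShopSelectionWizard implements Serializable {

    @Inject
    private Conversation conversation;

    private String selectedShop;

    // First step of the wizard: promote the conversation from transient to
    // long-running so that the state survives subsequent requests.
    public void begin() {
        if (conversation.isTransient()) {
            conversation.begin();
        }
    }

    public void setSelectedShop(String selectedShop) {
        this.selectedShop = selectedShop;
    }

    public String getSelectedShop() {
        return selectedShop;
    }

    // Last step: end the conversation and release the state immediately,
    // without waiting for the session timeout.
    public String finish() {
        if (!conversation.isTransient()) {
            conversation.end();
        }
        return "confirmation";
    }
}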