QAware | Blog

Dec 20, 2017

Background

In Part 1 of this article, I described the motivation and scope of a research project into the perceptions of code quality held by different stakeholders. The project was conducted by a consortium that two scientists from Sweden and I led. The first set of results from that project was recently accepted for publication in the ITiCSE’17 proceedings [3], and that is what I report on in this article. Comics in this blog post courtesy of Geek & Poke.

Code quality is a key issue in software
development. One would expect, therefore, that such a basic concept is well
understood. In fact, we had expected that introductory courses in computing of higher
education programs would communicate this topic clearly and would be the
primary source of information for professionals.

It turns out, however, that
this is not the case. Actually, different groups of people have very different
perceptions of code quality, how they learn about the topic, and what
can (and should) be done to improve code quality.

Study method

In the study [3], we administered questionnaires
and conducted structured interviews with 34 students, educators, and
professional developers regarding their perceptions of code quality. Participants
brought along code from their own experience to discuss and exemplify code
quality. From this rich set of data, we have so far analyzed only the quantitative
part; that is, the study is so far effectively a survey.

Results 1: Quality Aspects

Quality was mostly described in terms of
indicators that could measure an aspect of code quality. Among these indicators,
readability [1, 5] was named most frequently by all groups. The next two most
frequent mentions were structure and comprehensibility. All other aspects (documentation,
dynamic behavior, testability, correctness, and maintainability) were mentioned
roughly equally often. There were few differences between different
subpopulations regarding readability as the single most important aspect, but
marked differences about all other aspects. As far as developers are concerned, comprehensibility
and correctness follow after readability, while educators name structure,
documentation, and dynamic behavior as the most important aspects after
readability. For students, structure is almost as important as readability,
while correctness is not mentioned at all by students. Similar findings were made when investigating the importance of identifiers in programs [2].

Summing up: developers, educators and students
seem to have quite different perceptions of what makes good code quality.

Results 2: Information Sources

We also looked at the sources people consult to learn about code quality. When asked about the most important
sources of information about code quality, both students and developers point
to colleagues, while educators refer to textbooks – which is hardly surprising.
However, while the internet was mentioned most often as a source by all groups,
no group referred to the internet as their dominant first choice. Specifically,
developers mentioned the internet only as a secondary point of reference.

So, while everybody talks about Stack Overflow and such, colleagues are really the most used information "resource" for developers and students. Academics mostly refer to textbooks.

Results 3: Tool support

Looking at tool support, we see that educators
and students rely heavily on Integrated Development Environments (IDEs), while
developers mention continuous integration more frequently. Interestingly
though, developers mention static analysis tools as their most frequent primary
choice of tools, while both educators and students mention it only as a
secondary tool of choice. We conclude from this that education should place
greater emphasis on static analysis tools and continuous integration as tools
for code quality assessment.

We conclude that static analysis is a topic underappreciated in academic teaching (or learning).

Summary & Outlook

Educators, students, and professional
developers have different opinions on what code quality is, and how they form
their opinions. Practitioners, it appears, follow an apprenticing model. The
internet is an important source of information, but not the first source
consulted. Tools are widely used, particularly static analysis tools such as
SonarQube [6] or the checks built into IDEs.

It is important to note that the data presented here should not be overstretched. It is (so far) only a humble survey after all, and many questions can be raised from a scientific validity point of view (see [3] for a more in-depth discussion). In particular, one can complain about the lack of sample representativeness, but, frankly, achieving that is virtually impossible anyway [4].

As pointed out above, we have yet to analyze
the data from the qualitative part of our study. We expect further clues as to
which role code quality plays. For an educational setting, we want to explore
further what students expect, and which teaching methods are the most
promising. This may, in turn, benefit industrial software engineering through
better-educated junior developers.

From a more personal perspective, the one thing I took away is the degree to which developers and students depend on their colleagues for information. This underscores once more how important communication and learning are in software development. So much for the cliché of the asocial coder in the dark...

[4] Robert Kraut, Judith Olson, Mahzarin R. Banaji, Amy Bruckman, Jeffrey Cohen, and Mick Couper. Psychological research online: Report of Board of Scientific Affairs’ Advisory Group on the conduct of research on the Internet. American Psychologist, 59:105–117, 2004.

Sep 20, 2017

Developing Eclipse SmartHome (ESH) bindings with an IDE other than Eclipse is difficult because the standard way of developing the bindings requires the tools provided by the Eclipse IDE. One can develop the ESH bindings with IntelliJ, but a crucial question remains open: how to test the developed bindings without having the Eclipse tools at hand?
Using IntelliJ to develop the ESH bindings differs from using Eclipse for two main reasons:

First: Eclipse uses a “manifest first” approach while the Java standard approach is “POM first” with a handwritten or generated manifest. In our example we have used a handwritten manifest.

Second: To debug and test in Eclipse, it is enough to start the full environment using the target platform provided by ESH. This starts a JVM with Equinox as a local process. To debug and test in IntelliJ, one should copy the binding jar built by Maven into a prepared Docker container and then start it.

Minimal Eclipse SmartHome for testing bindings in IntelliJ

Our approach is based on the packaging example of ESH in combination with Eclipse Concierge as OSGi Container.
A full Maven build with mvn clean package creates a new zip file within the /target directory. This zip file contains a full OSGi container, the Eclipse SmartHome basics, the Yahoo Weather Binding and an addons directory. The OSGi container scans the addons directory for bundles and deploys them automatically. This zip file can be used to create a Docker container.
In order to stop the container gracefully a small change in the start script is needed. A description of it can be found here: qaware/smarthome-packaging-sample.
We have updated and added some dependencies to the original repository. One of the updated dependencies is the Jetty webserver, which also includes its client in the latest version of the 9.3.x branch.

The Docker Container

All the following examples and commands are based on the qaware/smarthome-packaging-sample repository.
The Dockerfile creates an image which is runnable out of the box. It contains a full Eclipse SmartHome, which runs on an Alpine Linux with the current OpenJDK 8 Runtime Environment. The image has a final size of less than 140 MB.

You can find the Dockerfile within the forked repository.
To build and tag the container as eclipse/smarthome:latest, you need to run:

./mvnw clean package && docker build . -t eclipse/smarthome:latest

Now it is possible to start the container:

docker run -dit \
-p 127.0.0.1:8080:8080 \
eclipse/smarthome:latest

You can subsequently open the PaperUI and the System Console (username & password: admin).
The container can also be started in debug mode, which allows connecting an external debugger to the container. To start the container in debug mode use this command:

You can now connect the container with a Java debugger (like IntelliJ) using the address 127.0.0.1:5005.

Run and Debug an Eclipse SmartHome Binding within the Docker container

This part builds upon our previous blog post about the usage of Docker and IntelliJ: How to use Docker within IntelliJ
You can use the previously created Docker image for the development of bindings. To do that you need another small Dockerfile which copies the binding into /opt/esh/addons/ of the container:

You can place this Dockerfile into the main directory of the bundle, next to the Maven pom.xml, and subsequently use it for a new Docker deployment in IntelliJ as described in How to use Docker within IntelliJ. In IntelliJ you can directly start the Docker deployment with an attached debugger. Alternatively, you can create a volume mount to /opt/esh/addons and, after the successful completion of the build, copy the binding jar into the mount.

Recommended Volume Mount

We recommend mounting a volume into the Docker container to store the information about the bindings and the other configurations of the ESH: /opt/esh/userdata

Sep 18, 2017

Intro

We just came home from two days of talks, workshops and lots of fun during this year's Software Circus. "Cloudbusting" was the theme this year and it took place in the lovely city of Amsterdam. The organization team did a great job finding a unique venue: A festival location including a Big Top circus tent, a rusty hangar and a large outdoor area is not quite the setting you expect from a software conference!

Not your standard conference at all, the Circus provided a relaxed, fun and joyful atmosphere for learning new stuff, meeting new people and chatting with old friends from the cloud native community. The conference was embedded in a futuristic story arc that was moved forward by several great performances of actors, singers and dancers in between talks and sessions. Loud music, great food and Dutch beer rounded out the experience.

Maybe - just maybe, but don't tell our boss - especially the first day was a bit heavy on the show and too light on the content side. We did, however, get to talk tech, as there were several tracks throughout the day. If anything, we would wish for more talks and hands-on sessions during the next year's event!

The second day was reserved for workshops and some deep-dive sessions. Heavy rain and wet feet couldn't stop us from being there, unlike many of the other attendees.

The following sections cover the most interesting topics and talks that we experienced this year.

Machine Learning/AI

In a practical part, Google gave an intro to TensorFlow. It’s an open source library for AI and ML that is developed by Google’s Brain Team. It performs operations on multidimensional arrays, so-called tensors. As Google uses it for its search ranking, TensorFlow is worth a closer look.

In a theoretical talk, Thiago de Faria spoke about ML, AI and DevOps. He introduced the history of AI and ML, which goes back to the late 1950s and 1960s, when the first algorithms appeared. In the 1990s, support vector machines marked another big step, and in 1997 IBM’s Deep Blue beat the world chess champion. Nowadays IBM’s Watson is one of the most famous AI / ML projects.

As Thiago is an ML practitioner he pointed out some very important questions concerning DevOps in AI and ML systems which still remain unanswered. A normal program is tested automatically in the context of Continuous Integration (CI). But how can you apply CI to AI and ML systems? Can you create automated tests for such a system? As even the smallest change in an AI has unpredictable effects and might break completely disjoint features, this is a very important point. Furthermore, a normal program is debuggable. You can set breakpoints and follow the program execution. But how can you debug AI and ML systems as there exists no traditional program flow? He hopes that those questions might be answered as AI and ML become more and more explainable.

Finally, he expressed his concerns and fears regarding ML. On the one hand, existing biases might propagate into a system’s learned behavior and influence its decisions. On the other hand, people might tend to delegate decisions to algorithms because they are too afraid to decide for themselves. Only time will tell whether his concerns were unfounded.

Software Architecture

With many buzzwords flying around, it is sometimes forgotten that certain topics never lose their relevance. Independent technology consultant Simon Brown delivered an inspiring (re-)imagination of the modern Software Architect. He debunks the notion that a capable architect only does the up-front specification work (the seagull approach), and is an avid proponent of a hands-on approach to software architecture. An architect needs people skills as much as technological expertise and is an essential building block for well-performing dev teams.

Simon also advocates the use of modelling tools to support development. This does not necessarily mean UML: he introduced his own creation, the C4 model for software architecture, as a lightweight alternative for getting started with better architectural documentation.

DevOps

Improving the software delivery process is still a key concern today and it was an important topic at the Software Circus as well. Often summarized under the 'DevOps' term, the matter was examined by speakers and workshop organizers from different perspectives, sharing success stories and cautionary tales. Maarten Dirkse of Dutch online bookstore bol.com gave a workshop on doing continuous delivery - including automating canary deployments - using GitLab CI and Spinnaker.

DevOps is not about a particular technology; it is about culture and collaboration. This is the primary takeaway from Kris Buytaert's emotional talk, in which he explained how "Docker Is Killing DevOps Efforts". He uses the anti-pattern of the "Enterprise Container", which contains an entire application stack from message queue to database, to show that simply adopting a particular technology does not solve delivery problems, and he reminds everyone of the core values of the DevOps movement: culture, automation, measurement and sharing (CAMS).

DevOps principles are not only important in application development, as "DataOps" practitioner Thiago de Faria explained. As they grow in importance, machine learning and AI projects need to think about their own delivery pipelines to tackle lock-in, onboarding and delivery problems.

Dealing with Legacy Applications

Even though this year's Software Circus used the theme "Cloudbusting", there was some good news for those of us dealing with monolithic legacy applications: David Pilato from Elastic demonstrated how legacy applications can adopt the Elastic stack with remarkably little effort (see github.com/dadoonet/legacy-search). Twelve brave attendees watched this morning's first demo on adopting Elasticsearch, all of us defying rain, wind, cold and noise from trains and ice carving...

The new 6.0.0-beta version contains some really cool features: the now built-in client for Elastic's API can be integrated into your Java applications quite easily, comes with convenient query builders, and grants out-of-the-box access to the following features:

Bulk processing for high performance index operations

Custom analyzers for easy index token definitions

Easy data aggregations for your application specific needs

Fuzziness factor for typo tolerance

Easy Kibana integration

Good to know that Elasticsearch is not limited to time series data. Can't wait to try it out in our applications!

Conclusion

The Software Circus is worth a visit, especially if you tire a bit of the conventional conference setting. It is a community event in the truest sense, bringing people together and creating a comfortable backdrop for talks and sessions. If you come for the conversations and the spirit, you will be delighted. If you are only in it for the content, then you may leave longing for some more - even though the Circus had lots to offer in that regard as well!

Jun 7, 2017

Everybody
talks about code quality, so surely we have a good understanding of what good
(or bad) code is, exactly. Right?

Fig. 1: Code Quality according to XKCD (https://xkcd.com/1513/)

Well, no. There are certainly many
books and scholarly articles on the topic, but they present a wide array of
different, and often conflicting views. It doesn’t get better if you turn to
industry: If you ask three professional programmers, you get four different opinions,
and they’re often fuzzy and apply only to the kind of software the programmer
is experienced with.

All
attempts to come up with a simple and crisp definition that everybody accepts
have failed. At the end of the day, people will resort to “I know it when I see
it”. Unfortunately, that doesn’t quite cut it, neither from a scientific point
of view, nor from a practical point of view.

Why should I bother?

Some software engineers might be tempted at this point to simply say
“Not my problem” and turn away. However, consider the following two scenarios
where this lack of a good definition truly is
your problem. First, imagine a teaching environment, be it a secondary school or
a university, and keep in mind that today’s students are tomorrow’s engineers
so they will be your colleagues in no time. In any such setting, students
expect to be told what is good code, and what isn’t. After all, that definition
will surely affect how their work is graded. Therefore, the definition should
be simple, universal, and easy to apply. However, there is a tension between
simplicity and universality: simple solutions often fail in difficult
situations. That is why practitioners often reject textbook definitions of code
quality as simplistic, or vague.

Now imagine
a second scenario of a professional programmer acquainting herself with a piece
of existing code. In order to understand the code, an IDE can provide valuable
help by flagging suspect code to guide the programmer’s attention. Clearly,
providing metrics (and threshold values) that the IDE should implement requires
absolute precision in the definition of code quality. Without the necessary
underpinnings, the tools will be of much less help, to fewer people.

However,
the problem is not a shortage of definitions, concepts, and tools – quite the
opposite, and all of them claim to be just the right thing, naturally. What we
need is guidance to select our approach, lest we want to waste our energy and
enthusiasm on ineffective ways or outright hoaxes (and yes, that happens a
lot). Unfortunately, there is precious little evidence to help along the way.

Now what?

In this
situation, researchers and practitioners from Sweden, Germany, the Netherlands,
the United States, and Finland teamed up to form a Working Group at ITiCSE (see WG 2), me among them representing QAware.
The working group pursues three goals. First, it needs to validate the above
observation and thus turn it into fact. Next up, we want to clarify and
systematize the existing aspects of code quality to inform the conversation
about code quality. Finally, we want to elicit and contrast the views on code
quality that teachers, students, and professionals hold, respectively, with a
view to deriving recommendations for programming education with a greater
practical value.

Based on
the literature (and common sense), we have some up-front idea of what we might
find. For instance, we expect to find consistent opinions within groups of people
in similar professional situations (i.e., teachers, students, and professional
programmers), and different opinions across these groups, simply because they
have very different levels of expertise, and are likely concerned with
different kinds of quality issues. We expect a progression of levels of more
and more global properties.

SYNTAX At
the one end of the spectrum, there are syntax level issues, such as confusing
the tokens “=” with “==”, and preference of language constructs (e.g., avoiding
unsafe constructs, default-switch-cases and so on).

ARCHITECTURE Finally, there is a level of
architecture that is concerned with the structure and interrelation of units,
e.g., it considers depth of inheritance trees, design patterns, architectural
compliance, and other system-level properties.

Clearly,
one has to master the lower levels before one can work effectively on the higher
levels. But to what degree are the various groups aware of the elements of this
hierarchy? Which are the predominant concerns, and what tools and sources of
information are used by the various populations? And which of the many issues at
each level are really relevant, and how do they compare?

Starting Point

There are
two types of evidence that exist addressing such questions. On the one hand,
there are quantitative studies (mostly controlled experiments and quasi experiments)
on very low-level aspects of code quality. Such studies are usually conducted
on students and focus on simple metrics [1,2,4,7,8], or individual aspects such
as readability [3,5]. Such studies aspire to provide scientific reliability,
though necessarily losing ecological validity in the process. On the other
hand, there are surveys and experience reports based on practitioner
experiences such as [6,9] that generally lack the degree of focus (and, too
often, also scientific rigor), but offer a higher degree of validity. Our
study, in contrast, uses a qualitative study design and is the first to look at
differences across groups.

Of course,
many a practitioner might object that these questions in particular, or even
scientific enquiry in general, while interesting, are of purely academic
concern. People might often object that science is too slow, and lags behind
coding practice and thus is unable to give good guidance for today’s
developers. I beg to differ. While I am ready to accept criticisms of science
being slow, sometimes wrong, and often not immediately applicable, it is still
the only reliable (!) way forward. The IT industry is highly hype-driven, but
lasting improvements are rare.

Leaving
aside this philosophical argument, I believe questions like the ones addressed
in our study offer a set of very practical benefits.

Raising
the awareness about code quality in academic (or school) teaching will trickle
down into increased quality awareness and coding capabilities of graduates, and
thus junior practitioners.

Reliable
(i.e., scientific) insight into the relative contributions and effects of the
various factors allows practitioners to focus their efforts on those properties
that truly make a difference.

Finally,
fostering understanding of the respective viewpoints should improve mutual
understanding, and thus contribute to more collaboration, which I truly believe
in—for the common good.

Stay tuned
for the initial results of our study due in late June, and follow me on Twitter @stoerrle!

May 18, 2017

Apache Ignite
Like last year in Vancouver, Apache Ignite is again a big thing. It's really an amazing piece of technology. Here's the feature puzzle of Apache Ignite:

At the conference, the following Ignite topics were covered for the recently released version 2.0:

SQL Grid

Ignite supports ANSI SQL 99 compliant access to the data within a memory grid. It even supports the tricky things like (distributed) joins and groupings, full-text search within the data model, and geo-spatial queries. The data is always consistent and transactions are ACID, even if Ignite acts as a read-through/write-through cache for a relational database. This is a very interesting use case, as it allows Ignite to act as a caching SQL proxy in front of a relational database. Ignite SQL can be accessed through its own JDBC and ODBC drivers as well as through the Ignite SQL API. The relational data model within Ignite can be described and modified with SQL DDL and DML as well as with code annotations and XML configuration. The relational data model can also be imported from relational databases. Indexes are stored in-memory (off-heap) as B+ trees.
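To make the read-through/write-through idea a bit more tangible, here is a minimal sketch in plain Java. It is not Ignite's API: BackingStore and ReadWriteThroughCache are hypothetical names made up for this illustration, and a simple map stands in for the in-memory grid.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical backing store, standing in for the relational database.
interface BackingStore<K, V> {
    V load(K key);
    void save(K key, V value);
}

// Generic read-through/write-through cache (illustration only, not Ignite's API).
final class ReadWriteThroughCache<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final BackingStore<K, V> store;

    ReadWriteThroughCache(BackingStore<K, V> store) {
        this.store = store;
    }

    // Read-through: on a miss, load from the store and keep the value in memory.
    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            value = store.load(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    // Write-through: update the store first, then the in-memory copy.
    void put(K key, V value) {
        store.save(key, value);
        memory.put(key, value);
    }
}
```

Callers only ever talk to the cache; whether a value comes from memory or from the database behind it is invisible to them, which is what makes the "caching proxy" use case attractive.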

Streaming

With data streamers, you can import data into an Ignite cluster as a stream with automatic partitioning support. Prebuilt data streamers for Kafka, RocketMQ, sockets, JMS, MQTT and others are available. On the processing side, there are continuous SQL queries on sliding windows.

Web Console

There is a web console for Apache Ignite available for query execution, result visualization and monitoring. It also provides a schema import wizard from relational databases.

File System

Ignite provides an in-memory file system which implements the Hadoop FileSystem API. So it can be used as an HDFS or Alluxio replacement for {Hadoop, Spark, Flink}. In this scenario it can also act as a caching layer between {Hadoop, Spark, Flink} and real (and persistent) HDFS.

Ignite 2.1

Ignite 2.1 will be released within the next few months. The big new thing will be its own high-performance persistent storage implementation, which will enable durable scenarios without relying on external persistent storage solutions.

Apache is very busy providing an open source IoT stack on top of Mynewt, a real-time operating system (RTOS) for low-level devices (Cortex M0-M4, MIPS, RISC-V) with included device management features like build and package management, remote firmware upgrade, secure bootloader and signed images.

The incubating Edgent project provides analytics capabilities at the edge, from the cloud to the IoT fog.

May 17, 2017

The Apache Foundation event management team is really excellent at choosing venues for their conferences. After Vancouver, BC last year, this year's ApacheCon and Apache BigData take place in beautiful Miami, FL. What follows is my conference coverage of day 1. See day 2 coverage here.

Notebooks

Notebooks for data analysis are very en vogue. Apache Zeppelin and Jupyter are the super heroes in that area. Pixiedust is a nice extension to Jupyter providing easy-to-use data visualization primitives. Helium is a new plugin system and package repository for Zeppelin providing various ready-to-use Zeppelin extensions (visualizations, interpreters, spell).

Cloud

It is basically no surprise, but still surprisingly intensive, how Apache CloudStack is promoted as an open source IaaS platform and competitor to OpenStack. I thought this war was over and OpenStack was the clear winner - but Apache doesn't want to capitulate.

Flink and Spark ... and Beam

Flink seems to be at eye level with Spark. Each time Spark is mentioned, Flink is mentioned as well. Apache Beam, which provides an abstraction layer atop both, is also very well covered at the conference. But concerning Apache Beam, I'm very suspicious of abstraction frameworks on top of abstraction frameworks. Beam is also an abstraction for Google Cloud Dataflow, so maybe it also exists to give Google a "no vendor lock-in" argument. Btw.: Google is one of the biggest contributors to Beam.

Messaging
There are two new players around in the field of messaging systems. RocketMQ sits right in the middle of the range between Kafka and classical messaging systems like ActiveMQ and RabbitMQ. RocketMQ is an open source contribution of Alibaba - one of the largest web-scale companies on earth. You can find a nice comparison chart of RocketMQ with Kafka and ActiveMQ here. RocketMQ provides more guarantees than Kafka, such as strict ordering, but at a price: it is based on a master/slave architecture, so it is not as scalable as Kafka. Compared with ActiveMQ and RabbitMQ, however, it has a significantly higher throughput because it leverages the pull/distributed log principle of Kafka. As RocketMQ also provides a JMS interface, it could hit a real sweet spot between Kafka and ActiveMQ/RabbitMQ.

Apache DistributedLog is not a full-fledged messaging solution but a building block for one. It provides a distributed log implementation; Kafka, for example, is also based on a distributed log.

Allegro open-sourced Hermes, a message broker on top of Kafka that extends it with REST publisher/consumer interfaces, message tracing and monitoring, and guaranteed message delivery at a sub-millisecond cost.

Dataservices
Dataservices are a new way to process data and an alternative to Spark and Flink if you want to implement and run data processing applications atop a microservice platform. I gave a talk on how to implement dataservices with Spring Cloud Data Flow.
Others proposed to use a serverless framework like OpenWhisk to implement dataservices.

Jan 23, 2017

Setting up a distributed Ehcache on Mule
ESB Community Edition is in fact quite simple and can be achieved in a few steps. After
creating an Ehcache configuration, we set up a cache manager managed by Spring.
We then use the previously defined caches in our Mule configuration together with a cache key extractor in a custom caching interceptor.

Prerequisites

In our example we are using Mule 3.8.0
together with Ehcache 2.6.3 and Spring 4.1.6 inside a Glassfish 4 server. Mule
is configured using XML configuration files.

Setting up the distributed Ehcache

The Ehcache is configured using ehcache.xml
configuration files which consist of at least a list of cache configurations.
For the distributed cache, we also need a peer provider, a peer listener and an event listener for each cache.

peer provider: locates other peers and manages a list of peers belonging to the distributed cache

peer listener: listens for incoming cache changes

cache event listener: listens for local cache changes and distributes changes to other peers

First, we set up the peer provider which locates other
peers in the network and manages a list of peers which belong to the
distributed cache:

The peer discovery can either be done in
automatic mode using multicast (as listed above) or in manual mode explicitly specifying the
remote peer addresses. The latter approach is usually safer for company
networks or data centers, but requires a lot of lines of configuration when using more
than just a few caches and server instances. In automatic mode, the peer provider sends multicast messages to all server instances in the multicast group and tells them about its caches and the port on which the peer listener (see below) listens for incoming cache changes.

Together with the peer provider, we need a
peer listener which listens for incoming cache changes:

If we do not specify a port as in the example above, Ehcache
automatically chooses a high numbered port which is still unused. For company networks
or data centers, you might want to specify the port explicitly. When using automatic peer discovery, the information about which server instance uses which port is distributed by the peer provider over multicast messages. In case of a manual peer discovery, the addresses and ports have already been stated explicitly in the peer provider configuration.

Finally, each cache needs a cache event
listener (defined inside its cache tag) which distributes cache changes such as
new cache entries to the remote peers.

The cache manager uses the previously
defined ehcache.xml files. This is usually a good place to define separate
configuration files for your environments e.g. using ${ } property expansion.
For example you might want to disable the distributed cache when testing
locally or define different addresses or ports for your production environment.
Note that the configLocation is a Spring resource, so if you want to point it
to a file not in the classpath, use the file: prefix, e.g.
file:/path/to/ehcache.xml.

If we also want to use our caches elsewhere with Spring, it is a good idea to define a Spring cache manager (in the example above called ehcacheCacheManager) and an appropriate cache advice.

An interface for cache key extractors

In order to provide a cache key to our
caching interceptor for every service, we implement a cache key extractor
defined by a simple extractor interface.

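import org.mule.api.MuleEvent;

/** Extracts the cache key for a service call from the given Mule event. */
public interface CacheKeyExtractor {
    Object extractKeyFrom(MuleEvent event);
}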

For each service, we implement a concrete
cache key extractor. An extractor could e.g. analyse the payload of the request,
parse it and extract the relevant information that is a viable cache key. Since
we pass the MuleEvent, we also have access to inbound, outbound and session
properties set by Mule or could retrieve other information from our Spring context.
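As a rough illustration of what such a concrete extractor might look like, here is a self-contained sketch. To keep it compilable without Mule on the classpath, Event is a hypothetical stand-in for MuleEvent and KeyExtractor mirrors the interface above; the getPayload and getSessionProperty methods and the "tenant" property are assumptions made for this example and do not mirror the real Mule API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for Mule's MuleEvent so the sketch compiles on its own.
interface Event {
    Object getPayload();
    Object getSessionProperty(String name);
}

// Same shape as CacheKeyExtractor, but typed against the stand-in event.
interface KeyExtractor {
    Object extractKeyFrom(Event event);
}

// Builds a key from an assumed "tenant" session property and the normalized payload.
final class FooKeyExtractor implements KeyExtractor {
    @Override
    public Object extractKeyFrom(Event event) {
        Object tenant = event.getSessionProperty("tenant");
        String payload = String.valueOf(event.getPayload()).trim().toLowerCase(Locale.ROOT);
        return tenant + ":" + payload;
    }
}

// Minimal in-memory event for trying the extractor out.
final class SimpleEvent implements Event {
    private final Object payload;
    private final Map<String, Object> sessionProperties = new HashMap<>();

    SimpleEvent(Object payload) {
        this.payload = payload;
    }

    SimpleEvent withSessionProperty(String name, Object value) {
        sessionProperties.put(name, value);
        return this;
    }

    @Override
    public Object getPayload() {
        return payload;
    }

    @Override
    public Object getSessionProperty(String name) {
        return sessionProperties.get(name);
    }
}
```

Normalizing the payload before building the key is a design choice: requests that differ only in whitespace or letter case then share one cache entry. Whether that is appropriate depends on the service.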

In order to be able to access our cache key
extractor implementations in the Mule configuration files, we define them as
Spring components (e.g. @Component("fooCacheKeyExtractor")) and give
them a unique name for simple usage.

Implementing the caching interceptor

The last component needed for a working
cache is the caching interceptor. It is implemented as a custom Mule
interceptor. On a cache hit, further execution of the flow is stopped and the
cached payload is returned. On a cache miss, the flow continues and the result
of the execution is put into the cache. Logging messages and documentation are
stripped from the following example code.
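The hit/miss logic described here can be sketched independently of Mule. The following is not the original interceptor: CachingInterceptorSketch is a hypothetical class, a plain map stands in for the Ehcache cache, and a Function stands in for "the rest of the flow". A real implementation would plug into Mule's interceptor mechanism and use the Ehcache API instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the hit/miss logic only; E is the event type, R the result payload type.
final class CachingInterceptorSketch<E, R> {
    // Stand-in for the Ehcache cache.
    private final Map<Object, R> cache = new ConcurrentHashMap<>();
    // Stand-in for the cache key extractor.
    private final Function<E, Object> keyExtractor;

    CachingInterceptorSketch(Function<E, Object> keyExtractor) {
        this.keyExtractor = keyExtractor;
    }

    R intercept(E event, Function<E, R> restOfFlow) {
        Object key = keyExtractor.apply(event);
        if (key == null) {
            // No usable key: bypass the cache entirely.
            return restOfFlow.apply(event);
        }
        R cached = cache.get(key);
        if (cached != null) {
            // Cache hit: stop further execution and return the cached payload.
            return cached;
        }
        // Cache miss: continue the flow and remember the result.
        R result = restOfFlow.apply(event);
        if (result != null) {
            cache.put(key, result);
        }
        return result;
    }
}
```

For example, calling intercept twice with the same event and a flow function that counts its invocations should run the flow only once; the second call is answered from the map.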

Configuring Mule

The Mule configuration is now simple. We
first need access to our caches so we can insert them into the caching
interceptor. The Spring EhCacheFactoryBean already provides the extraction of caches
from our previously defined cache manager.

In our flows, we can now insert the custom
caching interceptor. The interceptor is configured with the cache to be used (the name must match the one in the ehcache.xml file) and a cache key extractor that knows how to extract a cache key for this
specific service. Since we defined the extractor as a named Spring bean, we can
now easily inject it here. On a side note, a more sophisticated implementation
of the caching interceptor could also e.g. find the extractor by some name
magic using Spring. The message processor listener, which is also needed by the
caching interceptor, is automatically set by Mule.

And that’s it. Calls to our foo service are
now cached and distributed to our other nodes. Subsequent calls of our foo
service should now be answered faster.

Troubleshooting

If you have problems with the Ehcache
configuration, first make sure that the correct ehcache.xml file is loaded.
Spring and Ehcache will switch to a default failsafe configuration in case of
errors, which will lead you down the wrong trail. Also have a look at the Ehcache
log messages at debug log level. Ehcache should print a lot of peer discovery
messages in automatic mode and give you a hint about problems with your
configuration. In case of problems with Mule, also have a look at the log
messages in debug mode; they are quite verbose.