Best Practice: Talend ESB

Introduction

The best practices described in this document are techniques that have consistently shown
results superior to those achieved by other means. Talend recommends the use
and adoption of the guidelines described in this document to expedite your development. The best
practices can be used as a benchmark to measure quality and conformity of work produced by
developers.

The best practices described in this document will be refined and extended with each new release of the Talend ESB product.

The purpose of this document is to provide standards and best practices for the creation of mediation routes and web services. It is intended to be a working document that can be added to, and removed from, as and when best practices are updated or superseded. The standards in this document should be followed whenever a route or web service is being built. However, there is some flexibility with regard to the best practices: they should generally be followed, but in some cases it may be beneficial to do something different. Developers should use the best practices section as a starting point, but stick strictly to the standards.

Mediation Best Practices

This section describes the best practices that should be followed when building Mediation routes.

Readability

The following are suggestions for best practice in order to aid readability and reduce
complexity.

Component Names

Do not leave component names unchanged. Always give them a useful name that indicates how they
are being used.

While you can give several components the same name, this should be avoided in most cases, because components with the same name can overwrite each other’s settings.

Where a component is a pure copy of another with no changes then it is OK (and might make
sense) to leave the names the same.

Generic Components

While it is possible to “reinvent the wheel” by using generic components to do everything, it is recommended that dedicated components be used where possible.

For example, it is possible to recreate the functionality of the cTimer component using the cMessagingEndpoint component. This should not be done without a good reason. Using the dedicated components means that potential errors are more likely to be caught, and routes are easier to read because the icons of these components help identify their purpose.

Keep Complexity Low

While it is possible to achieve many things in one very long and complex route, it makes it
very hard to read and leaves the route prone to unhandled errors. If your route becomes complex,
try to break it down into several subroutes. Remember when doing this that you may need to keep
control of the order of processing, so use suitable endpoints (“direct” for synchronous behavior
and “seda” for asynchronous behavior).

Arrange subroutes from top to bottom, and from left to right, in the order in which you expect them to be processed. The main route (usually the route that receives the initial message) should be at the top of the route design.

Unconnected Components

Don’t scatter unconnected components such as cConfig and cJMSConnectionFactory all over your route design. Unconnected components that occur in all routes should be placed in the top left corner of the route design, and route-specific components should be placed just below them on the left. As people generally read from left to right and from top to bottom, this is the natural place to position these key components.

Always Check the Camel Documentation First

When designing a route, always check the Camel documentation (http://camel.apache.org/components.html) to find out what is available for you to use. In many cases this will save a lot of time and work, and will also give better performance.

A developer wanted to move a file from an input folder to a working directory. He designed
this step with a separate route and two cFile components. This
approach had a couple of major drawbacks.

Instead of moving the file via a fast file system operation, the file was read into a
stream (the first cFile component) and then written from that stream
into a second file (the second cFile component). This basically meant
making a copy of the file and then deleting the original.

A look at the documentation of the Camel file component behind cFile would have pointed the developer to the preMove option, which moves the file before it is processed. This would have been much faster, and the whole route design would have been much simpler.
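As a hedged illustration, assuming cFile passes its options through to the underlying Camel file endpoint, the fix comes down to one option on the endpoint URI. The sketch below only builds that URI string; the directory names are hypothetical, while preMove and move are options of the Camel File component, and relative names such as inprogress and .done are treated as subfolders of the input directory.

```java
// Builds a Camel file endpoint URI. preMove moves each file into a working
// folder before processing (typically a rename rather than a stream copy), and
// move relocates it once processing has finished. Directory names are hypothetical.
class FileEndpointUri {
    static String forInput(String inputDir, String workDir, String doneDir) {
        return "file:" + inputDir + "?preMove=" + workDir + "&move=" + doneDir;
    }

    public static void main(String[] args) {
        // Prints: file:/data/input?preMove=inprogress&move=.done
        System.out.println(forInput("/data/input", "inprogress", ".done"));
    }
}
```

A single consuming endpoint configured like this replaces the extra route and the second cFile component.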

Use Folders and Sensible Route Names

If you are creating a lot of routes, it can become very confusing to tell which route is which and what each one is for unless you have a good and consistent naming convention. As well as naming routes in a logical manner, it is also recommended that you use folders to separate routes into groups. The grouping method very much depends on the project, but should be decided upon during the design stage.

Reusability and Scalability

The following are recommendations to improve reusability and scalability.

Divide and Conquer

Do not try to model all use cases as single, highly complex routes. Break a use case down into smaller, bite-sized chunks that you may be able to reuse across several use cases.

While this might take a little more time to design, it will reduce the build time and improve scalability and reusability.

Remember, routes in the same virtual machine can pass messages via endpoints (vm or direct-vm).
If the routes needing to communicate are not in the same virtual machine you can use message
queues as a way of communicating.

Do Not Expect Anything

The more decoupled your routes become, the less you can expect from the exchanges. To protect against unforeseen exceptions creeping into the system, always check for the data that you are expecting before attempting to use it.

For example, if you are expecting a particular XML format in the body of a message, validate it against an XSD file to check that it is in the correct format before passing it further along the route. Use cTry components to catch potential problems like that.
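A minimal sketch of such a check in plain Java, using the JDK's javax.xml.validation API; the class name and the sample schema in the usage are hypothetical. In a route, a check like this could run in a cProcessor inside a cTry, although Camel's own validator component (validator: endpoint) may be simpler to use.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper that checks an XML message body against an XSD
// before the rest of the route is allowed to use it.
class XmlBodyValidator {
    private final Schema schema; // compiled once; Schema objects are thread-safe

    XmlBodyValidator(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            this.schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid XSD", e);
        }
    }

    // Returns true only if the body is well-formed and matches the schema.
    boolean isValid(String xmlBody) {
        Validator validator = schema.newValidator(); // Validators are not thread-safe
        try {
            validator.validate(new StreamSource(new StringReader(xmlBody)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }
}
```

Compiling the schema once and creating a cheap Validator per message follows the same reasoning as the advice on avoiding per-message instantiation.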

Avoid camel dependencies

Where possible, write beans as plain Java that does not depend on Camel classes. If Camel-specific values are required, using annotations is the second-best approach. Working with the org.apache.camel.Exchange object should be avoided as much as possible, because changes in the exchange object itself can easily lead to undesired behaviour.
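A hedged sketch of what this looks like: the hypothetical bean below uses only ordinary Java types, so it has no Camel dependency and can be unit-tested on its own. The commented-out signature shows the annotation-based alternative, using Camel's @Body and @Header parameter annotations; the sourceSystem header name is an assumption for illustration.

```java
import java.util.Locale;

// Hypothetical bean with no Camel imports: String in, String out.
// With a single parameter, Camel binds the message body to it, so the bean
// never needs to touch org.apache.camel.Exchange.
class ReferenceNormaliser {

    // Removes all whitespace and upper-cases a payment reference.
    public String normalise(String reference) {
        if (reference == null) {
            throw new IllegalArgumentException("reference must not be null");
        }
        return reference.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
    }

    // Second-best option when a header is genuinely needed (requires camel-core):
    // public String normalise(@org.apache.camel.Body String reference,
    //                         @org.apache.camel.Header("sourceSystem") String source) { ... }
}
```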

Improve Extensibility

The following are recommendations to improve extensibility.

Multiple Entry Points

As the system matures, you will find that entry points can change over time. For example, a flat comma-separated file may be the initial source of data, but this can later change format to XML or JSON, or even change type to become a message on a queue or an email.

To accommodate this it is a good idea to separate the entry point from the main flow of a route
and decide upon a common format to share between routes after the data has entered the system.

For example, you might choose to create POJOs (Plain Old Java Objects) to share data between routes and subroutes, as they make the data they contain very easy to access. Alternatively, XML or JSON may be chosen. You might also decide to pass POJOs between routes, and XML when passing data to Data Integration jobs. Whatever is decided, it should be consistent throughout the system.
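A minimal sketch, assuming POJOs are the chosen common format; the class and field names are hypothetical. The CSV factory method stands in for the work an entry-point route would do and is deliberately naive (no quoting or escaping).

```java
import java.io.Serializable;
import java.math.BigDecimal;

// Hypothetical common-format POJO shared by all routes once data has entered
// the system. Entry-point routes convert their own input (CSV, XML, JSON, ...)
// into this type, so the main flow does not care where the data came from.
class PaymentMessage implements Serializable {
    private static final long serialVersionUID = 1L;

    private String accountId;
    private BigDecimal amount;
    private String currency;

    public PaymentMessage() {
        // no-arg constructor for frameworks that create instances reflectively
    }

    public PaymentMessage(String accountId, BigDecimal amount, String currency) {
        this.accountId = accountId;
        this.amount = amount;
        this.currency = currency;
    }

    // Example entry-point conversion for a line such as "ACC-1,12.50,EUR".
    static PaymentMessage fromCsvLine(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields but got " + fields.length);
        }
        return new PaymentMessage(fields[0].trim(), new BigDecimal(fields[1].trim()), fields[2].trim());
    }

    public String getAccountId() { return accountId; }
    public void setAccountId(String accountId) { this.accountId = accountId; }
    public BigDecimal getAmount() { return amount; }
    public void setAmount(BigDecimal amount) { this.amount = amount; }
    public String getCurrency() { return currency; }
    public void setCurrency(String currency) { this.currency = currency; }
}
```

If the source later changes to XML or JSON, only a new factory (or a new entry-point route) is needed; everything downstream keeps receiving PaymentMessage objects.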

Flexible Persistence

If you need to persist data within your route, try to make this as flexible as possible. Use a subroute for persisting data, so that you can easily change the persistent storage from the file system to a database, or vice versa. Also try to keep all persistence-dependent tasks/steps within this subroute, so that you do not need to worry about special dependencies when changing the persistent storage.

Use Built-in Libraries

Studio provides several third-party libraries which can be used in designing a route. These libraries already include their own (internal) dependencies. Providing all indirect dependencies yourself can become a rather extensive task, so try to find the functionality you need in the supplied libraries before looking elsewhere.

Miscellaneous

The following are general recommendations to make building and designing routes
easier.

Use Context Variables

Use context variables whenever possible, and reuse those variables whenever it is appropriate. If, for example, a message queue needs to be used, then its location, port number and name should be set up in context variables so that they can be maintained in one place. Any values shared between routes (endpoint names, passwords, queues, URIs, etc.) should be stored in context variables.
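Talend generates the code that loads context variables, so the sketch below is only a plain-Java analogy of the principle: shared values are defined once, with defaults, and every consumer reads them from the same place instead of hardcoding them. The variable names, default values and the tcp://host:port broker URL format (as used by ActiveMQ) are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Analogy for a context group holding message-queue settings.
// Names and defaults are hypothetical.
class MessagingContext {
    final String mqHost;
    final int mqPort;
    final String mqQueueName;

    private MessagingContext(Properties values) {
        mqHost = values.getProperty("mqHost", "localhost");
        mqPort = Integer.parseInt(values.getProperty("mqPort", "61616"));
        mqQueueName = values.getProperty("mqQueueName", "orders");
    }

    // Loads key=value lines, e.g. the contents of an environment-specific file.
    static MessagingContext load(String keyValueText) {
        Properties values = new Properties();
        try {
            values.load(new StringReader(keyValueText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new MessagingContext(values);
    }

    String brokerUrl() {
        return "tcp://" + mqHost + ":" + mqPort;
    }
}
```

Moving to another environment then means changing one set of values rather than editing every route.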

Preserve exchange body while calling a bean

Sometimes a Java bean needs to be called, but you do not want the method's return value to replace the current exchange body. The cBean component would cause exactly this undesired behaviour, because it sets the return value as the new body.

In this case, use a cSetHeader component instead of cBean. The return value of the Java bean is then stored in a header field, which can easily be ignored, and the exchange body remains untouched.

If you need to call a specific bean method within the cSetHeader component, type the method name after the bean class name, separated by a comma (e.g. beans.MyBean, myMethod).

Avoid instantiating a class for each message

Sometimes you will need to do some work on a message which cannot be done by generic components or by using a Talend DI Job. To do this, you might choose to write Java and encapsulate the code in a class or a bean. This is a very powerful mechanism and can be very useful, but it can also cause performance issues if it is not implemented efficiently.

If you have a route that might process thousands of messages an hour, instantiating a new object for every message is not very efficient in terms of memory usage, and it increases the likelihood of facing the “Exception in thread "main" java.lang.OutOfMemoryError: Java heap space” error. To avoid this, build your classes so that they do not need to be instantiated for every message. If possible, instantiate them once at the beginning of the route and store them in a registry.

Talend provides a component for doing this called cBeanRegister. This component registers the created object and makes it available at any point in the route. If it is a simple bean, you can reference it in the many ways described in the documentation using the built-in functionality. If you need to use it in a cProcessor component, you can retrieve it by name from the Camel registry of the route's CamelContext.
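The lookup inside a cProcessor might look like the commented lines in the sketch below, assuming the processor code has access to the current exchange and that the bean was registered under the hypothetical name messageEnricher; lookupByNameAndType is a method of Camel's registry (org.apache.camel.spi.Registry), but check it against the Camel version shipped with your Studio. So that the sketch runs without Camel on the classpath, it substitutes a small hypothetical BeanRegistry class to show the pattern itself: the bean is created once, as cBeanRegister does, and each message only looks it up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Camel registry, so the sketch runs without Camel.
class BeanRegistry {
    private final Map<String, Object> beans = new HashMap<>();

    void bind(String name, Object bean) {
        beans.put(name, bean);
    }

    // Returns the bean if it is registered with the requested type, otherwise null.
    <T> T lookupByNameAndType(String name, Class<T> type) {
        Object bean = beans.get(name);
        return type.isInstance(bean) ? type.cast(bean) : null;
    }
}

// Hypothetical bean that is expensive enough that we want only one instance.
class MessageEnricher {
    static int instancesCreated = 0;

    MessageEnricher() {
        instancesCreated++;
    }

    String enrich(String body) {
        return "[enriched] " + body;
    }
}

class EnrichProcessor {
    // Per-message work, as in a cProcessor: look the bean up, never create it here.
    static String process(BeanRegistry registry, String body) {
        // With Camel, the equivalent lookup would be something like:
        // exchange.getContext().getRegistry()
        //         .lookupByNameAndType("messageEnricher", beans.MessageEnricher.class);
        MessageEnricher enricher = registry.lookupByNameAndType("messageEnricher", MessageEnricher.class);
        if (enricher == null) {
            throw new IllegalStateException("messageEnricher has not been registered");
        }
        return enricher.enrich(body);
    }
}
```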

Doing this can dramatically improve the memory usage and performance of your routes.

Behaviour Analysis

If you want to test a component for the first time, or if you encounter problems within your route that you cannot solve easily, create a new TestRoute and focus only on the specific component or problem. Make this example as simple as possible to avoid errors caused by misconfiguration at another (unseen) point. This usually helps a lot in finding the cause of an error, or in learning how best to handle a new component.

Use cLog at all interesting points of your route, to make sure the content of the message is still what you expect it to be.

Use Component Specific Headers

Some components, such as cHTTP, are aware of specific header fields (e.g. org.apache.camel.Exchange.HTTP_PATH). If such a header is set, it overrides the default configuration of the component itself. This is very helpful if a value can only be determined at runtime. If you use such a component-specific header, set it to the required value just before calling the component, and reset it (remove it or set it to null) right after that component.

If you don't do this, calling a similar component again later in the route (or within a subroute) could cause unexpected behaviour. Rather than burdening every subroute with checking whether any component-specific header values are set, always reset these header values right after they have been used.
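The set, use, reset pattern can be sketched in plain Java with a map standing in for the message headers; this is an analogy rather than Camel code, and the class and method names are hypothetical. "CamelHttpPath" is the value of the org.apache.camel.Exchange.HTTP_PATH constant. Using try/finally means the header is reset even if the component fails.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class HeaderScope {
    // Sets a component-specific header only for the duration of one call and
    // removes it afterwards, even if the call throws, so that later components
    // (or subroutes) never see a stale value.
    static <R> R withHeader(Map<String, Object> headers, String name, Object value,
                            Function<Map<String, Object>, R> component) {
        headers.put(name, value);
        try {
            return component.apply(headers);
        } finally {
            headers.remove(name);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> headers = new HashMap<>();
        String result = withHeader(headers, "CamelHttpPath", "/orders/42",
                h -> "calling with path " + h.get("CamelHttpPath"));
        System.out.println(result);                                // calling with path /orders/42
        System.out.println(headers.containsKey("CamelHttpPath")); // false
    }
}
```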

Disappearing messages

If your exchange body is a stream, you cannot read it twice (by default). So if, for example, you want to print the content of a stream to your log file but also process it in a following component, use the cConvertBodyTo component to change the body type from stream to, e.g., String. A String can be read as often as you need.
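The effect is easy to reproduce in plain Java, which may help when a body seems to vanish; the class name is hypothetical, and the one-off conversion to String plays the role that cConvertBodyTo plays in a route.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamBodyDemo {
    // Reads whatever is left in the stream as UTF-8 text (Java 9+ readAllBytes).
    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream body = new ByteArrayInputStream("<order/>".getBytes(StandardCharsets.UTF_8));
        String logged = readAll(body);    // e.g. a logging step consumes the stream
        String processed = readAll(body); // the next step finds nothing left
        System.out.println("logged='" + logged + "' processed='" + processed + "'");

        // Converting the body to a String once means every step can read it.
        String text = readAll(new ByteArrayInputStream("<order/>".getBytes(StandardCharsets.UTF_8)));
        System.out.println("logged='" + text + "' processed='" + text + "'");
    }
}
```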

Always keep in mind the lifespan of the data that you are using and passing on.

Write Documentation

Each component has a Documentation tab. If you feel that an explanation
as to what that component is doing would help someone, fill it in here.

There is also a Show Information tick box on this tab. If you select it, the route shows readers that there is documentation to read for that component.

Documentation is always a chore so it makes sense to do it as you build. Filling in these
Documentation tabs can really help others and is vastly less work than writing a complete
document for each route.

If you do have to write a document for each route, these notes will help when you come round to writing it, as there is seldom much time for documentation while you are actually building routes. Having these notes will save you and others a lot of investigation in the future.

Calculate somewhere else

There will be times when you will need to process data in messages. This can be done in code in
cProcessor components but not everybody is comfortable with
reading/writing code.

You will also find that sometimes the same processing may be needed across several routes. In
order to make this processing reusable it is good practice to package this logic up in a Data
Integration Job that can be shared amongst all areas that need that bit of logic. Data can be
supplied in exchange messages as XML and Headers (for example) which DI can easily consume and
output.

Learn Java or get used to reading it

Talend ESB is a code-generating piece of software. Sometimes, when you have a bug in your route, it will be because of how the code is generated. Maybe the way the components have been connected is not handled very well, or it doesn’t make sense to connect the components that way according to the Camel framework.

The best way to identify these issues and find workarounds is to be able to read the Java error
stack and use it to point you toward the line of code that is at fault in the generated code (by
using the Code tab). This will save hours of searching forums.

Synchronous or Asynchronous Endpoints

There will be plenty of times when you need to send data between endpoints. There are many ways of doing this (direct, vm, direct-vm, seda), and you should work out which is best for your route before implementing it.

Sometimes you may want the main body of the route to finish quickly but have some other
processing in a subroute where you do not mind how long it takes. In this situation you should
use a seda or vm endpoint as these are asynchronous.

However, if every subroute must have finished before the result of the main route is returned, a direct or direct-vm endpoint should be used, as these are synchronous. This will make the route slower, but guarantees that all processing has completed.
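The difference can be illustrated with a plain-Java analogy (not Camel code; the names are hypothetical): a direct-style call runs the subroute on the caller's thread and returns only when it has finished, while a seda-style call hands the message to a worker thread, roughly as a seda endpoint hands it to a queue and its consumer threads, and returns immediately.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Plain-Java analogy of synchronous (direct) and asynchronous (seda) hand-offs.
class EndpointStyles {
    // Stand-in for a subroute: records that it processed the message.
    static void subroute(List<String> log, String message) {
        log.add("processed " + message);
    }

    // direct-style: the caller's thread runs the subroute and waits for it to finish.
    static void sendDirect(List<String> log, String message) {
        subroute(log, message);
    }

    // seda-style: the message is handed to a worker thread; the caller continues at once.
    static Future<?> sendSeda(ExecutorService worker, List<String> log, String message) {
        return worker.submit(() -> subroute(log, message));
    }

    // Only needed if the caller later decides it does care about completion.
    static boolean awaitDone(Future<?> pending, long timeoutMillis) {
        try {
            pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        sendDirect(log, "order-1"); // already processed when this returns
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<?> pending = sendSeda(worker, log, "order-2"); // may still be running here
        awaitDone(pending, 5_000);
        worker.shutdown();
        System.out.println(log);
    }
}
```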

Web Service Best Practices

This section describes the best practices that should be followed when building web
services.

Selection of Service Type

The first decision that needs to be made when designing Web services is what type of service it
needs to be (REST or SOAP). The following should be considered before making this
decision.

Should the service be Stateless or Stateful

A stateless system can be seen as a black box where, at any point in time, the value of the outputs depends only on the value of the inputs.

A stateful system can be seen as a box where, at any point in time, the value of the outputs depends on the value of the inputs and on an internal state. In other words, a stateful system is like a state machine with memory: the same set of inputs can generate different outputs depending on the previous inputs received by the system.
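A small Java illustration of the two definitions; the currency-conversion and running-balance examples are hypothetical.

```java
import java.math.BigDecimal;

class StateExamples {
    // Stateless: the result depends only on the inputs, so repeated calls
    // with the same arguments always give the same answer.
    static BigDecimal convert(BigDecimal amount, BigDecimal rate) {
        return amount.multiply(rate);
    }

    // Stateful: the result also depends on the calls received before.
    static class RunningBalance {
        private BigDecimal balance = BigDecimal.ZERO;

        BigDecimal deposit(BigDecimal amount) {
            balance = balance.add(amount);
            return balance;
        }
    }
}
```

Calling convert twice with the same arguments gives the same result, whereas two identical deposit calls return different balances.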

This is an important distinction to make when deciding on a type of service. REST is designed around stateless interactions, so if your service needs to be stateful, SOAP is generally the type of service you need. A real-world example of where SOAP is often preferred over REST is the banking industry, where money is transferred from one account to another. SOAP, together with WS-* standards such as WS-AtomicTransaction and WS-ReliableMessaging, provides standardized support for transactions and reliable delivery, so a failed transfer can be rolled back or redelivered by the infrastructure rather than by each client. With REST, failed service calls must be handled by the requesting application.

What operations need to be performed

What does the service need to do? If it simply needs to carry out CRUD operations (Create, Read, Update or Delete), then REST is a good choice. It is lightweight, its calls are easy for the consumer to construct, it can make use of caching to reduce the load from repeated calls, and it returns human-readable responses. If your operations are more complex and need to follow a strict contract, then SOAP is the better choice.
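For reference, the conventional mapping of CRUD operations onto HTTP methods in a REST service is sketched below. It is a convention rather than a rule (PATCH is also common for partial updates, and PUT can create a resource at a client-chosen URI), and only reads (GET) are normally cached.

```java
// Conventional mapping of CRUD operations onto HTTP methods in a REST service.
enum CrudOperation {
    CREATE("POST"),
    READ("GET"),
    UPDATE("PUT"),   // PATCH is often used for partial updates
    DELETE("DELETE");

    private final String httpMethod;

    CrudOperation(String httpMethod) {
        this.httpMethod = httpMethod;
    }

    String httpMethod() {
        return httpMethod;
    }

    // Only reads are normally cacheable; the others change server state.
    boolean isCacheable() {
        return this == READ;
    }
}
```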

Must the Service Type be consistent

Is it architecturally important for the service type to be consistent across the system? This is an important decision to make: if it is, you will probably need to select SOAP, unless all your operations are simple CRUD or stateless operations.

However, if a mix of service types is permitted, this allows a lot of flexibility and can vastly reduce the effort of implementing the whole system. A choice that is often made in systems where a mixture of service types is permitted is to use REST for simple read operations, and SOAP for complex operations and operations where data changes may occur.

Security

In the majority of cases, REST and SOAP security systems are the same: some form of HTTP-based authentication plus Transport Layer Security (TLS, the successor to Secure Sockets Layer, SSL).

However, SOAP also supports end-to-end message-level security (WS-Security). This means that if you pass SOAP messages from endpoint to endpoint to endpoint, over the same or different protocols, the message itself remains secure. If your system needs this particular feature, SOAP is definitely the way to go.

It should be noted that security is a large domain and far too complex to decide upon based on
a couple of paragraphs. The point here is to say that while underlying REST and SOAP security
systems are largely the same, SOAP has provision for intermediary security that REST does
not.

Readability

Readability best practices for Web Services are practically the same as for the Mediation routes.

For more information, see the Readability section for Mediation routes.

Reusability and Scalability

The following are recommendations to improve reusability and scalability.

Divide and Conquer

As in the section of the same title for Mediation routes, do not try to model all use cases as single, highly complex Web services.

The services should be broken down into their most atomic parts. There is no need to expose
these atomic services to the outside world, but they can be used by other services to build up a
more complex one which you will expose.

Do Not Expect Anything

With services, you cannot expect anything from the caller. It might be another system, an experienced developer, someone with a bit of knowledge, or someone who has found the service by mistake and wants to give it a try.

Obviously, in systems with built-in security you don’t need to worry so much about someone who finds the service by mistake. But no matter who or what is expected to use the service, it is important that you always check the data that you receive before attempting to use it.

Miscellaneous

There are many overlaps between best practices for Web Services and Mediation routes in general. Many of the miscellaneous best practices for Web Services have been covered in the Mediation routes Miscellaneous section.

For more information, see the Miscellaneous section for Mediation routes.

There are also a few other cases specific to Web Services, which are described below.

Always return something

It is good practice to ensure that something is always returned when a Web service is called. In many cases the caller will expect data to be returned, but in some cases there may be no expected return value. Whether or not a response is needed, a response indicating success or failure should always be returned. There should also be a mechanism to ensure that errors are reported.

Use standard HTTP web codes

When returning statuses, ensure that standard HTTP status codes are used wherever possible.
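A hedged sketch combining this with the advice to always return something: every call produces a response, and each outcome maps to a standard status code taken from the JDK's java.net.HttpURLConnection constants. The outcome names, the choice of IllegalArgumentException to signal invalid input, and the generic error message are assumptions for illustration; how the status reaches the HTTP layer depends on the service component used.

```java
import java.net.HttpURLConnection;
import java.util.function.Supplier;

// Hypothetical outcomes of a service call, each mapped to a standard HTTP status code.
enum ServiceOutcome {
    OK(HttpURLConnection.HTTP_OK),                     // 200
    CREATED(HttpURLConnection.HTTP_CREATED),           // 201
    INVALID_INPUT(HttpURLConnection.HTTP_BAD_REQUEST), // 400
    NOT_FOUND(HttpURLConnection.HTTP_NOT_FOUND),       // 404
    FAILED(HttpURLConnection.HTTP_INTERNAL_ERROR);     // 500

    final int status;

    ServiceOutcome(int status) {
        this.status = status;
    }
}

// A response is always returned, whether the handler succeeds or fails.
class ServiceResponse {
    final int status;
    final String message;

    ServiceResponse(ServiceOutcome outcome, String message) {
        this.status = outcome.status;
        this.message = message;
    }

    static ServiceResponse call(Supplier<String> handler) {
        try {
            return new ServiceResponse(ServiceOutcome.OK, handler.get());
        } catch (IllegalArgumentException e) {
            // The caller sent bad data: tell them what was wrong.
            return new ServiceResponse(ServiceOutcome.INVALID_INPUT, e.getMessage());
        } catch (RuntimeException e) {
            // Unexpected failure: still respond, with a generic message so internals are not exposed.
            return new ServiceResponse(ServiceOutcome.FAILED, "Internal error");
        }
    }
}
```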

Database connection pooling

Web services are expected to be highly available and may be called very frequently, so they can cause problems for the databases they connect to if they are forced to open a new connection for every request. A way around this is to use a connection pool.

At present Talend only supports connection pooling using a JDBC connection. Therefore it is
considered best practice to make use of the JDBC database components when working with databases
via services.
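The idea behind pooling can be sketched in a few lines of plain Java: a fixed number of connections is created once, and each request borrows one and gives it back. This is only an illustration of the principle, not a replacement for the pooling provided with Talend's JDBC components or a production pool; the class is hypothetical and strings stand in for real connections in the usage.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Minimal fixed-size pool: resources are created once, then borrowed and returned.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // the only place new resources are created
        }
    }

    // Borrows a resource (waiting if all are in use), runs the work, and always returns it.
    <R> R withResource(Function<T, R> work) {
        T resource;
        try {
            resource = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a pooled resource", e);
        }
        try {
            return work.apply(resource);
        } finally {
            idle.offer(resource);
        }
    }
}
```

However many requests arrive, the database only ever sees the fixed number of connections created up front.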

Mediation Route Standards

This section describes the standards that should be followed when building Mediation routes. Due to
the nature of route development the standards are relatively light.

This section will also cover several related development standards such as naming conventions,
variable usage and Java coding standards.

Context Variables

Context variables should be used in place of hardcoded parameters across the ESB system.
Context groups should be set up in the development environment to ensure that common variables
are reused by all developers. Different context groups should be set up to contain related
variables.

For example, there should be a context group that will only contain variables directly related
to the error and/or logging handling functionality. These variables should be used by all routes.
But there should also be contexts set up for other groups that routes can fit into like project,
business area, route types, etc. These should be decided upon as early as possible. Developers
can set up context variables that are specific to individual routes, but this should only be done
where absolutely necessary.

The naming convention for context variables should be “meaningful names in lower camel
case”.

Naming Conventions

The naming conventions for the Routes should be as follows:

Route names start with ro_.

The rest of the name should be in upper camel case following a consistent project wide
format.

An example of a route name for a Remittance project that validates an input XML schema might be
ro_RemittanceValidateInputSchema.

All names should be approved by the lead developer/team and should be specified in the design
to keep a consistent naming approach for the project.
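If you want to check names automatically, for example in a review script, a simple pattern check is enough. The sketch below is a hypothetical helper; it can only check the prefix and the casing, not whether the name follows the agreed project-wide format.

```java
import java.util.regex.Pattern;

// Hypothetical naming check: a fixed prefix followed by an upper camel case name.
class NamingConvention {
    // Upper camel case: an upper-case letter, then letters and digits only.
    private static final Pattern UPPER_CAMEL = Pattern.compile("[A-Z][A-Za-z0-9]*");

    static boolean hasPrefixAndUpperCamel(String name, String prefix) {
        return name != null
                && name.startsWith(prefix)
                && UPPER_CAMEL.matcher(name.substring(prefix.length())).matches();
    }

    static boolean isValidRouteName(String name) {
        return hasPrefixAndUpperCamel(name, "ro_");
    }
}
```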

Java Standards

It will be necessary to write some Java from time to time in order to meet certain
requirements.

Comments

The most important of all of the Java standards for readability are the comment standards. ALL
code should be well documented. It must not be assumed that another developer will be able to
interpret the code.

Some people are better at coding than others, and as Talend ESB is not exclusively a coding tool, it is important to make sure that everyone has a chance of being able to work out what is happening.

Code Format

Everybody has their own preference for code formatting. For this project, the rule of thumb is that code must be as readable as possible. A good way of ensuring this is to indent your code with tabs, using one tab for each nested level of code.

Code Reuse

Reusable pieces of code should be packaged as beans so that they can be reused. It is good practice to create static methods where possible, so that the class does not need to be instantiated each time a method is called. If this cannot be done, a workaround is to use the cBeanRegister component, which is mentioned in the best practices section of this document.

However it is achieved, it is important to make sure that code is reused wherever possible.

Web Service Standards

This section describes the standards that should be followed when building the Web
Services. Web Services align quite closely to Data Integration jobs, so there will be some
commonality between the standards for Web Services and Data Integration.

Context Variables

The standards for context variables are the same as for the Mediation routes.

Naming Conventions

Web service names start with ws_.

The rest of the name should be in upper camel case following a consistent project wide format.

An example of a service name for a Remittance project that retrieves a balance might be
ws_RemittanceRetrieveBalance.

All names should be approved by the lead developer/team and should be specified in the design
to keep a consistent naming approach for the project.

Remember that the Web Service will be a Talend Job, and its name does not necessarily have to have anything in common with its endpoint or URI. Standards for these are in the next section.

Web Service Endpoints and URIs

A Web Service is exposed to its consumers via an endpoint and a URI. It is important that each service has a different endpoint; otherwise, when a new service is started, it will overwrite the service currently running on that endpoint. It is possible to share an endpoint if multiple services are implemented in the same job. If that is done, the services need to be distinguished by different URIs.

This is a convenient way of grouping services by endpoint; however, it can lead to very large and messy jobs.