Wednesday, 12 August 2009

When designing SOA infrastructures or Enterprise Architectures, many architects overlook one component that can dramatically impact overall system performance: the hardware load balancer.

Typically, a hardware load balancer such as an F5 BigIP exists as a shared component that distributes load across multiple servers or applications and ensures failover in case of hardware or software failure.

However, the default setup of a load balancer is rarely suited for SOA services, as the following example shows:

A real-life sample SOA service is a large order system that does the backend processing of customer orders. It was implemented with Oracle SOA Suite 10g and is based mainly on BPEL processes, which contain the business logic and orchestrate backend systems such as Product Catalog, Fulfillment and Billing. All requests (orders) come from a CRM portal.

This SOA implementation runs on an 8-node Linux-based BPEL cluster running Oracle Fusion Middleware 10g. In front of these 8 nodes, an F5 BigIP distributes the load.

Initially the load balancer setup was the following:

- static round-robin to all 8 nodes
- session affinity (persistence) for a duration of one hour

The corresponding rule in the BigIP configuration looks like this:

persist source_addr_1h

Let’s look at the impact of these settings:

Session affinity (persistence) means that all requests coming from the same originating IP address will be routed by the load balancer to the same cluster node. This rule is well suited for typical web applications, where it is desirable to keep one HTTP session on the same cluster node.

However, this setup has a severe negative impact on backend SOA services, as we will see shortly:

The SOA order service consists of one invocation endpoint and several partner links for calling the backend systems. Most of this communication is asynchronous; that is, the backend systems are called via one-way requests and use callbacks to return messages to the order system.

What happens now is that the first request to the order system comes from the CRM system and is routed to one node (say "Node A") of the cluster. Since the requests come through the load balancer, all subsequent requests from the CRM system will be routed to the same cluster node, because the originating IP address is the same! This results in very poor overall load distribution: in fact, only one of the 8 nodes will get most of the requests. When this node reaches 100% resource utilization, the system will slow down dramatically even though the other 7 nodes are almost idle.
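The skew is easy to see in a toy simulation (a hypothetical sketch; the node names, request count and IP address are made up) comparing plain round-robin with source-address persistence when every order originates from the single CRM portal IP:

```python
# Hypothetical sketch: how source-IP persistence skews load across an
# 8-node cluster when all requests share one originating IP (the CRM portal).
from collections import Counter
from itertools import cycle

NODES = [f"node-{c}" for c in "ABCDEFGH"]

def route_round_robin(requests):
    """Plain round-robin: each request goes to the next node in turn."""
    rr = cycle(NODES)
    return Counter(next(rr) for _ in requests)

def route_with_persistence(requests):
    """Source-address persistence: the first request from an IP picks a node
    (round-robin); every later request from that IP sticks to that node."""
    rr = cycle(NODES)
    sticky = {}           # source IP -> pinned node
    counts = Counter()
    for src_ip in requests:
        node = sticky.setdefault(src_ip, next(rr))
        counts[node] += 1
    return counts

# 800 orders, all coming from the single CRM portal IP.
requests = ["10.0.0.1"] * 800
print(route_round_robin(requests))      # evenly spread: 100 per node
print(route_with_persistence(requests)) # all 800 requests land on one node
```

With persistence enabled, one node carries the entire load while the other seven stay idle, exactly as observed in the cluster.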

But it gets even worse.

First of all, the same effect applies to every callback from a backend system. So if a callback (return message) is received from the Billing System, the same node will receive all of these answers for the duration of one hour (though this need not be Node A, of course).

Secondly, the static round-robin algorithm does not take into account the state of each cluster node. So if, for example, one cluster node is heavily loaded because it is processing some complex orders, resulting in 100% CPU load, the load balancer will not recognize this but will keep routing requests to this node, causing overload and saturation.

In summary, a small misconfiguration of the load balancer leads to a system that does not use the hardware of the 8 nodes effectively, cannot handle a large number of requests, and does not scale well.

So what are the recommendations?

1. There should be no session affinity (persistence) at all for a runtime SOA system. There may be some exceptions, for example at deployment time. When deploying SOA services with multiple artifacts (for example multiple BPEL processes, WSDLs and XSDs), this should in general happen on the same cluster node first, to prevent excessive replication and inconsistencies in the cluster. But this can be configured, for example by using a dedicated deployment server or by setting up a virtual host as a deployment target.
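On the BigIP itself, disabling persistence for the SOA virtual server is a one-liner. A sketch, assuming a virtual server named vs_soa (the name is hypothetical; on older BigIP versions the equivalent bigpipe command applies):

tmsh modify ltm virtual vs_soa persist none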

2. BigIP offers more sophisticated load balancing algorithms than dumb static round-robin. For example, you can use dynamic ratio load balancing (described in the chapter "Configuring servers for SNMP Dynamic Ratio load balancing" of the BigIP Reference Guide). This algorithm uses metrics which are calculated dynamically by SNMP agents running on each node. These SNMP agents are typically included in the Linux distribution and just need to be started on each node. The load balancer then regularly queries these agents, dynamically calculates the metrics, and routes the requests accordingly. This means that the distribution of requests will be proportional to the metrics of each node in each time frame.
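A sketch of the corresponding settings, assuming the pool is named soa_pool (the pool name is hypothetical). The pool is switched to the dynamic ratio algorithm on the BigIP, and the SNMP agent is started on each Linux cluster node:

tmsh modify ltm pool soa_pool load-balancing-mode dynamic-ratio-member

service snmpd start        (on each node; chkconfig snmpd on makes it start at boot)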

The overall effect is that only nodes which are not under load will get new requests, and the overall distribution of load will be much more effective. All cluster nodes will be utilized, and the overall system will scale well up to the limit of all available nodes. Of course, once all 8 cluster nodes are at 100%, even this algorithm cannot help anymore.

Wednesday, 22 July 2009

On September 2nd, 2009 in Frankfurt and September 3rd, 2009 in Berlin, Oracle will organize an Enterprise Architecture Roundtable to discuss trends in infrastructures and methods for modern IT platforms. The main focus is to facilitate the exchange of Enterprise Architecture knowledge and experience between lead architects of our main customers. You are welcome to join. Register directly here:

Wednesday, 15 July 2009

Not only do the version numbers match ;-) The products do, too: OpenSuse 11.1 and Oracle Fusion Middleware 11gR1 form an excellent couple, even though this combination is not officially certified. While Oracle Enterprise Linux is the officially certified platform, it lacks more recent kernel support, which also means that WLAN drivers, for example, might not work. So if you are looking for a lightweight, Open Source, Linux-based installation of SOA Suite 11gR1 (as a development environment), here is how I did it (installation on a Toshiba Tecra M9 notebook with 4 GB RAM and a Gnome desktop).

1. Be sure to install OpenSuse 11.1 with the development tools and libraries and with 32-bit support for gcc, glibc and glibc-devel.

7. Next, execute <WLS_HOME>/common/config.sh to create a new WLS domain for SOA.

First, check whether the Sun JDK has been set as JAVA_HOME in commEnv.sh; otherwise, set <WLS_HOME>/jdk160_11 as JAVA_HOME.

Set the database connections to the ones created with RCU:

Finally you are done!

Troubleshooting:

If you receive an error during database linking in ins_client.mk at database installation time: skip this step, complete the DB installation, and execute <ORACLE_HOME>/bin/relink client after the installation.

If executing runInstaller fails, include the option -jreLoc <path to JDK160_11> on the command line.

Thursday, 7 May 2009

Oracle JRockit (formerly BEA JRockit) is a well-recognized JVM which has many advantages over the Sun JVM. For example, it overcomes the usual difficulties with the PermGen space limitations of the Sun JVM, just to name one. In benchmarks in a real, complex SOA project, we have measured performance advantages of JRockit over the Sun JVM on Linux of more than 50%!

As of today, JRockit 1.6 (R27.6.3) is certified for SOA Suite 10.1.3.4 on Linux and Windows!

About Me

Stefan is a leading SOA architect within Oracle. He has more than 15 years of experience in delivering complex projects on Oracle platforms. Since 1997 he has specialized in J2EE and middleware, and since 2004 he has been architecting complex SOA solutions. In 2011 he moved to the Fusion Middleware Architects Team (A-Team) within Oracle Product Development.