Dehydration for OSB: How to Run Long Transactions and Not Run Out-of-Memory

Offload the currently unused in-memory XMLs to a persistent storage, like BPEL does. Download examples.

Slow backends can kill the JVM when they are called from a composite service.

Data accumulate in the service while a backend takes its time to respond.

Make the backend slow enough and the data big enough, and the heap will be consumed entirely.

Can we do something about it?

Case Study: CustomerProfile Service

A composite service is one that calls more than one backend service and then combines the results into the response the consumer wants.

Some backend calls complete faster than others. Until the last call completes, the data just sit there, unused but occupying precious memory.

This is true even for parallel calls, but it becomes most harmful for sequential executions.

Take this schematic CustomerProfile service:

A composite service collects data from many backends.

This CustomerProfile service:

Collects the basic profile information from the Profile backend service and stores it in memory.

Uses the profile to retrieve location-specific data from the Location service and stores it in memory.

Based on the profile and location, gathers the available products and services.

Finally, merges all three data sources into one response and returns it to the consumer.
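The sequential flow above can be sketched in plain Java. The class and method names here are hypothetical stubs standing in for the real proxy and business services; the point is that each intermediate result stays referenced (and therefore in memory) until the final merge:

```java
// Schematic CustomerProfile composite service (all backends are stubbed).
public class CustomerProfileService {

    // Stubs for the three backend services; real calls would go over the wire.
    String getProfile(String customerId)                 { return "<profile/>"; }
    String getLocation(String profileXml)                { return "<location/>"; }
    String getProducts(String profileXml, String locXml) { return "<products/>"; }

    String handle(String customerId) {
        String profile  = getProfile(customerId);         // held in memory...
        String location = getLocation(profile);           // ...while this call runs
        String products = getProducts(profile, location); // ...and this one too

        // Merge all three data sources into one response.
        return "<customerProfile>" + profile + location + products + "</customerProfile>";
    }

    public static void main(String[] args) {
        System.out.println(new CustomerProfileService().handle("42"));
    }
}
```

Note that `profile` cannot be released before the last backend call returns, because the merge still needs it.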

A Single Slow Backend Holds All Data in Memory

What if the Products service is slow? Say, instead of the usual 2000 ms response time, it now responds in 10000 ms?

A typical CustomerProfile service can wait quite a long time for all the required data to come in.

Suddenly, we have 5 times more data in memory, because 5 times more requests are now in progress.

You can estimate the memory usage by multiplying the response size by roughly 4. E.g., a 100 KB response will take about 400 KB in memory after it is parsed.
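The "5 times more requests in progress" claim follows from Little's law: requests in flight = arrival rate × response time. A back-of-the-envelope sketch, assuming a hypothetical load of 10 requests per second and the 4x parsing factor mentioned above:

```java
// Rough estimate of heap held by in-flight responses.
public class MemoryEstimate {

    // Little's law: average number of requests in flight.
    static double inFlight(double reqPerSec, double responseMs) {
        return reqPerSec * responseMs / 1000.0;
    }

    // Approximate heap held by in-flight responses, in MB.
    static double heapMb(double inFlight, double responseKb, double parseFactor) {
        return inFlight * responseKb * parseFactor / 1024.0;
    }

    public static void main(String[] args) {
        double rate   = 10.0;  // assumed: 10 requests per second
        double sizeKb = 100.0; // raw response size from the example
        double factor = 4.0;   // in-memory blow-up after XML parsing

        System.out.printf("normal (2000 ms): %.0f in flight, ~%.1f MB held%n",
                inFlight(rate, 2000), heapMb(inFlight(rate, 2000), sizeKb, factor));
        System.out.printf("slow (10000 ms): %.0f in flight, ~%.1f MB held%n",
                inFlight(rate, 10000), heapMb(inFlight(rate, 10000), sizeKb, factor));
    }
}
```

With these assumed numbers, the slowdown from 2000 ms to 10000 ms takes the service from 20 to 100 requests in flight, and the held heap from about 8 MB to about 39 MB. With 5 MB responses, the same arithmetic reaches gigabytes.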

This may not be an issue for smaller services, true.

However, some of the services I deal with return responses of up to 5 MB(!). Letting this amount accumulate in memory can kill OSB with an out-of-memory error.

Larger-than-usual memory use would also hurt performance due to memory allocation overhead.

Should We Tighten the Timeouts? No.

An obvious quick-and-dirty solution is reducing the timeout for the Products service. If it can’t respond in 3000 ms, cancel it!

Yes, this will prevent the OOM, but some consumers will start getting failures when calling CustomerProfile. This is not good, because we may, in fact, still have plenty of resources (memory) left to serve those requests.

The timeout solution is too blunt.

BPEL Avoids It by Dehydrating

Let’s take a look at our SOA neighbour, BPEL.

BPEL is designed to execute long-running transactions. You may not believe it, but it is totally normal for a BPEL instance to run for days or even weeks!

Read about BPEL correlation ids if you want to know how they work around read timeouts and network drops.

Just like an OSB request, a BPEL instance must have all the responses accumulated to do its job. With those uber-long transactions, BPEL must be using a LOT of memory, right?

It would, if not for dehydration.

Before performing a potentially long request, the BPEL engine takes all the data of the current process and persists them into out-of-process storage (normally a database). Then it removes them from memory.

Now, no matter how long the request takes, there is no impact on memory.

When the call is completed, the data are read back from the storage and the process resumes.

Smart, eh?

Emulating Dehydration

We can implement dehydration in OSB, too!

It will take a manual step, and we will have to decide when and what to dehydrate, but it is totally feasible. All we need to do is:

1. Before making a potentially long call, serialize the large data into an on-disk file. We’ll only keep a token (the file name) in memory.
2. After the call, de-serialize the data back from the file.
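The two steps above can be sketched as a pair of static methods suitable for an OSB Java callout. This is a minimal proof-of-concept sketch, not the article's actual code: the class and method names are mine, the payload is assumed to already be an XML string, and the file goes to a temp directory rather than the domain directory:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Proof-of-concept dehydration: swap a large payload for a small token.
public class Dehydrator {

    // Step 1: serialize the large payload to disk; only the token
    // (the file name) stays in memory during the long backend call.
    public static String dehydrate(String payloadXml) throws IOException {
        Path file = Files.createTempFile("dehydrated-", ".xml");
        Files.write(file, payloadXml.getBytes(StandardCharsets.UTF_8));
        return file.toString();
    }

    // Step 2: read the payload back and delete the file (single-use token).
    public static String hydrate(String token) throws IOException {
        Path file = Paths.get(token);
        String payloadXml = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.delete(file);
        return payloadXml;
    }

    public static void main(String[] args) throws IOException {
        String token = dehydrate("<profile><id>42</id></profile>");
        // ... the slow backend call happens here; only the token is held ...
        System.out.println(hydrate(token));
    }
}
```

In the message flow, you would call `dehydrate` before the slow service callout, store the returned token in a context variable, and call `hydrate` in the response pipeline.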

2 comments on “Dehydration for OSB: How to Run Long Transactions and Not Run Out-of-Memory”

Thanks for the article. I have a doubt: where is this dehydrated XML stored? And will it be flushed, or do we need to manually run some job to flush it out after usage? I am using dehydrate to convert a huge XML to a token and pass it to a queue. The target service then hydrates it using the token. In my case, multiple services will be using the token and the hydrate function.

As you may see from the Java code, the payloads are saved to the current directory, which for a WebLogic server is the domain directory. Obviously, this is just a proof of concept, and the code can be configured to use a dedicated store or even a database.

As long as your services are running within the same JVM, i.e. do not cross the boundaries of the managed servers, the approach would work.

Even a JMS distributed queue by default keeps the message on the same managed server (unless told otherwise).

Re: cleanup. If you take a look at the code for hydrate(), it deletes the file after reading it. Having a cron script that removes old files, though, could be a good idea.
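For completeness, the cleanup mentioned above can also be done in Java instead of a cron script. This is a hedged sketch under my own assumptions (file-name pattern and cutoff are hypothetical, not from the article's code): delete any dehydrated files older than a given age.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Housekeeping for orphaned dehydration files (e.g. left behind by crashed
// instances). Assumes files are named dehydrated-*.xml.
public class DehydrationCleanup {

    // Delete matching files older than the cutoff; return how many were removed.
    public static int removeOlderThan(Path dir, int minutes) throws IOException {
        FileTime cutoff = FileTime.from(Instant.now().minus(minutes, ChronoUnit.MINUTES));
        int removed = 0;
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(dir, "dehydrated-*.xml")) {
            for (Path f : files) {
                if (Files.getLastModifiedTime(f).compareTo(cutoff) < 0) {
                    Files.delete(f);
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

Pick a cutoff comfortably longer than your longest backend timeout, so a file is never deleted while its request is still in flight.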