We are trying to improve the average latency of a Java proxy application.
The application receives a message from a client and forwards it to the appropriate server.
The protocol is binary and asynchronous.

The topology is:

Client (1) - Java proxy (2) - Server (3)

The call flow is:

(1)-(2)-(3)-(2)-(1)

The requirement is an average latency overhead of 2 ms per one-way hop through the Java proxy.
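To make the 2 ms figure concrete, here is a minimal sketch of how the per-hop overhead could be averaged inside the proxy. It assumes each hop is timestamped with System.nanoTime() when the message is read and again when it is written out. The class name HopLatencyRecorder and its methods are hypothetical and are not part of our actual code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical helper: accumulates the proxy's per-hop overhead in nanoseconds. */
final class HopLatencyRecorder {
    private final LongAdder totalNanos = new LongAdder();
    private final LongAdder hops = new LongAdder();

    /**
     * Record one one-way hop, either (1)->(2)->(3) or (3)->(2)->(1).
     * Both arguments are assumed to be System.nanoTime() readings taken
     * when the proxy received the message and when it forwarded it.
     */
    void recordHop(long receivedNanos, long forwardedNanos) {
        totalNanos.add(forwardedNanos - receivedNanos);
        hops.increment();
    }

    /** Average overhead per one-way hop in milliseconds (0 if nothing was recorded). */
    double averageMillis() {
        long n = hops.sum();
        if (n == 0) {
            return 0.0;
        }
        return (totalNanos.sum() / (double) n) / TimeUnit.MILLISECONDS.toNanos(1);
    }

    /** True if the recorded average is within the given target, e.g. 2.0 ms. */
    boolean meetsTarget(double targetMillis) {
        return averageMillis() <= targetMillis;
    }
}
```

A full request/response through the proxy would call recordHop twice, once per direction, so the average is per hop rather than per round trip. LongAdder is used here only so that several I/O threads can record without contending on a single counter.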

We've noticed that as we increase the number of clients, the latency overhead decreases linearly.
Let's say,