Request load balancing

At its core, Passenger is a process manager and HTTP request router. In order to minimize response times, and to distribute load over multiple CPU cores for optimal performance, Passenger load balances requests over processes in a "least busy process first" manner. This article explains the implications and details of our request load balancing mechanism.

Introduction

Ruby applications can normally handle only 1 request at a time. Passenger improves on this by running multiple processes of the application, organized in pooled application groups. For each request from the incoming request queue, a non-busy (free) application process is selected to handle the request.

For thread-safe Ruby apps it is also possible to enable multithreading, which allows each application process to handle multiple requests concurrently, up to the number of threads configured. In this case, Passenger forwards the request to the process that is currently handling the least number of requests.

If all application processes and threads are busy, Passenger spawns a new instance, up to the process limit for the group. In addition, the combined number of processes across all groups may not exceed the limit for the application pool. If either limit is reached, the request remains in the queue (which has its own limit).
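The decision described above can be sketched roughly as follows. This is a simplified model for illustration, not Passenger's implementation: the function `dispatch` and its parameters (`group_limit`, `pool_limit`, `other_groups`) are hypothetical names, and the sketch assumes a newly spawned process takes the request immediately.

```python
def dispatch(request, busy, queue, concurrency, group_limit, pool_limit, other_groups=0):
    """Decide what happens to `request` in this simplified model.

    busy         -- in-flight request count per process of this group (mutated)
    queue        -- waiting requests of this group (mutated)
    other_groups -- processes currently owned by other groups in the pool
    """
    # 1. Prefer an existing process with spare capacity, least busy first.
    free = [i for i, n in enumerate(busy) if n < concurrency]
    if free:
        busy[min(free, key=lambda i: busy[i])] += 1
        return "routed"
    # 2. All busy: spawn only if neither the group nor the pool limit is hit.
    if len(busy) < group_limit and len(busy) + other_groups < pool_limit:
        busy.append(1)  # simplification: the new process takes the request at once
        return "spawned"
    # 3. Otherwise the request waits in the queue.
    queue.append(request)
    return "queued"


busy, queue = [], []
outcomes = [
    dispatch(r, busy, queue, concurrency=1, group_limit=2, pool_limit=6)
    for r in ("r1", "r2", "r3")
]
print(outcomes)  # ['spawned', 'spawned', 'queued']
```

With a group limit of 2 and single-threaded processes, the third request has nowhere to go and stays in the queue until a process becomes free.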

Maximum process concurrency

A core concept in the load balancing algorithm is that of the maximum process concurrency. This is the maximum number of concurrent requests that a particular process can handle.

For Ruby applications, the maximum process concurrency is assumed to be 1. This means that Passenger assumes each process can handle 1 request at a time.

This can be changed by setting the concurrency model to thread and by setting the thread count. If you do that, the assumed maximum process concurrency will equal the number of configured threads. This reflects the fact that each thread can handle 1 request at a time.

Because each process can handle only a limited number of requests at a time, load balancing requests between multiple processes is beneficial.

Least-busy-process-first routing

Algorithm summary

Passenger keeps a list of application processes. For each application process, Passenger keeps track of how many requests it is currently handling. When a new request comes in, Passenger routes the request to the process that is handling the least number of requests (the one that is "least busy").
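As a minimal sketch of this selection step (an illustration under our own naming, not Passenger's source code), the hypothetical function `pick_process` below takes the per-process request counts and returns the index of the process to use. It also skips processes that have already reached their maximum concurrency.

```python
def pick_process(in_flight, max_concurrency):
    """Return the index of the least busy process, or None if all are at capacity.

    in_flight[i] is the number of requests process i is currently handling.
    """
    candidates = [i for i, n in enumerate(in_flight) if n < max_concurrency]
    if not candidates:
        return None
    # min() returns the first minimal element it encounters, so ties go to
    # the process that appears earliest in the list.
    return min(candidates, key=lambda i: in_flight[i])


print(pick_process([1, 0, 0], max_concurrency=1))  # 1
print(pick_process([2, 1, 1], max_concurrency=4))  # 1
print(pick_process([1, 1], max_concurrency=1))     # None
```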

First available process in the list has highest priority

If there are multiple processes that have the least busyness, then Passenger will pick the first one in the list. For example, if there are 3 application processes and all of them are idle, the next request goes to the first process in the list.

This property is used by the dynamic process scaling algorithm. Dynamic process scaling works by shutting down processes that haven't received requests for a while (processes that are "idle"). By routing to the first process with the least busyness (instead of, say, a random one or the next one in round-robin order), Passenger gives other processes the chance to become idle and thus eligible for shutdown.

Another advantage of picking the first process is that it improves application-level caching. Since the first process is the most likely candidate for load balancing, it has the best chance of keeping its caches warm. Examples of such caches include in-memory hash tables and JIT caches.

A side effect is that traffic may appear unbalanced between processes. With, say, 4 processes, the "Processed" field of each process may not look balanced at all. This is completely normal; it is supposed to look like this. Passenger does not use round-robin load balancing, for the reasons explained above. Despite not looking balanced, Passenger is in fact balancing requests between processes just fine.
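The skew can be reproduced with a small simulation. The sketch below is a toy model, not a measurement of Passenger: it assumes a maximum concurrency of 1, a made-up traffic pattern (one request every 1200 ms, with occasional bursts of 2 to 4 simultaneous requests) and a fixed handling time of 1000 ms. The function name `simulate` is ours.

```python
def simulate(arrivals, duration, process_count):
    """Count requests handled per process when ties go to the first idle process.

    Assumes maximum concurrency 1 and that an idle process always exists.
    Times are integers (milliseconds) to keep the arithmetic exact.
    """
    busy_until = [0] * process_count  # time at which each process becomes free
    processed = [0] * process_count
    for t in arrivals:
        # First process in list order that is idle at time t.
        target = next(i for i, free_at in enumerate(busy_until) if free_at <= t)
        busy_until[target] = t + duration
        processed[target] += 1
    return processed


arrivals = []
for k in range(100):
    # Mostly single requests, with occasional bursts of simultaneous ones.
    size = 4 if k % 50 == 0 else 3 if k % 20 == 0 else 2 if k % 5 == 0 else 1
    arrivals += [k * 1200] * size

print(simulate(arrivals, duration=1000, process_count=4))  # [100, 20, 6, 2]
```

Every request was served immediately, yet the first process handled most of them. The later processes only see traffic during bursts, which is what gives them the chance to become idle.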

No head-of-line blocking problem

Many web servers and load balancers use independent queues, which causes the problem of head-of-line blocking, whereby HTTP requests to your app are queued behind slow or long-running requests. Passenger avoids this problem by using a single (per application group) shared queue.

Imagine a supermarket with a number of (independent) checkout lanes and a queue in each of them. If someone in one of the queues has trouble with the checkout, everyone already in that queue will be delayed, which is especially unfair if the other queues keep moving fast.

Instead, the Passenger implementation can be compared to using a single (shared) queue and sending only 1 person per checkout lane at a time, thereby preventing head-of-line blocking.
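The difference between the two checkout strategies can be illustrated with a short simulation. This is a sketch under simplifying assumptions: all requests arrive at time 0, the independent-queue variant assigns requests round-robin, and both function names are hypothetical.

```python
import heapq


def independent_queues(durations, workers):
    """Each request joins one worker's own queue on arrival (round-robin)."""
    free_at = [0] * workers
    finish = []
    for i, d in enumerate(durations):
        w = i % workers
        free_at[w] += d  # waits behind everything already in that queue
        finish.append(free_at[w])
    return finish


def shared_queue(durations, workers):
    """Requests wait in one queue; the next one goes to the worker that frees up first."""
    free_at = [0] * workers
    heapq.heapify(free_at)
    finish = []
    for d in durations:
        start = heapq.heappop(free_at)  # earliest moment any worker is free
        heapq.heappush(free_at, start + d)
        finish.append(start + d)
    return finish


durations = [10, 1, 1, 1, 1, 1]  # one slow request followed by fast ones
print(independent_queues(durations, workers=2))  # [10, 1, 11, 2, 12, 3]
print(shared_queue(durations, workers=2))        # [10, 1, 2, 3, 4, 5]
```

With independent queues, two fast requests are stuck behind the slow one and finish at 11 and 12. With the shared queue, every fast request is done by time 5.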

Example with maximum concurrency 1

Suppose that you have 3 application processes, and each process's maximum concurrency is 1. When the application is idle, none of the processes are handling any requests:

Process A [ ]
Process B [ ]
Process C [ ]

When a new request comes in (let's call this α), Passenger will decide to route the request to process A. A will then reach its maximum concurrency:

Process A [α]
Process B [ ]
Process C [ ]

Suppose that, while α is still in progress, a new request comes in (which we call β). That request will be load balanced to process B because it is the least busy one:

Process A [α]
Process B [β]
Process C [ ]

Suppose α finishes. The situation will look like this:

Process A [ ]
Process B [β]
Process C [ ]

If another request comes in (which we call ɣ), that request will be routed to A, not C, because both are idle and A comes first in the list:

Process A [ɣ]
Process B [β]
Process C [ ]

Example with maximum concurrency 4

Suppose that you have 2 application processes, and you configured the number of threads to 4, causing each process's maximum concurrency to be 4. When the application is idle, none of the processes are handling any requests:

Process A [ ]
Process B [ ]

When a new request comes in (which we call α), Passenger will decide to route the request to process A.

Process A [α ]
Process B [ ]

Suppose that, while α is still in progress, 1 more request comes in (which we call β). That request will be load balanced to process B because it is the least busy one:

Process A [α ]
Process B [β ]

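Suppose that another request comes in (which we call ɣ). Both processes are now handling 1 request each, so the tie goes to the first process in the list: the request will be load balanced to process A again, not to B: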

Process A [αɣ ]
Process B [β ]
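Both walkthroughs (maximum concurrency 1 and maximum concurrency 4) can be replayed with a toy router. The `Router` class below is a hypothetical model written for this illustration, and the requests are spelled out as "alpha", "beta" and "gamma".

```python
class Router:
    """Toy model of least-busy-process-first routing (illustrative only)."""

    def __init__(self, names, max_concurrency):
        self.max_concurrency = max_concurrency
        # Dicts preserve insertion order, which serves as the process list order.
        self.slots = {name: [] for name in names}

    def start(self, request):
        """Route `request` and return the name of the chosen process."""
        free = [n for n, reqs in self.slots.items() if len(reqs) < self.max_concurrency]
        if not free:
            raise RuntimeError("all processes are at maximum concurrency")
        # min() keeps the first minimum, so ties go to the earliest process.
        chosen = min(free, key=lambda n: len(self.slots[n]))
        self.slots[chosen].append(request)
        return chosen

    def finish(self, request):
        """Mark `request` as completed, freeing its slot."""
        for reqs in self.slots.values():
            if request in reqs:
                reqs.remove(request)


# Maximum concurrency 1, three processes.
r1 = Router("ABC", max_concurrency=1)
route1 = [r1.start("alpha"), r1.start("beta")]
r1.finish("alpha")
route1.append(r1.start("gamma"))
print(route1)  # ['A', 'B', 'A']

# Maximum concurrency 4, two processes.
r4 = Router("AB", max_concurrency=4)
route4 = [r4.start(x) for x in ("alpha", "beta", "gamma")]
print(route4)  # ['A', 'B', 'A']
```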

Customized balancing through additional request queues

By default, all requests for an application are balanced in a fair way. You may want to adjust this behavior. For example, if your app consists of pages for visitors as well as some kind of web service, you probably want to make sure the visitor requests don't get stuck behind web service requests when under heavy load.

This can be solved by configuring two application groups for your app: one for the web service URL and one for the visitor pages, each with their own request queue. The requests for the visitor pages will keep being handled even if there is an overload of requests for the web service. See the app group name option (Apache and Nginx only, not available on Standalone).

The two application groups will still compete with each other for server resources. You can control how many instances are allowed per application group, for example allowing more instances to serve the visitor pages than the web service URL. See the max instances option.

Request queue overflow

If a request arrives for an application group, and all its processes are busy, and the application pool is full, the request remains in the Passenger request queue. If this keeps happening (e.g. due to a flood of requests), the queue will eventually overflow its limit (see: max request queue size). Overflowing requests will no longer be balanced, but simply dropped (with an HTTP 503 error).
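The overflow behavior amounts to a bounded queue. The sketch below is a hypothetical stand-in for a per-group request queue, not Passenger's code: the class name `RequestQueue` and its `max_size` parameter are assumptions made for this example.

```python
from collections import deque


class RequestQueue:
    """Bounded FIFO queue; a hypothetical stand-in for a per-group request queue."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.items = deque()

    def enqueue(self, request):
        """Return 'queued' if the request was accepted, or 503 if the queue is full."""
        if len(self.items) >= self.max_size:
            return 503  # dropped: the client receives an HTTP 503 error
        self.items.append(request)
        return "queued"


q = RequestQueue(max_size=2)
statuses = [q.enqueue(r) for r in ("r1", "r2", "r3")]
print(statuses)  # ['queued', 'queued', 503]
```

A dropped request never enters the queue, so it is not retried once a process becomes free.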