HTTP Load Balancing

The target HTTP proxy checks each request against a
URL map to determine the appropriate
backend service for the request.

The backend service directs each request to an appropriate backend based on
serving capacity, zone, and instance health of its attached backends. The
health of each backend instance is verified using
an HTTP health check, an
HTTPS health check, or an HTTP/2 health check. If the backend service is
configured to use an HTTPS or HTTP/2 health check, the request will be
encrypted on its way to the backend instance.

Sessions between the load balancer and the instance can use the HTTP, HTTPS,
or HTTP/2 protocol. If you use HTTPS or HTTP/2, each instance in the backend
services must have an SSL certificate.

HTTPS Load Balancing

An HTTPS load balancer has the same basic structure as an HTTP load balancer
(described above), but differs in the following ways:

Client communications with the load balancer

Clients can communicate with the load balancer using the HTTP/1.1 or HTTP/2
protocol.

When HTTPS is used, modern clients default to HTTP/2. This is
controlled on the client, not on the HTTPS load balancer.

You cannot disable HTTP/2 by making a configuration change on the load
balancer itself. However, you can configure some clients to use HTTP/1.1
instead of HTTP/2. For example, with curl, use the --http1.1 parameter.

Components

The following are components of HTTP(S) load balancers.

Global forwarding rules and addresses

Global forwarding rules
route traffic by IP address, port, and protocol to a load balancing
configuration consisting of a target proxy, URL map, and one or more backend
services.

Each global forwarding rule provides a single global IP address that can be used
in DNS records for your application. No DNS-based load balancing is required.
You can either specify the IP address to be used or let Google Cloud Load Balancing
assign one for you.

Target proxies

Target proxies terminate HTTP(S)
connections from clients. One or more global forwarding rules direct
traffic to the target proxy, and the target proxy consults the URL map to
determine how to route traffic to backends.

The proxies set HTTP request/response headers as follows:

Via: 1.1 google (requests and responses)

X-Forwarded-Proto: [http | https] (requests only)

X-Forwarded-For: <unverified IP(s)>, <immediate client IP>,
<global forwarding rule external IP>, <proxies running in GCP>
(requests only)
A comma-separated list of IP addresses appended by the intermediaries the
request traveled through. If you are running proxies inside GCP that
append data to the X-Forwarded-For header, then your software must take into
account the existence and number of those proxies. Only the
<immediate client IP> and <global forwarding rule external IP>
entries are provided by the load balancer. All other
entries in the list are passed along without verification. The
<immediate client IP> entry is the client that connected directly
to the load balancer. The <global forwarding rule external IP> entry
is the external IP address of the load balancer's
forwarding rule. If there are more entries than that, then the first entry
in the list is the address of the original client. Other entries before the
<immediate client IP> entry represent other proxies that forwarded
the request along to the load balancer.
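
As a minimal sketch of how backend software might read this header, the following Python function picks out the <immediate client IP> entry by counting from the right-hand end of the list. The function name and the gcp_proxy_count parameter are hypothetical; the parameter stands for the number of your own proxies inside GCP that append entries after the load balancer.

```python
def immediate_client_ip(xff_header: str, gcp_proxy_count: int = 0) -> str:
    """Return the <immediate client IP> entry from an X-Forwarded-For value.

    gcp_proxy_count (hypothetical parameter) is the number of entries that
    your own proxies running in GCP append after the load balancer's entries.
    """
    entries = [e.strip() for e in xff_header.split(",") if e.strip()]
    # Counting from the right: entries from your GCP proxies, then the
    # global forwarding rule external IP, then the immediate client IP.
    index = len(entries) - gcp_proxy_count - 2
    if index < 0:
        raise ValueError("header has fewer entries than expected")
    return entries[index]


# Example values use documentation-only address ranges.
client = immediate_client_ip("192.0.2.9, 203.0.113.7, 198.51.100.1")
```

Entries to the left of the returned one are not verified by the load balancer, so this sketch deliberately ignores them.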

You can create custom request headers if the default headers do not meet
your needs. For more information on this feature, see
User-defined request headers.

URL maps

URL maps define matching patterns for
URL-based routing of requests to the appropriate backend services. A default
service is defined to handle any requests that do not match a specified host
rule or path matching rule. In some situations, such as the
cross-region load balancing example,
you might not define any URL rules and rely only on the default service.
For content-based routing of traffic, the URL map allows you to divide your
traffic by examining the URL components to send requests to different sets of
backends.
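
The routing idea can be illustrated with a simplified Python model. This is a sketch only, not the URL map resource format or its exact matching semantics: the dictionary layout, the service names, and the longest-prefix rule are assumptions chosen for illustration.

```python
def pick_backend_service(url_map: dict, host: str, path: str) -> str:
    """Return the backend service name for a request (simplified model)."""
    path_rules = url_map.get("host_rules", {}).get(host)
    if path_rules:
        # Among the prefixes configured for this host, prefer the longest match.
        for prefix in sorted(path_rules, key=len, reverse=True):
            if path.startswith(prefix):
                return path_rules[prefix]
    # No host rule or path rule matched: use the default service.
    return url_map["default_service"]


# Hypothetical configuration: two content types split off by path.
url_map = {
    "default_service": "web-backend",
    "host_rules": {
        "example.com": {
            "/video/": "video-backend",
            "/static/": "static-backend",
        },
    },
}

service = pick_backend_service(url_map, "example.com", "/video/hd/1")
```

With an empty host_rules dictionary, every request falls through to the default service, which corresponds to the case where no URL rules are defined.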

SSL certificates

If you are using HTTPS Load Balancing, you must install one or more
SSL certificates on the target
HTTPS proxy. You can have up to ten (10) SSL certificates installed. They are
used by target HTTPS proxies to authenticate communications between the load
balancer and the client. These can be
Google-managed or self-managed SSL certificates.

For more information on installing SSL certificates for an HTTPS load balancer,
see SSL Certificates.

If you are using HTTPS or HTTP/2 from the load balancer to the backends, you
must install SSL certificates on each VM instance. To install SSL certificates
on a VM instance, use the instructions in your application documentation. These
certificates can be self-signed.

SSL policies

SSL policies give you the ability
to control the features of SSL that your HTTPS load balancer negotiates with
HTTPS clients.

By default, HTTPS Load Balancing uses a set of SSL features that provides good
security and wide compatibility. Some applications require more control over
which SSL versions and ciphers are used for their HTTPS or SSL connections. You
can define SSL policies that control the features of SSL that your load
balancer negotiates and associate an SSL policy with your target HTTPS proxy.

You can enable connection draining on backend services to ensure minimal
interruption to your users when an instance that is serving traffic is
terminated, removed manually, or removed by an autoscaler. To learn more about
connection draining, read the
Enabling Connection Draining
documentation.

Protocol to the backends

Beta

This is a Beta release of
HTTP/2 protocol to the backends. This feature
is not covered by any SLA or deprecation policy and might be subject to backward-incompatible
changes.

When you configure a backend service for the HTTP(S) load balancer, you set the
protocol that the backend service uses to communicate with the backends. You can
choose HTTP, HTTPS, or HTTP/2. The load balancer uses only the protocol that
you specify. The load balancer does not fall back to one of the other protocols if
it is unable to negotiate a connection to the backend with the specified
protocol.

Use an HTTP/2 health check if you are using HTTP/2 to the backend VMs.

Using gRPC with your Google Cloud Platform applications

Beta

This is a Beta release of
HTTP/2 to the backends and gRPC. This feature
is not covered by any SLA or deprecation policy and might be subject to backward-incompatible
changes.

gRPC is an open-source framework
for remote procedure calls. It is based on the HTTP/2 standard. Use cases for
gRPC include the following:

Low latency, highly scalable, distributed systems

Developing mobile clients that communicate with a cloud server

Designing new protocols that must be accurate, efficient, and language
independent

Layered design to enable extension, authentication, and logging

To use gRPC with your Google Cloud Platform applications, you must proxy
requests end-to-end over HTTP/2. To do this with an HTTP(S) load balancer:

Configure an HTTPS load balancer.

Enable HTTP/2 as the protocol from the load balancer to the backends.

The load balancer negotiates HTTP/2 with clients as part of the SSL handshake by
using the ALPN TLS extension.

The load balancer may still negotiate HTTPS with some clients or accept insecure
HTTP requests on an HTTP(S) load balancer that is configured to use HTTP/2
between the load balancer and the backend instances. Those HTTP or HTTPS
requests are transformed by the load balancer to proxy the requests over HTTP/2
to the backend instances.

Firewall rules

You must create a
firewall rule that allows traffic from
130.211.0.0/22 and 35.191.0.0/16 to reach your instances. These are IP
address ranges that the load balancer uses to connect to backend instances.
This rule allows traffic from both the load balancer and the health checker.
The rule must allow traffic on the port your global forwarding rule has been
configured to use, and your health checker should be configured to use the
same port. If your health checker uses a different port, then you must create
another firewall rule for that port.
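
When reviewing backend logs or testing a rule, it can help to check whether a source address belongs to the two ranges above. The following Python sketch uses the standard ipaddress module; the function name is hypothetical.

```python
import ipaddress

# The two source ranges named in this section.
LB_SOURCE_RANGES = [
    ipaddress.ip_network("130.211.0.0/22"),
    ipaddress.ip_network("35.191.0.0/16"),
]


def is_lb_source(ip: str) -> bool:
    """Return True if ip falls inside one of the load balancer source ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in LB_SOURCE_RANGES)
```

Note that 130.211.0.0/22 spans 130.211.0.0 through 130.211.3.255, so an address such as 130.211.4.1 is outside the range.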

Note that firewall rules block and allow traffic at the instance level, not
at the edges of the network. They cannot prevent traffic from reaching the load
balancer itself.

Load distribution algorithm

HTTP(S) Load Balancing provides two methods of determining instance load.
Within the backend service resource, the balancingMode property selects
between the
requests per second (RPS)
and CPU utilization modes. Both modes allow a
maximum value to be specified; the HTTP(S) load balancer tries to ensure that
load remains under the limit, but short bursts above the limit can occur during
failover or load spike events.

Incoming requests are sent to the region closest to the user, provided that
region has available capacity. If more than one zone is configured with backends
in a region, the traffic is distributed across the instance groups in each zone
according to each group's capacity. Within the zone, the requests are spread
evenly over the instances using a round-robin algorithm. You can override
round-robin distribution by configuring session affinity.
However, note that session affinity works best if you also set balancing mode
to requests per second (RPS).
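
The round-robin step within a zone can be pictured with a few lines of Python. This is an illustrative sketch only: the instance names are hypothetical, and it does not model capacity limits, health, or session affinity.

```python
from itertools import cycle


def round_robin(instances):
    """Return an iterator that yields instances in a repeating, even order."""
    return cycle(instances)


picker = round_robin(["vm-a", "vm-b", "vm-c"])
# Five consecutive requests are spread evenly and then wrap around.
assignments = [next(picker) for _ in range(5)]
```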

Session affinity

Session affinity
sends all requests from the same client to the same virtual machine instance as
long as the instance stays healthy and has capacity.

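GCP HTTP(S) Load Balancing offers two types of session affinity:

client IP affinity: forwards all requests from the same client IP address to
the same instance.

generated cookie affinity: sets a client cookie when the first request is made,
and then sends requests that carry that cookie to the same instance.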

WebSocket proxy support

HTTP(S) Load Balancing has native support for the WebSocket protocol.
Backends that use WebSocket to communicate with clients can use the HTTP(S)
load balancer as a front end, for scale and availability. The load balancer does
not need any additional configuration to proxy WebSocket connections.

The WebSocket protocol, which is defined in RFC 6455,
provides a full-duplex communication channel between clients and servers. The
channel is initiated from an HTTP(S) request.

When HTTP(S) Load Balancing recognizes a WebSocket Upgrade request from an
HTTP(S) client and the request is followed by a successful Upgrade response
from the backend instance, the load balancer proxies bidirectional traffic for
the duration of the current connection. If the backend does not return a
successful Upgrade response, the load balancer closes the connection.
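
The condition described above can be sketched in Python as two header checks based on the RFC 6455 opening handshake. The function names are hypothetical, and the sketch covers only the Upgrade and Connection headers and the 101 status code, not the full handshake validation (for example, the Sec-WebSocket-Key exchange).

```python
def _tokens(value: str) -> set:
    """Split a comma-separated header value into lowercase tokens."""
    return {t.strip().lower() for t in value.split(",") if t.strip()}


def is_websocket_upgrade_request(headers: dict) -> bool:
    """True if the request asks to upgrade the connection to WebSocket."""
    h = {k.lower(): v for k, v in headers.items()}
    return ("websocket" in _tokens(h.get("upgrade", ""))
            and "upgrade" in _tokens(h.get("connection", "")))


def is_successful_upgrade_response(status: int, headers: dict) -> bool:
    """True if the backend accepted the upgrade (101 Switching Protocols)."""
    h = {k.lower(): v for k, v in headers.items()}
    return status == 101 and "websocket" in _tokens(h.get("upgrade", ""))


def should_proxy_bidirectionally(req_headers, resp_status, resp_headers) -> bool:
    """Both halves of the handshake must succeed before traffic is proxied."""
    return (is_websocket_upgrade_request(req_headers)
            and is_successful_upgrade_response(resp_status, resp_headers))
```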

The timeout for a WebSocket connection depends on the configurable response
timeout of the load balancer, which is 30 seconds by default. This timeout is
applied to WebSocket connections regardless of whether they are in use or not.
For more information about the response timeout and how to configure it, refer
to Timeouts and retries.

If you have configured either client IP or generated cookie
session affinity
for your HTTP(S) load balancer, all WebSocket connections from a client are
sent to the same backend instance, provided the instance continues to pass
health checks and has capacity.

QUIC protocol support for HTTPS Load Balancing

HTTPS Load Balancing supports the QUIC protocol
in connections between the load balancer and the clients. QUIC is a transport
layer protocol that provides congestion control similar to TCP and security
equivalent to SSL/TLS for HTTP/2, with improved performance. QUIC allows faster
client connection initiation, eliminates head-of-line blocking in multiplexed
streams, and supports connection migration when a client's IP address changes.

QUIC affects connections between clients and the load balancer, not
connections between the load balancer and backends.

The target proxy's QUIC override setting allows you to enable one of the
following:

When possible, negotiate QUIC for a load balancer OR

Always disable QUIC for a load balancer.

If you do not specify a QUIC override, Google manages when QUIC is used. With
no override specified, Google does not enable QUIC. For information on
enabling and disabling QUIC support, see Target Proxies.

How QUIC is negotiated

When you enable QUIC, the load balancer can advertise its QUIC capability to
clients, allowing clients that support QUIC to attempt to establish QUIC
connections with the HTTPS load balancer. Properly implemented clients always
fall back to HTTPS or HTTP/2 when they cannot establish a QUIC connection.
Because of this fallback, enabling or disabling QUIC in the load balancer does
not disrupt the load balancer’s ability to connect to clients.

When you have QUIC enabled in your HTTPS load balancer, some circumstances can
cause your client to fall back to HTTPS or HTTP/2 instead of negotiating QUIC.
These include:

When a client supports versions of QUIC that are not compatible with the
QUIC versions supported by the HTTPS load balancer

When the load balancer detects that UDP traffic is blocked or rate limited
in a way that would prevent QUIC from working

When QUIC is temporarily disabled for HTTPS load balancers in response to bugs,
vulnerabilities, or other concerns

When a connection falls back to HTTPS or HTTP/2 because of these circumstances,
we do not count this as a failure of the load balancer.

Ensure that the behaviors described above are acceptable for your workloads
before you enable QUIC.

Interfaces

Your HTTP(S) Load Balancing service can be configured and updated through the
following interfaces:

The gcloud command-line tool: a command-line tool included in the
Cloud SDK. The HTTP(S) Load Balancing documentation calls on
this tool frequently to accomplish tasks. For a complete overview of
the tool, see the gcloud Tool Guide. You can
find commands related to load balancing in the
gcloud compute command group.

You can also get detailed help for any gcloud command by using the --help
flag.

The REST API: All load balancing tasks can be accomplished using the
Google Cloud Load Balancing API. The
API reference docs describe the
resources and methods available to you.

TLS support

By default, an HTTPS target proxy accepts only TLS 1.0, 1.1, and 1.2 when
terminating client SSL requests. You can use
SSL policies to change this
default behavior and control how the load balancer negotiates SSL with clients.

When the load balancer uses HTTPS as a backend service protocol, it can
negotiate TLS 1.0, 1.1, or 1.2 to the backend.
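
One way to observe which TLS version a client ends up negotiating is to connect with a client that is pinned to a specific version. The following Python sketch uses the standard ssl and socket modules; the function names are hypothetical, and the host passed to negotiated_tls_version is whatever name your own load balancer serves.

```python
import socket
import ssl


def client_context(min_version=ssl.TLSVersion.TLSv1_2,
                   max_version=ssl.TLSVersion.TLSv1_2):
    """Build a client-side TLS context restricted to a version range."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = min_version
    ctx.maximum_version = max_version
    return ctx


def negotiated_tls_version(host, port=443, ctx=None):
    """Connect to host and return the negotiated version, e.g. 'TLSv1.2'."""
    ctx = ctx or client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

If an SSL policy on the target HTTPS proxy excludes the pinned version, the handshake in negotiated_tls_version is expected to fail with an ssl.SSLError rather than return a version string.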

Timeouts and retries

HTTP(S) Load Balancing has two distinct types of timeouts:

A configurable response timeout, which represents the amount of time the
load balancer will wait for your backend to return a complete response. It is
not an idle (keepalive) timeout. This timeout is configurable by modifying
the timeout setting for your backend
service.
The default value is 30 seconds. Consider increasing this timeout under these
circumstances:

If you expect a backend to take longer to return HTTP responses, or

If the connection is upgraded to a WebSocket.

A TCP session timeout, whose value is fixed at 10 minutes (600 seconds).
This session timeout is sometimes called a keepalive or idle timeout, and its
value is not configurable by modifying your backend service. You must
configure the web server software used by your backends so that its keepalive
timeout is longer than 600 seconds to prevent connections from being closed
prematurely by the backend. This timeout does not apply to WebSockets.

HTTP(S) Load Balancing retries failed GET requests in certain circumstances,
such as when the response timeout is exhausted. It does not retry failed POST
requests. Retries are limited to two attempts. Retried requests only generate
one log entry for the final response. Refer to Logging
for more information.
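
The rules in this section can be summarized in a short Python sketch. It is a model for reasoning about backend and client behavior, not the load balancer's implementation: the names are hypothetical, "limited to two attempts" is read here as at most two retries, and the load balancer retries only in certain circumstances, so passing this check is necessary rather than sufficient.

```python
MAX_RETRIES = 2                 # assumption: "two attempts" means two retries
TCP_SESSION_TIMEOUT_SECONDS = 600


def is_retry_eligible(method: str, retries_so_far: int) -> bool:
    """Only GET requests are retried, and only up to the retry limit."""
    return method.upper() == "GET" and retries_so_far < MAX_RETRIES


def keepalive_is_safe(server_keepalive_seconds: int) -> bool:
    """The backend keepalive timeout must be longer than 600 seconds."""
    return server_keepalive_seconds > TCP_SESSION_TIMEOUT_SECONDS
```

Because POST requests are never retried, a client that needs at-least-once delivery for POST has to implement its own retry logic.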

Illegal request handling

The HTTP(S) load balancer blocks client requests from reaching the backend
for a number of reasons: some strictly for HTTP/1.1 compliance and others
to avoid unexpected data being passed to the backends.

For HTTP/1.1 compliance, the load balancer blocks a request in any of the
following cases:

It cannot parse the first line of the request.

A header is missing the : delimiter.

Headers or the first line contain invalid characters.

The content length is not a valid number, or there are multiple
content length headers.

There are multiple transfer encoding keys, or there are unrecognized
transfer encoding values.

There's a non-chunked body and no content length specified.

Body chunks are unparseable. This is the only case where some data will
make it to the backend. The load balancer closes the connections to the
client and the backend when it receives an unparseable chunk.
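
A few of the header checks listed above can be sketched in Python, for example to pre-validate requests in a test client. This is a partial model under stated assumptions: the function name is hypothetical, header lines are given as plain strings, and the checks for invalid characters, unrecognized transfer encoding values, and body framing are omitted.

```python
import re


def compliance_violations(header_lines):
    """Return reasons the given header lines would be rejected (partial model)."""
    problems = []
    names = []
    for line in header_lines:
        if ":" not in line:
            problems.append(f"missing ':' delimiter: {line!r}")
            continue
        name, value = line.split(":", 1)
        name = name.strip().lower()
        names.append(name)
        # Content length must consist of ASCII digits only.
        if name == "content-length" and not re.fullmatch(r"[0-9]+", value.strip()):
            problems.append("content length is not a valid number")
    if names.count("content-length") > 1:
        problems.append("multiple content length headers")
    if names.count("transfer-encoding") > 1:
        problems.append("multiple transfer encoding keys")
    return problems
```

An empty list means that none of the modeled checks failed; it does not guarantee that the load balancer accepts the request.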

The load balancer also blocks the request if any of the following are true: