Understanding HTTP/2: History, Features, Debugging, and Performance

HTTP/2 is an optimized transfer protocol over the previous version and offers various advantages. Alibaba Cloud has supported HTTP/2 since the CDN 6.

Alibaba Cloud has supported HTTP/2 since the CDN 6.0 service launched in 2016, and has seen a 68% improvement to access rates. In this article, we will cover four aspects of HTTP/2: history, features, debugging, and performance.

HTTP (HyperText Transfer Protocol) is the most widely used network protocol on the Internet. The purpose of HTTP is to provide a method for publishing and receiving HTML pages. Resources requested through HTTP or HTTPS are identified by URI (Uniform Resource Identifier).

Although HTTP/1.1 has been running stably for more than 10 years, HTTP/2 is the wave of the future, which all engineers should learn. This article takes a deep dive in the history, features, debugging, and performance of HTTP/2 as shared by Jinjiu, a security technical expert at Alibaba Cloud CDN.

I. History

1. HTTP/0.9

The earliest prototype published in 1991, it is very simple. It only supports the GET method, and does not support MIME types and various HTTP headers.

2. HTTP/1.0

Published in 1996, it is based on HTTP/0.9 and has added a variety of methods, HTTP headers, as well as processing for multimedia objects.

In addition to the GET method, the POST and HEAD methods were also introduced, which enriched interactions between browsers and servers. Using this protocol, you can send contents in any format. The result is that the Internet can transfer not only text, but also images, videos, and binary files. This laid the foundation for the rapid development of the Internet.

The formats of HTTP requests and responses also changed. In addition to the data itself, every communication must include an HTTP header that describes relevant metadata.

Even though HTTP/1.0 presented a revolutionary change to HTTP/0.9, it still has some disadvantages. Primarily, each TCP connection can only send one request, and the connection is disabled when data is sent. If you want to request other resources, you have to create a new connection. Although some browsers have addressed this problem with a non-standard connection header, this is does not fully solve the problem. Since it is not a standard header, implementations between different browsers and servers may be different, so this solution is considered sub-optimal.

3. HTTP/1.1

Officially published in 1999, HTTP/1.1 is the current mainstream HTTP protocol. It repairs the structural defects in previous HTTP design, makes semantics explicit, adds and deletes some features, and supports more complex Web applications.

After nearly 20 years of development, this version of the HTTP protocol is now very stable. Compared to HTTP/1.0, it adds various noticeable new features such as Host protocol headers, range segment requests, default sustained connections, compressed and chunked transfer encoding, cache processing, and many more. These features are still widely used and relied upon by most software.

Although HTTP/1.1 is not as revolutionary a change as HTTP/1.0 to HTTP/0.9, it still had a lot of improvements. Current mainstream browsers are still using HTTP/1.1 by default to this day.

4. SPDY

The SPDY (pronounced as "speedy") protocol is developed by Google and mainly solves the problem of low efficiency in HTTP/1.1. It was published in 2009, but terminated in 2016. Since HTTP/2 has been standardized by IETF and will be supported by several browsers in the future, Google believes that it will make SPDY obsolete.

5. HTTP/2

HTTP/2 is the latest HTTP protocol, which was published in May 2015. It is supported by mainstream browsers such as Chrome, IE11, Safari, and Firefox.

Note that HTTP/2 is not called HTTP/2.0 because IETF (Internet Engineering Task Force) considers HTTPP/2 to be a mature technology that does not require future sub-versions. If significant changes are made necessary in the future, they will be published in HTTP/3.

Actually, SPDY is the predecessor of HTTP/2 as it has very similar goals, principles, and implementations. As there are many Google engineers in the IETF committee, it is not surprising that SPDY became the standard for HTTP/2.

HTTP/2 not only features optimized performance, but is also compatible with the HTTP/1.1 syntax, whose features are similar to SPDY. HTTP/2 is quite different from HTTP/1.1; it is a binary protocol and not a text protocol. It also uses HPACK to compress HTTP headers, and supports multiplexing, server push, etc.

II. Features

1. Binary Protocol

HTTP/2 transfers data in binary format rather than text format used in HTTP/1.x. Both header and body are in binary format and collectively called the "Frame". Basic binary format of the Frame is as follows (from rfc7540#section-4.1)

We call it the basic format because all HTTP/2 Frames are packaged according to this basic format, which is similar to the TCP header. There are currently 10 Frames, which are distinguished by field Type, and each Frame has its own binary format packaged in Frame Payload.

There are two important Frames: the Headers Frame (Type = 0x1) and the Data Frame (Type = 0x0), corresponding to the Header and Body in HTTP/1.1 respectively. Obviously the semantics hasn't changed significantly, only the text format has changed to binary. The conversion and relationship between Frames are shown in the figure below (from High Performance Browser Networking):

Furthermore, there are concepts like Stream and Message in HTTP/2, which are identified by the field Stream Identifier (Stream ID). Streams with the same Stream ID refer to the same Stream, and Message is included in Stream, corresponding to Request Message or Response Message in HTTP/1.x. Message is transferred through a Frame, while Response Message is larger and may be transferred through multiple Data Frames. The relationship of Stream, Message, and Frame in HTTP/2 is shown in the figure below (from High Performance Browser Networking):

2. Header Compression

Each request and response in HTTP/1.x will carry a lot of redundant header information, such as Cookie and User Agent. These are similar in content, but are carried by the browser in each request, which is a waste of bandwidth resources and transfer rate. This is because HTTP is a stateless protocol, and all information has to be attached in each request, including a lot of redundant headers.

Thus, there is an optimization in HTTP/2 to compress and transfer headers in the HPACK format, and create an index table for the headers. Only the index number has to be sent for the same header, which improves efficiency and transfer rate. The cost is that the client and the server have to maintain an index table, but since memory is not currently as expensive as it used to be, the tradeoff is worthwhile.

3. Multiplexing

Multiplexing is when the client and server are able to send multiple requests and responses at the same time in a single TCP connection. In HTTP/1.x there are strict requirements for the order of the requests and responses, while in HTTP/2 the requests and responses do not have to be in matching order and a block in any of the multiple concurrent requests and responses will not affect the rest. This effectively avoids the issue of Head-of-line blocking. This is demonstrated in the figure below (from High Performance Browser Networking):

4. Server Push

Server Push refers to the HTTP/2 ability for the server to actively push resources to clients without a request. For example, a server can actively push images, JS and CSS files to browsers, without waiting for the browser to resolve the HTML before sending the request. When the browser has resolved the HTML, the required resources are already in the browser, which makes web pages load significantly faster. This is demonstrated in the figure below (from High Performance Browser Networking):

A browser initiates a request for the page page.html, in which script.js and style.css are introduced. The server pushes the files script.js and style.css after the response to page.html, so that script.js and style.css are already available locally when the browser resolves page.html. This means that the browser does not have to send a separate request for these files, saving two rounds of requests and responses and effectively increasing the loading speed of the page.

5. Security

HTTP security is guaranteed by SSL/TLS, i.e. HTTPS. In fact, HTTP/2 does not have to rely on SSL/TLS, however, current mainstream browsers only support HTTP/2 based on SSL/TLS, and in an Internet environment with increasing number of network hijackings, HTTPS is the trend of future, and HTTP/2 based on HTTPS is also the trend of future. Various mainstream browsers only supporting HTTP/2 based on SSL/TLS at the beginning of HTTP/2 show that security is also an important feature of HTTP/2.

III. Debugging

HTTP/2 and SPDY are similar in principle and target, and official Nginx code shows that they are also similar in implementation. The SPDY module code has been deleted from the official Nginx code, and replaced with the code for the HTTP/2 module (ngx_http_v2_module).

2. Packet Capture Analysis

The best way to learn the HTTP/2 format is to start with packet capturing. While HTTP/2 is based on HTTPS and is therefore encrypted, which means that the contents of captured packets will be encoded, you can easily decrypt the packets with tools in Wireshark.

2.1. First export the system variable $SSLKEYLOGFILE, taking OS-X as an example

Open an https website with Chrome or Firefox, for example: https://www.taobao.comand see if there is content in ~/ssl_debug/ssl_pms.log. If there is, you can use Wireshark to decrypt the https data.

2.3 Wireshark settings

Wireshark-&gt;Perferences...-&gt;Protocols-&gt;SSL

Start packet capture

As you can see, we have now obtained the plaintext content of the HTTP/2 packets. As the scope of this article is limited, we won't go into any further explanation of HTTP/2. For more information please see: RFC7540.

IV. Performance

Analysis of Results:

Even with or without keepalive, HTTP/2 is similar in performance to SPDY/3.1, in fact slightly better. In the test results with sizes of 1kb, 2kb, and 4kb, QPS is relatively high when RT is low, and the CPU bottleneck is not a factor. QPS is slightly increased after increasing compression test clients. When RT is increased, so is 5xx (data for this part is not given, it is just the recorded phenomenon during the test). In the test results with sizes 16kb, 32kb, 64kb, 128kb and 256kb, the CPU reaches its bottleneck. As the size increases, QPS reduces and RT increases.

Looking at the CPU performance, it seems the culprit behind consuming more resources is gcm_ghash_clmul. The network card reaches its bottleneck when size is 512k, while the CPU does not. When keepalive is enabled, there is no significant difference between the performances of HTTP/1.1 and HTTP/2, but when disabled, HTTP/2 performs better.

Conclusion:

HTTP/2 features a number of underlying optimizations compared to the previous transfer protocol. These advantages have lead numerous companies to begin using the HTTP/2 protocol. Alibaba Cloud has supported HTTP/2 since the CDN 6.0 service launched last year, and has seen a 68% improvement in access rates. Though we may not have been aware of its approach previously, we now have a more secure, reliable, and faster era of the internet.