The Cable Guy

End-to-End WAN Optimization with BranchCache

Joseph Davies

Expanding a business into new regions of the world with branch offices is a great idea from a business perspective, but it often presents challenges to network architects and implementers. Connecting each branch office to a central location requires some sort of physical or logical connection, with bandwidth that is typically orders of magnitude smaller than that of local area connections. Low bandwidth combined with the trend toward centralizing organization data often yields branch office links that are congested, resulting in poor performance for applications. Moreover, many types of wide area network (WAN) links are expensive and can incur substantial startup and monthly costs.

To better utilize an existing WAN link or to prevent incurring the costs associated with increasing the bandwidth of your WAN link, you can use a variety of WAN optimization techniques. Some of these techniques require additional equipment. One prominent WAN optimization technique is caching; data obtained across the WAN link is cached at the branch office. A computer in the branch office that requests data already requested by another computer will retrieve the data from a branch office cache, rather than pulling it across the WAN link.

BranchCache involves three roles: content servers, requesting computers, and caching computers. Content servers provide the requested data along with additional metadata that caching computers can use to identify cached data in the branch office. Requesting computers request data from the content server and indicate that they are BranchCache-enabled. Caching computers store data that was previously requested from a content server; they are either clients running Windows 7 or Windows Server 2008 R2, or a hosted cache server running Windows Server 2008 R2. BranchCache has two operating modes, corresponding to whether you use clients (distributed cache mode) or a hosted cache server (hosted cache mode) to store the cached data.

Every caching implementation must answer the following questions:

How do the content server, requesting computers and caching computers identify cached content?

When is data cached at the branch office?

How do the content server and the requesting computer determine that the content has changed?

How does the requesting computer discover the cached content at the branch office?

How does BranchCache prevent eavesdroppers from obtaining unauthorized data?

As we learn how BranchCache works, let’s come back to these questions to ensure they are answered.

BranchCache Operation and WAN Optimization

The basic operation of BranchCache can be described in four steps.

Step 1: A requesting computer in the branch office requests a file from a content server in the central office. The content server has computed and stored metadata associated with the file. Because the requesting computer has identified itself as BranchCache-capable, the content server sends the metadata associated with the file to the requesting computer. The requesting computer tries to locate the file on another caching computer in the branch office by using the file’s metadata as an identifier. Because no other caching computer in the branch office has cached the file, the requesting computer requests the file from the content server.

Step 2: The requesting computer caches the file and its associated metadata in the branch office, either on itself (distributed cache mode) or on a hosted cache server (hosted cache mode).

Step 3: A second requesting computer in the same branch office tries to download the same file from the content server in the central office. The content server responds with the metadata for the content.

Step 4: The second requesting computer uses the content metadata to download the file from either a client or the hosted cache server that is located in the branch office. Figure 1 shows these basic steps for distributed cache mode.

Figure 1 Basic BranchCache operation for distributed cache mode
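The four steps can be traced with a toy simulation. This is only an illustrative sketch, not BranchCache code: the `ContentServer` and `Branch` classes are hypothetical, the metadata is reduced to a single SHA-256 digest of the whole file, and the branch cache is a plain dictionary standing in for either a client cache or a hosted cache server.

```python
import hashlib


class ContentServer:
    """Hypothetical central-office server holding files in memory."""

    def __init__(self, files):
        self.files = files

    def metadata(self, name):
        # Stand-in for the real metadata: one SHA-256 digest of the file.
        return hashlib.sha256(self.files[name]).digest()

    def data(self, name):
        return self.files[name]


class Branch:
    """Hypothetical branch office with one shared cache keyed by metadata."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.wan_bytes = 0  # bytes that crossed the simulated WAN link

    def fetch(self, name):
        meta = self.server.metadata(name)   # steps 1 and 3: metadata over the WAN
        self.wan_bytes += len(meta)
        if meta in self.cache:              # step 4: found in the branch office
            return self.cache[meta], "branch cache"
        body = self.server.data(name)       # step 1: cache miss, file over the WAN
        self.wan_bytes += len(body)
        self.cache[meta] = body             # step 2: cache it in the branch office
        return body, "content server"


server = ContentServer({"report.docx": b"x" * 200_000})
branch = Branch(server)
_, first_source = branch.fetch("report.docx")
bytes_after_first = branch.wan_bytes
_, second_source = branch.fetch("report.docx")
print(first_source, "|", second_source, "|", branch.wan_bytes - bytes_after_first)
# prints: content server | branch cache | 32
```

In this sketch the second request moves only the 32-byte digest across the simulated WAN link; the file itself comes from the branch cache.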

When the data can be located in the branch office, the only information crossing the WAN link is the metadata. Because the metadata can be up to 2,000 times smaller than the associated data, the bandwidth requirements are reduced by a factor of up to 2,000. However, this reduction applies only to files larger than 64 KB that are requested more than once.

Now that we have a basic understanding of BranchCache operation, let’s go back and answer some of those fundamental caching questions. For the question “How do the content server, requesting computers, and caching computers identify cached content?”, BranchCache uses the metadata computed by the content server to identify cached content. This metadata contains a set of segment and block hashes computed by the content server using the Secure Hash Algorithm (SHA)-256.
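As a rough illustration of what block and segment hashes might look like, the sketch below splits a byte string into fixed-size blocks, hashes each block with SHA-256, and derives one hash per group of blocks. The block size, the number of blocks per segment, and the choice to hash the concatenated block hashes to get a segment hash are all assumptions made for the example; the column does not specify them.

```python
import hashlib

BLOCK_SIZE = 64 * 1024      # assumed block size, for illustration only
BLOCKS_PER_SEGMENT = 512    # assumed segment size, for illustration only


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return the SHA-256 digest of each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def segment_metadata(data, block_size=BLOCK_SIZE,
                     blocks_per_segment=BLOCKS_PER_SEGMENT):
    """Group block hashes into segments and compute one hash per segment."""
    hashes = block_hashes(data, block_size)
    segments = []
    for i in range(0, len(hashes), blocks_per_segment):
        group = hashes[i:i + blocks_per_segment]
        # Assumption: the segment hash covers the concatenated block hashes.
        segment_hash = hashlib.sha256(b"".join(group)).digest()
        segments.append({"segment_hash": segment_hash, "block_hashes": group})
    return segments


sample = bytes(300 * 1024)  # 300 KB of zero bytes
print(len(block_hashes(sample)))                            # prints: 5
print(len(segment_metadata(sample, blocks_per_segment=2)))  # prints: 3
```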

For the question “When is data cached at the branch office?”, BranchCache caches data in the branch office when a BranchCache-capable and enabled requesting computer obtains SMB 2.0- or HTTP 1.1-based data for a file larger than 64 KB from a BranchCache-capable and enabled content server.

For the question “How do the content server and the requesting computer determine that the content has changed?”, the content server sends the current hashes each time a requesting computer tries to obtain data, so hashes that no longer match what is cached in the branch office indicate changed content. For example, if a Microsoft Word document has changed, the content server computes new block and segment hashes and sends the new metadata for the entire file to the requesting computer.

For the question “How does the requesting computer discover the cached content at the branch office?”, the answer depends on the operating mode. In hosted cache mode, the requesting computer sends a request directly to the configured hosted cache server, identifying the content by its metadata. In distributed cache mode, the requesting computer uses the Web Services Discovery (WS-Discovery) multicast protocol to discover the client computer that has the cached data, and then sends a request for the data.

For our example of the Word document whose contents have changed, a requesting computer requests the entire file from the content server, which responds with the metadata containing block and segment hashes for the entire file. Because only specific block and segment hashes have changed, the requesting computer might be able to obtain the unchanged blocks from a caching computer in the branch office and the changed blocks from the content server over the WAN link. In many cases, however, a change in a file changes all of the blocks and the content server sends the entire file over the WAN link.
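The difference between an in-place change and a change that shifts the rest of the file can be seen in a small sketch. It assumes fixed-size blocks (4 bytes here, purely to keep the example readable) and a hypothetical `plan_fetch` helper that matches new block hashes against the hashes already cached in the branch office; the real matching logic is not described in this column.

```python
import hashlib


def block_hashes(data, block_size):
    """SHA-256 digest of each fixed-size block (block size is an assumption)."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def plan_fetch(cached_hashes, new_hashes):
    """Split block indexes into those available locally and those needing the WAN."""
    cached = set(cached_hashes)
    from_cache = [i for i, h in enumerate(new_hashes) if h in cached]
    from_wan = [i for i, h in enumerate(new_hashes) if h not in cached]
    return from_cache, from_wan


old = block_hashes(b"AAAABBBBCCCCDDDD", 4)

# One block overwritten in place: only that block has a new hash.
edited = block_hashes(b"AAAAXXXXCCCCDDDD", 4)
print(plan_fetch(old, edited))    # prints: ([0, 2, 3], [1])

# One byte inserted at the front: every block boundary shifts.
shifted = block_hashes(b"!AAAABBBBCCCCDDDD", 4)
print(plan_fetch(old, shifted))   # prints: ([], [0, 1, 2, 3, 4])
```

The second case shows why a small edit can still force the whole file across the WAN link: with fixed block boundaries, an insertion changes the contents of every block that follows it.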

Security for BranchCache Operations

The last question, “How does BranchCache prevent eavesdroppers from obtaining unauthorized data?”, is more difficult to answer because a caching implementation must not allow malicious users to circumvent the security that exists when caching is not present. These security elements include authentication of the requesting user or computer, authorization that the computer or user can access the requested data and the confidentiality of the data on the network.

When a client requests data from the content server, the content server does not provide the metadata or the data until after the user or computer’s credentials are authenticated and access to the data is authorized. The metadata consists of the block and segment hashes and a private segment key (SK) for each segment. The content server computes the SK from the segment hash and a secret key known only to the content server. The metadata is sent in the same way as the data. For example, if the contents of a file are sent encrypted to the requesting computer, the metadata is also sent encrypted.

From the metadata, the requesting computer computes a segment discovery key (SD), which requesting computers and caching computers use as an identifier of the segment. In hosted cache mode, the requesting computer sends the SD in a request to the hosted cache server. In distributed cache mode, the requesting computer sends the SD in the local subnet multicast message. When the caching computer receives the request with the SD, it determines the corresponding segment hash and responds to the requesting computer with the availability of the requested blocks within that segment.
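The relationship between the segment hash, the SK, and the SD can be sketched with keyed hashes. The column says only that the SK is computed from the segment hash and a server secret, and that the SD is computed from the metadata; the use of HMAC-SHA-256, the `b"discovery"` label, and the function names below are assumptions chosen for illustration, not the actual BranchCache derivation.

```python
import hashlib
import hmac


def segment_key(server_secret, segment_hash):
    """SK: derived from the segment hash and a secret only the server knows.

    Assumption: HMAC-SHA-256 keyed with the server secret.
    """
    return hmac.new(server_secret, segment_hash, hashlib.sha256).digest()


def discovery_key(segment_hash, sk):
    """SD: an identifier any holder of the metadata can compute.

    Assumption: HMAC-SHA-256 keyed with the SK over the segment hash
    plus a fixed, hypothetical label.
    """
    return hmac.new(sk, segment_hash + b"discovery", hashlib.sha256).digest()


# Content server side: compute the SK that is shipped in the metadata.
server_secret = b"known only to the content server"
segment_hash = hashlib.sha256(b"example segment").digest()
sk = segment_key(server_secret, segment_hash)

# Caching computer side: index cached segments by SD.
cache_index = {discovery_key(segment_hash, sk): segment_hash}

# Requesting computer side: compute the same SD from the metadata and look it up.
sd = discovery_key(segment_hash, sk)
print(cache_index.get(sd) == segment_hash)   # prints: True
```

Because the SD depends on the SK, a computer that never received the metadata cannot produce a matching identifier in this sketch.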

The requesting computer then requests the blocks needed for that segment from the caching computer. To prevent an eavesdropping computer from capturing the data sent on the branch office network, both the requesting computer and the caching computer compute the same encryption key based on the SK. The caching computer sends the encrypted blocks to the requesting computer, which decrypts them. By default, BranchCache uses the Advanced Encryption Standard (AES) with a 128-bit key.

After decryption of each block, the requesting computer computes its own block hash and verifies that it matches the block hash in the metadata. If they match, the data was successfully received from the caching computer and the block of data is returned to the application.
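The verification step can be sketched as follows. The example covers only the hash check on blocks that have already been decrypted; the AES decryption itself is left out because the Python standard library has no AES implementation. The `verify_block` and `accept_blocks` names are hypothetical.

```python
import hashlib
import hmac


def verify_block(block, expected_hash):
    """Return True if the block's SHA-256 digest matches the metadata hash."""
    actual = hashlib.sha256(block).digest()
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(actual, expected_hash)


def accept_blocks(blocks, expected_hashes):
    """Return the blocks only if every one matches its hash from the metadata."""
    accepted = []
    for index, (block, expected) in enumerate(zip(blocks, expected_hashes)):
        if not verify_block(block, expected):
            raise ValueError(f"block {index} failed hash verification")
        accepted.append(block)
    return accepted


blocks = [b"first block", b"second block"]
metadata_hashes = [hashlib.sha256(b).digest() for b in blocks]

print(accept_blocks(blocks, metadata_hashes) == blocks)   # prints: True

tampered = [b"first block", b"second blocK"]
try:
    accept_blocks(tampered, metadata_hashes)
except ValueError as error:
    print(error)   # prints: block 1 failed hash verification
```

Because the expected hashes come from the content server's metadata rather than from the caching computer, a block that was altered in the branch office is rejected before it reaches the application.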

Sidebar: Maintaining the Security Status Quo

BranchCache uses encryption between the requesting computer and the caching computer to maintain the security status quo on your network. If there is no encryption of data sent from the content server to the requesting computer, BranchCache offers no additional protection when the cached data is sent on the branch office network. However, if the data sent from the content server to the requesting computer is encrypted, BranchCache maintains confidentiality when the cached data is sent on the branch office network.

Requests for data that is sent as clear text can be captured by malicious users in the branch office. Because BranchCache sends the metadata and the data using the same method as non-cached data, a malicious user in the branch office can capture the metadata for a segment if it is sent unencrypted, compute the SD and encryption key from the metadata, and then use that key to decrypt the blocks of data when they are sent from a caching computer to a requesting computer. However, this is not an additional security issue because the original data was sent as clear text and, in the absence of BranchCache, would be sent by the server in the central office each time it is requested by a client in the branch office. When clear text is used, BranchCache does not offer any additional security services for the cached data sent on the branch office network.

When the content server sends data to the requesting computers using an encrypted channel—for example, using Internet Protocol security (IPsec) or HTTPS—the additional encryption performed between the caching computer and the requesting computer ensures that an eavesdropper in the branch office can’t interpret the cached data. Because the content server sends metadata using the same encrypted channel as the data from the content server, an eavesdropper in the branch office must first decrypt the metadata for a segment before it can compute the SD and encryption key and decrypt the data blocks. If eavesdroppers can’t decrypt the metadata, they can’t decrypt the original data.

Therefore, for encrypted data channels between the content server and the requesting computer, using the encryption key derived from the SK protects the cached data in two ways. First, only authenticated and authorized users or computers will receive the metadata, which includes the SK. Second, because the SK was sent using an encrypted channel, a malicious user can’t determine the SK or the corresponding encryption key and decrypt the cached data when it is sent on the branch office network.

Joseph Davies is a Principal Technical Writer on the Windows networking writing team at Microsoft. He is author or coauthor of a number of books published by Microsoft Press, including Windows Server 2008 Networking and Network Access Protection (NAP), Understanding IPv6, Second Edition, and Windows Server 2008 TCP/IP Protocols and Services.