I recently participated in a discussion regarding what happens when a client requests a page from a proxy server. I just wanted to make sure that my understanding of this sequence of events was correct in the general case:

User requests site

The client sends a DNS request to its configured DNS server to resolve the destination IP address (this is done first in order to accommodate HTTP requests that are configured to bypass the proxy)

Once the destination IP is received from DNS, and just before the HTTP request is sent, the request is checked against the exception list

If the destination server is not on the exception list, the request is forwarded to the proxy server.

If the destination server is on the exception list, the request is forwarded according to the client machine's routing table.

4 Answers

Not exactly: it depends on how the client is configured. Let's use IE as the basic example.

If you configure IE with an explicit proxy:

User types an address

The address is checked for string matches against the IE proxy exceptions list

a. If it matches a bypass entry, the client resolves the name through DNS, connects directly to the target IP address on port 80 (assumed), then sends a request like:

GET /something.htm HTTP/1.1
Host: fulldomainname.example.com

b. If nothing matches, continue

The client connects to its configured proxy and sends a request of the form:

GET http://fulldomainname.example.com/something.htm HTTP/1.1

(this use of the FQDN in the URL is one way you can tell that a client thinks it's talking to a proxy instead of a real web server)

The proxy resolves the name, connects to the target site, and so on.
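To make the difference between the two request forms concrete, here is a minimal sketch in JavaScript, covering plain HTTP only. Both helper names (buildRequestHead and isProxyStyleRequest) are made up for illustration; they are not part of IE or of any proxy product.

```javascript
// Hypothetical helper: builds the head of the HTTP request a client would
// send for a URL, depending on whether it believes it is talking to a proxy.
function buildRequestHead(url, viaProxy) {
  var u = new URL(url);
  // Direct connection: only the path goes in the request line.
  // Proxy connection: the full URL goes in the request line.
  var target = viaProxy ? u.href : u.pathname + u.search;
  return "GET " + target + " HTTP/1.1\r\n" +
         "Host: " + u.host + "\r\n\r\n";
}

// Hypothetical helper: guesses from a request line whether the client
// thought it was talking to a proxy (full URL) or a web server (path only).
function isProxyStyleRequest(requestLine) {
  var target = requestLine.split(" ")[1] || "";
  return /^https?:\/\//i.test(target);
}
```

Calling buildRequestHead("http://fulldomainname.example.com/something.htm", true) gives the proxy form shown above, and passing false gives the direct form. The second helper is the same check you would do by eye in a sniffer trace.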

In the case of using a WPAD or Autoconfiguration script (such as provided by ISA/TMG when autoconfiguration is enabled), it's different:

When using WPAD/PAC:

User types an address

Client downloads the current copy of the wpad.dat/autoproxy.js file from its configured location

Client looks for the entry point "FindProxyForURL" in the JavaScript file, and executes it

The Autoproxy file processes the hostname and URL. This is a JavaScript file with a limited set of functions available:

a. this may include name resolution (isInNet, dnsResolve)

b. this may include string matching (shExpMatch)

c. this may include counting to a million (i++)

d. this may include narky alert popup messages if the admin's a jerk (or just funny (or debugging))

The FindProxyForURL function returns a single string, holding either one entry or an ordered, semicolon-separated list of the best proxies to use:

a. either "DIRECT", in which case the client then needs to resolve the name itself, as per the bypass case above

b. or "PROXY proxyname:8080" or similar, in which case the client connects to that port on the proxy, tells it to GET the full URL, and the proxy performs name resolution.
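As a rough illustration of the PAC steps above, here is a small sketch of what such a file might look like. The host names, the proxy address and the parsePacResult helper are all hypothetical, and the two stand-in functions at the top exist only so the sketch can run outside a browser (in Node.js, say); a real PAC host supplies its own isPlainHostName and shExpMatch.

```javascript
// Minimal stand-ins for two PAC built-ins, only so this sketch runs outside
// a browser. A deployed PAC file would not define these itself.
function isPlainHostName(host) {
  return host.indexOf(".") === -1;
}
function shExpMatch(str, pattern) {
  // Translate the shell glob (* and ?) into an anchored regular expression.
  var re = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&")
                  .replace(/\*/g, ".*")
                  .replace(/\?/g, ".");
  return new RegExp("^" + re + "$").test(str);
}

// Entry point the client calls for each URL. The domain and proxy name
// below are made up for illustration.
function FindProxyForURL(url, host) {
  // Single-label intranet names bypass the proxy; the client resolves them.
  if (isPlainHostName(host)) return "DIRECT";
  // String match against a bypassed domain.
  if (shExpMatch(host, "*.internal.example.com")) return "DIRECT";
  // Everything else: try the proxy first, then fall back to going direct.
  return "PROXY proxy.example.com:8080; DIRECT";
}

// Hypothetical helper: reads the returned string as an ordered list of
// choices, roughly the way a client would.
function parsePacResult(result) {
  return result.split(";")
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
}
```

With this sketch, a URL on fulldomainname.example.com comes back as "PROXY proxy.example.com:8080; DIRECT", so the client hands the full URL to the proxy and the proxy does the name resolution. A host like wiki.internal.example.com comes back as "DIRECT", and the client has to resolve it itself.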

There are occasionally glitches, subtleties and unexplained behaviours, but for the most part when things aren't broken in weird and interesting ways, the above is how I've seen it work. If you're interested in the Winsock Proxy Client, that's a different story.

One final note: Once the client has decided to talk to a proxy, there's no way for the proxy to tell it "I don't serve that, you should just go directly to it instead." Once the client decides a particular URL is proxy-served, proxy-death-grip ensues.

...and oops. I just did a test that proved myself wrong, at least in IE. So, I guess my next question would be, how is DNS resolved then for addresses that are in the proxy exception list? Maybe it's time to get out the sniffer.
– orange_aurelius, Aug 12 '10 at 15:48