Apache module mod_proxy

Summary

This module implements a proxy/cache for Apache. It implements
proxying capability for FTP, CONNECT
(for SSL), HTTP/0.9, HTTP/1.0, and
(as of Apache 1.3.23) HTTP/1.1.
The module can be configured to connect to other proxy modules
for these and other protocols.

This module was experimental in Apache 1.1.x. As of Apache
1.2, mod_proxy stability is greatly improved.

Warning: Do not enable proxying with ProxyRequests until you have secured your server. Open proxy servers are
dangerous both to your network and to the Internet at large.

An ordinary forward proxy is an intermediate
server that sits between the client and the origin
server. In order to get content from the origin server,
the client sends a request to the proxy naming the origin server
as the target, and the proxy then requests the content from the
origin server and returns it to the client. The client must be
specially configured to use the forward proxy to access other
sites.

A typical usage of a forward proxy is to provide Internet
access to internal clients that are otherwise restricted by a
firewall. The forward proxy can also use caching to reduce
network usage.

The forward proxy is activated using the ProxyRequests directive.
Because forward proxies allow clients to access arbitrary sites
through your server and to hide their true origin, it is
essential that you secure your server so
that only authorized clients can access the proxy before
activating a forward proxy.

A reverse proxy, by contrast, appears to the
client just like an ordinary web server. No special
configuration on the client is necessary. The client makes
ordinary requests for content in the name-space of the reverse
proxy. The reverse proxy then decides where to send those
requests, and returns the content as if it were itself the
origin.

A typical usage of a reverse proxy is to provide Internet
users access to a server that is behind a firewall. Reverse
proxies can also be used to balance load among several back-end
servers, or to provide caching for a slower back-end server.
In addition, reverse proxies can be used simply to bring
several servers into the same URL space.

A reverse proxy is activated using the ProxyPass directive or the
[P] flag to the RewriteRule
directive. It is not necessary to turn
ProxyRequests on in
order to configure a reverse proxy.
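
As a minimal sketch of the [P] approach, the fragment below proxies one URL prefix through mod_rewrite while leaving ProxyRequests off. The /app/ prefix and the host backend.example.com are placeholders chosen for illustration, and mod_rewrite is assumed to be compiled in or loaded.

```apache
# Forward proxying stays disabled; only the mapping below is proxied.
ProxyRequests Off

RewriteEngine On
# Hypothetical mapping: /app/... is fetched from the back-end server.
RewriteRule ^/app/(.*)$ http://backend.example.com/$1 [P]
```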

Strictly limiting access is essential if you are using a
forward proxy (using the ProxyRequests directive).
Otherwise, your server can be used by any client to access
arbitrary hosts while hiding its true identity. This is
dangerous both for your network and for the Internet at large.
When using a reverse proxy (using the ProxyPass directive with
ProxyRequests Off), access control is less critical
because clients can only contact the hosts that you have
specifically configured.
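
One way to restrict a forward proxy is a <Directory> block that matches proxied requests. The sketch below assumes that mod_access is available and that your authorized clients live in a domain called internal.example.com (a placeholder).

```apache
ProxyRequests On

# Proxied requests are matched by the pseudo-path "proxy:".
<Directory proxy:*>
    Order Deny,Allow
    Deny from all
    Allow from internal.example.com
</Directory>
```

Adjust the Allow line to your own network before turning ProxyRequests on.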

In the rare situation where you must download a specific file
using the FTP ASCII transfer method (while the
default transfer is in binary mode), you can
override mod_proxy's default by suffixing the request with
;type=a to force an ASCII transfer.
(FTP Directory listings are always executed in ASCII mode, however.)

An FTP URI is interpreted relative to the home directory of
the user who is logging in. Alas, to reach higher directory
levels you cannot use /../, as the dots are interpreted by the
browser and not actually sent to the FTP server. To address
this problem, the so-called "Squid %2f hack" was implemented in
the Apache FTP proxy; it is a solution also used by
other popular proxy servers such as the
Squid Proxy Cache.
By prepending /%2f to the path of your request, you can make
such a proxy change the FTP starting directory to / (instead
of the home directory). Example: To retrieve the file /etc/motd,
you would use the URL

ftp://user@host/%2f/etc/motd

To log in to an FTP server by username and password, Apache
uses different strategies.
In the absence of a user name and password in the URL altogether,
Apache sends an anonymous login to the FTP server, i.e.,

user: anonymous
password: apache_proxy@

This works for all popular FTP servers which are configured for
anonymous access.
For a personal login with a specific username, you can embed
the user name into the URL, like in:
ftp://username@host/myfile. If the FTP server
asks for a password when given this username (which it should),
then Apache will reply with a [401 Authorization required] response,
which causes the browser to pop up the username/password dialog.
Upon entering the password, the connection attempt is retried,
and if successful, the requested resource is presented.
The advantage of this procedure is that your browser does not
display the password in cleartext (which it would if you had used
ftp://username:password@host/myfile in
the first place).
Note that a password transmitted in this way
is not encrypted. It travels between your browser and
the Apache proxy server as a base64-encoded cleartext string, and
between the Apache proxy and the FTP server as plaintext. You should
therefore think twice before accessing your FTP server via HTTP
(or before accessing your personal files via FTP at all!). When
using insecure channels, an eavesdropper might intercept your
password on its way.

If you're using the ProxyBlock or
NoCache directives, the IP addresses of the listed hostnames are
looked up and cached during startup for later match tests. This
may take a few seconds (or more) depending on the speed with
which the hostname lookups occur.

Apache can be built with SOCKS support. Build Apache with the
rule SOCKS4=yes in your Configuration file, and follow the
instructions there. There is no SOCKS5 rule yet, but SOCKS5
capability can be added in a similar way: either use the
EXTRA_LDFLAGS definition, or build Apache normally
and run it with the runsocks wrapper provided with
SOCKS5, if your OS supports dynamically linked libraries.

Some users have reported problems when using SOCKS version
4.2 on Solaris. The problem was solved by upgrading to SOCKS
4.3.

Remember that you'll also have to grant access to your
Apache proxy machine by permitting connections on the
appropriate ports in your SOCKS daemon's configuration.

An Apache proxy server situated in an intranet needs to
forward external requests through the company's firewall
(for this, configure the ProxyRemote
directive to forward the respective scheme to
the firewall proxy).
However, when it has to access resources within the intranet,
it can bypass the firewall when accessing hosts. The NoProxy directive is useful for specifying
which hosts belong to the intranet and should be accessed
directly.
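
A minimal sketch of such an intranet setup follows. The firewall proxy address firewall.example.com:81 and the intranet domain .example.com are placeholders, not values taken from any real installation.

```apache
# Send every outgoing request through the firewall proxy...
ProxyRemote * http://firewall.example.com:81
# ...except requests for hosts inside the intranet domain.
NoProxy .example.com
```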

Users within an intranet tend to omit the local domain name
from their WWW requests, thus requesting "http://somehost/"
instead of "http://somehost.my.dom.ain/". Some commercial proxy
servers let them get away with this and simply serve the
request, implying a configured local domain. When the ProxyDomain directive is used and the
server is configured for proxy
service, Apache can return a redirect response and send the
client to the correct, fully qualified, server address. This is
the preferred method since the user's bookmark files will then
contain fully qualified hosts.
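
For illustration, assuming the local domain is example.com (a placeholder), the following setting would cause a request for http://somehost/ to be answered with a redirect to http://somehost.example.com/:

```apache
# Default domain appended to unqualified host names (note the leading period).
ProxyDomain .example.com
```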

The ProxyRemote directive defines remote proxies to this proxy. match is
either the name of a URL-scheme that the remote server
supports, or a partial URL for which the remote server should
be used, or '*' to indicate the server should be contacted for
all requests. remote-server is a partial URL for the
remote server. Syntax:

remote-server = protocol://hostname[:port]

protocol is the protocol that should be used to
communicate with the remote server; only "http" is supported by
this module.
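
The three forms of match could be written as in the sketch below. All host names and port numbers are hypothetical and only illustrate the syntax.

```apache
# Scheme match: FTP requests are handed to an HTTP-speaking remote proxy.
ProxyRemote ftp http://ftpproxy.example.com:8080
# Partial-URL match: only requests below this URL use the remote proxy.
ProxyRemote http://partner.example.org/ http://mirror.example.com:8000
# Wildcard match: all requests use the remote proxy.
ProxyRemote * http://upstream.example.com
```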

The ProxyPass directive allows remote servers to be mapped into the
space of the local server; the local server does not act as a
proxy in the conventional sense, but appears to be a mirror of
the remote server. path is the name of a local virtual
path; url is a partial URL for the remote server.

Suppose the local server has address
http://wibble.org/; then

ProxyPass /mirror/foo/ http://foo.com/

will cause a local request for the
<http://wibble.org/mirror/foo/bar> to be
internally converted into a proxy request to
<http://foo.com/bar>.

Warning: The ProxyRequests directive should
usually be set off when using ProxyPass.

The ProxyPassReverse directive lets Apache adjust the URL in the
Location header on HTTP redirect responses. This is essential,
for instance, when Apache is used as a reverse proxy: it
prevents HTTP redirects issued by the backend servers (which
stay behind the reverse proxy) from leading clients around the
reverse proxy.

path is the name of a local virtual path. url is a partial URL
for the remote server. Both are used in the same way as for the
ProxyPass directive.

Example:
Suppose the local server has address
http://wibble.org/; then

ProxyPass /mirror/foo/ http://foo.com/
ProxyPassReverse /mirror/foo/ http://foo.com/

will not only cause a local request for the
<http://wibble.org/mirror/foo/bar> to be
internally converted into a proxy request to
<http://foo.com/bar> (the functionality
ProxyPass provides here). It also takes care of
redirects the server foo.com sends: when foo.com redirects
http://foo.com/bar to
http://foo.com/quux, Apache adjusts this to
http://wibble.org/mirror/foo/quux before
forwarding the HTTP redirect response to the client.

Note that this ProxyPassReverse directive can
also be used in conjunction with the proxy pass-through feature
("RewriteRule ... [P]") from mod_rewrite
because it doesn't depend on a corresponding
ProxyPass directive.

The AllowCONNECT directive specifies a list of
port numbers to which the proxy CONNECT method may
connect. Today's browsers use this method when an https
connection is requested and proxy tunneling over http
is in effect.
By default, only the default https port (443) and the default
snews port (563) are enabled. Use the AllowCONNECT
directive to override this default and allow connections to
the listed ports only.
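
Because the directive replaces the default list rather than extending it, the defaults have to be repeated if they should remain allowed. In the sketch below, 8443 is an arbitrary additional port chosen for illustration.

```apache
# Keep the default https (443) and snews (563) ports, and add one more.
AllowCONNECT 443 563 8443
```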

The ProxyBlock directive specifies a list of words, hosts
and/or domains, separated by spaces. HTTP, HTTPS, and FTP
document requests to sites whose names contain matched words,
hosts or domains are blocked by the proxy server. The
proxy module will also attempt to determine IP addresses of
list items which may be hostnames during startup, and cache
them for match test as well. Example:

ProxyBlock joes-garage.com some-host.co.uk rocky.wotsamattau.edu

'rocky.wotsamattau.edu' would also be matched if referenced by
IP address.

Note that 'wotsamattau' would also be sufficient to match
'wotsamattau.edu'.

The ProxyReceiveBufferSize directive specifies an explicit
network buffer size for outgoing HTTP and FTP connections, for
increased throughput. It has to be greater than 512 or set to 0
to indicate that the system's default buffer size should be
used.

The ProxyIOBufferSize directive specifies the number of bytes
that will be read from a remote HTTP or FTP server at one time.
This directive is different from the ProxyReceiveBufferSize
directive, which specifies the low level socket buffer size.

When a response is received which fits entirely within the IO
buffer size, the remote HTTP or FTP server socket will be closed
before an attempt is made to write the response to the client.
This ensures that the remote server does not remain connected
unnecessarily while the response is delivered to a slow client.
A high value for the IO buffer decreases the load on remote HTTP
and FTP servers, at the expense of greater RAM footprint on the
proxy.
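
The two buffer settings might be combined as in the following sketch. The values are illustrative assumptions, not tuning recommendations.

```apache
# Socket-level buffer for outgoing connections: 0 (system default)
# or a value greater than 512.
ProxyReceiveBufferSize 2048
# Number of bytes read from the remote HTTP or FTP server at one time.
ProxyIOBufferSize 16384
```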

This directive is only useful for Apache proxy servers
within intranets. The NoProxy directive specifies a list of
subnets, IP addresses, hosts and/or domains, separated by
spaces. A request to a host which matches one or more of these
is always served directly, without forwarding to the configured
ProxyRemote proxy server(s).

A Domain is a partially qualified DNS domain
name, preceded by a period. It represents a list of hosts
which logically belong to the same DNS domain or zone
(i.e., the hostnames all end in Domain).
Examples: .com .apache.org.
To distinguish Domains from Hostnames (both syntactically
and semantically; a DNS domain can have a DNS A record,
too!), Domains are always written with a leading
period.
Note: Domain name comparisons are done without regard to the
case, and Domains are always assumed to be anchored
in the root of the DNS tree, therefore two domains
.MyDomain.com and .mydomain.com.
(note the trailing period) are considered equal. Since a
domain comparison does not involve a DNS lookup, it is much
more efficient than subnet comparison.

A SubNet is a partially qualified internet address
in numeric (dotted quad) form, optionally followed by a
slash and the netmask, specified as the number of
significant bits in the SubNet. It is used to
represent a subnet of hosts which can be reached over a
common network interface. In the absence of the explicit
net mask it is assumed that omitted (or zero valued)
trailing digits specify the mask. (In this case, the
netmask can only be multiples of 8 bits wide.)
Examples:

192.168 or 192.168.0.0

the subnet 192.168.0.0 with an implied netmask of 16
valid bits (sometimes used in the netmask form
255.255.0.0)

192.168.112.0/21

the subnet 192.168.112.0/21 with a
netmask of 21 valid bits (also used in the form
255.255.248.0)

As a degenerate case, a SubNet with 32 valid bits
is the equivalent to an IPAddr, while a
SubNet with zero valid bits (e.g.,
0.0.0.0/0) is the same as the constant _Default_,
matching any IP address.

An IPAddr represents a fully qualified internet
address in numeric (dotted quad) form. Usually, this
address represents a host, but there need not necessarily
be a DNS domain name connected with the address.
Example: 192.168.123.7
Note: An IPAddr does not need to be resolved by
the DNS system, so it can result in better Apache
performance.

A Hostname is a fully qualified DNS domain name
which can be resolved to one or more IPAddrs via the DNS domain name
service. It represents a logical host (in contrast to Domains, see above) and must be
resolvable to at least one IPAddr (or often to a list of
hosts with different IPAddrs).
Examples: prep.ai.mit.edu www.apache.org
Note: In many situations, it is more efficient to specify
an IPAddr in place of a
Hostname since a DNS lookup can be avoided. Name
resolution in Apache can take a considerable amount of time
when the connection to the name server uses a slow PPP
link.
Note: Hostname comparisons are done without
regard to the case, and Hostnames are always
assumed to be anchored in the root of the DNS tree,
therefore two hosts WWW.MyDomain.com and
www.mydomain.com. (note the trailing period)
are considered equal.
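
The four address forms described above can be mixed in a single NoProxy list. In this sketch the domain and host names are placeholders; the numeric addresses are the examples used earlier in this section.

```apache
# Domain, SubNet, IPAddr, and Hostname, in that order.
NoProxy .example.com 192.168.112.0/21 192.168.123.7 www.example.com
```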

This directive is only useful for Apache proxy servers
within intranets. The ProxyDomain directive specifies the
default domain which the Apache proxy server will belong to. If
a request to a host without a domain name is encountered, a
redirection response to the same host with the configured
Domain appended will be generated.

This directive controls the use of the Via:
HTTP header by the proxy. Its intended use is to control the
flow of proxy requests along a chain of proxy servers. See
RFC2068 (HTTP/1.1) for an explanation of Via:
header lines.

If set to off, which is the default, no special
processing is performed. If a request or reply contains a
Via: header, it is passed through
unchanged.

If set to on, each request and reply will get a
Via: header line added for the current
host.

If set to full, each generated Via:
header line will additionally have the Apache server version
shown as a Via: comment field.

If set to block, every proxy request will have
all its Via: header lines removed. No new
Via: header will be generated.
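
As a short sketch: the directive name does not appear in the text above and is assumed here to be ProxyVia, as in the Apache 1.3 distribution.

```apache
# Add a Via: header line for this host to each request and reply.
# Other accepted values are Off (the default), Full, and Block.
ProxyVia On
```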

If an HTTP transfer that is being cached is cancelled, the
proxy module will complete the transfer to cache if more than
the percentage specified has already been transferred.

This is a percentage, and must be a number between 1 and
100, or 0 to use the default. 100 will cause a document to be
cached only if the transfer was allowed to complete. A number
between 60 and 90 is recommended.

Sets the name of the directory to contain cache files; this
must be writable by the httpd server (see the User directive).
Setting CacheRoot enables proxy caching; without
defining a CacheRoot, proxy functionality will be
available if ProxyRequests is set to
On, but no caching will be available.

Sets the desired space usage of the cache, in KB (1024-byte
units). Although usage may grow above this setting, the garbage
collection will delete files until the usage is at or below
this setting.
Depending on the expected proxy traffic volume and
CacheGcInterval, use a value which is at least 20
to 40% lower than the available space.

Check the cache after the specified number of
hours, and delete files if the space usage is greater
than that set by CacheSize. Note that hours accepts a
float value; for example, you could use CacheGcInterval
1.5 to check the cache every 90 minutes. (If unset, no
garbage collection will be performed, and the cache will grow
indefinitely.) Note also that the larger the
CacheGcInterval, the more extra space beyond the
configured CacheSize will be needed for the cache
between garbage collections.

Specifies the maximum number of hours for which
cacheable HTTP documents will be retained without checking the
origin server. Thus, documents will be at most this
number of hours out of date. This restriction is enforced even if
an expiry date was supplied with the document.
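
The cache settings described above might be combined as in the following sketch. The path and all values are illustrative assumptions. The names CacheMaxExpire and CacheForceCompletion do not appear in the text above and are assumed from the Apache 1.3 directive set.

```apache
# Enable caching; the directory must be writable by the server's User.
CacheRoot /usr/local/apache/proxy
# Desired cache size in KB (51200 KB = 50 MB).
CacheSize 51200
# Run garbage collection every 90 minutes.
CacheGcInterval 1.5
# Check the origin server again after at most 24 hours (name assumed).
CacheMaxExpire 24
# Complete a cancelled transfer once 80 percent has arrived (name assumed).
CacheForceCompletion 80
```

Leave enough free space on the cache volume, since usage can grow beyond CacheSize between garbage collections.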

The NoCache directive specifies a list of words, hosts
and/or domains, separated by spaces. HTTP and non-passworded
FTP documents from matched words, hosts or domains are
not cached by the proxy server. The proxy module will
also attempt to determine IP addresses of list items which may
be hostnames during startup, and cache them for match test as
well. Example:

NoCache joes-garage.com some-host.co.uk bullwinkle.wotsamattau.edu

'bullwinkle.wotsamattau.edu' would also be matched if
referenced by IP address.

Note that 'wotsamattau' would also be sufficient to match
'wotsamattau.edu'.