Six Things First-Time Squid Administrators Should Know

New users often struggle with the same frustrating set of Squid idiosyncrasies. In this article, I'll detail six things you should know about using Squid from the get-go. Even if you're an experienced Squid administrator, you might want to look at these tips and give your configuration file a sanity check, especially the one about preventing spam.

1. File Descriptor Limits

File descriptor limits are a common problem for new Squid users.
This happens because some operating systems have relatively low
per-process and system-wide limits. In some cases, you must take
steps to tune your system before compiling Squid.

A file descriptor is simply a number that represents an open file
or socket. Every time a process opens a new file or socket, it
allocates a new file descriptor. These descriptors are reused
after the file or socket is closed. Most Unix systems place a limit
on the number of simultaneously open file descriptors. There are
both per-process and per-system limits.

How many file descriptors does Squid need? The answer depends
on how many users you have, the size of your cache, and which
particular features you have enabled. Here are some of the things
that consume file descriptors in Squid:

Client-side HTTP connections

Server-side connections to origin servers

Disk files for reading and writing cached objects

Communication with external helper processes, such as redirectors and authenticators

Idle (persistent) HTTP connections

Even when Squid is not doing anything, it has some number of file
descriptors open for log files and helpers. In most cases, this
is between 10 and 25, so it's probably not a big deal. If
you have a lot of external helpers, that number goes up. However,
the file descriptor count really goes up once Squid starts serving
requests. In the worst case, each concurrent request requires
three file descriptors: the client-side connection, a server-side
connection for cache misses, and a disk file for reading hits or
writing misses.

A Squid cache with just a few users might be able to get by with
a file descriptor limit of 256. For a moderately busy Squid,
1024 is a better limit. Very busy caches should use 4096 or more.
One thing to keep in mind is that file descriptor usage often
surges above the normal level for brief periods. This can happen
during short, temporary network outages or other interruptions
in service.
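On Linux, you can watch a process's descriptor usage directly through the /proc filesystem. This is a sketch, and it assumes a Linux system with pgrep available; here it counts the open descriptors of the current shell, and substituting Squid's PID shows the same figure for a running Squid:

```shell
# Count the open file descriptors of a process via /proc (Linux-specific).
# We inspect the current shell ($$) here; on a machine where Squid is
# running, you could instead use: pid=$(pgrep -o squid)
pid=$$
ls /proc/"$pid"/fd | wc -l
```

Repeating this during busy periods gives a feel for how close you are to the limits discussed above.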

There are a number of ways to determine the file descriptor limit
on your system. One is to use the built-in shell commands limit
or ulimit.

For Bourne shell users:

root# ulimit -n
1024

For C shell users:

root# limit desc
descriptors 1024

If you already have Squid compiled and installed, you can
just look at the cache.log file for a line like this:

2003/12/12 11:10:54| With 1024 file descriptors available

If Squid detects a file descriptor shortage while it is running,
you'll see a warning like this in cache.log:

WARNING! Your cache is running out of file descriptors

If you see the warning, or know in advance that you'll need more
file descriptors, you should increase the limits.
The technique for increasing the file descriptor limit varies
between operating systems.

For Linux Users

Linux users need to edit one of the system include files and
twiddle one of the system parameters via the /proc interface.
First, edit /usr/include/bits/types.h and change the value for
__FD_SETSIZE. Then, give the kernel a new limit with this command:

root# echo 1024 > /proc/sys/fs/file-max

Finally, before compiling or running Squid, execute this shell command
to set the process limit equal to the kernel limit:

root# ulimit -Hn 1024

After you have set the limit in this manner, you'll need to
reconfigure, recompile, and reinstall Squid. Also note that these
two commands do not permanently set the limit. They must be
executed each time your system boots. You'll want to add them to
your system startup scripts.

For NetBSD/OpenBSD/FreeBSD Users

On BSD-based systems, you'll need to compile a new kernel. The kernel
configuration file lives in a directory such as /usr/src/sys/i386/conf or
/usr/src/sys/arch/i386/conf. There you'll find a file, possibly named
GENERIC, to which you should add a line like this:

options MAXFILES=8192

For OpenBSD, use option instead of options. Reboot your
system after you've finished configuring, compiling, and installing
your new kernel. Then, reconfigure, recompile, and reinstall
Squid.

For Solaris Users

Add this line to your /etc/system file:

set rlim_fd_max = 1024

Then, reboot the system, reconfigure, recompile, and reinstall Squid.

For further information on file descriptor limits, see Chapter 3, "Compiling and Installing", of Squid: The Definitive Guide or section 11.4 of
the Squid FAQ.

2. File and Directory Permissions

Directory permissions are another problem that first-time users
often encounter. One of the reasons for this difficulty is that,
in the interest of security, Squid refuses to run as root.
Furthermore, if you do start Squid as root, it switches to a
default user ("nobody") that has no special privileges. If you
don't want to use the "nobody" userid, you can set your own with
the cache_effective_user directive in the configuration file.

Certain files and directories must be writable by the Squid userid.
These include the log files, usually found in /usr/local/squid/var/logs,
and the cache directories, /usr/local/squid/var/cache by default.

As an example, let's assume that you're using the "nobody" userid
for Squid. After running make install, you can use this command
to set the permissions for the log files and cache directories:

root# chown -R nobody /usr/local/squid/var/logs /usr/local/squid/var/cache

Then, you can proceed to initialize the cache directories with
this command:

root# /usr/local/squid/sbin/squid -z

Helper processes are another source of potential permission
problems. Squid spawns the helper processes as the unprivileged
user (that is, as "nobody"). This usually means that the helper program
must have read and execute permissions for everyone (for example,
-rwxr-xr-x). Furthermore, any configuration or password files
that the helper needs must have appropriate read permissions as
well.

Note that Unix also requires correct permissions on the parent
directories leading to a file. For example, if /usr/local/squid
is owned by root with drwxr-x--- permissions, the user nobody
will not be able to access any of the directories underneath it.
/usr/local/squid should be drwxr-xr-x instead.
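The search (execute) bit on each parent directory is what matters here. A quick demonstration with a scratch directory (run this as a non-root user; root bypasses permission checks):

```shell
# Demonstrate that a missing search (x) bit on a parent directory
# blocks access to everything beneath it.
dir=$(mktemp -d)
mkdir "$dir/sub"
echo hello > "$dir/sub/file"
cat "$dir/sub/file"          # prints "hello"
chmod u-x "$dir"             # drop the search bit on the parent
cat "$dir/sub/file" || true  # "Permission denied" (unless you are root)
chmod u+x "$dir"             # restore the bit so cleanup works
rm -r "$dir"
```

The same principle applies to every directory component between / and Squid's logs, cache, and helper programs.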

You may want to debug file or directory permission problems from a
shell window. If Squid runs as nobody, then start a shell process
as user nobody:

root# su - nobody

(You may have to temporarily change "nobody"'s home directory and
shell program for this to work.) Then, try to read, write, or
execute the files that are giving you trouble. For example:

nobody$ cat /usr/local/squid/var/logs/cache.log
nobody$ touch /usr/local/squid/var/logs/test

3. Controlling Squid's Memory Usage

Squid tends to be a bit of a memory hog. It uses memory for many
different things, some of which are easier to control than others.
Memory usage is important because if the Squid process size exceeds
your system's RAM capacity, some chunks of the process must be
temporarily swapped to disk. Swapping can also happen if you
have other memory-hungry applications running on the same system.
Swapping causes Squid's performance to degrade very quickly.

An easy way to monitor Squid's memory usage is with standard
system tools such as top and ps. You can also ask Squid
itself how much memory it is using, through either the cache
manager or SNMP interfaces. If the process size becomes too large,
you'll want to take steps to reduce it. A good rule of thumb is
to not let Squid's process size exceed 60% to 80% of your RAM capacity.

One of the most important uses for memory is the main cache index.
This is a hash table that contains a small amount of metadata for
each object in the cache. Unfortunately, all of these "small" data
structures add up to a lot when Squid contains millions of objects.
The only way to control the size of the in-memory index is to
change Squid's disk cache size (with the cache_dir directive).
Thus, if you have plenty of disk space, but are short on RAM, you
may have to leave the disk space underutilized.
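To get a feel for the trade-off, you can do a back-of-the-envelope calculation. The per-gigabyte metadata cost is often quoted in the 10-15 MB range; the 14 MB figure below is an assumption for illustration, and the real number varies with your Squid version and architecture, so verify it against your own documentation:

```shell
# Rough index-memory estimate, assuming ~14 MB of in-memory metadata
# per GB of cache_dir space (an assumed middle-of-the-range figure;
# check your Squid version's documentation for the real cost).
disk_gb=50                      # total cache_dir size in GB
echo "$(( disk_gb * 14 )) MB"   # prints "700 MB" for a 50 GB cache
```

If that estimate alone approaches your RAM budget, the cache_dir size, not cache_mem, is the directive to reduce.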

Squid's in-memory cache can also use significant amounts of RAM.
This is where Squid stores incoming and recently retrieved objects.
Its size is controlled by setting the cache_mem directive. Note
that the cache_mem directive only affects the size of the memory
cache, not Squid's entire memory footprint.

Squid also uses some memory for various I/O buffers. For example,
each time a client makes an HTTP request to Squid, a number of
memory buffers are allocated and then later freed. Squid uses
similar buffers when forwarding requests to origin servers, and
when reading and writing disk files. Depending on the amount and
type of traffic coming to Squid, these I/O buffers may require a
lot of memory. There's not much you can do to control memory
usage for these purposes. However, you can try changing the TCP
receive buffer size with the tcp_recv_bufsize directive.

If you have a large number of clients accessing Squid, you may
find that the "client DB" consumes more memory than you would
like. It keeps a small number of counters for each client IP
address that sends requests to Squid. You can reduce Squid's
memory usage a little by disabling this feature. Simply put
client_db off in squid.conf.

Another thing that can help is to simply restart Squid periodically,
say, once per week. Over time, something may happen (such as a
network outage) that causes Squid to temporarily allocate a large
amount of memory. Even though Squid may not be using that memory,
it may still be attached to the Squid process. Restarting Squid
allows your operating system to truly free up the memory for other
uses.

You can use Squid's high_memory_warning directive to warn you
when its memory size exceeds a certain limit. For example, add
a line like this to squid.conf:

high_memory_warning 400 MB

Then, if the process grows beyond that value, Squid writes warnings
to cache.log and, if configured, to syslog.

4. Rotating the Log Files

Squid writes to various log and journal files as it runs. These
files will continually increase in size unless you take steps to
"rotate" them. Rotation refers to the process of closing a log
file, renaming it, and opening a new log file. It's similar to
the way that most systems deal with their syslog files, such
as /var/log/messages.

If you don't rotate the log files, they may eventually consume
all free space on that partition. Furthermore, some operating
systems, such as Linux, cannot support files larger than 2 GB.
When a log file reaches that size, you'll get a "File too large"
error message and Squid will complain and restart.

To avoid such problems, create a cron job that periodically rotates
the log files. It can be as simple as this:

0 0 * * * /usr/local/squid/sbin/squid -k rotate

In most cases, daily log file rotation is most appropriate.
A not-so-busy cache can get by with weekly or monthly rotation.

Squid appends numeric suffixes to rotated log files. Each time
you run squid -k rotate, each file's numeric suffix is incremented
by one. Thus, cache.log.0 becomes cache.log.1, cache.log.1 becomes
cache.log.2, and so on. The logfile_rotate directive specifies
the maximum number of old files to keep around.
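For example, to keep ten generations of each log file, put this in squid.conf:

```
logfile_rotate 10
```

Files beyond that limit are discarded, so size the number to match your disk space and retention needs.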

Logfile rotation affects more than just the log files in
/usr/local/squid/var/logs. It also generates new swap.state
files for each cache directory. However, Squid does not keep old
copies of the swap.state files. It simply writes a new file from
the in-memory index and forgets about the old one.

5. Understanding Squid's Access Control Syntax

Squid has an extensive, but somewhat confusing, set of access
controls. The most important thing to understand is the difference
between ACL types, elements, and rules, and how they work together
to allow or deny access.

Squid has about 20 different ACL types. These refer to certain
aspects of an HTTP request or response, such as the client's IP
address (the src type), the origin server's hostname (the
dstdomain type), and the HTTP request method (the method
type).

An ACL element consists of three components: a type, a name, and one or
more type-specific values. Here are some simple examples:

acl Foo src 1.2.3.4
acl Bar dstdomain www.cnn.com
acl Baz method GET

The above ACL element named Foo would match a request that comes from
the IP address 1.2.3.4. The ACL named Bar matches a www.cnn.com URL.
The Baz ACL matches an HTTP GET request. Note that we are not allowing
or denying anything yet.

For most of the ACL types, an element can have multiple values, like this:

acl Argle src 1.1.1.8 1.1.1.28 1.1.1.88
acl Bargle dstdomain www.nbc.com www.abc.com www.cbs.com
acl Fraggle method PUT POST

A multi-valued ACL matches a request when any one of its values
is a match; that is, the values use OR logic. The Argle ACL matches a request
from 1.1.1.8, from 1.1.1.28, or from 1.1.1.88. The Bargle ACL
matches requests to NBC, ABC, or CBS web sites. The Fraggle ACL
matches a request with the methods PUT or POST.

Now that you're an expert in ACL elements, it's time to graduate
to ACL rules. These are where you say that a request is allowed
or denied. Access list rules refer to ACL elements by their names
and contain either the allow or deny keyword. Here are some
simple examples:

http_access allow Foo
http_access deny Bar
http_access allow Baz

It is important to understand that access list rules are checked
in order and that the decision is made when a match is found.
Given the above list, let's see what happens when a user from
1.2.3.4 makes a GET request for www.cnn.com. Squid
encounters the allow Foo rule first. Our request matches the
Foo ACL, because the source address is 1.2.3.4, and the request
is allowed to proceed. The remaining rules are not checked.

How about a PUT request for www.cnn.com from 5.5.5.5? The request
does not match the first rule. It does match the second rule,
however. This access list rule says that the request must be
denied, so the user receives an error message from Squid.

How about a GET request for www.oreilly.com from 5.5.5.5? The
request does not match the first rule (allow Foo). It does not
match the second rule, either, because www.oreilly.com is different
than www.cnn.com. However, it does match the third rule, because
the request method is GET.
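The first-match evaluation walked through above can be sketched as a tiny shell function. This is only an illustration of the matching logic, not anything Squid itself runs; it hard-codes the Foo, Bar, and Baz rules from the earlier examples:

```shell
# Toy model of Squid's first-match rule checking, for the rules:
#   http_access allow Foo   (src 1.2.3.4)
#   http_access deny Bar    (dstdomain www.cnn.com)
#   http_access allow Baz   (method GET)
# Usage: check <client-ip> <hostname> <method>
check() {
  src=$1 host=$2 method=$3
  [ "$src" = 1.2.3.4 ] && { echo allow; return; }      # allow Foo
  [ "$host" = www.cnn.com ] && { echo deny; return; }  # deny Bar
  [ "$method" = GET ] && { echo allow; return; }       # allow Baz
  echo deny  # no rule matched: default is the opposite of the last rule
}

check 1.2.3.4 www.cnn.com GET      # prints "allow" (matches Foo)
check 5.5.5.5 www.cnn.com PUT      # prints "deny"  (matches Bar)
check 5.5.5.5 www.oreilly.com GET  # prints "allow" (matches Baz)
```

The key point the sketch captures: checking stops at the first matching rule, so rule order determines the outcome.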

Of course, these simple ACL rules are not very interesting. The
real power comes from Squid's ability to combine multiple elements
on a single rule. When a rule contains multiple elements, each
element must be a match in order to trigger the rule. In other
words, Squid uses AND logic for access list rules. Consider this
example:

http_access allow Foo Bar
http_access deny Foo

The first rule says that a request from 1.2.3.4 AND for www.cnn.com
will be allowed. However, the second rule
says that any other request from 1.2.3.4 will be denied. These
two lines restrict the user at 1.2.3.4 to visiting only the
www.cnn.com site. Here's an even more complex example:

http_access deny Argle Fraggle
http_access allow Argle Bargle
http_access deny Argle
These three lines allow the Argle clients (1.1.1.8, 1.1.1.28, and 1.1.1.88)
to access the Bargle servers (www.nbc.com, www.abc.com, and www.cbs.com), but
not with PUT or POST methods. Furthermore, the Argle clients are not
allowed to access any other servers.

One of the common mistakes often made by new users is to write a
rule that can never be true. It is easy to do if you forget that
Squid uses AND logic on rules and OR logic on elements. Here is
a configuration that can never be true:

acl A src 1.1.1.1
acl B src 2.2.2.2
http_access allow A B

The reason is that a request cannot be from both 1.1.1.1 AND
2.2.2.2 at the same time. Most likely, it should be written
like this:

acl A src 1.1.1.1 2.2.2.2
http_access allow A

Then, requests from either 1.1.1.1 or 2.2.2.2 are allowed.

Access control rules can become long and complicated. When
adding a new rule, how do you know where it should go? You should
put more-specific rules before less-specific ones. Remember that
the rules are checked in order. When adding a rule, go through
the current rules in your head and see where the new one fits.
For example, let's say that requests to a certain site are denied
for everyone, but you want to add a rule allowing one user, Jane,
to reach it. (Here, Jane is a src ACL matching her address, and
XXX is a dstdomain ACL matching the site.) The more-specific
allow rule must go first:

http_access allow Jane XXX
http_access deny XXX

If we place the new rule after deny XXX, it will never even get
checked. The first rule will always match the request and she
will not be able to visit the site.

When you first install Squid, the access control rules will deny
every request. To get things working, you'll need to add an ACL
element and a rule for your local network. The easiest way is to
write a source IP address ACL element for your subnet(s). For
example:

acl MyNetwork src 192.168.0.0/24

Then, search through squid.conf for this line:

# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS

After that line, add an http_access line with an allow rule:

http_access allow MyNetwork

Once you get this simple configuration working, feel free to move on to
some of the more advanced ACL features, such as username-based proxy
authentication.

6. How to Not Be a Spam Relay

Unless you've been living under a rock, you're aware of the spam
problem on the Internet. Spam senders used to take advantage of
open email relays. These days, a lot of spam comes from open
proxies. An open proxy is one that allows outsiders to make
requests through it. If others on the Internet receive spam email
from your proxy, your IP address will be placed on one or more
of the various blackhole lists. This will adversely affect your
ability to communicate with other Internet sites.

Use the following access control rules to make sure this never
happens to you. First, always deny all requests that don't come
from your local network. Define an ACL element for your subnet:

acl MyNetwork src 10.0.0.0/16

Then, place a deny rule near the top of your http_access rules
that matches requests from anywhere else:

http_access deny !MyNetwork
http_access ...
http_access ...

While that may stop outsiders, it may not be good enough. It
won't stop insiders who intentionally, or unintentionally, try
to forward spam through Squid. To add even more security, you
should make sure that Squid never connects to another server's
SMTP port:

acl SMTP_port port 25
http_access deny SMTP_port

In fact, there are many well-known TCP ports, in addition to SMTP,
to which Squid should never connect. The default squid.conf
includes some rules to address this. There, you'll see a
Safe_ports ACL element that defines good ports. A deny
!Safe_ports rule ensures that Squid does not connect to any of
the bad ports, including SMTP.
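The relevant fragment of the default squid.conf looks roughly like this. It is abbreviated here, and the stock file lists more ports, so check your own copy rather than treating this sketch as complete:

```
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 1025-65535  # unregistered ports
http_access deny !Safe_ports
```

Keep the deny !Safe_ports rule near the top of your http_access list so that it is evaluated before any broad allow rules.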

Duane Wessels
discovered Unix and the Internet as an
undergraduate student studying physics at Washington State University.