Over a 36 hour period, the FreshPorts webserver ran out of space on /var four times. This effectively brought the server to a halt. Incoming http requests could not be answered and database queries were failing. To say the least, I was not amused. Once is annoying, twice is frustrating, etc.

This little article documents that incident and shows the solution I came up with.

Ouch, FreshPorts is down

On Tuesday morning, I found that my FreshPorts webserver was down. Well, actually, the box was up and running. But the main page on the website was blank. And a message was repeating on the console quite frequently:

After that, the /var/tmp files, mentioned above, disappeared. Good. That's that fixed. Famous last words.

I restarted mysqld and then apache:

# /usr/local/etc/rc.d/mysqld.sh
# /usr/local/sbin/apachectl start

I noted that /var utilization was still down at 15%. Good.

What is /var anyway?

/var is where your log files, spool files, accounting information, and some databases (/var/db), including the FreeBSD packages database (/var/db/pkg), are kept. It's used for files that can grow quickly. Unlike /usr, which can be pretty much the same from host to host, /var typically contains information which is specific to that host and that host alone. You'll find that apache puts its log files there by default, as do sendmail and many other programs.
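When /var does start to fill, the first question is which subdirectory is responsible. A quick way to answer it (these are standard du and sort flags, nothing FreshPorts-specific) is:

```shell
# Summarize each subdirectory of /var in kilobytes, largest first.
# Redirecting stderr hides "Permission denied" noise when run as non-root.
du -sk /var/* 2>/dev/null | sort -rn | head
```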

Have a look at /var/log and you'll also find the following files:

messages - system messages

maillog - mail log

As I mentioned above, your apache logs are also stored in /var/log, but I've changed that default. My apache logs are stored on another volume, for convenience.

What are the implications if /var fills up? The system is unable to do much of anything: it has no space to write logs, and nowhere to spool incoming mail. Effectively, it is halted.
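Had something been watching, I would have had warning before /var hit 100%. A small script run from cron could do it; this is only a sketch, the 90% threshold is my own choice, and in practice you'd mail root rather than echo:

```shell
#!/bin/sh
# Warn when /var utilization crosses a threshold.
THRESHOLD=90
# df -P produces stable, parseable output; column 5 is capacity, e.g. "42%".
usage=$(df -P /var | awk 'NR == 2 { sub("%", "", $5); print $5 }')
if [ "$usage" -ge "$THRESHOLD" ]; then
    echo "/var is ${usage}% full"
fi
```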

So what happened to cause this?

I had no idea why the space was filling up. Given that the webserver had been running uninterrupted for several months, I suspected malice on the part of some unsociable animal. That's a natural reaction when the system suddenly starts behaving abnormally. But I checked the logs and found nothing unusual which would indicate any sort of attack. I was quite mystified as to the cause of the problem.

History never repeats

/var went to 100% about 6 hours later. I took the same recovery steps and checked the logs. Nothing obvious. About 12 hours later, /var filled up again. Nothing in the logs. Restarting mysqld released the space.

About 20 minutes later, I noticed /var was back up to 50%. I checked the logs to see what was happening. Again, nothing jumped out at me. I pushed the problem to the back of my mind and went on to answer email.

Eventually, the more complex database queries came to mind. I started playing with some of the web pages while keeping an eye on /var/tmp. Eventually I found a web page which caused files to be created in /var/tmp. One of those queries quickly took /var to 50%. I started to wonder what would happen if a couple of these queries were launched concurrently, or coincided with an incoming port commit. I'll bet that would be enough to fill up /var/tmp. There and then I decided I needed more /var.
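What I was doing by hand can be scripted: sample the size of /var/tmp once a second while reloading the suspect pages in another window. The interval and sample count here are arbitrary choices of mine:

```shell
#!/bin/sh
# Print a timestamped size of /var/tmp once a second, five times,
# so growth during a slow query is easy to spot.
i=0
while [ "$i" -lt 5 ]; do
    printf '%s ' "$(date '+%H:%M:%S')"
    du -sk /var/tmp
    sleep 1
    i=$((i + 1))
done
```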

That's not a great deal of space. Only about 15MB free. So I decided to give it some more space. Luckily, I had lots of spare disk, some of which was extremely underutilized. The plan was to change the mount point for /var to another drive. This must be done with care as a running system actually needs and uses /var dynamically. You shouldn't just umount /var and mount it somewhere else.

Here are my before and after images of /etc/fstab:

/dev/ad0s1e /var   ufs rw 2 2
/dev/da1s1  /usr3  ufs rw 2 2

I changed this to:

/dev/ad0s1e /var-old ufs rw 2 2
/dev/da1s1  /var     ufs rw 2 2

This means that the new /var will be where /usr3 once was. The original /var will be available under /var-old, just in case I need to view it.

But I did not reboot, nor did I mount or umount anything yet. The above could have been done in single user mode, but I decided to do it from my nice GUI client where I could easily cut and paste.

These changes ensure that on the next reboot, the system will mount the proper volumes in the right places.

In the next step, we'll get the old /var over to the new location. And we'll do that from single user mode.

Moving /var via single user mode

In single user mode, you are the only user. It's much safer to make critical changes in single user mode. That's why I dropped to single user mode:

shutdown now

Once in single user mode, I copied the contents of the existing /var over to the new disk:

# cp -Rp /var/. /usr3

(The trailing /. tells cp to copy the contents of /var, rather than creating a var subdirectory under /usr3.)

I did a copy because I wanted to keep the original files in the old location, just in case. I could have done a mv instead of a cp, but I chose not to. The above command copied everything from the existing /var to the new location. Then I umounted the existing mount points:

# umount /var
# umount /usr3

and mounted the new ones:

# mount /var
# mount /var-old

I verified that the mounts had succeeded by viewing the output of mount.

This showed that the /dev/da1s1 was mounted as /var and that /dev/ad0s1e was mounted as /var-old. Which is exactly what I needed.
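If you'd rather not eyeball the full listing, you can pull out just the device behind a given mount point; mount's "device on mountpoint" output format makes the third field the mount point on FreeBSD and most other Unixes:

```shell
# Print the device mounted at /var; on my box this should now be /dev/da1s1.
mount | awk '$3 == "/var" { print $1 }'
```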

Back to multi-user mode

Then I left single user mode and went back to multi-user mode:

# exit

An exit (a CONTROL-D will also work) from single user mode will take you back to multi-user mode.

It's now been just over 24 hours since I created the new /var, giving it 2GB of space (that's really overkill and far more than it will ever need, but it was a quick and easy solution). Eventually, I'll partition that disk and give /var about 100MB. But that's another article.

A side effect is that the box seems to be faster. Mind you, this may just be my perception, and there may have been no actual improvement. But, theoretically, putting /var on its own SCSI disk can increase speed: one disk can be writing data to /var while the other is reading the database files on /usr.

My theory is that I've always had those SQL files created in /var/tmp, but they've never gotten to such a large size. Perhaps as more and more new ports have been added to the FreshPorts database, a larger temp file is needed for particular types of updates. And it is merely coincidence that these /var peaks have just started reaching the capacity of the disk.

As always, should you or any of your IM force be caught or killed with more information about the above, your comments will be appreciated.

Other ideas

Shortly after writing this article, several people wrote in regarding /var/tmp.

They all mentioned setting TMPDIR to a different location, say /usr4/tmp, and restarting mysqld.
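For the record, mysqld also has a tmpdir setting of its own, so the same thing can be done in the server configuration without touching the environment. A my.cnf fragment (the /usr4/tmp path is just the example location mentioned above) might look like:

```ini
[mysqld]
# Put MySQL's temporary sort/join files on a roomier filesystem.
tmpdir = /usr4/tmp
```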