
A fairly common problem with Solaris UFS filesystems is df showing lots of free space while you can’t actually write to the filesystem. Having recently been playing with multi-terabyte filesystems, and forcing these sorts of issues for debugging, I thought I’d share some information about the tools you can use and what they can report.

If we have multi-terabyte filesystems, our number of bytes per inode (nbpi) could be set too high if we’re using lots of small files – in which case it’s very easy to run out of inodes. We can see on this filesystem that we’ve used up all our inodes. Trying to write to it will result in “No space left on device” error messages – which is always good for some head-scratching fun, as we can see that we’ve got 1.4TB of space free.
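On Solaris, inode exhaustion shows up directly in df if you ask for inode statistics with the -o i option. A hypothetical transcript (device name, mount point and counts are all illustrative):

```
# df -F ufs -o i /data
Filesystem             iused   ifree  %iused  Mounted on
/dev/dsk/c1t0d0s6    2097024       0   100%   /data
```

With ifree at 0, every write fails even though a plain df -k would still report free space.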

To get an idea of how inodes, block size and things have been specified we need to find out how the filesystem was built:

/usr/sbin/mkfs -m <disk_device>

I’ve wrapped the line here to make it a bit more readable, but here’s the output querying our full multi-terabyte filesystem.
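The figures below are illustrative (hypothetical device name and geometry values), but the shape of the report is representative of a multi-terabyte build:

```
# /usr/sbin/mkfs -m /dev/rdsk/c1t0d0s6
mkfs -F ufs -o nsect=128,ntrack=48,bsize=8192,fragsize=8192,
  cgsize=16,free=1,rps=166,nbpi=1048576,opt=t,apc=0,gap=0,
  nrpos=8,maxcontig=128 /dev/rdsk/c1t0d0s6 4292075520
```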

This shows the command line passed to mkfs when it created the filesystem, so we can see exactly which parameters the filesystem was built with.

Things we care about here are:

fragsize – the smallest amount of disk space that can be allocated to a file. If we have loads of files smaller than the 8KB block size, then this should be smaller than 8KB.

nbpi – number of bytes per inode. One inode is allocated per nbpi bytes of filesystem capacity, so this controls how many files the filesystem can hold.

opt – how filesystem performance is being optimised. t means we’re optimising to spend the least time allocating blocks, and s means we’re minimising space fragmentation on the disk.

On a multi-terabyte filesystem, nbpi cannot be set to less than 1MB, and fragsize will always be set equal to bsize. So we’d want to optimise for time rather than space, as we’ll only ever allocate in 8KB blocks.

fstyp is the command we can use to do some really low-level querying of a UFS filesystem.

We can invoke it with:

fstyp -v <disk_device>

Make sure you pipe it through more, or redirect the output to a file, because there’s a lot of it. fstyp will report on the statistics of all the cylinder groups for a filesystem, but it’s really just the first section reported from the superblocks that we’re interested in.
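An abbreviated superblock section looks something like this – the values here are illustrative, showing our out-of-inodes scenario (plenty of free blocks, but nifree at 0):

```
# fstyp -v /dev/rdsk/c1t0d0s6 | more
ufs
magic   11954   format  dynamic
...
bsize   8192    bshift  13      bmask   0xffffe000
fsize   8192    fshift  13      fmask   0xffffe000
...
nbfree  182929340       ndir    52      nifree  0       nffree  0
```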

bsize and fsize show us the block and fragment size, respectively.
nbfree and nffree show us the number of free blocks and fragments, respectively. If nbfree is 0, you’re in trouble – no free blocks means no more writing to the filesystem, regardless of how much space is actually still available.

What usually happens when writing lots of small (i.e. < 8KB) files to a filesystem is that the number of free blocks (nbfree) falls to 0 while you’ve still got plenty of fragments left. If block size equals fragment size that can’t happen – but if fragments are, say, 2KB, then you’re not going to be able to write to the filesystem any more (“file system full” error messages), even though df is showing lots of free disk space.

A big part of tuning your filesystem is knowing what’s going onto it. For multi-terabyte filesystems, you should be placing larger files on there – so setting block size to equal fragment size won’t be wasting space.

If you’ve got lots of smaller files, you’ll need to think about what the average file size is – if it’s less than 8KB, you’ll want to make sure that the fragment size is also less than 8KB. Otherwise you’ll be wasting space by writing 8KB blocks all the time when you could get away with 2KB fragments.
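As a rough worked example (all figures hypothetical): a 2KB file stored with an 8KB fragment size still consumes a full 8KB fragment, wasting 6KB per file. This quick awk sketch totals it up:

```shell
# Estimate space wasted by fragment-size rounding.
# Hypothetical workload: 1,000,000 files of 2KB each on a
# filesystem with an 8KB fragment size.
awk 'BEGIN {
    files = 1000000; filesize = 2048; fragsize = 8192
    # each file is rounded up to a whole number of fragments
    frags = int((filesize + fragsize - 1) / fragsize)
    waste = files * (frags * fragsize - filesize)
    printf "wasted bytes: %.0f (%.1f GB)\n", waste, waste / (1024 ^ 3)
}'
```

Dropping fragsize to 2048 in the same sketch brings the waste to zero.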

Anyway, back to the problem at hand – our 2TB filesystem that’s run out of inodes. In this particular case, we’ll need to rebuild the filesystem and allocate more inodes. The question is – how do we work out what the value should be?

This simple shell script will analyse the files from the directory you execute it in, and will come back with the average file size:
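A minimal sketch of such a script (a reconstruction along those lines using find, ls and awk – not necessarily the original):

```shell
#!/bin/sh
# Report the number of regular files under the current directory
# and their average size in bytes.
find . -type f -exec ls -l {} + | awk '
    { total += $5; count++ }
    END {
        if (count > 0)
            printf "%d files, average size %d bytes\n", count, total / count
        else
            print "no files found"
    }'
```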

Now, if our average file size is 252KB, then our inode density of 1161051 (1 inode per 1MB) is going to be hopelessly inadequate. This is borne out by looking again at our df output – we can see that we’ve run out of inodes when the filesystem is only approximately a quarter full, which matches our average file size being roughly a quarter of the inode density.
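That quarter-full figure falls straight out of the arithmetic – a quick sanity check (using the illustrative 252KB average from above):

```shell
# How full the filesystem gets before inodes run out: each nbpi
# bytes of capacity gets one inode, but each inode only accounts
# for one average-sized file of actual data.
awk 'BEGIN {
    avg_file = 252 * 1024    # average file size: 252KB
    nbpi     = 1048576       # one inode per 1MB of capacity
    printf "inodes exhausted at %.0f%% capacity\n", 100 * avg_file / nbpi
}'
```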

However, at this point, we’re stuffed – we can’t set nbpi to less than 1MB on a Solaris UFS filesystem that’s larger than 1TB. Our only options are: