OK, so this is the final part in a long-winded series of posts about why and how to defragment a Linux partition.

I’ve previously mentioned the “why” of why I needed to do it, and the “how” of how I went about achieving it. Now for the final results.

The main question I wanted to answer is “Did defragging speed up deletion in MythTV?”. Short answer: “No, not in the slightest”. It still takes a very long time, and I cannot see any improvement over the deletions I did before defragging.

Rubbish.

So now, I’m going to try the other things – optimising the database, reducing the number of “Record All on All Channels” schedules that I have, etc.

The other question I should answer is “What happened about removing the recordings from underneath MythTV’s feet?” – Those of you who read the previous post will remember that I discovered that removing recordings from the recordings directory whilst MythTV was still active was a very bad plan. Fortunately, all you need to do to fix it is to replace the files and then reboot – although you probably don’t need to reboot; just killing MythBackend and MythFrontend might work the same. I prefer rebooting so that I know for sure that there won’t be any other repercussions.

So this is now post number four in a long rambling series on how to stop my media centre from taking forever to delete a recording.

Last time, I spoke about the theory of how to defragment a particular set of files on a Linux partition. In my case, this is ext3, but the theory applies across the board and even works for Windows NTFS partitions as well.

Anyway, this is how I have defragmented my recordings….

I am in the lucky situation of having two hard drives in my media centre, and I also had enough free space on the “other drive” (the one that doesn’t contain the recordings) to hold all of the recording files in one go. To make life easier, on the destination drive, I created a directory: /home/defrag.

I then changed into my recordings directory, and issued the following command:

find . -mtime +1 -exec mv {} /home/defrag/ \;

This will move all files older than 24 hours into the directory /home/defrag. As I said previously, note that I am doing this with “mv” and not “cp”: because the move crosses onto another drive, “mv” copies each file’s contents and then deletes the original, which is what guarantees the files are defragmented as I go.

I then left this for bloody ages while the files copied across to the other drive. That part has just finished, and I will shortly move them back to their original locations, after which they should be nicely defragmented.
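For the record, the move-back step is just the same find/mv pattern pointed in the opposite direction. Here is a minimal runnable sketch – temp directories stand in for my real recordings directory and /home/defrag, so you can try it safely before pointing it at real paths:

```shell
#!/bin/sh
# Runnable sketch of the move-back step. In my case the paths were the
# MythTV recordings directory and /home/defrag; here they are temp
# directories so the sketch is safe to run as-is.
RECORDINGS=$(mktemp -d)
STAGING=$(mktemp -d)

# Stand-ins for recordings that were staged on the other drive.
touch "$STAGING/news.mpg" "$STAGING/film.mpg"

# Move everything back. When this crosses a filesystem boundary, "mv"
# copies the data and then deletes the original, so each file gets
# written out contiguously on the destination.
find "$STAGING" -maxdepth 1 -type f -exec mv {} "$RECORDINGS"/ \;

ls "$RECORDINGS"
```

Substitute your own recordings path for the mktemp stand-ins when doing it for real.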

Now, there are a couple of things to note with this:

MythTV doesn’t like it if you remove the recordings from underneath its feet. Thinking about it, that kind of makes sense, so make sure you have the MythTV backend disabled and that it’s not recording anything. Hopefully mine will be fine once I have restored the files and rebooted!

That “24 hours” clause on the find command is there for a reason. It means that it won’t move anything that is currently being recorded. Although, if you have the backend turned off, then you shouldn’t need to worry about that anyway.

The files were not moved in the order that I was expecting. This, for me, is the most interesting point. The files appeared to be moved in a completely random order – not by date/time, nor alphabetically. It jumped from letter to letter, from month to month.

I expect this is because the files are being read in disk order – i.e. the order in which they are stored on disk. I’ve tried to find a reference stating what “find”’s default sort order is, but I cannot seem to find one; I expect it’s “no sort” – that is, handle the files in the order they are encountered.

What this means is that if the files are being read in from disk, and they jump around from month to month, then the free space between recordings has been reused, and you are therefore quite likely to get fragmentation occurring.
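You can see that “find” imposes no sort order of its own with a quick experiment – it simply emits entries in whatever order the directory hands them back, and you have to pipe through “sort” if you want anything predictable (a sketch in a throwaway directory):

```shell
#!/bin/sh
# "find" emits entries in the order the directory returns them -- no
# sorting is applied by find itself. Demo in a throwaway directory.
DIR=$(mktemp -d)
touch "$DIR/zebra" "$DIR/apple" "$DIR/mango"

find "$DIR" -type f          # directory order: may be any order at all
find "$DIR" -type f | sort   # forced alphabetical order
```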

Bingo! So, if they are being retrieved in the order that they are on disk, and that order is strange, then something is probably fragmented somewhere. Couple this with the fact that one of the recordings is the 5-hour-long Beijing Olympic Opening Ceremony (a hefty 12GB in size), and it’s quite plain to see that the disk was probably pretty well fragmented around that point. 165GB of semi-fragmented files is not going to be particularly responsive.

I feel somewhat vindicated….

Anyway, once everything is back in place, I’ll provide an update as well. I’m also planning on doing the other suggested things, like really pruning down the “Record All Showings” schedules that I have, and then running the database optimisation script as well. After each one, I’ll make some notes, so I can nail down precisely what causes the problem.

So, further to my previous ramblings about when it is a good idea to defrag a Linux partition, the advice that I saw repeated time and time again on how to actually do it was quite simple to understand.

The basic theory is that you do not need to do anything more spectacular than move some suspected fragmented files from one partition onto another, and then back to their original location. This is quite easy to do, and the theory behind it is quite simple as well.

Because a file move uses the userland commands (like “mv” and “cp”), we will be copying the contents of the file, not the disk blocks used by the file; the original is then deleted. The file is therefore written to partition B in an unfragmented way – i.e. from the start of the file to the end. Remember that the file system will choose a contiguous run of disk space, plus a little extra. This means that as soon as the file is on partition B, it has automatically been defragmented for us. Brilliant!

Now, the only problem with this is that your file is now on a different partition, and therefore in a different location to before, so you have to move it back to the original partition for everything to work. This has a slight plus as well: if there is not enough free space on partition B to hold the file in a contiguous block, the filesystem will fragment it (not the desired effect). However, seeing as the file came from partition A in the first place, there should be enough free space there to hold it contiguously (unless, of course, partition A is already very full, in which case it could become fragmented again – but in that situation there is nothing much you could do to defragment it anyway).

If you have enough free space, you do not necessarily need two partitions to make this work. You could simply copy the files from one directory to another, delete the originals, and copy them back again. Note, however, that there is a catch if you are not careful: the first step explicitly has to be a copy, not a move. On Linux’s ext2 and ext3 filesystems, moving a file within the same filesystem involves nothing more than updating a few directory pointers on disk (hence why moving files on the same disk is such a quick procedure in Linux). The data blocks themselves are not physically moved, and therefore they would not be defragmented. The second step can be a “move”, as the files will already have been defragmented by the original “copy”.
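The whole copy-delete-move dance can be sketched as a runnable script. Temp directories stand in for the real recordings directory and the scratch area, and one small file stands in for a recording:

```shell
#!/bin/sh
# Single-partition defrag sketch: copy (not move!) to a scratch
# directory, delete the originals, then move back. The first step must
# be a real copy so that fresh, contiguous blocks get written; the
# second can be a plain move, because the data is already defragmented.
SRC=$(mktemp -d)        # stand-in for the recordings directory
SCRATCH=$(mktemp -d)    # stand-in for the scratch directory
echo "some recording data" > "$SRC/show.mpg"

cp -p "$SRC/show.mpg" "$SCRATCH"/   # step 1: copy writes new blocks
rm "$SRC/show.mpg"                  # step 2: free the old, fragmented blocks
mv "$SCRATCH/show.mpg" "$SRC"/      # step 3: same-fs move just relinks
```

The `-p` flag preserves timestamps, which matters if anything (like MythTV) keys off file modification times.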

I should also point out that this procedure works on a Windows NTFS partition as well. If I could find the link I had, I could point you towards a third-party disk defragmenter for Windows that does the same thing (and, if I remember correctly, I’ve got a feeling I saw Microsoft’s Raymond Chen using it as well).

So, how do you actually achieve all this, and how did I get on when I did it? It’s actually quite simple, and I’ll post that later.

I posted last night about slow delete response time in MythTV. At some point, I mentioned that I would explain why file fragmentation could be an issue on a MythTV box. To be fair, this isn’t MythTV’s fault – it’s a problem that would affect all media centre computers, regardless of file system, OS or architecture.

This is all from information I learnt last night, so it’s an amalgamation of various sources, some of which I forget.

The typical line about Linux file systems (particularly ext3) is that they do not need to be defragmented. Given the reasons usually cited, it does make sense that in a lot of instances that is correct, and they do not need to be defragged.

Basically, when a file is created, the file system drivers do not simply fill the first available blocks from the start of the disk (as FAT does). Imagine two files – A and B – (size/number of blocks isn’t important) separated by a gap of 10 free blocks, and a third file (C), 15 blocks in size, being written to disk. With a FAT (or similar) file system, the first 10 blocks of C are written in between files A and B, and the final 5 blocks after file B. A more intelligent file system (say ext3) will see the empty 10 blocks and ignore them, seeking instead a space which is 15 blocks in size. Only if it cannot find a space of an appropriate size will it use the free spaces between existing files.

As well as this, file systems such as ext3 will preallocate/reserve a few extra blocks around a file to allow for future growth. This again reduces the need to find new spaces, as there is usually enough room to accommodate the increase.

Something else to note: allegedly ext3 is good at this, xfs is much better, and ReiserFS (v3) is crap at it (hence, apparently, there is a need to defragment ReiserFS partitions).

This is all very well and good, and I can understand where people are coming from when they say “You don’t need to defragment Linux file systems”. However, there is a problem with this, and it applies to media centres.

The problem is quite simple: Recordings on a media centre are large, and they grow at an unknown rate for long periods of time.

Think about it for a second: if you copy a Word document onto a partition, the chances are it’s not going to grow much, and if it does, it will happen in a quick burst. A media file being recorded from a live stream, however, is drip-fed data over a period of time – and an hour-long programme is not an unusual length.

This means that the preallocated blocks aren’t going to be enough – you are quickly going to overrun those buffer blocks. And as more and more data is fed in, the file system driver is not going to perpetually move the recording into fresh free space – imagine copying an hour-long recording of approximately 4GB every time you overrun a few kilobytes of buffer blocks.

Couple this with the fact that you are not going to delete recordings in the order you recorded them – if you’re like me, you delete them as you watch them, and save a few for later. That behaviour is going to leave free-space holes all over the place, and I really don’t know how the file system picks those holes up – maybe it will start to use them up first, as initially your recording file is going to be small.

So far, then, this points to the idea that a media centre is more likely to need defragmenting than a usual desktop machine, or even a server. I would therefore expect to see more fragmentation on a media centre, and this could be part of my problem.

Applying a bit of lateral thinking, and a bit of background knowledge, I can also understand why this wouldn’t affect something like a database server. The reason is quite simple: for large database servers, the database engine will often preallocate large files to store the data in, rather than growing them organically. Therefore, the files are often big enough to soak up the growing and shrinking of a database and you don’t get the same fragmentation problem.

So, now here is the killer blow as well…. What if you’re recording two programmes at the same time?

I’m in this exact situation – I have two tuner cards in my MythTV box, so that I can watch one channel whilst I record another, or record two programmes at the same time. Each programme has its own media file, which grows organically. Each time one file grows, it competes with the other recording for free space. I can see it now in my head – the disk cluttered with alternating blocks from two files! Fragmentation hell…..

I noticed a little while ago that I was getting a very slow response time in the MythTV frontend whenever I tried to delete a recorded TV programme.

This was never too much of a problem but it has recently started to irritate me, and tonight I’ve decided to search for an answer.

My initial thought was that it could be a problem with file system fragmentation (and nobody say to me that “Linux does not need to be defragmented” – that’s bollocks and a gross generalisation which does not apply in this instance). However, looking into it a bit more, the file fragmentation problem is only really present on systems which are almost full. I have 124GB of free space, so that should not be the case for me. An important thing to note, though, is that fragmentation can occur when recording long files, especially when you have multiple tuners and are recording more than one programme at the same time (but I’ll post more on that later).

So, I started to search for more MythTV related things. I’ve found this wonderful newsgroup post here:

This seems to (basically) imply that the reason it takes so long is that the frontend has to wait for the backend to delete the programme, which in turn is waiting for a slow MySQL query to finish. That query is slow because it is searching for programmes and upcoming recordings that it needs to update.

Now, there are plenty of suggestions about defragging the database and optimising it (in fact, this post points to a script in the Contrib directory which will optimise the MythTV database for you).
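If you can’t lay your hands on the contrib script, I believe the same sort of thing can be done directly with mysqlcheck. A hedged sketch only – “mythconverg” and “mythtv” are the usual default database name and user, but check your own mysql.txt, and stop the backend before running anything:

```shell
#!/bin/sh
# Hypothetical stand-in for the contrib optimisation script: ask
# mysqlcheck to run OPTIMIZE over every table in the MythTV database.
# "mythconverg" and "mythtv" are the usual defaults -- check mysql.txt
# for the real values on your box.
DB=mythconverg
DBUSER=mythtv

# Printed rather than executed here; uncomment the real call once the
# backend is stopped and the credentials are confirmed.
echo "mysqlcheck --optimize -u $DBUSER -p $DB"
# mysqlcheck --optimize -u "$DBUSER" -p "$DB"
```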

Once you get down to the very bottom of the thread, you start to see some more sensible answers being posted, and it’s one of those that has piqued my interest. Basically, the theory is that the more “Record at any time on any channel” schedules you have, the more difficult it is for the system to schedule those programmes (i.e. when searching the database for programmes, the number of results returned is greater), and the query therefore takes more time to execute. The simplest way to reduce that query time is to reduce the number of “Record at any time on any channel” schedules that you have.

I’ve since removed some schedules that were really annoying me (More4 is a pain in the arse, repeating the same programme 4 times in one day, day after day after day, and not labelling the programmes sensibly so that MythTV can filter out the ones you have already watched!), and I have since noticed that the deletion time has dropped. Slightly. It’s not a massive difference, but it is reduced.

I’m going to try to further reduce the number of programmes that I am recording, and see how that goes. I’m also planning on clearing out a load of old programmes, so that should help a lot. And I’m going to try optimising the database, and see how much that helps.