Technical: December 2004 Archives

I just noticed that some idiot has decided to steal bandwidth by linking to an image on my website. I went hunting around looking for a suitable replacement when I came across an entry by Jeremy Zawodny

Unfortunately the site has decreased the size of the image to 24x24 which means you don't get the full effect of the image. I did consider using the infamous goatse image but decided against it.

I used mod_rewrite to change the image by adding a couple more rules to my .htaccess file.

Very often we are humbled by the simplest things and tonight I got a good one.

I had created a plpgsql function that was called from within a trigger to check for some duplicates in a table, blah blah. The function was working because I had tested it. It was registered in the "pg_proc" table and the two "proargtypes" were type 1043 ie varchar. This function was there, I could see it and if I was a character in Tron I could touch it, so why the hell when one of my scripts ran did I get: function foo does not exist.

I'll tell you why, I was not using the correct schema. Ahhhhhhhhhhhh.

I had tested it while logged in as the user who created it and that users schema is different from the user that needs to use it from the website. A quick

When I have added a few CD'S to my ogg collection I need to copy them across to my H320. This is OK if you copy the CD over as soon as it has been copied but I don't normally work that way. We just got some CD's for Christmas and I already had a few copied to the hard drive so as usual I use rsync to do it.

I noticed that rsync would always copy everything rather than do an update of the files. I found out that this is because of limitations on a vfat file system. The following command sorted it.

I tried to make up a few playlists today for my H320 iRiver. I am using vim to make them not an application and this had me stumped for a while.

Then I realised that the iRiver expects a DOS formatted file. Basically Linux text files use "\n" for a newline and MS windows uses "\r\n". To get it to work I had to save the file as a dos file. This can be done in vim as follows.

:set fileformat=dos

For those that don't like vim then a Perl one liner will do the same thing.

perl -i.bak -pe 's/(?<!\r)\n/\r\n/g;' *.m3u

It is safe to run this on a file that has already been converted to the msdos format. It also creates a backup file in case anything goes horribly wrong.

So a simple example of an m3u playlist is as follows:

#EXTM3U
\cypress_hill\a_to_the_k.ogg

Of course the "\r\n" is not visible but its there ;) The example m3u file above is in the ROOT directory of the mp3 player. Remember that on the H320 and probably on the other iRiver makes as well you need to use the A-B button to view your playlist when the player has stopped.

I was approached tonight by Henry Stern with respect to registering my interest in a project to help curb blog Spam. Apparently it has been noted that I wrote a SpamAssassin plugin for Movable Type. Wonders never cease.

I think a project would be good, as for what would come out of it that remains to be seen. There are a good few blogging tools out there and I doubt getting them all pulling together would be feasible. Although this is not really a necessary requirement to start one.

I have never seen the oom-killer before. I had heard stories from battle hardened veterans about their tussles with the beast but these stories where all just myths to me, until today, when the beastie raised its head in my logs.

If you have found this page then you probably have a serious problem with a corrupt Postgres database. If like me you found that your database seems to be missing then you have come to the correct place. I am not saying this will cure your problem or even come close but it might give you a few ideas on where to look next.

If you are going to follow this article I strongly suggest you read it ALL before starting to do anything. I have written it as I done it and I went down some blind alleys before I recovered the data. If you start doing what I have done as you read it you might waste a lot of time or fsck something up, so read it first and decide what you need to do. In most situations it might be a very simple single command that needs to be run and you will be sorted, others may have to do it a bit different.

I would also highly recommend getting onto the Postgres mailing lists and asking some pertinent questions. Make sure if you do post questions you give as much detail as possible i.e. version numbers and full debug level log output. There are people on there who have done this a lot more than I have and I am sure they have seen some nastier cases than the one I've got.

This article does not cover how to fix file system errors, see fsck or e2fsck for that if you have them. You might also want to investigate setting

zero_damaged_pages = true

in your postgresql.conf file if you are expecting corruption in your files. Ask on the postgresql maiing lists about this before doing it though.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit

links=# \dt
no relations found
links=# select relname from pg_class where relname not like 'pg_%';

none of my tables were present in the output, this is where I had a sudden urge to go to the toilet.

I don't often use the links database but every now and then I start a set of spiders that use the database to traverse the internet. I have been doing this for about 2 years and the database is huge or at least for a home machine it is.

I only have a backup from several months ago (lesson to be learned here) and I really didn't want to loose the data I had collected over the last few months.

On hunting around the internet I noticed that there have been a lot of people who have had corrupt postgres databases and managed to recover them with varying degrees of success. Most causes of these corruptions seemed to be hardware related and some with bodged upgrades. Mine could have been one of 2 things:

1. A glitch on the 160Gb SATA drive the database is stored on. This happened the other night.
2. Recent Debian upgrade to the database.

At this point figuring out what went wrong was less important than getting the data back so I decided not to bother on a witch hunt and cure it instead.

This was the point where I asked myself what was more important.

1. Recover as much of the data as possible.
2. Data Integrity.

For me the choice was quite simple. I wanted the data, I also needed to be able to retrieve the vast majority of it otherwise I might as well just use most recent dump which would guarantee the integrity of the database but would set me back a few months.

First thing I did was stop the postgres server:

/etc/init.d/postgres stop

I then took a full copy of the "data" directory, this is the one you may have used the "initdb" command to setup.

Once the backup has been made make sure that nothing happens to the original directory, don't do anything to it at all because we may need it later. All subsequent actions will use the copy of the database not the original.

At this point it might be an idea to a little data gathering. For me I needed to know what table was the largest etc. Doing

ls -la /var/lib/postgres/data/base/17142/

This will list all the files in the directory. I was pretty sure that the biggest table was going to be either the "child_links" or home_page table and it was easy to see in my case which was the largest tables. I also turned on full logging on all the postgres databases by editing the postgresql.conf file. Be aware that any more database created by "initdb" will create separate config files in the data directory and these will need to be edited. I suggest copying over them with a single common one. Another thing that you will need to know is the last Transaction ID or at least as close to value as possible. When you start stop the postgres database it write and entry to the log file and this contains a TID. I used grep to find mine ie

grep "next transaction ID" /var/log/syslog

This produced a list of TID's. (I log to syslog you might not, check postgresql.conf to find out)

The next thing I did was create a new database away from both the copy and the original databases. I did this using the initdb command as follows

initdb /var/lib/postgres/recovery_database

This creates a skeleton database data directory ready for action. Make sure no postmaster instances are running. I then started my new database as follows

/path/to/postmaster -D /var/lib/postgres/recovery_database

This database as it stands is not really much use so:

createdb links

I then fished out the create table script for my database (you may not need this). And created an empty database using it. I then had all my original tables with no data in them. The next thing I did was

I know I could have just soft linked them but I was being cautious. If I take copies the originals are safe from harm if I make a cock up.

Restart the recovery database again using the following. (please read the man page before using pg_resetxlog). I used 90812030 here because this number was the largest Transaction ID I could get from the logs.

From what I could tell I needed to use pg_resetxlog but this time I had to give pg_resetxlog some info about the WAL setting. Following the instructions in the man page I stopped the database and issued

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit

links=# select * from home_page;

I then used pg_dump to dump this table out to a backup file. The prospect of doing the above for each table made me cringe so I decided to risk trying something on the original database. I do not recommend you taking this short cut if you are really that worried about your data. I have the luxury in my case that the data is neither critical or really important. If you value your data don't do this to the original repository do it somewhere else first.

I shut down the recovery instance. I then ran

:~$ pg_resetxlog -x 90812030 -l 0x58,0x25 /var/lib/postgres/data

against the original database. I restarted the database using

/etc/init.d/postgresql start

I logged into the database and I was now able to see all the tables etc. I ran a few select statement and everything looked fine. I then logged out and

pg_dump links | gzip > links_dump.gz

This dumped the entire database out.

I then created another directory for a new database as follows.

initdb /var/lib/postgres/new_database

Remember to edit the new config files ie postgresql.conf. You might want to add a higher setting to checkpoint_segments, I used 15. I then ran

cat /var/lib/postgres/full_dump_links.gz | gunzip | psql links

This finished with the following errors.

ERROR: there is no unique constraint matching given keys for referenced table "home_page"
ERROR: insert or update on table "rev_index" violates foreign key constraint "lexicon_id_fk"
DETAIL: Key (lexicon_id)=(22342) is not present in table "lexicon".
ERROR: there is no unique constraint matching given keys for referenced table "home_page"

There are basically some constraints broken. I had some errors indicating that a few of the unique indexes and primary keys could not be created. This is not really concerning me too much since this is easily remedied. Everything after this point is simple administration tasks ie dropping and creating tables etc and fixing the broken constraints. I also had to check the integrity of the data which is simple enough.

I upgraded from kernel 2.6.5 to kernel 2.6.9 because I was getting DMA errors when ripping CD's to disc. I was also getting major errors with the SATA disk when copying the CD's to my mp3 player so I have bit the bullet and decided to try and upgrade.

During boot I came across what I thought was some king of bug. When I rebooted the kernel fsck complained about a bad file system, no indication as to which device just the error message.

I logged back in and my SATA disk was gone. It had not been mounted during boot which was a bit of a bummer. I had a look at the dmesg output and lo and behold it is now a SCSI device and my three old mount points are now invalid. I am using the VIA controller ie

CONFIG_SCSI_SATA_VIA=y

in my config. I find these changes very disconcerting but then I am not a kernel hacker. I wonder if there is an easy way to see changes like this without having to read through tons of Change logs. Its even worse when you are jumping several versions.

This error manifests itself when using grip to encode some CD's to ogg. When the CD is scratched it takes a long time to rip it to disk so I would normally set these aside and do them later. When I tried to abort the ripping grip fails with an application error and if I check the logs I see

kernel: cdrom: dropping to single frame dma

ripping after this point fails at around 90% of each track. I have hunted high and low for a solution and so far I have not found one. I am using a SCSI cd burner

'YAMAHA ' 'CRW2100S ' '1.0H' Removable CD-ROM

For the time being one solution that I seen mentioned was to reload the kernel module for the CD drive. This is not easy for me because mine was compiled into the kernel. I decided to download 2.6.8 and see if it works instead. I compiled my SCSI card driver as a module just in case anything went wrong.

Something weird did happen. When I booted into the new kernel I was unable to mount the H320 USB device which is a bit of a bummer. Worse than that was XFree86 started using between 60 and 90 percent CPU. Something is definitely not right with the the new kernel. I decided to recompile the kernel with the SCSI device built in in the off chance that it may have caused this but when I booted back in everything was fine for a few minutes and then X went mad again. I was still unable to see the USB device either. I was also unable to mount my SATA drive either which is where the CD collection is stored so I am switching back to the 2.6.5 kernel and will just reboot when it happens.

I just bought two of these and decided to get them working on Linux. This is a very rough guide on how to get it running, it is not a guide on how to compile a kernel. For Debian I wrote a page on Compiling a kernel for Debian that you could use as a guide but for other systems see the Kernel Rebuild Guide

First off these are USB Mass storage devices so you need to have USB enabled properly in your kernel. The appropriate options that I had to add to my kernel config file are as follows.

For those that don't know what the kernel config file is this is the file that is used to configure the kernel ;) When I recompiled my kernel I used

make menuconfig

this edits the config file before you compile and install the kernel. After running "make menuconfig" then go to drivers and at the bottom you should see USB device option select this and then select the devices you have on your machine.

To see what devices are on your machine you need to enable them in your BIOS and then you can use

lspci -v | grep HCI

to have a look at what USB controller your motherboard or PCI card is using. Mine was running a VIA controller.

To get the usb to appear when you plug it in you need to have the hotplug scripts installed. On Debian this is a simple

apt-get install hotplug

and thats sorted. I also added the following to my fstab file so that I can browse the devices.

/dev/sda1 /mnt/usb vfat defaults,auto,user,sync 0 0

That was it. I now have the iRiver H320 on my machine and it looks like a 20Gb hard drive. Now to get my CD Collection converted to OGG's.