pg_upgrade and shared libraries

I ran pg_upgrade on a dev cluster of mine and it halted on some shared libraries I had forgotten to install for the new cluster.

Most of them were plain contrib modules that I'll continue to use, so no problem there, but there was some testing code I had been playing around with. I didn't know in which of my test databases I had used it, so I had to find out somehow.

This is what I came up with: a query to list which shared libraries are used in a database. With that information I could drop the functions depending on the shared libraries I no longer used.
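A minimal sketch of the kind of query I mean (my exact query may have differed): C-language functions record their library path in pg_proc.probin, so it can be pulled out per database like this:

```sql
-- List the shared libraries referenced by C-language functions
-- in the current database.
SELECT DISTINCT p.probin AS shared_library
FROM pg_proc p
JOIN pg_language l ON l.oid = p.prolang
WHERE l.lanname = 'c'
ORDER BY 1;
```

Run it in each database in the cluster to see which ones depend on which libraries.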

EnterpriseDB + Red Hat = <3

Since Red Hat once managed to bring Linux into the Enterprise IT room, I'm hoping this will increase the acceptance of PostgreSQL in the Enterprise world. We all know how slowly the Enterprise wheels turn... :)

PostgreSQL 8.4 released

generate_series to cover for missing data in reports

Problem: generating a nice report grouped by date when some dates have no data. (I admit the title suggests something a little more exciting...)

Previously I have solved this kind of problem in the client code or by having some sort of calendar table to join with. This time I don't have any client code (yet) and no other tables to use. Luckily I'm using PostgreSQL, which has the generate_series function.

So basically all I have to do is a RIGHT OUTER JOIN with generate_series as dates:
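A minimal sketch of what that join could look like, assuming a hypothetical sales table with a sold_at timestamp and an amount column:

```sql
-- Report a daily total for every day in July, including days
-- that have no rows in the (hypothetical) sales table.
SELECT d.day::date AS day,
       coalesce(sum(s.amount), 0) AS total
FROM sales s
RIGHT OUTER JOIN generate_series('2009-07-01'::timestamp,
                                 '2009-07-31',
                                 '1 day') AS d(day)
  ON s.sold_at::date = d.day::date
GROUP BY 1
ORDER BY 1;
```

Days without sales show up with a total of 0 instead of being missing from the report.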

As stated earlier I have usually solved this problem in client code. I recall doing so both when using MSSQL and Oracle although I haven't worked with them for a long while now. I can't recall ever having done any similar reports while using MySQL.

I'm curious about the current state of Oracle, MSSQL and MySQL since I'm not actively using them. Is there something similar to generate_series? What about DB2?

Follow up: Vacuum Full progress

Ok, to follow up my recent post I created a small example of how the progress indication could be done. The example is very simple and only a proof of concept, nothing more. There is no support for multiple file nodes, only the relation and its indexes are considered, etc.

Estimate full vacuum time

Tonight during an upgrade of our systems I was doing a full vacuum on a couple of tables and I started to wonder how long it would take.

After popping the question on #postgresql@freenode, Steve Atkins (I think it was) suggested I could get a rough feeling of progress by looking at the file nodes.

So I did, and it gave some sense of what was happening, but it was kind of a hassle to do. (I had one psql open looking at the relfilenode and one xterm looking at the file node's modification time.)

This got me thinking that if I get a list of file nodes involved before I start the (full) vacuum I should be able to keep track of progress by using something like inotify. The idea is still a bit rough and I need to look into exactly what goes on during a full vacuum (like when are things written to disk) but I still think it's doable.
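A sketch of how those file nodes could be collected up front (the table name is hypothetical), so each file could then be watched with inotify:

```sql
-- Collect the file nodes for a table and its indexes
-- before starting VACUUM FULL.
SELECT c.relname, c.relfilenode
FROM pg_class c
WHERE c.oid = 'mytable'::regclass
   OR c.oid IN (SELECT indexrelid
                FROM pg_index
                WHERE indrelid = 'mytable'::regclass);
```

The relfilenode values correspond to file names under the database's directory in the cluster's data directory.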

Would this kind of approach be useful for anything or am I just rambling? (After all it is 4:20am here now...)

As you can see from the comparable runs, there is pretty much no difference here. I would guess the difference starts to show with big datasets.

Oh, and if you go with the pre-stored solution, don't forget to create a trigger on insert and update that keeps the tsvector up to date.
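A sketch of such a trigger, using the built-in tsvector_update_trigger and assuming a hypothetical docs table with a tsv column and a body text column:

```sql
-- Keep the pre-stored tsvector column in sync on insert and update.
CREATE TRIGGER docs_tsv_update
BEFORE INSERT OR UPDATE ON docs
FOR EACH ROW
EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.english', body);
```
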

I also tested a fourth approach: storing the tsvector in its own table and joining the original table when searching. That approach also needs the insert and update triggers, plus one for delete. I'm not sure this approach has anything significant to offer, but it was still fun to test.

Since we're calling memcached from within triggers, it is no good when the triggers start to break on account of exceptions being raised. To deal with this we could, in the case of *_replace, catch others (to catch all exceptions) or internal_error (to catch the specific exception thrown by pgmemcached), or change the trigger to use the *_set function instead, since it behaves the same in both cases.
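A sketch of the exception-handling variant in PL/pgSQL (the function, key, and column names are hypothetical):

```sql
-- Trigger function that tolerates a failing memcache_replace call
-- by trapping internal_error in a sub-block.
CREATE OR REPLACE FUNCTION cache_replace_trig() RETURNS trigger AS $$
BEGIN
  BEGIN
    PERFORM memcache_replace('user_' || NEW.id, NEW.name);
  EXCEPTION WHEN internal_error THEN
    -- The key did not exist in memcached; ignore and carry on.
    NULL;
  END;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```
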

Is it wise to use *_set in this case? Every block using exception handling creates a subtransaction, so it could have an impact on performance, especially when used inside triggers.

I have yet to run some performance testing to see what impact *_set vs. *_replace with exception handling will have on our database.

The lessons learned today were: BEGIN ... EXCEPTION, and using \set VERBOSITY 'verbose' to get the exact error code returned by a third-party library (in our case XX000, which is internal_error). I also picked up the others condition, which I was previously unaware of.