PostgreSQL saved my life (once more...)

Imagine you have a database, there you have a table and inside that table you have a date field. You have populated the database and now you have more than 1000 records inside that table, something like:

Now imagine that, for some reasons, that fecha_nacimiento field has to change into a varchar field and that all those date entries should appear with only their year value (without month and day).

Probably the first thing you thought when reading the text above was:

"hey, that's pretty easy, just write a simple python/perl/php/whatever script that connects to the database, then get the records from the table, then parse them, the put them back into the table"

Well, you were right, and it would take a small ammount of time to write that script (taking care that you will need the language, the posgtresql connector for that given language and the knowledge to write it so quickly), but there is an even quicker way to do that, using PostgreSQLfunctions directly!

There, we are telling PostgreSQL that it has to update the beyle_autor table, setting fecha_nacimiento as the YEAR part from a datetime object. We need to use to_date because the value we get from the database is already a varchar (aka character varying) and we need to convert it to a date/datetime value before passing it to EXTRACT.

Using that single query, all the records inside the table that match the given query (in that example I needed to keep complete dates that do not begin with 01/01, first of january), will be truncated and saved as only the year value. Easy, isn't it?

PostgreSQL database migration, the SQL_ASCII to UTF8 problem

Another reminder to myself, as I've faced this situation too many times (not too much lately, as I've switched definetely to UTF-8 in my database servers but I still have some old ones that fit perfectly for this post).

In order to migrate a database from one PostgreSQL server to another one (even if there is a version mismatch, for example, from the 7.x branch to the 8.x branch or between 8.1 and 8.3) all you need is pg_dump. With pg_dump you will be able to create a SQL dump of a whole database, which could be later imported into another database.

For example:

pg_dump openbsd_es > openbsd_es.sql

This way I've created a dump of a database called openbsd_es (used in the past for the defunct effort to keep the work on the .es translation of OpenBSD documentation). The dump was called openbsd_es.sql and I could import it in another PostgreSQL server, in a database with whatever name it has.

pg_dump has a number of options that will help you creating your database dump. For example, you can create a dump that will create full-insert sql statements using pg_dump this way:

pg_dump -D openbsd_es > openbsd_es.sql

(pretty useful if you need to load the sql file into MySQL, SQLite or any other database system)

Then, you only have to use psql to load the sql file back into another database:

psql openbsd_es_clon < openbsd_es.sql

At this point it is posible that you've see an error on screen, something like:

ERROR: invalid byte sequence for encoding "UTF8": 0xf36e2020
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

AHA!, the fscked UTF-8 error!!!

This is caused by a mismatch in your database encodings, surely because the database from where you got the SQL dump was encoded as SQL_ASCII while the new one is encoded as UTF8.

You can get some information using psql with the -l option. In the old server:

psql -l
[ stripped ]
openbsd_es | wu | SQL_ASCII
[ stripped ]

In the new one:

psql -l
[ stripped ]
openbsd_es_clon | wu | UTF8
[ stripped ]

Of course you can have databases with different encodings in the same PostgreSQL server, but it is not recommended (if your server data directory was created with UTF8 as its default encoding, you should stick to that for all of your databases).

So, get back to our sql file (openbsd_es.sql). We've created it from a SQL_ASCII encoded database, which results in something like:

As our new database is UTF8 encoded, it would be nice of we could change quickly the encoding of the sql file, just to import it without all those errors.

Here is where recode fits in this post. Recode is a small tool from the GNU project that let you change on-the-fly the encoding of a given file. For example, to change our sql file into a suitable UTF-8 encoding, we could call recode just like this:

Just one more thing to do before importing the sql file into the new database, open the new UTF-8 encoded sql file with your favourite editor and change the line:

SET client_encoding = 'SQL_ASCII';

to:

SET client_encoding = 'UTF8';

as this setting will set the encoding for the psql command before executing the SQL statements within the sql file.

Now, you can import the sql file as usual with (hopefully) no errors this time:

psql openbsd_es_clon < openbsd_es_utf8.sql

(interesting: you can install recode from ports in both FreeBSD and OpenBSD, /usr/ports/converters/recode in both BSD flavours. In linux it seems that it will be installed by default in some distributions, install it from packages if not.)

Accesing PostgreSQL objects information using SQL

Yesterday, one of my colleagues at work asked me an interesting question:

hey!, how could I know the number of fields a view has, in PostgreSQL?

The answer was pretty easy:

sure, just go into psql and write \d view_name.

That \d thing is a shortcut you can use to get information from any object located inside the db, while in psql (for those of you more familiar with MySQL, it is like the describe table_name command). Let me show you an example of \d table_name:

But I was assuming that he needed to know such information by himself, which was wrong. He needed to get the number of fields that view has, within some python code (aka not using psql).

We searched a little bit and finally we found how to do it.

When you go into psql, there is an option -E, which means (from the psql man page):

-E
--echo-hidden
Echo the actual queries generated by \d and other backslash com-
mands. You can use this to study psql's internal operations.
This is equivalent to setting the variable ECHO_HIDDEN from
within psql.

So, calling psql with the -E option will means we will be able to see the SQL sentence PostgreSQL is using behind the scenes to get such information, in the previous example something like:

Which is pretty much like the needs from my colleague. He could parse the results from such query to gather the information he needed or he could work a little bit more to get a more-refined sql sentence.

NOTE: My colleague needed a view information, I'm using a table as the example for this post, but the procedure would be pretty similar for a view, a sequence or any other type of object available in our db.