… getting the wrong answer fast is not an improvement over the right answer slow

Category Archives: Databases

Last week whilst trying to get to grips with SQL Server AlwaysOn Failover Clusters, I set up a simple iSCSI target using the “iscsitarget” package as per the Debian docs. However when trying to validate the cluster in WSFC (Windows Server Failover Clustering) the disk checks failed with:

“does not have the inquiry data (SCSI page 83h VPD descriptor) that is required by failover clustering”

This has something to do with the scsiId, which is required by the cluster manager to control volume ownership, being supplied by iscsitarget in a format unsupported by WSFC.

I failed to find a workaround for this and instead switched to using “tgt” to serve the iSCSI targets. I was pushed for time, and couldn’t find a straightforward guide so I’m documenting my steps here.

This is not an issue if you want to connect to a utf8 database, but the issue I had this morning was connecting to a latin1 database with psql from a Windows client (something I do rarely). If I set the codepage to utf8 to match client encoding, I got a “Not enough memory.” error:

I could set the codepage to 1252, but that would mean my setting for client_encoding would be a lie, and if I were to then revert to set client_encoding=’WIN1252′ I’d have come full circle and be back at the “FATAL: conversion between WIN1252 and LATIN1 is not supported” error message.

A quick google revealed these bugreports with no solutions. Another dig at the docs revealed the following passage:

pager

Controls use of a pager program for query and psql help output. If the environment variable PAGER is set, the output is piped to the specified program. Otherwise a platform-dependent default (such as more) is used.

It’s generally a bad idea to do logical dump/restores of slony nodes, and for this reason slony provides the CLONE PREPARE/CLONE FINISH action commands to clone subscriber nodes.

In this instance though, I’ve a test environment where I’d stopped the slons, dumped out and dropped a subscriber database and then gone on to do some other testing with that postgres cluster. Sometime later I want to do some more testing with the slony cluster; I never dropped this node from the slony config, but in the meantime I’ve changed the postgres version from 9.0 to 9.4 and recreated the postgres cluster with initdb. Schema wise nothing has changed with the old nodes.

What follows is some fudge to make the node resume with replication, it’s neither recommended nor guaranteed to work. Copy this process at your peril.

After recompiling the same version of slony I had before (2.2.4) and installing the binaries, I’ve just restored the old subscriber database dump into the new postgres 9.4 cluster and updated my extensions.

So what’s holding me off just restarting the slons now? Firstly slony stores the oid of each table participating in replication and these will surely have changed now. The following query produces a long list of tables where this is true:

For clarity, my resurrected node is ID 1, and was a subscriber at the time it was destroyed.

Secondly slony stores the transaction ID for each replication event against the node, and if my resurrected node has a lower current xid then new events are going to look like they’re in the past compared to events that have already been generated against the node. I can see the current xid of my new postgresql cluster with the following query:

TEST=# SELECT txid_current();
txid_current
--------------
25765

And I can compare this to slony events generated against the node in its old incarnation as follows:

So transaction ID wise my node is in the past, to bump that up I need about 63237 transactions to happen, if that were a lot higher I’d have to think of another way (perhaps by stopping all the slons and updating all values of ev_snapshot for node 1 on all nodes), but I can easily generate 60k transactions by hitting txid_current():