The Build » PostgreSQLhttp://thebuild.com/blog
programming, etc.Wed, 04 Feb 2015 05:53:00 +0000http://wordpress.org/?v=2.8.2enhourly1Logical Decoding and JSON Talks at FOSDEMhttp://thebuild.com/blog/2015/02/03/logical-decoding-and-json-talks-at-fosdem/
http://thebuild.com/blog/2015/02/03/logical-decoding-and-json-talks-at-fosdem/#commentsWed, 04 Feb 2015 05:53:00 +0000Xofhttp://thebuild.com/blog/?p=518The slides for my talks on logical decoding and the state of the art in JSON are now available on-line.
]]>http://thebuild.com/blog/2015/02/03/logical-decoding-and-json-talks-at-fosdem/feed/0When LIMIT attackshttp://thebuild.com/blog/2014/11/18/when-limit-attacks/
http://thebuild.com/blog/2014/11/18/when-limit-attacks/#commentsWed, 19 Nov 2014 06:36:54 +0000Xofhttp://thebuild.com/blog/?p=511One common source of query problems in PostgreSQL results an unexpectedly-bad query plan when a LIMIT clause is included in a query. The typical symptom is that PostgreSQL picks an index-based plan that actually takes much, much longer than if a different index, or no index at all, had been used.

Here’s an example. First, we create a simple table and an index on it:

PostgreSQL doesn’t keep correlated statistics about columns; each column’s statistics are kept independently. Thus, PostgreSQL made an assumption about the distribution of values of i in the table: they were scattered more or less evenly throughout. Thus, walking the index backwards meant that, to get 10 “hits,” it would have to scan about 100 index entries… and the index scan would be a big win.

It was wrong, however, because all of the i=1 values were clustered right at the beginning. If we reverse the order of the scan, we can see that was a much more efficient plan:

A CTE is an “optimization fence”: The planner is prohibited from pushing the ORDER BY or LIMIT down into the CTE. In this case, that means that it is also prohibited from picking the index scan, and we’re back to the sequential scan.

So, when you see a query come completely apart, and it has a LIMIT clause, check to see if PostgreSQL is guessing wrong about the distribution of data. If the total number of hits before the LIMIT are relatively small, you can often use a CTE to isolate that part, and only apply the LIMIT thereafter. (Of course, you might be better off just doing the LIMIT operation in your application!)

]]>http://thebuild.com/blog/2014/11/18/when-limit-attacks/feed/6“Be Very Afraid: Backup and Disaster Planning” at PGConf.EUhttp://thebuild.com/blog/2014/10/24/be-very-afraid-backup-and-disaster-planning-at-pgconf-eu/
http://thebuild.com/blog/2014/10/24/be-very-afraid-backup-and-disaster-planning-at-pgconf-eu/#commentsFri, 24 Oct 2014 08:11:41 +0000Xofhttp://thebuild.com/blog/?p=509Slides from my talk, Be Very Afraid: Backup and Disaster Planning, are now available.
]]>http://thebuild.com/blog/2014/10/24/be-very-afraid-backup-and-disaster-planning-at-pgconf-eu/feed/5“Finding and Repairing Database Corruption” at PGConf.EUhttp://thebuild.com/blog/2014/10/23/finding-and-repairing-database-corruption-at-pgconf-eu/
http://thebuild.com/blog/2014/10/23/finding-and-repairing-database-corruption-at-pgconf-eu/#commentsThu, 23 Oct 2014 12:37:25 +0000Xofhttp://thebuild.com/blog/?p=506Slides from my talk, Finding and Repairing Database Corruption, are now available.
]]>http://thebuild.com/blog/2014/10/23/finding-and-repairing-database-corruption-at-pgconf-eu/feed/0Djangocon 2014 US CFP Closes Soonhttp://thebuild.com/blog/2014/07/01/djangocon-2014-us-cfp-closes-soon/
http://thebuild.com/blog/2014/07/01/djangocon-2014-us-cfp-closes-soon/#commentsWed, 02 Jul 2014 02:48:16 +0000Xofhttp://thebuild.com/blog/?p=492Just a reminder that the Djangocon US 2014 Call for Proposals ends July 15, 2014… we would love your talks and tutorials!
]]>http://thebuild.com/blog/2014/07/01/djangocon-2014-us-cfp-closes-soon/feed/0The Djangocon US 2014 Call for Proposals is open!http://thebuild.com/blog/2014/06/11/the-djangocon-us-2014-call-for-proposals-is-open/
http://thebuild.com/blog/2014/06/11/the-djangocon-us-2014-call-for-proposals-is-open/#commentsThu, 12 Jun 2014 01:00:31 +0000Xofhttp://thebuild.com/blog/?p=490As program chair this year, I am very pleased to announce that the Call for Proposals for Djangocon US 2014 is now open! It only runs through the end of June, so be sure to get them in promptly!
]]>http://thebuild.com/blog/2014/06/11/the-djangocon-us-2014-call-for-proposals-is-open/feed/0“The Worst Day of Your Life” at Nordic PostgreSQL Day 2014http://thebuild.com/blog/2014/03/11/the-worst-day-of-your-life-at-nordic-postgresql-day-2014/
http://thebuild.com/blog/2014/03/11/the-worst-day-of-your-life-at-nordic-postgresql-day-2014/#commentsTue, 11 Mar 2014 18:01:04 +0000Xofhttp://thebuild.com/blog/?p=488I’ll be presenting “The Worst Day of Your Life” (dealing with major PostgreSQL disasters) at Nordic PostgreSQL Day 2014. My first trip to Scandinavia!
]]>http://thebuild.com/blog/2014/03/11/the-worst-day-of-your-life-at-nordic-postgresql-day-2014/feed/0“Really, Really Fast Django” at Djangocon EUhttp://thebuild.com/blog/2014/02/27/really-really-fast-django-at-djangocon-eu/
http://thebuild.com/blog/2014/02/27/really-really-fast-django-at-djangocon-eu/#commentsThu, 27 Feb 2014 19:35:44 +0000Xofhttp://thebuild.com/blog/?p=485I’ll be presenting “Really, Really Fast Django” at Djangocon EU. Do come! It’s on a private island!

]]>http://thebuild.com/blog/2014/02/27/really-really-fast-django-at-djangocon-eu/feed/1“The Worst Day of Your Life” at FOSDEM PgDay 2014http://thebuild.com/blog/2014/02/01/the-worst-day-of-your-life-at-fosdem-pgday-2014/
http://thebuild.com/blog/2014/02/01/the-worst-day-of-your-life-at-fosdem-pgday-2014/#commentsSat, 01 Feb 2014 17:41:29 +0000Xofhttp://thebuild.com/blog/?p=483FOSDEM PgDay 2014 was not, in fact, the worst day of my life; it was a great deal of fun. However, the slides from my presentation, The Worst Day of Your Life, dealing with PostgreSQL disasters, are now available.
]]>http://thebuild.com/blog/2014/02/01/the-worst-day-of-your-life-at-fosdem-pgday-2014/feed/0“JSON in PostgreSQL” at SCALE 12xhttp://thebuild.com/blog/2014/01/28/json-in-postgresql-at-scale-12x/
http://thebuild.com/blog/2014/01/28/json-in-postgresql-at-scale-12x/#commentsTue, 28 Jan 2014 19:10:56 +0000Xofhttp://thebuild.com/blog/?p=480I’ll be presenting “JSON in PostgreSQL” at SCALE 12x in Los Angeles on Friday, February 21, 2014. Do come!
]]>http://thebuild.com/blog/2014/01/28/json-in-postgresql-at-scale-12x/feed/0“PostgreSQL Proficiency for Python People” at PyCon 2014http://thebuild.com/blog/2014/01/26/postgresql-proficiency-for-python-people-at-pycon-2014/
http://thebuild.com/blog/2014/01/26/postgresql-proficiency-for-python-people-at-pycon-2014/#commentsMon, 27 Jan 2014 01:09:36 +0000Xofhttp://thebuild.com/blog/?p=478I’ll be giving a tutorial at PyCon 2014 in Montréal. Register early and often!
]]>http://thebuild.com/blog/2014/01/26/postgresql-proficiency-for-python-people-at-pycon-2014/feed/3django-pglocks 1.0.2 releasedhttp://thebuild.com/blog/2014/01/14/django-pglocks-1-0-2-released/
http://thebuild.com/blog/2014/01/14/django-pglocks-1-0-2-released/#commentsTue, 14 Jan 2014 18:25:41 +0000Xofhttp://thebuild.com/blog/?p=476django-pglocks 1.0.2 has been pushed out. This fixes a bug that Aymeric Augustin found that keep a lock from being properly released. He also contributed a test framework to it… many thanks to him for that!
]]>http://thebuild.com/blog/2014/01/14/django-pglocks-1-0-2-released/feed/0Speaking at FOSDEM 2014http://thebuild.com/blog/2014/01/07/speaking-at-fosdem-2014/
http://thebuild.com/blog/2014/01/07/speaking-at-fosdem-2014/#commentsTue, 07 Jan 2014 21:51:05 +0000Xofhttp://thebuild.com/blog/?p=473I’ll be speaking at FOSDEM 2014 in Brussels! I’ll be giving two talks:

“The Worst Day of Your Life” (on finding and recovering from PostgreSQL data corruption and bugs) on January 31st at the Raddison Blu hotel.

“Real-Life PostgreSQL JSON” on February 1 in the PostgreSQL devroom at the main FOSDEM conference.

]]>http://thebuild.com/blog/2014/01/07/speaking-at-fosdem-2014/feed/0PostgreSQL Replication Bughttp://thebuild.com/blog/2013/12/01/postgresql-replication-bug/
http://thebuild.com/blog/2013/12/01/postgresql-replication-bug/#commentsSun, 01 Dec 2013 08:22:56 +0000Xofhttp://thebuild.com/blog/?p=470There’s a very unpleasant replication issue in version 9.0.14, 9.1.10, 9.2.5, 9.3.0 and 9.3.1 of PostgreSQL. Be sure to read the linked wiki entry carefully; it can result in silent data corruption on secondary servers in replication sets.
]]>http://thebuild.com/blog/2013/12/01/postgresql-replication-bug/feed/0JSON vs hstore: Which will get you into a cool bar in the Mission?http://thebuild.com/blog/2013/07/03/json-vs-hstore-which-will-get-you-into-a-cool-bar-in-the-mission/
http://thebuild.com/blog/2013/07/03/json-vs-hstore-which-will-get-you-into-a-cool-bar-in-the-mission/#commentsWed, 03 Jul 2013 15:18:11 +0000Xofhttp://thebuild.com/blog/?p=464After all, isn’t that the most important issue?

Craig Kerstiens has a good overview of what the two types are. I wanted to give a quick and dirty flowchart as to which one I would use in a green-field development situation.

First, hstore performs better than JSON in nearly every situation, but performance is (as I wrote on Twitter) just one color in a rainbow of annoying decision points you have to make when building a system.

So, what would I do?

If I am accepting outside data as JSON and want to store it in its hierarchic form, it would require a very clear performance case for me to want to take the time to convert it to hstore.

If I am generating the data internally (rather than accepting it in its near-complete form), and the hierarchy is not deep, I’d use hstore for its faster performance.

If retaining the types of the values is essential, JSON is the choice, since hstore stores everything as a string.

If fully-arbitrary querying on key/value pairs is essential, hstore’s current indexing situation is far superior to JSON’s.

If you only query on a very small number (1-3) of keys at the top level, it’s probably better to build b-tree indexes over that key’s value than use GiST or GIN indexes; hstore does not have a significant advantage over JSON then.

As JSON support in PostgreSQL matures, this calculus will change, perhaps significantly. But that’s the 9.2 situation, in my view.

]]>http://thebuild.com/blog/2013/07/03/json-vs-hstore-which-will-get-you-into-a-cool-bar-in-the-mission/feed/8PostgreSQL Advisory Locks in Django Made Easyhttp://thebuild.com/blog/2013/07/02/postgresql-advisory-locks-in-django-made-easy/
http://thebuild.com/blog/2013/07/02/postgresql-advisory-locks-in-django-made-easy/#commentsTue, 02 Jul 2013 13:27:26 +0000Xofhttp://thebuild.com/blog/?p=460Advisory locks are a very useful feature in PostgreSQL, and they just aren’t used enough.

Here’s a scenario: You have a bulk import job. While that job is running, there’s an analysis job you want to prevent from starting, and you don’t want a bulk import to start while the analysis job is running. But any number of bulk importers can run at the same time. How do you communicate this?

With an advisory lock!

Each of the bulk importers can take a shared advisory lock. Those locks don’t block each other, so they can run freely. But the analysis job takes an exclusive advisory lock of the same type. It will wait until all the importers are done, and the importers will not be issued their shared locks until it completes.

Even better, PostgreSQL cleans them up for you when a session terminates; you don’t have to worry about a lock lingering when you didn’t mean it to.

To make using advisory locks easier in Django, I have a small context manager than can be used to wrap code that should run with an advisory lock held; it makes it much easier to use them in your application. You can find it on GitHub, and it installs using pip.

]]>http://thebuild.com/blog/2013/07/02/postgresql-advisory-locks-in-django-made-easy/feed/4PostgreSQL as a Schemaless Databasehttp://thebuild.com/blog/2013/07/02/postgresql-as-a-nosql-database/
http://thebuild.com/blog/2013/07/02/postgresql-as-a-nosql-database/#commentsTue, 02 Jul 2013 11:51:55 +0000Xofhttp://thebuild.com/blog/?p=457My presentation from FOSDEM 2013, PostgreSQL as a Schemaless Database, is now posted (sorry for the delay!).
]]>http://thebuild.com/blog/2013/07/02/postgresql-as-a-nosql-database/feed/0‘Rollback’ exception added to Xacthttp://thebuild.com/blog/2013/06/23/rollback-exception-added-to-xact/
http://thebuild.com/blog/2013/06/23/rollback-exception-added-to-xact/#commentsSun, 23 Jun 2013 18:32:33 +0000Xofhttp://thebuild.com/blog/?p=454I often find that I’m in the middle of a loop or something and discover an error. I want to exit the loop in a way that causes the database work I’ve done within it to be rolled back, but I don’t want that exception to propagate further.

Having noticed this pattern a lot, I’ve added it as a feature to Xact. Xact defines a Rollback exception. It processes it like any other exception, but then swallows it and normally exits the function or with clause. If you feel motivated, you can subclass Rollback, although the utility of that escapes me at the moment.

When Django 1.6 is released, Xact will be deprecated in favor of new functionality there… but for now, have fun with it!

]]>http://thebuild.com/blog/2013/06/23/rollback-exception-added-to-xact/feed/1DjangoCon US 2013 CFP is openhttp://thebuild.com/blog/2013/05/15/djangocon-us-2013-cfp-is-open/
http://thebuild.com/blog/2013/05/15/djangocon-us-2013-cfp-is-open/#commentsThu, 16 May 2013 06:59:31 +0000Xofhttp://thebuild.com/blog/?p=452The Call for Papers for DjangoCon US 2013 is now open.
]]>http://thebuild.com/blog/2013/05/15/djangocon-us-2013-cfp-is-open/feed/0psycopg2 2.5 releasedhttp://thebuild.com/blog/2013/04/07/psycopg2-2-5-released/
http://thebuild.com/blog/2013/04/07/psycopg2-2-5-released/#commentsSun, 07 Apr 2013 20:17:41 +0000Xofhttp://thebuild.com/blog/?p=450psycopg2, the Python PostgreSQL interface library, is now up to version 2.5. This includes built-in support for the JSON and range types… yay!
]]>http://thebuild.com/blog/2013/04/07/psycopg2-2-5-released/feed/0The Text Type… If That’s Your Real Name.http://thebuild.com/blog/2013/03/13/the-text-type-if-thats-your-real-name/
http://thebuild.com/blog/2013/03/13/the-text-type-if-thats-your-real-name/#commentsThu, 14 Mar 2013 04:14:02 +0000Xofhttp://thebuild.com/blog/?p=442tl;dr: Don’t give tables the same name as base PostgreSQL types, even though it will let you.

While I was presenting the talk, a client sent me email wondering why a pg_restore of an expression index was failing, because the return type text was not the same as pg_catalog.text. OK, that’s strange!

What had happened is that the database has a table with the name text, which PostgreSQL will happily let you do:

Needless to say, this isn’t a great idea, because although PostgreSQL seems to keep them straight most of the time, there are times (like a pg_restore processing an expression index) that it can get confused.

So, don’t do this.

]]>http://thebuild.com/blog/2013/03/13/the-text-type-if-thats-your-real-name/feed/5PostgreSQL, Python and Squid at PyPgDay 2013http://thebuild.com/blog/2013/03/13/postgresql-python-and-squid-at-pypgday-2013/
http://thebuild.com/blog/2013/03/13/postgresql-python-and-squid-at-pypgday-2013/#commentsWed, 13 Mar 2013 23:57:31 +0000Xofhttp://thebuild.com/blog/?p=437The slides for my presentation, “PostgreSQL, Python and Squid” (otherwise known as, “using Python in PostgreSQL and PostgreSQL from Python”) presented at PyPgDay 2013 at PyCon 2013, are available for download.
]]>http://thebuild.com/blog/2013/03/13/postgresql-python-and-squid-at-pypgday-2013/feed/0You Cannot Recover From the Loss of a Tablespacehttp://thebuild.com/blog/2013/03/10/you-cannot-recover-from-the-loss-of-a-tablespace/
http://thebuild.com/blog/2013/03/10/you-cannot-recover-from-the-loss-of-a-tablespace/#commentsSun, 10 Mar 2013 23:17:00 +0000Xofhttp://thebuild.com/blog/?p=429tl;dr: Each and every tablespace is critical to the operation of your PostgreSQL database. If you lose one, you’ve lost the entire database.

This one can be short and sweet: If you use tablespaces in PostgreSQL, each and every one of them is a critical part of your database. If you lose one, your database is corrupted, probably irretrievably. Never, ever think that if you lose a tablespace, you’ve just lost the data in the tables and indexes on the tablespace; you’ve lost the whole database.

In a couple of cases, clients have had what they thought were clever uses of tablespaces. In each case, they could have lead to disaster:

“We’ll keep cached and recent data on a tablespace on AWS instance storage on SSDs, and the main database on EBS. If the instance storage goes away, we’ll just recreate that data from the main database.”

“We’ll keep old historical data on a SAN, and more recent data on local storage.”

In each case, there was the assumption that if the tablespace was lost, the rest of the database would be intact. This assumption is false.

Each and every tablespace is a part of your database. They are the limbs of your database. PostgreSQL is an elephant, not a lizard; it won’t regrow a limb it has to leave behind. Don’t treat any tablespace as disposable!

]]>http://thebuild.com/blog/2013/03/10/you-cannot-recover-from-the-loss-of-a-tablespace/feed/9Why you want WAL archiving as well as streaming replicationhttp://thebuild.com/blog/2012/10/04/why-you-want-wal-archiving-as-well-as-streaming-replication/
http://thebuild.com/blog/2012/10/04/why-you-want-wal-archiving-as-well-as-streaming-replication/#commentsThu, 04 Oct 2012 20:04:25 +0000Xofhttp://thebuild.com/blog/?p=427A client of ours recently had me log into their server to set up a tablespace scheme for them. While I was in, I noticed that the secondary of the streaming replication pair wasn’t connecting to the primary. A quick check showed that the primary had been moved from one internal IP address to another, and in doing so everything had been updated except the pg_hba.conf file… so the secondary wasn’t able to connect.

This had happened several weeks prior.

The good news is that in addition to streaming replication, we had set up WAL archiving from the primary to the secondary, so the secondary was staying up to date using the WAL segments. We didn’t have to reimage the secondary; just fixing the pghba.conf and reloading the primary fixed the problem. Thanks to pgarchivecleanup, neither side was building up WAL segments.

There are several good reasons for including WAL archiving in your streaming replication setup. This kind of accidental problem is one of them.

]]>http://thebuild.com/blog/2012/10/04/why-you-want-wal-archiving-as-well-as-streaming-replication/feed/1“PostgreSQL When It’s Not Your Job” at DjangoCon UShttp://thebuild.com/blog/2012/09/04/postgresql-when-its-not-your-job-at-djangocon-us/
http://thebuild.com/blog/2012/09/04/postgresql-when-its-not-your-job-at-djangocon-us/#commentsTue, 04 Sep 2012 20:15:42 +0000Xofhttp://thebuild.com/blog/?p=423I presented “PostgreSQL When It’s Not Your Job” at DjangoCon US 2012, and you can get the slides here.
]]>http://thebuild.com/blog/2012/09/04/postgresql-when-its-not-your-job-at-djangocon-us/feed/7