PostgreSQL has three shutdown modes: smart, fast, and
immediate. For many years, the default has been "smart", but
Bruce Momjian has just
committed a patch to change the default to "fast" for
PostgreSQL 9.5. In my opinion, this is a good thing; I have
complained about the current default, and agreed with others
complaining about it, many times, at least as far back as
December of 2010. Fortunately, we now seem to have
achieved consensus on this change.

I've blogged before about how the buildfarm client software can be
useful for developers and reviewers. Yesterday was a perfect
example. I was testing a set of patches for a bug fix for
pg_upgrade running on Windows, and they go all the way back to the
9.0 release. The simplest way to test these was using a buildfarm
animal. On jacana, I applied the relevant patch in each branch
repo, and then simply did this to build and test them all:

Pretty simple! The commands are shown here on multiple lines for
clarity, but in fact I wrote each set on one line, so after
applying the patches the whole thing took 2 lines. (Because jacana
only builds back to release 9.2, I repeated the process on
frogmouth for 9.0 and 9.1.)

I have a data analytics project which produces multiple statistical
metrics for a large volume of sensor data. This includes
percentiles (like the median and the 90th percentile) as well as average, min and
max. Originally this worked using PL/R, which was pretty good
except that some of the R modules were crashy, which was not so
great for uptime.

This is why, two years ago, I ripped out all of the PL/R and
replaced it with PL/Python and SciPy. I love SciPy because it
gives me everything I liked about R, without most of the things I
didn't like. But now, I've ripped out the SciPy as
well. What am I replacing it with? Well, SQL.

In version 9.4, Andrew Gierth added support for percentiles to
PostgreSQL via
WITHIN GROUP aggregates. As far as I'm concerned, this is
second only to JSONB in reasons to use 9.4.
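
For example, a median and a 90th percentile can now be computed directly in SQL. A quick illustration against a hypothetical readings table (the table and column names here are invented):

SELECT device_id,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY reading) AS median,
       percentile_cont(0.9) WITHIN GROUP (ORDER BY reading) AS p90,
       min(reading) AS min, max(reading) AS max, avg(reading) AS avg
FROM readings
GROUP BY device_id;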

Now, one of the more complicated uses I make of aggregates is doing
"circular" aggregates, that is producing percentiles for a set of
circular directions in an effort to determine the most common
facings for certain devices. Here's the
PL/Python function I wrote for this, which calculates circular
aggregates using the "largest gap" method. This algorithm
assumes that the heading measurements are essentially unordered, so
to find the endpoints of the arc we look for two measurements which
are the furthest apart on the circle. This means shifting the
measurements to an imaginary coordinate system where the edge of
this gap is the low measurement, calculating percentiles, and then
shifting it back. Note that this method produces garbage if
the device turned around a complete circle during the aggregate
period.

Now, that SciPy function was pretty good and we used it for quite a
while. But we were unhappy with two things: first, SciPy is
rather painful as a dependency because the packaging for it is
terrible; second, having PostgreSQL call out to SciPy for each
iteration isn't all that efficient.

So, since 9.4 has percentiles now, I started writing a function
based on the built-in SQL percentiles. Initially I was thinking
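
As a rough sketch of the idea (not the author's actual function), a circular median using the built-in percentiles and the largest-gap method might look something like this, assuming a hypothetical headings(device_id, heading) table with headings in degrees from 0 to 360:

WITH pts AS (
    SELECT heading::float8 AS heading
    FROM headings
    WHERE device_id = 42
),
gaps AS (
    -- gap between each heading and the previous one; the first row gets
    -- the wrap-around gap from the largest heading back through zero
    SELECT heading,
           COALESCE(heading - lag(heading) OVER (ORDER BY heading),
                    heading + 360 - (SELECT max(heading) FROM pts)) AS gap
    FROM pts
),
edge AS (
    -- the heading just past the largest gap is the low end of the arc
    SELECT heading AS low_edge
    FROM gaps
    ORDER BY gap DESC
    LIMIT 1
)
SELECT CASE WHEN m >= 360 THEN m - 360 ELSE m END AS circular_median
FROM (
    -- shift onto a coordinate system starting at the gap edge,
    -- take the percentile, then shift back onto the circle
    SELECT e.low_edge + percentile_cont(0.5) WITHIN GROUP (
               ORDER BY CASE WHEN p.heading >= e.low_edge
                             THEN p.heading - e.low_edge
                             ELSE p.heading - e.low_edge + 360 END) AS m
    FROM pts p CROSS JOIN edge e
    GROUP BY e.low_edge
) shifted;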

High availability of PostgreSQL databases is incredibly
important to me. You might even say it’s a
special interest of mine. It’s one reason I’m both excited and
saddened by a feature introduced in 9.4. I’m excited because it’s a
feature I plan to make extensive use of, and saddened because it
has flown under the radar thus far. It’s not even listed in the
What’s
new in PostgreSQL 9.4 Wiki page. If they’ll let me, I may have
to rectify that.

What is this mysterious change that has me drooling all over my
keyboard? The new
recovery_min_apply_delay standby server setting. In name and
intent, it forces a standby server to delay application of upstream
changes. The implications, however, are much, much more
important.
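
For reference, on a 9.4 standby this setting lives in recovery.conf next to the usual streaming-replication parameters. A minimal illustration, with made-up host and delay values:

standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
recovery_min_apply_delay = '1h'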

Let me tell you a story; it’s not long, actually. A couple years
ago, I had to help a client that was using a hilariously
over-engineered stack to prevent data loss. I only say that because
at first glance, the number of layers and duplicate servers would
shock most people, and the expense would finish the job. This was
one of my full recommended stacks, plus a few extra bits for the
truly paranoid. DRBD-bonded servers, Pacemaker failover, off-site
disaster recovery streaming clones, nightly backup, off-site backup
and historical WAL storage, and long-term tape archival in a vault
for up to seven years. You would need to firebomb several cities to
get rid of this data.

But data permanence and availability are not synonymous. All it
took was a single misbehaving CPU to take out the entire
constellation of database servers, and corrupt a bunch of recent
WAL files for good measure. How this is possible, and how difficult
it is to avoid, is a natural extension of using live streaming
replicas for availability purposes. We always need to consider one
important factor: immediacy applies to everything.

This week, the following commit landed in the PostgreSQL code
tree, introducing a new feature that will be released in 9.5:

commit: cb1ca4d800621dcae67ca6c799006de99fa4f0a5
author: Tom Lane <tgl@sss.pgh.pa.us>
date: Sun, 22 Mar 2015 13:53:11 -0400
Allow foreign tables to participate in inheritance.
Foreign tables can now be inheritance children, or parents. Much of the
system was already ready for this, but we had to fix a few things of
course, mostly in the area of planner and executor handling of row locks.
[...]
Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro
Horiguchi, some additional hacking by me

As mentioned in the commit message,
foreign tables can now be part of an inheritance tree, be it as
a parent or as a child.

Well, seeing this commit, one term immediately comes to mind:
in-core sharding. This feature opens up such possibilities as, for
example, a parent table locally managing a set of foreign child
tables located on a group of remote servers.

PostgreSQL already offers a way to do
partitioning by using CHECK constraints (a non-intuitive system,
but there may be improvements in this area in the near
future). Combined with the feature just
committed, here is a small example of how to do sharding without
the need for any external plugin or tool; only postgres_fdw
is needed to define the foreign tables.

Now let's take the example of 3 Postgres servers, running on the
same machine for simplicity, using ports 5432, 5433 and 5434. 5432
will hold a parent table that has two child tables, both of them
foreign tables located on the servers listening on 5433 and 5434. The
test case is simple: a log table partitioned by year.

First on the foreign servers, let's create the child tables.
Here it is for the table on server 5433:
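
A sketch of what this can look like (hypothetical names, not the original post's exact commands):

-- On the server listening on port 5433, the plain table that holds 2014 data:
CREATE TABLE log_entries_2014 (
    ts      timestamptz NOT NULL,
    content text,
    CHECK (ts >= '2014-01-01' AND ts < '2015-01-01')
);

-- Then on the parent server (port 5432):
CREATE EXTENSION postgres_fdw;

CREATE SERVER shard_5433 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5433', dbname 'postgres');
-- user mapping options depend on your authentication setup
CREATE USER MAPPING FOR CURRENT_USER SERVER shard_5433;

CREATE TABLE log_entries (
    ts      timestamptz NOT NULL,
    content text
);

CREATE FOREIGN TABLE log_entries_2014 (
    ts      timestamptz NOT NULL,
    content text,
    CHECK (ts >= '2014-01-01' AND ts < '2015-01-01')
) INHERITS (log_entries)
  SERVER shard_5433
  OPTIONS (table_name 'log_entries_2014');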

On November 18th, 2015, we will have an independent, multi-track
conference all about high performance PostgreSQL: pgConf SV. This conference is being
organized by CitusData at
the South San Francisco Convention Center. Stay tuned for the call for
presentations, sponsorships, and more details soon.

In PostgreSQL 9.5, we can see improved
performance for an Index Scan on a ">" condition.

To explain this optimization, consider the following
schema:

create table tbl2(id1 int, id2 varchar(10), id3 int);

create index idx2 on tbl2(id2, id3);

And the following query:

select count(*) from tbl2 where id2>'a' and id3>990000;
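
To see it in action, the table can be populated and the plan inspected (the data volume and values here are purely illustrative):

insert into tbl2 select g, 'a' || g, g from generate_series(1, 1000000) g;
analyze tbl2;

explain (analyze, buffers)
select count(*) from tbl2 where id2 > 'a' and id3 > 990000;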

Prior to this patch, the above
query used the following steps to retrieve index tuples:

1. Find the scan start position by searching for the first position in
the BTree that satisfies the first key condition, i.e. id2 > 'a'.

2. Fetch each tuple starting from the position found in step 1.

3. For each tuple, check all the scan key conditions; in our
example, both scan key conditions are checked.

4. If the conditions match, return the tuple; otherwise the scan
stops.

The problem here is that the first scan key
condition has already been matched to find the scan start position (step 1),
so it is obvious that any further tuple will also match the first
scan key condition (as the index entries are sorted).

So comparing the first scan key condition
again in step 3 is redundant.

So we changed the BTree scan
algorithm to avoid this redundant check, i.e. the first key
comparison is skipped for each tuple, as it is guaranteed to always
be true.

Performance result summary:

I would like to thank Simon Riggs for
verifying and committing this patch. Simon Riggs also confirmed an
improvement of 5% with both short and long indexes on the least
beneficial data type, and considered it a very positive win
overall.

On March 10th, we had our third ever pgDay for SFPUG, which was
a runaway success. pgDaySF 2015 was held together with FOSS4G-NA
and EclipseCon; we were especially keen to join FOSS4G because of
the large number of PostGIS users attending the event. In all,
around 130 DBAs, developers and geo geeks joined us for pgDay SF
... so many that the conference had to reconfigure the room to add
more seating!

The day started out with Daniel Caldwell showing how to use
PostGIS for offline mobile data, including a phone demo.

Ozgun Erdogan presented pg_shard with a short demo.

Gianni Ciolli flew all the way from London to talk about using
Postgres' new Logical Decoding feature for database auditing.

Peak excitement of the day was Paul Ramsey's "PostGIS Feature
Frenzy" presentation.

We also had presentations by Mark Wong and Bruce Momjian, and
lightning talks by several presenters. Slides for some sessions are
available on the
FOSS4G web site. According to FOSS4G, videos will be available
sometime soon.

Of course, we couldn't have done it without our sponsors:
Google, EnterpriseDB, 2ndQuadrant, CitusDB and
pgExperts. So a big thank you to our sponsors, our
speakers, and the staff of FOSS4G-NA for creating a great day.

Last Thursday, I had this short and one-sided conversation with
myself: “Oh, cool, Pg 9.4 is out for RDS. I’ll upgrade my new
database before I have to put it into production next week, because
who knows when else I’ll get a chance. Even though I can’t use
pg_dumpall, this will take me what, 20 […]

Before PostgreSQL got streaming replication, back in version
9.0, people kept asking when we were going to get replication. That
was a common conversation-starter when standing at a conference
booth. I don’t hear that anymore, but this dialogue still happens
every now and then:

- I have streaming replication set up, with a master and
standby. How do I perform failover?
- That’s easy, just kill the old master node, and run “pg_ctl
promote” on the standby.
- Cool. And how do I fail back to the old master?
- Umm, well, you have to take a new base backup from the new
master, and re-build the node from scratch..
- Huh, what?!?

pg_rewind is a better answer to that. One way to think of it is
that it’s like rsync on steroids. Like rsync, it copies files that
differ between the source and target. The trick is in how it
determines which files have changed. Rsync compares timestamps,
file sizes and checksums, but pg_rewind understands the PostgreSQL
file formats, and reads the WAL to get that information
instead.

I started hacking on pg_rewind about a year ago, while working
for VMware. I got it working, but it was a bit of a pain to
maintain. Michael Paquier helped to keep it up-to-date, whenever
upstream changes in PostgreSQL broke it. A big pain was that it has
to scan the WAL, and understand all the different WAL record types –
miss even one and you might end up with a corrupt database. I made
big changes to the way WAL-logging works in 9.5, to make that
easier. All WAL record types now contain enough information to know
what block it applies to, in a common format. That slashed the
amount of code required in pg_rewind, and made it a lot easier to
maintain.

I have just committed pg_rewind into the PostgreSQL git
repository, and it will be included in the upcoming 9.5 version. I
always intended pg_rewind to be included in PostgreSQL itself; I
started it as a standalone project to be able to develop it faster,
outside the PostgreSQL release cycle, so I’m glad it finally made
it into the main distribution now. Please give it a look.

Developers have been really excited about the addition of JSON
support starting with PostgreSQL v9.2. They feel they now have the
flexibility to work with a schema-less unstructured dataset while
staying within a relational DBMS. So what’s the buzz all about?
Let’s explore below …

Why is NoSQL so attractive?

Rapid turnaround time … it is as simple as that. With the push
to decrease time-to-market, developers are under constant pressure
to turn POCs around very quickly. It is actually not just POCs,
marketable products are increasingly getting the same treatment.
The attitude is, “If I don’t get it out, someone else will.”

Any decent sized application will need to store data somewhere.
Rather than going through the pains of designing schemas and the
debates on whether to normalize or not, developers just want to get
to the next step. That’s how databases like MongoDB gained such
tremendous popularity. They allow for schema-less, unstructured
data to be inserted in document form and the developers find it
easy to convert class objects within their code into that document
directly.

There is a trade-off, however. The document (and key/value
store) databases are very unfriendly to relations. While retrieving
data, you will have a very hard time cross-referencing between
different tables, making analytics nearly impossible. And, nightmare
of nightmares for mission critical applications, these databases
are not ACID compliant.

In walks PostgreSQL with JSON and HSTORE support.

NoSQL in PostgreSQL

While the HSTORE contrib module has been providing key/value
data types in standard PostgreSQL table columns since v8.2, the
introduction of native JSON support in v9.2 paves the way for the true
power of NoSQL within PostgreSQL.

Starting with v9.3, not only do you have the ability to declare JSON
data types in standard tables, you now have functions to encode
data to JSON format and also to extract data elements from a JSON
column. What’s more, you can also interchange data between JSON and
HSTORE using simple and intuitive functions.
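
Here are a few quick illustrations of the kind of thing that is possible (the table and keys are invented for the example):

CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE docs (
    id    serial PRIMARY KEY,
    body  json,
    attrs hstore
);

INSERT INTO docs (body, attrs)
VALUES ('{"name": "widget", "tags": ["red", "small"]}',
        'color => red, size => small');

-- extract elements from the JSON column
SELECT body ->> 'name' AS name,
       body #>> '{tags,0}' AS first_tag
FROM docs;

-- encode a whole row as JSON, and convert hstore to JSON
SELECT row_to_json(d) AS as_json,
       hstore_to_json(d.attrs) AS attrs_json
FROM docs d;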

I did a new PostGIS talk for FOSS4G North America 2015, an
exploration of some of the tidbits I've learned over the past six
months about using PostgreSQL
and PostGIS together to make
"magic" (any sufficiently advanced technology...)

Somehow I've gotten through 10 years of SQL without ever
learning this construction, which I found while proof-reading a
colleague's blog post
and looked so unlikely that I had to test it before I believed it
actually worked. Just goes to show, there's always something new to
learn.

Suppose you have a GPS location table:

gps_id: integer
geom: geometry
gps_time: timestamp
gps_track_id: integer

You can get a correct set of lines from this collection of
points with just this SQL:

SELECT gps_track_id, ST_MakeLine(geom ORDER BY gps_time ASC) AS geom
FROM gps_points
GROUP BY gps_track_id;

Those of you who already knew about placing ORDER
BY within an aggregate function are going "duh", and the
rest of you are, like me, going "whaaaaaa?"

Prior to this, I would solve this problem by ordering all the
groups in a CTE or sub-query first, and only then pass them to the
aggregate make-line function. This, is, so, much, nicer.
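
For contrast, the workaround described above (ordering in a CTE first) looks roughly like this, a sketch using the same hypothetical table:

-- note: relying on the CTE's ordering surviving the GROUP BY is not
-- strictly guaranteed, which is another reason the aggregate ORDER BY
-- form above is nicer
WITH ordered AS (
    SELECT gps_track_id, geom
    FROM gps_points
    ORDER BY gps_track_id, gps_time
)
SELECT gps_track_id, ST_MakeLine(geom) AS geom
FROM ordered
GROUP BY gps_track_id;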

PostgreSQL has provided table partitions for a long time. In
fact, one might say it has always
had partitioning. The functionality and performance of table
inheritance has increased over the years, and there are innumerable
arguments for using it, especially for larger tables consisting of
hundreds of millions of rows. So I want to discuss a quirk that
often catches developers off guard. In fact, it can render
partitioning almost useless or counter-productive.

PostgreSQL has a very good overview in its partitioning
documentation. And the pg_partman
extension at PGXN follows the
standard partitioning model to automate many of the pesky tasks for
maintaining several aspects of partitioning. With modules like
this, there’s no need to manually manage new partitions, constraint
maintenance, or even some aspects of data movement and
archival.

However, plenty of existing partition sets are out there, and not everyone knows
about extensions like this; some teams have developed in-house systems
instead. Here’s something I encountered recently:
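
The original snippet is not reproduced here, but the pattern in question looks something like this (names are hypothetical):

-- a monthly partition whose upper bound is computed with an interval
CREATE TABLE sensor_log_2015_03 (
    CHECK (reading_date >= '2015-03-01'::date AND
           reading_date <  '2015-03-01'::date + INTERVAL '1 month')
) INHERITS (sensor_log);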

This looks innocuous enough, but PostgreSQL veterans are already
shaking their heads. The documentation alludes to how this could be
a problem:

Keep the partitioning constraints simple, else the planner may
not be able to prove that partitions don’t need to be visited.

The issue in this case is that adding an interval of one month
turns the right boundary of this range constraint into a dynamic
value. PostgreSQL will not use dynamic values when evaluating check
constraints. Here’s a query plan from PostgreSQL 9.4.1, which is
the most recent release as of this writing.

The BDR team has recently introduced support for dynamically
adding new nodes to a BDR group from SQL into the current
development builds. Now no configuration file changes are required
to add nodes and there’s no need to restart the existing or newly
joining nodes.

This change does not appear in the current 0.8.0 stable release;
it’ll land in 0.9.0 when that’s released, and can be found in the
bdr-plugin/next branch in the meantime.

New nodes negotiate with the existing nodes for permission to
join. Soon they’ll be able to join the group without disrupting any DDL
locking, global sequence voting, etc.

There’s also an easy node removal process so you don’t need to
modify internal catalog tables and manually remove slots to drop a
node anymore.

New node join process

With this change, the long-standing GUC-based configuration for
BDR has been removed. bdr.connections no longer exists and
you no longer configure connections with bdr.[conname]_dsn
etc.

Instead, node addition is accomplished with the
bdr.bdr_group_join(...) function. Because this is a
function in the bdr extension, you must first CREATE
EXTENSION bdr;. PostgreSQL doesn’t have extension dependencies
and the bdr extension requires the btree_gist
extension so you’ll have to CREATE EXTENSION btree_gist
first.

Creating the first node

Creation of the first node must now be done explicitly using
bdr.bdr_group_create. This promotes a standalone
PostgreSQL database to a single-node BDR group, allowing other
nodes to then be joined to it.

You must pass a node name and a valid externally-reachable
connection string for the dsn parameter, e.g.:
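
Something along these lines (a sketch; the exact parameter names may vary between BDR versions):

CREATE EXTENSION btree_gist;
CREATE EXTENSION bdr;

SELECT bdr.bdr_group_create(
    local_node_name := 'node1',
    node_external_dsn := 'host=node1.example.com port=5432 dbname=mydb'
);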

Note that the dsn is not used by the root node itself. It’s
used by other nodes to connect to the root node, so you can’t use a
dsn like host=localhost dbname=mydb if you intend to have
nodes on multiple machines.

Adding other nodes

The PostGIS development team is happy to release a patch for
PostGIS 2.1, the 2.1.6 release. As befits a patch release, the
focus is on bugs, breakages, and performance issues. Users with
large tables of points will want to prioritize this patch, for
substantial (~50%) disk space savings.

Last winter, we open-sourced pg_shard, a transparent
sharding extension for PostgreSQL. It brought straightforward
sharding capabilities to PostgreSQL, allowing tables and queries to
be distributed across any number of servers.

Today we’re excited to announce the next release of pg_shard.
The changes in this release include:

Improved performance — INSERT
commands run up to four times faster

Shard repair — Easily bring inactive
placements back up to speed

Copy script — Quickly import data from CSV and
other files from the command line

Whether you want a distributed document store alongside your
normal PostgreSQL tables or need the extra computational power
afforded by a sharded cluster, pg_shard can help. We continue to
grow pg_shard’s capabilities and are open to feature requests.

A nice feature extending the usage of pgbench,
the in-core Postgres tool for running benchmarks, has landed in
9.5 with this commit:

commit: 878fdcb843e087cc1cdeadc987d6ef55202ddd04
author: Robert Haas <rhaas@postgresql.org>
date: Mon, 2 Mar 2015 14:21:41 -0500
pgbench: Add a real expression syntax to \set
Previously, you could do \set variable operand1 operator operand2, but
nothing more complicated. Now, you can \set variable expression, which
makes it much simpler to do multi-step calculations here. This also
adds support for the modulo operator (%), with the same semantics as in
C.
Robert Haas and Fabien Coelho, reviewed by Álvaro Herrera and
Stephen Frost

pgbench has for ages supported custom input files via -f,
with custom variables that can be set with, for example,
\set or \setrandom, and then used in a custom set of SQL
queries:

Up to 9.4, those custom variables can only be calculated with simple
rules of the type "var operator var2" (the commit message above is
explicit enough), resulting in many intermediate steps and
variables when doing more complicated calculations (note as well
that additional operands and variables, if provided, are simply
ignored after the first three):
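
For example, a pre-9.5 custom script has to build up anything non-trivial one operator at a time (an illustrative script against the standard pgbench_accounts table):

\set scale 100
\set accounts_per_branch 100000
\set naccounts :scale * :accounts_per_branch
\setrandom aid 1 :naccounts
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;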

In 9.5, such cases become much easier because pgbench now
includes a parser for more complicated expressions. The
calculation above can be written more simply, and far
fancier things can be done:
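
With the 9.5 expression parser, the same kind of calculation can be collapsed into a single \set, with parentheses and the new modulo operator available (again illustrative):

\set scale 100
\set naccounts 100000 * :scale
\set aid ((:naccounts - 1) % 1000) + 1
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;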

Amit Kapila and I
have been working very hard to make parallel sequential scan ready
to commit to PostgreSQL 9.5. It is not all there yet, but we
are making very good progress. I'm very grateful to everyone
in the PostgreSQL community who has helped us with review and
testing, and I hope that more people will join the effort.
Getting a feature of this size and complexity completed is
obviously a huge undertaking, and a significant amount of work
remains to be done. Not a whole lot of brand-new code remains
to be written, I hope, but there are known issues with the existing
patches where we need to improve the code, and I'm sure there are
also bugs we haven't found yet.

On 18th of March, Alvaro Herrera committed patch: array_offset()
and array_offsets(). These functions return the offset
position or positions of a value in an array. Author: Pavel
Stěhule. Reviewed by: Jim Nasby. It's been a while since my last
"waiting for" post – mostly because while there is a lot of work
happening, […]

Last night I attended the second WhatcomPUG.
This meeting was about Sqitch,
an interesting database revision control mechanism. The system is
written in Perl and was developed by David Wheeler of pgTAP fame.
It looks and feels like git. As it is written in Perl it definitely
has too many options. That said, what we were shown works, works
well and appears to be a solid and thorough system for the job.

I also met a couple of people from CoinBeyond. They are a
point-of-sale software vendor that specializes in letting "regular"
people (read: not I or likely the people reading this blog) use
Bitcoin!

That's right folks, the hottest young currency in the market
today is using the hottest middle aged technology for their
database, PostgreSQL. It was great to see that they are also
located in Whatcom County. The longer I am here, the more I am
convinced that Whatcom County (and especially Bellingham) is a
quiet tech center working on profitable ventures without the noise
of places like Silicon Valley. I just keep running into people
doing interesting things with technology.

PostgreSQL is an open-source, full-featured relational
database. This blog gives an overview of how the PostgreSQL engine processes
queries received from the user.

The typical, simplified flow of the PostgreSQL engine
is:

[Figure: SQL engine flow]

As part of this blog, I am going to cover all the
modules marked in yellow.

Parser:

The parser module is responsible for the syntactic
analysis of the query. It consists of two sub-modules:

1. Lexical scanner

2. Bison rules/actions

Lexical Scanner:

The lexical scanner reads each character of the given
query and returns the appropriate token based on the matching rules.
For example, rules can look like this:

The name given in <> is the
state name; in the above example, <xc> is the state name for
the start of a comment. So once the scanner sees the comment-start
characters, comment body tokens are read only in the <xc> state.

Bison:

Bison reads the tokens returned by the scanner, matches
them against the rules for a particular kind of query, and
performs the associated actions. For example, the bison rule for a
SELECT statement is:

Each returned token is matched against the rule
mentioned above in left-to-right order; if at any point no
matching rule is found, the parser either moves on to the next
possible rule or throws an error.

Analyzer:

The analyzer module is responsible for the semantic
analysis of the given query. The raw information about the
query received from the parser module is transformed into the
database's internal object form, resolving the corresponding object
IDs. E.g. the relation name "tbl" gets replaced with its object ID.

The output of the analyzer module is a query tree; its structure
can be seen in the "Query" struct in the
file src/include/nodes/parsenodes.h.
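
One easy way to see these internal trees for a query is to enable the debug output parameters; the trees are then printed to the server log (a quick illustration):

SET debug_print_parse = on;      -- query tree produced by parse analysis
SET debug_print_rewritten = on;  -- query tree after rewriting
SET debug_print_plan = on;       -- plan tree chosen by the optimizer
SELECT * FROM tbl;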

Optimizer:

The optimizer module, also considered to be the brain of the SQL
engine, is responsible for choosing the best path for execution of
the query. The best path for a query is selected based on the cost
of each path. The path with the least cost is considered to be the
winner path.

Based on the winner path, a plan is created, which is
then used by the executor to execute the query.