tail -f /dev/dim, Dimitri Fontaine's blog
http://tapoueh.org/blog

a pgDay in Paris!
http://tapoueh.org/blog/2015/03/pgday-paris.html

I was lucky to participate as a speaker at the
Nordic PostgreSQL Day 2015
and it's been another awesome edition of the conference. Really smooth:
everything ran as it should, with about one hundred people attending.

The
Nordic pgDay is such a successful conference that I have long wanted to have
the same thing in my area, and so we made
pgDay Paris and modeled it on
Nordic. It's planned to be all the same, just with a different audience
given the location.

April 21st, save the date and join us in Paris

The
pgDay Paris welcomes English-speaking speakers and the Call for Papers
is now open, so please consider
submitting your talk proposal for pgDay Paris by clicking on the link and
filling in a form. Not too hard a requirement for being allowed to visit
such a nice city as Paris, really!

Back From FOSDEM 2015
http://tapoueh.org/blog/2015/02/09-back-from-fosdem-2015.html

The
FOSDEM 2015 edition has been awesome this year, the usual mix of meeting
with old friends, talking about interesting topics, seeing tremendous
activity in all Open Source domains, and having Belgium beers in the
evenings.

FOSDEM PGDAY

On the Friday before the real FOSDEM event our own
PostgreSQL Europe
organized a one-day event, the
FOSDEM PGDAY. It was an intense day of
conferences about PostgreSQL, where I had the opportunity to present
pgloader in the context of dealing with database migrations.

Migrate from MySQL to PostgreSQL in one command

PostgreSQL User Group, Paris Meetup

This presentation about migrating to PostgreSQL was also given at the
PostgreSQL User Group Meetup in Paris more recently, and I'm happy to
announce here that we have more than 200 registered members in the group
now!

FOSDEM

At the FOSDEM event proper I had the pleasure to present my recent talk
about backups:

Nobody cares about backups, think about data recovery

If you want to remember only one thing about that presentation, it must be
that we don't care about how you take backups, we only care about whether
you're able to recover data in worst case scenarios. The only way to check a
backup is to recover it. Do automated testing of your backups, which means
automated recovery.

If you're not there on the Friday but still want to talk about
pgloader,
join us at the PostgreSQL devroom and booth!

New release: pgloader 3.2
http://tapoueh.org/blog/2015/01/16-pgloader-3-2.html

PostgreSQL comes with an awesome bulk copy protocol and tooling best known
as the
COPY and
\copy commands. Being a transactional system, PostgreSQL's
COPY implementation will
ROLLBACK any work done if a single error is found
in the data set you're importing. That's the reason why
pgloader got
started: it provides error handling for the
COPY protocol.

That's basically what pgloader used to be all about

As soon as we had the capability to load data from unreliable sources,
another use case appeared on the horizon, and soon enough
pgloader grew the
capacity to load data from other databases, some having a more liberal
notion of what counts as sane data type input.

To be able to adapt to advanced use cases in database data migration
support, pgloader has grown an advanced command language wherein you can
define your own load-time data projection and transformations, and your own
type casting rules too.

New in
version 3.2 is that in simple cases, you don't need that command file
any more. Check out the
pgloader quick start page to see some examples where
you can use pgloader all from your command line!

Here's one such example, migrating a whole MySQL database data set over to
PostgreSQL, including automated schema discovery, automated type casting and
on-the-fly data cleanup (think about zero dates or booleans in
tinyint(1)
disguise), support for indexes, primary keys, foreign keys and comments.
It's as simple as:
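Something along these lines (a sketch; adapt the connection strings to your
own databases):

$ pgloader mysql://user:pass@localhost/dbname postgresql:///dbname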

As is usual with unix commands, the
- input filename stands for
standard input
and allows streaming data from a remote compressed file down to PostgreSQL.

So if you have any data loading job, including data migrations from SQLite,
MySQL or MS SQL server: have a look at
pgloader!

Going to Chicago, Postgres Open
http://tapoueh.org/blog/2014/08/29-Postgres-Open.html

Next month,
Postgres Open 2014 is happening in Chicago, and I'll have the
pleasure of hosting a tutorial about PostgreSQL Extensions,
Writing & Using Postgres Extensions, and a talk aimed at developers wanting
to make the best out of PostgreSQL,
PostgreSQL for developers:

The tutorial is based on first-hand experience with the
PostgreSQL Extension Packaging System both as a user and a developer. It's a
series of practical use cases where using extensions will simplify your life
a lot, and each of those practical use cases relies on real world data (thanks
to
pgloader).

Most of the examples covered in the tutorial have a blog entry here that
presents the idea and the solution, so the tutorial is all about putting it
all together. You can already read the blog posts under the
YeSQL and
Extensions headings for a preview.

If you want to learn how to implement a modern search user interface for
your own product, something that your users know how to use already, then
the tutorial is for you, as we will cover PostgreSQL based approximate
searches with suggestions (
did you mean ...?) and autocompletion.
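As a taste of the approximate-search part of the tutorial, here's a sketch
using the pg_trgm extension (the tracks table and its name column are only
illustrative):

create extension if not exists pg_trgm;

-- trigram index so that approximate matching can use an index scan
create index on tracks using gist (name gist_trgm_ops);

-- "did you mean ...?": closest matches to a misspelled title first
  select name, similarity(name, 'gangam style') as sim
    from tracks
   where name % 'gangam style'
order by sim desc
   limit 5;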

The tutorial gives you the toolset you will use to avoid the situation
depicted here.

See you all in Chicago!

Turn your PostgreSQL queries into Charts
http://tapoueh.org/blog/2014/08/25-pgcharts.html

Earlier this year we compared
Aggregating NBA data, PostgreSQL vs MongoDB then talked about
PostgreSQL, Aggregates and histograms where we even produced a nice
Histogram chart directly within the awesome
psql console. Today, let's take
that same idea to the next level, with
pgcharts:

The application's specifications are quite simple: edit an SQL query, set
your
categories and your
data series, add in some
legends, and get a nice
chart. Currently supported are
bar,
column,
pie and
donut charts, and we
should be able to add anything that
http://www.highcharts.com/ has support
for.

Currently, you need to compile the application yourself, and for that you
need to install the
SBCL compiler. Soon enough you will have a
debian
package to play with! The
README at the
pgcharts github repository has the
details to get you started. Enjoy!

Why is pgloader so much faster?
http://tapoueh.org/blog/2014/05/14-pgloader-got-faster.html

pgloader loads data into PostgreSQL. The new version is stable enough
nowadays that it's soon to be released, the last piece of the
3.1.0 puzzle
being full
debian packaging of the tool.

The pgloader logo is a loader truck, just because.

As you might have noticed if you've read my blog before, I decided that
pgloader needed a full rewrite in order for it to be able to enter the
current decade as a relevant tool. pgloader used to be written in the
python programming language, which is used by lots of people and generally
quite appreciated by its users.

Why change?

Still, python is not without problems, the main ones I had to deal with
being
poor performance and a lack of threading capabilities. Also, the
pgloader setup design was pretty hard to maintain, and adding compatibility
with other
loader products from competitors was harder than it should have been.

After some basic performance checks as seen in my
Common Lisp Sudoku Solver project, where I got code up to
ten times faster
than the python version, it felt like the amazing set of features of
the language could be put to good use here.

So, what about performance after the rewrite?

The main reason why I'm now writing this blog post is receiving emails from
pgloader users with strange feelings about the speedup. Let's look at the
numbers one user gave me, as a data point:

Wait, you're still using COPY, right?

So yes,
pgloader is still using
COPY. This time the protocol implementation
is to be found in the Common Lisp
Postmodern driver, which is really great.
Before that, back when pgloader was python code, it was using the very good
psycopg driver, which also exposes the COPY protocol.

So, what did happen here?

Well it happens that pgloader is now built using Common Lisp technologies,
and those are really great, powerful and fast!

In the
pgloader case I've been using the
lparallel utilities, in particular
its
queuing facility to be able to implement
asynchronous IOs where a thread
reads the source data and preprocesses it, filling up a batch at a time in a
buffer that is then pushed down to the writer thread, which handles the
COPY
protocol and operations.

So my current analysis is that the new thread-based architecture, used with a
very powerful compiler for the Common Lisp high-level language, is allowing
pgloader to enter a whole new field of
data loading performance.

Conclusion

Rather than a configuration file, the way to use the new pgloader is using a
command language that has been designed to look as much like SQL as possible
in the pgloader context, to make it easy for its users. Implementation wise,
it should now be trivial enough to implement compatibility with other
data
load software that some
PostgreSQL competitor products do have.

Also, the new code base and feature set seems to attract way more users than
the previous implementation ever did, despite using a less popular
programming language.

You can already
download pgloader binary packages for
debian based
distributions and
centos based ones too, and you will even find a
Mac OS X
package file (
.pkg) that will make
/usr/local/bin/pgloader available for you
on the command line. If you need a windows binary, drop me an email.

The first stable release of the new
pgloader utility is scheduled to be
named
3.1.0 and to happen quite soon. We are hard at work on packaging the
dependencies for
debian, and you can have a look at the
Quicklisp to debian
project if you want to help us get there!

New York!
http://tapoueh.org/blog/2014/04/17-PGConf-NYC.html

A couple of weeks ago I had the chance to participate in the
PGConf NYC 2014 Conference, one of the biggest conferences about PostgreSQL
worldwide.

I presented one of my favourite talks over there, where the whole goal is to
blow the mind of innocent developers and show them how much they can do in
just SQL.

PostgreSQL for developers, window functions galore!

The basis for the talk is this detailed blog entry about the
Reset Counter
application and how to leverage SQL to write the code for it.

Nordic PostgreSQL Day 2014
http://tapoueh.org/blog/2014/03/25-Nordic-PGDAY-2014.html

Last week some PostgreSQL users, contributors and advocates organized a
really great conference in Stockholm, Sweden, where I had the pleasure to give
the following talk:

PostgreSQL is YeSQL!

Nordic PostgreSQL Conference

The conference was very well put together and the occasion to meet with
plenty of well known PostgreSQL friends and newcomers to the community too.
If you were there, I hope you had as good a time as I did!

All the slides from this day of conferences have now been uploaded on the
PostgreSQL Wiki Page for Nordic PGDAY, so that if you couldn't make it to
the event you can still read through the material and send emails around to
the authors with any and all questions about the contents!

See you at the next PostgreSQL conference maybe, this time in
New York City!

The Hacker's Guide to Python

In other news, my good friend
Julien Danjou (you might know him for having
hacked together the
awesome window manager) just released today
The Hacker's Guide to Python where you will
Learn everything you need to
build a successful Python project. I've had the pleasure of answering
Julien's questions about how to best benefit from
PostgreSQL when coding in
python, which makes for an interview chapter within:

In this book, we'll see how you can leverage Python to efficiently tackle your problems and build great Python applications.

And while at it:

The Hacker's Guide to Python is out!
http://t.co/jsw2RBObpv
Buy today and get 20% off with the coupon THGTP20 :-)

PostgreSQL, Aggregates and Histograms
http://tapoueh.org/blog/2014/02/21-PostgreSQL-histogram.html

The function width_bucket(operand numeric, b1 numeric, b2 numeric, count int)
returns (as an int) the bucket to which the operand would be assigned in an
equidepth histogram with count buckets, spanning the range b1 to b2.

For example
width_bucket(5.35, 0.024, 10.06, 5) returns
3.

Let's have a look at our dataset from the NBA games and statistics, and get
back to counting
rebounds in the
drb field. A preliminary query informs us
that we have stats ranging from 10 to 54 rebounds per team in a single game,
useful information we can use in the following query:
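Such a preliminary query and a first bucketing attempt might look like this
(a sketch, assuming the per-team, per-game stats live in a team_stats table
with a drb column):

  select min(drb), max(drb) from team_stats;

  select width_bucket(drb, 10, 54, 9) as bucket, count(*) as freq
    from team_stats
group by bucket
order by bucket;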

Console Histograms

Now, what would it take to actually be able to display the full story right
into our
psql console, for preview before actually integrating a new diagram
in our reporting solution? Turns out it's not very complex.

First, we want to avoid hard coding the range of
rebounds we're processing,
so we are going to compute that in a first step. Then we want the
histogram
data, which is an ordered list of ranges with a min and a max value and a
frequency, which is how many games were recorded with a number of
rebounds
within any given
bucket range. And last, we want to display something a
little more visual than just a list of numbers:
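Putting those three steps together gives a query along these lines (again a
sketch against the assumed team_stats table):

with drb_stats as (
    select min(drb) as min, max(drb) as max
      from team_stats
),
histogram as (
   select width_bucket(drb, min, max, 9) as bucket,
          int4range(min(drb), max(drb), '[]') as range,
          count(*) as freq
     from team_stats, drb_stats
 group by bucket
 order by bucket
)
select bucket, range, freq,
       repeat('*', (freq::float / max(freq) over () * 30)::int) as bar
  from histogram;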

The query is using the
Common Table Expressions (WITH) syntax so that it's
easier to read and understand, then the data-related magic happens in the
histogram CTE. We use the
width_bucket function to get a
calibration number
for each of our games, and we
GROUP BY bucket to be able to
aggregate the
min and
max values.

As we're using PostgreSQL though, just having two columns with the min and
max as separate values is not enough, what we actually need is a
discrete range of
rebounds for each
bucket, hence using the
int4range range
constructor function.

Of course, within the same
GROUP BY aggregation here it's still possible to
count the number of games having a
rebounds stat within the bucket, defining
the histogram's frequency.

The only remaining step then consists in hacking our way into actually
displaying something
visual enough for a quick less-than-1-minute effort of
data crunching, using the
repeat function which is part of
PostgreSQL String Functions and Operators. Note that we're using the
Window Function expression
max(freq) over() to have access to the highest
frequency value from each and every result row.

So... Did I mention lately?

PostgreSQL is YeSQL!

By the way, the whole scripting and data and SQL is available at
github/dimitri/nba, and there's a
Hacker News entry to comment on the
article if you're interested.

Aggregating NBA data, PostgreSQL vs MongoDB
http://tapoueh.org/blog/2014/02/17-aggregating-nba-data-PostgreSQL-vs-MongoDB.html

When reading the article
Crunching 30 Years of NBA Data with MongoDB Aggregation I couldn't help but
think that we've been enjoying
aggregates in SQL for 3 or 4 decades already.
When using
PostgreSQL it's even easy to actually add your own aggregates
given the SQL command
create aggregate.

The next step after thinking how obvious the queries written in the
mentioned article would be to express in SQL was to actually load the data
into PostgreSQL and write the aggregate queries, of course.

Loading the data

With the help of a little bit of
Common Lisp code and using the
mongo-cl-driver it was easy enough to parse the given
BSON file. What was
more complex was to actually understand enough of the data model to produce
a relational design out of it, avoiding data redundancy as much as possible.

We call that step
normalization in old-style relational databases, and the
goal of that process is to avoid
functional dependencies so that the data is
easier to understand, verify, and process once loaded.

For instance, rather than have both scores from each team and a column
won
per team, which would be a boolean but is a number in the given
BSON file,
we store only the scores. Here's the main table definition of the stats we
are going to be playing with, the
game table:
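A sketch of what such a normalized table can look like (the column names are
illustrative, and the team table is assumed to hold one row per franchise):

create table team (
    id   serial primary key,
    name text not null
);

create table game (
    id          serial primary key,
    date        date not null,
    host        integer not null references team(id),
    guest       integer not null references team(id),
    host_score  integer not null,
    guest_score integer not null
);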

I don't know about you, but I have quite a hard time deciphering what that
query is actually doing, and when the explanation text talks about
using a
6-stage pipeline, my understanding is that the application developer has
been writing the
execution plan of the query here. Let's ignore the query
format itself, as it's obviously meant to be generated by a tool rather than
typed by a human being.

What we have here is a pretty basic query using a
join, a
where clause to
restrict the data set we are playing with, a
group by clause to define which
data to compute the aggregates against, with an
order by and a
limit clause
for presenting the result. To be realistic, if you've ever done any SQL at
all, then you know how to read that query because you've been writing dozens
of similar ones.

Here's, as in the original article, the same query against a much larger
data set this time, with all games of the 2000s decade:
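Against the illustrative schema above, that SQL might read as follows (a
sketch; the winner is derived from the scores, since we normalized the won
column away):

  select t.name, count(*) as wins
    from game g
    join team t
      on t.id = case when g.host_score > g.guest_score
                     then g.host else g.guest end
   where g.date >= date '2000-01-01'
     and g.date <  date '2010-01-01'
group by t.name
order by wins desc
   limit 10;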

Correlating stats with wins

The goal here is to compute how often a team wins when they record more
defensive rebounds than their opponent across the entire data set.

To be able to compute the percentage, we have to have a count of all
registered games, of course. Then we are going to count how many times the
winner team registered a greater
team_stats.drb than the loser. Counting
how many times something happened is usually written in SQL as
sum(case when <condition> then
1 else 0 end), which is what we're doing here:
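A sketch of that query, assuming a team_stats table with one row per game and
team and a drb column for defensive rebounds:

with winners as (
    select g.id,
           case when g.host_score > g.guest_score then g.host else g.guest end as winner,
           case when g.host_score > g.guest_score then g.guest else g.host end as loser
      from game g
)
select count(*) as games,
       sum(case when ws.drb > ls.drb then 1 else 0 end) as winner_got_more_drb,
       round(  100.0 * sum(case when ws.drb > ls.drb then 1 else 0 end)
             / count(*), 2) as pct
  from winners w
  join team_stats ws on ws.game = w.id and ws.team = w.winner
  join team_stats ls on ls.game = w.id and ls.team = w.loser;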

We note here that in the original MongoDB article the aggregation query
stops short of computing the
percentage directly; apparently that's been done in the
client tool, maybe using a
spreadsheet application or something.

I only pasted the first few lines of the result because I'm not sure how to
make sense of it, really.

Interesting factoid

What I find most interesting in the following
factoid proposed in the
MongoDB article is the complete lack of the query you need to run in order
to grab the matching data:

An interesting factoid: the team that recorded the fewest defensive rebounds
in a win was the 1995-96 Toronto Raptors, who beat the Milwaukee Bucks 93-87
on 12/26/1995 despite recording only 14 defensive rebounds.

When doing the necessary query in SQL, using a
Common Table Expression (the
WITH syntax) and a
Window Function for good measure, we actually get 4
different games with the minimum defensive rebounds in our history of NBA
games, 14:
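A sketch of that query, reusing the same illustrative schema, with a window
function computing the overall minimum so that every game tied for it shows up:

with winner_stats as (
    select g.id, g.date, s.team, s.drb,
           min(s.drb) over () as min_drb
      from game g
      join team_stats s
        on s.game = g.id
       and s.team = case when g.host_score > g.guest_score
                         then g.host else g.guest end
)
select date, team, drb
  from winner_stats
 where drb = min_drb;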

Total rebounds and wins

The next interesting aside is the following:

As an aside, the Cleveland Cavaliers beat the New York Knicks 101-97 on
April 11, 1996, despite recording only 21 total rebounds. Inversely, the San
Antonio Spurs lost to the Houston Rockets, 112-110, on January 4, 1992
despite recording 75 total rebounds.

Again it's easy enough in SQL to have more details about the
aside presented
in our source article, and we get a slightly different story.

Conclusion

It's quite hard for me to appreciate the work done in the
MongoDB
aggregation framework really, when we've been enjoying advanced aggregation
and statistics in
PostgreSQL for a very long time. With the addition of
Window Functions and
Aggregate Functions for Statistics it's possible to
implement advanced analysis right into your SQL queries.

When the problem you have to solve involves analyzing data, one of the most
advanced tools you can find around certainly is the SQL language, in
particular its implementation in PostgreSQL!

PostgreSQL FOSDEM Conference
http://tapoueh.org/blog/2014/02/FOSDEM-Extensions.html

Back from the
FOSDEM 2014 Conference, here are the slides I've been using for
the
Advanced Extension Use Cases talk I gave, based on the ongoing work to
be found under the
Tour of Extensions index in this web site.

If you're interested in the talk contents, then you might be interested
in the following list of articles where I actually did all the work
leading to the slides in the above PDF:

In our previous article
Aggregating NBA data, PostgreSQL vs MongoDB we spent
time comparing the pretty new
MongoDB Aggregation Framework with the decades
old SQL aggregates. Today, let's showcase more of those SQL aggregates,
producing a nice
histogram right from our SQL console.

In our
Tour of Extensions today's article is about advanced tag indexing. We
have a great data collection to play with and our goal today is to be able
to quickly find data matching a complex set of tags. So, let's find out
those
lastfm tracks that are tagged as
blues and
rhythm and blues, for
instance.

At the
Open World Forum two weeks ago I had the pleasure to meet with
Colin Charles. We had a nice talk about the current state of both
MariaDB
and
PostgreSQL, and even were both interviewed by the Open World Forum Team.
The interview is now available online. Dear French readers, it's in English.

PostgreSQL is an all round impressive
Relational DataBase Management System
which implements the SQL standard (see the very useful reference page
Comparison of different SQL implementations for details). PostgreSQL also
provides unique solutions in the database market and has been leading
innovation for some years now. Still, there's no support for
Autonomous
Transactions within the server itself. Let's have a look at how to easily
implement them with
PL/Proxy.

Last Friday I had the chance to be speaking at the
Open World Forum in the
NewSQL track, where we had lots of interest and excitement around the
NoSQL
offerings. Of course, my talk was about explaining how
PostgreSQL is Web Scale with some historical background and technical
examples about what this database engine is currently capable of.

In a previous article about
Trigger Parameters we have been using the
extension
hstore in order to compute some extra field in our records, where
the fields used both for the computation and for storing the results were
passed in as
dynamic parameters. Today we're going to see another
trigger
use case for
hstore: we are going to record changes made to our tuples.

Sometimes you want to compute values automatically at
INSERT time, like for
example a
duration column out of a
start and an
end column, both
timestamptz. It's easy enough to do with a
BEFORE TRIGGER on your table.
What's more complex is to come up with a parametrized spelling of the
trigger, where you can attach the same
stored procedure to any table even
when the column names are different from one another.

There was SQL before
window functions and SQL after
window functions: that's
how powerful this tool is. Being that much of a game changer unfortunately
means that the feature can be quite hard to grasp. This article aims at making
it crystal clear so that you can begin using it today and are able to reason
about it and recognize cases where you want to be using
window functions.

In our recent article about
The Most Popular Pub Names we did have a look at
how to find the pubs nearby, but didn't compute the
distance in between that
pub and us. That's because how to compute a distance given a position on the
earth expressed as
longitude and
latitude is not that easy. Today, we are
going to solve that problem nonetheless, thanks to
PostgreSQL Extensions.

In a recent article here we've been talking about how to do
Batch Updates in
a very efficient way, using the
Writable CTE features available in
PostgreSQL 9.1. I sometimes read how
Common Table Expressions changed the
life of fellow DBAs and developers, and would say that
Writable CTE are at
least the same boost again.

In a recent article
Craig Kerstiens from
Heroku did demo the really useful
crosstab extension. That function allows you to
pivot a table so that you
can see the data from different categories in separate columns in the same
row rather than in separate rows. The article from
Craig is
Pivoting in Postgres.

Tonight I had the pleasure to present a talk at the
Dublin PostgreSQL User Group using remote technologies. The talk is about
how to make the most out of PostgreSQL when using SQL as a developer, and
tries to convince you to dive into
mastering SQL by showing how to solve an
application example all in SQL, using
window functions and
common table
expressions.

Performance consulting involves some tricks that you have to teach over and
over again. One of them is that SQL tends to be so much better at dealing
with plenty of rows in a single statement when compared to running as many
statements, each one against a single row.

In the article from yesterday we talked about
PostgreSQL HyperLogLog with
some details. The real magic of that extension has been skimmed over though,
and needs another very small article all by itself, in case you missed it.

As
Guillaume says, we've been enjoying a great evening conference in Lyon 2
days ago, presenting PostgreSQL to developers. He did the first hour
presenting the project and the main things you want to know to start using
PostgreSQL in production, then I took the opportunity to be talking to
developers to show off some SQL.

Let's say you need to
ALTER TABLE foo ALTER COLUMN bar TYPE bigint;, and
PostgreSQL is helpfully telling you that no you can't because such and such
views depend on the column. The basic way to deal with that is to copy
paste from the error message the names of the views involved, then prepare a
script wherein you first
DROP VIEW ...; then
ALTER TABLE and finally
CREATE
VIEW again, all in the same transaction.

You certainly know that implementing
dynamic triggers in
PLpgSQL is
impossible. But I had a very bad night, being up since 3:30 am
today, so that when a developer asked me about reusing the same trigger
function code from more than one table and for a dynamic column name, I
didn't remember about it being impossible.

The drawback of hosting a static only website is, obviously, the lack of
comments. What happens actually, though, is that I receive very few comments
by direct mail. As I don't get another
spam source to cleanup, I'm left
unconvinced that's such a drawback. I still miss the low probability of
seeing blog readers exchange directly, but I think a
tapoueh.org mailing
list would be my answer, here...

So, when
8.4 came out there was all those comments about how getting
window functions was an awesome addition. Now, it seems that a lot of people
seeking help in
#postgresql just don't know what kind of problem this
feature helps solve. I've already been using them in some cases here in
this blog, for getting some nice overview about
Partitioning: relation size per “group”.

After discovering the excellent
Gwene service, which allows you to subscribe
to
newsgroups to read
RSS content (
blogs,
planets,
commits, etc), I came to
read this nice article about
Happy Numbers. That's a little problem that
fits well an interview style question, so I first solved it yesterday
evening in
Emacs Lisp as that's the language I use the most those days.

The idea of the day ain't directly from me, I'm just helping with a very
thin subpart of the problem. The problem, I can't say much about, let's just
assume you want to reduce the storage of
MD5 in your database, so you want
to abuse
bit strings. A solution to use them works fine, but the datatype is
still missing some facilities, for example going from and to hexadecimal
representation in text.

This time, we are trying to figure out where is the bulk of the data on
disk. The trick is that we're using
DDL partitioning, but we want a “nice”
view of size per
partition set. Meaning that if you have for example a
parent table
foo with partitions
foo_201006 and
foo_201007, you would want
to see a single category
foo containing the accumulated size of all the
partitions underneath
foo.

This time we're having a database where
sequences were used, but not
systematically as a
default value of a given column. It's mainly an historic
bad idea, but you know the usual excuse with bad ideas and bad code: the
first 6 months it's experimental, after that it's historic.

The problem was raised this week on
IRC and this time again I felt it would
be a good occasion for a blog entry: how to load an
XML file content into a
single field?

FOSDEM 2014
http://tapoueh.org/blog/2014/01/29-fodsem-2014.html

This year again the
PostgreSQL community is organising a
FOSDEM PGDay right
before the main event. Have a look at the
PostgreSQL FOSDEM Schedule, it's
packed with awesome talks... personally, it's been a while since I've wanted to
see so many of them!

Import fixed width data with pgloader
http://tapoueh.org/blog/2013/11/18-pgloader-fixed-width-data.html

A long time ago we talked about how to
Import fixed width data with pgloader, following up on other stories still
online at
Postgres OnLine Journal and on
David Fetter's blog. Back then, I
showed that using pgloader made it easier to import the data, but also
showed quite poor performance characteristics due to using the
debug mode
in the timings. Let's update that article with current
pgloader wonders!

Redoing the python based test

Let's be fair, hardware did evolve in those past 3 years, and the test that
ran in 14 seconds was done at the debug logging level, which is the wrong
way to do tests.

First, the URL used in the blog post of april 2010 is no longer valid. You
can find a list of other interesting data at the
Census 2000 Gazetteer Files
page, and of course you have more recent versions of the data available, in
another format entirely this time: tab-separated, csv-like, better for general
usage, but not for this test where I wanted to reuse the same data source as
3 years ago.

What we can see in that command is that pgloader will actually download the
zip archive file from its http source URL, unzip it locally then work on the
filename from the archive matching the one we know about: we don't want to
hardcode in the command the name of the directory contained in the zip file.

Also, contrary to the previous version, it's quite easy to just trim the
loc_name column as we load the data. Here I've been adding a new function to
do that, because I wanted to play with optimizing it (adding type
declarations and inlining it), but the loading works about as well with just
the following (just timed 3 runs at
0.771s,
0.654s and
0.862s):
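A minimal sketch of what such a transformation function can look like in
Common Lisp (the exact definition used in pgloader may differ):

(defun trim (string)
  "Remove spaces around STRING, keeping NULL (nil) values as they are."
  (when string
    (string-trim '(#\Space) string)))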

Note that you could provide that definition in your own
trim.lisp file
and load it using the
--load trim.lisp command line option; pgloader
would then compile that to machine code for you before processing your data
file.

Conclusion

If you're already using pgloader, you will enjoy the new version of it! The
new version comes with a command line option to migrate the old
configuration file into a command string, making upgrades even easier.

Of course, if you're interested, consider giving the release candidate a
try, it's available on the
pgloader github repository already.

Migrating Sakila from MySQL to PostgreSQL
http://tapoueh.org/blog/2013/11/12-migrating-sakila.html

As presented at the
PostgreSQL Conference Europe the new version of
pgloader
is now able to fully migrate a MySQL database, including discovering the
schema, casting data types, transforming data and default values.
Sakila is
the traditional MySQL example database; in this article we're going to fully
migrate it over to PostgreSQL.

What about switching to PostgreSQL, it's easier than ever.

Without further ado, here's what happens when you ask
pgloader to please
migrate the whole thing over to
PostgreSQL:

In those
2 and a half seconds, the whole dataset has been converted. Note
that the indexes are being built in parallel with the data loading, and all
indexes against the same relation are built in parallel to each other,
too.

Here's the
pgloader command that we used:

LOAD DATABASE
     FROM mysql://root@localhost/sakila
     INTO postgresql://localhost:54393/sakila
 WITH include drop, create tables, no truncate,
      create indexes, reset sequences, foreign keys
  SET maintenance_work_mem to '128MB', work_mem to '12MB',
      search_path to 'sakila' /* migrate to a specific schema */
 CAST type datetime to timestamptz
           drop default drop not null using zero-dates-to-null,
      type date drop not null drop default using zero-dates-to-null,
      type year to integer
 BEFORE LOAD DO
 $$ create schema if not exists sakila; $$;

Here's an example of how the casting rules work in that very case, where
we've been using mostly default rules:

We can see that we're using a couple of
custom data types in PostgreSQL;
those are the conversions from the
ENUM and
SET datatypes that the MySQL database
is using here. The
SET datatype is simply converted to an array of ENUM
values in PostgreSQL.

At this point, you're left with reviewing the queries in your code and
adapting those. Also unhandled are the triggers, stored procedures, and
views.

Your turn!

Back From Dublin
http://tapoueh.org/blog/2013/11/05-back-from-dublin.html

Last week I had the pleasure to present two talks at the awesome
PostgreSQL Conference Europe. The first one was actually a tutorial about
Writing & using Postgres Extensions where we spent 3 hours on what are
PostgreSQL Extensions, what you can expect from them, and how to develop a
new one. Then I also had the opportunity to present the new version of
pgloader in a talk about
Migrating from MySQL to PostgreSQL.

Writing and Using PostgreSQL Extensions

This slide set is a series of almost
one hundred SQL queries against
real
life data sets to show off some nice use cases where PostgreSQL Extensions
are useful.

Dive into PostgreSQL Extensions, with detailed use cases

From MySQL to PostgreSQL with pgloader

This talk presents the challenge of migrating your data from MySQL to
PostgreSQL, where it will be protected against malicious abuse of all sorts.
It also goes to some lengths to present
pgloader 3.1, which is going to be the new
version of pgloader, to be released soon, and announced here.

How to migrate from MySQL using pgloader... and why

PostgreSQL Conference Europe

If you've never been to that conference, just realise how awesome it is.
Spending a full week with smart people who want to have good times together,
talking about either your own use case for PostgreSQL and giving away free
advice (read as free consulting too), or talking about the innards of
PostgreSQL in any level of details.

The conference runs in a different European city each year, we made it in
Prato, Italy, then
Paris, France, then
Stuttgart, Germany, then
Amsterdam,
Netherlands, then
Prague, Czech Republic, and this time it was in
Dublin,
Ireland. Who knows where we're heading next year... will you be there?

PostgreSQL User Group in Paris

Also, as announced in the
lightning talks series, the
IRILL, the French
organisation for
Research and Innovation on Free Software, has agreed in
principle to host the PostgreSQL User Group in Paris. I'm soon to pick a
date with them for our first event, stay tuned!

Denormalizing Tags
http://tapoueh.org/blog/2013/10/23-denormalizing-tags.html

In our
Tour of Extensions today's article is about advanced tag indexing. We
have a great data collection to play with and our goal today is to be able
to quickly find data matching a complex set of tags. So, let's find out
those
lastfm tracks that are tagged as
blues and
rhythm and blues, for
instance.

Here,
pgloader extracted the table and index definitions from the SQLite
database using the
sqlite_master catalog and the
PRAGMA table_info()
commands, and migrated the data in a streaming fashion down to PostgreSQL,
using the
COPY protocol.

Having a look at the
demo_tags.py script we can actually see how to use the
relations here, and we realize they are using the
64-bit signed integer ROWID system column. We need something comparable to
be able to make sense of the data:

Here the query is mainly doing a
JOIN between the
tid table (containing
track ids) and the
tid_tag table (containing associations between tracks
and tags), filtering on the
case insensitive regular expression 'setzer'. As
we can imagine from reading the query execution time, we don't have any
index to implement the filtering here.

Advanced tag indexing

PostgreSQL comes with plenty of interesting datatypes, one of them is known
as the
Arrays Type. PostgreSQL also provides a very rich set of extensions,
some of them found under the
contrib package; one of them is
intarray. Let
me quote for you the most interesting part of the documentation for that
extension:

The @@ and ~~ operators test whether an array satisfies a query, which is
expressed as a value of a specialized data type query_int. A query consists
of integer values that are checked against the elements of the array,
possibly combined using the operators & (AND), | (OR), and ! (NOT).
Parentheses can be used as needed. For example, the query 1&(2|3) matches
arrays that contain 1 and also contain either 2 or 3.

PostgreSQL Documentation in F.18. intarray

tags# create extension intarray;
CREATE EXTENSION

The way the
intarray extension works, we need to build a new table that
contains for each track the list of tags it's been associated with, as an
array of integers. We're going to use our
rowid identifier for that purpose,
as in the following query:
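Here is a sketch of that step together with the search it enables; the
tid_tag and tags table names, their rowid columns and the tag label column
are assumptions about the migrated lastfm schema:

-- one row per track, with the array of its tag identifiers
create table track_tags as
   select tid as track, array_agg(tag) as tags
     from tid_tag
 group by tid;

create index on track_tags using gin (tags gin__int_ops);

-- tracks tagged with both "blues" and "rhythm and blues", building the
-- query_int value (something like '123&456') from the tag labels
select track
  from track_tags
 where tags @@ (select string_agg(rowid::text, '&')::query_int
                  from tags
                 where tag in ('blues', 'rhythm and blues'));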

That query here allows us to easily inject as many tags as we want to, so
that it's easy to use it as a
template from within an application where the
user is going to provide the tags list. The
intarray extension's
query
format also accepts other operators (
or and
not) as we saw before, so if you
want to expose those to your users you would need to tweak the
query_int
building part of the SQL.

Now, how many tracks have been tagged with
both the
blues and the
rhythm and
blues tags, you will ask me:

Conclusion

The usual way to handle a set of user defined tags and query against it
involves joining against a reference table of tags, but then it's quite
complicated to express the full search query: we want tracks tagged with
both
blues and
rhythm and blues and might then want to exclude
finger
picking.

The
intarray extension provides a powerful
specialized query language with
direct index support, so that you can build dynamic indexed searches
directly from your application.

An Interview about MariaDB and PostgreSQL
http://tapoueh.org/blog/2013/10/16-Interview-MariaDB-PostgreSQL.html

At the
Open World Forum two weeks ago I had the pleasure to meet with
Colin Charles. We had a nice talk about the current state of both
MariaDB
and
PostgreSQL, and even were both interviewed by the Open World Forum Team.
The interview is now available online. Dear French readers, it's in English.

Allo Mum? Yeah, I'm on TV. Well, Actually, Internet TV.

Here's the video:

Executive Summary: MariaDB is a drop-in fully Open Source replacement for
MySQL and sees quite some progress and innovation being made, and
PostgreSQL
is YeSQL!

PostgreSQL Autonomous Transaction
http://tapoueh.org/blog/2013/10/14-autonomous-transactions.html

PostgreSQL is an all round impressive
Relational DataBase Management System
which implements the SQL standard (see the very useful reference page
Comparison of different SQL implementations for details). PostgreSQL also
provides unique solutions in the database market and has been leading
innovation for some years now. Still, there's no support for
Autonomous
Transactions within the server itself. Let's have a look at how to easily
implement them with
PL/Proxy.

In PostgreSQL we have
pluggable languages: it's possible to add support for
programming languages to write your own
stored procedures, and the core
server ships with support for six such languages:
PL/C,
PL/SQL,
PLpgSQL,
PL/perl,
PL/python and
PL/tcl.

The
PL/Proxy procedural language is not about providing an existing
programming language. It's about providing the user with
remote procedure
call and
sharding facilities to spread any kind of load between a herd of
PostgreSQL servers. It's a very good
Scaling Out solution, that we're going
to use for something quite different here.

Remote Procedure Calls

Now, the main feature
PL/proxy provides and that we're going to benefit from
today is the
remote procedure call facility: when a function is called on a
server, proxy it to another one, calling the same function with the same
arguments over there, and fetching the result back. Classic
proxying and
RPC.

PL/Proxy will use binary transfers when possible to avoid marshalling steps here

The main thing to understand with
PL/Proxy is that the remote procedure call
happens in its own transaction: when it returns, it's already been committed
on the remote server. So there's no local control over the PL/Proxy
transaction; if you need to
ROLLBACK locally, that's too late.

Well, that limitation actually is a very interesting feature if what you
want to obtain is an
Autonomous Transaction, because it's actually what it
is. If you want the Autonomous Transaction to happen locally, all that's needed
is to connect the proxy back to the current database.

A practical Use Case: auditing trigger

Autonomous Transactions are useful when several units of processing need to
be done, and we want to be able to
COMMIT some of those without impacting
the main transaction's ability to
ROLLBACK.

Note that with the
savepoint standard feature it's possible to
ROLLBACK a
part of the processing while still issuing a
COMMIT for the overall
transaction, so if that's what you need to do, you're already covered with a
stock install of PostgreSQL.

Now, say you want to log any attempt to
UPDATE a row in that specific
critical table of yours, even if the transaction is then aborted. That's
often referenced under the name
audit trigger and we already saw how to
implement such a trigger in our article
Auditing Changes with Hstore.

The whole goal of today's exercise is to populate our
audit table even
when the main transaction fails. Let's first see what happens with the
solution we had already when we
ROLLBACK the main transaction:

Installing PLproxy

It begins as usual:

~# create extension plproxy;
CREATE EXTENSION

For that command to work you need to have installed the Operating System
Package for plproxy (if using PostgreSQL 9.3 under debian you need to
install the
postgresql-9.3-plproxy package, as found in the
PostgreSQL debian repository). If you don't have a package for
PL/Proxy you
need to fetch the sources from
https://github.com/markokr/plproxy-dev then
run
make install.

The Setup

Now that we have the extension, we need to use the
CREATE SERVER command to
have an entry point to a
remote transaction on the same connection string.
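A sketch of that setup; the database name mydb is illustrative, and the
plproxy foreign data wrapper comes with the extension we just created:

create server local foreign data wrapper plproxy
       options (p0 'dbname=mydb host=localhost');

create user mapping for public server local;

-- a trivial proxied function, to test the round trip
create function test_proxy(i integer) returns integer
 language plproxy
as $$
  cluster 'local';
  run on any;
  select $1;
$$;

select test_proxy(1);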

So we have a
PL/proxy cluster to use, named
local, and we tested it with a
very simple function that just returns whatever integer we give it as an
argument. We can see that the overhead to reconnect locally is not daunting
in our pretty simple example.

The remote auditing trigger

We already had a
trigger function named
audit that will work with the magic
variables
NEW and
OLD. What we want now is that the trigger function issues
a
remote procedure call to our
PL/proxy local connection instead:
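A simplified sketch of that setup; the example table, the audit_log table and
the hstore-based audit function are stand-ins for the ones from the previous
article:

create extension if not exists hstore;

create table audit_log(at timestamptz default now(), changes hstore);

create function audit(changes hstore) returns void
 language sql
as $$ insert into audit_log(changes) values ($1); $$;

-- runs in its own transaction, through the 'local' PL/Proxy cluster
create function audit_proxy(changes hstore) returns void
 language plproxy
as $$
  cluster 'local';
  run on any;
  select audit($1);
$$;

create function audit_trigger() returns trigger
 language plpgsql
as $$
begin
  -- keep only the fields that changed, and log them even if the
  -- surrounding transaction gets rolled back
  perform audit_proxy(hstore(new) - hstore(old));
  return new;
end;
$$;

create trigger audit_update
       before update on example
       for each row execute procedure audit_trigger();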

What you can see in that new setup is that the trigger calls the function
audit_trigger, which in turn calls the function
audit_proxy. That
proxy
function is the key for us to benefit from the
PL/proxy remote transaction
management; all the proxy function does is connect back to
localhost then
call the function named
audit with the same parameter it got called with.

The previous example was made quite generic thanks to using
hstore. We can't
use
pseudo types with
PL/proxy so we need a pair of functions per table we
want to be able to audit in this fashion.

Conclusion

Thanks to a design where extensibility is a first class citizen,
PostgreSQL
makes it possible to implement
Autonomous Transactions without having to
edit its source code. Here we're using a special kind of a
Foreign Data Wrapper: the
PL/proxy driver allows implementing
remote
procedure calls and
sharding.

If you need
Autonomous Transactions and thought PostgreSQL might not be
suitable in your case, now is the time to review your choice!

Geolocation with PostgreSQL
http://tapoueh.org/blog/2013/10/09-ip4r.html

Let's get back to our
Tour of Extensions that had to be kept aside for
awhile with other concerns such as last chance
PostgreSQL data recovery. Now
that we have a
data loading tool up to the task (read about it in the
Loading Geolocation Data article) we're going to be able to play with the
awesome
ip4r extension from
RhodiumToad.

The name of the game is to put IP addresses on a map

In this article we are going to
geolocate our users given their IP
address. That can be useful when you know the
timezone settings of the
different locations on earth that you have in the database, to automatically
adapt to your user's current location for example. Of course we're going to
do something more exciting than that. Read on!

What we have here is a classic
ip range column where we can see that the
datatype output function is smart enough to display ranges either in their
CIDR notation or in the more general
start-end notation when no CIDR
applies.

The IP4R operators are not written this way

The
ip4r extension provides several operators to work with the dataset we
have, and some of those operators are supported by the index we just created.
And just for the fun of it here's a catalog query to inquire about them:
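One way to write such a catalog query (a sketch; the original article's
version may differ):

select amop.amopopr::regoperator as operator
  from pg_opclass c
  join pg_am am     on am.oid = c.opcmethod
  join pg_amop amop on amop.amopfamily = c.opcfamily
 where am.amname = 'gist'
   and c.opcintype = 'ip4r'::regtype
 order by 1;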

Note that we could have been using the psql
\dx+ ip4r command instead of
course, but that query directly lists operators that the
GiST index knows how
to solve. The operator
>>= reads as
contains and is the one
we're going to use here.

Geolocation meta-data

Now with the
MaxMind schema that we are using in that example, the
interesting data actually is to be found in the other table, the
geolite.location one. Let's use another IP address now; I'm told that
google.us has address 74.125.195.147, where is that IP from:

Emergency Pub

What if you want to make an application to help lost souls find the nearest
pub from where they are currently? Now that you know their location from the
IP address they are using in their browser, it should be easy enough right?

The pub names I got last time were all located in the UK, so we need a UK-
based IP address now: it seems that
bbc.co.uk has address 212.58.251.195.
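A sketch of the nearest-pub lookup; the pubs table and its pos point column
are illustrative, standing in for the data from The Most Popular Pub Names
article:

with here as (
  select l.location as pos
    from geolite.blocks b
    join geolite.location l using (locid)
   where b.iprange >>= '212.58.251.195'::ip4
)
  select p.name, p.pos <-> here.pos as distance
    from pubs p, here
order by p.pos <-> here.pos
   limit 10;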

Conclusion

While some
geolocation data providers are giving you some libs and code to do
quite fast lookups, any interesting thing you want to do with the
geolocation data is about the
meta data. And that's where yet again
PostgreSQL shines: you can actually use specialized data types and
operators,
JOINs and
KNN searches, all from within a single query. You get
back only those results you are interested into, and the application is then
responsible for adding value to that, rather than processing the data
itself.

Typically what the application here would be doing is drawing a map and
locating the pubs on it, adding maybe descriptions and votes and notes on
each address, maybe even the draft menu. An ideal application might even be
able to join the draft menu of each nearby pub against your own preferences
and offer you a nice short list ordered by what you're most likely to want
to drink at this hour.

Living in the future is both exciting and frightening!

Open World Forum Conference
http://tapoueh.org/blog/2013/10/07-Open-World-Forum.html

Last Friday I had the chance to be speaking at the
Open World Forum in the
NewSQL track, where we had lots of interest and excitement around the
NoSQL
offerings. Of course, my talk was about explaining how
PostgreSQL is Web Scale with some historical background and technical
examples about what this database engine is currently capable of.

A Worthwhile Micro Optimisation
http://tapoueh.org/blog/2013/10/03-micro-optimizing-int-to-ip-address.html

In our previous article about
Loading Geolocation Data, we did load some
data into PostgreSQL and saw the quite noticeable impact of a user
transformation. As it happens, the function that did the integer to IP
representation was so naive as to scratch the micro optimisation itch of
some Common Lisp hackers: thanks a lot guys, in particular
stassats who came
up with the solution we're seeing now.

Quite a direct, naive implementation, which is good to show what you can
expect in a kind of worst case, and that worst case was using
31.546
seconds rather than
17.425 seconds when not doing any conversion. Well of
course the python code was spending
78.979 seconds for not doing any
conversion, but that's not the topic today.

Let's now see one of the micro optimised solutions, the one I picked among a
list of 8 different proposals, each a little more crazy than the previous
one:
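A sketch of that idea (not necessarily the exact proposal that won):

(defun ip-to-string (ip)
  "Turn a 32-bit integer IP into its dotted text representation, using a
   load-time table of the representations of every pair of bytes."
  (declare (type (unsigned-byte 32) ip))
  (let ((table (load-time-value
                (let ((v (make-array 65536)))
                  (dotimes (i 65536 v)
                    (setf (aref v i)
                          (format nil "~a.~a"
                                  (ldb (byte 8 8) i)
                                  (ldb (byte 8 0) i)))))
                t)))
    (concatenate 'string
                 (aref table (ldb (byte 16 16) ip))
                 "."
                 (aref table (ldb (byte 16 0) ip)))))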

As usual the idea is to compute all you can in advance, here thanks to the
load-time-value special operator that's part of the Common Lisp Standard. So
we compute a table of all the dotted representations for pairs of two bytes,
and we do that computation at
load time, which happens when you load the
compiled code artifact you generated from your sources. Then all we have to
do is take the upper and lower pairs of bytes, fetch their representations in our
table, and concatenate both with a middle dot.

The reason why we only keep 2 bytes in the table is so that we don't require
about
64 GB of memory to be able to transform ip addresses...

Thanks to the optimisation, the
two bigint as text to iprange as text
transformation now has an added cost of
620 ms with our data set. The whole
file loading is now averaging to
10.07841 µs per row, or just a tad more
than
10 microseconds per row, transformation included this time.

Less than a second of added cost within a complete process taking around a
minute, that basically means that the transformation is now free.

Despite what you might have read elsewhere, my experience with the Common
Lisp Community so far really is great!

Loading Geolocation Data
http://tapoueh.org/blog/2013/10/01-loading-geolocation-data.html

As I've been mentioning in the past already, I'm currently rewriting
pgloader from scratch in
Common Lisp. In terms of
technical debt that's akin
to declaring
bankrupcy, which is both sad news and good news as there's
suddenly new hope of doing it right this time.

Let's dive into the python to common lisp rewrite

Why rewrite pgloader?

Several problems nudged me toward doing something other than maintaining the
code I had for
pgloader.

First, the configuration file format was already used well beyond its
capacities: it's quite hard to express advanced fine grained options when
using the infamous
INI file format. Continuing with python I did consider
using the
pyparsing library and even went as far as writing a prototype
command language with it. While the lib was working great for that, at the
time at least it wasn't available for
Python 3 and was not to be found in some
OS vendors' package repositories.
pgloader is available today for Debian and derivatives, RedHat
and derivatives, FreeBSD, OpenBSD, and some more. And you can easily enough
use it on Windows too, because it basically only depends on
Psycopg, so
I wanted to keep things simple for the end user. Integrating
that lib was a problem.

Then again, performance was a bigger and bigger worry for me as the
size of the files one wants to load into their database grew bigger. The real
problem was not perceived as the current performance characteristics but
rather how to improve them, and I didn't see many ways forward. The most
obvious one was to rewrite critical parts in C, but if I need to switch away
from python I'd rather find a better compromise than writing python modules
in C.

That's in short what led me to consider alternatives. And as I had some
really good preliminary performance results with a programming language I
can actually like, I decided to go with Common Lisp.

A Test case

The blog post I actually wanted to write would have been about the awesome
ip4r extension, which is about indexing IP range searches as mostly used
nowadays in
geoip location. The first step here is to find some data to use
and load it, and wanting to do that I realised we were lacking an
all-integrated tool easy enough to use.

And that's exactly the reason why I'm working on
pgloader in the first
place, to make it really easy to load data, converting it on the fly and all
the jazz. And that
ip4r test case actually has been a very good excuse for
improving what I already had.

The new pgloader command language

Here's the current way of doing things with the new
pgloader written in
Common-Lisp, then I'll tell you why using a
parser generator library in that
language is less of a burden than doing the same thing in Python:

What you see here looks quite complex and heavy. That's because it's
addressing the full set of needs; let's see about that:

the source of the data loading is a zip archive that we have to download from a http url,

the zip archive contains a single directory entry wherein we find the data files we want to load, currently GeoLiteCity_20130903... as we want to be able to update the data without having to edit the command, you will note the support for matching the file name against a full path name: FILENAME MATCHING ~/GeoLiteCity-Blocks.csv/ is applying a regular expression,

we want the loading process to take care about creating the database schema we're going to use, and even drop the current index if any before loading, because we know that loading the data in bulk and then creating the index is way faster than keeping the index around,

there's no out of band place where to register whether a field value is null in a plain text CSV file, even if technically that could be done, so we have to provide per field instructions on what a NULL will look like in the data itself,

the files we're using don't have the exact definition we want: rather than a pair of double precision fields for hosting the location we want a PostgreSQL point datatype column; and rather than a pair of bigint fields we want an ip4r column in PostgreSQL,

and of course as soon as the data loading is done we want to create the shiny ip4r GiST index.

And as we are using an attribution licence for the data here, let me tell
you that: This article includes
GeoLite data created by
MaxMind, available
from
http://www.maxmind.com.

Transforming and Projecting

The former
pgloader was capable of using dynamic python code to reformat
fields on the fly, which was already quite good, but not up to the task
described just above. What we really want to be able to do here is
project
any number of fields into a
possibly different number of columns that are
dependent on the fields.

Traditionally used for Full Outer Join, but I liked the symbol

In the simplest case possible, each field ends up in a column of the same
name, and that's still supported of course; it looks like this:

We can see new sections, as the command is now actually extracting the
archive (given the size and how many tests I wanted to do, the fetching
step has been avoided in the tests here), then preparing the load with all
the CREATE TABLE IF NOT EXISTS steps, and then actually parsing the files
and loading the data.

So for doing the same work, we went from nearly
2 minutes with the previous
solution down to less than
30 seconds, including the zip archive extraction
and the create table steps. Another way to look at it is that
pgloader here
spent
9.73213 µs to process each row. Yeah, less than 10
microseconds per
row.

Declaring bankruptcy on the previous code base was maybe not the worst
decision I made this year, after all...

I want to note here that the python importing code is actually using the
csv
module for reading the files, and that module is already written in C in
fact. So we're actually comparing python and C on the one hand to Common
Lisp alone on the other hand.

Common Lisp version with fields to columns projection

Now, as we have some time budget left, let's have the loader actually do
more work for us and convert on the fly to the data representation we are
interested in:

So this time with the extra work we're going from about 18 seconds up to
about 32 seconds. The projection of integers to IP ranges is costly, but
we are still so far ahead of the old solution that we have enough time to
run the command create index blocks_ip4r_idx on geolite.blocks using
gist(iprange); and still finish about 40 seconds earlier.

This time 31,456,000 microseconds were spent loading 1,790,461 rows, which
gives us 17.568659 µs per row on average, including the two bigint as text
to iprange as text transformation calls. Not too bad.

Conclusion

Earlier in the article I said I would tell you why it's ok to use a
parser
generator library such as
esrap here, whereas in the python case it was a
burden. The first part is that we don't have the major
platform problem that
python has with version 2 against version 3:
the Common Lisp Specs are a
really
stable base to develop libs, and so even decade old untouched code
has a really good chance to work as-is.

Another kind of stable, here

It's also possible and should be easy enough with cl-buildapp to produce
a static all-included binary image of the application so that users don't
have to care about which programming language and libraries are used when
developing pgloader.

And of course the proficiency and interactivity of using Common Lisp are
so much better than with Python that, given the performance
characteristics we're seeing here and in almost any other test, you almost
wonder why so many people are still using Python.

Can I use that new software today?

Yes you can, but be aware that it's not for the faint of heart in its
current shape. Most of the development time has been spent on feature
development and testing, and on integration with a nice enough command
language, and very little consideration has been given yet to ease of use
and advanced error management.

If you're a Common Lisp user already, you can fetch the code and look
around, the
docstrings and the code generated by the
parser should help you
figure out the API and use it. Of course, any feedback is welcome!

If you're not a Common Lisp user already, then the
README.md file has
instructions to get you started with the software, and you can hack your way
around with the command language from examples in that blog post. Proper
documentation of the supported commands and their options is still on the
to-do list.

The
pgloader software is distributed under the same Licence as PostgreSQL
itself, a licence somewhere in between the MIT and 2-clause BSD ones.

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 01 Oct 2013 16:52:00 +0200http://tapoueh.org/blog/2013/10/01-loading-geolocation-data.htmlOpen World Forum 2013http://tapoueh.org/blog/2013/09/19-OSCD-2013.htmlHave you heard about the
Open World Forum conference that takes place in
Paris, October 3-5, 2013? I'll be presenting a talk about
PostgreSQL in the
track
NewSQL: Managing large data sets with relational technologies.

We call it the world's most advanced open source database, and we are
actually offering in the same package full ACID compliance per default and
advanced trade-offs to reach any kind of flexibility needed, all with
per-transaction controls. That's including your replication and High
Availability setup. Given proper use of those controls and some schemaless
extensions, PostgreSQL really is web scale!

I hope to meet you there, on Friday the 4th of October, 10am!

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 19 Sep 2013 17:44:00 +0200http://tapoueh.org/blog/2013/09/19-OSCD-2013.htmlPostgreSQL data recoveryhttp://tapoueh.org/blog/2013/09/16-PostgreSQL-data-recovery.htmlThe following story is only interesting to read if you like it when bad
things happen, or if you don't have a trustworthy backup policy in place. By
trustworthy I mean that
each backup you take
must be tested with a test
recovery job. Only tested backups will prove useful when you need them. So
go read our
Backup and Restore documentation chapter then learn how to setup
Barman for handling
physical backups and
Point In Time Recovery. Get back
when you have proper backups, including recovery testing in place. We are
waiting for you. Back? Ok, let's see how bad you can end up without backups,
and how to still recover. With luck.

Set a trustworthy backup solution, and review your policy

Did I mention that a
trustworthy backup solution includes automated testing
of your ability to recover from any and all backup you've been taking? That
might be more important than you think.

This article is going to be quite long. Prepare yourself a cup of your
favorite beverage.
TL;DR PostgreSQL resilience, flexibility and tolerance
to
bad situations is quite remarkable and allowed us to get
some data back
in the middle of a
destroyed cluster.

The Setup

Most of the customers I visit have already laid out a
backup strategy and
implemented it. Most of them did implement it with custom in-house scripts.
They hire high-skilled engineers who have been doing system administration
for more than a decade, and who are more than able to throw a
shell script
at the problem.

Shell scripting must in hindsight be one of the most difficult things to
do right, given how many times it ends up doing something else entirely
than what the program author thought it would. If you want another piece
of my rather blunt advice, stop doing any shell scripting today: a shell
is a nice interactive tool; if you are doing non-interactive scripting,
that's actually called system programming and you deserve a better tool
than that.

In our very case, the customer did realize that a
production setup had been
approved and was running live
before any backup solution was in place. Think
about it for a minute. If you don't have
tested backups in place, it's not
production ready.

Well, the incident was taken seriously, and the usual backup
scripts
deployed as soon as possible. Of course, the shell scripts depended in
non-obvious ways on some parameters (environment variables, database
connections, database setup with special configuration tables and rows). And
being a
shell script, not much verification that the setup was complete had
been implemented, you see.

The Horror Story

And guess what is the first thing that backup script does? Of course,
making sure enough space is available on the file system to handle the
next backup. That's usually done by applying a
retention policy and first
removing backups that are
too old given said policy. And this script too did
exactly that.

The problem is that, as some of you already guessed (yes, I see those smiles
trying to hide brains thinking as fast as possible to decide if the same
thing could happen to you too), well, the script configuration had not been
done before entering production. So the script ran without setup, and
without much checking, began making bytes available. By removing any file
more than 5 days old. Right. In.
$PGDATA.

But recently modified files are still there, right?

Exactly, not all the files of the database system had been removed. Surely
something can be done to recover data from a very small number of important
tables? Let's now switch to the present tense and see about it.

Can you spell data loss?

Remember, there's no
backups. The
archive_command is set though, so that's a
first track to consider. After that, what we can do is try to start
PostgreSQL on a copy of the remaining
$PGDATA and massage it until it allows
us to
COPY the data out.

The desperate PITR

The
WAL Archive is starting at the file
000000010000000000000009, which
makes it unusable without a corresponding
base backup, which we don't have.
Well, unless maybe if we tweak the system. We need to first edit the system
identifier, then reset the system to only begin replaying at the file we do
have. With some luck...

A broken clock is still right twice a day, a broken backup never is...

Using the transaction data we get from reading the first archive log file we
have with
xlogdump then using
pg_resetxlog and thus accepting to maybe lose
some more data, we still can't start the system in
archive recovery mode,
because the
system identifier is not the same in the WAL files and in the
system's
pg_controldata output.

So we tweak our fresh cluster to match, by changing the first 8 bytes of the
control file, paying attention to the byte order here. As I already had a
Common Lisp
REPL open on my laptop, the easiest way for me to convert the decimal
representation of the database system identifier was this:

Paying attention to the byte order means that you need to edit the control
file's first 8 bytes in reverse:
F3 3D 3B 52 71 3D 33 52. But in our case,
no massaging seems to allow PostgreSQL to read from the archives we have.

On to massaging what is remaining in the old cluster then.

The Promotion

I'm usually not doing promotion in such a prominent way, but I clearly
solved the situation thanks to my colleagues from the 24/7 support at
2ndQuadrant, with a special mention to
Andres Freund for inspiration and
tricks:

We also know how to recover your data, but we first insist on proper backups

Oh, did I mention about proper backups and how you need to have been
successfully testing them before you can call a service
in production or
have any hope about your recovery abilities? I wasn't sure I did...

Playing fast and loose with PostgreSQL

The damaged cluster is not starting, for lack of important meta-data kind-of
files. First thing missing is
pg_filenode.map in the
global directory. Using
xlogdump it should be possible to recover just this file if it's been
changed in the WAL archives we have, but that's not the case.

Trying to salvage a damaged case

pg_filenode.map

As this file is only used for shared relations and some bootstrapping
situations (you can't read the current pg_class file node from pg_class,
as the file mapping is the information you need to know which file to
read), and knowing that the version on disk was older than 5 days on a
cluster recently put into production, we can allow ourselves to try
something: copy the pg_filenode.map from another fresh cluster.

My understanding is that this file only changes when doing heavy maintenance
on system tables, like
CLUSTER or
VACUUM FULL, which apparently didn't get
done here.

By the way, here's one of those tricks I learnt in this exercise. You can
read the second and fourth columns as filenames in the same directory:

od -j 8 -N $((512-8-8)) -td4 < $PGDATA/global/pg_filenode.map

So copying a default pg_filenode.map allowed us to get past that error
and on to the next.

pg_clog

Next is the lack of some
pg_clog files. That's a little tricky because those
binary files contain the
commit log information and are used to quickly
decide if recent transactions are still in flight, or committed already, or
have been rolled back. We can easily trick the system and declare that all
transactions older than 5 days (remember the bug in the
cleanup script was
about that, right?) have in fact been
committed. A commit in the
CLOG is a
01 value, and in a single byte we can stuff as many as 4 transactions'
status.

Here's how to create those files from scratch, once you've noticed that
01010101 is in fact the ASCII code for the letter U.

(code-char #b01010101)
#\U

So to create a series of clog files where all transactions have been
committed, so that we can see the data, we can use the following command
line:

pg_database

The next step we are confronted with is that PostgreSQL has lost its
backing files for the pg_database relation and has no idea what those
directories in $PGDATA/base are supposed to be all about. We only have the
numbers!

That said, the customer still had a history of the commands used to
install the database server, so we knew in which order the databases were
created. So we had an OID to name mapping. How to apply it?

Well
pg_database is a shared catalog and the underlying file apparently
isn't that easy to hack around, so the easiest solution is to actually hack
the
CREATE DATABASE command and have it accept a
WITH OIDS option (
OIDS is
already a PostgreSQL keyword,
OID is not, and we're not going to introduce
new keywords just for that particular patch).

Equipped with that
hacked version of PostgreSQL it's then possible to use
the new command and create the databases we need with the
OIDS we know.

Those
OIDS are then to be found on-disk in the file where
pg_database is
internally stored, and we can ask the system where that file is:
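
A query along these lines does the trick (a sketch, not necessarily the
exact one used during the recovery):

-- ask a healthy cluster where the pg_database catalog lives on disk,
-- the path being relative to $PGDATA
select pg_relation_filepath('pg_database');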

Once that file is copied over to the (running, as it happened) damaged
cluster, it's then possible to actually open a connection to a database. And
that's pretty impressive. But suddenly it didn't work anymore...

System Indexes

This problem was fun to diagnose. The first
psql call would be fine, but the
second one would always complain with an error you might have never seen in
the field. I sure didn't before.

FATAL: database "dbname" does not exist
DETAIL: Database OID 17838 now seems to belong to "otherdbname"

Part of PostgreSQL startup is building up some caches, and for that it's
using indexes. And we might have made a mistake, or the index is corrupted,
but apparently there's a mismatch somewhere.

But your now all-time favourite development team knew that would happen to
you and is very careful that any feature included in the software is able to
bootstrap itself without using any indexes. Or that in bad situations the
system knows how to resist the lack of those indexes by turning the feature
off, which is the case for
Event Triggers for example, as you can see in the
commit
cd3413ec3683918c9cb9cfb39ae5b2c32f231e8b.

Another kind of indexing system

So yes, it is indeed possible to start
PostgreSQL and have that marvellous
production ready system avoid any system indexes, for dealing with cases
where you have reasons to think those are corrupted... or plain missing.

While at it, we edit the postgresql.conf and adjust some index usage
related settings, as you can see, because this problem will certainly
happen outside of the system indexes.

If you're not using (only) PostgreSQL as your database system of choice, now
is the time to check that you can actually start those other systems when
their internal indexes are corrupted or missing, by the way. I think that
tells a lot about the readiness of the system for production usage, and
the attitude of the developers towards what happens in the field.

Also note that with PostgreSQL it's then possible to rebuild those system
indexes using the
REINDEX command.

So we now have a running PostgreSQL service, serving the data that is
still available. Well, not quite: we have a PostgreSQL service that agrees
to start and allows connections to a specific database.

pg_proc, pg_operator, pg_cast, pg_aggregate, pg_amop and others

The first query I did try on the new database was against
pg_class to get
details about the available tables. The
psql command line tool is doing a
large number of queries in order to serve the
\d output, the
\dt one is
usable in our case.

To know what queries are sent to the server by
psql commands use the
\set
ECHO_HIDDEN toggle.

Just about any query is now complaining that the target database is missing
files. To understand which file it is, I used the following query in a fresh
cluster. The following example is about an error message where
base/16384/12062 is missing:
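
Something like the following sketch, run in a scratch cluster, maps the
filenode part of the missing path back to a relation name:

-- 12062 is the filenode taken from the reported path base/16384/12062
select oid::regclass as missing_relation
  from pg_class
 where relfilenode = 12062;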

In our specific case, no extensions were used. Check that before taking
action here, or at least make sure that the tables you want to try and
recover data from are not using extensions, as that would make things so
much more complex.

Here we can just use default settings for most of the system catalogs: we
are using the same set of
functions,
operators,
casts,
aggregates etc as any
other 9.2 system, so we can directly use files created by
initdb and copy
them over where the error message leads.

pg_namespace

Some error messages are about things we should definitely not ignore. The
content of the pg_namespace relation was lost in about all our databases,
and the application here was using non-default schemas.

To recover from that situation, we need to better understand how this
relation is actually stored:
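
A quick look at its content and its on-disk location, on a working
cluster, is enough here (a sketch):

-- pg_namespace is a small per-database catalog: a name, an owner, an ACL
select oid, nspname, nspowner, nspacl from pg_namespace;

-- and here is the file backing it, relative to $PGDATA
select pg_relation_filepath('pg_namespace');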

So it's pretty easy here, actually, when you make the right connections:
let's import a default
pg_namespace file then append to it thanks to
COPY
IN, being quite careful about using
tabs (well, unless you use the
delimiter
option of course):

# copy pg_namespace from stdin with oids;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 16443 my_namespace 10 \N
>> \.

And now there's a new
schema in there with the
OID we want. Wait, how do we
figure out the OID we need?
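
The query in question might have looked like this sketch, joining what is
left in pg_class against the recovered pg_namespace catalog:

-- the table's row in pg_class still carries the namespace OID we need
select c.relname, c.relnamespace, n.nspname
  from pg_class c
       left join pg_namespace n on n.oid = c.relnamespace
 where c.relname = 'bar';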

So in the result of that query we have no
nspname, but we happen to know
that the table
bar is supposed to be in the schema
my_namespace.

And believe it or not, that method actually allows you to create a schema
in a database in a running cluster. We are directly editing the catalog
files, even setting the OID of the rows we are injecting.

The reason we couldn't do that with
pg_database, if you're wondering about
that, is that
pg_database is a shared catalog and part of the bootstrapping,
so it was impossible to start PostgreSQL until we fixed it, and the only
implementation of
COPY we have requires a running PostgreSQL instance.

pg_attribute and pg_attrdef

Now that we are able to actually refer to the right relation in a SQL
command, we should be able to dump its content, right? Well, it so happens
that in some cases it's OK and in some cases it's not.

We are very lucky in that exercise in that
pg_attribute is not missing. We
might have been able to rebuild it thanks to some
pg_upgrade implementation
detail by forcing the
OID of the next table to be created and then issuing
the right command, as given by
pg_dump. By the way, did I mention backups? And automated recovery tests?

We need the data attributes

In some cases though, we are missing the
pg_attrdef relation, wholesale.
That relation is used for default expressions attached to columns, as we can
see in the following example, taken on a working database server:

We need to remember that the goal here is to salvage some data out of an
installation where lots is missing, it's not at all about being able to ever
use that system again. Given that, what we can do here is just ignore the
default expression of the columns, by directly updating the catalogs:
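
A sketch of that catalog update, with an illustrative table name, could
look like this:

-- pretend the columns never had a default expression, so that PostgreSQL
-- stops looking it up in the missing pg_attrdef catalog
update pg_attribute
   set atthasdef = false
 where attrelid = 'my_namespace.bar'::regclass;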

COPY the data out! now!

At this point we are now able to actually run the
COPY command to store the
interesting data into a plain file, that is going to be usable on another
system for analysis.

Not every relation from the get go, mind you; sometimes some
default catalogs
are still missing, but in that instance of the data recovery we were able to
replace all the missing pieces of the puzzle by just copying the underlying
files as we did in the previous section.

Conclusion

Really,
PostgreSQL once again surprises me by its flexibility and
resilience. After having tried quite hard to kill it dead, it was still
possible to actually rebuild the
cluster into shape piecemeal and get the
interesting data back.

Using an already made solution is often better because they don't just
include
backup support, but also
recovery support. You don't want to figure
out recovery at the time you need it, and you don't want to have to discover
if your backup really is good enough to be recovered from at the time you
need it either. You should be testing your backups, and the only test that
counts is a recovery.

It's even one of those rare cases where using
PostgreSQL replication would
have been a solution: the removing of the files did happen without
PostgreSQL involved, it didn't know that was happening and wouldn't have
replicated that to the standby.

The use of trigrams is often complementary to Full Text Search. With
trigrams we can implement typing correction suggestions or index like and
POSIX Regular Expression searches.

Whatever the use case, it all begins as usual by enabling the extension
within your database server. If you're running from
PostgreSQL packages be
sure to always install the
contrib package, really. A time will come when
you need it and you will then be happy to only have to type
CREATE EXTENSION
to get started.

# create extension pg_trgm;
CREATE EXTENSION

Setting up the use case

The use case I want to show today is to suggest corrections to some words
the user obviously typoed, because your search form is not finding any
results. Or maybe to offer a suggest-as-you-type feature, doing a database
search for approximately matching strings in a kind of catalog, so as to
offer auto-completion.

This small database offers ten thousand products and simplifies the schema
so much as to offer a single column
actor in the
products table. Let's
pretend we just filled in a search box to find products by actor name, but
we don't know the right spelling of the actor's name or maybe the cat really
wanted to help us on the keyboard that day.

A cat! that picture should at least double the traffic to this article...

The
trigram extension comes with two operators of interest for this
situation here, which are the
similarity operator named
% and the
distance
operator named
<->. The
similarity operator will compare the list of
trigrams extracted from the query terms with those extracted from each data
of our table, and filter out those rows where the data is considered not
similar enough.
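
Here is a sketch of such a search, using a deliberately misspelled actor
name (the exact queries from my tests differ):

-- keep only actors similar enough to the search term, nearest first
select actor, similarity(actor, 'tomy lee jnes') as sim
  from products
 where actor % 'tomy lee jnes'
 order by actor <-> 'tomy lee jnes'
 limit 5;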

Oh and by the way, did you know that the
~* operator we used above to
discover that there's not a single
Tony actor in our products table, that
~*
operator implements a
case insensitive posix regex search in PostgreSQL?
Isn't that awesome? Now, on to the next surprise, have a look at that
explain plan:
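
The setup behind it is simple enough to sketch: a trigram GiST index on
the column, then EXPLAIN on the similarity query (index name illustrative):

create index products_actor_trgm_idx
    on products using gist (actor gist_trgm_ops);

explain (analyze, costs off)
select actor
  from products
 where actor % 'tomy lee jnes'
 order by actor <-> 'tomy lee jnes'
 limit 5;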

Note that without the
WHERE clause to filter on the trigram similarity I get
run times of 30ms rather than 3ms in my tests here, because the GiST index
is not used then. As usual
EXPLAIN is your friend and remember that a query
plan will change depending on the volume of your data set as known by the
PostgreSQL planner statistics.

Conclusion

The
trigram extension allows indexing
like searches and
regular expression
searches, and also knows how to compute similarity and distance between
texts, and how to index that. That's another power tool included with
PostgreSQL. Another reason why you won't believe how much behind the other
database systems you know of really are, if you ask me.

When all you have is a hammer... doesn't apply to PostgreSQL

Oh, and get ready for PostgreSQL 9.3. Another release packed with awesome.

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 06 Sep 2013 16:15:00 +0200http://tapoueh.org/blog/2013/09/06-pg_trgm-suggestions.htmlAuditing Changes with Hstorehttp://tapoueh.org/blog/2013/08/27-auditing-changes-with-hstore.htmlIn a previous article about
Trigger Parameters we have been using the
extension
hstore in order to compute some extra field in our records, where
the fields used both for the computation and for storing the results were
passed in as
dynamic parameters. Today we're going to see another
trigger
use case for
hstore: we are going to record changes made to our tuples.

Comparing hstores

One of the operators that hstore proposes is the hstore - hstore operator,
whose documentation says that it will delete matching pairs from the left
operand.
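
A tiny example shows the behaviour we are after: only the pairs that
actually changed remain in the result.

select 'a=>1, b=>2, c=>3'::hstore - 'a=>1, b=>0'::hstore;
-- "b"=>"2", "c"=>"3"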

The idea is to add a row in the audit table each time the audited table is
updated, with the hstore representation of the data in flight before and
after the change.
So as to avoid the problem of not being able to easily rebuild the current
value of a row at any time in the history, we're going to store a couple of
full
hstore representations here.
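
A minimal sketch of that audit table and its trigger function might look
like this (names are illustrative, not the exact code from my setup):

create table audit
 (
  change_date timestamptz default now(),
  before      hstore,
  after       hstore
 );

create or replace function audit()
 returns trigger
 language plpgsql
as $$
begin
  -- store the whole row as it was before and after the change
  insert into audit(before, after) select hstore(old), hstore(new);
  return new;
end;
$$;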

Now, we need to attach the trigger to the table which is the source of our
events. Note that we could attach the same trigger to any table in fact, as
the definition of the audit table has nothing specific to the example table.
If you want to do that, though, you will certainly want to add the name of
the source table of the event you're processing, available from within your
trigger as
TG_TABLE_NAME. Oh and maybe add
TG_TABLE_SCHEMA while at it!
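
Attaching it can be as simple as this sketch, here against a table named
example:

create trigger audit
 after update on example
          for each row
 execute procedure audit();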

The
hstore extension is really useful and versatile, and we just saw another
use case for it!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 27 Aug 2013 17:35:00 +0200http://tapoueh.org/blog/2013/08/27-auditing-changes-with-hstore.htmlTrigger Parametershttp://tapoueh.org/blog/2013/08/23-parametrized-triggers.htmlSometimes you want to compute values automatically at
INSERT time, like for
example a
duration column out of a
start and an
end column, both
timestamptz. It's easy enough to do with a
BEFORE TRIGGER on your table.
What's more complex is to come up with a parametrized spelling of the
trigger, where you can attach the same
stored procedure to any table even
when the column names are different from one another.

I found a kind of trigger that I can use!

The exact problem to solve here is how to code a
dynamic trigger where the
trigger's function code doesn't have to hard code the field names it will
process. Basically,
PLpgSQL is a static language and wants to know all about
the function data types in use before it
compiles it, so there's no easy way
to do that.

The example

Let's start simple, with a table having a
d_start and a
d_end column where
to store, as you might have already guessed, a start and an end timestamp
(with time zone). The goal will be to have a parametrized
trigger able to maintain a
duration for us automatically, something we
should be able to reuse on other tables.
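
As a sketch of the setup (table name is illustrative), along with the
row-to-hstore conversion we are going to build on:

create table timesheet
 (
  id       serial primary key,
  d_start  timestamptz,
  d_end    timestamptz,
  duration interval
 );

insert into timesheet(d_start, d_end)
     values (now() - interval '2 hours', now());

-- turn a whole row into an hstore value, using the table's row type
select hstore(t) from timesheet t;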

I edited the result for it to be easier to read, splitting it on more than
one line, so if you try that at home you will have a different result.

What's happening in that first example is that we are transforming a
row
type into a value of type
hstore. A
row type is the result of
select foo
from foo;. Each PostgreSQL relation defines a type of the same name, and you
can use it as a
composite type if you want to.

Now, hstore also provides the
#= operator which will replace a
given field in a row, look at that:
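
A sketch of it, reusing the timesheet table from above:

-- #= replaces the listed fields in the row with values from the hstore;
-- here we fill in duration from d_end - d_start
select (t #= hstore('duration', (t.d_end - t.d_start)::text)).*
  from timesheet t;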

And here's how to attach the trigger to our table. Don't forget the
FOR EACH
ROW part or you will have a hard time understanding why you can't access the
details of the
OLD and
NEW records in your trigger: they default to being
FOR EACH STATEMENT triggers.

The other important point is how we pass down the column names as argument
to the stored procedure above.
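
A sketch of that CREATE TRIGGER call, assuming the trigger function is
named compute_duration and reads its three column names from TG_ARGV:

create trigger compute_duration
 before insert or update on timesheet
            for each row
 execute procedure compute_duration('d_start', 'd_end', 'duration');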

Conclusion

Thanks to the
hstore extension we've been able to come up with a dynamic
solution where you can give the name of the columns you want to work with at
CREATE TRIGGER time rather than hard-code that in a series of stored
procedure that will end up alike and a pain to maintain.

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 23 Aug 2013 12:08:00 +0200http://tapoueh.org/blog/2013/08/23-parametrized-triggers.htmlUnderstanding Window Functionshttp://tapoueh.org/blog/2013/08/20-Window-Functions.htmlThere was SQL before
window functions and SQL after
window functions: that's
how powerful this tool is. Being that big of a change unfortunately means
that it can be quite hard to grasp the feature. This article aims at making
it crystal clear so that you can begin using it today and are able to reason
about it and recognize cases where you want to be using
window functions.

We see a part of the data as if through a little window

The whole idea behind
window functions is to allow you to process several
values of the result set at a time: you see through the window some
peer
rows and are able to compute a single output value from them, much like when
using an
aggregate function.

It's all about frames

When looking through a window we only see a frame at a time

PostgreSQL comes with plenty of features, one of them will be of great help
here to get a better grasp at what's happening with
window functions. The
first step we are going through here is understanding which
data the
function has access to. For each input row you have access to a
frame of the
data, and the first thing to understand here is that
frame.

First, meet with
array_agg, an
aggregate function that will build an array
for you. Let's use this tool to understand
window frames:
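
A minimal example of it, using generate_series as the data source:

select x,
       array_agg(x) over (order by x)
  from generate_series(1, 5) as t(x);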

The
array_agg column in the previous query output allows us to see the full
exact content of the
windowing we're going to process. The window definition
here is
over (order by x) and actually means
over (order by x rows between
unbounded preceding and current row):

Did you know you could compute both the total sum of a column and the
proportion of the current value against this total within a single SQL
query? That's the breakthrough we're talking about now with
window
functions.
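
A small sketch of that single query, again over generate_series:

select x,
       sum(x) over () as total,
       round(100.0 * x / sum(x) over (), 2) as percent
  from generate_series(1, 5) as t(x);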

Partitioning into different frames

How to frame a partition?

Other frames are possible to define when using the clause
PARTITION BY. To
see that in action though we need some more data to work with. The following
query is setting up an example for us to work with and will produce three
values per day for three different days, thanks to an implicit
CROSS JOIN
construct here:
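
A sketch of such a setup query, producing three values for each of three
days:

-- the comma in the from clause is an implicit CROSS JOIN
select date '2013-08-01' + day as day, x
  from generate_series(0, 2) as d(day),
       generate_series(1, 3) as s(x);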

Available window functions

Any and all aggregate functions you already know can be used against a
window frame rather than a grouping clause, so you can already go use sum,
min, max, count, avg and the others you're used to.

You might already know that it's possible with PostgreSQL to use the
CREATE AGGREGATE command to register your own
custom aggregate. Any such
custom aggregate can then be given a
window frame definition to work against
too. As an exercise to my dear readers, I this time propose that you implement a
weighted average aggregate and use it against a table where you have at
least three columns: a date, a weight and a measure, with several measures
per day. Now compute your
weighted average by applying your own aggregate to
your data set, either in a
grouping clause or a
window frame.

In this example you can see that we are reusing the same
window definition
each time, so we're giving it a name to make it simpler.
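
A sketch of that WINDOW clause, reusing the kind of per-day data set built
above:

select day, x,
       sum(x)   over w as daily_total,
       count(*) over w as daily_count
  from (select date '2013-08-01' + day as day, x
          from generate_series(0, 2) as d(day),
               generate_series(1, 3) as s(x)) as raw
window w as (partition by day)
 order by day, x;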

Conclusion

The real magic of what's called
window functions is actually the
frame of
data they can see when using the
OVER () clause and its
PARTITION BY and
ORDER BY and
frame clauses.

You need to remember that the windowing clauses are always considered last
in the query, meaning after the
WHERE clause. You can only see in any
frame
rows that have been selected for output: e.g. it's not directly possible to
compute a percentage over values that you don't want to display. You would
need to use a subquery in that case.

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 20 Aug 2013 12:04:00 +0200http://tapoueh.org/blog/2013/08/20-Window-Functions.htmlMigrating from MySQL to PostgreSQLhttp://tapoueh.org/blog/2013/08/08-MySQL-to-PostgreSQL.htmlAbout the only time when I will accept to work with MySQL is when you need
help to migrate away from it because you decided to move to
PostgreSQL
instead. And it's already been too much of a pain really, so after all this
time I began consolidating what I know about that topic and am writing
software to help me here. Consider it the
MySQL Migration Toolkit.

A real classic that I couldn't resist using here...

Without further ado, here's a little demo of the test case I have here to
work on the problem at hand. I've been working from a new version of
pgloader and added to it, among other novelties and a complete new
implementation, a
command language. Here's what the
MySQL Migration to
PostgreSQL command looks like currently:

As you can see here we did import more than
one million and a half rows in
less than a minute, in a local virtual machine running on top of my laptop
(optimized for travelling, not for data processing).

More important than those numbers that you can't easily compare to anything
else, the feature set makes it the tool I failed to find before. Given the
command above, the tool will actually connect to a MySQL database, retrieve
its table list and column definitions and apply
casting rules to those to be
able to issue the right
CREATE TABLE statements in PostgreSQL.

Once the tables are in place in PostgreSQL, the data is
streamed from MySQL
to PostgreSQL using two concurrent threads and the
COPY protocol at the
pushing side of the data
pipe. Then the indexes (including
primary keys) are
recreated and the sequences reset.

Lots of work remain to be done on that project, and that's the main reason
why I wanted to finish the bits I'd been sitting on for several months
already, so that you can see it running and share some motivation to see a
version 1.0 sooner rather than later.

The
TODO list includes parallel loading of tables, per-column casting rules,
client-side
ddl-partitioning, enhanced logging, binary distribution for a
bunch of platforms and lots of production battle testing. Oh and also adding
back the main
pgloader capabilities when loading from
flat files (CSV, fixed
cols or otherwise) would be great too.

The battle testing is crucial given the source of the data: MySQL is known
for sometimes introducing bugs in
minor versions (I just had to fight against
http://bugs.mysql.com/bug.php?id=19564 and hacked something to distinguish
NULL from empty string on the client), so the only way to convince oneself
that it works in any specific case is to test it.

This one is not my favourite toy...

If you're interested in that tool, as several people already told me they
are, then let's talk about how to reach version 1.0 together so that you
have a finished product that's easy to use, covering your bases, and Open
Source of course (to be released under
The PostgreSQL Licence)!

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 08 Aug 2013 17:41:00 +0200http://tapoueh.org/blog/2013/08/08-MySQL-to-PostgreSQL.htmlHow far is the nearest pub?http://tapoueh.org/blog/2013/08/05-earthdistance.htmlIn our recent article about
The Most Popular Pub Names we did have a look at
how to find the pubs nearby, but didn't compute the
distance in between that
pub and us. That's because how to compute a distance given a position on the
earth expressed as
longitude and
latitude is not that easy. Today, we are
going to solve that problem nonetheless, thanks to
PostgreSQL Extensions.

Some math is required to go from (long, lat) to distance on earth

The earthdistance PostgreSQL contrib

As the maths are complex enough to easily make mistakes when implementing
them again, we want to find an existing implementation that's been tested
already. PostgreSQL provides several
contrib extensions, one of those is
named
earthdistance and is made to solve our problem. Time to try it!

# create extension cube;
# create extension earthdistance;

Equipped with that extension we can now use its
<@> operator and compute a
distance in miles at the surface of the earth, given points as
(longitude,
latitude). So I had to import our data set again with points in the right
representation, then I could run this query:
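
The query went along these lines (a sketch; the point coordinates, table
and column names are illustrative):

-- <@> computes a distance in miles between two (longitude, latitude) points
select name,
       pos <@> point(-0.12, 51.516) as miles
  from pubnames
 where (pos <@> point(-0.12, 51.516)) < 5
 order by miles
 limit 10;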

Now we want to know what cities those pubs are in, right? With the following
URL and using the
Open Street Map APIs, I've been able to download a list of
cities in the same area as where the pub names were fetched in:
http://www.overpass-api.de/api/xapi?*[place=city][bbox=-10.5,49.78,1.78,59].

Tweaking the parser and import code at
https://github.com/dimitri/pubnames
was easy, and allowed importing those city names and locations in
0.087
seconds of real time, with the following schema:
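
A sketch of that schema, mirroring the pubs table: a name and a position
stored as a point, with a GiST index to support the KNN searches:

create table cities
 (
  name text,
  pos  point
 );

create index cities_pos_idx on cities using gist(pos);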

As you can see we are fetching the pubs at a distance from our given point
and then the nearest city from where the pub is. The way it's implemented
here is called a
correlated subquery, and starting with 9.3 we will be able
to use the
LATERAL standard join construct, as in the following example:
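
A sketch of the LATERAL spelling (coordinates and names as above):

select p.name as pub,
       city.name as city,
       p.pos <@> point(-0.12, 51.516) as miles
  from pubnames p,
       lateral (select name
                  from cities
                 order by cities.pos <-> p.pos
                 limit 1) as city
 where (p.pos <@> point(-0.12, 51.516)) < 5
 order by miles;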

So apparently the bounding box that we've been given
([bbox=-10.5,49.78,1.78,59]) includes Ireland too... and more importantly
the query execution penalty is quite high. That's because
the planner only knows how to solve that query by doing
Index Scan
using cities_pos_idx on public.cities c (cost=0.14..9.60 rows=73 width=25)
(actual time=0.016..0.016 rows=1 loops=27878), which means scanning
the position index of the cities 27878 times (once per pubnames entry).

It's possible to force the planner into doing it the obvious way though:

If we look at a map we see that
Westminster is in fact within
London given
our arbitrary rule of
within 5 miles, so in the next query we will simply
filter it out. Exercise left to the reader: write a query allowing to remove
from London's count the pubs that are actually in Westminster (when within 1
mile of the location we have for it). Then extend that query to address any
other situation like that in the whole data set.

Often the most powerful tool you have to make sense of your data...

And now what about the most popular pub names per city? Of course we want to
normalize again our pub names here but only for counting: we still display
all the names we did count.

Conclusion

As said in the previous article on the same theme, SQL when using
PostgreSQL
is indeed quite powerful! We've been able to easily add an implementation of
the earth distance computation from
longitude and
latitude as found in the
earthdistance contrib extension (already packaged for your Operating System
of choice, be sure to install
contribs by default), then to use it to solve
some interesting problems with our data set.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 05 Aug 2013 08:11:00 +0200http://tapoueh.org/blog/2013/08/05-earthdistance.htmlThe Most Popular Pub Nameshttp://tapoueh.org/blog/2013/08/02-pub-names-knn.htmlIn his article titled
The Most Popular Pub Names, Ross Lawley did show us how
to perform some quite interesting
geographic queries against
MongoDB, using
some nice
Open Data found at the
Open Street Map project.

The Open Street Map project publishes a lot of information!

I found the idea behind that article really neat: using easily accessible
data produced by an Open Source project to show off some nice queries with
real data is what we should do more often. Also, as a PostgreSQL guy I
couldn't help but feel distracted by the query language used in that article
and thinking that it would be so much simpler to express the same queries in
SQL.

The idea behind SQL is that the syntax is made for the queries to be easy
enough to read and maintain while being insanely powerful at times. My take
on SQL is that it's often enough the easiest way to express your queries,
and that even if it can get crazy complex at times, that's only because SQL
has the power you need to pass all your complexity down to it.

At least that's my thinking, and this article is my try at sharing this
viewpoint with you.

Loading the data

The data itself is available in some kind of an XML format where they
managed to handle the data in an EAV model:

For the sake of this article, we're going to use the simplest schema
possible. I didn't want to try and see if the
id actually is unique and
never omitted, for example, so here's the schema I've been working with:
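
A sketch of it: just the OSM node id, the pub name, and its position as a
point (longitude, latitude):

create table pubnames
 (
  id   bigint,
  name text,
  pos  point
 );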

Where the MongoDB article used the imposm Python library to load the
data, I wanted to take the opportunity to load it as a stream: a SAX like
API to read the XML should allow sending the data as we parse it in a COPY
stream, right?

Here's a slightly edited portion of the code I've been using to parse and
load the data, available as the
pubnames project on
GitHub:

Given that code, we can parse the data in the XML file and load it into our
PostgreSQL table in about 5.5 seconds on my laptop. If I had to optimize
that loading time I'd try having two concurrent threads, one of them reading
from the XML file and pushing to a queue, and the other one pulling from the
same queue and filling our
COPY buffer.

Normalizing the data

The first query of our reference article
The Most Popular Pub Names shows
the python code they've been using in order to normalize the data so that
it's then possible to list
The Most Popular Pub Names in the United Kingdom.
Here, we didn't process the OSM data at all, so what about normalizing it
directly within a SQL query?
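
A sketch of such a normalization query follows; the exact replacement
rules may differ slightly from the python code in the reference article:

  select array_to_string(array_agg(distinct(name) order by name), ', '),
         count(*)
    from pubnames
group by replace(replace(lower(name), 'the ', ''), '&', 'and')
order by count(*) desc
   limit 10;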

The
array_to_string function allows us to tweak the output to our
convenience, as the
array_agg(distinct(name) order by name) aggregate is
doing all the work for us here, in grouping all
names together and keeping
an ordered set of a unique entry per variant.

Which
names do we group together, you may ask? Well, those having the
same name apart from some spelling variants: we don't want to consider
The
to be a difference so we replace it with an empty string, and we do want to
consider both
And and
& as the same thing too.

Again, I'm reproducing the same processing as with the
MongoDB article.

Geolocating nearest pub (KNN search)

The spelling of the
KNN search in
PostgreSQL involves ordering the result
set with a
distance operator, which is itself spelled
<->. Here's the full
SQL for searching the pubs nearby our position, or actually the position
given as an example in the
MongoDB article:
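
A sketch of it (the point coordinates here are illustrative, standing in
for the position used in the MongoDB article):

select id, name, pos
  from pubnames
 order by pos <-> point(-0.12, 51.516)
 limit 3;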

As we're using the
point datatype in PostgreSQL, there's no simple way that
I know of to convert that distance into something like
meters or maybe
yards
here. That's of course possible to do, even considering the actual shape of
the earth, thanks to some
PostgreSQL Extensions such as
earthdistance or the
full blown
PostGIS. The details about that are for another article though.

PostgreSQL has a very rich and powerful datatype system that goes well
beyond storing numbers, text and dates. If you're not familiar with that
idea, you should read about it, maybe beginning with
PostgreSQL Data Types
chapter of the documentation.

Using a KNN specific index

With a dataset of 27878 rows, having an answer in about 20ms is not a great
achievement. Indeed, we didn't create any indexing whatsoever on the table
yet, so the query planner has no other choice but to scan the whole content
on disk and filter it as it goes.

It would be way better for performance if we could instead evaluate our
query constraints (here, the ORDER BY and LIMIT clauses) using some index
search.

That's exactly the kind of situation that
GiST and
SP-GiST indexes have been
designed to be able to solve for you in PostgreSQL, and in particular the
KNN GiST support. Let's have a try at it:
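
A sketch of the index and of the same query again:

create index pubnames_pos_idx on pubnames using gist(pos);

select id, name, pos
  from pubnames
 order by pos <-> point(-0.12, 51.516)
 limit 3;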

Now we talk! With a dataset of 27878 rows in total, finding the 3 nearest
pubs in less than a millisecond is something we can actually be happy with,
and can use directly in a web application. I would expect this performance
to remain in the right ballpark even for a much larger dataset, and leave it
as an exercise for you to find that dataset and test the
KNN GiST indexes on
it!

Conclusion

PostgreSQL at the center of your dataverse

What I want to take home from this article is the idea that the plain old
SQL language still has lots to offer to modern data analysis needs, in
particular when you're using
PostgreSQL.

That database system knows how to stay relevant in a fast evolving
environment, where your needs are more and more demanding: more data, more
analysis, more users.

The past few releases of PostgreSQL each come with plenty of new features to
better support your demanding use cases. We do a solid new
release each
year, and you can check the
feature matrix to see by yourself the amazing
pace at which we are able to improve our system.

If you're using
PostgreSQL you have very few reasons to look for another
solution. Some cases of course are still best handled in systems more
tolerant of data loss, for example. When that happens though, in my
experience, it's always a complementary service that will run alongside
PostgreSQL. And for them to coexist peacefully, we even offer you
Foreign Data Wrappers!

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 02 Aug 2013 10:19:00 +0200http://tapoueh.org/blog/2013/08/02-pub-names-knn.htmlOSCON, Portland, and PDXPUGhttp://tapoueh.org/blog/2013/07/29-Portland-Conferences.htmlAfter spending an awesome week in
San Francisco, CA I'm lucky enough to be
spending another week in the USA, in
Portand, OR. The main excuse for
showing up here has been
OSCON where I presented a talk about the
fotolog
migration from MySQL to
PostgreSQL.

Fotolog is a photo sharing website having more than 32 million users
sharing more than a billion photos, which made for a very interesting
migration use case. In particular switching from a hand-made MySQL sharding
cluster of 37 databases to a fully integrated
PLProxy setup on 16 servers
hosting each 16 shards, in less than 6 months including complete code
rewrite, made things... interesting.

This image is the missing piece in the slide deck

While in Portland I also had the pleasure to present a talk at the PDXPUG
User Group Meeting, hosted by MyEmma who provided a very nice place, beers
and pizzas. Meeting with local users is always a great time, and I
presented the PostgreSQL as a developer talk that I also did for the
Dublin
User Group, which you can see online at my
Dublin User Group conference
report page.

You are already using SQL, make the best out of it!

Apparently at least one of the attendees really did like the presentation.
The angle is to convince developers to consider SQL really as one of their
programming languages, more like they consider Javascript than how they
generally consider HTML. And this attendee's questions were all about how
much of the
middleware (or
model implementation) can we move down to
PostgreSQL.

Of course the technical answer is all of it, as demonstrated by
apache mod_libpq, wherein URLs are simply transformed into
stored procedure
calls. Which in PostgreSQL you can implement in basically any programming
language you want, like
PL/Python or
PL/Perl to give but only two examples.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 29 Jul 2013 17:09:00 +0200http://tapoueh.org/blog/2013/07/29-Portland-Conferences.htmlTalking at the SFPUGhttp://tapoueh.org/blog/2013/07/19-SFPUG-Skytools3.htmlThose days feel really lucky to me. I'm currently visiting friends and
customers in San Francisco, and really enjoying my trip here! Of course
Josh Berkus took the opportunity to organise a
SFPUG meeting and I had the
pleasure of being the speaker over there.

My talk was about the most recent version of
Skytools and the opportunity to
realise that we're still missing a lot on documentation. One of the attendee
did propose to help us on that front as he apparently really likes technical
writing. I hope that effort will happen and help solve the main issue here.

The talk then generated a lot of questions from the audience, ranging from
how to upgrade from a previous version (I covered that topic in
Live Upgrading PGQ here) to lots of details about operating
londiste and the
many options you have to control
batch size and
lag.

Mainly as the talk was oriented towards
how to use skytools rather than
use
cases and architectures, the questions were centered around how to solve
problems using the tools.

Skytools3: the slides

I think the best part of it for me was that our host
Bazaarvoice actually is
using
Skytools and as a result was really interested in the talk and
asked very good questions too!

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 19 Jul 2013 10:24:00 +0200http://tapoueh.org/blog/2013/07/19-SFPUG-Skytools3.htmlBack from CHAR(13)http://tapoueh.org/blog/2013/07/15-char-13.htmlLast week was held the
CHAR(13) conference in a great venue in the UK
countryside. Not only did we discover UK under good weather conditions and
some local beers, we also did share a lot of good ideas!

The Hordwood House is quite a maze really!

The conference was run side by side with PGDAY UK, and those two days were
packed with great talks!

I had the pleasure to present a talk about
Advanced Distributed
Architectures where some examples of architectures using
Streaming
Replication,
Skytools and
PLproxy are shown.

The main thing to remember might be that replication is only a tool!

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 15 Jul 2013 09:49:00 +0200http://tapoueh.org/blog/2013/07/15-char-13.htmlArchiving data as fast as possiblehttp://tapoueh.org/blog/2013/07/05-archiving-data-fast.htmlIn a recent article here we've been talking about how to do
Batch Updates in a very efficient way, using the Writable CTE features
available in PostgreSQL 9.1. I sometimes read how Common Table Expressions
changed the life of fellow DBAs and developers, and would say that
Writable CTEs are at least the same boost again.

Writable CTEs make it easy to implement data processing pipelines

In the case of archiving data into side tables the pipeline we're talking
about aims to move data out of a table (that's a
DELETE) then store it on
the destination (
archiving) table, and that's an
INSERT:

WITH archive AS (
DELETE FROM source WHERE ...
RETURNING id, counter
)
INSERT INTO target(id, counter)
SELECT id, counter FROM archive;

That's probably the most efficient way to move data around in PostgreSQL
when the source and target tables are within the same database. If not, then
I can't wait to play with the
Postgres Foreign Data Wrapper in 9.3, that
should allow to send the data directly with the same query.

Now, if you have foreign keys to deal with, the trick is to embed all the
related statements in the right ordering into a single transaction... and
that's about it. For archiving data remotely though, it's a little more
complex as we need to control two transactions in the right way™, and that
is a topic for another article later. The transactional model to follow
though is the one implemented in
PGQ already, so at least we know how to do
things.

About the performance of that simple method, let me say that it's fast
enough that you can actually find yourself pushing loads of
WAL data down to your
Hot Standby server. If the system is already very highly loaded, then a way
to
throttle the impact is needed.

The way to fix that problem that I've been using is to check the
lag in
between the Hot Standby you're interested into and the primary server by
running that query periodically:

select pg_current_xlog_location() as current, replay_location as replay
from pg_stat_replication
where application_name = 'standby-name';

Be aware that any replication client that you use will show up in the
pg_stat_replication view, and that includes
pg_basebackup and
pg_receivexlog:

Then in between loops of running the
WITH archive AS (DELETE ...) INSERT
query, when the lag is higher than your arbitrary threshold, just pause
until it's back under control. That part I've implemented with a very simple
busy loop around the previous query and a 1 second wait.

Now, to make sense of the returned data you can use the function
pg_xlog_location_diff as of 9.2. If you're still using 9.1, then you can
replicate its implementation in your client application code, it's simple
enough to do so. Here's a
Common Lisp version of it:

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 05 Jul 2013 15:30:00 +0200http://tapoueh.org/blog/2013/07/05-archiving-data-fast.htmlSimple Case for Pivoting in SQLhttp://tapoueh.org/blog/2013/07/04-Simple-case-for-pivoting.htmlIn a recent article
Craig Kerstiens from
Heroku did demo the really useful
crosstab extension. That function allows you to
pivot a table so that you
can see the data from different categories in separate columns in the same
row rather than in separate rows. The article from
Craig is
Pivoting in Postgres.

Pivoting a matrix, also known as a matrix transposition

Let's do the same setup as he did, with a table containing some randomly
generated data about hypothetical visits to a web page, say, by date then by
operating system.
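
A sketch of that setup, with illustrative table and column names:

create table daily_visits as
     select day::date, os, (random() * 10000)::int as visits
       from generate_series(date '2013-06-01', date '2013-06-30',
                            interval '1 day') as day,
            unnest(array['OSX', 'Windows', 'Linux']) as os;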

Now that we have some data to play with, what we want is the number of
visits per os as different columns, having a result with 4 columns: the
date, the number of visits using OSX that day, then using Windows, then
using Linux. How to do that in plain SQL?
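
The classic answer is a sum of CASE expressions, one per category, as in
this sketch:

  select day,
         sum(case when os = 'OSX'     then visits end) as osx,
         sum(case when os = 'Windows' then visits end) as windows,
         sum(case when os = 'Linux'   then visits end) as linux
    from daily_visits
group by day
order by day;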

The other advantage of that solution —
apart from being standard SQL — is
that if you happen to have more than a single row per group for a given
category (after all, the data definition is not normalized here, it's known
as
EAV and you want to stay away from that as much as possible), then the
sum(case) query will work just fine.

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 04 Jul 2013 15:55:00 +0200http://tapoueh.org/blog/2013/07/04-Simple-case-for-pivoting.htmlConferences Reporthttp://tapoueh.org/blog/2013/07/03-conferences-report.htmlRecently I've been to some more conferences and didn't take the time to blog
about them, even though I really did have great fun over there. So I felt I
should take some time and report about my experience at those conferences.
And of course, some more is on the way, as the
PostgreSQL Conference Tour
gets busier each year it seems.

And PostgreSQL Conferences get more attendees each year!

PGCON 2013, Ottawa

In May was the famous
PGCON conference where PostgreSQL contributors are
meeting all together, offering the occasion to run the
Hackers Meeting. This
year's edition has been a really great one, with lots of people attending
and lots of really interesting talks to attend. In fact, so interesting
that I almost skipped the
Hallway Track entirely, which is
really impressive.

The main topics of interest I've been hearing in the more general talks
I've attended, and in the numerous informal chats I had (we call that the
beer track or the social events) were about
Bi Directional Replication as
the next step forward with our core-included replication technology and how
to get more from
Extensions. I'm very happy to be affiliated with the
company working on those topics, as you can imagine.

My own take away for that conference is another batch of work to complete
for
Extensions and
Event Triggers, so expect to see some articles about
those topics in the following months, and with some luck I will even be able
to talk about what I want to achieve when those tools land in core.

PGDayFR 2013

Then in June was the
PG Day France where I presented a talk about
Petabyte PostgreSQL, in French. This talk is about listing the current
limitations preventing us from enjoying PostgreSQL at full capacity on a
single
Petabyte node, then talking about the work in progress to get there.

The conference itself was great with about one hundred attendees and a good
talk selection. It was a single track, all in French, and I still hope that
we would be able to organize a conference with two tracks, allowing us to
invite speakers from all over Europe. I'm pretty sure attendees would be
happy to listen to English talks if they had a choice about it and could go
to the main french session instead.

CHAR(13)

The next conference on the schedule is now
CHAR(13), a
2ndQuadrant company
conference about
Clustering, High Availability and Replication. I will be
talking about
Advanced Distributed Architectures, or in other words those
use cases for replication where the main goal is not to implement High
Availability.

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 03 Jul 2013 16:53:00 +0200http://tapoueh.org/blog/2013/07/03-conferences-report.htmlMake the Most ouf of SQLhttp://tapoueh.org/blog/2013/07/02-dubpug.htmlTonight I had the pleasure to present a talk at the
Dublin PostgreSQL User Group using remote technologies. The talk is about
how to make the most out of PostgreSQL when using SQL as a developer, and
tries to convince you to dive into
mastering SQL by showing how to solve an
application example all in SQL, using
window functions and
common table
expressions.

PostgreSQL for developer

And thanks to those remote technologies, the presentation has been recorded and made
available on the internet:

The quality seems ok!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 02 Jul 2013 22:22:00 +0200http://tapoueh.org/blog/2013/07/02-dubpug.htmlNearest Big Cityhttp://tapoueh.org/blog/2013/05/02-nearest-big-city.htmlIn this article, we want to find the town with the greatest number of
inhabitants near a given location.

Note that you can actually index such queries, that's called a
KNN index.
PostgreSQL knows how to use some kinds of indexes to fetch data matching an
expression such as
ORDER BY a <-> b, which allows you to consider a
KNN
search in your application.

This time you can see that I created an extra column with the
location in
there, so that I don't have to compute it each time I need it, like I did
before.

Now is the time to test that data set and hopefully fetch the same result as
before when we only had French cities loaded:

with neighbours as (
    select country_code, city_lower, population
      from maxmind_worldcities
     where population is not null
  order by loc <-> (select loc
                      from maxmind_worldcities
                     where city_lower = 'villeurbanne')
     limit 10
)
select * from neighbours order by population desc limit 1;

 country_code | city_lower | population
--------------+------------+------------
 fr           | lyon       |     463700
(1 row)

Ok, looks like we're all set for the real problem. Now we want to pick for
each of those cities its nearest neighbour, so here's how to do that:

create index on maxmind_worldcities(country_code, region_code, city_lower);
create index on maxmind_worldcities using gist(loc);
create table maxmind_neighbours as
  select country_code, region_code, city_lower,
         (with neighbours as (
             select country_code, city_lower, population
               from maxmind_worldcities
              where population is not null
                and country_code = wc.country_code
                and region_code = wc.region_code
           order by loc <-> wc.loc
              limit 10)
          select city_lower
            from neighbours
        order by population desc
           limit 1
         ) as neighbour
    from maxmind_worldcities wc;

To be fair, I have to tell you that this query took almost 2 hours to
complete on my laptop here, but as I'm doing that for a friend and a blog
article, I've been lazy and didn't try to optimise it. It could be using
LATERAL for sure; I don't know if that would help much with
performance, as I didn't try.
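For reference, here is roughly what a LATERAL rewrite could look like: an
untested sketch against the same maxmind_worldcities table. Note that LATERAL
requires PostgreSQL 9.3 or later, and that I make no claim about its
performance.

create table maxmind_neighbours_lateral as
  select wc.country_code, wc.region_code, wc.city_lower,
         n.city_lower as neighbour
    from maxmind_worldcities wc,
         -- the lateral subquery can reference wc from the outer FROM item;
         -- using a plain comma join drops cities with no neighbour found,
         -- use left join lateral ... on true to keep them instead
         lateral (select city_lower
                    from (select city_lower, population
                            from maxmind_worldcities
                           where population is not null
                             and country_code = wc.country_code
                             and region_code = wc.region_code
                        order by loc <-> wc.loc
                           limit 10) as knn
                order by population desc
                   limit 1) as n;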

With that in hand we can now check some cities and their
biggest
neighbours, as in the following query:

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 02 May 2013 11:34:00 +0200http://tapoueh.org/blog/2013/05/02-nearest-big-city.htmlBulk Replicationhttp://tapoueh.org/blog/2013/03/18-bulk-replication.htmlIn the previous article here we talked about how to
properly update more
than one row at a time, under the title
Batch Update. We considered
performance, including network round trips, and looked at how
our solution behaves when used concurrently.

A case where we want to apply the previous article approach is when
replicating data with a
trigger based solution, such as
SkyTools and
londiste. Well, maybe not in all cases: we need an amount of
UPDATE
traffic worthy of setting up the solution. As soon as we know we're getting
to
replay big enough batches of events, though, using the
batch update tricks certainly makes sense.

It so happens that
londiste 3 includes the capability to use
handlers. Those
are plugins written in
python (like all the client side code from
SkyTools)
whose job is to handle the
processing of the event batches. Several of them
are included in the
londiste sources, and one of them is named
bulk.py.

The bulk.py handler supports three modes of operation:

correct (0)
- inserts as COPY into the table
- updates as COPY into a temp table and a single UPDATE from there
- deletes as COPY into a temp table and a single DELETE from there

delete (1)
- as correct, but updates are done as DELETE then COPY

merged (2)
- as delete, but insert rows are merged with update rows

Conclusion

Yes, by using that
handler which is provided by default in
londiste, you
will apply the previous article tricks in your replication solution. And you
can even choose to use that for only some of the tables you are replicating.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 18 Mar 2013 14:54:00 +0100http://tapoueh.org/blog/2013/03/18-bulk-replication.htmlBatch Updatehttp://tapoueh.org/blog/2013/03/15-batch-update.htmlPerformance consulting involves some tricks that you have to teach over and
over again. One of them is that SQL tends to be so much better at dealing
with plenty of rows in a single statement when compared to running as many
statements, each one against a single row.

Another kind of Batch to update

So when you need to
UPDATE a bunch of rows from a given source, remember
that you can actually use a
JOIN in the
update statement. Either the source
of data is already in the database, in which case it's as simple as using
the
FROM clause in the
update statement, or it's not, and we're getting back
to that in a minute.

UPDATE FROM

Using that, you can actually update thousands of rows in our
target table in
a single statement, and you can't really get faster than that.
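A minimal sketch of what that looks like, with made-up target(id, value) and
source(id, value) tables:

-- join the target against another table already in the database
update target
   set value = source.value
  from source
 where target.id = source.id;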

Preparing the Batch

Now, if you happen to have the source data in your application process'
memory, the previous bit is not doing you any good, you think. Well, the
trick is that pushing your in-memory data into the database and then joining
against the now local source of data is generally faster than looping in the
application and having to do a whole network
round trip per row.

As we're talking about performance, the trick here is to use the
COPY
protocol to fill in the
temporary table we just created to hold our data. So
we're now sending the whole data set in a temporary location in the
database, then using that as the
UPDATE source. And that's way faster than
doing a separate
UPDATE statement per row in your batch, even for small
batches.
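Here's the general shape of that dance, as a sketch with made-up table names;
the \copy meta-command is the psql way of driving the COPY protocol from a
client-side file:

-- a temporary table to receive the batch
create temp table batch (id integer, value text);

-- fill it using the COPY protocol (from psql here; drivers expose equivalents)
\copy batch from 'batch.csv' with csv

-- then update the target table in a single statement
update target
   set value = batch.value
  from batch
 where target.id = batch.id;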

Also, rather than using the SQL
COPY command, you might want to look up the
docs of the PostgreSQL driver you are currently using in your application,
it certainly includes some higher level facilities to deal with pushing the
data into the streaming protocol.

Insert or Update

And now sometime some of the rows in the batch have to be
updated while some
others are new and must be inserted. How do you do that? Well, PostgreSQL
9.1 brings on the table
WITH support for all
DML queries, which means that
you can do the following just fine:
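The query the following paragraphs describe goes something like this, a sketch
with the same hypothetical target(id, value) and source(id, value) tables:

with upd as (
   update target
      set value = source.value
     from source
    where target.id = source.id
returning source.id
)
insert into target(id, value)
     select id, value
       from source
      where id not in (select id from upd);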

That query here is
updating all the rows that are known in both the
target
and the
source and returns what we took from the
source in the operation, so
that we can do an
anti-join in the next step of the query, where we're
inserting any row that was not taken care of in the
update part of the
statement.

Note that when the batch gets to a bigger size it's usually better to join
against the
target table in the
INSERT statement, because that will have an
index on the join key.

Concurrency patterns

Now, you will tell me that we just solved the
UPSERT problem. Well what
happens if more than one transaction is trying to do the
WITH (UPDATE)
INSERT dance at the same time? It's a single
statement, so it's a single
snapshot. What can go wrong?

Concurrent processing

What happens is that as soon as the concurrent sources contain some data for
the same
primary key, you get a
duplicate key error on the insert. As both
the transactions are concurrent, they are seeing the same
target table where
the new data does not exist, and both will conclude that they need to
INSERT the new data into the
target table.

There are two things that you can do to avoid the problem. The first thing
is to make it so that you're doing only one
batch update at any time, by
architecting your application around that constraint. That's the most
effective way around the problem, but not the most practical.

The other thing you can do, is force the concurrent transactions to
serialize one after the other, using an
explicit locking statement:

LOCK TABLE target IN SHARE ROW EXCLUSIVE MODE;

That
lock level is not automatically acquired by any PostgreSQL command, so
the only way it helps you is when you're doing that for every transaction
you want to serialize. When you know you're not at risk (that is, when not
playing the
insert or update dance), you can omit taking that
lock.

Conclusion

The SQL language has its quirks, that's true. It's been made for efficient
data processing, and with recent enough
PostgreSQL releases you even have
some advanced pipelining facilities included in the language. Properly
learning how to make the most out of that old component of your programming
stack still makes a lot of sense today!

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 15 Mar 2013 10:47:00 +0100http://tapoueh.org/blog/2013/03/15-batch-update.htmlEmacs Conferencehttp://tapoueh.org/blog/2013/03/04-Emacs-Conference.htmlThe
Emacs Conference is happening, it's real, and it will take place at the
end of this month in London. Check it out, and register at
Emacs Conference Event Brite. It's free and there's still some availability.

It's all about Emacs, and it rocks!

We have a great line-up for this conference, which makes me proud to be able
to be there. If you've ever been paying attention when using
Emacs then
you've already heard those names:
Sacha Chua is frequently blogging about
how she manages to improve her workflow thanks to
Emacs Lisp,
John Wiegley
is a proficient Emacs contributor maybe best known for his
ledger Emacs
Mode, then we have
Luke Gorrie who hacked up
SLIME among other things, we
also have
Nic Ferrier who is starting a revolution in how to use
Emacs Lisp
with
elnode. And more! Including
Steve Yegge!

See you there in London.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 04 Mar 2013 13:58:00 +0100http://tapoueh.org/blog/2013/03/04-Emacs-Conference.htmlHyperLogLog Unionshttp://tapoueh.org/blog/2013/02/26-hll-union.htmlIn the article from yesterday we talked about
PostgreSQL HyperLogLog with
some details. The real magic of that extension has been skimmed over though,
and needs another very small article all by itself, in case you missed it.

Which Set Operation do you want for counting unique values?

The first query here has the default level of magic in it, really. What
happens is that each time we do an update of the
HyperLogLog hash value, we
update some data which allows us to compute its cardinality.

The
HyperLogLog data structure allows the implementation of a
union
algorithm that will be able to compute how many unique values you happen to
have registered in both one day and the next. Extended in its general form,
and doing SQL, what you get is an
aggregate that you can use in
GROUP BY
constructs and
window functions. Did you read about them yet?

Installing postgresql-hll

It's as simple as
CREATE EXTENSION hll; really, even if to get there you
must have installed the
package on your system. We did some packaging work
for
debian and the result should appear soon in a distro near you.

Then you also need to keep your data in some table; straight from the
documentation, we can use this schema:
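The schema from the postgresql-hll documentation looks something like this
(check the extension's README for the authoritative version):

create table daily_uniques (
    date  date unique,
    users hll
);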

So in our example what you see is that we want to decipher how many unique
IP addresses we saw, and we do that by first creating a
hash of that source
data then calling
hll_add() with the current value and the hash result.

The current value must be initialized using
hll_empty().
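Putting those pieces together, the per-visit update looks roughly like this;
the IP literal is only there for the example:

-- initialize today's row once
insert into daily_uniques(date, users)
     values (current_date, hll_empty());

-- then, for each visit, hash the IP address and add it to the hll value
update daily_uniques
   set users = hll_add(users, hll_hash_text('10.0.0.1'))
 where date = current_date;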

Concurrency

The most awake readers among you have already spotted that: using an
UPDATE
on the same row over and over again is a good recipe to kill any form of
concurrency, so you don't want to do that on your production setup unless
you don't care about those
UPDATE statements piling up, waiting, in your system.

The idea is then to fill-in a queue of updates and asynchronously update the
daily_uniques table from that queue, possibly using the
hll_add_agg
aggregate that the extension provides, so that you do only one
update per
batch of values to process.

∅: Empty Set and NULL

Now, what happens when the batch of new unique values you want to update
from is itself empty? Well I would have expected
hll_add_agg over an empty
set to return an empty
hll value, the same as returned by
hll_empty(), but
it turns out it's returning
NULL instead.

And then
hll_add(users, NULL) will happily return
NULL. So the next
UPDATE
is cancelling all the previous work, which is not nice. We had to cater for
that case explicitly in the
UPDATE query that's working from the batch of
new values to add to our current
HyperLogLog hash entry, and I can't resist
showing off one of the most awesome PostgreSQL features here:
writable CTE.

WITH hll(agg) AS (
  SELECT hll_add_agg(hll_hash_text(value)) FROM new_batch
)
UPDATE daily_uniques
   SET users = CASE WHEN hll.agg IS NULL THEN users
                    ELSE hll_union(users, hll.agg)
                END
  FROM hll
 WHERE date = current_date;

That's how you protect against an empty set being turned into a
NULL. I
think the real fix would need to be included in
postgresql-hll itself, in
making it so that the
hll_add_agg aggregate returns
hll_empty() on an empty
set, and I will report that bug (with that very article as the detailed
explanation of it).

Using postgresql-hll

When using
postgresql-hll on the production system, we were able to get some
good looking numbers from our
daily_uniques table:
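The actual query and its numbers aren't shown here, but it combined a CTE and
a window function over daily_uniques; a sketch of that kind of query could be:

with daily as (
  select date, hll_cardinality(users) as uniques
    from daily_uniques
)
select date,
       uniques,
       uniques - lag(uniques) over (order by date) as diff_with_previous_day
  from daily
order by date;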

I couldn't resist showing off two of my favorite SQL constructs in that
example query here, which are the
Common Table Expressions (or CTE) and
window functions. If that
over() clause reads strange to you, take a minute
now and go read about it. Yes, do that now, we're waiting.

The data here is showing that we did setup the facility in the middle of the
first day, and that the morning's activity is quite low.

Conclusion

When using
postgresql-hll you need to be careful not to kill your
application concurrency abilities, and you need to protect yourself against
the ∅ killer too. The other thing to keep in mind is that the numbers you
get out of the
hll technique are estimates within a given
precision, and you
might want to read some more about what it means for your intended usage of
the feature.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 25 Feb 2013 10:23:00 +0100http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.htmlPlaying with pgloaderhttp://tapoueh.org/blog/2013/02/12-playing-with-pgloader.htmlWhile making progress with both
Event Triggers and
Extension Templates, I
needed to take a little break. My current staying-sane mental exercise seems
to mainly involve using
Common Lisp, a programming language that ships with
about all the building blocks you need.

Yes, that old language brings so much on the table

When using
Common Lisp, you have an awesome interactive development
environment where you can redefine functions and objects
while testing them.
That means you don't have to quit the interpreter, reload the new version of
the code and put the interactive test case together all over again after a
change. Just evaluate the change in the interactive environment: functions
are compiled incrementally over their previous definition, objects whose
classes have changed are migrated live.

See, I just said
objects and
classes.
Common Lisp comes with some advanced
Object Oriented Programming facilities named
CLOS and
MOP where the
Java and
Python and
C++ object models are just a subset of what you're being offered.
Hint, those don't have
Multiple Dispatch.

And you have a very sophisticated
Condition System where
Exceptions are just
a subset of what you can do (hint: have a look at
restarts and tell me you
didn't wish your programming language of choice had them). And it continues
that way for about any basic building block you might want to be using.

Loading data

Back to
pgloader, you will tell me. Right. I've been spending a couple of
evenings hacking on the new version of pgloader in
Common Lisp, and wanted
to share some preliminary results.

Playing with the loader

The current status of the new
pgloader is still pretty rough; if you're not
used to developing in Common Lisp you might not find it ready for use yet. I'm
still working on the internal APIs and trying to make something clean and
easy to use for a developer, and then I will provide some external ways to
play with it, user oriented. I missed that step once with the
Python based
version of the tool, and I don't want to make the same errors again this time.

So here's a test run with the current
pgloader, on a small enough data set
of
226 MB of
CSV files.

Streaming data

With the new code in
Common Lisp, I could benefit from real multi threading
and higher level abstraction to make it easy to use:
lparallel is a lib
providing exactly what I need here, with
workers and
queues to communicate
data in between them.

What I'm doing is that two threads are separated, one is reading the data
from either a
CSV file or a
MySQL database directly, and pushing that data
in the queue; while the other thread is pulling data from the queue and
writing it into our
PostgreSQL database.

The
streaming here is a tad slower than the
importing from files. Now if you
want to be fair when comparing those, you would have to take into account
the time it takes to
export the data out from its source. When doing that
export/import dance, a quick test shows a timing of
1m4.745s. Now, if we do
an
export only test, it runs in
31.822s. So yes streaming is a good thing to
have here.

Conclusion

We just got twice as fast as the python version.

Some will say that I'm not comparing fairly to the
Python version of
pgloader here, because I could have implemented the streaming facility in
Python too. Well actually I did; the options are called
section_threads and
split_file_reading, which you can set so that a reader pushes data into a
set of queues and several workers each feed from their own queue. It
didn't help with performance at all. Once again, read about the infamous
Global Interpreter Lock to understand why not.

So actually it's a fair comparison here where the new code is twice as fast
as the previous one, with only some hours of hacking and before spending any
time on optimisation. Well, apart from using a
producer, a
consumer and a
queue, which I almost had to have for streaming in between two database
connections anyways.

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 12 Feb 2013 11:17:00 +0100http://tapoueh.org/blog/2013/02/12-playing-with-pgloader.htmlLive Upgrading PGQhttp://tapoueh.org/blog/2013/02/08-PGQ-Live-Upgrade.htmlSome
skytools-related news today; it's been a while. For those who were at
my
FOSDEM talk about
Implementing High Availability, you might have heard
that I really like working with
PGQ. A new version has been released a while
ago, and the most recent version is now
3.1.3, as announced in the
Skytools 3.1.3 email.

Upgrade time!

Skytools 3.1.3 enters debian

First news is that
Skytools 3.1.3 entered
debian today (I hope
that by the time you reach that URL, it's been updated to show information
according to the news here, but I might be early). As there's currently a
debian freeze to release
wheezy (and you can help
squash some bugs), this
version is only getting uploaded to
experimental for now. Thanks to the
tireless work of
Christoph Berg though, this version is already available
from
apt.postgresql.org.

Upgrading to PGQ 3

The other news is that I've been testing
live upgrade scenario where we want
to upgrade from
PGQ to
PGQ3, and it works pretty well, and it's quite simple
to achieve too. Here's how.

So the first thing is to shut down the current
ticker process. Then we
install the new packages, assuming that you did follow the steps in the wiki
pointed to above; please go read
apt.postgresql.org again now if need be.

Of course replace those
... with options such as your actual connection
string. I tend to always add
-vON_ERROR_STOP=1 to all these
commands, so that I don't depend on having the right
.psqlrc on the
particular server I'm connected to. Also remember that if you want to do
that for more than one database, you need to actually run that pair of
commands for each of them.

Now it's time to restart the new ticker. The main change from the previous
one is that it is now a
C program called
pgqd that knows how to tick for any
number of
databases, so that you only have to have
one instance around
per
cluster now.

sudo /etc/init.d/skytools3 start
tail -f /var/log/skytools/pgqd.log

Those two commands take for granted that you did prepare the
pgqd
setup the
debian and
skytools way, by adding your config in
/etc/skytools3/pgqd.ini and editing
/etc/skytools.ini accordingly, so that
it's automatically taken into account at machine boot.

Note that I did actually exercise the procedure above while running a
pgbench test replicated with
londiste. Of course the replication has been
lagging a little while no
ticker was running, and then it caught up as fast
as it could, in that case:

INFO {count: 245673, ignored: 0, duration: 422.104366064}

Happy Hacking!

So if you have any
batch processing needs, remember to consider what PGQ has
to offer. And yes if you're running some cron job to compute things out of
the database for you, you are doing some
batch processing.

Yes, I did search for Transactional Batch Processing

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 08 Feb 2013 15:52:00 +0100http://tapoueh.org/blog/2013/02/08-PGQ-Live-Upgrade.htmlAnother Great FOSDEMhttp://tapoueh.org/blog/2013/02/04-Another-great-FOSDEM.htmlThis year's FOSDEM has been a great edition, in particular the
FOSDEM PGDAY 2013 was a great way to begin a 3 days marathon of talking
about PostgreSQL with people not only from our community but also from
plenty other Open Source communities too: users!

PostgreSQL at FOSDEM made for a great event

Having had the opportunity to meet more people from those other development
communities, I really think we should go and reach for them in their own
conferences. Just about every PostgreSQL community member I've talked
with about that idea seemed to agree and generally was already thinking the
same thing. And most are already doing it, in fact...

Event Triggers

I'm currently in the middle of implementing
Event Triggers for PostgreSQL
and I have been for about the last 2 years. It's a quite complex feature to
get right and so the patch itself is complex and large, which means the
reviewing process is complex and takes time.

That also means that some parts of the design have already been redone
completely at least 3 times, and that what got
committed to the PostgreSQL
code so far looks nothing like the design we decided should go in.
That's just a fact of life, maybe, but that makes for a very long
development process.

We're now getting to the end of it though, and this talk is showing both
where we want to go with
Event Triggers, where we are now and what remains
to be done for 9.3 if we want the feature to be any useful.

If you're interested into that development, have a look at the slide deck
and possibly ask me some questions about what's not clear on the
pgsql-hackers mailing list (preferably).

Event Triggers, The Real Mess™

The other way to get summarized and clear information about Event Triggers
is the wiki page by the same name:
Event Triggers.

You will see that while a lot has been done (internal refactoring, adding
new infrastructure and SQL level commands, and the minimum
PLpgSQL support),
a lot remains to be done where the code has already been submitted several
times, following several design directions given by careful review on
hackers, and still we have some choices to make.

Implementing High-Availability

This talk is showing several ways to implement
High Availability with
PostgreSQL. The fact is that that term is overloaded already, and usually
covers two very different things which are
Service Availability and
Data
Availability.

In the talk, we're showing several techniques that you can use to address
different sets of compromises between
scaling,
load balancing,
data
availability and
durability, and
service availability. The first two points
could seem unrelated to the main topic, but
scaling often is a simple enough
way to achieve
service availability... until you need to think about
sharding, that is.

Implementing High Availability of Services and Data with PostgreSQL

So the talk is all about making compromises in between them and getting to
an architecture able to implement the chosen compromises. While the talk
has been pretty well received, it was delivered in a 50 mins slot where we
usually take a whole day or three when addressing those problems at a
customer's site.

Some parts of how to get to the right architecture for the compromises that
are important for you can't be fully covered in that time slot, while still
being able to actually present the techniques that we're using.

I think it might be useful to extract a single use-case or two from that
talk then have a full 50 mins version reduced to a single or a couple of
very clear compromises and how to achieve them in detail, rather than
trying to present a full range of techniques and how to use them in
different scenarios.

FOSDEM

After talking with many people, it appears that for next year's
edition I should propose a more general talk that aims at helping
developers in other communities (python, ruby, etc) discover what's in it for
them in PostgreSQL. This database is full of advanced features that are
really easy to use, and the only problem when preparing such a talk is
choosing the right subset...

If you're running a local developer user group and are interested in
learning some more about how PostgreSQL can help you on a daily basis,
please do get in touch with me and let's schedule a presentation together!

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 04 Feb 2013 09:55:00 +0100http://tapoueh.org/blog/2013/02/04-Another-great-FOSDEM.htmlA Sunday at FOSDEMhttp://tapoueh.org/blog/2013/01/30-A-Sunday-at-FOSDEM.htmlThe previous article
FOSDEM 2013 said to be careful with the
PostgreSQL devroom schedule because one of my talks there might get swapped
with a slot on the
FOSDEM PGDay 2013 which happens
this Friday and has been
sold out anyway.

Turns out it's not true, because we still depend on past century
technologies somehow. Not everybody will be looking at the schedule on the
web using a connected mobile device (you know, you've heard of them, those
tracking and surveillance devices, if you want to believe
Stallman), and as
the schedule gets printed on little paper sheets, it's unfortunately too
late to change it now.

Those flyers are already printed on paper sheets, the schedule too

So it happens that I'll be speaking twice on Sunday and not at all on Friday.

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 30 Jan 2013 10:50:00 +0100http://tapoueh.org/blog/2013/01/30-A-Sunday-at-FOSDEM.htmlFOSDEM 2013http://tapoueh.org/blog/2013/01/29-FOSDEM-2013.htmlThis year again I'm going to
FOSDEM, and to the extra special
PostgreSQL FOSDEM day. It will be the first time that I'm going to be at the
event for the full week-end rather than just commuting in for the day.

I'm Going to the FOSDEM, hope to see you there!

And I'm presenting two talks over there that are both currently scheduled on
the Sunday in the
PostgreSQL devroom. We're talking about changing that
though, so that one of those will in fact happen
this Friday at the
FOSDEM PGDay 2013, which has a different schedule, so consider watching for
that.

One of those two talks is about
Implementing High Availability (yes, with
PostgreSQL). It's been quite well received in the places I had the chance to
give it before (namely
PGDay France and
PG Conf Europe), and it's going to
be a stripped down version of it so that it fits well in the 45 mins slot we
have here.

The other talk is going to be about
Event Triggers, a feature new in
PostgreSQL 9.3 (due in September 2013, crossing fingers) and while the goal
of that talk is to introduce what the feature is all about and a bunch of
use cases that you can address by using it, it will certainly offer a peek
into the PostgreSQL development cycle and community processes.

See you in Brussels!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 29 Jan 2013 10:11:00 +0100http://tapoueh.org/blog/2013/01/29-FOSDEM-2013.htmlpgloader: what's next?http://tapoueh.org/blog/2013/01/28-pgloader-future.htmlpgloader is a tool to help loading data into
PostgreSQL, adding some error
management to the
COPY command.
COPY is the fast way of loading data into
PostgreSQL and is transaction safe. That means that if a single error
appears within your bulk of data, you will have loaded none of it.
pgloader
will submit the data again in smaller chunks until it's able to isolate the
bad from the good, and then the good is loaded in.

Not quite this kind of data loader

In a recent migration project where we freed data from MySQL into
PostgreSQL, we used
pgloader again. But the loading time was not fast enough
for the service downtime window that we had here. Indeed
Python is not known
for being the fastest solution around. It's easy to use and to ship to
production, but sometimes you not only want to be able to be efficient when
writing code, you also need the code to actually run fast too.

Faster data loading

So I began writing a little dedicated tool for that migration in
Common Lisp
which is growing on me as my personal answer to the burning question:
python
2 or python 3? I find
Common Lisp to offer an even more dynamic programming
environment, an easier language to use, and the result often has
performance characteristics way beyond what I can get with python. Between
5 times faster and
121 times faster in some quite stupid benchmark.

Here, with real data, my one shot attempt has been running more than
twice
as fast as the python version, after about a day of programming.

See what's happening now?

The other thing here is that I've attempted to get
pgloader to work in parallel,
but at the time I didn't know about the
Global Interpreter Lock, which they
still haven't found how to remove in Python 3, by the way. So my threading
attempts at making
pgloader work in parallel are pretty useless.

Whereas in
Common Lisp I can just use the
lparallel lib, which exposes
threading facilities and some
queueing facilities as a mean to communicate
data in between workers, and have my code easily work in parallel for real.

Compatibility

The only drawback that I can see here is that if you've been writing your
own
reformatting modules in python for
pgloader (yes you can
implement your own reformatting module for pgloader), then you would have to
port it to
Common Lisp. Shoot me an email if that's your case.

Next version

So, I think we're going to have a
pgloader 3 someday, that will be way
faster than the current one, and bundle some more features: real parallel
behavior, ability to fetch non local data (connecting to MySQL directly, or
HTTP, S3, etc); and I'm thinking about offering a
COPY like syntax to drive
the loading too, while at it. Also, the ability to discover the set of data
to load all by itself when you want to load a whole database: think of it as
a special
Migration mode of operations.

Some feature requests can't be solved easily when keeping the old
.INI
syntax cruft, so it's high time to implement some kind of a real command
language. I have several ideas about those, in between the
COPY syntax and
the
SQL*Loader configuration format, which is both clunky and quite
powerful, too.

After a beginning in
TCL and a complete rewrite in python in
2005, it looks
like
2013 is going to be the year of
pgloader 3, in
Common Lisp!

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 28 Jan 2013 10:48:00 +0100http://tapoueh.org/blog/2013/01/28-pgloader-future.htmlAutomated Setup for pgloaderhttp://tapoueh.org/blog/2013/01/17-pgloader-auto-setup.htmlAnother day, another migration from
MySQL to
PostgreSQL... or at least
that's how it feels sometimes. This time again I've been using some quite
old scripts to help me do the migration.

That's how I feel for MySQL users

Migrating the schema

For the
schema parts, I've been using
mysql2pgsql with success for many
years. This tool is not complete and will do only about
80% of the work. As
I think that the schema should always be validated manually when doing a
migration anyway, I happen to think that it's good news.

Getting the data out

Then for the data parts I keep on using
pgloader. The data is never quite
right, and the ability to filter out what you can't readily import in a
reject file proves itself a must-have here. The problems you have in the
exported MySQL data are quite serious:

Can I have my data please?

First, date formatting is not compatible with what PostgreSQL expects,
sometimes using
20130117143218 instead of what we expect:
2013-01-17
14:32:18, and of course even when the format is right (that seems to depend
on the MySQL server's version), you still have to transform the
0000-00-00
00:00:00 into
NULL.

Before thinking about the usage of that particular date rather than
using
NULL when you don't have the information, you might want to
remember that there's no
year zero in the calendar, it's year 1 BC and
then year 1.

Then, text encoding is often mixed up, even when the MySQL databases are
said to be in
latin1 or
unicode, you somehow always end up finding texts in
win1252 or some other
code page in there.

And of course, MySQL provides no tool to export the data to
CSV, so you have
to come up with your own. The
SELECT INTO OUTFILE command on the server
produces non-conforming CSV (
\n can appear in non-escaped field contents),
and while the
mysql client manual page details that it outputs
CSV when
stdout is not a terminal, it won't even try to quote fields or escape
\t
when they appear in the data.

So, we use the
mysqltocsv little script to export the data, and then use
that data to feed
pgloader.

Loading the data in

Now, we have to write down a configuration file for pgloader to know what to
load and where to find the data. What about generating the file from the
database schema instead, using the query in
generate-pgloader-config.sql:

Conclusion

There are plenty of tools to assist you migrating away from MySQL and other
databases. When you make that decision, you're not alone, and it's easy
enough to find people to come and help you.

While MySQL is Open Source and is not a
lock-in from a licensing
perspective, I still find it hard to swallow that there are no provided tools
for getting data out in a sane format, and that so many little
inconsistencies exist in the product with respect to data handling (try to
have a
NOT NULL column, then enjoy the default empty strings that have been
put in there). So at this point, yes, I consider that moving to
PostgreSQL
is a way to
free your data:

Free your data!

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 17 Jan 2013 14:32:00 +0100http://tapoueh.org/blog/2013/01/17-pgloader-auto-setup.htmlExtensions Templateshttp://tapoueh.org/blog/2013/01/08-Extensions-Templates.htmlIn a recent article titled
Inline Extensions we detailed the problem of how
to distribute an extension's
package to a remote server without having
access to its file system at all. The solution to that problem is non-trivial,
let's say. But thanks to the awesome
PostgreSQL Community we finally
have some practical ideas on how to address the problem, as discussed on
pgsql-hackers, our development mailing list.

PostgreSQL is first an Awesome Community

The solution we talked about is to use
templates, and so I've been working
on a patch to bring
templates for extensions to PostgreSQL. As we're talking
about 3 new system catalogs, that's a big patch in terms of lines of code. In
terms of features though, it's quite an easy one.

Here's how it goes. Let's say you want to prepare the system to be able to
CREATE EXTENSION pair; without having to install it as an
OS package for
which you would need to get
root access on the server where your PostgreSQL
instance is running, which is not always easy, and sometimes not a good
idea.

Installing an extension template

With the
template patch I just sent on the lists, what you can do is prepare
a template with your extension's script and properties, then use it to
install the extension.
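The exact syntax was still being discussed on -hackers, but the idea looks
roughly like this; a sketch of the proposal, not a released PostgreSQL
command, reusing the pair extension mentioned above:

-- upload the extension's install script and properties as a template
create template for extension pair version '1.0' as
$script$
  create type pair as (k text, v text);
  create function pair(text, text) returns pair
    language sql as 'select row($1, $2)::pair';
$script$;

-- then install it from the catalogs, with nothing but a SQL connection
create extension pair;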

The extension installation is now happening from the catalog templates
rather than the file system, which means you don't need to be
root on the
system where the server is running. Also note that this example above did
happen when connected as the
database owner, a user who is not the
superuser. Requiring less privileges is always good news, right?

Managing upgrade scripts and extension update

Now that the extension is installed, you might want to update it with some
new awesome features. Let's have a look at that.

Upload your Extension Update Scripts

Rather than make a new version of the extension package with the new files
in there, then asking the operations team to make the new package available
on the internal repositories then install them on the servers, you could now
prepare and
QA the new setup that way:
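Something like this, with the same hedging about the exact syntax of the
proposal; the only detail given below is that version 1.2 merely adds a
comment to the extension:

-- upload an update script template from 1.1 to 1.2 (sketch of the proposal)
create template for extension pair from '1.1' to '1.2' as
$script$
  comment on extension pair is 'a key/value data type';
$script$;

-- the upgrade path (1.0--1.1 then 1.1--1.2) is then applied automatically
alter extension pair update to '1.2';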

Of course it's not the most realistic example when you look at the content.
In particular the
1.2 version that only adds a comment to the extension. I
needed another version to test the automatic upgrade path with more than one
step though, so here we go.

As you can see there's nothing too complex here, it's quite straightforward.
We need to separate away the
creating templates from the
updating templates
because we need
unique keys and we can't have that on
NULL columns.

Next steps

Now that we have the basics in place, the patch is far from finished still.
It needs
pg_dump and
psql support, support for the function
pg_available_extension_versions(), implementing some
ALTER TEMPLATE FOR
EXTENSION commands for which I only sketched the syntax in the grammar, and
some more infrastructure to be able to have
ALTER OWNER and
ALTER RENAME
commands.

All that is pretty technical though, the real thing that patch needs is some
quality review and maybe some adjustments. I would be surprised if it didn't
need adjustments, really. Because the way the community works, we always
need some. That's why the PostgreSQL product is so good!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 08 Jan 2013 17:53:00 +0100http://tapoueh.org/blog/2013/01/08-Extensions-Templates.htmlInline Extensionshttp://tapoueh.org/blog/2012/12/13-Inline-Extensions.htmlWe've been having the
CREATE EXTENSION feature in
PostgreSQL for a couple of
releases now, so let's talk about how to go from here. The first goal of the
extension facility has been to allow for a clean
dump and
restore process of
contrib modules. As such it's been tailored to the needs of deploying files
on the
file system because there's no escaping from that when you have to
ship
binary and
executable files, those infamous
.so,
.dll or
.dylib things.

Now that we have the
Extension facility though, what we see is a growing
number of users taking advantage of it for the purpose of managing in house
procedural code and related objects. This code can be a bunch of
PLpgSQL or
plpython functions, and as such you normally create them directly from any
application connection to PostgreSQL.

So the idea would be to allow creating
Extensions fully from a SQL command,
including the whole set of objects it contains. More than one approach are
possible to reach that goal, each with downsides and advantages. We will see
them later in that document.

Before that though, let's first review what the extension mechanism has to
offer to its users when there's no
contrib like module to manage.

A use case for next generation extensions

The only design goal of the
9.1 PostgreSQL Extension feature has been to
support a proper
dump & restore user experience when using
contrib modules
such as
hstore or
ltree. Building up on that, what do
Extensions have to
offer to non
C developpers out there? In other words, what
CREATE EXTENSION
brings on the table that a bunch of
loose objects does not? What problems
can we now solve?

A Multi Functions Equipment, All Bundled Together

A way to phrase it is to say that
Extensions are user defined
CASCADE
support.
Extensions bring extensibility to the
pg_depend PostgreSQL
internal dependency tracking system that
CASCADE is built on. From that
angle,
Extensions are a way to manage dependencies of
SQL objects in a way
that allow you to manage them as a single entity.

One of the existing problems this helps solving is the infamous lack of
dependency tracking between function calls. Using
Extensions when you deal
with a set of functions acting as an API, you can at least protect that as a
unit:
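For instance, once a function is a member of an extension, PostgreSQL's
dependency tracking refuses to drop it on its own. Roughly, with made-up
extension and function names:

-- assuming a function get_user(integer) that belongs to an extension myapi
drop function get_user(integer);
ERROR:  cannot drop function get_user(integer) because extension myapi requires it
HINT:  You can drop extension myapi instead.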

And you also have a version number and tools integration to manage
extensions, with psql
\dx command and the equivalent feature in
pgAdmin.
Coming with your own version number management is not impossible, some do
that already. Here it's integrated and the upgrade sequences are offered too
(applying
1.1--1.2 then
1.2--1.3 automatically).

Let's just say that it's very easy to understand the
traction our users feel
towards leveraging
Extensions features in order to properly manage their set
of stored procedures and SQL objects.

The dump & restore experience

The common problem of all those proposals is very central to the whole idea
of
Extensions as we know them. The goal of building them has been to fix the
restoring experience when using extensions in a database, and we managed to
do that properly for contrib-like extensions.

A fly in the ointment

When talking about
Inline Extensions, the fly in the ointment is how to
properly manage their
pg_dump behavior. The principle we built for
Extensions and that is almost unique to them is to
omit them in the dump
files. The only other objects that we filter out of the dump are the ones
installed at server initialisation time, when using
initdb, to be found in
the
pg_catalog and
information_schema system
schemas.

At restore time, the dump file contains the
CREATE EXTENSION command so the
PostgreSQL server will go fetch the
control and
script files on disk and
process them, loading the database with the right set of SQL objects.

Now we're talking about
Extensions which we would maybe want to dump the
objects of, so that at
restore time we don't need to find them from unknown
external resources: the fact that the extension is
Inline means that the
PostgreSQL server has no way to know where its content is coming from.

Inline Extension Proposals

Now, on to some proposals to make the best out of our all time favorite
PostgreSQL feature, the only one that makes no sense at all by itself...

Starting from an empty extension

We already have the facility to add existing
loose objects to an extension,
and that's exactly what we use when we create an extension for the first
time when it used not to be an extension before, with the
CREATE EXTENSION
... FROM 'unpackaged'; command.

Opening
CREATE EXTENSION so that it allows you to create a really
empty
extension would then allow you to fill-in as you need, with as many commands
as you want to add objects to it. The
control file properties would need to
find their way in that design, that sure can be taken care of.
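Today you can already attach loose objects to an extension with
ALTER EXTENSION ... ADD; what the proposal adds is the empty starting point.
A sketch, where the bare CREATE EXTENSION form below is the proposal under
discussion, not a shipped feature, and the names are made up:

create extension myapp;   -- proposed: no control file, starts out empty

create function add_one(i integer) returns integer
  language sql as 'select $1 + 1';

-- this part exists today: attach the new object to the extension
alter extension myapp add function add_one(integer);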

Look me, an Empty Extension!

The main drawback here is that there's no separation anymore in between the
extension author, the distribution means, the DBA and the database user.
When you want to install a third party
Extension using only SQL commands,
you could do it with that scheme by using a big script full of one-liners
commands.

So that if you screw up your
copy/pasting session (well you should maybe
reconsider your choice of tooling at this point, but that's another topic),
you will end up with a perfectly valid
Extension that does not contain what
you wanted. As the end user, you have no clue about that until the first
time using the extension fails.

CREATE EXTENSION AS

The next idea is to embed the
Extension script itself in the command, so as
to get a cleaner command API (in my opinion at least) and a better error
message when the paste is wrong. Of course if your
paste problem happens to
just be losing a line in the middle of the script there is not so much I
can do for you...

I've actually proposed a patch to implement that, as you can see in the
pg_dump --extension-script commit fest entry. As spoiled by the commit fest
entry title, the main problem we have with
Inline Extensions is their
management in the seamless experience of
dump & restore that we are so happy
to have now. More about that later, though.

Extension Templates

Another idea is to continue working from control parameters and scripts to
install and update extensions, but to have two different places where to
find those. Either on the server's
File System (when dealing with
contribs
and
shared libraries, there's but a choice), or on the system catalogs.

We Already Have TEXT SEARCH TEMPLATE After All

The idea would then be to have some new specific
TEMPLATE SQL Object that
would be used to
import or
upload your control file and create and update
scripts in the database, using nothing else than a SQL connection. Then at
CREATE EXTENSION time the system would be able to work either from the file
system or the
template catalogs.

One obvious problem is how to deal with a unique namespace when we split the
sources into the file system and the database, and when the file system is
typically maintained by using
apt-get or
yum commands.

Then again I would actually prefer that mechanism over the other
proposals if the idea was to load the file system control and script files
as
TEMPLATEs themselves and then only operate
Extensions from
Templates. But
doing that would mean getting back to the situation where we still are not
able to devise a good, simple and robust
pg_dump policy for extensions and
templates.

Conclusion

I hope to be finding the right solution to my long term plan in this release
development cycle, but it looks like the right challenge to address now is
to find the right compromise instead. Using the
Templates idea already
brings a lot on the table, if not the whole set of features I would like to
see.

PostgreSQL: Building on Solid Foundations

What would be missing mainly would be the ability for an
Extension to switch
from being file based to being a template, either because the author decided
to change the way he's shipping it, or because the user is switching from
using the
pgxn client to using
proper system packages. I guess that's
something we can see about later, though.

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 13 Dec 2012 11:34:00 +0100http://tapoueh.org/blog/2012/12/13-Inline-Extensions.htmlEditing SQLhttp://tapoueh.org/blog/2012/11/06-Interactive-SQL.htmlIt's hard to read my blog yet not know I'm using
Emacs. It really is a great
tool and has a lot to compare to
PostgreSQL in terms of extensibility,
documentation quality and community. And there's even a native
implementation of the
PostgreSQL Protocol written in
Emacs Lisp.

One of the things where
Emacs really shines is that interactive development
environment you get when working on some
Emacs Lisp code. Evaluating a
function is as easy as a single
key chord, and that will both compile in the
function and load it in the running process. I can't tell you how many times
I've been missing that ability when editing C code.

With
PostgreSQL too we get a pretty interactive environment with the
psql
console application, or with
pgAdmin. One feature from
pgAdmin that I've
often wished I had in
psql is the ability to edit my query online and easily
run it in the console, rather than either using the
readline limited history
editing features or launching a new editor process each time with
\e. At the
same time I would much prefer using my usual
Emacs editor to actually
edit
the query.

If you've been reading that blog before you know what to expect. My solution
to the stated problem is available in
pgdevenv-el, an
Emacs package aimed at
helping
PostgreSQL developers. Most of the features in there are geared
toward the
core backend developers, except for this one I want to talk about
today (I'll blog about the other ones too I guess).

What you can see from that screenshot is that the selected query text has
been sent to the
psql buffer and executed over there. And that the
psql
buffer is echoing all queries sent to it. What you can not see straight from
that picture is the interaction to get there. Well, I've been implementing
some
elisp features that I was missing.

First, movement: you can do
C-M-a and
C-M-e to navigate to the beginning and
the end of the SQL query at point, like you do in
C or in
lisp in
Emacs.

Then, selection: you can do
C-M-h to select the SQL query at point, you
don't have to navigate yourself,
pgdev-sql-mode knows how to do that. Side
note,
pgdev-sql-mode is the name of the
minor mode you need to activate in
your SQL buffers to have the magic available.

Last but not least, evaluation: as when editing lisp code, you can now use
C-M-x to send the current query text to an associated
psql buffer.

The way to associate the
psql buffer to an
SQL buffer is currently done
thanks to the other
pgdevenv-el features that this blog post is not talking
about, and the setup is addressed in the documentation: you have to let know
pgdevenv-el where your PostgreSQL branches are installed locally so that it
can prepare you a
Shell buffer with
PGDATA and
PGPORT already set for you.
And currently, for
C-M-x to work you need to open the buffer yourself before
hand, using
C-c - n (to run the command
pgdev-open-shell), and type
psql in
the
Shell prompt.

What that means for me is that I can at least edit SQL (in
PostgreSQL
regression files and other places) in my usual
Emacs buffer and actually
refine it as I go until it does exactly what I need, without having to use
the
readline history editing or the
\e command, which is not great when your
Shell is in already running inside
Emacs.

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 06 Nov 2012 09:55:00 +0100http://tapoueh.org/blog/2012/11/06-Interactive-SQL.htmlPostgreSQL for developershttp://tapoueh.org/blog/2012/11/02-Conference-AFUP-Lyon.htmlAs
Guillaume says, we've been enjoying a great evening conference in Lyon 2
days ago, presenting PostgreSQL to developers. He did the first hour
presenting the project and the main things you want to know to start using
PostgreSQL in production, then I took the opportunity to be talking to
developers to show off some SQL.

That slide deck contains mainly SQL, but some French too, rather
than English. Sorry for the inconvenience if that's not something you can
read. Get me to talk at an English-speaking, developer-friendly conference and I'll
translate it for you! :)

The aim of that talk is to have people think about SQL as a real asset in
their development tool set. SQL really should get compared to your
application development language rather than your UI formatting language,
it's more like PHP or Python than it is like HTML.

So the whole talk is about showing off some advanced SQL features, all
provided by default in released PostgreSQL versions. The main parts of the
talk all come from an article in this blog:
Reset Counter.

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 02 Nov 2012 16:22:00 +0100http://tapoueh.org/blog/2012/11/02-Conference-AFUP-Lyon.htmlAnother awesome confhttp://tapoueh.org/blog/2012/10/30-Prague-Lyon.htmlLast week was
PostgreSQL Conference Europe 2012 in Prague, and it's been
awesome. Many thanks to the organisers who did manage to host a very smooth
conference with
290 attendees, including speakers. That means you kept
walking into interesting people to talk to, and in particular the
Hallway
Track has been a giant success.

Next conference is in Lyon and will be in French, the talk is called
Présentation de PostgreSQL. The audience is going to be composed of PHP
developers interested to know more about PostgreSQL, I'll tell you how it
goes!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 30 Oct 2012 12:50:00 +0100http://tapoueh.org/blog/2012/10/30-Prague-Lyon.htmlPrefixes and Rangeshttp://tapoueh.org/blog/2012/10/16-prefix-update.htmlIt's been a long time since I last had some time to spend on the
prefix
PostgreSQL extension and its
prefix_range data type. With PostgreSQL 9.2
out, some users wanted me to update the extension for that release, and
hinted me that it was high time that I fix that old bug for which I already
had a patch.

The
changelog is all about fixing an index search bug and updating the
package to primarily be an extension for PostgreSQL 9.1 and 9.2. Of course
older Major Versions are still supported (all of them since
8.1, but please
first consider upgrading PostgreSQL) if you want to install it
manually,
using the
prefix--1.2.0.sql file.

Range Types

If you step back a little there's an interesting question to answer here.
Why isn't
prefix_range a
PostgreSQL Range Type? Given the names it seems
like a pretty good candidate.

Well the thing is that to make a generic range type you need to have a total
ordering on the range elements, and a distance function that tells you how
far any two elements of a range are one from each other.

When talking about prefixes, I don't see how to do that. The prefix range
['abcd', 'abce') contains an infinity of elements, all the
strings that
begin with the letters
abcd. I guess that coming with an ordering on text is
possible, but what if any text element represents a prefix?

I mean that in our case, the elements would be of type
prefix, and
'abcd' is
a prefix of
'abcdefg'. The question I want to answer is that given a table
with prefixes
'abcd',
'abce' and
'abcde' which row in there has the longest
prefix matching the literal
'abcdef'.
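With the prefix extension that question is typically answered with its @>
("matches prefix") operator, along these lines (a sketch; check the
extension's documentation for the exact operator set):

-- a hypothetical table holding the three prefixes from the question
create table prefixes (prefix prefix_range);
create index on prefixes using gist (prefix);
insert into prefixes values ('abcd'), ('abce'), ('abcde');

-- which rows match the literal 'abcdef'?
select * from prefixes where prefix @> 'abcdef';
-- both 'abcd' and 'abcde' match; the longest one is then picked by ordering
-- the matching rows on the length of their prefix and keeping the first row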

I'm not seeing how to abuse the
Range Types mechanism to implement that, so
if you have some ideas please share them!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 16 Oct 2012 10:47:00 +0200http://tapoueh.org/blog/2012/10/16-prefix-update.htmlReset Counterhttp://tapoueh.org/blog/2012/10/05-reset-counter.htmlI've been given a nice puzzle that I think is a good blog article
opportunity, as it involves some thinking and
window functions.

What's to solve

Say we store in a table entries from a
counter that only increases and the
time stamp when we did the measurement. So when you read
30 then later
40, that in fact means we counted
10 more at the second reading when compared to
the first; in other words the first
30 are counted again in the second
counter value,
40.

Now of course it's a real world counter. Think network traffic counter on a
network interface, if you want something real to play with in your mind. So
the counter will sometime reset and you will read measure sequences such as
40, 0, 20 if you happen to read just when the counter is reset, or most of
the time that will look like
45, 25, 50.

The question we want to answer is, given a series of that counter measures,
including some resets, what is the current logical value of the counter?

Given the sequence of measures
0, 10, 20, 30, 40, 0, 20, 30, 60 the result
we want is
40 + 60, that is
100. Right?

Playing with some data

Let's model a hypothetical dataset easy enough to play with. What about
just the previous example? We also need to
time stamp the measurements,
let's just use a
tick for now, as it's easier to think about:
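The dataset itself isn't reproduced here; from the queries below it has the
following shape. This is only a sketch: the article's real data set is
longer, as the totals further down show.

create table measures (tick serial, nb integer);

-- the short example sequence from above; the real data continues further
insert into measures(nb)
     values (0), (10), (20), (30), (40), (0), (20), (30), (60);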

Now that we have some data in a table to play with, let's try to find out
the numbers we are interested in: we only want to keep the latest measure we
read on the counter just before it wraps. That means values where the
next
one (in tick or time stamping order) is less than the current counter
value.

As we are lucky enough to be playing with the awesome
PostgreSQL which
brings
window functions on the table, we can easily implement just what we
said in a readable way:

select tick, nb,
       case when lead(nb) over w < nb then nb
            when lead(nb) over w is null then nb
            else null
        end as max
  from measures
window w as (order by tick);

The first
case is the exact translation of the problem as spelled in English
in just the previous paragraph where we stated we want to keep the current
counter value in case of a
wraparound, so I guess it's easy enough to get
at.

Then we have a couple of tricks in that query in order to massage the data
as we want it. First, the last row of the output won't have a
lead, that
window function call is going to return
NULL. In that case, we keep the
current counter value as if we just did a
wraparound. And finally, when
there's no
wraparound, we don't care about the data. Well, for the purpose
of knowing the current
logical value of the counter, that is.

with t(tops) as (
    select case when lead(nb) over w < nb then nb
                when lead(nb) over w is null then nb
                else null
            end as max
      from measures
    window w as (order by tick)
)
select sum(tops) from t;

 sum
-----
 255
(1 row)

All good!

Counter logical value over a given period

Now of course what we want is to find the logical value of the counter for a
given day's or month's worth of measures. We then need to pay attention to
the value of the counter at the start of our period so that we know to
subtract it from the logical sum over the period.

Here's an SQL version of the same sentence, applied to the period in between
ticks
4 and
14, in a completely arbitrary choice of mine:

with t as (
    select tick,
           first_value(nb) over w as first,
           case when lead(nb) over w < nb then nb
                when lead(nb) over w is null then nb
                else null
            end as max
      from measures
     where tick >= 4 and tick < 14
    window w as (order by tick)
)
select sum(max) - min(first) as sum
  from t;

Here we are using the
first_value() window function to carry the period's first counter value into
every row of the
Common Table Expression (the inner query introduced
by the keyword
WITH is called that way). And when computing the sum we're
interested in at the outer level, we didn't forget to subtract that first
value: we need to use an aggregate here because we're doing a
sum()
aggregate at the same query level, and as we have the same value in each row of
the resultset, we used
min();
max() would have been just as good.

Another important trick we're using in that query is how to express the date
range. Never use
between for that, as you would end up counting boundaries
twice, and customers won't like your accounting process if you do that.
Always use a combination of inclusive and exclusive boundary comparisons, as in
that
WHERE clause in the previous query.

Let's have a quick look at the raw data in that range, using another nice
aggregate that PostgreSQL comes with:
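The original query was lost in this extract; a hedged guess at its shape, using
the array_agg aggregate to eyeball the raw measures in that tick range:

select array_agg(nb order by tick) as measures
  from measures
 where tick >= 4 and tick < 14;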

And now, the
logical counter value for that period is computed as the
following value by the previous query:

 sum
-----
 105
(1 row)

We can verify it manually, we want
40 + 60 + 35 - 30, I think we're all good
again. Don't forget we have to subtract the first measure from the period!

Extending the problem

Another interesting problem, that we didn't have here but that I find
interesting enough to extend this article, is finding the ranges of time
(here, ticks) within which the counter didn't reset.

The query is more complex because we need to split the data into partitions,
each partition containing data from the same counter series of measures
without wrapping. The usual trick is to self-join our data set so that for
each given row we have a set of rows from the same partition; we are going
to instead use a
correlated subquery to fetch the next
wraparound value:
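The original query isn't reproduced in this extract; here's one possible shape
for it, a sketch that uses a correlated subquery to find, for each measure, the
tick at which the next reset (wraparound) is observed:

select m.tick, m.nb,
       (select min(n.tick)
          from measures n
         where n.tick > m.tick
           and n.nb < (select p.nb
                         from measures p
                        where p.tick < n.tick
                     order by p.tick desc
                        limit 1)) as next_reset
  from measures m
 order by m.tick;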

With that as an input it's then possible to build ranges of ticks covering
non-wrapping sets of measures from our counter, and get for each range the
logical value that the counter had at the end of it:
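Again the original query is missing here; a sketch building on the previous one,
grouping measures by the upcoming reset so that each group is one non-wrapping
run, and keeping the last (highest) value of each run:

with marks as (
    select m.tick, m.nb,
           (select min(n.tick)
              from measures n
             where n.tick > m.tick
               and n.nb < (select p.nb
                             from measures p
                            where p.tick < n.tick
                         order by p.tick desc
                            limit 1)) as next_reset
      from measures m
)
  select min(tick) as start_tick,
         max(tick) as end_tick,
         max(nb)   as value_at_end
    from marks
group by next_reset
order by start_tick;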

Conclusion

What I hope to have shown here, apart from some
window function tips and
some nice use cases for
common table expressions, is that as a developer
adding
SQL to your tool set is a very good idea.

You don't want to have several parts of your code dealing with a logical
counter like this, because you want the reporting, accounting, quota,
billing and other software to all agree on the values. And you most probably
want to avoid fetching a huge result set and processing it in application
memory (hoping it fits) when you could instead get back a single-row,
single-column integer resultset, right?

If you find this SQL example to be beyond your reach, it's a good sign that you
need to improve your skills so that SQL becomes a real asset among your
multi-language, multi-paradigm developer talents.

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 05 Oct 2012 09:44:00 +0200http://tapoueh.org/blog/2012/10/05-reset-counter.htmlPostgreSQL 9.3http://tapoueh.org/blog/2012/09/15-PostgreSQL-9.3.htmlPostgreSQL 9.2 is released! It's an awesome new release that I urge you to
consider trying and adopting; an upgrade from even
9.1 should be well
worth it, as your hardware could suddenly be able to process a much higher
load. Indeed, better performance means more work done on the same budget,
that's the name of the game!

As a
PostgreSQL contributor though, the release of
9.2 mainly means to me
that it's time to fully concentrate on preparing
9.3. Its development
season has already begun, by the way, so some amount of work has
already been done here.

The list of things I want to be working on for that next release is quite
long, and looks more like a Christmas list than anything else. Let's only
talk about the things I can realistically make happen rather than all the
things I wish I could deliver in a single release...

Event Triggers

We missed
9.2 by wanting to include too big a feature in one go, leading to
too many choices to review and decide on at once, and also to
some non-optimal choices that had to be reconsidered. Thanks to
PGCON in
Ottawa earlier this year, I could meet in person with
Robert Haas and we've
been able to decide how to attack that big patch I had. The first step has
been to
commit to the PostgreSQL tree only the infrastructure parts, on which we
will be able to build the feature itself.

Infrastructure

What we already have today is the ability to run
user defined functions when
some event occurs, and an event can only be
ddl_command_start as of now.
Also the
trigger itself must be written in
PLpgSQL or
PL/C, as support
for the other languages was not included in that first patch.
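To give an idea of what that already enables, here's a sketch using the syntax
as it eventually landed in 9.3 (the trigger and function names are made up):
the trigger function is a regular PL/pgSQL function returning the special
event_trigger type.

create function log_ddl_start()
 returns event_trigger
 language plpgsql
as $$
begin
  -- TG_EVENT is 'ddl_command_start', TG_TAG is the command tag (e.g. CREATE TABLE)
  raise notice 'caught event % on command %', tg_event, tg_tag;
end;
$$;

create event trigger log_ddl on ddl_command_start
        execute procedure log_ddl_start();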

That leaves some work to be done in the next months, right?

PL support

The
user defined function will get some information from
magic variables
such as
TG_EVENT. That allows easier integration of future
information we want to add, without disrupting those existing
triggers that
you wrote (no
API change), at the cost of having to write a specific
integration per
procedural language.

So one of the first things to do now is to take the support for the other
PLs that I had in my proposal and make a new patch with only that in there.

Fill-in more information

Then again, this first infrastructure part was all about being actually able
to run a user function and left behind most of the information I would like
the function to have. The information already there is the
command tag, the
event name and the
parsetree that's only usable if you're writing your
trigger in
C, which we expect some users to be doing.

To supplement that, we're talking about the
Object ID that has been the
target of the
event, the
schema it leaves in when applicable, the
Object
Name, the
Operation that's running (
CREATE,
ALTER,
DROP), the
Object Kind
being the target of said operation (e.g.
TABLE or
FUNCTION), and the
command
string.

Publishing the Command String

Publishing the
Command String here is not an easy task, because we have to
rebuild a normalized version of it. Or maybe we can go with passing explicit
context in which the command is running, such as the
search_path.

Even with an explicit context that would be easy enough to
SET back again
(in a remote server where you would be replicating the
DDL, say), it would
be better to normalize the
command string so as to remove extra spaces and
make it easier to parse and process from a
user defined function.

That part looks like where most of the work is going to happen in the next
commit fests.

Events

The other big thing I want to be working on with respect to this feature is
the
event support, which is basically
hard coded to be
ddl_command_start in
the current state of the
9.3 code.

We certainly will want to be able to run
user defined functions not only at
the very beginning of a
DDL command, but also just before it finishes so
that the newly created object already exists, for example.

We might also be interested in supporting triggers on more than
DDL, though
I doubt we will see that happening in
9.3, as some people in the community
would go crazy with complex use cases. Time is limited, and I think this is
better kept open for the next release, as the way our beloved PostgreSQL
works is by delivering reliable features: quality first.

Use cases

I'm always happy to hear about use cases for the features I'm working on,
and this one has the potential to cover a non-trivial number of them.
I already can think of
trigger based replication systems and some integrated
extension network facilities. With your help we can give those the place
they deserve: early-days use cases in a great collection.

Extensions

So yes,
event triggers' first use case for me is in relation with
extensions.
Surprise! There's still some more I want to do with
extensions, so much that
I could consider their implementation in
9.1 just an enabler. In
9.1 the
game has been to offer the best support we could design for existing
contrib
modules, with a very strong angle toward clean support for
dump and
restore.

The typical contrib module exports in SQL a list of C coded functions,
sometimes supporting a new datatype, sometimes a set of administration
functions. It's quite rare that contrib modules handle
user data
embedded in their SQL definition, and when it happens it's mostly
configuration kind of data, such as with
PostGIS.

Now we want to fully support
extensions that are maintaining their own
user
data, or even those that are all about them. The main difficulty here is
that our current design of
dump and
restore support is following a model
where installing the same extension in a new database is all covered by
create extension foo;. This is a limited model of reality, one that we need
to expand.

The first manifestation of those problems is in the
SEQUENCE support in
extensions, and that impacts one of my favorite extensions:
PGQ.

PostgreSQL releases

PostgreSQL just delivered an awesome release with
9.2, where we get
tremendous performance optimisations and truly innovative features, such as
RANGE TYPES. How could you not consider PostgreSQL as part of your application
stack, the place where you develop and host your features?

While users are enjoying the newer release, contributors are already
preparing the next one, hard at work again!

]]>dim@tapoueh.org (Dimitri Fontaine)Sat, 15 Sep 2012 18:43:00 +0200http://tapoueh.org/blog/2012/09/15-PostgreSQL-9.3.htmlAutumn 2012 Conferenceshttp://tapoueh.org/blog/2012/08/01-autumn-conferences.htmlThe
PostgreSQL community hosts a number of
conferences throughout the year, and
the next ones I'm lucky enough to get to are approaching fast now. First,
next month in September, we have
Postgres Open in Chicago, where my talk
about
Large Scale Migration from MySQL to PostgreSQL has been selected!

This talk shares insights about the why and the how of that migration,
what problems couldn't be solved without moving away, and how the solution
now looks. The tools used to migrate the data, the methods, and the new
architecture are detailed. And the new home, in the cloud!

Not much later, the European PostgreSQL community is giving
us a very nice occasion to get to Prague with
PostgreSQL Conference Europe 2012 (October 23-26). If you've been meaning to
meet with the community, if you've been meaning to visit Prague someday, or
any mix of those two very good reasons, think about booking that conference
already.

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 02 Aug 2012 01:08:00 +0200http://tapoueh.org/blog/2012/08/01-autumn-conferences.htmlPGDay France 2012http://tapoueh.org/blog/2012/06/08-pgdayfr-lyon.htmlThe french PostgreSQL Conference,
pgday.fr, was yesterday in Lyon. We had a
very good time and a great schedule with a single track packed with 7 talks,
addressing a diverse set of PostgreSQL related topics, from GIS to fuzzy
logic, including replication.

You might have guessed it already, I did talk about replication. Here's the
slide deck I used; it's in French, sorry if you don't grok that language.

The conference was very nice and went smoothly, and even if there were “only” 60
of us I had the pleasure to meet different users with very different
sets of needs. Very happy to have been there!

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 08 Jun 2012 16:17:00 +0200http://tapoueh.org/blog/2012/06/08-pgdayfr-lyon.htmlBack From PgConhttp://tapoueh.org/blog/2012/05/24-back-from-pgcon.htmlLast week was the annual
PostgreSQL Hackers gathering in Canada, thanks to
the awesome
pgcon conference. This year's edition has been packed with good
things, beginning with the
Cluster Summit, followed the next day by the
Developer Meeting, itself followed (yes, on the same day) by the
In Core Replication Meeting. That was a packed schedule!

The
in core replication project has been presented with slides titled
Future In-Core Replication for PostgreSQL and got a very good reception. For
instance, people implementing
Slony (
Jan Wieck,
Christopher Browne and
Steve
Singer were here) appreciated the concepts and were rather supportive
of both the requirements and the design, and appreciated the very early demo
and results that we had to show already, as a kind of proof of concept.

After those first two days, we could start the actual show. I had the honor
to present a migration use case entitled
Large Scale MySQL Migration where
we're speaking about going from MySQL to PostgreSQL, from 37 to 256 shards,
moving more than 6TB of data including binary
blobs that we had to process
with
pl/java. Quite an involved migration project, whose slides you can now
read here:

I've heard that we should soon be able to enjoy audio and video recordings
of the sessions, so if you couldn't make it this year for any reason, don't
miss that, you will have loads of very interesting talks to virtually
attend. I definitely will do that to catch-up with some talks I couldn't
attend, having to pick one out of three is not an easy task, all the more
when you add the providential
hallway track.

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 24 May 2012 09:40:00 +0200http://tapoueh.org/blog/2012/05/24-back-from-pgcon.htmlClean PGQ Subconsumershttp://tapoueh.org/blog/2012/04/26-unregister-subconsumers.htmlNow that you're all using the wonders of
Cooperative Consumers to help you
efficiently and reliably implement your business constraints and offload
them from the main user transactions, you're reaching a point where you have
to clean up your development environment (because that's what happens to
development environments, right?), and you want a way to start again from a
clean empty place.

Here we go. It used to be much simpler than this, so if you're still
using
PGQ from
Skytools2, just jump to the next step.

Unregister Subconsumers

The following query will figure out subconsumers using the system function
pgq.get_consumer_info() and ask PGQ to please
unregister them, losing events
along the way, even events from batches that are currently active.

with subconsumers as (
    select q1.queue_name,
           q2.consumer_name,
           substring(q1.consumer_name from '%.#"%#"' for '#') as subconsumer_name
      from (select *
              from pgq.get_consumer_info()
             where lag is null)
           as q1
      join (select *
              from pgq.get_consumer_info()
             where lag is not null)
           as q2
        on q1.queue_name = q2.queue_name
)
select *,
       pgq_coop.unregister_subconsumer(queue_name, consumer_name,
                                       subconsumer_name, 1)
  from subconsumers;

Unregister Consumers

Now that the first step is done, we have to
unregister the main consumers,
which is easy and what you already did before:
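The query itself was lost in this extract; a minimal sketch of it, asking
pgq.unregister_consumer() to drop every consumer still reported by
pgq.get_consumer_info():

-- sketch: unregister every remaining (main) consumer on every queue
select queue_name, consumer_name,
       pgq.unregister_consumer(queue_name, consumer_name)
  from pgq.get_consumer_info();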

Drop queues

And as we want to really clean up the mess, let's also drop the queues.

select queue_name, pgq.drop_queue(queue_name)
  from pgq.queue;

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 26 Apr 2012 15:05:00 +0200http://tapoueh.org/blog/2012/04/26-unregister-subconsumers.htmlPGQ Coop Consumershttp://tapoueh.org/blog/2012/03/12-PGQ-Cooperative-Consumers.htmlWhile working on a new
PostgreSQL architecture for a high scale project that
used to be in the top 10 of popular internet web sites (in terms of
visitors), I needed to be able to offload some processing from the main
path: that's called a
batch job. This needs to be
transactional: don't run
the job if we did
rollback the transaction, process all
events that were
part of the same transaction in the same transaction, etc.

That calls for using
PGQ, the
job queue solution from
Skytools, the workhorse
behind
Londiste. If
PGQ is good enough to build a full trigger-based
replication solution on top of it, certainly it's good enough for our custom
processing, right? Well, you still need to check that your expectations are
met, and that was happily the case in my implementation. It's a very common
problem, and
PGQ very often is a great solution to it.

As this implementation is
PHP centric, we've been using
libphp-pgq to drive
our background workers. Using
PGQ in
PHP has been very easy to set up, the
only trap being not to forget about running the
ticker process.

It got interesting because of two elements. First, we're not running a
single database instance here but a bunch of them... make it
256 databases.
Each of them has
5 queues to consume, so that would be about
1280 consumer
processes; distributed over
16 servers that's still
80 per server, so way too
many. What we did instead is reuse the
queue mover script found in the
Skytools distribution and adapt it to
forward the events of the 1280 source
queues to only 5 destination queues. We then process the events from this
single location.

Now it's easier to deal with, but we're still not exactly there. Of course,
with so many sources, concentrating them all into the same place means that
a single consumer is not able to process the events as fast as they are
produced. That's where the
cooperative consuming shines, it's very easy to
turn your
consumer into a
cooperative one even on an existing and running
queue, and that's what we did. So now we can choose how many
workers we want
per queue: one of them has 4 workers, another one sees much less activity
and 1 worker still fits.
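For reference, and from memory of the pgq_coop API (the queue and worker names
below are made up, and you should check the Skytools documentation for the exact
signatures), switching a worker to cooperative consuming looks roughly like this:

-- register one subconsumer per worker on the existing queue and consumer
select pgq_coop.register_subconsumer('events_q', 'bg_consumer', 'worker_1');

-- each worker then asks for its own batches
select pgq_coop.next_batch('events_q', 'bg_consumer', 'worker_1');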

The queue mover script that knows how to subscribe to many queues from the
same process is going to be contributed to Skytools proper, of course.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 12 Mar 2012 14:43:00 +0100http://tapoueh.org/blog/2012/03/12-PGQ-Cooperative-Consumers.htmlExtension White Listinghttp://tapoueh.org/blog/2012/03/08-extension-white-listing.htmlPostgreSQL 9.1 includes proper extension support, as you might well know if
you ever read this very blog here. Some hosting facilities are playing with
PostgreSQL at big scale (hello
Heroku!) and still run into small caveats
that make their life harder.

To be specific, only
superusers are allowed to install C coded stored
procedures, and that impacts a lot of very useful PostgreSQL extensions: all
those shipped in the
contrib package are coded in C. Now,
Heroku is not
giving away
superuser access to their hosted customers in order to limit the
number of ways they can shoot themselves in the foot. And given the PostgreSQL
security model, being granted
database owner is mostly good enough for day
to day operation.

Mostly, but as we see, not completely good enough. How to arrange for a non
superuser to be able to still install a C-coded extension in his own
database? That's quite dangerous, as any bug causing a crash would mean a
whole PostgreSQL restart. So you not only want to open up
CREATE EXTENSION
to database owners, you also want to be able to review and explicitly
white
list the allowed extensions.

Here we go:
pgextwlist is a PostgreSQL extension implementing just that
idea. You have to tweak
local_preload_libraries so that it gets loaded
automatically and early enough, and you have to provide the list of
authorized extensions in the
extwlist.extensions setting.
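The demo session that originally followed isn't in this extract; the gist of it,
assuming local_preload_libraries = 'pgextwlist' and a white list such as
extwlist.extensions = 'hstore,pg_trgm' (extension names picked only for
illustration), is that a plain database owner can then do:

-- connected as the database owner, not as a superuser
create extension hstore;  -- allowed: hstore is on the white list
-- an extension not on the list still requires the usual superuser privileges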

As you can see, it allows non
superusers to install an extension written in C.

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 08 Mar 2012 14:25:00 +0100http://tapoueh.org/blog/2012/03/08-extension-white-listing.htmlpgbouncer munin pluginhttp://tapoueh.org/blog/2011/11/16-pgbouncer-munin.htmlIt seems that if you search for a
munin plugin for
pgbouncer it's easy
enough to reach an old page of mine with an old version of my plugin, and a
broken link. Let's remedy that by publishing here the newer version of the
plugin. To be honest, I thought it had already made its way into the official
munin
1.4 set of plugins, but I've not been following closely enough.

As the plugin is 300 lines of python code, it's not a good idea to just
inline it here, so please grab it at
pgbouncer_.

You might need to know that the script name once installed should follow the
form
pgbouncer_dbname_stats_requests or
pgbouncer_dbname_pools, where of
course
dbname can contain any number of
_ characters. This script supports
quite old versions of
pgbouncer that didn't accept the normal
pq protocol,
you did have to use
psql to have any chance of getting the data from a
script, you couldn't then just use a PostgreSQL driver such as
psycopg2.

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 16 Nov 2011 14:00:00 +0100http://tapoueh.org/blog/2011/11/16-pgbouncer-munin.htmlBack From Amsterdamhttp://tapoueh.org/blog/2011/10/26-back-from-amsterdam.htmlAnother great conference took place last week,
PostgreSQL Conference Europe 2011 was in Amsterdam and plenty of us
PostgreSQL geeks were too. I attended a lot of talks and learned some
more about our project, its community and its features, but more than that
it was a perfect occasion to meet with the community.

Dave Page talked about
SQL/MED under the title
PostgreSQL at the center of your dataverse and detailed what to expect from
a
Foreign Data Wrapper in PostgreSQL 9.1, then how to write your own.
Wherever you are currently managing your data, you can easily enough make it
so that PostgreSQL integrates it by fetching it on the fly to answer your
queries. Which means real time data federation: you don't copy data around,
you access it remotely when executing the query.
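As a taste of what that looks like in 9.1, here's a sketch using the contrib
file_fdw wrapper (the server name, columns and file path are made up):

create extension file_fdw;

create server csv_files foreign data wrapper file_fdw;

create foreign table measures_csv (tick int, nb int)
       server csv_files
       options (filename '/tmp/measures.csv', format 'csv', header 'true');

select * from measures_csv limit 10;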

I might need to come up with new
Foreign Data Wrappers in a not too distant
future; now that I better grasp how much work it really is to do, it
appears to be a good migration strategy too:

In the presentation of
Synchronous Replication and Durability Tuning we
mainly saw that mixing
synchronous and
asynchronous transactions in your
application is the key to real performance across the ocean, as the speed
of light is not infinite. From Baltimore to Amsterdam the latency cannot
be better than about
100ms, and that's not the same as
instant nowadays.
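A sketch of what that mixing looks like in practice (the table is made up):
synchronous_commit can be relaxed for an individual transaction, so only the
transactions that really need the remote acknowledgement pay for the round trip
across the ocean:

begin;
set local synchronous_commit to off;  -- don't wait on the standby for this one
insert into page_views(url, seen_at) values ('/', now());
commit;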

Then again, depending on the number of concurrent queries to sync over the
ocean link, the experimental setup was able to achieve several thousands of
queries per second, which is validating the model we picked for
sync rep and
its implementation.

If you want to read the slides again at home, or if you could not be there
for some reason, then most of the talks are now available online at the
PostgreSQL Conference Europe Talks 2011 wiki page.

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 26 Oct 2011 10:08:00 +0200http://tapoueh.org/blog/2011/10/26-back-from-amsterdam.htmlImplementing backupshttp://tapoueh.org/blog/2011/10/12-backup-strategy.htmlI've been asked about my opinion on backup strategy and best practices, and
it so happens that I have some kind of an opinion on the matter.

I tend to think best practice here begins with properly defining the
backup
plan you want to implement. It's quite a complex matter, so be sure to ask
yourself about your needs: what do you want to be protected from?

The two main things to want to protect from are hardware loss (crash
disaster, plane in the data center, fire, water flood, etc) and human error
(
UPDATE without a where clause). Replication is an answer to the former,
archiving and dumps to the latter. You generally need both.

Often enough “backups” include
WAL archiving and
shipping and nightly or
weekly
base backups, with some retention and some scripts or procedures
ready to set up
Point In Time Recovery and recover some data without
interfering with the WAL archiving and shipping. Of course with PostgreSQL
9.0 and 9.1, the
WAL Shipping can be implemented with
streaming replication
and you can even have a
Hot Standby. But for backups you still want
archiving.

Mostly I still implement
pg_dump -Fc nightly backups with a custom retention
(for example, 1 backup a month kept 2 years, 1 backup a week kept 6 or 12
months, 1 backup a night kept 1 to 2 weeks), when the database size allows
the
pg_dump run to remain within the
maintenance window, if any.

Don't forget that while
pg_dump runs, you can't roll out
DDL changes to the
production system any more, so you want to be careful about this
maintenance window thing. When you have one.

Physical backups don't lock
rollouts out, but they often eat a good
deal of the
IO bandwidth, so you need to pick the right timing for them.
That's how you can end up with a once-a-week base backup and WAL
archiving.

If you can't
pg_dump production, maybe you can have
automated restore jobs
from the
physical backups that you then
pg_dump -Fc, so that you still have
that. That can come in handy, really: you can't test your
major upgrade out
of a
physical backup.

Also,
obviously, never consider your backup strategy implemented until you
have either
automated restores in place or a regular schedule to exercise
them (
staging instances, devel instances).

Then as far as the practical tools go, I tend to think that
pg_staging is
worth its installation complexity, and for WAL archiving and base backup I
recommend
walmgr from
Skytools, that's a very handy tool. When using
PostgreSQL
9.0 or
9.1, consider using
walmgr3 so that it behaves nicely
alongside
streaming replication.

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 12 Oct 2011 22:22:00 +0200http://tapoueh.org/blog/2011/10/12-backup-strategy.htmlScaling Stored Procedureshttp://tapoueh.org/blog/2011/10/06-scaling-with-stored-procedures.htmlIn the news recently
stored procedures were used as an excuse for moving
logic away from the database layer to the application layer, and to migrate away
from a powerful technology to a simpler one, now that there's no logic
anymore in the database.

It's not the way I would typically approach scaling problems, and apparently
I'm not alone in the
Stored Procedures camp. Did you read this nice blog
post
Mythbusters: Stored Procedures Edition already? Well, it happens in
another land than where my comfort zone is, but it still has some interesting
things to say.

I won't try and address all of the myths they attack in a single article.
Let's pick the scalability problems; the two of them I think about are code
management and performance. We are quite well equipped for that in
PostgreSQL, really.

For code maintenance we now have
PostgreSQL Extensions, which allow you to
pack all your procedures into separate
extensions, and to maintain a version
number and upgrade procedures for each of them. You can handle separate
rollouts in development for going from
1.12 to
1.13 then
1.14, and after the
developers have tested it more completely and changed their mind again on the
best API they want to work with,
1.15, which is stamped OK for production.
At this point,
ALTER EXTENSION ... UPDATE will happily apply all the rollouts
in sequence to upgrade from
1.12 straight to
1.15 in one go. And if you
prefer to bake a special, careful script to handle that big jump, you can
also provide a specific
extension--1.12--1.15.sql script.
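In SQL that looks like the following sketch, for a hypothetical extension named
myapp; pg_extension_update_paths() shows which upgrade chains are available
before you run the update:

-- which upgrade paths does PostgreSQL know about for this extension?
select * from pg_extension_update_paths('myapp') where source = '1.12';

-- apply all intermediate upgrade scripts (or the dedicated 1.12--1.15 one)
alter extension myapp update to '1.15';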

Of course you're managing all those files with your favorite
SCM, to answer some other myth from the blog post we are loosely following.

I wanted to talk about the other side of the scalability problem, which is
the operations side of it. What happens when you need to scale the database
in terms of its size and level of concurrent activity? PostgreSQL earned a
very good reputation for being able to scale up, but what about scaling out?
Certainly, now that you're all in on
Stored Procedures, it's going to be
a very bad situation?

Well, in fact, you're then in a very good position here, thanks to
PLproxy.
This
extension is a custom procedural language whose job is to handle a
cluster of database shards that all expose the same PL API, and it's very
good at doing that.
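A PL/Proxy function is just a declaration of where to run the real function; a
sketch (the cluster, function and parameter names are made up):

create function get_user_email(i_username text)
 returns text as $$
    cluster 'userdb';                -- the shard cluster to address
    run on hashtext(i_username);     -- pick the shard from the username
$$ language plproxy;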

Stored Procedures are a very good tool to have; be sure to get comfortable
enough with them so that you can choose exactly when to use them. If you're
not sure about that, we at
2ndQuadrant will be happy to help you there!

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 06 Oct 2011 18:23:00 +0200http://tapoueh.org/blog/2011/10/06-scaling-with-stored-procedures.htmlSee you in Amsterdamhttp://tapoueh.org/blog/2011/10/04-see-you-in-Amsterdam.htmlThe next
PostgreSQL conference is approaching very fast now, I hope you have
your ticket already: it's a very promising event! If you want some help in
deciding whether to register or not, just have another look at
the schedule.
Pick the talks you want to see. It's hard, given how packed with good ones
the schedule is. When your mind is all set, review the list. Registered?

I'll be presenting another talk about extensions, but this time I've geared
it toward use cases, with
Extensions are good for business logic. The idea is
not to talk about how to make PostgreSQL play fair with extensions including
at
dump and
restore times, that's already done and I've been talking only
too much about it. The idea this time is to figure out how much you get
from this feature.

If you ever felt like something is missing in your processes between pushing
rollouts in devel environments and refining them as developers are testing
and preparing something for the live databases, then we have something for
you here. Including how to easily compare state between production and
development, but without having to guess or reverse engineer anything.

Yeah, extensions are all about getting even more professional! A great tool
you'll be happy to master!

And now I need to prepare a damn good slide deck, right? See you there! :)

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 04 Oct 2011 14:25:00 +0200http://tapoueh.org/blog/2011/10/04-see-you-in-Amsterdam.htmlSkytools3: walmgrhttp://tapoueh.org/blog/2011/09/21-skytools-walmgr-part-1.htmlLet's begin the
Skytools 3 documentation effort, which is long overdue. The
code is waiting for you over at
github, and is stable and working. Why is
it still in
release candidate status, I hear you asking? Well because it's
missing updated documentation.

WalMgr is the Skytools component that manages
WAL shipping for you, and
archiving too. It knows how to prepare your master and standby setup, how
to take a base backup and push it to the standby's system, how to archive
(at the standby) the master's WAL files as they are produced, and have the
standby restore from this archive.

What's new in
walmgr from Skytools 3 is its support for
Streaming
Replication that made its way into PostgreSQL 9.0 and is even more useful in
PostgreSQL 9.1 (better monitoring, synchronous replication option).

Getting ready

Now, I'm using debian here, and a build virtual machine where I'm doing the
backporting work. As
PostgreSQL 9.1 is now out, let's use that.

So the order of operations is to prepare a standby, then have it restore
from the archives, then activate the wal streaming and check that the setup
allows the standby to switch back and forth between the streaming and the
archives.

Setting walmgr

To prepare the standby, we will do a
base backup of the master. That step
is handled by
walmgr, so we first need to set it up. Here's the sample
master.ini file:

Following articles will show how to manage that archive and how to go from
that to a
Hot Standby fed by either
Streaming Replication or
Archives.

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 21 Sep 2011 17:21:00 +0200http://tapoueh.org/blog/2011/09/21-skytools-walmgr-part-1.htmlPostgreSQL and debianhttp://tapoueh.org/blog/2011/09/05-apt-postgresql-org.htmlAfter talking about it for a very long time, work finally did begin! I'm
talking about the
apt.postgresql.org build system that will allow us, in the
long run, to propose
debian versions of binary packages for
PostgreSQL and
its extensions, compiled for a bunch of debian and ubuntu versions.

We're now thinking of supporting the
i386 and
amd64 architectures for
lenny,
squeeze,
wheezy and
sid, and also for
maverick and
natty, maybe
oneiric too
while at it.

It's still the very beginning of the effort, and it was triggered by the
decision to move
sid to
9.1. While it's a good decision in itself, I still
hate to have to pick only one PostgreSQL version per debian stable release
when we have all the technical support we need to be able to support all
stable releases that
upstream is willing to maintain. If you've been living
under a rock, or if you couldn't care less about
debian choices, the problem
here for debian is ensuring security (and bug fix) updates for PostgreSQL —
they promise they will handle the job just fine in the social contract, and
don't want to have to do it without support from PostgreSQL if a
debian stable
release contains a deprecated PostgreSQL version.

That opens the door for the PostgreSQL community to handle the packaging of its
solutions as a service to its debian users. We intend to open with support
for
8.4,
9.0 and
9.1, and maybe
8.3 too, as
Christoph Berg is making good
progress on this front. See, it's teamwork here!

We still have more work to do, and setting up the build environment so that
we are able to provide the packages for so many targets will indeed be
interesting. Getting there, one step after another.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 05 Sep 2011 17:14:00 +0200http://tapoueh.org/blog/2011/09/05-apt-postgresql-org.htmlpg_restore -L & pg_staginghttp://tapoueh.org/blog/2011/08/29-pgstaging-and-pgrestore-listing.htmlOn the
PostgreSQL Hackers mailing lists,
Andrew Dunstan just proposed some
new options for
pg_dump and
pg_restore to ease our lives. One of the
answers mentioned some scripts available to exploit the
pg_restore
listing that you play with using options
-l and
-L, or the long name
versions
--list and
--use-list. The
pg_staging tool allows you to easily
exploit those lists too.

The
pg_restore list is just a listing, one object per line, of all the objects
contained in a
custom dump, that is one made with
pg_dump -Fc. You can
then tweak this listing in order to comment out some objects (prepending a
;
to the line where you find it), and give your hacked file back to
pg_restore
--use-list so that it will skip them.

What's pretty useful here, among other things, is that a table will in
fact have more than one line in the listing. One is for the
TABLE definition,
another one for the
TABLE DATA. So that
pg_staging is able to provide you
with options for only restoring some
schemas, some
schemas_nodata and even
some
tablename_nodata_regexp, to use the configuration option
names directly.

How do you do a very simple exclusion of some table's data when restoring a
dump, you may ask? There we go. Let's first prepare an environment,
where I have only a
PostgreSQL server running.

Now I have a dump with some nearly random SQL objects in it, let's filter
out the tables named
reformat and
parallel from that. We will take the
sample setup from the
pg_staging project. Going the quick route, we will
not even change the default sample database name that's used, which is
postgres. After all, the
catalog command of
pg_staging that we're using
here is a
developer command, you're supposed to be using
pg_staging for a
lot more services that just this one.

The little bonus with using
pg_staging is that when filtering out a
schema
it will track all tables and triggers from that schema, and also the
functions used in the trigger definition. Which is not as easy as it
sounds, believe me!

The practical use case is when filtering out
PGQ and
Londiste: then the
PGQ
triggers will automatically be skipped by
pg_staging rather than polluting
the
pg_restore logs because the
CREATE TRIGGER command could not find the
necessary implementation procedure.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 29 Aug 2011 18:05:00 +0200http://tapoueh.org/blog/2011/08/29-pgstaging-and-pgrestore-listing.htmlSkytools, version 3http://tapoueh.org/blog/2011/08/26-skytools3.htmlYou can find
skytools3 in debian experimental already, it's in
release
candidate status. What's missing is the documentation, so here's an idea:
I'm going to make a blog post series about
skytools' new features, how to
use them, what they are good for, etc. This first article of the series
will just list those new features.

Here are the slides from the
CHAR(11) talk I gave last month, about that
very subject:

The new version comes with a lot of new features.
PGQ now is able to
duplicate the queue events from one node to the next, so that it's able to
manage
switching over. To do that we have three types of nodes now,
root,
branch and
leaf.
PGQ also supports
cooperative consumers, meaning that you
can share the processing load among many
consumers, or workers.

Londiste now benefits from the
switch over feature, and is packed with new
little features like
add <table> --create, the new
--trigger-flags argument,
and the new
--handler thing (to do e.g. partial table replication). Let's
not forget the much awaited
execute <script> command that allows to include
DDL commands into the replication stream, nor the
parallelCOPY support that
will boost your initial setup.

walmgr in the new version behaves correctly when using
PostgreSQL 9.0.
Meaning that as soon as no more
WAL files are available in the archives, it
returns an error code so that the server switches to
streaming live from the
primary_conninfo, then back to replaying the files
from the archive if the connection were to fail, etc. All in all, it just
works.

Details to follow here, stay tuned!

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 26 Aug 2011 21:30:00 +0200http://tapoueh.org/blog/2011/08/26-skytools3.htmlpgfincore in debianhttp://tapoueh.org/blog/2011/08/19-pgfincore-in-debian.htmlAs of pretty recently,
pgfincore is now in debian, as you can see on its
postgresql-9.0-pgfincore page. The reason why it entered the
debian
archives is that it reached the
1.0 release!

Rather than talking about what
pgfincore is all about (
A set of functions to
manage pages in memory from PostgreSQL), I will talk about its packaging and
support as a
debian package. Here's the first example of a modern
multi-version packaging I have to offer.
pgfincore packaging supports
building for
8.4 and
9.0 and
9.1 out of the box, even if the only binary
you'll find in
debian sid is the
9.0 one, as you can check on the
pgfincore debian source package page.

Also, this is the first package I've done properly using the newer version
of
debhelper, which makes the
debian/rules file easier than ever. Let's have
a look at it:

The
debian/rules file is known to be the cornerstone of your debian
packaging, and usually is the most complex part of it. It's a
Makefile at
its heart, and we can see that thanks to the
debhelper magic it's not that
complex to maintain anymore.

Then, this file is using support from a bunch of helper commands, each of
which comes with its own man page and does a little part of the work. The
overall idea around
debhelper is that what it does covers 90% of the cases
around, and it's not aiming for more. You have to
override the parts where
it defaults to being wrong.

Here for example the build system has to produce files for all three
supported versions of
PostgreSQL, which means invoking the same build system
three times with some changes in the
environment (mainly setting the
PG_CONFIG variable correctly). But even for that we have a
debian facility,
that comes in the package
postgresql-server-dev-all, called
pg_buildext. As
long as your extension build system is
VPATH friendly, it's all automated.

Please read that last sentence another time.
VPATH is the thing that allows
Make to find your source tree somewhere in the system, not in the current
working directory. That allows you to cleanly build the same sources in
different build locations, which is exactly what we need here, and is
cleanly supported by
PGXS, the
PostgreSQL Extension Building Infrastructure.

Which means that the main
Makefile of
pgfincore had to be simplified, and
the code layout too. Some advanced
Make features such as
$(wildcard ...)
and the like will not work here. See what we got at the end:

No more
Make magic to find source files. Frankly though, when your sources
are 1
c file and 2
sql files, you don't need that much magic anyway. You
just want to believe that a single generic
Makefile will happily build any
project you throw at it, only requiring minor adjustment. Well, the reality
is that you might need some more little adjustments if you want to benefit
from
VPATH building, and having the binaries for
8.4 and
9.0 and
9.1 built
seamlessly in a simple loop. Like we have here for
pgfincore.

Now the
Makefile still contains a little bit of magic, in order to parse the
extension version number from its
control file and produce a
script named
accordingly. Then you'll notice a difference between the
postgresql-9.1-pgfincore.install file and the
postgresql-9.0-pgfincore.install. We're just not shipping the same files:

All of this will come in pretty handy when we finally sit down and work on a
way to provide binary packages for PostgreSQL and its extensions, and all
supported versions of those at that. This very project is not dead, it's
just sleeping some more.

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 19 Aug 2011 23:00:00 +0200http://tapoueh.org/blog/2011/08/19-pgfincore-in-debian.htmlpgloader tutorialhttp://tapoueh.org/blog/2011/08/15-pgloader-tutorial.htmlTo finish up the pgloader series, I've compiled all the information into a
single page, the long awaited
pgloader tutorial. That should help lots of
users to get started with
pgloader.
]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 15 Aug 2011 15:33:00 +0200http://tapoueh.org/blog/2011/08/15-pgloader-tutorial.htmlpgloader constant colshttp://tapoueh.org/blog/2011/08/12-pgloader-udc.htmlThe previous articles in the
pgloader series detailed
How To Use PgLoader
then
How to Setup pgloader, then what to expect from a
parallel pgloader
setup, and then
pgloader reformating. Another need you might encounter when
you get to use
pgloader is adding
constant values into a table's column.

The basic situation where you need to do so is adding an
origin field to
your table. The value of that is not to be found in the data file itself,
typically, but known in the pgloader setup. That could even be the
filename
you are importing data from.

Of course the example configuration is not so straightforward as to process fields
in the data file in the order they appear; after all, the
examples/pgloader.conf sections also serve as a test suite.

Long story short: if you need to add some
constant values into the target
table you're loading data to,
pgloader will help you there!

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 12 Aug 2011 11:00:00 +0200http://tapoueh.org/blog/2011/08/12-pgloader-udc.htmlpgloader reformatinghttp://tapoueh.org/blog/2011/08/05-reformating-modules-for-pgloader.htmlBack to our series about
pgloader. The previous articles detailed
How To Use PgLoader then
How to Setup pgloader, then what to expect from a
parallel pgloader setup. This article will detail how to
reformat input
columns so that what
PostgreSQL sees is not what's in the data file, but the
result of a
transformation from this data into something acceptable as an
input for the target data type.

Here's what the
pgloader documentation has to say about this
reformat
parameter:
The value of this option is a comma separated list of columns to
rewrite, each of which is a colon separated list of column name, reformat module
name, and reformat function name.

The documentation says some more about it, so check it out. Also, the
reformat_path option (set either on the command line or in the configuration
file) is used to find the python module implementing the reformat function.
Please refer to the manual as to how to set it.

Now, obviously, for the
reformat to happen we need to write some code.
That's the whole point of the option: you need something very specific, you
are in a position to write the 5 lines of code needed to make it happen, and
pgloader allows you to do just that. Of course, the code needs to be
written in python here, so that you can even benefit from the
parallel pgloader settings.

Let's see a reformat module example, as found in
reformat/mysql.py in the
pgloader sources:

Whenever you have an input file with data that PostgreSQL chokes on, you
can solve this problem from
pgloader itself: no need to resort to scripting
and a pipeline of
awk (which I use a lot in other cases, don't get me
wrong) or other tools. See, you finally have an excuse to
Dive into Python!

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 05 Aug 2011 11:30:00 +0200http://tapoueh.org/blog/2011/08/05-reformating-modules-for-pgloader.htmlSee Tsung in actionhttp://tapoueh.org/blog/2011/08/02-see-tsung-in-action.htmlTsung is an open-source multi-protocol distributed load testing tool and a
mature project. It's been available for about 10 years and is built with
the
Erlang system. It supports several protocols, including the
PostgreSQL
one.

When you want to benchmark your own application, to know how many more
clients it can handle or how much gain you will see with some new shiny
hardware,
Tsung is the tool to use. It will allow you to
record a number of
sessions then replay them at high scale.
pgfouine supports Tsung and is
able to turn your PostgreSQL logs into Tsung sessions, too.

Tsung did get used in the video game world, their version of it is called
uTsung, apparently using the
uLink game development facilities. They even
made a video demo of uTsung that you might find interesting:

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 02 Aug 2011 10:30:00 +0200http://tapoueh.org/blog/2011/08/02-see-tsung-in-action.htmlParallel pgloaderhttp://tapoueh.org/blog/2011/08/01-parallel-pgloader.htmlThis article continues the series that began with
How To Use PgLoader then
detailed
How to Setup pgloader. We have some more fine points to talk about
here, today's article is about loading your data in parallel with
pgloader.

several files at a time

Parallelism is implemented in 3 different ways in pgloader. First, you can
load more than one file at a time thanks to the
max_parallel_sections
parameter, which has to be set in the
global section of the file.

This setting is quite simple and already allows the most common use case.

several workers per file

The other use case is when you have huge files to load into the database.
Then you want to be able to have more than one process reading the file at
the same time. Using
pgloader, you already did the compromise to load the
whole content in more than one transaction, so there's no further drawback
here about having those multiple transactions per file spread to more than
one load
worker.

There are basically two ways to split the work between several workers here,
and both are implemented in pgloader.

N workers, N splits of the file

section_threads = 4
split_file_reading = True

Set up this way,
pgloader will launch 4 different
threads (see the
caveat
section of this article). Each thread is then given a part of the input
data file and will run the whole usual pgloader processing on its own. For
this to work you need to be able to
seek in the input stream, which might
not always be convenient.

one reader, N workers

section_threads = 4
split_file_reading = False
rrqueue_size = 5000

With such a setup,
pgloader will start 4 different worker
threads that will
receive the data input in an internal
python queue. Another active
thread
will be responsible for reading the input file and filling the queues in a
round robin fashion, but will hand all the processing of the data to each
worker, of course.

how many threads?

If you're using a mix and match of
max_parallel_sections and
section_threads
with
split_file_reading set to
True or
False, it's not easy to know exactly
how many
threads will run at any given time during the loading. How do you ascertain
which sections will run in parallel when it depends on the timing of the
loading?

The advice here is the usual one: don't overestimate the capabilities of
your system unless you are in a position to check first by doing trial
runs.

caveat

The current implementation of all the parallelism in
pgloader has been done with
the
python threading API. While this is easy enough to use when you want to
exchange data between threads, it suffers from the
Global Interpreter Lock issue. This means that while the code is written to do its
processing in parallel, the
runtime doesn't really run it in parallel. You might still benefit
from the current implementation if you have hard-to-parse files, or custom
reformat modules that are part of the loading bottleneck.

future

The solution would be to switch to using the newer
python multiprocessing
API, and some preliminary work has been done in pgloader to allow for that.
If you're interested in real parallel bulk loading,
contact me!

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 01 Aug 2011 12:15:00 +0200http://tapoueh.org/blog/2011/08/01-parallel-pgloader.htmlHow to Setup pgloaderhttp://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.htmlIn a previous article we detailed
how to use pgloader, let's now see how to
write the
pgloader.conf that instructs
pgloader about what to do.

This file is expected in the
INI format, with a
global section then one
section per file you want to import. The
global section defines some
default options and how to connect to the
PostgreSQL server.

The configuration setup is fully documented on the
pgloader man page that
you can even easily find online. As with all
unix style man pages, though, it's
more of a complete reference than introductory material. Let's review.

global section

Here's the
global section of the
examples/pgloader.conf file of the source
files. Well, some options are
debugging-only options, really, so I changed
their values so that what you see here is a better starting point.

You don't see all the connection setup; here
base was enough. You might
need to set
host,
port and
user, and maybe even
pass, too, to be able to
connect to the PostgreSQL server.

The logging options allow you to set a file where all
pgloader
messages will be logged; they are categorized as either
DEBUG,
INFO,
WARNING,
ERROR or
CRITICAL. The options
log_min_messages and
client_min_messages are another
good idea stolen from
PostgreSQL and allow you to set the level of chatter
you want to see on the interactive console (standard output and standard
error streams) and on the log file.

Please note that the
DEBUG level will produce more than 3 times as much data
as the data file you're importing. If you're not a
pgloader contributor, or
helping them debug it, you want to avoid setting the log chatter to
this value.

The
client_encoding will be
SET by
pgloader on the PostgreSQL connection it
establish. You can now even set any parameter you want by using the
pg_option_parameter_name magic settings. Note that the command line option
--pg-options (or
-o for brevity) allows you to override that.

Then, the
copy_every parameter is set to
5 in the examples, because the test
files contain fewer than 10 lines and we want to test several
batches
of commits when using them. So for your real loading, stick to the default
parameters (
10 000 lines per
COPY command), or more. You can play with this
parameter: depending on the network (or local access) and disk system you're
using, you might see improvements by reducing or enlarging it. There's not
so much theory of operation here as empirical testing and tuning. For a
one-off operation, just remove the lines from the configuration.

The parameters
null and
empty_string are related to interpreting the data in
the text or
csv files you have, and the documentation is quite clear about
them. Note that you have a global setting and per-section settings too.

The last parameter of this example,
max_parallel_sections, is detailed later
in the article.

files section

After the
global section come as many sections as you have files to load.
Plus the
template sections, which are only there so that you can share a
bunch of parameters in more than one section. Picture a series of data files
all of the same format, where the only thing that changes is the
filename.
Use a template section in this case!

Those are 2 of the examples from the
examples/pgloader.conf file, in 3 sections
so that we see one template example. Of course, with a single section
using the template, it's just here for the example.

data file format

The most important setting that you have to care about is the file format.
Your choice here is either
text,
csv or
fixed. Mostly, what we are given
nowadays is
csv. You might remember having read that the nice thing about
standards is that there's so many to choose from... well, the
csv land is
one where it's pretty hard to find different producers that understand it
the same way.

So when you fail to have pgloader load your
mostly csv files with a
csv
setup, it's time to consider using
text instead. The
text file format
accept a lot of tunables to adapt to crazy situations, but is all
python
code when the
python csv module is a C-coded module, more efficient.

If you're wondering what kind of format we're talking about here, here's the
cluttered pgloader example for your reading pleasure, using
^ (caret) as
the field separator:

So when you have that kind of data, well, it might be that
pgloader is still
able to help you!

Please refer to the
pgloader man page to learn about each and every parameter
that you can define and the accepted values, etc. And the
fixed data format
is to be used when you're not given a field separator but field positions in
the file. Yes, we still encounter those from time to time. Who needs
variable size storage, after all?

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 29 Jul 2011 15:00:00 +0200http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.htmlNext month partitionshttp://tapoueh.org/blog/2011/07/27-check-parts-for-next-month.htmlWhen you do partition your tables monthly, then comes the question of when
to create the next partitions. I tend to create them just the week before the next
month, and I have some nice
nagios scripts to alert me in case I've forgotten
to do so. How do you check that by hand at the end of a month?
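The query I use didn't survive in this extract; here's a sketch in its spirit
(not the original one), listing next month's partitions and adding a line with
this month's partition count:

  select relname::text
    from pg_class
   where relkind = 'r'
     and relname like '%' || to_char(now() + interval '1 month', 'YYYYMM')
union all
  select 'partitions this month: ' || count(*)
    from pg_class
   where relkind = 'r'
     and relname like '%' || to_char(now(), 'YYYYMM');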

As you see, our partitions are named
_YYYYMM so it's easy to match
them in our queries, but I guess about everyone does about the same here.
Then the
to_char expressions just save us from manually typing
'%201108' into
the query text. And there's a trick to display how many partitions
we have this month, by adding a line to the result...

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 27 Jul 2011 22:35:00 +0200http://tapoueh.org/blog/2011/07/27-check-parts-for-next-month.htmlHow To Use PgLoaderhttp://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.htmlThis question about
pgloader usage comes up quite frequently, and I think the
examples
README goes a long way in answering it. It's not exactly a
tutorial but is almost there. Let me paste it here for reference:

installing pgloader

Either use the
debian package or the one for your distribution of choice if
you use another one. RedHat, CentOS, FreeBSD, OpenBSD and some more already
include a binary package that you can use directly.

Or you could
git clone https://github.com/dimitri/pgloader.git and go from
there. As it's all
python code, it runs fine interpreted from the source
directory, you don't
need to install it in a special place in your system.

setting up the test environment

To use them, please first create a
pgloader database, then for each example create
the tables it needs, and then issue the pgloader command:

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 22 Jul 2011 13:38:00 +0200http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.htmlSkytools3 talk Slideshttp://tapoueh.org/blog/2011/07/19-skytools3-talk-slides.htmlIn case you're wondering, here are the slides from the
CHAR(11) talk I gave,
about
Skytools3.0,
soon to be released. That means as soon as I have
enough time available to polish (or write) the documentation.

The slides for all the talks should eventually make their way to a central
place, but expect some noticeable delay here. Sorry about that, and happy
reading meanwhile!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 19 Jul 2011 14:24:00 +0200http://tapoueh.org/blog/2011/07/19-skytools3-talk-slides.htmlBack From CHAR(11)http://tapoueh.org/blog/2011/07/13-back-from-char11.htmlCHAR(11) finished somewhen in the night leading to today, if you consider
the
social events to be part of it, which I definitely do. This conference
has been a very good one, both on the organisation side of things and of
course for its content.

It began with a perspective about the evolution of replication solutions, by
Jan Wieck himself. In some ways Skytools is an evolution of Slony, in the sense that it reuses the same concepts, a part of the design, and even shares bits of the implementation (like the txid_snapshot datatype that was added in PostgreSQL 8.3). The evolution consisted in choosing a subset of the features of Slony and then simplifying the user interface as much as possible. And with Skytools 3.0, those features that were removed but are still useful to solve real-life problems are now available too.

Of course the talk did approach the other replication solutions (not just
the trigger based ones), and did compare
RServ to
Bucardo for example. And
then all those were compared to the
PostgreSQL core replication facilities,
which are quite a different animal. It was a really nice
keynote here,
preparing the audience minds to make the most out of all the other talks.

I will not review all the talks in detail, as I'm pretty sure some other attendees will turn into reporters themselves. One theme kept showing up, though: scaling the write load!

Still, repmgr got its share of attention. Greg Smith and Cédric Villemain presented both how to do read scaling and auto failover management with this tool, going into fine detail about how it works internally and how to best design your services architecture for maximum data availability. The questions and answers section insisted on the fact that you cannot have data availability with fewer than 3 production nodes.

Magnus Hagander detailed how flexible the core protocol support for
replication (and streaming) really is. That flexibility means that you can
quite easily talk this protocol from any application, and the idea of a
wal
proxy did pop out again (see the Back from PgCon2010 article for my first mention of the idea). The main difference is that we now have
synchronous replication support, so that the proxy could be trusted both for
archiving and serving standbys.

Of course Simon still has lots of ideas about the next 10 years of replication oriented projects for core PostgreSQL code, and his talk nicely summarized the previous 7 years. The future is bright, and guess what, it's beginning today!

We also heard about
Heroku, and these guys are doing crazy impressive
things. Like running
150 000 PostgreSQL instances, for example, showing that you can actually use our preferred database server in the hosting business. I expect the maturing solutions and tool sets providing data availability to soon be a game changer here. What they are doing is designing a flexible data architecture with strong guarantees (no data loss). Cloud elasticity is reaching beyond stateless services, and those guys are making it happen now.

May you live in interesting times!

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 13 Jul 2011 17:15:00 +0200http://tapoueh.org/blog/2011/07/13-back-from-char11.htmlMulti-Version support for Extensionshttp://tapoueh.org/blog/2011/06/29-multi-version-support-for-extensions.htmlWe still have this problem to solve with extensions and their packaging.
How do you best organize things so that your extension is compatible with both pre-9.1 and 9.1-and-later releases of PostgreSQL?

Well, I had to do it for the
ip4r contribution, and I wanted the following
to happen:

And here's a simple enough way to achieve that. First, you have to get your
packaging ready the usual way, and to install the build dependencies. Then
realizing that
/usr/share/postgresql-common/supported-versions from the
latest
postgresql-common package will only return
8.3 in
lenny (yes, I'm
doing some
backporting here), we have to tweak it.

Now we are allowed to build our extension for all those versions, so we add
9.1 to the
debian/pgversions file. And
debuild will do the right thing now,
thanks to
pg_buildext from
postgresql-server-dev-all.

The problem we face is that what gets built is not an extension in the 9.1 sense, so
things like
\dx in
psql and
CREATE EXTENSION will not work out of the box.
First, we need a control file. Then we need to remove the transaction
control from the install script (here,
ip4r.sql), and finally, this script
needs to be called
ip4r--1.05.sql. Here's how I did it:

Be careful not to forget to remove any and all
BEGIN; and
COMMIT; lines from
the
ip4r.sql file, which meant that I also removed support for
Rtree, which
is not relevant for modern versions of PostgreSQL saith the script (post
8.2). That means I'm not publishing this very work yet, but I wanted to
share the
debian/postgresql-9.1-extension.links idea.

Notice that I didn't change anything about the
.sql.in make rule, so I
didn't have to use the support for
module_pathname in the control file.

Now, after the usual
debuild step, I can just
sudo debi to install all the
just-built packages and
CREATE EXTENSION will run fine. And in
9.0 you get
the old way to install it, but it still works:

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 29 Jun 2011 09:50:00 +0200http://tapoueh.org/blog/2011/06/29-multi-version-support-for-extensions.htmlBack from Ottawa, preparing for Cambridgehttp://tapoueh.org/blog/2011/05/30-back-from-ottawa-preparing-for-cambridge.htmlWhile
Magnus is all about
PG Conf EU already, you have to realize we've just landed back from PG Con in Ottawa. My next stop on the annual conference circuit
is
CHAR 11, the
Clustering, High Availability and Replication conference in
Cambridge, 11-12 July. Yes, on the old continent this time.

This year's
pgcon hot topics, for me, have been centered around a better
grasp of
SSI and
DDL Triggers. Having those beasts in
PostgreSQL would
allow for auditing, finer privileges management and some more automated
replication facilities. Imagine that
ALTER TABLE is able to fire a
trigger,
provided by
Londiste or
Slony, that will do what's needed on the cluster by
itself. That would be awesome, wouldn't it?

At
CHAR 11 I'll be talking about
Skytools 3. You know I've been working on
its
debian packaging; now is the time to review the documentation and make it something as good looking as the monitoring system is...

Well, expect some news and a nice big-picture diagram overview soon, if my work schedule leaves me any time: that's what I want to be working on now.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 30 May 2011 11:00:00 +0200http://tapoueh.org/blog/2011/05/30-back-from-ottawa-preparing-for-cambridge.htmlPreparing for PGCONhttp://tapoueh.org/blog/2011/05/12-preparing-for-pgcon.htmlIt's this time of the year again, the main international
PostgreSQL Conference is next week in Ottawa, Canada. If previous years are
any indication, this will be a great event where you can meet with a lot of the
members of your community. The core team will be there, developers will be
there, and we will meet with users and their challenging use cases.

This is a very good time to review both what you did in the project those
last 12 months, and what you plan to do next year. To help with that,
several
meeting events are organized. They're like a whole-day round table
with a kind of an agenda, with a limited number of invited people in, and
very intense on-topic discussions about how to organize ourselves for
another great year of innovation in PostgreSQL.

Then we have two days full of talks where I usually learn some new aspect of
the project or of the product, and where ideas tend to just pop-up in a
continuous race. Being away from home and with people you see only once a
year (some of them more than that of course, hi European fellows!) seems to
allow for some broader thinking.

Well of course this list is just a first selection; hallway tracks are often what guides me through talks or makes me skip some.

See you there!

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 12 May 2011 10:30:00 +0200http://tapoueh.org/blog/2011/05/12-preparing-for-pgcon.htmlTables and Views dependencieshttp://tapoueh.org/blog/2011/05/04-tables-and-views-dependencies.htmlLet's say you need to
ALTER TABLE foo ALTER COLUMN bar TYPE bigint;, and
PostgreSQL is helpfully telling you that no you can't because such and such
views depend on the column. The basic way to deal with that is to copy
paste from the error message the names of the views involved, then prepare a
script wherein you first
DROP VIEW ...; then
ALTER TABLE and finally
CREATE
VIEW again, all in the same transaction.

A nice view from this table...

So you also have to copy and paste the view definitions. With large view definitions, it quickly gets cumbersome to do so. Well, when you're working on operations, you have to bear in mind that cumbersome is a synonym for error prone, so you want another solution if possible.

Oh, and the other drawback of this solution is that
ALTER TABLE will first
take a
LOCK on the table, locking out any activity. And more than that, the
lock acquisition will queue behind current activity on the table, which
means waiting for a fairly long time and damaging the service quality on a
moderately loaded server.

It's possible to abuse the
system catalogs in order to find all
views that
depend on a given table, too. For that, you have to play with
pg_depend and
you have to know that internally, a
view is in fact a
rewrite rule. Then
here's how to produce the two scripts that we need:

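A minimal sketch of the first query (illustrative, not the original): it produces the DROP VIEW script, and the matching CREATE VIEW script comes from the same query with 'CREATE VIEW ... AS ' || pg_get_viewdef(v.oid) in the select list.

SELECT DISTINCT 'DROP VIEW ' || n.nspname || '.' || v.relname || ';'
  FROM pg_depend d
       JOIN pg_rewrite r ON r.oid = d.objid
       JOIN pg_class v ON v.oid = r.ev_class
       JOIN pg_namespace n ON n.oid = v.relnamespace
 WHERE d.classid = 'pg_rewrite'::regclass
   AND d.refclassid = 'pg_class'::regclass
   AND d.refobjid = 'SCHEMA.TABLENAME'::regclass
   AND v.relkind = 'v'
   AND pg_get_viewdef(v.oid) ~ 'COLUMN_NAME';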
Replace
SCHEMA.TABLENAME and
COLUMN_NAME with your targets here and the
first query should give you one row per candidate view. Well if you're not
using the
\o trick, that is — if you do, check out the generated file
instead, with
\! cat /tmp/drop.sql for example.

Please note that this catalog query is not accurate, as it will select as a
candidate any view that will by chance both depend on the target table and
contain the
column_name in its text definition. So either filter out the
candidates properly (by proper proof reading then another
WHERE clause), or
just accept that you might
DROP then
CREATE again more
views than need be.

If you need some more details about the
\t \o sequence you might be
interested in this older article about
resetting sequences.

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 04 May 2011 11:45:00 +0200http://tapoueh.org/blog/2011/05/04-tables-and-views-dependencies.htmlExtension module_pathname and .sql.inhttp://tapoueh.org/blog/2011/05/02-extension-module_pathname-and-sqlin.htmlWhile currently too busy at work to deliver much Open Source contributions,
let's debunk an old habit of
PostgreSQL extension authors. It's all down to
copy pasting from
contrib, and there's no reason to continue doing
$libdir
this way ever since
7.4 days.

Let's take an example here, with the
prefix extension. This one too will
need some love, but is still behind on my spare time todo list, sorry about
that. So, in the
prefix.sql.in we read

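From memory, the relevant part looked more or less like this (an illustrative excerpt, not a verbatim copy):

CREATE OR REPLACE FUNCTION prefix_range_in(cstring)
  RETURNS prefix_range
  AS 'MODULE_PATHNAME'            -- this is the part that changes into AS '$libdir/prefix'
  LANGUAGE C IMMUTABLE STRICT;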
Two things need to change here. First, the PostgreSQL backend will understand just fine if you simply say AS '$libdir/prefix'. You do have to know the name of the shared object library in the sql script, but if you do, you can directly maintain a prefix.sql script instead.

The advantage is that you now can avoid a compatibility problem when you
want to support PostgreSQL from
8.2 to
9.1 in your extension (older than
that and it's
no longer supported). You directly ship your script.

For compatibility, you could also use the
control file module_pathname property. But for 9.1 you then have to add an implicit Make rule so that the
script is derived from your
.sql.in. And as you are managing several scripts
— so that you can handle
versioning and
upgrades — it can get hairy (
hint,
you need to copy
prefix.sql as
prefix--1.1.1.sql, then change its name at
next revision, and think about
upgrade scripts too). The
module_pathname
facility is better to keep for when managing more than a single extension in
the same directory, like the
SPI contrib is doing.

Sure, maintaining an extension that targets both antique releases of
PostgreSQL and
CREATE EXTENSION super-powered one(s) (not yet released) is a
little more involved than that. We'll get back to that, as some people are
still pioneering the movement.

On my side, I'm working with some
debian developer on how to best manage the
packaging of those extensions, and this work could end up as a specialized
policy document and a coordinated
team of maintainers for all things
PostgreSQL in
debian. This will also give some more steam to the PostgreSQL
effort for debian packages: the idea is to maintain packages for all
supported versions (from
8.2 up to soon
9.1), something
debian itself can not
commit to.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 02 May 2011 17:30:00 +0200http://tapoueh.org/blog/2011/05/02-extension-module_pathname-and-sqlin.htmlSome notes about Skytools3http://tapoueh.org/blog/2011/04/11-some-notes-about-skytools3.htmlI've been working on
skytools3 packaging lately. I've been pushing quite a
lot of work into it, in order to have exactly what I needed out of the box,
after some 3 years of production and experience with the products. Plural, yes, because even if pgbouncer and plproxy are siblings of the project (same developer team, separate life cycles and releases), skytools itself still includes several sub-projects.

This split is needed so that you can install your
daemons (we call them
consumers) on machines other than the ones where you run PostgreSQL. But for the
walmgr part, it makes no sense to install it if you don't have a local
PostgreSQL service, as it's providing
archive and
restore commands. Then
the
ticker, you're free to run it on any machine really, so just package it
this way (in
skytools3 the
ticker is written in
C and does not depend on the
python framework any more).

What you can't see here yet are the new goodies that wrap it all up as a quality debian package. A new skytools user is created for you when you install the skytools3 package (which contains the services), along with a skeleton file /etc/skytools.ini and a user directory /etc/skytools/. Put your services' configuration files in there, and register those services in the /etc/skytools.ini file itself. Then they will be taken care of by the init sequence at startup and shutdown of your server.

The services will run under the skytools system user, and will by default put their logs into /var/log/skytools/. The pidfiles will go into /var/run/skytools/. All integrated, automated.

The next big TODO is about documentation, reviewing it and polishing it, and I think that skytools3 will then be ready for public release. Yes, you read that right, it's happening this very year! I'm very excited about it, and
have several architectures that will greatly benefit from the switch to
skytools3. More on that later, though! (Yes, my
to blog later list is
getting quite long now).

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 11 Apr 2011 11:30:00 +0200http://tapoueh.org/blog/2011/04/11-some-notes-about-skytools3.htmltowards pg_staging 1.0http://tapoueh.org/blog/2011/03/29-towards-pg_staging-10.htmlIf you don't remember about what
pg_staging is all about, it's a central
console from where to control all your
PostgreSQL databases. Typically you
use it to manage your development and pre-production setup, where developers
ask you pretty often to install them some newer dump from the production,
and you want that operation streamlined and easy.

The list of supported commands is quite long now, and documented too (it
comes with two man pages). The
restore one is the most important and will
create the database, add it to the
pgbouncer setup, fetch the backup named
dbname.`date -I`.dump, prepare a filtered object list (more on that), load
preSQL scripts, launch
pg_restore,
VACUUM ANALYZE the database when
configured to do so, load the
postSQL scripts then optionally
switch the
pgbouncer setup to default to this new database.

Filtering

The newer option is called
tablename_nodata_regexp, and here's its documentation in full:

List of table names regexp (comma separated) to restore without
content. The
pg_restore catalog
TABLE DATA sections will get
filtered out. The regexp is applied against
schemaname.tablename
and non-anchored by default.

This comes to supplement the schemas and schemas_nodata options, which allow restoring only the objects from a given set of schemas (filtering out triggers that call functions located in the excluded schemas, like e.g. the Londiste ones) or restoring only the TABLE definitions while skipping the TABLE DATA entries.
Setup

To set up your environment for pg_staging, you need to take some steps. It's
not complex but it's fairly involved. The benefit is this amazingly useful
central unique console to control as many databases as you need.

You need a
pg_staging.ini file in which to describe your environment. I typically name the sections in there after the name of the database to restore
followed by a
dev or
preprod extension.

You need to have all your backups available through
HTTP, and as of now,
served by the famous
apache mod_dir directory listing. It's easy to add support for other methods, but it has not been done yet. You also need to have a cluster-wide
--globals-only backup available somewhere so that you
can easily create the users etc you need from
pg_staging.

You also need to run a
pgbouncer daemon on each database server, allowing
you to bypass editing connection strings when you
switch a new database
version live.

You also need to install the
client script, have a local
pgstaging system
user and allow it to run the client script as root, so that it's able to
control some services and edit
pgbouncer.ini for you.

Status

I'm still using it a lot (several times a week) to manage a whole
development and pre-production environment set, so the very low
code activity on the project tells you that it's pretty stable (the last series of commits are all bug fixes and corner rounding).

Given that, I'm thinking in terms of
pg_staging 1.0 soon! Now is a pretty
good time to try it and see how it can help you.

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 29 Mar 2011 15:30:00 +0200http://tapoueh.org/blog/2011/03/29-towards-pg_staging-10.htmlExtensions in 9.1http://tapoueh.org/blog/2011/03/01-extensions-in-91.htmlIf you've not been following closely you might have missed out on extensions
integration. Well,
Tom spent some time on the patches I've been preparing
for the last 4 months. And not only did he commit most of the work but he
also enhanced some parts of the code (better factoring) and basically
finished it.

At the
previous developer meeting his advice was to avoid putting too much
into the very first version of the patch for it to stand its chances of
being integrated, and while in the review process more than one major
PostgreSQL contributor expressed worries about the size of the patch and the
number of features proposed. Which is the usual process.

Then what happened is that
Tom finally followed a similar reasoning to mine
while working on the feature. To maximize the benefits, once you have the
infrastructure in place, it's not that much more work to provide the really
interesting features. What's complex is agreeing on what exactly are their
specifications. And in the
little time window we got on this commit fest
(well, we hijacked about 2 full weeks there), we managed to get there.

All the
contrib modules that install SQL objects into your databases are now converted to
Extensions too, and will get released
in
9.1 with an upgrade script that allows you to
upgrade from unpackaged.
That means that once you've upgraded from a past PostgreSQL release up to
9.1, it will be a command away for you to register
extensions as such. I
expect third party
extension authors (from
ip4r to
temporal) to release an
upgrade-from-unpackaged version of their work too.

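For instance, for a database that already contains the hstore objects from a pre-9.1 install, that one command is (in the 9.1 syntax; the FROM unpackaged form was later removed in PostgreSQL 13):

CREATE EXTENSION hstore FROM unpackaged;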
Of course, a big use case of the
extensions is also in-house
PL code, and
having version numbers and multi-stage upgrade scripts there will be
fantastic too, I can't wait to work with such a tool set myself. Some later
blog post will detail the benefits and usage. I'm already trying to think
how much of this version and upgrade facility could be expanded to classic
DDL objects…

So expect some more blog posts from me on this subject, I will have to talk
about
debian packaging an extension (it's getting damn easy with
postgresql-server-dev-all; yes, it has received some planning ahead), and
about how to package your own extension, manage upgrades, turn your current
pre-9.1 extension into a
full blown extension, and maybe how to stop
worrying about extensions when you're a DBA.

If you have some features you would want to discuss for next releases,
please do contact me!

Meanwhile, I'm very happy that this project of mine finally made it to
core,
it's been long in the making. Some years to talk about it and then finally
4 months of coding that I'll remember as a marathon. Many Thanks go to all
who helped here, from
2ndQuadrant to early reviewers to people I talked to
over beers at conferences… lots of people really.

To an extended PostgreSQL (and beyond) :)

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 01 Mar 2011 16:30:00 +0100http://tapoueh.org/blog/2011/03/01-extensions-in-91.htmlBack from FOSDEMhttp://tapoueh.org/blog/2011/02/07-back-from-fosdem.htmlThis year we were in the main building of the conference, and apparently the
booth went very well, selling lots of
PostgreSQL merchandise etc. I had the
pleasure to once again meet with the community, but being there only 1 day I
didn't spend as much time as I would have liked with some of the people there.

In case you're wondering, my
extension's talk went quite well, and several
people were kind enough to tell me they appreciated it! There was video
recording of it, so we will soon have proof showing how bad it really was
and how
polite those people really are :)

I will soon be able to write an article series detailing what an Extension is
and how you deal with them, either as a user or an author. Well in fact the
goal is for any user to easily become an extension author, as I think lots
of people are already maintaining server side code but are missing the tools to
manage it properly. But that will begin once the patch is in, so that I
present
the real stuff rather than what I proposed to the community… Stay
tuned!

And I will even do my
Extension's talk, which was a success at pgday.eu. The talk will be updated to include the latest developments of the extension feature, as some of it has already changed in between, and to detail the plan
for the
ALTER EXTENSION ... UPGRADE feature that I'd like to see included as
soon as
9.1, but time is running so fast.

In fact the design for the UPGRADE has been done and reviewed already, but we have yet to reach a consensus on how to choose which upgrade file to use when upgrading from a given version to another. I've solved it in my patch, of course, by adding properties to the extension's control file. That's the best place to have that setup I think: it allows lots of flexibility, leaves the extension's author in charge, and avoids hard coding any kind of assumptions about file naming or whatever.

Next days and reviews will tell us more about how the design is received.
Meanwhile, we're working on finalizing the main extension's patch, offering
pg_dump support.

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 01 Feb 2011 13:35:00 +0100http://tapoueh.org/blog/2011/02/01-going-to-fosdem.htmlpg_basebackuphttp://tapoueh.org/blog/2010/11/07-pg_basebackup.htmlHannu just gave me a good idea in
this email on
-hackers, proposing that
pg_basebackup should get the
xlog files again and again in a loop for the
whole duration of the
base backup. That's now done in the aforementioned
tool, whose options got a little more useful now:

Yeah, as implementing the
xlog idea required having some kind of
parallelism, I built on it and the script now has a
--jobs option for you to
setup how many processes to launch in parallel, all fetching some
base
backup files in its own standard (
libpq)
PostgreSQL connection, in
compressed chunks of
8 MB (so that's not
8 MB chunks sent over).

The
xlog loop will fetch any
WAL file whose
ctime changed again,
wholesale. It's easier this way, and tools to get optimized behavior already
do exist, either
walmgr or
walreceiver.

The script is still a little self-contained python file, it just went from about 100 lines of code to about 400 lines. There's no external dependency: all it needs is provided by a standard python installation. The problem with that is that it's using select.poll(), which I think is not available on windows. Between supporting every system and adding to the dependencies, I've been choosing what's easier for me.

import sys
import select

# register standard input so we can poll() it for incoming data
p = select.poll()
p.register(sys.stdin, select.POLLIN)

If you get to try it, please report about it, you should know or easily
discover my
email!

]]>dim@tapoueh.org (Dimitri Fontaine)Sun, 07 Nov 2010 13:45:00 +0100http://tapoueh.org/blog/2010/11/07-pg_basebackup.htmlIntroducing Extensionshttp://tapoueh.org/blog/2010/10/21-introducing-extensions.htmlAfter reading
Simon's blog post, I can't help but try to give some details
about what it is exactly that I'm working on. As he said, there are several
aspects to
extensions in
PostgreSQL, it all begins here:
Chapter 35. Extending SQL.

It's possible, and mostly simple enough, to add your own code or behavior to
PostgreSQL, so that it will use your code and your semantics while solving
user queries. That's highly useful and it's easy to understand how so when
you look at some projects like
PostGIS,
ip4r (index searches of
ip in a
range, not limited to
CIDR notation), or our own
Key Value Store,
hstore.

So, what's in an Extension?

An
extension in its simple form is a
SQL script that you load into your
database, but manage separately. Meaning you don't want the script to be
part of your backups. Often, that kind of script will create new datatypes
and operators, support functions, user functions and index support, and then
it would include some
C code that ships in a
shared library object.

As far as PostgreSQL is concerned, at least in the current version of my
patch, the extension is first a
meta information file that allows to
register it. We currently call that the
control file. Then, it's an
SQL
script that is
executed by the server when you
create the
extension.

If it so happens that the
SQL script depends on some
shared library objects
file, this has to be present at the right place (
MODULE_PATHNAME) for the
extension to be successfully created, but that's always been the case.

The problem with current releases of PostgreSQL, that the
extension patch is
solving, is the
pg_dump and
pg_restore support. We said it, you don't want
the
SQL script to be part of your dump, because it's not maintained in your
database, but in some code repository out there. What you want is to be able
to install the
extension again at the file system level then
pg_restore your
database — that depends on it being there.

And that's exactly what the
extension patch provides. By now having a
SQL
object called an
extension, and maintained in the new
pg_extension catalog,
we have an
Oid to refer to. Which we do by recording a dependency between
any object created by the script and the
extension's Oid, so that
pg_dump can
be instructed to skip those.

Ok I've edited the output in a visible way, to leave the
Version and
Custom
Variable Classes columns out. They take lots of screen space and are not
that useful here. Maybe the
classes one will even get dropped out of the
patch before reaching
9.1, we'll see.

See, that was easy enough. Same thing, the extra columns have been
removed. So, what's in this extension, will you ask me, what are those
objects that you would normally (that is, before the patch) find in your
pg_dump backup script?

This function's main intended users are the
extension authors themselves, so
that it's easy for them to figure out which system identifier (the
objid
column) has been attributed to some
SQL objects from their install
script. With this knowledge, you can prepare some
upgrade scripts. But
that's for another patch altogether, so we'll get back to the matter in
another blog entry.

So we chose
trgm as an example, let's follow the documentation and create a
test table and a custom index in there, just so that the extension is put to
good use. Then let's try to
DROP our extension, because we're testing the
infrastructure, right?

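A minimal sketch of that test session, following the pg_trgm documentation example (the error output is paraphrased, not copied verbatim):

CREATE EXTENSION pg_trgm;
CREATE TABLE test_trgm (t text);
CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);

DROP EXTENSION pg_trgm;
-- fails with "cannot drop extension pg_trgm because other objects depend on it",
-- the index relying on the extension's gist_trgm_ops operator class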
Of course PostgreSQL is smart enough here — the
extension patch had nothing
special to do to achieve that, apart from recording the dependencies. Next,
as we didn't
drop extension pg_trgm cascade;, it's still in the database. So
let's see what a
pg_dump will look like. As it's quite a lot of text to
paste, let's see the
pg_restore catalog instead. And that's a feature that
deserves to be better known, too.

As you see, the only SQL objects that got into the backup are an
EXTENSION
and its
COMMENT. Nothing like the types or the functions that the
pg_trgm
script creates.

What does it mean to extension authors?

In order to be an
extension, you have to prepare a
control file in which to
give the necessary information to register your script. This file must be
named
extension.control if the script is named
extension.sql, at least at
the moment. This file can benefit from some variable expansion too, like the current extension.sql.in does: if you provide an
extension.control.in file the term
VERSION will be expanded to whatever
$(VERSION) is set to in your
Makefile.

If you never wrote a
C coded
extension for PostgreSQL, this might look
complex and irrelevant. The baseline is that you need a
Makefile so that you can
benefit easily from the PostgreSQL infrastructure work and have the
make
install operation place your files at the right place, including the new
control file.

That's it for today, folks

A next blog entry will detail what happens with extensions providing
user
data, and the
CREATE EXTENSION name WITH NO DATA; variant. Stay tuned!

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 21 Oct 2010 13:45:00 +0200http://tapoueh.org/blog/2010/10/21-introducing-extensions.htmlExtensions: writing a patch for PostgreSQLhttp://tapoueh.org/blog/2010/10/15-extensions-writing-a-patch-for-postgresql.htmlThese days, thanks to my
community oriented job, I'm working full time on a PostgreSQL patch to finish basic support for extending SQL. The first thing I want to share is that patching the backend code is not as hard as one would think. The second one is that git really helps.

“Not as hard as one would think, are you kidding me?”, I hear some say. Well, that's true. It's C code in there, but with a very good layer of abstractions so that you're not dealing with subtle problems that much. Of course it happens that you have to, and managing memory isn't optional. That said, palloc() and the memory contexts implementation make that easy enough that in lots of cases, you don't have to think about it.

PostgreSQL is very well known for its reliability, and that's not something
that just happened. All the source code is organized in a way that makes it
possible, so your main task is to write code that looks as much as possible
like the existing surrounding code. And we all know how to
copy paste,
right?

So, my current work on the
extensions is to make it so that if you install
hstore in your database (to pick an example), your backup won't contain any
hstore specific objects (types, functions, operators, index support objects,
etc) but rather a single line that tells PostgreSQL to install
hstore again.

CREATE EXTENSION hstore;

The feature already works in
my git branch and I'm extracting infrastructure
work in there to ease review. That's when
git helps a lot. What I've done is
create a new branch from the master one, then
cherry pick the patches of
interest. Well, sometimes you have to resort to helper tools. I've been told
after the fact that using
git cherry-pick -n would have allowed the
following to be much simpler:

That's what I did to prepare a side branch containing only changes to a part
of my current work. I had to filter the diff so much only because I'm
committing in rather big steps, rather than in very little chunks at a time. In
this case that means I had a single patch with several
units of changes and
I wanted to extract only one. Well, it happens that even in such a case,
git
is helping!

There's more to say about the
extension related feature of course, but
that'll do it for this article. I'd just end up with the following nice
diffstat of 4 days of work:

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 15 Oct 2010 11:30:00 +0200http://tapoueh.org/blog/2010/10/15-extensions-writing-a-patch-for-postgresql.htmlDate puzzle for startershttp://tapoueh.org/blog/2010/10/08-date-puzzle-for-starters.htmlThe
PostgreSQL IRC channel is a good place to be, for all the very good help you can get there (because people always want to remain helpful), for the occasional off-topic discussions, or to get to talk with community core members. And to start up your day too.

This morning's question started simple: “how can I check if today is the
"first sunday fo the month". or "the second tuesday of the month" etc?”

So you just have to compare the result of the function with
'today'::date
and there you go. The problem is that the question could be read in the
other way round, like, what is today in
first or
second day name of this
month
format? Once more,
RhodiumToad to the rescue:

But the part that I found nice to play with was my first reading of the
question, as I don't get to lose my ideas that easily, you see… so what
about writing a function to return the date of any
nth occurrence of a given
day of week in a
given month, defaulting to this very month?

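A minimal sketch of such a function; the name, argument order and conventions are mine, chosen to match the examples just below (nth is zero-based, dow follows extract(dow) numbering, and the month defaults to the current one):

CREATE OR REPLACE FUNCTION nth_dow(nth int, dow int, month date DEFAULT current_date)
  RETURNS date
  LANGUAGE sql
AS $$
  SELECT d::date
    FROM generate_series(date_trunc('month', $3),
                         date_trunc('month', $3) + interval '1 month' - interval '1 day',
                         interval '1 day') AS t(d)
   WHERE extract(dow FROM d) = $2
   ORDER BY d
  OFFSET $1
   LIMIT 1;
$$;

SELECT nth_dow(0, 0);                                             -- first Sunday of this month
SELECT nth_dow(1, 4, (current_date - interval '1 month')::date);  -- second Thursday of the previous one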
So you see we just got the first Sunday of this month
(0, 0) and the second
Thursday
(1, 4) of the previous one. Any date within a month is a good way
to tell which month you want to work in, as the function's written, abusing
date_trunc like it does.

Now the way the function is written is unfinished. You want to fix it in one
of two ways. Either stop using
generate_series to only output one row at a
time, or fix the
API so that you can ask for more than one
nth dow at a
time. Of course, that was a starter for me, not a problem I need to solve
directly, and that was a good excuse for a blog entry, so I won't fix
it. That's left as an exercise to our interested readers!

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 08 Oct 2010 10:00:00 +0200http://tapoueh.org/blog/2010/10/08-date-puzzle-for-starters.htmlResuming work on Extensions, first little stephttp://tapoueh.org/blog/2010/10/07-resuming-work-on-extensions-first-little-step.htmlYeah I'm back on working on my part of the extension thing in
PostgreSQL.

So that's still more steps than one would want to call dead simple, but still. The
format-patch command is to save my work away (all patches that are in the
extension branch but not in the
master — well that was only one of them
here). Then, as the master repository
URL didn't change, I can simply
pull
the changes in. Of course I had a nice message
warning: no common commits.

Once pulled, I trashed my local copy and replaced it with the new official
one, that's
git reset --hard pgmaster/master, then in the
extension branch I
could trash it and have it linked to the local
master again.

Of course, the
git am method wouldn't apply my patch as-is, there were some
underlying changes in the source files, the identification tag changed from
$PostgreSQL$ to, e.g.,
src/backend/utils/adt/genfile.c, and I had to cope
with that. Maybe there's some tool (
git am -3 ?) to do it automatically, I
just copy edited the
.patch file.

Lastly, it's all about checking the result and publishing it. This
last line is
git push -f and is when I just trashed and replaced my
postgresql-extension community repository. I don't think anybody was
following it, but should it be the case, you will have to
reinit your copy.

More blog posts to come about extensions, as I arranged to have some real time to devote to the topic. At least I was able to arrange things so that I can work on the subject for real, and the first thing I did, the very night before it was meant to begin, was catch tonsillitis. Lost about a week there, not the whole project! Stay tuned!

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 07 Oct 2010 17:15:00 +0200http://tapoueh.org/blog/2010/10/07-resuming-work-on-extensions-first-little-step.htmlRegexp performances and Finite Automatahttp://tapoueh.org/blog/2010/09/26-regexp-performances-and-finite-automata.htmlThe major reason why I dislike
perl so much, and
ruby too, and the thing I'd
want different in the
Emacs Lisp API so far is how they set developers' minds on using regexps. You know the quote, don't you?

Some people, when confronted with a problem, think “I know, I'll use regular
expressions.” Now they have two problems.

That said, some situations require the use of regexp, or are so much simpler to solve using them that the maintenance hell you're building ain't that big a drag. The expressiveness they give is hard to match with any other solution, to the point that I sometimes use them in my code (well I use rx to lower the burden sometimes, just see this example).

The thing you might want to know about
regexp is that computing them is a heavy task usually involving parsing their representation, compiling it to some executable code, and then executing the generated code. It's been shown in the past (as early as 1968) that a regexp is just another way to write a finite automaton, at least as long as you don't need backtracking. The
writing of this article is my reaction to reading
Regular Expression Matching Can Be Simple And Fast (but is slow in Java,
Perl, PHP, Python, Ruby, ...), a very interesting article — see the
benchmarks in there.

The bulk of it is that we find mainly two categories of
regexp engine in the
wild, those that are using
NFA and
DFA intermediate representation
techniques, and the others. Our beloved
PostgreSQL sure offers the feature,
it's the
~ and
~* operators. The implementation here is based on
Henry Spencer's work, which the aforementioned article says

became very widely used, eventually serving as the basis for the slow
regular expression implementations mentioned earlier: Perl, PCRE, Python,
and so on.

Having a look at the actual implementation shows that indeed, current
PostgreSQL code for
regexp matching uses intermediate representations of
them as
NFA and
DFA. The code is quite complex, even more than I thought it
would be, and I didn't have the time it would take to check it against the
proposed one from the
simple and fast article.

So all in all, I'll continue avoiding regexp as much as I currently do, and will maintain my tendency to use awk when I need them on files (it allows refining the search without resorting to more and more pipes on the command line). And as far as using regexp in PostgreSQL is concerned, it seems that the code here is already about top-notch. Once more.

]]>dim@tapoueh.org (Dimitri Fontaine)Sun, 26 Sep 2010 21:00:00 +0200http://tapoueh.org/blog/2010/09/26-regexp-performances-and-finite-automata.htmlWindow Functions example remixhttp://tapoueh.org/blog/2010/09/12-window-functions-example-remix.htmlThe drawback of hosting a static only website is, obviously, the lack of
comments. What actually happens, though, is that I receive very few comments by direct mail. As I don't get yet another spam source to clean up, I'm left unconvinced that it's such a drawback. I still regret the low probability of seeing blog readers exchange directly, but I think a tapoueh.org mailing list would be my answer here...

Anyway,
David Fetter took the time to send me a comment by mail with a
cleaned up rewrite of the previous entry's SQL, here it is for your pleasure!

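A sketch of the idea (the table and column names are mine): the first row, whose lag() is NULL, gets filtered out rather than compensated for with a -1.

SELECT count(*) AS changes
  FROM (SELECT weight,
               lag(weight) OVER (ORDER BY ts) AS previous
          FROM measures) AS w
 WHERE previous IS NOT NULL
   AND weight <> previous;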
As you can see
David chose to filter the first change in the subquery rather
than hacking it away with a simple
-1 at the outer level. I'm still
wondering which way is cleaner (that depends on how you look at the
problem), but I think I know which one is simpler! Thanks
David for this
blog entry!

]]>dim@tapoueh.org (Dimitri Fontaine)Sun, 12 Sep 2010 21:35:00 +0200http://tapoueh.org/blog/2010/09/12-window-functions-example-remix.htmlWindow Functions examplehttp://tapoueh.org/blog/2010/09/09-window-functions-example.htmlSo, when
8.4 came out there were all those comments about how getting window functions was an awesome addition. Now, it seems that a lot of people seeking help in #postgresql just don't know what kind of problem this feature helps solve. I've already been using them in some cases here in this blog, for getting a nice overview about Partitioning: relation size per “group”.

hey there, how can i count the number of (value) changes in one column?

example: a table with a column
weight. let's say we have 5 rows, having
the following values for weight:
5, 10, 7, 7, 7. the number of changes of
weight would be 2 here (from 5 to 10 and 10 to 7). any idea how I could do
that in SQL using PGSQL 8.4.4? GROUP BY or count(distinct weight)
obviously does not work. thx in advance

Now, several of us began talking about
window functions and about the fact
that you need some other column to identify the ordering of those weights,
obviously, because that's the only way to define what a change is in this
context. Let's have a first try at it.

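A first try could look like this (a sketch with made-up table and column names, ordering on a timestamp column ts):

SELECT ts, weight,
       lag(weight) OVER (ORDER BY ts) AS previous,
       weight IS DISTINCT FROM lag(weight) OVER (ORDER BY ts) AS changed
  FROM measures
 ORDER BY ts;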
Not too bad, but of course we are seeing a false change on the first line: for any window of rows you define, the previous value of the first row, as given by lag() over(), will be NULL. The easiest way to accommodate that is the following:

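That is, count every change including the spurious first one, then subtract one at the outer level (same made-up table as above):

SELECT count(*) - 1 AS changes
  FROM (SELECT weight,
               lag(weight) OVER (ORDER BY ts) AS previous
          FROM measures) AS w
 WHERE weight IS DISTINCT FROM previous;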
So don't be shy and go read about
window functions in SQL expressions and
window function processing in the query table expressions. That's a very
nice tool to have and my guess is that you will soon enough realize the only
reason why you could think you don't need them is that you didn't know they existed, and what you can do with them.
Sharpen your saw! :)

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 09 Sep 2010 16:35:00 +0200http://tapoueh.org/blog/2010/09/09-window-functions-example.htmlSynchronous Replicationhttp://tapoueh.org/blog/2010/09/06-synchronous-replication.htmlAlthough the new asynchronous replication facility that ships with 9.0 ain't
released to the wide public yet, our hacker heroes are already working on the synchronous version of it. Part of the facility is rather easy to design: we want something comparable to DRBD's flexibility, but specific to our
database world. So
synchronous would either mean
recv,
fsync or
apply,
depending on what you need the
standby to have already done when the master
acknowledges the
COMMIT. Let's call that the
service level.

The part of the design that's not so easy is more interesting. Do we need to
register standbys and have the
service level set up per standby? Can we get
some more flexibility and have the
service level set on a per-transaction
basis? The idea here would be that the application knows which transactions
are meant to be extra-safe and which are not, the same way that you can set
synchronous_commit to off when dealing with web sessions, for example.

Why choose? I hear you ask. Well, it's all about having more data safety,
and a typical setup would contain an asynchronous reporting server and a
local
failover synchronous server. Then add a remote one, too. So even if we
pick the transaction based facility, we still want to be able to choose at
setup time which server to fail over to. That means we don't want that much
flexibility now, we want to know where the data is safe, we don't want to
have to guess.

One way to solve that is to be able to set up a slave as being the failover
one, or say, the
sync one. Now, the detail that ruins it all is that we need
a
timeout to handle worst cases when a given slave loses its connectivity
(or power, say). Now, the slave ain't in
sync any more and some people will
require that the service is still available (
timeout but
COMMIT) and some
will require that the service is down: don't accept a new transaction if you
can't make its data safe on the slave too.

The answer would be to have the master arbitrate between what the
transaction wants and what the slave is set up to provide, and what it's able
to provide at the time of the transaction. Given a transaction with a
service level of
apply and a slave set up as
async, the
COMMIT does
not have to wait, because there's no known slave able to offer the needed
level. Or the
COMMIT can not happen, for the very same reason.

Then I think it all flows quite naturally from there, and while arbitrating
the master could record which slave is currently offering what
service
level. And offering the information in a system view too, of course.

The big question that's not answered in this proposal is how to decide whether being unable to reach the wanted service level is an error or just a warning.

That too would need to be for the master to arbitrate based on a per standby
and a per transaction setting, and in the general case it could be a
quorum
setup: each slave is given a
weight and each transaction a
quorum to
reach. The master sums up the weights of the standby that ack the
transaction at the needed
service level and the
COMMIT happens as soon as
the quorum is reached, or is canceled as soon as the
timeout is reached,
whichever comes first.

Such a model allows for very flexible setups, where each standby has a
weight and offers a given
service level, and each transaction waits until a
quorum is reached. Giving the right weights to your standbys (like, powers
of two) allows you to set the quorum in a way that only one given standby is able to acknowledge the most important transactions: with standbys weighted 1, 2 and 4, for example, a quorum of 4 can only be reached with the weight-4 standby's ack. But that's flexible
enough you can change it at any time, it's just a
weight that allows a
sum
to be made, so my guess would be it ends up in the
feedback loop between the
standby and its master.

The most appealing part of this proposal is that it doesn't look complex to
implement, and should allow for highly flexible setups. Of course, the devil
is in the details, and we're talking about latencies in the distributed
system here. That's also being discussed on the
mailing list.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 06 Sep 2010 18:05:00 +0200http://tapoueh.org/blog/2010/09/06-synchronous-replication.htmlHappy Numbershttp://tapoueh.org/blog/2010/08/30-happy-numbers.htmlAfter discovering the excellent
Gwene service, which allows you to subscribe
to
newsgroups to read
RSS content (
blogs,
planets,
commits, etc), I came to
read this nice article about
Happy Numbers. That's a little problem that
fits an interview-style question well, so I first solved it yesterday
evening in
Emacs Lisp as that's the language I use the most these days.

A happy number is defined by the following process. Starting with any
positive integer, replace the number by the sum of the squares of its
digits, and repeat the process until the number equals 1 (where it will
stay), or it loops endlessly in a cycle which does not include 1. Those
numbers for which this process ends in 1 are happy numbers, while those
that do not end in 1 are unhappy numbers (or sad numbers).

Now, what about implementing the same in pure
SQL, for more fun? Now that's
interesting! After all, we didn't get
WITH RECURSIVE for tree traversal
only,
did we?

Unfortunately, we need a little helper function first, if only to ease the
reading of the recursive query. I didn't try to inline it, but here it goes:

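A minimal sketch of the helper (the function name is my choice):

CREATE OR REPLACE FUNCTION digits(n bigint)
  RETURNS SETOF int
  LANGUAGE sql
AS $$
  SELECT substring($1::text FROM d FOR 1)::int
    FROM generate_series(1, length($1::text)) AS t(d);
$$;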
That was easy: it will output one row per digit of the input number — and
rather than resorting to powers of ten and divisions and remainders, we do
use plain old text representation and
substring. Now, to the real
problem. If you've read what a happy number is and have already read the
fine manual about
Recursive Query Evaluation, it should be quite easy to
read the following:

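A minimal sketch of the recursive query, using the digits() helper above and 7 as the starting number; note that where the original relies on a having filter, as described below, this variant keeps the seen array but uses a plain where test against it:

WITH RECURSIVE sums(n, seen) AS (
    SELECT 7, ARRAY[]::int[]
  UNION ALL
    SELECT (SELECT sum(d*d)::int FROM digits(n) AS t(d)),
           seen || n
      FROM sums
     WHERE n <> 1
       AND NOT n = ANY (seen)
)
SELECT n, seen FROM sums;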
That shows how it works for some
happy number, and it's easy to test for a
non-happy one, like for example
17. The query won't cycle thanks to the
seen
array and the
having filter, so the only difference between a
happy and a
sad number will be that in the former case the last line output by the
recursive query will have
n = 1. Let's expand this knowledge
into a proper function (because we want to be able to have the number we
test for happiness as an argument):

We need the
desc nulls last trick in the
order by because the
array_length()
of any dimension of an empty array is
NULL, and we certainly don't want to
return all and any number as unhappy on the grounds that the query result
contains a line
input, {}. Let's now play the same tricks as in the puzzle
article:

(Yes, I tricked the
EXPLAIN ANALYZE output so that it fits on the page width
here). For what it's worth, finding the first
10000 happy numbers in
Emacs
Lisp on the same laptop takes
2830 ms, also running a recursive version of
the code.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 30 Aug 2010 11:00:00 +0200http://tapoueh.org/blog/2010/08/30-happy-numbers.htmlPlaying with bit stringshttp://tapoueh.org/blog/2010/08/26-playing-with-bit-strings.htmlThe idea of the day ain't directly from me, I'm just helping with a very
thin subpart of the problem. The problem itself I can't say much about; let's just assume you want to reduce the storage size of MD5 values in your database, so you want to abuse bit strings. A solution using them works fine, but the datatype is still missing some facilities, for example going to and from the hexadecimal text representation.

To understand the magic in the second function, let's walk through the tests
one could do when wanting to grasp how things work in the
bitstring world
(with some reading of the fine documentation, too).

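A few such probes (illustrative only, using nothing beyond the stock bit string types):

SELECT x'ff';                   -- 11111111, a bit string written in hexadecimal
SELECT b'1010'::bit(4)::int;    -- 10, a bit string cast back to integer
SELECT length(md5('foo'));      -- 32 hex characters of text for 128 bits of data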
Storing
varbits rather than the
text form of the
MD5 allows us to go from
6510 MB down to
4976 MB on a sample table containing 100 millions
rows. We're targeting more that that, so that's a great win down here!

In case you wonder, querying the main index on
varbit rather than the one on
text for a single result row, the cost of doing the conversion with
varbit_to_hex seems to be around
28 µs. We can afford it.

Hope this helps!

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 26 Aug 2010 17:45:00 +0200http://tapoueh.org/blog/2010/08/26-playing-with-bit-strings.htmlEditing constants in constraintshttp://tapoueh.org/blog/2010/08/09-editing-constants-in-constraints.htmlWe're using constants in some constraints here, for example in cases where
several servers are replicating to the same
federating one: each origin
server has its own schema, and all is replicated nicely on the central host,
thanks to
Londiste, as you might have guessed already.

For bare-metal recovery scripts, I'm working on how to change those
constants in the constraints, so that
pg_dump -s plus some schema tweaking
would kick-start a server. Here's a
PLpgSQL snippet to do just that:

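A minimal sketch of that approach (the server = 17 and 18 values match the sed example below; everything else is illustrative): find every CHECK constraint mentioning the old constant, then drop and re-create it with the new one.

DO $$
DECLARE
  c record;
BEGIN
  FOR c IN
    SELECT n.nspname, r.relname, con.conname,
           pg_get_constraintdef(con.oid) AS condef
      FROM pg_constraint con
           JOIN pg_class r ON r.oid = con.conrelid
           JOIN pg_namespace n ON n.oid = r.relnamespace
     WHERE con.contype = 'c'
       AND pg_get_constraintdef(con.oid) ~ 'server = 17'
  LOOP
    EXECUTE format('ALTER TABLE %I.%I DROP CONSTRAINT %I',
                   c.nspname, c.relname, c.conname);
    EXECUTE format('ALTER TABLE %I.%I ADD CONSTRAINT %I %s',
                   c.nspname, c.relname, c.conname,
                   replace(c.condef, 'server = 17', 'server = 18'));
  END LOOP;
END;
$$;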
This relies on the fact that our constraints are on the column
server. Why
would this be any better than a
sed one-liner, you might ask? I'm fed up
with having pseudo-parsing scripts and taking the risk that the simple
command will change data I didn't want to edit. I want context aware tools,
pretty please, to
feel safe.

Otherwise I might have gone with pg_dump -s | sed -e 's:\(server =\) 17:\1 18:' but this one-liner already contains too much useless magic for my taste (the space before 17 ain't in the group match, to allow for having \1 18 in the right hand side). And this isn't yet parametrized, and there I'll need to talk to the database, as that's where I store the servers' names and their ids (a bigserial, yes, the constraints are all generated from scripts). I don't want to write an SQL parser and I don't want to play loose, so the PLpgSQL approach is what I'm thinking of as the best tool here. Opinionated answers go to my mailbox!

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 09 Aug 2010 14:45:00 +0200http://tapoueh.org/blog/2010/08/09-editing-constants-in-constraints.htmldebian packaging PostgreSQL extensionshttp://tapoueh.org/blog/2010/08/06-debian-packaging-postgresql-extensions.htmlIn trying to help an extension
debian packaging effort, I've once again
proposed to handle it. That's because I now begin to know how to do it, as
you can see in my
package overview page at
the debian QA facility. There's a reason why I volunteered here: yet another tool of mine is now
to be found in
debian, and should greatly help
extension packaging
there. You can already check for the
postgresql-server-dev-all package page
if you're that impatient!

Back? Ok, so I used to have two main gripes against debian support for PostgreSQL. The first one, which is now feeling lonely, is that the two projects' release support policies aren't compatible enough for debian stable to include all currently supported stable PostgreSQL major versions. It's very bad that debian stable will only propose one major version, knowing that the support for several of them is in there.

The problem is twofold: first, debian stable has to maintain any distributed package. There's no deprecation policy allowing for dropping the ball. So the other side of this coin is that debian developers must take it on themselves to maintain included software for as long as stable is not renamed oldstable. And it so happens that there's no debian developer who feels like maintaining end-of-life'd PostgreSQL releases without help from the PostgreSQL Core Team. Or, say, without an official statement that they would help.

Now, the reason I don't like this situation is that I'm pretty sure there are very few software development groups offering as long and reliable a maintenance policy as PostgreSQL does, but debian will still happily distribute
unknown-maintenance-policy pieces of code in its stable repositories. So the
uncertainty excuse is rather poor. And highly frustrating.

Note: you have to admit that the debian stable management model copes very
well with all the debian included software. You can't release stable with
a new PostgreSQL major version unless each and every package depending on
PostgreSQL will actually work with the newer version, and the debian
scripts will take care of upgrading the cluster. Where it's not working well
is when you're using debian for a PostgreSQL server for a proprietary
application, which happens quite frequently too.

The consequence of this fact leads to my second main gripe against debian
support for PostgreSQL: the extensions. It so happens that the PostgreSQL
extensions are developped for supporting several major versions from the
same source code. So typically, all you need to do is recompile the
extension against the new major version, and there you go.

Now, say the new debian stable is coming with
8.4 rather than
8.3 as it used
to. You should be able to just build the extensions (like
prefix), without
changing the source package, nor dropping
postgresql-8.3-prefix from the
distribution on the grounds that
8.3 ain't in debian stable anymore.

As you can see, the
pg_buildext tool allows you to list the PostgreSQL major
versions the extension you're packaging supports, and only those that are
both in your list and in the current debian supported major version list
will get built.
pg_buildext will do a
VPATH build of your extension, so it's
capable of building the same extension for multiple major versions of
PostgreSQL. Here's how it looks:

# build the extension for all supported versions
pg_buildext build $(SRCDIR) $(TARGET) "$(CFLAGS)"
# then install each of them
for v in `pg_buildext supported-versions $(SRCDIR)`; do \
    dh_install -ppostgresql-$$v-prefix ;\
done

So you still need to maintain
debian/pgversions and the
postgresql-X.Y-extension.* files, but then a change in debian support for
PostgreSQL major versions will be handled automatically (there's a facility
to trigger automatic rebuild when necessary).

All this ranting to explain that pretty soon, the extension packages that I
maintain will no longer have to be patched when dropping a previously
supported major version of PostgreSQL. I'm breathing a little better, so
thanks a lot
Martin!

]]>dim@tapoueh.org (Dimitri Fontaine)Fri, 06 Aug 2010 13:00:00 +0200http://tapoueh.org/blog/2010/08/06-debian-packaging-postgresql-extensions.htmlQuerying the Catalog to plan an upgradehttp://tapoueh.org/blog/2010/08/05-querying-the-catalog-to-plan-an-upgrade.htmlSome user on
IRC was reading the release notes in order to plan for a minor
upgrade of his
8.3.3 installation, and was puzzled about potential needs for
rebuilding
GIST indexes. That's from the
8.3.5 release notes, and from the
8.3.8 notes you see that you need to consider
hash indexes on
interval
columns too. Now the question is, how to find out if any such beasts are in
use in your database?

It happens that
PostgreSQL lets you find out those things by querying its
system catalogs. That might look hairy at first, but it's well worth getting
used to those system tables. You could compare them to the introspection and
reflection facilities of some programming languages, except much more useful,
because you're reaching the whole system at once. But, well, here it goes:
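
What follows is a minimal sketch rather than the exact original query: it
lists every index built with the gist access method, using only the standard
catalogs — adapt the filters at will.

SELECT n.nspname, ci.relname AS index_name, ct.relname AS table_name, am.amname
  FROM pg_index i
  JOIN pg_class ci ON ci.oid = i.indexrelid
  JOIN pg_class ct ON ct.oid = i.indrelid
  JOIN pg_am am    ON am.oid = ci.relam
  JOIN pg_namespace n ON n.oid = ci.relnamespace
 WHERE am.amname = 'gist';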

Now you could replace the
WHERE clause with
WHERE amname IN ('gist', 'hash')
to check both conditions at once. What about narrowing down the
hash index rebuilds to schedule, as they should only be done for indexes on
interval columns? Well, let's try it:
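
A first attempt could simply restrict on the declared type of the indexed
columns — a sketch only, again using just the standard catalogs:

SELECT n.nspname, ci.relname AS index_name, ct.relname AS table_name, a.attname
  FROM pg_index i
  JOIN pg_class ci ON ci.oid = i.indexrelid
  JOIN pg_class ct ON ct.oid = i.indrelid
  JOIN pg_am am    ON am.oid = ci.relam
  JOIN pg_namespace n ON n.oid = ci.relnamespace
  JOIN pg_attribute a ON a.attrelid = ct.oid AND a.attnum = ANY (i.indkey)
 WHERE am.amname = 'hash'
   AND a.atttypid = 'interval'::regtype;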

We're not there yet, because as you notice, the catalogs are somewhat
optimized and not always in a normal form. That's good for the system's
performance, but it makes querying a bit awkward. What we want is to find out
from the
indclass column (it's an
oidvector) whether any of its operator classes applies
to an
interval data type. There's a subtlety here, as the index could store
interval data even if the column is not of an
interval type itself, so we
have to catch both cases.

Well the
subtlety only makes sense once you know what an
operator class is:
“An
operator class defines how a particular data type can be used with an
index” is what the
CREATE OPERATOR CLASS manual page teaches us. What we
need to know here is that an index will talk to an operator class to get to
the data type, either the
column data type or the index
storage one.
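
Putting that together, a sketch of the query checks the operator classes
referenced from indclass, looking at both their input type (opcintype) and
their storage type (opckeytype); this is an illustration, not necessarily the
exact query:

SELECT n.nspname, ci.relname AS index_name, ct.relname AS table_name
  FROM pg_index i
  JOIN pg_class ci ON ci.oid = i.indexrelid
  JOIN pg_class ct ON ct.oid = i.indrelid
  JOIN pg_am am    ON am.oid = ci.relam
  JOIN pg_namespace n ON n.oid = ci.relnamespace
 WHERE am.amname = 'hash'
   AND EXISTS (SELECT 1
                 FROM pg_opclass oc
                WHERE oc.oid = ANY (i.indclass)
                  AND (oc.opcintype = 'interval'::regtype
                       OR oc.opckeytype = 'interval'::regtype));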

Most certainly this query will return no rows for you, as
hash indexes are
not widely used, mainly because they are not crash tolerant. To see some
results you could of course remove the
amname restriction; that would show
the query is working, but don't forget to add the restriction back to plan
for the upgrade!

But hey, why walk the extra mile here, you might ask? After all, in
the second query we would already have had the information we needed had
we added the
indexdef column, albeit in a human-reader-friendly form: the
result set would then contain the
CREATE INDEX command you need to issue to
build the index from scratch. That would be enough for checking only the
catalog, but the extra mile allows you to produce a
SQL script to build the
indexes that need your attention post upgrade. That last step is left as an
exercise for the reader, though.

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 05 Aug 2010 11:00:00 +0200http://tapoueh.org/blog/2010/08/05-querying-the-catalog-to-plan-an-upgrade.htmlDatabase Virtual Machineshttp://tapoueh.org/blog/2010/08/03-database-virtual-machines.htmlToday I'm being told once again about
SQLite as an embedded database
software. That one ain't a
database server but a
software library that you
can use straight from your main program. I've yet to use it, but it looks
like
its SQL support is good enough for simple things — and that covers
loads of use cases. I guess a read-only cache and configuration storage would be
the obvious ones, because it seems that
SQLite's use cases don't include
mixed concurrency, that is workloads with concurrent readers and writers.

The main thing is how to further optimize
PostgreSQL given what we have. It
seems that one of the major road blocks in the performance work is how we get
the data from disk to the client. We're still spending so much time on
the
CPU that the disk bandwidth isn't always saturated, and that's a
problem. Further thoughts in the
full length article, but that's just about
a one page section for now!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 03 Aug 2010 13:30:00 +0200http://tapoueh.org/blog/2010/08/03-database-virtual-machines.htmlPartitioning: relation size per “group”http://tapoueh.org/blog/2010/07/26-partitioning-relation-size-per-group.htmlThis time, we are trying to figure out where is the bulk of the data on
disk. The trick is that we're using
DDL partitioning, but we want a “nice”
view of size per
partition set. Meaning that if you have for example a
parent table
foo with partitions
foo_201006 and
foo_201007, you would want
to see a single category
foo containing the accumulated size of all the
partitions underneath
foo.
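
A minimal sketch of such a query — assuming the partitions are suffixed with
_YYYYMM and looking only at the public schema — could be:

SELECT coalesce(substring(relname from '^(.*)_[0-9]{6}$'), relname) as name,
       pg_size_pretty(sum(pg_total_relation_size(c.oid))::bigint) as size
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'r'
   AND n.nspname = 'public'
 GROUP BY 1
 ORDER BY sum(pg_total_relation_size(c.oid)) DESC;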

Note that by simply removing the grouping from that query, you will get a
detailed view of the
indexes and
tables that are taking the most volume on
disk at your place.

Now, what about using
window functions here so that we get a more
detailed view of historic changes on each partition? With an evolution
figure in percent from the previous partition of the same year,
accumulated size per partition and per year, a yearly sum, you name it. Here's
another one you might want to try, ready for some tuning (schema name, table
name, etc.):

WITH s AS (
  select relname,
         pg_relation_size(c.oid) as size,
         pg_total_relation_size(c.oid) as tsize,
         substring(substring(relname from '[0-9]{6}$') for 4)::bigint as year
    from pg_class c
    join pg_namespace n on n.oid = c.relnamespace
   where c.relkind = 'r'
     -- and n.nspname = 'public'
     -- and c.relname ~ 'stats'
     and substring(substring(relname from '[0-9]{6}$') for 4)::bigint >= 2008
   order by relname
),
sy AS (
  select relname,
         size,
         tsize,
         year,
         (sum(size) over w_year)::bigint as ysize,
         (sum(size) over w_month)::bigint as cumul,
         (lag(size) over (order by relname))::bigint as previous
    from s
  window w_year as (partition by year),
         w_month as (partition by year order by relname)
),
syp AS (
  select relname,
         size,
         tsize,
         rank() over (partition by year order by size desc) as rank,
         case when ysize = 0 then ysize
              else round(size / ysize::numeric * 100, 2) end as yp,
         case when previous = 0 then previous
              else round((size / previous::numeric - 1.0) * 100, 2) end as evol,
         cumul,
         year,
         ysize
    from sy
)
SELECT relname,
       pg_size_pretty(size) as size,
       pg_size_pretty(tsize) as "+indexes",
       evol, yp as "% annuel", rank,
       pg_size_pretty(cumul) as cumul, year,
       pg_size_pretty(ysize) as "yearly sum",
       pg_size_pretty((sum(size) over())::bigint) as total
  FROM syp
 ORDER BY relname;

Hope you'll find it useful, I certainly do!

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 26 Jul 2010 17:00:00 +0200http://tapoueh.org/blog/2010/07/26-partitioning-relation-size-per-group.htmlEmacs and PostgreSQLhttp://tapoueh.org/blog/2010/07/22-emacs-and-postgresql.htmlThose are my two all times favorite Open Source Software. Or
Free Software
in the
GNU sense of the word, as both the
BSD and the
GPL are labeled free
there. Even if I prefer the
Debian Free Software Guidelines as a global
definition and the
WTFPL license. But that's a digression.

I think that
Emacs and
PostgreSQL do share a lot in common. I'd begin with
the documentation, whose quality is amazing in both projects. Then of
course the extensibility with
Emacs Lisp on the one hand and
catalog-driven operations on the other hand. Whether you're extending Emacs
or PostgreSQL you'll find that it's pretty easy to tweak the system
while
it's running. The other comparison points are less important, like the fact
that both systems get about the same uptime on my laptop (currently
13
days, 23 hours, 57 minutes, 10 seconds).

So of course I'm using
Emacs to edit
PostgreSQL .sql files, including stored
procedures. And it so happens that
line numbering in plpgsql is not as
straightforward as one would naively think, to the point that we'd like to
have better tool support there. So I've extended Emacs
linum-mode minor mode
to also display the line numbers as computed by PostgreSQL, and here's what
it looks like:

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 22 Jul 2010 09:30:00 +0200http://tapoueh.org/blog/2010/07/22-emacs-and-postgresql.htmlBackground writershttp://tapoueh.org/blog/2010/07/19-background-writers.htmlThere's currently a thread on
hackers about
bg worker: overview and a series
of 6 patches. Thanks a lot
Markus! This is all about generalizing a concept
already in use in the
autovacuum process, where you have an independent
subsystem that requires having an autonomous
daemon running and able to start
its own
workers.

I've been advocating generalizing this concept for a while already, in
order to have the
postmaster able to tell subsystems when to shut down
and start and reload, etc. Some external processes are only external because
there's no need to include them
by default into the database engine, not
because it makes no sense to have them in there.

So even if
Markus's work is mainly about generalizing
autovacuum so that he
has a
coordinator to ask for helper backends to handle broadcasting of
writesets for
Postgres-R, it still could be a very good first step towards
something more general. What I'd like to see the generalization handle are
things like
PGQ, or the
pgagent scheduler. In some cases,
pgbouncer too.

What we're missing there is an
API for everybody to be able to extend
PostgreSQL with their own background processes and workers. What would such a
beast look like? I have some preliminary thoughts about this in my
Next Generation PostgreSQL article, but that's still early thoughts. The
main idea is to steal as much as sensible from
Erlang Generic Supervisor Behaviour, and maybe up to its
Generic Finite State Machines behavior. In the
Erlang world, a
behavior is a
generic process.

The
FSM approach would allow for any user daemon to provide an initial state
and register functions that would do some processing then change the
state. My feeling is that if those functions are exposed at the SQL level,
then you can
talk to the daemon from anywhere (the Erlang ideas include a
globally —cluster wide— unique name). Of course the goal would be to
provide an easy way for the
FSM functions to have a backend connected to the
target database handle the work for it, or be able to connect itself. Then
we'd need something else here, a way to produce events based on the clock. I
guess relying on
SIGALRM is a possibility.

I'm not sure how yet, but I think getting back into consultancy after
having opened
2ndQuadrant France has some influence on how I think about all
that. My guess is that those blog posts are a first step on a nice journey!

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 19 Jul 2010 16:30:00 +0200http://tapoueh.org/blog/2010/07/19-background-writers.htmlLogs analysishttp://tapoueh.org/blog/2010/07/13-logs-analysis.htmlNowadays to analyze logs and provide insights, the more common tool to use
is
pgfouine, which does an excellent job. But there have been some
improvements in logging capabilities that we're not benefiting from yet, and
I'm thinking about the
CSV log format.

So the idea would be to turn
pgfouine into a set of
SQL queries against the
logs themselves once imported into the database. Wait. What about having our
next PostgreSQL version, which is meant (I believe) to include CSV support
in
SQL/MED, directly expose its logs as a system view?
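
To give an idea of what such queries look like, here's a sketch against a
hypothetical postgres_log table filled from the csvlog files (column names
follow the documented csvlog format):

SELECT error_severity, count(*)
  FROM postgres_log
 WHERE log_time >= now() - interval '1 day'
 GROUP BY error_severity
 ORDER BY count(*) DESC;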

A good thing would be to expose that as a ddl-partitioned table following
the log rotation scheme as set up in
postgresql.conf, or maybe given in some
sort of separate setup, in order to support
logrotate users. At least some
facilities to do that would be welcome, and I'm not sure plain
SQL/MED covers
that when it comes to
source partitioning.

Then all that remains to be done is a set of
SQL queries and some static or
dynamic application to derive reports from there.

This is yet again an idea I have in mind but don't currently have the time to
explore myself, so I talk about it here in the hope that others will share
the interest. Of course, now that I work at
2ndQuadrant, you can make it so
that we consider the idea in more detail, up to implementing and
contributing it!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 13 Jul 2010 14:15:00 +0200http://tapoueh.org/blog/2010/07/13-logs-analysis.htmlUsing indexes as column store?http://tapoueh.org/blog/2010/07/08-using-indexes-as-column-store.htmlThere's a big trend nowadays about using column storage as opposed to what
PostgreSQL is doing, which would be row storage. The difference is that if
you have the same column value in a lot of rows, you could get to a point
where you have this value only once in the underlying storage file. That
means high compression. Then you tweak the
executor to be able to load this
value only once, not once per row, and you avoid another huge source of data
traffic (often enough, from disk).

Well, it occurs to me that maybe we could have column oriented storage
support without adding any new storage facility into PostgreSQL itself, just
using what we already have now in new ways. Column oriented storage looks
somewhat like an index, where any given value is meant to appear only
once. And you have
links telling you where to find the associated full row in
the main storage.

There's work in progress to allow PostgreSQL to use indexes on their
own, without having to visit the main storage to check
visibility. That's known as the
Visibility Map, which is still only a hint
in released versions. The goal is to turn that into a crash-safe, trustworthy
source in the future, so that we get
covering indexes. That means we can use
an index alone, skipping the trip to the full row in main storage to get the
visibility information.

Now, once we have that, we could consider using the indexes in more
queries. It could be a win to get the column values from the index when
possible: if you don't
output more columns from the
heap, return the
values from there, scanning the index only once per value, not once per row.

There's a little more though on the point in the
Next Generation PostgreSQL
article I've been referencing already, should you be interested.

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 08 Jul 2010 11:15:00 +0200http://tapoueh.org/blog/2010/07/08-using-indexes-as-column-store.htmlMVCC in the Cloudhttp://tapoueh.org/blog/2010/07/06-mvcc-in-the-cloud.htmlAt
CHAR(10), Markus had a talk about
Using MVCC for Clustered Database Systems and explained how
Postgres-R does
it. The scope of his project is to maintain a set of database servers in the
same state, eventually.

Now, what does it mean to be "In the Cloud"? Well, there's more than one
answer I'm sure; mine would insist on including this "Elasticity" bit. What
I mean here is that it'd be great to be able to add or lose nodes and stay
online. Granted, that's what
Postgres-R is providing. Does that make it
ready for the "Cloud"? Well, it happens that I don't think so.

Once you have elasticity, you also want
scalability. That could mean lots of
things, and
Postgres-R already provides a great deal of it, at the connection
and read level: you can do your business
unlimited on any node, the others
will eventually (
eagerly) catch up, and you can do your
select on any node
too, reading from the same data set. Eventually.

What's still missing here is the hard sell,
write scalability. This is the
idea that you don't want to sustain the same
write load on all the members
of the "Cloud cluster". It happens that I have some idea about how to go on
this, and this time I've been trying to write them down. You might be
interested into the
MVCC in the Cloud part of my
Next Generation PostgreSQL
notes.

My opinion is that if you want to distribute the data, this is a problem
that falls in the category of finding the data on disk. This problem is
already solved in the executor: it knows which operating system level file
to open and where to seek inside it in order to find a row value for a
given relation. So it should be possible to teach it that some relation's
storage ain't local, and that to get the data it needs it has to communicate
with another PostgreSQL instance.

I would call that a
remote tablespace. It allows for distributing both the
data and their processing, which could happen in parallel. Of course that
means there are now some latency concerns, and that some
JOIN will get slow if
you need to retrieve the data from the network each time. For that, what I'm
thinking about is the possibility of managing a local copy of a remote
tablespace, which would be a
mirror tablespace. But that's for another blog
post.

Oh, if that makes you think a lot of
SQL/MED, that would mean I did a good
enough job at explaining the idea. The main difference though would be to
ensure transaction boundaries over the local and remote data: it's one
single distributed database we're talking about here.

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 06 Jul 2010 10:50:00 +0200http://tapoueh.org/blog/2010/07/06-mvcc-in-the-cloud.htmlBack from CHAR(10)http://tapoueh.org/blog/2010/07/05-back-from-char10.htmlIt surely does not feel like a full month and some more went by since we
were enjoying
PGCon 2010, but in fact it was already the time for
CHAR(10). The venue was most excellent, as Oxford is a very beautiful
city. Also, the college was like a city in the city, and having the
accommodation all in there really smoothed it all.

From a more technical viewpoint, the
range of topics we talked about and the
even broader one in the
"Hall Track" make my mind full of ideas, again. So
I'm preparing a quite lengthy article to summarise or present all those
ideas, and I think a post series should cover the points in there. When
trying to label things, it appears that my current obsessions are mainly
about
PostgreSQL in the Cloud and
Further Optimising PostgreSQL, so that's
what I'll be talking about those next days.

Meanwhile I'm going to search for existing solutions on how to use the
Paxos algorithm to generate a reliable distributed sequence, using
libpaxos
for example. The goal would be to see if it's feasible to have a way to
offer some global
XID from a network of servers in a distributed fashion,
ideally in such a way that new members can join in at any point, and of
course that losing a member does not cause downtime for the online ones. It
sounds like this problem has been extensively researched and is solved,
either by the
Group Communication Systems or the underlying
algorithms. Given our community's current lack of buy-in for
GCS, my guess
is that bypassing them would be a pretty good move, even if that means
implementing a limited form of
GCS ourselves.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 05 Jul 2010 09:30:00 +0200http://tapoueh.org/blog/2010/07/05-back-from-char10.htmlBack from PgCon2010http://tapoueh.org/blog/2010/05/27-back-from-pgcon2010.htmlThis year's edition has been the
best pgcon ever for me. Granted, it's only
my third time, but still :) As
Josh said the
"Hall Track" in particular was
very good, and the
Dev Meeting has been very effective!

Extensions

This time I prepared some
slides to present the extension design and I tried
hard to make it so that we get to agree on a plan, even recognizing it's not
solving all of our problems from the get go. I had been talking about the
concept and design with lots of people already, and continued to do so while in
Ottawa on Monday evening and through all Tuesday. So Wednesday, I felt
prepared. It proved to be a good thing, as I edited the slides with ideas from
several people I had the chance to expose my ideas to! Thanks
Greg Stark and
Heikki Linnakangas for the part we talked about at the meeting, and a lot more
people for the things we'll have to solve later (Hi
Stefan!).

So the current idea for
extensions is for the
backend support to start with a
file in
`pg_config --sharedir`/extensions/foo/control containing
the
foo extension's
metadata. From that we know if we can install an extension
and how. Here's an example:

The other files should be
install.sql,
uninstall.sql and
foo.conf. The only
command the user will have to type in order to use the extension in his
database will then be:

INSTALL EXTENSION foo;

For that to work all that needs to happen is for me to write the code. I'll
keep you informed as soon as I get a chance to resume my activities on the
git branch I'm using. You can already find my first attempt at a
pg_execute_from_file() function
there.

Building atop that backend support we already have two gentlemen competing on
features to offer to
distribute and
package extensions! That will complete the
work just fine, thanks guys.

Hot Standby

Heikki's talk about
Built-in replication in PostgreSQL 9.0 left me with a lot to
think about. In particular it seems we need two projects out of core to complete
what
9.0 has to offer, namely something very simple to prepare a base backup
and something more involved to manage a pool of standbys.

pg_basebackup

The idea I had listening to the talk was that it might be possible to ask the
server, in a single SQL query, for the list of all the files it's using. After
all, there are those
pg_ls_dir() and
pg_read_file() functions, we could put
them to good use. I couldn't get the idea out of my head, so I had to write
some code and see it running:
pg_basebackup is there at
github, grab a copy!

What it does is very simple, in about 100 lines of self-contained python code
it gets all the files from a running server through a normal PostgreSQL
connection. That was my first
recursive query. I had to create a new function
to get the file contents as the existing one returns text, and I want
bytea
here, of course.
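
The building blocks are plain SQL. As a minimal sketch — the recursion and the
bytea reading are left out, and the directory name is just an example — here
is how to list one directory of the cluster with file sizes, using the
built-in superuser admin functions:

SELECT name, (pg_stat_file('global/' || name)).size
  FROM pg_ls_dir('global') AS t(name);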

Note that the code depends on the
bytea representation in use, so it's only
working with
9.0 as of now. Can be changed easily though, send a patch or just
ask me to do it!

Lastly, note that even if
pg_basebackup will compress each chunk it sends over
the
libpq connection, it won't be your fastest option around. Its only
advantage there is its simplicity. Get the code, run it with 2 arguments: a
connection string and a destination directory. There you are.

wal proxy, wal relay

The other thing that we'll miss in
9.0 is the ability to both manage more than
a couple of
standby servers and to manage failover gracefully. Here the idea
would be to have a proxy server acting as both a
walreceiver and a
walsender. Its role would be to both
archive the WAL and
relay them to the real
standbys.

Then in case of master failure, we could instruct this
proxy to be fed from
the elected new master (manual procedure), the other standbys not being
affected. Well, apart from the fact that changing the
timeline (which will happen
as soon as you promote a standby to master) while streaming is apparently not
meant to be supported. So the
proxy would also disconnect all the
slaves and have them
reconnect.

If we need such a finesse, we could have the
restore_command on the
standbys
prepared so that it'll connect to the
proxy's archive. Now on failover, the
standbys are disconnected from the stream, get a
WAL file with a new
timeline
from the
archive, replay it, and reconnect.

That means that for a full
HA scenario you could get by with three
servers. You're back to two servers at failover time and need to rebuild the
crashed master as a standby, running a base backup again.

If you've followed the idea, I hope you liked it! I still have to motivate some
volunteers so that some work gets done here, as I'm probably not the one to ask
as far as coding this is concerned, if you want it out before
9.1 kicks in!

Queuing

We also had a nice
Hall Track session with
Jan Wieck,
Marko Kreen and
Jim Nasby
about how to get a single general (enough) queueing solution for PostgreSQL. It
happens that the Slony queueing ideas made their way into
PGQ and that we'd
want to add some more capabilities to this one.

What we talked about was adding more interfaces (event producers, event format
translating at both ends of the pipe) and optimising how many events from the
past we keep in the queue for the subscribers, in a cascading environment.

It seems that the basic architecture of the queue is what
PGQ 3 provides
already, so it could even be not that much of a hassle to get something working
out of the ideas exchanged.

Of course, one of those ideas has been discussed at the
Dev Meeting, it's about
deriving the transaction commit order from the place which already has the
information rather than
reconstructing it after the fact. We'll see how it
goes, but it started pretty well with a design mail thread.

Other talks

I went to some other talks too, of course, unfortunately with an attention span
far from constant. Between the social events (you should read that as
beer
drinking evenings) and the hall tracks, more than once my brain was less
present than my body in the talks. I won't risk commenting on them here, but
overall it was very good: in just about every talk, new ideas popped into my
head. And I love that.

Conclusion: I'm addicted.

The social aspect of the conference has been very good too. Once more, a warm
welcome from the people that are central to the project, and who are so easily
available for a chat about any aspect of it! Or just for sharing a drink.

Meeting our users is very important too, and
pgcon allows for that also. I've
met some people I'm used to talking to via
IRC, and it was good fun sharing a beer
over there.

All in all, I'm very happy I made it to Ottawa despite the volcano activity,
there's so much happening over there! Thanks to all the people who made it
possible by either organizing the conference or attending it! See you next
year, I'm addicted...

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 27 May 2010 14:26:00 +0200http://tapoueh.org/blog/2010/05/27-back-from-pgcon2010.htmlImport fixed width data with pgloaderhttp://tapoueh.org/blog/2010/04/27-import-fixed-width-data-with-pgloader.htmlSo, following previous blog entries about importing
fixed width data, from
Postgres Online Journal and
David (perl) Fetter, I couldn't resist following
the meme and showing how to achieve the same thing with
pgloader.

I can't say how much I dislike such things as the following, and I can't
help thinking that non-IT people are right to look at us like this when
encountering such prose.

Note the
-T option is for
TRUNCATE, which you only need when you want to
redo the loading; I've come to always mention it in interactive usage. The
-v option is for some more
verbosity and the
-s for the
summary at end of
operations.

With the
pgloader.conf and
places2k.txt in the current directory, and an
empty table, just typing in
pgloader at the prompt would have done the job.

Oh, the
pg_option_standard_conforming_strings bit is from the
git HEAD, the
current released version has no support for setting any PostgreSQL knob
yet. Still, it's not necessary here, so you can forget about it.

You will also notice that
pgloader didn't trim the data for you, which ain't
funny for the
places column. That's a drawback of the fixed width format
that you can work around in two ways here, either by means of

UPDATE places SET loc_name = trim(loc_name) ;

or a custom
reformat module for
pgloader. I guess the latter solution is overkill, but
it allows for
pipe style processing of the data and a single database write.

Send me a mail if you want me to show here how to setup such a reformatting
module in a next blog entry!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 27 Apr 2010 12:01:00 +0200http://tapoueh.org/blog/2010/04/27-import-fixed-width-data-with-pgloader.htmlpgloader activity reporthttp://tapoueh.org/blog/2010/04/06-pgloader-activity-report.htmlYes. This
pgloader project is still maintained and somewhat
active. Development happens when I receive a complaint, either about a bug
in existing code or a feature in yet-to-write code. If you have a bug to
report, just send me an email!

This time it's about new features, the goal being to open
pgloader usage
without having to describe all the file format related details in the
pgloader.conf file. This time around,
Simon is giving feedback and told me
he would appreciate it if pgloader worked more like the competition.

We're getting there with some new options. The first one is that rather than
only
Sections, now you can give a
filename as an argument.
pgloader will
then create a configuration section for you, considering the file format to
be
CSV, setting
columns = *. The default
field separator is
|,
so you have also the
-f, --field-separator option to set that from the
command line.

As if that wasn't enough,
pgloader now supports any
PostgreSQL option either
in the configuration file (prefix the real name with
pg_option_) or on the
command line, via the
-o, --pg-options switch, which you can use more than
once. Command line settings will take precedence over any other setup, of
course. Consider for example
-o standard_conforming_strings=on.

While at it, some more options can now be set on the command line, including
-t, --section-threads and
-m, --max-parallel-sections on the one hand and
-r, --reject-log and
-j, --reject-data on the other hand. Those last two
must contain a
%s placeholder which will get replaced by the
section name,
or the
filename if you skipped setting up a
section for it.

Your
pgloader usage is now more command line friendly than ever!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 06 Apr 2010 09:10:00 +0200http://tapoueh.org/blog/2010/04/06-pgloader-activity-report.htmlFinding orphaned sequenceshttp://tapoueh.org/blog/2010/03/17-finding-orphaned-sequences.htmlThis time we're having a database where
sequences were used, but not
systematically as a
default value of a given column. It's mainly an historic
bad idea, but you know the usual excuse with bad ideas and bad code: the
first 6 months it's experimental, after that it's historic.

Not talking about genome orphaned sequences here, though

Still, here's a query for
8.4 that will allow you to list those
sequences
you have that are not used as a default value in any of your tables:
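
The exact query is a matter of taste; here's a sketch using pg_depend,
assuming the defaults were created with the usual nextval('...'::regclass)
form so the dependency got recorded:

SELECT n.nspname, c.relname AS orphaned_sequence
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'S'
   AND NOT EXISTS (SELECT 1
                     FROM pg_depend d
                    WHERE d.classid = 'pg_attrdef'::regclass
                      AND d.refclassid = 'pg_class'::regclass
                      AND d.refobjid = c.oid);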

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 17 Mar 2010 13:35:00 +0100http://tapoueh.org/blog/2010/03/17-finding-orphaned-sequences.htmlGetting out of SQL_ASCII, part 2http://tapoueh.org/blog/2010/02/23-getting-out-of-sql_ascii-part-2.htmlSo, if you followed the previous blog entry, now you have a new database
containing all the
static tables encoded in
UTF-8 rather than
SQL_ASCII. Because if it was not yet the case, you now severely distrust
this non-encoding.

Now is the time to have a look at properly encoding the
live data, those
stored in tables that continue to receive write traffic. The idea is to use
the
UPDATE facilities of PostgreSQL to tweak the data, and to fix the
applications so as not to continue inserting badly encoded strings in there.

Finding non UTF-8 data

First you want to find out the badly encoded data. You can do that with this
helper function that
RhodiumToad gave me on IRC. I had a version from the
archives before that, but the
regexp was hard to maintain and quote into a
PL function. This is avoided by two means: the first one is to have a separate
pure
SQL function for the
regexp checking (so that you can index it should
you need to) and the other one is to apply the regexp to
hex encoded
data. Here we go:

As you can see, this function allows you to check the conversion process from a
given supposed encoding before actually converting the data in place. This
is very useful, as even when you're pretty sure the non-utf8 data is
latin1,
sometimes you find it's
windows-1252 or such. So double check before telling
leon.nettoyeur() to update your precious data!

Also, there's a facility to use
translate() when none of the encodings match
your expectations. This is a skeleton just replacing invalid characters with
a
space, tweak it at will!

Conclusion

Enjoy your clean database now, even if it still accepts new data that will
probably not pass the checks, so we still have to be careful about that and
re-clean every day until the migration is effective. Or maybe add a
CHECK
clause that will reject badly encoded data...

In fact here we're using
Londiste to replicate the
live data from the old to
the new server, and that means the replication will break each time there's
new data written in non-utf8, as the new server is running
8.4, which by
design ain't very forgiving. Our plan is to clean-up as we go (remove table
from the
subscriber, fix it, add it again) and migrate as soon as possible!

Bonus points to those of you getting the convoluted reference :)

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 23 Feb 2010 17:30:00 +0100http://tapoueh.org/blog/2010/02/23-getting-out-of-sql_ascii-part-2.htmlGetting out of SQL_ASCII, part 1http://tapoueh.org/blog/2010/02/18-getting-out-of-sql_ascii-part-1.htmlIt happens that you have to manage databases
designed by your predecessor,
and it even happens that the team used to not have a
DBA. Those
hysterical
raisins can lead to having a
SQL_ASCII database. The horror!

The problem of course is not being able to read the data you just stored,
which is seldom the use case anywhere you use a database solution such as
PostgreSQL.

Now, it happens too that it's high time to get off of
SQL_ASCII, the
infamous. In our case we're lucky enough in that the data are all in fact
latin1 or close to it, and this comes from the fact that all the applications
connecting to the database are sharing some common code and setup. Then we
have some tables that can be tagged
archives and some other
live. This blog
post will only deal with the former category.

For those tables that are not receiving changes anymore, we will migrate
them by using a simple but time hungry method:
COPY OUT|recode|COPY IN. I've
tried to use
iconv for recoding our data, but it failed to do so in lots of
cases, so I've switched to using the
GNU recode tool, which works just fine.

The fact that it takes so much time doing the conversion is not really a
problem here, as you can do it
offline, while the applications are still
using the
SQL_ASCII database. So, here's the program's help:

The
-I option is neat enough to create the indexes in parallel, but with no
upper limit on the number of index creations launched. In our case it worked
well, so I didn't have to bother.

Take a look at the
recode.sh script, and don't hesitate to edit it for your own
purpose. It's missing some obvious options to get useful in the large, such as...

We'll get back to the subject of this entry in
part 2, dealing with how to
recode your data in the database itself, thanks to some insane regexp based
queries and helper functions. And thanks to a great deal of IRC based
helping, too.

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 18 Feb 2010 11:37:00 +0100http://tapoueh.org/blog/2010/02/18-getting-out-of-sql_ascii-part-1.htmlResetting sequences. All of them, please!http://tapoueh.org/blog/2010/02/16-resetting-sequences-all-of-them-please.htmlSo, after restoring a production dump with intermediate filtering, none of
our sequences were set to the right value. I could have tried to review the
process of filtering the dump here, but it's a
one-shot action and you know
what that sometimes means. With some pressure you don't script enough of it
and you just crawl more and more.

Still, I think how I solved it is worthy of a blog entry. Not that it's
about a super unusual
clever trick, quite the contrary, because questions
involving this trick are often encountered on the support
IRC.

The idea is to query the catalog for all sequences, and produce from there
the
SQL command you will have to issue for each of them. Once you have this
query, it's quite easy to arrange from the
psql prompt as if you had dynamic
scripting capabilities. Of course in
9.0 you will have
inline anonymous DO
blocks.

Once you have the
/tmp/sequences.sql file, you can ask
psql to execute its
commands as you're used to, that is by using
\i in an explicit transaction block.

Now, the interesting part if you got here attracted by the blog entry title
is in fact the query itself. A nice way to start is to
\set ECHO_HIDDEN then
describe some table, you now have a catalog example query to work with. Then
you tweak it somehow and get this:
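
Here's a sketch of that generating query, using the pg_depend link between a
sequence and its owning column to find the right max() to use (so it covers
serial-style sequences); redirect its output to /tmp/sequences.sql with \o
(and \t to strip the headers) and you're set:

SELECT 'SELECT setval(' || quote_literal(sn.nspname || '.' || s.relname)
       || ', coalesce((SELECT max(' || quote_ident(a.attname) || ') FROM '
       || quote_ident(tn.nspname) || '.' || quote_ident(t.relname) || '), 1));'
  FROM pg_class s
  JOIN pg_namespace sn ON sn.oid = s.relnamespace
  JOIN pg_depend d ON d.classid = 'pg_class'::regclass
                  AND d.objid = s.oid
                  AND d.refclassid = 'pg_class'::regclass
                  AND d.deptype = 'a'
  JOIN pg_class t ON t.oid = d.refobjid
  JOIN pg_namespace tn ON tn.oid = t.relnamespace
  JOIN pg_attribute a ON a.attrelid = t.oid AND a.attnum = d.refobjsubid
 WHERE s.relkind = 'S';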

Coming next, a
recode based script in order to get from
SQL_ASCII to
UTF-8,
and some strange looking queries too.

recode.sh [-npdf0TI] [-U user ] -s schema [-m mintable] pattern

Stay tuned!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 16 Feb 2010 16:23:00 +0100http://tapoueh.org/blog/2010/02/16-resetting-sequences-all-of-them-please.htmlprefix 1.1.0http://tapoueh.org/blog/2009/11/30-prefix-110.htmlSo I had two
bug reports about
prefix in less than a week. It means several
things, one of them is that my code is getting used in the wild, which is
nice. The other side of the coin is that people do find bugs in there. This
one is about the behavior of the
btree opclass of the type
prefix range. We
cheat a lot there by simply having written one, because a range does not
have a strict ordering: is
[1-3] before or after
[2-4]? But when you know
you have no overlapping intervals in your
prefix_range column, being able to
have it part of a
primary key is damn useful.

Note: in
8.5 we should have a way to express
exclusion constraints and have
PostgreSQL forbid overlapping entries for us. Not being there yet, you
could write a
constraint trigger and use the
GiST index to have nice speed
there, which is exactly what this
exclusion constraint support is about.

This happens in the
compare support function (see
Interfacing Extensions To Indexes) so that means you now have to rebuild
your
prefix_range btree indexes, hence the version number bump.

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 30 Nov 2009 12:10:00 +0100http://tapoueh.org/blog/2009/11/30-prefix-110.htmlPGDay.eu, Paris: it was awesome!http://tapoueh.org/blog/2009/11/09-pgdayeu-paris-it-was-awesome.htmlmoment. Lots of
attendees, lots of quality talks (
slides are online), good
food, great party: all the ingredients were there!

It also was for me the occasion to first talk about this tool I've been
working on for months, called
pg_staging, which aims to empower those boring
production backups to help maintain
staging environments (for your
developers and testers).

All in all such events keep reminding me what it means exactly when we say
that one of the greatest things about
PostgreSQL is its community. If you
don't know what I'm talking about, consider
joining!

]]>dim@tapoueh.org (Dimitri Fontaine)Mon, 09 Nov 2009 09:50:00 +0100http://tapoueh.org/blog/2009/11/09-pgdayeu-paris-it-was-awesome.htmlprefix 1.0.0http://tapoueh.org/blog/2009/10/06-prefix-100.htmlSo there it is, at long last, the final
1.0.0 release of prefix! It's on its
way into the debian repository (targeting sid, in testing in 10 days) and
available on
pgfoundry too.

In order to make it clear that I intend to maintain this version, the number
has 3 digits rather than 2... which is also what
PostgreSQL users will
expect.

The only last minute change is that you can now use the first version of the
two following rather than the second one:

For your information, I'm thinking about leaving
pgfoundry as far as the
source code management goes, because I'd like to be done with
CVS. I'd still
use the release file hosting though at least for now. It's a burden but it's
easier for the users to find them, when they are not using plain
apt-get
install. That move would lead to hosting
prefix and
pgloader and the
backports
over there at
github, where my next pet project,
pg_staging, will be hosted
too.

The way to see this
pgfoundry departure is that if everybody does the same,
then migrating the facility to some better or more recent hosting software
will be easier. Maybe some other parts of the system are harder than the
sources to migrate, though. If that's the case I'll consider moving them out
too, maybe getting listed on the
PostgreSQL Software Catalogue will prove
enough as far as web presence goes?

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 06 Oct 2009 15:56:00 +0200http://tapoueh.org/blog/2009/10/06-prefix-100.htmlprefix 1.0~rc2-1http://tapoueh.org/blog/2009/07/09-prefix-10rc2-1.htmlI've been having problem with building both
postgresql-8.3-prefix and
postgresql-8.4-prefix debian packages from the same source package, and
fixing the packaging issue forced me into modifying the main
prefix Makefile. So while reaching
rc2, I tried to think about missing pieces easy
to add this late in the game: and there's one, that's a function
length(prefix_range), so that you don't have to cast to text anymore in the
following widespread query:
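
The query in question is the classic longest-prefix-match lookup, sketched
here with a hypothetical prefixes table; length(prefix) is the bit that
previously required a cast to text:

SELECT *
  FROM prefixes
 WHERE prefix @> '0123456789'
 ORDER BY length(prefix) DESC
 LIMIT 1;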

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 09 Jul 2009 12:48:00 +0200http://tapoueh.org/blog/2009/07/09-prefix-10rc2-1.htmlprefix extension reaches 1.0 (rc1)http://tapoueh.org/blog/2009/06/23-prefix-extension-reaches-10-rc1.htmlAt long last, after millions and millions of queries just here at work and
some more in other places, the
prefix project is reaching
1.0 milestone. The
release candidate is getting uploaded into debian at the moment of this
writing, and available at the following place:
prefix-1.0~rc1.tar.gz.

If you have any use for it (as some
VoIP companies have already), please
consider testing it, in order for me to release a shiny
1.0 next week! :)

Recent changes include getting rid of those square brackets in the output when
they're not necessary, fixing btree operators, adding support for more operators in
the
GiST support code (now supported:
@>,
<@,
=,
&&). Enjoy!

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 23 Jun 2009 10:53:00 +0200http://tapoueh.org/blog/2009/06/23-prefix-extension-reaches-10-rc1.htmlPgCon 2009http://tapoueh.org/blog/2009/05/27-pgcon-2009.htmlI can't really compare
PgCon 2009 with previous years' editions, as the last time I
enjoyed the event was in 2006, in Toronto. But still I found the
experience to be a great one, and I hope I'll be there next year too!

I've met a lot of known people in the community, some of them I already had
the chance to run into at Toronto or
Prato, but this was the first time I
got to talk to many of them about interesting projects and ideas. That alone
was awesome already, and we also had a lot of talks to listen to: as others
have said, it was really hard to choose only one place to go to out
of three.

I'm now back home and seem to be recovering quite well from jet lag, and I've
even begun to work on the todo list from the conference. It includes mainly
Skytools 3 testing and contributions (code and documentation),
Extension Packaging work (Stephen Frost seems to be willing to help, which I
highly appreciate) beginning with
search_path issues, and posting some
backtrace to help fix some
SPI_connect() bug at
_PG_init() time in an
extension.

The excellent
lightning talk about _How not to Review a Patch_ by Joshua
Tolley took me out of the
dim, I'll try to be
bright enough and participate
as a reviewer in later commit fests (well, maybe not the very next ones as
some personal events on the agenda will take all my
"free" time)...

Oh and the
Golconde presentation gave some insights too: this queueing based
solution is to compare to the
listen/notify mechanisms we already have in
PostgreSQL, in the sense that's it's not transactional, and the events are
kept in memory only to achieve very high distribution rates. So it's a very
fine solution to manage a distributed caching system, for example, but not
so much for asynchronous replication (you need not to replicate events tied
to rollbacked transactions).

So all in all, spending last week in Ottawa was a splendid way to get more
involved in the PostgreSQL community, which is a very fine place to be
spending one's free time, should you ask me. See you soon!

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 27 May 2009 14:30:00 +0200http://tapoueh.org/blog/2009/05/27-pgcon-2009.htmlPrepared Statements and pgbouncerhttp://tapoueh.org/blog/2009/05/14-prepared-statements-and-pgbouncer.htmlOn the performance mailing list, a recent
thread drew my attention. It
turned out to be about using connection pooling software and prepared statements
in order to increase the scalability of PostgreSQL when confronted with a lot of
concurrent clients all doing simple
select queries. The advantage of the
pooler is to reduce the number of
backends needed to serve the queries, thus
reducing PostgreSQL internal bookkeeping. Of course, my choice of software
here is clear:
PgBouncer is an excellent top grade solution; it performs really
well (it won't parse queries) and is reliable and flexible.

The problem is that while combining
pgbouncer and
prepared statements is
possible, it requires the application to check at connection time if the
statements it's interested in are already prepared. This can be done by a
simple catalog query of this kind:

SELECT name
FROM pg_prepared_statements
WHERE name IN ('my', 'prepared', 'statements');

Well, this is simple but requires adding some application logic. What would
be great would be to only have to
EXECUTE my_statement(x, y, z) and never
bother if the
backend connection is a fresh new one or an existing one, so as
to avoid having to check if the application should
prepare.

The
preprepare pgfoundry project is all about this: it comes with a
prepare_all() function which will take all statements present in a given
table (
SET preprepare.relation TO 'schema.the_table';) and prepare them for
you. If you now tell
pgbouncer to please call the function at
backend
creation time, you're done (see
connect_query).

There's even a detailed
README file, but no release yet (check out the code
from
CVS; the
pgfoundry project page has
clear instructions about how to do so).

]]>dim@tapoueh.org (Dimitri Fontaine)Thu, 14 May 2009 00:00:00 +0200http://tapoueh.org/blog/2009/05/14-prepared-statements-and-pgbouncer.htmlSkytools 3.0 reaches alpha1http://tapoueh.org/blog/2009/04/14-skytools-30-reaches-alpha1.htmlIt's time for
Skytools news again! First, we did improve the documentation of the
current stable branch by hosting high level presentations and
tutorials on
the
PostgreSQL wiki. Do check out the
Londiste Tutorial, it seems that's
what people hesitating to try out londiste were missing the most.

The other things people miss a lot in the current stable Skytools (version
2.1.9 currently) are cascading replication (which allows for
switchover and
failover) and
DDL support. The new incarnation of skytools, version
3.0 reaches alpha1 today. It comes with full support for
cascading and
DDL, so
you might want to give it a try.

It's a rough release, documentation is still to get written for a large part
of it, and bugs are still to get fixed. But it's all in the Skytools spirit:
simple and efficient concepts, easy to use and maintain. Think about this
release as a
developer preview and join us :)

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 14 Apr 2009 00:00:00 +0200http://tapoueh.org/blog/2009/04/14-skytools-30-reaches-alpha1.htmlPrefix GiST index now in 8.1http://tapoueh.org/blog/2009/02/10-prefix-gist-index-now-in-81.htmlThe
prefix project is about matching a
literal against
prefixes in your
table, the typical example being a telecom routing table. Thanks to the
excellent work around
generic indexes in PostgreSQL with
GiST, indexing
prefix matches is easy to support in an external module. Which is what
the
prefix extension is all about.

Maybe you didn't come across this project before, so here's the typical
query you want to run to benefit from the special indexing, where the
@>
operator is read
contains or
is a prefix of:

SELECT * FROM prefixes WHERE prefix @> '0123456789';

Now, a user asked about an
8.1 version of the module, as it's what some
distributions ship (here, Red Hat Enterprise Linux 5.2). It turned out it
was easy to support
8.1 when you already support
8.2, so the
CVS now hosts
8.1 support code. And here's what the user asking about the feature has to
say:

It works like a charm now with 3ms queries over 200,000+ rows. The speed
also stays less than 4ms when doing complex queries designed for fallback,
priority shuffling, and having multiple carriers.

]]>dim@tapoueh.org (Dimitri Fontaine)Tue, 10 Feb 2009 00:00:00 +0100http://tapoueh.org/blog/2009/02/10-prefix-gist-index-now-in-81.htmlImporting XML content from filehttp://tapoueh.org/blog/2009/02/05-importing-xml-content-from-file.htmlThe problem was raised this week on
IRC and this time again I felt it would
be a good occasion for a blog entry: how to load an
XML file's content into a
single field?

The usual tool used to import files is
COPY, but it'll want each line of the
file to host a text representation of a database tuple, so it doesn't apply
to the case at hand.
RhodiumToad was online and offered the following code
to solve the problem:
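
Here is a sketch in the same spirit — the function name, the read size limit,
the file path and the target table are made up for the example:

CREATE OR REPLACE FUNCTION load_xml_file(path text) RETURNS xml
LANGUAGE plpgsql VOLATILE AS
$$
DECLARE
  loid    oid;
  fd      integer;
  content bytea;
BEGIN
  loid    := lo_import(path);        -- read the server-side file as a large object
  fd      := lo_open(loid, 262144);  -- 262144 is INV_READ
  content := loread(fd, 100000000);  -- arbitrary upper bound on the file size
  PERFORM lo_close(fd);
  PERFORM lo_unlink(loid);           -- don't keep the temporary large object around
  RETURN xmlparse(document convert_from(content, 'UTF8'));
END;
$$;

-- usage, with a hypothetical target table
INSERT INTO docs(content) SELECT load_xml_file('/tmp/file.xml');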

As you can see, the trick here is to use the
large objects API to load the
file content into memory (
content variable), then to parse it knowing it's
an
UTF8 encoded
XML file and return an
XML datatype object.

The presentation page is in Russian but the slides are in English, so have a
nice read!

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 04 Feb 2009 00:00:00 +0100http://tapoueh.org/blog/2009/02/04-asko-oja-talks-about-skype-architecture.htmlComparing Londiste and Slonyhttp://tapoueh.org/blog/2009/01/31-comparing-londiste-and-slony.htmlIn the page about
Skytools I've encouraged people to ask some more questions
in order for me to be able to try and answer them. That just happened, as
usual on the
#postgresql IRC, and the question is:
What does londiste lack that slony has?
]]>dim@tapoueh.org (Dimitri Fontaine)Sat, 31 Jan 2009 00:00:00 +0100http://tapoueh.org/blog/2009/01/31-comparing-londiste-and-slony.htmlControling HOT usage in 8.3http://tapoueh.org/blog/2009/01/28-controling-hot-usage-in-83.htmlAs it happens, I've got some environments where I want to make sure
HOT (
aka
Heap Only Tuples) is in use, because we're doing so many updates a second
that I want to be sure it's not killing my database server. I not only
wrote a checking view to see about it, but also made a
quick article
about it on the
French PostgreSQL website. Hanging around in
#postgresql
means that I'm now bound to write about it in English too!

So
HOT will get used each time you update a row without changing an indexed
value of it, and the benefit is skipping index maintenance and, as far as I
understand it, easing
vacuum's hard work too. To get the benefit,
HOT will
need some place to put the new version of the
UPDATEd tuple in the same
disk page, which means you'll probably want to set your table
fillfactor to
something much less than
100.
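
The checking view boils down to comparing two counters the statistics
collector already keeps for each table; a sketch of the underlying query:

SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       CASE WHEN n_tup_upd > 0
            THEN round(n_tup_hot_upd::numeric / n_tup_upd * 100, 2)
       END AS hot_ratio
  FROM pg_stat_user_tables
 ORDER BY n_tup_upd DESC;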

Here's even an extended version of the same request, displaying the
fillfactor option value for the tables you're inquiring about. This comes
separated from the first example because you get the
fillfactor of a
relation into the
pg_class catalog
reloptions field, and to filter against a
schema qualified table name, you want to join against
pg_namespace too.
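
A sketch of that extended version, with 'public' and 'mytable' standing in
for your own schema and table names:

SELECT n.nspname, c.relname,
       s.n_tup_upd, s.n_tup_hot_upd,
       c.reloptions
  FROM pg_stat_user_tables s
  JOIN pg_class c ON c.oid = s.relid
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = 'public'
   AND c.relname = 'mytable';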

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 28 Jan 2009 00:00:00 +0100http://tapoueh.org/blog/2009/01/28-controling-hot-usage-in-83.htmlLondiste Trickhttp://tapoueh.org/blog/2009/01/21-londiste-trick.htmlSo, you're using
londiste and the
ticker has not been running all night
long, due to some restart glitch in your procedures, and the
on call admin
didn't notice the restart failure. If you blindly restart the replication
daemon, it will load into memory all those events produced during the night
at once, because you now have only one tick in which to put them all.

The following query allows you to count how many events that represents,
with the magic tick numbers coming from
pgq.subscription in columns
sub_last_tick and
sub_next_tick.

SELECT count(*)
  FROM pgq.event_1,
       (SELECT tick_snapshot
          FROM pgq.tick
         WHERE tick_id BETWEEN 5715138 AND 5715139
       ) as t(snapshots)
 WHERE txid_visible_in_snapshot(ev_txid, snapshots);

In our case, this was more than
5.4 million events. With
this many events to care about, if you start londiste, it'll eat as much
memory as needed to have them all around, which might be more than what your
system is able to give it. So you want a way to tell
londiste not to load
all events at once. Here's how: add the following knob to your
.ini
configuration file before restarting the londiste daemon:

pgq_lazy_fetch = 500

Now,
londiste will lazily fetch
500 events or fewer at a time, even if a single
batch (which contains all
events between two
ticks) contains a huge number
of events. This number seems a good choice as it's the default
PGQ setting
for the number of events in a single
batch. This number is only exceeded when the
ticker is not running or when you're producing more
events than that in a
single transaction.

Hope you'll find the tip useful!

]]>dim@tapoueh.org (Dimitri Fontaine)Wed, 21 Jan 2009 00:00:00 +0100http://tapoueh.org/blog/2009/01/21-londiste-trick.htmlFake entryhttp://tapoueh.org/blog/2008/12/04-fake-entry.htmlThis is a test of a fake entry to see how muse will manage this.