Since the last time I talked about it, I've had quite a bit of feedback: bug reports, pull requests, and so on. Many issues were fixed, the last of them tonight.

I also added two new reports. I had the idea while working on my customers' clusters.

One of them had a lot of writes on their databases, and I wanted to know how many writes occurred in the WAL files. vmstat would only tell me how much was written across all files, but I was only interested in WAL writes. So I added a new report that grabs the current XLOG position and diffs it with the previous one. It gives something like this with a pgbench test:
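For the curious, here is a minimal libpq sketch of that position-diff approach. This is an assumed illustration, not pgstat's actual code, with error handling omitted; it uses pg_current_xlog_location() and pg_xlog_location_diff(), the pre-10 names of these server functions.

```c
#include <stdio.h>
#include <unistd.h>
#include <libpq-fe.h>

int main(void)
{
    /* connection parameters are taken from the usual PG* environment variables */
    PGconn *conn = PQconnectdb("");
    char    previous[64] = "";

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    for (;;)
    {
        PGresult *res = PQexec(conn, "SELECT pg_current_xlog_location()");
        char      current[64];

        snprintf(current, sizeof(current), "%s", PQgetvalue(res, 0, 0));
        PQclear(res);

        if (previous[0] != '\0')
        {
            /* let the server compute the byte difference between both positions */
            const char *params[2] = { current, previous };

            res = PQexecParams(conn, "SELECT pg_xlog_location_diff($1, $2)",
                               2, NULL, params, NULL, NULL, 0);
            printf("WAL written during the interval: %s bytes\n",
                   PQgetvalue(res, 0, 0));
            PQclear(res);
        }

        snprintf(previous, sizeof(previous), "%s", current);
        sleep(1);
    }

    /* not reached in this sketch */
    PQfinish(conn);
    return 0;
}
```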

Another customer wanted to know how many temporary files were written, and their sizes. Of course, you can get that from the pg_stat_database view, but its counters are only updated once the query is done. We wanted to see the activity while the query is executing. So I added another report:

One of the tools I usually need when I go see customers is vmstat. Nothing beats vmstat to give me a real overview of what the server is really doing. It gives system metrics, such as CPU usage and disk usage, which is quite useful to find out where the bottleneck comes from.

I wished I had a PostgreSQL tool like that, and I wished it enough to eventually build one. I call it pgstat because I couldn't find a better name for it.

It's a command-line tool that connects to a database and grabs its activity statistics. As PostgreSQL has many statistics, you have a command-line switch (-s) to choose the ones you want:

archiver for pg_stat_archiver

bgwriter for pg_stat_bgwriter

connection for connections by type

database for pg_stat_database

table for pg_stat_all_tables

tableio for pg_statio_all_tables

index for pg_stat_all_indexes

function for pg_stat_user_functions

statement for pg_stat_statements

pbpools for pgBouncer pools statistics

pbstats for pgBouncer general statistics

It looks a lot like vmstat: you ask it for the statistics you want, and the frequency at which to gather them. Just like this:

Quick blog post to remind you that you still have a few hours left to submit your talks to pgconf.eu if you haven't done so yet. Read our call for papers, and submit your proposals to be part of an awesome PostgreSQL event.

This is the commit message of commit c87ff71f374652936a6089215a30998492b14d52 in the PostgreSQL git repository. This new feature is really interesting. It was written by Mark Kirkwood, reviewed by Laurenz Albe, and committed by Magnus Hagander.

It allows one to see when autovacuum will fire an ANALYZE on a table. This is extremely useful, at least to me when I give a training course. Before, you only had the dead tuples count (n_dead_tup in pg_stat_all_tables) to guess when autovacuum would fire a VACUUM; you had nothing to guess when it would fire an ANALYZE. As the commit message says, this information is computed inside PostgreSQL but not publicly available. This new function makes it available for the next PostgreSQL release. The PostgreSQL project only adds new features to the development branch, so older releases won't have it, unless you have an extension which does the same thing.

That's what I did today: add the function to an extension, and publish the extension on pgxn.org so that anyone can install it.

It's actually quite simple to install. The easiest way is to use pgxnclient:

If you haven't already sent a proposal, you should seriously consider it. PostgreSQL Conference Europe 2013 will take place in Dublin, Ireland, on October 29-November 1, and we accept talks on a wide range of topics related to PostgreSQL:

Developing applications for PostgreSQL

Administering large scale PostgreSQL installations

Case studies and/or success stories of PostgreSQL deployments

PostgreSQL tools and utilities

PostgreSQL hacking

Community & user groups

Tuning the server

Migrating from other systems

Scaling/replication

Benchmarking & hardware

PostgreSQL related products

To be honest, I would love to see more case studies, as we usually don't get many and they are really interesting to a lot of people.

Well, there are only 20 days left to send your proposal. Read our call for papers, think about something you want to talk about, and submit your proposals (yes, you can submit more than one).

In the first article, we talked about the minimum functions required to add a foreign data wrapper to a PostgreSQL database. Alas, we couldn't do much with it, as it didn't have any other capabilities. And one of the most interesting capabilities of a FDW is to access remote data. In other words, to read it.

So, in this blog post, we'll see how we can add reading capabilities to our simple FDW.

In the previous blog post, we said that there are four hook functions to implement to get this capability:
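On the executor side, those are BeginForeignScan, IterateForeignScan, ReScanForeignScan, and EndForeignScan; on a 9.2 release, the planner also goes through GetForeignRelSize, GetForeignPaths, and GetForeignPlan, as described below. Here is a condensed, hedged sketch of what the do-nothing versions can look like, using the simple* names referenced later in this post (the handler, the declarations, and the #include lines are left out):

```c
/*
 * Hedged sketch of minimal "do nothing" hooks for PostgreSQL 9.2.
 * The executor hooks only log their entry; the planner hooks do the
 * bare minimum needed for a SELECT to go through the planner.
 */
static void
simpleGetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid)
{
    elog(DEBUG1, "entering function %s", __func__);

    baserel->rows = 0;          /* we have no idea how many rows there are */
}

static void
simpleGetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid)
{
    Cost        startup_cost = 0;
    Cost        total_cost = 0;

    elog(DEBUG1, "entering function %s", __func__);

    /* register a single, zero-cost path for this foreign table */
    add_path(baserel, (Path *)
             create_foreignscan_path(root, baserel, baserel->rows,
                                     startup_cost, total_cost,
                                     NIL, NULL, NIL));
}

static ForeignScan *
simpleGetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
                     ForeignPath *best_path, List *tlist, List *scan_clauses)
{
    elog(DEBUG1, "entering function %s", __func__);

    scan_clauses = extract_actual_clauses(scan_clauses, false);
    return make_foreignscan(tlist, scan_clauses, baserel->relid, NIL, NIL);
}

static void
simpleBeginForeignScan(ForeignScanState *node, int eflags)
{
    elog(DEBUG1, "entering function %s", __func__);
}

static TupleTableSlot *
simpleIterateForeignScan(ForeignScanState *node)
{
    elog(DEBUG1, "entering function %s", __func__);

    /* an empty slot means "no more rows" */
    return ExecClearTuple(node->ss.ss_ScanTupleSlot);
}

static void
simpleReScanForeignScan(ForeignScanState *node)
{
    elog(DEBUG1, "entering function %s", __func__);
}

static void
simpleEndForeignScan(ForeignScanState *node)
{
    elog(DEBUG1, "entering function %s", __func__);
}
```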

That's quite a lot of code. Simply put, all the void executor functions just call elog so that we know when they are called. The only executor function that returns something is the one returning a tuple, and we simply return an empty tuple for now. The planner functions do as little as they can to allow a SELECT statement to work. We'll explain what they do later.

So, what happened? The SELECT statement is executed in two parts: the planner part and the executor part. The planner called our handler function to make sure that the hook functions were set. Then it called each of the functions it needed: simpleGetForeignRelSize, simpleGetForeignPaths, and simpleGetForeignPlan, as I'm on a PostgreSQL 9.2 release. Then the executor called the handler function once again. And then it started reading the table: it called simpleBeginForeignScan to prepare the reading, simpleIterateForeignScan as many times as needed until it got an empty tuple (so only once in our case), and then simpleEndForeignScan to clean up everything that needed cleaning. We got no rows because our code was that simple.

Now, we need to make it actually read something. We'll use an sqlite database as our example, but the idea is the same for every FDW you can find.

To read a table in an sqlite database, you need to open a database connection, prepare a query, and then loop over all the tuples returned by your query. At the end, you have to close the database connection. Here is some simple code that does it on its own (I mean, outside of a FDW):
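This is a hedged sketch of what that standalone program can look like; the database file and table name (/tmp/test.db and mytable) are placeholders.

```c
#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3      *db;
    sqlite3_stmt *stmt;
    int           i;

    /* open the database connection */
    if (sqlite3_open("/tmp/test.db", &db) != SQLITE_OK)
    {
        fprintf(stderr, "cannot open database: %s\n", sqlite3_errmsg(db));
        return 1;
    }

    /* prepare the query */
    sqlite3_prepare_v2(db, "SELECT * FROM mytable", -1, &stmt, NULL);

    /* loop on all the tuples returned by the query */
    while (sqlite3_step(stmt) == SQLITE_ROW)
    {
        for (i = 0; i < sqlite3_column_count(stmt); i++)
        {
            const unsigned char *value = sqlite3_column_text(stmt, i);

            printf("%s\t", value ? (const char *) value : "NULL");
        }
        printf("\n");
    }

    /* clean up and close the database connection */
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}
```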

Of course, this is really simple code, without error checking and the like. Anyway, that's enough to work on our simple FDW. Remember that we have at least three functions to code:

simpleBeginForeignScan will open the database connection (sqlite3_open)

simpleIterateForeignScan will prepare the query (sqlite3_prepare) and fetch the first tuple (sqlite3_step) if it hasn't done that already; otherwise, it will only fetch the next tuple (sqlite3_step)

and simpleEndForeignScan will close the database connection (sqlite3_close)

The question is: how can we give the connection handle and the statement handle, created in the first and second functions, to the second and third functions? And the answer is: with a private Foreign Data Wrapper structure stored in the ForeignScanState structure.

So, onto the code. The first thing we need to do is add the sqlite header file:
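That one is simply the sqlite3 header:

```c
#include <sqlite3.h>
```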

This structure will contain the connection handle, the statement handle, and the query string.
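A possible definition of that structure looks like this; the field names are my guess, only the structure name (SimpleFdwExecutionState) appears later in the post.

```c
typedef struct SimpleFdwExecutionState
{
    sqlite3      *db;       /* sqlite connection handle */
    sqlite3_stmt *stmt;     /* prepared statement handle */
    char         *query;    /* query string sent to sqlite */
} SimpleFdwExecutionState;
```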

Now, we will write simpleBeginForeignScan. It will open the connection and build the query. As our foreign table accepts one option, the table name, we'll take advantage of it to name the table we want to read. We don't yet have the code to get the options back, so we'll write a function to get them. Here is its declaration:
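Something like this; the exact name and signature are assumptions, the post only says it hands back the database and table options:

```c
static void simpleGetOptions(Oid foreigntableid, char **database, char **table);
```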

The code is easy to read. We first get a handle on the foreign table and on the foreign server. We put all their options into the options variable and loop through it. We check that we have the two options we need and set the corresponding variables. It errors out if we don't have both of them (i.e. the database name and the table name).
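Based on that description, a hedged sketch of the function body could look like this (the option names, database and table, match the sqlite FDW described at the end of this page; the error code is my choice):

```c
static void
simpleGetOptions(Oid foreigntableid, char **database, char **table)
{
    ForeignTable  *f_table;
    ForeignServer *f_server;
    List          *options;
    ListCell      *lc;

    *database = NULL;
    *table = NULL;

    /* collect the options of both the foreign table and its foreign server */
    f_table = GetForeignTable(foreigntableid);
    f_server = GetForeignServer(f_table->serverid);

    options = NIL;
    options = list_concat(options, f_table->options);
    options = list_concat(options, f_server->options);

    foreach(lc, options)
    {
        DefElem *def = (DefElem *) lfirst(lc);

        if (strcmp(def->defname, "database") == 0)
            *database = defGetString(def);
        else if (strcmp(def->defname, "table") == 0)
            *table = defGetString(def);
    }

    /* both options are mandatory for this simple FDW */
    if (*database == NULL || *table == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_FDW_OPTION_NAME_NOT_FOUND),
                 errmsg("database and table options are required")));
}
```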

Now that we can get the two options through this function, we can write the code of the simpleBeginForeignScan hook function:
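Here is a hedged sketch of what it can look like, following the description just below (memory handling and error messages are simplified):

```c
static void
simpleBeginForeignScan(ForeignScanState *node, int eflags)
{
    char       *database;
    char       *table;
    sqlite3    *db;
    SimpleFdwExecutionState *festate;

    /* grab the database and table names from the foreign object options */
    simpleGetOptions(RelationGetRelid(node->ss.ss_currentRelation),
                     &database, &table);

    /* open the sqlite database file */
    if (sqlite3_open(database, &db) != SQLITE_OK)
        ereport(ERROR,
                (errcode(ERRCODE_FDW_UNABLE_TO_ESTABLISH_CONNECTION),
                 errmsg("cannot open sqlite database %s: %s",
                        database, sqlite3_errmsg(db))));

    /* build the query string: SELECT * FROM <table> */
    festate = (SimpleFdwExecutionState *) palloc(sizeof(SimpleFdwExecutionState));
    festate->db = db;
    festate->stmt = NULL;
    festate->query = palloc(strlen("SELECT * FROM ") + strlen(table) + 1);
    sprintf(festate->query, "SELECT * FROM %s", table);

    /* keep our private state where the other hooks can find it */
    node->fdw_state = (void *) festate;
}
```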

Once again, it's quite easy to read. We first grab the database and table names. Then we open the database with the sqlite3_open function. If that doesn't work, we use ereport to report the error. If everything is fine, we build the query string, which is simply "SELECT * FROM" followed by our table name. Finally, we allocate some memory for our SimpleFdwExecutionState structure, and we push the connection handle and the query string into it.

Now that the FDW can open a connection, we need to be able to execute a query, and fetch each resulting row. This is the aim of the simpleIterateForeignScan function. It will prepare the query if it hasn't done so, and then it will fetch the next tuple. Here is its code:
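A hedged sketch matching that description (the actual code may differ, but the sqlite and PostgreSQL calls are the ones discussed just below):

```c
static TupleTableSlot *
simpleIterateForeignScan(ForeignScanState *node)
{
    SimpleFdwExecutionState *festate = (SimpleFdwExecutionState *) node->fdw_state;
    TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;

    /* prepare the query on the first call only */
    if (festate->stmt == NULL)
    {
        if (sqlite3_prepare_v2(festate->db, festate->query, -1,
                               &festate->stmt, NULL) != SQLITE_OK)
            ereport(ERROR,
                    (errmsg("cannot prepare sqlite query: %s",
                            sqlite3_errmsg(festate->db))));
    }

    ExecClearTuple(slot);

    /* fetch the next row, if any */
    if (sqlite3_step(festate->stmt) == SQLITE_ROW)
    {
        int            natts = sqlite3_column_count(festate->stmt);
        char         **values = (char **) palloc(natts * sizeof(char *));
        AttInMetadata *attinmeta;
        HeapTuple      tuple;
        int            i;

        /* grab every column of the current row as a C string (NULL means SQL NULL) */
        for (i = 0; i < natts; i++)
            values[i] = (char *) sqlite3_column_text(festate->stmt, i);

        attinmeta = TupleDescGetAttInMetadata(slot->tts_tupleDescriptor);
        tuple = BuildTupleFromCStrings(attinmeta, values);
        ExecStoreTuple(tuple, slot, InvalidBuffer, false);
    }

    /* an empty slot tells the executor we are done */
    return slot;
}
```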

sqlite3_column_count gives us the number of columns in the resulting tuple, and we use that in a for loop, to get each of the values. BuildTupleFromCStrings builds a HeapTuple from all gathered values, and ExecStoreTuple stores the HeapTuple in the slot.

Finally, we need the ending function. It cleans up every piece of memory we allocated to scan the remote table. So it's pretty simple: we have to free the query string, the statement handle, and the database handle:
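A sketch of it can be as short as this (field names as in the execution state sketched earlier):

```c
static void
simpleEndForeignScan(ForeignScanState *node)
{
    SimpleFdwExecutionState *festate = (SimpleFdwExecutionState *) node->fdw_state;

    if (festate == NULL)
        return;

    /* release everything we allocated in simpleBeginForeignScan */
    if (festate->stmt)
        sqlite3_finalize(festate->stmt);
    if (festate->db)
        sqlite3_close(festate->db);
    if (festate->query)
        pfree(festate->query);
}
```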

So, we had to create quite a lot of functions to be able to read some remote data: some planner hook functions, some executor reading hook functions, and some helper functions (actually just one, to get the list of options). That wasn't hard to do, but there are plenty of things to know.

To be honest, we didn't put much thought into the planner functions. Even if we get 25 rows, the planner still thinks we'll get 0 rows, and that reading the remote data will cost nothing. And we didn't use the simpleReScanForeignScan function.

But this is enough for today. We'll go deeper into the planner hook functions in another blog post.

By the way, the simple FDW is available on this github repository: Writing A FDW. This repository will contain all the code written for this series of blog posts.

I'm still working on my foreign data wrapper. Actually, I haven't written a lot of code. Instead, I've been reading the code of the other foreign data wrappers, which helped me a lot to understand many things and to see how far each of them went.

I'll try to start a few blog posts about coding a foreign data wrapper.

This first post will be about the two functions needed to declare a foreign data wrapper. Let's take a look at the CREATE FOREIGN DATA WRAPPER DDL statement:

AnalyzeForeignTable, collect statistics for this foreign table when ANALYZE is executed on it

None of them are really required. Of course, if you implement none of them, the foreign data wrapper won't be really useful.

All these hooks are to be set inside a structure of type FdwRoutine. It's best to use the makeNode() function, which sets all fields to NULL. Then you'll be able to set each of the hooks you really need and not care about the rest.

At the end, this function hands back the FdwRoutine structure to the caller of the function.
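As an illustration, a handler wiring up the scan-related hooks could look like this; the my* functions are placeholders for your own implementations.

```c
PG_FUNCTION_INFO_V1(my_fdw_handler);

Datum
my_fdw_handler(PG_FUNCTION_ARGS)
{
    /* makeNode() zeroes the structure, so unused hooks stay NULL */
    FdwRoutine *fdwroutine = makeNode(FdwRoutine);

    /* planner hooks */
    fdwroutine->GetForeignRelSize = myGetForeignRelSize;
    fdwroutine->GetForeignPaths = myGetForeignPaths;
    fdwroutine->GetForeignPlan = myGetForeignPlan;

    /* executor hooks */
    fdwroutine->BeginForeignScan = myBeginForeignScan;
    fdwroutine->IterateForeignScan = myIterateForeignScan;
    fdwroutine->ReScanForeignScan = myReScanForeignScan;
    fdwroutine->EndForeignScan = myEndForeignScan;

    PG_RETURN_POINTER(fdwroutine);
}
```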

Of course, all the my* functions must be declared beforehand. Rather than the actual code, the more interesting thing to know about a foreign data wrapper is which hooks it supports. It really depends on the FDW. For example, the Oracle FDW doesn't yet support the write functions. The MySQL FDW doesn't support the write, explain, and analyze functions. And so on.

The validator function checks the options given to a specific foreign object (foreign data wrapper, foreign server, user mapping, or foreign table). It gets the list of all the options, and the context of the options (aka the foreign object type). The function has to go through each of them, check that they are valid, and error out if anything is wrong (send back the ERRCODE_FDW_INVALID_OPTION_NAME error if an option isn't valid, please).

A good way of doing it is to first add a FDW option structure, this way:

struct myFdwOption
{
    const char *optname;
    Oid         optcontext;
};

The optname will obviously be the option name. The optcontext will contain the kind of object the option is for. There are four possibilities: ForeignDataWrapperRelationId, ForeignServerRelationId, UserMappingRelationId, and ForeignTableRelationId.
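Putting it together, a hedged sketch of a validator using such a structure could look like this; the two options (database for the foreign server, table for the foreign table) are only examples.

```c
static struct myFdwOption valid_options[] =
{
    {"database", ForeignServerRelationId},
    {"table", ForeignTableRelationId},
    {NULL, InvalidOid}
};

PG_FUNCTION_INFO_V1(my_fdw_validator);

Datum
my_fdw_validator(PG_FUNCTION_ARGS)
{
    List       *options_list = untransformRelOptions(PG_GETARG_DATUM(0));
    Oid         catalog = PG_GETARG_OID(1);
    ListCell   *cell;

    foreach(cell, options_list)
    {
        DefElem    *def = (DefElem *) lfirst(cell);
        struct myFdwOption *opt;
        bool        found = false;

        /* is this option valid for this kind of foreign object? */
        for (opt = valid_options; opt->optname != NULL; opt++)
        {
            if (catalog == opt->optcontext &&
                strcmp(def->defname, opt->optname) == 0)
            {
                found = true;
                break;
            }
        }

        if (!found)
            ereport(ERROR,
                    (errcode(ERRCODE_FDW_INVALID_OPTION_NAME),
                     errmsg("invalid option \"%s\"", def->defname)));
    }

    PG_RETURN_VOID();
}
```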

Last week, one of my customers asked me during a training course why there is no foreign data wrapper for sqlite (actually, you can have one with multicorn, but there's no native one). I have no idea why, but I was happy to learn that no sqlite FDW existed yet. I had wanted to write one for quite some time, and it appeared to be the perfect opportunity.

So, in the evening, I started working on one. I took the blackhole foreign data wrapper, written by Andrew Dunstan, as a foundation. It helped a lot to get started quickly. I found it a bit surprising that it didn't include #ifdefs for some functions and hooks, so that it could be compatible with both 9.1 and 9.2. I added them in a patch that you can find here if you need them. Otherwise, you can simply delete some parts of the blackhole code.

I wanted something really simple to start with. Almost two years ago, at pgconf.eu, I went to see Dave Page's talk on FDWs (PostgreSQL at the center of your dataverse). So I already knew that you mainly need three functions to read a table: BeginForeignScan, IterateForeignScan, and EndForeignScan. The first one has to open the connection (if it's not opened yet). The second one is executed as many times as there are rows to grab: on its first execution, it must launch the query and get the first row; on every other iteration, it grabs a new row. And the third function helps clean up memory. So I started to include the sqlite tutorial code in the blackhole FDW. And it worked great. I had to write some other functions, specifically the handler and the validator, but, in an hour, I had something working. Of course, it wasn't pretty. The database filename was hard-coded, with no option to change it. The query executed remotely was also hard-coded, which means you couldn't change the table name without recompiling.

So I started to look at options for the foreign server and the foreign table. Most of the code was taken from Dave Page's mysql_fdw code. I quickly had a foreign data wrapper allowing two options: database (to change the database file), and table (to target a specific remote table).

The whole code is available on the sqlite_fdw github repository. I'll try to add the missing features as soon as possible. And when I have something working and tested, I'll post it on pgxn.