It seems the buildfarm has been failing not only on Win32, but also on other machines! The thing is, on Win32 the failure is just much more probable. So probable, in fact, that it seems to happen every time.

The failing test is the stats test, which tries to verify that the stats system is actually counting operations on the tables. Initially autovacuum was thought to be the cause, but some investigation suggests that is not the case.

Magnus has been helping me pinpoint the problem. The first thing we tried was to have autovacuum use a "reasonable" setting for vacuum_cost_delay. "Give it 10 milliseconds," I told him, confident that such a low setting was enough to cause the scheduler to let the stats collector run and thus increment the counters. The theory was that, with autovacuum enabled, the stats collector didn't get a chance to run within the 2 seconds that the test sleeps.

It didn't work, though, so he raised it to 100 ms and then 1000 ms, to no avail. The test still failed.

Next, he raised the 2-second sleep to 10 seconds. That didn't work either. So he turned autovacuum off and reran the test. Guess what? The test still failed!

Then he checked manually, and it turned out that the pgstat views always show the table's counters at 0!
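For the record, the manual check is roughly something like this (a sketch with a made-up table name; the actual regression test uses its own tables and exact expected counts):

```sql
-- Hypothetical reproduction of what the stats test verifies.
CREATE TABLE stats_check (a int);
INSERT INTO stats_check VALUES (1);
SELECT * FROM stats_check;

-- After giving the stats collector a moment to catch up,
-- these counters should be nonzero:
SELECT seq_scan, n_tup_ins
  FROM pg_stat_user_tables
 WHERE relname = 'stats_check';
```

On the failing machines, both columns stubbornly stay at 0.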

We haven't been able to pinpoint the exact cause, but it's now looking like the autovacuum change alone wasn't the culprit; maybe it was the autovacuum change plus something else. We're not sure.

In other news, I managed to turn the whole buildfarm red for several hours with a pg_regress patch. The fix was quite simple, but since I'm spending some time hunting for a new apartment to move to, it took longer than expected. Sadly, all the Windows machines are still failing a test with amazing reproducibility. We need a Windows hacker involved in order to fix it, though ... the failure is pretty bizarre.

Got a new book collection today. Yay! Huh, and what does that have to do
with a PostgreSQL-related blog, you ask?

The story goes like this. Back in 2004, I went to Caracas, Venezuela, for a
couple of weeks. During my stay, I was asked to do some consulting for a
courier company that was using PostgreSQL for its mission-critical database.
It was getting very slow, they whined, and they had been approached by DB2
salespeople to replace it. The company that did the Linux support for them
was not very fond of ditching their free database for some proprietary stuff,
so they asked me if I could have a look and try to "make it faster".

Turns out they were running PostgreSQL 7.1, which was already quite old, and
had some horrible queries and very big tables. Most of the queries did not
properly use indexes, so their I/O channels were saturated by the constant
seqscans. They needed a lot of tuning, care, and query refinement; but what
they needed the most was an upgrade. So that's what I focused on at that time.

Of course, I was paid for this work.

When I got home, however, they were still having some issues, and on IM I
kept giving suggestions on how to improve queries, pointing out the appropriate
indexes to create, suggesting FSM configuration improvements and proper vacuum
scheduling, etc. This went on for almost two years. In those months I never
saw a dime from them and I didn't really care, because it was entertaining most
of the time, and a good experience; and I already had Command Prompt's wage, or
EnterpriseDB's before that, so it felt a bit unethical to behave like we had a
support contract.

A couple of weeks ago, however, they were having problems again: queries
were taking too long, customers were getting annoyed, and a pretty girl was
starting to pull her hair out. So one sunny Saturday she and I sat down,
studied some of the problems, and finally figured them out. Just before I
left, I tossed her my (shameless plug) Amazon wishlist URL to give her an
idea.

Today I got this:

For those of you not curious enough to click, that's "The Complete Calvin and
Hobbes". That's right: ten kilograms and ten years of Calvin and Hobbes in the
most beautiful hardcover edition I can imagine. Who needs two years' worth of
support money? I'll take a nice gift like this any day. This one will keep
me busy for quite a while. Thank you, Jacqueline!

Don't get me wrong — I, like most of you, need the dollars anyway for paying
the rent and other mundane stuff. I also enjoy being able to eat :-) and will
happily continue working on stuff I don't enjoy quite as much as doing
PostgreSQL support over Jabber, just to have that privilege. I still have fun
hacking on the challenge that is Mammoth Replicator. But I'm a Calvin and
Hobbes lover, and things like this can't be beaten by mere checks.

Yesterday I had a meeting with the NIC.cl guys -- they run the
Chilean ccTLD. They've been running their stuff using a different open source
database and they are not happy with it for two reasons: 1) they discovered
that their replication system chooses to skip some updates from time to time,
which made them a bit nervous :-) and 2) a competing closed-source database
company recently bought one of the first company's infrastructure providers,
which makes the business future of that other open source company unclear.
(I don't think it's very difficult to guess which database they are using.)

So they contacted me so that I could update them on the "state of the art" in
PostgreSQL replication and stuff. I mentioned Afilias and they were very
interested; people from Afilias are likely to be contacted soon to find out
how their experience with Slony-I has been. I can't help being excited; I
expect this to become a public success story, and a notable one, here in
Chile. It makes me really happy; soon we _will_ see more of these here, I
hope.

Ok, so I think I've postponed writing here for too long. I guess the most
important thing that I've omitted is the switch from EnterpriseDB to Command
Prompt.

EnterpriseDB was not a bad place for me to work, I think. There was
nothing wrong with them in principle; I just felt somehow awkward. One fine
day Josh Drake emailed me and offered me a job with him instead; it didn't
take me much to say aye. For me there wasn't really much difference either
way; one could say that the actual difference is more in how I relate to the
"big bosses" in each company than in any other matter. But that relationship
really matters to me, which is why I switched.

(A couple of people have approached me asking what EDB was like as an employer. If you are a PostgreSQL hacker looking for a job, I think EnterpriseDB is a good place to be. If you are a long-term contributor, make sure you'll have time to work on the free stuff!)

Command Prompt is great in one important regard: they've been around for a
long time and have great customers. They are not going away anytime soon.
They have well-defined products which they have been selling for some time
already. This doesn't mean that their products (particularly the Replicator
stuff) don't need further work :-) particularly since no one from PGDG has
been overseeing their internal development. So I have plenty of things to do
in their proprietary offerings.

But I didn't join Command Prompt just to work on proprietary stuff. I really
need to take a break from this pressure and do something "in the open." I
have talked to Joshua about this and he knows it, so I hope to get back on
track sometime soon.

Another development cycle has come and gone. Looking back, I see that I did manage to get some new things done, but certainly not as much as I'd like. If I weren't so easily distracted, this release would have been better ;-) Looking back a little, here is what I achieved for PostgreSQL 8.1:

Shared row locking. I wrote about this one a couple of times before.

Shared dependency tracking. This patch was committed at the last minute, because it conflicted hard with the roles patch. I managed to get it in, however. For readers not in the know, this new functionality ensures that the system refuses to drop any user that owns objects (tables, functions, etc.) in some database, or that has permissions (grants) on some object. This is not a big feature, but it means the database is always in a consistent state.
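In practice it looks something like this (a hypothetical session; the exact error wording may differ):

```sql
CREATE ROLE alice;
CREATE TABLE stuff (a int);
ALTER TABLE stuff OWNER TO alice;

DROP ROLE alice;
-- ERROR: role "alice" cannot be dropped because some objects depend on it
```

Before this patch, the drop would have silently succeeded, leaving the table owned by a nonexistent role.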

Integrated autovacuum. Yes, we now have an integrated autovacuum daemon in the backend. It's far more powerful than the contrib daemon: the various parameters can be set per table, and it can be disabled per table too if needed.
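For example, to turn autovacuum off for a single table, you insert a row into the pg_autovacuum catalog (a sketch with a made-up table name; the columns are the table's OID, an enabled flag, the vacuum/analyze thresholds and scale factors, and the per-table cost delay and limit, where -1 means "use the global default"):

```sql
INSERT INTO pg_autovacuum
  VALUES ('busy_table'::regclass, false, -1, -1, -1, -1, -1, -1);
```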

Present

The EnterpriseDB guys published a press release stating that they have hired me and a couple of other guys. Truth is, I've been working on their account for a couple of months already. It's cool because they let me contribute my work back to the community; in fact, the idea is that my work would be 50% closed and 50% community, though I haven't seen the closed part yet. I'm OK with the split, because I was already doing closed software before, so it's in no way worse than before, but I'd like the closed part to be as small as possible. We'll see how it all turns out.

Future

So what now? I have some plans, and EDB has plans for me:

Shared dependencies still have some way to go. I need to implement a way to facilitate dropping users, because currently you have to drop or give away the objects manually. My proposal is to implement a couple of commands, DROP OWNED BY and REASSIGN OWNED BY.
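The idea is that getting rid of a user would become a short, mechanical affair, something like this (proposed syntax, so subject to change; "bob" is made up):

```sql
-- Hand everything bob owns over to postgres, then revoke his grants:
REASSIGN OWNED BY bob TO postgres;
DROP OWNED BY bob;

DROP ROLE bob;  -- now there's nothing left to stop this
```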

Autovacuum needs further improvements, such as working without row-level stats. Also, I'd like the ability to track Xid wraparound per table, so we don't force a database-wide vacuum on anybody.

I want to start working on the optimizer and executor code, just to learn how it works and maybe do something interesting in the long run. My plan for the first step is to add some more specific optimization for BETWEEN. I had an idea for a second step, but I don't remember right now what it was and I don't have my notes handy. The final grand plan was to implement recursive SQL queries, but I'm a long way from that, so maybe someone will beat me to it.

Slony-II. EDB wants me to lay my hands on Spread, or whatever group communication suite Slony-II ends up using. My first step is to develop some sort of test harness for Spread. I'm still not sure how to do it; I still have some research to do on the subject.

It's been a month since my last entry, so clearly I'm not a good blogger. Anyway:

I submitted the MultiXactId patch some time ago. Tom reviewed and committed it. We don't have the foreign key problem anymore! I was kind of ashamed of that problem, so I'm happy it's now solved. Fortunately, we figured out a way to make the mechanism be used only when it's really needed, that is, only when a row is locked by two or more transactions at the same time. This way we don't even incur a performance hit — the only point where the mechanism enters the picture is where previous Postgres versions would block, so it's a net win.

Right now I'm looking at Heikki Linnakangas' two-phase commit patch. With some luck I can help clean it up some before it gets to Tom, so that when he gets it, we can quickly have a fully functional 2PC implementation in time for 8.1.

Additionally, I'm looking at completing my shared dependencies patch before 8.1 freezes. I've let it rot some; I partly blame Stephen Frost's SQL roles patch. Of course that's a lame excuse — it'd be cool to have both things in. But they conflict anyway, so I'm letting this one lag a little.

Ok, I am amazed: the phantom Xid idea worked! Only I didn't name it phantom Xid, because that name has a bad history; I named it MultiXactId instead. The implementation is finished; it only needs some more testing, to make sure it deals correctly with wraparound and correctly truncates SLRU segments.

To recap: the idea is to allow a row to be share-locked, which means somehow storing multiple TransactionIds in the tuple header. This is done using a shared counter similar to TransactionId, which I have dubbed MultiXactId. Each MultiXactId is associated with variable-length storage in SLRU, in which the set of TransactionIds resides. The MultiXactId is stored in the tuple's Xmax, just like a TransactionId is stored in the case of a SELECT FOR UPDATE.

I just posted the patch; Tom will probably have some disagreements with the way some things are done, but he has already expressed approval of the general idea, so I expect it will eventually be committed.

I feel somewhat ashamed that it took me two weeks (counting from the last blog entry) to complete the implementation. I hope not everything takes me this long, or I won't get a lot done.

After this I'll return to the shared dependencies patch, and then I don't know what my next project will be ... maybe I'll continue trying to figure out a way to do the translation of the docs, which is badly needed.

After wasting several days trying to come up with a way to make Postgres' lock manager spill to disk, I gave up and ICQed Bruce. This was mainly motivated by the fact that several people complained that the idea of having any process sleep on I/O while waiting for the lock manager was a loser. After thinking about it for a while, I can't but agree. I even considered writing some sort of lightweight B-tree handler on top of SLRU, so we could quickly find lock entries on disk by their LOCKTAG; this is better than using a plain sequential structure, but it's difficult to get right, it's lots of additional code, and it would still need I/O to get locks :-(

So when I talked to Bruce, he simply said it was a bad idea to use the lock manager to lock tuples; we can't afford to have that many lock entries. In a way it was a disheartening answer. He wondered whether maybe we should use some sort of phantom TransactionIds, an idea that, on hearing it, almost made me freeze in terror – I had already heard of phantom Xids from him, during the nested transactions project, and they were very painful (to me) to code, and in the end it was an utter failure. (Although coding that was a very interesting exercise from which I learned a thing or three – the patches and the disappointing comments from Tom are here.)

So I spent the next day thinking about phantom Xids and feeling miserable ;-) because I didn't really see how they would apply to this case.

Turns out it's surprisingly easy to have it all fit together. Phantom Xids in this case are just a means of denoting a set of TransactionIds, and we store the set in SLRU. So when somebody wants to share-lock a row, he creates a set; and when somebody needs an exclusive lock, he can sleep on every TransactionId in the set. The only tricky part is how to store variable-length content in SLRU, but that is easily solved by using two SLRU areas.
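A toy model of the mechanism, with plain Python lists standing in for the two SLRU areas (all the names here are mine, not the actual implementation's):

```python
class MultiXactManager:
    """Toy model: MultiXactIds name immutable sets of TransactionIds.

    The real thing keeps two SLRU areas: an "offsets" area saying where
    each MultiXactId's members start, and a "members" area holding the
    TransactionIds themselves.  Here two Python containers stand in.
    """

    def __init__(self):
        self.next_mxid = 1
        self.offsets = {}   # mxid -> (start, count) into self.members
        self.members = []   # flat, append-only member storage

    def create(self, xids):
        """Allocate a new MultiXactId covering the given TransactionIds."""
        mxid = self.next_mxid
        self.next_mxid += 1
        self.offsets[mxid] = (len(self.members), len(xids))
        self.members.extend(xids)
        return mxid

    def members_of(self, mxid):
        """Look up the set of TransactionIds behind a MultiXactId."""
        start, count = self.offsets[mxid]
        return self.members[start:start + count]

    def expand(self, mxid, xid):
        """Another transaction share-locks the same row: member sets are
        never modified in place, so allocate a fresh MultiXactId holding
        the old members plus the newcomer."""
        return self.create(self.members_of(mxid) + [xid])
```

An exclusive locker then just walks members_of() of the tuple's Xmax and sleeps on each TransactionId in turn.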

So far I have coded a good deal of the whole thing; though it doesn't do everything yet, I have confirmed that the idea is perfectly workable, and I'm sure performance will be fine.

This means we will have shared row locks for 8.1! I can't help but feel good about that. No more deadlocks in foreign keys. Cool!