27 October 2010

Over at the Command Prompt blog, Joshua Drake makes a (probably deliberately) provocative point about “users” not wanting replication, as opposed to “customers” who do. I’ll confess I’m not 100% sure about his distinction between “users” and “customers,” so I’ll just make something up: Users are the people sitting in front of the application, entering data, buying shoes, or doing whatever it is that the database enables; customers are the CIOs, CTOs, Directors of Engineering, and the other people who make purchasing decisions.

He writes:

Yes, Command Prompt customers want replication. Yes, PostgreSQL Experts, EnterpriseDB and OmniTI customers want replication. However, customers are not users. At least not in the community sense and the users in the community, the far majority of them do not need or want replication. A daily backup is more than enough for them.

Well, yes, as far as it goes, he’s absolutely right. Users don’t need or want replication. They don’t need or want PostgreSQL, for that matter; VSAM, flat files, or a magic hamster would be fine with them, too, as long as the data that comes out is the data that goes in.

But for how many users, really, is “It’s OK if you lose today’s data, gone, irretrievably, pffft, yes?” really an acceptable answer? Very few. Very very few, and getting fewer all the time. One of the strongest pushes behind moving services into the “cloud” (i.e., external hosting providers of various kinds) is that they provide near-constant recovery and fault-tolerance. Users don’t care if their data is protected by hardware-level solutions like SANs, or software-level solutions like replication, as long as it is protected.

Users who profess not to care about this are either not putting authoritative data into a database, or just haven’t had the inevitable data disaster happen to them yet.

For me, the biggest feature of PostgreSQL’s 9.0 replication is that it is much, much easier to set up than any previous solution. Slony is a heroic project, and has lots of happy customers using it extensively, but it is notoriously fiddly and complex to set up.

Like a lot of technologies, replication hasn’t been in demand for a lot of PostgreSQL implementations because the cost didn’t seem worth the payoff. 9.0 brings the implementation cost way, way down, and thus we’ll start seeing a lot more interest in putting replication in.

Maybe I misread Joshua’s original post, but I think the distinction he was trying to make was that for every large-scale customer of OmniTI and the others that use PostgreSQL and want replication, there are many, many more small-scale users of PostgreSQL (independent web devs and the like) who do not. He was defining users as everyone who downloads and installs PostgreSQL, and customers as the small subset of those users who pay for support.

I should let Joshua clarify what he means, but I think you got the definition of “user vs customer” wrong. I’d say what was meant was that “customers” are professionals with big databases or a 100% uptime imperative and hardware resources to match, whereas “users” are a less demanding crowd, by any or all metrics.

I happen to sit on both sides of the fence, at work and at home, and while replication is a must-have for work, it is a pointless hassle for home (only 1 server available, and pg_dump is more than enough).

With that being said, let the debate rage on about where precisely the dividing line is, and how big the crowds are in each camp.

I love it. Great discussion all around. My premise is very simple: we cannot allow the few customers that drive companies like Command Prompt, PgExperts, etc. to define what the community (and thus users of the community) deem valuable. The users, whether hobbyists or independent consultants (web devs, etc.), by and large do not need replication. Heck, most of them have databases that are small enough that they can back up every hour without issue. Tom Lane said, “The majority of the community won’t use replication.” Tom Lane is right.

For me, replication is a solved problem with existing tech. I am much more interested in plpsm, sqlmed, real partitioning, and triggers on views.

I hope everyone is coming to PgWest. Maybe Berkus, Christophe and I will have it out in a round table :P

Tom Lane said, “The majority of the community won’t use replication.” Tom Lane is right.

I don’t doubt he’s right. I’d bet that the majority of PG installations don’t use WAL archiving, PITR, PL/Python, or any number of other advanced features of PG. It doesn’t mean that they’re without value, though.

And context is everything. Tom’s quote was in the context of, “What configuration options should we have, and what should their defaults be?” not “What features should be included in PG strategically?” It’s one thing to say that we shouldn’t set up the default GUCs assuming that the install will be using replication; it’s another thing to say that replication shouldn’t be in the core.