Sponsors

West 09 Talks

Talks

Video and slides are on their way! (You will need to be logged in to view slides)

Josh Berkus

You've heard that PostgreSQL is the highest-performance transactional open source database, but you're not seeing it on YOUR server. In fact, your PostgreSQL application is kind of poky. What should you do? While doing advanced performance engineering for really high-end systems takes years to learn, you can learn the basics to solve performance issues for 80% of PostgreSQL installations in less than an hour. In this session, you will learn: -- The parts of database application performance -- The performance setup procedure -- Basic troubleshooting tools -- The 13 postgresql.conf settings you need to know -- Where to look for more information.

Rod Anderson

For the small business support person being able to provide PostgreSQL hosting for a small set of specific applications without having to build and support several Pg installations is necessary. By building a multi-tenant Pg cluster with one tenant per database and each application in it's own schema maintenance and support is much simpler. The issues that present themselves are how to provide and control dba and user access to the database and get the applications into their own schema. With this comes need to make logging in to the database (pg_hba.conf) as non-complex as possible.

Matt Smiley

This is a basic primer aimed primarily at developers or DBAs new to Postgres. The format is a Q/A style tour with examples, based on common questions and pitfalls. Begin with a quick tour of relevant parts of the postgres catalog, with an aim to answer simple but important questions like:

How many rows does the optimizer think my table has?

When was it last analyzed?

Which other tables also have a column named "foo"?

How often is this index used?

What functions/aggregates are available?

How are they implemented?

What are the other concurrent sessions doing?

Is lock contention affecting my session?

Who is blocking me?

Who am I blocking?

Then explore slightly more involved questions like:

What is an execution plan?

What are access methods and join methods?

Why does join order matter?

How do I find which part of this bad plan went horribly wrong?

But won't adding yet another index fix the problem?

Why does my DBA keep telling me not to add gratuitous indexes?

Are the statistics "bad"?

How can I tell?

How are my columns' values distributed, and why should I care?

What are some common pitfalls, and how can I avoid them?

Selena Deckelmann

Bucardo is a mature replication system for PostgreSQL that supports asynchronous replication for both master-slave and multi-master systems. Originally designed for slow and unreliable networks, it has remarkable recovery ability, an easy to use command-line interface and development is active! Uses for Bucardo include: a slave read-only database, multi-master replication, data warehousing and just having fun moving your data around! Will include overview replication for PostgreSQL in general, a tour of features, a basic configuration walk through and the much feared live demo!

Joshua D. Drake

Adrian Klaver

Dabo is a Python based desktop application framework. The talk will demonstrate creating a simple application using the three tiers (database, business objects, ui) of the framework. I create my applications by hand coding. There is a GUI interface available and it will be introduced.

Josh Berkus

Most users know that PostgreSQL has a 23-year development history. But did you know that Postgres code is used for over a dozen other database systems? Thanks to our liberal licensing, many companies and open source projects over the years have taken the Postgres or PostgreSQL code, changed it, added things to it, and/or merged it into something else. Illustra, Truviso, Aster, Greenplum, and others have seen the value of Postgres not just as a database but as some darned good code they could use. We'll explore the lineage of these forks, and go into the details of some of the more interesting ones.

Aurynn Shaw

This talk discusses Exceptable, a sane exception propagation framework for PG 8.3 and above. We will be covering the need for coherent exception propagation in stored-procedure-based applications, how exceptions are handled in PG 8.3, the tools 8.4 brings regarding raising exceptions from PL/PGSQL, and how a consistent exception model benefits architectures heavily invested in stored procedures. Finally, we will discuss Exceptable, a PL/PGSQL and Python API for consistent handling of exceptions from the Database to the Application Layer.

Robert Hodges

Users have traditionally used database clusters to solve database availability and performance requirements. However, clustering requirements are changing as hardware improvements make performance concerns obsolete for many users. In this talk I will discuss how the Tungsten project uses master/slave replication, group communications, and rules processing to develop easy-to-manage database clusters that solve database availability, protect data, and address hardware utilization. Our implementation is based on existing PostgreSQL capabilities like Londiste and WAL shipping, which we eventually plan to replace with our own log-based replication. Come see the future of database clustering with Tungsten!

David Fetter

8.4 has brand new capabilities inside the database: Windowing functions and Common Table Expressions. You'll learn about each with practical examples to make your querying days more fun. Time permitting, we'll do some that are less practical.

Jeff Davis

UNIQUE indexes have long held a unique position among constraints: they are the only way to express a constraint that two tuples in a table conflict without resorting to triggers and locks (which severely impact performance). But what if you want to impose the constraint that one person can't be in two places at the same time? In other words, you have a schedule, and you want to be sure that two periods of time for the same person do not overlap. This is nearly impossible to do efficiently with the current version of PostgreSQL -- and most other database systems. I will be presenting Generalized Index Constraints, which is being submitted for inclusion in the next PostgreSQL release, along with the PERIOD data type (available now from PgFoundry). I will show how these can, together, offer a fast, scalable, and highly concurrent solution to a very common business requirement. A business requirement is still a requirement even if your current database system can't do it!

Mark Wong & Gabrielle Roth

pg_proctab is a collection of PostgreSQL stored functions that provide access to the operating system process table using SQL. We'll show you which functions are available and where they collect the data, and give examples of their use to collect processor and I/O statistics on SQL queries. These stored functions currently only work on Linux-based systems.

Jeff Davis

I'll be showing how the extensible pieces of PostgreSQL fit together to give you the full power of native functionality -- including performance. These pieces, when combined, make PostgreSQL able to do almost anything you can imagine. A variety add-ons have been very successful in PostgreSQL merely by using this extensibility. Examples in this talk will range from PostGIS (a GIS extension for PostgreSQL) to DBI-Link (manage any data source accessible via perl DBI).

Bill Karwin

Odds are if you develop database applications you have been asked to make a large table of textual data searchable. How can we do this most reliably and efficiently? Bill compares a range of techniques to search text data in PostgreSQL: - Ad hoc searching with the LIKE predicate. - PostgreSQL Text Search features. - Sphinx Search engine. - Google Custom Search. - Tagging architecture.

Bill Karwin

SQL is from Mars, Objects are from Venus. This talk is for software developers who know SQL but are stuck trying to implement common object-oriented structures in an SQL database. Polymorphism: Suppose your blog supports comments, but then your comments need to reference multiple types of content, for example news, blog articles, and videos. What then? Extensibility: We’ve all designed customizable software, allowing customers to extend a data model with new data attributes. See how to design flexible systems, while using efficient SQL queries. Hierarchies: Tree-structured data relationships are common, but working with trees in SQL usually implies recursive queries. There are a few solutions to solve this more cleanly, including Common Table Expressions in PostgreSQL 8.4. ActiveRecord Dos and Dont’s: Web development frameworks have popularized the use of design patterns, but when it comes to multi-table queries, complex views, and assignment of OO responsibilities, ActiveRecord falls short as a one-size-fits-all Domain Model.

Adrian Klaver

An exploration of various Python projects (PyRTF,ReportLab,xlwt) that help with presenting your data in formats (rtf,pdf,xls) that other people want. I will step through a simple data extraction and conversion process using the above software to create an RTF,PDF and XLS file respectively.

Joshua D. Drake

Magnus Hagander

PostgreSQL supports several options for securing communications when deployed outside the typical webserver/database combination. This talk will go into some details about the features that make this possible, with some extra focus on the changes in 8.4. The main areas discussed are: * Securing the channel between client and server using SSL, including an overview of the threats and how to secure against them * Securing the login process, using LDAP, Kerberos or SSL certificates, including the use of smartcards to log into the database The talk will not focus on security and access control inside the database once the user is connected and authenticated.

Scott Bailey

Few things we model in our databases are as complicated as time. The major database vendors have struggled for years with implementing the base data types to represent time. And the capabilities and functionality vary wildly among databases. Fortunately PostgreSQL has one of the best implementations out there. We will look at PostgreSQL's core functionality, discuss temporal extensions, modeling temporal data, time travel and bitemporal data.

David Wheeler

We're all used to unit testing our applications by now. The Extreme and Agile programming movements have done a great deal to promote unit testing, to the extent that many of us are now dependent on tests to assure that our applications work reliably. But how often do we test the database underlying our applications? Given that the database, as the repository for all of the knowledge and data for an application, just might be the single most important part of that application, the time for standardized database unit testing has come. This talk promotes the practice of writing and running unit tests that directly test the schema, storage, and functionality of application databases. Following a review of the available PostgreSQL unit testing frameworks, we'll review examples of testing tables, views, columns, constraints, indexes, triggers, and functions. The idea is to promote complete test coverage every aspect of a database, independent of application unit tests, to ensure reliably canonical data integrity.

Aurynn Shaw

This talk is a discussion on permissions systems, and how they relate to web applications. Generally, web applications tend to implement their permissions system asynchronously to the DB permissions system - if the DB's permissions system is used at all. We will cover why you should investigate a vertically integrated permissions stack, the benefits and drawbacks to doing so, and will cover a basic library implementation on Pylons, Simpycity and Repoze.who that utilizes a consistent permissions model.

Aaron Sheldon

Contemporary XML parsers store data transiently in memory, and require extensive database development for each XML schema. In the use of large XML based ontologies, and thesauruses the limitations of the DOM, and SAX parsers presents a significant bottleneck in development. But XML itself conforms to a specific syntax, and as such can be generally represented by a fixed table structure in a relational database. The author will demonstrate that through the isomorphism between directed acyclic graphs and the dictionary ordering on integer vectors, a table that records the structure of an XML document in a single column index can be found. Furthermore the single column index allows for queries of ancestry and descendency without recursion. Finally, as well as the database being developed in PostgreSQL, the XML parser was written completely plpgSQL.

Tutorials

Bruce Momjian

This talk is designed for PostgreSQL administrators. It covers all aspects of PostgreSQL administration, including installation, security, file structure, configuration, reporting, backup, daily maintenance, monitoring activity, disk space computations, and disaster recovery. It shows how to control host connectivity, configure the server, find the query being run by each session, and find the disk space used by each database.

Jim Mlodgenski

While there is no dominate monitoring solution endorsed by the PostgreSQL community, many of the largest users of PostgreSQL such as hi5 utilize Hyperic to monitor their data tier. Hyperic provides the framework for a robust monitoring and alerting solution that enterprises need. This tutorial will begin with a basic discussion on how to configure rudimentary alerts on PostgreSQL metrics and then dig deep into move advanced monitoring of additional PostgreSQL modules such as Slony and PGBouncer.

David Wheeler

So you've got testing religion, but the question now is, how do you test your database?

This tutorial introduces pgTAP, a comprehensive, easy-to-use test suite for PostgreSQL. We'll work from getting and installing pgTAP, to writing a simple test, to running the test and integrating it into your test environment. And then we really get into the fun stuff:

Kevin Kempter

PostgreSQL Backup and Recovery This talk will cover the basics of a backup / recovery strategy, plus modules on Point in Time Recovery (PITR) and how to setup and failover a warm standby server. Topics Covered/TOC:

Overview

Backup Options

pg_dump

pg_dumpall

Restore Options

psql

pg_restore

TOC files

PITR

PITR setup

PITR Recovery

Warm Standby

Setup

Failover options

Failover example

Kevin Kempter

This talk would present and end2end look at SLONY including an overview, details of a sample setup of a SLONY cluster, setup details for switchover and failover scripts plus a look at incorporating PG Pool II into the cluster. Topics Covered/TOC:

Overview

Installation & general info

Slonik

Setup of a sample SLONY cluster

initialization/setup of the cluster

preparing the slave(s)

the slon daemon

starting replication

Walk through of the initialization/setup for the sample cluster

Switchover

Failover

PG Pool II

install / setup

Configuration

Setup for automated failover

Walkthrough of end-2-end sample instance

Aurynn Shaw

This mini-tutorial covers building a small application on Howdah, an open source, Python based web development framework by Commandprompt, Inc. We will cover the full process of designing a vertically coherent application on Howdah, integrating DB-level stored procedures, DB exception propagation through Exceptable, DB access through Simpycity, authentication through repoze.who, permissions through VerticallyChallenged, and application views through Pylons. By the end of the talk, we will have covered a full application built on The Stack, and how to cover common pitfalls in using Howdah components.