Pivotal Greenplum Review

We use Greenplum for production but not for development and testing because it consumes too much resources

We use the scrum methodology to manage our engineering work. And
while this does give us the flexibility to quickly respond to changing
business needs, it also means that our databases’ schemas are not set in
stone. To accomodate frequent changes to the way we structure our
data, we’ve adapted Ruby on Rails database migrations for use beyond our Rails apps.

Our first challenge was to find and extract the migration’s
functionality from Rails so that we could use it without needing to
bring in the other features of Rails we didn’t need. Thankfully, this
turned out to be relatively simple because almost all of the
functionality can already be accessed by Rake tasks, so it was just a
matter of building a Rake file that contains the tasks we wanted from
Rails. The only thing we had to add to get started was a Rake task for
creating new migrations, since Rails creates them either automatically
when creating new model objects or through the `rails generate
migration` command.

We quickly ran into problems, though, because we use Greenplum, a
DBMS built on Postgres that adds support for features that help
accomodate big data. Unlike standard Postgres, Greenplum adds features,
such as table partitions, not found in most DBMSes, but Rails is
designed to be DBMS agnostic so you can easily switch between, say,
Postgres and MySQL and SQLite without any more trouble than changing a
single configuration file. So we decided to cut around Rails’s database
abstractions and instead directly write SQL in our migrations and dump
SQL structure files rather than Rails schema files.

Only this didn’t work because, to complicate matters further,
although we use Greenplum in production and staging environments, for
local testing and development we use Postgres because Greenplum has
minimum requirements that ended up consuming too much of our
workstations’ resources. So we ultimately ended up developing a hybrid
solution that allows us to support Greenplum and Postgres in the same
migrations.

The two main aspects of the solution are selective application of
options when making changes to the database and keeping two separate
dumps of the database, one for Greenplum with Greemplum-only syntax and
one for Postgres with Greenplum-only syntax removed. For example, we
rewrote the Rails `create_table` migration function to add support for
Greenplum options like partitions, data distribution, and append-only
tables, but then have the function ignore those options when running the
migrations against our development Postgres databases. This allows us
to use a single set of migrations on all of our databases, drastically
simplifying what could have otherwise been a gnarly challenge.

By no means is our solution complete or perfect. It only supports
those features of Greenplum that we use, and new features only get added
in as we need them, plus there is some risk associated with keeping two
separate schema dumps and the potential for them to get out-of-sync.
Still, compared to maintaining a database with frequently changing
schemas without the aid migrations, it’s lightyears better.

Disclosure: I am a real user, and this review is based on my own experience and opinions.