Tag Archives: pyyaml

In the previous post, I covered installation of Git, PostgreSQL and Python under Windows in order to set up a Pyrseas testing and development environment. Today, we’ll explore installation of the Python dependencies.

However, if you try pip install psycopg2 (or even easy_install psycopg2), it’s very likely you’ll see the error:

error: Unable to find vcvarsall.bat

As best as I’ve been able to determine the only way to get around this is by installing Microsoft Visual Express Studio. According to this email and this post, for Python 2.7, it must be the 2008 Express Studio which, to make things interesting, is no longer available from the download links given. If you search enough you may find it here (download vcsetup.exe) (Update below). After installing VC++ 2008 Express (and if you haven’t installed Strawberry Perl—a later installment in our saga), the pip install psycopg2 command should succeed.

However, if you try to import psycopg2 at the Python 2.7 prompt you may be surprised with a traceback ending in:

Ahh … the mysteries of Windows DLLs. Don’t despair: this probably means you don’t have the PostgreSQL DLLs (libpq.dll in particular) in your PATH. Add one of the postgres\x.x\bin directories to your PATH and (hopefully) you should then be able to connect from Python 2.7 to your PostgreSQL installations.

OK, let’s turn our attention to Python 3.2. If you followed the Hitchhiker’s Guide instructions previously and added C:\Python27 to your PATH, you’ll now have to change that to C:\Python32. Suggestion: create a couple of batch scripts, e.g., env27.bat and env32.bat, so you can easily switch between the two Python installations. And don’t forget to add the postgres\x.x\bin directory as well.

For 3.2, once again run the distribute_setup.py script, easy_install pip, and pip install pyyaml, as for 2.7 above. Then you can run pip install psycopg2, and if you installed VC++ previously, the gods may smile upon you and you may see the following message:

Successfully installed psycopg2
Cleaning up...

At this point, if you followed along, you’ll have four versions of PostgreSQL (8.4 through 9.2), two versions of Python (2.7 and 3.2), each with PyYAML and psycopg2, ready for testing. If you’re anxious to check things out, invoke one of the PATH setup scripts and try the following, from the Pyrseas source directory:

This has to be created and maintained manually which may be OK if you’re starting off on a new database project, but not so cool if you have an existing database with a hundred tables (or hundreds of thousands of tables–see page 6).

Thus for Pyrseas I decided to create first a tool (dbtoyaml) to output an existing database schema in YAML format. Structurally, a single hierarchy, e.g., database → schemas → tables (somewhat like pgAdmin’s left panel), appeared preferable to Andromeda’s design. After a couple of iterations, the Pyrseas YAML specification ended as follows:

PyYAML’s load function can read the YAML file and turn it into the Python dict, so the second tool (yamltodb) can deal with the structure above directly.

The most important implementation decision for yamltodb was how to compare the input map to the map created from the current database’s catalogs. A few options were possible:

SQL-based comparisons

Map comparisons

Internal structure comparisons

SQL Comparisons

This is the approach taken by Andromeda. Its essence is described in step 4 of Dictionary Based Database Upgrades. It consists of storing the input specification into a set of tables and the existing catalogs into a parallel set (potentially one could use the information_schema views), and then issuing SQL queries to compare the two sets.

The presumed advantage of this method is that some of the operations can be done –as SQL set operations– in the DBMS. However, the input spec first has to be inserted into the tables, a row-at-a-time, thus negating the former benefit. In addition, the affected database has to carry at least one set of extraneous catalog-like tables.

Map Comparisons

This is the naive approach consisting of “walking the trees,” i.e., comparing the input map to an internal map created from the current database. This works, up to a point. For example, if a table in the input map is not present in the internal map, it’s easy enough to call a function to generate a CREATE TABLE statement.

However, as most readers know, a PostgreSQL database usually has many dependencies between the various system entities, e.g., foreign keys on primary keys, views on tables, etc. So comparing hierarchies isn’t ideal.

Internal Comparisons

In a forthcoming post, I’ll explore this method as well as other issues such as generating the SQL statements in the correct order.

Version 0.1

Carol starts by creating the film table in her database, caroldev, using the CREATE TABLE script, version 0.1, which we saw in the original article. She then uses the Pyrseas dbtoyaml tool to ouput the database specification:

This should be mostly self-explanatory, but let’s go over it. Indentation indicates structure (nesting) and hyphens are used for each element in a list. Thus, there is one schema (public) and one table (film) defined within it. The table has three columns (id, title and release_year) with the given data types and none of them are nullable. The PRIMARY KEY (film_pkey) is on the id column and there is a CHECK constraint on the release_year column. By the way, the alphabetical order within each level is simply the default output format of PyYAML.

Carol can redirect the output of dbtoyaml to a file, say, film.yaml and store it in a Git repository or some other VCS. Dave will then be able to pull the file from the repository and use it to generate the SQL statements needed to create the table in his database, or the development, test and production databases, using the yamltodb tool as follows:

If he is satisfied with the SQL, he can pipe the output to psql, e.g.,

$ yamltodb -1 davedev film.yaml | psql davedev

Version 0.2

For the second version, Dave issues an ALTER TABLE to add the length column, invokes dbtoyaml to create a new film.yaml and commits that to his repository. Carol uses another ALTER TABLE to add the rating column in her database, and stores a new film.yaml in her repository.

Manny, the integration manager, merges the new film.yaml versions into the project-wide repository. He uses yamltodb to output the statements needed to upgrade to version 0.2:

Version 0.3

For the next round, Carol and Dave issue the statements shown under Version 0.3 in the previous article and commit newer versions of film.yaml which Manny integrates into the project repository. The yamltodb output against the project database is as follows:

Summary

I cannot objectively review the tools I have created, but sincerely hope others will find them useful in managing changes to PostgreSQL databases in conjunction with a VCS.

Pyrseas dbtoyaml and yamltodb require Python, psycopg2 and the PyYAML library. They currently support the basic PostgreSQL features (schemas, tables, PRIMARY KEYs, FOREIGN KEYs, UNIQUE and CHECK constraints, indexes) but the intention is to expand them to cover all features.

The primary audience of the tools are DBAs and developers, but they can be used in conjunction with something like depesz Versioning to implement a stricter system for control over production databases.

I’d like to take this opportunity to thank Adam Cornett, an early adopter/tester, who found an issue with a FOREIGN KEY across schemas, which has been fixed in the master repository.