Migrate your site from MySQL to PostgreSQL — Part 1

When my site, Design Harbor, started up (then as Abscissa Tech), there was one sensible choice for a PHP-driven, database-backed Website: MySQL. Then, in the summer of 2001, I ported the site to PostgreSQL (also known simply as Postgres) and haven’t looked back.

This first in a two-part series describes my motivation for switching to Postgres, and explains step-by-step how to convert existing MySQL data to Postgres. The second part will explain how to change the PHP backend to ensure that it works with the new database system.

The Motivation To Switch

I first read about Postgres in an article at PHPBuilder. It compared Postgres and MySQL, which I used exclusively at the time. But, after I read the article, Postgres became lost in the information repositories of my brain — it was an interesting fact, but one which was irrelevant to my Web design.

I continued to use MySQL and attributed the MySQL failures and crashes to the incompetence of my hosting service, which I couldn’t change at the time. As soon as I was able to, I changed Web hosts. This new host’s service and philosophy was very different from my previous one, as they had a higher commitment to security and stability than did my old host. The new company tried to convince me to use Postgres, suggesting that it was more stable, but I passed off the idea and pointed out that my entire site was already coded for MySQL. They relented and installed MySQL specifically for my site. That’s when the problems began.

My first job was to copy my old MySQL data from the old server to my new Web host’s MySQL server. First, I dumped the existing data to an SQL file, which I transferred to the new Web server. I then proceeded to import the SQL file. At the sight of the several thousand line file, MySQL promptly crashed. When the company restarted MySQL, about half of my data was there, but MySQL worked only intermittently. Finally, they had to delete the information that was imported so I could try again. MySQL crashed again. This cycle continued several more times, until I finally decided to split up my SQL file into several pieces. I had to try several times, but I was eventually able to import most of my data into the new MySQL server. Everything was fine, and I sighed with relief.

Over the next several moths, MySQL crashed almost bi-weekly, culminating in a really bad crash near the end of June 2001. This time, the data stored by MySQL was hopelessly corrupted and unrecoverable. I had an SQL backup file, but in no way did I relish the task of restoring the data from that backup, based upon my previous experiences with importing large amounts of data into MySQL. About that time, the hosting company encouraged me once again to port the site, which was written in PHP, to Postgres. Frustrated with MySQL, I finally decided to do it.

Transferring Data From MySQL to Postgres

Porting data from MySQL to Postgres is a little challenging, as Postgres supports a more standard version of SQL than MySQL, and it isn’t possible to make a straight SQL dump over to Postgres. However, the SQL syntaxes are quite similar, so in my case, it didn’t take too long.

Converting the MySQL Dump

First, ask your Web host to create a database for your account. Databases in Postgres, just like MySQL databases, consist of a collection of tables which contain the actual data. Then make a SQL file dump of your MySQL database using the mysqldump command.

mysqldump -u username -p databasename > sqldump.txt

Download the entire SQL dump file to your computer using FTP. Now that you have a SQL file on your computer, you can turn it into a file that Postgres will import.

First, cut all the MySQL CREATE TABLE queries out of the dump file and paste them into a separate text file. The next step is to redefine the tables using language Postgres can understand.

Postgres’s table-creating SQL is similar, but not identical, to MySQL’s. Here’s an example:

In a Postgres table definition, the name of the field should be followed by the field type. A few common field types are given in the previous example, but you can find an extensive list in the Postgres documentation on Data Types. Postgres has a huge selection of data types for different tasks, and can store all manner of data, from Internet addresses and monetary information to geometric object definitions. Here’s a quick overview of the most commonly used data types.

The SERIAL type field is the equivalent of a MySQL auto-incrementing unique ID. Postgres databases can contain SQL functions and reports as well as tables and records; the auto-incrementing SERIAL function is automatically added to the database when you define a field of type SERIAL in your table. As the auto-incrementing system resides outside of the actual table, we can custom-build logic for the unique id’s value and do other wizardry. When porting from MySQL to Postgres, the default action is sufficient.

The VARCHAR type is exactly what it purports to be, a variable-length text field. The length of the field is defined by the value in parentheses. For example, VARCHAR(5) defines a field that can contain up to 5 characters of text.

SMALLINT, INT, and BIGINT are used to define integer items. SMALLINT fields can store numbers ranging from -32768 to +32767 (this actual size may vary slightly depending on your computer type; the previous is the most common system integer size). INT fields can store larger numbers from -2147483648 to +2147483647. BIGINT field types are for anything bigger and have no size limit.

REAL field types are for real numbers that contain decimal fractions. They can store up to 6 decimal places. DOUBLE PRECISION fields are similar, but they can hold up to 15 decimal places.

A BOOLEAN field is either true or false, one or zero. It is identical to its MySQL counterpart.

TIMESTAMP fields are like their cousins in MySQL. A timestamp is updated to the current date and time each time the record is updated. Postgres time fields can also contain timezone information. For details on the more complex uses of Postgres time data, read the date time page of the PostgreSQL documentation.

Creating The Tables

Once you create a separate table definition SQL file and redefine the tables in Postgres-speak, check to make sure each CREATE TABLE query ends with a semicolon – a Postgres requirement. Then connect to your Web host using a tool like telnet, and create the tables using the following method.

First, open your table definition file with a text editor. Then log in to your host’s machine and run the Postgres Interactive Terminal, psql, by typing psql. The default authentication scheme uses your telnet/FTP username for the Postgres account. This allows Postgres to automatically authenticate your identity without forcing you to type in a username and password combination. Your Web host might authenticate differently, in which case you can give the psql program additional arguments like this: psql -d databasename -U username -W. The -d allows you to specify the database, -U specifies the username, and -W asks psql to prompt you for a password.

Once you have psql running, paste each CREATE TABLE query individually into psql and press the enter key. If you make a mistake in your SQL syntax, psql will tell you what went wrong. By entering each table individually, you receive debugging information on a per table basis, which simplifies things greatly.

If, after you enter the table definitions, you realize you missed a field or two, you have two options. You can use the ALTER TABLE command, or just drop the table using DROP TABLE, and then enter it again once you’ve fixed it. If you use the second method, you will run across a Postgres caveat that leaves certain artefacts from dropped tables that get in the way of creating new ones.

To use the DROP TABLE command, type DROP TABLE practicetable;. This will drop the table, but when you go to redefine the table, you will receive an error. Dropping a table doesn’t eliminate the sequence that goes along with fields of type SERIAL that are in the table. These leftover sequences get in the way when you try to recreate the table. To solve this, delete the sequence using DROP SEQUENCE sequencename; before you drop the table. Unfortunately, the sequence name isn’t the same as the SERIAL name. When you define a SERIAL type field, Postgres automatically generates a sequence named like this: tablename_colname_seq. In the case of practicetable, the DROP SEQUENCE statement would look like this: DROP SEQUENCE practicetable_someID_seq;. You’re then free to drop the table and start again.

After you finish inserting the tables, type z to double-check a list of the tables. Typing q will quit psql when you’re done. It’s finally time to prepare your data for import to Postgres.

Cleaning Up The Dump

Because MySQL keeps mostly to the SQL language standard, importing actual data from a SQL dump file isn’t too hard. However, there is a caveat that requires us to edit the SQL dump before we hand it to Postgres.

The main data-entry difference between MySQL and Postgres involves quotes. In Postgres, string values (values that contain text) must be surrounded by single quotes. MySQL allows both, but the mysqldump program fortunately uses single quotes, which are Postgres-friendly. However, MySQL and Postgres differ on what to do with quotes that appear within a string. Where MySQL represents quotes with a "", Postgres requires a ". Use your text editor’s search/replace features to replace all of the "" with". Interestingly, Postgres uses '' to represent single quotes; MySQL uses that method as well, so you won’t have to change the single quote escaping.

Importing To Postgres

After you fix the quotes in the SQL dump file, upload the file to your Web host’s machine log in as you did before to create the tables. Navigate to the directory where the SQL dump file resides. Start psql, but with a few different command-line options: psql -f sqldump.txt, and replace sqldump.txt with the new name you give the file. This command takes the entire SQL file and imports it right into the proper Postgres tables. As before, you may have to add extra options so psql can authenticate you properly. If you receive any errors, psql will tell you where they occurred. Search for that part of the file, try to figure out and fix the problem, and feed the problematic query manually into the psql command-line tool. In my case, everything imported without a hitch and I was ready to finish the job. It wasn’t until a little later that I noticed another problem.

After I started to use the new Postgres-driven site, I ran across another incompatibility issue between MySQL and Postgres. Postgres sequences, used by SERIAL type auto-incrementing fields, start with 1 and auto-increment by one each time a record with a SERIAL type field is inserted. However, when I imported the MySQL dump, the SQL in that dump defined the values of my main integer key. In my case, I had unique IDs counting the whole way to 60, but the sequence was still at one. Each INSERT statement I made failed because the sequence was giving a result that was not a unique ID. Frustrated, I ran 60 INSERT statements to bring the sequence up to the proper point, but I later learned a quicker fix from a friend who regularly uses Postgres. Here’s what he told me to do:

Connect to your host using a terminal program like telnet. Then start the psql program as you did when you defined the tables. First, identify the highest ID value that exists in the table. This can be done using SELECT fieldname FROM tablename WHERE fieldname=MAX(fieldname);. Then delete the sequence in question from the database using DROP SEQUENCE table_colname_seq;, where tableis the table name and colname is the name of the SERIAL field. Then recreate the sequence using CREATE SEQUENCE table_colname_seq START 61;, and replace 61 with the number of the highest ID value in the database plus one.

Installing a GUI Tool

Once I successfully put my data into Postgres, I needed a way for my non-Unix-savvy partner to successfully work with the data in the database. My choice for that with MySQL had been phpMyAdmin, a great tool for viewing and editing databases online. Fortunately for me, phpMyAdmin has been ported to Postgres in the form of phpPgAdmin.

The installation of phpPgAdmin is simple. First, download the latest stable distribution from the phpPgAdmin Website and put it somewhere on your Web host account where it can be viewed on the Web. Then log on to your host’s machine using a tool like telnet. Navigate to the directory where the phpPgAdmin.tar.gz file resides and type tar -xzvf phpPgAdmin.tar.gz to unpack the program (you may have to substitute a different filename; this is just used as an example). Next, move to the new subdirectory created by the unpacking process and read the README file.

Finally, open config.inc.php in a text editor on your Web host’s machine (vi, pico, and emacs are popular text editors; check with your host to see what they have available). Inside the file I found instructions for the configuration of phpPgAdmin. Fill out a few questions within the file and load up the page in a browser. PhpPgAdmin will prompt you for your username and login, and you can then begin to manage your database through its interface.

Conclusion

Postgres is a more stable, reliable database server that can handle a much larger load than MySQL. It also has more advanced features and sticks closer to the SQL standards than does its counterpart. The conversion of data from MySQL to Postgres can be done using the default SQL dump files, if you follow the suggestions described above. It can become a confusing task, but good help is available. If you find yourself stumped, you can find documentation resources on the very useful Postgres Website.

The next tutorial in this series will explore the differences between MySQL and Postgres access in PHP, as well as provide explanations, tips, and tricks to quickly make your existing code Postgres-friendly.