Once I had installed PostGIS it was relatively simple to create a spatial database instance. For convenience, the first thing I did was create a database user with the same name as my Linux username, which means that the default Postgres authentication system will just work transparently:

sudo -u postgres /usr/bin/createuser -s -e -P kms

If you run this command as-is you'll be prompted to enter a password for the newly created user. A password is not strictly necessary for what I'll be doing today, but it will be needed later when I install and configure tools to use the database. You should also note that this command grants the user superuser privileges across the entire Postgres installation; you probably shouldn't do this on a production system.

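The commands described in the next paragraph would, for a PostGIS 1.x installation, look something like the sketch below. The path to the PostGIS SQL scripts is an assumption and varies by distribution and version; adjust it to match your system:

```shell
# Create the database (-e echoes the SQL to the console)
createdb -e opendata

# Enable PL/pgSQL, the language the PostGIS functions are written in
createlang -e plpgsql opendata

# Load the PostGIS functions (this path is illustrative; check your install)
psql -d opendata -f /usr/share/postgresql/9.1/contrib/postgis-1.5/postgis.sql

# Create and populate the spatial_ref_sys table
psql -d opendata -f /usr/share/postgresql/9.1/contrib/postgis-1.5/spatial_ref_sys.sql
```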
The first command creates the database, which I've chosen to call “opendata”. The next command configures the new database to support the PL/pgSQL programming language, which is used by the functions that make up PostGIS. The third command loads the PostGIS functions. Finally, the fourth command creates a new table in the database called spatial_ref_sys, which holds information about different spatial reference systems and allows us to use database functions to transform between these systems. Note that I've given the first two commands the -e flag, which forces them to echo the SQL statements to the console so you can see what's happening.

To confirm that the database has been created I connected using the psql command and listed the tables:
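A quick way to do this check is with psql's \dt meta-command; at this stage the listing should include the PostGIS bookkeeping tables geometry_columns and spatial_ref_sys:

```shell
# List the tables in the new database
psql -d opendata -c '\dt'
```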

The OS OpenData Strategi shapefile dataset that I previously downloaded arrives as a Zip file. Unzipping this gives me a top-level folder called Strategi Shape. Under this are three directories: data, doc, and gazetteer. I'm going to ignore the gazetteer data for the time being, and the doc directory contains only README files and licensing information.

The data directory contains two sub-directories: GB_NORTH and GB_SOUTH. These directories contain the actual shapefiles. The shapefiles contain three different types of geometric data: LINES, POLYGONS, and POINTS. The first step is to create a table for each data type:
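The commands would look roughly like the following sketch. The shapefile names and table names here are illustrative assumptions, not the actual Strategi file names; substitute one shapefile of each geometry type from your unzipped data:

```shell
# Prepare mode (-p): create an empty table for each geometry type, with a
# spatial index (-I), declaring the data as SRID 27700 (-s)
shp2pgsql -p -I -s 27700 "/path/to/Strategi Shape/data/GB_NORTH/coastline.shp" strategi_line | psql -d opendata
shp2pgsql -p -I -s 27700 "/path/to/Strategi Shape/data/GB_NORTH/urban_region.shp" strategi_polygon | psql -d opendata
shp2pgsql -p -I -s 27700 "/path/to/Strategi Shape/data/GB_NORTH/height_point.shp" strategi_point | psql -d opendata
```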

Replace “/path/to/” with the path to your unzipped data. The -p flag puts shp2pgsql into prepare mode, in which it only creates the tables and does not populate them. The -I flag creates an index on the geometry column of each table; you could do this as a separate step, but it's convenient to do it here. Finally, -s 27700 tells PostGIS which spatial reference system our data is using, in this case OSGB36 (the British National Grid). I'm planning a further article to go into the whole datum thing in more detail.

Notice that the output of shp2pgsql is piped into the psql command. This is because the output of shp2pgsql is a sequence of SQL commands that create and/or populate the database. If you chop off the pipe and the psql command you’ll see the SQL echoed to your screen.

At this point I can connect to the database again using psql and examine the tables that have been created:
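For example (the table name here matches the illustrative names used above and is an assumption):

```shell
# List the tables, then describe one to see its geometry column
psql -d opendata -c '\dt'
psql -d opendata -c '\d strategi_line'
```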

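Populating the tables uses the same shp2pgsql invocation with -a (append) in place of -p, run once per shapefile, for both the GB_NORTH and GB_SOUTH halves of the data. Again, the file and table names below are illustrative assumptions:

```shell
# Append (-a) the contents of each shapefile into its table
shp2pgsql -a -s 27700 "/path/to/Strategi Shape/data/GB_NORTH/coastline.shp" strategi_line | psql -d opendata
shp2pgsql -a -s 27700 "/path/to/Strategi Shape/data/GB_SOUTH/coastline.shp" strategi_line | psql -d opendata
```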
This is where I ran into a problem. While trying to populate the point table I kept getting constraint errors. After battering at the problem for a couple of hours I turned to the GIS Stack Exchange website for help. This is an excellent resource for those wanting to ask questions (or provide answers) about GIS. After a little bit of investigation and discussion, user amercader came up with the answer: there was a mismatch between the geometry type that shp2pgsql was using to create the table (MULTIPOINT) and the geometry type of the data it was trying to load (POINT).

The solution was to drop the table from the database and re-run the shp2pgsql command to re-create it, but this time directing the output to a file rather than piping it straight to psql:
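A sketch of the fix, using the same illustrative shapefile and table names as above; the generated SQL file can then be edited to declare the geometry as POINT rather than MULTIPOINT before being loaded:

```shell
# Drop the mis-created table
psql -d opendata -c 'DROP TABLE strategi_point;'

# Re-run prepare mode, writing the SQL to a file instead of piping to psql
shp2pgsql -p -I -s 27700 "/path/to/Strategi Shape/data/GB_NORTH/height_point.shp" strategi_point > create_point.sql

# Edit create_point.sql (changing MULTIPOINT to POINT), then load it
psql -d opendata -f create_point.sql
```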

This could take some time to complete depending on how fast your machine and disks are, but once it's done that's the database set up and the data loaded. The -a flag tells shp2pgsql to append the data to the existing table, rather than the default behaviour of dropping the existing table and re-creating it before loading the new data. To check that the data had been loaded I counted the rows in each table:
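The row counts can be checked with a simple SELECT per table (table names again being the illustrative ones assumed above):

```shell
psql -d opendata -c 'SELECT COUNT(*) FROM strategi_line;'
psql -d opendata -c 'SELECT COUNT(*) FROM strategi_polygon;'
psql -d opendata -c 'SELECT COUNT(*) FROM strategi_point;'
```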