I have made some progress towards the longstanding goal of drawing nice diagrams of database schemas. Firstly, I’ve figured out how to use yEd‘s Entity Relationship node types as table nodes. These special node types have both a node label, and an additional content field. The additional field is normally free text but can also contain HTML data (akin to GraphViz‘s support for HTML nodes).

So a database table can be drawn with its name in the label, and a list of fields in the node body. And, with HTML mode, that list can be a table with columns for name, type, PK/FK status, etc.

But yEd has not quite caught up with GraphViz. In GraphViz, cells in an HTML table can be identified with ports, and an edge drawn between nodes can then target specific ports in each. yEd only supports a handful of predefined ports per shape (the centre, each side, and each corner). Furthermore, its automatic layout algorithms ignore the port data, and draw edges to the nearest part of a node. Still, those layout algorithms are the great strength of yEd and flexibility vs layout is a reasonable tradeoff.

Since entering the HTML contents by hand is time-consuming and prone to error, I’ve written a quick program to do it. The program, pgschemagraph.py, is run against a live PostgreSQL database. It will examine the table and foreign key definitions in the given schemas, and generate a GraphML representation of them.

As you can see, pgschemagraph.py has found the tables, their fields and types (including standard SQL types, user defined types like “cube”, and domains like “imageid”), the primary keys (indicated by underscoring) and whether they can be NULL (indicated by italics). It has also discovered the foreign key relations between tables.

Because the program works in terms of abstract database definitions, rather than nicely drawn diagrams, it does not do much layout. When first loaded in yEd, this file will show the tables stacked atop one another. The most pgschemagraph.py can do is guess the height of each node based on the number of rows in it (based on a heuristic of 22 points per line in the default font; it can’t do the width, because it does not know how wide each character is).

But it does perform about 80% of the work required to draw a good diagram. A human can easily and quickly do the rest.

At present it uses the standard information_schema catalog rather than the PostgreSQL-specific pg_catalog. The aim was that this would make it easier to port to other systems that support information_schema. The downsides are that information_schema is actually slightly difficult to use (since it’s very generic), and does not provide some useful information such as inheritence and partition relationships. In some of my schemas, tables are partitioned into multiple subtables. It would be nice to that represented in the diagram, rather than have the subtables floating around unconnected. So a rewrite to use pg_catalog may be in order. (Porting to alternative systems will then require a more system-specific approach).

More information can always be shown on the diagram. One idea is to allow different colours to be used. This could be done by schema, or by grouping of tables.

It would be possible to have the program run yEd’s layout engine directly. Sadly, yEd is proprietary software, and however much I appreciate its help in laying out my diagrams, I’m averse to depending on it directly. Generating the standard GraphML output (and potentially .dot output, for GraphViz), and running the program myself is a reasonable compromise.