PAN graphs

From VoroWiki

In teaching a programme in geographical information systems (GIS), the instructor often finds extensive confusion over appropriate spatial data structures for computer storage and manipulation of polygon data, either when the student attempts to develop his own data structure for a particular assignment, or when he needs to compare the internal structures of various commercial systems. One tool developed to facilitate class discussion is the PAN graph, and the concept appears to have value in clarifying problems and suggesting plausible solutions. A journal paper on PAN graphs is available; see Gold (1988)[1].

Definition

The PAN graph (or, more properly, directed multi-hypergraph) has three vertices, denoted P (for polygons), A (for arcs) and N (for nodes), preferably drawn in the order shown in the figure on the right (other terms may, of course, be as readily used). Each of these vertices represents one of the three possible classes of graphical data that may exist on the underlying map or data structure being analysed. Thus nodes are zero-, arcs are one- and polygons are two-dimensional objects. The fundamental concept is that a map is in fact a graph, and graphs only have these three classes of objects or entities. Thus, each vertex on the PAN graph represents a data or record class rather than any specific element of that class. All three vertices may or may not exist in a PAN graph for any particular data structure.

(a) The vertices of the PAN graph. (b) Pointers from polygons to arcs. (c) Mutual pointers between polygons and arcs. (d) Polygons have pointers to nodes only by way of arcs. (e) Triangle-based data structure. (f) Triangulation structure showing geometric walk.

While the terms used here were selected primarily for euphony, the GIS/automated cartography field has generated many equivalent terms. If strict terminology is required, perhaps the older field of graph theory should be invoked and reference made exclusively to regions, edges and nodes, giving a REN graph. The PAN graph terminology is used here.

Any particular data structure with pointers between polygons, arcs and nodes can be represented by a particular PAN graph. The basic properties of the data structure are examined by starting with the three basic object classes and inserting an arrow between vertices on the PAN graph whenever a data record in the data structure under discussion has a pointer to a neighbouring object of the same or a different class. (The terms "vertices" and "arrows" on the PAN graph are used to avoid confusion with the "nodes" and "arcs" on the map being analysed.)

This is most easily described by example. If all polygons in the data structure have pointers to the arcs forming their edges, then the PAN graph has an arrow between the P and A vertices ((b) in the figure). This arrow has a direction from P to A. In addition, if the data structure specifies that each arc has pointers to the two polygons it separates, then the PAN graph will have a second arrow from A to P ((c) in the figure).
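As a rough illustration of this bookkeeping, a PAN graph can be held as a small set of directed arrows between the three record classes. The Python sketch below is only one possible encoding: the names CLASSES, pan_graph and describe are hypothetical and do not come from Gold (1988), and a plain set of arrows ignores the multi-hypergraph refinements mentioned in the definition.

```python
# The three vertices of a PAN graph are record classes, not individual
# map objects: P (polygons), A (arcs), N (nodes).
CLASSES = ("P", "A", "N")


def pan_graph(*arrows):
    """Build a PAN graph as a set of (source, target) arrows.

    An arrow (X, Y) means: every record of class X stores pointers
    to neighbouring records of class Y.
    """
    graph = set()
    for src, dst in arrows:
        if src not in CLASSES or dst not in CLASSES:
            raise ValueError(f"unknown record class in arrow {src}->{dst}")
        graph.add((src, dst))
    return graph


def describe(graph):
    """Return the arrows as a sorted, readable string."""
    return ", ".join(f"{src}->{dst}" for src, dst in sorted(graph))


# Figure (b): polygons point to the arcs forming their edges.
fig_b = pan_graph(("P", "A"))

# Figure (c): arcs also point back to the two polygons they separate.
fig_c = pan_graph(("P", "A"), ("A", "P"))

print(describe(fig_b))
print(describe(fig_c))
```

Comparing two candidate data structures then reduces to comparing two small sets, for example fig_c - fig_b to list the arrows that the second scheme adds.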

This technique permits the ready display of the linkages between different data classes in the data structure and the comparison of different schemes. Graphs, however, are of particular value because they may be traversed to see if it is possible to reach one vertex from another, and in how many steps. An examination of some particular PAN graph, (d) in the figure for example, may show that nodes may be reached from polygons only by way of arcs.

Retaining the proposed structure of figure (d), does this two-step process for finding the nodes around a polygon impose any significant costs for the intended applications? Would a different structure be better, possibly trading reduced computer or disc access time for increased storage size? Again, if there are no pointers away from nodes in the proposed structure, is it expected that there would ever be a need to find the polygons or arcs adjacent to a node? Draw the PAN graph and the potential problems become immediately evident.
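These questions amount to a reachability check on the PAN graph, which can be sketched as a breadth-first search over the arrows. The Python example below assumes that figure (d) contains exactly the arrows P to A and A to N (the caption does not rule out others), and the function name steps is hypothetical.

```python
from collections import deque


def steps(arrows, start, goal):
    """Fewest pointer hops from record class `start` to `goal`.

    `arrows` is a set of (source, target) pairs between the classes
    "P", "A" and "N". Returns None when `goal` cannot be reached.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        cls, dist = queue.popleft()
        for src, dst in arrows:
            if src == cls and dst not in seen:
                if dst == goal:
                    return dist + 1
                seen.add(dst)
                queue.append((dst, dist + 1))
    return None


# Assumed content of figure (d): polygons point to arcs, arcs to nodes.
fig_d = {("P", "A"), ("A", "N")}

print(steps(fig_d, "P", "N"))  # 2: nodes are two hops away from polygons
print(steps(fig_d, "N", "P"))  # None: no pointers lead away from nodes
```

A result of None for a query the application needs is the kind of problem that drawing the PAN graph is meant to expose early.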

Geometric alternatives to pointers

Thus far we have considered access between data types purely by way of pointers; we have used a graph-theoretic approach. However, maps are not just graphs; objects have positions as well. Thus, if we have no pointer from nodes to polygons, it is alternatively possible to obtain the node (x, y) coordinates and use point-in-polygon techniques to walk through the network until we find a polygon with this (x, y) location "inside" it (or at least on its edge). This procedure is described in Gold et al. (1977)[2]. (We ignore the possibility of simply examining each object in the map until we find a match; this is considered to be cheating, as well as being very expensive for large data sets.)

A good example of a locational walk occurs when working with triangular networks. In that particular case the polygons are always triangles. Each triangle has pointers to each of the three data points (nodes) forming its vertices and also to each of the adjacent triangles; no edge (arc) records are needed, hence the resulting PAN graph of figure (e). However, although no pointers exist from nodes to polygons, using the geometric coordinates of the node permits the location of any one of the triangles having that vertex by performing a network walk based on geometric criteria. (Graph-theoretic and geometric methods frequently complement each other in problems of spatial data handling.) To distinguish access of this type, which is inevitably slower than direct pointer look-up but much superior to brute-force searching, a dashed line (and arrow) is used on the PAN graph (figure (f)).
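A minimal sketch of such a walk is given below in Python. It is a simple visibility-style walk, not necessarily the procedure of Gold et al. (1977), and it rests on several assumptions made only for this example: triangle vertices are stored in anticlockwise order, neighbour i lies across the edge from vertex i to vertex i+1, and the small six-point network is invented. This kind of walk is guaranteed to terminate only on Delaunay triangulations, so a step cap is included as a safeguard.

```python
def orient(a, b, p):
    """Positive if p is left of the directed line a->b, negative if right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def walk(points, tris, nbrs, start, target):
    """Walk from triangle `start` towards the location `target`.

    Returns (triangle index or None, list of triangles visited).
    None means the walk left the network or hit the step cap.
    """
    t = start
    path = [t]
    for _ in range(len(tris)):  # step cap against cycling
        v = tris[t]
        for i in range(3):
            a, b = points[v[i]], points[v[(i + 1) % 3]]
            if orient(a, b, target) < 0:  # target lies beyond this edge
                t = nbrs[t][i]
                if t is None:  # crossed the outer boundary
                    return None, path
                path.append(t)
                break
        else:
            return t, path  # target is inside or on the edge of t
    return None, path


# Hypothetical network: two unit squares, each split into two triangles.
points = [(0, 0), (1, 0), (1, 1), (0, 1), (2, 0), (2, 1)]
tris = [(0, 1, 2), (0, 2, 3), (1, 4, 5), (1, 5, 2)]  # anticlockwise
nbrs = [(None, 3, 1), (0, None, None), (None, None, 3), (2, None, 0)]

# Find a triangle having node 5 as a vertex, starting from triangle 1.
found, path = walk(points, tris, nbrs, start=1, target=points[5])
print(found, path)
```

The walk uses only the triangle-to-node and triangle-to-triangle pointers of figure (e), which is why the node-to-polygon link can be drawn as a dashed arrow in figure (f) without adding any stored pointer.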

Dual graph representations

In addition, more work is being undertaken on dual-graph representations of spatial data. It comes as a shock to the student when he has laboriously worked his way through a conventional data structure, and then through the dual representation (where polygons are represented by nodes and nodes expand to be regions; e.g. the relationship between a Delaunay triangulation and the associated Voronoi polygons) to find that the PAN graph and related data structure of the dual are represented merely by interchanging the letters P and N on the original PAN graph. This way of expressing the alternate representations of a map helps clarify and systematize what he has just learned. For example, if figure (f) represents the Delaunay triangulation model, then the Voronoi data structure can be easily represented by exchanging the letters P and N, and then re-orienting the graph (figure (a) on the left). For a graphical illustration of this see the duality between the VD and the DT.

This new PAN graph makes it very clear that we can readily represent Voronoi polygons without defining any edges and without creating any polygon record other than the geometric coordinates of its defining data point. In addition, our original triangles have become nodes (at the triangle circumcentres) which possess pointers to each of the three Voronoi polygons that meet there. The nodes also have pointers to the three adjacent nodes, permitting easy traversal of the graph formed by the polygon edges. The dashed arrow indicates that, if we wish to find the nodes forming the edges of a Voronoi (Thiessen) polygon, a walk through the triangulation will find one triangle having as a vertex the defining data point of the desired Voronoi polygon; the remaining nodes are located using the node-to-node pointers. Other configurations are, of course, also possible.
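The letter exchange can be sketched in a few lines of Python. The arrow set used for the Delaunay structure is an assumed reading of figure (f) (triangle-to-node and triangle-to-triangle pointers, plus a dashed node-to-triangle walk); the names SWAP and dual and the "pointer"/"walk" labels are hypothetical.

```python
# Exchanging P and N gives the PAN graph of the dual; A is unchanged.
SWAP = {"P": "N", "N": "P", "A": "A"}


def dual(arrows):
    """Return the PAN graph of the dual structure.

    Each arrow is (source, target, kind), where kind is "pointer" for
    a stored pointer or "walk" for a dashed (geometric) arrow.
    """
    return {(SWAP[src], SWAP[dst], kind) for src, dst, kind in arrows}


# Assumed reading of figure (f), the Delaunay triangulation structure.
delaunay = {
    ("P", "N", "pointer"),  # each triangle points to its three vertices
    ("P", "P", "pointer"),  # each triangle points to adjacent triangles
    ("N", "P", "walk"),     # node to triangle by geometric walk
}

voronoi = dual(delaunay)
for arrow in sorted(voronoi):
    print(arrow)
```

Under these assumptions the result has node-to-polygon and node-to-node pointers and a dashed polygon-to-node arrow, matching the description of the Voronoi structure above; applying dual a second time returns the original graph.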

A simple example

Finally, we give a simple cartographic example. We have digitized our polygon edges (arcs) using the "spaghetti and meatballs" approach: we have recorded left polygon/right polygon and from node/to node for each arc. This gives us the PAN graph of figure (b). As we are primarily interested in calculating polygon areas, what should we do now?

In this case, as a minimum, we must be able to associate arcs with each polygon, either directly or indirectly, in order to calculate areas. There are no immediately obvious geometric approaches (dashed arrows) and, in the absence of any further application information, there seems to be little value in implementing a direct polygon-to-node pointer. We must therefore obtain an appropriate program to generate polygon-to-arc pointers, our main concern. This would in addition give us access to the nodes surrounding each polygon, if they were needed. Since we already have arc-to-polygon pointers, it should be noted that generating the reverse of a pointer already in existence requires merely a single pass through the data set. In this case, each arc is read in turn and its two associated polygons identified. Each of these polygon records is then updated to include a pointer to the arc being processed. Upon completion of this pass through the arc file it may be desirable to read through the polygon file, sorting the list of arcs in each record into (anticlockwise) order. Figure (c) gives the PAN graph of the final data structure. This permits the solution of our initial problem, the collection of arcs bounding each polygon, so that areas may be calculated.
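The single pass and the subsequent area calculation can be sketched as follows in Python. The example is hedged in several ways: the record layout, the two-square map and the function names polygon_to_arcs and polygon_area are all hypothetical, each arc is assumed to be a straight segment between its two nodes (real arcs usually carry intermediate points), and None stands for the outside region. The area uses the standard shoelace sum, with the sign of each term taken from whether the polygon lies left or right of the arc, so the arcs need not be ordered first.

```python
from collections import defaultdict

# Node coordinates for two adjacent unit squares, "A" and "B".
nodes = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1), 4: (2, 0), 5: (2, 1)}

# Arc records: (left polygon, right polygon, from node, to node).
arcs = {
    0: ("A", None, 0, 1),
    1: ("A", "B", 1, 2),  # shared edge between the two squares
    2: ("A", None, 2, 3),
    3: ("A", None, 3, 0),
    4: ("B", None, 1, 4),
    5: ("B", None, 4, 5),
    6: (None, "B", 2, 5),  # digitized with B on the right
}


def polygon_to_arcs(arcs):
    """One pass over the arc file, building polygon-to-arc pointers."""
    index = defaultdict(list)
    for arc_id, (left, right, _frm, _to) in arcs.items():
        for poly in (left, right):
            if poly is not None:
                index[poly].append(arc_id)
    return dict(index)


def polygon_area(poly, arc_ids, arcs, nodes):
    """Shoelace area, traversing each arc with `poly` on its left."""
    twice_area = 0.0
    for arc_id in arc_ids:
        left, _right, frm, to = arcs[arc_id]
        (x1, y1), (x2, y2) = nodes[frm], nodes[to]
        term = x1 * y2 - x2 * y1
        # Reverse the arc's contribution when the polygon is on the right.
        twice_area += term if left == poly else -term
    return twice_area / 2.0


pointers = polygon_to_arcs(arcs)
areas = {p: polygon_area(p, ids, arcs, nodes) for p, ids in pointers.items()}
print(pointers)
print(areas)
```

Ordering each polygon's arc list anticlockwise, as suggested above, is not needed for the area itself in this sketch, but would be needed for tasks such as drawing or shading the polygon boundary.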