Automated cartographic generalization has been an important research area
for years now. Besides many papers, three books are devoted exclusively
to automated generalization
[2],[8],[9].
Several frameworks have been proposed [1],[11],
and several of the operators that actually perform generalization
have been descibed, especially for line simplification.

The most basic of the generalization operators is selection:
which map objects should be selected for display on a particular
map at a particular scale [13]? This question has
been addressed for point features, where it is called
settlement selection[5],[14].
In the case of subdivisions, the dissolution operator
can be used for elimination (the inverse of selection),
for instance with the GAP-tree [15].

The selection of line features has been considered in the context of
road network maps and drainage networks.
We are aware of two papers that select subsets of the roads to
perform generalization [6],[12].
Both consider whether algorithms in graph theory can be used.
For example, a road network map connecting a set of cities can be
generalized by only selecting the roads in the minimum spanning tree.
This idea guarantees the topological property of connectedness,
but there are serious limitations to the graph approach.
Firstly, there is no guarantee that avoiding coalescence and
imperceptibility--two of the objectives of generalization--is realized.
These geometric aspects that are not present in a graph.
Secondly, semantic aspects of road network generalization
such as avoiding large detours are not taken into account explicitly.

Another paper [7] discusses the detection and
collapse of groups of junctions to a single junction. This is
another important aspect of road network generalization and
complementary to road selection.

In this paper we address road network generalization by selection
more rigorously.
We will first define what the input to road network generalization
should look like.
Then we list the basic geometric, topological, and semantic
requirements [16]
of the output, the generalized road network.
Such a list is necessary to design an approach that performs
the task at hand well. Based on the list, we outline a three-step
approach to road network generalization.
Most of the ideas in the paper have been implemented and
applied to a data set.

A road network is modeled as an embedded graph consisting of
nodes and arcs.
Some of the nodes are intersections, others serve to describe the shape
of the road and are basically control points. A road network may have
crossing arcs due to bridges and tunnels, implying that the graph is
not necessarily planar.
In a realistic model of a road network, roads have an importance.
Highways are the most important roads and small dirtroads are
unimportant. Any road network generalization algorithm must take
into account that important roads should be given preference over
unimportant roads. Therefore, the importance must be part of the
input specification.
We model the input road network as a graph G with node set N
and arc set A, and with the following information:

Each node in the set N is a point with geographic coordinates.
Each node has references to the incident arcs of the graph.

A subset P of N of the nodes of the graph are important places.

Each arc in A has some importance, representing the type of
road (dirt road, ..., highway), and it has references to the two
incident nodes.

The information listed above contains graph structure like incidence
between nodes and arcs, coordinates to obtain embedding, and
semantic information like importance.

From the input, two important concepts can be introduced. They
are sets whose elements depend on the currently selected network.

A subset Q of N of the nodes of the graph play a role more
important that only describing the shape of a road. These are the
end nodes (degree 1 in the graph) and the intersections
(degree at least 3 in the graph).
Subsets P and Q need not be disjoint.

The arcs of A are grouped into consecutive parts called roads.
A road is a maximal consecutive sequence of arcs connecting exactly
two nodes from P v Q. The set of all roads in the network
is denoted R.
Each road r in R has a certain length l(r). We assume that all
arcs forming one road have the same importance, which is defined to
be the importance of the road.

Before applying generalization, the whole input network is selected.
When the algorithm to be described operates, parts of the input network
may no longer be selected and the sets defined above will change.
An intersection of degree 3 may cease to be an intersection if one
of the incident roads is no longer selected. Therefore, the set Q is
defined in terms of the selected road network.
The same holds true for the set of roads, which may merge with
adjacent roads.

It is essential in the design of a generalization function that
attention be given first to the requirements. These can be metric,
like avoiding coalescence, or topological, like maintaining connectivity,
or semantic, like keeping short routes between important places.
Keeping the objective of road network maps and the reasons for
generalization in mind, we have come to the following requirements:

Avoid coalescence: no two roads may be closer than some distance
d. Here d is a tunable real number that depends
on the scale of the generalized map. E.g., d can be set
so that no two roads are closer than half a millimeter on the
generalized map.

Avoid detours: for any two important nodes p1,p2 in P, the
length of the shortest path between them in the generalized network
should be at most c times the length of the shortest path
in the input network. Again c is a tunable parameter that
must be set greater than 1.

Give preference to important roads: whenever possible, important
roads should be given preference for inclusion over less
important roads.

The first requirement is geometric and the other two are semantic.
The topological requirement that the generalized network be connected
is implied by the second requirement: no connection implies an
infinite detour.

The list above contains one hard requirement and two `soft' requirements.
The first requirement will always be guaranteed by the algorithm, no
matter what input road network. Given this decision, we cannot
guarantee the second requirement, so it is soft.
It may be the case that only way to avoid a long detour is to include
a road that is very close to another road. The other choice would be
to avoid long detours always, but then one has to accept that coalescence
may occur. We have chosen to avoid coalescence over avoiding detours.
The third requirement is soft in the sense that in the output network,
some less important roads may be chosen while some important roads are not.
This is desirable as we noted before, since less important roads may
be relatively important in the neighborhood.

We say that two roads are in conflict if their minimum distance
is less than d. This statement must be interpreted appropriately.
Two roads that start (end) at the same intersection always have distance 0,
but we do not want the algorithm to detect a conflict (coalescence)
in this case.
We redefine the smallest distance between two roads as the smallest
distance while excluding both parts of the road closer than d
from the two ends. In the case of crossing roads (bridge or tunnel),
we also exclude parts of the roads closer than d from the
crossing point.

Figure 1: Circles around ends of roads, and distances between roads.

The algorithm to detect if two roads are in conflict is used often in
whole generalization procedure. Therefore, it is important that this
detection be done very efficiently. We treat this test first before
giving the whole generalization procedure.

Suppose that two roads r1 and r2
have m1 and m2 nodes defining them, respectively.
A brute force test for minimum distance would take O(m1·m2) time.
By employing techniques from computational geometry
[4],[10] one can improve this considerably.
We compute the Voronoi diagram of all nodes and
arcs [17] from the roads r1 and r2.
A Voronoi region of a feature from r1 adjacent to a
Voronoi region of a feature from r2 gives a pair of
features that may give the minimum distance between r1 and r2.
By testing all pairs of neighboring Voronoi regions and their
features we can find the overall minimum.

Figure 2: Voronoi diagram of the nodes and arcs of two roads.

Such an algorithm to find the minimum distance takes only
O((m1+m2)log(m1+m2)) time; it takes O((m1+m2)log(m1+m2))
time to compute the Voronoi diagram and it takes
O(m1+m2) time to test adjacent regions.
One can also use a heuristic to be even faster in most cases.
Compute the bounding box B1 of r1, the bounding box
B2 of r2, and test if these are at least distance d apart.
Such a test requires only O(m1+m2) time in total, or only
O(1) time if the bounding boxes have already been computed.
But it sometimes fails to decide whether roads are too close,
and we need a more time consuming algorithm in such cases.

The method we have designed for road network generalization
has three stages. In the first, an initial selection of all
roads is made based exclusively on the importance of the
roads and the total number of conflicts. The second step is to
test all pairs of roads for conflicts, and delete the least
important of the two. The third step consists of adding
roads, possibly of lesser importance, to avoid long detours.
Care must be taken that no new conflicts are introduced here.
We discuss the three stages in more detail.

We assume that there is a fixed number of different importance values,
for example, the integers 1 up to 5, where 5 is most important.
In the first step we'll compute an integer i so that if all
roads of importance i and up are selected, then there are
not too many conflicts. For each possible importance value
and implied selection, we determine how many roads are in
conflict with some other road. We choose the smallest value
i such that the percentage of roads in conflict is less
than, say, 30%. This implies that in the second step we need only
eliminate less than 30% of the roads to obtain a conflict-free
road network. Probably only fewer roads need be eliminated,
because when two roads are in conflict, we need to remove only one.

When testing whether a road r is in conflict with some other
road, we start with the roads that bound the same face as r.
As soon as we find a conflict we can stop the test for r and
mark it as being in conflict.
Because the test for conflicts excludes the parts of roads near
the ends, it may be the case that r has no conflicts with roads
bounding the same face, but has conflicts with roads that bound
other faces. By cleverly choosing the sequence by which to test
roads we can usually decide after a few conflict tests whether r
has conflicts or not. A logical sequence is a breadth-first sequence
starting from the road r. By using bounding boxes, it is possible
to decide when it is safe to conclude that r is not in conflict.

An upper bound on the running time of the first step is O(knlogn),
where k is the number of roads and n is the total number of nodes
of the network. One can expect to do the first step much faster with
the suggestions for efficiency just given.

The second step of the algorithm resolves all conflicts
by eliminating one of each pair of roads in conflict.
It is natural, and in accordance to the third requirement, to
remove the road with smaller importance. In case of the
same importance we can remove either road. When a road is
removed, a junction can become a node of degree 2, which
can imply that two roads are adjoined to one. One has the
choice to make which importance to assign to the next road,
if their importances were different.

Figure 3: When the vertical road is eliminated, the two other roads become one.

Elimination of conflicts should be done incrementally.
When a conflict between two roads is resolved by elimination
of one road, other conflicts with the eliminated road are
resolved too, of course. But new conflicts may arise when
two roads are adjoined, since the former junction, now
normal vertex, no longer has a conflict-free region around it.
This is the reason that we cannot precompute all conflicting
pairs and simply remove one road of each.

We'll treat the roads in order of increasing importance.
If the road under consideration is in conflict with some
road of the same or greater importance, we eliminate it.
If this makes a new, merged road to appear, we have to
treat it too, even if both subroads have already been tested.
Again, bounding box tests and a suitable order by which to
test other roads will help considerably for efficiency.

The worst case running time of the second step is O(knlogn),
and again we expect to do much better in practice.

The third step of the generalization method is to add roads
needed to avoid detours, but without creating new conflicts.
This involves the roads eliminated in the first step, some
of which may not have any conflict, but also roads eliminated
in the second step. Such a road may have been eliminated
because of a conflict with a road that itself was eliminated
later in the second step. We choose to be conservative with
adding new roads. They are added only if they really help
to avoid a detour and don't have a conflict. One could also
choose to always add roads that are not in conflict.

Note that detours are defined in terms of shortest paths
in the input network, and detours are avoided only for the
set P of important nodes. So we start by computing all
shortest paths between pairs of nodes in Q in the input network.
Using Dijkstra's algorithm [3] this takes O(mnlogn)
time in total, where m is the number of important points and n
is the total number of nodes and arcs in the network.

Most difficult in this step is how to generate roads
that can be tested for inclusion in the road network.
We do this as follows.
For each important node p we compute its shortest path in
the currently selected network to all other important nodes.
If each path is at most c times the length of the corresponding
shortest path in the input network, we don't have any detours from p
and we are happy with the connections from p. If there is a serious
detour to another important node p', we'll try to add roads to
create a shorter connection.
We take the shortest path between p and p' in the input network and
consider which parts are not present in the currently selected network.
We test each of these for conflicts with the selected roads,
and add them if there are no conflicts. If we achieved to add
all parts, the shortest path between p and p' is present
in the selected network. If only some parts could be added, we
may still have reduced the length of the shortest path between
p and p' and possibly, avoided the detour.

For the analysis, we consider the running time for each important
node that is tested. We compute the lengths of the shortest
paths to every other important node in O(nlogn) time
with Dijkstra's algorithm. For each of the at most m important
nodes to which a serious detour exists, we may need to consider
determining and adding at most k roads, where k is the total number
of roads in the input network. For these roads we must test if they
are not in conflict with any other road. This test takes O(nlogn)
time for each road piece, leading to a
worst-case running time of O(km2nlogn).
But one can expect a running time somewhere between O(mnlogn) time
and O(m2nlogn) time typically, if heuristics with bounding boxes
and an appropriate order to test for conflicts are used.

If the method above does not avoid a detour sufficiently,
it is certainly possible to generate other short paths and try
to insert these. As noted before, however, it is not possible
to obtain a guarantee on the detour as long as one insists
on avoiding coalescence.

The main problem with the method outlined above is how it deals with
short roads. Any road that appears on the map with length less than
2d, for instance, will never be in conflict since it lies
completely in the distance d-regions of its two ends. This implies
that clusters of short roads may appear in the generalized network.
We will discuss three possible solutions to the problem.

Short roads themselves are not really a problem, only a collection
of many short roads close together is. This implies that the induced
subdivision has many small faces. A similar observation can be made
when we consider imperceptibility. A short is not imperceptible because
it is part of the larger network, but a very small face in the subdivision
will be imperceptible.

We can redefine conflicts for short roads as follows.
Two roads are in conflict if both have length at most 2d and
they are incident to some small face with area at most, say, 2d2.

The method described above will resolve the conflict by removing one
of the conflicting short roads. This causes the small face to be
merged with another face.

In the first step of the algorithm, a preselection of the roads was
made based solely on the importance. If fewer roads are preselected,
then fewer short roads will appear in the generalized network,
because roads are added only in the third step. Here the addition
is only done if it helps to avoid a detour. Large clusters
of many short roads aren't useful to avoid detours, one needs
only a representative selection of connections. These will be
established in the third step.

To achieve that the preselection is smaller, can adapt the first
step by computing the smallest value i such that at least
some percentage of roads are not in conflict, and at least some
percentage of the roads is not too short. This may cause i to
be greater than before (described in the first step).
The motivation for this solution is that a cluster of short roads
generally contains many that are less important.

Perhaps the best solution to avoiding clusters of short roads is
to apply a separate processing step where these clusters are collapsed
to single junctions. The detection and collapse of clusters has
been studied by Mackechnie and Mackaness [7], and our
methods can be combined with theirs. Detection and collapse of clusters
can be done beforehand, after which our generalization method can take
over. It is also possible that our method is done first without giving
special attention to short roads, as it is described in the previous
section. Then the clusters of junctions can be detected and
replaced by single junctions afterwards. A drawback of this third
option is that the junction collapse methods don't deal with
coalescence and detours explicitly.

This paper presented a new approach to decide which roads should be
selected when generalizing a road network. The approach was designed
while keeping three main objectives in mind: not allowing roads to
be too close, avoiding detours between important points, and giving
priority to bigger roads. These objectives led to a three-step
approach that performs generalization. Several options and
alternatives were suggested to tune or vary upon the approach.

The ideas outlined in this paper have been implemented partly and
can be run via the World Wide Web. The Java applet can be found at
http://www.jarno.demon.nl/thesis.htm.
The implementation is a prototype that hasn't been optimized in
running time. Therefore it is rather slow. Not all solutions suggested
in this paper are included. It does allow tuning of several parameters,
however, and gives an indication of possible outputs of the
generalization method.

Figure 4: Data set used by the Java applet.

In this study of road network generalization, we have only chosen
the selection operator as means of generalization. There are several
other operators that can be useful: line simplification, line
smoothing, merge lines, collapse a road (and merge the
incident intersections), and local displacement of lines.
This is a subject of further research. Another issue is a
more extended study of dealing with short roads, especially
the combination of junction collapse with our method.