On Thu, 08 Jan 2009 21:28:27 minh thu wrote:
> Hi,
>> I'd like to process some kind of graph data structure,
> say something like
>> data DS = A [DS] | B DS DS | C.
Graphs in funtional languages aren't usually represented in this sort of
manner. Trees are fine to represent like that as they're acyclic and have
exactly one parent for each node but for graphs it's much more difficult. Say
that you have a graph with directed connections like this:
0 -> 1
1 -> 2
2 -> 3
1 -> 3
3 -> 4
Now you want to alter node 4. Node 3 has to be updated to point to the new
version of 4, node 1 has to be changed to point to the new version of 3, node
2 has to be changed to point to the new version of node 3, then node 1 has to
be changed again to point to the new version of 2, then finally 0 can be
changed to point to the new version of 1 and returned.
There is no simple way using this representation to handle that double-update
to node 1, or to handle disconnected or cyclic graphs. Updates are extremely
difficult since Haskell data structures are not mutable and have no concept
of identity. The approach of treating nodes as structures with pointers to
each other cannot be cleanly and efficiently implemented in an immutable
fashion. It only really makes sense in a stateful, imperative context.
An approach that suits functional languages better is to store a flat
structure listing the edges leaving each node. This, I believe, is the
approach taken by Haskell's main graph library, FGL
(http://hackage.haskell.org/cgi-bin/hackage-scripts/package/fgl). You would
now have something like:
data MyNode nv = MyNode {nodeId::Int, nodeValue::nv}
data MyEdge ev = MyEdge {edgeDestination::Int, edgeValue::ev}
data MyGraph nv ev = MyGraph {
maxNode :: Int,
nodes :: (Map Int nv),
edges :: (Map Int [MyEdge ev])}
emptyGraph :: MyGraph nv ev
emptyGraph = MyGraph 0 (Data.Map.empty) (Data.Map.empty)
getNode :: MyGraph nv ev -> Int -> Maybe (MyNode nv)
getNode g id = ((nodes g) `lookup` id) >>= (\v -> MyNode id v)
getEdgesLeaving :: MyGraph nv ev -> Int -> [MyEdge ev]
getEdgesLeaving g id = fromMaybe [] ((edges g) `lookup` id)
addNode :: nv -> MyGraph nv ev -> (Int, MyGraph nv ev)
addNode val g = (maxNode newGraph, newGraph)
where
newNodeId = (maxNode g) + 1
newGraph = MyGraph newNodeId (insert newNodeId val (nodes g)) (edges g)
... and so on. (This is all totally untested - use at your own peril.)
Each node in the graph has a unique identifying number, issued in sequence
using maxNode as a counter. This makes identifying cycles easy. The nodes map
contains the value for each node based on its id. The edges map contains a
list of links from each node to others in the graph. Finding links entering a
node is quite expensive - if you need to do this often then maintaining a
second list of edges entering each node would speed it up.
Each node and each edge can have a custom data structure attached. New nodes
and edges can be added without having to modify references elsewhere, nodes
have a distinct identity given by the associated Int and the graph is
immutable - operations on it produce modified copies.
Cheers,
Tim