Re: [igraph] (no subject)

From:

Tamás Nepusz

Subject:

Re: [igraph] (no subject)

Date:

Thu, 1 Mar 2012 20:35:29 +0100

How is a Graph internally represented?

We currently use an indexed edge list data structure. Basically, we have a list in which the edges are stored in the order of addition. We then build two indexes on top of the edge list; one of the indexes can be used to traverse the edge list in increasing order of the source vertices, the other one can be used to traverse the edge list in increasing order of the target vertices. We also have a second-level index that allows one to jump quickly to the point in the first-level index where the out- or in-neighbors of a particular vertex start. The advantage of this data structure is that most of the query operations are fast (see http://igraph.sourceforge.net/doc/html/ch04s02s02.html for more details), but this comes at a price, since we have to re-build the indices when the graph is mutated (e.g., nodes or edges are added or removed). Things are not as bad as they seem, though, but it is in general advantageous to add/delete nodes or edges in large batches because we can then re-build the index only after all the additions or removals have been performed.

How can I know which data structures are used to represent the graph internally?

Ideally, you shouldn't worry about that, and you shouldn't rely on that either ;) We may replace the underlying data structure at any time.

Is it then really representad as a Vector of edges?

Yes, but note that the vector is indexed. (If you are interested in the gory details, check out src/type_indexededgelist.c -- you can see that we maintain six vectors in igraph_t; two for the edge list itself, two for the first-level indices and two for the second-level indices). So, for instance, getting the degree of a vertex does not require us to scan the entire vector: we can simply jump to the point in the first-level index where the neighbors of the vertex start, then jump to the point where the neighbors of the _next_ vertex start (since the first-level index is sorted by the source vertices), subtract the two pointers, and then we have the degree. The page I linked above lists the time complexity of the query operations and none of them takes more time than O(d), where d is the average degree of a vertex.

For some algorithms, it is more efficient to work with adjacency lists than with the original indexed edge list representation. In these cases, you can construct an igraph_adjlist_t "on the fly", mutate it as you wish, and then convert it back to an igraph_t. However, note that the algorithm implementations expect an igraph_t, so you cannot pass them an adjacency list directly.

The main reason for igraph_adjlist_t is that algorithms requiring a lot of graph mutations (e.g., igraph_rewire) do an implicit conversion to igraph_adjlist_t, work on the adjacency list, and then convert it back to an igraph_t. We also have a "lazy adjacency list", which constructs the sublists of the adjacency list "on demand" (i.e. when they are first requested). Furthermore, we also provide a function to convert a graph to an "incidence list" representation where vertices are mapped to the list of edges they are incident on.