[comp.lang.c++] How do you build iterators for graphs?

jps@cs.brown.edu (John Shewchuk) (07/27/89)

Assume you have a data structure that contains a graph of the form:


   +---+ 
   | A | <- Root
   +---+
     |\
     | \
     |  \
     |  +---+
     |  | B |
     |  +---+
     |  /
     | /
   +---+
   | C |
   +---+

and you want to do a depth first traversal.  If you have visited A and
C and are now visiting B, then you need to know if you seen C before.
Normally, you would associate a 'mark' with each node to indicate
whether the node was visited- simply examine C to see if the marked
bit is set.  Unfortunately, this means only one iterator can be
traversing the graph at a time.  To fix this you can create a friend
class (called graph_iterator or something) that stores the marks and
the current position.

A problem arises when storing the marks- you must make an association
between the nodes and the mark.  I do not see any inexpensive way to
do this.  Hashing takes too much time and space and because the key to
a node is an address, this rules out the obvious direct storage
techniques.

Is there a way to do this that uses unit space and unit access time?
Any suggestions for alternatives?


Thanks.

John Shewchuk                                                jps@cs.brown.edu

sakkinen@tukki.jyu.fi (Markku Sakkinen) (07/27/89)

In article <11336@brunix.UUCP> jps@cs.brown.edu (John Shewchuk) writes:
>
>Assume you have a data structure that contains a graph of the form:
>
>[ Picture: root A, vertices A-B, A-C, B-C ]
Apparently _directed_ graphs are meant (but it does not make a big difference).

>and you want to do a depth first traversal.  If you have visited A and
>C and are now visiting B, then you need to know if you seen C before.
>Normally, you would associate a 'mark' with each node to indicate
>whether the node was visited- simply examine C to see if the marked
>bit is set.  Unfortunately, this means only one iterator can be
>traversing the graph at a time.  To fix this you can create a friend
>class (called graph_iterator or something) that stores the marks and
>the current position.
>
>A problem arises when storing the marks- you must make an association
>between the nodes and the mark.  I do not see any inexpensive way to
>do this.  Hashing takes too much time and space and because the key to
>a node is an address, this rules out the obvious direct storage
>techniques.
>
>Is there a way to do this that uses unit space and unit access time?
>Any suggestions for alternatives?

The question is not in the most appropriate group,
since the actual problem does not pertain to any programming language.
The solution may however be neatest to implement in a language with
good data abstraction facilities (but that includes CLU, Ada, Modula-2, ...).

The key point: you want several iterators to traverse the graph
at the same time. This certainly implies that the graph (structure,
if not the contents of the nodes) must be constant, at least as long
as there are active iterators. Hence a very simple solution suggests itself:
1. Do one traversal that constructs the linearised list of nodes
   (or in C++ better the vector of their addresses).
2. Each iterator must then only traverse the linear list.

If you want to update the graph structure sometimes (when there are no
iterators in the midst of it), you could consider e.g. the following:
Put into each graph object also a pointer to the corresponding linear list.
If updates are expected to happen often, you can use a lazy evaluation
tactic by constructing the linear list only when the first iteration
(after an update) is requested.

With some effort, you can do the above completely "behind the scenes"
w.r.t. the users of graphs and iterators. Essentially the same principle
of two (or more) alternative underlying representations of an abstract
datatype has often been suggested for complex numbers: have both the
ordinary and the polar representation, use the more expedient one for
each operation, and convert when necessary. As a difference from that case,
in the current problem we need only one-way conversions (from graph
to linear).

Markku Sakkinen
Department of Computer Science
University of Jyvaskyla (a's with umlauts)
Seminaarinkatu 15
SF-40100 Jyvaskyla (umlauts again)
Finland

sakkinen@tukki.jyu.fi (Markku Sakkinen) (07/27/89)

In article <1056@tukki.jyu.fi> I wrote:
> ...
>>[ Picture: root A, vertices A-B, A-C, B-C ]
                     ========
Pardon, of course it should have read _edges_.
Sometimes fingers are faster than thoughts.

Markku Sakkinen
Department of Computer Science
University of Jyvaskyla (a's with umlauts)
Seminaarinkatu 15
SF-40100 Jyvaskyla (umlauts again)
Finland

jack@odi.com (Jack Orenstein) (07/27/89)

In article <11336@brunix.UUCP>, jps@cs.brown.edu (John Shewchuk) writes:
> 
> Assume you have a data structure that contains a graph of the form:
> 
> 
>    +---+ 
>    | A | <- Root
>    +---+
>      |\
>      | \
>      |  \
>      |  +---+
>      |  | B |
>      |  +---+
>      |  /
>      | /
>    +---+
>    | C |
>    +---+
> 
> and you want to do a depth first traversal.  If you have visited A and
> C and are now visiting B, then you need to know if you seen C before.
> Normally, you would associate a 'mark' with each node to indicate
> whether the node was visited- simply examine C to see if the marked
> bit is set.  Unfortunately, this means only one iterator can be
> traversing the graph at a time.  To fix this you can create a friend
> class (called graph_iterator or something) that stores the marks and
> the current position.
> 
> A problem arises when storing the marks- you must make an association
> between the nodes and the mark.  I do not see any inexpensive way to
> do this.  Hashing takes too much time and space and because the key to
> a node is an address, this rules out the obvious direct storage
> techniques.

This kind of problem arises whenever there are multiple iterators over
a single collection of objects. Some state has to be recorded for each
iteration.  It is tempting to put this state information in the object
being traversed, the graph in this case (a bit in each node), but it
has the drawback noted - that only one iteration is possible. The same
problem arises in iterating over something as simple as a vector in
which the state of an iteration is recorded in a pointer or subscript
associated with the vector.

A general solution is to record the state information in the iterator
object, as noted by jps. This can be done without the space and time
overhead of hashing. Instead, store multiple mark bits per node, and
have the iterator store information indicating which mark bit to use.
The trick is that the same bit (i.e. in the same relative position) is
used in each node.  Different simultaneous iterations will use
different mark bits.

More specifically, allocate multiple bits per node, either statically
or dynamically.  For simplicity, assume that each node has N
statically allocated mark bits, permitting up to N iterations. The
graph object, which encompasses all the nodes, records active
iterations, and rejects requests for new iterations if there are
already N iterations active. When an iteration is started, an integer
between 0 and N-1 is returned which serves to identify the iteration.
This integer is stored in the iterator object.  When iteration i
visits a node x, the ith mark bit of x is set.  At the end of
iteration i, the graph must be notified so that it can clear bit i in
each node and "reuse" iteration i to fulfill a new request.

This is more efficient than the hash table solution. The iterator
looks the same externally. Internally, there is one additional piece
of information communicated between the graph and the iterator - the
iteration identifier.



Jack Orenstein
Object Design, Inc.

vaughan@mcc.com (Paul Vaughan) (07/27/89)

Your problem is the classic "How do you associate new properties with
an existing data structure?" problem.  If the number of active
iterators is fixed and small, you might try just adding an array of
marks and allocating an index when a new iterator fires up.  If the
number of iterators is typically small, but varies for significant
periods of time, you might use dynamically allocated linked lists to
record the marks--searching down the list before each visit.  Any sort
of table will do the trick, sorted, hashed, linear, whatever.  
	One interesting case is where the graph is mostly hierarchical
and the inverse pointers are stored.  Here, it pays to check to see if
there is more than one inverse (parent) pointer before doing the
reconvergence check.  If not, you couldn't have been there before.  
	If your graph doesn't change very often but you do lots of
traversals, you might consider making an array of pointers by
traversing the graph once and storing the linear order in which you
traversed it in the array.  Then subsequent traversals need only march
down the array, and iterators become simple integers.
	In LISP, I implemented a macro for graph traversal that
compiled a function based on the options the user specified.  It could
handle any combination of depth, breadth, preoperations,
postoperations, mostly hierarchical, and searches (instead of
traverals).  The default reconvergence check used hash tables, but by
passing in appropriate functions, mark bits or other tables could be
used.  The macro accepted a reconvergence table object, an accessor
for the table and a modifier for the table.  The most common idiom was
to gensym a unique symbol, pass it in as the table, have the modifier
be a function that set a mark slot in the current node to be the
symbol (the modifier was called with the table and the current node as
arguments) and the accessor simply compared the table with the mark
slot of the current node.  The macro was arranged so that it didn't
involve any overhead for any of the features that weren't used.
Conceptually, if the various functions (children, parents,
reconvergence-table-accessor, reconvergence-table-modifier) were
inline functions there need not be any function call overhead, but
LISP doesn't really let you do that (and the compiler I was using
didn't do it for me).
	I've gone into some detail about the LISP macro I used because
I think it represents a form of code reuse that is almost impossible
to acheive in any other language.  I'd be interested if someone could
create such a facility for c++.  Graphs are one of the most
fundamental and widely used data structures, so such a thing would be
appropriate for a library.  Perhaps an extension to the gnu genclass
facility would be the easiest way to do it in c++.

 Paul Vaughan, MCC CAD Program | ARPA: vaughan@mcc.com | Phone: [512] 338-3639
 Box 200195, Austin, TX 78720  | UUCP: ...!cs.utexas.edu!milano!cadillac!vaughan