[comp.lang.prolog] Wierd Topological Sort

markb@sdcrdcf.UUCP (Mark Biggar) (04/13/88)

I have a problem where I need to do a topological sort on the nodes of a
directed graph.  This would be simple except that the graphs can contain
cycles.   It is the nature of the problem then each node has only ONE
edge leaving it, but may have several edges leading to it.  This means
that any connected subgraph can have at most one cycle in it.  I wish to
treat these cycles as single nodes in the sorted list.

So if I have a list of directed edges like:

[d(e,a),d(b,c),d(f,b),d(a,c),d(d,e),d(c,e)]

I get the result:

[[e,a,c],b,f,d]  (or some other permutation that is a valid topological sort)

The graphs I am working with may have several disjoint pieces (each of which
may contain a cycle).

Mark Biggar
{allegra,burdvax,cbosgd,hplabs,ihnp4,akgua,sdcsvax}!sdcrdcf!markb
markb@rdcf.sm.unisys.com

ok@quintus.UUCP (Richard A. O'Keefe) (04/14/88)

In article <5224@sdcrdcf.UUCP>, markb@sdcrdcf.UUCP (Mark Biggar) writes:
> I have a problem where I need to do a topological sort on the nodes of a
> directed graph.  This would be simple except that the graphs can contain
> cycles.   It is the nature of the problem then each node has only ONE
> edge leaving it, but may have several edges leading to it.  This means
> that any connected subgraph can have at most one cycle in it.  I wish to
> treat these cycles as single nodes in the sorted list.
> 
> So if I have a list of directed edges like:
> 
> [d(e,a),d(b,c),d(f,b),d(a,c),d(d,e),d(c,e)]
> 
> I get the result:
> 
> [[e,a,c],b,f,d]  (or some other permutation that is a valid topological sort)
> 
> The graphs I am working with may have several disjoint pieces (each of which
> may contain a cycle).
> 
There are two parts to the question:  how do you do this at all, and how
do you do it in Prolog.  The Prolog part isn't terribly interesting, there
is code for topological sort and union/find already drifting around.  So
I'll concentrate on the first part.

Such a graph is a sort of forest, except that some of the "trees" have
a ring as their root instead of a single node.

The best method I can think of is to adapt the usual Depth-First-Search
method for determining strongly connected components.

    for Node in nodes(Graph) do visited(Node) := false od
    Order := []
    for Node in nodes(Graph) do if not visited(Node) then
	Stack := []	% all the nodes reachable from Node
	N := Node
	while not visited(N) and there is an arc N->P do
	    visited(N) := true
	    Stack := [N|Stack]
	    N := P
	od
	if visited(N) then
	    %  either we've found a cycle, or a previously processed path
	    if member(N, Stack) then
		%  it's a cycle
		Cycle := []
		repeat Cycle,Stack := [head(Stack)|Cycle], tail(Stack)
		until head(Cycle) = N
		Order := [[Cycle]|Order]
	    fi
	else
	    % we've found a plain root
	    visited(N) := true
	    Stack := [N|Stack]
	fi
	Order := append(Stack, Order)
    fi od
    Order := reverse(Order)

Interestingly enough, we can use ordinary lists for Stack, Cycle, and Order.
If the member(N, Stack) test takes k steps, so what?  It must have taken at
least k steps to make Stack in the first place, so using ordinary member/2
won't spoil the asymptotic efficiency.  The trick is representing visited()
and the collection of arcs.  That's what will make this an O(V.lgV) method
in Prolog (V being the number of vertices) rather than an O(V) method.

Caveat:  I haven't tested this code.

markb@sdcrdcf.UUCP (Mark Biggar) (04/22/88)

In article <5224@sdcrdcf.UUCP> I write:
>I have a problem where I need to do a topological sort on the nodes of a
>directed graph.  This would be simple except that the graphs can contain
>cycles.   It is the nature of the problem then each node has only ONE
>edge leaving it, but may have several edges leading to it.  This means
>that any connected subgraph can have at most one cycle in it.  I wish to
>treat these cycles as single nodes in the sorted list.
>
>So if I have a list of directed edges like:
>
>[e(e,a),e(b,c),e(f,b),e(a,c),e(d,e),e(c,e)]
>
>I get the result:
>
>[[e,a,c],b,f,d]  (or some other permutation that is a valid topological sort)
>
>The graphs I am working with may have several disjoint pieces (each of which
>may contain a cycle).

First off, thanks to everyone you answered my article.

Given the restrictions on the graphs as stated above,  the graphs can be
thought of as a forest of trees some of which are rooted in a set of
cycles.  So the basic idea is to prune the trees from the leaves to the
roots and then handle the cycles.  So my first shot at it looked like:

    topsort(List,Slist) :-
	leaves(List,List,Nodes,Rest),       % separate out the leaves
	(Nodes = [] ->                      % if there weren't any...
	    cycles(Rest,Slist)              %   do the cycles
	;                                   % else
	    topsort(Rest,Tlist),            %   topsort the rest of
	    append(Tlist,Nodes,Slist)       %   and append it on the front
	).

    leaves(_,[],[],[]).                     % empty list has no leaves
    leaves(List,[e(Tail,Head)|Edges],[Tail|Nodes],Rest) :-
	    not(member(e(_,Tail),List)),    % a leaf if no one points at it
	    leaves(List,Edges,Nodes,Rest).  %  find the rest
    leaves(List,[Edge|Edges],Nodes,[Edge|Rest]) :-
	    leaves(List,Edges,Nodes,Rest).  % skip over ono-leaves

This ignores handling the cycles for now.

This implementation has two problem with: the first is that it is wrong
in that it looses the root nodes of trees which are not rooted in a cycle.
Second I don't like the append.

The first problem has two reasonable solutions:

1) Notice and handle root nodes as the last edge leading to that root node
    is removed.  But that requires upto two extra passes over the list of
    nodes for each edge that is removed, one to see if the head of the edge
    is a root node and one to see if this edge is the last one pointing at
    that root node.  So I rejected this solution.

2) Trees that are not rooted in a cycle can be made rooted.  In the
    original data edges like e(a,a) are not possible, so I can make one
    pass over the original data and add such an edge for each unrooted
    tree and then handle them at the same time I handle the cycles.

The append can be gotten rid of by noticing that I don't have to collect
the leaves in bunches, I can prune tham one at a time and still get a
valid topsort.  And by borrowing "the build this list up as I pass the
arguments down" trick (from the non-naive reverse) I get the following:

    topsort([],[]).
    topsort(Edges, Nodes) :-
	root_trees(Edges,Edges,RootedEdges),
	prune_leaves(RootedEdges,SortedLeaves,[],Residue),
	cycles(Residue,SortedLeaves,Nodes).

    root_trees(_,[],[]).
    root_trees(Edges,[e(T,H)|Rest],[e(T,H)|RootedRest]) :-
	member(e(H,_),Edges),                   % not a root
	root_trees(Edges,Rest,RootedRest).      % so skip it
    root_trees(Edges,[e(T,H)|Rest],[e(H,H),e(T,H)|RootedRest]) :-
	root_trees(Edges,Rest,RootedRest).      % add simple cycle

    prune_leaves(Edges,OutList,InLIst,Residue) :-
	first_leaf(Edges,Edges,Leaf,Others),
	prune_leaves(Others,OutList,[Leaf|InList],Residue).
    prune_leaves(Edges,Leafs,Leafs,Edges).

    first_leaf(Edges,[e(T,H)|Rest],Leaf,[e(T,H)|Others]) :-
	member(e(_,T),Edges),
	first_leaf(Edges,Rest,Leaf,Others).
    first_leaf(Edges,[e(T,H)|Rest],T,Rest).
    % note first_leaf is meant to fail on first_leaf(_,[],_,_).
    % which of course means I didn't find any leaves

    cycles([],X,X).
    cycles([e(T,T)|Rest],IList,[T|OList]) :-
	cycles(Rest,IList,OList).               % handle the simple cycles
    cycles([e(T,H)|Rest],IList,[[H|Nodes]|OList]) :-
	prune_leaves(Rest,Nodes,[],Residue),    % I,ve broken the cycle and
	cycles(Residue,IList,OList).            % can use prune_leaves to
						% get the rest of the cycle

I would like any advice I can get on how to make this better.

Mark Biggar
{allegra,burdvax,cbosgd,hplabs,ihnp4,akgua,sdcsvax}!sdcrdcf!markb
markb@rdcf.sm.unisys.com