[comp.lang.functional] functional programming formulation of graph algorithms

corvara@ztivax.UUCP (Dr Gerd Venzl) (05/27/91)

Some months ago somewhere in the literature I found a statement like

   "Functional programming is not well suited for algorithms of graph
    theory as these usually make frequent use of side effects."

When I tried to write down a purely functional formulation of some
manipulations on binary decision diagrams, I found it indeed to be
quite a difficult and unnatural task. Are there really fundamental
reasons for these difficulties to arise, or is my view and understanding
of graph algorithms just too imperative? Are there any published attempts
to give functional formulations of graph algorithms? Any opinions from
netland?

Please reply by email, I will summarize if there is sufficient feedback.

Torsten Roessel
Siemens AG, Munich (FRG)
InterNet: roessel@ztivax.siemens.com

reddy@reddy.cs.uiuc.edu (Uday S. Reddy) (05/29/91)

In article <5332@ztivax.UUCP> corvara@ztivax.UUCP (Dr Gerd Venzl) writes:

   Some months ago somewhere in the literature I found a statement like

      "Functional programming is not well suited for algorithms of graph
       theory as these usually make frequent use of side effects."

I am curious where exactly you found that quote.  

In any case, the reason graph algorithms cannot be efficiently
implemented in functional languages is that they involve "shared
updates".  Depth-first search, for instance, involves updating tags at
vertices to keep track of which vertices have been visited.
"Modifying" a tag means making a new graph which is same as the old
one except that one of its tags is changed.
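
To see the plumbing this forces, here is a minimal sketch (vertex type
and association-list representation chosen only for brevity, not taken
from any particular source) of a depth-first search in which the visited
set is threaded through the recursion instead of being updated in place:

    fun member (x, [])    = false
      | member (x, y::ys) = x = y orelse member (x, ys)

    (* successors of v in a graph given as an association list
       from vertex to successor list *)
    fun successors (v, [])            = []
      | successors (v, (u, ss)::rest) = if u = v then ss else successors (v, rest)

    (* dfs returns the visited list extended with everything reachable from v *)
    fun dfs graph (v, visited) =
        if member (v, visited) then visited
        else visit graph (successors (v, graph), v :: visited)
    and visit graph ([], visited)    = visited
      | visit graph (w::ws, visited) = visit graph (ws, dfs graph (w, visited))

    (* dfs [(1,[2,3]), (2,[3]), (3,[])] (1, []) evaluates to [3,2,1] *)

Every call receives and returns the entire visited structure; that is
exactly the shared update which the imperative version performs in place.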

The simplest way to handle these in existing pure functional languages
is to (abstractly) represent a graph as a finite map

	Vertex -> List(Edge)

Such maps can be implemented, in turn, by balanced search trees of
some kind to allow for (nondestructive) updating.  Of course, if you
are using an "impure" functional language with assignments, you can
use them to get what you want.  But, you should be careful not to
create awful side effects.
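
To make "nondestructive updating" concrete, here is a sketch of such a
map (an unbalanced binary search tree for brevity, with int keys standing
in for vertices; a balanced variant gives the usual O(log n) bounds).
Inserting rebuilds only the path from the root down to the affected key
and shares all other subtrees with the old map, so the old version stays
intact:

    datatype 'a map = Empty | Node of 'a map * int * 'a * 'a map

    fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
      | insert (Node (l, k', v', r), k, v) =
        if k < k' then Node (insert (l, k, v), k', v', r)
        else if k > k' then Node (l, k', v', insert (r, k, v))
        else Node (l, k, v, r)          (* replace the existing binding *)

    fun lookup (Empty, k) = NONE
      | lookup (Node (l, k', v, r), k) =
        if k < k' then lookup (l, k)
        else if k > k' then lookup (r, k)
        else SOME v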

Researchers are working on notations for expressing sequentiality
constraints so that compilers can update data structures in place.
See the proceedings of the upcoming FPCA, for example.

Uday Reddy
--
Uday Reddy
Department of Computer Science
University of Illinois
1304 West Springfield Avenue
Urbana, IL 61801.

tmb@ai.mit.edu (Thomas M. Breuel) (05/30/91)

In article <REDDY.91May28195125@reddy.cs.uiuc.edu> reddy@reddy.cs.uiuc.edu (Uday S. Reddy) writes:

   Researchers are working on notations for expressing sequentiality
   constraints so that compilers can update data structures in place.
   See the proceedings of the upcoming FPCA, for example.

I think this is the wrong approach. Using data structures based on
reverse diffs (as in RCS), an update involving side effects:

   val new_graph = (Graph.update_add_edge(graph,from,to); graph)

is no more efficient (except for possibly a small, constant time
overhead) than a more functional formulation:

   val new_graph = Graph.with_edge(graph,from,to)

Therefore, the following claim doesn't quite hold:

      Some months ago somewhere in the literature I found a statement like

	 "Functional programming is not well suited for algorithms of graph
	  theory as these usually make frequent use of side effects."

   I am curious where exactly you found that quote.  

   In any case, the reason graph algorithms cannot be efficiently
   implemented in functional languages is that they involve "shared
   updates".

Since both "updates" and lookups can be carried out in constant time
even in a purely functional setting using data structures based on
reverse diffs, this is not true.

   Such maps can be implemented, in turn, by balanced search trees of
   some kind to allow for (nondestructive) updating.  Of course, if you
   are using an "impure" functional language with assignments, you can
   use them to get what you want.  But, you should be careful not to
   create awful side effects.

The main problem is that most purely functional languages do not
provide data structures based on reverse diffs as primitives. If you
don't have side-effects in your language and only have data structures
like "list", "tuple", and "vector" built in, you lose, of course. (But,
by analogy, if you don't have "vector" in your language, you don't
get constant time lookups, so the fault is not with the functional
programming paradigm, but with the absence of the right primitive
data type.)

However, in a language like SML, you can isolate the side effects in
a small, abstract module and write your graph algorithms happily,
side-effect free, and efficiently.

						Thomas.

jwb@cepmax.ncsu.edu (John W. Baugh Jr.) (05/30/91)

Thomas M. Breuel writes about "reverse diffs":

> I think this is the wrong approach. Using data structures based on
> reverse diffs (as in RCS), an update involving side effects:
> 
>    val new_graph = (Graph.update_add_edge(graph,from,to); graph)
> 
> is no more efficient (except for possibly a small, constant time
> overhead) than a more functional formulation:
> 
>    val new_graph = Graph.with_edge(graph,from,to)

I've never heard of reverse diffs.  Could you please give some examples
and/or references?  Also, can reverse diffs deal effectively with
other structured data types (like matrices: constant access & constant
update)?

John Baugh
jwb@cepmax.ncsu.edu

reddy@reddy.cs.uiuc.edu (Uday S. Reddy) (05/30/91)

In article <TMB.91May29174602@volterra.ai.mit.edu> tmb@ai.mit.edu (Thomas M. Breuel) writes:

      Researchers are working on notations for expressing sequentiality
      constraints so that compilers can update data structures in place.
      See the proceedings of the upcoming FPCA, for example.

   I think this is the wrong approach. Using data structures based on
   reverse diffs (as in RCS), an update involving side effects:

      val new_graph = (Graph.update_add_edge(graph,from,to); graph)

   is no more efficient (except for possibly a small, constant time
   overhead) than a more functional formulation:

      val new_graph = Graph.with_edge(graph,from,to)
----------

I just forgot how touchy this subject is.  I cannot get into a long
debate at this time, but some clarifications.

I don't recall recommending "side effects" of any kind.  The work I
referred to in the above quote uses pure functional languages.
Additional sequentiality annotations are added so that compilers can
implement them efficiently.

I wasn't referring to the work of Swarup, Ireland and me, which also
happens to appear in the upcoming FPCA.  Our work uses a different
approach with explicit assignments.  But, it too does not involve
"side effects".

If you believe there is no problem in dealing with updates in
functional languages, take a standard textbook on algorithms and
write the interesting algorithms in a functional language using your
favorite techniques.  ("Interesting" means using updates in an
interesting way.)  Then answer the following questions:

- Are the data structures represented as abstractly as in the
imperative programs?

- Do these data structures use the language's storage management or do
you need to manage the storage explicitly?

- Are the algorithms as efficient as the imperative versions? 

- Are they as "convenient" as the imperative versions?  For example,
do they involve passing an inordinate number of arguments while the
corresponding imperative versions don't?

If you are happy with your answers to all these questions, publish
your results!

--
Uday Reddy
Department of Computer Science
University of Illinois
1304 West Springfield Avenue
Urbana, IL 61801.

tmb@ai.mit.edu (Thomas M. Breuel) (05/30/91)

In article <1991May29.222935.13101@ncsu.edu> jwb@cepmax.ncsu.edu (John W. Baugh Jr.) writes:

   [Reddy, suggesting the addition of "sequence" primitives to allow
   a compiler to generate in-place updates rather than copies when
   modifying an array, even in a purely functional language]

   > I think this is the wrong approach. Using data structures based on
   > reverse diffs (as in RCS), an update involving side effects [...]
   > is no more efficient (except for possibly a small, constant time
   > overhead) than a more functional formulation.

   I've never heard of reverse diffs--could you please give some examples
   and/or references.  Also, can reverse diffs deal effectively with
   other structured data types (like matrices: constant access & constant
   update)?

At the end, you find some old SML code (sorry about the style, it was
just a quick initial hack for playing around with the idea) that
should illustrate the point. The primitives you can use to operate on
INTDICT arrays are completely functional: sub, assoc, length, and
generate.  From the programmer's point of view, there is no way to
modify an INTDICT, although internally, the implementation uses
side-effects for efficiency.

The code gives you constant-time update and access to the latest
version of the array. If you try to update older versions, you pay
more overhead.  For converting code containing side-effects to a
functional style, it doesn't matter that updates of old versions take
longer, since code written using side-effects will by necessity only
access and update the latest version of an array. For other purposes,
there are more complicated, tree-based mechanisms for keeping diffs
that give you log-time update and log-time retrieval for old versions.
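
For comparison, here is a sketch of a tree-based representation that
reaches those bounds without keeping diffs at all (so it is not that
diff-based mechanism, only an illustration that log-time access to every
version is attainable purely functionally): a binary trie keyed on the
bits of the index, with 1-based indices.

    datatype 'a trie = Leaf | Node of 'a option * 'a trie * 'a trie

    (* both operations cost O(log i): the search path follows the bits of i *)
    fun get (Leaf, _)           = NONE
      | get (Node (v, _, _), 1) = v
      | get (Node (_, l, r), i) =
        if i mod 2 = 0 then get (l, i div 2) else get (r, i div 2)

    fun set (Leaf, 1, x) = Node (SOME x, Leaf, Leaf)
      | set (Leaf, i, x) =
        if i mod 2 = 0 then Node (NONE, set (Leaf, i div 2, x), Leaf)
        else Node (NONE, Leaf, set (Leaf, i div 2, x))
      | set (Node (v, l, r), 1, x) = Node (SOME x, l, r)
      | set (Node (v, l, r), i, x) =
        if i mod 2 = 0 then Node (v, set (l, i div 2, x), r)
        else Node (v, l, set (r, i div 2, x))

set never touches the trie it is given, so every version remains usable
at the same cost; the price is that even the latest version pays O(log i)
instead of the constant time of the reverse-diff scheme.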

The technique can be applied to any of the data structures used in
imperative languages.

As an efficiency hack, and to assert that you only ever want to use
the latest version of an array (as is the case for converted imperative
programs), you could also simply tag old versions as completely invalid
and raise an exception when an attempt is made to use them. I suspect
this would be equivalent to the "sequence" primitives that Reddy
mentions, but it does not require any language constructs beyond
what the functional subset of languages like SML already offers.
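
In code, the idea is roughly this (hypothetical names, independent of
the INTDICT implementation below):

    exception Stale

    datatype 'a version = LIVE of 'a array | STALE
    type 'a dict = 'a version ref

    (* an update invalidates the handle it was given; any later use of
       that stale handle raises Stale instead of silently observing the
       new contents *)
    fun update (d as ref (LIVE a), i, x) =
          (Array.update (a, i, x); d := STALE; ref (LIVE a))
      | update (ref STALE, _, _) = raise Stale

    fun sub (ref (LIVE a), i) = Array.sub (a, i)
      | sub (ref STALE, _)    = raise Stale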

For other tricks you can play with reverse diffs, look at the RCS
documentation.

						Thomas.

signature VALUETYPE =
    sig
	type T
	val makestring:T->string
    end

signature INTDICT =
    sig
	type T
	type dict
	val generate : (int -> T) -> int -> dict
	val sub : dict * int -> T
	val length : dict -> int
	val assoc : dict * int * T -> dict
	val print : dict -> unit
    end

functor ReverseDiffArray(T:VALUETYPE):INTDICT =
    struct
	type T = T.T
	val makestring = T.makestring
	(* A dict is a mutable pointer into a version chain: the newest
	   version points directly at the array (BASE); an older version is
	   a DIFF cell holding the overwritten value for one index plus a
	   pointer towards the newer version. *)
	datatype rdiff = BASE of (T array) | DIFF of (int * T * rdiff ref)
	type dict = rdiff ref
	    
	nonfix sub
	    
	(* generate f n builds an array of length n whose element i is f i *)
	fun generate f n = ref(BASE(ArrayU.generate f n))
	    
	fun sub(ref(BASE(x)),i:int) = Array.sub(x,i)
	  | sub(ref(DIFF(key,value,next)),i:int) =
	    if i = key then value else sub(next,i);
		
	fun length(ref(BASE(x))) = Array.length(x)
	  | length(ref(DIFF(_,_,x))) = length(x)
	    
	(* newest version: update the array in place, return a fresh BASE
	   handle, and turn the old handle into a DIFF cell remembering the
	   value it used to hold *)
	fun assoc(last:dict as ref(BASE(x)),key:int,value:T) =
	    let
		val old = Array.sub(x,key)
		val new = (Array.update(x,key,value); ref(BASE(x)))
		val back = ref(DIFF(key,old,new))
	    in
		last := !back; new
	    end
	  (* older version: no cheap trick available, fall back to building
	     a full copy of the array as seen through the diff chain *)
	  | assoc(last:dict,key:int,value:T) =
	    let
		val n = length(last)
		val x = Array.array(length(last),sub(last,0))
		fun loop i = if i>=n then () else (Array.update(x,i,sub(last,i)); loop(i+1))
	    in
		loop 0; Array.update(x,key,value); ref(BASE(x))
	    end
	
	fun print(a:dict) =
	    let
		fun p(x) = output(std_out,x)
		val n = length(a)
		fun loop i =
		    if i>=n then ()
		    else (p(makestring(sub(a,i))); if i<n-1 then p " " else (); loop(i+1))
	    in
		p "<"; loop 0; p ">\n"
	    end
    end

structure IID = ReverseDiffArray(struct type T = int val makestring:T->string = makestring end);

- open IID;
open IID
- val a = generate (fn x => x) 10;
val a = ref (BASE prim?) : dict
- print a;
<0 1 2 3 4 5 6 7 8 9>
val it = () : unit
- val b = assoc(a,5,55);                   <-- constant time update
val b = ref (BASE prim?) : dict
- print b;
<0 1 2 3 4 55 6 7 8 9>
val it = () : unit
- print a;
<0 1 2 3 4 5 6 7 8 9>
val it = () : unit
- val c = assoc(b,6,66);                   <-- constant time update
val c = ref (BASE prim?) : dict
- print c;
<0 1 2 3 4 55 66 7 8 9>
val it = () : unit
- print b;
<0 1 2 3 4 55 6 7 8 9>
val it = () : unit
- print a;
<0 1 2 3 4 5 6 7 8 9>
val it = () : unit
- val d = assoc(a,3,33);                   <-- causes full copy (could be more efficient)
val d = ref (BASE prim?) : dict
- print d;
<0 1 2 33 4 5 6 7 8 9>
val it = () : unit
- print a;
<0 1 2 3 4 5 6 7 8 9>
val it = () : unit
- 

wg@opal.cs.tu-berlin.de (Wolfgang Grieskamp) (06/02/91)

tmb@ai.mit.edu (Thomas M. Breuel) writes:

 > [INTDICT "reverse-diff" implementation, cf. original posting]

Nice demonstration of the "clean" use of references in SML.  Although I
believe updatable arrays should be built in or implemented in a foreign
language anyway, and a compiler can do the job slightly better than the
SML programmer if it implements reference counting and/or performs
sharing analysis.

In the first case (RC) the generated code might check at run-time
if the object to update is "exclusive", i.e. only one reference
exists to it. If so, no "diffs" have to be created. In the second case,
static analysis detects that a given object is always exclusive 
in some context.
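
The run-time check can be pictured roughly like this (an SML sketch in
which an explicit counter stands in for the reference count that the
compiler or run-time system would maintain on its own):

    datatype 'a rcarray = RC of int ref * 'a array

    (* a compiler doing this analysis would insert the share calls itself
       wherever a new reference to the array is created *)
    fun share (RC (n, a)) = (n := !n + 1; RC (n, a))

    fun update (RC (n, a), i, x) =
        if !n = 1
        then (Array.update (a, i, x); RC (n, a))      (* exclusive: in place *)
        else let val b = Array.tabulate (Array.length a, fn j => Array.sub (a, j))
             in n := !n - 1; Array.update (b, i, x); RC (ref 1, b) end

If the check succeeds, the update costs the same as the imperative one;
only genuinely shared arrays pay for a copy.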

In any case one has to find a sequentialization order which 
minimizes sharing. Consider an expression which swaps two values of 
an array:

	assoc(assoc(A,j,A sub i),i,A sub j)

This, then, may be expressed by the "parallel" let (I hope "and" 
does something like this in SML; I don't have the manual at hand):

	let val xi = A sub i
	    and A1 = assoc(A,j,xi)
	    and xj = A sub j
	    and A2 = assoc(A1,i,xj)
	in  A2
	end

Textual sequentialization leads to the result that A is shared during 
the first call to assoc.  But a simple heuristic can find the desired
ordering (i.e. first performing the sub applications, then the assoc
applications).  Let each parameter of a function be marked as either
"destructive" (the function might selectively update the parameter) or
"non-destructive" (the function definitely will not).  This information
will be pre-given for primitive functions and inherited by user-defined
functions.

Naturally, "assoc" is destructive in it's array argument, and sub is not 
(either pre-given or by analysis). The above parallel let will be
sequentialized performing first the non-destructive sub applications and 
then the destructive assoc applications.

To conclude, selective update is not _such_ an efficiency penalty in
functional languages, provided your compiler really takes care of
it. Unfortunately, most functional language compilers I know of
won't (except the OPAL compiler).

--
Wolfgang Grieskamp 
wg@opal.cs.tu-berlin.de tub!tubopal!wg wg%opal@DB0TUI11.BITNET