[comp.lang.modula2] Recursively defined data types.

markh@csd4.milw.wisc.edu (Mark William Hopkins) (02/15/88)

In an earlier posting, I asked why Modula2 does not allow direct recursion
on data types as exemplified by the following definition:

	     List  =  RECORD CASE Null:BOOLEAN OF
				  TRUE:          |
				  FALSE: Head: Atomic-Type;
					 Tail: List
                             END
                      END;

Most of the answers I got were of the form:

          Because the compiler does not know how to "size" this type.

The obvious answer to this is to represent the recursive field (Tail) internally
by a pointer -- a language and its implementation are two independent issues.

So why do I even make an issue about it in the first place?  Using pointers and 
allowing for direct recursion in a language are fully equivalent.  Direct 
recursion would be implemented with pointers anyway.  My question, then, was 
why using pointers is favored over direct recursion.

It seems more or less like a matter of choice, since they essentally do the same
thing.  I had other things in mind, though,  that tend to sway the issue in 
favor of using recursion.

A good programming language, from what I can infer from Wirth's writings should
support modularity and should be powerful while maintaining uniformity. 

UNIFORMITY would require that as many distinct concepts as possible be unified 
and implemented in a uniform way.  For example, the following sets might be 
unified in a uniform programming language:

     BRANCHING:                        RECURSION:
     Variant Records (Type sums),      Dynamic Types,
     Branching statements (If, Case)   Loops and recursive procedues,
                                       Recursive functions

     PARALLELISM:                      DEFINITIONS:
     Records  (Type products),         Type declarations,
     Parallel execution,               Procedure declarations,
     Parallel evaluation               Assignment statements

Uniformity would demand direct recursion on data types in place of pointers.

One thing that MODULARITY would require is that the difference between 
user-defined and predefined constructs be made invisible to the language.  This
would give the designer wide latitude in implementing (and ***extending***) the
language and would allow for bootstrapping.

Wirth has gone part of the way in adding a new data type in Modula2 to represent
procedures.  The low-level module SYSTEM also has a type to represent words.
But what about Types?  Why isn't there a data type - Type - in the language?
With this addition, one could construct variable types (and solve the problem 
that Pascal and Modula2 have had with representing variable array types) and
functions that return types (i.e. user-defined type constructors).  This is
POLYMORPHISM.  
This, too, would favor direct recursion in favor of indirect recursion with
pointers.

This latter point brings up a side issue: would polymorphism in Modula2 
resolve the need for such things as Formal Types (or conformant array parameters
in Pascal)?

gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) (02/15/88)

In article <4593@uwmcsd1.UUCP>, markh@csd4.milw.wisc.edu (Mark William Hopkins) writes:
> So why do I even make an issue about it in the first place?  Using pointers and 
> allowing for direct recursion in a language are fully equivalent.  Direct 
> recursion would be implemented with pointers anyway.  My question, then, was 
> why using pointers is favored over direct recursion.

It is not true that they are equivalent.  Recursion can be used only
to express trees:  it cannot represent DAGS or data structures with
cycles.

Now we can get into how functional programming doesn't "use" any
time or space at all, so I suppose we could always "cover" one of
these data structures by potentially large sets of trees. 
I don't believe this is a very good way to express such things, and
the implementation (deep down we are interested in implementation)
can often take orders of complexity more time and space.

Besides, any notation I've seen for recursive data types 
(cf. Hoare's "Recursive Data Types") has notational baggage for
dealing with and setting the variants that is equivalent to using
pointers and "new" and "dispose".

> Wirth has gone part of the way in adding a new data type in Modula2 to represent
> procedures.  The low-level module SYSTEM also has a type to represent words.
> But what about Types?  Why isn't there a data type - Type - in the language?
> With this addition, one could construct variable types (and solve the problem 
> that Pascal and Modula2 have had with representing variable array types) and
> functions that return types (i.e. user-defined type constructors).  This is
> POLYMORPHISM.  
> This, too, would favor direct recursion in favor of indirect recursion with
> pointers.
> 
> This latter point brings up a side issue: would polymorphism in Modula2 
> resolve the need for such things as Formal Types (or conformant array parameters
> in Pascal)?

Polymorphism is a current research topic.  It is too bad that Wirth
has chosen to ignore this very important issue in programming
language design, but it is far from a trivial change (as implied 
above) to bolt it into the language.

Whether or not polymorphism eliminates formal parameter specifications
depends of the flavour of polymorphism we are talking about.  In ML
this is true to some extent.  In the language I am involved in
designing and implementing (ForceTwo) formal parameter specifications
continue to be very important.  There is a paper "Polymorphism in
the compiled programming language ForceOne" in the 1987 Hawaii
Conference on Systems Sciences, or you can ask me for a more recent
tech. report.
-- 
Gordon V. Cormack     CS Dept, University of Waterloo, Canada N2L 3G1
                      gvcormack@waterloo  { .CSNET or .CDN or .EDU }
                      gvcormack@water         { UUCP or BITNET }

carroll@snail.CS.UIUC.EDU (02/16/88)

	I still don't understand how you would reference a recursive
data element that was far down in the structure. How would you parametrize
the reference to, say, the 10th data field of a recursively defined type?
Would you have to type in 10 "." operators, with the field names
between them? And how would you handle variable depth references?

Alan M. Carroll		amc@woodshop.cs.uiuc.edu	carroll@s.cs.uiuc.edu
...{ihnp4,convex}!uiucdcs!woodshop!amc
Quote of the day :
	"Touch my soul, catch the very light
	 Hide the moment, from my eager eyes"

markh@csd4.milw.wisc.edu (Mark William Hopkins) (02/17/88)

In article <5158@watdragon.waterloo.edu> gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) writes:
>In article <4593@uwmcsd1.UUCP>, markh@csd4.milw.wisc.edu (Mark William Hopkins) writes:
>> So why do I even make an issue about it in the first place?  Using pointers and 
>> allowing for direct recursion in a language are fully equivalent.  Direct 
>> recursion would be implemented with pointers anyway.  My question, then, was 
>> why using pointers is favored over direct recursion.
>
>It is not true that they are equivalent.  Recursion can be used only
>to express trees:  it cannot represent DAGS or data structures with
>cycles.

I argue that they are fully equivalent:

Example of how a recursive data type would be used to represent a cyclic 
structure:

type     List = RECORD CASE Null:BOOLEAN OF
			 TRUE:             |
			 FALSE: Data:Atom;
				Next:List
                       END
                END;
...
var      L : List;
...

		Here, L is representing the circular list:

		       L ~~~>   A --> B --> C -* ,  L  =   *---------------*
				^              |           |       A       |
				|              |           *---------------*
				*--------------*           | *-----------* |
<ASSIGNING L ITS VALUE>                                    | |     B     | |
WITH L DO                                                  | *-----------* |
   Null := FALSE;                                          | | *-------* | |
   Data := A;                                              | | |   C   | | |
   WITH Next DO                                            | | *-------* | |
      Null := FALSE;                                       | | |   L   | | |
      Data := B;                                           | | *-------* | |
      WITH Next DO                                         | *-----------* |
         Null := FALSE;                                    *---------------*
         Data := C;
         Next := L       (***** This closes the cycle *****)
      END
   END
END

This is if Modula-2 had data type recursion in it.

I guess I don't see the problem ... except if it be in assigning the last Next
field the value of the very record being accessed.

It is obvious that one would have to use internal pointers to represent the Next
field, but other than that there does not seem to be any major problem here.

markh@csd4.milw.wisc.edu (Mark William Hopkins) (02/17/88)

In article <11100002@snail> carroll@snail.CS.UIUC.EDU writes:
>
>	I still don't understand how you would reference a recursive
>data element that was far down in the structure. How would you parametrize
>the reference to, say, the 10th data field of a recursively defined type?
>Would you have to type in 10 "." operators, with the field names
>between them? And how would you handle variable depth references?

If Modula-2 had data type recursion, then roughly like this:
List would be defined by:
	
                  List = RECORD CASE Null:Boolean OF
				   TRUE:              |
				   FALSE: Data:Atom;
					  Next:List
                                END
                         END;

<ACCESING THE TENTH ELEMENT OF THE LIST L:List>

ReadHead is declared to be of type List:

		    ReadHead := L;
		    FOR I := 1 TO 9 DO
			 ReadHead := ReadHead.Next
                    END;
		    RETURN ReadHead.Data;

sullivan@marge.math.binghamton.edu (fred sullivan) (02/17/88)

I first dealt with tree structures using snobol-4, which uses recursively
defined types.  I was very disappointed when I learned about how one does
these things in Pascal.  By the time modula came along, I was over the
shock.  One reason for wanting recursively defined types is that their
semantics is much cleaner than pointer semantics.  I approve of the idea.

Fred Sullivan
Department of Mathematical Sciences
State University of New York at Binghamton
Binghamton, New York  13903
Email: sullivan@marge.math.binghamton.edu

kasper@csli.STANFORD.EDU (Kasper Osterbye) (02/18/88)

Dear Mark,

Recursive datatypes are not pointers in disguise. If I am not wrong, they
will only represent trees - and here is why.

I will assume your definition of a list.

List = RECORD CASE Null:BOOLEAN OF
         TRUE : |
         FALSE: Key: ...;
                Data: ....;
                Next: List;
       END


now assume that we have variables l1,l2 of type "List", and that l1 is (a,b,c).
the magic of the concept recursive datatypes is that it is not pointers so
when I do a 

l2 := l1;

l2 gets assigned a COPY of l1, and if you change anything to l1, it will not
affect l2.

further more, an assignment of the form:

l1.next.next := l1

will not give you a cycle, but rather the list (a,b,a,b,c). 

That is at least how I learned recursive types. As to your changed syntax
for pointers so they pretend they are recursive, I dislike it, it looks
different, but you will still have to learn the concept of pointers to
use it.
The why not use pointers afterall?

/Kasper

BEB@UNO.BITNET (Bruce Bettis) (02/19/88)

The concept of recursively defined data types is, I think, intuitively clean
and elegant, but only from a theoretical perspective. The cold hard realities
are: (1) the size of a static or dynamic allocation is impossible to determine
with the same elegance, (2) each recursive structure would be contiguous in
memory, with all the constraints that implies, (3) assignments require more
(sometimes *lots* more) resources, namely cpu cycles and memory. These are
the three biggies, to me, I imagine there are lots of other practical reasons
to prefer referential structures. Still, it's a nice concept :)

                                  Bruce
death before disclaimer
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>Handle:   Bruce Bettis         USnail:   University of New Orleans  <>
<>BITnet:   <BEB@UNO.BITNET>               Computer Research Center   <>
<>Voices:   (504) 286-7067                 New Orleans, La. 70148     <>
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

dcw@doc.ic.ac.uk (Duncan C White) (02/25/88)

I have been following the current discussion about recursive data types
[RDTs from now on] with great interest.

I would like to add my thoughts to the matter.

In article <2420@csli.STANFORD.EDU> kasper@csli.UUCP (Kasper Osterbye) writes:
>Dear Mark,
>
>RDTs are not pointers in disguise. If I am not wrong, they
>will only represent trees - and here is why.
>
>I will assume your definition of a list.
>
>List = RECORD CASE Null:BOOLEAN OF
>         TRUE : |
>         FALSE: Key: ...;
>                Data: ....;
>                Next: List;
>       END
>
>
>now assume that we have variables l1,l2 of type "List", and that l1 is (a,b,c).
>the magic of the concept RDTs is that it is not pointers so
>when I do a 
>
>l2 := l1;
>
>l2 gets assigned a COPY of l1, and if you change anything to l1, it will not
>affect l2.
>

Why not support TWO kinds of assignment :

1).	assign-with-copy:	creates an entirely new copy, as you suggest
			[ie. what you are assuming := does..]
and
2).	assign-reference:	makes a new reference to the same object.
			[roughly what Modula-2 currently does with := ]


Incidentally, this is what SIMULA-67 does.... [applying it to class-instances
of course, rather than direct recursion] :

ie.	:= is (1) and :- is (2).

Personally, I think I prefer adding a new inbuilt polymorphic function
copy( ), to be used as in
	l2 := copy( l1 );
and leaving := unchanged [ meaning (2)]


Kasper continues:
>further more, an assignment of the form:
>
>l1.next.next := l1
>
>will not give you a cycle, but rather the list (a,b,a,b,c). 
>

Is there any problem here?

One question does remain: is it always possible for the compiler to generate
code to copy an RDT?

I think it is, but my mind bottles out at really complex egs, and I don't
have a proof...

>That is at least how I learned RDTs. As to your changed syntax
>for pointers so they pretend they are recursive, I dislike it, it looks
>different, but you will still have to learn the concept of pointers to
>use it.
>The why not use pointers afterall?
>

Well, RDTs are a much higher level concept than pointers.

I would like to suggest a totally new syntax for such RDTs... borrowed from
the experimental functional language HOPE [Burstall & McQueen, Edinburgh]
in which you declare new RDTs as groups of alternative SHAPES, and then
use pattern-matching to determine which shape an element of an RDT is.

For example, in HOPE syntax: [ dollars just to highlight it]

$	data num_tree == niltree ++ node( tree X tree ) ++ tip( num );
$
$	dec rev_tree : num_tree -> num_tree;
$	--- rev_tree( niltree ) <= niltree;
$	--- rev_tree( tip(n) ) <= tip(n);
$	--- rev_tree( node(left,right) ) <=
$		node( rev_tree(right), rev_tree(left) );

In a Modula-like language, then, we might write:

$	TYPE
$		num_tree =
$			SHAPE niltree
$			OR node( left : num_tree ; right : num_tree )
$			OR tip( t : INTEGER ) ;
$
$	FUNCTION rev_tree( atree : num_tree ) : num_tree;
$	BEGIN
$		CASE SHAPEOF( atree ) OF
$		  niltree : RETURN niltree;
$		| tip     : RETURN tip( atree.t );
$		| node    : RETURN node(
$				rev_tree(atree.right),
$				rev_tree(atree.left) );
$		END;
$	END rev_tree;

here, I have introduced a predicate SHAPEOF( x ) to extract the current shape
of a piece of shaped-data - after all, we need an alternative to pattern
matching.

Possible extensions:

1).	HOPE also provides polymorphic data declarations:

eg.

$	data tree( alpha ) == niltree ++ tip( alpha ) ++
$		node( tree(alpha) X tree(alpha) );

alpha "matches" any single type at run-time - but notice that HOPE cannot
declare PROLOG-style "untyped" lists.

So why not add something like this to the language ?
[Perhaps allowing untyped RDTs too? ]

2).	Classes.  Object-orientated programming is a whole new subject,
	but now Prof. Wirth has shot his bolt in announcing OBERON
	[my opinion: it's a bad joke] someone could introduce a nice
	set of class-extensions to Modula.


If anyone has any thoughts on these, I'd welcome discussion of them.


Finally, however, may I say that I think the resultant langauge would be
a lovely language to program in, but it would most definitely NOT be
Modula-2.


>/Kasper

	Duncan.

----------------------------------------------------------------------------
Duncan White,           |       Flying is the art of aiming oneself
Dept. Of Computing,     |       at the ground and missing.
Imperial College,       |               -- Douglas Adams, So Long and Thanks
London SW7, England     |                  for all the fish.