[comp.lang.fortran] Data types _without_ pointers

jlg@lanl.gov (Jim Giles) (09/23/88)

From article <1032@amelia.nas.nasa.gov>, by fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts):
> [...]
> NO. NO. NO. (Gee, I can capshout too ;-)  A pointer is also a
> mechanism by which I introduce an arbitrary data structure into my
> language, allowing me to easily express data structures that the 
> original language designer didn't think to provide.  [...]

I have never seen a data structure that couldn't be described without
pointers.  The first book I read on data structuring was "Structured
Programming" by Dahl, Dijkstra, and Hoare [1972 - I read it for the
first time that very year].  Hoare's chapter on data structuring contains 
a complete listing of the mechanisms needed to define any structure that 
I've ever seen anyone use.  The language should have all of these in 
addition to the basic types of integer and float that it decides to support.

1) Unstructured Data Types:  This was Hoare's name for what Pascal
   calls an enumerated data type.  Example:
             type suit = (club,diamond,heart,spade)

2) The Cartesian Product:  Hoare's name for what Pascal calls records
   and C calls structures.  Example:
             type cardface = (s:suit, r:rank)

3) The Discriminated Union: Hoare's name for Pascal's variant records.
   I believe even C has this concept too :-).  Example:
             type pokercard = (normal:cardface, wild:(joker1, joker2))

4) The Array: Yes, array is a separate data type.  It's not a structure
   built up from lists - it _should_ be a distinct type.  (Multi-d
   arrays should also be allowed and not implemented as array-of-array-....)

5) The Powerset: Hoare's name for Pascal's sets.  Example:
             type hand = powerset of cardface
   A 'hand' may contain from zero up to all 52 cards.

6) The Sequence: Strings, sequential files, etc..  A sequence is a data
   structure made up of an ordered list of data items all of the same
   type.  Sequences have a property of length, which can be explicitly
   kept with the sequence or the sequence may have a distinct member
   which signals the end.  Example:
             type string = sequence of character

7) Recussive Data Structures: Linked lists, etc..  These _are_ defined
   and used in some high-level languages _without_ user visable pointers.
   Example:
             type list = (value:integer, link:list)
   A variable of any recursive data type _may_ have no value at certain
   times during the execution of a code - it should be possible to check
   for this condition.

_NONE_ of these require user accessable pointers not even the linked list!
Items of type 'list' have values of type 'list' - _not_ values of type
'address_of_list'.  In no case may the user of any of these data type
alter the memory location of a data object (without copying the data).
With the addition of an allocate/deallocate mechanism (which is needed
for the recursive data structures anyway - might as well make it
available to all data types), the set is capable of representing _any_
data structure that I've ever seen.  

Now, indeed, the _internal_ representations of these data structures may
ABOUND with pointers.  It is neither the business nor the desire of
most users to concern themselves with the internal implementation of
these things (any more than most users care what type of semiconductor
logic is used in the machine's hardware).

Note: Hoare's chapter goes on to describe how to use the above
structures in combination in order to implement sparse versions
of many of them - also without explicit pointer usage.

So, back to my original statement, pointers are needed _only_ for the
purpose of placing a data type over an arbitrary memory location.  This
is a perfectly useful function.  I might, for example, want to treat
a 27 word SEQUENCE as if it were a 3x3x3 ARRAY.  I _could_ copy it to
an array - but that's time consuming.  I _could_ manipulate the 
sequence directly and do the index calculations explicitly - but that's
hard to read and maintain.  It is more readable - and faster - to
define a pointer to a 3x3x3 array and set it to the address of the
sequence.  This use of pointers for 'structured type coersion' is
perfectly acceptable and is the _only_ reason I can see for including
pointers in a programming language (of course, a suitably constructed
run-time EQUIVALENCE would serve the same function).

J. Giles
Los Alamos

fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts) (09/23/88)

In article <4026@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>From article <1032@amelia.nas.nasa.gov>, by fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts):
>> [...]
>> NO. NO. NO. (Gee, I can capshout too ;-)  A pointer is also a
>> mechanism by which I introduce an arbitrary data structure into my
>> language, allowing me to easily express data structures that the 
>> original language designer didn't think to provide.  [...]
>
>I have never seen a data structure that couldn't be described without
>pointers.

Read the note you are responding to.  I'm not claiming a structure
that can't be described without structures.  I'm claiming that many
data structures and their use can be expressed in a very readable way
by using pointers, and that I like that readability.  I use that
readability.

Take as an example, a fifo (a structure I use sometimes in discrete
event modeling.)  I can implement a fifo using an array and two
integers (to keep track of where I am).  Using the convention that the
head number records the first used entry in the queue and that
the tail number records the first unused entry, except when they both
have the same value, in which case the fifo is empty, I can write the
two Fortran routines:

      SUBROUTINE ADDTO(FIFO,ISIZE,IHEAD,ITAIL,VALUE)
      DIMENSION FIFO(ISIZE)
      IF ((ITAIL .EQ. IHEAD) .AND. (IHEAD .NE. 0)) THEN
C
C FIFO FULL ERROR HANDLING
C
        RETURN
      ENDIF
      ITAIL = ITAIL + 1
      IF (ITAIL .GT. ISIZE) ITAIL = 1
      FIFO(ITAIL) = VALUE
      IF (IHEAD .EQ. 0) IHEAD = 1
      RETURN
      END

      FUNCTION REMOVE(FIFO,ISIZE,IHEAD,ITAIL,VALUE)
      DIMENSION FIFO(ISIZE)
      IF (ITAIL .EQ. IHEAD) THEN
C
C FIFO EMPTY ERROR HANDLING
C
         RETURN
     ENDIF
     REMOVE = FIFO(IHEAD)
     IHEAD = IHEAD + 1
     IF (IHEAD .GT. ISIZE) IHEAD = 1
     IF (IHEAD .EQ. ITAIL) THEN
        IHEAD = 0
        ITAIL = 0
     ENDIF
     RETURN
     END

I can also do the same thing using pointers (from C):

typedef struct fifo_entry {
    real value;
    struct fifo_entry *next;
} FIFO_ENTRY, *FIFO_ENTRY_PTR;

typedef struct fifo {
   FIFO_ENTRY_PTR head, tail;
} FIFO;

void addto(fifo,new_value)
    FIFO fifo;
    real new_value;
{
    FIFO_ENTRY_PTR new_entry;

    new_entry = (FIFO_ENTRY_PTR)malloc(sizeof(FIFO_ENTRY));
    if (new_entry == 0) {
      /* MEMORY ERROR EXIT */
      return;
    }

    new_entry->value = new_value;

    if (fifo.tail != 0)  (fifo.tail)->next = new_entry;
    fifo.tail = new_entry;
    if (fifo.head == 0) fifo.head = new_entry;
    return;
}

real remove(fifo)
    FIFO fifo;
{
    real return_value;
    FIFO_ENTRY_PTR old_entry;

    if (fifo.head == 0) {
      /* EMPTY FIFO ERROR EXIT */
      return
    }

    old_entry = fifo.head;
    return_value = old_entry->value;
    fifo.head = fifo.head->next;
    free((char *)old_entry);
}

Neither of these is a good substitute for (Not available in any
language I usually use)

Declare fifo to be a fifo of real;

and using 'fifo = x' to add x to the end of fifo and 'x = fifo' to
remove x from the begining of the fifo.

In the example, the two forms are (to me, because I've had a lot of
experience with both) about equally readable.  The Fortran form is
somewhat more restricted because it implements a fixed length fifo,
but that doesn't matter in the applications I run.

I would prefer the third form, but if I have to chose between the
first two, I would prefer the second, because it more closely maps to
what I'm thinking about when I think of fifos.

The code gets even messier with more difficult structures, like
balanced binary trees.

On the other hand, if you implement recursive structures using Hoare's
notation, you have to think about recursive structures, and I just
haven't been trained to do that.  Try representing a dynamic length
fifo using Hoare's structure and answering the questions:

1) What happens when I add to a fifo?
2) How do I represent this structure?
3) How do I keep track of the head/tail?
4) How do I detect an empty fifo?

BTW, recursive structures are a very clumsy way to think about bags,
which differ from sets by being allowed to contain duplicate entries,
and about ordered sets/bags, which require the set to be maintained in
a sequence which is externally enforced.

In summary:  Yes, you can have data types without pointers, either by
forcing them on top of existing languages, or by using Hoare's
recursive structure technique.  I never said otherwise.  I said that
if I don't have the data types and have to build my own abstractions,
I prefer to use pointers to do it.

+-+-+-+     I don't know who I am, why should you?     +-+-+-+
   |        fouts@lemming.nas.nasa.gov                    |
   |        ...!ames!orville!fouts                        |
   |        Never attribute to malice what can be         |
+-+-+-+     explained by incompetence.                 +-+-+-+

peter@ficc.uu.net (Peter da Silva) (09/23/88)

In article <4026@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> _NONE_ of these require user accessable pointers not even the linked list!
> Items of type 'list' have values of type 'list' - _not_ values of type
> 'address_of_list'....

This is an interesting arrangement. How do you propose to implement insertion
into a doubly linked list without pointers:

	struct {
		struct list *prev, *next;
		data;
	} *p, *q;

	...

	q->next = p;
	p->prev->next = q;
	q->prev = p->prev;
	p->prev = q;

Whatever syntax would you use that would adequately hide the pointers and
still let you express this? I am genuinely curious.
-- 
Peter da Silva  `-_-'  Ferranti International Controls Corporation.
"Have you hugged  U  your wolf today?"            peter@ficc.uu.net

jlg@lanl.gov (Jim Giles) (09/24/88)

From article <1580@ficc.uu.net>, by peter@ficc.uu.net (Peter da Silva):
> 	struct {
> 		struct list *prev, *next;
> 		data;
> 	} *p, *q;
> 	...
> 	q->next = p;
> 	p->prev->next = q;
> 	q->prev = p->prev;
> 	p->prev = q;

How about (in Fortran-like syntax):

      type dlink
         integer data
         dlink :: next, prev       !recursive data declaration
      end type dlink

      dlink :: p, q
      ...
      q%next = p                   !Fortran 8x uses % for the structure ref
      p%prev%next = q              !I know!  I don't like % either
      q%prev = p%prev
      p%prev = q

No pointers!  Only variables of type 'dlink'.  The advantage is that
type checking will prevent me from assigning any address other that
another variable of type 'dlink' to any of the links.  With pointers,
type checking is not performed (if it is, then some of the really useful
things that pointers can do are lost) - I am stuck with the possibility
of a link in my list pointing off into left field.  I only want _that_
to be possible if I really might want to point into left field.

J. Giles
Los Alamos

peter@ficc.uu.net (Peter da Silva) (09/24/88)

In article <4096@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> From article <1580@ficc.uu.net>, by peter@ficc.uu.net (Peter da Silva):
> > 	struct {
> > 		struct list *prev, *next;
> > 		data;
> > 	} *p, *q;
> > 	...
> > 	q->next = p;
> > 	p->prev->next = q;
> > 	q->prev = p->prev;
> > 	p->prev = q;

>       type dlink
>          integer data
>          dlink :: next, prev       !recursive data declaration
>       end type dlink

>       dlink :: p, q
>       ...
>       q%next = p                   !Fortran 8x uses % for the structure ref
>       p%prev%next = q              !I know!  I don't like % either
>       q%prev = p%prev
>       p%prev = q

> No pointers!  Only variables of type 'dlink'.

So, how do you differentiate between copying pointers and copying structures?
In 'C' I can say:

	extern struct doubly_linked_list empty_list;	/* initialised to
							   empty in runtime
							   initialisation,
							   say */
	struct doubly_linked_list header;

	header = empty_list;	/* initialise header by copying data in */

Now in Fortranoid:

	dlink :: emptylist
	common /initlist/ emptylist	! initialised to an empty list
					! in runtime initialistion.
	dlink :: header

	header = emptylist	! Is this a pointer copy or a structure copy?

Say what you mean and mean what you say.

Before you ask if I ever use such things, the answer is 'yes'. In fact,
I frequently copy list elements. Here is an example from a program I'm
using on a real-time operating system (Browser on AmigaDOS):

struct MessagePort *messageport;	/* messageport contains a
					   doubly linked list */
struct Message *msg, copymsg;	/* message is a list element */

	msg = GetMsg(messageport);
	if(I'm going to be using this message for a while) {
		copymsg = *msg;	/* copies contents of msg */
		ReplyMsg(msg);	/* releases it for reuse */
		msg = copymsg;	/* keep the code tidy */
	}

	use msg, without caring if it's local or not.

	if(msg != copymsg)
		ReplyMsg(msg);

How would you do this without pointers, and without confusing the hell
out of the syntax of your linked lists? And without making a subroutine
call (in-line copies can be much faster... efficiency remember).
-- 
Peter da Silva  `-_-'  Ferranti International Controls Corporation.
"Have you hugged  U  your wolf today?"            peter@ficc.uu.net

seanf@sco.COM (Sean Fagan) (09/26/88)

In article <4096@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>How about (in Fortran-like syntax):
>      type dlink
>         integer data
>         dlink :: next, prev       !recursive data declaration
>      end type dlink
>      dlink :: p, q
>      q%next = p                   !Fortran 8x uses % for the structure ref
>      p%prev%next = q              !I know!  I don't like % either
>      q%prev = p%prev
>      p%prev = q
>No pointers!  Only variables of type 'dlink'.  The advantage is that
>type checking will prevent me from assigning any address other that
>another variable of type 'dlink' to any of the links.  With pointers,
>type checking is not performed (if it is, then some of the really useful
>things that pointers can do are lost) - I am stuck with the possibility
>of a link in my list pointing off into left field.  I only want _that_
>to be possible if I really might want to point into left field.

Um, to paraphrase, "a pointer by any other name would still point as sweet"
(I'm a programmer, not a muse).  Just because it doesn't look like a pointer
doesn't mean it's *not* a pointer (and if it isn't, it is not useful in the
above example).  Pascal won't let you do anything other than what you're
doing above.  In fact, you could, probably, get rid of the '^' operator in
Pascal, and you would have something like the above FORTRAN 8x code (except
you would use '.' instead of '%').

You're right, over typechecked pointers are not as useful as non-typechecked
pointers.  Witness Pascal vs. C.  However, most C compilers will warn about
pointers being assigned to or from arbitrary objects (MSC will, for example,
complain about malloc() being assinge to a struct X * without a type-cast).

Also, in an earier posting, Jim complained about a lack of a standard for C.
There is one, and has been one for quite a while:  the White Book (aka the
Bible, the Old Testament, K&R, and, of course, "The C Programming
Language").  As far as I've been able to tell, any code that conforms to
what they describe is portable to all C compilers anywhere (8086's, Pr0mes,
Cybers, 3B's, Suns, etc.).  It's just not as useful as going outside of the
standard is.  And, sad to say, FORTRAN has that same problem.  In how many
ways does VMS FORTRAN deviate from the standard F77?  How about CDC NOS
FORTRAN 5?  Cray CFT?  It's very easy to write portable code, and it's very
easy to write useful code.  It's just difficult, at times, to make the two
intersect.

-- 
Sean Eric Fagan  | "ld runs like a turtle out of antartica"
seanf@sco.UUCP   |     Steven Ryan (smryan@garth.UUCP)
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

ka@june.cs.washington.edu (Kenneth Almquist) (09/29/88)

jlg@lanl.gov (Jim Giles) writes:
> I have never seen a data structure that couldn't be described without
> pointers.
>
> [...]
>
> _NONE_ of these require user accessable pointers not even the linked list!
> Items of type 'list' have values of type 'list' - _not_ values of type
> 'address_of_list'.  In no case may the user of any of these data type
> alter the memory location of a data object (without copying the data).

When *I* talk about a linked list, I mean (among other things) a data
structure in which it is possible to insert an element at an arbitrary
location in the list in time O(1).  Jim doesn't say how the data structures
he talks about being able to describe can be manipulated, but the last
comment quoted above certainly suggests that inserting an element in a
list using Hoare's mechanism requires copying the entire list up to the
point of an insertion, so an insertion at an arbitrary point in the list
requires time O(n).  In article <4108@lanl.gov>, Jim gives an implementation
of a fifo queue using an array.  The standard implementation of a queue
using a linked list requires time O(1) to add an element, while Jim's code
has a worst case time of O(n) to add an element, where n is the number of
elements in the queue.

The whole point of chosing a data structure is to find one which will
handle the operations you want to perform on it efficiently.  If Hoare's
method of describing data structures does not let us describe linked lists
and queues with access times of order O(1), then it is only useful in
contexts where efficiency doesn't matter.  To quote again from article
<4108@lanl.gov>:

> Fifo's _can_ be implemented as recursive data structures, but arrays
> with indices are easier and more efficient.

But if your language supports pointers, implementing a queue using a
linked list is easier than using an array.  I'm pretty sure I could get
a linked list implementation right the first time, whereas getting the
array implementation right is considerably harder.  Jim's code contains
a number of bugs (e.g. queue%empty and queue%full are not set correctly,
growing the array is done incorrectly).  Fixing the bugs will increase
the complexity of the code significantly, and it will *still* suffer
from an O(n) worst case time.

> Use the structure which best fits the problem.  This means: rarely use
> pointers

In my view, pointers quite often fit the problem.  I do see the value
of abstract data types, which allow the user to define a queue data
type once and then use it where ever a queue is needed, but abstract
data types don't eliminate the need for pointers to allow queues to be
implemented easily in the first place.
				Kenneth Almquist