[comp.lang.fortran] Arrays and pointers

fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts) (09/21/88)

From the point of view of a language designer, one way to think of
arrays is that they provide a notation for naming a block of
contiguous memory and a mapping references to single elements of that
block.  Consider the Fortran array A(2,2) (using 1 based indexing.)
This array names 4 consecutive locations:

  Some_Address + 0  A(1,1)
  Some_Address + 1  A(2,1)
  Some_Address + 2  A(1,2)
  Some_Address + 3  A(2,2)

This approach generalizes to multiple dimensions, structures don't
have to fit into a single memory word, and indices can range from any
minimum to any maximum.  In this model, an Array is precisely a
rectangular array with adjacent elements.

This is exactly equivalent to using pointer arithmetic to accomplish
the naming, in which the arithematic is given using the classic offset
formulae:

  Pointer = Some_Address

  Address_Of(A(i,j)) = Pointer + (i-1) + ( (j-1) * 2 )

It doesn't matter if your language allows you to explicitly express
the latter form or not, if you implement arrays as blocks of memory,
the equivalence exists.  [It is in fact, how the compiler does array
reference calculations without optimization in most languages. . .]

You can notice that (j-1) * 2 requires a multiplication. (Bit shift in
this case, but multiplication if the array dimension isn't a power of
two.) This can be avoided by building a table of precomputed
addresses, called a dope vector.  Some compilers do this as an
optimization for multiple dimension arrays, to avoid a lot of runtime
calculations.

A 'Trick' many Fortran programmers use to avoid array index
calculations when marching through a multiple dimension array is to
use Fortran to "flatten" the array:

      PROGRAM XMPL
      PARAMETER *** Specify values for N, M, and L
      DIMENSION X(N,M,L)
      CALL INIT(X,N*M*L)
      *** etc ***
      END

      SUBROUTINE INIT(X,ISIZE)
      DIMENSION X(ISIZE)
      DO 10 I = 1, ISIZE
        X(I) = *** initialization value
   10 CONTINUE
      RETURN
      END

Of course, if your language has explicit address manipulation, you can
avoid the subroutine call via

      P = &X(1,1,1)
      DO 10 I = 0, ISIZE - 1
        *(P+I) = *** initialization value
   10 CONTINUE

You can also do your own 'dope vector' optimization:

      DO 30 I = 1, L
        DO 20 J = 1, M
          DO 10 K = 1,N
            *** Calculate using X(I,J,K)
   10     CONTINUE
   20   CONTINUE
   30 CONTINUE

can also be expressed as

      INDEX = 0
      DO 30 I = 1, L
        DO 20 J = 1, M
          DO 10 K = 1,N
            *** Calculate using *(P+INDEX)
            INDEX = INDEX + 1
   10     CONTINUE
   20   CONTINUE
   30 CONTINUE

an optimization one can argue that the compiler should make, but not
all compilers do.  Also, you can write your own function for mapping
names to physical address and implement sparse matrices, triangular
matrices, etc, in a way you can control.

You can also generalize all of these things.  Current Fortran doesn't
give you the kind of control to do this, because it has no mechanism
for explicit control of name to address mapping and aliasing at this
level.  Current C does some of this, although not all of it well.  In
addition, current C adds more general addres to name aliasing
functionality.

Contrary to some of the arguments in this list, from the point of view
of a language designer, pointers/arrays serve very much the same
purpose.  contiguous arrays are a compact special purpose mechanism
for annotating a mapping from names to addresses.  Pointers also
provide such a mechanism, plus give additional semantics.  C
recognizes the existing equivalance by allowing the two syntactic
structures which express the same semantics to be used interchangably.
This is a notational convenience which can make algorithms easier to
read.  (Also harder, it is still possible to write unreadable code in
any language.)

To see the value of multiple syntax for the same semantics, consider
the result (proven long ago) that the only structures needed to
express control in sequential programs are sequence, branching, and
loop-while-true.  Most languages offer several variations on branching
and looping, for the notational use of the programmer, and few people
would argue that we should do away with all control structures except
goto and while-condition-do-block.

A language with pointers has more expressive capabilty (although, of
course, it can't compute anything one without them can) than one which
doesn't. This gives the programmer more power in expressing
computation, at the expensive of providing more ways to make mistakes.

Some programmers would prefer not to have access to the expressive
power, as a way of avoiding making bugs.  Others, myself included, are
willing to take the chance.

Marty
+-+-+-+     I don't know who I am, why should you?     +-+-+-+
   |        fouts@lemming.nas.nasa.gov                    |
   |        ...!ames!orville!fouts                        |
   |        Never attribute to malice what can be         |
+-+-+-+     explained by incompetence.                 +-+-+-+

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (09/21/88)

In article <1023@amelia.nas.nasa.gov> fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts) writes:
>
>Contrary to some of the arguments in this list, from the point of view
>of a language designer, pointers/arrays serve very much the same
>purpose.  contiguous arrays are a compact special purpose mechanism
>for annotating a mapping from names to addresses.  Pointers also

You lost me here.  Arrays are a particular type of data structure.  They
are no different from any other type of data structure.  So, if I
generalize your argument, you could say "any data structure is a
compact mechanism for annotating a mapping from names to addresses."
Taken to an extreme, I don't need data structures at all, just pointers
and memory cells.  Which is true, but...

ALMOST everyone thinks data structures are much clearer and less error
prone than pointers and raw memory, and

Portability always suffers when you drag in raw memory.  Contrary to
popular belief, not all machines are byte addressable, and not all
byte addressable machines allow arbitrary alignment.  In fact, not all
machines have a single contiguous address space.  For all of these
reasons, old Fortran programmers prefer to use pointers ONLY for passing
the address of a data structure around, and not for other things.

>Some programmers would prefer not to have access to the expressive
>power, as a way of avoiding making bugs.  Others, myself included, are
>willing to take the chance.

I have not seen anyone in this group suggest abolishing pointers.  In fact,
it turns out that many Fortrans have pointers, as some people were
boasting.  The argument is whether pointers should be used for anything
fancier than as a pointer to a data structure.  I suggest that, from the
point of view of project management, other uses of pointers should be
treated like assembly language code:  OK, if you have to, but, confine it
to a small number of replaceable modules.



-- 
  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117       

fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts) (09/22/88)

In article <15269@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster) writes:
>
>You lost me here.  Arrays are a particular type of data structure.  They
>are no different from any other type of data structure.  So, if I
>generalize your argument, you could say "any data structure is a
>compact mechanism for annotating a mapping from names to addresses."
>Taken to an extreme, I don't need data structures at all, just pointers
>and memory cells.  Which is true, but...
>

I haven't lost you.  That's exactly my point.  Now add to it the
notational convenience of using pointers when they make the code
clearer to read then array references.  All programming can be done
with a Turing machine, but we go for notational convenience

>Portability always suffers when you drag in raw memory.  Contrary to
>popular belief, not all machines are byte addressable, and not all
>byte addressable machines allow arbitrary alignment.  In fact, not all
>machines have a single contiguous address space.  For all of these
>reasons, old Fortran programmers prefer to use pointers ONLY for passing
>the address of a data structure around, and not for other things.
>

Yes, you can write bad code with pointers.  That's not my point.  I
have never said to use *raw memory addresses*.  I haven't mentioned
byte addressing either.  Nor have I mentioned 'a single contiguous
address space.'  *NONE* of these things are relevant to the correct
use of pointers in addressing arrays.  Pointer arithmetic is done in
"thing" sized units, and it doesn't matter if the thing is 1/8 of a
word, 1 word, or N words long.  If you don't misuse pointers, you
don't see the underlying addressing mode.  Also, in this context,
pointers are used to address that part of memory which is a contiguous
address space --- After all, we are talking about pointers into
Fortran style arrays, which are contiguous.  Not a single such
address space, but many, one for each array.

Don't generalize about old Fortran programmers.  I am an old Fortran
programmer, having written my first "High Level Language" code for an
IBM 1620 and having written > 100K lines of Fortran; some
of which was ported to many machines.  I use pointers for all kinds of
name aliasing, all of the time.

>I have not seen anyone in this group suggest abolishing pointers.  In fact,

The context of 'expressive power' in the above was not abolishing
pointers, it was abolishing the expression of pointer
arithmetic/array naming equivalence in a programming language, which
has been espoused in this list.

>it turns out that many Fortrans have pointers, as some people were
>boasting.  The argument is whether pointers should be used for anything
>fancier than as a pointer to a data structure.  I suggest that, from the
>point of view of project management, other uses of pointers should be
>treated like assembly language code:  OK, if you have to, but, confine it
>to a small number of replaceable modules.
>

You are thinking of how to misuse pointers.  I am thinking of how to
correctly use them.  Even Wirth, that bastion of safe-code, thought
that pointers were tame enough for Pascal and Modula-2.  I can write
unmaintainable code using any facility (remember the GOTO debate and
how the first round was finally ended by Knuth's well reasoned when
to/ when not to paper?) or I can write elegant code, using any
facility.

+-+-+-+     I don't know who I am, why should you?     +-+-+-+
   |        fouts@lemming.nas.nasa.gov                    |
   |        ...!ames!orville!fouts                        |
   |        Never attribute to malice what can be         |
+-+-+-+     explained by incompetence.                 +-+-+-+

jlg@lanl.gov (Jim Giles) (09/22/88)

From article <1030@amelia.nas.nasa.gov>, by fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts):
> [...]                                        Now add to it the
> notational convenience of using pointers when they make the code
> clearer to read then array references.  [...]

That is - _NEARLY_NEVER_!!  A pointer is a way of placing the structural
template defined by a data type over an arbitrary memory location.  
                                                                 ^
PERIOD!!  That's all a pointer is for.  If I don't need that functionality,
I also don't want any pointers!  You're right in your argument: the purpose
of a programming language is to provide convenient and readable access
to the functionality of the machine.  You're wrong in your conclusion:
pointers (especially the C syntax) usually obscure the meaning of the
code because they are used to implement functionality that's inappropriate
to them.

By the way, the use of pointers in the above way _is_ sometimes useful
and should be provided in the language.  But it shouldn't be used to
implement other things for which it is _not_ appropriate.

J. Giles
Los Alamos

fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts) (09/22/88)

In article <3968@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>From article <1030@amelia.nas.nasa.gov>, by fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts):
>> [...]                                        Now add to it the
>> notational convenience of using pointers when they make the code
>> clearer to read then array references.  [...]
>
>That is - _NEARLY_NEVER_!!  A pointer is a way of placing the structural
>template defined by a data type over an arbitrary memory location.  
>                                                                 ^
>PERIOD!!  That's all a pointer is for.

NO. NO. NO. (Gee, I can capshout too ;-)  A pointer is also a
mechanism by which I introduce an arbitrary data structure into my
language, allowing me to easily express data structures that the 
original language designer didn't think to provide.  As Wirth pointed
out in the Pascal design, structure+new/dispose+pointer-to-structure
allows construction of arbitrary data structures.

Unless, of course, you believe that you can correctly identify and
supply all of the data structures that I'm going to need, and you want
to burden all programers with being aware of all these data
structures.

Consider the case of FIFOs.  I can fairly easily implement a fixed
length FIFO using an array, but I grew up on data structures texts
where it was always done with a linked list and to me such an
approach produces more readable code. (Your milage may vary, but such
is the notation I'm use to.)

Marty
+-+-+-+     I don't know who I am, why should you?     +-+-+-+
   |        fouts@lemming.nas.nasa.gov                    |
   |        ...!ames!orville!fouts                        |
   |        Never attribute to malice what can be         |
+-+-+-+     explained by incompetence.                 +-+-+-+

bga@raspail.UUCP (Bruce Albrecht) (09/23/88)

In article <1030@amelia.nas.nasa.gov>, fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts) writes:
> You are thinking of how to misuse pointers.  I am thinking of how to
> correctly use them.  Even Wirth, that bastion of safe-code, thought
> that pointers were tame enough for Pascal and Modula-2.  I can write
> unmaintainable code using any facility (remember the GOTO debate and
> how the first round was finally ended by Knuth's well reasoned when
> to/ when not to paper?) or I can write elegant code, using any
> facility.

You're only partially right here.  Very few strongly typed languages (and
neither Pascal nor Modula-2 are in this category) have any pointer operations
other than assignment and comparison.  The pointer increment and decrement
operations are the major reason why many people are against the way C uses
pointers as a way to index through arrays.  

It's interesting that you compare the use of pointers to the use of GOTO.
The argument against the GOTO is that unrestricted use of them tended to
result in unreadable, unmaintainable code.  I think that using pointer
increment/decrement tends to produce the same result.  It's more difficult
to provide bounds checking in C if you are using pointers to traverse the
array elements, and I suspect code is more obscure if the code is not
accessing each element sequentially.  Nearly any random access to an array
is going to be more clearly represented using indices than by using pointers.

ok@quintus.uucp (Richard A. O'Keefe) (09/23/88)

In article <3968@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>That is - _NEARLY_NEVER_!!  A pointer is a way of placing the structural
>template defined by a data type over an arbitrary memory location.  
>                                                                 ^
>PERIOD!!  That's all a pointer is for.

That's a very strange way of thinking about pointers.
It is possible to have pointers without having pointer arithmetic and without
having any way of obtaining the address of an existing location.  For
example, in one of my favourite languages, I would write
	REF [,] REAL two d array = HEAP [1:m,1:n] REAL;
which allocates a new m x n array and binds the reference two d array to it.
[Thanks to "dereferencing", the fact that two d array is a reference can be
ignored in the code that uses it.  This is not an unqualified good.]

A pointer is the way of referring to a dynamically allocated object, ...
                                                                   ^
COMMA!!  A pointer is for lots of other things too.
But "placing a structural template" sounds more like EQUIVALENCE (:-)...

jlg@lanl.gov (Jim Giles) (09/24/88)

From article <364@raspail.UUCP>, by bga@raspail.UUCP (Bruce Albrecht):
> [...]
> It's interesting that you compare the use of pointers to the use of GOTO.

It is indeed.  Hoare pointed out that when you compare control structures
to data structures, the GOTO is isomorphic to the pointer.  I don't think
this observation is original to Hoare, but it's true whoever originated it.


J. Giles
Los Alamos> The argument against the GOTO is that unrestricted use of them tended to
> result in unreadable, unmaintainable code.  I think that using pointer
> increment/decrement tends to produce the same result.  It's more difficult
> to provide bounds checking in C if you are using pointers to traverse the
> array elements, and I suspect code is more obscure if the code is not
> accessing each element sequentially.  Nearly any random access to an array
> is going to be more clearly represented using indices than by using pointers.

jlg@lanl.gov (Jim Giles) (09/24/88)

From article <463@quintus.UUCP>, by ok@quintus.uucp (Richard A. O'Keefe):
> [...]
> A pointer is the way of referring to a dynamically allocated object, ...
>                                                                    ^
> COMMA!!  A pointer is for lots of other things too.
> But "placing a structural template" sounds more like EQUIVALENCE (:-)...

But there are _better_ ways of doing all those other things - ie. with
a separate feature for each thing instead of overloading all those
completely different features onto the same construct.  I can do all
numerical work with just integers, but I don't - I _want_ the language
to also support floating point, bit, complex, etc..  In the same way,
I _want_ the dynamic memory feature to be different from the linked-list
(recursive data type) feature to be different from....  A pointer isn't
even the _best_ way of referring to a dynamically allocated object
(I'd rather just refer to it as a 'whatevertype' variable, not as a
'pointer_to_whatevertype').  The bottom line is that a user visable
pointer is just another source of possible error - something I can do
without.

J. Giles
Los Alamos