[comp.lang.misc] Lets define "pointer"

gudeman@cs.arizona.edu (David Gudeman) (10/25/90)

In article  <KERS.90Oct24102535@cdollin.hpl.hp.com> kers@hplb.hpl.hp.com (Chris Dollin) writes:
]
]...I'm not sure I agree with Jim. But I'd rather disagree
]about the same thing that about two different ones, which is what I
]think David is in danger of doing...

You're taking all the fun out of it :-).  OK, I said before that this
disagreement seems to be about differing ideas of what a pointer is
(and then I promptly forgot it).  If Jim is using the definition

(1) pointer: a low-level abstraction of a machine address

then most of his points are unarguable.  Such a low-level construct is
bound to be the source of hard-to-find bugs, and is bound to cause
problems for an optimizing compiler.  I think this is far too specific
a definition, and the idea is already covered by the term "address".
You might use "address pointer" to refer to such beasts in a
programming language.  Note that C does not technically have such
pointers, although you can often pretend that it does.

Another possibility is that Jim is using the definition

(2) pointer: an abstraction of array positions that is not bounds checked

To assume no bounds checking is far too restrictive.  Of course if you
don't make this assumption then a lot of things Jim has said don't
make sense.

Others are using the definition

(3) pointer: a reference to a mutable object

That is, if you have X and Y, and something done to X can cause Y to
change, X and Y must be or contain pointers.  I think this is far too
general a definition, especially since the well-known term "mutable
object" already describes the situation.

I have been using the definition

(4) pointer: an abstration of a position in a sequence

Now I'll admit that this is non-standard and that I never completely
spelled out my assumptions.  In the future I'll try to use the term
"sequence pointer" to refer to these.  In particular, I was _not_
arguing that there is no need for recursive data structures or for a
better way to reference dynamic storage (both done with pointers in
C).  My argument was that pointer(4) is a useful abstraction.

Another possibility for the definition that Jim has been using:

(5) pointer: an abstraction of a reference (to a mutable object) that
             has a specific dereferencing operation.

In other words, you can't write "p = &x" and then start using "p" as
though it were "x", you have to write "*p".  This is probably the most
traditional definition, but if Jim has been using this definition then
his comments comparing indexes to pointers don't make sense.  As far
as I'm concerned there are both advantages and disadvantages to having
a specific dereferencing operation so I probably wouldn't argue either
way.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

jlg@lanl.gov (Jim Giles) (10/25/90)

From article <26767@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
>
> [...]                                   If Jim is using the definition
> 
> (1) pointer: a low-level abstraction of a machine address
> 
> then most of his points are unarguable.  Such a low-level construct is
> bound to be the source of hard-to-find bugs, and is bound to cause
> problems for an optimizing compiler.  I think this is far too specific
> a definition, and the idea is already covered by the term "address".

This is indeed the definition of the object I call "pointer".  I'm
gratified that you consider most of my points to be unarguable in
this context.  I can't agree with you that this definition is too
specific though.  In fact, for very low-level applications, this is
the pointer's optimal interpretation.

> [...]
> Another possibility is that Jim is using the definition
> 
> (2) pointer: an abstraction of array positions that is not bounds checked

First, you left out the fact that pointers are only similar in this
way to *one*dimensional* arrays.  My definition of arrays requires
more than boundedness of indexing - it requires the possibility
of multiple orthogonal index sets.

Note, only C pointers have the ability to index.  No other popular
language does (unless it can trace its design to C).  So, indexing
is an extension - it has no place in the fundamental definition
of what pointers are.

Note: as a practical matter, everyone assumed that C pointer arithmetic
_was_ unbounded until the debate about the ANSI proposed standard a
few years back.  To be sure, K&R (the closest thing to a language
definition that C had before the standard) did not define what pointer
arithmetic beyond the bounds of individually allocated objects meant.
But, everyone uniformly seemed to assume that it was valid address
calculation.  It is certainly one of the most common _non-standard_
practices today.

> [...]
> Others are using the definition
> 
> (3) pointer: a reference to a mutable object
>
> That is, if you have X and Y, and something done to X can cause Y to
> change, X and Y must be or contain pointers.  I think this is far too
> general a definition, especially since the well-known term "mutable
> object" already describes the situation.

This is what I've been refering to as variables which are associated
with each other through the "aliased" attribute.  To be sure, pointer
(those raw address things) can be used to implement this feature, but
pointers aren't sufficiently controllable.  (Or, if you make them
sufficiently controllable, then they won't be any different from the
"aliased" attribute I recommend - at least, that is my contention.)

> [...]
> I have been using the definition
> 
> (4) pointer: an abstration of a position in a sequence
> 
> Now I'll admit that this is non-standard and that I never completely
> spelled out my assumptions.  In the future I'll try to use the term
> "sequence pointer" to refer to these.  In particular, I was _not_
> arguing that there is no need for recursive data structures or for a
> better way to reference dynamic storage (both done with pointers in
> C).  My argument was that pointer(4) is a useful abstraction.

From your previous posting of this idea, I can't see that it requires
more than the data structures that I've been hawking.  However, I wouldn't
call this object a "pointer" - it is clearly a "list index" or some such
thing.  I _would_ like to know more about this abstraction though.
Even if I don't decide it needs direct implementation in a language,
it could be a good example of the types of things that need to be
supported.

> [...]
> Another possibility for the definition that Jim has been using:
> 
> (5) pointer: an abstraction of a reference (to a mutable object) that
>              has a specific dereferencing operation.

To be sure, a pointer (that raw address thing) does need a dereferencing
operator in order to retrieve specific objects.  This need is due to the
properties of the pointer.  It is most certainly _not_ the definition of a
pointer.  I _have_ argued that dereferencing and the "address-of" operator
would be unnecessary if pointers did not exist.  Instead, for "aliased"
variables (and _only_ for those) a "shallow copy" assignment operator
is all that is needed:

      aliased integer :: x, y
      integer :: z

      x = 5             !normal "deep copy" assignment
			!the _value_ of X is replaced with 5
			!if the _reference_ part of X is nil,
			!(a possibility for "aliased" variables)
			!it is allocated first.

      y <- x            !"shallow copy" assignment
			!the _reference_ part of Y is replaced with
			!a copy of the _reference_ part of X
			!the two variables are now _really_ aliased
			!the attribute merely declared they were _allowed_
			!to be

      z = y             !normal "deep copy" assignment
			!the value of Z is replaced by that of Y (now 5)

      z <- y            !ILLEGAL!  Z has not been declared aliased with Y

Note that in all contexts, the variables X, Y, and Z have the usual
meaning of integer variables - _except_ when the "shallow copy"
operator is used.  In that case, it is the _reference_ part of the
variable's meaning that is used or altered.  No "dereference"
operator is needed since the value is accessible just like any
other integer variable.

Consider the degree of control here:  X and Y may be aliased to each
other (or not), but they can _never_ be aliased to anything else in
this context.  If they are passed to some other context (through
procedure arguments or as global variables) the new context would
_also_ have to declare them as "aliased".  This last constraint
can be verified at load time.  This means that the compiler is
_always_ informed about which variables might be aliased and which
are certainly never aliased - result: more efficient code.  The user
is also aware easily of aliasing possibilities - result: easier
debugging.

Note: the "shallow copy" operator requires an "l-value" on _both_
the left- and the right-hand sides (very much like pointer assignment,
which is what this replaces).

J. Giles

gudeman@cs.arizona.edu (David Gudeman) (10/25/90)

In article  <3808@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]From article <26767@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
]>
]> (1) pointer: a low-level abstraction of a machine address
]
]...  I can't agree with you that this definition is too
]specific though...

But there are great differences between machine addresses and the
pointers of most programming languages.  Pointers, for example are
generally typed, and can legally only point at a certain set of
objects (though some languages/implementations don't provide much
security on this).  Also, there is a restricted set of basic
operations that produce new pointers (address-of and allocation).  You
can't make up arbitrary integers and expect them to be legal pointers.
And in languages with pointer arithmetic, there is no guarantee that
memory is laid out in any given manner.  That is, if you declare two
arrays {int A1[5], A2[5]; ...} you have no guarantee that A1[7] is a
reference to A2.  Pointers can't be true memory addresses unless the
programmer knows something about the way memory is laid out.

]Note: as a practical matter, everyone assumed that C pointer arithmetic
]_was_ unbounded until the debate about the ANSI proposed standard a
]few years back.

Not quite.  Everyone assumed that pointer arithmetic was legal and
predictable as long as the addresses involved were legal unsigned
longs.  But no one ever (to my knowledge) thought that the language
allowed the _dereferencing_ of arbitrary pointers in any portable
manner.

]> (3) pointer: a reference to a mutable object
]
]This is what I've been refering to as variables which are associated
]with each other through the "aliased" attribute.

I had something a bit more abstract in mind.  What happens when you do
something like this

  x = new_object;
  y = x;
  change_object(y);

where the function call "change_object" has some side-effect on y?  At
the abstract level we can say that the object referenced by x is
mutable, in which case the change to y causes an identical change in
x.  Or we can say the object is immutable, in which case it can't
actually be "changed", you can only assign a different object to y,
and x remains the same.  For example in Icon, strings are immutable
and lists are mutable.  Therefore after the code sequence

  s1 := "abc"; s2 := s1; s2[2] := "x"
  l1 := [1,2,3]; l2 := l1; l2[2] := 0

you have s1 = "abc", s2 = "axc", l1 = l2 = [1,0,3].  This feature
cannot be completely described with either aliasing or pointers,
because two immutable objects may alias either other.  It is also
possible to imagine a situation where two data objects do not overlap
in memory but are still connected as identical mutable objects (for
example in an object distributed over a network).

]From your previous posting of this idea, I can't see that it requires
]more than the data structures that I've been hawking.  However, I wouldn't
]call this object a "pointer" - it is clearly a "list index" or some such
]thing.

Like anything else, it can be implemented on top of other
abstractions.  However without direct language support I doubt many
people would use it, they would just fake it with indexes.  This is
similar to the situation in C where you need an explicit "break" in
switch statements.  How many people define a macro

#define cs break; case

and how many just keep writing out the "break" everywhere?

] "aliased"
]variables (and _only_ for those) a "shallow copy" assignment operator
]is all that is needed:
]      x = 5             !normal "deep copy" assignment
]      y <- x            !"shallow copy" assignment
]      z = y             !normal "deep copy" assignment

And presumably:
       x = 6		!the values of both y and x get changed to 6

Well, I said I wouldn't argue either way, but I may as well play
Devil's Advocate.  This is syntactically nicer and (I believe) less
confusing than pointers, but you have lost the idea of references as
first-class objects.  That is, you cannot pass references to
procedures or get them returned from procedures.  The only things you
can do with references is what the language specifically provides for.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

Chris.Holt@newcastle.ac.uk (Chris Holt) (10/25/90)

In article <3808@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>David Gudeman suggested:
>> (4) pointer: an abstration of a position in a sequence
>>     ... My argument was that pointer(4) is a useful abstraction.
>
>From your previous posting of this idea, I can't see that it requires
>more than the data structures that I've been hawking.  However, I wouldn't
>call this object a "pointer" - it is clearly a "list index" or some such
>thing.  I _would_ like to know more about this abstraction though.
>Even if I don't decide it needs direct implementation in a language,
>it could be a good example of the types of things that need to be
>supported.

This one is interesting, but needs generalizing.  So let's look at
a "tree index", or even better, a graph index.  This "points" at
a component (node) of a graph, and can "move" in any direction for
which there is a field in the node (i.e. an arc from the indexed
node to another node might be described by a field name; viewing the
node as a function and applying it to the field name yields
the result consisting of the original graph, with the new node index).

At this point ( :-) there are a couple of different things we might
want to do.  Some indices will want merely to move about the graph,
looking at various nodes and the relationships among them; these
have read-access.  Others may wish to add new arcs and nodes to the
graph; these require create-access.  Still others may wish to have
the power to remove arcs and nodes, requiring destroy-access.

Perhaps the best discipline would be to create a graph attached
to an unchangeable initial index.  Then, any other index for the
graph would be declared with respect to that one, together with
its access properties.  Or perhaps not?
-----------------------------------------------------------------------------
 Chris.Holt@newcastle.ac.uk      Computing Lab, U of Newcastle upon Tyne, UK
-----------------------------------------------------------------------------
 "Who overcomes by reason sole hath overcome but half his foe..."

jlg@lanl.gov (Jim Giles) (10/26/90)

From article <26788@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> In article  <3808@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> ]From article <26767@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> ]>
> ]> (1) pointer: a low-level abstraction of a machine address
> ]
> ]...  I can't agree with you that this definition is too
> ]specific though...
> 
> But there are great differences between machine addresses and the
> pointers of most programming languages.  Pointers, for example are
> generally typed, and can legally only point at a certain set of
> objects (though some languages/implementations don't provide much
> security on this).  [...]

Quite so.  In C, this constraint is unenforceable without expensive
run-time tests. (You can cast to (void *) legally.  Then you can
cast to (anything *) later.  Nothing but a tag on the pointer value
and a run-time test could detect this in the general case.)

In Pascal-like languages (including the one I propose - which
differs only in syntax and in placing much stricter bounds on the
range of aliasing allowed) the underlying type is indeed
adequately constrained.  A properly constructed compiler/loader
environment can insure that pointers always are used with the
proper type.  You are correct that this is an important property.

> [...]               Also, there is a restricted set of basic
> operations that produce new pointers (address-of and allocation).  [...]

The set of operations defined on _any_ data type in a computer
language is restricted.  I know of no language which contains all
the normal mathematically defined _integer_ functions.  Pointers
don't fare any better.  In fact, there is no need for _any_
address operation but assignment (what I call "shallow copy")
in a high level language.  For a _very_ low level language, the
address calculations should be as unrestricted as possible.

Either way, the pointer itself is an address.  That's what the
_reference_ part of a variable which the "shallow copy" operator
effects is.  Putting constraints on the allowed operations on
addresses doesn't change what they are.

> ]Note: as a practical matter, everyone assumed that C pointer arithmetic
> ]_was_ unbounded until the debate about the ANSI proposed standard a
> ]few years back.
> 
> Not quite.  Everyone assumed that pointer arithmetic was legal and
> predictable as long as the addresses involved were legal unsigned
> longs.  But no one ever (to my knowledge) thought that the language
> allowed the _dereferencing_ of arbitrary pointers in any portable
> manner.

Well, I don't know about whether people expected it to be portable
or not.  The fact remains that pointing about like that _is_ a
common occurrence in C code that I've been unfortunate enough to
have had contact with.  Some of this is 'professionally' written
code with wide distribution.  I'm told that this particular problem
was among the many which slowed the porting of much 'standard'
UNIX software to UNICOS on the Cray.

> ]> (3) pointer: a reference to a mutable object
> ]
> ]This is what I've been refering to as variables which are associated
> ]with each other through the "aliased" attribute.
> 
> I had something a bit more abstract in mind.  What happens when you do
> something like this
> 
>   x = new_object;
>   y = x;
>   change_object(y);
> 
> where the function call "change_object" has some side-effect on y?  [...]

See below.

> [...]
> ] "aliased"
> ]variables (and _only_ for those) a "shallow copy" assignment operator
> ]is all that is needed:
> ]      x = 5             !normal "deep copy" assignment
> ]      y <- x            !"shallow copy" assignment
> ]      z = y             !normal "deep copy" assignment
> 
> And presumably:
>        x = 6		!the values of both y and x get changed to 6

Yes.

> [...]
> Well, I said I wouldn't argue either way, but I may as well play
> Devil's Advocate.  This is syntactically nicer and (I believe) less
> confusing than pointers, but you have lost the idea of references as
> first-class objects.  That is, you cannot pass references to
> procedures or get them returned from procedures.  The only things you
> can do with references is what the language specifically provides for.

Huh?  A variable with the "aliased" attribute can be passed to a
procedure.  If it is the object of a "shallow copy" operation in
the procedure, the resulting new _reference_ part of the variable
will be returned just like any change in the _value_ part of the
variable would have been.  Why would you not expect this to be the
case?  As I have said, the "aliased" attribute is semantically
_identical_ to the pointers in Pascal and similar languages (except
for the more stringent bounds on aliasing).  If Pascal pointers are
"first class" objects - then so are "aliased" variables.

J. Giles