[comp.lang.misc] Answers, Chapter 1: TeX

jlg@lanl.gov (Jim Giles) (10/20/90)

> [... All quotes from Dan Bernstein (who else) ...]
> QUESTION 1: Do you see the contradiction in ``I absolutely refuse to
> look at the example you're giving me!'' and ``I am not turning away from
> anything!''?

Had I actually made the first statement, I would have been in
contradiction.  I did not.  I said (and still say) that I will
consider _any_ example that you care to post.  You have refused to
post the example you keep referring to.  However, I _have_ finally
looked over the source of the code you refer to. See below.

>> If there's something in TeX that can't be
>> done with recursive data structures (and the other features I've supported
>> on the net so visibly), then you ought to be able to give a specific
>> example - in C if you like.
>
> Packed array tries are not trivial to implement. I don't see why I
> should bother to type in a presentation just for you, when Knuth has
> already done a much better job.

Well don't bother then.  As it happens, someone _DID_ email the source
code for TeX 'packed array tries' over the weekend.  The news is even
worse for your argument than I expected.  There are _NO_ pointers in
the source code for TeX 'packed array tries'.  The code does everything
through the use of Pascal arrays.  Since arrays are one of the features
that I've "supported on the net so visibly" that I mentioned above, the
TeX code doesn't even meet the criterion.

> QUESTION 2: Why do you refuse to look up packed array tries? Perhaps
> because you're afraid that I'm right?

No, but if I were as cynical as you, I would suspect the reason that
you didn't post the example was that you knew the actual code didn't
support your position.  I am _not_ so cynical.  I'm sure you read the
_description_ of the code (which, no doubt, pretended that explicit
pointers were used throughout).  As such, I accept that you argued the
question in good faith.

> QUESTION 3: Can you read? Do you see the word ``packed''? Does it even
> occur to you that there's a difference between a packed array trie and a
> trie (by which people usually mean a list trie or a normal array trie)?

It's not clear to me what kind of packing pointers allow that
isn't allowed by some combination of the features that I've mentioned.
The example of TeX makes it quite clear that at least one example you
thought important _can_ be done with arrays quite easily.

End of chapter 1.

J. Giles

djones@megatest.UUCP (Dave Jones) (10/23/90)

From article <66253@lanl.gov>, by jlg@lanl.gov (Jim Giles):
> ... There are _NO_ pointers in
> the source code for TeX 'packed array tries'.  The code does everything
> through the use of Pascal arrays.

For theoretical considerations, an index into an array is not much
different from a pointer, is it? Range-checking is the only significant
difference I can think of, and that only makes incorrect programs
fail nicer. (Pointers also have some measure of range-checking,
"segmentation violation" for example. Never happened to *me* of
course, but I've heard rumors.)
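
In C the equivalence is built into the language, since a[i] is defined as *(a + i). A minimal sketch, with function names invented purely for illustration, that walks the same array both ways:

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array by indexing; a[i] means *(a + i). */
long sum_by_index(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Sum the same array by stepping a pointer from a up to a + n. */
long sum_by_pointer(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (const int *p = a; p != a + n; p++)
        total += *p;
    return total;
}
```

Both functions read exactly the same elements; the only difference is whether the loop variable is an offset or an address.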

jlg@lanl.gov (Jim Giles) (10/24/90)

In article <14269@goofy.megatest.UUCP>, djones@megatest.UUCP (Dave Jones) writes:
> From article <66253@lanl.gov>, by jlg@lanl.gov (Jim Giles):
> > ... There are _NO_ pointers in
> > the source code for TeX 'packed array tries'.  The code does everything
> > through the use of Pascal arrays.
> 
> For theoretical considerations, an index into an array is not much
> different from a pointer, is it? Range-checking is the only significant
> difference I can think of, and that only makes incorrect programs
> fail nicer. (Pointers also have some measure of range-checking,
> "segmentation violation" for example. Never happened to *me* of
> course, but I've heard rumors.)

For theoretical considerations, pointers and one-dimensional arrays
are indeed very similar.  But the context of the discussion makes it
clear that the person who posted TeX as an example of pointer efficiency
was of the opinion that pointers and arrays differed importantly.  In
fact, in the same news article that he proposed TeX, he also gave
examples which attempted to prove the supposed superiority of pointers
to arrays.

Of course, pointers and one dimensional arrays - in spite of their
similarity - are indeed different.  For one thing: arrays are bounded.
For another: distinct arrays are generally not aliased.  These two
properties make debugging and maintaining array codes somewhat easier
than pointer code.  Similarly, these two differences (particularly the
no-alias information) are useful to the compiler in generating more
efficient code.  After all, all the common optimization techniques
that compilers use are inhibited to one extent or another by the
possible presence of aliasing.

J. Giles

gudeman@cs.arizona.edu (David Gudeman) (10/24/90)

In article  <3656@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

]Of course, pointers and one dimensional arrays - in spite of their
]similarity - are indeed different.  For one thing: arrays are bounded.

You know, Jim, it doesn't help your argument any when you keep
bringing up the same points long after they have been discredited.
Pointers are bounded just like array indexes.  Yes, most
implementations of C don't check the bounds, but that is because in
the definition of the C language, run-time errors are generally
defined to produce undefined results.  You could easily change the
implementation and the language specification to produce error
messages on out-of-bounds references.  This is not a difference
between pointers and arrays, it is a difference between language
philosophies.  No doubt if C had true arrays, an out-of-bounds index
would produce undefined results just like out-of-bounds pointers.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

jlg@lanl.gov (Jim Giles) (10/24/90)

From article <26726@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> In article  <3656@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> 
> ]Of course, pointers and one dimensional arrays - in spite of their
> ]similarity - are indeed different.  For one thing: arrays are bounded.
> 
> You know, Jim, it doesn't help your argument any when you keep
> bringing up the same points long after they have been discredited.
> Pointers are bounded just like array indexes.  [...]

Really?  I don't know _any_ language that has pointers that places
bounds on where it can point.  In C, if I have an 'int' var for
example, unless the int has the register attribute (in which case
it's _supposedly_ not in memory anyway :-), I can have _any_ (int *)
variable point to that 'int'.  No matter where in memory an object
is, a pointer to that type of object can point there.

> [...]                                          Yes, most
> implementations of C don't check the bounds, but that is because in
> the definition of the C language, run-time errors are generally
> defined to produce undefined results.  [...]

What you are talking about here is the ANSI C (which is _very_ new)
constraint on the validity of pointer _arithmetic_.  This constraint
merely says that comparing (or subtracting) pointers that currently
point to within separately allocated objects is undefined.  The same
status occurs if you add (or subtract) integers to (or from) pointers
so that the result leaves the bounds of a single allocated object.
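
A small C sketch of what stays defined under that rule. The helper names are hypothetical; the only facts relied on are that subtraction is defined for two pointers into one array and that a pointer may point one element past the end without being dereferenced:

```c
#include <assert.h>
#include <stddef.h>

/* Distance in elements between two pointers into the SAME array. */
ptrdiff_t span_within(const int *first, const int *last)
{
    return last - first;
}

/* Form base + offset only when the result stays within base[0..n];
   offset == n (one past the end) is a legal value but must not be
   dereferenced. Returns NULL rather than compute an out-of-range pointer. */
const int *advance_checked(const int *base, size_t n, ptrdiff_t offset)
{
    assert(base != NULL);
    if (offset < 0 || (size_t)offset > n)
        return NULL;
    return base + offset;
}
```

Anything that advance_checked refuses would, if computed directly, fall under the "undefined" status described above.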

These constraints are present to allow certain lazy implementations
on segmented architectures to do all pointer arithmetic without
referring to the segment component of the addresses.  The pointer
itself can still be _assigned_ to point anywhere in memory - only
pointer arithmetic is affected by the constraint you mention.

By the way, I opposed this ANSI specification.  One of the _few_
things I think pointers are at all good for is implementing the
memory manager.  If pointer arithmetic across segment boundaries
is not reliable, then pointers can't be reliably used to implement
the memory manager (or, at least, not without considerable extra
difficulty).

J. Giles

gudeman@cs.arizona.edu (David Gudeman) (10/24/90)

In article  <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
][I argue that pointers are bounded just like arrays]
]Really?  I don't know _any_ language that has pointers that places
]bounds on where it can point.

We are comparing pointers to indexes here.  The only time these two
are comparable is when the pointer is one that was calculated with
integer addition on a pointer to an array.  An out-of-bounds pointer
--in correspondence to an out-of-bounds index-- is a pointer whose
value is not in the same array as the starting pointer.

]What you are talking about here is the ANSI C (which is _very_ new)
]constraint on the validity of pointer _arithmetic_.  This constraint
]merely says that comparing (or subtracting) pointers that currently
]point to within separately allocated objects is undefined.

No, I'm talking about C in general.  And of course I'm talking about
pointer arithmetic, otherwise there is no correspondence between
indexes and pointers.  I don't believe any definition of C has ever
defined the effect of dereferencing a pointer that was miscalculated
so that it points outside the region of the starting array.  However, C
_could_ define this as producing an error, and then pointer bounds
checking would be just as safe and reliable as array index bounds
checking.
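
As a rough sketch of what such a checked pointer could look like if written out by hand in C (the struct and the bp_* names are made up for this illustration, not part of any C implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "bounded pointer": it remembers the array it came from. */
struct bounded_ptr {
    int   *base;   /* start of the underlying array */
    size_t len;    /* number of elements in that array */
    size_t pos;    /* current element, 0 <= pos <= len */
};

struct bounded_ptr bp_make(int *base, size_t len)
{
    assert(base != NULL);
    struct bounded_ptr p = { base, len, 0 };
    return p;
}

/* Pointer arithmetic: returns false and leaves *p unchanged if the
   result would fall outside [0, len]. */
bool bp_add(struct bounded_ptr *p, ptrdiff_t k)
{
    if (k >= 0) {
        if ((size_t)k > p->len - p->pos)
            return false;
        p->pos += (size_t)k;
    } else {
        size_t back = (size_t)(-(k + 1)) + 1;   /* |k| without overflow */
        if (back > p->pos)
            return false;
        p->pos -= back;
    }
    return true;
}

/* Dereference: reports an error instead of producing undefined results. */
bool bp_load(const struct bounded_ptr *p, int *out)
{
    if (p->pos >= p->len)    /* one past the end is not readable */
        return false;
    *out = p->base[p->pos];
    return true;
}
```

The check in bp_load is the same comparison an array bounds check would make; the extra cost is carrying base and len along with the pointer.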

This is all starting to sound very familiar...
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

kers@hplb.hpl.hp.com (Chris Dollin) (10/24/90)

David Gudeman writes:

   In article  <3656@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

   ]Of course, pointers and one dimensional arrays - in spite of their
   ]similarity - are indeed different.  For one thing: arrays are bounded.

   You know, Jim, it doesn't help your argument any when you keep
   bringing up the same points long after they have been discredited.
   Pointers are bounded just like array indexes.  Yes, most
   implementations of C don't check the bounds, but that is because in
   the definition of the C language, run-time errors are generally
   [rest omitted]

That's not what Jim means by "bounded" (if I understand him correctly) - his
point is that given two *arrays* A, B then it is *known* that A and B have no
locations (ie, updatable bits of store) in common, so assignments within A 
cannot affect the values accessible from B.

This is in contrast to two pointers P, Q (of the same type); an assignment
through P (*P = 42, or P[42] = MAX_INT) might well change values accessible
through Q.

Jim wants ALIAS declarations for the cases where variables can overlap.
Presumably in the case of a procedure with a header something like

    void example( T A[Bound], T B[Bound] ) ...

where A and B are not to be aliased (so the compiler can do its wizzy 
optimisations), all calls have a proof obligation to show that the actual 
arguments are *in fact* not aliased.
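
In today's C that obligation can only be approximated with a run-time test. A sketch under stated assumptions: regions_overlap and scale_into are hypothetical names, and the address comparison goes through uintptr_t, which presumes a flat address space (comparing unrelated pointers directly with < is undefined):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the byte ranges [p, p+np) and [q, q+nq) share any byte. */
bool regions_overlap(const void *p, size_t np, const void *q, size_t nq)
{
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)q;
    return a < b + nq && b < a + np;
}

/* Hypothetical routine that is only meant to be called with disjoint a, b. */
void scale_into(float *a, const float *b, size_t n)
{
    /* The caller's proof obligation, checked at run time instead. */
    assert(!regions_overlap(a, n * sizeof *a, b, n * sizeof *b));
    for (size_t i = 0; i < n; i++)
        a[i] = 2.0f * b[i];
}
```

A compile-time or load-time proof would remove the assert and its cost; the sketch only shows what has to be true at each call.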

Is that right, Jim? [I'm not sure I agree with Jim. But I'd rather disagree
about the same thing than about two different ones, which is what I think David
is in danger of doing.]

--

Regards, Kers.      | "You're better off  not dreaming of  the things to come;
Caravan:            | Dreams  are always ending  far too soon."

mhcoffin@watmsg.uwaterloo.ca (Michael Coffin) (10/24/90)

In article <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> ...
>What you are talking about here is the ANSI C (which is _very_ new)
>constraint on the validity of pointer _arithmetic_.  This constraint
>merely says that comparing (or subtracting) pointers that currently
>point to within separately allocated objects is undefined.  The same
>status occurs if you add (or subtract) integers to (or from) pointers
>so that the result leaves the bounds of a single allocated object.

These restrictions are not new.  I quote from page 98 of "The C
Programming Language" by Kernighan and Ritchie, 1978:

    "Any pointer can be meaningfully compared for equality or
    inequality with NULL.  But all bets are off if you do arithmetic
    or comparisons with pointers pointing to different arrays.  If
    you're lucky, you'll get obvious nonsense on all machines.  If
    you're unlucky, your code will work on one machine but collapse
    mysteriously on another."

---
Michael Coffin				mhcoffin@watmsg.waterloo.edu
Dept. of Computer Science		office: (519) 885-1211
University of Waterloo			home:   (519) 725-5516
Waterloo, Ontario, Canada N2L 3G1

jlg@lanl.gov (Jim Giles) (10/25/90)

From article <26740@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> In article  <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> ][I argue that pointers are bounded just like arrays]
> ]Really?  I don't know _any_ language that has pointers that places
> ]bounds on where it can point.
> 
> We are comparing pointers to indexes here.  [...]

_ONE_ of the things that we are comparing pointers to is indexing.
One of the problems with pointers (at least, pointers a la C)
is that many _different_ functionalities are all rolled up into
the pointer mechanism.

So, let's get a definition straight here.  When I say bounded (in the
context of a pointer discussion), I mean that the address of an object
is limited by its declaration (or, at least, its initialization - which
may be dynamic) to a specific memory location.  This means that an
inspection of the declaration and of the present location is sufficient
to determine whether the present reference is "in bounds".

With pointers, you can't make such a simple determination.  No matter
what memory location it references, it _may_ be a legal and intended
one.  This makes debugging pointers incrementally harder - especially
in large codes.  The compiler _can't_ make any simple automatic bounds
checks (although, in C it can perhaps automatically limit index
calculations).  Even the user with an interactive debugger may be
hard pressed to determine whether the pointer is correct or not.

Arrays (even dynamic ones) are always allocated by the compiler or
a compiler generated allocator (at least, according to the language
model I've been presenting) - so you _know_ that they are located
distinctly from other objects with sufficient space for their declared
length (unless the compiler is broken - in which case all bets are off
anyway).

Boundedness and aliasing are _slightly_ different concepts.  Clearly
something that is unbounded is potentially aliased to everything (or,
everything of the same underlying type).  Something that's bounded
_may_ still be aliased: this may even be desirable.

I recommend two mechanisms to implement boundedness: 1) all objects
are distinct; 2) operations allowing aliasing between distinct variables
must be explicitly requested when these variables are declared.  This
means that even those things which _are_ allowed to be aliased are
bounded to be aliased only to others of a small "family" of variables.
Variables _not_ in this "family" are _known_ not to be aliased to
anything within it (in fact, the constraint can be enforced by the
compiler and loader alone - no run-time support is needed - whether
run-time checks might be more efficient is a different question).

The bottom line is that I can think of some applications for bounded
aliasing, and certainly bounded non-aliased variables are very useful,
but I can't think of any application except the low-level memory manager
which needs unbounded memory access.

J. Giles

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (10/26/90)

A pointer is not an address.  A pointer is a way of finding an
address.

Just thought I'd mention it.
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi

gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) (10/28/90)

In article <26726@megaron.cs.arizona.edu> gudeman@cs.arizona.edu (David Gudeman) writes:
>In article  <3656@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>
>]Of course, pointers and one dimensional arrays - in spite of their
>]similarity - are indeed different.  For one thing: arrays are bounded.
>
>You know, Jim, it doesn't help your argument any when you keep
>bringing up the same points long after they have been discredited.
>Pointers are bounded just like array indexes.

Actually, it just shows that we're talking past each other most of the
time. Here's a practical example where the boundedness of array
references allows you to easily generate better code:

     a[i] = b[i] + c[i];            *a = *b + *c;
     d[i] = b[i] / 2.;              *d = *b / 2.;

For the array version, the compiler can check to see if a and b
overlap, and if not it doesn't have to load b[i] twice. For the
pointer version, the compiler has to figure out what a, b, c, and d
point at, if it even bothers at all.
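
To see why the reload matters, here is the pointer version wrapped in a function (two_stores is a name invented for this sketch) so that it can be called with and without aliased arguments:

```c
#include <assert.h>

/* The two statements above, written with pointers.
   If a and b name the same object, the store through a changes *b,
   so *b has to be read again for the second statement. */
void two_stores(float *a, const float *b, const float *c, float *d)
{
    assert(a && b && c && d);
    *a = *b + *c;
    *d = *b / 2.0f;
}
```

With distinct objects b = 4 and c = 6, the call leaves a = 10 and d = 2. Passing the same variable for both a and b leaves it at 10 and sets d to 5, because the second statement sees the value just stored. A compiler that cannot rule out the second kind of call has to generate the reload for every call.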

>  Yes, most
>implementations of C don't check the bounds, but that is because in
>the definition of the C language, run-time errors are generally
>defined to produce undefined results.

I'm not worried about run-time errors, I'm worried about run-time
efficiency. Some languages are hard to optimize. When C programmers do
strength reduction by hand on array accesses, they create an
optimization mess.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/29/90)

In article <1990Oct28.015733.9181@murdoch.acc.Virginia.EDU>, gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) writes:
> Actually, it just shows that we're talking past each other most of the
> time. Here's a practical example where the boundedness of array
> references allows you to easily generate better code:
> 
>      a[i] = b[i] + c[i];            *a = *b + *c;
>      d[i] = b[i] / 2.;              *d = *b / 2.;
> 
> For the array version, the compiler can check to see if a and b
> overlap, and if not it doesn't have to load b[i] twice. For the
> pointer version, the compiler has to figure out what a, b, c, and d
> point at, if it even bothers at all.

Yes, *sigh* we're all talking past each other.
	THIS IS NOT AN INTRINSIC PROPERTY OF POINTERS;
	it is a property of the programming language *C*.

I haven't my Euclid manual handy, so I'll use Pascal syntax and
the Euclid idea:
	var
	    az: zone of real;
	    bz: zone of real;
	    ap, dp: ^az;
	    bp, cp: ^bz;
	begin
	    new(ap);	(* allocates from zone az only *)
	    ...
	    ap^ := bp^ + cp^;	(* this assignment CANNOT change bp^ or cp^ *)
	    dp^ := bp^/2;	(* neither can this *)	    
	    ...

The fact that C pointers are not statically associated with
separate zones is no more a fundamental property of pointers
than the fact that PL/I pointers are not statically associated
with a type.

To put this into C terms, suppose we introduce a new type
constructor "pointer into named array".  Here's how the example
might look:

	float a[N], b[N], c[N], d[N];
	float *[a] ap, *[b] bp, *[c] cp, *[d] dp;
	... ap = ... bp = ... dp = ...
	*ap = *bp + *cp;	/* can ONLY change a[] and *ap */
	*dp = *bp/2;		/* can ONLY change d[] and *dp */

This would be an upwards compatible extension to C, just as 'const' was.
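
The static association itself (though not the optimisation benefit) can be imitated in standard C by giving each array its own reference type. This is only a sketch under that assumption: zone_a, zone_b, a_ref, b_ref and the accessor names are all invented here, and nothing stops a determined programmer from building an a_ref by hand.

```c
#include <assert.h>
#include <stddef.h>

#define N 4

static float zone_a[N], zone_b[N];

/* One wrapper type per array: an a_ref made by a_at() can only
   designate an element of zone_a, and likewise for b_ref. */
typedef struct { size_t i; } a_ref;
typedef struct { size_t i; } b_ref;

a_ref a_at(size_t i) { assert(i < N); return (a_ref){ i }; }
b_ref b_at(size_t i) { assert(i < N); return (b_ref){ i }; }

float a_get(a_ref r)          { return zone_a[r.i]; }
void  a_set(a_ref r, float v) { zone_a[r.i] = v; }
float b_get(b_ref r)          { return zone_b[r.i]; }
void  b_set(b_ref r, float v) { zone_b[r.i] = v; }
```

Passing a b_ref to a_set is a compile-time type error, which is the property the proposed "pointer into named array" constructor would give the compiler directly.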
-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

gl8f@astsun7.astro.Virginia.EDU (Greg Lindahl) (10/29/90)

In article <4119@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:

>Yes, *sigh* we're all talking past each other.
>	THIS IS NOT AN INTRINSIC PROPERTY OF POINTERS;
>	it is a property of the programming language *C*.

Depends on how you define "pointers" ;-)

>To put this into C terms, suppose we introduce a new type
>constructor "pointer into named array".

C already has these. An example:

     float a[100];
     int i = 10;

     a[i];

Cheers (and don't forget the ;-),

-- greg

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/29/90)

In article <1990Oct29.051730.10838@murdoch.acc.Virginia.EDU>, gl8f@astsun7.astro.Virginia.EDU (Greg Lindahl) writes:
> In article <4119@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
> >Yes, *sigh* we're all talking past each other.
> >	THIS IS NOT AN INTRINSIC PROPERTY OF POINTERS;
> >	it is a property of the programming language *C*.

> Depends on how you define "pointers" ;-)

I certainly *don't* define "pointer" to mean "exactly what C means by
a pointer, no more, no less".  PL/I and Pascal and Algol 68 are some
well-known languages that have pointer-valued variables; Euclid is a
language which deserved to be better known.
None of them provides pointer arithmetic or pointer ordering.

If people want to say "pointer ARITHMETIC" is bad, let them say so.
If people want to say "pointer ORDERING" is bad, let them say so.
If people say "POINTERS" are bad, and they are talking with the
intention of being understood, then they are referring to the
idea behind PL/I POINTER, Pascal "^", Algol 68 and Mary REF, and
Euclid's whatever-it-was.  The question is whether this common
idea can be "tamed".  I am still waiting to hear what the objection
to Euclid's construct is.

> >To put this into C terms, suppose we introduce a new type
> >constructor "pointer into named array".

> C already has these. An example:

>      float a[100];
>      int i = 10;
>      a[i];

Where is "i" constrained to index only "a"?
(Am I the only person who remembers ESPL?  In ESPL you *could*
declare a variable to be a "verified index" for a particular array.)

-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

anw@maths.nott.ac.uk (Dr A. N. Walker) (10/30/90)

In article <3791@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> In article  <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> ]Really?  I don't know _any_ language that has pointers that places
>> ]bounds on where it can point.

	Pascal:  can only point to specially allocated storage
		[ie, no equivalent to "int i, *pi = &i;"]
	Algol: cannot point to objects of narrower scope
		[eg, the equivalent of "int *pi; f() { int i; pi = &i; }"
		 is illegal]

>With pointers, [...] The compiler _can't_ make any simple automatic bounds
>checks (although, in C it can perhaps automatically limit index
>calculations).

	Well, it's only non-simple in the case you describe elsewhere of
address punning through casts;  these are (or ought to be) rare enough
that a non-trivial check [eg, by tagging] has only marginal effects on
the efficiency.  Older readers will recall that it used to be *extremely
normal* to test programs with a whole battery of checks "enabled", with
a consequent large penalty in run-time speed, and to disable the checks
for "production runs".  If it is unacceptable these days to disable the
checks, then the answer is to provide them in hardware, as we used to
in the 50s and 60s.

>Variables _not_ in this "family"
	[of declared-to-be-alias-able variables]
>				  are _known_ not to be aliased to
>anything within it

	Yes, but one trouble is that one of the commonest cases in
practice is where a variable is aliassed with itself, as in a[i] v. a[j].
Checking that i != j can plainly only be done, in general, at run-time.
It gets worse when you are worried about a[j] v. b[j] where a, b and j
are parameters to some procedure -- you plainly need to be sure that
a != b [and it gets worse in languages that allow slicing and other
subsetting of arrays], which again involves arbitrarily messy checks if
a and b are potentially inherited from further procedures and you insist
that the checks be carried out by compiler and loader.  The problems
will only be exacerbated if you deprive programmers of natural ways of
using pointers, so that they find themselves [through brute force or
ignorance] emulating proper data structures by indexing into arrays.
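
	A familiar C illustration of the a[i] v. a[j] case (the function
names are invented for this sketch) is the exclusive-or exchange, which is
correct for distinct elements and wrong when i == j:

```c
#include <assert.h>
#include <stddef.h>

/* Exchange a[i] and a[j] without a temporary.
   When i == j both operands are the same object and the XOR trick
   would zero it, so a run-time test is needed. */
void swap_elems(unsigned *a, size_t i, size_t j)
{
    assert(a != NULL);
    if (i == j)          /* self-alias: nothing to exchange */
        return;
    a[i] ^= a[j];
    a[j] ^= a[i];
    a[i] ^= a[j];
}

/* The same code without the guard, kept only to show the failure. */
void swap_elems_unguarded(unsigned *a, size_t i, size_t j)
{
    a[i] ^= a[j];
    a[j] ^= a[i];
    a[i] ^= a[j];
}
```

	Calling swap_elems_unguarded(v, 1, 1) leaves v[1] equal to zero
whatever it held before; no declaration of a can tell the compiler whether
that call will ever happen.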

	You are right that in many cases pointerless code can help
the compiler by making certain aliasses impossible;  on the other
hand, in most if not all of the cases where this matters, the
compiler can make the same deductions given a program which is
properly divided into modules and in which scope rules are enforced.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

jlg@lanl.gov (Jim Giles) (10/30/90)

From article <1990Oct29.191321.3202@maths.nott.ac.uk>, by anw@maths.nott.ac.uk (Dr A. N. Walker):
> [...]
> 	Yes, but one trouble is that one of the commonest cases in
> practice is where a variable is aliassed with itself, as in a[i] v. a[j].

Well, it is one of the common forms of aliasing.  However, it is clearly
marked as such: both are references to 'a'.  Further, it is a very rare
problem compared to array processing with 'a[j]' vs. 'b[j]' - here it
is _very_ important that we _know_ 'a' to be disjoint from 'b'.

> [...]
> It gets worse when you are worried about a[j] v. b[j] where a, b and j
> are parameters to some procedure -- you plainly need to be sure that
> a != b [...]

That has been my point.  If 'a' and 'b' aren't declared together in
the same 'aliased' declaration, they _can't_ be aliased.  Not at all.
Ever.

This requires that the loader check procedure arguments against
alias status and guarantee that the caller doesn't pass aliased
arguments to a subroutine that doesn't also declare them aliased.
Since all the tests are done at compile- and load-time, there is
no run-time penalty.  The price to be paid for this is that any
case that is still ambiguous at load-time must be assumed aliased.

> [...]  [and it gets worse in languages that allow slicing and other
> subsetting of arrays], [...]

Exactly the reason for wanting to get explicit compile-time and
load-time checkable constraints into the language.  This permits
the compiler to generate efficient code for those things (the majority)
which are _known_ not to be aliased.

> [...]                  which again involves arbitrarily messy checks if
> a and b are potentially inherited from further procedures and you insist
> that the checks be carried out by compiler and loader.  [...]

It is because of this problem of inheriting data through several levels
of the call chain that the test in the loader is most important.  The
loader _can_ perform this test reliably and quickly.  This permits
efficient code to be generated while maintaining confidence that
the no-alias assumptions won't be violated.

> [...]                                                   The problems
> will only be exacerbated if you deprive programmers of natural ways of
> using pointers, so that they find themselves [through brute force or
> ignorance] emulating proper data structures by indexing into arrays.

I don't know any natural ways of using pointers.  Pointers are one of
the most unnatural data structuring tools that I've ever encountered.
Further, in the context of the data structuring tools I've recommended,
I see no reason to use arrays to simulate any data structure other than
arrays.   Linked lists, graphs, trees, etc. should all be implemented
as recursive data structures (aliased or not as the need arises).
Where's the need for arrays?

> 
> 	You are right that in many cases pointerless code can help
> the compiler by making certain aliasses impossible;  on the other
> hand, in most if not all of the cases where this matters, the
> compiler can make the same deductions given a program which is
> properly divided into modules and in which scope rules are enforced.

You yourself have pointed out many specific cases where a compiler
CANNOT make such deductions.  The case of arrays 'a' and 'b'
being passed to the same function as arguments is a case in point:

real function :: example(real::a(:),b(:))
   do while (some_condition)
      SOME CODE INVOLVING BOTH a AND b
   end do
end function

Since 'a' and 'b' aren't declared with the 'aliased' attribute, then
they _must_ not be aliased.  The loader can and _should_ determine
that this constraint is obeyed.  Using the language features I've
been recommending, the compiler can optimize the loop all it wants.
Without the constraints I advocate, such optimizations are either
unsafe or not applied at all.
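
For comparison, the nearest thing in C is the 'restrict' qualifier that was later added in C99. It is an unchecked promise by the programmer, so it belongs to the "unsafe" category rather than to the checked scheme described here. A sketch, with the loop body invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* 'restrict' promises that, inside this function, a and b are not used
   to access the same elements. The compiler may rely on the promise;
   nothing verifies it at compile, load or run time. */
float example(float *restrict a, const float *restrict b, size_t n)
{
    assert(a != NULL && b != NULL);
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        a[i] = a[i] + b[i];   /* b[i] may stay in a register across the store */
        total += b[i];
    }
    return total;
}
```

Calling example(x, x, n) breaks the promise and the behaviour is undefined, which is exactly the kind of call a checking loader would reject.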

J. Giles

gudeman@cs.arizona.edu (David Gudeman) (10/31/90)

In article  <4349@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]That has been my point.  If 'a' and 'b' aren't declared together in
]the same 'aliased' declaration, they _can't_ be aliased.  Not at all.

I don't see why you keep arguing that "arrays with this new 'alias'
declaration of mine" are more optimizable than "pointers without the
'alias' declaration".  You are comparing apples and oranges.  The
appropriate comparison is to compare the optimization potential of
arrays vs. pointers either both with or both without the 'alias'
declaration.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

jlg@lanl.gov (Jim Giles) (10/31/90)

From article <26971@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> [...]                  You are comparing apples and oranges.  The
> appropriate comparison is to compare the optimization potential of
> arrays vs. pointers either both with or both without the 'alias'
> declaration.

Ok.  First _without_ the 'aliased' attribute:
   If your language doesn't have the ability to declare aliasing
   status, arrays are a direct win over pointers (which is why
   Fortran is usually faster than C on array intensive code).  Of
   course, optimizing the arrays this way is unsafe unless the
   loader checks all passed arguments to enforce the constraint that
   distinct arrays are NOT aliased.  Most Fortran environments
   _don't_ test this - so you can have strange and difficult to find
   errors arising from this cause.  Meanwhile, C converts all array
   args to pointers and generates inefficient code.  I don't regard
   either solution satisfactory.

Now _with_ the 'aliased' attribute:
   If your language _does_ have the ability to declare aliasing status
   (like the one I'm recommending), then pointers don't give you any
   capability you don't already have _without_ them.  Pointers in such
   a language are neither more powerful, more legible, nor more
   efficient than an appropriate combination of the other features
   I've mentioned.  So, what do we need pointers for?

   In fact, there are other distinctions between the different data
   types I've recommended _besides_ their aliasing status which yield
   optimization possibilities in addition to the alias free nature of
   the language.  Using pointers to simulate these data types
   (sequences for example) deprives the compiler of information which
   _could_ be used to improve the performance of the code.

So, arrays (and/or the other data types I've mentioned) are an
improvement on pointers - with or without the 'aliased' attribute.

I recommend the 'aliased' attribute in order to provide the
capability (especially in connection with recursive data types) to
do all the things that Pascal pointers do.  The aliased attribute
achieves this and at the same time allows easier optimization: the
compiler has an explicit local declaration of all possible aliasing
in each routine - the rest of the variables are _guaranteed_ not to
be aliased.

The price of testing this constraint is the load-time test that
Fortran compiler/loader environments _should_ already be doing.
This can be made considerably easier with a requirement of function
interface/prototype declarations of all procedures that are to be
called.  The compiler can then test the aliasing constraint at
compile-time and the loader need only make sure that the
interface/prototype information actually matches the corresponding
procedure declaration.
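
For comparison, the closest thing in C itself is the 'restrict'
qualifier added in C99.  It is only a rough analogue of the proposal
above: the no-overlap promise is made per parameter in the prototype,
and nothing in the compiler or loader verifies it.  A minimal sketch,
with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* Default case: the caller promises that out and in never overlap, so
   the compiler may reorder or vectorize the loop.  The promise is NOT
   checked by the language; the assert is only a cheap sanity test. */
void scale_unaliased(size_t n, double *restrict out,
                     const double *restrict in, double k)
{
    assert(n == 0 || out != in);
    for (size_t i = 0; i < n; i++)
        out[i] = k * in[i];
}

/* "Aliased" case: no qualifier, so out and in may refer to the same
   storage and the compiler has to generate conservative code. */
void scale_may_alias(size_t n, double *out, const double *in, double k)
{
    for (size_t i = 0; i < n; i++)
        out[i] = k * in[i];
}
```

Calling scale_may_alias(n, v, v, k) to scale a vector in place is
legal; the same call to scale_unaliased would break the promise.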

J. Giles

gudeman@cs.arizona.edu (David Gudeman) (11/01/90)

In article  <4464@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]
]Ok.  First _without_ the 'aliased' attribute:
]   ... arrays are a direct win over pointers (which is why
]   Fortran is usually faster than C on array intensive code).  Of
]   course, optimizing the arrays this way is unsafe unless the
]   loader checks all passed arguments to enforce the constraint that
]   distinct arrays are NOT aliased.

You are still comparing apples and oranges.  You are comparing
"arrays, assumed not to be aliased" with "pointers, assumed to be
aliased".

]Now _with_ the 'aliased' attribute:
]   (like the one I'm recommending), then pointers don't give you any
]   capability you don't already have _without_ them.  Pointers in such
]   a language are neither more powerful, more legible,...

This, of course, is a matter of opinion.  Maybe to clarify things, you
could post your definition of "pointer".  In particular, I'm
curious about how you think passing a pointer (C-style) is different
from passing an array.

]   ...  Using pointers to simulate these data types
]   (sequences for example) deprives the compiler of information which
]   _could_ be used to improve the performance of the code.

I'd also like to know what information this is.  Presumably a complete
definition of what you mean by "pointer" and "array" will make this
obvious.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

jlg@lanl.gov (Jim Giles) (11/01/90)

From article <27028@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> [...]
> You are still comparing apples and oranges.  You are comparing
> "arrays, assumed not to be aliased" with "pointers, assumed to be
> aliased".

Oh, I'm sorry.  I have given this information before, but perhaps you
missed it.  An array is a mapping from one or more bounded index sets
to values of an underlying type.  Each array is assumed distinct from
all others unless _explicitly_ declared otherwise.

Note: such explicit declaration of aliasing must be locally visible
to the compiler at compile time and any attempt to cheat must be
intercepted by load-time checks.

> [...]
> ]Now _with_ the 'aliased' attribute:
> ]   (like the one I'm recommending), then pointers don't give you any
> ]   capability you don't already have _without_ them.  Pointers in such
> ]   a language are neither more powerful, more legible,...
> 
> This, of course, is a matter of opinion.  [...]

The 'soft' words (like 'legible') might be considered opinion, the
statement that pointers provide no more power is a statement of _fact_.

> [...]                                     Maybe to clarify things, you
> could post your definition of "pointer".  [...]

I've done this before too.  Perhaps you missed it.

My definition of pointer is a variable whose value is an address and
has the following operations defined on it:

   Dereferencing to some predefined (for each pointer) type object.
   Comparing to other pointers (of the same underlying type) for equality.
   Assigning to/from other pointers (of the same underlying type)
   Being initialized to some address by an allocation mechanism

This is essentially the definition of Pascal pointers.  In addition,
other operations are defined by some languages:

   Comparing for relative order
   Scaled differences between two pointers
   Scaled sums of a pointer with an integer
   Casting to some other underlying type
   ...
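
The two lists above can be illustrated with a short C sketch.  The
node type and function names are hypothetical; the first function
uses only the four Pascal-style operations, the second uses the
extras that C adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type, used only for illustration. */
struct node { int value; struct node *next; };

/* The four Pascal-style operations: initialization by an allocator,
   assignment, comparison for equality, and dereferencing. */
int pascal_style_ops(void)
{
    struct node *p = malloc(sizeof *p);  /* set by an allocation mechanism */
    struct node *q;
    int result;

    assert(p != NULL);
    q = p;                               /* assignment, same underlying type */
    p->value = 42;                       /* dereference */
    p->next = NULL;
    result = (q == p) ? q->value : -1;   /* equality comparison */
    free(p);
    return result;
}

/* The additional operations some languages (here C) define. */
ptrdiff_t c_style_extras(void)
{
    int a[8] = {0};
    int *lo = a + 2;                            /* scaled sum with an integer */
    int *hi = a + 7;
    unsigned char *bytes = (unsigned char *)a;  /* cast to another type */

    assert(lo < hi);                            /* relative order */
    assert(bytes == (unsigned char *)&a[0]);
    return hi - lo;                             /* scaled difference */
}
```

Both functions stay inside a single allocated object or array, which
is the only place C defines the ordering and difference operations.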

> [...]                                     In particular, I'm
> curious about how you think passing a pointer (C-style) is different
> from passing an array.

I've posted this before too.  A C style pointer (because C has the
capability of scaled address addition) is similar (but not identical)
to the passing of a one dimensional array which is indexed by
integers.  But, the array differs by requiring its bounds to be
passed and by its ability to be multidimensional and by the fact
that it is _not_ aliased or overlapping with any other argument or
global variable without explicit _local_ declaration of the fact.

In fact, my definition of arrays differs from the Fortran definition
only in the fact that I allow array args to be aliased if explicitly
declared so, and Fortran _never_ allows them to be.  Also, I would
recommend that the array bounds be passed implicitly rather than
requiring the user to do it.
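
One way to picture "bounds passed implicitly" in C terms is a small
descriptor that carries the bounds with the data.  This is only a
sketch under that assumption; the struct and function names are
hypothetical, and a real language would build the descriptor itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: the bounds travel with the storage, so the
   callee never receives a bare address. */
struct array2d {
    double *data;   /* row-major storage, rows * cols elements */
    size_t rows;
    size_t cols;
};

/* Bounds-checked element access. */
double *at(struct array2d a, size_t i, size_t j)
{
    assert(i < a.rows && j < a.cols);
    return &a.data[i * a.cols + j];
}

/* Sum one row; the loop limit comes from the descriptor, not from an
   extra argument supplied by the user. */
double row_sum(struct array2d a, size_t i)
{
    double s = 0.0;
    for (size_t j = 0; j < a.cols; j++)
        s += *at(a, i, j);
    return s;
}
```

With the extents available in the callee, both the bounds check and
the loop limit are visible to the compiler at the point of use.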

> [...]
> ]   ...  Using pointers to simulate these data types
> ]   (sequences for example) deprives the compiler of information which
> ]   _could_ be used to improve the performance of the code.
> 
> I'd also like to know what information this is.  Presumably a complete
> definition of what you mean by "pointer" and "array" will make this
> obvious.

I gave an example of this before too.  Consider the Fortran loop:

      do 10 i=1,N
	 x(1,i) = x(2,i)
  10  continue

It is easy for the compiler to see that the second column is being
copied to the first and the source and destinations don't overlap.
So, the code can easily be vectorized, pipelined, or unrolled for
faster execution.  Even the IBM PC can benefit from this knowledge
(and issue a block move instruction).

The corresponding C code is:

      p = &x[0][0];
      q = p + N;
      for (i=0;i<N;i++) *p++ = *q++;

I've taken the obvious step of making array 'x' be row-wise to make
the C version the same as the Fortran.  Note that it's not as easy
for either the user or the compiler to tell that the move is safe
from dependencies.  It's possible: 'q' starts out 'N' ahead of 'p'
and the loop only takes 'N' steps.  But the test is clearly more
complicated than the array version required.
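
A compilable sketch of the two forms, assuming a hypothetical 2-by-N
array of doubles stored row-wise.  The pointer version takes the
start of the second row directly rather than computing it as 'p + N',
which keeps each pointer inside its own row:

```c
#include <assert.h>
#include <stddef.h>

#define N 4   /* illustrative size */

/* Array form: the subscripts show that the source row (1) and the
   destination row (0) are different parts of the same object. */
void copy_row_indexed(double x[2][N])
{
    assert(x != NULL);
    for (size_t i = 0; i < N; i++)
        x[0][i] = x[1][i];
}

/* Pointer form: the same copy, but the fact that the two ranges are
   disjoint has to be deduced from where p and q start and from the
   trip count of the loop. */
void copy_row_pointers(double x[2][N])
{
    double *p = &x[0][0];
    double *q = &x[1][0];

    assert(x != NULL);
    for (size_t i = 0; i < N; i++)
        *p++ = *q++;
}
```

Both functions perform the same N assignments; they differ only in
how much the source text tells the compiler about overlap.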

Suppose that you have to do the assignment 'against the grain' of
the array.  That is, the elements to be moved aren't the consecutive
ones.  The Fortran version is:

      do 10 i=1,M
	 x(i,1) = x(i,2)
   10 continue

The compiler can again _easily_ see that there are no dependencies.
The C version is:

      p = &x[0][0];
      q = p + 1;
      for (i=0;i<M;i++,p+=M,q+=M) *p = *q;

Here, the determination that there is no overlap requires the
compiler to see that the 'M'-parity of 'p' is always zero and
the 'M'-parity of 'q' is always one (both based on the base of
'x').  Few C compilers can even do the first example.  I don't
know any that can see this last.
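
The strided case can be sketched the same way.  The sizes and names
below are hypothetical, and the flat version recomputes the row
address each trip instead of stepping 'p' and 'q' past the end of
the storage:

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 3   /* illustrative sizes */
#define COLS 4

/* Array form: column 1 is copied to column 0, row by row. */
void copy_column_indexed(double x[ROWS][COLS])
{
    for (size_t i = 0; i < ROWS; i++)
        x[i][0] = x[i][1];
}

/* Flat form: every store goes to an offset that is 0 modulo stride and
   every load comes from an offset that is 1 modulo stride.  That
   modular argument is what a compiler would have to rediscover in
   order to prove that the loads and stores never collide. */
void copy_column_strided(double *base, size_t rows, size_t stride)
{
    assert(stride >= 2);
    for (size_t i = 0; i < rows; i++) {
        double *row = base + i * stride;
        row[0] = row[1];
    }
}
```

Both routines do the same work; only the second hides the column
structure behind address arithmetic.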

Note:  I picked Fortran as the counter-example but I could just as
easily have used Pascal, Modula, Ada, Algol, etc.  In all of these,
the compiler can take advantage of the information which is lost if
the operation is done using pointers.  Even in C, in those few contexts
where multidimensional arrays are actually recognized, it is better
to use the array notation so the compiler has more complete
information (if you can find a C compiler capable of making use
of such information).

J. Giles

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/01/90)

In article <4569@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> From article <27028@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> > You are still comparing apples and oranges.  You are comparing
> > "arrays, assumed not to be aliased" with "pointers, assumed to be
> > aliased".

> Oh, I'm sorry.  I have given this information before, but perhaps you
> missed it.  An array is a mapping from one or more bounded index sets
> to values of an underlying type.  Each array is assumed distinct from
> all others unless _explicitly_ declared otherwise.

A consequence of this definition is that array cross-sections (as in
PL/I, Algol 68, and Fortran Extended) are *NOT* arrays.  I don't
remember seeing anything in the Algol 68 definition that forbade
aliasing, so by this definition Algol 68 didn't have array parameters
at all.  I have never seen the PL/I standard, so although I was never
warned against aliasing in PL/I programs that may have been a fault of
the textbooks and vendors' manuals that I used (thankfully, it has been
some years since I last used PL/I).

In fact, it appears that only APL (which uses value semantics for all
parameters and assignments) Euclid, and Fortran have arrays (and with
nested arrays, it's not clear to me whether APL2 has them).
-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

jlg@lanl.gov (Jim Giles) (11/02/90)

From article <4181@goanna.cs.rmit.oz.au>, by ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe):
> In article <4569@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
>> [...]       An array is a mapping from one or more bounded index sets
>> to values of an underlying type.  Each array is assumed distinct from
>> all others unless _explicitly_ declared otherwise.
> 
> A consequence of this definition is that array cross-sections (as in
> PL/I, Algol 68, and Fortran Extended) are *NOT* arrays. [...]

Well, Fortran Extended array cross-sections certainly are not arrays.
If they are passed to subroutines, they can become arrays there.  The
reason they aren't arrays has nothing to do with aliasing though -
cross sections are not indexable (so they don't meet the first part of
my definition of being mappings from index sets to values).

Fortran Extended pointers are aliased arrays (which can be aliased to
anything in their scope that has the target attribute and is the same
underlying type).  But, this constitutes an explicit declaration
(though not explicit enough for my taste - I have opposed the new
standard) so they still fit into my definition (if only barely).

An array section passed to a subroutine must not be aliased to any
other argument or global variable that is visible in the scope of
the subroutine (unless explicitly declared as usual).  In this case,
the section is considered an array again - it can be indexed, etc.
Once again, this fits my definition.

Still, you make a good point.  The operations allowed on array
sections should be carefully considered.  The purpose of making the
definition of arrays as I did was to allow the compiler (with very
little assist from the loader) to produce efficient code for array
manipulation and be assured that the optimizations are safe from
aliasing.  If some operations on array sections are possible which
preserve this property but are in contradiction to my definition of
arrays, then I am willing to add array sections as a separate data
construct to my list and define those new operations on them.

J. Giles

gl8f@astsun8.astro.Virginia.EDU (Greg Lindahl) (11/02/90)

In article <27028@megaron.cs.arizona.edu> gudeman@cs.arizona.edu (David Gudeman) writes:
>In article  <4464@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>]
>]Ok.  First _without_ the 'aliased' attribute:
>]   ... arrays are a direct win over pointers (which is why
>]   Fortran is usually faster than C on array intensive code).  Of
>]   course, optimizing the arrays this way is unsafe unless the
>]   loader checks all passed arguments to enforce the constraint that
>]   distinct arrays are NOT aliased.
>
>You are still comparing apples and oranges.  You are comparing
>"arrays, assumed not to be aliased" with "pointers, assumed to be
>aliased".

Looks to me that he's trying to figure out the fastest way to do
array-intensive calculations in C and Fortran. In that case, the
problems with C pointers cause C compilers to miss optimizations. If
you aren't concerned about such a situation, it's not surprising that
you two are talking past each other.

gudeman@cs.arizona.edu (David Gudeman) (11/02/90)

In article  <4569@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]The 'soft' words (like 'legible') might be considered opinion, the
]statement that pointers provide no more power is a statement of _fact_.

Not until you define "power".

]... A C style pointer ... is similar (but not identical)
]to the passing of a one dimensional array which is indexed by
]integers.  But, the array differs by requiring its bounds to be
]passed and by its ability to be multidimensional and by the fact
]that it is _not_ aliased or overlapping with any other argument or
]global variable without explicit _local_ declaration of the fact.

Well then you are really making three distinct arguments and
confusing your readers by merging them into one, aided and abetted
by a specialized terminology that the net public in general do not
share.  Your arguments seem to be (I'll assume everyone agrees that
arrays are multi-dimensional):

(1) Arrays should be bounds checked.

(2) Aliasing of data structures should not be allowed unless made
explicit by declarations.

(3) Pointers should not have a dereferencing operation.

To take the last point first, you keep saying that you want to
eliminate pointers, but you also keep saying that you would replace
them by something that is semantically equivalent.  All you have done
is replaced the notion of a reference-valued object with a
co-reference, replaced pointer creation with a special assignment, and
removed the need for a dereference operation.  You haven't eliminated
pointers, you have just changed them to something similar.

As to the other points, you keep saying that "arrays have property X
and pointers don't", when what you really mean is that bounds-checked
arrays with guaranteed no-aliasing have property X and that raw
machine addresses don't have property X.  If you wouldn't keep
referring to raw machine addresses as pointers there would be a lot
less confusion.  There is no reason why pointers in some language
could not be defined to be bounded and unaliased.  If pointers were
defined that way, then pointers would have property X.  Property X
here mostly refers to your assertions about the greater optimizability
and safety of arrays over pointers.  (Your example depended on the
assumption that the C pointers were not bounded and that arbitrary
aliasing is possible.)
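
As a rough illustration of a pointer "defined to be bounded", here
is a sketch in C.  The type and function names are hypothetical and
do not come from any existing language or library:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounded pointer: it remembers the object it points into
   and refuses to move or dereference outside that object. */
struct bounded_ptr {
    int *base;    /* start of the underlying object */
    size_t len;   /* number of elements in the object */
    size_t pos;   /* current position, always <= len */
};

/* Advance by n elements; returns 1 on success, 0 if the move would
   leave the object (one past the end is allowed, as in C). */
int bp_advance(struct bounded_ptr *p, size_t n)
{
    assert(p->pos <= p->len);
    if (n > p->len - p->pos)
        return 0;
    p->pos += n;
    return 1;
}

/* Checked dereference; returns 1 and stores the value on success. */
int bp_load(struct bounded_ptr p, int *out)
{
    if (p.pos >= p.len)
        return 0;
    *out = p.base[p.pos];
    return 1;
}
```

Since every access is confined to [base, base + len), two such
pointers over disjoint objects can never touch the same element.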

To make the situation even more bemusing, your version of pointers
without dereferencing constitutes a direct counter-example to what you
claim is impossible for pointers.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

anw@maths.nott.ac.uk (Dr A. N. Walker) (11/03/90)

In article <4349@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

	[Quite a lot, but I think we've reached a stage where we can
say what the situation is, and then agree to differ.]

	Suppose we have a procedure with two arrays as parameters.  In
a call of the procedure, the actual parameters may be arbitrarily messily
derived from other parameters in the call chain, may be slices or other
subsets [not in all languages, of course] perhaps of the same array.
There is a spiffy optimisation available (say, a block move, or a clever
vectorisation) that sometimes doesn't work if the parameters (call them
A and B) are not sufficiently different.  Call the optimised version of
the code ALPHA, and the unoptimised version BETA.  In real life, though
not on the machines available to impoverished Maths departments, ALPHA
may run (say) 30 times faster than BETA.

	Then Fortran should compile the code somewhat like

		IF interfere (A, B) THEN undefined ELSE ALPHA

As ALPHA is a perfectly good value of "undefined", this is usually optimised
to just ALPHA.  We would all agree, I hope, that this is less than ideal,
and can give rise to difficult-to-uncover bugs.

	C should compile the code to something like

		IF interfere (A, B) THEN BETA ELSE ALPHA

In practice, to economise on code size, unless "interfere" can be determined
very easily at compile time to be FALSE, it will compile to BETA.  This
is clearly slower ("less efficient") than the Fortran.

	JLG would like us to declare whether or not A and B can interfere.
If we *do* so declare, then it compiles to BETA, otherwise the code is

		IF interfere (A, B) THEN ERROR ELSE ALPHA

where, we hope, "interfere" can usually, if not always, be decided
statically by the compiler and loader, giving another optimisation.
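
	The second and third schemes can be sketched in C as follows.
This is only an illustration with hypothetical names: ALPHA is taken
to be a block move (memcpy), BETA the plain element-by-element loop,
and "interfere" a run-time overlap test on the two address ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum outcome { RAN_ALPHA, RAN_BETA, REPORTED_ERROR };

/* Nonzero if the byte ranges [a, a+n) and [b, b+n) overlap.  The
   addresses are compared as integers so that the test is meaningful
   for pointers into different objects. */
int interfere(const void *a, const void *b, size_t n)
{
    uintptr_t ua = (uintptr_t)a, ub = (uintptr_t)b;
    return ua < ub ? ub - ua < n : ua - ub < n;
}

/* "IF interfere (A, B) THEN BETA ELSE ALPHA" */
enum outcome copy_checked(double *dst, const double *src, size_t count)
{
    size_t n = count * sizeof *dst;
    if (interfere(dst, src, n)) {
        for (size_t i = 0; i < count; i++)   /* BETA: the literal loop */
            dst[i] = src[i];
        return RAN_BETA;
    }
    memcpy(dst, src, n);                     /* ALPHA: block move */
    return RAN_ALPHA;
}

/* "IF interfere (A, B) THEN ERROR ELSE ALPHA" */
enum outcome copy_unaliased(double *dst, const double *src, size_t count)
{
    size_t n = count * sizeof *dst;
    if (interfere(dst, src, n))
        return REPORTED_ERROR;               /* undeclared aliasing */
    memcpy(dst, src, n);
    return RAN_ALPHA;
}
```

	The first (Fortran) scheme is copy_unaliased with the test
deleted, which is why an overlapping call there goes undetected.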

	JLG's solution is obviously better than the Fortran.  Further,
it satisfies the rule that optimisation must not break code that, prima
facie, works, so it is a "correct" solution.  Further, it is clearly no
slower than the C, and perhaps much faster.  He therefore has a tenable
case.

	The counter argument is that it puts the burden in the wrong
place.  JLG is asking the *programmer* to assert something that may
not be assertable.  Some of the procedures in the call chain may be
library procedures, or may involve global variables from other modules,
for example.  How do *I* know what aliassing may occur?  Further,
in languages which (very usefully) allow dynamic array slicing, the
presence or absence of aliassing may often only be determinable at
run time, so either I program "defensively" and declare the alias
(so having a program which always runs slowly), or I take pot luck
and have a program which may fail at run time.

	In fact, the *real* situation, is that if the compiler and
loader can determine whether or not aliassing can occur, then the
"alias" declaration is unnecessary, and if they can't, then a run-time
check is needed anyway.  Hints to compilers, if thought desirable,
could be given in all sorts of other ways.

>I don't know any natural ways of using pointers.

	Well, we've been round this before.  *I* find them useful;
why do you want me to change my natural images and pictures of what
my programs do?  No-one is forcing *you* to use them.

	If you insist on thinking of pointers merely as ways of
storing machine addresses in a half-hearted attempt to avoid the
(assumed) overheads of array indexing, then of course you find
them unnatural.  On the other hand, if you have (say) a graph,
and at some point in your program you discover a node with an
interesting property, how *do* you naturally refer later to that
same node *other than* by explicitly or implicitly keeping a
pointer to that node?

>	   Linked lists, graphs, trees, etc. should all be implemented
>as recursive data structures (aliased or not as the need arises).

	Even if you implement the graph itself as a recursive structure
(implied:  without pointers [and I am by no means convinced that that
is desirable, even if it is possible]), you may still find yourself
wanting to keep fingers on particular nodes.  A finger by any other
name is still a pointer.
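
	To make the "finger" concrete, here is a small C sketch with
hypothetical names.  The node of interest is remembered once as an
index into the node table and once as a pointer; either way the
program is holding a reference to the same node.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8   /* illustrative capacity */

struct node { int weight; int next; };   /* next: successor index or -1 */
struct graph { struct node nodes[MAX_NODES]; int count; };

/* Finger as an index: returns the position of the heaviest node,
   or -1 if the graph is empty. */
int find_heaviest_idx(const struct graph *g)
{
    int best = -1;
    assert(g->count >= 0 && g->count <= MAX_NODES);
    for (int i = 0; i < g->count; i++)
        if (best < 0 || g->nodes[i].weight > g->nodes[best].weight)
            best = i;
    return best;
}

/* Finger as a pointer: the same node, held by address. */
struct node *find_heaviest_ptr(struct graph *g)
{
    int i = find_heaviest_idx(g);
    return i < 0 ? NULL : &g->nodes[i];
}
```

	A change made through the pointer is visible through the index
and vice versa, so the two fingers are aliases of one another.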

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

jlg@lanl.gov (Jim Giles) (11/03/90)

From article <27094@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> [...]
> Well then you are really making three distinct arguments and
> confusing your readers by merging them into one, aided and abetted
> by a specialized terminology that the net public in general do not
> share.  Your arguments seem to be (I'll assume everyone agrees that
> arrays are multi-dimensional):

I'm glad you assume that arrays can be multidimensional.  Before the
advent of C, this was accepted by nearly everyone as one of the possible
properties of arrays.  This is not (or, at least, _should_ not be)
"specialized terminology that the net public in general do not share."

> [...]
> (1) Arrays should be bounds checked.

At least as an option, yes.  Further, the compiler should be allowed
(and able) to make use of its knowledge of the bounds to generate more
efficient code (especially for statically allocated arrays where the
bounds are fixed).  Again, before C, this was a universal assumption.
Are you telling me that this property of arrays is one that net readers
are not aware of?

> [...]
> (2) Aliasing of data structures should not be allowed unless made
> explicit by declarations.

That is the only new point that I have introduced.  And, as such,
it has been the one that I've taken most pains to point out.
Nevertheless, this has _always_ been true of arrays in most
programming languages (other than C and its relatives).  So, with
respect to _arrays_, this should still be a fairly widely
acknowledged property - even if not universal because of C.

>  [...]
> (3) Pointers should not have a dereferencing operation.

This is only a minor point which makes the syntax of using recursive
data structures more legible.  In fact, the only objections I have
to pointers a`la Pascal are the syntax and the lack of a way to
inform the compiler when two pointers of the same type are _not_
aliased.  So, aside from those two changes, my "aliased" attribute
is identical to Pascal style pointers.  This has little to do with
the `C style pointer vs. array' debate.

> [...]                                          You haven't eliminated
> pointers, you have just changed them to something similar.

When comparing to Pascal style pointers, this is indeed what I have
done.  It is also what I've said all along that I've done.  That is
what I mean when I say that something is semantically equivalent:
it has the same meaning with a possibly different syntax.

> [...]            There is no reason why pointers in some language
> could not be defined to be bounded and unaliased.  If pointers were
> defined that way, then pointers would have property X. [...]

In which case, why call them pointers?  Surely to call such things
"pointers" would be to use "specialized terminology that the net
public in general do not share." In fact, the terminology that the
net public share is C and/or possibly Fortran and Pascal.  Most
other languages seem to be relegated to the category of esoterica.
Certainly, nothing outside those bounds can be regarded as shared.
So, in a net discussion, pointers are those things in C or perhaps
limited to those things in Pascal.

> [...]
> To make the situation even more bemusing, your version of pointers
> without dereferencing constitutes a direct counter-example to what you
> claim is impossible for pointers.

On the contrary.  If you'll look more carefully at my past articles on
this subject, you will find that the objects that I call "aliased"
variables have all the bad properties that I ascribe to pointers.
They are more strictly bounded, so that the damage is limited to those
operations involving variables in the same "aliased" declaration.
But, the damage is there.  This more strict limitation on aliasing is
one of the two differences between my proposal and Pascal pointers:
this is the only semantic difference.  Again, this has little
relevance to the `C-pointer vs. arrays' debate.

J. Giles

jlg@lanl.gov (Jim Giles) (11/03/90)

From article <1990Nov2.183359.6761@maths.nott.ac.uk>, by anw@maths.nott.ac.uk (Dr A. N. Walker):
> [...stuff that shows that _somebody_ finally understands what I'm saying...]
>
> 	The counter argument is that it puts the burden in the wrong
> place.  JLG is asking the *programmer* to assert something that may
> not be assertable.  Some of the procedures in the call chain may be
> library procedures, or may involve global variables from other modules,
> for example.  How do *I* know what aliassing may occur?  [...]

Exactly the problem I kept trying to point out to someone via private
email (and the reason I don't deal with that person over email any
more - he ignored the point and kept claiming that the compiler by
itself could do all).  It is for this reason that the loader is
required to do some work.  It _can_ do the call tree analysis (if the
compiler passes along all the information).  Any errors the loader
finds in this search correspond to aliasing that you didn't declare
(or possibly, aliasing that you _did_ declare unnecessarily).  This is
how you learn what aliasing may occur.  I maintain loaders in large
machine environments - so I'm reasonably sure that the loader is up to
the task provided the allowed aliasing is specified clearly enough.

> [...]                                                    Further,
> in languages which (very usefully) allow dynamic array slicing, the
> presence or absence of aliassing may often only be determinable at
> run time, so either I program "defensively" and declare the alias
> (so having a program which always runs slowly), or I take pot luck
> and have a program which may fail at run time.

Yes, this is a problem.  This is one reason that I insist that complete
information about arrays be retained by the parameter passing
mechanism.  The compiler _can_ analyse _some_ of these occurrences
and give messages if you guessed wrong.  To be sure, there are some
cases where the compiler can't decide - and you can't either - so
you have to program defensively in those cases.  Fortunately, such
cases, while common, are not in the majority - the method I suggest
will allow significant and useful control for the programmer in
most cases.

> [...]
> 	In fact, the *real* situation, is that if the compiler and
> loader can determine whether or not aliassing can occur, then the
> "alias" declaration is unnecessary, [...]

Well, true only if the loader is able to do code generation or the
compiler is able to do interprocedural analysis.  In existing
implementations, the information found by the loader is too late
to affect what code the compiler generates.  This is why the
declarations at compile time are necessary - the loader only
checks to make sure that your intended constraints are actually
met.  (Unfortunately, if they are not met, it is indeed the
user's problem to correct.  But the loader can at least give
full particulars about what didn't match up.)

> [...]                               and if they can't, then a run-time
> check is needed anyway.  [...]

Or, defensive programming (and accepting the loss of efficiency)
is also possible here.  I don't know which to prefer.  The run-time
check is only required for those cases which the compiler/loader
combination couldn't fathom.  This may happen in a lot of important
cases, but this still leaves most cases working efficiently with
the scheme I propose.

> [...]                    Hints to compilers, if thought desirable,
> could be given in all sorts of other ways.

Yes, but this is the way I've chosen.  You can disagree with the syntax
or argue that it fails in some difficult to fathom case (like passing
the red and black squares of a chessboard as separate arguments), but
I will still claim that it's useful in many other important cases.
If you have an alternate mechanism to suggest, please feel free!
I'd like to see other people's ideas.

> [...]            On the other hand, if you have (say) a graph,
> and at some point in your program you discover a node with an
> interesting property, how *do* you naturally refer later to that
> same node *other than* by explicitly or implicitly keeping a
> pointer to that node?

By keeping an aliased variable of the same recursive type which
references that node.  Note, as I've repeatedly said, "aliased"
variables provide the _same_ functionality as Pascal pointers.
In fact, the compiler _should_ generate the same code (if not
better, since my suggestion limits aliasing more strictly).

> [...]                                         A finger by any other
> name is still a pointer.

Well, I will continue to stick to the definition of pointers that
I've posted.  Since my "aliased" variables are more restricted I
will continue not to call them pointers.  But, they allow the user
to implement the same structures (in the same way: if he aliases
everything) as Pascal pointers do.  For recursive data structures,
"aliased" variables work the same way Pascal pointers do, but the
class of objects each variable may be aliased to is much more
severely limited - that's the whole difference apart from minor
syntactic changes.

As for a finger being a pointer: only when I point with it.

J. Giles

peter@ficc.ferranti.com (Peter da Silva) (11/04/90)

> > (2) Aliasing of data structures should not be allowed unless made
> > explicit by declarations.

> Nevertheless, this has _always_ been true of arrays in most
> programming languages (other than C and its relatives).

And in practice this has pretty much never been enforced. In practice all
function and subroutine parameters in Fortran have all the characteristics
of pointers.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <3656@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> But the context of the discussion makes it
> clear that the person who posted TeX as an example of pointer efficiency
> was of the opinion that pointers and arrays differed importantly.

That is correct. In Q there is no relation between pointers and arrays,
except that a standard pointer abbreviation p[i] happens to look like
the array indexing builtin a[i].

> In
> fact, in the same news article that he proposed TeX, he also gave
> examples which attempted to prove the supposed superiority of pointers
> to arrays.

Close enough. I will return to this point several articles from now.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> If pointer arithmetic across segment boundaries
> is not reliable, then pointers can't be reliably used to implement
> the memory manager (or, at least, not without considerable extra
> difficulty).

This statement is unjustified. It is, in fact, false. I am told that at
least one C IBM PC memory manager keeps track of memory separately in
each segment.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <4349@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> It is because of this problem of inheriting data through several levels
> of the call chain that the test in the loader is most important.  The
> loader _can_ perform this test reliably and quickly.

It is inappropriate to give the loader sufficient knowledge of your
language to perform these tests. It is also rather stupid to delay
checks until load time, since some packages (such as libraries) may not
be linked until much later.

The .h-.c (package spec-package body) model is sufficient to detect
procedure call interface errors at compile time.

> I don't know any natural ways of using pointers.  Pointers are one of
> the most unnatural data structuring tools that I've ever encountered.

Knuth says that all data structures in the real world seem to be quite
appropriately modelled as a set of structures, some fields of which
contain pointers to other structures. There are many, many, many people
who agree with him that pointers are a natural way to express data
structures. I am not saying that you're wrong, but your statement is not
objective.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <4837@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> > [...]            There is no reason why pointers in some language
> > could not be defined to be bounded and unaliased.  If pointers were
> > defined that way, then pointers would have property X. [...]
> In which case, why call them pointers?  Surely to call such things
> "pointers" would be to use "specialized terminology that the net
> public in general do not share."

But many of us agree in substance with your definition of pointers as
objects to which you assign the ``address'' (whatever that is) of
another object, and which you ``dereference'' to get the value of or
store into that object. That's the essence of a pointer, and adding
assertions to pointers doesn't take away that essence.

(By the way, I don't know anyone who agrees with you that addresses in
the above definition need to have anything to do with machine addresses.
But that's a minor quibble.)

---Dan

jlg@lanl.gov (Jim Giles) (11/07/90)

From article <1845:Nov607:02:2390@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> [... pointers not reliable across segments ...]
> 
> This statement is unjustified. It is, in fact, false. I am told that at
> least one C IBM PC memory manager keeps track of memory separately in
> each segment.

Exactly my point: pointers are presumably being used _within_
segments, but they are not adequate for writing the _whole_ memory
manager - something else must be in use to track the segments.

However, you are missing my point here.  In this one particular case
I actually _support_ the use of pointers (or some form of unbounded
address arithmetic).  I oppose the above-mentioned ANSI C constraint
on pointers because the ability to do unbounded address calculations
was the only use I thought they were fit for.

J. Giles

jlg@lanl.gov (Jim Giles) (11/07/90)

From article <2047:Nov607:10:1690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <4349@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> [...]
> The .h-.c (package spec-package body) model is sufficient to detect
> procedure call interface errors at compile time.

Not if the information in the .h file _doesn't_match_ the definition
of the corresponding routines in the .c file.  This is the kind of
check that I have _always_ recommended for the loader to do.  The
loader need know _nothing_ about the language to perform these
tests, since it only makes sure that they match.  The compiler must
provide sufficient information to tell the loader which constraints
must be promoted across the call-tree - but this information can
exist in a language independent form.
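
A minimal sketch, in C, of the kind of link-time check described here. It assumes the compiler writes a signature string next to each external symbol; the `symbol_record` type, the signature encoding, and `link_check` are hypothetical names chosen for illustration, not any real loader's format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record a compiler could emit next to each external symbol. */
struct symbol_record {
    const char *name;       /* external symbol name */
    const char *signature;  /* encoded interface, e.g. "f64(f64,i32)" (assumed encoding) */
};

/* Return 1 if every reference has a definition with an identical signature,
   0 if a symbol is undefined or its signatures differ.  The loader only
   compares strings; it never interprets what they mean in any language. */
int link_check(const struct symbol_record *refs, size_t nrefs,
               const struct symbol_record *defs, size_t ndefs)
{
    assert(refs != NULL && defs != NULL);
    for (size_t i = 0; i < nrefs; i++) {
        int ok = 0;
        for (size_t j = 0; j < ndefs; j++) {
            if (strcmp(refs[i].name, defs[j].name) == 0) {
                ok = (strcmp(refs[i].signature, defs[j].signature) == 0);
                break;
            }
        }
        if (!ok)
            return 0;   /* undefined symbol or interface mismatch */
    }
    return 1;
}
```

In this sketch a caller compiled against a stale header would carry a different signature string from the one in the library's object file, and the comparison would fail before the program ever ran.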

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/07/90)

In article <5074@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <1845:Nov607:02:2390@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > In article <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> >> [... pointers not reliable across segments ...]

What you actually said was that a memory manager cannot use pointers
reliably. We all agree that pointers cannot be used portably outside of
the arrays that they point to.

> > This statement is unjustified. It is, in fact, false. I am told that at
> > least one C IBM PC memory manager keeps track of memory separately in
> > each segment.
> Exactly my point: pointers are presumably being used _within_
> segments, but they are not adequate for writing the _whole_ memory
> manager - something else must be in use to track the segments.

Huh? Pointers are used to track the segments. It's just that you never
try to use one pointer across segments. This sort of bi-level allocation
is actually rather easy. Pointers are perfectly adequate for writing the
_whole_ memory manager.
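
A rough sketch of what such a bi-level allocator might look like in C, assuming a fixed table of segments and a simple bump allocator inside each one. The names `segment`, `pool_init`, and `pool_alloc` and the sizes are illustrative assumptions, not taken from any actual PC memory manager.

```c
#include <assert.h>
#include <stddef.h>

#define SEGMENT_SIZE 1024   /* illustrative; stands in for a 64K segment */
#define NUM_SEGMENTS 4

/* Lower level: each segment owns its storage and its own bump pointer. */
struct segment {
    unsigned char storage[SEGMENT_SIZE];
    unsigned char *next_free;   /* points into storage, or one past its end */
};

static struct segment segments[NUM_SEGMENTS];

void pool_init(void)
{
    for (int i = 0; i < NUM_SEGMENTS; i++)
        segments[i].next_free = segments[i].storage;
}

/* Upper level: walk the segment table with a pointer and hand the request
   to the first segment with room.  No pointer is ever moved from one
   segment's storage into another's.  Alignment is ignored in this sketch. */
void *pool_alloc(size_t size)
{
    assert(size > 0);
    for (struct segment *s = segments; s != segments + NUM_SEGMENTS; s++) {
        size_t used = (size_t)(s->next_free - s->storage);
        if (size <= SEGMENT_SIZE - used) {
            void *p = s->next_free;
            s->next_free += size;
            return p;
        }
    }
    return NULL;   /* no single segment can satisfy the request */
}
```

The only pointer arithmetic here stays inside one array at a time: `s` moves within `segments`, and `next_free` moves within one segment's `storage`.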

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/07/90)

In article <5077@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <2047:Nov607:10:1690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > In article <4349@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> > [...]
> > The .h-.c (package spec-package body) model is sufficient to detect
> > procedure call interface errors at compile time.
> Not if the information in the .h file _doesn't_match_ the definition
> of the corresponding routines in the .c file.

In which case the error will be detected at *compile time*, when the
compiler gets around to that .c file. I stand by my statement.
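
A small illustration of the point, assuming the .c file includes its own .h file so that the prototype and the definition meet in one translation unit. The function `scale` and the file names in the comments are hypothetical.

```c
#include <assert.h>

/* --- contents of a hypothetical scale.h --- */
double scale(double value, int factor);

/* --- contents of a hypothetical scale.c, which would #include "scale.h" --- */
double scale(double value, int factor)
{
    return value * factor;
}

/* A caller that sees the same prototype gets its arguments checked too. */
void scale_selfcheck(void)
{
    assert(scale(2.0, 3) == 6.0);
}

/* If the definition were changed to
       double scale(double value, double factor)
   while the prototype above stayed as it is, a conforming compiler would
   be required to diagnose the conflicting declarations in this file. */
```

The check only happens when the definition is compiled with the header in view; it says nothing about an object file built from some other version of the source.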

> The compiler must
> provide sufficient information to tell the loader which constraints
> must be promoted across the call-tree - but this information can
> exist in a language independent form.

I suppose you could have the loader accept arbitrary strings from the
compiler describing these constraints, but then it's not going to
generate intelligent error messages. Have you ever used AT&T's C++?
This isn't a very good solution. The compiler can do a much better job.

---Dan

gudeman@cs.arizona.edu (David Gudeman) (11/07/90)

In article <5077@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
] The compiler must
]provide sufficient information to tell the loader which constraints
]must be promoted across the call-tree - but this information can
]exist in a language independent form.

Sounds like a fun new software tool -- a conditional linker.  I wish I
had time to work on it.  I visualize a linker with control constructs
roughly similar to the C preprocessor.  One necessary extension, of
course, would be an error producer:

#error "Declaration of function %s does not match definition" f1

I expect that it would also need some extra things in the expression
syntax for type checking and such, for example

#if type(X) = type(Y)

where X and Y are some sort of (language-independent) type expression.

It could be used to conditionally compile code depending on things that
can't be determined until link time, like

#if guaranteed_not_aliased(A,B)
/* highly optimized code that assumes no aliasing */
#else
/* less optimized code to do the same thing for the case where A and B
might be aliased */
#endif

Furthermore, this linker could be downward compatible with linkers
that don't have conditional statements, so that you could install the
new linker without changing the existing compilers on your system.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

jlg@lanl.gov (Jim Giles) (11/07/90)

From article <7298:Nov620:50:5990@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> Huh? Pointers are used to track the segments. It's just that you never
> try to use one pointer across segments. This sort of bi-level allocation
> is actually rather easy. Pointers are perfectly adequate for writing the
> _whole_ memory manager.

Well, not on my PC at any rate.  Segments have a maximum size of
64K.  The memory is 640K.  Clearly, some arithmetic on numbers bigger
than 16 bits is required to keep track of free space.  C pointers
can't do that.  But _something_ must be doing it.  It can't be pointers,
though; all of them use 16-bit arithmetic.
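
For reference, the arithmetic in question can be sketched in C, assuming 8086 real-mode addressing, where a linear address is the segment value times 16 plus a 16-bit offset. The function names and the `CONVENTIONAL_LIMIT` constant are illustrative assumptions; the sketch shows only the size of the numbers involved, not how any particular memory manager works.

```c
#include <assert.h>
#include <stdint.h>

#define CONVENTIONAL_LIMIT 0xA0000UL   /* 640K = 655360 bytes */

/* 8086 real mode: linear address = segment * 16 + offset (a 20-bit value). */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Advancing a 16-bit offset wraps modulo 64K; the segment is untouched. */
uint16_t near_advance(uint16_t offset, uint16_t bytes)
{
    return (uint16_t)(offset + bytes);
}

/* Bytes between a segment:offset position and the 640K limit.
   The result can exceed 65535, so it needs more than 16 bits. */
uint32_t bytes_below_limit(uint16_t segment, uint16_t offset)
{
    uint32_t addr = linear_address(segment, offset);
    assert(addr <= CONVENTIONAL_LIMIT);
    return (uint32_t)CONVENTIONAL_LIMIT - addr;
}
```

With these definitions, `near_advance(0xFFF0, 0x0020)` wraps around to `0x0010`, while `bytes_below_limit(0, 0)` is 655360, a value no 16-bit quantity can hold.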

J. Giles

jlg@lanl.gov (Jim Giles) (11/07/90)

From article <7456:Nov620:54:4090@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <5077@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> [...]
>> Not if the information in the .h file _doesn't_match_ the definition
>> of the corresponding routines in the .c file.
> 
> In which case the error will be detected at *compile time*, when the
> compiler gets around to that .c file. I stand by my statement.

Oh, now you've eliminated separate compilation.  In my experience the
main code and the libraries it calls are seldom available in source
on the same machine.  They are often compiled by separate companies,
possibly in separate countries.  Mistakes happen: the .h file may not
match the code which generated the .o file (you haven't got the .c file).
I stand by my statement.

J. Giles