[comp.lang.misc] Answers, Chapter 1: TeX

jlg@lanl.gov (Jim Giles) (10/20/90)

> [... All quotes from Dan Bernstein (who else) ...]
> QUESTION 1: Do you see the contradiction in ``I absolutely refuse to
> look at the example you're giving me!'' and ``I am not turning away from
> anything!''?

Had I actually made the first statement, I would have been in
contradiction.  I did not.  I said (and still say) that I will
consider _any_ example that you care to post.  You have refused to
post the example you keep refering to.  However, I _have_ finally
looked over the source of the code you refer to. See below.

>> If there's something in TeX that can't be
>> done with recursive data structures (and the other features I've supported
>> on the net so visibly), then you ought to be able to give a specific
>> example - in C if you like.
>
> Packed array tries are not trivial to implement. I don't see why I
> should bother to type in a presentation just for you, when Knuth has
> already done a much better job.

Well don't bother then.  As it happens, someone _DID_ email the source
code for TeX 'packed array tries' over the weekend.  The news is even
worse for your argument that I expected.  There are _NO_ pointers in
the source code for TeX 'packed array tries'.  The code does everything
through the use of Pascal arrays.  Since arrays are one of the features
that I've "supported on the net so visibly" that I mentioned above, the
TeX code doesn't even meet the criterion.

> QUESTION 2: Why do you refuse to look up packed array tries? Perhaps
> because you're afraid that I'm right?

No, but if I were as cynical as you, I would suspect the reason that
you didn't post the example was that you knew the actual code didn't
support your position.  I am _not_ so cynical.  I'm sure you read the
_description_ of the code (which, no doubt pretended that explicit
pointers were used throughout).  As such, I accept that you argued the
question in good faith.

> QUESTION 3: Can you read? Do you see the word ``packed''? Does it even
> occur to you that there's a difference between a packed array trie and a
> trie (by which people usually mean a list trie or a normal array trie)?

It's not clear to me what kind of packing that pointers allow that
isn't allowed by some combination of the features that I've mentioned.
The example of TeX makes it quite clear that at least one example you
thought important _can_ be done with arrays quite easily.

End of chapter 1.

J. Giles

djones@megatest.UUCP (Dave Jones) (10/23/90)

From article <66253@lanl.gov>, by jlg@lanl.gov (Jim Giles):
> ... There are _NO_ pointers in
> the source code for TeX 'packed array tries'.  The code does everything
> through the use of Pascal arrays.

For theoretical considerations, an index into an array is not much
different from a pointer is it? Range-checking is the only significant
difference I can think of, and that only makes incorrect programs
fail nicer. (Pointers also have some measure of range-checking,
"segmentation violation" for example. Never happened to *me* of
course, but I've heard rumors.)

jlg@lanl.gov (Jim Giles) (10/24/90)

In article <14269@goofy.megatest.UUCP>, djones@megatest.UUCP (Dave Jones) writes:
> From article <66253@lanl.gov>, by jlg@lanl.gov (Jim Giles):
> > ... There are _NO_ pointers in
> > the source code for TeX 'packed array tries'.  The code does everything
> > through the use of Pascal arrays.
> 
> For theoretical considerations, an index into an array is not much
> different from a pointer is it? Range-checking is the only significant
> difference I can think of, and that only makes incorrect programs
> fail nicer. (Pointers also have some measure of range-checking,
> "segmentation violation" for example. Never happened to *me* of
> course, but I've heard rumors.)

For thetheoretical considerations, pointers and one-dimensional arrays
are indeed very similar.  But the context of the discussion makes it
clear that the person who posted TeX as an example of pointer efficiency
was of the opinion that pointers and arrays differed importantly.  In
fact, in the same news article that he proposed TeX, he also gave
examples which attempted to prove the supposed superiority of pointers
to arrays.

Of course, pointers and one dimensional arrays - in spite of their
similarity - are indeed different.  For one thing: arrays are bounded.
For another: distinct arrays are generally not aliased.  These two
properties make debugging and maintaining array codes somewhat easier
than pointer code.  Similarly, these two differences (particularly the
no-alias information) is useful to the compiler in generating more
efficient code.  After all, all the common optimization techniques
that compilers use are inhibited to one extent or other by the
possible presence of aliasing.

J. Giles

gudeman@cs.arizona.edu (David Gudeman) (10/24/90)

-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

gudeman@cs.arizona.edu (David Gudeman) (10/24/90)

In article  <3656@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

]Of course, pointers and one dimensional arrays - in spite of their
]similarity - are indeed different.  For one thing: arrays are bounded.

You know, Jim, it doesn't help your argument any when you keep
bringing up the same points long after they have been discredited.
Pointers are bounded just like array indexes.  Yes, most
implementations of C don't check the bounds, but that is because in
the definition of the C language, run-time errors are generally
defined to produced undefined results.  You could easily change the
implementation and the language specification to produce error
messages on out-of-bounds references.  This is not a difference
between pointers and arrays, it is a difference between language
philosophies.  No doubt if C had true arrays, an out-of-bounds index
would produce undefined results just like out-of-bounds pointers.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

jlg@lanl.gov (Jim Giles) (10/24/90)

From article <26726@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> In article  <3656@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> 
> ]Of course, pointers and one dimensional arrays - in spite of their
> ]similarity - are indeed different.  For one thing: arrays are bounded.
> 
> You know, Jim, it doesn't help your argument any when you keep
> bringing up the same points long after they have been discredited.
> Pointers are bounded just like array indexes.  [...]

Really?  I don't know _any_ language that has pointers that places
bounds on where it can point.  In C, if I have an 'int' var for
example, unless the int has the register attribute (in which case
it's _supposedly_ not in memory anyway :-), I can have _any_ (int *)
variable point to that 'int'.  No matter where in memory an object
is, a pointer to that type of object can point there.

> [...]                                          Yes, most
> implementations of C don't check the bounds, but that is because in
> the definition of the C language, run-time errors are generally
> defined to produced undefined results.  [...]

What you are talking about here is the ANSI C (which is _very_ new)
constraint on the validity of pointer _arithmetic_.  This constraint
merely says that comparing (or subtracting) pointers that currently
point to within separately allocated objects is undefined.  The same
status occurs if you add (or subtract) integers to (or from) pointers
so that the result leaves the bounds of a single allocated object.

These constraints are present to allow certain lazy implementations
on segmented archetectures to do all pointer arithmetic without
refering to the segment component of the addresses.  The pointer
itself can still be _assigned_ to point anywhere in memory - only
pointer arithmetic is effected by the constraint you mention.

By the way, I opposed this ANSI specification.  One of the _few_
things I think pointers are at all good for is implementing the
memory manager.  If pointer arithmetic across segment boundaries
is not reliable, then pointers can't be reliably used to implement
the memory manager (or, at least, not without considerable extra
difficulty).

J. Giles

gudeman@cs.arizona.edu (David Gudeman) (10/24/90)

In article  <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
][I argue that pointers are bounded just like arrays]
]Really?  I don't know _any_ language that has pointers that places
]bounds on where it can point.

We are comparing pointers to indexes here.  The only time these two
are comparable is when the pointer is one that was calculated with
integer addition on a pointer to an array.  An out-of-bounds pointer
--in correspondence to an out-of-bounds index-- is a pointer whose
value is not in the same array as the starting pointer.

]What you are talking about here is the ANSI C (which is _very_ new)
]constraint on the validity of pointer _arithmetic_.  This constraint
]merely says that comparing (or subtracting) pointers that currently
]point to within separately allocated objects is undefined.

No, I'm talking about C in general.  And of course I'm talking about
pointer arithmetic, otherwise there is no correspondence between
indexes and pointers.  I don't believe any definition of C has ever
defined the effect of dereferencing a pointer that was miscalculated
so that it points out of the region the starting array.  However, C
_could_ define this as producing an error, and then pointer bounds
checking would be just as safe and reliable as array index bounds
checking.

This is all starting to sound very familiar...
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

kers@hplb.hpl.hp.com (Chris Dollin) (10/24/90)

David Gudeman writes:

   In article  <3656@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

   ]Of course, pointers and one dimensional arrays - in spite of their
   ]similarity - are indeed different.  For one thing: arrays are bounded.

   You know, Jim, it doesn't help your argument any when you keep
   bringing up the same points long after they have been discredited.
   Pointers are bounded just like array indexes.  Yes, most
   implementations of C don't check the bounds, but that is because in
   the definition of the C language, run-time errors are generally
   [rest omitted]

That's not what Jim means by "bounded" (if I understand him correctly) - his
point is that given two *arrays* A, B then it is *known* that A and B have no
locations (ie, updatable bits of store) in common, so assignments within A 
cannot affect the values accessible from B.

This is in contrast to two pointers P, Q (of the same type); an assignment
through P (*P = 42, or P[42] = MAX_INT) might well change values accessible
through Q.

Jim wants ALIAS declarations for the cases where variables can overlap.
Presumably in the case of a procedure with a header something like

    void example( T A[Bound], T B{Bound] ) ...

where A and B are not to be aliased (so the comoiler can do its wizzy 
optimisations), all calls have a proof obligation to show that the actual 
arguments are *in fact* not aliased.

Is that right, Jim? [I'm not sure I agree with Jim. But I'd rather disagree
about the same thing that about two different ones, which is what I think David
is in danger of doing.]

--

Regards, Kers.      | "You're better off  not dreaming of  the things to come;
Caravan:            | Dreams  are always ending  far too soon."

mhcoffin@watmsg.uwaterloo.ca (Michael Coffin) (10/24/90)

In article <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> ...
>What you are talking about here is the ANSI C (which is _very_ new)
>constraint on the validity of pointer _arithmetic_.  This constraint
>merely says that comparing (or subtracting) pointers that currently
>point to within separately allocated objects is undefined.  The same
>status occurs if you add (or subtract) integers to (or from) pointers
>so that the result leaves the bounds of a single allocated object.

These restrictions are not new.  I quote from page 98 of "The C
Programming Language" by Kernighan and Ritchie, 1978:

    "Any pointer can be meaningfully compared for equality or
    inequality with NULL.  But all bets are off if you do arithmetic
    or comparisons with pointers pointing to different arrays.  If
    you're lucky, you'll get obvious nonsense on all machines.  If
    you're unlucky, your code will owrk on one machine but collapse
    mysteriously on another."

---
Michael Coffin				mhcoffin@watmsg.waterloo.edu
Dept. of Computer Science		office: (519) 885-1211
University of Waterloo			home:   (519) 725-5516
Waterloo, Ontario, Canada N2L 3G1

jlg@lanl.gov (Jim Giles) (10/25/90)

From article <26740@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> In article  <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> ][I argue that pointers are bounded just like arrays]
> ]Really?  I don't know _any_ language that has pointers that places
> ]bounds on where it can point.
> 
> We are comparing pointers to indexes here.  [...]

_ONE_ of the things that we are comparing pointers to is indexing.
That's one of the problems with pointers (at least, pointers a`la C)
is that many _different_ functionalities are all rolled up into
the pointer mechanism.

So, let's get a definition straight here.  When I say bounded (in the
context of a pointer discussion), I mean that the address of an object
is limited by its declaration (or, at least, its initialization - which
may be dynamic) to a specific memory location.  This means that an
inspection of the declaration and of the present location is sufficient
to determine whether the present reference is "in bounds".

With pointers, you can't make such a simple determination.  No matter
what memory location it references, it _may_ be a legal and intended
one.  This makes debugging pointers incrementally harder - especially
in large codes.  The compiler _can't_ make any simple automatic bounds
checks (although, in C it can perhaps automatically limit index
calculations).  Even the user with an interactive debugger may be
hard pressed to determine whether the pointer is correct or not.

Arrays (even dynamic ones) are always allocated by the compiler or
a compiler generated allocator (at least, according to the language
model I've been presenting) - so you _know_ that they are located
distinctly from other objects with sufficient space for their declared
length (unless the compiler is broken - in which case all bets are off
anyway).

Boundedness and aliasing are _slightly_ different concepts.  Clearly
something that is unbounded is potentially aliased to everything (or,
everything of the same underlying type).  Something that's bounded
_may_ still be aliased: this may even be desireable.

I recommend two mechanisms to implement boundedness: 1) all objects
are distinct; 2) operations allowing aliasing between distinct variables
must be explicitly requested when these variables are declared.  This
means that even those things which _are_ allowed to be aliased are
bounded to be aliased only to others of a small "family" of variables.
Variables _not_ in this "family" are _known_ not to be aliased to
anything within it (in fact, the constraint can be enforced by the
compiler and loader alone - no run-time support is needed - whether
run-time checks might be more efficient is a different question).

The bottom line is that I can think of some applications for bounded
aliasing, and certainly bounded non-aliased variables are very useful,
but I can't think of any application except the low-level memory manager
which needs unbounded memory access.

J. Giles

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (10/26/90)

A pointer is not an address.  A pointer is a way of finding an
address.

Just thought I'd mention it.
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi

gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) (10/28/90)

In article <26726@megaron.cs.arizona.edu> gudeman@cs.arizona.edu (David Gudeman) writes:
>In article  <3656@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>
>]Of course, pointers and one dimensional arrays - in spite of their
>]similarity - are indeed different.  For one thing: arrays are bounded.
>
>You know, Jim, it doesn't help your argument any when you keep
>bringing up the same points long after they have been discredited.
>Pointers are bounded just like array indexes.

Actually, it just shows that we're talking past each other most of the
time. Here's a practical example where the boundedness of array
references allows you to easily generate better code:

     a[i] = b[i] + c[i]             *a = *b + *c;
     d[i] = b[i] / 2.               *d = *b / 2.

For the array version, the compiler can check to see if a and b
overlap, and if not it doesn't have to load b[i] twice. For the
pointer version, the compiler has to figure out what a, b, c, and d
point at, if it even bothers at all.

>  Yes, most
>implementations of C don't check the bounds, but that is because in
>the definition of the C language, run-time errors are generally
>defined to produced undefined results.

I'm not worried about run-time errors, I'm worried about run-time
efficiency. Some languages are hard to optimize. When C programers do
strength reduction by hand on array accesses, they create an
optimization mess.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/29/90)

In article <1990Oct28.015733.9181@murdoch.acc.Virginia.EDU>, gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) writes:
> Actually, it just shows that we're talking past each other most of the
> time. Here's a practical example where the boundedness of array
> references allows you to easily generate better code:
> 
>      a[i] = b[i] + c[i]             *a = *b + *c;
>      d[i] = b[i] / 2.               *d = *b / 2.
> 
> For the array version, the compiler can check to see if a and b
> overlap, and if not it doesn't have to load b[i] twice. For the
> pointer version, the compiler has to figure out what a, b, c, and d
> point at, if it even bothers at all.

Yes, *sigh* we're all talking past each other.
	THIS IS NOT AN INTRINSIC PROPERTY OF POINTERS;
	it is a property of the programming language *C*.

I haven't my Euclid manual handy, so I'll use Pascal syntax and
the Euclid idea:
	var
	    az: zone of real;
	    bz: zone of real;
	    ap, dp: ^az;
	    bp, cp: ^bz;
	begin
	    new(ap);	(* allocates from zone az only *)
	    ...
	    ap^ := bp^ + cp^;	(* this assignment CANNOT change bp^ or cp^ *)
	    dp^ := bp^/2;	(* neither can this *)	    
	    ...

The fact that C pointers are not statically associated with
separate zones is no more a fundamental property of pointers
than the fact that PL/I pointers are not statically associated
with a type.

To put this into C terms, suppose we introduce a new type
constructor "pointer into named array".  Here's how the example
might look:

	float a[N], b[N], c[N], d[N];
	float *[a] ap, *[b] bp, *[c] cp, *[d] dp;
	... ap = ... bp = ... dp = ...
	*ap = *bp + *cp;	/* can ONLY change a[] and *ap */
	*dp = *bp/2;		/* can ONLY change d[] and *dp */

This would be an upwards compatible extension C, just as 'const' was.
-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

gl8f@astsun7.astro.Virginia.EDU (Greg Lindahl) (10/29/90)

In article <4119@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:

>Yes, *sigh* we're all talking past each other.
>	THIS IS NOT AN INTRINSIC PROPERTY OF POINTERS;
>	it is a property of the programming language *C*.

Depends on how you define "pointers" ;-)

>To put this into C terms, suppose we introduce a new type
>constructor "pointer into named array".

C already has these. An example:

     float a[100];
     int i = 10;

     a[i];

Cheers (and don't forget the ;-),

-- greg

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/29/90)

In article <1990Oct29.051730.10838@murdoch.acc.Virginia.EDU>, gl8f@astsun7.astro.Virginia.EDU (Greg Lindahl) writes:
> In article <4119@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
> >Yes, *sigh* we're all talking past each other.
> >	THIS IS NOT AN INTRINSIC PROPERTY OF POINTERS;
> >	it is a property of the programming language *C*.

> Depends on how you define "pointers" ;-)

I certainly *don't* define "pointer" to mean "exactly what C means by
a pointer, no more, no less".  PL/I and Pascal and Algol 68 are some
well-known languages that have pointer-valued variables; Euclid is a
language which deserved to be better known.
None of them provides pointer arithmetic or pointer ordering.

If people want to say "pointer ARITHMETIC" is bad, let them say so.
If people want to say "pointer ORDERING" is bad, let them say so.
If people say "POINTERS" are bad, and they are talking with the
intention of being understand, then they are referring to the
idea behind PL/I POINTER, Pascal "^", Algol 68 and Mary REF, and
Euclid's whatever-it-was.  The question is whether this common
idea can be "tamed".  I am still waiting to hear what the objection
to Euclid's construct is.

> >To put this into C terms, suppose we introduce a new type
> >constructor "pointer into named array".

> C already has these. An example:

>      float a[100];
>      int i = 10;
>      a[i];

Where is "i" constrained to index only "a"?
(Am I the only person who remembers ESPL?  In ESPL you *could*
declare a variable to be a "verified index" for a particular array.)

-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

anw@maths.nott.ac.uk (Dr A. N. Walker) (10/30/90)

In article <3791@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> In article  <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> ]Really?  I don't know _any_ language that has pointers that places
>> ]bounds on where it can point.

	Pascal:  can only point to specially allocated storage
		[ie, no equivalent to "int i, *pi = &i;"]
	Algol: cannot point to objects of narrower scope
		[eg, the equivalent of "int *pi; f() { int i; pi = &i; }"
		 is illegal]

>With pointers, [...] The compiler _can't_ make any simple automatic bounds
>checks (although, in C it can perhaps automatically limit index
>calculations).

	Well, it's only non-simple in the case you describe elsewhere of
address punning through casts;  these are (or ought to be) rare enough
that a non-trivial check [eg, by tagging] has only marginal effects on
the efficiency.  Older readers will recall that it used to be *extremely
normal* to test programs with a whole battery of checks "enabled", with
a consequent large penalty in run-time speed, and to disable the checks
for "production runs".  If it is unacceptable these days to disable the
checks, then the answer is to provide them in hardware, as we used to
in the 50s and 60s.

>Variables _not_ in this "family"
	[of declared-to-be-alias-able variables]
>				  are _known_ not to be aliased to
>anything within it

	Yes, but one trouble is that one of the commonest cases in
practice is where a variable is aliassed with itself, as in a[i] v. a[j].
Checking that i != j can plainly only be done, in general, at run-time.
It gets worse when you are worried about a[j] v. b[j] where a, b and j
are parameters to some procedure -- you plainly need to be sure that
a != b [and it gets worse in languages that allow slicing and other
subsetting of arrays], which again involves arbitrarily messy checks if
a and b are potentially inherited from further procedures and you insist
that the checks be carried out by compiler and loader.  The problems
will only be exacerbated if you deprive programmers of natural ways of
using pointers, so that they find themselves [through brute force or
ignorance] emulating proper data structures by indexing into arrays.

	You are right that in many cases pointerless code can help
the compiler by making certain aliasses impossible;  on the other
hand, in most if not all of the cases where this matters, the
compiler can make the same deductions given a program which is
properly divided into modules and in which scope rules are enforced.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

jlg@lanl.gov (Jim Giles) (10/30/90)

From article <1990Oct29.191321.3202@maths.nott.ac.uk>, by anw@maths.nott.ac.uk (Dr A. N. Walker):
> [...]
> 	Yes, but one trouble is that one of the commonest cases in
> practice is where a variable is aliassed with itself, as in a[i] v. a[j].

Well, it is one of the common forms of aliasing.  However, it is clearly
marked as such: both are references to 'a'.  Further, it is a very rare
problem compared to array processing with 'a[j]' vs. 'b[j]' - here it
is _very_ important that we _know_ 'a' to be disjoint from 'b'.

> [...]
> It gets worse when you are worried about a[j] v. b[j] where a, b and j
> are parameters to some procedure -- you plainly need to be sure that
> a != b [...]

That has been my point.  If 'a' and 'b' aren't declared together in
the same 'aliased' declaration, they _can't_ be aliased.  Not at all.
Ever.

This requires that the loader check procedure arguments against
alias status and guarantee that the caller doesn't pass aliased
arguments to a subroutine that doesn't also declare them aliased.
Since all the tests are done at compile- and load-time, there is
no run-time penalty.  The price to be paid for this is that any
case that is still ambiguous at load-time must be assumed aliased.

> [...]  [and it gets worse in languages that allow slicing and other
> subsetting of arrays], [...]

Exactly the reason for wanting to get explicit compile-time and
load-time checkable constraints into the language.  This permits
the compiler to generate efficient code for those things (the majority)
which are _known_ not to be aliased.

> [...]                  which again involves arbitrarily messy checks if
> a and b are potentially inherited from further procedures and you insist
> that the checks be carried out by compiler and loader.  [...]

It is because of this problem of inheriting data through several levels
of the call chain that the test in the loader is most important.  The
loader _can_ perform this test reliably and quickly.  This permits
efficient code to be generated while maintaining confidence that
the no-alias assumptions won't be violated.

> [...]                                                   The problems
> will only be exacerbated if you deprive programmers of natural ways of
> using pointers, so that they find themselves [through brute force or
> ignorance] emulating proper data structures by indexing into arrays.

I don't know any natural ways of using pointers.  Pointers are one of
the most unnatural data structuring tools that I've ever encountered.
Further, in the context of the data structuring tools I've recommended,
I see no reason to use arrays to simulate any data structure other than
arrays.   Linked lists, graphs, trees, etc. should all be implemented
as recursive data structures (aliased or not as teh need arises).
Where's the need for arrays?

> 
> 	You are right that in many cases pointerless code can help
> the compiler by making certain aliasses impossible;  on the other
> hand, in most if not all of the cases where this matters, the
> compiler can make the same deductions given a program which is
> properly divided into modules and in which scope rules are enforced.

You yourself have pointed out many specific cases where a compiler
CANNOT make such deductions.  The case of arrays 'a' and 'b'
being passed to the same function as arguments is a case in point:

real function :: example(real::a(:),b(:))
   do while (some_condition)
      SOME CODE INVOLVING BOTH a AND b
   end do
end function

Since 'a' and 'b' aren't declared with the 'aliased' attribute, then
they _must_ not be aliased.  The loader can and _should_ determine
that this constraint is obeyed.  Using the language features I've
been recommending, the compiler can optimize the loop all it wants.
Without the constraints I advocate, such optimizations are eather
unsafe or not applied at all.

J. Giles

gudeman@cs.arizona.edu (David Gudeman) (10/31/90)

In article  <4349@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]That has been my point.  If 'a' and 'b' aren't declared together in
]the same 'aliased' declaration, they _can't_ be aliased.  Not at all.

I don't see why you keep arguing that "arrays with this new 'alias'
declaration of mine" are more optimizable than "pointers without the
'alias' declaration".  You are comparing apples and oranges.  The
appropriate comparison is to compare the optimization potential of
arrays vs. pointers either both with or both without the 'alias'
declaration.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

jlg@lanl.gov (Jim Giles) (10/31/90)

From article <26971@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> [...]                  You are comparing apples and oranges.  The
> appropriate comparison is to compare the optimization potential of
> arrays vs. pointers either both with or both without the 'alias'
> declaration.

Ok.  First _without_ the 'aliased' attribute:
   If your language doesn't have the ability to declare aliasing
   status, arrays are a direct win over pointers (which is why
   Fortran is usually faster than C on array intensive code).  Of
   course, optimizing the arrays this way is unsafe unless the
   loader checks all passed arguments to enforce the constraint that
   distinct arrays are NOT aliased.  Most Fortran environments
   _don't_ test this - so you can have strange and difficult to find
   errors arising from this cause.  Meanwhile, C converts all array
   args to pointers and generates inefficient code.  I don't regard
   either solution satisfactory.

Now _with_ the 'aliased' attribute:
   If your language _does_ have the ability to declare aliasing status
   (like the one I'm recommending), then pointers don't give you any
   capability you don't already have _without_ them.  Pointers in such
   a language are neither more powerful, more legible, nor more
   efficient than an appropriate combination of the other features
   I've mentioned.  So, what do we need pointers for?

   In fact, there are other distinctions between the different data
   types I've recommended _besides_ their aliasing status which yield
   optimization possibilities in addition to the alias free nature of
   the language.  Using pointers to simulate these data types
   (sequences for example) deprives the compiler of information which
   _could_ be used to improve the performance of the code.

So, arrays (and/or the other data types I've mentioned) are an
improvement on pointers - with or without the 'aliased' attribute.

I recommend the 'aliased' attribute in order to provide the
capability (especially in connection with recursive data types) to
do all the things that Pascal pointers do.  The aliased attribute
achieves this and at the same time allows easier optimization: the
compiler has an explicit local declaration of all possible aliasing
in each routine - the rest of the variables are _guaranteed_ not to
be aliased.

The price of testing this constraint is the load-time test that
Fortran compiler/loader environments _should_ already be doing.
This can be made considerably easier with a requirement of function
interface/prototype declarations of all procedures that are to be
called.  The compiler can then test the aliasing constraint at
compile-time and the loader need only make sure that the
interface/prototype information actually matches the corresponding
procedure declaration.

J. Giles

gudeman@cs.arizona.edu (David Gudeman) (11/01/90)

In article  <4464@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]
]Ok.  First _without_ the 'aliased' attribute:
]   ... arrays are a direct win over pointers (which is why
]   Fortran is usually faster than C on array intensive code).  Of
]   course, optimizing the arrays this way is unsafe unless the
]   loader checks all passed arguments to enforce the constraint that
]   distinct arrays are NOT aliased.

You are still comparing apples and oranges.  You are comparing
"arrays, assumed not to be aliased" with "pointers, assumed to be
aliased".

]Now _with_ the 'aliased' attribute:
]   (like the one I'm recommending), then pointers don't give you any
]   capability you don't already have _without_ them.  Pointers in such
]   a language are neither more powerful, more legible,...

This, of course, is a matter of opinion.  Maybe to clarify things, you
could post your definition of "pointer".  In particular, I'm
curious about how you think passing a pointer (C-style) is different
from passing an array.

]   ...  Using pointers to simulate these data types
]   (sequences for example) deprives the compiler of information which
]   _could_ be used to improve the performance of the code.

I'd also like to know what information this is.  Presumably a complete
definition of what you mean by "pointer" and "array" will make this
obvious.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

jlg@lanl.gov (Jim Giles) (11/01/90)

From article <27028@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> [...]
> You are still comparing apples and oranges.  You are comparing
> "arrays, assumed not to be aliased" with "pointers, assumed to be
> aliased".

Oh, I'm sorry.  I have given this information before, but perhaps you
missed it.  An array is a mapping from one or more bounded index sets
to values of an underlying type.  Each array is assumed distinct from
all others unless _explicitly_ declared otherwise.

Note: such explicit declaration of aliasing must be locally visible
to the compiler at compile time and any attempt to cheat must be
intercepted be load-time checks.

> [...]
> ]Now _with_ the 'aliased' attribute:
> ]   (like the one I'm recommending), then pointers don't give you any
> ]   capability you don't already have _without_ them.  Pointers in such
> ]   a language are neither more powerful, more legible,...
> 
> This, of course, is a matter of opinion.  [...]

The 'soft' words (like 'legible') might be considered opinion, the
statement that pointers provide no more power is a statement of _fact_.

> [...]                                     Maybe to clarify things, you
> could post your definition of "pointer".  [...]

I've done this before too.  Perhaps you missed it.

My definition of pointer is a variable whose value is an address and
has the following operations defined on it:

   Dereferencing to some predefined (for each pointer) type object.
   Comparing to other pointers (of the same underlying type) for equality.
   Assigning to/from other pointers (of the same underlying type)
   Being initialized to some address by an allocation mechanism

This is essentially the definition of Pascal pointers.  In addition,
other operations are defined by some languages:

   Comparing for relative order
   Scaled differences between two pointers
   Scaled sums of a pointer with an integer
   Casting to some other underlying type
   ...

> [...]                                     In particular, I'm
> curious about how you think passing a pointer (C-style) is different
> from passing an array.

I've posted this before too.  A C style pointer (because C has the
capability of scaled address addition) is similar (but not identical
to) the passing of a one dimensional array which is indexed by
integers.  But, the array differs by requiring its bounds to be
passed and by its ability to be multidimensional and by the fact
that it is _not_ aliased or overlapping with any other argument or
global variable without explicit _local_ declaration of the fact.

In fact, my definition of arrays differs from the Fortran definition
only in the fact that I allow array args to be aliased if explicitly
declared so, and Fortran _never_ allows them to be.  Also, I would
recommend that the array bounds be passed implicitly rather than
requiring the user to do it.

> [...]
> ]   ...  Using pointers to simulate these data types
> ]   (sequences for example) deprives the compiler of information which
> ]   _could_ be used to improve the performance of the code.
> 
> I'd also like to know what information this is.  Presumably a complete
> definition of what you mean by "pointer" and "array" will make this
> obvious.

I gave an example of this before too.  Consider the Fortran loop:

      do 10 i=1,N
	 x(1,i) = x(2,i)
  10  continue

It is easy for the compiler to see that the second column is being
copied to the first and the source and destinations don't overlap.
So, the code can easily be vectorized, pipelined, or unrolled for
faster execution.  Even the IBM PC can benefit from this knowledge
(and issue a block move instruction).

The corresponding C code is:

      p = &x;
      q = p + N;
      for (i=0;i<N;i++) *p++ = *q++;

I've taken the obvious step of making array 'x' be row-wise to make
the C version the same as the Fortran.  Note that it's not as easy
for either the user or the compiler to tell that the move is safe
from dependencies.  It's possible: 'q' starts out 'N' ahead of 'p'
and the loop only takes 'N' steps.  But the test is clearly more
complicated than the array version required.

Suppose that you have to do the assignment 'against the grain' of
the array.  That is, the elements to be moved aren't the consecutive
ones.  The Fortran version is:

      do 10 i=1,M
	 x(i,1) = x(i,2)
   10 continue

The compiler can again _easily_ see that there are no dependencies.
The C version is:

      p = &x;
      q = p + 1;
      for (i=0;i<M;i++,p+=M,q+=M) *p = *q;

Here, the determination that there is no overlap requires the
compiler to see that the 'M'-parity of 'p' is always zero and
the 'M'-parity of 'q' is always one (both based on the base of
'x').  Few C compilers can even do the first example.  I don't
know any that can see this last.

Note:  I picked Fortran as the counter-example but I could just as
easily have used Pascal, Modula, Ada, Algol, etc..  In all of these,
the compiler can take advantage of the information which is lost if
the operation done using pointers.  Even in C, in those few contexts
where multidimensional arrays are actually recognized, it is better
to use the array notation so the compiler has more complete
information (if you can find a C compiler capable of making use
of such information).

J. Giles

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/01/90)

In article <4569@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> From article <27028@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> > You are still comparing apples and oranges.  You are comparing
> > "arrays, assumed not to be aliased" with "pointers, assumed to be
> > aliased".

> Oh, I'm sorry.  I have given this information before, but perhaps you
> missed it.  An array is a mapping from one or more bounded index sets
> to values of an underlying type.  Each array is assumed distinct from
> all others unless _explicitly_ declared otherwise.

A consequence of this definition is that array cross-sections (as in
PL/I, Algol 68, and Fortran Extended) are *NOT* arrays.  I don't
remember seeing anything in the Algol 68 definition that forbade
aliasing, so by this definition Algol 68 didn't have array parameters
at all.  I have never seen the PL/I standard, so although I was never
warned against aliasing in PL/I programs that may have been a fault of
the textbooks and vendors' manuals that I used (thankfully, it has been
some years since I last used PL/I).

In fact, it appears that only APL (which uses value semantics for all
parameters and assignments) Euclid, and Fortran have arrays (and with
nested arrays, it's not clear to me whether APL2 has them).
-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

jlg@lanl.gov (Jim Giles) (11/02/90)

From article <4181@goanna.cs.rmit.oz.au>, by ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe):
> In article <4569@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
>> [...]       An array is a mapping from one or more bounded index sets
>> to values of an underlying type.  Each array is assumed distinct from
>> all others unless _explicitly_ declared otherwise.
> 
> A consequence of this definition is that array cross-sections (as in
> PL/I, Algol 68, and Fortran Extended) are *NOT* arrays. [...]

Well, Fortran Extended array cross-sections certainly are not arrays.
If they are passed to subroutines, they can become arrays there.  The
reason they aren't arrays has nothing to do with aliasing though -
cross sections are not indexible (so they don't meet the frist part of
my definition of being mappings from index sets to values).

Fortran Extended pointers are aliased arrays (which can be aliased to
anything in their scope that has the target attribute and is the same
underlying type).  But, this constitutes an explicit declaration
(thought not explicit enough to my taste - I have opposed the new
standard) so they still fit into my definition (if only barely).

An array section passed to a subroutine must not be aliased to any
other argument or global variable that that is visible in the scope of
the subroutine (unless explicitly declared as usual).  In this case,
the section is considered an array again - it can be indexed, etc..
Once again, this fits my definition.

Still, you make a good point.  The operations allowed on array
sections should be carefully considered.  The purpose of making the
definition of arrays as I did was to allow the compiler (with very
little assist from the loader) to produce efficient code for array
manipulation and be assured that the optimizations are safe from
aliasing.  If some operations on array sections are possible which
preserve this property but are in contradiction to my definition of
arrays, then I am willing to add array sections as a separate data
construct to my list and define those new operations on them.

J. Giles

gl8f@astsun8.astro.Virginia.EDU (Greg Lindahl) (11/02/90)

In article <27028@megaron.cs.arizona.edu> gudeman@cs.arizona.edu (David Gudeman) writes:
>In article  <4464@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>]
>]Ok.  First _without_ the 'aliased' attribute:
>]   ... arrays are a direct win over pointers (which is why
>]   Fortran is usually faster than C on array intensive code).  Of
>]   course, optimizing the arrays this way is unsafe unless the
>]   loader checks all passed arguments to enforce the constraint that
>]   distinct arrays are NOT aliased.
>
>You are still comparing apples and oranges.  You are comparing
>"arrays, assumed not to be aliased" with "pointers, assumed to be
>aliased".

Looks to me that he's trying to figure out the fastest way to do
array-intensive calculations in C and Fortran. In that case, the
problems with C pointers cause C compilers to miss optimizations. If
you aren't concerned about such a situation, it's not surprising that
you two are talking past each other.

gudeman@cs.arizona.edu (David Gudeman) (11/02/90)

In article  <4569@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]The 'soft' words (like 'legible') might be considered opinion, the
]statement that pointers provide no more power is a statement of _fact_.

Not until you define "power".

]... A C style pointer ... is similar (but not identical
]to) the passing of a one dimensional array which is indexed by
]integers.  But, the array differs by requiring its bounds to be
]passed and by its ability to be multidimensional and by the fact
]that it is _not_ aliased or overlapping with any other argument or
]global variable without explicit _local_ declaration of the fact.

Well then you are really making three distinct arguments and
confusing your readers by merging them into one, aided and abetted
by a specialized terminology that the net public in general do not
share.  Your arguments seem to be (I'll assume everyone agrees that
arrays are multi-dimensional):

(1) Arrays should be bounds checked.

(2) Aliasing of data structures should not be allowed unless made
explicit by declarations.

(3) Pointers should not have a dereferencing operation.

To take the last point first, you keep saying that you want to
eliminate pointers, but you also keep saying that you would replace
them by something that is semantically equivalent.  All you have done
is replaced the notion of an reference-valued object with a
co-reference, replaced pointer creation with a special assignment, and
removed the need for a dereference operation.  You haven't eliminated
pointers, you have just changed them to something similar.

As to the other points, you keep saying that "arrays have property X
and pointers don't", when what you really mean is that bounds-checked
arrays with guaranteed no-aliasing have property X and that raw
machine addresses don't have property X.  If you wouldn't keep
refering to raw machine addresses as pointers there would be a lot
less confusion.  There is no reason why pointers in some language
could not be defined to be bounded and unaliased.  If pointers were
defined that way, then pointers would have property X.  Property X
here mostly refers to your assertions about the greater optimizability
and safety of arrays over pointers.  (Your example depended on the
assumption that the C pointers were not bounded and that arbitrary
aliasing is possible.)

To make the situation even more bemusing, your version of pointers
without dereferencing constitutes a direct counter-example to what you
claim is impossible for pointers.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

anw@maths.nott.ac.uk (Dr A. N. Walker) (11/03/90)

In article <4349@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

	[Quite a lot, but I think we've reached a stage where we can
say what the situation is, and then agree to differ.]

	Suppose we have a procedure with two arrays as parameters.  In
a call of the procedure, the actual parameters may be arbitrarily messily
derived from other parameters in the call chain, may be slices or other
subsets [not in all languages, of course] perhaps of the same array.
There is a spiffy optimisation available (say, a block move, or a clever
vectorisation) that sometimes doesn't work if the parameters (call them
A and B) are not sufficiently different.  Call the optimised version of
the code ALPHA, and the unoptimised version BETA.  In real life, though
not on the machines available to impoverished Maths departments, ALPHA
may run (say) 30 times faster than BETA.

	Then Fortran should compile the code somewhat like

		IF interfere (A, B) THEN undefined ELSE ALPHA

As ALPHA is a perfectly good value of "undefined", this is usually optimised
to just ALPHA.  We would all agree, I hope, that this is less than ideal,
and can give rise to difficult-to-uncover bugs.

	C should compile the code to something like

		IF interfere (A, B) THEN BETA ELSE ALPHA

In practice, to economise on code size, unless "interfere" can be determined
very easily at compile time to be FALSE, it will compile to BETA.  This
is clearly slower ("less efficient") than the Fortran.

	JLG would like us to declare whether or not A and B can interfere.
If we *do* so declare, then it compiles to BETA, otherwise the code is

		IF interfere (A, B) THEN ERROR ELSE ALPHA

where, we hope, "interfere" can usually, if not always, be decided
statically by the compiler and loader, giving another optimisation.

	JLG's solution is obviously better than the Fortran.  Further,
it satisfies the rule that optimisation must not break code that, prima
facie, works, so it is a "correct" solution.  Further, it is clearly no
slower than the C, and perhaps much faster.  He therefore has a tenable
case.

	The counter argument is that it puts the burden in the wrong
place.  JLG is asking the *programmer* to assert something that may
not be assertable.  Some of the procedures in the call chain may be
library procedures, or may involve global variables from other modules,
for example.  How do *I* know what aliassing may occur?  Further,
in languages which (very usefully) allow dynamic array slicing, the
presence or absence of aliassing may often only be determinable at
run time, so either I program "defensively" and declare the alias
(so having a program which always runs slowly), or I take pot luck
and have a program which may fail at run time.

	In fact, the *real* situation, is that if the compiler and
loader can determine whether or not aliassing can occur, then the
"alias" declaration is unnecessary, and if they can't, then a run-time
check is needed anyway.  Hints to compilers, if thought desirable,
could be given in all sorts of other ways.

>I don't know any natural ways of using pointers.

	Well, we've been round this before.  *I* find them useful;
why do you want me to change my natural images and pictures of what
my programs do?  No-one is forcing *you* to use them.

	If you insist on thinking of pointers merely as ways of
storing machine addresses in a half-hearted attempt to avoid the
(assumed) overheads of array indexing, then of course you find
them unnatural.  On the other hand, if you have (say) a graph,
and at some point in your program you discover a node with an
interesting property, how *do* you naturally refer later to that
same node *other than* by explicitly or implicitly keeping a
pointer to that node?

>	   Linked lists, graphs, trees, etc. should all be implemented
>as recursive data structures (aliased or not as teh need arises).

	Even if you implement the graph itself as a recursive structure
(implied:  without pointers [and I am by no means convinced that that
is desirable, even if it is possible]), you may still find yourself
wanting to keep fingers on particular nodes.  A finger by any other
name is still a pointer.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

jlg@lanl.gov (Jim Giles) (11/03/90)

From article <27094@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> [...]
> Well then you are really making three distinct arguments and
> confusing your readers by merging them into one, aided and abetted
> by a specialized terminology that the net public in general do not
> share.  Your arguments seem to be (I'll assume everyone agrees that
> arrays are multi-dimensional):

I'm glad you assume that arrays can be multidimensional.  Before the
advent of C, this was accepted by nearly everyone as one of the possible
properties of arrays.  This is not (or, at least, _should_ not be)
"specialized terminology that the net public in general do not share."

> [...]
> (1) Arrays should be bounds checked.

At least as an option, yes.  Further, the compiler should be allowed
(and able) to make use of its knowledge of the bounds to generate more
efficient code (especially for statically allocated arrays where the
bounds are fixed).  Again, before C, this was a universal assumption.
Are you telling me that this property of arrays is one that net readers
are not aware of?

> [...]
> (2) Aliasing of data structures should not be allowed unless made
> explicit by declarations.

That is the only new point that I have introduced.  And, as such,
it has been the one that I've taken most pains to point out.
Nevertheless, this has _always_ been true of arrays in most
programming languages (other than C and its relatives).  So, with
respect to _arrays_, this should still be a fairly widely
acknowledged property - even if not universal because of C.

>  [...]
> (3) Pointers should not have a dereferencing operation.

This is only a minor point which makes the syntax of using recursive
data structures more legible.  In fact, the only objections I have
to pointers a`la Pascal are the syntax and the lack of a way to
inform the compiler when two pointers of the same type are _not_
aliased.  So, aside from those two changes, my "aliased" attribute
is identical to Pascal style pointers.  This has little to do with
the `C style pointer vs. array' debate.

> [...]                                          You haven't eliminated
> pointers, you have just changed them to something similar.

When comparing to Pascal sytle pointers, this is indeed what I have
done.  It is also what I've said all along that I've done.  That is
what I mean when I say that something is semantically equivalent:
it has the same meaning with a possibly different syntax.

> [...]            There is no reason why pointers in some language
> could not be defined to be bounded and unaliased.  If pointers were
> defined that way, then pointers would have property X. [...]

In which case, why call them pointers?  Surely to call such things
"pointers" would be to use "specialized terminology that the net
public in general do not share." In fact, the terminology that the
net public share is C and/or possibly Fortran and Pascal.  Most
other languages seem to be relegated to the category of esoterica.
Certainly, nothing outside those bounds can be regarded as shared.
So, in a net discussion, pointers are those things in C or perhaps
limited to those things in Pascal.

> [...]
> To make the situation even more bemusing, your version of pointers
> without dereferencing constitutes a direct counter-example to what you
> claim is impossible for pointers.

On the contrary.  If you'll look more carefully at my past articles on
this subject, you will find that the objects that I call "aliased"
variables have all the bad properties that I ascribe to pointers.
They are more strictly bounded, so that the damage is limited to those
operations involving variables in the same "aliased" declaration.
But, the damage is there.  This more strict limitation on aliasing is
one of the two differences between my proposal an Pascal pointers:
this is the only semantic difference.  Again, this has little
relevance to the `C-pointer vs. arrays' debate.

J. Giles

jlg@lanl.gov (Jim Giles) (11/03/90)

From article <1990Nov2.183359.6761@maths.nott.ac.uk>, by anw@maths.nott.ac.uk (Dr A. N. Walker):
> [...stuff that shows that _sombody_ finally understands what I'm saying...]
>
> 	The counter argument is that it puts the burden in the wrong
> place.  JLG is asking the *programmer* to assert something that may
> not be assertable.  Some of the procedures in the call chain may be
> library procedures, or may involve global variables from other modules,
> for example.  How do *I* know what aliassing may occur?  [...]

Exactly the problem I kept trying to point out to someone via private
email (and the reason I don't deal with that person over email any
more - he ignored the point and kept claiming that the compiler by
itself could do all).  It is for this reason that the loader is
required to do some work.  It _can_ do the call tree analysis (if the
compiler passes along all the information).  Any errors the loader
finds in this search correspond to aliasing that you didn't declare
(or possibly, aliasing that you _did_ declare unnecessarily).  This is
how you learn what aliasing may occur.  I maintain loaders in large
machine environments - so I'm reasonably sure that the loader is up to
the task provided the allowed aliasing is specified clearly enough.

> [...]                                                    Further,
> in languages which (very usefully) allow dynamic array slicing, the
> presence or absence of aliassing may often only be determinable at
> run time, so either I program "defensively" and declare the alias
> (so having a program which always runs slowly), or I take pot luck
> and have a program which may fail at run time.

Yes, this is a problem.  This is one reason that I insist on complete
information about arrays to be retained by the parameter passing
mechanism.  The compiler _can_ analyse _some_ of these occurrences
and give messages if you guessed wrong.  To be sure, there are some
cases where the compiler can't decide - and you can't either - so
you have to program defensively in those cases.  Fortunately, such
cases, while common, are not in the majority - the method I suggest
will allow significant and useful control for the programmer in
most cases

> [...]
> 	In fact, the *real* situation, is that if the compiler and
> loader can determine whether or not aliassing can occur, then the
> "alias" declaration is unnecessary, [...]

Well, true only if the loader is able to do code generation or the
compiler is able to do interprocedural analysis.  In existing
implementations, the information found by the loader is too late
to effect what code the compiler generates.  This is why the
declarations at compile time are necessary - the loader only
checks to make sure that your intended constraints are actually
met.  (Unfortunately, if they are not met, it is indeed the
user's problem to correct.  But the loader can at least give
full particulars about what didn't match-up.)

> [...]                               and if they can't, then a run-time
> check is needed anyway.  [...]

Or, defensive programming (and accepting the loss of efficiency)
is also possible here.  I don't know which to prefer.  The run-time
check is only required for those cases which the compiler/loader
combination couldn't fathom.  This may happen in a lot of important
cases, but this still leaves most cases working efficiently with
the scheme I propose.

> [...]                    Hints to compilers, if thought desirable,
> could be given in all sorts of other ways.

Yes, but this is the way I've chosen.  You can disagree with the syntax
or argue that it fails in some difficult to fathom case (like passing
the red and black squares of a chessboard as separate arguments), but
I will still claim that it's useful in many other important cases.
If you have an alternate mechanism to suggest, please feel free!
I'd like to see other people's ideas.

> [...]            On the other hand, if you have (say) a graph,
> and at some point in your program you discover a node with an
> interesting property, how *do* you naturally refer later to that
> same node *other than* by explicitly or implicitly keeping a
> pointer to that node?

By keeping an aliased variable of the same recursive type which
references that node.  Note, as I've repeatedly said, "aliased"
variables provide the _same_ functionality as Pascal pointers.
In fact, the compiler _should_ generate the same code (if not
better, since my suggestion limits aliasing more strictly).

> [...]                                         A finger by any other
> name is still a pointer.

Well, I will continue to stick to the definition of pointers that
I've posted.  Since my "aliased" variables are more restricted I
will continue not to call them pointers.  But, they allow the user
to implement the same structures (in the same way: if he aliases
everything) as Pascal pointers do.  For recursive data structures,
"aliased" variables work the same way Pascal pointers do, but the
class of objects each variable may be aliased to is much more
severely limited - that's the whole difference apart from minor
syntactic changes.

As for a finger being a pointer: only when I point with it.

J. Giles

peter@ficc.ferranti.com (Peter da Silva) (11/04/90)

> > (2) Aliasing of data structures should not be allowed unless made
> > explicit by declarations.

> Nevertheless, this has _always_ been true of arrays in most
> programming languages (other than C and its relatives).

And in practice this has pretty much never been enforced. In practice all
function and subroutine parameters in Fortran have all the characteristics
of pointers.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <3656@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> But the context of the discussion makes it
> clear that the person who posted TeX as an example of pointer efficiency
> was of the opinion that pointers and arrays differed importantly.

That is correct. In Q there is no relation between pointers and arrays,
except that a standard pointer abbreviation p[i] happens to look like
the array indexing builtin a[i].

> In
> fact, in the same news article that he proposed TeX, he also gave
> examples which attempted to prove the supposed superiority of pointers
> to arrays.

Close enough. I will return to this point several articles from now.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> If pointer arithmetic across segment boundaries
> is not reliable, then pointers can't be reliably used to implement
> the memory manager (or, at least, not without considerable extra
> difficulty).

This statement is unjustified. It is, in fact, false. I am told that at
least one C IBM PC memory manager keeps track of memory separately in
each segment.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <4349@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> It is because of this problem of inheriting data through several levels
> of the call chain that the test in the loader is most important.  The
> loader _can_ perform this test reliably and quickly.

It is inappropriate to give the loader sufficient knowledge of your
language to perform these tests. It is also rather stupid to delay
checks until load time, since some packages (such as libraries) may not
be linked until much later.

The .h-.c (package spec-package body) model is sufficient to detect
procedure call interface errors at compile time.

> I don't know any natural ways of using pointers.  Pointers are one of
> the most unnatural data structuring tools that I've ever encountered.

Knuth says that all data structures in the real world seem to be quite
appropriately modelled as a set of structures, some fields of which
contain pointers to other structures. There are many, many, many people
who agree with him that pointers are a natural way to express data
structures. I am not saying that you're wrong, but your statement is not
objective.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <4837@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> > [...]            There is no reason why pointers in some language
> > could not be defined to be bounded and unaliased.  If pointers were
> > defined that way, then pointers would have property X. [...]
> In which case, why call them pointers?  Surely to call such things
> "pointers" would be to use "specialized terminology that the net
> public in general do not share."

But many of us agree in substance with your definition of pointers as
objects to which you assign the ``address'' (whatever that is) of
another object, and which you ``dereference'' to get the value of or
store into that object. That's the essence of a pointer, and adding
assertions to pointers doesn't take away that essence.

(By the way, I don't know anyone who agrees with you that addresses in
the above definition need to have anything to do with machine addresses.
But that's a minor quibble.)

---Dan

jlg@lanl.gov (Jim Giles) (11/07/90)

From article <1845:Nov607:02:2390@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> [... pointers not reliable across segments ...]
> 
> This statement is unjustified. It is, in fact, false. I am told that at
> least one C IBM PC memory manager keeps track of memory separately in
> each segment.

Exactly my point, pointers are presumably being used _within_
segments, but they are not adequate for writing the _whole_ memory
manager - something else must be in use to track the segments.

However, you are missing my point here.  In this one particular case
I actually _support_ the use of pointers (or some form of unbounded
address arithmetic).  I oppose the above mentioned ANSI C constraint
on pointers because the ability to do unbounded address calculations
was the only use I thought they were fit for.

J. Giles

jlg@lanl.gov (Jim Giles) (11/07/90)

From article <2047:Nov607:10:1690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <4349@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> [...]
> The .h-.c (package spec-package body) model is sufficient to detect
> procedure call interface errors at compile time.

Not if the information in the .h file _doesn't_match_ the definition
of the corresponding routines in the .c file.  This is the kind of
check that I have _always_ recommended for the loader to do.  The
loader need know _nothing_ about the language to perform these
tests, since it only makes sure that they match.  The compiler must
provide sufficient information to tell the loader which constraints
must be promoted across the call-tree - but this information can
exist in a language independent form.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/07/90)

In article <5074@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <1845:Nov607:02:2390@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > In article <3681@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> >> [... pointers not reliable across segments ...]

What you actually said was that a memory manager cannot use pointers
reliably. We all agree that pointers cannot be used portably outside of
the arrays that they point to.

> > This statement is unjustified. It is, in fact, false. I am told that at
> > least one C IBM PC memory manager keeps track of memory separately in
> > each segment.
> Exactly my point, pointers are presumably being used _within_
> segments, but they are not adequate for writing the _whole_ memory
> manager - something else must be in use to track the segments.

Huh? Pointers are used to track the segments. It's just that you never
try to use one pointer across segments. This sort of bi-level allocation
is actually rather easy. Pointers are perfectly adequate for writing the
_whole_ memory manager.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/07/90)

In article <5077@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <2047:Nov607:10:1690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > In article <4349@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> > [...]
> > The .h-.c (package spec-package body) model is sufficient to detect
> > procedure call interface errors at compile time.
> Not if the information in the .h file _doesn't_match_ the definition
> of the corresponding routines in the .c file.

In which case the error will be detected at *compile time*, when the
compiler gets around to that .c file. I stand by my statement.

> The compiler must
> provide sufficient information to tell the loader which constraints
> must be promoted across the call-tree - but this information can
> exist in a language independent form.

I suppose you could have the loader accept arbitrary strings from the
compiler describing these constraints, but then it's not going to
generate intelligent error messages. Have you ever used AT&T's C++?
This isn't a very good solution. The compiler can do a much better job.

---Dan

gudeman@cs.arizona.edu (David Gudeman) (11/07/90)

In article  <5077@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
] The compiler must
]provide sufficient information to tell the loader which constraints
]must be promoted across the call-tree - but this information can
]exist in a language independent form.

Sounds like a fun new software tool -- a conditional linker.  I wish I
had time to work on it.  I visualize a linker with control constructs
roughly similar to the C preprocessor.  One necessary exension of
course, would be an error producer

#error "Declaration of function %s does not match definition" f1

I expect that it would also need some extra things in the expression
syntax for type checking and such, for example

#if type(X) = type(Y)

where X and Y are some sort of (language-independent) type expression.

It could be used to conditionaly compile code depending on things that
can't be determined until link time like

#if guaranteed_not_aliased(A,B)
/* highly optimized code that assumes no aliasing */
#else
/* less optimized code to do the same thing for the case where A and B
might be aliased */
#endif

Furthermore, this linker could be downward compatible with linkers
that don't have conditional statements, so that you could install the
new linker without changing the existing compilers on your system.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

jlg@lanl.gov (Jim Giles) (11/07/90)

From article <7298:Nov620:50:5990@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> Huh? Pointers are used to track the segments. It's just that you never
> try to use one pointer across segments. This sort of bi-level allocation
> is actually rather easy. Pointers are perfectly adequate for writing the
> _whole_ memory manager.

Well, not on my PC at any rate.  Segments have a maximum size of
64K.  The memory is 640K.  Clearly, some arithmetic on numbers bigger
than 16-bits is required to keep track of free space.  C pointers
can't do that.  But, _something_ must be.  Must not be pointers though,
all them use 16-bit arithmetic.

J. Giles

jlg@lanl.gov (Jim Giles) (11/07/90)

From article <7456:Nov620:54:4090@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <5077@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> [...]
>> Not if the information in the .h file _doesn't_match_ the definition
>> of the corresponding routines in the .c file.
> 
> In which case the error will be detected at *compile time*, when the
> compiler gets around to that .c file. I stand by my statement.

Oh, now you've eliminated separate compilation.  In my experience the
main code and the libraries it calls are seldom available in source
on the same machine.  Often compiled by separate companies, possibly
in separate countries.  Mistakes happen, the .h file may not match
the code which generated the .o file (you haven't got the .c file).
I stand by my statement.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/08/90)

In article <5091@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <7298:Nov620:50:5990@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > Huh? Pointers are used to track the segments. It's just that you never
> > try to use one pointer across segments. This sort of bi-level allocation
> > is actually rather easy. Pointers are perfectly adequate for writing the
> > _whole_ memory manager.
> Well, not on my PC at any rate.  Segments have a maximum size of
> 64K.  The memory is 640K.  Clearly, some arithmetic on numbers bigger
> than 16-bits is required to keep track of free space.  C pointers
> can't do that.  But, _something_ must be.  Must not be pointers though,
> all them use 16-bit arithmetic.

You are being illogical. ``Must not be arrays though, all them use
16-bit indexing.'' Just because arithmetic can't take pointers out of
their segments doesn't mean pointers are restricted to a single segment.

Perhaps it would help if you realized that a C pointer is an
abbreviation for a pointer to an array. When you index it, you are
(conceptually) dereferencing the pointer and then indexing into the
array.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/08/90)

In article <5093@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <7456:Nov620:54:4090@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > In article <5077@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> > [...]
> >> Not if the information in the .h file _doesn't_match_ the definition
> >> of the corresponding routines in the .c file.
> > In which case the error will be detected at *compile time*, when the
> > compiler gets around to that .c file. I stand by my statement.
> Oh, now you've eliminated separate compilation.

You are babbling.

Do you think that the Ada package model is entirely misconceived? I
thought you liked Ada.

> Mistakes happen, the .h file may not match
> the code which generated the .o file (you haven't got the .c file).

Yes, and when code has bugs, you fix the code. You don't rewrite your
compiler and loader to use some idiotic replacement for the concept of a
package.

---Dan

jlg@lanl.gov (Jim Giles) (11/08/90)

From article <6363:Nov720:32:5090@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> You are being illogical. ``Must not be arrays though, all them use
> 16-bit indexing.'' Just because arithmetic can't take pointers out of
> their segments doesn't mean pointers are restricted to a single segment.

You're still missing the point.  In order to implement a memory
manager, it is necessary to _compare_ pointers which are _known_ to
point to _different_ data objects.  You need to do this, for
example, to determine if an object that is being added to the free
list is adjacent to existing free list items (so that they can be
merged).  Because of the ANSI restrictions on pointers, it is not
safe to compare them in this way (in fact, neither of the C compilers
I have at home do this job with pointers).

J. Giles

jlg@lanl.gov (Jim Giles) (11/08/90)

From article <6555:Nov720:35:2690@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <5093@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> From article <7456:Nov620:54:4090@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
>> > In which case the error will be detected at *compile time*, when the
>> > compiler gets around to that .c file. I stand by my statement.

I leave it up to the rest of the net.  Does the above statement sound
like separate compilation is allowed in Dan's model?  If so, it must
be clear to everyone that there may not _be_ a .c file corresponding
to the .h file - the code might be proprietary, the vendor sent the
.h file and the .o file and nothing else.  So, how does the compiler
check at compile time when it "gets around to that .c file" which
doesn't even exist?

> [...]
> Yes, and when code has bugs, you fix the code. You don't rewrite your
> compiler and loader to use some idiotic replacement for the concept of a
> package.

I will ignore the fact that you're being ad hominem again.  I don't
suggest an 'idiotic' replacement for packages.  In fact, I support
packages (or modules, or whatever you want to call them).

What I don't do is suggest that the compiler involve itself in an
elaborate scheme to avoid requiring the user to make a few simple
declarations about aliasing.  Even if your scheme works (it's long,
I'll have to study it), it enlarges .o files by factors which are
probably exponential in the number of procedure arguments; it
requires that many extra passes through the compiler for each
code (at least through the code generation phase of the compiler);
and it looks extremely cumbersome and unreliable.  And, it _does_
require modifications to the loader (at least to the extent of
recognising all the long suffixes to the procedure names which
describe which combination of aliasing is allowed in each object
code version).

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/08/90)

In article <5224@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> If so, it must
> be clear to everyone that there may not _be_ a .c file corresponding
> to the .h file - the code might be proprietary, the vendor sent the
> .h file and the .o file and nothing else.  So, how does the compiler
> check at compile time when it "gets around to that .c file" which
> doesn't even exist?

As I said, the compiler checks the validity of the .h file against each
c file when it compiles the .c file. If the vendor has sent along a .o,
then obviously the compiler has compiled the vendor's .c, so it had the
chance to check at that point.

You started this subthread by saying that the loader should check
prototype matches and the compiler should not.

You may be right about the first, but AT&T's C++ uses the same solution,
and it's an awful botch. If you ever make a prototyping error in it,
you'll see why.

You are absolutely wrong in saying that the compiler should not check
prototypes. I don't think you'll find anyone to agree with you on that.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/08/90)

In article <5224@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> What I don't do is suggest that the compiler involve itself in an
> elaborate scheme to avoid requiring the user to make a few simple
> declarations about aliasing.  Even if your scheme works (it's long,
> I'll have to study it),

It works, and it's not long. You just haven't listened to my short
explanations. I await the end of your ``study,'' and your final
admission that a C compiler can, without any hints from the user, detect
the non-aliasing in Fortran-converted-to-C array code.

> it enlarges .o files by factors which are
> probably exponential in the number of procedure arguments;

Wrong. You said many months ago that most code does not show aliasing.
If this is true, then the compiler can generate two partitions (which
means double-size .o files) and use the more efficient one in all cases.
It only needs to generate more partitions if you actually *do* have
aliasing which you want the compiler to detect on a fine grain. (As
usual, *if* the compiler does interprocedural analysis, it can figure
out exactly which partitions are necessary.)

> it
> requires that many extra passes through the compiler for each
> code (at least through the code generation phase of the compiler);

I pointed this out in the article you're studying. Yes, Jim, if you
generate code that assumes aliasing and code that does not, you have
indeed generated twice as much code.

> and it looks extremely cumbersome and unreliable.

It is not cumbersome, it is easy to implement, and it is perfectly
reliable.

> And, it _does_
> require modifications to the loader (at least to the extent of
> recognising all the long suffixes to the procedure names which
> describe which combination of aliasing is allowed in each object
> code version).

You are grasping at straws. The number of the partition chosen takes at
most a few bits to encode.

---Dan

jlg@lanl.gov (Jim Giles) (11/08/90)

From article <11854:Nov723:28:2490@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <5224@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> [...]
> As I said, the compiler checks the validity of the .h file against each
> c file when it compiles the .c file. If the vendor has sent along a .o,
> then obviously the compiler has compiled the vendor's .c, so it had the
> chance to check at that point.

Well, as I said, mistakes happen.  You might have compiled your
main code with (say) version x.y of their .h files.  They might have
sent you copies of version a.b of their .o files.  The first tool
that _can_ test this kind of mismatch is the loader.  The fact that
version x.y .h files match the version x.y .o files doesn't help.
Clearly, all you have found is an even _more_ efficient way for
my loader test to be implemented - compare version numbers.

J. Giles

jlg@lanl.gov (Jim Giles) (11/08/90)

> [...]
>> And, it _does_
>> require modifications to the loader (at least to the extent of
>> recognising all the long suffixes to the procedure names which
>> describe which combination of aliasing is allowed in each object
>> code version).
> 
> You are grasping at straws. The number of the partition chosen takes at
> most a few bits to encode.

The number of different _possible_ partitions in your scheme varies
as 2^N where N is the number of procedure arguments.  This ignores the
possible aliasing of globals which multiplies in another big factor.
Since you recommend allowing the user to assert aliasing information,
any of these partitions are possibly generated as a result (assuming
that you don't always automatically generate all of them).

Well, you have an exponential naming problem (at least), which is more
than a few bits.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/08/90)

In article <5241@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> The number of different _possible_ partitions in your scheme varies
> as 2^N where N is the number of procedure arguments.

No, it does not. That's the third incorrect formula you've posted for
the same quantity.

Even if you were right, no sane compiler would generate *every* possible
partition. It would just generate up to the space-time tradeoff decided
by the implementor. If you are correct that most code doesn't have
aliasing, then the right spot to chop off is at TWO partitions. One says
that all arguments of this type are aliased, one says that none are.

> This ignores the
> possible aliasing of globals which multiplies in another big factor.

I have not been ignoring globals; as I indicate in the article you're
studying, *all* possibly aliased pointers must be considered together.

> Well, you have an exponential naming problem (at least), which is more
> than a few bits.

Again, you are dreaming. Say the compiler gives up at 20 copies of the
same code (which in practice gives hellishly good coverage). How many
bits does it take to encode an integer between 1 and 20.

---Dan

peter@ficc.ferranti.com (Peter da Silva) (11/08/90)

In article <5222@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> You're still missing the point.  In order to implement a memory
> manager, it is necessary to _compare_ pointers which are _known_ to
> point to _different_ data objects.

So? A memory manager is an implementation-dependent module, not a portable
one. I can use swiss bank accounts for all ANSI cares.

> Because of the ANSI restrictions on pointers, it is not
> safe to compare them in this way (in fact, neither of the C compilers
> I have at home do this job with pointers).

Let me guess... you have an IBM-PC at home? Well, *THAT* is the machine
responsible for the ANSI restrictions on pointers *in portable code*.

I rest my case (good thing, it was getting pretty heavy).
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

jlg@lanl.gov (Jim Giles) (11/09/90)

From article <12614:Nov801:02:4990@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <5241@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> The number of different _possible_ partitions in your scheme varies
>> as 2^N where N is the number of procedure arguments.
> 
> No, it does not. That's the third incorrect formula you've posted for
> the same quantity.

You are correct, the formula (as I admitted) is incorrect.  The actual
number of different partitions is _greater_than_ 2^N (where N is the
number of parameters).

> [...]
>> This ignores the
>> possible aliasing of globals which multiplies in another big factor.
> 
> I have not been ignoring globals; as I indicate in the article you're
> studying, *all* possibly aliased pointers must be considered together.

But, my extimate (2^N) did - that is what I was referring to: the fact
that 2^N is only an estimate of the number of partitions among parameters.
In order for _my_ estimate to be correct, _I_ would have to include
globals into the mix.  _I_ was the one that was ignoring globals, not
_you_.  I never accused _you_ of ignoring globals.

> [...]
>> Well, you have an exponential naming problem (at least), which is more
>> than a few bits.
> 
> Again, you are dreaming. Say the compiler gives up at 20 copies of the
> same code (which in practice gives hellishly good coverage). How many
> bits does it take to encode an integer between 1 and 20.

Well, let's try it shall we?  I have a program with 5 parameters.
Let's call the parameters 'a' through 'e'.  The possible partitions
of these 6 are below:

   {{a}{b}{c}{d}{e}} {{a b}{c}{d}{e}} {{a b}{c d}{e}} {{a b}{c e}{d}}
   {{a b}{c}{d e}} {{a b}{c d e}} {{a c}{b}{d}{e}} {{a c}{b d}{e}}
   {{a c}{b e}{d}} {{a c}{b}{d e}} {{a c}{b d e}} {{a d}{b}{c}{e}}
   {{a d}{b c}{e}} {{a d}{b e}{c}} {{a d}{b}{c e}} {{a d}{b c e}}
   {{a e}{b}{c}{d}} {{a e}{b c}{d}} {{a e}{b d}{c}} {{a e}{b}{c d}}
   {{a e}{b c d}} {{a}{b c}{d}{e}} {{a}{b c}{d e}} {{a}{b d}{c}{e}}
   {{a}{b d}{c e}} {{a}{b e}{c}{d}} {{a}{b e}{c d}} {{a}{b}{c d}{e}}
   {{a}{b}{c e}{d}} {{a}{b}{c}{d e}} {{a b c}{d}{e}} {{a b c}{d e}}
   {{a b d}{c}{e}} {{a b d}{c e}} {{a b e}{c}{d}} {{a b e}{c d}}
   {{a c d}{b}{e}} {{a c d}{b e}} {{a c e}{b}{d}} {{a c e}{b d}}
   {{a d e}{b}{c}} {{a d e}{b c}} {{a}{b c d}{e}} {{a}{b c e}{d}}
   {{a}{b d e}{c}} {{a}{b}{c d e}} {{a b c d}{e}} {{a b c e}{d}}
   {{a b d e}{c}} {{a c d e}{b}} {{a}{b c d e}} {{a b c d e}}

Well, there are 52 of these (remember, I said that the number was
_at_least_ 2^N).  So, if you're going to stop at 20, which 20 are
you going to pick.  No matter which ones you pick, I can give
explicit assertions (remember _your_ scheme permits these) that
describe one that's not on your list of 20 - so you have to
have a consistent way of giving this new version a name.  As I
said, you have an exponential naming problem (at least).

J. Giles

jlg@lanl.gov (Jim Giles) (11/09/90)

From article <7AZ6O15@xds13.ferranti.com>, by peter@ficc.ferranti.com (Peter da Silva):
> [...]
> Let me guess... you have an IBM-PC at home? Well, *THAT* is the machine
> responsible for the ANSI restrictions on pointers *in portable code*.
> 
> I rest my case (good thing, it was getting pretty heavy).

I know the IBM-PC is the cause of the ANSI restriction.  That was
the gist of my complaint against ANSI.  They let a few bad IBM-PC
implementations dictate a rule which makes it impossible to rely on
pointers to do memory management implementation.  There is no reason
for the rule except to include these laze compilers into the class
of "standard conforming" C compilers.  Even on a PC, it's possible
to implement pointers so that they can be compared no matter where
they point (it just means that pointer compares are a little
slower).  This should have been required by the standard.

By the way, Peter da Silva also points out that memory managers are
machine dependent anyway (so it's alright for them to be in assembly
I guess is his conclusion).  This is true to a very small extent.
I've seen memory managers which work well on large numbers of
different types of hardware but are written in high-level languages.
The machine dependence usually takes the form of a few parameters
which specify whether memory is byte/word addressible, how big
memory is, and a few other simple things.  If the C standard had
required pointer comparisons to _always_ be reliable, a portable
memory manager of this kind could easily be written in C for nearly
every machine I've ever seen.  As I say, this is one of the few
things that I think pointers really _are_ suited to.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/09/90)

In article <5344@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> Well, let's try it shall we?  I have a program with 5 parameters.
> Let's call the parameters 'a' through 'e'.  The possible partitions
> of these 6 are below:

Hmmm. Did you start with 6, decide it was too much, and edit it down to
5? Gee. Back in April, when I addressed the same issue, I went up to 6.
See below.

> Well, there are 52 of these

At least you got that right. (I wrote them out by hand. Took a few
minutes. You?)

As a matter of fact, most of my responses to this article are contained
in that article back from April, which you never responded to. How about
giving it a try? It's quoted below. Pay attention to the last paragraph.

> No matter which ones you pick, I can give
> explicit assertions (remember _your_ scheme permits these) that
> describe one that's not on your list of 20 - so you have to
> have a consistent way of giving this new version a name.

Sorry, my answer to this bit is contained in the article starting the
f2c-cc-f77 thread. Note that this can be applied to optimizers for the
current C language, which cannot express such assertions; so you're
making a huge assumption here. I really am talking about cc, not qc or
nemesis95.

Okay, I'll be nice and let you off from doing your homework. What I said
about calling the function:

> When the compiler is compiling a call to the function: It now uses the
> partitions as requirements, with the same meaning for aliasing as above.
> If it thinks all the x's might be aliased, it has to generate code that
> calls the simplest version. The crucial point is that in a correct
> translation of a correct Fortran program, it knows that none of the
> arguments are aliased, so it can choose code that calls the
> all-elements-separated version.

You will also note that I required the partitions to be chosen by a
fixed method not depending on any other outside information, so the
compiler knows what choices are available. If there are several possible
partitions, it'll choose the strongest ones (expressing the strongest
anti-aliasing requirements), then choose any way it wants from those.

---Dan

From cmcl2!stealth.acf.nyu.edu!brnstnd Thu Apr 12 19:40:55 CST 1990
Article 5307 of comp.lang.misc:
Path: cmcl2!stealth.acf.nyu.edu!brnstnd
>From: brnstnd@stealth.acf.nyu.edu
Newsgroups: comp.lang.misc
Subject: The number of aliasing patterns for N arrays
Message-ID: <548:Apr1221:38:3790@stealth.acf.nyu.edu>
Date: 12 Apr 90 21:38:37 GMT
References: <20080@megaron.cs.arizona.edu> <14332@lambda.UUCP>
Reply-To: brnstnd@stealth.acf.nyu.edu (Dan Bernstein)
Distribution: usa
Organization: IR
Lines: 22
X-Original-Subject: Re: Pointers as 3-tuples

In article <14332@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> From article <20080@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> > Actually, it increases exponentially with the number of aliasing
> > patterns in the arguments of the functions.  For a function of 3
> > array arguments, the maximum number of versions needed is 2^3 = 8.
> No, the number increases combinatorially.  Aliasing occurs pairwise
> between arguments (an/or global data).  So, the number of potentially
> aliased pairs is N choose 2, ie. a binomial coefficient.
  [ implying that the number of versions is 2^(N choose 2), not 2^N ]

Hmmm, this doesn't sound right. After all, if A and B could be aliased,
and B and C could be aliased, then A and C could be aliased. (This is
why the natural keyword is ``alias,'' not ``noalias.'')

I count 2 for 2, 5 for 3, 15 for 4, 52 for 5, 218 for 6. Go figure.

Anyway, this is all irrelevant, because as Jim loves to remind us,
aliasing is rarely useful. (It can't be---otherwise Fortran would allow
it.) Therefore you only need two versions: the amazingly fast one for no
aliasing, and the slow one for the rare cases. Right, Jim?

---Dan

peter@ficc.ferranti.com (Peter da Silva) (11/09/90)

In article <5347@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> I know the IBM-PC is the cause of the ANSI restriction.

And means that the ANSI restriction is pretty much irrelevent: if they'd
left it out, then your PC compilers would have kept doing whatever it is
they do, they would just be "sub ANSI". As it is you can assume that pointer
comparison is safe without restrictions so long as you're not on an 80x86.

> I've seen memory managers which work well on large numbers of
> different types of hardware but are written in high-level languages.

And they can continue to do so. The version of "malloc" in K&R1 is still
as portable as it was before ANSI stepped in.

> If the C standard had
> required pointer comparisons to _always_ be reliable, a portable
> memory manager of this kind could easily be written in C for nearly
> every machine I've ever seen.

Still can. One of the reasons I bought an Amiga instead of an IBM PC.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

mjs@hpfcso.HP.COM (Marc Sabatella) (11/10/90)

>> Because of the ANSI restrictions on pointers, it is not
>> safe to compare them in this way (in fact, neither of the C compilers
>> I have at home do this job with pointers).
>
>Let me guess... you have an IBM-PC at home? Well, *THAT* is the machine
>responsible for the ANSI restrictions on pointers *in portable code*.

I hope that's not the only reason!  I would be more concerned about people
doing things like trying to figure which variables are n bss and which are in
the data segment (in your favorite Unix linear address space) by merely
comparing their addresses and assuming the higher one was bss.  Or assuming
that two consecutively defined global variables X and Y might have the
interesting property that (char *)&y - (char *)&x = sizeof(x).  None of which
is true in any but the most simple minded linearly addressed architectures.

These assumptions were in fact made in two widely known "malloc" algorithms for
Unix, and they both break on any system that doesn't have the most basic
address space.  For instance, many Unix implementations in which "malloc"
resides in a shared library.

In any case, a portable memory manager can deal with doling out chunks of an
internal memory pool.  Additional pools can non-portably be obtained through
calls to sbrk() on Unix, but the memory manager is free to compare pointers
within each pool.  It cannot portably inquire whether one pool is "higher" or
"lower" in memory than another, but if the pools are indeed acquired via sbrk()
then you know darn well based on the acquisition order which is "higher".  The
only trouble is determining from which pool a pointer about to be free'd came
from, but that is only if you weren't clever to put that information in the
the header of the allocated block your pointer is pointing into.

jlg@lanl.gov (Jim Giles) (11/10/90)

From article <24149:Nov905:42:0990@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> As a matter of fact, most of my responses to this article are contained
> in that article back from April, which you never responded to. How about
> giving it a try? It's quoted below. Pay attention to the last paragraph.

Actually I _did_ respond to that article.  I didn't respond to the
_next_ one you sent on the same thread. (The one where you claimed
that there were linear sorting techniques.  There are indeed: radix
sorts - which I assume are what you are refering to.  Unfortunately,
these are linear in _both_ the number of elements to be sorted N
_and_ the number of radix B digits in the keys D.  Note that if the
keys are distinct, Nlog(N) is less than (or equal to) ND.  This is
because D is the log base B of the size of the keyspace and N - at
most - fills the keyspace.  That's why I didn't bother to answer
your next post - you didn't seem to realize that an Nlog(N) sort is
_faster_ than a 'linear' radix sort when the keyspace is sparcely
filled.)

> [...]
>> > Actually, it increases exponentially with the number of aliasing
>> > patterns in the arguments of the functions.  For a function of 3
>> > array arguments, the maximum number of versions needed is 2^3 = 8.
>> No, the number increases combinatorially.  Aliasing occurs pairwise
>> between arguments (an/or global data).  So, the number of potentially
>> aliased pairs is N choose 2, ie. a binomial coefficient.
>   [ implying that the number of versions is 2^(N choose 2), not 2^N ]

I will point out that, at the time, the subject of the conversation
was the computational complexity of testing passed parameters to
determine whether they are aliased (at least, that's what I was
discussing at that time - if you thought the discussion was something
else, you should have explicitly said so,instead of claiming my
math was wrong).  It can be seen that the number of combinations
to test is indeed N choose 2.  I pointed out that this might be
trimmed a little by doing a sort of the parameter addresses before
testing: hence the side-discussion about the speed of sort algorithms.

> [...]
> Hmmm, this doesn't sound right. After all, if A and B could be aliased,
> and B and C could be aliased, then A and C could be aliased. (This is
> why the natural keyword is ``alias,'' not ``noalias.'')
> 
> I count 2 for 2, 5 for 3, 15 for 4, 52 for 5, 218 for 6. Go figure.
> 
> Anyway, this is all irrelevant, because as Jim loves to remind us,
> aliasing is rarely useful. (It can't be---otherwise Fortran would allow
> it.) Therefore you only need two versions: the amazingly fast one for no
> aliasing, and the slow one for the rare cases. Right, Jim?

But, one of the things I've consistently insisted upon is _not_ to
have the slow one on those rare occasions, but to have a slightly
slower one which compromises on speed only for those variables which
are _actually_ going to be aliased.  This leaves either one of us
with an exponential problem (at least - I still don't have a closed
form solution - so I don't make promises about the size being _just_
exponential).

I solve the problem by doing a call chain analysis and failing to
load if the propagated aliasing constraints don't match.  This keeps
incorrect optimizations from being done _and_ provides feedback to
the user allowing him to correct the aliasing in whatever way he
considers appropriate (even by explicitly allowing it).  As far as I
can tell, Dan doesn't solve (or even address) the problem.

J. Giles

peter@ficc.ferranti.com (Peter da Silva) (11/13/90)

In article <8960023@hpfcso.HP.COM> mjs@hpfcso.HP.COM (Marc Sabatella) writes:
> The only trouble is determining from which pool a pointer about to be free'd
> came from, but that is only if you weren't clever to put that information in
> the the header of the allocated block your pointer is pointing into.

You're right. You *can* write a portable malloc using pointers. OK, Jim, looks
like your one argument against ANSI's restriction on pointers is null and
void, unless you can think of a case where such a tag isn't useful. The only
remaining reasons for knowing more about the innards of a pointer than ANSI
tells you is for stuff like garbage collectors which tend to be inherently
non-portable.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

barmar@think.com (Barry Margolin) (11/13/90)

In article <H1_6+7A@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>In article <5347@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> I know the IBM-PC is the cause of the ANSI restriction.
>
>And means that the ANSI restriction is pretty much irrelevent: if they'd
>left it out, then your PC compilers would have kept doing whatever it is
>they do, they would just be "sub ANSI". As it is you can assume that pointer
>comparison is safe without restrictions so long as you're not on an 80x86.

The 80x86 is not the only architecture with multi-level memory.  There have
been quite a few systems built with segmented or object-based memory
architectures.

I use a machine on which implementing pointer comparison between unrelated
arrays is even harder than on an 80x86.  On the 80x86 dealing with the
segment number is just a performance issue.  On my Lisp Machine, global and
malloc'ed objects are allocated as individual arrays in the Lisp heap
(pointers are a combination of an array and an index), and garbage
collection can change the relative order in memory of two heap objects, so
it wouldn't be correct to use the machine addresses when comparing
pointers.

There are ways consistent pointer comparison could be implemented in this
system, but they're as unattractive as the mechanism for the 80x86.  For
instance, the arrays could have an extra element at the beginning that
would contain its object number, a number that would be incremented for
each object allocated; pointer comparison would first compare the object
numbers, and only compare the offsets if the object numbers are equal
(unfortunately, this would mean that pointer comparison could cause page
faults).

Or the C runtime could allocate a big array at program startup time, and do
all its allocation out of that (growing it when necessary).  Symbolics
Fortran does this for all its static data.  Pointers would just be indices
into this one array, so comparison would just compare their values.
However, one of the prime benefits of allocating each object as a separate
array, though, is that the Lisp Machine's hardware array bounds checking
will catch invalid array accesses, and this would be lost if a single array
were used (actually, a variant using indirect arrays might work, but then
comparisons would cause indirection and possible page faults, as above).
And growing the C heap would be expensive, as it would have to be copied.

You keep on talking about implementing a portable memory allocator in C.  I
presume you're talking about a design where it allocates big chunks of
memory using the system's malloc(), and then doles out smaller pieces of
this; pointer comparison would be used by your free primitive to determine
which chunk the object came from.  This can be done portably by using back
pointers from the allocation points to the beginning of the chunk; this is
faster than the comparison method (the former is constant time, the latter
is O(Nchunks)), but it does require extra overhead for each allocated
object.  Other portable designs make similar tradeoffs.  This just goes to
show that memory management is best done using implementation-dependent
techniques, because efficient memory management requires consideration of
the memory architecture of the system.  Portable memory allocators are nice
toys, but don't expect them to be blazingly fast.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

dc@sci.UUCP (D. C. Sessions) (11/13/90)

In article <5347@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

>                                       Even on a PC, it's possible
>to implement pointers so that they can be compared no matter where
>they point (it just means that pointer compares are a little
>slower).  This should have been required by the standard.
 [...]
>J. Giles

Nice try, but unless you either (a) assume that the compiler is always 
emitting Ring 0 code, or (b) build in some very nonportable system calls
(and talk about SLOWER!), then there is no way to determine where the 
referents of two pointers are.  Both approaches require access to the 
segmentation mechanism (let alone the paging mechanism, which we'll ignore 
since your reference is to linear addresses).  Also, the results are 
volatile: even if you _do_ generate Ring 0 code, by the time the comparison 
is checked one of the segments may have moved.

Note that this argument applies to _any_ segmented memory architecture. 
The ANSI C standard could, of course, have jumped into the architecture
wars by specifying an underlying memory model.  (Why stop there?  How about
a language-defined byte order, too?)  If the standard _had_ explicitly 
excluded the 80x86 family, this whole discussion would be moot, since the 
standard would have been a dead letter.  C'est la guerre.
-- 
| The above opinions may not be original, but they are mine and mine alone. |
|            "While it may not be for you to complete the task,             |
|                 neither are you free to refrain from it."                 |
+-=-=-    (I wish this _was_ original!)        D. C. Sessions          -=-=-+

peter@ficc.ferranti.com (Peter da Silva) (11/14/90)

In article <1990Nov12.194544.14504@Think.COM> barmar@think.com (Barry Margolin) writes:
> The 80x86 is not the only architecture with multi-level memory.  There have
> been quite a few systems built with segmented or object-based memory
> architectures.

Well, yes, but the 80x86 is the only survivor with a large enough following
to make it worth while.

[re: the lisp machine]
> Or the C runtime could allocate a big array at program startup time, and do
> all its allocation out of that (growing it when necessary).

This is what the Burroughs C compiler had to do.

Perhaps (here's a heretical thought) it's not worth implementing a C compiler
on some machines.

> You keep on talking about implementing a portable memory allocator in C.

Well, Jim Giles does. I'm not sure why, but I see no great difficulty in
doing it of you're willing to make the effort to carry a tag around. Not as
fast as machine-dependent code, but isn't that always the case?

In any case...

Would it have been such a bad thing to standardise a method of determining
if two pointers pointed into the same object?

They did it for offsets in structures with OFFSET_OF.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/14/90)

In article <5N.6GH5@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> Would it have been such a bad thing to standardise a method of determining
> if two pointers pointed into the same object?

When I asked in comp.lang.c several months ago about whether there was
any portable way to see if p pointed within array a[n], the best answer
was to test p against a + i for each i. I remember a few people saying
that it wouldn't be hard to add a builtin to the language.

So what would a good syntax be? Q (as defined---I'm not sure how to add
this to q2c) has just in(p,a,n), and I haven't bothered to look for a
special syntax. Perhaps a ternary construct a <= p < a + n, just to
annoy the parser writers? p < q < r is true if p + i == q and q + j == r
for positive integers i and j where the additions are valid. Similarly
with <=.

> They did it for offsets in structures with OFFSET_OF.

Yep. It's certainly doable on Intel's cute li'l chips, as well as on
real computers.

---Dan