[comp.lang.misc] Pointers as 3-tuples

peter@ficc.uu.net (Peter da Silva) (04/08/90)

In article <14309@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) misses the point
and goes off on a tangent:
> Well, as has already been pointed out, the pA=A assignment is actually
> _illegal_ in C.

It's not illegal in C. It's an implicit type-cast on assignment.

To avoid a warning you must now use:

	pA = (t *)A;

> As for "known at compile time" - well, N and M aren't if they were
> passed as parameters.

Sure, because A was passed as a 3-tuple. N and M aren't known explicitly
unless declared, but N*M is known. And that's all you need to bound-check
this baby.

The point I'm making isn't that array indexing is or is not better than
pointers, but that for the purpose at hand neither array indexing or pointers
is better or worse at bounds checking.

Just think of *A as a shorthand for A[0].
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

jlg@lambda.UUCP (Jim Giles) (04/09/90)

From article <ZZR2:M4xds13@ficc.uu.net>, by peter@ficc.uu.net (Peter da Silva):
> [...]
> Just think of *A as a shorthand for A[0].

Very good!  Now we've got grounds for some sort of agreement.  If you will
review my preceeding articles on this subject, you will find that I have
said almost exactly this.  In fact, I will repeat my earlier statement:
except for aliasing and bounds checking, pointer arithmetic is semantically
IDENTICAL to one dimensional array indexing.  Under your proposed 3-tuple
idea, the distinction between pointers and arrays is further narrowed. In
fact, because of the way you've constructed it, the two concepts are now
nearly identical since the aliasing can only be within the _same_ bounded
object.  But this also can be done with arrays (ie. A[i] and A[j] for 
unknown i and j are assumed aliased).  So, since there is now _NO_
remaining difference between the two concepts, why have both in the
language?  Arrays are the more flexible (since they can be multidimensional).
Further, the array syntax tends to be more legible.

So, I will just think of A[0] as a legible (therefore preferable) substitute
for *A.

Note that the above argument only applies to the issue of pointer _arithmetic_.
For recursive data structures, either pointers or some other mechanism is
still required.  (Actually, even recursive data structures may be implemented
with arrays, but it is less elegant in the general case.)  I prefer a more
direct syntax for recursive data structures than to go through user-visable
pointers - but that's another issue.  The point of this discussion is
that pointers are not, and should not be treated, as a useful method of
implementing arrays.

J. Giles

peter@ficc.uu.net (Peter da Silva) (04/09/90)

> Under your proposed 3-tuple
> idea, the distinction between pointers and arrays is further narrowed.

Not at all. The *semantics* of pointers as addresses, and pointers as
3-tuples, are identical. A legal ANSI C program should operate identically
under either implementation. It's just a safety net... not a new concept.

[ And it's not my concept, either... much as I'd like to claim authorship,
  I didn't originate it. ]

> In
> fact, because of the way you've constructed it, the two concepts are now
> nearly identical since the aliasing can only be within the _same_ bounded
> object.

That was always true, it was just unchecked.

> For recursive data structures, either pointers or some other mechanism is
> still required.

And this is an important point. C can be implemented with a certain set of
minimal constructs. It's really a very low-level language. And at this level
pointers and arrays are the same... and the internals of recursive data
structures are visible. If you don't like that, use a higher-level language.
Treat C as a fancy assembler.

> (Actually, even recursive data structures may be implemented
> with arrays, but it is less elegant in the general case.)

Exactly. Pointers are more general than arrays.

> The point of this discussion is
> that pointers are not, and should not be treated, as a useful method of
> implementing arrays.

Not at all. Since pointers and arrays are isomorphic, that pointers are a
useful method of implementing arrays becomes a tautology.
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

jlg@lambda.UUCP (Jim Giles) (04/10/90)

From article <1BT2FU7ggpc2@ficc.uu.net>, by peter@ficc.uu.net (Peter da Silva):
> [...]
>> (Actually, even recursive data structures may be implemented
>> with arrays, but it is less elegant in the general case.)
> 
> Exactly. Pointers are more general than arrays.

Pointers are more general than arrays in that they are free from bounds
checking and allow arbitrary aliasing.  Arrays are more general than
pointers in that they can be multiply dimensioned.  Arrays also tend
to be more legible and easier to optimize.  Neither should be used
to perform tasks better suited to the other.

> [...]
>> The point of this discussion is
>> that pointers are not, and should not be treated, as a useful method of
>> implementing arrays.
> 
> Not at all. Since pointers and arrays are isomorphic, that pointers are a
> useful method of implementing arrays becomes a tautology.

Pointers are only isomorphic to _one_dimensional_arrays_, and only if
their aliasing and freedom from bounds are both curtailed.  Having now
removed the only features that made pointers even remotely interesting
(as an array-like mechanism) I see no reason for them to remain.

J. Giles

diamond@tkou02.enet.dec.com (diamond@tkovoa) (04/10/90)

In article <1BT2FU7ggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:

>And this is an important point. C can be implemented with a certain set of
>minimal constructs. It's really a very low-level language. And at this level
>pointers and arrays are the same... and the internals of recursive data
>structures are visible. If you don't like that, use a higher-level language.
>Treat C as a fancy assembler.

Hear, hear!  Now, if only we could teach all these programmers not to abuse
this assembler, that they should do most of their programming in high-level
languages, and use this fancy assembler only for rare exceptional cases
such as memory allocators and device drivers.....

-- 
Norman Diamond, Nihon DEC     diamond@tkou02.enet.dec.com
This_blank_intentionally_left_underlined________________________________________

peter@ficc.uu.net (Peter da Silva) (04/10/90)

In article <14316@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> Pointers are more general than arrays in that they are free from bounds
> checking and allow arbitrary aliasing.

The first statement here is false, as I have demonstrated. By making a
pointer a 3-tuple (low, current, high) it is subject to the same bounds
checking. The second statement is irrelevant, because arrays can also be
arbitrarily aliased. Fortran sidesteps this problem by declaring that this
aliasing is illegal, but really doesn't provide a mechanism for ensuring
that no illegal aliasing is going on. Other languages make no such
guarantees.

> Arrays are more general than
> pointers in that they can be multiply dimensioned.

The C semantics of multiply dimensioned arrays as arrays of arrays
demonstrates the opposite.

> Pointers are only isomorphic to _one_dimensional_arrays_, and only if
> their aliasing and freedom from bounds are both curtailed.

The first statement is false, because C semantics treats pointers as
multiply dimensioned arrays (given a proper declaration), and the second
statement is only true for programs that are broken in any case... just
like Fortran programs where you do something like:

	INTEGER A(10)

	CALL PROC(A,A)
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

preston@titan.rice.edu (Preston Briggs) (04/11/90)

In article <S4U2APBggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>In article <14316@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>> Pointers are more general than arrays in that they are free from bounds
>> checking and allow arbitrary aliasing.
>
>The first statement here is false, as I have demonstrated.

Ah.  Then pointers are not more general than arrays :-)

> By making a
>pointer a 3-tuple (low, current, high) it is subject to the same bounds
>checking. 

And why not be practical?  Implementing pointers with triples!?
Going to be hard to keep in a register.  Going to be hard to cast
into a long int and back.

With enough work by the compiler, the penalty for array bounds checking
can be reduced to less than 10% on systems code (hearsay, from 
Peter Markstein) and much less on scientific code (Optimization of
Range Checking, Vicky Markstein, John Cocke, Peter Markstein).

Care to speculate on the cost of a 3-tuple scheme for pointers?
If not, can we dispose of them as a topic?

>The second statement is irrelevant, because arrays can also be
>arbitrarily aliased. Fortran sidesteps this problem by declaring that this
>aliasing is illegal, but really doesn't provide a mechanism for ensuring
>that no illegal aliasing is going on. 

Arrays in fortran can be aliased at procedure calls (illegally) and in 
equivalence statements (legally).  Equivalence statements are easy.
Calls are more difficult but can be handled correctly, even allowing for
separate compilation.  That is, it's possible to detect the aliases
at compile-time and either complain or handle them correctly (but
with less efficient code).

Aliasing due to pointers can be introduced at any assignment.
It's possible to detect aliasing, but the analysis is either
very imprecise or very expensive or both.  By imprecise, I mean
overly conservative -- pointers will appear to point to many places.

--
Preston Briggs				looking for the great leap forward
preston@titan.rice.edu

flee@shire.cs.psu.edu (Felix Lee) (04/11/90)

I just had a nasty thought.  If pointers were implemented as 3-tuples,
then the compiler could generate run-time checks for aliasing, and
then switch to either a fast- or slow-coded loop.  This is probably
doable to some extent without 3-tuple pointers.
--
Felix Lee	flee@shire.cs.psu.edu	*!psuvax1!flee

preston@titan.rice.edu (Preston Briggs) (04/11/90)

In article <Ekg#.d5@cs.psu.edu> flee@shire.cs.psu.edu (Felix Lee) writes:
>If pointers were implemented as 3-tuples,
>then the compiler could generate run-time checks for aliasing, and
>then switch to either a fast- or slow-coded loop.  

Jim Giles has suggested this in terms of arrays and fortran
(presumably extended to allow aliasing).

Ershov (in 1967) suggested compiling 2 versions of a procedure,
one assuming aliasing and the other assuming no aliasing,
and determining the appropriate one to call (at each call site)
via compile-time analysis.

Cooper and Kennedy (POPL, 1989) have shown how to do the necessary 
compile-time interprocedural alias analysis.  Cooper (in '83) showed 
a practical approach to interprocedural analysis in the presence
of separate compilation.

None of these ideas, including 3-tuples, are adequate for analyzing
the effects of pointers.  They might be adequate to analyze 
"pointers used as arrays" if that were the only use of pointers.
Unfortunately, pointers are very general and are used in many ways
(after all, that's why they're popular).  In particular, it's
very difficult to determine (at compile time) what memory
locations might be modified by assignments like

	*(root->left->right) = *(root->right)

--
Preston Briggs				looking for the great leap forward
preston@titan.rice.edu

jlg@lambda.UUCP (Jim Giles) (04/11/90)

From article <6531@brazos.Rice.edu>, by preston@titan.rice.edu (Preston Briggs):
> In article <Ekg#.d5@cs.psu.edu> flee@shire.cs.psu.edu (Felix Lee) writes:
>>If pointers were implemented as 3-tuples,
>>then the compiler could generate run-time checks for aliasing, and
>>then switch to either a fast- or slow-coded loop.  
> 
> Jim Giles has suggested this in terms of arrays and fortran
> (presumably extended to allow aliasing).

Actually, I recommended it as something _NOT_ to do.  The only way
to make it genuinely useful is to have special versions of the code
code each combination of aliased/non-aliased arguments.  Since this
increases combinatorially (by the definition of combinatorial :-)
with the number of _prtentially_ aliased items, this leads to an
unacceptably large number of alternatives.  You can't efficiently
implement this as a run-time test.  Hence (as I've said before),
the only solutions are to outlaw aliasing or to insist on sophisticated
implementations which do interprocedural dataflow analysis.

J. Giles

jlg@lambda.UUCP (Jim Giles) (04/11/90)

In article <S4U2APBggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
> [...]
>The second statement is irrelevant, because arrays can also be
>arbitrarily aliased. Fortran sidesteps this problem by declaring that this
>aliasing is illegal, but really doesn't provide a mechanism for ensuring
>that no illegal aliasing is going on. 

Neither the Fortran standard NOR the C standard provide mechanisms for
implementing or insuring ANYTHING.  The mechanisms are the responsibility
of the implementor.

As it happens, the implementaion of a test for illegal aliasing is
fairly simple.  (It consists of having the optimizer tag the data
flow graph every time it took advantage of the fact that aliasing
was illegal.  The entry sequence for the procedure would then check
all the flagged dependencies and issue an error if any of them _really_
were dependencies.  This entry sequence test would be as fast as the
3-tuple test you proposed as a method in C for determining which version
of a loop to take.)  The fact that few (if any) implementations have
an option to turn on run-time alias detection is not because it is
hard to do (it isn't) but because there has been no demand for such
a feature.  The error of aliasing things illegally rarely occurs.

The ability to detect aliasing at compile-time is hampered by the
need to do interprocedural analysis.  This the compiler can't do
in the presence of separate compilation.  If, on the other hand,
the loader did code generation, it could detect the aliasing and
generate code accordingly.  If this technology were widely available
and acceptably cheap, I suspect that the Fortran committee would
remove the constraint against aliasing in procedure calls.

C generates slow code by default in order to cater to a rarely needed
ability to alias arguments through the procedure call.  Fortran 'sidesteps'
the problem by always generating better code.  The test for aliasing
(which I agree should be available) is possible, but few compilers
have it because few users run into the problem.

J. Giles

peter@ficc.uu.net (Peter da Silva) (04/11/90)

In article <6529@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
> Ah.  Then pointers are not more general than arrays :-)

Not if you can allocate new arrays at will, and reference them somehow in
a consistent manner.

> > By making a
> >pointer a 3-tuple (low, current, high) it is subject to the same bounds
> >checking. 

> And why not be practical?  Implementing pointers with triples!?
> Going to be hard to keep in a register.  Going to be hard to cast
> into a long int and back.

Doctor! Doctor! It hurts when I do this!

So don't do this. Casting pointers to long ints and back is inherently
dangerous. There is no guarantee that you can put a pointer in a
register (you can't on an 80{,2}86 in large model), nor that you can
put it in a long int (you can't on most word-addressed machines), so
don't do it. The world is not a VAX.

> With enough work by the compiler, the penalty for array bounds checking
> can be reduced to less than 10% on systems code (hearsay, from 
> Peter Markstein) and much less on scientific code (Optimization of
> Range Checking, Vicky Markstein, John Cocke, Peter Markstein).

> Care to speculate on the cost of a 3-tuple scheme for pointers?

About the same as the costs of bounds-checking on arrays passed to
functions. About the same as the costs of bounds-checking on locally
declared and allocated arrays. The only place it's going to cost more
is when you use pointers for things arrays just can't be used for, such
as lists. If you're doing scientific stuff, 3-tuple pointers and arrays
are isomorphic.

And if your code is worth a damn you're checking those pointers anyway.

	[ if(p == NULL) return FAILURE; ]

Doesn't it make sense to let the compiler do this check where it can
optimise it more effectively?
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

njk@diku.dk (Niels J|rgen Kruse) (04/11/90)

peter@ficc.uu.net (Peter da Silva) writes:
->Using the syntax #A as the address of A (since &a now refers to a 3-tuple
->(#a, #a, #a)...
->>    t A [N] [M];    /* 2d array of type t */
->A looks like a tuple (#A[0][0], #A[0][0], #A[N][0]) of type (t (*)[M]).

Agreed.

->>    t *pA;
->>    ...
->>    pA = A;
->pA is the same tuple, but now looks like (#(t*)A[0], #(t*)A[0], #(t*)A[M*N])
->and has the type (t *).
------------------------------------------------------------------------^^^^^
Should be N, otherwise agreed.

->> Now, what bounds are associated with pA under your scheme?  N clearly
->> doesn't make sense.  But should the bound be M or N*M?  Whichever you
->> pick, I'll (sooner or later) want it to be the other.
->It's N*M. If you want it to be M, then you would have said:
->       pA = A[0];
->Since that's a tuple (#A[0][0], #A[0][0], #A[0][M]) with type (t[]).

I disagree.  It is my impression (i may be wrong), that the bounds
of a pointer to a subobject derive from the enclosing object,
not from the subobject itself.

To test your model of bounds-checking in C, what are the bounds of
        foo->buf
after
        struct v { int bar; char buf[1]; } *foo;
        foo = malloc (sizeof (struct v) + 2);
?
(ignoring possible allocation failure)

Is foo->buf[2] within bounds according to your model?

For your convenience, i quote the relevant passage of a summary
of X3J11's first meeting in the interpretations phase, which
Doug Gwyn was so considerate to post to comp.std.c.
        (various disclaimers in all caps (not official etc.) deleted)

        + There are no technical problems with using malloc()ed objects
        I.e. my dirent implementation is portable: bounds checking is NOT
        allowed in cases like the common struct record_header {...buffer[1];
        /*more than 1 actually allocated*/} technique where the right size
        is allocated for the buffer in the object via malloc() (as per
        Karl Heuer's argument).

Considering the disclaimers, you might choose to ignore this.

->--
-> _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
->/      \  'U`
->\_.--._/
->      v

BTW, why to you have a map of Australia in your .sig? (Just curious).
-- 
Niels J|rgen Kruse 	DIKU Graduate 	njk@diku.dk

jlg@lambda.UUCP (Jim Giles) (04/11/90)

From article <S4U2APBggpc2@ficc.uu.net>, by peter@ficc.uu.net (Peter da Silva):
> In article <14316@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>> Pointers are more general than arrays in that they are free from bounds
>> checking and allow arbitrary aliasing.
> 
> The first statement here is false, as I have demonstrated.  [...]

You have 'demonstrated' no such thing.  You have proposed an implementation
which would _make_ my first statement false.  You have asserted (perhaps
correctly) that the new C standard actually supports your interpretation.
However, you have _NOT_ demonstrated that pointers which obey bounds are
more general that arrays (they aren't).  You have _NOT_ demonstrated that
pointers  _should_ obey bounds.

In fact, what you are recommending is a constraint on pointers which
removes one of the _only_ features that I thought pointers were useful
for.  I already had a pretty low opinion about the value of pointers
and you are busy destroying what respect I had left.  If it is your
intent to make pointers seem _less_ useful than they might be, you
are succeeding.

> [...]
>> Arrays are more general than
>> pointers in that they can be multiply dimensioned.
> 
> The C semantics of multiply dimensioned arrays as arrays of arrays
> demonstrates the opposite.

How does a misinterpretation of the semantic properties of multi-
dimensional arrays demonstrate anything but that C has not been
well designed for array manipulation?  Multidimensional arrays
are NOT arrays-of-arrays.  The subscripts of a multidimensional
array are independent and carry _equal_ weight.  There are no
'master' dimensions to which the others are 'slaves'.  The user of
multidimensional arrays should seldom have to know whether arrays
are stored row- or column-wise (he should _care_ about this difference
even less often).  C forces such irrelevant distinctions into every
context where multidimensional arrays are used: as args to procedures,
the user must provide for the array indexing calculation itself
by manual means.

> [...]
>> Pointers are only isomorphic to _one_dimensional_arrays_, and only if
>> their aliasing and freedom from bounds are both curtailed.
> 
> The first statement is false, because C semantics treats pointers as
> multiply dimensioned arrays (given a proper declaration), [...]

Oh!  Gee, I was unaware of that.  Please enlighten me as to the proper
declaration for the following case:

main(){
       int A[10][20];
       ...
       sub1(A);
       ...
}
       sub1(pa)
       ...      /* somewhere here there should be a declaration
                   which allows me to treat pa as a multiply dimensioned
                   array.  I hope you can show it to me.  Everyone
                   else claims that I have to declare a macro to do the
                   subscript calculation (or, shudder, do the calculation
                   by hand).  */
{...}

Actually, of course, C has no way of doing this.  The procedure can't
find out the dimensions of the argument.  And even if I were to pass
the values 10 and 20 as separate arguments, the C declaration rules
would not permit me to declare 'pa' as "int pa[bnd1][bnd2]".

No, 'pa' is isomorphic to a one-dimensional array with bound 10*20==200.
You can move 'pa' around linearly by adding and subtracting to it, but
you can _never_ double subscript it.

> [...]                                               and the second
> statement is only true for programs that are broken in any case... just
> like Fortran programs where you do something like:
> 
> 	INTEGER A(10)
> 
> 	CALL PROC(A,A)

Not necessarily broken.  Only if PROC modifies one or more of its
arguments.  Many C programs which do this sort of thing also turn
out to be buggy.  That's why you don't lose much by making it
illegal.

J. Giles

gudeman@cs.arizona.edu (David Gudeman) (04/11/90)

[about the 3-tuple method for bounds checking...]
>...You have _NOT_ demonstrated that pointers  _should_ obey bounds...
>
>In fact, what you are recommending is a constraint on pointers which
>removes one of the _only_ features that I thought pointers were useful
>for.  I already had a pretty low opinion about the value of pointers
>and you are busy destroying what respect I had left.  If it is your
>intent to make pointers seem _less_ useful than they might be, you
>are succeeding.

OK, I'm confused.  I thought one of your objections to pointers was
that they can point to undefined values, and that there was no way to
do run-time error checking for this as is done for array indexes.  I
showed a way to do it, and now you say that I've removed the only
advantage they have.  How can you criticize pointers for having a
feature and also say that that feature is their only advantage?

The only thing I've done is provide a way to ensure that the undefined
operations are detected, causing a meaningful error message rather
than the endearing "bus error: core dumped" or just peculiar
(undefined) behavior.  My proposal puts no restrictions on the
well-defined pointer operations at all.

As to whether pointers _should_ obey bounds, it would be hard to
justify any other interpretation.  Notice that I am _not_ restricting
what objects pointers can point to, only ensuring that they point to
real objects.  Any other interpretation would be either non-portable
or force compilers on some machines to produce extremely poor code.

Are you suggesting that it would be an _advantage_ if pointers could
be used like this?

int foo()
{ int *p, i,j,k,l,m,n;
  for (p = &i; p <= &n; p++) do_something(p);
}

This sort of thing can be useful I suppose, but it is not defined in
the C language, and all I'm doing is checking that it doesn't happen.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

gudeman@cs.arizona.edu (David Gudeman) (04/11/90)

In article  <14324@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
[about compiling different versions of procedures for different
aliasing patterns...]
>Actually, I recommended it as something _NOT_ to do.  The only way
>to make it genuinely useful is to have special versions of the code
>code each combination of aliased/non-aliased arguments.  Since this
>increases combinatorially (by the definition of combinatorial :-)
>with the number of _prtentially_ aliased items, this leads to an
>unacceptably large number of alternatives.

Actually, it increases exponentially with the number of aliasing
patterns in the arguments of the functions.  For a function of 3
array arguments, the maximum number of versions needed is 2^3 = 8.
Just how many array arguments do your functions have?  And I wouldn't
even consider trying to compile a seperate function for _every_
possibility.  Just 2 or so, maybe a worst case and an average case.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

gudeman@cs.arizona.edu (David Gudeman) (04/11/90)

In article  <6529@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
>And why not be practical?  Implementing pointers with triples!?
>Going to be hard to keep in a register.  Going to be hard to cast
>into a long int and back.

Obviously the triple method is only intended to be used for code
developement, the finished product would be compiled without bounds
checks of any kind.  The casting is going to cause some trouble, but
then anyone doing that kind of thing is on his own.  (Actually, I
think I could handle that case also, but I'm not sure it would be
worth the effort.)

>With enough work by the compiler, the penalty for array bounds checking
>can be reduced to less than 10% on systems code.
>
>Care to speculate on the cost of a 3-tuple scheme for pointers?
>If not, can we dispose of them as a topic?

Dispose of them as a topic because they are inapropriate for one
particular application (finished products)?  Hardly sounds reasonable.
I usually don't keep the "asserts" in my finished products either,
does that make them useless?

>Aliasing due to pointers can be introduced at any assignment.
>It's possible to detect aliasing, but the analysis is either
>very imprecise or very expensive or both.  By imprecise, I mean
>overly conservative -- pointers will appear to point to many places.

True enough for general pointer use.  However, the statement made by
Jim Giles -- that array indexes can be optimized more than pointer
arithmetic used for the same purpose -- is incorrect.  When a pointer
is being used with pointer arithmetic in such a way that it could be
easily replaced by array indexing, then the structure of the
operations is such that it can be optimized just like the indexing
could (in most cases).
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

brnstnd@stealth.acf.nyu.edu (04/11/90)

In article <14309@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
  [ about the dimensions of an array ]
> As for "known at compile time" - well, N and M aren't if they were
> passed as parameters.  Oh, but that's illegal in C too.  I've never
> heard a rational reason for it to be illegal, but that's how things
> go. 

Efficiency and simplicity, as is obvious from the machine language. I
and many others agree that variable dimensions are useful enough to
offset the angst they cause the compiler. That's why some available
compilers, and probably future standards, support them.

In the meantime you'll just have to settle for manual indexing.

---Dan

brnstnd@stealth.acf.nyu.edu (04/11/90)

In article <14324@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
  [ basically, it doesn't work to compile once for aliasing and once ]
  [ for non-aliasing, because you have to do this for each potentially ]
  [ aliased parameter, and there's a combinatorial explosion, and it ]
  [ can't be done efficiently, blah blah blah ]

Jim, you're being silly. As you delight in pointing out, most parameters
ever passed in real programs aren't aliased to each other. Hence it's
enough to compile one version where no parameters are aliased and one
version where all parameters are aliased if at all possible. (If it's
likely that some parameters may be aliased, the compiler can introduce
an intermediate level where just those parameters are aliased, provided
that the programmer or a global analysis points them out.) Say ``oops.''

---Dan

brnstnd@stealth.acf.nyu.edu (04/11/90)

In article <14327@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> From article <S4U2APBggpc2@ficc.uu.net>, by peter@ficc.uu.net (Peter da Silva):
  [ Peter once again reminds Jim that pointers can be implemented to ]
  [ reliably detect what ANSI C calls illegal pointer operations ]
> In fact, what you are recommending is a constraint on pointers which
> removes one of the _only_ features that I thought pointers were useful
> for.

Say what? Jim, are you saying that you thought pointers were only useful
for operations that ANSI C calls illegal? Once again, the ``constraint''
embodied by, say, pointer 3-tuples changes absolutely none of the ANSI C
pointer semantics---because that constraint *is* the semantics. I must
admit I've never found any use for going outside those semantics, except
in device drivers.

> Multidimensional arrays
> are NOT arrays-of-arrays.  The subscripts of a multidimensional
> array are independent and carry _equal_ weight.

Both assembly code and mathematics say otherwise: at some level of
specification (the opposite of abstraction), a multidimensional array
really is an array of arrays. I agree, the user often doesn't think of
it that way; but it's still the truth.

> > 	INTEGER A(10)
> > 	CALL PROC(A,A)
> Not necessarily broken.  Only if PROC modifies one or more of its
> arguments.

Note that Fortran has no way to express that concept explicitly, while C
can declare const int *A. Hmmm, shall we start flogging this horse too?

---Dan

jlg@lambda.UUCP (Jim Giles) (04/11/90)

From article <20079@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> [...]
> OK, I'm confused.  I thought one of your objections to pointers was
> that they can point to undefined values, and that there was no way to
> do run-time error checking for this as is done for array indexes.

I never said so.  My objection to pointers is (and always has been) the
use of them when arrays are more appropriate.  Pointers _should_ be used
for things that arrays are _not_ suited to.  One instance of this is to
implement dynamic memory.  Here, it is the _purpose_ of the code to
return a reference to undefined values (undefined until allocation is
complete anyway).

Arrays are already bounded.  Why have two features with the same
functionality?

> [...]
> As to whether pointers _should_ obey bounds, it would be hard to
> justify any other interpretation.  [...]
> [...]          Any other interpretation would be either non-portable
> or force compilers on some machines to produce extremely poor code.

On any machine (even segmented ones) the memory model presented to
the user code is a contiguous memory address space.  It is within 
this linear address space that pointers should be free to roam.
(On virtual memory systems, the system may actually map the user's
memory allocation to discontiguous hardware memory locations.  But,
this is all invisable to the user code - or _should_ be.)

> [...]
> Are you suggesting that it would be an _advantage_ if pointers could
> be used like this?
> int foo()
> { int *p, i,j,k,l,m,n;
>   for (p = &i; p <= &n; p++) do_something(p);
> }
No.  I would not recommend such a thing.  Nor do I suggest using pointers
as a poor cousin of an array.  I want pointers to behave as addresses
within the linear address space presented to the user by the system.

J. Giles

jlg@lambda.UUCP (Jim Giles) (04/11/90)

From article <20080@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> [...]
> Actually, it increases exponentially with the number of aliasing
> patterns in the arguments of the functions.  For a function of 3
> array arguments, the maximum number of versions needed is 2^3 = 8.

No, the number increases combinatorially.  Aliasing occurs pairwise
between arguments (an/or global data).  So, the number of potentially
aliased pairs is N choose 2, ie. a binomial coefficient.  Here, N is the
number of variables in the pool of potentially aliased objects.

> Just how many array arguments do your functions have?  And I wouldn't
> even consider trying to compile a seperate function for _every_
> possibility.  Just 2 or so, maybe a worst case and an average case.

Actually, the procedures which are required to be fast may have lots
of parameters (like the RKF differential equation solver).  You give
me the choice of alias free performance or very _worst_ case.  I'll
take the first and skip the second by not allowing _any_ aliasing
(at least until the tools required for interprocedural analysis
are cheap and widely available).

J. Giles

jlg@lambda.UUCP (Jim Giles) (04/11/90)

From article <296:Apr1105:06:4490@stealth.acf.nyu.edu>, by brnstnd@stealth.acf.nyu.edu:
> In article <14327@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> [...]
>> Multidimensional arrays
>> are NOT arrays-of-arrays.  The subscripts of a multidimensional
>> array are independent and carry _equal_ weight.
> Both assembly code and mathematics say otherwise: at some level of
> specification (the opposite of abstraction), a multidimensional array
> really is an array of arrays. I agree, the user often doesn't think of
> it that way; but it's still the truth.

Who cares about what assembly code does?  It is the purpose of a high
level language is to present an environment that is relatively free
from such concerns.

Mathematically, what you say is not true.  The ordering of subscripts
is an arbitrary decision by the mathematician who lays out the calculation.
Mathematically, such operations as TRANSPOSE are regarded as 'feebies'.
The level of specification you are refering to is _below_ that at which
mathematics works.  Certainly such concerns as row- vs. column-wise
storage are (rightly) regarded as irrelevant to the user.  The only
time such concerns become important is when efficiency forces them
into the limelight. Even in this case, an implementation which
_efficiently_ handles such details is preferable to one which
requires user intervention.

> [...]
>> > 	INTEGER A(10)
>> > 	CALL PROC(A,A)
>> Not necessarily broken.  Only if PROC modifies one or more of its
>> arguments.
> Note that Fortran has no way to express that concept explicitly, while C
> can declare const int *A. Hmmm, shall we start flogging this horse too?

Why not?  I _support_ the addition of an explicit way of declaring which
procedure arguments are changed and which are not.  I am NOT and Fortran
fanatic.  I don't maintain that it is perfect or even nearly so.  In this
case, I would rather deal with Fortran's shortcomings than with C's though.

J. Giles

peter@ficc.uu.net (Peter da Silva) (04/11/90)

In article <14327@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> However, you have _NOT_ demonstrated that pointers which obey bounds are
> more general that arrays (they aren't).

Fine. How do you implement dynamic storage with arrays?

> In fact, what you are recommending is a constraint on pointers which
> removes one of the _only_ features that I thought pointers were useful
> for.

It would help if you would explain what this feature is, rather than
just telling me I've recommended a constraint on pointers (I've done
no such thing) that removes it (whatever it is).

> main(){
>        int A[10][20];
>        ...
>        sub1(A);
>        ...
> }
>        sub1(pa)
>        ...      /* somewhere here there should be a declaration
>                    which allows me to treat pa as a multiply dimensioned
>                    array.  I hope you can show it to me.  Everyone
>                    else claims that I have to declare a macro to do the
>                    subscript calculation (or, shudder, do the calculation
>                    by hand).  */

	int (*pa)[10][20];

		(*pa)[4][5];

> No, 'pa' is isomorphic to a one-dimensional array with bound 10*20==200.
> You can move 'pa' around linearly by adding and subtracting to it, but
> you can _never_ double subscript it.

		(*pa)[4][5];

Nope. Never.
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

peter@ficc.uu.net (Peter da Silva) (04/11/90)

In article <1990Apr10.215213.18124@diku.dk> njk@diku.dk (Niels J|rgen Kruse) writes:
> I disagree.  It is my impression (i may be wrong), that the bounds
> of a pointer to a subobject derive from the enclosing object,
> not from the subobject itself.

I think that is probably a good idea 

> To test your model of bounds-checking in C, what are the bounds of
>         foo->buf
> after
>         struct v { int bar; char buf[1]; } *foo;
>         foo = malloc (sizeof (struct v) + 2);
> ?
> (ignoring possible allocation failure)

> Is foo->buf[2] within bounds according to your model?

My immediate reaction would be to say "no", but since X3J11 has seen fit
to allow this special case I think it would be OK for the compiler to
treat structures and arrays differently here, because the use of
substructures to refer to the whole of an extended structure has been
widespread. For another example:

	typedef struct link {
		struct link *next;
		struct link *prev;
	};

	struct foo {
		link foo_header;
		...
	};

	addlist(struct link *list, struct link *element);
	struct link *gethead(struct link *list);

	struct foo *bar;

	addlist(list, bar);

On the other hand, you *do* need to expand bounds going the other way:

	bar = gethead(list);

> Considering the disclaimers, you might choose to ignore this.

It's a tough call. I'd have been happy to go back and fix my code that
used the foo[1] convention if they'd gone the other way, I think that
just special-casing this convention would be acceptable.

> BTW, why to you have a map of Australia in your .sig? (Just curious).

I'm Australian.
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

mjs@hpfcso.HP.COM (Marc Sabatella) (04/11/90)

>In fact, what you are recommending is a constraint on pointers which
>removes one of the _only_ features that I thought pointers were useful
>for.  I already had a pretty low opinion about the value of pointers
>and you are busy destroying what respect I had left.  If it is your
>intent to make pointers seem _less_ useful than they might be, you
>are succeeding.

I must come from a different background than you do.  Where I come from,
pointers are used for things like linked lists, binary trees, etc.
Arrays can be used for this only if you know a priori the ultimate size of
the data structure in question, or if your language allows dynamically
reallocatable arrays.

brnstnd@stealth.acf.nyu.edu (04/12/90)

In article <14333@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> From article <296:Apr1105:06:4490@stealth.acf.nyu.edu>, by brnstnd@stealth.acf.nyu.edu:
> > In article <14327@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
     [ the following overgeneralization: ]
> >> Multidimensional arrays
> >> are NOT arrays-of-arrays.  The subscripts of a multidimensional
> >> array are independent and carry _equal_ weight.
> > Both assembly code and mathematics say otherwise: at some level of
> > specification (the opposite of abstraction), a multidimensional array
> > really is an array of arrays. I agree, the user often doesn't think of
> > it that way; but it's still the truth.
> Who cares about what assembly code does?  It is the purpose of a high
> level language is to present an environment that is relatively free
> from such concerns.

Relatively but not completely. Any program will show a speed hit if you
bounce all around memory rather than going through it sequentially; as
the user can notice this and (in C but not in Fortran!) write code to
prevent it, your overgeneralization is inaccurate.

> Mathematically, what you say is not true.  The ordering of subscripts
> is an arbitrary decision by the mathematician who lays out the calculation.

Not at a certain level. The ordered pair (a,b) is usually defined as the
set {a,{a,b}}, which is not symmetric---though of course you knew that.
Then again, the isomorphism between ((a,b),M(a,b)) and (a,(b,M(a,b)))
makes it obvious that a matrix M of n dimensions is a one-dimensional
array consisting of matrices M_i of n - 1 dimensions. In other words,
a multidimensional array is an array of arrays.

Yeah, that's it, lay on the heavy terminology and he'll crumple. :-)

> The only
> time such concerns become important
  [ but your overgeneralization implies they're nonexistent! ]
> is when efficiency forces them
> into the limelight.

Golly, I never worry about uh-fish-in-sea. Duh, Pee-tuh, do you ever
worry about uh-fish-in-sea? 1/2 :-)

> >> > 	INTEGER A(10)
> >> > 	CALL PROC(A,A)
> >> Not necessarily broken.  Only if PROC modifies one or more of its
> >> arguments.
> > Note that Fortran has no way to express that concept explicitly, while C
> > can declare const int *A. Hmmm, shall we start flogging this horse too?
> Why not?  I _support_ the addition of an explicit way of declaring which
> procedure arguments are changed and which are not.

Aha! I _support_ the addition of an explicit way of using parameters as
array dimensions (and in fact even more generally---but that's a start).
Let's make a pact: You shut up about C's failure to provide parametrized
dimensions and I'll shut up about Fortran's failure to detect errors in
changing what are meant to be constant values. Agreed, at least for this
discussion?

---Dan

gudeman@cs.arizona.edu (David Gudeman) (04/12/90)

In article  <14332@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
]From article <20080@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
]> [...]
]> Actually, it increases exponentially with the number of aliasing
]> patterns in the arguments of the functions.  For a function of 3
]> array arguments, the maximum number of versions needed is 2^3 = 8.
]
]No, the number increases combinatorially.  Aliasing occurs pairwise
]between arguments (an/or global data).  So, the number of potentially
]aliased pairs is N choose 2, ie. a binomial coefficient.  Here, N is the
]number of variables in the pool of potentially aliased objects.

Oops.  I was simplifying to the assumption that each argument was
either potentially aliased or it was not.  I suppose that the
combinatorial analysis would allow better code generation in some
cases, but no one want to compile more than two versions anyway.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

gudeman@cs.arizona.edu (David Gudeman) (04/12/90)

(given the number of articles I'm writting, I must have a lot of spare
time these days...)
In article  <14330@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>Here, it is the _purpose_ of the code to
>return a reference to undefined values (undefined until allocation is
>complete anyway).

There is a difference between an undefined _operation_ on a pointer
and pointing to an undefined _value_.  I'm sorry Jim, but it should be
obvious that the 3-tuple scheme does not prevent malloc() from
working, and it really looks like you are resorting to desperate
rhetorical ploys in an attempt to support your position.

>> Are you suggesting that it would be an _advantage_ if pointers could
>> be used like this?
>> int foo()
>> { int *p, i,j,k,l,m,n;
>>   for (p = &i; p <= &n; p++) do_something(p);
>> }
>No.  I would not recommend such a thing.  Nor do I suggest using pointers
>as a poor cousin of an array.  I want pointers to behave as addresses
>within the linear address space presented to the user by the system.

And the 3-tuple system does exactly that.  It only prevents the user
from doing things that have no defined meaning.  As to "poor cousins",
I wish you would stop pretending that there is something objective
about your personal distaste for the way pointers are used.  There are
a lot of people who think that

  for (p = A; p < A + A_SIZE; p++) foo(*p);

is more elegant than

  for (i = 0; i < A_SIZE; i++) foo(A[i]);

it is just a matter of taste.  You sound like an art critic who thinks
that his personal tastes are universal constants, and that others who
don't like the same things are "wrong".
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

jlg@lambda.UUCP (Jim Giles) (04/12/90)

From article <20145@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> [...]                                I'm sorry Jim, but it should be
> obvious that the 3-tuple scheme does not prevent malloc() from
> working, and it really looks like you are resorting to desperate
> rhetorical ploys in an attempt to support your position.

No.  And I'll test your claim right here.  Using C, with you 3-tuple
idea, write the following routines: brk(), sbrk(), malloc(), and
free().  You won't succeed.  This is because _at_least_one_ of these
routines has to have direct explicit access to the internal representations
of your 3-tuples (something has to be responsible for setting them).
Since I maintain that the best use of explicitly visable pointers is
in the implementation of routines like those mentioned above, and your
3-tuple prevents me from doing so with C's pointers, I claim that you
have removed one of the only interesting properties of pointers.

> [...]                                                     There are
> a lot of people who think that
>   for (p = A; p < A + A_SIZE; p++) foo(*p);
> is more elegant than
>   for (i = 0; i < A_SIZE; i++) foo(A[i]);

And I guess it's just a matter of taste that I consider:

      for (i = 1; i < N-1; i++ )
         for (j = 1; j < M-1; j++ )
            A[i][j] = 0;

is better than:

      for (p = A; ???; ???)
        /* Oh what the hell.  I don't feel like working it out.
           Why don't _you_ tell me how to complete this code to
           zero the interior of a two-dimensional array.  I can't
           be bothered to work out something that the _compiler_
           should be responsible for (not at this time of night
           anyway).                                               */

What ever you come up with will either _not_ be as clear as the first
above, or it will use some macro (whose definition won't be clear).
Either way, I think my preference is _more_ than just personal taste.
(And, if it _is_ just personal taste, I think that those who share it
will outnumber those who don't - and very probably those who share it
will out-program those who don't.)

J. Giles

jlg@lambda.UUCP (Jim Giles) (04/12/90)

From article <8960015@hpfcso.HP.COM>, by mjs@hpfcso.HP.COM (Marc Sabatella):
> [...]
> I must come from a different background than you do.  Where I come from,
> pointers are used for things like linked lists, binary trees, etc.

Where I come from, all those things are called recursive data structures.
These are _sometimes_ implemented using user-visable pointers (but only
if the language doesn't provide them directly).  If the language provides
recursive structures directly, even then, the implementation _may_ use
pointers internally (or it may not).  Pointers are related to recursive
data structures only in that pointers are _one_ of the possible implementation
strategies for them.

> Arrays can be used for this only if you know a priori the ultimate size of
> the data structure in question, or if your language allows dynamically
> reallocatable arrays.

The second choice is something I wholeheartedly support.

J. Giles

jlg@lambda.UUCP (Jim Giles) (04/12/90)

From article <7=U2P58xds13@ficc.uu.net>, by peter@ficc.uu.net (Peter da Silva):
> In article <14327@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>> However, you have _NOT_ demonstrated that pointers which obey bounds are
>> more general that arrays (they aren't).
> 
> Fine. How do you implement dynamic storage with arrays?

That's EXACTLY MY POINT!!!  Pointers are for dynamic memory implementation
(but, not necessarily visable to the user of the allocation mechanism).
At least, that's the one application that I'm really certain that pointers
should be used for.  I can't think of many others.

Pointers which obey bounds are worthless as dynamic memory implementors.
The dynamic memory allocator is what _sets_ the bounds - be pretty silly
to force it to obey them.

> [...]
>> main(){
>>        int A[10][20];
>>        ...
>>        sub1(A);
>>        ...
>> }
>>        sub1(pa)
>>        ...      /* somewhere here there should be a declaration
>>                    which allows me to treat pa as a multiply dimensioned
>>                    array.  I hope you can show it to me.  Everyone
>>                    else claims that I have to declare a macro to do the
>>                    subscript calculation (or, shudder, do the calculation
>>                    by hand).  */
> 
> 	int (*pa)[10][20];
> 
> 		(*pa)[4][5];

Nope, Nope, Nope, ....  how does the guy who wrote 'sub1' know that the
array that will be passed has bounds 10 and 20?  Answer: he doesn't.
In fact, the routine might be called many times - with a differently
bounded array each time.  Keep looking!  You may find the answer yet!






J. Giles

mph@lion.inmos.co.uk (Mike Harrison) (04/12/90)

In article <14341@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>From article <8960015@hpfcso.HP.COM>, by mjs@hpfcso.HP.COM (Marc Sabatella):
>> [...]
>> Arrays can be used for this only if you know a priori the ultimate size of
>> the data structure in question, or if your language allows dynamically
>> reallocatable arrays.
>
>The second choice is something I wholeheartedly support.


Sounds like another vote for Algol 68 :-)


Mike,




Michael P. Harrison - Software Group - Inmos Ltd. UK.
-----------------------------------------------------------
UK : mph@inmos.co.uk             with STANDARD_DISCLAIMERS;
US : mph@inmos.com               use  STANDARD_DISCLAIMERS;

peter@ficc.uu.net (Peter da Silva) (04/12/90)

> Pointers which obey bounds are worthless as dynamic memory implementors.
> The dynamic memory allocator is what _sets_ the bounds - be pretty silly
> to force it to obey them.

Since you can't implement a dynamic memory allocator portably (of course
not: it has to know what pointers look like and how to request blocks
from the O/S) who cares?

But suppose you want to implement a pool of fixed-sized memory between
malloc and the user program (say, you want to allocate 27-byte and
192-byte chunks all the time, and rarely other sizes)... with arrays?

> Nope, Nope, Nope, ....  how does the guy who wrote 'sub1' know that the
> array that will be passed has bounds 10 and 20?  Answer: he doesn't.

But that's not what you asked.

> In fact, the routine might be called many times - with a differently
> bounded array each time.  Keep looking!  You may find the answer yet!

Why? So this particular thing is easier in Fortran than in C (or, I might
add, in a bunch of other languages)... so what? I'd rather put up with
a minor inconvenience (and it is minor... the macro to do dynamic sizing
is TRIVIAL) than have to put up with all the things you just *can't* do
in Fortran.

Then again, I could work in some less-common language and find it quite
impossible to even *get* compilers for most of the machines I use, let
alone *compatible* ones. Fortran and C are about the only widely available
languages with reasonably complete portable subsets.
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

brnstnd@stealth.acf.nyu.edu (04/13/90)

In article <14330@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> I want pointers to behave as addresses
> within the linear address space presented to the user by the system.

Ugh. I want pointers to behave as addresses within particular objects,
like ANSI says they do. I don't care about the arrangement of those
objects within memory; why do you? (Leave device drivers aside for this
discussion; they're neither portable nor meant to be portable.)

---Dan

machaffi@fred.cs.washington.edu (Scott MacHaffie) (04/13/90)

In article <20083@megaron.cs.arizona.edu> gudeman@cs.arizona.edu (David Gudeman) writes:
>Obviously the triple method is only intended to be used for code
>developement, the finished product would be compiled without bounds
>checks of any kind.

So, you suggest that people ship code which they haven't tested?  If people
are testing code with bounds checking turned on and sell it with bounds
checking turned off, they're going to have problems.


		Scott MacHaffie

gudeman@cs.arizona.edu (David Gudeman) (04/13/90)

In article  <14340@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>...  Using C, with you 3-tuple
>idea, write the following routines: brk(), sbrk(), malloc(), and
>free().  You won't succeed.  This is because _at_least_one_ of these
>routines has to have direct explicit access to the internal representations
>of your 3-tuples (something has to be responsible for setting them).

The problem has not changed at all.  I wouldn't know how to write
those functions portably with "normal" pointers either unless (1)
there is a system function of some sort to tell me where to allocate
from (usually brk() or sbrk()) or (2) I declare my own array to
allocate from.  Either way, the code is identical for "normal"
pointers as for bounds-checked pointers.

Any code that has well-defined behavior with normal pointers has
exactly the same well-defined behavior with bounds-checked pointers.
If the semantics has changed at all, it is only in the sense that some
behavior that was previously undefined is now defined to produce an
error result.

>And I guess it's just a matter of taste that I consider:
>
>      for (i = 1; i < N-1; i++ )
>         for (j = 1; j < M-1; j++ )
>            A[i][j] = 0;
>
>is better than:
>
>      for (p = A; ???; ???)
>        /* Oh what the hell.  I don't feel like working it out.

Why do these multi-dimensional arrays keep popping up?  I don't think
anyone has said that pointers make a good indexing method for
multi-dimensioned arrays.  OK, I agree, pointers are not usually a
very good way to traverse multi-dimensional arrays.  That observation,
however has nothing to do with the larger question of whether it is
useful to have pointers (and pointer arithmetic) in a language.

Incidentally, you chose one of the few examples where I wouldn't mind
using pointers to traverse a multi-dimensional array:

  for (p = (t *)A; p < A + M * N; p++) *p = 0;

Of course, you should expect to loose a few points for "obscure code" on
your grade...
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

ske@pkmab.se (Kristoffer Eriksson) (04/13/90)

In article <14327@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>You have 'demonstrated' no such thing.  You have proposed an implementation
>which would _make_ my first statement false.  You have asserted (perhaps
>correctly) that the new C standard actually supports your interpretation.
>However, you have _NOT_ demonstrated that pointers which obey bounds are
>more general that arrays (they aren't).

Pointers can still be assigned to, and thus point to or into different
arrays or structures at different times. Arrays can't. Just because you
have bounds in the pointers, doesn't mean that a particular pointer
variable has to contain the same bounds at every time.

> You have _NOT_ demonstrated that pointers  _should_ obey bounds.

Of course they "should". At least the C standard explicitely leaves
room for bounds checking. You are not required not to check bounds.
Thus, you may choose freely whether to do it or not. If you like
bounds checking in general, naturally you want it on pointers too.

>In fact, what you are recommending is a constraint on pointers which
>removes one of the _only_ features that I thought pointers were useful
>for.

I don't know what you are thinking of here. If it is possible for pointer
variables to point at different objects at different times, and there is
a mechanism for obtaining pointers with smaller bounds from pointers with
larger bounds, for instance some kind of casting, I can't imagine what
more you could posible desire.

>  I already had a pretty low opinion about the value of pointers
>and you are busy destroying what respect I had left.  If it is your
>intent to make pointers seem _less_ useful than they might be, you
>are succeeding.

It seems to me that here, you are fooling yourself somehow. Either you
had unrealistic views of how pointer could be used, to start with (but
I haven't seen any other evidence for that), or you are somehow
misinterpreting the effects of putting bounds into the pointers.

>main(){
>       int A[10][20];
>       ...
>       sub1(A);
>       ...
>}
>       sub1(pa)

>Actually, of course, C has no way of doing this.  The procedure can't
>find out the dimensions of the argument.  And even if I were to pass
>the values 10 and 20 as separate arguments, the C declaration rules
>would not permit me to declare 'pa' as "int pa[bnd1][bnd2]".

Doesn't GNU C allow that kind of variable declarations? Ok, they're not
allowed by the C standard, but there's no inherent difficulty in adding
them.

>No, 'pa' is isomorphic to a one-dimensional array with bound 10*20==200.
>You can move 'pa' around linearly by adding and subtracting to it, but
>you can _never_ double subscript it.

As has been pointed out, it's never been a problem to double subscript
function arguments, if you declare them like arrays with constant sizes,
and expanding the language to allow variable sizes wouldn't change the
pointer concept of C in the least, thus this particular point is not
inherent to pointers. Maybe we should try not to confuse the general
usefulness of pointers and what particular features for using pointers
are included in the C-standard at one particular moment in time.

Also, when you for instance say that C should be changed to include
real arrays rather than bounds-checking pointers, don't you rather
mean _non-aliased_ and constant references (whether arrays or pointers
acting as arrays) ? Just because aliased arrays are disallowed in Fortran,
does that make non-aliasing an inherent feature of arrays? Isn't
this, again, confusing arrays with rules that apply to arrays in
one (or several) particular language?

Why don't you simply say that you'd like C to include non-aliased
arrays, to be able to optimize the particular kind of applications that
you are involved in, and leave it at that, rather than saying that
pointers are bad and C is bad and Fortran or anything else is better.
Can't we just agree that C and Fortran have radically different
historical backgrounds?

I'd say that on many points you are technically right in what you say,
except, sometimes, about C details, but your _perspective_ is what
upsets most people. The views about what is important in a particular
language, what is useful, what should be required by every language
regardless of background, and so on.

-- 
Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden
Phone: +46 19-13 03 60  !  e-mail: ske@pkmab.se
Fax:   +46 19-11 51 03  !  or ...!{uunet,mcvax}!sunic.sunet.se!kullmar!pkmab!ske

seanf@sco.COM (Sean Fagan) (04/13/90)

In article <14340@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>No.  And I'll test your claim right here.  Using C, with you 3-tuple
>idea, write the following routines: brk(), sbrk(), malloc(), and
>free().  You won't succeed.  This is because _at_least_one_ of these
>routines has to have direct explicit access to the internal representations
>of your 3-tuples (something has to be responsible for setting them).

Even in current implementations, such things either a) are written in
assembly (i.e., brk is a system call in SysV), or b) call some other routine
which is written in assembly (e.g., malloc calls sbrk calls brk, which is
written in assembly).  Tell me, is all of the FORTRAN stuff you're so fond
of going to be written in FORTRAN?  Somehow, I doubt it.

>your
>3-tuple prevents me from doing so with C's pointers,

You *can't* write brk in C only, as far as I know!  You can get awfully
close, but it's still kinda difficult.

-- 
-----------------+
Sean Eric Fagan  | "It's a pity the universe doesn't use [a] segmented 
seanf@sco.COM    |  architecture with a protected mode."
uunet!sco!seanf  |         -- Rich Cook, _Wizard's Bane_
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

new@udel.EDU (Darren New) (04/13/90)

In article <20197@megaron.cs.arizona.edu> gudeman@cs.arizona.edu (David Gudeman) writes:
>The problem has not changed at all.  I wouldn't know how to write
>those functions portably with "normal" pointers either unless (1)
>there is a system function of some sort to tell me where to allocate
>from (usually brk() or sbrk()) or (2) I declare my own array to
>allocate from.  Either way, the code is identical for "normal"
>pointers as for bounds-checked pointers.

Except that you lose the bounds checking in the array case.  I.e., if
you malloc(50) and then access element 52, as long as it is within your
static array you won't get any violation.  Also the case with sbrk()
uless you can somehow set the bounds.

>>      for (i = 1; i < N-1; i++ )
>>         for (j = 1; j < M-1; j++ )
>>            A[i][j] = 0;
>  for (p = (t *)A; p < A + M * N; p++) *p = 0;

These two are not identical.  Note that the array-case does not clear
the "outside edge" (0 or N) of the array (assuming that is what was meant).
              -- Darren

gudeman@cs.arizona.edu (David Gudeman) (04/14/90)

In article  <11423@june.cs.washington.edu> machaffi@fred.cs.washington.edu (Scott MacHaffie) writes:
>In article <20083@megaron.cs.arizona.edu> gudeman@cs.arizona.edu (David Gudeman) writes:
>>Obviously the triple method is only intended to be used for code
>>developement, the finished product would be compiled without bounds
>>checks of any kind.
>
>So, you suggest that people ship code which they haven't tested?

Please explain how you derived your conclusion from my statement.  I
see no relationship.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

gudeman@cs.arizona.edu (David Gudeman) (04/14/90)

In article  <16777@snow-white.udel.EDU> new@udel.EDU (Darren New) writes:
>In article <20197@megaron.cs.arizona.edu> gudeman@cs.arizona.edu (David Gudeman) writes:
>>[implementing malloc() with 3-tuple pointers]
>
>Except that you lose the bounds checking in the array case.  I.e., if
>you malloc(50) and then access element 52, as long as it is within your
>static array you won't get any violation.  Also the case with sbrk()
>uless you can somehow set the bounds.

Lets try to gain some perspective here.  The whole 3-tuple thing
started because Jim Giles was stomping on C's pointer arithmetic and one
of his alleged criticisms was that pointers that access arrays could
not be bounds checked, and were therefore inherently less "safe" than
indexes.  The 3-tuple technique was proposed to show that this
argument is wrong.

Now, since that has been disposed of, Jim has retreated to the
position that 3-tuple pointers are not as general as regular pointers.
How this relates to the use of pointers-instead-of-indexes is beyond
me, since the bounds checking was only intended to answer that single
criticism.  Even so however, the assertion was wrong, and I pointed it
out.

Now someone points out that a malloc() implemented in a C with bounds
checked pointers will not return bounds checked pointers (except
possibly that pointers will be bounded to the whole region).  What
does this have to do with either (1) pointers-instead-of-indexes or
(2) whether the semantics of bounds-checked pointers is compatible
with the semantics of unchecked pointers?  What can I say except (1)
try to implement malloc() with arrays and indexes to get
bounds-checked pointers (or indexes) and (2) the malloc() implemented
with 3-tuple pointers has the same semantics as the malloc()
implemented with unchecked pointers.

If you want malloc() to return bounds-checked pointers, you obviously
can't implement it entirely in C, whether or not the C pointers are
bounds checked.

>>>      for (i = 1; i < N-1; i++ )
>>>         for (j = 1; j < M-1; j++ )
>>>            A[i][j] = 0;
>>  for (p = (t *)A; p < A + M * N; p++) *p = 0;
>
>These two are not identical.  Note that the array-case does not clear
>the "outside edge" (0 or N) of the array (assuming that is what was meant).

I didn't notice the array bounds.  So I wouldn't use a pointer in this
case, I would use array indexes.  As I've said before, there are some
jobs that are better done with arrays and some jobs that are better
done with pointers.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

barmar@think.com (Barry Margolin) (04/14/90)

In article <14330@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>On any machine (even segmented ones) the memory model presented to
>the user code is a contiguous memory address space.

Not on the machine I use, a Symbolics Lisp Machine.  The memory model
presented to user code is object-oriented.  It's possible to get actual
numeric memory addresses for objects, but these are almost useless outside
low-level implementation routines as objects may move around in memory as a
result of garbage collection.

In fact, this whole 3-tuple thing is not just an academic discussion; it is
actually used in the Symbolics C implementation.  A C pointer is
implemented (in software) as a pair <lisp-array, offset>, and lisp-array
objects are implemented (by hardware and microcode) as tuples <type, base,
dimensions, other-stuff>.  The array reference instructions perform bounds
checking in parallel with the memory reference so there's very little
overhead.  They *could* implement bounds checking of pointer arithmetic,
but they haven't; it will detect most attempts to indirect through pointers
that exceed their bounds (the one case where it doesn't is with pointers to
automatic data -- all the automatic variables in an activation record are
stored in one lisp-array, so it will only detect attempts to reference
outside the stack frame to which the pointer points, and thus the yucky

	int *p, a, b, c;
	for (p=&a; p++; p<=&c) ;

will probably "work").

--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

brnstnd@stealth.acf.nyu.edu (04/15/90)

In article <1990Apr10.215213.18124@diku.dk> njk@diku.dk (Niels J|rgen Kruse) writes:
> > To test your model of bounds-checking in C, what are the bounds of
> >         foo->buf
> > after
> >         struct v { int bar; char buf[1]; } *foo;
> >         foo = malloc (sizeof (struct v) + 2);

malloc() returns a char pointer bounded between the bottom and the top
of the allocated memory. foo then gets a struct v pointer with the same
bounds. buf is an end-of-structure array (hooray for this special case)
and hence is (when used as a pointer) a char pointer with lower bound
sizeof(int) + (char *) foo and upper bound the end of foo.

This works perfectly. ANSI's pointer semantics could be defined solely
in terms of (pointer,lower,upper).

---Dan

brnstnd@stealth.acf.nyu.edu (04/15/90)

In article <14340@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> Using C, with you 3-tuple
> idea, write the following routines: brk(), sbrk(), malloc(), and
> free().

malloc() and free() are provided as library routines, so under a 3-tuple
implementation of pointers they're defined to use the correct bounds.
(Of course, 3-tuples are invisible to compliant programs.) They don't
need to be implemented in portable C, as long as they act correctly.

But I understand what you're aiming at, and I'll answer on that level.
At least one of brk() and sbrk() must be provided by the OS for malloc()
to get any memory in the first place. Now let's assume malloc(n) has set
up its data structures, found a character array, and made a char pointer
x that points to the n requested characters.

The bounds of x are the bounds of the memory array malloc() took from
sbrk(); the problem is to change its bounds into x lower, x + n upper.
C doesn't provide any explicit casts to do the job; is that your point?
In a more powerful C-like language, malloc() would finish off with

  return (*(char [n] at x)) x;

``char [n]'' is the type for character arrays of size n; ``at x'' is the
array type qualifier for arrays starting at x; and so malloc() returns a
pointer to a character array of size n---i.e., the 3-tuple (x,x,x+n).

> >   for (p = A; p < A + A_SIZE; p++) foo(*p);
> > is more elegant than
> >   for (i = 0; i < A_SIZE; i++) foo(A[i]);
> And I guess it's just a matter of taste that I consider:
>       for (i = 1; i < N-1; i++ )
>          for (j = 1; j < M-1; j++ )
>             A[i][j] = 0;
> is better than:
>       for (p = A; ???; ???)
  [ proceeds to complain that he doesn't understand what to do ]

Come on, Jim. foo A[N][M]; foo (*p)[M]; foo *q;

	for (p = A; p < A + N; p++)
	  for (q = &(*p)[0]; q < &(*p)[0] + M; q++)
	    *q = 0;

I'm assuming array bounds of (0,0) through (N - 1,M - 1); you seem to
have taken (1,1) through (N - 2,M - 2). Whatever.

> What ever you come up with will either _not_ be as clear as the first
> above, or it will use some macro (whose definition won't be clear).

Care to repeat that, now that you've seen how simple the solution is?
Ada types probably wouldn't mind using macro FIRST(p) for (&((*p)[0]));
that definition is perfectly clear, and

	for (p = A; p < A + N; p++)
	  for (q = FIRST(p); q < FIRST(p) + M; q++)
	    *q = 0;

is very readable. It also provides an advantage that you still haven't
admitted: namely, that p and q can be bound-checked the moment they're
assigned values.

---Dan