[comp.lang.misc] C's sins of commission

nevin@igloo.scum.com (Nevin Liber) (10/04/90)

[I added comp.lang.misc to the newsgroup list; please follow-up to the
appropriate newsgroup only.]

In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

>It is my contention that future languages
>shouldn't have pointers at all.  Not just no C-like pointers, none at
>all.  I just picked on C as the most unpleasant example of what I'm
>against.

I really hate to agree with you Jim :-), but I'm beginning to think
that you are right.  The only real argument I can see _for_ having
pointers is efficiency; more specifically, to help in
hand-optimisation.  Extensions to C such as C++ are showing that
pointers aren't needed nearly as much as they used to be; heck, code
seems to be more readable w/o them.  In languages such as Icon and
LISP I find that I don't even miss them.
-- 
	NEVIN ":-)" LIBER
	nevin@igloo.Scum.com  or  ..!gargoyle!igloo!nevin
	(708) 831-FLYS
California, here I come!	Public Service Announcement: Say NO to Rugs!

nevin@igloo.scum.com (Nevin Liber) (10/04/90)

[I added comp.lang.misc to the list of newsgroups; please follow-up to
the appropriate newsgroup ONLY.]

In article <5088@uqcspe.cs.uq.oz.au> brendan@batserver.cs.uq.oz.au writes:

>The perceived need for
>side-effects in terms is merely a by-product of the poor state of language
>design, and would not be missed at all in better languages.

I disagree.  This would throw out all functions which maintain their
own state (e.g., i/o).  Heck, you might as well ask why we even have
variables.  Even the LISP community gave in to this as a helpful
programming technique.
-- 
	NEVIN ":-)" LIBER
	nevin@igloo.Scum.com  or  ..!gargoyle!igloo!nevin
	(708) 831-FLYS
California, here I come!	Public Service Announcement: Say NO to Rugs!

sommar@enea.se (Erland Sommarskog) (10/08/90)

Nevin Liber (nevin@igloo.UUCP) writes:
)Jim Giles (jlg@lanl.gov) writes:
))It is my contention that future languages
))shouldn't have pointers at all.  Not just no C-like pointers, none at
))all.  I just picked on C as the most unpleasant example of what I'm
))against.
)
)I really hate to agree with you Jim :-), but I'm beginning to think
)that you are right.  The only real argument I can see _for_ having
)pointers is efficiency; more specifically, to help in
)hand-optimisation.  Extensions to C such as C++ are showing that
)pointers aren't needed nearly as much as they used to be;

I must be missing some context here, else this doesn't make sense.
Of course pointers are a necessary thing. Then it's an issue whether
you make them explicit like in C, Ada or Pascal, or hide them a
little and call them references like in Eiffel.
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
"Nelly Nilsson nöjer sig numera näppeligen med nio nötter till natten"

jejones@mcrware.UUCP (James Jones) (10/08/90)

In article <2171@enea.se> sommar@enea.se (Erland Sommarskog) writes:
>I must be missing some context here, else this doesn't make sense.
>Of course pointers are a necessary thing. Then it's an issue whether
>you make them explicit like in C, Ada or Pascal, or hide them a
>little and call them references like in Eiffel.

I'm missing context, too, and of course I can't know what the original
posters were referring to, but I'd rather like the ability to define
recursive data structures (I think Hoare wrote a paper about this sort
of thing) rather than using pointers.  I tend to agree--pointers are to
data structure what gotos are to control flow.

Speaking of other sins, though, I also hope that future language
designers will bring a "human factors"/"cognitive engineering" person
along for the ride.  IMHO, C would have been VERY different had this
happened when it was created. 

	James Jones

lowry@arnor.uucp (10/08/90)

In article <2171@enea.se>, sommar@enea.se (Erland Sommarskog) writes:
|> Nevin Liber (nevin@igloo.UUCP) writes:
|> )Jim Giles (jlg@lanl.gov) writes:
|> ))It is my contention that future languages
|> ))shouldn't have pointers at all.  Not just no C-like pointers, none at
|> ))all.  I just picked on C as the most unpleasant example of what I'm
|> ))against.
|> )
|> )I really hate to agree with you Jim :-), but I'm beginning to think
|> )that you are right.  The only real argument I can see _for_ having
|> )pointers is efficiency; more specifically, to help in
|> )hand-optimisation.  Extensions to C such as C++ are showing that
|> )pointers aren't needed nearly as much as they used to be;
|> 
|> I must be missing some context here, else this doesn't make sense.
|> Of course pointers are a necessary thing. Then it's an issue whether
|> you make them explicit like in C, Ada or Pascal, or hide them a
|> little and call them references like in Eiffel.
Or hide them COMPLETELY, like in Hermes (a language for distributed
computing that has been designed and implemented in prototype form by
my group).  The Hermes programmer has NO notion of pointer in any of
its various guises.  There is NO way for the programmer to specify
aliasing or shared data of any kind.  We provide high-level data types
called "tables" that subsume most of the uses that pointers are
traditionally put to (linked data structures, character strings and
other arrays).  Hermes uses output ports (connections to message queues
called input ports) as capabilities, so passing these around subsumes
the need for passing around function pointers.  

I personally have written a great deal of code in Hermes and have
found the lack of pointers to be no hindrance.  In fact, the level of
compile-time checking provided in Hermes far surpasses that found in
any other language that I have used, and I have found this to be a
very big win.  One of the reasons we can do so much at compile time is
the lack of pointers.

Of course, our compiler and run-time system make heavy use of pointers
and shared data for reasons of efficiency, but this is completely
hidden from the Hermes programmer.

So now it's just a question of whether "not having pointers" and
"hiding them completely from the programmer" are two different things.
For all practical purposes, I'd say they are not.  Hence I would
disagree with your assessment (which you stated as if it were
incontestable) that pointers are necessary.
-- 
Andy Lowry, lowry@ibm.com, (914) 784-7925
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY  10598

skrenta@cbmvax.commodore.com (Rich Skrenta) (10/09/90)

In article <2171@enea.se>, sommar@enea.se (Erland Sommarskog) writes:
> 
> Of course pointers are a necessary thing. Then it's an issue whether
> you make them explicit like in C, Ada or Pascal, or hide them a
> little and call them references like in Eiffel.

Suppose I eliminate pointers from my language and instead provide the
programmer with high-level data structures such as lists and arrays.
Won't he just use integer indexes as "pointers" into these data structures
if he needs them?  Isn't this what typically happens in Pascal programs,
where pointers are not as versatile as they are in C?  Are the programs
any more readable because I say

	int i; a[i]

to dereference instead of

	struct foo *p; *p ?

In article <1990Oct8.135551.21639@arnor.uucp> lowry@lusitania.watson.ibm.com (Andy Lowry) writes:

>The Hermes programmer has NO notion of pointer in any of
>its various guises.  There is NO way for the programmer to specify
>aliasing or shared data of any kind.  We provide high-level data types
>called "tables" that subsume most of the uses that pointers are
>traditionally put to (linked data structures, character strings and
>other arrays).

Sure, if Hermes gives me my tree, linked list and array, I probably won't
need pointers as much as I do in C--the data structures have already been
implemented for me.

However, if I want to make something that isn't directly supported by the
language, I'll have to use my integer array index "pointers" to do it.
Can Hermes stop me from doing that?

Rich
-- 
Rich Skrenta
skrenta@cbmvax.commodore.com

chl@cs.man.ac.uk (Charles Lindsey) (10/09/90)

In <2883@igloo.scum.com> nevin@igloo.scum.com (Nevin Liber) writes:

>I really hate to agree with you Jim :-), but I'm beginning to think
>that you are right.  The only real argument I can see _for_ having
>pointers is efficiency; ....

I disagree. Other posters have suggested that recursive data structures will
do all that is needed (sorry, I do not know how to make this newsreader
followup to two postings simultaneously ;-) ).

The situation when pointers are the only solution is when the real world
situation that you are modelling is best represented by a graph. Then you
encounter the situation where you may need to know whether two nodes are
equivalent (in the recursive sense that their connectees are equivalent), or
whether they are identically the same node. The difference is important if you
need to modify one of those nodes (do you expect the other to get changed as
well).

Here is an example.

LET twice: BE lambda(x: ) x + x;

That is an expression in some language, and one would expect to represent 
'x + x' by some tree.

twice (1 OR 2)

Here OR is a nondeterministic choice operator (some people write it as a
square box, but my terminal won't oblige :-( ). It means I want twice(1) or
twice(2).

Now unfold the call, and we get

(1 OR 2) + (1 OR 2)

which one would also expect to represent by some tree.

Now we have two nondeterministic choices, so let us choose 1 for the first and
2 for the second, and the result of the addition is 3. This is neither
twice(1) nor twice(2).

Now we really ought to have represented these expressions by graphs.

	| + |
	\   /
	 \ /
	  |
	  x

which unfolds to

	| + |
	\   /
	 \ /
	  |
	1 OR 2

Now with pointers, this is easily implemented. But with recursive data types
it is impossible (at least in any straightforward way - can anyone tell me how
to do this in a functional language).

I can do it in Smalltalk, where the pointers certainly exist, and so near to
the surface that you can hardly fail to notice them. In particular, Smalltalk
provides two equality operators: '=' to compare values and '==' to compare
pointers.

larus@primost.cs.wisc.edu (James Larus) (10/09/90)

In article <1990Oct8.135551.21639@arnor.uucp>, lowry@arnor.uucp writes:
  ... Long discussion of the absence of pointers in the Hermes language...
|> 
|> Of course, our compiler and run-time system make heavy use of pointers
|> and shared data for reasons of efficiency, but this is completely
|> hidden from the Hermes programmer.
|> 

Does this mean that Hermes programmers aren't concerned with efficiency?

/Jim

chased@rbbb.Eng.Sun.COM (David Chase) (10/10/90)

skrenta@cbmvax.commodore.com (Rich Skrenta) writes:

>Suppose I eliminate pointers from my language and instead provide the
>programmer with high-level data structures such as lists and arrays.
>Won't he just use integer indexes as "pointers" into these data structures
>if he needs them?  Isn't this what typically happens in Pascal programs,
>where pointers are not as versatile as they are in C?  Are the programs
>any more readable ...

You're right, mostly, but you're missing one thing.  Pointers (as
implemented in C and C++) are generated by the programmer, either by
taking the address of something, or by allocating new memory.  In C,
C++, Pascal, Modula-2, BCPL, and PL/1, the programmer is responsible
for ensuring that whatever the pointer points to is not reused while
the pointer is in use.  Getting this wrong leads to messy bugs that
are not (necessarily) repeatable from machine to machine, or from
malloc version to malloc version.  

If integers in arrays are used instead of pointers, then the bug is at
least confined to an object of the same type (more or less --
sufficiently clever programming will produce more clever bugs), and
the bug is repeatable from machine to machine, and malloc version to
malloc version.  Hidden in this is the unfortunate fact that the
programmer may need to write his own allocator within the array.  In
practice, this has not been that big a problem for me, since these
little allocators deal in a fixed arena with uniform objects (unlike
malloc).  In practice, expandable arrays often suffice, and allow me
to make strong assertions about what bugs cannot exist in my code.

David

jlg@lanl.gov (Jim Giles) (10/10/90)

From article <14972@cbmvax.commodore.com>, by skrenta@cbmvax.commodore.com (Rich Skrenta):
> [...]                                                  Are the programs
> any more readable because I say
> 	int i; a[i]
> to dereference instead of
> 	struct foo *p; *p ?

Yes!  Because a[i] and b[j] are guaranteed _not_ to be aliased, whereas
*p and *q might be or, then again, might not be.  Further, the syntactic
form (arrays in this example) will give some clue as to how the variables
will be used.  The pointer syntax _may_ be used to simulate arrays, but
you might be planning to use it for dynamic memory, strings, recursive
data structures, run-time equivalencing, etc.  How does the reader know
that the pointer will not be used in any of those ways?  He knows the
array won't be.  Each of these features should have separate syntax since
they are separate features.  Forcing them all to masquerade as pointers
only confuses the person maintaining the code - and doesn't give the
compiler enough information to adequately optimize.

J. Giles

anw@maths.nott.ac.uk (Dr A. N. Walker) (10/10/90)

Like everyone else over here, I'm missing the (initial) context, but I
found this interesting:

In article <1990Oct8.135551.21639@arnor.uucp>
lowry@lusitania.watson.ibm.com (Andy Lowry) writes:
>I personally have written a great deal of code in Hermes and have
>found the lack of pointers to be no hindrance.
[...]
>Of course, our compiler and run-time system make heavy use of pointers
>and shared data for reasons of efficiency, but this is completely
>hidden from the Hermes programmer.

	In other words, here is a construct which is incredibly
useful for writing compilers and other programs, but which the
ordinary programmer is unable to use.

	I don't think languages should hide useful abstractions
from programmers;  they should hide annoying concretions instead.
I don't really want to know the machine address of a variable,
or how to represent a conditional jump in machine code, or which
registers need to be saved over a procedure call.  But I do want
to be able to express within my language all the abstractions
that naturally arise in modelling my problem.  These may well
include pointers ["keep a finger on that variable"], even when
there are no `data structures' around.

	When, as a mathematician [well, as a game player, really],
I think of a tree or a graph, I think naturally and easily of each
node as being some information plus a collection of pointers to other
nodes.  I draw them that way.  You may think of them differently;
but I think I should be allowed, within my chosen language, to program
them my way, and not have to pummel them into some other shape.

	Yes, of course compile-time checking is a Good Thing, but
the semantics that enable this should work with rather than against
the programmer.  Run-time checking is also a Good Thing;  if doing
it causes efficiency problems, then compilers and hardware need to be
better.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

lowry@arnor.uucp (10/10/90)

In article <14972@cbmvax.commodore.com>, skrenta@cbmvax.commodore.com (Rich Skrenta) writes:
|> In article <1990Oct8.135551.21639@arnor.uucp> lowry@lusitania.watson.ibm.com (Andy Lowry) writes:
|> 
|> >The Hermes programmer has NO notion of pointer in any of
|> >its various guises.  There is NO way for the programmer to specify
|> >aliasing or shared data of any kind.  We provide high-level data types
|> >called "tables" that subsume most of the uses that pointers are
|> >traditionally put to (linked data structures, character strings and
|> >other arrays).
|> 
|> Sure, if Hermes gives me my tree, linked list and array, I probably won't
|> need pointers as much as I do in C--the data structures have already been
|> implemented for me.
|> 
|> However, if I want to make something that isn't directly supported by the
|> language, I'll have to use my integer array index "pointers" to do it.
|> Can Hermes stop me from doing that?
|> 
|> Rich
|> -- 
|> Rich Skrenta
|> skrenta@cbmvax.commodore.com
The view in Hermes is that the programmer should not be concerned with
how data values are laid out in memory.  Hermes does not give the
programmer direct access to "tree, linked list and array."  Instead,
there is a very simple high-level data type called "table," with which
one can describe homogeneous collections of data.  The most basic
table type definition is:

	t: table of x {full};

where x is any other datatype.  This says that a value of type t is a
collection of values of type X, each of which is fully initialized
(this relates to another unique feature in Hermes called "typestate"
that I won't go into here).  If you want to operate on a specific
element of the table, you can do so with statements like:

	a := x in t where (x.color = 'red' and x.weight > 10000);

To iterate over a subtable you do something like:

	for x in t where (x.color <> 'blue') inspect
	begin
	  ...
	end for;

There are other statements for inserting and removing elements,
merging tables, and extracting or copying out subtables.

If, in your collection of data, position is important, then you add
the "ordered" keyword to your table definition:

	t: ordered table of x {full};

Then you can do things like:

	pos := position of x in t where (x.name = 'xyzzy');
	a := t[pos];
	a := x in t where (position of x = pos);

(The latter two are two different syntaxes for precisely the same
statement.)

If your table has keys in the relational sense (meaning all elements
must be unique with respect to each key), you can specify them:

	t: ordered table of x {full} keys (id) (a,b);

This would describe a table in which no two elements can have the same
id, or the same combination of a and b values.

The above is EVERYTHING you need to know about tables as a Hermes
programmer.  The compiler and run-time decide data representations and
algorithms for accessing your data.

Asking what Hermes would do if you wanted to implement a data
structure that we don't support misses the point.  With Hermes, data
representations are implementation details that the programmer need
not be concerned with.  Instead, the programmer is left to express
programs at a higher level of abstraction, where programming is easier
and less error-prone.

Of course, our language support does not allow all possible
relationships among elements of tables or among fields of an element
to be captured, but we're working on that (with user-defined
constraints verified and maintained via our typestate mechanism).

I should also mention that we have a notion of "pragmas" supplied by
the programmer that very well may take the form of *suggestions* for
data representations.  Pragmas, however, in no way change the
semantics of programs, and the compiler and run-time system are never
required to adhere to them.  We have done very little in the way of
defining and supporting pragmas to date.
-- 
Andy Lowry, lowry@ibm.com, (914) 784-7925
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY  10598

cik@l.cc.purdue.edu (Herman Rubin) (10/10/90)

In article <2883@igloo.scum.com>, nevin@igloo.scum.com (Nevin Liber) writes:
> In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
 
> >It is my contention that future languages
> >shouldn't have pointers at all.  Not just no C-like pointers, none at
> >all.  I just picked on C as the most unpleasant example of what I'm
> >against.
 
> I really hate to agree with you Jim :-), but I'm beginning to think
> that you are right.  The only real argument I can see _for_ having
> pointers is efficiency; more specifically, to help in
> hand-optimisation.  Extensions to C such as C++ are showing that
> pointers aren't needed nearly as much as they used to be; heck, code
> seems to be more readable w/o them.  In languages such as Icon and
> LISP I find that I don't even miss them.

I must strongly object.  When one considers what a variable or any other
type of reference is, it is a pointer.  Fortran originally only used
pointers, as there was no call by value.  I cannot imagine a language
for mathematical use which does not have variable arrays, functions, 
etc.  These are all expressed, consciously or not, as pointers.

Now if one can have fixed pointers, why not variable pointers?  We are
no longer in the days when the only way to call a subroutine was to
have the address in the instruction.

One has to go through contortions in Fortran to use a subroutine which
has a variable function or subroutine as an argument.  Now there are
developments in programming languages which should have been apparent
way back when which may decrease the need for some uses of pointers,
but why hobble the intelligent programmer?  I have deliberately used
arrays of pointers as the most convenient means of handling array
refill procedures, and while I can find a way around it, it is clumsy
and even less portable.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)

lowry@arnor.uucp (10/10/90)

In article <11429@spool.cs.wisc.edu>, larus@primost.cs.wisc.edu (James Larus) writes:
|> In article <1990Oct8.135551.21639@arnor.uucp>, lowry@arnor.uucp writes:
|>   ... Long discussion of the absence of pointers in the Hermes language...
|> |> 
|> |> Of course, our compiler and run-time system make heavy use of pointers
|> |> and shared data for reasons of efficiency, but this is completely
|> |> hidden from the Hermes programmer.
|> |> 
|> 
|> Does this mean that Hermes programmers aren't concerned with efficiency?

We believe that getting a program logically correct and making it
efficient should be two independent tasks.  The Hermes language allows
this separation by not requiring the programmer to make decisions that
will determine efficiency.  Data representation is just one example of
this.  Others are replication and migration of processes and/or data,
concurrent execution of programs written in serial form, etc.  It is
part of our continuing research to develop very aggressive and
high-level optimizations of this sort so that programs written in our
model will, in fact, execute efficiently in a wide range of
environments.

The programmer can also be involved in tuning, but this should
generally come only after the program has been made correct.  The
tuning comes in the form of sprinkling "pragmas" around the code in
various places.  None of the existing code is changed as this is done,
and the pragmas have absolutely no semantic effect on the program.  In
fact, the compiler and run-time system are allowed to completely
ignore the pragmas, though typically they would not.  Pragmas could
take a wide variety of approaches, like "You should use a bit vector
to represent the following table of booleans", or "This table will
most often be accessed with queries of the following form:..." or "The
following section of code will insert many elements into this table
without otherwise referencing the table" (so perhaps a more efficient
off-line bulk insert algorithm can be chosen), etc.

The first sort of pragma would be a direct suggestion by the
programmer.  The other is behavioural information that can help the
compiler and run-time system make better choices.  The latter could
conceivably be generated by profiling tools, thereby making the tuning
process at least semi-automated and dynamic.

We have not yet addressed the pragma issue to any great extent.  In
particular, we have not made any decisions as to the form pragmas
should take, and our compiler does not pay attention to any pragmas.
Our first step, which we believe we have achieved, was to design a
powerful language in which correctness and efficiency could be
separated as we believe they should.
-- 
Andy Lowry, lowry@ibm.com, (914) 784-7925
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY  10598

pepke@gw.scri.fsu.edu (Eric Pepke) (10/11/90)

In article <2884@igloo.scum.com> nevin@igloo.scum.com (Nevin Liber) writes:
> In article <5088@uqcspe.cs.uq.oz.au> brendan@batserver.cs.uq.oz.au writes:
> >The perceived need for
> >side-effects in terms is merely a by-product of the poor state of language
> >design, and would not be missed at all in better languages.
> 
> I disagree.  This would throw out all functions which maintain their
> own state (eg: i/o).

It would throw out *all* output, with or without internally readable state 
information.  Setting a certain pixel to a certain color is a side effect, 
as is making a UART spit out a character, as is sending a packet out over 
the net.

There are good reasons to limit the kinds of side effects and their scope 
and to provide protections from unwanted interactions, but to say "no side 
effects" is absurd.  People do more with programs than prove them correct. 
Occasionally they actually run them on physical machines, and it's
sometimes kind of fun to have them produce output of some sort.

Eric Pepke                                    INTERNET: pepke@gw.scri.fsu.edu
Supercomputer Computations Research Institute MFENET:   pepke@fsu
Florida State University                      SPAN:     scri::pepke
Tallahassee, FL 32306-4052                    BITNET:   pepke@fsu

Disclaimer: My employers seldom even LISTEN to my opinions.
Meta-disclaimer: Any society that needs disclaimers has too many lawyers.

norman@d.cs.okstate.edu (Norman Graham) (10/11/90)

From article <2627@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin):
> In article <2883@igloo.scum.com>, nevin@igloo.scum.com (Nevin Liber) writes:
>> In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>  
>> >It is my contention that future languages
>> >shouldn't have pointers at all.  Not just no C-like pointers, none at
>> >all.  I just picked on C as the most unpleasant example of what I'm
>> >against.
>  
>> I really hate to agree with you Jim :-), but I'm beginning to think
>> that you are right.  The only real argument I can see _for_ having
>> pointers is efficiency; more specifically, to help in
>> hand-optimisation.  Extensions to C such as C++ are showing that
>> pointers aren't needed nearly as much as they used to be; heck, code
>> seems to be more readable w/o them.  In languages such as Icon and
>> LISP I find that I don't even miss them.
> 
> I must strongly object.  When one considers what a variable or any other
> type of reference is, it is a pointer.  Fortran originally only used
> pointers, as there was no call by value.  I cannot imagine a language
> for mathematical use which does not have variable arrays, functions, 
> etc.  These are all expressed, consciously or not, as pointers.

Pointers will always be required by those who demand that languages
reveal the machine-level representations of their constructs. But 
that is only one view that languages can provide: Languages can 
provide no information about the location of a value, or they can
remove the notion of storage (memory) entirely. Are these latter
views bad? It depends on the design of the rest of the language.

I cannot imagine that 'mathematical use' requires that pointers
(or memory for that matter) be part of the language--unless by
'mathematical use' you mean the class of problems that requires
very fast massaging of large matrices (i.e. wringing the most
computation from every cpu cycle). In this case, efficiency is
paramount; assembly language coding is not out of the question here.

IMHO, pointers are required only when directly manipulating hardware
devices (as in operating systems).

> Now if one can have fixed pointers, why not variable pointers?  We are
> no longer in the days when the only way to call a subroutine was to
> have the address in the instruction.

Certainly, if pointers are values in the language then we should make
them first class citizens: They must be able to point to any value 
that is conceptually stored in memory; pointers should be able to be
stored in data structures, passed to and from functions, etc.

> One has to go through contortions in Fortran to use a subroutine which
> has a variable function or subroutine as an argument.  Now there are
> developments in programming languages which should have been apparent
> way back when which may decrease the need for some uses of pointers,
> but why hobble the intelligent programmer?  I have deliberately used
> arrays of pointers as the most convenient means of handling array
> refill procedures, and while I can find a way around it, it is clumsy
> and even less portable.

Pointers are not the only way of providing these kinds of operations--
they're not even the most beautiful (conceptually speaking, of course).
The ability to pass functions to functions and to store functions in
data structures only requires that functions be first-class values in
the language: Pointers to functions are unneeded if functions are
values in the language. You just need to match the tool to the
problem. If you require pointers, use a language with pointers;
otherwise use a conceptually simpler language.
-- 
Norman Graham   <norman@a.cs.okstate.edu>   {cbosgd,rutgers}!okstate!norman
The opinions expressed herein do not necessarily reflect the views of
the state of Oklahoma, Oklahoma State University, OSU's Department of
Computer Science, or of the writer himself.

norman@d.cs.okstate.edu (Norman Graham) (10/11/90)

From article <1990Oct10.101527.2247@maths.nott.ac.uk>, by anw@maths.nott.ac.uk (Dr A. N. Walker):
> 	When, as a mathematician [well, as a game player, really],
> I think of a tree or a graph, I think naturally and easily of each
> node as being some information plus a collection of pointers to other
> nodes.  I draw them that way.  You may think of them differently;
> but I think I should be allowed, within my chosen language, to program
> them my way, and not have to pummel them into some other shape.

Since you brought up the example of trees, you may want to read this
Miranda definition of a 3-ary tree (just to be different).

  three_tree ::= Leaf num | Node num three_tree three_tree three_tree

This says that a three_tree value is either a Leaf with a number or a
Node with a number and 3 other three_trees. Notice that it does this
without pointers to trees. Manipulation of these trees is very simple,
and the shape of the tree is unchanged from the corresponding pointer
version.
-- 
Norman Graham   <norman@a.cs.okstate.edu>   {cbosgd,rutgers}!okstate!norman
The opinions expressed herein do not necessarily reflect the views of
the state of Oklahoma, Oklahoma State University, OSU's Department of
Computer Science, or of the writer himself.

jlg@lanl.gov (Jim Giles) (10/11/90)

From article <2627@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin):
> In article <2883@igloo.scum.com>, nevin@igloo.scum.com (Nevin Liber) writes:
>> In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>  
>> >It is my contention that future languages
>> >shouldn't have pointers at all.  Not just no C-like pointers, none at
>> >all.  I just picked on C as the most unpleasant example of what I'm
>> >against.
>  
>> I really hate to agree with you Jim :-), but I'm beginning to think
>> that you are right.  The only real argument I can see _for_ having
>> pointers is efficiency; more specifically, to help in
>> hand-optimisation.  Extensions to C such as C++ are showing that
>> pointers aren't needed nearly as much as they use to be; heck, code
>> seems to be more readable w/o them.  In languages such as Icon and
>> LISP I find that I don't even miss them.

This discussion was cross-posted without adequate context to allow
clear discussion.  The assertion I made above was with respect to
_explicit_ pointers - most of the functionality of which is possible
in clearer ways.  For example, one dimensional arrays are semantically
equivalent to bounded pointer domains (with the subscript acting as
the pointer) - however, since the arrays are bounded, it is possible
to make assertions about aliasing between references that unbounded
pointers don't allow (different arrays are disjoint, for example).

> [...]
> I must strongly object.  When one considers what a variable or any other
> type of reference is, it is a pointer.  [...]

Almost.  Variables (and most constants) in procedural languages are
_usually_ (but not always) allocated some specific memory location
(either dynamically or statically) in which to keep the value of the
symbol.  The address of this memory location is an attribute of the
variable (like its size or type - which are also attributes).

A pointer, on the other hand, is a variable whose _value_ is an
address.  It also _usually_ has an address part as an attribute -
which points to the address which is the value.  But, addresses
aren't particularly important to me.  I almost never need to know
or manipulate them in any way.  So, why do I need a data type whose
values are addresses?

> [...]                                   Fortran originally only used
> pointers, as there was no call by value.  [...]

Fortran did no such thing.  Fortran has _always_ been designed to allow
_either_ call by reference _or_ call by value/result.  On many machines,
particularly the new massively parallel machines like the Connection
Machine, call by value/result is _much_ more efficient.  It's cheaper
to copy the data in and out than to have the procedure always reading
across the network of CPUs for it.

> [...]                                    I cannot imagine a language
> for mathematical use which does not have variable arrays,

Neither can I.  But these don't need explicit pointers.  Nor do they
even necessarily need call by reference, as I pointed out above.

> [...]                                                functions,
> etc.

Yes, I think that languages should have these too.  But, it's clear to
me that pointers are not the proper way.

> [...]  These are all expressed, consciously or not, as pointers.

Not necessarily.

> [...]
> Now if one can have fixed pointers, why not variable pointers?  We are
> no longer in the days when the only way to call a subroutine was to
> have the address in the instruction.

Why not leave pointers out of it and provide explicit ways of altering
the reference attribute of a variable in specific ways?  We are no longer
in the days when anyone _cared_ about the absolute addresses of anything.
Why have a data type which purports to represent them?

> [...]
> One has to go through contortions in Fortran to use a subroutine which
> has a variable function or subroutine as an argument.  [...]

Yes, and the solution to this problem is to have a data type called
'function', and allowing variables of that type to be declared.  For
example, suppose you have a random number generator called ranf() which
has an optional argument allowing you to set the seed.  Why shouldn't
the language of the future allow this:

   Function (optional float) -> float :: x, y      !declare x, y the same
						   !type as ranf()

   x = ranf                !not an evaluation - an assignment to x
   y = ranf                !ditto for y

   dummy = x(some_seed)         !set seed for x
   dummy = y(some_other_seed)   !set seed for y

Now, you have two independent generators with different seeds that
both are "copies" of the ranf() code.  Actually, the only part the
implementation needs to copy is the internal static variables of
the routine - all else can use the same code.
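
One way an implementation could realise such "copies" is sketched below
in C: only the per-copy state record is duplicated, while every copy
shares the same stepping code.  The state struct and the LCG constants
here are hypothetical, chosen only to make the sketch concrete:

```c
#include <assert.h>

/* Each "copy" of the generator is just its own state record; the
 * code that advances the state is shared by all copies. */
struct ranf_state {
    unsigned long seed;
};

static void ranf_seed(struct ranf_state *s, unsigned long seed)
{
    s->seed = seed;
}

/* A simple linear congruential step (the constants are illustrative,
 * not a recommendation). */
static unsigned long ranf_next(struct ranf_state *s)
{
    s->seed = s->seed * 1103515245UL + 12345UL;
    return s->seed;
}
```

Two states seeded identically yield identical streams; seeded
differently, they run independently, which is exactly the behaviour the
x/y example above asks for.
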

> [...]                                                 Now there are
> developments in programming languages which should have been apparent
> way back when which may decrease the need for some uses of pointers,
> but why hobble the intelligent programmer? [...]

Why indeed?  Why force the programmer to use explicit pointers when
the language can be designed to allow the programmer to explicitly
state what he really wants to do?  It's easier for the programmer,
easier for the compiler to generate efficient code, easier all
around.

> [...]                                     I have deliberately used
> arrays of pointers as the most convenient means of handling array
> refill procedures, and while I can find a way around it, it is clumsy
> and even less portable.

Exactly!  Pointers _are_ clumsy and less than portable.  What you
really wanted was probably an array of sequences, or an array of
functions, or an array of dynamic arrays (not the same as a 2-d array -
in an array of arrays, each element of the parent array may have a
different size or even a different allocation status from its
siblings).  If the high-level language gave you those concepts, you
wouldn't need explicit pointers and you would get portability (wherever
the language was available).

J. Giles

jlg@lanl.gov (Jim Giles) (10/11/90)

From article <chl.655461731@m1>, by chl@cs.man.ac.uk (Charles Lindsey):
> [...stuff about graphs with aliased nodes and aliased references to them...]
> Now with pointers, this is easily implemented. But with recursive data types
> it is impossible (at least in any straightforward way - can anyone tell me how
> to do this in a functional language).

What?  Where in the definition of recursive data structures is aliasing
prohibited?  C.A.R. Hoare carefully discussed not allowing aliasing in
recursive structures in "Structured Programming".  The fact that he had
to go to so much extra trouble indicates that the usual definition of
recursive data _does_ allow aliasing.

Now, I don't agree with Hoare that aliasing should be illegal.  But then,
I'm not as zealous as he (nor as clever - he _may_ be right).  Still, I
don't think that aliasing should be a default capability either.  I think
that allowing aliasing should require extra and explicit declarations to
clearly state what variables are allowed to be aliased with what objects.
For example:

   Recursive type graph         !recursive type with a float value and
      float :: value            !two links in each node
      graph :: left, right
   end type graph

   aliased graph :: a, b        !two vars which can be aliased to each
				!other and to their descendents

   aliased graph :: c, d        !ditto, EXCEPT that these can't be aliased
				!to the others (or vice-versa).

   graph :: e                   !this is just a binary tree, it can't
				!be aliased to any other nodes or to
				!its own descendents.

...
   a = graph(1.0,graph(2.0,null,null),null)
				!this assigns a graph with two nodes to
				!variable 'a'.  The type constructor
				!also allocates space.

   b <- a                       !'<-' is the shallow copy assignment.
				!The two variables are now aliased.

   c = a                        !'=' is the deep copy assignment.  'c'
				!is now a disjoint copy of all of 'a'.
				!If 'a' had cycles or aliased elements,
				!the corresponding elements of 'c' would
				!also be aliased - but 'a' and 'c' would
				!still be disjoint

   d <- b                       !ILLEGAL.  'd' and 'b' were in separate
				!alias declarations so they can't alias each
				!other.

You might argue that this is so nearly identical to pointers that it
doesn't make any difference.  But, this yields much greater control over
the meaning of the program.  Aliasing is only present when aliasing is a
required part of the algorithm and only those elements which _need_ to
be aliased are allowed to be.  Try to get that amount of control from
pointers.  Finally, there are no 'dereference' and 'address-of'
operators to misuse, forget, or confuse.
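
The distinction between '<-' and '=' above corresponds roughly to what
an implementation would do in C, sketched here for a simple binary node
type.  This is my own illustration, and it assumes an acyclic,
unaliased source (a real '=' would also have to preserve internal
sharing and cycles, as described above):

```c
#include <assert.h>
#include <stdlib.h>

struct gnode {
    float value;
    struct gnode *left, *right;
};

/* Shallow copy ('<-'): both names come to refer to the same nodes. */
static struct gnode *shallow_copy(struct gnode *g)
{
    return g;
}

/* Deep copy ('='): a disjoint duplicate of every node.  Sketch only:
 * assumes the source is a tree with no cycles or shared subtrees. */
static struct gnode *deep_copy(const struct gnode *g)
{
    struct gnode *n;
    if (g == NULL)
        return NULL;
    n = malloc(sizeof *n);
    n->value = g->value;
    n->left  = deep_copy(g->left);
    n->right = deep_copy(g->right);
    return n;
}
```
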

J. Giles

gadi@SHUM.HUJI.AC.IL (Gad Aharoni) (10/11/90)

Reply-To: shum!gadi (Gad Aharoni)
Organization: The Hebrew University of Jerusalem

In article <3254@mcrware.UUCP> jejones@mcrware.UUCP (James Jones) writes:

>... but I'd rather like the ability to define
>recursive data structures (I think Hoare wrote a paper about this sort
>of thing) rather than using pointers.  I tend to agree--pointers are to
>data structure what gotos are to control flow.

To all you pointer haters, I hate to point out that it is extremely
awkward to implement a graph data structure without the use of pointers.
And a graph is the most general data structure.

Incidentally, pointers do not necessarily mean side effects.

Gadi

anw@maths.nott.ac.uk (Dr A. N. Walker) (10/11/90)

In article <1224@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM
(David Chase) writes:
> [...]								In C,
>C++, Pascal, Modula-2, BCPL, and PL/1, the programmer is responsible
>for ensuring that whatever the pointer points to is not reused while
>the pointer is in use.  Getting this wrong leads to messy bugs [...]

	In sensible languages (like Algol), storage allocation and,
more importantly, deallocation is monitored by the compiler and the
run-time support.  So this bug is detected mostly as a compilation
error, occasionally as a run-time fault, but in any case *when it
happens*, not when the program inexplicably fails some time later.
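
The class of bug being contrasted here is easy to exhibit in C, where
the programmer alone carries the responsibility.  A minimal sketch of
one manual discipline that at least makes the misuse fail visibly (the
function names are mine):

```c
#include <assert.h>
#include <stdlib.h>

/* In C, nothing stops a pointer from outliving the storage it refers
 * to.  One common manual discipline is to null the pointer at the
 * point of release, so a later misuse faults immediately instead of
 * "inexplicably failing some time later". */
static int *make_cell(int v)
{
    int *p = malloc(sizeof *p);
    *p = v;
    return p;
}

static void release_cell(int **pp)
{
    free(*pp);
    *pp = NULL;               /* leave no dangling reference behind */
}
```

In the Algol-style languages being praised here, this bookkeeping is
the compiler's and run-time system's job, not a convention the
programmer must remember.
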

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

gudeman@cs.arizona.edu (David Gudeman) (10/12/90)

In article  <65265@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]From article <14972@cbmvax.commodore.com>, by skrenta@cbmvax.commodore.com (Rich Skrenta):
]> [...]                                                  Are the programs
]> any more readable because I say
]> 	int i; a[i]
]> to dereference instead of
]> 	struct foo *p; *p ?
]
]Yes!  Because, a[i] and b[j] are guaranteed _not_ to be aliased...

But a[i] and a[j] don't have that guarantee.  So there are a few
aliasing problems that are easier to solve with arrays.  No big deal.
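
A concrete instance of the a[i]/a[j] problem, in C for illustration:
shifting elements within one array gives different answers depending on
the loop direction, precisely because a[i] and a[i-1] overlap (the
function names are mine):

```c
#include <assert.h>

/* Shift a[0..n-2] up by one place into a[1..n-1].  Copying upward
 * from the low end overwrites elements before they are read; copying
 * downward from the high end is correct.  The loops differ only in
 * how they handle the overlap between a[i] and a[i-1]. */
static void shift_up_wrong(int a[], int n)
{
    int i;
    for (i = 1; i < n; i++)
        a[i] = a[i - 1];      /* reads values already clobbered */
}

static void shift_up_right(int a[], int n)
{
    int i;
    for (i = n - 1; i >= 1; i--)
        a[i] = a[i - 1];
}
```
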

]...  The pointer syntax _may_ be used to simulate arrays, but
]you might be planning to use it for dynamic memory, strings, recursive
]data structures, run-time equivalencing, etc.  How does the reader know
]that the pointer will not be used in any of those ways - he knows the
]array won't be.

No he doesn't.  In a language without pointers, the array might be
used to simulate pointers, and the pointer-simulation used for one of
the above purposes.

]  Each of these features should have separate syntax since
]they are separate features.

C-style pointers struck me from my very first exposure as a simple and
elegant way of merging various things that are only _apparently_
different.  I don't suppose you would be willing to just admit that
people have different tastes and give up your crusade against
pointers?

I'm willing to recognize that you prefer to divide the world up into a
bunch of seperate boxes.  You ought to recognize that others prefer to
integrate different things into the same box.  It is simple hubris to
try to convince others that your personal preferences are somehow
objective and that their preferences are misguided.

]  Forcing them all to masquerade as pointers
]only confuses the person maintaining the code - and doesn't give the
]compiler enough information to adequately optimize.

Funny, I've never been confused by any of the above uses of pointers
(with the possible exception of run-time equivalencing, I don't know
what you mean by that...).

As to optimization, I will admit that it is _harder_ to optimize code
with pointers.  You have never convinced me that it is less _possible_
to optimize pointers.  I say let the machine (or the compiler-writer)
do the work.  If my mental picture of a solution to a problem involves
pointers, then the language should let me _express_ the solution with
pointers.  Yes, yes, _you_ don't think I should be picturing problem
solutions with pointers.  I happen to disagree with you, and so do
thousands of other C programmers.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

gudeman@cs.arizona.edu (David Gudeman) (10/12/90)

In article  <65409@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]
]A pointer, on the other hand, is a variable whose _value_ is an
]address...  But, addresses
]aren't particularly important to me.  I almost never need to know
]or manipulate them in any way.  So, why do I need a data type whose
]values are addresses?

An int, on the other hand, is a variable whose _value_ is a
bit-string.  But, bit-strings aren't particularly important to me.  I
almost never need to know or manipulate them in any way.  So why do I
need a data type whose values are bit-strings?

The above probably won't make the point (though it should) so I will
expand on it.  A pointer might have an address as its concrete value,
but you should be thinking of a pointer as an abstraction, not by its
concrete representation.  A pointer is a data type with certain
operations on it:

for pointers declared "TYPE *p, *q", the following

	*p returns an l-value referencing an object of type TYPE (as
	   for any other data object, if p is not initialized the
	   outcome of any operation is undefined)

	p + i returns a pointer to the ith element of type TYPE
	   following *p.  This is only valid if there is a contiguous
	   sequence of at least i elements  of type TYPE following *p.

	p < q returns TRUE iff *p precedes *q in a sequence of
	   elements of type TYPE, assuming that both *p and *q are
	   members of the same contiguous sequence.

etc.  There is no need to refer to addresses.  Pointers are no less
abstract than arrays.  There is no guarantee that "p + i" will return
an address "i*sizeof(TYPE)" greater than that of *p, any more than it
is guaranteed that the address of A[i] is "(i-j)*sizeof(TYPE)" more
than the address of A[j].  I know implementations of both arrays and
pointers where these conditions do _not_ hold.
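
Those abstract operations, and nothing about addresses, are all that
the following C fragment relies on.  A small sketch over an ordinary
array (the function name is mine):

```c
#include <assert.h>

/* Find the first zero element in [p, q) using only the abstract
 * pointer operations listed above: dereference, p + i, and ordered
 * comparison of pointers into one contiguous sequence. */
static int *find_zero(int *p, int *q)
{
    while (p < q && *p != 0)
        p = p + 1;
    return p;                 /* == q when no zero was found */
}
```
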
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/12/90)

In article <65265@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> Yes!  Because, a[i] and b[j] are guaranteed _not_ to be aliased.  Whereas
> *p and *q might be or, then again, might not be.

That's an argument against *C* pointers.  It is not at all an argument
against pointers qua pointers.  Remember Euclid?  You declare pointers
into ZONES.  Pointers into different zones cannot possibly be aliased,
and when a zone's lifetime expires the entire zone can be reclaimed,
because a pointer can't outlive its zone.

> The pointer syntax _may_ be used to simulate arrays, but
> you might be planning to use it for dynamic memory, strings, recursive
> data structures, run-time equivalencing, etc.  How does the reader know
> that the pointer will not be used in any of those ways - he knows the
> array won't be.

The reader certainly doesn't know anything of the kind.  Of course you
*can* use arrays to hold strings, recursive data structures, and do
run-time equivalencing.  (It's quite common to overlay two logical arrays
onto one in Pascal, just think of straight insertion sort.)

-- 
Fear most of all to be in error.	-- Kierkegaard, quoting Socrates.

pcg@cs.aber.ac.uk (Piercarlo Grandi) (10/12/90)

On 10 Oct 90 10:15:27 GMT, anw@maths.nott.ac.uk (Dr A. N. Walker) said:

anw> Like everyone else over here, I'm missing the (initial) context, but I
anw> found this interesting:

Yes, good spotting!

anw> In article <1990Oct8.135551.21639@arnor.uucp>
anw> lowry@lusitania.watson.ibm.com (Andy Lowry) writes:

lowry> I personally have written a great deal of code in Hermes and have
lowry> found the lack of pointers to be no hindrance. [...]

lowry> Of course, our compiler and run-time system make heavy use of
lowry> pointers and shared data for reasons of efficiency, but this is
lowry> completely hidden from the Hermes programmer.

anw> In other words, here is a construct which is incredibly useful for
anw> writing compilers and other programs, but which the ordinary
anw> programmer is unable to use.

Good point! And precisely my point too: pointers are nearly
indispensable to write *implementations*, because our machines use them.
But the other guy also has a point: if your implementation is
efficiently written using pointers, your application can abstract away
from them.

anw> I don't think languages should hide useful abstractions from
anw> programmers; they should hide annoying concretions instead.

But pointers are a useful concretion, not an abstraction...

anw> I don't really want to know the machine address of a variable,
anw> or how to represent a conditional jump in machine code, or which
anw> registers need to be saved over a procedure call.

If you are doing fundamental *implementation* (OS, runtime library,
profiler, debugger, compiler, ...) work, you want of course to deal
explicitly with such things. It all depends on where you put your
virtual machine boundary -- if your virtual machine is a C virtual
machine, or an Ingres virtual machine, or a SPARC virtual machine,
etc...

anw> But I do want to be able to express within my language all the
anw> abstractions that naturally arise in modelling my problem.  These
anw> may well include pointers ["keep a finger on that variable"], even
anw> when there are no `data structures' around.

Again: you never need pointers to model your *application*. Pointers
actually cloud abstractions because they introduce an extra layer of
manipulation (pointer values) that does not exist at the application
level (that is only expressed in data values). If you have very simple
relationships in your application, pointers are just one of the ways of
*implementing* them, and one of the least interesting and nastier.

If the machines we are programming had identification-by-contents for
objects we would not need pointers at the implementation level; but they
do not, so we must use pointers, identification-by-tagging, as a poor
substitute on which to build the more useful and application oriented
abstractions for relationships, using usually far more sophisticated
mechanisms than pointers.

I strongly believe that a lot of computer science, I think actually
almost all of it (if you accept the idea that numeric analysis is not
strictly CS :->), is about trying to simulate identification-by-contents
using identification-by-tagging.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

cik@l.cc.purdue.edu (Herman Rubin) (10/13/90)

In article <1990Oct10.193502.2011@d.cs.okstate.edu>, norman@d.cs.okstate.edu (Norman Graham) writes:
> From article <2627@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin):
> > In article <2883@igloo.scum.com>, nevin@igloo.scum.com (Nevin Liber) writes:
> >> In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> >  
> >> >It is my contention that future languages
> >> >shouldn't have pointers at all.  Not just no C-like pointers, none at
> >> >all.  I just picked on C as the most unpleasant example of what I'm
> >> >against.

			.......................

> Pointers are not the only way of providing these kinds of operations--
> they're not even the most beautiful (conceptually speaking, of course).
> The ability to pass functions to function and to store functions in
> data structures only requires that functions be first class values in
> the language: Pointers to functions are unneeded if functions are 
> values in the language. You just need to match the tool to the 
> problem. If you require pointers, use a language with pointers;
> otherwise use a conceptually simpler language.

There are real simplifications and apparent simplifications.

Suppose that internally information is being passed in a computer.  Now
the only ways that I can see to pass this information is to pass all or
part of the value, the name, or a pointer.  By part, one can pass part of
the value and a way to get at the rest, so that the problem is repeated,
although there are many places where this is done; many string procedures
pass part of the string and pointers to what is before or after.

Now passing the value may be the best way to do things, or passing a pointer.
Passing the name only rarely; interpreters are slow, and compilers and linkers
normally replace names with values or pointers, even for first-class objects.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)

asylvain@felix.UUCP (Alvin "the Chipmunk" Sylvain) (10/13/90)

In article <2884@igloo.scum.com> nevin@igloo.UUCP (Nevin Liber) writes:
::[I added comp.lang.misc to the list of newsgroups; please follow-up to
::the appropriate newsgroup ONLY.]

Which is the appropriate newsgroup?  If comp.lang.misc is not appropriate,
why'd you add it?  Why do I get the feeling that no one in comp.society.futures
was or is or will be interested?  Anywho ...

::In article <5088@uqcspe.cs.uq.oz.au> brendan@batserver.cs.uq.oz.au writes:
::
::>The perceived need for
::>side-effects in terms is merely a by-product of the poor state of language
::>design, and would not be missed at all in better languages.
::
::I disagree.  This would throw out all functions which maintain their
::own state (eg: i/o).
[...]

Nope.  C (at least) allows for variables to be 'static'.  No need
for side-effects to maintain the function's internal state.

Unfortunately, Pascal doesn't suffer from this convenience.  This causes a
programmer to load up more and more junk into the 'program'-level 'var'
declaration, until it's almost as hard to debug as FORTRAN COMMON statements.
Heaven help you if two or more subroutines are both using a global 'IDX'
in different contexts.  There, your argument about side-effects probably holds.

Any post-modern (after Pascal) language allows the declaration of a
variable which is local to the routine, but doesn't change between
invocations.  Side-effects are not necessary for state-maintenance.

Period.
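
The 'static' mechanism just described, in a minimal C sketch (the
function name is mine):

```c
#include <assert.h>

/* A function that maintains its own state across calls via a
 * 'static' local: the counter is invisible outside the function and
 * needs no program-level global and no pointer. */
static int next_ticket(void)
{
    static int ticket = 0;    /* initialised once, retained between calls */
    return ++ticket;
}
```
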
--
=============Opinions are Mine, typos belong to /bin/ucb/vi=============
"We're sorry, but the reality you have dialed is no   |            Alvin
longer in service.  Please check the value of pi,     |   "the Chipmunk"
or pray to your local deity for assistance."          |          Sylvain
= = = = = =I haven't the smoggiest notion what my address is!= = = = = =

jlg@lanl.gov (Jim Giles) (10/13/90)

From article <26295@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> In article  <65265@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> [...]
> ]Yes!  Because, a[i] and b[j] are guaranteed _not_ to be aliased...
> 
> But a[i] and a[j] don't have that guarantee.  So there are a few
> aliasing problems that are easier to solve with arrays.  No big deal.

My objection here is with the word "few" and the phrase "no big deal."
_MOST_ aliasing between pointers simulating arrays is actually between
separate arrays.  On a pipelined machine, the slowdown for aliasing
is usually on the order of a factor of 2.  On a vector machine, the
factor may be between 10 and 100.  Most users who are hit with this
problem don't regard it as "no big deal."

> [...]
> ]...  The pointer syntax _may_ be used to simulate arrays, but
> ]you might be planning to use it for dynamic memory, strings, recursive
> ]data structures, run-time equivalencing, etc.  How does the reader know
> ]that the pointer will not be used in any of those ways - he knows the
> ]array won't be.
> 
> No he doesn't.  In a language without pointers, the array might be
> used to simulate pointers, and the pointer-simulation used for one of
> the above purposes.

Touché. _IF_ the choice were strictly between pointers and arrays,
then the rest of the important data type construction tools would have
to be simulated with whichever is chosen.  However, I don't recommend
minimalist language design.  The language should allow arrays,
sequences (essentially variable length 1-d arrays), unions (always
tagged), records (structs, whatever you want to call them), and
recursive data structures.  These features should be allowed to be
used individually or in any combination.  In addition, the 'aliased'
attribute should be allowed to be applied to any collection of data
items of the same type (which allows the "shallow copy" assignment to
be applied to them).  Further attributes that should be applicable to
any variables are 'dynamic' (which allows the objects to be allocated
and deallocated explicitly by the user), 'static' (which makes the
variable have permanent scope): the default is to put the variable on
the stack.  Finally, an explicit way of defeating type checking (for
run-time 'equivalence' work) should be provided.

The presence of all these doesn't _guarantee_ that some user won't try
to simulate one of them with some combination of the others, but why
should a user do so?  It would simply make his code harder to read and
maintain.  I prefer to state _explicitly_ how my data is structured.
Perhaps your objection is founded on this last issue.  Maybe you don't
know a priori how your data really is structured and you want to delay
the decision until most of the code is written.  This violates the
spirit of structured programming (iterative refinement of programs -
which differs from Structured (note the initial capital) programming
which is a religion of GOTO evasion).

> [...]
> ]  Each of these features should have separate syntax since
> ]they are separate features.
> 
> C-style pointers struck me from my very first exposure as a simple and
> elegant way of merging various things that are only _apparently_
> different.  [...]

C pointers always struck me as rather _inelegant_ - even if I wanted to
merge distinct features.

> [...]     I don't suppose you would be willing to just admit that
> people have different tastes and give up your crusade against
> pointers?

I might if I believed that people's tastes were correlated to their
productivity.  I've always worked in or near the consulting office
(help desk, whatever your business calls it).  I worked my way
through college as such a consultant.  In 18 years of such experience
I think I have a pretty good idea of what kind of errors people make,
what kind cause the most trouble, and what kind of language features
don't seem to engender such errors.  Pointers are associated with
the error side of the ledger.

Further, many user productivity studies have been performed on language
features. (Although I haven't found one on pointers yet.) Many of the
researchers who conduct such tests have remarked that, very often, the
feature the users thought was most productive was actually the reverse.
Users are, in fact, notoriously bad at gauging their own productivity
or the features that affect it.  For this reason, when I see a feature
which is associated with a disproportionate percentage of errors (and
difficult to find and correct ones at that), I'm perfectly willing to
ascribe a good share of the blame for such errors on the feature itself
- even if it's a feature I personally like. (This is indeed the case.
Ten years ago I quite liked pointers.  Since then I've found
substitutes which are just as efficient and are more readable and less
error prone.)

> [...]
> I'm willing to recognize that you prefer to divide the world up into a
> bunch of separate boxes.  You ought to recognize that others prefer to
> integrate different things into the same box.  It is simple hubris to
> try to convince others that your personal preferences are somehow
> objective and that their preferences are misguided.

And yet, you do not find it "simple hubris" when people (including
yourself) maintain that the features of C are all anyone ever needs.
If my opposition to pointers seems overblown, it is because all the
false hype popularizing C is even more so.  How many time have you
chastized a C supporter for claiming that a preference for Fortran
was misguided?  It happens on the net all the time - at least a
dozen times a year on each relevant newsgroup (and some irrelevant
ones).  Yet, such a position is as much "simple hubris" as (or more
than) the statements I'm making.

> [...]
> ]  Forcing them all to masquerade as pointers
> ]only confuses the person maintaining the code - and doesn't give the
> ]compiler enough information to adequately optimize.
> 
> Funny, I've never been confused by any of the above uses of pointers
> (with the possible exception of run-time equivalencing, I don't know
> what you mean by that...).

This is a variation on the old "blame the victim" approach that lawyers
defending rapists use.  What you're saying is that you don't have the
problem and those that do aren't worth considering.  Aside from the
ethical questions about your apathy toward other programmers, what about
practical issues such as increased software costs, or increased taxes
(the government hires programmers too - not all of them are as immune
as you are to these errors)?


But, let's test your claim:  what is the expected argument type in the
following ANSI C style function prototype?

   int f(char *x);

Now, is the argument expected a single character which is passed by
reference?  Does the function expect an array of char?  Does the
function expect a sequence of char (terminated with a zero byte)?
Or, perhaps the function expects something entirely different and
it is declaring its argument to be a (char *) because the programmer
"knows" that (char *) is more or less generic (this dependence on
internal, machine-dependent structure is what I mean by
run-time equivalence - it is often a valuable and important thing to
do, but I don't think pointer casting - implicit or explicit - is
the best mechanism)?  You can't tell?  You are now having a problem
with pointers that you claim you don't have.  The only way to tell what
this argument is expected to be is to inspect the body of the function.
Commentary and other sources may _claim_ that the argument is used
in a particular way - but without compiler enforcement, such claims
are not reliable.
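
The ambiguity is easy to exhibit: all four of the following functions
carry the identical prototype, and nothing but the body says which
reading is meant.  The bodies here are illustrative stand-ins of my own,
one per interpretation listed above:

```c
#include <assert.h>
#include <string.h>

/* Four different contracts hiding behind one declaration:
 * int f(char *x); */

/* 1: a single character passed by reference */
static int f_byref(char *x)   { return *x == 'A'; }

/* 2: an array of (at least) four char */
static int f_array(char *x)   { return x[0] + x[1] + x[2] + x[3]; }

/* 3: a sequence of char terminated with a zero byte */
static int f_string(char *x)  { return (int) strlen(x); }

/* 4: "generic" storage reinterpreted at run time */
static int f_generic(char *x) { int v; memcpy(&v, x, sizeof v); return v; }
```

The compiler accepts all four equally, so it can neither check a caller
against the intended contract nor exploit it for optimization.
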

> [...]
>               If my mental picture of a solution to a problem involves
> pointers, then the language should let me _express_ the solution with
> pointers.  Yes, yes, _you_ don't think I should be picturing problem
> solutions with pointers.  I happen to disagree with you, and so do
> thousands of other C programmers.

Then leave the discussion!  C is a fait accompli and this is a
discussion about the possible design of some future language.
If you feel that C has all you need already, and in a form you
like, then I promise never to twist your arm to buy this new
language.  You can continue to use C until doomsday for all
I care.

My remarks are addressed to those who aren't already religious about
C or about a programming paradigm requiring pointers.  To those people
I am pointing out that there are ways to design languages which permit
them to explicitly declare the structure of their data in ways that the
compiler can check - thus rendering a large class of potential errors
completely impossible.  Additional advantages to this approach include
the fact that the compiler itself can make use of this more explicit
information to produce more efficient code.  Another advantage is that
explicit data structuring actually allows greater flexibility in the
use of such structures.

As an example of this last point: consider arrays and sequences.  They
are, as you keep asserting, quite similar concepts.  But, they aren't
identical.  The rank and extent of an array is fixed for the lifetime
of the array (which may be dynamic: specified in an allocate statement).
The length of a sequence (it is always of rank one) is variable.  This
means that the concatenate operator is quite appropriate and useful for
sequences, but is of doubtful use for arrays (and even of doubtful
meaning for arrays of rank greater than one).  If both arrays and
sequences are implemented in the same way (as you insist, using pointers),
then the introduction of a concatenate operator would engender errors
due to accidental use on arrays.  This may not seem important and it
may not occur very often, but it could be a very difficult error to
discover in a large code.  And, it's so easy to avoid entirely by
making sequences and arrays distinct concepts in the language.

J. Giles

jlg@lanl.gov (Jim Giles) (10/13/90)

From article <387@shum.huji.ac.il>, by gadi@SHUM.HUJI.AC.IL (Gad Aharoni):
> [...]
> To all you pointer haters, I hate to point out that it is extremely
> awkward to implement a graph data structure without the use of pointers.
> [...]

Yes, and an IF statement is extremely awkward to implement without GOTOs.
The fact that the implementation may _internally_ use pointers and/or
GOTOs should trouble no one at the source code level - in fact, for
the most part, they shouldn't need to know or care.  Graphs are easy
to implement as recursive data structures without any user-visible
variable whose value is an address - that is to say: without explicit
pointers.

J. Giles

jlg@lanl.gov (Jim Giles) (10/13/90)

From article <1990Oct11.100302.19258@maths.nott.ac.uk>, by anw@maths.nott.ac.uk (Dr A. N. Walker):
> In article <1224@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM
> (David Chase) writes:
>> [...]								In C,
>>C++, Pascal, Modula-2, BCPL, and PL/1, the programmer is responsible
>>for ensuring that whatever the pointer points to is not reused while
>>the pointer is in use.  Getting this wrong leads to messy bugs [...]
> 
> 	In sensible languages (like Algol), storage allocation and,
> more importantly, deallocation is monitored by the compiler and the
> run-time support.  So this bug is detected mostly as a compilation
> error, occasionally as a run-time fault, but in any case *when it
> happens*, not when the program inexplicably fails some time later.

Yes.  But at considerable overhead.  The fastest method I know of is a
reference counting scheme which is order (N+M*C); where N is the number
of nodes descended from the pointer being monitored, M is the number of
edges in the structure connecting those N nodes, and C is the number of
simple cycles in the structure (a simple cycle is one which either
entirely contains each cycle which overlaps it or which is entirely
contained within each overlapping cycle).

Note that this issue arises identically for recursive data structures
which allow aliasing.  So this technique is equally applicable (or
expensive) regardless of whether the pointers involved are explicit
or not.  In fact, for my language feature (the 'aliased' attribute),
the user must either take the responsibility for explicitly allocating
and deallocating them or he must pay the overhead of automatic
maintenance of them.  This is why 'aliased' is optional and is off
by default (among other reasons).

J. Giles

jlg@lanl.gov (Jim Giles) (10/13/90)

From article <26296@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> [...]
> An int, on the other hand, is a variable whose _value_ is a
> bit-string.  But, bit-strings aren't particularly important to me.  I
> almost never need to know or manipulate them in any way.  So why do I
> need a data type whose values are bit-strings?

Here we see the basic confusion between the semantics of something and
its implementation.  The _value_ of an int is an integer restricted to
some limited range (machine dependent - unfortunately).  The internal
representation _may_ be a bit string.  On the other hand, the representation
_may_ be a string of decimal digits (which in turn _may_ be implemented
as bit strings), or the value _may_ be represented as trits (which I
assume is the name one would give to the basic units of tri-state logic
designs).  Which of these (or any other) internal representations I
get is (or at least, should be) irrelevant to me.

> [...]
> The above probably won't make the point (though it should) so I will
> expand on it.  A pointer might have an address as its concrete value,
> but you should be thinking of a pointer as an abstraction, not by its
> concrete representation.  A pointer is a data type with certain
> operations on it:

And, if the equivalent functionality can be provided in a different
form which gives the user more control over the semantics while allowing
the compiler greater freedom in choosing the internal representation,
why not prefer it?

> [...]
> for pointers declared "TYPE *p, *q", the following
> 	*p returns an l-value referencing an object of type TYPE (as
> 	   for any other data object, if p is not initialized the
> 	   outcome of any operation is undefined)

And, for non-pointers declared TYPE p, q, the following holds: p
returns an l-value referencing an object of type TYPE.

As for any other data object, the default initial value should be
required to be specified either by the definition of TYPE itself, or
by initialization within the declaration of each variable - with the
latter overriding the former if both initializations are present.
The only exception to this is in the case of dynamically allocated
variables, in which case, the allocation mechanism should always
initialize.

> [...]
> 	p + i returns a pointer to the ith element of type TYPE
> 	   following *p.  This is only valid if there is a contiguous
> 	   sequence of at least i elements  of type TYPE following *p.

If p is to be used as an indexed object, it should be declared as such.
Either an array or a sequence specification in the declaration of p
should be required.  That way, both the user and the compiler are
aware of the bounds, extent, and rank over which the object is indexed.

> [...]
> 	p < q returns TRUE iff *p precedes *q in a sequence of
> 	   elements of type TYPE, assuming that both *p and *q are
> 	   members of the same contiguous sequence.

The same applies to indexes within arrays or sequences.  With the
added advantage that the fact that both indexes apply to the same
contiguous sequence of values is explicit in every use - so both
the user and the compiler can make precise assumptions about this
property.

The issue isn't whether pointers have properties which can be
abstracted from raw addresses or not.  The issue is whether the
properties of pointers are required (or even desirable) in the
context of a high-level language.

J. Giles

pcg@cs.aber.ac.uk (Piercarlo Grandi) (10/13/90)

On 10 Oct 90 14:31:03 GMT, lowry@arnor.uucp said:

lowry> We believe that getting a program logically correct and making it
lowry> efficient should be two independent tasks.  The Hermes language allows
lowry> this separation by not requiring the programmer to make decisions that
lowry> will determine efficiency.  Data representation is just one example of
lowry> this.  Others are replication and migration of processes and/or data,
lowry> concurrent execution of programs written in serial form, etc.

Well said.  Note that, under the idea that what we really want is
reuse of interface, semantics, and implementation, you are saying that
an efficient way to achieve a lot of reuse of interface and semantics
is to have a very high level language.  Agreeable point.

lowry> It is part of our continuing research to develop very aggressive
lowry> and high-level optimizations of this sort so that programs
lowry> written in our model will, in fact, execute efficiently in a wide
lowry> range of environments.

I most strongly object to calling this 'aggressive optimization'. It makes
'aggressive optimization' look respectable.

Aggressive optimization is where the optimizer rewrites your program in
a supposedly similar form behind your back. In your case you are just
doing competent implementation of high level constructs, which were
carefully designed with clean semantics to give ample opportunities for
safe application of sophisticated implementation strategies.

The query planner of a relational database does not do any aggressive
optimization -- it is just required to do its job well. It starts to be
aggressive if it rewrites queries before planning them, and then it
becomes really obnoxious only if the algebra underlying the query
language is poorly designed, as in SQL. But note that in a well designed
high level application language there *should* be no point in
restructuring a user's program, even in the presence of a clearly
defined and safe algebra for doing so -- there is essentially only one
canonical form of achieving an effect.

lowry> The programmer can also be involved in tuning, but this should
lowry> generally come only after the program has been made correct.  The
lowry> tuning comes in the form of sprinkling "pragmas" around the code in
lowry> various places.

Well said.  'register' or 'inline' come to mind immediately, at a much
lower level of abstraction; or providing expected domain value
statistics and join paths to a query planner or relation creator for a
relational database.

Again, of course, as far as current technology is concerned, we are
speaking of *application level* programming. Somebody has still got to
do the bit twiddling.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

carroll@cis.udel.edu (Mark Carroll) (10/14/90)

In article <2632@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <1990Oct10.193502.2011@d.cs.okstate.edu>, norman@d.cs.okstate.edu (Norman Graham) writes:
]] From article <2627@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin):
]]] In article <2883@igloo.scum.com>, nevin@igloo.scum.com (Nevin Liber) writes:
]]]] In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]]]  
]]]]]It is my contention that future languages
]]]]]shouldn't have pointers at all.  Not just no C-like pointers, none at
]]]]]all.  I just picked on C as the most unpleasant example of what I'm
]]]]]against.
]
]			.......................
]
]] Pointers are not the only way of providing these kinds of operations--
]] they're not even the most beautiful (conceptually speaking, of course).
]] The ability to pass functions to functions and to store functions in
]] data structures only requires that functions be first-class values in
]] the language: Pointers to functions are unneeded if functions are
]] values in the language.  You just need to match the tool to the
]] problem.  If you require pointers, use a language with pointers;
]] otherwise use a conceptually simpler language.
]
]There are real simplifications and apparent simplifications.
]
]Suppose that internally information is being passed in a computer.  Now
]the only ways that I can see to pass this information is to pass all or
]part of the value, the name, or a pointer.  By part, one can pass part of
]the value and a way to get at the rest, so that the problem is repeated,
]although there are many places where this is done; many string procedures
]pass part of the string and pointers to what is before or after.
]
]Now passing the value may be the best way to do things, or passing a pointer.
]Passing the name only rarely; interpreters are slow, and compilers and linkers
]normally replace names with values or pointers, even for first-class objects.

Herman, you should really strongly consider learning a little bit about
languages, because you're continually making a fool of yourself out of
ignorance.

*** Machine Level Representation != Language Level Representation ***

That's the ENTIRE POINT of high-level languages - to allow us to write
programs in representations that are different from that of the underlying
architecture. The "real" way in which things are represented is not
necessarily the best way to present them to a human being.

Of COURSE you'll have pointers on the execution level, if you want to
represent data structures - there's really no way to avoid that. But,
equally, you've got to have bitstrings on the execution level to
represent floating point numbers.  Do I want my programming language to
force me to twiddle bit strings? No - and equally, I don't necessarily
want my programming language to force me to twiddle pointers. There
are better ways to represent things on my level, like recursive data
structures, or perhaps the table types that the Hermes people have
spoken about.

]Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907

	<MC>

jlg@lanl.gov (Jim Giles) (10/14/90)

From article <PCG.90Oct13114804@odin.cs.aber.ac.uk>, by pcg@cs.aber.ac.uk (Piercarlo Grandi):
> [...]
> I most strongly object to calling this 'aggressive optimization'. It makes
> 'aggressive optimization' look respectable.
> 
> Aggressive optimization is where the optimizer rewrites your program in
> a supposedly similar form behind your back. [...]
> [...]
>-Piercarlo "Peter" Grandi     [...]

Grandi comes up with this one periodically.  I can never pin him down
on a meaning.  A compiler is a tool which  transforms one representation
of a program (the source) into a semantically equivalent form (the object
code) in another language.  If the compiler changes the semantics of the
program during this process, it is _BROKEN_.  The optimizer, as part of
the compiler, must obey this same constraint.  If the optimizer changes
the semantics of a code, it is _BROKEN_.  An optimizer may be as
"aggressive" as the compiler implementor wants to make it - as long
as it doesn't alter the semantics of the code it's translating.

J. Giles

peter@ficc.ferranti.com (Peter da Silva) (10/14/90)

How about taking this to alt.religion.computers?

As for the usefulness of pointers, or whether they're the data-structure
equivalent of a GOTO, remember that most practical people in this world are
willing to accept that an occasional GOTO is useful or even desirable.

And, remember that C is nearly 20 years old! It's not the perfect tool for
every job, so what? You can't cook everything in a microwave oven, either.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

gudeman@cs.arizona.edu (David Gudeman) (10/14/90)

In article  <65680@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]
]Here we see the basic confusion between the semantics of something and
]its implementation.

That was _clearly_ a rephrasing of something _you_ said.  The purpose of
the rephrasing was _clearly_ to show that you had this basic
confusion.  Taking a quote out of context like that is really low.

In article  <65662@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

]And yet, you do not find it "simple hubris" when people (including
]yourself) maintain that the features of C are all anyone ever needs.

That is a lie.  Slander, libel, and not even a nice or true thing to
say about me.  I have never, _never_, NEVER, said, implied, intimated,
believed, thought, or given credence to the statement that the
features of C are all anyone ever needs.  If anyone asked my opinion
on C, I would say that it is OK in its application domain, widely
known, and has some nice features.  But if your application isn't
seriously performance bound, use a real language like Lisp or Icon,
one that supports multiple paradigms and has lots of nice high-level
data structures.

]If my opposition to pointers seems overblown, it is because all the
]false hype popularizing C is even more so.  How many time have you
]chastized a C supporter for claiming that a preference for Fortran
]was misguided?

My opinion on Fortran is about the same as my opinion on C, except for
the part about having some nice features.  Maybe if I knew a Fortran
more recent than Fortran 77 I could even say nice things about it.

]This is a variation on the old "blame the victim" approach that lawyers
]defending rapists use.  What you're saying is that you don't have the
]problem and those that do aren't worth considering.

That is a pretty sleazy rhetorical ploy.  People who have trouble
reading my code are not the moral equivalents of rape victims.  They
are not very competent programmers either.  As to "blame the victim",
there is not necessarily any "blame" associated with incompetence.  I
just don't want to be bound in _my_ actions by the possible
incompetence of other people.  I also don't expect professional
basketball players to play blindfolded with one arm tied behind their
backs so that klutzes like me have a chance to be in the NBA.

]  Aside from the
]ethical questions about your apathy toward other programmers, what about
]practical issues such as increased software costs, or increased taxes
](the government hires programmers too - not all of them are as immune
]as you are to these errors)?

First, I do not claim immunity to errors.  I claim that using pointers
does not cause me to make errors.  Second, I do not claim that
everyone should use pointers.

]But, let's test your claim:  what is the expected argument type in the
]following ANSI C style function prototype?
]
]   int f(char *x);
]
A pointer to a character.

]Now, is the argument expected a single character which is passed by
]reference?  Does the function expect an array of char?  Does the
]function expect a sequence of char (terminated with a zero byte)?

Yes.  One of those.

]Or, perhaps the function expects something entirely different and
]it is declaring its argument to be a (char *)

In a post-ANSI C this would be trouble.  The programmer's problem, not
mine.

OK, now your turn.  In the following LISP function

(defun f (x) ....

What is x supposed to be?  One of the above things?  A symbol?  An
array?  Hmm.  Looks like you are going to have to look at the code.

]  You can't tell?  You are now having a problem
]with pointers that you claim you don't have.  The only way to tell what
]this argument is expected to be is to inspect the body of the function.
]Commentary and other sources may _claim_ that the argument is used

Well, you either trust the documentation or you don't.  If you don't,
then you have to inspect the body of the function anyway to see if it
does what you want it to.

]Then leave the discussion!  C is a fait accompli and this is a
]discussion about the possible design of some future language.

It is?  Then how come the article I responded to looked like nothing
more than a pointless (heh, heh) slam at pointers?

]... The rank and extent of an array is fixed for the lifetime
]of the array (which may be dynamic: specified in an allocate statement).

APL programmers will be surprised to hear that.

]The length of a sequence ... is variable.

Mathematicians will be surprised to hear that.

]  This
]means that the concatenate operator is quite appropriate and useful for
]sequences, but is of doubtful use for arrays

I don't see the relationship.  When you concatenate two sequences to
get a third, you haven't changed the length of any sequence.  Changing
the length of a sequence implies that it is a mutable object in the
first place, and in the second place makes it a deque.

]  If both arrays and
]sequences are implemented in the same way (as you insist using pointers),

More slander.  I never insisted on any such thing.

]then the introduction of a concatenate operator would engender errors
]due to accidental use on arrays.

Oops.  And maybe you just discovered a useful new operation on arrays
by observing their similarity to sequences.  It works in APL.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

gadi@SHUM.HUJI.AC.IL (Gad Aharoni) (10/15/90)

In article <65664@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

>... Graphs are easy
>to implement as recursive data structures without any user-visible
>variable whose value is an address - that is to say: without explicit
>pointers.

Ok. Show me how you would implement a graph data structure without using
pointers, and when I say pointers I also mean array indices.

The only other way I can think of is to use the standard logic, or
relational database technique of listing the dependencies. I am not
too fond of this method of representing a graph because when a path
in the graph is followed, the next node is not "at hand". When a list
is followed the next element is always there; when following the right
son of a tree, the subtree is there, at hand, immediately accessible.
In this method of representing a graph the next node has to be searched
for in the table before it is accessed.  This is in fact simulating a
store, only without the advantage of immediate access.

Gadi
--
Gad Aharoni                                              TEL: +972-2-584932
BITNET: gadi@humus.bitnet           CSNET & INTERNET: gadi@humus.huji.ac.il
Snail: Dept. of CS, Hebrew University of Jerusalem, Jerusalem 91904, Israel

chl@cs.man.ac.uk (Charles Lindsey) (10/15/90)

In <65664@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

>........  Graphs are easy
>to implement as recursive data structures without any user-visible
>variable whose value is an address - that is to say: without explicit
>pointers.

How?

anw@maths.nott.ac.uk (Dr A. N. Walker) (10/16/90)

In article <65409@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>[...]						  But, addresses
>aren't particularly important to me.  I almost never need to know
>or manipulate them in any way.  So, why do I need a data type whose
>values are addresses?

    (a) Because, although *you* may not, *others* may, or may think
	they do, or may simply prefer to think of things that way.

    (b) Because, although I may never want to manipulate addresses
	in the bit-twiddling way you seem to be suggesting, I *do*
	quite often want fingers.  If I search a data structure for
	elements with some property (the "largest" node, or the pair
	that are furthest apart, or whatever) it is reasonable to
	keep fingers on the results.  You can certainly disguise the
	pointer-ness of what I am doing, but I don't see why you should
	have to, or indeed want to.

> [...]							  We are no longer
>in the days when anyone _cared_ about the absolute addresses of anything.

	(*Some* people still do, as PCG points out elsewhere.)

>Why have a data type which purports to represent them?

	In most HLLs, no such representation is purported.  A pointer is
simply a way in which one object can make reference to another, which
sounds like a pretty useful abstraction to me.

In article <65662@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>[...]	     The rank and extent of an array is fixed for the lifetime
>of the array [...]

	This will be news to Algol programmers.

> This means that the concatenate operator is [...] of doubtful use
> for arrays [...].

	MODE STRING = [1: 0 FLEX] CHAR;
	STRING s := "tomorrow";
	s +:= 2 * (" and " + s) + " creeps in ..."
	C apologies to Shakespeare! C

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

anw@maths.nott.ac.uk (Dr A. N. Walker) (10/16/90)

In article <1990Oct10.195202.2340@d.cs.okstate.edu> norman@d.cs.okstate.edu
(Norman Graham) writes:
>[...]					    you may want to read this
>Miranda definition of a 3-ary tree (just to be different).

	I keep trying to get interested in Miranda, but "they" want real
money for it, which is bad news for a Maths Dept.

>  three_tree ::= Leaf num | Node int three_tree three_tree three_tree

	Well, that's fine for an acyclic structure [though not the way
I usually *think* of trees] and a lazy evaluator, but it doesn't make
sense to me in the form

	three_graph ::= Node int three_graph three_graph three_graph

Where does it all end, even lazily?  How do I express the identity of
one node with another [ie, perhaps the same node, but reached via some
chain of edges]?  How do I *draw* it?  What is *anyone*'s objection to

	MODE NODE = STRUCT (INT info, REF NODE left, right, middle) ?

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

anw@maths.nott.ac.uk (Dr A. N. Walker) (10/16/90)

In article <65665@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>From article <1990Oct11.100302.19258@maths.nott.ac.uk>, by
>anw@maths.nott.ac.uk (Dr A. N. Walker):
>> 	In sensible languages (like Algol), storage allocation and,
>> more importantly, deallocation is monitored by the compiler and the
>> run-time support.  So this bug is detected mostly as a compilation
>> error, occasionally as a run-time fault, but in any case *when it
>> happens*, not when the program inexplicably fails some time later.
>
>Yes.  But at considerable overhead.  The fastest method I know of is a
>reference counting scheme [...]

	[The "bug" referred to is a pointer mistakenly left dangling
into deallocated storage.]

	If pointers may not point into storage with shorter lifetime
[the simplest rule to enforce], then most illegal uses (eg:  returning
a pointer to local storage from a subroutine;  making a global pointer
refer to a local variable) are detected by the compiler.  Only when
procedure parameters are abused in a somewhat devious way is an
expensive check needed.

	The overhead comes from garbage collection.  If your program
happens not to fill your entire virtual machine, you may (and I often
do) get away scot free.  If you do fill the machine, there may then
be a short glitch [which may or may not amount to more or less than
the accumulated minor glitches of doing lots of smaller deallocations
by hand].  This is bad news if you're shutting down a nuclear power
station or landing the space shuttle, but that's a different story,
and most of us don't need to worry about such things.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

lowry@arnor.uucp (10/16/90)

In article <1990Oct10.101527.2247@maths.nott.ac.uk>, anw@maths.nott.ac.uk (Dr A. N. Walker) writes:
|> In article <1990Oct8.135551.21639@arnor.uucp>
|> lowry@lusitania.watson.ibm.com (Andy Lowry) writes:
|> >I personally have written a great deal of code in Hermes and have
|> >found the lack of pointers to be no hindrance.
|> [...]
|> >Of course, our compiler and run-time system make heavy use of pointers
|> >and shared data for reasons of efficiency, but this is completely
|> >hidden from the Hermes programmer.
|> 
|> 	In other words, here is a construct which is incredibly
|> useful for writing compilers and other programs, but which the
|> ordinary programmer is unable to use.

The point I meant to make is that indirect addressing is a fact of
life in most of today's computer architectures.  Therefore, pointers
will certainly be found in efficient implementations of many programs.
This does not mean that the pointers need to show through in the
languages in which the programs are written.  Many computers also have
segment registers, process status registers, data caches, and other
such features, but I'm not ordinarily aware of them when I write
programs.  When we implemented the Hermes run-time system, we did, in
fact, make heavy use of pointers.  This is because we wrote the system
in C, where pointers are indispensable.  We believe that we have
designed Hermes in such a way that pointers are not indispensable.

|> 	When, as a mathematician [well, as a game player, really],
|> I think of a tree or a graph, I think naturally and easily of each
|> node as being some information plus a collection of pointers to other
|> nodes.  I draw them that way.  You may think of them differently;
|> but I think I should be allowed, within my chosen language, to program
|> them my way, and not have to pummel them into some other shape.

I expect you actually think of them as a collection of nodes and arcs.
At least that's how they're normally presented in mathematical books
and articles.  Computer programmers typically represent the arcs as
arrays or lists of pointers to successor nodes.  This moves away from
the conceptualization and biases the implementation in some very
significant ways.  For example, without substantial extra baggage,
such a representation will never support following the arcs backward
as easily or cheaply as forward.

The Hermes programmer could represent a graph precisely in the
standard conceptual manner--as a collection of nodes and a collection
of arcs, where each arc is a record containing two node id's.  The
compiler and run-time system would choose representations and
algorithms based on a number of factors, perhaps including observed or
anticipated access patterns, hints from the programmer, etc.  If the
access patterns were to change, a recompilation could adapt these
choices as needed to maintain efficiency.  This is much cheaper and
less error-prone than re-engineering a C program because reverse
traversals are suddenly needed in a structure that has only forward
pointers.
-- 
Andy Lowry, lowry@ibm.com, (914) 784-7925
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY  10598

pcg@cs.aber.ac.uk (Piercarlo Grandi) (10/16/90)

On 13 Oct 90 20:43:18 GMT, jlg@lanl.gov (Jim Giles) said:

jlg> From article <PCG.90Oct13114804@odin.cs.aber.ac.uk>, by
jlg> pcg@cs.aber.ac.uk (Piercarlo Grandi):

pcg> I most strongly object to calling this 'aggressive optimization'. It makes
pcg> 'aggressive optimization' look respectable.

pcg> Aggressive optimization is where the optimizer rewrites your program in
pcg> a supposedly similar form behind your back. [...]


jlg> Grandi comes up with this one periodically.  I can never pin him down
jlg> on a meaning.

I am sorry about this -- it is a fairly imprecise concept and I seem
unable to express it clearly, even though I have given several
examples. I doubt it can be defined precisely, even if certainly more
precisely than pornography... :-)

jlg> A compiler is a tool which transforms one representation of a
jlg> program (the source) into a semantically equivalent form (the
jlg> object code) in another language.  If the compiler changes the
jlg> semantics of the program during this process, it is _BROKEN_.  The
jlg> optimizer, as part of the compiler, must obey this same constraint.
jlg> If the optimizer changes the semantics of a code, it is _BROKEN_.

Yes, as long as the result is correct, an optimizer can do what it
wants. But here we are not interested in what a correct optimizer can
do; we are interested in what is proper or cost-effective for an
optimizer to do. In particular, if there is reason to reckon that certain
types of optimization are hard to get right (that is, the generated code
is incorrect with higher probability), they may not be very cost
effective. In practice the probability of code being generated
incorrectly is indeed an increasing function of optimizer complexity,
and of the level of semantic restructuring performed by it.

So the statement:

jlg> An optimizer may be as "aggressive" as the compiler implementor
jlg> wants to make it - as long as it doesn't alter the semantics of the
jlg> code it's translating.

is true but not interesting -- it is a tautology. The interesting
question is which transformations, when going from source to target
code, are worth doing, performance and reliability wise.

Maybe I can find a more understandable way of expressing my thought:

I reckon, with some supportive reasoning that I will not repeat here,
that the transformations that do not involve any "horizontal" rewriting
of a program's logic are worthwhile. By this I mean transformations that
preserve the structure of the algorithm being translated.

If the structure is relatively underspecified (a very high level
language) then there is considerable scope for clever implementation
strategies. If it is relatively precisely specified (C, other systems
implementation languages) then it should be respected. I also regard
attempts at reinterpreting low level languages as high level ones, to
get around this, as dangerous, or at least inferior to using languages
designed at the desired level in the first instance.

Aggressive optimization is the idea that restructuring a program *in the
course of translation* is feasible, advantageous, and cost-effective.

My contention is that logic-restructuring optimizations are more
cost-effective at the source level, whether automatic or manual, and
that manual rewriting is often sufficient, making automation
unnecessary.

My main reason for surmising so is that automatic program analysis and
rewriting is often far more difficult than code planning and synthesis,
and the benefits are not as often competitive with those of more limited
and safer alternatives (source analysis and rewriting).

More briefly: when -O[2345....] will not substantially raise the
probability of getting broken code in most compilers, when it will not
result in huge increases in compile time or space, and when it will give
substantially better results than hand-tuned code, then I will BELIEVE!
:-).
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

mjs@hpfcso.HP.COM (Marc Sabatella) (10/17/90)

>>[...]						  But, addresses
>>aren't particularly important to me.  I almost never need to know
>>or manipulate them in any way.  So, why do I need a data type whose
>>values are addresses?
>
>    (a) Because, although *you* may not, *others* may, or may think
>	they do, or may simply prefer to think of things that way.

Count me in.  I write pretty low level code.  Dynamic linker/loaders, etc.
Darn tootin' I care about addresses.  Pretty useful when doing symbolic
relocation.

--------------
Marc Sabatella (marc@hpmonk.fc.hp.com)
Disclaimers:
	2 + 2 = 3, for suitably small values of 2
	Bill and Dave may not always agree with me

peter@ficc.ferranti.com (Peter da Silva) (10/18/90)

In article <1990Oct15.204343.2907@arnor.uucp> lowry@lusitania.watson.ibm.com (Andy Lowry) writes:
> The Hermes programmer could represent a graph precisely in the
> standard conceptual manner--as a collection of nodes and a collection
> of arcs, where each arc is a record containing two node id's.

...each of which node-ids is a pointer, no? It's not a machine address, but
it's an identifier that can reference any arbitrary node... it's a pointer
into a bounded collection of objects, but you still have to worry about
aliasing within that collection. The problem of aliasing is alleviated,
but not solved.

(basically, you've switched from C to lisp)
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

nevin@igloo.scum.com (Nevin Liber) (10/18/90)

[<sigh> I moved this thread over to comp.lang.misc about a week and a
half ago; then our machine (igloo) took a few power hits.  I missed most of the
real details and find myself left with a flame war.  Just my luck.]

In article <393@shum.huji.ac.il> shum!gadi@lilac.berkeley.edu (Gad Aharoni) writes:
>In article <65664@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>
>>... Graphs are easy
>>to implement as recursive data structures without any user-visible
>>variable whose value is an address - that is to say: without explicit
>>pointers.

>Ok. Show me how you would implement a graph data structure without using
>pointers, and when I say pointers I also mean array indices.

All that is needed are the stack operations push and pop and the queue
operations put and get; with those, it should be fairly easy to
implement a graph data structure.  If this weren't true, it would be
impossible to implement graphs in Icon and LISP.

I am not looking for "C without pointers".  What I feel is that,
sometime in the future, our computer languages (if that is how
we program computers at all :-)) will have enough higher-level
constructs (such as stack and queue operations on arbitrarily complex
objects), that pointers will be superfluous.

Pointers will, of course, always remain on the implementation level.
But on the programming level, they tend to get messy.  I would _much_
rather deal with references than pointers.  Once you have references
and high-level data structures and operations, what do you still need
pointers for (other than hardware manipulation, of course)?
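
A minimal C sketch of that division of labour (the queue type and the
put/get names here are mine, echoing the operations named above, not
anything from a particular language): the interface passes only values,
and the linked-list pointers stay entirely inside the implementation.
C being C, the queue handle itself still travels by address; in a
language with built-in reference semantics even that would disappear.

```c
#include <stdlib.h>

/* Illustrative sketch: callers traffic only in int values; the
   pointers live entirely inside the queue's implementation. */

struct qnode { int value; struct qnode *next; };
struct queue { struct qnode *head, *tail; };

void put(struct queue *q, int v)
{
    struct qnode *n = malloc(sizeof *n);
    n->value = v;
    n->next = NULL;
    if (q->tail)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
}

int get(struct queue *q)    /* returns a value; no node pointer escapes */
{
    struct qnode *n = q->head;
    int v = n->value;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    free(n);
    return v;
}
```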

A related question:  why _don't_ languages such as LISP and Icon (and
probably Smalltalk; I haven't used it enough to be able to comment on
it) have pointers on the programming level?  What classes of problems
can't be solved in these languages that can in say, a strictly
conforming ANSI C program involving pointers (other than OS specific
and hardware-specific stuff, of course)?
-- 
	NEVIN ":-)" LIBER
	nevin@igloo.Scum.com  or  ..!gargoyle!igloo!nevin
	(708) 831-FLYS
California, here I come!	Q: Who killed Laura Palmer?  A: Waldo did it.

lowry@arnor.uucp (10/19/90)

In article <=GH6UHD@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
|> In article <1990Oct15.204343.2907@arnor.uucp> lowry@lusitania.watson.ibm.com (Andy Lowry) writes:
|> > The Hermes programmer could represent a graph precisely in the
|> > standard conceptual manner--as a collection of nodes and a collection
|> > of arcs, where each arc is a record containing two node id's.
|> 
|> ...each of which node-ids is a pointer, no? It's not a machine address, but
|> it's an identifier that can reference any arbitrary node... it's a pointer
|> into a bounded collection of objects, but you still have to worry about
|> aliasing within that collection. The problem of aliasing is alleviated,
|> but not solved.
|> 
|> (basically, you've switched from C to lisp)

Not really.  Just because I have the same integer value stored in two
different variables, does that mean they are aliased?  Just because I
have the same node ID stored in two different variables, are those
variables aliased?  I can change one without the other being affected.
I would say they are not aliased.  Hermes guarantees that it is
impossible to change the value of one variable as a side-effect of
changing the value of another.  This is what we mean when we say
Hermes does not allow aliasing.

In Lisp, this is not the case.  I can write:

	(setq x '(1 2 3))
	(setq y x)
	(rplaca x 'a)

It's true that if x and y contain node ID's, then I can retrieve the
node identified by x and change it, and now if I retrieve the node
identified by y I will see the change.  But I did not change the value
of y.
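
The distinction has a direct analogue in C (the sketch and names below
are mine): copying a pointer copies the identifier, so mutation through
one name shows through the other, exactly as with the Lisp cells above;
copying the struct itself yields an independent value.

```c
struct cell { int car; };

/* Pointer copy: y gets the same identifier x holds, as in (setq y x). */
int shared_update(void)
{
    struct cell c = { 1 };
    struct cell *x = &c;
    struct cell *y = x;     /* y's own value never changes below       */
    x->car = 99;            /* like (rplaca x 'a): mutate the one cell */
    return y->car;          /* the change shows through y: 99          */
}

/* Struct copy: z is an independent cell; mutating c is invisible to it. */
int copied_update(void)
{
    struct cell c = { 1 };
    struct cell z = c;      /* value copy, not identifier copy */
    c.car = 99;
    return z.car;           /* still 1 */
}
```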

Perhaps one could say that the distinction is a bit muddy, and a
stronger guarantee than the one we make might be useful in some
situations.  Nevertheless, the guarantee we do make is extremely
useful, in that it enables our compiler to perform very strong checks
to ensure that no Hermes process can ever affect another in a way that
could not have been anticipated by looking at the code, knowing
nothing of the underlying implementation.  For example, a bug in one
process can never cause random behavior like crashes to occur in other
processes, even if they are executing in the same address space.  In
addition, though we have not explored this to any great extent, we
believe that our non-aliasing guarantee will enable many compiler
optimizations that cannot be performed in the presence of aliasing.

-- 
Andy Lowry, lowry@ibm.com, (914) 784-7925
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY  10598

jlg@lanl.gov (Jim Giles) (10/19/90)

From article <=GH6UHD@xds13.ferranti.com>, by peter@ficc.ferranti.com (Peter da Silva):
> [... description of a particular structure ...]
> ...each of which node-ids is a pointer, no? It's not a machine address, but
> it's an identifier that can reference any arbitrary node... it's a pointer
> into a bounded collection of objects, but you still have to worry about
> aliasing within that collection. The problem of aliasing is alleviated,
> but not solved.

Yes, but this alleviation is important.  That's why the C habit of
changing arrays to pointers in procedure calls is a _really_ bad idea.
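
A C sketch of the hazard (function and data are mine): once an array
argument has decayed to a pointer, the callee cannot tell whether its
parameters overlap, and even a const qualifier is no protection, since
writes through dst may still change what src points at.

```c
/* dst and src may overlap; the compiler must assume that they do. */
void scale(int *dst, const int *src, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = 2 * src[i];   /* src[i] may already have been rewritten */
}

/* Calling it on overlapping parts of one array cascades the doubling:
   with a[] = {1,2,3,4}, scale(a+1, a, 3) leaves {1,2,4,8}, where the
   non-overlapping result would have been {1,2,4,6}. */
```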

J. Giles

anw@maths.nott.ac.uk (Dr A. N. Walker) (10/19/90)

In article <1990Oct15.204343.2907@arnor.uucp> lowry@lusitania.watson.ibm.com
(Andy Lowry) writes:
>The point I meant to make is that indirect addressing is a fact of
>life in most of today's computer architectures.  Therefore, pointers
>will certainly be found in efficient implementations of many programs.
>This does not mean that the pointers need to show through in the
>languages in which the programs are written.

	True.  But no-one has yet given a convincing (to me!) reason
why pointers should *not* show through.

>					       Many computers also have
>segment registers, process status registers, data caches, and other
>such features, but I'm not ordinarily aware of them when I write
>programs.

	Umm.  Not directly, perhaps;  after all, many such features are
transparent even at the assembler level.  However, you may soon have a
poor data locality, for example, brought indirectly to your attention
when the data cache ceases to be effective -- I have seen programs very
suddenly run over 100 times more slowly when an array size crept over
some limit.  [Admittedly, this was in 1965;  but I expect Jim Giles sees
it happen even today when vectorisation fails.]

>	    When we implemented the Hermes run-time system, we did, in
>fact, make heavy use of pointers.  This is because we wrote the system
>in C, where pointers are indispensible.  We believe that we have
>designed Hermes in such a way that pointers are not indispensible.

	Presumably an acid test will come when the system is written
in Hermes and becomes self-supporting.

>I expect you actually think of them as a collection of nodes and arcs.

	["you" is me, and "them" is trees and graphs]

>The Hermes programmer could represent a graph precisely in the
>standard conceptual manner--as a collection of nodes and a collection
>of arcs, where each arc is a record containing two node id's.

	Indeed.  But the first actual implementation of graphs that
I clapped my eyes on started out something very like

	MODE GRAPH = STRUCT (VERTICES v, EDGES e),
	     EDGE = STRUCT (REF VERTEX a, b),
	     VERTEX = ...etc...

which seems not so far away.  The distinction between "each arc is a
record containing two node id's" and "each arc is a record containing
two pointers to nodes" seems rather slight, and amounts to "who provides
the node ids?".

> [recompilation]				 is much cheaper and
>less error-prone than re-engineering a C program because reverse
>traversals are suddenly needed in a structure that has only forward
>pointers.

	On the other hand, if reverse traversals are suddenly needed
in a program that was designed on the assumption that they weren't
necessary, re-engineering might be thought highly desirable.

	There are also some problems with compilers that change
algorithms behind the programmer's back, depending on some complicated
analysis.  Suppose my program has a bug that is benign in some
implementations, but not in others.  Lo and behold, all my test programs
work, the production run fails, and as soon as I try to isolate the bug,
it goes away.  (Yes, I know we all have bugs like that sometimes!)

	[Nothing in this should be read as implying that I'm agin the
Hermes project.  What I've heard about it sounds interesting, and
systems that raise levels of abstraction should be encouraged.  I
just don't see pointers (in particular) as undesirable aliens that
*should* be abstracted away.  They are useful citizens, just like
integers and procedures, to be harnessed with care.]

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

peter@ficc.ferranti.com (Peter da Silva) (10/20/90)

In article <152323@felix.UUCP> asylvain@felix.UUCP (Alvin "the Chipmunk" Sylvain) writes:
> Nope.  C (at least) allows for variables to be 'static'.  No need
> for side-effects to maintain the function's internal state.

Those *are* side-effects, since they mean the same function may return
different values on successive calls with the same calling sequence. This
has the same effects on predictability and optimisation as more obvious side
effects.
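
A two-line C illustration of the point: the function below takes no
arguments at all, yet returns a different value on every call. The
static variable is exactly the hidden state at issue.

```c
/* No arguments, yet no two successive calls agree: hidden state. */
int next_ticket(void)
{
    static int n = 0;    /* retains its value between calls */
    return ++n;
}
```
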
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

pcg@aber-cs.UUCP (Piercarlo Grandi) (10/22/90)

In article <1990Oct15.175924.14455@maths.nott.ac.uk> anw@maths.nott.ac.uk
(Dr A. N. Walker) writes:

  In article <1990Oct10.195202.2340@d.cs.okstate.edu> norman@d.cs.okstate.edu
  (Norman Graham) writes:

  >[...] you may want to read this
  >Miranda definition of a 3-ary tree (just to be different).
  
  I keep trying to get interested in Miranda, but "they" want real
  money for it, which is bad news for a Maths Dept.

Try Haskell or SML. Not terribly different. Much the same technology.

  >  three_tree ::= Leaf num | Node int three_tree three_tree three_tree
  
  Well, that's fine for an acyclic structure [though not the way I usually
  *think* of trees] and a lazy evaluator, but it doesn't make sense to me in
  the form
  
  	three_graph ::= Node int three_graph three_graph three_graph
  
  Where does it all end, even lazily?

Ahem. It does not end. There is no problem though. It is just an incorrect
definition. Let's rewrite it like this:

	graph = NULL: empty | roots
	roots = node | roots node

	node  = NODE: info LINKS: nodes
	nodes = empty | nodes node

Observe that this is very much like a syntax. Indeed you can use a two-level
grammar to model anything at all (book by Tzischritis and Uzgalis), and it
does generate infinite syntaxes (needed to model context dependency), not
just infinite syntax trees (needed to model graphs). This is not a
difficulty, because you never need to actually expand the grammar in an
infinite way, given a finite data structure or program.

  How do I express the identity of one node with another [ie, perhaps the
  same node, but reached via some chain of edges]?  How do I *draw* it?

This is a thing on which I have already commented, but I love repeating
myself. The correct way to implement identity is by contents, not location.
If you encode part of the contents in the location, that's an implementation
choice, not a necessary thing. It is also not very nice.

  What is *anyone*'s objection to
  
  	MODE NODE = STRUCT (INT info, REF NODE left, right, middle) ?

That instead of manipulating just values of type NODE you also have
to deal with values of type REF NODE. This complicates life considerably.
Consider the relational version in some pseudo notation:

	domain source,target: SOMETYPE

	left(source*,target)
	right(source*,target)
	middle(source*,target)

Now you say: but this means that I cannot have two nodes with the same info!
Precisely. We are *assuming* that info identifies a node. Put into info
whatever detail serves to identify the node.

Then you say: but this is inefficient! the same info is replicated up to six
times! the links between a node and its neighbours are spread across three
relations! This is a matter for the implementation. It can keep a table of
infos, and keep the three relations clustered either by first or second
column, depending on which is thought to be the preferred join column.

Note that in your pointer-based *implementation* the join column is assumed
to be the first, because your NODE struct is clustered physically by
outgoing pointer, not incoming pointer; you would have to write

	MODE NODE = STRUCT(INT info, FLEX [] REF NODE incoming)

to implement the graph clustered by incoming pointer.

Indeed you can model any graph in relational database technology by saying
something like

	domain node : SOMETYPE
	domain link : OTHERTYPE

	source(node,link*)
	target(link*,node)

This also clearly models the notorious symmetry between nodes and links, and
that links are not just pointers in the general case.

It all depends on what you really want. Data design is more difficult and
less clear with pointers, but implementations are more efficient, because
pointers require fewer levels of abstraction.

Note that nobody I know (except for me I mean :->) addresses the issue of
data design in programming languages, yet many programs deal with internal
databases with considerably more complicated implied schemas than those
of many DBMS applications.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

lowry@arnor.uucp (10/22/90)

In article <1990Oct19.160210.9787@maths.nott.ac.uk>, anw@maths.nott.ac.uk (Dr A. N. Walker) writes:
|> 	There are also some problems with compilers that change
|> algorithms behind the programmer's back, depending on some complicated
|> analysis.  Suppose my program has a bug that is benign in some
|> implementations, but not in others.  Lo and behold, all my test programs
|> work, the production run fails, and as soon as I try to isolate the bug,
|> it goes away.  (Yes, I know we all have bugs like that sometimes!)

You're right, I have experienced many such bugs in my time.

I can see such a scenario coming about for three reasons: (1) there is
a bug in the program, but in some implementations it does no damage
and has no visible effect on the program (e.g. the program fails to
initialize a counter to zero before referencing it, but since the
operating system happens to zero static variables during program load,
things work out); (2) the program has different resource requirements
under different implementations, and therefore the two implementations
act differently in the face of resource depletions; (3) the compiler
introduces a bug when applying its transformations.

Category 1 is addressed in Hermes by a new compile-time mechanism we
call "typestate checking."  We use dataflow analysis techniques to
track static characteristics of variables such as their initialization
state and the case of variants.  Each operation of the language has
typestate preconditions, which define what typestate conditions must
hold for the operation's operand variables, and typestate
postcondition rules which define how the operation affects the operand
typestates.  The compiler uses the postconditions to compute a
fixed-point typestate at every point in the program, and rejects any
program that attempts an operation with operands that are not in the
correct typestate.  In addition, when two or more program paths merge,
a meet operation (in the typestate lattice) is performed to compute a
single typestate for the merge point, and "coercion" operations are
inserted on all the incoming paths to lower their individual
typestates to the computed meet.  One effect of the coercions is that
the program is augmented at compile time to automatically dispose of
its data values at appropriate points.  We get automatic garbage
collection without the run-time cost normally associated with this
feature.

Typestate checking eliminates category 1, because it makes it
impossible to write programs that do not have a precise,
implementation-independent meaning.  Thus, in the absence of compiler
bugs, and given adequate resources, one is guaranteed that two
implementations of the same buggy program will both exhibit the bug.

Category 2 is a difficult problem.  If memory is tight, an
implementation that keeps lots of auxiliary data structures to speed
certain operations may fail whereas a slower implementation that
requires less memory will succeed.  In Hermes we provide a built-in
exception called "Depletion" that is meant to handle resource
depletions, such as: memory depletion, computing a numerical value
exceeding hardware's range or accuracy limits, insufficient sockets
available for network communications, etc.  The Hermes programmer is
not spared these exceptions (though we have some ideas on how this
might be achieved).  Implementations are free to use whatever tricks
are available to avoid depletions (like swapping real memory to disk,
migrating data to other network sites, using extended-precision
arithmetic software, etc.), but we don't make it a requirement for
language conformance.  The pragma mechanism I've mentioned before may
be able to address many of these situations by allowing the programmer
to encourage the compiler, for example, to be sparing in its memory
utilization.

Category 3 is a problem in any language except machine language (even
assemblers can introduce addressing errors).  The best I know how to
do is to continue paying especially close attention to compilers and
other low-level software.  Adopting languages and methodologies for
this software that reduce the frequency of bugs (like typestate
checking) will help, of course.
-- 
Andy Lowry, lowry@ibm.com, (914) 784-7925
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY  10598

chl@cs.man.ac.uk (Charles Lindsey) (10/22/90)

In <2958@igloo.scum.com> nevin@igloo.scum.com (Nevin Liber) writes:

>A related question:  why _don't_ languages such as LISP and Icon (and
>probably Smalltalk; I haven't used it enough to be able to comment on
>it) have pointers on the programming level?  

Smalltalk DOES have pointers on the programming level.  It's just that
everything you declare (apart from a few specials like integers) is a pointer
whether you like it or not, so you do not notice the fact until some
unintended aliasing occurs.  Eiffel is (more or less) the same.

cik@l.cc.purdue.edu (Herman Rubin) (10/23/90)

In article <5EJ64J3@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
> In article <152323@felix.UUCP> asylvain@felix.UUCP (Alvin "the Chipmunk" Sylvain) writes:
> > Nope.  C (at least) allows for variables to be 'static'.  No need
> > for side-effects to maintain the function's internal state.
> 
> Those *are* side-effects, since they mean the same function may return
> different values on successive calls with the same calling sequence. This
> has the same effects on predictability and optimisation as more obvious side
> effects.

I have yet to see a random number, or pseudo-random number, procedure which
did not exploit this.  The same is true for uses of buffers, reading external
media, etc.  It is also the case when one makes calls by reference, and
uses code to change the values of the arguments.  This even applies to
a subroutine to multiply two matrices.
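
For instance, the textbook linear congruential generator (the sketch is
mine; the multiplier and increment are the well-known example constants
from the ANSI C rand() description) is nothing but a static variable
updated as a side effect of every call:

```c
/* Same call, different answer each time: the whole point of the
   procedure is the side effect on the hidden seed. */
unsigned lcg(void)
{
    static unsigned seed = 1;
    seed = seed * 1103515245u + 12345u;
    return seed;
}
```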

This means it is the programmer who must decide, and pass on the information
to the compiler, about the side-effects.  Sometimes, but not always, the
compiler can tell by looking at the global code.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)

anw@maths.nott.ac.uk (Dr A. N. Walker) (10/26/90)

In article <2062@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>In article <1990Oct15.175924.14455@maths.nott.ac.uk> anw@maths.nott.ac.uk
>(Dr A. N. Walker) writes:
	[ of a recursive pseudo-definition of regular graphs ]
>  Where does it all end, even lazily?
>
>Ahem. It does not end. There is no problem though. It is just an incorrect
>definition. Let's rewrite it like this:
	[ gives syntactic definition of a graph ]

	Well, that's fine, and one way I might well do it myself, but it
seems to be not the way that other people are telling me I ought to do it!

>  How do I express the identity of one node with another [ie, perhaps the
>  same node, but reached via some chain of edges]?  How do I *draw* it?
>
>This is a thing on which I have already commented, but I love repeating
>myself. The correct way to implement identity is by contents, not location.

	But why should I have to invent a contents, just so that I can
avoid referring to location?  If I draw a graph with lots of nodes, and
put some information (perhaps a colour, RED, GREEN, BLUE, ...) in each,
I don't necessarily want the contents to be unique.  On the other hand,
if I'm explaining my drawing to you, I can point to nodes: "Look, *this*
node connects to *that*, and so there's a path from *here* to *there*".
There's nothing wrong with location!

>  What is *anyone*'s objection to
>
>  	MODE NODE = STRUCT (INT info, REF NODE left, right, middle) ?
>
>That instead of manipulating just values of type NODE you also have
>to deal with values of type REF NODE. This complicates life considerably.

	But in almost every programming language, in addition to values
of type INT, I have to deal with values of type REF INT (integer variables
in most languages).  Yes, it complicates life, but it's usually thought
to be worth it.  Extending to REF REF INT (pointers), and other REF types,
generalises and simplifies the construct.  It's just a shame that so many
programmers fail to appreciate this (often because their first language
managed to muddy the waters).

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

pcg@cs.aber.ac.uk (Piercarlo Grandi) (10/31/90)

On 26 Oct 90 15:59:37 GMT, anw@maths.nott.ac.uk (Dr A. N. Walker) said:

anw> How do I express the identity of one node with another [ie, perhaps
anw> the same node, but reached via some chain of edges]?  How do I
anw> *draw* it?

pcg> This is a thing on which I have already commented, but I love
pcg> repeating myself. The correct way to implement identity is by
pcg> contents, not location.

anw> But why should I have to invent a contents, just so that I can
anw> avoid referring to location?

You are not inventing a content -- what you call "location" here is part
of the contents, it is the relationship between a content and another.
It is also a content, at some higher level of abstraction.

anw> If I draw a graph with lots of nodes, and put some information
anw> (perhaps a colour, RED, GREEN, BLUE, ...)  in each, I don't
anw> necessarily want the contents to be unique.

By definition, if two entities have the same contents they are the same
entity, unless they are elementary particles and Bell's Theorem applies
:-). What actually happens is that then you put only part of the
contents in the actual data storage.

anw> On the other hand, if I'm explaining my drawing to you, I can point
anw> to nodes: "Look, *this* node connects to *that*, and so there's a
anw> path from *here* to *there*".  There's nothing wrong with location!

Then part of the contents is in the location. This is just an
implementation technique, not a concept; and it is an implementation
technique that complicates matters. When I design an entity I want to be
able to know whether it is the same or different from another entity. I
want the entity to contain all information that distinguishes it from
any other. If I do not encode information in the location I can just
look at the entities and compare them; if I encode part of the
information in the location I also have to compare their locations.

This adds a whole new world of complication, because we need no longer
just an algebra for objects, but also an algebra for locations, and we
must prove that a program not only works correctly with respect to contents, but also to
locations, and that the encoding of parts of the contents in the
location is preserved correctly.

This is worthwhile if we are working close to the machine, because
that is how the machine is built. It is crazy if we are using a high
level language, like SQL or APL or whatever else like it.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

tom@nw.stl.stc.co.uk (Tom Thomson) (11/01/90)

I cannot understand why Jim Giles wants pointers to have values which
are addresses.  The important thing about a pointer is simply that it
identifies (unambiguously) the thing to which it refers.  The only
operations on a pointer are (a) getting the thing to which it refers;
(b) modifying the value of the thing to which it refers; (c) checking
whether two pointers refer to the same thing.  There is of course no
operation that makes the pointer point to something else (that's an
operation on a pointer-valued variable which assigns a new pointer to
it; think of an integer-valued variable - - if x has the value 3, and
the assignment x:=4 is executed, this does not assign the value 4 to
the integer 3; pointers are no different from integers in this respect
- - and yes, I have come across the broken Fortran compilers that 
allow you to assign a new value to an integer constant :-)  ).
There is no earthly reason why a pointer should support any of the 
extra operations Jim says it has to support (eg ordering), and 
certainly no reason why an implementation should be constrained to
implement it as an address. I wonder if all OO languages implement references  
to an object as addresses?  After all, a reference to an object is no more or
less than a pointer to it.  Do SQL systems implement all identifiers (keys
which are constrained to be unique) as addresses?  Well, I suppose that 
depends on what you mean by an address - - if you regard tables as being
content-addressed they do!  (PierCarlo seems to advocate that all 
identification of objects should be by content - - fair enough at some
level of description, but it's nice to have syntax which allows us to
distinguish a part of the content which can be used to identify the
object and call that part a pointer and have nice syntax for using
it as the access route to the object; and he appears to object to that).
 
Jim's arrays and indices down them are all very well, but they certainly
do not solve the aliasing issue he claims they solve; if I write 
  a[i] := a[j] the compiler does not in general know whether i and j
have different values, ie whether i and j are distinct.  Once you start
using an array of cells to implement a graph structure, with indices
used as pointers, you have exactly as much information about aliasing 
as if you used pointers instead of indices - no more, and no less;
however you as the programmer now have the problem of assigning the 
cells positions in the array, doing your own garbage-collection as
cells become free, coping with types whose values don't all occupy
the same amount of store (so a cell may occupy multiple slots in the
array) and so on; to get the language system to hide all that from you
and do it automatically, you need pointers.  It's easy to see, of course,
that all Jim's arrays do is partition the store into a set of non-overlapping
regions - so a pointer written a[i] is known to be distinct from a
pointer written b[i]; but there is no difficulty at all in partitioning
the store and then constraining pointer-valued variables to point into
one partition or another: so if p1 is a pointer constrained to point into
region r1 and p2 is a pointer constrained to point into region r2 the
compiler knows that p1 and p2 are not aliases of each other.
 
There may be a case for low-level languages to have a pointer construct
which is constrained to be mapped as an address; for most programming
purposes, I would prefer to work in a high level language.  I claim that
any high level language (other than a single assignment language) must
have a pointer concept (although it may call it something else, and may
use search on content as the pointer resolution mechanism) since otherwise
it cannot express referential sharing and identity - - and if I have
mutable objects (ie not a single assignment language) then I must be able
to determine which refers to which so that I control whose references are
modified by an update (incidentally, the implementation may use accidental
sharing to optimise storage usage, and move one of the objects sharing
store when it changes while the other objects sharing that store don't. If
the implementation of pointer were address, as Jim insists, the representation
of the pointer to that object - and all copies of that representation -
would have to change even though the pointer has not changed: it still
references the same object. So implementations which use accidental sharing
all violate Jim's definition of pointer.)
 
Tom Thomson   [tom@nw.stl.stc.co.uk]

gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) (11/02/90)

In article <3673@stl.stc.co.uk> "Tom Thomson" <tom@nw.stl.stc.co.uk> writes:

>Jim's arrays and indices down them are all very well, but they certainly
>do not solve the aliasing issue he claims they solve; if I write 
>  a[i] := a[j] the compiler does not in general know whether i and j
>have different values, ie whether i and j are distinct.

Er, actually the example I use from my code looks like this:

	a[i] = b[i] / c[i];
	d[i] = b[i] * c[i];

Fortran only loads b[i] and c[i] once.

C, if you confuse the issue by using pointers, will generally load
b[i] and c[i] twice. If the locations of b or c are passed in, no C
compiler that I know of will optimize it. You can talk about the
general case, and you can debate the definition of a pointer, but
don't make my code run 30% slower.

anw@maths.nott.ac.uk (Dr A. N. Walker) (11/03/90)

In article <PCG.90Oct30201101@teachk.cs.aber.ac.uk> pcg@cs.aber.ac.uk
(Piercarlo Grandi) writes:

	[Well, what he wrote, I found rather confusing (even assuming
that he means by Bell's Theorem what I mean by Bell's Inequalities).
I *think* it amounted to the following:

pcgp>	You shouldn't write "node := GREEN; ptr_to_node := node" because
pcgp>	"node" then doesn't include its location.  You should have instead
pcgp>	"node := { GREEN, node }" and then an associative search for
pcgp>	"{ GREEN, node }" will find node without the need for pointers.

(pcgp means PierCarlo Grandi Paraphrase).  In the following direct
extract, "This" means the version that I *shouldn't* write.]

>This adds a whole new world of complication, because we need no longer
>just an algebra for objects, but also an algebra for locations, and we
>must prove that a program not only works ok wrt to contents, but also to
>locations, and that the encoding of parts of the contents in the
>location is preserved correctly.

	But if content has to include location, then an algebra for
objects has to include an algebra for location, so the algebra and the
proofs already had to be supplied.  Furthermore, if I am denied access
to the algebra of locations, then some useful ideas like anonymous
objects, and like remembering locations, become inexpressible.

	In other words, either the PCGP bears no relationship to what
PCG *meant* to write, or else PCG's article was self-evident nonsense.
At the end of a difficult week, I'm in no fit state to decide which,
but I have my suspicions.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/04/90)

On 2 Nov 90 17:25:08 GMT, anw@maths.nott.ac.uk (Dr A. N. Walker) said:

anw> In article <PCG.90Oct30201101@teachk.cs.aber.ac.uk> pcg@cs.aber.ac.uk
anw> (Piercarlo Grandi) writes:

pcgp> You shouldn't write "node := GREEN; ptr_to_node := node" because
pcgp> "node" then doesn't include its location.  You should have instead
pcgp> "node := { GREEN, node }" and then an associative search for "{
pcgp> GREEN, node }" will find node without the need for pointers.

anw> (pcgp means PierCarlo Grandi Paraphrase).

Not exactly. You should write "node := (GREEN, node id);", where "node
id" is anything you use to tell one node from another. For example, its
list of incoming arcs. I would like to repeat my way to describe graphs
in a relation database:

	type node : node_info; # each node contains some info #
	type link : link_info; # each link contains some info #

	relation in(link*,node*); # * means these fields are part ... #
	relation out(node*,link*); # ... of the primary key #

Note that many links or nodes may have the same info, but a specific
link or node will be uniquely identified by context; if it cannot, e.g.
a two node graph like this:

		"B"
	   .-------------.
	  "A"		"A"
	   \_____________/
		"B"

then the graph is inherently ambiguous. It can be disambiguated by
noting that the "A" and "B" information content of the nodes is
incomplete, and should be really:

			("B",top)
	    .-------------------------------.
	("A",left)			("A",right)
	    \_______________________________/
			("B",bottom)

This representation also makes it obvious that each graph has a dual,
i.e. that it is an essentially symmetric structure, that the first
version of the example above is inherently ambiguous (and indeed a
relational database will not allow you to represent it), etc..., all
facts that are lost using pointers.

anw> In the following direct extract, "This" means the version that I
anw> *shouldn't* write.]

pcg> This adds a whole new world of complication, because we need no longer
pcg> just an algebra for objects, but also an algebra for locations, and we
pcg> must prove that a program not only works ok wrt to contents, but also to
pcg> locations, and that the encoding of parts of the contents in the
pcg> location is preserved correctly.

anw> But if content has to include location, then an algebra for objects
anw> has to include an algebra for location, so the algebra and the
anw> proofs already had to be supplied.

You got it the other way round: if pointers make a difference, this can
only mean that part of an object's content is implicit in its in-memory
location, not vice versa (what actually happens is that location includes
a bit of content). If the information encoded in the pointer value were
explicit in the object's contents, pointer values would become
irrelevant, and only object contents would matter; there would not be
any need for comparison operations on pointer values, only the usual
ones on contents.


Back to basics: objects in memory are representations for more abstract
entities. The representation is based on encoding, usually encoding onto
the integers (Hoare's essay in "Structured Programming").

The idea is that if two objects have the same content ("value" when seen
as an integer) they represent the same entity. You might say "no,
because they are at different locations, and I can change one but not
the other which makes them different".

This is pure nominalism; it simply implies that you have encoded part of
the representation in the storage occupied by the object and part in its
address. You never *need* to do so; you can always encode it explicitly,
and often it is better to do so, because it frees you from worry about
locations.

An example where pointers make a difference:

You have two lists, one of names of american states and another of
nuclear submarines. If you get a pointer to the string "Ohio" you don't
know if you are referring to a submarine or a state, unless you
implement the two lists in two different memory areas of which you know
the bounds, and look at the pointer value to see in which area it falls.

You have encoded the type of the name, a boolean, in the address. You
could have encoded it in the strings, e.g. by writing "ohio" for
submarines and "OHIO" for states. If you had done this you would only
need to understand the encoding rules for strings to understand your
program; encoding the type of the name in its address may save you some
space, but means that you also have to deal with pointer values, not
just data values, because pointer values have become meaningful (laden
with information).

Now, in general the "ohio"/"OHIO" representation is safer, for example
because the fact that the Ohio in question is the submarine and not the
state is location independent, that means that I can copy the string
around and not lose any bits of information. This is what I mean by
saying that the algebra of locations is not dragged into the act,
because no information is encoded in locations.

anw> Furthermore, if I am denied access to the algebra of locations,
anw> then some useful ideas like anonymous objects, and like remembering
anw> locations, become inexpressible.

They are most definitely not useful at the application level, only at a
purely implementation oriented level. They add complications to
reasoning about representations, even if they save some resources. In
particular, in a value-based system all objects are anonymous; they are
only identified by contents, not by context.

You have a claim that pointers are necessary, if I understand correctly.
My observations are:

1) To represent information, pointers are never necessary and when used
as such are actually hazardous. From the application point of view all
information had better be represented explicitly with encoding in the
contents of an object, as in a relational database.

2) Pointers *can* be used to represent information, and the relative
hazard (extra complicated encoding rules) can be cost effective; the
more so the nearer we are to the machine, because it can conserve
storage (information bits are implied by the address) and speed (you
need not fetch information bits from memory, you can just deduce them
from the address).

But, at the application level usually these concerns are irrelevant.
What you want to do is to represent entities, not be concerned about how
the relevant encoding is done. In building systems programs your
concerns are reversed, so pointers are useful.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <1990Nov2.052815.22188@murdoch.acc.Virginia.EDU> gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) writes:
  [ on one type of aliasing problem ]
> 	a[i] = b[i] / c[i];
> 	d[i] = b[i] * c[i];
  [ Fortran sees separate arrays and loads b[i] and c[i] only once ]
  [ C sees possible aliasing and loads them twice ]

Hmmm. I would almost automatically optimize the above into

  {
   register double bi = b[i];
   register double ci = c[i];

   a[i] = bi / ci;
   d[i] = bi * ci;
  }

Don't ask me to philosophize about this fragment.

> but
> don't make my code run 30% slower.

Hey, dude, write it 30% faster in the first place.

---Dan

cik@l.cc.purdue.edu (Herman Rubin) (11/06/90)

In article <3673@stl.stc.co.uk>, tom@nw.stl.stc.co.uk (Tom Thomson) writes:
> I cannot understand why Jim Giles wants pointers to have values which
> are addresses.  The important thing about a pointer is simply that it
> identifies (unambiguously) the thing to which it refers.  The only
> operations on a pointer are the (a) getting the thing to which it refers;
> (b) modifying the value of the thing to which it refers; (c) checking
> whether two pointers refer to the same thing. 

In producing various types of random variables efficiently, it is
desirable to use buffers and refill subroutines.  Now each buffer
and each refill routine can be changed whenever the refill is called,
and even more can be considered.

Thus the communication scheme is

	if(buffer too close to empty){
		(*rprog)(info_pointer);
		change info_items in registers;}

info_pointer points to an array which has the information on the buffer
needed for use, and rprog contains the location of the array to be called.
It is true that one could use associative calls, locations, etc., but how
wise is it?  We have loaders to insert pointers to things external to the
subprogram in it.  And unless we use an associative memory, somewhere
pointers will have to be used.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)

jlg@lanl.gov (Jim Giles) (11/07/90)

From article <2440:Nov607:32:5890@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <1990Nov2.052815.22188@murdoch.acc.Virginia.EDU> gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) writes:
>   [ on one type of aliasing problem ]
<>      a[i] = b[i] / c[i];
<>      d[i] = b[i] * c[i];
<   [ Fortran sees separate arrays and loads b[i] and c[i] only once ]
<   [ C sees possible aliasing and loads them twice ]
< Hmmm. I would almost automatically optimize the above into
<   {register double bi = b[i];
<    register double ci = c[i];
<    a[i] = bi / ci;
>    d[i] = bi * ci;}

So would the compiler (a good one anyway).  The problem is that you
are shifting the burden to the programmer (who has enough on his plate
just getting his code to run correctly).  He has no time for these niggly
little optimization issues - especially those that the compiler _should_
solve for him.  To be sure, the programmer may have time to fine-tune
some time-critical fragments of his code: but in that case he
should be able to work on the tough issues that the compiler _can't_
do for him.

J. Giles

gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) (11/08/90)

In article <2440:Nov607:32:5890@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> [ I write: ]
>> 	a[i] = b[i] / c[i];
>> 	d[i] = b[i] * c[i];
>  [ Fortran sees separate arrays and loads b[i] and c[i] only once ]
>  [ C sees possible aliasing and loads them twice ]
>
>Hmmm. I would almost automatically optimize the above into
>
>  {
>   register double bi = b[i];
>   register double ci = c[i];
>
>   a[i] = bi / ci;
>   d[i] = bi * ci;
>  }

Great, you've just doubled the # of lines of code. This is known as
"creating a maintenance nightmare".

Now take one of my real loops, with 3 lines of code referencing a
dozen or so array elements, including things like a[i], a[i-2], etc.
How do you pick the names? How do you keep the programmer from making
a mistake when they modify one of the lines?

I don't see why Dan thinks I should use a hard-to-optimize language
for numerical programming, which requires the programmer to do common
sub-expression elimination, when I can use a simple and fast language
instead. Having the compiler do a simple optimization seems much more
reliable than having the programmer optimize and maintain the code
segment, or have a compiler for an unnecessarily complicated language
compile the code segment.

By the way, this code has loops which can be unrolled 3 or 5 times and
keep loaded array values from previous iterations for future
iterations. Writing this out by hand is a nightmare when you have to
permute the temporary register variables. But it's not that difficult
to add such a feature to most Fortran compilers.

anw@maths.nott.ac.uk (Dr A. N. Walker) (11/09/90)

In article <PCG.90Nov3185931@athene.cs.aber.ac.uk> pcg@cs.aber.ac.uk
(Piercarlo Grandi) writes:

>Note that many links or nodes may have the same info, but a specific
>link or node will be uniquely identified by context; if it cannot, e.g.
>a two node graph like this:
>
>		"B"
>	   .-------------.
>	  "A"		"A"
>	   \_____________/
>		"B"
>
>then the graph is inherently ambiguous.

	Well, to use your notation of "top", etc., I would describe
such a graph by code somewhat like

	NODE top = ("B", (left, right)),
	     bottom = ("B", (left, right)),
	     left = ("A", (top, bottom)),
	     right = ("A", (top, bottom));

in which both the disambiguation and the duality are clear.  Of course,
this is somewhat like your database.  But "top" itself does not include
a description of its own location -- that is a matter for the compiler.

>You got it the other way round: if pointers make a difference, this can
>only mean that part of an object's content is implicit in its in-memory
>location, not vice versa (what actually happens is that location includes
>a bit of content).

	No:  it means that location can be used to disambiguate objects
without the need for programmer invention.

>anw> Furthermore, if I am denied access to the algebra of locations,
>anw> then some useful ideas like anonymous objects, and like remembering
>anw> locations, become inexpressible.
>
>They are most definitely not useful at the application level,

	They most definitely are!  When I give people directions to my
house, I include things like:  "Take the third on the right, ...", "If
you get to the Hemlock Stone, you've gone too far, you should ...", and
"If you get lost, ask for Priory Island, and try again from there".  I
don't want to name all the junctions, so the first two on the right are
anonymous;  but some key locations need to be remembered.  This is all
quite similar to the usage in "pic", Brian Kernighan's graphics language:

	arrow dashed from last circle.left to last box
	line up from top of Here
	arc from start of 2nd line to 2/3 of the way between A and B

and so on.

>You have a claim that pointers are necessary, if I understand correctly.

	That's an exaggeration.  I managed without pointers perfectly
well until 1972, and wrote [IMHO] much good code by the standards of
those days.  But I (and almost everyone else) also managed without
characters [except as "caption"s], structures, enumerations, sets,
not to mention files, editors, discs, terminals, ....  All of these
things have greatly increased my productivity, and I wouldn't want
to turn back the clock.

>What you want to do is to represent entities, not be concerned about how
>the relevant encoding is done.

	Just so, which is exactly why I am happy for my entities to
include pointers to other entities, and to let the compiler worry
about how it is actually done in machine code.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/09/90)

In article <1990Nov8.024049.9731@murdoch.acc.Virginia.EDU> gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) writes:
> In article <2440:Nov607:32:5890@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
    [ a[i] = b[i] / c[i]; d[i] = b[i] * c[i] ]
> >Hmmm. I would almost automatically optimize the above into
> >  {
> >   register double bi = b[i];
> >   register double ci = c[i];
> >   a[i] = bi / ci;
> >   d[i] = bi * ci;
> >  }
> Great, you've just doubled the # of lines of code. This is known as
> "creating a maintenance nightmare".

I thought we discussed this before? The way you maintain hand-optimized
code is to keep around the original, clean, algebraic version, as well
as the new, ugly, optimized version. Whenever the original, clean,
algebraic version is changed, the new, ugly, optimized version must be
rewritten. You do the same thing with automated source-code optimizers.

> Now take one of my real loops, with 3 lines of code referencing a
> dozen or so array elements, including things like a[i], a[i-2], etc.
> How do you pick the names? How do you keep the programmer from making
> a mistake when they modify one of the lines?

I pick the names by very logical conventions: ai, aim2, etc. It's *very*
easy to remember. As for maintenance, see above.

> I don't see why Dan thinks I should use a hard-to-optimize language
> for numerical programming, which requires the programmer to do common
> sub-expression elimination, when I can use a simple and fast language
> instead.

I don't recommend that you switch from Fortran to C, unless you need
something that the latter provides. I don't use Fortran simply because I
can't stand the maintenance hell when I mistype a variable name. But if
you don't mind that occasional problem, and if you don't need the system
services or dynamic allocation of C, certainly stick to Fortran.

> By the way, this code has loops which can be unrolled 3 or 5 times and
> keep loaded array values from previous iterations for future
> iterations.

Array unrolling is so mechanical that I would trust the compiler to do
it (and I wish optimizer writers would concentrate on taking hints from
the programmer and doing this well, rather than complex optimizations
that humans still do a much better job with!)

> Writing this out by hand is a nightmare when you have to
> permute the temporary register variables.

It isn't that bad. I regularly unroll loops several times. The secret is
macros. Only one of the compilers I use actually understands unrolling,
and I often get 30% improvements on the others with very little work.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/09/90)

In article <1990Nov8.174042.24789@maths.nott.ac.uk> anw@maths.nott.ac.uk (Dr A. N. Walker) writes:
> >What you want to do is to represent entities, not be concerned about how
> >the relevant encoding is done.
> 	Just so, which is exactly why I am happy for my entities to
> include pointers to other entities, and to let the compiler worry
> about how it is actually done in machine code.

Exactly! I really do think about data structures as boxes containing
other boxes, some of which contain arrows to boxes. When I draw them,
that's how I draw them, not as some abstract ``list'' or ``tree'' or
``recursive data structure'' (not that I know how you draw those).

---Dan

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/10/90)

On 8 Nov 90 17:40:42 GMT, anw@maths.nott.ac.uk (Dr A. N. Walker) said:

anw> Furthermore, if I am denied access to the algebra of locations,
anw> then some useful ideas like anonymous objects, and like remembering
anw> locations, become inexpressible.

pcg> They are most definitely not useful at the application level,

anw> They most definitely are!  When I give people directions to my
anw> house, I include things like: "Take the third on the right, ...",
anw> "If you get to the Hemlock Stone, you've gone too far, you should
anw> ...", and "If you get lost, ask for Priory Island, and try again
anw> from there".

This is an *implementation*.

I think we have very different ideas on what is application design and
what is implementation design, and in particular between data design and
representation design. More precisely, IMNHO you don't make much
distinction between the two aspects. Unfortunately.

Just to restate my position, while we are talking past each other: when
doing data design pointers are extraneous and therefore unwelcome
details (and so Chris Holt is right). When doing representation design
they are an invaluable technology (and so you are right too).

For application oriented languages (SQL, Hermes) pointers are
questionable; for implementation oriented languages (C, Algol 68)
pointers are important (and Andy Lowry is somehow right on both
accounts too).
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/10/90)

On 9 Nov 90 06:24:07 GMT, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) said:

brnstnd> In article <1990Nov8.174042.24789@maths.nott.ac.uk>
brnstnd> anw@maths.nott.ac.uk (Dr A. N. Walker) writes:

(probably me)> What you want to do is to represent entities, not be
(probably me)> concerned about how the relevant encoding is done.

anw> Just so, which is exactly why I am happy for my entities to include
anw> pointers to other entities, and to let the compiler worry about how
anw> it is actually done in machine code.

Pah.  Pointers are just a thin veneer on the underlying implementation.
The thing you want to implement is a *relationship*; a pointer is a way
to implement a relationship, and an address is a way to represent a
pointer. Both are by no means the only choices. When designing the
application, neither level is necessary or useful (except that certain
*design* choices may legitimately be influenced by the expected
relative costs of the possible implementations and representations).

brnstnd> Exactly! I really do think about data structures as boxes
brnstnd> containing other boxes, some of which contain arrows to boxes.
brnstnd> When I draw them, that's how I draw them, not as some abstract
brnstnd> ``list'' or ``tree'' or ``recursive data structure'' (not that
brnstnd> I know how you draw those).

Ah no, this cannot pass. Lists and trees are *implementations*, and when
you draw them you draw specific encodings of such implementations.
Designs are quite a different thing.

The way *I* think about data structures is as typed mappings over entities
(entity-category-relationship!), and then I wonder which is the best
implementation for the mapping, based on things like time and space
complexity for the most frequent operations needed, cost of
encoding/decoding, cardinality of the domains, sparseness (trees and
lists are both good for sparse mappings, trees are faster for random
access, lists have smaller space overhead, etc... -- boxes enter the
picture only when I have to represent the implementations).

	Please, please have a reread of Hoare's seminal paper on data
	abstraction in "Structured Programming", Academic Press. And
	when you have finished it, go read the next essay on control
	abstraction by Dahl. Another very recommended source is
	Wiederhold's "Database Design", McGraw Hill, second part (the
	first part is about implementation and representation).

IMNHO we are still quite far apart on having compatible notions of the
distinction between design and implementation, and the difference
between the design of an implementation (a pointer that is encoded as an
address) and the implementation of a design (a pointer that represents a
relationship).

Abstraction level mixup, I am afraid. I wish that data design were
considered a fundamental discipline, not just an esoteric trick for
DBAs. Most large programs have more sophisticated data designs than many
databases, yet DBAs are told about data design (with a smattering of
implementation) and programmers are only told about representations
(with a smattering of implementation).
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

gudeman@cs.arizona.edu (David Gudeman) (11/12/90)

In article  <PCG.90Nov10155309@odin.cs.aber.ac.uk> Piercarlo Grandi writes:
]
]Pah. pointers are just a thin veneer on the underlying implementation.
]The thing you want to implement is a *relationship*; a pointer is a way
] to implement a relationship

Pah.  Pointers are a direct representation of a fundamental way to
visualize relations.  The thing I want to implement is a relationship
that I am visualizing as a pointer.

You are abusing the term "implement".  The correct way to say this is
that a pointer is one way to _represent_ or _visualize_ a
relationship.  "Implement", implies a machine implementation, and it
is just false to suggest that people would not visualize pointers if
they were not implementing things on machines.  The concept of one
object referencing another pre-dates computers.

]Ah no, this cannot pass. Lists and trees are *implementations*, and when
]you draw them you draw specific encodings of such implementations.

No, this cannot pass.  Lists and trees are mental structures that
humans use to visualize things.  You can call them relations, you can
"implement" them mathematically as relations, but that does not change
the fact that they are in fact fundamental forms of visualization.
And when we draw them we certainly do not draw specific encodings.
In a list, X follows Y because our mental image of the list has X
following Y.  In a tree, there is an arrow from X to Y because in our
mental image X is related to Y by the relationship that the arrow
represents.

]The way *I* think about data structures as typed mappings over entities
](entity-category-relationship!),

I think you visualize data structures geometrically like everyone
else.  Then you mentally convert these natural visualizations into
unnatural mathematical abstractions and pretend that the abstraction is
the original.

There is some justification for this in mathematics, in the case that
mathematicians are trying to prove that the structures they visualize
are well-founded.  But once a certain class of structures and
operations on those structures is known to be well-founded, there is
no reason to keep "implementing" them in terms of baroque mathematical
abstractions.
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

peter@ficc.ferranti.com (Peter da Silva) (11/14/90)

Piercarlo Grandi:
> ]Ah no, this cannot pass. Lists and trees are *implementations*, and when
> ]you draw them you draw specific encodings of such implementations.

David Gudeman:
> No, this cannot pass.  Lists and trees are mental structures that
> humans use to visualize things.

While I'm generally in the bit-bangers camp, I must agree with Piercarlo here.
The basic things that people work with are "chunks of data". Lists, trees,
tries, and so on are methods of implementing these databases in efficient
ways. There's nothing fundamental about them.

Unfortunately... in the real world they *are* vital.

The problem is that there's no language I know of that really lets you get
away with writing your program in terms of relations and get within an order
of magnitude in speed of a well-written lower level implementation. The
languages that do let you work at this level tend to be DBMSes and those
database language/code generator hybrids called 4GLs. And they're all slow
as molasses in January.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/14/90)

In article <-M.6315@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> Piercarlo Grandi:
> > ]Ah no, this cannot pass. Lists and trees are *implementations*, and when
> > ]you draw them you draw specific encodings of such implementations.
> David Gudeman:
> > No, this cannot pass.  Lists and trees are mental structures that
> > humans use to visualize things.
> While I'm generally in the bit-bangers camp, I must agree with Piercarlo here.
> The basic things that people work with are "chunks of data". Lists, trees,
> tries, and so on are methods of implementing these databases in efficient
> ways. There's nothing fundamental about them.

I am in the bit-bangers camp and must agree with Dave here. Although it
would be nice to imagine the computer as accepting some abstract,
mathematical version of what I wanted to compute and spitting out some
results, that's not how it works on the lowest levels, or on any level
showing any efficiency.

I am translating a computation into a form that a machine can
understand. That final form may not be ``fundamental,'' but it *is* the
mental structure that I have to work with. In real programming, I find
myself working with a form that includes arrays and pointers and so on.
So those are the mental structures I deal with. That's why I think Dave
is right.

If I didn't care about finishing anything, I might use a functional
language. (Actually, I shouldn't be nasty---I hear SETL is becoming
rather efficient after all these years. It's imperative, but it has a
lot of the structures Jim seems to love.) So the final form I worked
with might include some more abstract type, and I'd probably agree more
with Piercarlo that relations and trees are higher level than pointers
and arrays.

Can we all agree to disagree now?

---Dan

gudeman@cs.arizona.edu (David Gudeman) (11/14/90)

In article  <-M.6315@xds13.ferranti.com> Peter da Silva writes:
]David Gudeman:
]> ...  Lists and trees are mental structures that
]> humans use to visualize things.
]
]While I'm generally in the bit-bangers camp, I must agree with Piercarlo here.
]The basic things that people work with are "chunks of data".

Well, I'm _not_ in the bit-bangers camp and never have been.  You have
to drag me kicking and screaming to get me anywhere near an assembler.
I _never_ look at the assembler output of my C code, and I prefer not
to program in C if I don't have to.  My favorite languages have no
obvious relationship between the language constructs and the object
code.  I even like interpreted languages.  Can we all stop assuming
that just because someone says that pointers can be abstract objects,
that this means the speaker doesn't know what an abstraction is?
Maybe I have an intuition here that you all are not seeing because you
refuse to posit the possibility even long enough to consider it.

My ideal language would have APL-style arrays, SETL-style sets,
Icon-style lists and strings (Lisp lists and strings are too low-level
for my taste), Prolog-style terms, Emacs-style buffers, associative
lookup tables, first-class everything, and lots of other things that
would horrify a bit-banger.  It would also have high-level pointers
with pointer arithmetic.  The pointers would be high-level in the
sense that they could not be implemented as simple addresses, nor
could pointer operations be implemented one-to-one as machine
instructions.  Bit-bangers would be as horrified by my pointers as
by my arrays.  Efficiency has _no_, I repeat _no_ influence on my
opinion about pointers.

Pant, pant.

And furthermore...

] Lists, trees,
]tries, and so on are methods of implementing these databases in efficient
]ways. There's nothing fundamental about them.

"tries"!?  Eek.  There is no relationship between tries and trees
except that tries are often visualized as trees.  Implying that these
are at the same level of abstraction is like implying that a list is
at the same level of abstraction as a linked list.  I'm not the one
having trouble separating concept from implementation here.

Did the early linguists know that they were working with low-level,
machine-oriented representations when they started drawing sentence
diagrams as trees?
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/14/90)

On 13 Nov 90 20:58:19 GMT, peter@ficc.ferranti.com (Peter da Silva) said:

peter> Piercarlo Grandi:
pcg> Ah no, this cannot pass. Lists and trees are *implementations*, and
pcg> when you draw them you draw specific encodings of such
pcg> implementations.

peter> David Gudeman:
gudeman> No, this cannot pass.  Lists and trees are mental structures
gudeman> that humans use to visualize things.

peter> While I'm generally in the bit-bangers camp, I must agree with
peter> Piercarlo here.

I am a bit banger too, mate...

peter> Lists, trees, tries, and so on are methods of implementing these
peter> databases in efficient ways. There's nothing fundamental about
peter> them. [ ... ] The problem is that there's no language I know of
peter> that really lets you get away with writing your program in terms
peter> of relations and get within an order of magnitude in speed of a
peter> well-written lower level implementation.

As to this I have good news! DBMSes and 4GLs are often (anecdotal but
frequent evidence) impressively *faster* than any ad-hoc program that
does the same. NOTE: *the same*, not similar but much much simpler.

If you are manipulating large masses of data in complicated ways, which
is what most people want to do, a well written DBMS is going to be way ahead
of any well written lower level solution. Notice that you *can* for a
very important problem outrun a well written DBMS with diabolically well
written low level code, but this is a rare case.

The same applies for example to sort utilities -- normally it takes a
lot of effort to beat a well written generic sort utility on any system
I know of.

The database approach is also being applied to in-memory databases, both
relational and OO. They are usually well within an order of magnitude of
using explicit ad hoc trees and lists, and may be even faster if (and
again very few are well written) they pay attention to paging issues,
which normally even well written tree and list packages ignore.

	For example: an in-core database may use B+ trees, which, if
	their blocks are suitably aligned and sized, have an immensely
	better locality profile than most garden-variety trees.

Again, many programs, either in-memory or on-disk, are sophisticated and
large enough to be considered minor DBMSes. Too bad they are not
designed as such...
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

anw@maths.nott.ac.uk (Dr A. N. Walker) (11/15/90)

In article <-M.6315@xds13.ferranti.com> peter@ficc.ferranti.com (Peter
da Silva) writes:

>While I'm generally in the bit-bangers camp, I must agree with Piercarlo here.
>The basic things that people work with are "chunks of data". Lists, trees,
>tries, and so on are methods of implementing these databases in efficient
>ways. There's nothing fundamental about them.

	While I'm generally in PdS's camp, I must agree with David
Gudeman here!

	*Sometimes* one works with "chunks of data".  In that case, yes,
you get to implement your database however you like, and all Piercarlo's
distinctions between data design and implementation design *may* come
into play.  But many, even most, of the real problems I have had to
program come with some structure already built in.

	For example, when analysing a chess position one naturally
builds a tree.  If you like, you can describe that in a relational
database and then observe "Hey, this would look nice as a tree";
I would prefer to skip those steps!  Obviously one should keep an
eye on the data design, in case things start creaking elsewhere,
but as long as things run smoothly, why not stay with the obvious?
Of course, *then* you have to implement this tree, and we get into
the usual paraphernalia: it can run mostly as a stack, but you
also need transposition tables, etc.;  but that doesn't stop the
tree being fundamental.

	Equally, when I wrote my exam-mark analysing program, it was
handed a *list* of marks;  again, you can squeeze it into a database
if you like, but *why*?  Just to discover that you really have a list?
You can implement it as an array or as a list with real pointers, as
you like, but the list was fundamental.  When I look at routes to my
house, I start with a graph;  twist it how you will, the graph is
*there*;  it might get implemented any number of ways, including
database look-up, but you can't wish away the graph.  Nor can you
wish away pointers -- *not* just as implementation details.  You
don't always have to start with pointers, and you don't always have
to implement with them, but they exist, and they are natural, and
convenient.

	As to whether they should be first-class objects -- well,
why not?  Surely, if we have learned anything from the last 20+ years
of structured programming, it should be that relegating objects to
the second class is counterproductive;  it just stops you from using
them in natural ways.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

new@ee.udel.edu (Darren New) (11/15/90)

There are two uses for pointers that hierarchical data structures do not seem
to address, but which are nonetheless very useful. 

1) changing one object into another.  Smalltalk supports this with the 
  "become:" operator.  Basically, "A become: B" makes all references to
  B now be references to A and vice versa.  I imagine that this could be
  done with the aliasing that is proposed in Hermes. (BTW, is there a 
  reference to where I could learn more about Hermes?)  However, in Smalltalk,
  this is almost invariably used in one of two situations: 
  a) Objects have a fixed size once created. To add more elements to a set (A),
     make a new set (B) which is larger, copy A into B, and then "A become: B".
  b) To break circular dependencies for the garbage collector. That is,
     "bigStructureWithPossiblyCircularPointerSomewhere become: String new"
     will cause all references to that structure to be replaced with something
     which has no out-going references, hence allowing it to be GCed even if
     through some bug you have accidentally not discarded the path from
     the root directory to the circular structure.  You may get errors later
     because the bigStructure... does not handle the same messages as
     Strings do, but it's better than having all that hanging around forever
     because of a programming mistake.

2) iterating over all objects (or all objects of a certain type). For example,
   it is not uncommon to have code to iterate over all objects that reference
   object X, or to iterate over all functions that have a reference to
   a particular global variable, or whatever.  Unless you can declare a 
   variable that can be aliased to *any* value of a particular type (in which
   case I would call that variable a pointer), this extremely handy operation 
   cannot be handled in Hermes. This is useful when making global munges
   of in-core data (like when adjusting the format of data). It is also useful
   when there is a rare or unplanned operation in which the natural 
   structuring of the data is not appropriate, which happens in relational
   databases all the time. For example, it is rare to ask "find all functions
   with the string 'xyzzy' in the comment field" or "find all source files
   holding a function which contains a local variable whose name ends in 'W'".
   It would be silly to build such things into the data structure representing
   compiled code, but the ability to iterate over source files, functions,
   and so on makes it possible.

			     -- Darren
			     (All typos to be blamed on an old Z19)

-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, 
      Formal Description Techniques (esp. Estelle), Coffee -----

gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) (11/16/90)

In article <23461:Nov904:59:2290@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

[ I give an example where FORTRAN optimizes well, but C must be
  hand-optimized. Dan claims "But look, I can get the same performance
  by tweaking the code."]

I reply:
>> Great, you've just doubled the # of lines of code. This is known as
>> "creating a maintenance nightmare".
>
>I thought we discussed this before? The way you maintain hand-optimized
>code is to keep around the original, clean, algebraic version, as well
>as the new, ugly, optimized version.

I know how to maintain code.

The point is that it is always more difficult to maintain hand-tweaked
code. And the reason that C sucks rocks for that code sequence is
aliasing.

This is the fun part of "talking" to Dan. He will quibble until the
cows come home. He will tell you "It's ok if I have to unroll 10,000
loops by hand and do common sub-expression elimination by hand, C is
still a great language for expressing array routines." He will tell
you Fortran is maintenance hell because he doesn't know how to use
IMPLICIT NONE or the portable equivalent. And, best of all, at the
same time as I am demonstrating that there are cases in which C
pointers cause big optimization problems he will claim in another
thread that pointers are never less efficient than arrays. Sure, Dan,
we all have lots of time to tweak code by hand when an optimizer can
do it for us. Isn't that why HLL's were invented?

Well, I think I'm joining Jim Giles. Sorry that nobody liked Dan's
great theories on comp.theory, looks like comp.lang.misc has a similar
opinion.

Bye.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

In article <1990Nov16.042141.21130@murdoch.acc.Virginia.EDU> gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) writes:
> In article <23461:Nov904:59:2290@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> [ I give an example where FORTRAN optimizes well, but C must be
>   hand-optimized. Dan claims "But look, I can get the same performance
>   by tweaking the code."]

I was just making a simple observation about the code example you
posted. I tried to avoid controversy there because I didn't want to
distract people from the following points:

1. The compiler can do quite a bit of aliasing analysis without taking
any hints from the programmer (and respecting separate compilation).
I've inspected several large Fortran programs, and in each one I observe
that this sort of analysis would suffice to detect the aliasing even if
aliasing weren't prohibited beforehand. In other words, C doesn't have a
huge disadvantage over Fortran just because it allows aliasing.

2. The compiler cannot optimize pointer-free code into pointer code in
all cases. Hence there are cases where a C array program using pointers
will be faster than a Fortran array program, no matter how good the
Fortran optimizer is. The language has to provide some form of pointer
arithmetic (i.e., array shifting). If I understand Jim correctly, the
new Fortran does exactly that, and so I'm willing to believe that it can
express array computations as efficiently as C in all cases.

Now returning to the side issue:

> >> Great, you've just doubled the # of lines of code. This is known as
> >> "creating a maintenance nightmare".
> >I thought we discussed this before? The way you maintain hand-optimized
> >code is to keep around the original, clean, algebraic version, as well
> >as the new, ugly, optimized version.
> I know how to maintain code.

I believe you.

> The point is that it is always more difficult to maintain hand-tweaked
> code.

Agreed, but hand optimization is still a viable technique. Sometimes
it's the only choice.

> He will tell
> you Fortran is maintenance hell because he doesn't know how to use
> IMPLICIT NONE or the portable equivalent.

There isn't any portable equivalent. On some machines, if I want to use
Fortran, I have to deal with Fortran IV compilers. I'd rather use Pascal
than deal with these syntactic problems.

> And, best of all, at the
> same time as I am demonstrating that there are cases in which C
> pointers cause big optimization problems he will claim in another
> thread that pointers are never less efficient than arrays. Sure, Dan,
> we all have lots of time to tweak code by hand when an optimizer can
> do it for us. Isn't that why HLL's were invented?

See, here's where you simply fail to pay attention to the main points of
the thread. A sufficiently smart compiler *can* do the job for C,
without *any* hand optimization. Current compilers happen not to be that
smart. Would you please stop misrepresenting my position?

Now I'd like everybody to notice something. Jim made exactly the same
argument about arrays as I've made in the above paragraph about
aliasing. He said that a sufficiently smart compiler can find the right
base to precompute for Fortran array references. He said that current
compilers happen not to be that smart.

The difference is that Jim didn't post the method that his sufficiently
smart compiler would use. I did the job. I showed what it would take to
deal with aliasing. Jim just waved his hands and made some vague
assertions about how arrays are always as efficient as pointers. And so,
as it turns out, he was completely wrong.

Of course, Greg hasn't even made the effort that Jim has to contribute
to the discussion. He has completely ignored my comments on how the
compiler can optimize C array code with no help from the programmer.
Instead he has focused on my offhand remark about his Fortran example.
Sigh.

> Sorry that nobody liked Dan's
> great theories on comp.theory,

Actually, some people already knew what I was talking about, and
commented upon the usefulness of the technique (as two people have done
in this group). Again you ignore the discussion.

---Dan

peter@ficc.ferranti.com (Peter da Silva) (11/17/90)

In article <PCG.90Nov14151603@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
> As to this I have good news! DBMSes and 4GLs are often (anecdotal but
> frequent evidence) impressively *faster* than any ad-hoc program that
> does the same. NOTE: *the same*, not similar but much much simpler.
  [mainly when manipulating large masses of data]

Sure, use the right tool for the job. I once wrote an adventure program
in APL as a learning exercise, but I'd be hard pressed to recommend it
to anyone for that purpose. It's a killer language for tasks that need a
lot of matrix operations, though. But for tasks that don't lend themselves
to the database model a DBMS is really really slow...

[also, as an aside... you can get the same sort of speed with a killer
 subroutine library that duplicates the DBMS operations embedded in your
 mid-level language: I believe Oracle sells such a beast]

And even if you're manipulating large amounts of data, you may be better
off with a lower throughput but quicker-responding mid-level language. I
wouldn't want to manipulate MIDI data in real-time with Oracle, even if you
are dealing with large masses of pretty homogeneous data.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

peter@ficc.ferranti.com (Peter da Silva) (11/17/90)

You and David Gudeman are right. I don't know what was the matter with me
when I said pointers are not fundamental. Of course, they are, and in fact
I'd just got done arguing that at length in the "lisp has pointers" thread.

Hey, I've been sick.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

new@ee.udel.edu (Darren New) (11/17/90)

>pcg> Ah no, this cannot pass. Lists and trees are *implementations*, and
>pcg> when you draw them you draw specific encodings of such
>pcg> implementations.

>gudeman> No, this cannot pass.  Lists and trees are mental structures
>gudeman> that humans use to visualize things.

An associate of mine pointed out that we have spent the last several
species navigating through trees, with a high survival value to being
able to do so efficiently and without falling.  It seems entirely 
possible to me (IMHO) that trees are indeed elementary mental
structures for most humans. After all, we've only spent the last
few thousand years manipulating symbols formally. 1/2 :-) -- Darren
-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, 
      Formal Description Techniques (esp. Estelle), Coffee, Amigas -----
              =+=+=+ Let GROPE be an N-tuple where ... +=+=+=

2113av@gmuvax2.gmu.edu (John Porter) (11/17/90)

Piercarlo Grandi:
> ]Ah no, this cannot pass. Lists and trees are *implementations*, and when
> ]you draw them you draw specific encodings of such implementations.

David Gudeman:
> No, this cannot pass.  Lists and trees are mental structures that
> humans use to visualize things.

Peter da Silva:
I must agree with Piercarlo here.
The basic things that people work with are "chunks of data". Lists, trees,
tries, and so on are methods of implementing these databases in efficient
ways. There's nothing fundamental about them.

John Porter responds:
I'm not normally one to be argumentative, and I find it fascinating
that "wars" can arise over topics as basic to c.s. as lists, trees, etc.
But I feel compelled to register my agreement with Mr. Gudeman here;
lists, trees, etc., ARE concepts, NOT implementations.  (That is why they
are called "abstract data types".)
Mr. Piercarlo's assertion that "when you draw them you draw specific
implementations of them" is, it seems patently obvious to me, quite false.
A circle, a box, a line, an arrowhead, they say nothing, imply nothing
about any implementation; they do not imply that there is, was, or ever
will be an implementation, in a computer program or anywhere else.
You don't say "I implemented this
chunk of data with a tree", you say "I implemented this tree with
dynamically allocated structs, each containing pointers to parent and
children", or whatever your implementation happens to be.  As a concrete
example, contrast the just-described tree implementation with the 
following:
A binary tree can be implemented in a static array, where element
number one is the head; from any given node (identified by its index into
the array), you can reach the left child by multiplying the index by 2;
find the right child immediately past the left child.
This implementation is sometimes called a "heap", but that is confusing.
Note: no pointers. An entirely different implementation, but a tree
nonetheless.  The concept is the same.

-- John Porter

#	"Everybody could stand a HUNDRED chest x-rays a year.
#	 They ought to have them, too."	J. Frank Parnell

dbc@bushido.uucp (Dave Caswell) (11/18/90)

.As to this I have good news! DBMSes and 4GLs are often (anecdotal but
.frequent evidence) impressively *faster* than any ad-hoc program that
.does the same. NOTE: *the same*, not similar but much much simpler.
.
.If you are manipulating large masses of data in complicated ways, which
.is what most people want to do, a well written DBMS is going to be way ahead
.of any well written lower level solution. Notice that you *can* for a
.very important problem outrun a well written DBMS with diabolically well
.written low level code, but this is a rare case.
.
.The same applies for example to sort utilities -- normally it takes a
.lot of effort to beat a well written generic sort utility on any system
.I know of.

This sure doesn't match my experience.  My experience is that custom written
code can take a problem that takes 10-20hrs. to solve and change it into
a 10-20 minute problem.  In other words the custom written code is
what allows the solution to actually be used.

I'm not saying that Oracle or whatever isn't a very useful tool just that
there is a whole host of problems that it can't solve.  Many of these 
problems are relatively easy to hand-code using well known algorithms.
(well known means published in CACM or something similar).




-- 
David Caswell                             dbc%bushido.uucp@umich.edu

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/19/90)

On 18 Nov 90 03:15:03 GMT, dbc@bushido.uucp (Dave Caswell) said:

	[ ... DBMSes are usually better than custom-written code ... ]

pcg> The same applies for example to sort utilities -- normally it takes a
pcg> lot of effort to beat a well written generic sort utility on any system
pcg> I know of.

dbc> This sure doesn't my experience.  My experience is that custom
dbc> written code can take a problem that takes 10-20hrs. to solve and
dbc> change it into a 10-20 minute problem.  In other words the custom
dbc> written code is what allows the solution to actually be used.

I would say that this is *extremely* unlikely to be common for problems
that fall within the scope of a DBMS query planner. And if these are
pathological problems, then you are right, and I had said that these
indeed require custom coding. But I would be very skeptical of anybody's
claiming that they can routinely do better than a DBMS by using custom
code. If you want, I have anecdotal evidence that custom written DB
applications using B-tree libraries or similar low level things were
substantially slower than loading the same data into a DBMS and letting
it do its query planning.

dbc> I'm not saying that Oracle or whatever isn't a very useful tool
dbc> just that there is a whole host of problems that it can't solve.

But it solves the problems of surely more than 50% of the people that
use a computer (CS types tend to forget that computers are not used
solely to develop compilers and operating systems, or clever numerical
codes), and this is damn important.

I would say that for the extremely large class of applications that can
be done with a DBMS, using anything else should be justified with very
good reasons.

For things that are not within the scope of a DBMS, there is something
equivalent to a DBMS. Before rolling one's own, if you do statistics you
had better use S or the BMDP library, and so on.

dbc> Many of these problems are relatively easy to hand-code using well
dbc> known algorithms.  (well known means published in CACM or something
dbc> similar).

And here we have a big misunderstanding. Programs and utilities are not
algorithms. A sort utility source is maybe one or two orders of magnitude
larger and more sophisticated than that of its core algorithm.

Admittedly a lot of this is not value added -- it would not be there if
the environment were more cooperative. But, a lot of it is value added.

It is a common delusion to think that algorithms and programs are the
same thing (I would call it the "Dijkstra book syndrome"). To make a
useful program out of an algorithm takes a lot of effort.

Take for example sorting: if all you have to sort is a few kilobytes,
almost any algorithm will do and you can code it straightforwardly. But
if you have large masses of complicated data either you get into
difficult things like handling of external storage, or, if you have
single level storage, you must make sure you have good locality.

Just as an example of a relatively simple minded sort utility, take the
UNIX sort(1). Its logic is "relatively easy to hand code using well
known algorithms".  Have a look at the source for a demonstration :-).
I'd bet that not many programmers around can do significantly better
than the UNIX sort(1) command, even on special cases.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk