nevin@igloo.scum.com (Nevin Liber) (10/04/90)
[I added comp.lang.misc to the newsgroup list; please follow-up to the appropriate newsgroup only.] In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes: >It is my contention that future languages >shouldn't have pointers at all. Not just no C-like pointers, none at >all. I just picked on C as the most unpleasant example of what I'm >against. I really hate to agree with you Jim :-), but I'm beginning to think that you are right. The only real argument I can see _for_ having pointers is efficiency; more specifically, to help in hand-optimisation. Extensions to C such as C++ are showing that pointers aren't needed nearly as much as they use to be; heck, code seems to be more readable w/o them. In languages such as Icon and LISP I find that I don't even miss them. -- NEVIN ":-)" LIBER nevin@igloo.Scum.com or ..!gargoyle!igloo!nevin (708) 831-FLYS California, here I come! Public Service Announcement: Say NO to Rugs!
nevin@igloo.scum.com (Nevin Liber) (10/04/90)
[I added comp.lang.misc to the list of newsgroups; please follow-up to the appropriate newsgroup ONLY.] In article <5088@uqcspe.cs.uq.oz.au> brendan@batserver.cs.uq.oz.au writes: >The percieved need for >side-effects in terms is merely a by-product of the poor state of language >design, and would not be missed at all in better languages. I disagree. This would throw out all functions which maintain their own state (eg: i/o). Heck, you might ask why we even have variables? Even the LISP community gave into this as being a helpful programming technique. -- NEVIN ":-)" LIBER nevin@igloo.Scum.com or ..!gargoyle!igloo!nevin (708) 831-FLYS California, here I come! Public Service Announcement: Say NO to Rugs!
sommar@enea.se (Erland Sommarskog) (10/08/90)
Nevin Liber (nevin@igloo.UUCP) writes: )Jim Giles (jlg@lanl.gov) writes: ))It is my contention that future languages ))shouldn't have pointers at all. Not just no C-like pointers, none at ))all. I just picked on C as the most unpleasant example of what I'm ))against. ) )I really hate to agree with you Jim :-), but I'm beginning to think )that you are right. The only real argument I can see _for_ having )pointers is efficiency; more specifically, to help in )hand-optimisation. Extensions to C such as C++ are showing that )pointers aren't needed nearly as much as they use to be; I must be missing some context here, else this doesn't make sense. Of course pointers are a necessary thing. Then it's an issue whether you make them explicit like in C, Ada or Pascal, or hide them a little and call them references like in Eiffel. -- Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se "Nelly Nilsson n|jer sig numera n{ppeligen med nio n|tter till natten"
jejones@mcrware.UUCP (James Jones) (10/08/90)
In article <2171@enea.se> sommar@enea.se (Erland Sommarskog) writes: >I must be missing some context here, else this doesn't make sense. >Of course pointers are a necessary thing. Then it's an issue whether >you make them explicit like in C, Ada or Pascal, or hide them a >little and call them references like in Eiffel. I'm missing context, too, and of course I can't know what the original posters were referring to, but I'd rather like the ability to define recursive data structures (I think Hoare wrote a paper about this sort of thing) rather than using pointers. I tend to agree--pointers are to data structure what gotos are to control flow. Speaking of other sins, though, I also hope that future language designers will bring a "human factors"/"cognitive engineering" person along for the ride. IMHO, C would have been VERY different had this happened when it was created. James Jones
lowry@arnor.uucp (10/08/90)
In article <2171@enea.se>, sommar@enea.se (Erland Sommarskog) writes: |> Nevin Liber (nevin@igloo.UUCP) writes: |> )Jim Giles (jlg@lanl.gov) writes: |> ))It is my contention that future languages |> ))shouldn't have pointers at all. Not just no C-like pointers, none at |> ))all. I just picked on C as the most unpleasant example of what I'm |> ))against. |> ) |> )I really hate to agree with you Jim :-), but I'm beginning to think |> )that you are right. The only real argument I can see _for_ having |> )pointers is efficiency; more specifically, to help in |> )hand-optimisation. Extensions to C such as C++ are showing that |> )pointers aren't needed nearly as much as they use to be; |> |> I must be missing some context here, else this doesn't make sense. |> Of course pointers are a necessary thing. Then it's an issue whether |> you make them explicit like in C, Ada or Pascal, or hide them a |> little and call them references like in Eiffel. Or hide them COMPLETELY, like in Hermes (a language for distributed computing that has been designed and implemented in prototype form by my group). The Hermes programmer has NO notion of pointer in any of its various guises. There is NO way for the programmer to specify aliasing or shared data of any kind. We provide high-level data types called "tables" that subsume most of the uses that pointers are traditionally put to (linked data structures, character strings and other arrays). Hermes uses output ports (connections to message queues called input ports) as capabilities, so passing these around subsumes the need for passing around function pointers. I personally have written a great deal of code in Hermes and have found the lack of pointers to be no hindrance. In fact, the level of compile-time checking provided in Hermes far surpasses that found in any other language that I have used, and I have found this to be a very big win. One of the reasons we can do so much at compile time is the lack of pointers. Of course, our compiler and run-time system make heavy use of pointers and shared data for reasons of efficiency, but this is completely hidden from the Hermes programmer. So now it's just a question of whether "not having pointers" and "hiding them completely from the programmer" are two different things. For all practical purposes, I'd say they are not. Hence I would disagree with your assessment (which you stated as if it were incontestable) that pointers are necessary. -- Andy Lowry, lowry@ibm.com, (914) 784-7925 IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY 10598
skrenta@cbmvax.commodore.com (Rich Skrenta) (10/09/90)
In article <2171@enea.se>, sommar@enea.se (Erland Sommarskog) writes: > > Of course pointers are a necessary thing. Then it's an issue whether > you make them explicit like in C, Ada or Pascal, or hide them a > little and call them references like in Eiffel. Suppose I eliminate pointers from my language and instead provide the programmer with high-level data structures such as lists and arrays. Won't he just use integer indexes as "pointers" into these data structures if he needs them? Isn't this what typically happens in Pascal programs, where pointers are not as versatile as they are in C? Are the programs any more readable because I say int i; a[i] to dereference instead of struct foo *p; *p ? In article <1990Oct8.135551.21639@arnor.uucp> lowry@lusitania.watson.ibm.com (Andy Lowry) writes: >The Hermes programmer has NO notion of pointer in any of >its various guises. There is NO way for the programmer to specify >aliasing or shared data of any kind. We provide high-level data types >called "tables" that subsume most of the uses that pointers are >traditionally put to (linked data structures, character strings and >other arrays). Sure, if Hermes gives me my tree, linked list and array, I probably won't need pointers as much as I do in C--the data structures have already been implemented for me. However, if I want to make something that isn't directly supported by the language, I'll have to use my integer array index "pointers" to do it. Can Hermes stop me from doing that? Rich -- Rich Skrenta skrenta@cbmvax.commodore.com
chl@cs.man.ac.uk (Charles Lindsey) (10/09/90)
In <2883@igloo.scum.com> nevin@igloo.scum.com (Nevin Liber) writes: >I really hate to agree with you Jim :-), but I'm beginning to think >that you are right. The only real argument I can see _for_ having >pointers is efficiency; .... I disagree. Other posters have suggested that recursive data structures will do all that is needed (sorry, I do not know how to make this newsreader followup to two postings simultaneously ;-) ). The situation when pointers are the only solution is when the real world situation that you are modelling is best represented by a graph. Then you encounter the situation where you may need to know whether two nodes are equivalent (in the recursive sense that their connectees are equivalent), or whether they are identically the same node. The difference is important if you need to modify one of those nodes (do you expect the other to get changed as well). Here is an example. LET twice: BE lambda(x: ) x + x; That is an expression in some language, and one would expect to represent 'x + x' by some tree. twice (1 OR 2) Here OR is a nondeterministic choice operator (some people write it as a square box, but my terminal won't oblige :-( ). It means I want twice(1) or twice(2). Now unfold the call, and we get (1 OR 2) + (1 OR 2) which one would also expect to represent by some tree. Now we have two nondeterministic choices, so let us choose 1 for the first and 2 for the second, and the result of the addition is 3. This is neither twice(1) nor twice(2). Now we really ought to have represented these expressions by graphs. | + | \ / \ / | x which unfolds to | + | \ / \ / | 1 OR 2 Now with pointers, this is easily implemented. But with recursive data types it is impossible (at least in any straightforward way - can anyone tell me how to do this in a functional language). I can do it in Smalltalk, where the pointers certainly exist, and so near to the surface that you can hardly fail to notice them. In particular, Smalltalk provides two equality operators: '=' to compare values and '==' to compare pointers.
larus@primost.cs.wisc.edu (James Larus) (10/09/90)
In article <1990Oct8.135551.21639@arnor.uucp>, lowry@arnor.uucp writes: ... Long discussion of the abscence of pointers in the Hermes language... |> |> Of course, our compiler and run-time system make heavy use of pointers |> and shared data for reasons of efficiency, but this is completely |> hidden from the Hermes programmer. |> Does this mean that Hermes programmers aren't concerned with efficiency? /Jim
chased@rbbb.Eng.Sun.COM (David Chase) (10/10/90)
skrenta@cbmvax.commodore.com (Rich Skrenta) writes: >Suppose I eliminate pointers from my language and instead provide the >programmer with high-level data structures such as lists and arrays. >Won't he just use integer indexes as "pointers" into these data structures >if he needs them? Isn't this what typically happens in Pascal programs, >where pointers are not as versatile as they are in C? Are the programs >any more readable ... You're right, mostly, but you're missing one thing. Pointers (as implemented in C and C++) are generated by the programmer, either by taking the address of something, or by allocating new memory. In C, C++, Pascal, Modula-2, BCPL, and PL/1, the programmer is responsible for ensuring that whatever the pointer points to is not reused while the pointer is in use. Getting this wrong leads to messy bugs that are not (necessarily) repeatable from machine to machine, or from malloc version to malloc version. If integers in arrays are used instead of pointers, then the bug is at least confined to an object of the same type (more or less -- sufficiently clever programming will produce more clever bugs), and the bug is repeatable from machine to machine, and malloc version to malloc version. Hidden in this is the unfortunate fact that the programmer may need to write his own allocator within the array. In practice, this is not been that big a problem for me, since these little allocators deal in a fixed arena with uniform objects (unlike malloc). In practice, expandable arrays often suffice, and allow me to make strong assertions about what bugs cannot exist in my code. David
jlg@lanl.gov (Jim Giles) (10/10/90)
From article <14972@cbmvax.commodore.com>, by skrenta@cbmvax.commodore.com (Rich Skrenta): > [...] Are the programs > any more readable because I say > int i; a[i] > to dereference instead of > struct foo *p; *p ? Yes! Because, a[i] and b[j] are guaranteed _not_ to be aliased. Whereas *p and *q might be or, then again, might not be. Further, the syntactic for (arrays in this example) will give some clue as to how the variables will be used. The pointer syatax _may_ be used to simulate arrays, but you might be planning to use it for dynamic memory, strings, recursive data structures, run-tim equivalencing, etc.. How does the reader know that the pointer will not be used in any of those ways - he knows the array won't be. Each of these features should have separate syntax since they are separate features. Forcing them all to masquerade as pointers only confuses the person maintaining the code - and doesn't give the compiler enough information to adequately optimize. J. Giles
anw@maths.nott.ac.uk (Dr A. N. Walker) (10/10/90)
Like everyone else over here, I'm missing the (initial) context, but I found this interesting: In article <1990Oct8.135551.21639@arnor.uucp> lowry@lusitania.watson.ibm.com (Andy Lowry) writes: >I personally have written a great deal of code in Hermes and have >found the lack of pointers to be no hindrance. [...] >Of course, our compiler and run-time system make heavy use of pointers >and shared data for reasons of efficiency, but this is completely >hidden from the Hermes programmer. In other words, here is a construct which is incredibly useful for writing compilers and other programs, but which the ordinary programmer is unable to use. I don't think languages should hide useful abstractions from programmers; they should hide annoying concretions instead. I don't really want to know the machine address of a variable, or how to represent a conditional jump in machine code, or which registers need to be saved over a procedure call. But I do want to be able to express within my language all the abstractions that naturally arise in modelling my problem. These may well include pointers ["keep a finger on that variable"], even when there are no `data structures' around. When, as a mathematician [well, as a game player, really], I think of a tree or a graph, I think naturally and easily of each node as being some information plus a collection of pointers to other nodes. I draw them that way. You may think of them differently; but I think I should be allowed, within my chosen language, to program them my way, and not have to pummel them into some other shape. Yes, of course compile-time checking is a Good Thing, but the semantics that enable this should work with rather than against the programmer. Run-time checking is also a Good Thing; if doing it causes efficiency problems, then compilers and hardware need to be better. -- Andy Walker, Maths Dept., Nott'm Univ., UK. anw@maths.nott.ac.uk
lowry@arnor.uucp (10/10/90)
In article <14972@cbmvax.commodore.com>, skrenta@cbmvax.commodore.com (Rich Skrenta) writes: |> In article <1990Oct8.135551.21639@arnor.uucp> lowry@lusitania.watson.ibm.com (Andy Lowry) writes: |> |> >The Hermes programmer has NO notion of pointer in any of |> >its various guises. There is NO way for the programmer to specify |> >aliasing or shared data of any kind. We provide high-level data types |> >called "tables" that subsume most of the uses that pointers are |> >traditionally put to (linked data structures, character strings and |> >other arrays). |> |> Sure, if Hermes gives me my tree, linked list and array, I probably won't |> need pointers as much as I do in C--the data structures have already been |> implemented for me. |> |> However, if I want to make something that isn't directly supported by the |> language, I'll have to use my integer array index "pointers" to do it. |> Can Hermes stop me from doing that? |> |> Rich |> -- |> Rich Skrenta |> skrenta@cbmvax.commodore.com The view in Hermes is that the programmer should not be concerned with how data values are laid out in memory. Hermes does not give the programmer direct access to "tree, linked list and array." Instead, there is a very simple high-level data type called "table," with which one can describe homogeneous collections of data. The most basic table type definition is: t: table of x {full}; where X is any other datatype. This says that a value of type T is a collection of values of type X, each of which is fully initialized (this relates to another unique feature in Hermes called "typestate" that I won't go into here). If you want to operate on a specific element of the table, you can do so with statements like: a := x in t where (x.color = 'red' and x.weight > 10000); To iterate over a subtable you do something like: for x in t where (x.color <> 'blue') inspect begin ... end for; There are other statements for inserting and removing elements, merging tables, and extracting or copying out subtables. If, in your collection of data, position is important, then you add the "ordered" keyword to your table definition: t: ordered table of x {full}; Then you can do things like: pos := position of x in t where (x.name = 'xyzzy'); a := t[pos]; a := x in t where (position of x = pos); (The latter two are two different syntaxes for precisely the same statement.) If your table has keys in the relational sense (meaning all elements must be unique with respect to each key), you can specify them: t: ordered table of x {full} keys (id) (a,b); This would describe a table in which no two elements can have the same id, or the same combination of a and b values. The above is EVERYTHING you need to know about tables as a Hermes programmer. The compiler and run-time decide data representations and algorithms for accessing your data. Asking what Hermes would do if you wanted to implement a data structure that we don't support misses the point. With Hermes, data representations are implementation details that the programmer need not be concerned with. Instead, the programmer is left to express programs at a higher level of abstraction, where programming is easier and less error-prone. Of course, our language support does not allow all possible relationships among elements of tables or among fields of an element to be captured, but we're working on that (with user-defined constraints verified and maintained via our typestate mechanism). I should also mention that we have a notion of "pragmas" supplied by the programmer that very well may take the form of *suggestions* for data representations. Pragmas, however, in no way change the semantics of programs, and the compiler and run-time system are never required to adhere to them. We have done very little in the way of defining and supporting pragmas to date. -- Andy Lowry, lowry@ibm.com, (914) 784-7925 IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY 10598
cik@l.cc.purdue.edu (Herman Rubin) (10/10/90)
In article <2883@igloo.scum.com>, nevin@igloo.scum.com (Nevin Liber) writes: > In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes: > >It is my contention that future languages > >shouldn't have pointers at all. Not just no C-like pointers, none at > >all. I just picked on C as the most unpleasant example of what I'm > >against. > I really hate to agree with you Jim :-), but I'm beginning to think > that you are right. The only real argument I can see _for_ having > pointers is efficiency; more specifically, to help in > hand-optimisation. Extensions to C such as C++ are showing that > pointers aren't needed nearly as much as they use to be; heck, code > seems to be more readable w/o them. In languages such as Icon and > LISP I find that I don't even miss them. I must strongly object. When one considers what a variable or any other type of reference is, it is a pointer. Fortran originally only used pointers, as there was no call by value. I cannot imagine a language for mathematical use which does not have variable arrays, functions, etc. These are all expressed, consciously or not, as pointers. Now if one can have fixed pointers, why not variable pointers? We are no longer in the days when the only way to call a subroutine was to have the address in the instruction. One has to go through contortions in Fortran to use a subroutine which has a variable function or subroutine as an argument. Now there are developments in programming languages which should have been apparent way back when which may decrease the need for some uses of pointers, but why hobble the intelligent programmer? I have deliberately used arrays of pointers as the most convenient means of handling array refill procedures, and while I can find a way around it, it is clumsy and even less portable. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!cik(UUCP)
lowry@arnor.uucp (10/10/90)
In article <11429@spool.cs.wisc.edu>, larus@primost.cs.wisc.edu (James Larus) writes: |> In article <1990Oct8.135551.21639@arnor.uucp>, lowry@arnor.uucp writes: |> ... Long discussion of the abscence of pointers in the Hermes language... |> |> |> |> Of course, our compiler and run-time system make heavy use of pointers |> |> and shared data for reasons of efficiency, but this is completely |> |> hidden from the Hermes programmer. |> |> |> |> Does this mean that Hermes programmers aren't concerned with efficiency? We believe that getting a program logically correct and making it efficient should be two independent tasks. The Hermes language allows this separation by not requiring the programmer to make decisions that will determine efficiency. Data representation is just one example of this. Others are replication and migration of processes and/or data, concurrent execution of programs written in serial form, etc. It is part of our continuing research to develop very aggressive and high-level optimizations of this sort so that programs written in our model will, in fact, execute efficiently in a wide range of environments. The programmer can also be involved in tuning, but this should generally come only after the program has been made correct. The tuning comes in the form of sprinkling "pragmas" around the code in various places. None of the existing code is changed as this is done, and the pragmas have absolutely no semantic effect on the program. In fact, the compiler and run-time system are allowed to completely ignore the pragmas, though typically they would not. Pragmas could take a wide variety of approaches, like "You shoudl use a bit vector to represent the following table of booleans", or "This table will most often be accessed with queries of the following form:..." or "The following section of code will insert many elements into this table without otherwise referencing the table" (so perhaps a more efficient off-line bulk insert algorithm can be chosen), etc. The first sort of pragma would be a direct suggestion by the programmer. The other is behavioural information that can help the compiler and run-time system make better choices. The latter could conceivably be generated by profiling tools, thereby making the tuning process at least semi-automated and dynamic. We have not yet addressed the pragma issue to any great extent. In particular, we have not made any decisions as to the form pragmas should take, and our compiler does not pay attention to any pragmas. Our first step, which we believe we have achieved, was to design a powerful language in which correctness and efficiency could be separated as we believe they should. -- Andy Lowry, lowry@ibm.com, (914) 784-7925 IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY 10598
pepke@gw.scri.fsu.edu (Eric Pepke) (10/11/90)
In article <2884@igloo.scum.com> nevin@igloo.scum.com (Nevin Liber) writes: > In article <5088@uqcspe.cs.uq.oz.au> brendan@batserver.cs.uq.oz.au writes: > >The percieved need for > >side-effects in terms is merely a by-product of the poor state of language > >design, and would not be missed at all in better languages. > > I disagree. This would throw out all functions which maintain their > own state (eg: i/o). It would throw out *all* output, with or without internally readable state information. Setting a certain pixel to a certain color is a side effect, as is making a UART spit out a character, as is sending a packet out over the net. There are good reasons to limit the kinds of side effects and their scope and to provide protections from unwanted interactions, but to say "no side effects" is absurd. People do more with programs than prove them correct. Occasionally they actually run them on physical machines, and it's sometimes kind of fun to have them produce output of some sort. Eric Pepke INTERNET: pepke@gw.scri.fsu.edu Supercomputer Computations Research Institute MFENET: pepke@fsu Florida State University SPAN: scri::pepke Tallahassee, FL 32306-4052 BITNET: pepke@fsu Disclaimer: My employers seldom even LISTEN to my opinions. Meta-disclaimer: Any society that needs disclaimers has too many lawyers.
norman@d.cs.okstate.edu (Norman Graham) (10/11/90)
From article <2627@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin): > In article <2883@igloo.scum.com>, nevin@igloo.scum.com (Nevin Liber) writes: >> In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes: > >> >It is my contention that future languages >> >shouldn't have pointers at all. Not just no C-like pointers, none at >> >all. I just picked on C as the most unpleasant example of what I'm >> >against. > >> I really hate to agree with you Jim :-), but I'm beginning to think >> that you are right. The only real argument I can see _for_ having >> pointers is efficiency; more specifically, to help in >> hand-optimisation. Extensions to C such as C++ are showing that >> pointers aren't needed nearly as much as they use to be; heck, code >> seems to be more readable w/o them. In languages such as Icon and >> LISP I find that I don't even miss them. > > I must strongly object. When one considers what a variable or any other > type of reference is, it is a pointer. Fortran originally only used > pointers, as there was no call by value. I cannot imagine a language > for mathematical use which does not have variable arrays, functions, > etc. These are all expressed, consciously or not, as pointers. Pointers will always be required by those who demand that languages reveal the machine-level representations of their constructs. But that is only one view that languages can provide: Languages can provide no information about the location of a value, or they can remove the notion of storage (memory) entirely. Are these latter views bad? It depends on the design of the rest of the language. I cannot imagine that 'mathematical use' requires that pointers (or memory for that matter) be part of the language--unless by 'mathematical use' you mean the class of problems that requires very fast massaging of large matricies (i.e. wringing the most computation from every cpu cycle). In this case, efficiency is paramount; assembly language coding is not out of the question here. IMHO, pointers are required only when directly manipulating hardware devices (as in operating systems). > Now if one can have fixed pointers, why not variable pointers? We are > no longer in the days when the only way to call a subroutine was to > have the address in the instruction. Certainly, if pointers are values in the language then we should make them first class citizens: They must be able to point to any value that is conceptually stored in memory; pointers should be able to be stored in data structures, passed to and from functions, etc. > One has to go through contortions in Fortran to use a subroutine which > has a variable function or subroutine as an argument. Now there are > developments in programming languages which should have been apparent > way back when which may decrease the need for some uses of pointers, > but why hobble the intelligent programmer? I have deliberately used > arrays of pointers as the most convenient means of handling array > refill procedures, and while I can find a way around it, it is clumsy > and even less portable. Pointers are not the only way of providing these kinds of operations-- they're not even the most beautiful (conceptually speakin, of course). The ability to pass functions to function and to store functions in data structures only requires that functions be first class values in the language: Pointers to functions are unneeded if functions are values in the language. You just need to match the tool to the problem. If you require pointers, use a languages with pointers; otherwise use a conceptually more simple language~r{_{_. -- Norman Graham <norman@a.cs.okstate.edu> {cbosgd,rutgers}!okstate!norman The opinions expressed herein do not necessarily reflect the views of the state of Oklahoma, Oklahoma State University, OSU's Department of Computer Science, or of the writer himself.
norman@d.cs.okstate.edu (Norman Graham) (10/11/90)
From article <1990Oct10.101527.2247@maths.nott.ac.uk>, by anw@maths.nott.ac.uk (Dr A. N. Walker): > When, as a mathematician [well, as a game player, really], > I think of a tree or a graph, I think naturally and easily of each > node as being some information plus a collection of pointers to other > nodes. I draw them that way. You may think of them differently; > but I think I should be allowed, within my chosen language, to program > them my way, and not have to pummel them into some other shape. Since you brought up the example of trees, you may want to read this Miranda definition of a 3-ary tree (just to be different). three_tree ::= Leaf num | Node int three_tree three_tree three_tree This says that a three_tree value ther a Leaf with a number or a Node with a number and 3 other three_trees. Notice that it does this without pointers to trees. Manipulation of these trees is very simple and the shape of the tree is unchanged from the corresponding pointer version. -- Norman Graham <norman@a.cs.okstate.edu> {cbosgd,rutgers}!okstate!norman The opinions expressed herein do not necessarily reflect the views of the state of Oklahoma, Oklahoma State University, OSU's Department of Computer Science, or of the writer himself.
jlg@lanl.gov (Jim Giles) (10/11/90)
From article <2627@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin): > In article <2883@igloo.scum.com>, nevin@igloo.scum.com (Nevin Liber) writes: >> In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes: > >> >It is my contention that future languages >> >shouldn't have pointers at all. Not just no C-like pointers, none at >> >all. I just picked on C as the most unpleasant example of what I'm >> >against. > >> I really hate to agree with you Jim :-), but I'm beginning to think >> that you are right. The only real argument I can see _for_ having >> pointers is efficiency; more specifically, to help in >> hand-optimisation. Extensions to C such as C++ are showing that >> pointers aren't needed nearly as much as they use to be; heck, code >> seems to be more readable w/o them. In languages such as Icon and >> LISP I find that I don't even miss them. This discussion was cross-posted without adequate context to allow clear discussion. The assertion I made above was with respect to _explicit_ pointers - most of the functionality of which is possible in clearer ways. For example, one dimensional arrays are semantically equivalent to bounded pointer domains (with the subscript acting as the pointer) - however, since the arrays are bounded, it is possible to make assertions about aliasing between references that unbounded pointers don't allow (different arrays are disjoint, for example). > [...] > I must strongly object. When one considers what a variable or any other > type of reference is, it is a pointer. [...] Almost. Variables (and most constants) in procedural languages are _usually_ (but not always) allocated some specific memory location (either dynamically or statically) in which to keep the value of the symbol. The address of this memory location is an attribute of the variable (like its size or type - which are also attributes). A pointer, on the other hand, is a variable whose _value_ is an address. It also _usually_ has an address part as an attribute - which points to the address which is the value. But, addresses aren't particularly important to me. I almost never need to know or manipulate them in any way. So, why do I need a data type who's values are addresses? > [...] Fortran originally only used > pointers, as there was no call by value. [...] Fortran did no such thing. Fortran has _always_ been designed to allow _either_ call by reference _or_ call by value/result. On many machines, particularly the new massively parallel machines like the Connection Machine, call by value/result is _much_ more efficient. It's cheaper to copy the data in and out than to have the procedure always reading across the network of CPUs for it. > [...] I cannot imagine a language > for mathematical use which does not have variable arrays, Neither can I. But these don't need explicit pointers. Nor, do they even, necessarily, need call by reference as I pointed out above. > [...] functions, > etc. Yes, I think that languages should have these too. But, it's clear to me that pointers are not the proper way. > [...] These are all expressed, consciously or not, as pointers. Not necessarily. > [...] > Now if one can have fixed pointers, why not variable pointers? We are > no longer in the days when the only way to call a subroutine was to > have the address in the instruction. Why not leave pointers out of it and provide explicit ways of altering the reference attribute of a variable in specific ways? We are no longer in the days when anyone _cared_ about the absolute addresses of anything. Why have a data type which purports to represent them? > [...] > One has to go through contortions in Fortran to use a subroutine which > has a variable function or subroutine as an argument. [...] Yes, and the solution to this problem is to have a data type called 'function', and allowing variables of that type to be declared. For example, suppose you have a random number generator called ranf() which has an optional argument allowing you to set the seed. Why shouldn't the language of the future allow this: Function (optional float) -> float :: x, y !declare x, y the same !type as ranf() x = ranf !not an evaluation - an assignment to x y = ranf !ditto for y dummy = x(some_seed) !set seed for x dummy = y(some_other_seed) !set seed for y Now, you have two independent generators with different seeds that both are "copies" of the ranf() code. Actually, the only part the implementation needs to copy is the internal static variables of the routine - all else can use the same code. > [...] Now there are > developments in programming languages which should have been apparent > way back when which may decrease the need for some uses of pointers, > but why hobble the intelligent programmer? [...] Why indeed? Why force the programmer to use explicit pointers when the language can be designed to allow the programmer to explicitly state what he really wants to do? It's easier for the programmer, easier for the compiler to generate efficient code, easier all around. > [...] I have deliberately used > arrays of pointers as the most convenient means of handling array > refill procedures, and while I can find a way around it, it is clumsy > and even less portable. Exactly! Pointers _are_ clumbsy and less than portable. What you really wanted was probably an array of sequences, or an array of functions, or an array of dynamic arrays (not the same as a 2-d array - in an array of arrays, each element of the parent array may have a different size or even a different allocation status from its siblings). If the high-level language gave you those concepts, you wouldn't need explicit pointers and you would get portability (where ever the language was). J. Giles
jlg@lanl.gov (Jim Giles) (10/11/90)
From article <chl.655461731@m1>, by chl@cs.man.ac.uk (Charles Lindsey): > [...stuff about graphs with aliased nodes and aliased references to them...] > Now with pointers, this is easily implemented. But with recursive data types > it is impossible (at least in any straightforward way - can anyone tell me how > to do this in a functional language). What? Where in the definition of recursive data structures is aliasing prohibited? C.A.R. Hoare carefully discussed not allowing aliasing in recursive structures in "Structured Programming". The fact that he had to go to so much extra trouble indicates that the usual definition of recursive data _does_ allow aliasing. Now, I don't agree with Hoare that aliasing should be illegal. But then, I'm not as zealous as he (nor as clever - he _may_ be right). Still, I don't think that aliasing should be a default capability either. I think that allowing aliasing should require extra and explicit declarations to clearly state what variables are allowed to be aliased what what objects. For example: Recursive type graph !recursive type with a float value and float :: value !two links in each node graph :: left, right end type graph aliased graph :: a, b !two vars which can be aliased to each !other and to their descendents aliased graph :: c, d !ditto, EXCEPT that these can't be aliased !to the others (or vice-versa). graph :: e !this is just a binary tree, it can't !be aliased to any other nodes of to !its own descendents. ... a = graph(1.0,graph(2.0,null,null),null) !this assigns a graph with two nodes to !variable 'a'. The type constructor !also allocates space. b <- a !'<-' is the shallow copy assignment. !The two variables are now aliased. c = a !'=' is the deep copy assignment. 'c' !is now a disjoint copy of all of 'a'. !If 'a' had cycles or aliased elements, !the corresponding elements of 'c' would !also be aliased - but 'a' and 'c' would !still be disjoint d <- b !ILLEGAL. 'd' and 'b' were in separate alias !alias declarations so they can't alias each !other. You might argue that this is so nearly identical to pointers that it doesn't make any difference. But, this yields much greater control over the meaning of the program. Aliasing is only present when aliasing is a required part of the algorithm and only those elements which _need_ to be aliased are allowed to be. Try to get that amount of control from pointers. Finally, there are no 'dereference' and 'address-of' operators to misuse, forget, or confuse. J. Giles
gadi@SHUM.HUJI.AC.IL (Gad Aharoni) (10/11/90)
UUCP> Reply-To: shum!gadi (Gad Aharoni) Organization: The Hebrew University of Jerusalem Lines: 15 Apparently-To: post-usenet@ucbvax.berkeley.edu In article <3254@mcrware.UUCP> jejones@mcrware.UUCP (James Jones) writes: >... but I'd rather like the ability to define >recursive data structures (I think Hoare wrote a paper about this sort >of thing) rather than using pointers. I tend to agree--pointers are to >data structure what gotos are to control flow. To all you pointer haters, I hate to point out that it is exremely awkward to implement a graph data structure without the use of pointers. And a graph is the most general data structure. Incidentally, pointers do not necessarily mean side effects. Gadi
anw@maths.nott.ac.uk (Dr A. N. Walker) (10/11/90)
In article <1224@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes: > [...] In C, >C++, Pascal, Modula-2, BCPL, and PL/1, the programmer is responsible >for ensuring that whatever the pointer points to is not reused while >the pointer is in use. Getting this wrong leads to messy bugs [...] In sensible languages (like Algol), storage allocation and, more importantly, deallocation is monitored by the compiler and the run-time support. So this bug is detected mostly as a compilation error, occasionally as a run-time fault, but in any case *when it happens*, not when the program inexplicably fails some time later. -- Andy Walker, Maths Dept., Nott'm Univ., UK. anw@maths.nott.ac.uk
gudeman@cs.arizona.edu (David Gudeman) (10/12/90)
In article <65265@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]From article <14972@cbmvax.commodore.com>, by skrenta@cbmvax.commodore.com (Rich Skrenta):
]> [...] Are the programs
]> any more readable because I say
]> int i; a[i]
]> to dereference instead of
]> struct foo *p; *p ?
]
]Yes! Because, a[i] and b[j] are guaranteed _not_ to be aliased...
But a[i] and a[j] don't have that guarantee. So there are a few
aliasing problems that are easier to solve with arrays. No big deal.
]... The pointer syatax _may_ be used to simulate arrays, but
]you might be planning to use it for dynamic memory, strings, recursive
]data structures, run-tim equivalencing, etc.. How does the reader know
]that the pointer will not be used in any of those ways - he knows the
]array won't be.
No he doesn't. In a language without pointers, the array might be
used to simulate pointers, and the pointer-simulation used for one of
the above purposes.
] Each of these features should have separate syntax since
]they are separate features.
C-style pointers struck me from my very first exposure as a simple and
elegant way of merging various things that are only _apparently_
different. I don't suppose you would be willing to just admit that
people have different tastes and give up your crusade against
pointers?
I'm willing to recognize that you prefer to divide the world up into a
bunch of seperate boxes. You ought to recognize that others prefer to
integrate different things into the same box. It is simple hubris to
try to convince others that your personal preferences are somehow
objective and that their preferences are misguided.
] Forcing them all to masquerade as pointers
]only confuses the person maintaining the code - and doesn't give the
]compiler enough information to adequately optimize.
Funny, I've never been confused by any of the above uses of pointers
(with the possible exception of run-time equivalencing, I don't know
what you mean by that...).
As to optimization, I will admit that it is _harder_ to optimize code
with pointers. You have never convinced me that it is less _possible_
to optimize pointers. I say let the machine (or the compiler-writer)
do the work. If my mental picture of a solution to a problem involves
pointers, then the language should let me _express_ the solution with
pointers. Yes, yes, _you_ don't think I should be picturing problem
solutions with pointers. I happen to disagree with you, and so do
thousands of other C programmers.
--
David Gudeman
Department of Computer Science
The University of Arizona gudeman@cs.arizona.edu
Tucson, AZ 85721 noao!arizona!gudeman
gudeman@cs.arizona.edu (David Gudeman) (10/12/90)
In article <65409@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]
]A pointer, on the other hand, is a variable whose _value_ is an
]address... But, addresses
]aren't particularly important to me. I almost never need to know
]or manipulate them in any way. So, why do I need a data type who's
]values are addresses?
An int, on the other hand, is a variable whose _value_ is a
bit-string. But, bit-strings aren't particularly important to me. I
almost never need to know or manipulate them in any way. So why do I
need a data type who's values are bit-strings?
The above probably won't make the point (though it should) so I will
expand on it. A pointer might have an address as its concrete value,
but you should be thinking of a pointer as an abstraction, not by its
concrete representation. A pointer is a data type with certain
operations on it:
for pointers declared "TYPE *p, *q", the following
*p returns an l-value referencing an object of type TYPE (as
for any other data object, if p is not initialized the
outcome of any operation is undefined)
p + i returns a pointer to the ith element of type TYPE
following *p. This is only valid if there is a contiguous
sequence of at least i elements of type TYPE following *p.
p < q returns TRUE iff *p preceeds *q in a sequence of
elements of type TYPE, assuming that both *p and *q are
members of the same contiguous sequence.
etc. There is no need to refer to addresses. Pointers are no less
abstract than arrays. There is no guarantee that "p + i" will return
an address "i*sizeof(TYPE)" greater than that of *p, anymore than it
is guaranteed that the address of A[i] is "(i-j)*sizeof(TYPE)" more
than the address of A[j]. I know implementations of both arrays and
pointers where these conditions do _not_ hold.
--
David Gudeman
Department of Computer Science
The University of Arizona gudeman@cs.arizona.edu
Tucson, AZ 85721 noao!arizona!gudeman
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/12/90)
In article <65265@lanl.gov>, jlg@lanl.gov (Jim Giles) writes: > Yes! Because, a[i] and b[j] are guaranteed _not_ to be aliased. Whereas > *p and *q might be or, then again, might not be. That's an argument against *C* pointers. It is not at all an argument against pointers qua pointers. Remember Euclid? You declare pointers into ZONES. Pointers into different zones cannot possibly be aliased, and when a zone's lifetime expires the entire zone can be reclaimed, because a pointer can't outlive its zone. > The pointer syatax _may_ be used to simulate arrays, but > you might be planning to use it for dynamic memory, strings, recursive > data structures, run-tim equivalencing, etc.. How does the reader know > that the pointer will not be used in any of those ways - he knows the > array won't be. The reader certainly doesn't know anything of the kind. Of course you *can* use arrays to hold strings, recursive data structures, and do run-time equivalencing. (It's quite common to overlay two logical arrays onto one in Pascal, just think of straight insertion sort.) -- Fear most of all to be in error. -- Kierkegaard, quoting Socrates.
pcg@cs.aber.ac.uk (Piercarlo Grandi) (10/12/90)
On 10 Oct 90 10:15:27 GMT, anw@maths.nott.ac.uk (Dr A. N. Walker) said:
anw> Like everyone else over here, I'm missing the (initial) context, but I
anw> found this interesting:
Yes, good spotting!
anw> In article <1990Oct8.135551.21639@arnor.uucp>
anw> lowry@lusitania.watson.ibm.com (Andy Lowry) writes:
lowry> I personally have written a great deal of code in Hermes and have
lowry> found the lack of pointers to be no hindrance. [...]
lowry> Of course, our compiler and run-time system make heavy use of
lowry> pointers and shared data for reasons of efficiency, but this is
lowry> completely hidden from the Hermes programmer.
anw> In other words, here is a construct which is incredibly useful for
anw> writing compilers and other programs, but which the ordinary
anw> programmer is unable to use.
Good point! And precisely my point too: pointers are nearly
indispensable to write *implementations*, because our machines use them.
But the other guy also has a point: if your implementation is
efficiently written using pointers, your application can abstract away
from them.
anw> I don't think languages should hide useful abstractions from
anw> programmers; they should hide annoying concretions instead.
But pointers are a useful concretion, not an abstraction...
anw> I don't really want to know the machine address of a variable,
anw> or how to represent a conditional jump in machine code, or which
anw> registers need to be saved over a procedure call.
If you are doing fundamental *implementation* (OS, runtime library,
profiler, debugger, compiler, ...) work, you want of course to deal
explicitly with such things. It all depends on where you put your
virtual machine boundary -- if your virtual machine is a C virtual
machine, or an Ingres virtual machine, or a SPARC virtual machine,
etc...
anw> But I do want to be able to express within my language all the
anw> abstractions that naturally arise in modelling my problem. These
anw> may well include pointers ["keep a finger on that variable"], even
anw> when there are no `data structures' around.
Again: you never need pointers to model your *application*. Pointers
actually cloud abstractions because they introduce an extra layer of
manipulation (pointer values) that does not exist at the application
level (that is only expressed in data values). If you have very simple
relationships in your application, pointers are just one of the ways of
*implementing* them, and one of the least interesting and nastier.
If the machines we are programming had identification-by-contents for
objects we would not need pointers at the implementation level; but they
do not, so we must use pointers, identification-by-tagging, as a poor
substitute on which to build the more useful and application oriented
abstractions for relationships, using usually far more sophisticated
mechanisms than pointers.
I strongly believe that a lot of computer science, I think actually
almost all of it (if you accept the idea that numeric analysis is not
strictly CS :->), is about trying to simulate identification-by-contents
using identification-by-tagging.
--
Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
cik@l.cc.purdue.edu (Herman Rubin) (10/13/90)
In article <1990Oct10.193502.2011@d.cs.okstate.edu>, norman@d.cs.okstate.edu (Norman Graham) writes: > From article <2627@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin): > > In article <2883@igloo.scum.com>, nevin@igloo.scum.com (Nevin Liber) writes: > >> In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes: > > > >> >It is my contention that future languages > >> >shouldn't have pointers at all. Not just no C-like pointers, none at > >> >all. I just picked on C as the most unpleasant example of what I'm > >> >against. ....................... > Pointers are not the only way of providing these kinds of operations-- > they're not even the most beautiful (conceptually speakin, of course). > The ability to pass functions to function and to store functions in > data structures only requires that functions be first class values in > the language: Pointers to functions are unneeded if functions are > values in the language. You just need to match the tool to the > problem. If you require pointers, use a languages with pointers; > otherwise use a conceptually more simple language~r{_{_. There are real simplifications and apparent simplifications. Suppose that internally information is being passed in a computer. Now the only ways that I can see to pass this information is to pass all or part of the value, the name, or a pointer. By part, one can pass part of the value and a way to get at the rest, so that the problem is repeated, although there are many places where this is done; many string procedures pass part of the string and pointers to what is before or after. Now passing the value may be the best way to do things, or passing a pointer. Passing the name only rarely; interpreters are slow, and compilers and linkers normally replace names with values or pointers, even for first-class objects. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!cik(UUCP)
asylvain@felix.UUCP (Alvin "the Chipmunk" Sylvain) (10/13/90)
In article <2884@igloo.scum.com> nevin@igloo.UUCP (Nevin Liber) writes: ::[I added comp.lang.misc to the list of newsgroups; please follow-up to ::the appropriate newsgroup ONLY.] Which is the appropriate newsgroup? If comp.lang.misc is not appropriate, why'd you add it? Why do I get the feeling that no one in comp.society.futures was or is or will be interested? Anywho ... ::In article <5088@uqcspe.cs.uq.oz.au> brendan@batserver.cs.uq.oz.au writes: :: ::>The percieved need for ::>side-effects in terms is merely a by-product of the poor state of language ::>design, and would not be missed at all in better languages. :: ::I disagree. This would throw out all functions which maintain their ::own state (eg: i/o). [...] Nope. C (at least) allows for variables to be 'static'. No need for side-effects to maintain the function's internal state. Unfortunately, Pascal doesn't suffer from this convenience. This causes a programmer to load up more and more junk into the 'program'-level 'var' declaration, until it's almost as hard to debug as FORTRAN COMMON statements. Heaven help you if two or more subroutines are both using a global 'IDX' in different contexts. There, your argument about side-effects probably holds. Any post-modern (after Pascal) language allows the declaration of a variable which is local to the routine, but doesn't change between invokations. Side-effects are not necessary for state-maintenance. Period. -- =============Opinions are Mine, typos belong to /bin/ucb/vi============= "We're sorry, but the reality you have dialed is no | Alvin longer in service. Please check the value of pi, | "the Chipmunk" or pray to your local diety for assistance." | Sylvain = = = = = =I haven't the smoggiest notion what my address is!= = = = = =
jlg@lanl.gov (Jim Giles) (10/13/90)
From article <26295@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman): > In article <65265@lanl.gov> jlg@lanl.gov (Jim Giles) writes: > [...] > ]Yes! Because, a[i] and b[j] are guaranteed _not_ to be aliased... > > But a[i] and a[j] don't have that guarantee. So there are a few > aliasing problems that are easier to solve with arrays. No big deal. My objection here is with the word "few" and the phrase "no big deal." _MOST_ aliasing between pointers simulating arrays is actually between separate arrays. On a pipelined machine, the slowdown for aliasing is usually on the order of a factor of 2. On a vector machine, the factor may be between 10 and 100. Most users who are hit with this problem don't regard it as "no big deal." > [...] > ]... The pointer syatax _may_ be used to simulate arrays, but > ]you might be planning to use it for dynamic memory, strings, recursive > ]data structures, run-tim equivalencing, etc.. How does the reader know > ]that the pointer will not be used in any of those ways - he knows the > ]array won't be. > > No he doesn't. In a language without pointers, the array might be > used to simulate pointers, and the pointer-simulation used for one of > the above purposes. Touche. _IF_ the choice were strictly between pointers and arrays, then the rest of the important data type construction tools would have to be simulated with whichever is chosen. However, I don't recomment minimalist language design. The language should allow arrays, sequences (essentially variable length 1-d arrays), unions (always tagged), records (structs, whatever you want to call them), and recursive data structures. These features should be allowed to be used individually or in any combination. In addition, the 'aliased' attribute should be allowed to be applied to any collection of data items of the same type (which allows the "shallow copy" assignment to be applied to them). Further attributes that should be applicable to any variables are 'dynamic' (which allows the objects to be allocated and deallocated explicitly by the user), 'static' (which makes the variable have permanent scope): the default is to put the variable on the stack. Finally, an explicit way of defeating type checking (for run-time 'equivalence' work) should be provided. The presence of all these doesn't _guarantee_ that some user won't try to simulate one of them with some combination of the others, but why should a user do so? It would simply make his code harder to read and maintain. I prefer to state _explicitly_ how my data is structured. Perhaps your objection is founded on this last issue. Maybe you don't know a-priori how you data really is structured and you want to delay the decision until most of the code is written. This violates the spirit of structured programming (iterative refinement of programs - which differs from Structured (note the initial capital) programming which is a religion of GOTO evasion). > [...] > ] Each of these features should have separate syntax since > ]they are separate features. > > C-style pointers struck me from my very first exposure as a simple and > elegant way of merging various things that are only _apparently_ > different. [...] C pointers always struck me as rather _inelegant_ - even if I wanted to merge distinct features. > [...] I don't suppose you would be willing to just admit that > people have different tastes and give up your crusade against > pointers? I might if I believed that people's tastes were correlated to their productivity. I've always worked in or near the consulting office (help desk, whatever your business calls it). I worked my way through college as such a consultant. In 18 years of such experience I think I have a pretty good idea of what kind of errors people make, what kind cause the most trouble, and what kind of language features don't seem to engender such errors. Pointers are associated with the error side of the ledger. Further, many user productivity studies have been performed on language features. (Although I haven't found one on pointers yet.) Many of the researchers who conduct such tests have remarked that, very often, the feature the users thought was most productive was actually the reverse. Users are, in fact, notoriously bad at gauging their own productivity or the features that effect it. For this reason, when I see a feature which is associated with a disproportional percentage of errors (and difficult to find and correct ones at that), I perfectly willing to ascribe a good share of the blame for such errors on the feature itself - even if it's a feature I personally like. (This is indeed the case. Ten years ago I quite liked pointers. Since then I've found substitutes which are just as efficient and are more readible and less error prone.) > [...] > I'm willing to recognize that you prefer to divide the world up into a > bunch of seperate boxes. You ought to recognize that others prefer to > integrate different things into the same box. It is simple hubris to > try to convince others that your personal preferences are somehow > objective and that their preferences are misguided. And yet, you do not find it "simple hubris" when people (including yourself) maintain that the features of C are all anyone ever needs. If my opposition to pointers seems overblown, it is because all the false hype popularizing C is even more so. How many time have you chastized a C supporter for claiming that a preference for Fortran was misguided? It happens on the net all the time - at least a dozen times a year on each relevant newsgroup (and some irrelevant ones). Yet, such a position is as much "simple hubris" (or more) than the statements I'm making. > [...] > ] Forcing them all to masquerade as pointers > ]only confuses the person maintaining the code - and doesn't give the > ]compiler enough information to adequately optimize. > > Funny, I've never been confused by any of the above uses of pointers > (with the possible exception of run-time equivalencing, I don't know > what you mean by that...). This is a variation on the old "blame the victim" approach that lawyers defending rapists use. What you're saying is that you don't have the problem and those that do aren't worth considering. Aside from the ethical questions about your apathy toward other programmers, what about practical issues such as increased software costs, or increased taxes (the government hires programmers too - not all of them are as immune as you are to these errors)? But, let's test your claim: what is the expected argument type in the following ANSI C style function prototype? int f(char *x); Now, is the argument expected a single character which is passed by reference? Does the function expect an array of char? Does the function expect a sequence of char (terminated with a zero byte)? Or, perhaps the function expects something entirely different and it is declaring its argument to be a (char *) because the programmer "knows" that (char *) is more or less generic (this dependence on internal, machine dependent internal structure is what I mean by run-time equivalence - it is often a valuable and important thing to do, but I don't think pointer casting - implicit or explicit - is the best mechanism)? You can't tell? You are now having a problem pointers that you claim you don't have. The only way to tell what this argument is expected to be is to inspect the body of the function. Commentary and other sources may _claim_ that the argument is used in a particular way - but without compiler enforcement, such claims are not reliable. > [...] > If my mental picture of a solution to a problem involves > pointers, then the language should let me _express_ the solution with > pointers. Yes, yes, _you_ don't think I should be picturing problem > solutions with pointers. I happen to disagree with you, and so do > thousands of other C programmers. Then leave the discussion! C is a fait accompli and this is a discussion about the possible design of some future language. If you feel that C has all you need already, and in a form you like, then I promise never to twist your arm to buy this new language. You can continue to use C until doomsday for all I care. My remarks are addresses to those who aren't already religious about C or about a programming pradigm requiring pointers. To those people I am pointing out that there are ways to design languages which permit them to explicitly declare the structure of their data in ways that the compiler can check - thus rendering a large class of potential errors completely impossible. Additional advantages to this approach include the fact that the compiler itself can make use of this more explicit information to produce more efficient code. Another advantage is that explicit data structuring actually allows greater flexibility in the use of such structures. As an example of this last point: consider arrays and sequences. They are, as you keep asserting, quite similar concepts. But, they aren't identical. The rank and extent of an array is fixed for the lifetime of the array (which may be dynamic: specified in an allocate statement). The length of a sequence (it is always of rank one) is variable. This means that the concatenate operator is quite appropriate and useful for sequences, but is of doubtful use for arrays (and even of doubtful meaning for arrays of rank greater than one). If both arrays and sequences are implemented in the same way (as you insist using pointers), then the introduction of a concatenate operator would endanger error due to accidental use on arrays. This may not seem important and it may not occur very often, but it could be a very difficult error to discover in a large code. And, it's so easy to avoid entirely by making sequences and arrays distinct concepts in the language. J. Giles
jlg@lanl.gov (Jim Giles) (10/13/90)
From article <387@shum.huji.ac.il>, by gadi@SHUM.HUJI.AC.IL (Gad Aharoni): > [...] > To all you pointer haters, I hate to point out that it is exremely > awkward to implement a graph data structure without the use of pointers. > [...] Yes, and an IF statement is exremely awkward to implement without GOTOs. The fact that the implementation may _internally_ use pointers and/or GOTOs should trouble no one at the source code level - in fact, for the most part, they shouldn't need to know or care. Graphs are easy to implement as recursive data structures without any user-visible variable whose value is an address - that is to say: without explicit pointers. J. Giles
jlg@lanl.gov (Jim Giles) (10/13/90)
From article <1990Oct11.100302.19258@maths.nott.ac.uk>, by anw@maths.nott.ac.uk (Dr A. N. Walker): > In article <1224@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM > (David Chase) writes: >> [...] In C, >>C++, Pascal, Modula-2, BCPL, and PL/1, the programmer is responsible >>for ensuring that whatever the pointer points to is not reused while >>the pointer is in use. Getting this wrong leads to messy bugs [...] > > In sensible languages (like Algol), storage allocation and, > more importantly, deallocation is monitored by the compiler and the > run-time support. So this bug is detected mostly as a compilation > error, occasionally as a run-time fault, but in any case *when it > happens*, not when the program inexplicably fails some time later. Yes. But at considerable overhead. The fastest method I know of is a reference counting scheme which is order (N+M*C); where N is the number of nodes descended from the pointer being monitored, M is the number of edges in the structure connecting those N nodes, and C is the number of simple cycles in the structure (a simple cycle is one which either entirely contains each cycle which overlaps it or which is entirely contained within each overlapping cycle). Note that this issue arises identically for recursive data structures which allow aliasing. So this technique is equally applicable (or expensive) regardless of whether the pointers involved are explicit or not. In fact, for my language feature (the 'aliased' attribute), the user must either take the responsibility for explicitly allocating and deallocating them or he must pay the overhead for an automatic maintenence of them. This is why 'aliased' is optional and is off by default (among other reasons). J. Giles
jlg@lanl.gov (Jim Giles) (10/13/90)
From article <26296@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman): > [...] > An int, on the other hand, is a variable whose _value_ is a > bit-string. But, bit-strings aren't particularly important to me. I > almost never need to know or manipulate them in any way. So why do I > need a data type who's values are bit-strings? Here we see the basic confusion between the semantics of something and its implementation. The _value_ of an int is an integer restricted to some limited range (machine dependent - unfortunately). The internal representation _may_ be a bit string. On the other hand, the representation _may_ be a string of decimal digits (which in turn _may_ be implemented as bit strings), or the value _may_ be represented as trits (which I assume is the name one would give to the basic units of tri-state logic designs). Which of these (or any other) internal representations I get is (or at least, should be) irrelevant to me. > [...] > The above probably won't make the point (though it should) so I will > expand on it. A pointer might have an address as its concrete value, > but you should be thinking of a pointer as an abstraction, not by its > concrete representation. A pointer is a data type with certain > operations on it: And, if the equivalent functionality can be provided in a different for which gives the user more control over the semantics while allowing the compiler greater freedom in choosing the internal representation, why not prefer it? > [...] > for pointers declared "TYPE *p, *q", the following > *p returns an l-value referencing an object of type TYPE (as > for any other data object, if p is not initialized the > outcome of any operation is undefined) And, for non-pointers declared TYPE p, q, the following p returns an l-value referencing an object of type TYPE As for any other data object, the default initial value should be required to be specified either by the definition of TYPE itself, or by initialization within the declaration of each variable - with the later overriding the former if both initializations are present. The only exception to this is in the case of dynamically allocated variables, in which case, the allocation mechanism should always initialize. > [...] > p + i returns a pointer to the ith element of type TYPE > following *p. This is only valid if there is a contiguous > sequence of at least i elements of type TYPE following *p. If p is to be used as an indexed object, it should be declared as such. Either an array or a sequence specification in the declaration of p should be required. That way, both the user and the compiler are aware of the bounds, extent, and rank over which the object is indexed. > [...] > p < q returns TRUE iff *p preceeds *q in a sequence of > elements of type TYPE, assuming that both *p and *q are > members of the same contiguous sequence. The same applies to indexes within arrays or sequences. With the added advanntage that the fact that both indexes apply to the same contiguous sequence of values is explicit in every use - so both the user and the compiler can make precise assumptions about this property. The issue isn't whether pointers have properties which can be abstracted from raw addresses or not. The issue is whether the properties of pointers are required (or even desireable) in the context of a high-level language. J. Giles
pcg@cs.aber.ac.uk (Piercarlo Grandi) (10/13/90)
On 10 Oct 90 14:31:03 GMT, lowry@arnor.uucp said: lowry> We believe that getting a program logically correct and making it lowry> efficient should be two independent tasks. The Hermes language allows lowry> this separation by not requiring the programmer to make decisions that lowry> will determine efficiency. Data representation is just one example of lowry> this. Others are replication and migration of processes and/or data, lowry> concurrent execution of programs written in serial form, etc. Well said. Note that under the idea that what we really want to do is reuse of interface, semantics, implementation, you are saying that it is efficient to achieve a lot of reuse of interface and semantics is by having a very high level language. Agreeable point. lowry> It is part of our continuing research to develop very aggressive lowry> and high-level optimizations of this sort so that programs lowry> written in our model will, in fact, execute efficiently in a wide lowry> range of environments. I most strongly object to call this 'aggressive optimization'. It makes 'aggressive optimization' look respectable. Aggressive optimization is where the optimizer rewrites your program in a supposedly similar form behind your back. In your case you are just doing competent implementation of high level constructs, which were carefully designed with clean semantics to give ample opportunities for safe application of sophisticated implementation strategies. The query planner of a relational database does not do any aggressive optimization -- it is just required to do its job well. It starts to be aggressive if it rewrites queries before planning them, and hen it becomes really obnoxious only if the algebra underlying the query language is poorly designed, like in SQL. But note that in a well designed high level application language there *should* be no point in restructuring a user's program, even in the presence of a clearly defined and safe algebra for doing so -- there is essentially only one canonical form of achieving an effect. lowry> The programmer can also be involved in tuning, but this should lowry> generally come only after the program has been made correct. The lowry> tuning comes in the form of sprinkling "pragmas" around the code in lowry> various places. Well said. 'register' or 'inline' come to mind immediately, at a much lower level of abstractions, or providing expected domain value statistics and join paths to a query planner or relation creator for a relation database. Again, of course, as far as current technology is concerned, we are speaking of *application level* programming. Somebody has still got to do the bit twiddling. -- Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
carroll@cis.udel.edu (Mark Carroll) (10/14/90)
In article <2632@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes: >In article <1990Oct10.193502.2011@d.cs.okstate.edu>, norman@d.cs.okstate.edu (Norman Graham) writes: ]] From article <2627@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin): ]]] In article <2883@igloo.scum.com>, nevin@igloo.scum.com (Nevin Liber) writes: ]]]] In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes: ]]] ]]]]]It is my contention that future languages ]]]]]shouldn't have pointers at all. Not just no C-like pointers, none at ]]]]]all. I just picked on C as the most unpleasant example of what I'm ]]]]]against. ] ] ....................... ] ]] Pointers are not the only way of providing these kinds of operations-- ]] they're not even the most beautiful (conceptually speakin, of course). ]] The ability to pass functions to function and to store functions in ]] data structures only requires that functions be first class values in ]] the language: Pointers to functions are unneeded if functions are ]] values in the language. You just need to match the tool to the ]] problem. If you require pointers, use a languages with pointers; ]] otherwise use a conceptually more simple language~r{_{_. ] ]There are real simplifications and apparent simplifications. ] ]Suppose that internally information is being passed in a computer. Now ]the only ways that I can see to pass this information is to pass all or ]part of the value, the name, or a pointer. By part, one can pass part of ]the value and a way to get at the rest, so that the problem is repeated, ]although there are many places where this is done; many string procedures ]pass part of the string and pointers to what is before or after. ] ]Now passing the value may be the best way to do things, or passing a pointer. ]Passing the name only rarely; interpreters are slow, and compilers and linkers ]normally replace names with values or pointers, even for first-class objects. Herman, you should really strongly consider learning a little bit about languages, because you're continually making a fool of yourself out of ingorance. *** Machine Level Representation != Language Level Representation *** That's the ENTIRE POINT of high-level languages - to allow us to write programs in representations that are different from that of the underlying architecture. The "real" way in which things are represented is not necessarily the best way to present them to a human being. Of COURSE you'll have pointers on the execution level, if you want to represent data structures - there's really no way to avoid that. But, equally, you've got to have bitstrings on the execution level to represent floating point numbers. Do I want my progamming language to force me to twiddle bit strings? No - and equally, I don't necessarily want my programming language to force me to twiddle pointers. There are better ways to represent things on my level. Like Recursive data structures. Or perhaps the table types that the hermes people have spoken about. ]Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 <MC>
jlg@lanl.gov (Jim Giles) (10/14/90)
From article <PCG.90Oct13114804@odin.cs.aber.ac.uk>, by pcg@cs.aber.ac.uk (Piercarlo Grandi): > [...] > I most strongly object to call this 'aggressive optimization'. It makes > 'aggressive optimization' look respectable. > > Aggressive optimization is where the optimizer rewrites your program in > a supposedly similar form behind your back. [...] > [...] >-Piercarlo "Peter" Grandi [...] Grandi comes up with this one periodically. I can never pin him down on a meaning. A compiler is a tool which transforms one representation of a program (the source) into a semantically equivalent form (the object code) in another language. If the compiler changes the semantics of the program during this process, it is _BROKEN_. The optimizer, as part of the compiler, must obey this same constraint. If the optimizer changes the semantics of a code, it is _BROKEN_. An optimizer may be as "aggressive" as the compiler implementor wants to make it - as long as it doesn't alter the semantics of the code it's translating. J. Giles
peter@ficc.ferranti.com (Peter da Silva) (10/14/90)
How about taking this to alt.religion.computers? As for the usefulness of pointers, or whether they're the data-structure equivalent of a GOTO, remember that most practical people in this world are willing to accept that an occasional GOTO is useful or even desirable. And, remember that C is nearly 20 years old! It's not the perfect tool for every job, so what? You can't cook everything in a microwave oven, either. -- Peter da Silva. `-_-' +1 713 274 5180. 'U` peter@ferranti.com
gudeman@cs.arizona.edu (David Gudeman) (10/14/90)
In article <65680@lanl.gov> jlg@lanl.gov (Jim Giles) writes: ] ]Here we see the basic confusion between the semantics of something and ]its implementation. That was _clearly_ a rephrasing something _you_ said. The purpose of the rephrasing was _clearly_ to show that you had this basic confusion. Taking a quote out of context like that is really low. In article <65662@lanl.gov> jlg@lanl.gov (Jim Giles) writes: ]And yet, you do not find it "simple hubris" when people (including ]yourself) maintain that the features of C are all anyone ever needs. That is a lie. Slander, libel, and not even a nice or true thing to say about me. I have never, _never_, NEVER, said, implied, intimated, believed, thought, or given credence to the statement that the features of C are all anyone ever needs. If anyone asked my opinion on C, I would say that it is OK in it's application domain, widely known, and has some nice features. But if your application isn't seriously performance bound, use a real language like Lisp or Icon, one that supports multiple paradigms and has lots of nice high-level data structures. ]If my opposition to pointers seems overblown, it is because all the ]false hype popularizing C is even more so. How many time have you ]chastized a C supporter for claiming that a preference for Fortran ]was misguided? My opinion on Fortran is about the same as my opinion on C, except for the part about having some nice features. Maybe if I knew a Fortran more recent than Fortran 77 I could even say nice things about it. ]This is a variation on the old "blame the victim" approach that lawyers ]defending rapists use. What you're saying is that you don't have the ]problem and those that do aren't worth considering. That is a pretty sleazy rhetorical ploy. People who have trouble reading my code are not the moral equivalents of rape victims. They are not very competent programmers either. As to "blame the victim", there is not necessarily any "blame" associated with incompetence. I just don't want to be bound in _my_ actions by the possible incompetence of other people. I also don't expect professional basketball players to play blindfolded with one arm tied behind their backs so that clutzes like me have a chance to be in the NBA. ] Aside from the ]ethical questions about your apathy toward other programmers, what about ]practical issues such as increased software costs, or increased taxes ](the government hires programmers too - not all of them are as immune ]as you are to these errors)? First, I do not claim immunity to errors. I claim that using pointers does not cause me to make errors. Second, I do not claim that everyone should use pointers. ]But, let's test your claim: what is the expected argument type in the ]following ANSI C style function prototype? ] ] int f(char *x); ] A pointer to a character. ]Now, is the argument expected a single character which is passed by ]reference? Does the function expect an array of char? Does the ]function expect a sequence of char (terminated with a zero byte)? Yes. One of those. ]Or, perhaps the function expects something entirely different and ]it is declaring its argument to be a (char *) In a post-ANSI C this would be trouble. The programmer's problem, not mine. OK, now your turn. In the following LISP function (defun f (x) .... What is x supposed to be? One of the above things? A symbol? An array? Hmm. Looks like you are going to have to look at the code. ] You can't tell? You are now having a problem ]pointers that you claim you don't have. The only way to tell what ]this argument is expected to be is to inspect the body of the function. ]Commentary and other sources may _claim_ that the argument is used Well, you either trust the documentation or you don't. If you don't, then you have to inspect the body of the function anyway to see if it does what you want it to. ]Then leave the discussion! C is a fait accompli and this is a ]discussion about the possible design of some future language. It is? Then how come the article I responded to looked like a nothing more than a pointless (heh,heh) slam at pointers? ]... The rank and extent of an array is fixed for the lifetime ]of the array (which may be dynamic: specified in an allocate statement). APL programmers will be surprised to hear that. ]The length of a sequence ... is variable. Mathematicians will be surprised to hear that. ] This ]means that the concatenate operator is quite appropriate and useful for ]sequences, but is of doubtful use for arrays I don't see the relationship. When you concatenate two sequences to get a third, you haven't changed the length of any sequence. Changing the length of a sequence implies that it is a mutable object in the first place, and in the second place makes it deque. ] If both arrays and ]sequences are implemented in the same way (as you insist using pointers), More slander. I never insisted on any such thing. ]then the introduction of a concatenate operator would endanger error ]due to accidental use on arrays. Oops. And maybe you just discovered a useful new operation on arrays by observing their similarity to sequences. It works in APL. -- David Gudeman Department of Computer Science The University of Arizona gudeman@cs.arizona.edu Tucson, AZ 85721 noao!arizona!gudeman
gadi@SHUM.HUJI.AC.IL (Gad Aharoni) (10/15/90)
In article <65664@lanl.gov> jlg@lanl.gov (Jim Giles) writes: >... Graphs are easy >to implement as recursive data structures without any user-visible >variable whose value is an address - that is to say: without explicit >pointers. Ok. Show me how you would implement a graph data structure without using pointers, and when I say pointers I also mean array indices. The only other way I can think of is to use the standard logic, or relational database technique of listing the dependencies. I am not too fond of this method of representing a graph because when a path in the graph is followed, the next node is not "at hand". When a list is followed the next element is always there, when following the right son of a tree, the subtree is there, at hand, immediatly accessible. In this method of representing a graph the next node has to be searched for in the table before the node is accessed. This is in fact simulating store, only without the advantage of immediate access. Gadi -- Gad Aharoni TEL: +972-2-584932 BITNET: gadi@humus.bitnet CSNET & INTERNET: gadi@humus.huji.ac.il Snail: Dept. of CS, Hebrew University of Jerusalem, Jerusalem 91904, Israel
chl@cs.man.ac.uk (Charles Lindsey) (10/15/90)
In <65664@lanl.gov> jlg@lanl.gov (Jim Giles) writes: >........ Graphs are easy >to implement as recursive data structures without any user-visible >variable whose value is an address - that is to say: without explicit >pointers. How?
anw@maths.nott.ac.uk (Dr A. N. Walker) (10/16/90)
In article <65409@lanl.gov> jlg@lanl.gov (Jim Giles) writes: >[...] But, addresses >aren't particularly important to me. I almost never need to know >or manipulate them in any way. So, why do I need a data type who's >values are addresses? (a) Because, although *you* may not, *others* may, or may think they do, or may simply prefer to think of things that way. (b) Because, although I may never want to manipulate addresses in the bit-twiddling way you seem to be suggesting, I *do* quite often want fingers. If I search a data structure for elements with some property (the "largest" node, or the pair that are furthest apart, or whatever) it is reasonable to keep fingers on the results. You can certainly disguise the pointer-ness of what I am doing, but I don't see why you should have to, or indeed want to. > [...] We are no longer >in the days when anyone _cared_ about the absolute addresses of anything. (*Some* people still do, as PCG points out elsewhere.) >Why have a data type which purports to represent them? In most HLLs, no such representation is purported. A pointer is simply a way in which one object can make reference to another, which sounds like a pretty useful abstraction to me. In article <65662@lanl.gov> jlg@lanl.gov (Jim Giles) writes: >[...] The rank and extent of an array is fixed for the lifetime >of the array [...] This will be news to Algol programmers. > This means that the concatenate operator is [...] of doubtful use > for arrays [...]. MODE STRING = [1: 0 FLEX] CHAR; STRING s := "tomorrow"; s +:= 2 * (" and " + s) + " creeps in ..." C apologies to Shakespeare! C -- Andy Walker, Maths Dept., Nott'm Univ., UK. anw@maths.nott.ac.uk
anw@maths.nott.ac.uk (Dr A. N. Walker) (10/16/90)
In article <1990Oct10.195202.2340@d.cs.okstate.edu> norman@d.cs.okstate.edu (Norman Graham) writes: >[...] you may want to read this >Miranda definition of a 3-ary tree (just to be different). I keep trying to get interested in Miranda, but "they" want real money for it, which is bad news for a Maths Dept. > three_tree ::= Leaf num | Node int three_tree three_tree three_tree Well, that's fine for an acyclic structure [though not the way I usually *think* of trees] and a lazy evaluator, but it doesn't make sense to me in the form three_graph ::= Node int three_graph three_graph three_graph Where does it all end, even lazily? How do I express the identity of one node with another [ie, perhaps the same node, but reached via some chain of edges]? How do I *draw* it? What is *anyone*'s objection to MODE NODE = STRUCT (INT info, REF NODE left, right, middle) ? -- Andy Walker, Maths Dept., Nott'm Univ., UK. anw@maths.nott.ac.uk
anw@maths.nott.ac.uk (Dr A. N. Walker) (10/16/90)
In article <65665@lanl.gov> jlg@lanl.gov (Jim Giles) writes: >From article <1990Oct11.100302.19258@maths.nott.ac.uk>, by >anw@maths.nott.ac.uk (Dr A. N. Walker): >> In sensible languages (like Algol), storage allocation and, >> more importantly, deallocation is monitored by the compiler and the >> run-time support. So this bug is detected mostly as a compilation >> error, occasionally as a run-time fault, but in any case *when it >> happens*, not when the program inexplicably fails some time later. > >Yes. But at considerable overhead. The fastest method I know of is a >reference counting scheme [...] [The "bug" referred to is a pointer mistakenly left dangling into deallocated storage.] If pointers may not point into storage with shorter lifetime [the simplest rule to enforce], then most illegal uses (eg: returning a pointer to local storage from a subroutine; making a global pointer refer to a local variable) are detected by the compiler. Only when procedure parameters are abused in a somewhat devious way is an expensive check needed. The overhead comes from garbage collection. If your program happens not to fill your entire virtual machine, you may (and I often do) get away scot free. If you do fill the machine, there may then be a short glitch [which may or may not amount to more or less than the accumulated minor glitches of doing lots of smaller deallocations by hand]. This is bad news if you're shutting down a nuclear power station or landing the space shuttle, but that's a different story, and most of us don't need to worry about such things. -- Andy Walker, Maths Dept., Nott'm Univ., UK. anw@maths.nott.ac.uk
lowry@arnor.uucp (10/16/90)
In article <1990Oct10.101527.2247@maths.nott.ac.uk>, anw@maths.nott.ac.uk (Dr A. N. Walker) writes: |> In article <1990Oct8.135551.21639@arnor.uucp> |> lowry@lusitania.watson.ibm.com (Andy Lowry) writes: |> >I personally have written a great deal of code in Hermes and have |> >found the lack of pointers to be no hindrance. |> [...] |> >Of course, our compiler and run-time system make heavy use of pointers |> >and shared data for reasons of efficiency, but this is completely |> >hidden from the Hermes programmer. |> |> In other words, here is a construct which is incredibly |> useful for writing compilers and other programs, but which the |> ordinary programmer is unable to use. The point I meant to make is that indirect addressing is a fact of life in most of today's computer architectures. Therefore, pointers will certainly be found in efficient implementations of many programs. This does not mean that the pointers need to show through in the languages in which the programs are written. Many computers also have segment registers, process status registers, data caches, and other such features, but I'm not ordinarily aware of them when I write programs. When we implemented the Hermes run-time system, we did, in fact, make heavy use of pointers. This is because we wrote the system in C, where pointers are indispensible. We believe that we have designed Hermes in such a way that pointers are not indispensible. |> When, as a mathematician [well, as a game player, really], |> I think of a tree or a graph, I think naturally and easily of each |> node as being some information plus a collection of pointers to other |> nodes. I draw them that way. You may think of them differently; |> but I think I should be allowed, within my chosen language, to program |> them my way, and not have to pummel them into some other shape. I expect you actually think of them as a collection of nodes and arcs. At least that's how they're normally presented in mathematical books and articles. Computer programmers typically represent the arcs as arrays or lists of pointers to successor nodes. This moves away from the conceptualization and biases the implementation in some very significant ways. For example, without substantial extra baggage, such a representation will never support following the arcs backward as easily or cheaply as forward. The Hermes programmer could represent a graph precisely in the standard conceptual manner--as a collection of nodes and a collection of arcs, where each arc is a record containing two node id's. The compiler and run-time system would choose representations and algorithms based on a number of factors, perhaps including observed or anticipated access patterns, hints from the programmer, etc. If the access patterns were to change, a recompilation could adapt these choices as needed to maintain efficiency. This is much cheaper and less error-prone than re-engineering a C program because reverse traversals are suddenly needed in a structure that has only forward pointers. -- Andy Lowry, lowry@ibm.com, (914) 784-7925 IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY 10598
pcg@cs.aber.ac.uk (Piercarlo Grandi) (10/16/90)
On 13 Oct 90 20:43:18 GMT, jlg@lanl.gov (Jim Giles) said: jlg> From article <PCG.90Oct13114804@odin.cs.aber.ac.uk>, by jlg> pcg@cs.aber.ac.uk (Piercarlo Grandi): pcg> I most strongly object to call this 'aggressive optimization'. It makes pcg> 'aggressive optimization' look respectable. pcg> Aggressive optimization is where the optimizer rewrites your program in pcg> a supposedly similar form behind your back. [...] jlg> Grandi comes up with this one periodically. I can never pin him down jlg> on a meaning. I am sorry on this -- it is a fairly imprecise concept and I seem not to be able to express it clearly, even if I have given at least several examples. I doubt it can be defined precisely, even if certainly more precisely than pornography... :-) jlg> A compiler is a tool which transforms one representation of a jlg> program (the source) into a semantically equivalent form (the jlg> object code) in another language. If the compiler changes the jlg> semantics of the program during this process, it is _BROKEN_. The jlg> optimizer, as part of the compiler, must obey this same constraint. jlg> If the optimizer changes the semantics of a code, it is _BROKEN_. Yes, as long as the result is correct, an optimizer can do what it wants. But here we are not interested on what a correct optimizer can do; we are interested in what is proper or cost/effective for an optimizer to do. In particular if there is reason to reckon that certain types of optimization are hard to get right, that is the generated code is incorrect with higher probability, they may not be very cost effective. In practice the probability of code being generated incorrectly is indeed an increasing function of optimizer complexity, and of the level of semantic restructing performed by it. So the statement: jlg> An optimizer may be as "aggressive" as the compiler implementor jlg> wants to make it - as long as it doesn't alter the semantics of the jlg> code it's translating. is true but not interesting -- it is a tautology. The interesting question is which transformations, when going from source to target code, are worth doing, performance and reliability wise. Maybe I can find a more understandable way of expressing my thought: I reckon, with some supportive reasoning that I will not repeat here, that the tranformations that do not involve any "horizontal" rewriting of a program's logic are worthwhile. I mean by this tranformations that preserve the structure of the algorithm being translated. If the structure is relatively underspecified (a very high level language) then there is considerable scope for clever implementation strategies. If it is relatively precisely specified (C, other systems implementation languages) then it should be respected. I also regard attempts at reinterpreting low level languages as high level ones, to get around this, as dangerous, or at least inferior to using languages designed at the desired level in the first instance. Aggressive optimization is the idea that restructuring a program *in the course of translation* is both feasible and advantageous, and desirable (cost effective). My contention is that logic restructuring optimizations are more cost effective instead at the source level, whether automatic or manual, and that often automatic is not necessary, manual is sufficient. My main reason for surmising so is that automatic program analysis and rewriting is often far more difficult than code planning and synthesis, and the benefits are not as often competitive with those of more limited and safer alternatives (source analysis and rewriting). More shortly: when -O[2345....] will not raise substantially the probability of getting broken code in most compilers, when it will not result in huge increases in compile time or space, and when it will give substantially better results than hand tuned code, then I will BELIEVE! :-). -- Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
mjs@hpfcso.HP.COM (Marc Sabatella) (10/17/90)
>>[...] But, addresses >>aren't particularly important to me. I almost never need to know >>or manipulate them in any way. So, why do I need a data type who's >>values are addresses? > > (a) Because, although *you* may not, *others* may, or may think > they do, or may simply prefer to think of things that way. Count me in. I write pretty low level code. Dynamic linker/loaders, etc. Darn tootin' I care about addresses. Pretty useful when doing symbolic relocation. -------------- Marc Sabatella (marc@hpmonk.fc.hp.com) Disclaimers: 2 + 2 = 3, for suitably small values of 2 Bill and Dave may not always agree with me
peter@ficc.ferranti.com (Peter da Silva) (10/18/90)
In article <1990Oct15.204343.2907@arnor.uucp> lowry@lusitania.watson.ibm.com (Andy Lowry) writes: > The Hermes programmer could represent a graph precisely in the > standard conceptual manner--as a collection of nodes and a collection > of arcs, where each arc is a record containing two node id's. ...each of which node-ids is a pointer, no? It's not a machine address, but its an identifier that can reference any arbitrary node... it's a pointer into a bounded collection of objects, but you still have to worry about aliasing within that collection. The problem of aliasing is alleviated, but not solved. (basically, you've switched from C to lisp) -- Peter da Silva. `-_-' +1 713 274 5180. 'U` peter@ferranti.com
nevin@igloo.scum.com (Nevin Liber) (10/18/90)
[<sigh> I moved this thread over to comp.lang.misc about a week and a half ago; then our machine (igloo) took a few power hits. I missed most of the real details and find myself left with a flame war. Just my luck.] In article <393@shum.huji.ac.il> shum!gadi@lilac.berkeley.edu (Gad Aharoni) writes: >In article <65664@lanl.gov> jlg@lanl.gov (Jim Giles) writes: > >>... Graphs are easy >>to implement as recursive data structures without any user-visible >>variable whose value is an address - that is to say: without explicit >>pointers. >Ok. Show me how you would implement a graph data structure without using >pointers, and when I say pointers I also mean array indices. All that are needed are the stack operations push and pop, and the queue operations put and get, and it should be fairly easy to implement a graph data structure. If this isn't true, then it would be impossible to implement graphs in Icon and LISP. I am not looking for "C without pointers". What I feel is that, sometime in the future, our computer languages (if that is how we program computers at all :-)) will have enough higher-level constructs (such as stack and queue operations on arbituary complex objects), that pointers will be superfluous. Pointers will, of course, always remain on the implementation level. But on the programming level, it tends to get messy. I would _much_ rather deal with references than pointers. Once you have references and high-level data structures and operations, what do you still need pointers for (other than hardware manipulation, of course)? A related question: why _don't_ languages such as LISP and Icon (and probably Smalltalk; I haven't used it enough to be able to comment on it) have pointers on the programming level? What classes of problems can't be solved in these languages that can in say, a strictly conforming ANSI C program involving pointers (other than OS specific and hardware-specific stuff, of course)? -- NEVIN ":-)" LIBER nevin@igloo.Scum.com or ..!gargoyle!igloo!nevin (708) 831-FLYS California, here I come! Q: Who killed Laura Palmer? A: Waldo did it.
lowry@arnor.uucp (10/19/90)
In article <=GH6UHD@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes: |> In article <1990Oct15.204343.2907@arnor.uucp> lowry@lusitania.watson.ibm.com (Andy Lowry) writes: |> > The Hermes programmer could represent a graph precisely in the |> > standard conceptual manner--as a collection of nodes and a collection |> > of arcs, where each arc is a record containing two node id's. |> |> ...each of which node-ids is a pointer, no? It's not a machine address, but |> its an identifier that can reference any arbitrary node... it's a pointer |> into a bounded collection of objects, but you still have to worry about |> aliasing within that collection. The problem of aliasing is alleviated, |> but not solved. |> |> (basically, you've switched from C to lisp) Not really. Just because I have the same integer value stored in two different variables, does that mean they are aliased? Just because I have the same node ID stored in two different variables, are those variables aliased? I can change one without the other being affected. I would say they are not aliased. Hermes guarantees that it is impossible to change the value of one variable as a side-effect of changing the value of another. This is what we mean when we say Hermes does not allow aliasing. In Lisp, this is not the case. I can write: (setq x '(1 2 3)) (setq y x) (rplaca x 'a) It's true that if x and y contain node ID's, then I can retrieve the node identified by x and change it, and now if I retrieve the node identified by y I will see the change. But I did not change the value of y. Perhaps one could say that the distinction is a bit muddy, and a stronger guarantee than the one we make might be useful in some situations. Nevertheless, the guarantee we do make is extremely useful, in that it enables our compiler to perform very strong checks to ensure that no Hermes process can ever affect another in a way that could not have been anticipated by looking at the code, knowing nothing of the underlying implementation. For example, a bug in one process can never cause random behavior like crashes to occur in other processes, even if they are executing in the same address space. In addition, though we have not explored this to any great extent, we believe that our non-aliasing guarantee will enable many compiler optimizations that cannot be performed in the presence of aliasing. -- Andy Lowry, lowry@ibm.com, (914) 784-7925 IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY 10598
jlg@lanl.gov (Jim Giles) (10/19/90)
From article <=GH6UHD@xds13.ferranti.com>, by peter@ficc.ferranti.com (Peter da Silva): > [... description of a particular structure ...] > ...each of which node-ids is a pointer, no? It's not a machine address, but > its an identifier that can reference any arbitrary node... it's a pointer > into a bounded collection of objects, but you still have to worry about > aliasing within that collection. The problem of aliasing is alleviated, > but not solved. Yes, but this alleviation is important. That's why the C habit of changing arrays to pointers in procedure calls is a _really_ bad idea. J. Giles
anw@maths.nott.ac.uk (Dr A. N. Walker) (10/19/90)
In article <1990Oct15.204343.2907@arnor.uucp> lowry@lusitania.watson.ibm.com (Andy Lowry) writes: >The point I meant to make is that indirect addressing is a fact of >life in most of today's computer architectures. Therefore, pointers >will certainly be found in efficient implementations of many programs. >This does not mean that the pointers need to show through in the >languages in which the programs are written. True. But no-one has yet given a convincing (to me!) reason why pointers should *not* show through. > Many computers also have >segment registers, process status registers, data caches, and other >such features, but I'm not ordinarily aware of them when I write >programs. Umm. Not directly, perhaps; after all, many such features are transparent even at the assembler level. However, you may soon have a poor data locality, for example, brought indirectly to your attention when the data cache ceases to be effective -- I have seen programs very suddenly run over 100 times more slowly when an array size crept over some limit. [Admittedly, this was in 1965; but I expect Jim Giles sees it happen even today when vectorisation fails.] > When we implemented the Hermes run-time system, we did, in >fact, make heavy use of pointers. This is because we wrote the system >in C, where pointers are indispensible. We believe that we have >designed Hermes in such a way that pointers are not indispensible. Presumably an acid test will come when the system is written in Hermes and becomes self-supporting. >I expect you actually think of them as a collection of nodes and arcs. ["you" is me, and "them" is trees and graphs] >The Hermes programmer could represent a graph precisely in the >standard conceptual manner--as a collection of nodes and a collection >of arcs, where each arc is a record containing two node id's. Indeed. But the first actual implementation of graphs that I clapped my eyes on started out something very like MODE GRAPH = STRUCT (VERTICES v, EDGES e), EDGE = STRUCT (REF VERTEX a, b), VERTEX = ...etc... which seems not so far away. The distinction between "each arc is a record containing two node id's" and "each arc is a record containing two pointers to nodes" seems rather slight, and amounts to "who provides the node ids?". > [recompilation] is much cheaper and >less error-prone than re-engineering a C program because reverse >traversals are suddenly needed in a structure that has only forward >pointers. On the other hand, if reverse traversals are suddenly needed in a program that was designed on the assumption that they weren't necessary, re-engineering might be thought highly desirable. There are also some problems with compilers that change algorithms behind the programmer's back, depending on some complicated analysis. Suppose my program has a bug that is benign in some implementations, but not in others. Lo and behold, all my test programs work, the production run fails, and as soon as I try to isolate the bug, it goes away. (Yes, I know we all have bugs like that sometimes!) [Nothing in this should be read as implying that I'm agin the Hermes project. What I've heard about it sounds interesting, and systems that raise levels of abstraction should be encouraged. I just don't see pointers (in particular) as undesirable aliens that *should* be abstracted away. They are useful citizens, just like integers and procedures, to be harnessed with care.] -- Andy Walker, Maths Dept., Nott'm Univ., UK. anw@maths.nott.ac.uk
peter@ficc.ferranti.com (Peter da Silva) (10/20/90)
In article <152323@felix.UUCP> asylvain@felix.UUCP (Alvin "the Chipmunk" Sylvain) writes: > Nope. C (at least) allows for variables to be 'static'. No need > for side-effects to maintain the function's internal state. Those *are* side-effects, since they mean the same function may return different values on successive calls with the same calling sequence. This has the same effects on predictably and optimisation as more obvious side effects. -- Peter da Silva. `-_-' +1 713 274 5180. 'U` peter@ferranti.com
pcg@aber-cs.UUCP (Piercarlo Grandi) (10/22/90)
In article <1990Oct15.175924.14455@maths.nott.ac.uk> anw@maths.nott.ac.uk (Dr A. N. Walker) writes: In article <1990Oct10.195202.2340@d.cs.okstate.edu> norman@d.cs.okstate.edu (Norman Graham) writes: >[...] you may want to read this >Miranda definition of a 3-ary tree (just to be different). I keep trying to get interested in Miranda, but "they" want real money for it, which is bad news for a Maths Dept. Try Haskell or SML. Not terribly different. Much the same technology. > three_tree ::= Leaf num | Node int three_tree three_tree three_tree Well, that's fine for an acyclic structure [though not the way I usually *think* of trees] and a lazy evaluator, but it doesn't make sense to me in the form three_graph ::= Node int three_graph three_graph three_graph Where does it all end, even lazily? Ahem. It does not end. There is no problem though. It is just an incorrect definition. Let's rewrite it like this: graph = NULL: empty | roots roots = node | roots node node = NODE: info LINKS: nodes nodes = empty | nodes node Observe that this is very much like a syntax. Indeed you can use a 2 level grammar to model anything at all (book by Tzischritis and Uzgalis), and it does generate infinite syntaxes (needed to model context dependency), not just infinite syntax trees (needed to model graphs). This is not a difficult, because you never need to actually expand the grammar in an infinite way, given a finite data strcture or program. How do I express the identity of one node with another [ie, perhaps the same node, but reached via some chain of edges]? How do I *draw* it? This is a thing on which I have already commented, but I love repeating myself. The correct way to implement identity is by contents, not location. If you encode part of the contents in the location, that's an implementation choice, not a necessary thing. It is also not very nice. What is *anyone*'s objection to MODE NODE = STRUCT (INT info, REF NODE left, right, middle) ? That instead of manipulating just values of type NODE you also have to deal with values of type REF NODE. This complicates life considerably. Consider the relational version in some pseudo notation: domain source,target: SOMETYPE left(source*,target) right(source*,target) middle(source*,target) Now you say: but this means that I cannot have two nodes with the same info! Precisely. We are *assuming* that info identifies a node. Put into info whatever detail serves to identify the node. Then you say: but this is inefficient! the same info is replicated up to six times! the links between a node and its neichbours are spread in three relations! This is matter for the implementation. It can keep a table of infos, and keep the three relations clustered either by first or second column, depending on which is thought ot be the preferred join column. Note that in your pointer based *implementation* the join column is assumed to be the first, because your NODE struct is clustered physically on outgoing pointer, not incoming pointer; you would have to write MODE NODE = STRUCT(INT info, FLEX [] REF NODE incoming) to implement the graph clustered by incoming pointer. Indeed you can model any graph in relational database technology by saying something like domain node : SOMETYPE domain link : OTHERTYPE source(node,link*) target(link*,node) This also clearly models the notorious symmetry between nodes and links, and that links are not just pointers in the general case. It all depends on what you really want. Data design is more difficult and less clear with pointers, but implementations are more efficient, all because pointers require less levels of abstractions. Note that nobody I know (except for me I mean :->) addresses the issue of data design in programming languages, yet many programs deal with internal databases with considerably more complicated implied schemas than those of many DMBS applications. -- Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
lowry@arnor.uucp (10/22/90)
In article <1990Oct19.160210.9787@maths.nott.ac.uk>, anw@maths.nott.ac.uk (Dr A. N. Walker) writes: |> There are also some problems with compilers that change |> algorithms behind the programmer's back, depending on some complicated |> analysis. Suppose my program has a bug that is benign in some |> implementations, but not in others. Lo and behold, all my test programs |> work, the production run fails, and as soon as I try to isolate the bug, |> it goes away. (Yes, I know we all have bugs like that sometimes!) You're right, I have experienced many such bugs in my time. I can see such a scenario coming about for three reasons: (1) there is a bug in the program, but in some implementations it does no damage and has no visible effect on the program (e.g. the program fails to initialize a counter to zero before referencing it, but since the operating system happens zero static variables during program load, things work out); (2) the program has different resource requirements under different implementations, and therefore the two implementations act differently in the face of resource depletions; (3) the compiler introduces a bug when applying its transformations. Category 1 is addressed in Hermes by a new compile-time mechanism we call "typestate checking." We use dataflow analysis techniques to track static characteristics of variables such as their initialization state and the case of variants. Each operation of the language has typestate preconditions, which define what typestate conditions must hold for the operation's operand variables, and typestate postcondition rules which define how the operation affects the operand typestates. The compiler uses the postconditions to compute a fixed-point typestate at every point in the program, and rejects any program that attempts an operation with operands that are not in the correct typestate. In addition, when two or more program paths merge, a meet operation (in the typestate lattice) is performed to compute a single typestate for the merge point, and "coercion" operations are inserted on all the incoming paths to lower their individual typestates to the computed meet. One effect of the coercions is that the program is augmented at compile time to automatically dispose of its data values at appropriate points. We get automatic garbage collection without the run-time cost normally associated with this feature. Typestate checking elimintates category 1, because it makes it impossible to write programs that do not have a precise, implementation-independent meaning. Thus, in the absence of compiler bugs, and given adequate resources, one is guaranteed that two implementations of the same buggy program will both exhibit the bug. Category 2 is a difficult problem. If memory is tight, an implementation that keeps lots of auxiliary data structures to speed certain operations may fail whereas a slower implementation that requires less memory will succeed. In Hermes we provide a built-in exception called "Depletion" that is meant to handle resource depletions, such as: memory depletion, computing a numerical value exceeding hardware's range or accuracy limits, insufficient sockets available for network communications, etc. The Hermes programmer is not spared these exceptions (though we have some ideas on how this might be achieved). Implementations are free to use whatever tricks are available to avoid depletions (like swapping real memory to disk, migrating data to other network sites, using extended-precision arithmetic software, etc.), but we don't make it a requirement for language conformance. The pragma mechanism I've mentioned before may be able to address many of these situations by allowing the programmer to encourage the compiler, for example, to be sparing in its memory utilization. Category 3 is a problem in any language except machine language (even assemblers can introduce addressing errors). The best I know how to do is to continue paying especially close attention to compilers and other low-level software. Adopting languages and methodologies for this software that reduce the frequency of bugs (like typestate checking) will help, of course. -- Andy Lowry, lowry@ibm.com, (914) 784-7925 IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY 10598
chl@cs.man.ac.uk (Charles Lindsey) (10/22/90)
In <2958@igloo.scum.com> nevin@igloo.scum.com (Nevin Liber) writes: >A related question: why _don't_ languages such as LISP and Icon (and >probably Smalltalk; I haven't used it enough to be able to comment on >it) have pointers on the programming level? Smalltalk DOES have pointers on the programming level. Its just that everything you declare (apart from a few specials like integers) is a pointer whether you like it or not, so you do not notice the fact until some unintended aliasing occurs. Eiffel is (more or less) the same.
cik@l.cc.purdue.edu (Herman Rubin) (10/23/90)
In article <5EJ64J3@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes: > In article <152323@felix.UUCP> asylvain@felix.UUCP (Alvin "the Chipmunk" Sylvain) writes: > > Nope. C (at least) allows for variables to be 'static'. No need > > for side-effects to maintain the function's internal state. > > Those *are* side-effects, since they mean the same function may return > different values on successive calls with the same calling sequence. This > has the same effects on predictably and optimisation as more obvious side > effects. I have yet to see a random number. or pseudo-random number, procedure which did not exploit this. The same is true for uses of buffers, reading external media, etc. It is also the case when one makes calls by reference, and uses code to change the values of the arguments. This even applies to a subroutine to multiply two matrices. This means it is the programmer who must decide, and pass on the information to the compiler, about the side-effects. Sometimes, but not always, the compiler can tell by looking at the global code. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!cik(UUCP)
anw@maths.nott.ac.uk (Dr A. N. Walker) (10/26/90)
In article <2062@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes: >In article <1990Oct15.175924.14455@maths.nott.ac.uk> anw@maths.nott.ac.uk >(Dr A. N. Walker) writes: [ of a recursive pseudo-definition of regular graphs ] > Where does it all end, even lazily? > >Ahem. It does not end. There is no problem though. It is just an incorrect >definition. Let's rewrite it like this: [ gives syntactic definition of a graph ] Well, that's fine, and one way I might well do it myself, but it seems to be not the way that other people are telling me I ought to do it! > How do I express the identity of one node with another [ie, perhaps the > same node, but reached via some chain of edges]? How do I *draw* it? > >This is a thing on which I have already commented, but I love repeating >myself. The correct way to implement identity is by contents, not location. But why should I have to invent a contents, just so that I can avoid referring to location? If I draw a graph with lots of nodes, and put some information (perhaps a colour, RED, GREEN, BLUE, ...) in each, I don't necessarily want the contents to be unique. On the other hand, if I'm explaining my drawing to you, I can point to nodes: "Look, *this* node connects to *that*, and so there's a path from *here* to *there*". There's nothing wrong with location! > What is *anyone*'s objection to > > MODE NODE = STRUCT (INT info, REF NODE left, right, middle) ? > >That instead of manipulating just values of type NODE you also have >to deal with values of type REF NODE. This complicates life considerably. But in almost every programming language, in addition to values of type INT, I have to deal with values of type REF INT (integer variables in most languages). Yes, it complicates life, but it's usually thought to be worth it. Extending to REF REF INT (pointers), and other REF types, generalises and simplifies the construct. It's just a shame that so many programmers fail to appreciate this (often because their first language managed to muddy the waters). -- Andy Walker, Maths Dept., Nott'm Univ., UK. anw@maths.nott.ac.uk
pcg@cs.aber.ac.uk (Piercarlo Grandi) (10/31/90)
On 26 Oct 90 15:59:37 GMT, anw@maths.nott.ac.uk (Dr A. N. Walker) said: anw> How do I express the identity of one node with another [ie, perhaps anw> the same node, but reached via some chain of edges]? How do I anw> *draw* it? pcg> This is a thing on which I have already commented, but I love pcg> repeating myself. The correct way to implement identity is by pcg> contents, not location. anw> But why should I have to invent a contents, just so that I can anw> avoid referring to location? You are not inventing a content -- what you call "location" here is part of the contents, it is the relationship between a content and another. It is also a content, at some higher level of abstraction. anw> If I draw a graph with lots of nodes, and put some information anw> (perhaps a colour, RED, GREEN, BLUE, ...) in each, I don't anw> necessarily want the contents to be unique. By definition, if two entities have the same contents they are the same entity, unless they are elementary particles and Bell's Theorem applies :-). What actually happens is that then you put only part of the contents in the actual data storage. anw> On the other hand, if I'm explaining my drawing to you, I can point anw> to nodes: "Look, *this* node connects to *that*, and so there's a anw> path from *here* to *there*". There's nothing wrong with location! Then part of the contents is in the location. This is just an implementation technique, not a concept; and it is an implementation technique that complicates matters. When I design an entity I want to be able to know whether it is the same or different from another entity. I want the entity to contain all information that distinguishes it from any other. If I do not encode information in the location I can just look at the the entities and compare them; if I encode part of the information in the location I also have to compare their locations. This adds a whole new world of complication, because we need no longer just an algebra for objects, but also an algebra for locations, and we must prove that a program not only works ok wrt to contents, but also to locations, and that the encoding of parts of the contents in the location is preserved correctly. This is worth the while if we are working close to the machine, because that is how the machine is built. It is crazy if we are using an high level language, like SQL or APL or whatever else like it. -- Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
tom@nw.stl.stc.co.uk (Tom Thomson) (11/01/90)
I cannot understand why Jim Giles wants pointers to have values which are addresses. The important thing about a pointer is simply that it identifies (unambiguously) the thing to which it refers. The only operations on a pointer are the (a) getting the thing to which it refers; (b) modifying the value of the thing to which it refers; (c) checking whether two pointers refer to the same thing. There is of course no operation that makes the pointer point to something else (that's an operation on a pointer-valued variable which assigns a new pointer to it; think of an integer-valued variable - - if x has the value 3, and the assignment x:=4 is executed, this does not assign the value 4 to the integer 3; pointers are no different from integers in this respect - - and yes, I have come across the broken Fortran compilers that allow you to assign a new value to an integer constant :-) ). There is no earthly reason why a pointer should support any of the extra operations Jim says it has to support (eg ordering), and certainly no reason why an implementation should be constrained to implement it as an address. I wonder if all OO languages implement references to an object as addresses? After all, a reference to an object is no more or less than a pointer to it. Do SQL systems implement all identifiers (keys which are constrained to be unique) as addresses? Well, I suppose that depends on what you mean by an address - - if you regard tables as being content-addressed they do! (PierCarlo seems to advocate that all identification of objects should be by content - - fair enough at some level of description, but it's nice to have syntax which allows us to distunguish a part of the content which can be used to identify the object and call that part a pointer and have nice syntax for using it as the access route to the object; and he appears to object to that). Jim's arrays and indices down them are all very well, but they certainly do not solve the aliasing issue he claims they solve; if I write a[i] := a[j] the compiler does not in general know whether i and j have different values, ie whether i and j are distinct. Once you start using an array of cells to implement a graph structure, with indices used as pointers, you have exactly as much information about aliasing as if you used pointers instead of indices - no more, and no less; however you as the programmer now have the problem of assigning the cells positions in the array, doing your own garbage-collection as cells become free, coping with types whose values don't all occupy the same amount of store (so a cell may occupy multiple slots in the array) and so on; to get the language system to hide all that from you and do it automatically, you need pointers. It's easy to see, of course, that all Jim's arrays do is classify pointers according to whether they $d that all Jim's arrays do is partition the store into a set of non-overlapping regions - so a pointer written a[i] is known to be distinct from a pointer written b[i]; but there is no difficulty at all in partitioning the store and then constraining pointer-valued variables to point into one partition or another: so if p1 is a pointer constrained to point into region r1 and p2 is a pointer constrained to point into region r2 the compiler knows that p1 and p2 are not aliases of each other. There may be a case for low-level languages to have a pointer construct which is constrained to be mapped as an address; for most programming purposes, I would prefer to work in a high level language. I claim that any high level language (other than a single assignment language) must have a pointer concept (although it may call it something else, and may use search on content as the pointer resolution mechanism) since otherwise it cannot express referential sharing and identity - - and if I have mutable objects (ie not a single assignment language) then I must be able to determine which refers to which so that I control whose references are mopdified by an update (incidentally, the implementation may use accidental sharing to optimise storage usage, and move one of the obkjects sharing store when it changes but the other objects sharing that store don't. If the implementation of pointer were address, as Jim insists, the representation of the pointer to that object - and all copies of that representation - would have to change even though the pointer has not changed, it still references the same object. so implementations which use accidental sharing all violate Jim's definition of pointer.) Tom Thomson [tom@nw.stl.stc.co.uk
gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) (11/02/90)
In article <3673@stl.stc.co.uk> "Tom Thomson" <tom@nw.stl.stc.co.uk> writes: >Jim's arrays and indices down them are all very well, but they certainly >do not solve the aliasing issue he claims they solve; if I write > a[i] := a[j] the compiler does not in general know whether i and j >have different values, ie whether i and j are distinct. Er, actually the example I use from my code looks like this: a[i] = b[i] / c[i]; d[i] = b[i] * c[i]; Fortran only loads b[i] and c[i] once. C, if you confuse the issue by using pointers, will generally load b[i] and c[i] twice. If the locations of b or c are passed in, no C compiler that I know of will optimize it. You can talk about the general case, and you can debate the definition of a pointer, but don't make my code run 30% slower.
anw@maths.nott.ac.uk (Dr A. N. Walker) (11/03/90)
In article <PCG.90Oct30201101@teachk.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes: [Well, what he wrote, I found rather confusing (even assuming that he means by Bell's Theorem what I mean by Bell's Inequalities). I *think* it amounted to the following: pcgp> You shouldn't write "node := GREEN; ptr_to_node := node" because pcgp> "node" then doesn't include its location. You should have instead pcgp> "node := { GREEN, node }" and then an associative search for pcgp> "{ GREEN, node }" will find node without the need for pointers. (pcgp means PierCarlo Grandi Paraphrase). In the following direct extract, "This" means the version that I *shouldn't* write.] >This adds a whole new world of complication, because we need no longer >just an algebra for objects, but also an algebra for locations, and we >must prove that a program not only works ok wrt to contents, but also to >locations, and that the encoding of parts of the contents in the >location is preserved correctly. But if content has to include location, then an algebra for objects has to include an algebra for location, so the algebra and the proofs already had to be supplied. Furthermore, if I am denied access to the algebra of locations, then some useful ideas like anonymous objects, and like remembering locations, become inexpressible. In other words, either the PCGP bears no relationship to what PCG *meant* to write, or else PCG's article was self-evident nonsense. At the end of a difficult week, I'm in no fit state to decide which, but I have my suspicions. -- Andy Walker, Maths Dept., Nott'm Univ., UK. anw@maths.nott.ac.uk
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/04/90)
On 2 Nov 90 17:25:08 GMT, anw@maths.nott.ac.uk (Dr A. N. Walker) said:
anw> In article <PCG.90Oct30201101@teachk.cs.aber.ac.uk> pcg@cs.aber.ac.uk
anw> (Piercarlo Grandi) writes:
pcgp> You shouldn't write "node := GREEN; ptr_to_node := node" because
pcgp> "node" then doesn't include its location. You should have instead
pcgp> "node := { GREEN, node }" and then an associative search for "{
pcgp> GREEN, node }" will find node without the need for pointers.
anw> (pcgp means PierCarlo Grandi Paraphrase).
Not exactly. You should write "node := (GREEN,node id);", where "node
id" is anything you use to know a node from another. For example, its
list of incoming arcs. I would like to repeat my way to describe graphs
in a relation database:
type node : node_info; # each node contains some info #
type link : link_info; # each link contains some info #
relation in(link*,node*); # * means this fields is part ... #
relation out(node*,link*); # ... of the primary key #
Note that many links or nodes may have the same info, but a specific
link or node will be uniquely identified by context; if it cannot, e.g.
a two node graph like this:
"B"
.-------------.
"A" "A"
\_____________/
"B"
then the graph is inherently ambiguous. It can be disambiguated by
noting that the "A" and "B" information content of the nodes is
incomplete, and should be really:
("B",top)
.-------------------------------.
("A",left) ("A",right)
\_______________________________/
("B",bottom)
This representation also makes it obvious that each graph has a dual,
i.e. that it is an essentially symmetric structure, that the first
version of the example above is inherently ambiguous (and indeed a
relational database will not allow you to represent it), etc..., all
facts that are lost using pointers.
anw> In the following direct extract, "This" means the version that I
anw> *shouldn't* write.]
pcg> This adds a whole new world of complication, because we need no longer
pcg> just an algebra for objects, but also an algebra for locations, and we
pcg> must prove that a program not only works ok wrt to contents, but also to
pcg> locations, and that the encoding of parts of the contents in the
pcg> location is preserved correctly.
anw> But if content has to include location, then an algebra for objects
anw> has to include an algebra for location, so the algebra and the
anw> proofs already had to be supplied.
You got it the other way round: if pointers make a difference, this can
only mean that part of an object's content are implicit in its in-memory
location, not viceversa (what actually happens is that location includes
a bit of content). If the information encoded in the pointer value were
explicit in the object's contents, pointer values would become
irrelevant, and only object contents would matter; there would not be
any need for comparison operations on pointer values, only the usual
ones on contents.
Back to basics: objects in memory are representations for more abstract
entities. The representation is based on encoding, usually encoding onto
the integers (Hoare's essay in "Structured Programming").
The idea is that if two objects have the same content ("value" when seen
as an integer) they represent the same entity. You might say "no,
because they are at different locations, and I can change one but not
the other which makes them different".
This is pure nominalism; it simply implies that you have encoded part of
the representation in the storage occupied by the object and part in its
address. You never *need* to do so; you can always encode it explicitly,
and often it is better to do so, because it frees you from worry about
locations.
An example where pointers make a difference:
You have two lists, one of names of american states and another of
nuclear submarines. If you get a pointer to the string "Ohio" you don't
know if you are referring to a submarine or a a state, unless you
implement the two lists in two different memory areas of which you know
the bounds, and look at the pointer value to see in which area it falls.
You have encoded the type of the name, a boolean, in the address. You
could have encoded it in the strings, e.g. by writing "ohio" for
submarines and "OHIO" for states. If you had done this you would only
need to understand the encoding rules for strings to understand your
program; encoding the type of the name in its address may save you some
space, but means that you also have to deal with pointer values, not
just data values, because pointer values have become meaningful (laden
with information).
Now, in general the "ohio"/"OHIO" representation is safer, for example
because the fact that the Ohio in question is the submarine and not the
state is location independent, that means that I can copy the string
around and not lose any bits of information. This is what I mean by
saying that the algebra of locations is not dragged into the act,
because no information is encoded in locations.
anw> Furthermore, if I am denied access to the algebra of locations,
anw> then some useful ideas like anonymous objects, and like remembering
anw> locations, become inexpressible.
They are most definitely not useful at the application level, only at a
purely implementation oriented level. They add complications to
reasoning about representations, even if they save some resources. In
particular, in a value based systems all objects are anonymous; they are
only identified by contents, not by context.
You have a claim that pointers are necessary, if I understand correctly.
My observations are:
1) To represent information, pointers are never necessary and when used
as such are actually hazardous. From the application point of view all
information had better be represented explicitly with encoding in the
contents of an object, as in a relational database.
2) Pointers *can* be used to represent information, and the relative
hazard (extra complicated encoding rules) can be cost effective; the
more so the nearer we are to the machine, because it can conserve
storage (information bits are implied by the address) and speed (you
need not to fetch information bits from memory, you can just deduce them
from the address).
But, at the application level usually these concerns are irrelevant.
What you want to do is to represent entities, not be concerned about how
the relevant encoding is done. In building systems programs your
concerns are reversed, so pointers are useuful.
--
Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)
In article <1990Nov2.052815.22188@murdoch.acc.Virginia.EDU> gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) writes: [ on one type of aliasing problem ] > a[i] = b[i] / c[i]; > d[i] = b[i] * c[i]; [ Fortran sees separate arrays and loads b[i] and c[i] only once ] [ C sees possible aliasing and loads them twice ] Hmmm. I would almost automatically optimize the above into { register double bi = b[i]; register double ci = c[i]; a[i] = bi / ci; d[i] = bi * ci; } Don't ask me to philosophize about this fragment. > but > don't make my code run 30% slower. Hey, dude, write it 30% faster in the first place. ---Dan
cik@l.cc.purdue.edu (Herman Rubin) (11/06/90)
In article <3673@stl.stc.co.uk>, tom@nw.stl.stc.co.uk (Tom Thomson) writes: > I cannot understand why Jim Giles wants pointers to have values which > are addresses. The important thing about a pointer is simply that it > identifies (unambiguously) the thing to which it refers. The only > operations on a pointer are the (a) getting the thing to which it refers; > (b) modifying the value of the thing to which it refers; (c) checking > whether two pointers refer to the same thing. In producing various types of random variables efficiently, it is desirable to use buffers and refill subroutines. Now each buffer and each refill routine can be changed whenever the refill is called, and even more can be considered. Thus the communication scheme is if(buffer too close to empty){ *rprog(info_pointer); change info_items in registers;} info_pointer points to an array which has the information on the buffer needed for use, and rprog contains the location of the array to be called. It is true that one could use associative calls, locations, etc., but how wise is it? We have loaders to insert pointers to things external to the subprogram in it. And unless we use an associative memory, somewhere pointers will have to be used. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
jlg@lanl.gov (Jim Giles) (11/07/90)
From article <2440:Nov607:32:5890@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein): > In article <1990Nov2.052815.22188@murdoch.acc.Virginia.EDU> gl8f@astsun9.astro.Virginia.EDU (Greg Lindahl) writes: > [ on one type of aliasing problem ] <> a[i] = b[i] / c[i]; <> d[i] = b[i] * c[i]; < [ Fortran sees separate arrays and loads b[i] and c[i] only once ] < [ C sees possible aliasing and loads them twice ] < Hmmm. I would almost automatically optimize the above into < {register double bi = b[i]; < register double ci = c[i]; < a[i] = bi / ci; > d[i] = bi * ci;} So would the compiler (a good one anyway). The problem is that you are shifting the burden to the programmer (who has enough on his plate just getting his code to run correctly). He has no time for these niggly little optimization issues - especially those that the compiler _should_ solve for him. To be sure, the programmer may have time to fine-tune some tune some time-critical fragments of his code: but in that case he should be able to work on the tough issues that the compiler _can't_ do for him. J. Giles
gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) (11/08/90)
In article <2440:Nov607:32:5890@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >> [ I write: ] >> a[i] = b[i] / c[i]; >> d[i] = b[i] * c[i]; > [ Fortran sees separate arrays and loads b[i] and c[i] only once ] > [ C sees possible aliasing and loads them twice ] > >Hmmm. I would almost automatically optimize the above into > > { > register double bi = b[i]; > register double ci = c[i]; > > a[i] = bi / ci; > d[i] = bi * ci; > } Great, you've just doubled the # of lines of code. This is known as "creating a maintenance nightmare". Now take one of my real loops, with 3 lines of code referencing a dozen or so array elements, including things like a[i], a[i-2], etc. How do you pick the names? How do you keep the programmer from making a mistake when they modify one of the lines? I don't see why Dan thinks I should use a hard-to-optimize language for numerical programming, which requires the programmer to do common sub-expression elimination, when I can use a simple and fast language instead. Having the compiler do a simple optimization seems much more reliable than having the programmer optimize and maintain the code segment, or have a compiler for an unnecessarily compliated language compile the code segment. By the way, this code has loops which can be unrolled 3 or 5 times and keep loaded array values from previous iterations for future iterations. Writing this out by hand is a nightmare when you have to permute the temporary register variables. But it's not that difficult to add such a feature to most Fortran compilers.
anw@maths.nott.ac.uk (Dr A. N. Walker) (11/09/90)
In article <PCG.90Nov3185931@athene.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes: >Note that many links or nodes may have the same info, but a specific >link or node will be uniquely identified by context; if it cannot, e.g. >a two node graph like this: > > "B" > .-------------. > "A" "A" > \_____________/ > "B" > >then the graph is inherently ambiguous. Well, to use your notation of "top", etc., I would describe such a graph by code somewhat like NODE top = ("B", (left, right)), bottom = ("B", (left, right)), left = ("A", (top, bottom)), right = ("A", (top, bottom)); in which both the disambiguation and the duality are clear. Of course, this is somewhat like your database. But "top" itself does not include a description of its own location -- that is a matter for the compiler. >You got it the other way round: if pointers make a difference, this can >only mean that part of an object's content are implicit in its in-memory >location, not viceversa (what actually happens is that location includes >a bit of content). No: it means that location can be used to disambiguate objects without the need for programmer invention. >anw> Furthermore, if I am denied access to the algebra of locations, >anw> then some useful ideas like anonymous objects, and like remembering >anw> locations, become inexpressible. > >They are most definitely not useful at the application level, They most definitely are! When I give people directions to my house, I include things like: "Take the third on the right, ...", "If you get to the Hemlock Stone, you've gone too far, you should ...", and "If you get lost, ask for Priory Island, and try again from there". I don't want to name all the junctions, so the first two on the right are anonymous; but some key locations need to be remembered. This is all quite similar to the usage in "pic", Brian Kernighan's graphics language: arrow dashed from last circle.left to last box line up from top of Here arc from start of 2nd line to 2/3 of the way between A and B and so on. >You have a claim that pointers are necessary, if I understand correctly. That's an exaggeration. I managed without pointers perfectly well until 1972, and wrote [IMHO] much good code by the standards of those days. But I (and almost everyone else) also managed without characters [except as "caption"s], structures, enumerations, sets, not to mention files, editors, discs, terminals, .... All of these things have greatly increased my productivity, and I wouldn't want to turn back the clock. >What you want to do is to represent entities, not be concerned about how >the relevant encoding is done. Just so, which is exactly why I am happy for my entities to include pointers to other entities, and to let the compiler worry about how it is actually done in machine code. -- Andy Walker, Maths Dept., Nott'm Univ., UK. anw@maths.nott.ac.uk
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/09/90)
In article <1990Nov8.024049.9731@murdoch.acc.Virginia.EDU> gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) writes: > In article <2440:Nov607:32:5890@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: [ a[i] = b[i] / c[i]; d[i] = b[i] * c[i] ] > >Hmmm. I would almost automatically optimize the above into > > { > > register double bi = b[i]; > > register double ci = c[i]; > > a[i] = bi / ci; > > d[i] = bi * ci; > > } > Great, you've just doubled the # of lines of code. This is known as > "creating a maintenance nightmare". I thought we discussed this before? The way you maintain hand-optimized code is to keep around the original, clean, algebraic version, as well as the new, ugly, optimized version. Whenever the original, clean, algebraic version is changed, the new, ugly, optimized version must be rewritten. You do the same thing with automated source-code optimizers. > Now take one of my real loops, with 3 lines of code referencing a > dozen or so array elements, including things like a[i], a[i-2], etc. > How do you pick the names? How do you keep the programmer from making > a mistake when they modify one of the lines? I pick the names by very logical conventions: ai, aim2, etc. It's *very* easy to remember. As for maintenance, see above. > I don't see why Dan thinks I should use a hard-to-optimize language > for numerical programming, which requires the programmer to do common > sub-expression elimination, when I can use a simple and fast language > instead. I don't recommend that you switch from Fortran to C, unless you need something that the latter provides. I don't use Fortran simply because I can't stand the maintenance hell when I mistype a variable name. But if you don't mind that occasional problem, and if you don't need the system services or dynamic allocation of C, certainly stick to Fortran. > By the way, this code has loops which can be unrolled 3 or 5 times and > keep loaded array values from previous iterations for future > iterations. Array unrolling is so mechanical that I would trust the compiler to do it (and I wish optimizer writers would concentrate on taking hints from the programmer and doing this well, rather than complex optimizations that humans still do a much better job with!) > Writing this out by hand is a nightmare when you have to > permute the temporary register variables. It isn't that bad. I regularly unroll loops several times. The secret is macros. Only one of the compilers I use actually understands unrolling, and I often get 30% improvements on the others with very little work. ---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/09/90)
In article <1990Nov8.174042.24789@maths.nott.ac.uk> anw@maths.nott.ac.uk (Dr A. N. Walker) writes: > >What you want to do is to represent entities, not be concerned about how > >the relevant encoding is done. > Just so, which is exactly why I am happy for my entities to > include pointers to other entities, and to let the compiler worry > about how it is actually done in machine code. Exactly! I really do think about data structures as boxes containing other boxes, some of which contain arrows to boxes. When I draw them, that's how I draw them, not as some abstract ``list'' or ``tree'' or ``recursive data structure'' (not that I know how you draw those). ---Dan
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/10/90)
On 8 Nov 90 17:40:42 GMT, anw@maths.nott.ac.uk (Dr A. N. Walker) said: anw> Furthermore, if I am denied access to the algebra of locations, anw> then some useful ideas like anonymous objects, and like remembering anw> locations, become inexpressible. pcg> They are most definitely not useful at the application level, anw> They most definitely are! When I give people directions to my anw> house, I include things like: "Take the third on the right, ...", anw> "If you get to the Hemlock Stone, you've gone too far, you should anw> ...", and "If you get lost, ask for Priory Island, and try again anw> from there". This is an *implementation*. I think we have very different ideas on what is application design and what is implementation design, and in particular between data design and representation design. More precisely, IMNHO you don't make much distinction between the two aspects. Unfortunately. Just to restate my position, while we are talking past each other: when doing data design pointers are extraneous and therefore unwelcome details (and so Chris Holt is right). When doing representation design they are an invaluable technology (and so you are right too). For application oriented languages (SQL, Hermes) pointers are questionable; for implementation oriented languages (C, Algol 68) pointers are important (and Andy Lowry is somehow right on both accounts too). -- Piercarlo Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/10/90)
On 9 Nov 90 06:24:07 GMT, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) said:
brnstnd> In article <1990Nov8.174042.24789@maths.nott.ac.uk>
brnstnd> anw@maths.nott.ac.uk (Dr A. N. Walker) writes:
(probably me)> What you want to do is to represent entities, not be
(probably me)> concerned about how the relevant encoding is done.
anw> Just so, which is exactly why I am happy for my entities to include
anw> pointers to other entities, and to let the compiler worry about how
anw> it is actually done in machine code.
Pah. pointers are just a thin veneer on the underlying implementation.
The thing you want to implement is a *relationship*; a pointer is a way
to implement a relationship, and an address is a way to represent a
pointer. Both are by no means the only choices. When designing the
application, neither level is necessary or useful (except that xertain
8design* chocies may be legitimately be influenced by the expected
relative costs of the possible implementations and representations).
brnstnd> Exactly! I really do think about data structures as boxes
brnstnd> containing other boxes, some of which contain arrows to boxes.
brnstnd> When I draw them, that's how I draw them, not as some abstract
brnstnd> ``list'' or ``tree'' or ``recursive data structure'' (not that
brnstnd> I know how you draw those).
Ah no, this cannot pass. Lists and trees are *implementations*, and when
you draw them you draw specific encodings of such implementations.
Designs are quite a different thing.
The way *I* think about data structures as typed mappings over entities
(entity-category-relationship!), and then I wonder which is the best
implementation for the mapping, based on things like time and space
complexity for the most frequent operations needed, cost of
encoding/decoding, cardinality of the domains, sparseness (trees and
lists are both good for sparse mappings, trees are faster for random
access, lists have smaller space overhead, etc... -- boxes enter the
picture only when I have to represent the implementations).
Please, please have a reread of Hoare's seminal paper on data
abstraction in "Structured Programming", Academic Press. And
when you have finished it, go read the next essay on control
abstraction by Dahl. Another very recommended source is
Wiederhold's "Database Design", McGraw Hill, second part (the
first part is about implementation and representation).
IMNHO we are still quite far apart on having compatible notions of the
distinction between design and implementation, and the difference
between the design of an implementation (a pointer that is encoded as an
address) and the implementation of a design (a pointer that represents a
relationship).
Abstraction level mixup, I am afraid. I wish that data design were
considered a fundamental discipline, not just an esoteric trick for
DBAs. Most large programs have more sophisticated data designs than many
databases, yet DBAs are told about data design 9with a smattering of
implementation) and programmers are only told about representations
(with a smattering of implementation).
--
Piercarlo Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
gudeman@cs.arizona.edu (David Gudeman) (11/12/90)
In article <PCG.90Nov10155309@odin.cs.aber.ac.uk> Piercarlo Grandi writes:
]
]Pah. pointers are just a thin veneer on the underlying implementation.
]The thing you want to implement is a *relationship*; a pointer is a way
] to implement a relationship
Pah. Pointers are a direct representation of a fundamental way to
visualize relations. The thing I want to implement is a relationship
that I am visualizing as a pointer.
You are abusing the term "implement". The correct way to say this is
that a pointer is one way to _represent_ or _visualize_ a
relationship. "Implement", implies a machine implementation, and it
is just false to suggest that people would not visualize pointers if
they were not implementing things on machines. The concept of one
object referencing another pre-dates computers.
]Ah no, this cannot pass. Lists and trees are *implementations*, and when
]you draw them you draw specific encodings of such implementations.
No, this cannot pass. Lists and trees are mental structures that
humans use to visualize things. You can call them relations, you can
"implement" them mathematically as relations, but that does not change
the fact that they are in fact fundamental forms of visualization.
And when we draw them we certainly do not draw specific encodings.
In a list, X follows Y because our mental image of the list has X
following Y. In a tree, there is an arrow from X to Y because in our
mental image X is related to Y by the relationship that the arrow
represents.
]The way *I* think about data structures as typed mappings over entities
](entity-category-relationship!),
I think you visualize data structures geometrically like everyone
else. Then you mentally convert these natural visualizations into
unatural mathematical abstractions and pretend that the abstraction is
the original.
There is some justification for this in mathematics, in the case that
mathematicians are trying to prove that the structures they visualize
are well-founded. But once a certain class of structures and
operations on those structures is known to be well-founded, there is
no reason to keep "implementing" them in terms of baroque mathematical
abstractions.
--
David Gudeman
Department of Computer Science
The University of Arizona gudeman@cs.arizona.edu
Tucson, AZ 85721 noao!arizona!gudeman
peter@ficc.ferranti.com (Peter da Silva) (11/14/90)
Piercarlo Grandi: > ]Ah no, this cannot pass. Lists and trees are *implementations*, and when > ]you draw them you draw specific encodings of such implementations. David Gudeman: > No, this cannot pass. Lists and trees are mental structures that > humans use to visualize things. While I'm generally in the bit-bangers camp, I must agree with Piercarlo here. The basic things that people work with are "chunks of data". Lists, trees, tries, and so on are methods of implementing these databases in efficient ways. There's nothing fundamental about them. Unfortunately... in the real world they *are* vital. The problem is that there's no language I know of that really lets you get away with writing your program in terms of relations and get within an order of magnitude in speed of a well-written lower level implementation. The languages that so let you work at this level tend to be DBMSes and those database language/code generator hybrids called 4GLs. And they're all slow as molasses in January. -- Peter da Silva. `-_-' +1 713 274 5180. 'U` peter@ferranti.com
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/14/90)
In article <-M.6315@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes: > Piercarlo Grandi: > > ]Ah no, this cannot pass. Lists and trees are *implementations*, and when > > ]you draw them you draw specific encodings of such implementations. > David Gudeman: > > No, this cannot pass. Lists and trees are mental structures that > > humans use to visualize things. > While I'm generally in the bit-bangers camp, I must agree with Piercarlo here. > The basic things that people work with are "chunks of data". Lists, trees, > tries, and so on are methods of implementing these databases in efficient > ways. There's nothing fundamental about them. I am in the bit-bangers camp and must agree with Dave here. Although it would be nice to imagine the computer as accepting some abstract, mathematical version of what I wanted to compute and spitting out some results, that's not how it works on the lowest levels, or on any level showing any efficiency. I am translating a computation into a form that a machine can understand. That final form may not be ``fundamental,'' but it *is* the mental structure that I have to work with. In real programming, I find myself working with a form that includes arrays and pointers and so on. So those are the mental structures I deal with. That's why I think Dave is right. If I didn't care about finishing anything, I might use a functional language. (Actually, I shouldn't be nasty---I hear SETL is becoming rather efficient after all these years. It's imperative, but it has a lot of the structures Jim seems to love.) So the final form I worked with might include some more abstract type, and I'd probably agree more with Piercarlo that relations and trees are higher level than pointers and arrays. Can we all agree to disagree now? ---Dan
gudeman@cs.arizona.edu (David Gudeman) (11/14/90)
In article <-M.6315@xds13.ferranti.com> Peter da Silva writes:
]David Gudeman:
]> ... Lists and trees are mental structures that
]> humans use to visualize things.
]
]While I'm generally in the bit-bangers camp, I must agree with Piercarlo here.
]The basic things that people work with are "chunks of data".
Well, I'm _not_ in the bit-bangers camp and never have been. You have
to drag me kicking and screaming to get me anywhere near an assembler.
I _never_ look at the assembler output of my C code, and I prefer not
to program in C if I don't have to. My favorite languages have no
obvious relationship between the language constructs and the object
code. I even like interpreted languages. Can we all stop assuming
that just because someone says that pointers can be abstract objects,
that this means the speaker doesn't know what an abstraction is?
Maybe I have an intuition here that you all are not seeing because you
refuse to posit the possibility even long enough to consider it.
My ideal language would have APL-style arrays, SETL-style sets,
Icon-style lists and strings (Lisp lists and strings are too low-level
for my taste), Prolog-style terms, Emacs-style buffers, associative
lookup tables, first-class everything, and lots of other things that
would horrify a bit-banger. It would also have high-level pointers
with pointer arithmetic. The pointers would be high-level in the
sense that they could not be implemented as simple addresses, nor
could pointer operations be implemented one-to-one as machine
instructions. Bit-bangers would be as horrified by my pointers as
by my arrays. Efficiency has _no_, I repeat _no_ influence on my
opinion about pointers.
Pant, pant.
And furthermore...
] Lists, trees,
]tries, and so on are methods of implementing these databases in efficient
]ways. There's nothing fundamental about them.
"tries"!? Eek. There is no relationship between tries and trees
except that tries are often visualized as trees. Implying that these
are at the same level of abstraction is like implying that a list is
at the same level of abstraction as a linked list. I'm not the one
having trouble seperating concept from implementation here.
Did the early linguists know that they were working with low-level,
machine-oriented representations when they started drawing sentence
diagrams as trees?
--
David Gudeman
Department of Computer Science
The University of Arizona gudeman@cs.arizona.edu
Tucson, AZ 85721 noao!arizona!gudeman
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/14/90)
On 13 Nov 90 20:58:19 GMT, peter@ficc.ferranti.com (Peter da Silva) said: peter> Piercarlo Grandi: pcg> Ah no, this cannot pass. Lists and trees are *implementations*, and pcg> when you draw them you draw specific encodings of such pcg> implementations. peter> David Gudeman: gudeman> No, this cannot pass. Lists and trees are mental structures gudeman> that humans use to visualize things. peter> While I'm generally in the bit-bangers camp, I must agree with peter> Piercarlo here. I am a bit banger too, mate... peter> Lists, trees, tries, and so on are methods of implementing these peter> databases in efficient ways. There's nothing fundamental about peter> them. [ ... ] The problem is that there's no language I know of peter> that really lets you get away with writing your program in terms peter> of relations and get within an order of magnitude in speed of a peter> well-written lower level implementation. As to this I have good news! DBMSes and 4GLs are often (anectodal but frequent evidence) impressively *faster* than any ad-hoc program that does the same. NOTE: *the same*, not similar but much much simpler. If you are manipulating large masses of data in complicated ways, which is most people want to do, a well written DBMS is going to be way ahead of any well written lower level solution. Notice that you *can* for a very important problem outrun a well written DBMS with diabolically well written low level code, but this is a rare case. The same applies for example to sort utilities -- normally it takes a lot of effort to beat a well written generic sort utility on any system I know of. The database approach is also being applied to in-memory databases, both relational or OO. They are usually well within an order of magnitude of using explicit ad hoc trees and lists, and may be even faster, if (and again very few are well written) they pay attention to paging issues, which normally even well written tree and list packages ignore. For example: an incore database may use B+ trees, which have an immensely better, if its blocks are suitably aligned and sized, locality profile than most garden variety trees. Again, many programs, either in-memory or on-disk, are sophisticated and large enough to be considered minro DBMSes. Too bad they are not designed as such... -- Piercarlo Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
anw@maths.nott.ac.uk (Dr A. N. Walker) (11/15/90)
In article <-M.6315@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes: >While I'm generally in the bit-bangers camp, I must agree with Piercarlo here. >The basic things that people work with are "chunks of data". Lists, trees, >tries, and so on are methods of implementing these databases in efficient >ways. There's nothing fundamental about them. While I'm generally in PdS's camp, I must agree with David Gudeman here! *Sometimes* one works with "chunks of data". In that case, yes, you get to implement your database however you like, and all PierCarlo's distinctions between data design and implementation design *may* come into play. But many, even most, of the real problems I have had to program come with some structure already built in. For example, when analysing a chess position one naturally builds a tree. If you like, you can describe that in a relational database and then observe "Hey, this would look nice as a tree"; I would prefer to skip those steps! Obviously one should keep an eye on the data design, in case things start creaking elsewhere, but as long as things run smoothly, why not stay with the obvious? Of course, *then* you have to implement this tree, and we get into the usual paraphernalia that it can run mostly as a stack, but you also need transposition tables, etc.; but that doesn't stop the tree being fundamental. Equally, when I wrote my exam-mark analysing program, it was handed a *list* of marks; again, you can squeeze it into a database if you like, but *why*? Just to discover that you really have a list? You can implement it as an array or as a list with real pointers, as you like, but the list was fundamental. When I look at routes to my house, I start with a graph; twist it how you will, the graph is *there*; it might get implemented any number of ways, including database look-up, but you can't wish away the graph. Nor can you wish away pointers -- *not* just as implementation details. You don't always have to start with pointers, and you don't always have to implement with them, but they exist, and they are natural, and convenient. As to whether they should be first-class objects -- well, why not? Surely, if we have learned anything from the last 20+ years of structured programming, it should be that relegating objects to the second class is counterproductive; it just stops you from using them in natural ways. -- Andy Walker, Maths Dept., Nott'm Univ., UK. anw@maths.nott.ac.uk
new@ee.udel.edu (Darren New) (11/15/90)
There aer two uses for pointers that heiarchical data structues do not seem to addess, but which are nonetheless vey useful. 1) changing one object into another. Smalltalk supports this with the "become:" operator. Basically, "A become: B" makes all references to B now be references to A and vica versa. I imagine that this could be done with the aliasing that is proposed in Hermes. (BTW, is there a reference to whee I could learn more about Hermes?) However, in Smalltalk, this is almost invaiably used in one of two situations: a) Objects have a fixed size once created. To add more elements to a set (A), make a new set (B) which is larrger, copy A into B, and then "A become: B". b) To break circular dependancies for the garbage collector. That is, "bigStructureWithPossiblyCircularPointerSomewhere become: String new" will cause all references to that structure to be replaced with something which has no out-going references, hence allowing it to be GCed even if through some bug you have accidentally not discarded the path from the root diectory to the circular structure. You may get errors later because the bigStructure... does not handle the same messages as Strings do, but it's better than having all that hanging around forever because of a programming mistake. 2) iterating over all objects (or all objects of a certain type). For example, it is not uncommon to have code to iterate over all objects that reference object X, or to iterate over all functions that have a reference to a paticular global variable, or whatever. Unless you can declare a vaiable that can be aliased to *any* value of a particular type (in which case I would call that variable a pointer), this extreamly handy operation cannot be handled in Hermes. This is useful when making global munges of in-core data (like when adjusting the format of data). It is also useful when there is a rare or unplanned operation in which the natural structuring of the data is not appropriate, which happens in relational databases all the time. For example, it is rare to ask "find all functions with the string 'xyzzy' in the comment field" or "find all source files holding a function which contains a local variable whose name ends in 'W'". It would be silly to build such things into the data structure repesenting compiled code, but the ability to iterate over source files, functions, and so on makes it possible. -- Darren (All typos to be blamed on an old Z19) -- --- Darren New --- Grad Student --- CIS --- Univ. of Delaware --- ----- Network Protocols, Graphics, Programming Languages, Formal Description Techniques (esp. Estelle), Coffee -----
gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) (11/16/90)
In article <23461:Nov904:59:2290@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: [ I give an example where FORTRAN optimizes well, but C must be hand-optimized. Dan claims "But look, I can get the same performance by tweaking the code."] I reply: >> Great, you've just doubled the # of lines of code. This is known as >> "creating a maintenance nightmare". > >I thought we discussed this before? The way you maintain hand-optimized >code is to keep around the original, clean, algebraic version, as well >as the new, ugly, optimized version. I know how to maintain code. The point is that it is always more difficult to maintain hand-tweaked code. And the reason that C sucks rocks for that code sequence is because of aliasing. This is the fun part of "talking" to Dan. He will quibble until the cows come home. He will tell you "It's ok if I have to unroll 10,000 loops by hand and do common sub-expression elimination by hand, C is still a great language for expression array routines." He will tell you Fortran is maintinance hell because he doesn't know how to use IMPLICIT NONE or the portable equivalent. And, best of all, at the same time as I am demonstrating that there are cases in which C pointers cause big optimization problems he will claim in another thread that pointers are never less efficient than arrays. Sure, Dan, we all have lots of time to tweak code by hand when an optimizer can do it for us. Isn't that why HLL's were invented? Well, I think I'm joining Jim Giles. Sorry that nobody liked Dan's great theories on comp.theory, looks like comp.lang.misc has a similar opinion. Bye.
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)
In article <1990Nov16.042141.21130@murdoch.acc.Virginia.EDU> gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) writes: > In article <23461:Nov904:59:2290@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > [ I give an example where FORTRAN optimizes well, but C must be > hand-optimized. Dan claims "But look, I can get the same performance > by tweaking the code."] I was just making a simple observation about the code example you posted. I tried to avoid controversy there because I didn't want to distract people from the following points: 1. The compiler can do quite a bit of aliasing analysis without taking any hints from the programmer (and respecting separate compilation). I've inspected several large Fortran programs, and in each one I observe that this sort of analysis would suffice to detect the aliasing even if aliasing weren't prohibited beforehand. In other words, C doesn't have a huge disadvantage over Fortran just because it allows aliasing. 2. The compiler cannot optimize pointer-free code into pointer code in all cases. Hence there are cases where a C array program using pointers will be faster than a Fortran array program, no matter how good the Fortran optimizer is. The language has to provide some form of pointer arithmetic (i.e., array shifting). If I understand Jim correctly, the new Fortran does exactly that, and so I'm willing to believe that it can express array computations as efficiently as C in all cases. Now returning to the side issue: > >> Great, you've just doubled the # of lines of code. This is known as > >> "creating a maintenance nightmare". > >I thought we discussed this before? The way you maintain hand-optimized > >code is to keep around the original, clean, algebraic version, as well > >as the new, ugly, optimized version. > I know how to maintain code. I believe you. > The point is that it is always more difficult to maintain hand-tweaked > code. Agreed, but hand optimization is still a viable technique. Sometimes it's the only choice. > He will tell > you Fortran is maintinance hell because he doesn't know how to use > IMPLICIT NONE or the portable equivalent. There isn't any portable equivalent. On some machines, if I want to use Fortran, I have to deal with Fortran IV compilers. I'd rather use Pascal than deal with these syntactic problems. > And, best of all, at the > same time as I am demonstrating that there are cases in which C > pointers cause big optimization problems he will claim in another > thread that pointers are never less efficient than arrays. Sure, Dan, > we all have lots of time to tweak code by hand when an optimizer can > do it for us. Isn't that why HLL's were invented? See, here's where you simply fail to pay attention to the main points of the thread. A sufficiently smart compiler *can* do the job for C, without *any* hand optimization. Current compilers happen not to be that smart. Would you please stop misrepresenting my position? Now I'd like everybody to notice something. Jim made exactly the same argument about arrays as I've made in the above paragraph about aliasing. He said that a sufficiently smart compiler can find the right base to precompute for Fortran array references. He said that current compilers happen not to be that smart. The difference is that Jim didn't post the method that his sufficiently smart compiler would use. I did the job. I showed what it would take to deal with aliasing. Jim just waved his hands and made some vague assertions about how arrays are always as efficient as pointers. And so, as it turns out, he was completely wrong. Of course, Greg hasn't even made the effort that Jim has to contribute to the discussion. He has completely ignored my comments on how the compiler can optimize C array code with no help from the programmer. Instead he has focused on my offhand remark about his Fortran example. Sigh. > Sorry that nobody liked Dan's > great theories on comp.theory, Actually, some people already knew what I was talking about, and commented upon the usefulness of the technique (as two people have done in this group). Again you ignore the discussion. ---Dan
peter@ficc.ferranti.com (Peter da Silva) (11/17/90)
In article <PCG.90Nov14151603@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes: > As to this I have good news! DBMSes and 4GLs are often (anectodal but > frequent evidence) impressively *faster* than any ad-hoc program that > does the same. NOTE: *the same*, not similar but much much simpler. [mainly when manipulating large masses of data] Sure, use the right tool for the job. I once wrote an adventure program in APL as a learning excercise, but I'd be hard pressed to recommend it to anyone for that purpose. It's a killer language for tasks that need a lot of matrix operations, though. But for tasks that don't lend themselves to the database model a DBMS is really really slow... [also, as an aside... you can get the same sort of speed with a killer subroutine library that duplicates the DBMS operations embedded in your mid-level language: I believe Oracle sells such a beast] And even if you're manipulating large amounts of data, you may be better off with a lower throughput but quicker-responding mid-level language. I wouldn't want to manipulate MIDI data in real-time with Oracle, even if you are dealing with large masses of pretty homogenous data. -- Peter da Silva. `-_-' +1 713 274 5180. 'U` peter@ferranti.com
peter@ficc.ferranti.com (Peter da Silva) (11/17/90)
You and David Gudeman are right. I don't know what was the matter with me when I said pointers are not fundamental. Of course, they are, and in fact I'd just got done arguing that at length in the "lisp has pointers" thread. Hey, I've been sick. -- Peter da Silva. `-_-' +1 713 274 5180. 'U` peter@ferranti.com
new@ee.udel.edu (Darren New) (11/17/90)
>pcg> Ah no, this cannot pass. Lists and trees are *implementations*, and >pcg> when you draw them you draw specific encodings of such >pcg> implementations. >gudeman> No, this cannot pass. Lists and trees are mental structures >gudeman> that humans use to visualize things. An associate of mine pointed out that we have spend the last several species navigating through trees, with a high survival value to being able to do so efficiently and without falling. It seems entirely possible to me (IMHO) that trees are indeed elementary mental structures for most humans. After all, we've only spend the last few thousand years manipulating symbols formally. 1/2 :-) -- Darren -- --- Darren New --- Grad Student --- CIS --- Univ. of Delaware --- ----- Network Protocols, Graphics, Programming Languages, Formal Description Techniques (esp. Estelle), Coffee, Amigas ----- =+=+=+ Let GROPE be an N-tuple where ... +=+=+=
2113av@gmuvax2.gmu.edu (John Porter) (11/17/90)
Piercarlo Grandi: > ]Ah no, this cannot pass. Lists and trees are *implementations*, and when > ]you draw them you draw specific encodings of such implementations. David Gudeman: > No, this cannot pass. Lists and trees are mental structures that > humans use to visualize things. Peter da Silva: I must agree with Piercarlo here. The basic things that people work with are "chunks of data". Lists, trees, tries, and so on are methods of implementing these databases in efficient ways. There's nothing fundamental about them. John Porter resonds: I'm not normally one to be argumentative, and I find it fascinating that "wars" can arise over topics as basic to c.s. as lists, trees, etc. But I feel compelled to register my agreement with Mr. Gudeman here; lists, trees, etc., ARE concepts, NOT implementations. (That is why they are called "abstract data types".) Mr. Piercarlo's assertion that "when you draw them you draw specific implementations of them" is, it seems patently obvious to me, quite false. A circle, a box, a line, an arrowhead, they say nothing, imply nothing about any implementation; they do not imply that there is, was, or ever will be an implementation, in a computer program or anywhere else. You don't say "I implemented this chunk of data with a tree", you say "I implemented this tree with dynamically allocated structs, each containing pointers to parent and children", or whatever your implementation happens to be. As a concrete example, contrast the just-described tree implementation with the following: A binary tree can be implemented in a static array, where element number one is the head; from any given node (identified by its index into the array), you can reach the left child by multiplying the indes by 2; find the right child immediately past the left child. This implemenation is sometimes called "heap", but that is confusing. Note. No pointers. An entirely different implemenation, but a tree none-the-less. The concept is the same. -- John Porter # "Everybody could stand a HUNDRED chest x-rays a year. # They ought to have them, too." J. Frank Parnell
dbc@bushido.uucp (Dave Caswell) (11/18/90)
.As to this I have good news! DBMSes and 4GLs are often (anectodal but .frequent evidence) impressively *faster* than any ad-hoc program that .does the same. NOTE: *the same*, not similar but much much simpler. . .If you are manipulating large masses of data in complicated ways, which .is most people want to do, a well written DBMS is going to be way ahead .of any well written lower level solution. Notice that you *can* for a .very important problem outrun a well written DBMS with diabolically well .written low level code, but this is a rare case. . .The same applies for example to sort utilities -- normally it takes a .lot of effort to beat a well written generic sort utility on any system .I know of. This sure doesn't my experience. My experience is that custom written code can take a problem that takes 10-20hrs. to solve and change it into a 10-20 minute problem. In other words the custom written code is what allows the solution to actually be used. I'm not saying that Oracle or whatever isn't a very useful tool just that there is a whole host of problems that it can't solve. Many of these problems are relatively easy to hand-code using well known algorithms. (well known means published in CACM or something similar). -- David Caswell dbc%bushido.uucp@umich.edu
pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/19/90)
On 18 Nov 90 03:15:03 GMT, dbc@bushido.uucp (Dave Caswell) said: [ ... DMBS ar usually better than custom written code ... ] pcg> The same applies for example to sort utilities -- normally it takes a pcg> lot of effort to beat a well written generic sort utility on any system pcg> I know of. dbc> This sure doesn't my experience. My experience is that custom dbc> written code can take a problem that takes 10-20hrs. to solve and dbc> change it into a 10-20 minute problem. In other words the custom dbc> written code is what allows the solution to actually be used. I would say that this is *extremely* unlikely to be common for problems that fall within the scope of a DBMS query planner. And if these are pathological problems, than you are right, and I had said that these indeed require csutom coding. But I would be very skeptical of anybody's claiming that they can routinely do better than a DBMS by using custom code. If you want, I have anecdotal evidence that custom written DB applications using B-tree libraries or similar low level things were substantially slower than loading the same data into a DBMS and letting it do its query planning. dbc> I'm not saying that Oracle or whatever isn't a very useful tool dbc> just that there is a whole host of problems that it can't solve. But it solves the problems of surely more than 50% of the people that use a computer (CS types tend to forget that computers are not used solely to develop compilers and operating systems, or clever numerical codes), and this is damn important. I would say that for the extremely large class of application that can be done with a DBMS using anything else should be justified with great and good reasons. For things that are not within the scope of a DBMS, there is something equivalent to a DBMS. Before rolling one's own, if you do statistics you had better use S or the BDMP library, and so on. dbc> Many of these problems are relatively easy to hand-code using well dbc> known algorithms. (well known means published in CACM or something dbc> similar). And here we have a big misunderstanding. Programs and utilities are not algorithms. A sort utility source is maybe one or two orders magnitude larger and more sophisticated than that of its core algorithm. Admittedly a lot of this is not value added -- it would not be there if the environment were more cooperative. But, a lot of it is value added. It is a common delusion to think that algorithms and programs are the same thing (I would call it the "Dijkstra book syndrome"). To make a useful program out of an algorithm takes a lot of effort. Take for example sorting: if all you have to sort is a few kilobytes, almost any algorithm will do and you can code it straightforwardly. But if you have large masses of complicated data either you get into difficult things like handling of external storage, or, if you have single level storage, you must make sure you have good locality. Just as an example of a relatively simple minded sort utility, take the UNIX sort(1). Its logic is "relatively easy to hand code using well known algorithms". Have a look at the source for a demonstration :-). I'd bet that not many programmers around can do significantly better than the UNIX sort(1) command, even on special cases. -- Piercarlo Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk