ok@quintus (08/08/88)
ICON (described in "The ICON Programming Language", Griswold & Griswold, Prentice-Hall, 1983) is a descendant of SNOBOL. It has a fairly conventional syntax, but adds backtracking, strings, hash tables, and pattern-matching (patterns are not a data type as in SNOBOL, but you can do most of the same sorts of things). Version 7 of ICON (the current one) is implemented (in C and YACC) as a compiler which produces abstract instructions which are then interpreted by another C program. This is rather like the Berkeley Pascal "pi" compiler and "px" interpreter.

Version 5 of ICON (the one described in the book) had two implementations sharing a common front end: an "abstract instruction" version, and one which produced native code. However, the book says that "a difference of 5% to 10%" in speed between "compiled" and "interpreted" versions was typical. So the ICON group decided that it wasn't worth maintaining the native-code version any longer.

What I'd like to understand is why the difference was so slight. Let's put it in context (bigger numbers mean faster, each language's C-coded interpreter set to 1):

     1    Pascal, Berkeley "pi" + "px"
    36    Pascal, Berkeley "pc" -- my measurement on a SUN-3
     1    Prolog, WAM emulated in C
     2    Prolog, WAM emulated in assembly code
     4    Prolog, WAM macro-expanded and peephole-optimised to native code
    1.0   ICON, Version 5 "icont" + "iconx"
    1.1   ICON, Version 5 "iconc"

It's a long time since I was sure of my facts about Berkeley Pascal, but I don't recall the abstract instructions as being designed for particularly fast emulation, or the compiler as doing much optimisation. I suspect that a ratio of 4 to 10 for "simple-minded native code" -vs- "good C emulator" should be more typical of Pascal.

So why is the ICON ratio so low? It could be that the emulator was unusually good, or the compiler unusually bad, or it could be something about the language. To put _that_ in context, I decided to take a Prolog program and recode it in ICON. Since I happened to have it handy, I used the "darlington-to-workington" example on page 163 of the 3rd edition of Clocksin & Mellish (section 7.9). The total times to find all shortest paths from each town were

    original Prolog version       4.8 seconds
    ICON version                  6.2 seconds
    comparable Prolog version     0.5 seconds

The original Prolog version is the one which appears in Clocksin & Mellish. It uses "findall", which is a fairly expensive operation, and uses a quadratic algorithm for finding the shortest element of a list. The ICON version, not being able to backtrack over a Prolog-like data base, backtracks over a vector of triples, but is otherwise determinate. The second Prolog version is much closer to the ICON version in structure.

The ICON version suffers from not having lists (in the sense of Prolog and Lisp). Instead it has extensible vectors (and calls them lists). This is a neat idea, but the overheads of the ICON implementation are high. So I declared a "pair" record type (pair(Cost,Path) in ICON corresponding to Cost-Path in Prolog) and a "cons" record type (cons(Head,Tail) in ICON corresponding to [Head|Tail] in Prolog). Allow a factor of 3 for the size of the data structures (ICON data structures taking up a lot of memory), and a factor of 2 to allow for the fact that the ICON emulator is coded in C and the Quintus Prolog emulator isn't, and we're looking at a ratio of about 2 to 1 between ICON and Prolog [on _this_ task, as coded by me, running on a Sun-3/50, & other context].
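For concreteness, the declarations in question would look something like this in Icon (an illustrative sketch only -- the original source was not posted, and the procedure name is invented):

    record pair(cost, path)      # pair(Cost, Path)  corresponds to  Cost-Path
    record cons(head, tail)      # cons(Head, Tail)  corresponds to  [Head|Tail]

    procedure length_of(l)       # cdr-style walk along a chain of cons records
       local n
       n := 0
       while \l do {             # \l succeeds while l is non-null
          n +:= 1
          l := l.tail
       }
       return n
    end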
The bottom line of this comparison is that the ICON interpreter doesn't seem to be especially good or especially bad. If ICON were like Prolog, we would expect about a factor of 4 between the speed of interpreted code and the speed of native code. The fact that this was _not_ observed suggests that either the compiler or the language was the problem.

It's worth drawing a distinction between polymorphic languages like ML and typed Prolog, and languages with overloading, like PL/I and ICON. An example of a polymorphic operation is

    -- len: * list -> int
    ++ len nil = 0
    ++ len H.T = 1 + len T

which computes the length of a list, be the type of the elements what it may. An example of an overloaded operation is *X, which yields the cardinality of a character set, the cardinality of a (general) set, the length of a string, the length of a vector, the number of fields in a record, &c, each of which is a different machine-level operation. Similarly (courtesy of automatic run-time coercion), X||Y (string concatenation) does not correspond to a single operation either.

So I _think_ the low ratio for ICON wasn't due to the compiler either, but to the fact that there is so little that the compiler can do. Would anyone who understands the ICON implementation care to comment on this? (I have the ICON Implementation book, but have not finished it yet and still can't pretend to understand it.)

[I'm quite happy with ICON's speed: it's quite a bit faster than AWK....]
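To make the overloading concrete, here is what *X does in Icon for a few run-time types (a small illustrative program, not part of the measurements above); each case is a different machine-level operation that must be selected at run time:

    procedure main()
       write(*"hello")        # 5 : length of a string
       write(*'aeiou')        # 5 : cardinality of a character set (cset)
       write(*[1, 2, 3])      # 3 : size of a list
       write(*table())        # 0 : size of a (hash) table
    end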
debray@arizona.edu (Saumya Debray) (08/23/88)
In article <260@quintus.UUCP>, ok@quintus writes:
> Version 5 of ICON (the one described in the book) had two implementations
> sharing a common front end: an "abstract instruction" version, and one
> which produced native code. However, the book says that "a difference of
> 5% to 10%" in speed between "compiled" and "interpreted" versions was
> typical. So the ICON group decided that it wasn't worth maintaining the
> native-code version any longer.
>
> What I'd like to understand is why the difference was so slight.

I don't know as much about Icon implementation as I should, but my impression (someone from the Icon project should correct me if I'm wrong) is that in the old implementation, "backtrack points" were created most of the time, even in contexts where they weren't really necessary. Given current wisdom about the cost of creating backtrack points, this overhead might account for the relatively small speedup obtained in going to native code.

Some recent research in Icon has, in fact, been aimed at eliminating some of this redundant backtrack point creation through static analysis.
-- 
Saumya Debray
CS Department, University of Arizona, Tucson
     internet: debray@arizona.edu
     uucp:     arizona!debray
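A small Icon fragment shows the two kinds of context (an illustrative sketch, not code from the Icon sources):

    procedure main()
       local x
       # Goal-directed evaluation: "x > 5" fails for 1..5 and resumes the
       # generator "1 to 10", so a backtrack point really is needed here.
       if (x := 1 to 10) & x > 5 then
          write(x)            # writes 6
       # A bounded context: the statement takes the first result and
       # discards the rest, so a backtrack point for "1 to 10" here is
       # exactly the redundant kind that static analysis could remove.
       x := 1 to 10
       write(x)               # writes 1
    end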
gudeman@arizona.edu (David Gudeman) (08/26/88)
>From: debray@arizona.edu (Saumya Debray)
>
>in the old implementation, "backtrack points" were
>created most of the time, even in contexts where they weren't really
>necessary. Given current wisdom about the cost of creating backtrack
>points, this overhead might account for the relatively small speedup
>obtained in going to native code.

Actually, in the distributed version of Icon, backtrack points are still created where they aren't needed. Ralph Griswold did some measurements that indicated that the interpreter spends about 90% of its time in the run-time system, and only 10% interpreting the icode. Obviously, a compiler that uses the same run-time system can only hope to improve the speed by 10% at the most.

Also, I have a couple of comments to make about the previous article, which I didn't have time to formulate before...

(1) The article described an experiment in which a program was written in Prolog and in Icon, and the Prolog implementation was faster. I don't know how much the author knows about Icon, but Icon is like Prolog in that an experienced programmer can get a lot more performance out of the language by knowledge of the implementation. The Icon implementation is a lot different from Prolog implementations, and a Prolog guru can be depended on to make a lot of wrong assumptions about speed in Icon. In particular I was horrified at the description of using Icon records to implement linked lists. Icon lists are a _lot_ faster than this if they are used right. You have to forget the old Lisp tradition of recursing on the cdr. Icon is a procedural language, so you should not expect functional and logical approaches to work well in Icon.

(2) Also, the distributed version of Icon is supported as a research project, and little of the research has gone into efficiency (until recently). That means that a lot more money has gone into Prolog implementations, so Prolog has a decided advantage in that regard.

Given (1) and (2), I don't think the speed difference can be attributed to inherent properties of the languages tested. My own feeling is that Icon and Prolog both fit into the same category when you are listing languages by how "high-level" they are. Each language, though, is much stronger than the other in particular problem domains.

David Gudeman
Department of Computer Science      gudeman@arizona.edu
Gould-Simpson Science Building      {allegra,cmcl2,ihnp4,noao}!arizona!gudeman
The University of Arizona           602-621-2858
Tucson, AZ 85721
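The idiomatic alternative looks like this (an illustrative sketch, not code from any posted program): build lists at their ends with put() or push(), and iterate with the element generator rather than simulating cdr chains:

    procedure main()
       local L, total
       L := []                # an Icon list: a growable array, not a cons chain
       every put(L, 1 to 5)   # append 1..5 at the right-hand end
       total := 0
       every total +:= !L     # !L generates the elements in order
       write(total)           # writes 15
    end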
ok@quintus.uucp (Richard A. O'Keefe) (08/26/88)
In article <6797@megaron.arizona.edu> gudeman@arizona.edu (David Gudeman) writes:
>Also, I have a couple of comments to make about the previous article,
>which I didn't have time to formulate before...

Since I wrote it, I'll reply.

>(1) The article described an experiment in which a program was written
>in Prolog and in Icon, and the Prolog implementation was faster.

For heaven's sake, nobody claimed that that PROVED anything! The experiment showed that Prolog and Icon were comparable, i.e. that the Icon implementation wasn't doing anything outrageously bad.

Actually, it showed more than that. The Prolog implementation in question uses a threaded-code interpreter implemented at a low level, which runs about twice as fast as an implementation in C would. The Icon implementation is entirely in C. Other things being equal, that should mean that Icon could be made to go about twice as fast, but the claim was made in the Icon books that that sort of speed-up had not been attained.

>In particular I was horrified at the description of using Icon records
>to implement linked lists. Icon lists are a _lot_ faster than this if
>they are used right. You have to forget the old Lisp tradition of
>recursing on the cdr.

Why should you be horrified at using records for sequences? They are in the language, and they are supposed to be useful for that amongst other things. Icon does have things called lists, but they are not at all what everyone else means by lists: they are flex arrays. There are two reasons why my test code didn't use them: (1) the code was much clumsier, and (2) the implementation book frightened me away from them, because the sequences were being constructed dynamically and the space and time overheads would have been rather high. (I'm quite serious: I found that the space cost for a sequence made of records would *in this case* be about half the space cost of a flex array.)

>Given (1) and (2), I don't think the speed difference can be
>attributed to inherent properties of the languages tested.

Neither do I, and that isn't what was claimed. Not speeds, but speedUPs. The point is that the current Icon implementation *is* comparable to Prolog systems using similar implementation methods, but that Prolog can be speeded up quite a bit by elementary means, and that it was claimed *by the Icon project* that such speedups have not been obtained for Icon. It has been explained that 90% of the Icon time is spent in the run-time system. Well, that's not true for Prolog, so there we have the explanation.

Yes, Icon *is* a higher-level language than Prolog, in that its basic operations (replace a substring &c) are more remote from conventional machines than the basic operations of Prolog. If the run-time system of Icon were implemented in a lower-level language, that might give the kind of speedup that one gets from improving a Prolog instruction emulator.
raymond@pioneer.arc.nasa.gov.arpa (Eric Raymond RIA) (08/27/88)
In article <6797@megaron.arizona.edu> gudeman@arizona.edu (David Gudeman) writes:
>I don't know how much the author knows about Icon, but Icon is like
>Prolog in that an experienced programmer can get a lot more
>performance out of the language by knowledge of the implementation.

And you call this a high level language?

Of course I know why you call it "high level". Don't flame about that. I know that optimization is always (usually) a tradeoff for clarity. Don't flame about that, either.

What I'm complaining about is a language which depends upon knowledge of its internal implementation (as opposed to some abstract (intuitive) specification). This is further aggravated when this optimization is global in extent. A truly high level language would localize the optimization. (Easier said than done, but let's look ahead. Remember, "high level" is a relative term.)

Eric A. Raymond - NASA Ames Research Center - raymond@pluto.arc.nasa.gov
gudeman@arizona.edu (David Gudeman) (08/27/88)
In article <320@quintus.UUCP> ok@quintus.uucp (Richard A. O'Keefe) writes:
>In article <6797@megaron.arizona.edu> gudeman@arizona.edu (David Gudeman) writes:
>>(1) The article described an experiment in which a program was written
>>in Prolog and in Icon, and the Prolog implementation was faster.
>
>For heaven's sake, nobody claimed that that PROVED anything!

For heaven's sake, nobody claimed that you claimed that that PROVED anything :-). In fact, I understood that you were throwing the experiment out for discussion, so I discussed. Sorry you misunderstood.

>... Other things being equal, that should mean that Icon
>could be made to go about twice as fast, but the claim was made in the
>Icon books that that sort of speed-up had not been attained.

I should have mentioned that the Icon compiler didn't do anything except output a sequence of machine code instructions for each icode instruction. The machine language part just called functions in the run-time system to do all the work. There is currently a lot of work going on to write a _real_ compiler for Icon, and we expect a lot of improvement. But the improvements mostly will come from optimizations rather than from getting rid of the interpreter.

>>In particular I was horrified at the description of using Icon records
>>to implement linked lists...
>
>Why should you be horrified at using records for sequences? They are
>in the language, and they are supposed to be useful for that amongst
>other things.

Oh well, I guess I shouldn't apply my dry sense of humor to people who don't know me. The word "horrified" should be taken as hyperbole. What I _meant_ was that this use of records is nowhere near as efficient as the use of conses in Prolog and Lisp. Looking up a field in a record requires (1) looking up the record constructor, (2) hashing on the name of the field, (3) going back to the record to get the value. Rather than trying to write a Prolog program in Icon, it would have been more efficient to completely reformulate the solution for Icon.

I'd be interested in knowing how well _my_ Icon solution stacks up against your Prolog solution. If you will send me a complete description of the problem, I'll send you my solution. Please be aware that I'm not out to defend Icon. There are a _lot_ of problems with the current implementation (and I'm not responsible for any of them :-). I just think the comparison would be interesting.

>... Icon does have things called lists, but they are not at
>all what everyone else means by lists: they are flex arrays.

I probably should just pass on this one, but I can't help myself... By "everyone else" you must mean "everyone in the Lisp/Prolog community", because outside of that group "list" is a generic term referring to an ordered collection of objects. The operations on a list are generally considered to be application specific. Also, outside of that group no one would know what you mean by the term "flex array". Ask a C programmer, a Smalltalk programmer, a Snobol4 programmer, and a data-structures teacher what a list is. You will probably get three or four different answers.

>... Yes, Icon *is* a higher-level language than
>Prolog, in that its basic operations (replace a substring &c) are more
>remote from conventional machines than the basic operations of Prolog.

Isn't unification just as remote from conventional machines as Icon operations are?
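To picture the three-step lookup, here is a toy model written in Icon itself (purely illustrative -- the real run-time system is written in C, and all of these names are invented):

    # Step 1: each record value carries a reference to its constructor.
    # Step 2: the constructor maps field names to slot indices.
    # Step 3: the index selects the value from the record's slots.
    record constructor(name, slot_of)    # slot_of: table of field name -> index

    procedure field_value(ctor, slots, name)
       return slots[ctor.slot_of[name]]
    end

    procedure main()
       local pair_ctor, slots
       pair_ctor := constructor("pair", table())
       pair_ctor.slot_of["cost"] := 1
       pair_ctor.slot_of["path"] := 2
       slots := [217, "penrith"]                    # one pair record's fields
       write(field_value(pair_ctor, slots, "cost")) # writes 217
    end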
gudeman@arizona.edu (David Gudeman) (08/27/88)
In article <13919@ames.arc.nasa.gov> raymond@pioneer.arc.nasa.gov.arpa (Eric Raymond RIA) writes:
>In article <6797@megaron.arizona.edu> gudeman@arizona.edu (David Gudeman) writes:
>>I don't know how much the author knows about Icon, but Icon is like
>>Prolog in that an experienced programmer can get a lot more
>>performance out of the language by knowledge of the implementation.
>
>And you call this a high level language?

You know, I've said the exact same thing about Prolog :-). The answer for Icon is that this efficiency thing is a consequence of the _implementation_ of Icon, not a consequence of the definition of the language. So, I should have said that given the UA's current implementation of Icon, an experienced programmer can get a lot more performance out of the language. I firmly believe that this is a Bad Thing, and that it should be changed (someday).
ok@quintus.uucp (Richard A. O'Keefe) (08/30/88)
In article <6814@megaron.arizona.edu> gudeman@arizona.edu (David Gudeman) writes:
>Oh well, I guess I shouldn't apply my dry sense of humor to people who
>don't know me. The word "horrified" should be taken as hyperbole.
>What I _meant_ was that this use of records is nowhere near as
>efficient as the use of conses in Prolog and Lisp. Looking up a field
>in a record requires (1) looking up the record constructor, (2) hashing
>on the name of the field, (3) going back to the record to get the
>value. Rather than trying to write a Prolog program in Icon, it would
>have been more efficient to completely reformulate the solution for
>Icon.

Since I didn't exhibit the source code for either the Icon version of the program or any of the 16 Prolog variants I was trying, it was rather a surprise to find my use of Icon constructs being criticised. Unfortunately, in the rather long interval between my original posting and this subsequent discussion, I deleted the files in question.

Again, without seeing the code as I wrote it, it is a little unjust to describe what I did as "trying to write a Prolog program in Icon". Far more accurate would be to describe it as "trying to write a Pop-2 program in Prolog and in Icon". The program was a graph searching algorithm loosely based on the Darlington-to-Workington example in Clocksin & Mellish. The Open set is a collection of <Node,Cost,Path> triples, the Path being a sequence of nodes through which the Node can be reached. With this data structure, if you can reach 50 nodes from a given node, their representatives in the Open set will share most of their Path structures -- this appears to be impossible with Icon arrays.

I had skipped the chapter on records in the Icon Implementation book, because I thought I could predict how that would be done. Clearly I was wrong. I expected that, as in Pop-2 and Pop-11, field names would be global function constants (in Pop, doublets). The inconvenience of having to repeat part of the record name in the field name is minor (and it can make the code easier to read); the gain in efficiency is major.

As for it being "more efficient to completely reformulate the solution", that rather misses the point. The algorithm I started with was known to be inefficient FOR PROLOG! We're talking factors of TEN here. The point was not to try to write the most efficient possible program in each language, but to pick a variant of a given algorithm which could be expressed naturally in both languages, and see what a compiler did to it.

>>... Icon does have things called lists, but they are not at
>>all what everyone else means by lists: they are flex arrays.
>
>I probably should just pass on this one, but I can't help myself...
>By "everyone else" you must mean "everyone in the Lisp/Prolog
>community", because outside of that group "list" is a generic term
>referring to an ordered collection of objects. The operations on a
>list are generally considered to be application specific. Also,
>outside of that group no one would know what you mean by the term
>"flex array".

"Flex array" is not a Prolog or Lisp term. It is an Algol 68 term. As for my use of "list", I refer you to section 5.4.1, page 126, of "The SNOBOL4 Programming Language", where we find

    DATA('LIST(VALUE,NEXT)')    "linked list"

to page 25 of Sedgewick, "Algorithms" (1st ed.), to the use of the word "list" in Knuth's "The Art of Computer Programming", and so on. The book "The Turing Programming Language" uses "list" in two senses: as a sequence of items in a grammar, e.g. <export list>, and for the linked list data structure. I suggest that the word "list" has several senses (including a verbal one), but that the "data structure" sense is almost universally understood to be the "linked list" one. For the more abstract "ordered collection" data structure, the usual term is "sequence".

>Ask a C programmer, a Smalltalk programmer, a Snobol4
>programmer, and a data-structures teacher what a list is. You will
>probably get three or four different answers.

As it happens, I *am* a C programmer and have been a Smalltalk programmer and a Snobol 4 programmer. When I asked myself, myself agreed with me... (I also asked a former PL/I programmer, who started talking about PUT SKIP LIST and the like, but agreed that "list processing" was about pointers, so again myself agreed with me.) I've already cited the only instance of "list" I could find in the Snobol 4 book. "List" in the sense of a data structure doesn't appear in Harbison & Steele, but on p. 204 of Stroustrup's C++ (1st ed.) we find an "slist" class using a [linked] list implementation. It has to be admitted that the Smalltalk books use the term "list" to mean "visually presented sequence". There is no "list" data structure in Smalltalk, but there *is* a LinkedList class, so I imagine that other Smalltalk programmers, if asked what "list" means *as a data structure*, would answer in terms of LinkedList.

>Isn't unification just as remote from conventional machines as Icon
>operations are?

No.
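The sharing in question is easy to exhibit with cons records (an illustrative sketch; the town names and variable names are invented):

    record cons(head, tail)

    procedure main()
       local path, via_a, via_b
       # A Path reaching some node: keswick -> penrith -> (end).
       path := cons("keswick", cons("penrith", &null))
       # Two extensions of it, as two entries in the Open set might hold:
       via_a := cons("cockermouth", path)    # one new cell each; every cell
       via_b := cons("ambleside", path)      # of "path" is shared by both
       # The Icon-list equivalent, ["cockermouth"] ||| path_list, would
       # copy the whole common suffix instead of sharing it.
    end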