restivo@POLYA.STANFORD.EDU (Chuck Restivo) (07/04/88)
Date: Monday 2 May 1988 12:53-PST From: Chuck Restivo (The Moderator) <PROLOG-REQUEST@POLYA.STANFORD.EDU> Reply-to: PROLOG@POLYA.STANFORD.EDU> US-Mail: P.O. Box 4584, Stanford CA 94305 Subject: PROLOG Digest V6 #25 To: PROLOG@POLYA.STANFORD.EDU PROLOG Digest Monday, 2 May 1988 Volume 6 : Issue 25 Today's Topics: Query - Toleration & Etiquette, Implementation - End_Of_File & Destructive Predicates, & Techniques & Strings & BSI Standards ---------------------------------------------------------------------------------------------------------------------------- From: mcvax!prlb2!kulcs!bimbart@uunet.uu.net (Bart Demoen) Subject: I don't understand "precised" could the author of article 599 in comp.lang.prolog behave a bit more respectful towards europeans who speak 3 or 4 (natural) languages but whose native language is not english ? we are bound to make mistakes in english, but that's no reason to sneer at us and by the way, it is not the first time he did so, or do I need to remind you of the Boston conference in 1985, Mister O'Keefe ? those who have nothing to say, repeat themselves. ------------------------------- Date: 16 Mar 88 10:24:05 GMT From: mcvax!prlb2!kulcs!bimandre@uunet.uu.net (Andre Marien) Subject: behavior of read/get0 at end_of_file > copy_chars :- > get0(Char), > copy_chars(Char). > > copy_chars(-1) :- !. > copy_chars(Char) :- > put(Char), > copy_chars. At the benchmark workshop he agitated vividly against the different behavior of BIM_Prolog in case of end of file. BIM_Prolog fails when get0 attemps to read past end of file in stead of returning -1. The same is true for read. If you write the same program as above with the BIM_prolog convention, this it what it looks like : copy_chars :- get0(Char), !, put(Char), copy_chars. copy_chars . The previous code creates a choicepoint for every character which get processed. It can be easily avoided : copy_chars :- copy_chars_h. copy_chars . copy_chars_h :- get0(Char), put(Char), copy_chars_h . Now this looks so obviously better and more readable to me, that it convinces me we made the better choice. BTW, if you ever want to convert a program with a different interpretation, the solution is easy : /*QP*/read(X) :- /*bim*/read(X), ! . /*QP*/read(whatever_is_used_to_indicate_end_of_file) . Of course, as you can verify above, a different coding may very well produce a better program. -- Andre' Marien -- Bart Demoen ------------------------------ Date: 17 Mar 88 03:17:08 GMT From: munnari!mulga!lee@uunet.uu.net (Lee Naish) Subject: If-then-else/soft cut (Destructive list predicates) In article <1798@sics.se> seif@sics.se (Seif Haridi) writes: > >if(+P,+Q,+R) is similar to (P -> Q ; R) construct except that it allows the >generation of all possible solutions for P. Several Prologs have similar constructs and I think all Prologs *should*. From a logical point of view, quite often we want to do something like some X (p(X, Y), q(X, Z)) ; (not some X p(X, Y), r(Z)) The standard (p(X, Y) -> q(X, Z) ; r(Z)) doesnt work because some solutions to the first disjunct may be lost (it may also bind Y, cutting off the second disjunct when it shouldn't). A solution to the first problem is to use (\+ p(X, Y) -> r(Z) ; p(X, Y), q(X, Z)) Unfortunately, this calls p/2 twice. In NU-Prolog, you can write (if some X p(X, Y) then q(X, Z) else r(Z)) This can return multiple answers to p/2 (without calling it twice) and also delays until Y is ground (solving the second problem). How is it implemented? Not with any destructive assignment. It is translated into a call to xyzzy(Y, Z), where xyzzy/2 is defined as follows (more or less): ?- xyzzy(Y, _Z) when ground(Y). % delay until Y is ground xyzzy(Y, Z) :- p(X, Y), $soft_cut, q(Y, Z). xyzzy(Y, Z) :- r(Z). Soft cut removes the choice point for xyzzy but does not touch choice points created by the execution of p. It is as easy to implement as cut, if not easier. If the quantified variable, X, did not occur in the "then" part, normal cut could be used instead of soft cut. This is discussed in Naish, L., "Negation and Quantifiers in NU-Prolog", Proceedings Third ICLP, London, LNCS 225, Springer. While I am on the subject, I may as well launch into some "random musings, part 1". <<REST HERE IF TIRED>> Its a pity cut was ever invented. It combines two operations which should be separate. They are soft cut (clobbers the parent choice point) and the "once" construct of DEC-10 (which clobbers all choice points created by the execution of a call). p :- q, r, !, s. is equivalent (exactly) to p :- once (q, r), soft_cut, s. Why should the two operations be separated? * Soft cut is for negation. * Once is for existential quantification. For example, if we assume that the first argument of p/2 is ground at the time of calling (we could ensure this by delaying, testing, compile time analysis, or praying very hard) p(X, Y) :- once q(X, Z), r(X, Y). p(X, Y) :- s(X, Y). is a correct implementation of p(X, Y) :- some Z q(X, Z), r(X, Y). p(X, Y) :- s(X, Y). and p(X, Y) :- q(X, Z), soft_cut, r(X, Y). p(X, Y) :- s(X, Y). is a correct implementation of p(X, Y) :- q(X, Z), r(X, Y). p(X, Y) :- all Z not q(X, Z), s(X, Y). Thats enough for now. Stay tuned for part 2, which may mention commit, removing choice points as soon as possible and "semi-completeness" (defined in "Proving properties of committed choice logic programs", appearing in JLP some time). -- lee ------------------------------ Date: 17 Mar 88 08:00:35 GMT From: munnari!mulga!lee@uunet.uu.net (Lee Naish) Subject: Programming techniques In article <772@cresswell.quintus.UUCP> ok@quintus.UUCP writes: > >use q(0, X, X) for an empty list >and q(s(s(...(0)...)), [X1,...,Xn|X], X) for a list of N elements. > > empty_queue(q(0,X,X)). This is a really nice technique, which I only became aware of recently (it was used in a breadth first search program of Fernando's posted to the net). I am currently giving an honours course on Prolog programming techniques and gave this example last week (I used proper integers, not s(...) - I think its better in this example). There must be lots of techniques like this which are being discovered. Please post them to the net so we can all benefit (including my students). Here is an example from todays lecture, for reading terms from a file (the idea comes from David Morley): % Term is a term in File % (returns all terms in a file by backtracking) % Cleans up when eof is reached. contains_term(File, Term) :- open(File, read, Stream), % see(File), repeat, read(Stream, Term), % read(Term), ( isEof(Term) -> close(Stream), % seen, !, fail ). contains_term/2 can be treated like a nice logical relation. Some examples of its use: % get list of all terms in a file read_all(File, TermList) :- findall(Term, contains_term(File, Term), TermList). % do dcg processing of a file % (I'm not sure if I like this style of coding) dcg(File) :- contains_term(File, Term), expand_term(Term, ETerm), write_clause(ETerm), fail. dcg(_). Here is a better way of collecting all terms in a file into a single term (a list in this case). Most people should have seen this. % Treat file like a list of terms and use tail recursion to % process them. % This means we dont have to use side effects or all % solutions predicates to do something useful. % This scheme doesnt work properly with backtracking, since % read is not undone on backtracking. read_all(File, AllTerms) :- open(File, read, Stream), read_all1(Stream, AllTerms). read_all1(Stream, AllTerms) :- read(Stream, Term), ( isEof(Term) -> close(Stream), process_eof(AllTerms) ; process_term(Term, AllTerms, AllTerms1), read_all1(Stream, AllTerms1) ). % what to return at eof process_eof([]). % how to return a single term in the data structure process_term(Term, Term.AllTerms, AllTerms). -- lee ------------------------------ Date: 14 Mar 88 05:39:31 GMT From: quintus!ok@unix.sri.com (Richard A. O'Keefe) Subject: Strings In article <512@ecrcvax.UUCP>, micha@ecrcvax.UUCP (Micha Meier) writes: > In Prolog Digest V5.100 Richard O'Keefe writes > > "... if efficiency is your concern your are better off using lists > of character codes than strings." > > May I ask this to be precised a little bit more? I can't see why is it > faster to handle structured data on the global stack rather than directly > characters - I guess this applies only for special predicates, > e.g. returning all substrings of a string, but even there it's in > no way evident that using lists is faster. Several points to make: > I don't understand "precised". Most implementations of strings in Prolog ARE "structured data on the global stack". When I say that it is more efficient to use lists of character codes rather than packed byte vectors, I am reporting empirical measurements. I don't have to explain it: I'm just saying truthfully that in the cases I have measured lists were faster. There may be Prolog systems which implement lists so badly that this isn't true. I have to admit that I've only really tested concatenation and searching, not accessing single characters. > - when using lists of character codes, each time when > the character code or the next element is accessed, > it has to be dereferenced and its type must be tested On the contrary. A great advantage of using lists is the fact that you *don't* have to do anything to a list element *unless* you want to process that particular element. Contrast Chars0 = [Char1|Chars1] which dereferences-and-type-tests Chars0 (it's one operation) and assigns the two fields of Chars0 to Char1 and Chars1 *without* dereferencing or type testing Char1, with N1 is N0+1, nth_char(N1, String, Char1) where N1 has to be dereferenced-and-type-tested and checked for being in the right range, then the character has to be extracted and *boxed*, and finally assigned to Char1, also we had to do arithmetic on N0/N1. If you don't like the arithmetic (and who could blame you), consider instead the Turbo Prolog equivalent (taken from the first edition of the Turbo Prolog manual, so it's probably wrong). frontchar(String0, Char1, String1) Let's assume a conventional Prolog, rather than Turbo Prolog, which no doubt does something clever. String0 has to be dereferenced and type tested (just like a list cell!) then the first character has to be extracted and *boxed*, and a new string descriptor has to be created and *boxed* and assigned to String1. Assuming that a boxed character costs one cell and a string descriptor costs two cells (not unreasonable), walking down a string of N characters using frontchar/3 ends up using 3N cells for the characters and string tails *in addition* to the packed byte vector we started with, whereas the list version would have used 2N cells total. > - copying a list is certainly slower than copying a string; > for strings you might need a fixed size buffer which > can complicate the things a little bit of you want > strings of arbitrary length, but it is easier to get > another buffer than to get another global stack Why would you copy either? If you have a variable X which is bound to a string or list of character codes, and you want a variable Y to have the same value, just do Y = X. I don't understand "get another global stack". If you're worried about space, consider the fact that with the usual representations of lists and byte vectors, when you append something in front of a list you can share the old tail, but with a byte vector you have to make a copy. That is, with Chars1 a known list of N characters and Chars0 unknown, Chars0 = [Char1|Chars1] costs 2 cells and 1 time unit, but with String1 a known string of N characters and String0 unknown, frontchar(String0, Char1, String1) costs N+1 bytes + 2 cells for the string descriptor and N+k time units. (The version using nth_char/2 would turn over 2N cells.) You can use 1 cell for a string descriptor rather than 2, but then the cost of taking substrings goes way up. > - a list item needs at least 8 bytes, a character only one > (plus some fixed amount for each string, e.g. length, end or others). > If your string is longer than "aa", you need much more space > to handle it as a list, consequently more time due to swapping I have answered this above. You might think about the fact that the list representation will handle 16-bit characters (can you say "Kanji"? I knew you could) at no extra cost, but the alleged space advantage of byte vectors is chopped in half. > - garbage collection is easier for atoms than for strings > in the sense that there is only one reference to the name > of the atom, namely from the symbol table itself, but to > gather all accessible atoms one has to scan the stacks > and the program code anyway (unless you want to trail them > which is really not faster) This is an implementation detail. There is no reason why a Prolog system might have more pointers to an atom name than one. In order to find all accessible strings you have to scan the stacks too. > we still think that having strings is useful. Yes, but WHAT FOR? We've seen above that the time and space costs can go either way. Here's a specific example: assuming that your Prolog system supports the ISO 8859/1 character set, and that it does *not* reflect dpANS C's setlocale() and strcollate() functions as Prolog string operations, write a predicate which will take two text objects representing German-language book titles, and compare them using whatever the convention is in German libraries. To make it easier, we'll assume that there are no punctuation marks and that all numbers are spelled out. Do this entirely in Prolog, using only the operations you already have. (If you can stand it, try using the operations in the BSI draft.) make two versions of the predicate: one which uses byte vectors and one which uses lists, measure their space and time use, and show us the code, the test data, and the results. Those who do not understand Prolog are condemned to reinvent Lisp, poorly. ------------------------------ Date: 16 Mar 88 09:57:19 GMT From: quintus!ok@unix.sri.com (Richard A. O'Keefe) Subject: Strings In article <5349@utah-cs.UUCP>, shebs%defun.utah.edu.uucp@utah-cs.UUCP (Stanley T. Shebs) writes: > In article <776@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes: > > >If you need random access into some collection, a vector-like structure > >is going to be a really good idea. The point is that most uses of STRINGS > >just aren't like that. > > In other words, if I use strings in the ways for which lists are a good > representation, fine, and if I don't, then my program is going to have > really shabby performance. > The point is that there is no one implementation of the abstract data type "sequence of character" which is best for all purposes, which is hardly a surprise. My claim is that the implementation which is best for most of the tasks people claim strings are good for is lists, not packed vectors of bytes. Is having only one representation available a good idea? Of course not. Does that mean that packed vector of byte should be provided? Not necessarily. A bearable-all-round implementation of logical arrays might be a better idea. > But surely you're not suggesting that Prologs provide "buffers"? No, of course not. But NOT supporting abstraction and sticking as close to the machine as possible is how C gets the speed it does. Specifically, C can handle text fast precisely because it HASN'T got a string data type. The point was that you can't use C or Fortran or ADA as a stick to beat Prolog with while shouting how wonderful *strings* are. If the goal is, as Shebs suggested in an earlier posting, to have no dynamic allocation costs, show me how to achieve that in an implementation of Lisp or Icon, and then we'll talk about achieving zero dynamic allocation in Prolog. Don't tell me that strings are so wonderful &c &c that Prolog should have them and use C as an example to support your case, it doesn't. Anybody who thinks he can find in any of my postings an argument that data abstraction is bad or that lists are a universal cure had better take a course in reading English. The claims I have made about strings in Prolog are simply that For most of the things I have seen strings used for in Prolog, the packed vector of byte representation is clumsy to use or slower than the list representation and usually both, and the use of string operations can often be avoided entirely by the use of proper abstractions that express what you really want. As an example of the latter, if you want to represent arithmetic expressions, it is clearer, faster, and a better use of abstraction, to use trees (such as a+b) rather than strings (such as $a+b$), and if you use data abstraction, only the module that defines the operations will know which representation you are using. ------------------------------ Date: 15 Mar 88 02:35:02 GMT From: defun.utah.edu!shebs@cs.utah.edu (Stanley T. Shebs) Subject: Strings In article <768@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes: >When I say that it is more efficient to use lists of character codes >rather than packed byte vectors, I am reporting empirical measurements. >I don't have to explain it: I'm just saying truthfully that in the >cases I have measured lists were faster. There may be Prolog systems >which implement lists so badly that this isn't true. I have to admit >that I've only really tested concatenation and searching, not accessing >single characters. This is a very interesting assertion, because it has a lot of implications for *any* vector/array-type representation. The broadest interpretation is that any and all vector-like types are bad. This would also extend to types that one might not think of as vector-like, such as infinite-precision integers (bignums). So, if Prolog strings are better as lists, is that also true for Prolog bignums? The reason I wonder is that Lisp has not used lists to represent bignums since the mid-60s, and (for instance) JonL White's bignum paper in the 86 Lisp conference assumes a vector-like representation. Off the top of my head, it seems that the Prolog evaluation model is the ultimate reason for the advantage of lists. If not, O'Keefe's empirical measurements have some consequences reaching far beyond contemporary Prolog, and Lispers should start rethinking their implementation techniques... Incidentally, David Wise has made similar claims for the advantages of list representation in functional languages, but (if I recall correctly) he also assumes binary trees and some sort of hardware memory support. > frontchar(String0, Char1, String1) >Let's assume a conventional Prolog, rather than Turbo Prolog, which no >doubt does something clever. String0 has to be dereferenced and type >tested (just like a list cell!) then the first character has to be >extracted and *boxed*, and a new string descriptor has to be created >and *boxed* and assigned to String1. Assuming that a boxed character >costs one cell and a string descriptor costs two cells (not unreasonable), >walking down a string of N characters using frontchar/3 ends up using >3N cells for the characters and string tails *in addition* to the >packed byte vector we started with, whereas the list version would have >used 2N cells total. Boxing characters is just as bad as boxing small integers, and just as unnecessary. Avoiding the creation of a new string descriptor is harder, and requires compiler analysis, but it is possible under many circumstances. I think the real underlying motivation for strings is that languages like C can do string processing with essentially zero space and time overhead, and one would like to have Prolog to have the same characteristics. It's not so much an issue of doing one operation on German titles, but of passing a million characters of text through a program without allocating so much as one byte of storage dynamically. If a Prolog implementor could guarantee that, then no one would care whether there was a separate string datatype or not, and the implementor could make a lot of money! -- stan shebs ------------------------------ Date: 17 Mar 88 16:03:04 GMT From: mcvax!enea!ttds!draken!kth!sics!lhe@uunet.uu.net (Lars-Henrik Eriksson) Subject: Strings In article <776@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes: > ISBN 3-211-81776-X (can anyone tell me what it means > ISBN 0-387-81776-X for a book to have 2 ISBNs?) It means it is published at two places at once. For instance, in my copy of the ALGOL 68 report, you can find the lines: ISBN 3-540-07545-3 Springer-Verlag Berlin * Heidelberg * New York ISBN 0-387-07545-3 Springer-Verlag New York * Heidelberg * Berlin I.e. the first one was printed in Germany, the second one in the U.S. Note that only the publisher parts of the numbers differ. -- Lars-Henrik Eriksson ------------------------------ Date: 11 Mar 88 16:40:19 GMT From: defun.utah.edu!shebs@cs.utah.edu (Stanley T. Shebs) Subject: character type (BSI standards) In article <756@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes: What he says. O'Keefe makes some strong points about the complexities of case conversion in various languages. I suppose case conversion on individual characters is too prevalent to drop it in favor of "word" or "sentence" case conversion! I haven't done any significant text processing in CL, so can't comment on the "correct" practice. >I think the Common Lisp character abstractions aren't quite right either. >Trying to treat control-super-hyper-X as a single character is not quite >right. (How many versions of Common Lisp have (> (char-bits-limit 1)) ?) The real reason for having "char-bits" in CL has more to do with a certain Lisp company than with sound technical reasons. Thus the "fancy" characters are not required to be storable into strings, which limits their usefulness! Still, most commercial CL impls *do* have (> char-bits-limit 1), but the main reason seems to be that there are usually about 24 bits available in the standard representations, but only 7 are actually needed for a code, so there's nothing to lose by saying that some of the remaining bits are "char-bits". All pretty sad, actually... >If characters are represented by integers, then it is straightforward >to program up missing operations. If characters are a separate data type, >but that data type is missing many of its "natural" operations, then you >wind up with murky code changing types all over the place. A separate data type with conversion functions doesn't imply murky code, if the missing operations have been written to keep all the murkiness to themselves. -- stan shebs ------------------------------- Date: 17 Mar 88 01:48:17 GMT From: munnari!mulga!dnm@uunet.uu.net (David Morley) Subject: BSI standards In article <8803112331.AA09995@decwrl.dec.com> vantreeck@curie.dec.com writes to Richard O'Keefe: >You have a precedence parser for PROLOG with user defined operators that can >run as fast and be as small as a one-pass recursive decent or LALR(1) parser >on a PROLOG without user defined operators? It is not dificult to modify SLR(1) or LALR(1) parser generators so that they accept user definable operators. I am working on a generalisation of LR-style parser generating algorithms which makes this easy. For example, the modified SLR(1) parser generator can be given a DCG such as: term(Prec) --> [fy_op(Prec)], term(P1), { Prec>=P1 }. term(Prec) --> [fx_op(Prec)], term(P1), { Prec>P1 }. term(Prec) --> term(P1), [yf_op(Prec)], { Prec>=P1 }. term(Prec) --> term(P1), [xf_op(Prec)], { Prec>P1 }. term(Prec) --> term(P1), [yfx_op(Prec)], term(P2), {Prec>=P1, Prec>P2}. term(Prec) --> term(P1), [xfy_op(Prec)], term(P2), {Prec>P1, Prec>=P2}. term(Prec) --> term(P1), [xfx_op(Prec)], term(P2), {Prec>P1, Prec>P2}. term(0) --> simple_term. and automatically churn out a deterministic shift-reduce parser for the language (ignoring the fy500 atom yf500 conflicts, etc.). No need to get your hands dirty hacking YACC to shreds. (By the way, if you hate DCG's, just think of a DCG as a finite representation of an infinite BNF grammar). As for a parser for PROLOG with all the bells and whistles, I'm working on it, but don't hold your breath (my interest in this field is improving automatic parser generation so that ambiguity can be resolved in the grammar rather than by hacking the parser generator algorithm ala YACC). <-: By the way, does anyone want user definable brackets? :-> The only reason I haven't said anything about the BSI standards debate before is that ROK says everything I want to say (and more)better than I can. I hope the committee takes note. -- David Morley ------------------------------ Date: 17 Mar 88 12:06:38 GMT From: mcvax!unido!ecrcvax!micha@uunet.uu.net (Micha Meier) Subject: BSI Proposal In article <771@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes: >As Chris Moss says, most languages since Fortran regard blanks as >significant. Surely it is inconsistent to argue that > "RED O" and "REDO" >are different, and that is a Good Thing, but > "red (O)" and "red(O)" >being different is a Bad Thing? Blanks are being treated as >significant in both cases. "RED O" and "red (" are both two tokens, >and "REDO" and "red(" are both one token; it's exactly the same thing. >What's sauce for the goose is sauce for the gander. I'm no fan of the allowed space between the functor and the opening parenthesis, but this does not seem to me as a good argument. In the usual languages blankspaces are allowed between characters of different classes (in the POP meaning of character classes), with some exceptions like e.g. the floating point constants. From this point of view a space between "red" and "(" seems rather natural, to say that "red(" is one token seems to me at least arguable (easy to implement, I admit). But there are other consequences of this BSI rule - is there any reasonable argument to forbid binary operators that start with a capital letter or that contain letters and graphic chars? Actually, we have implemented the BSI standard in our Sepia, (it's probably one of the first implementations) but I have to admit that one of the first things to provide was a switch to be able to switch it off. There is a long list of problems we have encountered, we are going to pass them to BSI soon. I have appreciated to have a minimal list of built-ins provided by BSI, but the description contains so many errors that one has really to take care when implementing them. My global impression from BSI is, I'm afraid, close to that of R.A.O'K, it is a new language, and very difficult to implement. I've never really understood the grammar and the 'Formal Specification of Prolog', though :-) -- Micha ------------------------------ End of PROLOG Digest ********************