bimandre@kulcs.uucp (Andre Marien) (03/16/88)
> copy_chars :-
>     get0(Char),
>     copy_chars(Char).
>
> copy_chars(-1) :- !.
> copy_chars(Char) :-
>     put(Char),
>     copy_chars.

At the benchmark workshop he agitated vividly against the different
behavior of BIM_Prolog in case of end of file.  BIM_Prolog fails when
get0 attempts to read past end of file instead of returning -1.  The
same is true for read.  If you write the same program as above with the
BIM_Prolog convention, this is what it looks like:

    copy_chars :-
        get0(Char), !,
        put(Char),
        copy_chars.
    copy_chars.

The previous code creates a choicepoint for every character which gets
processed.  It can easily be avoided:

    copy_chars :-
        copy_chars_h.
    copy_chars.

    copy_chars_h :-
        get0(Char),
        put(Char),
        copy_chars_h.

Now this looks so obviously better and more readable to me that it
convinces me we made the better choice.

BTW, if you ever want to convert a program with a different
interpretation, the solution is easy:

    /*QP*/ read(X) :- /*bim*/ read(X), !.
    /*QP*/ read(whatever_is_used_to_indicate_end_of_file).

Of course, as you can verify above, a different coding may very well
produce a better program.

Andre' Marien    B.I.M. Belgium       bimandre@kulcs
Bart Demoen      K.U.Leuven Belgium   bimbart@kulcs
ok@quintus.UUCP (Richard A. O'Keefe) (03/17/88)
Article <1197@kulcs.kulcs.uucp>, by bimandre@kulcs.uucp (Andre Marien)
[signed by Andre Marien and Bart Demoen both] arrived mangled at our
site.  What we got started like this:

> > copy_chars :-
> >     get0(Char),
> >     copy_chars(Char).
> >
> > copy_chars(-1) :- !.
> > copy_chars(Char) :-
> >     put(Char),
> >     copy_chars.
>
> At the benchmark workshop he agitated vividly against the different
> behavior of BIM_Prolog in case of end of file.
> BIM_Prolog fails when get0 attempts to read past end of file instead
> of returning -1.  The same is true for read.
> If you write the same program as above with the BIM_Prolog
> convention, this is what it looks like:
>
>     copy_chars :-
>         get0(Char), !,
>         put(Char),
>         copy_chars.
>     copy_chars.
>
> Now this looks so obviously better and more readable to me,
> that it convinces me we made the better choice.

It would be inaccurate to say that I have ever "agitated ... against
... BIM_Prolog".  It would, however, be accurate to say that I had some
rather harsh things to say about the quondam intention of the BSI
Prolog group to make get0/1 behave in this way.  I don't know what the
BSI group currently intend.

I should apologise for not having taken greater care with the
presentation of copy_chars/0; but then in the context of my original
message it was part of a joke.  Here's how I'd write copy_chars/0 for
real:

    copy_chars :-
        get0(Char),
        ( is_endfile(Char) -> true
        ; put(Char),
          copy_chars
        ).

This is only superficially different from C's

    while ((Char = getchar()) != EOF) putchar(Char);

Which version is "so obviously better and more readable"?

I'm not going to try to answer that question, because it is entirely
the wrong question.  An example this small can be coped with even if it
is badly written.  The right question is: are there any objective
reasons for preferring one approach to the other?
Let's face it, a very important consideration for Quintus is that we
take money from people in return for something we claim to be an
Edinburgh-compatible Prolog.  So we have three choices:

    1.  have get0/1 return a special code (26 as in DEC-10 Prolog, or
        -1 as in C-Prolog; we picked the latter) and have read/1
        return a special code, because that's what real Edinburgh
        Prologs do, or
    2.  stop claiming to be Edinburgh compatible, or
    3.  deliberately lie to our customers.

I think it's clear that number 1 is a defensible choice.  So that's a
reason why a Prolog vendor who claims to provide an Edinburgh-compatible
Prolog should do what we do.  But is there any reason why this is a
*good* thing to do in itself?

Yes, there is.  With the possible exception of
applying-a-function-to-some-arguments, an operation on its own isn't
much good.  You need a method for using it, and it has to fit well with
the other operations available.  In particular, the question of how we
read a single character isn't all that interesting: what's interesting
is "how do we write a program that reads a lot of characters".  For
example, how do you write a tokeniser for Prolog or Pascal or whatever
in Prolog?

Where would you look for advice about how to write programs that do
input?  Where but in a book about compilers.  And a good book on that
topic is the "Dragon" book:

    Compilers: Principles, Techniques, and Tools
    Aho, Sethi, & Ullman
    Addison-Wesley, 1986
    ISBN 0-201-10088-6

To learn how to write a tokeniser, we might look at 'lexer.c' in
section 2.9 (page 74).  You will note that this program relies on the
fact that C streams end with an "EOF" end-marker.  Perhaps more
seriously, the whole "transition diagram"/"finite state automaton"
approach described in chapter 3 relies on "end of file" being a source
character like any other.  (See fig 3.22, for example.)  Indeed,
end-markers are so much taken for granted in parsing that "$" crops up
in chapter 4 without much introduction.
We can convert a deterministic finite-state automaton to Edinburgh
Prolog with very little effort.  We represent a state of the automaton
by a predicate with one argument: the next character.  We represent an
arc of the automaton by a clause.  For example, the arcs

    s1: a -> s2.
    s1: b -> s1.
    s1: $ -> accept.

would be coded like this:

    s1(0'a) :- get0(Next), s2(Next).
    s1(0'b) :- get0(Next), s1(Next).
    s1(-1) :- true.

The correspondence is exact.  I think this is an important practical
point.  I offer the following anecdote as evidence: a year after
learning Prolog I decided that I wanted to write a Prolog system of my
own, but couldn't figure out how to write the tokeniser, so I abandoned
the entire project.  Two years later, I was talking to someone and
suggested this approach as a method for writing programs that use
get0/1, and suddenly it dawned on me that it would work.  One evening
of coding later, I had a DEC-10 Prolog tokeniser written in DEC-10
Prolog.  If you need some lookahead, you simply add extra arguments to
a few states to carry the looked-ahead characters.

What would it do to us if get0/1 failed at the end of a file?
Unpleasant things: the test for end of file has to be pushed up into
the states *before* the state which expects end of file.  Every arc of
the form

    sN(X) :- get0(Next), s1(Next).

has to be rewritten as

    sN(X) :- bad_get0(Next), !, s1(Next).
    sN(X) :- s1_eof.

and s1 has to be written as

    s1(0'a) :- bad_get0(Next), s2(Next).
    s1(0'b) :- bad_get0(Next), !, s1(Next).
    s1(0'b) :- /* EOF */ s1_eof.

    s1_eof :- true.

The cleanest way to avoid this mess is to use a get0/1 which returns an
end-marker, and if your vendor won't provide you with one, you'll have
to write one yourself.

+------------------------------------------------------------------------+
| An important reason for get0/1 returning an end-marker at the end      |
| of a stream is that this forms part of a practice of writing           |
| character-reading code.                                                |
+------------------------------------------------------------------------+

Note that this practice existed prior to Prolog: we didn't have to
figure out anything new.

Is there any other reason for preferring the Edinburgh Prolog version
of get0/1?  Yes.  Suppose I have a goal

    bad_get0(X)

and it fails.  Does that mean that the end of the stream has been
reached?  ***NO***.  It means that *EITHER* the end of the stream has
been reached *OR* a character has been read which didn't happen to
unify with X.  Is there anything we can do afterwards to find out which
was the case?  No.  Assuming, for the sake of argument, a predicate
at_eof/0 which succeeds when we are at the end of the current input
stream,

    ...
    ( bad_get0(X) -> /* character read */
    ; at_eof      -> /* now at end of stream */
    ; /* otherwise it was unification failure */
    )
    ...

isn't quite right.  If there was precisely one character left in the
input stream, bad_get0(X) will consume it, and if the unification
fails, at_eof will *now* see the end of the file.  The problem is that
bad_get0/1 has a side-effect (consuming one character from the current
input stream) which it SOMETIMES does and sometimes doesn't do.  It's
similar to playing Russian roulette, except that the gun is pointed at
the foot rather than the head.

It is interesting that Marien and Demoen fell into exactly this trap.
They say:

> BTW, if you ever want to convert a program with a different
> interpretation, the solution is easy :
>
>     /*QP*/ read(X) :- /*bim*/ read(X), !.
>     /*QP*/ read(whatever_is_used_to_indicate_end_of_file).

It may be easy, but it isn't a solution.  Suppose we write this:

    buggy_read(Term) :- bim_read(Term), !.
    buggy_read(end_of_file).

Now, suppose the current input stream contains

    fred.

and we call

    buggy_read(end_of_file).

IT WILL SUCCEED!  It should have failed.

Now, have I shown that the BIM approach is bad?  No.
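Under the fail-at-end convention, the only safe discipline seems to be
to call the reading predicate with a fresh, unbound variable and branch
on its success.  As a sketch (bad_get0/1 stands for the fail-at-end
primitive discussed above):

```prolog
% Always give bad_get0/1 an unbound variable, so that its failure
% can only mean end of stream, never a unification mismatch.
copy_chars :-
    ( bad_get0(Char) ->
        put(Char),
        copy_chars
    ; true                      % genuine end of stream
    ).
```

Nothing enforces this discipline, though: the moment someone writes
bad_get0 with a partially instantiated argument, the ambiguity above
returns.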
What I have shown is that the end-marker approach which Quintus Prolog
inherited from C Prolog (which got it from DEC-10 Prolog, which I think
got it from Pop-2) is accompanied by a straightforward discipline for
using it to write tokenisers.  (In fact, it's exactly the same approach
you use in C.)  I am not aware of a similar methodology for the
end-failure approach, despite having asked Bart Demoen at the Prolog
Benchmarking Workshop for enlightenment on this point.  Again, this
doesn't mean that there isn't any such methodology, only that I don't
know what it might be.  If there is a straightforward way of turning
deterministic transition diagrams into end-failure code, I would be
pleased to be instructed in it.
ok@quintus.UUCP (Richard A. O'Keefe) (03/19/88)
In article <783@cresswell.quintus.UUCP>, I replied to article
<1197@kulcs.kulcs.uucp>, by bimandre@kulcs.uucp (Andre Marien)
[signed by Andre Marien and Bart Demoen both].
The topic was what read/1 and get0/1 should do at the end of a stream.
I thought you might be interested to know what the BSI committee say.
In document PS/236, "Draft minutes, Prolog Built-In Predicates meeting,
10 December 1987", we find
4 Design criterion
<name deleted> suggested: "Whenever possible, a predicate with
a side effect should always succeed and never instantiate
variables."
This of course rules get0/1 and read/1 out entirely. That may not be
what <name deleted> _meant_, but it _is_ what the minutes say he _said_.
As far as I can tell, the real intent is to rule out retract/1, which
is disliked because it unifies its argument with the thing you removed.
The minutes show that Paul Chung proposed naming the standard clause-
removing predicate delete/1 instead of retract/1. Good on yer, mate!
This should not be construed as endorsement of the name delete/1, but
as praise for Paul Chung's good standardisation manners.
micha@ecrcvax.UUCP (Micha Meier) (03/22/88)
In article <783@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A.
O'Keefe) writes:

> Here's how I'd write copy_chars/0 for real:
>
>     copy_chars :-
>         get0(Char),
>         ( is_endfile(Char) -> true
>         ; put(Char),
>           copy_chars
>         ).
>
> This is only superficially different from C's
>
>     while ((Char = getchar()) != EOF) putchar(Char);
>
> Which version is "so obviously better and more readable"?
>
> I'm not going to try to answer that question, because it is entirely
> the wrong question.  An example this small can be coped with even if
> it is badly written.

This is true; we have to distinguish various uses of get0/1.  The above
example is indeed more easily written when get0/1 fails at the eof,
because the is_endfile/1 test is not needed.  However, most often one
wants to do more with the character than just test for the eof, and
only then are the differences meaningful.

By the way, get0/1 does *not* exist in BSI; it uses get_char/1 instead,
and its argument is a character, i.e. a string of length 1.  This means
that the type 'character' is inferred from the type 'string' (and not
the other way round as in C).  Does anybody out there know what
advantages this can bring?  It is independent of the character <->
integer encoding, but this only because explicit conversion predicates
have to be called all the time.

> We can convert a deterministic finite-state automaton to Edinburgh
> Prolog with very little effort.  We represent a state of the
> automaton by a predicate with one argument: the next character.  We
> represent an arc of the automaton by a clause.  For example, the arcs
>
>     s1: a -> s2.
>     s1: b -> s1.
>     s1: $ -> accept.
>
> would be coded like this:
>
>     s1(0'a) :- get0(Next), s2(Next).
>     s1(0'b) :- get0(Next), s1(Next).
>     s1(-1) :- true.

In his tutorial at the SLP '87 Richard used another representation of a
finite automaton which is more appropriate:

    s1 :-
        get0(Char),
        s1(Char).

    s1(0'a) :- s2.
    s1(0'b) :- s1.
    s1(-1) :- accept.
The difference is that if one wants to perform some action in some
states, this must be done *before* reading the next character, i.e.
just at the beginning of s1/0.  Such a representation can be more
easily converted to the BSI's variant of get:

    s1 :-
        % do the corresponding action
        ( get0(Char) -> s1(Char)
        ; accept
        ).

    s1(0'a) :- s2.
    s1(0'b) :- s1.

Note that the eof arc has to be merged into s1/0 in this way, since if
we wrote it like

    s1 :- s1_action, get0(Char), !, s1(Char).
    s1 :- accept.

then after an eof we would backtrack over s1_action and undo what we've
done.

I must say, neither of the two seems satisfactory to me.  Richard's
version is not portable, due to the -1 as eof character.  We can
improve this into

    s1(X) :- eof(X), accept.
    s1(0'a) :- s2.
    s1(0'b) :- s1.

and hope that the compiler will unfold eof/1 inside the indexing
mechanism; otherwise we have choice points even if the code is
deterministic.  The BSI version is much more arguable, though.  Having
to wrap a disjunction (and a choice point) around the get0/1 call
suggests that for this application the BSI choice is not the
appropriate one.  It is interesting to note, however, that it could
work even with nondeterministic automata, where the BSI's failure was
(I thought) more likely to cause problems.

> > BTW, if you ever want to convert a program with a different
> > interpretation, the solution is easy :
> >
> >     /*QP*/ read(X) :- /*bim*/ read(X), !.
> >     /*QP*/ read(whatever_is_used_to_indicate_end_of_file).
>
> It may be easy, but it isn't a solution.  Suppose we write this:
>
>     buggy_read(Term) :- bim_read(Term), !.
>     buggy_read(end_of_file).
>
> Now, suppose the current input stream contains
>     fred.
> and we call
>     buggy_read(end_of_file).
> IT WILL SUCCEED!  It should have failed.

Since the Edinburgh get0/1 can easily simulate the BSI one with

    get0_BSI(Char) :- get0_Edinburgh(Char), not_eof(Char).
but, as Richard has shown, not vice versa, it is clear that for a
Prolog system it is better to have get0/1 return some *portable* eof
(e.g. the atom end_of_file; for get0/1 there can be no confusion with
source items) instead of some integer.  This, however, just shifts the
problem up to read/1: BSI objects that if it returns e.g. the atom
end_of_file, then any occurrence of this atom in the source file could
not be distinguished from a real end of file.  In this case, a remedy
would be the introduction of a term with a local scope (e.g. valid only
in the module where read/1 and eof/1 are defined) and using eof/1
instead of unifying the argument of read/1 with the end_of_file term.
Hence read/1 would return this term on encountering the file end, and
eof/1 would check whether its argument is this term.

--Micha
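The locally-scoped marker might be sketched like this (a hypothetical
sketch only: the module name safe_io, the marker '$end_of_file', and
the underlying predicate bsi_read/1 are all invented for illustration,
and the scheme assumes a module system that can actually keep the
marker term private, which a plain quoted atom cannot guarantee):

```prolog
:- module(safe_io, [my_read/1, eof/1]).

% The marker is known only inside this module; client code tests
% for end of file with eof/1 rather than naming the term itself.
eof('$end_of_file').

% my_read/1: yield the term read, or the private marker when the
% underlying fail-at-end read fails.
my_read(Term) :-
    ( bsi_read(Term0) -> Term = Term0
    ; eof(Term)
    ).
```

Client code then writes `my_read(T), ( eof(T) -> ... ; ... )` and never
mentions the marker, which is exactly the discipline Micha proposes.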
bruno@ecrcvax.UUCP (Bruno Poterie) (03/23/88)
I do not think that having read/1 return an atom on EOF is a bad
thing.  If you take as an example certain UN*X tools, they read their
input from a file (typically stdin) until finding a line composed of a
single dot.  So it is perfectly legal to submit a file which contains a
dot-line in the middle if you want only the first part to be fed to the
tool.  Same thing for Prolog: if you have a huge file full of various
facts but want only, say, the first 100 to be used as input in your
test program, then simply add the EOF mark before line 101.

I would then prefer to have it as a directive:

    ...
    :- eof.

so that it is not a bare fact but actually a command to the consulting
tool that it is to treat this input source as ended.  It is then
coherent with other directives like:

    ...
    :- [file3,file4].
    ...

which actually order the consulting tool to insert at this point the
content of the named files.

I believe that the notation "eof" is quite standard in the UN*X system
and already in some Prologs, including as a test predicate for this
very term:

    ... read(Term), eof(Term) -> ...

so I think we could maybe abandon the end_of_file notation of Quintus
(sorry for you Richard, a compatibility switch could very easily turn
it back anyway), but it is not an important point, as the aim would be
to discipline one's programming style by systematically using the test
form:

    eof(Term)

and never writing the EOF term itself explicitly.  Portability is
great.

Now for get0/1 [or get_char/1/2]: having it return an otherwise
impossible value at EOF, say an atom as suggested by Micha, is ok given
that the normally returned thing is an integer representing the ascii
code [or ebcdic] of the character.  Using the same term as the one
returned by read/1, and consequently the same test predicate as the
only mechanism to check for EOF, would greatly improve the compactness
and consistency of the i/o system.
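A consult loop honouring such a directive might look like this (a
sketch; the names consult_loop/0 and process/1 are invented for
illustration):

```prolog
% Read terms until a physical end of file or a ':- eof.' directive,
% whichever comes first.
consult_loop :-
    read(Term),
    ( Term == end_of_file -> true    % physical end of file
    ; Term == (:- eof)    -> true    % logical end mark, as proposed
    ; process(Term),                 % hypothetical: assert/compile it
      consult_loop
    ).
```

Because `:-` is a standard prefix operator, `:- eof.` is read as an
ordinary term and needs no special support from read/1 itself.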
As a side effect, close/1 is not strictly necessary any more, as the
following sequence does the job:

    eof(EOF), put(EOF)

because obviously put/1 must handle the same term in the same way (I am
afraid that outputting CHAR modulo 256 would not work in this case).

I nevertheless believe that EOF == -1 is a clearer convention,
returning an object of the same type but out of the normal range of
normal input, and it is already the UN*X convention.  It would not
force put/1 to accept it as an EOF character, as it would be output as
-1 modulo 256 (or 512) == 255.  Passing -1 to UN*X putchar() does not
generate an EOF!

Ok, enough delirium tremens for today.  My main point is: the character
i/o should provide a very low level facility, with no hypothesis about
the use which could be made of it.  Using read(Term) and eof(Term)
provides a uniform, simple, elegant and portable means of performing
i/o at the Prolog term level.  Using get0/1 implies you are interested
in the real bits contained in your input support, so you want to
control it at a low level.  Returning the -1 value is portable and
low-level, because it is independent of ascii or any other character
set.  Alternatively, returning eof and using the same eof(Char) test
predicate would again be low-level, portable, and free of any supposed
semantics.  More important, most Prolog input loops may be adapted to
this scheme at low cost.  Failing at EOF, however, would mean fully
rewriting those applications and system libraries.

================================================================================
 Bruno Poterie        # ... une vie, c'est bien peu, compare' a un chat ...
 ECRC GmbH            # tel: (49)89/92699-161
 Arabellastrasse 17   # Tx: 5 216 910
 D-8000 MUNICH 81     # mcvax!unido!ecrcvax!bruno
 West Germany         # bruno%ecrcvax.UUCP@Germany.CSNET
================================================================================
ok@quintus.UUCP (Richard A. O'Keefe) (03/23/88)
In article <518@ecrcvax.UUCP>, micha@ecrcvax.UUCP (Micha Meier) writes:

> By the way, get0/1 does *not* exist in BSI; it uses get_char/1
> instead, and its argument is a character, i.e. a string of length 1.
> This means that the type 'character' is inferred from the type
> 'string' (and not the other way round as in C).  Does anybody out
> there know what advantages this can bring?  It is independent of the
> character <-> integer encoding, but this only because explicit
> conversion predicates have to be called all the time.

I find it extremely odd to call a string of length one a character.
It's like calling a list of integers which contains one element an
integer.  Do we call an array with one element a scalar?

I haven't commented on the BSI's get_char/1 before because for once
they have given a new operation a new name.  There are two problems
with it.  A minor problem is that the result being a string, they
can't represent end of file with an additional character, so the
fail-at-end approach is hard to avoid.  (Not impossible.)

There is an efficiency problem: something which returns an integer or
a character constant can just return a single tagged item, but
something which returns a string either has to construct a new string
every time, or else cache the strings somehow.  For example, Interlisp
has a function which returns you the next character in the current
input stream, represented as an atom with one character in its name.
(Well, almost: characters `0`..`9` are represented by integers 0..9.)
This was quite attractive on a DEC-20, where you could just compute a
table of 128 atoms once and for all.  It wasn't too bad on VAXen
either, where the table had to have 256 elements.  But it became
rather more clumsy on the D machines, which have a 16-bit character
set.  (Can you say "Kanji"?  I knew you could.)  So the alternatives I
can see at the moment are

    o	construct a new string every time.
    o	precompute 2^16 strings.
    o	cache 2^8 strings, and construct a new string every time for
	Kanji and other non-Latin alphabets.
    o	not support Kanji or other non-Latin alphabets at all.

(Can you say "Cyrillic"?  How about "Devanagari"?  You may need the
assistance of a good dictionary; I used to mispronounce "Devanagari",
and probably still do.)

I wrote that

> > For example, the arcs
> >
> >     s1: a -> s2.
> >     s1: b -> s1.
> >     s1: $ -> accept.
> >
> > would be coded like this:
> >
> >     s1(0'a) :- get0(Next), s2(Next).
> >     s1(0'b) :- get0(Next), s1(Next).
> >     s1(-1) :- true.

Meier says that

> In his tutorial at the SLP '87 Richard used another representation
> of a finite automaton which is more appropriate:
>
>     s1 :-
>         get0(Char),
>         s1(Char).
>
>     s1(0'a) :- s2.
>     s1(0'b) :- s1.
>     s1(-1) :- accept.

There wasn't time to go into this in detail in the tutorial, but it
should be obvious that the first approach is more general: in
particular it can handle transitions where (perhaps because of
context) no input is consumed, and it can handle lookahead.

> Such a representation can be more easily converted to the BSI's
> variant of get:
>
>     s1 :-
>         % do the corresponding action
>         ( get0(Char) -> s1(Char)
>         ; accept
>         ).

This doesn't generalise as well as the end-marker version.  Here is
the kind of thing one is constantly doing:

    rest_identifier(Char, [Char|Chars], After) :-
        is_csymf(Char),
        !,
        get0(Next),
        rest_identifier(Next, Chars, After).
    rest_identifier(After, [], After).

See how this code can treat the end marker just like any other
character: because it doesn't pass the is_csymf/1 test (copied from
Harbison & Steele, by the way) we'll pick the second clause, and there
is no special case needed for an identifier which happens to be at the
end of a stream.  The fail-at-end approach forces us to do something
special with the get0/1 not only in rest_identifier/3, but in
everything that calls it.  (In the Prolog tokeniser, there are two
such callers.)
The point is that if-then-elses such as Meier suggests start appearing
all over the place like maggots in a corpse if you adopt the
fail-at-end approach, to the point of obscuring the underlying
automaton.

> I must say, neither of the two seems satisfactory to me.  Richard's
> version is not portable, due to the -1 as eof character.

If the standard were to rule that -1 was the end of file character, it
would be precisely as portable as anything else in the standard!  In
strict point of fact, the Prolog-in-Prolog tokeniser was written in
DEC-10 Prolog for DEC-10 Prolog, and used 26 as the end of file
character, and 31 as the end of line character.  It took 5 minutes
with an editor to adapt it to Quintus Prolog.  I wish C programs
written for UNIX took this little effort to port!

> for a Prolog system it is better to have get0/1 return some
> *portable* eof (e.g. the atom end_of_file; for get0/1 there can be
> no confusion with source items) instead of some integer.

It is important that the end-of-file marker, whatever it is, should be
the same kind of thing, in some sense, as the normal values, so that
classification tests such as is_lower/1, is_digit/1, and so on will
just fail quietly for the end-of-file marker, not report errors.
Since end of file is rare, we would like to test the other cases
first.  Pop-2 on the DEC-10 returned integers almost all the time,
except that at the end of a stream you got an end-of-file object which
belonged to another data type (there was only one element of that data
type, and it printed as ^Z).  This was in practice a major nuisance,
because before you could do anything other than an equality test with
the result, you had to check whether it was the end of file mark.

I have been giving out copies of the Prolog-in-Prolog tokeniser to
show how easy it is to program character input with the Edinburgh
Prolog approach.
If someone would give me a tokeniser for BSI Prolog written entirely
in BSI Prolog using the fail-at-end approach, and if that tokeniser
were about as readable as the Prolog-in-Prolog one, that would go a
long way towards convincing me that fail-at-end was a good idea.

> BSI objects that if [read/1] returns e.g. the atom end_of_file, then
> any occurrence of this atom in the source file could not be
> distinguished from a real end of file.

That's not a bug, it's a feature!  I'm serious about that.  At
Edinburgh, I had the problem that if someone asked me for help with
Prolog, they might be using one of four different operating systems,
where the end of file key might be ^Z or ^D or ^Y or something else
which I have been glad to forget.  No problem.  I could always type

    end_of_file.

to a Prolog listener, and it would go away.  Oh, this was so nice!  In
fact, on my SUN right now I have function key F5 bound to
"end_of_file.\n" so that I can get out of Prolog without running the
risk of typing too many of them and logging out.

Another thing it is useful for is leaving test data in a source file.
One can do

    <declarations>
    <clauses>
    end_of_file.
    <test cases>

and include the test cases in the program or not just by moving the
end_of_file around.  Ah, you'll say, but that's what nested comments
are for!  Well no, they don't work.  That's right, "#| ... |#" is NOT
a reliable way of commenting code out in Common Lisp, and "/* ... */"
is NOT a reliable way of commenting code out in PopLog.  But
end_of_file, in Edinburgh Prolog, IS a reliable way of commenting out
the rest of the file.

> In this case, a remedy would be the introduction of

Prolog needs a remedy for end_of_file like Elisabeth Schwarzkopf needs
a remedy for her voice.  Before taking end_of_file away from me, the
BSI committee should supply me with a portable way of exiting a break
level and a reliable method of leaving test cases in a file without
having them always read.
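As a concrete instance of the layout described above (a made-up file;
the predicates edge/2 and test/0 are invented for illustration), a
consulted file might read:

```prolog
% Everything after the end_of_file term is invisible to consult.
edge(a, b).
edge(b, c).

end_of_file.

test :- edge(a, X), write(X), nl.
```

Moving the `end_of_file.` line below `test/0` brings the test clause
back into the program, with no change to anything else in the file.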
ok@quintus.UUCP (Richard A. O'Keefe) (03/24/88)
In article <519@ecrcvax.UUCP>, bruno@ecrcvax.UUCP (Bruno Poterie)
writes:

> I believe that the notation "eof" is quite standard in the UN*X
> system and already in some Prologs

I just grepped through the UNIX [UNIX is a trademark of AT&T] manuals,
and all I could find was the function feof(Stream).  None of the UNIX
utilities I am familiar with uses "eof" to signify end of file.

Franz Lisp does something interesting:

    (ratom [Port [Eof]])
    (read  [Port [Eof]])
    (readc [Port [Eof]])

return the Eof argument (which defaults to nil) when you read the end
of the file, so you can get whatever takes your fancy.

> so I think we could maybe abandon the end_of_file notation of
> Quintus (sorry for you Richard, a compatibility switch could very
> easily turn it back anyway),

But it ***ISN'T*** a Quintus notation!  This is the notation used by

    DEC-10 Prolog
    EMAS Prolog
    C Prolog
    Quintus Prolog
    Stony Brook Prolog
    ALS Prolog
    Expert Systems International Prolog-2
    AAIS Prolog (in "Edinburgh" mode only)

and doubtless many others.  end_of_file IS the "de facto" standard.
Poterie's suggestions are good ones, but in order to overthrow the de
facto standard, they would have to be MUCH MUCH better, and they
aren't.

> but it is not an important point, as the aim would be to discipline
> one's programming style by systematically using the test form:
>     eof(Term)
> and never writing the EOF term itself explicitly.  Portability is
> great.

Beware.  While Quintus Prolog offers the library predicate

    is_endfile(?EndMarker)

there are other Prolog systems, such as AAIS Prolog, where there is a
predicate with a similar name which takes a Stream argument:

    is_eof(+Stream)

in AAIS Prolog means "is it the case that Stream is positioned at its
end?".  Yes, portability is great, but would it not be more just to
reward those people (such as SICS, Saumya Debray, ALS, and others) who
have tried to provide it, by standardising their solution?
> As a side effect, close/1 is not strictly necessary any more, as the
> following sequence does the job:
>     eof(EOF), put(EOF)

Um, what about INPUT streams?  And there is another reason for wanting
close/1: it will close a stream which is not the current output
stream.
ok@quintus.UUCP (Richard A. O'Keefe) (03/24/88)
Just to continue with the get0 topic: the fail-at-end approach rests
on an assumption which deserves to be made explicit, because it is
false.

What is the assumption?  That receiving the end-of-file indication
from an operating system indicates that there is nothing further to
read in that stream.

This is false?  Yes.  Let's ignore 4.2BSD sockets, V.3 Streams,
non-blocking I/O, VMS concatenated files, and other esoterica which
one doesn't expect BSI Prolog to cope with.  Let's just consider
terminals.

In Tops-10 (home of DEC-10 Prolog):
    end-of-file from the terminal is a software convention (^Z).  You
    can just keep reading from the terminal after that, and in fact
    that's exactly what DEC-10 Prolog does.

In UNIX (original home of C Prolog):
    end-of-file from the terminal is a software convention (EOF
    character typed after an empty line).  You can just keep reading
    from the terminal after that, and in fact that's exactly what
    C Prolog does.

In VM/CMS, using SAS Lattice C:
    end-of-file from the terminal is a software convention (some magic
    string, which defaults to "EOF", but it is trivially easy for a
    program to change it -- use afreopen()).  I believe that you can
    keep reading from the terminal after that, but I haven't tried it
    myself.

On a Macintosh, using ALS Prolog:
    end-of-file from a window is a software convention (you click on
    "End of File" in the "Edit" menu).  All windows and streams remain
    live after that, and you can just keep reading, and that's what
    ALS Prolog does.

On a Xerox Lisp Machine, using Xerox Quintus Prolog:
    end-of-file from a TEXEC window is a software convention.  All
    windows and streams remain live after that, and you can just keep
    reading, and that's what XQP does.

[The sample of Prologs is not of interest here; my point is that there
are several *operating systems* with this characteristic.]

So the rule actually followed in Edinburgh-compatible Prologs is that

    - the sequence of character codes returned by get0/1 is the
      sequence of characters delivered by the source
    - with the end-of-file marker inserted every time the host
      indicated the end-of-file condition
    - Prolog receives through get0/1 as many characters and as many
      end-of-file markers as there are; any attempt to read past the
      end of this union stream is an error.  Not a failure, an error.

It happens that when you are reading from disc files, most operating
systems will indicate the end of file condition once.

Are terminals the only kind of file for which multiple end-of-file
conditions are plausible?  No.  The convention for tapes is that a
single tape-mark (usually reported as an end-of-file condition) is
merely a separator; a tape as such is terminated by a double
tape-mark.  Thus a Prolog program copying one tape to another (this is
a reason why we might want put(-1) NOT to close a file; if it does
anything special on a tape it should be to write a tape-mark) might
want to keep reading after seeing an end-marker.
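Under this rule, a tape copy can be written directly in the end-marker
style.  A sketch only, assuming the -1 convention; write_mark/0,
standing for "write a tape-mark on the output tape", is invented for
illustration:

```prolog
% Copy a tape: a single end-marker is a file separator; two in a
% row mean end of tape.
copy_tape :-
    get0(C),
    copy_tape(C).

copy_tape(-1) :-                % saw a tape-mark...
    get0(C2),
    ( C2 =:= -1 -> true         % ...two in a row: end of tape
    ; write_mark,               % hypothetical: copy the separator
      put(C2),
      copy_tape
    ).
copy_tape(C) :-
    C =\= -1,
    put(C),
    copy_tape.
```

Note that the program reads *past* an end-marker without ceremony; a
fail-at-end get0/1 gives it no way to do so at all.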
grzm@zyx.UUCP (Gunnar Blomberg) (03/25/88)
In article <801@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>[...] So the alternatives I can see at the moment
>are
>	o construct a new string every time.
>	o precompute 2^16 strings.
>	o cache 2^8 strings, and construct a new string every
>	  time for Kanji and other non-Latin alphabets.
>	o not support Kanji or other non-Latin alphabets at all.

How about:
	o support an immediate representation for characters,
	  if you've got room for them in your pointers.
Or
	o cache them as they occur, if you haven't.

I can't see that the fact that characters look like one-element
strings to the Prolog programmer in any way would stop an implementor
from implementing them using the same tricks as if characters were a
separate data-type.  Yes, it makes the internal string-handling
somewhat more convoluted, but not unduly so, I would say.
--
Gunnar Blomberg, ZYX, +46 8 6653205, grzm@zyx.se
grzm@zyx.UUCP (Gunnar Blomberg) (03/25/88)
Hmm... isn't this a lot of fuss about very little?  It seems to me
that whatever semantics is chosen, it is simple to get the other:

	BSIread(X) :-
		DEC10read(X),
		X \== end_of_file.

	DEC10read(X) :-
		BSIread(Y), !,
		X = Y.
	DEC10read(end_of_file).

Given that most Prologs seem to use the DEC-10 Prolog approach, and
that it is probably marginally more efficient to write BSIread in
terms of DEC10read than the other way around, the DEC-10 approach
seems the obvious choice.  Not that I think the other choice is all
that much worse...  Isn't it more interesting to discuss things where
it is harder to get it the way one wants (like the question raised by
Richard O'Keefe about whether a string data-type is necessary, or even
useful.  Now *that* is interesting!)

----------

At this point I had a discussion with a colleague of mine, and it
turns out that it isn't this simple.  In fact, I now believe that it
is impossible to get the BSIread functionality from a system that only
provides the DEC-10 one.  The predicate BSIread above will fail if the
file read contains 'end_of_file', of course.  This (for me) tips the
balance over in favor of the BSI approach.  It is after all easy to
write DEC10read in terms of BSIread.  Naturally there should be a
provision for compatibility with "old" programs.  I would be quite
happy to name BSIread read_term, for instance, and provide a
user-level predicate read, that could be redefined to give the
required semantics.

-----------

As far as get0 goes, the question is much easier, since there *is* an
obvious out-of-band value, namely -1.
--
Gunnar Blomberg, ZYX, +46 8 6653205, grzm@zyx.se
grzm@zyx.UUCP (Gunnar Blomberg) (03/25/88)
In article <814@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Just to continue with the get0 topic:
>	The fail-at-end approach rests on an assumption
>	which deserves to be made explicit, because it is false.
>What is the assumption?  That receiving the end-of-file indication from
>an operating system indicates that there is nothing further to read in
>that stream.  This is false?  Yes.
> [Lots of examples deleted]

This argumentation seems a little doubtful to me.  I don't have
experience with all the systems RAO'K mentions, but (to the best of my
memory) I have never seen a use of end-of-file from the terminal that
wasn't being used to pretend that the terminal was more than one file.
Cases in point:

DEC-10 Prolog (on TOPS-20, alas):
	User says [user], gives clauses and ends with ^Z.  The system
	pretends that there is a file 'user' by reading from the
	terminal until end-of-file is seen.  As far as Prolog is
	concerned the file ended at that point, and no more reading is
	done from that particular file at that point.

Using the terminal as standard input in Unix:
	Example: user types 'cat >foo' and then writes contents of
	file on terminal, indicating end by end-of-file.  As far as
	the reader of that particular input is concerned the file
	ended at that point, and no more reading is done from that
	particular 'file'.

In conclusion: I think that software conventions concerning
end-of-file from the terminal exist primarily to enable the
system/user to pretend that the terminal is more than one file.  In
fact, I know of no instance where this is not so.  Can somebody come
up with an example where multiple end-of-files are actually used in
one single ('conceptual') file?
--
Gunnar Blomberg, ZYX, +46 8 6653205, grzm@zyx.se
grzm@zyx.UUCP (Gunnar Blomberg) (03/25/88)
In article <801@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Another thing it is useful for is leaving test data in a source file.
>One can do
>	<declarations>
>	<clauses>
>	end_of_file.
>	<test cases>
>and include the test cases in the program or not just by moving the
>end_of_file around.
>
>Ah, you'll say, but that's what nested comments are for!
>Well no, they don't work.  That's right, "#| ... |#" is NOT a reliable
>way of commenting code out in Common Lisp, and "/* ... */" is NOT a
>reliable way of commenting code out in PopLog.  But end_of_file, in
>Edinburgh Prolog, IS a reliable way of commenting out the rest of the file.

Well, considering the fact that nested comments can comment out *any*
part of the file, not just the last part, and that the cases where
nested comments do not work must be so exceedingly rare as to be
practically non-existent, I would definitely prefer nested comments.
Honestly, how often do you have unmatched beginning-of-nested-comment
or end-of-nested-comment buried inside your code?

Well, just because nested comments are much more useful than plain
ones does not mean that BSI should adopt them.  There is the question
of supporting "old" code.  It would be interesting to know how many
programs would break if Prolog comments were changed to be nesting.
Do you know of any?

[I have actually seen the following style used in C:
	/* #define wantFOO 1 /* To get foo feature */
	#define wantBAR 1    /* To get bar feature */
	/* #define wantBAZ 1 /* To get baz feature */
It gave me a good laugh at the time.]

In any case, I have always considered the use of end_of_file to get
some kind of half-baked ability to comment out a part of a file as an
abomination (which does not mean I didn't use it and find it useful).
--
Gunnar Blomberg, ZYX, +46 8 6653205, grzm@zyx.se
cdsm@ivax.doc.ic.ac.uk (Chris Moss) (03/25/88)
In article <801@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
rok>I find it extremely odd to call a string of length one a character.
rok> ... But it becomes rather more
rok>clumsy on the D machines, which have a 16-bit character set. (Can you say
rok>"Kanji"? I knew you could.)
Yes, the BSI committee is just beginning to face up to this problem,
as the Japanese have just started taking an interest...
As Richard points out, it's not much problem for a character based
definition, which I personally would favour.
rok>The fail-at-end approach forces us not only to do something special
rok>with the get0/1 in rest_identifier/3, but in everything that calls it.
rok>(In the Prolog tokeniser, there are two such callers.)
rok>
rok>The point is that if-then-elses such as Meier suggests start
rok>appearing all over the place like maggots in a corpse if you adopt
rok>the fail-at-end approach, to the point of obscuring the underlying
rok>automaton.
I think this is a fair point when looking at the definition of
lexical analysers, however...
mmeier> I must say, none of the two seems to me satisfactory. Richard's
mm> version is not portable due to the -1 as eof character.
A character definition which included a (special) end-of-file token would be
better.
mm> BSI objects that if [read/1] returns e.g. the atom end_of_file
mm> then any occurrence of this atom in the source file
mm> could not be distinguished from a real end of file.
rok>
rok>That's not a bug, it's a feature! I'm serious about that.
I don't think that is any better than most uses of that particular
argument. Sure, if you learn to live with it you can find uses for it.
rok>Before taking end_of_file away from me, the BSI committee should supply
rok>me with a portable way of exiting a break level and a reliable method of
rok>leaving test cases in a file without having them always read.
And this is the death of any standardization process! I have yet to
find the document that Richard referred to (a few days ago) when he
claimed that the BSI's mandate was to standardize Edinburgh Prolog.
It certainly hasn't been repeated in all the other formal
presentations that have been made to BSI or ISO. But if one has to
follow every wrinkle of an implementation just because it represents
(arguably) the most popular dialect, then why don't we just
appoint IBM to write all our standards for us (or Quintus or ...)?
[And who is the TRUE inheritor of the title "Edinburgh Prolog" anyway? Is
it the commercial product (formerly NIP) now being sold under that title?]
To return to the argument, I think there's a significant difference between
get0 and read. Having an end-of-file marker for read is (almost
never) used to implement finite-state-machines. Instead it is used
for repeat-fail loops.
e.g. go :- repeat,
read(Term),
(Term=end_of_file -> true; process(Term), fail).
Now in the days before tail recursion and all the other optimizations
this was inevitable. But why should we encourage this approach today?
The above clause is a good example of the trickiness of "repeat". I always
write repeat loops wrong first time and this was no exception. I put
(Term=end_of_file -> true; process(Term)), fail.
then changed it to
(Term=end_of_file -> !; process(Term)), fail.
before settling on the above version. I personally think "repeat" should
be left out of the standard (there's no penalty overhead in not having it
built-in these days anyway). Don't other people have my problem?
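For comparison, the loop above can be written tail-recursively, with
no repeat and no cut; this is only a sketch (process/1 stands for
whatever is done with each term), and it uses == rather than = so
that reading a variable term is not mistaken for end of file:

```prolog
% Read and process terms until end of file, without repeat/fail.
go :-
	read(Term),
	process_terms(Term).

process_terms(Term) :-
	(   Term == end_of_file ->	% == : a variable read from
	    true			% the file cannot match this
	;   process(Term),
	    read(Next),
	    process_terms(Next)
	).
```

With tail recursion optimisation this runs in constant stack space,
which is the point of the comparison.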
It would seem to encourage better programming if we allowed "get0"
(or get_file or whatever) to return an end-of-file token, and any
high-level routines to fail at end-of-file. It's not particularly
consistent, but I don't know whether that's a priority in this case.
rok>In fact, on my SUN right now I have function key F5 bound to
rok>"end_of_file.\n" so that I can get out of Prolog without running the
rok>risk of typing too many of them and logging out.
I seem to get by perfectly well by setting "ignoreeof" in my cshell!
rok>Ah, you'll say, but that's what nested comments are for!
rok>Well no, they don't work. That's right, "#| ... |#" is NOT a reliable
rok>way of commenting code out in Common Lisp, and "/* ... */" is NOT a
rok>reliable way of commenting code out in PopLog.
That seems to be the best argument for allowing end-of-line comments in
Prolog. Now where do I find the Emacs macro for commenting out all lines
between dot and mark (and removing such comments)?
Chris Moss
Disclaimer: unless I say otherwise I am expressing my personal opinions
NOT the opinions of any committee!
ok@quintus.UUCP (Richard A. O'Keefe) (03/27/88)
In article <2410@zyx.UUCP>, grzm@zyx.UUCP (Gunnar Blomberg) writes:
> Hmm... isn't this a lot of fuss about very little?

No.  I have a suggestion for you.  Write a Pascal tokeniser in the
following three programming languages:
	o C      (end-of-file is a special value)
	o Pascal (end-of-file is tested by eof(input))
	o PL/I   (end-of-file is an exception).
_Then_ come back and tell us it's "very little".  Based on my
experience with these three, I'd rank them out of 10 on a "difficulty"
scale as C: 1, Pascal: 3, PL/I: 10.  (Try telling a C programmer that
he would be better off if end-of-file were handled by a new SIGEOF
signal.  If, back when I was writing PL/I, you had offered me a
version of PL/I which handled end-of-file the way Pascal does, I'd
have thanked you with tears in my eyes.)

What happens when you hit the end of a file is not a minor matter.
After all, every file has at least one end!  If we were designing a
new programming language, it would deserve the most careful attention.
The treatment of end-of-file has a large effect on the structure of
programs.

But the Prolog standard is not supposed to be a matter of designing a
new programming language.  I keep saying this, and people seem to keep
failing to see the point: the criteria for changing an existing
language are MUCH more stringent than the criteria for designing a new
one.  For example, I think that abbreviations in the names of
evaluable predicates are bad, so that argument/3 would be a better
name than arg/3.  So what?  It isn't better ENOUGH to warrant the
change.  I could list a score of such things in Edinburgh Prolog which
are not to my personal taste, and which I believe I have objective
grounds for criticising.  What of that?  None of them is bad enough to
warrant my breaking other people's code.  Now changing the behaviour
of get0/1 and read/1 would break every program I have ever written
that does any input.
(The change from is_endfile(26) in DEC-10 Prolog and some versions of
C Prolog to is_endfile(-1) in some versions of C Prolog and Quintus
Prolog took an average of about 10 seconds per file to fix with a good
editor.)

If someone comes up to you and asks you to improve their programming
language, you have a pretty heavy responsibility to do a good job of
it.  Quintus move very slowly and very cautiously: once we've put
something in the language, customers are likely to start using it, and
pulling a feature out on the grounds that we don't like it any more is
not really ethical behaviour.  The moral responsibility of a group of
people who take it on themselves to change a language around without
being asked to by the people who will be affected by such changes is
much much greater.  At the very least, a paramount concern of such a
group should be to provide enough operations and hooks in the standard
that "99%" compatibility packages for some reasonably representative
set of dialects should be KNOWN to be definable using standard
operations.  For example, in my work on this in 1984, I very carefully
worked through Waterloo Prolog (NOT an Edinburgh-compatible Prolog) to
find out what extra hooks would be needed.

> It seems to me that whatever semantics is chosen, it is simple to get
> the other:
>	BSIread(X) :-		| get_char(X) :-
>	    DEC10read(X),	|     get0(C),
>	    X \== end_of_file.	|     C =\= -1,
>				|     string_list(X, [C]).
>	DEC10read(X) :-		| get0(C) :-
>	    BSIread(Y),		|     ( get_char(X) -> string_list(X, [C])
>	    !,			|     ; C = -1
>	    X = Y.		|     ).
>	DEC10read(end_of_file).

I can't find a BSI document which describes read/1 anything but
vaguely, so I've added the character I/O versions on the right, and
it's those I'll comment on.  (By the way, string_list/2 is a pretty
appalling name; you would expect it to have something to do with lists
of strings.)  The latest character I/O document I checked was so
phrased as to suggest that having failed once, get_char/1 would
continue to fail.
There was a note which pointed out that it was still an open question
whether get_char/1 should do this or should report an error if called
again after having once failed.  This presumably carries over to
read/1.  So we simply don't yet know whether the first definition is
correct or not, because BSI I/O is not yet fully defined.

Case 1: calling get_char/1 after it has already failed results in an
error report.  The cross-definitions of get_char/1 and get0/1 would
then be correct, IF an end-of-file condition could be indicated only
once in a file, which is false.

Case 2: get_char/1 keeps on failing quietly.  Then none of the
cross-definitions would be correct.

Since the only motivation that anyone has ever told me about for the
fail-at-end approach is the analogy between a file and a list of
characters, case 2 is the "natural" one.  That is, a parallel is
thought to exist between

	next_term([Head|Tail], Head, Tail).
and
	next_term(File, Head) :- read(File, Head).

and if we take this seriously, we would expect read(File, Head) to
keep on failing at the end of a file, just as next_term([], Head, _)
would keep failing.  Now the analogy is very far from being a good
one, so there may be some other motivation I have not been told about
which would make case 1 the "natural" one.

Even in case 1, and even discounting the extremely useful possibility
of a literal 'end_of_file' appearing in the input, it is still not
clear that the cross-definitions for read/1 would be correct.  There
are two difficulties: what about syntax errors?  and what about end of
file?  There are end-of-file problems in read/1 additional to those in
get0/1, due to the fact that a term is an extended object, and the
fact that read/1 may consume arbitrarily many characters without
encountering a term.

> It is after all easy to write DEC10read in terms of BSIread.

Strictly speaking, it is impossible, because the two syntaxes are
different.  Even ignoring that, it isn't clear to me that it is
possible.
It *would* be possible to write read/1 in terms of get_char/1 (though it would be rather more painful than it would be given get0/1).
ok@quintus.UUCP (Richard A. O'Keefe) (03/27/88)
In article <2412@zyx.UUCP>, grzm@zyx.UUCP (Gunnar Blomberg) writes:
> Well, considering the fact that nested comments can comment out
> *any* part of the file, not just the last part, and that the cases
> where nested comments do not work must be so exceedingly rare as to be
> practically non-existent, I would definitely prefer nested comments.
> Honestly, how often do you have unmatched beginning-of-nested-comment
> or end-of-nested-comment buried inside your code?

I see, it is ok to have an operation which works 99.9% of the time?
It would be ok if "X is 1+1" almost always gave you the answer 2 but
*might* give you 47?  You would be happy to cross a ravine on a bridge
which has been known to collapse before, but doesn't collapse often?

I have found at least two C compilers where

	char *file_pattern = "/usr/me/foo/*";

broke because the pre-processor thought the /* was the beginning of a
comment.  Are you trying to tell me that a Prolog programmer writing
for UNIX is never going to say

	file_pattern("/usr/me/foo/*").

and is never going to want to comment that out?  Are you really?

When I learned PL/I, the instructor stressed very strongly that we
should never start a /**/ comment in column 1.  Remember why?  No
doubt the JCL designers would have said "Honestly, how often do you
have /* buried inside your data decks?".

Perhaps I'm an old-fashioned fuddy-duddy, but it seems to me that an
operation should ALWAYS do exactly what it is supposed to do, or else
TELL you that it went wrong.  {And yes, DEC-10 Prolog didn't live up
to this, and no, I've never said that the standard should be identical
to DEC-10 Prolog.}  Yes, commenting file_pattern(...) out with
PL/I-style comments isn't going to work either, but with PL/I-style
comments you KNOW that there is no reason to expect it to work.
Commenting it out with "%" WILL work.  End-of-line comments are a much
more reliable method of commenting out blocks of code than nesting
comments, and are already part of the language.
> Well, just because nested comments are much more useful than
> plain ones does not mean that BSI should adopt them.  There is the
> question of supporting "old" code.  It would be interesting to know
> how many programs would break if Prolog comments were changed to be
> nesting.  Do you know of any?

(1) BSI Prolog is going to break existing code in much worse ways than
    that.  Old code written in ESI Prolog-2 stands a good chance of
    running under BSI Prolog, but old code written in Arity/Prolog has
    very little chance of getting away without massive changes.
(2) The DEC-10 Prolog library broke when it was mounted under PopLog
    because PopLog used Modula-style comments and Edinburgh Prolog
    uses PL/I-style comments.
(3) With a table-driven tokeniser, there isn't any reason at all why
    the Prolog standard can't make *BOTH* types of comment available.
(4) If you think of "commenting-out" not as a matter of adding some
    characters at the beginning of a region and some other characters
    at the end, but as an operation on the entire region, you'll soon
    realise that you don't actually need nesting comments to be able
    to comment out a region that contains comments.  For example, the
    editor I am using at the moment has commands

	Ctrl-X Ctrl-[	comment out the region using /* */
	Ctrl-X Ctrl-]	undo the effect of ^X^[

    Works fine *even when nesting comments would break*, and I can use
    it in C as well as Prolog.  (Actually, in Prolog I use a Meta-%
    command which uses "%".)

So "much more useful" I take leave to doubt.
ok@quintus.UUCP (Richard A. O'Keefe) (03/28/88)
In article <2411@zyx.UUCP>, grzm@zyx.UUCP (Gunnar Blomberg) writes:
> In article <814@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
> >Just to continue with the get0 topic:
> >	The fail-at-end approach rests on an assumption
> >	which deserves to be made explicit, because it is false.
> >What is the assumption?  That receiving the end-of-file indication from
> >an operating system indicates that there is nothing further to read in
> >that stream.  This is false?  Yes.
> This argumentation seems a little doubtful to me.  I don't have
> experience with all the systems RAO'K mentions, but (to the best of my
> memory) I have never seen a use of end-of-file from the terminal that
> wasn't being used to pretend that the terminal was more than one file.
> DEC-10 Prolog (on TOPS-20, alas):
>	User says [user], gives clauses and ends with ^Z.  The system
>	pretends that there is a file 'user' by reading from the
>	terminal until end-of-file is seen.  As far as Prolog is
>	concerned the file ended at that point, and no more reading
>	is done from that particular file at that point.

Wrong.  DEC-10 Prolog does **NOT** think the file ended at that point.
compile(user), consult(user), break, or anything which uses the usual
consult loop will stop.  But it will stop because it has seen
'end_of_file', not because it thinks the file has ended.  Those loops
don't do any more reading because they DON'T do any more reading, not
because they CAN'T.  Here is a transcript:

	| ?- read(A), read(B), read(C), read(D), read(E).
	|: this-is-a.
	|: ^Z
	|: this-is-c.
	|: ^Z
	|: ^Z

	A = this-is-a,
	B = D = E = end_of_file,
	C = this-is-c

> In conclusion: I think that software conventions concerning
> end-of-file from the terminal exist primarily to enable the
> system/user to pretend that the terminal is more than one file.

This may be misleading: the effect is of NESTED files, not of a
sequence of files.
(In fact if you want, you can even interleave several input streams
from a terminal, each stream getting its own end-of-file.)

Blomberg seems to suggest that by saying that the purpose of multiple
end-of-file conditions is to permit multiple uses of the terminal, he
has shown that it is not important to preserve this behaviour.  I hope
that I have misunderstood him.

(1) I took some pains in my previous message to point out that it is
    not only terminals which can experience multiple end-of-file
    conditions.  {Think for a moment about doing BSD-style
    non-blocking I/O in Prolog.  Why _not_?}  And I specifically
    mentioned 'user'.  Even in the case of terminals, this behaviour
    can be obtained with _any_ (real or pseudo-) "terminal", not just
    'user'.
(2) Nested "files" on a terminal are something one uses many times a
    day.  Think about break/0 and debugging.
(3) "I don't use it" is not good justification for "it should go".  I
    don't use ancestral cuts.  Is that any sort of justification for
    saying they should not be in the language?  No way!  I do say it,
    but I need much better evidence than that: I had to show how you
    could write a Prolog interpreter without them, and I had to find
    arguments to show that ancestral cuts were bad in themselves.

Before we get into further debate about multiple end-of-file
conditions I should do a better job of explaining why I raised the
point.  The ability to 'consult(user)' and to 'break' is very useful
for debugging.  Having read/1 and get0/1 return an end-of-file marker
when the operating system indicates end-of-file is a coherent
convention which additionally makes this use of the terminal very
easy, without requiring special conventions for resetting the state of
the terminal stream.  (You get nested data streams from 'user' without
having to open new streams.)  It did not appear from what few sparse
minutes there are that the BSI committee had ever considered this.
I claim that this is existing practice in "Edinburgh Prolog" (the BSI definition of which is "what Clocksin & Mellish say"), and that extraordinarily good reasons are needed to warrant a change in existing practice, and that those reasons have not yet been explained. Further, I pointed out the possibility of multiple end-of-file conditions to show that the claim that it is easy to simulate existing practice using the fail-at-end approach is a false claim. I do not claim that returning an end-marker at the end is the only sensible thing to do in all circumstances, nor that a Prolog standard should mandate ALS-Prolog behaviour only. Whoever it was who suggested that it should be possible to get an end-marker, a failure, or an error, was clearly right. Who could object to a standard which made it easier for Prolog users to get their existing code running?
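The nested use of 'user' that this argument relies on can be sketched
as follows.  This is my own illustration of the Edinburgh behaviour,
not taken from any manual; the point is only that each end-of-file
typed at the terminal terminates one logical stream while the terminal
itself stays live:

```prolog
% Hypothetical session sketch: consult(user) reads clauses from the
% terminal until an end-of-file is typed; the enclosing read/1 then
% carries on reading from the very same terminal.
demo :-
	write('Enter some clauses, end with your EOF character:'), nl,
	consult(user),		% first logical stream ends at ^D/^Z
	write('Now enter a term:'), nl,
	read(Term),		% the terminal is still live
	write(you_typed(Term)), nl.
```

Under the fail-at-end convention, the first end-of-file would have to
poison the terminal stream (or be specially reset) for demo/0 to work.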
ok@quintus.UUCP (Richard A. O'Keefe) (03/28/88)
In article <243@gould.doc.ic.ac.uk>, cdsm@ivax.doc.ic.ac.uk (Chris Moss) writes:
> I have yet to
> find the document that Richard referred to (a few days ago) when he
> claimed that the BSI's mandate was to standardize Edinburgh Prolog.
> It certainly hasn't been repeated in all the other formal
> presentations that have been made to BSI or ISO.

Don't bother, I'm about to post it.

> But if one has to
> follow every wrinkle of an implementation just because it represents
> (arguably) the most popular dialect, then why don't we just
> appoint IBM to write all our standards for us (or Quintus or ...)?
> [And who is the TRUE inheritor of the title "Edinburgh Prolog" anyway? Is
> it the commercial product (formerly NIP) now being sold under that title?]

No, one doesn't have to follow every wrinkle of an implementation.
How often do I have to repeat it?  I don't give a continental for
implementors or implementations.  Not Quintus, not ALS, not Arity, not
IBM, not Borland, not any of them.

	+---------------------------------------+
	|  What I care about is Prolog *USERS*  |
	+---------------------------------------+

The question to be asked every time is "might this change break a
reasonable program?"  "How can we make it easy for people who are
already using Prolog to change over to the standard, especially for
people using Prolog systems whose vendors have made some attempt at
compatibility?"

What is "Edinburgh Prolog"?  I have two definitions, which have much
the same practical effect.
(1) An "Edinburgh Prolog" is one whose implementors made a serious and
    reasonably successful attempt to make their system compatible with
    Clocksin & Mellish, or better yet, with DEC-10 Prolog or C Prolog.
(2) An "Edinburgh Prolog" is one to which I can port the DEC-10 Prolog
    library in two days, with only a text editor to help me.  (That
    is, no boot-strapping through Prolog or Lisp.)

By either definition, we have the following results:

	Dialect			is "Edinburgh Prolog"?
	VM/PROLOG		no
	Waterloo Prolog		no
	AAIS Prolog		not quite, but closer than BSI Prolog
				(by (2), estimated)
	BIM Prolog		in "native" mode, no; in "-c" mode, almost
	IF Prolog		yes
	Arity Prolog		yes
	ESI Prolog-2		no
	micro-PROLOG		no
	Stony Brook		yes
	SICStus Prolog		yes
	ALS Prolog		yes
	NU Prolog		yes
	Poplog			yes (well, it was in mid-1984)
	LM Prolog		no
	NIP			yes (in 1985)

Other Prolog versions are omitted because I haven't got access to
manuals for them and haven't used them.  The fact that I classify
something as "not an Edinburgh Prolog" does not mean that I regard it
as technically inferior, only that I regard it as sufficiently
different to be hard to port to or from.

> Now in the days before tail recursion and all the other optimizations
> this was inevitable.  But why should we encourage this approach today?

A very simple reason: there are still people trying to use Prolog on
IBM PCs and clones, and several PC Prologs have 64kbyte stacks.  So
failure-driven loops are still necessary on those machines, not
because the Prolog systems are bad, but because they are good enough
for the limitations of the machine to be encountered.

> I seem to get by perfectly well by setting "ignoreeof" in my cshell!

This doesn't work terribly well if you are using the Bourne shell.  Is
Prolog to be standardised only for people who use the C shell?

> Now where do I find the Emacs macro for commenting out all lines
> between dot and mark (and removing such comments)?

Well, the editor which I claimed does it isn't Emacs.  Commenting out
lines using "%" is 18 lines of C.  Commenting out the region with /**/
and undoing that come to 40 lines of C.  I'll send this by E-mail if
you're interested: it should be easy to translate them to mock-Lisp,
except that I strongly dislike mock-Lisp.
lee@mulga.oz (Lee Naish) (03/30/88)
In article <243@gould.doc.ic.ac.uk> cdsm@doc.ic.ac.uk (Chris Moss) writes:
>e.g. go :- repeat,
>	read(Term),
>	(Term=end_of_file -> true; process(Term), fail).

Testing for the end of file term using = is a common error.  If a
variable is read, it succeeds.  A second criticism I would make of
this code is that it has a tendency to loop.  I think that repeat/0
should also have a matching cut.

I posted a nicer way to encapsulate this backtracking style of reading
terms a while back.  It is also possible to move the read back into
the repeat loop, avoiding repeat and the need for cut.  With TRO, it
is just as efficient.  Interestingly, it works whether read fails or
succeeds on eof.

	% returns all terms read by backtracking
	% (should have stream/file arg and close file at eof?)
	read_terms(Term) :-
		read(Term1),
		\+ isEof(Term1),	% if you don't want to return end_of_file
		(   Term = Term1
		;   % \+ isEof(Term1),	% if you do want to
		    read_terms(Term)
		).

Richard mentioned some subtle differences between eof/1, is_eof/1 etc.
in different systems.  There is another one which he missed: in
NU-Prolog, isEof/1 checks if its argument is the eof term (reading
variables works) and eof/1 returns the eof term (which is end_of_file
for compatibility).

Now for my suggestion of a new predicate which can be used to
implement your favourite version of read/1:

	read_term(Stream, Term)		% change the name if necessary

1) If it succeeds in reading a term T,
	Term = term(T, VarNames)
   where VarNames is some structure which allows mapping between
   variables and their names.  Wrapping a functor around the term
   enables us to distinguish between variables, 'end_of_file' and real
   end of file easily.  It also lets us retain variable name
   information.
2) If end of file is encountered for the first time, or if an end of
   file marker occurs next in the stream (like ^Z on a terminal)
	Term = eof_marker
3) If eof has already been read and multiple eof markers are not
   possible
	Term = error(past_eof)
   Whether this is an error is arguable, but by explicitly returning
   something, the programmer has the choice of what to do.  Rather
   than having a proliferation of top level functors being returned by
   read_term, it seems reasonable to wrap the error functor around
   past_eof.
4) If there is a syntax error
	Term = error(syntax(X))
   where X is some indication of the error
5) If Stream is not a valid stream
	Term = error(invalid_stream(X))
   where X is some indication of the error
6) If there has just been a disk head crash
	Term = error(unix(hardware(disk_head_crash(X))))
   etc, etc.

Similarly, reading characters could be done as follows

	read_character(Stream, Char)	% change name if necessary

1) If it succeeds in reading a character C,
	Char = C
   where char_to_int(C, I) can map the character to a small integer
   for get0/1.  There is no special functor needed to wrap up Char,
   assuming the other things returned by read_character/2 can be
   distinguished from characters (eg, by is_character(Char)).
2) If end of file is encountered for the first time, or if an end of
   file marker occurs next in the stream (like ^Z on a terminal)
	Char = eof_marker
3) If eof has already been read and multiple eof markers are not
   possible
	Char = error(past_eof)
4) I doubt that there will ever be a need for error(syntax(X)), but it
   should be reserved anyway.
5) If Stream is not a valid stream
	Char = error(invalid_stream(X))
   etc, etc.

I think it would be useful for these (with the details fleshed out a
bit more) to be part of the standard.

lee
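If something like this went into the standard, a DEC-10-style read/1
could be recovered as a thin layer over it.  A sketch, assuming
Naish's proposed read_term/2 and a current input stream called
user_input (both names are assumptions here, as is the choice to retry
after a syntax error, which is what DEC-10 read/1 did):

```prolog
% DEC-10-compatible read/1 defined over the proposed read_term/2.
read(Term) :-
	read_term(user_input, Result),
	(   Result = term(T, _VarNames) ->	% an ordinary term
	    Term = T
	;   Result = eof_marker ->		% end of file
	    Term = end_of_file
	;   Result = error(syntax(_)) ->	% report and retry,
	    read(Term)				% DEC-10 style
	;   % any other error: no sensible DEC-10 analogue, so fail
	    fail
	).
```

The wrapping functor is what makes this definition possible at all: a
literal 'end_of_file' in the input arrives as term(end_of_file, _) and
so is never confused with the real end of the file.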
micha@ecrcvax.UUCP (Micha Meier) (04/07/88)
In article <243@gould.doc.ic.ac.uk> cdsm@doc.ic.ac.uk (Chris Moss) writes:
>... I personally think "repeat" should
>be left out of the standard (there's no penalty overhead in not having it
>built-in these days anyway). Don't other people have my problem?

Unfortunately, there is a penalty when it is not built-in.  If
repeat/0 is coded as

	repeat.
	repeat :- repeat.

it creates a new choice point each time the system backtracks to it.
This, however, is not the main point; I'm sure there are clever
compilers around that could cope with it.  The problem concerns the
Byrd box model for debugging, and it was pointed out to me by Thomas
Graf: if repeat/0 is built-in, it is called only once (it enters the
CALL port and leaves through the EXIT port); on backtracking it enters
by REDO and leaves by EXIT.  When its choice point is cut, it just
exits or fails.  The situation is different with the above Prolog
coding: each backtracking to repeat/0 makes a new call, i.e. a new
box, and hence there are two more ports to trace.  Look at this script
from SICStus Prolog:

| ?- [user].
| repeat1.
| repeat1 :- repeat1.
| ^D
yes
| ?- trace, repeat1, fail.
The debugger will first creep -- showing everything (trace).
        1    1  Call: repeat1 ?
        1    1  Exit: repeat1 ?
        2    1  Call: fail ?
        2    1  Fail: fail ?
        1    1  Redo: repeat1 ?
        2    2  Call: repeat1 ?
        2    2  Exit: repeat1 ?
        1    1  Exit: repeat1 ?
        3    1  Call: fail ?
        3    1  Fail: fail ?
        1    1  Redo: repeat1 ?
        2    2  Redo: repeat1 ?
        3    3  Call: repeat1 ?
        3    3  Exit: repeat1 ?
        2    2  Exit: repeat1 ?
        1    1  Exit: repeat1 ?
        4    1  Call: fail ?

etc., while the built-in repeat/0 behaves normally:

| ?- repeat, fail.
        1    1  Call: repeat ?
        1    1  Exit: repeat ?
        2    1  Call: fail ?
        2    1  Fail: fail ?
        1    1  Redo: repeat ?
        1    1  Exit: repeat ?
        2    1  Call: fail ?
        2    1  Fail: fail ?
        1    1  Redo: repeat ?
        1    1  Exit: repeat ?
        2    1  Call: fail ?
You can say that it is possible to skip over the multiple 'repeat'
ports, but the point is that the stack space for them is needed even
when they are not printed - you cannot run such a repeat/0 forever;
eventually it is going to overflow some stack.  On the other hand, we
could ask whether the box model is right in this case - after all, it
does not bring any new information by repeating all these ports.

There is no reasonable argument for forcing people to use
tail-recursive loops instead of repeat-fail loops; if I'm using
temporary structures and I know that with the recursive loops they are
going to be garbage collected whereas with repeat-fail loops they are
just popped, I will always prefer the latter, and the standard should
support me by providing a built-in repeat/0, since its full
functionality cannot be provided by other means.

---
Another point I want to make concerns the -1 returned by get0/1 at
eof: several people have claimed that it is portable and that it
cannot be confused with any character; however, it is *not* portable,
since it relies on the fact that no valid character can be confused
with -1.  If characters are represented as strings of length 1, then
-1 has a different type and so there is no confusion, but the eof
value should have the same type (if nothing else then because of
indexing).  If characters are integers, taking -1 implies that no
character can have the code 2^n - 1 (n being the number of bits on
which the character is stored), which is not necessarily true - you
can use 7 bits for ASCII, 16 bits for Kanji and anything else on any
number of bits.  Only if we waste enough space can we guarantee that
-1 will be different.  A standard that forces you to waste space would
really not be good.

--Micha
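The collision Micha describes is easy to show in C, where it is the
classic bug of holding a character in a cell exactly n bits wide (here
n = 8, a signed char on a two's-complement byte machine; the function
name eof_collides is mine, purely for illustration):

```c
/* Does a character code, squeezed into an 8-bit signed cell,
 * become indistinguishable from the -1 end-of-file value?
 * On a two's-complement machine, code 2^8 - 1 = 0xFF does. */
int eof_collides(unsigned code)
{
    signed char cell = (signed char) code;  /* only 8 bits survive */
    return cell == -1;
}
```

So the -1 convention is safe only when the integer holding the result
is wider than the stored character, which is exactly the point argued
back and forth in the next article.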
ok@quintus.UUCP (Richard A. O'Keefe) (04/09/88)
In article <522@ecrcvax.UUCP>, micha@ecrcvax.UUCP (Micha Meier) writes:
> Another point I want to make concerns the -1 returned by get0/1
> at eof: several people have claimed that it is portable
> and that it cannot be confused with any character, however
> it is *not* portable, since it relies on the fact that
> no valid character can be confused with -1. If characters are
> represented as strings of length 1, then -1 has a different type
> and so there is no confusion, but the eof value should have the same type
> (if nothing else then because of indexing). If characters are integers,
> taking -1 implies that no character can have the code 2^n - 1
> (n being the number of bits on which the character is stored)
> which is not necessarily true - you can use 7 bits for ASCII, 16 bits for
> Kanji and anything else on any number of bits. Only if we waste
> enough space we can guarantee that -1 will be different.
> A standard that forces you to waste space would really not be good.
>
> --Micha

Er, which character set standards allow a character to be represented
by a negative number?  I only know about ISO 646, ASCII, EBCDIC, ISO
8859, and XNS, and all of them define character codes to be positive
integers.  Perhaps someone from Japan could comment on the JIS
codings; certainly XNS doesn't assign any Kanji a number which could
be confused with a negative integer, even in 16-bit 2s complement.
Wouldn't representing some characters by negative numbers mean that
comparison of character codes would disagree with the collating order
defined by the standard?

It is not the case that using -1 as the end of file mark means that no
character can have the value 2^n-1.  All it means is that the integer
representation used by Prolog must contain at least one more bit than
the number of bits used to *store* characters.
(The whole point of the end of file marker is that it is a value which
*can't* be stored: it can never be a valid character in a file and it
can't appear in the name of an atom or the text of a string.)  This
isn't much of a restriction.  Even for XNS, which I believe includes
all the JIS-required Kanji, 16 bits would suffice for Prolog integers.

We don't have to waste any space at all.  For example, VM/PROLOG has
two representations for integers: a compact one for 24-bit integers,
and another one for 32-bit integers.  Similarly, a Prolog system for
PCs using 16-bit "area" tags could have one tag for 16-bit positive
integers, another for 16-bit negative integers, and a third for bigger
integers represented indirectly.  (This is what Interlisp-D does.)

I've used a programming language where the character input operation
returned one type of object for ordinary characters and another type
for end of file.  It was amazingly painful: you always had to test for
the end of file object before doing anything with the result, because
character comparison &c were not defined on the end of file object.

If characters are to be represented by strings of length 1 (what an
utterly disgusting vomitously repulsive kludge), representing the end
of file marker by the empty string seems like the obvious thing.  This
representation would even make the end of file marker less than any
valid `character', which is what the -1 convention currently does.
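The "one more bit" rule O'Keefe states is, as it happens, exactly the
rule C's own standard I/O relies on: getc returns an int, wider than
the stored character, so EOF (-1) can never equal any valid byte
value, not even 0xFF.  A copy loop written with that extra width
handles every possible code:

```c
#include <stdio.h>

/* Copy every byte of `in` to `out`.  `c` must be an int, NOT a char:
 * the extra width is what keeps EOF distinct from the byte 0xFF. */
void copy_chars(FILE *in, FILE *out)
{
    int c;
    while ((c = getc(in)) != EOF)
        putc(c, out);
}
```

Declaring `c` as a plain char here would reintroduce Micha's 2^n - 1
collision on machines where char is signed, since the byte 0xFF would
read back as -1 and stop the loop early.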