[comp.lang.prolog] behavior of read/get0 at end_of_file

bimandre@kulcs.uucp (Andre Marien) (03/16/88)

>	copy_chars :-
>		get0(Char),
>		copy_chars(Char).
>
>	copy_chars(-1) :- !.
>	copy_chars(Char) :-
>		put(Char),
>		copy_chars.

At the benchmark workshop he agitated vividly against the different
behavior of BIM_Prolog in case of end of file.
BIM_Prolog fails when get0 attempts to read past end of file instead of
returning -1. The same is true for read.
If you write the same program as above with the BIM_Prolog convention,
this is what it looks like :

        copy_chars :-
                get0(Char), !,
                put(Char),
                copy_chars.
        copy_chars .

The previous code creates a choicepoint for every character which
gets processed.  It can easily be avoided :

	copy_chars :- copy_chars_h.
	copy_chars .

	copy_chars_h :- get0(Char), put(Char), copy_chars_h .

Now this looks so obviously better and more readable to me,
that it convinces me we made the better choice.

BTW, if you ever want to convert a program with a different interpretation,
the solution is easy :

/*QP*/read(X) :- /*bim*/read(X), ! .
/*QP*/read(whatever_is_used_to_indicate_end_of_file) .

Of course, as you can verify above, a different coding may very well
produce a better program.

Andre' Marien
B.I.M. Belgium
bimandre@kulcs

Bart Demoen
K.U.Leuven Belgium
bimbart@kulcs

ok@quintus.UUCP (Richard A. O'Keefe) (03/17/88)

Article <1197@kulcs.kulcs.uucp>, by bimandre@kulcs.uucp (Andre Marien)
[signed by Andre Marien and Bart Demoen both] arrived mangled at our site.
What we got started like this:
> 
> >	copy_chars :-
> >		get0(Char),
> >		copy_chars(Char).
> >
> >	copy_chars(-1) :- !.
> >	copy_chars(Char) :-
> >		put(Char),
> >		copy_chars.
> 
> At the benchmark workshop he agitated vividly against the different
> behavior of BIM_Prolog in case of end of file.
> BIM_Prolog fails when get0 attempts to read past end of file instead of
> returning -1. The same is true for read.
> If you write the same program as above with the BIM_Prolog convention,
> this is what it looks like :
> 
>         copy_chars :-
>                 get0(Char), !,
>                 put(Char),
>                 copy_chars.
>         copy_chars .
> 
> Now this looks so obviously better and more readable to me,
> that it convinces me we made the better choice.

It would be inaccurate to say that I have ever "agitated ... against ...
BIM_Prolog".  It would, however, be accurate to say that I had some
rather harsh things to say about the quondam intention of the BSI
Prolog group to make get0/1 behave in this way.  I don't know what
the BSI group currently intend.

I should apologise for not having taken greater care with the presentation
of copy_chars/0; but then in the context of my original message it was part
of a joke.  Here's how I'd write copy_chars/0 for real:

	copy_chars :-
		get0(Char),
		(   is_endfile(Char) -> true
		;   put(Char),
		    copy_chars
		).

This is only superficially different from C's
	while ((Char = getchar()) != EOF) putchar(Char);

Which version is "so obviously better and more readable"?

I'm not going to try to answer that question, because it is entirely
the wrong question.  An example this small can be coped with even if
it is badly written.

The right question is:
	are there any objective reasons for preferring one approach
	to the other?

Let's face it, a very important consideration for Quintus is that we take
money from people in return for something we claim to be an Edinburgh-
compatible Prolog.  So we have three choices:
   1.	have get0/1 return a special code (26 as in DEC-10 Prolog,
	or -1 as in C-Prolog; we picked the latter) and have read/1
	return a special code, because that's what real Edinburgh
	Prologs do, or
   2.	stop claiming to be Edinburgh compatible, or
   3.	deliberately lie to our customers.
I think it's clear that number 1 is a defensible choice.

So that's a reason why a Prolog vendor who claims to provide an
Edinburgh-compatible Prolog should do what we do.  But is there any
reason why this is a *good* thing to do in itself?

Yes, there is.

With the possible exception of applying-a-function-to-some-arguments,
an operation on its own isn't much good.  You need a method for using
it, and it has to fit well with the other operations available.  In
particular, the question of how we read a single character isn't all
that interesting:  what's interesting is "how do we write a program
that reads a lot of characters".  For example, how do you write a
tokeniser for Prolog or Pascal or whatever in Prolog?

Where would you look for advice about how to write programs that do
input?  Where but in a book about compilers.  And a good book on that
topic is the "Dragon" book:

	Compilers: Principles, Techniques, and Tools
	Aho, Sethi, & Ullman
	Addison-Wesley 1986
	ISBN 0-201-10088-6

To learn how to write a tokeniser, we might look at 'lexer.c' in 
section 2.9 (page 74).  You will note that this program relies on the
fact that C streams end with an "EOF" end-marker.  Perhaps more
seriously, the whole "transition diagram"/"finite state automaton"
approach described in chapter 3 relies on "end of file" being a
source character like any other.  (See fig 3.22, for example.)
Indeed, end-markers are so much taken for granted in parsing that
"$" crops up in chapter 4 without much introduction.

We can convert a deterministic finite-state automaton to Edinburgh
Prolog with very little effort.  We represent a state of the automaton
by a predicate with one argument: the next character.  We represent an
arc of the automaton by a clause.  For example, the arcs

	s1: a -> s2.
	s1: b -> s1.
	s1: $  -> accept.

would be coded like this:
	
	s1(0'a) :- get0(Next), s2(Next).
	s1(0'b) :- get0(Next), s1(Next).
	s1(- 1) :- true.
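
To drive the automaton, one reads the first character and hands it to
the start state; s2/1 would follow the same pattern (a sketch):

	run :- get0(Char), s1(Char).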

The correspondence is exact.  I think this is an important practical
point.  I offer the following anecdote as evidence:  a year after
learning Prolog I decided that I wanted to write a Prolog system of my
own, but couldn't figure out how to write the tokeniser, so I abandoned
the entire project.  Two years later, I was talking to someone and
suggested this approach as a method for writing programs that use get0/1,
and suddenly it dawned on me that it would work.  One evening of coding
later, I had a Dec-10 Prolog tokeniser written in Dec-10 Prolog.  If you
need some lookahead, you simply add extra arguments to a few states to
carry the looked-ahead characters.
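
For instance (a sketch, with hypothetical state names): a tokeniser
state which has just seen '=' needs one character of lookahead to tell
the tokens = and == apart:

	saw_eq(0'=) :- get0(Next), s_start(Next).	% the token was "=="
	saw_eq(Char) :-					% the token was "=";
		Char =\= 0'=,				% Char is carried into
		s_start(Char).				% s_start as lookahead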

What would it do to us if get0/1 failed at the end of a file?
Unpleasant things:  the test for end of file has to be pushed up into
the states *before* the state which expects end of file.  Every arc of
the form

	sN(X) :- get0(Next), s1(Next).

has to be rewritten as 

	sN(X) :- bad_get0(Next), !, s1(Next).
	sN(X) :- s1_eof.

and s1 has to be written as

	s1(0'a) :- bad_get0(Next), !, s2(Next).
	s1(0'a) :- /* EOF */      s2_eof.
	s1(0'b) :- bad_get0(Next), !, s1(Next).
	s1(0'b) :- /* EOF */      s1_eof.

	s1_eof :- true.

The cleanest way to avoid this mess is to use a get0/1 which returns an
end-marker, and if your vendor won't provide you with one, you'll have
to write one yourself.
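
Such a wrapper might look like this (a sketch; bad_get0/1 names the
fail-at-end primitive, as above):

	my_get0(Char) :-
		(   bad_get0(C) -> Char = C	% C fresh: failure here can
		;   Char = -1			% only mean end of stream
		).

Note that bad_get0/1 must be called with a fresh variable and the
result unified afterwards; the discussion of bad_get0/1 below explains
why.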

+------------------------------------------------------------------------+
|	An important reason for get0/1 returning an end-marker at	 |
|	the end of a stream is that this forms part of a practice	 |
|	of writing character-reading code.				 |
+------------------------------------------------------------------------+

Note that this practice existed prior to Prolog:  we didn't have to figure
out anything new.  

Is there any other reason for preferring the Edinburgh Prolog version of
get0/1?

Yes.

Suppose I have a goal
	bad_get0(X)
and it fails.  Does that mean that the end of the stream has been reached?
***NO***.  It means that *EITHER* the end of the stream has been reached
*OR* a character has been read which didn't happen to unify with X.  Is
there anything we can do afterwards to find out which was the case?  No.
Assuming, for the sake of argument, a predicate at_eof/0 which succeeds
when we are at the end of the current input stream,

	... ( bad_get0(X) -> /* character read */
	    ; at_eof -> /* now at end of stream */
	    ; /* otherwise it was unification failure */
	    ) ...
isn't quite right.  If there was precisely one character left in the
input stream, bad_get0(X) will consume it, and if the unification fails,
at_eof will *now* see the end of the file.

The problem is that bad_get0/1 has a side-effect (consuming one character
from the current input stream) which it SOMETIMES does and sometimes
doesn't do.  It's similar to playing Russian roulette, except that the
gun is pointed at the foot rather than the head.

It is interesting that Marien and Demoen fell into exactly this trap.
They say:
>
> BTW, if you ever want to convert a program with a different interpretation,
> the solution is easy :
> 
> /*QP*/read(X) :- /*bim*/read(X), ! .
> /*QP*/read(whatever_is_used_to_indicate_end_of_file) .
> 
It may be easy, but it isn't a solution.  Suppose we write this:

	buggy_read(Term) :- bim_read(Term), !.
	buggy_read(end_of_file).

Now, suppose the current input stream contains
	fred.
and we call
	buggy_read(end_of_file).
IT WILL SUCCEED!  It should have failed.  (bim_read/1 reads the term
fred, fails to unify it with end_of_file, and so fails after consuming
the term; the second clause of buggy_read/1 then succeeds.)


Now, have I shown that the BIM approach is bad?  No.
What I have shown is that the end-marker approach which Quintus Prolog
inherited from C Prolog (which got it from DEC-10 Prolog, which I think
got it from Pop-2) is accompanied by a straightforward discipline for
using it to write tokenisers.  (In fact, it's exactly the same approach
you use in C.)  I am not aware of a similar methodology for the
end-failure approach, despite having asked Bart Demoen at the Prolog
Benchmarking Workshop for enlightenment on this point.  Again, this
doesn't mean that there isn't any such methodology, only that I don't
know what it might be.  If there is a straightforward way of turning
deterministic transition diagrams into end-failure code, I would be
pleased to be instructed in it.

ok@quintus.UUCP (Richard A. O'Keefe) (03/19/88)

In article <783@cresswell.quintus.UUCP>, I replied to article
<1197@kulcs.kulcs.uucp>, by bimandre@kulcs.uucp (Andre Marien)
[signed by Andre Marien and Bart Demoen both].
The topic was what read/1 and get0/1 should do at the end of a stream.

I thought you might be interested to know what the BSI committee say.

In document PS/236, "Draft minutes, Prolog Built-In Predicates meeting,
10 December 1987", we find

	4 Design criterion

	<name deleted> suggested: "Whenever possible, a predicate with
	a side effect should always succeed and never instantiate
	variables."

This of course rules get0/1 and read/1 out entirely.  That may not be
what <name deleted> _meant_, but it _is_ what the minutes say he _said_.
As far as I can tell, the real intent is to rule out retract/1, which
is disliked because it unifies its argument with the thing you removed.
The minutes show that Paul Chung proposed naming the standard clause-
removing predicate delete/1 instead of retract/1.  Good on yer, mate!
This should not be construed as endorsement of the name delete/1, but
as praise for Paul Chung's good standardisation manners.

micha@ecrcvax.UUCP (Micha Meier) (03/22/88)

In article <783@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Here's how I'd write copy_chars/0 for real:
>
>	copy_chars :-
>		get0(Char),
>		(   is_endfile(Char) -> true
>		;   put(Char),
>		    copy_chars
>		).
>
>This is only superficially different from C's
>	while ((Char = getchar()) != EOF) putchar(Char);
>
>Which version is "so obviously better and more readable"?
>
>I'm not going to try to answer that question, because it is entirely
>the wrong question.  An example this small can be coped with even if
>it is badly written.

	This is true, we have to distinguish various uses
	of get0/1. The above example is indeed more easily
	written when get0/1 fails at the eof, because the is_endfile/1
	test is not needed. However, most often one wants to do more
	with the character than just test for the eof, and only
	then do the differences become meaningful.

	By the way, get0/1 does *not* exist in BSI, it uses get_char/1 instead,
	and its argument is a character, i.e. a string of length 1.
	This means that the type 'character' is inferred from
	the type 'string' (and not the other way round as in C).
	Does anybody out there know what advantages this can bring?
	It is independent of the character <-> integer encoding,
	but only because explicit conversion predicates have
	to be called all the time.

>We can convert a deterministic finite-state automaton to Edinburgh
>Prolog with very little effort.  We represent a state of the automaton
>by a predicate with one argument: the next character.  We represent an
>arc of the automaton by a clause.  For example, the arcs
>
>	s1: a -> s2.
>	s1: b -> s1.
>	s1: $  -> accept.
>
>would be coded like this:
>	
>	s1(0'a) :- get0(Next), s2(Next).
>	s1(0'b) :- get0(Next), s1(Next).
>	s1(- 1) :- true.
>

	In his tutorial at SLP '87 Richard used another
	representation of a finite automaton which is more
	appropriate:

	s1 :-
		get0(Char),
		s1(Char).

	s1(0'a) :-
		s2.
	s1(0'b) :-
		s1.
	s1(-1) :-
		accept.

	
	The difference is that if one wants to perform some action
	in some states, this must be done *before* reading the next character,
	i.e. just at the beginning of s1/0. Such a representation can
	be more easily converted to the BSI's variant of get:

	s1 :-
		% do the corresponding action
		( get0(Char) -> s1(Char)
		;
		  accept
		).

	s1(0'a) :-
		s2.
	s1(0'b) :-
		s1.

	Note that the eof arc has to be merged into s1/0 in this way
	since if we'd write it like

	s1 :-
		s1_action,
		get0(Char),
		!,
		s1(Char).
	s1 :-
		accept.

	then after an eof we would backtrack over s1_action and undo
	what we've done.

	I must say, neither of the two seems satisfactory to me. Richard's
	version is not portable due to the -1 as eof character. We can
	improve this into

	s1(X) :-
		eof(X),
		accept.
	s1(0'a) :-
		s2.
	s1(0'b) :-
		s1.

	and hope that the compiler will unfold the eof/1 inside the
	indexing mechanism, otherwise we have choice points even
	if the code is deterministic.
	The BSI version is much more arguable, though. Having to
	wrap a disjunction (and a choice point) around the get0/1 call
	suggests that for this application the BSI choice is not
	the appropriate one. It is interesting to note, however, that
	it could work even with nondeterministic automata, where the BSI's
	failure was (I thought) more likely to cause problems.

>> BTW, if you ever want to convert a program with a different interpretation,
>> the solution is easy :
>> 
>> /*QP*/read(X) :- /*bim*/read(X), ! .
>> /*QP*/read(whatever_is_used_to_indicate_end_of_file) .
>> 
>It may be easy, but it isn't a solution.  Suppose we write this:
>
>	buggy_read(Term) :- bim_read(Term), !.
>	buggy_read(end_of_file).
>
>Now, suppose the current input stream contains
>	fred.
>and we call
>	buggy_read(end_of_file).
>IT WILL SUCCEED!  It should have failed.

	Since the Edinburgh get0/1 can easily simulate the BSI's one with

	get0_BSI(Char) :-
		get0_Edinburgh(Char),
		not_eof(Char).		% fails when Char is the eof marker

	but as Richard has shown, not vice versa, it is clear that
	for a Prolog system it is better to have get0/1 return
	some *portable* eof (e.g. the atom end_of_file, for get0/1
	there can be no confusion with source items) instead of
	some integer.
	
	  This, however, just shifts the problem up to read/1:
	BSI objects that if it returns e.g. the atom end_of_file
	then any occurrence of this atom in the source file
	could not be distinguished from a real end of file.
	In this case, a remedy would be the introduction of
	a term with a local scope (e.g. valid
	only in the module where read/1 and eof/1 are defined) and
	using eof/1 instead of unifying the argument of read/1 with
	the end_of_file term. Hence read/1 would return this term
	on encountering the file end and eof/1 would check whether
	its argument is this term.
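
	A sketch of that remedy (the marker and predicate names are
	hypothetical, and it is built here, for illustration only, on
	top of an Edinburgh read/1, so it still inherits the very
	ambiguity the module-local term is meant to remove):

	eof('$eof').			% the module-private marker

	read_local(Term) :-
		read(Term0),
		(   Term0 == end_of_file -> eof(Term)
		;   Term = Term0
		).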

--Micha

bruno@ecrcvax.UUCP (Bruno Poterie) (03/23/88)

I do not think that having read/1 return an atom on EOF is a bad thing.
If you take as an example certain UN*X tools, they read their input from
a file (typically stdin) until finding a line composed of a single dot.
So it is perfectly legal to submit a file which contains a dot-line in the
middle if you want only the first part to be fed to the tool. The same goes
for Prolog: if you have a huge file full of various facts but want only, say,
the first 100 to be used as input in your test program, then simply add
the EOF mark before line 101. I would then prefer to have it as a directive:
		...
		:- eof.

so that it is not a bare fact but actually a command to the consulting tool
that it has to treat this input source as ended. It is then consistent with other
directives like:
		...
		:- [file3,file4].
		...
which actually order the consulting tool to insert at this point the content
of the named files. I believe that the notation "eof" is quite standard in
the UN*X system and already present in some Prologs, including as a test
predicate for this very term:
		... read(Term), eof(Term) -> ...

so I think we could maybe abandon the end_of_file notation of Quintus (sorry
for you Richard, a compatibility switch could very easily turn it back anyway),
but it is not an important point, as the aim would be to discipline one's
programming style by systematically using the test form:
	eof(Term)
and never ever writing the EOF term itself explicitly. Portability is great.
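
For instance (a sketch; process/1 is hypothetical), a reading loop in
this style never writes the EOF term itself:

		read_loop :-
			read(Term),
			( eof(Term) -> true
			; process(Term), read_loop
			).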

	Now for get0/1 [or get_char/1/2]: having it return an otherwise
impossible value is fine; an atom, as suggested by Micha, is such a value,
given that the normal result is an integer representing the ASCII [or EBCDIC]
code of the character. Using the same term as the one returned by read/1,
and consequently the same test predicate as the only mechanism for checking
EOF, would greatly improve the compactness and consistency of the i/o system.
As a side effect, close/1 is not strictly necessary anymore, as the following
sequence does the job:
		eof(EOF), put(EOF)
because obviously put/1 must handle the same term in the same way (I am afraid
that outputting CHAR modulo 256 would not work in this case).
I nevertheless believe that EOF == -1 is a clearer convention, returning an
object of the same type but outside the range of normal input, and it
is already the UN*X convention. It would not force put/1 to accept it as an EOF
character, as it would simply be output as -1 modulo 256 == 255. Passing
-1 to UN*X putchar() does not generate an EOF!

Ok, enough delirium tremens for today. My main point is: the character i/o
predicates should provide a very low-level facility, with no hypothesis about
the use which could be made of them. Using read(Term) and eof(Term) provides a
uniform, simple, elegant and portable means of performing i/o at the Prolog
term level. Using get0/1 implies you are interested in the real bits contained
in your input source, so you want to control it at a low level. Returning
the -1 value is portable and low-level, because it is independent of ASCII or
any other character set. Alternatively, returning eof and using the same
eof(Char) test predicate would again be low-level, portable, and free of
any assumed semantics. More important, most Prolog input loops could be
adapted to this scheme at low cost. Failing at EOF, however, would mean
fully rewriting those applications and system libraries.

================================================================================
  Bruno Poterie		# ... une vie, c'est bien peu, compare' a un chat ...
  ECRC GmbH		#		tel: (49)89/92699-161
  Arabellastrasse 17	#		Tx: 5 216 910
  D-8000 MUNICH 81	#		mcvax!unido!ecrcvax!bruno
  West Germany		#		bruno%ecrcvax.UUCP@Germany.CSNET
================================================================================

ok@quintus.UUCP (Richard A. O'Keefe) (03/23/88)

In article <518@ecrcvax.UUCP>, micha@ecrcvax.UUCP (Micha Meier) writes:
> 	By the way, get0/1 does *not* exist in BSI, it uses get_char/1 instead,
> 	and its argument is a character, i.e. a string of length 1.
> 	This means that the type 'character' is inferred from
> 	the type 'string' (and not the other way round as in C).
> 	Does anybody out there know what advantages this can bring?
> 	It is independent of the character <-> integer encoding,
> 	but only because explicit conversion predicates have
> 	to be called all the time.

I find it extremely odd to call a string of length one a character.
It's like calling a list of integers which contains one element an
integer.  Do we call an array with one element a scalar?

I haven't commented on the BSI's get_char/1 before because for once they
have given a new operation a new name.  There are two problems with it.
A minor problem is that the result being a string, they can't represent
end of file with an additional character, so the fail-at-end approach is
hard to avoid.  (Not impossible.)  There is an efficiency problem:
something which returns an integer or a character constant can just
return a single tagged item, but something which returns a string either
has to construct a new string every time, or else cache the strings somehow.

For example, Interlisp has a function which returns you the next character
in the current input stream, represented as an atom with one character in
its name.  (Well, almost:  characters `0`..`9` are represented by integers
0..9.)  This was quite attractive on a DEC-20, where you could just compute
a table of 128 atoms once and for all.  It wasn't too bad on VAXen either,
where the table had to have 256 elements.  But it became rather more
clumsy on the D machines, which have a 16-bit character set.  (Can you say
"Kanji"?  I knew you could.)  So the alternatives I can see at the moment
are
    o	construct a new string every time.
    o	precompute 2^16 strings.
    o	cache 2^8 strings, and construct a new string every
	time for Kanji and other non-Latin alphabets.
    o	not support Kanji or other non-Latin alphabets at all.
(Can you say "Cyrillic"?  How about "Devanagari"?  You may need the
assistance of a good dictionary; I used to mispronounce "Devanagari",
and probably still do.)

I wrote that
> >For example, the arcs
> >	s1: a -> s2.
> >	s1: b -> s1.
> >	s1: $  -> accept.
> >would be coded like this:
> >	s1(0'a) :- get0(Next), s2(Next).
> >	s1(0'b) :- get0(Next), s1(Next).
> >	s1(- 1) :- true.
Meier says that
> 	In his tutorial at SLP '87 Richard used another
> 	representation of a finite automaton which is more appropriate:
> 	s1 :-
> 		get0(Char),
> 		s1(Char).
> 
> 	s1(0'a) :-
> 		s2.
> 	s1(0'b) :-
> 		s1.
> 	s1(-1) :-
> 		accept.
There wasn't time to go into this in detail in the tutorial, but it
should be obvious that the first approach is more general:  in particular
it can handle transitions where (perhaps because of context) no input is
consumed, and it can handle lookahead.
>	Such a representation can
> 	be more easily converted to the BSI's variant of get:
> 	s1 :-
> 		% do the corresponding action
> 		( get0(Char) -> s1(Char)
> 		;
> 		  accept
> 		).
This doesn't generalise as well as the end-marker version.
Here is the kind of thing one is constantly doing:

	rest_identifier(Char, [Char|Chars], After) :-
		is_csymf(Char),
		!,
		get0(Next),
		rest_identifier(Next, Chars, After).
	rest_identifier(After, [], After).

See how this code can treat the end marker just like any other
character:  because it doesn't pass the is_csymf/1 test (copied from
Harbison & Steele, by the way) we'll pick the second clause, and there
is no special case needed for an identifier which happens to be at the
end of a stream.
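
For instance, a typical caller might look like this (a sketch: token/3
and the ident/1 wrapper are hypothetical; name/2 is the usual Edinburgh
atom <-> codes conversion).  Again the end marker needs no special
treatment in the caller:

	token(Char, ident(Atom), After) :-
		is_csymf(Char),
		!,
		rest_identifier(Char, Chars, After),
		name(Atom, Chars).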

The fail-at-end approach forces us not only to do something special
with the get0/1 in rest_identifier/3, but in everything that calls it.
(In the Prolog tokeniser, there are two such callers.)

The point is that if-then-elses such as Meier suggests start
appearing all over the place like maggots in a corpse if you adopt
the fail-at-end approach, to the point of obscuring the underlying
automaton.

> 	I must say, neither of the two seems satisfactory to me. Richard's
> 	version is not portable due to the -1 as eof character.

If the standard were to rule that -1 was the end of file character,
it would be precisely as portable as anything else in the standard!
In strict point of fact, the Prolog-in-Prolog tokeniser was written
in DEC-10 Prolog for DEC-10 Prolog, and used 26 as the end of file
character, and 31 as the end of line character.  It took 5 minutes
with an editor to adapt it to Quintus Prolog.  I wish C programs
written for UNIX took this little effort to port!

> 	for a Prolog system it is better to have get0/1 return
> 	some *portable* eof (e.g the atom end_of_file, for get0/1
> 	there can be no confusion with source items) instead of
> 	some integer.

It is important that the end-of-file marker, whatever it is, should be
the same kind of thing, in some sense, as the normal values, so that
classification tests such as is_lower/1, is_digit/1, and so on will
just fail quietly for the end-of-file marker, not report errors.  Since
end of file is rare, we would like to test the other cases first.
Pop-2 on the Dec-10 returned integers almost all the time, except that
at the end of a stream you got an end-of-file object which belonged to
another data type (there was only one element of that data type, and it
printed as ^Z).  This was in practice a major nuisance, because before
you could do anything other than an equality test with the result, you
had to check whether it was the end of file mark.
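
For example, with an integer end marker a classification test needs no
special case at all; a typical definition (a sketch) is simply

	is_digit(Char) :- Char >= 0'0, Char =< 0'9.

and is_digit(-1) just fails quietly, whereas an end marker of another
type would make the comparison report an error.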

I have been giving out copies of the Prolog-in-Prolog tokeniser to show
how easy it is to program character input with the Edinburgh Prolog
approach.  If someone would give me a tokeniser for BSI Prolog written
entirely in BSI Prolog using the fail-at-end approach, and if that
tokeniser were about as readable as the Prolog-in-Prolog one, that would
go a long way towards convincing me that fail-at-end was a good idea.

> 	BSI objects that if [read/1] returns e.g. the atom end_of_file
> 	then any occurrence of this atom in the source file
> 	could not be distinguished from a real end of file.

That's not a bug, it's a feature!  I'm serious about that.  At Edinburgh,
I had the problem that if someone asked me for help with Prolog, they
might be using one of four different operating systems, where the end
of file key might be
	^Z
or	^D
or	^Y
or	something else which I have been glad to forget.
No problem.  I could always type
	end_of_file.
to a Prolog listener, and it would go away.  Oh, this was so nice!
In fact, on my SUN right now I have function key F5 bound to
"end_of_file.\n" so that I can get out of Prolog without running the
risk of typing too many of them and logging out.

Another thing it is useful for is leaving test data in a source file.
One can do
	<declarations>
	<clauses>
	end_of_file.
	<test cases>
and include the test cases in the program or not just by moving the
end_of_file around.

Ah, you'll say, but that's what nested comments are for!
Well no, they don't work.  That's right, "#| ... |#" is NOT a reliable
way of commenting code out in Common Lisp, and "/* ... */" is NOT a
reliable way of commenting code out in PopLog.  But end_of_file, in
Edinburgh Prolog, IS a reliable way of commenting out the rest of the file.

> 	In this case, a remedy would be the introduction of

Prolog needs a remedy for end_of_file like Elizabeth Schwarzkopf
needs a remedy for her voice.

Before taking end_of_file away from me, the BSI committee should supply
me with a portable way of exiting a break level and a reliable method of
leaving test cases in a file without having them always read.

ok@quintus.UUCP (Richard A. O'Keefe) (03/24/88)

In article <519@ecrcvax.UUCP>, bruno@ecrcvax.UUCP (Bruno Poterie) writes:
> I believe that the notation "eof" is quite standard in
> the UN*X system and already present in some Prologs

I just grepped through the UNIX [UNIX is a trademark of AT&T] manuals,
and all I could find was the function feof(Stream).  None of the UNIX 
utilities I am familiar with uses "eof" to signify end of file.
Franz Lisp does something interesting:
	(ratom [Port [Eof]])
	(read  [Port [Eof]])
	(readc [Port [Eof]])
return the Eof argument (which defaults to nil) when you read the
end of the file, so you can get whatever takes your fancy.
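
The Prolog analogue would be a read with an extra argument (a sketch;
read_or_eof/2 is a hypothetical name, defined here on top of the
Edinburgh read/1):

	read_or_eof(Eof, Term) :-
		read(Term0),
		(   Term0 == end_of_file -> Term = Eof
		;   Term = Term0
		).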

> so I think we could maybe abandon the end_of_file notation of Quintus (sorry
> for you Richard, a compatibility switch could very easily turn it back anyway),

But it ***ISN'T*** a Quintus notation!  This is the notation used by
	DEC-10 Prolog
	EMAS Prolog
	C Prolog
	Quintus Prolog
	Stony Brook Prolog
	ALS Prolog
	Expert Systems International Prolog-2
	AAIS Prolog (in "Edinburgh" mode only)
and doubtless many others.  end_of_file IS the "de facto" standard.
Poterie's suggestions are good ones, but in order to overthrow the
de facto standard, they would have to be MUCH MUCH better, and they
aren't.

> but it is not an important point, as the aim would be to discipline one's
> programming style by systematically using the test form: 
> 	eof(Term) 
> and never ever writing the EOF term itself explicitly. Portability is great.

Beware.  While Quintus Prolog offers the library predicate
	is_endfile(?EndMarker)
there are other Prolog systems, such as AAIS Prolog, where there is a
predicate with a similar name which takes a Stream argument:
	is_eof(+Stream)
in AAIS Prolog means "is it the case that Stream is positioned at its end?".
Yes, portability is great, but would it not be more just to reward those
people (such as SICS, Saumya Debray, ALS, and others) who have tried to
provide it, by standardising their solution?

> As a side effect, close/1 is
> not strictly necessary anymore as the following sequence does the job:
> 		eof(EOF), put(EOF)
Um, what about INPUT streams?  And there is another reason for wanting
close/1:  it will close a stream which is not the current output stream.

ok@quintus.UUCP (Richard A. O'Keefe) (03/24/88)

Just to continue with the get0 topic:

	The fail-at-end approach rests on an assumption
	which deserves to be made explicit, because it is false.

What is the assumption?   That receiving the end-of-file indication from
an operating system indicates that there is nothing further to read in
that stream.  This is false?  Yes.

Let's ignore 4.2BSD sockets, V.3 Streams, non-blocking I/O, VMS
concatenated files, and other esoterica which one doesn't expect
BSI Prolog to cope with.  Let's just consider terminals.

In Tops-10 (home of DEC-10 Prolog):
	end-of-file from the terminal is a software convention (^Z).
	You can just keep reading from the terminal after that, and
	in fact that's exactly what DEC-10 Prolog does.

In UNIX (original home of C Prolog):
	end-of-file from the terminal is a software convention
	(EOF character typed after an empty line).
	You can just keep reading from the terminal after that, and
	in fact that's exactly what C Prolog does.

In VM/CMS, using SAS Lattice C
	end-of-file from the terminal is a software convention
	(some magic string, which defaults to "EOF", but it is
	trivially easy for a program to change it -- use afreopen()).
	I believe that you can keep reading from the terminal after
	that, but I haven't tried it myself.

On a Macintosh, using ALS Prolog
	end-of-file from a window is a software convention
	(you click on "End of File" in the "Edit" menu).
	All windows and streams remain live after that, and
	you can just keep reading, and that's what ALS Prolog does.

On a Xerox Lisp Machine, using Xerox Quintus Prolog
	end-of-file from a TEXEC window is a software convention.
	All windows and streams remain live after that, and
	you can just keep reading, and that's what XQP does.

[The sample of Prologs is not of interest here; my point is that there
 are several *operating systems* with this characteristic.
]
So the rule actually followed in Edinburgh-compatible Prologs is that
    -   the sequence of character codes returned by get0/1 is
	the sequence of characters delivered by the source
    -   with the end-of-file marker inserted every time the
	host indicated the end-of-file condition
    -	Prolog receives through get0/1 as many characters and as many
	end-of-file markers as there are; any attempt to read past the
	end of this union stream is an error.  Not a failure, an error.

It happens that when you are reading from disc files, most operating
systems will indicate the end of file condition once.

Are terminals the only kind of file for which multiple end-of-file
conditions are plausible?  No.  The convention for tapes is that a
single tape-mark (usually reported as an end-of-file condition) is
merely a separator; a tape as such is terminated by a double tape-mark.
Thus a Prolog program copying one tape to another (this is a reason why
we might want put(-1) NOT to close a file; if it does anything special
on a tape it should be to write a tape-mark) might want to keep reading
after seeing an end-marker.

grzm@zyx.UUCP (Gunnar Blomberg) (03/25/88)

In article <801@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>[...]  So the alternatives I can see at the moment
>are
>    o	construct a new string every time.
>    o	precompute 2^16 strings.
>    o	cache 2^8 strings, and construct a new string every
>	time for Kanji and other non-Latin alphabets.
>    o	not support Kanji or other non-Latin alphabets at all.

How about:
     o	support an immediate representation for characters.
if you've got room for them in your pointers. Or
     o	cache them as they occur.
if you haven't.

I can't see that the fact that characters look like one-element strings
to the Prolog programmer in any way would stop an implementor from
implementing them using the same tricks as if characters were a
separate data-type.  Yes, it makes the internal string-handling
somewhat more convoluted, but not unduly so, I would say.

-- 
Gunnar Blomberg, ZYX, +46 8 6653205, grzm@zyx.se

grzm@zyx.UUCP (Gunnar Blomberg) (03/25/88)

Hmm...  isn't this a lot of fuss about very little?

It seems to me that whatever semantics is chosen, it is simple to get
the other:

BSIread(X) :-
   DEC10read(X),
   X \== end_of_file.

DEC10read(X) :-
   BSIread(Y),
   !,
   X = Y.
DEC10read(end_of_file).

Given that most Prologs seem to use the DEC-10 Prolog approach, and
that it is probably marginally more efficient to write BSIread in
terms of DEC10read than the other way around, the DEC-10 approach seems
the obvious choice.  Not that I think the other choice is all that
much worse...  Isn't it more interesting to discuss things where it is
harder to get it the way one wants (like the question raised by
Richard O'Keefe about whether a string data-type is necessary, or even
useful.  Now *that* is interesting!)

----------
At this point I had a discussion with a colleague of mine, and it
turns out that it isn't this simple.  In fact, I now believe that it
is impossible to get the BSIread functionality from a system that only
provides the DEC-10 one.  The predicate BSIread above will fail if the
file read contains 'end_of_file', of course.  This (for me) tips the
balance over in favor of the BSI approach.  It is after all easy to
write DEC10read in terms of BSIread.

Naturally there should be a provision for compatibility with "old"
programs.  I would be quite happy to name BSIread read_term, for
instance, and provide a user-level predicate read, that could be
redefined to give the required semantics.

-----------
As far as get0 goes, the question is much easier, since there *is* an
obvious out-of-band value, namely -1.
-- 
Gunnar Blomberg, ZYX, +46 8 6653205, grzm@zyx.se

grzm@zyx.UUCP (Gunnar Blomberg) (03/25/88)

In article <814@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Just to continue with the get0 topic:
>
>	The fail-at-end approach rests on an assumption
>	which deserves to be made explicit, because it is false.
>
>What is the assumption?   That receiving the end-of-file indication from
>an operating system indicates that there is nothing further to read in
>that stream.  This is false?  Yes.
>
[Lots of examples deleted]

This argumentation seems a little doubtful to me.  I don't have
experience with all the systems RAO'K mentions, but (to the best of my
memory) I have never seen a use of end-of-file from the terminal that
wasn't being used to pretend that the terminal was more than one file.

Cases in point:

DEC-10 Prolog (on TOPS-20, alas):
	User says [user], gives clauses and ends with ^Z.  The system
	pretends that there is a file 'user' by reading from the
	terminal until end-of-file is seen.  As far as Prolog is
	concerned the file ended at that point, and no more reading
	is done from that particular file at that point.

Using the terminal as standard input in Unix:
	Example: user types 'cat >foo' and then writes contents of file
	on terminal, indicating end by end-of-file.  As far as the
	reader of that particular input is concerned the file ended at
	that point, and no more reading is done from that particular
	'file'.

In conclusion:  I think that software conventions concerning
end-of-file from the terminal exist primarily to enable the
system/user to pretend that the terminal is more than one file.  In
fact, I know of no instance where this is not so.  Can somebody come
up with an example where multiple end-of-files are actually used in
one single ('conceptual') file?

-- 
Gunnar Blomberg, ZYX, +46 8 6653205, grzm@zyx.se

grzm@zyx.UUCP (Gunnar Blomberg) (03/25/88)

In article <801@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Another thing it is useful for is leaving test data in a source file.
>One can do
>	<declarations>
>	<clauses>
>	end_of_file.
>	<test cases>
>and include the test cases in the program or not just by moving the
>end_of_file around.
>
>Ah, you'll say, but that's what nested comments are for!
>Well no, they don't work.  That's right, "#| ... |#" is NOT a reliable
>way of commenting code out in Common Lisp, and "/* ... */" is NOT a
>reliable way of commenting code out in PopLog.  But end_of_file, in
>Edinburgh Prolog, IS a reliable way of commenting out the rest of the file.

	Well, considering the fact that nested comments can comment out
*any* part of the file, not just the last part, and that the cases
where nested comments do not work must be so exceedingly rare as to be
practically non-existent, I would definitely prefer nested comments.
Honestly, how often do you have an unmatched beginning-of-nested-comment
or end-of-nested-comment buried inside your code?
	Well, just because nested comments are much more useful than
plain ones does not mean that BSI should adopt them.  There is the
question of supporting "old" code.  It would be interesting to know
how many programs would break if Prolog comments were changed to be
nesting.  Do you know of any?
[I have actually seen the following style used in C:
	/* #define wantFOO 1	/* To get foo feature */
	#define wantBAR 1	/* To get bar feature */
	/* #define wamtBAZ 1	/* To get baz feature */
 It gave me a good laugh at the time.]
	In any case, I have always considered the use of end_of_file
to get some kind of half-baked ability to comment out a part of a file
as an abomination (which does not mean I didn't use it and find it
useful).
-- 
Gunnar Blomberg, ZYX, +46 8 6653205, grzm@zyx.se

cdsm@ivax.doc.ic.ac.uk (Chris Moss) (03/25/88)

In article <801@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
rok>I find it extremely odd to call a string of length one a character.
rok> ...  But it became rather more
rok>clumsy on the D machines, which have a 16-bit character set.  (Can you say
rok>"Kanji"?  I knew you could.)

Yes, the BSI committee is just beginning to face up to this problem,
as the Japanese have just started taking an interest...
As Richard points out, it's not much of a problem for a character-based
definition, which I personally would favour.

rok>The fail-at-end approach forces us not only to do something special
rok>with the get0/1 in rest_identifier/3, but in everything that calls it.

rok>(In the Prolog tokeniser, there are two such callers.)
rok>
rok>The point is that if-then-elses such as Meier suggests start
rok>appearing all over the place like maggots in a corpse if you adopt
rok>the fail-at-end approach, to the point of obscuring the underlying
rok>automaton.

I think this is a fair point when looking at the definition of
lexical analysers, however...

mmeier> I must say, neither of the two seems satisfactory to me. Richard's
mm> 	version is not portable due to the -1 as eof character.

A character definition which included a (special) end-of-file token would be
better.

mm> 	BSI objects that if [read/1] returns e.g. the atom end_of_file
mm> 	then any occurrence of this atom in the source file
mm> 	could not be distinguished from a real end of file.
rok>
rok>That's not a bug, it's a feature!  I'm serious about that. 

I don't think that is any better than most uses of that particular
argument. Sure, if you learn to live with it you can find uses for it.

rok>Before taking end_of_file away from me, the BSI committee should supply
rok>me with a portable way of exiting a break level and a reliable method of
rok>leaving test cases in a file without having them always read.

And this is the death of any standardization process!  I have yet to
find the document that Richard referred to (a few days ago) when he
claimed that the BSI's mandate was to standardize Edinburgh Prolog.
It certainly hasn't been repeated in all the other formal
presentations that have been made to BSI or ISO. But if one has to
follow every wrinkle of an implementation just because it represents
(arguably) the most popular dialect, then why don't we just
appoint IBM to write all our standards for us (or Quintus or ...)?
[And who is the TRUE inheritor of the title "Edinburgh Prolog" anyway? Is
it the commercial product (formerly NIP) now being sold under that title?]

To return to the argument, I think there's a significant difference between
get0 and read. The end-of-file marker for read is (almost)
never used to implement finite-state machines. Instead it is used
for repeat-fail loops.

e.g.  go :- repeat,
	    read(Term),
            (Term=end_of_file -> true; process(Term), fail).

Now in the days before tail recursion and all the other optimizations
this was inevitable. But why should we encourage this approach today?

The above clause is a good example of the trickiness of "repeat". I always
write repeat loops wrong first time and this was no exception. I put 
            (Term=end_of_file -> true; process(Term)), fail.
then changed it to
            (Term=end_of_file -> !; process(Term)), fail.
before settling on the above version.  I personally think "repeat" should
be left out of the standard (there's no penalty overhead in not having it
built-in these days anyway). Don't other people have my problem?
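
For comparison, the plain recursive version (process/1 as above) needs
neither "repeat" nor a cut:

      go :- read(Term),
            (Term=end_of_file -> true; process(Term), go).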

It would seem to encourage better programming if we allowed "get0"
(or get_file or whatever) to return an end-of-file token, and any
high-level routines to fail at end-of-file. It's not particularly
consistent, but I don't know whether that's a priority in this case.

rok>In fact, on my SUN right now I have function key F5 bound to
rok>"end_of_file.\n" so that I can get out of Prolog without running the
rok>risk of typing too many of them and logging out.

I seem to get by perfectly well by setting "ignoreeof" in my cshell!

rok>Ah, you'll say, but that's what nested comments are for!
rok>Well no, they don't work.  That's right, "#| ... |#" is NOT a reliable
rok>way of commenting code out in Common Lisp, and "/* ... */" is NOT a
rok>reliable way of commenting code out in PopLog. 

That seems to be the best argument for allowing end-of-line comments in
Prolog. Now where do I find the Emacs macro for commenting out all lines
between dot and mark (and removing such comments)?

Chris Moss

Disclaimer: unless I say otherwise I am expressing my personal opinions
NOT the opinions of any committee!

ok@quintus.UUCP (Richard A. O'Keefe) (03/27/88)

In article <2410@zyx.UUCP>, grzm@zyx.UUCP (Gunnar Blomberg) writes:
> Hmm...  isn't this a lot of fuss about very little?

No.

I have a suggestion for you.  Write a Pascal tokeniser in the
following three programming languages:
    o	C (end-of-file is a special value)
    o	Pascal (end-of-file is tested by eof(input))
    o	PL/I (end-of-file is an exception).
_Then_ come back and tell us it's "very little".
Based on my experience with these three, I'd rank them out of 10 on a
"difficulty" scale as C:  1, Pascal:  3, PL/I:  10.  (Try telling a C
programmer that he would be better off if end-of-file were handled by
a new SIGEOF signal.  If, back when I was writing PL/I, you had offered
me a version of PL/I which handled end-of-file the way Pascal does, I'd
have thanked you with tears in my eyes.)

What happens when you hit the end of a file is not a minor matter.
After all, every file has at least one end!  If we were designing a new
programming language, it would deserve the most careful attention.  The
treatment of end-of-file has a large effect on the structure of programs.

But the Prolog standard is not supposed to be a matter of designing
a new programming language.  I keep saying this, and people seem to
keep failing to see the point:  the criteria for changing an existing
language are MUCH more stringent than the criteria for designing a
new one.  For example, I think that abbreviations in the names of
evaluable predicates are bad, so that argument/3 would be a better
name than arg/3.  So what?  It isn't better ENOUGH to warrant the
change.  I could list a score of such things in Edinburgh Prolog which
are not to my personal taste, and which I believe I have objective
grounds for criticising.  What of that?  There is none of them bad
enough to warrant my breaking other people's code.  Now changing the
behaviour of get0/1 and read/1 would break every program I have ever
written that does any input.  (The change from is_endfile(26) in
DEC-10 Prolog and some versions of C Prolog to is_endfile(-1) in
some versions of C Prolog and Quintus Prolog took an average of
about 10 seconds per file to fix with a good editor.)

If someone comes up to you and asks you to improve their programming
language, you have a pretty heavy responsibility to do a good job of
it.  Quintus move very slowly and very cautiously:  once we've put
something in the language, customers are likely to start using it,
and pulling a feature out on the grounds that we don't like it any
more is not really ethical behaviour.  The moral responsibility of
a group of people who take it on themselves to change a language
around without being asked to by the people who will be affected by
such changes is much much greater.  At the very least, a paramount
concern of such a group should be to provide enough operations and
hooks in the standard that "99%" compatibility packages for some
reasonably representative set of dialects should be KNOWN to be
definable using standard operations.  For example, in my work on
this in 1984, I very carefully worked through Waterloo Prolog (NOT
an Edinburgh-compatible Prolog) to find out what extra hooks would
be needed.

> It seems to me that whatever semantics is chosen, it is simple to get
> the other:

> BSIread(X) :-			| get_char(X) :-
>    DEC10read(X),		|	get0(C),
>    X \== end_of_file.		|	C =\= -1,
				|	string_list(X, [C]).

> DEC10read(X) :-		| get0(C) :-
>    BSIread(Y),		|	( get_char(X) -> string_list(X, [C])
>    !,				|	; C = -1
>    X = Y.			|	).
> DEC10read(end_of_file).

I can't find a BSI document which describes read/1 anything but vaguely,
so I've added the character I/O versions on the right, and it's those
I'll comment on.  (By the way, string_list/2 is a pretty appalling name;
you would expect it to have something to do with lists of strings.)

The latest character I/O document I checked was so phrased as to suggest
that having failed once, get_char/1 would continue to fail.  There was a
note which pointed out that it was still an open question whether
get_char/1 should do this or should report an error if called again
after having once failed.  This presumably carries over to read/1.
So we simply don't yet know whether the first definition is correct or
not, because BSI I/O is not yet fully defined.

    Case 1:  calling get_char/1 after it has already failed results in
	     an error report.  The cross-definitions of get_char/1 and
	     get0/1 would then be correct, IF an end-of-file condition
	     could be indicated only once in a file, which is false.

    Case 2:  get_char/1 keeps on failing quietly.  Then none of the
	     cross-definitions would be correct.

Since the only motivation that anyone has ever told me about for the
fail-at-end approach is the analogy between a file and a list of
characters, case 2 is the "natural" one.  That is, a parallel is
thought to exist between
	next_term([Head|Tail], Head, Tail).
and
	next_term(File, Head) :- read(File, Head).
and if we take this seriously, we would expect read(File, Head) to
keep on failing at the end of a file, just as next_term([], Head, _)
would keep failing.  Now the analogy is very far from being a good
one, so there may be some other motivation I have not been told about
which would make case 1 the "natural" one.

Even in case 1, and even discounting the extremely useful possibility of
a literal 'end_of_file' appearing in the input, it is still not clear
that the cross-definitions for read/1 would be correct.  There are two
difficulties:  what about syntax errors?  and what about end of file?
There are end-of-file problems in read/1 additional to those in get0/1,
due to the fact that a term is an extended object, and the fact that
read/1 may consume arbitrarily many characters without encountering a
term.

> It is after all easy to write DEC10read in terms of BSIread.

Strictly speaking, it is impossible, because the two syntaxes are
different.  Even ignoring that, it isn't clear to me that it is possible.
It *would* be possible to write read/1 in terms of get_char/1 (though it
would be rather more painful than it would be given get0/1).

ok@quintus.UUCP (Richard A. O'Keefe) (03/27/88)

In article <2412@zyx.UUCP>, grzm@zyx.UUCP (Gunnar Blomberg) writes:
> 	Well, considering the fact that nested comments can comment out
> *any* part of the file, not just the last part, and that the cases
> where nested comments do not work must be so exceedingly rare as to be
> practically non-existent, I would definitely prefer nested comments.
> Honestly, how often do you have an unmatched beginning-of-nested-comment
> or end-of-nested-comment buried inside your code?

I see, it is ok to have an operation which works 99.9% of the time?
It would be ok if "X is 1+1" almost always gave you the answer 2 but
*might* give you 47?  You would be happy to cross a ravine on a bridge
which has been known to collapse before, but doesn't collapse often?

I have found at least two C compilers where
	char *file_pattern = "/usr/me/foo/*";
broke because the pre-processor thought the /* was the beginning of a
comment.  Are you trying to tell me that a Prolog programmer writing
for UNIX is never going to say
	file_pattern("/usr/me/foo/*").
and is never going to want to comment that out?  Are you really?

When I learned PL/I, the instructor stressed very strongly that we
should never start a /**/ comment in column 1.  Remember why?  No
doubt the JCL designers would have said "Honestly, how often do you
have /* buried inside your data decks?".

Perhaps I'm an old-fashioned fuddy-duddy, but it seems to me that
an operation should ALWAYS do exactly what it is supposed to do,
or else TELL you that it went wrong.  {And yes, DEC-10 Prolog didn't
live up to this, and no, I've never said that the standard should be
identical to DEC-10 Prolog.}

Yes, commenting file_pattern(...) out with PL/I-style comments isn't
going to work either, but with PL/I-style comments you KNOW that there
is no reason to expect it to work.  Commenting it out with "%" WILL work.
End-of-line comments are a much more reliable method of commenting
out blocks of code than nesting comments, and are already part of the
language.

> 	Well, just because nested comments are much more useful than
> plain ones does not mean that BSI should adopt them.  There is the
> question of supporting "old" code.  It would be interesting to know
> how many programs would break if Prolog comments were changed to be
> nesting.  Do you know of any?

(1) BSI Prolog is going to break existing code in much worse ways than
    that.  Old code written in ESI Prolog-2 stands a good chance of
    running under BSI Prolog, but old code written in Arity/Prolog has
    very little chance of getting away without massive changes.    

(2) The DEC-10 Prolog library broke when it was mounted under PopLog
    because PopLog used Modula-style comments and Edinburgh Prolog
    uses PL/I-style comments.

(3) With a table-driven tokeniser, there isn't any reason at all why
    the Prolog standard can't make *BOTH* types of comment available.

(4) If you think of "commenting-out" not as a matter of adding some
    characters at the beginning of a region and some other characters
    at the end, but as an operation on the entire region, you'll soon
    realise that you don't actually need nesting comments to be able
    to comment out a region that contains comments.  For example, the
    editor I am using at the moment has commands
	Ctrl-X Ctrl-[	comment out the region using /* */
	Ctrl-X Ctrl-]	undo the effect of ^X^[
    Works fine *even when nesting comments would break*, and I can use
    it in C as well as Prolog.  (Actually, in Prolog I use a Meta-%
    command which uses "%".)  So "much more useful" I take leave to doubt.

ok@quintus.UUCP (Richard A. O'Keefe) (03/28/88)

In article <2411@zyx.UUCP>, grzm@zyx.UUCP (Gunnar Blomberg) writes:
> In article <814@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
> >Just to continue with the get0 topic:
> >	The fail-at-end approach rests on an assumption
> >	which deserves to be made explicit, because it is false.
> >What is the assumption?   That receiving the end-of-file indication from
> >an operating system indicates that there is nothing further to read in
> >that stream.  This is false?  Yes.

> This argumentation seems a little doubtful to me.  I don't have
> experience with all the systems RAO'K mentions, but (to the best of my
> memory) I have never seen a use of end-of-file from the terminal that
> wasn't being used to pretend that the terminal was more than one file.

> DEC-10 Prolog (on TOPS-20, alas):
> 	User says [user], gives clauses and ends with ^Z.  The system
> 	pretends that there is a file 'user' by reading from the
> 	terminal until end-of-file is seen.  As far as Prolog is
> 	concerned the file ended at that point, and no more reading
> 	is done from that particular file at that point.

Wrong.  Dec-10 Prolog does **NOT** think the file ended at that point.
compile(user), consult(user), break, or anything which uses the usual
consult loop will stop.  But it will stop because it has seen 'end_of_file',
not because it thinks the file has ended.  Those loops don't do any more
reading because they DON'T do any more reading, not because they CAN'T.
Here is a transcript:

	| ?- read(A), read(B), read(C), read(D), read(E).
	|: this-is-a.
	|: ^Z
	|: this-is-c.
	|: ^Z
	|: ^Z
	A = this-is-a,
	B = D = E = end_of_file,
	C = this-is-c 

> In conclusion:  I think that software conventions concerning
> end-of-file from the terminal exist primarily to enable the
> system/user to pretend that the terminal is more than one file.

This may be misleading:  the effect is of NESTED files, not of a
sequence of files.  (In fact if you want, you can even interleave
several input streams from a terminal, each stream getting its own
end-of-file.)

Blomberg seems to suggest that by saying that the purpose of multiple
end-of-file conditions is to permit multiple uses of the terminal, he
has shown that it is not important to preserve this behaviour.  I hope
that I have misunderstood him.

(1) I took some pains in my previous message to point out that it is not
    only terminals which can experience multiple end-of-file conditions.
    {Think for a moment about doing BSD-style non-blocking I/O in Prolog.
    Why _not_?}  And I specifically mentioned 'user'.  Even in the case
    of terminals, this behaviour can be obtained with _any_ (real or
    pseudo-) "terminal", not just 'user'.

(2) Nested "files" on a terminal are something one uses many times a day.
    Think about break/0 and debugging.

(3) "I don't use it" is not good justification for "it should go".
    I don't use ancestral cuts.  Is that any sort of justification for
    saying they should not be in the language?  No way!  I do say it,
    but I need much better evidence than that:  I had to show how you
    could write a Prolog interpreter without them, and I had to find
    arguments to show that ancestral cuts were bad in themselves.

Before we get into further debate about multiple end-of-file conditions
I should do a better job of explaining why I raised the point.  The
ability to 'consult(user)' and to 'break' is very useful for debugging.
Having read/1 and get0/1 return an end-of-file marker when the operating
system indicates end-of-file is a coherent convention which additionally
makes this use of the terminal very easy, without requiring special
conventions for resetting the state of the terminal stream.  (You get
nested data streams from 'user' without having to open new streams.)
It did not appear from the few sparse minutes that exist that the BSI
committee had ever considered this.  I claim that this is existing
practice in "Edinburgh Prolog" (the BSI definition of which is "what
Clocksin & Mellish say"), and that extraordinarily good reasons are needed
to warrant a change in existing practice, and that those reasons have not
yet been explained.  Further, I pointed out the possibility of multiple
end-of-file conditions to show that the claim that it is easy to simulate
existing practice using the fail-at-end approach is a false claim.
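
To make that concrete, here is a minimal sketch of the kind of loop
that consult(user) and break/0 depend on; handle/1 is a stand-in for
whatever processing is wanted.  Because read/1 returns 'end_of_file'
instead of failing, the loop stops cleanly at ^Z, and a later call can
read from 'user' again without any resetting:

	process_user :-
		read(Term),
		(   Term == end_of_file -> true
		;   handle(Term),		% stand-in for real work
		    process_user
		).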

I do not claim that returning an end-marker at the end is the only sensible
thing to do in all circumstances, nor that a Prolog standard should mandate
ALS-Prolog behaviour only.  Whoever it was who suggested that it should be
possible to get an end-marker, a failure, or an error, was clearly right.
Who could object to a standard which made it easier for Prolog users to
get their existing code running?

ok@quintus.UUCP (Richard A. O'Keefe) (03/28/88)

In article <243@gould.doc.ic.ac.uk>, cdsm@ivax.doc.ic.ac.uk (Chris Moss) writes:
> I have yet to
> find the document that Richard referred to (a few days ago) when he
> claimed that the BSI's mandate was to standardize Edinburgh Prolog.
> It certainly hasn't been repeated in all the other formal
> presentations that have been made to BSI or ISO.

Don't bother, I'm about to post it.

> But if one has to
> follow every wrinkle of an implementation just because it represents
> (arguably) the most popular dialect, then why don't we just
> appoint IBM to write all our standards for us (or Quintus or ...)?
> [And who is the TRUE inheritor of the title "Edinburgh Prolog" anyway? Is
> it the commercial product (formerly NIP) now being sold under that title?]

No, one doesn't have to follow every wrinkle of an implementation.
How often do I have to repeat it?  I don't give a continental for
implementors or implementations.  Not Quintus, not ALS, not Arity, not
IBM, not Borland, not any of them.

	+---------------------------------------+
	|  What I care about is Prolog *USERS*  |
	+---------------------------------------+

The question to be asked every time is "might this change break a reasonable
program?"  "How can we make it easy for people who are already using Prolog
to change over to the standard, especially for people using Prolog systems
whose vendors have made some attempt at compatibility?"

What is "Edinburgh Prolog"?  I have two definitions, which have much the
same practical effect.
(1) An "Edinburgh Prolog" is one whose implementors made a serious and
    reasonably successful attempt to make their system compatible with
    Clocksin & Mellish, or better yet, with DEC-10 Prolog or C Prolog.

(2) An "Edinburgh Prolog" is one to which I can port the DEC-10 Prolog
    library in two days, with only a text editor to help me.  (That is,
    no boot-strapping through Prolog or Lisp.)

By either definition, we have the following results:
	Dialect		is "Edinburgh Prolog"?
	VM/PROLOG	no
	Waterloo Prolog	no
	AAIS Prolog	not quite, but closer than BSI Prolog (estimated by test (2))
	BIM Prolog	in "native" mode, no, in "-c" mode almost
	IF Prolog	yes
	Arity Prolog	yes
	ESI Prolog-2	no
	micro-PROLOG	no
	Stony Brook	yes
	SICStus Prolog	yes
	ALS Prolog	yes
	NU Prolog	yes
	Poplog		yes (well, it was in mid-1984)
	LM Prolog	no
	NIP		yes (in 1985)
Other Prolog versions are omitted because I haven't got access to manuals
for them and haven't used them.  The fact that I classify something as
"not an Edinburgh Prolog" does not mean that I regard it as technically
inferior, only that I regard it as sufficiently different to be hard to
port to or from.

> Now in the days before tail recursion and all the other optimizations
> this was inevitable. But why should we encourage this approach today?

A very simple reason:  there are still people trying to use Prolog on
IBM PCs and clones, and several PC Prologs have 64kbyte stacks.  So
failure-driven loops are still necessary on those machines, not because
the Prolog systems are bad, but because they are good enough for the
limitations of the machine to be encountered.

> I seem to get by perfectly well by setting "ignoreeof" in my cshell!

This doesn't work terribly well if you are using the Bourne shell.
Is Prolog to be standardised only for people who use the C shell?

> Now where do I find the Emacs macro for commenting out all lines
> between dot and mark (and removing such comments)?

Well, the editor which I claimed does it isn't Emacs.  Commenting out lines
using "%" is 18 lines of C.  Commenting out the region with /**/ and undoing
that come to 40 lines of C.  I'll send these by E-mail if you're interested:
it should be easy to translate them into mock-Lisp, except that I strongly
dislike mock-Lisp.
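
For the curious, the line-oriented version is easy even in Prolog.  A
hedged sketch (invented for illustration, not the C code mentioned
above), using the -1 end-of-file convention under discussion: it
copies current input to current output, prefixing each line with "% ".

	comment_out :-
		get0(Char),
		comment_out(Char).

	comment_out(-1) :- !.		% empty input: nothing to prefix
	comment_out(Char) :-
		write('% '),		% start a commented line
		copy_line(Char).

	copy_line(-1) :- !.		% file ended without a final newline
	copy_line(10) :- !,		% 10 = newline
		nl,
		comment_out.		% next line, if any
	copy_line(Char) :-
		put(Char),
		get0(Next),
		copy_line(Next).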

lee@mulga.oz (Lee Naish) (03/30/88)

In article <243@gould.doc.ic.ac.uk> cdsm@doc.ic.ac.uk (Chris Moss) writes:
>e.g.  go :- repeat,
>	    read(Term),
>            (Term=end_of_file -> true; process(Term), fail).
>
Testing for the end of file term using = is a common error: if a
variable is read, the test succeeds and the loop stops too early.
A second criticism I would make of this code is that it has a tendency
to loop.  I think that every repeat/0 should have a matching cut.

I posted a nicer way to encapsulate this backtracking style of reading
terms a while back.  It is also possible to move the read into a
recursive loop, avoiding repeat and the need for cut.  With TRO (tail
recursion optimisation), it is just as efficient.  Interestingly, it
works whether read fails or succeeds on eof.

	% returns all terms read by backtracking
	% (should have stream/file arg and close file at eof?)
	read_terms(Term) :-
		read(Term1),
		\+ isEof(Term1),	% if you don't want to return end_of_file
		(   Term = Term1
		;   % \+ isEof(Term1),	% if you do want to, move the test here
		    read_terms(Term)
		).
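
As a hedged usage sketch (process/1 is a stand-in), the whole input
can then be processed by an ordinary solutions-driven loop, with
read_terms/1 hiding all the eof handling:

	process_all :-
		(   read_terms(Term),
		    process(Term),
		    fail
		;   true
		).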

Richard mentioned some subtle differences between eof/1, is_eof/1 etc. in
different systems.  There is another one which he missed: in NU-Prolog,
isEof/1 checks whether its argument is the eof term (so reading variables
works), and eof/1 returns the eof term (which is end_of_file for
compatibility).

Now for my suggestion of a new predicate which can be used to implement
your favourite version of read/1:

	read_term(Stream, Term)		% change the name if necessary

1) If it succeeds in reading a term T,
		Term = term(T, VarNames)
	where VarNames is some structure which allows mapping between
	variables and their names.  Wrapping a functor around the term
	makes it easy to distinguish between variables, 'end_of_file' and
	real end of file.  It also lets us retain variable name
	information.

2) If end of file is encountered for the first time, or if an end of
	file marker occurs next in the stream (like ^Z on a terminal)
		Term = eof_marker

3) If eof has already been read and multiple eof markers are not
	possible
		Term = error(past_eof)
	Whether this is an error is arguable, but by explicitly
	returning something, the programmer has the choice of what
	to do.  Rather than having a proliferation of top level
	functors being returned by read_term, it seems reasonable to
	wrap the error functor around past_eof.

4) If there is a syntax error
		Term = error(syntax(X))
	where X is some indication of the error

5) If Stream is not a valid stream
		Term = error(invalid_stream(X))
	where X is some indication of the error

6) If there has just been a disk head crash
		Term = error(unix(hardware(disk_head_crash(X))))
etc, etc.
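
As a hedged sketch of the intended use (user_input and
report_syntax_error/1 are assumed names, not part of the proposal; the
other error(_) cases are left to taste), an Edinburgh-style read/1 can
be rebuilt on top of read_term/2:

	read(Term) :-
		read_term(user_input, Result),
		(   Result = term(T, _VarNames) -> Term = T
		;   Result = eof_marker -> Term = end_of_file
		;   Result = error(syntax(X)) ->
		    report_syntax_error(X),	% print a message, DEC-10 style
		    read(Term)			% then try the next term
		;   Result = error(past_eof) -> Term = end_of_file
		).

and the fail-at-eof convention drops out just as easily, since
eof_marker cannot unify with term(_, _):

	/*bim*/read(Term) :- read_term(user_input, term(Term, _)).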

Similarly, reading characters could be done as follows

	read_character(Stream, Char)	% change name if necessary

1) If it succeeds in reading a character C,
		Char = C
	where char_to_int(C, I) can map the character to a small integer
	for get0/1.  There is no special functor needed to wrap up Char,
	assuming the other things returned by read_character/2 can be
	distinguished from characters (eg, by is_character(Char)).

2) If end of file is encountered for the first time, or if an end of
	file marker occurs next in the stream (like ^Z on a terminal)
		Char = eof_marker

3) If eof has already been read and multiple eof markers are not
	possible
		Char = error(past_eof)

4) I doubt that there will ever be a need for error(syntax(X)), but
	it should be reserved anyway.

5) If Stream is not a valid stream
		Char = error(invalid_stream(X))
etc, etc.
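
Again as a hedged sketch (user_input is an assumed name; is_character/1
and char_to_int/2 are the ones above), the classical get0/1 falls out
directly, mapping anything unreadable to -1:

	get0(Code) :-
		read_character(user_input, Char),
		(   is_character(Char) ->
		    char_to_int(Char, Code)	% the mapping from 1) above
		;   Code = -1			% eof_marker or error(_) alike
		).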

I think it would be useful for these (with the details fleshed out a bit
more) to be part of the standard.

	lee

micha@ecrcvax.UUCP (Micha Meier) (04/07/88)

In article <243@gould.doc.ic.ac.uk> cdsm@doc.ic.ac.uk (Chris Moss) writes:
>... I personally think "repeat" should
>be left out of the standard (there's no penalty overhead in not having it
>built-in these days anyway). Don't other people have my problem?

	Unfortunately, there is a penalty when it is not built-in.
	If repeat/0 is coded as

	repeat.
	repeat :- repeat.

	it creates a new choice point each time the system backtracks
	to it. However, this is not the main point; I'm sure there
	are clever compilers around that could cope with it.
	The real problem concerns the Byrd box model for debugging, and
	it was pointed out to me by Thomas Graf:
	if repeat/0 is built-in, it is called only once (it enters the CALL
	port and leaves through the EXIT port); on backtracking it enters
	by REDO and leaves by EXIT. When its choice point is cut,
	it just exits or fails.

	The situation is different with the Prolog coding above:
	on each backtracking into repeat/0 a new call is made, i.e. a new
	box is created, and hence there are two more ports to trace.
	Look at this script from SICStus Prolog:

| ?- [user].
| repeat1.
| repeat1 :- repeat1.
| ^D
yes
| ?- trace, repeat1, fail.
 
 The debugger will first creep -- showing everything (trace).
 1  1  Call: repeat1 ? 
 1  1  Exit: repeat1 ? 
 2  1  Call: fail ? 
 2  1  Fail: fail ? 
 1  1  Redo: repeat1 ? 
 2  2  Call: repeat1 ? 
 2  2  Exit: repeat1 ? 
 1  1  Exit: repeat1 ? 
 3  1  Call: fail ? 
 3  1  Fail: fail ? 
 1  1  Redo: repeat1 ? 
 2  2  Redo: repeat1 ? 
 3  3  Call: repeat1 ? 
 3  3  Exit: repeat1 ? 
 2  2  Exit: repeat1 ? 
 1  1  Exit: repeat1 ? 
 4  1  Call: fail ? 

	etc., while the built-in repeat/0 behaves normally:

| ?- repeat, fail.
1  1  Call: repeat ? 
1  1  Exit: repeat ? 
2  1  Call: fail ? 
2  1  Fail: fail ? 
1  1  Redo: repeat ? 
1  1  Exit: repeat ? 
2  1  Call: fail ? 
2  1  Fail: fail ? 
1  1  Redo: repeat ? 
1  1  Exit: repeat ? 
2  1  Call: fail ? 

	You can say that it is possible to skip over the multiple
	'repeat' ports, but the point is that the stack space for them
	is needed even when they are not printed - you cannot
	run such a repeat/0 forever; eventually it is going to overflow
	some stack.
	On the other hand, we could ask whether the box model is right
	in this case - after all, it does not bring any new information
	by repeating all these ports.

	There is no reasonable argument for forcing people to use
	tail-recursive loops instead of repeat-fail loops.
	If I'm using temporary structures, and I know that with
	recursive loops they are going to be garbage collected whereas
	with repeat-fail loops they are simply popped, I will always
	prefer the latter, and the standard should support me
	by providing a built-in repeat/0, since its full functionality
	cannot be provided by other means.
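
	To make the contrast explicit, a sketch of the two loops meant
	here (rloop and floop are names invented for the sketch;
	process/1 stands in for code that builds temporary structures):

	% recursive loop: temporaries live on until garbage collection
	rloop :-
		read(Term),
		(   Term == end_of_file -> true
		;   process(Term),
		    rloop
		).

	% repeat-fail loop: temporaries are popped by backtracking
	floop :-
		repeat,
		read(Term),
		(   Term == end_of_file -> !	% cut the repeat choice point
		;   process(Term),
		    fail			% reclaim space, go round again
		).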

---

Another point I want to make concerns the -1 returned by get0/1
at eof: several people have claimed that it is portable
and that it cannot be confused with any character. However,
it is *not* portable, since it relies on the fact that
no valid character can be confused with -1. If characters are
represented as strings of length 1, then -1 has a different type
and so there is no confusion, but the eof value should have the same type
(if nothing else, then because of indexing). If characters are integers,
taking -1 implies that no character can have the code 2^n - 1
(n being the number of bits on which the character is stored),
which is not necessarily true - you can use 7 bits for ASCII, 16 bits for
Kanji, and anything else on any number of bits. Only if we waste
enough space can we guarantee that -1 will be different.
A standard that forces you to waste space would really not be good.

--Micha

ok@quintus.UUCP (Richard A. O'Keefe) (04/09/88)

In article <522@ecrcvax.UUCP>, micha@ecrcvax.UUCP (Micha Meier) writes:
> Another point I want to make concerns the -1 returned by get0/1
> at eof: several people have claimed that it is portable
> and that it cannot be confused with any character. However,
> it is *not* portable, since it relies on the fact that
> no valid character can be confused with -1. If characters are
> represented as strings of length 1, then -1 has a different type
> and so there is no confusion, but the eof value should have the same type
> (if nothing else, then because of indexing). If characters are integers,
> taking -1 implies that no character can have the code 2^n - 1
> (n being the number of bits on which the character is stored),
> which is not necessarily true - you can use 7 bits for ASCII, 16 bits for
> Kanji, and anything else on any number of bits. Only if we waste
> enough space can we guarantee that -1 will be different.
> A standard that forces you to waste space would really not be good.
> 
> --Micha

Er, which character set standards allow a character to be represented by
a negative number?  I only know about ISO 646, ASCII, EBCDIC, ISO 8859,
and XNS, and all of them define character codes to be positive integers.
Perhaps someone from Japan could comment on the JIS codings; certainly
XNS doesn't assign any Kanji a number which could be confused with a
negative integer, even in 16-bit 2s complement.  Wouldn't representing
some characters by negative numbers mean that comparison of character
codes would disagree with the collating order defined by the standard?

It is not the case that using -1 as the end of file mark means that no
character can have the value 2^n-1.  All it means is that the
integer representation used by Prolog must contain at least one more
bit than the number of bits used to *store* characters.  (The whole
point of the end of file marker is that it is a value which *can't*
be stored:  it can never be a valid character in a file and it can't
appear in the name of an atom or the text of a string.)  This isn't
much of a restriction.  Even for XNS, which I believe includes all
the JIS-required Kanji, 16 bits would suffice for Prolog integers.

We don't have to waste any space at all.  For example, VM/PROLOG has
two representations for integers:  a compact one for 24-bit integers,
and another one for 32-bit integers.  Similarly, a Prolog system for
PCs using 16-bit "area" tags could have one tag for 16-bit positive
integers and another for 16-bit negative integers and a third for
bigger integers represented indirectly.  (This is what Interlisp-D does.)

I've used a programming language where the character input operation
returned one type of object for ordinary characters and another type
for end of file.  It was amazingly painful:  you always had to test
for the end of file object before doing anything with the result,
because character comparison &c were not defined on the end of file
object.  If characters are to be represented by strings of length 1
(what an utterly disgusting vomitously repulsive kludge), representing
end of file markers by the empty string seems like the obvious thing.
This representation would even make the end of file marker less than
any valid `character', which is what the -1 convention currently does.
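
A hedged sketch of what that might look like, with a hypothetical
get_char/1 that returns one-character strings and "" at end of file
(get_char/1 and put_char/1 are names invented for the sketch, not
taken from any of the systems above):

	copy_chars :-
		get_char(Char),			% hypothetical primitive
		(   Char == "" -> true		% "" sorts below any character
		;   put_char(Char),		% hypothetical as well
		    copy_chars
		).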