[comp.lang.prolog] PROLOG DIGEST V6 #29

restivo@POLYA.STANFORD.EDU (Chuck Restivo) (05/12/88)

Date: Fri 6 May 1988 0:53-PST
From: Chuck Restivo (The Moderator) <PROLOG-REQUEST@POLYA.STANFORD.EDU>
Reply-to: PROLOG@POLYA.STANFORD.EDU
US-Mail: P.O. Box 4584, Stanford CA  94305
Subject: PROLOG Digest   V6 #29
To: PROLOG@POLYA.STANFORD.EDU


PROLOG Digest           Friday, 6 May 1988      Volume 6 : Issue 29

Today's Topics:
    					Query - Interface,
			    Implementation - Strings & end_of_file
----------------------------------------------------------------------------------------------------------------------------

Date: 24 Mar 88 16:29:18 GMT
From: mcvax!ukc!dcl-cs!simon@uunet.uu.net  (Simon Brooke)
Subject: Interface

Here is another embarrassingly simple problem. I'm fairly clear that there
is a simple solution... but I don't know what it is.

I am writing code to interface prolog (as it happens, Quintus) to an
external database running on a remote machine. When making a query on the
remote database, it is essential to form the best joins, because joining
the database is computationally expensive. 

The geography of the database can be seen as a messy graph. Each table can
be joined to at least one other, and this possibility, together with the
fields that make it possible, is represented in prolog by the symmetrical
relation:

	canJoin( [ Table1, Key1], [ Table2, Key2]).

Furthermore, we can be sure that there is at least one possible path from
any table to any other. However, joins may have to connect an arbitrary
number of tables. The rules for this are:

*	A continuous join path must be found which visits every table
	in the list;
*	The path must not cycle;
*	The path may branch.

In other words, the smallest possible tree must be formed such that it
visits all the nodes on the target list.

What I want is a predicate of the form:

	findBestJoins( +TableSet, -JoinSet).

where TableSet is a set of tables to be joined, and JoinSet is a list of
join specifications (ideally in the form [ [ Table1, Key1], [ Table2, Key2]])
which describe the tree.

After looking at this ugly monster for a while, I have come up with an
English description of a procedural algorithm to tackle it, and this I
intend to code over the next few days. But that misses the point. Prolog
is supposed to be a declarative language, and this problem can easily be
described in declarative terms [look, I've done it up there :-)]. But I
can't for the life of me work out how to write it, declaratively, in
prolog. Can you?

Thank you.

-- Simon Brooke
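
[Ed: one naive declarative sketch, assuming only the canJoin/2 facts
above.  The names join/4, tree/4 and drop/3 are invented here for
illustration, and the search is pure generate-and-test -- exponential,
since finding the smallest such tree is the (NP-hard) Steiner tree
problem.

	% canJoin/2 is symmetric, so try it both ways round.
	join(T1, K1, T2, K2) :- canJoin([T1,K1], [T2,K2]).
	join(T1, K1, T2, K2) :- canJoin([T2,K2], [T1,K1]).

	% tree(+Visited, +TargetsLeft, +Acc, -JoinSet)
	% Grow the tree one previously unvisited table at a time:
	% "unvisited" guarantees no cycle, and attaching to any member
	% of Visited lets the tree branch.
	tree(_, [], JoinSet, JoinSet).
	tree(Visited, Targets, Acc, JoinSet) :-
		member(T1, Visited),
		join(T1, K1, T2, K2),
		\+ member(T2, Visited),
		drop(T2, Targets, Targets1),
		tree([T2|Visited], Targets1,
		     [[[T1,K1],[T2,K2]]|Acc], JoinSet).

	% drop(+X, +List, -List1): remove X from List if present.
	drop(_, [], []).
	drop(X, [X|T], T) :- !.
	drop(X, [H|T], [H|T1]) :- drop(X, T, T1).

	findBestJoins([T|Ts], Best) :-
		findall(N-J, (tree([T], Ts, [], J), length(J, N)), Cands),
		keysort(Cands, [_-Best|_]).

member/2, length/2, findall/3 and keysort/2 are all library predicates;
keysort/2 on Count-JoinSet pairs is what picks the smallest tree.]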

------------------------------

Date: 26 Mar 88 06:58:22 GMT
From: quintus!ok@unix.sri.com  (Richard A. O'Keefe)
Subject:  Another embarrassingly simple problem

Isn't the matrix chain problem a special case of this?

------------------------------

Date: 26 Mar 88 06:55:17 GMT
From: quintus!ok@unix.sri.com  (Richard A. O'Keefe)
Subject:  Strings

In article <488@dcl-csvax.comp.lancs.ac.uk>, simon@comp.lancs.ac.uk (Simon Brooke) writes:
> In article <776@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
> >Xerox Lisp uses lists for bignums to this very day.
> 
> Yes, true. But please don't assume this means it is efficient.

Well, no, I didn't assume that.  I was replying to a message which claimed
that "Lisp" didn't use lists for bignums any more.

> For
> example, I recently benchmarked a Xerox 1186 running Interlisp (Koto)
> against the new Acorn Archimedes running Arthur Norman's Cambridge Lisp.
> Generally the Arch was a bit faster, reflecting the simpler lisp and
> faster processor. But running (fact 1000) it was 321 (three hundred and
> twenty one) times faster - and this must reflect grotesque inefficiency in
> Xerox' bignum code.

If it was "recently", didn't you have the Lyric release?

I am shocked to hear that the Archimedes was only "a bit faster".
As you can easily find out if you have Xerox Quintus Prolog, it is a _lot_
faster at list processing than Xerox Lisp is.  I would have expected
Cambridge Lisp on a RISC to be about as fast as XQP.  What's wrong?

The (fact 1000) test may reflect only the simple fact that the
Archimedes has a 32-bit ALU, whereas the Xerox 1186 has a basically
16-bit ALU.  It is easy to pull apart an Interlisp bignum and see what
is inside.  It is a list of FOURTEEN-bit 2's complement integers.  It
works out at about 0.4 data bits per stored bit.  The vector approach
should get about 1 data bit per stored bit.

Another possible consideration is storage turnover.  (fact 1000) requires
over 8500 bits to represent [about 266 32-bit words, but about 610 list
cells].  The Xerox D-machines are not noted for their paging performance.
Did you make sure that all the relevant code was paged in before
measuring the performance of (fact 1000)?

------------------------------

Date: 26 Mar 88 06:12:05 GMT
From: quintus!ok@unix.sri.com  (Richard A. O'Keefe)
Subject: Strings

There are two separate and distinct ways of doing this using the
Quintus library, and both are documented in the Library manual.

In Section 8.3, you will find

	put_chars(+Chars)
and
	put_line(+Chars)

described.

These are exactly the writestring you want, except that they have
truthful names.  (put_chars/1 does 'put' to a 'chars', that is, to a
list of 'char's.  put_line does 'put' to a list of character codes
considered as a line; it's the equivalent of puts(3S) in the C library.)

There is also a library package called library(print_chars) which
makes the 'print'ing of chars values entirely automatic.  It is
described in section 9-2-2 of the library manual.  The point of this
is that once you have done

	| ?- ensure_loaded(library(print_chars)).

then top-level output, debugger output, and print/1 will have this sort
of effect:

	| ?- X = ["this","is","a","list","of","chars","values"].
	X = ["this","is","a","list","of","chars","values"]

	| ?- append("stuff", Tail, Chars).
	Tail = _537,
	Chars = "stuff|Tail"

{this is an excerpt from an actual transcript}.  If you look in the
DEC-10 Prolog library you will find the ancestor of print_chars under
the name PORSTR.PL.

> What does that tell us about the use of strings? It suggests to
> me that people actually always use atoms for this purpose, and somewhere 
> in their code is an implicit definition: 
>     writestring(String) :- name(Atom,String),write(Atom).  

What it actually tells us is that Chris Moss didn't look very hard in
the manual.  Even if the Quintus library didn't provide two ways of
doing this, what people have done in other dialects is to define

	puts([]).
	puts([C|Cs]) :- put(C), puts(Cs).

In fact, if I remember correctly, MU-Prolog extended put/1 to do this.
Why drag atoms into it?  In fact, the code that Chris Moss shows us will
not work:  writestring("00") would print a single "0".  {Yes, this is not
a nice thing for name/2 to do, which is why Quintus Prolog offers
atom_chars/2 _as well_.}
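
[Ed: for completeness, the atom detour at least becomes safe with
atom_chars/2, though it is still pointless next to puts/1 or
put_chars/1:

	writestring(String) :- atom_chars(Atom, String), write(Atom).

Unlike name/2, atom_chars/2 does not turn "00" into the number 0.]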

> 2. I don't like having ASCII values in my programs.

As Chris Moss explained later in his posting, this is not an argument AGAINST
having a notation for lists of character codes.  It is an argument FOR having
a notation for character constants.

> With a certain amount
> of discipline and inefficiency one can get round that, by saying, for 
> instance:
>     lowercase(X) :- "a"=[A], "z"=[Z], X>=A, X=<Z.

Why do that when you can write
	lowercase(X) :- "a" =< X, X =< "z".
Better yet, why not do
	lower_case(X) :- is_lower(X).
where is_lower/1 is one of the character classification predicates stolen
from C that I suggested in my 1984 "evaluable predicates" document (BSI
number PS/6).  (Quintus Prolog provides this, and NU Prolog provides it
under the name isLower/1.)  Best of all, is_lower/1 will solve for X...

In fact, while Chris Moss may not _like_ having ASCII values in his
programs, that's what he just did, and having a character type won't
make them go away.  His code for lowercase is quite wrong for EBCDIC.
It's also wrong for ISO 8859, where there are another 64 lower case
letters.  The Quintus Prolog library contains a package ctypes.pl which
comes in an EBCDIC version as well as an ASCII version, and
	is_lower(X)
is the right thing to do in either context.
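
[Ed: for a system with neither ctypes.pl nor isLower/1, a one-clause
ASCII-only sketch in the same spirit:

	is_lower(X) :- between(0'a, 0'z, X).

Given library between/3, this tests a bound X and enumerates the
lower-case codes for an unbound one -- the "solve for X" behaviour
mentioned above.  It is, of course, exactly as character-set-bound as
the code it replaces.]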

This is part of what I meant in an earlier posting by asking what was the
right set of operations for a character type.  (How do you test whether a
character code stands for a Hebrew letter in the appropriate variant of
EBCDIC?  No fair looking in the 3270 manual, now.)

> OK, there is the 0'a notation (another way of writing 97 if you're using
> ASCII). Now that DOESN'T work in MuProlog, or Poplog or ...

It _DOES_ work in NU Prolog, the successor to MU Prolog.  (See section
3.2.2 paragraph (c) of the NU Prolog manual, version 1.1.)  And PopLog
has `a`, has it not?  It should be the work of a few minutes to convert
0'a to `a` or conversely by writing a VED or EMACS macro.

> On efficiency I mostly agree with Richard. I too like strings to be lists,
> and can't see the real benefits of a string type, though you don't tend to
> miss what you never had!

Try SNOBOL, PL/I, BASIC, Burroughs Extended Algol, AWK, Common Lisp, &c &c.
What a pain what a pain.

> As far as notation is concerned I have no better suggestion
> than 0'a though I don't much like it.

The reason that Edinburgh Prolog uses 0'x is that by the time we thought
of it, there were programs using all the other ASCII printing characters,
and the DEC-10 tokeniser already accepted 0'<digits> but didn't do anything
terribly sensible with it.  The only virtue I've ever claimed for it is
that it didn't break anything.  The Pop-11 `x` notation is obviously
superior as a notation.  (0'x also suggests, as it is meant to, that it
is an integer.  `x` doesn't suggest that, so would be more appropriate to
a separate "character" type.)

> I wouldn't object to the C solution, which allows them to be used 
> in arithmetic contexts. e.g. Num is Ch-0'0 or Letter =<0'z.

But the C solution is that characters *ARE* integers.
The constant expression 'a' in C has type "int", not "char".

> Most of my suggestions at main committee meetings have been ignored!

I'm very sorry to hear that, but the quality of the BSI built-in-predicates
work seemed to me to be far too low to be compatible with the assumption that
you had much of a hand in it.  This is not to say that I would expect to
_like_ your suggestions if I knew what they were, but I certainly would
expect them to be coherent and sensible.

------------------------------

Date: 27 Mar 88 00:22:18 GMT
From: defun.utah.edu!shebs@cs.utah.edu  (Stanley T. Shebs)
Subject: Strings

A minor point: I don't think I claimed that "`Lisp' didn't use lists for
bignums any more"; if I did, it was a boner.  For almost any possible design
choice, you can find one or more systems in the past that made that choice.
It is true that lists for bignums are unusual nowadays, but nobody (so far as
I know) has made a scientific study as to whether lists are better/worse than
vectors for representing bignums.  Anybody need a master's thesis?

>Did you make sure that all the relevant code was paged in before
>measuring the performance of (fact 1000)?

Ah yes, one of the nasty little details that a truly meaningful study
would have to take into account.  Still, I would expect all the relevant
code to get paged in after the first few bignums...

-- stan shebs

------------------------------
							
Date: 25 Mar 88 12:17:31 GMT
From: mcvax!enea!zyx!grzm@uunet.uu.net  (Gunnar Blomberg)
Subject:  behavior of read/get0 at end_of_file

Hmm...  isn't this a lot of fuss about very little?

It seems to me that whatever semantics is chosen, it is simple to get
the other:

BSIread(X) :-
   DEC10read(X),
   X \== end_of_file.

DEC10read(X) :-
   BSIread(Y),
   !,
   X = Y.
DEC10read(end_of_file).

Given that most Prologs seem to use the DEC-10 Prolog approach, and
that it is probably marginally more efficient to write BSIread in
terms of DEC10read than the other way around, the DEC-10 approach seems
the obvious choice.  Not that I think the other choice is all that
much worse...  Isn't it more interesting to discuss things where it is
harder to get it the way one wants (like the question raised by
Richard O'Keefe about whether a string data-type is necessary, or even
useful.  Now *that* is interesting!)

----------
At this point I had a discussion with a colleague of mine, and it
turns out that it isn't this simple.  In fact, I now believe that it
is impossible to get the BSIread functionality from a system that only
provides the DEC-10 one.  The predicate BSIread above will fail if the
file read contains 'end_of_file', of course.  This (for me) tips the
balance over in favor of the BSI approach.  It is after all easy to
write DEC10read in terms of BSIread.

Naturally there should be a provision for compatibility with "old"
programs.  I would be quite happy to name BSIread read_term, for
instance, and provide a user-level predicate read, that could be
redefined to give the required semantics.

-----------
As far as get0 goes, the question is much easier, since there *is* an
obvious out-of-band value, namely -1.
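
[Ed: with -1 out of band, the usual "read the rest of the input" loop
needs no wrapper at all; a sketch:

	% All remaining character codes on the current input, as a list.
	read_codes(Codes) :-
		get0(C),
		(   C =:= -1 ->
		    Codes = []
		;   Codes = [C|Cs],
		    read_codes(Cs)
		).
]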

--  Gunnar Blomberg

------------------------------

Date: 25 Mar 88 12:14:47 GMT
From: mcvax!enea!zyx!grzm@uunet.uu.net  (Gunnar Blomberg)
Subject:  behavior of read/get0 at end_of_file

In article <801@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>[...]  So the alternatives I can see at the moment
>are
>    o	construct a new string every time.
>    o	precompute 2^16 strings.
>    o	cache 2^8 strings, and construct a new string every
>	time for Kanji and other non-Latin alphabets.
>    o	not support Kanji or other non-Latin alphabets at all.

How about:
     o	support an immediate representation for characters.
if you've got room for them in your pointers. Or
     o	cache them as they occur.
if you haven't.

I can't see that the fact that characters look like one-element strings
to the Prolog programmer in any way would stop an implementor from
implementing them using the same tricks as if characters were a
separate data-type.  Yes, it makes the internal string-handling
somewhat more convoluted, but not unduly so, I would say.

-- Gunnar Blomberg

------------------------------

Date: 25 Mar 88 12:50:44 GMT
From: mcvax!enea!zyx!grzm@uunet.uu.net  (Gunnar Blomberg)
Subject:  behavior of read/get0 at end_of_file

	Well, considering the fact that nested comments can comment out
*any* part of the file, not just the last part, and that the cases
where nested comments do not work must be so exceedingly rare as to be
practically non-existent, I would definitely prefer nested comments.
Honestly, how often do you have an unmatched beginning-of-nested-comment
or end-of-nested-comment buried inside your code?
	Well, just because nested comments are much more useful than
plain ones does not mean that BSI should adopt them.  There is the
question of supporting "old" code.  It would be interesting to know
how many programs would break if Prolog comments were changed to be
nesting.  Do you know of any?
[I have actually seen the following style used in C:
	/* #define wantFOO 1	/* To get foo feature */
	#define wantBAR 1	/* To get bar feature */
	/* #define wamtBAZ 1	/* To get baz feature */
 It gave me a good laugh at the time.]
	In any case, I have always considered the use of end_of_file
to get some kind of half-baked ability to comment out a part of a file
as an abomination (which does not mean I didn't use it and find it
useful).

--  Gunnar Blomberg

------------------------------

Date: 25 Mar 88 12:33:40 GMT
From: mcvax!enea!zyx!grzm@uunet.uu.net  (Gunnar Blomberg)
Subject:  behavior of read/get0 at end_of_file

In article <814@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Just to continue with the get0 topic:
>
>	The fail-at-end approach rests on an assumption
>	which deserves to be made explicit, because it is false.
>
>What is the assumption?   That receiving the end-of-file indication from
>an operating system indicates that there is nothing further to read in
>that stream.  This is false?  Yes.
>
[Lots of examples deleted]

This line of argument seems a little doubtful to me.  I don't have
experience with all the systems RAO'K mentions, but (to the best of my
memory) I have never seen a use of end-of-file from the terminal that
wasn't being used to pretend that the terminal was more than one file.

Cases in point:

DEC-10 Prolog (on TOPS-20, alas):
	User says [user], gives clauses and ends with ^Z.  The system
	pretends that there is a file 'user' by reading from the
	terminal until end-of-file is seen.  As far as Prolog is
	concerned the file ended at that point, and no more reading
	is done from that particular file at that point.

Using the terminal as standard input in Unix:
	Example: user types 'cat >foo' and then writes contents of file
	on terminal, indicating end by end-of-file.  As far as the
	reader of that particular input is concerned the file ended at
	that point, and no more reading is done from that particular
	'file'.

In conclusion:  I think that software conventions concerning
end-of-file from the terminal exist primarily to enable the
system/user to pretend that the terminal is more than one file.  In
fact, I know of no instance where this is not so.  Can somebody come
up with an example where multiple end-of-files are actually used in
one single ('conceptual') file?

--  Gunnar Blomberg

------------------------------

Date: 25 Mar 88 13:16:24 GMT
From: eagle!icdoc!ivax!cdsm@ucbvax.Berkeley.EDU  (Chris Moss)
Subject:  behavior of read/get0 at end_of_file

In article <801@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
rok>I find it extremely odd to call a string of length one a character.
rok> ...  But it becomes rather more
rok>clumsy on the D machines, which have a 16-bit character set.  (Can you say
rok>"Kanji"?  I knew you could.)

Yes, the BSI committee is just beginning to face up to this problem,
as the Japanese have just started taking an interest...
As Richard points out, it's not much problem for a character based
definition, which I personally would favour.

rok>The fail-at-end approach forces us not only to do something special
rok>with the get0/1 in rest_identifier/3, but in everything that calls it.

rok>(In the Prolog tokeniser, there are two such callers.)
rok>
rok>The point is that if-then-elses such as Meier suggests start
rok>appearing all over the place like maggots in a corpse if you adopt
rok>the fail-at-end approach, to the point of obscuring the underlying
rok>automaton.

I think this is a fair point when looking at the definition of
lexical analysers, however...

mmeier> I must say, none of the two seems to me satisfactory. Richard's
mm> 	version is not portable due to the -1 as eof character.

A character definition which included a (special) end-of-file token would be
better.

mm> 	BSI objects that if [read/1] returns e.g. the atom end_of_file
mm> 	then any occurrence of this atom in the source file
mm> 	could not be distinguished from a real end of file.
rok>
rok>That's not a bug, it's a feature!  I'm serious about that. 

I don't think that is any better than most uses of that particular
argument. Sure, if you learn to live with it you can find uses for it.

rok>Before taking end_of_file away from me, the BSI committee should supply
rok>me with a portable way of exiting a break level and a reliable method of
rok>leaving test cases in a file without having them always read.

And this is the death of any standardization process!  I have yet to
find the document that Richard referred to (a few days ago) when he
claimed that the BSI's mandate was to standardize Edinburgh Prolog.
It certainly hasn't been repeated in all the other formal
presentations that have been made to BSI or ISO. But if one has to
follow every wrinkle of an implementation just because it represents
(arguably) the most popular dialect, then why don't we just
appoint IBM to write all our standards for us (or Quintus or ...)?
[And who is the TRUE inheritor of the title "Edinburgh Prolog" anyway? Is
it the commercial product (formerly NIP) now being sold under that title?]

To return to the argument, I think there's a significant difference between
get0 and read.  An end-of-file marker for read is (almost) never used to
implement finite-state machines; instead it is used for repeat-fail loops.

e.g.  go :- repeat,
	    read(Term),
            (Term=end_of_file -> true; process(Term), fail).

Now in the days before tail recursion and all the other optimizations
this was inevitable. But why should we encourage this approach today?

The above clause is a good example of the trickiness of "repeat". I always
write repeat loops wrong first time and this was no exception. I put 
            (Term=end_of_file -> true; process(Term)), fail.
then changed it to
            (Term=end_of_file -> !; process(Term)), fail.
before settling on the above version.  I personally think "repeat" should
be left out of the standard (there's no performance penalty in not having
it built-in these days anyway).  Don't other people have my problem?
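
[Ed: for comparison, the tail-recursive version of the same loop, with
no repeat at all; process/1 stands for whatever is to be done with each
term:

	go :-
		read(Term),
		process_terms(Term).

	process_terms(end_of_file) :- !.
	process_terms(Term) :-
		process(Term),
		read(Next),
		process_terms(Next).
]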

It would seem to encourage better programming if we allowed "get0"
(or get_file or whatever) to return an end-of-file token, and any
high-level routines to fail at end-of-file. It's not particularly
consistent, but I don't know whether that's a priority in this case.

rok>In fact, on my SUN right now I have function key F5 bound to
rok>"end_of_file.\n" so that I can get out of Prolog without running the
rok>risk of typing too many of them and logging out.

I seem to get by perfectly well by setting "ignoreeof" in my cshell!

rok>Ah, you'll say, but that's what nested comments are for!
rok>Well no, they don't work.  That's right, "#| ... |#" is NOT a reliable
rok>way of commenting code out in Common Lisp, and "/* ... */" is NOT a
rok>reliable way of commenting code out in PopLog. 

That seems to be the best argument for allowing end-of-line comments in
Prolog. Now where do I find the Emacs macro for commenting out all lines
between dot and mark (and removing such comments)?

Chris Moss

Disclaimer: unless I say otherwise I am expressing my personal opinions
NOT the opinions of any committee!
------------------------------

End of PROLOG Digest
********************

restivo@POLYA.STANFORD.EDU (Chuck Restivo) (07/08/88)

Date: Fri 6 May 1988 0:53-PST
From: Chuck Restivo (The Moderator) <PROLOG-REQUEST@POLYA.STANFORD.EDU>
Reply-to: PROLOG@POLYA.STANFORD.EDU>
US-Mail: P.O. Box 4584, Stanford CA  94305
Subject: PROLOG Digest   V6 #29
To: PROLOG@POLYA.STANFORD.EDU


PROLOG Digest           Friday, 6 May 1988      Volume 6 : Issue 29

Today's Topics:
    					Query - Interface,
			    Implementation - Strings & end_of_file
----------------------------------------------------------------------------------------------------------------------------

Date: 24 Mar 88 16:29:18 GMT
From: mcvax!ukc!dcl-cs!simon@uunet.uu.net  (Simon Brooke)
Subject: Interface

Here is another embarrasingly simple problem. I'm fairly clear that there
is a simple solution... but I don't know what it is.

I am writing code to interface prolog (as it happens, Quintus) to an
external database running on a remote machine. When making a query on the
remote database, it is essential to form the best joins, because joining
the database is computationally expensive. 

The geography of the database can be seen as a messy graph. Each table can
be joined to at least one other, and this possibility, together with the
fields that make it possible, is represented in prolog by the symmetrical
relation:

	canJoin( [ Table1, Key1], [ Table2, Key2]).

Furthermore, we can be sure that there is at least one possible path from
any table to any other. However, joins may have to connect an arbitrary
number of tables. The rules for this are:

*	A continuous join path must be found which visits every table
	in the list;
*	The path must not cycle;
*	The path may branch.

In other words, the smallest possible tree must be formed such that it
visits all the nodes on the target list.

What I want is a predicate of the form:

	findBestJoins( +TableSet, -JoinSet).

where TableSet is a set of tables to be joined, and JoinSet is a list of
join specifications (ideally in the form [ [ Table1, Key1], [ Table2, Key2]])
which describe the tree.

After looking at this ugly monster for a while, I have come up with an
English description of a procedural algorithm to tackle it, and this I
intend to code over the next few days. But that misses the point. Prolog
is supposed to be a declarative language, and this problem can easily be
described in declarative terms [look, I've done it up there :-)]. But I
can't for the life of me work out how to write it, declaratively, in
prolog. Can you?

Thank you.

-- Simon Brooke

------------------------------

Date: 26 Mar 88 06:58:22 GMT
From: quintus!ok@unix.sri.com  (Richard A. O'Keefe)
Subject:  Another embarassingly simple problem

Isn't the matrix chain problem a special case of this?

------------------------------

Date: 26 Mar 88 06:55:17 GMT
From: quintus!ok@unix.sri.com  (Richard A. O'Keefe)
Subject:  Strings

In article <488@dcl-csvax.comp.lancs.ac.uk>, simon@comp.lancs.ac.uk (Simon Brooke) writes:
> In article <776@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
> >Xerox Lisp uses lists for bignums to this very day.
> 
> Yes, true. But please don't assume this means it is efficient.

Well, no, I didn't assume that.  I was replying to a message which claimed
that "Lisp" didn't use lists for bignums any more.

> For
> example, I recently benchmarked a Xerox 1186 running Interlisp (Koto)
> against the new Acorn Archimedes running Arthur Norman's Cambridge Lisp.
> Generally the Arch was a bit faster, reflecting the simpler lisp and
> faster processor. But running (fact 1000) it was 321 (three hundred and
> twenty one) times faster - and this must reflect grotesque inefficiency in
> Xerox' bignum code.

If it was "recently", didn't you have the Lyric release?

I am shocked to hear that the Archimedes was only "a bit faster".
As you can easily find out if you have Xerox Quintus Prolog, it is a _lot_
faster at list processing than Xerox Lisp is.  I would have expected
Cambridge Lisp on a RISC to be about as fast as XQP.  What's wrong?

The (fact 1000) test may reflect only the simple fact that the
Archimedes has a 32-bit ALU, whereas the Xerox 1186 has a basically
16-bit ALU.  It is easy to pull apart an Interlisp bignum and see what
is inside.  It is a list of FOURTEEN-bit 2-s complement integers.  It
works out at about 0.4 data bits per stored bit.  The vector approach
should get about 1 data bit per stored bit.

Another possible consideration is storage turnover.  (fact 1000) requires
over 8500 bits to represent [about 266 32-bit words, but about 610 list
cells].  The Xerox D-machines are not noted for their paging performance.
Did you make sure that all the relevant code was paged in before
measuring the performance of (fact 1000)?

------------------------------

Date: 26 Mar 88 06:12:05 GMT
From: quintus!ok@unix.sri.com  (Richard A. O'Keefe)
Subject: Strings

There are two separate and distinct ways of doing this using the
Quintus library, and both are documented in the Library manual.

In Section 8.3, you will find

	put_chars(+Chars)
and
	put_line(+Chars)

described.

These are exactly the writestring you want, except that they have
truthful names.  (put_chars/1 does 'put' to a 'chars', that is, to a
list of 'char's.  put_line does 'put' to a list of character codes
considered as a line; it's the equivalent of puts(3S) in the C library.)

There is also a library package called library(print_chars) which
makes the 'print'ing of chars values entirely automatic.  It is
described in section 9-2-2 of the library manual.  The point of this
is that once you have done

	| ?- ensure_loaded(library(print_chars)).

then top-level output, debugger output, and print/1 will have this sort
of effect:

	| ?- X = ["this","is","a","list","of","chars","values"].
	X = ["this","is","a","list","of","chars","values"]

	| ?- append("stuff", Tail, Chars).
	Tail = _537,
	Chars = "stuff|Tail"

{this is an excerpt from an actual transcript}.  If you look in the
DEC-10 Prolog library you will find the ancestor of print_chars under
the name PORSTR.PL.

> What does that tell us about the use of strings? It suggests to
> me that people actually always use atoms for this purpose, and somewhere 
> in their code is an implicit definition: 
>     writestring(String) :- name(Atom,String),write(Atom).  

What it actually tells us is that Chris Moss didn't look very hard in
the manual.  Even if the Quintus library didn't provide two ways of
doing this, what people have done in other dialects is to define

	puts([]).
	puts([C|Cs]) :- put(C), puts(Cs).

In fact, if I remember correctly, MU-Prolog extended put/1 to do this.
Why drag atoms into it?  In fact, the code that Chris Moss shows us will
not work:  writestring("00") would print a single "0".  {Yes, this is not
a nice thing for name/2 to do, which is why Quintus Prolog offers
atom_chars/2 _as well_.}

> 2. I don't like having ASCII values in my programs.

As Chris Moss explained later in his posting, this is not an argument AGAINST
having a notation for lists of character codes.  It is an argument FOR having
a notation for character constants.

> With a certain amount
> of discipline and inefficiency one can get round that, by saying, for 
> instance:
>     lowercase(X) :- "a"=[A], "z"=[Z], X>=A, X=<Z.

Why do that when you can write
	lowercase(X) :- "a" =< X, X =< "z".
Better yet, why not do
	lower_case(X) :- is_lower(X).
where is_lower/1 is one of the character classification predicates stolen
from C that I suggested in my 1984 "evaluable predicates" document (BSI
number PS/6).  (Quintus Prolog provides this, and NU Prolog provides it
under the name isLower/1.)  Best of all, is_lower/1 will solve for X...

In fact, while Chris Moss may not _like_ having ASCII values in his
programs, that's what he just did, and having a character type won't
make them go away.  His code for lowercase is quite wrong for EBCDIC.
It's also wrong for ISO 8859, where there are another 64 lower case
letters.  The Quintus Prolog library contains a package ctypes.pl which
comes in an EBCDIC version as well as an ASCII version, and
	is_lower(X)
is the right thing to do in either context.

This is part of what I meant in an earlier posting by asking what was the
right set of operations for a character type.  (How do you test whether a
character code stands for a Hebrew letter in the appropriate variant of
EBCDIC?  No fair looking in the 3270 manual, now.)

> OK, there is the 0'a notation (another way of writing 97 if you're using
> ASCII). Now that DOESN'T work in MuProlog, or Poplog or ...

It _DOES_ work in NU Prolog, the successor to MU Prolog.  (See section
3.2.2 paragraph (c) of the NU Prolog manual, version 1.1.)  And PopLog
has `a`, has it not?  It should be the work of a few minutes to convert
0'a to `a` or conversely by writing a VED or EMACS macro.

> On efficiency I mostly agree with Richard. I too like strings to be lists,
> and can't see the real benefits of a string type, though you don't tend to
> miss what you never had!

Try SNOBOL, PL/I, BASIC, Burroughs Extended Algol, AWK, Common Lisp, &c &c.
What a pain what a pain.

> As far as notation is concerned I have no better suggestion
> than 0'a though I don't much like it.

The reason that Edinburgh Prolog uses 0'x is that by the time we thought
of it, there were programs using all the other ASCII printing characters,
and the DEC-10 tokeniser already accepted 0'<digits> but didn't do anything
terribly sensible with it.  The only virtue I've ever claimed for it is
that it didn't break anything.  The Pop-11 `x` notation is obviously
superior as a notation.  (0'x also suggests, as it is meant to, that it
is an integer.  `x` doesn't suggest that, so would be more appropriate to
a separate "character" type.)
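[The integer reading is easy to demonstrate; a small example, ASCII
assumed, with digit_value/2 a made-up name:]

```prolog
% 0'7 is just another way of writing the integer 55 (in ASCII),
% so it can appear anywhere an integer can, including arithmetic.
digit_value(Ch, V) :-
        Ch >= 0'0, Ch =< 0'9,
        V is Ch - 0'0.
```

[Thus ?- digit_value(0'7, V) binds V to 7.]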

> I wouldn't object to the C solution, which allows them to be used 
> in arithmetic contexts. e.g. Num is Ch-0'0 or Letter =<0'z.

But the C solution is that characters *ARE* integers.
The constant expression 'a' in C has type "int", not "char".

> Most of my suggestions at main committee meetings have been ignored!

I'm very sorry to hear that, but the quality of the BSI built-in-predicates
work seemed to me to be far too low to be compatible with the assumption that
you had much of a hand in it.  This is not to say that I would expect to
_like_ your suggestions if I knew what they were, but I certainly would
expect them to be coherent and sensible.

------------------------------

Date: 27 Mar 88 00:22:18 GMT
From: defun.utah.edu!shebs@cs.utah.edu  (Stanley T. Shebs)
Subject: Strings

A minor point: I don't think I claimed that "`Lisp' didn't use lists for
bignums any more"; if I did, it was a boner.  For almost any possible design
choice, you can find one or more systems in the past that made that choice.
It is true that lists for bignums are unusual nowadays, but nobody (so far as
I know) has made a scientific study as to whether lists are better/worse than
vectors for representing bignums.  Anybody need a master's thesis?

>Did you make sure that all the relevant code was paged in before
>measuring the performance of (fact 1000)?

Ah yes, one of the nasty little details that a truly meaningful study
would have to take into account.  Still, I would expect all the relevant
code to get paged in after the first few bignums...

-- stan shebs

------------------------------

Date: 25 Mar 88 12:17:31 GMT
From: mcvax!enea!zyx!grzm@uunet.uu.net  (Gunnar Blomberg)
Subject:  behavior of read/get0 at end_of_file

Hmm...  isn't this a lot of fuss about very little?

It seems to me that whichever semantics is chosen, it is simple to get
the other:

BSIread(X) :-
   DEC10read(X),
   X \== end_of_file.

DEC10read(X) :-
   BSIread(Y),
   !,
   X = Y.
DEC10read(end_of_file).

Given that most Prologs seem to use the DEC-10 Prolog approach, and
that it is probably marginally more efficient to write BSIread in
terms of DEC10read than the other way around, the DEC-10 approach seems
the obvious choice.  Not that I think the other choice is all that
much worse...  Isn't it more interesting to discuss things where it is
harder to get it the way one wants (like the question raised by
Richard O'Keefe about whether a string data-type is necessary, or even
useful.  Now *that* is interesting!)

----------
At this point I had a discussion with a colleague of mine, and it
turns out that it isn't this simple.  In fact, I now believe that it
is impossible to get the BSIread functionality from a system that only
provides the DEC-10 one.  The predicate BSIread above will fail if the
file read contains 'end_of_file', of course.  This (for me) tips the
balance over in favor of the BSI approach.  It is after all easy to
write DEC10read in terms of BSIread.

Naturally there should be a provision for compatibility with "old"
programs.  I would be quite happy to name BSIread read_term, for
instance, and provide a user-level predicate read, that could be
redefined to give the required semantics.

-----------
As far as get0 goes, the question is much easier, since there *is* an
obvious out-of-band value, namely -1.
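[With -1 out of band, a character-copying loop needs no special flag at
all; a minimal sketch, assuming DEC-10 style get0/1 and put/1:]

```prolog
% Copy current input to current output, one character at a time.
% get0/1 yields the character code, or -1 at end of file, so the
% loop terminates on an ordinary arithmetic test.
copy_chars :-
        get0(C),
        ( C =:= -1 ->
            true
        ;   put(C),
            copy_chars
        ).
```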

--  Gunnar Blomberg

------------------------------

Date: 25 Mar 88 12:14:47 GMT
From: mcvax!enea!zyx!grzm@uunet.uu.net  (Gunnar Blomberg)
Subject:  behavior of read/get0 at end_of_file

In article <801@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>[...]  So the alternatives I can see at the moment
>are
>    o	construct a new string every time.
>    o	precompute 2^16 strings.
>    o	cache 2^8 strings, and construct a new string every
>	time for Kanji and other non-Latin alphabets.
>    o	not support Kanji or other non-Latin alphabets at all.

How about:
     o	support an immediate representation for characters.
if you've got room for them in your pointers. Or
     o	cache them as they occur.
if you haven't.

I can't see how the fact that characters look like one-element strings
to the Prolog programmer would stop an implementor from using the same
tricks as if characters were a separate data-type.  Yes, it makes the
internal string-handling somewhat more convoluted, but not unduly so,
I would say.

-- Gunnar Blomberg

------------------------------

Date: 25 Mar 88 12:50:44 GMT
From: mcvax!enea!zyx!grzm@uunet.uu.net  (Gunnar Blomberg)
Subject:  behavior of read/get0 at end_of_file

	Well, considering the fact that nested comments can comment out
*any* part of the file, not just the last part, and that the cases
where nested comments do not work must be so exceedingly rare as to be
practically non-existent, I would definitely prefer nested comments.
Honestly, how often do you have an unmatched beginning-of-nested-comment
or end-of-nested-comment buried inside your code?
	Well, just because nested comments are much more useful than
plain ones does not mean that BSI should adopt them.  There is the
question of supporting "old" code.  It would be interesting to know
how many programs would break if Prolog comments were changed to be
nesting.  Do you know of any?
[I have actually seen the following style used in C:
	/* #define wantFOO 1	/* To get foo feature */
	#define wantBAR 1	/* To get bar feature */
	/* #define wantBAZ 1	/* To get baz feature */
 It gave me a good laugh at the time.]
	In any case, I have always considered the use of end_of_file
to get some kind of half-baked ability to comment out a part of a file
as an abomination (which does not mean I didn't use it and find it
useful).

--  Gunnar Blomberg

------------------------------

Date: 25 Mar 88 12:33:40 GMT
From: mcvax!enea!zyx!grzm@uunet.uu.net  (Gunnar Blomberg)
Subject:  behavior of read/get0 at end_of_file

In article <814@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Just to continue with the get0 topic:
>
>	The fail-at-end approach rests on an assumption
>	which deserves to be made explicit, because it is false.
>
>What is the assumption?   That receiving the end-of-file indication from
>an operating system indicates that there is nothing further to read in
>that stream.  This is false?  Yes.
>
[Lots of examples deleted]

This argumentation seems a little doubtful to me.  I don't have
experience with all the systems RAO'K mentions, but (to the best of my
memory) I have never seen a use of end-of-file from the terminal that
wasn't being used to pretend that the terminal was more than one file.

Cases in point:

DEC-10 Prolog (on TOPS-20, alas):
	User says [user], gives clauses and ends with ^Z.  The system
	pretends that there is a file 'user' by reading from the
	terminal until end-of-file is seen.  As far as Prolog is
	concerned the file ended at that point, and no more reading
	is done from that particular file at that point.

Using the terminal as standard input in Unix:
	Example: user types 'cat >foo' and then writes contents of file
	on terminal, indicating end by end-of-file.  As far as the
	reader of that particular input is concerned the file ended at
	that point, and no more reading is done from that particular
	'file'.

In conclusion:  I think that software conventions concerning
end-of-file from the terminal exist primarily to enable the
system/user to pretend that the terminal is more than one file.  In
fact, I know of no instance where this is not so.  Can somebody come
up with an example where multiple end-of-files are actually used in
one single ('conceptual') file?

--  Gunnar Blomberg

------------------------------

Date: 25 Mar 88 13:16:24 GMT
From: eagle!icdoc!ivax!cdsm@ucbvax.Berkeley.EDU  (Chris Moss)
Subject:  behavior of read/get0 at end_of_file

In article <801@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
rok>I find it extremely odd to call a string of length one a character.
rok> ...  But it becomes rather more
rok>clumsy on the D machines, which have a 16-bit character set.  (Can you say
rok>"Kanji"?  I knew you could.)

Yes, the BSI committee is just beginning to face up to this problem,
as the Japanese have just started taking an interest...
As Richard points out, it's not much problem for a character based
definition, which I personally would favour.

rok>The fail-at-end approach forces us not only to do something special
rok>with the get0/1 in rest_identifier/3, but in everything that calls it.

rok>(In the Prolog tokeniser, there are two such callers.)
rok>
rok>The point is that if-then-elses such as Meier suggests start
rok>appearing all over the place like maggots in a corpse if you adopt
rok>the fail-at-end approach, to the point of obscuring the underlying
rok>automaton.

I think this is a fair point when looking at the definition of
lexical analysers, however...

mmeier> I must say, neither of the two seems to me satisfactory. Richard's
mm> 	version is not portable due to the -1 as eof character.

A character definition which included a (special) end-of-file token would be
better.

mm> 	BSI objects that if [read/1] returns e.g. the atom end_of_file
mm> 	then any occurrence of this atom in the source file
mm> 	could not be distinguished from a real end of file.
rok>
rok>That's not a bug, it's a feature!  I'm serious about that. 

I don't think that is any better than most uses of that particular
argument. Sure, if you learn to live with it you can find uses for it.

rok>Before taking end_of_file away from me, the BSI committee should supply
rok>me with a portable way of exiting a break level and a reliable method of
rok>leaving test cases in a file without having them always read.

And this is the death of any standardization process!  I have yet to
find the document that Richard referred to (a few days ago) when he
claimed that the BSI's mandate was to standardize Edinburgh Prolog.
It certainly hasn't been repeated in all the other formal
presentations that have been made to BSI or ISO. But if one has to
follow every wrinkle of an implementation just because it represents
(arguably) the most popular dialect, then why don't we just
appoint IBM to write all our standards for us (or Quintus or ...)?
[And who is the TRUE inheritor of the title "Edinburgh Prolog" anyway? Is
it the commercial product (formerly NIP) now being sold under that title?]

To return to the argument, I think there's a significant difference between
get0 and read. An end-of-file marker for read is (almost) never used to
implement finite-state machines. Instead it is used for repeat-fail
loops.

e.g.  go :- repeat,
	    read(Term),
            (Term=end_of_file -> true; process(Term), fail).

Now in the days before tail recursion and all the other optimizations
this was inevitable. But why should we encourage this approach today?
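[The alternative being alluded to is presumably the plain recursive loop,
which tail-call optimization makes just as cheap; a sketch, with process/1
the same hypothetical goal as in the repeat loop above:]

```prolog
% Read-and-process loop without repeat/fail: read the first term,
% then recurse until the end-of-file marker comes back.
go_rec :-
        read(Term),
        process_terms(Term).

process_terms(end_of_file) :- !.
process_terms(Term) :-
        process(Term),
        read(Next),
        process_terms(Next).
```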

The repeat loop above is a good example of the trickiness of "repeat". I
always write repeat loops wrong the first time, and this was no exception. I put
            (Term=end_of_file -> true; process(Term)), fail.
then changed it to
            (Term=end_of_file -> !; process(Term)), fail.
before settling on the above version.  I personally think "repeat" should
be left out of the standard (there's no performance penalty in not having it
built-in these days anyway). Don't other people have my problem?
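[Indeed there is no need for it to be built in; repeat/0 is definable in
two clauses:]

```prolog
% repeat/0 succeeds infinitely often on backtracking: the first
% clause succeeds immediately, the second retries without limit.
repeat.
repeat :- repeat.
```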

It would seem to encourage better programming if we allowed "get0"
(or get_file or whatever) to return an end-of-file token, and any
high-level routines to fail at end-of-file. It's not particularly
consistent, but I don't know whether that's a priority in this case.

rok>In fact, on my SUN right now I have function key F5 bound to
rok>"end_of_file.\n" so that I can get out of Prolog without running the
rok>risk of typing too many of them and logging out.

I seem to get by perfectly well by setting "ignoreeof" in my cshell!

rok>Ah, you'll say, but that's what nested comments are for!
rok>Well no, they don't work.  That's right, "#| ... |#" is NOT a reliable
rok>way of commenting code out in Common Lisp, and "/* ... */" is NOT a
rok>reliable way of commenting code out in PopLog. 

That seems to be the best argument for allowing end-of-line comments in
Prolog. Now where do I find the Emacs macro for commenting out all lines
between dot and mark (and removing such comments)?

Chris Moss

Disclaimer: unless I say otherwise I am expressing my personal opinions
NOT the opinions of any committee!
------------------------------

End of PROLOG Digest
********************