[comp.lang.prolog] Strings

micha@ecrcvax.UUCP (Micha Meier) (03/11/88)

In Prolog Digest V5.100 Richard O'Keefe writes

"... if efficiency is your concern you are better off using lists
of character codes than strings."

May I ask this to be precised a little bit more? I can't see why is it
faster to handle structured data on the global stack rather than directly
characters - I guess this applies only for special predicates,
e.g. returning all substrings of a string, but even there it's in
no way evident that using lists is faster. Several points to make:

	- when using lists of character codes, each time when
	  the character code or the next element is accessed,
	  it has to be dereferenced and its type must be tested

	- copying a list is certainly slower than copying a string;
	  for strings you might need a fixed size buffer which
	  can complicate things a little bit if you want
	  strings of arbitrary length, but it is easier to get
	  another buffer than to get another global stack

	- a list item needs at least 8 bytes, a character only one
	  (plus some fixed amount for each string, e.g. length, end or others).
	  If your string is longer than "aa", you need much more space
	  to handle it as a list, consequently more time due to swapping

	- garbage collection is easier for atoms than for strings
	  in the sense that there is only one reference to the name
	  of the atom, namely from the symbol table itself, but to
	  gather all accessible atoms one has to scan the stacks
	  and the program code anyway (unless you want to trail them
	  which is really not faster)

The point whether strings are necessary or not is another one - but
although our system allows atoms of any length, any number of atoms,
and garbage collection of atoms, we still think that having
strings is useful.

Disclaimer:
" I definitely do not try to defend BSI " :-)


--Micha

ok@quintus.UUCP (Richard A. O'Keefe) (03/14/88)

In article <512@ecrcvax.UUCP>, micha@ecrcvax.UUCP (Micha Meier) writes:
> In Prolog Digest V5.100 Richard O'Keefe writes
> 
> "... if efficiency is your concern you are better off using lists
> of character codes than strings."
> 
> May I ask this to be precised a little bit more? I can't see why is it
> faster to handle structured data on the global stack rather than directly
> characters - I guess this applies only for special predicates,
> e.g. returning all substrings of a string, but even there it's in
> no way evident that using lists is faster. Several points to make:
> 
I don't understand "precised".

Most implementations of strings in Prolog ARE "structured data on the
global stack".

When I say that it is more efficient to use lists of character codes
rather than packed byte vectors, I am reporting empirical measurements.
I don't have to explain it:  I'm just saying truthfully that in the
cases I have measured lists were faster.  There may be Prolog systems
which implement lists so badly that this isn't true.  I have to admit
that I've only really tested concatenation and searching, not accessing
single characters.

> 	- when using lists of character codes, each time when
> 	  the character code or the next element is accessed,
> 	  it has to be dereferenced and its type must be tested

On the contrary.  A great advantage of using lists is the fact that
you *don't* have to do anything to a list element *unless* you want
to process that particular element.  Contrast
	Chars0 = [Char1|Chars1]
which dereferences-and-type-tests Chars0 (it's one operation) and
assigns the two fields of Chars0 to Char1 and Chars1 *without*
dereferencing or type testing Char1, with
	N1 is N0+1,
	nth_char(N1, String, Char1)
where N1 has to be dereferenced-and-type-tested and checked for being
in the right range, then the character has to be extracted and *boxed*
and finally assigned to Char1; we also had to do arithmetic on N0/N1.

If you don't like the arithmetic (and who could blame you), consider
instead the Turbo Prolog equivalent (taken from the first edition of
the Turbo Prolog manual, so it's probably wrong).
	frontchar(String0, Char1, String1)
Let's assume a conventional Prolog, rather than Turbo Prolog, which no
doubt does something clever.  String0 has to be dereferenced and type
tested (just like a list cell!) then the first character has to be
extracted and *boxed*, and a new string descriptor has to be created
and *boxed* and assigned to String1.  Assuming that a boxed character
costs one cell and a string descriptor costs two cells (not unreasonable),
walking down a string of N characters using frontchar/3 ends up using
3N cells for the characters and string tails *in addition* to the
packed byte vector we started with, whereas the list version would have
used 2N cells total.

> 	- copying a list is certainly slower than copying a string;
> 	  for strings you might need a fixed size buffer which
> 	  can complicate things a little bit if you want
> 	  strings of arbitrary length, but it is easier to get
> 	  another buffer than to get another global stack

Why would you copy either?  If you have a variable X which is bound to a
string or list of character codes, and you want a variable Y to have the
same value, just do Y = X.  I don't understand "get another global stack".

If you're worried about space, consider the fact that with the usual
representations of lists and byte vectors, when you append something
in front of a list you can share the old tail, but with a byte vector
you have to make a copy.  That is, with Chars1 a known list of N
characters and Chars0 unknown,
	Chars0 = [Char1|Chars1]
costs 2 cells and 1 time unit, but with String1 a known string of N
characters and String0 unknown,
	frontchar(String0, Char1, String1)
costs N+1 bytes + 2 cells for the string descriptor and N+k time units.
(The version using nth_char/3 would turn over 2N cells.)
You can use 1 cell for a string descriptor rather than 2, but then the
cost of taking substrings goes way up.

> 	- a list item needs at least 8 bytes, a character only one
> 	  (plus some fixed amount for each string, e.g. length, end or others).
> 	  If your string is longer than "aa", you need much more space
> 	  to handle it as a list, consequently more time due to swapping
 
I have answered this above.  You might think about the fact that the
list representation will handle 16-bit characters (can you say "Kanji"?
I knew you could) at no extra cost, but the alleged space advantage of
byte vectors is chopped in half.

> 	- garbage collection is easier for atoms than for strings
> 	  in the sense that there is only one reference to the name
> 	  of the atom, namely from the symbol table itself, but to
> 	  gather all accessible atoms one has to scan the stacks
> 	  and the program code anyway (unless you want to trail them
> 	  which is really not faster)

This is an implementation detail.  There is no reason why a Prolog system
might not have more than one pointer to an atom name.  In order to find
all accessible strings you have to scan the stacks too.

> we still think that having strings is useful.

Yes, but WHAT FOR?  We've seen above that the time and space costs can go
either way.

Here's a specific example:
    assuming that your Prolog system supports the ISO 8859/1 character set,
    and that it does *not* reflect dpANS C's setlocale() and strcoll()
    functions as Prolog string operations,

    write a predicate which will take two text objects representing
    German-language book titles, and compare them using whatever the
    convention is in German libraries.  To make it easier, we'll
    assume that there are no punctuation marks and that all numbers
    are spelled out.  Do this entirely in Prolog, using only the
    operations you already have.  (If you can stand it, try using
    the operations in the BSI draft.)

    make two versions of the predicate: one which uses byte vectors and
    one which uses lists, measure their space and time use, and show us
    the code, the test data, and the results.

Those who do not understand Prolog are	| Richard A. O'Keefe
condemned to reinvent Lisp, poorly.	| ...!sun!quintus!ok

shebs%defun.utah.edu.uucp@utah-cs.UUCP (Stanley T. Shebs) (03/15/88)

In article <768@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:

>When I say that it is more efficient to use lists of character codes
>rather than packed byte vectors, I am reporting empirical measurements.
>I don't have to explain it:  I'm just saying truthfully that in the
>cases I have measured lists were faster.  There may be Prolog systems
>which implement lists so badly that this isn't true.  I have to admit
>that I've only really tested concatenation and searching, not accessing
>single characters.

This is a very interesting assertion, because it has a lot of implications
for *any* vector/array-type representation.  The broadest interpretation
is that any and all vector-like types are bad.  This would also extend to
types that one might not think of as vector-like, such as infinite-precision
integers (bignums).  So, if Prolog strings are better as lists, is that also
true for Prolog bignums?

The reason I wonder is that Lisp has not used lists to represent bignums
since the mid-60s, and (for instance) JonL White's bignum paper in the
86 Lisp conference assumes a vector-like representation.  Off the top of
my head, it seems that the Prolog evaluation model is the ultimate reason
for the advantage of lists.  If not, O'Keefe's empirical measurements have
some consequences reaching far beyond contemporary Prolog, and Lispers
should start rethinking their implementation techniques...

Incidentally, David Wise has made similar claims for the advantages of
list representation in functional languages, but (if I recall correctly)
he also assumes binary trees and some sort of hardware memory support.

>	frontchar(String0, Char1, String1)
>Let's assume a conventional Prolog, rather than Turbo Prolog, which no
>doubt does something clever.  String0 has to be dereferenced and type
>tested (just like a list cell!) then the first character has to be
>extracted and *boxed*, and a new string descriptor has to be created
>and *boxed* and assigned to String1.  Assuming that a boxed character
>costs one cell and a string descriptor costs two cells (not unreasonable),
>walking down a string of N characters using frontchar/3 ends up using
>3N cells for the characters and string tails *in addition* to the
>packed byte vector we started with, whereas the list version would have
>used 2N cells total.

Boxing characters is just as bad as boxing small integers, and just as
unnecessary.  Avoiding the creation of a new string descriptor is harder,
and requires compiler analysis, but it is possible under many circumstances.

I think the real underlying motivation for strings is that languages like C
can do string processing with essentially zero space and time overhead,
and one would like Prolog to have the same characteristics.
It's not so much an issue of doing one operation on German titles, but of
passing a million characters of text through a program without allocating
so much as one byte of storage dynamically.  If a Prolog implementor could
guarantee that, then no one would care whether there was a separate string
datatype or not, and the implementor could make a lot of money!

							stan shebs
							shebs@cs.utah.edu

ok@quintus.UUCP (Richard A. O'Keefe) (03/15/88)

In article <5348@utah-cs.UUCP>, shebs%defun.utah.edu.uucp@utah-cs.UUCP (Stanley T. Shebs) writes:
> In article <768@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
> 
> >When I say that it is more efficient to use lists of character codes
> >rather than packed byte vectors, I am reporting empirical measurements.

> This is a very interesting assertion, because it has a lot of implications
> for *any* vector/array-type representation.  The broadest interpretation
> is that any and all vector-like types are bad.

I deny this.  The question is >how do you USE these things<.
If you are doing lots of concatenation and deconcatenation (I stole this
word from PL/I), lists are a win.  How often do you concatenate bignums?
How often do you concatenate matrices (APL programmers need not apply)?

If you need random access into some collection, a vector-like structure
is going to be a really good idea.  The point is that most uses of STRINGS
just aren't like that.  (Hint:  for which C data-type is the idiom *x++
most often used...)

> The reason I wonder is that Lisp has not used lists to represent bignums
> since the mid-60s, and (for instance) JonL White's bignum paper in the
> 86 Lisp conference assumes a vector-like representation.

Xerox Lisp uses lists for bignums to this very day.
There is a book:
	Computer Algebra: Symbolic and Algebraic Computation (2nd ed)
	eds Buchberger, Collins, & Loos
	Springer-Verlag, 1982, 1983
	ISBN 3-211-81776-X		(can anyone tell me what it means
	ISBN 0-387-81776-X		for a book to have 2 ISBNs?)
	Price US$ 32.
I bought this book mainly because of the chapter
	Arithmetic in Basic Algebraic Domains
	Collins, Mignotte, & Winkler
which describes a number of algorithms Knuth didn't.
This chapter claims quite strongly that list representation is better
for bignums.  The fact that one seldom if ever accesses the "elements"
of a bignum in other than strict sequential order suggests that this
may be a good idea.

Consider the fact that if you have an N-"bigit" number X,
X and X+k for small values of k can almost always share the most
significant (N-1) "bigits".

> Boxing characters is just as bad as boxing small integers, and just as
> unnecessary.  Avoiding the creation of a new string descriptor is harder,
> and requires compiler analysis, but it is possible under many circumstances.

I used "boxing" loosely.  I did not, as Shebs excusably thinks, mean
"allocating storage for", but "suitably locating in a register and
supplying with an appropriate tag".  This is cheap, but it isn't free.

> I think the real underlying motivation for strings is that languages like C
> can do string processing with essentially zero space and time overhead,

C can do this precisely because it ****HASN'T**** got a string data type!
For example, suppose I want to concatenate four chunks of text in C.
I can do
	char *strmov(register char *dst, register char *src)
	    {
		while (*dst++ = *src++) ;
		return dst-1;
	    }

	void conc4(char buffer[], char *a, char *b, char *c, char *d)
	    {
		(void) strmov( strmov( strmov( strmov( buffer,
			a), b), c), d);
	    }
Time overhead: each character is examined exactly once, as it is moved.
		no check for overflow or overlap.
Space overhead: static allocation of result area.

If you look closely, you will notice that there are no strings in that!
There are only pointers into buffers.

In Fortran 77, concatenation can be efficient because, once again,
Fortran 77 hasn't got strings, only buffers.  For example,
	CHARACTER *(80) BUFFER
	CHARACTER *(20) A, B, C, D
	BUFFER = A//B//C//D
There is no storage allocation.  (Unlike C, there *is* a check for
overflow, but in this case it can be done at compile time.)
ADA is similar to Fortran 77:
	declare
	    a, b, c, d : string(1..20);
	    buffer : string(1..a'length+b'length+c'length+d'length);
	begin
	    buffer := a & b & c & d;
	end;
The key to not allocating storage is not to have any operations which
create string values, but only commands which change the contents of
an existing buffer.  (At the source level you may have operations which
look as though they create string values, but the implementation mustn't
actually do that.)  One thing that helps in Fortran 77 and ADA is saying
in advance how big a string each buffer can hold.  C obtains a similar
advantage by not bothering to check.

> It's not so much an issue of doing one operation on German titles, but of
> passing a million characters of text through a program without allocating
> so much as one byte of storage dynamically.

Well, if *that* is what you want,

	copy_chars :-
		get0(Char),
		copy_chars(Char).

	copy_chars(-1) :- !.
	copy_chars(Char) :-
		put(Char),
		copy_chars.

will pass any number of characters through a program without allocating
so much as one byte of storage dynamically.   (:-)

Actually, if that's what you want, AAIS Prolog may be the language for you.
I forget the details, but it has an operation something like
	setchar(+Index, +String, +Char)
which changes the Indexth character of String to Char.  So in that language
you can have all the fun of buffers (and all the pain of buffer overflow).

shebs%defun.utah.edu.uucp@utah-cs.UUCP (Stanley T. Shebs) (03/16/88)

In article <776@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:

>If you need random access into some collection, a vector-like structure
>is going to be a really good idea.  The point is that most uses of STRINGS
>just aren't like that.

In other words, if I use strings in the ways for which lists are a good
representation, fine, and if I don't, then my program is going to have
really shabby performance.

>	Arithmetic in Basic Algebraic Domains
>	Collins, Mignotte, & Winkler
>which describes a number of algorithms Knuth didn't.
>This chapter claims quite strongly that list representation is better
>for bignums.  The fact that one seldom if ever accesses the "elements"
>of a bignum in other than strict sequential order suggests that this
>may be a good idea.

Actually, I'm in the process of constructing a system that will (among
other things) try using lists to represent bignums.  Experimentation will
involve using both representations within a language, to see how they compare.

>I used "boxing" loosely.  I did not, as Shebs excusably thinks, mean
>"allocating storage for", but "suitably locating in a register and
>supplying with an appropriate tag".  This is cheap, but it isn't free.

Tagging it shouldn't be necessary, either, but I grant that the character
has to live in a register, at least temporarily - most machines are pretty
insistent on having their intermediate data in registers. :-)

>> I think the real underlying motivation for strings is that languages like C
>> can do string processing with essentially zero space and time overhead,
>
>C can do this precisely because it ****HASN'T**** got a string data type!

But surely you're not suggesting that Prologs provide "buffers"?  Perhaps
I'm the only person left in the world who cares about abstraction, but
I thought that the whole rationale for high-level languages was to have
datatypes that are *not* exact equivalents of machine structures, and it's
the business of the language implementation to choose structures for those
datatypes that are both compact and efficient.  If lists are *always* the
most efficient approach for processing sequences, fine.  If they are not,
then the implementation should use something else.

Ah well, these days the drumbeat is for low-level languages like C to do
everything.  Who needs Prolog when you can hack the bits directly, eh?

							stan shebs
							shebs@cs.utah.edu

ok@quintus.UUCP (Richard A. O'Keefe) (03/16/88)

In article <5349@utah-cs.UUCP>, shebs%defun.utah.edu.uucp@utah-cs.UUCP (Stanley T. Shebs) writes:
> In article <776@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
> 
> >If you need random access into some collection, a vector-like structure
> >is going to be a really good idea.  The point is that most uses of STRINGS
> >just aren't like that.
> 
> In other words, if I use strings in the ways for which lists are a good
> representation, fine, and if I don't, then my program is going to have
> really shabby performance.
> 
The point is that there is no one implementation of the abstract data type
"sequence of character" which is best for all purposes, which is hardly a
surprise.  My claim is that the implementation which is best for most of
the tasks people claim strings are good for is lists, not packed vectors of
bytes.  Is having only one representation available a good idea?  Of course
not.  Does that mean that packed vector of byte should be provided?  Not
necessarily.  A bearable-all-round implementation of logical arrays might
be a better idea.

> But surely you're not suggesting that Prologs provide "buffers"?

No, of course not.  But NOT supporting abstraction and sticking as close
to the machine as possible is how C gets the speed it does.  Specifically,
C can handle text fast precisely because it HASN'T got a string data
type.  The point was that you can't use C or Fortran or ADA as a stick
to beat Prolog with while shouting how wonderful *strings* are.  If the
goal is, as Shebs suggested in an earlier posting, to have no dynamic
allocation costs, show me how to achieve that in an implementation of
Lisp or Icon, and then we'll talk about achieving zero dynamic
allocation in Prolog.  Don't tell me that strings are so wonderful &c &c
that Prolog should have them and use C as an example to support your
case; it doesn't.

Anybody who thinks he can find in any of my postings an argument that
data abstraction is bad or that lists are a universal cure had better
take a course in reading English.  The claims I have made about strings
in Prolog are simply that

 	For most of the things I have seen strings used for in Prolog,
	the packed vector of byte representation is clumsy to use or
	slower than the list representation and usually both, and the
	use of string operations can often be avoided entirely by the
	use of proper abstractions that express what you really want.

As an example of the latter, if you want to represent arithmetic
expressions, it is clearer, faster, and a better use of abstraction,
to use trees (such as a+b) rather than strings (such as $a+b$), and
if you use data abstraction, only the module that defines the operations
will know which representation you are using.

lhe@sics.se (Lars-Henrik Eriksson) (03/18/88)

In article <776@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>	ISBN 3-211-81776-X		(can anyone tell me what it means
>	ISBN 0-387-81776-X		for a book to have 2 ISBNs?)

It means it is published at two places at once. For instance, in my copy of
the ALGOL 68 report, you can find the lines:

ISBN 3-540-07545-3 Springer-Verlag Berlin * Heidelberg * New York
ISBN 0-387-07545-3 Springer-Verlag New York * Heidelberg * Berlin

I.e. the first one was printed in Germany, the second one in the U.S.
Note that only the publisher parts of the numbers differ.

Lars-Henrik Eriksson				Internet: lhe@sics.se
Swedish Institute of Computer Science		Phone (intn'l): +46 8 752 15 09
Box 1263					Telefon (nat'l): 08 - 752 15 09
S-164 28  KISTA, SWEDEN

simon@comp.lancs.ac.uk (Simon Brooke) (03/24/88)

In article <776@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Xerox Lisp uses lists for bignums to this very day.

Yes, true. But please don't assume this means it is efficient. For
example, I recently benchmarked a Xerox 1186 running Interlisp (Koto)
against the new Acorn Archimedes running Arthur Norman's Cambridge Lisp.
Generally the Arch was a bit faster, reflecting the simpler lisp and
faster processor. But running (fact 1000) it was 321 (three hundred and
twenty one) times faster - and this must reflect grotesque inefficiency in
Xerox' bignum code.

cdsm@doc.ic.ac.uk (Chris Moss) (03/25/88)

I haven't contributed to the string discussion because I don't believe I
know all the answers. But I do have strong reservations about the current
Edinburgh implementations. Let me air them:

1. They don't get printed properly by "write". I don't like seeing
"Prolog" come out as [80,114,111,108,111,103]. OK, I can define a
"writestring", but not use it within a bigger structure without pain.  

Now here's an odd thing: there is no writestring routine in the Quintus
library that I can discover! Not in the built-ins or the (extensive)
library!  

What does that tell us about the use of strings? It suggests to
me that people actually always use atoms for this purpose, and somewhere 
in their code is an implicit definition: 
    writestring(String) :- name(Atom,String),write(Atom).  

Note that in most systems this involves an entry in the symbol table 
(reasonably slow) and a limit of 256 (or maybe 512) characters.

MuProlog DOES print out "Prolog" as a string, wherever it occurs.
Presumably it has to scan the list first. It uses list format whenever
there is an integer outside the range 32-127 in the list (it also gets tabs
right tho I don't understand why it finishes at 127 not 126). That's a
pretty good heuristic, tho I can imagine it going wrong on occasions. e.g.
if I read in a clause over several lines the string will contain newlines,
so my DCG debugging will still look horrid!

2. I don't like having ASCII values in my programs. With a certain amount
of discipline and inefficiency one can get round that, by saying, for 
instance:
    lowercase(X) :- "a"=[A], "z"=[Z], X>=A, X=<Z.
but it's not an ideal solution (tho close to normal arithmetic habits in
Prolog). These types of routines tend to be efficiency-sensitive.

OK, there is the 0'a notation (another way of writing 97 if you're using
ASCII). Now that DOESN'T work in MuProlog, or Poplog or ...

On efficiency I mostly agree with Richard. I too like strings to be lists,
and can't see the real benefits of a string type, though you don't tend to
miss what you never had!

So, what I would like to see is not a string type but a CHARACTER type. 
Small integers work in Ed-Prolog mainly because they're efficient for get
and put -- there's no symbol table lookup. It would be nice to treat
single character atoms as their ASCII value, but that's wrong because one
does want to hang things off atoms. So they would have to be a separate
data type. As far as notation is concerned I have no better suggestion
than 0'a tho I don't much like it.

Now I don't object to Pascal's ORD (tho Modula-2's ORD is TERRIBLE.
It is defined as CARDINAL so one usually has to say VAL(INTEGER,ORD("a"))).
But one-way transfers aren't much of a problem semantically, so I
wouldn't object to the C solution, which allows them to be used in arithmetic
contexts. e.g. Num is Ch-0'0 or Letter =<0'z.

Thus the main difference with the present systems would be:
  1. 0'a wouldn't unify with 97. (I assume)
  2. write(0'a) would print out 0'a.
  3. write("abc") or write([0'a,0'b,0'c])  would print out as "abc".
  4. One would need a "CHR" function (but not ORD) in arithmetic
expressions.

The question is, "is it worth the change"?  For my pennyworth the answer
is "yes". NOT having a character type makes Prolog as archaic as Fortran
IV in computer-science terms. I don't much like the string type idea. And
if you say "you proposed it" you're wrong. I've only been concerned with
syntax and semantics, not the built-in-predicates (there are limits to the
number of committees any sane person can attend!). Strings have been mostly
discussed in the BIPs meetings. Most of my suggestions at main committee
meetings have been ignored! When it comes to the semantics one needs a
character type even in the present proposal.

The biggest anti-argument is that all sorts of Prolog programs might break
in weird ways because of difference 1. I don't know the answer to that.

Chris Moss
 

cdsm@ivax.doc.ic.ac.uk (Chris Moss) (03/25/88)

In article <241@gould.doc.ic.ac.uk> cdsm@doc.ic.ac.uk (Chris Moss) writes:
>Thus the main difference with the present systems would be:
>  2. write(0'a) would print out 0'a.
>  3. write("abc") or write([0'a,0'b,0'c])  would print out as "abc".

I meant, of course, writeq. write would output a and abc respectively
(and I can't put any quotes in that sentence!)

Chris Moss.

ok@quintus.UUCP (Richard A. O'Keefe) (03/26/88)

In article <241@gould.doc.ic.ac.uk>, cdsm@doc.ic.ac.uk (Chris Moss) writes:
> Now here's an odd thing: there is no writestring routine in the Quintus
> library that I can discover! Not in the built-ins or the (extensive)
> library!  

There are two separate and distinct ways of doing this using the
Quintus library, and both are documented in the Library manual.

In Section 8.3, you will find

	put_chars(+Chars)
and
	put_line(+Chars)

described.

These are exactly the writestring you want, except that they have
truthful names.  (put_chars/1 does 'put' to a 'chars', that is, to a
list of 'char's.  put_line does 'put' to a list of character codes
considered as a line; it's the equivalent of puts(3S) in the C library.)

There is also a library package called library(print_chars) which
makes the 'print'ing of chars values entirely automatic.  It is
described in section 9-2-2 of the library manual.  The point of this
is that once you have done

	| ?- ensure_loaded(library(print_chars)).

then top-level output, debugger output, and print/1 will have this sort
of effect:

	| ?- X = ["this","is","a","list","of","chars","values"].
	X = ["this","is","a","list","of","chars","values"]

	| ?- append("stuff", Tail, Chars).
	Tail = _537,
	Chars = "stuff|Tail"

{this is an excerpt from an actual transcript}.  If you look in the
DEC-10 Prolog library you will find the ancestor of print_chars under
the name PORSTR.PL.

> What does that tell us about the use of strings? It suggests to
> me that people actually always use atoms for this purpose, and somewhere 
> in their code is an implicit definition: 
>     writestring(String) :- name(Atom,String),write(Atom).  

What it actually tells us is that Chris Moss didn't look very hard in
the manual.  Even if the Quintus library didn't provide two ways of
doing this, what people have done in other dialects is to define

	puts([]).
	puts([C|Cs]) :- put(C), puts(Cs).

In fact, if I remember correctly, MU-Prolog extended put/1 to do this.
Why drag atoms into it?  In fact, the code that Chris Moss shows us will
not work:  writestring("00") would print a single "0".  {Yes, this is not
a nice thing for name/2 to do, which is why Quintus Prolog offers
atom_chars/2 _as well_.}

> 2. I don't like having ASCII values in my programs.

As Chris Moss explained later in his posting, this is not an argument AGAINST
having a notation for lists of character codes.  It is an argument FOR having
a notation for character constants.

> With a certain amount
> of discipline and inefficiency one can get round that, by saying, for 
> instance:
>     lowercase(X) :- "a"=[A], "z"=[Z], X>=A, X=<Z.

Why do that when you can write
	lowercase(X) :- "a" =< X, X =< "z".
Better yet, why not do
	lower_case(X) :- is_lower(X).
where is_lower/1 is one of the character classification predicates stolen
from C that I suggested in my 1984 "evaluable predicates" document (BSI
number PS/6).  (Quintus Prolog provides this, and NU Prolog provides it
under the name isLower/1.)  Best of all, is_lower/1 will solve for X...

In fact, while Chris Moss may not _like_ having ASCII values in his
programs, that's what he just did, and having a character type won't
make them go away.  His code for lowercase is quite wrong for EBCDIC.
It's also wrong for ISO 8859, where there are another 64 lower case
letters.  The Quintus Prolog library contains a package ctypes.pl which
comes in an EBCDIC version as well as an ASCII version, and
	is_lower(X)
is the right thing to do in either context.

This is part of what I meant in an earlier posting by asking what was the
right set of operations for a character type.  (How do you test whether a
character code stands for a Hebrew letter in the appropriate variant of
EBCDIC?  No fair looking in the 3270 manual, now.)

> OK, there is the 0'a notation (another way of writing 97 if you're using
> ASCII). Now that DOESN'T work in MuProlog, or Poplog or ...

It _DOES_ work in NU Prolog, the successor to MU Prolog.  (See section
3.2.2 paragraph (c) of the NU Prolog manual, version 1.1.)  And PopLog
has `a`, has it not?  It should be the work of a few minutes to convert
0'a to `a` or conversely by writing a VED or EMACS macro.

> On efficiency I mostly agree with Richard. I too like strings to be lists,
> and can't see the real benefits of a string type, though you don't tend to
> miss what you never had!

Try SNOBOL, PL/I, BASIC, Burroughs Extended Algol, AWK, Common Lisp, &c &c.
What a pain what a pain.

> As far as notation is concerned I have no better suggestion
> than 0'a though I don't much like it.

The reason that Edinburgh Prolog uses 0'x is that by the time we thought
of it, there were programs using all the other ASCII printing characters,
and the DEC-10 tokeniser already accepted 0'<digits> but didn't do anything
terribly sensible with it.  The only virtue I've ever claimed for it is
that it didn't break anything.  The Pop-11 `x` notation is obviously
superior as a notation.  (0'x also suggests, as it is meant to, that it
is an integer.  `x` doesn't suggest that, so would be more appropriate to
a separate "character" type.)


> I wouldn't object to the C solution, which allows them to be used 
> in arithmetic contexts. e.g. Num is Ch-0'0 or Letter =<0'z.

But the C solution is that characters *ARE* integers.
The constant expression 'a' in C has type "int", not "char".
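The same integer view gives Prolog the C idiom directly; a small sketch (digit_value/2 is a made-up name for illustration):

```prolog
% 0'0 is simply the character code of '0' (48 in ASCII), so digit
% conversion is ordinary arithmetic, just as Ch - '0' is in C.
digit_value(Ch, Num) :-
    Ch >= 0'0, Ch =< 0'9,
    Num is Ch - 0'0.
```

For example, ?- digit_value(0'7, N). gives N = 7.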

> Most of my suggestions at main committee meetings have been ignored!

I'm very sorry to hear that, but the quality of the BSI built-in-predicates
work seemed to me to be far too low to be compatible with the assumption that
you had much of a hand in it.  This is not to say that I would expect to
_like_ your suggestions if I knew what they were, but I certainly would
expect them to be coherent and sensible.

ok@quintus.UUCP (Richard A. O'Keefe) (03/26/88)

In article <488@dcl-csvax.comp.lancs.ac.uk>, simon@comp.lancs.ac.uk (Simon Brooke) writes:
> In article <776@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
> >Xerox Lisp uses lists for bignums to this very day.
> 
> Yes, true. But please don't assume this means it is efficient.

Well, no, I didn't assume that.  I was replying to a message which claimed
that "Lisp" didn't use lists for bignums any more.

> For
> example, I recently benchmarked a Xerox 1186 running Interlisp (Koto)
> against the new Acorn Archimedes running Arthur Norman's Cambridge Lisp.
> Generally the Arch was a bit faster, reflecting the simpler lisp and
> faster processor. But running (fact 1000) it was 321 (three hundred and
> twenty one) times faster - and this must reflect grotesque inefficiency in
> Xerox' bignum code.

If it was "recently", didn't you have the Lyric release?

I am shocked to hear that the Archimedes was only "a bit faster".
As you can easily find out if you have Xerox Quintus Prolog, it is a _lot_
faster at list processing than Xerox Lisp is.  I would have expected
Cambridge Lisp on a RISC to be about as fast as XQP.  What's wrong?

The (fact 1000) test may reflect only the simple fact that the
Archimedes has a 32-bit ALU, whereas the Xerox 1186 has a basically
16-bit ALU.  It is easy to pull apart an Interlisp bignum and see what
is inside.  It is a list of FOURTEEN-bit two's-complement integers.  It
works out at about 0.4 data bits per stored bit.  The vector approach
should get about 1 data bit per stored bit.

Another possible consideration is storage turnover.  The result of
(fact 1000) requires over 8500 bits [about 266 32-bit words, but about
610 list cells].  The Xerox D-machines are not noted for their paging performance.
Did you make sure that all the relevant code was paged in before
measuring the performance of (fact 1000)?
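Those figures are easy to re-derive, assuming a Prolog with unbounded integers and an msb/1 evaluable function (SWI-Prolog has both; other systems may not):

```prolog
% Sketch: how big the bignum for N! is under each representation.
fact(0, 1).
fact(N, F) :- N > 0, N1 is N - 1, fact(N1, F1), F is N * F1.

bignum_sizes(N, Bits, Words32, Cells14) :-
    fact(N, F),
    Bits    is msb(F) + 1,         % total significant bits
    Words32 is (Bits + 31) // 32,  % 32-bit vector words needed
    Cells14 is (Bits + 13) // 14.  % 14-bit Interlisp list cells needed
```

bignum_sizes(1000, B, W, C) puts B in the 8500s, W around 267, and C around 610, matching the counts quoted above.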

shebs%defun.utah.edu.uucp@utah-cs.UUCP (Stanley T. Shebs) (03/27/88)

A minor point: I don't think I claimed that "`Lisp' didn't use lists for
bignums any more"; if I did, it was a boner.  For almost any possible design
choice, you can find one or more systems in the past that made that choice.
It is true that lists for bignums are unusual nowadays, but nobody (so far as
I know) has made a scientific study as to whether lists are better/worse than
vectors for representing bignums.  Anybody need a masters thesis?

>Did you make sure that all the relevant code was paged in before
>measuring the performance of (fact 1000)?

Ah yes, one of the nasty little details that a truly meaningful study
would have to take into account.  Still, I would expect all the relevant
code to get paged in after the first few bignums...

							stan shebs
							shebs@cs.utah.edu