PROLOG-REQUEST@SUSHI.STANFORD.EDU (Chuck Restivo, The Moderator) (01/07/88)
PROLOG Digest Thursday, 7 Jan 1988 Volume 6 : Issue 5
Today's Topics:
Implementation - Strings
----------------------------------------------------------------------
Date: Thu, 31 Dec 87 11:32:44 EST
From: Michael Covington <MCOVINGT%UGA.BITNET@forsythe.stanford.edu>
Subject: Strings & Back Patting
In regards to Richard O'Keefe's article on string concatenation
(Prolog Digest, Volume 5, Number 100): He has certainly hit the nail
on the head in pointing out the utility of formatted i/o (i.e.,
predicates analogous to printf and scanf in C). If I can remember
right, _all_ of the string handling in our forthcoming textbook
(Prolog Programming in Depth, by Covington, Nute, and Vellino) was
motivated by the lack of a formatted i/o package. Some Prologs have
these, but we were writing highly portable Prolog and couldn't rely on
anything.
The more general point is well taken -- string processing within a
program is usually a substitute for something that could be done better
with Prolog data structures. Take your input, parse it up into meaningful
data structures as soon as possible, and convert it back to characters
at the very moment of output.
------------------------------
Date: 5 Jan 88 18:25:25 GMT
From: lagache@violet.Berkeley.EDU (Edouard Lagache)
Subject: String operations in PROLOG: some comments.
I was quite intrigued by Richard A. O'Keefe's comments on the need (or
lack of) string operations in PROLOG. He focused on the question of
concatenation. However, there is a number of basic operations that are
required for decent string handling. As a reference one can take the
"C" string library, which includes (among others):
index = locate a character from the beginning of a string.
strcat = Concatenate two strings.
strcpy = Create a copy of a string.
strcmp = Compare two strings.
strlen = return the length of a string.
"mid$" = Returns a substring between specified intervals (non-
standard, but easily implemented).
I think that some of the above functions do have a place in a
PROLOG predicate library. 'strcmp', for example, is a very useful tool
for making sense of string data, and 'index' and 'mid$' can be used to
get at embedded control characters in strings that represent anything
from device control instructions, to templates for creating pseudo
natural language output.
However, there is a larger issue related to this question. Much
of the work with strings is dumb old procedural operations. Since
procedural operations run counter to PROLOG's way of doing business,
there is an obvious friction involved. There is two ways to resolve
that friction: One way is to try to come up with string primitives that
deal with all the procedural tasks internally and yet represent a
"useful" declarative concept for PROLOG programmer. I suppose that is
what Dr. O'Keefe is seeking, and trying to come up with such primitives
will require much thought and reflection from the PROLOG user
community.
The other approach would be to permit a real dichotomy between
procedural tasks and declarative programming via a second programming
language interface. Several PROLOGs now can call "C" functions, and in
some sense mixing "C" or LISP with PROLOG permits users to have the
best of both worlds.
The risk with that second approach is potential poisoning of the
declarative style of PROLOG with poorly chosen "C" interfaces. Thus,
in some sense we are reduced back the first option. Even if individual
programmers have the freedom to write procedural routines to interface
to PROLOG, those routines must be written to support and enhance the
logical structure of the PROLOG program they are written for.
In that sense PROLOG is a very demanding language, but I suppose
that is where its real power resides.
-- Edouard Lagache
P.S. I don't know how other PROLOGs are set up with respect to string
and/or atom manipulatives. It may well be the case that most of
my points are mute on the more potent PROLOGs.
------------------------------
Date: Wed, 6 Jan 88 00:14:20 PST
From: quintus!ok@Sun.COM (Richard A. O'Keefe)
Subject: Strings
Lagache's message is sensible, but doesn't quite address the
point I am interested in. He says
> I think that some of the above functions do have a place in a
> PROLOG predicate library.
and
> One way is to try to come up with string primitives that
> deal with all the procedural tasks internally and yet represent a
> "useful" declarative concept for PROLOG programmer. I suppose that is
> what Dr. O'Keefe is seeking, and trying to come up with such primitives
> will require much thought and reflection from the PROLOG user
> community.
(1) The whole point of my original message was that I do *NOT* think
that strings (and therefore, operations on strings) should be in
Prolog. Data interchange with a host language such as Lisp which
has a string data type is a good reason for supporting strings at
some level (making (=)/2, compare/3, and write/1 do something
sensible with them), but my position is that Prolog has no need
of strings for its own sake.
(2) I am not interested in the question "what is a good set of operations
for strings". I think I already know the answer. Quintus customers
with release 2.0 or later have it. Yes, it did take a couple of
years of thought. The answer is by APL out of Interlisp, with me as
midwife rather than inventor. The cornerstone of the Quintus strings
library is the predicate
midstring(ABC, B, AC, LenA, LenB, LenC)
which is true when ABC, B, and AC are all the same kind of text
object (that is, all three are strings or all three are atoms),
and there are text objects A and C such that
ABC = A // B // C
AC = A // C
and LenA, LenB, and LenC are the lengths of A, B, and C respectively.
Strictly speaking, all you really need is the operation I proposed
back in 1984:
string_chars(String, Chars)
which is true when String is a string, Chars is a list of character
codes, and they represent the same sequence of characters.
The operations in the Quintus Prolog library(strings) package work
together very nicely, and are as multi-directional as I could make
them.
Lagache says
> [O'Keefe] focused on the question of
> concatenation. However, there is a number of basic operations that are
> required for decent string handling. As a reference one can take the
> "C" string library
(3) I focussed on concatenation for a very simple reason.
My previous message was about the question:
Is there any good reason, other than compatibility with a
host language which provides strings, to have strings AT ALL
in Prolog?
This suggested a related questions:
How do people USE string operations when they are available?
Are these good ways, or would something else be better?
Concatenation is the most commonly used string operation. (Note
that midstring/6 is concatenation, finding length, search,
inserting, deleting, or extracting a substring, or lots of other
things, depending on how you call it.)
I examined one particular program which used concatenation heavily,
and found that in nearly every case it was inappropriate. My
previous message had a list of some of these uses, and suggested
more appropriate things to do.
The question remains open: other than communication with another
language which has strings, are there any applications for which
string operations are the best choice in Prolog, rather than say
lists of character codes or atoms. I was hoping for responses from
people who had practical experience of using strings in Prolog (or
Lisp for that matter). In particular, I would like to see responses
of the form
"I use strings to solve such and such a problem.
Here is a sketch of the code."
If any responses come up, I mean to justify my opinion against
strings (it is NOT a prejudice, because when I started programming,
I thought strings were a wonderful idea) by suggesting ways of
solving the same kind of problem in Prolog without the use of strings.
If I can't do this, or if the non-string solutions turn out to be
intrinsically inefficient, I shall admit that I was wrong.
(4) The reason why C is such a good language for text processing is
that it HASN'T got a built-in string data type. The C library
actually supports three different string representations:
str*() functions : terminated by NUL only
strn*() functions : terminated by a count or NUL, whichever
runs out first
mem*() functions : terminated by a count only
Version 6+ UNIX also included a "big strings" package, I believe
that the bc(1) utility used it. The package disappeared in later
UNICes. I have my own C strings package which provides about 110
functions (some of them macros). I am thoroughly familiar with
the string operations in C (the standard ones and the Lattice C
ones, I don't know Whitesmiths C at all). I am surprised that
anyone would suggest taking them as a model. (There is no
standard operation for taking a substring or for searching for
a given string.) When the Quintus strings package was designed,
I considered APL, Interlisp, Common Lisp, PL/I, and SNOBOL. I'm
not sure what Scheme has, but T hasn't got anything Common Lisp
hasn't.
Lagache specifically mentions the following operations:
> index = locate a character from the beginning of a string.
> strcat = Concatenate two strings.
> strcpy = Create a copy of a string.
> strcmp = Compare two strings.
> strlen = return the length of a string.
> "mid$" = Returns a substring between specified intervals (non-
standard, but easily implemented).
(5) I repeat that my original question was "is there any practical
reason to have strings AT ALL", not "what string operations
should there be". All of these operations (with the exception
of strcpy()) are useful, and should be synthesisable.
Let's see first how we could implement these operations using
midstring/6 plus the existing built-in operations, and then how
we would implement them with lists of character codes.
index(Whole, Part, Offset) :-
midstring(Whole, Part, _, Offset, _, _).
strcat(A, B, AB) :-
midstring(AB, B, A, _, _, 0).
strcpy(X, X).
strcmp(Order, X, Y) :-
compare(Order, X, Y).
strlen(Text, Length) :-
midstring(Text, Text, 0, Length, 0).
'mid$'(Whole, Offset, Length, Part) :-
midstring(Whole, Part, _, Offset, Length, _).
So adding midstring/6 and string_chars/2, plus extending (=)/2,
compare/3, and write/1, appears to be a sufficient basis for
synthesising the required operations. Which is not to say that
this is the best interface for the programmer.
How would we do this using lists of character codes?
index(List, Pattern, Offset) :-
index(Pattern, Offset, List, _).
index(Pattern, Offset) -->
len(Offset),
Pattern.
strcat(A, Z, AZ) :-
append(A, Z, AZ).
strcpy(X, X).
strcmp(Order, X, Y) :-
compare(Order, X, Y).
strlen(List, Length) :-
length(List, Length).
'mid$'(List, Drop, Take, Section) :-
'mid$'(Drop, Take, Section, List, _).
'mid$'(Drop, Take, Section) -->
len(Drop),
len(Take, Section).
The non-terminals len(L) and len(L,V) mimic the SNOBOL constructs
LEN(L) and LEN(L) $ V respectively. A variable in a grammar rule
mimics the SNOBOL construct *Pattern. Thus index/3 is true
when after skipping Offset items in List there is something that
matches Pattern, and 'mid$'/4 is true when after skipping Drop
items from List the next Take items match the list Section.
'mid$'/4 is multi-directional. The Pattern argument of index/3
cannot be solved for, but that is because it can be anything that
would be legal as the body of a grammar rule. A list is of course
just such a legal body.
I just stopped and benchmarked this version of 'mid$' against
the byte-vector version, hoping for a clear result. But sometimes
one was faster and sometimes the other.
(6) The question/challenge remains:
what, if anything, are strings *better* for?
Remember: if you are generating output, it is better to avoid
strings entirely. What are strings better than trees for,
internally?
------------------------------
End of PROLOG Digest
********************