PROLOG-REQUEST@SUSHI.STANFORD.EDU (Chuck Restivo, The Moderator) (01/07/88)
PROLOG Digest Thursday, 7 Jan 1988 Volume 6 : Issue 5 Today's Topics: Implementation - Strings ---------------------------------------------------------------------- Date: Thu, 31 Dec 87 11:32:44 EST From: Michael Covington <MCOVINGT%UGA.BITNET@forsythe.stanford.edu> Subject: Strings & Back Patting In regards to Richard O'Keefe's article on string concatenation (Prolog Digest, Volume 5, Number 100): He has certainly hit the nail on the head in pointing out the utility of formatted i/o (i.e., predicates analogous to printf and scanf in C). If I can remember right, _all_ of the string handling in our forthcoming textbook (Prolog Programming in Depth, by Covington, Nute, and Vellino) was motivated by the lack of a formatted i/o package. Some Prologs have these, but we were writing highly portable Prolog and couldn't rely on anything. The more general point is well taken -- string processing within a program is usually a substitute for something that could be done better with Prolog data structures. Take your input, parse it up into meaningful data structures as soon as possible, and convert it back to characters at the very moment of output. ------------------------------ Date: 5 Jan 88 18:25:25 GMT From: lagache@violet.Berkeley.EDU (Edouard Lagache) Subject: String operations in PROLOG: some comments. I was quite intrigued by Richard A. O'Keefe's comments on the need (or lack of) string operations in PROLOG. He focused on the question of concatenation. However, there is a number of basic operations that are required for decent string handling. As a reference one can take the "C" string library, which includes (among others): index = locate a character from the beginning of a string. strcat = Concatenate two strings. strcpy = Create a copy of a string. strcmp = Compare two strings. strlen = return the length of a string. "mid$" = Returns a substring between specified intervals (non- standard, but easily implemented). I think that some of the above functions do have a place in a PROLOG predicate library. 'strcmp', for example, is a very useful tool for making sense of string data, and 'index' and 'mid$' can be used to get at embedded control characters in strings that represent anything from device control instructions, to templates for creating pseudo natural language output. However, there is a larger issue related to this question. Much of the work with strings is dumb old procedural operations. Since procedural operations run counter to PROLOG's way of doing business, there is an obvious friction involved. There is two ways to resolve that friction: One way is to try to come up with string primitives that deal with all the procedural tasks internally and yet represent a "useful" declarative concept for PROLOG programmer. I suppose that is what Dr. O'Keefe is seeking, and trying to come up with such primitives will require much thought and reflection from the PROLOG user community. The other approach would be to permit a real dichotomy between procedural tasks and declarative programming via a second programming language interface. Several PROLOGs now can call "C" functions, and in some sense mixing "C" or LISP with PROLOG permits users to have the best of both worlds. The risk with that second approach is potential poisoning of the declarative style of PROLOG with poorly chosen "C" interfaces. Thus, in some sense we are reduced back the first option. Even if individual programmers have the freedom to write procedural routines to interface to PROLOG, those routines must be written to support and enhance the logical structure of the PROLOG program they are written for. In that sense PROLOG is a very demanding language, but I suppose that is where its real power resides. -- Edouard Lagache P.S. I don't know how other PROLOGs are set up with respect to string and/or atom manipulatives. It may well be the case that most of my points are mute on the more potent PROLOGs. ------------------------------ Date: Wed, 6 Jan 88 00:14:20 PST From: quintus!ok@Sun.COM (Richard A. O'Keefe) Subject: Strings Lagache's message is sensible, but doesn't quite address the point I am interested in. He says > I think that some of the above functions do have a place in a > PROLOG predicate library. and > One way is to try to come up with string primitives that > deal with all the procedural tasks internally and yet represent a > "useful" declarative concept for PROLOG programmer. I suppose that is > what Dr. O'Keefe is seeking, and trying to come up with such primitives > will require much thought and reflection from the PROLOG user > community. (1) The whole point of my original message was that I do *NOT* think that strings (and therefore, operations on strings) should be in Prolog. Data interchange with a host language such as Lisp which has a string data type is a good reason for supporting strings at some level (making (=)/2, compare/3, and write/1 do something sensible with them), but my position is that Prolog has no need of strings for its own sake. (2) I am not interested in the question "what is a good set of operations for strings". I think I already know the answer. Quintus customers with release 2.0 or later have it. Yes, it did take a couple of years of thought. The answer is by APL out of Interlisp, with me as midwife rather than inventor. The cornerstone of the Quintus strings library is the predicate midstring(ABC, B, AC, LenA, LenB, LenC) which is true when ABC, B, and AC are all the same kind of text object (that is, all three are strings or all three are atoms), and there are text objects A and C such that ABC = A // B // C AC = A // C and LenA, LenB, and LenC are the lengths of A, B, and C respectively. Strictly speaking, all you really need is the operation I proposed back in 1984: string_chars(String, Chars) which is true when String is a string, Chars is a list of character codes, and they represent the same sequence of characters. The operations in the Quintus Prolog library(strings) package work together very nicely, and are as multi-directional as I could make them. Lagache says > [O'Keefe] focused on the question of > concatenation. However, there is a number of basic operations that are > required for decent string handling. As a reference one can take the > "C" string library (3) I focussed on concatenation for a very simple reason. My previous message was about the question: Is there any good reason, other than compatibility with a host language which provides strings, to have strings AT ALL in Prolog? This suggested a related questions: How do people USE string operations when they are available? Are these good ways, or would something else be better? Concatenation is the most commonly used string operation. (Note that midstring/6 is concatenation, finding length, search, inserting, deleting, or extracting a substring, or lots of other things, depending on how you call it.) I examined one particular program which used concatenation heavily, and found that in nearly every case it was inappropriate. My previous message had a list of some of these uses, and suggested more appropriate things to do. The question remains open: other than communication with another language which has strings, are there any applications for which string operations are the best choice in Prolog, rather than say lists of character codes or atoms. I was hoping for responses from people who had practical experience of using strings in Prolog (or Lisp for that matter). In particular, I would like to see responses of the form "I use strings to solve such and such a problem. Here is a sketch of the code." If any responses come up, I mean to justify my opinion against strings (it is NOT a prejudice, because when I started programming, I thought strings were a wonderful idea) by suggesting ways of solving the same kind of problem in Prolog without the use of strings. If I can't do this, or if the non-string solutions turn out to be intrinsically inefficient, I shall admit that I was wrong. (4) The reason why C is such a good language for text processing is that it HASN'T got a built-in string data type. The C library actually supports three different string representations: str*() functions : terminated by NUL only strn*() functions : terminated by a count or NUL, whichever runs out first mem*() functions : terminated by a count only Version 6+ UNIX also included a "big strings" package, I believe that the bc(1) utility used it. The package disappeared in later UNICes. I have my own C strings package which provides about 110 functions (some of them macros). I am thoroughly familiar with the string operations in C (the standard ones and the Lattice C ones, I don't know Whitesmiths C at all). I am surprised that anyone would suggest taking them as a model. (There is no standard operation for taking a substring or for searching for a given string.) When the Quintus strings package was designed, I considered APL, Interlisp, Common Lisp, PL/I, and SNOBOL. I'm not sure what Scheme has, but T hasn't got anything Common Lisp hasn't. Lagache specifically mentions the following operations: > index = locate a character from the beginning of a string. > strcat = Concatenate two strings. > strcpy = Create a copy of a string. > strcmp = Compare two strings. > strlen = return the length of a string. > "mid$" = Returns a substring between specified intervals (non- standard, but easily implemented). (5) I repeat that my original question was "is there any practical reason to have strings AT ALL", not "what string operations should there be". All of these operations (with the exception of strcpy()) are useful, and should be synthesisable. Let's see first how we could implement these operations using midstring/6 plus the existing built-in operations, and then how we would implement them with lists of character codes. index(Whole, Part, Offset) :- midstring(Whole, Part, _, Offset, _, _). strcat(A, B, AB) :- midstring(AB, B, A, _, _, 0). strcpy(X, X). strcmp(Order, X, Y) :- compare(Order, X, Y). strlen(Text, Length) :- midstring(Text, Text, 0, Length, 0). 'mid$'(Whole, Offset, Length, Part) :- midstring(Whole, Part, _, Offset, Length, _). So adding midstring/6 and string_chars/2, plus extending (=)/2, compare/3, and write/1, appears to be a sufficient basis for synthesising the required operations. Which is not to say that this is the best interface for the programmer. How would we do this using lists of character codes? index(List, Pattern, Offset) :- index(Pattern, Offset, List, _). index(Pattern, Offset) --> len(Offset), Pattern. strcat(A, Z, AZ) :- append(A, Z, AZ). strcpy(X, X). strcmp(Order, X, Y) :- compare(Order, X, Y). strlen(List, Length) :- length(List, Length). 'mid$'(List, Drop, Take, Section) :- 'mid$'(Drop, Take, Section, List, _). 'mid$'(Drop, Take, Section) --> len(Drop), len(Take, Section). The non-terminals len(L) and len(L,V) mimic the SNOBOL constructs LEN(L) and LEN(L) $ V respectively. A variable in a grammar rule mimics the SNOBOL construct *Pattern. Thus index/3 is true when after skipping Offset items in List there is something that matches Pattern, and 'mid$'/4 is true when after skipping Drop items from List the next Take items match the list Section. 'mid$'/4 is multi-directional. The Pattern argument of index/3 cannot be solved for, but that is because it can be anything that would be legal as the body of a grammar rule. A list is of course just such a legal body. I just stopped and benchmarked this version of 'mid$' against the byte-vector version, hoping for a clear result. But sometimes one was faster and sometimes the other. (6) The question/challenge remains: what, if anything, are strings *better* for? Remember: if you are generating output, it is better to avoid strings entirely. What are strings better than trees for, internally? ------------------------------ End of PROLOG Digest ********************