[comp.lang.prolog] PROLOG Digest V6 #5

PROLOG-REQUEST@SUSHI.STANFORD.EDU (Chuck Restivo, The Moderator) (01/07/88)
PROLOG Digest            Thursday, 7 Jan 1988       Volume 6 : Issue 5

Today's Topics:
                       Implementation - Strings
----------------------------------------------------------------------

Date: Thu, 31 Dec 87 11:32:44 EST
From: Michael Covington  <MCOVINGT%UGA.BITNET@forsythe.stanford.edu>
Subject: Strings & Back Patting

In regards to Richard O'Keefe's article on string concatenation
(Prolog Digest, Volume 5, Number 100): He has certainly hit the nail
on the head in pointing out the utility of formatted i/o (i.e.,
predicates analogous to printf and scanf in C). If I can remember
right, _all_ of the string handling in our forthcoming textbook
(Prolog Programming in Depth, by Covington, Nute, and Vellino) was
motivated by the lack of a formatted i/o package. Some Prologs have
these, but we were writing highly portable Prolog and couldn't rely on
anything.

  The more general point is well taken -- string processing within a
program is usually a substitute for something that could be done better
with Prolog data structures. Take your input, parse it up into meaningful
data structures as soon as possible, and convert it back to characters
at the very moment of output.

------------------------------

Date: 5 Jan 88 18:25:25 GMT
From: lagache@violet.Berkeley.EDU  (Edouard Lagache)
Subject: String operations in PROLOG: some comments.

     I was quite intrigued by Richard A. O'Keefe's comments on the need (or
     lack of) string operations in PROLOG.  He focused on the question of
     concatenation.  However, there is a number of basic operations that are
     required for decent string handling.  As a reference one can take the
     "C" string library, which includes (among others):

          index =   locate a character from the beginning of a string.

          strcat =  Concatenate two strings.

          strcpy =  Create a copy of a string.

          strcmp =  Compare two strings.

          strlen =  return the length of a string.

          "mid$" =  Returns a substring between specified intervals (non-
                    standard, but easily implemented).


          I think that some of the above functions do have a place in a
     PROLOG predicate library.  'strcmp', for example, is a very useful tool
     for making sense of string data, and 'index' and 'mid$' can be used to
     get at embedded control characters in strings that represent anything
     from device control instructions, to templates for creating pseudo
     natural language output.

          However, there is a larger issue related to this question.  Much
     of the work with strings is dumb old procedural operations.  Since
     procedural operations run counter to PROLOG's way of doing business,
     there is an obvious friction involved.  There is two ways to resolve
     that friction: One way is to try to come up with string primitives that
     deal with all the procedural tasks internally and yet represent a
     "useful" declarative concept for PROLOG programmer.  I suppose that is
     what Dr. O'Keefe is seeking, and trying to come up with such primitives
     will require much thought and reflection from the PROLOG user
     community.

          The other approach would be to permit a real dichotomy between
     procedural tasks and declarative programming via a second programming
     language interface.  Several PROLOGs now can call "C" functions, and in
     some sense mixing "C" or LISP with PROLOG permits users to have the
     best of both worlds.

          The risk with that second approach is potential poisoning of the
     declarative style of PROLOG with poorly chosen "C" interfaces.  Thus,
     in some sense we are reduced back the first option.  Even if individual
     programmers have the freedom to write procedural routines to interface
     to PROLOG, those routines must be written to support and enhance the
     logical structure of the PROLOG program they are written for.

          In that sense PROLOG is a very demanding language, but I suppose
     that is where its real power resides.

-- Edouard Lagache


     P.S. I don't know how other PROLOGs are set up with respect to string
          and/or atom manipulatives.  It may well be the case that most of
          my points are mute on the more potent PROLOGs.

------------------------------

Date: Wed, 6 Jan 88 00:14:20 PST
From: quintus!ok@Sun.COM (Richard A. O'Keefe)
Subject: Strings

    Lagache's message is sensible, but doesn't quite address the
point I am interested in.  He says
>    I think that some of the above functions do have a place in a
>    PROLOG predicate library.
and
>   One way is to try to come up with string primitives that
>   deal with all the procedural tasks internally and yet represent a
>   "useful" declarative concept for PROLOG programmer.  I suppose that is
>   what Dr. O'Keefe is seeking, and trying to come up with such primitives
>   will require much thought and reflection from the PROLOG user
>   community.

(1) The whole point of my original message was that I do *NOT* think
    that strings (and therefore, operations on strings) should be in
    Prolog.  Data interchange with a host language such as Lisp which
    has a string data type is a good reason for supporting strings at
    some level (making (=)/2, compare/3, and write/1 do something
    sensible with them), but my position is that Prolog has no need
    of strings for its own sake.

(2) I am not interested in the question "what is a good set of operations
    for strings".  I think I already know the answer.  Quintus customers
    with release 2.0 or later have it.  Yes, it did take a couple of
    years of thought.  The answer is by APL out of Interlisp, with me as
    midwife rather than inventor.  The cornerstone of the Quintus strings
    library is the predicate

        midstring(ABC, B, AC, LenA, LenB, LenC)

    which is true when ABC, B, and AC are all the same kind of text
    object (that is, all three are strings or all three are atoms),
    and there are text objects A and C such that
        ABC = A // B // C
        AC  = A    //   C
    and LenA, LenB, and LenC are the lengths of A, B, and C respectively.

    Strictly speaking, all you really need is the operation I proposed
    back in 1984:
        string_chars(String, Chars)
    which is true when String is a string, Chars is a list of character
    codes, and they represent the same sequence of characters.

    The operations in the Quintus Prolog library(strings) package work
    together very nicely, and are as multi-directional as I could make
    them.

Lagache says
>   [O'Keefe] focused on the question of
>   concatenation.  However, there is a number of basic operations that are
>   required for decent string handling.  As a reference one can take the
>   "C" string library

(3) I focussed on concatenation for a very simple reason.
    My previous message was about the question:
        Is there any good reason, other than compatibility with a
        host language which provides strings, to have strings AT ALL
        in Prolog?
    This suggested a related questions:
        How do people USE string operations when they are available?
        Are these good ways, or would something else be better?

    Concatenation is the most commonly used string operation.  (Note
    that midstring/6 is concatenation, finding length, search,
    inserting, deleting, or extracting a substring, or lots of other
    things, depending on how you call it.)

    I examined one particular program which used concatenation heavily,
    and found that in nearly every case it was inappropriate.  My
    previous message had a list of some of these uses, and suggested
    more appropriate things to do.

    The question remains open:  other than communication with another
    language which has strings, are there any applications for which
    string operations are the best choice in Prolog, rather than say
    lists of character codes or atoms.  I was hoping for responses from
    people who had practical experience of using strings in Prolog (or
    Lisp for that matter).  In particular, I would like to see responses
    of the form
        "I use strings to solve such and such a problem.
         Here is a sketch of the code."
    If any responses come up, I mean to justify my opinion against
    strings (it is NOT a prejudice, because when I started programming,
    I thought strings were a wonderful idea) by suggesting ways of
    solving the same kind of problem in Prolog without the use of strings.
    If I can't do this, or if the non-string solutions turn out to be
    intrinsically inefficient, I shall admit that I was wrong.

(4) The reason why C is such a good language for text processing is
    that it HASN'T got a built-in string data type.  The C library
    actually supports three different string representations:
        str*() functions        : terminated by NUL only
        strn*() functions       : terminated by a count or NUL, whichever
                                  runs out first
        mem*() functions        : terminated by a count only
    Version 6+ UNIX also included a "big strings" package, I believe
    that the bc(1) utility used it.  The package disappeared in later
    UNICes.  I have my own C strings package which provides about 110
    functions (some of them macros).  I am thoroughly familiar with
    the string operations in C (the standard ones and the Lattice C
    ones, I don't know Whitesmiths C at all).  I am surprised that
    anyone would suggest taking them as a model.  (There is no
    standard operation for taking a substring or for searching for
    a given string.)  When the Quintus strings package was designed,
    I considered APL, Interlisp, Common Lisp, PL/I, and SNOBOL.  I'm
    not sure what Scheme has, but T hasn't got anything Common Lisp
    hasn't.

Lagache specifically mentions the following operations:
> index =   locate a character from the beginning of a string.
> strcat =  Concatenate two strings.
> strcpy =  Create a copy of a string.
> strcmp =  Compare two strings.
> strlen =  return the length of a string.
> "mid$" =  Returns a substring between specified intervals (non-
            standard, but easily implemented).

(5) I repeat that my original question was "is there any practical
    reason to have strings AT ALL", not "what string operations
    should there be".  All of these operations (with the exception
    of strcpy()) are useful, and should be synthesisable.

    Let's see first how we could implement these operations using
    midstring/6 plus the existing built-in operations, and then how
    we would implement them with lists of character codes.

        index(Whole, Part, Offset) :-
                midstring(Whole, Part, _, Offset, _, _).

        strcat(A, B, AB) :-
                midstring(AB, B, A, _, _, 0).

        strcpy(X, X).

        strcmp(Order, X, Y) :-
                compare(Order, X, Y).

        strlen(Text, Length) :-
                midstring(Text, Text, 0, Length, 0).

        'mid$'(Whole, Offset, Length, Part) :-
                midstring(Whole, Part, _, Offset, Length, _).

    So adding midstring/6 and string_chars/2, plus extending (=)/2,
    compare/3, and write/1, appears to be a sufficient basis for
    synthesising the required operations.  Which is not to say that
    this is the best interface for the programmer.

    How would we do this using lists of character codes?

        index(List, Pattern, Offset) :-
                index(Pattern, Offset, List, _).

        index(Pattern, Offset) -->
                len(Offset),
                Pattern.

        strcat(A, Z, AZ) :-
                append(A, Z, AZ).

        strcpy(X, X).

        strcmp(Order, X, Y) :-
                compare(Order, X, Y).

        strlen(List, Length) :-
                length(List, Length).

        'mid$'(List, Drop, Take, Section) :-
                'mid$'(Drop, Take, Section, List, _).

        'mid$'(Drop, Take, Section) -->
                len(Drop),
                len(Take, Section).

    The non-terminals len(L) and len(L,V) mimic the SNOBOL constructs
    LEN(L) and LEN(L) $ V respectively.  A variable in a grammar rule
    mimics the SNOBOL construct *Pattern.  Thus index/3 is true
    when after skipping Offset items in List there is something that
    matches Pattern, and 'mid$'/4 is true when after skipping Drop
    items from List the next Take items match the list Section.

    'mid$'/4 is multi-directional.  The Pattern argument of index/3
    cannot be solved for, but that is because it can be anything that
    would be legal as the body of a grammar rule.  A list is of course
    just such a legal body.

    I just stopped and benchmarked this version of 'mid$' against
    the byte-vector version, hoping for a clear result.  But sometimes
    one was faster and sometimes the other.

(6) The question/challenge remains:
        what, if anything, are strings *better* for?
    Remember: if you are generating output, it is better to avoid
    strings entirely.  What are strings better than trees for,
    internally?

------------------------------

End of PROLOG Digest
********************