[comp.lang.lisp] Building a string, char-by-char -- more details

dkb@cs.brown.edu (Dilip Barman) (10/02/90)

What I'm trying to do seems fraught with problems and I thought I'd
ask my question again with more details.  Thanks to those who suggested
fill-pointers and treating a string as an array of characters, but I
am still having problems.  What I want to be able to do is parse sentences
from an input file, delimited by periods.  Other than periods, only a-z, A-Z,
slash (/), and dash (-) are considered non-white space.  Words are delimited
by white space.  What I want to create is not a string but a list component
(I discovered setting *print-escape* to nil to disable double quotes and
this should help).  So, in reading
            "This is sentence 1.
               This is * sentence2." 
I am trying to create:

( (THIS IS SENTENCE (NUMBER 1))
  (THIS IS SENTENCE2)
)

How can I coerce the string into being a list component??  This has
me stumped!  Thanks in advance.
Dilip Barman     dkb@cs.brown.edu
U.S. mail: Brown University                       Home: 40 Everett Avenue
           Dept. of Computer Science, Box 1910          Providence, RI 02906
           Providence, RI 02912 (401)863-7666           (401)521-9731

miller@cam.nist.gov (Bruce R. Miller) (10/02/90)

In article <51809@brunix.UUCP>, Dilip Barman writes: 
> What I'm trying to do seems fraught with problems and I thought I'd
> ask my question again with more details.  Thanks to those who suggested
> fill-pointers and treating a string as an array of characters, but I
> am still having problems.  What I want to be able to do is parse sentences
> from an input file, delimited by periods.  Other than periods, only a-z, A-Z,
> slash (/), and dash (-) are considered non-white space.  Words are delimited
> by white space.  What I want to create is not a string but a list component
> (I discovered setting *print-escape* to nil to disable double quotes and
> this should help).  So, in reading
>             "This is sentence 1.
>                This is * sentence2." 
> I am trying to create:
> 
> ( (THIS IS SENTENCE (NUMBER 1))
>   (THIS IS SENTENCE2)
> )
> 
> How can I coerce the string into being a list component??  This has
> me stumped!  Thanks in advance.

Why, CONS it onto something, of course!

In Lisp, ANYTHING can be a `list component':
(cons "FOO" NIL) -> ("FOO")
(cons 1 (cons "FOO" NIL))     -> (1 "FOO")
etc.

So, read characters and put them into a string, using the functions
mentioned.  Stop when you get to anything you consider a delimiter, cons
the string onto a result, and keep going till you get a period; then
return the reverse of the result.  This gives
("THIS" "IS" "SENTENCE2")
And your homework will be done in no time.
If you really want symbols instead of strings, use INTERN.
And don't bother with *print-escape*; that only affects how things print,
not how they are read.
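
A minimal sketch of the loop described above, reading from a stream into
a fill-pointer string.  WORD-CHAR-P and READ-SENTENCE are made-up names,
not anything from the original question.  The sketch assumes digits count
as word characters (so that "sentence2" survives) and makes no attempt at
the (NUMBER 1) treatment.

```lisp
;; Sketch only: WORD-CHAR-P and READ-SENTENCE are hypothetical names.
(defun word-char-p (char)
  "True for letters, digits (an assumption), slash and dash."
  (or (alphanumericp char) (char= char #\/) (char= char #\-)))

(defun read-sentence (stream)
  "Read words from STREAM up to a period or end of file.
Returns the words as a list of symbols."
  (let ((word (make-array 0 :element-type 'character
                            :adjustable t :fill-pointer 0))
        (result '()))
    (flet ((finish-word ()
             ;; Intern a copy of the buffer, then empty it for reuse.
             (when (plusp (length word))
               (push (intern (string-upcase (copy-seq word))) result)
               (setf (fill-pointer word) 0))))
      (loop
        (let ((char (read-char stream nil nil)))
          (cond ((or (null char) (char= char #\.))
                 (finish-word)
                 (return (nreverse result)))
                ((word-char-p char)
                 (vector-push-extend char word))
                ;; Anything else is a delimiter.
                (t (finish-word))))))))

(with-input-from-string (s "This is * sentence2.")
  (read-sentence s))
;; => (THIS IS SENTENCE2)
```

Calling READ-SENTENCE again on the same stream picks up after the
period, so a list of sentences is one more loop around this.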

bruce

eliot@phoenix.Princeton.EDU (Eliot Handelman) (10/02/90)

In article <51809@brunix.UUCP> dkb@cs.brown.edu (Dilip Barman) writes:
;
;How can I coerce the string into being a list component??  This has
;me stumped!  Thanks in advance.

If READ-FROM-STRING acted reasonably, you could use it like this:

(defun string->list (string)
  (let ((words '()) (index 0))
    (loop
     (multiple-value-bind (word next-index)
         (read-from-string string nil nil :start index)
       (setq index next-index)
       (if word
           (push word words)
           (return (nreverse words)))))))


> (string->list "It was a dark and stormy night")
(IT WAS A DARK AND STORMY NIGHT)
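
One caveat with a loop of this shape: it uses NIL as the end-of-string
value, so it also stops at the first word that reads as NIL.  A possible
variant, sketched here under the made-up name STRING->LIST-EOF, passes a
freshly consed marker as the EOF value instead, since the reader can
never return that object.

```lisp
;; Sketch only: STRING->LIST-EOF is a hypothetical name.
(defun string->list-eof (string)
  "Like STRING->LIST, but a word that reads as NIL does not end the loop."
  (let ((eof (list nil)))               ; unique object used as the EOF marker
    (loop with index = 0
          for word = (multiple-value-bind (object next-index)
                         (read-from-string string nil eof :start index)
                       (setq index next-index)
                       object)
          until (eq word eof)
          collect word)))

(string->list-eof "nil is not the end")
;; => (NIL IS NOT THE END)
```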

Unfortunately, READ-FROM-STRING throws an error if it sees read-macros,
like commas. The solution is to write your own version, which reads
characters from a string (using SCHAR), preprocesses special characters
(like comma), detects the end of the word, then hands the string and
indices to SUBSEQ, which operates on the string. Intern the result,
push on a list, NREVERSE when done and voila! 
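
A rough sketch of that approach, under the made-up name SPLIT-WORDS.  It
swaps the character-by-character SCHAR scan for POSITION-IF and
POSITION-IF-NOT, which find the same word boundaries, and it simply
skips commas and other punctuation rather than preprocessing them.
Treating digits as word characters is an assumption.

```lisp
;; Sketch only: SPLIT-WORDS is a hypothetical name.
(defun split-words (string &optional (word-char-p #'alphanumericp))
  "Return the words in STRING as a list of symbols.
Characters that fail WORD-CHAR-P are treated as delimiters."
  (let ((words '())
        (start 0)
        (end (length string)))
    (loop
      ;; Skip delimiters to find the first character of the next word.
      (setq start (position-if word-char-p string :start start))
      (when (null start)
        (return (nreverse words)))
      ;; Find where the word stops, then cut it out with SUBSEQ.
      (let ((stop (or (position-if-not word-char-p string :start start)
                      end)))
        (push (intern (string-upcase (subseq string start stop))) words)
        (setq start stop)))))

(split-words "Alas poor Yorick, I knew him well Horatio.")
;; => (ALAS POOR YORICK I KNEW HIM WELL HORATIO)
```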

It really is a pain to do this in CL. It was so much easier in the
old Franz Lisp, because strings and atoms were identical. 

--eliot

moore%cdr.utah.edu@cs.utah.edu (Tim Moore) (10/02/90)

In article <2990@idunno.Princeton.EDU> eliot@phoenix.Princeton.EDU (Eliot Handelman) writes:
>In article <51809@brunix.UUCP> dkb@cs.brown.edu (Dilip Barman) writes:
>> [How do I turn a string sentence into a list?]
>If READ-FROM-STRING acted reasonably, you could use it like this:
>
>(defun string->list (string)
>  (let ((words '()) (index 0))
>    (loop
>     (multiple-value-bind (word next-index)
>	 (read-from-string string nil nil :start index)
>       (setq index next-index)  
>       (if word
>	   (push word words)
>	   (return (nreverse words)))))))

>Unfortunately, READ-FROM-STRING throws an error if it sees read-macros,
>like commas. The solution is to write your own version, which reads
>characters from a string (using SCHAR), preprocesses special characters
>(like comma), detects the end of the word, then hands the string and
>indices to SUBSEQ, which operates on the string. Intern the result,
>push on a list, NREVERSE when done and voila! 

Rather than rewrite the reader, the solution is to do some readtable
hacking. For example:

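(defvar sentence-read-table (copy-readtable))
;;; Make period and comma read as the symbols |.| and |,|.
(set-macro-character #\. #'(lambda (stream char)
                             (declare (ignore stream char)) '|.|)
                     nil sentence-read-table)
(set-macro-character #\, #'(lambda (stream char)
                             (declare (ignore stream char)) '|,|)
                     nil sentence-read-table)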
;;; ... and so on.

(defun string->list (string)
  (let ((words '())
        (index 0)
        (*readtable* sentence-read-table))
    (loop
     (multiple-value-bind (word next-index)
         (read-from-string string nil nil :start index)
       (setq index next-index)
       (if word
           (push word words)
           (return (nreverse words)))))))

(string->list "Alas poor Yorick, I knew him well Horatio.")
(ALAS POOR YORICK |,| I KNEW HIM WELL HORATIO |.|)

If you are willing to sacrifice a character, a 4-line hack (plus
readtable initialization) that does the same thing is:

(defun string->list2 (string)
  (let ((*readtable* sentence-read-table))
    (with-input-from-string (s (concatenate 'simple-string string "`"))
      (read-delimited-list #\` s))))

>It really is a pain to do this in CL. It was so much easier in the
>old Franz Lisp, because strings and atoms were identical. 
>
>--eliot

It's not that hard in Common Lisp. CL's extensive macro character
syntax can get in the way, but a one-time setup of a new read table
gets around this. In some sense strings and symbols are equivalent, as
many CL string functions will take a symbol argument and coerce it to
a string. 
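
A few forms that illustrate the point, assuming the standard readtable
(which upcases symbol names as it reads them).  The symbol YORICK is
just an example name.

```lisp
;; String functions accept a symbol and use its name.
(string 'yorick)                 ; => "YORICK"
(string= 'yorick "YORICK")       ; => true, compares the symbol's name
(string-equal 'yorick "yorick")  ; => true, ignores case
(string-downcase 'yorick)        ; => "yorick"

;; Going the other way needs an explicit call.
(intern "YORICK")                ; => YORICK (plus a status value)
(symbol-name 'yorick)            ; => "YORICK"
```

The coercion only goes so far: sequence functions such as LENGTH and
CONCATENATE want a real string and will not accept a symbol.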

Tim Moore                    moore@cs.utah.edu {bellcore,hplabs}!utah-cs!moore
"Ah, youth. Ah, statute of limitations."
		-John Waters