[gnu.emacs.bug] problem in manual

fr07+@ANDREW.CMU.EDU (Frank Ritter) (11/09/89)

[please forward to the separate manual group if appropriate]

I've run across what appears to be a bug.  The gnu-elisp manual uses
the code below as an example, but it doesn't work.

(string-match "ba\(na\)*" "banana")
    [p. 285, GNU emacs lisp manual, Emacs version 18, March 1988]

 (string-match "ba\\(na\\)*" "banana")

Since the above modified form works, I suspect that the text
explaining how "\( \)" works is wrong also.  The text and example
should be updated.

Sincerely,

Frank  ritter@psy.cmu.edu, @cs.cmu.edu

tale@pawl.rpi.edu (David C Lawrence) (11/10/89)

In <EZKMX9q00jWEA5_NN3@andrew.cmu.edu> fr07+@ANDREW.CMU.EDU (Frank Ritter):
Frank> I've run across what appears to be a bug.  The gnu-elisp manual uses
Frank> the code below as an example, but it doesn't work.

Frank> (string-match "ba\(na\)*" "banana")
Frank>     [p. 285, GNU emacs lisp manual, Emacs version 18, March 1988]
Frank>  (string-match "ba\\(na\\)*" "banana") [ ... does work ]

This is a common mistake among beginning Emacs-Lisp programmers,
because the part the Lisp reader plays is not readily apparent to
them.

From later in the regexp section of the manual, dated 1 Apr 89:

Manual> In Lisp syntax, the string constant begins and ends with a
Manual> double-quote.  `\"' stands for a double-quote as part of the
Manual> regexp, `\\' for a backslash as part of the regexp, [...]

In fact, the example which this manual gives regarding "banana" is not
in Lisp syntax at all, but rather in simple regexp syntax:

Manual> 2. To enclose a complicated expression for the postfix `*' to
Manual>    operate on.  Thus, `ba\(na\)*' matches `bananana', etc., with
Manual>    any (zero or more) number of `na' strings.

The doubling of the backslash is necessary because of how the Lisp
reader interprets backslashes in string constants.  Where a backslash
does not introduce a special escape (i.e., \C-?, \M-?, \e, \t, \", \\
and others), it is simply discarded.  To pass a backslash in a string
it must therefore be doubled; the reader turns `\\' into a single
backslash.  In the first example above, the regular expression
interpreter is actually being handed "ba(na)*", which will match only
`ba(na' with any number of trailing closing parentheses, so it finds
nothing in "banana".
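
The reader's effect can be checked directly.  What follows is only a
minimal sketch, assuming a reasonably current GNU Emacs; evaluate each
form in *scratch* (for instance with C-x C-e).  The sample strings
other than "banana" are made up for illustration.

```elisp
;; What the Lisp reader produces for each string constant.
(length "ba\(na\)*")              ; 7: both backslashes were discarded
(length "ba\\(na\\)*")            ; 9: each \\ became one backslash
(string= "ba\(na\)*" "ba(na)*")   ; t: the two constants read identically

;; What string-match therefore does with each regexp.
(string-match "ba\(na\)*" "banana")    ; nil: the parentheses are literal
(string-match "ba\\(na\\)*" "banana")  ; 0: group `na' repeated by `*'
(match-end 0)                          ; 6: all of "banana" was matched
(string-match "ba\(na\)*" "ba(na))")   ; 0: `ba(na' plus trailing `)'s
```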

In fact, one mistake more advanced Emacs-Lisp programmers sometimes
make is doubling the backslash out of habit where the Lisp reader is
not involved, such as when typing a regexp in the minibuffer.  For
example, using M-x re-search-backward from this sentence, entering
"ba\\(na\\)*" will move point to the beginning of `ba\(na\)*', but
using "ba\(na\)*" will move it to the word `backslash' in the previous
sentence.
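
To see the same thing from Lisp, each regexp typed at the prompt can
be written out as the string constant that denotes it.  This is a
rough sketch under the same assumption of a current GNU Emacs; the two
sample strings are invented stand-ins for the buffer text.

```elisp
;; Typed at the re-search-backward prompt    Same regexp as a Lisp string
;;   ba\(na\)*                               "ba\\(na\\)*"
;;   ba\\(na\\)*                             "ba\\\\(na\\\\)*"

;; The grouping regexp: `ba' followed by zero or more `na'.
(string-match "ba\\(na\\)*" "the backslash")          ; 4
(match-end 0)                                         ; 6: just `ba'

;; The habit-doubled regexp: literal backslashes and parentheses.
(string-match "ba\\\\(na\\\\)*" "see `ba\\(na\\)*'")  ; 5
(match-end 0)                                         ; 13: `ba\(na\)'
```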

Dave
-- 
 (setq mail '("tale@pawl.rpi.edu" "tale@ai.mit.edu" "tale@rpitsmts.bitnet"))