[net.emacs] Retrospective Pattern Matching

hugh@hcrvx1.UUCP (Hugh Redelmeier) (11/03/86)

Some Pitfalls of Retrospective Patterns in String Substitution
==============================================================

A number of text editors have substitute commands that will match
patterns.  This note discusses a problem that arises if patterns can
"look back", that is, depend on context preceding the current
point.  In passing, a problem with null strings is discussed too.

A "retrospective pattern" is one that can look back.  In QED for the
GE 600 machines, and its derivatives (notably UNIX ed and vi), the
caret (^) means "match a null string at the start of a line".  This
pattern is retrospective in that it depends on what precedes the
current position.  Similarly, the University of Toronto version of
ed has patterns for beginning of word (\{) and end of word (\}), and
these depend on what precedes the current position (although not
acknowledged, this code found its way into vi for the \< and \>
operators).  A pattern element that would match a specific column
would also be retrospective.  Many other retrospective patterns
could be invented, but at least these ones exist and have proven to
be useful.

The problem with retrospective patterns arises when they are used in
a global replacement command.  How should a match be affected by a
previous replacement performed by the same command?  For
concreteness, consider the vi command
	s/\<a//g

applied to "aardvark".  Should the result be "ardvark" or "rdvark"?
I believe that the answer is "ardvark".  My reasoning is that since a
global substitute does not perform overlapping replacements, it
should not match patterns based on the result of earlier
substitutions.

I implemented this interpretation in the University of Toronto
version of ed by always pattern-matching against the original data,
rather than the result of earlier steps in the substitute.  This is
a simple and intuitive technique.

Unfortunately, the above technique does not apply in an EMACS
context.  EMACS has a query-replace-string command which is an
interactive global substitute command.  After each individual
substitution, the resulting state must be presented to the user.
Thus, subsequent matches should be performed on the modified text.
To fix this problem in JOVE, I have used another mechanism.  I have
forced each retrospective pattern to fail if it looks back into the
point of the preceding substitution.  A more sophisticated approach
would be needed to handle retrospective patterns that could look
farther back.

It is perhaps anomalous that a retrospective pattern can look back
before the position at which the search started, but not before the
point at which it restarted.  But some examples have convinced me
that this is desirable.  Otherwise, there would be no way to match
the start of the first line with '^', or the start of the first word
with '\<' if it is at the start of the buffer.

Substituting Null Strings
=========================

It is often useful to replace certain null strings.  For example, to
prepend every line in a file with "> ", use the ed command:
	s/^/> /

It is possible to get into an infinite loop with a naively
implemented substitute command.  The following command will cause ed
to loop.  The strange pattern is the simplest way of writing a null
pattern.
	s/\(\)//g
The next version will cause an error due to the line growing too
long:
	s/\(\)/x/g

The solution I have implemented is to force a search for a pattern
that last matched a null string to resume one character farther on.
Intuitively, this means that each null string in the subject is
matched only once.  This rule treats a pattern that has matched a
null string as if it had a retrospective operator under the new JOVE
rule.

Hugh Redelmeier (416) 922-1937
utzoo!hcr!hugh