hugh@hcrvx1.UUCP (Hugh Redelmeier) (11/03/86)
Some Pitfalls of Retrospective Patterns in String Substitution ============================================================== A number of text editors have substitute commands that will match patterns. This note discusses a problem that arises if patterns can "look back", that is, depend on context preceding the current point. In passing, a problem with null strings is discussed too. A "retrospective pattern" is one that can look back. In QED for the GE 600 machines, and its derivatives (notably UNIX ed and vi), the caret (^) means "match a null string at the start of a line". This pattern is retrospective in that it depends on what precedes the current position. Similarly, the University of Toronto version of ed has patterns for beginning of word (\{) and end of word (\}), and these depend on what precedes the current position (although not acknowledged, this code found its way into vi for the \< and \> operators). A pattern element that would match a specific column would also be retrospective. Many other retrospective patterns could be invented, but at least these ones exist and have proven to be useful. The problem with retrospective patterns arises when they are used in a global replacement command. How should a match be affected by a previous replacement performed by the same command? For concreteness, consider the vi command s/\<a//g applied to "aardvark". Should the result be "ardvark" or "rdvark"? I believe that the answer is "ardvark". My reasoning is that since a global substitute does not perform overlapping replacements, it should not match patterns based on the result of earlier substitutions. I implemented this interpretation in the University of Toronto version of ed by always pattern-matching against the original data, rather than the result of earlier steps in the substitute. This is a simple and intuitive technique. Unfortunately, the above technique does not apply in an EMACS context. EMACS has a query-replace-string command which is an interactive global substitute command. After each individual substitution, the resulting state must be presented to the user. Thus, subsequent matches should be performed on the modified text. To fix this problem in JOVE, I have used another mechanism. I have forced each retrospective pattern to fail if it looks back into the point of the preceding substitution. A more sophisticated approach would be needed to handle retrospective patterns that could look farther back. It is perhaps anomalous that a retrospective pattern can look back before the position at which the search started, but not before the point at which it restarted. But some examples have convinced me that this is desirable. Otherwise, there would be no way to match the start of the first line with '^', or the start of the first word with '\<' if it is at the start of the buffer. Substituting Null Strings ========================= It is often useful to replace certain null strings. For example, to prepend every line in a file with "> ", use the ed command: s/^/> / It is possible to get into an infinite loop with a naively implemented substitute command. The following command will cause ed to loop. The strange pattern is the simplest way of writing a null pattern. s/\(\)//g The next version will cause an error due to the line growing too long: s/\(\)/x/g The solution I have implemented is to force a search for a pattern that last matched a null string to resume one character farther on. Intuitively, this means that each null string in the subject is matched only once. This rule treats a pattern that has matched a null string as if it had a retrospective operator under the new JOVE rule. Hugh Redelmeier (416) 922-1937 utzoo!hcr!hugh