chris@umcp-cs.UUCP (Chris Torek) (10/22/86)
(Warning: the following article will tell you more than you ever wanted to know about playing with regular expressions.)

In article <1168@peregrine.UUCP> someone writes:
>Since I have switched from vi to EMACS, there is one thing that I missed
>more than anything else. The ability to perform an operation on all
>the lines that met a particular criteria(specified by a regular expression).
>For instance in vi, I could type in "/[A-Z][a-z]*/d" to delete all lines
>that met the specified criteria or I could type in
>"/\([A-Za-z][A-Za-z]*(\).*\()\)/s//\1\2". How would I do similar operations
>in EMACS?

(You left out the `g': `g/[A-Z][a-z]*/d'.)

Some of these operations are best done by writing MLisp or elisp code. A global delete, however, is trivial, given the way regular expressions work and the fact that Emacs can match newlines explicitly. Simply add `.*' at the front of your R.E., and add `.*<^J>' at the end:

    <ESC>x : re-replace-string<RET>
    Old pattern: .*[A-Z][a-z]*.*<^Q><^J><RET>
    New string: <RET>

(Note that this should be done after moving to the top of the buffer, since Emacs's replace operations work from wherever you are now to the end of the buffer.)

Since `.' matches any character but newline, `{class}*' matches the longest possible sequence of {class}, and the search tries starting positions from left to right, this will always match full lines containing at least one [A-Z].

The pattern can be simplified as well. The [a-z]* part is unnecessary. It matches zero or more `a's, `b's, ..., `z's, so it can always match nothing, and the implied `.*' in vi's global (or the explicit one in Emacs) subsumes it anyway:

    Old pattern: .*[A-Z].*<^Q><^J><RET>

There is one final possible optimisation that is very useful when dealing with large files. Emacs's search code runs faster when it can do an `anchored search'. (I am not using `anchored' in quite the same sense as Snobol here. There may be a better term, but I cannot think of it offhand.) By this I mean that a pattern is matched faster when its first character is a literal.
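Here is a toy illustration of why a literal first character helps. It is written in Python and is not Emacs's actual search code. A plain substring scan is cheap. It finds each place the literal occurs, and the full regular-expression matcher then runs only at those positions instead of at every position in the buffer. The helper names `find_with_literal_prefix` and `find_without_prefix` are hypothetical, made up for this sketch.

```python
import re
from typing import Optional


def find_without_prefix(text: str, pattern: str) -> Optional[re.Match]:
    """Naive search: attempt a full regex match at every position in turn."""
    compiled = re.compile(pattern)
    for pos in range(len(text) + 1):
        m = compiled.match(text, pos)  # full matcher runs at every position
        if m:
            return m
    return None


def find_with_literal_prefix(text: str, literal: str, rest: str) -> Optional[re.Match]:
    """Search for literal + rest (e.g. 'A' followed by '[A-Z]*').

    Hypothetical helper: str.find, a cheap substring scan, skips straight
    to each occurrence of the literal, and only there is the full matcher
    run. This is valid only because every match must begin with that literal.
    """
    compiled = re.compile(re.escape(literal) + rest)
    pos = text.find(literal)
    while pos != -1:
        m = compiled.match(text, pos)  # full matcher runs only at candidates
        if m:
            return m
        pos = text.find(literal, pos + 1)
    return None


if __name__ == "__main__":
    buf = "lots of lower case text, then ABCdef"
    print(find_without_prefix(buf, "[A-Z][A-Z]*").group())
    print(find_with_literal_prefix(buf, "A", "[A-Z]*").group())
```

Both calls find `ABC` here only because the first run of capitals happens to start with `A`. As in the article's own example, `[A-Z][A-Z]*` and `A[A-Z]*` are not equivalent patterns in general. The point is the cost of the search, not the result.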
For example, searching for `[A-Z][A-Z]*' is slow, but searching for `A[A-Z]*' is fast. The reason is that a literal match (the first `A' here) is a common case. It has been optimised by having the search code first find one `A' before trying the full-blown regular expression match operation.

But look at this: our original pattern is required to match a full line! It must start at the beginning of a line, find one character in [A-Z], match the rest of the line, then pick up a newline. So we should be able to `anchor' it to the beginning of a line. What begins a line? Well, `^' in a regular expression should do this. We could use the pattern

    ^.*[A-Z].*<^J>

Unfortunately, this does not run any faster. Peeking at the innards of the regular expression matcher shows why: `^' is not considered a literal character. Curses! (No, not the library.)

But lo! there is another way to denote the beginning of a line. Every line begins after the previous line ends, and every previous line ends with a newline! We can use instead the pattern

    <^J>.*[A-Z].*<^J>

But---oops!---we forgot something. The very first line does not have a previous line. Now what can we do? When all else fails, cheat: add a blank line at the top of the file. Now we have a previous line, and can use our modified pattern:

    Old pattern: <^Q><^J>.*[A-Z].*<^Q><^J><RET>
    New string: <RET>

Whoops, that deletes the newlines on both sides of each matching line, gluing its neighbours together. The leading newline we added as an anchor belonged to the previous line, so we must put it back:

    New string: <^Q><^J><RET>

But there is a simpler way. Since .* matches as much as it can but never a newline, the final newline in the original pattern is not needed. If we leave it out, Emacs will not match the newline between the line we wanted to delete and the next. That is all right: the newline Emacs leaves behind makes up for the one we stole from the previous line.
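The steps above can be checked outside Emacs. This sketch uses Python's `re.sub` as a stand-in for re-replace-string. That is an assumption: both are taken to scan left to right and replace non-overlapping matches, with `.` never matching a newline.

The sketch runs each pattern from the article on a small sample buffer. The buffer and the helper names (`whole_lines`, `anchored`, `grep_v`) are hypothetical. It also shows one more reason to prefer the final form, at least under Python's semantics. With the put-it-back variant, a matching line that directly follows another matching line survives, because the first match already consumed the newline between them.

```python
import re

# Hypothetical sample buffer: "Two", "Four" and "FIVE" contain uppercase
# letters, and "Four"/"FIVE" are adjacent lines.
BUFFER = "one\nTwo\nthree\nFour\nFIVE\nsix\n"


def whole_lines(buf: str, pattern: str) -> str:
    """Delete every match of a .*RE.*<newline> pattern (a full line each)."""
    return re.sub(pattern, "", buf)


def anchored(buf: str, pattern: str, replacement: str) -> str:
    """Add a blank line on top (the cheat), replace, then drop the first
    character, mimicking the final ^D that removes the extra blank line."""
    return re.sub(pattern, replacement, "\n" + buf)[1:]


def grep_v(buf: str, pattern: str) -> str:
    """Keep only lines with no match, like filtering through egrep -v."""
    lines = buf.splitlines(keepends=True)
    return "".join(line for line in lines if not re.search(pattern, line))


if __name__ == "__main__":
    # The first pattern and its simplification give the same result.
    print(repr(whole_lines(BUFFER, r".*[A-Z][a-z]*.*\n")))
    print(repr(whole_lines(BUFFER, r".*[A-Z].*\n")))
    # Replacing \n.*[A-Z].*\n with nothing glues neighbouring lines together.
    print(repr(anchored(BUFFER, r"\n.*[A-Z].*\n", "")))
    # Putting the newline back leaves FIVE, which directly follows Four.
    print(repr(anchored(BUFFER, r"\n.*[A-Z].*\n", "\n")))
    # Final form: each deleted line's own newline is left behind.
    print(repr(anchored(BUFFER, r"\n.*[A-Z].*", "")))
    print(repr(grep_v(BUFFER, r"[A-Z]")))
```

In this sketch, the final form, the two whole-line patterns and the egrep-style filter all leave `one`, `three` and `six`.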
Thus the final pattern is:

    Old pattern: <^Q><^J>.*[A-Z].*<RET>
    New string: <RET>

Of course, when we are all done we have to clean up: we stuck an extra blank line at the top of the buffer so that we could cheat. The ultimate sequence of commands, then, is

    ESC-<                      (top of buffer)
    ^O                         (add that extra blank line)
    ESC-x re-replace-string    (do the replace)
    ^Q ^J .*[A-Z].* RET        (type in the old pattern)
    RET                        (specify a blank new string)
    ^D                         (delete that extra blank line)

And lo! Emacs deletes every line containing an uppercase letter. Not only that, it even does it faster than vi! :-)

(Actually, chances are that typing

    ESC-< ^@ ESC-> ESC-x filter-region egrep -v "[A-Z]" RET

is just as fast, and easier to remember. We can use a wrench as a hammer, but having the hammer too is nice.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:   seismo!umcp-cs!chris
CSNet:  chris@umcp-cs
ARPA:   chris@mimsy.umd.edu
ihm@minnie.UUCP (Ian Merritt) (10/25/86)
There are some kinds of replacement functions for which I would really like the good ol' MIT-TECO (or even a reasonable subset) minibuffer. I realize this wouldn't be of much value to newcomers to the EMACS world. But for many of us who have been using EMACS since ITS/TOPS-20, it was a really quick escape mechanism for certain transformations of medium complexity. These could probably be performed with regexps or other means, but not as quickly. How long it has been since I have seen something like:

    jsfoo$.,.+4uxsbar$xi$$

It has been so long that I am not even sure I remember the command set quite correctly. But I found it quite useful back then, and there have been times recently when I would have found it much faster than a ^X( macro or other method.

Oh well...

<>IHM<>
--
uucp: ihnp4!nrcvax!ihm