[net.bugs.4bsd] Reg Expr bugs in vi?

emrath@uiuccsb.UUCP (09/27/83)

#N:uiuccsb:6300001:000:929
uiuccsb!emrath    Sep 26 04:34:00 1983

There seem to be bugs in the reg expr searcher of vi.
Create a file of random text, such as:
skd jflksd jfsdlk
llkwejrlkw hllejrljlkajioerumm
jlrkwejrkw2345kld';l./,l sdfl

The following search patterns all seem to behave the same way.
/^[a-z]
/.*[a-z]
/[a-z].*

Furthermore, making the . a real non-letter, say "3", giving the expr:
/3*[a-z]
 
causes the search to NOT find a letter if it is the character right
after (before) the cursor.
/[a-z]3*  seems to work, however.


It seems to me, any time a "*"ed expr appears at either end of an expr,
it should be dropped off.  (Hmmmm,  "/3*" moves the cursor by two chars,
until you hit a blank line.)

I realize these are rather meaningless searches, but I ran across this
when I wanted to do:
/,.*[0-9]
and mistakenly entered:
/,*[0-9]
The fact that these cases don't seem to work lowers my faith in the searcher's
ability to produce the correct results on a meaningful pattern.

davec@tektronix.UUCP (Dave Clemans) (09/28/83)

The examples you gave worked perfectly correctly. On the input
skd jflksd jfsdlk
llkwejrlkw hllejrljlkajioerumm
jlrkwejrkw2345kld';l./,l sdfl

you said that the following search patterns all seem to behave the same way.
/^[a-z]
/.*[a-z]
/[a-z].*

And well they should. Perhaps you are confusing shell syntax with regular
expression syntax. The combination ".*" will match anything of any length.
Thus the 1st example will find the first line that starts with the letters
a-z; the second and third will find the first lowercase letter after the
cursor.

Your second example,
/,.*[0-9]
instead of
/,*[0-9]

works like this: the first example will find the first line with a comma
and a number in it. Both
,0
and
,ldfjljlsajfldfjlfasjfj0
will be found by that pattern.

The second will find only a line with a number in it,
because '*' can match null too. It could conceivably find a line of the type
,,,,,,,,,,,,,,,,,,,,,,,1
also, but it would also work on
1
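To see the difference concretely, here is a small check that uses Python's re module as a stand-in for vi's matcher. That substitution is an assumption: the two engines differ in general, but they agree on patterns this simple. The helper first_match is hypothetical.

```python
import re

def first_match(pattern, line):
    """Return the leftmost matched text in line, or None (hypothetical helper)."""
    m = re.search(pattern, line)
    return m.group(0) if m else None

# ",.*[0-9]" needs a literal comma somewhere before a digit.
print(first_match(r",.*[0-9]", ",ldfjljlsajfldfjlfasjfj0"))  # ,ldfjljlsajfldfjlfasjfj0
print(first_match(r",.*[0-9]", "1"))                         # None

# ",*[0-9]" lets ",*" match zero commas, so any digit will do.
print(first_match(r",*[0-9]", ",,,,,,1"))  # ,,,,,,1
print(first_match(r",*[0-9]", "1"))        # 1
```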

Playing with regular expressions can be tricky. Perhaps before you yell
"bug" you should explain exactly what you were trying to do rather than
what you did.

Rick Lindsley
richl@tektronix
...!tektronix!richl

dce@tekecs.UUCP (David Elliott) (09/28/83)

NONONONO - You are wrong, Rich. Here's why:

*************************************
From tektronix!uw-beaver!cornell!vax135!ariel!houti!hogpc!houxm!ihnp4!ixn5c!inuxc!pur-ee!uiucdcs!uiuccsb!emrath Mon Sep 26 19:36:47 1983
Subject: Reg Expr bugs in vi? - (nf)
Newsgroups: net.bugs.4bsd

#N:uiuccsb:6300001:000:929
uiuccsb!emrath    Sep 26 04:34:00 1983

There seem to be bugs in the reg expr searcher of vi.
Create a file of random text, such as:
skd jflksd jfsdlk
llkwejrlkw hllejrljlkajioerumm
jlrkwejrkw2345kld';l./,l sdfl

The following search patterns all seem to behave the same way.
/^[a-z]
/.*[a-z]
/[a-z].*
*************************************

	He's right! These do all act the same. They shouldn't!
	/^[a-z]/ says '[a-z] at the beginning of a line'.
	/.*[a-z]/ says '[a-z] after anything or nothing'.
	/[a-z].*/ says '[a-z] followed by anything'.

	What they all do is go to the next line containing an
	alphabetic lower-case character. The second expression
	should just move the cursor to the next alpha lower-case
	character, not the next line.

*************************************
Furthermore, making the . a real non-letter, say "3", giving the expr:
/3*[a-z]
 
causes the search to NOT find a letter if it is the character right
after (before) the cursor.
/[a-z]3*  seems to work, however.

*************************************

	Right again. /3*[a-z]/ should also find the next alphabetic
	lower case character. Instead, it skips a letter.
	/[a-z]3*/ doesn't skip the character.
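A minimal sketch of the behavior being asked for, assuming a character-oriented search that looks only at the current line and starts one character past the cursor. Python's re stands in for vi's matcher, and next_match_on_line and step are hypothetical names. Under that model both patterns advance one letter per search:

```python
import re

def next_match_on_line(pattern, line, cursor):
    """Start of the leftmost match beginning after `cursor`, or None.

    Hypothetical model of a character-oriented search: it only looks at
    the current line and never reconsiders the match under the cursor.
    """
    m = re.compile(pattern).search(line, cursor + 1)
    return m.start() if m else None

def step(pattern, line, cursor, times):
    """Repeat the search `times` times (like pressing n), collecting cursor columns."""
    positions = []
    for _ in range(times):
        found = next_match_on_line(pattern, line, cursor)
        if found is None:
            break
        cursor = found
        positions.append(cursor)
    return positions

line = "llkwejrlkw"
print(step(r"3*[a-z]", line, 0, 4))  # [1, 2, 3, 4]
print(step(r"[a-z]3*", line, 0, 4))  # [1, 2, 3, 4]
```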

*************************************
It seems to me, any time a "*"ed expr appears at either end of an expr,
it should be dropped off.  (Hmmmm,  "/3*" moves the cursor by two chars,
until you hit a blank line.)

I realize these are rather meaningless searches, but I ran across this
when I wanted to do:
/,.*[0-9]
and mistakenly entered:
/,*[0-9]
The fact that these cases don't seem to work lowers my faith in the searcher's
ability to produce the correct results on a meaningful pattern.

*************************************

I looked at the vi reference and it says that regular expressions look
for the next 'string' that matches the expression, not the next
line, so there is definitely a bug.

When I first read this, I thought that the submitter was wrong, but
he obviously isn't; he just didn't explain well enough.

			David

mp@mit-eddie.UUCP (Mark Plotnick) (09/30/83)

Sorry, it's still not a bug.  If you're positioned at a piece of text
that matches the regular expression, it skips it and finds the next
piece of text that matches the regular expression.  Why?  Because
it's useful to be able to step through a file while stopping at
every point that matches a given regular expression; if the string
of characters right after the cursor were a candidate for r.e. searching,
then repeated searches for the same r.e. wouldn't move you at all!
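A sketch of that point, with Python's re as a stand-in for vi's matcher and a hypothetical helper search_from: a search that is allowed to start at the cursor keeps re-finding the match under it, while one that starts a character later steps through the line.

```python
import re

def search_from(pattern, line, start):
    """Start index of the leftmost match at or after `start`, or None."""
    m = re.compile(pattern).search(line, start)
    return m.start() if m else None

line = "abc xyz abc xyz abc"
cursor = 0  # sitting on the first "abc"

# Allowed to match at the cursor: finds the "abc" we are already on.
print(search_from("abc", line, cursor))  # 0, so repeating it never moves

# Starting one character past the cursor steps to each later occurrence.
hits = []
pos = cursor
while True:
    pos = search_from("abc", line, pos + 1)
    if pos is None:
        break
    hits.append(pos)
print(hits)  # [8, 16]
```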
	Mark

mcdaniel@uiucdcs.UUCP (09/30/83)

#R:uiuccsb:6300001:uiucdcs:8200014:000:2596
uiucdcs!mcdaniel    Sep 29 20:30:00 1983

tektroni!davec's comments are preceded by > below.

>                                            . . . On the input
>skd jflksd jfsdlk
>llkwejrlkw hllejrljlkajioerumm
>jlrkwejrkw2345kld';l./,l sdfl
>
>you said that the following search patterns all seem to behave the same way.
>/^[a-z]
>/.*[a-z]
>/[a-z].*
>
>And well they should. Perhaps you are confusing shell syntax with regular
>expression syntax. The combination ".*" will match anything of any length.
>Thus the 1st example will find the first line that starts with the letters
>a-z; the second and third will find the first lowercase letter after the
>cursor.
No, he's not confusing it with shell syntax.  The second and third examples
DO NOT "find the first lowercase letter after the cursor."  They should match,
respectively, a string as long as possible of arbitrary characters (except \n)
followed by an alpha, and an alpha followed by a SALAPOAC(E\n), both
starting just after the cursor.  If you start in the middle of a line
and do either of the two, it matches the entire NEXT LINE.  It should
match the REMAINING text on the current line.
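To illustrate the distinction, a sketch that starts matching just after the cursor, as described above. Python's re stands in for vi's matcher (it is leftmost-first rather than leftmost-longest, but gives the same spans here), and match_after_cursor is a hypothetical helper. Spans are 0-based and end-exclusive.

```python
import re

def match_after_cursor(pattern, line, cursor):
    """Span of the leftmost match starting just after the cursor, or None."""
    m = re.compile(pattern).search(line, cursor + 1)
    return m.span() if m else None

line = "skd jflksd jfsdlk"   # first line of the test file
cursor = 4                   # on the "j" of "jflksd"

# "^" anchors only at the real start of the line, so it cannot match here.
print(match_after_cursor(r"^[a-z]", line, cursor))   # None
# Both of these match the remaining text on the current line.
print(match_after_cursor(r".*[a-z]", line, cursor))  # (5, 17)
print(match_after_cursor(r"[a-z].*", line, cursor))  # (5, 17)
```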

>Your second example,
At least third (8th if you count all REs -- unless you have a different
numbering system).  Why didn't you consider his second example?
>/,.*[0-9]
>instead of
>/,*[0-9]
>works like this: the first example will find the first line with a comma
>and a number in it.
As noted above, it should not "find a line".  It should match text,
on the CURRENT LINE if possible (and move the cursor to the start
of the matched text).

About his example of "/3*[a-z]" ("second" example):
TRY it on the example text.  You know what will happen
(at least in vi "Version 3.6, 11/3/80")?  It will move
the cursor in 2 character increments.  "/[a-z]3*" moves the cursor
in 1 character increments.  Why 2 CI in one, 1 CI in the other?

Now, about '"/3*" moves the cursor by two chars . . . '
[Emrath].  Again, why move by two?  Also, why does it stop at the
end of the line?  Why not go on to the next line?  (It does not
seem to stop at a blank line, as he indicates.)

>Playing with regular expressions can be tricky. Perhaps before you yell
>"bug" you should explain exactly what you were trying to do rather than
>what you did.
Indeed.  Couldn't agree more.
Perhaps before *YOU* yell "you're wrong", you should try the examples
first, hmmmmm? (Should I? Well, he was stupid enough not to check
it out first -- why not?) you stupid drip.

Tim McDaniel, University of Illinois at Urbana-Champaign, CS dept.
(UNIX mail: . . . pur-ee!uiucdcs!mcdaniel)
(CSNET: mcdaniel.uiuc@RAND-RELAY)

emrath@uiuccsb.UUCP (10/01/83)

#R:uiuccsb:6300001:uiuccsb:6300002:000:983
uiuccsb!emrath    Oct  1 00:35:00 1983

Without knowing or caring what vi actually does, what would you
have it do?  The input file contains the line:

3333333333

The commands to be typed are:

1G/,*[0-9]\rnnnnn

I claim it should first put the cursor at the first 3 (column 1) (ok so far:-).
After the /,*[0-9]\r it should move the cursor to the second 3 (column 2).
For each n command, the cursor should move right 1 (one) column.

I believe that if the r.e. DOES match null (OR anything non-null, such as 3*),
the cursor should NOT move. The documentation for this case is ambiguous at
best. However, discussion of this case may continue independently of how vi
should act on the above example, where the pattern does NOT match null.
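The proposed behavior can be written down as a small model. This is a hypothetical sketch, not vi's code: Python's re stands in for vi's matcher, and simulate is an invented name. Each search, including the first "/", takes the leftmost match that starts strictly after the cursor on the current line.

```python
import re

def simulate(pattern, line, presses):
    """Hypothetical model of the semantics proposed above.

    Column numbers are 1-based, as in the post. The initial "/" search and
    each following "n" look for the leftmost match that starts strictly
    after the cursor on the current line.
    """
    regex = re.compile(pattern)
    col = 1                          # after 1G the cursor is on column 1
    trail = []
    for _ in range(presses):
        # 0-based index `col` is the column just after the 1-based cursor column.
        m = regex.search(line, col)
        if m is None:
            break
        col = m.start() + 1          # convert back to a 1-based column
        trail.append(col)
    return trail

# One "/" search followed by five "n" commands.
print(simulate(r",*[0-9]", "3333333333", 6))  # [2, 3, 4, 5, 6, 7]
```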


Apparently, vi doesn't know how to backtrack worth a damn (at all?).
Enter the word mississippi on a line, position the cursor on the first s in
the word, and search for issi. GOOD LUCK.

		Perry Emrath, Univ. of IL
		...{decvax|inuxc}!pur-ee!uiucdcs!emrath
		emrath.uiuc@rand-relay

chris@umcp-cs.UUCP (10/01/83)

WARNING:  Beware of Feature

Some of your "bugs" are really "features" because of the way VI does
things.  The reason the search /3*[a-z]/ moves you two characters
if you're sitting on a long string of text, is that vi doesn't want
to re-find the last found occurrence of this, so it moves you right
before starting the search.  It should move right one character, but
apparently someone made it move 2 for some reason.

Also, things with "c*" always try to match the longest string on a
line; this tends to do counter-intuitive things sometimes.  (I suppose
that depends on what your intuition says.)
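A couple of illustrations of that longest-match tendency, using Python's re as a stand-in for vi's matcher (its greedy "*" picks the longest run in these cases). Spans are 0-based and end-exclusive; the sample line is the third line of the original test file.

```python
import re

line = "jlrkwejrkw2345kld';l./,l sdfl"

# ".*[a-z]" keeps going to the last letter on the line, i.e. the whole line here.
print(re.search(r".*[a-z]", line).span())   # (0, 29)

# "3*" is happy to match zero 3s, so it "succeeds" right at column 0.
print(re.search(r"3*", line).span())        # (0, 0)

# Given a run of 3s, "3*" takes all of it.
print(re.search(r"3*", "3333x").span())     # (0, 4)
```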
-- 
Real:	Chris Torek, Univ of MD Comp Sci
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs
ARPA:	chris%umcp-cs@UDel-Relay

chris@umcp-cs.UUCP (10/03/83)

I think I've figured it out:

When you search for a string in VI, in order to ``guarantee'' that it
doesn't match the last-found version of the same string, vi attempts
to determine the first match on the current line; if the END of this
match is AT OR BEYOND the current cursor position (before the search
started) the match is discarded.  Thus:

	mississippi
	  ^
	cursor
/issi/ doesn't find the second "issi" (which overlaps the first "issi")
because vi starts at the beginning of the current line and finds the first
"issi"; that match extends over the cursor position, so it is discarded and
the search for the next one starts AFTER that issi!
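One way to read that description as code, set against a plain character-oriented search. Both functions are hypothetical models built on Python's re, not taken from vi's source; torek_model discards any match that starts at or before the cursor and resumes scanning after each match's end, which is what skips the overlapping "issi".

```python
import re

def torek_model(pattern, line, cursor):
    """Hypothetical model of the behavior described above, not vi's source.

    Scan the line from column 0 for successive non-overlapping matches,
    resuming after the end of each one; throw away any match that begins
    at or before the cursor.
    """
    for m in re.finditer(pattern, line):
        if m.start() > cursor:
            return m.start()
    return None  # a real editor would go on to the next line

def char_oriented(pattern, line, cursor):
    """Leftmost match starting anywhere after the cursor, overlaps allowed."""
    m = re.compile(pattern).search(line, cursor + 1)
    return m.start() if m else None

line = "mississippi"
cursor = 2  # on the first "s"
print(torek_model("issi", line, cursor))    # None: the first "issi" swallowed the second
print(char_oriented("issi", line, cursor))  # 4
```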

The problem is, simply, that vi is a LINE EDITOR.  The most basic unit
is the LINE.  Occasionally you get finer granularity, mostly from the
open/visual code.  The search code was probably hacked to make it look
like it was character oriented; it's not.

(This bugs me no end when I ^Z and come back to find that my cursor is
now at the beginning of the line!)
-- 
Real:	Chris Torek, Univ of MD Comp Sci
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs
ARPA:	chris%umcp-cs@UDel-Relay

richl@tektronix.UUCP (Rick Lindsley) (10/06/83)

You are right, I am wrong. The case wasn't presented very clearly in
the original article, but after your description it is apparent that
there is a bug in the regular expressions of vi.

Rick