lang@PRC.UNISYS.COM (09/13/89)
This may not be a bug, but rather my misunderstanding of the documentation; still, it strikes me as odd....

    % grep '\wx\w' foo

finds all occurrences in foo of 'x' flanked by alphanumerics.

    % grep '\Wx\W' foo

finds all occurrences in foo of 'x' flanked by non-alphanumerics. Fine. So far so good.

Suppose now I want to find all occurrences of 'x' flanked by non-alphanumerics, but I want the match on 'x' not to be case-sensitive. Well, the -i flag makes grep ignore case differences when comparing strings, so I try

    % grep -i '\Wx\W' foo

But the effect here is that the -i flag makes the '\W' meta-character non-case-sensitive as well, and I get all occurrences of x in foo, regardless of the case of x (which is fine) but also regardless of whether or not the x (or X) is flanked by non-alphanumerics! Is this a feature?

--Francois Lang
tarvaine@tukki.jyu.fi (Tapani Tarvainen) (09/16/89)
In article <8909121817.AA11159@gem> lang@PRC.UNISYS.COM writes:
...
>% grep -i '\Wx\W' foo
>
>But the effect here is that the -i flag makes the '\W'
>meta-character non-case-sensitive as well

I would call this a bug (and have patched it). I traced the problem to the following piece of code in dfa.c:

    /* Parse and analyze a single string of the given length. */
    void
    regcompile(s, len, r, searchflag)
         const char *s;
         size_t len;
         struct regexp *r;
         int searchflag;
    {
      if (case_fold)	/* dummy folding in service of regmust() */
        {
          static char *p;
          case_fold = 0;
          for (p = (char *)s; *p != 0; p++)
            if (isupper((int)*p))
              *p = tolower((int) *p);
          ...

I.e., when the -i flag is given, it folds the entire regexp to lower case before doing anything else with it, so '\W' becomes '\w' (and '\B' becomes '\b'). I failed to find any reason for this: the search routines handle case folding on their own anyway, and removing the above loop seemed to have no effect other than eliminating the undesired effect of the -i flag on \W (and \B).

Does somebody know if the folding loop is necessary in some other program using dfa.c (or does that program perhaps have the same bug)? If so, the following should work (it avoids changing letters after a \):

    for (p = (char *)s; *p != 0; p++)
      if (isupper((int)*p))
        *p = tolower((int) *p);
      else if (*p == '\\' && *(p+1))
        p++;

As far as e?grep is concerned, however, just removing the loop seems to work fine (the declaration of p can be removed as well).
Here's a context diff for just that (it actually #if's the lines out rather than deleting them, and adds a comment):

    *** dfa.old Sat Sep 16 12:04:32 1989
    --- dfa.c Sat Sep 16 12:04:34 1989
    ***************
    *** 1668,1679 ****
    --- 1668,1685 ----
        {
          if (case_fold)	/* dummy folding in service of regmust() */
            {
    +         /* the following two #if 0's added by Tapani Tarvainen 16 Sep 89 */
    +         /* to prevent -i flag from affecting \W and \B in e?grep */
    + #if 0
              static char *p;
    + #endif
              case_fold = 0;
    + #if 0
              for (p = (char *)s; *p != 0; p++)
                if (isupper((int)*p))
                  *p = tolower((int) *p);
    + #endif
              reginit(r);
              r->mustn = 0;
              r->must[0] = '\0';

--
Tapani Tarvainen (tarvaine@tukki.jyu.fi, tarvainen@finjyu.bitnet)