tres@virga.rap.ucar.edu (Tres Hofmeister) (03/05/91)
I've run across a regular expression that I don't quite understand. Not that this hasn't happened before, but this seems like it should be fairly straightforward...

I'm trying to match entries in /etc/group which have one or more members. The following works just fine, matching each of the colon-delimited fields individually, followed by one or more characters:

    grep '^.*:.*:.*:..*' /etc/group

What I don't understand is why the following doesn't work the same way:

    grep '^.*:..*' /etc/group

It grabs entries with one or more members, true, but also grabs entries with no members, e.g. "news:*:6:". I figured that this regexp would match the longest possible string at the beginning of a line, terminated by a colon, which in the group file should include the first two colons, followed by at least one character. It seems to be doing something else, given that it will also match a line with no members.

Any ideas?

-- 
Tres Hofmeister
tres@ncar.ucar.edu
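The behaviour described in the question can be reproduced without touching the real /etc/group by running both patterns against a small sample file. This is only a sketch: the sample entries and the variable names are made up for illustration, and mktemp is assumed to be available.

```shell
# Hypothetical sample data standing in for /etc/group.
sample=$(mktemp)
printf '%s\n' 'wheel:*:0:root,tres' 'news:*:6:' 'staff:*:10:jik' > "$sample"

# grep -c prints the number of matching lines.
# Four-field pattern: three colons, then at least one character.
four_field=$(grep -c '^.*:.*:.*:..*' "$sample")
# Two-field pattern: one colon, then at least one character.
two_field=$(grep -c '^.*:..*' "$sample")

echo "four-field pattern matched $four_field lines"
echo "two-field pattern matched $two_field lines"
rm -f "$sample"
```

With this sample the four-field pattern skips the memberless "news" entry, while the two-field pattern matches every line.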
jik@athena.mit.edu (Jonathan I. Kamens) (03/05/91)
In article <10469@ncar.ucar.edu>, tres@virga.rap.ucar.edu (Tres Hofmeister) writes:
|> It grabs entries with one or more members, true, but also grabs
|> entries with no members, e.g. "news:*:6:". I figured that this regexp
|> would match the longest possible string at the beginning of a line,
|> terminated by a colon, which in the group file should include the first
|> two colons, followed by at least one character. It seems to be doing
|> something else, given that it will also match a line with no members.

Each segment of a regular expression matches the longest possible string that it can match *while allowing the rest of the regular expression to match as well*.

So, let's analyze what happens when the regexp "^.*:..*" is compared to "news:*:6:". It will first match the colon in that regexp against the last colon in the string. But then it will discover that when it does that, the rest of the regexp can't be matched. So it will back off and see if "^.*:" can be matched against something shorter. As a result, the colon will get matched up with the second-to-last colon in the string, and the "..*" will match against "6:".

I hope this clears things up for you.

-- 
Jonathan Kamens                     USnail:
MIT Project Athena                  11 Ashford Terrace
jik@Athena.MIT.EDU                  Allston, MA  02134
Office: 617-253-8085                Home: 617-782-0710
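One way to watch the backing-off described above is to let sed report which part of the line each half of the pattern ends up with. This is a sketch under a few assumptions: the bracket markers, the show_split helper, and the variable names are invented for illustration, and a trailing $ is added so that the substitution covers the whole line.

```shell
# Split a line into the part taken by '^.*:' and the part taken by '..*'.
# \( \) are basic-regex capture groups; \1 and \2 refer back to them.
show_split() {
    printf '%s\n' "$1" | sed 's/^\(.*:\)\(..*\)$/[\1][\2]/'
}

no_members=$(show_split 'news:*:6:')
with_members=$(show_split 'staff:*:10:jik')

echo "$no_members"      # [news:*:][6:]
echo "$with_members"    # [staff:*:10:][jik]
```

In the first case the colon cannot stay on the last colon of the line, because nothing would be left for "..*", so it settles on the second-to-last colon and "6:" satisfies the rest of the pattern.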
]) (03/07/91)
In article <10469@ncar.ucar.edu> tres@virga.rap.ucar.edu (Tres Hofmeister) writes:
>
>	I've run across a regular expression that I don't quite understand.
>Not that this hasn't happened before, but this seems like it should be
>fairly straightforward...
>
>	I'm trying to match entries in /etc/group which have one or more
>members.  The following works just fine, matching each of the colon
>delimited fields individually followed by one or more characters:
>
>	grep '^.*:.*:.*:..*' /etc/group

This one will find any line with three or more colons with a character of any type after colon-number-three-or-higher. This re means

    From start of line
    zero or more of any characters
    a colon
    zero or more of any characters
    a colon
    zero or more of any characters
    a colon
    any single character
    zero or more of any characters

It'll match good group entries and

    :::::
    :::.:
    ::: --:
    ::::
    a:::a
    :::a

>	What I don't understand is why the following doesn't work the same
>way:
>
>	grep '^.*:..*' /etc/group

This one will find any record that includes a : before the last char in the line. The re means

    From start of line
    zero or more of any characters
    a colon
    any single character
    zero or more of any characters

It matches the following

    ::
    :::a
    :b:::
    :b
    gigo:1123

>	It grabs entries with one or more members, true, but also grabs
>entries with no members, e.g. "news:*:6:".  I figured that this regexp
>would match the longest possible string at the beginning of a line,
>terminated by a colon, which in the group file should include the first
>two colons, followed by at least one character.  It seems to be doing
>something else, given that it will also match a line with no members.

The only lines it *won't* match are those with no colons or where the only colon in the line is the last character. What it's looking for is a line with a colon followed by a character.

>	Any ideas?
Use [^:]* instead of .* in there, on the first (field-matching) version:

    grep '^[^:]*:[^:]*:[^:]*:..*' /etc/group

Even better for the second example is to anchor at the END instead of the BEGINNING of the data lines:

    egrep ':[^:]+$' /etc/group

will match any line with at least one non-colon character following the last colon in the line. Note that + is an extended-regexp operator, so this one needs egrep (or grep -E); plain grep takes the + literally. Alternatives that do the same thing with plain grep:

    :[^:]\{1,\}$
    :[^:][^:]*$
    ^.*:[^:][^:]*$

Finally, any line not matching the following (again an egrep pattern, because of the +) is either a group with no members or a badly-formed line in the file

    ^[^:]+:[^:]*:[0-9]+:[^:]+$

which matches

    From start of line
    at least one non-colon
    a colon
    any number of non-colons
    a colon
    a decimal number
    a colon
    at least one non-colon
    end of line

Note that it won't see other anomalies like a group with too big a gid (that's system dependent, and we can't check to see if it's 65536, for instance, if 65535 is the biggest) or usernames that are too long or weird stuff in the userids field (we could exclude spaces, for instance, by testing in each case [^: ] instead of [^:] ), but any line *not* found by the above is either a group with no members or a badly formed line.

...Kris
-- 
Kristopher Stephens, | (408-746-6047) | krs@uts.amdahl.com | KC6DFS
Amdahl Corporation   |                |                    |
     [The opinions expressed above are mine, solely, and do not    ]
     [necessarily reflect the opinions or policies of Amdahl Corp. ]
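The patterns suggested above can be tried against a small sample file before pointing them at the real /etc/group. This is a sketch only: the sample entries and variable names are invented, mktemp is assumed to be available, and grep -E stands in for egrep on the patterns that use +.

```shell
# Hypothetical sample data: two groups with members, one without,
# and one line that is not a group entry at all.
sample=$(mktemp)
printf '%s\n' 'wheel:*:0:root,tres' 'news:*:6:' 'staff:*:10:jik' 'broken line' > "$sample"

# Basic-regex form of "non-colons after the last colon"; works with plain grep.
bre_hits=$(grep ':[^:][^:]*$' "$sample")

# Extended-regex form of the same test; + requires grep -E (or egrep).
ere_hits=$(grep -E ':[^:]+$' "$sample")

# -v inverts the full-format check, leaving memberless or malformed lines.
suspects=$(grep -E -v '^[^:]+:[^:]*:[0-9]+:[^:]+$' "$sample")

printf 'groups with members:\n%s\n' "$bre_hits"
printf 'no members or malformed:\n%s\n' "$suspects"
rm -f "$sample"
```

On this sample both forms pick out the "wheel" and "staff" entries, and the inverted full-format check reports the "news" entry and the malformed line.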