rfinch@caldwr.water.ca.gov (Ralph Finch) (11/03/90)
Is there something like grep, except that it will (easily) search an entire
file (not just line by line) for regexps near each other?  Ideally it
would rank hits by how much or how close they match, e.g.

    fzgrep 'abc.*123' filename

would return hits not by line number but by how close abc & 123 are
found together.  Also, it wouldn't matter what order the regexps are in.

--
Ralph Finch                     916-445-0088
rfinch@water.ca.gov             ...ucbvax!ucdavis!caldwr!rfinch
Any opinions expressed are my own; they do not represent the DWR
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (11/06/90)
In article <242@locke.water.ca.gov> rfinch@caldwr.water.ca.gov (Ralph Finch) writes:
: Is there something like grep, except it will (easily) search an entire
: file (not just line-by-line) for regexp's near each other?  Ideally it
: would rank hits by how much or how close they match, e.g.
:
: fzgrep 'abc.*123' filename
:
: would return hits not by line number but by how close abc & 123 are
: found together.  Also it wouldn't matter what order the regexp's are.

I sincerely doubt you're going to find a specialized tool to do that.
But if you just slurp a file into a string in Perl, you can then start
playing with it.  For example, if your search strings are fixed, you can
use index:

    #!/usr/bin/perl
    undef $/;
    while (<>) {                # for each file
        $posabc = index($_, "abc");
        next if $posabc < 0;
        $pos123 = index($_, "123");
        next if $pos123 < 0;
        $diff = $posabc - $pos123;
        $diff = -$diff if $diff < 0;
        print "$ARGV: $diff\n";
    }

Of course, you'd probably want to make a subroutine of that middle junk.
Or you can say:

    #!/usr/bin/perl
    undef $/;
    while (<>) {                # for each file
        tr/\n/ /;               # so . matches anything
        (/(abc.*)123/ || /(123.*)abc/)
            && print "$ARGV: " . (length($1)-3) . "\n"
    }

Those .*'s are going to be expensive, though.  Maybe

    #!/usr/bin/perl
    undef $/;
    while (<>) {                # for each file
        next unless /abc/;
        $posabc = length($`);
        next unless /123/;
        $pos123 = length($`);
        $diff = $posabc - $pos123;
        $diff = -$diff if $diff < 0;
        print "$ARGV: $diff\n";
    }

Of course, none of these solutions is going to find the closest pair,
necessarily.
To do that, use a nested split, which also works with arbitrary regular
expressions:

    #!/usr/bin/perl
    undef $/;
    while (<>) {                # for each file
        $min = length($_);
        @abc = split(/abc/, $_, 999999);
        next if @abc == 1;      # no match
        &try(shift(@abc), 0, 1);
        &try(pop(@abc), 1, 0);
        foreach $chunk (@abc) {
            &try($chunk, 1, 1);
        }
        next if $min == length($_);
        print "$ARGV: $min\n";
    }

    sub try {
        ($hunk, $first, $last) = @_;
        @pieces = split(/123/, $hunk, 999999);
        return if @pieces < 2;  # no 123 in this hunk
        if ($first && $min > length($pieces[0])) {
            $min = length($pieces[0]);
        }
        if ($last && $min > length($pieces[$#pieces])) {
            $min = length($pieces[$#pieces]);
        }
    }

Or something like that...

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov
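
The closest-pair idea can also be sketched without splitting, by recording where every match starts and ends. What follows is a minimal sketch in modern Perl and is not from the post above: the names match_spans and closest_gap are made up for illustration, the patterns are hard-coded to the thread's abc/123 example, and it assumes the @- and @+ match-offset arrays, which arrived in Perl long after this thread. It sorts the match spans by start offset and takes the smallest gap between neighbouring spans that come from different patterns, then prints files closest first, which is roughly the ranking the original question asked for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one { start, end, tag } record per match of $re in $text.
# (Hypothetical helper; @- and @+ hold the offsets of the last match.)
sub match_spans {
    my ($text, $re, $tag) = @_;
    my @spans;
    while ($text =~ /$re/g) {
        push @spans, { start => $-[0], end => $+[0], tag => $tag };
    }
    return @spans;
}

# Smallest number of characters separating a match of $re_a from a
# match of $re_b, in either order.  Overlapping matches count as 0.
# Returns undef if either pattern never matches.
sub closest_gap {
    my ($text, $re_a, $re_b) = @_;
    my @all = sort { $a->{start} <=> $b->{start} }
        (match_spans($text, $re_a, 'a'), match_spans($text, $re_b, 'b'));
    my $min;
    for my $i (1 .. $#all) {
        my ($prev, $cur) = @all[$i - 1, $i];
        next if $prev->{tag} eq $cur->{tag};    # same pattern, not a pair
        my $gap = $cur->{start} - $prev->{end};
        $gap = 0 if $gap < 0;
        $min = $gap if !defined $min || $gap < $min;
    }
    return $min;
}

# Assumed fixed patterns, taken from the example in the thread.
my ($re_a, $re_b) = (qr/abc/, qr/123/);

my %gap_for;
for my $file (@ARGV) {
    open my $fh, '<', $file or do { warn "$file: $!\n"; next };
    local $/;                                   # slurp the whole file
    my $text = <$fh>;
    close $fh;
    $text = '' unless defined $text;
    my $gap = closest_gap($text, $re_a, $re_b);
    $gap_for{$file} = $gap if defined $gap;
}

# Rank files by how close the two patterns come, closest first.
for my $file (sort { $gap_for{$a} <=> $gap_for{$b} } keys %gap_for) {
    print "$file: $gap_for{$file}\n";
}
```

Run it as `perl closest.pl file1 file2 ...` (the script name is arbitrary). Files in which either pattern is missing are left out of the ranking. Checking only neighbouring spans is enough because matches of a single pattern found with //g never overlap each other.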
kehoe@scotty.dccs.upenn.edu (Brendan Kehoe) (11/06/90)
In <10240@jpl-devvax.JPL.NASA.GOV>, lwall@jpl-devvax.JPL.NASA.GOV writes:
>I sincerely doubt you're going to find a specialized tool to do that.
 .. tons & tons of Perl code by its dad ..
>
>Or something like that...
>

 Hahahaha. This made my day. [Sad, but true.]

Brendan Kehoe | Soon: brendan@cs.widener.edu [ Today? Could it be? <Ohm...> ]
For now: kehoe@scotty.dccs.upenn.edu | Also: brendan.kehoe@cyber.widener.edu
"The latest polls indicate you're in danger of losing touch with the common man."
"Oh DEAR ... heaven forfend!"
MarkD@Aus.Sun.COM (11/06/90)
kehoe@scotty.dccs.upenn.edu (Brendan Kehoe) writes:

>In <10240@jpl-devvax.JPL.NASA.GOV>, lwall@jpl-devvax.JPL.NASA.GOV writes:
>>I sincerely doubt you're going to find a specialized tool to do that.

> .. tons & tons of Perl code by its dad ..
>>
>>Or something like that...
>>

> Hahahaha. This made my day. [Sad, but true.]

Agreed. But what gets me is the number of different ways he manages to
sneak in these dang Perl lessons! Just when I was about to Beta test my
"Impending Perl lesson" detector, he goes and changes his posting
patterns - sigh, maybe I should re-write my detector in Perl :-)

------------    -----------------    --------------------
Mark Delany     markd@Aus.Sun.COM    ...!sun!sunaus!markd
------------    -----------------    --------------------
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (11/07/90)
In article <markd.657881866@sunchat> MarkD@Aus.Sun.COM writes:
: kehoe@scotty.dccs.upenn.edu (Brendan Kehoe) writes:
:
: >In <10240@jpl-devvax.JPL.NASA.GOV>, lwall@jpl-devvax.JPL.NASA.GOV writes:
: >>I sincerely doubt you're going to find a specialized tool to do that.
:
: > .. tons & tons of Perl code by its dad ..
: >>
: >>Or something like that...
: >>
:
: > Hahahaha. This made my day. [Sad, but true.]
:
: Agreed. But what gets me is the number of different ways he manages to
: sneak in these dang Perl lessons! Just when I was about to Beta test my
: "Impending Perl lesson" detector, he goes and changes his posting
: patterns - sigh, maybe I should re-write my detector in Perl :-)

It'd be fairly trivial:

    #!/usr/bin/perl
    while (<>) {
        /^:.*[?!]/ && warn "Impending Perl lesson!!!!\n";
    }

:-)

Larry