utashiro@sra.co.jp (Kazumasa Utashiro) (05/29/91)
When we are editing Japanese text, grep doesn't work enough because Japanese word can be separated by newline. That is primary reason I made this command, but it is also useful to search English sentence made from multiple words, which may be folded into multiple lines. Major features are: - matching across the line boundary - underline, bold or stand-out matched pattern - show lines before/after matched line - recursive search with wildcard - automatic uncompress Man page is included but English is not very good, sorry. Any comments for command itself or perl programming are welcom. (//g is used correctly?) --- K. Utashiro utashiro@sra.co.jp ---------------------------------------------------------------------- #!/usr/local/bin/perl 'di'; 'ig00'; ;# ;# mg: multi-line grep ;# K. Utashiro <utashiro@sra.co.jp> ;# Original: Mar 29 1991 ;# Revised: Apr 1 1991, Apr 3 1991, Apr 4 1991, Apr 15 1991, Apr 19 1991 ;# Apr 23 1991, Apr 25 1991, May 29 1991 ;# Usage: ;# % mg 'internet control message protocol' *.txt ;# /* ;# * then you will get lines including search pattern, even if ;# * the string is splited by newline. ;# */ ;# require('getopts.pl'); #require('usage.pl'); $opts = 'ilnhwv:c:eEC:ubqQYas2RP:V:FTLSmMf:p:xzdIN'; @opts = ( '-:dINY:', 'i::ignore case', 'l::list filename only', 'n::print line number', 'h::do not display filenames', 'w::space doesn\'t match null ("a b" matches "ab" by default)', 'v:pattern:skip the line if matched with exp (w/o -a, -l)', 'c:n[,n]:print n-lines before/after matched line', 'e::use pattern as regular expression (space is special)', 'E::use pattern as regular expression completely', 'C:chars:continuous characters', 'u::underline matched string (except JIS)', 'b::make bold (print twice) matched string (except JIS)', 'q::quote matched string (except JIS)', 'Q::stand-out matched string (except JIS)', 'Y::yield to -Q option even if stdout is not a terminal', 'a::print whole file (no filename and line number)', 's::output filename and line number separately', '2::duplicate line if multiple matching is occured', 'R::search recursively', 'P:pattern:specify search file in wildcard (w/-R)', 'V:pattern:specify exception file in wildcard (w/-R)', 'F::follow symbolic link of directory (w/-R)', 'T::search text file only', 'L::print formfeed before each matching', 'S::get filenames from stdin', 'm::print only splited line', 'M::print only multiple matched line', 'f:file:file contains search pattern', 'p:pattern:specify search pattern', 'x::print processing filename', 'z::do not uncompress automatically', ); sub usage { @usage = &Usage($0, $opts, "pattern [ file ... ]", @opts); print "usage: mg [ -options ] pattern [ file... ]\n", pop(@usage); exit 2; } ($_ = $ENV{'MGOPTS'}) && do { s/^[^-]/-$&/; unshift(@ARGV, $_); }; &Getopts($opts) || &usage; exec "nroff -man $0|" .($ENV{'PAGER'}||'more'). " -s" if $opt_N; $opt_C =~ s/\W/\\$&/g, $opt_C = "[\\s$opt_C]" if $opt_C; $delim = ($opt_C || '\s') . ($opt_w ? '+' : '*'); ($rawdelim = $delim) =~ s/\W/\\$&/g; $in = join('|', (@in = ('\033\$\@', '\033\$B'))); $out = join('|', (@out = ('\033\(J', '\033\(B'))); $shiftcode = '(' . join('|', @in, @out) . ')'; $optionalseq = '(' . join('|', length, @in, @out, $opt_C || '\s'). ')*'; ($rawoptionalseq = $optionalseq) =~ s/\W/\\$&/g; sub makepat { local($pat, $p) = @_; for (split(/$shiftcode/, $pat)) { if (/$in/o) { $jis = 1; $p .= $optionalseq if $p; next; } if (/$out/o) { $jis = 0; next; } if ($jis) { s/../&jis($&)/eg; $p .= $_; next; } s/([\200-\377])?./length($&) > 1 ? &mb($&) : &asc($&)/eg; $p .= $_; } $p =~ s/($rawdelim|$rawoptionalseq)+$//; length($p) ? $p : undef; } $opt_f && do { open(F, $opt_f) || die "$opt_f: $!\n"; chop($opt_f = <F>); }; defined $opt_p || defined($opt_p = $opt_f || shift) || &usage; ($opt_p, $opt_v) = (&makepat($opt_p), &makepat($opt_v)); ($opt_P, $opt_V) = (&wildcard($opt_P), &wildcard($opt_V)); grep(eval "\$opt_$_=0,warn\"-$_ is disabled.\n\"if\$opt_$_;", 'q', 'u', 'b') if defined($jis); ($pre_c, $post_c) = /,/ ? split(',') : ($_, $_) if ($_ = $opt_c); ($ql, $qr, $qd, $qe) = ('>>', '<<', '', '') if $opt_q; ($ql, $qr, $qd, $qe) = &sose if ($opt_Q && ($opt_Y || -t STDOUT)); ($nlqsubs, $nlsubs) = $opt_n ? ('"$qd\n$fn".++$tl.":$qe"', '"\n$fn".++$tl.":"') : ('"$qd\n$fn$qe"', '"\n$fn"'); ($effect = $opt_b || $opt_u) && &initsubs; grep(eval "print \"opt_$_=/\$opt_$_/\n\" if \$opt_$_;", split(//, 'pvPVC')) if $opt_d; $* = 1; $ig = 'i' if $opt_i; $showfname = !$opt_h && ($#ARGV > $[ || $opt_R || $opt_S); $needinfo = $showfname || $opt_n; $neednlsubs = !$opt_s && $needinfo; $p_all = !($opt_m || $opt_M); push(@ARGV, '-') if @ARGV == 0; while (defined($file = $opt_S ? <> : shift)) { chop($file) if $opt_S; if (-d $file) { unshift(@ARGV, &getdirent($file)) if $opt_R; next; } next if ($opt_T && -B $file); open(F, $file) || do { $error = 2; warn "$file: $!\n" if !$opt_I; next; }; open(F, '-|') || exec('zcat', $file) || die("zcat: $!\n") if ($file =~ /\.Z$/ && !$opt_z); undef $/; $_ = <F>; $/ = "\n"; close(F); print "$file:\n" if $opt_x; if ($opt_l) { eval 'print("$file\\n"),$matched++ if /$opt_p/o'.$ig; next; } $#x = $[ - 1; if ($] > 4.003) { eval "while(/$opt_p/og$ig){push(\@x,length(\$`),length(\$&));}"; } elsif (($test || (($test = '10') =~ s/0/$`/, $test)) eq '11') { eval 's/$opt_p/push(@x,length($`),length($&));$&/ego'.$ig; } else { $m = length; eval 's/$opt_p/$t=length($&);push(@x,'.$m.'-length($\')-$t,$t);$&/ego'.$ig; } $fn = "$file:" if $showfname; if ($opt_a) { for ($op = 0; ($p, $len) = splice(@x, 0, 2); $op = $p + $len) { $matched++; $match = substr($_, $p, $len); $match = &effect($match) if $effect; print substr($_, $op, $p - $op), $ql, $match, $qr; } print substr($_, $op); next; } for ($line = 1, $op = 0; ($p, $len) = splice(@x, 0, 2); ) { $print = $p_all; @out = ($fn); $line += (substr($_, $op, $p - $op) =~ tr/\n/\n/) if $opt_n; $op = $p; $pnl = rindex($_, "\n", $p - 1) + 1; for ($n = $pre_c, $tl = $line; $n && $pnl > 0; $n--, $tl--) { $pnl = rindex($_, "\n", $pnl - 2) + 1; } push(@out, $opt_s ? $line : $tl, ':') if $opt_n; $left = substr($_, $pnl, $p - $pnl); $left =~ s/\n/$nlsubs/gee if $neednlsubs; push(@out, "\n") if ($opt_s && $needinfo); push(@out, $left); for (;; ($p, $len) = splice(@x, 0, 2)) { $match = substr($_, $p, $len); $print = (index($match, "\n") >= $[) if $opt_m; $match = &effect($match) if $effect; $match =~ s/\n/$nlqsubs/gee if $neednlsubs; push(@out, $ql, $match, $qr); $nnl = index($_, "\n", $p += $len); last if ($nnl < $x[$[] || $opt_2 || @x < 2); $opt_M && $print++; push(@out, substr($_, $p, $x[$[] - $p)); } for ($n = $post_c; $n--; $nnl = $tnnl) { last if ($tnnl = index($_, "\n", $nnl+1)) < 0; } $right = substr($_, $p, $nnl - $p); next if ($opt_v ne '' && substr($_, $pnl, $nnl - $pnl) =~ /$opt_v/o); $right =~ s/\n/$nlsubs/gee if $neednlsubs; push(@out, $right, "\n"); unshift(@out, "\f\n") if ($opt_L && $matched); print @out if ($print && ++$matched); } } exit($error || !$matched); sub asc { local($_) = @_; $opt_E || s/\s/$delim/ || $opt_e || s/\W/\\$&/, $_; } sub mb { local($_, @_) = ($_[$[], split(//, shift)); $_ = sprintf("\\%03o\\%03o", ord(shift), ord(shift)) if @_ == 2; $opt_E ? $_ : $_.$delim; } sub jis { $_[$[] =~ s/(\W)/\\$1/g; $_[$[] . $optionalseq; } sub initsubs { @ul[1,2] = ("_\010", "__\010\010"); @bs[1,2] = ("\010", "\010\010"); $s = 'sub effect{$_[$[]=~s/[\\200-\\337].|./$l=length($&);'; $s .= '$ul[$l].' x $opt_u . '$&.$bs[$l].' x $opt_b . "\$&/ge;shift;}\n"; eval $s; print $s if $opt_d && ($opt_u || $opt_b); } sub getdirent { local($dir, $_, @ent) = @_; print "Entering \"$dir\".\n" if $opt_d; return @ent if (-l $dir && !$opt_F); opendir(DIR, $dir) || do { warn "$dir: $!\n"; return @ent; }; push(@ent, $_) while ($_ = readdir(DIR)); close(DIR); @ent = grep(!m#^\.\.?$# && (-d "$dir/$_" || $opt_P eq '' || m/$opt_P/o) && ($opt_V eq '' || !m/$opt_V/o) && s#^#$dir/#, sort @ent); } sub wildcard { local($_) = @_; s#\\?.#$_ = $&; s/\\?([_0-9A-Za-z])/$1/ || /\\./ || s/[*]/.*/ || s/[|]/\$|^/ || tr/?{,}[]\-/.(|)[]\-/ || s/./\\$&/; $_;#ge; length($_) ? "^$_\$" : undef; } sub sose { do 'ioctl.ph' || do 'sys/ioctl.ph'; require('termcap.pl'); $ospeed = 1 unless $ospeed; &Tgetent; $so = &Tputs($TC{'so'}); $se = &Tputs($TC{'se'}); ($so, $se, $se, $so); } ;#------------------------------------------------------------ ;# usage.pl: make a string for usage line. ;# by K. Utashiro <utashiro@sra.co.jp> on Sep 7 1990 ;# Revised by utashiro on Mar 16 1991 ;# Revised by utashiro on Mar 20 1991 ;# ;# Syntax: &Usage($command, $option, $trailer, @arglist); ;# $command: command name (maybe $0) ;# $option: option string same as &Getopt ;# $trailer: trailer string (optional) ;# @arglist: description for options which takes argument (optional) ;# format is "option character : argument : description" ;# where argument and description are optional. ;# special form '-:xyz' hides options -x, -y, -z. ;# ;# &Usage returns list of two strings where 1st string is for usage ;# line and 2nd is for description. ;# ;# Example: ;# $opts = 'deg:u:s:x'; @arglist = ( ;# '-:x', # means -x is secret option ;# 'd::debug', ;# 'g:group', ;# 'u:user:user name', ;# ); ;# unless (&Getopts($opts)) { ;# print &Usage($0, $opts, 'file ...', @arglist); ;# exit(1); ;# } ;# ;# Result: ;# usage: sample [ -d ] [ -e ] [ -g group ] [ -u user ] [ -s : ] file ... ;# -d debug ;# -u user user name ;# sub Usage { package usage; reset('a-z'); local($cmd, $opt, $trailer, @arglist) = @_; for (@arglist) { ($name, $arg, $desc) = split(/:/, $_, 3); if ($name eq '-') { grep($hide{$_}++, split('', $arg)); next; } next if $hide{$name}; $arg{$name} = $arg; $desc{$name} = $desc; $w = length($arg) if ($desc && $w < length($arg)); } $cmd =~ s#.*/##; push(@usage, 'usage:', $cmd); while ($opt =~ /^\s*(.)(:?)/) { $opt = $'; next if $hide{$1}; push(@opts, $1); push(@usage, '[', "-$1"); push(@usage, $arg{$1} || $2) if $2; push(@usage, ']'); } push(@usage, $trailer) if $trailer; for (grep($desc{$_}, @opts)) { push(@desc, sprintf("\t-%s %-${w}s %s\n", $_, $arg{$_}, $desc{$_})); } (join(' ', @usage)."\n", join('', @desc)); } 1; ;#------------------------------------------------------------ .00; # finish .ig 'di \" finish diversion--previous line must be blank .nr nl 0-1 \" fake up transition to first page again .nr % 0 \" start at page 1 ';<<'.ex'; #__END__ ############# From here on it's a standard manual page # .TH MG 1 "May 29, 1991" .AT 3 .SH NAME mg \- multi-line grep .SH SYNOPSIS .B mg [ .B options ] .I pattern [ .I file \&... ] .SH DESCRIPTION .I Mg searches the specified pattern from files or standard input and print lines which contain the pattern. It has almost same function as Unix command .IR grep (1) but is distinguished from grep because the matching is done across the line boundaries. .PP For example, to find a sentence ``internet control message protocol'' from many RFC texts, you can say .nf mg\ \-i 'internet control message protocol'\ *.txt .fi Match will occur for sequence of words `internet', `control', `message', `protocol' separated by any number of whitespace characters including null string (\-w option avoids null matching). .PP .I Mg is also very useful to find a word from Japanese text because Japanese words are not separated by whitespaces and newline character can be inserted at any place of the text. As a matter of fact, .I mg is made for Japanese string search originally. Any Japanese codes including JIS, Shift-JIS, EUC can be handled hopefully, but JIS code disables some features and may make the command confused when searching a short string. .PP If the file has .Z suffix, it is uncompressed before search. .SH ENVIRONMENT Environment variable MGOPTS is used as a default options. .SH OPTIONS .LP GREP COMPATIBLE OPTIONS: .RS .IP \-i Ignore case. .IP \-l List filename only. .IP \-n Show line number. .IP \-h Do not display filenames even if multiple filenames are specified by command line. .RE .LP SLIGHTLY DIFFERENT OPTIONS: .RS .IP \-w Word sensitive. Space character in the pattern matches null string by default. So pattern ``a\ b'' matches to string ``ab''. With this option, space character matches single or more space. .IP "\-v \fIpattern\fP" Skip the line if matched with pattern. Don't print the matched line if it matched to the pattern specified with this option. This option doesn't have any effect used with \-a or \-l option. .RE .LP OTHER OPTIONS: .RS .IP "\-c \fIn[,n]\fP" Print n-lines before/after matched line. Only single line is displayed by default, but you can specify the lines before or after matched line to be printed. If ``\-c\ 3'' options is suplied, three lines before and after matched line will be displayed together. You can see only after three lines by ``\-c 0,3'' option. .IP \-e Use the pattern as a regular expression in .IR perl (1) but space is treated specially. With this option, you can use .I mg like .IR egrep (1) like this: .nf mg \-e 'foo bar|\^goo car|\^hoo dar' ... .fi See .IR perl (1) for detail of regular expression. .IP Note that if you are running .IR jperl (1) kanji code can be used in range expression, but you have to use \-E option to do that. .IP \-E Use the pattern as regular expression in .I perl completely. .IP "\-C \fIchars\fP" Continuous character. If you want search sentence continued by other than white space characters, continuous characters can be set by this options. For example, next command find a sentence even if it is quoted by `>' or `|' mark. .nf mg \-C '>\^|\^' 'ninja shogun fujiyama' `mhpath all` .fi To search a pattern in C style comment: .nf mg \-C '/*' 'setuid scripts' perl.c .fi .IP Note that continuous characters don't have to be found only top of the string. So ``foo\ bar'' matches a string ``foo>>bar'' on the previous example. .IP \-u Underline matched string. Makes a matched string underlined by precede each character by ``_^H''. .IP \-b Make bold matched string. Makes a matched string overstriked like ``f_^Hfo_^Hoo_^Ho''. .IP \-q Quote matched string by like ``>>matched<<''. This options is disabled when searching JIS file. .IP \-Q Use a stand-out feature of the terminal to quote the matched string (not for JIS). This options is ignored when the standard output is not a terminal. \-q and \-Q are useful for long line. Try .nf echo $path | mg \-Q mh .fi .IP \-a Print all contents of the file. This option makes a sense only if used with options like \-q, \-Q, \-u, \-b, otherwise behave almost same as .IR cat (1). Filename and lines are not printed with this option. .IP \-s When multiple files are specified, each matched line are preceded by ``filename:'' string. With this option, newline character is inserted between filename and matched line. This option is useful when the filenames are very long. .IP \-2 Usually only one line is displayed even if multiple matching is occurred for the same line. With this option, each matching will be displated in different line. .IP \-R Search recursively. Only files specified by command line arguments are searched by default. When invoked with this option and arguments contains a directory, it is searched recursively. Usually used with \-P option. .IP "\-P \fIpattern\fP" Search file pattern. When directories are searched recursively, only files which matches the `pattern' is searched. A `pattern' is specified in wildcard format same as shell and `|\^' character can be used for alternative in addition. For example, you can find a string ``foobar'' from all C source files and makefiles like this: .nf mg \-RP '*.c|\^[Mm]akefile' foobar /usr/src .fi .IP "\-V \fIpattern\fP" Exception file pattern. This is a counterpart of \-P. Only files which DOES NOT match the pattern will be searched. .nf mg \-RV '*.[oas]' foobar /usr/src .fi .IP Note that \-V option is also applied to a directory name while \-P option has an effect only for a file. This means you can specify a directory name to skip, but can't specify a directory name to search. .IP \-F Follow symbolic link of a directory. Doesn't follow by default. .IP \-T Search text file only. If this options is suplied, only text files are searched. Decision is made by perl operator \-T. .IP \-L Print formfeed between each matchings. Print the formfeed character before each matched line. This options is useful used with \-c option and piped to pager command. .IP \-S Get filenames from standard input. Read standard input and use each line as a filename for searching. You can feed the output from other command like .IR find (1) for .I mg with this option. .IP \-m Print matched line only when the pattern is across the line. .IP \-M Print matched line only when multiple matching is occurred for the same line. .IP "\-f \fIfile\fP" Specify the file which contains search pattern. Only the first line of the file is used. This options is useful when input of KANJI character is difficult from tty driver. .IP \-p pattern Specify search pattern. You don't have to use this option explicitly because the first argument after options will be treated as a pattern. .IP \-x Print processing filename. .IP \-z Disables automatic uncompress. .RE .SH AUTHOR Kazumasa Utashiro <utashiro@sra.co.jp> .br Software Research Associates, Inc., Japan .SH "SEE ALSO" grep(1), perl(1), jperl(1) .SH BUGS .PP Hypheneation is not supported. Please don't tell me to support skipping header and footer inserted by nroff... .PP Long JIS code pattern can not be processed. .PP /pattern/g syntax is not tested yet. .ex
sherman@unx.sas.com (Chris Sherman) (05/30/91)
In <UTASHIRO.91May29164841@ext12.sra.co.jp> utashiro@sra.co.jp (Kazumasa Utashiro) writes: >Any comments for command itself or perl programming are >welcom. (//g is used correctly?) >--- >K. Utashiro >utashiro@sra.co.jp >---------------------------------------------------------------------- >#!/usr/local/bin/perl >'di'; >'ig00'; >;# >;# mg: multi-line grep >;# K. Utashiro <utashiro@sra.co.jp> >;# Original: Mar 29 1991 >;# Revised: Apr 1 1991, Apr 3 1991, Apr 4 1991, Apr 15 1991, Apr 19 1991 >;# Apr 23 1991, Apr 25 1991, May 29 1991 Great program. But I am having a problem with the recursive descent option. I'm not sure if it is the option itself, or the mere fact it allows the program to run for a longer period of time, but as soon as I start the command, my available disk space drops about 7 Megs within the first minute or so. And the more the program runs, the more disk space gets eaten. After a while, I get a Filesystem Full error. This has happened on an HP and Sun4, both running perl v4p3. The funny thing is that I can't figure out where all the disk space goes... I stopped (cntl-Z) the program and thoroughly examined the file system, and I couldn't find 7 Megs worth of new files. When I kill (kill -9) the program, the space comes back. Does anyone know what is going on here? Is the problem caused by perl, or by this neat program? Big Thanx, -- Chris Sherman .................... sherman@unx.sas.com | ,-----------------------------------------' / Q: How many IBM CPU's does it take to execute a job? | A: Four; three to hold it down, and one to rip its head off.