[comp.unix.questions] context-grep

lang@pearl.PRC.Unisys.COM (Francois-Michel Lang) (02/07/89)

I recall some time ago there was a discussion of some kind of
variation on grep that could be used to show, e.g.,
all lines in a file containing "foo" occurring within, say,
6 lines of a line containing "bar".

Is there anything available that will do this?
I think I can use awk to do a watered-down version of this,
but if it's already available somewhere, I'd love to see it!
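A minimal sketch of such a watered-down awk version follows. It assumes a POSIX awk (for `-v`) and a regular file that can be read twice. The function name `near_grep` and its argument order are made up for illustration. The first pass notes the numbers of all lines matching the "near" pattern. The second pass prints each line matching the main pattern that lies within N lines of one of those lines.

```sh
# near_grep PAT NEAR N FILE  (hypothetical name and argument order)
# Print the lines of FILE matching PAT that lie within N lines of a line
# matching NEAR, each prefixed with its line number.
# Assumes a POSIX awk (for -v); FILE is read twice.
near_grep() {
    awk -v pat="$1" -v near="$2" -v n="$3" '
        NR == FNR {                  # first pass: note every NEAR line
            if ($0 ~ near) hit[FNR] = 1
            next
        }
        $0 ~ pat {                   # second pass: look around each PAT line
            for (i = FNR - n; i <= FNR + n; i++)
                if (i in hit) { print FNR ": " $0; break }
        }' "$4" "$4"
}
```

For the question as posed, that would be `near_grep foo bar 6 file`.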

Many thanks.
----------------------------------------------------------------------------
Francois-Michel Lang
Paoli Research Center, Unisys Corporation lang@prc.unisys.com (215) 648-7256
Dept of Comp & Info Science, U of PA      lang@cis.upenn.edu  (215) 898-9511

maart@cs.vu.nl (Maarten Litmaath) (02/08/89)

lang@pearl.PRC.Unisys.COM (Francois-Michel Lang) writes:
\I think I can use awk to do a watered-down version of this,

How about `sed'? I think Leo de Wit [a.k.a. Mr. Sed] will have some remarks
about `cgrep', but it's a start.

#! /bin/sh

# cgrep		- context grep
#
# grep pattern plus surrounding lines
# @(#)cgrep 1.1 89/01/18 Maarten Litmaath
#
# example: "cgrep -5 +0 '^{' *.c" shows all function headers in the C sources,
# with (max.) 5 lines preceding the `{', followed by 0 lines (no line will be
# printed twice; there might not be as many contextual lines as requested)
# the name of each file containing a match is printed, followed by a line of
# `====='; each match is separated from the following by a line of `-----'
# cgrep can be used as a filter; the file name will be `STDIN'
# the `--' option can be used to `protect' a pattern starting with `-'

usage="Usage: cgrep [-<# before>] [+<# after>] [--] <regexp> [<files>]"

if [ $# = 0 ]
then
	echo "$usage"
	exit 1
fi

b=2
a=2

while :
do
	case $1 in
		--)
			shift
			break
			;;
		-*)
			b=`echo "x$1" | sed s/..//`
			shift
			;;
		+*)
			a=`echo "x$1" | sed s/..//`
			shift
			;;
		*)
			break
	esac
done

if [ $# = 0 ]
then
	echo "$usage"
	exit 1
fi

pattern=`echo "$1" | sed 's-/-\\\/-g'`
shift

[ x$b = x ] && b=1
[ x$a = x ] && a=1

if [ $b = 0 ]
then
	branch=b
else
	case "$pattern" in
		\^*)
			pattern=`echo "$pattern" | sed 's/./\\\n/'`
	esac
fi

while [ $b -gt 0 ]
do
	before=".*\n$before"
	b=`expr $b - 1`
done
	
while [ $a -gt 0 ]
do
	after="n;p;$after"
	a=`expr $a - 1`
done

if [ $# = 0 ]
then
	files=STDIN
	set dummy
else
	files="$*"
	set dummy $*
fi

umask 077

for i in $files
do
	shift

	sed -n "
			/$pattern/{
				p
				$after
				a\\
					-----
				b
			}
			$branch
		: L0
			N
			/$pattern/{
				p
				$after
				a\\
					-----
				b
			}
			/$before.*/!b L0
		: L1
			s/[^\n]*\n//
			N
			/$pattern/{
				p
				$after
				a\\
					-----
				b
			}
			b L1
		" $1 > /tmp/cgrep.$$

	if [ -s /tmp/cgrep.$$ ]
	then
		echo $i
		echo =====
		cat /tmp/cgrep.$$
	fi
done

/bin/rm -f /tmp/cgrep.$$
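The script builds its sed program as text before running it. The sketch below isolates three of those steps as shell functions. The function names are hypothetical, and POSIX `$(( ))` arithmetic stands in for `expr`. The first step escapes each `/` in the pattern, because the pattern is pasted between the slashes of a sed address; the sketch uses the `g` flag so that every slash is escaped, not just the first. The second step builds `$before`, a regex spanning the requested number of leading lines. The third builds `$after`, a run of `n;p;` commands that prints the trailing lines.

```sh
# sed_escape PATTERN: escape every / so PATTERN can sit inside /.../
sed_escape() {
    printf '%s\n' "$1" | sed 's-/-\\/-g'
}

# build_before N: ".*\n" repeated N times (N lines of leading context)
build_before() {
    n=$1 before=
    while [ "$n" -gt 0 ]
    do
        before=".*\\n$before"
        n=$((n - 1))
    done
    printf '%s\n' "$before"
}

# build_after N: "n;p;" repeated N times (N lines of trailing context)
build_after() {
    n=$1 after=
    while [ "$n" -gt 0 ]
    do
        after="n;p;$after"
        n=$((n - 1))
    done
    printf '%s\n' "$after"
}
```

For example, `build_before 2` prints `.*\n.*\n`, and `build_after 3` prints `n;p;n;p;n;p;`.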
-- 
 "I love it                            |Maarten Litmaath @ VU Amsterdam:
          when a plan comes together." |maart@cs.vu.nl, mcvax!botter!maart

jaw@eos.UUCP (James A. Woods) (02/09/89)

# "Information is any difference that makes a difference." -- Gregory Bateson

> I recall some time ago there was a discussion of some kind of
> variation on grep that could be used to show, e.g.,
> all lines in a file containing "foo" occurring within, say,
> 6 lines of a line containing "bar".
> 
> Is there anything available that will do this?

sure, GNU e?grep would handle this as:

	grep -6 bar file | grep foo

thank mike haertel.  it has been available via 'ftp' from prep.ai.mit.edu
for quite some time.
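In current GNU grep, `-NUM` is documented as the same as `--context=NUM`, which can also be written as `-C NUM`. A minimal wrapper around the pipeline above might look like this; the function name and argument order are hypothetical. The second grep only filters the output of the first. When several files are searched, the `name:`/`name-` prefixes therefore become part of what it matches against, and the `--` group separators pass through it as well.

```sh
# near_grep_gnu PAT NEAR N FILE  (hypothetical name and argument order)
# Lines matching PAT within N lines of a line matching NEAR.
# -C N asks for N lines of context on each side of every NEAR match.
near_grep_gnu() {
    grep -C "$3" -e "$2" -- "$4" | grep -e "$1"
}
```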

p.s. anyone using the original fast e?grep under the stead of 4.3-tahoe
should switch to GNU grep now.

rsalz@bbn.com (Rich Salz) (02/09/89)

= I recall some time ago there was a discussion of some kind of
= variation on grep that could be used to show, e.g.,
= all lines in a file containing "foo" occurring within, say,
= 6 lines of a line containing "bar".

In <2557@eos.UUCP> jaw@eos.UUCP (James A. Woods) writes:
=sure, GNU e?grep would handle this as:
=	grep -6 bar file | grep foo
=thank mike haertel.  it has been available via 'ftp' from prep.ai.mit.edu
=for quite some time.
I will be posting it in comp.sources.unix within a week.

James is too modest.

=p.s. anyone using the original fast e?grep under the stead of 4.3-tahoe
=should switch to GNU grep now.
Yes, indeedy, they surely should.

Look for it.
	/rich $alz
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.

poage@sunny.UUCP (Tom Poage) (02/10/89)

Speaking of these greppy things, does anybody know where
one might find a grammar for regular expressions?

Thanks!  Tom.
-- 
Tom Poage, UCDMC Clinical Engineering, Sacto., CA
poage@sunny.ucdavis.edu
...!ucbvax!ucdavis!sunny!poage

leo@philmds.UUCP (Leo de Wit) (02/11/89)

In article <2017@piraat.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
|How about `sed'? I think Leo de Wit [a.k.a. Mr. Sed] will have some remarks
|about `cgrep', but it's a start.

Ah, you asked for it!
I looked at Maarten's script and modified some of it. The result is
appended below. Modifications:

1) Replaced [ with case wherever possible, since case is a builtin.
2) Replaced the while-test-expr loops that build the repeated patterns
($before and $after) with a yes|sed construct. Neat trick, if I may say so
myself (8-).
3) Changed the sed script, since it did not properly handle an occurrence
of $pattern in the 'after' context (such a line was treated as an ordinary
context line).
4) Each output line is prefixed with <filename>:, which is perhaps more in
the style of grep. It also removes the need for a temporary file. This,
however, is clearly a matter of taste.
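The yes|sed construct from item 2 can be tried on its own. Here is a minimal sketch, wrapped in a hypothetical function `repeat_pat STRING COUNT` with COUNT at least 1. yes supplies an endless stream of STRING lines. sed joins COUNT of them with N, removes the embedded newlines, prints the result and quits.

```sh
# repeat_pat STRING COUNT: print STRING repeated COUNT times on one line
# (hypothetical name; COUNT must be at least 1)
repeat_pat() {
    yes "$1" | sed -n "
        :back
        $2{
            s/\n//g
            p
            q
        }
        N
        b back"
}
```

With the string cgrep uses, `repeat_pat '\n[^\n]*' 2` yields `\n[^\n]*\n[^\n]*`, a regex that spans two more lines.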

Regarding 2): if your system doesn't have yes (a never-ending echo), you can
easily build it from echo and sed:

echo $*|sed '
: again
	p
	b again'

yes and sed also make an easy range generator (this also came up in this
newsgroup, when someone wanted to create ranges without writing a C program);
$1 and $2 are the first and last numbers to be generated:

yes|sed -n "$1,$2{;=;$2q;}"
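Here is a sketch of the range generator as a function; the name `range` is hypothetical. It adds one guard that the one-liner lacks. If the first number is larger than the last, the range address matches only the first line, the `$2q` command is never reached, and sed would go on reading from yes forever.

```sh
# range FIRST LAST: print FIRST..LAST, one number per line (hypothetical name)
range() {
    # Refuse FIRST < 1 or FIRST > LAST: sed would never reach "LASTq".
    [ "$1" -ge 1 ] && [ "$1" -le "$2" ] || return 1
    # yes supplies endless input lines; = prints the number of each line in
    # the range, and q stops sed at line LAST (yes then dies of SIGPIPE).
    yes | sed -n "$1,$2{=;$2q;}"
}
```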

     Leo.


P.S. The modified cgrep follows here:
-------------------------------
#! /bin/sh

# cgrep		- context grep
#
# grep pattern plus surrounding lines
# @(#)cgrep 1.1 89/01/18 Maarten Litmaath
#      modified 89/02/11 Leo de Wit
#
# example: "cgrep -5 +0 '^{' *.c" shows all function headers in the C sources,
# with (max.) 5 lines preceding the `{', followed by 0 lines (no line will be
# printed twice; there might not be as many contextual lines as requested)
# each line is prepended with the filename it appears in, followed by a colon.
# cgrep can be used as a filter; the file name will be `STDIN'
# the `--' option can be used to `protect' a pattern starting with `-'

usage="Usage: cgrep [-<# before>] [+<# after>] [--] <regexp> [<files>]"

case $# in 0) echo "$usage"; exit 1;; esac

b=2 a=2

while :
do
	case $1 in
		--)
			shift
			break
			;;
		-*)
			b=`echo "x$1" | sed s/..//`
			shift
			;;
		+*)
			a=`echo "x$1" | sed s/..//`
			shift
			;;
		*)
			break
	esac
done

case $# in 0) echo "$usage"; exit 1;; esac

pattern=`echo "$1" | sed 's-/-\\\/-g'`
shift

case $b in "") b=1;; esac
case $a in "") a=1;; esac

case $b in
0) before= ;;
*)
	# $pattern is only ever matched against a single line (the context
	# lines live in the hold space), so a leading ^ can be used as is
	before=`yes "\n[^\n]*"|sed -n "
		: back
		$b{
			s/\n//g
			p
			q
		}
		N
		b back"`;;
esac

case $a in
0) after= ;;
*) after=`yes "\n[^\n]*"|sed -n "
	: back
	$a{
		s/\n//g
		p
		q
	}
	N
	b back"`;;
esac

case $# in
0) files=STDIN
   set dummy;;
*) files="$*"
   set dummy $*;;
esac

umask 077

for i in $files
do
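	shift

# Escape / and & in the file name: it is pasted into the replacement
# part of the sed s commands below.
	f=`echo "$i" | sed 's-[/&]-\\\&-g'`

# The hold space holds at most b lines of leading context (except while
# the lines after a match are being collected).
# CLRHLD (clear hold space) and ADJUST (trim the number of lines in the hold
# space) are 'in hold space context'; NEXT is 'in pattern space context'.

	sed -n "/$pattern/{
			x
			/^\n/{
				s/^\n/$f:/
				s/\n/&$f:/g
				p
			}
			x
		: CLRHLD
			x
			s/.*//
		: NEXT
			x
			s/^/$f:/p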
			n
			/$pattern/b CLRHLD
			H
			x
			/\n[^\n]*$after/!b NEXT
			a\\
				-----
			b ADJUST
		}
		H
		x
	: ADJUST
		s/.*\($before\)/\1/
		x
		" $1
done
-------------- ends here -------------
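The modified cgrep keeps the leading context in sed's hold space and trims it at ADJUST with `s/.*\($before\)/\1/`. That step can be tried in isolation. The sketch below uses a hypothetical function name and the same H/x/trim sequence to keep only the last N lines of its input, printing them at end of input (in effect a slow `tail`). It assumes a sed that treats `\n` inside `[^\n]` as a newline, as cgrep itself already does.

```sh
# last_lines N: print the last N lines of standard input (hypothetical name),
# keeping them in the hold space the way cgrep's ADJUST step does.
last_lines() {
    # \n[^\n]* repeated N times, built with the yes|sed trick.
    keep=`yes '\n[^\n]*' | sed -n ":back
$1{
s/\n//g
p
q
}
N
b back"`
    # H appends the line to the hold space; the s command trims the hold
    # space to its last N lines; on the last input line, print what is left.
    sed -n "H
x
s/.*\($keep\)/\1/
x
\$!d
x
s/^\n//
p"
}
```

For example, `printf '1\n2\n3\n4\n' | last_lines 2` prints the lines `3` and `4`.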