[comp.unix.questions] Help a novice: Will "sed" do?

rostamia@umbc3.UMBC.EDU (Dr. Rouben Rostamian) (07/17/89)

I need a command or a script that searches a text file for a given
word or pattern and prints out all paragraphs that contain that word
or pattern.  Paragraphs are blocks of text separated by one or 
more blank lines.

I am new to unix so I don't even know where to look.  Sed seems to be 
promising but I have not been able to get it to do this.  Any
hints or suggestions?  Maybe another utility? 

Thanks.

-- 
Rouben Rostamian
Department of Mathematics                      e-mail:
University of Maryland Baltimore County        Rostamian@umbc2.bitnet
Baltimore, MD 21228                            rostamia@umbc3.umbc.edu

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/17/89)

In article <2180@umbc3.UMBC.EDU> rostamia@umbc3.UMBC.EDU (Dr. Rouben Rostamian) writes:
>I need a command or a script that searches a text file for a given
>word or pattern and prints out all paragraphs that contain that word
>or pattern.  Paragraphs are blocks of text separated by one or 
>more blank lines.

It's pretty hard to do this with standard UNIX text-file utilities,
because most of them work on a line-at-a-time basis.  That means when
you find the pattern, it's too late to output the preceding lines.

A reasonable text editor (e.g. "sam") can do the job, along these lines:
	for entire file
		find next occurrence of pattern from current location
						forward to end of file
		if none
			done
		else
			find nearest blank line from current location
					backward to beginning of file
			if none
				move to beginning of file
			mark location
			find next blank line from current location
						forward to end of file
			if none
				move to end of file
			print lines from mark to current location

The details of how to write such an editor script depend on the editor.
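Doug's outline can also be carried out without an interactive editor. The sketch below is an illustration, not Doug's method. It has awk read the whole input into an array and then follow the steps above: find the next match, back up to the preceding blank line, advance to the following one, print, and resume after that paragraph. It assumes the input fits in memory and counts lines of only spaces or tabs as blank. The function name `parasearch_buf` and the environment variable `PARA_PAT` are hypothetical.

```shell
#!/bin/sh
# parasearch_buf PATTERN [FILE...]   (hypothetical name; reads stdin if no FILE)
# Sketch of the editor algorithm above, done in awk. Assumes the input fits in memory.
parasearch_buf() {
    pat=$1
    shift
    # The pattern goes through the environment so awk does not
    # reinterpret backslashes in it (as a -v assignment would).
    PARA_PAT=$pat awk '
    { line[NR] = $0 }                     # buffer every line
    END {
        i = 1
        while (i <= NR) {
            # find next occurrence of pattern, forward to end of file
            while (i <= NR && line[i] !~ ENVIRON["PARA_PAT"]) i++
            if (i > NR) break             # if none: done
            # nearest blank line backward (else beginning of file)
            start = i
            while (start > 1 && line[start - 1] !~ /^[ \t]*$/) start--
            # next blank line forward (else end of file)
            stop = i
            while (stop < NR && line[stop + 1] !~ /^[ \t]*$/) stop++
            # print lines from mark to current location
            for (j = start; j <= stop; j++) print line[j]
            print ""                      # blank line between paragraphs
            i = stop + 1                  # resume after this paragraph
        }
    }' "$@"
}
```

For example, `parasearch_buf 'New York' notes` would print each paragraph of the file `notes` that mentions New York, each followed by a blank line. Resuming after the printed paragraph is what keeps a paragraph with several matches from being printed twice.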

maart@cs.vu.nl (Maarten Litmaath) (07/18/89)

rostamia@umbc3.UMBC.EDU (Dr. Rouben Rostamian) writes:
\I need a command or a script that searches a text file for a given
\word or pattern and prints out all paragraphs that contain that word
\or pattern.  Paragraphs are blocks of text separated by one or 
\more blank lines.

----------8<----------8<----------8<----------8<----------8<----------
#!/bin/sh

test $# = 0 && {
	echo "Usage: `basename $0` <pattern> [<files>]" >&2
	exit 1
}

pattern=`echo "$1" | sed 's-/-\\\\/-g'`		# quote slashes in the pattern
shift

tab=`printf '\t'`		# a literal tab character
empty="^[ $tab]*$"	# the pattern of an `empty' line

SED="
: gap
	/$empty/!b para
: gap1
	n
	b gap
: para
	/$pattern/b found
	H
	n
	/$empty/!b para
: cleanup
	s/.*//
	x
	b gap1
: found
	H
	\${
		g
		p
	}
	n
	/$empty/!b found
	g
	p
	b cleanup
"

sed -n "$SED" "$@"
----------8<----------8<----------8<----------8<----------8<----------
-- 
   "... a lap-top Cray-2 with builtin    |Maarten Litmaath @ VU Amsterdam:
cold fusion power supply"  (Colin Dente) |maart@cs.vu.nl, mcvax!botter!maart
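The hold-space technique in Maarten's script can be written much more compactly with a commonly seen sed idiom: save each non-blank line with `H`, and on a blank line (or the last line) exchange the collected paragraph into the pattern space with `x`, deleting it unless it matches so that sed's default output prints only the matching paragraphs. This is a sketch, assuming a POSIX sed that understands `[[:blank:]]`; `para_grep` is a hypothetical name, and as in Maarten's script a `/` in the pattern has to be escaped.

```shell
#!/bin/sh
# para_grep PATTERN [FILE...]   (hypothetical name; reads stdin if no FILE)
# Assumes PATTERN contains no unescaped "/", since it is pasted into /.../.
para_grep() {
    pat=$1
    shift
    # Script 1: for a non-blank line, append it to the hold space (H) and,
    #           unless it is the last line, start the next cycle (d).
    # Script 2: on a blank line or at end of input, exchange (x) so the
    #           collected paragraph is in the pattern space; delete it unless
    #           it matches, otherwise the default output prints it.
    sed -e '/[^[:blank:]]/{H;$!d;}' -e "x;/$pat/!d;" "$@"
}
```

Because `H` prefixes each saved line with a newline, every printed paragraph is preceded by a blank line rather than followed by one.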

donn@mcgp1.UUCP (Donn F Pedro) (07/18/89)

In article <2180@umbc3.UMBC.EDU> rostamia@umbc3.UMBC.EDU (Dr. Rouben Rostamian) writes:
>I need a command or a script that searches a text file for a given
>word or pattern and prints out all paragraphs that contain that word
>or pattern.  Paragraphs are blocks of text separated by one or 
>more blank lines.

Try this: 

		nawk ' BEGIN {
		             RS=""
		             ORS="\n\n"
			     }
		     /pattern/
		     ' file

Works for me on:

		UNIX system V release 3.1.1


	Donn F Pedro ....................a.k.a. mcgp1!donn@Thalatta.COM   
           else:  {the known world}!uunet!nwnexus!thebes!mcgp1!donn 
       ----------------------------------------------------------------	
		You talk the talk.  Do you walk the walk?
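Donn's program has the search pattern written into the awk text. To take the pattern from the command line instead, one option is to pass it through the environment and test it with `$0 ~`. This sketch assumes an awk with the POSIX `ENVIRON` array; unlike a `-v` assignment, a value read from `ENVIRON` does not have its backslashes reinterpreted. The names `paragrep` and `PARA_PAT` are hypothetical.

```shell
#!/bin/sh
# paragrep PATTERN [FILE...]   (hypothetical name; reads stdin if no FILE)
paragrep() {
    pat=$1
    shift
    # RS = "" puts awk in paragraph mode: records are separated by runs of
    # blank lines. ORS = "\n\n" puts a blank line after each output record.
    PARA_PAT=$pat awk '
        BEGIN { RS = ""; ORS = "\n\n" }
        $0 ~ ENVIRON["PARA_PAT"]      # no action: print the whole record
    ' "$@"
}
```

For example, `paragrep 'New York' file` behaves like Donn's command with `/New York/` in place of `/pattern/`.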

ted@eslvcr.UUCP (Ted Powell) (07/20/89)

In article <10540@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <2180@umbc3.UMBC.EDU> rostamia@umbc3.UMBC.EDU (Dr. Rouben Rostamian) writes:
>>I need a command or a script that searches a text file for a given
>>word or pattern and prints out all paragraphs that contain that word
>>or pattern.  Paragraphs are blocks of text separated by one or 
>>more blank lines.
>
>It's pretty hard to do this with standard UNIX text-file utilities,
>because most of them work on a line-at-a-time basis.  That means when
>you find the pattern, it's too late to output the preceding lines.

Use awk! See section 3.4 Multiline Records in:
	The AWK Programming Language
	Aho, Alfred V., Kernighan, Brian W., Weinberger, Peter J.
	Addison-Wesley Series in Computer Science
	ISBN 0-201-07981-X

Given input data with paragraphs separated by blank lines, the following 
passes those paragraphs containing "New York" (example taken from page 83):
	... | awk 'BEGIN { RS = ""; ORS = "\n\n" }; /New York/' | ...

Setting RS (input Record Separator) to null makes awk take everything between
successive blank lines as a record. Setting ORS (Output Record Separator) to 
two newlines gives a blank line between output records. The example could also
be done as:
	awk '
	BEGIN { RS = ""; ORS = "\n\n" }
	/New York/
	' input-file >output-file
or the program can be kept in a file and run with "awk -f progfile".

Patterns can be _very_ complex, and you can have multiple patterns with 
corresponding actions. In the example, the action is unspecified, and defaults
to outputting the current record. If you don't have access to the book, see the
man page. Note that in SVR3/386 (and possibly other flavours) there is AWK(1)
and NAWK(1) (New AWK). (Old awk is being kept around for a while, presumably
for compatibility reasons.) The book corresponds to NAWK(1). At least in
SVR3/386, awk/nawk come with the basic system. If you haven't ever used awk,
give it a try. If you haven't read the book, check it out -- it has all kinds
of useful examples in a surprisingly wide range of fields.
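As a small illustration of combining patterns and actions in paragraph mode, the sketch below has two rules. The first has no action, so it prints each paragraph that mentions New York but not Chicago; the second has an explicit action that reports the number (`NR`) of each paragraph mentioning Boston. The city names and the function name `city_report` are arbitrary examples.

```shell
#!/bin/sh
# city_report [FILE...]   (hypothetical name; reads stdin if no FILE)
city_report() {
    awk '
        BEGIN { RS = ""; ORS = "\n\n" }   # one record per paragraph
        # Rule 1: patterns combine with && and !; no action means print $0.
        /New York/ && !/Chicago/
        # Rule 2: an explicit action; NR is the paragraph number.
        /Boston/ { printf "paragraph %d mentions Boston\n", NR }
    ' "$@"
}
```

A paragraph that satisfies both rules is handled by both, since awk applies every rule to every record. Note that `printf` does not append ORS, so the report lines are not followed by a blank line.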