rostamia@umbc3.UMBC.EDU (Dr. Rouben Rostamian) (07/17/89)
I need a command or a script that searches a text file for a given
word or pattern and prints out all paragraphs that contain that word
or pattern.  Paragraphs are blocks of text separated by one or
more blank lines.

I am new to unix so I don't even know where to look.  Sed seems to be
promising but I have not been able to get it to do this.  Any hints or
suggestions?  Maybe another utility?

Thanks.

--
Rouben Rostamian
Department of Mathematics                   e-mail:
University of Maryland Baltimore County     Rostamian@umbc2.bitnet
Baltimore, MD 21228                         rostamia@umbc3.umbc.edu
gwyn@smoke.BRL.MIL (Doug Gwyn) (07/17/89)
In article <2180@umbc3.UMBC.EDU> rostamia@umbc3.UMBC.EDU (Dr. Rouben Rostamian) writes:
>I need a command or a script that searches a text file for a given
>word or pattern and prints out all paragraphs that contain that word
>or pattern.  Paragraphs are blocks of text separated by one or
>more blank lines.

It's pretty hard to do this with standard UNIX text-file utilities,
because most of them work on a line-at-a-time basis.  That means when
you find the pattern, it's too late to output the preceding lines.

A reasonable text editor (e.g. "sam") can do the job, along these lines:

	for entire file
		find next occurrence of pattern
		    from current location forward to end of file
		if none
			done
		else
			find nearest blank line
			    from current location backward to beginning of file
			if none
				move to beginning of file
			mark location
			find next blank line
			    from current location forward to end of file
			if none
				move to end of file
			print lines from mark to current location

The details of how to write such an editor script depend on the editor.
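The "too late to output the preceding lines" problem can also be
sidestepped by buffering: hold each paragraph until its closing blank
line is seen, then decide whether to print it.  The following is only a
minimal sketch of that idea, assuming a POSIX awk; the function name
para_buf and the variable names are made up for illustration and do not
come from the post above.

```shell
# para_buf PATTERN [FILE...]   (hypothetical helper, not from the thread)
# Reads line by line, buffers the current paragraph, and prints it
# (followed by one blank line) only if some line matched PATTERN.
para_buf() {
    pat=$1; shift
    awk -v pat="$pat" '
        # Print the buffered paragraph if it had a match, then reset.
        function flush() {
            if (hit) printf "%s\n", buf
            buf = ""; hit = 0
        }
        /^[ \t]*$/ { flush(); next }        # blank line ends a paragraph
        {
            buf = buf $0 "\n"               # remember the line
            if ($0 ~ pat) hit = 1           # note a match anywhere in it
        }
        END { flush() }                     # last paragraph may lack a blank line
    ' "$@"
}
```

Usage would be along the lines of "para_buf pattern file".  Unlike the
editor outline, this never moves backward; it trades that for holding
one paragraph in memory at a time.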
maart@cs.vu.nl (Maarten Litmaath) (07/18/89)
rostamia@umbc3.UMBC.EDU (Dr. Rouben Rostamian) writes:
\I need a command or a script that searches a text file for a given
\word or pattern and prints out all paragraphs that contain that word
\or pattern.  Paragraphs are blocks of text separated by one or
\more blank lines.

----------8<----------8<----------8<----------8<----------8<----------
#!/bin/sh

test $# = 0 && {
	echo "Usage: `basename $0` <pattern> [<files>]" >&2
	exit 1
}

pattern=`echo "$1" | sed 's-/-\\\\/-g'`	# quote slashes in the pattern
shift

tab="`ctrl I`"		# if you don't have `ctrl': tab="	" (a hard tab)
empty="^[ $tab]*$"	# the pattern of an `empty' line

SED="
	: gap
	/$empty/!b para
	: gap1
	n
	b gap
	: para
	/$pattern/b found
	H
	n
	/$empty/!b para
	: cleanup
	s/.*//
	x
	b gap1
	: found
	H
	\${
		g
		p
	}
	n
	/$empty/!b found
	g
	p
	b cleanup
"

sed -n "$SED" $*
----------8<----------8<----------8<----------8<----------8<----------
-- 
"... a lap-top Cray-2 with builtin    |Maarten Litmaath @ VU Amsterdam:
cold fusion power supply" (Colin Dente) |maart@cs.vu.nl, mcvax!botter!maart
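The script above collects each paragraph in sed's hold space and prints
it once a match has been seen.  A much shorter variant of the same
hold-space idea is sketched below.  It is not Maarten's script and it
behaves differently: only completely empty lines count as paragraph
breaks, each printed paragraph is preceded (not followed) by a blank
line, and the pattern is assumed to contain no "/".  The function name
sed_para is made up for illustration, and the one-line "{H;$!d;}"
grouping is assumed to be accepted by the local sed (GNU and BSD sed
accept it; older versions may want the commands on separate lines).

```shell
# sed_para PATTERN [FILE...]   (hypothetical helper, not from the thread)
sed_para() {
    pat=$1; shift
    # /./{H;$!d;}  non-empty line: append it to the hold space and, unless
    #              it is the last line of input, start the next cycle
    # x            empty line (or last line): swap in the saved paragraph
    # /pat/!d      drop it unless it matches; otherwise sed prints it
    sed -e '/./{H;$!d;}' -e "x;/$pat/!d" "$@"
}
```

Usage would be along the lines of "sed_para pattern file".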
donn@mcgp1.UUCP (Donn F Pedro) (07/18/89)
In article <2180@umbc3.UMBC.EDU> rostamia@umbc3.UMBC.EDU (Dr. Rouben Rostamian) writes:
>I need a command or a script that searches a text file for a given
>word or pattern and prints out all paragraphs that contain that word
>or pattern.  Paragraphs are blocks of text separated by one or
>more blank lines.

Try this:

	nawk '
	BEGIN { RS=""
		ORS="\n\n" }
	/pattern/
	' file

Works for me on: UNIX system V release 3.1.1

Donn F Pedro ....................a.k.a. mcgp1!donn@Thalatta.COM
else:  {the known world}!uunet!nwnexus!thebes!mcgp1!donn
----------------------------------------------------------------
You talk the talk.  Do you walk the walk?
ted@eslvcr.UUCP (Ted Powell) (07/20/89)
In article <10540@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <2180@umbc3.UMBC.EDU> rostamia@umbc3.UMBC.EDU (Dr. Rouben Rostamian) writes:
>>I need a command or a script that searches a text file for a given
>>word or pattern and prints out all paragraphs that contain that word
>>or pattern.  Paragraphs are blocks of text separated by one or
>>more blank lines.
>
>It's pretty hard to do this with standard UNIX text-file utilities,
>because most of them work on a line-at-a-time basis.  That means when
>you find the pattern, it's too late to output the preceding lines.

Use awk!  See section 3.4 Multiline Records in:

	The AWK Programming Language
	Aho, Alfred V., Kernighan, Brian W., Weinberger, Peter J.
	Addison-Wesley Series in Computer Science
	ISBN 0-201-07981-X

Given input data with paragraphs separated by blank lines, the following
passes those paragraphs containing "New York" (example taken from page 83):

	... | awk 'BEGIN { RS = ""; ORS = "\n\n" }; /New York/' | ...

Setting RS (input Record Separator) to null makes awk take everything
between successive blank lines as a record.  Setting ORS (Output Record
Separator) to two newlines gives a blank line between output records.

The example could also be done as:

	awk '
	BEGIN { RS = ""; ORS = "\n\n" }
	/New York/
	' input-file >output-file

or the program can be hidden away in a file ( -f progfile ).

Patterns can be _very_ complex, and you can have multiple patterns with
corresponding actions.  In the example, the action is unspecified, and
defaults to outputting the current record.

If you don't have access to the book, see the man page.  Note that in
SVR3/386 (and possibly other flavours) there is AWK(1) and NAWK(1)
(New AWK).  (Old awk is being kept around for a while, presumably for
compatibility reasons.)  The book corresponds to NAWK(1).  At least in
SVR3/386, awk/nawk come with the basic system.

If you haven't ever used awk, give it a try.
If you haven't read the book, check it out -- it has all kinds of useful
examples in a surprisingly wide range of fields.
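To make the RS/ORS explanation concrete, here is a minimal sketch that
wraps the same one-liner so the pattern is passed as an argument instead
of being typed into the program.  It assumes an awk that supports -v
(nawk or any POSIX awk); the function name paragrep and the variable pat
are made up for illustration.

```shell
# paragrep PATTERN [FILE...]   (hypothetical helper, not from the thread)
paragrep() {
    pat=$1; shift
    # RS = ""       paragraph mode: runs of empty lines separate records
    # ORS = "\n\n"  print a blank line after each output record
    # $0 ~ pat      no action given, so matching records are printed
    awk -v pat="$pat" 'BEGIN { RS = ""; ORS = "\n\n" } $0 ~ pat' "$@"
}
```

For instance,

	printf 'Albany\nNew York\n\nBoston\n\n\nNew York City\n' | paragrep 'New York'

would pass the first and third paragraphs and drop the second.  Note
that in paragraph mode only truly empty lines separate records; a line
holding just spaces or tabs stays inside its paragraph.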