lacey@batcomputer.tn.cornell.edu (John Lacey) (07/17/89)
As regards the question of finding paragraphs in text which
contain a particular word, I sent the following reply directly
to the asker of the question. But then I saw the reply that no Unix
utility could handle this, and I have to disagree. Awk will handle
this case with no problem. Certainly the Awk solution is much nicer
than the previous proposal.
----------------
Awk is what you want in this case. Try something like this:
awk 'BEGIN { FS = ""; RS = "\n"} /the-word-here/' the-filename-here
Awk is a series of pattern-action pairs. Whenever text matching the pattern
is recognized, the associated action is taken. BEGIN is a special pattern
that matches exactly once, before the input file is read. END is the
related pattern for after a file has reached EOF.
FS is the field separator, RS is the record separator. So, we set RS to
a newline to make each paragraph (separated by a blank line) a different
record. Then, we search for the word in question. Patterns in Awk are
egrep-type regular expressions, bounded by /'s. I left off the action,
to save space. Any missing action is taken to be a print-the-record.
You can do this explicitly with a print command.
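
The pattern-action structure described above can be sketched as a small script. This is only an illustration under stated assumptions: the sample text, the word "test", and the function name show_paragraphs are made up, and the separators are the values given as the correction later in this thread (FS = "\n", RS = ""), not the ones in the command above.

```shell
#!/bin/sh
# Hypothetical sample input: two paragraphs separated by a blank line.
sample=$(mktemp)
printf 'This is a test.\nEnd of paragraph.\n\nDone.\n' > "$sample"

# BEGIN runs once before any input is read. RS = "" selects paragraph
# mode (records are separated by blank lines) and FS = "\n" makes each
# line of a paragraph one field.
# The middle rule prints every record that matches the pattern, using an
# explicit print, and counts the matches.
# END runs once after the last record has been read.
show_paragraphs() {
    awk 'BEGIN  { FS = "\n"; RS = "" }
         /test/ { print; hits++ }
         END    { print hits + 0, "matching paragraph(s)" }' "$1"
}

show_paragraphs "$sample"
rm -f "$sample"
```

Dropping the `{ print; hits++ }` action leaves a bare pattern, which prints the matching record by default.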
Awk is a lovely language. I write a lot of one liners like this, and
I also use it to write reasonably large applications (including a small
relational database).
If you don't have awk documentation around, there is a book by Aho,
Kernighan, and Weinberger (A, W, K) called, appropriately, The
AWK Programming Language, that explains the whole thing.
Good luck, and cheers,
--
John Lacey | Internet: lacey@tcgould.tn.cornell.edu
running unattached | BITnet: lacey@crnlthry
| UUCP: cornell!batcomputer!lacey
"Whereof one cannot speak, thereof one must remain silent." ---Wittgenstein
ip@me.utoronto.ca (Bevis Ip) (07/18/89)
In article <10545@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <8421@batcomputer.tn.cornell.edu> lacey@tcgould.tn.cornell.edu (John Lacey) writes:
>>Awk is what you want in this case. Try something like this:
>> awk 'BEGIN { FS = ""; RS = "\n"} /the-word-here/' the-filename-here

That is a problem. But, Doug, I'm kind of surprised at the comment you made
in your last article though. Try this, it is simplified from something that
I wrote to search bibliographies without having to use indxbib. I haven't
checked, but I think the hold buffer in sed is dynamically expanded, so
paragraph size might not be a problem in most implementations.

bevis

[ To the original poster: I wasn't too careful when I cut my script to you;
here's the correct one. ]
--------
#!/bin/sh
# Each argument is a pattern; a paragraph is printed only if it matches all.
for i
do
	SEARCH="$SEARCH -e /$i/!b"
done
# Collect lines in the hold buffer; test it at a blank line or at EOF.
sed -n -e '/^$/b gotcha' -e H -e '$b gotcha' -e b \
	-e :gotcha -e x $SEARCH -e p
--
Bevis Ip <> ip@me.toronto.edu, ip@me.utoronto.ca
University of Toronto <> {pyramid,uunet}!utai!me!ip
Mechanical Engineering <> {allegra,decwrl}!utcsri!me!ip
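
For readers who find the chain of -e options hard to follow, here is one way the same sed logic might be laid out, one command per line with comments. It is a sketch, not the poster's script: the function name para_grep is hypothetical, it takes a single pattern instead of several, and it assumes the pattern contains no "/" characters.

```shell
#!/bin/sh
# para_grep WORD: read text on stdin, print the paragraphs containing WORD.
para_grep() {
    sed -n '
        # A blank line ends the paragraph collected so far.
        /^$/b gotcha
        # Any other line is appended to the hold buffer.
        H
        # The last line of input also ends a paragraph.
        $b gotcha
        # Not at a paragraph end yet, so start the next cycle.
        b
        :gotcha
        # Swap buffers: the pattern space now holds the paragraph.
        x
        # Skip the paragraph unless it contains the word.
        /'"$1"'/!b
        p
    '
}

printf 'This is a test.\nEnd of paragraph.\n\nDone.\n' | para_grep test
```

Because H puts a newline in front of every line it appends, each printed paragraph starts with an empty line, which conveniently separates consecutive matches.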
gwyn@smoke.BRL.MIL (Doug Gwyn) (07/18/89)
In article <8421@batcomputer.tn.cornell.edu> lacey@tcgould.tn.cornell.edu (John Lacey) writes:
>Awk is what you want in this case. Try something like this:
> awk 'BEGIN { FS = ""; RS = "\n"} /the-word-here/' the-filename-here

$ awk 'BEGIN { FS = ""; RS = "\n"} /test/' > foo
This isn't it. Try again.
The requirement is to print the whole paragraph.

This is a test.
End of paragraph.

Done.
^D
$ cat foo
This is a test.
$
lacey@batcomputer.tn.cornell.edu (John Lacey) (07/18/89)
In article <10545@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <8421@batcomputer.tn.cornell.edu> lacey@tcgould.tn.cornell.edu (John Lacey) writes:
>>Awk is what you want in this case. Try something like this:
>> awk 'BEGIN { FS = ""; RS = "\n"} /the-word-here/' the-filename-here
>
> [A nice demonstration that this doesn't work.]

Yes, I mistyped the line. It should be { FS = "\n"; RS = "" }. I _did_ say
"something like this" :-).

But, hey, Doug, why didn't you just post a fix to this? [ :-) ] It's a simple
typo, and you took the time to test it and post a demonstration that it
didn't work.

Anyhow, switching the values of the field and record separators will fix
this "code" right up. By the way, the default values for Awk are FS = " ",
and RS = "\n".

Cheers,
--
John Lacey | Internet: lacey@tcgould.tn.cornell.edu
running unattached | BITnet: lacey@crnlthry
| UUCP: cornell!batcomputer!lacey
"Whereof one cannot speak, thereof one must remain silent." ---Wittgenstein
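
The difference between the default separators and the swapped ones can be seen by counting records. A minimal sketch, assuming a made-up sample file and two hypothetical helper functions:

```shell
#!/bin/sh
# Hypothetical sample input: two paragraphs separated by a blank line.
sample=$(mktemp)
printf 'This is a test.\nEnd of paragraph.\n\nDone.\n' > "$sample"

# Defaults (FS = " ", RS = "\n"): every line is a record,
# including the blank one.
default_records() {
    awk 'END { print NR }' "$1"
}

# Swapped values from the fix: every paragraph is a record.
paragraph_records() {
    awk 'BEGIN { FS = "\n"; RS = "" } END { print NR }' "$1"
}

echo "records with defaults:       $(default_records "$sample")"
echo "records with FS=\\n, RS=\"\":   $(paragraph_records "$sample")"
rm -f "$sample"
```

With the defaults a pattern is tested against one line at a time, which is why the mistyped command printed only the matching line.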
gwyn@smoke.BRL.MIL (Doug Gwyn) (07/19/89)
In article <89Jul17.172655edt.19593@me.utoronto.ca> ip@me.utoronto.ca (Bevis Ip) writes:
>But, Doug, I'm kind of surprised at the comment you made ...

Frankly, I didn't realize there was any way to get "sed" to do this task.
It's more programmable than I thought.