lacey@batcomputer.tn.cornell.edu (John Lacey) (07/17/89)
As regards the question of finding paragraphs in text which contain a particular word, I sent the following reply directly to the asker of the question. But then I saw the reply that no Unix utility could handle this, and I have to disagree. Awk will handle this case with no problem. Certainly the Awk solution is much nicer than the previous proposal.

----------------

Awk is what you want in this case. Try something like this:

	awk 'BEGIN { FS = ""; RS = "\n"} /the-word-here/' the-filename-here

An Awk program is a series of pattern-action pairs. Whenever text matching the pattern is recognized, the associated action is taken. BEGIN is a special pattern that matches exactly once, before the input file is read. END is the related pattern for after a file has reached EOF.

FS is the field separator, RS is the record separator. So, we set RS to a newline to make each paragraph (separated by a blank line) a different record. Then, we search for the word in question. Patterns in Awk are egrep-type regular expressions, bounded by /'s.

I left off the action, to save space. Any missing action is taken to be a print-the-record. You can do this explicitly with a print command.

Awk is a lovely language. I write a lot of one-liners like this, and I also use it to write reasonably large applications (including a small relational database). If you don't have awk documentation around, there is a book by Aho, Kernighan, and Weinberger (A, W, K) called, appropriately, The AWK Programming Language, that explains the whole thing.

Good luck, and cheers,
-- 
John Lacey                   |  Internet:  lacey@tcgould.tn.cornell.edu
running unattached           |  BITnet:    lacey@crnlthry
                             |  UUCP:      cornell!batcomputer!lacey
"Whereof one cannot speak, thereof one must remain silent."  ---Wittgenstein
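The pattern-action structure described in this post can be sketched as below. This is only an illustration: the function name count_matches, the word "test", and the sample input are assumptions, not part of the original post. The sketch leaves FS and RS at their defaults, so each record is a single line.

```shell
# count_matches [FILE...]: hypothetical helper showing three pattern-action pairs.
# With no FILE arguments, awk reads standard input.
count_matches() {
    awk '
        # BEGIN matches once, before any input is read.
        BEGIN  { n = 0 }
        # A regular-expression pattern with an explicit print action.
        /test/ { print; n++ }
        # END matches once, after the last record.
        END    { print n " matching record(s)" }
    ' "$@"
}

# Illustrative input: two of the three lines contain "test".
printf 'This is a test.\nNothing here.\nAnother test line.\n' | count_matches
```

Dropping the braces after /test/ would leave only the pattern, and awk would then print each matching record by default, as the post says.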
ip@me.utoronto.ca (Bevis Ip) (07/18/89)
In article <10545@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <8421@batcomputer.tn.cornell.edu> lacey@tcgould.tn.cornell.edu (John Lacey) writes:
>>Awk is what you want in this case. Try something like this:
>>	awk 'BEGIN { FS = ""; RS = "\n"} /the-word-here/' the-filename-here

That is a problem.

But, Doug, I'm kind of surprised at the comment you made in your last article though. Try this; it is simplified from something that I wrote to search bibliographies without having to use indxbib. I haven't checked, but I think the hold buffer in sed is dynamically expanded, so paragraph size might not be a problem in most implementations.

bevis

[ To the original poster: I wasn't too careful when I cut my script to you; here's the correct one. ]

--------
#!/bin/sh
# Build one "/word/!b" sed command per argument.
for i
do
	SEARCH="$SEARCH -e /$i/!b"
done
# Collect each paragraph in the hold space; print it if every word matches.
sed -n -e '/^$/b gotcha' -e H -e '$b gotcha' -e b \
	-e :gotcha -e x $SEARCH -e p
-- 
Bevis Ip                <> ip@me.toronto.edu, ip@me.utoronto.ca
University of Toronto   <> {pyramid,uunet}!utai!me!ip
Mechanical Engineering  <> {allegra,decwrl}!utcsri!me!ip
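For readers who want to follow the sed approach step by step, here is a sketch of the same idea wrapped in a shell function. The name para_sed and the sample input are hypothetical. Unlike the posted script, the sketch starts from an empty command list rather than whatever SEARCH holds in the environment. It assumes patterns without spaces or shell glob characters, since the command list is expanded unquoted.

```shell
# para_sed WORD...: print each paragraph on standard input that matches
# every WORD (each WORD is used as a sed regular expression).
para_sed() {
    search=
    for pat in "$@"; do
        # "/pat/!b" means: if the paragraph does NOT match, branch to the
        # end of the script, which skips the final p (print).
        search="$search -e /$pat/!b"
    done
    # Per input line:
    #   blank line  -> jump to gotcha (the paragraph is complete)
    #   otherwise   -> H appends the line to the hold space
    #   last line   -> jump to gotcha as well
    #   else        -> b ends the cycle; -n suppresses printing
    # At gotcha, x swaps the collected paragraph into the pattern space
    # (and resets the hold space), the filters run, and p prints.
    sed -n -e '/^$/b gotcha' -e H -e '$b gotcha' -e b \
        -e :gotcha -e x $search -e p
}

# Illustrative input: only the middle paragraph contains "test".
printf 'alpha one\nbeta\n\nthis is a test\nsecond line\n\ndone\n' | para_sed test
```

Because H always inserts a newline before the appended line, each printed paragraph starts with an empty line, which keeps successive matches visually separated.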
gwyn@smoke.BRL.MIL (Doug Gwyn) (07/18/89)
In article <8421@batcomputer.tn.cornell.edu> lacey@tcgould.tn.cornell.edu (John Lacey) writes:
>Awk is what you want in this case. Try something like this:
>	awk 'BEGIN { FS = ""; RS = "\n"} /the-word-here/' the-filename-here

$ awk 'BEGIN { FS = ""; RS = "\n"} /test/' > foo
This isn't it.  Try again.
The requirement is to print the whole paragraph.

This is a test.
End of paragraph.

Done.
^D
$ cat foo
This is a test.
$
lacey@batcomputer.tn.cornell.edu (John Lacey) (07/18/89)
In article <10545@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <8421@batcomputer.tn.cornell.edu> lacey@tcgould.tn.cornell.edu (John Lacey) writes:
>>Awk is what you want in this case. Try something like this:
>>	awk 'BEGIN { FS = ""; RS = "\n"} /the-word-here/' the-filename-here
>
> [A nice demonstration that this doesn't work.]

Yes, I mistyped the line. It should be { FS = "\n"; RS = "" }. I _did_ say "something like this" :-).

But, hey, Doug, why didn't you just post a fix to this? [ :-) ] It's a simple typo, and you took the time to test it and post a demonstration that it didn't work.

Anyhow, switching the values of the field and record separators will fix this "code" right up. By the way, the default values for Awk are FS = " ", and RS = "\n".

Cheers,
-- 
John Lacey                   |  Internet:  lacey@tcgould.tn.cornell.edu
running unattached           |  BITnet:    lacey@crnlthry
                             |  UUCP:      cornell!batcomputer!lacey
"Whereof one cannot speak, thereof one must remain silent."  ---Wittgenstein
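With the separators switched as described in this post (RS empty, FS a newline), the one-liner can be sketched as a small shell function. The name para_grep and the sample text are hypothetical. The -v option assumes a POSIX-style awk (nawk, gawk, mawk) rather than the oldest awk, and setting ORS is an addition beyond the posted fix, made so that printed paragraphs stay separated by a blank line.

```shell
# para_grep WORD [FILE...]: print every blank-line-separated paragraph
# that matches WORD (treated as an awk regular expression).
# With no FILE arguments, awk reads standard input.
para_grep() {
    word=$1
    shift
    awk -v word="$word" '
        # An empty RS puts awk in paragraph mode: records are separated
        # by one or more blank lines. FS = "\n" makes each line a field.
        BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
        # Pattern with no action: print the whole matching record.
        $0 ~ word
    ' "$@"
}

# Illustrative input: only the middle paragraph contains "test".
para_grep test <<'EOF'
alpha one
beta

this is a test
second line

done
EOF
```

Here the whole two-line paragraph is printed, not just the line containing the word, which is the behaviour the original question asked for.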
gwyn@smoke.BRL.MIL (Doug Gwyn) (07/19/89)
In article <89Jul17.172655edt.19593@me.utoronto.ca> ip@me.utoronto.ca (Bevis Ip) writes:
>But, Doug, I'm kind of surprised at the comment you made ...

Frankly, I didn't realize there was any way to get "sed" to do this task. It's more programmable than I thought.