lm@slovax.Eng.Sun.COM (Larry McVoy) (05/09/91)
matthew@gizmo.UK.Sun.COM (Matthew Buller - Sun EHQ - MIS) writes:
> problem: to extract text between start and end patterns in a file
> eg:-
>
> file:
>
> pattern1---
>
> stuff
> stuff
> stuff
>
> pattern2---

/bin/sh, usage shellscript start_pat stop_pat [files...]

    START=$1; shift
    STOP=$1; shift
    PRINT=
    cat $* | while read x
    do
        if [ "$x" = "$STOP" ]
        then
            exit 0;
        fi
        if [ "$x" = "$START" ]
        then
            PRINT=yes
            continue
        fi
        if [ X$PRINT != X ]
        then
            echo "$x";
        fi
    done

/bin/perl, same usage (see the notes on the ".." operator, cool thingy).

    $START = shift;
    $STOP = shift;
    while (<>) {
        if (/^$START$/../^$STOP/) {
            next if /^$START$/;     # skip starting pattern
            last if /^$STOP/;       # done if last;
            print;
        }
    }
---
Larry McVoy, Sun Microsystems (415) 336-7627 ...!sun!lm or lm@sun.com
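A hedged variant of the shell loop above, assuming a POSIX shell rather than the 1991 /bin/sh: `IFS= read -r` keeps leading whitespace and backslashes intact, `printf` sidesteps echo's treatment of lines such as `-n`, and the stop test only applies once the start line has been seen. The function name `between` is made up for illustration. Like the original, it compares whole lines literally rather than as patterns.

```shell
# between START STOP [files...]
# Print the lines strictly between the first line equal to START
# and the next line equal to STOP (literal whole-line comparison).
# With no file arguments, standard input is read.
between() {
    start=$1
    stop=$2
    shift 2
    printing=
    cat -- "$@" | while IFS= read -r line
    do
        if [ -n "$printing" ]
        then
            # Inside the block: quit at the end marker, otherwise print.
            [ "$line" = "$stop" ] && break
            printf '%s\n' "$line"
        elif [ "$line" = "$start" ]
        then
            # Start marker seen: begin printing from the next line.
            printing=yes
        fi
    done
}
```

Usage would be along the lines of `between 'pattern1---' 'pattern2---' data_file > new_file`.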
toma@swsrv1.cirr.com (Tom Armistead) (05/09/91)
In article <6686@male.EBay.Sun.COM> matthew@gizmo.UK.Sun.COM (Matthew Buller - Sun EHQ - MIS) writes:
>
>I am fairly new to unix, and I have a minor question:-
>problem: to extract text between start and end patterns in a file
>eg:-
>
>file:
>
>pattern1---
>
>stuff
>stuff
>stuff
>
>pattern2---
>
>How do I write a short script (preferably /bin/sh) to extract the information
>between the start and end patterns (pattern1/pattern2) into a file.
>
>I have tried to grok the man page for `sed' but no luck.
>
>Any help would be appreciated.
>
>Tnx
>Matt

You could do this with sed.

$ sed -n '/^pattern1---$/,/^pattern2---$/p' < data_file

One problem with this is that it prints out the start and end parameters. You
may be able to tell SED not to do this, but I don't know how. So I use egrep.

$ sed -n '/^pattern1---$/,/^pattern2---$/p' < data_file | \
    egrep -v '^pattern1---$|^pattern2---$'

Tom
--
Tom Armistead - Software Services - 2918 Dukeswood Dr. - Garland, Tx 75040
===========================================================================
toma@swsrv1.cirr.com {egsner,letni,ozdaltx,void}!swsrv1!toma
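For what it's worth, sed can drop the endpoints itself: inside the range, delete the two marker lines and print whatever is left. A minimal sketch, reusing the marker regexes from the post; the wrapper name `sed_between` is only there for illustration. It reads standard input and prints every such block, not just the first.

```shell
# Print what lies between the marker lines, markers excluded.
sed_between() {
    sed -n '/^pattern1---$/,/^pattern2---$/{
/^pattern1---$/d
/^pattern2---$/d
p
}'
}
```

Called as `sed_between < data_file > new_file`, this replaces the sed-plus-egrep pipeline with a single process.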
tchrist@convex.COM (Tom Christiansen) (05/09/91)
From the keyboard of lm@slovax.Eng.Sun.COM (Larry McVoy):
:matthew@gizmo.UK.Sun.COM (Matthew Buller - Sun EHQ - MIS) writes:
:> problem: to extract text between start and end patterns in a file
:> eg:-
:>
:> file:
:>
:> pattern1---
:>
:> stuff
:> stuff
:> stuff
:>
:> pattern2---
:
:/bin/sh, usage shellscript start_pat stop_pat [files...]

ug. A shell solution is obscene. :-) I don't know how to do it in sed.
An awk solution would have made certain others happy, but wouldn't have
been so nifty.

> /bin/perl, same usage (see the notes on the ".." operator, cool thingy).

But since we do happen to be on the perl topic...

> $START = shift;
> $STOP = shift;
> while (<>) {
>     if (/^$START$/../^$STOP/) {
>         next if /^$START$/;     # skip starting pattern
>         last if /^$STOP/;       # done if last;
>         print;
>     }
> }

The following code should be faster because it's got fewer regexp
compiles. The /o is to tell perl to compile the pattern only once. It
also uses the fact that .. returns the sequence number, and that the last
in the sequence has an E0 appended to it, for example making 144 be seen
as 144E0, which is the same numerically, but you can do string or pattern
operations on it.

    $START = shift;
    $STOP = shift;
    while (<>) {
        if ( $which = /^$START$/o .. /^$STOP$/o ) {
            next if $which == 1;
            last if $which =~ /E/;
            print;
        }
    }

or maybe instead of the next/last pair of lines, just

    next if $which =~ /^1$|E/;

if they want all instances in the stream extracted.

--tom
--
Tom Christiansen tchrist@convex.com convex!tchrist
"So much mail, so little time."
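For the record, the awk solution alluded to above might look something like the sketch below. A flag variable plays the role that the state of perl's `..` operator plays in the code above. The function name `awk_between` and the variable names are invented for illustration, and note that `awk -v` interprets backslash escapes in the values it is given, so patterns containing backslashes need extra care.

```shell
# awk_between START_REGEX STOP_REGEX
# Print the lines between each start/stop pair, endpoints excluded.
# Reads standard input.
awk_between() {
    awk -v start="$1" -v stop="$2" '
        inside && $0 ~ stop { inside = 0; next }   # end marker: leave the block
        inside              { print; next }        # body line: print it
        $0 ~ start          { inside = 1 }         # start marker: enter the block
    '
}
```

To stop after the first block, the `inside = 0; next` action can be replaced with `exit`.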
lewis@tramp.Colorado.EDU (LEWIS WILLIAM M JR) (05/09/91)
In article <574@appserv.Eng.Sun.COM> lm@slovax.Eng.Sun.COM (Larry McVoy) writes:
>matthew@gizmo.UK.Sun.COM (Matthew Buller - Sun EHQ - MIS) writes:
>> problem: to extract text between start and end patterns in a file
... more problem description
>/bin/sh, usage shellscript start_pat stop_pat [files...]
>
... complex shell and perl programs to do

    sed -n '/pattern1/,/pattern2/p' source_file > new_file
bharat@computing-maths.cardiff.ac.uk (Bharat Mediratta) (05/09/91)
In article <1991May8.233803.4485@swsrv1.cirr.com> toma@swsrv1.cirr.com (Tom Armistead) writes:
>In article <6686@male.EBay.Sun.COM> matthew@gizmo.UK.Sun.COM (Matthew Buller - Sun EHQ - MIS) writes:
>>
>>I am fairly new to unix, and I have a minor question:-
>>problem: to extract text between start and end patterns in a file
>>eg:-
>>
>>file:
>>
>>pattern1---
>>
>>stuff
>>stuff
>>stuff
>>
>>pattern2---
>
>You could do this with sed.
>
>$ sed -n '/^pattern1---$/,/^pattern2---$/p' < data_file
>
>One problem with this is that it prints out the start and end parameters. You
>may be able to tell SED not to do this, but I don't know how. So I use egrep.
>
>$ sed -n '/^pattern1---$/,/^pattern2---$/p' < data_file | \
>    egrep -v '^pattern1---$|^pattern2---$'

Well, if the patterns only occur once in the file, here's a simple sed
solution:

    sed -e '1,/^pattern1---$/d' -e '/^pattern2---$/,$d' < data_file

As you can see, it deletes all the stuff up to (and including) the first
pattern, and then all the stuff from the second pattern (inclusive) to the
end of the file. If you have multiple recurrences of this in the file, you
only get the first one.

--
|  Bharat Mediratta  | JANET: bharat@cm.cf.ac.uk                                |
+--------------------+ UUNET: bharat%cm.cf.ac.uk%cunyvm.cuny.edu@uunet.uucp     |
|On a clear disk...  | uk.co: bharat%cm.cf.ac.uk%cunyvm.cuny.edu%uunet.uucp@ukc |
|you can seek forever| UUCP: ...!uunet!cunym.cuny.edu!cm.cf.ac.uk!bharat        |
tchrist@convex.COM (Tom Christiansen) (05/10/91)
From the keyboard of lewis@tramp.Colorado.EDU (LEWIS WILLIAM M JR):
:In article <574@appserv.Eng.Sun.COM> lm@slovax.Eng.Sun.COM (Larry McVoy) writes:
:>matthew@gizmo.UK.Sun.COM (Matthew Buller - Sun EHQ - MIS) writes:
:>> problem: to extract text between start and end patterns in a file
:... more problem description
:>/bin/sh, usage shellscript start_pat stop_pat [files...]
:>
:... complex shell and perl programs to do
:
:    sed -n '/pattern1/,/pattern2/p' source_file > new_file

nope -- you included the endpoints.

i didn't see the original posting, so i don't know whether it's possible
to have multiple sets of /pat1/,/pat2/ areas in the file. if so, the

    1,/pat1/d
    /pat2/,$d

posting i just saw won't work.

followups have been redirected. this isn't particularly wizardly.

--tom
--
Tom Christiansen tchrist@convex.com convex!tchrist
"So much mail, so little time."
lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (05/10/91)
In article <1991May9.153351.1754@colorado.edu> lewis@tramp.Colorado.EDU (LEWIS WILLIAM M JR) writes:
: In article <574@appserv.Eng.Sun.COM> lm@slovax.Eng.Sun.COM (Larry McVoy) writes:
: >matthew@gizmo.UK.Sun.COM (Matthew Buller - Sun EHQ - MIS) writes:
: >> problem: to extract text between start and end patterns in a file
: ... more problem description
: >/bin/sh, usage shellscript start_pat stop_pat [files...]
: >
: ... complex shell and perl programs to do
:
:     sed -n '/pattern1/,/pattern2/p' source_file > new_file

No, that's not what those programs were trying to do. (Admittedly, the
original spec was unclear.) The other programs were attempting to omit
the endpoints, taking "between" to mean exclusion of said endpoints. Some
of them were also trying to snag only the text between the first pair of
patterns. Some were allowing for the patterns to be passed in as
arguments.

Here's the perl equivalent of what you said:

    perl -ne 'print if /pattern1/../pattern2/' source_file >new_file

When using Perl to do the other thing, I personally prefer a
straightforward approach:

    #!/usr/bin/perl
    while (<>) {
        last if /pattern1/;
    }
    while (<>) {
        exit if /pattern2/;
        print;
    }

For hardwired patterns this will generally beat sed. (Especially if sed
is stupid enough to read the rest of the input file.) Parameterized
patterns can get the same performance using eval:

    #!/usr/bin/perl
    $pattern1 = shift;
    $pattern2 = shift;
    eval <<"END";
        while (<>) {
            last if /$pattern1/;
        }
        while (<>) {
            exit if /$pattern2/;
            print;
        }
    END

Larry Wall
lwall@netlabs.com
marc@mercutio.ultra.com (Marc Kwiatkowski {Host Software-AIX}) (05/16/91)
In article <1991May9.185503.325@jpl-devvax.jpl.nasa.gov> lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) writes:
> In article <1991May9.153351.1754@colorado.edu> lewis@tramp.Colorado.EDU (LEWIS WILLIAM M JR) writes:
> : In article <574@appserv.Eng.Sun.COM> lm@slovax.Eng.Sun.COM (Larry McVoy) writes:
> : >matthew@gizmo.UK.Sun.COM (Matthew Buller - Sun EHQ - MIS) writes:
> : >> problem: to extract text between start and end patterns in a file
> : ... more problem description
> : >/bin/sh, usage shellscript start_pat stop_pat [files...]
> : >
> : ... complex shell and perl programs to do
> :
> :     sed -n '/pattern1/,/pattern2/p' source_file > new_file

> Here's the perl equivalent of what you said:
>
>     perl -ne 'print if /pattern1/../pattern2/' source_file >new_file
>
> When using Perl to do the other thing, I personally prefer a straightforward
> approach:

Ahh. In grand c.u.w tradition the lesser sin of a non-wizardly question
is met with the greater sin of slightly-correct to downright wrong
answers. I know this isn't the right newsgroup, but the poster's question
hasn't been answered. The sed suggestions are all wet. The perl one will
work and in terms of execution and readability is probably the best, but
the original poster stated that he preferred an answer for /bin/sh. I am
surprised no one suggested something like the following answer:

    cat foo | sed -n '
    :lbl00
    /pattern00/ {
    :lbl01
    n
    /pattern01/ {
    b lbl00
    }
    p
    b lbl01
    }'

The above will filter multiple instances of /pattern00/..../pattern01/.
If only one is desired, replace 'b lbl00' with 'q'. Note follow-up.

sed, a utility more sinned against than sinning.
--
------------------------------------------------------------------
Marc P. Kwiatkowski                     Ultra Network Technologies
Internet: marc@ultra.com                101 Daggett Drive
uucp: ...!ames!ultra!marc               San Jose, CA 95134 USA
telephone: 408 922 0100 x249
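One way to try both forms of that sed script on sample data is to wrap them in shell functions reading standard input, as sketched here. The function names `all_blocks` and `first_block` are invented for this example; the sed commands are the ones from the post, with `q` substituted in the second function as the post suggests. In the script as laid out in a file, each label and each closing brace sits on its own line, which is what older seds expect.

```shell
# Every /pattern00/ ... /pattern01/ block, endpoints excluded.
all_blocks() {
    sed -n '
:lbl00
/pattern00/ {
:lbl01
n
/pattern01/ {
b lbl00
}
p
b lbl01
}'
}

# Same script with "b lbl00" replaced by "q": stop after the first block,
# so the rest of the input is never read.
first_block() {
    sed -n '
:lbl00
/pattern00/ {
:lbl01
n
/pattern01/ {
q
}
p
b lbl01
}'
}
```

For example, `first_block < foo > new_file` extracts only the first block and also drops the extra `cat`.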