[comp.editors] sed output

dattier@ddsw1.MCS.COM (David W. Tamkin) (02/21/91)

Hmmm...this is knotty to explain...  I'm running sed in a for loop (in sh)
on several input files.  Let's call them A, B, C, D, and E.  The sed script
should send output to three places.  If I made a separate output file for
each one, I'd finish with fifteen output files: Ax, Ay, Az, Bx, etc., through
Ez.  What I really want are a single x file, y file, and z file.

When a sed script begins, sed scans for w operators and creates all target
files needed.  If files by those names already exist, tough; the old contents
are clobbered.  So if I use "w x" and "w y" in the script and redirect sed's
stdout >>z, only z will have the proper contents; x and y will have only the
extractions from input file E.
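Concretely, what I'm running looks something like this (script.sed stands
in for the real script):

    for i in A B C D E ; do
        sed -f script.sed $i        # script.sed contains "w x" and "w y"
    done >> z
    # each pass of sed reopens and truncates x and y, so when the loop
    # finishes they hold only the extractions from E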

I can use "w $i.x" and "w $i.y" (with proper quoting for the loop variable to
be expanded) and then cat [A-E].x > x and cat [A-E].y > y afterward, but I'd
rather lengthen the sed script than add the extra forks.  Yes, I'm sure it
can all be done in awk or perl, but I don't know them, don't do this stuff
for a living, and certainly don't have the means or time or talent to learn
awk or perl.
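Spelled out, that workaround would be roughly this, with xpat and ypat
standing in for the real patterns:

    for i in A B C D E ; do
        sed -n -e "/xpat/w $i.x" -e "/ypat/w $i.y" $i
    done
    cat [A-E].x > x
    cat [A-E].y > y
    rm [A-E].[xy]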

Can sed be prevented from clobbering a file named in a w command?

David Tamkin  Box 7002  Des Plaines IL  60018-7002  708 518 6769  312 693 0591
MCI Mail: 426-1818  GEnie: D.W.TAMKIN  CIS: 73720,1570   dattier@ddsw1.mcs.com

les@chinet.chi.il.us (Leslie Mikesell) (02/26/91)

In article <1991Feb20.234921.5738@ddsw1.MCS.COM> dattier@ddsw1.MCS.COM (David W. Tamkin) writes:
>Hmmm...this is knotty to explain...  I'm running sed in a for loop (in sh)
>on several input files.  Let's call them A, B, C, D, and E.  The sed script
>should send output to three places.  If I made a separate output file for
>each one, I'd finish with fifteen output files: Ax, Ay, Az, Bx, etc., through
>Ez.  What I really want are a single x file, y file, and z file.

>When a sed script begins, sed scans for w operators and creates all target
>files needed.  If files by those names already exist, tough; the old contents
>are clobbered.  So if I use "w x" and "w y" in the script and redirect sed's
>stdout >>z, only z will have the proper contents; x and y will have only the
>extractions from input file E.

Simple suggestion:
If the input files all exist at once, let sed iterate over the list of
filenames instead of the shell.  Multiple w's within a script will
append.
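That is, something like this (with your script and file names):

    sed -f scriptfile A B C D E > z    # "w x" and "w y" open their files
                                       # once, so the extractions from all
                                       # five inputs accumulate in them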

>I can use "w $i.x" and "w $i.y" (with proper quoting for the loop variable to
>be expanded) and then cat [A-E].x > x and cat [A-E].y > y afterward, but I'd
>rather lengthen the sed script than add the extra forks.  Yes, I'm sure it
>can all be done in awk or perl, but I don't know them, don't do this stuff
>for a living, and certainly don't have the means or time or talent to learn
>awk or perl.

Ahh, but with perl, you typically only have to learn one arcane set of
commands and syntax instead of the several you need to know to write
useful shell scripts.   If you really care about reducing the number of
forks (at the expense of some start-up time), perl is the way to go.

>Can sed be prevented from clobbering a file named in a w command?

Probably not, but there is still a simple alternative.  Just do the
editing portion of the script with ex.  You probably already know
the appropriate syntax and it supports the w >>file command to explicitly
append.  As an added benefit you also get the ability to specify an
address an arbitrary number of lines above a pattern match and some
other things that are impossible in sed (at the expense of making a
temporary copy of the file).
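For instance, a fragment along these lines (the pattern and file names are
invented, and the target of >> has to exist already):

    ex - A <<'EOF'
    " append each line matching xpat onto the end of file x
    g/xpat/.w >>x
    q!
    EOF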

Les Mikesell
  les@chinet.chi.il.us

dattier@ddsw1.MCS.COM (David W. Tamkin) (02/28/91)

les@chinet.chi.il.us (Leslie Mikesell) wrote in
<1991Feb25.184548.14778@chinet.chi.il.us>:

| In article <1991Feb20.234921.5738@ddsw1.MCS.COM> dattier@ddsw1.MCS.COM
| (David W. Tamkin) writes:

| >Hmmm...this is knotty to explain...  I'm running sed in a for loop (in sh)
| >on several input files.  Let's call them A, B, C, D, and E.  The sed script
| >should send output to three places.  If I made a separate output file for
| >each one, I'd finish with fifteen output files: Ax, Ay, Az, Bx, etc., through
| >Ez.  What I really want are a single x file, y file, and z file.

| >When a sed script begins, sed scans for w operators and creates all target
| >files needed.  If files by those names already exist, tough; the old contents
| >are clobbered.  So if I use "w x" and "w y" in the script and redirect sed's
| >stdout >>z, only z will have the proper contents; x and y will have only the
| >extractions from input file E.

| If the input files all exist at once, let sed iterate over the list of
| filenames instead of the shell.  Multiple w's within a script will append.

Les, what I'm guessing you mean here is that I should use

    sed -f scriptfile infile1 infile2 ... > outfile

instead of

    for i in infile1 infile2 ... ; do sed -f scriptfile $i ; done >> outfile

I had two problems with that approach.  First, I didn't know how to make
the sed script realize that it was starting on a new input file, so that it
could treat the last line of each input file specially (given that it was
last) and treat the first line of every input file as I had instructed it
to treat the first line of the entire input.  Ideally, yes, that would have
saved forking sed again for each file of the input, but sed has no way to
address "last line before input starts coming from somewhere else" nor
"first line of a new input source."  It acts as if you had catted the files
together first.
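For instance, as far as I can tell,

    sed -n '$p' A B C D E

prints nothing but the last line of E, and a 1 address matches only the
first line of A; there is no address for the seams between files.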

Second, one of the files contains text before sed even starts.  I could
have sent stdout >>it (file z in my explanation) except for one thing: it's
the one that receives the output of a single w per pass.  The other file
(there really are only two, not three; I was illustrating before) gets the
output of all the p commands plus one P command.  Yes, I could interchange
the p's with the w, but how do I make a w out of the P without clobbering
the hold space?  Plus, at the time I first asked, the script had
print-at-end-of-cycle as the default -- no, that's an easy one: just change
all the b's to bz, stick #n at the top or -n on the command line, and end
the script with

    d
    :z
    w filename

which I didn't do anyway, because the script in its final form was run with
sed -n and the b's became p's (or sometimes p ; d to avoid dropping down
into commands that would mess up the hold space).  Actually, it's not all
that easy, because I had a lot of n's too.  Without -n in effect,
interchanging stdout and the wfile would have meant changing a simple n to

    w filename
    N
    s/.*\n//

to avoid getting shot back to the top of the sed script, as D would do.

Forking sed separately for each input file means clobbering whichever output
file gets the honor of being the w destination.  I suppose that I could just
have come up with a bunch of wfiles and concatenated the lot together
afterwards.  As it turns out I did find a way to spot the first line of each
input file and apply the same commands to it and the lines that followed as
to the corresponding lines at the beginning of the first input file, and thus
run sed in only one pass.  The single line per input file that needed to be
appended to the existing file, I w'ed to a fifo and catted >>the existing
file.  System V doesn't let sed clobber fifos, fortunately.
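In outline it was something like this, with pipe, existing, and other as
stand-in names (use mkfifo where your system has it):

    mknod pipe p                  # create the fifo
    cat pipe >> existing &        # start the reader before sed opens the fifo
    sed -n -f script.sed A B C D E > other    # script.sed w's one line
                                              # per input file to pipe
    wait                          # let cat finish draining the fifo
    rm pipe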

| Ahh, but with perl, you typically only have to learn one arcane set of
| commands and syntax instead of the several you need to know to write
| useful shell scripts.   If you really care about reducing the number of
| forks (at the expense of some start-up time), perl is the way to go.

Yes, I knew that.  It probably doesn't take all that many forks of smaller
programs before there's more overhead than for one fork of perl.  Are you
volunteering to teach me, Les?  You know where I live.

| Probably not, but there is still a simple alternative.  Just do the
| editing portion of the script with ex.  You probably already know
| the appropriate syntax and it supports the w >>file command to explicitly
| append.  As an added benefit you also get the ability to specify an
| address an arbitrary number of lines above a pattern match and some
| other things that are impossible in sed (at the expense of making a
| temporary copy of the file).

That's not so simple for this application.  There are a number of things I'm
doing in the sed script that would be really bearish in ex; the first that
comes to mind is how to resume the editing from the beginning when the next
input file is ready.  (Surely I shouldn't have to change the script to have N
copies of itself when the input contains N files.)  That brings us back to
having the shell, not the editor, do the cycling and thus forking the editor
again and again.  The second is that one thing I'm doing involves handling
the case of a pattern that doesn't show up by h'ing it when it arrives and
then, after its last chance to appear, g'ing and checking for its presence. 
In ex I'd have to do a pattern search for it, and if it isn't there, the ex
script will break; moreover, if I am trying to ex the entire input in one
fork, a pattern search while I'm working on a portion that is missing the
pattern will land on its occurrence in another portion, and then KABOOM.

This is the sort of thing that awk and perl are for.  I'm trying to cut down
a sapling with a carving knife.  It's better than a butterknife, and I don't
need a chainsaw, but a regular saw would be very nice to have use of.

David Tamkin  Box 7002  Des Plaines IL  60018-7002  708 518 6769  312 693 0591
dattier@ddsw1.mcs.com   MCI Mail: 426-1818  CIS: 73720,1570  GEnie: D.W.TAMKIN

les@chinet.chi.il.us (Leslie Mikesell) (03/02/91)

In article <1991Feb27.172236.7202@ddsw1.MCS.COM> dattier@ddsw1.MCS.COM (David W. Tamkin) writes:
[about perl]
>Yes, I knew that.  It probably doesn't take all that many forks of smaller
>programs before there's more overhead than for one fork of perl.  Are you
>volunteering to teach me, Les?  You know where I live.

I'm hardly an expert.  I just get by with two terminals on my desk so
when I get stuck I can experiment on the 2nd one without losing my
place in the problem.  Perl has the great advantage of having a usable
debugger as opposed to sed's "garbled command" or awk's "bailing out
near line _" error message (singular).  I don't object to questions
by email but I can't guarantee how long it will take to respond...

[use ex instead]
>That's not so simple for this application.  There are a number of things I'm
>doing in the sed script that would be really bearish in ex;

On the contrary - ex handles them just fine.  Sed just barely does them.

>the first that
>comes to mind is how to resume the editing from the beginning when the next
>input file is ready.  (Surely I shouldn't have to change the script to have N
>copies of itself when the input contains N files.) 

With ex, you must use an explicit 'n' to go to the next file, so you know
when it is going to happen.  Duplicating the command script is trivial once
you have it right, and you can carry text between files in the named
registers.  Back in the days before perl I would have done something like
this by having the shell generate the ex script with the proper number of
iterations, then execute it.  Since your main script can source another
file, it can just consist of:

    so real_script
    n!
    so real_script
    ...
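If the list of files varies, the shell can crank that script out.  A
sketch, with driver and real_script as placeholder names:

    for i in A B C D ; do              # every input file but the last
        echo "so real_script"
        echo "n!"
    done > driver
    echo "so real_script" >> driver    # the last file gets no n! after it
    echo "q!" >> driver
    ex - A B C D E < driver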

>That brings us back to
>having the shell, not the editor, do the cycling and thus forking the editor
>again and again.

That's pretty reasonable too, given ex's ability to w >>file (remember the
original problem).  Note that the file has to exist or the >> will fail,
but you can let the shell create an empty file to start.
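That is:

    > x                           # create the (empty) append target up front
    for i in A B C D E ; do
        ex - $i < exscript        # exscript uses w >>x wherever it appends
    done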

>The second is that one thing I'm doing involves handling
>the case of a pattern that doesn't show up by h'ing it when it arrives and
>then, after its last chance to appear, g'ing and checking for its presence. 
>In ex I'd have to do a pattern search for it, and if it isn't there, the ex
>script will break; moreover, if I am trying to ex the entire input in one
>fork, a pattern search while I'm working on a portion that is missing the
>pattern will land on its occurrence in another portion, and then KABOOM.

Use the address,address command form instead of a separate search followed
by the command.  Note that an address can be a line number or a search
pattern.  Examples:

    /foo/d a
        (delete the line containing pattern foo into register a)
    0 put a
        (put register a before the first line)
    /start/,/end/d b
        (delete the line containing pattern "start" through the line
         containing "end" into register b)
    /foo/put b
        (put register b after the line containing pattern foo)

The problem remains of failing pattern matches terminating the script
(likewise for attempting to put registers with nothing in them).
The only solution I've found is that you can source another file, and
a failure will only terminate the file being sourced, allowing the
next level up script to continue.  I suppose you could get a crude form of
flow control out of this, but it seems like there should be a settable
option to prevent aborting on errors.
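For example (with made-up names), if the top-level script reads

    so try_foo
    so rest_of_script

and try_foo contains a search that may not match, a failure aborts only
try_foo; ex carries on at the next line of the top-level script.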

>This is the sort of thing that awk and perl are for.  I'm trying to cut down
>a sapling with a carving knife.  It's better than a butterknife, and I don't
>need a chainsaw, but a regular saw would be very nice to have use of.

Ex is pretty regular.  The standard documentation doesn't begin to cover
it, though.

Les Mikesell
  les@chinet.chi.il.us