dattier@ddsw1.MCS.COM (David W. Tamkin) (02/21/91)
Hmmm...this is knotty to explain... I'm running sed in a for loop (in sh) on several input files. Let's call them A, B, C, D, and E. The sed script should send output to three places. If I made a separate output file for each one, I'd finish with fifteen output files: Ax, Ay, Az, Bx, etc., through Ez. What I really want are a single x file, y file, and z file.

When a sed script begins, sed scans for w operators and creates all target files needed. If files by those names already exist, tough; the old contents are clobbered. So if I use "w x" and "w y" in the script and redirect sed's stdout >>z, only z will have the proper contents; x and y will have only the extractions from input file E.

I can use "w $i.x" and "w $i.y" (with proper quoting for the loop variable to be expanded) and then cat [A-E].x > x and cat [A-E].y > y afterward, but I'd rather lengthen the sed script than add the extra forks.

Yes, I'm sure it can all be done in awk or perl, but I don't know them, don't do this stuff for a living, and certainly don't have the means or time or talent to learn awk or perl.

Can sed be prevented from clobbering a file named in a w command?

David Tamkin  Box 7002  Des Plaines IL 60018-7002  708 518 6769  312 693 0591
MCI Mail: 426-1818  GEnie: D.W.TAMKIN  CIS: 73720,1570  dattier@ddsw1.mcs.com
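For concreteness, the loop in question might be sketched like this (alpha, beta, and gamma stand in for the real patterns, which are longer):

        # every pass starts a fresh sed, which recreates x and y,
        # so after the loop they hold only the extractions from E
        for i in A B C D E
        do
                sed -n -e '/alpha/w x' -e '/beta/w y' -e '/gamma/p' $i
        done >> z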
les@chinet.chi.il.us (Leslie Mikesell) (02/26/91)
In article <1991Feb20.234921.5738@ddsw1.MCS.COM> dattier@ddsw1.MCS.COM (David W. Tamkin) writes:

>Hmmm...this is knotty to explain... I'm running sed in a for loop (in sh)
>on several input files. Let's call them A, B, C, D, and E. The sed script
>should send output to three places. If I made a separate output file for
>each one, I'd finish with fifteen output files: Ax, Ay, Az, Bx, etc., through
>Ez. What I really want are a single x file, y file, and z file.
>When a sed script begins, sed scans for w operators and creates all target
>files needed. If files by those names already exist, tough; the old contents
>are clobbered. So if I use "w x" and "w y" in the script and redirect sed's
>stdout >>z, only z will have the proper contents; x and y will have only the
>extractions from input file E.

Simple suggestion: if the input files all exist at once, let sed iterate over the list of filenames instead of the shell. Multiple w's within a script will append.

>I can use "w $i.x" and "w $i.y" (with proper quoting for the loop variable to
>be expanded) and then cat [A-E].x > x and cat [A-E].y > y afterward, but I'd
>rather lengthen the sed script than add the extra forks. Yes, I'm sure it
>can all be done in awk or perl, but I don't know them, don't do this stuff
>for a living, and certainly don't have the means or time or talent to learn
>awk or perl.

Ahh, but with perl, you typically only have to learn one arcane set of commands and syntax instead of the several you need to know to write useful shell scripts. If you really care about reducing the number of forks (at the expense of some start-up time), perl is the way to go.

>Can sed be prevented from clobbering a file named in a w command?

Probably not, but there is still a simple alternative. Just do the editing portion of the script with ex. You probably already know the appropriate syntax, and it supports the w >>file command to explicitly append. As an added benefit you also get the ability to specify an address an arbitrary number of lines above a pattern match and some other things that are impossible in sed (at the expense of making a temporary copy of the file).

Les Mikesell
les@chinet.chi.il.us
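The suggestion in a sketch, with the same kind of stand-in patterns: a single invocation opens x and y once at startup, so every later w to them appends, whichever input file the line came from:

        # one sed process over all five inputs; x and y are created
        # once, so B through E's extractions append instead of clobbering
        sed -n -e '/alpha/w x' -e '/beta/w y' -e '/gamma/p' A B C D E >> z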
dattier@ddsw1.MCS.COM (David W. Tamkin) (02/28/91)
les@chinet.chi.il.us (Leslie Mikesell) wrote in <1991Feb25.184548.14778@chinet.chi.il.us>:

| In article <1991Feb20.234921.5738@ddsw1.MCS.COM> dattier@ddsw1.MCS.COM
| (David W. Tamkin) writes:
| >Hmmm...this is knotty to explain... I'm running sed in a for loop (in sh)
| >on several input files. Let's call them A, B, C, D, and E. The sed script
| >should send output to three places. If I made a separate output file for
| >each one, I'd finish with fifteen output files: Ax, Ay, Az, Bx, etc., through
| >Ez. What I really want are a single x file, y file, and z file.
| >When a sed script begins, sed scans for w operators and creates all target
| >files needed. If files by those names already exist, tough; the old contents
| >are clobbered. So if I use "w x" and "w y" in the script and redirect sed's
| >stdout >>z, only z will have the proper contents; x and y will have only the
| >extractions from input file E.
| If the input files all exist at once, let sed iterate over the list of
| filenames instead of the shell. Multiple w's within a script will append.

Les, what I'm guessing you mean here is that I should use

        sed -f scriptfile infile1 infile2 ... > outfile

instead of

        for i in infile1 infile2 ... ; do sed -f scriptfile $i ; done >> outfile

I had two problems with that approach: first, I didn't know how to make the sed script realize that it was starting on a new input file and thus should operate on the last line of each input file in a particular way, given that it was last, and operate on the first line of every input file as I instructed it for the first line of the entire input. Ideally, yes, that would have saved forking sed again for each file of the input, but sed has no way to address "last line before you start taking input from somewhere else" nor "first line of a new input source." It acts as if you had catted them together first.

Second, one of the files contains text before the sed even starts. I could use it as >>stdout (file z in my explanation) except for one thing: it's the one that receives output of a single w per pass. The other file (there really are only two, not three; I was illustrating before) gets the output of all the p commands plus one P command. Yes, I could interchange the p's with the w, but how do I make a w out of the P without clobbering the hold space?

Plus, at the time I first asked, the script had print-at-end-of-cycle as the default -- no, that's an easy one: just change all the b's to bz, stick #n at the top or -n on the command line, and end the script with

        d
        :z
        w filename

which I didn't do anyway, because the script in its final form was run with sed -n and the b's became p's (or sometimes p ; d to avoid dropping down into commands that would mess up the hold space).

Actually, it's not all that easy, because I had a lot of n's too. Without -n in effect, interchanging stdout and wfile would have meant changing a simple n to

        w filename
        N
        s/.*\n//

to avoid getting shot back to the top of the sed script, as D would do. Forking sed separately for each input file means clobbering whichever output file gets the honor of being the w destination. I suppose that I could just have come up with a bunch of wfiles and concatenated the lot together afterwards.

As it turns out, I did find a way to spot the first line of each input file and apply the same commands to it and the lines that followed as to the corresponding lines at the beginning of the first input file, and thus run sed in only one pass.
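Assembled, the rewrite described above (every b turned into bz, #n at the top or -n on the command line, and the d/:z/w tail) would have roughly this shape; /discard/, /keep/, and the s command are stand-ins for the real editing:

        # lines that branch to :z get appended to filename; everything
        # else is deleted at the d, and -n keeps stdout quiet
        sed -n '
        /discard/d
        /keep/bz
        s/old/new/
        d
        :z
        w filename
        ' A B C D E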
The single line per input file that needed to be appended to the existing file I w'ed to a fifo, to be catted >>the existing file. System V doesn't let sed clobber fifos, fortunately.

| Ahh, but with perl, you typically only have to learn one arcane set of
| commands and syntax instead of the several you need to know to write
| useful shell scripts. If you really care about reducing the number of
| forks (at the expense of some start-up time), perl is the way to go.

Yes, I knew that. It probably doesn't take all that many forks of smaller programs before there's more overhead than for one fork of perl. Are you volunteering to teach me, Les? You know where I live.

| Probably not, but there is still a simple alternative. Just do the
| editing portion of the script with ex. You probably already know
| the appropriate syntax and it supports the w >>file command to explicitly
| append. As an added benefit you also get the ability to specify an
| address an arbitrary number of lines above a pattern match and some
| other things that are impossible in sed (at the expense of making a
| temporary copy of the file).

That's not so simple for this application. There are a number of things I'm doing in the sed script that would be really bearish in ex; the first that comes to mind is how to resume the editing from the beginning when the next input file is ready. (Surely I shouldn't have to change the script to have N copies of itself when the input contains N files.) That brings us back to having the shell, not the editor, do the cycling and thus forking the editor again and again.

The second is that one thing I'm doing involves handling the case of a pattern that doesn't show up by h'ing it when it arrives and then, after its last chance to appear, g'ing and checking for its presence. In ex I'd have to do a pattern search for it, and if it isn't there, the ex script will break; moreover, if I am trying to ex the entire input in one fork, a pattern search while I'm working on a portion that is missing the pattern will land on its occurrence in another portion, and then KABOOM.

This is the sort of thing that awk and perl are for. I'm trying to cut down a sapling with a carving knife. It's better than a butterknife, and I don't need a chainsaw, but a regular saw would be very nice to have use of.

David Tamkin  Box 7002  Des Plaines IL 60018-7002  708 518 6769  312 693 0591
dattier@ddsw1.mcs.com  MCI Mail: 426-1818  CIS: 73720,1570  GEnie: D.W.TAMKIN
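The fifo arrangement described at the top of that article, sketched roughly (the names are stand-ins; z is the file that already has text in it, and the sed script would contain a "w pipe" command):

        mknod pipe p            # create the fifo; mkfifo(1) where available
        cat pipe >> z &         # the reader must be waiting before sed opens pipe
        sed -n -f scriptfile A B C D E
        wait                    # let cat finish appending before cleaning up
        rm pipe

Because pipe already exists as a fifo when sed scans its w targets, sed attaches to it for writing instead of truncating a regular file, and the background cat delivers each w'ed line onto the end of z.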
les@chinet.chi.il.us (Leslie Mikesell) (03/02/91)
In article <1991Feb27.172236.7202@ddsw1.MCS.COM> dattier@ddsw1.MCS.COM (David W. Tamkin) writes:

[about perl]
>Yes, I knew that. It probably doesn't take all that many forks of smaller
>programs before there's more overhead than for one fork of perl. Are you
>volunteering to teach me, Les? You know where I live.

I'm hardly an expert. I just get by with two terminals on my desk so that when I get stuck I can experiment on the second one without losing my place in the problem. Perl has the great advantage of having a usable debugger, as opposed to sed's "garbled command" or awk's "bailing out near line _" error message (singular). I don't object to questions by email, but I can't guarantee how long it will take to respond...

[use ex instead]
>That's not so simple for this application. There are a number of things I'm
>doing in the sed script that would be really bearish in ex;

On the contrary - ex handles them just fine. Sed just barely does them.

>the first that
>comes to mind is how to resume the editing from the beginning when the next
>input file is ready. (Surely I shouldn't have to change the script to have N
>copies of itself when the input contains N files.)

With ex, you must use an explicit 'n' to go to the next file, so you know when it is going to happen. Duplicating the command script is trivial once you have it right, and you can carry text between files in the named registers. Back in the days before perl I would have done something like this by having the shell generate the ex script with the proper number of iterations, then execute it. Since your main script can source another file, it can just consist of:

        so real_script
        n!
        so real_script
        ...

>That brings us back to
>having the shell, not the editor, do the cycling and thus forking the editor
>again and again.

That's pretty reasonable too, given ex's ability to w >>file (remember the original problem). Note that the file has to exist or the >> will fail, but you can let the shell create an empty file to start.

>The second is that one thing I'm doing involves handling
>the case of a pattern that doesn't show up by h'ing it when it arrives and
>then, after its last chance to appear, g'ing and checking for its presence.
>In ex I'd have to do a pattern search for it, and if it isn't there, the ex
>script will break; moreover, if I am trying to ex the entire input in one
>fork, a pattern search while I'm working on a portion that is missing the
>pattern will land on its occurrence in another portion, and then KABOOM.

Use the address,address command form instead of a separate search followed by a command. Note that an address can be a line number or a search pattern. Examples:

        /foo/d a           (delete the line containing pattern foo into register a)
        0 put a            (put register a before the first line)
        /start/,/end/d b   (delete the line containing "start" through the line
                            containing "end" into register b)
        /foo/put b         (put register b after the line containing pattern foo)

The problem remains of failing pattern matches terminating the script (likewise for attempting to put registers with nothing in them). The only solution I've found is that you can source another file, and a failure will only terminate the file being sourced, allowing the next-level-up script to continue. I suppose you could get a crude form of flow control out of this, but it seems like there should be a settable option to prevent aborting on errors.
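A sketch of that generate-and-execute step (the q! at the end, the batch-mode "ex -" invocation, and the five input names are assumptions here; real_script is presumed to hold the per-file editing commands and do its output with w >>file):

        # build a driver that sources the editing commands once per file
        echo 'so real_script' > driver
        for i in B C D E
        do
                echo 'n!' >> driver             # on to the next file, discarding the buffer
                echo 'so real_script' >> driver
        done
        echo 'q!' >> driver
        ex - A B C D E < driver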
>This is the sort of thing that awk and perl are for. I'm trying to cut down
>a sapling with a carving knife. It's better than a butterknife, and I don't
>need a chainsaw, but a regular saw would be very nice to have use of.

Ex is pretty regular. The standard documentation doesn't begin to cover it, though.

Les Mikesell
les@chinet.chi.il.us