[comp.unix.shell] cat, pipes, and filters

root@progress.COM (Root of all Evil) (05/31/91)

Hi,

   I've got a question regarding the way cat behaves in a pipeline.
(I know, his fur gets all oily %+})  Can I cat the contents of a
file | pipe the output to a filter (such as sed) | then cat the
filtered output back to the original file?  I've tried this with the
following commands:

   cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ > $FILE

   cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ | cat > $FILE

Both commands produce identical results: $FILE is truncated to 0-length.

   However, the following command gives me the result I want:

   cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ | tee $FILE 1>/dev/null

So now my script works but I don't really understand why.  I've tested
this on SunOS 4.1 and Interactive SysV 2.2, no difference.  Is there
something simple about pipes or I/O redirection that I'm not grasping?
Or is this a feature of cat?  

   Any enlightenment would be appreciated.  Also, if you can think of
a better way to do the same thing (short of using perl), please let
me know.

                              Curious Rich

------------------------------------------------------------------------------
Rich Lenihan                 UUCP: mit-eddie!progress!rich
Progress Software Corp.      Internet: rich@progress.com
5 Oak Park                   >-   Insert funny stuff here  -<
Bedford, MA  01730           >-Draw amusing symbols, logo's-<
USA                          >-     or characters here     -<

jik@cats.ucsc.edu (Jonathan I Kamens) (06/01/91)

In article <1991May31.165446.1530@progress.com>, root@progress.COM (Root of all Evil) writes:
|>    cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ > $FILE
|> 
|>    cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ | cat > $FILE

In this case, you're using shell redirection to output to the file.  Shell
redirection takes place before any of the commands on the command line
actually get run.  Therefore, the shell opens $FILE and truncates it in
preparation for sending output to it *before* the "cat" at the beginning of
the pipeline opens it in order to read it.
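
[Editor's sketch, not part of the original post.]  The truncation can be
seen in isolation with a scratch file.  This is only an illustration under
stated assumptions: mktemp is assumed to be available, demo_file is a
made-up name, and a single command is used instead of the full pipeline so
that the ordering is fixed (the shell sets up the redirection, truncating
the target, and only then runs sed).

```shell
#!/bin/sh
# Scratch file for the demonstration (mktemp is assumed to be available).
demo_file=$(mktemp) || exit 1
printf 'old entry\n' > "$demo_file"

# The shell opens "$demo_file" for writing, which truncates it, and only
# then runs sed.  By that time there is nothing left to read.
sed 's/old/new/' "$demo_file" > "$demo_file"

# Strip any padding that some wc implementations put around the count.
size_after=$(wc -c < "$demo_file" | tr -d ' ')
rm -f "$demo_file"
echo "size after: $size_after bytes"
```

With the original pipeline the outcome is normally the same, because the
redirection on the last stage is processed as the pipeline is set up.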

|>    cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ | tee $FILE 1>/dev/null

In this case, cat gets started and opens the file in order to read it, and
*then* tee gets started, sees that it's supposed to send output to $FILE, and
opens the file and truncates it in preparation for sending output to it.  Once
cat has opened the file for read, it gets to read it, because the file that
tee is sending output to is (in effect) a "different file" from the kernel's
point of view.

|> So now my script works but I don't really understand why.  I've tested
|> this on SunOs 4.1 and Interactive SysV 2.2, no difference.  Is there
|> something simple about pipes or I/O redirection that I'm not grasping?
|> Or is this a feature of cat?  

Quoting from the man page cat(1):

NOTES
     Beware of `cat a b > a' and `cat a b > b', which destroy the
     input files before reading them.

tchrist@convex.COM (Tom Christiansen) (06/01/91)

From the keyboard of root@progress.COM (Root of all Evil):
:
:Hi,
:
:   I've got a question regarding the way cat behaves in a pipeline.
:(I know, his fur gets all oily %+})  Can I cat the contents of a
:file | pipe the output to a filter (such as sed) | then cat the
:filtered output back to the original file?  I've tried this with the
:following commands:
:
:   cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ > $FILE
:
:   cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ | cat > $FILE
:
:Both commands produce identical results: $FILE is truncated to 0-length.
:
:   However, the following command gives me the result I want:
:
:   cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ | tee $FILE 1>/dev/null
:
:So now my script works but I don't really understand why.  I've tested
:this on SunOs 4.1 and Interactive SysV 2.2, no difference.  Is there
:something simple about pipes or I/O redirection that I'm not grasping?

yes.

:Or is this a feature of cat?  
:
:   Any enlightenment would be appreciated.  Also, if you can think of
:a better way to do the same thing (short of using perl), please let
:me know.

hm.... with that kind of proviso, i shouldn't answer.  :-)/2

you need to think about when things are getting opened.  the
shell opens your files before it calls the command.

so either use a temporary, cheat with tee, or else use

    perl -p -i -e "s/$ENTRY/$NEWENTRY/" $FILE

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
		"So much mail, so little time."

wallace@ynotme.enet.dec.com (Ray Wallace) (06/01/91)

In article <1991May31.165446.1530@progress.com>, root@progress.COM (Root of all Evil) writes...
>   I've got a question regarding the way cat behaves in a pipeline.
> 
>   cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ > $FILE
>Both commands produce identical results: $FILE is truncated to 0-length.

One of the first things the shell does when parsing the line is to handle I/O
redirection. So before any of the commands are executed, the ">$FILE" part of
the line causes the shell to create an empty file, which just happens to be
the file that you are trying to read (cat).

>   Any enlightenment would be appreciated.  Also, if you can think of
>a better way to do the same thing (short of using perl), please let
>me know.

  cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ > ${FILE}.new ; mv ${FILE}.new $FILE

is a different way, not necessarily a better way.

---
Ray Wallace		
		(INTERNET,UUCP) wallace@oldtmr.enet.dec.com
		(UUCP)		...!decwrl!oldtmr.enet!wallace
		(INTERNET)	wallace%oldtmr.enet@decwrl.dec.com
---

marc@mercutio.ultra.com (Marc Kwiatkowski {Host Software-AIX}) (06/01/91)

In article <1991May31.165446.1530@progress.com> root@progress.COM (Root of all Evil) writes:

	[stuff about inability to redirect standard in and standard
	 out to the same file omitted.]

This problem and a good solution are discussed thoroughly in
"The Unix Programming Environment" by B.W. Kernighan and R. Pike
(Prentice Hall, ISBN 0-13-937699-2).  They provide a general-purpose
Bourne shell script called "overwrite" that works with an arbitrary
command and is fairly robust.
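
[Editor's sketch, not part of the original post and not the book's
script.]  The general idea of such a wrapper can be outlined as follows:
run the command, collect its output in a temporary file, and touch the
target only if the command succeeded.  The function name overwrite_sketch,
its argument order, and the use of mktemp are all assumptions made for
illustration.

```shell
#!/bin/sh
# overwrite_sketch FILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND, collect its output in a temporary
# file, and replace the contents of FILE only if COMMAND succeeded.
overwrite_sketch() {
    target=$1
    shift
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        # Copying into the existing file keeps its inode, owner and mode.
        cp "$tmp" "$target"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Usage on a scratch file.
demo_file=$(mktemp) || exit 1
printf 'old entry\n' > "$demo_file"
overwrite_sketch "$demo_file" sed 's/old/new/' "$demo_file"
result=$(cat "$demo_file")
rm -f "$demo_file"
echo "$result"
```

A sturdier version would also deal with signals arriving during the final
copy, which this sketch ignores.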

-- 
        Marc P. Kwiatkowski			Ultra Network Technologies
        Internet: marc@ultra.com		101 Daggett Drive
        uucp: ...!ames!ultra!marc		San Jose, CA 95134 USA
        telephone: 408 922 0100 x249

torek@elf.ee.lbl.gov (Chris Torek) (06/01/91)

>In article <1991May31.165446.1530@progress.com> root@progress.COM
>(Root of all Evil) writes:
>>    cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ | tee $FILE 1>/dev/null

In article <16438@darkstar.ucsc.edu> jik@cats.ucsc.edu [ucsc?!]
(Jonathan I Kamens) writes:
>In this case, cat gets started and opens the file in order to read it, and
>*then* tee gets started, sees that it's supposed to send output to $FILE, and
>opens the file and truncates it in preparation for sending output to it.  Once
>cat has opened the file for read, it gets to read it, because the file that
>tee is sending output to is (in effect) a "different file" from the kernel's
>point of view.

Although the two processes hold different file table entries, they use
the same vnode and hence this is not a reliable way of doing things.
Depending on O/S vagaries, cat might read some, all, or none of $FILE
before tee truncates it; when tee truncates $FILE, anything cat has not
yet read will vanish forever.

As it happens, on most Unix boxes cat will run long enough to get the
first `block' (however big a block is, typically 8K under 4BSD-
derivative file systems) before sed will run, and sed will run long
enough to wait for cat before tee will run.  It is unwise to depend
on this.
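
[Editor's sketch, not part of the original post.]  The race itself is hard
to reproduce on demand, but the underlying point (a reader and a writer
that open the same name share one file, so truncation destroys whatever
the reader has not yet read) can be shown deterministically.  The file
name and the use of mktemp are assumptions for the demonstration.

```shell
#!/bin/sh
demo_file=$(mktemp) || exit 1
printf 'line one\nline two\n' > "$demo_file"

# Reader: open the file on descriptor 3, but do not read anything yet.
exec 3< "$demo_file"

# Writer: truncate the same file through a separate open.
: > "$demo_file"

# The reader's descriptor refers to the same underlying file, so the
# data it had not yet read is gone.
leftover=$(cat <&3)
exec 3<&-
rm -f "$demo_file"
echo "reader got: '$leftover'"
```

In the tee pipeline, how much cat manages to read before the truncation
depends on scheduling, which is why the result there is not dependable.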
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (06/02/91)

In article <13798@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
  [ why cat foo | blah | tee foo involves race conditions ]

More to the point, if you want to have blah read foo and write its
output to a new copy of foo, do something like this:

  ( rm foo; blah > foo ) < foo
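
[Editor's sketch, not part of the original post.]  A small run of this
construction on a scratch file in a local directory, with sed standing in
for "blah".  The directory and file names and the use of mktemp are
assumptions.  Note that once rm has run the original data is reachable
only through the open descriptor, so a failure in the filter loses it.

```shell
#!/bin/sh
work_dir=$(mktemp -d) || exit 1
demo_file=$work_dir/foo
printf 'old entry\n' > "$demo_file"

# The outer "< file" opens the original before the subshell starts.
# rm removes only the name; the open descriptor keeps the data readable.
# "> file" then creates a brand-new file under the same name.
( rm "$demo_file"; sed 's/old/new/' > "$demo_file" ) < "$demo_file"

result=$(cat "$demo_file")
rm -rf "$work_dir"
echo "$result"
```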

---Dan

bp@chorus.fr (Bruno Pillard) (06/03/91)

From article <1991May31.165446.1530@progress.com>, by root@progress.COM

>    However, the following command gives me the result I want:
> 
>    cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ | tee $FILE 1>/dev/null
> 
>    Any enlightenment would be appreciated. 

What about:

  ( /bin/rm $FILE ; sed  s/"$ENTRY"/"$NEWENTRY"/ > $FILE ) < $FILE

I understand that this may look harmful at first glance because of the
/bin/rm of your precious file but it  works perfectly for  me under sh
and (t)csh.

Is there any problem using that construction ?

   __                       
  /  )                        Bruno Pillard
 /--<  __  . . __   __        Chorus Systemes
/___/_/ (_(_/_/) )_(_)        6 Avenue Gustave Eiffel
                              78182  St-Quentin-en-Yvelines-Cedex  France
bp@chorus.fr  (Internet)      Tel: +33 (1) 30 64 82 13

mmk@d62iwa.mitre.org (Morris M. Keesan) (06/04/91)

In article <1991May31.165446.1530@progress.com> root@progress.COM (Root of all Evil) writes:

[various ways to edit a file in place using sed as part of a pipeline, e.g.]
> cat $FILE | sed s/"$ENTRY"/"$NEWENTRY"/ > $FILE

It's interesting that most of the responses have pointed out the problem
(creating the new $FILE before reading the old one), but all have accepted your
initial premise, which is that "sed" is the appropriate tool for editing a file
in place.  It seems to me that the much more obvious approach is to use "ed",
which is designed for editing files, as "sed" isn't.
My approach, tested under Ultrix 4.1:

#!/bin/sh
#this replaces all occurrences of $ENTRY with $NEWENTRY
ed $FILE <<KICKME
g/$ENTRY/s//$NEWENTRY/g
w
q
KICKME
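
[Editor's sketch, not part of the original post.]  The same ed script run
against a scratch file, with made-up values for the three variables.  It
assumes mktemp is available and checks for ed first, since ed is not
installed on every system.  The -s option only silences the byte counts
that ed prints.

```shell
#!/bin/sh
FILE=$(mktemp) || exit 1       # stand-in for the real file
ENTRY='old'                    # illustrative values
NEWENTRY='new'
printf 'old entry\nanother old line\n' > "$FILE"

if command -v ed >/dev/null 2>&1; then
    # The here-document is unquoted, so the shell expands $ENTRY and
    # $NEWENTRY before ed sees the commands.
    ed -s "$FILE" <<EOF
g/$ENTRY/s//$NEWENTRY/g
w
q
EOF
    ed_ran=yes
else
    ed_ran=no
fi

result=$(cat "$FILE")
rm -f "$FILE"
printf '%s\n' "$result"
```

Because the variables are pasted into the ed commands, values containing
"/" or regular-expression characters would need escaping first.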

gwc@root.co.uk (Geoff Clare) (06/04/91)

In <10918@chorus.fr> bp@chorus.fr (Bruno Pillard) writes:

>What about:

>  ( /bin/rm $FILE ; sed  s/"$ENTRY"/"$NEWENTRY"/ > $FILE ) < $FILE

>I understand that this may look harmful at first glance because of the
>/bin/rm of your precious file but it  works perfectly for  me under sh
>and (t)csh.

>Is there any problem using that construction ?

Yes!  If any errors occur you lose your data.

Slightly better would be:

  ( /bin/rm $FILE && sed  s/"$ENTRY"/"$NEWENTRY"/ > $FILE ) < $FILE

which would not truncate the file if the "rm" fails, but this still
loses your data if the disk is full.  It's much safer to:

  sed  s/"$ENTRY"/"$NEWENTRY"/ < $FILE > $TMPFILE && mv $TMPFILE $FILE
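
[Editor's sketch, not part of the original post.]  The temporary-file
approach written out as a small script, with illustrative values for the
variables.  It assumes mktemp is available and places the temporary file
next to the target so that mv is a rename within one filesystem.

```shell
#!/bin/sh
FILE=$(mktemp) || exit 1       # stand-in for the real file
ENTRY='old'                    # illustrative values
NEWENTRY='new'
printf 'old entry\n' > "$FILE"

# Temporary file in the same directory as the target.
TMPFILE=$(mktemp "${FILE}.XXXXXX") || exit 1

# Replace the original only if sed succeeded; otherwise discard the
# partial output and leave the original untouched.
if sed "s/$ENTRY/$NEWENTRY/" < "$FILE" > "$TMPFILE"; then
    mv "$TMPFILE" "$FILE"
else
    rm -f "$TMPFILE"
fi

result=$(cat "$FILE")
rm -f "$FILE"
echo "$result"
```

One side effect to be aware of: mv installs a new file under the old
name, and mktemp creates its file with restrictive permissions, so the
result may have a different mode and no longer share hard links with the
original.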
-- 
Geoff Clare <gwc@root.co.uk>  (Dumb American mailers: ...!uunet!root.co.uk!gwc)
UniSoft Limited, London, England.   Tel: +44 71 729 3773   Fax: +44 71 729 3273

wolfgang@wsrcc.com (Wolfgang S. Rupprecht) (06/06/91)

>More to the point, if you want to have blah read foo and write its
>output to a new copy of foo, do something like this:
>  ( rm foo; blah > foo ) < foo

Will this work if "foo" is an NFS file?

-wolfgang
-- 
Wolfgang Rupprecht    wolfgang@wsrcc.com (or) uunet!wsrcc!wolfgang
Snail Mail Address:   Box 6524, Alexandria, VA 22306-0524