[comp.unix.shell] Bash, tar, and broken pipe

jwu@kepler.com (Jasper Wu) (05/15/91)

I have a problem when using tar in a pipeline in bash and hope someone
can help me find out why.

When I do    
           zcat foo.tar.Z | tar tvfB -
    or     uncompress < foo.tar.Z | tar tvfB -

it gives me the table of contents correctly but reports an error
message "Broken pipe" to stderr when it finishes.   However, if I add
a "cat" to the pipeline as
           cat foo.tar.Z | uncompress | tar tvfB -

then it works fine (i.e., no error message).  All commands above work fine
in csh or sh.

The machine is a SPARCstation running SunOS 4.1.1; the compiler used to compile
bash is Sun's cc.  The particular file in my case is bison-1.14.tar.Z (245 KB),
if that matters.

Any ideas?  Has anyone experienced the same problem?  Did I miss something
obvious (I'm new to bash)?  Any comments will be greatly appreciated.

--jasper

weimer@garden.ssd.kodak.com (Gary Weimer (253-7796)) (05/15/91)

In article <586@kepler1.kepler.com>, jwu@kepler.com (Jasper Wu) writes:
|> 
|> I have some problem when using pipelined tar in bash and hope someone
|> can help me find out why.
|> 
|> When I do    
|>            zcat foo.tar.Z | tar tvfB -
|>     or     uncompress < foo.tar.Z | tar tvfB -
|> 
|> it gives me the table of contents correctly but reports an error
|> message "Broken pipe" to stderr when it finishes.   However, if i add 
|> a "cat" to the pipeline as
|>            cat foo.tar.Z | uncompress | tar tvfB -
|> 
|> then it works fine (i.e., no error message).  All commands above work fine
|> in csh or sh.

In csh I use:

   uncompress -c foo.tar.Z | tar tvf -
              ^^

(I've never had a problem not using B for tar). I don't know if this
will fix your problem or not.

weimer@ssd.kodak.com ( Gary Weimer )

heinz@cc.univie.ac.at (05/17/91)

In <1991May15.155040.19078@ssd.kodak.com> weimer@garden.ssd.kodak.com (Gary Weimer (253-7796)) writes:


>In article <586@kepler1.kepler.com>, jwu@kepler.com (Jasper Wu) writes:
>|> 
>|> I have some problem when using pipelined tar in bash and hope someone
>|> can help me find out why.
>|> 
>|> When I do    
>|>            zcat foo.tar.Z | tar tvfB -
>|>     or     uncompress < foo.tar.Z | tar tvfB -
>|> 
>|> it gives me the table of contents correctly but reports an error
>|> message "Broken pipe" to stderr when it finishes.   However, if i add 
>|> a "cat" to the pipeline as
>|>            cat foo.tar.Z | uncompress | tar tvfB -
>|> 
>|> then it works fine (i.e., no error message).  All commands above work fine
>|> in csh or sh.

>In csh I use:

>   uncompress -c foo.tar.Z | tar tvf -
>              ^^

>(I've never had a problem not using B for tar). I don't know if this
>will fix your problem or not.

This does *not* fix the problem. I've been using bash for quite a
long time now and have had the same experience as Jasper. For bash,
it makes no difference whether you use 'uncompress -c' or 'zcat'
(they are the same program anyway: zcat, compress and uncompress
are links to one binary), and it makes no difference whether you
use 'tar' with -B or not.

I'm sorry I can't give a solution to the problem. All I know is
that bash reports a broken pipe if one of the processes making up
the pipe is killed or terminated abnormally (well, it doesn't have
to be terminated abnormally; it only needs to terminate). As I
haven't had time to look into bash's source code yet, I don't know
whether the actual problem is a bug in bash or abnormal behaviour
of zcat that the other shells don't reveal. It might also be a
'timing problem', as zcat surely terminates before tar, and this
may cause bash to report the pipe as being broken.
--
--------------------------------------------------------------------------------
---/     Heinz M. Herbeck                    /    Trust me, I know    /       /-
--/     heinz@sophie.pri.univie.ac.at       /    what I'm doing !    /       /--
-/     Vienna University, Austria          /    (Sledge Hammer)     /       /---
--------------------------------------------------------------------------------

byron@archone.tamu.edu (Byron Rakitzis) (05/17/91)

In article <heinz.674472878@cc.univie.ac.at> heinz@cc.univie.ac.at writes:
>In <1991May15.155040.19078@ssd.kodak.com> weimer@garden.ssd.kodak.com (Gary Weimer (253-7796)) writes:
>>In article <586@kepler1.kepler.com>, jwu@kepler.com (Jasper Wu) writes:
>>|> I have some problem when using pipelined tar in bash and hope someone
>>|> can help me find out why.
>>|> 
>>|> When I do    
>>|>            zcat foo.tar.Z | tar tvfB -
>>|>     or     uncompress < foo.tar.Z | tar tvfB -
>>|> 
>>|> it gives me the table of contents correctly but reports an error
>>|> message "Broken pipe" to stderr when it finishes.   However, if i add 
>>|> a "cat" to the pipeline as
>>|>            cat foo.tar.Z | uncompress | tar tvfB -
>>|> then it works fine (i.e., no error message).  All commands above work fine
>>|> in csh or sh.
>
>>   uncompress -c foo.tar.Z | tar tvf -
>>              ^^
>
>This does *not* fix the problem.
 [...]
>I'm sorry I can't give a solution to the problem  -  all  that  I
>know  is  that bash reports a broken pipe if one of the processes
>making up the pipe is killed or terminated abnormally  (well,  it
>doesn't  have  to  be  terminated abnormally, it only needs to be
>just  terminated).
 [...]
>It  also  might be a 'timing prob-
>lem' as zcat  surely terminates before tar, and  this  may  cause
>bash to report the pipe as being broken.

Come again? zcat surely terminates before tar? Surely not!

The answer to all this is quite simple. When I do:

	cat /usr/dict/words | sed 10q

what happens? sed reads 10 lines of input, and then quits. However, cat
does not know it is writing to a pipe, so it keeps dumping stuff to its
stdout. Something has to stop it, so Unix has a signal called SIGPIPE.
cat therefore dies with the signal "SIGPIPE". Some shells report this
with a "broken pipe" message, because the tail of the pipe died before
the head.

Now, I have not looked at bash source either, but my guess is that
the code does not check the status of each exiting pipeline member,
because in the line

	cat foo.tar.Z | uncompress | tar ft -

the "uncompress|tar" section of the pipeline will break when tar finishes
printing the table of contents of the tar file.

Finally, if I type

	(echo hi; echo there) | sed 1q

then a shell will most likely not report any error, since "hi\n" and "there\n"
will fit into the pipe buffer.

Hope this clears things up a little. (In my opinion, a shell should never
report broken pipes, since they are usually a part of normal operation.
However, it can be handy to check the exit status of all pipe members.
I have written a shell which does this:

	cat /usr/dict/words | tail -r | exit 42
	echo $status

prints the output

	0 sigpipe 42

but does not report the broken pipe explicitly)
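
For comparison, bash itself can report per-member statuses through its
PIPESTATUS array. The sketch below assumes a bash recent enough to provide
PIPESTATUS and SIGPIPE at its default disposition; `yes | head -n 1` is only
a stand-in for any writer that outlives its reader, and the variable names
are illustrative.

```shell
# Run the pipeline in bash explicitly, since PIPESTATUS is a bash feature.
statuses=$(bash -c 'yes | head -n 1 >/dev/null; echo "${PIPESTATUS[@]}"')
echo "$statuses"    # first field is the writer (yes), second the reader (head)

# A status above 128 conventionally means "killed by signal (status - 128)".
writer_status=${statuses%% *}
signal_name=$(bash -c 'kill -l "$1"' _ "$writer_status")
echo "writer status $writer_status corresponds to SIG$signal_name"
```

On a typical system the writer's status is 141, that is 128 + 13, which
`kill -l` maps back to PIPE; the reader's status is 0, and nothing is
printed about a broken pipe.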


--
Byron Rakitzis
byron@archone.tamu.edu

byron@archone.tamu.edu (Byron Rakitzis) (05/20/91)

Heinz (heinz@cc.univie.ac.at) sent me some personal mail which I could
not reply to (is there another address I could use to get mail to you,
Heinz?). However, he raised an interesting point:

Given a pipeline

	foo | tar ft -

it seems clear that tar must read to EOF in order to determine whether
the tar file that foo writes has come to an end or not. Therefore a
normal instance of

	foo | tar ft -

should not cause a pipe to break, since tar will always terminate after
foo. I have no clue why tar is exiting prematurely. If anyone can shed
light on the matter, I think Heinz and I would appreciate it. (Servus,
Heinz!)


--
Byron Rakitzis
byron@archone.tamu.edu

pfalstad@phoenix.princeton.edu (Paul Falstad) (05/21/91)

byron@archone.tamu.edu (Byron Rakitzis) wrote:
>Heinz (heinz@cc.univie.ac.at) sent me some personal mail which I could
>not reply to (is there another address I could use to get mail to you,
>Heinz?). However, he raised an interesting point:
>
>Given a pipeline
>
>  foo | tar ft -
>
>it seems clear that tar must read to EOF in order to determine whether
>the tar file that foo writes has come to an end or not. Therefore a
>normal instance of
>
>  foo | tar ft -
>
>should not cause a pipe to break, since tar will always terminate after
>foo. I have no clue why tar is exiting prematurely. If anyone can shed
>light on the matter, I think Heinz and I would appreciate it. (Servus,
>Heinz!)

I don't know the tarfile format, since I don't have source, but let's see:

% ls a
b  c
% tar cvf foo a
a/
a/b
a/c
% tar tvf foo
drwxr-xr-x pfalstad/student 0 May 20 20:05 1991 a/
-rw-r--r-- pfalstad/student 29 May 20 20:05 1991 a/b
-rw-r--r-- pfalstad/student 29 May 20 20:05 1991 a/c
% ls -l foo
-rw-r--r--  1 pfalstad    10244 May 20 20:13 foo

(that's quite a big file for only 58 bytes of data.  Must be lots
 of padding at the end)

% cat /etc/motd /usr/dict/words >>foo
% tar tvf foo
drwxr-xr-x pfalstad/student 0 May 20 20:05 1991 a/
-rw-r--r-- pfalstad/student 29 May 20 20:05 1991 a/b
-rw-r--r-- pfalstad/student 29 May 20 20:05 1991 a/c
% man tar | sed -n 242,246p
     If there are multiple archive  files  on  a  tape,  each  is
     separated from the following one by an EOF marker.  tar does
     not read the EOF mark on the tape after it finishes  reading
     an  archive  file  because tar looks for a special header to
     decide when it has reached the end of the archive.   Now  if
% ...
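
The same experiment can be repeated as a script. This is a sketch, assuming
a tar that accepts -C (GNU tar and bsdtar do); the file names and contents
are made up, and it only checks that the listing is unchanged by bytes
appended after the archive.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/a"
printf 'twenty-nine bytes of payload\n' > "$workdir/a/b"
cp "$workdir/a/b" "$workdir/a/c"

tar cf "$workdir/foo.tar" -C "$workdir" a
size=$(wc -c < "$workdir/foo.tar")
before=$(tar tf "$workdir/foo.tar")

# Append bytes that are not part of any archive member.
printf 'this text is not part of the archive\n' >> "$workdir/foo.tar"
after=$(tar tf "$workdir/foo.tar")

echo "archive size: $size bytes, remainder modulo 512: $((size % 512))"
[ "$before" = "$after" ] && echo "listing unchanged by the appended bytes"
```

If the listing is unchanged, tar stopped at its end-of-archive marker and
never looked at what followed, which is what the man page excerpt describes.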

--
Paul Falstad                     | 10 PRINT "PRINCETON CS"
pfalstad@phoenix.princeton.edu   | 20 GOTO 10

heinz@cc.univie.ac.at (05/22/91)

In <16345@helios.TAMU.EDU> byron@archone.tamu.edu (Byron Rakitzis) writes:

>Heinz (heinz@cc.univie.ac.at) sent me some personal mail which I could
>not reply to (is there another address I could use to get mail to you,
>Heinz?). However, he raised an interesting point:

Try one of the following: heinz@sophie.pri.univie.ac.at (<-- preferred)
			hh@eacpc1.tuwien.ac.at
			A4424GAF at AWIUNI11.BITNET
			herbeck@rice.edu

>Given a pipeline

>	foo | tar ft -

>it seems clear that tar must read to EOF in order to determine whether
>the tar file that foo writes has come to an end or not. Therefore a
>normal instance of

>	foo | tar ft -

>should not cause a pipe to break, since tar will always terminate after
>foo. I have no clue why tar is exiting prematurely. If anyone can shed
>light on the matter, I think Heinz and I would appreciate it. (Servus,
>Heinz!)

Yep, I do appreciate it. (Servus, Byron ! :)

I looked up the format of a tar-file (tar(5)), which is as follows:

	A ``tar tape'' or file is a series of blocks.  Each block is
	of  size  TBLOCK.  A  file  on  the tape is represented by a
	header block which describes the file, followed by  zero  or
	more blocks which give the contents of the file.  At the end
	of the tape are two blocks filled with binary zeros,  as  an
	EOF indicator.

	The header block looks like:

		#define TBLOCK 512
		#define NAMSIZ 100
		union hblock {
			char dummy[TBLOCK];
			struct header {
				char name[NAMSIZ];
				char mode[8];
				char uid[8];
				char gid[8];
				char size[12];
				char mtime[12];
				char chksum[8];
				char linkflag;
				char linkname[NAMSIZ];
			} dbuf;
		};
(quoted from the man-page)

This proves what was intuitively clear: there's no 'directory' contained in
a tar-file (how would you efficiently maintain a directory on a physical
tape? :)
So tar has to scan the entire output of the first process in the pipe and
terminate after that process.
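
The layout quoted from tar(5) can be checked against a real archive. A
minimal sketch, assuming a tar that accepts -C and writes a classic header
for a short file name; the offsets follow from the struct above (name at
byte 0, size at byte 100 + 8 + 8 + 8 = 124), while the file name and its
contents are made up.

```shell
workdir=$(mktemp -d)
printf 'twenty-nine bytes of payload\n' > "$workdir/b"
tar cf "$workdir/one.tar" -C "$workdir" b

# name[NAMSIZ]: bytes 0..99 of the first header block, NUL padded.
name=$(dd if="$workdir/one.tar" bs=1 count=100 2>/dev/null | tr -d '\000')

# size[12]: octal digits at offset 124, padded with NULs or spaces.
size_octal=$(dd if="$workdir/one.tar" bs=1 skip=124 count=12 2>/dev/null | tr -d '\000 ')
size=$((0$size_octal))    # the leading 0 makes the shell read it as octal

# The last two 512-byte blocks should contain nothing but zero bytes.
trailing_nonzero=$(tail -c 1024 "$workdir/one.tar" | tr -d '\000' | wc -c)

echo "name=$name size=$size nonzero bytes in last 1024: $((trailing_nonzero))"
```

The two all-zero blocks at the end are the EOF indicator the man page
mentions, so a reader can recognise the end of the archive from the data
itself rather than from end-of-file on its input.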

This does not explain the broken pipe, though. I tried the following:

	cat <some_long_file> | more

and killed 'more' by pressing 'q' at the first prompt (so more terminates
first). No 'Broken Pipe'.

Then I tried:

	echo Hallo | (sleep 10; more) # first process terminates first, since
	                              # 'Hallo' should fit into the pipe's buffer

No 'Broken Pipe' either.

So is this problem specific to tar ???? I have not encountered it anywhere
else yet. Maybe I should take the time and hack up the source code of bash,
but I'm not sure if it's worth the effort.

Anyone who might have a clue, please let me know. It doesn't really bother me
if a pipe breaks (unless it happens in my bathroom :), but it is something
that shouldn't happen, and I wonder why it does.

Greetings,
HH

djm@eng.umd.edu (David J. MacKenzie) (05/23/91)

I think the reason for the broken pipe is that 'tar tf -' ignores the
padding at the end of the tar file; tar files are always an even
multiple of the blocksize in length, so they get padded with garbage
or nulls.  The 'foo' program writes the nulls into the pipe, but tar
never reads them.
--
David J. MacKenzie <djm@eng.umd.edu> <djm@ai.mit.edu>

chet@odin.INS.CWRU.Edu (Chet Ramey) (05/23/91)

In article <heinz.674916703@cc.univie.ac.at> heinz@cc.univie.ac.at () writes:

>This does not explain the broken pipe, though. I tried the following:
>
>	cat <some_long_file> | more
>
>and killed 'more' by pressing 'q' at the first prompt (so more terminates
>first). No 'Broken Pipe'.

I guess I'll take a shot at this one.  First of all, other shells (csh and
ksh for sure) special-case the message printed when a child process dies due
to an interrupt (SIGINT) or a broken pipe (SIGPIPE).  Bash does not skip
over SIGPIPE, hence the unexpected `Broken Pipe' message.

>Then I tried:
>
>	echo Hallo | (sleep 10; more) # first process terminates first, since
>		'Hallo' should fit into the pipe's buffer
>
>No 'Broken Pipe' either.

Try

slc2$ cat /etc/termcap | sleep 1
Broken pipe

(I also get the `Broken Pipe' message when I do `cat /etc/termcap | more' and
immediately hit `q'.)

The broken pipe/SIGPIPE/EPIPE happens to the *first* process in a pipeline;
the error occurs when an attempt is made to write on a pipe when no process
has it open for reading.
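
Whether the writer is hit depends on whether it is still writing when the
reader goes away. A sketch of the two cases, assuming a bash that provides
the PIPESTATUS array and SIGPIPE at its default disposition; `yes` stands in
for a writer whose output exceeds the pipe buffer, and the variable names
are illustrative.

```shell
# Small write: fits in the pipe buffer, so the writer exits before the reader does.
small=$(bash -c 'echo Hallo | sleep 1; echo "${PIPESTATUS[@]}"')

# Endless write: the writer is still blocked in write() when sleep exits.
large=$(bash -c 'yes | sleep 1; echo "${PIPESTATUS[@]}"')

echo "small writer: $small"
echo "large writer: $large"
```

The small writer should report 0 0, because its output was accepted by the
pipe while the read end was still open. The endless writer should report
141 0, that is 128 + 13, the usual encoding for death by SIGPIPE.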

The process must exit due to the SIGPIPE, by the way -- no message will be
printed if it catches the SIGPIPE and calls exit(), unless the fatal signal
handler is coded like this:

fatal(sig)
int	sig;
{
	cleanup();
	_exit(128+sig);
}

>Maybe I should take the time and hack up the source code of bash,
>but I'm not sure if it's worth the effort.

It's a several-minute job, to be sure ;-)

Chet
-- 
Chet Ramey			  Internet: chet@po.CWRU.Edu
Case Western Reserve University	  NeXT Mail: chet@macbeth.INS.CWRU.Edu

``Now,  somehow we've brought our sins back physically -- and they're pissed.''

martin@mwtech.UUCP (Martin Weitzel) (05/24/91)

In article <1991May22.192914.22142@usenet.ins.cwru.edu> chet@po.CWRU.Edu writes:
>The process must exit due to the SIGPIPE, by the way -- no message will be
>printed if it catches the SIGPIPE and calls exit(), unless the fatal signal
>handler is coded like this:
>
>fatal(sig)
>int	sig;
>{
>	cleanup();
>	_exit(128+sig);
	^^^^^^^^^^^^^^  rather: kill(getpid(), sig); ????

>}

Hmm, I know that the shell encodes the information that some program
was terminated by a signal this way in $? - but the other way round
should also be true? On which system and for which shell?

I've just run a quick test, started a child sub-shell from the shell
prompt and terminated that with exit N (for several N close above 128).
I saw no message from the parent shell though the exit status was
transferred correctly according to $?.

(In case it should matter: I ran the test with the Bourne shell on
ISC's UNIX/386 2.2.)
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

chet@odin.INS.CWRU.Edu (Chet Ramey) (05/24/91)

In article <1147@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:

>>	_exit(128+sig);
>	^^^^^^^^^^^^^^  rather: kill(getpid(), sig); ????

You're right, but you have to deal with all the differences between the
BSD/Posix signals and the Sys V signals.  Simply replacing the call to _exit()
with the call to kill() will cause an infinite loop.
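
A common way around the loop is to restore the default action before
re-sending the signal, so that the second delivery terminates the process
instead of re-entering the handler. A shell-level sketch of that idea,
assuming bash and using SIGTERM for illustration; cleanup and on_term are
hypothetical names, and the self-sent kill only stands in for a signal
arriving from outside.

```shell
marker=$(mktemp)

bash -c '
  marker=$1
  cleanup() { echo cleaned > "$marker"; }   # hypothetical cleanup hook
  on_term() {
    cleanup
    trap - TERM      # back to the default action, so the next TERM is fatal
    kill -TERM "$$"  # re-raise: the parent now sees a genuine signal death
  }
  trap on_term TERM
  kill -TERM "$$"    # stand-in for a signal sent by someone else
  echo "not reached"
' _ "$marker"
status=$?

echo "child status: $status, cleanup marker: $(cat "$marker")"
```

In `$?` the result looks the same as an explicit exit with 128 + 15; the
difference is only visible to a parent that inspects the wait() status,
which is what a shell uses when deciding whether to print a message.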

>Hmm, I know that the shell encodes the information that some program
>was terminated by a signal this way in $? - but the other way round
>should also be true? On which system and for which shell?

No system, and for no shell.  I made a mistake.

Chet