[comp.lang.perl] problem redirecting STDOUT to pipe

vsh@etnibsd.UUCP (Steve Harris) (07/13/90)

Perlers,

	[ I am running perl 3.0, patchlevel 15 (our netfeed has been
	down for a while).  If this problem has been fixed in a more
	recent patch, please let me know. ]


Abstract: Consider the following perl fragments:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

	print STDERR "before: fileno(STDOUT) = ", fileno(STDOUT), "\n";
	open(STDOUT, ">foo");
	print STDERR "after:  fileno(STDOUT) = ", fileno(STDOUT), "\n";
result:
	before: fileno(STDOUT) = 1
	after:  fileno(STDOUT) = 1

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

	print STDERR "before: fileno(STDOUT) = ", fileno(STDOUT), "\n";
	open(STDOUT, "| dd of=foo");
	print STDERR "after:  fileno(STDOUT) = ", fileno(STDOUT), "\n";
result:
	before: fileno(STDOUT) = 1
	after:  fileno(STDOUT) = 3

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

In the first case, re-opening STDOUT preserves the original fileno; in
the second, it does not.  Why?  Is this the way it should behave?


The Whole Story:

I'm writing a (table-driven) script to do backups to an Exabyte tape
drive.  I'm using John Gilmore's public domain tar (which we call
gnutar), which can read the list of files to be archived from stdin.
For the moment, all systems are NFS mounted and treated as part of a
vast virtual filesystem.

I use find to generate the list of files -- depending on conditions,
the archive will be full or incremental.  For incremental backups, I
use "find -newer timestamp_file" to generate the list of files; the
timestamp_file is touch'ed when a full backup is done.
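For illustration only, a rough Perl equivalent of the "find -newer timestamp_file" step. This sketch assumes a modern Perl 5 with the core File::Find module (not the perl 3.0 in use here), and newer_than() is a hypothetical helper, not part of the script. Like find, it compares each file's modification time with the timestamp file's and keeps only the strictly newer ones.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the paths under $dir whose modification time
# is strictly later than that of $stamp, roughly like "find $dir -newer $stamp".
sub newer_than {
    my ($dir, $stamp) = @_;
    my @st = stat $stamp or die "cannot stat $stamp: $!";
    my $cutoff = $st[9];                    # mtime of the timestamp file
    my @found;
    find(sub {
        my @lst = lstat $_;                 # don't follow symlinks, like find
        push @found, $File::Find::name if @lst && $lst[9] > $cutoff;
    }, $dir);
    my @sorted = sort @found;
    return @sorted;
}
```

Touching the timestamp file after a full backup (as described above) moves the cutoff forward, so the next incremental run picks up only files changed since then.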

I use bdd, a program from Delta Microsystems (from whom we purchased
our Exabyte), to write to the tape.  Bdd is based on dd, but allows you
to specify both a "bs" parameter (as in dd) and an "odbs" parameter, the
"output device block size".  When bdd processes its final chunk of
data, if that chunk is less than bs bytes, it will write that chunk out
in odbs blocks.  With normal dd, the final chunk is null-padded to bs
bytes.  If bs is 1 MB, that can be a lot of nulls.
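To put numbers on the padding difference, a small Perl sketch of the bytes that reach the tape for the final chunk. The odbs value and the chunk size are made up for illustration, and it is an assumption that bdd null-pads only its last odbs block; the description above says only that the chunk is written out in odbs blocks.

```perl
use strict;
use warnings;

# Bytes that reach the tape for a final chunk of $n bytes (0 < $n < $bs).
# dd pads the chunk with nulls to one full $bs-byte block.
sub dd_bytes {
    my ($n, $bs) = @_;
    die "final chunk must be shorter than bs\n" unless $n < $bs;
    return $bs;
}

# bdd writes the chunk in $odbs-sized blocks; ASSUMPTION: only the last
# $odbs block is null-padded, i.e. the size rounds up to a multiple of $odbs.
sub bdd_bytes {
    my ($n, $odbs) = @_;
    return $odbs * int(($n + $odbs - 1) / $odbs);
}

my $bs   = 1024 * 1024;   # 1 MB, as in the text
my $odbs = 1024;          # hypothetical output device block size
my $n    = 10_000;        # hypothetical size of the final chunk

printf "dd:  %d bytes (%d nulls)\n", dd_bytes($n, $bs), dd_bytes($n, $bs) - $n;
printf "bdd: %d bytes (%d nulls)\n", bdd_bytes($n, $odbs), bdd_bytes($n, $odbs) - $n;
# dd:  1048576 bytes (1038576 nulls)
# bdd: 10240 bytes (240 nulls)
```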

Each partition of each system is handled independently.  E.g., you
could have a full backup of the root partition and an incremental
backup of the /usr partition.  There is a separate timestamp_file in
the root directory of each partition.

Since tape filemarks on the Exabyte are costly (2+ MB, as I recall),
the entire archive for each system is written to a single tape file.
This file is prefixed with a (1024-byte) header describing the contents
of the tar saveset: the date, the system name, and, for each partition,
its type (full or incremental) and the mod date of its timestamp file.
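One way such a fixed-size header could be built in Perl, as a sketch only: the post lists what the header carries but not its layout, so the "key: value" text format, the field order, and the make_header() helper below are all hypothetical. pack's "a1024" template null-pads the text to exactly 1024 bytes.

```perl
use strict;
use warnings;

# Hypothetical layout: one "key: value" line per field, null-padded to
# exactly 1024 bytes. Each partition is [mount point, type, stamp mtime].
sub make_header {
    my ($sysname, $date, @partitions) = @_;
    my $text = "date: $date\nsystem: $sysname\n";
    for my $p (@partitions) {
        my ($mount, $type, $mtime) = @$p;
        $text .= sprintf "partition: %s %s %s\n",
                         $mount, $type, scalar localtime($mtime);
    }
    die "header longer than 1024 bytes\n" if length($text) > 1024;
    return pack('a1024', $text);    # 'a' pads with null bytes
}
```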

[ When I get the script working, I'll publish it to the net. ]


Here's an outline of the script (without error checking):

...;	# rewind tape
foreach $sys (@sys_list) {
	...;	# construct partition_list for this sys
	...;	# construct header for this sys
	$pid = open(TAR, "|-");
	if ($pid == 0) {	# child
		open(STDOUT, "| bdd of=$TAPE bs=$bs odbs=$odbs");
		print STDOUT $header;
		exec "gnutar -cvf - -T - -b 1";
	}
	# parent
	foreach $partition (@partition_list) {
		...;	# construct find_args for this sys/partition
		open(FIND, "find $find_args |");
		while (<FIND>) {
			...;		# canonicalize file name
			print TAR $_;	# pass file name to gnutar
		}
	}
	close TAR;	# wait for child to complete
}
...;	# rewind tape


As written above, the script won't work because gnutar has not
inherited the pipe on its stdout.  The "open(STDOUT, ...);" does not
associate file descriptor 1 with the pipe; it just reassigns the name
"STDOUT" to some other file descriptor.  Within perl that's fine, but
gnutar still thinks stdout is file descriptor 1.

I can get around the problem by replacing the child code with:

	if ($pid == 0) {	# child
		open(BDD, "| bdd of=$TAPE bs=$bs odbs=$odbs");
		open(STDOUT, ">&BDD");	# this reassigns STDOUT
		print STDOUT $header;

but then I cannot exec tar, as the tape device does not get closed and
the next attempt to open it (for the backup of the next system) will
fail.  Instead, I have to do (this works):

		system "gnutar -cvf - -T - -b 1";
		close BDD;
		close STDOUT;
	}

If I close BDD before the exec/system, the whole thing blocks.  If I
don't explicitly close both BDD and STDOUT, the next iteration fails
(bdd cannot open $TAPE, presumably it's not closed when the child exits).
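One detail that is likely relevant here, offered as a probable factor rather than a diagnosis: in Perl, close() on a filehandle opened to a pipe waits for the command to exit. The sketch below assumes a modern Perl 5 (list-form "|-" open); a small shell command stands in for bdd, and write_via_pipe() and the file name are hypothetical. Because close() blocks until the command exits, the output file is complete as soon as close() returns, much as bdd would have released $TAPE.

```perl
use strict;
use warnings;

# Sketch: a shell command standing in for bdd sleeps, then copies its
# stdin to $file. close() on the piped handle waits for it to exit.
sub write_via_pipe {
    my ($file, $data) = @_;
    open(my $out, '|-', 'sh', '-c', 'sleep 1; cat > "$1"', 'sh', $file)
        or die "cannot start writer: $!";
    print $out $data;
    close($out) or die "writer failed: \$? = $?\n";   # blocks until sh exits
    return $?;    # 0: the command exited normally
}
```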

I'm not at all sure that causing:
	
	open(STDOUT, "| bdd of=$TAPE bs=$bs odbs=$odbs");

to associate file descriptor 1 with the pipe will solve my problem, but
I wonder, isn't that at least the correct semantics?  And why do I have
to close BDD and STDOUT explicitly?

Comments, Larry?

Just another perl diver, coming up for air.

Steve Harris - Eaton Corp. - Beverly, MA - uunet!etnibsd!vsh

merlyn@iwarp.intel.com (Randal Schwartz) (07/16/90)

In article <1123@etnibsd.UUCP>, vsh@etnibsd (Steve Harris) writes:
| 	print STDERR "before: fileno(STDOUT) = ", fileno(STDOUT), "\n";
| 	open(STDOUT, ">foo");
| 	print STDERR "after:  fileno(STDOUT) = ", fileno(STDOUT), "\n";
| result:
| 	before: fileno(STDOUT) = 1
| 	after:  fileno(STDOUT) = 1
| =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
| 	print STDERR "before: fileno(STDOUT) = ", fileno(STDOUT), "\n";
| 	open(STDOUT, "| dd of=foo");
| 	print STDERR "after:  fileno(STDOUT) = ", fileno(STDOUT), "\n";
| result:
| 	before: fileno(STDOUT) = 1
| 	after:  fileno(STDOUT) = 3
| =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
| 
| In the first case, re-opening STDOUT preserves the original fileno; in
| the second, it does not.  Why?  Is this the way it should behave?

The short answer, which may lead you to more insight, is that you also
need to think of the stdout of the "dd" command.  The stdin fd of the
dd command (which you are asking to be the stdout of the parent)
cannot be the same as the stdout fd of dd.  Therefore, in order for
the dd to have a proper stdout (fd 1), the parent's stdout fd gets
trashed after all the piping and forking.

(Or something like that.  It's too hot to think here... we native
Oregonians are just not used to 85-degree weather for almost a whole
week in a row now.  Especially on a weekend.  Yuck.  I may actually
have to buy an air-conditioner for my house because of this stupid
greenhouse effect. :-)

print "Just another Perl hacker, with time enough to read only the first part of articles, :-)"
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/

worley@compass.com (Dale Worley) (07/16/90)

   X-Name: Randal Schwartz

   The short answer, which may lead you to more insight, is that you also
   need to think of the stdout of the "dd" command.  [...]

The problem is that this loses...  You have to always put STDOUT on fd
1, because otherwise if you exec something its output will go into
never-never land.

The proper way to handle pipes and redirection is to first create a
pipe pair (which goes on fd's 7 and 8, say), then fork, then each
process closes stdin/stdout/whatever and dups the input or output
side of the pipe into the correct fd's, and then both close fd's 7 and
8 (to clean up).  With this method, each process can put either end of
the pipe on whatever fd's it wants.
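A minimal sketch of that recipe in modern Perl 5 (not perl 3.0), using pipe(), fork() and POSIX::dup2(); run_capture() is a hypothetical helper. The child dups the pipe's write end onto fd 1, closes the original descriptor, and execs; the exec'd program then writes to its ordinary stdout, and the parent reads it from the other end.

```perl
use strict;
use warnings;
use POSIX ();

# Run @cmd with its stdout on a pipe; return (output, exit status).
sub run_capture {
    my @cmd = @_;
    pipe(my $rd, my $wr) or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                                       # child
        close $rd;
        POSIX::dup2(fileno($wr), 1) or POSIX::_exit(127);  # write end -> fd 1
        close $wr unless fileno($wr) == 1;                 # drop the extra fd
        exec { $cmd[0] } @cmd or POSIX::_exit(127);        # 127: exec failed
    }
    close $wr;                              # parent keeps only the read end
    my $output = do { local $/; <$rd> };    # read until the child closes fd 1
    close $rd;
    waitpid($pid, 0);
    return ($output // '', $? >> 8);
}
```

The parent must close its copy of the write end before reading; otherwise the read never sees end-of-file.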

While we're at it, what's the method for creating a pipe pair in Perl?
I can't find anything in the manual that does not also cause forking.
I'm sure this has been discussed before, but I don't remember the
answer and now I'm in the throes of a "redirect both STDIN and STDOUT
to a subprocess" program...

Dale Worley		Compass, Inc.			worley@compass.com
--
"If you could have any amount of money... How much would you want?"
"All of it."

merlyn@iwarp.intel.com (Randal Schwartz) (07/17/90)

In article <1990Jul16.161925.9869@uvaarpa.Virginia.EDU>, worley@compass (Dale Worley) writes:
| While we're at it, what's the method for creating a pipe pair in Perl?
| I can't find anything in the manual that does not also cause forking.
| I'm sure this has been discussed before, but I don't remember the
| answer and now I'm in the throes of a "redirect both STDIN and STDOUT
| to a subprocess" program...

Here's the hack I produced when this subject came up last time.
You have to watch your synchronization carefully, or use select():

==================================================
#!/local/usr/bin/perl

$| = 1;

sub make_child {
	local($read,$write,@cmd) = @_;
	# shouldn't need eval in next two lines...
	eval "pipe(CHILD_READ,$write)" || die "no pipe: $!";
	eval "pipe($read,CHILD_WRITE)" || die "no pipe: $!";
	local($pid) = fork;
	unless ($pid) { # child:
		close($read);
		close($write);
		open(STDIN,"<&CHILD_READ") || die "cannot dup: $!";
		open(STDOUT,">&CHILD_WRITE") || die "cannot dup: $!";
		exec @cmd;
		die "cannot exec @cmd: $!";
	} else { # parent:
		die "no fork: $!" unless defined $pid;
		close(CHILD_READ);
		close(CHILD_WRITE);
		local($os) = select($write); $| = 1; select($os);
	}
}

&make_child("BR","BW","/bin/sh");
print BW "date\n";
$x = <BR>; print $x;
close(BW);
print <BR>;
==================================================

And yes, Larry, look at those lines marked "shouldn't need eval".  Why
do I need eval?  I thought a scalar was always a good replacement for
a filehandle...

print "Just another Perl hacker,"

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (07/17/90)

In article <1123@etnibsd.UUCP> vsh@etnibsd.UUCP (Steve Harris) writes:
: Perlers,
: 	print STDERR "before: fileno(STDOUT) = ", fileno(STDOUT), "\n";
: 	open(STDOUT, "| dd of=foo");
: 	print STDERR "after:  fileno(STDOUT) = ", fileno(STDOUT), "\n";
: result:
: 	before: fileno(STDOUT) = 1
: 	after:  fileno(STDOUT) = 3
: 
: =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
: 
: In the first case, re-opening STDOUT preserves the original fileno; in
: the second, it does not.  Why?  Is this the way it should behave?

The problem is that pipe() opens two file descriptors, the second one
being the write descriptor.  I've made the mypopen() routine detect this
and dup the upper descriptor to the lower where necessary, so it'll be
fixed in the next patch.
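Larry's fix lives in perl's C source (mypopen()), and what follows is not that code. It is a Perl-level sketch of the same idea for a modern Perl 5 with the POSIX module. The sketch starts the command on the read end of a pipe, then dups the write end (the second descriptor pipe() opens) down onto fd 1. That way fileno(STDOUT) stays 1, and anything exec'd afterwards inherits the pipe as its stdout. pipe_stdout_to() is a hypothetical helper; the caller is expected to close STDOUT and waitpid() when done.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();

# Start @cmd reading a pipe, then make this process's fd 1 the write end.
sub pipe_stdout_to {
    my @cmd = @_;
    pipe(my $rd, my $wr) or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                                   # child: read end -> fd 0
        close $wr;
        POSIX::dup2(fileno($rd), 0) or POSIX::_exit(127);
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }
    close $rd;
    STDOUT->flush;                     # keep earlier output out of the pipe
    POSIX::dup2(fileno($wr), 1) or die "dup2: $!";     # write end -> fd 1
    close $wr unless fileno($wr) == 1; # fd 1 is now the only copy
    return $pid;
}
```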

Larry

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (07/18/90)

In article <1990Jul16.180207.28646@iwarp.intel.com> merlyn@iwarp.intel.com (Randal Schwartz) writes:
: And yes, Larry, look at those lines marked "shouldn't need eval".  Why
: do I need eval?  I thought a scalar was always a good replacement for
: a filehandle...

Er, it is if I don't blow it...

After the next patch the line in arg.h which reads

	A(0,0,0),       /* PIPE */

will read

	A(1,1,0),       /* PIPE */

and you'll be much happier with life.
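For comparison, a sketch of Randal's make_child for a modern Perl 5 (5.8 or later assumed), where pipe() accepts lexical filehandles directly, so the eval workaround is unnecessary. This is not code from the thread. Unlike the original, it returns the parent's read and write handles and the child's pid instead of taking filehandle names as arguments.

```perl
use strict;
use warnings;
use IO::Handle;

# Connect @cmd's stdin and stdout to two pipes; return ($read, $write, $pid).
sub make_child {
    my @cmd = @_;
    pipe(my $child_read, my $write) or die "no pipe: $!";
    pipe(my $read, my $child_write) or die "no pipe: $!";
    my $pid = fork;
    die "no fork: $!" unless defined $pid;
    unless ($pid) {                                   # child
        close $read;
        close $write;
        open(STDIN,  '<&', $child_read)  or die "cannot dup: $!";
        open(STDOUT, '>&', $child_write) or die "cannot dup: $!";
        exec @cmd or die "cannot exec @cmd: $!";
    }
    close $child_read;                                # parent
    close $child_write;
    $write->autoflush(1);    # same job as the select/$| dance in the original
    return ($read, $write, $pid);
}
```

As Randal notes, synchronization is still the caller's problem: reading a line the child has not yet written will block.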

Larry