[comp.compression] Problems running compress from perl on MS-DOS

bill@camco.Celestial.COM (Bill Campbell) (05/10/91)

I am trying to build an archive of compressed files on an MS-DOS
system using a perl script.  The guts are:
	open(INPUT, "filelist");	# opens input file list
	open(ARCHIVE, "archive");
	binmode(ARCHIVE);
	$buf_size = 512;
	while(<INPUT>) {
		chop;
		system("compress < $_ > tmpfile.Z");
		$comp_size = (stat('tmpfile.Z'))[7];
		&write_file_header();	# contains names, sizes...
		open(WORKFILE, "tmpfile.Z");
		binmode(WORKFILE);
		$\ = '';	# don't insert anything after each write
		while(sysread(WORKFILE, $buffer, $buf_size) > 0) {
			print ARCHIVE $buffer;
			undef($buffer);	# strange stuff remaining on short
							# read if I don't do this.
		}
		close(WORKFILE);
	}
	close(INPUT);....

This works fine on Xenix/Unix systems, but the MS-DOS version has
one of two problems:
	1.	It frequently says that it is out of memory compressing
		from stdin.  I have recompiled compress for 12-bit
		compression and it does this less frequently, but it
		still does it.
	2.	The compressed file is longer than it should be.  It
		appears that it might be created as a text file rather
		than binary when the standard output is redirected to the
		file.

I'm using compress,v 4.3d compiled with Xenix cc -dos in large model.

I don't know much about MS-DOS (by choice), and don't understand
why this should fail in this manner.  Compressing in-place works
fine under MS-DOS.

Does anybody know what I'm doing wrong?

Thanks
-- 
INTERNET:  bill@Celestial.COM   Bill Campbell; Celestial Software
UUCP:   ...!thebes!camco!bill   6641 East Mercer Way
             uunet!camco!bill   Mercer Island, WA 98040; (206) 947-5591

lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (05/10/91)

In article <991@camco.Celestial.COM> bill@camco.Celestial.COM (Bill Campbell) writes:
: I am trying to build an archive of compressed files on an MS-DOS
: system using a perl script.  The guts are:

: 	open(ARCHIVE, "archive");
: 	binmode(ARCHIVE);
...

: This works fine on Xenix/Unix systems, but the MS-DOS version has
: one of two problems:
: 	1.	It frequently says that it is out of memory compressing
: 		from stdin.  I have recompiled compress for 12-bit
: 		compression and it does this less frequently, but it
: 		still does it.

You need to get ahold of Len Reed's mods that swap perl out to secondary
storage while running a subprocess.  compress's data structures tend
to use all of an MS-DOS machine's ordinary memory.  Or get a version
of compress that uses extraordinary memory.  :-)

: 	2.	The compressed file is longer than it should be.  It
: 		appears that it might be created as a text file rather
: 		than binary when the standard output is redirected to the
: 		file.

I don't know if it's related, but you need to put a > before the
filename when you open the archive.  It might be that binmode only works
if you've opened the file for output.  That wouldn't explain compress's
problems, if that's where the problem is.  It might explain the buffer
troubles though.

Just not another MS-DOS hacker,
Larry

wjb@moscom.UUCP (Bill de Beaubien) (05/10/91)

In article <991@camco.Celestial.COM> bill@camco.Celestial.COM (Bill Campbell) writes:
 >I am trying to build an archive of compressed files on an MS-DOS
 >system using a perl script....  
[code omitted]
 >	1.	It frequently says that it is out of memory compressing
 >		from stdin.  I have recompiled compress for 12-bit
 >		compression and it does this less frequently, but it
 >		still does it.
 >	2.	The compressed file is longer than it should be.  It
 >		appears that it might be created as a text file rather
 >		than binary when the standard output is redirected to the
 >		file.
 >
 >I'm using compress,v 4.3d compiled with Xenix cc -dos in large model.
 >
 >I don't know much about MS-DOS (by choice), and don't understand
 >why this should fail in this manner.  Compressing in-place works
 >fine under MS-DOS.
 >
 >Does anybody know what I'm doing wrong?

Well, I don't know much about compress, but it seems reasonable that if
you try to run it from a shell under Perl it's going to have a lot less
memory to work with.  Perl 4.0p3 works out to 313k (no optimization),
plus memory for the shell, leaving you with maybe 200k to work with,
if you're lucky.  Running out of memory's hardly a surprise...



-- 
"Bless me, Father; I ate a lizard."
"Was it an abstinence day, and was it artificially prepared?"
-------------------------------------------------------------
Bill de Beaubien / wjb@moscom.com 

lbr@holos0.uucp (Len Reed) (05/13/91)

In article <991@camco.Celestial.COM> bill@camco.Celestial.COM (Bill Campbell) writes:
 [script deleted]
 
>This works fine on Xenix/Unix systems, but the MS-DOS version has
>one of two problems:
>	1.	It frequently says that it is out of memory compressing
>		from stdin.  I have recompiled compress for 12-bit
>		compression and it does this less frequently, but it
>		still does it.

It's tough to run *any* subprocess from non-swapping perl.  I distributed
an enhanced 3.041 version in the fall that swapped perl to disk or RAM-disk
when running a subprocess.  (Perl takes over 300K before it even starts
mallocing.)

Tom Dinger was folding swapping code into 4.x last I heard.  I've been
meaning to write to him.  In the meantime you should be able to find
the swapping version any place that archives comp.binaries.ibm.pc.

>	2.	The compressed file is longer than it should be.  It
>		appears that it might be created as a text file rather
>		than binary when the standard output is redirected to the
>		file.

Your diagnosis is surely correct.  Your compressed file will be hopelessly
corrupt, since 0xA will become {0xD, 0xA}.  The problem is in compress
itself, which is writing in "text" mode.  (BTW, it's probably also reading
in text mode: if you're reading a binary file this, too, is a problem.)
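
If you want to see the expansion for yourself, here is a throwaway test --
not from compress, just an illustration; any MS-DOS C compiler should show
the difference, while on Unix the two files come out identical:

	#include <stdio.h>

	/* Write the same six bytes (starting with compress's magic number)
	 * through a text-mode and a binary-mode stream.  On MS-DOS the
	 * text-mode file ends up 8 bytes long: every 0x0A grows an 0x0D in
	 * front of it.  On Unix both files are 6 bytes. */
	int main()
	{
		static unsigned char buf[] = { 0x1f, 0x9d, 0x0a, 0x41, 0x0a, 0x42 };
		FILE *text = fopen("text.out", "w");	/* text mode */
		FILE *bin = fopen("bin.out", "wb");	/* binary mode */

		if (text == NULL || bin == NULL)
			return 1;
		fwrite(buf, 1, sizeof buf, text);
		fwrite(buf, 1, sizeof buf, bin);
		fclose(text);
		fclose(bin);
		return 0;
	}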

You must alter compress so that it handles the output file in binary.
This can be done when the output file is opened, by ORing in O_BINARY along with
the usual assortment of O_ flags.  For already-open files (e.g., stdout),
use "setmode (handle, O_BINARY)".  Key this all on "#ifdef MSDOS" and
you're all set.
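
In outline it looks something like this -- a sketch only, not the real
compress source; the <io.h> header, the 0666 mode, and the variable names
are whatever your compiler and the 4.3 code actually use:

	/* Not the real compress source -- a self-contained sketch of the
	 * two changes, keyed on MSDOS.  <io.h> and setmode() are the
	 * Microsoft-style names; other DOS compilers may spell them
	 * differently. */
	#include <stdio.h>
	#include <fcntl.h>
	#ifdef MSDOS
	#include <io.h>		/* setmode(), open() on DOS */
	#endif

	int main()
	{
		int ofd;

	#ifdef MSDOS
		/* Already-open streams (stdin/stdout): switch them to binary. */
		setmode(fileno(stdin), O_BINARY);
		setmode(fileno(stdout), O_BINARY);
	#endif

		/* Files compress opens itself: OR O_BINARY into the usual flags. */
		ofd = open("out.Z", O_WRONLY | O_CREAT | O_TRUNC
	#ifdef MSDOS
				| O_BINARY
	#endif
				, 0666);
		if (ofd < 0)
			perror("out.Z");
		return 0;
	}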

BTW, binmode() in perl simply executes setmode(handle, O_BINARY).

>I'm using compress,v 4.3d compiled with Xenix cc -dos in large model.
>
>I don't know much about MS-DOS (by choice), and don't understand
>why this should fail in this manner.  Compressing in-place works
>fine under MS-DOS.

I don't have 4.3 source.  Perhaps the open() call has O_BINARY but
there's no setmode() call to handle the case of writing to stdout?


-- 
Len Reed
Holos Software, Inc.
Voice: (404) 496-1358
UUCP: ...!gatech!holos0!lbr