[comp.sys.apple] //gs APW "compress"

fadden@cory.Berkeley.EDU (Andy McFadden) (12/16/89)

This is the README file from a program I'm about to post on
comp.binaries.apple2.  I've posted it here primarily so that people can see
what they're getting without having to download the file...

(if this looks familiar, it's because I posted something similar a few
 hours ago, canceled the article just now, and posted this one...  hopefully
 it hadn't propagated too far).

-- 
fadden@cory.berkeley.edu (Andy McFadden)
...!ucbvax!cory!fadden

-----
UNIX compress (12 or 13 bit)
APW C port by Andy McFadden (fadden@cory.berkeley.edu)
Version 1.1  December 1989

***** How to use:
compress [-dcvfV] [-b maxbits] [file ...]

    -d:     If given, decompression is done instead
    -c:     Write output on stdout, don't remove original.
    -b:     Parameter limits the max number of bits/code [12 or 13]
    -f:     Forces output file to be generated, even if one already exists,
            and even if no space is saved by compressing.  If -f is not
            used, the output file will not be overwritten if it exists.
    -v:     Write compression statistics.
    -V:     Print version info.
    file..: Files to be compressed.  If none specified, stdin is used.

Output goes to file.Z (with the same attributes as the original), or to
stdout (if stdin is used as input).  The original file is not replaced if no
compression is achieved.  Filenames must be short enough to take the ".Z"
suffix.

To uncompress a file, the best way is to put
alias uncompress   compress -d
in your login file.  Then just use uncompress with the above options.

Note that the maximum for this version is  ** 13 bits **  .  If you use
the compress command normally under UNIX, you will make a compressed file
which may use 16 bit codes.  This port CANNOT uncompress such a file, so make
sure you do something like
% compress -b13 file1 file2 ...
when you are compressing the files initially.
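
One way to tell in advance whether a given .Z file is within reach of this
port is to look at its three-byte header.  The sketch below assumes the usual
compress layout (magic bytes 0x1f 0x9d, then a flag byte whose low five bits
hold the maximum code size).  The function names and PORT_MAXBITS are made up
for illustration; they are not taken from compress.c.

```c
#include <assert.h>
#include <stddef.h>

#define Z_MAGIC_0    0x1f   /* first magic byte of a .Z file */
#define Z_MAGIC_1    0x9d   /* second magic byte */
#define Z_BITS_MASK  0x1f   /* low five bits of byte 3: max code size */
#define PORT_MAXBITS 13     /* limit of this port, per the README
                               (hypothetical name) */

/* Return the maximum code size stored in the header, or -1 if the
   buffer does not start with the compress magic number. */
int z_header_maxbits(const unsigned char *hdr, size_t len)
{
    assert(hdr != NULL);
    if (len < 3 || hdr[0] != Z_MAGIC_0 || hdr[1] != Z_MAGIC_1)
        return -1;
    return hdr[2] & Z_BITS_MASK;
}

/* Nonzero if a file with this header stays within the 13-bit limit. */
int z_can_uncompress(const unsigned char *hdr, size_t len)
{
    int bits = z_header_maxbits(hdr, len);
    return bits > 0 && bits <= PORT_MAXBITS;
}
```

Read the first three bytes of the file into a buffer and pass them to
z_can_uncompress(); a zero result means the file should be recompressed on
the UNIX side with -b13 (or -b12) before downloading.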


***** Benchmarks / statistics:

(note these were made with a 1K I/O buffer; see comments below)

File        Storage     Compress        Uncompress      Size
------------------------------------------------------------------------------
Moria GS    hard drive  870/1024 sec    345/330 sec     66% / 64%  of original.
 (577K)
compress.c  3.5" drive  55 / 69 sec     38 / 37 sec     47% / 41% of original.
 (45K)      hard drive  46 / 60 sec     28 / 30 sec
            /ram5       41 / 54 sec     24 / 24 sec

The double entries correspond to 12-bit and 13-bit compress, respectively.
13-bit compress generally requires slightly more time to compress, but about
the same (or even LESS in the case of Moria GS) to uncompress.  So it is
probably to your advantage to use 13-bit codes.

By way of comparison, ShrinkIt v2.1 takes about two minutes to compress
Moria GS, with slightly better compression.  It takes about 10 seconds to
compress "compress.c"; the resulting file is 49% of the original.  Generally
speaking, ShrinkIt compresses binary or sparse files better than UNIX compress
can.  On text files, however, the crunching power of compress is difficult to
match.

It should be painfully obvious from the few statistics here that disk speed
plays a large role in the time required.  After Larry Virden pointed out
the virtues of setvbuf(), I boosted the buffer size to 8K.  The time to
compress and uncompress "compress.c" on a 3.5" drive was reduced by six and
three seconds, respectively.
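
For anyone who wants to try the same trick in another program, here is a
minimal sketch of the setvbuf() call.  The helper name and IO_BUFSIZE are
illustrative only, and passing NULL so that the library allocates the buffer
is standard C behavior that I am assuming rather than something confirmed for
APW C.

```c
#include <assert.h>
#include <stdio.h>

#define IO_BUFSIZE 8192   /* 8K, the size mentioned above */

/* Give a stream a fully buffered 8K buffer.  Call this after the stream
   is opened and before any other operation on it.
   Returns 0 on success, nonzero on failure (as setvbuf does). */
int use_big_buffer(FILE *fp)
{
    assert(fp != NULL);
    /* A NULL buffer asks the library to allocate one of the given size. */
    return setvbuf(fp, NULL, _IOFBF, IO_BUFSIZE);
}
```

Call use_big_buffer() once on the input stream and once on the output stream,
right after fopen().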


***** Notes:

- This will require about 125K free memory to run.

- The executable file is NOT compacted.  When I ran compact on it, no
  errors were reported.  However, the program crashed when I ran it.  I
  tried this several times, all to no avail.  Apparently this is due to
  the increased array sizes for 13-bit compress; there were no problems
  compacting a 12-bit only version (obtainable by setting USERMEM to
  65000 in the first few lines of the program).  Using static arrays instead
  of global automatic arrays didn't help (it still crashed, in more or
  less the same bank 0 location).

- When uncompressing, either don't type the last .Z or type ".Z", not
  ".z".  Under UNIX, ".z" is a different kind of file, and some parts of
  compress do distinguish case.

- Compressing a file to stdout will probably be a mistake, since I believe
  that will convert linefeeds (hex 0a) to carriage returns (hex 0d).  However,
  uncompressing a text file to a file works fine.  Uncompressing a text file
  to the screen has the usual APW one-line weirdness.

- This version does not properly support wildcards, although it does
  correctly handle ".." and device numbers.  Things generally work the way
  you would expect them to, except for '=' and '?'.

- The UNIX version prompts before overwriting; APW (apparently) does not
  allow reading from stderr, which is what compress wants to do, so I just
  made it not overwrite the file unless the -f option is used.

- If you compress a file on UNIX and try to download it, it may grow in size
  because some transfer programs/protocols append null characters to the
  ends of files.  Unfortunately this may cause compress to become confused...
  the best solution is probably to encapsulate it with something like NuLib
  (archive w/o compression) and then use ShrinkIt to extract the compressed
  file (extract uncompressed; works pretty fast).  This will preserve the
  original EOF.

- This won't work on subdirectories or extended files.
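
To illustrate the notes about the ".Z" suffix (it is case-sensitive, and the
filename has to leave room for it), here is a small sketch of the kind of
check involved.  This is not the code from compress.c; z_name() is a
hypothetical helper written just for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the name of the compressed file for `name` in `out`.
   If `name` already ends in ".Z" (capital Z, compared case-sensitively)
   it is copied unchanged; otherwise ".Z" is appended.
   Returns 0 on success, -1 if the result does not fit in `outsize`. */
int z_name(const char *name, char *out, size_t outsize)
{
    size_t len;
    int has_suffix;

    assert(name != NULL && out != NULL);
    len = strlen(name);
    has_suffix = len >= 2 && strcmp(name + len - 2, ".Z") == 0;

    if (len + (has_suffix ? 0 : 2) + 1 > outsize)
        return -1;               /* name too long to take the suffix */
    strcpy(out, name);
    if (!has_suffix)
        strcat(out, ".Z");
    return 0;
}
```

With this helper, "foo" and "foo.Z" both map to "foo.Z", while "foo.z" is
treated as an ordinary name and becomes "foo.z.Z".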

lvirden@pro-tcc.cts.com (Larry Virden) (12/18/89)

In-Reply-To: message from fadden@cory.Berkeley.EDU

Andy, what source code changes have been made to this compress?  I know of a
source for a 16 bit compress in C which works on MS-DOS.  Since they have many
of the same memory problems that the IIgs does, I wonder if we couldn't use
that code pretty much as is.

Also, has anyone thought of a way to build an interface to a C program so that
it would operate in 3 modes:

1. from a shell with arguments
2. from a program launcher without arguments (from prosel, etc.)
3. from the Finder launched as a part of an Icon.

I would think that a standard subroutine could pretty much be written up which
could use getopt to parse arguments as provided by the user, or prompt the
user (perhaps as dialog boxes?) as necessary.

Has anyone taken this source code and compiled it under Orca/C?
-- 
Larry W. Virden                 ProLine: pro-tcc!lvirden
674 Falls Place                 Work:   lvirden@cas.bitnet
Reynoldsburg, OH 43068-1614     Aline:  LVIRDEN
                                CIS:    75046,606

fadden@cory.Berkeley.EDU (Andy McFadden) (12/19/89)

In article <3734.feeds.info-apple@pro-tcc> lvirden@pro-tcc.cts.com (Larry Virden) writes:
>In-Reply-To: message from fadden@cory.Berkeley.EDU
>Andy, what source code changes have been made to this compress?  I know of a
>source for a 16 bit compress in C which works on MS-DOS.  Since they have many
>of the same memory problems that the IIgs does, I wonder if we couldn't use
>that code pretty much as is.

I made very few changes to the code, and even then the changes were only to
things involving signal(), chmod(), and utime().  The most recent version I
have is a slightly hacked v4.0; I wasn't aware until today of an MS-DOS
compatible version.  At any rate, I made *no* modifications to the code
that did the actual compression, and wound up writing my own versions of
stat() and unlink() so that those portions of the code could remain unchanged.

Apparently there already is an APW version that handles 16-bit codes; if it
isn't on an archive site, perhaps the author will do a repost (the original
was posted about a year ago).

The reason I chose to limit it to 13-bit codes is the following:

- Minimum free memory for 13 bits is 73464 bytes.
- Minimum free memory for 16 bits is 433484 bytes.
- 13-bit codes require 16-bit ints; 16-bit codes require 32-bit longs (so
  all handling of the codes is slowed by 50%).
- The M_XENIX defines (for compilers unable to reasonably cope with arrays >
  64K) only handle BITS = 12, 13, or 16.

As far as I can tell, modifying it to handle 16-bit codes would require
changing one line, but would increase memory use from about 125K to around
455K.  I guess if you're using APW, then you've probably got enough memory
to handle it, but I wanted it to work for somebody on a 768K system running
ECP-16.
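
To make the trade-off concrete, here is a small sketch of the selection using
only the two thresholds quoted above.  As far as I know compress.c makes this
choice at compile time from USERMEM; the run-time function below is just an
illustration, its names are made up, and code sizes not discussed here are
left out.

```c
#include <assert.h>

/* Thresholds quoted in this post, in bytes of free memory. */
#define MEM_FOR_13_BITS  73464L
#define MEM_FOR_16_BITS 433484L

/* Pick a maximum code size from the available memory.
   Only the sizes discussed in the post (12, 13, 16) are modeled. */
int max_bits_for(long usermem)
{
    assert(usermem >= 0);
    if (usermem >= MEM_FOR_16_BITS)
        return 16;
    if (usermem >= MEM_FOR_13_BITS)
        return 13;
    return 12;
}
```

With USERMEM set to 65000, as in the README's 12-bit-only build, this gives
12; anything from 73464 up to just under 433484 gives 13.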

>Also, has anyone thought of a way to build an interface to a C program so that
>it would operate in 3 modes:
>
>1. from a shell with arguments
>2. from a program launcher without arguments (from prosel, etc.)
>3. from the Finder launched as a part of an Icon.

Change the filetype to S16.  Rogue works that way.  It may be possible to
check the filetype of argv[0]; that would give you a good indication of how
the program is being run, and allow you to react appropriately.  I think
numbers 2 & 3 are pretty much the same...

-- 
fadden@cory.berkeley.edu (Andy McFadden)
...!ucbvax!cory!fadden

mattd@Apple.COM (Matt Deatherage) (12/19/89)

In article <3734.feeds.info-apple@pro-tcc> lvirden@pro-tcc.cts.com (Larry Virden) writes:
>
>Also, has anyone thought of a way to build an interface to a C program so that
>it would operate in 3 modes:
>
>1. from a shell with arguments
>2. from a program launcher without arguments (from prosel, etc.)
>3. from the Finder launched as a part of an Icon.
>
>I would think that a standard subroutine could pretty much be written up which
>could use getopt to parse arguments as provided by the user, or prompt the
>user (perhaps as dialog boxes?) as necessary.
>
>Has anyone taken this source code and compiled it under Orca/C?

Actually, this is not difficult.  When a program is launched, the X and Y
registers point to the arguments (if there are any), and are both $0000 if
there are none.  The "Finder" interface is the same for all program launchers
(including ProSEL): it specifies that if Message #1 exists in the Message
Center, it contains the pathnames of the documents to open.

The MessageCenter chapter of Toolbox Reference and the GS/OS Reference contain
all the information any interested party needs to write such a piece of code.
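
The decision itself can be sketched in portable C.  How the two inputs are
obtained (reading the launch registers, querying the Message Center) is left
out on purpose, since that needs the toolbox calls described in the manuals.
The names below are hypothetical, and checking the command line before the
message is an assumption made for this sketch.

```c
#include <assert.h>

enum launch_mode {
    LAUNCH_SHELL,       /* arguments were passed on a command line */
    LAUNCH_DOCUMENTS,   /* a launcher passed document pathnames */
    LAUNCH_BARE         /* nothing passed: prompt the user */
};

/* cmdline_ptr:    the X/Y launch value combined into one number,
                   0 if there were no arguments.
   have_message_1: nonzero if Message #1 was found in the Message Center. */
enum launch_mode classify_launch(unsigned long cmdline_ptr, int have_message_1)
{
    if (cmdline_ptr != 0)
        return LAUNCH_SHELL;
    if (have_message_1)
        return LAUNCH_DOCUMENTS;
    return LAUNCH_BARE;
}
```

A program would call this once at startup and then branch to argument
parsing, document handling, or an interactive prompt.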

-- 
============================================================================
Matt Deatherage, Apple Computer, Inc. | "The opinions represented here are
Developer Technical Support, Apple II |  not necessarily those of Apple
Group.  Personal mail only, please.   |  Computer, Inc.  Remember that."
============================================================================

lvirden@pro-tcc.cts.com (Larry Virden) (12/22/89)

In-Reply-To: message from mattd@Apple.COM

Thanks Matt - sounds very useful.  Maybe when I get things together here far
enough along to begin doing some work I will put one of those little buggers
together.
-- 
Larry W. Virden                 ProLine: pro-tcc!lvirden
674 Falls Place                 Work:   lvirden@cas.bitnet
Reynoldsburg, OH 43068-1614     Aline:  LVIRDEN
                                CIS:    75046,606