[comp.binaries.ibm.pc.d] SIMTEL20 to ban ARC files

W8SDZ@SIMTEL20.ARMY.MIL (Keith Petersen) (09/08/88)

SIMTEL20 today announced that it will soon be banning all ARC files
from its archives.

We have contacted Phil Katz and asked to make arrangements for
anything he develops in whatever new format in C source so that it can
be ported to our TOPS-20 operating system.  Then, as time permits, it is
our intent to convert *everything* on SIMTEL20 from ARC to that new
format, including the PC/BLUE files.

--Keith Petersen
Maintainer of the CP/M and MSDOS archives at SIMTEL20.ARMY.MIL [26.0.0.74]
Arpa: W8SDZ@SIMTEL20.ARMY.MIL
Uucp: {att,decwrl,harvard,ucbvax,uunet,uw-beaver}!simtel20.army.mil!w8sdz

nelson@sun.soe.clarkson.edu (Russ Nelson) (09/08/88)

   SIMTEL20 today announced that it will soon be banning all ARC files
   from its archives.

   We have contacted Phil Katz and asked to make arrangements for
   anything he develops in whatever new format in C source so that it can
   be ported to our TOPS-20 operating system.  Then, as time permits, it is
   our intent to convert *everything* on SIMTEL20 from ARC to that new
   format, including the PC/BLUE files.

So what is wrong with zoo?  looz is Public Domain, the real thing.  I think
that going with yet another commercial product is just asking for trouble.
I think that Rahul would not object at all if someone came out with a
commercial version of zoo, provided, of course, that it didn't violate the
copyright on his code.
--
--russ (nelson@clutx [.bitnet | .clarkson.edu])
Shuzan held out his short staff and said, "If you call this a short staff,
you oppose its reality.  If you do not call it a short staff, you ignore the
facts.  Now, what do you wish to call it?"

malpass@vlsi.ll.mit.edu (Don Malpass) (09/08/88)

In article <NELSON.88Sep7224416@sun.soe.clarkson.edu> nelson@clutx.clarkson.edu writes:
>So what is wrong with zoo?  looz is Public Domain, the real thing.  I think
>that going with yet another commercial product is just asking for trouble.

I second the motion.
-- 
Don Malpass   [malpass@LL-vlsi.arpa],  [malpass@spenser.ll.mit.edu] 
  My opinions are seldom shared by MIT Lincoln Lab, my actual
    employer RCA (known recently as GE), or my wife.

hartung@sdics.ucsd.EDU (Jeff Hartung) (09/08/88)

In article <159@vlsi.ll.mit.edu> malpass@ll-vlsi.arpa.UUCP (Don Malpass) writes:
>In article <NELSON.88Sep7224416@sun.soe.clarkson.edu> nelson@clutx.clarkson.edu writes:
>>So what is wrong with zoo?  looz is Public Domain, the real thing.
>I second the motion.

Add my vote as well.  It makes no sense to set oneself up for the same
problems that have arisen from the recent ARC wars.  (Personally, I'd
have liked zoo better even if there *hadn't* been a court battle over
ARC.)


-- 
 --Jeff Hartung--                        
                                         
 ARPA - hartung@sdics.ucsd.edu          
 UUCP - !ucsd!sdics!hartung            

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (09/09/88)

  There has been a lot of stuff posted about changing from arc to
another format, including the following excerpts. I feel very strongly
that if people want to get off of ARC they should consider going to an
established standard, like zoo. To jump from one proprietary standard to
another is really bad from a technology standpoint, and to announce that
you will do so before the product exists seems totally irrational.

  If PK releases a new format, and it's slow, buggy, and produces larger
files than ARC(tm), would you go to it anyway? Or would you want to say
"I made a bad statement, and I'm not going to do it?" If PK decides to
*require* payment and set it at $100, are you still going to do it? Is
PK going to allow anyone to use his algorithms and file format, or lock
them down, the very thing you found so unpalatable from SEA?

  Rahul has never claimed any control over his file format, and even has
a more or less complete document describing it, which I'm trying to use
to write some utilities to complement zoo. 

I'll go on record, too, with a rash statement about the new format:
	  I will evaluate the new compressor when and if it comes out,
	on the basis of speed, compression, and user acceptance. I
	will leave social issues out of technical decisions, and only
	change if the new method is better than what I use now.

Technical issue:

  I have tried a quick and dirty grafting of splay tree compression onto
my old FastArch program (remember that one, old-timers?) and found that
it is slower than zoo by at least 20%. I'm sure that you could do better
in assembler, if you don't mind a week or two of porting for each
machine, and taking the chance that whoever does the port messes it up
so that archives won't unpack.

These are some examples of the "blind faith" acceptance of the new
standard file compressor:

  From: W8SDZ@SIMTEL20.ARMY.MIL (Keith Petersen)
  Subject: Exec-PC BBS to ban ARC files
  Date: 7 Sep 88 21:05:00 GMT
  
  [Exec-PC BBS is one of the largest MS/PCDOS-oriented systems in the country]
  -----
  
  A STATEMENT FROM THE EXEC-PC BBS CONCERNING THE RECENT SEA VS PKWARE SUIT
	[ ... ]
  
  Exec-PC has decided the following:  As soon as PKware brings out a new
  format for creating crunched/squeezed/squashed/packed/tramped collections
  of files, that new format will be used for ALL files on the Exec-PC BBS.
  At last count Exec-PC had more than 16,000 files online in the arc format.
  
  Many sysops are nervous about the amount of work required to convert to
  a new format.  I don't understand what the problem is.  A simple batch file
  can be created for unarcing all the old files, then rePAKing them into the
  new format.

How about terrified? Forget the human effort; on an AT-class machine
figure about 900-1500 bytes/sec to unarc and repack, based on the
*uncompressed* size of the files. Add about 1.5 sec/file to create and
delete directory entries, etc., and then look at how much stuff you have.
The answer comes in days. Do I want to take my system down for days? Do
I want to do the conversion in the background and take weeks? To go to a
format which 95% of my users don't have?
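Those figures work out roughly like this (a sketch: the 16,000 count is Exec-PC's own, but the average uncompressed archive size is an assumed illustration):

```python
# Rough conversion-time estimate from the figures above.
# The average archive size is an assumption for illustration only.
ARCHIVES = 16_000              # Exec-PC's stated file count
AVG_UNCOMPRESSED = 100_000     # bytes per archive (assumed)
THROUGHPUT = 1_200             # bytes/sec to unarc + repack (midpoint of 900-1500)
PER_FILE_OVERHEAD = 1.5        # sec/file for directory create/delete, etc.

crunch = ARCHIVES * AVG_UNCOMPRESSED / THROUGHPUT
overhead = ARCHIVES * PER_FILE_OVERHEAD
total_days = (crunch + overhead) / 86_400
print(f"about {total_days:.0f} days of continuous conversion")
```

Even at the optimistic end of the throughput range the answer is measured in days, which is the point being made above.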

I have always let the users vote with their modems. If people upload in
arc format, or pkarc, or zoo or dwc, and if they download in those
formats, they're telling me something.

================================================================
  From: W8SDZ@SIMTEL20.ARMY.MIL (Keith Petersen)
  Subject: SIMTEL20 to ban ARC files
  Date: 7 Sep 88 22:16:00 GMT

  
  SIMTEL20 today announced that it will soon be banning all ARC files
  from its archives.
  
  We have contacted Phil Katz and asked to make arrangements for
  anything he develops in whatever new format in C source so that it can
  be ported to our TOPS-20 operating system.  Then, as time permits, it is
  our intent to convert *everything* on SIMTEL20 from ARC to that new
  format, including the PC/BLUE files.


I make the same comment... are the archives there to serve the users or
as a political statement? I assume that having the archives available is
a public service. Would your conversion include changing the volume
documentation to use the pnf (Phil's new format) extension? How about
the program documentation?

================================================================

  I don't want anyone to think I have anything against PK (or SEA),
simply that I see a lot of decisions being made about adopting new
standards which haven't been published, running new software for which
the design spec isn't even available, and all predicated on the
assumption that this will somehow attain some social goals.

  I freely admit that I like zoo, and that if I were changing I would
go to zoo as a proven non-shareware standard. Most of the UNIX programs
on my bbs are in zoo format, except for the zoo source itself. I would
accept another format if it were clearly needed or wanted, but only on
technical grounds.

-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

w8sdz@smoke.ARPA (Keith B. Petersen ) (09/09/88)

SIMTEL20 will be going with Phil Katz's new file archiving method
because it will be a standard set by a group effort of various
well-known shareware and PD authors.  The file format will be clearly
defined and a public release will be made of portable C-language sources
suitable for porting to any operating system with a C compiler.  The
file format and the portable source code will be placed in the PUBLIC
DOMAIN, with no restrictions on how it may be distributed.  If you want
to pay $12.50 an hour for downloading it from one service when it is
also available from another for $5 an hour, that's your business.

--Keith
-- 
Keith Petersen
Arpa: W8SDZ@SIMTEL20.ARMY.MIL
Uucp: {att,decwrl,harvard,lll-crg,ucbvax,uw-beaver}!simtel20.army.mil!w8sdz
GEnie: W8SDZ

ralf@b.gp.cs.cmu.edu (Ralf Brown) (09/09/88)

In article <12094@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
}
}  There has been a lot of stuff posted about changing from arc to
}another format, including the following excerpts. I feel very strongly
}that if people want to get off of ARC they should consider going to an
}established standard, like zoo. To jump from one proprietary standard to
}another is really bad from a technology standpoint, and to announce that
}you will do so before the product exists seems totally irrational.

Consider this another vote for using ZOO on SIMTEL20.  I will be uploading
the next version of the interrupt list as a ZOOchive (and not just because it
is nearly 5K *smaller* than pkarc -oct).
-- 
{harvard,uunet,ucbvax}!b.gp.cs.cmu.edu!ralf -=-=- AT&T: (412)268-3053 (school) 
ARPA: RALF@B.GP.CS.CMU.EDU |"Tolerance means excusing the mistakes others make.
FIDO: Ralf Brown at 129/31 | Tact means not noticing them." --Arthur Schnitzler
BITnet: RALF%B.GP.CS.CMU.EDU@CMUCCVMA -=-=- DISCLAIMER? I claimed something?

db21@ihlpl.ATT.COM (Beyerl) (09/09/88)

In article <NELSON.88Sep7224416@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes:
> 
> So what is wrong with zoo?  looz is Public Domain, the real thing.  I think
> that going with yet another commercial product is just asking for trouble.

	Before we adopt some new standard for creating 'archive'
type files, I think we need to consider more than just 'is it 
faster?' and 'does it produce more compact files?'.  I believe we 
also need to consider how easy the program is to use.  Along with 
that is how easy its names are to remember.  With ARC and PKARC 
there are one, maybe two, easily remembered names.  For ZOO there 
are a number of non-relating names which the user has to be 
familiar with.  I have found this somewhat confusing and as a 
result have developed the following memory aid to keep track of 
all the pieces: 

	"Arch tool [arctool] used by dogs [arff] while 
    drinking [fiz, booz] at the zoo [zoo] and saying [sez]
    loose [looz] rhyme, the alphabet [atoz], and other
    stuff [stuff]."

	I am sorting this out, but I still don't feel comfortable
in adopting as our standard a program that has so many pieces and
non-relating names.  Perhaps Rahul, as he improves the program,
could pull this all together under one integrated package.

					Dave Beyerl
					ihnp4!ihlpl!db21

tneff@dasys1.UUCP (Tom Neff) (09/09/88)

In article <KPETERSEN.12428752438.BABYL@SIMTEL20.ARMY.MIL> W8SDZ@SIMTEL20.ARMY.MIL (Keith Petersen) writes:
>SIMTEL20 today announced that it will soon be banning all ARC files
>from its archives.
>
>We have contacted Phil Katz and asked to make arrangements for
>anything he develops in whatever new format ...
 ^^^^^^^^^^^^^^^^^^^^

Politics 2, Users 0.  (Commonsense 0 as well.)


-- 
Tom Neff			UUCP: ...!cmcl2!phri!dasys1!tneff
	"None of your toys	CIS: 76556,2536	       MCI: TNEFF
	 will function..."	GEnie: TOMNEFF	       BIX: t.neff (no kidding)

stevev@uoregon.uoregon.edu (Steve VanDevender) (09/10/88)

In article <6630@ihlpl.ATT.COM> db21@ihlpl.ATT.COM (Beyerl) writes:
>	Before we adopt some new standard for creating 'archive'
>type files, I think we need to consider more than just 'is it 
>faster?' and 'does it produce more compact files?'.  I believe we 
>also need to consider how easy the program is to use.  Along with 
>that is how easy its names are to remember.  With ARC and PKARC 
>there are one, maybe two, easily remembered names.  For ZOO there 
>are a number of non-relating names which the user has to be 
>familiar with.  I have found this somewhat confusing and as a 
>result have developed the following memory aid to keep track of 
>all the pieces: 
>
>	"Arch tool [arctool] used by dogs [arff] while 
>    drinking [fiz, booz] at the zoo [zoo] and saying [sez]
>    loose [looz] rhyme, the alphabet [atoz], and other
>    stuff [stuff]."
>
>	I am sorting this out, but I still don't feel comfortable
>in adopting as our standard a program that has so many pieces and
>non-relating names.  Perhaps Rahul, as he improves the program,
>could pull this all together under one integrated package.
>
>					Dave Beyerl
>					ihnp4!ihlpl!db21

Things aren't so complicated as you seem to think, or, at least, the
ZOO system is no more complex than the ARC/PKARC systems.

ARCTOOL and ARFF aren't even part of Rahul Dhesi's ZOO system.  ARFF
will work with ZOO files since it uses the archive program of your
choice to inspect archives; ARCTOOL won't even work on ZOO archives.

ZOO performs all the archiving functions of ARC or PKARC/PKXARC.  If
you have ZOO, you're as well off as if you had ARC or any of its
variants (and in some ways, better off).  This is the one, integrated
package you're looking for, and the one name you need to remember.

LOOZ might be thought of as analogous to the ARCE program--it's just a
small, quick program that only unpacks ZOO archives.  It doesn't do
anything ZOO doesn't; it's there if you want to unpack your ZOO
archives a little faster, or only want to extract archives rather than
create them.

BOOZ is a stripped-down ZOO which is mainly of use to people with tiny
machines.  It isn't something most ZOO users would need or want.

FIZ is roughly like an ARCTOOL for ZOO files.  It will scan damaged
ZOO archives or self-extracting archives and find directory and data
blocks, so you can use ZOO to extract files from them.  The ability of
ZOO to extract from a self-extracting archive is unique--I've never
heard of a way to do it with ARC or PKXARC.

ATOZ is also unique among archiving tools--I never saw a utility that
let people convert from the LBR format to ARC, anyway.  Having just
converted my archives to ZOO format myself, I can vouch for its
usefulness.

STUFF is part of the MS-DOS ZOO distribution and is simply a utility
that creates filename lists that ZOO uses to pack a subdirectory tree.
If you want to use ZOO just like you used ARC, you don't need STUFF.

My experience with ZOO has been positive so far.  It's almost as fast as
PKARC and compresses almost as well as PKARC but better than ARC.  Since
I do a lot of transfers between DOS and UNIX, I like that ZOO has been
developed and tested under UNIX.  Since our site also has a VMS system,
I'd recommend it for transferring files to VMS, too (if I can get ahold
of the VMS version).

After some reflection, I think that SIMTEL20's decision to ban ARC
files in favor of whatever Phil Katz comes up with next may not be
wise.  When C source for ZOO is available now, and has been already
debugged and tested, and isn't a commercial product or even shareware,
it seems to me to be the best choice for a new archiving system, if
that's what's really needed.  I honestly don't think that the ARC
format will go away anytime soon--I'm certainly holding on to my
copies, whether it's politically correct or not.  With a little
convoluted reasoning, one might argue that it would be best to abandon
the ARC format because PKARC (now PKPAK) may soon be illegal, and most
would rather use it than painfully slow ARC, so let's switch to
something else that will be equally fast . . . but it seems to be more
in the spirit of a boycott in protest of SEA's recent actions.

Why did I convert to ZOO?  All of the recent discussion has made me aware
of its capabilities and features, and it sounded far more intelligent than
either ARC or PKARC.  I've lately gotten tired of having to remember to
type PKPAK and PKUNPAK; I could have renamed them, but I prefer to stick
with the distribution names of programs.  I also found the inconsistencies
between PKPAK and PKUNPAK to be a bit annoying--mostly that PKPAK used
options without dashes, and PKUNPAK required options with dashes.  And,
admittedly, I became a little dubious of both ARC and the PK stuff after
reading about all the recent brouhaha.

From my experience with ZOO in my private use so far, I'd recommend it as
the file archiver of choice.  If everyone is really serious about abandoning
ARC, I'd also hope that people switch to ZOO instead of an archiver that
isn't even out yet.
-- 
Steve VanDevender	uoregon!drizzle!stevev	stevev@oregon1.BITNET
"Bipedalism--an unrecognized disease affecting over 99% of the population.
Symptoms include lack of traffic sense, slow rate of travel, and the
classic, easily recognized behavior known as walking."

Ralf.Brown@B.GP.CS.CMU.EDU (09/11/88)

In article <6630@ihlpl.ATT.COM>, db21@ihlpl.ATT.COM (Beyerl) writes:
}[for ARC] there are one, maybe two, easily remembered names.  For ZOO there
}are a number of non-relating names which the user has to be 
}familiar with.  I have found this somewhat confusing and as a 
}result have developed the following memory aid to keep track of 
}all the pieces: 
}
}        "Arch tool [arctool] used by dogs [arff] while 
}    drinking [fiz, booz] at the zoo [zoo] and saying [sez]
}    loose [looz] rhyme, the alphabet [atoz], and other
}    stuff [stuff]."

Comparing the corresponding pieces, we get:

        ARC(tm?)                ZOO
        ---------------------------
        arc                     zoo
        pkarc/pkxarc            zoo
        pksfx                   sez
        arctool                 fiz
        arce                    booz OR looz
        pkfind OR arff          arff
        ???                     stuff

Looks pretty much even to me....

--
UUCP: {ucbvax,harvard}!cs.cmu.edu!ralf -=-=-=- Voice: (412) 268-3053 (school)
ARPA: ralf@cs.cmu.edu  BIT: ralf%cs.cmu.edu@CMUCCVMA  FIDO: Ralf Brown 1:129/31
Disclaimer? I     |Ducharm's Axiom:  If you view your problem closely enough
claimed something?|   you will recognize yourself as part of the problem.

w8sdz@smoke.ARPA (Keith B. Petersen ) (09/11/88)

Read the documentation for the latest version of ZOO and note that it is
*not* public domain.  There is also a redistribution restriction which
prevents it from being universally available.

The new archiving method will be a group effort of well-known shareware
and PD authors, not just Phil Katz, all of them tired of being in
jeopardy of being sued by SEA.  The next time around it will be TOTALLY
PUBLIC DOMAIN (that's a shout, folks!).  Have I got your attention?
This has nothing to do with court cases.  It has to do with public
domain.

-- 
Keith Petersen
Arpa: W8SDZ@SIMTEL20.ARMY.MIL
Uucp: {att,decwrl,harvard,lll-crg,ucbvax,uw-beaver}!simtel20.army.mil!w8sdz
GEnie: W8SDZ

pjh@mccc.UUCP (Pete Holsberg) (09/15/88)

Keith,
	When will it be ready?

rlb@xanth.cs.odu.edu (Robert Lee Bailey) (09/16/88)

In article <8465@smoke.ARPA> you write:
>SIMTEL20 will be going with Phil Katz's new file archiving method
>because it will be a standard set by a group effort of various
>well-known shareware and PD authors.  The file format will be clearly
>defined and a public release will be made of portable C-language sources
>suitable for porting to any operating system with a C compiler.  The
>file format and the portable source code will be placed in the PUBLIC
>DOMAIN, with no restrictions on how it may be distributed.  If you want
>to pay $12.50 an hour for downloading it from one service when it is
>also available from another for $5 an hour, that's your business.
AMEN!

I'm glad to see that finally there is going to be some cooperation
in setting a standard archive format.  I, for one, don't like being
held hostage to the whims of a company (SEA) that acts in a manner
that hurts everyone in the PC universe.  

I hope that this standard will also be submitted to the IEEE for
consideration.  IEEE adoption of this as a standard would ensure
that NO ONE company could claim that it belongs to them.  This
would certainly ensure file compatibility regardless of the type
of system.

		Bob Bailey

loci@csccat.UUCP (Chuck Brunow) (09/17/88)

In article <8475@smoke.ARPA> w8sdz@brl.arpa (Keith Petersen) writes:
> ...
>TOTALLY PUBLIC DOMAIN (that's a shout, folks!).  Have I got your attention?
>This has nothing to do with court cases.  It has to do with public
>domain.
>
	Bravo, great, wonderful, and good thinking. I'm curious to know
	what the compression scheme will be.  Could we just concentrate
	on LZW, or are other methods of use? Seems like the program
	could be smaller and faster if it specialized. Comments?


-- 
			CLBrunow - ka5sof
	clb@loci.uucp, loci@csccat.uucp, loci@killer.dallas.tx.us
	  Loci Products, POB 833846-131, Richardson, Texas 75083

jamesd@qiclab.UUCP (James Deibele) (09/18/88)

In article <8465@smoke.ARPA> w8sdz@brl.mil (Keith Petersen) writes:
>SIMTEL20 will be going with Phil Katz's new file archiving method
>because it will be a standard set by a group effort of various
>well-known shareware and PD authors.  The file format will be clearly
>defined and a public release will be made of portable C-language sources
>suitable for porting to any operating system with a C compiler.  The
>file format and the portable source code will be placed in the PUBLIC
>DOMAIN, with no restrictions on how it may be distributed.  If you want

Did Thom Henderson (principal of SEA) kick your dog or something, Keith?
ARC is a file format that's clearly defined, available now, with source
available.  Machines that are capable of reading SEA-style ARC format are
not limited to IBM and clones, but include Amiga, Apple, Atari (8-bit), 
Atari ST, CP/M, Macintosh, UNIX, and VMS machines.  For all I know, there
are more.  SEA hasn't gone after anybody except Phil Katz, who was going
after their bread-and-butter market, site licensing (that's where the
shareware authors who make more than $50 total in registrations are making
their money, not from the public).

There's been a lot of concern over the lawsuit in FidoNet because ARC (or
its derivatives) is used constantly to move mail.  Henderson has expressed
some willingness to sign releases for use of ARC, but so far as I know,
no one has taken him up on it.

SEA is a four-person firm.  PKware is a four-person firm.  SEA approached
PKware about coming to some sort of licensing agreement, the terms of which
only PKware and SEA know.  PKware declined.  SEA spent 8 months and $40,000
in legal fees getting their ducks lined up, time they undoubtedly could have
put to better use in speeding up ARC.  From what I remember, SEA tolerated
PKware for a long time, even though PKware was actively soliciting donations
from day one.  They reacted only when PKware started taking out ads
denigrating ARC.
 
SIMTEL20 can do whatever.  Encode your stuff with ROT-13 if it brings you joy.
Meanwhile, I'll continue to use a standard that all my callers, even non-DOS
types, can use.  

-- 
James S. Deibele   jamesd@qiclab or jamesd@percival 
TECHBooks: The Computer Book Specialists   (800) TECH-BKS
3646 SE Division  Portland, OR  97202      (503) 238-1005
TECHBooks One BBS (#1:105/4.0); 3/12/24    (503) 760-1473

w8sdz@smoke.ARPA (Keith B. Petersen ) (09/18/88)

The officers of FIDO had better do some calling around, as I did.
Tell them to start with Gary Conway, author of the popular NARC
menu-driven ARC viewer/extractor/etc.  Gary's program is written TOTALLY
in assembler.  Gary has had to hire a lawyer after SEA contacted him
with certain demands only Gary can tell you about.

I'm surprised that the Fido group is so uninformed.  They had better do
some investigation of their own before aligning themselves with SEA.
Don't take my word - call various shareware authors who write programs
that do anything with or to ARC files.  Many of them are on CIS's IBM
Forum and they are up in arms!  Read the "Hot Topics" thread there.

Please let's move this to comp.sys.ibm.pc *only*.  The binaries
discussion group is for discussing the programs posted to
comp.binaries.ibm.pc.
-- 
Keith Petersen
Arpa: W8SDZ@SIMTEL20.ARMY.MIL
Uucp: {att,decwrl,harvard,lll-crg,ucbvax,uw-beaver}!simtel20.army.mil!w8sdz
GEnie: W8SDZ

haugj@pigs.UUCP (John F. Haugh II) (09/19/88)

In article <2594@csccat.UUCP> loci@csccat.UUCP (Chuck Brunow) writes:
>	                                      Could we just concentrate
>	on LZW, or are other methods of use? Seems like the program
>	could be smaller and faster if it specialized. Comments?

if the data were known to be 7-bit pure, or, say, in a 96-character
ascii subset (pure readable text), then additional file compression
tools could be thrown at it.  the first which comes to mind is atob
compression [ no, not REALLY a compression technique, but it works ].
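a minimal sketch of the free win that 7-bit-clean text allows: pack eight
ascii characters into seven bytes (python used purely for illustration;
this is not a tool from the thread):

```python
def pack7(text: bytes) -> bytes:
    """pack 7-bit ascii bytes into a bit stream, saving 1 bit in 8."""
    assert all(b < 0x80 for b in text), "input must be 7-bit clean"
    bits = nbits = 0
    out = bytearray()
    for b in text:
        bits = (bits << 7) | b        # append 7 fresh bits
        nbits += 7
        while nbits >= 8:             # flush whole output bytes
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:                         # zero-pad the trailing byte
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack7(data: bytes, n: int) -> bytes:
    """recover the n original characters from a pack7() stream."""
    bits = nbits = 0
    out = bytearray()
    for b in data:
        bits = (bits << 8) | b
        nbits += 8
        while nbits >= 7 and len(out) < n:
            nbits -= 7
            out.append((bits >> nbits) & 0x7F)
    return bytes(out)

msg = b"pure readable text compresses for free"
packed = pack7(msg)
print(len(msg), "->", len(packed))    # 38 -> 34 bytes, roughly a 1-in-8 saving
assert unpack7(packed, len(msg)) == msg
```

a real compressor would then run on top of this, as the article suggests.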

i'd like to see an ARC tool which did NO compression and then have
another tool which did the compression and then yet another for the
decompression.  that would yield three small tools, each of which
could be highly specialized.
-- 
=-=-=-=-=-=-=-The Beach Bum at The Big "D" Home for Wayward Hackers-=-=-=-=-=-=
               Very Long Address: John.F.Haugh@rpp386.dallas.tx.us
                         Very Short Address: jfh@rpp386
                           "ANSI C: Just say no" -- Me

brad@looking.UUCP (Brad Templeton) (09/21/88)

The fact is that, for the net, compression is not desirable.  It clouds
the issue, sometimes *increases* transmission time, and just makes
postings harder to deal with.

I would suggest we use an existing format like "cpio" to do archiving.
Writing a decode-only cpio program should be fairly trivial.  Cpio supports
all sorts of file info, directories and links.  It is well known, already
comes standard with many Unix machines, and is part of POSIX, as I
understand it.

There are also quality non-pd CPIO programs out there, for those that want
them.

I would support TAR if it didn't put all files on block boundaries, which
can be wasteful.

Or even a slightly modified "par." (par is a PD archiver that was posted to
the net a while ago.  The source was posted, and it's really very simple.
It's compatible with 4BSD's "ar" as well.)  The original par did not have
proper support of directories.

For private archives, use ARC, PKPAK, ZOO whatever you like.  For archives
to give to people, let's be simple, non-compressed and already supported.
-- 
Brad Templeton, Looking Glass Software Ltd.  --  Waterloo, Ontario 519/884-7473

art@felix.UUCP (Art Dederick) (09/21/88)

In article <424@pigs.UUCP> haugj@pigs.UUCP (John F. Haugh II) writes:
>i'd like to see an ARC tool which did NO compression and then have
>another tool which did the compression and then yet another for the
>decompression.

tar cf - foo | compress > foo.tar.Z
zcat foo.tar.Z | tar xf -

A PD tar has been published.  Compress is already PD.
Now that we have the solution, let's not hear any more about this.

Now where did I put that flame proof suit? :-)

Art D.
{hplabs|oliveb}!felix!art

ked@garnet.berkeley.edu (Earl H. Kinmonth) (09/21/88)

In article <2054@looking.UUCP> brad@looking.UUCP (Brad Templeton) writes:
>The fact is that, for the net, compression is not desirable.  It clouds the

>I would suggest we use an existing format like "cpio" to do archiving.

>I would support TAR if it didn't put all files on block boundaries, which
>can be wasteful.

Yes, but this attribute makes it easier to recover at least part
of a damaged archive.  I've had to do this for both cpio and tar
archives.  The latter can usually be handled with dd and a shell
loop to skip to the first valid header.  Recovering mangled cpio
archives requires a program capable of finding the "magic"
element, which may occur anywhere in a block.
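The tar-recovery trick described above can be sketched in a few lines:
scan the damaged image in 512-byte steps and validate each candidate
header by its checksum field (a modern stand-in for the dd-and-shell-loop
approach; the in-memory "damage" here is contrived for illustration):

```python
import io
import tarfile

def find_tar_headers(image: bytes):
    """Yield offsets of 512-byte blocks whose tar header checksum verifies."""
    for off in range(0, len(image) - 511, 512):
        block = image[off:off + 512]
        if block == b"\0" * 512:
            continue  # end-of-archive padding
        stored = block[148:156].split(b"\0")[0].strip()
        try:
            want = int(stored, 8)  # checksum is stored in octal
        except ValueError:
            continue
        # The checksum is computed with the checksum field itself
        # read as eight ASCII spaces.
        got = sum(block[:148]) + 8 * ord(" ") + sum(block[156:])
        if got == want:
            yield off

# Build a tiny archive in memory, prepend 512 bytes of junk to
# simulate damage, and locate the first surviving header.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    data = b"hello, tar recovery"
    info = tarfile.TarInfo("member.txt")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))
damaged = b"\xffGARBAGE" * 64 + buf.getvalue()
print(list(find_tar_headers(damaged))[0])  # -> 512, just past the junk
```

From such an offset, dd (or a seek) can carve out the rest of the archive,
exactly as the article describes.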

ibmbin@bsu-cs.UUCP (09/22/88)

In article <8526@smoke.ARPA> w8sdz@smoke.ARPA (Keith B. Petersen ) writes:

   Please let's move this to comp.sys.ibm.pc *only*.  The binaries
   discussion group is for discussing the programs posted to
   comp.binaries.ibm.pc.

The discussion on a new archive format is now scattered over at least 3 groups
     comp.sys.ibm.pc,comp.binaries.ibm.pc,comp.sys.misc

Please let us concentrate it in a single group. I suggest comp.sys.misc,
because it is not limited to a single type of system or operating system.


-- 
Piet van Oostrum, Dept of Computer Science, University of Utrecht
Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
Telephone: +31-30-531806              UUCP: ...!mcvax!ruuinf!piet

tneff@dasys1.UUCP (Tom Neff) (09/22/88)

In article <424@pigs.UUCP> haugj@pigs.UUCP (John F. Haugh II) writes:
>i'd like to see an ARC tool which did NO compression and then have
>another tool which did the compression and then yet another for the
>decompression.  that would yield three small tools, each of which
>could be highly specialized.

This is somewhat reminiscent of the old order-of-operations quandary
in the days of LBR and SQ on CP/M and MSDOS.  There you had two separate
tools, a squeezer and a librarian.  There were two headaches.  First,
people kept LBR'ing *first* and then squeezing (yielding an LQR file),
which is the wrong way to do it for two well-defined reasons[1].
Second, it was an unnecessarily complicated business managing archives
with so many processing steps and intermediate files.  People tried to
work up batch files to automate the process, but they were not reliable
or portable in general.

One of the things ARC (in any form) really brought to the party was
unparallelled user convenience -- combining the steps invisibly and
selecting an algorithm automatically were godsends.  There was a
certain size and performance tradeoff but it was well worth it for
almost all users.

Small, specialized tools are great too of course, which is why Buerg
wrote the little ASM one-shot extractors and builders.

--------------------------

NOTES

[1] Squeezed libraries are much worse than libraries of individually
squeezed members because (a) an LQR has no immediately accessible
directory structure - it has to be unsqueezed before you can look at
it; and (b) dissimilar member files (README, executable, fonts etc)
yield a heterogeneous library which responds poorly to most types
of compression.

-- 
Tom Neff			UUCP: ...!cmcl2!phri!dasys1!tneff
	"None of your toys	CIS: 76556,2536	       MCI: TNEFF
	 will function..."	GEnie: TOMNEFF	       BIX: t.neff (no kidding)

malpass@vlsi.ll.mit.edu (Don Malpass) (09/22/88)

In article <59253@felix.UUCP> art@felix.UUCP (Art Dederick) writes:
>A PD tar has been published.  Compress is already PD.
>Now that we have the solution, lets not hear any more about this.
>
Perhaps a pointer to DOS-PD tar and compress that has been tested
and blessed?  Certainly I'm interested.
-- 
Don Malpass   [malpass@LL-vlsi.arpa],  [malpass@spenser.ll.mit.edu] 
  My opinions are seldom shared by MIT Lincoln Lab, my actual
    employer RCA (known recently as GE), or my wife.

tneff@dasys1.UUCP (Tom Neff) (09/23/88)

In article <2054@looking.UUCP> brad@looking.UUCP (Brad Templeton) writes:
>The fact is that, for the net, compression is not desirable.  It clouds
>the issue, sometimes *increases* transmission time, and just makes
>postings harder to deal with.

However, the net is more than its bandwidth -- it is also its component
sites, and disk space is a resource just like transmission time. No one
whose spool volume has filled lately is likely to look kindly on doubling
their archive allocation.

I disagree with Brad that compression "clouds the issue" or "makes
postings harder to deal with"; I think those are afterthoughts
to his real argument, which is that 'compress' has zero to negative
effect on pre-compressed files -- so that sites which batch news
compressed may actually spend a few percent more time on a pre-compressed
binary than on an uncompressed one.  My answer is that even if this
were a major headache (and I'm not convinced it is), there ought to
be some way of segregating your binaries feed so it runs uncompressed.

-- 
Tom Neff			UUCP: ...!cmcl2!phri!dasys1!tneff
	"None of your toys	CIS: 76556,2536	       MCI: TNEFF
	 will function..."	GEnie: TOMNEFF	       BIX: t.neff (no kidding)

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (09/27/88)

In article <6583@dasys1.UUCP> tneff@dasys1.UUCP (Tom Neff) writes:

| This is somewhat reminiscent of the old order-of-operations quandary
| in the days of LBR and SQ on CP/M and MSDOS.  There you had two separate
| tools, a squeezer and a librarian.  There were two headaches.  First,
| people kept LBR'ing *first* and then squeezing (yielding an LQR file),
| which is the wrong way to do it for two well-defined reasons[1].

| --------------------------
| 
| NOTES
| 
| [1] Squeezed libraries are much worse than libraries of individually
| squeezed members because (a) an LQR has no immediately accessible
| directory structure - it has to be unsqueezed before you can look at
| it; and (b) dissimilar member files (README, executable, fonts etc)
| yield a heterogeneous library which responds poorly to most types
| of compression.

  Your conclusion is not universally valid.  In the case where the
files are similar, such as source code, data files, etc., the
compression will be greater if the compression is done on the archive
as a whole, since LZW is adaptive and will improve for larger files.

  As an example, I compressed a set of source files in two ways:
compressing each file individually before archiving, and archiving
the files uncompressed and then compressing the entire archive as a
single file.  For cpio this meant compressing the files and cpio'ing
the results vs. cpio'ing all the files and then compressing.  The
second method also allows use of zcat for directory listing or
extraction.

Results:
 archiver	individual%	group%
  zoo		58		68
  compress	57		68
  arc		55		66
  cpio+cmprs	60		66

  If the objective is convenient storage with some compression, you
are completely correct, but for saving disk space or transfer time it
is not optimal in many common cases.
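
Davidsen's comparison can be reproduced in miniature with any adaptive
compressor.  A sketch using Python's zlib (deflate rather than LZW,
but likewise adaptive, and with invented "source files" standing in
for his test set):

```python
import zlib

# Four similar, hypothetical source files.
files = [(b"int func_%d(void) { return %d; }\n" % (i, i)) * 50
         for i in range(4)]

# Method 1: compress each member individually, then archive; the total
# is the sum of the per-member compressed sizes.
individual = sum(len(zlib.compress(f, 9)) for f in files)

# Method 2: archive uncompressed (here: simple concatenation), then
# compress the whole archive once, so the model carries across members.
group = len(zlib.compress(b"".join(files), 9))

print(individual, group)
```

Because the members are similar, the single-stream method wins: the
compressor's model, trained on the first file, is already warm for
the rest.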
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

cww@ndmath.UUCP (Clarence W. Wilkerson) (09/27/88)

I think the comparisons offered by Bill Davidsen are mostly
indicative of the effect of 16-bit compress on Unix systems.  With a
12-bit compress, I don't think you would see such a large difference.
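
The difference Wilkerson points at is dictionary capacity: a 12-bit
compress can remember at most 4096 strings, a 16-bit one 65536.  A toy
LZW sketch below makes the effect visible.  It is deliberately
simplified -- it counts output codes instead of packing variable-width
bits, and its dictionary simply freezes when full, whereas the real
compress grows its code width from 9 bits and can reset its table --
so it illustrates the mechanism, not the exact numbers.

```python
def lzw_compress(data, max_bits):
    """Toy LZW: greedy longest match; dictionary freezes when full."""
    max_codes = 1 << max_bits
    table = {bytes([i]): i for i in range(256)}
    out, w = [], b""
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            if len(table) < max_codes:      # stop learning when full
                table[wc] = len(table)
            w = wc[-1:]
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes, max_bits):
    max_codes = 1 << max_bits
    table = {i: bytes([i]) for i in range(256)}
    it = iter(codes)
    prev = table[next(it)]
    out = [prev]
    for code in it:
        # The code-not-yet-in-table case is the classic KwKwK corner.
        entry = table[code] if code in table else prev + prev[:1]
        out.append(entry)
        if len(table) < max_codes:
            table[len(table)] = prev + entry[:1]
        prev = entry
    return b"".join(out)

# Varied text so the dictionary outgrows 4096 entries, then repeated
# so the larger dictionary has a chance to pay off.
data = b"".join(b"token%d " % i for i in range(2000)) * 3
c12 = lzw_compress(data, 12)
c16 = lzw_compress(data, 16)
print(len(c12), len(c16))
```

On input like this the 16-bit run emits noticeably fewer codes: once
the 12-bit table fills, it stops adapting, while the 16-bit table
keeps learning the long repeated phrases.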

tneff@dasys1.UUCP (Tom Neff) (09/27/88)

If you have a bunch of relatively small files, you may see an aggregate
improvement in raw compression numbers by the LQR method, since you are
spending less room on the overhead of the individual dictionaries and the
library VTOC itself; however, what you sacrifice in terms of flexibility
of access makes it manifestly not worth it as a packing and distribution
method.  Not that it was easy to get this thru folks' heads in the old
days... :-)
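
The per-stream overhead Neff describes is easy to see with a modern
adaptive compressor.  A rough sketch using Python's zlib (deflate
rather than SQ's Huffman coding, with invented member contents):
every individually squeezed member pays its own header and starts
from an empty model, which dominates when the members are tiny.

```python
import zlib

# Fifty tiny, similar library members (hypothetical contents).
members = [b"; patch %d for the display overlay\n" % i for i in range(50)]

# Individually squeezed members: fifty headers, fifty cold starts.
individual = sum(len(zlib.compress(m, 9)) for m in members)

# One squeezed library (LQR-style): a single stream, single overhead.
group = len(zlib.compress(b"".join(members), 9))

print(individual, group)
```

The single stream wins the raw numbers handily here -- which is
exactly the aggregate improvement conceded above, bought at the cost
of losing direct access to the members.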
-- 
Tom Neff			UUCP: ...!cmcl2!phri!dasys1!tneff
	"None of your toys	CIS: 76556,2536	       MCI: TNEFF
	 will function..."	GEnie: TOMNEFF	       BIX: t.neff (no kidding)

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (09/30/88)

In article <1221@ndmath.UUCP> cww@ndmath.UUCP (Clarence W. Wilkerson) writes:
| 
| I think the comparisons offered by Bill Davidsen are most
| indicative of the effect of 16 bit compress compresses on
| Unix system. With a 12 bit compress, I don't think you would
| see such a large difference.

  A good point, but...  I tried creating a zoo archive using the "no
compression" flag, then compressing it.  I got the same order of
magnitude of results whether it was compressed with arc, zoo, or 12-
or 16-bit compress.  The results were all nearly the same, though when
the archive being compressed had a relatively small number of discrete
tokens, the 12-bit compress was actually smaller by 8 bytes.

  As a further test I created a file holding an "ls -lR" listing of a
small subdirectory, then catted three copies into a 2nd file, and eight
copies of the 2nd file into a 3rd. I then compressed the files with zoo,
and here are the results.


Archive foo.zoo:
Length    CF  Size Now  Date      Time
--------  --- --------  --------- --------
    3619  57%     1543  29 Sep 88 14:10:16     x
   10857  68%     3457  29 Sep 88 14:10:16     y
   86856  79%    18601  29 Sep 88 14:10:18     z
--------  --- --------  --------- --------
  101332  77%    23601     3 files


  The reason the compression gets better on the same data is that the
LZW algorithm "learns" about the data, and therefore does a better job
as long as the data are similar.  Using more bits only makes a big
difference when a LOT of data is being processed, and the number of
tokens to be remembered becomes larger than will fit in 12 bits.
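
The same "learning" shows up in any adaptive compressor.  A sketch
mimicking the x/y/z experiment above with Python's zlib (deflate
rather than compress's LZW, and a made-up listing instead of a real
"ls -lR"): the compression ratio improves as copies are appended,
because later copies are predicted almost entirely from earlier ones.

```python
import zlib

# x is a small directory listing, y is 3 copies of x, z is 8 copies of y.
x = b"".join(b"-rw-r--r--  1 bill  staff  %5d Sep 29 14:10 file%d.c\n"
             % (i * 100, i) for i in range(40))
y = x * 3
z = y * 8

for name, data in (("x", x), ("y", y), ("z", z)):
    c = len(zlib.compress(data, 9))
    print(name, len(data), c, "%.0f%%" % (100 * (1 - c / len(data))))
```

As in the zoo table, the compression factor climbs from x to y to z:
the absolute compressed size barely grows while the input multiplies.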
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me