[comp.os.minix] LHARC available for MINIX!

nfs@cs.Princeton.EDU (Norbert Schlenker) (07/03/90)

In article <1990Jul2.143113.2267@jarvis.csri.toronto.edu> wayne@csri.toronto.edu (Wayne Hayes) writes:
>...
>LHARC now available for Minix!
>
>YES!  Now you too can have a *real* compression utility on your Minix
>system.
>
>Well, OK, "real" is a subjective term.  It doesn't accept pipe input.
>However its compression ratios are second to none.  It'll blow away
>13-bit compress AND 16-bit compress.

A month ago, I was also amazed to discover that LHARC compiled and ran
virtually unchanged under Minix.  It really is a good compressor, and
it is nice to have a combination compressor/archiver, like all those
poor DOS folks.  I have never had any luck porting ARC, ZOO, or their
ilk to PC-Minix ... they always run out of code or data space or fail
in mysterious and untraceable ways.  LHARC worked first time.  Its
compression ratios are astonishing.  It is a truly fine program for
local archives.  But there is a wee problem ...

LHARC wasn't written with portability in mind.  The archive header
format is dependent on byte order within words and on the length of
integers.  A PC archive cannot be read on an ST and vice versa,
without considerable pulling of teeth.  I started to rewrite the parts
of the code that wrote the headers (I think something like a TAR
header, perhaps tagged to reduce the space required, would be much
better) but real work got in the way.

LHARC would be a wonderful method for distributing Minix software.  The
LZHUF algorithm is a clever device and achieves much better compression
than compress does.  Andy would be less likely to dominate net bandwidth
when posting new releases.  But it's all useless if the archives can't
be read by all the platforms on which Minix runs.

Norbert

wayne@csri.toronto.edu (Wayne Hayes) (07/03/90)

In article <781@rossignol.Princeton.EDU> nfs@cs.Princeton.EDU (Norbert Schlenker) writes:
>LHARC wasn't written with portability in mind.  The archive header
>format is dependent on byte order within words and on the length of
>integers.  A PC archive cannot be read on an ST and vice versa,
>without considerable pulling of teeth.

Have you actually tried this?  I'm sorry but I think you're wrong about
this.  Yes, the headers weren't written with portability in mind, but
the UNIX source I'll be posting handles this very well.  I've transferred
.LZH files from my PC under DOS to a Sun and to Minix and the same archived
file is readable on all systems.  There was a version out a while ago that
I also ported and had to write my own INTEL <-> Motorola transfer
functions, but this version is by the original author of LHARC and much
cleaner.

-- 
Mathematics: That branch of Human Thought which takes a finite set of trivial
axioms and maps them to a countably infinite set of unintuitive theorems.

Wayne Hayes	INTERNET: wayne@csri.utoronto.ca	CompuServe: 72401,3525

nfs@cs.Princeton.EDU (Norbert Schlenker) (07/03/90)

In article <1990Jul3.011235.5774@jarvis.csri.toronto.edu> wayne@csri.toronto.edu (Wayne Hayes) writes:
>In article <781@rossignol.Princeton.EDU> nfs@cs.Princeton.EDU (Norbert Schlenker) writes:
>>LHARC wasn't written with portability in mind.  The archive header
>>format is dependent on byte order within words and on the length of
>>integers.  A PC archive cannot be read on an ST and vice versa,
>>without considerable pulling of teeth.
>
>Have you actually tried this?  I'm sorry but I think you're wrong about
>this.  Yes, the headers weren't written with portability in mind, but
>the UNIX source I'll be posting handles this very well...

Here is the Lharc header structure from the Unix source that I have.  It
is possible that this structure is from an old version, but this is from
the most recent posting in comp.sources.misc.

typedef struct LzHeader {
  unsigned char		header_size;
  char			method[METHOD_TYPE_STRAGE];
  long			packed_size;
  long			original_size;
  long			last_modified_stamp;
  unsigned short	attribute;
  char			name[256];
  unsigned short	crc;
  boolean		has_crc;
  unsigned char		extend_type;
  unsigned char		minor_version;
  /*  extend_type == EXTEND_UNIX  and convert from other type. */
  time_t		unix_last_modified_stamp;
  unsigned short	unix_mode;
  unsigned short	unix_uid;
  unsigned short	unix_gid;
} LzHeader;

This header CANNOT be portable between machines.  It depends on the
size of short integers (usually 16 bits, but they don't have to be),
on the size of long integers (usually 32 bits, but ...), on the
endianness of the machines which read and write the archive, on the
padding between structure elements, and on the fact that Unix modes,
uids, and gids all fit into unsigned short integers.
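
A quick way to see the size and padding dependence on any given compiler
is to print the layout of a cut-down version of the structure.  This is
only a sketch: DemoHeader and the demo_* functions are made-up names, and
the method length of 5 is an assumption standing in for
METHOD_TYPE_STRAGE.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_METHOD_LEN 5   /* assumed stand-in for METHOD_TYPE_STRAGE */

/* Hypothetical cut-down header: a few leading fields of LzHeader. */
struct DemoHeader {
    unsigned char  header_size;
    char           method[DEMO_METHOD_LEN];
    long           packed_size;
    unsigned short attribute;
};

/* Sum of the member sizes, i.e. the struct size with no padding at all. */
size_t demo_unpadded_size(void)
{
    return sizeof(unsigned char) + DEMO_METHOD_LEN
         + sizeof(long) + sizeof(unsigned short);
}

/* Padding bytes this particular compiler added to the struct. */
size_t demo_padding(void)
{
    return sizeof(struct DemoHeader) - demo_unpadded_size();
}

/* Print what this compiler actually does; results differ by machine. */
void demo_print_layout(void)
{
    printf("sizeof(short)=%lu sizeof(long)=%lu\n",
           (unsigned long)sizeof(short), (unsigned long)sizeof(long));
    printf("offset of packed_size=%lu struct size=%lu padding=%lu\n",
           (unsigned long)offsetof(struct DemoHeader, packed_size),
           (unsigned long)sizeof(struct DemoHeader),
           (unsigned long)demo_padding());
}
```

Writing sizeof(struct DemoHeader) bytes straight to disk bakes every one
of those numbers, plus the byte order, into the archive.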

These assumptions cannot be guaranteed, even on the limited range of
machines that Minix runs on.  They are certainly not guaranteed in the
Unix range.  Lharc archives produced with the above header on my PC
can be read on a VAX here.  The same archive cannot be read on a Sun-4
(byte order and structure padding problems) nor a DEC 5400 (an endian
and structure padding problem) nor a MIPS box (a structure padding
problem).  I have little hope that they will be readable on the next
fast box that is installed here.

The solution is a more portable form of header.  As I suggested before,
something like a TAR header, cut back in size from 512 bytes, will be
necessary.  Without that, this archive format will not fly.

Norbert

jds@mimsy.umd.edu (James da Silva) (07/04/90)

In article <784@rossignol.Princeton.EDU> nfs@cs.Princeton.EDU (Norbert Schlenker) writes:
>This header CANNOT be portable between machines.  It depends on the
>size of short integers (usually 16 bits, but they don't have to be),
>on the size of long integers (usually 32 bits, but ...), on the
>endianness of the machines which read and write the archive, on the
>padding between structure elements, and on the fact that Unix modes,
>uids, and gids all fit into unsigned short integers.

Yes, but there's nothing that says you have to read it directly into the
structure.  Knowing that the "native" layout for LZH headers is, say,
that produced by an MS-DOS compiler, you *can* write a portable routine
to read LZH headers by reading into a char array and extracting the
fields one by one.  Likewise for writing such headers.
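
A minimal sketch of that approach, under stated assumptions: the offsets
below describe a made-up fixed part of a header, not the real LZH layout,
and DemoLzh, get_le16, get_le32 and demo_parse are hypothetical names.
Fields are taken to be stored low byte first, as an MS-DOS compiler would
write them.

```c
#include <assert.h>
#include <string.h>

/* Assumed layout, for illustration only (NOT checked against LHARC):
 *   offset 0       header size, 1 byte
 *   offset 1..5    method string, 5 bytes, no terminator
 *   offset 6..9    packed size, 32 bits, low byte first
 *   offset 10..13  original size, 32 bits, low byte first
 *   offset 14..15  attribute, 16 bits, low byte first
 */
#define DEMO_HDR_LEN 16

struct DemoLzh {
    unsigned int  header_size;
    char          method[6];        /* 5 characters plus '\0' */
    unsigned long packed_size;      /* long holds at least 32 bits */
    unsigned long original_size;
    unsigned int  attribute;        /* int holds at least 16 bits */
};

/* Build a 16-bit value from two bytes, low byte first. */
unsigned int get_le16(const unsigned char *p)
{
    return (unsigned int)(p[0] & 0xFF)
         | ((unsigned int)(p[1] & 0xFF) << 8);
}

/* Build a 32-bit value from four bytes, low byte first. */
unsigned long get_le32(const unsigned char *p)
{
    return (unsigned long)(p[0] & 0xFF)
         | ((unsigned long)(p[1] & 0xFF) << 8)
         | ((unsigned long)(p[2] & 0xFF) << 16)
         | ((unsigned long)(p[3] & 0xFF) << 24);
}

/* Fill in the in-memory struct from DEMO_HDR_LEN raw bytes. */
void demo_parse(const unsigned char *buf, struct DemoLzh *h)
{
    assert(buf != NULL && h != NULL);
    h->header_size = buf[0] & 0xFF;
    memcpy(h->method, buf + 1, 5);
    h->method[5] = '\0';
    h->packed_size   = get_le32(buf + 6);
    h->original_size = get_le32(buf + 10);
    h->attribute     = get_le16(buf + 14);
}
```

Nothing here depends on the host's byte order or on how the compiler pads
struct DemoLzh, because the struct itself never touches the file.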

I don't know whether the source Wayne Hayes is referring to does this, but
it can be done.  It would be useful to have such a portable program for
Unix and Minix, as LZH files are becoming more popular.

>The solution is a more portable form of header.  As I suggested before,
>something like a TAR header, cut back in size from 512 bytes, will be
>necessary.  Without that, this archive format will not fly.

The problem is that a more "portable" header would be incompatible with the
current LZH header, making the files you produce themselves less
transportable.  You couldn't really call the output LZH.

Actually, I do agree with you; this format will not fly as a generic Unix
file transfer format.  The combined archiver/compressor is too much of a
DOS-ism.

But rather than coming up with a new header format, how about separating
out the LHARC compress/decompress routines and making them standalone, ala
the current compress(1)?  Then we can use tar just like we've always done.
"tar.L" files, anyone?

I do have one question: Does LZH require reading the input twice or
creating a temporary file?  I seem to recall that normal huffman encoding
required one pass to determine the relative frequencies of input tokens,
then another pass to do the encoding.  Does lharc work the same way?  That
would rule out its use in situations where on-the-fly compression through
pipes is needed.

Jaime
...........................................................................
: domain: jds@cs.umd.edu				     James da Silva
: path:   uunet!mimsy!jds	 	    Systems Design & Analysis Group

steve@pnet51.orb.mn.org (Steve Yelvington) (07/05/90)

wayne@csri.toronto.edu (Wayne Hayes) writes:
>In article <781@rossignol.Princeton.EDU> nfs@cs.Princeton.EDU (Norbert Schlenker) writes:
>>LHARC wasn't written with portability in mind.  The archive header
>>format is dependent on byte order within words and on the length of
>>integers.  A PC archive cannot be read on an ST and vice versa,
>>without considerable pulling of teeth.
>
>Have you actually tried this?  I'm sorry but I think you're wrong about
>this.  Yes, the headers weren't written with portability in mind, but
>the UNIX source I'll be posting handles this very well.  I've transferred
>.LZH files from my PC under DOS to a Sun and to Minix and the same archived
>file is readable on all systems.  There was a version out a while ago that
>I also ported and had to write my own INTEL <-> Motorola transfer
>functions, but this version is by the original author of LHARC and much
>cleaner.
 
The problem is that LHARC has been ported between Intel and Motorola
platforms probably eight or nine times, and at least half of the people
doing the port did not know what the hades they were doing.
 
I have several alleged LHARC programs for the ST under TOS. None is fully
compatible with any other. My most recent adventure with LHARC produced
file dates that were so screwed up that they actually crashed the shell
I was using when I tried to list a directory. 
 
It ain't worth the grief.

---
 steve@thelake.mn.org

hyc@math.lsa.umich.edu (Howard Chu) (07/12/90)

In article <784@rossignol.Princeton.EDU> nfs@cs.Princeton.EDU (Norbert Schlenker) writes:
%Here is the Lharc header structure from the Unix source that I have.  It
%is possible that this structure is from an old version, but this is from
%the most recent posting in comp.sources.misc.
%
%typedef struct LzHeader {
%  unsigned char		header_size;
%  char			method[METHOD_TYPE_STRAGE];
%  long			packed_size;
%  long			original_size;
%  long			last_modified_stamp;
%  unsigned short	attribute;
%  char			name[256];
%  unsigned short	crc;
%  boolean		has_crc;
%  unsigned char		extend_type;
%  unsigned char		minor_version;
%  /*  extend_type == EXTEND_UNIX  and convert from other type. */
%  time_t		unix_last_modified_stamp;
%  unsigned short	unix_mode;
%  unsigned short	unix_uid;
%  unsigned short	unix_gid;
%} LzHeader;
%
%This header CANNOT be portable between machines.  It depends on the
%size of short integers (usually 16 bits, but they don't have to be),
%on the size of long integers (usually 32 bits, but ...), on the
%endianness of the machines which read and write the archive, on the
%padding between structure elements, and on the fact that Unix modes,
%uids, and gids all fit into unsigned short integers.
%
%These assumptions cannot be guaranteed, even on the limited range of
%machines that Minix runs on.  They are certainly not guaranteed in the
%Unix range.  Lharc archives produced with the above header on my PC
%can be read on a VAX here.  The same archive cannot be read on a Sun-4
%(byte order and structure padding problems) nor a DEC 5400 (an endian
%and structure padding problem) nor a MIPS box (a structure padding
%problem).  I have little hope that they will be readable on the next
%fast box that is installed here.

At least in ARC, all that matters is if a certain type is *at least*
X bits large. Whether a short is 16 or 18 bits is irrelevant, and the difference
between 32 and 36 bits is also no problem. With the ARC port, the
ARC header is defined as a struct, but it is read from the archive
one byte at a time, and assembled into the internal format by the
header processing routines.

The exact layout & sizes of fields in the structure are irrelevant.
All that matters is what you do when you pull the structure into memory
and shove it back out again. It's trivial to write a single routine
that will do this correctly on *any* architecture. This is what I
did for ARC. My port of ARC runs on Crays, PCs, 68000s, IBM 3090s,
Vaxen, MIPS, Sparc, and some other more obscure architectures...
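
For the writing direction, a routine of that kind might look like the
sketch below. It is not taken from the ARC sources: store_u16, store_u32,
load_u16 and load_u32 are hypothetical names, and low-byte-first order on
disk is an assumption. The only demand on the C types is that unsigned int
holds at least 16 bits and unsigned long at least 32.

```c
#include <assert.h>
#include <stddef.h>

/* Write the low 16 bits of v as two bytes, low byte first. */
void store_u16(unsigned char *p, unsigned int v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
}

/* Write the low 32 bits of v as four bytes, low byte first.
 * Any extra bits in a wider long (36, 64, ...) are simply dropped. */
void store_u32(unsigned char *p, unsigned long v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Read back a 16-bit value written by store_u16. */
unsigned int load_u16(const unsigned char *p)
{
    assert(p != NULL);
    return (unsigned int)(p[0] & 0xFF)
         | ((unsigned int)(p[1] & 0xFF) << 8);
}

/* Read back a 32-bit value written by store_u32. */
unsigned long load_u32(const unsigned char *p)
{
    assert(p != NULL);
    return (unsigned long)(p[0] & 0xFF)
         | ((unsigned long)(p[1] & 0xFF) << 8)
         | ((unsigned long)(p[2] & 0xFF) << 16)
         | ((unsigned long)(p[3] & 0xFF) << 24);
}
```

The shifts and masks give the same bytes on a big-endian or little-endian
host, so the in-memory struct can be laid out however the compiler likes.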

So enough of this noisemaking over non-portable header definitions.
It's just Not A Problem. If your versions of Lharc are incompatible
across your various systems, then you probably have an outdated
version or the guy who ported it was inept.

--
  -- Howard Chu @ University of Michigan
  one million data bits stored on a chip, one million bits per chip
	if one of those data bits happens to flip,
		one million data bits stored on the chip...