nfs@cs.Princeton.EDU (Norbert Schlenker) (07/03/90)
In article <1990Jul2.143113.2267@jarvis.csri.toronto.edu> wayne@csri.toronto.edu (Wayne Hayes) writes:
>...
>LHARC now available for Minix!
>
>YES! Now you too can have a *real* compression utility on your Minix
>system.
>
>Well, OK, "real" is a subjective term. It doesn't accept pipe input.
>However its compression ratios are second to none. It'll blow away
>13-bit compress AND 16-bit compress.

A month ago, I was also amazed to discover that LHARC compiled and ran
virtually unchanged under Minix. It really is a good compressor, and it
is nice to have a combination compressor/archiver, like all those poor
DOS folks. I have never had any luck porting ARC, ZOO, or their ilk to
PC-Minix ... they always run out of code or data space or fail in
mysterious and untraceable ways. LHARC worked first time. Its
compression ratios are astonishing. It is a truly fine program for
local archives.

But there is a wee problem ...

LHARC wasn't written with portability in mind. The archive header
format is dependent on byte order within words and on the length of
integers. A PC archive cannot be read on an ST and vice versa,
without considerable pulling of teeth. I started to rewrite the parts
of the code that wrote the headers (I think something like a TAR header,
perhaps tagged to reduce the space required, would be much better) but
real work got in the way.

LHARC would be a wonderful method for distributing Minix software. The
LZHUF algorithm is a clever device and achieves much better compression
than compress does. Andy would be less likely to dominate net bandwidth
when posting new releases. But it's all useless if the archives can't
be read by all the platforms on which Minix runs.

Norbert
wayne@csri.toronto.edu (Wayne Hayes) (07/03/90)
In article <781@rossignol.Princeton.EDU> nfs@cs.Princeton.EDU (Norbert Schlenker) writes:
>LHARC wasn't written with portability in mind. The archive header
>format is dependent on byte order within words and on the length of
>integers. A PC archive cannot be read on an ST and vice versa,
>without considerable pulling of teeth.

Have you actually tried this? I'm sorry, but I think you're wrong about
this. Yes, the headers weren't written with portability in mind, but
the UNIX source I'll be posting handles this very well. I've transferred
.LZH files from my PC under DOS to a Sun and to Minix, and the same
archived file is readable on all systems. There was a version out a
while ago that I also ported and had to write my own INTEL <-> Motorola
transfer functions, but this version is by the original author of LHARC
and much cleaner.

--
Mathematics: That branch of Human Thought which takes a finite set of
trivial axioms and maps them to a countably infinite set of unintuitive
theorems.
Wayne Hayes   INTERNET: wayne@csri.utoronto.ca   CompuServe: 72401,3525
nfs@cs.Princeton.EDU (Norbert Schlenker) (07/03/90)
In article <1990Jul3.011235.5774@jarvis.csri.toronto.edu> wayne@csri.toronto.edu (Wayne Hayes) writes:
>In article <781@rossignol.Princeton.EDU> nfs@cs.Princeton.EDU (Norbert Schlenker) writes:
>>LHARC wasn't written with portability in mind. The archive header
>>format is dependent on byte order within words and on the length of
>>integers. A PC archive cannot be read on an ST and vice versa,
>>without considerable pulling of teeth.
>
>Have you actually tried this? I'm sorry but I think you're wrong about
>this. Yes, the headers weren't written with portability in mind, but
>the UNIX source I'll be posting handles this very well...

Here is the Lharc header structure from the Unix source that I have. It
is possible that this structure is from an old version, but this is from
the most recent posting in comp.sources.misc.

typedef struct LzHeader {
    unsigned char   header_size;
    char            method[METHOD_TYPE_STRAGE];
    long            packed_size;
    long            original_size;
    long            last_modified_stamp;
    unsigned short  attribute;
    char            name[256];
    unsigned short  crc;
    boolean         has_crc;
    unsigned char   extend_type;
    unsigned char   minor_version;
    /* extend_type == EXTEND_UNIX and convert from other type. */
    time_t          unix_last_modified_stamp;
    unsigned short  unix_mode;
    unsigned short  unix_uid;
    unsigned short  unix_gid;
} LzHeader;

This header CANNOT be portable between machines. It depends on the
size of short integers (usually 16 bits, but they don't have to be),
on the size of long integers (usually 32 bits, but ...), on the
endianness of the machines which read and write the archive, on the
padding between structure elements, and on the fact that Unix modes,
uids, and gids all fit into unsigned short integers.

These assumptions cannot be guaranteed, even on the limited range of
machines that Minix runs on. They are certainly not guaranteed in the
Unix range. Lharc archives produced with the above header on my PC
can be read on a VAX here. The same archive cannot be read on a Sun-4
(byte order and structure padding problems) nor a DEC 5400 (an endian
and structure padding problem) nor a MIPS box (a structure padding
problem). I have little hope that they will be readable on the next
fast box that is installed here.

The solution is a more portable form of header. As I suggested before,
something like a TAR header, cut back in size from 512 bytes, will be
necessary. Without that, this archive format will not fly.

Norbert
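
The padding and byte-order dependence described above can be made
concrete with a small C sketch. This is not taken from the Lharc source:
it copies only the first three fields of the structure, the value 5 for
METHOD_TYPE_STRAGE is an assumption, and the struct and function names
are hypothetical. The results vary from one compiler and machine to the
next, which is the reason a raw structure written to disk does not
travel well.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value, for illustration only; the real definition
 * lives in the Lharc source. */
#define METHOD_TYPE_STRAGE 5

/* Hypothetical cut-down copy of the leading LzHeader fields. */
struct lz_header_prefix {
    unsigned char header_size;
    char          method[METHOD_TYPE_STRAGE];
    long          packed_size;
};

/* Bytes of padding the compiler inserted before packed_size.
 * The two fields ahead of it occupy 1 + METHOD_TYPE_STRAGE bytes. */
size_t padding_before_packed_size(void)
{
    size_t offset = offsetof(struct lz_header_prefix, packed_size);
    size_t used   = 1 + METHOD_TYPE_STRAGE;

    assert(offset >= used);   /* guard the unsigned subtraction */
    return offset - used;
}

/* Returns 1 if the least significant byte of a long is stored first
 * (Intel, VAX), 0 otherwise (68000, SPARC). */
int host_is_little_endian(void)
{
    unsigned long probe = 1UL;
    return *(unsigned char *)&probe == 1;
}
```

With longs aligned on 4-byte boundaries the padding comes out as 2
bytes; with 2-byte alignment, as on many 16-bit PC and 68000 compilers,
it is 0. An archive whose header is a raw dump of the structure is
therefore laid out differently in the two cases even before byte order
is considered.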
jds@mimsy.umd.edu (James da Silva) (07/04/90)
In article <784@rossignol.Princeton.EDU> nfs@cs.Princeton.EDU (Norbert Schlenker) writes:
>This header CANNOT be portable between machines. It depends on the
>size of short integers (usually 16 bits, but they don't have to be),
>on the size of long integers (usually 32 bits, but ...), on the
>endianness of the machines which read and write the archive, on the
>padding between structure elements, and on the fact that Unix modes,
>uids, and gids all fit into unsigned short integers.

Yes, but there's nothing that says you have to read it directly into the
structure. Knowing that the "native" layout for LZH headers is, say,
that produced by an MS-DOS compiler, you *can* write a portable routine
to read LZH headers by reading into a char array and extracting the
fields one by one. Likewise for writing such headers.

I don't know whether the source Wayne Hayes is referring to does this,
but it can be done. It would be useful to have such a portable program
for Unix and Minix, as LZH files are becoming more popular.

>The solution is a more portable form of header. As I suggested before,
>something like a TAR header, cut back in size from 512 bytes, will be
>necessary. Without that, this archive format will not fly.

The problem is that a more "portable" header would be incompatible with
the current LZH header, making the files you produce themselves less
transportable. You couldn't really call the output LZH.

Actually, I do agree with you; this format will not fly as a generic
Unix file transfer format. The combined archiver/compressor is too much
of a DOS-ism. But rather than coming up with a new header format, how
about separating out the LHARC compress/decompress routines and making
them standalone, a la the current compress(1)? Then we can use tar just
like we've always done. "tar.L" files, anyone?

I do have one question: Does LZH require reading the input twice or
creating a temporary file? I seem to recall that normal Huffman encoding
required one pass to determine the relative frequencies of input tokens,
then another pass to do the encoding. Does lharc work the same way?
That would rule out its use in situations where on-the-fly compression
through pipes is needed.

Jaime
...........................................................................
: domain: jds@cs.umd.edu                              James da Silva
: path:   uunet!mimsy!jds              Systems Design & Analysis Group
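
The "read into a char array and extract the fields one by one" approach
described in this post might look roughly like the C sketch below. The
10-byte layout, the struct, and every function name here are
hypothetical and are not the real LZH field order; the only thing taken
from the post is the idea that the on-disk bytes are little-endian, as
an MS-DOS compiler would write them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical on-disk layout (NOT the real LZH header):
 *   bytes 0-3  packed size,   little-endian
 *   bytes 4-7  original size, little-endian
 *   bytes 8-9  CRC,           little-endian
 */
#define DEMO_HEADER_BYTES 10

/* In-memory form; its padding and field widths never reach the disk. */
struct demo_header {
    unsigned long packed_size;    /* at least 32 bits in any C compiler */
    unsigned long original_size;
    unsigned int  crc;            /* at least 16 bits in any C compiler */
};

/* Assemble a 16-bit value from two bytes, low byte first. */
unsigned int get_le16(const unsigned char *p)
{
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Assemble a 32-bit value from four bytes, low byte first. */
unsigned long get_le32(const unsigned char *p)
{
    return  (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}

/* Extract each field from the raw bytes at its fixed offset. */
void demo_header_unpack(const unsigned char *buf, struct demo_header *h)
{
    assert(buf != NULL && h != NULL);
    h->packed_size   = get_le32(buf);
    h->original_size = get_le32(buf + 4);
    h->crc           = get_le16(buf + 8);
}
```

Because the values are built with shifts rather than by overlaying a
structure on the buffer, the same code gives the same answer on a
big-endian or little-endian host, whatever padding the compiler uses.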
steve@pnet51.orb.mn.org (Steve Yelvington) (07/05/90)
wayne@csri.toronto.edu (Wayne Hayes) writes:
>In article <781@rossignol.Princeton.EDU> nfs@cs.Princeton.EDU (Norbert Schlenker) writes:
>>LHARC wasn't written with portability in mind. The archive header
>>format is dependent on byte order within words and on the length of
>>integers. A PC archive cannot be read on an ST and vice versa,
>>without considerable pulling of teeth.
>
>Have you actually tried this? I'm sorry but I think you're wrong about
>this. Yes, the headers weren't written with portability in mind, but
>the UNIX source I'll be posting handles this very well. I've transferred
>.LZH files from my PC under DOS to a Sun and to Minix and the same archived
>file is readable on all systems. There was a version out awhile ago that
>I also ported and had to write my own INTEL <-> Motorola transfer
>functions, but this version is by the original author of LHARC and much
>cleaner.

The problem is that LHARC has been ported between Intel and Motorola
platforms probably eight or nine times, and at least half of the people
doing the port did not know what the hades they were doing.

I have several alleged LHARC programs for the ST under TOS. None is
fully compatible with any other.

My most recent adventure with LHARC produced file dates that were so
screwed up that they actually crashed the shell I was using when I tried
to list a directory.

It ain't worth the grief.

---
steve@thelake.mn.org
hyc@math.lsa.umich.edu (Howard Chu) (07/12/90)
In article <784@rossignol.Princeton.EDU> nfs@cs.Princeton.EDU (Norbert Schlenker) writes:
%Here is the Lharc header structure from the Unix source that I have. It
%is possible that this structure is from an old version, but this is from
%the most recent posting in comp.sources.misc.
%
%typedef struct LzHeader {
% unsigned char header_size;
% char method[METHOD_TYPE_STRAGE];
% long packed_size;
% long original_size;
% long last_modified_stamp;
% unsigned short attribute;
% char name[256];
% unsigned short crc;
% boolean has_crc;
% unsigned char extend_type;
% unsigned char minor_version;
% /* extend_type == EXTEND_UNIX and convert from other type. */
% time_t unix_last_modified_stamp;
% unsigned short unix_mode;
% unsigned short unix_uid;
% unsigned short unix_gid;
%} LzHeader;
%
%This header CANNOT be portable between machines. It depends on the
%size of short integers (usually 16 bits, but they don't have to be),
%on the size of long integers (usually 32 bits, but ...), on the
%endianness of the machines which read and write the archive, on the
%padding between structure elements, and on the fact that Unix modes,
%uids, and gids all fit into unsigned short integers.
%
%These assumptions cannot be guaranteed, even on the limited range of
%machines that Minix runs on. They are certainly not guaranteed in the
%Unix range. Lharc archives produced with the above header on my PC
%can be read on a VAX here. The same archive cannot be read on a Sun-4
%(byte order and structure padding problems) nor a DEC 5400 (an endian
%and structure padding problem) nor a MIPS box (a structure padding
%problem). I have little hope that they will be readable on the next
%fast box that is installed here.

At least in ARC, all that matters is that a given type is *at least*
X bits wide. Whether a short is 16 or 18 bits is irrelevant, and the
difference between 32 and 36 bits is also no problem. In the ARC port,
the ARC header is defined as a struct, but it is read from the archive
one byte at a time and assembled into the internal format by the
header processing routines.

The exact layout and sizes of the fields in the structure are irrelevant.
All that matters is what you do when you pull the structure into memory
and shove it back out again. It's trivial to write a single routine
that will do this correctly on *any* architecture. This is what I
did for ARC. My port of ARC runs on Crays, PCs, 68000s, IBM 3090s,
Vaxen, MIPS, Sparc, and some other more obscure architectures...
So enough of this noisemaking over non-portable header definitions.
It's just Not A Problem. If your versions of Lharc are incompatible
across your various systems, then you probably have an outdated
version or the guy who ported it was inept.
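
As an illustration of the "shove it back out again" half, here is a
minimal C sketch. It is not the ARC code; the function names are
hypothetical, and little-endian output is assumed because that is what
the DOS originals write. Each byte is masked to 8 bits, so the routines
only need the types to be at least 16 and 32 bits wide, not exactly
that.

```c
#include <assert.h>
#include <stddef.h>

/* Store the low 16 bits of v as two bytes, low byte first.
 * Works whether unsigned int is 16, 18, 32 or more bits wide. */
void put_le16(unsigned char *p, unsigned int v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
}

/* Store the low 32 bits of v as four bytes, low byte first.
 * Any bits above bit 31 (36-bit or 64-bit longs) are dropped. */
void put_le32(unsigned char *p, unsigned long v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xFFUL);
    p[1] = (unsigned char)((v >> 8) & 0xFFUL);
    p[2] = (unsigned char)((v >> 16) & 0xFFUL);
    p[3] = (unsigned char)((v >> 24) & 0xFFUL);
}
```

A header writer built on these would fill a plain unsigned char buffer
field by field and write that buffer, never the struct itself.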
--
-- Howard Chu @ University of Michigan
one million data bits stored on a chip, one million bits per chip
if one of those data bits happens to flip,
one million data bits stored on the chip...