[comp.sys.mac.misc] Really detailed Binhex format info ??

esink@turia.dit.upm.es (04/04/91)

The following is copied from file-formats.txt, which I obtained from
sumex (I have retained only some sections of her paper):

----------
Date: Tue, 22 Jan 91 17:08 CDT
From: "Sande Nissen (CompuCtr, SoftServ)" <SNISSEN@carleton.edu>

After several days of intensive detective work, I have come to a point where 
I believe I can explain the BinHex/MacBinary/.hqx confusion.  Some of the 
earliest information may be incorrect because the truth is lost in the clouds 
of ancient Mac history.

Yves Lempereur, a programmer for Mainstay, specified an "Hqx7" 
encoding scheme and developed the free program BinHex version 4, which uses a 
filename extension of ".Hqx".  Thus the first standard was born.  The Hqx7 
protocol not only pastes together the data and resource forks of a file, it 
also converts the 8-bit ASCII into a 7-bit format that can be transferred 
(and even mailed) across all networks.  Because of the widespread use of 
Lempereur's free utility, this Hqx7 protocol is sometimes called "binhex" 
encoding.

At some point, the author of StuffIt, Raymond Lau, apparently saw a need for 
a 7-bit encoding of StuffIt archives.  (I could not find any information on 
what prompted this decision.)  He added to StuffIt the ability to encode and 
decode what he called "binhex" files, which expected a filename extension of 
".hqx".  Unfortunately, he introduced a serious problem in the process: his 
7-bit ".hqx" encoding is _different_ from the old BinHex version 4 Hqx7 
protocol, despite the fact it has the same extension and common name!  Even 
more unfortunately, one of the largest Macintosh software collections in the 
country, the Info-Mac (sumex-aim) archives at Stanford University, adopted 
StuffIt's hqx variant as their only acceptable standard.  Thus the third 
standard was born.

                                        Sande Nissen
                                        Computer Center
                                        Carleton College
                                        12/31/1990

----------

Basically Sande states that StuffIt's 'binhex' encoding is not the
same as BinHex 4.0, even though StuffIt inserts the standard header

(This file must be converted with Binhex 4.0)

on things it encodes.  I find this thought intolerable; so much so
that I have wondered if it is true or not.  I sent mail to Sande
some time ago, asking for further evidence.  What I think Sande said
is that the StuffIt encoding of a file does not match the Binhex 4.0
encoded version.

After studying the Binhex file format, I realized that such a
match is not necessary.  Due to run-length encoding, different
Binhex encodings of the same file are conceivable.  The real
test should be :  can Binhex 4.0 and StuffIt decode each other's
coded files correctly, every time ?

Does anyone know the details of the file formats and algorithms
used to encode/decode said formats which are used by these two
programs ?  Can anyone verify whether or not these two tools
are or are not compatible ?

Suspicion tells me they are...

Eric

Eric W. Sink                     | Putting the phrase      |All opinions
Departamento de Telematica       | "Frequently Asked"      |are mine and
Universidad Politecnica de Madrid| in your kill file is    |not necessarily
esink@turia.dit.upm.es           | not recommended.        |yours.

jcav@quads.uchicago.edu (john cavallino) (04/05/91)

In article <1991Apr04.114105.6198@dit.upm.es> esink@turia.dit.upm.es () writes:
>
>The following is copied from file-formats.txt, which I obtained from
>sumex (I have retained only some sections of her paper):
>
>----------
>Date: Tue, 22 Jan 91 17:08 CDT
>From: "Sande Nissen (CompuCtr, SoftServ)" <SNISSEN@carleton.edu>
>
>After several days of intensive detective work, I have come to a point where 
>I believe I can explain the BinHex/MacBinary/.hqx confusion.  Some of the 
>earliest information may be incorrect because the truth is lost in the clouds 
>of ancient Mac history.
>
>Yves Lempereur, a programmer for Mainstay, specified an "Hqx7" 
>encoding scheme and developed the free program BinHex version 4, which uses a 
>filename extension of ".Hqx".  Thus the first standard was born.  The Hqx7 
>protocol not only pastes together the data and resource forks of a file, it 
>also converts the 8-bit ASCII into a 7-bit format that can be transferred 
>(and even mailed) across all networks.  Because of the widespread use of 
>Lempereur's free utility, this Hqx7 protocol is sometimes called "binhex" 
>encoding.

If you think about it, it's obvious that BinHex 4 was not the first version
of BinHex.  There were in fact at least two earlier versions (and one
subsequent version) of BinHex.  Those of us who looked at sumex in the very
early days (ca. 1985) will remember files with extensions '.hex' and '.hcx'.
These files were created by those earlier versions of BinHex.
I believe that BinHex 4 can decode these earlier formats, should there still
be a need.
Anybody remember .pit files?...  (floating off into a haze of nostalgia)

-- 
John Cavallino                      |     EMail: jcav@midway.uchicago.edu
University of Chicago Hospitals     |    USMail: 5841 S. Maryland Ave, Box 145
Office of Facilities Management     |            Chicago, IL  60637
"Opinions, my boy. Just opinions"   | Telephone: 312-702-6900

russotto@eng.umd.edu (Matthew T. Russotto) (04/05/91)

In article <1991Apr04.114105.6198@dit.upm.es> esink@turia.dit.upm.es () writes:

>After studying the Binhex file format, I realized that such a
>match is not necessary.  Due to run-length encoding, different
>Binhex encodings of the same file are conceivable.  The real
>test should be :  can Binhex 4.0 and StuffIt decode each other's
>coded files correctly, every time ?

The file formats are intended to be compatible, if I remember what the
Stuffit documentation says.  Stuffit does not use RLE when making a Binhex
file, though it can decode it.  I have heard of some people having files
being decodable by one and not the other, but I have never run across them.
I would suspect bugs in (hopefully) earlier versions of Stuffit to be
responsible for them.
--
Matthew T. Russotto	russotto@eng.umd.edu	russotto@wam.umd.edu
     .sig under construction, like the rest of this campus.

dplatt@ntg.uucp (Dave Platt) (04/06/91)

In article <1991Apr04.114105.6198@dit.upm.es> esink@turia.dit.upm.es writes:
>
>Basically Sande states that StuffIt's 'binhex' encoding is not the
>same as BinHex 4.0, even though StuffIt inserts the standard header
>
>(This file must be converted with Binhex 4.0)
>
>on things it encodes.  I find this thought intolerable; so much so
>that I have wondered if it is true or not.  I sent mail to Sande
>some time ago, asking for further evidence.  What I think Sande said
>is that the StuffIt encoding of a file does not match the Binhex 4.0
>encoded version.

Sande's observation (that the two programs create different encodings)
is correct.  Sande's conclusion that the two encodings are incompatible
is _not_ correct.  Sande jumped to Conclusions (and, as Milo found out,
it's easy to do but you must usually swim back ;-)

>After studying the Binhex file format, I realized that such a
>match is not necessary.  Due to run-length encoding, different
>Binhex encodings of the same file are conceivable.

And, in fact, this is precisely the reason that BinHex 4.0 and StuffIt
do not produce identical encodings.  BinHex 4.0 implements run-length
encoding.  StuffIt does not use run-length encoding when it hqxifies
a file.  Hence, the encodings differ.

However, _both_ of these programs can correctly decode a file produced
by the other, since both of them _do_ implement the run-length decoder.
It doesn't matter to _either_ decoder whether the input file uses RLE or
not.
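
To make the point concrete, here is a minimal sketch of a run-length
decoder.  It assumes the commonly documented BinHex 4.0 convention: the
byte 0x90 is the run marker, "X 0x90 n" means n copies of X in total,
and "0x90 0x00" stands for a literal 0x90.  The names RLE_MARKER and
rle_decode are illustrative only; this is not taken from the source of
BinHex or StuffIt.

```python
RLE_MARKER = 0x90  # assumed run marker (commonly documented BinHex 4.0 convention)


def rle_decode(data: bytes) -> bytes:
    """Expand run-length sequences; input without runs passes through unchanged."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte != RLE_MARKER:
            out.append(byte)          # ordinary byte, copy as-is
            continue
        if i >= len(data):
            raise ValueError("truncated input: marker at end of stream")
        count = data[i]
        i += 1
        if count == 0:
            out.append(RLE_MARKER)    # marker followed by 0 is a literal 0x90
        else:
            if not out:
                raise ValueError("run marker with no preceding byte")
            # count is the total run length; one copy is already in out
            out.extend(out[-1:] * (count - 1))
    return bytes(out)


if __name__ == "__main__":
    with_rle = bytes([0x11, 0x22, 0x90, 0x06, 0x33])
    without_rle = bytes([0x11, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x33])
    # Both encodings of the same data decode to the same bytes.
    print(rle_decode(with_rle) == rle_decode(without_rle))
```

The same decoder handles both inputs, which is why an encoder that
never emits runs still produces a file that a run-aware decoder reads
correctly.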

>                                                      The real
>test should be :  can Binhex 4.0 and StuffIt decode each other's
>coded files correctly, every time ?

Yes, with one exception that I'm aware of.  There's a bug in the decoder
in BinHex 4.0 and BinHex 5.0, which causes problems if the name of the
encoded file is between 27 and 31 characters long... the decoder loses
sync with the file, reports a CRC mismatch in one of the two file
forks, and scrambles the decoded file.  This will occur even if BinHex
4.0 was used to do both the encoding and decoding... it's not due to an
incompatibility between StuffIt and BinHex.

>
>Does anyone know the details of the file formats and algorithms
>used to encode/decode said formats which are used by these two
>programs ?  Can anyone verify whether or not these two tools
>are or are not compatible ?
>
>Suspicion tells me they are...

Your suspicion is correct.  These two tools are compatible with one
another, and with quite a few other tools which have been engineered to
mimic the BinHex encoding.  "xbin" and "unxbin" and "mcvert" are
available on Unix, and at least two desk-accessory decoders are
available on the Mac.  As far as I know, Yves never officially posted
the documentation for the BinHex encoding... it has been successfully
reverse-engineered by a number of people.

I've even implemented a BinHex encoder myself, as part of a mail-agent
which acts as a front-end for uupc 2.1 or Mac/gnuucp 4.3, so I can
simply issue an "Attach binary file" command when composing a message,
and have the file converted to ASCII and appended to the message
"automagically".  It's not terribly hard to do.

However, I chose _not_ to implement the run-length encoding... following
Ray Lau's lead in this respect.  RLE doesn't buy you all that much in
many cases... especially if the file in question has already been
processed through a compression utility such as StuffIt or Compactor.
The run-length encoder is a rather gnarly thing to implement, due to the
amount of state information that must be kept around, and the
interactions between the 8-to-6 encoder, data/resource fork gobbler, and
ASCII line builder.  Writing a run-length decoder is a good deal
simpler (and is _not_ optional if you want to be BinHex-compatible!)
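
For illustration, a rough sketch of the encoder side without run
compression.  It assumes the commonly documented BinHex 4.0 details: a
64-character alphabet, 0x90 as the run marker (so a literal 0x90 must
still be written as 0x90 0x00 even when no runs are emitted), and a
colon-delimited body wrapped at 64 characters per line.  All function
names here are hypothetical, and the header, fork lengths and CRCs are
left out entirely.

```python
# Assumed 64-character alphabet, as commonly documented for BinHex 4.0.
ALPHABET = "!\"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr"


def escape_marker(data: bytes) -> bytes:
    """No run compression: only escape literal 0x90 so decoders stay in sync."""
    return data.replace(b"\x90", b"\x90\x00")


def encode_6bit(data: bytes) -> str:
    """8-to-6 step: repack the byte stream into 6-bit symbols."""
    chars = []
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 6:
            nbits -= 6
            chars.append(ALPHABET[(acc >> nbits) & 0x3F])
        acc &= (1 << nbits) - 1
    if nbits:
        # pad the final partial symbol with zero bits
        chars.append(ALPHABET[(acc << (6 - nbits)) & 0x3F])
    return "".join(chars)


def decode_6bit(symbols: str) -> bytes:
    """Inverse of encode_6bit; leftover padding bits are discarded."""
    out = bytearray()
    acc = 0
    nbits = 0
    for ch in symbols:
        acc = (acc << 6) | ALPHABET.index(ch)
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    return bytes(out)


def build_lines(symbols: str, width: int = 64) -> list:
    """ASCII line builder: colon-delimited body, split into fixed-width lines."""
    body = ":" + symbols + ":"
    return [body[i:i + width] for i in range(0, len(body), width)]


if __name__ == "__main__":
    payload = bytes(range(256))
    for line in build_lines(encode_6bit(escape_marker(payload))):
        print(line)
```

Skipping run detection leaves the encoder with almost no state: the
escape step is a plain substitution, and the 8-to-6 packer only has to
remember a few leftover bits between bytes.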

-- 
Dave Platt                                                VOICE: (415) 813-8917
                    UUCP: ...apple!ntg!dplatt
 USNAIL: New Technologies Group Inc. 2468 Embarcadero Way, Palo Alto CA 94303