[comp.sys.mac.programmer] SUMMARY: .hqx

roland@dna.lth.se (Roland Mansson) (06/11/91)

In article <1991May28.210608.16038@lth.se> roland@dna.lth.se (Roland Mansson) writes:
>I need a description of the binhex file format. I'm especially interested
>in how the checksum is calculated, on what part of the file it is
>calculated, what flags are saved (locked etc), which flags the encoder
>and/or decoder should clear, and if the flags should be cleared after or
>before you calculate the checksum.
>
>Also only eight of the 16 Finder flags are included in the format, and neither
>date created nor date modified are included. I would like to include the
>other eight flags in the byte used for "protect" flag, but then other
>decoders don't decode my files. What is the status of the protect flag, anyway?

Here is a summary of the responses I got to my question about the BinHex
format.  If you need more info, check the sources in the unix directory at
sumex. Someone suggests that there is an informal definition of the format at
sumex. I have not verified that, but I include the definition below.

I modified the 'mcvert' sources, and the result works well. If you need a C
example, I'll be happy to send my sources to you (actually, you'll get the
binhex.c file; to compile it needs some general routines, but it should be
easy to figure out what they do).

THANKS to those who sent answers.

[To those who asked about MacPost: check pollux.lu.se, pub/mac/comm/macpost]

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

Date: Wed, 29 May 91 08:58:14 -0500
From: Steve Dorner <dorner@pequod.cso.uiuc.edu>

Eudora's BinHex is based on UNIX xbin, which is available for anonymous
ftp from sumex-aim.stanford.edu.  I haven't had any interoperability
complaints.

>Also only eight of the 16 Finder flags are included in the format

No, all sixteen go in.  Whether or not encoders/decoders honor them
is another story, but they're in the format.

I personally put all the flags in the encoded file, but unconditionally
clear OnDesk, Invisible, and Initted when I decode.  I did it this way
because it seemed reasonable, not because I ever found any kind of
authoritative doc on BinHex.
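
The decode-time policy Steve describes could be sketched as a bit mask
over the 16-bit flags word. The bit values below are an assumption taken
from Apple's Finder interface constants (kIsOnDesk, kHasBeenInited,
kIsInvisible), not from any BinHex document, and the function name is
made up for illustration.

```python
# Finder flag bits. Values are assumed from Apple's Finder interface
# headers; no BinHex document is known to define them.
IS_ON_DESK = 0x0001
HAS_BEEN_INITED = 0x0100
IS_INVISIBLE = 0x4000

CLEAR_ON_DECODE = IS_ON_DESK | HAS_BEEN_INITED | IS_INVISIBLE


def flags_for_decoded_file(encoded_flags):
    """Return the flags word with OnDesk, Invisible and Initted cleared."""
    return encoded_flags & ~CLEAR_ON_DECODE & 0xFFFF
```

All other bits (bundle, for example) pass through unchanged.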

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
Date: Wed, 29 May 91 19:06:34 GMT
From: Sak Wathanasin <sw@network-analysis-ltd.co.uk>

I have the sources for the Unix utility "xbin" which I have since converted
to an MPW tool. This is the closest to a "definition" that I've seen.

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

Date: Thu, 30 May 91 11:16:17 -0700
From: daveg@Apple.com (Dave Green)

I found the documents that I needed via anon ftp, in a set of sources on
sumex-aim.stanford.edu, in the utils or unix subdirectory.  The set was
titled xbin: unix compilable sources for a utility which converts .hqx
files to binary format.

I have in my possession sources to an MPW tool which is a modification of
xbin.  It has conditional code to handle the different io models.  If
you need it, I could dig it up for you.

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

Date: Fri, 31 May 1991 14:50 +8:00
From: Lewis_P@cc.curtin.edu.au

This is becoming a FAQ! :-)
There is a file at info-mac (in the help or tech directory I think) called
hqx-format.txt which describes the format (but not the checksum).  I'll
include it, and a file I have on the CRC's at the end of this message.
The CRC is calculated over the entire resource (or data) fork, plus
the checksum bytes taken as 0 (ie, if the fork is $12 $34 $56 $78 (4 bytes),
then the checksum is over the bytes $12 $34 $56 $78 $00 $00).  The flags 
field should be a direct copy of the finder flags (ie finfo.fdFlags) when
encoded, and can be more or less totally ignored when decoded (system and
bundle should probably be preserved).  (It is usually irrelevant, since
the file is almost always a StuffIt or Compactor archive)

All 16 of the FInfo flags are stored, I don't know about the "protect" flag
(I haven't heard of it).  If you really want to include extra information
in the hqxed file, you could put it after the trailing colon on the end
of the hqx file, that way you would not upset any current decoders, but
could set a standard for extra information.  If you do decide to do this,
I would strongly suggest you put your application's signature as the first
four bytes of the extra information (or some other identifying string),
so as to allow others to recognize it, as well as do their own version.
Also, I would suggest you put, say a % as the very first character after
the colon so it is easy to see if there is a trailing packet (allow the
whitespace characters to intervene so as not to break the 64 column normal
limit).  If you do anything like this, let me know so I can update DeHQX
to handle it.  Speaking of which, if you would like to look at pascal
source code for DeHQX (a BinHex decoder for the mac), it is available
for FTP from Info-Mac in /info-mac/source/pascal/dehqx-105.hqx

---------------- HQX-FORMAT.TXT -----------------------

Return-Path: <tcora@pica.army.mil>
Received: from ARDEC-AC4.ARPA (AC4.PICA.ARMY.MIL) by sumex-aim.stanford.edu (4.0/inc-1.0)
	id AA09059; Tue, 11 Apr 89 07:01:19 PDT
Date:     Tue, 11 Apr 89 10:00:43 EDT
From: Tom Coradeschi <tcora@PICA.ARMY.MIL>
To: The Moderators <Info-Mac-Request@sumex-aim.stanford.edu>
Subject:  Re:  Administrivia
Message-Id:  <8904111000.aa23031@ARDEC-AC4.ARDEC.ARPA>

>We have urgent need for a Unix program to convert between BinHex files (.hqx)
>and binary files (.bin) such as those on wsmr-simtel20.army.mil.
>Werner@rascal.ics.utexas.edu send me a few programs but it looked like a
>difficult thing to do on a routine basis. Are there any nice ways to do this?
>As soon as I find out about this I'll put the new Hypercard stack containing
>all of the Tech Notes in the tn directory.
>
>Bill Lipa
>Info-Mac
>
Hey Now, Bill.
  What exactly are you trying to do with respect to binary to hex
conversions? There are UNIX utilities for converting from hex to binary, and
to combine the binary forks into an application. I know because I got them
from info-mac. Is what you're trying to do, a conversion from binary to hex,
via UNIX? That I don't know much about. But what you might want to try is
getting in touch with the guys who've written the xbin programs. Perhaps
they can get you pointed in the right direction. I think they're in the
/info-mac/utilties directory. The following is a bit I saved from when I
downloaded those files:

This is version 2.3 of xbin.  The major changes include
performance improvements from Dan LaLiberte of UIUC, fixes
for 16-bit machines from Jim Budler of AMD, and a fix for
a bug in the run-length encoding code.

This version of "xbin" can handle all three BinHex formats
(so far).  Thanks to Darin Adler at TMQ Software for providing
the code to compute and check the CRC values for all three formats.
(There are no plans to support binhex5.0, as its use of binary
encoding makes it useless for sending programs through e-mail).

Other new features include "list" and "verbose" modes, the
ability to convert several binhex files at one time, the ability
to read standard input, somewhat better error handling, and a
manual page.

Any extraneous mail or news headers are ignored, but xbin relies
on finding a line which starts with "(This file" to know when
the header ends and the good stuff begins.  You can add one
of these by hand if it's been lost.
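
The rule described above (skip everything up to the "(This file" line)
could be sketched as follows. This is not xbin's actual code, only an
illustration of the rule; the function name is hypothetical.

```python
def find_payload_start(lines):
    """Return the index of the line after the "(This file" marker, or -1.

    Mail or news headers before the marker are skipped, following the
    rule described for xbin.
    """
    for i, line in enumerate(lines):
        if line.startswith("(This file"):
            return i + 1
    return -1
```

A return value of -1 corresponds to the case where the marker has been
lost and would have to be added back by hand.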

To compile it on USG systems, type:
	cc -o xbin xbin.c

or on Berkeley systems:
	cc -o xbin xbin.c -DBSD

As usual, please report any problems, suggestions, or
improvements to me.

	Dave Johnson
	Brown University Computer Science
	ddj%brown@csnet-relay.ARPA
	{ihnp4,decvax,allegra,ulysses,linus}!brunix!ddj

===================
Here's an informal description of the HQX format as I understand it:
-----
The first and last characters are each a ':'.  After the first ':',
the rest of the file is just a string of 6 bit encoded characters.
All newlines and carriage returns are to be ignored.

The tricky part is that there are holes in the translation string
so you have to look up each file character to get its binary 6 bit
value.  I found the string by looking at a hex dump of BinHex:

	!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr

I can't see how this aids or abets any kind of error recovery, but
if you ran into a char not in the list, you would know something's
wrong and give up.
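
A minimal sketch of the lookup and 6-to-8 bit repacking, in Python.
It assumes the 64-character BinHex 4.0 alphabet as commonly documented
(with '[' and a backquote following 'Z'), and that the first character
of each group carries the most significant bits. The names HQX_CHARS,
HQX_VALUE and decode_6bit are made up for this example.

```python
# The 64 characters in value order; the position in the string is the
# 6-bit value.
HQX_CHARS = "!\"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr"
HQX_VALUE = {ch: i for i, ch in enumerate(HQX_CHARS)}


def decode_6bit(text):
    """Turn the characters between the two ':' into bytes (before RLE)."""
    out = bytearray()
    acc = 0    # bit accumulator
    nbits = 0  # number of valid bits held in acc
    for ch in text:
        if ch in "\r\n":
            continue  # newlines and carriage returns are ignored
        value = HQX_VALUE.get(ch)
        if value is None:
            raise ValueError("character not in BinHex alphabet: %r" % ch)
        acc = (acc << 6) | value
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    return bytes(out)
```

Four characters yield three bytes; any leftover bits at the end are
simply dropped in this sketch.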

There is some run length encoding, where the character to be repeated
is followed by a 0x90 byte then the repeat count.  For example, ff9004
means repeat 0xff 4 times.  The special case of a repeat count of zero
means it's not a run, but a literal 0x90.  2b9000 => 2b90.

*** Note: the 9000 can be followed by a run, which means to repeat the
0x90 (not the character previous to that).  That is, 2090009003 means
a 0x20 followed by 3 0x90's.
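
The run-length rules above could be sketched like this. It is an
illustration of the description, not code from xbin or BinHex; the
function name is hypothetical, and treating a count of 1 as "no extra
copies" is my assumption.

```python
def rle_expand(data):
    """Expand 0x90 run-length markers in the 8-bit decoded stream."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b != 0x90:
            out.append(b)
            continue
        if i >= len(data):
            raise ValueError("0x90 marker without a count byte")
        count = data[i]
        i += 1
        if count == 0:
            out.append(0x90)  # 90 00 is a literal 0x90
        else:
            if not out:
                raise ValueError("run marker with nothing to repeat")
            # The byte to repeat is already in the output once, so add
            # count - 1 more copies. After a 90 00 that byte is 0x90.
            out.extend(out[-1:] * (count - 1))
    return bytes(out)
```

With this sketch, ff9004 expands to four 0xff bytes, 2b9000 to 2b90,
and 2090009003 to 0x20 followed by three 0x90 bytes, matching the
examples in the text.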

Once you've turned the 6 bit chars into 8, you can parse the header.
The header format consists of a one byte name length, then the mac
file name, then a null.

[ NOTE. The null is a version no, NOT an end of a C-string /Roland]

The rest of the header is 20 bytes long,
and contains the usual file type, creator/author, file flags, data
and resource lengths, and the two byte crc value for the header.

The data fork and resource fork contents follow in that order.
There is a two byte file crc at the end of each fork.  If a fork
is empty, there will be no bytes of contents and the checksum
will be two bytes of zero.

So the decoded data between the first and last ':' looks like:

	 1       n   1   4    4    2    4    4   2	(length)
	+-+---------+-+----+----+----+----+----+--+
	|n| name... |0|TYPE|AUTH|FLAG|DLEN|RLEN|HC|	(contents)
	+-+---------+-+----+----+----+----+----+--+

			DLEN			 2	(length)
	+--------------------------------------+--+
	|		DATA FORK	       |DC|	(contents)
	+--------------------------------------+--+

			RLEN			 2	(length)
	+--------------------------------------+--+
	|	RESOURCE FORK		       |RC|	(contents)
	+--------------------------------------+--+
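
The layout above could be parsed roughly as follows, once the 6-bit
decoding and run-length expansion are done. This is a sketch under the
assumption that multi-byte fields are big-endian (as on the Mac); the
function and key names are made up, and no CRC is checked here.

```python
import struct


def parse_hqx(decoded):
    """Split decoded BinHex data into header fields and the two forks."""
    n = decoded[0]                      # one byte name length
    name = decoded[1:1 + n]
    version = decoded[1 + n]            # the byte shown as 0 in the diagram
    fixed = decoded[2 + n:22 + n]       # the 20 byte rest of the header
    if len(fixed) != 20:
        raise ValueError("truncated header")
    ftype, creator, flags, dlen, rlen, hcrc = struct.unpack(">4s4sHIIH", fixed)
    pos = 22 + n
    data = decoded[pos:pos + dlen]
    dcrc = decoded[pos + dlen:pos + dlen + 2]
    pos += dlen + 2
    rsrc = decoded[pos:pos + rlen]
    rcrc = decoded[pos + rlen:pos + rlen + 2]
    return {
        "name": name, "version": version, "type": ftype, "creator": creator,
        "flags": flags, "header_crc": hcrc,
        "data": data, "data_crc": dcrc,
        "resource": rsrc, "resource_crc": rcrc,
    }
```

The CRC fields are returned as found so that a caller can compare them
against values it computes itself.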

------

Good Luck!

tom c

Electromagnetic Armament Technology Branch, US Army Armament Research,
Development and Engineering Center, Picatinny Arsenal, NJ 07806-5000
ARPA: tcora@pica.army.mil -or- tcora@ardec.arpa
UUCP: ...!{uunet,rutgers}!pica.army.mil!tcora BITNET: Tcora@DACTH01.BITNET

----------------- CRC.TXT ---------------------

Subject: Re: Calculating BinHex 4.0's Checksums
From: physi-hf@garnet.berkeley.edu (Howard Haruo Fukuda)
Date: 6 Dec 90 07:05:18 GMT
Sender: usenet@agate.berkeley.edu (USENET Administrator)
References: <5063.275cde88@cc.curtin.edu.au>
Organization: University of California, Berkeley
Lines: 26

In article <5063.275cde88@cc.curtin.edu.au> Lewis_P@cc.curtin.edu.au (Peter Lewis) writes:
>Hi All,
>   Could someone tell me how the CRC is calculated in a BinHex 4.0 encoded
>file?  I know the format of .hqx files, and I can decode them, but I can't 
>calculate and check the CRCs.
>
>Any information on this would be greatly appreciated,
>   Peter.

BinHex 4.0 uses a 16-bit CRC with the polynomial 0x1021 (the initial
value is 0x0000).  The general algorithm is to
take data 1 bit at a time and process it through the following:

1) Take the old CRC (use 0x0000 if there is no previous CRC) and shift it to
the left by 1.

2) Put the new data bit in the least significant position (right bit).

3) If the bit shifted out in (1) was a 1 then xor the CRC with 0x1021.

4) Loop back to (1) until all the data has been processed.

You should be careful that when BinHex has a 2 Byte location for the CRC (such
as at the end of the header), you should feed in 2 bytes of 0x00 before
you compare the CRCs.
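
The four steps can be written out bit by bit as below. This is a
sketch of the description, not code from BinHex; it assumes each byte
is fed most significant bit first, and the function names are made up.

```python
def crc_update(crc, data):
    """Feed data into the 16-bit CRC one bit at a time."""
    for byte in data:
        for shift in range(7, -1, -1):        # most significant bit first
            bit = (byte >> shift) & 1
            carry = crc & 0x8000              # bit about to be shifted out
            crc = ((crc << 1) | bit) & 0xFFFF # steps 1 and 2
            if carry:
                crc ^= 0x1021                 # step 3
    return crc


def binhex_crc(data):
    """CRC of data followed by the two zero bytes standing in for the CRC."""
    return crc_update(crc_update(0x0000, data), b"\x00\x00")
```

For an empty fork this gives 0, which agrees with the statement that
an empty fork has a checksum of two zero bytes. For the fork
$12 $34 $56 $78, the value to compare is binhex_crc of those 4 bytes.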

-Howard

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
From: mfort@ub.d.umn.edu (Michael Fort)
Date: Fri, 31 May 91 17:29:16 CDT

What I have found is a method or application of converting the binhexed file
to a file format called macbinary.  There are applications on both mac and
unix for conversion to and from these two formats.  They are both on
sumex-aim.stanford.edu in info-mac under unix stuff.  They are called xbin
and mcvert-16 I think.

Good Luck!
-- 
Roland Mansson, Lund University Computing Center, Box 783, S220 07 Lund, Sweden
Phone: +46-46107436   Fax: +46-46138225   Bitnet: roland_m@seldc52
Internet: roland.mansson@ldc.lu.se   or   roland.mansson%ldc.lu.se@uunet.uu.net
UUCP: {uunet,mcvax}!sunic!ldc.lu.se!roland.mansson    AppleLink: SW0022