[comp.sys.mac] Binhex section delimiters

howard@amdahl.amdahl.com (The Toolmaster) (06/19/87)

(All references to binhex format are 4.0)

When putting multi-part binhex postings together, I tend to search for
the string "---" since it has become a fairly standard delimiter.  But
occasionally, that string appears in the binhex (sometimes often), slowing
me down.  It seems like a better delimiter could be used, something that
cannot be construed as part of the binhex.  But what?  Binhex is comprised
of all the printables, right?  Almost; those little colons are only found
at the beginning and end of the code.  So here's the pitch...

Could we adopt a new standard of '::: end of part whatever :::' for
delimiting sections of binhex.
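A rough sketch of how such a delimiter might be written and removed
again.  The helper names mark_parts and strip_marks are hypothetical;
nothing like them was posted to the group.  The sketch leans on the
point above that BinHex 4.0 data carries colons only as its opening and
closing frame, so no data line can both begin and end with ":::".

```sh
# Hypothetical poster-side helper: copy stdin to stdout, adding a
# delimiter line after every $1 lines ($1 assumed a positive integer).
mark_parts() {
    awk -v n="$1" '
        { print }
        NR % n == 0 { printf "::: end of part %d :::\n", NR / n }
    '
}

# Hypothetical reader-side helper: drop the delimiter lines again.
strip_marks() {
    grep -v '^::: .* :::$'
}
```

Running "mark_parts 2 < file.hqx | strip_marks" should give back the
lines of file.hqx unchanged.  Article headers and signatures would
still need separate handling; this only shows that the delimiter
itself is unambiguous.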

Now someone is going to say "why don't you just use the spiffy program
posted a while back that puts binhex sections together."  Well I would,
if it was based on the delimiting concept mentioned above.

Roger Long, are you listening? (and what is the direct path to you?)
-- 
"Plan for the future because that's where you                Howard C. Simonson
    are going to spend the rest of your life." {hplabs,ihnp4,nsc}!amdahl!howard
         - Mark Twain -

[ The disclaimer for this message may be found in my next article ]

denbeste@bgsuvax.UUCP (William C. DenBesten) (06/22/87)

> (All references to binhex format are 4.0)

> When putting multi-part binhex postings together, I tend to search for
> the string "---" since it has become a fairly standard delimiter.

> Could we adopt a new standard of '::: end of part whatever :::' for
> delimiting sections of binhex.

what about searching for a space.  Binhex doesn't allow spaces, and
there is almost always a space somewhere in the message (either after
the --- or between end and of).  It usually works pretty well to
search for a space and then go to the beginning of the line that it
occurred on.
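The same search can be done outside an editor.  A minimal sketch,
assuming a POSIX grep; the function name find_seams is made up for
illustration:

```sh
# Hypothetical helper: print "line-number:text" for every line that
# contains a space, i.e. every line that cannot be BinHex data.
# Reads the named files, or stdin when no file is given.
find_seams() {
    grep -n ' ' "$@"
}
```

For a single saved article, "find_seams part.1" lists the header,
separator and signature lines with their line numbers, which is where
the cuts would go.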

---
          William C. DenBesten | CSNET denbeste@research1.bgsu.edu
      Dept of Computer Science | UUCP  ...!cbosgd!osu-eddie!bgsuvax!denbeste
Bowling Green State University |
  Bowling Green, OH 43403-0214 |

earleh@dartvax.UUCP (Earle R. Horton) (06/24/87)

In article <1177@bgsuvax.UUCP>, denbeste@bgsuvax.UUCP (William C. DenBesten) writes:
> > (All references to binhex format are 4.0)
> 
> > When putting multi-part binhex postings together, I tend to search for
> > the string "---" since it has become a fairly standard delimiter.
> 
> > Could we adopt a new standard of '::: end of part whatever :::' for
> > delimiting sections of binhex.
> 
> what about searching for a space.  Binhex doesn't allow spaces, and

Bravo, Bill!  I tried this with the rather large posting that came over
comp.binaries.mac in eight parts (title below):

>Newsgroups: comp.binaries.mac
>Subject: Help Data Base for Macintosh Programmers (Manual part 1 of 8)

Here's what I did (this really works!).

a)  Save the eight parts of the manual in macman.[1-8] in a new 
    directory
b)  shell commands:

   set filec
   cat macman.1 macman.2 macman.3 macman.4 macman.5 macman.6 macman.7 \
      macman.8 | sed -n -e '/ /d' -e '/^---$/d' -e '/./p' > macman
   cat $HOME/thisfile macman > Macman
   xbin Macman 
   ls
   unpit Manual.pit.data
   macput ...

In summation:

i)     Granted, this took five or six tries to get it right, but that's
       still less CPU time than loading everything into a screen editor.
       (And much less user time.)
ii)    One step could have been eliminated, but I didn't have my sed
       manual handy, and don't know how to insert lines (yet).  The file 
       "thisfile" contains "(This file...)" and a blank line.
iii)   Use any delimiter you want, but please put spaces in it!
iv)    A "post mortem" on the file "macman" revealed nothing but
       raw binhex code, as far as I could tell.

Try it!  This really works!

Earle "I love sed" Horton

-- 
*********************************************************************
*Earle R. Horton, H.B. 8000, Dartmouth College, Hanover, NH 03755   *
*********************************************************************

dudek@utai.UUCP (06/26/87)

>Could we adopt a new standard of '::: end of part whatever :::' for
>delimiting sections of binhex.
>
>Now someone is going to say "why don't you just use the spiffy program
>posted a while back that puts binhex sections together."  Well I would,
>if it was based on the delimiting concept mentioned above.

   The combining program I posted a while back didn't require
special delimiters to work properly.  I'd be surprised if the "spiffy 
programs" subsequent to mine were any dumber.  Since you know what 
the line length of the binhex fragments is by the time you get to 
the end of the first part, you can spot the beginning of the next 
part very reliably.  (There are other heuristics that can also be 
used.)
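
A line-length heuristic of this kind might look roughly like the
sketch below.  It is an illustration only, not the logic of the
combining program mentioned above; the function name join_by_length
and the exact rules are assumptions.

```sh
# Hypothetical filter: learn the line width from the first line that
# starts with ":", then keep only whitespace-free lines of that width,
# plus one shorter closing line that ends with ":".
join_by_length() {
    cat "$@" | awk '
        width == 0 && /^:/ { width = length($0) }  # first data line
        width == 0         { next }                # still in the header
        finished           { next }                # closing ":" already seen
        /[ \t]/            { next }                # data has no whitespace
        length($0) == width { print; if (/:$/) finished = 1; next }
        /:$/ && length($0) < width { print; finished = 1 }
    '
}
```

Being a heuristic, it can be fooled: a stray header line with no
spaces and exactly the data width would slip through.  The output also
omits the "(This file ...)" line, which would have to be put back
before running xbin.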

  Greg Dudek
-- 
Dept. of Computer Science (vision group)    University of Toronto
Usenet:	{linus, ihnp4, allegra, decvax, floyd}!utcsri!dudek
CSNET:	dudek@ai.toronto.edu 		ARPA: dudek%ai.toronto.edu@csnet-relay
DELPHI: GDUDEK
Paper mail: DCS, 10 King's College Circle, Toronto, Canada 

rond@zaphod.UUCP (Ronald James Domes) (06/29/87)

I uploaded the program from the net a while ago that took a file and stripped
	everything out of it that is not part of the actual binhex code. What
	it appears to do is look for the header line and strip everything that
	is before it. If a line begins with '---' and that is all that is in
	the line, then it begins stripping again until the next occurrence of
	it. Sooo, what I do with multi-part binhex documents is

		cat MacMan.?.b |bhcomb>MacMan.xmt

		WHERE
			the MacMan files are MacMan.1.b thru MacMan.6.b
			and the output is MacMan.xmt

If you create an alias for this all you have to do is

		bh MacMan.?.b MacMan.xmt

I find it works great, except for one very livable bug. If the ending colon
	(which is how it terminates the capture) is at the end of a line it
	will report that there is no end colon.

I have found this program to be the most useful tool for uploading from the
	net. My compliments to the author!

-- 

						Thanks
						 Ronald James Domes
						 Develcon Electronics Ltd.

loucks@intvax.UUCP (Cliff Loucks) (07/01/87)

in article <3977@utai.UUCP>, dudek@utai.UUCP says:
> 
>>Could we adopt a new standard of '::: end of part whatever :::' for
>>delimiting sections of binhex.
>>
Following Earle R. Horton's suggestion, I now use a shell script to
strip all header/trailer lines from the already concatenated file.
The script is simply:

cat $1 | sed -n -e '/ /d' -e '/	/d' -e '/^---$/d' -e '/./p' > $1.hqx
rm $1
                     ^         ^           ^           ^
                     remove    remove      remove      pass all
                     lines     lines       lines       other lines
                     with a    with a      containing  through to
                     space     tab         only "---"  the output
                                                       file

This is the same as Earle's, except for the deletion of lines which
have a tab in them also.  I've only used this since last week but
it has worked so far.  After running this script, I then download
the file, with kermit (text), and unbinhex (hexbin?) the file on
the Mac.
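
For reference, the same filter can be written without a literal tab
character in the script, which is easy to lose when the text passes
through mail or news.  This is a sketch under assumptions: the function
name strip_to_hqx is invented, and unlike the script above it leaves
the input file in place.

```sh
# Hypothetical variant of the script above: same four sed tests, with
# the file name quoted and the tab produced by printf.
strip_to_hqx() {
    tab=$(printf '\t')
    sed -n -e '/ /d' -e "/$tab/d" -e '/^---$/d' -e '/./p' "$1" > "$1.hqx"
}
```

"strip_to_hqx macman" would write macman.hqx next to the original.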

-- 
Cliff Loucks   ucbvax!unmvax!sandia!intvax!loucks
Sandia National Laboratories, Albuquerque, New Mexico

earleh@dartvax.UUCP (Earle R. Horton) (07/04/87)

In article <278@intvax.UUCP>, loucks@intvax.UUCP (Cliff Loucks) writes:
> cat $1 | sed -n -e '/ /d' -e '/	/d' -e '/^---$/d' -e '/./p' > $1.hqx
> rm $1
>                      ^         ^                 ^           ^
>                      remove    remove            remove      pass all
>                      lines     lines             lines       other lines
>                      with a    with a            containing  through to
>                      space     tab               only "---"  the output

cat $HOME/thisfile $* |\
         sed -n -e 1,2p -e '/ /d' -e '/	/d' -e '/^---$/d' -e '/./p' > $1.hqx

contents of $HOME/thisfile:
(This file...)

--EOF--

The trick is to do all the files at once, and still get the crucial
"(This file...)", while using the minimal number of programs/CPU time.
"$*" allows multiple files to be input after "thisfile", and "-e 1,2p"
prevents the header lines from being stripped out.  I never use
binhex, but the header lines are required for xbin.

I would like to see someone accomplish this task with
   (a)   one UNIX or VMS program in a shell script  (100 points)
   or
   (b)   no file "thisfile"  (40 points)
   or
   (c)   an alias  (30 points)

Only pre-existing programs which come with the distribution tapes
are allowed in (a).  Only single line shell scripts or aliases are
allowed, although continuation by "\" to improve readability is OK.
Winners in all categories will be invited to the annual Horton
Barbecue and Potluck, at my house later this month.
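
One possible shape for an entry in categories (a) and (b), offered as a
sketch rather than a claimed winner: it assumes awk counts as a
pre-existing program, the function name hqx_join is made up, and the
header text is the BinHex 4.0 line spelled out in full.  It is laid
out over several lines here only for readability.

```sh
# Hypothetical single-program filter: awk prints the header and the
# blank line itself, then applies the same tests as the sed version.
hqx_join() {
    awk 'BEGIN { print "(This file must be converted with BinHex 4.0)"; print "" }
         /[ \t]/ { next }   # drop lines containing a space or a tab
         /^---$/ { next }   # drop bare "---" separator lines
         /./     { print }  # pass every remaining non-empty line
    ' "$@"
}
```

Usage would be along the lines of "hqx_join macman.? > macman.hqx".
The "(This file ...)" lines inside the parts contain spaces, so they
are dropped and only the one printed by BEGIN survives.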

(I prefer to do the "rm" by hand.)

Earle
-- 
*********************************************************************
*Earle R. Horton, H.B. 8000, Dartmouth College, Hanover, NH 03755   *
*********************************************************************

straka@ihlpf.ATT.COM (Straka) (07/06/87)

I couldn't resist: All of this discussion on a way to do the job in a
very UNrobust fashion.  I offer (again, and even slightly revised) my
solution to the combining problem: bhcomb.c.  Anyone with a C compiler
and stdio can use it.  And it's reasonably robust, too.

Try it; you'll like it.  Totally automated.  Gives diagnostics.
Output works with xbin, macput, etc.  Free support from the author. :-)

Yes, I know this probably belongs in comp.sources.mac, BUT:

In article <6582@dartvax.UUCP> earleh@dartvax.UUCP (Earle R. Horton) writes:
>In article <278@intvax.UUCP>, loucks@intvax.UUCP (Cliff Loucks) writes:
>> cat $1 | sed -n -e '/ /d' -e '/	/d' -e '/^---$/d' -e '/./p' > $1.hqx
>> rm $1
>cat $HOME/thisfile $* |\
>         sed -n -e 1,2p -e '/ /d' -e '/	/d' -e '/^---$/d' -e '/./p' > $1.hqx
>The trick is to do all the files at once, and still get the crucial
>"(This file...)", while using the minimal number of programs/CPU time.

snip--- snip--- snip--- snip--- snip--- snip--- snip--- snip--- snip--- snip---

/*	bhcomb.c: combine and strip header information from BinHexed files.
	          for Macintosh file transfer.
	Author: R. J. Straka
	Revision 1.1
	Date: July 6, 1987

	Bhcomb is a program that takes a BinHexed Macintosh file that
	has been broken into several pieces to avoid electronic mailer
	handling problems and splices them back together again.  Bhcomb
	does this process in a totally automated fashion (when
	accompanied by an appropriate shell script), and attempts to be
	fairly rigorous by:

	1) Looking for the logical start of the file (delimited by the
	   string: "(This file ...)")
	2) Checking each line of the input for proper length and validity
	   of all characters.
	3) Looking for the logical end of the file.

	Bhcomb was developed under UNIX SVR2, and uses stdin, stdout
	   and stderr exclusively:
		Stdin is used for the input.
		Stdout is used for the valid file output.
		Stderr is used for the garbage and diagnostics.

	After bhcombing, the user would typically use xmodem (or
	  similar) or macput on the resulting file.

	Bhcomb assumes the following BinHex file structure:

	several lines of unrelated header
	(This file must be converted with BinHex 4.0)
	:123456789012345678901234567890123456789012345678901234567890123
	1234567890123456789012345678901234567890123456789012345678901234
	.
	.
	.
	1234567890123456789012345678901234567890123456789012345678901234
	1234...4321:
	several lines of unrelated footer

	Additional unrelated headers and/or footers may be present
	   within the data stream.
	The actual data is prepended with "(This file... BinHex 4.0)"
	   and an extra blank line.
	The actual data begins with a framing ":" (not checked)
	The actual data must end with a framing ":"
	All data lines (except potentially the last) are of the same
	   length (default=64).
	The last data line is of random length, and ends with a ":".
	Certain characters are never seen within the BinHex portion:
		nothing < \012
		nothing > \012, yet < \040
		no spaces
		no . / ; < = > ? O g n o s t u v w x y z { } characters
		no | ~ \ ] ^ _ characters
		nothing > \176

	Data is gathered through stdin.
	Good data is sent to stdout.
	Bad data and diagnostics are sent to stderr.

	A shell line (or procedure) of the following form is recommended:

	   bhcomb <foo?.net >foo 2>foo.doc || echo ^G bhcomb Failed!

	Where the input filenames are foo1.net, foo2.net, ...  The shell
	   should put the files in the proper order given proper naming
	   convention by the files' creator.

	BUGS:
		More than one BinHexed file per invocation ignores all
		  but the first BinHexed file.
		Does not check for additional ":"s inside of the valid
		  portion of the data.
		Has no way to check for files in inappropriate order
		  (except for the first and last)
		Could be made more efficient by being table driven.
		No manual page.  (You can tell I don't write
		  applications code for a living.)
	
	Revision Notes:
		1.0:	Original Release
		1.1:	Now recognizes last line of exactly LENGTH chars
			  without complaining.
			Minor check added for out of sequence input files.
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define	LENGTH	64			/* LENGTH = default BinHex line
					     length = 64
					*/
main(argc,argv)
{
int valid=0, started=0, lth;
					/* started = "we have started
					     collecting valid BinHex data"
					   valid = "the last line encountered
					     was a valid BinHex line"
					   lth = line length
					*/
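char inline[256];
while (fgets (inline, sizeof inline, stdin) != NULL)
	{
	inline[strcspn (inline, "\n")] = '\0';	/* drop the newline fgets keeps */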
	if (strncmp (inline,"(This file must be converted with BinHex 4.0)",45)==0)
		{
		if (started != 0)	/* Have we already started? */
					/* If so, something is wrong! */
			{
			started=0;	/* Unused hook for multiple files */
			fprintf(stderr,"%s\n",inline);		  /* Print it */
			fprintf(stderr,"More than one BinHex file!\n");/*ERROR*/
			exit (1);
			}
		else
			{
			printf("%s\n",inline);
			fgets (inline, sizeof inline, stdin);	/* read dummy blank line */
			printf("\n");		/* put out dummy blank line */
			valid=1;		/* This line of data is valid */
			started=1;		/* We started data gathering
			*/
			}
		}
	else
		{
		lth=strlen (inline);
		if (badchars (inline,lth) != 0)	/* Do we have illegal chars? */
			{
			fprintf(stderr,"%s\n",inline);	/* Put to stderr */
			valid=0;			/* Line not valid */
			}
		else
			{				/* All chars OK */
			if (strlen (inline) != LENGTH)	/* if bad line length */
				{
				if (valid!=1)	/*not expecting last line with :*/
					{
					fprintf(stderr,"%s\n",inline); /*bad line*/
					valid=0;	/* Line not valid */
					}
				else			/*expecting last line with : */
					{
					if (findcolon (inline) == 0)
					/* if colon at end of line */
						{
						printf("%s\n",inline); /* last line */
						started=2;	/* FINISHED */
						exit (0);    /* NORMAL EXIT */
						}
					else
						{
						fprintf(stderr,"%s\n",inline);
						valid=0;	/* bad line */
						}
					}
				}
			else
				{
				if (started != 1)
					{
					fprintf(stderr,"%s\n",inline);	/* Print it */
					fprintf(stderr,"No beginning BinHex message; files may be out of order.\n");  /*ERROR*/
					fprintf(stderr,"Out of phase, get help. :-)\n");  /*ERROR*/
					exit (1);
					}
				else
					{
					printf("%s\n",inline);	/* Good line */
					valid=1;
					if (findcolon (inline) == 0)
					/* if colon at end of
					   this 64 character line */
						{
						started=2;	/* FINISHED */
						exit (0);    /* NORMAL EXIT */
						}
					}
				}
			}
		}
	}
fprintf(stderr,"Improper EOF; no ending colon!\n");  /* should never get here */
exit (2);
}

badchars(lptr,length)			/* Look for illegal characters */
char *lptr;
int length;
{
int badchar, p;
char c;
c='a';
badchar=0;
for (p=0;p<length;p++)
	{
	c=lptr[p];
	if (c < '\n')            {badchar=1; break;}
	if (c > '\n' && c < '!') {badchar=1; break;}
	if (c > '-' && c < '0')  {badchar=1; break;}	/* no . / */
	if (c > ':' && c < '@')  {badchar=1; break;}	/* no ; < = > ? */
	if (c == 'O')            {badchar=1; break;}
	if (c > '[' && c < '`')  {badchar=1; break;}	/* no \ ] ^ _ */
	if (c == 'g')            {badchar=1; break;}
	if (c > 'm' && c < 'p')  {badchar=1; break;}	/* no n o */
	if (c >  'r')            {badchar=1; break;}	/* no s and above */
	}
return (badchar);
}

findcolon(lptr)			/* Look for : at end of line */
char *lptr;
{
int p;
p=strlen(lptr);
while (p>0 && lptr[p-1]=='\n') p--;	/* skip any trailing \n_s */
if (p>0 && lptr[p-1]==':')
	{
	return (0);
	}
else
	{
	return (1);
	}
}

-- 
Rich Straka     ihnp4!ihlpf!straka

Advice for the day: "MSDOS - just say no."