[comp.sources.misc] v07i001: ABE bullet-proof ascii encoder, part 1 of 2

allbery@uunet.UU.NET (Brandon S. Allbery - comp.sources.misc) (06/04/89)
Posting-number: Volume 7, Issue 1
Submitted-by: brad@looking.on.ca (Brad Templeton)
Archive-name: abe/part01

: This is a shar archive.	Extract with sh, not csh.
: The rest of this file will extract:
: read.me abeformat Makefile abe.h abe.1 abe.c dabe.1 dabe.c tdABE1 tdABE2 tdUUENCODE
echo Extracting read.me
sed 's/^X//' > read.me << 'E-O-F'
X
X	ABE Ascii-Binary Encoding System by B. Templeton
X
XABE is a replacement for uuencode/uudecode designed to deal with all
Xthe typical problems of USENET transmission, along with those of other
Xmedia.
X
XAdvantages are:
X	Files are often smaller, and compress well.
X
X	All printable characters map to themselves, so strings in
X		binaries are readable right in the encoding.
X
X	All lines are indexed, so sort(1) can repair any random
X		scrambling of lines or files. (This can be turned off.)
X
X	Extraneous lines (news headers, comments, signatures etc.) are
X		ignored, even in the middle of encodings.
X
X	A PD tiny decoder is available to include with files for first
X		time users.
X
X	Files can be split up automatically into equal sized blocks.
X
X	Blocks can contain redundant information so that the decoder
X		can handle blocks in any order, even with reposted duplicates
X		and extraneous articles.
X
X	Files with blank regions can be constructed from multi-part encodings
X		with damaged blocks.
X
X	Multiple files can be placed in one encoding.
X
X	The decoder is extremely general and configurable, and supports many
X	features not currently found in the encoder, but which other encoder
X	writers might find useful.
X
XIn general, a redundant ABE encoding posted to a typical newsgroup over a
Xcertain article region can be decoded with something as simple as:
X	
X	dabe /usr/spool/news/comp/binaries/group/3[45]?
X
XIt doesn't matter much if there are postings in a random order,
Xduplicate postings, or inserted articles on other topics, i.e. exactly
Xthe things that are a pain about USENET (or mail) binaries.
X(You can usually run dabe right on your entire mailbox.)
X
X
XThe ABE encoder (and decoder) supports 3 different encoding formats.  The
Xfirst uses all 94 printable ASCII characters, the second avoids characters
Xthat have trouble in ASCII-EBCDIC translations, and the third is the UUENCODE
Xformat.  (ABE can make files decodable by a typical uudecode program.)
X
X-----------------
X
XTo build, unpack these files in a directory.  Move the tiny decoders,
Xtd*, to some official place, and edit the Makefile to indicate
Xwhere you put them.  (Example: /usr/lib/td%s)  Check the Makefile
Xand defines file (abe.h) for any system dependencies, then run make.
X
XInstall the resulting abe and dabe programs in some bin directory.
XInstall abe.1 and dabe.1 in appropriate man directories.  Leave the file
Xformat description around for people to read.  (Possibly modify abe.1
Xto indicate where the file format file is, and where the tiny decoders are.)
X
XThe main portability concern involves my use of arbitrary args to
Xfprintf in the various warning and error message routines in abe.c and
Xdabe.c.  Watch for these.
E-O-F
echo Extracting abeformat
sed 's/^X//' > abeformat << 'E-O-F'
X
XABE File Format:
X
XABE1 Format:
X94 Printable characters from "!" to "~" are used.  Space and TAB are not.
XThe 86 characters from "%" to "z" are used to represent bytes on data lines.
X
XABE2 Format: (and line number format)
X84 printable characters are used.  The 64 character set:
X"./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
Xrepresents data bytes.  It is also used to form line numbers and checksum
Xbytes on lines.  Characters from the sets: +,-, "#$%&'()* and
X:;<=>?@_ are used to shift between sets.
X
XThe ABE2 system avoids the characters in the following set:
X	! ` [ \ ] ^ { | } ~
Xas they have been reported to fail reliable translation between ASCII
Xand EBCDIC on IBM machines.  (!) fails due to a bug in "dd" on some unix
Xmachines.
X
XAll ABE lines begin with a 3 character line "number".  This is formed from
Xthe 64 ABE2 printables, in base 64, where '.' represents 0.  The first
Xbyte is special, in that 'T' represents 0, rather than '.'.  (This avoids
Xlines that start with dots, or with "From".)
X
XThe 4th byte in any ABE line is a checksum of the remaining bytes in the
Xline.  The bytes are summed (as their ASCII values) and the result is taken
Xmod 64.  (The newline is not included.)  The checksum character is formed
Xby indexing into the ABE2 64 character set.
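X
XAs a rough sketch (not taken from abe.c), the line checksum described above
Xcould be computed as follows.  The names abe2_set and abe_line_checksum are
Xmade up for this illustration, and the caller is assumed to pass only the
Xbytes that follow the checksum position, without the newline.
X
```c
#include <stddef.h>

/* The 64-character ABE2 set in index order, so '.' is 0 and 'z' is 63. */
static const char abe2_set[] =
    "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/* Hypothetical helper: return the checksum character for the bytes that
 * follow the checksum position on a line.  `rest` excludes the newline. */
char abe_line_checksum(const char *rest)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; rest[i] != '\0'; i++)
        sum += (unsigned char)rest[i];  /* sum of the ASCII values */

    return abe2_set[sum % 64];          /* index into the ABE2 set */
}
```
X
XFor example, if the remaining bytes are "AB", the sum is 65+66 = 131, and
X131 mod 64 is 3, which indexes '1' in the set.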
X
XThe remaining bytes can be data or header items.  All header items begin
Xwith two identical characters that are NOT in the main character map --
Xie. chosen from the shifting set.
X
XIf a line starts with two different characters, or two identical characters
Xthat are not header characters, it is a data line.
X
XThe following header-lines are defined:
X
XMAIN_HEAD	'##S'		Start of File
XSUB_HEAD	'$$'		General Header with English keyword
XCODE_HEAD	'""'		Character map line
XMAIN_END	'##E'		End of file
X
XMost header lines use the SUB_HEAD and an English keyword.
XNote that the three header chars '#', '$' and '"' can't be used as
Xmapping characters, unless it is assured that no line will start with two
Xof them.  They should always be used as shifting characters.
X
X(An undocumented option to dabe, h=string, redefines the three header characters
Xin case of future problems.  The default is h=#$", as shown above.  If you
Xmake an encoding with different header chars, you would have to inform
Xdabe users to provide the correct option.)
X
XData lines are streams of printable characters, all 65 to 68 characters
Xlong (plus line number and checksum.)  Bytes are represented with three
Xsets of printable characters.  A table is built defining which real
Xbyte is meant by each printable character in the sets.
X
XBy default, we assume we are in set 0, the most common set.  If a
Xprintable character is encountered in the mapping set, we simply look it
Xup in the table for set 0, and output the proper full byte.
X
XSpecial escape characters, defined below, cause a temporary
Xshift into sets 1 and 2, or combinations of those sets.  After
Xencountering an escape character, we temporarily shift set for the next
X1 to 3 characters.  Shifts never cross a line break -- we always go back
Xto set 0 with each fresh line.  The shift character meanings are defined
Xlater.
X
X
XMAIN_HEAD:	##tver,fver,ever,style
X
XThe first three values are decimal numbers, the style is a string.
X	
X	tver -	Earliest version of the tiny dabe decoder that can
X		decode this file.
X
X	fver -	Version number of ABE encoder that encoded this file.
X
X	ever -	Earliest full ABE decoder that can decode this file
X
X	style - ident, up to 8 digits or upper case letters long, of the
X		encoding style. Currently ABE1, ABE2, UUENCODE or TEXT.
X
XCODE_HEAD:	""<which32> 8(<enc1><enc2><enc3><enc4><sets>)   (ABE1)
X		""<which32> 16(<enc1><enc2><sets>)		(ABE2)
X
XThese lines define the printable character encodings used in these files.
XEach line gives the encoding for a set of 32 bytes, and there will be 8
Xsuch lines.  The first character in these lines, <which32> will be the
Xcharacter representing a number from 0 to 7 in the mapping set indicating
Xwhich of the blocks of 32 bytes this line defines.
X
XABE1:
XThis is followed by eight sets of five characters each.
XThe sets of 5 characters consist of 4 characters that are the printable
Xcharacters that will be used to encode the byte in question.  The 5th character
Xindicates which of the three sets each printable character resides in, for
Xall four bytes.   Thus the first set of five characters from CODE_HEAD 0
Xdefine, for the bytes 0, 1, 2 and 3, which printable characters in which
Xset will represent them.   The <sets> byte defines the set for each of the
X4 bytes.  It is a number from 0 to 80, where '%' represents 0, as always in ABE1.
XExpress the number as a 4 digit number in base 3 to get the 4 sets.  The
Xfirst (most) significant digit (or <sets>/27) gives the set of the first
Xof the 4 bytes.  The last (least) significant digit (or <sets> % 3) gives
Xthe set of the last of the 4 bytes, and so on.
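X
XThe base 3 split described above might look like this in C.  This is only
Xa sketch: the function name abe1_split_sets and its error convention are
Xassumptions made for the example, not part of abe.c or dabe.c.
X
```c
/* Hypothetical helper: split an ABE1 <sets> character into the set
 * numbers (0, 1 or 2) of the four bytes it describes.
 * Returns 0 on success, -1 if the character is outside 0..80. */
int abe1_split_sets(char sets_char, int out[4])
{
    int n = sets_char - '%';    /* '%' represents 0 in ABE1 */

    if (n < 0 || n > 80)
        return -1;

    out[0] = n / 27;            /* most significant base 3 digit */
    out[1] = (n / 9) % 3;
    out[2] = (n / 3) % 3;
    out[3] = n % 3;             /* least significant base 3 digit */
    return 0;
}
```
X
XFor instance, 'S' is 46 places after '%', and 46 is 1201 in base 3, so the
Xfour bytes would live in sets 1, 2, 0 and 1.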
X
XABE2:
XThis is followed by 16 sets of 3 characters each.  The third character
Xindicates which of the 4 sets each printable character resides in, for
Xboth bytes.  It is thus the same as the ABE1 encoding, except the ABE2
Xmapping set is used, and the set byte is decoded as <sets>/4 for the first
Xbyte and <sets>%4 for the second byte.
X
X
XMAIN_END:	##Elongint
X
X	This line closes off the file.  It stops the tinydabe
X	decoder from reading further, and includes the file checksum (longint).
X	The regular DABE decoder can expect data after this line in a non
X	sorted series of random blocks, or a multi-file encoding.
X	The checksum is that of all printable characters representing
X	data bytes in the file, mod 65536.  Only bytes from data
X	lines are used in this checksum, and the sum is made of
X	the printable characters in the data lines, not the actual
X	bytes in the output file.
X
X	Like all other checksums, the line number, checksum and newline
X	bytes are not included in the sum.
X
XSUB_HEAD	$$keyword=value
X
X	Many sub-header lines are defined, and more are possible in the
X	future.  Keywords are alphanumeric, and the case of the letters
X	is unimportant.  Values can be any string of printable, non-blank
X	characters.   While blanks could be used in values, they are
X	advised against.
X
X	Here are the sub-headers:
X
X	startblock=blocknum,seekaddr,earlyver,filename
X
X		blocknum - Decimal integer, the index number of this block, from
X			0 to N-1.
X		seekaddr - Decimal long integer, the seek address into the file
X			where the block should be written.
X		earlyver - Earliest decoder that can decode the file.
X		filename - Ascii string, the universal filename of the file.
X			(See below and the man pages for a discussion of
X			universal file names.)
X
X		This record begins a new block.  It usually is found at the
X		start of an independent file, although multiple blocks can
X		exist in a file.
X
X	closeblock=blocknum,block_checksum,bytecount,blockcrc
X
X		blocknum - Decimal integer, the index number of this block, from
X		   0 to N-1.
X		block_checksum - Decimal long integer, the checksum of the
X		   printable characters in the block (not including the
X		   CLOSE_HEAD line) after the 4th (checksum) byte of every
X		   valid data or header line in the block.  This checksum is
X		   presented modulo 65536.
X		bytecount - The number of data bytes that should have been
X		   present in the block.  (Not printable characters, but
X		   actual output data bytes.)
X		blockcrc - An unsigned long int, the 32 bit CRC of the block.
X	style=string
X		Sets the encoding style for this block.  (For now, used
X		only in redundant block encoding, as the main file header
X		sets the encoding style normally.)  The ident can be up
X		to 8 upper case alphanumerics long.  Currently defined
X		are ABE1, ABE2, UUENCODE and TEXT.
X
X	os=string
X		Defines the operating system that encoding took place on.
X		Currently defined values are "unix" and "msdos",
X		but anything can be used.  The two OSs must match if
X		full file pathnames and machine independent forms of
X		file information are to be used.
X
X	blocking=true|false
X		The value may be either "true" or "false."  This indicates
X		whether this file will be split into blocks or not.
X
X	fname=string
X		Defines the file's true filename, to a limit of 60 characters.
X		The true filename is only used when decoding on the same OS
X		as the encoding was made on.  This field is optional.
X	uname=string
X		Defines the short universal name.  Universal names should
X		include no directory characters and must be 12 characters
X		in length or less.  While no other rules are enforced, it
X		is advised that universal names be limited to alphanumerics
X		and the dot (.) character, and that there be no more than
X		one dot in a universal name, and that there be no more than
X		3 significant characters after the dot.
X
X		Decoding programs must ensure that universal names conform
X		to the rules of their operating system.
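X
X		A decoder could test the two hard rules above (length and
X		directory characters) with something like the sketch below.
X		The names uname_ok and UNAME_LIMIT are invented for this
X		example, and rejecting an empty name is an extra assumption
X		that the text above does not state.
X
```c
#include <string.h>

#define UNAME_LIMIT 12  /* length limit stated in the uname description */

/* Hypothetical helper: return 1 if `name` is at most UNAME_LIMIT
 * characters long and contains none of the characters in `dirchars`
 * (for example "/" on unix or "/\\:" on MS-DOS), otherwise 0.
 * Empty names are rejected as well (an assumption of this sketch). */
int uname_ok(const char *name, const char *dirchars)
{
    size_t len = strlen(name);

    if (len == 0 || len > UNAME_LIMIT)
        return 0;

    return strpbrk(name, dirchars) == NULL;
}
```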
X
X	owner=string
X		The name of the user who owns the file.  Currently optional
X		and unused.
X
X	total-blocks=longint
X		In a blocked file, the argument will be a decimal integer
X		indicating the total number of blocks the file was split
X		into.  If the number is "10", then blocks 0 through 9 should
X		be found.  This field is not found on non-blocked files.
X
X	end_file=string
X		The string will be the file's universal name, once again,
X		just to be redundant.  This is the sub-header version of
X		the MAIN_END header line.
X	date=longint
X		The modification date/time for the file, expressed as the
X		number of seconds since 00:00:00 GMT, January 1, 1970.
X		This is the unix epoch, and unix systems will be able to
X		use this number directly.  Other systems must convert if
X		they wish to use this number.  This field is optional.
X
X	perm=int
X		Specifies access permissions for the file.  Only the
X		lower 3 bits (perms & 7) are OS independent.  The rest
X		of the number is OS-dependent.  Of the lower 3 bits,
X		bit 0 (lsb) indicates general execute permission, bit
X		1 indicates general write permission and bit 2 indicates
X		general read permission.
X
X		On unix systems, this number will be the file's "mode."
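X
X		As a small illustration, the OS independent part of the
X		value can be picked out with bit masks.  The macro and
X		function names here are hypothetical and do not come from
X		abe.h.
X
```c
/* Hypothetical names for the three OS independent permission bits. */
#define ABE_PERM_EXEC  1    /* bit 0: general execute permission */
#define ABE_PERM_WRITE 2    /* bit 1: general write permission */
#define ABE_PERM_READ  4    /* bit 2: general read permission */

/* Keep only the portable lower 3 bits of a perm= value. */
int abe_portable_perm(int perm)
{
    return perm & 7;
}
```
X
X		A unix mode of octal 755 would give 5 here, that is general
X		read and execute permission but no general write permission.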
X
X	size=longint
X		The size, in bytes, of the file.  This field is optional.
X	filecount=int
X		Number of files in this encoding.  Currently present but
X		unused.
X	linenumbers=true|false
X		Indicates whether line numbers are present in this block,
X		after this line.  If set to false, further lines until
X		the end of the block need no line numbers.  The need for
X		line numbers will resume in the next block or file.
X		Thus this must appear in every block if line numbers are
X		not to be used.
X	filecrc32=unsigned long int
X		This gives the 32 bit CRC for the entire file.
X		The decoder only checks this value if the file was
X		not blocked.  Blocked files rely on the block CRCs, as
X		the file CRC will not be right if the blocks come in
X		a random order.
X
XThe following sub-headings are understood by the decoder, but not
Xcurrently used by the encoder.  They make the decoder more general.  All
Xencodings start off as a standard encoding (ABE1, ABE2, UUENCODE, TEXT) but
Xthese headers can change the parameters.
X
X	numsets=int
X		The number of character sets.  Numbers from 1 to 6 are
X		valid.  ABE1 defaults to 3, ABE2 to 4, UUENCODE to 0.
X	setgroup=int
X		Defines the number of characters per set group in a
X		CODE_HEAD line.  This is 4 for ABE1 and 2 for ABE2.  The
X		number must be 1, 2 or 4.  This controls how CODE_HEAD
X		lines are decoded.
X	prints1=string
X		Defines the mapping character set -- the N characters which
X		are used to represent characters, but not the shifts.
X		This either defines the entire set, or the first 48 characters
X		of it.
X	prints48=string
X		Defines the rest of the mapping character set if it is more
X		than 48 bytes long.  Thus a set up to 96 bytes long can be
X		defined.   prints48 MUST come after prints1, or the
X		prints48 will be ignored.
X
XThe following headings define the shift characters for an ABE encoding.
XThere are 4 types of shift characters that can be defined, and the number
Xof shift characters per type depends on the number of sets.  If a shift
Xcharacter is not to be defined, use "0" in its place.  Thus "0" can not be
Xa shift character in an ABE encoding.
X
X	xshifts=string
X		Define the (sets-1) shift characters that encode a single
X		byte temporary shift to a given set.  The first char is
X		the shift to set 1, the second is the shift to set 2, etc.
X		There is no shift to set 0, as that's the default.
X	xxshifts=string
X		Define the (sets-1)^2 shift characters that encode a
X		double byte temporary shift to two arbitrary non-zero sets.
X		In ABE1, for example, the first character means shift
X		to set 1, and set 1 again.   The second is set 1, set 2.
X		The third is set 2, set 1, the fourth and last is set 2, set 2.
X	xcxshifts=string
X		Define the (sets-1)^2 shift characters that encode a
X		triple byte temporary shift to one arbitrary set, the
X		default set (0) and another arbitrary set.  The same
X		system is used as for xxshifts, except a return to the
X		default set (normally 0) is in the middle.  You often
X		don't have room for all of these, so some will be marked
X		undefined by using 0 in that place.
X	runlength=char
X		Defines the character as a 'run length' character.  This
X		character takes the mappable character after it, gets the
X		character index of it and adds 1 to get a count N.  N
X		repetitions of the last unshifted or single-shifted
X		character will be placed on the output file.  Note that
X		this only repeats the last character from set 0 or the
X		result of an xshift single-byte shift.  Note as well
X		that since the character has already appeared in the
X		output, you get N+1 of it.  (C+2, where C is the index
X		number of the mappable character you used for the count.)
X
X		This is not currently used by the encoder.  In fact,
X		compression of this sort of thing should really be left
X		up to compression programs.  But it could sometimes be
X		of use in ABE encodings, so it's here.
X	changeset=string
X		Define the (sets) shift characters that encode a
X		permanent change (for the rest of the line) in the
X		default set.   In all current ABE encodings, the default
X		set is always 0, but a future encoder might use this system.
X		The change is only for the rest of the line.  The default
X		returns to 0 at the start of any line.   The first char
X		in the string will be the shift to set 0 byte, the second
X		char will shift to set 1, and so on.
X	
X
X
X
XSub-headings currently undefined, but possible for further expansion:
X	variant=string
X		OS variant.  Version numbers or things like "SysV" or
X		"BSD" could be placed here.  Few files should contain
X		anything so machine dependent that an OS variant should
X		be needed, but who knows?
X	group=string
X		Group owner of the file
X	link=string
X		This file is just a link to another file in the same
X		encoding.
X	textfile=true|false
X		Indicates whether the file is to be decoded as a text
X		file.  Default is false on this optional field.
X	newline=byte,byte,...
X		Gives the string, as a series of decimal integer byte numbers,
X		that represents a newline.  The decoder should output whatever
X		is a newline on its own system.  For example, a unix system
X		might say newline=10, an MS-DOS system would say newline=13,10.
X
XEscape 'Shifting' Characters for Data Lines (ABE1):
X
X	!	Set 1, Set 1 (2 chars)
X	"	Set 1, Set 2 (2 chars)
X	#	Set 2, Set 1 (2 chars)
X	$	Set 2, Set 2 (2 chars)
X	{	Set 1 (1 char)
X	|	Set 2 (1 char)
X	}	Set 1, Set 0, Set 1 (3 chars)
X	~	Set 1, Set 0, Set 2 (3 chars)
X	( No mappings are defined for 2, 0, 1 and 2, 0, 2 )
X
XEscape 'Shifting' Characters for Data Lines (ABE2):
X	+	Set 1 (1 char)
X	,	Set 2 (1 char)
X	-	Set 3 (1 char)
X
X	"	Set 1, Set 1
X	#	Set 1, Set 2
X	$	Set 1, Set 3
X	%	Set 2, Set 1
X	&	Set 2, Set 2
X	'	Set 2, Set 3
X	(	Set 3, Set 1
X	)	Set 3, Set 2
X	*	Set 3, Set 3
X
X	:	Sets 1, 0, 1
X	;	Sets 1, 0, 2
X	<	Sets 1, 0, 3
X	=	Sets 2, 0, 1
X	>	Sets 2, 0, 2
X	?	Sets 2, 0, 3
X	@	Sets 3, 0, 1
X	_	Sets 3, 0, 2
X
X	(No mapping is defined for 3, 0, 3)
X
XEncoding Style:
X
X	ABE decoders ignore (give a warning for) unknown sub-header
X	keywords, so expansions to the format can add these without
X	necessarily hurting backwards-compatibility.
X
XUUENCODE Format:
X	The UUENCODE format uses no CODE_HEAD lines, and, in the data
X	region, is identical to the basic format used by uuencode(1),
X	with the exception of the presence of 4 bytes of line number
X	and checksum (in ABE2, Base 64 form) on the front of each line.
X	If line-numbers are turned off, the middle of a UUENCODE ABE
X	encoding looks just like a UUENCODE encoding.  UUENCODE files
X	can be blocked, but uudecode programs may not fully understand
X	them in this format.
X
X	Our uuencode method, like most uuencoders, uses the grave
X	accent to represent 0, rather than space, as the original ones
X	did.
XTEXT Format:
X	This format is designed for unix text files.  It maps all
X	characters to themselves, other than the shift character
X	'#' and the newline.  This allows text files to appear almost
X	verbatim, although their lines must not begin with ##, $$ or "",
X	as these will be mistaken for headers.
X
X	Lines in a text encoding will have a newline output after them,
X	unless the shift character '#' appears as the last character on
X	the line.  These special lines that don't get a newline can be
X	used to break long lines into a series of short ones.
X
X	The encoder does not currently make TEXT format files.  Some
X	future encoder may.
X
X	The shift character '#' is a single byte shift.  It can be
X	followed by any of the following bytes which will map to the
X	appropriate useful byte:
X	
X		G	^G
X		H	^H (backspace)
X		r	Carriage return (byte 13)
X		n	Newline (byte 10)
X		E	Escape (byte 27)
X		@	# (the shift char itself)
X		EOL	Suppress newline on this line
X
X	The remainder of the characters in the shifted set all map to
X	themselves.  In particular, '"' and '$' map this way, and shifting
X	can be used to avoid lines starting with "" or $$.  These defaults
X	can be changed with code_map lines in the ABE1 style.
X
X	The generation of a proper TEXT encoder will allow dabe to replace
X	'shar' and other text encodings.
X
XNewlines:
X	While we talk about only using safe printable characters here,
X	one other very special character -- the newline -- is used.
X	ABE files most definitely consist of lines.  Anything that removes
X	all newlines will damage the files.
X
X	Should this occur in the future, the fact that the ABE
X	file formats do not use whitespace can be used, by writing
X	translate utilities that substitute newlines for whitespace
X	and back.  If you find a system that doesn't support newlines
X	or whitespace, I guess this format just won't work.
E-O-F
echo Extracting Makefile
sed 's/^X//' > Makefile << 'E-O-F'
XDECODER=/usr/lib/td%s
X
XDEBFLAGS=
X
X# flags for a typical unix
XCFLAGS = -Dunix -DDECODER=\"$(DECODER)\" $(DEBFLAGS)
X# flags for a typical BSD unix
X#CFLAGS = -Dunix -DDECODER=\"$(DECODER)\" -Dstrchr=index $(DEBFLAGS)
X# flags for an ms-dos compile with MS C or equiv
X#CFLAGS = -Dmsdos -DDECODER=\"$(DECODER)\"
X
Xall:	abe	dabe
X
Xabe: abe.c
X	cc $(CFLAGS) abe.c -o abe
X
X$(DECODER): tinydabe.c
X	cp tinydabe.c $(DECODER)
X
Xdabe: dabe.c
X	cc $(CFLAGS) dabe.c -o dabe
X
Xabe.exe: abe.c
X	cc -dos -Dmsdos -F 2000 abe.c -o abe.exe
X
Xdabe.exe: dabe.c
X	cc -dos -Dmsdos -F 2000 dabe.c -o dabe.exe
X
Xshar:
X	shar read.me abeformat Makefile abe.h abe.1 abe.c dabe.1 dabe.c tdABE1 tdABE2 tdUUENCODE >abe.shar
E-O-F
echo Extracting abe.h
sed 's/^X//' > abe.h << 'E-O-F'
X#define A1NSETS 3			/* number of character sets */
X#define MAX_SETS 5		/* max number of char sets */
X#define A1NPRINTS 86		/* number of printable characters */
X#define MAX_PRINTS 96
X#define TOTAL_PRINTS 128	/* highest ASCII printable possible */
X#define A1FPRINT '%'		/* first printable to be used (ABE1)*/
X#define SFBYTE 31		/* index of first line number byte in
X					line number char set table */
X#define NUM_SAFE 64
X#define A1LPRINT 'z'
X
X#ifndef INB_LEN
X# define INB_LEN 10000		/* input buffer for pre-pass */
X#endif
X#define MAX_LLEN 80		/* max output line len */
X#define FNAMELEN 255		/* max len of file name */
X#define CSUM_MOD 65536l		/* modulus for checksums */
X
Xtypedef char bool;
X
X
X	/* read mode for binary files in this OS */
X#ifdef msdos
X	/* in case of DOS, prepare for binary read mode */
X# define READMODE "rb"
X# define WRITEMODE "wb"
X# include <fcntl.h>
X# include <io.h>
X# define DIRCHARS "/\\:"
X# define OUROS "msdos"
X#else
X# define READMODE "r"
X# define WRITEMODE "w"
X# define DIRCHARS "/"
X# define OUROS "unix"
X#endif
X
X#ifndef DECODER
X#define DECODER "td%s"
X#endif
X
X	/* methods of processing */
X#define ONEPASS 1
X#define TWOPASS 2
X
Xstruct frq {
X	long freq;
X	int bytenum;
X	};
X
X#define LB_LEN 4		/* length of look ahead buffer */
X
X#define MPERLINE 65
X
X#define MAX_UNAME 14		/* max len of universal name */
X#define MAX_FNAME 255		/* maximum file names */
X#define MAX_BLOCKS 255
X#define MAX_COMLEN MAX_LLEN	/* maximum command len */
X#define OUR_VERSION 1000
X#define MAX_STYLE 8		/* max len of a style code */
X
X#define OUR_EOF 256
X
X/* general names for control characters in decoder */
X#define MCHAR 0
X#define SHIFTX 1
X#define SHIFTXX 2
X#define SHIFTXcX 3
X#define CHANGE_SET 4
X#define RUNLENGTH 5
X
X/* for ABE1 format */
X#define SETXX '!'
X#define NEWSET1 '{'
X#define SET10X '}'
X
X/* for ABE2 format */
X#define A2SETXX '"'
X#define A2NEWSET1 '+'
X#define A2X0XSET  ":;<=>?@_"
X	
X
X
X#define TRUE 1
X#define FALSE 0
X
X/* header characters */
X
X#define CODE_HEAD '"'
X#define MAIN_HEAD '#'
X#define SUB_HEAD '$'
X
X#define DEF_BLOCKSIZE 40000L
X
X#define VERNUM 1000
X#define EARLIEST_VERNUM 1000
X#define EARLIEST_SIMD 1000
X
X#define FULL_MAP 255		/* all 8 lines of the character map */
X
Xextern char *malloc();
Xextern char *allocstring();
Xextern FILE *openout();
Xextern char *lineconv();
Xextern long atol();
X
Xstruct fseen_list {
X	char *name_seen;
X	struct fseen_list *next_fs;
X	};
X
X/* codes for return from decoder */
X#define GOOD_FILE 0
X#define NO_DATA 1
X#define UUDECODE -1
X
X/* encoding styles */
X#define UNDEF -1
X#define ABE1 0
X#define ABE2 1
X#define UUENCODE 2
X#define TEXT 3
X/* crc information */
X#define TABSIZE         256            /* no of entries in table */
Xtypedef unsigned long   tcrc;          /* type of crc value */
X/* Gary S. Brown's CRC32 macro */
X#define UPDC32(b, c) (crctab[((int)(c) ^ (b)) & 0xff] ^ (((c) >> 8) & 0x00FFFFFF))
E-O-F
echo Extracting abe.1
sed 's/^X//' > abe.1 << 'E-O-F'
X.TH ABE 1
X.SH NAME
Xabe - Ascii-Binary Encoder
X.SH SYNOPSIS
X.B abe
X[ options ] [filename ...]
X.SH DESCRIPTION
XThe
X.I abe
Xprogram encodes binary files into a bullet-proof form consisting
Xonly of printable ASCII characters.  This new form can be sent through
Xcommunications channels which might get upset at non-printable characters,
Xsuch as USENET news, mail and various text file downloading programs.  ABE
Xfiles should be able to pass through a lot of mechanisms and Operating
XSystems that will kill lesser files.
X.PP
X.I Abe
Xis a replacement for the
X.IR uuencode (1)
Xprogram.   The encodings produced by
X.I abe
Xare usually smaller, more compressible, more readable and far more
Xbullet-proof than those produced by
X.I uuencode.
X.PP
XAll lines in an ABE
Xencoding have a three character line number as well as a checksum.  That
Xmeans that ABE lines may be broken apart, scrambled in a random order,
Xand even have garbage lines inserted into them without damage.  The
X.IR sort (1)
Xprogram (or any other text file sort utility) can always restore an ABE file
Xto its proper state.
X.PP
XABE files can be split into "blocks" when the transport mechanism being
Xused is unable to transfer files longer than a given length.  These blocks
Xcontain checksums, length information and `seek address' information for
Xindependent verification.  With the full
X.I dabe
XABE decoder, it is possible to still decode a file with missing blocks.
XEmpty regions will simply be left undefined in the resulting file.  If
Xredundant decoding information is added to the blocks, they can be presented
Xto the decoder in any order, without sorting, and blocks may even be
Xduplicated.  All this was designed with the typical problems of USENET
Xbinary distribution in mind.
X.PP
XTwo decoders exist.  One is the `tiny' decoder,
X.I tinydabe.c.
XThis is a 100 line, public domain, portable C program which can be included
Xwith ABE files.  Thus any person with a C compiler can decode an ABE file,
Xeven if they have never heard of ABE files before.  It is limited to
Xsingle file encodings of less than 2 megabytes in size.
X.PP
XWith the full ABE decoder,
X.I dabe,
Xmore advanced decoding, with more error checking, is possible.  It is
Xsuggested that the tiny decoder only be used by first time users of the
Xformat, and those who plan more work should endeavour to use the complete
Xdecoder.
X.SH OPTIONS
X(Note that while option names are displayed here in full, only the first
Xletter is actually required.  For +/\- options, using + turns the option on,
Xand using \- turns the option off.)
X.TP
X.B blocksize=num
XRequest that files be split into blocks with an approximate size of
X.I num.
XNote that files will actually be a little bit larger than the requested
Xsize, so choose a number lower than your hard maximum.  Blocks will be
Xput into the single output file unless an output file prefix name is
Xprovided (p=name).
X.TP
X.B prefix=str
XNormally,
X.I abe
Xwrites encodings to the standard output.  This option turns on file blocking,
Xand arranges for each block to go into a different file.  All file names
Xwill start with the prefix
X.I str
Xand will have a 2-digit hexadecimal number at the end.  The default block
Xsize is 40,000 characters, but that may be set with the (b=num) option.
X.TP
X.B prefix=|command
XOn UNIX systems, if the prefix string begins with an or-bar (|), the
Xblocks will actually be piped through a shell process using popen(3).
XThe shell command string passed to popen will be that generated by
Xsprintf(3) with the prefix string (excluding the or-bar) given as the
Xformat string, and the file number given as an integer argument.  For
Xexample, on Unix:
X.ce
Xabe b=25000 file "p=|mail -s 'Part%d' fbaggins"
Xwould mail all the blocks, with titles, to user fbaggins.  Note
Xthat you must quote the whole option, or the or-bar will be taken as
Xa pipe character by the Unix shell.
X.TP
X.B universalname=name
XABE encodings include both the real name of the encoded file and a special
Xuniversal name that is limited to 12 characters and should contain no
Xdirectory characters like slash.  The universal name is used when decoding
Xon an operating system different from the encoder's system.  Universal
Xnames are also used when multiple files are placed in the same encoding.
XIf you don't provide a universal name, one will be formed from the real
Xfile name.  You can only provide your own universal name when encoding
Xa single file.  If no filename is given, a universal name of "stdin" is used.
X.TP
X.B decoder=pathname
XInsert the source to the tiny ABE decoder "tinydabe.c" from the file
Xin
X.I pathname.
X.TP
X.B sample=size
X.I abe
Xcan do either a single pass or a double pass over its input, except when
Xthe input is the standard input, in which case only a single pass is possible.
X.I abe
Xlikes to do two passes so that it can get frequency tables for the bytes in
Xthe input file.  The more accurate the frequency tables, the smaller the
Xencoding.  If two passes are not possible, or you request one-pass operation
Xwith this option,
X.I abe
Xreads in a buffer of size
X.I size
Xand builds the frequency table from that.  The default (for stdin) is
X10,000 bytes.  You can set it as high as the limit for dynamic memory
Xallocation on your system.
X.TP
X.B linenumber=num
XNormally ABE encodings start at line one.  If you wish to concatenate two
Xencodings, you can start your second encoding at a higher line number with
Xthis option.  You give the number in decimal, although it will be output
Xin ABE's special format of 3 printable characters.  Encodings that don't
Xstart at line 1 will be rejected by the tiny dabe decoder.
X.TP
X.B +redundant
XIn a blocked encoding, this option asks that redundant information be added
Xto each block, so that the file may be decoded without sorting
Xby the advanced ABE decoder,
Xeven if blocks are missing, duplicated or in the wrong order.
X.TP
X.B +decoder
XRequest that the source for the tiny ABE decoder be inserted into your
XABE encoding.  The source is to be taken from the standard location
Xdefined by your system administrator.
X.TP
X.B +ebcdic
XUse the ABE2 encoding, which is designed to pass through EBCDIC machines
Xwithout trouble.  The ABE2 encoding does not make use of the following
Xcharacters: "![\\]^`{|}~" -- they have all been reported to sometimes
Xnot survive multiple ASCII<-->EBCDIC translations.  It maps to 4 sets
Xof 64 characters, and produces encodings that are slightly larger.
X.TP
X.B +uuencode
XUse the UUENCODE encoding scheme.  This scheme is totally unlike ABE
Xschemes, but is quite popular and sometimes a bit smaller on compressed
Xbinary files.   UUENCODE lines are formed from the 64 characters from space
Xto underbar, with a simple mapping that maps 4 printable characters to
X3 binary bytes.  Files produced with +uuencode can often be decoded by
Xuudecode(1) decoders after the application of sort and a simple sed(1)
Xscript to remove the first four bytes of each line.  If you use the -numbers
Xoption, the sed script is not even necessary.  The UUENCODE format is
Xmore prone to errors and usually is more bulky than either ABE format.
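X.IP
XThe sort and sed steps mentioned above might look roughly like the
Xfollowing sketch, in which
X.I prog.abe
Xis a hypothetical file holding a UUENCODE style encoding made with line
Xnumbers.  It assumes the local uudecode reads its standard input, and stray
Xcomment lines may still need to be removed by hand:
X.IP
X.nf
Xsort prog.abe | sed 's/^....//' | uudecode
X.fi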
X.TP
X.B -numbers
XRemove line numbers and line checksums from most of the encoding.  The
Xfirst few lines of every block will still have line numbers, but the
Xbulk of the encoding will not.  This saves 4 bytes per line, reducing
Xthe size of encodings by about 6%.   Such encodings can't be decoded by
Xtinydabe decoders, nor can they be sorted.  While they are more prone to
Xpotential errors, most such errors occur between blocks, so removing the
Xline numbers is usually safe.  When used with +uuencode, this option allows
Xencodings that can be decoded both with dabe(1) and uudecode(1).
X.SH OPERATION
X.PP
X.I Abe
Xcan take input in 3 ways.  The first is the standard input, which allows
X.I abe
Xto be used at the end of a pipe.  If the standard input is used,
X.I abe
Xworks in one-pass mode, and only reads the first part of the file to
Xfigure out character mappings.
X.PP
X.I Abe
Xmay also be given a single filename, in which case that file will be
Xencoded.  An alternate output name can be provided with the "universalname="
Xoption.
X.PP
X.I Abe
Xcan also be given multiple files.  The output will be roughly equivalent
Xto the concatenation of single-file ABE encodings, except the line numbers
Xwill continue properly in sequence.  This produces a sort of multi-file
Xarchive, although
X.I abe
Xis not intended to be used as an archiver.   In fact, it is better to use
X.I abe
Xon the output of general non-compressing archivers like tar(1) or cpio(1).
XIt can also be used on compressed archiver output, but generally it's
Xbetter to let the transport mechanism (usually USENET links) worry about
Xdoing the compressing.
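X.PP
XThe three input styles might be used as in the following sketch.  All of
Xthe file and directory names are hypothetical, and the encoding is written
Xto the standard output:
X.PP
X.nf
Xtar cf - src | abe > src.abe
Xabe universalname=prog.exe /usr/local/bin/prog > prog.abe
Xabe chapter1 chapter2 chapter3 > book.abe
X.fi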
X.SH ENCODING FORMAT
XIn the standard ABE1 encoding, the 256 possible byte values are broken up
Xinto 3 sets of 86, 86 and 84 bytes,
Xrespectively.  The most common 86 bytes in the file go into set 0, and so
Xon.  86 of the printable ASCII characters are used to encode the members
Xof each set.  Special printable escape characters switch from set to set.
X.PP
XIn an ABE encoding, printable characters always map to themselves, if possible.
XThis means that printable character strings found in binary files are still
Xreadable in an ABE encoding.  You can often look at a raw ABE file and see
Xwhat it is, which is quite useful.  In addition, the byte 0 maps to the
XASCII digit "0", and several other similar useful mappings are made.
X.PP
XABE files also have header information that defines information about the
Xencoded files, block headings, sizes and checksums.  For full details on
Xthe encoding format, see the special file on that in the ABE kit.
X.PP
XThe ABE2 encoding splits the 256 bytes into 4 sets of 64 bytes each.
XIt avoids certain dangerous characters.
XOtherwise it is similar to ABE1.  ABE2 encodings are only slightly larger,
Xand slightly less readable than ABE1 encodings.
X.SH COMPRESSION
XABE files usually use the same string of printable characters to
Xrepresent a given string of printable bytes.  (This is not true for
Xuuencodings.)  This is good for LZW compressors.
X.PP
XABE encodings are very good on text files.  In general, except for
Xthe overhead of headers, checksums and line numbers, text files encode
Xto the same size in an ABE file.  Sadly, ABE does its worst job on
Xcompressed files.  This "worst job" is usually about the same as the
Xjob done by uuencode, plus the overhead of headers, checksums and line
Xnumbers.  In general, files posted to USENET should not be pre-compressed,
Xas compression should be left to the transportation mechanisms.  (Most
XUSENET links batch and compress what they transmit.)
X.SH BLOCKING
XThe ABE blocking system is ideal for sending binaries over USENET and
Xother limited channels.  Normally, ABE output is a continuous stream
Xsent to the standard output.
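X.PP
XA blocked, redundant encoding might be produced and decoded as in the
Xsketch below.  The file name
X.I prog
Xand the block size of 50000 are assumptions for the example; with the
Xprefix "part", the encoder names its output files part00, part01 and so on:
X.PP
X.nf
Xabe +redundant blocksize=50000 prefix=part prog
Xdabe part*
X.fi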
X.SH AUTHOR
XThe ABE system was written by Brad Templeton, who is brad@looking.UUCP.
X(Mail regarding abe should go to abe@looking.UUCP.)
XThe tiny ABE decoder is released to the public domain.  All other files
Xare Copyright 1989 by Brad Templeton.  A licence for unlimited non-commercial
Xuse of these encoders is granted.  See the source code in the ABE kit
Xfor full details on the licence.
X
XNo fee is requested or required for the use of these programs.
XIf you feel the need
Xto show appreciation, you might order copies of the REC.HUMOR.FUNNY
XComputer Network Humour Annual(s) (a USENET jokebook) for 9.95 USD+S/H.
XMail to jokebook@looking.UUCP or call 519/884-7473.  There is no requirement to
Xbuy the jokebook in order to use these programs. 
X.SH FILES
Xtinydabe.c
X.SH "SEE ALSO"
X.IR dabe (1),
X.IR uuencode (1),
XABE file format (abeformat)
X.SH VERSION
XVersion 1.0
E-O-F
echo Extracting abe.c
sed 's/^X//' > abe.c << 'E-O-F'
X#include <stdio.h>
X#include "abe.h"
X
X/* ABE 'Ascii-Binary Encoding' Encoder
X   This is the full and only encoder for ABE files.  It handles
X   multiple-block files with redundant information, and outputs all
X   sorts of other headers to bulletproof the data.  Refer to the man pages
X   for details on the options and file format.
X
X   This program was written with Unix(TM-Bell Labs) in mind, but it
X   can be easily ported to other systems, notably MS-DOS using the
X   Microsoft C compiler conventions for text/binary files. 
X
X   This program was written by Brad Templeton, and is a companion to
X   the full ABE decoder and the simple ABE decoder.  While the simple
X   ABE decoder was released to the public domain, this encoder is not.
X
X   It is Copyright 1989 by Brad Templeton.  I hereby grant a licence
X   for unlimited use of binaries generated by compiling this source
X   code.  The source code may be distributed and modified for non-commercial
X   purposes, but all copyright messages must be retained.  Changes which
X   alter the file format of ABE files are not permitted without
X   the explicit consent of Brad Templeton.  I mean it.
X
X   No fee is required for the use of the program.
X   See the MAN page for more details.
X
X   Changes which move this program to other machines and operating
X   systems are encouraged.  Please send any port to a new machine to
X   me at abe@looking.UUCP.
X
X */
X
X/* the set of printable characters used in line numbers and in ABE2 files */
X
Xchar safe_prints[] =
X"./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
X
Xstruct frq freqs[256];		/* frequencies of characters in the file */
Xunsigned char *inbuf = 0;	/* allocated look-ahead buffer for one pass */
Xint inbsize;			/* size of said one-pass mode buffer */
Xint eofflag = FALSE;		/* got EOF yet? */
Xint bufdex;			/* index into one-pass look ahead buffer */
Xint whatset[256+1+3];		/* what set a given char is in */
Xchar whatprint[256+1];		/* the printable in that set to use */
X
Xint inmode = ONEPASS;		/* method of processing input file */
Xint line_num = 1;		/* line number of first line to output */
Xint file_number = 0;		/* current output block  (all files) */
Xint blocknum = 0;			/* block number in current file */
Xlong block_bytecount;		/* number of bytes in this block */
Xlong bl_outcount;		/* number of printed chars output this block */
Xlong block_checksum;		/* checksum of all info in block */
Xlong total_checksum = 0;	/* checksum of data bytes in file */
Xlong seek_addr = 0;		/* current position in input file */
Xtcrc crc = 0;			/* running 32 bit crc */
Xtcrc bcrc = 0;			/* running 32 bit block crc */
X
Xint num_sets = 0;			/* number of character sets */
Xint num_prints = 86;			/* number of printables per set */
X
Xint encoding_style = ABE1;	/* style for encoding */
Xint saved_mode = 0664;		/* file permissions, saved for uuencode */
XFILE *instream;			/* input stream */
X
Xbool give_numbers = TRUE;	/* give line numbers in file */
Xbool temp_nonums;		/* temporarily give no numbers */
Xbool did_size = FALSE;		/* have we given the file size */
X
Xtcrc crctab [TABSIZE];		/* big mother CRC-32 lookup table */
X
X	/* read in some, or all of the input (depending on the number of
X	   passes) and build a frequency table for the bytes */
X
X
Xfreqbuild(fname)
Xchar *fname;		/* file name or null string */
X{
X	FILE *inf;	/* temporary input file */
X	int i;		/* general counter */
X	int c;		/* byte read from file */
X	char *malloc();
X	extern int samp_size;		/* size of one pass pre-read sample */
X
X
X	/* clear freq table */
X	for( i = 0; i < 256; i++ ) {
X		freqs[i].freq = 0;
X		freqs[i].bytenum = i;
X		whatprint[i] = 0;
X		}
X
X	if( fname && fname[0] ) {
X		inf = fopen( fname, READMODE );
X		if( !inf )
X			error( "Could not open %s", fname );
X		}
X	 else {
X	 	if( inmode == TWOPASS )
X			error( "Two pass scan not available on standard input");
X		inf = stdin;
X#ifdef msdos
X		setmode( fileno(inf), O_BINARY );
X#endif
X		}
X
X	if( inmode == TWOPASS ) {
X		/* scan the file for frequency table */
X		while( (c = getc(inf)) != EOF )
X			freqs[c].freq++;
X		fclose(inf);
X		}
X	 else {
X		/* build a buffer to save pre-read chars */
X		inbuf = (unsigned char *)malloc( samp_size );
X		if( !inbuf )
X			error( "Could not allocate enough memory for pre-pass");
X		inbsize = fread( inbuf, 1, samp_size, inf );
X		eofflag = inbsize < samp_size;
X		/* set up frequency table */
X		for( i = 0; i <inbsize; i++ )
X			freqs[ inbuf[i] ].freq++;
X		instream = inf;
X		}
X
X}
X
X
X	/* build the character set maps for this file, based on the
X	   frequency tables calculated.  There are 86 printable characters
X	   used.  Three sets of such characters are used to encode bytes.
X
X	   The 86 most common bytes go into set 0.  The next 86 go to
X	   set 1 and the remaining 84 go to set 2.
X
X	   Some characters (space, NL, CR, NUL) always get a favourite
X	   printable character.
X
X	   All remaining printable characters are represented by themselves
X	   (in their set) if possible.
X
X	   In ABE2, we have 4 sets of 64 instead of 3 of 86.
X
X	   */
X
X
Xchar whoused[MAX_SETS][MAX_PRINTS];	/* what characters are taken in each set */
X
X
Xint specials1[] = { ' ', 9 /* tab */, 13 /*cr*/, 10 /*nl*/, 0, 255, -1 };
Xint specials2[] = { ' ', 10, 0, -1 };
X/* what the specials will map to */
Xchar specprint1[] = { '.', ':', '\\', '/', '0', '*' };
Xchar specprint2[] = { '.', '/', '0' };
X
Xint *specials;
Xchar *specprint;
X
Xsetbuild()
X{
X	int i;
X	int set;		/* which set */
X	int ws;			/* what special character */
X	int lastfree[MAX_SETS];	/* last free byte in a given set */
X	int bycomp();
X
X
X	if( num_sets == 0 )
X		return;
X	qsort( freqs, 256, sizeof(struct frq), bycomp );
X
X	/* now assign a char set and printable character to every byte */
X
X	/* clear out the who-is-allocated table */
X	for( set = 0; set < num_sets; set++ ) {
X		for( i = 0; i < num_prints; i++ )
X			whoused[set][i] = FALSE;
X		/* give each character a set */
X		for( i = 0; i < num_prints && set*num_prints+i < 256; i++ ) 
X			whatset[ freqs[set*num_prints+i].bytenum ]  = set;
X		}
X	whatset[256] = -1;		/* illegal set */
X
X	/* if space, tab and return are in the first set, grab some special
X	   chars for each */
X	for( ws = 0; specials[ws] >= 0; ws++ )
X		if( whatset[ specials[ws] ] == 0 ) {
X			whatprint[ specials[ws] ] = specprint[ws];
X			/* mark that char used */
X			whoused[0][code_of(specprint[ws])] = TRUE;
X			}
X	/* now take the printables for themselves, wherever they are found */
X	
X	for( i = 0; i < 256; i++ ) {
X		int c;
X		int ccode;
X		c = freqs[i].bytenum;
X		/* if it is available, take it */
X		ccode = code_of(c);
X		if( isourprint(c) && !whoused[whatset[c]][ccode] ) {
X
X			whoused[whatset[c]][ccode] = TRUE;
X			whatprint[c] = c;
X			}
X		}
X	/* now allocate the rest of the bytes to the first available in
X	   their set */
X
X
X	/* first clear the mark of the first possibly available char */
X	for( set = 0; set < num_sets; set++ )
X		lastfree[set] = 0;
X
X	for( i = 0; i <256; i++ ) {
X		int c;
X		int cset;
X		int inset;
X
X		c = freqs[i].bytenum;
X		cset = whatset[c];
X
X		if( whatprint[c] == 0 ) {
X			/* scan for an available char in this set */
X			for( inset = lastfree[cset]; inset < num_prints; inset++ ){
X				if( !whoused[cset][inset] ) {
X					/* mark this char used */
X					whoused[cset][inset] = TRUE;
X					/* and use it */
X					whatprint[c] = printof(inset);
X					/* have other searches start here */
X					lastfree[cset] = inset+1;
X					break;
X					}
X				}
X			/* it should not be possible for the loop to terminate
X			   normally */
X			}
X		}
X
X	/* the character sets have been allocated */
X
X}
X
X	/* byte comparison function for qsort */
Xint
Xbycomp( a, b )
Xstruct frq *a, *b;
X{
X	/* we want descending order */
X	return a->freq > b->freq ? -1 : ( a->freq == b->freq ? 0 : 1 );
X}
X
X
X	/* initialize the small look ahead buffer, and general input, for
X	   the file encoding pass */
X
Xstatic int lbuf[4];		/* look ahead buffer */
Xint lbufdex;			/* index into buffer */
X
Xinitbuf(fname)
Xchar *fname;
X{
X	int i;
X	extern int inmode;
X
X	if( inmode == TWOPASS ) {
X		instream = fopen( fname, READMODE );
X		inbuf = 0;
X		}
X	 else {
X		/* instream already set up */
X		bufdex = 0;
X		}
X	seek_addr = 0;
X	eofflag = FALSE;
X	for( i = 0; i < LB_LEN; i++ ) 
X		lbuf[i] = ourget();
X	lbufdex = 0;
X}
X
X	/* get the next byte from the input, filling the look ahead
X	   buffer as needed */
X#define MAGIC 010201
X
Xnextchar()
X{
X	short c;
X
X	c = lbuf[lbufdex];
X	lbuf[lbufdex] = ourget();
X	lbufdex = (lbufdex+1) % LB_LEN;
X	if( c != OUR_EOF ) {
X		++seek_addr;	/* where it will be after reading this char */
X		++block_bytecount;	/* total bytes in this block */
X		/* compute CRC with this byte present */
X		/* whole file crc */
X		crc = UPDC32( c, crc );
X		/* block crc */
X		bcrc = UPDC32( c, bcrc );
X		}
X	return (int)c;
X}
X
X	/* macro to look ahead in the input look ahead-buffer */
X
X#define lachar(x)	lbuf[(lbufdex+x)%LB_LEN]
X
X
X	/* the raw routine to get a character from the input */
X
Xourget()
X{
X	int c;
X
X	if( inbuf ) {
X		if( bufdex < inbsize ) 
X			return (unsigned char)inbuf[bufdex++];
X		 else {
X			free( inbuf );
X			inbuf = 0;
X			}
X		}
X	if( eofflag )
X		return OUR_EOF;
X	c = getc(instream);
X	if( c == EOF ) {
X		eofflag = TRUE;
X		return OUR_EOF;
X		}
X	 else
X		return c;
X}
X
Xint redundant = FALSE;		/* add extra headers to each block */
Xlong blocksize = 0L;		/* how long blocks are to be */
Xchar *file_prefix = NULL;	/* prefix for block file names */
Xchar *m_univname = NULL;	/* option universal file name */
Xchar *univname = NULL;
Xint decoder = FALSE;		/* do not add mini-decoder to output */
Xchar *decode_name = DECODER;	/* name of simple decoder source */
Xint samp_size = INB_LEN;	/* size of one pass scan buffer */
X
XFILE *out_desc;			/* output descriptor */
Xchar *fname_array[MAX_FNAME];	/* array of files to process */
Xint fn_count = 0;		/* number of files */
X
X	/* handle options and control encoding */
X
Xmain(argc,argv)
Xint argc;
Xchar **argv;
X{
X	int i;
X	int argnum;
X	char *strchr();
X	long atol();
X
X
X	inmode = TWOPASS;
X		
X
X	for( argnum = 1; argnum < argc; argnum++ ) {
X		char *argline;
X		char *argstr;		/* argument string */
X		int argval;
X		int isplus;		/* boolean tells +arg vs -arg */
X		argline = argv[argnum];
X
X		if (argstr = strchr(argline, '=')) {
X			argstr++;
X			argval = atoi(argstr);
X			switch( argline[0] ) {
X				case 'b':
X					blocksize = atol( argstr );
X					break;
X				case 'p':
X					/* provide a prefix for filenames for
X					   the multiple parts */
X					file_prefix = argstr;
X					/* if no blocksize, pick a default */
X					if( blocksize == 0 )
X						blocksize = DEF_BLOCKSIZE;
X					break;
X				case 'u':
X					/* specify a universal name */
X					m_univname = argstr;
X					if( strlen(m_univname) > MAX_UNAME )
X						m_univname[MAX_UNAME] = 0;
X					break;
X				case 'd':
X					/* add decoder source from specified
X						file */
X					decoder = TRUE;
X					decode_name = argstr;
X					break;
X				case 's':
X					/* sample size, one pass */
X					samp_size = argval;
X					inmode = ONEPASS;
X					break;
X				case 'l':
X					/* set starting line number */
X					line_num = atol( argstr );
X					break;
X				default:
X					do_options();
X					error( "Bad Option %s\n", argline );
X				}
X			}
X		else if( (isplus = argline[0] == '+') || argline[0] == '-' ) {
X			switch( argline[1] ) {
X				case 'r': /* extra info in each block */
X					redundant = isplus;
X					break;
X				case 'd': /* add decoder source */
X					decoder = isplus;
X					break;
X				case 'e': /* ebcdic */
X					encoding_style = isplus ? ABE2 : ABE1;
X					break;
X				case 'u':
X					if( isplus )
X						encoding_style = UUENCODE;
X					break;
X				case 'n':
X					give_numbers = isplus;
X					break;
X				default:
X					do_options();
X					error( "Bad Option %s\n", argline );
X				}
X			}
X		else {
X			/* code for untagged option */
X			if( fn_count >= MAX_FNAME )
X				error( "Too many file names\n" );
X			fname_array[fn_count++] = argline;
X			}
X		}
X	
X	mkcrctab();			/* init crc table */
X
X	init_encoding(encoding_style);
X
X	if( !fn_count )
X		inmode = ONEPASS;
X	init_output(fn_count);
X
X	if( m_univname && fn_count > 1 )
X		error( "You may not specify a universal name with multiple files\n" );
X
X	if( fn_count ) {
X		for( i = 0; i < fn_count; i++ )
X			do_file( fname_array[i] );
X		}
X	 else
X		do_file( NULL );
X#ifdef unix
X	if( file_prefix && file_prefix[0] == '|' && out_desc )
X		pclose( out_desc );
X#endif
X
X	return 0;
X}
X	/* encode a specified source file */
X
Xdo_file( fname )
Xchar *fname;
X{
X	char unbuf[MAX_UNAME+1];		/* keep universal name */
X
X	if( !m_univname ) {
X		formuname( fname, unbuf );
X		univname = unbuf;
X		}
X	 else
X		univname = m_univname;
X
X	start_of_file( fname );
X
X	freqbuild( fname );
X	setbuild();
X
X	total_checksum = 0;
X	crc = 0xffffffffL;
X
X
X	if( !redundant )
X		printmap();
X
X	initbuf( fname );
X
X	if( blocksize )
X		print_blockhead(fname);
X
X	file_preface();
X
X
X	while( outline() )
X		if( blocksize && bl_outcount >= blocksize ) {
X			close_block(TRUE);
X			init_newblock();
X			print_blockhead(fname);
X			}
X	if( blocksize )
X		close_block(FALSE);
X	write_trailer();
X	if( fname )
X		fclose( instream );
X}
X	/* output the byte-character map we have generated */
X
Xprintmap()
X{
X	int line, block, i;
X	int ccount;
X	char linebuf[MAX_LLEN];
X	int thechar;
X	int setnum;
X
X	/* no map if no character sets */
X	if( num_sets == 0 )
X		return;
X
X	thechar = 0;
X	for( line = 0; line < 8; line++ ) {
X		ccount = 0;
X		linebuf[ccount++] = CODE_HEAD;
X		linebuf[ccount++] = CODE_HEAD;
X		linebuf[ccount++] = printof(line);
X		switch( encoding_style ) {
X		 case ABE1:
X			for( block = 0; block < 8; block++ ) {
X				setnum = 0;
X				for( i = 0; i < 4; i++ ) {
X					setnum = setnum * 3 + whatset[thechar];
X					linebuf[ccount++] =whatprint[thechar++];
X					}
X				/* output the set numbers for here */
X				linebuf[ccount++] = printof(setnum);
X				}
X			break;
X		 case ABE2:
X			for( block = 0; block < 16; block++ ) {
X				setnum = 0;
X				for( i = 0; i < 2; i++ ) {
X					setnum = setnum * 4 + whatset[thechar];
X					linebuf[ccount++] =whatprint[thechar++];
X					}
X				/* output the set numbers for here */
X				linebuf[ccount++] = printof(setnum);
X				}
X			break;
X		 }
X
X		linebuf[ccount] = 0;
X		(void)lineput( linebuf );
X		}
X}
X
X	/* encode a line of data and output it to the encoding file */
X
Xoutline()
X{
X	switch( encoding_style ) {
X		case ABE1:
X			return outline1();
X		case ABE2:
X			return outline2();
X		case UUENCODE:
X			return out_uuencode();
X		}
X}
X
Xoutline1()
X{
X	int c;
X	char linebuf[MAX_LLEN];
X	int csum;
X	int ccount;
X	
X	ccount = 0;
X	do {
X		c = nextchar();
X		if( c == OUR_EOF )
X			break;
X		if( whatset[c] == 0 ) {
X			linebuf[ccount++] = whatprint[c];
X			}
X		 else {
X			if( whatset[lachar(0)] > 0 ) {
X				linebuf[ccount++] = SETXX + (whatset[c]-1) * 2 +whatset[lachar(0)]-1;
X				linebuf[ccount++] = whatprint[c];
X				linebuf[ccount++] = whatprint[nextchar()];
X				}
X			 else {
X				if( whatset[c] == 1 && whatset[lachar(1)] > 0){
X					linebuf[ccount++] = SET10X +
X							whatset[lachar(1)]-1;
X					linebuf[ccount++] = whatprint[c];
X					linebuf[ccount++]=whatprint[nextchar()];
X					linebuf[ccount++]=whatprint[nextchar()];
X					}
X				 else {
X					linebuf[ccount++] =NEWSET1+whatset[c]-1;
X					linebuf[ccount++] = whatprint[c];
X					}
X				}
X			}
X		
X		} while ( ccount < MPERLINE );
X	linebuf[ccount] = 0;
X
X	sumput( linebuf );
X
X	return c != OUR_EOF;
X}
X
X/* output line, add to checksum */
Xsumput( line )
Xchar *line;
X{
X	total_checksum = (lineput( line ) + total_checksum) % CSUM_MOD;
X}
X
X/* characters for the X0X mapping in ABE2 */
Xstatic char x0xmaps[] = A2X0XSET;
X
Xoutline2()
X{
X	int c;
X	char linebuf[MAX_LLEN];
X	int csum;
X	int ccount;
X	
X	ccount = 0;
X	do {
X		int cset;
X
X		c = nextchar();
X		if( c == OUR_EOF )
X			break;
X		cset = whatset[c];
X		if( cset == 0 ) {
X			linebuf[ccount++] = whatprint[c];
X			}
X		 else {
X			if( whatset[lachar(0)] > 0 ) {
X				linebuf[ccount++] = A2SETXX + (cset-1) * 3 +whatset[lachar(0)]-1;
X				linebuf[ccount++] = whatprint[c];
X				linebuf[ccount++] = whatprint[nextchar()];
X				}
X			 else {
X				/* 8 mappings only, 303 not allowed */
X				int setmap;
X				if( whatset[lachar(0)] == 0 &&
X						whatset[lachar(1)] > 0 &&
X						(setmap =
X						(cset-1)*3+whatset[lachar(1)]-1)
X						< 8 ) {
X					linebuf[ccount++] =x0xmaps[setmap];
X					linebuf[ccount++] = whatprint[c];
X					linebuf[ccount++]=whatprint[nextchar()];
X					linebuf[ccount++]=whatprint[nextchar()];
X					}
X				 else {
X					linebuf[ccount++] = A2NEWSET1+cset-1;
X					linebuf[ccount++] = whatprint[c];
X					}
X				}
X			}
X		
X		} while ( ccount < MPERLINE );
X	linebuf[ccount] = 0;
X
X	sumput( linebuf );
X
X	return c != OUR_EOF;
X}
X
X/* VARARGS */
Xerror(a,b,c,d,e,f)
Xint a,b,c,d,e,f;
X{
X	fprintf( stderr, a, b, c, d, e, f );
X	fprintf( stderr, "\n" );
X	exit(1);
X}
X
X	/* output a general line, with line number, checksum and NL */
Xint
Xlineput( line )
Xchar *line;
X{
X	int i;
X	int sum;
X
X	sum = 0;
X
X	for ( i = 0; line[i]; i++ )
X		sum += line[i];
X	block_checksum += sum;
X	bl_outcount += i;
X	/* calc and place checksum for line */
X	if( temp_nonums ) {
X		if( strncmp( line, "From ", 5 ) == 0 )
X			fprintf( stderr,"Warning: line %s begins with 'From'\n",
X				line );
X		bl_outcount++;
X		fprintf( out_desc, "%s\n", line );
X		}
X	 else {
X		fprintf( out_desc, "%c%c%c%c%s\n",
X			safe_prints[ SFBYTE + line_num / (NUM_SAFE*NUM_SAFE)],
X			safe_prints[ (line_num / NUM_SAFE) % NUM_SAFE ],
X			safe_prints[ line_num % NUM_SAFE ],
X			safe_prints[ sum % NUM_SAFE ],
X			line );
X		line_num++;
X		bl_outcount += 5;
X		}
X	return sum;
X}
X
X/* names for the defined styles of encoding.  (index as encoding_style ) */
X
Xchar *stylenames[] = {"ABE1", "ABE2", "UUENCODE"};
X
X	/* set up the first output descriptor */
X
Xinit_output(count)
Xint count;
X{
X	char fcbuf[20];
X
X	if( file_prefix ) {
X		file_number = 0;
X		init_newblock();
X		}
X	else
X		out_desc = stdout;	
X	/* print the main header */
X	fprintf( out_desc, ";ABE ASCII-Binary-Encoding (by Brad Templeton)\n" );
X	fprintf( out_desc, ";Use 'sort' and/or 'dabe' to decode\n" );
X	if( decoder ) {
X		FILE *decf;
X		char decname[MAX_FNAME];
X		int c;
X
X		sprintf( decname, decode_name, stylenames[encoding_style] );
X		decf = fopen( decname, "r" );
X		if( !decf )
X			error( "Could not open decoder %s", decname );
X		fprintf( out_desc, "--------Tiny DABE (%s) ------\n", decname );
X		while( (c = getc(decf)) != EOF )
X			putc( c, out_desc );
X		fclose( decf );
X		fprintf( out_desc, "--------Cut here to extract tiny decoder------\n" );
X		}
X	temp_nonums = FALSE;
X	sprintf( fcbuf, "%d", count ? count : 1 );
X	subheading( "filecount", fcbuf );
X}
X
X	/* output the main start-of-file header */
X
X
X
Xstart_of_file(fname)
Xchar *fname;
X{
X	char headline[MAX_LLEN];
X
X	temp_nonums = FALSE;
X	did_size = FALSE;
X
X	sprintf( headline, "%c%cS%u,%u,%u,%s", MAIN_HEAD, MAIN_HEAD,
X			EARLIEST_SIMD, VERNUM, EARLIEST_VERNUM,
X			stylenames[encoding_style] );
X	(void)lineput( headline );
X	/* indicate if this file should have blocks or not */
X	subheading( "blocking", blocksize ? "true" : "false" );
X	if( !give_numbers ) {
X		subheading( "linenumbers", "false" );
X		temp_nonums = TRUE;
X		}
X
X	if( univname )
X		subheading( "uname", univname );
X	if( !redundant )
X		fileinfo(fname);		/* write general file info */
X	blocknum = 0;
X}
X
X	/* output main header information on the file */
X
Xfileinfo(fname)
Xchar *fname;
X{
X	subheading( "os", OUROS );
X	/* we do not output the real name if an explicit universal name
X	   was given */
X	if( fname ) {
X		if( m_univname == NULL )
X			subheading( "fname", fname );
X		osfileinfo(fname);
X		}
X}
X
X#ifdef unix
X
X#include <sys/types.h>
X#include <sys/stat.h>
X#include <pwd.h>
X
X	/* operating system specific file information */
X
Xosfileinfo(fname)
Xchar *fname;
X{
X	struct stat ourf;
X	struct passwd *opwent, *getpwuid();
X	char numbuf[40];
X
X	if( stat( fname, &ourf ) == 0 ) {
X		if( opwent = getpwuid( ourf.st_uid ) )
X			subheading( "owner", opwent->pw_name );
X		sprintf( numbuf,"%ld", (long)ourf.st_mtime );
X		subheading( "date", numbuf );
X		sprintf( numbuf, "%u", ourf.st_mode );
X		saved_mode = ourf.st_mode;
X		subheading( "perm", numbuf );
X		sprintf( numbuf, "%lu",  (long)ourf.st_size );
X		did_size = TRUE;
X		subheading( "size", numbuf );
X		}
X		
X}
X
X#else
X# ifdef msdos
X	/* almost identical unix code due to Microsoft C compat library */
X#include <sys/types.h>
X#include <sys/stat.h>
X
X
Xosfileinfo(fname)
Xchar *fname;
X{
X	struct stat ourf;
X	unsigned short mode;
X	struct passwd *opwent, *getpwuid();
X	char numbuf[40];
X
X	if( stat( fname, &ourf ) == 0 ) {
X		sprintf( numbuf,"%ld", (long)ourf.st_mtime );
X		subheading( "date", numbuf );
X		mode = ourf.st_mode & (S_IREAD|S_IWRITE|S_IEXEC);
X		/* duplicate the mode bits down from user to general */
X		mode |= (mode >> 3) | (mode >> 6);
X		saved_mode = mode;
X		sprintf( numbuf, "%u", mode );
X		subheading( "perm", numbuf );
X		sprintf( numbuf, "%lu",  (long)ourf.st_size );
X		did_size = TRUE;
X		subheading( "size", numbuf );
X		}
X		
X}
X# else
X/* other OS output lines */
Xosfileinfo()
X{
X}
X# endif
X#endif
X
X		/* output a sub-header line */
Xsubheading(hclass,value)
Xchar *hclass;
Xchar *value;
X{
X	char headline[MAX_LLEN];
X	sprintf( headline, "%c%c%s=%s", SUB_HEAD, SUB_HEAD, hclass, value );
X	(void)lineput( headline );
X}
X
X	/* begin a new output block (file) */
Xinit_newblock()
X{
X	char blockname[FNAMELEN];
X
X
X	blocknum++;
X
X	if( file_number >= MAX_BLOCKS )
X		error( "Too many blocks -- limit of 255" );
X	if( file_prefix ) {
X#ifdef unix
X		if( file_prefix[0] == '|' ) {
X			sprintf( blockname, file_prefix+1, file_number );
X			out_desc = popen( blockname, "w" );
X			}
X		 else
X#endif
X			{
X			sprintf( blockname, "%s%2.2x", file_prefix,
X							file_number );
X			out_desc = fopen( blockname, "w" );
X			}
X		file_number++;
X		if( !out_desc )
X			error( "Could not open %s for output", blockname );
X		}
X}
X
X	/* output the header for a block */
X
Xprint_blockhead(fname)
Xchar *fname;
X{
X	char blockhead[MAX_LLEN];
X
X	sprintf( blockhead, "%c%cstartblock=%d,%lu,%u,%s", SUB_HEAD,
X		SUB_HEAD, blocknum, seek_addr, EARLIEST_VERNUM, univname );
X	block_checksum = 0;
X	block_bytecount = 0;
X	bl_outcount = 0;
X	bcrc = 0xffffffffL;		/* block crc */
X	(void)lineput( blockhead );
X
X	if( !give_numbers && !temp_nonums ) {
X		subheading( "linenumbers", "false" );
X		temp_nonums = TRUE;
X		}
X
X	/* give all the main info again if redundant blocks requested */
X	if( redundant ) {
X		subheading( "style", stylenames[encoding_style] );
X		printmap();
X		fileinfo(fname);
X		}
X}
X
X	/* output the trailer for a block */
X
Xclose_block(closefile)
Xint closefile;
X{
X	char blockhead[MAX_LLEN];
X
X	sprintf( blockhead, "%c%ccloseblock=%d,%lu,%lu,%lu", SUB_HEAD, SUB_HEAD,
X			blocknum,  (long)block_checksum % CSUM_MOD, 
X			(long)block_bytecount, ~bcrc );
X	(void)lineput( blockhead );
X	fprintf( out_desc, ";ABE encoding end of part %d\n", blocknum );
X	if( file_prefix && closefile ) {
X#ifdef unix
X		if( file_prefix[0] == '|' )
X			pclose( out_desc );
X		 else
X#endif
X			fclose( out_desc );
X		out_desc = (FILE *)0;
X		}
X	temp_nonums = FALSE;
X		
X}
X
X	/* write the end of file records */
X
Xwrite_trailer()
X{
X	char numbuf[40];
X
X	if( blocksize ) {
X		sprintf( numbuf, "%d", blocknum+1 );	/* they start at 0 */
X		subheading( "total-blocks", numbuf );
X		}
X	if( !did_size ) {
X		sprintf( numbuf, "%lu", (long)seek_addr );
X		subheading( "size", numbuf );
X		}
X	subheading( "end_file", univname );
X	sprintf( numbuf, "%lu", ~crc );
X	subheading( "filecrc32", numbuf );
X	sprintf( numbuf, "%c%cE%ld", MAIN_HEAD, MAIN_HEAD,
X			(long) total_checksum % CSUM_MOD );
X	(void)lineput( numbuf );
X	fprintf( out_desc, ";End of ABE encoding\n" );
X}
X
X	/* form a universal filename from the true pathname */
X	/* this should be OS-dependent code, but for now, DIRCHARS
X	   contains the OS-dependent information */
X
Xformuname( name, ubuf )
Xchar *name;	/* full pathname */
Xchar *ubuf;	/* buffer of 15 bytes (at least) for universal name */
X{
X	int len;
X	int i;
X
X	if( name == NULL || (len  = strlen(name)) == 0 ||
X			strchr( DIRCHARS, name[len-1] ) != NULL ) {
X		strcpy( ubuf, "stdin" );
X		return;
X		}
X	for( i = len-1; i >= 0; i-- )
X		if( strchr( DIRCHARS, name[i] ) ) {
X			break;
X			}
X
X	strncpy( ubuf, name+i+1, MAX_UNAME );
X	ubuf[MAX_UNAME] = 0;
X}
X
X	/* provide the printable character for a given number */
Xprintof(c)
Xint c;
X{
X	return (encoding_style == ABE1) ? c + A1FPRINT : safe_prints[c];
X}
X
X	/* provide the integer code for a given printable char */
X
Xcode_of(n)
Xint n;
X{
X	if( encoding_style == ABE1 )
X		return n - A1FPRINT;
X	 else /* ABE2 */
X		return n - (n > '9' ? (n > 'Z' ? 'a'-38 : 'A'-12) : '.');
X}
X
X	/* is a char one of our printable chars? */
Xisourprint(c)
Xint c;
X{
X	return (encoding_style == ABE1) ? (c >= A1FPRINT && c <= A1LPRINT) :
X		(( c >= '.' && c <= '9') || (c >= 'A' && c <= 'Z') ||
X			( c >= 'a' && c <= 'z' ));
X}
X
Xinit_encoding( style )
X{
X	switch( style ) {
X		case ABE1:
X			num_sets = 3;
X			num_prints = 86;
X			specials = specials1;
X			specprint = specprint1;
X			break;
X		case ABE2:
X			num_sets = 4;
X			num_prints = 64;
X			specials = specials2;
X			specprint = specprint2;
X			break;
X		case UUENCODE:
X			inmode = ONEPASS;
X			samp_size = 100;	/* token read ahead */
X			num_sets = 0;
X			break;
X		}
X}
X
X	/* any preface on the data */
X
Xfile_preface()
X{
X	char pline[MAX_LLEN];
X
X	switch( encoding_style ) {
X		case UUENCODE:
X			sprintf( pline, "begin %o %s", saved_mode & 0777,
X					univname );
X			sumput( pline );
X			break;
X		}
X}
X
X/* printable character for a 6-bit value: 1..63 map to '!'..'_' and
X   0 maps to grave accent instead of space */
X
X#define UUE(x)  (( ((x)-1) & 0x3f )+'!')
X
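Xout_uuencode()
X{
X	char uinbuf[46];
X	int len;
X	int c;
X
X	/* read in up to 45 bytes */
X	for( len = 0; len < 45; len++ ) {
X		if( (c = nextchar()) == OUR_EOF )
X			break;
X		uinbuf[len] = c;
X		}
X	/* zero the rest of the buffer, since uuline() reads up to two
X	   bytes past len when padding the final group of three */
X	for( c = len; c < (int)sizeof(uinbuf); c++ )
X		uinbuf[c] = 0;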
X	/* we must never allow a line of length 2, 3, or 4, as that might,
X	   just maybe, generate one of our header lines */
X	if( len > 1 && len < 5 ) {
X		int i;
X		/* spew it out as single char lines -- what a waste */
X		for( i = 0; i < len; i++ )
X			uuline( uinbuf+i, 1 );
X		}
X	 else
X		uuline( uinbuf, len );
X	if( len == 0 )
X		sumput( "end" );
X	return len > 0;
X}
X
Xuuline(uinbuf, len)
Xchar *uinbuf;		/* binary data */
Xint len;		/* number of bytes of binary data */
X{
X	char uubuf[MAX_LLEN];
X	char *ubp;
X	int i;
X
X	ubp = uubuf;
X
X	/* first character of the line encodes the byte count */
X	*ubp++ = UUE(len);
X
X	/* each group of 3 input bytes becomes 4 six-bit characters */
X	for( i = 0; i < len; i += 3 ) {
X		*ubp++ = UUE( uinbuf[i] >> 2 );
X		*ubp++ = UUE( (uinbuf[i] << 4) & 060 | (uinbuf[i+1] >> 4)&017 );
X		*ubp++ = UUE( (uinbuf[i+1] << 2) & 074 | (uinbuf[i+2] >> 6)&03);
X		*ubp++ = UUE( uinbuf[i+2] & 077 );
X		}
X	*ubp = 0;
X	/* when len is 0 the line holds only the count character,
X	   which marks the end of the data */
X	sumput( uubuf );
X}
X
Xdo_options() {
X	fprintf( stderr, "Usage:\n" );
X	fprintf( stderr, "\tabe [options] files\n" );
X	fprintf( stderr, "or to read from stdin:\n" );
X	fprintf( stderr, "\tabe [options]\n" );
X	fprintf( stderr, "\nAbe options:\n" );
X	fprintf( stderr, "\t+r\tAdd redundant info to each block of multi-block encodings\n" );
X	fprintf( stderr, "\t+d\tAdd source of tiny decoder to the encoding\n" );
X	fprintf( stderr, "\t+e\tUse character set safe to translate into EBCDIC\n" );
X	fprintf( stderr, "\t+u\tUse UUENCODE style of encoding\n" );
X	fprintf( stderr, "\t-n\tDo not place line sequence 'numbers' in output\n" );
X	fprintf( stderr, "\tblocksize=int\t\tBlock size for multi-block encoding files\n" );
X	fprintf( stderr, "\tprefix=path-prefix\tPrefix for names of multi-block encoding files\n" );
X	fprintf( stderr, "\tuname=simple-filename\tUniversal filename (single file encode only)\n" );
X	fprintf( stderr, "\tdecoder=pathname\tName of tiny decoder source code file\n" );
X	fprintf( stderr, "\tsample=int\t\tSample size for one pass read of stdin\n" );
X
X	fprintf( stderr, "\nIf you use -n, tiny decoders won't work.  On the other hand, -n and +u\n" );
X	fprintf( stderr, "together make an encoding most UUDECODE programs can decode.\n" );
X}
X
X/* Crc-32 builder, thanks to Rahul Dhesi */
X#define CRC_32          0xedb88320L    /* CRC-32 polynomial, bit-reversed form */
X
X/* calculates the CRC table entry for one byte value, low bit first */
Xtcrc
Xonecrc (item)
Xint item;
X{
X   int i;
X   tcrc accum = 0;
X   item <<= 1;
X   for (i = 8;  i > 0;  i--) {
X      item >>= 1;
X      if ((item ^ accum) & 0x0001)
X         accum = (accum >> 1) ^ CRC_32;
X      else
X         accum >>= 1;
X   }
X   return (accum);
X}
X
X/* generates CRC table, calling onecrc() to make each term */
Xmkcrctab()
X{
X   int i;
X   for (i = 0;  i < TABSIZE;  i++)
X      crctab[i] = onecrc (i);
X}
E-O-F