[comp.sources.misc] v20i030: reclin - turn blocks into lines, Part01/01

luke@modus.sublink.org (Luciano Mannucci) (05/31/91)

Submitted-by: Luciano Mannucci <luke@modus.sublink.org>
Posting-number: Volume 20, Issue 30
Archive-name: reclin/part01

This program reads ugly blocked files from odd machines and converts
them into lines on a field-by-field basis. It can handle as many
fields as your computer can, as long as it has enough RAM to hold
them.
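
As a rough illustration of the field-by-field idea (not the code in the
archive), here is a minimal sketch that copies one fixed-length record
into a line and writes a separator after every field. The function name
record_to_line and all its parameters are hypothetical, and it assumes
the field widths add up to the record length.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy one fixed-length record into `out` as a line, writing `sep`
 * after every field.  `widths` holds `nfields` field widths whose sum
 * is assumed to equal the record length.  Returns the number of bytes
 * written (excluding the terminating NUL), or 0 if `out` is too small.
 * All names are hypothetical; this is not code from reclin.c.
 */
size_t record_to_line(const char *rec, const size_t *widths, size_t nfields,
                      char sep, char *out, size_t outlen)
{
    size_t need = 1;            /* room for the terminating NUL */
    size_t i, n = 0;

    for (i = 0; i < nfields; i++)
        need += widths[i] + 1;  /* field bytes plus one separator */
    if (need > outlen)
        return 0;

    for (i = 0; i < nfields; i++) {
        memcpy(out + n, rec, widths[i]);    /* field copied as is */
        n += widths[i];
        rec += widths[i];
        out[n++] = sep;
    }
    out[n] = '\0';
    return n;
}
```

Called with the record "ABCDEF", widths 2, 3 and 1, and a comma as the
separator, it fills the output buffer with "AB,CDE,F,".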

Its syntax is a little old-fashioned because it has been used heavily
alongside the Unix command dd.
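
The dd flavour shows up in keyword=value arguments such as count=# and
bs=#. The following is only a sketch of how such an argument could be
recognised; keyword_arg is a hypothetical helper and, unlike the
program in the archive, it rejects values with trailing junk.

```c
#include <stdlib.h>
#include <string.h>

/*
 * If `arg` starts with `key` (for example "count="), store the number
 * that follows in *value and return 1; otherwise return 0 and leave
 * *value untouched.  Hypothetical helper, not taken from reclin.c.
 */
int keyword_arg(const char *arg, const char *key, long *value)
{
    size_t klen = strlen(key);
    char *end;
    long v;

    if (strncmp(arg, key, klen) != 0)
        return 0;
    v = strtol(arg + klen, &end, 10);
    if (end == arg + klen || *end != '\0')
        return 0;               /* no digits, or trailing junk */
    *value = v;
    return 1;
}
```

An argument that matches no keyword falls through, so the caller can
then treat it as a separator or as a file name.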

It should compile on any machine able to understand C.
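
Compiling everywhere does not mean that binary fields decode the same
way everywhere: the manual page in the archive notes that the binary
conversions are machine-dependent. As a hedged aside, the sketch below
shows one way to decode a big-endian two's-complement field (the byte
order used on IBM mainframes) independently of the host. The helper
be_to_long is hypothetical and is not what reclin.c does.

```c
/*
 * Decode a big-endian two's-complement integer of 1 to 4 bytes
 * without relying on the byte order of the host.  Hypothetical
 * helper, not part of reclin.c; `width` must be between 1 and 4.
 */
long be_to_long(const unsigned char *p, int width)
{
    unsigned long u = 0;
    unsigned long sign;
    int i;

    for (i = 0; i < width; i++)
        u = (u << 8) | p[i];            /* most significant byte first */

    sign = 1UL << (8 * width - 1);      /* sign bit of a width-byte value */
    if (u & sign)                       /* negative: compute u - 2*sign */
        return (long)(u - sign) - (long)(sign - 1) - 1;
    return (long)u;
}
```

The subtraction is split in two steps so that the most negative 4-byte
value does not overflow a 32-bit long.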

Luciano Mannucci 
#---------------------------------- cut here ----------------------------------
# This is a shell archive.  Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by Luciano Mannucci <luke@modus> on Mon May 27 18:18:48 1991
#
# This archive contains:
#	README		reclin.1	reclin.c	
#
# Error checking via wc(1) will be performed.

LANG=""; export LANG
PATH=/bin:/usr/bin:$PATH; export PATH

echo x - README
cat >README <<'@EOF'
Reclin, by L. Mannucci and A. Fioravanti.

This is the "public domain" version of a little program written some
time ago in order to read magnetic tapes coming from IBM mainframes.
It is released under the terms stated below and it should not be difficult
to port to various platforms. It has been compiled and tested on
HP-UX 7.0, Sun Unix, SunOS (4.0.3 and 4.1.1), Microsoft Xenix,
SCO XENIX V 3.2.3, SCO UNIX 3.2, NCR UNIX V 2.1.2 .
It has been ported to MS-DOS too, using MS C 6.0.

HOW TO BUILD:
No Makefile is provided because the defaults of `make' should be enough. 
On many systems, just type "make reclin" after having "unshared" the file.
To build under MS-DOS, you need to link using binmode.obj.

BUGS AND PATCHES are to be mailed to: luke@modus.sublink.ORG.

NOTICE: THIS SOFTWARE IS RELEASED "AS IS", WITHOUT ANY WARRANTY
OF ANY KIND. PERMISSION IS GRANTED TO COPY, SELL, INSERT INTO
SOFTWARE PACKAGES OR OPERATING SYSTEMS, BIND INTO STANDARDS, TURN
INTO A DEMON, SUPPORT, UNSUPPORT, CHANGE, MODIFY FORGET OR DESTROY,
PROVIDED THAT NO RESPONSIBILITY WHATSOEVER IS LAID ON THE AUTHORS.
RELEASES PRIOR TO 2.0 ARE NOT IN THE PUBLIC DOMAIN.
@EOF
set `wc -lwc <README`
if test $1$2$3 != 231961152
then
	echo ERROR: wc results of README are $* should be 23 196 1152
fi

chmod 640 README

echo x - reclin.1
cat >reclin.1 <<'@EOF'
.TH RECLIN 1 "June 1991"
.| reclin version 2.0
.SH NAME
reclin - converts blocked files into lines
.SH SYNOPSIS
reclin -fmt|filefmt [file] [-sep_char] [count=#] [opt=..] [bs=#]
.SH DESCRIPTION
.I Reclin
reads binary bytes from the input file in blocks and outputs lines
with positional fields separated by a char (default blank). It is
intended for converting fixed record files with positional fields
(rather common in IBM or database oriented machines) into a format
suitable for use with
.I awk(1)
or
.I cut(1)
unix tools, as prescribed by the \fBformat string\fR.
If the format is too long or too complex it can be stored in a file. In
any case the format is taken from the first argument, as a string if it
begins with a minus sign ('-'), otherwise as a file name containing the
format. No default format is provided.
.LP
.SH FORMAT
The format string consists of a record length expressed in bytes followed by
one or more field specifiers, separated by any character
.B not
from the ranges [0-9], [a-z] or [A-Z]. A field specifier is an integer
representing the field width followed by a character indicating the
requested action. Possible actions currently available are:
.LP
.IP \fBs\fR:
Print out the field as is.
.LP
.IP \fBa\fR:
Print the field if it is in printable \fBascii\fR. Otherwise it prints
a warning on the standard error and converts the non-ascii byte into a blank.
.LP
.IP \fBb\fR:
Print the field turning it into
.B ascii
from \fBebcdic\fR. Printable-only characters are converted; non printable
characters are silently turned into blank.
.LP
.IP \fBp\fR:
Convert from packed decimal.
.LP
.IP \fBe\fR:
Expand from binary. Fields greater than 4 bytes are not allowed. This
conversion is highly machine-dependent as long as 
.I binary representation
is a computer architecture related topic.
.LP
.IP \fBu\fR:
Expand from unsigned binary. See the
.B e
field specifier for the restrictions.
.LP
.IP \fBE\fR:
Expand from binary, swapping bytes. Only 2 and 4 byte-long fields
are allowed.
.LP
.IP \fBU\fR:
Expand from unsigned binary. Same as the
.B E
option.
.LP
.IP \fBf\fR:
Skip the contents of the field and output the field separator only.
.LP
.IP \fBn\fR:
Skip the contents of the field but do not output the field separator.
.PP
Multiple similar fields can be specified using #x (where # is a number and
x is the character 'x'). For example,
.B 2s:2s:2s:2s
can be specified as \fB4x:2s\fR.
.PP
In a format file a valid example could be:
.br
80 2x 20s 20n 20s
.br
while the equivalent given as a string could be:
.br
-80:2x:20s:20n:20s
.SH ARGUMENTS
.PP
The first argument must be the
.B format string
and is mandatory. All other arguments are optional.
.TP 16
-sep_char
a char following a minus sign is taken as the output field separator char
(i.e.
.B -,
means that the output separator required is a comma ','). Default is
set to blank.
.TP 16
count=\fBn\fR
output only the first
.B n
records.
.TP 16
opt=\fBnsv\fR
Options are specified via this switch.
.B v
stands for print the
.I version
to the standard error.
.B n
means don't output the final
.I newline
character at the end of the record.
.B s
is for
.I silent
running: nothing will be output to the standard error.
.TP 16
bs=\fBn\fR
set input block size to
.B n
bytes. This option was intended to deal with physical blocked raw
mag tapes. Its use is discouraged: where possible, use
.I dd(1)
instead.
.PP
Any argument otherwise not recognised is taken as the file to be processed.
Only the first one is actually opened. If omitted, it defaults to the
standard input.
.LP
.SH AUTHORS
Luciano Mannucci && Alberto Fioravanti.
.SH BUGS
Please send bug reports, enhancements and patches to luke@modus.sublink.ORG
@EOF
set `wc -lwc <reclin.1`
if test $1$2$3 != 1336313692
then
	echo ERROR: wc results of reclin.1 are $* should be 133 631 3692
fi

chmod 644 reclin.1

echo x - reclin.c
cat >reclin.c <<'@EOF'
/*
 * Copyright 1991 by Luciano Mannucci
 *
 * NOTICE: THIS SOFTWARE IS RELEASED "AS IS", WITHOUT ANY WARRANTY
 * OF ANY KIND. PERMISSION IS GRANTED TO COPY, SELL, INSERT INTO
 * SOFTWARE PACKAGES OR OPERATING SYSTEMS, BIND INTO STANDARDS, TURN
 * INTO A DEMON, SUPPORT, UNSUPPORT, CHANGE, MODIFY FORGET OR DESTROY,
 * PROVIDED THAT NO RESPONSIBILITY WHATSOEVER IS LAID ON THE AUTHORS.
 * PLEASE INCLUDE THIS NOTICE WHEN DISTRIBUTING THIS SOURCE CODE.
 * RELEASES PRIOR TO 2.0 ARE NOT IN THE PUBLIC DOMAIN.
 */

/*
 * @(#) reclin.c 2.0 Mon May 27 18:07:16 MET 1991
 * Revised By Luciano Mannucci.
 * Enhancements: portability strongly decreased.
 * Alot of new bogus misfeatures added ;-).
 */

/*
 * @(#) reclin.c 1.6.1.1 (Luke & Al)	10/3/86 12:47:59
 * (Luciano Mannucci && Alberto Fioravanti)
 *
 * reclin is intended for IBM blocked files processing
 * in order to make them treatable
 * by the standard unix filter AWK .
 *
 */

#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <ctype.h>

#ifndef MSDOS
#include <values.h>
#endif

#define MSG	"reclin -fmt|filefmt [file] [-sep_char] [count=#] [opt=..] [bs=#]"

#define OBUFLEN		8192

#define K		1024

#ifndef MAXLONG
#define MAXLONG 4294967295L
#endif

#define NONZERO 3

char *whatid = " @(#) reclin.c 2.0 Mon May 27 18:07:16 MET 1991";

static unsigned *field;

static unsigned char *f_flag;

static unsigned reclen, nelem;

static unsigned long non_a;

long atol();

static struct stat EnQ;
static char *ptr, sep, nlrq;
static union {
	short	d;
	char	c[2];
} eshort;
static union {
	long	l;
	char	c[4];
} elong;
static char *fm1[3] = { "%d", "%hd", "%ld" };
static char *fm2[3] = { "%u", "%hu", "%lu" };

static unsigned char _toa[] = {
0x00,0x01,0x02,0x03,0x00,0x09,0x00,0x7f,0x00,0x00,0x00,0x0b,0x0c,0x0d,0x0e,0x0f,
0x10,0x11,0x12,0x13,0x00,0x0a,0x08,0x00,0x18,0x19,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x1c,0x00,0x00,0x0a,0x17,0x1b,0x00,0x00,0x00,0x00,0x00,0x05,0x06,0x07,
0x00,0x00,0x16,0x00,0x00,0x1e,0x00,0x04,0x00,0x00,0x00,0x00,0x14,0x15,0x00,0x1a,
0x20,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x2e,0x3c,0x28,0x2b,0x5b,
0x26,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x21,0x24,0x2a,0x29,0x3b,0x5d,
0x2d,0x2f,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x7c,0x2c,0x25,0x5f,0x3e,0x3f,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x3a,0x23,0x40,0x27,0x3d,0x22,
0x00,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x6a,0x6b,0x6c,0x6d,0x6e,0x6f,0x70,0x71,0x72,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x7e,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7a,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x7b,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x00,0x00,0x00,0x00,0x00,0x00,
0x7d,0x4a,0x4b,0x4c,0x4d,0x4e,0x4f,0x50,0x51,0x52,0x00,0x00,0x00,0x00,0x00,0x00,
0x5c,0x00,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5a,0x00,0x00,0x00,0x00,0x00,0x00,
0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x00,0x00,0x00,0x00,0x00,0x00,
};

unsigned char btoa(c)
{
	return (_toa[c&0xff] ? _toa[c&0xff] : ' ');
}

fSize(f)
char *f;
{
	if (stat(f, &EnQ) != 0) return 0;

	return (int) EnQ.st_size;
}

char *splits(s)
register char *s;
{
	while (s && *s) if (isfield(*s)) s++; else {
		while (*s && (!isfield(*s))) *s++ = 0;
		return s;}
	return (char *) 0;
}

isfield(c)
char c;
{
	if (c >='0' && c <='9') return 1;
	if (c >='a' && c <='z') return 1;
	if (c >='A' && c <='Z') return 1;
	return 0;
}

char *spalloc(n)
{
	char *p, *malloc();

	if ((p = malloc(n)) == (char *) 0) horror("Sorry, ", "not enough ram");
	return p;
}

char *sprealloc(s,n)
char *s;
{
	char *p, *realloc();

	if ((p = realloc(s, n)) == (char *) 0)
		horror("Sorry, ", "not enough ram yet");
	return p;
}

lnprt(s)
char *s;
{
	register count,incount, bct = 1;
	register notfound = 0;
	unsigned char by, oksep = 1;
	char **f;

	f = fm1;
	for(count=1;count<nelem;count++) {
		for(incount=notfound=0;incount<field[count];incount++,s++) {
			switch(f_flag[count]) {
				case 's':
					putchar(*s);
					notfound = 1;
					break;
				case 'a':
					if (*s < ' ' || *s > 0x7f) {
						fprintf(stderr,
							"Warning: non ASCII char %#X\n",(int) *s);
						non_a++;
						*s = ' ';
					}
					putchar(*s);
					notfound = 1;
					break;
				case 'b':
					putchar(btoa(*s));
					notfound = 1;
					break;
				case 'f':
					s += (field[count]-1);
					incount = field[count];
					notfound = 1;
					break;
				case 'n':
					s += (field[count]-1);
					incount = field[count];
					notfound = 1;
					oksep = 0;
					break;
				case 'p':
					by = *(s+(field[count]-1));
					by &= 0x0f;
					if (by == 0xd) putchar('-');
					for (bct =1; bct <= field[count]; bct++, s++) {
						if ( bct == field[count] ) {
							*s >>= 4;
							*s &= 0x0f;
							printf("%d", *s);
							break;
						} else {
							by = *s;
							*s >>= 4;
							*s &= 0x0f;
							by &= 0x0f;
							printf("%d%d", *s, by);
						}
					}
					incount = field[count];
					notfound = 1;
					break;
				case 'u':
					f = fm2;
				case 'e':
					switch (field[count]) {
						case 1:
							printf(f[0],(int) *s & 0xff);
							break;
						case 2:
							eshort.c[0] = *s;
							eshort.c[1] = s[1];
							printf(f[1], eshort.d);
							break;
						case 3:
							elong.c[0] = 0;
							elong.c[1] = *s;
							elong.c[2] = s[1];
							elong.c[3] = s[2];
							printf(f[2], elong.l);
							break;
						case 4:
							elong.c[0] = *s;
							elong.c[1] = s[1];
							elong.c[2] = s[2];
							elong.c[3] = s[3];
							printf(f[2], elong.l);
							break;
						default:
							break;
					}
					incount = field[count];
					notfound = 1;
					f = fm1;
					break;
				case 'U':
					f = fm2;
				case 'E':
					switch (field[count]) {
						case 1:
							printf(f[0],(int) *s & 0xff);
							break;
						case 2:
							eshort.c[1] = *s;
							eshort.c[0] = s[1];
							printf(f[1], eshort.d);
							break;
						case 4:
							elong.c[3] = *s;
							elong.c[2] = s[1];
							elong.c[1] = s[2];
							elong.c[0] = s[3];
							printf(f[2], elong.l);
							break;
						default:
							break;
					}
					incount = field[count];
					notfound = 1;
					f = fm1;
					break;
				case 'd':
				default:
					if (isdigit(*s)) {
						putchar(*s);
						notfound = 1;
					}
					break;
			}
		}
		if (notfound==0)
			putchar('0');
		if (oksep)
			putchar(sep);
		oksep = 1;
	}
	if (nlrq) putchar('\n');
}

FILE * f_open(s, ss)
char *s, *ss;
{
	FILE *iot = fopen(s, ss);
	if (iot == (FILE *) 0) horror("Can't open File: ", s);
	return iot;
}

FILE * f_reopen(s, ss, ioio)
char *s, *ss;
FILE *ioio;
{
	FILE *ioo = freopen(s, ss, ioio);
	if (ioo == (FILE *) 0) horror("Could not reopen: ", s);
	return ioo;
}

main(argc, argv)
char **argv;
{
	FILE *iop;
	FILE *ioi = (FILE *) 0;
	unsigned long count;
	register tmp = 0;
	unsigned long maxcct = MAXLONG;
   	char *destr, *s, petitbuf[81], *malloc();

	if (argc == 1) syntax(MSG);
	sep = ' ';
	nlrq = 1;

	/* Argument decoding */

	while (argc > 2) {
		if (s_cmp(argv[argc -1], "opt=")) {
			s=(&(argv[--argc][4]));
			while (*s) switch (*s++) {
				case 's':
					f_reopen("/dev/null","w",stderr);
					break;
				case 'v':
					fprintf(stderr, "%s\n", whatid);
					break;
				case 'n':
					nlrq = 0;
					break;
				default:
					fprintf(stderr,"Unknown option: %c\n", *(s-1));
			}
			continue;
		}
		if (s_cmp(argv[argc -1], "count=")) {
			maxcct = atol(&(argv[--argc][6]));
			maxcct--;
			continue;
		}
		if (s_cmp(argv[argc -1], "bs=")) {
			tmp = atoi(&(argv[--argc][3]));
			continue;
		}
		if (*argv[argc -1] == '-' && argv[argc -1][1]) {
			sep = *(argv[--argc]+1);
			continue;
		}
		if (ioi) fclose(ioi);
		ioi = f_open(argv[--argc], "r");
		if (ioi == (FILE *) 0) horror(argv[0], ": AaaaaAAAAAaaarrrgh!");
		continue;
	}

	if (ioi == (FILE *) 0) ioi = stdin;

	if (tmp) setvbuf(ioi, spalloc(tmp), _IOFBF, tmp);

	if (*argv[1] != '-') {
		if ((tmp = fSize(argv[1])) == 0)
			horror(argv[0], "file format empty");
		destr = spalloc(tmp);
		iop = f_open(argv[1], "r");
		if (! (fread(destr, 1, tmp, iop))) horror(argv[0], "Aaarrghhh!");
	} else {
		destr = &(argv[1][1]);
		tmp = strlen(destr);
	}

	if (tmp < 4) horror(argv[0], " invalid format");

	/* Format decoding */

	field = (unsigned *) spalloc(sizeof(int)*K);
	f_flag = (unsigned char *) spalloc(K);
	tmp = K;

	field[0] = atoi(destr);
	destr = splits(destr);

	for(nelem=1;;nelem++) {
		int i;
		char c;
		s = splits(destr);
		if (nelem >= tmp) {
			tmp += K;
			field = (unsigned *) sprealloc(field, sizeof(int)*tmp);
			f_flag = (unsigned char *) sprealloc(f_flag, tmp);
		}
		i = atoi(destr);
		c = destr[strlen(destr) -1];
		if (c != 'x') {
			field[nelem]=i;
			f_flag[nelem]=c;
		} else {
			int j,k;
			char h;
			if (s == (char *) 0) horror(argv[0], " invalid format(*)");
			destr = s;
			s = splits(destr);
			k = atoi(destr);
			h = destr[strlen(destr) -1];
			for (j = 0; j < i; j++,nelem++) {
				if (nelem >= tmp) {
					tmp += K;
					field = (unsigned *) sprealloc(field, sizeof(int)*tmp);
					f_flag = (unsigned char *) sprealloc(f_flag, tmp);
				}
				field[nelem]=k;
				f_flag[nelem]=h;
			}
		}
		destr = s;
		if (destr == (char *) 0) break;
	}

	/* Check that the sum of field lengths matches the record length */

	for(tmp=1,reclen=0;tmp<=nelem;tmp++)
		reclen+=field[tmp];

	if (field[0] != reclen) {
		sprintf(petitbuf,"bad record length count (is %d: expected %d)",
		reclen, field[0]);
		horror("reclin:", petitbuf);
	}
	
	
	/* ptr is the pointer to the buffer to read the record */
	ptr = spalloc(reclen);

	/* Record  reading */

	for(count=0;((tmp = fread(ptr,1,reclen,ioi)) == reclen);count++) {
		lnprt(ptr);		
		if (count >= maxcct) {
			tmp = 0;
			count++;
			break;
		}
	}
	
	/* The End */

	fprintf(stderr,"%s: %ld record(s) out\n", argv[0], count);
	if (tmp != 0)
		horror(argv[0], "bad last record (possible file out of format)");
	if (non_a) fprintf(stderr, "nonascii: %lu\n", non_a);
	fclose(iop);
	fclose(ioi);
}

s_cmp(str,s)
char *str,*s;
{
	for ( ; *s; s++,str++ )
		if ( *s != *str ) return 0;
	return 1;
}

horror(s, ss)
char *s, *ss;
{
	fprintf(stderr,"%s%s\n", s, ss);
	exit(NONZERO);
}

syntax(s)
char *s;
{
	horror("Syntax: ", s);
}
@EOF
set `wc -lwc <reclin.c`
if test $1$2$3 != 470130410244
then
	echo ERROR: wc results of reclin.c are $* should be 470 1304 10244
fi

chmod 666 reclin.c

exit 0
-- 
  _ _           __             Via Aleardo Aleardi, 12 - 20154 Milano (Italy)
 | | | _  _|   (__             PHONE : +39 2 3315328 FAX: +39 2 3315778
 | | |(_)(_||_|___) Srl        E-MAIL: luke@modus.sublink.ORG
______________________________ Software & Services for Advertising & Marketing

exit 0 # Just in case...
-- 
Kent Landfield                   INTERNET: kent@sparky.IMD.Sterling.COM
Sterling Software, IMD           UUCP:     uunet!sparky!kent
Phone:    (402) 291-8300         FAX:      (402) 291-4362
Please send comp.sources.misc-related mail to kent@uunet.uu.net.