luke@modus.sublink.org (Luciano Mannucci) (05/31/91)
Submitted-by: Luciano Mannucci <luke@modus.sublink.org>
Posting-number: Volume 20, Issue 30
Archive-name: reclin/part01

This program reads ugly blocked files from odd machines and converts
them into lines on a field-by-field basis. It can handle as many fields
as your computer has RAM for. Its syntax is a little old-fashioned
because it has been used heavily alongside the Unix command dd. It
should compile on any machine with a C compiler.

Luciano Mannucci

#---------------------------------- cut here ----------------------------------
# This is a shell archive. Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by Luciano Mannucci <luke@modus> on Mon May 27 18:18:48 1991
#
# This archive contains:
#   README reclin.1 reclin.c
#
# Error checking via wc(1) will be performed.

LANG=""; export LANG
PATH=/bin:/usr/bin:$PATH; export PATH

echo x - README
cat >README <<'@EOF'
Reclin, by L. Mannucci and A. Fioravanti.

This is the "public domain" version of a little program written some
time ago in order to read magnetic tapes coming from IBM mainframes.
It is released under the terms stated below and it should not be
difficult to port to various platforms.
It has been compiled and tested on HP-UX 7.0, Sun Unix,
SunOS (4.0.3 and 4.1.1), Microsoft Xenix, SCO XENIX V 3.2.3,
SCO UNIX 3.2, NCR UNIX V 2.1.2 .
It has been ported to MS-DOS too, using MS C 6.0.

HOW TO BUILD:
No Makefile is provided because the defaults of `make' should be enough.
On many systems, just type "make reclin" after having "unshared" the file.
To build under MS-DOS, you need to link using binmode.obj.

BUGS AND PATCHES are to be mailed to: luke@modus.sublink.ORG.

NOTICE:
THIS SOFTWARE IS RELEASED "AS IS", WITHOUT ANY WARRANTY OF ANY KIND.
PERMISSION IS GRANTED TO COPY, SELL, INSERT INTO SOFTWARE PACKAGES OR
OPERATING SYSTEMS, BIND INTO STANDARDS, TURN INTO A DEMON, SUPPORT,
UNSUPPORT, CHANGE, MODIFY FORGET OR DESTROY, PROVIDED THAT NO
RESPONSABILITY WHATSOEVER IS LAID ON THE AUTHORS.
RELEASES PRIOR TO 2.0 ARE NOT IN THE PUBLIC DOMAIN.
@EOF
set `wc -lwc <README`
if test $1$2$3 != 231961152
then
    echo ERROR: wc results of README are $* should be 23 196 1152
fi

chmod 640 README

echo x - reclin.1
cat >reclin.1 <<'@EOF'
.TH RECLIN 1 "June 1991"
.| reclin version 2.0
.SH NAME
reclin - converts blocked files into lines
.SH SYNOPSIS
reclin -fmt|filefmt [file] [-sep_char] [count=#] [opt=..] [bs=#]
.SH DESCRIPTION
.I Reclin
reads binary bytes from the input file in blocks and outputs lines
with positional fields separated by a char (default blank).
It is intended for converting fixed record files with positional fields
(rather common in IBM or database oriented machines) into a format
suitable for use with
.I awk(1)
or
.I cut(1)
unix tools accordingly with the \fB format string\fR.
If the format is too long or too complex it can be stored in a file.
In any case the format is taken from the first argument, as a string
if it begins with a minus sign ('-'), otherwise as a file name
containing the format. No default format is provided.
.LP
.SH FORMAT
The format string consists of a record length expressed in bytes
followed by one or more field specifiers, separated by any character
.B not
from the ranges [0-9], [a-z] or [A-Z].
A field specifier is an integer representing the field width followed
by a character indicating the requested action.
Possible actions currently available are:
.LP
.IP \fBs\fR:
Print out the field as is.
.LP
.IP \fBa\fR:
Print the field if it is in printable \fBascii\fR.
Otherwise it prints a warning on the standard error and converts the
non-ascii byte into a blank.
.LP
.IP \fBb\fR:
Print the field turning it into
.B ascii
from \fBebcdic\fR.
Printable-only characters are converted; non printable characters are
silently turned into blank.
.LP
.IP \fBp\fR:
Convert from packed decimal.
.LP
.IP \fBe\fR:
Expand from binary. Fields greater than 4 bytes are not allowed.
This conversion is highly machine-dependent as long as
.I binary
representation is a computer architecture related topic.
.LP
.IP \fBu\fR:
Expand from unsigned binary. See the
.B e
field specifier for the restrictions.
.LP
.IP \fBE\fR:
Expand from binary, swapping bytes.
Only 2 and 4 byte-long fields are allowed.
.LP
.IP \fBU\fR:
Expand from unsigned binary. Same as the
.B E
option.
.LP
.IP \fBf\fR:
Skip the contents of the field and output the field separator only.
.LP
.IP \fBn\fR:
Skip the contents of the field but do not output the field separator.
.PP
Multiple similar fields can be specified using #x (where # is a number
and x is the character 'x'). For example,
.B 2s:2s:2s:2s
can be specified as \fB4x:2s\fR.
.PP
In a format file a valid example could be:
.br
80 2x 20s 20n 20s
.br
while the equivalent given as a string could be:
.br
-80:2x:20s:20n:20s
.SH ARGUMENTS
.PP
The first argument must be the
.B format string
and is mandatory. All other arguments are optional.
.TP 16
-sep_char
a char following a minus sign is taken as the output field separator
char (i.e.
.B -,
means that the output separator required is a comma ',').
Default is set to blank.
.TP 16
count=\fBn\fR
output only the first
.B n
records.
.TP 16
opt=\fBnsv\fR
Options are specified via this switch.
.B v
stands for print the
.I version
to the standard error.
.B n
means don't output the final
.I newline
character at the end of the record.
.B s
is for
.I silent
running: nothing will be output to the standard error.
.TP 16
bs=\fBn\fR
set input block size to
.B n
bytes. This option was intended to deal with physical blocked raw
mag tapes. Its use is discouraged: where possible, use
.I dd(1)
instead.
.PP
Any argument otherwise not recognised is taken as the file to be processed.
Only the first one is actually opened.
If omitted, it defaults to the standard input.
.LP
.SH AUTHORS
Luciano Mannucci && Alberto Fioravanti.
.SH BUGS
Please send bug reports, enhancements and patches to luke@modus.sublink.ORG
@EOF
set `wc -lwc <reclin.1`
if test $1$2$3 != 1336313692
then
    echo ERROR: wc results of reclin.1 are $* should be 133 631 3692
fi

chmod 644 reclin.1

echo x - reclin.c
cat >reclin.c <<'@EOF'
/*
 * Copyright 1991 by Luciano Mannucci
 *
 * NOTICE: THIS SOFTWARE IS RELEASED "AS IS", WITHOUT ANY WARRANTY
 * OF ANY KIND. PERMISSION IS GRANTED TO COPY, SELL, INSERT INTO
 * SOFTWARE PACKAGES OR OPERATING SYSTEMS, BIND INTO STANDARDS, TURN
 * INTO A DEMON, SUPPORT, UNSUPPORT, CHANGE, MODIFY FORGET OR DESTROY,
 * PROVIDED THAT NO RESPONSABILITY WHATSOEVER IS LAID ON THE AUTHORS.
 * PLEASE INCLUDE THIS NOTICE WHEN DISTRIBUTING THIS SOURCE CODE.
 * RELEASES PRIOR TO 2.0 ARE NOT IN THE PUBLIC DOMAIN.
 */

/*
 * @(#) reclin.c 2.0 Mon May 27 18:07:16 MET 1991
 * Revised By Luciano Mannucci.
 * Enhancements: portability strongly decreased.
 * Alot of new bogus misfeatures added ;-).
 */

/*
 * @(#) reclin.c 1.6.1.1 (Luke & Al) 10/3/86 12:47:59
 * (Luciano Mannucci && Alberto Fioravanti)
 *
 * reclin is intended for IBM blocked files processing
 * in order to make them treatable
 * by the standard unix filter AWK .
 *
 */

#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <ctype.h>
#ifndef MSDOS
#include <values.h>
#endif

#define MSG "reclin -fmt|filefmt [file] [-sep_char] [count=#] [opt=..] [bs=#]"
#define OBUFLEN 8192
#define K 1024
#ifndef MAXLONG
#define MAXLONG 4294967295L
#endif
#define NONZERO 3

char *whatid = " @(#) reclin.c 2.0 Mon May 27 18:07:16 MET 1991";

static unsigned *field;
static unsigned char *f_flag;
static unsigned reclen, nelem;
static unsigned long non_a;
long atol();
static struct stat EnQ;
static char *ptr, sep, nlrq;
static union { short d; char c[2]; } eshort;
static union { long l; char c[4]; } elong;
static char *fm1[3] = { "%d", "%hd", "%ld" };
static char *fm2[3] = { "%u", "%hu", "%lu" };

static unsigned char _toa[] = {
0x00,0x01,0x02,0x03,0x00,0x09,0x00,0x7f,0x00,0x00,0x00,0x0b,0x0c,0x0d,0x0e,0x0f,
0x10,0x11,0x12,0x13,0x00,0x0a,0x08,0x00,0x18,0x19,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x1c,0x00,0x00,0x0a,0x17,0x1b,0x00,0x00,0x00,0x00,0x00,0x05,0x06,0x07,
0x00,0x00,0x16,0x00,0x00,0x1e,0x00,0x04,0x00,0x00,0x00,0x00,0x14,0x15,0x00,0x1a,
0x20,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x2e,0x3c,0x28,0x2b,0x5b,
0x26,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x21,0x24,0x2a,0x29,0x3b,0x5d,
0x2d,0x2f,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x7c,0x2c,0x25,0x5f,0x3e,0x3f,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x3a,0x23,0x40,0x27,0x3d,0x22,
0x00,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x6a,0x6b,0x6c,0x6d,0x6e,0x6f,0x70,0x71,0x72,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x7e,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7a,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x7b,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x00,0x00,0x00,0x00,0x00,0x00,
0x7d,0x4a,0x4b,0x4c,0x4d,0x4e,0x4f,0x50,0x51,0x52,0x00,0x00,0x00,0x00,0x00,0x00,
0x5c,0x00,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5a,0x00,0x00,0x00,0x00,0x00,0x00,
0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x00,0x00,0x00,0x00,0x00,0x00,
};

unsigned char btoa(c)
{
    return (_toa[c&0xff] ?
        _toa[c&0xff] : ' ');
}

fSize(f)
char *f;
{
    if (stat(f, &EnQ) != 0)
        return 0;
    return (int) EnQ.st_size;
}

char *splits(s)
register char *s;
{
    while (s && *s)
        if (isfield(*s))
            s++;
        else {
            while (*s && (!isfield(*s)))
                *s++ = 0;
            return s;
        }
    return (char *) 0;
}

isfield(c)
char c;
{
    if (c >='0' && c <='9') return 1;
    if (c >='a' && c <='z') return 1;
    if (c >='A' && c <='Z') return 1;
    return 0;
}

char *spalloc(n)
{
    char *p, *malloc();

    if ((p = malloc(n)) == (char *) 0)
        horror("Sorry, ", "not enough ram");
    return p;
}

char *sprealloc(s,n)
char *s;    /* block to resize; n (the new size) is an implicit int */
{
    char *p, *realloc();

    if ((p = realloc(s, n)) == (char *) 0)
        horror("Sorry, ", "not enough ram yet");
    return p;
}

lnprt(s)
char *s;
{
    register count,incount, bct = 1;
    register notfound = 0;
    unsigned char by, oksep = 1;
    char **f;

    f = fm1;
    for(count=1;count<nelem;count++) {
        for(incount=notfound=0;incount<field[count];incount++,s++) {
            switch(f_flag[count]) {
            case 's':
                putchar(*s);
                notfound = 1;
                break;
            case 'a':
                if (*s < ' ' || *s > 0x7f) {
                    fprintf(stderr, "Warning: non ASCII char %#X\n",(int) *s);
                    non_a++;
                    *s = ' ';
                }
                putchar(*s);
                notfound = 1;
                break;
            case 'b':
                putchar(btoa(*s));
                notfound = 1;
                break;
            case 'f':
                s += (field[count]-1);
                incount = field[count];
                notfound = 1;
                break;
            case 'n':
                s += (field[count]-1);
                incount = field[count];
                notfound = 1;
                oksep = 0;
                break;
            case 'p':
                /* low nibble of the last byte is the sign: 0xd is minus */
                by = *(s+(field[count]-1));
                by &= 0x0f;
                if (by == 0xd)
                    putchar('-');
                for (bct =1; bct <= field[count]; bct++, s++) {
                    if ( bct == field[count] ) {
                        /* last byte holds one digit plus the sign */
                        *s >>= 4;
                        *s &= 0x0f;
                        printf("%d", *s);
                        break;
                    } else {
                        by = *s;
                        *s >>= 4;
                        *s &= 0x0f;
                        by &= 0x0f;
                        printf("%d%d", *s, by);
                    }
                }
                incount = field[count];
                notfound = 1;
                break;
            case 'u':
                f = fm2;
                /* FALLTHROUGH */
            case 'e':
                /* copy the field's consecutive bytes in storage order */
                switch (field[count]) {
                case 1:
                    printf(f[0],(int) *s & 0xff);
                    break;
                case 2:
                    eshort.c[0] = s[0];
                    eshort.c[1] = s[1];
                    printf(f[1], eshort.d);
                    break;
                case 3:
                    elong.c[0] = 0;
                    elong.c[1] = s[0];
                    elong.c[2] = s[1];
                    elong.c[3] = s[2];
                    printf(f[2], elong.l);
                    break;
                case 4:
                    elong.c[0] = s[0];
                    elong.c[1] = s[1];
                    elong.c[2] = s[2];
                    elong.c[3] = s[3];
                    printf(f[2], elong.l);
                    break;
                default:
                    break;
                }
                /* step over the field; the loop adds the last increment */
                s += (field[count]-1);
                incount = field[count];
                notfound = 1;
                f = fm1;
                break;
            case 'U':
                f = fm2;
                /* FALLTHROUGH */
            case 'E':
                /* same as 'e', with the byte order reversed */
                switch (field[count]) {
                case 1:
                    printf(f[0],(int) *s & 0xff);
                    break;
                case 2:
                    eshort.c[1] = s[0];
                    eshort.c[0] = s[1];
                    printf(f[1], eshort.d);
                    break;
                case 4:
                    elong.c[3] = s[0];
                    elong.c[2] = s[1];
                    elong.c[1] = s[2];
                    elong.c[0] = s[3];
                    printf(f[2], elong.l);
                    break;
                default:
                    break;
                }
                s += (field[count]-1);
                incount = field[count];
                notfound = 1;
                f = fm1;
                break;
            case 'd':
            default:
                if (isdigit(*s)) {
                    putchar(*s);
                    notfound = 1;
                }
                break;
            }
        }
        if (notfound==0)
            putchar('0');
        if (oksep)
            putchar(sep);
        oksep = 1;
    }
    if (nlrq)
        putchar('\n');
}

FILE *
f_open(s, ss)
char *s, *ss;
{
    FILE *iot = fopen(s, ss);

    if (iot == (FILE *) 0)
        horror("Can't open File: ", s);
    return iot;
}

FILE *
f_reopen(s, ss, ioio)
char *s, *ss;
FILE *ioio;
{
    FILE *ioo = freopen(s, ss, ioio);

    if (ioo == (FILE *) 0)
        horror("Can't reopen File: ", s);
    return ioo;
}

main(argc, argv)
char **argv;
{
    FILE *iop;
    FILE *ioi = (FILE *) 0;
    unsigned long count;
    register tmp = 0;
    unsigned long maxcct = MAXLONG;
    char *destr, *s, petitbuf[81], *malloc();

    if (argc == 1)
        syntax(MSG);
    sep = ' ';
    nlrq = 1;

    /* Argument decoding */
    while (argc > 2) {
        if (s_cmp(argv[argc -1], "opt=")) {
            s=(&(argv[--argc][4]));
            while (*s)
                switch (*s++) {
                case 's':
                    f_reopen("/dev/null","w",stderr);
                    break;
                case 'v':
                    fprintf(stderr, "%s\n", whatid);
                    break;
                case 'n':
                    nlrq = 0;
                    break;
                default:
                    fprintf(stderr,"Unknown option: %c\n", *(s-1));
                }
            continue;
        }
        if (s_cmp(argv[argc -1], "count=")) {
            maxcct = atol(&(argv[--argc][6]));
            maxcct--;
            continue;
        }
        if (s_cmp(argv[argc -1], "bs=")) {
            tmp = atoi(&(argv[--argc][3]));
            continue;
        }
        if (*argv[argc -1] == '-' && argv[argc -1][1]) {
            sep = *(argv[--argc]+1);
            continue;
        }
        if (ioi)
            fclose(ioi);
        ioi = f_open(argv[--argc], "r");
        if (ioi == (FILE *) 0)
            horror(argv[0], ": AaaaaAAAAAaaarrrgh!");
        continue;
    }
    if (ioi == (FILE *) 0)
        ioi = stdin;
    if (tmp)
        setvbuf(ioi, spalloc(tmp), _IOFBF, tmp);
    if (*argv[1] !=
            '-') {
        if ((tmp = fSize(argv[1])) == 0)
            horror(argv[0], "file format empty");
        /* one extra byte so the format can be handled as a C string */
        destr = spalloc(tmp + 1);
        iop = f_open(argv[1], "r");
        if (! (fread(destr, 1, tmp, iop)))
            horror(argv[0], "Aaarrghhh!");
        destr[tmp] = '\0';
    } else {
        iop = (FILE *) 0;    /* no format file to close later */
        destr = &(argv[1][1]);
        tmp = strlen(destr);
    }
    if (tmp < 4)
        horror(argv[0], " invalid format");

    /* Format decoding */
    field = (unsigned *) spalloc(sizeof(int)*K);
    f_flag = (unsigned char *) spalloc(K);
    tmp = K;
    field[0] = atoi(destr);
    destr = splits(destr);
    for(nelem=1;;nelem++) {
        int i;
        char c;

        s = splits(destr);
        if (nelem >= tmp) {
            tmp += K;
            field = (unsigned *) sprealloc(field, sizeof(int)*tmp);
            f_flag = (unsigned char *) sprealloc(f_flag, tmp);
        }
        i = atoi(destr);
        c = destr[strlen(destr) -1];
        if (c != 'x') {
            field[nelem]=i;
            f_flag[nelem]=c;
        } else {
            int j,k;
            char h;

            /* "#x" repeats the specifier that follows it # times */
            if (s == (char *) 0)
                horror(argv[0], " invalid format(*)");
            destr = s;
            s = splits(destr);
            k = atoi(destr);
            h = destr[strlen(destr) -1];
            for (j = 0; j < i; j++,nelem++) {
                if (nelem >= tmp) {
                    tmp += K;
                    field = (unsigned *) sprealloc(field, sizeof(int)*tmp);
                    f_flag = (unsigned char *) sprealloc(f_flag, tmp);
                }
                field[nelem]=k;
                f_flag[nelem]=h;
            }
        }
        destr = s;
        if (destr == (char *) 0)
            break;
    }

    /* Check that the sum of field lengths matches the record length */
    for(tmp=1,reclen=0;tmp<=nelem;tmp++)
        reclen+=field[tmp];
    if (field[0] != reclen) {
        sprintf(petitbuf,"bad record length count (is %d: expected %d)",
            reclen, field[0]);
        horror("reclin:", petitbuf);
    }

    /* ptr is the pointer to the buffer to read the record */
    ptr = spalloc(reclen);

    /* Record reading */
    for(count=0;((tmp = fread(ptr,1,reclen,ioi)) == reclen);count++) {
        lnprt(ptr);
        if (count >= maxcct) {
            tmp = 0;
            count++;
            break;
        }
    }

    /* The End */
    fprintf(stderr,"%s: %ld record(s) out\n", argv[0], count);
    if (tmp != 0)
        horror(argv[0], "bad last record (possible file out of format)");
    if (non_a)
        fprintf(stderr, "nonascii: %lu\n", non_a);
    if (iop)
        fclose(iop);
    fclose(ioi);
}

s_cmp(str,s)
char *str,*s;
{
    for ( ; *s; s++,str++ )
        if ( *s != *str )
            return 0;
    return 1;
}

horror(s, ss)
char *s, *ss;
{
    fprintf(stderr,"%s%s\n", s, ss);
    exit(NONZERO);
}

syntax(s)
char *s;
{
    horror("Syntax: ", s);
}
@EOF
set `wc -lwc <reclin.c`
if test $1$2$3 != 470130410244
then
    echo ERROR: wc results of reclin.c are $* should be 470 1304 10244
fi

chmod 666 reclin.c

exit 0
--
 _ _ __             Via Aleardo Aleardi, 12 - 20154 Milano (Italy)
| | | _ _| (__      PHONE : +39 2 3315328 FAX: +39 2 3315778
| | |(_)(_||_|___) Srl   E-MAIL: luke@modus.sublink.ORG
______________________________
Software & Services for Advertising & Marketing

exit 0 # Just in case...
--
Kent Landfield                   INTERNET: kent@sparky.IMD.Sterling.COM
Sterling Software, IMD           UUCP:     uunet!sparky!kent
Phone:    (402) 291-8300         FAX:      (402) 291-4362
Please send comp.sources.misc-related mail to kent@uunet.uu.net.
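
A note on the p specifier: reclin.1 says only that it converts "from
packed decimal". As a rough, standalone illustration of that layout,
here is a minimal sketch in ANSI C. The function name unpack_decimal
and its return convention are made up for this note and are not part of
reclin.c. Like the 'p' case in reclin.c it treats a low nibble of 0xD
in the last byte as the minus sign; unlike reclin.c it writes to a
caller-supplied buffer and rejects digit nibbles above 9, which is an
assumption about what a caller would want rather than the original
behaviour.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not in reclin.c): decode a packed decimal field
 * of `width` bytes at `src` into a NUL-terminated string in `out`.
 * Every byte holds two digits, except the last one, which holds one
 * digit in its high nibble and the sign in its low nibble.
 * Returns 0 on success, -1 if the field is empty, `out` is too small,
 * or a digit nibble is not in the range 0..9.
 */
int unpack_decimal(const unsigned char *src, size_t width,
                   char *out, size_t outlen)
{
    size_t i, pos = 0;
    unsigned hi, lo;

    assert(src != NULL && out != NULL);
    if (width == 0)
        return -1;
    /* 2*width-1 digits, an optional '-', and the terminating NUL */
    if (outlen < 2 * width + 1)
        return -1;

    /* sign nibble: only 0xD is treated as negative, as in reclin.c */
    if ((src[width - 1] & 0x0f) == 0x0d)
        out[pos++] = '-';

    for (i = 0; i < width; i++) {
        hi = (src[i] >> 4) & 0x0f;
        lo = src[i] & 0x0f;
        if (hi > 9)
            return -1;
        out[pos++] = (char)('0' + hi);
        if (i + 1 < width) {    /* the last low nibble is the sign */
            if (lo > 9)
                return -1;
            out[pos++] = (char)('0' + lo);
        }
    }
    out[pos] = '\0';
    return 0;
}
```

For instance, the three bytes 0x12 0x34 0x5D decode to the string
"-12345", which is the text a 3p field is meant to produce.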