root@qetzal.UUCP (Admin) (12/18/87)
[Submitted courtesy of John Rupley, reachable at arizona!rupley!local]

(The following is for those who have purchased for ca $300 or otherwise
have legal access to the ATT source code for the new awk. If you use awk
regularly for text or database manipulation, the new awk is really worth
its price. See the book, "The Awk Programming Language", by Aho,
Kernighan and Weinberger (Addison-Wesley, 198[78]) for a description.)

The new awk compiles and runs on an 80286 system (Microport sys5/AT 2.2),
but with some problems that are not found on a Vax:

(1) The system hangs on "print" of numerical values larger than
    31 bits. Work-around is to use "printf".

(2) Regular expressions containing a bracketed character group
    (character class) dumped core. Fix is to repair int and int *
    declarations and casts and sizeofs.

(3) There are various compilation problems, not related to code,
    that have fixes or work-arounds.

(4) Richard Stevens has reported a bug in and a fix of error()
    in lib.c.

Details of (1) to (4) are given below.

I would appreciate learning of a fix for (1). Anyone else have nawk
problems that they have resolved? I suspect that there are some still
submerged monsters that will surface when running on a 286 machine.
The new awk is truly improved and appears to be worth the problem of
bringing it up.

John Rupley
 uucp: ..{ihnp4 | hao!noao}!arizona!rupley!local
 internet: rupley!local@megaron.arizona.edu
 telex: 9103508679(JARJAR)
 (H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
 (O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929

**************************************************************************
Sat Nov 21 16:01:23 MST 1987

NEW AWK-- ON MICROPORT SYS5/AT (80286)

(1) The system hangs on "print" of numerical values larger than 31 bits.

BUG:
The following hangs the system with $1=46341; ok with $1=46340
(input mode does not matter-- same effect with 4.6341e4):

    echo $1 | awk '{print $1; x = $1*$1; print x}'

WORK-AROUND:
The following code is fine for large values, eg $1=1e100
(although *very* large values hang the system):

    echo $1 | awk '{print $1; x = $1*$1; printf("%20.6e\n", x)}'

COMMENTS:
The system hangs when nawk attempts to "print" a numerical result
larger than 31 bits (2.147 * 10^9). This was found through testing
under the awktest.a programs t.i.x and T.misc.

The hang is *only* with the function "print". Large numbers (greater
than 10^100) can be handled with "printf". Similarly, no problem with
computation ( *, exp(), ...), except again a hang at very large
results, much greater than 10^100.

There is no problem with nawk on the vax, for which the limit is
127 bits in the output of the "print" function (at least in the
exercises used).

The uport v2.2 distribution version of old awk seems to hang on the
above print statement also.

(2) Regular expressions containing a bracketed character group dumped core.

BUG:
any /re/ of the type /..[..]../ dumps core.

FIX (NOT GUARANTEED):
in awk.h, declare lval as int *
in b.c, change several casts and sizeofs, from int to int * or
int * to int, as appropriate
(See the diff file below.)

COMMENTS:
The structure element re[].lval is used to store both characters
(as int) and pointers to arrays of characters. A vax is happy with
the structure declaration of lval as int. However, with 16 bit int's,
uport is not. The kludged code passes the awktest.a tests, but no
guarantees-- I did as little tracing of the logic as possible, to the
extent of not even listing the code for b.c, much less that for the
other routines.

(3A) Compilation problem--

BUG:
getsval() stands alone in a line of code

FIX:
insert dummy lvalue, in lib.c and run.c, at several places
(See the diff file below.)

COMMENTS:
PCC under uport coughs, apparently at a conditional expression that
lacks an assignment. Specifically, getsval() is #defined as a
several-branched conditional, some branches of which do not
constitute a statement.

(3B) Other remarks on compilation on a 286 machine--

The version of yacc from uport gave an out of space error in the nawk
compilation. Yacc from sys5 source, compiled with the "HUGE" size
option, worked. Alternatively, one might get y.tab.[ch] and lex.yy.c
by use of yacc and lex on a vax, ie generate C code from awk.g.y and
awk.lx.l on a vax, then move it over and compile it on a uport system.

Small model compilation of nawk gave a working version, but its data
segment is too small for reasonable temporaries, etc, as found by
testing with the awktest.a programs and data files, which are not
particularly large.

Large model compilation (-Ml) gave a version that passes the awktest.a
tests, after repair of the problems noted above, excepting the two
tests that hang the system.

For the debug option (nawk -d ....) to work, various dprint statements
must have %o changed to %lo.

(4) There is a bug in the print operations of error() of lib.c,
reported by Richard Stevens in comp.bugs.sys5. His fix is included in
the diff file below.

*********************diff output***************************
(distribution compared with modified)

awk.h compared with ../nawk/awk.h
203c203
< int lval;
---
> int *lval;

b.c compared with ../nawk/b.c
115c115
< if ((f->posns[0] = (int *) Calloc(1, *(f->re[0].lfollow)*sizeof(int))) == NULL)
---
> if ((f->posns[0] = (int *) Calloc(1, *(f->re[0].lfollow)*sizeof(int *))) == NULL)
117c117
< if ((f->posns[1] = (int *) Calloc(1, sizeof(int))) == NULL)
---
> if ((f->posns[1] = (int *) Calloc(1, sizeof(int *))) == NULL)
137c137
< if ((f->posns[2] = (int *) Calloc(1, (k+1)*sizeof(int))) == NULL)
---
> if ((f->posns[2] = (int *) Calloc(1, (k+1)*sizeof(int *))) == NULL)
278c278
< f->re[(int) left(v)].lval = (int) right(v);
---
> f->re[(int) left(v)].lval = (int *) right(v);
283c283
< if ((p = (int *) Calloc(1, (setcnt+1)*sizeof(int))) == NULL)
---
> if ((p = (int *) Calloc(1, (setcnt+1)*sizeof(int *))) == NULL)
439c439
< if ((f->posns[2] = (int *) Calloc(1, (k+1)*sizeof(int))) == NULL)
---
> if ((f->posns[2] = (int *) Calloc(1, (k+1)*sizeof(int *))) == NULL)
491c491
< if ((f->posns[2] = (int *) Calloc(1, (k+1)*sizeof(int))) == NULL)
---
> if ((f->posns[2] = (int *) Calloc(1, (k+1)*sizeof(int *))) == NULL)
707c707
< if (k == CHAR && c == f->re[p[i]].lval
---
> if (k == CHAR && c == (int ) f->re[p[i]].lval
754c754
< if ((p = (int *) Calloc(1, (setcnt+1)*sizeof(int))) == NULL)
---
> if ((p = (int *) Calloc(1, (setcnt+1)*sizeof(int *))) == NULL)

lib.c compared with ../nawk/lib.c
183a184
> uchar *dummy;
190c191
< getsval(recloc);
---
> dummy = getsval(recloc);
400a402,404
> /* bug fix richard stevens */
> register int c;
>
404a409,411
>
> /* bug fix richard stevens */
> /*
405a413,417
> */
> while (c = *s++)
> putc(c, stderr);
> /* end of bug fix */
>

makefile compared with ../nawk/makefile
10c10,11
< CFLAGS = -g
---
> #CFLAGS = -g
> CFLAGS = -g -Ml

run.c compared with ../nawk/run.c
901a902
> uchar *dummy;
905,906c906,907
< getsval(x);
< getsval(y);
---
> dummy = getsval(x);
> dummy = getsval(y);
1346a1348
> uchar *dummy;
1349c1351
< getsval(x);
---
> dummy = getsval(x);

--
//////////////////286 Moderator -- comp.unix.microport\\\\\\\\\\\\\\\\\
Email to microport@uwspan for info on the newsgroup comp.unix.microport.
otherwise mail to microport@uwspan with a Subject containing one of:
386 286 Bug Source Merge or "Send Buglist"
(rutgers!uwvax!uwspan!microport)