liberte@uiucdcs.UUCP (04/28/84)
#N:uiucdcs:8200019:000:3684
uiucdcs!liberte    Apr 27 16:14:00 1984

Subject: Awk doesn't allow \b \r \f and \ddd in strings, regular
         expressions (except \ddd) or character classes.
Index:   /usr/src/bin/awk/awk.lx.l 4.2BSD

Description:
        According to the awk manual, "The version of printf is identical
        to that used with C".  This seems to indicate that all the escape
        sequences of C should work.  But only \n, \t, \\ and \" work.
        Thus the only way to get a \f is to printf "%c" followed by the
        code for FF, which I just now failed to find within 2 minutes.

        I subsequently learned that all the escape sequence processing
        for awk is done by lex (similarly for C), so you can forget about
        trying to print "\\" "n".  In the interest of compatibility with
        C, I added the missing code.  I also fixed some other
        inconsistencies:

        1) \ddd fixed to allow 1, 2 or 3 octal digits only.
           (awk requires 3 digits and allows digits 8 or 9)
        2) \. will ignore \ if . is not recognized.
           (awk does not ignore \ except in regular expressions)

Repeat-By:
        Try this awk program:

        /\ignore[\d]/ { print "\ignored \\"}
        /\\/ { print "formfeed\fbackspace\bcarriage return\r\""}
        /\10/ { print "not back space( \010), but \10"}
        /[\010\t\7]/ { print "back space, tab or bell\7" }

Fix:
        In the following changes to awk.lx.l, *** is the original.
        I edited the diff output to make it easier to read, so the line
        numbers are gone.

*** /tmp/,RCSt1002731   Sat Apr 21 20:11:09 1984
--- awk.lx.l    Sat Apr 21 20:09:44 1984
***************
*** 24,29
  A     [a-zA-Z_]
  B     [a-zA-Z0-9_]
  D     [0-9]
  WS    [ \t]
  %%
--- 25,31 -----
  A     [a-zA-Z_]
  B     [a-zA-Z0-9_]
  D     [0-9]
+ OD    [0-7]
  WS    [ \t]
  %%
***************
*** 122...
  <reg>")"      RETURN(')');
  <reg>"^"      RETURN('^');
  <reg>"$"      RETURN('$');
! <reg>\\{D}{D}{D}      { sscanf(yytext+1, "%o", &yylval); RETURN(CHAR); }
  <reg>\\.      {       if (yytext[1]=='n') yylval = '\n';
                        else if (yytext[1] == 't') yylval = '\t';
                        else yylval = yytext[1];
                        RETURN(CHAR);
                }
--- ... -----
  <reg>")"      RETURN(')');
  <reg>"^"      RETURN('^');
  <reg>"$"      RETURN('$');
! <reg>\\{OD}{OD}?{OD}? { sscanf(yytext+1, "%o", &yylval); RETURN(CHAR); }
  <reg>\\.      {       if (yytext[1]=='n') yylval = '\n';
                        else if (yytext[1] == 't') yylval = '\t';
+                       else if (yytext[1] == 'b') yylval = '\b';
+                       else if (yytext[1] == 'r') yylval = '\r';
+                       else if (yytext[1] == 'f') yylval = '\f';
                        else yylval = yytext[1];
                        RETURN(CHAR);
                }
***************
*** 137,...
                yylval = (hack)setsymtab(cbuf, s, 0.0, CON|STR, symtab);
                RETURN(STRING);
                }
  <str>\n               { yyerror("newline in string"); lineno++; BEGIN A; }
  <str>"\\\""           { cbuf[clen++]='"'; }
  <str,chc>"\\"n        { cbuf[clen++]='\n'; }
  <str,chc>"\\"t        { cbuf[clen++]='\t'; }
  <str,chc>"\\\\"       { cbuf[clen++]='\\'; }
  <str>.                { CADD; }
--- ... -----
                yylval = (hack)setsymtab(cbuf, s, 0.0, CON|STR, symtab);
                RETURN(STRING);
                }
  <str>\n               { yyerror("newline in string"); lineno++; BEGIN A; }
  <str>"\\\""           { cbuf[clen++]='"'; }
+ <str,chc>"\\"{OD}{OD}?{OD}?   { sscanf(yytext+1, "%o", &yylval);
+                                 cbuf[clen++] = (char)yylval; }
  <str,chc>"\\"n        { cbuf[clen++]='\n'; }
  <str,chc>"\\"t        { cbuf[clen++]='\t'; }
+ <str,chc>"\\"b        { cbuf[clen++]='\b'; }
+ <str,chc>"\\"r        { cbuf[clen++]='\r'; }
+ <str,chc>"\\"f        { cbuf[clen++]='\f'; }
  <str,chc>"\\\\"       { cbuf[clen++]='\\'; }
+ <str,chc>"\\".        { cbuf[clen++]=yytext[1]; }
  <str>.                { CADD; }

**********************************************************

I had some trouble using {OD}{1,3} instead of {OD}{OD}?{OD}? for the
<reg> match.  Any ideas why?

Daniel LaLiberte, U of Illinois, Urbana-Champaign, Computer Science
(uiucdcs!liberte)
{if it's a feature - document it; if it's a bug - document it or fix it}