jmm@eci386.uucp (John Macdonald) (06/28/90)
This is mostly a precautionary tale (although Larry may decide to treat it as a bug report if he wishes).

I was recently doing a conversion script that processed a file that could contain mixed text and binary, where the text was at the beginning of the file (a leading #! line and some other stuff). The conversion would result in a file which had the same leading header lines, but would have the binary stuff run through a filter (decrypt, or uncompress, etc.). I was trying to process the leading text lines as follows:

---- start ----
sub splitline {
    if( $buf =~ /\n/ ) {
        $line = "$`\n";
        $buf = "$'";
    } else {
        $line = "";
    }
}

open( curin, "file" ) || die "Can't open file";
exit unless read( curin, $buf, 1024 );
do splitline();
if( $line =~ /^#!/ ) {
    print $line;
    do splitline();
}
# check for other possible leading text lines ...
# ...
# now filter the binary
open( FILTER, '|filterprog' );
select( FILTER );
print $buf;
while( read( curin, $buf, 1024 ) ) {
    print $buf;
}
close( FILTER );
---- end ---

The problem occurred in the splitline function: when it found a text line it would correctly set $line, but the assignment

    $buf = "$'";

did not set $buf to everything after the match. Instead, it stopped at the first null byte.

I'll let Larry decide whether he wants to consider this to be a bug. (While it would be possible to handle this special case without (presumably) too much hassle, trying to allow for all possible variations of binary data being processed by regexp might be rather tough. I'm sure Randall will be able to find fertile ground for obfuscated signatures in the possibilities...)

It was easy enough for me to work around in my case. I just used:

    $buf = substr( $buf, length($line), length($buf)-length($line) );

instead.

-- 
Algol 60 was an improvement on most  | John Macdonald
of its successors - C.A.R. Hoare     | jmm@eci386
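The substr workaround described above can also be written without touching the regexp match variables at all. What follows is only a minimal sketch in present-day Perl 5 syntax, not the code from the original script: the helper name split_line and the sample buffer are hypothetical, and it returns the two pieces rather than setting globals as the original splitline did.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split the first
# newline-terminated line off a buffer that may hold binary data.
# Returns (line, rest); line is "" when the buffer has no newline.
sub split_line {
    my ($buf) = @_;
    my $nl = index($buf, "\n");        # position of first newline, or -1
    return ("", $buf) if $nl < 0;      # no complete line in the buffer
    # Everything up to and including the newline is the line;
    # everything after it, null bytes included, is the remainder.
    return (substr($buf, 0, $nl + 1), substr($buf, $nl + 1));
}

# Made-up sample: a #! header followed by bytes that include nulls.
my $buf = "#!/bin/sh\n\x01\x00\x02\nmore\x00data";
my ($line, $rest) = split_line($buf);
printf "line=%d bytes, rest=%d bytes\n", length($line), length($rest);
```

Because index and substr work purely on offsets and lengths, the remainder is taken by position and a null byte in the binary part is treated like any other byte.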
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (06/29/90)
In article <1990Jun28.142155.12170@eci386.uucp> jmm@eci386.UUCP (John Macdonald) writes:
: This is mostly a precautionary tale (although Larry may decide
: to treat it as a bug report if he wishes).
I do. $' should work on binary data. It will be fixed in the next patch.
Larry
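For anyone who wants to check whether their own perl shows the behaviour reported in this thread, a small test along the following lines may help. It is a sketch under assumptions: the sample buffer is made up, the syntax is present-day Perl 5 rather than what was current in 1990, and it says nothing about which patch level carried the fix.

```perl
use strict;
use warnings;

# Made-up buffer: one text line followed by data containing null bytes.
my $buf = "#!/bin/sh\nabc\x00def\x00ghi";

my ($line, $rest) = ("", $buf);
if ($buf =~ /\n/) {
    $line = "$`\n";   # text before the match, plus the newline itself
    $rest = $';       # text after the match; should keep the null bytes
}

# If $' were cut short at a null byte, the two lengths below
# would no longer add up to the length of the buffer.
printf "buffer=%d line=%d rest=%d\n",
    length($buf), length($line), length($rest);
print "postmatch truncated\n"
    if length($line) + length($rest) != length($buf);
```

On a perl where $' handles binary data correctly, the line and rest lengths sum to the buffer length and the "postmatch truncated" message is not printed.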