gnb@bby.oz.au (Gregory N. Bond) (05/21/91)
Consider this fragment of code, which is extracted from a script I am
writing to parse and interpret a data feed that has the following
format:

    SOH header STX data ETX 2 bcc characters optional NUL

I was using $/ = SOH to split the lines as the nul was not necessarily
present, so the line ended with ETX, 2 characters, potential NUL, SOH.

I inserted some diagnostics to ASCII-fy the input data and got random
results.  A much-cutdown version is shown below, with results.  Note
that the string "^@" in the output is a literal NUL character
(i.e. hex 0x0) that I have converted for news.

Script started on Tue May 21 12:14:06 1991
leo% cat t.perl
#! /usr/local/bin/perl

$NUL = "\0";
$SOH = "\1";
$STX = "\2";
$ETX = "\3";

$_ = "data a${STX}data b${ETX}cc${NUL}${SOH}";

    $p = $_;
    $p =~ s/$ETX/<ETX>/og;
    $p =~ s/$STX/<STX>/og;
    $p =~ s/$SOH/<SOH>/og;
    $p =~ s/$NUL/<NUL>/og;
    print "[$p]\n";

    $p = $_;
    $p =~ s/$NUL/<NUL>/og;
    $p =~ s/$SOH/<SOH>/og;
    $p =~ s/$STX/<STX>/og;
    $p =~ s/$ETX/<ETX>/og;
    print "[$p]\n";

leo% perl -v

This is perl, version 4.0

$RCSfile: perl.c,v $$Revision: 4.0.1.1 $$Date: 91/04/11 17:49:05 $
Patch level: 3

Copyright (c) 1989, 1990, 1991, Larry Wall

Perl may be copied only under the terms of the GNU General Public License,
a copy of which can be found with the Perl 4.0 distribution kit.
leo% perl t.perl
[data a<STX>data b<ETX>cc^@<SOH>]
[data a<STX>data b<ETX>cc^@<NUL>]
leo% ^D
script done on Tue May 21 12:14:22 1991

The output should be the same independent of the order in which the
substitutions are done, and the literal NUL should have gone.

Is this a bug or am I out to lunch here?

Environment is Solbourne OS/MP 4.0D (equiv to Sun4 / SunOs 4.0.3) with
the system compiler and all defaults in config.  Also happens with
perl compiled on a Sun 3/60, SunOs 4.0.3.

Greg.
--
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb
jwu@kepler.com (Jasper Wu) (05/21/91)
Greg, from perl manpage on s///:

   ...If the PATTERN evaluates to a null string, the most recent
   successful regular expression is used instead....

In article <GNB.91May21123118@leo.bby.oz.au> gnb@bby.oz.au (Gregory N. Bond) writes:
>#! /usr/local/bin/perl
>
>$NUL = "\0";
>$SOH = "\1";
>$STX = "\2";
>$ETX = "\3";
>
>$_ = "data a${STX}data b${ETX}cc${NUL}${SOH}";
>
>    $p = $_;
>    $p =~ s/$ETX/<ETX>/og;
>    $p =~ s/$STX/<STX>/og;
>    $p =~ s/$SOH/<SOH>/og;
>    $p =~ s/$NUL/<NUL>/og;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>    print "[$p]\n";
>
>    $p = $_;
>    $p =~ s/$NUL/<NUL>/og;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>    $p =~ s/$SOH/<SOH>/og;
>    $p =~ s/$STX/<STX>/og;
>    $p =~ s/$ETX/<ETX>/og;
>    print "[$p]\n";
>

In both marked lines $NUL evaluates to a null string, so each one falls
back to the most recent successful pattern, which happens to be the one
from

    $p =~ s/$SOH/<SOH>/og;

That's why you got the (undesirable) result.

>leo% perl t.perl
>[data a<STX>data b<ETX>cc^@<SOH>]
>[data a<STX>data b<ETX>cc^@<NUL>]
>

To get what you want, simply keep the pattern from evaluating to a null
string.  You can either

1) change those two lines to

    $p =~ s/\0/<NUL>/og;

or (preferably)

2) change the first four lines to

    $NUL = \0;
    $SOH = \1;
    $STX = \2;
    $ETX = \3;

   (i.e., no double quotes) and keep the rest of the program intact.

Both ways worked fine when I tested them on my machine.  Hope this helps.

>Greg.
>--
>Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
>Internet: gnb@melba.bby.oz.au non-MX: gnb%melba.bby.oz@uunet.uu.net
>Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb

--jasper
============================
Jasper Wu
jwu@kepler.com
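The manpage rule quoted in this post can be tried in isolation.  What
follows is only a sketch, assuming a current Perl 5 interpreter rather
than the Perl 4.0 patch level 3 under discussion; the variable names
$empty, $first, $second and $shown are made up for illustration, and
$empty stands in for the way the reported behaviour treated "\0".

```perl
#!/usr/bin/perl
# Sketch only: assumes Perl 5, where the empty-pattern rule still applies.
use strict;
use warnings;

my $SOH   = "\x01";
my $empty = "";                   # hypothetical stand-in for a pattern seen as empty
my $text  = "cc\0$SOH";

my $first = $text;
$first =~ s/$SOH/<SOH>/g;         # succeeds: /$SOH/ is now the most recent successful pattern

my $second = $text;
$second =~ s/$empty/<NUL>/g;      # empty pattern: the previous pattern is reused,
                                  # but the replacement text is still <NUL>

(my $shown = $second) =~ s/\0/\\0/g;   # make the untouched NUL visible as \0
print "[$shown]\n";               # prints [cc\0<NUL>]
```

The SOH, not the NUL, ends up replaced by <NUL>, which has the same
shape as the second line of the reported output.  One caveat under the
same Perl 5 assumption: an unquoted \0 creates a reference there, so
the second workaround would need single quotes ('\0') to hand the regex
engine a backslash-zero escape.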
gnb@bby.oz.au (Gregory N. Bond) (05/22/91)
>>>>> On 21 May 91 16:51:50 GMT, jwu@kepler.com (Jasper Wu) said:
Jasper> Greg, from perl manpage on s///:
Jasper> ...If the PATTERN evaluates to a null string, the most recent
Jasper> successful regular expression is used instead....
Well, I would have thought that
    $var = "";
is a null string, but that
    $var = "\0";
is a non-null string with a single nul character.  So I would still
class it as a bug.
But I understand the work-around.
Greg.
--
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb
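The distinction drawn in this post between "" and "\0" can be checked
directly.  This is a minimal sketch, again assuming a current Perl 5
interpreter and not the Perl 4.0 binary from the thread; $empty, $nul
and $soh are illustrative names.

```perl
#!/usr/bin/perl
# Sketch only: assumes Perl 5; variable names are illustrative.
use strict;
use warnings;

my $empty = "";
my $nul   = "\0";
my $soh   = "\x01";

print "length of empty string: ", length($empty), "\n";   # 0
print "length of NUL string:   ", length($nul),   "\n";   # 1

my $p = "cc$nul$soh";
$p =~ s/$soh/<SOH>/g;     # a successful match comes first, as in the original script
$p =~ s/$nul/<NUL>/g;     # one-character pattern, so it is used as written
print "[$p]\n";           # prints [cc<NUL><SOH>]
```

Here the NUL is treated as an ordinary one-character pattern, so the
order of the two substitutions no longer matters.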
lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (05/23/91)
In article <GNB.91May22181634@leo.bby.oz.au> gnb@bby.oz.au (Gregory N. Bond) writes:
:
: >>>>> On 21 May 91 16:51:50 GMT, jwu@kepler.com (Jasper Wu) said:
:
:
: Jasper> Greg, from perl manpage on s///:
:
: Jasper> ...If the PATTERN evaluates to a null string, the most recent
: Jasper> successful regular expression is used instead....
:
:
: Well, I would have thought that
:     $var = "";
: is a null string, but that
:     $var = "\0";
: is a non-null string with a single nul character.  So I would still
: class it as a bug.

Yes, it's a bug.  I already fixed it in my copy.  One of those little
hanger-oners from the bad old days before Perl was 8-bitified.

As long as we're on the subject of the next patch, lemme tell you some of
the other things that are there.

    # //g in scalar context has built-in iterator
    while (/pattern/g) {
        print "$&\n";
    }
    $matches++ while /foo/g;

    # //g in array context returns all matches (or all substrings if parens)
    ($one, $five, $fifteen) = `uptime` =~ /\d+\.\d+/g;
    %options = /(\w+)=(.*)/g;

    # //o now optimized to run as fast as compile-time pattern without eval
    $pattern = shift;
    while (<>) {
        print if /$pattern/o;           # fastest grep in Perl now
    }

Note, however, that //o optimization still happens after switch optimization,
so if you have several matches in a row it's still worthwhile putting them
into an eval.

In addition, there will be an alternate set of distribution terms, called
the "Artistic License".  You'll be able to distribute under either the GPL
or the new one, your choice.

I hope to get the next patch out soon, but I'm trying to finish up another
article for Unix World...

Larry
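For readers who want to try the //g behaviour described in this post
without depending on `uptime`, here is a self-contained sketch.  It
assumes a current Perl 5 interpreter (where "array context" is usually
called list context); the sample strings in $uptime and $config are
invented for the example.

```perl
#!/usr/bin/perl
# Sketch only: assumes Perl 5; the input strings are made-up samples.
use strict;
use warnings;

my $uptime = "12:14pm up 3 days, load average: 0.15, 0.20, 0.25";
my $config = "color=red\nsize=10\n";

# //g in list context returns every match at once.
my ($one, $five, $fifteen) = $uptime =~ /\d+\.\d+/g;
print "$one $five $fifteen\n";          # prints 0.15 0.20 0.25

# //g in scalar context acts as an iterator over successive matches.
my $matches = 0;
$matches++ while $uptime =~ /\d+\.\d+/g;
print "$matches\n";                     # prints 3

# With parentheses, list-context //g returns the captured substrings,
# which can be assigned straight into a hash as key/value pairs.
my %options = $config =~ /(\w+)=(.*)/g;
print "$options{color} $options{size}\n";   # prints red 10
```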