gnb@bby.oz.au (Gregory N. Bond) (05/21/91)
Consider this fragment of code, which is extracted from a script I am
writing to parse and interpret a data feed that has the following format:

    SOH header STX data ETX 2 bcc characters optional NUL

I was using $/ = SOH to split the lines, as the NUL was not necessarily
present, so each line ended with ETX, 2 characters, a possible NUL, and SOH.
I inserted some diagnostics to ASCII-fy the input data and got random
results. A much-cut-down version is shown below, with results. Note that
the string "^@" in the output is a literal NUL character (i.e. hex 0x0)
that I have converted for news.

Script started on Tue May 21 12:14:06 1991
leo% cat t.perl
#! /usr/local/bin/perl

$NUL = "\0";
$SOH = "\1";
$STX = "\2";
$ETX = "\3";

$_ = "data a${STX}data b${ETX}cc${NUL}${SOH}";

    $p = $_;
    $p =~ s/$ETX/<ETX>/og;
    $p =~ s/$STX/<STX>/og;
    $p =~ s/$SOH/<SOH>/og;
    $p =~ s/$NUL/<NUL>/og;
    print "[$p]\n";

    $p = $_;
    $p =~ s/$NUL/<NUL>/og;
    $p =~ s/$SOH/<SOH>/og;
    $p =~ s/$STX/<STX>/og;
    $p =~ s/$ETX/<ETX>/og;
    print "[$p]\n";
leo% perl -v

This is perl, version 4.0

$RCSfile: perl.c,v $$Revision: 4.0.1.1 $$Date: 91/04/11 17:49:05 $
Patch level: 3

Copyright (c) 1989, 1990, 1991, Larry Wall

Perl may be copied only under the terms of the GNU General Public License,
a copy of which can be found with the Perl 4.0 distribution kit.
leo% perl t.perl
[data a<STX>data b<ETX>cc^@<SOH>]
[data a<STX>data b<ETX>cc^@<NUL>]
leo% ^D
script done on Tue May 21 12:14:22 1991

The output should be the same regardless of the order in which the
substitutions are done, and the literal NUL should have gone. Is this a
bug, or am I out to lunch here?

Environment is Solbourne OS/MP 4.0D (equivalent to Sun4 / SunOS 4.0.3)
with the system compiler and all defaults in config. It also happens with
perl compiled on a Sun 3/60, SunOS 4.0.3.

Greg.
-- 
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au    non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb
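The record layout described in the question (SOH header STX data ETX, two BCC characters, optional NUL), read with $/ set to SOH, can be picked apart with one anchored match per record. The following is a minimal sketch in modern Perl 5 (5.8 or later, for the in-memory filehandle), not perl 4.0. It assumes the header never contains STX, the data never contains ETX, and the BCC is exactly two bytes; `parse_record` is a hypothetical helper name and the sample feed is made up. The BCC is returned unchecked because the checksum algorithm is not given. The control characters are written as `\x01`-style escapes directly in the patterns rather than interpolated from variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Control characters of the feed, as defined in t.perl.
my ($NUL, $SOH, $STX, $ETX) = ("\0", "\1", "\2", "\3");

# Hypothetical helper: split one record read with $/ = $SOH into
# (header, data, bcc). Assumes no STX in the header, no ETX in the data,
# and a two-byte BCC, which is returned as-is (not verified).
sub parse_record {
    my ($rec) = @_;
    $rec =~ s/\x01\z//;            # drop the trailing SOH separator, if any
    return unless length $rec;     # e.g. the empty record before the first SOH
    my ($header, $data, $bcc) =
        $rec =~ /\A([^\x02]*)\x02([^\x03]*)\x03(.{2})\x00?\z/s
        or return;                 # record does not fit the assumed layout
    return ($header, $data, $bcc);
}

# Made-up sample feed: two records, only the first with a trailing NUL.
my $feed = "${SOH}HDR1${STX}payload one${ETX}ab${NUL}"
         . "${SOH}HDR2${STX}payload two${ETX}cd";

my @records;
{
    local $/ = $SOH;               # split the stream on SOH, as in the question
    open my $fh, '<', \$feed or die "cannot open in-memory feed: $!";
    while (my $rec = <$fh>) {
        my @fields = parse_record($rec) or next;
        push @records, \@fields;
    }
    close $fh;
}

printf "header=%s data=%s bcc=%s\n", @$_ for @records;
```

The first chunk read is the lone leading SOH, which `parse_record` skips; the last record has no SOH after it (the next record has not arrived), and the optional `\x00?` covers the missing NUL.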
jwu@kepler.com (Jasper Wu) (05/21/91)
Greg, from the perl manpage on s///:

    ...If the PATTERN evaluates to a null string, the most recent
    successful regular expression is used instead....

In article <GNB.91May21123118@leo.bby.oz.au> gnb@bby.oz.au (Gregory N. Bond) writes:
>#! /usr/local/bin/perl
>
>$NUL = "\0";
>$SOH = "\1";
>$STX = "\2";
>$ETX = "\3";
>
>$_ = "data a${STX}data b${ETX}cc${NUL}${SOH}";
>
>    $p = $_;
>    $p =~ s/$ETX/<ETX>/og;
>    $p =~ s/$STX/<STX>/og;
>    $p =~ s/$SOH/<SOH>/og;
>    $p =~ s/$NUL/<NUL>/og;
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>    print "[$p]\n";
>
>    $p = $_;
>    $p =~ s/$NUL/<NUL>/og;
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>    $p =~ s/$SOH/<SOH>/og;
>    $p =~ s/$STX/<STX>/og;
>    $p =~ s/$ETX/<ETX>/og;
>    print "[$p]\n";
>

In both marked lines the pattern $NUL evaluates to a null string, so each
one reuses the most recent successful pattern, which happens to be $SOH.
In effect each marked line becomes

    $p =~ s/$SOH/<NUL>/og;

In the first block the SOH has already been replaced by then, so nothing
changes and the NUL survives; in the second block the SOH is replaced by
<NUL>, and the later s/$SOH/<SOH>/og finds nothing. That's why you got
the (undesirable) result.

>leo% perl t.perl
>[data a<STX>data b<ETX>cc^@<SOH>]
>[data a<STX>data b<ETX>cc^@<NUL>]
>

To get what you want, simply don't let the pattern evaluate to a null
string. You can either

1) change those two lines to

    $p =~ s/\0/<NUL>/og;

or (preferably)

2) change the first four lines to

    $NUL = \0;
    $SOH = \1;
    $STX = \2;
    $ETX = \3;

(i.e., no double quotes) and keep the rest of the program intact.

Both ways worked fine when I tested them on my machine.

Hope this helps.

--jasper
============================
Jasper Wu
jwu@kepler.com
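A minimal Perl 5 sketch of the two points above, not tested under perl 4.0. The first half triggers the manpage rule with a genuinely empty variable (a hypothetical `$empty` standing in for `$NUL`), so it does not depend on how a particular perl handles a NUL byte in a pattern. The second half is one order-independent way to ASCII-fy the feed for diagnostics, in the spirit of option 1: all four control characters are handled in a single pass, with the escapes written directly in the pattern so no interpolated variable can collapse into an empty pattern. The `%name` table and `asciify` are made-up names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 1. The empty-pattern rule. $empty is a hypothetical stand-in for a
#    pattern variable that ends up empty, as $NUL apparently did here.
my $empty = "";

my $first = "a\x01b";
$first =~ s/\x01/<SOH>/g;      # succeeds, so /\x01/ becomes the most
                               # recent successful pattern

my $second = "x\x01y\x00z";
$second =~ s/$empty/<NUL>/g;   # empty pattern: reuses /\x01/, so the SOH is
                               # replaced and the NUL is left untouched
print "$first\n";              # prints a<SOH>b

# 2. Order-independent ASCII-fier: one substitution over a character
#    class, with each byte's name looked up in a table.
my %name = ("\x00" => 'NUL', "\x01" => 'SOH', "\x02" => 'STX', "\x03" => 'ETX');

sub asciify {
    my ($s) = @_;
    $s =~ s/([\x00-\x03])/<$name{$1}>/g;
    return $s;
}

print asciify("data a\x02data b\x03cc\x00\x01"), "\n";
# prints data a<STX>data b<ETX>cc<NUL><SOH>
```

Because each byte is matched once and named from the table, the result no longer depends on the order of the substitutions, and the NUL is converted along with the others.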