gnb@bby.oz.au (Gregory N. Bond) (05/21/91)
Consider this fragment of code, which is extracted from a script I am
writing to parse and interpret a data feed that has the following format:
	SOH
	header
	STX
	data
	ETX
	2 bcc characters
	optional NUL
I was using $/ = SOH to split the input into records, as the NUL was not
necessarily present, so each record ended with ETX, the 2 bcc characters,
an optional NUL, and SOH.
I inserted some diagnostics to ASCII-fy the input data and got inconsistent
results.  A much cut-down version is shown below, with results.  Note
that the string "^@" in the output is a literal NUL character (i.e.
hex 0x00) that I have converted for news.
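A minimal sketch of the record framing described above, assuming a modern Perl 5 interpreter (the post itself uses Perl 4.0). Reading with $/ set to SOH makes each read stop just after an SOH, so a record holds header, STX, data, ETX, the two bcc characters, an optional NUL and the terminating SOH. The helper parse_record and the sample feed are hypothetical, and the sketch assumes the bcc characters are never SOH and that header and data contain no STX or ETX.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $SOH = "\x01";

# Hypothetical helper (not from the post): split one record, as read with
# $/ set to SOH, into (header, data, bcc). Returns an empty list if the
# record does not have the SOH header STX data ETX bcc bcc [NUL] shape.
sub parse_record {
    my ($rec) = @_;
    $rec =~ s/\x01\z//;    # strip the SOH that terminated this read
    # header, STX, data, ETX, exactly two bcc characters, optional NUL pad
    my ($header, $data, $bcc) =
        $rec =~ /\A(.*?)\x02(.*?)\x03(..)\x00?\z/s
        or return;
    return ($header, $data, $bcc);
}

# Hypothetical sample feed: the first record has the optional NUL,
# the second does not.
my $feed = "\x01hdr1\x02data a\x03bc\x00\x01hdr2\x02data b\x03de";

open my $fh, '<', \$feed or die "cannot open in-memory feed: $!";
{
    local $/ = $SOH;    # each read ends just after an SOH
    while (my $rec = <$fh>) {
        # the very first read is the lone leading SOH, which yields nothing
        my @f = parse_record($rec) or next;
        print join('|', @f), "\n";
    }
}
close $fh;
```

With the sample feed this prints one line per record, header, data and bcc separated by "|".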
Script started on Tue May 21 12:14:06 1991
leo% cat t.perl
#! /usr/local/bin/perl
$NUL = "\0";
$SOH = "\1";
$STX = "\2";
$ETX = "\3";
$_ = "data a${STX}data b${ETX}cc${NUL}${SOH}";
	$p = $_;
	$p =~ s/$ETX/<ETX>/og;
	$p =~ s/$STX/<STX>/og;
	$p =~ s/$SOH/<SOH>/og;
	$p =~ s/$NUL/<NUL>/og;
	print "[$p]\n";
	$p = $_;
	$p =~ s/$NUL/<NUL>/og;
	$p =~ s/$SOH/<SOH>/og;
	$p =~ s/$STX/<STX>/og;
	$p =~ s/$ETX/<ETX>/og;
	print "[$p]\n";
leo% perl -v
This is perl, version 4.0
$RCSfile: perl.c,v $$Revision: 4.0.1.1 $$Date: 91/04/11 17:49:05 $
Patch level: 3
Copyright (c) 1989, 1990, 1991, Larry Wall
Perl may be copied only under the terms of the GNU General Public License,
a copy of which can be found with the Perl 4.0 distribution kit.
leo% perl t.perl
[data a<STX>data b<ETX>cc^@<SOH>]
[data a<STX>data b<ETX>cc^@<NUL>]
leo% ^D
script done on Tue May 21 12:14:22 1991
The output should be the same regardless of the order in which the
substitutions are done, and the literal NUL should have been replaced by <NUL>.
Is this a bug or am I out to lunch here?
Environment is Solbourne OS/MP 4.0D (equivalent to Sun4 / SunOS 4.0.3) with
the system compiler and all defaults in config.  It also happens with
perl compiled on a Sun 3/60, SunOS 4.0.3.
Greg.
--
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au    non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb
jwu@kepler.com (Jasper Wu) (05/21/91)
Greg, from the perl manpage on s///:

	...If the PATTERN evaluates to a null string, the most recent
	successful regular expression is used instead....

In article <GNB.91May21123118@leo.bby.oz.au> gnb@bby.oz.au (Gregory N. Bond) writes:
>#! /usr/local/bin/perl
>
>$NUL = "\0";
>$SOH = "\1";
>$STX = "\2";
>$ETX = "\3";
>
>$_ = "data a${STX}data b${ETX}cc${NUL}${SOH}";
>
>	$p = $_;
>	$p =~ s/$ETX/<ETX>/og;
>	$p =~ s/$STX/<STX>/og;
>	$p =~ s/$SOH/<SOH>/og;
>	$p =~ s/$NUL/<NUL>/og;
	^^^^^^^^^^^^^^^^^^^^^^
>	print "[$p]\n";
>
>	$p = $_;
>	$p =~ s/$NUL/<NUL>/og;
	^^^^^^^^^^^^^^^^^^^^^^
>	$p =~ s/$SOH/<SOH>/og;
>	$p =~ s/$STX/<STX>/og;
>	$p =~ s/$ETX/<ETX>/og;
>	print "[$p]\n";
>

In both marked lines $NUL evaluates to a null string as a pattern, so the
most recent successful pattern, the SOH one, is used instead, and each line
happens to be equivalent to

	$p =~ s/$SOH/<NUL>/og;

In the first block the SOH has already been replaced by <SOH>, so nothing
matches and the literal NUL survives.  In the second block the SOH is still
there, so it becomes <NUL>, and the later s/$SOH/<SOH>/ finds nothing left
to replace.  That's why you got the (undesirable) result:

>leo% perl t.perl
>[data a<STX>data b<ETX>cc^@<SOH>]
>[data a<STX>data b<ETX>cc^@<NUL>]
>

To get what you want, simply don't let the pattern evaluate to a null
string.  You can either

1) change those two lines to

	$p =~ s/\0/<NUL>/og;

or (preferably)

2) change the first four lines to

	$NUL = \0;
	$SOH = \1;
	$STX = \2;
	$ETX = \3;

(i.e., no double quotes) and keep the rest of the program intact.

Both ways worked fine when I tested them on my machine.  Hope this helps.

--jasper
============================
Jasper Wu   jwu@kepler.com
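For comparison, a sketch of the same diagnostic for a modern Perl 5 interpreter (an assumption; the thread concerns Perl 4.0 patch level 3). It replaces all four control characters in a single substitution driven by a lookup table. The pattern is a fixed, non-empty character class, so the "null pattern reuses the last successful regex" rule never comes into play, and because every character is replaced in one pass, the result cannot depend on substitution order. The helper asciify and the %name table are hypothetical names, not taken from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table: printable names for the feed's control characters.
my %name = ("\x00" => '<NUL>', "\x01" => '<SOH>',
            "\x02" => '<STX>', "\x03" => '<ETX>');

# Hypothetical helper: replace every NUL/SOH/STX/ETX in one pass.
# The pattern is a literal, non-empty character class, so it can never
# fall back to the most recent successful pattern, and no replacement
# is ever rescanned by a later substitution.
sub asciify {
    my ($s) = @_;
    $s =~ s/([\x00-\x03])/$name{$1}/g;
    return $s;
}

my $rec = "data a\x02data b\x03cc\x00\x01";
print '[', asciify($rec), "]\n";
```

Run on the string from the original script, this prints [data a<STX>data b<ETX>cc<NUL><SOH>], with no literal NUL left in the output.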