[comp.sources.bugs] BUG in interpolating and/or substituting NULs in strings?

gnb@bby.oz.au (Gregory N. Bond) (05/21/91)

Consider this fragment of code, which is extracted from a script I am
writing to parse and interpret a data feed that has the following format:

	SOH
	header
	STX
	data
	ETX
	2 bcc characters
	optional NUL

I was using $/ = SOH to split the records, since the NUL is not necessarily
present; each record therefore ends with ETX, 2 bcc characters, an optional
NUL, and then the SOH of the next record.
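
A minimal sketch of that record-splitting approach, assuming Perl 5 syntax
(lexical variables and in-memory filehandles, which perl 4.0 does not have).
The sample feed and the names $feed and @records are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample feed: two records, only the first followed by the optional NUL.
my $feed = "\x01hdr1\x02data a\x03cc\x00" . "\x01hdr2\x02data b\x03dd" . "\x01";

my @records;
{
    local $/ = "\x01";                 # SOH is the input record separator
    open(my $fh, '<', \$feed) or die "cannot open in-memory feed: $!";
    while (defined(my $rec = <$fh>)) {
        chomp $rec;                    # chomp removes the trailing $/ (the SOH)
        $rec =~ s/\x00\z//;            # drop the optional NUL, if present
        next if $rec eq '';            # the chunk before the first SOH is empty
        push @records, $rec;
    }
    close $fh;
}

for my $rec (@records) {
    (my $shown = $rec) =~ tr/\x00-\x1f/./;   # keep control characters off the terminal
    print "[$shown]\n";
}
```

Each element of @records then holds header, STX, data, ETX and the 2 bcc
characters, with the separator and the optional NUL already removed.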

I inserted some diagnostics to ASCII-fy the input data and got inconsistent
results.  A much cut-down version is shown below, with results.  Note
that the string "^@" in the output is a literal NUL character (i.e.
hex 0x0) that I have converted for news.

Script started on Tue May 21 12:14:06 1991
leo% cat t.perl
#! /usr/local/bin/perl

$NUL = "\0";
$SOH = "\1";
$STX = "\2";
$ETX = "\3";

$_ = "data a${STX}data b${ETX}cc${NUL}${SOH}";

	$p = $_;
	$p =~ s/$ETX/<ETX>/og;
	$p =~ s/$STX/<STX>/og;
	$p =~ s/$SOH/<SOH>/og;
	$p =~ s/$NUL/<NUL>/og;
	print "[$p]\n";

	$p = $_;
	$p =~ s/$NUL/<NUL>/og;
	$p =~ s/$SOH/<SOH>/og;
	$p =~ s/$STX/<STX>/og;
	$p =~ s/$ETX/<ETX>/og;
	print "[$p]\n";

leo% perl -v

This is perl, version 4.0

$RCSfile: perl.c,v $$Revision: 4.0.1.1 $$Date: 91/04/11 17:49:05 $
Patch level: 3

Copyright (c) 1989, 1990, 1991, Larry Wall

Perl may be copied only under the terms of the GNU General Public License,
a copy of which can be found with the Perl 4.0 distribution kit.
leo% perl t.perl
[data a<STX>data b<ETX>cc^@<SOH>]
[data a<STX>data b<ETX>cc^@<NUL>]
leo% ^D
script done on Tue May 21 12:14:22 1991

The output should be the same independent of the order in which the
substitutions are done, and the literal NUL should have gone.

Is this a bug or am I out to lunch here?

Environment is Solbourne OS/MP 4.0D (equiv to Sun4 / SunOs 4.0.3) with
the system compiler and all defaults in config.  Also happens with
perl compiled on a Sun 3/60, SunOs 4.0.3.

Greg.
--
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au    non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb

jwu@kepler.com (Jasper Wu) (05/21/91)

Greg, from the perl manpage on s///:

	...If the PATTERN evaluates to a null string, the most recent 
	successful regular expression is used instead....

In article <GNB.91May21123118@leo.bby.oz.au> gnb@bby.oz.au (Gregory N. Bond) writes:
>#! /usr/local/bin/perl
>
>$NUL = "\0";
>$SOH = "\1";
>$STX = "\2";
>$ETX = "\3";
>
>$_ = "data a${STX}data b${ETX}cc${NUL}${SOH}";
>
>	$p = $_;
>	$p =~ s/$ETX/<ETX>/og;
>	$p =~ s/$STX/<STX>/og;
>	$p =~ s/$SOH/<SOH>/og;
>	$p =~ s/$NUL/<NUL>/og;
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>	print "[$p]\n";
>
>	$p = $_;
>	$p =~ s/$NUL/<NUL>/og;
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>	$p =~ s/$SOH/<SOH>/og;
>	$p =~ s/$STX/<STX>/og;
>	$p =~ s/$ETX/<ETX>/og;
>	print "[$p]\n";
>

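In both marked lines the pattern $NUL evaluates to a null string, so perl
reuses the most recent successful pattern, which here is /$SOH/.  Both lines
therefore behave like
	$p =~ s/$SOH/<NUL>/og;
In the first block every SOH has already been replaced, so nothing matches
and the NUL stays; in the second block the SOH is replaced by <NUL>.
That's why you got the (undesirable) result.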
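
The manpage rule can be reproduced in isolation.  A minimal sketch, assuming
Perl 5 syntax rather than perl 4.0; since Perl 5 patterns may contain NUL
bytes, it uses a truly empty string ($empty, a hypothetical name) to trigger
the same rule:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $SOH   = "\x01";
my $empty = '';                 # hypothetical stand-in for a pattern that evaluates to nothing
my $text  = "cc\x00${SOH}";

# A successful substitution: /$SOH/ becomes the most recent successful pattern.
(my $first = $text) =~ s/$SOH/<SOH>/g;

# The pattern here is empty, so the /$SOH/ pattern above is reused;
# the replacement text is still taken from this statement.
(my $second = $text) =~ s/$empty/<NUL>/g;

# Show the NUL as ^@, as in the transcript, before printing.
(my $shown = $second) =~ s/\x00/^\@/g;
print "[$shown]\n";             # prints [cc^@<NUL>]
```

The NUL byte is untouched and the SOH has been turned into <NUL>, which is
the same shape as the second line of the reported output.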

>leo% perl t.perl
>[data a<STX>data b<ETX>cc^@<SOH>]
>[data a<STX>data b<ETX>cc^@<NUL>]
>


To get what you want, simply keep the pattern from evaluating to a null string.
You can either
	1) change those two lines to 	$p =~ s/\0/<NUL>/og;
or (preferably)
	2) change the first four lines to
		$NUL = \0;
		$SOH = \1;
		$STX = \2;
		$ETX = \3;
	(i.e., no double quotes) and keep the rest of the program intact.

Both ways worked fine when I tested them on my machine.
Hope this helps.
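
For anyone trying this on Perl 5 rather than perl 4.0: option 1 carries over,
but an unquoted \0 there is a reference to the constant 0, so option 2 does
not.  A minimal sketch along the lines of option 1, assuming Perl 5 syntax;
%name and visible() are hypothetical names, not part of the original script.
All four control characters are written as escapes inside one pattern, so no
variable is interpolated into the pattern and the order of the substitutions
no longer matters:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Printable names for the four control characters used by the feed.
my %name = (
    "\x00" => '<NUL>',
    "\x01" => '<SOH>',
    "\x02" => '<STX>',
    "\x03" => '<ETX>',
);

# Return a copy of the string with each control character replaced by its name.
sub visible {
    my ($s) = @_;
    $s =~ s/([\x00-\x03])/$name{$1}/g;   # single pass, escapes written in the pattern itself
    return $s;
}

my $record = "data a\x02data b\x03cc\x00\x01";
print "[", visible($record), "]\n";
# prints [data a<STX>data b<ETX>cc<NUL><SOH>]
```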


>Greg.
>--
>Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
>Internet: gnb@melba.bby.oz.au    non-MX: gnb%melba.bby.oz@uunet.uu.net
>Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb


--jasper
============================
Jasper Wu   jwu@kepler.com