[comp.lang.perl] BUG in interpolating and/or substituting NULs in strings?

gnb@bby.oz.au (Gregory N. Bond) (05/21/91)

Consider this fragment of code, which is extracted from a script I am
writing to parse and interpret a data feed that has the following format:

	SOH
	header
	STX
	data
	ETX
	2 bcc characters
	optional NUL

I was using $/ = SOH to split the records, as the NUL was not necessarily
present, so each record ended with ETX, the 2 BCC characters, an optional NUL, and SOH.
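A minimal sketch of that record splitting in Perl 5 syntax. The feed contents, the in-memory filehandle, and the names `$feed` and `@records` are hypothetical stand-ins for the real data source. With `$/` set to SOH, `chomp` strips the SOH, and the optional NUL stays on the end of whichever records carried one.

```perl
use strict;
use warnings;

# Hypothetical feed: SOH, header, STX, data, ETX, two BCC characters,
# an optional NUL, then the SOH that starts the next record.
my $feed = "\x01hdr1\x02data1\x03AB\x00\x01hdr2\x02data2\x03CD\x01";

open my $fh, '<', \$feed or die "open: $!";
my @records;
{
    local $/ = "\x01";               # SOH acts as the record separator
    while (my $rec = <$fh>) {
        chomp $rec;                   # chomp strips the current $/ (the SOH)
        push @records, $rec if length $rec;   # skip the empty leading record
    }
}
close $fh;

printf "%d records\n", scalar @records;   # prints 2 records
```

The first record still ends in its optional NUL, which is exactly the byte the diagnostics need to make visible.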

I inserted some diagnostics to ASCII-fy the input data and got inconsistent
results.  A much cut-down version is shown below, with results.  Note
that the string "^@" in the output is a literal NUL character (i.e.
hex 0x0) that I have converted for news.

Script started on Tue May 21 12:14:06 1991
leo% cat t.perl
#! /usr/local/bin/perl

$NUL = "\0";
$SOH = "\1";
$STX = "\2";
$ETX = "\3";

$_ = "data a${STX}data b${ETX}cc${NUL}${SOH}";

	$p = $_;
	$p =~ s/$ETX/<ETX>/og;
	$p =~ s/$STX/<STX>/og;
	$p =~ s/$SOH/<SOH>/og;
	$p =~ s/$NUL/<NUL>/og;
	print "[$p]\n";

	$p = $_;
	$p =~ s/$NUL/<NUL>/og;
	$p =~ s/$SOH/<SOH>/og;
	$p =~ s/$STX/<STX>/og;
	$p =~ s/$ETX/<ETX>/og;
	print "[$p]\n";

leo% perl -v

This is perl, version 4.0

$RCSfile: perl.c,v $$Revision: 4.0.1.1 $$Date: 91/04/11 17:49:05 $
Patch level: 3

Copyright (c) 1989, 1990, 1991, Larry Wall

Perl may be copied only under the terms of the GNU General Public License,
a copy of which can be found with the Perl 4.0 distribution kit.
leo% perl t.perl
[data a<STX>data b<ETX>cc^@<SOH>]
[data a<STX>data b<ETX>cc^@<NUL>]
leo% ^D
script done on Tue May 21 12:14:22 1991

The output should be the same regardless of the order in which the
substitutions are done, and the literal NUL should have been replaced by <NUL>.
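One way to make such a diagnostic order-independent by construction is to tag every control character in a single pass through a lookup table. This is a Perl 5 sketch, not the original script; `%tag` and `asciify` are hypothetical names, and the character class assumes only NUL, SOH, STX and ETX need tagging.

```perl
use strict;
use warnings;

# Hypothetical lookup table from control character to printable tag.
my %tag = ("\x00" => 'NUL', "\x01" => 'SOH', "\x02" => 'STX', "\x03" => 'ETX');

# Tag every NUL, SOH, STX and ETX in one pass, so there is no
# substitution order to get wrong.
sub asciify {
    my ($s) = @_;
    $s =~ s/([\x00-\x03])/<$tag{$1}>/g;
    return $s;
}

my $record = "data a\x02data b\x03cc\x00\x01";
print '[', asciify($record), "]\n";   # prints [data a<STX>data b<ETX>cc<NUL><SOH>]
```

Because the pattern is written with `\x` escapes rather than interpolated from variables, it can never be empty.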

Is this a bug or am I out to lunch here?

Environment is Solbourne OS/MP 4.0D (equiv to Sun4 / SunOs 4.0.3) with
the system compiler and all defaults in config.  Also happens with
perl compiled on a Sun 3/60, SunOs 4.0.3.

Greg.
--
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au    non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb

jwu@kepler.com (Jasper Wu) (05/21/91)

Greg, from perl manpage on s///:

	...If the PATTERN evaluates to a null string, the most recent 
	successful regular expression is used instead....

In article <GNB.91May21123118@leo.bby.oz.au> gnb@bby.oz.au (Gregory N. Bond) writes:
>#! /usr/local/bin/perl
>
>$NUL = "\0";
>$SOH = "\1";
>$STX = "\2";
>$ETX = "\3";
>
>$_ = "data a${STX}data b${ETX}cc${NUL}${SOH}";
>
>	$p = $_;
>	$p =~ s/$ETX/<ETX>/og;
>	$p =~ s/$STX/<STX>/og;
>	$p =~ s/$SOH/<SOH>/og;
>	$p =~ s/$NUL/<NUL>/og;
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>	print "[$p]\n";
>
>	$p = $_;
>	$p =~ s/$NUL/<NUL>/og;
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>	$p =~ s/$SOH/<SOH>/og;
>	$p =~ s/$STX/<STX>/og;
>	$p =~ s/$ETX/<ETX>/og;
>	print "[$p]\n";
>

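In both marked lines $NUL evaluates to a null string, so each substitution
silently reuses the most recent successful pattern, the one from
	$p =~ s/$SOH/<SOH>/og;
That's why you got the (undesirable) result: in the first block the SOH has
already been replaced, so nothing happens, while in the second block the SOH
is replaced by <NUL>.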

>leo% perl t.perl
>[data a<STX>data b<ETX>cc^@<SOH>]
>[data a<STX>data b<ETX>cc^@<NUL>]
>


To get what you want, simply make sure the pattern never evaluates to a null
string.  You can either
	1) change those two lines to 	$p =~ s/\0/<NUL>/og;
or (preferably)
	2) change the first four lines to
		$NUL = \0;
		$SOH = \1;
		$STX = \2;
		$ETX = \3;
	(i.e., no double quotes) and keep the rest of the program intact.

Both ways worked fine when I tested them on my machine.
Hope this helps.
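Workaround 1 carries over to Perl 5 unchanged, since the pattern text `\0` is an escape and is never empty. Workaround 2 does not: in Perl 5 an unquoted `\0` is the reference operator applied to the constant 0, so it yields a scalar reference rather than a NUL character. A small sketch showing both points; `$p` and `$ref` are illustrative names.

```perl
use strict;
use warnings;

my $p = "cc\x00\x01";

# Workaround 1: the NUL is written as an escape inside the pattern, so the
# pattern text is never empty.
$p =~ s/\0/<NUL>/g;
print "[$p]\n";

# Workaround 2 under Perl 5: unquoted \0 takes a reference to the constant 0,
# which stringifies as something like SCALAR(0x...), not as a NUL.
my $ref = \0;
print ref($ref), "\n";   # prints SCALAR
```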




--jasper
============================
Jasper Wu   jwu@kepler.com

gnb@bby.oz.au (Gregory N. Bond) (05/22/91)

>>>>> On 21 May 91 16:51:50 GMT, jwu@kepler.com (Jasper Wu) said:


Jasper> Greg, from perl manpage on s///:

Jasper> 	...If the PATTERN evaluates to a null string, the most recent 
Jasper> 	successful regular expression is used instead....


Well, I would have thought that
	$var = "";
is a null string, but that
	$var = "\0";
is a non-null string with a single nul character.  So I would still
class it as a bug.
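The distinction is easy to check. In Perl 5, where the fix Larry mentions below is long since in place, a variable holding a single NUL has length 1, and interpolating it gives a non-empty pattern that matches the NUL itself, even right after another successful substitution. A sketch, assuming Perl 5:

```perl
use strict;
use warnings;

my $empty = "";
my $nul   = "\0";
printf "%d %d\n", length $empty, length $nul;   # prints 0 1

my $s = "cc\x00\x01";
$s =~ s/\x01/<SOH>/g;     # /\x01/ is now the last successful pattern
$s =~ s/$nul/<NUL>/g;     # pattern holds one NUL, so it is not empty
print "[$s]\n";           # prints [cc<NUL><SOH>]
```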

But I understand the work-around.

Greg.


lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (05/23/91)

In article <GNB.91May22181634@leo.bby.oz.au> gnb@bby.oz.au (Gregory N. Bond) writes:
: 
: >>>>> On 21 May 91 16:51:50 GMT, jwu@kepler.com (Jasper Wu) said:
: 
: 
: Jasper> Greg, from perl manpage on s///:
: 
: Jasper> 	...If the PATTERN evaluates to a null string, the most recent 
: Jasper> 	successful regular expression is used instead....
: 
: 
: Well, I would have thought that
: 	$var = "";
: is a null string, but that
: 	$var = "\0";
: is a non-null string with a single nul character.  So I would still
: class it as a bug.

Yes, it's a bug.  I already fixed it in my copy.  One of those little
hangers-on from the bad old days before Perl was 8-bitified.

As long as we're on the subject of the next patch, lemme tell you some
of the other things that are there.

	# //g in scalar context has built-in iterator
	while (/pattern/g) { print "$&\n"; }
	$matches++ while /foo/g;
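A small Perl 5 sketch of the scalar-context iterator. The sample string is made up, and the first loop captures into `$1` instead of using `$&`.

```perl
use strict;
use warnings;

my $text = "foo bar foo baz foo";

# Scalar-context //g: each test resumes at pos($text); the loop ends when
# no further match is found (and pos is then reset).
my @seen;
while ($text =~ /(ba.)/g) {
    push @seen, $1;
}

my $matches = 0;
$matches++ while $text =~ /foo/g;

print "@seen / $matches\n";   # prints bar baz / 3
```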

	# //g in array context returns all matches (or all substrings if parens)
	($one, $five, $fifteen) = `uptime` =~ /\d+\.\d+/g;
	%options = /(\w+)=(.*)/g;
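A sketch of both list-context forms, with a fixed string standing in for `uptime` output (its exact format is an assumption here). The option example uses `(\S*)` rather than `(.*)` so that several `key=value` pairs on one line come apart; with `(.*)` the first value would swallow the rest of the line.

```perl
use strict;
use warnings;

# A fixed line stands in for `uptime` output; its format is assumed.
my $uptime = " 10:15am  up 3 days, 2 users, load average: 0.52, 0.58, 0.59";

# List-context //g without parens returns every whole match.
my ($one, $five, $fifteen) = $uptime =~ /\d+\.\d+/g;

# With parens it returns every captured group of every match, which can
# be assigned straight into a hash.
my %options = "width=80 height=24 mode=raw" =~ /(\w+)=(\S*)/g;

print "$one $five $fifteen $options{height}\n";   # prints 0.52 0.58 0.59 24
```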

	# //o now optimized to run as fast as compile-time pattern without eval
	$pattern = shift;
	while (<>) {
	    print if /$pattern/o;	# fastest grep in Perl now
	}
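The `/o` in the grep loop above is safe because `$pattern` is fixed by the `shift` before the loop starts. A Perl 5 sketch of what `/o` means when the interpolated variable does change: the pattern is compiled the first time the match runs and then kept. The word list is made up.

```perl
use strict;
use warnings;

my @words = qw(apple banana cherry);
my @counts;
for my $pattern ('a', 'ch') {
    # /o compiles the pattern the first time this match runs and keeps it,
    # so the second pass still searches for 'a', not 'ch'.
    push @counts, scalar grep { /$pattern/o } @words;
}
print "@counts\n";   # prints 2 2 (without /o it would print 2 1)
```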

Note, however, that //o optimization still happens after switch optimization,
so if you have several matches in a row it's still worthwhile putting them
into an eval.
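A sketch of the "put them into an eval" advice in Perl 5 terms: build the source of one subroutine containing several constant patterns and compile it once. `@words`, `$src` and `$count_hits` are hypothetical, and `quotemeta` is used on the assumption that the words are meant literally.

```perl
use strict;
use warnings;

# Hypothetical literal strings to look for.
my @words = ('foo', 'bar', 'a.b');

# Build the source of one sub with a constant pattern per word, then
# compile it once with eval; quotemeta keeps metacharacters such as '.'
# literal.
my $src = "sub {\n    my \$n = 0;\n"
        . join('', map { my $w = quotemeta $_; "    \$n++ if \$_[0] =~ /$w/;\n" } @words)
        . "    return \$n;\n}\n";
my $count_hits = eval $src or die $@;

my $hits = $count_hits->("foo and bar");
print "$hits\n";   # prints 2
```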

In addition, there will be an alternate set of distribution terms, called
the "Artistic License".  You'll be able to distribute under either the GPL
or the new one, your choice.

I hope to get the next patch out soon, but I'm trying to finish up another
article for Unix World...

Larry