[comp.lang.perl] recursive &func breaks s/

lawrence@epps.kodak.com (Scott Lawrence) (03/13/91)

I think that I have discovered a problem with substitution and
recursion.  Please someone demonstrate that I am wrong.

I was working on translating simple logical expressions into perl
expressions so that I could eval them.  A simplified version of
my program is included below to demonstrate the problem.

The function Expr uses a substitution operator with both the 'e'
and 'g' modifiers to match each word in an expression, and pass
it to a function (Atom) which returns a perl fragment that could
be evaluated to a boolean value (the evaluation is not important
to the demo, so I left it out).  For simple words, Atom just
wraps the word in a subroutine invocation (as in case 'foo'
below).  If the word has been defined to be a set (using the
associative array %Set), Atom calls Expr with the value of the
set, having wrapped it in parens.  The problem occurs when a
member of one set is another set, so that Expr is called a third
time; the remainder of the original call is lost.

-------------- example
demo% ./demo

Case = 'foo'                # This case is simple
Expr = 'foo'                # The expression is just one atom
Atom = 'foo'                # which is not a set
Result =>  &In('foo')       # the result is correct

Case = 'test'               # Slightly more complicated because test
Expr = 'test'               # is defined as a set by:
Atom = 'test'               # $Set{ 'test' } = 'foo & bar';
Expr = '( foo & bar )'      # which works,
Atom = 'foo'                # foo and bar are not sets
Atom = 'bar'                
Result => (  &In('foo')  &  &In('bar')  ) # so the result is correct

Case = 'set'                # 'set' is defined as a set by
Expr = 'set'                # $Set{'set'} = "test | done";
Atom = 'set'                # 
Expr = '( test | done )'    # set is correctly expanded
Atom = 'test'               # but the first atom (test) is also a set
Expr = '( foo & bar )'      # which is correctly expanded here
Atom = 'foo'                # and each atom is expanded ok
Atom = 'bar'                # but result is wrong - Atom is never called
Result => ( (  &In('foo')  &  &In('bar')  ) # for 'done' 

The final result should have been:

Result => ( (  &In('foo')  &  &In('bar')  ) |  &In('done')  )
                                            ^^^^^^^^^^^^^^^^^- omitted

I can rewrite this so that it doesn't rely on the s///eg to work,
but if it did work it would be much more elegant.  It looks to me
as though I either have a variable scoping problem or the return
stack is getting messed up.  Any suggestions?  Perl source for
demo follows, with my perl version info...

$Header: perly.c,v 3.0.1.10 91/01/11 18:22:48 lwall Locked $
Patch level: 44

SunOS Release 4.1 (GENERIC) #1: Tue Mar 6 17:27:17 PST 1990

----------------- begin demo -----------------
#!/usr/local/bin/perl

$Set{'test'} = "foo & bar";   # one level of substitution
$Set{'set'} = "test | done";  # recursive substitution

@Cases = ( 'foo', 'test', 'set' );

test: while( $_ = shift @Cases )
{
    print "\nCase = '$_'\n";
    $Result = &Expr( $_ );
    print "Result => $Result\n";
}

sub Expr
{
    local( $Expr ) = $_[0];
    print "Expr = '$Expr'\n";

    $Expr =~ s/(\w+)/&Atom($1)/eg; # <<<<<<<<<<<<<<<< 

    return $Expr;
}

sub Atom
{
    local( $Atom ) = $_[0];
    print "Atom = '$Atom'\n";

    return defined $Set{ $Atom } 
           ? &Expr("( $Set{$Atom} )") : " &In('$Atom') ";
}
----------------------- end of demo ---------------------
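
For comparison, here is a rough sketch of the kind of rewrite mentioned above, one that does not recurse from inside s///eg.  It cuts the expression up with split and glues the pieces back together by hand.  ExprSplit and AtomSplit are made-up names for this sketch and are not part of the demo; it assumes that split with parentheses in the pattern returns the matched words along with the text between them.

```perl
#!/usr/local/bin/perl
# Sketch only: ExprSplit and AtomSplit are hypothetical names,
# not part of the original demo.

$Set{'test'} = "foo & bar";   # one level of substitution
$Set{'set'} = "test | done";  # recursive substitution

sub ExprSplit
{
    local( $Expr ) = $_[0];
    local( @Parts, $Out, $Piece );

    # The parens in the pattern make split hand back the words as
    # well as the text between them, so nothing is lost on rejoining.
    @Parts = split( /(\w+)/, $Expr );

    $Out = '';
    foreach $Part ( @Parts )
    {
        # Words go through AtomSplit; everything else is copied as is.
        $Piece = ( $Part =~ /^\w+$/ ) ? &AtomSplit( $Part ) : $Part;
        $Out .= $Piece;
    }
    return $Out;
}

sub AtomSplit
{
    local( $Atom ) = $_[0];

    return defined $Set{ $Atom }
           ? &ExprSplit("( $Set{$Atom} )") : " &In('$Atom') ";
}

print "Result => ", &ExprSplit( 'set' ), "\n";
```

No substitution is in progress while the recursion happens, so there is no pending remainder for a nested match to disturb.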

--
Scott Lawrence             <lawrence@epps.kodak.com>  Voice: 508-670-4023
Atex Advanced Publishing Systems                        Fax: 508-670-4033
Atex, Inc; 165 Lexington St. MS 400/165L; Billerica MA 01821

brocher@urz.unibas.ch (Dominic Brocher) (03/13/91)

In article <5128@atexnet.UUCP>, lawrence@epps.kodak.com (Scott Lawrence) writes:
> I think that I have discovered a problem with substitution and
> recursion.  Please someone demonstrate that I am wrong.

I have executed your script on a microVAX 3500 running Ultrix 4.1
with the same version of perl you used:

  This is perl, version 3.0

  $Header: perly.c,v 3.0.1.10 91/01/11 18:22:48 lwall Locked $
  Patch level: 44

I get exactly the same (wrong) result:

  Result => ( (  &In('foo')  &  &In('bar')  ) 

But on a NeXT running NeXT Mach 1.0 and the same version of Perl
I get the right result!

  Result => ( (  &In('foo')  &  &In('bar')  ) |  &In('done')  )

I have compiled Perl myself on both machines from the same source
code and they both passed all tests.  I'd really like to know the
reason for this behaviour (and have a fix for it, of course :-)
 


-- Dominic

--------------------------------------------------------------------------------
I am not bound to please thee with my answers.     | Dominic Brocher
-- Shylock, in The Merchant of Venice (IV, 1/65)   | brocher@urz.unibas.ch
================================================================================

lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (03/14/91)

In article <5128@atexnet.UUCP> lawrence@epps.kodak.com (Scott Lawrence) writes:
: I think that I have discovered a problem with substitution and
: recursion.  Please someone demonstrate that I am wrong.

You're right, but you'll be wrong when 4.0 comes out.  :-)

When you do a pattern match, the regular expression routines sometimes
save a copy of the input string so that $1, $&, etc. work right after
the pattern match.  The substitution operator was depending on this
string to stay there so that it could continue the substitution using
the value of $', more or less.  Unfortunately, the recursion clobbered
that temporary value.  The do_subst() routine just needed to make sure
it could restore $' after evaluating the right-hand side, and I figured
out a way to do that by manipulating the pointers, so I don't have to
actually copy the contents of $' around.
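
The same idea can be imitated at the Perl level, as a rough sketch: do the substitution loop by hand and take private copies of $`, $1 and $' before the right-hand side gets a chance to run another match.  This is not what do_subst() does internally (that works on pointers, as described above); Expand is a made-up name, and %Set is borrowed from the demo.

```perl
#!/usr/local/bin/perl
# Sketch only: Expand is a hypothetical name; %Set is taken from the demo.

$Set{'test'} = "foo & bar";
$Set{'set'} = "test | done";

sub Expand
{
    local( $Rest ) = $_[0];
    local( $Out, $Pre, $Word, $Piece );

    $Out = '';
    while( $Rest =~ /(\w+)/ )
    {
        # Take private copies of $`, $1 and $' right away ...
        ( $Pre, $Word, $Rest ) = ( $`, $1, $' );

        # ... so the recursive call is free to clobber the originals.
        $Piece = defined $Set{ $Word }
                 ? &Expand("( $Set{$Word} )") : " &In('$Word') ";
        $Out .= $Pre . $Piece;
    }

    # Whatever is left over had no words in it.
    return $Out . $Rest;
}

print "Result => ", &Expand( 'set' ), "\n";
```

Each level of the recursion keeps its own remainder in a local, so the nested call cannot lose the tail of the enclosing expression.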

Note, however, that order of evaluation will still be important.  If you say

	s/(whatever)/&recurse($1) . $1/eg;

The first $1 refers to the $1 from this substitution, while the second $1
refers to the $1 from the pattern match done within &recurse.  (I think.)
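
One way to stay clear of the ordering question, sketched here under the assumption that the outer match is what you want in both places, is to copy $1 into a local before anything else can run a pattern match.  Recurse and Both are made-up names for the sketch.

```perl
#!/usr/local/bin/perl
# Sketch only: Recurse and Both are hypothetical names, not from the thread.

sub Recurse
{
    local( $Arg ) = $_[0];
    $Arg =~ /(.)$/;             # a match of its own, which sets a new $1
    return "<$1>";
}

sub Both
{
    local( $Word ) = $_[0];     # copy the caller's $1 before any other match
    return &Recurse( $Word ) . $Word;
}

$Text = "one two";
$Text =~ s/(\w+)/&Both($1)/eg;
print "$Text\n";
```

Both uses of the word inside Both come from the copy in $Word, so it no longer matters which $1 is current after &Recurse returns.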

Larry