[comp.sources.bugs] the adventure of sed to perl conversion

tietz@gmdzi.UUCP (Christoph Tietz) (09/12/88)

For the last few hours I tried to convert my sed scripts for the
postprocessing of automatically generated makefiles to perl scripts.
I use perl 2.0 patchlevel 14. In principle perl really is a GREAT tool
and I think I will rewrite a lot of my shell scripts to perl scripts.
Nevertheless, converting my sed scripts to perl led to some problems
that I want to describe here. Perhaps some kind guru can enlighten me
about their nature?

One problem, caused by the different notions of a line end in sed and
perl, was easy to solve. s2p translates the sed expression:

	#   second case: no '\' at line end => append '\' to line end and
	#		 insert .sym dependency as new line without '\'
	#		 at the end
	#
	s/^\(\([^ \	]*\)_Dummy\.o:.*[^\\]\)$/\1 \\\
	\	\	\2\.sym/

to:

	s/^(([^ \	]*)_Dummy\.o:.*[^\\])$/$1 \\\n\	\	$2\.sym/;

This is not what was intended. In perl the trailing newline character is
matched by [^\\], so the pattern also matches a line that ends with a
backslash, and '$1' then contains the whole line including the newline.
As a result, the perl substitution inserts a new line containing just a
space and a backslash, and it does so even if the matching line already
ends with '\'. I had to change the perl command to:

    if (/^([^ \	]*)_Dummy\.o:.*[^\\]\n$/) {
       chop; $_ .= "\\\n";
       $atext .= "\	\	$1.sym \n";
    }
    # print $_ and $atext

and it worked. My first question: Why is the '\n' before the end-of-line
anchor '$' necessary? What is the point of '$' if I have to use '\n' to
anchor a match at the line end? Is there any way to end a line other
than using '\n' as the delimiter?
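
A minimal sketch of the failure mode described above, written for a modern Perl 5 rather than perl 2.0 (the older version is assumed, not verified, to behave the same here). The two makefile lines are hypothetical, and the literal tab in the original pattern is written as \t. The sketch compares the s2p-generated pattern, the \n-anchored rewrite, and the chop-first alternative:

```perl
#!/usr/bin/perl
# Sketch (modern Perl 5, hypothetical input): why the s2p pattern misbehaves
# while the trailing newline is still attached to the line.
use strict;
use warnings;

my $continued = "Foo_Dummy.o: Foo.mod \\\n";  # already ends with '\'
my $plain     = "Foo_Dummy.o: Foo.mod\n";     # does not

# Pattern as generated by s2p, and the \n-anchored rewrite from the post.
my $s2p_re   = qr/^(([^ \t]*)_Dummy\.o:.*[^\\])$/;
my $fixed_re = qr/^([^ \t]*)_Dummy\.o:.*[^\\]\n$/;

# [^\\] happily matches the "\n", so a continued line is accepted ...
my $s2p_continued = $continued =~ $s2p_re ? 1 : 0;        # 1
# ... and $1 swallows the newline.
my ($whole_line) = $plain =~ $s2p_re;                      # list context: ($1, $2)
my $captured_newline = $whole_line =~ /\n\z/ ? 1 : 0;      # 1

# Requiring the non-backslash character right before "\n" fixes both cases.
my $fixed_continued = $continued =~ $fixed_re ? 1 : 0;    # 0
my $fixed_plain     = $plain     =~ $fixed_re ? 1 : 0;    # 1

# Alternative: remove the newline first, then $ can only mean end of string.
my $line = $continued;
chop $line;                                                # drops the final "\n"
my $chopped_continued = $line =~ $s2p_re ? 1 : 0;          # 0

print "s2p/continued=$s2p_continued fixed/continued=$fixed_continued ",
      "fixed/plain=$fixed_plain chopped/continued=$chopped_continued\n";
```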

The next problem took me more time to solve. s2p translates:

		/^[^ ]*\.out:/,/^[ ]*$/d

to the perl command:

	if (/^[^ ]*\.out:/ .. /^[ ]*$/) {
	   # skip this input line
	}

The sed script is intended to clean up a makefile that contains the lines:

SIMCore_Dummy.out: SIMCore_Dummy.o UserCore.o /users/susi/vaxlib/Strings.o 
		$(M2C) -e SIMCore_Dummy -o \
		SIMCore_Dummy.out $(M2FLAGS) $(M2LINK) 

objects: 	Alias.sym Alias.o CoreTool.sym CoreTool.o Env.sym Env.o \

The sed script erases the lines from "SIMCore_Dummy.out:" up to and
including the blank line before "objects:". The perl script erases
everything from "SIMCore_Dummy.out:" to the end of the file, because the
end of the range is never found. If the range expression is evaluated
before the 'if' statement, everything works fine:

    $gotcha = /^[^ ]*\.out:/ .. /^[ ]*$/;
    if ($gotcha) { 
       # skip this input line
    }

does exactly what I wanted it to do. My second question: Is this a bug, or
am I missing some semantic detail?
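
For comparison, here is a sketch of the intended semantics run on the sample makefile lines above. It is written for a modern Perl 5, where both the direct 'if' form and the $gotcha form are expected to give the same result; the perl 2.0 misbehavior is not reproduced. In scalar context '..' acts as a flip-flop: it becomes true on the first line matching the left pattern and stays true up to and including the next line matching the right pattern. This is similar to sed's /a/,/b/ range, although sed (like Perl's three-dot '...') only starts testing the end pattern on the following line; for this input the difference does not matter.

```perl
use strict;
use warnings;

# Sample makefile lines from the post ("$" escaped inside double quotes).
my @makefile = (
    "SIMCore_Dummy.out: SIMCore_Dummy.o UserCore.o /users/susi/vaxlib/Strings.o \n",
    "\t\t\$(M2C) -e SIMCore_Dummy -o \\\n",
    "\t\tSIMCore_Dummy.out \$(M2FLAGS) \$(M2LINK) \n",
    "\n",
    "objects: \tAlias.sym Alias.o CoreTool.sym CoreTool.o Env.sym Env.o \\\n",
);

# Form produced by s2p: the range operator directly in the condition.
my @kept_if;
for (@makefile) {
    if (/^[^ ]*\.out:/ .. /^[ ]*$/) {
        next;                      # skip this input line, like sed's 'd'
    }
    push @kept_if, $_;
}

# Workaround form from the post: evaluate the range into a variable first.
my @kept_var;
for (@makefile) {
    my $gotcha = /^[^ ]*\.out:/ .. /^[ ]*$/;
    push @kept_var, $_ unless $gotcha;
}

print @kept_if;   # only the "objects:" line survives
```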

Thank you in advance for all responses.

    Christoph Tietz          (    tietz@gmdzi.uucp     )
                             ( or tietz@zix.gmd.dbp.de )

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (09/14/88)

In article <836@gmdzi.UUCP> tietz@gmdzi.UUCP (Christoph Tietz) writes:
: One problem, caused by the different notions of a line end in sed and
: perl, was easy to solve. s2p translates the sed expression:
: 
: 	#   second case: no '\' at line end => append '\' to line end and
: 	#		 insert .sym dependency as new line without '\'
: 	#		 at the end
: 	#
: 	s/^\(\([^ \	]*\)_Dummy\.o:.*[^\\]\)$/\1 \\\
: 	\	\	\2\.sym/
: 
: to:
: 
: 	s/^(([^ \	]*)_Dummy\.o:.*[^\\])$/$1 \\\n\	\	$2\.sym/;
: 
: This is not what was intended. In perl the trailing newline character is
: matched by [^\\], so the pattern also matches a line that ends with a
: backslash, and '$1' then contains the whole line including the newline.
: As a result, the perl substitution inserts a new line containing just a
: space and a backslash, and it does so even if the matching line already
: ends with '\'. I had to change the perl command to:
: 
:     if (/^([^ \	]*)_Dummy\.o:.*[^\\]\n$/) {
:        chop; $_ .= "\\\n";
:        $atext .= "\	\	$1.sym \n";
:     }
:     # print $_ and $atext
: 
: and it worked. My first question: Why is the '\n' before the end-of-line
: anchor '$' necessary?

It isn't strictly necessary.  $ will match either before the \n or at the
end of the string (after the \n, in this case).  I wrote s2p before I put
the "chop" operator into perl, so I traded an occasional error in
end-of-line processing for greatly increased speed.  Now that "chop" exists,
I should probably rewrite s2p to use it, at least as an option.  It's
still a little faster to leave the \n on if we can get away with it.
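
To illustrate the two places where $ can match, here is a small sketch for a modern Perl 5. The sample line is hypothetical, and \z (a later Perl 5 anchor for the true end of the string) is used only for contrast:

```perl
use strict;
use warnings;

my $line = "all: prog\n";   # hypothetical input line, newline still attached

my $before_newline = $line =~ /prog$/   ? 1 : 0;  # 1: $ matches just before the final "\n"
my $at_very_end    = $line =~ /prog\n$/ ? 1 : 0;  # 1: $ also matches at the end of the string
my $true_end_only  = $line =~ /prog\z/  ? 1 : 0;  # 0: \z matches only at the true end

# With the newline chopped off, $ can only mean the end of the string.
my $chopped = $line;
chop $chopped;
my $after_chop = $chopped =~ /prog$/ ? 1 : 0;     # 1

print "$before_newline $at_very_end $true_end_only $after_chop\n";
```

In later Perl versions, chomp (which removes only a trailing record separator) is usually preferred over chop, which removes the last character unconditionally.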

: What is the point of '$' if I have to use '\n' to
: anchor a match at the line end?

Ordinarily you don't have to anchor it with a \n, since most things you put
into a regular expression don't match a newline.  A negated character class
is an exception, unfortunately.  S2p should be smarter about negated
character classes, I suppose.  The \s (whitespace) will also match \n.
A dot (.) specifically does NOT match \n.
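
The sketch below (modern Perl 5, hypothetical sample strings) checks which constructs match a lone newline. It also shows one possible way to stop a negated class from eating the newline, namely adding \n to the class; whether s2p should do exactly this is an assumption, not something stated in this thread.

```perl
use strict;
use warnings;

my $nl = "\n";

my $negated_class = $nl =~ /^[^\\]\z/ ? 1 : 0;  # 1: a negated class matches "\n"
my $whitespace    = $nl =~ /^\s\z/    ? 1 : 0;  # 1: \s matches "\n"
my $dot           = $nl =~ /^.\z/     ? 1 : 0;  # 0: . does not match "\n"

# Excluding "\n" from the negated class restores the sed-like meaning.
my $safer_continued = "x \\\n" =~ /[^\\\n]$/ ? 1 : 0;  # 0: line ends in '\'
my $safer_plain     = "x\n"    =~ /[^\\\n]$/ ? 1 : 0;  # 1

print "$negated_class $whitespace $dot $safer_continued $safer_plain\n";
```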

: Is there any way to end a line
: other than using '\n' as the delimiter?

You can set your input line delimiter to any character you choose,
and your input line will end with that.  The primary reason perl doesn't
strip it off on input is so that "while (<INFILE>)" always evaluates the
input line as true, even if it's a blank line.
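
A minimal sketch of both points in a modern Perl 5: the truthiness of a blank line that still carries its newline, and changing the input record separator $/ (as it is named in Perl 5). The sample data and the ';' delimiter are hypothetical. Note that Perl 5 also makes while (<FH>) test for definedness implicitly, so the explicit defined() below is only for clarity.

```perl
use strict;
use warnings;

# A blank line still carries its "\n", so it is a true string;
# once stripped to "" it would be false and end a while loop early.
my $blank    = "\n";
my $stripped = "";
my $blank_true    = $blank    ? 1 : 0;   # 1
my $stripped_true = $stripped ? 1 : 0;   # 0

# Hypothetical data read through an in-memory filehandle (Perl 5.8+).
my $data = "Alias.o;Env.o;Strings.o";
my @records;
{
    local $/ = ';';                      # input record separator: end records at ';'
    open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
    while (defined(my $rec = <$fh>)) {
        push @records, $rec;             # each record keeps its ';' terminator
    }
    close $fh;
}

# @records is ("Alias.o;", "Env.o;", "Strings.o"): the last record has no ';'
print join("|", @records), "\n";
```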

: The next problem took me more time to solve. s2p translates:
: 
: 		/^[^ ]*\.out:/,/^[ ]*$/d
: 
: to the perl command:
: 
: 	if (/^[^ ]*\.out:/ .. /^[ ]*$/) {
: 	   # skip this input line
: 	}
: 
: The sed script is intended to clean up a makefile that contains the lines:
: 
: SIMCore_Dummy.out: SIMCore_Dummy.o UserCore.o /users/susi/vaxlib/Strings.o 
: 		$(M2C) -e SIMCore_Dummy -o \
: 		SIMCore_Dummy.out $(M2FLAGS) $(M2LINK) 
: 
: objects: 	Alias.sym Alias.o CoreTool.sym CoreTool.o Env.sym Env.o \
: 
: The sed script erases the lines from "SIMCore_Dummy.out:" up to and
: including the blank line before "objects:". The perl script erases
: everything from "SIMCore_Dummy.out:" to the end of the file, because the
: end of the range is never found. If the range expression is evaluated
: before the 'if' statement, everything works fine:
: 
:     $gotcha = /^[^ ]*\.out:/ .. /^[ ]*$/;
:     if ($gotcha) { 
:        # skip this input line
:     }
: 
: does exactly what I wanted it to do. My second question: Is this a bug, or
: am I missing some semantic detail?

This looks like a real bug, probably brought about by trying to optimize
the conditional.  I'll have to glare at it some.

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov