[comp.lang.perl] pattern matching question

dboles@ccwf.cc.utexas.edu (David Boles) (05/24/91)

Warning: PERL NOVICE approaching !!!

I am trying to do the following file manipulation:

p220 p220   20    5
p220 p235   20    7
etc.

 ==>

p220 -20 -5
p220 20 5
p220 -20 -7
p235 20 7
etc.

I am using:

while (<>) {
    s/(p\d*) (p\d*) (\d*) (\d*)/$1 -$3 -$4\n$2 $3 $4\n/;
    print;
}

and I get:

p220 - -
p220 20 5
p220 - -
p235 20 7
etc.

If I take away the minus signs in the replacement string, I just get:

p220
p220 20 5
p220
p235 20 7
etc.

Why aren't $3 and $4 "alive" in the first half of the replacement
string?  What am I missing?

Thanks!

  David Boles



-- 
-------------------------------------------------------------------------------
David Boles                                       Applied Research Laboratories
dboles@ccwf.cc.utexas.edu                       
apas611@chpc.utexas.edu                      This space for rent, apply within.
-------------------------------------------------------------------------------

sherman@unx.sas.com (Chris Sherman) (05/24/91)

In <49420@ut-emx.uucp> dboles@ccwf.cc.utexas.edu (David Boles) writes:

>Warning: PERL NOVICE approaching !!!

Warning:  PERL NOVICE answering!!!

>I am trying to do the following file manipulation:
>p220 p220   20    5
>p220 p235   20    7
>etc.
> ==>
>p220 -20 -5
>p220 20 5
>p220 -20 -7
>p235 20 7
>etc.
>I am using:
>while (<>) {
>    s/(p\d*) (p\d*) (\d*) (\d*)/$1 -$3 -$4\n$2 $3 $4\n/;
>    print;
>}
>and I get:
>p220 - -
>p220 20 5
>Why aren't $3 and $4 "alive" in the first half of the replacement
>string?  What am I missing?

I think I got it.  Perl is taking the spaces in your pattern literally, space
for space: each one matches exactly one space in the input.  With several
spaces between the fields, the synchronization is lost.
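
One way to see where the synchronization goes is to print the four captures
of the original pattern for a line that has several spaces between fields.
This is only a sketch: the helper name captures() is made up for
illustration, and it uses "my", so it assumes Perl 5 or later.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the four captures of the original
# pattern, or an empty list if the pattern does not match at all.
sub captures {
    my ($line) = @_;
    return $line =~ /(p\d*) (p\d*) (\d*) (\d*)/;
}

# Three spaces before "20", as in the original data.
my @c = captures("p220 p220   20    5\n");
print join("|", map { "[$_]" } @c), "\n";   # prints [p220]|[p220]|[]|[]
```

Each literal space in the pattern consumes exactly one space, and \d* is
allowed to match zero digits, so $3 and $4 end up matching the empty strings
between the extra spaces.  That is where the bare minus signs come from.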

I used the following input file with your code:

p220 p220 20 5
p220 p235 20 7

and got:

p220 -20 -5
p220 20 5

p220 -20 -7
p235 20 7


So then I tried the following code:

#!/usr/local/bin/perl
while (<>) {
  s/(p\d*) *(p\d*) *(\d*) *(\d*)/$1 -$3 -$4\n$2 $3 $4\n/;
  print;
}

With the following input:

p220 p220   20   5
p220 p235   20   7

and got:

p220 -20 -5
p220 20 5

p220 -20 -7
p235 20 7

Maybe Perl pros can tell me what the '*'s mean exactly, why they are working,
and if they would work in every case.  (I have my ideas, but they are probably
wrong, and I just got lucky.  I was hoping to set up a one-or-more-spaces
type thing, but I don't think I did that right.)
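
On the "would they work in every case" question, a rough way to check is to
run the '*' pattern and a '+' variant against a few lines, including some
malformed ones.  A sketch only: the names $star, $plus and matches() are
invented here, and "my" assumes Perl 5 or later.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# '*' means zero or more of the preceding item, '+' means one or more.
my $star = '(p\d*) *(p\d*) *(\d*) *(\d*)';
my $plus = '(p\d+) +(p\d+) +(\d+) +(\d+)';

# Hypothetical helper: 1 if $line matches the pattern in $pat, else 0.
sub matches {
    my ($pat, $line) = @_;
    return $line =~ /$pat/ ? 1 : 0;
}

for my $line ("p220 p235   20   7", "p220p235", "pp") {
    printf "%-22s star=%d plus=%d\n",
        "'$line'", matches($star, $line), matches($plus, $line);
}
```

The well-formed line matches both patterns.  The two malformed lines still
match the '*' version, because every space and every digit in it is
optional, but they are rejected by the '+' version.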
--
Chris Sherman .................... sherman@unx.sas.com   |
              ,-----------------------------------------'
             /  Q:  How many IBM CPU's does it take to execute a job?
            |   A:  Four; three to hold it down, and one to rip its head off.

lamour@gong.mitre.org (Michael Lamoureux) (05/24/91)

In article <sherman.675057031@foster>, sherman@unx.sas.com (Chris Sherman) writes:
|> In <49420@ut-emx.uucp> dboles@ccwf.cc.utexas.edu (David Boles) writes:
|> 
|> >Warning: PERL NOVICE approaching !!!
|> 
|>    Warning:  PERL NOVICE answering!!!

	Ditto.  (But I am avidly reading the book...)

|> >I am using:
|> >while (<>) {
|> >    s/(p\d*) (p\d*) (\d*) (\d*)/$1 -$3 -$4\n$2 $3 $4\n/;
|> >    print;
|> >}
|> >Why aren't $3 and $4 "alive" in the first half of the replacement
|> >string?  What am I missing?
|> 
|> I think I got it.  Perl is taking your test string literally, space
|> for space.

	This is exactly it.

|> #!/usr/local/bin/perl
|> while (<>) {
|>   s/(p\d*) *(p\d*) *(\d*) *(\d*)/$1 -$3 -$4\n$2 $3 $4\n/;
|>   print;
|> }
|> 
|> Maybe Perl pros can tell me what the '*'s mean exactly, why
|> they are working, and if they would work in every case.
|> I was hoping to set up a one-or-more-number-of
|> spaces type thing, but I don't think I did that right.

	Well, your expression tests for 0 or more spaces.  A "*" matches
0 or more occurrences of the preceding item, a "+" matches 1 or more.  So
using "+" instead of "*" would fix that, but I think using "\s" instead of
" " would be more general-purpose.  It matches any whitespace character
(tabs too), not just a space.  So I guess it should look like this:

while (<>) {
  s/(p\d+)\s+(p\d+)\s+(\d+)\s+(\d+)/$1 -$3 -$4\n$2 $3 $4\n/;
  print;
}

	Because of the "\d+", this matches the p's only if they have at least
one digit appended.  So even better...

while (<>) {
  if (/.*(p\d+)\s+(p\d+)\s+(\d+)\s+(\d+).*/) {
    print "$1 -$3 -$4\n$2 $3 $4\n";
  }
}

	This allows you to put comments or something else in the file
and only prints out the lines which match (and drops typos...you may
want to change this expression a bit and flag errors with an else).  Note
that "." matches any character except a newline.
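
The "flag errors with an else" idea might look something like the sketch
below.  The helper name expand() and the sample lines are made up for
illustration, and "my" assumes Perl 5 or later.  The ".*" at each end is left
out here, on the assumption that each line holds at most one record: an
unanchored match already searches anywhere in the line.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one input line into the two output lines,
# or return undef if the line does not look like a record.
sub expand {
    my ($line) = @_;
    if ($line =~ /(p\d+)\s+(p\d+)\s+(\d+)\s+(\d+)/) {
        return "$1 -$3 -$4\n$2 $3 $4\n";
    }
    return;
}

# Sample data standing in for the real file: spaces, a comment, tabs.
my @sample = (
    "p220 p220   20    5\n",
    "# a comment line\n",
    "p220\tp235\t20\t7\n",
);

for my $line (@sample) {
    my $out = expand($line);
    if (defined $out) {
        print $out;
    } else {
        # Flag lines that did not match instead of dropping them silently.
        warn "no match: $line";
    }
}
```

To run it over a real file, the loop over @sample could be swapped for the
usual while (<>) loop.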

Michael
lamour@mitre.org
Disclaimer:  Perl is addictive ;-)