[comp.lang.perl] need perl help

usenet@carssdf.UUCP (UseNet Id.) (01/04/90)

I would like to remove pairs of a letter from a string.  After I remove
spaces & vowels, something like this:
  $a =~ tr/AEIOU/     /;
  $a =~ s/ //og;
I then would like to remove double letters something like
     wizzard  -->  wizard
This all goes toward building a key to compare names, addresses, etc... to
eliminate duplicates.

Does anyone have any ideas?   There's probably a more elegant way to remove
the vowels and spaces too, for that matter.

John Watson        ...!rutgers!carssdf!usenet

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (01/04/90)

In article <229@carssdf.UUCP> usenet@carssdf.UUCP (UseNet Id.) writes:
: I would like to remove pairs of a letter from a string.  After I remove
: spaces & vowels, something like this:
:   $a =~ tr/AEIOU/     /;
:   $a =~ s/ //og;

(The o modifier is unnecessary here, since the pattern has no variables to interpolate.)

: I then would like to remove double letters something like
:      wizzard  -->  wizard
: This all goes toward building a key to compare names, addresses, etc... to
: eliminate duplicates.
: 
: Does anyone have any ideas?   There's probably a more elegant way to remove
: the vowels and spaces too, for that matter.

Yes, use the [] (character class) construct and say s/[AEIOU ]//g; or some such.
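A small sketch of the character-class approach next to a `tr` alternative, assuming the input is already upper case. The subroutine names `strip_class` and `strip_tr` are made up for illustration. Both delete every vowel and space in one pass; with the `/d` modifier, `tr` deletes the listed characters instead of translating them.

```perl
use strict;
use warnings;

# Hypothetical helper: delete vowels and spaces with a character class.
sub strip_class {
    my ($s) = @_;
    $s =~ s/[AEIOU ]//g;    # [...] matches any one of the listed characters
    return $s;
}

# Hypothetical helper: same job with tr///d, which deletes every
# character in the search list that has no replacement.
sub strip_tr {
    my ($s) = @_;
    $s =~ tr/AEIOU //d;
    return $s;
}

print strip_class("JOHN WATSON"), "\n";    # JHNWTSN
print strip_tr("JOHN WATSON"), "\n";       # JHNWTSN
```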

There are several ways to remove duplicate characters, but the most concise
(and probably the fastest) is to say

	$a =~ s/(.)\1/$1/g;
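A quick sketch of what one pass of this substitution does, wrapped in an illustrative subroutine `squeeze_once` (the name is not from the thread). `(.)` captures any character, the backreference `\1` requires the next character to be the same one, and the pair is replaced by the single captured character.

```perl
use strict;
use warnings;

# Illustrative wrapper around a single s/(.)\1/$1/g pass.
sub squeeze_once {
    my ($s) = @_;
    $s =~ s/(.)\1/$1/g;    # each doubled pair becomes one character
    return $s;
}

print squeeze_once("wizzard"), "\n";      # wizard
print squeeze_once("bookkeeper"), "\n";   # bokeper
```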

This does have the problem that it doesn't fully reduce three in a row (zzz comes out as zz), but

	while ($a =~ s/(.)\1/$1/g) {}

will fix that.  You ought to be able to say

	$a =~ s/(.)\1+/$1/g;

but you'll get a complaint about "regexp *+ operand could be empty".
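That complaint came from the Perl of 1990; Perl 5 accepts the `\1+` form, which collapses a run of any length in a single pass. The sketch below checks it against the `while` loop form on a few inputs; the subroutine names are illustrative.

```perl
use strict;
use warnings;

# Loop form: repeat single passes until a pass makes no substitution.
sub squeeze_loop {
    my ($s) = @_;
    while ($s =~ s/(.)\1/$1/g) {}
    return $s;
}

# \1+ form: one or more repeats of the captured character, in one pass.
sub squeeze_plus {
    my ($s) = @_;
    $s =~ s/(.)\1+/$1/g;
    return $s;
}

for my $word ("wizzard", "ZZZ", "AAAAAAAB", "") {
    my ($l, $p) = (squeeze_loop($word), squeeze_plus($word));
    print "$word -> $p\n";
    die "mismatch on '$word'" unless $l eq $p;
}
```

The loop needs several passes on long runs (seven A's take three passes), while the `\1+` version always needs one.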
Now that I think on it, you can say

	$a =~ s/(.)\1\1?\1?/$1/g;

which will collapse a run of up to 4 identical chars into one.  How thorough
do you want to get?
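A sketch of how far the bounded pattern reaches, using an illustrative subroutine name. A run of two to four identical characters becomes one; a run of five leaves two behind, because the fifth character is left over after the first match and has no partner.

```perl
use strict;
use warnings;

# (.) plus one required and two optional repeats: matches runs of 2 to 4.
sub squeeze_upto4 {
    my ($s) = @_;
    $s =~ s/(.)\1\1?\1?/$1/g;
    return $s;
}

for my $n (2 .. 5) {
    my $run = "Z" x $n;
    printf "%d Zs -> %s\n", $n, squeeze_upto4($run);
}
# Output lines: 2 Zs -> Z, 3 Zs -> Z, 4 Zs -> Z, 5 Zs -> ZZ
```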

Larry

merlyn@iwarp.intel.com (Randal Schwartz) (01/05/90)

|> I would like to remove pairs of a letter from a string.  After I remove
|> spaces & vowels, something like this:
|>   $a =~ tr/AEIOU/     /;
|>   $a =~ s/ //og;
|> I then would like to remove double letters something like
|>      wizzard  -->  wizard
|> This all goes toward building a key to compare names, addresses, etc... to
|> eliminate duplicates.
|> 
|> Does anyone have any ideas?   There's probably a more elegant way to remove
|> the vowels and spaces too, for that matter.

Well, removing the spaces and vowels is best done with:

	$a =~ s/[aeiou ]+//g;
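The `+` lets one substitution remove a whole run of vowels and spaces at once. The class above is lower case, while the original `tr` worked on upper case; the sketch below adds the `/i` flag (not in the post) so both cases are handled. The subroutine name is illustrative.

```perl
use strict;
use warnings;

# Delete runs of vowels and spaces. The /i flag (an addition, not in the
# post) makes the class match upper- and lower-case vowels alike.
sub strip_vowels_spaces {
    my ($s) = @_;
    $s =~ s/[aeiou ]+//gi;
    return $s;
}

print strip_vowels_spaces("Queue Street"), "\n";   # QStrt
```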

And the duplicate letters are then removed with:

	while ($a =~ s/(.)\1/$1/) { ; }
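Without `/g`, each trip through this loop replaces only the first doubled pair, so it runs once per pair removed, but it still ends with every run collapsed. Perl 5 also has a non-regex route: `tr` with an empty replacement list and the `/s` (squeeze) modifier collapses runs of the same character from the search list. Both appear in the sketch below; the subroutine names are illustrative, and the `tr` version only squeezes letters.

```perl
use strict;
use warnings;

# Loop with no /g: each pass replaces only the first doubled pair.
sub squeeze_first_pair_loop {
    my ($s) = @_;
    while ($s =~ s/(.)\1/$1/) { }
    return $s;
}

# tr with an empty replacement list maps each letter to itself;
# /s then squeezes each run of the same letter down to one.
sub squeeze_tr {
    my ($s) = @_;
    $s =~ tr/A-Za-z//s;
    return $s;
}

print squeeze_first_pair_loop("bookkeeper"), "\n";   # bokeper
print squeeze_tr("bookkeeper"), "\n";                # bokeper
```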

Just another Perl hacker,

/== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
| on contract to Intel's iWarp project, Hillsboro, Oregon, USA, Sol III  |
| merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn	         |
\== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/
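Putting the thread's pieces together, here is a sketch of the duplicate-eliminating key the original post describes. Everything specific is an assumption for illustration: the `make_key` name, upper-casing first, keeping the first record seen for each key, and the sample names.

```perl
use strict;
use warnings;

# Hypothetical key builder: upper-case, drop vowels and spaces,
# then collapse runs of the same character.
sub make_key {
    my ($s) = @_;
    $s = uc $s;
    $s =~ s/[AEIOU ]+//g;
    $s =~ s/(.)\1+/$1/g;
    return $s;
}

# Keep the first record seen for each key; later records with the
# same key are treated as duplicates.
my @names = ("John Watson", "JOHN  WATSON", "Johnn Wattson", "Jane Walker");
my %seen;
my @unique = grep { !$seen{ make_key($_) }++ } @names;

print "$_ => ", make_key($_), "\n" for @names;
print "kept: @unique\n";    # kept: John Watson Jane Walker
```

Removing vowels before squeezing means letters separated only by a vowel also collapse (LOL becomes L), which matches the order of steps in the thread.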