usenet@carssdf.UUCP (UseNet Id.) (01/04/90)
I would like to remove pairs of a letter from a string. After I remove
spaces & vowels, something like this:
$a =~ tr/AEIOU/ /;
$a =~ s/ //og;
I then would like to remove double letters something like
wizzard --> wizard
This all goes toward building a key to compare names, addresses, etc... to
eliminate duplicates.

Does anyone have any ideas? There's probably a more elegant way to remove
the vowels and spaces too, for that matter.

John Watson
...!rutgers!carssdf!usenet
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (01/04/90)
In article <229@carssdf.UUCP> usenet@carssdf.UUCP (UseNet Id.) writes:
: I would like to remove pairs of a letter from a string. After I remove
: spaces & vowels, something like this:
: $a =~ tr/AEIOU/ /;
: $a =~ s/ //og;
(The o is unnecessary.)
: I then would like to remove double letters something like
: wizzard --> wizard
: This all goes toward building a key to compare names, addresses, etc... to
: eliminate duplicates.
:
: Does anyone have any ideas? There's probably a more elegant way to remove
: the vowels and spaces too, for that matter.
Yes, use the [] construct and say s/[AEIOU ]//g; or some such.
There are several ways to remove duplicate characters, but the most concise
(and probably the fastest) is to say
$a =~ s/(.)\1/$1/g;
This does have the problem that it doesn't reduce three in a row, but
while ($a =~ s/(.)\1/$1/g) {}
will fix that. You ought to be able to say
$a =~ s/(.)\1+/$1/g;
but you'll get a complaint about "regexp *+ operand could be empty".
Now that I think on it, you can say
$a =~ s/(.)\1\1?\1?/$1/g;
which will translate up to 4 duplicate chars. How thorough do you want
to get?
Larry
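For comparison, here is a minimal sketch of the approaches above, assuming a modern Perl 5 interpreter, where the \1+ form is accepted without the complaint quoted in the post. The sub names (squeeze_once, squeeze_loop, squeeze_plus) are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Single pass: each adjacent pair collapses once, so a run of three
# identical characters still leaves two behind.
sub squeeze_once {
    my ($s) = @_;
    $s =~ s/(.)\1/$1/g;
    return $s;
}

# Repeat the single pass until no substitution happens.
sub squeeze_loop {
    my ($s) = @_;
    1 while $s =~ s/(.)\1/$1/g;
    return $s;
}

# One pass with \1+, which swallows a run of any length.
# Assumption: Perl 5, where this quantified backreference is allowed.
sub squeeze_plus {
    my ($s) = @_;
    $s =~ s/(.)\1+/$1/g;
    return $s;
}

print squeeze_once("wizzzard"), "\n";   # wizzard
print squeeze_loop("wizzzard"), "\n";   # wizard
print squeeze_plus("wizzzard"), "\n";   # wizard
```

Note that . does not match a newline by default, so none of these collapse repeated line breaks.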
merlyn@iwarp.intel.com (Randal Schwartz) (01/05/90)
|> I would like to remove pairs of a letter from a string. After I remove
|> spaces & vowels, something like this:
|> $a =~ tr/AEIOU/ /;
|> $a =~ s/ //og;
|> I then would like to remove double letters something like
|> wizzard --> wizard
|> This all goes toward building a key to compare names, addresses, etc... to
|> eliminate duplicates.
|>
|> Does anyone have any ideas? There's probably a more elegant way to remove
|> the vowels and spaces too, for that matter.

Well, the spaces and vowels are best done with:

$a =~ s/[aeiou ]+//g;

And the duplicate letters are then done with:

while ($a =~ s/(.)\1/\1/) { ; }

Just another Perl hacker,
/== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
| on contract to Intel's iWarp project, Hillsboro, Oregon, USA, Sol III |
| merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn |
\== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/
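Putting the two steps together, a comparison key for the duplicate-elimination task might be built roughly as follows. This is only a sketch under stated assumptions: a modern Perl 5, a hypothetical sub name (name_key), and upper-casing the input first so that "Wizzard" and "WIZARD" yield the same key, which the original posts do not specify.

```perl
use strict;
use warnings;

# Hypothetical helper: build a comparison key from a name or address.
sub name_key {
    my ($s) = @_;
    $s = uc $s;              # assumption: keys should ignore case
    $s =~ tr/AEIOU //d;      # delete vowels and spaces in one step
    $s =~ s/(.)\1+/$1/g;     # collapse runs of the same character
    return $s;
}

print name_key("wizzard"), "\n";       # WZRD
print name_key("Wizard"), "\n";        # WZRD
print name_key("John Watson"), "\n";   # JHNWTSN
```

The vowels and spaces are removed before the duplicates are collapsed, as in the original question, because deleting them can bring two identical letters next to each other (for example, "BOB" becomes "BB" and then "B").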