des0mpw@colman.newcastle.ac.uk (M.P. Ward) (11/10/90)
Here's a little perl script to find anagrams:
#! /usr/bin/perl
# anagram finder
# Usage: anag [file...]
while (<>) {
    chop;
    tr/A-Z/a-z/;                        # convert to lower case
    $sortword = join('',sort(split(//)));
    $anag{$sortword} .= $_ . " ";
}
foreach $words (values %anag) {
    if ($words =~ / ./) {               # more than one word in group
        print $words . "\n";
    }
}
Feed it a file of words and it spits out the anagram groups, one group per
line. (Note the tr/// takes bare ranges, not character classes: tr/A-Z/a-z/,
without the brackets.) It seems to chew up a lot of memory, though: our
/usr/dict/words has about 24,000 words (roughly 200K), yet the script needs
about 5 Meg of space. Is that a memory leak, or simply an artifact of the
way associative arrays work?
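For a quick sanity check of the sorted-letters key idea, here is the same
trick run over a tiny made-up word list (the words are just examples, not
taken from /usr/dict/words):

```perl
# Same canonical-key trick, on a tiny in-memory word list.
%anag = ();
foreach $w ("pots", "stop", "tops", "cat", "act", "dog") {
    $key = join('', sort(split(//, $w)));   # sorted letters = group key
    $anag{$key} .= $w . " ";
}
foreach $words (values %anag) {
    print $words, "\n" if ($words =~ / ./); # groups of two or more words
}
```

This prints the pots/stop/tops group and the cat/act group (in whatever
order the hash hands them back) and silently drops the singleton "dog".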
Here's another version which uses the unix sort program (and is less hungry
for memory!)
#! /usr/bin/perl
# anagram finder
# Usage: anag [file...]
unless (open(GOODS, "-|")) {
    # child is here, with its stdout attached to parent's GOODS
    open(OUTPUT, "|sort") || die "can't pipe to sort: $!";
    while (<>) {
        chop;
        tr/A-Z/a-z/;
        print OUTPUT join('',sort(split(//,$_))) . " $_\n";
    }
    close(OUTPUT);
    exit 0;                     # child has done its work
}
# parent reads the GOODS.
$last = "nosuchword";
$line = "";
while ($pair = <GOODS>) {
    chop($pair);
    ($code,$word) = split(/ /,$pair);
    #print "$last: $code, $word\n";
    if ($code ne $last) {
        print $line,"\n" if ($line =~ /. ./);
        $line = "$word ";
        $last = $code;
    } else {
        $line .= "$word ";
    }
}
print $line,"\n" if ($line =~ /. ./);   # flush the final group
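The parent's merge loop can be exercised on its own by feeding it a few
pre-sorted "key word" pairs standing in for sort's output (the pairs below
are made up for the test):

```perl
# The grouping pass, run over pre-sorted "key word" pairs instead of
# the GOODS pipe.  Note the flush after the loop for the last group.
@pairs = ("act act", "act cat", "dgo dog",
          "opst pots", "opst stop", "opst tops");
$last = "nosuchword";
$line = "";
@groups = ();
foreach $pair (@pairs) {
    ($code, $word) = split(/ /, $pair);
    if ($code ne $last) {
        push(@groups, $line) if ($line =~ /. ./);   # emit finished group
        $line = "$word ";
        $last = $code;
    } else {
        $line .= "$word ";                          # same key: extend group
    }
}
push(@groups, $line) if ($line =~ /. ./);           # flush the final group
print "$_\n" foreach @groups;
```

This collects "act cat" and "pots stop tops" and drops the singleton "dog";
without the flush after the loop, the pots/stop/tops group at the end of the
input would be lost.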
I shall leave it to Randal to produce the customary one liner! :-)
Martin.
JANET: Martin.Ward@uk.ac.durham Internet (eg US): Martin.Ward@DURHAM.AC.UK
or if that fails: Martin.Ward%uk.ac.durham@nfsnet-relay.ac.uk
or even: Martin.Ward%DURHAM.AC.UK@CUNYVM.CUNY.EDU
BITNET: IN%"Martin.Ward@DURHAM.AC.UK" UUCP:...!mcvax!ukc!durham!Martin.Ward