des0mpw@colman.newcastle.ac.uk (M.P. Ward) (11/10/90)
Here's a little perl script to find anagrams:

    #! /usr/bin/perl
    # anagram finder
    # Usage: anag [file...]
    while (<>) {
        chop;
        tr/A-Z/a-z/;    # convert to lower case
        $sortword = join('', sort(split(//)));    # letters of the word, sorted
        $anag{$sortword} .= $_ . " ";
    }
    foreach $words (values %anag) {
        if ($words =~ / ./) {    # more than one word in group
            print $words . "\n";
        }
    }

Feed it a file of words and it spits out the anagram groups, one group per line. It seems to chew up a lot of memory though: our /usr/dict/words has about 24,000 words (about 200k) and the script requires about 5 Meg of space. Is this a memory leak or simply an artifact of the way tables work?

Here's another version which uses the unix sort program (and is less hungry for memory!):

    #! /usr/bin/perl
    # anagram finder
    # Usage: anag [file...]
    unless (open(GOODS, "-|")) {
        # child is here, with its stdout attached to parent's GOODS
        open(OUTPUT, "|sort");
        while (<>) {
            chop;
            tr/A-Z/a-z/;
            print OUTPUT join('', sort(split(//, $_))) . " $_\n";
        }
        exit 0;    # child has done its work
    }
    # parent reads the GOODS.
    $last = "nosuchword";
    $line = "";
    while ($pair = <GOODS>) {
        chop($pair);
        ($code, $word) = split(/ /, $pair);
        #print "$last: $code, $word\n";
        if ($code ne $last) {
            print $line, "\n" if ($line =~ /. ./);
            $line = "$word ";
            $last = $code;
        } else {
            $line .= "$word ";    # append the word itself, not its sorted key
        }
    }
    print $line, "\n" if ($line =~ /. ./);    # flush the final group

I shall leave it to Randell to produce the customary one liner! :-)

Martin.

JANET: Martin.Ward@uk.ac.durham
Internet (eg US): Martin.Ward@DURHAM.AC.UK
or if that fails: Martin.Ward%uk.ac.durham@nfsnet-relay.ac.uk
or even: Martin.Ward%DURHAM.AC.UK@CUNYVM.CUNY.EDU
BITNET: IN%"Martin.Ward@DURHAM.AC.UK"
UUCP:...!mcvax!ukc!durham!Martin.Ward
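
For anyone on a newer perl, the grouping idea behind both scripts (sort a word's letters to get a key, then collect the words that share a key) can also be sketched with a hash of array references. This is only a sketch and not the code posted above: the names signature and anagram_groups and the sample word list are made up for illustration, and it assumes Perl 5 (references, my, use strict). Unlike the scripts above, it lowercases only the key and keeps each word as given.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# signature() is a hypothetical helper name: it returns the letters of a
# word, lowercased and sorted, so all anagrams of a word share one key.
sub signature {
    my ($word) = @_;
    return join('', sort(split(//, lc $word)));
}

# anagram_groups() (also a hypothetical name) collects words by signature
# and returns array references for the groups holding more than one word.
sub anagram_groups {
    my @words = @_;
    my %by_sig;
    push @{ $by_sig{ signature($_) } }, $_ for @words;
    return grep { @$_ > 1 } values %by_sig;
}

# The sample words are invented for this example.
for my $group (anagram_groups(qw(stale least hello slate world))) {
    print join(' ', @$group), "\n";
}
```

With the sample list this prints the single line "stale least slate". Keeping each group as a list means the "more than one word" check is a plain element count rather than a pattern match on a space-joined string.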