[comp.lang.perl] Anagram finder and memory question

des0mpw@colman.newcastle.ac.uk (M.P. Ward) (11/10/90)

Here's a little perl script to find anagrams:

#! /usr/bin/perl
# anagram finder
# Usage: anag [file...]

while (<>) {
  chop;
  tr/A-Z/a-z/;          # convert to lower case (tr takes ranges, not regex classes)
  $sortword = join('',sort(split(//)));
  $anag{$sortword} .= $_ . " ";
}

foreach $words (values %anag) {
  if ($words =~ / ./) { # more than one word in group
    print $words . "\n";
  }
}


Feed it a file of words and it spits out the anagram groups, one group per
line. It seems to chew up a lot of memory, though: our /usr/dict/words
has about 24,000 words (about 200k) and the script needs about 5 Meg
of space to process it. Is this a memory leak, or simply an artifact of
the way associative arrays work?
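
To make the grouping key concrete, here is a minimal sketch of the idea in
isolation. It assumes a Perl 5 interpreter (for "my", "lc" and "use strict"),
and the helper name signature() is made up for illustration; it is not part
of the script above. Two words are anagrams exactly when their sorted letters
are the same string, so that string serves as the associative array key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# signature() is a hypothetical helper, not part of the original script.
# It lower-cases a word and sorts its letters, so anagrams share one key.
sub signature {
    my ($word) = @_;
    return join('', sort(split(//, lc($word))));
}

my %anag;
foreach my $word (qw(listen silent enlist google)) {
    $anag{ signature($word) } .= "$word ";
}

# Print only the keys that collected more than one word.
foreach my $key (sort keys %anag) {
    print "$key: $anag{$key}\n" if $anag{$key} =~ / ./;
}
```

Here "listen", "silent" and "enlist" all map to the key "eilnst" and come
out as one group, while "google" is alone under its key and is skipped.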


Here's another version which pipes the words through the Unix sort program
(and is much less hungry for memory!):

#! /usr/bin/perl
# anagram finder
# Usage: anag [file...]

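$pid = open(GOODS, "-|");
die "can't fork: $!\n" unless defined($pid);
unless ($pid) {
  # child is here, with its stdout attached to parent's GOODS
  open(OUTPUT, "|sort") || die "can't run sort: $!\n";
  while (<>) {
    chop;
    tr/A-Z/a-z/;        # convert to lower case
    # emit "sortedletters word" so that sort brings anagrams together
    print OUTPUT join('',sort(split(//,$_))) . " $_\n";
  }
  close(OUTPUT);        # flush the pipe and wait for sort to finish
  exit 0; # child has done its work
}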
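# parent reads the GOODS.
$last = "nosuchword";
$line = "";
while($pair = <GOODS>) {
  chop($pair);
  ($code,$word) = split(/ /,$pair);
  if ($code ne $last) {
    # new code: print the finished group if it holds more than one word
    print $line,"\n" if ($line =~ /. ./);
    $line = "$word ";
    $last = $code;
  } else {
    $line .= "$word ";  # same code, so add the word (not the code) to the group
  }
}
print $line,"\n" if ($line =~ /. ./);   # don't forget the final group
close(GOODS);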
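
The parent's loop depends on sort having put lines with equal codes next to
each other, so a group ends whenever the code changes. Below is a minimal
sketch of just that grouping step, run on in-memory data rather than a pipe.
It assumes Perl 5, and group_sorted_pairs() is a hypothetical name chosen for
illustration. Note that the last group has to be flushed after the loop,
since no later line arrives to trigger it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# group_sorted_pairs() is a hypothetical helper for illustration.
# Input:  strings of the form "code word", already sorted by code.
# Output: one string per group that contains at least two words.
sub group_sorted_pairs {
    my @pairs = @_;
    my @groups;
    my $last = '';
    my @words;
    foreach my $pair (@pairs) {
        my ($code, $word) = split(/ /, $pair);
        if ($code ne $last) {
            # The code changed, so the previous group is complete.
            push(@groups, join(' ', @words)) if @words > 1;
            @words = ();
            $last  = $code;
        }
        push(@words, $word);
    }
    # Flush the final group, which no later line will trigger.
    push(@groups, join(' ', @words)) if @words > 1;
    return @groups;
}

# Build "code word" pairs, sort them, and group the adjacent matches.
my @pairs;
foreach my $word (qw(tea eat ate dog god cat)) {
    push(@pairs, join('', sort(split(//, $word))) . " $word");
}
print "$_\n" foreach group_sorted_pairs(sort(@pairs));
```

Only one group is held in memory at a time, which is why this approach
needs far less space than collecting every word in an associative array.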


I shall leave it to Randal to produce the customary one-liner! :-)


			Martin.

JANET: Martin.Ward@uk.ac.durham    Internet (eg US): Martin.Ward@DURHAM.AC.UK
or if that fails:  Martin.Ward%uk.ac.durham@nfsnet-relay.ac.uk  
or even: Martin.Ward%DURHAM.AC.UK@CUNYVM.CUNY.EDU
BITNET: IN%"Martin.Ward@DURHAM.AC.UK" UUCP:...!mcvax!ukc!durham!Martin.Ward