[comp.lang.perl] Examples of associative arrays

des0mpw@colman.newcastle.ac.uk (M.P. Ward) (02/08/91)

Some time ago I presented the results of using perl on a wordlist to look
for various `interesting' words and was mildly chastised for not giving
the code. I think it makes an interesting exercise, especially illustrating
the use of associative arrays. So here they are:

Some things you can do with grep/egrep since you are just matching each
word against a regular expression eg:

Words with two equal components:
egrep '^(.*)\1$' wordlist

One each of all vowels (including y) in order:
(I have split this line)
grep '^[^aeiouy]*a[^aeiouy]*e[^aeiouy]*\
i[^aeiouy]*o[^aeiouy]*u[^aeiouy]*y[^aeiouy]*$'

But most require more powerful tools - and perl has them all:
Note the idiom: join('',sort(split(//,$word))) which returns a string
containing all the characters in $word sorted in order:

------------------------------------
# find words with letters in alphabetical order
# Usage: alpha file
# Only interested in fairly long words (at least 6 letters):
open(WORDS,"grep '......' $ARGV[0] |");
while ($word = <WORDS>) {
  chop($word);                  # remove the newline character
  # print word if word = sortword:
  print "$word\n" if ($word eq join('',sort(split(//,$word))));
}
-----------------------------------
# Find all anagram words
# Usage: anagrams wordlist
open(WORDS,"$ARGV[0]");
while ($word = <WORDS>) {
  chop($word);
  # record each word under its sorted letters:
  $sortword = join('',sort(split(//,$word)));
  $words{$sortword} .= "$word ";
}
foreach $sortword (keys %words) {
  # if a set of sorted letters has two or more words stored then they
  # are anagrams, so print them:
  $list = $words{$sortword};
  # count no. of spaces in list (== no of words):
  if (($list =~ tr/ / /) >= 2) {
    chop($list);
    print "$list\n";
  }
}
-----------------------------------
# single anagram finder
# Usage: anag letters wordlist
$input = join('',sort(split(//,$ARGV[0])));
$l = length($input);
# pick words with all letters in given word:
open(WORDS,"grep '^[$input]*\$' $ARGV[1] |");
while (<WORDS>) {
  chop;
  next if (length != $l);       # must have right length
  $sortword = join('',sort(split(//)));
  if ($input eq $sortword) {
    print "$_\n";
  }
}
-----------------------------------
# find the last few words in an anagram dictionary:
# Usage: last-anag file
# Only interested in words which don't contain 
# `early' letters of the alphabet:
open(WORDS,"grep -v '[a-s]' $ARGV[0] |");
while ($word = <WORDS>) {
  chop ($word);
  # store word under its sorted letters:
  $words{join('',sort(split(//,$word)))} .= $word . " ";
}
# print results sorted:
foreach $sortword (sort(keys %words)) {
  print "$sortword: $words{$sortword}\n";
}
----------------------------------- 
# find words with no letters repeated
# Usage: no-repeats file
# Only interested in fairly long words (at least 12 letters):
open(WORDS,"grep '............' $ARGV[0] |");
while ($word = <WORDS>) {
  chop($word);
  $sortword = join('',sort(split(//,$word)));
  print "$word\n" unless ($sortword =~ /(.)\1/);
}
-----------------------------------
# ``reversable words'' finder
# Usage: revword file
# Only interested in fairly long words (at least 6 letters):
open(WORDS,"grep '......' $ARGV[0] |");
while (<WORDS>) {
  chop;
  # print word if reversed word has been found:
  print "$_\n" if ($words{reverse} == 1);
  $words{$_} = 1;
  # Reverse the order of the last two lines if you want to list
  # palindrome words as well.
}
-----------------------------------


I hope you find this of some interest/amusement.

Finally I noticed that the anagrams program (which lists all anagrams in
the word list) eventually required over 17 Mbytes of memory to analyse a
650k wordlist. This seems to be rather excessive since each word is stored
at most twice (as a key and as an array entry). 

                        Martin.

print "Just another ";open(P,"/usr/dict/words");@_= <P>; 
grep(((/^Perl.$/)&&(tr/P/p/,chop,chop,print)),@_);print " hacker";

JANET: Martin.Ward@uk.ac.durham    Internet (eg US): Martin.Ward@DURHAM.AC.UK
or if that fails:  Martin.Ward%uk.ac.durham@nfsnet-relay.ac.uk  
or even: Martin.Ward%DURHAM.AC.UK@CUNYVM.CUNY.EDU
BITNET: IN%"Martin.Ward@DURHAM.AC.UK" UUCP:...!mcvax!ukc!durham!Martin.Ward