[net.puzzle] how many words can you make from gorilla?

dash@fluke.UUCP (Mike Dash) (04/16/84)

you can use the letters in gorilla to make lots of words, of course (rag,
rail, grail, log, gal, oil, air...).  what i really want to know is how
can i ask unix to look for the words?

i would like to say foo gorilla, and have foo look for all words which are
composed of letters in gorilla, using no other letters, and using no letter
twice (except of course for l).

is there an easy way to do this?  is there any way?  i'm no hacker and there-
fore have to ask if anyone else can think of a way...

-- dash (direct replies to ...decvax!microsof!fluke!dash or fluke!dash@uw-vlsi)

wolit@rabbit.UUCP (Jan Wolitzky) (04/19/84)

The question posed was: how would you write a program to find all the
English words that can be made from any given word.  For instance,
from the letters of "gorilla" you can form "grill," "rail," etc.

The problem can be expressed as the combination of two easily solved
sub-problems.  Consider the original word to be a set containing 'n' 
distinct elements, one representing each letter in the word (note that 
we consider the two 'l's to be distinct).  The first problem is to
form the (2**n)-2 distinct subsets of this set (we are not interested in
the empty string, nor in the original string itself).  This can be done 
by a simple combinatorial program.  Note that this step does not
produce every possible word formable from the original, since not all
permutations are produced (for example, from "gorilla" we would
produce one word containing 'r', 'a', 'i', and 'l', but we would not
get both "rail" and "lair" -- in fact, we might not get either, but
"lria" instead).

The next problem is to find all the English words that are anagrams of
each string produced in the first section.  Jon Bentley, in his
column in the "Communications of the ACM" entitled "Programming Pearls,"
addressed this question in the August, September, and October issues
last year.  The scheme is as follows:  Preprocess your dictionary
by tagging each word with a string formed from the letters of that word 
arranged in alphabetical order.  This maps words that are anagrams 
of each other into identical strings.  Sort the dictionary based on
these tags (this groups anagram classes together).  Now do the same
for the list of strings generated by the first section (note that none
of these strings will be anagrams of each other).  (Note, too, that
this will already be done if we are careful in the design of the 
program that generates the substrings.) The task is now simply to go 
through the alphabetized lists looking for matches.

The dictionary processing, of course, need only be done once; the
result can be used for future word-generation searchs.  The generation
of candidate strings takes exponential time, depending on the length
of the string, but this scheme is still much more efficient than generating 
all possible permutations of these strings, thus forming all possible
anagrams, sorting this list, and checking against a sorted dictionary.

I highly recommend Bentley's column to anyone interested in learning
elegant ways to attack thorny problems.

	Jan Wolitzky, AT&T Bell Labs, Murray Hill, NJ

aaw@pyuxss.UUCP (Aaron Werman) (04/19/84)

-1**.5

word permutation games are boring, unless they lead to insights, so
anyone who tries computer generated answers gets bored real soon and
quits. Brute force techniques in polynomial time/space should not be
used to play games, except by the patient. Other techniques (digram
frequency, or other context sensitive heuristics) are not exhaustive.
Here is an example of a piddling brute force program which will
merrily mutilate /tmp in nonreal time:
# sh WORTHLESS ALGOLAGNIAC PROGRAM
# DO NOT TRY THIS PROGRAM ON COMPUTER-INSTEAD HAND INTERPRET IT
if [ $# != 1 ]; then echo 'Usage: permute <word>'; exit; fi
trap "rm $AFILE $BFILE" 0 2 5
IN=/tmp/p$$
OUT=/tmp/q$$
AWKSCRIPT=' { for (i = length; i >= 0; i--)
	print substr($0,1,i) CHAR substr($0, i + 1) }'
expr $1 : '\(.\).*'>$IN
for LETTER in `expr $1 : '.\(.*\)'|sed 's/./& /g'`
  do	awk "$AWKSCRIPT" CHAR=$LETTER $IN >$OUT
	TEMP=$OUT;OUT=$IN;IN=$TEMP
  done
if [ $TEMP = $OUT ]
  then	cat $OUT
  else	cat $IN
fi
rm $IN $OUT

# after hand running the program on the word of your choice,
# run the output through a bunch of cut(1)s, sort, uniq (all
# by hand, please, computers were not meant for this quantity of I/O)
# you will have your combinations. Rots of Ruck
			{harpo,houxm,ihnp4}!pyuxss!aaw
			Aaron Werman