dash@fluke.UUCP (Mike Dash) (04/16/84)
you can use the letters in gorilla to make lots of words, of course (rag, rail, grail, log, gal, oil, air...). what i really want to know is how can i ask unix to look for the words? i would like to say foo gorilla, and have foo look for all words which are composed of letters in gorilla, using no other letters, and using no letter twice (except of course for l). is there an easy way to do this? is there any way? i'm no hacker and there- fore have to ask if anyone else can think of a way... -- dash (direct replies to ...decvax!microsof!fluke!dash or fluke!dash@uw-vlsi)
wolit@rabbit.UUCP (Jan Wolitzky) (04/19/84)
The question posed was: how would you write a program to find all the English words that can be made from any given word. For instance, from the letters of "gorilla" you can form "grill," "rail," etc. The problem can be expressed as the combination of two easily solved sub-problems. Consider the original word to be a set containing 'n' distinct elements, one representing each letter in the word (note that we consider the two 'l's to be distinct). The first problem is to form the (2**n)-2 distinct subsets of this set (we are not interested in the empty string, nor in the original string itself). This can be done by a simple combinatorial program. Note that this step does not produce every possible word formable from the original, since not all permutations are produced (for example, from "gorilla" we would produce one word containing 'r', 'a', 'i', and 'l', but we would not get both "rail" and "lair" -- in fact, we might not get either, but "lria" instead). The next problem is to find all the English words that are anagrams of each string produced in the first section. Jon Bentley, in his column in the "Communications of the ACM" entitled "Programming Pearls," addressed this question in the August, September, and October issues last year. The scheme is as follows: Preprocess your dictionary by tagging each word with a string formed from the letters of that word arranged in alphabetical order. This maps words that are anagrams of each other into identical strings. Sort the dictionary based on these tags (this groups anagram classes together). Now do the same for the list of strings generated by the first section (note that none of these strings will be anagrams of each other). (Note, too, that this will already be done if we are careful in the design of the program that generates the substrings.) The task is now simply to go through the alphabetized lists looking for matches. The dictionary processing, of course, need only be done once; the result can be used for future word-generation searchs. The generation of candidate strings takes exponential time, depending on the length of the string, but this scheme is still much more efficient than generating all possible permutations of these strings, thus forming all possible anagrams, sorting this list, and checking against a sorted dictionary. I highly recommend Bentley's column to anyone interested in learning elegant ways to attack thorny problems. Jan Wolitzky, AT&T Bell Labs, Murray Hill, NJ
aaw@pyuxss.UUCP (Aaron Werman) (04/19/84)
-1**.5 word permutation games are boring, unless they lead to insights, so anyone who tries computer generated answers gets bored real soon and quits. Brute force techniques in polynomial time/space should not be used to play games, except by the patient. Other techniques (digram frequency, or other context sensitive heuristics) are not exhaustive. Here is an example of a piddling brute force program which will merrily mutilate /tmp in nonreal time: # sh WORTHLESS ALGOLAGNIAC PROGRAM # DO NOT TRY THIS PROGRAM ON COMPUTER-INSTEAD HAND INTERPRET IT if [ $# != 1 ]; then echo 'Usage: permute <word>'; exit; fi trap "rm $AFILE $BFILE" 0 2 5 IN=/tmp/p$$ OUT=/tmp/q$$ AWKSCRIPT=' { for (i = length; i >= 0; i--) print substr($0,1,i) CHAR substr($0, i + 1) }' expr $1 : '\(.\).*'>$IN for LETTER in `expr $1 : '.\(.*\)'|sed 's/./& /g'` do awk "$AWKSCRIPT" CHAR=$LETTER $IN >$OUT TEMP=$OUT;OUT=$IN;IN=$TEMP done if [ $TEMP = $OUT ] then cat $OUT else cat $IN fi rm $IN $OUT # after hand running the program on the word of your choice, # run the output through a bunch of cut(1)s, sort, uniq (all # by hand, please, computers were not meant for this quantity of I/O) # you will have your combinations. Rots of Ruck {harpo,houxm,ihnp4}!pyuxss!aaw Aaron Werman