sahayman@iuvax.cs.indiana.edu (Steve Hayman) (12/13/90)
Have you ever wondered what the longest word is that can be spelled
with consecutive Unix commands? (i.e. "fingertip" = "finger" + "tip")
You have? Well, stop worrying. Here's a script that will find them
by seeing which words in /usr/dict/words can be spelled via
combinations of commands in /bin:/usr/bin:/usr/ucb

ok ok it's a dumb script, but how many of you knew that
"testicular" can be spelled with standard Ultrix commands?

I ran this on an Ultrix system and the longest words
produced were

    watershed
    prescript
    fingertip
    predicate
    extricate
    flintlock
    printmake
    collinear
    manometric
    manuscript
    communique
    testicular
    clearheaded
    fingerprint

don't waste too much time running this ...

#!/usr/bin/perl
# unixword
# find the words in /usr/dict/words that can be constructed
# out of unix commands. sort by length.

# when I run this on our ultrix machine, the longest words I get are
#    manometric
#    manuscript
#    communique
#    testicular
#    clearheaded
#    fingerprint
#
# OK it's a silly script. don't waste too much time running it.
# (it may take a few minutes)

# steve hayman
# dec 12/1990

@dirs = ( '/bin', '/usr/bin', '/usr/ucb');
$wordlist = '/usr/dict/words';

# step 1: get a list of file names in the various directories

foreach $dir ( @dirs ) {
    opendir(DIR, $dir) || die "Can't opendir $dir: $!";
    push(@files, readdir(DIR));
    closedir(DIR);    # directory handles are closed with closedir, not close
}

# step 2: protect metacharacters like '.' or '[' which
# can occur in the file names

foreach $f ( @files ) {
    $f =~ s/[.[]/\\$&/g;
}

# step 3: construct a suitable regular expression matching
# all these filenames

$re = '^(' . join("|", @files) . ')+$' ;

# step 4: match the dictionary file against this pattern; store words that
# match the pattern - assoc. array indexed by word, containing word len.

open(DICT, $wordlist) || die "Can't open $wordlist: $!";

while ( <DICT> ) {
    chop;
    $len{$_} = length if /$re/io;
}

# step 5: print word list in order of length

foreach $word ( sort lengthwise keys %len ) {
    print "$word\n";
}

sub lengthwise {
    $len{$a} - $len{$b};
}
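Step 2 of the script above protects only '.' and '[', so a file name containing some other metacharacter (a '+', say) would change the meaning of the pattern. Here is a minimal sketch of one way to build the same alternation with every metacharacter escaped. It assumes a Perl 5 interpreter, where quotemeta and qr// are built in. The command list and the spellable() helper are made up for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical command names standing in for the readdir() results;
# "c++" and "[" contain regex metacharacters.
my @files = ('finger', 'tip', 'c++', '[', 'test');

# quotemeta backslash-escapes every non-word character in each name.
my $alternation = join '|', map { quotemeta($_) } @files;

# Compile the pattern once; /i mirrors the case-insensitive match
# used in the original script.
my $re = qr/^(?:$alternation)+$/i;

# Returns 1 if $word is a concatenation of the command names, else 0.
sub spellable {
    my ($word) = @_;
    return $word =~ $re ? 1 : 0;
}

for my $word ('fingertip', 'Fingertip', 'fingertips', 'testc++') {
    printf "%-12s %s\n", $word, spellable($word) ? 'yes' : 'no';
}
```

With this made-up list, "fingertips" is rejected because no command supplies the trailing "s", while "testc++" matches only because the '+' characters were escaped.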
tchrist@convex.COM (Tom Christiansen) (12/13/90)
sahayman@iuvax.cs.indiana.edu (Steve Hayman) writes:
:I ran this on an Ultrix system and the longest words
:produced were
: watershed
: prescript
: fingertip
: predicate
: extricate
: flintlock
: printmake
: collinear
: manometric
: manuscript
: communique
: testicular
: clearheaded
: fingerprint
:don't waste too much time running this ...
Neat.
But don't you know that telling someone not to do something is the best way
to get it to happen? :-)
This one took ~600 CPU seconds on a 2.5 megabyte dictionary.
--tom
<<<8 CHARS>>>
arcuated
arvicole
asellate
assorted
calendar
calfkill
Colville
diffused
dullhead
dulseman
educated
educatee
errorful
excalate
excudate
expanded
extrared
exuviate
eyestalk
farewell
fattrels
feedhead
fingered
flathead
flatware
flatweed
flockman
Fulfulde
headlock
headwall
headwear
hostname
indented
indentee
killcalf
killweed
makefile
mandatee
Manville
morefold
prateful
predwell
preprint
preshare
presumed
pretreat
producal
revulsed
shadbush
shareman
shearman
shelfful
shellful
shellman
sleepful
spellful
sufflate
suffused
tailhead
tartrate
ultrared
unmassed
unmeated
unmudded
unmulled
untalked
viduated
wellhead
<<<9 CHARS>>>
clearcole
clearweed
commodate
compacted
cucullate
exululate
fatheaded
fingertip
flintlock
manducate
preassume
predefeat
predetail
predetest
prescript
printmake
sheartail
shellhead
shepstare
splittail
strippage
subcellar
sulfatase
tartrated
timeshare
uncompact
unicelled
watershed
windowful
windowman
<<<10 CHARS>>>
astipulate
communique
exsufflate
extipulate
loggerhead
manuscript
prediction
prefearful
preinstall
proflogger
sheepshead
stringsman
tartarated
unexpanded
unicellate
unmodelled
<<<11 CHARS>>>
clearheaded
fingerprint
printscript
splitfinger
subcultrate
uncompacted
unicellular
unmeditated
unmodulated
<<<12 CHARS>>>
killeekillee
loggerheaded
unmedullated
--
Tom Christiansen tchrist@convex.com convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
to look at yourself in the mirror the next morning." -me
maart@cs.vu.nl (Maarten Litmaath) (12/15/90)
In article <77888@iuvax.cs.indiana.edu>,
    sahayman@iuvax.cs.indiana.edu (Steve Hayman) writes:
)
)Have you ever wondered what the longest word is that can be spelled
)with consecutive Unix commands? (i.e. "fingertip" = "finger" + "tip")
)You have? Well, stop worrying. Here's a script that will find them
)by seeing which words in /usr/dict/words can be spelled via
)combinations of commands in /bin:/usr/bin:/usr/ucb

Nice indeed, Steve!
I've changed your script in a few ways, though:

- /etc and /usr/etc are now searched too, which leads to the next change
- it's checked if an entry is really an executable
- double entries (from different directories) are removed

and most importantly

- it's shown HOW each word can be broken up into UNIX commands!

Some words have more than 1 `representation' in the `UNIX vector space'.
Example:

    view
    vi-e-w

I don't have much experience with Perl yet, so my version of the script
may be improved too.
Here's some output:

    Ac-ta-e-on
    Cal-cut-ta
    Sh-ar-on
    Sh-e-ld-on
    Wall-ac-e
    W-ar-sa-w
    ac-comm-od-at-e
    as-sum-e
    as-tr-id-e
    clear-head-ed
    col-line-ar
    e-du-cat-e
    e-man-at-e
    enroll-e-e
    ex-e-cut-e
    id-e-at-e
    man-at-e-e
    on-e-time
    pr-e-sum-e
    pr-e-tty
    refer-e-e
    sed-at-e
    su-cc-e-ed
    test-at-e
    time-sh-ar-e
    tr-e-as-on
    w-ar-head
    w-ar-time
    w-at-e-rsh-ed

Here's the new script:
--------------------cut here--------------------
#!/usr/local/bin/perl
# unixword v2.0
# find the words in /usr/dict/words that can be constructed
# out of unix commands. sort alphabetically.
# show how each word can be constructed from which commands.
# /etc and /usr/etc are now searched too.
#
# v1.0 by steve hayman, dec 12/1990
# v2.0 by maarten litmaath, dec 15/1990

@dirs = ( '/bin', '/usr/bin', '/usr/ucb', '/etc', '/usr/etc');
$wordlist = '/usr/dict/words';

# step 1: get a list of executables in the various directories
# step 2: leave out all entries containing non-alphabetic characters
# use an associative array to get rid of duplicate entries

foreach $dir ( @dirs ) {
    opendir(DIR, $dir) || die "Can't opendir $dir: $!";
    foreach $f (readdir(DIR)) {
        $ent = $dir . '/' . $f;
        if ($f !~ /\W|_|\d/ && -x $ent && ! -d $ent) {
            $files{$f} = 0;
        }
    }
    closedir(DIR);    # directory handles are closed with closedir, not close
}

@files = keys(%files);

# step 3: construct a suitable regular expression matching
# all these filenames

$re = '^(' . join("|", @files) . ')+$' ;

# step 4: match the dictionary file against this pattern; store words that
# match the pattern - assoc. array indexed by word, containing word len.

open(DICT, $wordlist) || die "Can't open $wordlist: $!";

while ( <DICT> ) {
    chop;
    $len{$_} = length if /$re/io;
}

# breakup() returns an array of all possible `breakups' of its argument
# example for `abcd':
#    a-b-c-d
#    a-b-cd
#    a-bc-d
#    a-bcd
#    ab-c-d
#    ab-cd
#    abc-d
#    abcd

sub breakup {
    local($word) = @_;
    local(@L) = 1 .. length($word) - 1;
    local(@ans, @sufs, $pre, $prelen, $suf);

    for $prelen (@L) {
        $pre = substr($word, 0, $prelen);
        @sufs = &breakup(substr($word, $prelen));
        foreach $suf (@sufs) {
            push(@ans, $pre . '-' . $suf);
        }
    }
    push(@ans, $word);
    @ans;
}

$brkupre = '^(-' . join("|-", @files) . ')+$' ;

# step 5: print word list alphabetically, show how each word can be
# broken up

foreach $word ( sort keys %len ) {
    @tries = &breakup($word);
    foreach $try (@tries) {
        print "$try\n" if "-$try" =~ /$brkupre/io;
    }
}
--
In the Bourne shell syntax tabs and spaces are equivalent almost everywhere.
The exception: _indented_ here documents. :-(
Does anyone remember the famous mistake Makefile-novices often make?
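The breakup() routine above returns every one of the 2**(n-1) ways to split an n-letter word and only afterwards filters them against the command pattern. A possible alternative, sketched here under assumptions, is to follow only those prefixes that are themselves commands. The sketch uses Perl 5 syntax. The %is_cmd hash stands in for the %files array of the script and holds a made-up command set, and segmentations() is a hypothetical name, not something from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical command set standing in for keys(%files) in the script.
my %is_cmd = map { $_ => 1 } qw(vi e w view finger tip);

# Return every way to split $word into command names, joined with '-'.
# A prefix is followed only when it is itself a command, so most of
# the 2**(n-1) candidate splits are never generated.
sub segmentations {
    my ($word) = @_;
    my @ans;
    for my $prelen (1 .. length($word) - 1) {
        my $pre = substr($word, 0, $prelen);
        next unless $is_cmd{ lc $pre };    # case-insensitive, like /i
        push @ans, map { "$pre-$_" } segmentations(substr($word, $prelen));
    }
    # The whole word counts as a breakup if it is a command by itself.
    push @ans, $word if $is_cmd{ lc $word };
    return @ans;
}

print "$_\n" for map { segmentations($_) } qw(view fingertip viewer);
```

A word that cannot be spelled at all, such as "viewer" with this command set, yields an empty list, so the separate match against $brkupre would not be needed in this variant.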