mathas_a@maths.su.oz.au ( Andrew ) (07/10/90)
Any ideas anyone?

Andrew

______________________________________________________________

#!/bin/ksh
# counts the number of unique words in a file
#   - unless the -t option is used assumes a TeX input file
#   - ignores words of length 1


  if [ $# = 0 ]
  then
    echo "Usage: words [-text] file"
  else
    case $1 in
      -t*)
         cmd="tr -cs A-Za-z '\012' < $2"
         ;;
      *)
         cmd="prespell < $1.tex | tr -cs A-Za-z '\012'"
         ;;
    esac


    exec $cmd | sort |
    awk '
    {
      if ( length($1) > 1)
      { word+=1
        repword+=1
        lastword = $1
        while ( getline && $1 == lastword )
          repword+=1
      }
    }
    END \
    { per = int(100*word/repword)
      printf " %d words %d unique (%d%)\n", repword, word, per
    }'
  fi
_______________________________________________________________

--
- smile at a stranger today and help make the world a better place;
  while you're at it, why not hug a friend!
leo@ehviea.ine.philips.nl (Leo de Wit) (07/18/90)
In article <1990Jul10.120034.10119@metro.ucc.su.OZ.AU> mathas_a@maths.su.oz.au ( Andrew ) writes:
|
|  Any ideas anyone?
|Andrew
|
|
|______________________________________________________________
|
|#!/bin/ksh
|# counts the number of unique words in a file
|#   - unless the -t option is used assumes a TeX input file
|#   - ignores words of length 1
|
|
|  if [ $# = 0 ]
|  then
|    echo "Usage: words [-text] file"
|  else
|    case $1 in
|      -t*)
|         cmd="tr -cs A-Za-z '\012' < $2"
|         ;;
|      *)
|         cmd="prespell < $1.tex | tr -cs A-Za-z '\012'"
|         ;;
|    esac
|
|
|    exec $cmd | sort |
|    awk '

[rest of script omitted for brevity]...

This will not work as it stands. The shell recognizes metacharacters
like < and | before it performs variable substitution, so when $cmd is
expanded in your script they are no longer special: they are passed as
ordinary arguments to the commands tr and prespell respectively.
Another problem (a small one) is that the exec on a pipeline has no
effect (at least in a Bourne shell; maybe ksh is different?).

The first problem is easily solved using the builtin eval command,
which roughly speaking does a reparse of the already parsed
expression(s) (the expanded $cmd in this case).

    eval $cmd | sort |
    awk '
    etc.

should do the job. If your system has 'uniq' you can probably avoid
most of the awk script.

You can even avoid the eval altogether in this case by piping from the
case command:

    case $1 in
    -t*) tr -cs A-Za-z '\012' < $2;;
    *)   prespell < $1.tex | tr -cs A-Za-z '\012';;
    esac | sort |
    awk '
    etc.

    Leo.
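
As a rough illustration of the 'uniq' suggestion above, here is a minimal sketch, not taken from either post. The function name count_words is hypothetical, and it assumes a POSIX sh with the standard tr, sort, uniq and awk. It reads text on standard input, ignores words of length 1 as the original script intends, and lets "uniq -c" do the grouping so that awk only has to add up the counts.

```shell
#!/bin/sh
# count_words (hypothetical name): read text on stdin and report the total
# and unique counts of words longer than one character.
count_words() {
    tr -cs 'A-Za-z' '\012' |     # turn every run of non-letters into a newline
    awk 'length($0) > 1' |       # drop empty lines and one-letter words
    sort |                       # bring identical words together
    uniq -c |                    # one line per distinct word: "<count> <word>"
    awk '{ total += $1; unique++ }
         END {
             # avoid dividing by zero when no words were found
             if (total == 0) { print " 0 words 0 unique"; exit }
             printf " %d words %d unique (%d%%)\n",
                    total, unique, int(100 * unique / total)
         }'
}
```

It could be fed from the same case statement as in the reply, for example "tr"-free as `count_words < file` for plain text, or `prespell < file.tex | count_words` for TeX input (prespell is the filter named in the original post; its behavior is assumed here, not verified).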