rouben@math9.math.umbc.edu (09/24/90)
How can I count the number of occurrences of a given character in a file? It can be done rather trivially in C, but I wonder if it can also be done using standard unix utilities like awk, sed, tr, wc, etc.

The closest I have come to this is the following construction:

cat file | tr -c 'A' '' | wc -c

which attempts to count the number of occurrences of the character "A" in the file. The "tr" command replaces all characters different from "A" by the null character, then "wc" counts all characters in its input, (unfortunately) also counting the null characters :-(

I feel that I am missing something, and that there should be an easy way to count characters a la unix. Any hints?

[If it matters, the operating system is ultrix and the shells are sh and csh.]

--
Rouben Rostamian
Department of Mathematics and Statistics
University of Maryland Baltimore County
Baltimore, MD 21228, U.S.A.
Telephone: (301) 455-2458
e-mail: rostamian@umbc.bitnet, rostamian@umbc3.umbc.edu
emv@math.lsa.umich.edu (Edward Vielmetti) (09/24/90)
In article <4002@umbc3.UMBC.EDU> rouben@math9.math.umbc.edu writes:
> How can I count the number of occurrences of a given character in a file?
> It can be done rather trivially in C, but I wonder if it can also be done
> using standard unix utilities like awk, sed, tr, wc, etc.
> The closest I have come to this is the following construction:
> cat file | tr -c 'A' '' | wc -c
This is what I came up with in perl, after about 15 minutes of digging
in the perl info pages:
cat file | perl -ne '$c += tr/A/A/; if (eof()) {print "$c\n";}'
Going back to the tr man page this one seems to work too:
cat file | tr -cd 'A' | wc -c
I don't see an easy perl equivalent of the "tr -cd" idiom.
--Ed
Edward Vielmetti, U of Michigan math dept <emv@math.lsa.umich.edu>
moderator, comp.archives
ted@nmsu.edu (Ted Dunning) (09/24/90)
i didn't want to answer this one, but

In article <EMV.90Sep23181658@picasso.math.lsa.umich.edu> emv@math.lsa.umich.edu (Edward Vielmetti) writes:
> In article <4002@umbc3.UMBC.EDU> rouben@math9.math.umbc.edu writes:
> ...
> cat file | tr -c 'A' '' | wc -c
> ...
> ...
> cat file | tr -cd 'A' | wc -c

ed must have been kidding when he left the cat in place, instead of

tr -cd 'A' < file | wc -c

--
ted@nmsu.edu
+---------+
| In this |
|  style  |
|__10/6___|
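The redirection version can be wrapped in a small function, sketched below under the assumption of a single named file (the name count_char is hypothetical). Because tr treats its first operand as a set, passing abc would count a, b and c together. Some wc implementations pad the count with leading blanks, so the sketch passes the result through shell arithmetic to print a bare number.

```shell
# count_char CHAR FILE: print how many bytes of FILE are CHAR.
# (Hypothetical helper; CHAR is handed to tr as a set, so "abc"
# would count a, b and c together.)
count_char() {
    # -c complements the set and -d deletes, so only CHAR survives;
    # the input redirection replaces the "cat file |" stage.
    n=$(tr -cd "$1" < "$2" | wc -c)
    # $((n)) drops any blanks that wc puts in front of the number.
    echo $((n))
}
```

Usage: count_char A file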
skwu@boulder.Colorado.EDU (WU SHI-KUEI) (09/24/90)
The solution:

cat file | tr -c 'A' '' | wc -c

is very close. Just change it to:

cat file | tr -cd 'A' | wc -c

and you'll count nothing but A's.
ror@grassys.bc.ca (Richard O'Rourke) (09/25/90)
In article <4002@umbc3.UMBC.EDU>, rouben@math9.math.umbc.edu writes:
> How can I count the number of occurrences of a given character in a file?
[ stuff deleted ]
>
> cat file | tr -c 'A' '' | wc -c
>
> which attempts to count the number of occurrences of the character "A"
> in the file. The "tr" command replaces all characters different from
> "A" by the null character, then "wc" counts all characters in its input
> (unfortunately) also counting the null characters :-(

I'm not sure that what you think `tr` does in this case is what happens in reality. I respectfully suggest re-reading the tr man page.

>
> I feel that I am missing something, and that there should be an easy way
> to count characters a la unix. Any hints?

I did not test this extensively, and I'm sure that it will work only on text files. Sure to be bettered and/or critiqued:

#
# Count # of 'A' chars in a file
# use $0 filename
#
set `sed 's/[^A]//g
/^$/d
' $1 | sed 's/ //g' | wc`
echo $3 - $2 | bc
#
# End of script

>
> [If it matters, the operating system is ultrix and the shells are sh and csh.]

The above seems to work with sh.

Richard O'Rourke: (604)438-8249 | Grass Root Systems: 436-1995
UUCP: uunet!van-bc!mplex!grassys!ror | Smart UUCP: ror@grassys.bc.ca
ror@grassys.wimsey.bc.ca
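The arithmetic in the script above works because, once sed has deleted every other character and the empty lines, each remaining line is a run of A's plus one newline. wc's word count then equals its line count, so characters minus words is the number of A's. A sketch of the same idea (function name hypothetical) asks wc for lines and bytes directly and uses shell arithmetic instead of bc:

```shell
# count_a_wc [FILE...]: count the 'A' characters in the named files
# (or standard input) by measuring what is left after deleting the rest.
# (Hypothetical helper name.)
count_a_wc() {
    # Keep only the A's, then drop the lines that became empty, so each
    # surviving line is a run of A's followed by one newline.
    set -- $(sed -e 's/[^A]//g' -e '/^$/d' "$@" | wc -l -c)
    # wc always reports lines before bytes: $1 = lines, $2 = bytes.
    # Subtracting one newline per line leaves the number of A's.
    echo $(( $2 - $1 ))
}
```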
ellis@motcid.UUCP (John T Ellis) (09/25/90)
In article <4002@umbc3.UMBC.EDU> rouben@math9.math.umbc.edu () writes:
>How can I count the number of occurrences of a given character in a file?
>It can be done rather trivially in C, but I wonder if it can also be done
>using standard unix utilities like awk, sed, tr, wc, etc.
>
>The closest I have come to this is the following construction:
>
>cat file | tr -c 'A' '' | wc -c
>
>which attempts to count the number of occurrences of the character "A"
>in the file. The "tr" command replaces all characters different from
>"A" by the null character, then "wc" counts all characters in its input
>(unfortunately) also counting the null characters :-(

[Text Deleted]

Try the following:

cat file | tr -cs A-Za-z A-Za-z'\012' | sort | uniq -c

If I understand tr, which is not necessarily true at this ungodly hour of the morning :-), this should take the first occurrence of either A-Z or a-z and map it to A-Z or a-z with a line feed. Hence, you get a long list with single characters. Sort it and push it through the uniq filter, which with the -c option tells you the number of times a character appeared. Note: This will differentiate between A and a.

John
--
John T. Ellis, Motorola Cellular, 708-632-7857, ...uunet!motcid!ellis
Any sufficiently advanced technology is indistinguishable from magic. :-}
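tr translates characters one for one (or deletes or squeezes them); it cannot append a newline after every character, so the one-character-per-line step is easier with another tool. A sketch of the same sort | uniq -c idea that swaps in POSIX fold -b -w 1 for the tr step (the function name is hypothetical):

```shell
# char_histogram [FILE...]: print "count char" for every distinct byte
# in the named files or standard input. (Hypothetical helper name.)
char_histogram() {
    # fold -b -w 1 breaks the input after every byte, so each output
    # line holds one character; sort groups identical characters and
    # uniq -c prints how many of each there are.
    fold -b -w 1 "$@" | sort | uniq -c
}
```

Blank input lines show up as a count next to an empty character, and, as in the post, A and a are counted separately.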
dmt@PacBell.COM (Dave Turner) (09/26/90)
In article <4002@umbc3.UMBC.EDU> rouben@math9.math.umbc.edu () writes:
>How can I count the number of occurrences of a given character in a file?
>It can be done rather trivially in C, but I wonder if it can also be done
>using standard unix utilities like awk, sed, tr, wc, etc.
>
>I feel that I am missing something, and that there should be an easy way
>to count characters a la unix. Any hints?

The following will count all the occurrences of all character types in a file. Simple modifications could limit it to those of interest.

cat file | sed -n -e "s/./&\\
/gp" | sort | uniq -c

Note: the first line of output is the number of newline characters.
--
Dave Turner 415/823-2001 {att,bellcore,sun,ames,decwrl}!pacbell!dmt
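One such modification, sketched here with a hypothetical function name, keeps the sed step (one character per line) and replaces sort | uniq -c with grep -c, which counts the lines consisting of exactly the character of interest. It assumes the character is not a regular-expression metacharacter such as . or *. grep -c still prints 0 (and exits with status 1) when there are no matches.

```shell
# count_char_split CHAR [FILE...]: count CHAR in the named files or
# standard input. (Hypothetical helper; CHAR must not be a regex
# metacharacter, and ch is an ordinary global shell variable.)
count_char_split() {
    ch=$1
    shift
    # The escaped newline in the replacement puts every character on a
    # line of its own; grep -c then counts the lines that are exactly CHAR.
    sed -n -e 's/./&\
/gp' "$@" | grep -c "^$ch\$"
}
```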
george@hls0.hls.oz (George Turczynski) (09/27/90)
In article <4002@umbc3.UMBC.EDU>, rouben@math9.math.umbc.edu writes:
[...Deleted...]
> The closest I have come to this is the following construction:
>
> cat file | tr -c 'A' '' | wc -c
>
> which attempts to count the number of occurrences of the character "A"
> in the file.
[...Deleted...]

OK, try this:

awk -F'A' '{ sum+= (NF-1) } END { print sum }' file

The single quotes around the "A" here are only to point out the "A", and aren't really necessary. It simply makes "A" the field separator and adds the number of fields (NF) less one to the total. You will see why the "-1" is necessary if you think about it: a line containing k A's is split into k+1 fields.

On some systems you may have to initialize sum to zero, with:

BEGIN { sum= 0 }

> [If it matters, the operating system is ultrix and the shells are sh and csh.]

Just for interest's sake, this is under SunOS 4.0.3. I trust this is what you were looking for!

--
George P. J. Turczynski, Computer Systems Engineer, Highland Logic Pty Ltd.
Suite 1, 348-354 Argyle St, Moss Vale, NSW. 2577, Australia.
ACSnet: george@highland.oz   Phone: +61 48 683490   Fax: +61 48 683474
"Witty remarks are as hard to come by as is space to put them!"
haberman@msi.umn.edu (Joe Habermann) (09/29/90)
george@hls0.hls.oz (George Turczynski) writes:
>OK, try this:
>  awk -F'A' '{ sum+= (NF-1) } END { print sum }' file

This is close. Doesn't seem to work for empty lines, though. There NF = 0, so each empty line subtracts one from the total (a file holding just one blank line reports -1). How about:

awk -F'A' '{ if (NF > 0) sum += (NF-1) } END { print sum }' file

Joe Habermann / haberman@msi.umn.edu
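The same guard can also be written as an awk pattern. A sketch (function name hypothetical) that skips empty lines with the NF pattern and prints n + 0, so that input with no A's prints 0 instead of an empty line:

```shell
# count_a_awk [FILE...]: count the 'A' characters in the named files
# or standard input. (Hypothetical helper name.)
count_a_awk() {
    # With A as the field separator, a line holding k A's has k+1
    # fields; an empty line has NF = 0 and is skipped by the NF pattern.
    # "n + 0" forces a numeric 0 when nothing was added.
    awk -F'A' 'NF { n += NF - 1 } END { print n + 0 }' "$@"
}
```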
bob@wyse.wyse.com (Bob McGowen x4312 dept208) (09/29/90)
In article <1990Sep28.173033.292@msi.umn.edu> haberman@msi.umn.edu (Joe Habermann) writes:
>george@hls0.hls.oz (George Turczynski) writes:
> ... Deleted examples of awk scripts. ...

Original postings on this topic used tr and wc. Following that line, I decided to try my hand at a script for counting characters. In the meantime, things seem to have moved away from the "simple" solutions into more esoteric (still interesting) ways to solve the problem. Nevertheless, I will present my script for comment and feedback.

The basic design is to take advantage of the tr command's use of character sets (including ranges such as a-z) and provide a tool that will allow the user to count the set of characters named or their inverse. So:

chrcnt abc file
chrcnt -n abc file

The first will count all occurrences of the letters a, b and c; the second counts all characters that are not a, b or c. This will work with white space as well and handles cases where there are no matches. The use of cat allows you to specify one or more files on the command line or have the script read its standard input.

One final note is that if you should want to look for dashes and n's, use n- as the pattern (or --n, if you want).

------------script follows----------
#!/bin/sh
case $# in
0)  # the following is because cmd aliasing can produce absolute paths
    CMD=`basename $0`
    printf '%s: usage: %s [-n] char_set [files...]\n' "$CMD" "$CMD" >&2
    printf '\twhere -n means not the following pattern characters.\n' >&2
    exit 1
    ;;
1)  # if only one arg it must be the pattern
    TR_ARGS=-cd
    pattern="$1"
    ;;
*)  # all other cases may or may not have -n as the first arg
    case $1 in
    -n) TR_ARGS=-d
        pattern="$2"
        shift; shift
        files="$*"    # if only two args, files is null
        ;;
    *)  TR_ARGS=-cd
        pattern="$1"
        shift
        files="$*"
        ;;
    esac
    ;;
esac
cat $files | tr $TR_ARGS "$pattern" | wc -c

Bob McGowan (standard disclaimer, these are my own ...)
Product Support, Wyse Technology, San Jose, CA
..!uunet!wyse!bob
bob@wyse.com
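The two modes of the script are complements of each other, so for any input the two counts should add up to the plain wc -c total, which makes a handy sanity check. A sketch of both modes as shell functions (the names are hypothetical, and SET is passed to tr unchanged):

```shell
# count_in_set SET [FILE...]: bytes that belong to the tr set SET
# (what chrcnt does without -n). Hypothetical helper name.
count_in_set() {
    chars=$1
    shift
    # cat with no arguments reads standard input.
    n=$(cat "$@" | tr -cd "$chars" | wc -c)
    echo $((n))
}

# count_not_in_set SET [FILE...]: bytes outside SET (what chrcnt -n does).
count_not_in_set() {
    chars=$1
    shift
    n=$(cat "$@" | tr -d "$chars" | wc -c)
    echo $((n))
}
```

For example, on the input abcxyz followed by a newline, count_in_set abc gives 3 and count_not_in_set abc gives 4 (x, y, z and the newline), which add up to the 7 bytes wc -c reports.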
mickey@ncst.ernet.in (R Chandrasekar) (10/04/90)
In article <4002@umbc3.UMBC.EDU> rouben@math9.math.umbc.edu () writes:
>How can I count the number of occurrences of a given character in a file?
>...
>The closest I have come to this is the following construction:
>
>cat file | tr -c 'A' '' | wc -c

Try:

tr -dc 'A' < file | wc -c

tr -dc 'A' deletes all chars which are in the complement set of 'A'. Voila etc etc!

Works, but is a 'dumb' way to count chars. Is there a better way?

--
Chandrasekar
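On the closing question of a "better way": one alternative not raised in the thread is awk's gsub(), which returns the number of substitutions it made, so summing its return value counts every A in a single awk process. A sketch, with a hypothetical function name:

```shell
# count_a_gsub [FILE...]: count the 'A' characters in the named files
# or standard input. (Hypothetical helper name.)
count_a_gsub() {
    # gsub(/A/, "") deletes the A's from the current line and returns how
    # many it deleted; the modified line is never printed.
    awk '{ n += gsub(/A/, "") } END { print n + 0 }' "$@"
}
```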