bamford@randvax.ARPA (Cliff Bamford) (03/10/84)
Can anyone out there in netland point me to information on Zipf's law? As I understand it, Zipf sez that for a wide range of naturally occurring distributions (like the letters in English text), the product (rank in the distribution, e.g. "e" = 1) * (number of occurrences) is approximately constant. I'm interested in applications to fast lookup of names of human beans. Any suggestions will be appreciated. Cliff Bamford ARPA: bamford@rand-unix UUCP: {decvax, sdcrdcf, vortex}!randvax!bamford
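For readers who want to try the rank * count product on their own data, here is a minimal sketch. The function name `rank_times_count` and the toy sample are hypothetical, chosen only for illustration; the sample is constructed so the counts follow c/rank exactly, which real text will only approximate.

```python
from collections import Counter

def rank_times_count(items):
    """Return (rank, item, count, rank * count) rows, most frequent first."""
    ranked = Counter(items).most_common()  # sorted by descending count
    return [(rank, item, n, rank * n)
            for rank, (item, n) in enumerate(ranked, start=1)]

# Hypothetical toy data: counts of 60, 30, 20, 15 are exactly 60 / rank.
sample = ["a"] * 60 + ["b"] * 30 + ["c"] * 20 + ["d"] * 15

for row in rank_times_count(sample):
    print(row)  # the last field stays at 60 for every rank in this toy case
```

On real letter or name counts the last column should drift rather than stay fixed; how much drift still counts as "approximately constant" is a judgment call.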
peters@cubsvax.UUCP (03/14/84)
I decided to post this to the net instead of by mail to Cliff because others might find it of interest as well. The key reference to Zipf's Law is George Kingsley Zipf's book, "Human Behavior and the Principle of Least Effort." Zipf was a Harvard linguist who loved gathering statistics. He found that many, many distributions -- such as those of words, letters, phonemes, populations of cities, etc. -- are linear when plotted on a "rank-frequency diagram," in which the abscissa is the log of the rank (i.e., the ordering, where 1 is the most commonly occurring word, or the most populous city...) and the ordinate is the log of the frequency (how many occurrences of the word, the population of the city...). I believe the slope also tends to -1, which is the same thing as saying that rank times frequency is roughly constant. He had rather eccentric explanations for all this, all based rather hazily on some kind of "principle of least effort," so that he was trying to put human behavior on a physics-like basis, at least statistically speaking. I could go on... but won't, unless there's interest. By the way, by looking up Zipf's book in Science Citation Index, you should be able to find many recent references. {philabs,cmcl2!rocky2}!cubsvax!peters Peter S. Shenkin Dept of Biol. Sci.; Columbia Univ.; New York, N. Y. 10027; 212-280-5517
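A rough way to check the slope on a set of counts is an ordinary least-squares fit of log(frequency) against log(rank). This is only a sketch, assuming both axes are logarithmic (the usual form of the rank-frequency plot); the name `loglog_slope` and the idealized data are hypothetical.

```python
import math

def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(frequencies, reverse=True)           # rank 1 = largest
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical idealized data: frequency is exactly 1000 / rank.
ideal = [1000.0 / r for r in range(1, 51)]
print(loglog_slope(ideal))  # very close to -1 for this idealized case
```

Real data usually bends away from the straight line at the very highest and lowest ranks, so the fitted slope depends on which ranks are included.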