[net.math] Request for info on Zipf's law

bamford@randvax.ARPA (Cliff Bamford) (03/10/84)

Can anyone out there in netland point me to information on Zipf's law?

As I understand it, Zipf sez that for a wide range of naturally occurring
distributions (like the letters in English text), the product

   (cardinal rank in the distribution (e.g. "e" = 1)) * (number of occurrences)

is approximately a constant. I'm interested in applications to fast lookup
of names of human beans. Any suggestions will be appreciated.
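
The product described above is easy to check by hand on any sample. What
follows is only a rough sketch in Python: the function name
rank_count_products and the sample sentence are made up for illustration,
and on a short text the products will be only very roughly constant.

```python
from collections import Counter


def rank_count_products(text):
    """Return (letter, rank, count, rank * count) tuples, most frequent first."""
    # Count alphabetic characters only, ignoring case.
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    # most_common() orders by descending count, so rank 1 is the commonest letter.
    return [
        (letter, rank, count, rank * count)
        for rank, (letter, count) in enumerate(counts.most_common(), start=1)
    ]


if __name__ == "__main__":
    # Hypothetical sample; substitute any large body of text.
    sample = "Can anyone out there in netland point me to information"
    for letter, rank, count, product in rank_count_products(sample):
        print(letter, rank, count, product)
```

If Zipf's relation holds for the sample, the last column should stay near
one value as the rank goes up. The same function works on a list of names
if the counting step is changed to count whole names instead of letters.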

	Cliff Bamford
	ARPA: bamford@rand-unix
	UUCP: {decvax, sdcrdcf, vortex}!randvax!bamford

peters@cubsvax.UUCP (03/14/84)

I decided to post this to the net instead of by mail to Cliff 
because others might find it of interest as well.

The key reference to Zipf's Law is George Kingsley Zipf's book,
"Human Behavior and the Principle of Least Effort."  Zipf was
a Harvard linguist who loved gathering statistics.  He found that
many, many distributions -- such as of words, letters, phonemes,
populations of cities, etc. -- are linear when plotted on a
"rank-frequency diagram," in which the abscissa is the log of the rank
(i.e., the ordering, where 1 is the most commonly occurring word,
or the most populous city...) and the ordinate is the log of the
frequency (how many occurrences of the word, the population of
the city...).  I believe the slope also tends to -1.
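
One way to test the slope claim on your own counts is a least-squares fit.
This is a sketch only: it assumes both axes are logarithmic (the form in
which the slope of -1 is usually stated), and the name loglog_slope is
invented here for illustration.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank)."""
    # Rank 1 goes to the largest count; zero counts have no logarithm.
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(count) for count in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least squares: slope = covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Idealized counts where rank * count is exactly 1200.
    print(loglog_slope([1200, 600, 400, 300, 240, 200]))
```

A result near -1 is the same statement as "rank times frequency is roughly
constant"; real data will usually give a slope that only approximates it.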

He had rather eccentric explanations for all this, all based rather 
hazily on some kind of "principle of least effort," so that he
was trying to put human behavior on a physics-like basis, at
least statistically speaking.

I could go on... but won't, unless there's interest.

By the way, by looking up Zipf's book in Science Citation Index,
you should be able to find many recent references.

{philabs,cmcl2!rocky2}!cubsvax!peters            Peter S. Shenkin 
Dept of Biol. Sci.;  Columbia Univ.;  New York, N. Y.  10027;  212-280-5517