snell@utzoo.uucp (snell) (08/05/89)
(I posted this to sci.math.stat yesterday, and am throwing it in this ring today. I could provide a program in C which produces exactly the same thing as this S program. This group has rather low traffic, but at least the subject is suitable.) I am working with allele frequency data based on electrophresis, and want to test possible departure of my data from the Infinite allele--Constant mutation rate (IC) model. I am trying to generate expected `n' values for various allele frequency intervals using equation (5) from Chakraborty, Fuerst and Nei (Genetics 94: 1039-1063, 1980). One needs only an estimate of heterozygosity and number of genes in the sample (for diploid species, N genes = # individuals X # loci X 2). A U-shaped distribution is expected, with most alleles being either very common or very rare (Chakraborty et. al 1980, Fig. 1). Their equation 5 is printed as follows: nq n ___ ____ \ 1 | | n + 1 - j n(p,q) = M > --- | | ---------- /__ i | | n + M - j i=np+1 j=1 This equation (apparently) was used by Barrowclough, Johnson and Zink (pages 135-154 _In_ Current Ornithology, Vol. 2, Plenum Press, New York, 1985). However, Barrowclough et al. modified it somewhat (presumably in an attempt to clarify it: `n' having originally been used in different places for different things). The equations appear different, as indicated by the differing symbols over the product sign. (I assume, in Chakroborty et al.'s formulation, that the string of products is not really `1 to N'; otherwise it would be a constant.) Barrowclough et al.'s formulation is as follows: Nq i ___ ____ \ 1 | | N + 1 - j n(p,q) = M > --- | | ---------- /__ i | | N + M - j i=Np+1 j=1 where N = individuals * loci * 2 M = H/(1-H) ;H = heterozygosity estimate p = lower boundary of frequency class q = upper boundary of frequency class I have assumed, since Nq and Np are not always integers (for instance, where N = 1100 and p = .05), one would round up or down as appropriate to include the integer values falling within the frequency class. I am unable to interpret what is meant by either formulation in a way that gives a U-shaped expected curve with n(p,q) values similar to those provided by these authors. Below is an S program which (I think) should give expected values acording to Barrowclough et al.'s formula. It does not. I have set it up to calculate expected values for Barrowclough et al's _Sphyrapicus nuchalis_ example (their Fig. 1). From that figure, expected values in the 12 frequency classes are about (it is hard to tell): 2., 1.5, 1.5, 1., 1., 1., 1., 1.5, 1.5, 2., 1.5, and 28. This S program produces a U-shape, but n(p,q) values are all far too small (all less than one, and they sum up to more than one). Modifying this program slightly so that the product string becomes a constant not give the right sort of values either. Any suggestions on how this program might be modified, or on how to correctly calculate these expected values, would be most helpful. This has everyone about here rather stumped. *************************** # Barrowclough et al. (1985: Fig. 1) example, _Sphyrapicus nuchalis_ # (a Sapsucker type woodpecker) data. # Barrowclough's Pi product notation used. H_0.039 # heterozygosity estimate M_H/(1-H) freqs_c(0,.05,.1,.2,.3,.4,.5,.6,.7,.8,.9,.95,1.) N_1110 # number of genes (15 birds * 37 loci * 2) p.q_floor(freqs*N) # `floor' value of limits on frequency classes P_p.q[1:(len(p.q)-1)]+1 # integral np+1 Q_p.q[2:len(p.q)] # integral nq Q[len(Q)]_p.q[len(p.q)] # fixing upper nq value (where q = 1) n.exp_NA # vector for expected n(p,q) values t.x_NA # vector for product terms for (j in 1:N) { # loop for product terms t.x_append(t.x,(N+1-j)/(N+M-j)) # getting product terms } t.x_t.x[2:len(t.x)] # getting rid of the NA for (n in 1:len(P)) { # frequency class loop t.s_NA # vector for sums for (i in P[n]:Q[n]) { # summation loop t.s_append(t.s,(prod(t.x[1:i],T))/i) # getting summation term } t.ss_sum(t.s,T) # the summation n.exp_append(n.exp,M*t.ss) # expected n(p,q) allele frequency--IC model } n.exp_n.exp[2:len(n.exp)] # getting rid of the NA n.exp # list of expected values under IC model *************************** -- Name: Richard Snell Mail: Dept. Zoology, Univ. Toronto, Toronto, Ontario, Canada M5S 1A1 UUCP: uunet!attcan!utzoo!snell BITNET: snell@zoo.utoronto.ca