[bionet.molbio.genome-program] Combining 2-point data

dcurtis@crc.ac.uk (Dr. David Curtis) (02/26/91)

The problem is simple: how much evidence can we adduce for the regional
localisation of, e.g., a disease gene based on linkage data using a
number of other markers? This must be a common problem, but I have not
had much luck finding it addressed in the literature.

One approach is to do a full multipoint analysis, e.g. with LINKMAP, to
provide a location score. With polymorphic markers and large families,
however, this is limited in practice to, say, three markers plus the
disease gene. What can I do with the data from the other nearby markers
which I cannot include in the analysis? Alternatively I could run lots
of multipoints using different subsets of markers, but each time I would
be using only some of the available information, and anyway which of the
multipoints should I "trust" to be giving the "correct" answer?

Other approaches would in general look at some way to combine all the
two-point data, but which is the best way and what is the strength of
the evidence thus obtained? I looked at the paper by Olson and Boehnke
(Am J Hum Genet 47:470-482, 1990) comparing different algorithms to
order the markers, but this did not really tell me much about the degree
of evidence that the disease gene was or was not linked to a group of
markers known to be linked to each other.

Morton and Andrews, in their paper on MAP (Ann Hum Genet 53:263-269,
1989), describe how they order loci and then say that "Global support
for a locus expresses the evidence on chromosome assignment as sigma
Z(thetaE) where thetaE is the recombination rate expected between the
locus and another marker on the chromosome, and the summation is over
all syntenic markers.  Although the lods from the same data set are
dependent, this has remarkably little effect on significance levels."
This sounds like exactly what I need, except I cannot believe that it is
correct. If I understand them correctly, having found the best position
for the markers and the disease locus, I can calculate global support
for the disease locus being linked to the other markers by summing the
lod scores of the disease locus with each of the other markers at the
distance which separates them on the new map. It seems to me that this
could easily give a large overestimate of the global support, and I
think their second sentence is incorrect: the fact that the data are
dependent could have a large effect on significance levels.
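Here is a minimal sketch of the global-support sum as I read the quoted
passage. It rests on assumptions that Morton and Andrews do not state:
two-point lods are computed from phase-known, fully informative meioses,
and thetaE is obtained from map distance through Haldane's map function
(MAP may well use a different map function). The function names and the
marker data are hypothetical.

```python
import math


def haldane_theta(distance_morgans: float) -> float:
    """Expected recombination fraction for a map distance (Haldane map function, assumed)."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(distance_morgans)))


def two_point_lod(theta: float, meioses: int, recombinants: int) -> float:
    """Z(theta) for phase-known, fully informative meioses (a simplification)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if recombinants and theta == 0.0:
        return float("-inf")  # a recombinant is impossible at theta = 0
    non_recombinants = meioses - recombinants
    lod = meioses * math.log10(2.0)  # denominator (1/2)^N under no linkage
    if recombinants:
        lod += recombinants * math.log10(theta)
    if non_recombinants:
        lod += non_recombinants * math.log10(1.0 - theta)
    return lod


def global_support(disease_pos: float, markers: list[tuple[float, int, int]]) -> float:
    """Sum over syntenic markers of Z(thetaE), thetaE taken from the fitted map.

    Each marker is (map position in Morgans, meioses, recombinants with the disease).
    """
    return sum(
        two_point_lod(haldane_theta(pos - disease_pos), n, k)
        for pos, n, k in markers
    )


# Hypothetical map: three markers near a disease locus placed at 0.02 M.
markers = [(0.00, 20, 1), (0.05, 20, 2), (0.12, 20, 3)]
print(round(global_support(0.02, markers), 2))
```

Nothing in the sum looks at whether two markers carry the same
information: entering a marker twice (same position, same data) simply
adds its lod a second time.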

I believe Edwards mentioned this point in a letter to Nature in 1989
(although, as he was writing in the context of a multipoint analysis, I
did not agree with him entirely). Suppose we had a small family and a
highly informative (in fact, say, completely informative) marker which
gave a small positive lod score with a disease at a certain distance.
If we then studied another extremely informative marker in the same
family which was tightly linked to the first (in fact, say, another
polymorphism of the first), we would expect to get the same positive lod
score at the same distance. So (if I have understood Morton's approach
correctly) studying the new polymorphism would double the lod score. But
clearly we have not doubled our evidence in favour of linkage, which
still stands where it was before.
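To put numbers on this, assume (my simplification) N phase-known,
completely informative meioses, of which R are recombinant between the
disease and the first marker M1. A second polymorphism M2 of the same
locus shows exactly the same R recombinants, so the summed two-point
lods double, whereas the joint lod for both markers together is
unchanged because the M2 genotypes are fixed by the M1 genotypes:

```latex
Z(\theta) = \log_{10}\frac{\theta^{R}(1-\theta)^{N-R}}{(1/2)^{N}}, \qquad
Z_{M_1}(\theta) + Z_{M_2}(\theta) = 2\,Z(\theta), \qquad
Z_{M_1 M_2}(\theta) = Z(\theta).
```

For example, N = 4 and R = 0 at theta = 0 give Z = 4 log10 2, about
1.20, while the sum is about 2.41.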

Intuitively, the more tightly linked and the more informative two
markers are, the less independent information the second gives over and
above that from the first. What are people's views on this subject? As I
say, it must be a common problem. How can we best use the data from all
markers to judge the strength of evidence that the disease gene lies
approximately in a particular region?
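The intuition can be checked in a toy setting. This sketch compares the
joint three-locus lod with the sum of two-point lods for a pair of
markers separated by recombination fraction r. Everything here is an
assumption for illustration: loci in the order D, M1, M2 with D outside
the marker interval, phase-known and fully informative meioses, full
penetrance, and no interference (so thetaE for D and M2 is
theta + r - 2 theta r). The function names and family data are
hypothetical.

```python
import math

# Each meiosis is recorded as (a, b): a = 1 if D and M1 recombined,
# b = 1 if M1 and M2 recombined (hypothetical encoding).


def _interval(recombined: int, fraction: float) -> float:
    """Probability of the observed outcome in one interval."""
    return fraction if recombined else 1.0 - fraction


def joint_lod(meioses, theta, r):
    """Lod for D at theta from M1, from the joint three-locus likelihood."""
    lod = 0.0
    for a, b in meioses:
        marker_term = _interval(b, r)
        if marker_term == 0.0:
            raise ValueError("data impossible for this M1-M2 recombination fraction")
        linked = _interval(a, theta) * marker_term
        unlinked = 0.5 * marker_term  # D unlinked: either D outcome has probability 1/2
        lod += math.log10(linked / unlinked)
    return lod


def two_point_lod(recombined, theta):
    """Two-point lod from a list of 0/1 recombination indicators."""
    return sum(math.log10(_interval(x, theta) / 0.5) for x in recombined)


def summed_lod(meioses, theta, r):
    """Sum of Z(D, M1) at theta and Z(D, M2) at the implied thetaE."""
    theta_e = theta + r - 2.0 * theta * r  # D-M2 recombine iff exactly one interval does
    z_m1 = two_point_lod([a for a, _ in meioses], theta)
    z_m2 = two_point_lod([a ^ b for a, b in meioses], theta_e)
    return z_m1 + z_m2


# Two polymorphisms of the same locus (r = 0): no M1-M2 recombinants.
tight = [(1, 0)] + [(0, 0)] * 9
print(f"{joint_lod(tight, 0.1, 0.0):.3f} {summed_lod(tight, 0.1, 0.0):.3f}")  # 1.598 3.197

# A looser pair (r = 0.2) with two M1-M2 recombinants.
loose = [(1, 0), (0, 1), (0, 1)] + [(0, 0)] * 7
print(f"{joint_lod(loose, 0.1, 0.2):.3f} {summed_lod(loose, 0.1, 0.2):.3f}")
```

Because M1 is fully informative here, the M1-M2 factors cancel from the
joint likelihood ratio, so M2 adds nothing about D and the joint lod is
the same for both families. The summed lod nevertheless gains Z(D, M2);
in this example the excess is smaller for the looser pair, since thetaE
moves towards 1/2. With partially informative markers M2 would carry
some genuine extra information, which this sketch does not model.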

Dave Curtis

Academic Department of Psychiatry,    Janet:       dc@UK.AC.UCL.SM.PSYCH
Middlesex Hospital,                   Elsewhere:   dc@PSYCH.SM.UCL.AC.UK
Mortimer Street, London W1N 8AA.      EARN/Bitnet: dc%PSYCH.SM.UCL@UKACRL
Tel 071-636 8333 Fax 071-323 1459     Usenet: ...!mcsun!ukc!mrccrc!D.Curtis