[sci.bio] information content of DNA

turpin@ut-sally.UUCP (04/01/87)

Whatever the information content of human DNA is, it should NOT
be interpreted as the amount of information required to describe
what a human is. A better interpretation is to view this
information as describing differences between individuals, so
that the "amount of information" determines the potential genetic
variability in the human population. (Even this is a rough cut,
since not all possible values for human DNA are practical
biological values.)

Outside of DNA content, what is the other information that
determines what a human is? The entire environment in which
DNA serves as an encoding: biochemical "laws" that determine RNA and
protein synthesis, the form of human DNA as opposed to other DNA
(23 chromosome pairs), etc. Consider a computer that on receiving
a one-bit message will either print the Declaration of
Independence or Hobbes' Leviathan. That one bit determines which
is chosen, but does not fully describe either result. 

Russell

johnc@haddock.UUCP (04/04/87)

>			 Consider a computer that on receiving
>a one-bit message will either print the Declaration of
>Independence or Hobbes' Leviathan. That one bit determines which
>is chosen, but does not fully describe either result. 

Red herring time.  The message contains exactly one bit.  The complexity
of the response has nothing whatsoever to do with the size of the message.
The original question was about the information content of human DNA, not
about the DNA's environment, which obviously contains much more information.

One observation I haven't seen yet is the peculiarity of DNA called "reading
frames".  This effectively triples the number of amino-acid sequences a given
chunk of DNA encodes.  Multiply this by two for the complementary strand.

Granted, it is very rare that all six readings actually code for something
in real life.  But this doesn't have much to do with the information content.
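To make the six readings concrete, here is a minimal sketch. It assumes the standard genetic code (NCBI translation table 1) and uppercase A/C/G/T input. Frames are labeled +1..+3 (given strand, offsets 0..2) and -1..-3 (reverse complement, offsets 0..2). Those labels and the function names are illustrative and do not come from any particular package.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, offset: int) -> str:
    """Translate whole codons starting at `offset`; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Return the amino-acid string read in each of the six reading frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

For ATGGCCTAA, only frame +1 reads as a tidy start-to-stop gene ("MA*", Met-Ala-stop). The other five frames give "WP", "GL", "LGH", "*A" and "RP", which illustrates why all six readings rarely mean anything at once.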

I remember back in the early days of computing, a prof in a course held up
a punchcard with lots of holes in it, and another with no holes punched at
all, and asked the class to compare their information contents.  The correct
answer, of course, was that all the punchcards in the house contained exactly
the same amount of information: 12*80 bits.  Some of them contained all 0
bits, but that isn't less information than one with some holes punched out.
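The professor's point is that capacity depends only on how many positions there are and how many states each one can take. It does not depend on what happens to be punched. Below is a minimal sketch of that arithmetic, using a hypothetical helper name. The DNA line applies the same measure (four bases, so two bits per base). As noted at the top of the thread, this is only an upper bound, because not every possible sequence is biologically viable.

```python
import math


def capacity_bits(positions: int, symbols: int) -> float:
    """Bits needed to pick one message out of symbols**positions equally possible ones."""
    return positions * math.log2(symbols)


# A 12-row x 80-column punch card: each position is punched or not (2 symbols).
card = capacity_bits(12 * 80, 2)

# By the same yardstick, a DNA stretch carries at most 2 bits per base.
dna_per_base = capacity_bits(1, 4)

print(card, dna_per_base)  # 960.0 2.0
```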

I wonder if anyone has ever built a computer with the possibility of multiple
"reading frames".  Consider an 8-bit memory, but a 16-bit instruction size.
If you start executing at address A and at A+1, you get two possibly very
different programs.  Can any real-life processors do this?  It happens
with DNA quite often.


-- 
	John Chambers	(617)247-1155
	...!ima!johnc	
[No, I don't work at cdx39 any more.]

emigh@ecsvax.UUCP (04/04/87)

In article <425@haddock.UUCP> johnc@haddock.ISC.COM.UUCP (John Chambers) writes:
>
>I wonder if anyone has ever built a computer with the possibility of multiple
>"reading frames".  Consider an 8-bit memory, but a 16-bit instruction size.
>If you start executing at address A and at A+1, you get two possibly very
>different programs.  Can any real-life processors do this?  It happens
>with DNA quite often.

When I was still using Microsoft CP/M FORTRAN, I ran across some of their
assembler programming in the FORTRAN libraries that did this.  The 16 bit
load was 1 byte for the instruction and 2 bytes of data.  The 8 bit load
was 1 byte for the instruction and 1 byte of data.  A segment of code
might look like:

Location:                    1       2       3
Executing from 1 reads as:   LD HL   Low     High    (Low and High are the 2 bytes of data)
Executing from 2 reads as:   xx      LD A    Data

The HL value was always discarded (a slow NOP), and this presumably saved
them a jump at some point.  Needless to say, it was a mess to read and
very poor programming practice.
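Here is a minimal sketch of the trick. It assumes Z80 opcode 0x21 for LD HL,nn (opcode, low byte, high byte) and 0x3E for LD A,n (opcode, data byte), the same encodings as the 8080's LXI H and MVI A. The decoder is hypothetical and knows only those two opcodes plus a catch-all, so it is an illustration rather than a disassembler. Index 0 corresponds to location 1 in the listing. The low data byte is 0x3E, so it doubles as the LD A opcode. The value 0x05 is an arbitrary data byte.

```python
def decode(code: bytes, start: int) -> list:
    """Decode `code` from index `start`, knowing only LD HL,nn and LD A,n."""
    out, pc = [], start
    while pc < len(code):
        op = code[pc]
        if op == 0x21 and pc + 2 < len(code):        # LD HL,nn: low byte, then high byte
            nn = code[pc + 1] | (code[pc + 2] << 8)
            out.append(f"LD HL,{nn:04X}h")
            pc += 3
        elif op == 0x3E and pc + 1 < len(code):      # LD A,n: one data byte
            out.append(f"LD A,{code[pc + 1]:02X}h")
            pc += 2
        else:                                         # unknown or truncated: raw byte
            out.append(f"DB {op:02X}h")
            pc += 1
    return out


code = bytes([0x21, 0x3E, 0x05])
print(decode(code, 0))  # ['LD HL,053Eh']
print(decode(code, 1))  # ['LD A,05h']
```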

-- 
Ted H. Emigh     Genetics and Statistics, North Carolina State U, Raleigh  NC
USENET: emigh@ecsvax.uucp		DOMAIN:	emigh%ecsvax.ncecs.edu
ARPA:	ecsvax!emigh@mcnc.org           BITNET: NEMIGH@TUCC
Distribution to monotremes and flightless waterfowl **RESTRICTED**

6065833@pucc.UUCP (04/04/87)

John Chambers writes:

>>a discussion of various aspects of DNA coding, quantity and form.
>...
>One observation I haven't seen yet is the peculiarity of DNA called "reading
>frames".  This effectively triples the number of amino-acid sequences a given
>chunk of DNA encodes.  Multiply this by two for the complementary strand.
>...
>I wonder if anyone has ever built a computer with the possibility of multiple
>"reading frames".  Consider an 8-bit memory, but a 16-bit instruction size.
>If you start executing at address A and at A+1, you get two possibly very
>different programs.

My mother once asked me to explain the basics about computers:  what bits and
bytes are, etc.  She was confused about something; eventually I figured out
that what she really wanted to know was HOW the computer knows where an 8-bit
word begins and ends, given that there are only 2 "letters" in the alphabet, and
no punctuation or spaces allowed.  This is the "reading frame" question from
the other side.  I'm actually quite surprised that she, who has never even
touched a computer to date, came up with such a discerning question.  Most
people who use computers (as opposed to programming them) seem never to think
about this at all (and I've dealt with thousands of such people).
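The bits themselves carry no framing. Memory hardware fixes byte boundaries by address, and asynchronous serial lines mark each character with start and stop bits. The sketch below (hypothetical helper name) shows what happens when that convention is lost: the same bit string, read from an offset of one bit, yields an entirely different byte.

```python
def frame_bytes(bits: str, offset: int) -> list:
    """Group a string of '0'/'1' characters into 8-bit values starting at `offset`.
    Trailing bits that do not fill a whole byte are dropped."""
    usable = bits[offset:]
    return [int(usable[i:i + 8], 2) for i in range(0, len(usable) - 7, 8)]


bits = "0100100001101001"           # ASCII "Hi"
print(bytes(frame_bytes(bits, 0)))  # b'Hi'
print(frame_bytes(bits, 1))         # [144]
```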

Una Smith   6065833@PUCC

I thought signature files were silly until I realized I usually forget to
identify myself.  No longer.  Please forgive my apparent rudeness.

diaz@aecom.UUCP (04/05/87)

In article <425@haddock.UUCP>, johnc@haddock.UUCP (John Chambers) writes:
> One observation I haven't seen yet is the peculiarity of DNA called "reading
> frames".  This effectively triples the number of amino-acid sequences a given
> chunk of DNA encodes.  Multiply this by two for the complementary strand.
> 
> Granted, it is very rare that all six readings actually code for something
> in real life.  But this doesn't have much to do with the information content.
> 

Forget rare, there ain't no such animal. Although multiple reading
frames have been observed in phage, and transcription from complementary
strands observed in a variety of organisms (most recently, mice) there
is no documented example of anywhere near the six possible readings
coding for functional polypeptides. The implications of such a scheme for
molecular evolution would be disastrous.

Organisms apparently think little about the advantages of compact
genomes, prokaryotes included. Rather there seems to be something about
having a lot of "junk" DNA that's beneficial, if I may be allowed this
ounce of teleology. Granted, we may one day realize that much satellite
and intron DNA may have functions we don't even dream about today, but I
truly doubt that coding for proteins will be one of them.

-- 
            dn/dx = Dan Diaz    (philabs!aecom!diaz)
            Department of Molecular Biology & Pizza Chemistry AECOM
            "Hold the E.coli"

srp@ethz.UUCP (04/05/87)

In article <425@haddock.UUCP> johnc@haddock.ISC.COM.UUCP (John Chambers) writes:
>
>One observation I haven't seen yet is the peculiarity of DNA called "reading
>frames".  This effectively triples the number of amino-acid sequences a given
>chunk of DNA encodes.  Multiply this by two for the complementary strand.
>
>Granted, it is very rare that all six readings actually code for something
>in real life.  But this doesn't have much to do with the information content.
>
>It happens with DNA quite often.

Whoa... I can think of only a few cases where two reading frames are used
at one time: PhiX174 and SV40 (both viruses) come to mind.  I do not know of
*any* cases where more than two reading frames are used at once. Can you
elaborate?

I view multiple-reading-frame translation as evolution's way of dealing
with size problems; thus it happens mainly in viruses (which would like to
be small).  Reading frames do overlap, but not for a very long segment.  It
doesn't take long before one reading frame's alanine is another reading
frame's stop codon! The only reason this works at any length is that
the genetic code is degenerate.

As for uses in computer land, I think we are again limited by how difficult
it is to have two strands of information running in the same data space.
On top of that, there isn't any data degeneracy to work with in computers.

-- 
-----------

Scott Presnell  Swiss Federal Institute of Technology (ETH-Zentrum)
		Department of Organic Chemistry
		Universitaetsstrasse 16
		CH-8092 Zurich Switzerland.

uucp:		...seismo!mcvax!cernvax!ethz!srp     (srp@ethz.uucp)
earn/bitnet:	Benner@CZHETH5A

chiaraviglio@husc2.UUCP (04/06/87)

In article <425@haddock.UUCP>, johnc@haddock.UUCP (John Chambers) writes:
> I wonder if anyone has ever built a computer with the possibility of multiple
> "reading frames".  Consider an 8-bit memory, but a 16-bit instruction size.
> If you start executing at address A and at A+1, you get two possibly very
> different programs.  Can any real-life processors do this?  It happens
> with DNA quite often.

	Actually, you don't need 16-bit instruction size to do this, just an
8-bit instruction which expects some bytes of data immediately following it.
This is the case on most processors (although in some cases the relevant
numbers of bits are 16 and 32 or more); you can observe such an effect even on
a 6502.  The problem on processors is compounded by the fact that they store
much more state than enzymes do, so even when the instruction at A is a
single byte and the next one starts at A + 1, the program may behave very
differently depending on the address at which execution begins.

-- 
	-- Lucius Chiaraviglio
	   lucius@tardis.harvard.edu
	   seismo!tardis.harvard.edu!lucius

Please do not mail replies to me on husc2 (disk quota problems, and mail out
of this system is unreliable).  Please send only to the address given above.

gerryg@laidbak.UUCP (04/07/87)

In article <425@haddock.UUCP> johnc@haddock.ISC.COM.UUCP (John Chambers) writes:
>I wonder if anyone has ever built a computer with the possibility of multiple
>"reading frames".  Consider an 8-bit memory, but a 16-bit instruction size.
>If you start executing at address A and at A+1, you get two possibly very
>different programs.  Can any real-life processors do this?  It happens
>with DNA quite often.

When I first understood that a coded DNA sequence could mean different things
when decoded in different "phases", I thought about this connection to
computer programs and data.  It's not hard to see that this is a characteristic
of the software, not the hardware.  I think of it as a compression technique:
once you have a program or data structure, you can look for ways of
compressing it by folding it back on itself.  That is, look for places where
a piece of code or data is duplicated in another, unintended place.  The
original copy can be deleted, and references to it redirected to the new location.
Of course you can get fancier by rearranging or modifying things in a way
that doesn't affect function, to create a redundancy that can be deleted.
It's probably not a very productive way to save your computer resources, but
it is interesting to think about.
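Here is a toy version of this folding. It makes the simplifying assumption that the two pieces are plain strings and that only a suffix/prefix overlap is exploited. This is the pairwise merge step used in greedy shortest-common-superstring heuristics, and the function names are illustrative. When the tail of one piece equals the head of the next, storing only the merged string lets each piece be referenced inside it without duplicated bytes.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def fold(a: str, b: str):
    """Store a and b in one buffer, sharing their overlap.
    Returns the buffer and the (start, end) slice where each piece now lives."""
    k = overlap(a, b)
    merged = a + b[k:]
    return merged, (0, len(a)), (len(a) - k, len(merged))


merged, where_a, where_b = fold("ATGGCC", "GCCTAA")
print(merged, where_a, where_b)  # ATGGCCTAA (0, 6) (3, 9)
```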

In the case of DNA, there are several things that I wonder about.  First, how
does the cell keep nonsensical things from getting expressed?  I know there
is a lot of DNA devoted to "control" functions, but it's hard to believe that
every possible reading of a DNA sequence either doesn't get expressed, or is
necessary to (or at least not dangerous for) the organism.  Another thing:
when DNA sequences get rearranged in reproduction, new sequences are
produced and others destroyed by this process.  How can a cell survive this
with its genetic information intact?  Or is it that we never see the mistakes?
And then, the transcription process can't be 100% reliable, but there's a lot
of information in the DNA of most organisms; there must be mistakes.  How
does the organism cope with this?  Is it just redundancy?  Or is there some
kind of repair mechanism that puts an almost right sequence back together?

Well, I'm not a biologist, but it seems to me that these are interesting
questions, and I suspect that we haven't gotten very close to answering
them.

gerry gleason

johnc@haddock.UUCP (04/07/87)

In article <1010@aecom.UUCP> diaz@aecom.UUCP (Dizzy Dan) writes:
>In article <425@haddock.UUCP>, johnc@haddock.UUCP (John Chambers) writes:
>> One observation I haven't seen yet is the peculiarity of DNA called "reading
>> frames".  This effectively triples the number of amino-acid sequences a given
>> chunk of DNA encodes.  Multiply this by two for the complementary strand.
>> Granted, it is very rare that all six readings actually code for something
>> in real life.  But this doesn't have much to do with the information content.
>
>Forget rare, there ain't no such animal. Although multiple reading
>frames have been observed in phage, and transcription from complementary
>strands observed in a variety of organisms (most recently, mice) there
>is no documented example of anywhere near the six possible readings
>coding for functional polypeptides. 

Come now, isn't it a tad early to make such a declaration?  The literature
has on the order of 100 genomes published, all but a handful being viruses.
A couple of real cases of overlapping genes have been discovered; both are
in viruses.  From this you are going to presume to predict that there are
no cases at all in higher organisms?  You have more chutzpah than I.

> The implications for molecular evolution for such a scheme would 
> be disastrous. 

No, they'd only be disastrous if such overlaps were common.  It seems
clear after just a little thought that selection would tend to eliminate
overlapping genes, perhaps replacing them with replicates that can then
mutate independently.  Simple info-theoretic calculations would predict
that such multiply-read stretches of DNA would be rare; they wouldn't
be impossible.  I'll go out on a limb myself, and predict that when
we finally get entire genomes of vertebrate species, we will in fact
find a few overlapping genes.  The frequency will be much lower than
you would expect in a random list of nucleotides, but they will be
found occasionally.

BTW, there is one mitigating factor that allows them slightly more often
than you'd expect:  There are many codings for most of the amino acids.
Of the 64 possible codons, there are only 20 amino acids and a stop code.
This means that it is possible (although difficult) to make slight changes
to one of a pair of overlapping genes without changing the other.  But
it'll still be quite rare.
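The degeneracy point can be checked by brute force. The sketch below assumes the standard genetic code and two forward frames offset by one base. It lists every single-base substitution that changes the protein read in frame A while leaving frame B's protein untouched. The function names and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# 64 codons share 21 meanings: 20 amino acids plus stop ('*').
meanings = Counter(AMINO)
print(len(meanings), meanings["*"], meanings["L"])  # 21 3 6


def translate(seq, offset):
    """Translate whole codons starting at `offset`."""
    return "".join(CODE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


def one_sided_changes(seq, frame_a=0, frame_b=1):
    """Substitutions (position, old, new) that alter frame A's protein only."""
    prot_a, prot_b = translate(seq, frame_a), translate(seq, frame_b)
    hits = []
    for pos, old in enumerate(seq):
        for new in "ACGT":
            if new == old:
                continue
            mutant = seq[:pos] + new + seq[pos + 1:]
            if translate(mutant, frame_a) != prot_a and translate(mutant, frame_b) == prot_b:
                hits.append((pos, old, new))
    return hits


print(one_sided_changes("AGCTGCA"))
```

For AGCTGCA (frame A reads Ser-Cys, frame B reads Ala-Ala), the interesting hits are the three at position 3. That base lies inside the overlap, but it is the third base of an alanine codon (GCN) in frame B. So frame B is unchanged while frame A's cysteine becomes serine, arginine or glycine. Position 0 lies outside frame B's codons altogether.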

>Organisms apparently think little about the advantages of compact
>genomes, prokaryotes included. Rather there seems to be something about
>having a lot of "junk" DNA that's beneficial, if I may be allowed this
>ounce of teleology. Granted, we may one day realize that much satellite
>and intron DNA may have functions we don't even dream about today, but I
>truly doubt that coding for proteins will be one of them.

The same thing goes for programs.  Most of my programs contain some junk
that I hope is never used.  I call it "debugging code", and it is turned
on by something like a -D5 option on the command line.  If you don't know
about such an enabling option, you might well look at the code and decide
that it is worthless and can never be executed.  Consider also the use of
#ifdef in C to supply parallel chunks of code that can never be activated
together:
	#ifdef SYS5
	...
	#endif
	#ifdef BSD
	...
	#endif
	#ifdef XENIX
	...
	#endif
How do you know that some of the "junk" DNA isn't like this?

I sorta suspect that my DNA comes loaded down with similar "junk" sequences
that can become enabled by some circumstance that probably won't happen in 
my lifetime, but happened often enough to my ancestors that the stuff passed
the Darwinian tests and got passed on.  The fact that current researchers
can't explain it is interesting to me, but not to my genome.

Note that genetic "diseases" like sickle-cell anemia (and, it has been
argued, diabetes) are adaptive in certain environments.  If such downright
damaging genes are "adaptive" in some populations, you'd expect a lot of
the apparently-innocuous DNA to also be adaptive somehow.

Also, the literature already contains some descriptions of stretches of
DNA that have regulatory functions rather than coding for amino acids.
Also, sometimes DNA (more often RNA) ends up curling around, interacting
with itself like an enzyme, and modifying its own function.  This could
easily activate stretches that otherwise appear to be dummies.  It's not 
well understood yet, but wait a few more years.

-- 
	John Chambers	(617)247-1155 <...!ima!johnc>
[The above opinions are my own; for a small fee, they can be yours, too.]

eddy@boulder.UUCP (04/08/87)

>Come now, isn't it a tad early to make such a declaration?  The literature
>has on the order of 100 genomes published, all but a handful being viruses.
>A couple of real cases of overlapping genes have been discovered; both are
>in viruses.  From this you are going to presume to predict that there are
>no cases at all in higher organisms?  You have more chutzpah than I.

No, John, the point Dizzy was making was that while cases of overlap exist,
they are 1)very rare  2)very short and 3)only in two of the three reading
frames. The party line is that these cases have evolved because of the
pressure on viral genomes to be as small as possible (the smaller they
are, the faster they replicate). The difficulties in writing overlapping
codes are enormous even for a human who is doing the writing deliberately;
the difficulties for evolution are incredible.

And remember that your original point was that any given sequence 
potentially represents 6 (!) codings, not just two. Dizzy rightly
replied that this is a good approximation to impossible. 

So what I'm trying to get across is that you're right, in theory;
a given nucleotide sequence is capable of coding for 6 different
proteins. The practical consideration is that DNA sequence length
is not a limiting factor for anything except phage. Thus Dizzy is
also right that there is little reason to expect coding region overlaps
in anything but phage.

(Um, just to be safe, the above applies only to overlaps for the purpose
of information compression. Examples exist, I believe, of overlap
for the purpose of regulation.)

>Also, the literature already contains some descriptions of stretches of
>DNA that have regulatory functions rather than coding for amino acids.
>Also, sometimes DNA (more often RNA) ends up curling around, interacting
>with itself like an enzyme, and modifying its own function.  This could
>easily activate stretches that otherwise appear to be dummies.  It's not 
>well understood yet, but wait a few more years.

I know the RNA literature to some degree (considering that some of the big
guns in RNA 'ribozymes' are here at Colorado). But I have never heard
of DNA possessing catalytic activity. My impression was that the 2' OH
on RNA was what enabled it to be reactive; DNA ('deoxy') lacks this
2' OH. Could you provide references for catalytic DNA??  --this is not
a flame, I am really interested; there's too much molecular biology to
hope to know it all.

- Sean Eddy
- Dept. of Molecular, Cellular, Developmental Biology
- Univ. of Colorado, Boulder; Boulder, CO 80309
- 
- "Ph.D.'s are for suckers."  -- from 'Ask Mr. Science'

werner@aecom.UUCP (04/09/87)

In article <430@haddock.UUCP>, johnc@haddock.UUCP (John Chambers) writes:
> In article <1010@aecom.UUCP> diaz@aecom.UUCP (Dizzy Dan) writes:
> >In article <425@haddock.UUCP>, johnc@haddock.UUCP (John Chambers) writes:
> >> Granted, it is very rare that all six readings actually code for something
> >
> >Forget rare, there ain't no such animal. Although multiple reading
> >frames have been observed in phage, and transcription from complementary
> >strands observed in a variety of organisms (most recently, mice) there
> >is no documented example of anywhere near the six possible readings
> >coding for functional polypeptides. 
> 
> Come now, isn't it a tad early to make such a declaration? 

	It is quite probable that an organism using all 6 reading
frames in a given DNA sequence will never be found. Two are used
quite often (usually on opposite strands, including genes within
an intron, rather than overlapping, which is also stretching the
point).
	Three have been observed only once, and that almost doesn't count.
In the splicing of the Polyoma T-antigens, the 3' splice sites of the
respective introns fall into 3 reading frames.  Small T-antigen
encounters a stop codon several amino acids later, but large T
goes on for quite some distance.
	The reason that this is cheating is that the first part
(95%, 50%, 30%) of each transcript is absolutely identical -
there is only one promoter.  It is only the 3' end of the genes
that are in all three reading frames.

-- 
			      Craig Werner (MD/PhD '91)
				!philabs!aecom!werner
              (1935-14E Eastchester Rd., Bronx NY 10461, 212-931-2517)
               "Time flies when you're streaking out N. gonorrheae."

ma_jpb@bath63.UUCP (04/09/87)

An effect similar to that of DNA reading frames will have been experienced by
anyone who has tried to write a disassembler. Finding where to start decoding
an arbitrary chunk of code, particularly if the code may include static data, is
difficult, even for a machine such as the 6502 with a relatively sparse
instruction set. It is very easy to find chunks of static data that decode as
several apparently valid instructions. For machines such as the NS32016
with very densely encoded instruction sets, such that almost any bit sequence is
a valid instruction, symbolic disassembly is exceedingly awkward.
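The disassembler problem can be sketched with a made-up instruction set. The hypothetical rule is that the low two bits of each opcode give the number of operand bytes. Decoding the same bytes from two starting addresses produces different instruction streams. With variable-length encodings, two such sweeps can fall back into step after a few instructions, which is one reason a wrongly started disassembly can look plausible.

```python
def sweep(code: bytes, start: int) -> list:
    """Linear-sweep decode: addresses where instructions begin when decoding from `start`.
    Hypothetical ISA: instruction length is 1 + (opcode & 0x03) bytes."""
    starts, addr = [], start
    while addr < len(code):
        starts.append(addr)
        addr += 1 + (code[addr] & 0x03)
    return starts


def first_common_boundary(code: bytes, a: int, b: int):
    """Earliest address at which two sweeps line up again, or None."""
    common = set(sweep(code, a)) & set(sweep(code, b))
    return min(common) if common else None


code = bytes([0x02, 0x01, 0x00, 0x03, 0x10, 0x11, 0x12, 0x00])
print(sweep(code, 0))                      # [0, 3, 7]
print(sweep(code, 1))                      # [1, 3, 7]
print(first_common_boundary(code, 0, 1))   # 3
```

Once two sweeps reach the same address they decode identically from then on, so the smallest shared address is where they resynchronize.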

J.P. Bennett

School of Mathematical Sciences
University of Bath
Bath, England, BA2 7AY
Tel:   +44 225 826891
Email: ma_jpb@uk.ac.bath.ux63

howard@cpocd2.UUCP (04/10/87)

In article <891@sigi.Colorado.EDU> eddy@beagle.Colorado.EDU (Sean Eddy) writes:
>No, John, the point Dizzy was making was that while cases of overlap exist,
>they are 1)very rare  2)very short and 3)only in two of the three reading
>frames.

I'm sure there was an article in Sci Am recently about a virus which had a
very short segment of triple overlap.  This makes point 3 false.

>And remember that your original point was that any given sequence 
>potentially represents 6 (!) codings, not just two. Dizzy rightly
>replied that this is a good approximation to impossible. 

Any sequence not containing a terminator does, in some sense, code for 6
proteins.  It would perhaps be more accurate to say that the probability of
all 6 of these proteins being at all functional (or, less likely, actually
produced by an organism) is very close to zero.  The exact probability
is a negative exponential of the sequence length, which we could approximate
via information theory and statistics about protein mutability vs. function.
Anyone have any relevant statistics?
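The exponential can be illustrated with the crudest possible proxy for function: simply having no stop codon. Assume a random sequence with equally likely bases. Then 3 of the 64 codons are stops, so a single frame stays open for n codons with probability (61/64)^n. The six frames overlap and are not independent, so the sketch estimates the all-six case by simulation rather than raising that figure to the sixth power. Being stop-free is necessary but not sufficient for a functional protein, so the real odds are lower still. The names, trial count and seed are arbitrary.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def p_open_one_frame(codons: int) -> float:
    """Chance that `codons` random codons (uniform bases) contain no stop: (61/64)**n."""
    return (61 / 64) ** codons


def all_six_open(seq: str) -> bool:
    """True if none of the six reading frames of `seq` contains a stop codon."""
    rc = seq.translate(COMPLEMENT)[::-1]
    for strand in (seq, rc):
        for offset in range(3):
            for i in range(offset, len(seq) - 2, 3):
                if strand[i:i + 3] in STOPS:
                    return False
    return True


def p_open_six_frames(length: int, trials: int = 5000, seed: int = 1) -> float:
    """Monte Carlo estimate of all_six_open for random sequences of `length` bases."""
    rng = random.Random(seed)
    hits = sum(all_six_open("".join(rng.choice("ACGT") for _ in range(length)))
               for _ in range(trials))
    return hits / trials


for n in (30, 60, 90):
    print(n, round(p_open_one_frame(n // 3), 3), p_open_six_frames(n))
```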
-- 
Copyright (c) 1987 by Howard A. Landman.  You may copy this material for any
non-commercial purpose as long as this notice is retained.  You may also
transmit this material to others and charge for such transmission, as long
as you place no additional restrictions on retransmission of the material
by the recipients.

eddy@boulder.UUCP (04/12/87)

In article <569@cpocd2.UUCP> howard@cpocd2.UUCP (Howard A. Landman) writes:
>In article <891@sigi.Colorado.EDU> eddy@beagle.Colorado.EDU (Sean Eddy) writes:
>>No, John, the point Dizzy was making was that while cases of overlap exist,
>>they are 1)very rare  2)very short and 3)only in two of the three reading
>>frames.
>
>I'm sure there was an article in Sci Am recently about a virus which had a
>very short segment of triple overlap.  This makes point 3 false.

Sorry, I stand corrected. Thanks; I didn't know about the polyoma example,
though I should have. 

I also take back number two. Did a little reading on phiX174 (Nature
264: 34-41, 1976), which is a bacteriophage of E. coli.
Apparently gene E which codes for a host cell lysis
protein is located completely within the coding sequence for gene D,
which is necessary for replication. No small overlap there, we're
talking two complete genes. Pardon me while I extract my foot from
my mouth.

>>And remember that your original point was that any given sequence 
>>potentially represents 6 (!) codings, not just two. Dizzy rightly
>>replied that this is a good approximation to impossible. 
>
>Any sequence not containing a terminator does, in some sense, code for 6
>proteins.  It would perhaps be more accurate to say that the probability of
>all 6 of these proteins being at all functional (or, less likely, actually
>produced by an organism) is very close to zero.  The exact probability
>is a negative exponential of the sequence length, which we could approximate
>via information theory and statistics about protein mutability vs. function.
>Anyone have any relevant statistics?

But this I won't buy yet.

What is meant by a terminator here? To me, 'terminator' refers to a 
transcriptional terminator. The regulatory signals for protein
translation are different. Having no transcription stop site
should, to my mind, make little difference to protein translation.

Also, something to keep in mind is that translation is a very
controlled system, for good reason. Protein synthesis costs a
hell of a lot of energy. A cell that wantonly made all 6 possible
proteins from a sequence would quickly be selected against in
favor of a cell that only produced the functional one. 

- Sean Eddy
- Dept. of Molecular, Cellular, Developmental Biology
- Univ. of Colorado, Boulder; Boulder, CO 80309
- 
- "Science has done some wonderful things, but I'd rather be happy
-  than right."
- "Are you?"
- "Well, I'm afraid that's where it all falls down."
-                  - from Hitchhiker's Guide to the Galaxy

howard@cpocd2.UUCP (04/16/87)

>In article <569@cpocd2.UUCP> howard@cpocd2.UUCP (Howard A. Landman) writes:
>>Any sequence not containing a terminator does, in some sense, code for 6
>>proteins.

In article <918@sigi.Colorado.EDU> eddy@boulder.Colorado.EDU (Sean Eddy) writes:
>What is meant by a terminator here? To me, 'terminator' refers to a 
>transcriptional terminator. The regulatory signals for protein
>translation are different. Having no transcription stop site
>should, to my mind, make little difference to protein translation.

All I was trying to do was eliminate the possibility of having the coding for
one protein stop and the coding for another start in the same reading frame
in the same sequence, because then the counting gets messier.  Perhaps it
would be clearer to look at it as a function of the base pairs: how many
separate proteins are there for which THIS BASE PAIR is part of the coding?
Or, rephrasing my above statement: Any base-pair can theoretically be part
of the coding for 6 proteins.
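The per-base restatement is easy to make concrete. The sketch below uses hypothetical function names and the common +1..+3 / -1..-3 frame labels. It lists the codon that a given base takes part in for each reading frame. On the minus strand the base appears as its complement, and frames where the base falls in an incomplete codon at either end are left out.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def codons_covering(seq: str, pos: int) -> dict:
    """For base `pos` of `seq`, the codon containing it in each reading frame.
    +1..+3 read `seq` from offsets 0..2; -1..-3 read its reverse complement
    from offsets 0..2. Incomplete codons at either end are omitted."""
    n = len(seq)
    rc = seq.translate(COMPLEMENT)[::-1]
    result = {}
    for sign, strand, p in (("+", seq, pos), ("-", rc, n - 1 - pos)):
        for offset in range(3):
            start = p - (p - offset) % 3   # first base of the codon holding p
            if start >= 0 and start + 3 <= n:
                result[f"{sign}{offset + 1}"] = strand[start:start + 3]
    return result


print(codons_covering("ATGGCCTAA", 4))
```

An interior base gets all six codons. A base at the very end of the sequence gets fewer, which is the edge effect that made the counting messy.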
-- 
Copyright (c) 1987 by Howard A. Landman.  You may copy this material for
any non-commercial purpose, or transmit this material to others and charge
for such transmission, as long as this notice is retained and you place no
additional restrictions on retransmission of the material by the recipients.

evs@duke.cs.duke.edu (Ed Simpson) (04/30/87)

In article <430@haddock.UUCP> johnc@haddock.ISC.COM.UUCP (John Chambers) writes:
> ....  It seems
>clear after just a little thought that selection would tend to eliminate
>overlapping genes, perhaps replacing them with replicates that can then
>mutate independently.

Not if the overlapping DNA sequences confer some sort of "coadaptation".
This would be the case if the DNA sequences coded for proteins that conferred
a higher fitness on the organism than homologous proteins produced by other
DNA sequences.  There's lots of discussion in the literature about the
possibility of selection favoring increased linkage of certain genotypic
combinations.  In the case of non-overlapping genes there can never be
100% linkage; there is always the possibility of crossover occurring during meiosis.
It seems to me that overlapping genes would be one way of achieving 
essentially 100% linkage.
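The linkage argument can be put in numbers with Haldane's map function. This is a textbook model, not something from this thread, and it assumes crossovers occur independently along the chromosome (no interference). Under it, loci d Morgans apart recombine with probability r = (1 - e^(-2d))/2. Overlapping genes sit at d = 0, so r = 0 and the combination is never broken up by crossing over. The distances printed are arbitrary.

```python
import math


def haldane_r(d_morgans: float) -> float:
    """Recombination fraction under Haldane's map function (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


for d in (0.0, 0.001, 0.01, 0.1, 1.0):
    print(f"{d:6.3f} M -> r = {haldane_r(d):.4f}")
```

Even a tiny separation gives a small but nonzero r, and the value approaches 0.5 (free recombination) for distant loci. Only true overlap gives exactly zero.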
-- 
UUCP: {decvax, seismo}!mcnc!duke!evs  ARPA: evs@cs.duke.edu  CSNET: evs@duke
Ed Simpson, P.O.Box 3140, Duke Univ. Medical Center, Durham, NC, USA 27710

c60a-4er@tart17.BERKELEY.EDU (Class Account) (05/04/87)

a product which is deleterious in
oversupply, such duplication could be deleterious, preserving the 
overlapping loci.

Also, I seem to remember at least one pair of overlapping in-frame genes
whose protein products must share an identical sequence to interact correctly,
and which are protected by being overlapping from mutations which would
disrupt their interaction.  I can't remember what genes or in what organism.
Can anyone help me out with this?

Mary K. Kuhner