[sci.bio] question - DNA's information

cherry@husc4.UUCP (03/31/87)

In article <3310@udenva.UUCP> agranok@udenva.UUCP (Alexander Granok) writes:
>In article <978@aecom.UUCP> werner@aecom.UUCP (Craig Werner) writes:
>>In article <11189@teknowledge-vaxc.ARPA>, rburns@teknowledge-vaxc.ARPA (Randy Burns) writes:
>>> I was wondering roughly how many 'bytes' of information are contained
>>> within human chromosomes?
>>
>>	Hence, if a byte is a base pair, that's your answer, although
>>only two bits are required to specify a base, ergo a 'byte' could 
>>actually be a tetranucleotide, but most sequences are stored as
>>letters (ATCG). 
>
>The whole argument gets caught up in definitions, here.  I would consider a
>bit to be a base pair, and a byte to be the set of three that encodes for one
>amino acid. 
>....   I guess it all depends on what you mean by "information."
>....   I think a better question might be something like:
>"How many amino acids (words in the language of proteins) are encoded for 
>on the human chromosomes?"  or "How many books could these words fill?
>....   Anyway, I think that would
>give a much more easily palpable idea for the enormity of information involved.


The unit of information stored in the chromosomes is the nucleotide. Not
all of that information ever makes it to proteins. Much of the information
in DNA never even makes it to RNA. Also, many RNAs encoded in the DNA
have either a structural or a catalytic function. Thus the
information is at the DNA level.

Ciliates are single-celled animals that have cilia for locomotion and two
types of nuclei. One, called the micronucleus, is where all the genetic
information is stored. The other, called the macronucleus, is where
all RNA synthesis occurs. The interesting fact is that in Oxytricha the
macronucleus contains only 15% of the micronuclear DNA complexity.
Therefore 85% of this organism's inherited DNA sequences are not required
to make the somatic nucleus where all RNA synthesis occurs, a necessary
precursor to protein synthesis.

Mike Cherry
Dept. of Molecular Biology, Mass. General Hospital, Boston
cherry%frodo.decnet@mghccc.harvard.edu

diaz@aecom.UUCP (04/03/87)

In article <1534@husc6.UUCP>, cherry@husc4.HARVARD.EDU (michael cherry) writes:
> 
> The unit of information stored in the chromosomes is the nucleotide. Not
> all of that information ever makes it to proteins. Much of the information
> in DNA never even makes it to RNA. Also, many RNAs encoded in the DNA
> have either a structural or a catalytic function. Thus the
> information is at the DNA level.

Although I hate to pick at others' articles, I have seen the increasing
use of the term "catalytic RNAs" to describe those RNA molecules (e.g.
the Tetrahymena rRNA) which mediate their own splicing in vitro. This
bastardization of the term catalysis has unfortunately infiltrated many
of our scientific journals.

Let's make it clear that self-splicing RNAs mediate their splicing via
intramolecular reactions. Once exons are spliced together, the reaction
is over. By definition, a catalyst accelerates the rate of a chemical
reaction without itself being consumed in the net reaction.
Self-splicing RNAs are NOT catalytic. 

The only examples I know of genuinely catalytic RNAs are: M1 RNA from
E.coli, which has been elegantly shown by Sidney Altman & coworkers to
accurately process the 5' termini of tRNAs in the absence of its
physiological protein cofactor in vitro; M1 RNA-like molecules from
Bacillus subtilis and other organisms; the Tetrahymena rRNA intron left
over from the self-splicing of this transcript, which has been shown by
T. Cech, et al, to possess RNA polymerase- and RNA restriction
endonuclease-like activities in vitro. There may be other examples of
which I may not be aware (if so, let me know). Let's be careful in our
description of the members of the RNA world. 

-- 
            dn/dx = Dan Diaz    (philabs!aecom!diaz)
            Department of Molecular Biology & Pizza Chemistry AECOM
            "Hold the E.coli"

cherry@husc4.UUCP (04/05/87)

>Let's make it clear that self-splicing RNAs mediate their splicing via
>intramolecular reactions. Once exons are spliced together, the reaction
>is over. By definition, a catalyst accelerates the rate of a chemical
>reaction without itself being consumed in the net reaction.
>Self-splicing RNAs are NOT catalytic. 
>
>The only examples I know of genuinely catalytic RNAs are: M1 RNA from
>E.coli, which has been elegantly shown by Sidney Altman & coworkers to
>accurately process the 5' termini of tRNAs in the absence of its
>physiological protein cofactor in vitro; M1 RNA-like molecules from
>Bacillus subtilis and other organisms; the Tetrahymena rRNA intron left
>over from the self-splicing of this transcript, which has been shown by
>T. Cech, et al, to possess RNA polymerase- and RNA restriction
>endonuclease-like activities in vitro.

The group I and II self-splicing RNAs, as well as the viroid RNAs, act on
part of themselves. However, the work of Cech's group in Boulder and
Szostak's group in Boston has shown that the "enzyme" part of the group I
self-splicing introns (specifically the Tetrahymena rRNA sequence) is not
destroyed by the three reactions taking place in vivo; rather, the
"substrate" part of the molecule is destroyed. Cech coined the term
ribozyme to designate these RNAs, which perform the specific reactions
involved in the RNA splicing reaction. Protein chemists generally consider
them not to be catalysts because they act only once, not because they
happen to cut themselves. In vitro, these RNA molecules have been modified
by both Cech and Szostak to be true enzymes, acting on multiple substrate
molecules, not being changed by the reaction they catalyze, etc.

Thus, while it is true that in vivo the self-splicing RNAs are not catalytic
in the complete sense of the word, they do carry out a reaction which would
not have happened without them. The change which occurs in one of these
RNAs does not result from the ribozyme becoming an intermediate in the
reaction; rather, the ribozyme cuts itself in a subsequent reaction, which
removes the recognition part of the enzyme.

If these molecules can be enzymes in vitro, perhaps it's just a matter of
time before more truly catalytic RNA molecules are discovered which act as
enzymes in vivo.

Mike Cherry 

michael@m-net.UUCP (04/06/87)

In article <11189@teknowledge-vaxc.ARPA>, rburns@teknowledge-vaxc.ARPA (Randy Burns) writes:
> I was wondering roughly how many 'bytes' of information are contained
> within human chromosomes?
and gets a large number of replies (some with novel definitions of bits
and bytes).

Some of the replies made the assumption that only the information that was
eventually transcribed into protein was significant, and that repeated
sequences, introns, the third base pair of some three-letter codes, and so
on, could be ignored.  I must take issue with that.

Introns may yet prove to have functions beyond their own excision.
(Negative feedback on protein production is an obvious candidate.) 
Some of the regions near protein-coding sections regulate expression.
The third base of each codon in the hypervariable part of antibodies is
significant (affecting the potential antibodies after DNA editing), and may
have other effects in other genes (e.g. affecting the likelihood of cancer
by changing the probability that a growth-regulator will mutate into
an "always-grow" form).  Repeating sequences may function in DNA repair
mechanisms.

Because of these and other possible functions of DNA, I would not exclude
any information from consideration, and would answer with the number of
8-bit bytes required to specify the reconstruction of the actual DNA found
in an individual.

At 3*10^9 base pairs, and two bits per base pair, that's 3/4 * 10^9 bytes,
or about 750 Megabytes.  (Specifying cutting points to separate the
chromosomes, and other artifacts like the arbitrary choice of direction
of the chromosomes and their order in the representation, gain or lose
so few extra bits they're lost in the one-significant-digit estimate of
the number of base pairs).  We're starting to see disk drives about that
big.
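
The arithmetic above (3*10^9 positions at two bits each) can be checked
with a short Python sketch.  The particular A/C/G/T-to-number mapping and
the function names genome_bytes and pack_bases are illustrative assumptions,
not anything from this thread; any fixed 2-bit code would give the same size.

```python
# Illustrative 2-bit code; the particular assignment is arbitrary (an assumption).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def genome_bytes(base_pairs, bits_per_base=2):
    """Bytes needed to store base_pairs bases at bits_per_base bits each."""
    return (base_pairs * bits_per_base + 7) // 8  # round up to whole bytes

def pack_bases(seq):
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so the unused low bits are zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)

print(genome_bytes(3 * 10**9))   # 750000000
print(pack_bases("ACGT").hex())  # 1b
```

Four bases fit in each 8-bit byte, which is where the factor of 3/4 * 10^9
comes from.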

But this number really represents the maximum data that the genome
could hold.  Some of the redundancies could be absorbed in data-compressing
codings, squeezing down (but not quite eliminating) the
data representing repeating regions, but not affecting third-base
redundancies.  Sorry, I can't put a number on that.
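
Without putting a number on the real genome, the effect of repeats can at
least be illustrated on made-up data.  The sketch below is only that: it
assumes synthetic sequences (one random, one built from a repeated 100-base
unit), uses zlib as a stand-in for "data-compressing codings", and the helper
name compressed_bits_per_base is hypothetical.

```python
import random
import zlib

def compressed_bits_per_base(seq):
    """Bits per base after zlib compression of the ASCII sequence."""
    return 8 * len(zlib.compress(seq.encode("ascii"), 9)) / len(seq)

rng = random.Random(0)  # fixed seed so runs are repeatable
n = 40_000

# Synthetic stand-ins (assumptions), not real genomic data.
random_seq = "".join(rng.choice("ACGT") for _ in range(n))
unit = "".join(rng.choice("ACGT") for _ in range(100))
repeat_seq = unit * (n // len(unit))

print("random  :", round(compressed_bits_per_base(random_seq), 2))
print("repeats :", round(compressed_bits_per_base(repeat_seq), 2))
```

The random sequence should stay near the two-bits-per-base ceiling, while
the repeated one should shrink to a small fraction of it.  Real repeats are
imperfect copies, so they would compress less well than this idealized case.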

  "I've got code in my node."	| UUCP:  ...!ihnp4!itivax!node!michael
				| AUDIO: (313) 973-8787
	Michael McClary		| SNAIL: 2091 Chalmers, Ann Arbor MI 48104

(If you want to be sure I see it, MAIL to the address in my SIGNATURE!)