[sci.bio] Analog/Digital Distinction

roy@phri.UUCP (Roy Smith) (11/11/86)

In article <680@randvax.UUCP> edhall@rand-unix.UUCP (Ed Hall) writes:
> Nature chose digital code of three-digit base-four numbers to determine
> how you and I are put together. [...] There is a good engineering reason
> why this is so.  You can say all you want about the discontinuous nature
> of digital representations as opposed to analog, but the fact remains
> that digital is exactly reproducible, while analog is not.

	This is one of my favorite topics, so I'd like to expand on that a
bit.  I trust the real biologists out there will take into account the fact
that this is a gross simplification of a complicated subject and not
take me to task on details.  I have been deliberately loose with
nomenclature to highlight the information processing aspects at the cost of
some biological accuracy.  Readers interested in finding out more are
encouraged to get a good book on molecular biology.  Jim Watson's
"Molecular Biology of the Gene, 3rd edition (1976)" is a good place to
start.

	The Genetic code is indeed 3-digit, base-4 numbers.  It's also an
overloaded code -- the mapping from DNA to Amino Acids (AA's) is not
one-to-one.  Some AA's are coded for by more than one codon (3 base DNA
sequence).  What's really interesting, is that the copying of DNA does
*not* have the perfect accuracy we have come to expect from digital
processes.
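
	To make the "3-digit, base-4" reading concrete, here is a rough
Python sketch.  The digit order (T=0, C=1, A=2, G=3) and the names
codon_to_int, int_to_codon and codons_for are arbitrary choices made for
this illustration, and the table is only a small excerpt of the standard
code (DNA coding-strand spelling), just enough to show the many-to-one
mapping.

```python
from itertools import product

# Assumed digit order for illustration; any fixed order works equally well.
BASES = "TCAG"  # T=0, C=1, A=2, G=3


def codon_to_int(codon):
    """Read a 3-base codon as a 3-digit base-4 number (0..63)."""
    value = 0
    for base in codon:
        value = value * 4 + BASES.index(base)
    return value


def int_to_codon(n):
    """Inverse of codon_to_int: turn 0..63 back into a 3-base codon."""
    digits = []
    for _ in range(3):
        n, digit = divmod(n, 4)
        digits.append(BASES[digit])
    return "".join(reversed(digits))


# Every possible codon: 4 ** 3 = 64 of them.
ALL_CODONS = ["".join(p) for p in product(BASES, repeat=3)]

# Deliberately partial excerpt of the standard genetic code.
PARTIAL_CODE = {
    "TTA": "Leu", "TTG": "Leu", "CTT": "Leu",
    "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
    "ATG": "Met",
    "TGG": "Trp",
}


def codons_for(amino_acid):
    """List the codons in the excerpt that code for one amino acid."""
    return sorted(c for c, aa in PARTIAL_CODE.items() if aa == amino_acid)


if __name__ == "__main__":
    print(len(ALL_CODONS))        # 64
    print(codon_to_int("ATG"))
    print(codons_for("Leu"))
```

	Six different codons give leucine while methionine and tryptophan
get one each, so swapping one leucine codon for another changes the DNA
without changing the protein.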

	DNA exists in the cell most of the time as double stranded (dsDNA).
This means that each base exists twice, once on one strand, and again on
the other strand in its complementary form.  After replication, you have a
piece of dsDNA in which one of each base pair is from the original piece of
DNA, and the other is a copy.  You digital types will recognize this as a
2-symbol ECC, with 1 data symbol and one check symbol (there are 4 symbols,
so you can't really say "bits").
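
	The data-symbol/check-symbol view can be sketched as follows.
This is a simplification under stated assumptions: the two strands are
compared position by position (strand direction is ignored), and the
function names are made up for the example.

```python
# Watson-Crick pairing: each base has exactly one valid partner.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand):
    """Build the partner ("check symbol") strand for a given strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def mismatches(strand, partner):
    """Return the positions where the two strands fail to pair."""
    return [
        i
        for i, (a, b) in enumerate(zip(strand, partner))
        if COMPLEMENT[a] != b
    ]


if __name__ == "__main__":
    original = "ATGC"
    print(complement_strand(original))
    # The third base of the partner has been altered from C to T.
    print(mismatches(original, "TATG"))
```

	Note that mismatches() reports where a pair is bad, but nothing
in the pair itself says which of the two bases is the wrong one.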

	OK, now that we've got our base pairs, what do we do with them?
Well, a wonderful thing happens -- an enzyme (Pol1?) comes along and
re-reads both strands of the new dsDNA.  Every time it finds a place where
a base-pair is wrong, it corrects it.  But, you ask, with only a single
check symbol (Hamming distance < 1), how do you know which one to trust?
The answer is that you don't!  You fix one of them at random and hope it's
the right one.  If it's not, no big deal.  Either you've introduced a fatal
mutation which will take care of itself, or you've made a "silent mutation"
which doesn't make any difference (remember the many-to-one mapping of
codons to AA's).  Of course, you might have just lucked out and made a
useful mutation, in which case you're off on the road to evolution.

	If you really get into this, it's amazing how many computer science
concepts were thought of by living cells first.  The most obvious is that
DNA is a program.  Then you have ECC (described above), subroutines
(different enzymes made from common subunits), regular expressions
(restriction enzymes), compilers and assemblers (ribosomes and tRNA's),
compile-time preprocessing using #ifdef's (introns), self-modifying code
(transposons and integrating phages), portable programs (plasmids), P&V
operations (numerous regulatory systems), etc.  You can even think of mRNA
as a vector register, DNA as main memory, and chromosome-histone complexes
as demand paging from a file system (or maybe as archival tape storage).
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

"you can't spell unix without deoxyribonucleic!"

werner@aecom.UUCP (Craig Werner) (11/13/86)

> 
> 	The Genetic code is indeed 3-digit, base-4 numbers.  It's also an
> overloaded code 
	Not overloaded - it's redundant, which means sometimes mutations
can be silent at the level of the protein (and hence phenotype).
> 
> 	DNA exists in the cell most of the time as double stranded (dsDNA).
> 
> 	OK, now that we've got our base pairs, what do we do with them?
> Well, a wonderful thing happens -- an enzyme (Pol1?) comes along and
> re-reads both strands of the new dsDNA.  Every time it finds a place where
> a base-pair is wrong, it corrects it.  
>                                      how do you know which one to trust?
> The answer is that you don't!  You fix one of them at random and hope it's
> the right one.  


	First of all there is also another level of proofreading during
synthesis, so there are two chances to get it right before one leaves it
to chance. And of course, everyone I know is diploid.
	Secondly and totally irrelevantly, I would like to bring up a
quote attributed to someone who sequences genes for a living:

	"Since the DNA Polymerase, in-vivo and even in-vitro is much
better at proofreading than humans, the net result is that the mutation
rate of DNA sequences is much higher in press than in vivo."


-- 
			      Craig Werner (MD/PhD '91)
				!philabs!aecom!werner
              (1935-14E Eastchester Rd., Bronx NY 10461, 212-931-2517)
                                 "But I digress..."

chiaraviglio@husc2.UUCP (lucius) (11/13/86)

In article <2489@phri.UUCP>, roy@phri.UUCP (Roy Smith) writes:
>        . . .What's really interesting, is that the copying of DNA does
> *not* have the perfect accuracy we have come to expect from digital
> processes.
[Stuff about how new strands of DNA are derived from each other deleted.]
> 	OK, now that we've got our base pairs, what do we do with them?
> Well, a wonderful thing happens -- an enzyme (Pol1?) comes along and
> re-reads both strands of the new dsDNA.  Every time it finds a place where
> a base-pair is wrong, it corrects it.  But, you ask, with only a single
> check symbol (Hamming distance < 1), how do you know which one to trust?
> The answer is that you don't!  You fix one of them at random and hope it's
> the right one.  If it's not, no big deal.  Either you've introduced a fatal
> mutation which will take care of itself, or you've made a "silent mutation"
> which doesn't make any difference (remember the many-to-one mapping of
> codons to AA's).  Of course, you might have just lucked out and made a
> useful mutation, in which case you're off on the road to evolution.

	Wrong.  DNA Polymerase I does not come along to fix errors after
replication of DNA, but rather does it while it is replicating the DNA.  Every
time it attaches a new base to the strand it is synthesizing, it will refuse
to proceed unless that base pairs properly with the corresponding one on the
old strand.  If the new and old ones will not pair correctly, it snips the new
one off.  The way it can distinguish between new and old strands is that it is
holding on to the two strands in different ways, and also the new strand is
not complete (even if DNA Polymerase I runs into more completed strand, it
cannot close the nick, but will just eat up the part of the strand that it is
running into in order to make space for what it is putting down -- this gives
you a way to radioactively label strands of DNA in places other than the
ends ("nick translation")).  (Completion of a strand (joining the growing (3')
end to the beginning (5' end) of the next part of the strand) requires DNA
Ligase.)  No "fixing at random" is involved, except in the unlikely event that
the condition which caused DNA Polymerase I to insert the wrong base in the
first place persists long enough for it to add the next base after that (which
would still have an enhanced chance of not sticking even if correctly matched,
due to the overall weakening of pairing caused by the mismatched base before
it).  This is one of the reasons mutation rates are as low as they are observed
to be.
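
	For the computer people, proofreading during synthesis can be
sketched as a check-and-retry loop.  This is a toy model only: the function
name replicate, the parameters error_rate and escape_rate, and their values
are invented for illustration and are not measured biological quantities.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def replicate(template, error_rate=0.1, escape_rate=0.0, rng=None):
    """Copy a template strand one base at a time, checking each new base.

    error_rate:  chance that a wrong base is attached on any attempt
                 (must be below 1, or the retry loop never finishes).
    escape_rate: chance that a mismatch is not snipped off.
    Both numbers are hypothetical knobs for this toy model.
    """
    rng = rng or random.Random()
    new_strand = []
    for old_base in template:
        correct = COMPLEMENT[old_base]
        while True:
            if rng.random() < error_rate:
                # Attach one of the three wrong bases.
                candidate = rng.choice([b for b in "ATGC" if b != correct])
            else:
                candidate = correct
            # Keep the base if it pairs with the old strand, or if the
            # mismatch slips through; otherwise snip it off and retry.
            if candidate == correct or rng.random() < escape_rate:
                new_strand.append(candidate)
                break
    return "".join(new_strand)


if __name__ == "__main__":
    # Even with half of all attempts wrong, nothing escapes the check.
    print(replicate("ATGCATGC", error_rate=0.5, rng=random.Random(0)))
```

	With escape_rate left at zero the copy always pairs with the
template, however error-prone the individual attempts are; mutations only
get through when a mismatch escapes the check.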

	Other ways in which errors are corrected non-randomly depend on the
fact that most alterations to DNA produce invalid bases rather than valid but
incorrect bases.  For example, under UV light thymidines which are next to each
other dimerize, producing obviously invalid bases; under any conditions
cytidine may spontaneously deaminate to uridine (which does not occur in DNA,
but only in RNA, where the resulting error would be less disastrous); the
results of both alterations (and some others) are things which specific enzymes
can recognize as invalid and cleave out, to be replaced with properly matching
bases.  The probability of a base mutating while it is transiently unpaired
(due to a mishap to its partner) is much lower than that of a base mutating
while paired, because only a small fraction of the bases are unpaired at any
given time.
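
	Repair of invalid bases can be sketched the same way.  The
assumptions here: damage is represented as any symbol outside A/T/G/C (for
example "U" for a deaminated cytidine), strands are compared position by
position, and repair_invalid is a made-up name for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
VALID_DNA_BASES = set(COMPLEMENT)


def repair_invalid(strand, partner):
    """Replace symbols that are not valid DNA bases.

    Because the damaged symbol is recognizably invalid, there is no
    guessing about which strand to trust: the intact partner strand
    dictates the replacement.
    """
    repaired = []
    for base, opposite in zip(strand, partner):
        if base in VALID_DNA_BASES:
            repaired.append(base)
        else:
            # Cleave out the invalid base and pair against the partner.
            repaired.append(COMPLEMENT[opposite])
    return "".join(repaired)


if __name__ == "__main__":
    # A cytidine opposite G has deaminated to uridine ("U").
    print(repair_invalid("ATUG", "TAGC"))
```

	The point of the sketch is that the repair is deterministic: the
invalid symbol marks the damaged strand, so nothing is fixed at random.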

	Recommended reading:  _Genes_ (or _Genes II_) by Lewin, and _Molecular
Biology of the Cell_ (by 4 authors whose names I cannot remember right off
hand -- this is the 1983 edition; I hear a 1986 edition may be out by a
somewhat different set of authors).

-- 
	-- Lucius Chiaraviglio			Department of Molecular
	   chiaraviglio@husc4.harvard.edu		Biology,
	   seismo!husc4!chiaraviglio		Massachusetts General
							Hospital

Please do not mail replies to me on husc2 (disk quota problems, and broken
mail system won't let me send mail out).  Please send only to the address
given above, until tardis.harvard.edu is revived.

chiaraviglio@husc2.UUCP (lucius) (11/14/86)

	It has been pointed out that I made an error in my previous message in
these newsgroups.  This is due to omission of the underlined words in the
following corrected segment of the affected paragraph in that message:

> 	Wrong.  DNA Polymerase I does not _have to_ come along to fix errors _of
> replication_ after _the_ replication of DNA, but rather does _the great part of_
> it while it is replicating the DNA.  _(It is also used to help correct errors
> at other times.)  The way it works during DNA replication is this:_  every
> time it attaches a new base to the strand it is synthesizing, it will refuse
> to proceed unless that base pairs properly with the corresponding one on the
> old strand.  If the new and old ones will not pair correctly, it snips the new
> one off.
[Rest of message not shown here.]

	That should make it much clearer.

-- 
	-- Lucius Chiaraviglio
	   chiaraviglio@husc4.harvard.edu
	   seismo!husc4!chiaraviglio
