[comp.ai] Information Capacity of Human Genome

srt@aero.ARPA (Scott "RCA" Turner) (11/09/88)

In article <393@uceng.UC.EDU> dmocsny@uceng.UC.EDU (daniel mocsny) writes:
>The information content of the human genome is ~750 MB, of which
>a sizable fraction determines our basic brain structure.

This is a bit off the original subject, but do you have a cite for this
number?  And is there any evidence concerning how much of this information
is duplicated in the genome?

						-- Scott Turner

spector@brillig.umd.edu (Lee Spector) (11/10/88)

In article <393@uceng.UC.EDU> dmocsny@uceng.UC.EDU (daniel mocsny) writes:
>The information content of the human genome is ~750 MB, ...

I have been told by a geneticist (I don't have any other reference) that
if you take the folding of the genetic proteins into account (and not
just the sequence of bases) then the informational content of the 
human genome is MUCH larger than this.
(Apparently the pattern of folding does have "informational content" - 
it affects which traits are expressed, etc.)  He didn't give me an exact
figure, but he indicated that the number was significantly larger than
the memory of any existing computer.

   - Lee Spector  (spector@brillig.umd.edu)

josh@klaatu.rutgers.edu (J Storrs Hall) (11/11/88)

In article <393@uceng.UC.EDU> dmocsny@uceng.UC.EDU (daniel mocsny) writes:
>The information content of the human genome is ~750 MB, of which
>a sizable fraction determines our basic brain structure.

... of which a fairly *small* fraction determines brain structure.
I have read estimates on the order of a few megabytes (smaller than
Common Lisp!).  Of course, this is very compressed, like a fractal
description of an image...

--JoSH

srp@cgl.ucsf.edu (Scott R. Presnell%Langridge) (11/12/88)

In article <Nov.10.15.59.59.1988.4983@klaatu.rutgers.edu>
	josh@klaatu.rutgers.edu (J Storrs Hall) writes:

>In article <393@uceng.UC.EDU> dmocsny@uceng.UC.EDU (daniel mocsny) writes:
>>The information content of the human genome is ~750 MB, of which
>>a sizable fraction determines our basic brain structure.
>
>... of which a fairly *small* fraction determines brain structure.
>I have read estimates on the order of a few megabytes (smaller than
>Common Lisp!).  Of course, this is very compressed, like a fractal
>description of an image...
>
>--JoSH

I hesitate to redefine a byte, so I think the best way to quantify the
situation is to use the original units.

The human genome includes 2.3e+9 base pairs. For the sake of simplicity
let's treat the number as roughly 5.0e+9 bases (counting both strands), as
there may be situations where the two strands perform different functions.

It has been estimated through arguments of relative complexity of
organisms that the human genome probably contains about 100e+3 genes. If we
make a big assumption and say that the average gene is 1000 bases after
being appropriately processed, that leads us to 100e+6 bases required for
genes. Therefore ~5% of the genome is used for genetic information in the
form of genes (or proteins). This is probably an underestimate of the actual
information required for an organism to function.
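
The gene-fraction arithmetic above can be checked with a short Python
sketch. The variable names are only illustrative, and both denominators are
shown because the result depends on whether one strand (2.3e+9 bases) or
both strands (roughly 5.0e+9 bases) are counted; the ~5% figure is closest
to the single-strand count.

```python
# Figures taken from the post; names are illustrative, not from any library.
GENES = 100_000                      # estimated number of genes
BASES_PER_GENE = 1_000               # assumed average processed gene length
SINGLE_STRAND_BASES = 2_300_000_000  # 2.3e+9 base pairs, one strand
BOTH_STRAND_BASES = 5_000_000_000    # rounded count for both strands

# Total bases needed for genes under these assumptions.
coding_bases = GENES * BASES_PER_GENE

# Fraction of the genome used for genes, against each denominator.
fraction_single = coding_bases / SINGLE_STRAND_BASES
fraction_both = coding_bases / BOTH_STRAND_BASES

print(f"coding bases: {coding_bases:.1e}")
print(f"fraction of one strand:   {fraction_single:.1%}")
print(f"fraction of both strands: {fraction_both:.1%}")
```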

Stepping out onto a limb:
As for what fraction actually determines the structure of our brain, well,
let's just say that the brain is only one organ or cell type (out of say
20?) that our cells must differentiate into.  So maybe 0.3% of the genome
determines the structure of the brain?  You get the idea...

As for the amount of memory required to store the genome:
The four bases can be represented by 2 bits. Furthermore, only one strand
need be stored since the other strand can be calculated from it. That means
we need 5.0e+9 bits or 625e+6 bytes to store the sequence, as calculated
above.
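
A minimal Python sketch of the storage scheme described above: each base is
packed into 2 bits, and the second strand is recomputed as the reverse
complement rather than stored. The particular bit assignment and the
function names (pack, unpack, other_strand, storage_bytes) are assumptions
made for illustration, not an established encoding.

```python
# Hypothetical 2-bit code for the four bases; the assignment is arbitrary.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pack(seq: str) -> bytes:
    """Pack a base string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk; the unused low bits stay zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Recover the first n_bases bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


def other_strand(seq: str) -> str:
    """Compute the second strand as the reverse complement of the first."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def storage_bytes(n_bases: int) -> int:
    """Bytes needed at 2 bits per base, rounded up to a whole byte."""
    return (2 * n_bases + 7) // 8


if __name__ == "__main__":
    seq = "ACGTA"
    packed = pack(seq)
    print(packed.hex(), unpack(packed, len(seq)), other_strand(seq))
    print(storage_bytes(2_300_000_000), "bytes for 2.3e+9 bases")
```

With the unrounded 2.3e+9 bases this gives 575e+6 bytes; the 625e+6 figure
follows from rounding the bit count up to 5.0e+9.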

Cheers,



Scott Presnell						       +1 415 476 5326
Dept. of Pharmaceutical Chemistry
Univ. of Calif. at San Francisco (UCSF), San Francisco, CA. 94143
Internet: srp@cgl.ucsf.edu UUCP: ucbvax!ucsfcgl!srp Bitnet: srp@ucsfcgl.bitnet