prion@STOLAF.EDU (Chris Swanson, Moderator) (04/21/91)
Prion Digest          Sat, 20 Apr 91          Volume 1 : Issue 6

Today's Topics:

    Administrivia
    Origins of Viruses

----------------------------------------------------------------------

Date: Thu, 18 Apr 91 14:13:21 -0500
From: swansonc@stolaf.edu
Subject: Administrivia
To: prion

Our mail system seems to have dropped any messages for prion, prion-request, or prion-archive on the floor yesterday (17 and 18 Apr). As such, I lost all of these messages. Unless I replied to you already, please re-send any messages.

One I remember in particular was from a news account (netnews or something like that) from, I believe, nwu.edu (I could be wrong here) flashing by. If you know of such an account that made a subscription request this last week, please ask them to re-send it, with my apologies.

Take care,
-Chris Swanson (Prion Digest Moderator)

------------------------------

Date: Wed, 17 Apr 91 12:29:10 -0400
From: Daniel Enxing <djex@ll.mit.edu>
Subject: Origins of Viruses
To: prion@stolaf.edu

Text item:

This message is intended for your new Prion & Virus List.

The following is an attached File item from cc:Mail. It contains eight bit information which had to be encoded to ensure successful transmission through various mail systems. To decode the file use the UUDECODE program.

[ I uudecoded this text and found that the only 8-bit characters were ]
[ ^M's and the standard MS-DOS ^Z EOF at the end. I removed these and ]
[ replaced the uuencode text with the resulting clear text.           ]
[ - Chris Swanson, Prion List Moderator                               ]

-- Plain text follows this line --

ARE VIRUSES RENEGADES?

Many years ago, in an attempt to understand how a metazoan cell could possibly do the miraculous things that it does, I came across John Platt's 'book model' of the cellular machinery ('Horizons in Biochemistry', Academic Press, New York, N.Y.
1962) which starts out:

"The expression of genetic information in cells and whole organisms is like the reading out of a complex instruction manual, but the analogy extends to more detail than is generally realized. The information is linearly arranged in "words" that are "read out" sequentially in time. There is one copying mechanism (DNA polymerase) for reprinting the whole book, and another (RNA polymerase) for selective read-out into cell chemistry. The read-out is by "paragraphs" (genes) and by "pages" (operons) that can either be "closed" (repressed) or "opened" (induced), according to contingent "instructions" (repressor-corepressor complexes) from "references" (regulator genes) on earlier pages or in "books" of adjacent tissues."

Although I admired this paper greatly, it was plain to me that, with respect to eukaryotic processes, the model evaded a very fundamental question, namely, "How are the 'pages' turned?". In my quest for mechanisms that could serve this purpose I was driven to the conclusion that cellular processes absolutely have to be 'real-time' processes. One of the unexpected fringe benefits of a 'real-time' model was that it suggests roles for some of the so-called "junk code" in the genome.

I wrote a draft paper outlining the idea, but never got around to having it published. The ideas, apparently, were too outre at the time, because not many people (and even fewer biologists) understood much about information processing -- and those who knew anything about real-time systems were even sparser on the ground. It may well be that, as Zola said, 'there is nothing as powerful as an idea whose time has come' -- but in my experience 'there is nothing as impotent as an idea whose time has not yet come.'

Your announcement about a network for prions and viruses started me thinking again, along slightly different lines.
The exquisitely orchestrated, very precise processes carried out inside the nucleus of a eukaryotic cell demand that the milieu be very tightly controlled. The chances of a random bit of DNA, introduced somehow into the nucleus, ever managing to get itself inserted into the genome in such a way that it is replicated seem to me to be negligible. Yet we know that viruses achieve this routinely. How can this be?

It soon polymerized on me that I already had a plausible answer within my grasp and that with very minor modifications (to my original paper, not to the model) one could derive a logically satisfying (to me, at least) explanation of the origin of viruses.

I propose that the answer might be that a virus is not (and never was) just a random bit of DNA. It is able to seize control of the cellular replicating machinery because it has (coded within itself) "inside information" -- derived, no doubt, from a renegade ancestor, in a direct line -- who was once a member in good standing of the organization and so could have been privy to the detailed information necessary for success in this venture.

The ideas expressed here are entirely original. Any feedback would be welcome. My network address is: <djex@LL.MIT.EDU>

JUNK CODE & VIRUSES

INTRODUCTION

In higher organisms the nuclear DNA is complexed with proteins and some RNA, known collectively as 'chromatin'. Not quite all the information in the cell is inherent in the chromatin; the organelles (mitochondria &c.) have to be taken into account too. For the purposes of this treatment, though, we shall accept without question the "dogma" that 'all the information needed to complete the organism, as well as the information that must be used by the developing organism to commence its interactions with its environment, is inherent in the chromatin complex.'

Information can only exist in a context -- it cannot exist 'in vacuo'. For information to be of use, it must be retrievable if and when needed.
The role of a library, as a repository of information (an information base), can be fulfilled only if any given item of information in it is retrievable on demand. If all the books in the Library of Congress were to be thrown haphazardly into a warehouse (or stored tightly packed in crates) it would no longer be a library and the information per se would effectively cease to exist.

There is an enormous amount of information in the mammalian genome. This information is at the disposal of the cellular machinery. The cell must, however, be able to gain access to whatever specific information it needs whenever it needs it.

Each of us comes into being from a single cell, a fertilized ovum. Every time the egg divides, each daughter cell inherits a complete copy of the genome -- it inherits a portfolio of genes. Its nucleus contains all the genes which, encoded in DNA, specify all the different cells in the adult body. At some specific time during development of the organism each cell specializes; it becomes a liver cell or a kidney cell or a neuron, say. From that time on, all its daughter cells will be of the same kind. The formation of a specialized cell does not result from loss of genetic material; rather it follows from a change in the reading of the whole genome -- 'selective gene expression', as it is called.

Once a cell has differentiated, its metabolic behavior is also determined. Even though the code in a liver cell, for example, contains the 'programs' also used by a working kidney cell, these may normally never be invoked by the cell. Only the code that governs the metabolism of the liver cell can be allowed to be expressed without serious deleterious consequences. The nucleus must, therefore, embody specific regulatory mechanisms capable of activating and deactivating particular regions of the genome for RNA transcription and protein synthesis, depending on the instantaneous state of the cell.
The emergence of the cellular machinery conferring the ability to express, selectively, different regions of the genome (i.e. different code) is what enabled metazoans to arise and evolve.

Some knowledge of regulatory functions in prokaryotes has been gleaned (e.g. the lac operon), but the mechanisms by which the selection of genetic potential in the eukaryotic cell is accomplished are still largely unknown and represent one of the most challenging problems in modern biology. In 1971, in an editorial in 'Nature', it was declared that "the structure of the eukaryotic chromosome is the vital issue that must be resolved before research today in cell biology can produce a coherent set of concepts instead of a mass of unrelated data." Almost 20 years later, as far as I can tell, the problem is still largely unresolved.

It is my contention that prions and viruses represent part of the regulatory machinery that 'escaped' and mutated. If this conjecture were to be established as fact, it would offer the possibility that prion- and virus-like artifacts might be used as 'probes' to elucidate the cellular machinery and give us greater insight into their depredations within the cell.

A CYBERNETIC MODEL OF CELLULAR PROCESSES

It is generally accepted that information is encoded in the sequences of nucleotides that constitute the DNA in a eukaryotic cell. The details of the triplet code are now well known, and the process of transcription, during which the encoded information is precisely copied into complementary strands of RNA that direct the synthesis of specific proteins, is well understood.

The codes for proteins constitute only a part of the genome. One of the most awkward facts to account for when analyzing the heredity of higher organisms is their great excess of DNA; the amount varies with the species, of course, but there always seems to be far more in the genetic material than can be accounted for by the sum of the codons needed for protein production.
Some stretches of the 'redundant' code are thought to be regulators which govern the production of protein (analogous to operons in prokaryotes). In addition there is a large amount of repetitive code which seems to serve no apparent purpose. Some biologists refer to this component as "junk code."

In attacking the problem of regulation, the first question is one of strategy: how should one attempt to resolve the issue? Since information is the currency of genetic transactions, it seems natural to try to consider the problem from an information-processing point of view.

Nature (if I may be permitted an anthropomorphic metaphor) is a tinkerer, not a designer. The structures that we uncover are "Rube Goldberg" contraptions -- superbly engineered and optimised through the agency of natural selection, but kludges nevertheless. Experience with computer systems, which are orders of magnitude simpler, shows that it is supremely difficult to fathom the logic behind such 'ad hoc' constructions "from the bottom up". Is there perhaps a different approach with greater promise?

Starting with some rather basic assumptions, a case will be made for selecting a particular information-processing structure. A model of this structure will be described and some consequences will be drawn from the given model. Finally, it will be shown that the cell supports processes similar to those required by the model.

FUNDAMENTAL ASSUMPTIONS

(1) The physical information structure which resides in the genomic DNA is LINEAR (or, at most, closed in the form of a ring); IT IS NOT BRANCHED OR STRUCTURED IN ANY OTHER WAY.

(2) Processing of the information in the DNA does not start at some (global) 'beginning' and proceed sequentially (endlessly) from there on. That is to say, even though a particular stretch of code is expressible (locally) sequentially, for a specific protein, the transcription site for the next product need not necessarily be adjacent to it.

(3) Nor is it random.
(4) The program embodied in the genome must be responsive to patterns of input 'signals' from three levels:

    o intranuclear
    o intracellular
    o intercellular

These assumptions imply that the logic of the process is BRANCHED even though the code for the process is LINEAR.

HYPOTHESES

THE CELLULAR PROCESSES HAVE TO BE UNDER THE CONTROL OF THE LOGICAL EQUIVALENT OF AN INHERENT EXECUTIVE 'PROCESS CONTROL SYSTEM' WHICH MONITORS INTERNAL AND EXTERNAL SIGNALS, RECOGNISING AND RESPONDING TO SPECIFIC PATTERNS OF STIMULI ACCORDING TO THE CURRENT STATE OR CONTEXT OF THE SYSTEM.

FURTHERMORE, THIS EXECUTIVE SYSTEM IS A 'REAL-TIME SYSTEM', IN THAT IT MUST BE CAPABLE OF IMMEDIATELY SUSPENDING ONGOING ACTIVITY AND SWITCHING TO A NEW MODE OF OPERATION -- UNDER CONTROL OF SOME INTERNALLY-EMBODIED PROTOCOL -- WHEN A NEW STIMULUS IS DEEMED TO HAVE A HIGHER PRIORITY THAN THE PROCESS CURRENTLY BEING EXECUTED.

An example of a real-time reaction might be the response of a cell to adrenalin.

If we accept Herbert Simon's argument that biological systems have to be hierarchical because there has not been enough time for any other kind of system to evolve, then 'execution' of the 'program' embodied in the chromatin could be represented by a branching structure or hierarchical 'tree'.

Such an executive system would be capable of mapping a linearly-ordered physical information structure into a logically-ordered branched 'time-series' of processes. To do this, it must be able to 'address' specific segments of the program (code) as needed. For this hypothesis to be viable, it is necessary to show that there exist mechanisms capable of accomplishing this feat.

Because of the addressing structure (the memory organization) and the sequential nature of programs in a digital computer, the executive can invoke a specific process at will by 'pointing' to it -- transferring control to it by reading the appropriate address into the program counter.
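
As a minimal sketch of the conventional-computer mechanism just described (branched logic running over a strictly linear store), consider the toy below. The instruction names, the signal names and the address table are hypothetical illustrations, not anything taken from a real machine or from the cell.

```python
# Illustrative sketch only: segment contents, signals and addresses are
# hypothetical, chosen to show "pointing" at code in a linear memory.

# A linear "memory": one flat sequence of instructions, with no branching
# in the physical layout.
MEMORY = [
    "grow-1", "grow-2", "RETURN",        # addresses 0-2
    "divide-1", "divide-2", "RETURN",    # addresses 3-5
    "respond-1", "RETURN",               # addresses 6-7
]

# The addressing structure: signal -> start address of the segment to run.
ADDRESS_TABLE = {"nutrient": 0, "mitogen": 3, "adrenalin": 6}


def run_segment(start):
    """Load `start` into the program counter and execute until RETURN."""
    executed = []
    pc = start  # the "program counter"
    while MEMORY[pc] != "RETURN":
        executed.append(MEMORY[pc])
        pc += 1  # within a segment, execution is strictly sequential
    return executed


def executive(signals):
    """Dispatch each incoming signal to its segment, in arrival order."""
    trace = []
    for signal in signals:
        trace.extend(run_segment(ADDRESS_TABLE[signal]))
    return trace


if __name__ == "__main__":
    print(executive(["adrenalin", "nutrient", "mitogen"]))
```

The order in which segments run is set by the incoming signals, not by their position in MEMORY, which is the sense in which the logic is branched while the code stays linear. This sketch does not model priorities or pre-emption.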
The genome does not appear to have any such addressing structure; it is, so to speak, 'diffuse'. One way of achieving the desired effect in a diffuse structure would be to seal off all code except that for the process called for at the moment. This seems to be the method that actually evolved. The coiling and supercoiling of the paired DNA strands are the means whereby only certain sites are allowed to be active at any time. Code that is not meant to be expressed at that time is 'hidden' within the coils -- only code meant to be expressible in the given context is exposed by the local uncoiling of the DNA strands. Such an arrangement would call for 'filler', to keep unexpressible code far enough away to be inaccessible, and this filler may be an important component of the so-called 'junk' code.

It is immediately obvious that a liver cell, for example, could become specialized by sealing off forever (by phosphorylation?) all code not specific to the metabolism and replication of liver cells. The executive system could orchestrate cell activity by opening and closing sections of code selectively. An agent that interfered with the seals and allowed 'outlaw' code to be expressed during replication might cause tumors or cancerous cells to develop.

Because of the real-time nature of the system, the executive process itself always has to be 'resident' (i.e. available) to avoid the condition that programmers refer to as the 'deadly embrace'. To illustrate this condition, consider the analogous problem of executing a very large program in a computer with a disk drive but very limited random access memory. Each successive program segment has to be read in as needed. The disk driver -- that process which actually causes the data to be read from the disk into memory -- always has to be resident in memory. If it were to be inadvertently swapped out to disk, a 'deadly embrace' would result because now there would be no way to read in the next segment.
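
The disk-driver analogy can be sketched as follows. This is a toy under stated assumptions: the two-slot memory, the segment names and the "oldest first" eviction rule are all hypothetical, chosen only to show why the loader must stay resident.

```python
# Illustrative sketch only: a toy overlay loader. The slot count, segment
# names and eviction rule are hypothetical.


class TinyMachine:
    """A machine whose memory holds only `slots` segments at a time."""

    def __init__(self, slots=2, pin_loader=True):
        if slots < 2:
            raise ValueError("need room for the loader plus one segment")
        self.slots = slots
        self.pin_loader = pin_loader
        self.memory = ["loader"]  # the loader starts out resident

    def load(self, name):
        """Bring a segment into memory, evicting the oldest one if full."""
        if "loader" not in self.memory:
            raise RuntimeError("loader is not resident; cannot read " + name)
        if name in self.memory:
            return
        if len(self.memory) >= self.slots:
            # Evict the oldest segment, skipping the loader if it is pinned.
            candidates = [s for s in self.memory
                          if not (self.pin_loader and s == "loader")]
            self.memory.remove(candidates[0])
        self.memory.append(name)


def run(machine, program):
    """Load each segment in turn; return the segments that were reached."""
    done = []
    try:
        for name in program:
            machine.load(name)
            done.append(name)
    except RuntimeError:
        pass  # the machine is stuck: nothing can load the next segment
    return done


if __name__ == "__main__":
    program = ["seg_a", "seg_b", "seg_a"]
    print(run(TinyMachine(pin_loader=True), program))
    print(run(TinyMachine(pin_loader=False), program))
```

With the loader pinned the whole program runs; with it unpinned, the loader is the first thing evicted and the run stops at the next segment it needs.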
By the same token, the 'code' for the processes that, when expressed, cause the currently-open segment of the genome to wind up and open the next appropriate segment always has to be available when needed. Operationally, this means that every lowest common denominator of open code has to contain its own copy of the executive process, so there will have to be a multiplicity of copies of the executive system distributed throughout the system.

A 'bug' in the program (a mutation, that is) in the sequence coding for a protein may or may not be lethal. If it is not lethal, it might be neutral or even beneficial -- or it might have delayed effects, causing complications later (e.g. sickle-cell anemia). Executive systems (judging from experience with computer systems) are far less tolerant of aberrations. An error in the executive process is most likely to be lethal, so one could expect the code to be highly conservative. This means that the multiple copies of the executive system are likely to be very similar, providing another source of repetitive non-protein-forming code.

The executive process itself, furthermore, may not be monolithic, but may itself need to be distributed. This probably would entail a good deal of filler, adding to the non-functional 'burden'.

Five (of eight) histones have been isolated and sequenced from a wide spectrum of eukaryotic species, suggesting that they 'froze over' very early in the history of eukaryotic organisms. One might expect that the executive 'machinery' -- playing, as it does, such a fundamental role in cellular function -- would have had to come into being equally early on. Indeed, the histones, needed for coiling the DNA, are very much part and parcel of the same regulatory processes. It follows that the regulatory machinery would be equally widespread and at least as conservative.
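
If executive code is both repeated within a genome and conserved across species, as argued above, one crude way to look for candidates is to list the short subsequences that recur in every genome examined. The sketch below assumes exact matches of fixed length k and uses made-up toy sequences; real repeats are inexact, so this is an illustration of the idea rather than a usable method.

```python
# Illustrative sketch only: exact k-mer matching on toy strings. The
# function names and the sample sequences are hypothetical.
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def conserved_repeats(genomes, k, min_copies=2):
    """Return k-mers that occur at least min_copies times in every genome."""
    shared = None
    for seq in genomes.values():
        repeated = {kmer for kmer, n in kmer_counts(seq, k).items()
                    if n >= min_copies}
        shared = repeated if shared is None else shared & repeated
    return shared or set()


if __name__ == "__main__":
    toy = {
        "species_a": "ACGTGGACGTCC",
        "species_b": "TTACGTAACGTA",
    }
    print(conserved_repeats(toy, k=4))
```

In the toy data, "CGTA" is repeated in species_b only, so it is filtered out; only a k-mer repeated in both sequences survives.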
A search for such invariant stretches of repetitive DNA would pinpoint the sections of the genome that represent the executive system and isolate them for further study.

Segregation of function in membrane-limited nuclei, mitochondria and plastids is another hallmark of eukaryotic organisms. The separation of genes for complex organellar elements may be a general principle of organelle and eukaryotic biology. In the interests of efficiency, probably, these organelles (e.g. mitochondria) have had some of their functions (and the associated code) taken over by processes in the nucleus. Mitochondria, for example, no longer make their own membranes. Why, then, have they retained any code at all?

A plausible answer is given by this model. An organelle may have to carry out some functions that cannot be subservient to the current process in the nucleus. That is to say, it has to carry out its process (respiration, for example) irrespective of what is happening in the nucleus at the time. By executing its own code, independently, it becomes an 'asynchronous' (satellite) processor, performing its appointed function irrespective of the instantaneous state of contemporary nuclear processes.

It seems likely that the logical tree in the DNA has three major branches, each one controlling a specific function:

    o development
    o replication
    o metabolism

The Principle of Parsimony suggests that the developmental and the replicative processes might share some common code.

The executive program is more than just a switching network -- it is a dynamic process. It contains information it uses (and modifies) to determine its pathways depending on the instantaneous context.

CONCLUSIONS

Certain predictions can be made from this model:

o The chromatin will contain a large amount of repetitive code, some of which (filler code) may seem nonsensical.
o Some of this repetitive code is functionally equivalent to an executive process control system
    - which is highly conservative and, therefore, may seem 'primitive'
    - which will be found (modulo minor variants) across a very wide spectrum of (if not all) eukaryotic organisms

o Some of the non-histone chromosomal proteins are not for export, but are generated solely for control purposes within the nucleus.
    - A DNA sequence which generated such a protein, if it escaped, might be the precursor of a virus.

o Some intranuclear RNA may play a similar role. The role of reverse transcriptase is to make this possible.
    - An RNA sequence which generated such a protein, if it escaped, might be the precursor of a retrovirus.

This model allows for evolutionary change in anatomy and way of life to be based on changes in the information controlling the expression of genes as well as point mutations in protein-producing genes. So it is possible for species such as humans and chimpanzees to differ so substantially in anatomic detail and way of life and yet have proteins that are 99% similar.

Max ben-Aaron, in care of <DJEX@LL.MIT.EDU>

------------------------------

The "Prion Digest" is a Usenet distributed e-mail list, compiled from postings to it, and distributed weekly (current plan is for early Sat. A.M.). While the main goal of the digest is to provide a resource for researchers working with prions and interested bystanders, all are welcome.

All articles posted will be included in the next digest. If a poster feels that his posting is of an urgent nature, it may be distributed sooner than the regular digest. If you want to post an "urgent" message, send it to the prion-request address, not the prion one.

All requests regarding administrivia (subscriptions, cancellations, comments, etc.) should be mailed to the moderator <prion-request@acc.stolaf.edu>. All postings to the digest should be directed to <prion@acc.stolaf.edu>.
There are archives of all back issues available via anonymous ftp from beowulf@acc.stolaf.edu (130.71.192.20) in the pub/prion directory. If you do not have ftp access, please write <prion-archive@acc.stolaf.edu> and back issues will be mailed to you.

-- Chris Swanson (Prion Digest Moderator) <swansonc@acc.stolaf.edu>

----------------------------------------------------------------------

End of Prion Digest
------------------------------