HUENSD@vax1.computer-centre.birmingham.ac.uk (02/14/91)
I am a bit of a newcomer to the apparently violent world of bio-matrix. Is it always like this? Anyway, I thought I'd risk abuse and pitch in my two bits' worth.
The virus I make my living off is the Epstein-Barr virus - completely and heroically sequenced by Farrell and Barrell. 172 kb with only one error known to date. What did this amazing feat gain for us? A fair bit - about 100 open reading frames, many of unknown significance. Enough info to make some informed guesses as to the organisation of herpesvirus genomes and, in comparison with other herpesviruses, a molecular basis for their taxonomy.
What did it not tell us? The interest in the virus is primarily as a possible human transforming virus. The data obtained did not "find" the genes expressed in transforming infection for us. Of the eight proteins expressed during transforming infection, 2 were identifiable as ORFs in the sequence; 4 were disrupted by introns in such a manner that, while part of the deduced protein sequence was present, the rest was in a nearby ORF; and 2 were spliced across so many introns as to be unidentifiable from the sequence and were found by cDNA cloning approaches anyway. Promoters and enhancers? 6 of these genes share two common promoters and, by alternative splicing over 80 kb, assemble the required mRNAs. And in between are all the other ORFs. Predict the mRNAs? Not likely. I do not believe it would have been possible to get any of this info by poring over the sequence - try it, all you need is in files HS4 and HS4-2.
How about the lytic cycle switch gene BZLF1? The location in which this gene was believed to lie was known. The sequence of the region was known. It belongs to the zipper protein family, but ultimately it had to be found by cDNA cloning. Why? There was an ORF in that region, but the basic region characteristic of the zipper proteins was added to the ORF by a splice. The dimerisation domain has no leucines at the required intervals but functions identically.
I freely admit I am thankful that EBV was completely sequenced. It helps me a lot in designing primers, excising ORFs for expression etc., but if one thinks that one can get the info one really desperately wants on the biology of the virus by poring over this sequence alone, without doing lots of bench science, one is very mistaken. And all this in a virus with constraints on genome size and therefore every incentive to maximise coding content. The human genome is probably a much worse nightmare given the profligacy with which we higher (and therefore less green?) animals use DNA.
If I had to resort to sequencing to make a living, I'd much rather do a cDNA library from a tissue of some interest. Far less trash. If any gene is used for tissue-specific regulatory purposes in this tissue, a homologue will most likely be used in another tissue for similar purposes anyway, so you will probably get most classes of proteins of interest even on this rather limited pass.
There has also been a suggestion that we will get incredible new insights into genome organisation and higher-order regulation from the sequence. Will we? Take the globin locus. A fair chunk has been sequenced, but the domains responsible for regulation of the locus were obtained the hard way. The trouble is, even if it were there, would we recognise it? We do not have the know-how to identify a DCR or LCR (or whatever they call it these days) by sequence, and no amount of looking at a sequence will tell us this. In contrast, a few person-years of good bench work may. So why the hurry in getting large chunks of virtually impenetrable info?
I fully agree with the correspondent who says junk DNA may not be junk and repetitive DNA may have an important role. But in what way will the sequence of the human genome tell you any more than you already suspect? Milosav points out in support of the Genome project that the 1000 Alu sequences in GenBank were in some way instrumental in defining Alus as retrotransposons.
But it is its own counter-argument. First, I suspect Milosav would have strongly suspected the thing was a retroposon after the first twenty or thirty sequences. Second, it didn't take the sequencing of the human genome to achieve this insight, merely the sequencing of (at most, let's be generous) 1000 Alu sequences. And proof that the rest originated from several active sequences cannot come from knowing the whole genome sequence. It could, on the other hand, be conclusively obtained by tagging one of the blighters and watching it move (which I think has recently been done... at a bench).
The whole trouble with sequencing the genome is that the bloody thing doesn't come annotated. The number of hours required at the bench before we have even a sizable fraction of the info required to read the sequence in a meaningful way is so large that it may be best to concentrate on that and let the total sequence come on slowly on the back burner.
Let me just momentarily concede the possibility that it is a worthwhile goal for reasons of large-scale structure etc. etc. If we really must do this project, why not run a pilot first to see if it is worthwhile? Sequence, say, 10 Mbp of some little-characterised but potentially interesting locus. That should be large enough for several domains' worth. Look over the sequence and then tell me some of these weird and wonderful things that will impress me and astound your neighbours. Before going out and spending a billion smackers, let's just go out there and show what it can do. Will we discover new aspects of genome organisation? Regulatory regions? I doubt it. It will probably be more like the large repository of tablets in Proto-Elamite discovered in the last century. A wealth of completely unreadable info.
This is not to say the whole thing is trash. A high-resolution set of markers will be extremely valuable and much cheaper to do than getting the sequence. A comprehensive contig library will also be priceless.
But the sequence?
I think there is also an ethical question involved. Someone's got to pay for all research. It applies whether it is tax dollars or the money raised by little old ladies working in charity shops. In my view, we had better think carefully before getting loads more dosh to pursue what is likely to be a large-scale stamp-collecting exercise. Some day we may have to answer for what we did with it, and if the answer is not satisfactory, expect severe cutbacks. So far, I reckon the public has been pretty patient. (S)He's paid for a lot of work over a fairly long period in mol. biol. with rather little to show for it. After all, I doubt if they're paying to let us do WHAT WE WANT TO DO. I fear that if we are going to continue to do what we want to do, we might do well to occasionally bear in mind that there's got to be some payout, and perhaps even invest a little effort into getting what we do into practice for the poor sods who are paying for all this. The War on Cancer hype and all that got you increased funding, but did it get the taxpayer radically improved treatment? The HIV industry got in another whack of cash, of which, as you and I know full well, a fair proportion goes into activities that have very little association with AIDS but lots of interest to scientists. Now the human genome. If we don't watch it, someday we might just have to pay for our cynicism in all this.
Anyway, if anyone is interested in a little challenge, I have one based around EBV. As I have said previously, the genome is completely sequenced. The problem concerns whether a message that is known to run through a particular part of the genome encodes anything. It should prove an interesting exercise in whether even the availability of the complete sequence of a genome gets you anywhere on a real-life mol. biol. problem. For more details, mail me.
=============================================================================
David Huen                    I
Dept. of Cancer Studies       I  Computer output is much like toilet paper
University of Birmingham      I  It's continuous, perforated and ends up
B15 2TJ                       I  full of ****.
United Kingdom                I
JANET: HUENSD@UK.AC.BHAM.VAX1 I
Voice: (021) 414-4483         I
Fax  : (021) 414-4486         I
=============================================================================
kristoff@genbank.bio.net (David Kristofferson) (02/15/91)
> I am a bit of a newcomer to the apparently violent world of bio-matrix. Is
> it always like this?
NO! This newsgroup typically blows up for about a week or two about some quasi-scientific issue and then goes back to sleep for a couple of months. The sudden bursts of mail usually cause a few people to sign off each time, so no one need flog themselves for misdeeds.
I would like to echo Peter Karp's suggestion that people discuss their research (or possibly controversies about published research if you don't want to spill any beans).
Sincerely,
Dave Kristofferson
GenBank Manager
kristoff@genbank.bio.net