[bionet.molbio.bio-matrix] Scud for a Genome Project Patriot?

HUENSD@vax1.computer-centre.birmingham.ac.uk (02/14/91)

I am a bit of a newcomer to the apparently violent world of bio-matrix. Is
it always like this?

Anyway, I thought I'd risk abuse and pitch in my two-bits worth. The virus I
make my living off is the Epstein-Barr virus - completely and heroically
sequenced by Farrell and Barrell. 172 kb with only one error known to date.

What did this amazing feat gain for us? A fair bit: about 100 open reading
frames, many of unknown significance. Enough info to make some informed
guesses as to the organisation of herpesvirus genomes and, in comparison with
other herpesviruses, a molecular basis for their taxonomy.

What did it not tell us? The interest in the virus is primarily as a possible
human transforming virus. The data obtained did not "find" the genes expressed
in transforming infection for us. Of the eight proteins expressed during
transforming infection, 2 were identifiable as ORFs in the sequence, 4 were
disrupted by introns in such a manner that while part of the deduced protein
sequence was present, the rest was in a nearby ORF. 2 were spliced from so
many introns as to have been unidentifiable from the sequence and were found
by cDNA cloning approaches anyway. Promoters and enhancers? 6 of these genes
share two common promoters and by alternative splicing over 80 kb assemble
the required mRNAs. And in between are all the other ORFs. Predict the mRNAs?
Not likely. I do not believe it would have been possible to get any
of this info by poring over the sequence - try it, all you need is in files
HS4 and HS4-2.

How about the lytic cycle switch gene BZLF1? The location in which this gene
was believed to lie was known. The sequence of the region was known. It belongs
to the zipper protein family, but ultimately it had to be found by cDNA
cloning. Why? There was an ORF in that region, but the basic region
characteristic of the zipper proteins was added to the ORF by a splice. The
dimerisation domain has no leucines at the required intervals but functions
identically.

I freely admit I am thankful that EBV was completely sequenced. It helps me a
lot in designing primers, excising ORFs for expression etc., but if one thinks
that one can get the info one really desperately wants on the biology of the
virus by poring over this sequence alone without doing lots of bench science,
one is very mistaken. And all this in a virus with constraints on genome size
and therefore every incentive to maximise coding content.  The human genome
is probably a much worse nightmare given the profligacy with which we higher
(and therefore less green?) animals use DNA.

If I had to resort to sequencing to make a living, I'd much rather do a cDNA
library from a tissue of some interest. Far less trash. If any gene is used
for tissue specific regulatory purposes in this tissue, a homologue will most
likely be used in another tissue for similar purposes anyway so you will
probably get most classes of proteins of interest even on this rather limited
pass.

There has also been suggestion that we will get incredible new insights into
genome organisation and higher order regulation from the sequence. Will we?
Take the globin locus. A fair chunk has been sequenced but the domains
responsible for regulation of the locus were obtained the hard way. The trouble
is, even if it were there, would we recognise it? We do not have the knowhow
to identify a DCR or LCR (or whatever they call it these days) by sequence,
and no amount of looking at a sequence will tell us this. In contrast, a few
person-years of good bench work may. So why the hurry in getting large chunks
of virtually impenetrable info?

I fully agree with your correspondent who says junk DNA may not be junk;
repetitive DNA may have an important role. But in what way will the sequence
of the human genome tell you any more than you already suspect? Milosav
points out in support of the Genome project that the 1000 Alu sequences
in Genbank were in some way instrumental in defining Alus as retrotransposons.
But it is its own counter-argument. First, I suspect Milosav would have
strongly suspected the thing was a retroposon after the first twenty or thirty
sequences. Second, it didn't take the sequencing of the human genome to achieve
this insight, merely the sequencing of (at most, let's be generous) 1000
Alu sequences. And proof that the rest originated from several active sequences
cannot come from knowing the whole genome sequence. It could, on the other
hand, be conclusively generated by tagging one of the blighters and watching
it move (which I think has recently been done... at a bench).
The whole trouble with sequencing the genome is that the bloody
thing doesn't come annotated. The number of hours required at the bench before
we even have a sizable fraction of the info required to read the sequence in
a meaningful way is so large that it may be best to concentrate on that and
let the total sequence come on slowly on the back burner.

Let me just momentarily concede the possibility that it is a worthwhile goal
for reasons of large-scale structure etc. etc. If we really must do this
project, why not run a pilot first to see if it is worthwhile. Sequence
say 10 Mbp of some little characterised but potentially interesting locus.
That should be large enough for several domains' worth. Look over the sequence
and then tell me some of these weird and wonderful things that will impress
me and astound your neighbours. Before going out and spending a billion smackers,
let's just go out there and show what it can do. Will we discover new
aspects of genome organisation? Regulatory regions? I doubt it. It will
probably be more like the large repository of Proto-Elamite tablets discovered
in the last century: a wealth of completely unreadable info.

This is not to say the whole thing is trash. A high resolution set of markers
will be extremely valuable and much cheaper to do than getting the sequence.
A comprehensive contig library will also be priceless. But the sequence?

I think there is also an ethical question involved. Someone's got to pay for
all research. It applies whether it is tax dollars or the money raised by
little old ladies working in charity shops. In my view, we had better think
carefully before getting loads more dosh to pursue what is likely to be a
large scale stamp-collecting exercise. Some day we may have to answer for
what we did with it and if the answer is not satisfactory, expect severe
cutbacks. So far, I reckon the public has been pretty patient. (S)He's paid for
a lot of work over a fairly long period in mol. biol. with relatively little
to show for it. After all, I doubt they're paying to let us do WHAT WE WANT
TO DO. I fear that if we are going to continue to do what we want to do, we
might do well to occasionally bear in mind that there's got to be some payout
and perhaps even invest a little effort into getting what we do into practice
for the poor sods who are paying for all this. The War on Cancer hype and
all that got you increased funding but did it get the taxpayer radically
improved treatment? The HIV industry got in another whack of cash, of which, as you
and I know full well, a fair proportion goes into activities that have very
little association with AIDS but lots of interest to scientists. Now the
human genome. If we don't watch it, someday we might just have to pay for
our cynicism in all this.

Anyway if anyone is interested in a little challenge, I have one based
around EBV. As I have said previously, the genome is completely sequenced.
The problem concerns whether a message that is known to run through a particular
part of the genome encodes anything. Should prove an interesting exercise
in whether even the availability of the complete sequence of a genome gets you
anywhere on a real life mol. biol. problem. For more details, mail me.

=============================================================================
David Huen                    I
Dept. of Cancer Studies       I Computer output is much like toilet paper
University of Birmingham      I It's continuous, perforated and ends up
B15 2TJ                       I full of ****.
United Kingdom                I
JANET: HUENSD@UK.AC.BHAM.VAX1 I
Voice: (021) 414-4483         I
Fax  : (021) 414-4486         I
=============================================================================

kristoff@genbank.bio.net (David Kristofferson) (02/15/91)

> I am a bit of a newcomer to the apparently violent world of bio-matrix. Is
> it always like this?

NO!  This newsgroup typically blows up for about a week or two about
some quasi-scientific issue and then goes back to sleep for a couple
of months.  The sudden bursts of mail usually cause a few people to
sign off each time, so no one need flog themselves for misdeeds.  I
would like to echo Peter Karp's suggestion that people discuss their
research (or possibly controversies about published research if you
don't want to spill any beans).

				Sincerely,

				Dave Kristofferson
				GenBank Manager

				kristoff@genbank.bio.net