gunnell@FCRFV1.NCIFCRF.GOV ("Gunnell, Mark") (02/12/91)
In article <9102111731.AA00773@genbank.bio.net> Ellington@frodo.mgh.harvard.edu (Deaddog) writes:
> In article <9102111606.AA25622@genbank.bio.net>
> gribskov@FCRFV1.NCIFCRF.GOV ("Gribskov, Michael") writes:
> > I suppose that the cataloging of galaxies is a similar boondoggle,
> > in spite of the fact that this effort is currently leading to some of
> > the most important and interesting progress in astrophysics. I guess
> > the real problem with these kinds of projects is that the day-to-day
> > work is tedious, and results only come in the long term. Strange how
> > much of science falls in that category isn't it?
>
> Ah, Michael, you really should ask for my opinion rather than just
> making one up for me.
>
> Catalogue them galaxies! Discover the secrets of cosmology; see how stars
> form; determine the mass of the Universe and how it is distributed; find
> amazing physical phenomena never before observed by human eyes. Yes,
> all these and more can be yours if you just continue to fund astrophysics.
> A noble and worthy cause.
>
> Make me a list of similar worth that has to do with the Genome Boondoggle.

Catalogue all human genes! Discover the functions of mapped genes; see how genes evolve; evaluate molecular evolution theories and how species originate; find amazing biological phenomena never before observed by human eyes. Yes, all these and more can ... etc.,etc. 8-)

Mark A. Gunnell
gunnell@ncifcrf.gov
elmo@troi.cc.rochester.edu (Eric Cabot) (02/12/91)
In article <9102111942.AA08834@genbank.bio.net> gunnell@FCRFV1.NCIFCRF.GOV ("Gunnell, Mark") writes:
>In article <9102111731.AA00773@genbank.bio.net>
>Ellington@frodo.mgh.harvard.edu (Deaddog) writes:
>
>>
>> Make me a list of similar worth that has to do with the Genome Boondoggle.
>
>Catalogue all human genes! Discover the functions of mapped genes; see how
>genes evolve; evaluate molecular evolution theories and how species originate;
>find amazing biological phenomena never before observed by human eyes. Yes,
>all these and more can ... etc.,etc. 8-)

You *must* be either kidding us or yourself! But seriously, item 1 is hardly possible, item 2 is probably not possible, and the remaining items are not even close to possible from a mere sequence determination of the (a?) human genome.

--
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=
Eric Cabot | elmo@uhura.cc.rochester.edu
"insert your face here" | elmo@uordbv.bitnet
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=
Ellington@Frodo.MGH.Harvard.EDU (Deaddog) (02/12/91)
In article <9102111942.AA08834@genbank.bio.net> gunnell@FCRFV1.NCIFCRF.GOV ("Gunnell, Mark") writes: > Catalogue all human genes! Discover the functions of mapped genes; see how > genes evolve; evaluate molecular evolution theories and how species originate; > find amazing biological phenomena never before observed by human eyes. Ah, finally some meat. 1) Discover the function of mapped genes. The Genome Initiative is not necessary for this. If the phenotype is important, then a directed effort can be made to clone and sequence a given gene. The sequence of the human genome can of course be used to find the sequence of genes mapped *after* the genome sequence has been determined. However, I submit that mapping/ sequencing which targets a specific human diseases will proceed much faster and with less waste than the Genome Initiative itself. Further, I submit that theories having to do with developmental biology, organization of transcription units, regulatory phenomena, and so forth are most readily answered by testing specific hypotheses/cloning specific genes. The Genome Initiative is massive overkill for these answers (see also (3), below.) 2) See how genes evolve; evaluate molecular evolution theories and how species originate. As a card-carrying molecular evolutionist (burn him!), I agree that these are indeed mouth-watering problems. It is sad that they are receiving so little support now. But, given the Scourge of AIDS, this is perhaps understandable. What is not understandable is why the Herculean efforts required to sequence the Human Genome will yield more information than comparative sequence analysis of limited regions or of specific genes. Imagine a grant proposal in which I proposed to sequence lactate dehydrogenase genes from a huge diversity of organisms; this proposal would be at the very bottom of the funding heap. Yet it would say an immense amount about how genes evolve. 
Imagine a grant proposal in which I proposed to clone/sequence "all zinc finger proteins from three developmentally different organisms." This might be more fundable, would again yield an immense amount of information about protein evolution and perhaps transcription/gene regulation, but would not require the Human Genome Initiative.

3) Find amazing biological phenomena never before observed by human eyes. I think not. The genotype is relatively uninterpretable without corresponding phenotypes (which is why I support Drosophila, Coli, etc. sequencing but cannot get behind the human effort). The only biological phenomena that I can see being elucidated by the sequence of the human genome would be instances where phenotype = genotype; that is, selfish DNAs such as transposable elements. A very interesting phenomenon, but one that would again probably be nuked by a review section if the experiment proposed was "I want to sequence the human genome so that I can know the distribution of transposable elements in an individual human."

So: back to (one of) my original point(s): In an era of limited funding, directed research is essential. Let each of the putative benefits of the Genome Initiative be put up against other research proposals. Let the molecular evolutionists who want to study bacterial speciation fight for the same money as those that want the sequence of every repetitive element on each human chromosome. Let the developmental Drosophilists who are feeling the crunch compete against those who would assert that we can decode the series of steps by which an organ takes shape from the sequence of the human genome. Let biological phenomena from neural networks to nematodes compete against whatever 'new' biological phenomena the Genome Initiative will purport to discover.

There is so much *INTERESTING* science to be done. And so little of it will come from this genetic telephone book. The question comes back to how to get the most bang out of your buck.
For each question you say the Genome Initiative will answer, I believe I can give you a dozen that are cheaper and more interesting. Non-woof (Getting less vitriolic, but not less arrogant or obnoxious. Who knows, maybe I'll even start using smileys or something.) (Hmmm. I actually had a thought. How about we put a little check-off box on NIH grant proposals--you know, just like on your tax returns. "I would like to earmark $5 for the Human Genome Initiative." That way, everyone that thinks it's a great idea can devote some of their funding to it. Then they could get discounts on the sequences of genes of interest when they start flowing out of the sequencing sweatshops. Just an idea.)
toms@fcs260c2.ncifcrf.gov (Tom Schneider) (02/13/91)
In article <12145@ur-cc.UUCP> elmo@troi.cc.rochester.edu (Eric Cabot) writes: >In article <9102111942.AA08834@genbank.bio.net> gunnell@FCRFV1.NCIFCRF.GOV ("Gunnell, Mark") writes: >>In article <9102111731.AA00773@genbank.bio.net> >>Ellington@frodo.mgh.harvard.edu (Deaddog) writes: >> >>> >>> Make me a list of similar worth that has to do with the Genome Boondoggle. >> >>Catalogue all human genes! Discover the functions of mapped genes; see how >>genes evolve; evaluate molecular evolution theories and how species originate; >>find amazing biological phenomena never before observed by human eyes. Yes, >>all these and more can ... etc.,etc. 8-) >You *must* be either kidding us or yourself! >But seriously, item 1 is hardly possible, item >2 is probably not possible, and the remaining items are not even >close to possible from a mere sequence determination of the (a?) >human genome. I think that Mark is exactly correct, and you have missed the point. Having a huge database full of human sequences opens vistas for those of us who know how to use statistical tools to analyse sequences. There are many things that can be done. Some of them include learning how to identify genes from raw sequences alone. Predictions can be tested - which leads to rapid discovery of new genes. I have been involved in two cases of this already (see Stormo et al NAR 10:2997 1982 for the first example of gene identification by computer; the second one is in preparation), and it will certainly will happen more as people use neural nets more. A straight sequencing of the genome will avoid the terrible biases that we currently have in the GenBank database. For example, the database is missing the insides of introns. If you think that these are not important, then you may well be in for some super surprises later. The phrase "junk DNA" is a statement of ignorance, not scientific fact. People currently chop off the bases near the 3' sides of introns and don't report them in the database. 
The proof is that they often end 10, 20 or 30 bases from the splice junction. This would not happen if people reported all their data. Unfortunately, this means that people have thrown out important parts of splice junctions BECAUSE THEY THOUGHT THEY WERE UN-IMPORTANT. Do you follow? People think something is not important, so they don't report it in the database, or limit the reports, so nobody discovers that it IS important! Another example is the reporting of only the coding sequence of a procaryotic gene, even though we KNOW that there is a region upstream (the Shine/Dalgarno) which is important for translational initiation. Any statistical analysis of human sequences must be done carefully to avoid biases from the highly over-represented immunoglobulin and MHC sequences. I'm sure you can think of other examples. A complete sequence, without any bias is the best way to get around this. I think that that alone justifies the project. The second major justification is the enormous boost to sequencing technology that the project is making. We are eventually going to be able to sequence everybody's DNA in a few minutes. This will have enormous medical implications, since it will remove much guess work from medicine. I also used to think that the project was foolish, but these reasons have convinced me that it is worthwhile. There is also the spirit of adventure. Fred Blattner once pointed out that it would be really neat (my words, not his) to have the entire sequence of E. coli - simply because it would be the first time that we knew the entire specification of a living organism. (Viruses don't count since they are dependent on the host.) >Eric Cabot elmo@uhura.cc.rochester.edu elmo@uordbv.bitnet Tom Schneider National Cancer Institute Laboratory of Mathematical Biology Frederick, Maryland 21702-1201 toms@ncifcrf.gov
Ellington@Frodo.MGH.Harvard.EDU (Deaddog) (02/13/91)
In article <2050@fcs280s.ncifcrf.gov> toms@fcs260c2.ncifcrf.gov (Tom Schneider) defends the faith: > learning how to identify genes from raw > sequences alone. Predictions can be tested - which leads to rapid > discovery of new genes. As does PCR amplification or hybridization: the analogue versions of your digital statistical analyses. The question is not whether some genes will be identified, the question is (a) how many could already be identified without the sequence of the genome, and (b) whether the (IMO paltry) number that remain be worth the enormous cost? Statisticians drool at the mounds of data to be created. Researchers who go begging want to shoot the statisticians. Rather than pistols at ten paces, how about each side trying to justify expenditures for the same set of money? > avoid the terrible biases that we > currently have in the GenBank database. I'm sorry, but this does not seem like a terribly important problem. GenBank is skewed. Big deal. It gets the job done. We find genes, we miss some stuff. Science slops along and we still find those self-splicing introns and centromeres and other cool things. Without the sequence of the human genome. And with many people happily employed (for now) producing gobs of worthwhile data. I mean, what's a good example of what we have missed? We know the Shine/Dalgarno sequences. We have learned far more from mutation than we would by sequencing a bacterial genome (note: sequencing the Coli genome is indeed a cool thing to do). And will the "insides of introns" generate data for 2 PNAS papers and a TIBS review, or will it actually be worth the billions of dollars it will take to properly correct this horrific accounting error? > I think that that alone justifies the project. Please, go speak to any faculty of any public university. Wear body armor. > The second major justification is the enormous boost to sequencing > technology that the project is making. Good sequencing technology stands on its own. 
It does not need the Genome Boondoggle to help it along. > We are eventually going to be able to sequence > everybody's DNA in a few minutes. Matrix-teers: Is this nuts or what? I've never seen this before, but if it is even remotely true, I'll eat the small plastic rats that reside on the top of my terminal. > There is also the spirit of adventure. There is also the whiff of despair pouring out of research labs across the U.S. Alleviate that stench, then sequence your genome. Non-woof
elmo@troi.cc.rochester.edu (Eric Cabot) (02/14/91)
In article <2050@fcs280s.ncifcrf.gov> toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes:
>I think that Mark is exactly correct, and you have missed the point. Having a
>huge database full of human sequences opens vistas for those of us who know how
>to use statistical tools to analyse sequences. There are many things that can
>be done. Some of them include learning how to identify genes from raw

(much stuff deleted)

Ok, I agree that it is possible to use statistical methods to infer that a given sequence contains a "gene". If I read your perspective correctly (and ignoring the self-back patting) the main goal is to beef up the database so that we can find new genes, whether functional or not. I'm sorry, but I just don't see that as cost effective, given that we won't have the slightest inkling of what most of these genes are supposed to do.

>
>A straight sequencing of the genome will avoid the terrible biases that we
>currently have in the GenBank database. For example, the database is missing

Oh really? Wouldn't you say that concentrating on coli, fly, worm, yeast, human and maybe a plant species puts a bit of bias into the database?

>the insides of introns. If you think that these are not important, then you
>may well be in for some super surprises later. The phrase "junk DNA" is a
>statement of ignorance, not scientific fact. People currently chop off the
>bases near the 3' sides of introns and don't report them in the database. The
>proof is that they often end 10, 20 or 30 bases from the splice junction. This
>would not happen if people reported all their data. Unfortunately, this means
>that people have thrown out important parts of splice junctions BECAUSE THEY
>THOUGHT THEY WERE UN-IMPORTANT. Do you follow? People think something is not
>important, so they don't report it in the database, or limit the reports, so
>nobody discovers that it IS important! Another example is the reporting of

(Nothing deleted because I am in complete agreement.
Oh how I have ranted and raved about missing intron sequences.) But frankly, I don't follow if this is part of the defense of the genome project. Sure it'd be great to have chromosome-long tracts of sequences to infer genome organization but will we really be able to make sense out of it all using the sequence data alone? Take the case of upstream control regions: their significance was worked out for the most part by experimental techniques. Those results are the stuff that is used to generate rules for sequence analysis. Not the other way around.

>
>The second major justification is the enormous boost to sequencing technology
>that the project is making. We are eventually going to be able to sequence
>everybody's DNA in a few minutes. This will have enormous medical implications

Ok, that's a valid argument. There's nothing like technological advancement.

--
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=
Eric Cabot | elmo@uhura.cc.rochester.edu
"insert your face here" | elmo@uordbv.bitnet
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=
BROE@AARDVARK.UCS.UOKNOR.EDU (Bruce Roe) (02/14/91)
Andrew, I'm sending you this directly rather than posting to the net. If you think it will add to the public discussion I'll post it, or modify it before posting...............

I've been very quiet during the recent rash of discussion regarding funding, the Human Genome Project (HGP), etc. and enjoyed reading all the FLAMES, SLAMS, and APOLOGIES as well as the more serious discussions.

As I see it, the HGP is a very innovative way to increase funding opportunities for serious science and a lot of serious, hard core science is being supported by the HGP. Having been on several Human Genome study sections I can assure the community that these are of the same rigor as other study sections I have served on (including Biochemistry and Physiological Chem. study sections). Only the BEST science is being supported after a thorough peer review.

Andrew D. Ellington writes:
> Let these projects compete in the normal grant pool:
> if they are worthy they will be funded and science will be better off.

This is the bottom line to his argument, and the REAL problem he has with the HGP. It is a fear that the HGP will drain funds from other research programs. The truth is that it will, but not in the way he and others think. Read on please.....

I would argue that Dr. Ellington really doesn't want the HGP grants to be put in the general grant pool and compete with others BECAUSE, the grants I've reviewed were so good and filled with so much real science and would be given such high priority scores that they not only would be funded but would be at the top of the heap. Sure, some of the grants were cow-dung and given low scores but others contained extremely exciting science and were given priority scores accordingly. The rigor of grant review at the HGP study sections is at the highest standards and even higher than other study sections I've been on.
The members of the HGP study sections are scientists such as ourselves with diverse interests but with the common goal of rigorously reviewing each proposal and scoring on merit. Dr. Ellington is way off base here and maybe should re-think his position based on the above.

Dr. Ellington also writes:
> If not, at least we won't have the built-in impetus to clone and
> sequence random DNA for no good reason.

This is pure crap. No one is sequencing random DNA for no good reason.

Andrew, the following is FYI: At this point in the HGP very little human DNA sequencing is being supported. The bulk of the effort on the ACTUAL SEQUENCING projects is aimed at model organisms: E. coli, Yeast, C. elegans, and Mycobacteria. The work on mammalian genomes mostly involves mapping, determining STS's, etc. A couple of groups, us included, are attempting to sequence small (ca. several hundred thousand base pairs) regions of extreme biological significance. Our bit is the c-abl gene on chrom. 9 and the bcr gene on chrom. 22, which are involved in the chromosomal translocation which causes the major forms of leukemia. Our purpose is to understand the sequences and subsequent events which result in this biological phenomenon (chromosomal translocation). That's hard core science as I see it and that's the kind of stuff given the highest scores by peer reviews.

To address this issue we have to figure out how to sequence almost half a million bases without going completely crazy and without creating a "Sequencing Sweat Shop".
With the present sequencing approaches, we only can do this if we first design the experiments aimed at understanding the physics of how we can improve the separation of nested fragment sets generated during the dideoxynucleotide sequencing reaction, the biochemical parameters (Km's, Vmax's, and Kd's, etc) for various DNA polymerases and the reactions they catalyze, all the way to developing new algorithms for obtaining relevant information from the massive data we will generate. There's science all along the way and many questions for which we do not yet know the answer.

The HGP also is bringing a new group of scientists into biology. These are physicists, engineers, chemists and others who are investigating new and more efficient ways of mapping and sequencing. In the long term it may be possible to "sequence a single human's genome in a few minutes" but clearly this is a long way off. Yes, PCR, hybridization blots, etc are clinically useful today but for the future who knows. Automated sequencing of PCR fragments in every hospital within a decade?

When you drive from Boston to New York City you need a road map. That's really what we will be getting from the HGP, a road map of every gene, every Alu sequence, every intron, every CG island, etc. This will be extremely useful information and the cost will be much less than if done piecemeal.

Speaking of costs, there is a big debate within the sequencing community and the HGP regarding the actual real cost of sequencing. Believe it or not, the REAL cost of sequencing in the average molecular biology lab is over $10/base final sequence. You may not believe it since we're not used to calculating costs including indirect costs, equipment already in place, etc., but >$10/base is a good number. For those of us whose labs are involved in lots of sequencing, the cost is something between $2.50 and $3.50/base final sequence and it will take a lot to cut the cost to < $0.50/base within the first 5 years of the HGP.
If we (the "professional sequencers") can cut the cost by a factor of 20, think of the benefit to the average mol. biol. lab. That's more money for Joe Biologist to spend doing other experiments.

I suggest that as a critic of the Human Genome Project you should:

1. Get your facts straight and see who and what is being supported.
2. Get involved in this project and help us solve the basic, fundamental questions which MUST be solved before we realistically can begin to actually sequence and understand what we've sequenced.

There are many new scientific observations that will come out of this project and many new scientific questions for which experiments must be designed. We now must begin to remove our heads from the sand and start thinking of how we will address these scientific issues rather than wasting our time on the political discussion.

By the way, don't take those "remove my name from the list" messages personally. They occur all the time and I can't see how this discussion has in any way prompted more of them.

Best regards,
--Bruce Roe
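
[Editorial illustration, not part of the original post.] The per-base figures in the post above can be put side by side with a few lines of arithmetic. The sketch below uses only the numbers quoted there (over $10/base, $2.50 to $3.50/base, under $0.50/base) and rounds "almost half a million bases" up to 500,000. The function names are hypothetical, and reading the "factor of 20" as $10 down to $0.50 is an assumption, since the post does not say which starting figure it refers to.

```python
# Illustrative arithmetic only. Per-base rates are the figures quoted in the
# post; the 500,000-base project size is "almost half a million bases",
# rounded up. Function names are hypothetical.

RATES_DOLLARS_PER_BASE = {
    "average lab (quoted as over $10/base)": 10.00,
    "high-volume lab, low estimate": 2.50,
    "high-volume lab, high estimate": 3.50,
    "five-year target (quoted as under $0.50/base)": 0.50,
}


def project_cost(bases: int, dollars_per_base: float) -> float:
    """Total cost of producing `bases` of finished sequence at a flat rate."""
    return bases * dollars_per_base


def reduction_factor(current_rate: float, target_rate: float) -> float:
    """How many times cheaper the target rate is than the current rate."""
    return current_rate / target_rate


if __name__ == "__main__":
    bases = 500_000
    for label, rate in RATES_DOLLARS_PER_BASE.items():
        print(f"{label}: ${project_cost(bases, rate):,.0f}")
    # Assumption: the "factor of 20" is measured from the $10/base figure.
    print("average lab -> target:", reduction_factor(10.00, 0.50))
    print("high-volume lab -> target:",
          reduction_factor(2.50, 0.50), "to", reduction_factor(3.50, 0.50))
```

On these figures the factor of 20 corresponds to the average-lab rate; measured from the high-volume labs' own rates, reaching the target would be a five- to seven-fold reduction.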
toms@fcs260c2.ncifcrf.gov (Tom Schneider) (02/15/91)
In article <5714@husc6.harvard.edu> Ellington@Frodo.MGH.Harvard.EDU (Deaddog): >In article <2050@fcs280s.ncifcrf.gov> toms@fcs260c2.ncifcrf.gov (Tom >Schneider): >> learning how to identify genes from raw >> sequences alone. Predictions can be tested - which leads to rapid >> discovery of new genes. > >As does PCR amplification or hybridization: the analogue versions of your >digital statistical analyses. Wrong. Those techniques only allow one to jump from previously identified sequences in other species to the human sequence. This is a wonderful thing, but it does not allow one to take a pure raw sequence and identify the genetic control systems in it. The difference is that those techniques are only techniques, not theoretical understanding. And if you are going to poo-pa theoretical understanding, then I have some papers for you to read! Start with: @article{StormoPerceptron1982, author = "G. D. Stormo and T. D. Schneider and L. Gold and A. Ehrenfeucht", title = "Use of the `Perceptron' algorithm to distinguish translational initiation sites in {E. coli.}", year = "1982", journal = "Nucl. Acids Res.", volume = "10", pages = "2997-3011"} > The question is not whether some genes will >be identified, the question is (a) how many could already be identified >without the sequence of the genome, and (b) whether the (IMO paltry) >number that remain be worth the enormous cost? I'm sure that we can continue on the blind route we are following and find lots of interesting things eventually. The US road system comes to mind. Sure, we could have survived without a network of major roads. But having started on the big project, we were able to become much more integrated as a society, and now it is hard to imagine not having superhighways (or are they merely PARKways? And why is the place one parks the car in the DRIVEway?? :-). Similar things could be said about a uniform telephone system: we have (had??) the best in the world because people at Bell labs thought big. 
A third example is the improvement in making maps that Landsat and other satellites have given us. And, yes, Arpanet turned into internet. In all these cases we start off ad hoc and then eventually learn to do things systematically. Consider the cow paths you use to get to work! (I refer to the roads of Boston.) Would you like to use muddy winding paths? The genome project is merely a recognition that we are close to the time that we can make our maps in a direct logical way rather than piece meal. >Statisticians drool at the mounds of data to be created. And so might the rest of the biologists. They can use the data to direct their experiments more effectively. If they are afraid of math and computers (is that your problem?? :-) then there are plenty of theoretical-types whom they can team up with. >> avoid the terrible biases that we >> currently have in the GenBank database. > >I'm sorry, but this does not seem like a terribly important >problem. GenBank is skewed. Big deal. It gets the job done. >We find genes, we miss some stuff. Science slops along and >we still find those self-splicing introns and centromeres and >other cool things. Without the sequence of the human genome. >And with many people happily employed (for now) producing >gobs of worthwhile data. The problem is here, and getting worse. You apparently haven't tried to make a consistent dataset from the data in GenBank. It's a tough job! The point about the genome project is that we don't need to miss anything anymore. You seem to have the idea that some genes are not important, and that 'junk' DNA exists in the genome. Consider that this merely is a way for you to express to the rest of us how ignorant you are. (We are also, but we admit it. Do you admit that you are ignorant?) >I mean, what's a good example of what we have missed? We know >the Shine/Dalgarno sequences. Well, you missed the other statistically important features that were discovered by looking at the sites more carefully. 
See: @article{Gold1981, author = "L. Gold and D. Pribnow and T. Schneider and S. Shinedling and B. S. Singer and G. Stormo", title = "Translational initiation in prokaryotes.", year = "1981", journal = "Annu. Rev. Microbiol.", volume = "35", pages = "365-403"} @article{StormoInitiation1982, author = "G. D. Stormo and T. D. Schneider and L. M. Gold", title = "Characterization of translational initiation sites in {{E. coli.}}", year = "1982", journal = "Nucl. Acids Res.", volume = "10", pages = "2971-2996"} @article{Schneider1986, author = "T. D. Schneider and G. D. Stormo and L. Gold and A. Ehrenfeucht", title = "Information content of binding sites on nucleotide sequences", year = "1986", journal = "J. Mol. Biol.", volume = "188", pages = "415-431"} > We have learned far more from >mutation than we would by sequencing a bacterial genome (note: >sequencing the Coli genome is indeed a cool thing to do). This is a completely flip statement, with no foundation since you didn't quantitate your answer and the experiment has not been done. (But I do agree that getting that sequence will be cool.) Genetics is certainly a powerful way to approach biological problems. But once one has defined a biolgically interesting system, direct methods can produce answers that would be difficult if not impossible to get by genetics. For example, the sequence of a gene, or exactly what bases are important for a promoter to function. See: @article{Schneider1989, author = "T. D. Schneider and G. D. Stormo", title = "Excess Information at Bacteriophage {T7} Genomic Promoters Detected by a Random Cloning Technique", year = "1989", journal = "Nucl. Acids Res.", volume = "17", pages = "659-674"} >And will the "insides of introns" generate data for 2 PNAS papers and >a TIBS review, yes. The work of Andrez Konopka is an example you seem to have missed. >or will it actually be worth the billions of >dollars it will take to properly correct this horrific accounting >error? 
Your mistake here is to suggest that the genome project would only give these data. It would give much other data also. >> The second major justification is the enormous boost to sequencing >> technology that the project is making. > >Good sequencing technology stands on its own. It does not need the Genome >Boondoggle to help it along. You have missed the point. The project will focus more people on the problems of sequencing, and the art will improve as a result. >> We are eventually going to be able to sequence >> everybody's DNA in a few minutes. > >Matrix-teers: Is this nuts or what? I've never seen this before, but >if it is even remotely true, I'll eat the small plastic rats that reside >on the top of my terminal. Ever heard of nanotechnology? Well, bone up if you are ignorant. I'll forgive you, you don't need to eat those rats. Tom Schneider National Cancer Institute Laboratory of Mathematical Biology Frederick, Maryland 21702-1201 toms@ncifcrf.gov
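
[Editorial illustration, not part of the original post.] The 1986 paper cited above ("Information content of binding sites on nucleotide sequences") measures the pattern at a set of aligned binding sites in bits and compares it with the information needed to find those sites in a genome. A minimal sketch of the two quantities follows, under loudly stated assumptions: the four bases are taken as equally likely in the background (so each position carries at most 2 bits), the small-sample correction used in the paper is omitted, and the aligned sites and genome size are made-up toy values. The function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def rsequence(aligned_sites):
    """Information (bits) in equal-length aligned sites.

    Sums (2 - H) over positions, where H is the Shannon entropy of the base
    frequencies at that position. Assumes equiprobable background bases and
    applies no small-sample correction.
    """
    n = len(aligned_sites)
    length = len(aligned_sites[0])
    total = 0.0
    for pos in range(length):
        counts = Counter(site[pos] for site in aligned_sites)
        entropy = 0.0
        for base in BASES:
            f = counts.get(base, 0) / n
            if f > 0:
                entropy -= f * math.log2(f)
        total += 2.0 - entropy
    return total


def rfrequency(genome_positions, site_count):
    """Bits needed to pick `site_count` sites out of `genome_positions`."""
    return math.log2(genome_positions / site_count)


if __name__ == "__main__":
    toy_sites = ["AGGA", "AGGT", "AGGC", "AGGG"]  # made-up alignment
    print(rsequence(toy_sites))   # three conserved positions, one random
    print(rfrequency(4096, 4))    # made-up genome size and site count
```

When the two numbers come out close to each other, the sites carry about as much pattern as is needed to locate them.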
toms@fcs260c2.ncifcrf.gov (Tom Schneider) (02/15/91)
In article <12180@ur-cc.UUCP> elmo@troi.cc.rochester.edu (Eric Cabot) writes:
>In article <2050@fcs280s.ncifcrf.gov> toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes:
>>I think that Mark is exactly correct, and you have missed the point. Having a
>>huge database full of human sequences opens vistas for those of us who know how
>>to use statistical tools to analyse sequences. There are many things that can
>>be done. Some of them include learning how to identify genes from raw
>
>(much stuff deleted)
>Ok, I agree that it is possible to use statistical methods to infer that
>a given sequence contains a "gene". If I read your perspective correctly
>(and ignoring the self-back patting) the main goal is to beef up the

Sorry about that.

>database so that we can find new genes, whether functional or not. I
>I'm sorry, but I just see that as cost effective, given that we won't
>have the slightest inkling of what most of these genes are supposed to
>do.

I would not take it as the main goal; it is one of many worthy goals. Also, once a gene is located, it can be deleted (in mouse or wherever), and then one can play the usual powerful genetic tricks to figure out the function. The sequence is merely a starting point. Since it (supposedly!) contains all the information about the biology, in encrypted form, it is nice to have it for starters. I've always been amazed when people would put off sequencing "their" gene for a long time, since one gets such a huge amount of solid data from the sequence.

>>A straight sequencing of the genome will avoid the terrible biases that we
>>currently have in the GenBank database. For example, the database is missing
>
>Oh really? Wouldn't you say that concentrating on coli, fly, worm, yeast
>human and maybe a plant species puts a bit of bias into the database?

Interesting point.
I suppose it comes from my biased view of analyzing the binding sites from one species at a time so as to avoid the assumption that the recognizer (i.e. DNA binding protein, ribosome, polymerase, repressor or whatever) is the same in all species. (We know lots of cases where it's not.) So I am happy if I have a complete genome to work within. But after finishing one, one needs to do the others to answer evolutionary questions, and you are right, there is a huge diversity out there to be sequenced. So until we can sequence genomes quickly (minutes), I suppose the best we can do is to choose the few organisms which have had lots of good genetics done on them. I'm glad to see that these other organisms are considered part of the project! When I first heard of the project I disliked it because I thought that coli wouldn't get done first as a 'pilot'. >>the insides of introns. If you think that these are not important, then you >>may well be in for some super surprises later. The phrase "junk DNA" is a >>statement of ignorance, not scientific fact. People currently chop off the >>bases near the 3' sides of introns and don't report them in the database. The >>proof is that they often end 10, 20 or 30 bases from the splice junction. This >>would not happen if people reported all their data. Unfortunately, this means >>that people have thrown out important parts of splice junctions BECAUSE THEY >>THOUGHT THEY WERE UN-IMPORTANT. Do you follow? People think something is not >>important, so they don't report it in the database, or limit the reports, so >>nobody discovers that it IS important! Another example is the reporting of >(Nothing deleted because I am in complete agreement. Oh how I have ranted >and raved about missing intron sequences.) But >frankly, I don't follow if this is part of the defense of the genome project.
>Sure it'd be great to have chromosome long tracts of sequences to infer >gemone organization but will we really be able to make sense out of >it all using the sequence data alone? Take the case of upstream control >regions, their significance was worked for the most part by experimental >techinques. Those results are the stuff that are used to generate rules >for sequence analysis. Not the other way around. That's because theoretical concepts have not been strong enough to date. I think that this will change. Not to be back patting (will you excuse me?? :-), but the example I know best is my own. E. coli ribosome binding sites have about 11.0 bits of pattern. I was pretty surprised to find that the information needed to locate the sites in the genome is about 10.6 bits! This correlation seems to hold for other genetic systems. The idea (working hypothesis) is that the amount of pattern at binding sites is in general just enough to locate the sites in the genome. Then I studied T7 RNA polymerase promoters and found that they contained too much sequence pattern (35 bits of pattern) compared to what is needed to locate them in the genome (16 to 17 bits). This meant that either the hypothesis was wrong or something interesting was happening at T7 promoters. Perhaps another protein binds there, and this accounts for the "excess" information. If so, I should be able to delete the excess information. It took me a while, but I did the experiment and found that 18 +/- 2 bits are all that the polymerase needs! So the hypothesis survived. The experiment would not have been done without the theoretical analysis. I have another case like this that I'm writing up now. So the idea of doing experiments first is only a tradition of molecular biology. Theoretical understanding can also play a role. References to this story can be found in: @article{Schneider.Stephens.Logo, author = "T. D. Schneider and R. M. 
Stephens", title = "Sequence Logos: A New Way to Display Consensus Sequences", journal = "Nucl. Acids Res.", volume = "18", pages = "6097-6100", year = "1990"} >Eric Cabot | elmo@uhura.cc.rochester.edu > :-):-):-):-):-):-):-):-) | elmo@uordbv.bitnet Tom Schneider National Cancer Institute Laboratory of Mathematical Biology Frederick, Maryland 21702-1201 toms@ncifcrf.gov
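For readers following the bit counts in the post above, here is a minimal sketch of the two quantities being compared. It assumes the usual information-theory definitions: the pattern at a set of aligned sites is the sum over positions of (2 - H) bits, where H is the Shannon entropy of the bases at that position, and the information needed to locate the sites is log2 of (candidate positions / number of sites). The function names and the toy alignment are hypothetical, not taken from the cited papers, and no small-sample correction is applied.

```python
import math
from collections import Counter


def r_sequence(aligned_sites):
    """Bits of pattern in equal-length aligned DNA sites.

    Assumption: equiprobable background bases, so each position can
    carry at most 2 bits. No small-sample correction is applied.
    """
    total = 0.0
    for col in range(len(aligned_sites[0])):
        counts = Counter(site[col] for site in aligned_sites)
        n = sum(counts.values())
        # Shannon entropy of the bases observed in this column
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - h
    return total


def r_frequency(candidate_positions, n_sites):
    """Bits needed to pick n_sites out of candidate_positions."""
    return math.log2(candidate_positions / n_sites)


# Toy alignment (hypothetical): three conserved columns, one random column.
toy_sites = ["ACGA", "ACGC", "ACGG", "ACGT"]
pattern_bits = r_sequence(toy_sites)   # 2 + 2 + 2 + 0 bits
locate_bits = r_frequency(64, 1)       # one site among 64 positions
```

On these definitions, the 10.6 bits quoted for ribosome binding sites corresponds to roughly one site per 1,550 candidate positions, since 2^10.6 is about 1552. The working hypothesis in the post is that the two numbers come out about equal.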
elliston@av8tr.UUCP (Keith Elliston) (02/16/91)
In article <2054@fcs280s.ncifcrf.gov>, toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes: > And if you are going to poo-pa > theoretical understanding, then I have some papers for you to read! Start > with: > > author = "G. D. Stormo > and T. D. Schneider > and L. Gold > and A. Ehrenfeucht", > title = "Use of the `Perceptron' algorithm to distinguish translational > initiation sites in {E. coli.}", > year = "1982", > journal = "Nucl. Acids Res.", > volume = "10", > pages = "2997-3011"} . > author = "L. Gold > and D. Pribnow > and T. Schneider > and S. Shinedling > and B. S. Singer > and G. Stormo", > title = "Translational initiation in prokaryotes.", > year = "1981", > journal = "Annu. Rev. Microbiol.", > volume = "35", > pages = "365-403"} > > author = "G. D. Stormo > and T. D. Schneider > and L. M. Gold", > title = "Characterization of translational initiation sites > in {{E. coli.}}", > year = "1982", > journal = "Nucl. Acids Res.", > volume = "10", > pages = "2971-2996"} > > author = "T. D. Schneider > and G. D. Stormo > and L. Gold > and A. Ehrenfeucht", > title = "Information content of binding sites on nucleotide > sequences", > year = "1986", > journal = "J. Mol. Biol.", > volume = "188", > pages = "415-431"} . > author = "T. D. Schneider > and G. D. Stormo", > title = "Excess Information at Bacteriophage {T7} Genomic Promoters > Detected by a Random Cloning Technique", > year = "1989", > journal = "Nucl. Acids Res.", > volume = "17", > pages = "659-674"} Nice C.V. Tom.... Not to pick nits or anything, but this discussion seems to be going overboard. Perhaps it is time to drop the dueling keyboards for a moment, and spend some time in the thought process. I would dearly love to see a sort of debate go on on this subject (genome sequencing), but would like to suggest a more refined type of discussion. Perhaps we could have an address by the proponents of the genome sequence initiative, formulated collectively by a group via e-mail.
We could then also have an address by a group that is against the genome initiative, again collectively written by a group. This might reduce the flaming, and bring us all to a more in-depth understanding of the issues from both sides. We could then discuss the issues that are disparate and not have to be reduced to singular views or flame wars. I suspect that a large number of the people reading these messages are like me.... in that I see many advantages to genome sequencing, but also quite a few shortcomings/disadvantages as well. So... who wants to be the head of the Pro Genome sequencing group (Tom??) How about the Anti Genome sequencing group? (Did I hear a woof.. or was it a non-woof?) Well.... Keith "where did Rob Harper go?" Elliston elliston@msdrl.com -- Keith O. Elliston elliston@av8tr.UUCP elliston@msdrl.com AA5A N9734U elliston@mbcl.rutgers.edu elliston@biovax.bitnet "Fly because you have to, to keep some semblance of sanity."
kristoff@genbank.bio.net (David Kristofferson) (02/16/91)
Keith, Sounds like an excellent suggestion if someone will take up the challenge. I suggest a chronological format such as the following: 1) Initial Arguments for and against the Genome Project - these should be posted essentially simultaneously without either side getting to review the other's comments. I would be happy to hold the finished reports and release them together when both are completed. 2) Rebuttals to the above - again released simultaneously 3) return to free-for-all 8-)??? Any takers? Dave