BROE@AARDVARK.UCS.UOKNOR.EDU (Bruce Roe) (12/05/89)
David,

In regard to the recent discussions of Genome Data Submission and NIH Policy, I also was at the Wolf Trap meeting and was bothered by two points:

1. The submission issue and hoarding of data.
2. A discussion of how many errors should be tolerated in the sequence data in the human genome project.

This message addresses my concerns about the first issue; another message to follow addresses the second.

Regarding the first item, those of us who sequence a reasonable amount are well aware of the need for timely submission of data to GenBank or EMBL, and most (although I have no statistics) submit the data in an unannotated form via disk and snail_mail. With the new GenBank-OnLine now replacing many aspects of BIONET, data submission is just an e-mail message away. Even without GenBank-OnLine, it has been possible for quite some time now to e-mail sequence data directly to GenBank (and EMBL) via their network addresses. For GenBank the address was, and I'm fairly sure still is:

    gb-sub%life@lanl.gov

For EMBL, submission information can be obtained from the EMBL server at DATALIB@EMBL. Those of us with the UWGCG program package can obtain a copy of the sequence submission form on-line from our host VAX by fetching the file GENBANK.FORM and then filling in the blanks using one of the VAX editors.

The problem of the actual submission of final, published sequence data hopefully will go away once the community becomes more computer literate and discovers e-mail. I know I'm preaching to the believers.

As you and I (and others) have discussed many times, there is great frustration in the sequencing community over the ability to access sequence data once it has been submitted to one of the databases. Both GenBank and EMBL are moving in the direction of eventual daily updates to their databases, and in the very near term weekly updates should be a reality.
It is very frustrating to see a sequence published in a journal that requires GenBank/EMBL submission and not be able to obtain the data from an on-line source. Both databases are addressing this issue and I have indications that this should not be a problem in the near future. As you indicated in an earlier posting, the GenBank database now lags only a month or so behind the journals in entering published data. By giving direct, timely access to this data on-line or via ftp, IG is providing the kind of service we all can really use.

We are still left with the question of why the data were not submitted directly to the database by the group that published the sequence, and thus why those at GenBank must enter the data manually. I know you have tried to get everyone to submit their final data to GenBank when they submit their manuscript, and that you wish all journals would make simultaneous submission to the database a prerequisite for publication. I just do not understand why folks do not do this!!!

As for hoarding of data, this too shall pass. We are judged by, among other things, our publication record when it comes time for our grant reviews. In the world of "Publish or Perish" the data is not data until it has been published. If the sequences we complete are not published, then Study Sections are less likely to rank a grant renewal as high as they would if publications resulted from the work. This is a self-limiting process, and those known for hoarding data and not publishing it should and will obtain their just reward.

Bruce A. Roe
Professor of Chemistry and Biochemistry
INTERNET: BROE@aardvark.ucs.uoknor.edu
AT&TNET: 405-325-4912
SnailNet: Department of Chemistry and Biochemistry
          University of Oklahoma
          620 Parrington Oval, Rm 208
          Norman, Oklahoma 73017