[bionet.general] Sequence submission discussion

BROE@AARDVARK.UCS.UOKNOR.EDU (Bruce Roe) (12/05/89)
David,
Regarding the recent discussions of Genome Data Submission and NIH
Policy, I was also at the Wolf Trap meeting and was bothered by two
points.

1. The submission issue and hoarding of data.
2. A discussion of how many errors should be tolerated in the sequence
   data in the human genome project.

	This message addresses my concerns about the first issue; another
message to follow will address the second.

	Regarding the first item, those of us who sequence a reasonable
amount are well aware of the need for timely submission of data to
GenBank or EMBL, and most (although I have no statistics) submit the data
in an unannotated form via disk and snail mail.  With the new
GenBank-OnLine now replacing many aspects of BIONET, data submission is
just an e-mail message away.  Even without GenBank-OnLine, it has been
possible for quite some time now to e-mail sequence data directly to
GenBank (and EMBL) via their network addresses.

	For GenBank the address was, and I'm fairly sure still is:

	 	gb-sub%life@lanl.gov

	For EMBL, submission info can be obtained from the EMBL server
at DATALIB@EMBL.  Those of us with the UWGCG program package can obtain
a copy of the sequence submission form on-line from our host VAX by 
fetching the file  GENBANK.FORM  and then filling in the blanks using
one of the VAX editors.  The problem of the actual submission of final,
published sequence data hopefully will go away once the community becomes
more computer literate and discovers e-mail. I know I'm preaching to the
believers.
	As you and I (and others) have discussed many times, there is
considerable frustration in the sequencing community about access to
sequence data once it has been submitted to one of the databases.
Both GenBank and EMBL are moving in the direction of eventual daily
updates to their databases and in the very near term weekly updates should
be a reality.  It is very frustrating to see a sequence published in a
journal that requires GenBank/EMBL submission and not be able to obtain
the data from an on-line source.  Both databases are addressing this issue
and I have indications that this should not be a problem in the near future.
As you indicated in an earlier posting, the GenBank database now lags only
a month or so behind the journals in entering published data into the database.
With direct access to this data on-line or via ftp, in a timely way, IG
is providing the kind of service we all can really use.  We are still
left with the question of why the data was not submitted directly to the
database by the group that published the sequence, and thus why those at
GenBank must enter the data manually.  I know you have tried to get
everyone to submit their final data to GenBank when they submit their
manuscript and wish all journals would make simultaneous submission to
the database a prerequisite for publication. I just do not understand
why folks do not do this!!!
	As for hoarding of data, this too shall pass.  We are judged by,
among other things, our publication record when it comes time for our
grant reviews.  In the world of "Publish or Perish" the data is not data
until it has been published. If the sequences we complete are not
published, then Study Sections are less likely to rank a grant renewal as
high as they would if publications had resulted from the work.  This is a
self-limiting process, and those known for hoarding data and not
publishing it should and will obtain their just reward.

	Bruce A. Roe
	Professor of Chemistry and Biochemistry
	INTERNET: BROE@aardvark.ucs.uoknor.edu
	AT&TNET:  405-325-4912
	SnailNet: Department of Chemistry and Biochemistry
		  University of Oklahoma
		  620 Parrington Oval, Rm 208
		  Norman, Oklahoma 73017