[bionet.general] GenBank Curator Program

pgil%histone@LANL.GOV (Paul Gilna) (08/10/90)

Over the past two years, the primary thrust of the GenBank project has
been to improve the timeliness and completeness of the database.
Endeavours such as the interaction with journals, sequence submission
policies, and new submission software tools have brought us to the
point where we now receive 80% of our data in electronic form
directly from the scientific community and where our average turnaround
is now measured in weeks rather than months.  This progress in
soliciting direct and automated data submission, and in the RDBMS
conversion, now frees us to deal in greater detail with one of the most
important components of the database, the biology represented within
the annotation. In addition to our work to enrich the quality of the
annotation using our own annotation resources, we now wish to seek the
direct involvement of the members of the scientific community.


The following announcement represents the beginning of a program to help
us enhance the quality and integrity of the data represented in the
GenBank database.


This announcement will only be distributed via e-mail for the pilot phase;
however, recipients are free to redistribute this notice. This notice is
being posted to both the GENBANK-BB and BIONEWS bulletin boards and we
apologize in advance for any redundancy across the two newsgroups.

Paul Gilna
GenBank Biology Domain Leader
Los Alamos National Laboratory
Los Alamos, NM 87545

pgil%histone@lanl.gov

Tel: (505) 665-2177
Fax: (505) 665-3493





			GENBANK CURATOR PROGRAM

	GenBank announces the pilot phase of the GenBank Curator
	Program. We are seeking suggestions for work to be done on the
	database in the form of informal proposals.  Authors of
	successful proposals will travel to Los Alamos and work with
	the annotation or computation staff to carry out their proposed
	project.

	Although GenBank has had some curators in the past, the advent
	of the GenBank RDBMS restructuring and its attendant interface,
	the Annotator's Workbench, allows us to implement an expanded
	program using a unified, intuitive annotation tool that
	provides the capability of remote use.

	The current program seeks to identify domains within the
	database that are in need of overhaul either at the sequence or
	at the annotation level. In addition, as part of ongoing
	development of the Sequence Validation Suite (SVS), a suite of
	software programs that will be used to check the validity of
	submitted sequence and annotation data, we have expanded the
	program to include software development associated with the
	SVS.

	We are looking to the readership of the molecular
	biology-oriented Bulletin Boards for proposals for curation on
	GenBank; if you are familiar with a domain or family of
	sequences represented within the database and with the existing
	annotation, and have some ideas on how the annotation could be
	improved (for example to reflect similarities in features
	across entries, to improve existing nomenclature, or to point
	out sequence merges), or on software that could be developed to
	aid data integrity and validation, then we would like to hear
	from you.

	In this pilot study, about six proposals will be selected to be
	implemented before the end of September, 1990. Based on the
	results of the study, we hope to take on about 30 more
	projects over the course of the next two years. The capability
	exists for continued interaction with the data bank staff on a
	consultant basis, using remote access facilities to the
	annotation software. The work will be carried out on site at
	Los Alamos. Travel (within the US for the pilot study), hotel
	costs, and subsistence will be covered. Project proposals will
	be reviewed by GenBank and NIH staff. Proposals should be
	submitted to Dr. Paul Gilna via e-mail (pgil%histone@lanl.gov)
	and should cover the following topics:

	o       Detailed description of work proposed, citing examples from
		the database, where relevant, and of the scope of the
		proposed work

	o       Justification of work in terms of benefit to community
		and data bank

	o       Estimation of time needed to conduct work at LANL

	o       Abbreviated CV including representative publications.

roy@phri.nyu.edu (Roy Smith) (08/11/90)

pgil%histone@LANL.GOV (Paul Gilna) writes:
> Authors of successful proposals will travel to Los Alamos and work with
> the annotation or computation staff to carry out their proposed project.

	I made an attempt to respond to this earlier today, over my morning
cup of tea.  Apparently, enough caffeine had not yet entered my system,
since no trace of my article now exists.  So, let me try again.

	I wonder if it should really be necessary to travel to Los Alamos
to do the work.  The whole idea of building NSFNet, NREN, etc, is to bring
data and computing resources to people, not the other way around.  Private
email with Paul (between the first abortive posting and this one) has
caused me to mellow my original position, to the point where I agree that
an introductory in-person get together is A Good Thing, but I still feel
that it should be possible to do most of the work remotely.  Of course, I
understand the scenery in New Mexico is pretty nice, and you can't really
get that through a T1 wire.

	Aha!  I just figured out why my earlier posting got lost.  The
version of rn I'm using automagically turned the newsgroups line in my
followup of a bionet.general article into bionet.followup, a holdover from
what I think is long-obsolete usenet policy.
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"

kristoff@genbank.BIO.NET (David Kristofferson) (08/11/90)

> 	Aha!  I just figured out why my earlier posting got lost.  The
> version of rn I'm using automagically turned the newsgroups line in my
> followup of a bionet.general article into bionet.followup, a holdover from
> what I think is long-obsolete usenet policy.

We encountered that annoying problem with our vnews USENET software
too when we first put it up, but got rid of this troublesome
"feature."  Systems managers, beware!
-- 
				Sincerely,

				Dave Kristofferson
				GenBank On-line Service Manager

				kristoff@genbank.bio.net

pgil%histone@LANL.GOV (Paul Gilna) (08/13/90)

Roy Smith (roy@phri.nyu.edu) writes:

>	I wonder if it should really be necessary to travel to Los Alamos
> to do the work.  The whole idea of building NSFNet, NREN, etc, is to bring
> data and computing resources to people, not the other way around.  Private
> email with Paul (between the first abortive posting and this one) has
> caused me to mellow my original position, to the point where I agree that
> an introductory in-person get together is A Good Thing, but I still feel
> that it should be possible to do most of the work remotely.  Of course, I
> understand the scenery in New Mexico is pretty nice, and you can't really
> get that through a T1 wire.

The goal of the curator program is to enable exactly this--remote
access to the database by a curatorial team of scientists, using
system-independent annotation tools running either on a local hardware
platform, or remotely on the GenBank database host.

I would emphasize that we are in the pilot phase of this program, and
as such are treading carefully, so that we may allow and adjust for the
need to be flexible in the implementation of the program. For those
involved in biological curation, there is a fair amount of training in
the annotation tools (the Annotator's Workbench, our interface to the
RDBMS), and in our editorial standards and policies. For those involved
in software module development for the SVS (the Sequence Validation
Suite), there is a need to familiarize oneself with the design features
of the RDBMS, which cannot (at this stage) be accomplished remotely.

Early feedback in the program suggested that scientists might be more
comfortable with performing the work in a discrete "chunk" of their
time, rather than drawn out over time, where more conflicts were likely
to occur, hence the emphasis on on-site work. We do not see this policy
as dogma, however, and recognize that in the full program, a family of
interaction modes will likely prevail over any single design.

We have already had some favourable reaction from the community, and I
would encourage continued comments (public or private) on this
program.  We are very excited about the possibilities and impact on the
database that will come from this endeavour.
 

Finally, I cannot but concur with the perception that New Mexico is
"pretty nice" (masterly use of the understatement here, Roy!); what
more could one ask than for good science, good scenery, and good food?



Regards,

--paul