[bionet.molbio.genbank] GenBank software

CZJ@CU.NIH.GOV (08/08/90)

As one of the Project Officers of the GenBank Contract, I thought it
would be appropriate to respond to some of the issues raised by Dan
Davison in his "I must have been out of the room" messages.

The history of the new features table goes back several years.  I
remember the first GenBank Advisors' meeting I attended in October
1985.  The subject that came up then, and at every advisors' meeting
since, was the difficulty in translating an entry from EMBL to GenBank
format and vice versa.  This difficulty had an obvious impact in
delaying the incorporation of EMBL data into GenBank.  The major
impediments to automatically translating an EMBL entry into a GenBank
entry were incompatibilities in the features table formats.  Neither
the GenBank nor the EMBL staff felt that the existing formats
represented features adequately.
Therefore, a series of meetings, which began with a workshop involving
members of the scientific community, was held to design a features
table format that could better represent the complexities of
biological features.  The result is the new features table effective
with release 64.  I would point out that there was plenty of advance
warning.  This warning included several example entries.

I am sure that David Benton can comment better on the advantages of
the new format.  Rather, I would like to discuss briefly Dan's wish to
have GenBank personnel write code that parses the new features table
and distribute it free.  First, I would remind the
community that the purpose of GenBank has been to collect and
distribute data.  The development of software was left to the
community.  The success of this policy can be seen by the development
of several excellent packages both by commercial firms and the public.
If you are privy to the problems of the Cambridge Small Molecule
Crystallographic database, you know the problems of linking software
development to database distribution.  Thus it is deliberately beyond
the scope of the GenBank contract to develop software to parse the
features table.

Despite this position, it is obvious that some software, i.e. Jim
Fickett's program to translate GenBank, is in use by GenBank and will
have to be modified to parse the new features table.  Although it is
perfectly feasible to distribute this program, the problem for GenBank
is one of support.  That is, someone has to answer the many questions
about installation, why the program will not work on a specific
machine, modifications, etc.  There is also the question of
updates--that is, code used internally is often modified to meet
specific purposes, often associated with new releases.  The bottom line
is that the responsible distribution of "free" software can be an
expensive proposition.  Right now the only way for GenBank to meet
this obligation would be to cut down on services elsewhere.

I hope these comments have been helpful.  One of the gratifying things
to me has been the community spirit among GenBank users and the
willingness to distribute software that takes advantage of the GenBank
resource.  I trust this will continue in the future.

Jim Cassatt

roy@phri.nyu.edu (Roy Smith) (08/09/90)

CZJ@CU.NIH.GOV (Jim Cassatt) writes:
> I would begin by reminding the community that the purpose of GenBank
> has been to collect and distribute data.  The development of software
> was left to the community.

	I pretty much agree with Jim that the GenBank folks are better off
trying to do well what they are supposed to do (and by and large, I think
they do an admirable job), and not expend limited resources on projects
outside their scope.  I also agree that the job of a database maintainer is
to distribute the data in a moderately raw form, not to over-package it
with software that fits their pre-existing notion of what the data are
going to be used for.  So, I guess I have to agree that it's not really
GenBank's job to write the database parsing software Dan suggests.

	So, given that, I propose that those of us who are interested (and
I'm certainly one of them) get together privately by email and work on it.
If folks from GB and/or IG want to be in on it, fine.  If not, that's fine
too.  We could come up with a set of routines for parsing a locus and make
them publicly available to whoever wants them on an "as is" basis.  If you
are interested, write me and/or this mailing list/newsgroup.  If enough
people are interested, I'll set up a mailing list redistribution alias and
we can start arguing about the problem. :-) If nobody is interested, I'll
assume it was a dumb idea to begin with and drop it.
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"

kristoff@genbank.BIO.NET (David Kristofferson) (08/09/90)

Roy,

	So that you can get input from the various database staff
persons at LANL, EMBL, DDBJ, and IG, it might be a good idea to hold
your discussions on this forum (assuming that people do become serious
about really doing this and are not just philosophizing).  I believe that
people at all of the above locations read this group, and traffic has
not been so overwhelming that it would create a nuisance.  In fact,
this is exactly the kind of issue that this newsgroup was designed
for.  I would be gratified to see this kind of communication between
the databanks and the user community expanded greatly.  Although it
may not be our mission to write the software, it would be a big
mistake not to tap into the accumulated expertise available here.  I
hope that the staff at the other databanks concur in this opinion.
-- 
				Sincerely,

				Dave Kristofferson
				GenBank On-line Service Manager

				kristoff@genbank.bio.net

smith@mcclb0.med.nyu.edu (08/09/90)

In article <9008081410.AA02507@alw.nih.gov>, CZJ@CU.NIH.GOV writes:
> As one of the Project Officers of the GenBank Contract, I thought it
> would be appropriate to respond to some of the issues raised by Dan
> Davison in his "I must have been out of the room" messages.
...
> Despite this position, it is obvious that some software, i.e. Jim
> Fickett's program to translate GenBank is in use by GenBank and will
> have to be modified to parse the new features table.  Although it is
> perfectly feasible to distribute this program, the problem for GenBank
> is one of support.  

There is nothing morally wrong in distributing a program and saying that it 
is being provided as an example to aid development, and that it is 
distributed 'as-is'.  This eliminates the support issue.  If people have 
questions, they can post them to this, or some other list, for resolution. No 
one HAS to respond.  Everyone can benefit from this, even the people at 
GenBank who may pick up an idea or two.

I think it is important to emphasize the concept of a 'community of peers'. 
People should be free to make good-faith 'submissions' of a program, even if it 
has drawbacks they are aware of.  The community then decides whether it is 
able to use it.  The individual's choice.  It benefits no one to withhold a 
program when the sole reason for doing so is that they cannot provide 'phone 
support.

+---------------------------------------------------------------------------+
|Ross Smith, Cell Biology,  NYU Medical Center,  550 First Ave.,  NYC, 10016|
|Phone: (212) 340-5356: FAX: (212) 340-8139 (Alternate NYUMC) (212) 340-7190|
|E-Mail:  SMITH@NYUMED.BITNET (BITNET),  SMITH@MCCLB0.MED.NYU.EDU (Internet)|
+---------------------------------------------------------------------------+

roy@alanine.phri.nyu.edu (Roy Smith) (08/09/90)

Dan Davison sent this to me, and requested I forward it to this list:
----------------

> CZJ@CU.NIH.GOV (Jim Cassatt) writes:
> > I would begin by reminding the community that the purpose of GenBank
> > has been to collect and distribute data.  The development of software
> > was left to the community.

No question.


> So, I guess I have to agree that it's not really
> GenBank's job to write the database parsing software Dan suggests

I did not say that.  I said, paraphrasing, which of the people
responsible for breaking all existing sw (and there is sw that reads
the feature table, both commercial and pd/private) are going to *help*
by doing something about the problems they've created?


> 	So, given that, I propose that those of us who are interested (and
> I'm certainly one of them) get together privately by email and work on it.
> If folks from GB and/or IG want to be in on it, fine.  If not, that's fine
> too.  We could come up with a set of routines for parsing a locus and make
> them publicly available to whoever wants them on an "as is" basis.
[...]

Senor J. Ramon has already very kindly offered his routines; I would
also like to be part of the list Roy proposed, altho anyone who knows
me or has seen my code knows that I can't program my way out of a paper
bag, and parsing is completely beyond me (but I'm trying!).

Thanks again to both Senor Ramon and Roy on this.

dan
-- 
dr. dan davison/dept. of biochemical and biophysical sciences/univ. of
Houston/4800 Calhoun/Houston,TX 77054-5500/davison@uh.edu/DAVISON@UHOU
Disclaimer: As always, I speak only for myself, and, usually, only to
myself.
----------------
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"

kristoff@genbank.BIO.NET (David Kristofferson) (08/10/90)

> There is nothing morally wrong in distributing a program and saying that it 
> is being provided as an example to aid development, and that it is 
> distributed 'as-is'.  This eliminates the support issue.  If people have 
> questions, they can post them to this, or some other list, for
> resolution.

I guess there is one more item to attend to ...

"Support" means far more than just answering the phone.  It also means
updating the code so that it continues to serve as an example if the
databank format changes.  We are confronting this issue right now for
Fickett's code which generates GenPept.  Any additional code such as a
features table parser is subject to similar revisions.  We could
be nice guys and undertake these additional revisions as volunteer
work, but then this would raise problems if other work to which we are
currently committed did not get done.  As Ross is aware through our
collaboration on bionet.molbio.genbank.updates, we do take on these
kinds of tasks but have to draw a line somewhere.  This job is
too big for us to bite off right now.
-- 
				Sincerely,

				Dave Kristofferson
				GenBank On-line Service Manager

				kristoff@genbank.bio.net