CZJ@CU.NIH.GOV (08/08/90)
As one of the Project Officers of the GenBank Contract, I thought it would be appropriate to respond to some of the issues raised by Dan Davison in his "I must have been out of the room" messages.

The history of the new features table goes back several years. I remember the first GenBank Advisors' meeting I attended, in October 1985. The subject that came up then, and at every advisors' meeting since, was the difficulty of translating an entry from EMBL to GenBank format and vice versa. This difficulty had an obvious impact in delaying the incorporation of EMBL data into GenBank. The major impediments to automatically translating an EMBL entry into a GenBank entry were incompatibilities in the features table formats. Neither the GenBank nor the EMBL staff felt that their format represented features adequately. Therefore, a series of meetings, which began with a workshop involving members of the scientific community, was held to design a features table format that could better represent the complexities of biological features. The result is the new features table, effective with release 64. I would point out that there was plenty of advance warning, including several example entries. I am sure that David Benton can comment better on the advantages of the new format.

Rather, I would like to discuss briefly Dan's wish to have GenBank personnel write code that parses the new features table and distribute it free. First, I would begin by reminding the community that the purpose of GenBank has been to collect and distribute data. The development of software was left to the community. The success of this policy can be seen in the development of several excellent packages, both by commercial firms and by the public. If you are privy to the problems of the Cambridge Small Molecule Crystallographic database, you know the pitfalls of linking software development to database distribution.
Thus it is consciously beyond the scope of the GenBank contract to develop software to parse the features table. Despite this position, it is obvious that some software, e.g. Jim Fickett's program to translate GenBank, is in use by GenBank and will have to be modified to parse the new features table. Although it is perfectly feasible to distribute this program, the problem for GenBank is one of support. That is, someone has to answer the many questions about installation, why the program will not work on a specific machine, modifications, etc. There is also the question of updates: code used internally is often modified to meet specific purposes, frequently associated with new releases. The bottom line is that the responsible distribution of "free" software can be an expensive proposition. Right now the only way for GenBank to meet this obligation would be to cut back on services elsewhere.

I hope these comments have been helpful. One of the gratifying things to me has been the community spirit among GenBank users and their willingness to distribute software that takes advantage of the GenBank resource. I trust this will continue in the future.

Jim Cassatt
roy@phri.nyu.edu (Roy Smith) (08/09/90)
CZJ@CU.NIH.GOV (Jim Cassatt) writes:
> I would begin by reminding the community that the purpose of GenBank
> has been to collect and distribute data. The development of software
> was left to the community.

I pretty much agree with Jim that the GenBank folks are better off trying to do well what they are supposed to do (and by and large, I think they do an admirable job), and not expending limited resources on projects outside their scope. I also agree that the job of a database maintainer is to distribute the data in a moderately raw form, not to over-package it with software that fits their pre-existing notion of what the data are going to be used for.

So, I guess I have to agree that it's not really GenBank's job to write the database parsing software Dan suggests. Given that, I propose that those of us who are interested (and I'm certainly one of them) get together privately by email and work on it. If folks from GB and/or IG want to be in on it, fine. If not, that's fine too. We could come up with a set of routines for parsing a locus and make them publicly available to whoever wants them on an "as is" basis.

If you are interested, write me and/or this mailing list/newsgroup. If enough people are interested, I'll set up a mailing list redistribution alias and we can start arguing about the problem. :-) If nobody is interested, I'll assume it was a dumb idea to begin with and drop it.
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane? Did you say arcane? It wouldn't be Unix if it wasn't arcane!"
kristoff@genbank.BIO.NET (David Kristofferson) (08/09/90)
Roy,

So that you can get input from the various database staff persons at LANL, EMBL, DDBJ, and IG, it might be a good idea to hold your discussions on this forum (assuming that people do become serious about really doing this and are not just philosophizing). I believe that people at all of the above locations read this group, and traffic has not been so overwhelming that it would create a nuisance. In fact, this is exactly the kind of issue that this newsgroup was designed for. I would be gratified to see this kind of communication between the databanks and the user community expanded greatly. Although it may not be our mission to write the software, it would be a big mistake not to tap into the accumulated expertise available here. I hope that the staff at the other databanks concur in this opinion.
--
Sincerely,
Dave Kristofferson
GenBank On-line Service Manager
kristoff@genbank.bio.net
smith@mcclb0.med.nyu.edu (08/09/90)
In article <9008081410.AA02507@alw.nih.gov>, CZJ@CU.NIH.GOV writes:
> As one of the Project Officers of the GenBank Contract, I thought it
> would be appropriate to respond to some of the issues raised by Dan
> Davison in his "I must have been out of the room" messages.
...
> Despite this position, it is obvious that some software, e.g. Jim
> Fickett's program to translate GenBank, is in use by GenBank and will
> have to be modified to parse the new features table. Although it is
> perfectly feasible to distribute this program, the problem for GenBank
> is one of support.

There is nothing morally wrong in distributing a program and saying that it is being provided as an example to aid development, and that it is distributed 'as-is'. This eliminates the support issue. If people have questions, they can post them to this, or some other list, for resolution. No one HAS to respond. Everyone can benefit from this, even the people at GenBank who may pick up an idea or two.

I think it is important to emphasize the concept of a 'community of peers'. People should be free to make good-faith 'submissions' of programs, even if those programs have drawbacks they are aware of. The community then decides whether it is able to use them; it is the individual's choice. It benefits no one to withhold a program when the sole reason for doing so is that one cannot provide 'phone support.
+---------------------------------------------------------------------------+
|Ross Smith, Cell Biology, NYU Medical Center, 550 First Ave., NYC, 10016|
|Phone: (212) 340-5356: FAX: (212) 340-8139 (Alternate NYUMC) (212) 340-7190|
|E-Mail: SMITH@NYUMED.BITNET (BITNET), SMITH@MCCLB0.MED.NYU.EDU (Internet)|
+---------------------------------------------------------------------------+
roy@alanine.phri.nyu.edu (Roy Smith) (08/09/90)
Dan Davison sent this to me and requested I forward it to this list:
----------------
> CZJ@CU.NIH.GOV (Jim Cassatt) writes:
> > I would begin by reminding the community that the purpose of GenBank
> > has been to collect and distribute data. The development of software
> > was left to the community.

No question.

> So, I guess I have to agree that it's not really
> GenBank's job to write the database parsing software Dan suggests

I did not say that. I said, paraphrasing: which of the people responsible for breaking all existing sw (and there is sw that reads the feature table, both commercial and pd/private) are going to *help* by doing something about the problems they've created?

> So, given that, I propose that those of us who are interested (and
> I'm certainly one of them) get together privately by email and work on it.
> If folks from GB and/or IG want to be in on it, fine. If not, that's fine
> too. We could come up with a set of routines for parsing a locus and make
> them publicly available to whoever wants them on an "as is" basis.
[...]

Senor J. Ramon has already very kindly offered his routines. I would also like to be part of the list Roy proposed, altho anyone who knows me or has seen my code knows that I can't program myself into a paper bag, and parsing is completely beyond me (but I'm trying!). Thanks again to both Senor Ramon and Roy on this.

dan
--
dr. dan davison/dept. of biochemical and biophysical sciences/univ. of Houston/4800 Calhoun/Houston,TX 77054-5500/davison@uh.edu/DAVISON@UHOU
Disclaimer: As always, I speak only for myself, and, usually, only to myself.
----------------
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane? Did you say arcane? It wouldn't be Unix if it wasn't arcane!"
kristoff@genbank.BIO.NET (David Kristofferson) (08/10/90)
> There is nothing morally wrong in distributing a program and saying that it
> is being provided as an example to aid development, and that it is
> distributed 'as-is'. This eliminates the support issue. If people have
> questions, they can post them to this, or some other list, for
> resolution.

I guess there is one more item to attend to ... "Support" means far more than just answering the phone. It also means updating the code so that it continues to serve as an example if the databank format changes. We are confronting this issue right now for Fickett's code, which generates GenPept. Any additional code, such as a features table parser, is subject to similar revisions. We could be nice guys and undertake these additional revisions as volunteer work, but then this would raise problems if other work to which we are currently committed did not get done. As Ross is aware through our collaboration on bionet.molbio.genbank.updates, we do take on these kinds of tasks but have to draw a line somewhere. This job is too big for us to bite off right now.
--
Sincerely,
Dave Kristofferson
GenBank On-line Service Manager
kristoff@genbank.bio.net