cb%intron@LANL.GOV (03/10/88)
From: cb%intron@LANL.GOV (Christian Burks)

From @bionet-20.arpa,@CUNYVM.CUNY.EDU:sys_ms@bmc1.uu.se Wed Mar 9 22:09:00 1988
Received: from LANL.GOV (beta.lanl.gov) by intron.lanl.gov (3.2/5.17) id AA06237; Wed, 9 Mar 88 22:08:54 MST
Received: by LANL.GOV (5.54/1.14) id AA05927; Wed, 9 Mar 88 22:03:22 MST
Message-Id: <8803100503.AA05927@LANL.GOV>
Received: from CUNYVM.CUNY.EDU by BIONET-20.ARPA with TCP; Wed 9 Mar 88 20:03:40-PST
Received: from max.uu.se by CUNYVM.CUNY.EDU ; Wed, 09 Mar 88 16:29:46 EST
Received: from bmc1.uu.se by max.uu.se; Wed, 9 Mar 88 21:55 MET
Date: Wed, 9 Mar 88 21:54 MET
From: sys_ms@bmc1.uu.se
Subject: Some thoughts on database formats
To: bio-matrix@bionet-20.arpa
Status: R

I am trying to implement a relational database for nucleic acid and protein sequences. While doing this I have come to the conclusion that the current format of the EMBL and GenBank databases is not optimal as input to programs that load my database.

As an example, to read the keyword lines from the EMBL tape you consult the documentation, which says that keywords are separated by semicolons and that the last keyword is followed by a period. Simple. You run the program you just implemented, which checks for semicolons and periods, only to find that the keyword "4.5S RNA" is there. You end up with the keyword "4" in the database, and you know you should have done it another way.

I have heard that both GenBank and EMBL are trying to put their databases into relational database handlers. I would therefore suggest that they consider distributing their data in a simple tabular form that corresponds to the relational structures they implement. It would be much simpler and less error-prone to load other databases from such files.

Mats Sundvall, Biomedical Center, University of Uppsala, Sweden
mats@bmc1.uu.se
sysdan@semax51.bitnet

*******************************************************************

Dr. Sundvall, thank you for your thoughts... I'll pass them along to our group.
I've mailed you a copy of our current (in progress) manuscript describing a relational schema for the nucleotide sequence databases; any feedback would be welcome. I'll also ask Tom Marr to put you on the mailing list for the next version of this document.

Christian Burks
GenBank
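
The keyword-line pitfall described in the original message can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the sample line and the entry identifier "ENTRY1" are made up, and the only format rule used is the one quoted in the message (keywords separated by semicolons, the last one followed by a period). It is not the actual EMBL or GenBank loader code. The last function shows one possible reading of the "simple tabular form" suggestion, writing one (entry, keyword) row per line with a tab between the fields.

```python
import csv
import io
import re


def parse_keywords_naive(line):
    """Treat every ';' and every '.' as a terminator (the pitfall)."""
    parts = re.split(r"[;.]", line)
    return [p.strip() for p in parts if p.strip()]


def parse_keywords(line):
    """Split on ';' only, and drop just the single trailing period."""
    text = line.strip()
    if text.endswith("."):
        text = text[:-1]
    return [k.strip() for k in text.split(";") if k.strip()]


def keyword_rows_tsv(entry_id, keywords):
    """Render (entry_id, keyword) pairs as tab-separated rows.

    The two-column layout is an assumed example of a relational
    'keyword' table, not a published schema.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for kw in keywords:
        writer.writerow([entry_id, kw])
    return buf.getvalue()


if __name__ == "__main__":
    example = "4.5S RNA; ribosomal RNA."   # made-up keyword line
    print(parse_keywords_naive(example))   # ['4', '5S RNA', 'ribosomal RNA']
    print(parse_keywords(example))         # ['4.5S RNA', 'ribosomal RNA']
    print(keyword_rows_tsv("ENTRY1", parse_keywords(example)), end="")
```

Stripping only the final period keeps "4.5S RNA" intact, but the parser still assumes that no keyword contains a semicolon. In the tab-separated rows, punctuation inside a keyword no longer acts as a delimiter, which is the practical advantage of the tabular distribution format suggested above.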