[bionet.molbio.bio-matrix] relational form of nucleotide sequences

cb%intron@LANL.GOV (03/10/88)

From: cb%intron@LANL.GOV (Christian Burks)

From @bionet-20.arpa,@CUNYVM.CUNY.EDU:sys_ms@bmc1.uu.se Wed Mar  9 22:09:00 1988
Received: from LANL.GOV (beta.lanl.gov) by intron.lanl.gov (3.2/5.17)
	id AA06237; Wed, 9 Mar 88 22:08:54 MST
Received: by LANL.GOV (5.54/1.14)
	id AA05927; Wed, 9 Mar 88 22:03:22 MST
Message-Id: <8803100503.AA05927@LANL.GOV>
Received: from CUNYVM.CUNY.EDU by BIONET-20.ARPA with TCP; Wed 9 Mar 88 20:03:40-PST
Received: from max.uu.se by CUNYVM.CUNY.EDU ; Wed, 09 Mar 88 16:29:46 EST
Received: from bmc1.uu.se by max.uu.se; Wed, 9 Mar 88 21:55 MET
Date: Wed, 9 Mar 88 21:54 MET
From: sys_ms@bmc1.uu.se
Subject: Some thoughts on database formats
To: bio-matrix@bionet-20.arpa
Status: R


I am trying to implement a relational database for nucleic acid and
protein sequences. While doing this I have come to the conclusion that
the current form of the EMBL and GenBank databases is not an optimal
input format for the programs that load my database.

As an example, consider reading the keyword lines from the EMBL tape.
You read the documentation and it says: keywords are separated by
semicolons and the last keyword is followed by a period. Simple.

You run the program you just implemented, which checks for semicolons
and periods, only to find out that the keyword "4.5S RNA" is there.
You end up with the keyword "4" in the database, and you know you
should have done it another way.
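
The pitfall can be sketched in a few lines of Python. This is only an
illustration, not the loader described above: the function names are
hypothetical, the second keyword in the sample line is made up, and the
sketch assumes the whole keyword list sits on a single line with any
line code already stripped.

```python
def naive_keywords(line):
    # Buggy reading of the rule: the FIRST period ends the keyword list,
    # so a period inside a keyword truncates everything after it.
    body = line.split(".", 1)[0]
    return [k.strip() for k in body.split(";") if k.strip()]


def keywords(line):
    # Safer reading: only a period at the very end of the line is the
    # terminator; periods inside a keyword are left alone.
    body = line.rstrip()
    if body.endswith("."):
        body = body[:-1]
    return [k.strip() for k in body.split(";") if k.strip()]


# Sample line; "4.5S RNA" is from the text, the second keyword is invented.
line = "4.5S RNA; signal recognition particle."
naive_result = naive_keywords(line)
fixed_result = keywords(line)
```

Even the second version is fragile: it would still fail on a keyword
that itself contains a semicolon, which is the general weakness of
delimiter-in-text formats.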

I have heard that both GenBank and EMBL are trying to put their
databases into relational database management systems. I would
suggest that they consider distributing their data in a simple tabular
form that corresponds to the relational structures they implement.

It would be much simpler and less error-prone to load other
databases from such files.
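
To illustrate why a tabular form is easier to load, here is a minimal
Python sketch. Everything in it is an assumption made for the example:
the tab-separated layout (one row per entry/keyword pair), the table
and column names, and the sample identifier "ENTRY1". It does not
describe any format that GenBank or EMBL actually distribute.

```python
import csv
import io
import sqlite3

# Hypothetical tab-separated dump: entry identifier, then one keyword.
SAMPLE = "ENTRY1\t4.5S RNA\nENTRY1\tsignal recognition particle\n"


def load_keywords(conn, handle):
    # One file row maps directly onto one table row; no punctuation
    # inside a keyword needs to be interpreted.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry_keyword (entry_id TEXT, keyword TEXT)"
    )
    rows = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    conn.executemany("INSERT INTO entry_keyword VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_keywords(conn, io.StringIO(SAMPLE))
stored = [
    row[0]
    for row in conn.execute("SELECT keyword FROM entry_keyword ORDER BY rowid")
]
```

Because the field separator is a tab rather than a character that can
occur inside a keyword, "4.5S RNA" arrives in the table intact.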


        Mats Sundvall,
        Biomedical Center,
        University of Uppsala,
        Sweden

        mats@bmc1.uu.se
        sysdan@semax51.bitnet



*******************************************************************

Dr. Sundvall,

Thank you for your thoughts... I'll pass them along to our group.

I've mailed you a copy of our current (in progress) manuscript describing
a relational schema for the nucleotide sequence databases...any feedback
would be welcome.  I'll ask Tom Marr to put you on the mailing list for
the next version of this document, also.

                                    Christian Burks
                                    GenBank