sys_ms@bmc1.uu.se (03/10/88)
From: sys_ms@bmc1.uu.se

I am trying to implement a relational database for nucleic acid and protein sequences. While doing this I have come to the conclusion that the current form of the EMBL and Genbank databases is not an optimal input format for programs that load my database.

As an example, when reading the keyword lines from the EMBL tape, you consult the documentation and it says: keywords are separated by semicolons and the last keyword is followed by a period. Simple. You run the program you just implemented, which checks for semicolons and periods, only to find that the keyword "4.5S RNA" is there. You end up with the keyword "4" in the database, and you know you should have done it another way.

I have heard that both Genbank and EMBL are trying to move their databases into relational database systems. I would suggest that they consider distributing their data in a simple tabular form that corresponds to the relational structures they implement. It would be much simpler and less error-prone to load other databases from such files.

Mats Sundvall, Biomedical Center, University of Uppsala, Sweden
mats@bmc1.uu.se
sysdan@semax51.bitnet
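
The keyword pitfall described in this post can be illustrated with a minimal Python sketch. It is not taken from the post or from any EMBL or Genbank software. The function names and the sample keyword text are hypothetical, and the sketch assumes that the keyword lines of one entry have already been joined into a single string with the line tags removed. It follows only the rule quoted above: semicolons separate keywords and one period ends the list.

```python
def naive_parse(kw_text):
    """Flawed approach: treat the first period anywhere as the end of the list."""
    text = kw_text.split(".")[0]
    return [kw.strip() for kw in text.split(";") if kw.strip()]


def parse_keywords(kw_text):
    """Split a keyword field on semicolons, removing only the final period.

    Assumption: the field ends with exactly one terminating period, and
    any other period belongs to a keyword (as in "4.5S RNA").
    """
    text = kw_text.strip()
    # Strip only the single trailing period, never periods inside keywords.
    if text.endswith("."):
        text = text[:-1]
    # Semicolons are the only separators; skip empty fragments.
    return [kw.strip() for kw in text.split(";") if kw.strip()]


# Hypothetical keyword text, for illustration only.
sample = "4.5S RNA; ribosomal RNA."

print(naive_parse(sample))     # ['4']
print(parse_keywords(sample))  # ['4.5S RNA', 'ribosomal RNA']
```

This sketch would still damage a final keyword that itself ends in a period, which is one more reason a tabular distribution with one keyword per row would probably be easier to load reliably.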