dbd%benden@LANL.GOV (Dan Davison) (11/20/88)
To take up where Peter Karp left off, I thought I would introduce myself, describe my relationship to the Bio-matrix effort, and outline my research interests. I'm a postdoc with the Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM (USA). My chief research interests are sequence alignment algorithms, codon usage in procaryotes, and the development of a sequence similarity database. The group here is also the home of GenBank. Combined with the Lab's many Crays and some search code, that means I can get a lot of searching done!

I have also been part of the Bio-matrix effort. I'm on the executive committee, run the mailing list you are now reading, maintain the archive-server, and edit the hardcopy newsletter. The newsletter is printed with help from the Santa Fe Institute, organizers of the original Bio-matrix workshop in the summer of 1987. I view this part of my work as being a sort of cheerleader and agitator: I try to get discussion going and to make scientists of all types aware of the Bio-matrix ideas and efforts.

My latest scientific effort is an examination of codon usage in extrachromosomal genes expressed in E. coli, such as plasmid, transposon, and insertion sequence genes. It has often been claimed in the literature that these genes follow an "evolutionary strategy": specifically, to interfere with the host as little as possible. If this is true, the codons used by these reading frames should be the ones E. coli uses less frequently. Since GenBank now contains some 20% of the E. coli sequence (although not all of it is from K12, of course), there is enough information to test this claim statistically. [I should add that I have made the same claim in the past.] I have completed the computer runs but have not yet fully analyzed the results. However, an initial scan suggests that there is *not* a genome strategy for extrachromosomal genes.
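For readers who want to see what such a test might look like, here is a minimal sketch, assuming in-frame coding sequences and a one-sided permutation test. That test is my illustrative choice, not necessarily the method used in the runs described above. All function names are hypothetical. The idea is to score each gene by the average host frequency of the codons it uses (a lower score means rarer codons), then ask whether extrachromosomal genes score lower than host genes more often than chance would predict.

```python
import random
from collections import Counter


def codons(seq):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_frequencies(sequences):
    """Relative frequency of each codon, pooled over a set of host reference genes."""
    counts = Counter()
    for s in sequences:
        counts.update(codons(s))
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def mean_host_frequency(gene, host_freq):
    """Average host frequency of the codons a gene uses (lower = rarer codons).

    Codons never seen in the host reference set count as frequency 0.
    """
    cs = codons(gene)
    if not cs:
        raise ValueError("gene has no complete codons")
    return sum(host_freq.get(c, 0.0) for c in cs) / len(cs)


def permutation_test(a, b, n_iter=10000, seed=0):
    """One-sided permutation test of whether mean(a) is lower than mean(b).

    a: scores for extrachromosomal genes; b: scores for host genes.
    Returns an estimated p-value (small = a uses rarer codons than chance predicts).
    """
    rng = random.Random(seed)
    observed = sum(b) / len(b) - sum(a) / len(a)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if sum(pb) / len(pb) - sum(pa) / len(pa) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)
```

In practice the host frequencies would come from the E. coli entries in GenBank, and each extrachromosomal reading frame would be scored with `mean_host_frequency`. A large p-value from `permutation_test` would be consistent with there being no "use rare codons" strategy.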
Another project that has recently become tractable is a similarity database for GenBank. I am currently designing the data structure to hold the information. I would like it to be as flexible as possible: readable by a person as well as by a program, yet organized so that retrieving information from the db stays as simple as possible. These goals are somewhat contradictory, as I've found out lately. We will use the Crays here at the Lab to generate the initial db, and Suns with information retrieval software to access those files. There is another problem with such a database: to avoid spending several hundred Cray hours each time GenBank is updated, the db must be *very* easy to update. All of this is quite a challenge for a biologist!

I would welcome any comments or suggestions on anything discussed in this note.

dan davison
dd@lanl.gov
theoretical biology t-10
ms k710
los alamos national laboratory
los alamos, nm 87545