[bionet.molbio.bio-matrix] research summary, D. Davison

dbd%benden@LANL.GOV (Dan Davison) (11/20/88)

To take up where Peter Karp left off, I thought I would introduce myself,
my relationship to the bio-matrix effort, and my research interests.

I'm a postdoc with the Theoretical Biology and Biophysics Group, Los Alamos
National Laboratory, Los Alamos, NM (USA).   My chief research interests
are sequence alignment algorithms, codon usage in procaryotes, and the 
development of a sequence similarity database.  The group here is also the
home of GenBank; combined with the Lab's many Crays and some search code,
that means I can get a lot of searching done!

I have also been a part of the Bio-matrix effort; I'm on the executive
committee, run the mailing list you are now reading, maintain the
archive-server, and edit the hardcopy newsletter.  The newsletter is
printed with help from the Santa Fe Institute, organizers of the
original Bio-matrix workshop in the summer of 1987.  I view this part
of my work as that of a cheerleader and agitator: I try to get discussion
going and to make all types of scientists aware of the bio-matrix
ideas and efforts.

My latest scientific effort is an examination of codon usage in
extrachromosomal genes expressed in E. coli, such as plasmid, transposon,
and insertion sequence genes.  It has often been claimed in the literature
that there is an "evolutionary strategy" for these genes--specifically,
to interfere with the host as little as possible.  If this is true, these
reading frames should preferentially use the codons that E. coli itself
uses least often.  Since GenBank now contains some 20% of
the E. coli sequence (although not all of it is from K12, of course) there
is enough information to test this claim statistically.  [I should add
that I have made the same claim in the past.]  I have completed the
computer runs but not yet completely analyzed the results.  However,
an initial scan suggests that there is *not* a genome strategy for
extrachromosomal genes.
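
To make the comparison concrete, here is a minimal sketch in Python of one
way such a test could be set up.  It is not the code used for the runs
described above; the function names and the summary measure (the mean host
frequency of the codons a gene uses) are assumptions made for illustration.
Under the "evolutionary strategy" hypothesis, extrachromosomal genes would
score lower on this measure than typical chromosomal genes.

```python
from collections import Counter


def codon_counts(orf):
    """Count codons in one reading frame; trailing partial codons are ignored."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return Counter(orf[i:i + 3] for i in range(0, usable, 3))


def pooled_frequencies(orfs):
    """Relative codon frequencies pooled over a set of host reading frames."""
    total = Counter()
    for orf in orfs:
        total.update(codon_counts(orf))
    n = sum(total.values())
    if n == 0:
        raise ValueError("no complete codons in the host set")
    return {codon: count / n for codon, count in total.items()}


def mean_host_frequency(orf, host_freq):
    """Average host frequency of the codons used in orf.

    Codons never seen in the host set count as frequency 0 (an assumption).
    """
    counts = codon_counts(orf)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("reading frame contains no complete codon")
    return sum(host_freq.get(c, 0.0) * k for c, k in counts.items()) / n
```

A real analysis would compare the distribution of this score (or a
chi-square statistic on the codon tables) between the chromosomal and
extrachromosomal sets, and would have to treat each amino acid's
synonymous codons separately rather than pooling all codons as done here.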

Another project which has recently become tractable is a similarity database
for GenBank.  I am currently designing the data structure to hold
the information; I would like the structure to be as flexible as possible,
one that a person can read as easily as a program can, yet
one that keeps retrieval of information from the db as simple as possible.
These goals are somewhat contradictory, as I've found out lately.  We
will use the Crays here at the Lab to generate the initial db, and use
Suns with information retrieval software to access those files.  There is
another problem with such a database: in order to avoid having to spend
several hundred Cray hours each time GenBank is updated, the db must
be *very* easy to update.
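
As a rough illustration of these constraints, the Python sketch below
assumes a deliberately simple layout: one tab-separated text line per pair
of sequences, which a person can read and a program can parse.  The record
fields, the function names, and the sequence identifiers are all
hypothetical and are not the design under development.  It also assumes
that similarity scores are symmetric, so that each pair is stored once.

```python
def format_record(query, target, score):
    """One similarity record as a tab-separated text line."""
    return f"{query}\t{target}\t{score:.1f}"


def parse_record(line):
    """Inverse of format_record: returns (query, target, score)."""
    query, target, score = line.rstrip("\n").split("\t")
    return query, target, float(score)


def pairs_to_compute(existing_ids, new_ids):
    """Pairs that need a comparison after new entries are added.

    Only pairs involving at least one new entry are listed; pairs among
    existing entries are assumed to be in the database already.
    """
    existing = sorted(existing_ids)
    new = sorted(new_ids)
    pairs = [(n, e) for n in new for e in existing]
    pairs += [(a, b) for i, a in enumerate(new) for b in new[i + 1:]]
    return pairs
```

With this layout an update only appends records for the pairs returned by
pairs_to_compute, so the cost grows with the number of new entries rather
than requiring the whole database to be regenerated.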

All of this is quite a challenge for a biologist!  I would welcome any
comments or suggestions on anything discussed in this note.

dan davison
dd@lanl.gov
theoretical biology
t-10 ms k710
los alamos national laboratory
los alamos, nm 87545