[sci.bio] Software for automated subsequence extraction

eesnyder@boulder.Colorado.EDU (Eric E. Snyder) (04/28/91)

I am looking for some software that will allow me to extract subsequences
from genbank or PIR.

For example, I would like to be able to provide a keyword such as 'splice
site' and have the program search genbank and return with a list of sequence
names and the subsequence from each entry corresponding to my keyword.

Any leads would be appreciated....
Thanks,
---------------------------------------------------------------------------
TTGATTGCTAAACACTGGGCGGCGAATCAGGGTTGGGATCTGAACAAAGACGGTCAGATTCAGTTCGTACTGCTG
Eric E. Snyder                            
Department of MCD Biology              ...making feet for children's shoes.
University of Colorado, Boulder   
Boulder, Colorado 80309-0347
LeuIleAlaLysHisTrpAlaAlaAsnGlnGlyTrpAspLeuAsnLysAspGlyGlnIleGlnPheValLeuLeu
---------------------------------------------------------------------------

toms@fcs260c2.ncifcrf.gov (Tom Schneider) (04/30/91)

In article <eesnyder.672776972@beagle> eesnyder@boulder.Colorado.EDU (Eric E. Snyder)
writes:
>I am looking for some software that will allow me to extract subsequences
>from genbank or PIR.

The Delila system, old and senile as it is, was designed to extract large
sets of subsequences (DNA only).

>For example, I would like to be able to provide a keyword such as 'splice
>site' and have the program search genbank and return with a list of sequence
>names and the subsequence from each entry corresponding to my keyword.

Because Delila was designed before GenBank, and GenBank structure is STILL not
up to snuff, one must convert from GenBank to Delila format.  The conversion is
done by a simple program called dbbk (written by Matt Yarus, son of Mike Yarus,
you may be interested to know!).  The Delila viewpoint is that the database
consists of a
set of organisms and their chromosomes.  You must specify these, and then the
piece of DNA you are interested in.  The piece corresponds roughly to a GenBank
entry.  The idea is that Delila is a 'librarian' and you give 'her'
instructions that define the fragments you want.  She reaches into the library
and pulls out -- what else? -- a book.  Instructions might look like:

    title 'Demonstration of Delila instructions';
    (* the title is required to name the resulting book *)
    (* this is a comment, just as in the computer language Pascal *)

    organism H.sapiens; (* define the organism *)
    chromosome 3; (* I made this name up; unfortunately GenBank hasn't
                     stored this information consistently *)
    piece x253; (* I made this name up also *)

    get from 536 -24 to 536 +30;

The last instruction, 'get', tells Delila that you want the fragment
that starts 24 bases before coordinate 536 and ends 30 bases after it.
Because the instructions are written in a file, one can process many of
them in a single run.
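
For anyone who wants to check the arithmetic, here is a minimal Python sketch
of what that 'get' works out to.  It is not part of Delila: the function name
get_fragment and all of its parameters are made up for illustration, and it
assumes a linear piece read in the forward direction with inclusive
coordinates, where first_coordinate is the coordinate of the first stored base.

```python
def get_fragment(sequence, first_coordinate, from_base, from_offset,
                 to_base, to_offset):
    """Return (start, end, bases) for a request of the form
    'get from <from_base> <from_offset> to <to_base> <to_offset>;'.

    Assumptions (not taken from Delila itself): the piece is linear,
    coordinates are inclusive, and the fragment is read left to right.
    """
    start = from_base + from_offset   # e.g. 536 - 24 = 512
    end = to_base + to_offset         # e.g. 536 + 30 = 566
    if start > end:
        raise ValueError("fragment start lies after its end")

    # Convert piece coordinates to 0-based Python string indices.
    low = start - first_coordinate
    high = end - first_coordinate + 1
    if low < 0 or high > len(sequence):
        raise ValueError("fragment runs off the end of the piece")

    return start, end, sequence[low:high]


# A made-up 800-base piece whose first base has coordinate 1.
piece = "ACGT" * 200
start, end, bases = get_fragment(piece, 1, 536, -24, 536, +30)
```

With the numbers from the example, the fragment runs from coordinate 512 to
coordinate 566 inclusive, which is 24 + 1 + 30 = 55 bases.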

There is now a program that automatically creates Delila instructions from the
GenBank features.  This has allowed us to create hundreds to thousands of
fragments for statistical analysis.
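
As a rough illustration of the idea (not the actual program, whose name and
workings are not given here), the Python sketch below writes one 'get'
instruction per site from coordinates that have already been pulled out of the
feature annotations.  The function name delila_instructions, the tuple layout,
and the second site in the example are all hypothetical.  The output imitates
only the instruction syntax shown above, and repeating the organism,
chromosome and piece lines for every site is an assumption about what Delila
will accept.

```python
def delila_instructions(title, sites, before=24, after=30):
    """Build Delila-style instruction text for a list of sites.

    Each site is a hypothetical (organism, chromosome, piece, coordinate)
    tuple; 'before' and 'after' are the numbers of flanking bases to keep.
    """
    lines = ["title '%s';" % title]
    for organism, chromosome, piece, coordinate in sites:
        lines.append("organism %s;" % organism)
        lines.append("chromosome %s;" % chromosome)
        lines.append("piece %s;" % piece)
        lines.append("get from %d -%d to %d +%d;"
                     % (coordinate, before, coordinate, after))
    return "\n".join(lines) + "\n"


# Made-up sites; in practice these would come from the feature annotations.
example_sites = [
    ("H.sapiens", "3", "x253", 536),
    ("H.sapiens", "3", "x254", 1201),
]
instructions = delila_instructions("Splice site fragments", example_sites)
```

Writing the returned text to a file gives an instruction file with one
fragment request per annotated site.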

Parts of the Delila system are available by anonymous ftp from
ncifcrf.gov in pub/delila.  See the README files.  I will place more
programs in the archive if you request them.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov