[comp.archives] [genbank] Re: Software for automated subsequence extraction

toms@fcs260c2.ncifcrf.gov (Tom Schneider) (05/08/91)

Archive-name: bionet/molbio/delila/1991-04-30
Archive-directory: ncifcrf.gov:/pub/delila/ [129.43.1.11]
Original-posting-by: toms@fcs260c2.ncifcrf.gov (Tom Schneider)
Original-subject: Re: Software for automated subsequence extraction
Reposted-by: emv@msen.com (Edward Vielmetti, MSEN)

In article <eesnyder.672776972@beagle> eesnyder@boulder.Colorado.EDU (Eric E. Snyder)
writes:
>I am looking for some software that will allow me to extract subsequences
>from genbank or PIR.

The Delila system, old and senile as it is, was designed to extract large
sets of subsequences (DNA only).

>For example, I would like to be able to provide a keyword such as 'splice
>site' and have the program search genbank and return with a list of sequence
>names and the subsequence from each entry corresponding to my keyword.

Because Delila was designed before GenBank, and GenBank structure is STILL not
up to snuff, one must convert from GenBank to Delila format.  This is done by a
simple program called dbbk (written by Matt Yarus, son of Mike Yarus, you may
be interested to know!).  The Delila viewpoint is that the database consists of a
set of organisms and their chromosomes.  You must specify these, and then the
piece of DNA you are interested in.  The piece corresponds roughly to a GenBank
entry.  The idea is that Delila is a 'librarian' and you give 'her'
instructions that define the fragments you want.  She reaches into the library
and pulls out -- what else? -- a book.  Instructions might look like:

    title 'Demonstration of Delila instructions';
    (* the title is required to name the resulting book *)
    (* this is a comment, just as in the computer language Pascal *)

    organism H.sapiens; (* define the organism *)
    chromosome 3; (* I made this name up; unfortunately GenBank hasn't
                     stored this information consistently *)
    piece x253; (* I made this name up also *)

    get from 536 -24 to 536 +30;

The last instruction, 'get', tells Delila that you want the fragment
that starts 24 bases before coordinate 536 and ends 30 bases after it.
Because the instructions are written in a file, one can handle many of them
at once.
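
To make the arithmetic of the 'get' instruction concrete, here is a minimal
Python sketch.  It is not part of Delila.  The names parse_get and extract, the
toy sequence, and the assumptions that both ends are inclusive and that the
piece begins at coordinate 1 are all hypothetical, and it handles only the one
form of 'get' shown above.

```python
import re

# Matches only the single form used in the example:
#   get from <coordinate> <offset> to <coordinate> <offset>;
GET_PATTERN = re.compile(
    r"get\s+from\s+(\d+)\s*([+-]\d+)\s+to\s+(\d+)\s*([+-]\d+)\s*;"
)


def parse_get(instruction):
    """Return the (start, end) coordinates named by a simple 'get' instruction."""
    match = GET_PATTERN.fullmatch(instruction.strip())
    if match is None:
        raise ValueError(f"unsupported instruction: {instruction!r}")
    from_base, from_offset, to_base, to_offset = (int(g) for g in match.groups())
    return from_base + from_offset, to_base + to_offset


def extract(sequence, start, end, first_coordinate=1):
    """Slice the inclusive range [start, end] out of a sequence.

    first_coordinate is the coordinate of the first base in the string
    (assumed to be 1 here; a real piece may be numbered differently).
    """
    lo = start - first_coordinate
    hi = end - first_coordinate + 1
    if lo < 0 or hi > len(sequence) or lo >= hi:
        raise ValueError("range falls outside the sequence")
    return sequence[lo:hi]


start, end = parse_get("get from 536 -24 to 536 +30;")
toy_piece = "ACGT" * 150  # 600 made-up bases, not real data
fragment = extract(toy_piece, start, end)
print(start, end, len(fragment))  # prints: 512 566 55
```

Under these assumptions the fragment runs from coordinate 512 to 566, which is
55 bases: 24 before, the base at 536 itself, and 30 after.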

There is now a program that automatically creates Delila instructions from the
GenBank features.  This has allowed us to create hundreds to thousands of
fragments for statistical analysis.
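
As a rough illustration of generating instructions in bulk, the Python sketch
below writes one 'get' per feature coordinate.  It is not the program mentioned
above.  The function name delila_instructions, the piece names, and the
coordinates are invented for the example, and the output layout is modeled only
on the sample instructions shown earlier, so treat it as a guess at the shape
of such a file rather than a tested Delila input.

```python
def delila_instructions(title, organism, chromosome, pieces, before=24, after=30):
    """Build instruction text with one 'get' per feature coordinate.

    pieces maps a piece name to a list of feature coordinates (for example,
    splice site positions taken from a feature table).
    """
    lines = [
        f"title '{title}';",
        f"organism {organism};",
        f"chromosome {chromosome};",
    ]
    for piece, coordinates in pieces.items():
        lines.append(f"piece {piece};")
        for coordinate in coordinates:
            lines.append(
                f"get from {coordinate} -{before} to {coordinate} +{after};"
            )
    return "\n".join(lines) + "\n"


# Hypothetical feature positions; none of these come from a real entry.
features = {"x253": [536, 1210], "x254": [88]}
text = delila_instructions("Splice site fragments", "H.sapiens", "3", features)
print(text)
```

Writing the result to a file gives one instruction set that covers every
feature, which is the point of keeping instructions in a file in the first
place.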

Parts of the Delila system are available by anonymous ftp from
ncifcrf.gov in pub/delila.  See the README files.  I will place more
programs in the archive if you request them.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov

-- comp.archives file verification
ncifcrf.gov
total 1111
-rw-r--r--  1 140         42225 May  1 11:16 sites.c.Z
-rw-r--r--  1 140         36808 May  1 11:15 sites.p.Z
-rw-r--r--  1 140          5478 Apr 29 17:45 ribo.logo.Z
-rw-r--r--  1 140         14006 Apr 29 13:55 siva.p.Z
-rw-r--r--  1 140           128 Apr 29 12:37 sivap.Z
-rw-r--r--  1 140           227 Apr 29 12:36 README.delila.Z
-rw-r--r--  1 140          2800 Apr 29 11:04 bez.ps.Z
-rw-r--r--  1 140           271 Apr 29 10:52 README.misc.Z
-rw-r--r--  1 140          1789 Apr 29 10:51 dops.demo.Z
-rw-r--r--  1 140         30365 Apr 29 10:47 dops.p.Z
-rw-r--r--  1 140          1789 Apr 29 10:47 demo.Z
-rw-r--r--  1 140          2692 Apr 17 17:38 standard.t7.Z
-rw-r--r--  1 140          1875 Apr 17 17:35 database.t7.Z
-rw-r--r--  1 140          3311 Apr  9 09:56 vaxmod.p.Z
-rw-r--r--  1 140          1589 Apr  2 17:17 cell.sty.Z
-rw-r--r--  1 140         13642 Apr  2 17:17 cell.bst.Z
-rw-r--r--  1 140          3649 Apr  2 12:36 decat.c.Z
-rw-r--r--  1 140          1864 Apr  2 12:36 decat.p.Z
-rw-r--r--  1 140         15223 Mar 26 13:58 calc.c.Z
-rw-r--r--  1 140         13686 Mar 26 13:58 calc.p.Z
-rw-r--r--  1 140            15 Mar 23 06:21 catalp
-rw-r--r--  1 140         12886 Mar 21 17:57 worcha.p.Z
-rw-r--r--  1 140          2660 Mar 20 14:18 titer.verbose.Z
-rw-r--r--  1 140           625 Mar 20 14:17 titer.result.Z
-rw-r--r--  1 140         12384 Mar 20 14:17 titer.c.Z
-rw-r--r--  1 140         10694 Mar 20 14:17 titer.p.Z
-rw-r--r--  1 140          4661 Mar 20 14:12 titer.plates.Z
-rw-r--r--  1 140          2722 Mar 20 12:34 tkod.p.Z
-rw-r--r--  1 140            87 Mar 20 10:31 dirtyp.Z
-rw-r--r--  1 140          5008 Mar 20 10:31 dirty.p.Z
-rw-r--r--  1 140          1517 Mar 19 18:17 ww.c.Z
-rw-r--r--  1 140          1271 Mar 19 18:16 ww.p.Z
-rw-r--r--  1 140         34749 Mar 19 14:11 search.p.Z
-rw-r--r--  1 140         15296 Mar 15 14:11 README
-rw-r--r--  1 140          3319 Mar  8 18:01 nulldate.p.Z
-rwxr--r--  1 140         10147 Mar  8 17:38 nulldate.Z
-rw-r--r--  1 140         13765 Mar  6 12:35 module.p.Z
-rw-r--r--  1 140          3986 Mar  6 12:34 moddef.Z
-rw-r--r--  1 140         46524 Feb 15 12:40 xyplo.c.Z
-rw-r--r--  1 140          7438 Feb 15 12:31 dalvec.c.Z
-rw-r--r--  1 140           213 Feb 15 12:31 README.logo.Z
-rwxr--r--  1 140           196 Feb 15 12:29 2c.Z
-rw-r--r--  1 140          5865 Feb 15 12:23 alpro.c.Z
-rw-r--r--  1 140         25779 Feb 15 12:21 makelogo.c.Z
-rw-r--r--  1 140            29 Feb 15 12:14 marks.Z
-rw-r--r--  1 140         37858 Feb 14 15:14 libdef.Z
-rw-r--r--  1 140            10 Feb 11 11:21 genhisp
-rw-r--r--  1 140         10587 Feb 11 11:21 genhis.p.Z
-rw-r--r--  1 140         22198 Jan 18 14:56 makelogo.p.Z
-rw-r--r--  1 140           803 Jan 16 20:17 makelogop.dna.Z
-rw-r--r--  1 140           808 Jan 16 20:17 makelogop.ribo.Z
-rw-r--r--  1 140           945 Jan 16 20:17 makelogop.protein.Z
-rw-r--r--  1 140           803 Jan 16 20:17 makelogop.lambcro.Z
-rw-r--r--  1 140           838 Jan 16 20:17 makelogop.demo.Z
-rw-r--r--  1 140           790 Jan 16 20:16 makelogop.alphabet.Z
-rw-r--r--  1 140          8094 Jan 16 14:40 nat.bst.Z
-rw-r--r--  1 140         18717 Jan 11 16:39 encode.p.Z
-rw-r--r--  1 140         15571 Jan 11 16:39 alist.p.Z
-rw-r--r--  1 140          3895 Jan  4 10:01 ver.p.Z
-rw-r--r--  1 140         20030 Dec 19 17:00 rsgra.p.Z
-rw-r--r--  1 140           214 Dec 14 13:27 listerp.Z
-rw-r--r--  1 140          8349 Dec 14 13:27 count.p.Z
-rw-r--r--  1 140         19010 Dec 14 13:27 lister.p.Z
-rw-r--r--  1 140          5404 Dec 10 12:12 p2c.h.Z
-rw-r--r--  1 140          7233 Dec  9 16:47 calhnb.p.Z
-rw-r--r--  1 140          7339 Dec  9 15:59 t7.logo.Z
-rw-r--r--  1 140          6172 Dec  9 15:57 dalvec.p.Z
-rw-r--r--  1 140           871 Dec  9 15:29 rf.p.Z
-rw-r--r--  1 140          4982 Dec  9 14:05 alpro.p.Z
-rw-r--r--  1 140          4018 Dec  8 20:18 lambcro.logo.Z
-rw-r--r--  1 140          5101 Dec  8 20:16 globin.logo.Z
-rw-r--r--  1 140           806 Dec  8 19:07 makelogopo.ribo.Z
-rw-r--r--  1 140           156 Dec  8 18:11 colors.dna.Z
-rw-r--r--  1 140           854 Dec  5 23:05 xyplop.test.Z
-rw-r--r--  1 140          1404 Dec  5 23:05 xyplop.mul.Z
-rw-r--r--  1 140          1448 Dec  5 23:05 xyplop.demo.Z
-rw-r--r--  1 140          1448 Dec  5 23:05 xyplop.Z
-rw-r--r--  1 140         41792 Dec  5 23:05 xyplo.p.Z
-rw-r--r--  1 140           553 Dec  5 23:05 xyin.test.Z
-rw-r--r--  1 140           410 Dec  5 23:05 xyin.mul.Z
-rw-r--r--  1 140           655 Dec  5 23:05 xyin.demo.Z
-rw-r--r--  1 140           655 Dec  5 23:05 xyin.Z
-rw-r--r--  1 140          4607 Dec  5 23:05 sortbibtex.p.Z
-rw-r--r--  1 140          6918 Dec  5 23:05 ref2bib.p.Z
-rw-r--r--  1 140          4139 Dec  5 23:05 verbop.p.Z
-rw-r--r--  1 140          1554 Dec  5 23:05 jmb.sty.Z
-rw-r--r--  1 140         13280 Dec  5 23:04 jmb.bst.Z
-rw-r--r--  1 140          4333 Dec  5 23:04 nar.sty.Z
-rw-r--r--  1 140          8013 Dec  5 23:04 nar.bst.Z
-rw-r--r--  1 140          2040 Dec  5 23:04 rembla.p.Z
-rw-r--r--  1 140          1322 Dec  5 23:04 symvec.dna.Z
-rw-r--r--  1 140           385 Dec  5 23:04 symvec.demo.Z
-rw-r--r--  1 140          2154 Dec  5 23:04 rsdata.dna.Z
-rw-r--r--  1 140          7670 Dec  5 23:04 protseq.globin.Z
-rw-r--r--  1 140          8405 Dec  5 23:04 logo.tex.Z
-rw-r--r--  1 140          1821 Dec  5 23:04 logo.bbl.Z
-rw-r--r--  1 140           158 Dec  5 23:04 colors.two.Z
-rw-r--r--  1 140           305 Dec  5 23:04 colors.protein.Z
-rw-r--r--  1 140           474 Dec  5 23:04 colors.jm.Z
-rw-r--r--  1 140           416 Dec  5 23:04 colors.dg.Z
-rw-r--r--  1 140           197 Dec  5 23:04 colors.demo.Z
-rw-r--r--  1 140           142 Dec  5 23:04 colors.alphabet.Z
-rw-r--r--  1 140           186 Dec  5 23:04 colors.Z
-rw-r--r--  1 140          1216 Dec  5 23:04 w71.Z
-rw-r--r--  1 140          1085 Dec  5 23:04 w51.Z
-rw-r--r--  1 140          1457 Dec  5 23:04 w101.Z
-rw-r--r--  1 140         21444 Dec  5 23:04 rseq.p.Z
-rw-r--r--  1 140          9313 Dec  5 23:04 rawbk.p.Z
-rw-r--r--  1 140         11759 Dec  5 23:04 patval.p.Z
-rw-r--r--  1 140         14519 Dec  5 23:04 patser.p.Z
-rw-r--r--  1 140           319 Dec  5 23:04 patli.Z
-rw-r--r--  1 140           122 Dec  5 23:04 patin.Z
-rw-r--r--  1 140           291 Dec  5 23:04 patbk.Z
-rw-r--r--  1 140         24009 Dec  5 23:04 makebk.p.Z
-rw-r--r--  1 140          1378 Dec  5 23:04 loocat.p.Z
-rw-r--r--  1 140           188 Dec  5 23:04 ex8in.Z
-rw-r--r--  1 140           144 Dec  5 23:03 ex7in.Z
-rw-r--r--  1 140           325 Dec  5 23:03 ex6in.Z
-rw-r--r--  1 140           172 Dec  5 23:03 ex5in.Z
-rw-r--r--  1 140           173 Dec  5 23:03 ex4in.Z
-rw-r--r--  1 140           114 Dec  5 23:03 ex3in.Z
-rw-r--r--  1 140           235 Dec  5 23:03 ex2in.Z
-rw-r--r--  1 140            96 Dec  5 23:03 ex1in.Z
-rw-r--r--  1 140        137765 Dec  5 23:03 delman.Z
-rw-r--r--  1 140         49972 Dec  5 23:03 delila.p.Z
-rw-r--r--  1 140         12186 Dec  5 23:03 dbbk.p.Z
-rw-r--r--  1 140         27183 Dec  5 23:03 catal.p.Z
found delila ok
ncifcrf.gov:/pub/delila/

toms@fcs260c2.ncifcrf.gov (Tom Schneider) (05/15/91)

Archive-name: bionet/molbio/delila/1991-05-09
Archive-directory: ncifcrf.gov:/pub/delila/ [129.43.1.11]
Original-posting-by: toms@fcs260c2.ncifcrf.gov (Tom Schneider)
Original-subject: Re: Software for automated subsequence extraction
Reposted-by: emv@msen.com (Edward Vielmetti, MSEN)


In article <12911@uhccux.uhcc.Hawaii.Edu> jlong@uhunix1.uhcc.Hawaii.Edu (John Long) writes:
>In article <1991May1.114219.25483@phri.nyu.edu> roy@phri.nyu.edu (Roy Smith) writes:
>>toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes:
>>Her?  Why is a librarian automatically assumed to be female?
>With a name like 'Delila' I think it's safe to assume that he/she/ye/it is a 
>female. Maybe the creator named it after herself. Call it artistic license.
>BFD.

>Besides, doesn't it just make sense that software would be female and hardware
>be male?

I was designing a computer language with which one can extract portions
of a DNA sequence.  I needed a name, and one morning woke up and wrote down:
  DEoxyribonucleic acid
    LIbrary
      LAnguage
  DELILA
hence the name.  See 

@article{Schneider1982,
author = "T. D. Schneider
 and G. D. Stormo
 and J. S. Haemer
 and L. Gold",
title = "A design for computer nucleic-acid sequence storage, retrieval and
manipulation",
journal = "Nucl. Acids Res.",
volume = "10",
pages = "3013-3024",
year = "1982"}

"She" is available by anonymous ftp from ncifcrf.gov in pub/delila.

>Aloha,
>-LongJohn

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov