eesnyder@boulder.Colorado.EDU (Eric E. Snyder) (04/28/91)
I am looking for some software that will allow me to extract subsequences from genbank or PIR.

For example, I would like to be able to provide a keyword such as 'splice site' and have the program search genbank and return with a list of sequence names and the subsequence from each entry corresponding to my keyword.

Any leads would be appreciated....

Thanks,

---------------------------------------------------------------------------
TTGATTGCTAAACACTGGGCGGCGAATCAGGGTTGGGATCTGAACAAAGACGGTCAGATTCAGTTCGTACTGCTG
Eric E. Snyder
Department of MCD Biology             ...making feet for childrens' shoes.
University of Colorado, Boulder
Boulder, Colorado 80309-0347
LeuIleAlaLysHisTrpAlaAlaAsnGlnGlyTrpAspLeuAsnLysAspGlyGlnIleGlnPheValLeuLeu
---------------------------------------------------------------------------
kristoff@GENBANK.BIO.NET (Dave Kristofferson) (04/30/91)
> I am looking for some software that will allow me to extract subsequences
> from genbank or PIR.
>
> For example, I would like to be able to provide a keyword such as 'splice
> site' and have the program search genbank and return with a list of sequence
> names and the subsequence from each entry corresponding to my keyword.

The most expeditious way of doing this is through the free GenBank IRX account. Instructions are included in the information below.

Sincerely,

Dave Kristofferson
GenBank Manager
kristoff@genbank.bio.net

----------------------------------------------------------------------

The GenBank On-Line Service

The GenBank On-Line Service (GOS) provides access to the most recent quarterly releases of the GenBank and EMBL nucleic acid sequence databases, as well as the data added to each of these since their most recent releases (in the New Data databases). In addition, the Swiss-Prot protein sequence database and GenPept, a database of peptide sequences derived by the automatic translation of annotated coding regions of entries in the GenBank databases, are available. Users can query the databases by annotation keywords, search for sequence similarity, and retrieve entries of interest.

The GOS is available through e-mail servers, anonymous FTP, anonymous interactive login, and login to established, password-protected, individual accounts. Access to all GOS services is available to both commercial and non-commercial users at the same cost.

On-line help is available for all aspects of this Service. User manuals, information on costs, and application forms may be requested from GenBank at GENBANK@GENBANK.BIO.NET.

INTERACTIVE ACCESS

Interactive access to the GOS databases is provided through the SprintNet public data network and via remote login over the Internet. At present, the IRX (Information Retrieval Experimental Workbench) program is the primary interactive database retrieval program.
Three usage classes are available for the GOS; these classes are described below.

Class 0 Accounts

Anonymous users of the interactive system are provided with 20 minute sessions using the IRX retrieval program. With this program, entries in any of the on-line databases can be located by searching for a keyword or combination of keywords appearing in any of the fields of the entries' annotations. Located entries can be displayed on the terminal or downloaded to the user's computer with the Kermit file-transfer program. (The Kermit program is available for a wide variety of computers from numerous software bulletin boards, user groups, and from Columbia University. MS-DOS and Macintosh versions are available from GenBank on request.) New users of the IRX program should read the on-line introduction which can be displayed by answering 'Y' to the first question the program asks ("Do you want help?").

To use the GOS Class 0 account, one must have a supported terminal or a computer with software for emulating one of those terminals (see the list in the Example at the end of this message) and a modem capable of communicating at 300, 1200, 2400, or 9600 baud. Instructions for dialing to access the GenBank computer are shown in the example below. After completing the login procedure shown in the example, the IRX database query program is immediately started.

Class 1 Accounts

To gain access to additional services, users of the GOS may wish to establish accounts on the GOS computer. These accounts provide access to the GOS computer, 1 Mbyte of disk space for user files, access to IRX, the GenBank relational database management system, and interactive and batch mode use of FASTA and TFASTA (a version of FASTA that compares a peptide sequence with a nucleic acid sequence database by translating the database sequences in up to six reading frames "on the fly").
Class 1 accounts also provide electronic mail access for contacting other users of the GOS and users of computers connected to the Internet and other computer networks. Access to a wide variety of electronic bulletin boards is also provided. Newsgroups that may be of special interest are the bionet.journals.contents newsgroup which provides on-line versions of the tables of contents of several important journals before publication and bionet.sci-resources which provides on-line copies of the NIH Guides to Grants and Contracts. Several other newsgroups are available for exchange of information on experimental protocols and other areas of scientific interest.

Class 2 Accounts

For an additional fee, Class 2 users are provided with access to the IntelliGenetics Suite of sequence analysis programs and databases formatted for those programs. Additional databases (e.g., the PIR Protein Sequence Database, KeyBank(TM), and VectorBank(TM)) are also available to Class 2 users. Class 2 users also have access to all the facilities available to Class 1 users.

E-MAIL SERVERS

In addition to providing interactive access, GenBank currently offers two electronic mail servers, one for sequence similarity searching and one for database entry retrieval. These are freely available to anyone who can send mail to an Internet address. The following networks have gateways to the Internet: BITNET, EARN, NETNORTH and JANET. Users of computers on these networks may need to change the format of the addresses given below to send the message through a forwarding gateway. Users should consult their computer system managers or administrators to determine the proper forwarding gateway and address form.

Questions regarding the use of the e-mail servers (or other aspects of the GOS) may be addressed to: CONSULTANT@GENBANK.BIO.NET.

FASTA Server

The GenBank FASTA Server receives mail messages containing a nucleic acid or protein query sequence with instructions for the search.
The server then performs a FASTA sequence similarity search against the specified database, and returns the results by electronic mail.

To use the FASTA Server, send an electronic mail message containing the formatted query sequence to the following Internet address: SEARCH@GENBANK.BIO.NET. To receive instructions for formatting the query sequence, send a mail message to this address containing the word "Help" as the only line of the message.

Entry Server

E-mail access to sequence database entries is provided for three reasons:

1) to enable users of the FASTA Server to retrieve entries identified by sequence similarity searches;
2) to enable users of the Class 0 interactive system described above, who access it by network remote login (e.g., telnet) to retrieve copies of entries of interest; and
3) to enable readers of journals that identify published sequences by accession number to retrieve computer-readable versions of those sequences.

To retrieve a database entry, send a mail message containing only the entry name or the accession number (not both) to the address: RETRIEVE@GENBANK.BIO.NET. The on-line databases are searched and the entry (if any) which corresponds to the supplied entry name or accession number will be returned by electronic mail. To receive instructions on using the Entry Server, send a mail message to the RETRIEVE address (above) containing the word "Help" as the only line of the message.

Because of the order in which the databases are searched, if both GenBank and EMBL data banks contain entries with the same primary accession number (the usual case), a query on the accession number will result in the GenBank version of the entry being returned. If the EMBL-format version of the entry is required, it can be retrieved from the EMBL file server at NETSERV@EMBL.BITNET.
ANONYMOUS FTP

In addition to interactive access and electronic mail servers, GenBank also provides files for anonymous FTP (File Transfer Protocol), including GenBank and EMBL new data and contributed software.

Each week the new entries created in the GenBank database are collected into an update file. The file has a name in the form of gbMMDD.seq, where MM is the number of the month and DD is the date of file creation. Likewise, new EMBL entries are collected into files with names in the form of emMMDD.seq. The weekly update files are kept in the new data directories until they are superseded by a new quarterly release of the database.

To access any of the files available for anonymous FTP, one should use the FTP protocol to connect to GENBANK.BIO.NET [134.172.1.160], using "anonymous" as the Username and one's surname as the Password.

===============================================
Example. Login to the free GOS IRX account

ATDT14159616860      Use ATDP for pulse dialing phone.
CONNECT 2400         Connect to 2400 baud modem
Trying GENBANK.BIO.NET (134.172.1.160)...Open
SunOS/BSD UNIX (genbank.bio.net)
login:genbank        Typing 'genbank' allows you to access the GenBank computer.
Password:4nigms      This is the password for the GenBank computer; it MUST be entered in lowercase characters.
Last login...        This message includes a date showing the last anonymous login, as well as other system information.
SunOS Release 4.0.3 (GENBANK)

The following is a list of commonly used terminals

Designation   Terminal Type
adm3a         Lear Siegler (ADM)
aaa-48        Ann-Arbor Ambassador in 48 line mode
aaa-60        Ann-Arbor Ambassador in 60 line mode
dm3025        Datamedia 3025a
h19           Heath H19 or Zenith
hp2621        Hewlett Packard HP2621
hp2648-iv     Hewlett Packard HP2648A
sun           Sun Microsystems Workstation console
tvi912        Televideo 912, 920
tvi950        Televideo 950
vi200         Visual 200
vt100         Digital Equipment VT100 (default)
vt102         Digital Equipment VT102
vt200         Digital Equipment VT200

Press Return to select vt100, or enter the appropriate terminal
(type the designation of the appropriate terminal type followed by <CR>)

After completing the login procedure shown above, the IRX sequence entry searching program is immediately started.
===============================================

Further information about the GenBank On-line Service may be obtained by contacting GenBank at:

GenBank
c/o IntelliGenetics Inc.
700 East El Camino Real
Mt. View, CA 94040
(415) 962-7364
genbank@genbank.bio.net
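The naming scheme for the weekly update files described under ANONYMOUS FTP above (gbMMDD.seq and emMMDD.seq) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and zero-padding of single-digit months and days is an assumption, since the announcement does not say how they are written.

```python
from datetime import date


def update_file_name(prefix: str, created: date) -> str:
    """Build a weekly update file name of the form <prefix>MMDD.seq.

    prefix is "gb" for GenBank or "em" for EMBL, as in the announcement.
    Two-digit, zero-padded month and day are an assumption.
    """
    return f"{prefix}{created.month:02d}{created.day:02d}.seq"


# Hypothetical creation date, chosen only for illustration.
genbank_name = update_file_name("gb", date(1991, 4, 30))
embl_name = update_file_name("em", date(1991, 4, 30))
print(genbank_name, embl_name)
```

Because the name carries no year, files from different years would collide; the announcement avoids this by clearing the new data directories at each quarterly release.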
toms@fcs260c2.ncifcrf.gov (Tom Schneider) (04/30/91)
In article <eesnyder.672776972@beagle> eesnyder@boulder.Colorado.EDU (Eric E. Snyder) writes:
>I am looking for some software that will allow me to extract subsequences
>from genbank or PIR.

The Delila system, old and senile as it is, was designed to extract large sets of subsequences (DNA only).

>For example, I would like to be able to provide a keyword such as 'splice
>site' and have the program search genbank and return with a list of sequence
>names and the subsequence from each entry corresponding to my keyword.

Because Delila was designed before GenBank, and GenBank structure is STILL not up to snuff, one must convert from GenBank to Delila format. This is a simple program called dbbk (written by Matt Yarus, son of Mike Yarus, you may be interested to know!).

The Delila viewpoint is that the database consists of a set of organisms and their chromosomes. You must specify these, and then the piece of DNA you are interested in. The piece corresponds roughly to a GenBank entry. The idea is that Delila is a 'librarian' and you give 'her' instructions that define the fragments you want. She reaches into the library and pulls out -- what else? -- a book. Instructions might look like:

    title 'Demonstration of Delila instructions';
    (* the title is required to name the resulting book *)
    (* this is a comment, just as in the computer language Pascal *)
    organism H.sapiens;  (* define the organism *)
    chromosome 3;        (* I made this name up; unfortunately GenBank
                            hasn't stored this information consistently *)
    piece x253;          (* I made this name up also *)
    get from 536 -24 to 536 +30;

The last instruction, 'get', says to Delila that you want the fragment that starts 24 bases before coordinate 536 and ends 30 bases after. By having the instructions written in a file, one can handle many of them. There is now a program that automatically creates Delila instructions from the GenBank features. This has allowed us to create hundreds to thousands of fragments for statistical analysis.
Parts of the Delila system are available by anonymous ftp from ncifcrf.gov in pub/delila. See the README files. I will place more programs in the archive if you request them. Tom Schneider National Cancer Institute Laboratory of Mathematical Biology Frederick, Maryland 21702-1201 toms@ncifcrf.gov
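For readers without Delila at hand, the arithmetic behind a 'get' instruction such as "get from 536 -24 to 536 +30" can be sketched in Python. This is not Delila code and not part of the Delila system; the function name, its parameters, and the 800-base stand-in sequence are all hypothetical, and the piece is assumed to be numbered from 1.

```python
def get_fragment(sequence, anchor, before, after, first_coord=1):
    """Return the bases from (anchor - before) through (anchor + after), inclusive.

    Coordinates are in the numbering of the piece, which is assumed to
    start at first_coord (1 unless stated otherwise).
    """
    start = anchor - before
    end = anchor + after
    last_coord = first_coord + len(sequence) - 1
    if start < first_coord or end > last_coord:
        raise ValueError(f"{start}..{end} falls outside {first_coord}..{last_coord}")
    # Convert inclusive coordinates to a Python slice.
    return sequence[start - first_coord : end - first_coord + 1]


# Hypothetical 800-base piece standing in for a real entry.
piece = "ACGT" * 200

# Rough equivalent of: get from 536 -24 to 536 +30;
fragment = get_fragment(piece, anchor=536, before=24, after=30)
print(len(fragment))  # 24 bases before + the anchor + 30 bases after = 55
```

Note that the returned string has lost its original numbering, which is exactly the problem a coordinate system is meant to solve.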
roy@phri.nyu.edu (Roy Smith) (05/01/91)
toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes: > The idea is that Delila is a 'librarian' and you give 'her' instructions > that define the fragments you want. She reaches into the library and > pulls out -- what else? -- a book. Her? Why is a librarian automatically assumed to be female? -- Roy Smith, Public Health Research Institute 455 First Avenue, New York, NY 10016 roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy "Arcane? Did you say arcane? It wouldn't be Unix if it wasn't arcane!"
donnel@helix.nih.gov (Donald A. Lehn) (05/01/91)
In article <1991May1.114219.25483@phri.nyu.edu> roy@phri.nyu.edu (Roy Smith) writes:
->toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes:
->> The idea is that Delila is a 'librarian' and you give 'her' instructions
->> that define the fragments you want. She reaches into the library and
->> pulls out -- what else? -- a book.
->
->Her? Why is a librarian automatically assumed to be female?
->--

It's probably wrong to make such an assumption. However, being a frequent user of libraries and noticing how neat and ordered they tend to be, I find it difficult to imagine how a "him" could be responsible. If you don't grasp what I'm talking about, take a look at any little boy's bedroom and compare it to his sister's. :)

Don
toms@fcs260c2.ncifcrf.gov (Tom Schneider) (05/01/91)
In article <CMM.0.88.672964473.kristoff@genbank.bio.net> kristoff@GENBANK.BIO.NET (Dave Kristofferson) writes: >> I am looking for some software that will allow me to extract subsequences >> from genbank or PIR. >> For example, I would like to be able to provide a keyword such as 'splice >> site' and have the program search genbank and return with a list of sequence >> names and the subsequence from each entry corresponding to my keyword. >The most expeditious way of doing this is through the free GenBank IRX >account. Instructions are included in the information below. This person wanted PARTS of genbank entries, not whole entries! Since GenBank entries do not carry a coordinate system with them, it is not possible to extract subsequences without losing the location of the sequences. One must add a new feature to the entries: a coordinate system. Do you understand the situation Dave? Genbank does not serve the needs of this user. He needs software that can manipulate portions of entries. > Dave Kristofferson > GenBank Manager > kristoff@genbank.bio.net Tom Schneider National Cancer Institute Laboratory of Mathematical Biology Frederick, Maryland 21702-1201 toms@ncifcrf.gov
POSTMAST@gunbrf.bitnet (05/01/91)
From: edu%"eesnyder@boulder.colorado.edu" 29-APR-1991 11:18:10.80
To: genbank-bb@colorado.edu
CC:
Subj: Software for automated subseqence extraction

Received: From STANFORD(MAILER) by NBRF with Jnet id 6419 for POSTMASTER@GUNBRF; Mon, 29 Apr 91 11:18 EDT
Received: by Forsythe.Stanford.EDU; Sun, 28 Apr 91 18:30:08 PDT
Received: by genbank.bio.net (5.61/IG-2.0) id AA18837; Sat, 27 Apr 91 12:45:20 -0700
Received: by genbank.bio.net (5.61/IG-2.0) id AA18789; Sat, 27 Apr 91 12:44:37 -0700
Message-Id: <9104271944.AA18789@genbank.bio.net>
To: genbank-bb@colorado.edu
From: eesnyder@boulder.colorado.edu (Eric E. Snyder)
Subject: Software for automated subseqence extraction
Date: 27 Apr 91 18:29:32 GMT
Sender: news@colorado.edu (The Daily Planet)
Nntp-Posting-Host: beagle.colorado.edu
kristoff@genbank.bio.net (David Kristofferson) (05/02/91)
Whoops, you definitely caught me on that one, Tom (blush)!! I also must admit to not understanding your comment about the lack of a coordinate system. For example, coding sequences are clearly annotated in the features table and one can extract these subsequences from an entry while also carrying along the annotations which refer to their position in the original sequence. What do you mean by the lack of a coordinate system??? As to writing software for the user's request, GenBank is not in a position to produce analysis software (except for the production of the GenPept database). Dave
overt@antony (Christian Overton) (05/02/91)
Dave,

One problem with using GenBank IRX in its current form is that retrieval of information can only be done via Kermit. This means, in particular, that data cannot be retrieved over the Internet and that instead, one has to dial up the GenBank server directly (too expensive from East Coast) or use SprintNet (never tried this before). It would be nice if disk space was set aside for temporary files (like files that can exist for no more than an hour) that could be created and ftp'd by anonymous users.

Chris
--
+-------------------------------------------------------------------------------+
| G. Christian Overton                       || Telephone: (215) 648-2420      |
| Center for Advanced Information Technology || Internet: overt@prc.unisys.com |
| Unisys                                     || FAX: (215) 648-2288            |
kristoff@genbank.bio.net (David Kristofferson) (05/02/91)
> One problem with using GenBank IRX in its current form is that
> retrieval of information can only be done via Kermit. This means, in
> particular, that data cannot be retrieved over the Internet and that
> instead, one has to dial up the GenBank server directly (too expensive
> from East Coast) or use SprintNet (never tried this before). It would
> be nice if disk space was set aside for temporary files (like files
> that can exist for no more than an hour) that could be created and
> ftp'd by anonymous users.

Chris,

We'll consider this proposal at our next GOS meeting.

Dave

P.S. - Your mail header overt@antony needs fixing. I can't reply to your mail directly.
toms@fcs260c2.ncifcrf.gov (Tom Schneider) (05/03/91)
In article <1991May1.114219.25483@phri.nyu.edu> roy@phri.nyu.edu (Roy Smith) writes: >toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes: >> The idea is that Delila is a 'librarian' and you give 'her' instructions >> that define the fragments you want. She reaches into the library and >> pulls out -- what else? -- a book. >Her? Why is a librarian automatically assumed to be female? Because 'she' is named Delila (DEoxyribonucleic acid LIbrary LAnguage), which is similar to the famous Delilah, who cut Samson's hair. Like Delilah, Delila cuts DNA. Tom Schneider National Cancer Institute Laboratory of Mathematical Biology Frederick, Maryland 21702-1201 toms@ncifcrf.gov
roy@phri.nyu.edu (Roy Smith) (05/04/91)
overt@antony (Christian Overton) writes:
> One problem with using GenBank IRX in its current form is that retrieval
> of information can only be done via Kermit. This means, in particular,
> that data cannot be retrieved over the Internet

IRX isn't restricted to kermit; you can save to a disk file instead, but selecting "save to file" instead of "download with kermit" is one of the few serious mis-features in the IRX user interface (IMHO). You have to do something like hit the space bar to cycle around the choice; yes, the directions to do this are right there on the screen, but 1) who reads directions, 2) it's counter-intuitive, and 3) the directions are confusing. It took us a few times to figure out how to do it.

Anyway, once you've got the entry saved as a disk file, it's easy to ftp it back to your machine. I do this on a regular basis. If I was maintaining IRX, I'd make Save To Disk the default. Really neato would be "Save to remote host via ftp", but if building a whole ftp client into a GenBank retrieval program isn't feeping creatureism, I don't know what is.

/roy
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane? Did you say arcane? It wouldn't be Unix if it wasn't arcane!"
roy@phri.nyu.edu (Roy Smith) (05/05/91)
overt@antony (Christian Overton) writes:
> One problem with using GenBank IRX in its current form is that
> [...] data cannot be retrieved over the Internet

To which I replied:
> IRX isn't restricted to kermit; you can save to a disk file instead,

It would appear that I spoke without fully comprehending the situation. Apparently, there are different types of accounts; the account I use allows full access to the Unix shell, so I can save disk files and manipulate them later. Apparently there are other accounts in which you cannot do this. Please excuse the confusion I may have caused.
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane? Did you say arcane? It wouldn't be Unix if it wasn't arcane!"
toms@fcs260c2.ncifcrf.gov (Tom Schneider) (05/07/91)
In article <May.1.10.08.02.1991.16403@genbank.bio.net> kristoff@genbank.bio.net (David Kristofferson) writes:
>I ...
>must admit to not understanding your comment about the lack of a
>coordinate system. For example, coding sequences are clearly
>annotated in the features table and one can extract these subsequences
>from an entry while also carrying along the annotations which refer to
>their position in the original sequence. What do you mean by the lack
>of a coordinate system???

There are many ways to use a genetic sequence database. Most people are interested in a single sequence, and for this the current methods work reasonably well. However, more and more people are interested in studying collections of sequences. For example, we have a huge collection of splice junctions. To analyze these statistically, we would like to extract only a minimum region around the junctions. If we were to do this by hand, then we would be likely to make errors, and the process would be very tedious. To avoid errors, we create a set of instructions that define the regions we want to study. We used the feature table to make the instructions.

But what should the output of such an extraction look like? Many years ago, Jeff Haemer and I realized that the best form for the output extraction should be identical to the input! Thus if I want bases 57 through 89 of a particular GenBank entry, the most useful output would look like a GenBank entry, but would only contain bases 57 through 89. The power of this is that it allows one to use the same search or other analysis program on GenBank as one uses on a subset. Using written sets of instructions (instead of interactive input), one can automatically create sub-databases and sub-sub databases. The subsets would be equivalent to the main database. For example, we created a subset of E. coli sequences that were the transcribed RNA.
Further extractions of ribosome binding sites to create a sub-sub-database were therefore guaranteed to give us sequences that were always RNA. These were the initial steps toward creating a database which we used to train the Perceptron (a neural net) to locate ribosome binding sites (Stormo, NAR 10: 2997, 1982).

In the present GenBank scheme, this means that the numbering of the extracted fragment 57 through 89 would implicitly become 1 through 89-57+1 = 33. If we made a nice printed listing of open reading frames of the original entry, then we would have to keep doing subtractions to find things in our sub-sequence. If you have ever had to do this, you know how painful it is.

So the idea came up that the extracted entry should carry a coordinate system. This is a set of numbers that defines the original number of each base in the extracted sequence. But if the extracted entries have coordinate systems, then so too should the main library, in keeping with the principle of equivalence between database and sub-databases.

To implement such a scheme today, we would have to add a coordinate system to the extracted GenBank entries. This is equivalent to carrying along the annotations, but makes it more explicit. A true coordinate system does not depend on any 'features'. With today's GenBank, we would also have to have each analysis program check for a coordinate system, and if it is not found, assume that the numbering is 1 to n. This is possible, but is obviously a messy design, forced by the lack of an explicitly defined coordinate system in the main database. You might ask: why not simply implement this program check and be done with it? Well, if nothing else, having a coordinate system would allow GenBank to extend an old sequence before base 1 and not modify any other coordinates. (There is nothing wrong with having a zero coordinate.)

These ideas were implemented in the Delila system before GenBank came into existence (NAR 10:3013, 1982; 12:129, 1984).
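The equivalence between a database entry and a fragment extracted from it can be illustrated with a short Python sketch. It is not taken from Delila or GenBank: the `Piece` class, its field names, and the 100-base stand-in sequence are hypothetical, and the coordinate system is simplified to a single starting number on a linear sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    """A sequence that carries its own coordinate system (hypothetical type)."""
    name: str
    sequence: str
    first_coord: int = 1  # fall back to 1..n numbering when none is given

    @property
    def last_coord(self) -> int:
        return self.first_coord + len(self.sequence) - 1

    def base_at(self, coord: int) -> str:
        """Look a base up by its ORIGINAL coordinate."""
        if not self.first_coord <= coord <= self.last_coord:
            raise IndexError(f"{coord} outside {self.first_coord}..{self.last_coord}")
        return self.sequence[coord - self.first_coord]

    def extract(self, start: int, end: int) -> "Piece":
        """Return bases start..end (inclusive) as a Piece that keeps the numbering."""
        if start > end or start < self.first_coord or end > self.last_coord:
            raise ValueError(f"{start}..{end} is not inside this piece")
        offset = start - self.first_coord
        return Piece(self.name, self.sequence[offset : offset + end - start + 1], start)


entry = Piece("x253", "ACGT" * 25)   # made-up 100-base entry, numbered 1..100
sub = entry.extract(57, 89)          # 33 bases, still numbered 57..89
subsub = sub.extract(60, 70)         # sub-sub extraction uses the same numbers

# An entry extended before base 1: coordinates -1, 0, 1, 2.
extended = Piece("x253-extended", "ACGT", first_coord=-1)
```

Because `extract` returns the same type it was called on, any analysis written against `Piece` works unchanged on the full entry, on `sub`, and on `subsub`, with no subtraction needed to map positions back to the original.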
I don't expect GenBank to write software, since that goal of GenBank was dropped for political/funding (?) reasons many years ago. However, GenBank should be creating a database which is usable for many purposes. The ability to automatically create specialized databases is becoming more and more important. Unfortunately it often means the creation of a completely new database, rather than one extracted from the original database.

The trouble with absolute coordinate systems is that if two GenBank entries fuse together, the numbering of at least one sequence must change. Any instructions become out of date. The way to avoid this is to have landmarks on the sequence which do not change. For this reason I urged that every feature in GenBank have a name. I see that at least the latest entry I extracted does have a name, but I don't know if this is true of all features (I suspect it isn't). If each feature had a unique name, then the instructions for extracting fragments would remain the same. For example, I could say:

    organism 'E. coli';
    chromosome 'main';
    gene lacZ;
    get from gene beginning -20 to gene beginning + 10;

This is pseudo-delila code since the names don't exist and the use of quote marks is not implemented yet. However, with the right database, these instructions would last forever since the names E. coli and lacZ are universal and not likely to change. The best names to use are the currently accepted genetic names (since they are the most stable), but provision must be made for using alternative names. The fragment defined by these instructions would, of course, have whatever numbering (coordinate system) the current database allowed, so that one could compare the results from several different analyses.

Tom Schneider
National Cancer Institute
Laboratory of Mathematical Biology
Frederick, Maryland 21702-1201
toms@ncifcrf.gov
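Why named landmarks keep instructions valid after renumbering can be shown with a small Python sketch. Everything in it is illustrative: the function, the landmark table, the sequences, and the coordinates are made up, and only the gene name lacZ and the offsets -20 and +10 come from the pseudo-delila example above.

```python
def get_near_landmark(sequence, landmarks, name, before, after, first_coord=1):
    """Resolve a named landmark to a coordinate, then cut around it.

    landmarks maps a feature name to the coordinate of its beginning in the
    CURRENT numbering of the sequence (a hypothetical table, not a GenBank format).
    """
    anchor = landmarks[name]
    start, end = anchor - before, anchor + after
    last_coord = first_coord + len(sequence) - 1
    if start < first_coord or end > last_coord:
        raise ValueError(f"{start}..{end} falls outside {first_coord}..{last_coord}")
    return sequence[start - first_coord : end - first_coord + 1]


# Release A: a made-up 160-base entry in which lacZ begins at coordinate 40.
release_a = "ACGTTGCA" * 20
landmarks_a = {"lacZ": 40}

# Release B: the entry has been fused behind 50 new bases, so every
# absolute coordinate shifts by 50 and lacZ now begins at 90.
release_b = "G" * 50 + release_a
landmarks_b = {"lacZ": 90}

# The same instruction, "gene beginning -20 to gene beginning +10",
# is applied to both releases without being edited.
fragment_a = get_near_landmark(release_a, landmarks_a, "lacZ", before=20, after=10)
fragment_b = get_near_landmark(release_b, landmarks_b, "lacZ", before=20, after=10)
print(fragment_a == fragment_b)
```

An instruction written with the absolute coordinate 40 would have silently returned a different fragment from release B; the named version only requires that the database keep its landmark table up to date.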
jlong@uhunix1.uhcc.Hawaii.Edu (John Long) (05/08/91)
In article <1991May1.114219.25483@phri.nyu.edu> roy@phri.nyu.edu (Roy Smith) writes: >toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes: >Her? Why is a librarian automatically assumed to be female? With a name like 'Delila' I think it's safe to assume that he/she/ye/it is a female. Maybe the creator named it after herself. Call it artistic license. BFD. Besides, doesn't it just make sense that software would be female and hardware be male? Aloha, -LongJohn
toms@fcs260c2.ncifcrf.gov (Tom Schneider) (05/10/91)
In article <12911@uhccux.uhcc.Hawaii.Edu> jlong@uhunix1.uhcc.Hawaii.Edu (John Long) writes:
>In article <1991May1.114219.25483@phri.nyu.edu> roy@phri.nyu.edu (Roy Smith) writes:
>>toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes:
>>Her? Why is a librarian automatically assumed to be female?
>With a name like 'Delila' I think it's safe to assume that he/she/ye/it is a
>female. Maybe the creator named it after herself. Call it artistic license.
>BFD.
>Besides, doesn't it just make sense that software would be female and hardware
>be male?

I was designing a computer language with which one can extract portions of a DNA sequence. I needed a name, and one morning woke up and wrote down:

    DEoxyribonucleic acid LIbrary LAnguage
    DELILA

hence the name. See

@article{Schneider1982,
author = "T. D. Schneider and G. D. Stormo and J. S. Haemer and L. Gold",
title = "A design for computer nucleic-acid sequence storage, retrieval and manipulation",
journal = "Nucl. Acids Res.",
volume = "10",
pages = "3013-3024",
year = "1982"}

"She"'s available by anonymous ftp from ncifcrf.gov in pub/delila.

>Aloha,
>-LongJohn

Tom Schneider
National Cancer Institute
Laboratory of Mathematical Biology
Frederick, Maryland 21702-1201
toms@ncifcrf.gov