read@cs.utexas.edu (Rob) (09/23/90)
* Does anyone have a lexer/parser for the GenBank Feature Table?
* Would anyone want one if I wrote one?

I am seeking to avoid duplication of effort. I work for the Biological Workstation Group at the University of Texas Center for High Performance Computing (UT-CHPC). We are going to construct a logic programming-based query interface to the feature table using the experimental knowledge base LDL. At present, LDL reads data from a "flat" file in its own format. Therefore, I need to translate the GenBank distribution into that format.

I am a computer science student and new to the biological aspects of this area. From a computer science point of view, it seems pretty obvious that the published feature table definition should be easily parsed using widely available lexical analyzer and parser generator tools, like Flex/Lex and Bison/Yacc. So I figure someone out there has already built a parser. If not, I will do so.

Thanks for any response and looking for electronic contacts,

Rob

Robert L. Read            University of Texas
read@cs.utexas.edu        Center for High Performance
(512)-477-1240            Computation
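The claim that the feature table yields to ordinary lexer and parser techniques can be illustrated with a small sketch. It is written in Python rather than Lex/Yacc, and it covers only an assumed subset of the location syntax: plain positions, `start..end` ranges, and operators written as `name(...)` such as `join` and `complement`. The token names, the `("span", start, end)` tuple shape, and the functions `tokenize` and `parse_location` are hypothetical choices made for illustration; they do not come from the published definition or from any existing tool.

```python
import re

# Token patterns for an assumed subset of the location syntax.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("RANGE", r"\.\."),
    ("NAME", r"[A-Za-z_]+"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("COMMA", r","),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Split a location string into (kind, value) pairs, ending with an END sentinel."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    tokens.append(("END", ""))  # sentinel: the parser never indexes past the end
    return tokens


def _parse(tokens, i):
    """Parse one location starting at tokens[i]; return (tree, next_index)."""
    kind, value = tokens[i]
    if kind == "NAME":
        # Operator form: NAME '(' location {',' location} ')'
        if tokens[i + 1][0] != "LPAREN":
            raise ValueError(f"expected '(' after {value!r}")
        i += 2
        args = []
        while True:
            arg, i = _parse(tokens, i)
            args.append(arg)
            if tokens[i][0] == "COMMA":
                i += 1
            elif tokens[i][0] == "RPAREN":
                return (value, args), i + 1
            else:
                raise ValueError(f"expected ',' or ')' but found {tokens[i][1]!r}")
    if kind == "NUMBER":
        start = int(value)
        if tokens[i + 1][0] == "RANGE":
            if tokens[i + 2][0] != "NUMBER":
                raise ValueError("expected a number after '..'")
            return ("span", start, int(tokens[i + 2][1])), i + 3
        # A single position is treated as a one-base span.
        return ("span", start, start), i + 1
    raise ValueError(f"unexpected token {value!r}")


def parse_location(text):
    """Recursive-descent parse of a whole location string."""
    tokens = tokenize(text)
    tree, i = _parse(tokens, 0)
    if tokens[i][0] != "END":
        raise ValueError(f"trailing input starting at {tokens[i][1]!r}")
    return tree
```

For example, `parse_location("complement(join(12..78,134..202))")` returns `("complement", [("join", [("span", 12, 78), ("span", 134, 202)])])`. A nested tuple of this kind could then be written out in whatever flat-file format the target system expects; that translation step is not sketched here because the LDL format is not described in the post.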
toms@fcs260c2.ncifcrf.gov (Tom Schneider) (09/25/90)
In article <922@dimebox.cs.utexas.edu> read@cs.utexas.edu (Rob) writes:
>* Does anyone have a lexer/parser for the GenBank Feature Table?
>* Would anyone want one if I wrote one?

This is an excellent idea. But it depends on a "Definition of Genbank", a document that (to my knowledge) has never been made public. Has this been released? Is it complete? Does it include a complete BNF of the entire data structure (not just the features table)? Does it define allowed values of all parameters in the structure? If these things are not in place and documented, your parser is doomed eventually... (I spent many years attempting as a GenBank Advisor to get them to define the database, and I don't think they have.)

>Robert L. Read            University of Texas
>read@cs.utexas.edu        Center for High Performance
>(512)-477-1240            Computation

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland 21702-1201
  toms@ncifcrf.gov
usenet@nlm.nih.gov (usenet news poster) (09/25/90)
In article <1885@fcs280s.ncifcrf.gov> toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes:
>In article <922@dimebox.cs.utexas.edu> read@cs.utexas.edu (Rob) writes:
>>* Does anyone have a lexer/parser for the GenBank Feature Table?
>>* Would anyone want one if I wrote one?
>
>This is an excellent idea. But it depends on a "Definition of Genbank", a
>document that (to my knowledge) has never been made public...

We have a parser which translates the new GenBank format to the data formats defined in the GenInfo ASN.1 definition. It should be released with the NCBI software toolkit distribution later this fall. Sources and documentation will be placed on the anonymous FTP site "ncbi.nlm.nih.gov", in the subdirectories ./toolbox and ./tech-reports, when they are available (not yet).

>>Robert L. Read            University of Texas
>>read@cs.utexas.edu        Center for High Performance
>>(512)-477-1240            Computation
>
>  Tom Schneider
>  National Cancer Institute
>  Laboratory of Mathematical Biology
>  Frederick, Maryland 21702-1201
>  toms@ncifcrf.gov

David States
National Center for Biotechnology Information
National Library of Medicine
roy@phri.nyu.edu (Roy Smith) (09/25/90)
states@artemis.NLM.NIH.GOV (David States) writes:
-> data formats defined in the GenInfo ASN.1 definition.
What is "GenInfo ASN.1"?
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane? Did you say arcane? It wouldn't be Unix if it wasn't arcane!"
usenet@nlm.nih.gov (usenet news poster) (09/29/90)
In article <1990Sep25.013551.9336@phri.nyu.edu> roy@phri.nyu.edu (Roy Smith) writes:
>
>What is "GenInfo ASN.1"?

ASN.1 is Abstract Syntax Notation 1, an International Standards Organization standard for the description of data. It is designed to facilitate exchange of data between application programs. Automated tools are available to read ASN.1 definitions and generate source code to read the datafiles etc.

GenInfo is a trademark of the National Library of Medicine (NLM) applied to a number of databases and services being developed at the National Center for Biotechnology Information (NCBI), a recently created center within the NLM.

More information on the data formats we are using can be obtained by anonymous FTP to the server ncbi.nlm.nih.gov. Look at tech-report.1.txt in the ./tech-reports directory.

>--
>Roy Smith, Public Health Research Institute
>roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy

David States
states@ncbi.nlm.nih.gov