xxae021@chpc.utexas.edu (Rob Read) (05/15/91)
Hi. This is to announce a software system from the GenTools project at the University of Texas Center for High Performance Computing. It may interest programmers (and their bosses) who are trying to extract information from the GenBank flat file format.

I have written a parser for this format using Flex and Yacc (or Bison). I hope it will make it easy for a C programmer with some Yacc experience to extract information from the GenBank flat files, or to translate them into some other format. The code translates about 99% of GenBank entries into a Prolog-like language; a programmer could easily produce output in any other required format. This should be useful to those who cannot afford, or cannot gain access to, the (undoubtedly superior) relational version of GenBank in the Sybase RDBMS, and to those who wish to write special-purpose programs to extract information from the feature tables.

The software is an alpha release: few others have tested it, and I have tested it only on Sun SPARCstations. However, I suspect many programmers would like to see the code (grammar) I have written, even if they do not intend to use it, because it is the most concrete description of the GenBank format (including the feature table) that I have seen.

The package is called "gbparse", and it includes documentation. The code is not trouble-free, partly because it must cope with genuine syntax errors in the distributed flat files. The grammar may be wrong in places, but many of the "parsing errors" it reports for Release 66 are actually mismatched quotes in the files, which are hard to recover from. The program is reasonably robust in reporting errors in individual entries.

Gbparse-0.0 is available by e-mailing a request to: gentools@chpc.utexas.edu

Due to legal complications here, we will not distribute it by anonymous ftp, at least at first. It will, of course, be free to non-profit organizations.
The source is distributed (in this case, the source is all that is useful).

Thanks to Jacob Engelbrecht and Jo Pelkey? for some initial testing, and to Dan Davison for a starting grammar.

Please send questions, comments, bug reports, and so on, to: gentools@chpc.utexas.edu

Robert L. Read
GenTools Project Programmer
UT-Center for High Performance Computing (CHPC)
Balcones Research Center
10100 North Burnet Road, CMS 1.154
Austin, Texas 78712