[bionet.molbio.genbank] new feature table format

michael%domain@LANL.GOV (Michael J. Cinkosky) (08/13/90)

In release 64, as noted by Dan Davison, /map and /hgml_locus_uid
qualifiers appear within /note qualifiers.  While this is "legal"
according to the feature table syntax, it is not appropriate as a
long-term solution, and Dan is correct in wanting to see this change soon.
It was done as a temporary measure only and will be converted to the
more appropriate syntax by release 66 (possibly by release 65).

It should be noted that the strategy adopted (i.e., park new qualifiers
in /notes until the schema/software are ready to support them properly)
will probably be used in the future as the annotation staff here identifies
new candidates for qualifiers.  Using this mechanism allows us to capture
any new information immediately, in a form that can be processed
readily later, once we have a formal mechanism for storing the new
type of data.

Dan also suggests indenting lines on which qualifiers are continued
from the previous line differently than lines on which qualifiers
begin.  This suggestion, along with some of Dan's earlier comments,
highlights the major difficulty of designing a format for anything
of this complexity.  We made the decision early on that the new feature
table format would follow a well-defined formal syntax (e.g., one
describable in BNF).  The grammar that was selected is "token-oriented",
not "line-oriented".  While writing a token-oriented parser is different
from writing a line-oriented parser, it is not significantly more difficult.
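To make the token-oriented idea concrete, here is a minimal sketch in Python. It is not GenBank's parser and does not implement the actual feature table grammar; the token names (QUALIFIER, EQUALS, STRING, WORD) and the function `tokenize` are assumptions made for illustration, and quoted values are assumed to contain no embedded quotation marks. The point is only that newlines are treated as ordinary whitespace between tokens, so a value that wraps across lines needs no special handling.

```python
import re

# Hypothetical token classes for qualifier text; NOT the official grammar.
TOKEN_RE = re.compile(r'''
    (?P<QUALIFIER>/[A-Za-z_][A-Za-z0-9_]*)   # qualifier name such as /note
  | (?P<EQUALS>=)
  | (?P<STRING>"[^"]*")                      # quoted value, may span lines
  | (?P<WORD>[^\s/="]+)                      # unquoted value
  | (?P<WS>\s+)                              # spaces and newlines alike
''', re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, text) tokens, skipping all whitespace."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(
                f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# Illustrative input only: the /note value wraps onto a second line.
sample = '/note="first line\n      second line"\n/codon_start=1'
for kind, value in tokenize(sample):
    print(kind, repr(value))
```

Because the scanner never looks at line boundaries, the amount of indentation on a continuation line has no effect on the token stream. An unterminated quotation mark is reported as an error rather than silently swallowed.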

Dan's suggestion would, in effect, introduce a second mechanism for
indicating that a qualifier value was continued on another line.  I
personally would be very hesitant to design a syntax with multiple
(possibly conflicting) mechanisms for accomplishing one purpose.
It would seem that any software for parsing the new feature table has to
understand wrapping feature qualifiers using the quotation marks, and
therefore should be able to navigate the table as a whole using that same
mechanism.
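As a rough illustration of navigating by quotation marks alone, the Python sketch below splits a block of qualifier text into (name, value) pairs by tracking whether the scanner is currently inside a quoted value. The function name `split_qualifiers` and the sample values are hypothetical, and the sketch assumes that values contain no embedded quotation marks and that a "/" outside quotes always begins a new qualifier; the real format may impose further rules.

```python
def split_qualifiers(text):
    """Split qualifier text into (name, value) pairs using quote state only."""
    pieces = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        # A slash outside a quoted value starts the next qualifier.
        if ch == '/' and not in_quotes and current:
            pieces.append("".join(current))
            current = []
        current.append(ch)
    if current:
        pieces.append("".join(current))

    result = []
    for raw in pieces:
        raw = raw.strip()
        if not raw:
            continue  # leading whitespace before the first qualifier
        name, _, value = raw.partition("=")
        # Collapse line wraps and indentation inside the value to one space.
        value = " ".join(value.split())
        result.append((name.lstrip("/"), value.strip('"')))
    return result


# Illustrative input: a wrapped value that also contains a slash.
sample = ('/note="a value with a / slash\n'
          '      wrapped onto a second line"\n'
          '/map="7q31"')
for name, value in split_qualifiers(sample):
    print(name, "=", value)
```

Line breaks and indentation play no part in deciding where one qualifier ends and the next begins, which is the property argued for above: the quotation marks are the only continuation mechanism the software has to understand.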

Michael Cinkosky
GenBank
Los Alamos National Laboratory

roy@phri.nyu.edu (Roy Smith) (08/14/90)

	First, some administrivia.  A few people responded to my feeler
for a mailing list to discuss parsing feature tables.  It has been
suggested that rather than form a new (private) discussion group, we just
use the existing channels, namely bionet.molbio.genbank, in all of its
various incarnations.  I can't see any down side to that idea, so that's
exactly what I'm going to do.  Should that turn out to be unworkable, the
mailing list idea can be resurrected, but for now this seems a lot simpler.

michael%domain@LANL.GOV (Michael J. Cinkosky) writes:
> Dan also suggests indenting lines on which qualifiers are continued from
> the previous line differently than lines on which qualifiers begin. [...]
> The grammar that was selected is "token-oriented", not "line-oriented".
> While writing a token-oriented parser is different from writing a
> line-oriented parser, it is not significantly more difficult.

	Perhaps there is a middle ground?  Requiring the use of special
indentation as a machine parsing aid is silly, since there is already a
perfectly good parsing mechanism available (the BNF).  On the other hand, it
might make it easier for humans to scan the files by eye.  I don't see how
the extra indentation would get in the way of machine parsing, so why not
use it?

	Dan, I got your Bison grammar and am trying to figure out what might
be wrong.  I don't use Bison, and don't consider myself a Yacc guru either,
so I don't know how much luck I'll have.  I just got the Feature Table
Definition from EMBL (contrary to the MS Word documentation, you can't print
linked files back-to-front!  Grrrr...) but haven't digested it yet.  BTW, I
put it on our anonymous ftp server (goober.phri.nyu.edu, 128.122.136.10,
filename ~ftp/pub/seq/FeatureTable.msw.sit.hqx) to make it easier for people
on this side of the Atlantic to get at it.
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"