[comp.text.sgml] Data attribute specification delimiter recognition mode

jjc@jclark.UUCP (James Clark) (05/08/91)

[References are to the SGML Handbook.]

I can't figure out what the delimiter recognition mode [359] is
supposed to be when parsing an attribute specification list [327:17]
occurring in a data attribute specification [428:20].  (This occurs in
an external entity specification [400:1] in an entity declaration
[394:18].)  I thought initially it would be MD mode but attribute
specifications use the VI delimiter which is recognized only in TAG
mode [360, figure 3], and TAG mode is said to apply only in a
start-tag or an end-tag [361:5], and in any case the DSC delimiter is
not recognized in TAG mode.  What am I missing?  I have a similar
problem with result attribute specifications [446:8] and link
attribute specifications [443:16].
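
The conflict can be pictured with a small Python sketch.  It is only an
illustration, not the Handbook's table: it encodes just the two facts
stated above (VI is recognized only in TAG mode; DSC is not recognized
in TAG mode).  Listing DSC under MD mode is an assumption made for the
example, and the names RECOGNIZED_IN and modes_recognizing_all are
hypothetical.

```python
# Hypothetical, partial recognition-mode table.  Only the facts stated
# in the post are encoded; the MD entry for DSC is an assumption.
RECOGNIZED_IN = {
    "VI": {"TAG"},   # value indicator: recognized only in TAG mode
    "DSC": {"MD"},   # declaration subset close: not in TAG (MD assumed)
}


def modes_recognizing_all(delimiters):
    """Return the modes in which every named delimiter is recognized."""
    return set.intersection(*(RECOGNIZED_IN[d] for d in delimiters))


# A bracketed attribute specification list needs both delimiters.
needed = ["VI", "DSC"]
common = modes_recognizing_all(needed)
if common:
    print("usable modes:", sorted(common))
else:
    print("no single mode recognizes " + " and ".join(needed))
```

With this table the intersection is empty, which is the problem in a
nutshell: no one mode listed here recognizes both delimiters.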

James Clark
jjc@jclark.uucp

enag@ifi.uio.no (Erik Naggum) (05/10/91)

In <JJC.91May8150350@jclark.UUCP>, James Clark writes:

    [What is the] delimiter recognition mode [359] supposed to be
    when parsing an attribute specification list [327:17] occurring
    in a data attribute specification [428:20]?

Good question.

I have to make up an example to be able to answer your question.
Given

	<!NOTATION eqn SYSTEM "/usr/lib/text/eqn2eps">
	<!ATTLIST #NOTATION eqn delim CDATA "$ $">

	<!ENTITY formula1 SYSTEM "formula.1" CDATA eqn [ delim="@ @" ]>

	<!ELEMENT formula - O CDATA>
	<!ATTLIST formula
		    notation NOTATION (eqn|other) #CURRENT
		    location ENTITY #CONREF
	>

consider the element

	<formula location=formula1>

In this (contrived) example, the entity is accessed within a
start-tag, and the delim data attribute is probably parsed at that
point.  This is, as far as I read it, the only time an external entity
with a notation should be referenced, as it would make nil sense
(semantically) to reference an external entity with a data content
notation by itself.  Consider the case

	&formula1;

What could this possibly mean?  The entity is referenced by and fed
to the parser, not the application, and the parser doesn't know what
to do with a notation.  Somehow, the notation information (and its
data attributes) has to be communicated to the application, and a
general entity reference can't do that.  (I have some questions
relating to the comment in 10.5.5 [401:9] that a data entity may be
able to reference other data entities or subdocuments _in_its_own_
_notation_.  How?)

I would still think that this is an oversight of the amendment, and
that something should be said about data attributes in general entity
references in replaceable character data [343:1] and other content
[320:14].  It's also somewhat contorted to think of the data attributes
as applying to and being parsed in the start-tag where the entity is
referenced.  I'll file a defect report with ISO.


    I have a similar problem with result attribute specifications
    [446:8] and link attribute specifications [443:16].

The answer here also concerns when parsing of the attribute
specification list occurs, but I don't think an oversight of the
amendment is involved.  Given

	<!DOCTYPE source [
	    <!ELEMENT source (#PCDATA)>
	]>
	<!DOCTYPE result [
	    <!ELEMENT result (#PCDATA)>
	    <!ATTLIST result smurf CDATA #IMPLIED>
	]>
	<!LINKTYPE example source result [
	    <!ATTLIST result gnarf CDATA #IMPLIED>
	    <!LINK #INITIAL
		source [ gnarf="there" ] result [ smurf="here" ]
	    >
	]>

the document instance

	<source>data</source>

becomes

	<source gnarf="there">data</source>

as the source document link attribute definition list is applied,
which becomes

	<result smurf="here">data</result>

after link processing.  That is, the attribute specification lists
are parsed when the start-tag is encountered or generated, in that
start-tag, i.e. in TAG recognition mode.  In other words, the attr
def list is only stored for later parsing when the start-tag to
which it applies is parsed.
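
As a rough Python sketch of this reading (my interpretation above, not
anything the standard prescribes), the link rule can keep the two
attribute specification lists as unparsed text and parse them only
when the corresponding start-tag is built.  LINK_RULE,
parse_attr_spec_list, start_tag and process are hypothetical names,
and the regular expression handles only the simple name="value" form
used in the example.

```python
import re

# Hypothetical model of the #INITIAL link rule from the example above.
# The attribute specification lists are stored as raw, unparsed text.
LINK_RULE = {
    "source": {
        "link_attrs": 'gnarf="there"',
        "result": "result",
        "result_attrs": 'smurf="here"',
    },
}


def parse_attr_spec_list(text):
    """Parse simple name="value" pairs, as if inside a start-tag."""
    return dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', text))


def start_tag(name, attrs):
    """Render a start-tag from a name and a dict of attributes."""
    parts = [name] + ['%s="%s"' % (k, v) for k, v in attrs.items()]
    return "<" + " ".join(parts) + ">"


def process(element_name):
    """Return the linked source start-tag and the result start-tag."""
    rule = LINK_RULE[element_name]
    # Deferred parsing: the stored text is parsed now, at start-tag time.
    linked = start_tag(element_name,
                       parse_attr_spec_list(rule["link_attrs"]))
    result = start_tag(rule["result"],
                       parse_attr_spec_list(rule["result_attrs"]))
    return linked, result


print(process("source"))
```

The point of the sketch is only the ordering: nothing is parsed when
the link rule is stored, so the question of which recognition mode
applies inside the link declaration does not arise in this model.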

Whew!  This took me a while to answer.  (In fact, almost 8 hours,
including leafing back and forth in The SGML Handbook, ISO 8879 and
ISO 8879A1, to check when something was added, trying to find all
references to data attributes and notation, tracing down the syntax
productions (thanks to my package for GNU Emacs, this was relatively
painless) to find where /vi/ was referenced, etc.  I think I've got
it right, but I'm not at all certain about when data attributes are
parsed.  It only makes a lot of sense to do it within the start-tag
which references the entity which has the notation attribute and the
associated data attribute specification, and nil sense to do it for
random general entity references outside start-tags.  I'm somewhat
confused by the first sub-item of item k in attachment 1 to ISO/IEC
JTC 1/SC 18/WG 8/N 1035 (Appendix B in The SGML Handbook) [593:1],
which seems to address general entity references to data entities,
but not explicitly.)

I hope it's been as rewarding to read as it was to write.

--
[Erik Naggum]           Professional Programmer        <enag@ifi.uio.no>
Naggum Software             Electronic Text          <erik@naggum.uu.no>
0118 OSLO, NORWAY       Computer Communications            +47-2-836-863