[comp.text.sgml] A floating paired tag...

ruben@bcstec.boeing.com (Reuben Wachtfogel) (06/08/91)

This may be a very basic question, but I really did
try to find the answer independently and I'm stumped.

How can I define a paired tag (one with a required
end-tag) that can be inserted at several places within
the tag hierarchy?

That is, assuming the hierarchy

<a>
  <a1>
    <b1>
      blah blah blah
    </b1>
  </a1>

  <a2>
  </a2>

</a>

I wish to define a bracketing structure that could be inserted
around any of:
1) blah
2) <b1></b1>
3) <a1></a1>
4) <a></a>

The catch is that the list of legal contained elements needs to be
inherited from the context in which the bracketing tag itself
was inserted (e.g. <bracket> under <a1> may contain <bn>, but under
<b2> it may only contain whatever is legal for <b2>).

For example consider the tag <LANGUAGE language="CHINESE"> </LANGUAGE>.
I wish to be able to place this tag around any arbitrary "chunk"
of data.

+(Language) for the <doc> tag won't get me there because any tags
found in <Language> would be illegal.

Are my only alternatives:
1) A bunch of +() -() sets for each bracketing tag
2) Define LANG1 LANG2 LANG3 for different locations in the hierarchy.

Any help will be much appreciated.

Post here or ruben@dsp35001.boeing.com     Thanx...

enag@ifi.uio.no (Erik Naggum) (06/09/91)

Reuben Wachtfogel <ruben@bcstec.boeing.com> writes:
|
|   This may be a very basic question, but I really did
|   try to find the answer independently and I'm stumped.

Thanks for this courtesy to other news readers.  

|   I wish to define a bracketing structure that could be inserted
|   around any of:
|   1) blah
|   2) <b1></b1>
|   3) <a1></a1>
|   4) <a></a>

If I may make some assumptions about what your problem is, here are my
thoughts.

<assumption>
The bracketing, floating element will be used to encode some
information _about_ the data content found inside the elements it
embraces.
<solution>
This lends itself to attributes, and since you wish to have the
attribute value remain across sub-elements, you can either (1) let the
value be #IMPLIED by the application, so that the application will
keep track of the nested elements, or (2) let the value be #CURRENT.
<discussion>
The disadvantage of (1) is that you need a special mechanism in the
application software to handle this (and other attributes).  The
disadvantage of (2) is that #CURRENT attribute values do not respect
hierarchies: a value remains "current" for a given attribute even after
the element on which it was specified ends.
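
As a rough illustration of option (1), here is a minimal Python sketch of the kind of bookkeeping the application would have to do if the attribute were #IMPLIED. The event tuples and the function name resolve_language are hypothetical (they are not any real SGML parser's interface); the element names are borrowed from the hierarchy in the original question.

```python
# Hypothetical event stream, not a real parser API:
#   ("start", name, attrs), ("data", text), ("end", name)

def resolve_language(events, default=None):
    """Return (text, language) pairs, inheriting `language` from ancestors."""
    stack = [default]              # effective language of each open element
    out = []
    for event in events:
        kind = event[0]
        if kind == "start":
            attrs = event[2]
            # An unspecified (#IMPLIED) value falls back to the parent's value.
            stack.append(attrs.get("language", stack[-1]))
        elif kind == "end":
            stack.pop()            # leaving an element restores the parent's value
        elif kind == "data":
            out.append((event[1], stack[-1]))
    return out

events = [
    ("start", "a", {"language": "chinese"}),
    ("start", "a1", {}),
    ("data", "blah"),
    ("start", "b1", {"language": "english"}),
    ("data", "blah blah"),
    ("end", "b1"),
    ("data", "more blah"),
    ("end", "a1"),
    ("end", "a"),
]
result = resolve_language(events)
print(result)
```

The stack is what makes the value respect the hierarchy: the text after the end of b1 is attributed to the language of the enclosing element again.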

I'm unclear on this point, myself, and would appreciate comments.
Given the document type declaration fragment
    <!ELEMENT outer (#PCDATA|inner)*>
    <!ELEMENT inner (#PCDATA)>
    <!ATTLIST (outer,inner) language CDATA #CURRENT>
and the document instance fragment
    <outer language=chinese>
	ha che
	<inner language=english>
	    good food
	</inner>
	kuluyuk
    </outer>
what is the value of the language attribute when "kuluyuk" is seen?

It seems clear that if the fragment ends with
	</inner>
	<inner>
	    sweet-sour pork
	</inner>
    </outer>
the second inner will have the language attribute value "english".

<assumption>
The information you need to encode is somehow external to the document
in which it applies, and should be regarded as a different "view" of
the document.
<solution>
For this purpose, you could define a concurrent document type which
deals with this information, and not the other structural information.
Presented this way, the elements of one document type do not interfere
with the other.
<discussion>
One clear disadvantage with this approach is that you would not be
able to relate the two document types very easily.  For instance, the
first document type's elements need not embrace complete elements of
the second.  Depending on your needs, this may be what you want.

<assumption>
There are external requirements that force you to use an element for
this purpose, and you very badly need to have defined regions within
which the attributes are declared, so badly that you could dispense
with the rigidity of the markup.
<solution>
You could declare the element with content model ANY, and enforce the
rigidity in the application, instead:
    <!ELEMENT language ANY>
    <!ATTLIST language language CDATA #IMPLIED>
If you do this, you can open a new element inside a <language> element
and proceed inside that element according to its content model, and
then close the <language> element.
<discussion>
I don't recommend this.  An alternative solution would be to list all
the valid sub-elements that language could possibly embrace, e.g.
    <!ELEMENT language (a|b1|b2|b3)>
but this quickly becomes difficult to maintain and understand,
although it's better from a purist's view.  Note: this works only if
you allow <language> to embrace only _one_ element at a time.  If you
need to embrace several elements things quickly get very difficult.
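
To make "enforce the rigidity in the application" a little more concrete, here is a minimal Python sketch under strong simplifying assumptions: each content model is reduced to a plain set of permitted child element names (no ordering, no occurrence indicators, no #PCDATA handling), and the ALLOWED table, the check function, and the tree representation are all hypothetical. The idea is that the application treats <language> as transparent and checks its children against the nearest enclosing element that is not <language>.

```python
# Hypothetical, simplified "content models": only the set of child element
# names each element may contain. Names follow the original question.
ALLOWED = {
    "a": {"a1", "a2"},
    "a1": {"b1"},
    "a2": set(),
    "b1": set(),
}
BRACKET = "language"   # assumed to be declared with content model ANY

def check(tree, context=None):
    """Return a list of error strings for a (name, children) tree.

    `context` is the nearest enclosing element that is not the bracket;
    the bracket hands its own context down to its children unchanged.
    """
    name, children = tree
    errors = []
    if name != BRACKET:
        if context is not None and name not in ALLOWED.get(context, set()):
            errors.append(f"<{name}> not allowed in <{context}>")
        context = name
    for child in children:
        errors.extend(check(child, context))
    return errors

# <language> around <a1>, and again around <b1> inside <a1>: accepted.
good = ("a", [("language", [("a1", [("language", [("b1", [])])])]), ("a2", [])])
# <b1> directly under <a>, hidden inside a <language> bracket: rejected.
bad = ("a", [("language", [("b1", [])])])

print(check(good))
print(check(bad))
```

A real application would have to evaluate the full content models, which is exactly the work the SGML parser no longer does once the element is declared ANY.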

I need more information on your problem to be able to evaluate each of
these or perhaps think of other solutions.  I can't readily see an
elegant way to do this without floating attributes (such as #CURRENT)
or a concurrent document, but there are many ways to specify attribute
values, including link attributes.

What we need (it may exist, and I may just have overlooked it) is
a means to supply attribute values for the extent of an element and
all sub-elements thereof.  #CURRENT, in my reading, doesn't support
this.  I'd be happy to be proven wrong on this one.  I'd especially
like to know what's the right interpretation of a #CURRENT attribute
which is changed inside an element.

I'm sorry that I can't test this because the parser I've written is
based on my own fuzzy assumptions on the treatment of CURRENT values.
Can anybody help, either with theory or with practice?

</Erik>
--
Erik Naggum             Professional Programmer            +47-2-836-863
Naggum Software             Electronic Text             <ERIK@NAGGUM.NO>
0118 OSLO, NORWAY       Computer Communications        <enag@ifi.uio.no>

jbm@hal.com (Brad Might) (06/12/91)

In article <ENAG.91Jun9020802@gyda.ifi.uio.no> enag@ifi.uio.no (Erik Naggum) writes:


>	Reuben Wachtfogel <ruben@bcstec.boeing.com> writes:
>	|
>	|   This may be a very basic question, but I really did
>	|   try to find the answer independently and I'm stumped.
>	
>	Thanks for this courtesy to other news readers.
>	
>	|   I wish to define a bracketing structure that could be inserted
>	|   around any of:
>	|   1) blah
>	|   2) <b1></b1>
>	|   3) <a1></a1>
>	|   4) <a></a>
>	
>	If I may make some assumptions about what your problem is, here are my
>	thoughts.
>	
>	I'm unclear on this point, myself, and would appreciate comments.
>	Given the document type declaration fragment
>	    <!ELEMENT outer (#PCDATA|inner)*>
>	    <!ELEMENT inner (#PCDATA)>
>	    <!ATTLIST (outer,inner) language CDATA #CURRENT>
>	and the document instance fragment
>	    <outer language=chinese>
>		ha che
>		<inner language=english>
>		    good food
>		</inner>
>		kuluyuk
>	    </outer>
>	what is the value of the language attribute when "kuluyuk" is seen?
>	
>	It seems clear that if the fragment ends with
>		</inner>
>		<inner>
>		    sweet-sour pork
>		</inner>
>	    </outer>
>	the second inner will have the language attribute value "english".


In the above case, the attribute only applies to 
the element that contains it. Therefore, in case (1) above,
kuluyuk is data within <outer language=chinese> and therefore
is treated as chinese. 

A case that may occur is:


<outer language=chinese>
	ha che
	<inner language=english>
	    good food
	</inner>
	kuluyuk
</outer>
<outer>
	ha che
	<inner>
	    kuluyuk
	</inner>    
</outer>

where the second kuluyuk is going to be looked upon as english
since the last value of attribute language in element inner
was english.
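
The reading described above, in which the most recently specified value is remembered separately for each element type, can be sketched in Python roughly as follows. This is only a model of that interpretation, not the behavior of any particular parser; the function name apply_current_defaults and the (element type, specified value) pairs are hypothetical. A parser that shared the value among all element types named in one ATTLIST would give different answers.

```python
def apply_current_defaults(start_tags):
    """Resolve a #CURRENT attribute for a sequence of start-tags.

    Each item is (element_type, specified_value_or_None). The most recently
    specified value is kept per element type (an assumption of this sketch).
    """
    current = {}           # element type -> most recently specified value
    resolved = []
    for element_type, specified in start_tags:
        if specified is not None:
            current[element_type] = specified
        elif element_type not in current:
            # The first occurrence of an element type must specify the value.
            raise ValueError(f"no current value yet for <{element_type}>")
        resolved.append((element_type, current[element_type]))
    return resolved

# The start-tags of the example above, in document order.
tags = [
    ("outer", "chinese"),
    ("inner", "english"),
    ("outer", None),
    ("inner", None),
]
resolved = apply_current_defaults(tags)
print(resolved)
```

Note that nothing here depends on nesting: the second <inner> gets "english" because of the previous <inner>, not because of the <outer> that contains it.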

What about making language a NOTATION?

Let's keep these discussions in news (rather than
private email), as they seem very useful in understanding
problems and solutions.


brad
-- 
- standard disclaimers apply -
jbm@hal.com (Brad Might) 
HaL Computer Systems - (512)794-2855
8920 Business Park Dr. Suite 300 Austin, Texas 78759

enag@ifi.uio.no (Erik Naggum) (06/13/91)

Brad Might <jbm@hal.com> writes:
|
|   In the above case, the attribute only applies to 
|   the element that contains it. Therefore, in case (1) above,
|   kuluyuk is data within <outer language=chinese> and therefore
|   is treated as chinese. 

This makes sense from a perspective outside SGML, but does ISO 8879
say so?  If attributes are passed by reference, a change in a #CURRENT
attribute will reflect on outer's attribute value, also.  In any case,
if #CURRENT attribute values are shared, are they so shared by the
SGML parser or the application software?  If they are not shared, at
least half the point of #CURRENT is gone, as I see it, and the feature
can produce very counter-intuitive results, although the initial thinking
seems intuitive.  (SGML has some counter-intuitive specifications
which produce intuitive results for the SGML user.  In general, it's
the user who has been given primary importance in the entire
standard, which I find (1) extremely uncommon, and (2) extremely good.)

|   <outer language=chinese>
|	    ha che
|	    <inner language=english>
|	        good food
|	    </inner>
|	    kuluyuk
|   </outer>
|   <outer>
|	    ha che
|	    <inner>
|	        kuluyuk
|	    </inner>    
|   </outer>
|
|   where the second kuluyuk is going to be looked upon as english
|   since the last value of attribute language in element inner
|   was english.

This implies that #CURRENT attributes are not shared, which
contradicts my reading of the standard, and which precludes attribute value
inheritance.  I agree that your example makes sense in general, but I
spent some time trying to figure out how #CURRENT works, and I think
that what makes sense in other contexts may not necessarily make sense
in the SGML context.

|   What about making language a NOTATION?

I don't understand what this would buy us.  Can you give me an example
or explain what you would accomplish?

</Erik>

jbm@hal.com (Brad Might) (06/13/91)

Erik Naggum replies:


>	This makes sense from a perspective outside SGML, but does ISO 8879
>	say so?  If attributes are passed by reference, a change in a #CURRENT
>	attribute will reflect on outer's attribute value, also.  In any case,
>	if #CURRENT attribute values are shared, are they so shared by the
>	SGML parser or the application software?  If they are not shared, at
>	least half the point of #CURRENT is gone, as I see it, and the feature
>	can produce very counter-intuitive results, although the initial thinking
>	seems intuitive.  (SGML has some counter-intuitive specifications
>	which produce intuitive results for the SGML user.  In general, it's
>	the user who has been given primary importance in the entire
>	standard, which I find (1) extremely uncommon, and (2) extremely good.)
>	
>	|   <outer language=chinese>
>	|	    ha che
>	|	    <inner language=english>
>	|	        good food
>	|	    </inner>
>	|	    kuluyuk
>	|   </outer>
>	|   <outer>
>	|	    ha che
>	|	    <inner>
>	|	        kuluyuk
>	|	    </inner>    
>	|   </outer>
>	|
>	|   where the second kuluyuk is going to be looked upon as english
>	|   since the last value of attribute language in element inner
>	|   was english.
>	
>	This implies that #CURRENT attributes are not shared, which
>	contradicts my reading of the standard, and which precludes attribute value
>	inheritance.  I agree that your example makes sense in general, but I
>	spent some time trying to figure out how #CURRENT works, and I think
>	that what makes sense in other contexts may not necessarily make sense
>	in the SGML context.
>	

Can you show me where you have read that #CURRENT attributes
are shared amongst different elements?

I find in "The SGML Handbook":
(emphasis mine)
-------
Clause 4 Definitions

4.67 current attribute: An attribute whose current (that is, most
recently specified) value becomes its default value.

NOTE -- The start-tag cannot be omitted for the first occurrence of AN
ELEMENT with a current attribute.

                AND

Annex B B.5.2.4 Changing Default Values

If the default value is specified as "CURRENT", the default will
automatically become the most recently specified value. This allows
an attribute value to be "inherited" by default from the previous
element OF THE SAME TYPE.
-------

Nowhere in the standard have I seen a mention of inheriting
attribute values from elements of a different type. I do see
this mentioned in "Practical SGML", but it's not the 
standard.

brad

enag@ifi.uio.no (Erik Naggum) (06/14/91)

Brad Might <jbm@hal.com> writes:
|
|   Can you show me where you have read that #CURRENT attributes
|   are shared amongst different elements?

No.  Not because I'm stubborn, but because you're right -- there is no
such place.  Several things contributed to my misunderstanding of
this.  The word "inherit" occurs in B.5.2.4 [38:20], and I think I
mixed the rank feature in (note the name group I used to indicate
that they belonged together -- all bogus), and added a touch of
so-called "object oriented" languages to expand the meaning of
"inherit" beyond the safety limit.

|   Nowhere in the standard have I seen a mention of inheriting
|   attribute values from elements of a different type.

I stand corrected.  I spent a lot of time leafing back and forth
trying to find places where it said I was wrong, but all I found was
places where it said I wasn't right, but could be interpreted to be.
When I looked for places where it said I was right, I found none.
This makes me understand how people can stick with small misunder-
standings for a long time.  Thanks for setting this straight so fast.

|   I do see this mentioned in "Practical SGML", but it's not the
|   standard.

Hmmm... one of the annoying errors that bit _me_.  Eeek.  :-)

To conclude, then, there is no way to make an attribute value be
inherited by subelements of an element which specifies a value.
Kind of sad, since it leaves us with the <!ELEMENT language ANY> solution.

Does your NOTATION suggestion provide a better solution?

</Erik>