ruben@bcstec.boeing.com (Reuben Wachtfogel) (06/08/91)
This may be a very basic question, but I really did try to find the answer independently and I'm stumped.

How can I define a paired tag (one with a required end_tag) that can be inserted at several places within the tag hierarchy? That is, assuming the hierarchy

    <a> <a1> <b1> blah blah blah <\b1> <\a1> <a2> <\a2> <\a>

I wish to define a bracketing structure that could be inserted around any of:

1) blah
2) <b1><\b1>
3) <a1><\a1>
4) <a><\a>

The catch is that the legal list of contained elements needs to be inherited from the context in which the bracketing tag itself was inserted (e.g. <bracket> under <a1> may contain <bn>, but under <b2> it may only contain whatever is legal for <b2>).

For example, consider the tag <LANGUAGE language="CHINESE"> </LANGUAGE>. I wish to be able to place this tag around any arbitrary "chunk" of data. +(Language) for the <doc> tag won't get me there, because any tags found in <Language> would be illegal.

Are my only alternatives:

1) A bunch of +() -() sets for each bracketing tag
2) Define LANG1 LANG2 LANG3 for different locations in the hierarchy?

Any help will be much appreciated. Post here or ruben@dsp35001.boeing.com

Thanx...
enag@ifi.uio.no (Erik Naggum) (06/09/91)
Reuben Wachtfogel <ruben@bcstec.boeing.com> writes:
|
| This may be very basic question but I really did
| try to find the answer independently and I'm stumped.

Thanks for this courtesy to other news readers.

| I wish to define a bracketing structure that could be inserted
| around any of:
| 1) blah
| 2) <b1><\b1>
| 3) <a1><\a1>
| 4) <a><\a>

If I may make some assumptions about what your problem is, here are my thoughts.

<assumption>
The bracketing, floating element will be used to encode some information _about_ the data content found inside the elements it embraces.

<solution>
This lends itself to attributes, and since you wish to have the attribute value remain across sub-elements, you can either (1) let the value be #IMPLIED by the application, so that the application will keep track of the nested elements, or (2) let the value be #CURRENT.

<discussion>
The disadvantage with (1) is that you need a special mechanism in the application software to handle this (and other attributes). The disadvantage with (2) is that #CURRENT attribute values do not respect hierarchies, but are "current" for a given attribute when the element to which it was specified ends.

I'm unclear on this point, myself, and would appreciate comments. Given the document type declaration fragment

    <!ELEMENT outer (#PCDATA|inner)*>
    <!ELEMENT inner (#PCDATA)>
    <!ATTLIST (outer,inner) language CDATA #CURRENT>

and the document instance fragment

    <outer language=chinese>
    ha che
    <inner language=english>
    good food
    </inner>
    kuluyuk
    </outer>

what is the value of the language attribute when "kuluyuk" is seen? It seems clear that if the fragment ends with

    </inner>
    <inner>
    sweet-sour pork
    </inner>
    </outer>

the second inner will have the language attribute value "english".

<assumption>
The information you need to encode is somehow external to the document in which it applies, and should be regarded as a different "view" of the document.
<solution>
For this purpose, you could define a concurrent document type which deals with this information, and not the other structural information. Presented this way, the elements of one document type do not interfere with the other.

<discussion>
One clear disadvantage with this approach is that you would not be able to relate the two document types very easily. For instance, the first document type's elements need not embrace complete elements of the second. Depending on your needs, this may be what you want.

<assumption>
There are external requirements that force you to use an element for this purpose, and you very badly need to have defined regions within which the attributes are declared, so badly that you could dispense with the rigidity of the markup.

<solution>
You could declare the element with content model ANY, and enforce the rigidity in the application, instead:

    <!ELEMENT language ANY>
    <!ATTLIST language language CDATA #IMPLIED>

If you do this, you can open a new element inside a <language> element and proceed inside that element according to its content model, and then close the <language> element.

<discussion>
I don't recommend this. An alternate solution would be to list all the valid sub-elements that language could possibly embrace, e.g.

    <!ELEMENT language (a|b1|b2|b3)>

but this quickly becomes difficult to maintain and understand, although it's better from a purist's view. Note: this works only if you allow <language> to embrace only _one_ element at a time. If you need to embrace several elements, things quickly get very difficult.

I need more information on your problem to be able to evaluate each of these or perhaps think of other solutions. I can't readily see an elegant way to do this without floating attributes (such as #CURRENT) or a concurrent document, but there are many ways to specify attribute values, including link attributes.
What we need (which may exist, but I may just have overlooked it), is a means to supply attribute values for the extent of an element and all sub-elements thereof. #CURRENT, in my reading, doesn't support this. I'd be happy to be proven wrong on this one.

I'd especially like to know what's the right interpretation of a #CURRENT attribute which is changed inside an element. I'm sorry that I can't test this because the parser I've written is based on my own fuzzy assumptions on the treatment of CURRENT values.

Can anybody help, either with theory or with practice?

</Erik>
--
Erik Naggum             Professional Programmer         +47-2-836-863
Naggum Software         Electronic Text             <ERIK@NAGGUM.NO>
0118 OSLO, NORWAY       Computer Communications    <enag@ifi.uio.no>
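Option (1) in the post above, where the value is left to the application and the application keeps track of the nested elements, can be sketched in a few lines. This is only an illustration in Python, not anything from the thread: the event tuples, the function name `texts_with_language`, and the idea that a parser hands the application start/text/end events are all assumptions made for the example.

```python
# Hypothetical event stream an application might receive from a parser:
#   ("start", element_name, attributes_dict)
#   ("text", character_data)
#   ("end", element_name)

def texts_with_language(events, attr="language"):
    """Yield (text, effective_value) pairs.

    The effective value of `attr` is the one specified on the nearest
    enclosing element, so sub-elements inherit it until it is overridden.
    """
    stack = []  # effective attribute value for each currently open element
    for event in events:
        kind = event[0]
        if kind == "start":
            _, _name, attrs = event
            inherited = stack[-1] if stack else None
            # An explicit value wins; otherwise inherit from the parent.
            stack.append(attrs.get(attr, inherited))
        elif kind == "end":
            stack.pop()
        elif kind == "text":
            yield event[1], (stack[-1] if stack else None)


# The instance fragment from the post, written as events.
events = [
    ("start", "outer", {"language": "chinese"}),
    ("text", "ha che"),
    ("start", "inner", {"language": "english"}),
    ("text", "good food"),
    ("end", "inner"),
    ("text", "kuluyuk"),
    ("end", "outer"),
]

result = list(texts_with_language(events))
for text, language in result:
    print(f"{text!r}: {language}")
```

With this kind of tracking, "kuluyuk" is reported as chinese again once </inner> has closed, which is the hierarchical behaviour the post is asking for. The cost is exactly the one named under <discussion>: the mechanism lives in the application, not in the markup.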
jbm@hal.com (Brad Might) (06/12/91)
In article <ENAG.91Jun9020802@gyda.ifi.uio.no> enag@ifi.uio.no (Erik Naggum) writes:

> Reuben Wachtfogel <ruben@bcstec.boeing.com> writes:
> |
> | This may be very basic question but I really did
> | try to find the answer independently and I'm stumped.
>
> Thanks for this courtesy to other news readers.
>
> | I wish to define a bracketing structure that could be inserted
> | around any of:
> | 1) blah
> | 2) <b1><\b1>
> | 3) <a1><\a1>
> | 4) <a><\a>
>
> If I may make some assumptions about what your problem is, here's my
> thoughts.
>
> I'm unclear on this point, myself, and would appreciate comments.
> Given the document type declaration fragment
>
>     <!ELEMENT outer (#PCDATA|inner)*>
>     <!ELEMENT inner (#PCDATA)>
>     <!ATTLIST (outer,inner) language CDATA #CURRENT>
>
> and the document instance fragment
>
>     <outer language=chinese>
>     ha che
>     <inner language=english>
>     good food
>     </inner>
>     kuluyuk
>     </outer>
>
> what is the value of the language attribute when "kuluyuk" is seen?
> It seems clear that if the fragment ends with
>
>     </inner>
>     <inner>
>     sweet-sour pork
>     </inner>
>     </outer>
>
> the second inner will have the language attribute value "english".

In the above case, the attribute only applies to the element that contains it. Therefore, in case (1) above, kuluyuk is data within <outer language=chinese> and therefore is treated as chinese.

A case that may occur is:

    <outer language=chinese>
    ha che
    <inner language=english>
    good food
    </inner>
    kuluyuk
    </outer>
    <outer>
    ha che
    <inner>
    kuluyuk
    </inner>
    </outer>

where the second kuluyuk is going to be looked upon as english, since the last value of attribute language in element inner was english.

What about making language a NOTATION?

Let's keep these discussions in news (rather than private email), as it seems very useful in understanding problems and solutions.

brad
--
- standard disclaimers apply -
jbm@hal.com (Brad Might)
HaL Computer Systems - (512)794-2855
8920 Business Park Dr. Suite 300
Austin, Texas 78759
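The reading described in this post, where the default of a #CURRENT attribute is the value most recently specified for an element of the same type, can be simulated as below. This is a sketch of that reading only and is not taken from ISO 8879 or from any parser; the function name `resolve_current` and the list-of-start-tags input are invented for the example, and raising an error when the first occurrence has no value is an assumption about how an application might report that case.

```python
def resolve_current(start_tags, attr="language"):
    """Resolve a #CURRENT-style attribute per element type.

    `start_tags` is a list of (element_type, attributes_dict) in document
    order. Returns a list of (element_type, resolved_value).
    """
    current = {}   # element type -> most recently specified value
    resolved = []
    for element_type, attrs in start_tags:
        if attr in attrs:
            # An explicit value becomes the new default for this type only.
            current[element_type] = attrs[attr]
        elif element_type not in current:
            # Assumption: no earlier value for this type means no default.
            raise ValueError(
                f"first <{element_type}> must specify {attr!r}"
            )
        resolved.append((element_type, current[element_type]))
    return resolved


# Start-tags of the two-<outer> example from the post, in document order.
start_tags = [
    ("outer", {"language": "chinese"}),
    ("inner", {"language": "english"}),
    ("outer", {}),
    ("inner", {}),
]

resolved = resolve_current(start_tags)
for element_type, value in resolved:
    print(element_type, value)
```

Under this model the second <outer> defaults to chinese and the second <inner> to english, which matches the post's claim that the second "kuluyuk" is treated as english. Note that nothing here is hierarchical: the defaults depend on document order and element type, not on nesting.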
enag@ifi.uio.no (Erik Naggum) (06/13/91)
Brad Might <jbm@hal.com> writes:
|
| In the above case, the attribute only applies to
| the element that contains it. Therefore, in case (1) above,
| kuluyuk is data within <outer language=chinese> and therefore
| is treated as chinese.

This makes sense from a perspective outside SGML, but does ISO 8879 say so? If attributes are passed by reference, a change in a #CURRENT attribute will reflect on outer's attribute value, also. In any case, if #CURRENT attribute values are shared, are they so shared by the SGML parser or the application software? If they are not shared, at least half the point with #CURRENT is gone, as I see it, and can produce very counter-intuitive results, although the initial thinking seems intuitive.

(SGML has some counter-intuitive specifications which produce intuitive results for the SGML user. In general, it's the User which has been given primary importance in the entire standard, which I find (1) extremely uncommon, and (2) extremely good.)

| <outer language=chinese>
| ha che
| <inner language=english>
| good food
| </inner>
| kuluyuk
| </outer>
| <outer>
| ha che
| <inner>
| kuluyuk
| </inner>
| </outer>
|
| where the second kuluyuk is going to be looked upon as english
| since the last value of attribute language in element inner
| was english.

This implies that #CURRENT attributes are not shared, which contradicts my reading of the standard, and which precludes attribute value inheritance. I agree that your example makes sense in general, but I spent some time trying to figure out how #CURRENT works, and I think that what makes sense in other contexts may not necessarily make sense in SGML context.

| What about making language a NOTATION?

I don't understand what this would buy us. Can you give me an example or explain what you would accomplish?

</Erik>
--
Erik Naggum             Professional Programmer         +47-2-836-863
Naggum Software         Electronic Text             <ERIK@NAGGUM.NO>
0118 OSLO, NORWAY       Computer Communications    <enag@ifi.uio.no>
jbm@hal.com (Brad Might) (06/13/91)
Erik Naggum replies:

> This makes sense from a perspective outside SGML, but does ISO 8879
> say so? If attributes are passed by reference, a change in a #CURRENT
> attribute will reflect on outer's attribute value, also. In any case,
> if #CURRENT attribute values are shared, are they so shared by the
> SGML parser or the application software? If they are not shared, at
> least half the point with #CURRENT is gone, as I see it, and can
> produce very counter-intuitive results, although the initial thinking
> seems intuitive. (SGML has some counter-intuitive specifications
> which produce intuitive results for the SGML user. In general, it's
> the User which has been given primary importance in the entire
> standard, which I find (1) extremely uncommon, and (2) extremely good.)
>
> | <outer language=chinese>
> | ha che
> | <inner language=english>
> | good food
> | </inner>
> | kuluyuk
> | </outer>
> | <outer>
> | ha che
> | <inner>
> | kuluyuk
> | </inner>
> | </outer>
> |
> | where the second kuluyuk is going to be looked upon as english
> | since the last value of attribute language in element inner
> | was english.
>
> This implies that #CURRENT attributes are not shared, which
> contradicts my reading of the standard, and which precludes attribute
> value inheritance. I agree that your example makes sense in general,
> but I spent some time trying to figure out how #CURRENT works, and I
> think that what makes sense in other contexts may not necessarily make
> sense in SGML context.
>

Can you show me where you have read that #CURRENT attributes are shared amongst different elements?

I find in "The SGML Handbook" (emphasis mine):

-------
Clause 4 Definitions

4.67 current attribute: An attribute whose current (that is, most recently specified) value becomes its default value.

NOTE -- The start-tag cannot be omitted for the first occurrence of AN ELEMENT with a current attribute.

AND

Annex B

B.5.2.4 Changing Default Values

If the default value is specified as "CURRENT", the default will automatically become the most recently specified value. This allows an attribute value to be "inherited" by default from the previous element OF THE SAME TYPE.
-------

Nowhere in the standard have I seen a mention of inheriting attribute values from elements of a different type. I do see this mentioned in "Practical SGML", but it's not the standard.

brad
--
- standard disclaimers apply -
jbm@hal.com (Brad Might)
HaL Computer Systems - (512)794-2855
8920 Business Park Dr. Suite 300
Austin, Texas 78759
enag@ifi.uio.no (Erik Naggum) (06/14/91)
Brad Might <jbm@hal.com> writes:
|
| Can you show me where you have read that #CURRENT attributes
| are shared amongst different elements?

No. Not because I'm stubborn, but because you're right -- there is no such place.

Several things contributed to my misunderstanding of this. The word "inherit" occurs in B.5.2.4 [38:20], and I think I mixed the rank feature in (note the name group I used to indicate that they belonged together -- all bogus), and added a touch of so-called "object oriented" languages to expand the meaning of "inherit" beyond the safety limit.

| Nowhere in the standard have I seen a mention of inheriting
| attribute values from elements of a different type.

I stand corrected. I spent a lot of time leafing back and forth trying to find places where it said I was wrong, but all I found was places where it said I wasn't right, but could be interpreted to be. When I looked for places where it said I was right, I found none.

This makes me understand how people can stick with small misunderstandings for a long time. Thanks for setting this straight so fast.

| I do see this mentioned in "Practical SGML", but it's not the
| standard.

Hmmm... one of the annoying errors that bit _me_. Eeek. :-)

To conclude, then, there is no way to make an attribute value be inherited by subelements of an element which specifies a value. Kind of sad, since it leaves the <!ELEMENT language ANY> solution.

Does your NOTATION suggestion provide a better solution?

</Erik>
--
Erik Naggum             Professional Programmer         +47-2-836-863
Naggum Software         Electronic Text             <ERIK@NAGGUM.NO>
0118 OSLO, NORWAY       Computer Communications    <enag@ifi.uio.no>
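If the <!ELEMENT language ANY> route is taken, the rigidity has to be enforced by the application, as noted earlier in the thread. One way to picture that check is to treat <language> as transparent: its children must be legal in whatever element the <language> wrapper itself sits in. The sketch below is an illustration only. It uses Python's standard `xml.etree.ElementTree` on a well-formed XML stand-in for the SGML instance (quoted attribute values, explicit end-tags), and the `ALLOWED` table is a made-up reduction of the <a>/<a1>/<b1>/<a2> hierarchy from the original question to "which child element types may appear", ignoring order, repetition, and character data.

```python
import xml.etree.ElementTree as ET

# Hypothetical content rules: element type -> child element types allowed.
ALLOWED = {
    "a": {"a1", "a2"},
    "a1": {"b1"},
    "a2": set(),
    "b1": set(),
}
WRAPPER = "language"  # the ANY-content bracketing element


def check(element, context=None, errors=None):
    """Collect (context, child) pairs where child is not allowed in context.

    `context` is the nearest enclosing element that is not a wrapper, so a
    wrapper inherits the content rules of the place where it was inserted.
    """
    if errors is None:
        errors = []
    if element.tag != WRAPPER:
        context = element.tag
    for child in element:
        if (
            child.tag != WRAPPER
            and context is not None
            and child.tag not in ALLOWED.get(context, set())
        ):
            errors.append((context, child.tag))
        check(child, context, errors)
    return errors


good_doc = (
    '<a><language language="CHINESE"><a1>'
    '<language language="ENGLISH"><b1>blah</b1></language>'
    "</a1></language><a2/></a>"
)
# <b1> is wrapped directly under <a>, where only a1 and a2 are legal.
bad_doc = '<a><language language="CHINESE"><b1>blah</b1></language></a>'

print(check(ET.fromstring(good_doc)))
print(check(ET.fromstring(bad_doc)))
```

The first document passes because each wrapped element is legal in the wrapper's surrounding context; the second is flagged because <b1> ends up, in effect, directly inside <a>. A real application would need full content-model checking, which this sketch does not attempt.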