erik@naggum.uu.no (Erik Naggum) (05/14/91)
James Clark's recent question prompted me to look for articles on the use of attributes in general, and I've found a number of references which appear negative or take a firm stand against using them. Without mentioning names, two large projects in which SGML is used take a firm stand against using attributes. The main argument in both is that the application will be able to sort out what's content and what's not, and that using attributes only complicates the parser/application interface, and also the parsing process.

1. What do you think about attributes versus nesting tags?

2. If you have decided against using attributes, what were the main argument(s) against using them? How did you represent the data that would otherwise have been in attributes?

3. Since NOTATION info is frequently provided in an attribute, does this mean you don't use NOTATIONs?

4. Are there parsers or other SGML software out there unable to process attributes?

5. If you have used attributes extensively, what were the major design choices you made between content and attribute for a given piece of information?

I'm hoping for a discussion of the de/merits of attributes, and will summarize any private replies I get.

Thanks for your time.

--
[Erik Naggum]        Professional Programmer    <enag@ifi.uio.no>
Naggum Software      Electronic Text            <erik@naggum.uu.no>
0118 OSLO, NORWAY    Computer Communications    +47-2-836-863
sjd%kirk%ebt-inc@uunet.UU.NET (Steve DeRose) (05/16/91)
Erik: could you cross-post this to comp.text.sgml for me? I'm having mailer troubles here. Thanks! Steve

[Done. Headers juggled so replies should now go to the author. </Erik>]

- - - - - -

Usually there is another agenda behind outlawing attributes (although there are legitimate arguments in that direction).

One agenda is to simplify parsers, but that is a silly place to simplify SGML; parsing attributes is trivial compared with several other, less useful, constructs (I know since I've written them). Same goes for the interface; there are *much* better places to simplify.

Another is to simplify searching; people using systems like PAT, which operates on strings rather than tags per se, are less happy with things that can move around (the order of attributes in a given start-tag is arbitrary). This is not PAT's fault, it's just an observation heard from some later users of it who want to do certain kinds of searches.

I'm surprised that someone argues that *avoiding* attributes would make it easier to separate content from non-content. Consider this example:

   <LIST TYPE=BULLETED>
   <LI>...</LI>
   ...
   </LIST>

versus the attribute-less:

   <LIST>
   <TYPE>BULLETED</>
   <LI>...</LI>
   ...
   </LIST>

Eliminating the attribute in this way *worsens* the confusion of markup with content, by putting the word "bulleted" in content. This, to my mind, is very bad indeed. The only alternatives that don't have this problem involve multiplying the number of tags, for example:

   <LIST>
   <BULLETED>
   <LI>...</LI>
   ...
   </LIST>

or

   <BULLETLIST>
   <LI>...</LI>
   ...
   </BULLETLIST>

This class of approaches needlessly complicates the DTD, and also loses the generalization that there is a class of object we might call a list, which has sub-types. Also, the former of these last two examples has the bad additional effect of dissociating the attribute's scope of applicability from that of its element.
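
A rough sketch of the declarations behind the two designs may make the DTD cost concrete. The content models, the NUMBERED sub-type, and the NUMBERLIST element are assumptions for illustration, not taken from any particular DTD; names longer than eight characters also assume a concrete syntax with a raised NAMELEN.

```sgml
<!-- Attribute version: one element type, sub-type carried by TYPE -->
<!ELEMENT list - - (li+)>
<!ELEMENT li   - O (#PCDATA)>
<!ATTLIST list type (bulleted|numbered) bulleted>

<!-- Tag per sub-type: every new kind of list is a new element type -->
<!ELEMENT bulletlist - - (li+)>
<!ELEMENT numberlist - - (li+)>
```

In the first form, adding a sub-type means adding one token to the TYPE group; in the second, it means a new element declaration plus a change to every content model that may contain a list.
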
One of the criteria for attribute-hood (IMHO) should be that the attribute share precisely the scope of the element it modifies; only by making it part of the same tag can that relationship be syntactically demonstrated.

The sub-type argument is especially important with things like tables, where each instance may have locally-significant information to be specified, such as column placements (I'd rather avoid putting that in the file at all, but this is what everyone does, including AAP, CALS, telecommunications, legal, and many other important DTDs). Certainly one shouldn't have to touch the DTD in order to define a new sort of table.

Another major argument in favor of attributes is the need for unique identifiers for referencing. Heaven help us if we have to put anchor id's for links into content!

As for NOTATION, there are two ways it's typically used. In the first, you declare a notation, then have some tags that represent the nonSGML data and have attributes (which may take entities as values), which specify where the data is:

   <ART filename=fig27>

In the second, you have an NDATA entity, which is the sole representation of the nonSGML data (no attribute needed):

   <!ENTITY fig27 SYSTEM "/bin/fig27" NDATA tiff>
   ...
   &fig27;

The latter eventually becomes messy because:

1) you cannot associate any differential information per instance of the notation; you cannot, for example, attach any information about how you want 'tiff' things treated in the document. For example, some figures can be floated to the next page if space is short, and some cannot; the only way to encode this is to put an ad hoc tag near the entity reference, with some ad hoc semantics that your application (not SGML) knows means that.

2) you have to add an ENTITY declaration for every new figure; authors shouldn't have to know that there even is such a thing as a markup declaration, and commonly they have neither the knowledge nor the software to do so.
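
To illustrate point 1, here is a minimal sketch of the attribute-based approach carrying per-instance information. Only the ART element, the filename attribute, the fig27 entity, and the tiff notation come from the examples above; the FLOAT attribute, its values, the fig28 entity, and the exact declarations are assumptions for illustration.

```sgml
<!-- Declarations (in the DTD or declaration subset) -->
<!NOTATION tiff SYSTEM "tiff">
<!ENTITY   fig27 SYSTEM "/bin/fig27" NDATA tiff>
<!ENTITY   fig28 SYSTEM "/bin/fig28" NDATA tiff>
<!ELEMENT  art - O EMPTY>
<!ATTLIST  art
           filename ENTITY    #REQUIRED
           float    (yes|no)  yes>

<!-- Document instance: same notation, different treatment per figure -->
<art filename=fig27>
<art filename=fig28 float=no>
```

With an ENTITY-valued attribute, as sketched here, each figure still needs its own entity declaration, so point 2 is not avoided. Declaring filename as CDATA would remove that requirement, at the cost of leaving the file lookup entirely to the application.
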

In passing, I should mention that there is one syntactic hassle with attributes which I hope will be addressed in a later version of the standard. There is no way, in a single line, to declare that a particular attribute can either a) apply to any tag at all, or b) apply to a tag which has already had attributes declared. Thus, you cannot say:

   <!ATTLIST #ANY ID ID #IMPLIED>

or

   <!ATTLIST p type CDATA #IMPLIED>

and then add elsewhere as an afterthought:

   <!ATTLIST (p|q) type CDATA #IMPLIED>

This is merely syntactic convenience; you can still get the effect.

As for Erik's excellent question about real software, awareness of attributes varies greatly between products. Certainly someone will correct me vociferously if I'm wrong, but the last I knew the Chameleon system was reluctant to handle attributes. Many authoring systems cannot decide formatting based on attributes, but that is typically seen as a limitation rather than a desire. My company's product lets you refer to attributes in style sheets, so that you can apply a completely different style definition to instances of an element that have different attribute values. I believe there are other products that can do this also. A few useful cases are shown below. Some encoding guidelines, including those of the Text Encoding Initiative, presume some such capability, and use attributes extensively. This is true of the majority of the large SGML projects I have worked with.

   <SEC SECURITY=TOP>
   <P LEVEL=EXPERT>
   <LIST TYPE=BULLETED>
   <TABLE LAYOUT=PARTSLIST>

In short, I would say attributes are essential, although they should be used judiciously.

Steven J. DeRose, Ph.D.
Sr. System Architect
Electronic Book Technologies
One Richmond Square
Providence, RI 02906
(401) 421-9550, fax -9551
sjd%ebt-inc@uunet.uu.net

--
Erik Naggum          Professional Programmer    +47-2-836-863
Naggum Software      Electronic Text            <enag@ifi.uio.no>
0118 OSLO, NORWAY    Computer Communications    <erik@naggum.uu.no>