tbray@watsol.waterloo.edu (Tim Bray) (07/26/88)
In article <61024@sun.uucp> tut%cairo@Sun.COM (Bill "Bill" Tuthill) writes:
> I'm moving a discussion of SGML started in comp.text.desktop into this
> newsgroup, because I think the issues are larger than a desktop.

and so on.

I have to disagree with nearly every line of Bill Tuthill's contribution. There are real problems with SGML, but they are not the ones he identifies. I think the problem is that he considers SGML strictly as a typesetting system, which is really beside the point. Detailed discussion follows, but the important points are:

1. If any on-line use for a document other than printing it out (Hypertext, information retrieval, on-line documentation) is contemplated, structural rather than typographical markup is a necessity. The arguments for this are many and are overpowering in their force. Rather than run through them, I refer everyone to the excellent article `Markup Systems and the Future of Scholarly Text Processing', by Coombs, Renear, and DeRose in the Nov. '87 CACM.

2. The SGML standard is a crock. I have not read it, but this is the unanimous consensus of everyone I know who has tried to work with it. The basic SGML syntax and concepts, however, are sound.

I think the logical conclusion should be: let's not let the failure of the standards drafters deter us from using this basically good idea.

Now, to address Mr. Tuthill's points:

>Instead, SGML should be compared to decent
>procedural languages such as troff and TeX. There are good reasons why
>troff and TeX macro packages were invented: well-designed macros provide
>writers with a descriptive layer ...

No, SGML shouldn't be compared to these things. SGML and the typesetting packages exist to solve different problems. When you want to typeset your SGML document, you should translate it into troff or TeX or PostScript or something that's good at that job. SGML exists to prevent typographical nits from getting in the way of structural document design decisions. See the CACM article.
>SGML is no panacea for portability. Being a metalanguage, SGML does not
>provide one syntax, but only a method for describing different syntaxes.
>On p. 68 Goldfarb states, "SGML allows variant concrete syntaxes." This
>is tantamount to saying it isn't really standard. It would probably be
>as difficult to translate between variant syntaxes as to translate between
>troff and Interleaf or Frame.

The great virtue of SGML is that it is very easy for computers to parse and is probably the most flexible form in which it is possible to store text. Our practical experience on the New OED project is that the first thing to do with input text is to do away with all the typesetting gibberish and get some approximation of SGML tags in there. You don't have to worry too much about getting them right; once the basic structure is there, it's remarkably easy to transform the text into the right setup, once you figure out what that should be.

>SGML was born obsolete. Graphics are missing from the specification, as
>are provisions for tables and equations.

It is certainly possible in SGML to make a reference to an externally-stored graphic. Then at typesetting time, you copy in the appropriate PostScript/pic/rasterfile or whatever. SGML does indeed allow the specification of tables and equations, in a typography-independent way that lends itself to a variety of information-retrieval applications. Try to make automatic sense out of tbl or eqn source! On the other hand, it's easy to translate SGML structures *into* tbl or eqn or whatever.

>SGML:
>    This added information, called <q>markup</q>, serves two purposes:
>    <ol>
>    <li>Separating the logical elements of the document; and
>    <li>Specifying the processing functions to be performed on those elements.
>    </ol>
>    This figure represents divine document intervention.
--------
>troff
>    This added information, called \*Qmarkup\*U, serves two purposes:
>    .NP
>    Separating the logical elements of the document; and
>    .NP
>    Specifying the processing functions to be performed on those elements.
>    .LP
>    This figure represents divine document intervention.

Which of these, do you think, lends itself better to online IR applications? Which is more easily automatically translated to the other? Both answers are obvious.

>In the concrete syntax described, the
>ASCII characters < > & % ; appear to be reserved symbols, but Goldfarb
>offers no method for printing these characters literally.

'<': &lt;. '>': &gt;. '&': &amp;. etc...

>SGML documents are supposed to be rigorous, but
>rigorous means inflexible.

A good point, and one of the big problems with the SGML standard. ISO SGML requires that one prepare what amounts to a *prescriptive* grammar for your document. This may be appropriate for airplane checkout manuals (maybe), but most document creators, when you get right down to it, know what they're doing pretty well and don't need a grammar getting in their way. Also there is the (common) problem of wanting to mark up an existing body of text (for example the Oxford English Dictionary) which just ain't gonna always follow the rules. Does this mean one gives up the descriptive power of structural markup?

Hey, I like troff/TeX and so on for doing typesetting. But typesetting is just one of many things that can be done with an electronic document. If you want enough flexibility to do some of those other things, don't limit yourself to typographical markup.

Cheers,
Tim Bray, New Oxford English Dictionary Project
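As a rough illustration of the claim that the structural form translates easily into the typographic one, here is a minimal Python sketch that turns the SGML sample quoted above into the troff sample quoted above. It is not taken from any real SGML tool. The function names are hypothetical, the tag-to-macro table covers only the tags in the sample, and the entity names lt, gt and amp are assumed rather than taken from the standard.

```python
import re

# Assumed entity names for the reserved characters; a given SGML
# application may declare different ones.
ENTITIES = {"&lt;": "<", "&gt;": ">", "&amp;": "&"}

# Matches the <q>...</q> element used in the quoted sample.
QUOTE = re.compile(r"<q>(.*?)</q>")


def decode_entities(text):
    """Replace entity references with literal characters (&amp; goes last)."""
    for ref in ("&lt;", "&gt;", "&amp;"):
        text = text.replace(ref, ENTITIES[ref])
    return text


def sgml_to_troff(source):
    """Translate the sample's tags into the macros used in the troff sample."""
    out = []
    for line in source.splitlines():
        line = line.strip()
        if line == "<ol>":
            continue  # the list start needs no troff output of its own
        if line == "</ol>":
            out.append(".LP")  # the troff sample closes the list with .LP
            continue
        if line.startswith("<li>"):
            out.append(".NP")  # the troff sample opens each item with .NP
            line = line[len("<li>"):]
        # <q>text</q> becomes the \*Q ... \*U string pair from the sample.
        line = QUOTE.sub(lambda m: "\\*Q" + m.group(1) + "\\*U", line)
        out.append(decode_entities(line))
    return "\n".join(out)
```

Going the other way is the hard direction: a program reading the troff would have to guess that a run of .NP requests closed by .LP is an ordered list, which is knowledge about one macro package and not about the document.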
romwa@gpu.utcs.toronto.edu (Mark Dornfeld) (07/27/88)
In article <7986@watdragon.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
>In article <61024@sun.uucp> tut%cairo@Sun.COM (Bill "Bill" Tuthill) writes:

Tim has accurately argued the points that Bill brought up. SGML is NOT a typesetting system; Bill really missed that one.

We (Royal Ontario Museum) are trying to standardize on SGML for many of our editorial projects. By standardizing on a markup language, we can write filters to troff, TeX, Pagemaker, Ventura and also ask for bids from typesetters who can read SGML. This multiplies our options tremendously.

>2. The SGML standard is a crock. I have not read it, but this is the

No, it isn't, well, at least not completely. It's usable and the standard is flexible as long as the DTD (Document Type Definition) is complete.

>Hey, I like troff/TeX and so on for doing typesetting. But typesetting
>is just one of many things that can be done with an electronic document.
>If you want enough flexibility to do some of those other things, don't
>limit yourself to typographical markup.

There's the key right there.

Mark T. Dornfeld
Royal Ontario Museum
100 Queens Park
Toronto, Ontario, CANADA
M5S 2C6

mark@utgpu!rom - or - romwa@utgpu
tbray@watsol.waterloo.edu (Tim Bray) (07/28/88)
In article <61454@sun.uucp>, sears@sun.uucp (Daniel Sears) writes:
>SGML provides rules for describing tag sets. So let's create two very simple
>tag sets that we will assume are SGML conforming. The first has two tags and
>the second has three.
>...
>it is possible to translate a document from the first tag set to the second.
>But it is not possible to translate a document from the second tag set to
>the first because there isn't an equivalent tag for <mark3>. What SGML
>tries to guarantee is a way of describing the different tag sets, but it
>does not guarantee that the tag sets will be rich enough to hold all the
>objects that other tag sets may contain.

Uh, am I missing something? The second has more structural information than the first. Clearly you can't translate without
1. Losing information, or
2. Passing the extra info through; maybe the text processor for the first example can be made to understand it.

If you want complete interchangeability of documents, you need a general agreement on all possible structural components of all documents. This is clearly impossible. Not that we shouldn't try - the AAP effort is worthwhile.

The next best thing is a standard, flexible, easily-parsed syntax for marking up the structural components that do exist so that we maximize our ability to translate them. This is all SGML is. But it's a *lot* better than the alternative - parsing troff/TeX gibberish.

>In summary, the goal of structured document systems is
>quite laudable, but I think it is necessary to distinguish a system like
>SGML from that goal.

I'll buy that. But for the time being, structural markup is the only safe way to store text for which there may be unanticipated future uses. This includes nearly all on-line text.

Tim Bray, New Oxford English Dictionary Project, U. of Waterloo
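The two choices listed above (lose the information, or pass it through) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag names mark1 and mark2 are invented, since only <mark3> is named in the quoted article, and a document is modelled as a plain list of (tag, text) pairs and not as parsed SGML.

```python
# Hypothetical tag sets. Only mark3 appears in the quoted article;
# mark1 and mark2 are assumed names for the two shared tags.
FIRST_SET = {"mark1", "mark2"}
SECOND_SET = {"mark1", "mark2", "mark3"}


def translate(elements, target_tags, passthrough=False):
    """Map (tag, text) pairs onto target_tags.

    Returns (translated, lost), where lost lists every tag whose
    structural information could not be represented in the target.
    """
    translated, lost = [], []
    for tag, text in elements:
        if tag in target_tags:
            translated.append((tag, text))
        elif passthrough:
            # Choice 2: hand the unknown tag through unchanged and hope
            # the downstream processor can be taught to understand it.
            translated.append((tag, text))
        else:
            # Choice 1: keep the text but drop the structure.
            translated.append((None, text))
            lost.append(tag)
    return translated, lost


doc_first = [("mark1", "alpha"), ("mark2", "beta")]
doc_second = [("mark1", "alpha"), ("mark3", "gamma")]

up, lost_up = translate(doc_first, SECOND_SET)        # nothing lost
down, lost_down = translate(doc_second, FIRST_SET)    # mark3 is lost
kept, lost_kept = translate(doc_second, FIRST_SET, passthrough=True)
```

In either direction the translator is trivial to write because the tags are explicit; what cannot be manufactured is structure that the target tag set has no name for.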
tut%cairo@Sun.COM (Bill "Bill" Tuthill) (07/30/88)
So I'm learning that SGML is not

    Standard      (multiple tag sets exist)
    General       (doesn't do graphics, tables, or equations)
    for Markup    (not intended for typesetting)
    a Language    (rather a syntax for describing a language)

Could SGML possibly be misnamed? Was it hopelessly naive of me to assume that the word Markup in its name indicates it is used for markup?

At least I'm learning something from this discussion.
romwa@gpu.utcs.toronto.edu (Mark Dornfeld) (08/01/88)
In article <62137@sun.uucp> tut%cairo@Sun.COM (Bill "Bill" Tuthill) writes:
>So I'm learning that SGML is not
>
>    Standard      (multiple tag sets exist)

The standard is comprised of rules, not tag sets. This is similar to the C language "standard", which is not a set of programs but rules by which to write programs.

>    General       (doesn't do graphics, tables, or equations)

File handling and processing is taken care of by the host system. An SGML document could be a bitmap graphic with appropriate description associated with it. The AAP implementation includes a very workable method for describing tables, not setting them. Entity sets for describing equation symbols are supplied in the annex to the standard.

>    for Markup    (not intended for typesetting)

That's right. Let typesetting programs do the typesetting.

>    a Language    (rather a syntax for describing a language)

I quote from "The Standard" (ISO 8879-1986(E), Page 1):

"This International Standard specifies a language for document representation referred to as the "Standard Generalized Markup Language" (SGML). SGML can be used for publishing in its broadest definition, ranging from single medium conventional publishing to multi-media data base publishing. SGML can also be used in office document processing when the benefits of human readability and interchange with publishing systems are required."

>
>Could SGML possibly be misnamed? Was it hopelessly naive of me to assume
>that the word Markup in its name indicates it is used for markup?
>
>At least I'm learning something from this discussion.

My background is Troff. When I began to see that our institution could not depend on a single typesetting system, but would be using graphical programs such as Ventura Publisher, Pagemaker, and would also have to send out material to be typeset at commercial typesetters, the value of SGML became clear.
Since the same editors/writers would be producing text for any one of these systems, it was important that we teach them a single markup system. SGML seems to be the one. It is a trivial matter to filter SGML to troff, Ventura, Pagemaker and even to some commercial typesetters' "markup."

Since many of our typesetting jobs are repetitive, but the typesetter isn't, we can achieve a level of control not possible before. Our documents become more consistent and less time is taken in editing and markup.

When we begin a massive records management project sometime in the future and wish to store data on optical media, we will want a system that is not tied to a particular processing system. Rather we will want the information to be described in an independent way. SGML again seems to fit the bill.

SGML doesn't care whether you indent your paragraphs one or two ems. It just wants to tell you there is a paragraph there and leave the formatting to the designer.

I used to think if the whole world would just learn troff, we'd be in great shape. It's easy to get trapped by such a flexible and powerful system as troff, but it just doesn't answer all our needs.

Mark T. Dornfeld
Royal Ontario Museum
100 Queens Park
Toronto, Ontario, CANADA
M5S 2C6

mark@utgpu!rom - or - romwa@utgpu
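The point about one or two ems can be made concrete with a small Python sketch in which the document records only that paragraphs exist and the indent is supplied separately by whoever designs the output. The function names and the list-of-strings document model are hypothetical, and this is not how any of the filters mentioned above actually work. It uses only the troff `.ti` request with the `m` (em) unit and TeX's `\parindent`, and it does not escape characters that are special to either formatter.

```python
def render_troff(paragraphs, indent_ems):
    """Emit each paragraph with a temporary indent of indent_ems ems."""
    lines = []
    for text in paragraphs:
        lines.append(f".ti {indent_ems}m")  # .ti breaks the line and indents the next one
        lines.append(text)
    return "\n".join(lines)


def render_tex(paragraphs, indent_ems):
    """Set \\parindent once, then separate paragraphs with blank lines."""
    header = f"\\parindent={indent_ems}em"
    return header + "\n\n" + "\n\n".join(paragraphs) + "\n"


# The document itself says nothing about indentation.
document = ["First paragraph.", "Second paragraph."]

house_style_a = render_troff(document, 1)  # one designer picks one em
house_style_b = render_tex(document, 2)    # another picks two ems, in TeX
```

Changing the house style or the typesetter means changing a parameter or a renderer; the stored text is untouched.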