dns@sq.sq.com (David Slocombe) (03/04/90)
In article <2547@castle.ed.ac.uk> sean@castle.ed.ac.uk (S Matthews) writes:
> . . . you may think that the results of
> your favorite wysiwig wp are up to TeX standards, but in larger
> documents you quickly start to lose stylistic coherence.
>
> To keep that coherence, you need a markup language, be it troff, sgml or
> TeX.
>

[SGML == Standard Generalized Markup Language (IS 8879)]

SGML is not a "markup language" in the same sense as troff and TeX
are: there is historical justification for the use of the term
"markup" in SGML, but it *does* cause much confusion.

Markup was originally the marks put on a manuscript by the
copy-editor or book-designer to indicate to the compositor how the
manuscript was to be formatted: in words, things like "Compose this
text in 10-point Times Medium Italic on a 12-point body, justified on
a 22-Pica slug with indents of 1 en on left and right". With the
introduction of computers, these things got coded into arcane escape
sequences which make even raw troff or TeX look pretty easy. Many
compositors were glad to be retiring then!

(Typically the code-set used wasn't ASCII: it was TTS, or
"Teletypesetting code", with strange control-codes like "elevate",
"paper-feed", and "quad-left", designed for paper-tape-driven
linecasters made by Linotype and Intertype. As people have done with
the codes below 040 in ASCII, the photocomposer manufacturers felt
free to redefine the semantics of these linecaster-specific codes in
any way they saw fit -- differently for each brand of machine, of
course! TTS was actually a 6-bit extension of the 5-bit Baudot code,
and was designed originally (I think) for automating the typesetting
of stock market quotations at major newspapers. At least that's the
first use *I* heard of.)

Each new phototypesetter to come on the market had its own
proprietary "language" of escape sequences. Since typesetting houses
usually stuck with one manufacturer, they didn't mind this too much
(at first).
But their clients, especially their *large* clients (like the U.S.
Govt), soon got the idea of doing the original "keyboarding" on their
own computers and sending machine-readable files to their typesetting
vendors. Then the catch became obvious: you couldn't keyboard a
manuscript until you knew the escape sequences you had to use (called
"markup" by analogy), and you couldn't do that until you had chosen
your typesetting vendor so you knew what "markup language" to use.
So, on large projects, you tended to get locked into a specific
vendor because you had on-going keyboarding operations. Not to
mention the operator-training problems if you switched vendors and
had a new phototypesetter language to learn.

So there was a movement to create a standard markup language
(initially called "GenCode") which typesetting vendors would all be
persuaded to accept as input. Then, purchasers of typesetting could
keyboard their manuscripts in only one way regardless of the
phototypesetter that would ultimately be used. It would be the type
houses' problem to translate the GenCode into the language of their
own photocomposer machines. The GenCode effort was spearheaded by the
Graphic Communications Association (GCA) -- an industry group -- who
own the trademark "GenCode".

The codes proposed were in plain, printable ASCII and had names that
reflected the human function of the piece of the manuscript, just
like troff macros: "<P>" instead of "ESC]@#22&" or something similar
in TTS. But still, there was a one-to-one mapping intended between
physical changes in the typesetting parameters and the "generic"
codes that had to be typed. The model was a state-model: one
code-string put in for each change to the state of the typesetter (as
with troff and TeX today).

Concurrently, an ANSI committee started work on a standard for coding
documents, based on Charles Goldfarb's GML computer typesetting
language at IBM.
The model in GML was hierarchical rather than a state-model: the
approach was to represent the document as a hierarchical
data-structure first and foremost, then to specify how each of the
pieces ("elements") of that data-structure (within their context) was
to be typeset for a particular book-design, and then to generate the
state-model codes required by the phototypesetter to accomplish that
result.

Because the document was represented as a hierarchy, it was a small
step to think of it with its [structural] "markup" -- see the
evolution of the term? -- as an instance of a language with a
context-free grammar. Then, by defining the grammar, you could define
the class of hierarchical structures that the documents could have.
If you made this grammar user-definable, with a parser-generator to
create a parser for that grammar, you gained *both* enormous
flexibility in the kinds of document-structures you could cope with
in practice, *and*, at the same time, an elegant way to validate
documents for consistent structural usage before they were typeset.

At this point the GCA committee working on GenCode and the ANSI
committee working on a standard based on GML combined their efforts,
the work was upgraded to the ISO level, and Standard Generalized
Markup Language (SGML) was on the way to being born.

BUT.... how things had changed by this point! SGML had become a way
of encoding the logical [hierarchical] structure of a document and
there was no mention at all of how the individual parts of that
structure were to be represented on the printed page! Somehow, *that*
had become an exercise left to the implementor!

There is, today, *no* standard method for encoding the way things
are to *look* on the page.
SGML *is* a major achievement, because -- apart from its obvious uses
in document databases -- it at least provides well-behaved "handles"
for each piece of a document that a formatter needs to refer to, and
an SGML parser can guarantee for the formatter that there are no
surprises in the structure of the input document. But it provides no
way to specify the *design* of the printed document (i.e. how it is
to be formatted). This is ironic, because the whole SGML effort
started out with formatting very definitely in the forefront.

So there is a *new* ISO committee. (Some members of the old SGML
committee simply moved over to this new committee when SGML was
finished.) It meets under the auspices of ISO/IEC JTC1/SC18/WG8 and
it is attempting to define a second, complementary, standard with the
awkward title:

        Information Processing -- Text Composition -- Document
        Style Semantics and Specification Language

Or DSSSL for short.

The job of this new standard is to define a language that can be used
to specify -- for any document that is an instance of a *given* SGML
document-type-definition (i.e. grammar) -- how that document is to be
transformed. Usually (but not necessarily) the transformation wanted
is into a visual representation of the document, or, rather, into a
data-file that directly describes that visual representation.

The form of the data-file usually assumed in the committee's
deliberations is "SPDL": Standard Page Description Language -- think
"PostScript/Interpress" and you've got the idea. SPDL is the product
of yet another ISO standards committee with co-chairpersons who just
happen to work for Adobe and Xerox respectively.

SPDL is very close to completion. In my opinion, DSSSL is a long way
from completion.
But, when it is done, you'll have:

-- a standard way (SGML) to specify the logical structure of a class
   of documents, as well as a way to encode a document which is an
   instance of that class, and

-- a standard way (DSSSL) to specify how documents in a given SGML
   class are to be represented visually, and

-- a standard way (SPDL) to represent in the computer the concrete
   visualization of a specific document.

    SGML-document ==> parser/formatter ==> SPDL-print-file.
                        |        |
                      SGML     DSSSL
                     doctype  specification

SGML parser-generators -- lex/yacc just isn't up to the task -- are
not *really* hard to write. (The hard part is understanding the
Standard!) But DSSSL formatters do not yet exist, and the technology
for writing them has yet to be invented.

In article <RUSTY.90Mar1112619@garnet.berkeley.edu> rusty@garnet.berkeley.edu (rusty wright) writes:
> (1) With a wysiwyg system you are constantly made aware of the
> formatting; some would say that you are being distracted by the
> formatting. During the writing you should only be worrying about the
> content and leave worrying about the appearance until just before the
> final draft.
>

One of the proven advantages of SGML is precisely that writers can
keep their minds off formatting issues while doing their jobs. Let
writers write, and designers design!

----------------------------------------------------------------
David Slocombe                          (416) 963-8337
Vice-President, Research & Development  (800) 387-2777 (from U.S. only)
SoftQuad Inc.                           uucp: {uunet,utzoo}!sq!dns
720 Spadina Ave.                        Internet: dns@sq.com
Toronto, Ontario, Canada M5S 2T9        Fax: (416) 963-9575
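
The grammar-driven validation described in the post above can be
sketched roughly as follows. This is a minimal Python illustration,
not SGML itself: the Element class, the MEMO_DOCTYPE table and the
validate function are hypothetical names, and each element's content
model is written as a regular expression over child names rather than
in real document-type-definition syntax.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    """One node of the document hierarchy (hypothetical structure)."""
    name: str
    children: list = field(default_factory=list)  # Elements or text strings


# Hypothetical document type: each element name maps to a regular
# expression over the space-terminated names of its children.
# "#text" stands for character data.
MEMO_DOCTYPE = {
    "memo":  r"title (para )+",
    "title": r"(#text )*",
    "para":  r"((#text|emph) )*",
    "emph":  r"(#text )*",
}


def validate(element, doctype):
    """Return a list of structural errors in element and its descendants."""
    model = doctype.get(element.name)
    if model is None:
        return [f"undeclared element <{element.name}>"]

    # Flatten the children into a string such as "title para para ".
    child_names = "".join(
        (c.name if isinstance(c, Element) else "#text") + " "
        for c in element.children
    )

    errors = []
    if re.fullmatch(model, child_names) is None:
        found = child_names.strip() or "(empty)"
        errors.append(f"<{element.name}> has invalid content: {found}")

    # Check every child element against its own content model.
    for c in element.children:
        if isinstance(c, Element):
            errors.extend(validate(c, doctype))
    return errors


good = Element("memo", [
    Element("title", ["Markup"]),
    Element("para", ["Let writers ", Element("emph", ["write"]), "."]),
])
bad = Element("memo", [Element("para", ["No title here."])])

print(validate(good, MEMO_DOCTYPE))
print(validate(bad, MEMO_DOCTYPE))
```

Because the document type is plain data handed to validate, a
different class of documents needs only a different table, which is
the user-definable-grammar idea in miniature. Nothing in the table
says how a title or a paragraph should look on the page.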
ken@cs.rochester.edu (Ken Yap) (03/04/90)
|SGML is not a "markup language" in the same sense as troff and TeX
|are: there is historical justification for the use of the term
|"markup" in SGML, but it *does* cause much confusion.

Coombs, in his CACM article on markup languages of about two years ago
calls these descriptive markup and procedural markup respectively.

I wish I had the reference online. I also wish there was a standard for
machine readable references and that journals would use this standard
so that we could run our light pens over the strips and thousands of
readers worldwide would not have to key these in manually.
cso@organ.cis.ohio-state.edu (Conleth O'Connell) (03/05/90)
In article <1990Mar4.045813.14391@cs.rochester.edu> ken@cs.rochester.edu writes:
>|SGML is not a "markup language" in the same sense as troff and TeX
>|are: there is historical justification for the use of the term
>|"markup" in SGML, but it *does* cause much confusion.
>
>Coombs, in his CACM article on markup languages of about two years ago
>calls these descriptive markup and procedural markup respectively.
>
>I wish I had the reference online. I also wish there was a standard for
>machine readable references and that journals would use this standard
>so that we could run our light pens over the strips and thousands of
>readers worldwide would not have to key these in manually.

The Coombs reference is:

    author ="J.H. Coombs and A.H. Renear and S.J. DeRose",
    title ="Markup Systems and the Future of Scholarly Text Processing",
    year ="1987",
    month ="November",
    journal="Communications of the ACM",
    volume ="30",
    number ="11",
    pages ="933-947",

Hope this helps,
Con
-=-
Conleth S. O'Connell
Department of Computer and Information Science
The Ohio State University
cso@cis.ohio-state.edu
2036 Neil Ave., Columbus, OH USA 43210-1277
pedersen@philmtl.philips.ca (Paul Pedersen) (03/05/90)
In article <1990Mar3.224625.2621@sq.sq.com> dns@sq.com (David Slocombe) writes:
> [a lot of stuff deleted..]
>There is, today, *no* standard method for encoding the way things
>are to *look* on the page. SGML *is* a major achievement, because
> [more stuff deleted..]

I must object. There is such a standard: ISO 8613 "Office Document
Architecture" or, if you prefer, CCITT Rec. T.41x "Open Document
Architecture". In this standard, both the logical structure and the
"layout structure" are part of the interchanged document, including
"styles" for presentation and selection of layout.

While ODA (currently) cannot handle the complexity of layout hoped
for in DSSSL, it is quite suitable for normal "rectangular" layout.

Paul