[comp.text] Long nits: Re: software tools for SGML, proposed comp.text.sgml

mss+@andrew.cmu.edu (Mark Sherman) (08/02/90)

I normally try to refrain from flaming, but there is so much
"SGML can do everything" rhetoric being expounded by SGML vendors
that I feel the need to counterbalance it a bit. (For some truth in
advertising: some people claim that I flame because I have a vested
interest in ODA. As I've said many times, I view them as incomparable.
No, I will not get into that discussion on a bboard again. I did not
push ODA in the previous message and will not mention it again here.
Call me on the phone.)

I should also preface my comments by saying that Yuri's comments were
very reasonable. But I do have some nits.

Excerpts from netnews.comp.text: 1-Aug-90 Re: software tools for SGML..
Yuri Rubinsky@sq.sq.com (17297)

> Nonetheless, WYSIWYG has a place in the SGML world. In article
> <cah492S00VsA4tzYNl@andrew.cmu.edu>
> mss+@andrew.cmu.edu (Mark Sherman) writes:

> > One can define imaging semantics to be associated with SGML. The program
> > AuthorEditor from SoftQuad is quite nice in that regard. But its
> > conventions are parochial -- an "SGML" system knows nothing about AE's
> > semantics, unless the exchanging parties agree to information outside of
> > the standard.

> Mark is being a little bit mischievous here. Certainly my favourite
> dictionary defines parochial as "confined to a narrow area", but the
> "but" in his sentence doesn't recognize that very often this
> local functionality is indeed a Good Thing.

I do not want to get into the religious argument as to whether a
document ought to include its appearance along with its representation,
i.e., whether "this local functionality is indeed a Good Thing". My point
is only that the semantics are confined to Author/Editor. If you
send that SGML document to an Interleaf product that supports SGML, it
won't look the same. Historically, our project (Andrew) has taken the
view that life is on the screen and not on paper. Therefore, we followed
the suggested path of making life good on screen at the expense of
paper. For example...

>  Just because the footnotes in my final document may be
> printed in 6 or 8 point type is no reason why I should have to look
> at them in that size on the screen.

Hell, if I got a footnote, I just pop it up or collapse it on the screen
as necessary. My citations snap to the window that they cite. Who cares
about 6 or 8 point font? At least that's how we built our software 4
years ago. We too thought:

> I'm perfectly comfortable knowing that a pair of simple SGML tags will
> allow a text-for-paper formatter to ensure that the footnotes will
> appear at the bottom of the page or chapter end in a small point size,
> while a text-for-screen formatter may place them in-line or at the
> bottom of a screenful of text, or in a thin column to the left of the
> text body. Having computer screens imitate a piece of paper (of all
> ancient technologies!) hardly does justice to their capabilities. 

For us, read that first sentence as "if you want paper, we'll generate
troff". Well, we now have many megabytes of user complaints that they
use the computers to generate paper and what they see ain't what they
got. Their professional life does not match our whimsies. They use more than
one computer, more than one program and more than one medium (i.e. paper
in addition to files). They want it to be the same everywhere. Please
don't shoot the messenger, I have enough arrows in my back. Note that a
key element of CALS (one of the big SGML motivators in the US) is an
interpretation of display semantics for tags. CALS compliance is not
merely SGML compliance at the DTD level, but also visual compliance. 

> With a simple command in such an editor, you insert "list item" or
> "table" tags (for example); screen feedback assures you this is the
> element you wanted. ... A word of explanation for those who don't
> recognize these names: Both SoftQuad Author/Editor and IBM's TextWrite
> are conforming SGML editors, context-sensitive, structured and so forth,
> with good assistance for
> the user encoding an SGML document, and "QUASI-WYG" in the way described
> above. There are other SGML editors, Exoterica's Checkmark, Sobemap's
> Write-It and Datalogics' WriterStation, which don't do this.

Sure, as long as all the editors you use will interpret the "table" tag as a
table. I'll bet that Exoterica's Checkmark only knows that identifier as
a tag. If the file goes to a system that does not interpret "table" as a
table, then you'll just get streams of characters matching whatever
table content encoding Author/Editor or TextWrite use. Which is my
point: you can add a great deal to SGML (some of which has been written
down in the references in your message), but all of those additions are
outside of the SGML standard. I don't care to argue whether that is good
or bad.
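
To make that concrete, here is a minimal sketch in Python. Nothing in it
is Author/Editor's or TextWrite's actual encoding: the Element type, the
table/row/cell tag names and both renderers are invented for
illustration. It only shows that the same parsed structure comes out as a
table when the receiver shares the convention, and as a stream of
characters when it does not.

```python
# Hypothetical sketch: Element, the tag names and both renderers are
# assumptions made for illustration, not any vendor's real format.
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    tag: str
    children: List[Union["Element", str]] = field(default_factory=list)


def text_of(node: Union[Element, str]) -> str:
    """Concatenate all character data below a node, ignoring the tags."""
    if isinstance(node, str):
        return node
    return "".join(text_of(child) for child in node.children)


def render_generic(node: Element) -> str:
    """A receiver that knows "table" only as an identifier."""
    return text_of(node)


def render_with_table_convention(node: Element) -> str:
    """A receiver that shares the (out-of-standard) table/row/cell agreement."""
    if node.tag != "table":
        return text_of(node)
    lines = []
    for row in node.children:
        if isinstance(row, Element) and row.tag == "row":
            cells = [text_of(c) for c in row.children if isinstance(c, Element)]
            lines.append(" | ".join(cells))
    return "\n".join(lines)


table = Element("table", [
    Element("row", [Element("cell", ["part"]), Element("cell", ["qty"])]),
    Element("row", [Element("cell", ["bolt"]), Element("cell", ["12"])]),
])

print(render_generic(table))                # characters run together
print(render_with_table_convention(table))  # one line per row
```

The second renderer works only because both ends agreed on what the tags
mean, and that agreement lives outside the standard.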

> Agfa Compugraphic CAPS, Xyvision, Frame, Intergraph, Interleaf, Context,
> Datalogics, Arbortext, SoftQuad and perhaps others (apologies to anyone
> I've forgotten in this list) have demonstrated the ability to take SGML
> files encoded using specific tagsets (generally CALS 28001) and ...

A warning to the casual user: just because a vendor says they are CALS
compliant does not mean they have a general SGML system. For example,
one can write an editor that has the CALS DTD and imaging semantics
built in. It will work just fine with other CALS systems, but be close
to useless with any general purpose SGML system or other tag
interpretation. In fact, a salesman from one of the above mentioned
vendors (not SoftQuad) told me that their product worked exactly that
way, "so it was a fool proof CALS system -- the user need never worry
that they were generating some SGML that would not be CALS compliant."

> show them on the screen matching line-for-line what will be output to a
> printer.

Right. Because CALS is not only SGML, but a collection of *other*
standards (e.g., MIL 28001) that define what those tags mean and how to
interpret the content so tagged. Not true for generic SGML.

> The standard defines a document as (more or less)
> a Document Type Definition -- the set of elements, other constructs,
> and their relationships -- followed by an "instance" of that DTD,
> content marked up using the semantics rigidly prescribed by the DTD.

> An ability to read the DTD is a vital function within any SGML system.
> Accordingly, there is a completely standardized, interchangeable
> method, within the standard, to pass along the data content notations,
> such as CGM, or TIFF, or RIFF, or IGES, or IFF, or anything. It is
> not the job of SGML (nor should it be) to dictate how applications
> software will respond to the content being passed.

> "Our local cabal" has nothing to do with the story. Anyone with
> an SGML parser can read any SGML file and be passed a meaningful
> output stream.

We are violently agreeing. I am speaking from the perspective of a user,
not an implementor. When someone asks a question like "I have a document
with a CGM drawing in it and want to send it to a PostScript printer. I
heard that SGML is an interchange medium that supports CGM and
PostScript.  Can I use SGML for the conversion?", you *know* they are
asking whether they can print their file, not whether you can write a
file with little tags saying "here are PostScript bytes, here are CGM
bytes". They want the drawing converted. For the editor, printer or
other imaging program to work, it has to know (1) that your tags mean
that the data are represented as CGM, PostScript, CALS tables, AAP
equations, or whatever ("anything") and (2) how to process those bytes
that are so tagged. As you say, SGML does not specify anything about how
to process the content. There are lots of ways to say how to process the
content: all outside of the SGML standard. With just SGML and a generic
SGML parser, I can parse the bytes and print a wonderful message: "I
just found some CGM bytes" and then do nothing more with them. Actually,
I can also say that it was legal for those bytes to appear at that
location in the document, and possibly where else I could put those
bytes.  Sorry, that is not what most users expect.
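
A rough sketch of that situation in Python, under stated assumptions:
this is not a real SGML parser, and the (notation, bytes) pairs, the
process function and the handler table are all hypothetical. The parser
side can label the bytes; whether anything useful happens to them depends
entirely on handlers that the application supplies.

```python
# Hypothetical sketch: not a real SGML parser or any product's API.
from typing import Callable, Dict, List, Tuple

# What a generic parser can hand back: a notation name and the raw bytes.
Entity = Tuple[str, bytes]

# Handlers are application knowledge, agreed on outside the SGML standard.
Handler = Callable[[bytes], str]


def process(entities: List[Entity], handlers: Dict[str, Handler]) -> List[str]:
    """Dispatch each tagged chunk to a handler, if the application has one."""
    results = []
    for notation, data in entities:
        handler = handlers.get(notation)
        if handler is None:
            # All that SGML alone lets us say about the content.
            results.append(
                f"I just found some {notation} bytes ({len(data)} of them)"
            )
        else:
            results.append(handler(data))
    return results


document = [("CGM", b"\x00\x21"), ("PostScript", b"%!PS showpage")]

# With SGML alone: every chunk is recognized, none is processed.
generic = process(document, {})

# With one piece of outside knowledge: only PostScript gets handled.
with_app = process(document, {"PostScript": lambda data: "sent to printer"})

print(generic)
print(with_app)
```

Note that nothing here converts CGM to PostScript. That converter would
be yet another handler, written and agreed on outside of SGML.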

> b) "SGML Support Facilities: Techniques for Using SGML". The DTDs ...
> contain "content models" for tables of varying complexity.

For the layman: this means that if you want to exchange a table or
equation from your system to another system, your translator must
convert from your representation (say SYLK) into the DTD's format in
SGML. By the way, make sure that the receiving SGML system understands
the same DTD, or you will still lose your table at the other end. (Yeah,
yeah, I know: the table is still there, as marked-up SGML, and the DTD
syntax rules can be passed along, but that ain't enough for an editor to
understand the data as a table -- it can only be understood by the
recipient as a collection of structured content.) There are lots of
techniques for using SGML, but saying "SGML" by itself is not enough to
answer most users' questions.

		-Mark