[comp.text] What SGML is and isn't, again

tbray@watsol.waterloo.edu (Tim Bray) (07/21/89)

<intro>
I posted recently on text modelling and SGML; among other things I said: 
<SelfQuote>
 But the SGML standard itself is horribly flawed and permits some things which
 are unhelpful and even dangerous.  The details are too lengthy and sordid to
 go into here, but I can talk in detail on request. </SelfQuote>
Well, lots of people requested.  Herewith some specific gripes.  But first:
</intro>
<caveat>
<point>I'm not an SGML expert.  SGML requires that you write a prescriptive
grammar for your text before you can begin to use it, which you can't for the
OED, or for a large class of similar existing reference documents (e.g. big
dictionaries, legislation & commentary).  So we gave up on it some time
ago.  We do read <Tag>, and I've struggled the best part of the way through
the standard (arrgh), and try to keep in touch, but...  Bear in mind that all
the specific gripes listed below are just background in the context of this
one big problem that keeps it from being useful for an important class of
interesting reference texts.</point>
<point>
There are a lot of people working with (and some betting their jobs on) SGML.
If I'm going to sit here and point out what look like problems, somebody
really should get in and present the other side.  I don't think the SGML folk
are fools, or even wrong, just that there is this big class of problems for
which the system breaks down.</point>
</caveat>
<problist>
<problem>
The SGML spec is ugly, ugly, ugly.  <brag>I can read telephone system
technical specs</brag> and I have BIG comprehension trouble with the
standard.</problem>
<problem>
The SGML meta-syntax is ugly, ugly, ugly.  I have heard it likened to OS JCL.
Such ugliness may not be fatal in and of itself, but is often symptomatic of
design flaws at the core.</problem>
<problem>
Parsability.  The SGML syntax is such that it can't be described by a
context-free grammar.  This is just stupid, since it could have been specified right
without loss of expressive power.  How long has SGML been around, and how many
real industrial-strength parsers are there?  I've heard of two. Never looked
inside one; don't think I want to.</problem>
<problem>
Tag minimization.  The SGML standard allows all sorts of tags to be omitted,
shortened, or expressed along the lines of </> when it's "obvious" what's
going on.  I understand that specifying when it's obvious has turned out to be
an intractable problem and the spec may end up saying "and when such
minimization doesn't make the grammar ambiguous."  Sounds like an admission of
defeat to me.  The idea, it seems, was to make it easier to type in all those
tedious "<,/,>" sequences.  Well, if your editor program isn't smart enough to
figure out how to complete a partly-entered tag, or figure out which is the
appropriate end tag using a single keystroke, you're in deep trouble anyhow.
This is the same kind of thinking that gave us keyword abbreviation in PL/I.
Wouldn't be a problem except that people actually *use* this misfeature.</problem>
<problem>
Tag attributes.  This is a very controversial area in the SGML field.  Nobody
can agree whether tag attributes should or should not be used (people seem to
be drifting towards "as little as possible") and if so, what should be
expressed as tag and what as attribute.  I note that all the SGML editors I've
seen handle attributes rather awkwardly.</problem>
<problem>
Intellectual impoverishment.  (Special case of the One Big Gripe above).  Once
people have invested all the time in writing a DTD and bludgeoning a document
into line, they start believing in Complete Descriptive Markup and the Tooth
Fairy.  That is, they feel they have captured all the important structures,
and if I, with my flexible computing tools, want to start improving or further
elucidating the structure, I'm out of luck (and they don't want to go near
that grammar again if they don't have to).  This regardless of the fact that
human language has near-infinite flexibility and resists formal specification,
as Chomsky et al. found out in the 60's.</problem></problist>
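To make the editor point in the tag-minimization gripe concrete, here is a
minimal Python sketch of single-keystroke end-tag completion.  It is only an
illustration under stated assumptions: the input is fully tagged (no
minimization, no empty elements), attribute values contain no ">", and the
names open_elements and complete_end_tag are made up for this example, not
taken from any real SGML editor.

```python
import re

# Matches a start tag like <problem> or an end tag like </problem>.
# Assumption: attribute values never contain "<" or ">".
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def open_elements(text):
    """Return the names of elements still open at the end of text, outermost first."""
    stack = []
    for match in TAG.finditer(text):
        closing, name = match.group(1), match.group(2)
        if not closing:
            # Start tag: the element is now the innermost open one.
            stack.append(name)
        elif stack and stack[-1] == name:
            # End tag that matches the innermost open element.
            stack.pop()
        else:
            raise ValueError(f"unexpected end tag </{name}>")
    return stack


def complete_end_tag(text):
    """Return the end tag for the innermost open element, or None if nothing is open."""
    stack = open_elements(text)
    return f"</{stack[-1]}>" if stack else None
```

Given the text `<problist><problem>Tag minimization.`, complete_end_tag
returns `</problem>`: the editor only has to remember which elements are
open, so the typist never needs the `</>` shorthand.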
<coda>
Enuffa that.  Maybe SGML is the right structure for an important subclass of
the text universe; in particular, for interchange of relatively simple
documents.  But if it can't handle the big reference documents, then for a 
lot of applications it's a toy.</coda>
<sig>Tim Bray, New OED Project, U of Waterloo, Ont., Canada</sig>