[comp.text] The word "Markup" in SGML

dns@sq.uucp (David Slocombe) (08/02/88)
In article <62137@sun.uucp> tut%cairo@Sun.COM (Bill Tuthill) writes:
>
>Could SGML possibly be misnamed?  Was it hopelessly naive of me to assume
>that the word Markup in its name indicates it is used for markup?
>
When I first went to work for a newspaper in 1966, we "marked up" copy
with phrases like "Times Bld Ext 24/28 centre", in pencil, with a
circle drawn around all such markup to distinguish it carefully from
the text of the story. (This story was going to be keyboarded by
typists working on paper tape punches with no visual feedback -- "blind
perforators".  They were paid by the keystroke, with their rate based on
their average speed, so you had to make everything very, very clear.)
That sort of thing, and other marks that are not so easy to describe on
an ASCII keyboard, were called "markup".

The perforating typists, when they saw these encircled things, keyed in
special codes comparable to troff requests.  Sometimes they also keyed
special codes which were called "formats" but were really just simple
macros (non-recursive).   The programmers involved in typesetting tended
to refer to all these codes as "markup", by extension.

In the 1970s, the group of mainly large typesetting firms and in-house
typesetting departments that formed the membership of the Graphic
Communications Association (by "large" I mean big enough to have a
Xerox 9700 printing proofs and have a SECOND 9700 on hot-standby!)
developed the notion of "Generic Markup" or "Generic Coding", meaning
that a standard set of codes would be used in keyboarding, independent
of the particular typesetting equipment that was going to be used.  This
meant no retraining of operators when they changed equipment, and it
also meant that they could start thinking about having their clients
supply them with machine-readable manuscripts.

People like the purchasing agents of the U.S. DOD and the Canadian
Queen's Printer loved this idea because it meant they could get
manuscripts keyboarded with the generic markup in place -- cheaply --
and then shop around for the best competitive bid for typesetting the
machine-readable file.  Without Generic Markup, the typesetting
firms would insist that they had to re-keyboard the document to match
their typesetting equipment (at great expense, of course).  So both the
U.S. and Canadian governments became instant converts to Generic
Markup, and to SGML soon thereafter.

The codes described in the Generic Markup literature were still largely
typesetting-specific (for one thing they tended to be state-changes in
the middle of a stream of text:  process oriented), but "generic" in
the sense that the typesetting staff had some control over things at a
later stage in the production.  Hence you had "<h1>" for "heading style
1", and "<it>" for "italic [but the type-family to be specified
elsewhere]".  In effect they were calls to macros and that is how their
processing was generally implemented.

This was still called "markup", but you can see a trend away from
stylistic details.  "Heading style 1" may in practice expand into
"\f(TB\s(14" but it can also be considered a description of a line of
text as a logical element [of the document] of the type "heading-1".
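
The macro-call idea described above can be sketched in a few lines of
Python: each generic code is looked up in a per-device style table and
replaced by that device's own codes.  This is only an illustration, not
how any particular shop implemented it.  The h1 expansion is the one
quoted above; the \fI escape for italic, the bracketed codes for the
second device, and the names TROFF_STYLE, OTHER_DEVICE_STYLE and expand
are all assumptions made up for the example.

```python
import re

# Hypothetical style table for a troff-like typesetter.
TROFF_STYLE = {
    "h1": r"\f(TB\s(14",  # the expansion quoted in the post
    "it": r"\fI",         # assumed: troff's italic font escape
}

# Invented codes for a second, imaginary typesetter.
OTHER_DEVICE_STYLE = {
    "h1": "[FONT=TimesBold,SIZE=14]",
    "it": "[ITALIC]",
}

# A generic code is a short lowercase name in angle brackets.
GENERIC_CODE = re.compile(r"<([a-z0-9]+)>")


def expand(text, style):
    """Replace each generic code in text with its entry in style."""
    def lookup(match):
        # Codes missing from the table are left exactly as keyed.
        return style.get(match.group(1), match.group(0))
    return GENERIC_CODE.sub(lookup, text)


manuscript = "<h1>Council Votes Tonight"
print(expand(manuscript, TROFF_STYLE))
print(expand(manuscript, OTHER_DEVICE_STYLE))
```

The same keyboarded manuscript goes through either table unchanged,
which is the point of generic coding: only the style table is specific
to the equipment.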

From "Generic Markup" to SGML was a relatively small step conceptually.
(The step was taken around 1980 when the GCA GenCode committee merged
with the ANSI SGML committee.)  The codes you use to "mark up" your
document in SGML no longer have ANY visual-style meaning left, since
that issue has all been punted down-stream within the typesetting shop,
as it were.  But the codes are still necessary if the formatting is to
proceed without a hitch, so they inherit the name of "markup".

----------------------------------------------------------------
David Slocombe				(416) 963-8337
SoftQuad Inc.				uucp: {utzoo,utai}!sq!dns
720 Spadina Ave.			Internet: dns@sq.com
Toronto, Ontario, Canada M5S 2T9