[comp.text] SGML-based tools for University Use

jmwobus@cmx.npac.syr.edu (John Wobus) (08/02/88)

I gather that along with SGML, you need further standards (i.e. sets
of tags for specific types of documents) and formatting or translation
programs to actually use it.  Anyone know of suitable further standards and
tools for general university use?  E.g. standards for theses, dissertations,
term papers, papers for publication, internal position papers, etc?

Note: I've seen the standard and the AAP publications and both look
a bit daunting compared to old GML which I found very easy & convenient.

John Wobus
Syracuse University

dns@sq.uucp (David Slocombe) (08/04/88)

In article <584@cmx.npac.syr.edu> jmwobus@cmx.npac.syr.edu (John Wobus) writes:

>I gather that along with SGML, you need further standards (i.e. sets
>of tags for specific types of documents) and formatting or translation
>programs to actually use it.

The SGML standard guarantees that a conforming parser will be able
to parse ANY document that is a valid instance of the Document Type
Definition ("DTD", or "doctype") preceding it.  There is no
requirement that the doctype being used be a standard one.

Individual organizations or groups of organizations have strong motivation
for developing "standard" doctypes for use by their employees, clients,
vendors, or members.  In the future there may be a way to "register"
these widely-used doctypes (where they are shared across organizations)
so that a public designation for a doctype is all that has to be transmitted
with the document using it, instead of the whole doctype in all its
gory detail.

Some clear candidates for such public registration -- because they
already exist and are used by people in different organizations --
are the U.S. Dept. of Defense Computer-aided Acquisition and Logistic
Support (DOD CALS) doctype (which has a "MIL" number now), and the three
Association of American Publishers (AAP) Electronic Manuscript
Project doctypes, for Books, Articles, and Serials.

The CALS doctype is mandated by the DOD for use by vendors supplying
documentation under DOD contracts.

The AAP doctypes will be used by some publishers to standardize the way
in which their authors' manuscripts are submitted in machine-readable form.

In most publishing applications out there in the real world, the
doctypes being used are not standard.  You locally create a doctype
suitable for a class of documents and go with that.  If you need
to send your document to another site, you ship the doctype at the
head of the document, knowing that -- as long as the doctype precedes
the document -- the other site's SGML parser will be able to parse it.

But if you send your doctype + manuscript to the DOD and you
didn't use THEIR doctype, they may well send it back saying,
NOT "we can't parse this" [because they can!], but "this [valid]
document isn't what WE want!  Make it conform to OUR doctype."
They will determine this automatically by replacing YOUR doctype
with the CALS doctype at the head of your manuscript and feeding
the works into their SGML parser, which will complain that your
document is invalid.
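
The doctype-swapping check described above can be sketched in a few
lines of Python.  This is only a toy illustration, not an SGML parser:
it assumes the manuscript is tagged with explicit, properly nested
start and end tags so that Python's xml.etree module can read it, and
it models a doctype as a hypothetical table mapping each element name
to a regular expression over its child elements.  The names
LOCAL_DOCTYPE, REQUIRED_DOCTYPE, MANUSCRIPT and validate are invented
for the example; REQUIRED_DOCTYPE merely stands in for a recipient's
doctype and is not the real CALS doctype.

```python
import re
import xml.etree.ElementTree as ET

# Toy "doctypes" (hypothetical): element name -> regular expression that the
# comma-separated list of its child element names must match in full.
# An empty pattern means the element may contain text only.
LOCAL_DOCTYPE = {
    "memo": r"to,from,body",
    "to": r"",
    "from": r"",
    "body": r"para(,para)*",
    "para": r"",
}

# Stand-in for the doctype the recipient insists on (NOT the real CALS doctype).
REQUIRED_DOCTYPE = {
    "report": r"title,section(,section)*",
    "title": r"",
    "section": r"para(,para)*",
    "para": r"",
}

# A manuscript written against LOCAL_DOCTYPE.
MANUSCRIPT = (
    "<memo><to>DOD</to><from>Vendor</from>"
    "<body><para>Text.</para></body></memo>"
)


def validate(document, doctype, root):
    """Return a list of complaints; an empty list means the document is valid."""
    tree = ET.fromstring(document)
    errors = []
    if tree.tag != root:
        errors.append(f"root element is <{tree.tag}>, expected <{root}>")
    for element in tree.iter():
        model = doctype.get(element.tag)
        if model is None:
            errors.append(f"<{element.tag}> is not declared in this doctype")
            continue
        children = ",".join(child.tag for child in element)
        if re.fullmatch(model, children) is None:
            errors.append(
                f"<{element.tag}> contains ({children}), expected ({model})"
            )
    return errors


if __name__ == "__main__":
    # Same manuscript, checked first against its own doctype, then against
    # the doctype substituted by the recipient.
    print(validate(MANUSCRIPT, LOCAL_DOCTYPE, "memo"))
    for complaint in validate(MANUSCRIPT, REQUIRED_DOCTYPE, "report"):
        print(complaint)
```

The same manuscript passes against the doctype it was written for and
draws complaints against the substituted one, which is the "valid, but
not what WE want" situation.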

>                 Anyone know of suitable further standards and
>tools for general university use?  E.g. standards for theses, dissertations,
>term papers, papers for publication, internal position papers, etc?

In association with Virginia Polytechnic, SoftQuad is currently
creating a Document Type Definition (DTD) for dissertations in
SGML format.  University Microfilms has agreed to accept, in
electronic form, dissertations matching this DTD.  This would be
another candidate for public registration in the future.

>Note: I've seen the standard and the AAP publications and both look
>a bit daunting compared to old GML which I found very easy & convenient.

(a) GML is a language for a text-formatting program first and foremost.
    SGML is a document structuring language, period.  It happens to share
    some early history with GML.

(b) No one has to read all of the AAP booklets in order to get a
    manuscript AAP-encoded.  SoftQuad's president, Yuri Rubinsky,
    once dictated the SGML coding rules over the phone to an author
    of a book being published by Bantam, so she could "tag" her
    WordStar files.  It took him about 15 minutes to explain
    everything she needed to know, without visual aids (let alone
    AAP booklets!).  Those 15 minutes included the usual warnings
    about distinguishing "1" from "l" and "0" from "O" and so on,
    which you always have to give authors who are learning for the
    first time to prepare manuscripts in machine-readable form
    instead of on paper.

    In effect, this author was using a tiny subset of the AAP doctype
    for Books.

----------------------------------------------------------------
David Slocombe				(416) 963-8337
SoftQuad Inc.				uucp: {utzoo,utai}!sq!dns
720 Spadina Ave.			Internet: dns@sq.com
Toronto, Ontario, Canada M5S 2T9