[comp.text] Want info on SGML

davek@hp-lsd.HP.COM (Dave Kumpf) (07/01/89)

I'm looking for more information on SGML (Standard Generalized Markup
Language).  Books, reports, example applications, information on parsers,
etc. would all be welcome.  At the moment, all I have is "An Author's Guide
to SGML" and small familiarity with HP's internal use.  Are there any
newsgroups discussing SGML?

Thanks --

Dave Kumpf
hplabs!hp-lsd!davek

brucec@demiurge.WV.TEK.COM (Bruce Cohen;685-2439;61-028) (07/06/89)

In article <8210005@hp-lsd.HP.COM> davek@hp-lsd.HP.COM (Dave Kumpf) writes:
>I'm looking for more information on SGML (Standard Generalized Markup
>Language).  Books, reports, example applications, information on parsers,
>etc. would all be welcome.  At the moment, all I have is "An Author's Guide
>to SGML" and small familiarity with HP's internal use.  Are there any
>newsgroups discussing SGML?

The official specification for SGML is ISO Standard 8879, which costs about
$100.  I ordered mine through the company standards office, which got it
from Global Engineering Documents, 2805 McGraw Ave, Irvine, CA 92714,
phone: (800) 854-7179.  ACM sponsored a conference on document processing
systems in 1988 at Santa Fe, NM; I have two sets of course notes from it:
"Introduction to SGML" and "Implementation of SGML Systems", which you may
be able to order from ACM publications; they've been very helpful in
untangling the standard for me.  You might also be interested in report
IR-159, "The Implementation of the Amsterdam SGML Parser" by Warmer and
Egmond, from Vrije Universiteit Amsterdam.  I got my copy by contacting
Sylvia van Egmond at sylvia@cs.vu.nl .

If you have access to an implementation of Framemaker (tm), a word
processor from Frame in California, you can look at their ML format, which
was an attempt to implement SGML before the standard was final.  It's not
that far off.

I don't know of any newsgroups discussing SGML; I hang around here
hoping for some discussion.  When I get to working on it seriously again
(Real Soon Now), maybe I'll have something to contribute myself.

Bruce Cohen
brucec@orca.wv.tek.com
Interactive Technologies Division, Tektronix, Inc.
M/S 61-028, P.O. Box 1000, Wilsonville, OR  97070

mike@mks.UUCP (Mike Brookbank) (07/06/89)

In article <3790@orca.WV.TEK.COM> brucec@demiurge.UUCP (Bruce Cohen) writes:
>In article <8210005@hp-lsd.HP.COM> davek@hp-lsd.HP.COM (Dave Kumpf) writes:
>>I'm looking for more information on SGML (Standard Generalized Markup
>>Language).  Books, reports, example applications, information on parsers,
>>etc. would all be welcome.  At the moment, all I have is "An Author's Guide
>>to SGML" and small familiarity with HP's internal use.  Are there any
>>newsgroups discussing SGML?
>

There is an excellent article on pg 30 in Federal Computer Week about
SGML and its role in the federal agencies' electronic publishing needs.
It talks somewhat about SGML as a standard being adopted to help
simplify the processing of millions of pages produced by the US
government.

In this article they mention SoftQuad Inc.'s SGML product
"Author/Editor".  You can probably obtain more information about this
product and information about SGML from them directly. (416) 963-8337.


-- 
     Mike Brookbank                          Phone: (519)884-2251
Mortice Kern Systems Inc.               UUCP: uunet!watmath!mks!mike
   35 King St. North                             BIX: join mks
Waterloo, Ontario  N2J 2W9                  CompuServe: 73260,1043

dela@nebula.ee.rochester.edu (Del Armstrong) (07/06/89)

In article <3790@orca.WV.TEK.COM> brucec@demiurge.UUCP (Bruce Cohen) writes:
>In article <8210005@hp-lsd.HP.COM> davek@hp-lsd.HP.COM (Dave Kumpf) writes:
>>I'm looking for more information on SGML (Standard Generalized Markup
>>Language).  
> ...
>If you have access to an implementation of Framemaker (tm), a word
>processor from Frame in California, you can look at their ML format, which
>was an attempt to implement SGML before the standard was final.  It's not
>that far off.

Another system which can generate SGML for you to look at is The Publisher,
which is a document processor produced by Arbortext Inc. It uses TeX as its
underlying engine, but can generate SGML output.


			Del Armstrong

	Internet    : dela@ee.rochester.edu
	UUCP        :     ...allegra!rochester!ur-valhalla!dela
	Twisted pair: (716) 275-5342
	Last resort : Hopeman 407
		      Electrical Engineering
		      University of Rochester
		      Rochester, N.Y.  14627

    +---------------------------------------------------------------------+
    |          On a clear disk you can seek forever.                      |
    +---------------------------------------------------------------------+

tut%cairo@Sun.COM (Bill "Bill" Tuthill) (07/07/89)

This newsgroup hashed over the SGML issue almost one year ago.
The upshot, as I remember, was that people whose livelihoods
depended on SGML thought it was great, while those who didn't
thought it was nearly useless.  The important thing to remember
is, SGML isn't.

SGML != standard generalized markup language

It isn't standard, as there are several tag sets, such as APA
and CALS.  CALS is even a superset of SGML.

It isn't generalized, because it doesn't handle graphics, tables,
or equations.  CALS does, which is why it's a superset of SGML.

It isn't for page markup, but rather, for describing documents
hierarchically.  Kinda like TeX or troff macros!

It isn't a language, but a syntax for describing a language,
like Backus-Naur Form.

The Holy Grail that everybody's looking for is the ability to
readily interchange documents.  So far SGML doesn't help much
in that regard, because common document formatting systems--
TeX, troff, Frame, Interleaf, MSword, WordPerfect, etc.-- don't
read or write SGML.  And that's the bottom line.

edb@io.UUCP (Ed Blachman x4420) (07/07/89)

This message runs long.  Consider yourself (yourselves?) warned.


In article <114143@sun.Eng.Sun.COM> tut%cairo@Sun.COM (Bill "Bill" Tuthill) writes:
>The important thing to remember is, SGML isn't.
>
>SGML != standard generalized markup language
>
>It isn't standard, as there are several tag sets, such as APA
>and CALS.  CALS is even a superset of SGML.
>
>It isn't generalized, because it doesn't handle graphics, tables,
>or equations.  CALS does, which is why it's a superset of SGML.
>
>It isn't for page markup, but rather, for describing documents
>hierarchically.  Kinda like TeX or troff macros!
>
>It isn't a language, but a syntax for describing a language,
>like Backus-Naur Form.

Well... SGML is a standard -- it's ISO 8879-1986, as Bruce Cohen pointed
 out in message <3790@orca.WV.TEK.COM>.  And it's not a particular markup
 language, as you point out, but rather a language that can be used to
 define an entire family of markup languages.  As such, it could be used
 for many purposes -- although those that are being tried first are in
 the model of SGML's progenitor, IBM's GML, which was a set of IBM Script
 macros used to describe documents hierarchically.

SGML *can* be used to write languages describing anything.  For instance:
 in a previous job, for a company that no longer exists, I saw an SGML-
 based markup language that included markup for graphics, tables and
 equations.  And there's currently an ISO committee working on something
 called the SPDL -- Standard Page Description Language -- which will be
 an application of SGML to page markup.  (Don't ask me why the world
 needs another page markup language -- that one I don't understand.  But
 if such a thing is necessary, there's no reason it can't be based on SGML.)
 That sounds pretty general to me.

So I think SGML's name describes it pretty well.  And while a name like
 "Standard language for describing generalized markup languages" might
 alleviate some confusion, SLFDGML would be pretty disastrous as an
 acronym.

>The Holy Grail that everybody's looking for is the ability to
>readily interchange documents.  So far SGML doesn't help much
>in that regard, because common document formatting systems--
>TeX, troff, Frame, Interleaf, MSword, WordPerfect, etc.-- don't
>read or write SGML.  And that's the bottom line.

But you're right about interchange being the Grail, and about SGML not
 being that Grail, not yet at least.  But SGML is a step toward the Grail.
 It allows markup languages to be specified in a vendor-independent manner,
 and that's already a big step forward.  And SGML's companion standard,
 DSSSL (the Document Style and Semantics Specification Language), will
 allow (as you'd expect from the name) style and semantic information to
 be associated with the hierarchically nested tags characteristic of SGML-
 based languages; the combination of an SGML DTD (Document Type Definition,
 really a markup language specification) and its associated output spec
 written in DSSSL should truly allow documents to be interchanged among
 disparate processing systems with matching results.

In the meantime, the world is walking toward SGML usage... slowly.  One
 step is to begin to be able to read, process and output documents marked
 up with particular (non-vendor originated) SGML-based markup languages.
 CALS includes both a "conforming" DTD and a template to be used in building
 similar DTDs, and some vendors (like Interleaf) are shipping systems that
 can read, process and output CALS documents.  (Actually, CALS itself is an
 attempt to leapfrog or accelerate the race to the Grail by setting up a
 total interchange standard, of which the use of SGML-based languages for
 text markup is a part.)

But as for systems that are fully SGML-capable today -- yeah, there aren't
 any, and it's pretty clear why.  It's no trivial matter to create a text
 processing system to begin with, and most text processing systems
 accordingly embody some kind of assumption about the amount and kind of
 structure that authors want to associate with their documents.  For many
 such sets of assumptions, it's fairly easy to map those assumptions onto a
 markup language that could be described in SGML.  That's what my old
 company did, and what others have done as well.  But markup languages can
 be written to embody just about any set of assumptions, and automatically
 translating between such sets is no easy matter.

So (and this is now in reference to Bruce Cohen's pointer to Frame's ML
 and Del Armstrong's (message <2304@valhalla.ee.rochester.edu>) pointer
 to Arbortext Publisher) current systems that "output SGML" actually
 produce output in a particular SGML-compatible markup language.  A fully
 SGML-capable system would produce output in *any* SGML markup language --
 you'd just tell it which DTD to use, and it'd produce your output.
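
To make the "just tell it which DTD to use" idea concrete, here is a
minimal sketch in Python rather than SGML.  Everything in it is an
assumption made for illustration: the two toy "DTDs", their element names,
and the validate() helper.  Real SGML content models are far richer than
these regular expressions over child names.

```python
import re

# Two hypothetical "DTDs".  Each maps an element name to a regular
# expression over the space-separated names of its children.
# "#PCDATA" stands for character data, as it does in SGML.
MEMO_DTD = {
    "memo": r"to from body",
    "to": r"#PCDATA",
    "from": r"#PCDATA",
    "body": r"para( para)*",
    "para": r"#PCDATA",
}

RECIPE_DTD = {
    "recipe": r"title step( step)*",
    "title": r"#PCDATA",
    "step": r"#PCDATA",
}


def validate(node, dtd):
    """Return a list of error strings; an empty list means the tree conforms.

    A node is a (name, children) pair; each child is either another node
    or a plain string of character data.
    """
    name, children = node
    if name not in dtd:
        return [f"undeclared element <{name}>"]
    # Spell out what this element actually contains, e.g. "to from body".
    kinds = " ".join(
        "#PCDATA" if isinstance(child, str) else child[0] for child in children
    )
    errors = []
    if not re.fullmatch(dtd[name], kinds):
        errors.append(f"<{name}> contains '{kinds}', expected '{dtd[name]}'")
    # Check the nested elements against the same DTD.
    for child in children:
        if not isinstance(child, str):
            errors.extend(validate(child, dtd))
    return errors


memo = ("memo", [
    ("to", ["Dave"]),
    ("from", ["Ed"]),
    ("body", [("para", ["SGML is a step toward the Grail."])]),
])

print(validate(memo, MEMO_DTD))
print(validate(memo, RECIPE_DTD))
```

The checker itself knows nothing about memos or recipes; the tag set lives
entirely in the table handed to it, which is the property a fully
SGML-capable system would need.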

Disclaimers: first, I work for Interleaf, so I have a bias and I haven't
 seen the competing products in detail.  (I *do* follow the SGML world,
 however, and I'm pretty sure that if there's a fully SGML-capable system
 around, its manufacturers are keeping *very* quiet.)  Second, my
 livelihood *does* depend on SGML, at least to some extent, so I am bound to
 take a more positive view of it than someone who, say, wants document
 interchange *today* and is disappointed that SGML doesn't by itself suffice.
 I believe that SGML is compatible with, and could be a step on what I hope
 is an inevitable path towards, a future in which authors and readers are
 free from the tyranny of proprietary formats.

Ed Blachman	edb@ileaf.com	(or)	...!mit-eddie!ileaf!edb

If I am not for myself, who will be for me?
If I am only for myself, what am I?
And if not now, when?

lee@anduk.co.uk (Liam R. Quin) (07/09/89)

In article <3790@orca.WV.TEK.COM> brucec@demiurge.UUCP (Bruce Cohen) writes:
>In article <8210005@hp-lsd.HP.COM> davek@hp-lsd.HP.COM (Dave Kumpf) writes:
>>I'm looking for more information on SGML (Standard Generalized Markup
>>Language).  Books, reports, example applications, information on parsers,
>>etc. would all be welcome.
>>At the moment, all I have is "An Author's Guide
>>to SGML" and small familiarity with HP's internal use.  Are there any
>>newsgroups discussing SGML?

I assume the book you mean is
%T SGML: An Author's Guide to the...
%A Bryan, Martin
%D 1988
%I Addison-Wesley
%i 0-201-17535-5
%c QA76.73.S44B79/1988/005/88-24193

but this is a little dry for my tastes!  Skip the chapters on alphabets etc.!
Martin works for SOBEMAP, who distribute a parser/translator that reads and
validates SGML.  You might reasonably point out that validating that a
document contains correct SGML is not in itself of any use.  You would be
right, but there's not much else that you can do with SGML right now!
Sobemap's product can also be used to turn SGML into (say) troff, with
a little effort.

> If you have access to an implementation of Framemaker (tm), a word
> processor from Frame in California, you can look at their ML format, which
> was an attempt to implement SGML before the standard was final.  It's not
> that far off.
SoftQuad's Author/Editor uses SGML on the Mac (other systems some time
this year, they say...)
Software Esoterica also have an SGML-conforming editor, as do (I think)
Datalogics.
SoftQuad's product is the best of these to look at if you want to find
out about SGML, I think.

Interleaf can read SGML files (but not, I think, write them).

Compugraphic are demonstrating a CALS-conforming
system (at least, they say it is!), which uses Author/Editor and Texet,
running on Suns.

There is a long list of other products, ranging from a text database
(Officesmith) through to CD-ROM software.

I think we have a list somewhere...

Maybe we should include a Software Directory in the User's Group Bulletin.

The Oxford English Dictionary is an example of a book (well, lots of books,
18 months of continuous typesetter output) that was marked up in SGML,
and typeset by Pindar Infotek of York.

An encyclopedia was published simultaneously in CD-ROM and book form, and
also made available on-line, using SGML.


There is also the SGML Users' Group.

For more info on this, you could contact
me (uunet!utai!anduk.uucp!lee), or
Steve Downie, (the Group's Secretary), uunet!utai!anduk.uucp!downie
Unixsys (UK) Ltd.,
The Genesis Centre, Garrett Field, Birchwood, Warrington, ENGLAND, WA3 7BH
Telephone +44 925 828181
Fax +44 925 827834

Ludo Van Vooren (ludo@sq.com, uunet!{utzoo,utai}!sq!ludo)
SoftQuad Inc.,
720 Spadina Avenue, Toronto, CANADA, M5S 2T9
Tel. +1 416-963-8337

I have about a dozen other addresses too, for Brussels, Germany, Chicago,
Switzerland, ...

Either we or Ludo can put you in touch with a local SGML group.

If you are a military establishment (or deal with them in any way
whatsoever), you will also *need* to know about CALS before long.

CALS requires that all documentation is prepared electronically.
In SGML.

And this is an excellent decision, because it is not just device independent,
but technology-independent!  There would be no difficulty in arranging
for a voice-synth. to read an SGML document, when the technology
catches up.  Tricky with PostScript/TeX/troff/HPGL/...


You could order an evaluation copy of Author/Editor from SoftQuad, I imagine
(I don't know if they charge for that), or a copy of the manual:
%T Author/Editor
%A Shiff, Sharpe & Spencer
%I SoftQuad Inc.,
%D April 1989
%i 0-88910-303-8

This might help you get started, especially if you have a mac!
If anyone knows a better introduction to doing practical work with
SGML, I would very, very much like to hear about it!

The American Association of Publishers, AAP (or is it the Assoc. Am. Pub?),
have a Document Type Definition, and there are a number of American
Publishers who can take SGML.  I believe that Bantam can, but I'm not sure.

Publishers like it because once the Author and Publisher have agreed on a set
of tags, they can take the author's file and print it without having to go
through the usual process of first stripping out all of the formatting
codes that the author put there, and then adding their own.

If you have ever had a paper or book published that contained both emphasis
and key-words, and you used italic for both of them... and the publisher
used bold for keywords (as is their right, of course; that's why
they employ designers)... you will have spent ages going through
the proofs marking wherever they got it wrong.

Not any more.  And this is a very small example.
It is still *normal* for a typesetting house to re-key your book by hand 
rather than go through the hassle of looking at your floppy disk.
Yes, *normal*.

SGML could change that.  (on the other hand.... :-( :-( )
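
The emphasis-versus-keyword case above can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions: the <em>
and <kw> tag names, the two style tables, and the to_troff() helper are all
made up here, and the regular expression does not handle nested tags the
way a real SGML parser would.

```python
import re

# Hypothetical house styles: each maps a logical tag to a troff font escape.
AUTHOR_STYLE = {"em": r"\fI", "kw": r"\fI"}     # author: italic for both
PUBLISHER_STYLE = {"em": r"\fI", "kw": r"\fB"}  # publisher: bold keywords

# Matches <em>...</em> or <kw>...</kw> with no nesting.
TAGGED = re.compile(r"<(em|kw)>(.*?)</\1>")


def to_troff(text, style):
    """Replace <em>/<kw> elements with troff font changes from the style."""
    def render(match):
        tag, content = match.group(1), match.group(2)
        # \fP switches back to the previous font once the element ends.
        return style[tag] + content + r"\fP"
    return TAGGED.sub(render, text)


source = "A <kw>tag</kw> is <em>not</em> a font."
print(to_troff(source, AUTHOR_STYLE))
print(to_troff(source, PUBLISHER_STYLE))
```

The author's file is never touched: the publisher changes one table entry
and every keyword comes out bold, while emphasis stays italic.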

>I don't know of any news groups discussing SGML; I hang around here
>hoping for some discussion.  When I get to working on it seriously again
>(Real Soon Now), maybe I'll have something to contribute myself.

There don't seem to be any.

>Bruce Cohen
>brucec@orca.wv.tek.com

Sorry if this is a little garbled, it getteth late!

Lee (uunet!utai!anduk.uucp!lee)

-- 
Lee Russell Quin, Unixsys UK Ltd, The Genesis Centre, Birchwood,
Warrington, ENGLAND, WA3 7BH; Tel. +44 925 828181, Fax +44 925 827834
	lee%anduk.uucp@ai.toronto.edu;  {utzoo,uunet}!utai!anduk!lee
UK:	uu.warwick.ac.uk!anduk.co.uk!lee

hrs1@cbnewsi.ATT.COM (herman.r.silbiger) (07/11/89)

In article <114143@sun.Eng.Sun.COM>, tut%cairo@Sun.COM (Bill "Bill" Tuthill) writes:
> The Holy Grail that everybody's looking for is the ability to
> readily interchange documents.  So far SGML doesn't help much
> in that regard, because common document formatting systems--
> TeX, troff, Frame, Interleaf, MSword, WordPerfect, etc.-- don't
> read or write SGML.  And that's the bottom line.

SGML has found wide application in publishing and related industries for
indicating the logical structure of documents.  As was pointed out, the
language is an international standard, but the meaning of the tags is by
mutual agreement.

There is a standard which is designed for the open interchange of compound
documents, Open/Office Document Architecture (ISO 8613, CCITT T.410).
ODA can convey both the logical structure and the layout of documents.
ODA uses either ASN.1 or SGML (in the form of ODL) to express these structures.
Systems which will accept ODA documents will be available by the end of the
year, some of which will be on the UNIX(TM) operating system.

Work on the definition of both the ODA and SGML standards takes place in the
X3V1 standards committee.  For info on this activity, reply to this posting,
or send e-mail to hrs@batavier.ATT.COM, or call 201 949 3193.

Herman Silbiger 

hrs1@cbnewsi.ATT.COM (herman.r.silbiger) (07/11/89)

In article <1141@io.UUCP>, edb@io.UUCP (Ed Blachman x4420) writes:
>  And there's currently an ISO committee working on something
>  called the SPDL -- Standard Page Description Language -- which will be
>  an application of SGML to page markup.  (Don't ask me why the world
>  needs another page markup language --

There currently is no STANDARD page description language.  Postscript(TM) is
widely used, but it is not a standard, and it is controlled by Adobe Systems.
Interpress(TM) by Xerox is another PDL.  SPDL will be internationally
standardized, and will have a cleartext, an SGML, and an ASN.1 form.  ODA
structured documents will be able to be rendered by SPDL.  The editors of
the SPDL draft are Matt Foley from Adobe, and Steve Strassen from Xerox.
> 
>  and that's already a big step forward.  And SGML's companion standard,
>  DSSSL (the Document Style and Semantics Specification Language), will
>  allow (as you'd expect from the name) style and semantic information to
>  be associated with the hierarchically nested tags characteristic of SGML-
>  based languages; the combination of an SGML DTD (Document Type Definition,
>  really a markup language specification) and its associated output spec
>  written in DSSSL should truly allow documents to be interchanged among
>  disparate processing systems with matching results.

DSSSL is NOT exclusively SGML related, although SGML has the greatest need for 
it.  By international agreement, DSSSL will use ODA semantics and syntax
(source: ISO/IEC JTC1/SC18/WG8), and in effect will be a superset of ODA.
There will be an ASN.1 version as well as the SGML version.

As I mentioned in an earlier followup, this work is taking place in the US in
the X3V1 standards committee, and its international counterpart, JTC1/SC18.

Herman Silbiger hrs@batavier.ATT.COM
201 949 3193

brucec@demiurge.WV.TEK.COM (Bruce Cohen;685-2439;61-028) (07/13/89)

In article <375@cbnewsi.ATT.COM> hrs1@cbnewsi.ATT.COM (herman.r.silbiger) writes:
>In article <1141@io.UUCP>, edb@io.UUCP (Ed Blachman x4420) writes:
>>  And there's currently an ISO committee working on something
>>  called the SPDL -- Standard Page Description Language -- which will be
>>  an application of SGML to page markup.  (Don't ask me why the world
>>  needs another page markup language --
>
>There currently is no STANDARD page description language.  Postscript(TM) is
>widely used, but it is not a standard, and it is controlled by Adobe Systems.
>Interpress(TM) by Xerox is another PDL.

That's not the only justification (pun intended) for SGML.  Postscript and
Interpress are PAGE description languages, not structured markup languages.
They don't really support recording and analyzing the structure of text,
only the visual representation.  Well, maybe that's a bit strong: you could
encode the structure with Postscript, but it's not easy to do unless you
like programming RPN, and there isn't a standard description syntax for
the textual structure.

As for troff, gag me with a recursive function returning a structure
pointer!  Unless very carefully used by someone willing to write troff
macros, using troff results in losing information about structure.  I
recently tried to develop a way to automatically extract the text structure
from the troff version of our online man pages and gave up in disgust.
Luckily, Framemaker output (MIF) turns out to be acceptable for my
application, and the source for the printed manuals is kept in that form,
so I had an alternative.
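
For contrast, here is a rough Python sketch of how little work it takes to
recover an outline once the structure is marked explicitly.  The <sec>,
<title> and <p> tag names and the outline() helper are assumptions made up
for this example; this is neither MIF nor real SGML, and the scanner does
not cope with a literal "<" in the text.

```python
import re

# A token is an opening tag, a closing tag, or a run of character data.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


def outline(text):
    """Return (depth, title) pairs for each <title> inside nested <sec>s."""
    depth, in_title, result = 0, False, []
    for close, tag, chars in TOKEN.findall(text):
        if tag == "sec":
            # Opening a section goes one level deeper; closing comes back.
            depth += -1 if close else 1
        elif tag == "title":
            in_title = not close
        elif chars and in_title:
            result.append((depth, chars.strip()))
    return result


page = (
    "<sec><title>Name</title><p>ls - list directory contents</p>"
    "<sec><title>Options</title><p>-l long format</p></sec></sec>"
)
for depth, title in outline(page):
    print("  " * (depth - 1) + title)
```

The nesting depth falls straight out of the markup, which is exactly the
information that is lost when a page is written as raw font and spacing
requests.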

Bruce Cohen
brucec@orca.wv.tek.com
Interactive Technologies Division, Tektronix, Inc.
M/S 61-028, P.O. Box 1000, Wilsonville, OR  97070