[comp.text.sgml] What about the DTD

schiers@mcshh.hanse.de (Carsten Schiers) (10/03/90)

Hello,

is the Document Type Definition (DTD) part of an SGML text interchange?
To be specific: does a software product, e.g. a publishing
system, which claims to be able to use SGML format, have to be able to
read an SGML document and *any* DTD? As I understand it, it then has to
read the DTD, which is something like a grammar to me, and then parse
the text using this DTD.

I ask this because a software manufacturer tells me that I have to send
him the DTD for my particular problem, and that he will then return a
special filter for documents which conform to this DTD. So any time I
have a new type of document, I have to buy a new filter.

Is there something inside the SGML definition, which I don't have at
my desk, sorry, which tells me that a DTD is part of an SGML document?
What is the idea of the CALS standard with SGML text exchange?

Thanks,
Carsten Schiers
Deutsche Airbus GmbH
unido!imdm.uke.uni-hamburg.dbp.de!schiers
unido!netmbx!mcshh!schiers

jwh@boston.ifs.umich.edu (Jim Howe) (10/04/90)

In article <8149@mcshh.hanse.de>, schiers@mcshh.hanse.de (Carsten
Schiers) writes:
|> Hello,
|> 
|> is the Document Type Definition (DTD) part of an SGML text interchange?
|>
|> [stuff deleted]
|> 
|> Is there something inside the SGML definition, which I don't have at
|> my desk, sorry, which tells me that a DTD is part of an SGML document?
|> What is the idea of the CALS standard with SGML text exchange?
|> 

SGML is a language for defining DTD's.  Since the DTD specifies the rules
used by your document it must be associated with your document in one
manner or another.  This can be done either by including the DTD at the
start of your document or referring to the DTD through the use of the 
PUBLIC identifier.  If PUBLIC is used, the document processor must know
how to process documents that conform to the DTD specified by the PUBLIC
keyword.  It could do this by knowing where to go to read the specified
DTD, or it could have the rules of the specified DTD built in.  
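
To illustrate the two options, here is a minimal sketch.  The memo
document type and the public identifier are made up for the example and
are not taken from any real DTD.  In the first form the DTD travels
inside the document itself, in the declaration subset:

```sgml
<!DOCTYPE memo [
<!-- the rules arrive together with the text -->
<!ELEMENT memo        - - (to, body)>
<!ELEMENT (to | body) - - (#PCDATA)>
]>
<memo><to>Carsten</to><body>Parse me with the rules above.</body></memo>
```

In the second form only a name for the DTD is sent, so the receiving
system must already hold the DTD, or its rules, under that name:

```sgml
<!DOCTYPE memo PUBLIC "-//Example//DTD Memo//EN">
<memo><to>Carsten</to><body>Parse me with rules you already have.</body></memo>
```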

James W. Howe			   internet: jwh@ifs.umich.edu
University of Michigan             uucp:     uunet!mailrus!ifs.umich.edu!jwh
Ann Arbor, MI   48103-4943

hrs1@cbnewsi.att.com (herman.r.silbiger) (10/05/90)

In article <8149@mcshh.hanse.de>, schiers@mcshh.hanse.de (Carsten Schiers) writes:
> Hello,
> 
> is the Document Type Definition (DTD) part of an SGML text interchange?
> To be specific: does a software product, e.g. a publishing
> system, which claims to be able to use SGML format, have to be able to
> read an SGML document and *any* DTD? As I understand it, it then has to
> read the DTD, which is something like a grammar to me, and then parse
> the text using this DTD.
> 
> I ask this because a software manufacturer tells me that I have to send
> him the DTD for my particular problem, and that he will then return a
> special filter for documents which conform to this DTD. So any time I
> have a new type of document, I have to buy a new filter.

SGML encoded documents cannot be used in an Open Interchange environment,
since, as you have found out, the document cannot be interpreted unless you
also have the DTD.  This is one of the reasons that ODA adopted a standardized
architecture.

While it is true that ODA applications also must conform to a Document 
Application Profile (DAP), these are hierarchical subsets of the complete
standard.  If you conform to the highest level, you can understand all the
lower levels.  In SGML, conforming to one DTD does not help you with the others.

In the future, the use of ODL may provide the advantages of a standardized
architecture to SGML applications.

Herman Silbiger
hsilbiger@attmail.com

yuri@sq.sq.com (Yuri Rubinsky) (10/14/90)

In article <8149@mcshh.hanse.de>, schiers@mcshh.hanse.de (Carsten
Schiers) writes:

> Hello,
> 
> is the Document Type Definition (DTD) part of an SGML text interchange?
 ....
> Is there something inside the SGML definition, which I don't have at
> my desk, sorry, which tells me that a DTD is part of an SGML document?

Clause 6 of ISO 8879 -- SGML deals with the structure of the entities
which make up an SGML document and Clause 7 deals with the nature
of elements within SGML.  Together they explain that, for all practical purposes,
an SGML document begins with an SGML declaration (which establishes the
"default values" for a system, and which is optional), followed by a
prolog (usually simply one DTD or a reference to it as Jim Howe points
out in his article), followed by one or more "document instances", which
follow the markup declared in the DTD.

It is therefore the intention of SGML that interchange include, at the
very least, a DTD (or reference to one) and an encoded text stream.
_________________________________________________________________

However, Herman Silbiger (hsilbiger@attmail.com) writes:

> SGML encoded documents cannot be used in an Open Interchange environment,
> since, as you have found out, the document cannot be interpreted unless you
> also have the DTD.  This is one of the reasons that ODA adopted a standardized
> architecture.

> While it is true that ODA applications also must conform to a Document 
> Application Profile (DAP), these are hierarchical subsets of the complete
> standard.  If you conform to the highest level, you can understand all the
> lower levels.  In SGML, conforming to one DTD does not help you with the others.

Conforming to SGML -- not (as Carsten has realised) to one DTD -- is the
secret of success.  SGML is a **language** which we use to build
the architectures we need.

This is precisely the strength of SGML.  The odds are slim of all of us agreeing
on architectures that will support us in our trade book publishing, our
electronic disclosure of financial information, our creation of technical
manuals for airplanes and aircraft carriers, our building of hypertexts in
a variety of disciplines, our encoding of music, or poetry, or drama,
our interchange of machine-readable dictionaries and research texts,
our creation of training manuals and software reference guides. Indeed, the
odds are very slim of our finding **one architecture** to support all this
plus the things we can't yet imagine.

But, with systems which can read any DTD, and "reconfigure" themselves
to understand a variety of architectures, one is not restricted.
_________________________________________________________________

jwh@boston.ifs.umich.edu (Jim Howe)
is absolutely right in his description:

> SGML is a language for defining DTD's.  Since the DTD specifies the rules
> used by your document it must be associated with your document in one
> manner or another.  This can be done either by including the DTD at the
> start of your document or referring to the DTD through the use of the 
> PUBLIC identifier.  If PUBLIC is used, the document processor must know
> how to process documents that conform to the DTD specified by the PUBLIC
> keyword.  It could do this by knowing where to go to read the specified
> DTD, or it could have the rules of the specified DTD built in.  

The last line of Jim's reply is particularly interesting.  Clause 15 of
the standard, which defines conformance, answers Carsten's other question:
> To be specific: does a software product, e.g. a publishing
> system, which claims to be able to use SGML format, have to be able to
> read an SGML document and *any* DTD? As I understand it, it then has to
> read the DTD, which is something like a grammar to me, and then parse
> the text using this DTD.

As Jim indicates, a software product may have a DTD built in which it
supports.  That product CANNOT claim to be a conforming SGML system:
     "A conforming SGML system shall be capable of processing any
     conforming SGML document ..."
(where an SGML document may include any conforming DTD)
but it may be a conforming **SGML application**.  

An application includes both formal (DTD, data content notations, etc)
and often informal (application conventions, processing strategies)
specifications for the handling of a particular class of documents.
Generally "SGML application" has been treated as if synonymous with DTD,
or as if it implies just one DTD, and that works most of the time.
_________________________________________________________________

Carsten again:
> I ask this because a software manufacturer tells me that I have to send
> him the DTD for my particular problem, and that he will then return a
> special filter for documents which conform to this DTD.  So any time I
> have a new type of document, I have to buy a new filter.

Your vendor is trying to cover for the fact that the software in question
has an application built in, and is not a conforming **system**.  You may
have better luck by obtaining an SGML parser and translating your SGML 
input to the input required by the publishing system.
_________________________________________________________________

Carsten also asks:
> What is the idea of the CALS standard with SGML text exchange?
CALS is a very broad defense industries initiative for the
automation of acquisition and logistics databases.  Among many
other aspects, it specifies an SGML application strategy (an
architecture for technical manuals and a direction for specifying
architectures for other kinds of paper and paperless documents)
as well as naming graphic standards for 2D, 3D and raster images.

This is an enormous topic.  My prediction is that it will one day
have its own newsgroup.

----------------------------------------------------------------
Yuri Rubinsky				(416) 963-8337
President                               (800) 387-2777 (from U.S. only)
SoftQuad Inc.				uucp: {uunet,utzoo}!sq!yuri
720 Spadina Ave.			Internet: yuri@sq.com
Toronto, Ontario, Canada M5S 2T9	Fax: (416) 963-9575

hrs1@cbnewsi.att.com (herman.r.silbiger) (10/19/90)

In article <1990Oct13.195117.26372@sq.sq.com>, yuri@sq.sq.com (Yuri Rubinsky) writes:
> 
> _________________________________________________________________
> 
> However, Herman Silbiger (hsilbiger@attmail.com) writes:
> 
> > SGML encoded documents cannot be used in an Open Interchange environment,
> > since, as you have found out, the document cannot be interpreted unless you
> > also have the DTD.  This is one of the reasons that ODA adopted a standardized
> > architecture.
> 
> > While it is true that ODA applications also must conform to a Document 
> > Application Profile (DAP), these are hierarchical subsets of the complete
> > standard.  If you conform to the highest level, you can understand all the
> > lower levels.  In SGML, conforming to one DTD does not help you with the others.
> 
> Conforming to SGML -- not (as Carsten has realised) to one DTD -- is the
> secret of success.  SGML is a **language** which we use to build
> the architectures we need.
> 
> This is precisely the strength of SGML.  The odds are slim of all of us agreeing
> on architectures that will support us in our trade book publishing, our
> electronic disclosure of financial information, our creation of technical
> manuals for airplanes and aircraft carriers, our building of hypertexts in
> a variety of disciplines, our encoding of music, or poetry, or drama,
> our interchange of machine-readable dictionaries and research texts,
> our creation of training manuals and software reference guides. Indeed, the
> odds are very slim of our finding **one architecture** to support all this
> plus the things we can't yet imagine.
> 
> But, with systems which can read any DTD, and "reconfigure" themselves
> to understand a variety of architectures, one is not restricted.
> _________________________________________________________________

I have heard much about the advantage of not having a standardized architecture,
which will allow so much freedom in a publishing environment.  However, I have
never seen or heard of such an instance where the ODA architecture could not 
describe the structure of a document.  If you know of such an example, please 
describe it.

The one example which has been given was for a hypermedia document.  However,
it is possible to have links in the ODA structure which allow hypermedia
documents.  In any case, such hypermedia documents are difficult to render in
a print medium whether ODA or SGML/DTD is used.

Herman Silbiger
hsilbiger@attmail.com

enag@ifi.uio.no (Erik Naggum) (10/20/90)

In article <1990Oct18.201850.9304@cbnewsi.att.com> hrs1@cbnewsi.att.com (herman.r.silbiger) writes:

   I have heard much about the advantage of not having a standardized
   architecture, which will allow so much freedom in a publishing
   environment.  However, I have never seen or heard of such an
   instance where the ODA architecture could not describe the
   structure of a document.  If you know of such an example, please
   describe it.

The system I designed for Oslo's financial daily's electronic news
agency uses attributes to represent internal state information.  We
use marked sections to write different versions of an article, so that
the journalist can choose which one to send when some political or
economic decision has been made.  We make extensive use of entities to
represent ticker codes (also extremely useful for indexing).
Interviews are represented like this:

    <que>Question or interviewer comment</>
    <rep i="NN" name="Nomen Nescio">Reply or other comment</>
    <rep i="AA" name="Anton Aardvark">Reply or other comment</>
    <que>Another question</>
    <rep i="NN">Another reply from NN</>
    ... etc.

None of these can conveniently be described by ODA.
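
For readers who have not used these features, here is a rough sketch of
what such declarations might look like.  It is not the DTD of the
actual system, which is not shown here: the article element, the status
attribute, the acme entity and the passed/failed switches are all
invented for illustration, and full end-tags are used instead of the
short </> form.

```sgml
<!DOCTYPE article [
<!ELEMENT article         - - (p | que | rep)+>
<!ELEMENT (p | que | rep) - - (#PCDATA)>
<!-- an attribute carrying internal state information -->
<!ATTLIST article  status  (draft | sent)  draft>
<!-- speaker identity on a reply: initials required, full name optional -->
<!ATTLIST rep      i       NAME            #REQUIRED
                   name    CDATA           #IMPLIED>
<!-- a general entity standing for a ticker code -->
<!ENTITY acme "Acme Industrier A/S">
<!-- switches that decide which version of the article is live -->
<!ENTITY % passed "INCLUDE">
<!ENTITY % failed "IGNORE">
]>
<article status="draft">
<![ %passed; [<p>Parliament approved the budget; &acme; rose.</p>]]>
<![ %failed; [<p>Parliament rejected the budget; &acme; fell.</p>]]>
<que>What happens next?</que>
<rep i="NN" name="Nomen Nescio">We wait for the details.</rep>
</article>
```

Swapping the values of the two parameter entities selects the other
version of the opening paragraph without touching the text itself.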

Generally speaking, you can probably describe any document with both,
but as far as I have looked into ODA, the granularity is not entirely
of your own choice.  The power of the ODA Generic layout model (I've
forgotten what ODA calls it) is not anywhere near the power of the SGML
DTD.  This is intentional in ODA.  Tools that are too powerful make
interoperability harder.  (This is a valid argument against SGML in
general, but not against specific instances of its use.)

I'm currently working on a project to represent a stock exchange
ticker with sufficiently intelligent "markup" to be used for several
kinds of display and processing purposes.  An idea spawned off this
project is representing EDI documents in a human readable form, and
SGML seems to be up to the task.  I don't even know if you could have
EDI documents in an ODA environment.  Anybody who cares to comment?

Then there's the demands on the processing environment.  An ODA
environment needs to be pretty extensive to be useful, whereas an SGML
environment could start with only a few hundred lines of Emacs lisp
code.  In most of the cases I've seen, interoperability with other
systems is not that important, anyhow, which would make ODA overkill.
Intelligent searching and indexing tools, however, have been much more
important.  It is easier to do this with SGML, and furthermore, the
user can look at the SGML encoded document without an intelligent
parser.  The latter has had enormous importance with my clients.
Your mileage may vary, of course.

--
[Erik Naggum]		Naggum Software; Gaustadalleen 21; 0371 OSLO; NORWAY
	I disclaim,	<erik@naggum.uu.no>, <enag@ifi.uio.no>
  therefore I post.	+47-295-8622, +47-256-7822, (fax) +47-260-4427
--

mss+@andrew.cmu.edu (Mark Sherman) (10/22/90)

There is no substantive difference between the granularity of ODA's
specific logical structures and SGML's tagged documents. Both are
essentially trees: nodes, edges and leaves.  Both allow and neither
one requires that, say, a chapter or a word be a piece of the structure.
What else is there to say?

Since SGML has no notion of formatting or presentation, it is not
meaningful to compare any SGML facility with ODA's layout facility. We
successfully interchanged processable documents w/o any specific layout
structure. Arguments about layout facilities are red herrings.

There are some differences between ODA's generic logical structure and
SGML's DTD, but both have more complexity than most documents use.

Before you say how easy it is to search SGML text, are you expecting all
of the tags to be explicit (or expecting all of them to be implicit)?
Depending on the application, the lack of tags makes searching useless
(while in other applications their presence gets in the way). What about
multiple revisions? Just searching for a string doesn't tell you if it
is really in the document -- you've got to parse the structure to see
when it applies.
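
A small made-up example of the problem (the note document type, the
old switch and the text are invented for illustration, and assume a
system that permits tag omission):

```sgml
<!DOCTYPE note [
<!ELEMENT note - - (para+)>
<!-- the end-tag of para may be omitted -->
<!ELEMENT para - O (#PCDATA)>
<!-- the superseded revision is switched off -->
<!ENTITY % old "IGNORE">
]>
<note>
<para>Rates were left unchanged.
<![ %old; [<para>Rates were raised by half a point.]]>
<para>Markets closed higher.
</note>
```

A plain string search for "raised" finds a match in the file, although
that paragraph sits in an ignored marked section and is not part of the
parsed document.  A search for the para end-tag finds nothing at all,
because every one of them is implied.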

I've heard a lot of things about EDI, and its relationship to both ODA
and SGML. For something which apparently is very big, I don't see any
discussions of it on Usenet or Internet. Am I not looking in the right
place?

I can "cheat" in ODA as well as SGML, and say that I'll only use it one
way, which can make life easier. But as an implementor of either an SGML
or ODA conforming system, I have to handle everything by the book (the
exact book depends on whether it's a particular SGML application, ODA
DAP, etc, etc). If you want to make your own subset and then exclaim how
wonderful the standard is, I suggest you read a recent editorial in
(I think) CD ROM End User with a title something like
"SGML-like is not SGML".

			-Mark

enag@ifi.uio.no (Erik Naggum) (10/23/90)

I've reviewed the articles I wrote under this subject, and I'm having
a hard time matching Mark Sherman's comments to mine.  Quoting from
the original article may be wasteful of bandwidth, but it ensures that
the reader can check the reply against the question.

I wrote:
"... as far as I have looked into ODA, the granularity is not entirely
of your own choice."

Mark replies:
   There is no substantive difference between the granularity of ODA's
   specific logical structures and SGML's tagged documents. Both are
   essentially trees: nodes, edges and leaves.  Both allow and
   neither one requires that, say, a chapter or a word be a piece of
   the structure.  What else is there to say?

Apart from my stressing _choice_, I have already mentioned several
powerful elements having no ODA counterpart.

I wrote:
"The power of the ODA generic [logical structure], is not anywhere
near the power of the SGML DTD."

Mark replies:
   There are some differences between ODA's generic logical structure
   and SGML's DTD, but both have more complexity than most documents
   use.

I said "SGML > ODA"; you counter with "There exists an X such that
SGML > X and ODA > X".  You can convert ODA to SGML and back without
loss of information, but you can't do it the other way around without
loss of information.  To me, this is sufficient to support my "SGML >
ODA" claim.

I wrote:
"Intelligent searching and indexing tools, however, have been much more
important.  It is easier to do this with SGML, and furthermore, the
user can look at the SGML encoded document without an intelligent
parser."

Mark replies:
   Before you say how easy it is to search SGML text, are you
   expecting all of the tags to be explicit (or expecting all of them
   to be implicit)?  Depending on the application, the lack of tags
   makes searching useless (while in other applications their presence
   gets in the way). What about multiple revisions? Just searching for
   a string doesn't tell you if it is really in the document -- you've
   got to parse the structure to see when it applies.

I intended to imply by the use of the term "intelligent searching and
indexing tools" that they should take the structure of the document
into consideration.  I do not regard anything which does not know the
structure of an SGML document to be intelligent.  Ergo: Parse the
document or die.

Additionally, you may use existing tools while the new, improved ones
are being built.


Hmmm, I really thought I wrote something on a prototype, but I can't
find it.  My, how I miss intelligent searching and indexing tools! :-)

I wrote, at least:
"An ODA environment needs to be pretty extensive to be useful, whereas
an SGML environment could start with only a few hundred lines of Emacs
lisp code."

Mark replies:
   I can "cheat" in ODA as well as SGML, and say that I'll only use it
   one way, which can make life easier. But as an implementor of
   either an SGML or ODA conforming system, I have to handle everything
   by the book (the exact book depends on whether it's a particular
   SGML application, ODA DAP, etc, etc). If you want to make your own
   subset and then exclaim how wonderful the standard is, I suggest
   you read a recent editorial in (I think) CD ROM End User with
   a title something like "SGML-like is not SGML".

I believe I said something about writing an SGML parser in a couple of
days, and a preprocessor to take care of the minimization in another
two, to show a client "what we could do with SGML".  That does not
count as "cheating" in my vocabulary.  If you are alluding to my "start
with only a few hundred lines of Emacs lisp code" as cheating, I think
you're being unfair.  I see no need for you to protect your own hide so
fervently, or indeed any need to attempt to damage mine.  Nowhere have
I said that it is extremely simple to make a full-fledged conformant
SGML system, just that you can get something useful off the ground in
a smallish amount of time.  Customers like to see something before
they pay you enough to make the tax people turn into vampires.

About prototypes, you could say "application-like is not application",
and try to denounce prototypes per se.  I don't do that.  I also don't
leave a customer with the prototype.  As you're on some mailing lists
I'm also on, you have already observed the extent to which I insist on
following the standards involved.  I'm quite annoyed at your comment.

We have already taken this to private mail.  The above is not meant as
a polemic intended to be replied to ad infinitum.

--
[Erik Naggum]		Naggum Software; Gaustadalleen 21; 0371 OSLO; NORWAY
	I disclaim,	<erik@naggum.uu.no>, <enag@ifi.uio.no>
  therefore I post.	+47-295-8622, +47-256-7822, (fax) +47-260-4427
--