[comp.text.sgml] Is there a DTD standard?

crowl@cs.rochester.edu (Lawrence Crowl) (09/12/90)

In article <1990Sep11.193327.19935@terminator.cc.umich.edu>
jwh@ifs.umich.edu writes:
>A "DTD" is a Document Type Definition.  It is used to spell out rules which
>govern the markup used in a document.  SGML itself does not specify or define
>any specific markup elements (such as chapter, paragraph, title, etc.)  It 
>really specifies a meta-language for defining markup languages.  A DTD will 
>specify (among other things) what markup "tags" are legal and where tags can
>occur. 

The idea of SGML as a meta-language for defining document syntax is a good
idea.  However, most of the time, I'm just writing text.  Does there exist a
standard for a DTD that says what tags to use in book/articles?  For instance, 
should I use <paragraph> or <par> or what?  How should I describe bulleted
lists?

>The parser would use the DTD to analyze the document for violations of the
>DTD rules and possibly some other work such as translating the document into
>a series of typesetting commands. 
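The checking step described in the quoted text can be illustrated with a toy sketch. This is not an SGML parser. The "DTD" here is a hypothetical Python dict listing which elements may appear directly inside which. Real DTD content models also constrain order and occurrence, allow omitted tags, and declare attributes, and none of that is modelled. The element names are invented for the example.

```python
import re

# Hypothetical toy "DTD": element -> elements allowed directly inside it.
# Only nesting is checked; order, occurrence and omitted tags are ignored.
TOY_DTD = {
    "article": {"title", "chapter"},
    "chapter": {"title", "pp", "list"},
    "list": {"item"},
    "item": {"pp"},
    "title": set(),
    "pp": set(),
}

# Matches start tags <name> and end tags </name> (no attributes).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)>")


def check(document: str) -> list[str]:
    """Return a list of rule violations found in the document."""
    errors = []
    stack = []  # currently open elements, innermost last
    for match in TAG.finditer(document):
        closing = match.group(1) == "/"
        name = match.group(2).lower()  # element names treated as case-insensitive
        if name not in TOY_DTD:
            errors.append(f"undeclared element <{name}>")
            continue
        if closing:
            if not stack or stack[-1] != name:
                errors.append(f"unexpected </{name}>")
            else:
                stack.pop()
        else:
            if stack and name not in TOY_DTD[stack[-1]]:
                errors.append(f"<{name}> not allowed inside <{stack[-1]}>")
            stack.append(name)
    # Anything still open at the end was never closed.
    errors.extend(f"<{name}> never closed" for name in reversed(stack))
    return errors
```

For example, `check("<article><pp>Loose text.</pp></article>")` reports that `<pp>` is not allowed directly inside `<article>`, while a paragraph wrapped in a `<chapter>` passes.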

But typesetting the document requires that one know what <chapter> and
<paragraph> mean.  The SGML standard doesn't say what they mean.  Does anyone
know of work in this area?  Surely the book and journal publishers have an
opinion on what should be done.
-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
	  ...!{ames,rutgers}!rochester!crowl	Rochester, New York,  14627

bzs@world.std.com (Barry Shein) (09/12/90)

There are some books which describe sample DTD's, for example "SGML:
An Author's Guide" by Martin Bryan (Addison-Wesley.)

But none of these things is authoritative.

If no one can come forward with an authoritative DTD why don't those
of us who understand a little about this stuff come up with a DTD
right here? If nothing else the discussions surrounding that should be
informative to those trying to learn more. We can call it "The USENET
SGML DTD" and put it into the public domain. If it seems reasonable
I'll use it for "The Online Book Initiative", we've been using some of
our own conventions but it wouldn't be hard to conform.

There's nothing wrong with using previous DTD's for starters, just as
one would use other conventions when trying to pick elements.

Here's my contribution:

	<pp>		begin paragraph
	</pp>		end paragraph

and the common convention that <XYZ> is ended by </XYZ> where needed.

And my first question:

	Are we happy with the convention &char-name to encode non-ascii
	characters (e.g. &oumlaut), how far along is the Text Encoding
	Initiative with this? Can we use their conventions yet?

"Your" move.
-- 
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

blarsen@spider.uio.no (Bjorn Larsen) (09/12/90)

In article <BZS.90Sep11225016@world.std.com> bzs@world.std.com (Barry Shein) writes:
> 
> 	Are we happy with the convention &char-name to encode non-ascii
> 	characters (e.g. &oumlaut), how far along is the Text Encoding
> 	Initiative with this? Can we use their conventions yet?
> 

No, of course we're not! I write Norwegian using SGML, and of course
it Will Not Do for me to have to scatter &Oslash, &oslash, &Aring,
&aring, &AElig and &aelig specifications all around my text.

I want to be able to use ISO Latin 1. At the moment, the text I write
ends up inside The Publisher, and my approach is the following:

1. Write the text using ISO Latin 1. (or, occasionally, write the
   text on my Mac, and then translate it to ISO Latin 1. Argh.)
2. Translate ISO Latin 1 to SGML-type codes.
3. Import the resulting SGML into The Publisher.

Of course, a small perl script makes it easy to automate this,
but I really got irritated about a year back when it dawned on me
that I was expected to write my name as "Bj&oslash rn" in SGML-ese.
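The perl script itself is not shown. Here is a rough Python sketch of the same two conversions, offered as an assumption about what such a script does. It relies on Python's standard `html.entities.codepoint2name` table, whose Latin-1 letter names match the ISO 8879 Latin-1 entity set (`oslash`, `aring`, `AElig`, and so on; the standard name for o-umlaut is `ouml`). The function names are hypothetical.

```python
from html.entities import codepoint2name


def mac_to_latin1(raw: bytes) -> bytes:
    """Step 1 for text written on a Mac: re-encode Mac Roman as ISO Latin 1.
    Raises UnicodeEncodeError for Mac characters with no Latin 1 equivalent."""
    return raw.decode("mac_roman").encode("latin-1")


def latin1_to_entities(raw: bytes) -> str:
    """Step 2: decode ISO Latin 1 bytes and replace each non-ASCII character
    that has a named entity with a reference such as &oslash; ."""
    out = []
    for ch in raw.decode("latin-1"):
        # ASCII is left alone, so '&', '<' and '>' are NOT escaped here.
        name = codepoint2name.get(ord(ch)) if ord(ch) > 127 else None
        out.append(f"&{name};" if name else ch)
    return "".join(out)
```

The sketch always writes the terminating `;`, so a reference can be followed directly by letters: `latin1_to_entities("Bjørn".encode("latin-1"))` gives `Bj&oslash;rn`.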

Anyways, what is the 'Text Encoding Initiative', and what are their
'conventions'?

--
Bjorn Larsen                University of Oslo,  Norway
                               Bjorn.Larsen@usit.uio.no

jwh@boston.ifs.umich.edu (Jim Howe) (09/12/90)

In article <1990Sep12.020242.2916@cs.rochester.edu>,
crowl@cs.rochester.edu (Lawrence Crowl) writes:

|> The idea of SGML as a meta-language for defining document syntax is a good
|> idea.  However, most of the time, I'm just writing text.  Does there exist a
|> standard for a DTD that says what tags to use in book/articles?  For instance,
|> should I use <paragraph> or <par> or what?  How should I describe bulleted
|> lists?
|> 

There do exist some standard DTD's.  The one I am most familiar with is the AAP
(Association of American Publishers) DTD's.  They have defined DTD's that are
used for books and magazine articles.  These DTD's have been adopted by ANSI
as ANSI/NISO Z39.59-1988.  There is another standard in the works for the CALS
project.  It is known as MIL-M-28001 and is quite complex.
  
|> 
|> But typesetting the document requires that one know what <chapter> and
|> <paragraph> mean.  The SGML standard doesn't say what they mean.  Does anyone
|> know of work in this area?  Surely the book and journal publishers have an
|> opinion on what should be done.

The typeset definition of <chapter>, <paragraph>, etc. is purely up to the
discretion of the publisher (with possible input from the author).  Different
publishers may choose to format the document differently.  Also, the definition
of tags will vary depending on the publication medium.  The same document may be
published both on paper and on CD-ROM, for example.  There is work being done to
define a formatting standard as well as a markup standard, but that work is
currently incomplete.
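To make the point concrete, here is a small sketch in which the same marked-up text is rendered for two media. The two style tables are hypothetical: the paper one emits troff-like macros and the screen one emits plain text. Neither comes from any standard, and a real formatter would do far more than substitute strings.

```python
import re

# Hypothetical "style sheets": tag -> (code emitted for start tag, for end tag).
PAPER = {"title": (".SH\n", "\n"), "pp": (".PP\n", "\n")}
SCREEN = {"title": ("== ", " ==\n"), "pp": ("", "\n\n")}

TAG = re.compile(r"<(/?)([a-z]+)>")


def render(document: str, style: dict[str, tuple[str, str]]) -> str:
    """Replace each start/end tag with the code the style assigns to it.
    Tags the style does not mention are simply dropped."""
    def substitute(match: re.Match) -> str:
        start, end = style.get(match.group(2), ("", ""))
        return end if match.group(1) else start
    return TAG.sub(substitute, document)
```

The markup stays the same and only the table changes: `render(doc, PAPER)` and `render(doc, SCREEN)` give two different presentations of one document.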



James W. Howe			   internet: jwh@ifs.umich.edu
University of Michigan             uucp:     uunet!mailrus!ifs.umich.edu!jwh
Ann Arbor, MI   48103-4943         

enag@ifi.uio.no (Erik Naggum) (09/13/90)

Hi, Bj&oslash;rn!

In an SGML document, you have the option of defining a character set
with a mapping to binary values in the text, outside the ISO 646
character set.  I don't have the standard handy, but I seem to
remember you define blocks of character codes in a CHARSET clause.
This is the way you deal with Japanese, Arabic, etc. characters.  A
report on how this is done is given in ISO TR 9573 (I think that's
the number), which I also have in my SGML file.  Nice reading.

I can dig up the details for you.
--
[Erik Naggum]		Naggum Software; Gaustadalleen 21; 0371 OSLO; NORWAY
	I disclaim,	<erik@naggum.uu.no>, <enag@ifi.uio.no>
  therefore I post.	+47-295-8622, +47-256-7822, (fax) +47-260-4427

hartman@ide.com (Robert Hartman) (09/13/90)

In article <BZS.90Sep11225016@world.std.com> bzs@world.std.com (Barry Shein) writes:
>
>There are some books which describe sample DTD's, for example "SGML:
>An Author's Guide" by Martin Bryan (Addison-Wesley.)
>
>But none of these things is authoritative.
>
>If no one can come forward with an authoritative DTD why don't those
>of us who understand a little about this stuff come up with a DTD
>right here? If nothing else the discussions surrounding that should be
>informative to those trying to learn more. We can call it "The USENET
>SGML DTD" and put it into the public domain. If it seems reasonable
>I'll use it for "The Online Book Initiative", we've been using some of
>our own conventions but it wouldn't be hard to conform.

A GREAT IDEA!!!  -r

ath@prosys.se (Anders Thulin) (09/13/90)

In article <BZS.90Sep11225016@world.std.com> bzs@world.std.com (Barry Shein) writes:

>If no one can come forward with an authoritative DTD why don't those
>of us who understand a little about this stuff come up with a DTD
>right here? If nothing else the discussions surrounding that should be
>informative to those trying to learn more. We can call it "The USENET
>SGML DTD" and put it into the public domain. If it seems reasonable
>I'll use it for "The Online Book Initiative", we've been using some of
>our own conventions but it wouldn't be hard to conform.

The 'USENET SGML DTD' is a rather vague description: what types of
texts should it be used for? RFC's, Email, Digests, ... ?  Or more
traditional types like novels, collections of short stories, dramas
etc?

My own suggestion would be for something like novels. Most people have
read at least one, so they wouldn't be entirely unfamiliar :-) And it
would also fit rather nicely with OBI ...

Choosing a rather restrictive text type could also simplify some of
the keyboard conventions: Paragraph breaks could probably be indicated
by empty lines, dashes could be '---', quotes could use `` and ''. Of
course, this assumes that the system used for parsing would handle
shortrefs and the whatnots that are required.
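Within SGML these conventions would be handled by short reference maps in the parser. As a stand-in, the sketch below does the same job as a separate preprocessing pass, turning the keyboard conventions into explicit markup before parsing. It reuses Barry's `<pp>` tag and assumes the entities `mdash`, `ldquo` and `rdquo` (names from the ISO 8879 public entity sets) are declared in the DTD. The function name is hypothetical.

```python
import re


def expand_conventions(text: str) -> str:
    """Turn the keyboard conventions into explicit markup:
    blank-line-separated paragraphs become <pp>...</pp>,
    '---' becomes &mdash;, and `` / '' become &ldquo; / &rdquo; ."""
    # A paragraph break is one or more lines containing only whitespace.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    out = []
    for p in paragraphs:
        p = p.replace("---", "&mdash;")
        p = p.replace("``", "&ldquo;").replace("''", "&rdquo;")
        out.append(f"<pp>{p}</pp>")
    return "\n".join(out)
```

A preprocessor keeps the parser simple, at the cost of the author's source no longer being the document the parser actually sees.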

>And my first question:
>
>	Are we happy with the convention &char-name to encode non-ascii
>	characters (e.g. &oumlaut), how far along is the Text Encoding
>	Initiative with this? Can we use their conventions yet?

I am happy with it. It seems to be largely based on the entity sets
(for Latin-1 and Latin-2) published in one of the appendices of the ISO
SGML document - which probably means they would be available for most
SGML implementations.  Or is there any reason to avoid them?

I imagine that an SGML translator would be capable of converting a
document using a local concrete syntax to either of the reference
syntaxes defined by SGML. So choosing other conventions shouldn't be
much of a problem. Or am I mistaken?

-- 
Anders Thulin       ath@prosys.se   {uunet,mcsun}!sunic!prosys!ath
Telesoft Europe AB, Teknikringen 2B, S-583 30 Linkoping, Sweden