crowl@cs.rochester.edu (Lawrence Crowl) (09/12/90)
In article <1990Sep11.193327.19935@terminator.cc.umich.edu>
jwh@ifs.umich.edu writes:
>A "DTD" is a Document Type Definition. It is used to spell out rules which
>govern the markup used in a document. SGML itself does not specify or define
>any specific markup elements (such as chapter, paragraph, title, etc.) It
>really specifies a meta-language for defining markup languages. A DTD will
>specify (among other things) what markup "tags" are legal and where tags can
>occur.

The idea of SGML as a meta-language for defining document syntax is a good
idea. However, most of the time, I'm just writing text. Does there exist a
standard for a DTD that says what tags to use in book/articles? For instance,
should I use <paragraph> or <par> or what? How should I describe bulleted
lists?

>The parser would use the DTD to analyze the document for violations of the
>DTD rules and possibly some other work such as translating the document into
>a series of typesetting commands.

But typesetting the document requires that one know what <chapter> and
<paragraph> mean. The SGML standard doesn't say what they mean. Does anyone
know of work in this area? Surely the book and journal publishers have an
opinion on what should be done.

--
Lawrence Crowl                          716-275-9499
University of Rochester                 crowl@cs.rochester.edu
Computer Science Department             ...!{ames,rutgers}!rochester!crowl
Rochester, New York, 14627
bzs@world.std.com (Barry Shein) (09/12/90)
There are some books which describe sample DTD's, for example "SGML:
An Author's Guide" by Martin Bryan (Addison-Wesley.)

But none of these things is authoritative.

If no one can come forward with an authoritative DTD why don't those
of us who understand a little about this stuff come up with a DTD
right here? If nothing else the discussions surrounding that should be
informative to those trying to learn more. We can call it "The USENET
SGML DTD" and put it into the public domain. If it seems reasonable
I'll use it for "The Online Book Initiative", we've been using some of
our own conventions but it wouldn't be hard to conform.

There's nothing wrong with using previous DTD's for starters, just as
one would use other conventions when trying to pick elements.

Here's my contribution:

        <pp>    begin paragraph
        </pp>   end paragraph

and the common convention that <XYZ> is ended by </XYZ> where needed.

And my first question:

        Are we happy with the convention &char-name to encode non-ascii
        characters (e.g. &oumlaut), how far along is the Text Encoding
        Initiative with this? Can we use their conventions yet?

"Your" move.

--
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD
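The convention that <XYZ> is ended by </XYZ> can be checked mechanically.
What follows is only a minimal sketch in Python, not part of any proposed
DTD. The name check_tags and the regular expression are hypothetical. It
ignores attributes, comments and entity references, folds tag names to
lower case, and assumes every start tag needs an end tag, which is stricter
than the "where needed" in the post (SGML lets a DTD declare end tags
omissible).

```python
import re

# Hypothetical pattern: a bare start tag <name> or end tag </name>,
# with no attributes. Real SGML tags can be considerably richer.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def check_tags(text):
    """Return True if every <XYZ> is closed by a matching </XYZ>, properly nested."""
    stack = []
    for match in TAG.finditer(text):
        is_end_tag = bool(match.group(1))
        name = match.group(2).lower()  # assumption: names are case-insensitive
        if not is_end_tag:
            stack.append(name)
        elif not stack or stack[-1] != name:
            return False  # end tag with no matching open element
        else:
            stack.pop()
    return not stack  # anything left on the stack was never closed
```

With this sketch, check_tags("<pp>one</pp><pp>two</pp>") is true, while an
unclosed "<pp>one" or a mismatched "<pp>one</xyz>" is rejected. A real SGML
parser would take the nesting rules from the DTD instead of assuming them.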
blarsen@spider.uio.no (Bjorn Larsen) (09/12/90)
In article <BZS.90Sep11225016@world.std.com> bzs@world.std.com
(Barry Shein) writes:
>
>       Are we happy with the convention &char-name to encode non-ascii
>       characters (e.g. &oumlaut), how far along is the Text Encoding
>       Initiative with this? Can we use their conventions yet?
>

No, of course we're not! I write Norwegian using SGML, and of course it
Will Not Do for me to have to scatter &Oslash;, &oslash;, &Aring;, &aring;,
&AElig; and &aelig; specifications all around my text. I want to be able
to use ISO Latin 1.

At the moment, the text I write ends up inside The Publisher, and my
approach is the following:

1. Write the text using ISO Latin 1. (Or, occasionally, write the text
   on my Mac, and then translate it to ISO Latin 1. Argh.)
2. Translate ISO Latin 1 to SGML-type codes.
3. Import the resulting SGML into The Publisher.

Of course, a small perl script makes it easy to automate this, but I
really got irritated about a year back when it dawned on me that I was
expected to write my name as "Bj&oslash;rn" in SGML-ese.

Anyways, what is the 'Text Encoding Initiative', and what are their
'conventions'?

--
Bjorn Larsen
University of Oslo, Norway
Bjorn.Larsen@usit.uio.no
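Step 2 above (translating ISO Latin 1 to SGML-type codes) can be sketched
as follows. This is an illustration in Python, not the perl script the post
mentions, which is not shown in the thread. The table LATIN1_TO_ENTITY and
the function to_sgml_entities are hypothetical names; the table covers only
the six Norwegian letters named in the post, using the entity names of the
ISO 8879 "Added Latin 1" set, and the sketch assumes the text contains no
literal "&".

```python
# Hypothetical mapping: only the six letters named in the post.
# Entity names are those of the ISO 8879 "Added Latin 1" entity set.
LATIN1_TO_ENTITY = {
    "\u00d8": "Oslash",  # capital O with stroke
    "\u00f8": "oslash",  # small o with stroke
    "\u00c5": "Aring",   # capital A with ring
    "\u00e5": "aring",   # small a with ring
    "\u00c6": "AElig",   # capital AE ligature
    "\u00e6": "aelig",   # small ae ligature
}


def to_sgml_entities(text):
    """Replace each mapped character with an &name; reference.

    Characters not in the table pass through unchanged.
    """
    return "".join(
        "&" + LATIN1_TO_ENTITY[ch] + ";" if ch in LATIN1_TO_ENTITY else ch
        for ch in text
    )


def latin1_bytes_to_sgml(data):
    """Decode ISO Latin 1 bytes, then apply the entity translation."""
    return to_sgml_entities(data.decode("latin-1"))
```

Under these assumptions, the Latin 1 bytes b"Bj\xf8rn" come out as
"Bj&oslash;rn", which is exactly the spelling the post objects to having to
type by hand.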
jwh@boston.ifs.umich.edu (Jim Howe) (09/12/90)
In article <1990Sep12.020242.2916@cs.rochester.edu>, crowl@cs.rochester.edu
(Lawrence Crowl) writes:
|> The idea of SGML as a meta-language for defining document syntax is a good
|> idea. However, most of the time, I'm just writing text. Does there exist a
|> standard for a DTD that says what tags to use in book/articles? For instance,
|> should I use <paragraph> or <par> or what? How should I describe bulleted
|> lists?
|>

There do exist some standard DTD's. The ones I am most familiar with are
the AAP (Association of American Publishers) DTD's. They have defined DTD's
that are used for books and magazine articles. These DTD's have been
adopted by ANSI as ANSI/NISO Z39.59-1988. There is another standard in the
works for the CALS project. It is known as MIL-M-28001 and is quite complex.

|>
|> But typesetting the document requires that one know what <chapter> and
|> <paragraph> mean. The SGML standard doesn't say what they mean. Does anyone
|> know of work in this area? Surely the book and journal publishers have an
|> opinion on what should be done.

The typeset definition of <chapter>, <paragraph>, etc. is entirely at the
discretion of the publisher (with possible input from the author).
Different publishers may choose to format the document differently. Also,
the definition of tags will vary depending on the publication medium. The
same document may be published both on paper and on CD-ROM, for example.
There is work being done to define a formatting standard as well as a
markup standard, but that work is currently incomplete.

|> --
|> Lawrence Crowl                       716-275-9499
|> University of Rochester              crowl@cs.rochester.edu
|> Computer Science Department          ...!{ames,rutgers}!rochester!crowl
|> Rochester, New York, 14627

James W. Howe                   internet: jwh@ifs.umich.edu
University of Michigan          uucp: uunet!mailrus!ifs.umich.edu!jwh
Ann Arbor, MI 48103-4943
enag@ifi.uio.no (Erik Naggum) (09/13/90)
Hi, Bjørn!

In an SGML document, you have the option of defining a character set,
outside the ISO 646 character set, with a mapping to binary values in the
text. I don't have the standard handy, but I seem to remember you define
blocks of character codes in a CHARSET clause.

This is the way you deal with Japanese, Arabic, etc., characters. A report
on how this is done is given in ISO TR 9573 (I think that's the number),
which I also have in my SGML file. Nice reading.

I can dig up the details for you.

--
[Erik Naggum]           Naggum Software; Gaustadalleen 21; 0371 OSLO; NORWAY
I disclaim,             <erik@naggum.uu.no>, <enag@ifi.uio.no>
therefore I post.       +47-295-8622, +47-256-7822, (fax) +47-260-4427
hartman@ide.com (Robert Hartman) (09/13/90)
In article <BZS.90Sep11225016@world.std.com> bzs@world.std.com
(Barry Shein) writes:
>
>There are some books which describe sample DTD's, for example "SGML:
>An Author's Guide" by Martin Bryan (Addison-Wesley.)
>
>But none of these things is authoritative.
>
>If no one can come forward with an authoritative DTD why don't those
>of us who understand a little about this stuff come up with a DTD
>right here? If nothing else the discussions surrounding that should be
>informative to those trying to learn more. We can call it "The USENET
>SGML DTD" and put it into the public domain. If it seems reasonable
>I'll use it for "The Online Book Initiative", we've been using some of
>our own conventions but it wouldn't be hard to conform.

A GREAT IDEA!!!

-r
ath@prosys.se (Anders Thulin) (09/13/90)
In article <BZS.90Sep11225016@world.std.com> bzs@world.std.com
(Barry Shein) writes:

>If no one can come forward with an authoritative DTD why don't those
>of us who understand a little about this stuff come up with a DTD
>right here? If nothing else the discussions surrounding that should be
>informative to those trying to learn more. We can call it "The USENET
>SGML DTD" and put it into the public domain. If it seems reasonable
>I'll use it for "The Online Book Initiative", we've been using some of
>our own conventions but it wouldn't be hard to conform.

The 'USENET SGML DTD' is a rather vague description: what types of texts
should it be used for? RFC's, Email, Digests, ... ? Or more traditional
types like novels, collections of short stories, dramas etc?

My own suggestion would be for something like novels. Most people have
read at least one, so they wouldn't be entirely unfamiliar :-) And it
would also fit rather nicely with OBI ...

Choosing a rather restrictive text type could also simplify some of the
keyboard conventions: Paragraph breaks could probably be indicated by
empty lines, dashes could be '---', quotes could use `` and ''. Of course,
this assumes that the system used for parsing would handle shortrefs and
the whatnots that are required.

>And my first question:
>
>       Are we happy with the convention &char-name to encode non-ascii
>       characters (e.g. &oumlaut), how far along is the Text Encoding
>       Initiative with this? Can we use their conventions yet?

I am happy with it. It seems to be largely based on the entity sets (for
Latin-1 and Latin-2) published in one of the appendices of the ISO SGML
document - which probably means they would be available for most SGML
implementations. Or is there any reason to avoid them?

I imagine that an SGML translator would be capable of converting a
document using a local concrete syntax to either of the reference
syntaxes defined by SGML. So choosing other conventions shouldn't be much
of a problem. Or am I mistaken?

--
Anders Thulin       ath@prosys.se       {uunet,mcsun}!sunic!prosys!ath
Telesoft Europe AB, Teknikringen 2B, S-583 30 Linkoping, Sweden
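The keyboard conventions suggested above (empty lines for paragraph breaks,
'---' for dashes, `` and '' for quotes) can be illustrated with a small
Python sketch. It is a plain text preprocessor, not an implementation of
SGML short references, which a conforming parser would handle from
declarations in the DTD. The function name markup_novel_text is
hypothetical, the <pp> element is borrowed from Barry Shein's post, and the
entity names mdash, ldquo and rdquo are assumptions about what the DTD
would declare.

```python
import re


def markup_novel_text(text):
    """Turn keyboard conventions into explicit markup (illustrative only).

    Assumed conventions, taken from the post:
      blank line  -> paragraph break, wrapped here in <pp>...</pp>
      ---         -> dash, written as the assumed entity &mdash;
      `` and ''   -> quotes, written as the assumed entities &ldquo; / &rdquo;
    """
    # Split on one or more blank lines and drop empty chunks.
    chunks = [c.strip() for c in re.split(r"\n\s*\n", text)]
    marked_up = []
    for para in chunks:
        if not para:
            continue
        para = " ".join(para.split())  # join the lines of one paragraph
        para = para.replace("---", "&mdash;")
        para = para.replace("``", "&ldquo;").replace("''", "&rdquo;")
        marked_up.append("<pp>" + para + "</pp>")
    return "\n".join(marked_up)
```

Given two paragraphs separated by an empty line, the sketch emits one <pp>
element per paragraph. Restricting the text type to novels is what keeps
such a short list of conventions workable; RFCs or email would need more.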