lange@iscnvx.uucp (A. S. Lange) (06/27/91)
Can some kind soul point me to an archive site (or some other kind soul) where I can pick up a DTD that describes the "structure" of a UNIX manual page (manpage)?

Many thanks. If you e-mail, I will post a summary of whatever information is sent.

Alex Lange
lange@iscnvx.is.lmsc.lockheed.com
emv@msen.com (Ed Vielmetti) (06/28/91)
structure of man page for mythical program.

the bsd folks are working on a new "man" macro package for man pages, and I'm not sure what the current state of affairs is. there was some concern that since troff is not free they'd have a hard time shipping troff based man pages, and they were considering scrapping troff entirely for something else, but I don't know where that led to.

--Ed

<manpage>
<name>
verbify -- turn nouns into verbs
</name>
<synopsis>
verbify [ -s stemlist ] noun
verbify -v [ -s stemlist ] verb
</synopsis>
<description>
verbify verbifies nouns into a state of submission.
The -v option forces verbify to nounify verbs.
</description>
<environment>
Options can be specified in the VERBOPTS environment parameter.
Where conflicts exist, options on the command line take precedence.
</environment>
<examples>
% verbify verification
verify
verificationalize
% verbify -v negate
negation
negative
</examples>
<files>
/usr/lib/verbify/stemming       stem rules
/usr/dict/words                 system dictionary
</files>
<limitations>
Words can be no longer than 1024 characters.
</limitations>
<see-also>
prepositionalize(1), gerundify(1)
</see-also>
<authors>
Naomi Valentine
</authors>
<diagnostics>
-- not a noun: unchanged
When the input word is not a noun (i.e., a participle), it is left unaltered.
</diagnostics>
<bugs>
The -v switch is a crock.

Performance is slow and uses system resources prodigiously.

The stemming rules' coverage is uneven; new installations will probably want to monitor the output for several months to gather local additions.
</bugs>
</manpage>
erik@naggum.no (Erik Naggum) (06/30/91)
Ed Vielmetti <emv@msen.com> writes:
|
| structure of man page for mythical program.
|
| [example deleted]

Ah, but this is only an instance. I'd imagine the DTD for this instance to look like:

<!DOCTYPE manpage [
<!ENTITY % contents "name,synopsis,description,environment,examples,
                     files,limitations,see-also,authors,diagnostics,bugs">
<!ELEMENT manpage (%contents;)>
<!ELEMENT (%contents;) (#PCDATA)>
]>

which I think is horrible.

This is also an excellent example of the "structure vs contents" debate. Ed has given us a list of elements which reflect the contents of the man-page, and has assumed that line breaks are significant in some elements and not in others, that leading blank sequences are ignored, that empty lines are meaningful, that in the examples, user input and program output are intuitively differentiated, that in the see-also element, manual sections are indicated by a parenthesized number, etc, etc, all of which is probably useful for a richtext rendering of a manual page, but is _very_ hard to make useful for typeset manual pages. (I do think that we need one format for both viewable and printable man pages.)

From a structural point of view, we see that a manual-page consists of a title (with its own particular semantics), and several sections. Sections consist of a header, and some contents. The contents can be made up of several types of tokens (command name, options, arguments, to name a few), and maybe examples. Examples should differentiate between system prompts and output on the one hand and user input on the other, if it's desirable to print the user input in blue, in italicized characters, underlined, or whatever. The contents will also need to contain lists of several types, highlighted phrases which are not any of the token types listed above, and may need to use special words in small capitals (e.g. "UNIX"). Use of constant-width fonts for examples and certain keywords has been useful in the past.
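The structural view described above can be sketched as a small data model. This is only an illustration in Python, not part of the original post: the class names (`ManualPage`, `Section`, `Paragraph`, `Example`, `ExampleLine`, `Token`) and the helper `user_input` are hypothetical, and the model leaves out lists and citations for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Token:
    kind: str  # e.g. "cmd", "opt", "arg" or "hi" (highlighted phrase)
    text: str


@dataclass
class Paragraph:
    # Running text mixed with typed tokens.
    content: List[Union[str, Token]] = field(default_factory=list)


@dataclass
class ExampleLine:
    source: str  # "sys" for prompts and output, "user" for typed input
    text: str


@dataclass
class Example:
    lines: List[ExampleLine] = field(default_factory=list)


@dataclass
class Section:
    heading: str
    contents: List[Union[Paragraph, Example]] = field(default_factory=list)


@dataclass
class ManualPage:
    title: str
    sections: List[Section] = field(default_factory=list)


def user_input(page: ManualPage) -> List[str]:
    """Collect what the user typed, e.g. to print it in a different style."""
    return [
        line.text
        for section in page.sections
        for block in section.contents
        if isinstance(block, Example)
        for line in block.lines
        if line.source == "user"
    ]


# A fragment of the verbify page, with user input kept apart from output.
page = ManualPage(
    title="verbify -- turn nouns into verbs",
    sections=[
        Section("DESCRIPTION", [
            Paragraph([Token("cmd", "verbify"),
                       " verbifies nouns into a state of submission."]),
        ]),
        Section("EXAMPLES", [
            Example([
                ExampleLine("sys", "% "),
                ExampleLine("user", "verbify verification"),
                ExampleLine("sys", "verify"),
                ExampleLine("sys", "verificationalize"),
            ]),
        ]),
    ],
)

print(user_input(page))
```

Because user input is tagged rather than guessed from layout, a formatter can pick it out without relying on line breaks or prompt characters.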
I'm looking at the man pages stored on this system (SunOS 4.1, I think), and they're pretty hairy, with lots of details. Lar Kaufman told me that the OSF has done something on manual pages, but I haven't had time to find out what they've done.

The following is an attempt to make a useful manual page DTD.

THIS IS AN EXAMPLE, ONLY. DON'T USE IT WITHOUT CONTACTING ME. No warranties, express or implied. All rights to original material reserved. Permission to use as an example is granted to readers of comp.text.sgml. This material contains quoted material from Sun Microsystems, Inc. SunOS&tm; Reference Manual, copyright 1987, 1988 by Sun Microsystems, Inc. Material used for instructional purposes, only. ISBN number is valid, but not assignable. Don't refer to it.

<!DOCTYPE manual [

<!-- FORMAL PUBLIC identifier:
        "+//ISBN 82-7640-023-X//DTD Manual Page//EN" -->
<!-- Copyright 1991 by Naggum Software, Oslo, Norway -->

<!-- These parameter entities may be redefined -->

<!ENTITY % otherh "bugs">
<!ENTITY % otherp "file">
<!ENTITY % x.con  "sys|user">
<!ENTITY % rend   "b,i,bi,sc,r">

<!-- Don't touch the rest, unless you _really_ need to -->

<!-- Define documents referred to in doc/names.entity, and let this
     file contain lines such as:

<!ENTITY OS      "SunOS">
<!ENTITY OSv     "&OS; 4.1">
<!ENTITY GSBG    "<cit>Getting Started</>">
<!ENTITY SUBG    "<cit>Customizing &OS;</>">
<!ENTITY MMBG    "<cit>Mail and Messages</>">
<!ENTITY CHANGE  "<cit>&OSv; Release Manual</>">
<!ENTITY INSTALL "<cit>Installing &OSv;</>">
<!ENTITY ADMIN   "<cit>System and Network Administration</>">
<!ENTITY SECUR   "<cit>Security Features Guide</>">
<!ENTITY REFMAN  "<cit>&OS; Reference Manual</>">
<!ENTITY INDEX   "<cit>Global Index</>">
<!ENTITY ASSY    "<cit>Assembly Language Reference</>">
<!ENTITY DEBUG   "<cit>Debugging Tools</>">
<!ENTITY KR      "<cit>The C Programming Language</>">

     (this list is taken from the SunOS 4.1 man macro package) -->

<!ENTITY % docs SYSTEM "doc/names.entity">
%docs;

<!ENTITY % sectype "name,syn,desc,opt,ref,files,%otherh;">
<!ENTITY % p.con   "hi|cmd|arg|opt|%otherp;">

<!ELEMENT manual     - O (sect)+>
<!ATTLIST manual     entry    CDATA    #REQUIRED
                     section  NUTOKEN  #REQUIRED
                     revised  NUMBERS  #IMPLIED
                     release  NUTOKEN  #IMPLIED>
<!ELEMENT sect       - O (head,p+) +(ix)>
<!ATTLIST sect       type     (%sectype;) #IMPLIED>
<!ELEMENT ix         - - (#PCDATA)>
<!ELEMENT head       O O (#PCDATA)>
<!ELEMENT p          O O (#PCDATA|optlist|list|xmp|cit|%p.con;)*>
<!ELEMENT optlist    - - (option)+>
<!ELEMENT option     - O (opt, arg?, (#PCDATA|%p.con;)*)>
<!ELEMENT list       - - (item)+>
<!ATTLIST list       type     (tag,ord,unord) unord>
<!ELEMENT item       - O (tag?, (#PCDATA|hi)*)>
<!ELEMENT tag        - - (#PCDATA)>
<!ELEMENT xmp        - - (%x.con;)+>
<!ELEMENT (%x.con;)  - O (#PCDATA)>
<!ELEMENT cit        - - (#PCDATA|hi)*>
<!ELEMENT (%p.con;)  - - (#PCDATA)>
<!ATTLIST hi         rend     (%rend;) #IMPLIED>
<!ATTLIST cmd        sect     NUTOKEN  #IMPLIED>
]>

and an example man-page (cal(1)) (from SunOS 4.1 distribution):

<manual entry=cal section=1 release="4.1" revised="1987 09 09">
<sect name>NAME
<p><ix>cal -- display a calendar</ix>
<sect syn>SYNOPSIS
<cmd>cal</> [ [ <arg>month</> ] <arg>year</> ]
<sect desc>DESCRIPTION
<p><cmd>cal</> displays a calendar for the specified year. If a month is also specified, a calendar for that month only is displayed. If neither is specified, a calendar for the present month is printed.
<p><arg>year</> can be between 1 and 9999. Be aware that `<cmd>"cal 78"</>' refers to the early Christian era, not the 20th century. Also, the year is always considered to start in January, even though this is historically naive.
<p><arg>month</> is a number between 1 and 12.
<p>The calendar produced is that for England and her colonies.
<p>Try September 1752.
</manual>

A simple description of this document type:

A manual entry consists of several sections, and has an entry and a section attribute associated with it, and may have revision date and OS release attributes as well. The section starts with a digit, but may be followed by any digit or letter.
Together with the section, the entry is used for indexing and referencing purposes. Both of these could probably be subsumed by the file name, but there's no way to specify or retrieve the file name of an entity in SGML, so I left them as attributes, probably machine generatable.

Sections have headings and a type. The type reflects the intrinsic type of a section, such as name, synopsis, description, options, references (see also), files, bugs, author, etc., and may be redundant, since the content of the header element is supposed to reflect this type. It may be useful for alternate versions. (SunOS has both BSD and SysV options and needs to distinguish them, but they're both option-type sections.) The heading is always the first element after the section start-tag, and its tags may be omitted.

Following the heading are several paragraphs of text. Paragraphs are more content-oriented than the overall structure, in that several types of elements are specifically named, such as option list, (other) list, example, citation, in addition to general paragraph content, highlighted phrases, command names, argument names, option names, and user-specifiable other content, such as file names.

An option list contains a list of options, with optional argument and regular paragraph content. Items in other lists may have a tag associated with them, useful only if the type attribute is specified as tag. Ordered and unordered lists are available, the latter being the default.

Examples contain alternating user input and system output, where the exact names of these elements are user-defined.

The citation element may contain data and highlighted phrases, whereas the other paragraph content just contains data. The highlighted phrase has a rendition attribute associated with it, whose values are user-specifiable and default to bold, italic, bold italic, small caps and roman via their initials.
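As a rough illustration of how a viewer might act on the rendition attribute, here is a minimal Python sketch. The initials come from the rend parameter entity in the DTD; the function name `render_plain`, the plain-text markers it emits, and the choice of roman as the fallback are assumptions of this sketch, since the DTD itself leaves the attribute implied and says nothing about output.

```python
# Rendition initials as listed in the DTD's "rend" parameter entity.
RENDITIONS = {
    "b": "bold",
    "i": "italic",
    "bi": "bold italic",
    "sc": "small caps",
    "r": "roman",
}


def render_plain(text: str, rend: str = "r") -> str:
    """Approximate a highlighted phrase on a plain-text display.

    The markers chosen here (*, _, upper case) are this sketch's own
    assumptions; the DTD only names the renditions.
    """
    if rend not in RENDITIONS:
        raise ValueError(f"unknown rendition: {rend!r}")
    if rend == "b":
        return f"*{text}*"
    if rend == "i":
        return f"_{text}_"
    if rend == "bi":
        return f"*_{text}_*"
    if rend == "sc":
        return text.upper()
    return text  # roman: leave the text as it is


print(render_plain("i.e.", "i"))
print(render_plain("Unix", "sc"))
```

A typesetting back end would map the same five initials to font changes instead, which is the point of naming renditions in the markup rather than hard-coding their appearance.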
The otherh, otherp, x.con, and rend parameter entities may be changed to reflect special needs in some manual entries. One might have another predefined section type by invoking the DTD with the following syntax:

<!DOCTYPE manual PUBLIC "+//ISBN 82-7640-023-X//DTD Manual Page//EN" [
<!ENTITY % otherh "bugs,author">
]>
<manual entry=snakoil section=9>
...
<sect author>AUTHOR
<p>Ken Olson
</manual>

Using Ed's example command, my suggestion would look like this:

<!DOCTYPE manual PUBLIC "+//ISBN 82-7640-023-X//DTD Manual Page//EN" [
<!ENTITY % otherh "bugs,env">
]>
<manual entry=verbify section=1>
<sect name>NAME
<p>verbify -- turn nouns into verbs
<sect syn>SYNOPSIS
<p><cmd>verbify</> [ <opt>-s</> <arg>stemlist</> ] <arg>noun</>
<p><cmd>verbify</> <opt>-v</> [ <opt>-s</> <arg>stemlist</> ] <arg>verb</>
<sect desc>DESCRIPTION
<cmd>verbify</> verbifies nouns into a state of submission.
<optlist>
<option><opt>-v</> forces <cmd>verbify</> to nounify verbs.
</optlist>
<sect env>ENVIRONMENT
<p>Options can be specified in the <hi sc>VERBOPTS</> environment parameter. Where conflicts exist, options on the command line take precedence.
<sect>EXAMPLES
<xmp>
<sys>% <user>verbify verification
<sys>verify
<sys>verificationalize
<sys>% <user>verbify -v negate
<sys>negation
<sys>negative
</xmp>
<sect>FILES
<file>/usr/lib/verbify/stemming</> stem rules
<file>/usr/dict/words</> system dictionary
<sect>LIMITATIONS
<p>Words can be no longer than 1024 characters.
<sect ref>SEE-ALSO
<cmd sect=1>prepositionalize</>, <cmd sect=1>gerundify</>
<sect>AUTHORS
Naomi Valentine
<sect>DIAGNOSTICS
<list tag>
<item><tag>-- not a noun: unchanged</tag>
When the input word is not a noun (<hi i>i.e.</>, a participle), it is left unaltered.
</list>
<sect bugs>BUGS
<p>The <opt>-v</> switch is a crock.
<p>Performance is slow and uses system resources prodigiously.
<p>The stemming rules' coverage is uneven; new installations will probably want to monitor the output for several months to gather local additions.
</manual>

There is definitely room for improvement in both the DTD and the examples. I hope this has been useful.

</Erik>
--
Erik Naggum             Professional Programmer            +47-2-836-863
Naggum Software             Electronic Text             <erik@naggum.no>
0118 OSLO, NORWAY       Computer Communications        <enag@ifi.uio.no>
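The otherh override used in the two invocations above works because an SGML parser reads the internal declaration subset before the external DTD and keeps the first declaration of each entity. The following Python sketch mimics only that one rule. The helper names (`declare`, `expand`, `section_types`) are hypothetical, and the entity handling is greatly simplified compared with a real SGML parser.

```python
import re

# Matches a parameter entity reference such as "%otherh;".
PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);")


def declare(entities, name, value):
    """Record an entity; an earlier declaration of the same name wins."""
    entities.setdefault(name, value)


def expand(text, entities, max_depth=10):
    """Replace parameter entity references until none are left."""
    for _ in range(max_depth):
        expanded = PE_REF.sub(lambda m: entities[m.group(1)], text)
        if expanded == text:
            return expanded
        text = expanded
    raise ValueError("parameter entity references nested too deeply")


def section_types(internal_subset=None):
    """Return the section types allowed on <sect> for a given invocation."""
    entities = {}
    # The internal subset (between [ and ] in the DOCTYPE) is read first.
    for name, value in (internal_subset or {}).items():
        declare(entities, name, value)
    # Then come the DTD's own declarations, which act as defaults.
    declare(entities, "otherh", "bugs")
    declare(entities, "sectype", "name,syn,desc,opt,ref,files,%otherh;")
    return expand(entities["sectype"], entities).split(",")


print(section_types())
print(section_types({"otherh": "bugs,env"}))
```

With no internal subset the list ends in bugs; with the override from the verbify invocation it ends in bugs and env, which is what makes a section of type env acceptable there.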