[comp.text.sgml] DTD for UNIX Manual Page

lange@iscnvx.uucp (A. S. Lange) (06/27/91)

Can some kind soul point me to an archive site (or some other kind soul)
where I can pick up a DTD that describes the "structure" of a UNIX
manual page (manpage)?

Many thanks.  If you e-mail, I will post a summary of whatever
information is sent.

Alex Lange
lange@iscnvx.is.lmsc.lockheed.com

emv@msen.com (Ed Vielmetti) (06/28/91)

structure of man page for mythical program.

the bsd folks are working on a new "man" macro package for man pages,
and I'm not sure what the current state of affairs is.  there was some
concern that since troff is not free they'd have a hard time shipping
troff based man pages, and they were considering scrapping troff
entirely for something else, but I don't know where that led.

--Ed

<manpage>
<name> verbify -- turn nouns into verbs </name>
<synopsis>
	verbify [ -s stemlist ] noun
	verbify -v [ -s stemlist ] verb
</synopsis>
<description>
	verbify verbifies nouns into a state of submission.

	The -v option forces verbify to nounify verbs.
</description>
<environment>
	Options can be specified in the VERBOPTS environment
	parameter.  Where conflicts exist, options on the
	command line take precedence.
</environment>
<examples>
	% verbify verification
	verify
	verificationalize
	% verbify -v negate
	negation
	negative
</examples>
<files>
	/usr/lib/verbify/stemming	stem rules
	/usr/dict/words			system dictionary
</files>
<limitations>
	Words can be no longer than 1024 characters.
</limitations>
<see-also>
	prepositionalize(1), gerundify(1)
</see-also>
<authors>
	Naomi Valentine
</authors>
<diagnostics>
	-- not a noun: unchanged
		When the input word is not a noun (i.e., a
		participle), it is left unaltered.
</diagnostics>
<bugs>
	The -v switch is a crock.

	Performance is slow and uses system resources prodigiously.

	The stemming rules' coverage is uneven; new installations will
	probably want to monitor the output for several months to
	gather local additions.
</bugs>
</manpage>	

erik@naggum.no (Erik Naggum) (06/30/91)

Ed Vielmetti <emv@msen.com> writes:
|
|   structure of man page for mythical program.
|
|   [example deleted]

Ah, but this is only an instance.  I'd imagine the DTD for this
instance to look like:

	<!DOCTYPE manpage [
	<!ENTITY % contents "name,synopsis,description,environment,examples,
		   files,limitations,see-also,authors,diagnostics,bugs">
	<!ELEMENT manpage (%contents)>
	<!ELEMENT (%contents) (#PCDATA)>
	]>

which I think is horrible.  This is also an excellent example of the
"structure vs contents" debate.  Ed has given us a list of elements
which reflect the contents of the man-page, and has assumed that line
breaks are significant in some elements and not in others, that
leading blank sequences are ignored, that empty lines are meaningful,
that in the examples, user input and program output are intuitively
differentiated, that in the see-also element, manual sections are
indicated by a parenthesized number, etc, etc, all of which is
probably useful for a richtext rendering of a manual page, but is
_very_ hard to make useful for typeset manual pages.  (I do think that
we need one format for both viewable and printable man pages.)

From a structural point of view, we see that a manual-page consists of
a title (with its own particular semantics), and several sections.
Sections consist of a header, and some contents.  The contents can be
made up of several types of tokens (command name, options, arguments,
to name a few), and maybe examples.  Examples should differentiate
between system prompts and output on the one hand and user input on
the other, if it's desirable to print the user input in blue, in
italicized characters, underlined, or whatever.  The contents will
also need to contain lists of several types, highlighted phrases which
are not any of the token types listed above, and may need to use
special words in small capitals (e.g. "UNIX").  Use of constant-width
fonts for examples and certain keywords has been useful in the past.

I'm looking at the man pages stored on this system (SunOS 4.1, I
think), and they're pretty hairy, with lots of details.

Lar Kaufman told me that the OSF has done something on manual pages,
but I haven't had time to find out what they've done.

The following is an attempt to make a useful manual page DTD.

THIS IS AN EXAMPLE, ONLY.  DON'T USE IT WITHOUT CONTACTING ME.

No warranties, express or implied.  All rights to original material
reserved.  Permission to use as an example is granted to readers of
comp.text.sgml.  This material contains quoted material from Sun
Microsystems, Inc. SunOS&tm; Reference Manual, copyright 1987, 1988 by
Sun Microsystems, Inc.  Material used for instructional purposes,
only.  ISBN number is valid, but not assignable.  Don't refer to it.

	<!DOCTYPE manual [

	<!-- FORMAL PUBLIC identifier:
		"+//ISBN 82-7640-023-X//DTD Manual Page//EN" -->
	<!-- Copyright 1991 by Naggum Software, Oslo, Norway -->

	<!-- These parameter entities may be redefined -->
	<!ENTITY % otherh	"bugs">
	<!ENTITY % otherp	"file">
	<!ENTITY % x.con	"sys|user">
	<!ENTITY % rend		"b,i,bi,sc,r">
	<!-- Don't touch the rest, unless you _really_ need to -->

	<!-- Define documents referred to in doc/names.entity,
	     and let this file contain lines such as:
	    <!ENTITY OS	        "SunOS">
	    <!ENTITY OSv	"&OS; 4.1">
	    <!ENTITY GSBG	"<cit>Getting Started</>">
	    <!ENTITY SUBG	"<cit>Customizing &OS;</>">
	    <!ENTITY MMBG	"<cit>Mail and Messages</>">
	    <!ENTITY CHANGE	"<cit>&OSv; Release Manual</>">
	    <!ENTITY INSTALL	"<cit>Installing &OSv;</>">
	    <!ENTITY ADMIN	"<cit>System and Network Administration</>">
	    <!ENTITY SECUR	"<cit>Security Features Guide</>">
	    <!ENTITY REFMAN	"<cit>&OS; Reference Manual</>">
	    <!ENTITY INDEX	"<cit>Global Index</>">
	    <!ENTITY ASSY	"<cit>Assembly Language Reference</>">
	    <!ENTITY DEBUG	"<cit>Debugging Tools</>">
	    <!ENTITY KR		"<cit>The C Programming Language</>">
	(this list is taken from the SunOS 4.1 man macro package) -->

	<!ENTITY % docs	SYSTEM "doc/names.entity"> %docs;

	<!ENTITY % sectype	"name,syn,desc,opt,ref,files,%otherh;">
	<!ENTITY % p.con	"hi|cmd|arg|opt|%otherp;">

	<!ELEMENT manual   - O	(sect)+>
	<!ATTLIST manual	entry   CDATA   #REQUIRED
				section NUTOKEN #REQUIRED
				revised NUMBERS #IMPLIED
				release NUTOKEN #IMPLIED>

	<!ELEMENT sect	   - O	(head,p+)	+(ix)>
	<!ATTLIST sect		type (%sectype) #IMPLIED>

	<!ELEMENT ix	   - -	(#PCDATA)>

	<!ELEMENT head	   O O	(#PCDATA)>

	<!ELEMENT p	   O O	(#PCDATA|optlist|list|xmp|cit|%p.con;)*>

	<!ELEMENT optlist  - -  (option)+>
	<!ELEMENT option   - O	(opt, arg?, (#PCDATA|%p.con;)*)>

	<!ELEMENT list     - -	(item)+>
	<!ATTLIST list		type (tag,ord,unord) unord>
	<!ELEMENT item	   - O  (tag?, (#PCDATA|hi)*)>
	<!ELEMENT tag	   - -  (#PCDATA)>

	<!ELEMENT xmp	   - -	(%x.con)+>
	<!ELEMENT (%x.con) - O	(#PCDATA)>

	<!ELEMENT cit	   - -	(#PCDATA|hi)*>
	<!ELEMENT (%p.con) - -	(#PCDATA)>
	<!ATTLIST hi		rend (%rend;) #IMPLIED>
	<!ATTLIST cmd		sect NUTOKEN  #IMPLIED>

	]>

and an example man-page (cal(1)) (from SunOS 4.1 distribution):

	<manual entry=cal section=1 release="4.1" revised="1987 09 09">
	<sect name>NAME
	<p><ix>cal -- display a calendar</ix>
	<sect syn>SYNOPSIS
	<cmd>cal</> [ [ <arg>month</> ] <arg>year</> ]
	<sect desc>DESCRIPTION
	<p><cmd>cal</> displays a calendar for the specified year.
	If a month is also specified, a calendar for that month
	only is displayed. If neither is specified, a calendar for
	the present month is printed.
	<p><arg>year</> can be between 1 and 9999.  Be aware that
	`<cmd>"cal 78"</>' refers to the early Christian era, not
	the 20th century.  Also, the year is always considered to
	start in January, even though this is historically naive.
	<p><arg>month</> is a number between 1 and 12.
	<p>The calendar produced is that for England and her colonies.
	<p>Try September 1752.
	</manual>

A simple description of this document type:

A manual entry consists of several sections, and has an entry and a
section attribute associated with it, and may have revision date and
OS release attributes as well.  The section starts with a digit, but
may be followed by any digit or letter.  With the section, entry is
used for indexing and referencing purposes.  Both of these could
probably be subsumed by the file name, but there's no way to specify
or retrieve the file name of an entity in SGML, so I left them as
attributes, probably machine generatable.
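
As a rough illustration of the NUTOKEN constraint on the section
attribute, here is a small Python sketch.  It is not part of the DTD.
The function name is made up, and it assumes the reference concrete
syntax, where name characters are letters, digits, "." and "-", and
NAMELEN is 8.

```python
import re

# Assumption: reference concrete syntax, where name characters are
# letters, digits, "." and "-", and NAMELEN is 8.
NUTOKEN = re.compile(r"[0-9][A-Za-z0-9.-]*")
NAMELEN = 8


def is_nutoken(value: str) -> bool:
    """Return True if value is a digit followed by name characters."""
    return len(value) <= NAMELEN and NUTOKEN.fullmatch(value) is not None


# Try a few section and release values.
for candidate in ["1", "3n", "4.1", "n3", ""]:
    print(repr(candidate), is_nutoken(candidate))
```

Values such as "1", "3n" and "4.1" pass, while "n3" does not, since it
starts with a letter.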

Sections have headings and a type, which reflect the intrinsic type of
a section, such as name, synopsis, description, options, references
(see also), files, bugs, author, etc, and may be redundant, since the
content of the header element is supposed to reflect this type.  May
be useful for alternate versions.  (SunOS has both BSD and SysV
options, and needs to distinguish them, but they're both option-type
sections.)  The heading is always the first element after the section
start-tag, and its tags may be omitted.  Following the heading are
several paragraphs of text.

Paragraphs are more content-oriented than the overall structure, in
that several types of elements are specifically named, such as option
list, (other) list, example, citation, in addition to general
paragraph content, highlighted phrases, command names, argument names,
option names, and user-specifiable other content, such as file names.

An option list contains a list of options, each with an optional
argument and regular paragraph content.  Items in other lists may have
a tag associated with them, useful only if the type attribute is
specified as tag.  Ordered and unordered lists are available, the
latter being the default.

Examples contain alternating system output and user input, where
the exact names of these elements are user-defined.
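
To show how a plain terminal transcript maps onto these elements, here
is a hedged Python sketch.  The helper name and the rule that a
leading "% " marks user input are assumptions for illustration only;
output that happens to start with the prompt string would be
mislabelled.

```python
def mark_up_transcript(lines, prompt="% "):
    """Wrap each transcript line in <sys>/<user> tags.

    A line starting with the prompt is split into a <sys> prompt and
    a <user> command; any other line is treated as system output.
    """
    marked = []
    for line in lines:
        if line.startswith(prompt):
            marked.append(f"<sys>{prompt}<user>{line[len(prompt):]}")
        else:
            marked.append(f"<sys>{line}")
    return ["<xmp>", *marked, "</xmp>"]


# First half of Ed's example transcript.
transcript = ["% verbify verification", "verify", "verificationalize"]
print("\n".join(mark_up_transcript(transcript)))
```

The end-tags of sys and user are left out, as the DTD allows them to
be omitted.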

The citation element may contain data and highlighted phrases, whereas
the other paragraph content just contains data.  The highlighted
phrase has a rendition attribute associated with it, whose values are
user-specifiable, and default to bold, italic, bold italic, small
caps and roman via their initials.

The otherh, otherp, x.con, and rend parameter entities may be changed
to reflect special needs in some manual entries.  One might have
another predefined section type by invoking the DTD with the following
syntax:

	<!DOCTYPE manual PUBLIC
		"+//ISBN 82-7640-023-X//DTD Manual Page//EN"
	[ <!ENTITY % otherh "bugs,author"> ]>
	<manual entry=snakoil section=9>
	...
	<sect author>AUTHOR
	<p>Ken Olson
	</manual>
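
The override works because the declaration subset in brackets is read
before the public DTD, and the first declaration of an entity is the
one that binds.  A minimal Python sketch of that rule follows.  The
helpers bind_entities and expand are hypothetical, not taken from any
SGML parser, and references are expanded lazily here for brevity.

```python
import re


def bind_entities(declarations):
    """Bind parameter entities; the first declaration of a name wins."""
    table = {}
    for name, text in declarations:
        table.setdefault(name, text)
    return table


def expand(text, table):
    """Replace %name; references until none remain."""
    pattern = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);")
    while pattern.search(text):
        text = pattern.sub(lambda m: table[m.group(1)], text)
    return text


# The internal subset is read before the DTD's own declarations.
internal_subset = [("otherh", "bugs,author")]
dtd = [("otherh", "bugs"),
       ("sectype", "name,syn,desc,opt,ref,files,%otherh;")]

table = bind_entities(internal_subset + dtd)
print(expand(table["sectype"], table))
```

With the override in place the section types come out as
name,syn,desc,opt,ref,files,bugs,author, so "author" becomes a legal
value of the type attribute.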

Using Ed's example command, my suggestion would look like this:

	<!DOCTYPE manual PUBLIC "+//ISBN 82-7640-023-X//DTD Manual Page//EN"
	[ <!ENTITY % otherh "bugs,env,xmp"> ]>
	<manual entry=verbify section=1>
	<sect name>NAME
	<p>verbify -- turn nouns into verbs
	<sect syn>SYNOPSIS
	<p><cmd>verbify</> [ <opt>-s</> <arg>stemlist</> ] <arg>noun</>
	<p><cmd>verbify</> <opt>-v</> [ <opt>-s</> <arg>stemlist</> ] <arg>verb</>
	<sect desc>DESCRIPTION
	<cmd>verbify</> verbifies nouns into a state of submission.
	<optlist>
	<option><opt>-v</> forces <cmd>verbify</> to nounify verbs.
	</optlist>
	<sect env>ENVIRONMENT
	<p>Options can be specified in the <hi sc>VERBOPTS</> environment
	parameter.  Where conflicts exist, options on the command line take
	precedence.
	<sect xmp>EXAMPLES
	<xmp>
	<sys>% <user>verbify verification
	<sys>verify
	<sys>verificationalize
	<sys>% <user>verbify -v negate
	<sys>negation
	<sys>negative
	</xmp>
	<sect>FILES
	<file>/usr/lib/verbify/stemming</> stem rules
	<file>/usr/dict/words</> system dictionary
	<sect>LIMITATIONS
	<p>Words can be no longer than 1024 characters.
	<sect ref>SEE-ALSO
	<cmd sect=1>prepositionalize</>, <cmd sect=1>gerundify</>
	<sect>AUTHORS
	Naomi Valentine
	<sect>DIAGNOSTICS
	<list tag>
	<item><tag>-- not a noun: unchanged</tag>
	When the input word is not a noun (<hi i>i.e.</>, a participle),
	it is left unaltered.
	</list>
	<sect bugs>BUGS
	<p>The <opt>-v</> switch is a crock.
	<p>Performance is slow and uses system resources prodigiously.
	<p>The stemming rules' coverage is uneven; new installations will
	probably want to monitor the output for several months to
	gather local additions.
	</manual>	

There is definitely room for improvement in both the DTD and the
examples.

I hope this has been useful.

</Erik>
--
Erik Naggum             Professional Programmer            +47-2-836-863
Naggum Software             Electronic Text             <erik@naggum.no>
0118 OSLO, NORWAY       Computer Communications        <enag@ifi.uio.no>