[comp.windows.ms.programmer] RTF description

tom@mims-iris.waterloo.edu (Tom Haapanen) (04/25/91)

The following is a description of the RTF format (for Windows helpfiles)
that I found on Compuserve.  I have uploaded it, along with the Xantippe
hypertext authoring tool (which can export RTF) to cica as well.  Enjoy!

[ \tom haapanen --- university of waterloo --- tom@mims-iris.waterloo.edu ]
[ "i don't even know what street canada is on"               -- al capone ]

---------------------------------------------------------------------------

Specification for RTF
---------------------

RTF text encodes text formatting properties, document structures, and
document properties using the printable ASCII character set. Special
characters can also be encoded this way, although RTF does not prevent the
use of character codes outside the printable ASCII set.

The main encoding mechanism of "control words" provides a name space that
may be later used to expand the realm of RTF with macros, programming, etc.

1. BASIC INGREDIENTS

Control words are of the form:
	\lettersequence <delimiter>
where <delimiter> is:
	. a space: the space is part of the control word.
	. a digit or - means that a parameter follows. The following digit
		sequence is then delimited by a space or any other
		non-letter-or-digit as for control words.
	. any other non-letter-or-digit: terminates the control word, but is not
		a part of the control word.

By "letter", here we mean just the upper and lower case ASCII letters.

Control symbols consist of a \ character followed by a single nonletter.
They require no further delimiting.

	Notes: control symbols are compact, but there are not too many
	of them. The number of possible control words is not limited.
	The parameter is partially incorporated in control symbols, so that
	a program that does not understand a control symbol can recognize
	and ignore the corresponding parameter as well.
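
The delimiter rules above can be sketched in Python. This is only an
illustration of the rules as stated in this description, not code from any
real RTF reader; the name read_control and its return shape are made up
for the example.

```python
import string

LETTERS = set(string.ascii_letters)   # "letter" means ASCII A-Z and a-z only
DIGITS = set(string.digits)


def read_control(text, pos):
    """Parse the control word or symbol that starts at text[pos] == '\\'.

    Returns (name, parameter_or_None, position_after_the_control).
    """
    assert text[pos] == "\\"
    pos += 1
    if pos >= len(text):
        raise ValueError("dangling backslash")
    # Control symbol: backslash plus one non-letter, no delimiter needed.
    if text[pos] not in LETTERS:
        return text[pos], None, pos + 1
    # Control word: a run of ASCII letters.
    start = pos
    while pos < len(text) and text[pos] in LETTERS:
        pos += 1
    name = text[start:pos]
    # Optional parameter: an optional '-' followed by digits.
    param = None
    num_start = pos
    if pos < len(text) and text[pos] == "-":
        pos += 1
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    digits = text[num_start:pos]
    if digits and digits != "-":
        param = int(digits)
    else:
        pos = num_start               # a lone '-' is left unread
    # A space delimiter is part of the control word, so consume it.
    if pos < len(text) and text[pos] == " ":
        pos += 1
    return name, param, pos
```

For example, read_control(r"\fs24 text", 0) gives ("fs", 24, 6): the space
is swallowed and reading resumes at the "t".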

In addition to control words and control symbols, there are also the braces:
	{       group start, and
	}       group end.
The text grouping will be used for formatting and to delineate document
structure - such as the footnotes, headers, title, and so on.
The control words, control symbols, and braces constitute control information.
All other characters in RTF text constitute "plain text".

Since the characters \, {, and } have specific uses in RTF, the control
symbols \\, \{, and \} are provided to express the corresponding plain
characters.
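
A writer has to apply this rule to any plain text it emits. A minimal
Python sketch follows; the function name escape_plain is made up here, and
the handling of tabs and newlines is deliberately left out.

```python
def escape_plain(text):
    """Escape the three characters that RTF reserves for control use."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)     # \ becomes \\, { becomes \{, } becomes \}
        else:
            out.append(ch)
    return "".join(out)
```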


2. WHAT RTF TEXT MEANS (SEMANTICS)

The reader of an RTF stream will be concerned with:
	Separating control information from plain text.
	Acting on control information. This is designed to be
		a relatively simple process, as described below.
                Some control information just contributes special
                characters to the plain text stream.  Other information
		serves to change the "program state" which includes
		properties of the document as a whole and also a stack
		of "group states" that apply to parts.
                Note that the group state is saved by the { brace and is
		restored by the } brace. The current group state specifies:
		1. the "destination" or part of the document that the
			plain text is building up.
		2. the character formatting properties - such as bold or
			italic.
		3. the paragraph formatting properties - such as justified.
		4. the section formatting properties - such as number of
			columns.
	Collecting and properly disposing of the remaining "plain text"
		as directed by the current group state.

In practice the RTF reader will proceed as follows:
	0. read next char
	1. if ={
		stack current state. current state does not change.
		continue.
	2. if =}
		unstack current state from stack. this will change the
		state in general.
	3. if =\
		collect control word/control symbol and parameter, if any.
		look up word/symbol in symbol table (a constant table)
		and act according to the description there. The different
		actions are listed below. Parameter is left available
                for use by the action.  Leave read pointer before or after
                the delimiter, as appropriate.  After the action, continue.
	4. otherwise, write "plain text" character to current destination
		using current formatting properties.
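
The loop above might be sketched in Python as follows. It is a toy reader
under stated assumptions: the group state holds a single property (bold),
only \b, \plain and the three escape symbols are acted on, every other
control is ignored, and raw ASCII 9, 10 and 13 are not given special
treatment. The names read_rtf and CONTROL are invented for this example.

```python
import re

# A control word (letters, optional signed number, optional space delimiter)
# or a control symbol (backslash plus one non-letter).
CONTROL = re.compile(r"\\(?:([A-Za-z]+)(-?\d+)? ?|([^A-Za-z]))")


def read_rtf(text):
    """Return a list of (plain_text, bold) runs."""
    state = {"bold": False}          # current group state
    stack = []                       # saved group states
    runs = []

    def emit(ch):
        # Step 4: write plain text using the current formatting.
        if runs and runs[-1][1] == state["bold"]:
            runs[-1] = (runs[-1][0] + ch, state["bold"])
        else:
            runs.append((ch, state["bold"]))

    pos = 0
    while pos < len(text):
        ch = text[pos]                       # step 0: read next char
        if ch == "{":                        # step 1: stack a copy
            stack.append(dict(state))
            pos += 1
        elif ch == "}":                      # step 2: unstack
            state = stack.pop()
            pos += 1
        elif ch == "\\":                     # step 3: control word/symbol
            m = CONTROL.match(text, pos)
            if m is None:
                raise ValueError("dangling backslash at %d" % pos)
            word, _param, symbol = m.groups()
            if word == "b":
                state["bold"] = True
            elif word == "plain":
                state["bold"] = False
            elif symbol in ("\\", "{", "}"):
                emit(symbol)
            pos = m.end()                    # unknown controls are ignored
        else:
            emit(ch)
            pos += 1
    return runs
```

Because the state is copied on { and restored on }, the \b inside a group
stops applying as soon as the group closes.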

Given a symbol table entry, the possible actions are as follows:
	A. Change destination:
		change destination to the destination described in the entry.
		Most destination changes are legal only immediately after
		a {. Other restrictions may also apply (for example,
		footnotes may not be nested).
	B. Change formatting property:
		The symbol table entry will describe the property and
		whether the parameter is required.
	C. Special character:
		The symbol table entry will describe the character code.
		goto 4.
	D. End of paragraph
		This could be viewed as just a special character.
	E. End of section
		This could be viewed as just a special character.
	F. Ignore
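
One way to picture the symbol table is as a dictionary keyed by control
name. The Python sketch below is an illustration only: the table layout,
the state keys and the name apply_control are assumptions chosen for the
example, and just a handful of controls are listed.

```python
# name -> (kind, detail). For "property", detail is
# (state key, whether a parameter is required).
SYMBOLS = {
    "footnote": ("destination", "footnote"),
    "b":        ("property", ("bold", False)),
    "fs":       ("property", ("font_size", True)),
    "tab":      ("special", "\t"),           # \tab is the same as ASCII 9
    "par":      ("end_paragraph", None),
    "sect":     ("end_section", None),
}


def apply_control(state, name, param=None):
    """Apply one control to state; return plain text to write, if any."""
    kind, detail = SYMBOLS.get(name, ("ignore", None))
    if kind == "destination":                # action A
        state["destination"] = detail
    elif kind == "property":                 # action B
        key, needs_param = detail
        if needs_param and param is None:
            raise ValueError("\\%s requires a parameter" % name)
        state[key] = param if needs_param else True
    elif kind == "special":                  # action C: goes on to step 4
        return detail
    elif kind in ("end_paragraph", "end_section"):   # actions D and E
        state.setdefault("events", []).append(kind)
    return None                              # action F: ignore
```

Unknown names fall through to "ignore", which matches the rule that a
reader simply skips controls it does not recognize.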

3. SPECIAL CHARACTERS

The special characters are explained as they exist in Mac Word. Clearly,
other characters may be added for interchange with other programs. If
a character name is not recognized by a reader, according to the rules
described above, it will be simply ignored.

	\chpgn          current page number (as in headers)
	\chftn          auto numbered footnote reference
			(footnote to follow in a group)
	\chpict         placeholder character for picture
			(picture to follow in a group)
	\chdate         current date (as in headers)
	\chtime         current time (as in headers)
	\|              formula character
	\~              non-breaking space
	\-              non-required hyphen
	\_              non-breaking hyphen

	\page           required page break
	\line           required line break (no paragraph break)

	\par            end of paragraph.
	\sect           end of section and end of paragraph.
	\tab            same as ASCII 9

For simplicity of operation, the ASCII codes 9 and 10 will be accepted
as \tab and \par respectively. ASCII 13 will be ignored. The control
code \<10> will be ignored. It may be used to include "soft"
carriage returns that make the text easier to read but have no effect
on its interpretation.
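
These four rules could be applied as a pre-pass before the main reader. A
rough Python sketch, assuming the whole stream fits in one string; the
name normalize_controls is made up for the example.

```python
def normalize_controls(text):
    """Rewrite raw ASCII 9/10/13 and \\<10> per the rules above."""
    text = text.replace("\r", "")            # ASCII 13 is ignored
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt != "\n":                  # \<10> is dropped entirely
                out.append(ch + nxt)         # keep any other pair intact
            i += 2
        elif ch == "\t":
            out.append("\\tab ")             # ASCII 9 is accepted as \tab
            i += 1
        elif ch == "\n":
            out.append("\\par ")             # ASCII 10 is accepted as \par
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Scanning backslash pairs together keeps an escaped backslash followed by a
real newline (\\ then ASCII 10) from being mistaken for a soft return.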

4. DESTINATIONS

The change of destination will reset all properties to default.
Changes are legal only at the beginning of a group (by group here
we mean the text and controls enclosed in braces.)

	\rtf<param>
		The destination is the document. The parameter is the
		version number of the writer. This destination, preceded
		by {, marks the beginning of an RTF document, and the
		corresponding } marks the end.
		Legal only once after the initial {.
                Small scale interchange of RTF where other methods for
                marking the end of string are available, as in a string
                constant, need not include this identification but will
                start with this destination as the default.
	\pict
		The destination is a picture. The group must immediately
		follow a \chpict character. The plain text describes
		the picture as a hex dump (string of characters 0,1,...
		9, a, ..., e, f.)
		(Formatting properties to determine data interpretation,
		size)
	\footnote
		The destination is a footnote text. The group must
		immediately follow the footnote reference character(s).
	\header
		The destination is the header text for the current section.
		The group must precede the first plain text character
		in the section.
	\headerl
		Same as above, but header for left-hand pages.
	\headerr
		Same as above, but header for right-hand pages.
	\headerf
		Same as above, but header for first page.
	\footer
		Same as above, but footer.
	\footerl
		Same as above, but footer for left-hand pages.
	\footerr
		Same as above, but footer for right-hand pages.
	\footerf
		Same as above, but footer for first page.
	\ftnsep
		Same as above, but text is footnote separator
	\ftnsepc
		Same as above, but text is separator for continued footnotes.
	\ftncn
		Same as above, but text is continued footnote notice.
	\info
		text is information block for the document. Parts of the
		text are further classified by "properties" of the text
		that are listed below - such as "title". These are not
		formatting properties, but a device to delimit and identify
		parts of the info from the text in the group.
	\stylesheet
		text is the style sheet for the document.
		More precisely, the text between semicolons is taken to be
		style names, which will be defined to stand for the
		formatting properties which are in effect.
	\fonttbl
		font table. See below.
	\colortbl
		color table. See below.
	\comment
		text will be ignored.

5. DOCUMENT FORMATTING PROPERTIES

(000 stands for a number which may be signed)

	\paperw000      paper width in twips            12240
	\paperh000      paper height                    15840
	\margl000       left margin                     1800
	\margr000       right margin                    1800
	\margt000       top margin                      1440
	\margb000       bottom margin                   1440
	\facingp        facing pages
	\gutter000      gutter width
	\deftab000      default tab width               720
	\widowctrl      enable widow control

	\endnotes       footnotes at end of section
	\ftnbj          footnotes at bottom of page     default
	\ftntj          footnotes beneath text (top just)

	\ftnstart000    starting footnote number        1
	\ftnrestart     restart footnote numbers each page
	\pgnstart000    starting page number            1
	\linestart000   starting line number            1
	\landscape      printed in landscape format

(the "next file" property will be encoded in the info text )
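
A twip is 1/20 of a point, or 1/1440 of an inch, so the defaults above
work out to 8.5 x 11 inch paper with 1.25 inch side margins and 1 inch top
and bottom margins. A small Python sketch of the arithmetic; the function
names are made up for the example.

```python
TWIPS_PER_INCH = 1440                # 72 points per inch, 20 twips per point


def inches_to_twips(inches):
    return round(inches * TWIPS_PER_INCH)


def twips_to_inches(twips):
    return twips / TWIPS_PER_INCH


def document_properties(width_in=8.5, height_in=11.0,
                        side_margin_in=1.25, top_bottom_margin_in=1.0):
    """Build the page-size and margin controls from inch measurements."""
    return ("\\paperw%d\\paperh%d\\margl%d\\margr%d\\margt%d\\margb%d" % (
        inches_to_twips(width_in),
        inches_to_twips(height_in),
        inches_to_twips(side_margin_in),
        inches_to_twips(side_margin_in),
        inches_to_twips(top_bottom_margin_in),
        inches_to_twips(top_bottom_margin_in),
    ))
```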


6. SECTION FORMATTING PROPERTIES
	\sectd          reset to default section properties

	\nobreak        break code
	\colbreak       break code                      default
	\pagebreak      break code
	\evenbreak      break code
	\oddbreak       break code
	\pgnrestart     restart page numbers at 1

	\pgndec         page number format decimal      default
	\pgnucrm        page number format uc roman
	\pgnlcrm        page number format lc roman
	\pgnucltr       page number format uc letter
	\pgnlcltr       page number format lc letter

	\pgnx000        auto page number x pos          720
	\pgny000        auto page number y pos          720
	\linemod000     line number modulus
	\linex000       line number - text distance     360

	\linerestart    line number restart at 1        default
	\lineppage      line number restart on each page
	\linecont       line number continued from prev section

	\headery000     header y position from top of page      720
	\footery000     footer y position from bottom of page   720

	\cols000        number of columns               1
	\colsx000       space between columns           720
	\endnhere       include endnotes in this section
	\titlepg        title page is special


7. PARAGRAPH FORMATTING PROPERTIES

	\pard           reset to default para properties.
	\s000           style

	\ql             quad left                       default
	\qr             quad right
	\qj             justified
	\qc             centered

	\fi000          first line indent
	\li000          left indent
	\ri000          right indent
	\sb000          space before
	\sa000          space after
	\sl000          space between lines

	\keep           keep
	\keepn          keep with next para
	\sbys           side by side
	\pagebb         page break before
	\noline         no line numbering

	\brdrt          border top
	\brdrb          border bottom
	\brdrl          border left
	\brdrr          border right
	\box            border all around

	\brdrs          single thickness
	\brdrth         thick
	\brdrsh         shadow
	\brdrdb         double

	\tx000          tab position
	\tqr            right flush tab (these apply to last specified pos)
	\tqc            centered tab
	\tqdec          decimal aligned tab
	\tldot          leader dots
	\tlhyph         leader hyphens
	\tlul           leader underscore
	\tlth           leader thick line


8. CHARACTER FORMATTING PROPERTIES

	\plain          reset to default text properties.

	\b              bold
	\i              italic
	\strike         strikethrough
	\outl           outline
	\shad           shadow
	\scaps          small caps
	\caps           all caps
	\v              invisible text
	\f000           font number n
	\fs000          font size in half points        24

	\ul             underline
	\ulw            word underline
	\uld            dotted underline
	\uldb           double underline

	\up000          superscript in half points
	\dn000          subscript in half points
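
Since character properties live in the group state, a writer can scope
them by wrapping a run of text in its own group. A Python sketch under
that assumption; char_run and its keyword arguments are invented for the
example, and only \b, \i and \fs are covered. Note that \fs takes half
points, so 12 point text is \fs24.

```python
def char_run(text, bold=False, italic=False, size_pt=None):
    """Wrap text in a group carrying the requested character properties."""
    escaped = "".join("\\" + c if c in "\\{}" else c for c in text)
    controls = []
    if bold:
        controls.append("\\b")
    if italic:
        controls.append("\\i")
    if size_pt is not None:
        controls.append("\\fs%d" % round(size_pt * 2))   # half points
    if not controls:
        return escaped
    # The space ends the last control word and is not part of the text.
    return "{" + "".join(controls) + " " + escaped + "}"
```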

9. INFO GROUP

The plain text in the group is used to specify the various fields of
the information block. The current field may be thought of as a
particular setting of the "sub-destination" property of the text.
	\title          following plain text is the title
	\subject        following text is the subject
	\operator
	\author
	\keywords
	\doccomm        comments (not to be confused with \comment )
	\version
	\nextfile       following text is name of "next" file

The other properties assign their parameters directly to the info block.
	\verno000       internal version number
	\creatim        creation time follows

	\yr000          year to be assigned to previously specified time field
	\mo000
	\dy000
	\hr000
	\min000
	\sec000

	\revtim         revision time follows
	\printtim       print time follows
	\buptim         backup time follows

	\edmins000      editing minutes
	\nofpages000
	\nofwords000
	\nofchars000
	\id000          internal id number
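
As an illustration, an info group with a title, an author and a creation
time might be assembled like this in Python. Putting each field in its own
nested group, so that the field ends at the closing brace, is an
assumption based on how RTF files are commonly laid out; the description
above does not spell it out. The name info_group is made up.

```python
from datetime import datetime


def info_group(title, author, created):
    """Build an {\\info ...} group from a title, author and datetime."""
    def esc(s):
        return "".join("\\" + c if c in "\\{}" else c for c in s)

    # \yr, \mo, ... assign to the time field named just before them.
    stamp = "\\yr%d\\mo%d\\dy%d\\hr%d\\min%d\\sec%d" % (
        created.year, created.month, created.day,
        created.hour, created.minute, created.second)
    return "{\\info{\\title %s}{\\author %s}{\\creatim%s}}" % (
        esc(title), esc(author), stamp)
```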

_

rhys@cs.uq.oz.au (Rhys Weatherley) (04/26/91)

In <1991Apr25.123140.17839@watserv1.waterloo.edu> tom@mims-iris.waterloo.edu (Tom Haapanen) writes:

>The following is a description of the RTF format (for Windows helpfiles)
>that I found on Compuserve.  I have uploaded it, along with the Xantippe
>hypertext authoring tool (which can export RTF) to cica as well.  Enjoy!
>[...]

Thanks for posting that Tom.  Now a little announcement.  Within the next 48
hours I will have the first version of a (free) RTF converter ready, called
the "Help Writer's Assistant" (HWA).  It takes plain ASCII files as input
that have special formatting commands in them, that look a little like TeX.
It writes out RTF suitable for use with the Help Compiler.  This is a "CFBT"
(Call For Beta-Testers :-) .  If you are interested in testing this tool, then
please contact me (soon), and I can send you a beta copy.  I hope to release
a full-blown version to the net sometime next week, either as a posting to
comp.windows.ms.programmer or by sending it to SIMTEL20 (or both).  The input
files look a little like the following:

	\topic{File Menu Commands}{filemenu}
	\keyword{File}
	\browse{mainmenu:003}

	The following commands appear on the file menu:

	\xref{New}{filemenunew}

	\xref{Open}{filemenuopen}

	\topic{Edit Menu Commands}{editmenu}
	\keyword{Edit}
	\browse{mainmenu:002}

	The following commands appear on the edit menu:

	\xref{Cut}{editmenucut}

	\xref{Paste}{editmenupaste}

	\topic{Open a file}{filemenuopen}
	\keyword{Open}
	\browse{filemenu:010}

	....

I think this format is infinitely more understandable and maintainable than
RTF itself, and also infinitely cheaper than buying Word.  You'll be able to
distribute any RTF files generated by my tool as far and wide as you please
with no mention needed of me (the "no-nonsense license" is very "forgiving").
You can also distribute my tool with your products (as is) for compiling
source code, etc if you wish.

At the moment, my tool cannot support all of the things that a full-blown
word-processor like Word can, but what I do have (and will have by the time
I send out beta copies) is a tool useful enough to generate most sorts of
help files.  If you want pretty pictures (other than simple bitmaps) and
fancy formats, then get a word-processor.  But usually having fancy help
data detracts from its usefulness anyway.

Cheers,

Rhys.

+=====================+==================================+
||  Rhys Weatherley   |  The University of Queensland,  ||
||  rhys@cs.uq.oz.au  |  Australia.  G'day!!            ||
+=====================+==================================+