tom@mims-iris.waterloo.edu (Tom Haapanen) (04/25/91)
The following is a description of the RTF format (for Windows helpfiles)
that I found on Compuserve. I have uploaded it, along with the Xantippe
hypertext authoring tool (which can export RTF) to cica as well. Enjoy!
[ \tom haapanen --- university of waterloo --- tom@mims-iris.waterloo.edu ]
[ "i don't even know what street canada is on" -- al capone ]
---------------------------------------------------------------------------
Specification for RTF
---------------------
RTF text is a form of encoding of various text formatting properties,
document structures, and document properties,
using the printable ASCII character set. Special characters can also be
encoded this way, although RTF does not prevent the use of character
codes outside the printable ASCII set.
The main encoding mechanism of "control words" provides a name space that
may be later used to expand the realm of RTF with macros, programming, etc.
1. BASIC INGREDIENTS
Control words are of the form:
\lettersequence <delimiter>
where <delimiter> is:
. a space: the space is part of the control word.
. a digit or - means that a parameter follows. The following digit
sequence is then delimited by a space or any other
non-letter-or-digit as for control words.
. any other non-letter-or-digit: terminates the control word, but is not
a part of the control word.
By "letter", here we mean just the upper and lower case ASCII letters.
Control symbols consist of a \ character followed by a single nonletter.
They require no further delimiting.
Notes: control symbols are compact, but there are not too many
of them. The number of possible control words is not limited.
The parameter is partially incorporated in control symbols, so that
a program that does not understand a control symbol can recognize
and ignore the corresponding parameter as well.
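The delimiter rules above can be sketched in Python. This is only an
illustration under stated assumptions, not a reference implementation: the
function name read_control and the (name, param, next_pos) tuple it returns
are made up for the example.

```python
import string

LETTERS = set(string.ascii_letters)  # "letter" = ASCII upper and lower case only
DIGITS = set(string.digits)


def read_control(text, pos):
    """Parse the control word or symbol whose backslash sits at text[pos].

    Returns (name, param, next_pos); param is None when none was given.
    """
    if text[pos] != "\\":
        raise ValueError("expected a backslash at position %d" % pos)
    pos += 1
    # Control symbol: a backslash plus one non-letter, with no delimiter.
    if pos >= len(text) or text[pos] not in LETTERS:
        return text[pos:pos + 1], None, pos + 1
    start = pos
    while pos < len(text) and text[pos] in LETTERS:
        pos += 1
    name = text[start:pos]
    # A digit or '-' right after the letters announces a parameter.
    param = None
    start = pos
    if text[pos:pos + 1] == "-" and text[pos + 1:pos + 2] in DIGITS:
        pos += 1
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    if pos > start:
        param = int(text[start:pos])
    # A space delimiter belongs to the control word and is consumed;
    # any other delimiter is left for the caller to read next.
    if text[pos:pos + 1] == " ":
        pos += 1
    return name, param, pos
```

For example, read_control(r"\fs24\i", 0) returns ("fs", 24, 5): the
following backslash ends the parameter but is not consumed.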
In addition to control words and control symbols, there are also the braces:
{ group start, and
} group end.
The text grouping will be used for formatting and to delineate document
structure - such as the footnotes, headers, title, and so on.
The control words, control symbols, and braces constitute control information.
All other characters in RTF text constitute "plain text".
Since the characters \, {, and } have specific uses in RTF, the control
symbols \\, \{, and \} are provided to express the corresponding plain
characters.
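A writer therefore has to escape these three characters whenever they occur
in plain text. A minimal Python sketch (the helper name escape_plain is an
assumption for illustration):

```python
def escape_plain(text):
    """Prefix \\, { and } with a backslash so they survive as plain text."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # becomes the control symbol \\, \{ or \}
        else:
            out.append(ch)
    return "".join(out)
```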
2. WHAT RTF TEXT MEANS (SEMANTICS)
The reader of an RTF stream will be concerned with:
Separating control information from plain text.
Acting on control information. This is designed to be
a relatively simple process, as described below.
Some control information just contributes special
characters to the plain text stream. Other information
serves to change the "program state" which includes
properties of the document as a whole and also a stack
of "group states" that apply to parts.
Note that the group state is saved by the { brace and is
restored by the } brace. The current group state specifies:
1. the "destination" or part of the document that the
plain text is building up.
2. the character formatting properties - such as bold or
italic.
3. the paragraph formatting properties - such as justified.
4. the section formatting properties - such as number of
columns.
Collecting and properly disposing of the remaining "plain text"
as directed by the current group state.
In practice the RTF reader will proceed as follows:
0. read next char
1. if ={
stack current state. current state does not change.
continue.
2. if =}
unstack current state from stack. this will change the
state in general.
3. if =\
collect control word/control symbol and parameter, if any.
look up word/symbol in symbol table (a constant table)
and act according to the description there. The different
actions are listed below. Parameter is left available
for use by the action. Leave read pointer before or after
the delimiter, as appropriate. After the action, continue.
4. otherwise, write "plain text" character to current destination
using current formatting properties.
Given a symbol table entry, the possible actions are as follows:
A. Change destination:
change destination to the destination described in the entry.
Most destination changes are legal only immediately after a {. Other restrictions
may also apply (for example, footnotes may not be nested.)
B. Change formatting property:
The symbol table entry will describe the property and
whether the parameter is required.
C. Special character:
The symbol table entry will describe the character code.
goto 4.
D. End of paragraph
This could be viewed as just a special character.
E. End of section
This could be viewed as just a special character.
F. Ignore
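The reader loop and the actions A to F can be put together in a short Python
sketch. Everything named here is an assumption made for illustration: the
tiny SYMBOLS table, the emit and read_rtf helpers, and the choice to return
runs of (destination, properties, text). Parameters are parsed but not used,
and an unbalanced } would raise an error.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

# Hypothetical, heavily trimmed symbol table: name -> (action, detail).
SYMBOLS = {
    "rtf": ("dest", "document"),
    "footnote": ("dest", "footnote"),
    "b": ("prop", "bold"),
    "i": ("prop", "italic"),
    "tab": ("char", "\t"),
    "par": ("char", "\n"),
    "\\": ("char", "\\"),
    "{": ("char", "{"),
    "}": ("char", "}"),
}


def emit(runs, state, ch):
    """Step 4: write one character to the current destination."""
    key = (state["dest"], tuple(sorted(state["props"])))
    if runs and runs[-1][:2] == key:
        runs[-1] = key + (runs[-1][2] + ch,)
    else:
        runs.append(key + (ch,))


def read_rtf(text):
    """Return a list of (destination, properties, text) runs."""
    state = {"dest": "document", "props": frozenset()}
    stack, runs, pos = [], [], 0
    while pos < len(text):
        ch = text[pos]
        pos += 1
        if ch == "{":                      # step 1: save the group state
            stack.append(dict(state))
        elif ch == "}":                    # step 2: restore it
            state = stack.pop()
        elif ch == "\\":                   # step 3: control word or symbol
            start = pos
            while pos < len(text) and text[pos] in LETTERS:
                pos += 1
            if pos == start:               # control symbol: one non-letter
                pos += 1
                name = text[start:pos]
            else:
                name = text[start:pos]
                if text[pos:pos + 1] == "-" and text[pos + 1:pos + 2] in DIGITS:
                    pos += 1
                while pos < len(text) and text[pos] in DIGITS:
                    pos += 1               # parameter is skipped, not used, here
                if text[pos:pos + 1] == " ":
                    pos += 1               # a space delimiter is consumed
            action, detail = SYMBOLS.get(name, ("ignore", None))
            if action == "dest":           # A: change destination, reset props
                state = {"dest": detail, "props": frozenset()}
            elif action == "prop":         # B: change formatting property
                state["props"] = state["props"] | {detail}
            elif action == "char":         # C, D, E: special character
                emit(runs, state, detail)
            # F: anything else is ignored
        else:
            emit(runs, state, ch)
    return runs
```

Because the group state is copied on { and restored on }, the bold property
set inside {\b ...} disappears again at the closing brace without any
explicit "bold off" action.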
3. SPECIAL CHARACTERS
The special characters are explained as they exist in Mac Word. Clearly,
other characters may be added for interchange with other programs. If
a character name is not recognized by a reader, according to the rules
described above, it will be simply ignored.
\chpgn current page number (as in headers)
\chftn auto numbered footnote reference
(footnote to follow in a group)
\chpict placeholder character for picture
(picture to follow in a group)
\chdate current date (as in headers)
\chtime current time (as in headers)
\| formula character
\~ non-breaking space
\- non-required hyphen
\_ non-breaking hyphen
\page required page break
\line required line break (no paragraph break)
\par end of paragraph.
\sect end of section and end of paragraph.
\tab same as ASCII 9
For simplicity of operation, the ASCII codes 9 and 10 will be accepted
as \tab and \par respectively. ASCII 13 will be ignored. The control
code \<10> will be ignored. It may be used to include "soft"
carriage returns for easier readability but which will have no effect
on the interpretation.
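One way to apply these raw-character rules is a pre-pass that rewrites them
into explicit control words before the main reader runs. The sketch below is
an assumption about how a reader might be organized, not something the
specification requires; the name canonicalize is made up.

```python
def canonicalize(text):
    """Rewrite raw ASCII 9/10/13 and \\<10> into their explicit RTF meaning."""
    out = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch == "\\":
            nxt = text[pos + 1:pos + 2]
            if nxt != "\n":
                out.append(ch + nxt)   # keep the backslash and what follows
            # \<10> is a soft line break with no effect, so it is dropped
            pos += 2
            continue
        if ch == "\t":
            out.append("\\tab ")       # ASCII 9 is accepted as \tab
        elif ch == "\n":
            out.append("\\par ")       # ASCII 10 is accepted as \par
        elif ch != "\r":               # ASCII 13 is ignored
            out.append(ch)
        pos += 1
    return "".join(out)
```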
4. DESTINATIONS
The change of destination will reset all properties to default.
Changes are legal only at the beginning of a group (by group here
we mean the text and controls enclosed in braces.)
\rtf<param>
The destination is the document. The parameter is the
version number of the writer. This destination, preceded
by {, marks the beginning of an RTF document, and the
corresponding } marks the end.
Legal only once after the initial {.
Small scale interchange of RTF where other methods for
marking the end of string are available, as in a string
constant, need not include this identification but will
start with this destination as the default.
\pict
The destination is a picture. The group must immediately
follow a \chpict character. The plain text describes
the picture as a hex dump (string of characters 0,1,...
9, a, ..., e, f.)
(Formatting properties to determine data interpretation,
size)
\footnote
The destination is a footnote text. The group must
immediately follow the footnote reference character(s).
\header
The destination is the header text for the current section.
The group must precede the first plain text character
in the section.
\headerl
Same as above, but header for left-hand pages.
\headerr
Same as above, but header for right-hand pages.
\headerf
Same as above, but header for first page.
\footer
Same as above, but footer.
\footerl
Same as above, but footer for left-hand pages.
\footerr
Same as above, but footer for right-hand pages.
\footerf
Same as above, but footer for first page.
\ftnsep
Same as above, but text is footnote separator
\ftnsepc
Same as above, but text is separator for continued footnotes.
\ftncn
Same as above, but text is continued footnote notice.
\info
text is information block for the document. Parts of the
text are further classified by "properties" of the text
that are listed below - such as "title". These are not
formatting properties, but a device to delimit and identify
parts of the info from the text in the group.
\stylesheet
text is the style sheet for the document.
More precisely, each piece of text between semicolons is
taken to be a style name, which is defined to stand for
the formatting properties in effect at that point.
\fonttbl
font table. See below.
\colortbl
color table. See below.
\comment
text will be ignored.
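As an illustration of the \pict destination described above, the hex-dump
plain text can be turned back into bytes in a few lines of Python. The helper
name decode_pict_hex is hypothetical, and dropping whitespace inside the dump
is an assumption (the text above only says the dump is a string of hex
digits).

```python
def decode_pict_hex(plain_text):
    """Turn the hex-dump plain text of a \\pict group into raw bytes."""
    # Assumption: line breaks or spaces inside the dump carry no meaning.
    digits = "".join(ch for ch in plain_text if not ch.isspace())
    return bytes.fromhex(digits)  # raises ValueError on non-hex input
```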
5. DOCUMENT FORMATTING PROPERTIES
(000 stands for a number which may be signed)
\paperw000    paper width in twips                    12240
\paperh000    paper height                            15840
\margl000     left margin                             1800
\margr000     right margin                            1800
\margt000     top margin                              1440
\margb000     bottom margin                           1440
\facingp      facing pages
\gutter000    gutter width
\deftab000    default tab width                       720
\widowctrl    enable widow control
\endnotes     footnotes at end of section
\ftnbj        footnotes at bottom of page             default
\ftntj        footnotes beneath text (top just)
\ftnstart000  starting footnote number                1
\ftnrestart   restart footnote numbers each page
\pgnstart000  starting page number                    1
\linestart000 starting line number                    1
\landscape    printed in landscape format
(the "next file" property will be encoded in the info text )
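The document measurements above are in twips. A twip is one twentieth of a
point, so there are 1440 twips to the inch. The following Python sketch
converts the listed defaults; the names DEFAULTS, twips_to_inches and
text_area are assumptions for illustration, and the gutter is left out for
simplicity.

```python
TWIPS_PER_INCH = 1440  # 20 twips per point, 72 points per inch

# Default values as listed for the document formatting properties.
DEFAULTS = {"paperw": 12240, "paperh": 15840,
            "margl": 1800, "margr": 1800, "margt": 1440, "margb": 1440}


def twips_to_inches(twips):
    return twips / TWIPS_PER_INCH


def text_area(props=None):
    """Return (width, height) in twips of the area inside the margins."""
    p = dict(DEFAULTS)
    p.update(props or {})
    return (p["paperw"] - p["margl"] - p["margr"],
            p["paperh"] - p["margt"] - p["margb"])
```

With the defaults this gives an 8.5 by 11 inch page with a 6 by 9 inch text
area.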
6. SECTION FORMATTING PROPERTIES
\sectd        reset to default section properties
\nobreak      break code
\colbreak     break code                              default
\pagebreak    break code
\evenbreak    break code
\oddbreak     break code
\pgnrestart   restart page numbers at 1
\pgndec       page number format decimal              default
\pgnucrm      page number format uc roman
\pgnlcrm      page number format lc roman
\pgnucltr     page number format uc letter
\pgnlcltr     page number format lc letter
\pgnx000      auto page number x pos                  720
\pgny000      auto page number y pos                  720
\linemod000   line number modulus
\linex000     line number - text distance             360
\linerestart  line number restart at 1                default
\lineppage    line number restart on each page
\linecont     line number continued from prev section
\headery000   header y position from top of page      720
\footery000   footer y position from bottom of page   720
\cols000      number of columns                       1
\colsx000     space between columns                   720
\endnhere     include endnotes in this section
\titlepg      title page is special
7. PARAGRAPH FORMATTING PROPERTIES
\pard reset to default para properties.
\s000 style
\ql quad left default
\qr quad right
\qj justified
\qc centered
\fi000 first line indent
\li000 left indent
\ri000 right indent
\sb000 space before
\sa000 space after
\sl000 space between lines
\keep keep
\keepn keep with next para
\sbys side by side
\pagebb page break before
\noline no line numbering
\brdrt border top
\brdrb border bottom
\brdrl border left
\brdrr border right
\box border all around
\brdrs single thickness
\brdrth thick
\brdrsh shadow
\brdrdb double
\tx000 tab position
\tqr right flush tab (these apply to last specified pos)
\tqc centered tab
\tqdec decimal aligned tab
\tldot leader dots
\tlhyph leader hyphens
\tlul leader underscore
\tlth leader thick line
8. CHARACTER FORMATTING PROPERTIES
\plain reset to default text properties.
\b bold
\i italic
\strike strikethrough
\outl outline
\shad shadow
\scaps small caps
\caps all caps
\v invisible text
\f000 font number n
\fs000 font size in half points 24
\ul underline
\ulw word underline
\uld dotted underline
\uldb double underline
\up000 superscript in half points
\dn000 subscript in half points
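To show how the character properties combine with grouping, here is a small
Python sketch of a writer for one formatted run. The function rtf_run and its
keyword arguments are hypothetical; note that \fs takes half points, so 12
point text is written as \fs24.

```python
def rtf_run(text, bold=False, italic=False, size_pt=None):
    """Wrap text in a group carrying the requested character properties."""
    controls = []
    if bold:
        controls.append("\\b")
    if italic:
        controls.append("\\i")
    if size_pt is not None:
        controls.append("\\fs%d" % round(size_pt * 2))  # half points
    # \, { and } must be written as control symbols inside plain text.
    escaped = "".join("\\" + c if c in "\\{}" else c for c in text)
    if not controls:
        return escaped
    # The single space ends the last control word and is not plain text.
    return "{" + "".join(controls) + " " + escaped + "}"
```

Putting the run in its own group means the properties end at the closing
brace and do not leak into the text that follows.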
9. INFO GROUP
The plain text in the group is used to specify the various fields of
the information block. The current field may be thought of as a
particular setting of the "sub-destination" property of the text.
\title following plain text is the title
\subject following text is the subject
\operator
\author
\keywords
\doccomm comments (not to be confused with \comment )
\version
\nextfile following text is name of "next" file
The other properties assign their parameters directly to the info block.
\verno000 internal version number
\creatim creation time follows
\yr000 year to be assigned to previously specified time field
\mo000
\dy000
\hr000
\min000
\sec000
\revtim revision time follows
\printtim print time follows
\buptim backup time follows
\edmins000 editing minutes
\nofpages000
\nofwords000
\nofchars000
\id000 internal id number
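The time fields can be generated mechanically from a timestamp. The Python
sketch below follows the ordering given above (a time control such as
\creatim, then \yr, \mo, \dy, \hr, \min, \sec). The helper name rtf_time is
an assumption, and whether a writer should wrap the result in its own group
is not stated in this description, so no braces are emitted.

```python
from datetime import datetime


def rtf_time(control, when):
    """Emit a time field such as \\creatim followed by its component words."""
    return "\\%s\\yr%d\\mo%d\\dy%d\\hr%d\\min%d\\sec%d" % (
        control, when.year, when.month, when.day,
        when.hour, when.minute, when.second)
```

The caller still has to follow the result with a delimiter (a space or
another control) before any plain text.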
rhys@cs.uq.oz.au (Rhys Weatherley) (04/26/91)
In <1991Apr25.123140.17839@watserv1.waterloo.edu> tom@mims-iris.waterloo.edu (Tom Haapanen) writes:
>The following is a description of the RTF format (for Windows helpfiles)
>that I found on Compuserve. I have uploaded it, along with the Xantippe
>hypertext authoring tool (which can export RTF) to cica as well. Enjoy!
>[...]
Thanks for posting that Tom.
Now a little announcement. Within the next 48 hours I will have the first
version of a (free) RTF converter ready, called the "Help Writer's
Assistant" (HWA). It takes plain ASCII files as input that have special
formatting commands in them, that look a little like TeX. It writes out
RTF suitable for use with the Help Compiler.
This is a "CFBT" (Call For Beta-Testers :-) . If you are interested in
testing this tool, then please contact me (soon), and I can send you a
beta copy. I hope to release a full-blown version to the net sometime
next week, either as a posting to comp.windows.ms.programmer or by
sending it to SIMTEL20 (or both).
The input files look a little like the following:
\topic{File Menu Commands}{filemenu}
\keyword{File}
\browse{mainmenu:003}
The following commands appear on the file menu:
\xref{New}{filemenunew}
\xref{Open}{filemenuopen}
\topic{Edit Menu Commands}{editmenu}
\keyword{Edit}
\browse{mainmenu:002}
The following commands appear on the edit menu:
\xref{Cut}{editmenucut}
\xref{Paste}{editmenupaste}
\topic{Open a file}{filemenuopen}
\keyword{Open}
\browse{filemenu:010}
....
I think this format is infinitely more understandable and maintainable
than RTF itself, and also infinitely cheaper than buying Word. You'll be
able to distribute any RTF files generated by my tool as far and wide as
you please with no mention needed of me (the "no-nonsense license" is
very "forgiving"). You can also distribute my tool with your products
(as is) for compiling source code, etc if you wish.
At the moment, my tool cannot support all of the things that a full-blown
word-processor like Word can, but what I do have (and will have by the
time I send out beta copies) is a tool useful enough to generate most
sorts of help files. If you want pretty pictures (other than simple
bitmaps) and fancy formats, then get a word-processor. But usually having
fancy help data detracts from its usefulness anyway.
Cheers,
Rhys.
+=====================+==================================+
||  Rhys Weatherley   |  The University of Queensland,  ||
||  rhys@cs.uq.oz.au  |  Australia.  G'day!!            ||
+=====================+==================================+