[net.text] Structured Text Processing: First of a Series on TIDE

perlman@wivax.UUCP (Gary Perlman) (10/17/84)

I expect this will be the first of a series of notes,
so you might want to stay tuned or contact me for more detail.

I am beginning a project that should result in a public domain
text processing environment for reading, writing, analyzing,
and printing structured text.  The project is called TIDE,
and this note will detail its historical motivation and the
etymology of its name.

  For the past several years, I have used the nroff text formatter
as my primary language for word processing.  With nroff, I have
used about a half dozen macro packages, "high level" languages
defined on top of nroff.  Macro packages are used to define
document parts like sections, paragraphs, lists, and so on.
Nroff doesn't know anything about document parts, only formatting,
but macros make similar parts look the same, so the desired effect
is obtained when the document is printed.
  Nroff macros do not help much with the writing process because
the syntactic sugar sprinkled all over the source file for a
document makes it hard to read.  The vi text editor has some
commands built in for dealing with sections and paragraphs,
and some places (like TRW) have modified vi to be a runtime
formatter of macros.  But for the most part, nroff makes writing
a source document a messy task.
  One topic of interest to me is text analysis like that done
by the Writer's Workbench.  Nroff source text interspersed with
the real text makes parsing sentences more difficult than it should be,
and it gets in the way of text analysis of high level units like sections.
  Still, I liked nroff and the macro packages because they were a huge
improvement over a plain text editor and formatting by hand.

I looked over these macro packages, and of course, there was
an awful lot of overlap BETWEEN packages: all had sections,
paragraphs, and many had lists.  What I found interesting
is that there was a lot of overlap WITHIN packages: sections
had headings, lists had headers, tables had titles, paragraphs
sometimes had labels.  Just about every unit of text has a TITLE
that gives a little information about the unit, and then there
is some INTRODUCTORY information: sections have topic paragraphs,
lists have a preamble (like the one before this list), tables have captions,
paragraphs have topic sentences.  Similarly, many units have
ENDING material after the DETAIL.  The ending is used to conclude,
to summarize, or to serve as a transition device.
  In order, Title, Introduction, Detail, Ending, spells TIDE.
One goal of the TIDE project is to aid in technical writing
with TIDE data structures.  Since its conception, the term
TIDE has also come to refer to a Technical Information Documentation
Environment.
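
  As a rough illustration only (the actual TIDE block structure is
left for a later note), the Title, Introduction, Detail, Ending
ordering could be sketched in C along the following lines.  Every
name here (tide_block, tide_new, tide_add, tide_count, tide_print,
tide_free) is hypothetical and is not part of the TIDE design; the
sketch also assumes that the detail of a block may be plain text,
nested blocks, or both.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch, not the TIDE definition: one unit of text
 * (section, list, table, paragraph) stored in T-I-D-E order.
 * Strings are borrowed from the caller, not copied. */
struct tide_block {
    const char *title;          /* T: short label, may be NULL */
    const char *intro;          /* I: topic paragraph, preamble, caption */
    const char *detail;         /* D: body text, may be NULL */
    struct tide_block **kids;   /* D: nested blocks, may be empty */
    size_t nkids;
    const char *ending;         /* E: conclusion, summary, transition */
};

/* Allocate a block with no nested blocks; returns NULL on failure. */
struct tide_block *tide_new(const char *title, const char *intro,
                            const char *detail, const char *ending)
{
    struct tide_block *b = calloc(1, sizeof *b);
    if (b == NULL)
        return NULL;
    b->title = title;
    b->intro = intro;
    b->detail = detail;
    b->ending = ending;
    return b;
}

/* Append kid to the detail of parent; returns 0 on success, -1 on failure. */
int tide_add(struct tide_block *parent, struct tide_block *kid)
{
    struct tide_block **grown;

    assert(parent != NULL && kid != NULL);
    grown = realloc(parent->kids, (parent->nkids + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    grown[parent->nkids++] = kid;
    parent->kids = grown;
    return 0;
}

/* Count this block plus all blocks nested beneath it. */
size_t tide_count(const struct tide_block *b)
{
    size_t i, n = 1;

    for (i = 0; i < b->nkids; i++)
        n += tide_count(b->kids[i]);
    return n;
}

/* Print the parts that are present, indenting nested blocks. */
void tide_print(const struct tide_block *b, FILE *out, int depth)
{
    size_t i;
    int pad = depth * 2;

    if (b->title)
        fprintf(out, "%*sT: %s\n", pad, "", b->title);
    if (b->intro)
        fprintf(out, "%*sI: %s\n", pad, "", b->intro);
    if (b->detail)
        fprintf(out, "%*sD: %s\n", pad, "", b->detail);
    for (i = 0; i < b->nkids; i++)
        tide_print(b->kids[i], out, depth + 1);
    if (b->ending)
        fprintf(out, "%*sE: %s\n", pad, "", b->ending);
}

/* Free a block and everything nested beneath it. */
void tide_free(struct tide_block *b)
{
    size_t i;

    if (b == NULL)
        return;
    for (i = 0; i < b->nkids; i++)
        tide_free(b->kids[i]);
    free(b->kids);
    free(b);
}
```

  Under these assumptions, a section would be built with tide_new,
its paragraphs attached with tide_add, and the whole tree written
with tide_print(section, stdout, 0).  Keeping the four parts as
separate fields is what would let an analysis tool look at, say,
only the introductions, without parsing formatter commands.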

In future notes, I will describe the planned design of TIDE.
First, the TIDE block data structure will be formally defined;
there are two versions: the human readable file storage format,
and the C language data structure.  Second, I will describe
the C function programming primitives for reading, modifying,
and printing blocks of text.  Third, I will describe some
prototype software with a user interface to allow user level
interaction with structured text.  Finally, I will describe
some new forms of interaction proposed for advanced user interfaces;
direct manipulation of graphical icons may be one.

All this stuff will be available to interested parties.
In particular, I will be happy to share the C libraries
so others can develop their own TIDE-related tools.
I welcome comments at any of these stages, though this
one may only generate 'tell me more' comments.  When I
discuss the data formats, I will be particularly interested
in constructive comments.

Gary Perlman/Wang Institute/Tyng Road/Tyngsboro, MA/01879/(617) 649-9731