C70:editor-people (07/31/82)
>From Ellis.Cohen@CMU-10A Sat Jul 31 06:26:22 1982
The recent SIGPLAN Notices contained an article by Richard Waters
on why program editors should retain text-oriented commands. I
believe that it is not necessary to retain a text-oriented viewpoint
at all (except for dealing with text - i.e. comments, documentation).
Enclosed is a short article which lays out my reasons. Comments?
Text-Oriented Structure Commands for Structure Editors
Ellis Cohen
Computer Science Program
Brandeis University
Waltham MA 02254
Abstract
The main problems associated with program structure editors
are not inherent and can be solved without reverting to a
textual viewpoint. Cursor movements can be made more natural
by viewing the screen as a 2-D arrangement of nodes.
Expressions can be input from left to right by rebinding
operator keys to commands more complex than simple template
expansion. Transformations of program fragments can be
accomplished by an editor which supports matching and
instantiation of subtrees.
Introduction
Editors for programs typically operate using a textual viewpoint or a
parse-tree viewpoint, treating the screen as either a two-dimensional screen
of characters or as the @i(presentation) of a parse tree. These viewpoints
are not easily juxtaposed and there has been some discussion about when
text-based editing is appropriate [ Teitelbaum & Reps 81, Waters 82 ].
This note argues that a textual viewpoint is never necessary for convenient
and efficient program editing.
The textual viewpoint makes syntactic and semantic checking difficult or
expensive. If the program is maintained only in textual form, then searching
is constantly required to determine whether variables are declared and whether
expressions are legal. If the program is internally maintained in both
textual and parse-tree forms, then incremental parsing is constantly required
to keep the parse-tree consistent with the text. In either case, the exact
time at which checking or reparsing is to be done is unclear and facilities
for reporting and supporting the correction of errors must be provided.
The parse-tree viewpoint simplifies checking but, @i(as implemented in many
current systems), has a number of drawbacks: Cursor movement is tedious and
annoying, expressions must be input in Polish-prefix form, and textually
simple transformations are difficult. In fact, none of these difficulties are
inhertent to the parse-tree viewpoint.
Cursor Movement
Many people who have used tree-oriented cursor movement commands have found
them to be uncomfortable. I believe that 2-D commands are indeed preferable,
but @i(need not imply a textual viewoint). A better way of viewing the
presentation of a tree is as a two-dimensional layout of tree nodes on a
background (which for editing purposes can be totally ignored) of keywords.
The nodes are hierarchically orgainized (i.e. parent, children, grandchildren,
etc.), but the leaf nodes (identifiers, constants, and unexpanded templates)
stick out the most. I believe that the most natural cursor movement will
consist of up, down, left and right movements from one leaf node to another
leaf node, with movement out to a parent node as the last step. In other
words, in addition to cursor commands which follow the tree structure, I am
suggesting that the editor include commands like
beginning-leaf-in-line
ending-leaf-in-line
moves the cursor to the first or last leaf on the current line
forward-leaf
backwards-leaf
moves the cursor forward or backward to the last leaf
up-leaf-line
down-leaf-line
moves the cursor up or down to the next line which has a leaf node
and positions to the cursor at the first node on the line (or should
it be the leaf closest to the current position of the cursor).
parent
moves the cursor to the parent of the current node
An additional data structure needs to maintain the two-dimensional
arrangement of nodes with pointers back to the tree, but this can be
easily generated as a side-effect of generating the display.
Entering Expressions
Structure Editors require that a parent node be expanded before its children
can be filled in. When expressions are entered via the structure editor (as
in ALOE[ Medina-Mora & Feiler 81 ]), it appears that they must be input in
Polish prefix form. However, it is possible to obtain a left-to-right input
style by judicious rebinding of keys.
Ordinarily, if the cursor is at the template <expr>, the key '+' is
bound to a command like expand-add-expression, which expands the
template to <expr> + <expr>. If '+' is typed when the cursor is at
an already expanded node, an error occurs.
Instead, one could imagine '+' to be bound to a command like subsume-add-
expression, which could be defined (in an Emacs-MLisp like way [ Gosling 81 ]
as)
(defun
(subsume-add-expression
(delete-to-killbuffer)
(expand-add-expression)
(child)
(yank-from-killbuffer)
(forward-node)))
If the cursor were pointing at 17, then subsume-add-expression would
move the subtree 17 to the kill-buffer, expand an add expression
<expr> + <expr>, move the cursor to the left child and bring back the
contents of the kill-buffer, obtaining <17> + <expr>, and then move
the cursor to the <expr> node.
The subsume code actually needs to be a bit more complex to take care
of operator precedence correctly, since typing a '+' when the cursor is at
'b' in a * b should nonetheless delete and then yank a * b, not just
b. Parentheses also need to be handled correctly.
Transformations
The textual viewpoint often appears as if it should simplify various
transformations of program fragments. For example, to convert
IF xxx THEN yyy
to
WHILE xxx DO yyy
simply requires replacing IF with WHILE and THEN with DO. With many structure
editors, this operation would in fact be quite tedious. However, Mentor
[ Donzeau-Gouge 75 ] contains facilities for matching and instantiating
subtrees. A single command could be defined to effect the entire
transformation. Myte [ Rudell 82 ] under development at Brandeis extends this
facility and allows the transformation command to be bound to a key or
sequence of keystrokes. The binding could be made quite mnemonic -- for
example, ^\iw, where ^\ (control-backslash) means transform and "iw" stands
for "i"f to "w"hile. Using Mentor or Myte, it would be just as easy to bind
^\ir to a command which converts the if statement to
REPEAT yyy UNTIL xxx
or to bind ^\i~r to a command which would produce
REPEAT yyy UNTIL NOT (xxx)
References
V. Donzeau-Gouge et.al., "A Structure-Oriented Program Editor", Tech. Report,
IRIA-LABORIA, France 1975
J. Gosling, "Unix Emacs", Computer Science Dept., Carnegie-Mellon Univ.,
Pittsburgh PA, December 1981
R. Medina-Mora & P. Feiler, "An Incremental Programming Environment",
IEEE Transactions on Software Engineering 7,5 (Sept 81)
T. Teitelbaum & T. Reps, "The Cornell Program Synthesizer: A Syntax-Directed
Programming Evironment", CACM 24,9 (Sept 81)
R. Waters, "Program Editors Should Not Abandon Text Oriented Commands",
SIGPLAN Notices 17,7 (July 82)
M. Rudell, "MYTE: An Extensible Program Structure Editor", Tech. Report
CS-82-104, Computer Science Program, Brandeis University, Waltham MA, May 1982
C70:editor-people (08/07/82)
>From decvax!harpo!npoiv!npois!cbosgd!mark@Berkeley Sat Aug 7 05:32:52 1982 Neither Ellis Cohen's bibliography nor Richard Waters' mentions any of the literature on text-oriented language editors. People seem to think this means EMACS, but EMACS is NOT a language editor, it's just a very clever text editor. (A true language editor keeps more of a data structure than just text. I don't know of an EMACS implementation that keeps a parse tree or symbol table around.) I know of four text oriented language editor projects, which I'm enclosing in a bibliography at the end. I think Ellis Cohen is taking a middle ground - his ideal language editor appears to have a "correct but possibly incomplete" parse tree as a data structure, but a user interface that tries to be more text-oriented than that of template editors such as the Cornell Synthesizer and IPE. >From my point of view, there are two fundamental problems with template oriented language editors. First, the user interface forces the user to be painfully aware of the tree structure. Second, the data structure is overly restrictive of the user, because incorrect programs cannot be represented, and because the user can't format the program (indenting, comments) exactly as desired. Cohen's ideas appear to go far toward solving the first problem. Whether they can really be implemented in the general case is unclear - what he's proposing sounds like incremental parsing on every keystroke, with no lookahead. Sounds hard, although he gets to define his data structures and might include things other than a FSM and a stack. The whole parse tree is there, and auxillary data structures might be useful. Also, it might be possible for the parser to make the wrong decision, as long as it corrects itself when it sees the next few tokens. I claim it's crucial for the user to be able to sit down and type in his/her program EXACTLY as it appears written on a piece of paper. Otherwise there is yet another language the beginner has to learn, and this would be a significant learning barrier. I am not convinced that Cohen's idea can do this, although it might be able to. Maybe for a restricted set of grammars. Cohen's solution does not address my second objection, except briefly for issues such as turning if-then into while-do. I claim that you can't anticipate all the textual changes the user is going to want to make (to compensate for textual typos that were made when typing in the textual program), and that even if you can, forcing the user to learn a separate command for each one is not good human engineering. Let's face it, the text interface is very easy to learn and very powerful. Cohen's claim that text interface editors must do incremental parsing if they keep a parse tree is correct. And there is no denying that incremental scanning and parsing make the editor considerably more complex. However, in my thesis (I actually built one of these text interface language editors) I found that the incremental scanning and parsing is pretty cheap. The really expensive part is incremental semantic analysis, e.g. checking for undeclared variables and type mismatches. Both the template approach and the text approach have (almost) exactly the same problem here. Another problem with the template data structure is the handling of preprocessors. In a language such as C, it is nearly hopeless to provide the power of the C preprocessor using templates. Mark Horton Bell Telephone Laboratories C. N. Alberga, et al "A Program Development Tool", IBM TJ Watson Research Center, Yorktown Heights, NY 10598 (Sept 1979). M. R. Horton, "The Design of a Multi-Language Editor", Ph.D. Thesis, U.C. Berkeley, July 1981. J. M. Morris, et al, "The Design of a Language-Directed Editor for Block-Structured Languages", SIGPLAN June 1981. T.R. Wilcox, et al, "The Design and Implementation of a Table Driven, Interactive, Diagnostic Programming System", CACM November 1976.