[comp.editors] A new idea

pcg@thor.cs.aber.ac.uk (Piercarlo Grandi) (01/04/90)

There has been some discussion of my analysis of GNU Emacs memory
management, with various opinions. There has also been a call (in
comp.editors) for editor ideas. I will merge the two in this posting.

There is (and this is a much wider subject) a fundamental divide
between those who think that programming is building a tool that
works (often Americans) and those who think that programming is
communicating descriptions of entities and operations on them,
both to humans and to computers (often Europeans). The first type
of programmers are 'hackers': they aim for the 80% solution, that
is, the solution that works for them 80% of the time, and damn the
rest; the major example of this attitude is Bill Joy. The others
aim for simple, documentable, complete solutions based on
well-understood principles; the major example is Dijkstra. (Note:
both Joy and Dijkstra are extreme examples.)

I tend to think that I do belong to the Dijkstra group, not the
Joy crowd.

I have been thinking about designing my own proper editor, not as
a quick (or slow) hack, but as it should be done. I think I have
got a clue (IMNHO), and here it is. I think it would make a fine
Ph.D. thesis subject. It is an old idea of mine, but I haven't
yet seen it published (the closest thing is the Smalltalk browser
concept).

How can we characterize an editor? It is clearly (to my jaundiced
eye) a set manipulator (secondarily, a set browser). In
particular, I think we can restrict our discussion to
manipulations of sequences (Hoare's article in Structured
Programming), even though at least one editor, the TSS one, was
key based. More precisely, an editor should be equivalent to a
generic sequence facility. What kinds of operations can be done on
sequences? Well: add element, remove element, apply,
concatenation, order comparison if applicable, ...

An editor is something that reads a sequence from a file into a
buffer, manipulates it, and then writes it back. It can take
advantage of the fact that, while the sequence is in the buffer,
manipulation need not be strictly sequential, for example if the
entire sequence can be held in it.
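
To make the 'generic sequence facility' concrete, here is a minimal
sketch in Python. It assumes the file is split into records (and
joined back) by caller-supplied functions; the class and method names
are hypothetical, and the operations are just the ones listed above.

```python
from typing import Callable, Generic, List, TypeVar

R = TypeVar("R")  # The record type: lines, messages, passwd entries, ...


class SequenceBuffer(Generic[R]):
    """Hypothetical buffer holding a sequence of records of type R."""

    def __init__(self, records: List[R]) -> None:
        self.records = list(records)

    @classmethod
    def read(cls, text: str, split: Callable[[str], List[R]]) -> "SequenceBuffer[R]":
        # Read: turn file contents into a sequence of records.
        return cls(split(text))

    def write(self, join: Callable[[List[R]], str]) -> str:
        # Write back: turn the sequence into file contents again.
        return join(self.records)

    # The sequence operations listed in the post.
    def add(self, index: int, record: R) -> None:
        self.records.insert(index, record)

    def remove(self, index: int) -> R:
        return self.records.pop(index)

    def apply(self, fn: Callable[[R], R]) -> None:
        self.records = [fn(r) for r in self.records]

    def concat(self, other: "SequenceBuffer[R]") -> "SequenceBuffer[R]":
        return SequenceBuffer(self.records + other.records)

    def __lt__(self, other: "SequenceBuffer[R]") -> bool:
        # Order comparison: meaningful only if the records themselves are ordered.
        return self.records < other.records


buf = SequenceBuffer.read("b\na\n", lambda t: t.splitlines(keepends=True))
buf.add(0, "c\n")
buf.apply(str.upper)
```

The buffer never looks inside a record; everything record-specific
comes in through `split`, `join`, and the function passed to `apply`.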

Notice that under this definition rn, mail -f, and a lot of other
programs are editors.

Indeed an editor should be generic in the sense that it operates
on sequences of entities, where entities may be of any type: news
articles, mail messages, passwd lines, etc. An editor should have
a general-purpose sequence management engine, and a number of
modules with a standardized interface that define the semantics
for each specific record type. Each record type module should have
primitives that indicate record boundaries, allow (potentially
recursive!) intra-record editing, display the record contents in a
pretty form, and allow content-based selection of records. An
editor should have an internal language that allows the user to
write programs using these operations through a high-level
interface.
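
A sketch of such a standardized module interface, again in Python and
with hypothetical names: each module supplies record boundaries
(`split`), a pretty one-line display, and content-based selection,
while `select` is engine code that works with any module. Intra-record
editing is omitted; the recursion the post mentions would amount to
handing a single record to another module, e.g. splitting one mail
message into lines. The mailbox module assumes a simple mbox-style
format.

```python
from abc import ABC, abstractmethod
from typing import List


class RecordModule(ABC):
    """Hypothetical standardized interface every record-type module provides."""

    @abstractmethod
    def split(self, text: str) -> List[str]:
        """Find record boundaries and return the records in order."""

    @abstractmethod
    def display(self, record: str) -> str:
        """Return a one-line 'pretty' summary of a record."""

    def matches(self, record: str, pattern: str) -> bool:
        """Content-based selection; modules may override this default."""
        return pattern in record


class LineModule(RecordModule):
    """Records are newline-terminated lines."""

    def split(self, text: str) -> List[str]:
        return text.splitlines(keepends=True)

    def display(self, record: str) -> str:
        return record.rstrip("\n")


class MailboxModule(RecordModule):
    """Records are messages, each starting with a 'From ' line (mbox style).
    Real mbox files escape body lines that begin with 'From ' as '>From '."""

    def split(self, text: str) -> List[str]:
        messages: List[str] = []
        for line in text.splitlines(keepends=True):
            if line.startswith("From ") or not messages:
                messages.append(line)  # A new record starts here.
            else:
                messages[-1] += line  # Continue the current record.
        return messages

    def display(self, record: str) -> str:
        for line in record.splitlines():
            if line.startswith("Subject:"):
                return line[len("Subject:"):].strip()
        return "(no subject)"


def select(module: RecordModule, text: str, pattern: str) -> List[str]:
    """Generic engine code: it knows nothing about the record type."""
    return [module.display(r) for r in module.split(text) if module.matches(r, pattern)]
```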

It is clear that a generic editor can be what a browser is *and*
more. I can easily conceive of an editor that allows me to edit a
mailbox in one window; a C source in another, where each
file-level entity is considered a record; a list of newsgroups in
another, where each newsgroup is considered a record; and in yet
another one of these newsgroups, where each article is considered
a record. Even simple text may have different modules, e.g. for
files made up of lines composed of ASCII characters or of two-byte
Chinese ones. I can easily conceive of an editor whose 'lines' can
take up multiple screen lines and have many fields.

In some sense much of the user friendliness of GNU Emacs comes
from such a view of the world, except that in GNU Emacs all types
of 'records' are forcibly mapped onto ASCII lines, in an ad hoc
way. This is both inefficient and limiting.

As to implementation details, I would implement this editor by
mapping a file into memory (or just leaving it where it is),
building a list of pointers to each record boundary in it, and
performing updates by recording them in a log (either in memory
or in a file) threaded with the pointer list; this avoids the need
to copy the original and allows easy undo. Writing out then simply
means following the pointer list and writing out one record at a
time (with obvious optimizations if records happen to be
contiguous in storage). If many updates are done, the user should
be given the option of trimming the log (forgoing some undo) or
of merging the log with the original and restarting. Before either
of these becomes necessary, the log of updated records should of
course be compacted every now and then, both to remove useless
entries and to cluster records for better locality.
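
A minimal sketch of this representation, close to what is nowadays
usually called a piece table. The assumptions are mine, not the post's:
records are newline-terminated, the 'mapped' file is an ordinary
`bytes` object, and undo simply saves the previous pointer list (O(n)
per edit) rather than threading undo information through the log.
Trimming and compaction are left out; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ref:
    """Points at one record: in the original file (in_log=False) or in the log."""
    in_log: bool
    start: int
    end: int


class LoggedBuffer:
    def __init__(self, original: bytes) -> None:
        # The original is never copied or modified; a real editor would mmap it.
        self.original = original
        self.log = bytearray()  # Append-only storage for new record contents.
        self.refs: List[Ref] = []
        self.history: List[List[Ref]] = []  # Previous pointer lists, for undo.
        # Build the list of pointers to record boundaries (newline records here).
        start = 0
        while start < len(original):
            nl = original.find(b"\n", start)
            end = len(original) if nl < 0 else nl + 1
            self.refs.append(Ref(False, start, end))
            start = end

    def _append_log(self, data: bytes) -> Ref:
        start = len(self.log)
        self.log += data
        return Ref(True, start, len(self.log))

    def replace(self, index: int, data: bytes) -> None:
        self.history.append(list(self.refs))
        self.refs[index] = self._append_log(data)

    def insert(self, index: int, data: bytes) -> None:
        self.history.append(list(self.refs))
        self.refs.insert(index, self._append_log(data))

    def delete(self, index: int) -> None:
        self.history.append(list(self.refs))
        del self.refs[index]

    def undo(self) -> None:
        self.refs = self.history.pop()

    def spans(self) -> List[Tuple[bool, int, int]]:
        """Coalesce records that are contiguous in the same storage area."""
        out: List[Tuple[bool, int, int]] = []
        for r in self.refs:
            if out and out[-1][0] == r.in_log and out[-1][2] == r.start:
                out[-1] = (r.in_log, out[-1][1], r.end)
            else:
                out.append((r.in_log, r.start, r.end))
        return out

    def write(self) -> bytes:
        # Follow the pointer list, writing one contiguous span at a time.
        src = {False: self.original, True: bytes(self.log)}
        return b"".join(src[in_log][s:e] for in_log, s, e in self.spans())
```

After an undo the log still holds the abandoned record contents, which
is exactly the kind of useless entry that periodic compaction would
remove.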

As per the Emacs cookbook, I would have an editing engine distinct
from the redisplay engine, communicating via a window image
buffer. Following the good old Arizona idea of front-ending ed(1)
for full-screen editing, one could conceivably put them in distinct
processes, making it possible to have several front ends, e.g.
one for full-screen terminals, one for teletypes, one for X,
etc.
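
A toy illustration of the split between the editing engine and the
front ends, assuming the 'window image buffer' is just a list of
fixed-width strings, one per screen row. In the multi-process
arrangement described above these would sit on either side of a pipe;
here they are plain functions, and all names are hypothetical.

```python
from typing import List

WindowImage = List[str]  # One string per screen row, padded to the window width.


def render_window(records: List[str], top: int, height: int, width: int) -> WindowImage:
    """Editing-engine side: produce a window image from the buffer contents."""
    rows = [r.rstrip("\n")[:width].ljust(width) for r in records[top:top + height]]
    rows += [" " * width] * (height - len(rows))  # Blank rows past end of buffer.
    return rows


def full_screen_update(old: WindowImage, new: WindowImage) -> List[int]:
    """Full-screen front end: return the indices of rows that must be repainted."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


def teletype_update(new: WindowImage, cursor_row: int) -> str:
    """Teletype front end: just print the current line, ed-style."""
    return new[cursor_row].rstrip()
```

The full-screen front end compares images to repaint only changed
rows; the teletype front end ignores everything but the current line.
Neither needs to know what a record is.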

The editing engine and the display engine can be *entirely*
independent of the record semantics provided in each specialized
module. As an optimization, to avoid the necessary indirection
overhead in the most common case, I can conceive of the module for
records of ASCII characters terminated by newline being compiled in
and special-cased, to reduce path length.
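
One way the special case might look, in a hypothetical sketch where a
record module is reduced to a single `split` function: the engine
tests for the built-in newline module and, if it is in use, counts
newlines directly instead of calling through the interface. Python
does not really show the path-length saving; the point is the
dispatch structure, and the two paths must agree.

```python
from typing import Callable, List

# A record module, reduced here to one hypothetical primitive: split into records.
Splitter = Callable[[bytes], List[bytes]]


def split_lines(text: bytes) -> List[bytes]:
    """The built-in module: records are newline-terminated lines."""
    parts = text.split(b"\n")
    records = [p + b"\n" for p in parts[:-1]]
    if parts[-1]:
        records.append(parts[-1])  # Final line lacking a trailing newline.
    return records


def count_records(text: bytes, splitter: Splitter) -> int:
    if splitter is split_lines:
        # Special-cased built-in module: no indirect call, no record list built.
        n = text.count(b"\n")
        return n + (1 if text and not text.endswith(b"\n") else 0)
    # General case: go through the module's interface.
    return len(splitter(text))
```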

As I see it, the right way to design such an editor would be to
carefully analyze the 'sequence of records' type constructor,
define a suitable abstract interface for it, making sure it makes
good semantic sense, and then define which operations on
individual records must be available to support the higher-level
interface. But alas, this is not hacking...

Any takers?

--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

fin@uh.msc.umn.edu (Craig Finseth) (01/04/90)

	There is (and this is a much wider subject) a fundamental divide
	between those that think that programming is building a tool that
	works (often Americans) and those that think that programming is
	communicating descriptions of entities and operations on them,
	both to humans and computers (often Europeans). The first type of
	programmers are 'hackers', and they aim for the 80% solution,
	that is the solution that works for them 80% of the time, and
	damn the rest; the major example of this attitude is Bill Joy;
	the others aim for simple, documentable, complete solutions based
	on well understood principles; major example is Dijkstra. (Note:
	both Joy and Dijkstra are extreme examples).

I am very uncomfortable with this division.  I prefer a three-way
division:

- Those that aim for the 80% solution (hackers: "see, it works for me").

- Those that aim for good textbook cases (academics: "see this elegant
algorithm?  so what if it scales O(n^n)").

- Those that design, produce, and document complete solutions.  Such
solutions perform well, have good user interfaces, are easy to
maintain, and fit well into the system as a whole.


(Many well-thought out general statements omitted.)


	In some sense much of the user friendliness of GNU Emacs comes
	from such a view of the world, only that in GNU Emacs all types
	of 'records' are forcibly mapped into ASCII lines, in an ad hoc
	way. This is both inefficient and limiting.

Ah, but it is very powerful and consistent.  This very issue comes up
time and again with structure editors.  Yes, it is very pure and
consistent to build an editor in terms of objects.  However, IT IS
IMPORTANT THAT THE EDITOR'S VIEW OF THE OBJECTS IS WELL-MATCHED TO THE
USER'S VIEW.  (Note carefully which view is variable and which is
constant.)

I view the Emacs interface as very consistent.  All of my learning
regarding editing this message carries directly over to editing C
code, to editing nroff source, and to patching binary files.

Many people have developed many different structure editors.  I have
yet to see one that has a user interface with the same degree of
consistency, carryover, and power as the Emacs interface.  I'm not
(yet) saying that it can't be done: I'm saying that it has not been
done and you may wish to think twice about the scope of the problem
that you're tackling.

	As to implementation details, I would implement this editor by
	mapping a file into memory (or just leaving it where it is), and
	building a list of pointers to each record boundary in it;
	performing updates by recording them in a log (either in memory
	or in a file), as this avoids the need to copy the original, and
	allows easy undo, threaded with the pointer list. Writing out
	simply means then following the pointer list and writing out a
	record a time (with obvious optimizations if records happen to be
	contiguous in storage). If many updates are done, the user should
	be given the option of trimming the log (forgoing some undo), or
	to merge the log with the original and restart. Before any of
	these two are necessary, the log of updated records should be
	compacted, of course, every now and then, both as to useless
	entries and as to clustering for better locality.

This is a good design in theory.  Does it scale well in practice?  In
particular, as redisplay -- not editing -- has been shown by your own
data to be the sticking point, how well does this representation
speed up redisplay?

(The best argument that I am aware of for linked line representation
is the performance boost that it gives to redisplay.)

	As per the Emacs cookbook I would have an editing engine distinct
	from the redisplay engine, communicating via a window image
	buffer.....

The "cookbook" also pointed out that the "pure" model would in general
not perform well enough.  Well-designed "behind the scenes" hooks
between the editing and redisplay engines are in general necessary.
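
One example of such a hook, sketched in Python with hypothetical
names: the editing engine remembers the range of records changed
since the last redisplay, and redisplay asks for it instead of
recomparing the whole window. An insertion conservatively dirties
everything below it; a finer hook could report it as an insertion so
that a terminal's insert-line capability could be used.

```python
from typing import List, Optional, Tuple


class Buffer:
    """Editing engine that tracks a 'dirty' range for redisplay (hypothetical hook)."""

    def __init__(self, records: List[str]) -> None:
        self.records = records
        self.dirty: Optional[Tuple[int, int]] = None  # Half-open [lo, hi) of changed records.

    def _touch(self, lo: int, hi: int) -> None:
        if self.dirty is None:
            self.dirty = (lo, hi)
        else:
            self.dirty = (min(self.dirty[0], lo), max(self.dirty[1], hi))

    def replace(self, index: int, record: str) -> None:
        self.records[index] = record
        self._touch(index, index + 1)

    def insert(self, index: int, record: str) -> None:
        self.records.insert(index, record)
        # Everything from here down shifts, so mark it all as changed.
        self._touch(index, len(self.records))

    def take_dirty(self) -> Optional[Tuple[int, int]]:
        """Called by redisplay: which records changed since the last redisplay."""
        d, self.dirty = self.dirty, None
        return d
```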

	The editing engine and the display engine can be *entirely*
	independent of the record semantics provided in each specialized
	module. As an optimization, to avoid the necessary indirection
	overhead in the most common case, I can conceive the module for
	records of ASCII characters terminated by newline being compiled in
	and special cased, to reduce path length.

You're already bringing in those pesky details which clutter up
elegant algorithms.

	As I see it, the right way to design such an editor would be to
	carefully analyze the 'sequence of records' type constructor,
	define a suitable abstract interface for it, making sure it makes
	semantic good sense, and then define what kind of operations on
	the single records are needed to be available to support the
	higher level interface. But alas, this is not hacking...

Neither will it lead to a pure, elegant program.  Good software
engineering (the field that many of us claim to practice) must
simultaneously encompass high level design and nitty-gritty detail.
As with all engineering, its essence is compromise.

Your model is a good solid one.  It must be shaped by all external
constraints (user interface, machine performance limits,
representations of the objects to be manipulated, etc.) into an actual
program product.  In the process, the model will be refined and
changed, most often by compromises that are mandated by external
constraints and that you would rather not have to make.

Craig A. Finseth			fin@msc.umn.edu [CAF13]
Minnesota Supercomputer Center, Inc.	+1 612 624 3375

spencer@pyr.gatech.EDU (Spencer Rugaber) (01/05/90)

What you are talking about sounds like what was done for TEN/PLUS.  This
is an editing system that is provided with AIX on IBM RTs.  It features
an editor capable of editing a large and extensible set of document types,
including text files, mailboxes, hierarchical data files, C programs, etc.
The semantics of the editing/browsing/file manipulation are separated
from the semantics of the application, both internally and at the user
interface.  Thus, deleting a message from a mailbox, a line from a file,
or a file from a directory is accomplished with the same command by
the user, but with different results in the system.

The system is extensible and features a scripting language so that simple
applications can be written by non-programmers.  It can also be extended
to new types of files by experienced C programmers.

(Note: This system has been in existence since 1983, but I am unsure
which of the features I have mentioned are actually available on the
RT.)

Spencer
-- 
SPENCER RUGABER
Georgia Institute of Technology, Atlanta Georgia, 30332
...!{allegra,amd,hplabs,ihnp4,seismo,ut-ngp}!gatech!spencer

tbray@watsol.waterloo.edu (Tim Bray) (01/09/90)

In article <PCG.90Jan3180137@thor.cs.aber.ac.uk> pcg@thor.cs.aber.ac.uk 
(Piercarlo Grandi) paints a vision of the editor of the future, with only
a moderate amount of handwaving.

First off, the idea of separating the display logic from the buffer
management logic is 100% right; among other things, it gives you the
ability to write outlandish new display schemes for outlandish new
data.

A subtle but important point: one of the most powerful models of text is
as a linear sequence of bytes, without reference to `records' of any
kind.  This model is behind the strength of Unix.  Sadly, none of the
popular Unix editors handle record-less files very well.  (Yes, I know
emacs can take it, but it is extremely stupid about redisplay, and performance
goes down the tubes.)  As texts grow larger and are treated more as
database contents & less as typewriter-output, the concept of a 'line'
becomes less and less useful until it is a dangerous hindrance [Cf. Bray,
`Lessons of the New OED Project', Proc. Winter '89 Usenix].  Anyhow,
don't build any assumptions about occurrences of the '\n' character into 
your software.
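
A sketch of redisplay that makes no assumption about where '\n'
occurs, assuming the buffer is a plain byte string addressed by
offsets (the function and its parameters are hypothetical): it lays
out a window starting at any byte offset, so a single enormous 'line'
costs no more than a short one. The trade-off is that rows are wrapped
relative to the chosen offset, which need not agree with wrapping from
the start of the logical line.

```python
from typing import List


def screen_rows(text: bytes, start: int, width: int, height: int) -> List[bytes]:
    """Lay out up to `height` rows of at most `width` bytes, beginning at byte
    offset `start`. A newline ends a row early, but none is required."""
    rows: List[bytes] = []
    pos = start
    while len(rows) < height and pos < len(text):
        # Look at most one byte past the row, so a newline that immediately
        # follows a full row is consumed with it instead of making an empty row.
        nl = text.find(b"\n", pos, pos + width + 1)
        if nl >= 0:
            rows.append(text[pos:nl])
            pos = nl + 1
        else:
            rows.append(text[pos:pos + width])
            pos += width
    return rows
```

The work done is bounded by the window size, not by the distance to
the nearest newline.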

Also, pcg@aber says:

>There is (and this is a much wider subject) a fundamental divide...
>The first type of
>programmers are 'hackers', ...
>the major example of this attitude is Bill Joy;

Well, since this is comp.editors and you mention Joy, we gotta talk vi.
I personally use emacs, but have a great deal of experience with vi,
including work on a port to VMS.  Vi is old-fashioned and inflexible, but
it provides an amazingly functional interface while consuming almost no
machine resources.  Some of the things inside the implementation are very
elegant.  Credit where credit is due.

Cheers, Tim Bray,
New OED Project         \
 -and-                   > Waterloo, Ontario, Canada
Open Text Systems, Inc. /

scs@iti.org (Steve Simmons) (01/10/90)

tbray@watsol.waterloo.edu (Tim Bray) writes:

>In article <PCG.90Jan3180137@thor.cs.aber.ac.uk> pcg@thor.cs.aber.ac.uk 
>(Piercarlo Grandi) paints a vision of the editor of the future, with only
>a moderate amount of handwaving. . . .

>As texts grow larger and are treated more as
>database contents & less as typewriter-output, the concept of a 'line'
>becomes less and less useful until it is a dangerous hindrance [Cf. Bray,
>`Lessons of the New OED Project', Proc. Winter '89 Usenix].  Anyhow,
>don't build any assumptions about occurrences of the '\n' character into 
>your software.

>Cheers,
>Tim Bray, New OED Project  -and-  Open Text Systems, Inc.

In the Q&A at that session I recall your being asked about the status
of the tools developed for the new OED -- editors, indexers, etc.  "With
only a moderate amount of handwaving" (grin) you said they would probably
become commercially available at some point in the future.  Two questions:

Would you please post some data here about those tools (it seems like a
relevant post for this group)?  And:

Is the future here yet?