[fa.editor-p] Buffer representations

ARPAVAX:C70:editor-people (09/04/82)

>From Hess.Unicorn@MIT-MULTICS Sat Sep  4 03:22:47 1982
Based upon experience designing new things into a multi-page buffer-gap
editor (Mince), I strongly suggest one of the linked-list-of-lines
representationss if you can spare the space and can GC at all.

Mark of the Unicorn discovered that you start paying penalties in compute
time when you try to do video terminal insert/delete line and multi-window
screens (horizontal and vertical) on a continuous blob of characters --
you either have to search for and count newline characters, or add "screen
marks" (buffer pointers akin to The Mark) which are always updated with
each character insertion (expensive).  And adding anything like region
display where the region to be wiped is displayed in inverse video and
grows and shrinks as you move the point around costs a lot with this
scheme, too.  Truly losing.  Even if you have to have some of these screen
marks in a linked-line approach, at least they can be partitioned so that
you know that you don't have to update a certain subset of marks when you
insert text on the current line.

Why then did we choose to use the buffer gap approach?  We were memory
limited and page store was user-visible (floppy-based CP/M); this made GC
ridiculously slow and noisy.  Perhaps on a larger machine, you wouldn't
notice the CPU and disk overhead as much as we did on a Z80 chip...

ARPAVAX:C70:editor-people (09/14/82)

>From KLH@MIT-MC Tue Sep 14 06:55:18 1982
For what it's worth, when I developed ELLE I decided to
go with a linked-list-of-text scheme.  I deliberately wanted to
avoid the list-of-lines strategy since that is what its predecessor
TORES was using, and I was sick of the problems associated with it.

	One advantage of this approach is that it doesn't matter WHAT
is in a node of the list; the editor works whether your lines are 80
or 8000000 characters long.  I specifically wanted to be able to
"edit" core dumps, for example.  If your memory space is a problem
(ELLE was intended for use with machines limited to 16 bits of
address space, such as the 11/40) and your scheme requires having a
"current line" entirely in memory, then eventually you will blow
yourself up.  You may blow yourself up anyway if you don't have a way
to merge your linked lists of lines or swap out the list of
swapped-out lines, etc etc ad nauseam, which seems to be emulating
the list-of-text approach.

	Note that a key feature of this scheme is that a node need
not hold text in memory.  It can simply point to where the text was
swapped out on disk.  This is one of the other important reasons I
went with this approach; if all the text could be kept in memory
all the time, I doubt I'd find a linked list as attractive.

	Another advantage is that it is easy to merge and split
blocks freely -- you don't have to worry about "line boundaries".
In fact, reading a file into the buffer consists simply of creating
a node which points to the entire file on disk; chunks are read in as
needed.  If the buffer is never modified, the GC can always reduce
this list back to a single node.

	It's true that this requires clever programming and some
crunching during redisplay, which does use char/line I/D (just to
emphasize I wasn't punting anything).  The redisplay data structure
in ELLE is a table of lines, one for each physical screen line, in
which the buffer data is duplicated.  Routines which modify the
buffer use a "core" set of routines which know how to inform
redisplay of the exact range of modifications.  I could go on, but
most of this seems pretty standard, and people who want more details
can just ask.

	As for performance, nobody has ever complained that ELLE was
too slow.  It works well, which is the main point.  It is possible to
ask it to do things that take a long time, but those things (like
upper-casing everything in a million-byte file) would be pretty slow
no matter what scheme you used!

	Of course, if your machine has a huge address space, forget
all this hair and simply use the buffer-gap method, keeping everything
in memory.  It's simpler.  This is how the real ITS EMACS (TECO) works,
by the way.