[comp.editors] Buffer data structures

blandy@marduk.cs.cornell.edu (Jim Blandy) (06/27/88)

Okay everyone - I'm sick and tired of the old linked-list-of-lines
and array-of-pointers-to-lines tricks - I've been trying to think of
a new, creative data structure where insertion is free, block copies
are minimal, memory isn't wasted, and the buffalos roam free (or are
they GNUs?)  What's the best data structure you've come up with
(or heard of) for representing a text buffer?  Does it implement true
markers (you set them, they stay with their text as things move around)?
If a big file shrinks, will the space be returned to the free pool?

My favorite is a combination of the 'gap-in-the-middle' approach
and the linked list approach.  In the gap-in-the-middle approach,
the buffer is one contiguous block of memory,  where the unused
portion acts as a gap for insertion and deletion; you move the gap
by shifting characters from one side of it to the other.  Once you've
got the gap where you want it, insertion is as easy as storing the
characters and shrinking the gap; to delete, enlarge the gap to enclose
the characters you want gone.  (I got this approach from an editor
published in some computer magazine WAY BACK.  Byte, maybe.)  The
problem comes when the gap gets filled in; if you've got the whole address
space to yourself, you just tell the user "tough luck," but if you're
in a multitasking environment, you need to ask for more space;  I'd
make a linked list of blocks, each with its own gap.  If a block
is empty, you free its space.

Okay, now I've given you my favorite data structure.  Feel free to use
it if you like.  Now what's your favorite?  If you'd get fired for
revealing it, don't bother, but if you can, post a quick description.
--
Jim Blandy - blandy@crnlcs.bitnet
"insects were insects when man was just a burbling whatsit."  - archie

jpayne%breakpoint@Sun.COM (Jonathan Payne) (06/28/88)

I like the buffer gap approach, too.  If you have VM it makes a lot of
sense.  But I also like linked lists because it makes a complete
redisplay very cheap.  A typical redisplay algorithm assigns a unique ID
to every line that's displayed on the screen, and whenever an update is
done the new ID's are compared to the old ones, and lines are inserted
or deleted (from the screen) or just redrawn as necessary.  With a linked
list the unique ID can be just the pointer to the line, which is very
quick to calculate.  In Gosling's emacs the unique ID is gotten by
hashing on the contents of the line, which is rather time consuming.

Do you know a good way to do intelligent (i.e., insert/delete line)
redisplay with a buffer gap scheme, without having to set all sorts of
flags to speed things up?  How does GNU do it?  I know it started out as
Gosling's emacs, but the last time I looked I didn't see any code hashing
on hashing on the contents of line.