[comp.emacs] Buffer data structures

blandy@marduk.cs.cornell.edu (Jim Blandy) (06/27/88)

Okay everyone - I'm sick and tired of the old linked-list-of-lines
and array-of-pointers-to-lines tricks - I've been trying to think of
a new, creative data structure where insertion is free, block copies
are minimal, memory isn't wasted, and the buffalos roam free (or are
they GNUs?)  What's the best data structure you've come up with
(or heard of) for representing a text buffer?  Does it implement true
markers (you set them, they stay with their text as things move around)?
If a big file shrinks, will the space be returned to the free pool?

My favorite is a combination of the 'gap-in-the-middle' approach
and the linked list approach.  In the gap-in-the-middle approach,
the buffer is one contiguous block of memory,  where the unused
portion acts as a gap for insertion and deletion; you move the gap
by shifting characters from one side of it to the other.  Once you've
got the gap where you want it, insertion is as easy as storing the
characters and shrinking the gap; to delete, enlarge the gap to enclose
the characters you want gone.  (I got this approach from an editor
published in some computer magazine WAY BACK.  Byte, maybe.)  The
problem comes when the gap gets filled in; if you've got the whole address
space to yourself, you just tell the user "tough luck," but if you're
in a multitasking environment, you need to ask for more space;  I'd
make a linked list of blocks, each with its own gap.  If a block
is empty, you free its space.

Okay, now I've given you my favorite data structure.  Feel free to use
it if you like.  Now what's your favorite?  If you'd get fired for
revealing it, don't bother, but if you can, post a quick description.
--
Jim Blandy - blandy@crnlcs.bitnet
"insects were insects when man was just a burbling whatsit."  - archie

daveb@llama.rtech.UUCP (Dave Brower) (06/28/88)

In article <18612@cornell.UUCP> blandy@cs.cornell.edu (Jim Blandy) writes:
>My favorite is a combination of the 'gap-in-the-middle' approach
>and the linked list approach.  In the gap-in-the-middle approach,
>the buffer is one contiguous block of memory,  where the unused
>portion acts as a gap for insertion and deletion; you move the gap
>by shifting characters from one side of it to the other.  Once you've
>got the gap where you want it, insertion is as easy as storing the
>characters and shrinking the gap; to delete, enlarge the gap to enclose
>the characters you want gone.  (I got this approach from an editor
>published in some computer magazine WAY BACK.  Byte, maybe.)  The
>problem comes when the gap gets filled in; if you've got the whole address
>space to yourself, you just tell the user "tough luck," but if you're
>in a multitasking environment, you need to ask for more space;  I'd
>make a linked list of blocks, each with its own gap.  If a block
>is empty, you free its space.

This is basically what the Mince/Final Word editor uses, with a twist.
The linked list and the descriptors for the blocks are kept in separate
data structures, and the data blocks are a fixed size.  With this
scheme, they can be paged out to a "swap file" when memory runs short,
or if you decide only to devote a certain amount of space to running
your editor.

Later Final Words would periodically write out all pages and stash some
global data, allowing a complete in-context restart in the event of a
machine crash.  This is kind of neat.

The down side to this scheme is that it makes maintaining the point and
marks difficult.  You need to adjust marks when the gap(s) move, and
deal with multiple gaps when moving between pages.  There's a lot of
tradeoffs involved, and it's easy to see why multiple page/gaps aren't of much interest on a big-memory/VM machine.

-dB
{amdahl, cpsc6a, mtxinu, sun, hoptoad}!rtech!daveb daveb@rtech.com <- FINALLY!

jpayne%breakpoint@Sun.COM (Jonathan Payne) (06/28/88)

I like the buffer gap approach, too.  If you have VM it makes a lot of
sense.  But I also like linked lists because it makes a complete
redisplay very cheap.  A typical redisplay algorithm assigns a unique ID
to every line that's displayed on the screen, and whenever an update is
done the new ID's are compared to the old ones, and lines are inserted
or deleted (from the screen) or just redrawn as necessary.  With a linked
list the unique ID can be just the pointer to the line, which is very
quick to calculate.  In Gosling's emacs the unique ID is gotten by
hashing on the contents of the line, which is rather time consuming.

Do you know a good way to do intelligent (i.e., insert/delete line)
redisplay with a buffer gap scheme, without having to set all sorts of
flags to speed things up?  How does GNU do it?  I know it started out as
Gosling's emacs, but the last time I looked I didn't see any code hashing
on hashing on the contents of line.

graham@sce.UUCP (Doug Graham) (06/29/88)

> published in some computer magazine WAY BACK.  Byte, maybe.)  The
> problem comes when the gap gets filled in; if you've got the whole address
> space to yourself, you just tell the user "tough luck," but if you're
> in a multitasking environment, you need to ask for more space;  I'd
> make a linked list of blocks, each with its own gap.  If a block
> is empty, you free its space.

Another problem with the single chunk of memory with a gap in the
middle, is that a lot of time can be spent copying text to one
end or the other of the memory space whenever the gap must be moved.
This becomes very apparent when editing a large file on slow machine
like this little PC here. Seems that your approach would solve this
problem because the maximum amount of text that would have to
be copied could be limited to the block size. But I think you
could wind up wasting a lot of memory with half filled blocks.

On a related note, I would like to Jonathan Payne about the structure
he used for Jove. It uses a linked list of line structures kept
in memory. The actual text of the line is kept in "virtual memory"
with a virtual address in the line structure pointing to it.
I would like to know why he chose to keep the line
structures in memory rather than putting them in "virtual memory"
as well. This decision means that Jove cannot be used to edit
files containing a large number of lines on machines with a limited
amount of memory. Are there advantages to doing things this way?

Doug.

jpayne%breakpoint@Sun.COM (Jonathan Payne) (06/30/88)

In article <394@sce.UUCP> graham@sce.UUCP (Doug Graham) writes:
>On a related note, I would like to [ask] Jonathan Payne about the structure
>he used for Jove. It uses a linked list of line structures kept
>in memory. The actual text of the line is kept in "virtual memory"
>with a virtual address in the line structure pointing to it.
>I would like to know why he chose to keep the line
>structures in memory rather than putting them in "virtual memory"
>as well. This decision means that Jove cannot be used to edit
>files containing a large number of lines on machines with a limited
>amount of memory. Are there advantages to doing things this way?
>
>Doug.


Ignorance.  You kidding?  I hadn't even HEARD of virtual memory when I
started JOVE.  A neat idea, I guess.  I always hated the way JOVE's main
limitation was the number of lines it could hold.  Of course, I haven't
noticed ever since I switched to a VAX, and now to Sun's.  I love
pdp11's, but Boy! who wants to think about memory in such silly ways as
that anymore?

Oh, and you'd be amazed how fast a busy computer can copy a megabyte when
the gap moves.  (Or maybe you wouldn't.)  Mind-boggling.  Computers are
so fast, even when they're slow.