[comp.editors] Editing Huge files

jfjr@mbunix.mitre.org (Freedman) (03/01/91)

  I am interested in buffer management when the files to be
edited are too big to fit in memory - How is this usually
handled? what are the trade offs involved? Advice,
code,pointers into literature, prayers for the success of
all are gratefully accepted.

                       Jerry Freedman,Jr

trier@cwlim.INS.CWRU.Edu (Stephen C. Trier) (03/02/91)

Funny, I was about to ask a related question...

Anyway, I've been working on an editor using a buffer-gap structure for
in-memory storage.  The first time I implemented paging for it, I used
a 64K array in memory for the data and paged data in and out of both ends
of the array as needed in order to keep the buffer gap centered and a
reasonable size.  I paged in 8K blocks, which I tracked with three stacks:
two for blocks off the top and bottom of the buffer, and one for free
blocks.

This technique turned out to have a number of disadvantages.  First off, it
tended to page more often than was strictly required.  I was not making
efficient use of the 64K buffer.  Second, a block caching system turned out
to be difficult to implement.  (This is an MS-DOS machine, so I planned a
64K main buffer and hoped to make use of the rest of RAM by caching blocks
blocks.)  Third, it was difficult to implement a mechanism for file
recovery after a crash.  Fourth, there was no convenient way to examine a
specific location in the file.

I've abandoned that development thread, since my code had a number of other
limitations.  (Hey, this is a learning thing!  :-)

What I'm considering now is the following:  The buffer is represented by an
array ~16K long.  This array contains the index numbers of the blocks of the
buffer, with the gap represented by 0 entries.  (Of course, the blocks on
either edge of the gap are only partially full.  All of the other blocks are
packed full of characters.)  As blocks need to be added, they are allocated
on either side.

This provides a nice analogy between the virtual and physical structures,
given a buffer-gap structure for the editor.  The cacheing structure becomes
essential to the editor, and memory-to-memory copies are minimized.  Since
the block structure is involved in all buffer operations, paging can be
reduced and a least-recently-used cache algorithm becomes easy.

Of course, it all looks golden now.  I haven't tried implementing it yet!
:-)  DNE (my editor) will be limited to 64K for its initial versions.
I'll have to do a rewrite in order to support the system I describe above.
(Isn't the definition of trouble "Thinking about 2.0 when 1.0 isn't
finished?"  :-)

Anyway, I'm interested in comments about the first and second paging
structures.  Perhaps there's something I'm missing that makes another 
approach more effective.

-- 
Stephen Trier                              Case Western Reserve University
Work: trier@cwlim.ins.cwru.edu             Information Network Services
Home: sct@seldon.clv.oh.us               %% Any opinions above are my own. %%

fin@norge.unet.umn.edu (Craig A. Finseth) (03/05/91)

In article <1991Mar2.011344.18600@usenet.ins.cwru.edu> trier@po.CWRU.Edu writes:
>Anyway, I've been working on an editor using a buffer-gap structure for
>in-memory storage.  The first time I implemented paging for it, I used
>a 64K array in memory for the data and paged data in and out of both ends
>of the array as needed in order to keep the buffer gap centered and a
>reasonable size.  I paged in 8K blocks, which I tracked with three stacks:
>two for blocks off the top and bottom of the buffer, and one for free
>blocks.
>
>This technique turned out to have a number of disadvantages.  First off, it

The scheme that you describe is *not* buffer gap.  In a buffer gap
scheme, the only time that text is moved is when the gap is moved.
Your scheme can cause text motion when just the point is moved and
many editor command algorithms (e.g., go to line N) rely on being able
to efficiently move the point.

>Anyway, I'm interested in comments about the first and second paging
>structures.  Perhaps there's something I'm missing that makes another 
>approach more effective.

I suggest trying either:

1) a buffer gap system, or

2) a paged buffer gap system.  In this scheme, you divide the buffer
into many pages (~2K), each of which is a separate buffer gap editor.
You then use a linked list to thread these together.  If you have a
disk or extended memory "backing store," you can extend this to cover
files larger than 64K on a PC.

Craig A. Finseth			fin@unet.umn.edu [CAF13]
University Networking Services		+1 612 624 3375 desk
University of Minnesota			+1 612 625 0006 problems
130 Lind Hall, 207 Church St SE		+1 612 626 1002 FAX
Minneapolis MN 55455-0134, U.S.A.