[comp.sys.ibm.pc.programmer] Editing Huge Files, and strategies therein

Ralf.Brown@B.GP.CS.CMU.EDU (03/03/90)

In article <77832@tut.cis.ohio-state.edu>, schanck@harmonica.cis.ohio-state.edu (Christopher Schanck) wrote:
}With the recent discussion on comp.sys.ibm.pc on which editors will
}handle files larger than memory, no one has discussed any of the
}strategies used by these editors.  Blackbeard's, Norton's, Sprint,
}Multimate, Brief, and others were mentioned as editors which could do
}this. 
}
}Could somebody enlighten me as to how this is accomplished in some of
}these cases, or a general strategy for doing so for an editor?  I have
}not a lot of desire to reinvent the wheel, or reinvent it
}inefficiently, so I would appreciate some info on the topic itself, or
}on where I could find such info.

From my poking around in Sprint's swap file, I was able to determine the
following (a rough C sketch of the layout appears after the list):

	  the swap file is divided into 1K pages, which may hold up to 1018
	  bytes of data (six bytes are used for "next" and "prev" pointers
	  and a count of the actual amount of data stored in the page)

	  each file is stored as a doubly-linked circular list, whose head is
	  in a special control page (the control pages are themselves kept in
	  a doubly-linked circular list whose head is the first page in the
	  swap file)

	  any memory available above the 220K needed for the Sprint executable
	  and fixed scratch space is used as a cache for the swap file

	  insertion of new text can cause the addition of a new page containing
	  only a few bytes into the middle of the file's linked list; thus,
	  it is possible for the file to use up considerably more space in
	  the swap file than when stored as plain text.  In my experience, the
	  swap file tends to grow by 20-25% over the course of editing a file
	  (even after accounting for the file's growth).

	  deletion of text presumably only changes the page(s) on which the
	  deleted text resided, leaving unused space at the end of the page.
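
For what it's worth, here's a rough sketch in C of the page layout and the
insert-split behavior; only the sizes come from the swap file itself, while
the field names, ordering, and split_page() are my guesses, not Sprint's
actual code:

#include <string.h>

#define PAGE_SIZE  1024
#define PAGE_DATA  (PAGE_SIZE - 6)      /* 1018 usable bytes */

struct swap_page {
    unsigned short next;    /* page number of the next page in the     */
                            /* file's doubly-linked circular list      */
    unsigned short prev;    /* page number of the previous page        */
    unsigned short count;   /* bytes of text actually stored here      */
    char           data[PAGE_DATA];
};

/* Inserting text into a full page forces a new page into the list.    */
/* The new page may hold only a few bytes, which would account for     */
/* the 20-25% swap-file growth mentioned above.                        */
void split_page(struct swap_page *a, unsigned short a_no,
                struct swap_page *b, unsigned short b_no,
                unsigned short offset, const char *text, unsigned short len)
{
    /* move everything after the insertion point into the new page B   */
    b->count = a->count - offset;
    memcpy(b->data, a->data + offset, b->count);

    /* the inserted text goes at the end of page A (assuming it fits)  */
    memcpy(a->data + offset, text, len);
    a->count = offset + len;

    /* splice B into the circular list right after A; the page that    */
    /* used to follow A also needs its prev link patched, which may    */
    /* mean reading it in from the swap file first                     */
    b->prev = a_no;
    b->next = a->next;
    a->next = b_no;
}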


Unless BlackBeard has changed its strategy since v7.39, it edits files larger
than the 50K edit buffer by reading in a chunk at a time and rewriting the
entire file from that point on when you move out of the current section of
the file.  This gets rather tedious when editing the start of a large file
(particularly since BB shares the edit buffer with its internal line pointers--
editing a 25000-line file means that a "current section" is less than 1K in
size!)
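
In other words (my reconstruction of the behavior, not BB's actual code, and
the function and variable names are made up), leaving the current section
looks roughly like this:

#include <stdio.h>

/* Sketch of the chunk-at-a-time strategy: write the (possibly edited) */
/* current section, then recopy everything after it, every time the    */
/* user moves out of the loaded section.  This is why editing near the */
/* start of a big file is so slow.                                     */
void leave_current_section(FILE *orig, FILE *temp,
                           char *buf, long buf_len, long tail_pos)
{
    char copybuf[512];
    size_t n;

    /* write out the section currently in the edit buffer */
    fwrite(buf, 1, (size_t)buf_len, temp);

    /* then copy the entire untouched remainder of the original file */
    fseek(orig, tail_pos, SEEK_SET);
    while ((n = fread(copybuf, 1, sizeof copybuf, orig)) > 0)
        fwrite(copybuf, 1, n, temp);

    /* the temp file then replaces the original, and the next section */
    /* is read into the edit buffer from it                           */
}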

--
UUCP: {ucbvax,harvard}!cs.cmu.edu!ralf -=- 412-268-3053 (school) -=- FAX: ask
ARPA: ralf@cs.cmu.edu  BIT: ralf%cs.cmu.edu@CMUCCVMA  FIDO: Ralf Brown 1:129/46
"How to Prove It" by Dana Angluin              Disclaimer? I claimed something?
16. proof by cosmology:
    The negation of the proposition is unimaginable or meaningless.  Popular
    for proofs of the existence of God.

schanck@harmonica.cis.ohio-state.edu (Christopher Schanck) (03/04/90)

With the recent discussion on comp.sys.ibm.pc on which editors will
handle files larger than memory, no one has discussed any of the
strategies used by these editors.  Blackbeard's, Norton's, Sprint,
Multimate, Brief, and others were mentioned as editors which could do
this. 

Could somebody enlighten me as to how this is accomplished in some of
these cases, or a general strategy for doing so for an editor?  I have
not a lot of desire to reinvent the wheel, or reinvent it
inefficiently, so I would appreciate some info on the topic itself, or
on where I could find such info.

Followups to comp.sys.ibm.pc.programmer please.

Chris



-=-
"Brother, this trip is gonna make LSD seem like Aspirin!"
                            --- The Green Berets
Christopher Schanck (schanck@cis.ohio-state.edu)

lampi@stb.uucp (Michael Lampi) (03/08/90)

When I wrote the Danford Corp. FSE (Full Screen Editor) for use on Apollo 
workstations, I chose to implement a work file scheme using linked lists. 
Forward links pointed to the subsequent line, backward links pointed to the
previous line. Each line also had its own "internal" linked list of 64-byte
chunks, permitting unlimited line length. This way, files wouldn't expand too
much as they were copied into the work file.
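
Sketched in C (the real thing was FORTRAN records in the work file, so take
the names and field widths as illustrative only), the structure was roughly:

#define CHUNK_SIZE 64

struct chunk {
    long  next_chunk;        /* record number of the next chunk, 0 = end */
    short used;              /* bytes of text in this chunk              */
    char  text[CHUNK_SIZE];
};

struct line {
    long  next_line;         /* forward link to the following line   */
    long  prev_line;         /* backward link to the previous line   */
    long  first_chunk;       /* head of this line's chunk list       */
};

/* A long line is just a longer chunk chain, and a short line wastes at */
/* most 63 bytes, so the work file doesn't grow much past the size of   */
/* the original text.                                                   */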

You might ask "Doesn't it take a while to copy the input to the work file?",
and you would be correct (for large files). However, FSE didn't bother to read
more than a couple of screens into the work file at startup, and then copied
more as one proceeded into the file. If the user exited (writing the input to
the output file), it copied out everything that was in the work file, followed
by the rest of the data from the input file. Very fast, minimal overhead. If
one was just browsing, then startup/load time was negligible.
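
The exit path was essentially this (a C sketch of the idea; the hypothetical
write_work_file_lines() stands in for the real record I/O):

#include <stdio.h>

/* hypothetical helper: walk the line/chunk lists in the work file and */
/* write each line's text to out                                       */
void write_work_file_lines(FILE *out);

void save_on_exit(FILE *input, FILE *output, long bytes_already_loaded)
{
    char buf[512];
    size_t n;

    /* 1. write out every line currently held in the work file (only  */
    /*    as much of the input as the user actually visited)          */
    write_work_file_lines(output);

    /* 2. the rest of the input never made it into the work file, so  */
    /*    just stream it straight through to the output               */
    fseek(input, bytes_already_loaded, SEEK_SET);
    while ((n = fread(buf, 1, sizeof buf, input)) > 0)
        fwrite(buf, 1, n, output);
}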

Using linked lists like this also provided an easy means of handling multiple 
active files and easy error recovery, especially since FSE had only a single 
line from the "current" window in memory at any time. All one could lose on a 
crash would be the changes to that line.

FSE had other advances, such as permitting almost any cursor-addressable
terminal to be used with it, providing completely soft keyboards, pattern
searches (better than Unix's), multiple windows, etc.

It was written in a standard high-level language that you almost never hear 
about these days - FORTRAN-77.