blandy@marduk.cs.cornell.edu (Jim Blandy) (06/27/88)
Okay everyone - I'm sick and tired of the old linked-list-of-lines and array-of-pointers-to-lines tricks - I've been trying to think of a new, creative data structure where insertion is free, block copies are minimal, memory isn't wasted, and the buffalos roam free (or are they GNUs?) What's the best data structure you've come up with (or heard of) for representing a text buffer? Does it implement true markers (you set them, they stay with their text as things move around)? If a big file shrinks, will the space be returned to the free pool? My favorite is a combination of the 'gap-in-the-middle' approach and the linked list approach. In the gap-in-the-middle approach, the buffer is one contiguous block of memory, where the unused portion acts as a gap for insertion and deletion; you move the gap by shifting characters from one side of it to the other. Once you've got the gap where you want it, insertion is as easy as storing the characters and shrinking the gap; to delete, enlarge the gap to enclose the characters you want gone. (I got this approach from an editor published in some computer magazine WAY BACK. Byte, maybe.) The problem comes when the gap gets filled in; if you've got the whole address space to yourself, you just tell the user "tough luck," but if you're in a multitasking environment, you need to ask for more space; I'd make a linked list of blocks, each with its own gap. If a block is empty, you free its space. Okay, now I've given you my favorite data structure. Feel free to use it if you like. Now what's your favorite? If you'd get fired for revealing it, don't bother, but if you can, post a quick description. -- Jim Blandy - blandy@crnlcs.bitnet "insects were insects when man was just a burbling whatsit." - archie
daveb@llama.rtech.UUCP (Dave Brower) (06/28/88)
In article <18612@cornell.UUCP> blandy@cs.cornell.edu (Jim Blandy) writes: >My favorite is a combination of the 'gap-in-the-middle' approach >and the linked list approach. In the gap-in-the-middle approach, >the buffer is one contiguous block of memory, where the unused >portion acts as a gap for insertion and deletion; you move the gap >by shifting characters from one side of it to the other. Once you've >got the gap where you want it, insertion is as easy as storing the >characters and shrinking the gap; to delete, enlarge the gap to enclose >the characters you want gone. (I got this approach from an editor >published in some computer magazine WAY BACK. Byte, maybe.) The >problem comes when the gap gets filled in; if you've got the whole address >space to yourself, you just tell the user "tough luck," but if you're >in a multitasking environment, you need to ask for more space; I'd >make a linked list of blocks, each with its own gap. If a block >is empty, you free its space. This is basically what the Mince/Final Word editor uses, with a twist. The linked list and the descriptors for the blocks are kept in separate data structures, and the data blocks are a fixed size. With this scheme, they can be paged out to a "swap file" when memory runs short, or if you decide only to devote a certain amount of space to running your editor. Later Final Words would periodically write out all pages and stash some global data, allowing a complete in-context restart in the event of a machine crash. This is kind of neat. The down side to this scheme is that it makes maintaining the point and marks difficult. You need to adjust marks when the gap(s) move, and deal with multiple gaps when moving between pages. There's a lot of tradeoffs involved, and it's easy to see why multiple page/gaps aren't of much interest on a big-memory/VM machine. -dB {amdahl, cpsc6a, mtxinu, sun, hoptoad}!rtech!daveb daveb@rtech.com <- FINALLY!
jpayne%breakpoint@Sun.COM (Jonathan Payne) (06/28/88)
I like the buffer gap approach, too. If you have VM it makes a lot of sense. But I also like linked lists because it makes a complete redisplay very cheap. A typical redisplay algorithm assigns a unique ID to every line that's displayed on the screen, and whenever an update is done the new ID's are compared to the old ones, and lines are inserted or deleted (from the screen) or just redrawn as necessary. With a linked list the unique ID can be just the pointer to the line, which is very quick to calculate. In Gosling's emacs the unique ID is gotten by hashing on the contents of the line, which is rather time consuming. Do you know a good way to do intelligent (i.e., insert/delete line) redisplay with a buffer gap scheme, without having to set all sorts of flags to speed things up? How does GNU do it? I know it started out as Gosling's emacs, but the last time I looked I didn't see any code hashing on hashing on the contents of line.
graham@sce.UUCP (Doug Graham) (06/29/88)
> published in some computer magazine WAY BACK. Byte, maybe.) The > problem comes when the gap gets filled in; if you've got the whole address > space to yourself, you just tell the user "tough luck," but if you're > in a multitasking environment, you need to ask for more space; I'd > make a linked list of blocks, each with its own gap. If a block > is empty, you free its space. Another problem with the single chunk of memory with a gap in the middle, is that a lot of time can be spent copying text to one end or the other of the memory space whenever the gap must be moved. This becomes very apparent when editing a large file on slow machine like this little PC here. Seems that your approach would solve this problem because the maximum amount of text that would have to be copied could be limited to the block size. But I think you could wind up wasting a lot of memory with half filled blocks. On a related note, I would like to Jonathan Payne about the structure he used for Jove. It uses a linked list of line structures kept in memory. The actual text of the line is kept in "virtual memory" with a virtual address in the line structure pointing to it. I would like to know why he chose to keep the line structures in memory rather than putting them in "virtual memory" as well. This decision means that Jove cannot be used to edit files containing a large number of lines on machines with a limited amount of memory. Are there advantages to doing things this way? Doug.
jpayne%breakpoint@Sun.COM (Jonathan Payne) (06/30/88)
In article <394@sce.UUCP> graham@sce.UUCP (Doug Graham) writes: >On a related note, I would like to [ask] Jonathan Payne about the structure >he used for Jove. It uses a linked list of line structures kept >in memory. The actual text of the line is kept in "virtual memory" >with a virtual address in the line structure pointing to it. >I would like to know why he chose to keep the line >structures in memory rather than putting them in "virtual memory" >as well. This decision means that Jove cannot be used to edit >files containing a large number of lines on machines with a limited >amount of memory. Are there advantages to doing things this way? > >Doug. Ignorance. You kidding? I hadn't even HEARD of virtual memory when I started JOVE. A neat idea, I guess. I always hated the way JOVE's main limitation was the number of lines it could hold. Of course, I haven't noticed ever since I switched to a VAX, and now to Sun's. I love pdp11's, but Boy! who wants to think about memory in such silly ways as that anymore? Oh, and you'd be amazed how fast a busy computer can copy a megabyte when the gap moves. (Or maybe you wouldn't.) Mind-boggling. Computers are so fast, even when they're slow.