blandy@marduk.cs.cornell.edu (Jim Blandy) (06/27/88)
Okay everyone - I'm sick and tired of the old linked-list-of-lines and array-of-pointers-to-lines tricks - I've been trying to think of a new, creative data structure where insertion is free, block copies are minimal, memory isn't wasted, and the buffalos roam free (or are they GNUs?) What's the best data structure you've come up with (or heard of) for representing a text buffer? Does it implement true markers (you set them, they stay with their text as things move around)? If a big file shrinks, will the space be returned to the free pool? My favorite is a combination of the 'gap-in-the-middle' approach and the linked list approach. In the gap-in-the-middle approach, the buffer is one contiguous block of memory, where the unused portion acts as a gap for insertion and deletion; you move the gap by shifting characters from one side of it to the other. Once you've got the gap where you want it, insertion is as easy as storing the characters and shrinking the gap; to delete, enlarge the gap to enclose the characters you want gone. (I got this approach from an editor published in some computer magazine WAY BACK. Byte, maybe.) The problem comes when the gap gets filled in; if you've got the whole address space to yourself, you just tell the user "tough luck," but if you're in a multitasking environment, you need to ask for more space; I'd make a linked list of blocks, each with its own gap. If a block is empty, you free its space. Okay, now I've given you my favorite data structure. Feel free to use it if you like. Now what's your favorite? If you'd get fired for revealing it, don't bother, but if you can, post a quick description. -- Jim Blandy - blandy@crnlcs.bitnet "insects were insects when man was just a burbling whatsit." - archie
jpayne%breakpoint@Sun.COM (Jonathan Payne) (06/28/88)
I like the buffer gap approach, too. If you have VM it makes a lot of sense. But I also like linked lists because it makes a complete redisplay very cheap. A typical redisplay algorithm assigns a unique ID to every line that's displayed on the screen, and whenever an update is done the new ID's are compared to the old ones, and lines are inserted or deleted (from the screen) or just redrawn as necessary. With a linked list the unique ID can be just the pointer to the line, which is very quick to calculate. In Gosling's emacs the unique ID is gotten by hashing on the contents of the line, which is rather time consuming. Do you know a good way to do intelligent (i.e., insert/delete line) redisplay with a buffer gap scheme, without having to set all sorts of flags to speed things up? How does GNU do it? I know it started out as Gosling's emacs, but the last time I looked I didn't see any code hashing on hashing on the contents of line.