pcg@aber-cs.UUCP (Piercarlo Grandi) (12/29/89)
There has been some discussion recently about the memory usage of GNU Emacs. I had stated that, using my sbrk-lowering malloc package, I could get MicroGnu etc. to release memory to the OS, but not GNU Emacs. I speculated that this was because when allocating a buffer something is allocated after it that is not released when the buffer is deallocated. Well, I was wrong, but there is also an interesting story.

I looked at the code in buffer.c and insdel.c, which are the low level editing engine parts of GNU Emacs. What happens is that when a buffer is created, a buffer header is allocated, and then memory for the buffer contents is reserved (just a little initially, 20 bytes). When it is deleted, the contents and some other data structures are released, but *not* the buffer header and some dependent data structures. These are only released at the next garbage collection. Indeed, if after killing a buffer I collect, the virtual address space size of my GNU Emacs process shrinks, as it should (using the proper malloc library. Don't ask for it, by the way -- it has been posted to alt.sources, and I will shortly contribute it to comp.sources, and/or to the FSF).

This is not the end of the story. It also appears that the buffer contents are kept using the "gap" method, in which the entire file is represented as a single huge string, and at the (insertion) point there is a "gap". When the gap is filled by too many insertions, the gap is recreated by shifting the tail of the buffer upwards. Unfortunately, as it is implemented in GNU Emacs, the "gap" method is a disgrace for performance, because:

1) Every time you move the insertion point by N characters, N characters are *copied* from one extreme of the gap to the other to shift it.

2) Thank goodness the gap is moved only when an insertion happens at a new point, not every time the point moves. Too bad that it takes three seconds on a Sun 3/50 to move the gap by 750K... something like 250 KBytes per second of bandwidth, which is more like a poorly optimized disc or an Ethernet.

3) The copy is effected by the classic C loop, in segments of 32000 bytes:

	while (--i >= 0)
	    *--to = *--from;

while of course if i is greater than a few dozen bytes memcpy(3) or bcopy(3) should be used (a memmove(3)-based sketch of the gap move appears at the end of this article), hoping that these do cache line sized and aligned transfers to avoid catastrophic performance on byte writes.

4) You may argue that since the gap is created at about 2000 bytes, and usually many insertions are done at the same point, the cost of moving the gap is spread across many insertions. Too bad that the cost is still so terribly high.

5) The idea of using the gap method is not too convincing in itself, mostly because of its poor virtual memory profile. Most of the going around the huge line that is the buffer contents will wreak havoc with most paging algorithms, because it is strictly sequential, and this is a known bad pattern of reference for most LRU based paging algorithms. When the gap is moved, hell ensues. Programs with sequential access characteristics to memory have been reported to markedly improve their performance when it was possible to inform the pager of their peculiarity. By the way, this is why, all other advantages notwithstanding, memory mapped files often perform less well than traditional I/O; the latter is usually heavily tuned to expect sequential access patterns, both as to reading ahead and to forgetting behind. Larger page sizes help a bit with the former, but with some loss elsewhere.
6) Having a linked list of lines is a fairly cheap technology, and inasmuch as it can make the working set smaller (no need to move the gap) it will often actually *save* memory, even if it has an overhead for the links (often smaller, however, for small files than the cost of keeping a gap).

7) Most interestingly, when the gap closes, because we have inserted that much worth of data, the *entire* buffer contents is realloc()ated. If the buffer is not the top one in virtual memory, or it is but you have a realloc() that will not attempt just extending a block, but simply allocates a new block and copies the old into the new, you are in for extra overheads. They are no longer proportional to the amount of work you do, but to the total size of the file as well, which may be bad news.

8) Most interestingly, realloc()ating the buffer will leave large holes in your virtual address space, that will expand forever, taking up if anything swap space. The holes are also likely to get fragmented as your session progresses (remember, GNU Emacs has such high overheads that you are supposed to start it up once as a server and then use emacsclient as the "real" editor), and therefore the virtual memory profile of your session will worsen steadily.

9) GNU Emacs also has some other interesting overheads. The screen display procedure is no speed demon either...

What are the solutions? Well, essentially GNU Emacs is not built for paged virtual memory machines; it is designed for core based segment swapping systems. It is not by chance that the gap method was very popular on base+limit relocation machines, and (I think) comes from the editor on the MIT CTSS.

There are palliatives of course. The default size of the gap could be increased; this would make realloc()ations less frequent, and would not greatly increase the working set size, as the pages making up the gap are never referenced. A gap of say 4 pages per buffer may do, and actually save more address space than more frequent realloc()ations. Use also a nice realloc() that will try to avoid copying blocks to be extended as much as possible. Improving the locality of the lisp area with a compacting collector may help, as would a faster, simpler redisplay. Profiling GNU Emacs can be great fun I suppose.

The better solution, made relatively easy by the reasonably modular and layered structure of GNU Emacs, would be to accept The Emacs Cookbook recommendation (adopted by Jove and the MicroEmacs/Gnu family of editors) and implement a linked list system. I would suggest just reading (or mapping, on the operating systems that allow it) the file to be edited as a large lump of virtual address space, then building a linked list of pointers to lines therein, and then allocating modified lines from a general purpose arena. Informing the OS of the peculiar access patterns would also help, if possible.

To keep the line descriptor size small (and thus its overhead) it would be perfectly feasible I think to have for example a byte sized line length field, and then use hash linking for storing the lengths of the (expectedly few) lines longer than 254 characters. Many other tricks could be used.

I am very busy currently doing other work for the greater glory of the FSF, so I hope somebody else will enjoy the honour of doing something about this. If Toshiba, Matsushita, Samsung and the other DRAM suppliers don't get him/her first :->.
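For concreteness, here is a minimal sketch of the gap move from point 3, using memmove(3) in place of the byte loop. The structure and names are illustrative only, not GNU Emacs's actual buffer.c code:

    #include <string.h>

    /* Illustrative gap buffer; the field names are hypothetical,
       not those used by GNU Emacs. */
    struct gapbuf {
        char   *text;       /* start of the allocated region       */
        size_t  gap_start;  /* offset of the first byte of the gap */
        size_t  gap_end;    /* offset one past the last gap byte   */
    };

    /* Move the gap so that it starts at offset pos.  The cost is
       proportional to the distance moved, not to the buffer size;
       the complaint above is about doing this copy one byte at a
       time.  memmove is safe for the overlapping regions involved. */
    static void move_gap(struct gapbuf *b, size_t pos)
    {
        size_t gap_len = b->gap_end - b->gap_start;

        if (pos < b->gap_start)        /* shift [pos, gap_start) up  */
            memmove(b->text + pos + gap_len, b->text + pos,
                    b->gap_start - pos);
        else if (pos > b->gap_start)   /* shift text after gap down  */
            memmove(b->text + b->gap_start, b->text + b->gap_end,
                    pos - b->gap_start);
        b->gap_start = pos;
        b->gap_end   = pos + gap_len;
    }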
--
Piercarlo "Peter" Grandi            | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth         | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK  | INET: pcg@cs.aber.ac.uk
tom@ssd.csd.harris.com (Tom Horsley) (12/31/89)
>6) Having a linked list of lines is a fairly cheap technology, and inasmuch
>as it can make the working set smaller (no need to move the gap) it will
>often actually *save* memory, even if it has an overhead for the links (often
>smaller however for small files than the cost of keeping a gap).

I don't want to start any arguments about the merits of one buffer data structure over the other, but before anyone wastes a lot of time making a change like this under the assumption that GNU Emacs is a nice modular system with good data hiding principles applied, etc., they should take a look at the regular expression pattern matching code (just to mention one example that would disintegrate). It expects the data for the buffer to be in 1 or 2 chunks. Only RMS knows how deeply entwined this data structure is in other parts of GNU Emacs, but it is not something to change in one afternoon.

Actually, my biggest objection to emacs is garbage collection. I would do anything (short of actually doing the work to implement it :-) to have an incremental garbage collector. There is nothing more disruptive to your train of thought than being in the middle of coding some algorithm you have finally gotten your brain to understand, then having everything stop while the "Garbage collecting..." message comes up.
--
=====================================================================
domain: tahorsley@ssd.csd.harris.com       USMail: Tom Horsley
  uucp: ...!novavax!hcx1!tahorsley                 511 Kingbird Circle
    or ...!uunet!hcx1!tahorsley                    Delray Beach, FL 33444
======================== Aging: Just say no! ========================
woods@robohack.UUCP (Greg A. Woods) (01/01/90)
Re: article <1558@aber-cs.UUCP> by pcg@aber-cs.UUCP (Piercarlo Grandi)

I was about to retrieve a copy of GNU Emacs in order to determine the buffer management scheme it used. I am a Jove user, and have been more than satisfied with Jove. Recently I've had the occasional need to use an editor with an extension language. Given that Jove is a known beast, I was considering building a version with Elk hooked in (Elk is a Scheme interpreter). I wanted to make sure Jove would be a good base, and that GNU Emacs was indeed a waste of disk space. Your article has convinced me that GNU Emacs is indeed, in my case, a waste of disk space.

However, you have also given me a few ideas. Jove needs a fair amount of clean up before it will be able to edit binary files. It has a hard-coded line length limit, and uses str* functions in places to shuffle data around. I don't know to what extent this cleanup will have to go yet. Jove has several implementation features which make it desirable on both small machines and virtual memory machines. It may just be more valuable to do this work than to fix GNU Emacs in the ways you have described. Of course fixing the line length limit will have to be done with care to allow the occasional editing of binary files to be reasonably efficient.

Once Jove has been cleaned up, it will be a complete, functional subset of GNU Emacs, the missing piece being primarily the lisp extension language. If I can successfully hook in Elk, I'll then have an editor which is a functional equivalent of GNU Emacs, with the added benefit of having several compile time options which will make it much smaller and more efficient.
--
Greg A. Woods
woods@{robohack,gate,tmsoft,ontmoh,utgpu,gpu.utcs.Toronto.EDU,utorgpu.BITNET}
+1 416 443-1734 [h]  +1 416 595-5425 [w]  VE3-TCP  Toronto, Ontario; CANADA
rlk@think.com (Robert Krawitz) (01/01/90)
Very interesting note. It explains a large number of observations I have made over the years (some of them I was aware of long before reading your note, as I did a lot of work on rmail around 1985, but this ties a lot of stuff together).

1) Rmail is very slow when getting new mail from an inbox. I was aware of this very early, and I understood why (the gap). Rmail normally has to convert each message to babyl format by making a few small edits on each message. When I worked with pmd (personal mail daemon), I put in code to permit mailboxes to be written in rmail format, thereby not requiring any conversion to be done. This speeds up emacs substantially. However, certain operations (such as computing a summary buffer) are still slow. This is in part because rmail writes the summary line into the message header (to cache it for future use). I was never in favor of this, but I never thought too hard about the fact that it edits in the same pattern.

BTW, a favorite benchmark of mine involves the following: converting a large number of messages (1000, say) to babyl format, and deleting and expunging these same 1000 messages. The messages are deliberately kept small (a very small header and one line of body) to minimize paging effects. My experience was that the early IBM RT (which was otherwise a real dog) could keep up with a Microvax II on this test, and that in general RISC machines do extremely well on this test (they run emacs very well in general, as it happens).

2) Emacs dies very quickly after its virtual size exceeds 16 Mbytes, due to the 24 bit pointers used (the top 8 bits are used as tag bits for the Lisp interpreter). I have frequently noticed that killing off old buffers does not permit me to prolong the life of my emacs session, and that an emacs with a Lisp buffer (which grows rapidly but erratically) tends to run out of room quickly. This I assume is due to the constant realloc'ing going on.

I don't necessarily agree that the issue is design for virtual memory vs. swapping, by the way. There is a general problem in emacs with a lot of things being scaled poorly, or otherwise underdesigned. For example, the 24 bit limit on integers (23 bit signed integers in lisp, 24 bit pointers internally), the inexplicable and seemingly gratuitous divergences from Common Lisp, etc. The 24 bit integer/pointer problem worried me even in 1985, but RMS wasn't too interested in hearing about it. The problem is only really showing up now (for example, my Sparcstation has 16 MB of physical memory and 100 MB swap, and I run big emacs processes).

Judging by your comments, the memory management scheme was similarly unplanned. I don't think it was designed with swapping systems in mind; I simply don't think it was designed to any great degree. A real pity, since no other Unix editor shows any more design. I wish it had been done right in the first place. It's not clear to me that any of this will ever be fixed.
--
ames >>>>>>>>> |  Robert Krawitz <rlk@think.com>     245 First St.
bloom-beacon > |think!rlk (postmaster)               Cambridge, MA  02142
harvard >>>>>> .  Thinking Machines Corp.            (617)876-1111
peter@ficc.uu.net (Peter da Silva) (01/01/90)
In article <32534@news.Think.COM> rlk@think.com (Robert Krawitz) writes:
> 2) Emacs dies very quickly after its virtual size exceeds 16 Mbytes,

This is a scary thought all by itself, but...

> due to the 24 bit pointers used (the top 8 bits are used as tag bits for
> the Lisp interpreter).

Come *ON*, why is this accepted? I remember not so very long ago people were flaming Microsoft for doing this on the Macintosh, which led to Microsoft applications not working on >1MB machines after the extra bits started getting decoded...

Not to mention that making assumptions about pointer formats is dangerous in and of itself. Certainly you can't *use* such pointers without stripping them, which adds overhead to every memory reference. And (per recent discussions in comp.std.c) there are architectures out there that won't even let you load them into an address register.
--
`-_-'  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
'U`    Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>.
"It was just dumb luck that Unix managed to break through the Stupidity
Barrier and become popular in spite of its inherent elegance."
        -- gavin@krypton.sgi.com
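For readers who have not seen the scheme being criticized, a sketch of tag-in-the-pointer packing in the style of Emacs 18; the masks and macro bodies here are illustrative, not Emacs's exact definitions:

    /* A Lisp value packed into one word: 8 tag bits over 24 value
       bits.  Illustrative only. */
    typedef unsigned long Lisp_Object;         /* assumed >= 32 bits */

    #define VALBITS  24
    #define VALMASK  ((1UL << VALBITS) - 1)

    #define XTYPE(o) ((int)((o) >> VALBITS))   /* extract type tag  */
    #define XPNTR(o) ((char *)((o) & VALMASK)) /* strip tag to use  */

    /* Every dereference pays for the masking, and no Lisp datum may
       live above the 16 MB that 24 bits can address -- hence the
       death at a 16 Mbyte virtual size. */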
pcg@aber-cs.UUCP (Piercarlo Grandi) (01/03/90)
I have done some timings on my 4 MIPS 386 clone, with System V.3.2, and using crash(1M) to get the times (down to centiseconds) for the tests I have made. The first table is about editing /etc/termcap, which is about 80KBytes long. The times are cumulative in seconds.centiseconds, and they have a quite wide margin of variability, but are "typical":

OPERATION       Emacs 18.54     MicroGNU 2a     Jove 4.12
                user    sys     user    sys     user    sys
1) startup      0.45    1.10    0.02    0.10    0.12    0.67
2) C-X C-F      0.51    1.96    0.71    0.44    0.66    1.18
3) C-X C-Q      0.54    2.08    0.74    0.47    0.66    1.20
4) SP           0.62    2.26    0.74    0.48    0.66    1.20
5) M->          0.65    2.45    0.78    0.53    0.67    1.26
6) SP           0.71    2.58    0.78    0.53    0.67    1.26
7) SP           0.71    2.70    0.78    0.53    0.67    1.26
8) M-<          0.75    3.24    0.82    0.56    0.69    1.36
9) SP           0.82    3.53    0.82    0.75    0.69    1.36

Comments:

1) Is just to invoke an empty editor. GNU Emacs pays dearly for its size and complication here.

2) Reads 80KB in. GNU is relatively quick, it just copies the file to memory. MicroGnu reads it into memory and threads it into a list of lines, and Jove copies it to a temporary file and threads into a list the offsets of lines in this file.

3) This is only to remove the RO property from the buffer.

4) Insert a space at the beginning of buffer. GNU has to move the gap, which is initially at the end of memory, and pays a lot here. The other two take less than a centisecond, GNU seven. This is 70 milliseconds for moving 80KB, or somewhat more than a MB per second on a 4 MIPS machine. Note the relatively high system time for Emacs. This is due mostly to redisplay.

5) Go to the end of buffer. This involves no gap movement, just redisplay. System times show it. On my 386 the frame buffer (a Hercules clone) is abjectly slow, and the device driver for it correspondingly clocks up system time.

6) Insert a space as the last character. This moves the gap again, and it shows. Also redisplay.

7) Add a second space at the end. Just redisplay really, and minimal as to that.

8) Go back to the beginning of buffer.

9) Insert another space there.

I have some other times for GNU Emacs before and after putting in memcpy(3) instead of looping. Well, moving the gap is about three times faster. Note that on cached machines this can be further improved; the hardware string move instructions typically hard coded in memcpy(3) and friends are often very suboptimal. On a cached machine you really want to split your transfer in three parts (if long enough!): a head and a tail that are copied byte-by-byte, and a body that is copied one cache line at a time, and is cache line aligned in the *destination* (a sketch follows below). Especially on machines with write-through caches, writing cache line aligned, cache line sized chunks is *vastly* faster, as it avoids the cache fetching a line only to partially update it, and holdups at the memory interface. I remember that on a VAX-11/780 with an 8 byte SBI write buffer, writing aligned 8 byte transfers was *tremendously* effective.

The lesson I derive from these timings is that creating the linked list of lines, and especially copying the file to a temporary as well, slow down file reading time, but then further operations become very much faster. Note also that both MicroGnu and Jove are somewhat carelessly coded, with lots of quadratic algorithms.

Ah, another note: in my favourite editor buffer organization, I would use the gap method for *intra* line editing, as it avoids a lot of these quadratic situations (e.g. repeated inserts or deletes).
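A sketch of that three-part copy, assuming a 16-byte cache line and non-overlapping source and destination; the constant and names are placeholders, and a real implementation would pick the line size per machine and move the body a line at a time in assembler:

    #include <stdint.h>
    #include <string.h>

    #define LINE 16   /* assumed cache line size; machine dependent */

    /* Head and tail byte-by-byte, body in chunks that are cache line
       sized and cache line aligned in the *destination*. */
    static void aligned_copy(char *to, const char *from, size_t n)
    {
        size_t head = (LINE - (uintptr_t)to % LINE) % LINE;

        if (head > n)
            head = n;
        memcpy(to, from, head);                /* unaligned head */
        to += head; from += head; n -= head;

        while (n >= LINE) {                    /* aligned body   */
            memcpy(to, from, LINE);  /* stands in for a line move */
            to += LINE; from += LINE; n -= LINE;
        }
        memcpy(to, from, n);                   /* unaligned tail */
    }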
--
Piercarlo "Peter" Grandi            | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth         | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK  | INET: pcg@cs.aber.ac.uk
jpayne%flam@Sun.COM (Jonathan Payne) (01/03/90)
All this talk about emacs and buffer gap vs. linked list is interesting, and I just have to put in my two cents worth. I've already implemented my own version of emacs using linked lists, so that's my experience. I read the emacs cookbook for the redisplay algorithm before I started JOVE. I picked linked list because I knew I could not store the files in memory, simply because I wrote JOVE for the PDP-11/70. But since I moved to bigger machines I have also put off using buffer gap because I could not quite see how to make the redisplay algorithm be as smart and cheap as JOVE's.

My first exposure to buffer gap was checking out Gosling's emacs (that thing was so full of good ideas!) and the first thing I noticed was how amazingly smart the redisplay algorithm was, but also how amazingly expensive it was. I couldn't see how to make a smart redisplay algorithm be fast with buffer gap. I have since then learned, and now I am writing another version of JOVE using buffer gap. I will never use a linked list of lines approach again. Buffer gap is SO much better in lots of ways from an implementor's point of view, and often from the user's point of view. Here are the reasons that come to mind:

o File i/o is blindingly fast. It's standard to implement read-file the following way:

  - make sure the buffer gap is big enough for the whole file
  - make one call to read to read the whole file into the buffer

  That's it. Simple as anything, FAST as anything. On my Sun 3/60 I can read in a megabyte in a second. Writing files is done in two chunks, the data to the left of the gap and then the data to the right.

o A buffer position is represented as an integer. That's very convenient in lots of ways. Forward character is simply

      b->point += 1;

  with some bounds checking. No special casing being at the end or beginning of a line. You can then define marks, which are easy to update for insertions. There are hacks you can use so that marks don't need updating after every insertion, but rather after every time the gap is moved, which is relatively rare. You can define marks with lengths associated with them, and mark them as dirty when changes are made within the span of the mark. All are possible with linked list, but the code is harder to write, less reliable, and the implementation will be slower.

o Insert and delete operations of ANY kind are trivial to implement. Move the gap to the point of insertion, and either make the gap smaller (for insertion) or make the gap bigger (for deletion). No splicing together lines in a linked list and garbage like that.

o No line length hassles.

o Undo/redo is trivial to implement in this scheme. I just implemented undo/redo in a prototype I am working on; very easy to understand, very nice semantics. I am not crazy about undo in VI, Gosling's emacs, or GNU Emacs - I'd be curious to see what you think.

o Regular expression search is MUCH cleaner. I ripped the code out of JOVE and converted it to buffer gap, and it's fast and much smaller, and definitely much easier to understand, and is more functional (it searches across newline boundaries).

Lessee. You complained about the memory usage and the slowness of buffer gap. First of all, I think the average file in UNIX is much less than 100k. Actually I KNOW the average file is much less than that, but for the hacker types, I bet the average size of a source file is also less than 100k, more like 50k and below. The point is, moving the gap around is not that big a deal.
The amount of copying done is directly related to how far the gap is moving, not how big the file is, and most of the time the gap does not move very far! It just doesn't.

I thought paging algorithms were geared towards sequential access, and that some versions of UNIX provided a system call to tell the pager not to try and be smart about paging in adjacent pages for certain processes, like LISP processes during garbage collection. But ... I could be wrong.

>7) Most interestingly when the gap closes, because we have inserted that
>much worth of data, the *entire* buffer contents is realloc()ated. If the
>buffer is not the top one in virtual memory, or it is but you have a
>realloc() that will not attempt just extending a block, but simply allocates
>a new block and copies the old into the new, you are in for extra overheads.
>They are no longer proportional to the amount of work you do, but to the
>total size of the file as well, which may be bad news.

>8) most interestingly, realloc()ating the buffer will leave large holes in
>your virtual address space, that will expand forever, taking up if anything
>swap space. The holes are also likely to get fragmented as your session
>progresses (remember, GNU emacs has such high overheads that you are
>supposed to start it up once as a server and then use emacsclient as the
>"real" editor), and therefore the virtual memory profile of your session
>will worsen steadily.

These are the main problems with buffer gap.

>9) GNU emacs also has some other interesting overheads. The screen display
>procedure is no speed daemon either...

The redisplay algorithm can be kept smart and cheap.

>The better solution, made relatively easy by the reasonably modular and
>layered structure of GNU emacs, would be to accept The Emacs Cookbook
>recommendation (adopted by Jove and the MicroEmacs/Gnu family of editors) and
>implement a linked list system. I would suggest just reading (or map, on the
>operating systems that allow it) the file to be edited as a large lump of
>virtual address space, then building a linked list of pointers to lines
>therein, and then to allocate modified lines from a general purpose arena.
>Informing the OS of the peculiar access patterns would also help, if
>possible.

Again, I think this would muck up the rest of the code in the editor. How would you access consecutive pieces of the buffer? In buffer gap, you do CharAt(pos). What would that look like in your new design? At the least you would need accessor functions (not macros, either) to deal with boundary conditions. OR, you have to move the knowledge of how the buffer is stored into the places that need good performance. I had to do that for the redisplay algorithm in JOVE and for the regular expression searches.

I'm not saying all this isn't possible. It's just my belief that it's not clearly a win. In fact, in my eyes, for 99% of the editing that I do, a buffer gap solution makes the most sense.

Just my two cents...

Jonathan Payne
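A sketch of the primitives described above: point as a plain integer, a CharAt() that resolves a logical position across the gap, and insertion as gap consumption. The names follow the post's usage, but the details are illustrative, not JOVE's actual code:

    /* The gap is invisible to callers: logical positions 0..length-1
       are mapped over it by the accessor. */
    struct buffer {
        char *text;
        long  gap_start, gap_end;   /* physical offsets of the gap  */
        long  length;               /* logical length, gap excluded */
    };

    #define CharAt(b, pos) \
        ((b)->text[(pos) < (b)->gap_start \
                       ? (pos) \
                       : (pos) + ((b)->gap_end - (b)->gap_start)])

    extern void move_gap_to(struct buffer *b, long pos);  /* assumed
        helper, as sketched earlier in the thread */

    /* Insert one character at point: move the gap there, then eat
       one byte of it.  Growing an empty gap is where realloc() and
       the overheads quoted below come in. */
    static void insert_char(struct buffer *b, long point, char c)
    {
        move_gap_to(b, point);
        b->text[b->gap_start++] = c;
        b->length++;
    }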
pinkas@joker.intel.com (Israel Pinkas) (01/03/90)
In article <1558@aber-cs.UUCP> pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
> 7) Most interestingly when the gap closes, because we have inserted that
> much worth of data, the *entire* buffer contents is realloc()ated. If the
> buffer is not the top one in virtual memory, or it is but you have a
> realloc() that will not attempt just extending a block, but simply allocates
> a new block and copies the old into the new, you are in for extra overheads.
> They are no longer proportional to the amount of work you do, but to the
> total size of the file as well, which may be bad news.
> 8) most interestingly, realloc()ating the buffer will leave large holes in
> your virtual address space, that will expand forever, taking up if anything
> swap space. The holes are also likely to get fragmented as your session
> progresses (remember, GNU emacs has such high overheads that you are
> supposed to start it up once as a server and then use emacsclient as the
> "real" editor), and therefore the virtual memory profile of your session
> will worsen steadily.

One thing to note is that in most malloc() packages (including the one in GNU Emacs), malloc() actually returns a pointer to a block of memory that is 2^n words long. (Actually, the pointer usually points one or two words into the block, to allow the size of the block to be stored so that free() knows how big the block is.) The man page for realloc() specifically states that the block may or may not be moved.

If you ask for a smaller block, and cross the 2^n word threshold, realloc can do one of three things:

1) grab a smaller block off the free list, and copy the data
2) split the block, if that is supported
3) silently hand back the same block

Obviously, 2 is the fastest, but tends to fragment memory. 3 is the same speed, but wastes memory. Some versions will use combinations of the 3 schemes, depending on what the free list looks like and current memory usage.

When realloc() is called to allocate a larger chunk of memory, it will often be able to return back the same chunk. Remember, if the current buffer is 100,000 bytes long, the block actually has room for 131,068 bytes (2^17 - 4). Calling realloc() with a size of 102,000 is a no-op. As the buffer gets larger, realloc() has to move the block less frequently. At a certain point, the overhead is negligible. (With the 100,000 byte buffer, we would have to run out of room 16 times to necessitate moving the block. We would then have room for 66 gaps (over 130k) before having to move the block again.)

-Israel
--
--------------------------------------
Disclaimer: The above are my personal opinions, and in no way represent the opinions of Intel Corporation. In no way should the above be taken to be a statement of Intel.

UUCP: {decwrl,hplabs,oliveb,pur-ee,qantel}!intelca!mipos3!st860!pinkas
ARPA: pinkas%st860.intel.com@relay.cs.net
CSNET: pinkas@st860.intel.com
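A sketch of the rounding just described, assuming a power-of-two allocator that keeps one 4-byte size word per block (both details vary between malloc implementations):

    #include <stddef.h>

    #define OVERHEAD 4   /* assumed: one 4-byte size word per block */

    /* Usable bytes in the smallest 2^n block that fits a request. */
    static size_t usable_size(size_t request)
    {
        size_t block = 16;             /* smallest bucket, assumed */

        while (block - OVERHEAD < request)
            block <<= 1;
        return block - OVERHEAD;
    }

    /* usable_size(100000) == 131068, i.e. 2^17 - 4: the 100,000 byte
       buffer can grow to 131,068 bytes before realloc() must move it. */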
mike@umn-cs.CS.UMN.EDU (Mike Haertel) (01/03/90)
A word of warning regarding replacing the hand coded gap move with calls to memcpy(): memcpy() is not guaranteed to work correctly for overlapping source and destination regions. The proposed ANSI standard requires memmove(), which does make this guarantee. bcopy() on 4.3BSD VAXes will work correctly, but this is not documented and may not work on other BSD derivatives.
--
Mike Haertel <mike@ai.mit.edu>
"Of course, we have to keep in mind that this year's Killer Micro is next year's Lawn Sprinkler Controller . . ." -- Eugene Brooks
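An illustration of the point: when the gap moves by less than its own length, source and destination overlap, and only memmove() (or a correctly written loop) is guaranteed to cope. The sizes here are arbitrary:

    #include <string.h>

    /* Shift a span down over its own tail, e.g. closing a 2,000 byte
       gap in front of 10,000 bytes of text.  The regions overlap, so
       memcpy() here would be undefined behavior; memmove() copies as
       if through a temporary buffer. */
    static void close_gap(char *base)
    {
        memmove(base, base + 2000, 10000);
    }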
idall@augean.OZ (Ian Dall) (01/03/90)
In article <1561@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
}
}OPERATION       Emacs 18.54     MicroGNU 2a     Jove 4.12
}                user    sys     user    sys     user    sys
}1) startup      0.45    1.10    0.02    0.10    0.12    0.67
}2) C-X C-F      0.51    1.96    0.71    0.44    0.66    1.18
}3) C-X C-Q      0.54    2.08    0.74    0.47    0.66    1.20
}4) SP           0.62    2.26    0.74    0.48    0.66    1.20
}5) M->          0.65    2.45    0.78    0.53    0.67    1.26
}6) SP           0.71    2.58    0.78    0.53    0.67    1.26
}7) SP           0.71    2.70    0.78    0.53    0.67    1.26
}8) M-<          0.75    3.24    0.82    0.56    0.69    1.36
}9) SP           0.82    3.53    0.82    0.75    0.69    1.36
}
}6) Insert a space as the last character. This moves the gap again, and
}it shows. Also redisplay.
}
}7) Add a second space at the end. Just redisplay really, and minimal as to
}that.

Your data does not justify your conclusion. If the .71 sec for GNU Emacs in 6 is (mainly) due to the movement of the gap, then why is the time identical for 7 when, as you say, no gap movement occurs? My guess (from the time GNU Emacs is spending in system state) is that it is doing more redisplay than the other two, and your time would be better spent looking for improvements in the redisplay algorithm.
--
Ian Dall      life (n). A sexually transmitted disease which afflicts
              some people more severely than others.
idall@augean.oz
pcg@thor.cs.aber.ac.uk (Piercarlo Grandi) (01/04/90)
In article <129799@sun.Eng.Sun.COM> jpayne%flam@Sun.COM (Jonathan Payne) writes:

   I picked linked list because I knew I could not store the files in
   memory simply because I wrote JOVE for the pdp11/70.

I have been using Jove for many years now. Well, five or six... On PDPs. I am now using it instead of GNU, by the way. While it is coded in a way I disagree with, it has steadily improved, and the mix of features offered/not offered is remarkably balanced.

   But since I moved to bigger machines [... and am going to use gap for
   Jove ... ]

Moving to bigger machines, and all hell breaks loose... Aaaaghhhh.

   I will never use a linked list of lines approach again. Buffer gap is
   SO much better in lots ways from an implementor's point of view and
   often from the user's point of view. Here are the reasons that come
   to mind:

As to that, the simplest editor can be built with the two tape model of buffer management, but I hardly recommend it...

   o File i/o is blindingly fast. It's standard to implement read-file
     the following way: [ .... ]

This is also trivial if you use a log method for updates. You read in the original, and then you keep a log of changes. It is also true that people often edit files for long times, and the cost of reading them in is not that important.

   Writing files is done in two chunks, the data to the left of the gap
   and then the data to the right.

With the log method this is more complicated, but not dramatically so.

   o A buffer position is represented as an integer. [ .... ] All are
     possible with linked list, but the code is harder to write, less
     reliable, and the implementation will be slower.

But not terribly slow. And with costs being constant, instead of O(n).

   o Insert and delete operations of ANY kind are trivial to implement.
     [ ... ]

As you say, just copy everything around... This is a *problem*, not a solution.

   o No line length hassles.

You have to work a bit harder. On the other hand, most lines an editor sees are short. So you can just have more complicated code for the case where line length overflows a predetermined limit.

   o Undo/redo is trivial to implement in this scheme.

Undo/redo is also trivial with a log approach.

   The point is, moving the gap around is not that big a deal.

Oh yeah. Too bad that constant time alternatives exist, instead of O(n), and that O(n) uses a lot of memory bandwidth and cycles and paging and... Of course, as always, if you are the only user of a 10 MIPS 16 MByte workstation you can afford to waste a lot of those. To me, however, waste is no good, if it can be easily obviated.

   I thought paging algorithms were geared towards sequential access,

Most paging algorithms implement some approximation of LRU; LRU is *guaranteed* to thrash on sequential access.

   >The better solution, made relatively easy by the reasonably modular
   >and layered structure of GNU emacs, would be to accept The Emacs
   >Cookbook recommendation (adopted by Jove and the MicroEmacs/Gnu
   >family of editors) and implement a linked list system. I would
   >suggest just reading (or map, on the operating systems that allow
   >it) the file to be edited as a large lump of virtual address space,
   >then building a linked list of pointers to lines therein, and then
   >to allocate modified lines from a general purpose arena. Informing
   >the OS of the peculiar access patterns would also help, if possible.

   Again, I think this would muck up the rest of the code in the editor.
   [ ... ] I'm not saying all this isn't possible. It's just my belief
   that it's not clearly a win.
   In fact, in my eyes, for 99% of the editing that I do, a buffer gap
   solution makes the most sense.

Well, well, there are techniques around that. Also, it pays to invest more effort in a widely used utility like an editor than otherwise. The idea that a frequently used, response time critical tool should be coded quick and dirty makes me object vehemently.
--
Piercarlo "Peter" Grandi            | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth         | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK  | INET: pcg@cs.aber.ac.uk
news@bbn.COM (News system owner ID) (01/04/90)
In article <129799@sun.Eng.Sun.COM> jpayne@sun.UUCP (Jonathan Payne) writes:
< ...
< I will never use a linked list of lines approach again.
Amen. (After hacking too much on mg :-) If I never see a linked list
buffer again it will be too soon.
< ...
< Lessee. You complained about the memory usage and the slowness of buffer
< gap. First of all, I think the average file in UNIX is much less than
< 100k. Actually I KNOW the average file is much less than that, but for
< the hacker types, I bet the average size of a source files is also less
< than 100k, more like 50k and below. The point is, moving the gap around
< is not that big a deal. The amount of copying done is directly related
< to how far the gap is moving, not how big the file is, and most of the
< time the gap does not move very far! It just doesn't.
Actually, as I (perhaps mistakenly) recall, the Emacs Cookbook says
that a buffer gap is _more_ efficient than a linked list for files up
to about 75K in size.
< >7) Most interestingly when the gap closes, because we have inserted that
< >much worth of data, the *entire* buffer contents is realloc()ated. If the
< >buffer is not the top one in virtual memory, or it is but you have a a
< >realloc() that will not attempt just extending a block, but simply allocates
< >a new block and copies the old into the new, you are in for extra overheads.
< >They are no longer proportional to the amount of work you do, but to the
< >total size of the file as well, which may be bad news.
<
< >8) most interestingly, realloc()ating the buffer will leave large holes in
< >your virtual address space, that will expand forever, taking up if anything
< >swap space. The holes are also likely to get fragmented as your session
< >progresses (remember, GNU emacs has such high overheads that you are
< >supposed to start it up once as a server and then use emacsclient as the
< >"real" editor), and therefore the virtual memory profile of your session
< >will worsen steadily.
<
< These are the main problems with buffer gap.
Actually, they look to me like problems with this _implementation_ of
buffer gap. Consider making the global data layout something like
<random C data structures>
<used elisp heap space>
<availiable elisp heap space (possibly zero)>
<used buffers>
<unused buffer space>
The lisp heap should be kept with a compacting GC (incremental would
be nice too). New versions of malloc and friends that interact well
with the lisp heap would be required, since the extension of buffer
memory needs to be much more low-level.
The basic idea is that when a buffer needs to be expanded, all the
buffers would be moved to open up the space. This is obviously a
trade-off: it wouldn't leave holes, but it would do *even more* memory
copying. A portable, efficient memcpy would be required for those
systems that don't have a nice in-assembler version.
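A sketch of the expansion step under that layout, with invented names: all buffer contents sit contiguously, and growing one buffer slides everything above it upward in a single move:

    #include <string.h>

    extern char  *buffers_base;    /* start of <used buffers>        */
    extern size_t buffers_used;    /* bytes in use                   */
    extern size_t buffers_avail;   /* bytes of <unused buffer space> */

    /* Open `need' bytes at offset `at' by sliding the region above
       it upward: no holes, but *even more* copying, as noted above.
       Every pointer into the moved region must then be relocated. */
    static int open_space(size_t at, size_t need)
    {
        if (need > buffers_avail)
            return -1;             /* caller must grow the region */
        memmove(buffers_base + at + need, buffers_base + at,
                buffers_used - at);
        buffers_used  += need;
        buffers_avail -= need;
        return 0;
    }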
-- Paul Placeway
rdh@sli.com (01/04/90)
In article <129799@sun.Eng.Sun.COM> jpayne%flam@Sun.COM (Jonathan Payne) writes:
o Regular expression search is MUCH cleaner. I ripped the code
out of JOVE and converted it to buffer gap, and it's fast and
much smaller, and definitely much easier to understand, and is
more functional (it searches across newline bounderies).
Just as an aside, a fast search algorithm dependent on a contiguous buffer
(a gap is a special [degenerate] case of a contiguous buffer) can match
reasonable search strings with an average of less than one machine in-
struction per character searched (for an average PDP-10/68K/x86 class of
machine, anyways). With a linked list buffer scheme, I suspect that the
best one could hope for is an order of magnitude worse. I personally do
*MANY* more searches than I do gap-moving operations...
-RDH
P.S. By "reasonable search strings", I mean at least 4 distinct char-
acters, typically 6 or 8. For example, "read", "write", "(arg1,"
are all "typical, reasonable" search strings that search very
fast, whereas "a" or "zzzzz" would not search nearly as fast.
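The behavior described comes from Boyer-Moore-style skipping, which needs the text in one contiguous piece; a minimal Horspool variant as a sketch (not the poster's actual code):

    #include <stddef.h>
    #include <string.h>

    /* Boyer-Moore-Horspool: at each attempt, look at the text byte
       under the pattern's last position and skip by a precomputed
       shift, so most bytes of the text are never examined at all. */
    static long bmh_search(const char *text, size_t tlen,
                           const char *pat, size_t plen)
    {
        size_t shift[256], i, pos;

        if (plen == 0 || plen > tlen)
            return -1;
        for (i = 0; i < 256; i++)
            shift[i] = plen;
        for (i = 0; i + 1 < plen; i++)
            shift[(unsigned char)pat[i]] = plen - 1 - i;

        for (pos = 0; pos + plen <= tlen;
             pos += shift[(unsigned char)text[pos + plen - 1]])
            if (memcmp(text + pos, pat, plen) == 0)
                return (long)pos;
        return -1;
    }

The short or repetitive strings in the P.S. search slowly because their skip distances collapse toward one.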
fin@uh.msc.umn.edu (Craig Finseth) (01/04/90)
As the author of "The Emacs Cookbook" (actual title: "Theory and Practice of Text Editing --or-- A Cookbook for an Emacs", MIT Laboratory for Computer Science (LCS) Technical Memo 165, May 1980 (it was my B.S. thesis)), I feel that I can make a useful contribution to this discussion.

Linked line vs. buffer gap: I mention on pages 33+ that of the two, buffer gap is the preferred technology for paged virtual memory systems. As a theoretical point, you're always going to be in trouble if your buffer size is larger than your (allowed by the o.s.) working set size. I contend that you are both in *less* trouble and the trouble point is moved up in a buffer gap system.

Tim Bray makes an excellent point:

   "Hold on. What makes you think you know what the problem is? Have you
   instrumented it with the profiler and quantified the overhead of this
   "problem"? My own intuition is that GNUmacs spends much more time
   CONSing and otherwise chugging through the innards of the Lisp
   interpreter. But I wouldn't bet $0.50 on my intuition unless I had a
   good quantitative understanding of what's going on."

Aside from a few pathological cases (RMAIL being a notable one), I would be surprised if gap motion was a significant factor in general editing operations.

Editing is such a general activity, and GNU-Emacs is used for such a wide variety of purposes, that it would be impractical to optimize its performance in all cases. Instead, the trick (as in all programming) is to identify the frequent cases and optimize them. In particular, I consider editing small files and *viewing* large files to both be frequent: editing a large file (especially the "insert at beginning / insert at end / insert at beginning" case) is comparatively rare.

Piercarlo Grandi then presented some data. I took the liberty of combining the user and system times (as we mainly care about wall clock time, not who is paying $$$) and figuring the incremental times for each operation:

OPERATION       Emacs 18.54     MicroGNU 2a     Jove 4.12
                total   inc     total   inc     total   inc
1) startup      1.55    1.55    0.12    0.12    0.79    0.79
2) C-X C-F      2.47    0.92    1.15    1.03    1.84    1.05
3) C-X C-Q      2.62    0.15    1.21    0.06    1.86    0.02
4) SP           2.88    0.16    1.22    0.01    1.86    0.00
5) M->          3.10    0.22    1.31    0.09    1.93    0.07
6) SP           3.29    0.19    1.31    0.00    1.93    0.00
7) SP           3.41    0.12    1.31    0.00    1.93    0.00
8) M-<          3.99    0.58    1.38    0.07    2.05    0.12
9) SP           4.35    0.36    1.57    0.19    2.05    0.00

Comments on Piercarlo's comments:

   1) Is just to invoke an empty editor. GNU Emacs pays dearly for its
   size and complication here.

   2) Reads 80KB in. GNU is relatively quick, it just copies the file to
   memory. MicroGnu reads it into memory and threads it into a list of
   lines, and Jove copies it to a temporary file and threads into a list
   the offsets of lines in this file.

If "GNU is relatively quick" and GNU takes about twice as long as the others, what is the definition of "quickness"?

   3) This is only to remove the RO property from the buffer.

   4) Insert a space at the beginning of buffer. GNU has to move the
   gap, which is initially at the end of memory, and pays a lot here.
   The other two take less than a centisecond, GNU seven. This is 70
   milliseconds for moving 80KB, or somewhat more than a MB per second
   on a 4 MIPS machine. Note the relatively high system time for Emacs.
   This is due mostly to redisplay.

Actually, it takes GNU about the same time to do this as to clear the RO property. Moving the gap is thus probably not the major problem here.

   5) Go to the end of buffer. This involves no gap movement, just
   redisplay. System times show it.
   On my 386 the frame buffer (a Hercules clone) is abjectly slow, and
   the device driver for it correspondingly clocks up system time.

Again, this takes about 50% longer than 4, with no gap motion involved. I would start to suspect redisplay as the real culprit...

   6) Insert a space as the last character. This moves the gap again,
   and it shows. Also redisplay.

Now we have a drop in time with the gap motion. Less redisplay, too. Again, I would really focus on the redisplay time and ignore the gap motion time (unless it is trivial to speed up).

   7) Add a second space at the end. Just redisplay really, and minimal
   as to that.

   8) Go back to the beginning of buffer.

   9) Insert another space there.

8 and 9 further confirm that gap motion is not the major problem WHEN CONSIDERING THE SYSTEM AS A WHOLE.

From the same message:

   "The lesson I derive from these timings is that creating the linked
   list of lines, and especially copying the file to a temporary as
   well, slow down file reading time, but then further operations become
   very much faster. Note also that both MicroGnu and Jove are somewhat
   carelessly coded, with lots of quadratic algorithms."

I claim that the data does not support this conclusion. This does not mean that the conclusion is incorrect. Rather, the data supports the conclusion that GNU-Emacs' redisplay is slow on the specified computer. This analysis parallels that supplied by Ian Dall.

Jonathan Payne supplied an excellent summary of how a buffer gap editor works and the problems with redisplay. As it turns out, even basic redisplay is a hard problem (Knuth "The Art of Computer Programming" level 40). Doing a full redisplay correctly (handling variable-width characters, unlimited line lengths, multiple windows, etc.) is HARD (Knuth level 50).

Some Historical Notes:

Early drafts of the thesis had a chapter "proving" that only a mainframe-type computer could run a full Emacs-type editor. One brilliant insight (:-) later, the chapter was cut and a software company was formed (Mark of the Unicorn). The Mince text editor was not interactively extensible, but was otherwise a full Emacs running on a 48K CP/M system with small floppy disks.

The scheme that was used is called paged buffer gap, and it is briefly mentioned on page 36 of the thesis. The basic idea is that a file is represented as an array of pages. Each ~1K page is a separate buffer gap system. This representation was very efficient for the environment, especially as it formed a natural basis for the paged virtual memory environment required to edit files larger than will fit in memory.

I contend that in a "modern workstation environment" (e.g., Sun 3/60), a simple buffer gap method is preferred over both paged buffer gap and linked line. I leave it as an exercise for the reader to figure out why.

Craig A. Finseth                     fin@msc.umn.edu [CAF13]
Minnesota Supercomputer Center, Inc. +1 612 624 3375
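A sketch of that paged buffer gap representation, with invented structures (Mince's actual layout surely differed):

    #define PAGE_BYTES 1024   /* Finseth's ~1K page; illustrative */

    /* Each page is a miniature gap buffer; pages are chained (or
       kept in an array) in text order. */
    struct page {
        char  text[PAGE_BYTES];
        short gap_start, gap_end;   /* this page's private gap */
        struct page *next;
    };

    /* An insertion touches only one page: move that page's gap (at
       most ~1K of copying, regardless of file size) and consume a
       byte of it.  When a page's gap empties, split the page in two
       instead of realloc()ating the whole buffer. */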
mark@b11.ingr (Mark Jones) (01/04/90)
One big bonus of the buffer gap editing is seen when working over NFS. When you can say "write this big chunk, followed by write this big chunk", you get a reasonably decent speed out of NFS; when you say "for (from now until sometime tomorrow) write this little bitty line;" you get pathetic performance out of NFS. Microemacs does its writes one line at a time, and it is pathetic. What does MicroGnu do?

Mark "If it ain't broke, don't break it" Jones

GNUemacs is the finest editor ever written. It has more features and capabilities than any other editor even dreamed of.
freedman@granite.cr.bull.com (Jerome Freedman) (01/04/90)
What is the "Emacs Cookbook"? Can I get a copy? How? Jerry Freedman, Jr
vjs@calcite.UUCP (Vernon Schryver) (01/05/90)
In article <1025@uc.msc.umn.edu>, fin@uh.msc.umn.edu (Craig Finseth) writes:
> ... Each ~1K page is a separate buffer
> gap system. This representation was very efficient for the
> environment, especially as it formed a natural basis for the paged
> virtual memory environment required to edit files larger than will fit
> in memory.
> ...
> Craig A. Finseth fin@msc.umn.edu [CAF13]
> Minnesota Supercomputer Center, Inc. +1 612 624 3375

An editor called "QED" by Peter Deutch (sp?) of Project Genie at Berkeley used this technique on one of the first paged virtual memory systems, the SDS-940 (later XDS-940). The 940 was one of the intellectual parents of UNIX in general, and I've been told that ed is in some important sense a descendant of QED.

By making the buffer size match the system page size, by keeping a modest amount of free space in each buffer, using your "gap method" within a buffer, linking buffers together using external links, and possibly doing manual/explicit paging to secondary storage (scratch files), searches were fast, memory management was not hard, and so on. It is a nice combination.

I used this technique in a commercial 4.2BSD based, WYSIWYG, multi-window, bit-mapped, extensible, object oriented, "integrated" (with graphics, spreadsheet, and DBMS) text system with proportional Adobe fonts, for a competitor and predecessor of Interleaf. I thought the overall result, given mid-80's hardware, was fast and powerful. I must note not only that I no longer work for that company, but that it is essentially out of business.

Vernon Schryver
vjs@calcite.uucp ...{pyramid,sgi}!calcite!vjs
pcg@aber-cs.UUCP (Piercarlo Grandi) (01/05/90)
In article <692@augean.OZ> idall@augean.OZ (Ian Dall) writes:

   In article <1561@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
   }5) M->          0.65    2.45    0.78    0.53    0.67    1.26
   }6) SP           0.71    2.58    0.78    0.53    0.67    1.26
   }7) SP           0.71    2.70    0.78    0.53    0.67    1.26
   }
   }6) Insert a space as the last character. This moves the gap again,
   }and it shows. Also redisplay.
   }
   }7) Add a second space at the end. Just redisplay really, and minimal
   }as to that.

   Your data does not justify your conclusion. If the .71 sec for GNU
   emacs in 6 is (mainly) due to the movement of the gap, then why is
   the time identical for 7 when, as you say, no gap movement occurs?

They are *cumulative times*; this means that adding the second character costs less than a centisecond in user time, while the cost in system time is almost all redisplay.

However your point that redisplay is *expensive* is very true. How much of this is due to GNU Emacs, and how much to the incredible slowness of my machine's frame buffer, I don't know. One should really profile (both these points I have already made in my posting... :->).
--
Piercarlo "Peter" Grandi            | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth         | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK  | INET: pcg@cs.aber.ac.uk
fin@uh.msc.umn.edu (Craig Finseth) (01/05/90)
In article <81@calcite.UUCP> vjs@calcite.UUCP (Vernon Schryver) writes:
> ...
>An editor called "QED" by Peter Deutch (sp?) of Project Genie at Berkeley
>used this technique on one of the first paged virtual memory systems, the
>SDS-940 (later XDS-940). The 940 was one of the intellectual parents of
>UNIX in general, and I've been told that ed is in some important sense a
>descendant of QED.

And I thought I knew about every text editor written before 1980 (:-). This just goes to support my general claim that nothing new has been done in software since 1970. (All of the virtual memory techniques we used were straight out of 1960s implementations.)

Without knowing details of the SDS-940, I would surmise that its CPU / memory / disk tradeoffs (not absolute performance!) were similar to the early 1980s micros that we were developing for.

Craig A. Finseth                     fin@msc.umn.edu [CAF13]
Minnesota Supercomputer Center, Inc. +1 612 624 3375
peter@ficc.uu.net (Peter da Silva) (01/05/90)
There seems to be an assumption here that the only possible methods are buffer gap and a linked list of lines. What about a linked list of larger blocks?

I've seen an editor somewhere that stores the file as a list of fixed-size blocks, not necessarily full. When inserting, it splits the current block into two at the insert point. When the insert point moves, it considers coalescing blocks... I can't remember the editor involved, unfortunately, nor the algorithm for coalescing blocks.

It seems to me, anyway, that by choosing an appropriate size you can get far better performance than either the buffer-gap or list-of-lines method for a wide range of files. And of course reading and writing files are nearly as easy as with the buffer-gap technique.

Given an appropriate system architecture, such as Mach, you could even fault your blocks in from the file!
--
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \ Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>
\_.--._/ v  "Have you hugged your wolf today?"
  `-_-'
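A sketch of the splitting step just described, with an invented block size and fields:

    #include <stdlib.h>
    #include <string.h>

    #define BLOCK_BYTES 4096    /* illustrative; tune to taste */

    struct block {
        char data[BLOCK_BYTES];
        int  used;              /* blocks need not be full */
        struct block *next;
    };

    /* Split a block at offset `at' so an insertion there never
       copies more than one block's worth of data: the tail moves
       into a fresh block and the original keeps free room at `at'. */
    static struct block *split_block(struct block *b, int at)
    {
        struct block *tail = malloc(sizeof *tail);

        if (tail == NULL)
            return NULL;
        tail->used = b->used - at;
        memcpy(tail->data, b->data + at, (size_t)tail->used);
        tail->next = b->next;
        b->used = at;
        b->next = tail;
        return tail;
    }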
peter@ficc.uu.net (Peter da Silva) (01/05/90)
> The scheme that was used is called paged buffer gap and it is briefly
> mentioned on page 36 of the thesis. The basic idea is that a file is
> represented as an array of pages. Each ~1K page is a separate buffer
> gap system.

That's another technique. With such small buffers, though, you'll lose some on bigger systems. And of course Craig goes on to note...

> I contend that in a "modern workstation environment" (e.g., Sun 3/60),
> a simple buffer gap method is preferred over both paged buffer gap and
> linked line. I leave it as an exercise for the reader to figure out
> why.

I'm not sure this is a valid conclusion. If 75K is the optimal file size for a buffer-gap solution, how about paged buffer-gap with 75K pages? Or, to add a bit of compromise with some of today's brain-dead architectures, perhaps 64K pages would work nearly as well.

And how does buffer-gap compare with buffer-splitting? You can think of it as a buffer-gap where you don't always need to fill in the gap when you move the insertion point.
--
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \ Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>
\_.--._/ v  "Have you hugged your wolf today?"
  `-_-'
gumby@Gang-of-Four.Stanford.EDU (David Vinayak Wallace) (01/06/90)
   Date: 5 Jan 90 12:21:46 GMT
   From: peter@ficc.uu.net (Peter da Silva)

   There seems to be an assumption here that the only possible methods
   are buffer gap and a linked list of lines. What about a linked list
   of larger blocks? [...] I can't remember the editor involved,
   unfortunately, nor the algorithm for coalescing blocks.

TEdit, the WYSIWYG editor for Interlisp-D, used this technique. It has several problems, including the difficulty of implementing search cheaply and its memory footprint (loss of locality). If your aim is to reduce copying this will help, but you'll end up with chunks scattered through memory, which could in a pessimal case result in significantly worse paging performance.

As was already pointed out, because of the way humans edit text, the advantage of having lots of little gaps isn't that great, since people tend to edit in a small number of places at a time. Many also use search as the primary means of moving the cursor. On the other hand, the main use of this technique is to help integrate the redisplay data structures with the buffer data. There are other, cheaper ways to do this, as have been previously described.

   And of course reading and writing files are nearly as easy as with
   the buffer-gap technique.

Actually it won't be, since you can't write the buffer in (at most) two block writes and read it in one. Instead you have either to follow the block chains (when writing) or construct them (when reading).

   Given an appropriate system architecture, such as Mach, you could
   even fault your blocks in from the file!

Mach doesn't do object (structure) paging to my knowledge. On the other hand, with a linear structure as with the gap, the pager already does this for you. For large arrays like this it would be better to improve the pager. Here are some suggestions (some are supposed to appear in 4.4, I think):

1> Allow the program to tell the pager which pages are no longer needed (ie set the access time to "long ago").

2> Allow the program to change the page-ahead factor for blocks of pages, for cases when you know you'll want to do sequential access.

3> Allow the program to change the page mapping (think of it as mmap to your own address space). Then when you open a gap, you always copy at most one page; you just split the page on which the gap appears into two and translate the pages above or below by one. You've already got paging hardware; why not use it to your advantage?

regards,
d

Now if you want OS mods to improve interactive performance, adding ECHOIN would be a good place to start...
peter@ficc.uu.net (Peter da Silva) (01/08/90)
> GNUemacs is the finest editor ever written. It has more features and
> capabilities than any other editor even dreamed of.

These two sentences are not necessarily complementary. Nor are they necessarily opposed to each other. They are orthogonal. For example, in another field of endeavour:

"The Cadillac Fleetwood is the finest car ever built. It has more options and carrying capacity than any car ever dreamed of."

"The Porsche 959 is the finest car ever built. It has more sophisticated handling and a higher drivable speed than any car ever dreamed of."

"The VW Beetle is the finest car ever built. It has more versions and a longer useful life than any car ever dreamed of."

"The BMW 750 is the finest car ever built..."
-- 
_--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \ Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>
\_.--._/ v  "Have you hugged your wolf today?"
  `-_-'
ken@argus.UUCP (Kenneth Ng) (01/08/90)
In article <50359@bbn.COM>, news@bbn.COM (News system owner ID) writes:
: Actually, as I (perhaps mistakenly) recall, the Emacs Cookbook says
: that a buffer gap is _more_ efficient than a linked list for files up
: to about 75K in size.
I'm curious: exactly what is this "Emacs Cookbook"? Is it one of
the files included with the emacs distribution, or something
separate?
--
Kenneth Ng: Post office: NJIT - CCCC, Newark New Jersey 07102
uucp !andromeda!argus!ken *** NOT ken@bellcore.uucp ***
bitnet(preferred) ken@orion.bitnet
fin@uh.msc.umn.edu (Craig Finseth) (01/09/90)
In article <IN_S4Eggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>There seems to be an assumption here that the only possible methods are
>buffer gap and a linked list of lines. What about a linked list of larger
>blocks? ...

Think of pure buffer gap and linked line as opposite ends of a continuum. (Technically, linked character would be at the extreme end, but I will ignore that case.) Pure buffer gap has one chunk, which is the entire object. As you divide it into separate chunks (paged buffer gap), the number of chunks increases and their size decreases. Eventually, you get to a chunk size of one line and have linked line. In many of the intermediate cases, whether you use an array or a linked list is an implementation detail.

Craig A. Finseth                        fin@msc.umn.edu [CAF13]
Minnesota Supercomputer Center, Inc.    +1 612 624 3375
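To make the middle of that continuum concrete, here is a sketch of a paged buffer gap in C. It is not code from any editor mentioned in this thread; the chunk size and all names are invented. Each chunk carries its own small gap, so moving a gap copies at most one chunk's worth of text, a full chunk splits in two instead of forcing a whole-buffer realloc(), and adjacent near-empty chunks can be coalesced.

    /* Sketch of "paged buffer gap": a linked list of fixed-size
     * chunks, each with its own gap.  Illustrative only. */
    #include <stdlib.h>
    #include <string.h>

    #define CHUNK_SIZE 4096         /* bytes per chunk, incl. gap    */

    struct chunk {
        struct chunk *next;
        int gap_start;              /* offset of first gap byte      */
        int gap_end;                /* one past the last gap byte    */
        char text[CHUNK_SIZE];
    };

    /* Bytes of live text in a chunk. */
    static int chunk_len(const struct chunk *c)
    {
        return CHUNK_SIZE - (c->gap_end - c->gap_start);
    }

    /* Move the gap so it starts at text offset 'pos'.  This is the
     * ordinary buffer-gap move, but its cost is now bounded by
     * CHUNK_SIZE rather than by the size of the whole file. */
    static void chunk_move_gap(struct chunk *c, int pos)
    {
        int gap = c->gap_end - c->gap_start;
        if (pos < c->gap_start)
            memmove(c->text + pos + gap, c->text + pos,
                    c->gap_start - pos);
        else if (pos > c->gap_start)
            memmove(c->text + c->gap_start, c->text + c->gap_end,
                    pos - c->gap_start);
        c->gap_start = pos;
        c->gap_end = pos + gap;
    }

    /* When a chunk's gap fills up, split it in two instead of
     * realloc()ating anything. */
    static struct chunk *chunk_split(struct chunk *c)
    {
        struct chunk *d = malloc(sizeof *d);
        if (!d)
            return NULL;
        chunk_move_gap(c, chunk_len(c) / 2);       /* split point    */
        int tail = CHUNK_SIZE - c->gap_end;        /* text after gap */
        memcpy(d->text, c->text + c->gap_end, tail);
        d->gap_start = tail;
        d->gap_end = CHUNK_SIZE;
        d->next = c->next;
        c->next = d;
        c->gap_end = CHUNK_SIZE;    /* c keeps only the first half   */
        return d;
    }

    /* One possible coalescing rule: merge c with its successor when
     * their combined text fits in a single chunk. */
    static void chunk_maybe_merge(struct chunk *c)
    {
        struct chunk *d = c->next;
        if (!d || chunk_len(c) + chunk_len(d) > CHUNK_SIZE)
            return;
        chunk_move_gap(c, chunk_len(c));           /* gap to end of c   */
        chunk_move_gap(d, 0);                      /* gap to front of d */
        int dlen = chunk_len(d);
        memcpy(c->text + c->gap_start, d->text + d->gap_end, dlen);
        c->gap_start += dlen;
        c->next = d->next;
        free(d);
    }

With CHUNK_SIZE set to the whole buffer this degenerates to pure buffer gap; with CHUNK_SIZE around a line length it approximates linked line, which is the continuum described above.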
fin@uh.msc.umn.edu (Craig Finseth) (01/09/90)
In article <SN_87Eggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>> I contend that in a "modern workstation environment" (e.g., Sun 3/60),
>> a simple buffer gap method is preferred over both paged buffer gap and
>> linked line. I leave it as an exercise for the reader to figure out
>> why.
>I'm not sure this is a valid conclusion. If 75K is the optimal file size
>for a buffer-gap solution, how about paged buffer-gap with 75K pages? Or
>to add a bit of compromise with some of today's brain-dead architectures
>perhaps 64K pages would work nearly as well.

Where did this "75K" figure come from? It wasn't mentioned by me. It would be a very unusual constant that remained constant over a wide range of architectures, speeds, memory sizes, and other performance-related figures. In particular, I have had no trouble editing multi-megabyte files in GNU Emacs on a Sun 3/280 w/8 MBytes of memory.

Craig A. Finseth                        fin@msc.umn.edu [CAF13]
Minnesota Supercomputer Center, Inc.    +1 612 624 3375
weening@Gang-of-Four.Stanford.EDU (Joe Weening) (01/09/90)
I'm posting this message to help with RMS's desire to remove the discussion of editor implementation methods from gnu.emacs. The discussion is currently going on in comp.editors, gnu.emacs and comp.unix.wizards, so not everyone contributing to it may have seen RMS's messages.

The discussion is appropriate for comp.editors, so that's where I'm directing followups to this message. It is not appropriate for gnu.emacs, since that group is meant for short announcements only. It is not at all appropriate for comp.unix.wizards and should never have been posted there.

Please, everyone, let's move this discussion to comp.editors only. If you want to continue reading it, subscribe to that newsgroup.
--
Joe Weening                            Computer Science Dept.
weening@Gang-of-Four.Stanford.EDU      Stanford University
peter@ficc.uu.net (Peter da Silva) (01/09/90)
> >> I contend that in a "modern workstation environment" (e.g., Sun 3/60),
> >> a simple buffer gap method is preferred over both paged buffer gap and
> >> linked line. I leave it as an exercise for the reader to figure out
> >> why.
> >I'm not sure this is a valid conclusion. If 75K is the optimal file size

> Where did this "75K" figure come from?

I honestly don't remember. It was mentioned by someone in this forum. I do think, though, that for any given system there is such an optimal size. It may be that on your workstation that size is measured in megabytes... on others it may be a few K. I wonder how it feels on a NeXT that's paging over the network or onto the OD?

> >for a buffer-gap solution, how about paged buffer-gap with 75K pages? Or
> >to add a bit of compromise with some of today's brain-dead architectures
> >perhaps 64K pages would work nearly as well.

> In particular, I have had no trouble editing multi-megabyte files in
> GNU Emacs on a Sun 3/280 w/8 MBytes of memory.

Having "no trouble" with something doesn't mean you have the optimal solution. Just that you have *a* solution.
-- 
_--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \ Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>
\_.--._/ v  "Have you hugged your wolf today?"
  `-_-'
jimc@isc-br.ISC-BR.COM (Jim Cathey) (01/10/90)
>The better solution, made relatively easy by the reasonably modular and
>layered structure of GNU emacs, would be to accept The Emacs Cookbook
>recommendation (adopted by Jove and the MicroEmacs/Gnu family of editors)
>and implement a linked list system. I would suggest just reading (or map,
>on the operating systems that allow it) the file to be edited as a large
>lump of virtual address space, then building a linked list of pointers to
>lines therein, and then to allocate modified lines from a general purpose
>arena. Informing the OS of the peculiar access patterns would also help,
>if possible.

So long as line-oriented operation preserved the ability of GNU Emacs to edit binary files that have no 'lines' at all. The MicroEmacs we have here will choke on these (as do vi, ed, see, siv, and all the other editors we have), and MicroEmacs' line orientation is so badly implemented that if (at our site) the file is larger than about 50K it is _faster_ to start Emacs on the file than MicroEmacs. MicroEmacs starts faster, but it reads files _much_ slower (fgets, malloc, strcpy).

Somebody or other's master's thesis was on buffer styles (I got a copy with my copy of MINCE a few years ago), and his conclusion was that the gap method worked best. That may have been on a machine that wasn't DPV (demand-paged virtual memory), though. Moving the gap by, say, 20 characters should affect at most two pages (four, if you assume it straddles a page boundary on both ends, but this is true for any scheme and may be disregarded). A block with a line pointer array might also affect two pages (the block and the pointer array), so I don't offhand see the advantage. Jumping about wildly would touch a lot of pages, but the assumption is that you work a lot in one place.

The gap approach makes it very quick to _save_ files, so the auto-save feature is unobtrusive. It would be absolutely useless if it took 5-10 seconds to rearrange the line-pointer and block mess to get it into savable form, or to write a line at a time.

If realloc can't do the right thing, it should be replaced by one that can. I believe GNU isn't interested in supporting non-GNU machines (read VAX) to the extent that it corrupts the architecture of the program. I somewhat agree with them, in that broken environments shouldn't be catered to, but repaired instead. It would be nice if Emacs did a negative sbrk when it could. In our environment, we can also release holes in the _middle_ of the heap; we added an additional system call for it. This gets pages out of the swap space, but they'll be reallocated (and cleared to zero) if you touch the hole. We have a limited virtual address space (2M on some machines, 4M on most others), so GNU can't edit those really big log files. Of the editors I've experienced, I think only elle can; I think it uses linked blocks.

GNU Emacs _is_ awfully large, though, but I haven't noticed any machine-eating behavior. Of course, we have a lot of smaller machines here, so few use it at once. Far more noticeable are simultaneous compiles.

+----------------+
!   II  CCCCCC   !  Jim Cathey
!   II  SSSSCC   !  ISC-Bunker Ramo
!   II      CC   !  TAF-C8; Spokane, WA 99220
!   IISSSS  CC   !  UUCP: uunet!isc-br!jimc (jimc@isc-br.iscs.com)
!   II  CCCCCC   !  (509) 927-5757
+----------------+
"With excitement like this, who is needing enemas?"
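On the point about quick saves: with a buffer gap the text is contiguous except for the gap itself, so a save is at most two block writes, and a load is a single read that leaves the whole initial gap at the front. A minimal sketch, with an invented buffer structure and error handling elided:

    /* Sketch: I/O on a buffer-gap buffer.  The structure and names
     * are invented; error handling is elided. */
    #include <unistd.h>
    #include <stddef.h>

    struct buffer {
        char  *text;       /* one contiguous allocation         */
        size_t size;       /* total bytes, including the gap    */
        size_t gap_start;  /* offset of the first gap byte      */
        size_t gap_end;    /* one past the last gap byte        */
    };

    /* Save in at most two block writes: the text before the gap,
     * then the text after it.  This is why auto-save can be
     * unobtrusive -- no traversal or reassembly is needed. */
    static void save_buffer(const struct buffer *b, int fd)
    {
        write(fd, b->text, b->gap_start);
        write(fd, b->text + b->gap_end, b->size - b->gap_end);
    }

    /* Load in one block read: place the file's bytes at the end of
     * the allocation, leaving the gap at offset 0.  Assumes
     * b->size >= filelen. */
    static size_t load_buffer(struct buffer *b, int fd, size_t filelen)
    {
        b->gap_start = 0;
        b->gap_end = b->size - filelen;
        return (size_t)read(fd, b->text + b->gap_end, filelen);
    }

A linked-block or linked-line representation has to walk its chains to do either, which is the trade-off raised earlier in the thread.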