pcg@thor.cs.aber.ac.uk (Piercarlo Grandi) (01/04/90)
There has been some discussion of my analysis of GNU Emacs memory management, with various opinions. There has also been (in comp.editors) a call for editor ideas. This posting merges the two.

There is (and this is a much wider subject) a fundamental divide between those who think that programming is building a tool that works (often Americans) and those who think that programming is communicating descriptions of entities and operations on them, both to humans and to computers (often Europeans). Programmers of the first type are 'hackers': they aim for the 80% solution, that is, the solution that works for them 80% of the time, and damn the rest; the major example of this attitude is Bill Joy. The others aim for simple, documentable, complete solutions based on well understood principles; the major example is Dijkstra. (Note: both Joy and Dijkstra are extreme examples.) I tend to think that I belong to the Dijkstra group, not the Joy crowd.

I have been thinking about designing my own proper editor, not as a quick (or slow) hack, but as it should be done. I think I have got a clue (IMNHO), and here it is. I think it would make a fine Ph.D. thesis subject. It is an old idea of mine, but I haven't yet seen it published (the closest thing is the concept of the Smalltalk browser).

How can we characterize an editor? It is clearly (to my jaundiced eye) a set manipulator (secondarily, a set browser). In particular, I think we can restrict our discussion to manipulations of sequences (Hoare's article in Structured Programming), even though at least the TSS editor was key based. More precisely, an editor should be equivalent to a generic sequence facility. What kind of operations can be done on sequences? Well: add element, remove element, apply, concatenation, order comparison if applicable, ...

An editor is something that reads a sequence from a file into a buffer, manipulates it, and then writes it back.
It can take advantage of the fact that, while the sequence is in the buffer, manipulation need not be strictly sequential, for example if the entire sequence can be held in the buffer. Notice that under this definition rn, mail -f, and a lot of other programs are editors. Indeed an editor should be generic in the sense that it operates on sequences of entities, where entities may have any type: news articles, mail messages, passwd lines, etc.

An editor should have a general purpose sequence management engine, and a number of modules with a standardized interface that define the semantics for each specific record type. Each record type module should have primitives that indicate record boundaries, allow for (potentially recursive!) intra-record editing, display the record contents in a pretty form, and allow content-based selection of records. An editor should have an internal language that allows the user to write programs using these operations through a high-level interface.

It is clear that a generic editor can be what a browser is *and* more. I can easily conceive of an editor that in one window allows me to edit a mailbox, in another a C source where each file-level entity is considered a record, in another a list of newsgroups where each newsgroup is considered a record, and in yet another one of these newsgroups, where each article is considered a record. Even simple text may have different modules, e.g. for files made up of lines composed of ASCII characters or of two-byte Chinese ones. I can easily conceive of an editor whose 'lines' can take multiple lines of screen and have many fields.

In some sense much of the user friendliness of GNU Emacs comes from such a view of the world, only that in GNU Emacs all types of 'records' are forcibly mapped into ASCII lines, in an ad hoc way. This is both inefficient and limiting.
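The split described above, a generic sequence engine plus per-type record modules, could be sketched roughly as follows. This is only an illustration in Python, not a design from the posting: the names `RecordType`, `TextLines`, `PasswdLines` and `SequenceEngine` are hypothetical, intra-record editing is left out, and the passwd module assumes well-formed colon-separated lines.

```python
from abc import ABC, abstractmethod
from typing import List


class RecordType(ABC):
    """Hypothetical per-type module; the engine knows records only through it."""

    @abstractmethod
    def split(self, data: bytes) -> List[bytes]:
        """Find record boundaries and return the records in order."""

    @abstractmethod
    def display(self, record: bytes) -> str:
        """Return a pretty, human-readable form of one record."""

    @abstractmethod
    def matches(self, record: bytes, pattern: str) -> bool:
        """Content-based selection of a record."""


class TextLines(RecordType):
    """Records are newline-terminated ASCII lines."""

    def split(self, data):
        return data.splitlines(keepends=True)

    def display(self, record):
        return record.decode("ascii", "replace").rstrip("\n")

    def matches(self, record, pattern):
        return pattern in self.display(record)


class PasswdLines(TextLines):
    """Records are passwd lines; assumes at least three ':' separated fields."""

    def _fields(self, record):
        return TextLines.display(self, record).split(":")

    def display(self, record):
        fields = self._fields(record)
        return f"{fields[0]} (uid {fields[2]}) shell={fields[-1]}"

    def matches(self, record, pattern):
        # Select by login name only.
        return self._fields(record)[0] == pattern


class SequenceEngine:
    """Generic engine: the same commands work for every record type."""

    def __init__(self, data: bytes, record_type: RecordType):
        self.record_type = record_type
        self.records = record_type.split(data)

    def insert(self, index, record):
        self.records.insert(index, record)

    def delete(self, index):
        return self.records.pop(index)

    def select(self, pattern):
        return [i for i, r in enumerate(self.records)
                if self.record_type.matches(r, pattern)]

    def render(self):
        return [self.record_type.display(r) for r in self.records]

    def contents(self):
        return b"".join(self.records)
```

The point of the sketch is that `delete`, `insert` and `select` are written once in the engine, while everything type-specific (boundaries, display, selection) lives in the module passed to it.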
As to implementation details, I would implement this editor by mapping a file into memory (or just leaving it where it is) and building a list of pointers to each record boundary in it. Updates would be performed by recording them in a log (either in memory or in a file), threaded with the pointer list, as this avoids the need to copy the original and allows easy undo. Writing out then simply means following the pointer list and writing out a record at a time (with obvious optimizations if records happen to be contiguous in storage). If many updates are done, the user should be given the option of trimming the log (forgoing some undo), or of merging the log with the original and restarting. Before either of these is necessary, the log of updated records should of course be compacted every now and then, both to drop useless entries and to cluster records for better locality.

As per the Emacs cookbook I would have an editing engine distinct from the redisplay engine, communicating via a window image buffer. As per the good old Arizona idea of frontending ed(1) for full screen editing, one could conceivably have them in distinct processes, thus making it possible to have several front ends, e.g. one oriented to full screen editing, one for teletypes, one for X, etc.

The editing engine and the display engine can be *entirely* independent of the record semantics provided in each specialized module. As an optimization, to avoid the necessary indirection overhead in the most common case, I can conceive of the module for records of ASCII characters terminated by newline being compiled in and special cased, to reduce path length.

As I see it, the right way to design such an editor would be to carefully analyze the 'sequence of records' type constructor, define a suitable abstract interface for it, making sure it makes good semantic sense, and then define what kind of operations on the single records need to be available to support the higher level interface.
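A minimal sketch of the pointer list plus update log described above, in Python. It is an illustration under stated assumptions, not the author's implementation: the class name `LoggedBuffer` and its methods are hypothetical, records are assumed to be newline-terminated (the special-cased common case), the "file" is just a bytes object rather than a mapped file, and log trimming, merging and compaction are left out.

```python
class LoggedBuffer:
    """Original data is never modified; edits go to an append-only log."""

    def __init__(self, original: bytes):
        self.original = original   # stands in for the mapped file
        self.added = []            # append-only store of new records
        self.undo_log = []         # entries: (index, removed_pointers, inserted_count)
        # Pointer list: ("orig", start, end) or ("log", slot, None).
        self.pointers = []
        start = 0
        while start < len(original):
            nl = original.find(b"\n", start)
            end = len(original) if nl == -1 else nl + 1
            self.pointers.append(("orig", start, end))
            start = end

    def _splice(self, index, count, new_pointers):
        removed = self.pointers[index:index + count]
        self.pointers[index:index + count] = new_pointers
        return removed

    def replace(self, index, count, records):
        """Replace `count` records at `index` with `records`, logging the change."""
        new_pointers = []
        for rec in records:
            self.added.append(rec)
            new_pointers.append(("log", len(self.added) - 1, None))
        removed = self._splice(index, count, new_pointers)
        self.undo_log.append((index, removed, len(new_pointers)))

    def insert(self, index, record):
        self.replace(index, 0, [record])

    def delete(self, index):
        self.replace(index, 1, [])

    def undo(self):
        """Revert the most recent change; return False if there is none."""
        if not self.undo_log:
            return False
        index, removed, inserted = self.undo_log.pop()
        self._splice(index, inserted, removed)
        return True

    def record(self, i):
        kind, a, b = self.pointers[i]
        return self.original[a:b] if kind == "orig" else self.added[a]

    def write_out(self) -> bytes:
        """Follow the pointer list and emit one record at a time."""
        return b"".join(self.record(i) for i in range(len(self.pointers)))
```

Because an undone edit only restores pointers, its record stays in `added` as a useless entry; that is the kind of garbage the periodic log compaction mentioned above would collect.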
But alas, this is not hacking... Any takers?
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
fin@uh.msc.umn.edu (Craig Finseth) (01/04/90)
>There is (and this is much wider subject) a fundamental divide between those that think that programming is building a tool that works (often Americans) and those that think that programming is communicating descriptions of entities and operations on them, both to humans and computers (often Europeans). The first type of programmers are 'hackers', and they aim for the 80% solution, that is the solution that works for them 80% of the time, and damn the rest; the major example of this attitude is Bill Joy; the others aim for simple, documentable, complete solutions based on well understood principles; major example is Dijkstra. (Note: both Joy and Dijkstra are extreme examples).

I am very uncomfortable with this division. I prefer a three-way division:

- Those that aim for the 80% solution (hackers: "see, it works for me").
- Those that aim for good textbook cases (academics: "see this elegant algorithm? so what if it scales O(n^n)").
- Those that design, produce, and document complete solutions. Such solutions perform well, have good user interfaces, are easy to maintain, and fit well into the system as a whole.

(Many well-thought-out general statements omitted.)

>In some sense much of the user friendliness of GNU Emacs comes from such a view of the world, only that in GNU Emacs all types of 'records' are forcibly mapped into ASCII lines, in an ad hoc way. This is both inefficient and limiting.

Ah, but it is very powerful and consistent. This very issue comes up time and again with structure editors. Yes, it is very pure and consistent to build an editor in terms of objects. However, IT IS IMPORTANT THAT THE EDITOR'S VIEW OF THE OBJECTS IS WELL-MATCHED TO THE USER'S VIEW. (Note carefully which view is variable and which is constant.)

I view the Emacs interface as very consistent. All of my learning regarding editing this message carries directly over to editing C code, to editing nroff source, and to patching binary files.
Many people have developed many different structure editors. I have yet to see one that has a user interface with the same degree of consistency, carryover, and power as the Emacs interface. I'm not (yet) saying that it can't be done: I'm saying that it has not been done and you may wish to think twice about the scope of the problem that you're tackling.

>As to implementation details, I would implement this editor by mapping a file into memory (or just leaving it where it is), and building a list of pointers to each record boundary in it; performing updates by recording them in a log (either in memory or in a file), as this avoids the need to copy the original, and allows easy undo, threaded with the pointer list. Writing out simply means then following the pointer list and writing out a record a time (with obvious optimizations if records happen to be contiguous in storage). If many updates are done, the user should be given the option of trimming the log (forgoing some undo), or to merge the log with the original and restart. Before any of these two are necessary, the log of updated records should be compacted, of course, every now and then, both as to useless entries and as to clustering for better locality.

This is a good design in theory. Does it scale well in practice? In particular, as redisplay -- not editing -- has been shown by your own data to be the sticking point, how well does this representation speed up redisplay? (The best argument that I am aware of for linked line representation is the performance boost that it gives to redisplay.)

>As per the Emacs cookbook I would have an editing engine distinct from the redisplay engine, communicating via a window image buffer.....

The "cookbook" also pointed out that the "pure" model would in general not perform well enough. Well-designed "behind the scenes" hooks between the editing and redisplay engines are in general necessary.
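One possible reading of such a hook, sketched in Python: instead of having redisplay rediscover changes by comparing whole window images, the editing engine tells it which lines changed, and only those rows are repainted. This is an assumption about what a hook might look like, not something taken from the cookbook; the names `Redisplay`, `Editor` and `lines_changed` are hypothetical, and the window is assumed to start at the top of the buffer with no scrolling.

```python
class Redisplay:
    """Repaints only the rows the editing engine has reported as changed."""

    def __init__(self, height):
        self.height = height
        self.screen = [""] * height
        self.dirty = set(range(height))  # everything needs painting at first
        self.painted = 0                 # rows repainted so far, for illustration

    def lines_changed(self, first, last):
        """Hook called by the editing engine: lines first..last changed."""
        self.dirty.update(range(max(first, 0), min(last + 1, self.height)))

    def refresh(self, lines):
        for row in sorted(self.dirty):
            self.screen[row] = lines[row] if row < len(lines) else ""
            self.painted += 1
        self.dirty.clear()


class Editor:
    """Editing engine that notifies the display through the hook."""

    def __init__(self, lines, display):
        self.lines = lines
        self.display = display

    def replace_line(self, index, text):
        self.lines[index] = text
        self.display.lines_changed(index, index)

    def delete_line(self, index):
        del self.lines[index]
        # Every line from index down shifts up by one row.
        self.display.lines_changed(index, self.display.height - 1)
```

Replacing one line then costs one repainted row rather than a full-window comparison, at the price of coupling the two engines more tightly than the "pure" model would.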
>The editing engine and the display engine can be *entirely* independent of the record semantics provided in each specialized module. As an optimization, to avoid the necessary indirection overhead in the most common case, I can conceive the module for records of ASCII characters terminated by newline being compiled in and special cased, to reduce path length.

You're already bringing in those pesky details which clutter up elegant algorithms.

>As I see it, the right way to design such an editor would be to carefully analyze the 'sequence of records' type constructor, define a suitable abstract interface for it, making sure it makes semantic good sense, and then define what kind of operations on the single records are needed to be available to support the higher level interface. But alas, this is not hacking...

Neither will it lead to a pure, elegant program. Good software engineering (the field that many of us claim to practice) must simultaneously encompass high level design and nitty-gritty detail. As with all engineering, its essence is compromise.

Your model is a good solid one. It must be shaped by all external constraints (user interface, machine performance limits, representations of the objects to be manipulated, etc.) into an actual program product. In the process, the model will be refined and changed, most often by compromises that are mandated by external constraints and that you would rather not have to make.

Craig A. Finseth                      fin@msc.umn.edu [CAF13]
Minnesota Supercomputer Center, Inc.  +1 612 624 3375
spencer@pyr.gatech.EDU (Spencer Rugaber) (01/05/90)
What you are talking about sounds like what was done for TEN/PLUS. This
is an editing system that is provided with AIX on IBM RT's. It featured
an editor capable of editing a large and extensible set of document types
including text files, mail boxes, hierarchical data files, C programs, etc.
The semantics of the editing/browsing/file manipulation are separated
from the semantics of the application, both internally and at the user
interface. Thus, deleting a message from a mailbox, a line from a file,
or a file from a directory, are accomplished with the same command by
the user, but with different results in the system.
The system is extensible and features a scripting language so that simple
applications can be written by non-programmers. It can also be extended
to new types of files by experienced C programmers.
(Note: This system has been in existence since 1983, but I am unsure
which of the features I have mentioned are actually available on the
RT.)
Spencer
--
SPENCER RUGABER
Georgia Institute of Technology, Atlanta, Georgia 30332
...!{allegra,amd,hplabs,ihnp4,seismo,ut-ngp}!gatech!spencer
tbray@watsol.waterloo.edu (Tim Bray) (01/09/90)
In article <PCG.90Jan3180137@thor.cs.aber.ac.uk> pcg@thor.cs.aber.ac.uk (Piercarlo Grandi) paints a vision of the editor of the future, with only a moderate amount of handwaving.

First off, the idea of separating the display logic from the buffer management logic is 100% right; among other things, it gives you the ability to write outlandish new display schemes for outlandish new data.

A subtle but important point: one of the most powerful models of text is as a linear sequence of bytes, without reference to `records' of any kind. This model is behind the strength of Unix. Sadly, none of the popular Unix editors handle record-less files very well. (Yes, I know emacs can take it, but it is extremely stupid about redisplay, and performance goes down the tubes.) As texts grow larger and are treated more as database contents & less as typewriter-output, the concept of a 'line' becomes less and less useful until it is a dangerous hindrance [Cf. Bray, `Lessons of the New OED Project', Proc. Winter '89 Usenix]. Anyhow, don't build any assumptions about occurrences of the '\n' character into your software.

Also, pcg@aber says:

>There is (and this is much wider subject) a fundamental divide...
>The first type of
>programmers are 'hackers', ...
>the major example of this attitude is Bill Joy;

Well, since this is comp.editors and you mention Joy, we gotta talk vi. I personally use emacs, but have a great deal of experience with vi, including work on a port to VMS. Vi is old-fashioned and inflexible, but it provides an amazingly functional interface while consuming almost no machine resources. Some of the things inside the implementation are very elegant. Credit where credit is due.

Cheers,
Tim Bray, New OED Project  \
          -and-             > Waterloo, Ontario, Canada
Open Text Systems, Inc.    /
scs@iti.org (Steve Simmons) (01/10/90)
tbray@watsol.waterloo.edu (Tim Bray) writes:
>In article <PCG.90Jan3180137@thor.cs.aber.ac.uk> pcg@thor.cs.aber.ac.uk
>(Piercarlo Grandi) paints a vision of the editor of the future, with only
>a moderate amount of handwaving.
. . .
>As texts grow larger and are treated more as
>database contents & less as typewriter-output, the concept of a 'line'
>becomes less and less useful until it is a dangerous hindrance [Cf. Bray,
>`Lessons of the New OED Project', Proc. Winter '89 Usenix]. Anyhow,
>don't build any assumptions about occurrences of the '\n' character into
>your software.
>Cheers,
>Tim Bray, New OED Project -and- Open Text Systems, Inc.

In the Q&A at that session I recall your being asked about the status of the tools developed for the new OED -- editors, indexers, etc. "With only a moderate amount of handwaving" (grin) you said they would probably become commercially available at some point in the future.

Two questions: Would you please post some data here about those tools? (It seems like a relevant post for this group.) And: is the future here yet?