shapiro@blueberry.inria.fr (Marc Shapiro) (11/18/88)
I participated in an OSF meeting last week where OS technology and its relation to research was much discussed. As this may interest people, here is a report. I believe comp.os.research is the most appropriate forum; if not, forgive me.

Marc Shapiro
INRIA, B.P. 105, 78153 Le Chesnay Cedex, France.  Tel.: +33 (1) 39-63-53-25
e-mail: shapiro@sor.inria.fr   or: ...!mcvax!inria!shapiro

===========================================================================
                Report on OSF meeting
                Bruxelles, 7 November 1988

[The OSF held a meeting in Brussels on 7 November 1988 to which vendors and research institutions in Europe were invited. I was there as INRIA's ear. I also wanted to voice my own strong objections to OSF: I am afraid that such a big conglomerate can do no useful work, and that it will impede the free circulation of ideas and sources. I summarize here my meeting notes on the 3 topics which I feel are most important:
 * how ``open'' is the OSF?
 * OS technology
 * relations to research, especially in Europe
My own remarks are in square brackets like this.]

1. How open is the Open Software Foundation?

Anybody may join the OSF. It costs $4.5M to be a ``sponsor'', i.e. to hold a seat on the board of directors. ``Membership'' is $25K for for-profit organizations, $5K for non-profit, and $2K for educational institutions. Total budget is approx. $150K.

All members are equal in the decision process, which is ``vendor-neutral''. OSF is not a standards or a consensus committee, but a software house. They take their decisions independently, after input from members, who receive full and equal information. Members have full and equal access to all project plans, specifications, designs, source code, documentation, rationales of decisions, and validation suites. They may consult them on-line at any time, even as they are being developed. After a component is released, non-members may apply for a licence for that component, also in source form.
A component will be released if it passes OSF's validation suites on various different hardware architectures. OSF delivers source code only (no binaries), with a clear separation between machine-independent code and machine-dependent code for the 3 or 4 reference architectures. The reference architectures shall cover the range from PCs to mainframes. A reference architecture must be supplied by multiple vendors, and be non-controversial. Currently the reference machines are: the 386 with AT bus, the 680x0, and the 370 architecture. The IBM PC/RT is temporarily chosen to represent the RISC family, to be replaced by whatever RISC becomes the industry standard.

Source code will be protected by copyrights and patents (which shouldn't prohibit local copies and hacking). No trade secrets, no non-disclosure licences (except for code originating from ATT, of course, unless they can reach an agreement). No export prohibitions (except when mandated by law). The idea is to make new technologies *available*, not to hide them. Anybody can buy a licence to anything (nominal fee for universities). Nothing is mandatory.

The scope of OSF covers all of the OS, in a broad sense, including common tools, and excluding hardware and applications. They will support tools which run on non-Unix kernels (such as VMS), e.g. the user interface.

[Apparently both hardware manufacturers and software houses have decided there was no more money to be made on OSes. They both want to get out of that business. Hardware people don't want the hassle of system software, and application developers want their stuff to run on any machine.]

[I asked about possible relations with the Free Software Foundation (GNU project). The issue is apparently a painful one. Answer: ``we will distribute user-contributed software too. If the FSF asks us to, we will distribute their stuff, but the terms of the FSF copyleft won't let us do that''.]

[In conclusion, OSF software is pretty ``open'' but not yet ``free''.
I believe they can be convinced to support research for the development of truly free software, since IBM and DEC already do so with Andrew (CMU) and X-windows (MIT).]

2. Operating System technology

The OSF Kernel version 1 is based on the IBM RT/PC's AIX version 3 (i.e. without the Virtual Resource Manager), itself based on System V. AIX was chosen *although* it's an IBM product, because it was believed to be, at the present point in time and from a technical standpoint, the best base from which to have a high-quality Unix system ready by mid-1989. OSF is not committed to AIX, and feels free to rip out anything they don't like. OSF is aware of the limitations of the ATT-based technology and *wants to collaborate with research to replace the kernel with a better one* in the future.

2.1 Main points of the OSF operating system:
 * Conforms to current standards (to protect existing software), i.e.: X/OPEN, SVID, POSIX, 4.3BSD, TCP/IP (OSI to be added), NFS, X-windows, RFC822, SMTP, etc.
 * Will not be stifled by standards committees in future developments, but will conform to industry standards as they emerge.
 * All the BSD functionality is there without code redundancy (no BSD code; it is all re-written): all BSD system calls, .h files, and libraries; csh, multiple groups, signals, job control, long file names, symbolic links, select, pty, sockets, dbx, mail, file quotas, etc.
 * Targeted to any modern architecture: 32-bit address space, supervisor/normal execution mode, memory protection.
 * Documented ``System Internal Interface'' and hooks for adding functionality to the kernel.

2.2 Enhancements to Unix

All enhancements will be upwards compatible: the old Unix interfaces are preserved. For instance, the kernel now contains multiple concurrent pre-emptible threads, with primitives for protecting critical sections; however the old sleep/wakeup interface is retained for compatibility.
 * Lightweight processes in the kernel [no mention of kernel support for lightweight processes in user code].
 * System V IPC extended across the net, protected by access control lists. Streams not used because of ATT restrictions; replaced by V7 multiplexed files.
 * The kernel can be configured on-line.
 * Dynamic linking (implies an extended, upwards-compatible COFF format) for both user processes and the kernel; new drivers can be loaded on-line.
 * Demand-paged virtual memory for user processes and the kernel; mapped files; single-level store; no buffer cache. Pin, pre-page, and purge primitives.
 * Fork by copy-on-reference (because not all hardware supports copy-on-write).
 * Disks partitioned in 4MB physical chunks; a logical partition is any number of chunks (possibly spanning multiple disks); its size may be changed at any time, in 4MB increments, by the administrator. No fixed-size partitions, no dedicated swap zone.
 * File system meta-data managed with DB techniques: journal, atomic commit; fsck should never be necessary again.
 * Terminals: POSIX-compatible job control, page mode, input editing a la PC-DOS; curses with color; ptys. Multiple physical bitmaps possible; each may be multiplexed into multiple virtual terminals (therefore it will be possible to run 2 different window systems on the same screen). Access either in ``monitored'' mode (i.e. bitblit access) or via an ASCII terminal emulator. Efficient graphics library (uses monitored mode) for X, GKS, PHIGS.
 * National language support: both 8- and 16-bit character sets.
 * Structured I/O handling to ease writing drivers.
 * Error logging by an error device driver.
 * Unbundled subsystems, such as X-windows; atomic installation tools.
 * Distributed file system and virtual memory [I don't know what they mean by that], with local and remote caching, full Unix semantics, C2 security using Kerberos.

Availability: IBM delivers the first version at the end of November (it will be passed on immediately to the membership).
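[The fork-by-copy-on-reference item deserves a word of illustration. The toy model below is my own sketch (all names hypothetical, nothing from the OSF sources): at fork time only the page table is duplicated; a physical page is copied on the child's *first reference*, read or write. Copy-on-write copies only on writes, but that requires hardware able to trap writes alone, which not every target architecture has.]

```python
class Space:
    """Toy address space: fork by copy-on-reference.

    At fork, only the page table is duplicated; physical pages stay
    shared, and a page is physically copied on the first *reference*
    (read or write) by the new space."""

    def __init__(self, table):
        self.table = table        # virtual page number -> physical page
        self.owned = set()        # pages this space holds private copies of
        self.copies = 0           # physical page copies made so far

    @classmethod
    def initial(cls, contents):
        space = cls({i: bytearray(c) for i, c in enumerate(contents)})
        space.owned = set(space.table)
        return space

    def fork(self):
        # Cheap: duplicate the page table only, not the pages.
        return Space(dict(self.table))

    def reference(self, vpn):
        # Copy-on-reference: the first touch of a shared page copies it.
        if vpn not in self.owned:
            self.table[vpn] = bytearray(self.table[vpn])
            self.owned.add(vpn)
            self.copies += 1
        return self.table[vpn]

parent = Space.initial([b"text", b"data", b"stack"])
child = parent.fork()
child.reference(1)[:] = b"DATA"   # only page 1 is copied, then changed
```

[In this sketch the fork touches no page at all; after the child writes page 1, exactly one physical copy has been made and the parent's pages are unchanged.]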
The OSF/1 hardware-independent kernel is to be delivered in the second half of 1989. The user interface (based on X-windows) will be commercially available independently, in the first half of 1989, on System V R3.

3. Relation with research

OSF is committed to supporting industry standards, in order to protect existing software, but will not be stifled by them. The OSF Research Institute will be on the lookout for innovations from research institutions which can be marketed within 2 to 5 years (less than 2 years is development; more than 5 is utopia). The OSF Research Institute is a ``transformer'' from research to development, in a vendor-independent way. It will facilitate communication and fund (or help fund) research. There are 5 programs:
 * operating systems,
 * distributed services,
 * information technology,
 * user interface,
 * software engineering.

In each program the OSF RI maintains an independent development team which will test and evaluate prototypes from research. Each has a University program which will organize colloquia, print newsletters, edit books, and fund sabbaticals and research grants. The money goes 40% to the USA, 40% to Europe, and 20% to the Pacific. They primarily encourage collaborative funding (e.g. Esprit or NSF).

Europe has a long tradition of OS research, e.g. ANSA, Amoeba, Chorus, Birlix, Comandos, PCTE, Newcastle, Gothic. Collaboration with research in Europe is sought on:
 * alternative kernel technologies (e.g. message-based);
 * Architecture-Neutral Distribution Format;
 * persistent programming languages;
 * distributed application environments;
 * fault-tolerance;
 * distributed debugging;
 * distributed resource allocation.

The European RI will organize tight cooperation with European universities. An international workshop, open to members, will be organized in the Spring in Europe. ``The best way of increasing the weight of Europe in the OSF is for more Europeans to join as members''.
The European RI advisory board has leading researchers on it: Mr. Goos of GMD (Berlin), G. Kahn of INRIA (France), S. Mullender of CWI (Amsterdam), and R. Needham (Cambridge).

[The OSF is trying very hard to convince research to collaborate. I clearly see what their advantage is in doing so. I see less well what research can gain from such a collaboration, other than strictly material benefits. The goals and terms of the proposed collaboration don't seem very clear to me.]

4. Conclusion

[My questions about free access to sources were answered frankly. OSF sources will not be free, but will be available to all members, and no trade-secret protection will apply. I imagine they can be convinced to fund research to develop free software, as was done for X-windows and Andrew.

OSF is big and bulky. Within half a year we should know if they can get any good work done (when the user interface is scheduled to be delivered). OSF has a lot to gain from the membership of research institutions and is ready to give them financial and material support. I still need to be convinced that research institutions have something to gain by joining.]
rick@seismo.CSS.GOV (Rick Adams) (11/24/88)
> * File system meta-data managed with DB techniques: journal,
>   atomic commit; fsck should never be necessary again.

It is a truly impressive piece of software that can prevent hardware errors from damaging data on the disk. I think I'll keep a copy of fsck around anyway.

---rick
w-colinp@microsoft.UUCP (Colin Plumb) (11/27/88)
In article <5583@saturn.ucsc.edu> rick@seismo.CSS.GOV (Rick Adams) writes:
>> * File system meta-data managed with DB techniques: journal,
>>   atomic commit; fsck should never be necessary again.
>
>It is a truly impressive piece of software that can prevent hardware
>errors from damaging data on the disk. I think I'll keep a copy
>of fsck around anyway.

Not really; the techniques are well known, and involve duplicating all important information. There was one file system at Xerox I recall that was secure against all failures of a single sector or of two consecutive sectors. (User data could be lost, but the file system's integrity was guaranteed.) The free block bitmap had no invariants on its correctness; it was merely a performance-boosting cache.

Essentially the idea is that anything fsck can do, the file system does automatically. It's adding "incrementally", i.e. not locking the whole disk while you fix things up, that's hard.

The people at Xerox also found another benefit: the amount of code that had to be correct to maintain basic FS consistency was only a few pages. As long as the rest worked most of the time, all would be well.
-- 
	-Colin (microsof!w-colinp@sun.com)
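[The bitmap-as-cache point above can be made concrete with a toy sketch (mine, not the Xerox code): since the per-file block lists are the authoritative record of which blocks are in use, the free-block bitmap can always be recomputed from them, so no invariant on the bitmap need ever hold for the file system to be consistent.]

```python
def rebuild_free_map(n_blocks, inodes):
    """Recompute the free-block bitmap from the authoritative data
    (the block list of every file); True means the block is free.
    A lost or stale bitmap costs a scan, never any data."""
    free = [True] * n_blocks
    for block_list in inodes.values():
        for block in block_list:
            free[block] = False
    return free

# Hypothetical 8-block disk holding two files.
inodes = {"file_a": [0, 3], "file_b": [1, 4, 5]}
free_map = rebuild_free_map(8, inodes)
```

[Here blocks 2, 6, and 7 come out free; a real system would scan the on-disk inode table rather than an in-memory dict, but the invariant-free property is the same.]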
rick@seismo.CSS.GOV (Rick Adams) (11/29/88)
> Essentially the idea is that anything fsck can do, the file system does
> automatically. It's adding "incrementally", i.e. not locking the whole
> disk while you fix things up, that's hard.

Moving fsck into the filesystem code is only renaming fsck, not getting rid of it.

What's so horrible about the current BSD filesystem? It's already got duplicate copies of the superblock. It can rebuild the free block bitmap if necessary, so you can say that it too is only a performance win.

What about the cost/performance tradeoffs of these great 'database techniques'? I'm not willing to shadow every disk drive I have. Buying 30 extra gigabytes of disk to ensure filesystem consistency is not very reasonable.

To use Andy Tanenbaum's example, "What happens if there is an earthquake and your entire computer room falls into a fissure and is suddenly relocated to the center of the earth?". I suspect you lose big. (Tanenbaum discusses distributed filesystems as a possible solution to this.) What price? This is totally passed over in the name of fixing something that is not necessarily broken in the first place. (Note I'm only talking about the BSD filesystem; the Sys5 filesystem can be considered broken if you wish.) E.g. I'm not willing to give up the huge performance gain of having lots of disk blocks cached in memory for an infinitesimal increase in disk stability.

The OSF "announcement" clearly wins the prize for buzzwords per square inch, but what is it really saying?

---rick
root@husc6.harvard.edu (Celray Stalk) (11/30/88)
Along the lines of robust file systems: for my master's thesis I modified the Unix kernel (the SunOS version, which matters little except that the code was messier because it dealt with NFS) to include "transaction logging" in the database sense of the words. I used the sticky bit on non-executable files to mean that the file should have transactions logged on it whenever it was changed. Then in the kernel I added code that watched for this bit during writes and logged all changes (the before and after write data, the size of the write, and the location it occurred at) into a system-wide log file.

The next step was to write a set of library routines which implemented the usual "undo", "redo", etc. database functions on files which had logging done. The last step was to analyze the performance cost of logging a file. It turned out that, as expected, the cost was no higher than two synchronous disk writes. Synchronous because transaction logging requires that changes occur to the file before the changes are logged, and that cannot be assured unless synchronous writes are used. Of course extra disk space was used to store the transaction file.

So the long answer to a short question is that it is possible to add at least _some_ more robustness to current Unix disk systems without incurring large performance penalties. (And of course you get more than just robustness through transaction logging.)

--Peter
--------------------------------------------------------------------------
Peter Baer Galvin                            (203) 432-1254
Senior Systems Programmer, Yale Univ. C.S.   galvin-peter@cs.yale.edu
51 Prospect St, P.O. Box 2158, Yale Station  ucbvax!decvax!yale!galvin-peter
New Haven, Ct 06457                          galvin-peter@yalecs.bitnet
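[The before/after-image logging described above can be sketched in miniature (an illustrative toy of my own, not the thesis code): each log record holds the write's offset, the old bytes, and the new bytes, which is exactly what the "undo" and "redo" library routines need.]

```python
def logged_write(data, log, offset, new):
    """Apply a write to `data` (a bytearray), first recording the
    before- and after-images so the change can be undone or redone."""
    old = bytes(data[offset:offset + len(new)])
    log.append((offset, old, bytes(new)))   # (location, before, after)
    data[offset:offset + len(new)] = new

def undo(data, log):
    """Restore every logged before-image, newest change first."""
    for offset, old, _new in reversed(log):
        data[offset:offset + len(old)] = old

def redo(data, log):
    """Reapply every logged after-image, oldest change first."""
    for offset, _old, new in log:
        data[offset:offset + len(new)] = new

data, log = bytearray(b"hello world"), []
logged_write(data, log, 0, b"HELLO")
logged_write(data, log, 6, b"WORLD")
```

[The kernel version pays the synchronous-write cost the post mentions because the log record and the data must reach the disk in a known order; this in-memory toy only shows the record format and the undo/redo mechanics.]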
shapiro@iznogoud.inria.fr (Marc Shapiro) (12/02/88)
In article <5598@saturn.ucsc.edu> rick@seismo.CSS.GOV (Rick Adams) writes:
>Moving fsck into the filesystem code is only renaming fsck, not getting
>rid of it.

1) fsck is very slow for large systems. My Sun server has a mere gigabyte attached to it and rebooting takes ages.
2) Getting rid of fsck is not the only advantage of doing updates atomically.

>What's so horrible about the current BSD filesystem?

It's not bad (except that it's too complex). OSF proposes a filesystem where the size of any partition can be changed online. I think that's a *big* win.

>What about the cost/performance tradeoffs of these great 'database
>techniques'?

This of course is the big question. These techniques have been around for a while now and I expect we (i.e. the comp.os.research community) now know how to implement them right. A write-ahead log implementation allows one to do atomic updates without duplicating all the data on disk (i.e. you duplicate new data, in the log, only for the short period of time when you are not sure of the outcome of the transaction; then you can re-use the log). However you then lose the benefit of shadow disks: that even a head crash on a single disk doesn't delete your data. Using a write-ahead log shouldn't necessarily slow you down w.r.t. asynchronous updates, because updates are spooled to the log. Only the commit record needs to be written synchronously.

>I'm not willing to shadow every disk drive I have. Buying
>30 extra gigabytes of disk to ensure filesystem consistency is not very
>reasonable.

If I understood correctly, the OSF proposal is to update filesystem *metadata* (superblocks, inode tables, and directories) atomically; not user data.

>To use Andy Tanenbaum's example, "What happens if there is an
>earthquake and your entire computer room falls into a fissure and is
>suddenly relocated to the center of the earth?". I suspect you lose
>big.

I just checked the fsck man page; I didn't find the option to deal with this kind of situation. (:-)

>(Tanenbaum discusses distributed filesystems as a possible
>solution to this)

You're saying that you must duplicate all your data onto two disks (or other media) which are in 2 places far enough away from each other that no single earthquake will swallow them both. You were talking about the cost?

>The OSF "announcement" clearly wins the prize for buzzwords per
>square inch, but what is it really saying?

I guess we will find out when their kernel becomes available. If they deliver what they promise, and the performance is not a lot worse than existing Unixes on comparable configurations, then I think we should applaud, and demand access to the sources to play with.

Marc Shapiro
INRIA, B.P. 105, 78153 Le Chesnay Cedex, France.  Tel.: +33 (1) 39-63-53-25
e-mail: shapiro@sor.inria.fr   or: ...!mcvax!inria!shapiro
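[The write-ahead-log argument above — spool updates to the log, force only the commit record, re-use the log afterwards — can be sketched as a recovery procedure (a schematic toy under my own naming, not the OSF design): after a crash, a transaction's updates take effect only if its commit record made it into the log.]

```python
def recover(log, disk):
    """Replay a write-ahead log after a crash.  `log` is a list of
    records, either ('update', txid, key, value) or ('commit', txid);
    only updates from committed transactions reach the disk image,
    so every transaction is atomic: all of it applies, or none."""
    committed = {txid for kind, txid, *rest in log if kind == "commit"}
    for kind, txid, *rest in log:
        if kind == "update" and txid in committed:
            key, value = rest
            disk[key] = value
    return disk

# The crash hit after tx 1 committed but before tx 2's commit record:
# tx 1's update survives, tx 2's half-done update is discarded.
log = [("update", 1, "inode7", "new"), ("commit", 1),
       ("update", 2, "inode9", "half-done")]
disk = recover(log, {"inode7": "old", "inode9": "old"})
```

[Note how this matches the cost claim in the post: the update records can be spooled lazily, since correctness depends only on the commit record being on disk before the transaction is considered done.]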