brent@sloth.Berkeley.EDU (Brent Welch) (03/22/87)
I've been meaning to introduce Sprite to mod.os, and the discussion on mapped files has spurred me to action. I'm from the filesystem camp. I'm one of a few students developing the Sprite operating system here at Berkeley under Prof. Ousterhout. Sprite appears much like UNIX on the outside, but we've spent the last couple of years designing and implementing our own kernel. The kernel interface is close enough to UNIX that we can re-link UNIX programs with a compatibility library and run them just fine: csh, make, cc, etc. Sprite has been up and running for some time, although it is still under development. (Process migration, processes behind files, and the IP/TCP interface to the outside world are currently being finished up.)

A central notion of Sprite is that we're providing a nice distributed filesystem that will solve the world's problems. :-) Even though we have a zillion diskless workstations, we'd like the ease of sharing and administration of the old timesharing systems. We assert that the filesystem can provide enough functionality, at high performance, to do this. In particular, some things are more easily done with a filesystem approach than with a mapped-file approach.

First, you still need naming. The Sprite filesystem has a network-global name space that includes files as well as other things like peripheral devices and server processes. The filesystem takes care of distributing the name space, and it provides a uniform interface to everything kept there. As someone else said, how do you map a terminal into your memory so you can have device-independent I/O? And what does a mapped-file approach do for your server processes that want a name? In Sprite, a server process can attach itself to a name in the filesystem. After that, clients can operate on the name just as if it were a file.
Of course, the filesystem operations may not map onto what the server wants to implement, so a more general operation, essentially an RPC, is also available on these special kinds of files. This may sound exotic, but it provides an RPC interface to a server process, something that is well known to be a good idea, and it places the server in the regular name space. You can browse for available services with "cd" and "ls".

What about caching? In Sprite, diskless nodes keep a regular read-ahead, delayed-write cache of filesystem blocks. To keep things consistent, the file servers track this caching. To simplify things, they inhibit caching for a file, and force write-backs and flushes, when different clients are simultaneously writing the file. Servers also let clients keep dirty blocks after a file is closed, i.e., write-back-on-close is not done. Again, they do the right thing when client B reads a file shortly after client A writes it. However, this "sequential" sharing usually happens on the same client, and the file may even get deleted before it is written out of the cache.

For example, running on the same hardware (8 MB Sun 3/75), Sprite runs compilations faster than Sun's 3.2 using NFS. The slowness of NFS is mainly due to its write-through-to-server's-disk policy. Recompiling the Sprite filesystem itself (28,000 lines of C...) takes a Sun UNIX diskless client 15 min 52 sec, while a Sprite diskless client with the same size cache completes the compile in 9 min 54 sec. If we raise the maximum size of the Sprite client's cache (the cache size varies dynamically in response to demands by the VM system and the FS), it completes in 9 min 31 sec. (The compile runs faster on Sprite locally too: 10:51 on Sun UNIX 3.2 vs. 9:12 on Sprite, both runs with the same size caches.)

(Back to mapped files...) If you map files, then it isn't possible to "disable caching" when concurrent write sharing occurs.
You either legislate against this kind of sharing or implement more complex schemes; typically you pass tokens around the network that grant the right to modify pages.

One last objection. Suppose you have a multiprocessor architecture, the kind with a handful of processors, each with a large local cache (instructions and data; forget the filesystem here). If it's a virtually addressed cache, then virtual synonyms, i.e., different virtual addresses that refer to the same physical address, complicate the multiprocessor's cache consistency problem. This means you may not be able to map the same file into different processes' address spaces at different locations. Actually, I don't think you even need the complication of a multiprocessor, just the virtually tagged cache. Anyhow, this is a more baroque objection to mapped files, but operating systems must run on real machines, after all.

So, a return address that our whole group will read is
	...ucbvax!ucbginger!spriters
	spriters@ginger.Berkeley.EDU
and mine in particular is
	...ucbvax!ucbginger!brent
	brent@ginger.Berkeley.EDU

	Brent Welch