pcg@cs.aber.ac.uk (Piercarlo Grandi) (03/01/91)
On 28 Feb 91 19:53:47 GMT, pcg@cs.aber.ac.uk (Piercarlo Grandi) said, on the subject of "Re: how many nfsd's should I run?", in my usual fundamental and world-enlightening Encyclic OOPS... contribution:

pcg> I have crossposted to comp.arch, because this is really a
pcg> system/network architecture question. NFS is almost incidental :-).

But there is a wider discussion that deserves some space. I think that now is the time to post the long-threatened discussion on network file system bogosities. I hope it is entertaining and amusing speculation, especially for all the thousands of sysadmins who have configured, for timesharing applications, Ethernets with hundreds of diskless workstations hanging off a handful of servers.

So I offer to the general public some truly basic thoughts on file service, which I find are regrettably often unfamiliar to a large number of fashion-conscious people. File service is not a win everywhere; it is only a sensible proposition under certain quite specific assumptions, and only if careful tuning is done. I name no names and make no specific examples, but some not-too-veiled references to SUN, its networking technology, and its applicability to the usually abhorrent practices of its customers are obvious. Note that this is not a discussion of the bogosities of any specific network file service architecture or implementation; it is a discussion of network file service itself.

REFLECTIONS ON FILE SERVICE

With networking technology it is possible to make one computer act as a surrogate disc controller for several other computers over a network. It is therefore possible to physically cluster the disc storage of several machines on just one of them. There are a number of reasons why this may be preferable to giving each machine its own discs and simply letting each access the others when needed:

1) simplified physical administration, such as backups, even if software administration is not simplified, since the number of (virtual) discs to manage does not really change (modulo the use of non-traditional technology);

2) economies of scale in buying large discs, which have a lower cost per byte, and in using fewer, larger-capacity backup devices;

3) reduced space usage, because many OS and library components need only be stored once.

A relatively small drawback is potentially reduced availability, which can be cured by replicating the centralized facilities (which tends to counteract, at least in part, the advantages above); the network itself, although it becomes a single point of failure, is not usually a problem. We are saying, essentially, that fully replicated autonomous systems provide a level of redundancy whose costs are greater than the potential benefits to many users, so sharing reduces costs without significantly impairing the value of the benefits.

There is one fundamental limitation: in the *best* of circumstances (no other traffic, point-to-point channel between two unloaded machines on a *bus* network) a typical net only supports about 700-800KB per second of bandwidth (in truly exceptional circumstances you might get up to 1000KB per second). This bandwidth is quite small, and the effective bandwidth could well be a half to a third of that under less optimal conditions (especially with contention networks, where sharing effectively reduces it, though not nearly as badly if the network, like Ethernet, has collision detection and exponential backoff).
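To make the arithmetic concrete, here is a small back-of-the-envelope sketch in C of what each client can expect from a shared bus network. The 800KB/s peak is the figure quoted above; the 0.6 derating factor for a busy contention network is purely an illustrative assumption, not a measurement.

    /* Rough estimate of per-client bandwidth on a shared bus network.
     * The derating factor for a contended wire is an assumption for
     * the sake of illustration, not a measured figure. */

    #include <stdio.h>

    static double
    effective_kb_per_s(double peak_kb_per_s, int clients, double contention)
    {
        /* the wire is shared: derate it once it is contended for,
         * then divide it among the simultaneously active clients */
        double usable = (clients == 1) ? peak_kb_per_s
                                       : peak_kb_per_s * contention;
        return usable / clients;
    }

    int
    main(void)
    {
        printf(" 1 client : %6.1f KB/s\n", effective_kb_per_s(800.0, 1, 0.6));
        printf("10 clients: %6.1f KB/s each\n", effective_kb_per_s(800.0, 10, 0.6));
        return 0;
    }

Ten simultaneously active diskless clients each see something closer to floppy speed than to a disc controller, which is the point of everything that follows.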
Notice that remote disc access latency in the best of conditions can be in the 1 to 5 ms. range; this is usually small compared to typical rotational latencies of 8-9 ms. and seek times of 15-25 ms. This has led some innocent souls to conclude that a remote disc with a 15 ms. average seek time (total delay very roughly 3+8+15=26 ms.) will offer better performance than a local disc with a 25 ms. average seek time (total delay 8+25=33 ms.), which is of course entirely true if nobody else is using the wire and the server. If the wire and the server are shared, this may become less true (euphemism).

There are other potential bottlenecks, but they are far less important, as remote disc access is neither CPU nor memory bound, nor, given the small bandwidth available on the network, really disc IO bound. In the case of misdesigns some artificial bottlenecks may become evident, such as poor IO bandwidth. In many cases such bottlenecks-by-misdesign can be obviated by having multiple server machines, so that no single computer's bottlenecks are reached. On the other hand, the network bandwidth is a datum, and using more than one wire is the only way out, though not necessarily (even if often) the most cost-effective one.

Remote access to disc space therefore only makes sense if it is *guaranteed* to happen infrequently, and to be bursty *on the network* overall (as effective point-to-point bandwidth is inversely proportional to the number of users of a *bus* network); it is well known that each single machine will usually be bursty. For this to be true, the working set of disc data necessary to a machine *must* be local to it, either in memory or on a local disc. If this does not happen, and the working set cannot be held in the machine, remote disc access will exhibit thrashing, and the shared network will be used as if it were a private IO bus, which it is not.

To keep the working set local, a machine should have local discs for high-traffic, low-latency IO, like paging, swapping, and temporary files, and an effective in-core buffering scheme for other files, one that makes the most of memory. For example, under (some versions of) Unix it is convenient to have a local paging and swapping disc, so that frequently used executables can be made 'sticky' and linger on in the local swapping area, thus saving references to the remote discs when they are repeatedly invoked. A local paging disc is even more important considering that swapins/swapouts may involve large numbers of IO transactions, and that swap latency critically influences response times.

The original Ethernet environment consisted of workstations that each had a local hard disc with a removable cartridge. Each user had the OS and his files on a cartridge to be loaded on any available machine; remote file access was only for shared services and libraries. It is no wonder that it could be said that 'the nice thing about the Alto is that it does not run faster at night'. This, experimentally, does not apply to the 'not responding - still trying' style of networked environments so popular nowadays, I'm afraid.

Notice that if caching in the clients is effective, virtually no caching will be needed in the server, as the server will only see either infrequent changes in the working set, or rare requests for data outside the working sets, which are by definition infrequently referenced. Remember that caches are only effective when data are *repeatedly* accessed in a short time; they *cannot* speed up access to data, they can only amortize its cost over repeated accesses.
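The same point can be put in numbers with another rough sketch in C, using the figures above. The 5x load multiplier on the network part of the delay and the 0.1 ms. cost of a client cache hit are illustrative assumptions only, not measurements; the sketch merely shows why the comparison flips once the wire and the server are shared, and why a client-side cache can only amortize the remote cost, not remove it.

    /* Remote vs. local disc access time, and the effect of a client
     * cache.  The figures reuse the worked example above; the load
     * multiplier and the cache hit cost are illustrative assumptions. */

    #include <stdio.h>

    static double
    remote_ms(double net_ms, double rot_ms, double seek_ms, double load)
    {
        /* under load, the shared wire and server stretch the network part */
        return net_ms * load + rot_ms + seek_ms;
    }

    static double
    effective_ms(double hit_rate, double hit_ms, double miss_ms)
    {
        /* a cache cannot speed up a miss; it can only dilute its cost */
        return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms;
    }

    int
    main(void)
    {
        double local = 8.0 + 25.0;                      /* 33 ms., as above    */
        double quiet = remote_ms(3.0, 8.0, 15.0, 1.0);  /* 26 ms., idle wire   */
        double busy  = remote_ms(3.0, 8.0, 15.0, 5.0);  /* 38 ms., loaded wire */

        printf("local 25 ms. seek disc         : %.0f ms.\n", local);
        printf("remote 15 ms. seek disc, idle  : %.0f ms.\n", quiet);
        printf("remote 15 ms. seek disc, loaded: %.0f ms.\n", busy);
        printf("remote, loaded, 90%% cache hits: %.1f ms.\n",
               effective_ms(0.9, 0.1, busy));
        return 0;
    }

With a 90% hit rate the loaded remote disc still looks fine; with a poor hit rate, i.e. a working set that does not fit locally, the loaded-wire figure dominates every access.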
If the data are not within a working set, caches don't help. Buffers, which are a different thing from caches, only help with smoothing *variations* in access rates, and only if data access patterns can be predicted, e.g. if they are FIFO.

It is absolutely *pointless* to have a 32MB, 15 MIPS, 8MB/s IO machine as a server for remote disc access. "The bottleneck is the network", unless, because of (possibly intentional) misdesign, "the computer is the bottleneck". The same delusion applies to those who talk of server accelerators; they can at best obviate some of the built-in bottlenecks of a misdesigned or misapplied server. It is arguably better to spend the money improving the caching of working sets on the clients (more memory or a local disc), as even the fastest of server accelerators cannot accelerate the network bandwidth. Consider how many more megabytes at $50 each, or how many 40MB local discs at $300 each, you can buy for the cost of a network accelerator.

Such a machine only makes sense if it is an *application* server, i.e. if it is used to run applications, e.g. a database system, which can take advantage of the locally available abundant resources. But in that case we don't really have distributed processing at all, as the machines that interrogate the remote applications are really working as glorified terminals; i.e. we have just a disguised form of centralized timeshared computing, in which we use a network as a terminal concentrator (which is a very bad idea if the network is a bus, less so if it is a ring).

Note that having a file server with fast, multiple IO devices and multiple Ethernets may also be a good idea: you get a star of Ethernets. This again is a variation on the centralized computing theme, but a defensible one.

It also makes sense, where there are CPU and/or memory bound applications, to have fast, large machines, with no terminals or discs, as power servers. If the applications are CPU and/or memory bound, IO interactions, be they with terminals or discs, are by definition scarce, and can quite profitably take place over the network, and the greater latency of remote execution is not important. Also, it is usually not economical to give each and every machine a fast CPU and a large memory, so most machines, especially those for individual use, can be devoted to the (ever rising, again because of interesting misdesigns) costs of running the user environment: a GUI, small compiles, edits, whatever.

On the other hand, if applications are IO bound, using your industry standard slow shared network as an IO (or cache access) bus is a pretty bad idea, and if they are CPU, memory, and IO bound, a supercomputer is needed, not a network.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
mo@messy.bellcore.com (Michael O'Dell) (03/01/91)
Don't move data if you don't need to.

or

Don't move data over a thin wire if you can move it over a thick one.

Seems pretty obvious to me.
pcg@cs.aber.ac.uk (Piercarlo Grandi) (03/02/91)
On 1 Mar 91 14:54:31 GMT, mo@messy.bellcore.com (Michael O'Dell) said:

mo> Don't move data if you don't need to.
mo> or
mo> Don't move data over a thin wire if you can move it over a thick one.
mo> Seems pretty obvious to me.

I cannot argue with your comment, I am afraid. I am sorry for restating the obvious in so many words, but I have spent so much time explaining it to various people in the past, and observing that a large number of LANs are configured as if the obvious were deep mysteries of lost Atlantis, that I wanted to put things on record. Also, it does not seem so obvious to all those who buy into things like NFS accelerators instead of more memory for the clients, or NFS engines with several Ethernet interfaces and Ethernet wires (Auspex, for example, to make Guy Harris happy), which are things that do make some sense.

Also, I want to initiate a move back to timesharing machines, my first love (ahhh, a nice 1108 with DEMAND and a FASTRAND; ahhh, a nice 370/168 with CP/CMS and 2435s; ahhh, a nice 6080 with Multics and IOPs; ahhh, a nice 11/70 with 2.9BSD and two 80MB removables; ...).

Most sysadmins are hapless, and as of now they are the greatest performance bottleneck, not even the wire. LANs are much more difficult to administer than timesharing systems, and much more difficult to tune. I tried to give some qualitative idea of the enormity of the problems...
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
leo@unipalm.uucp (E.J. Leoni-Smith) (03/05/91)
Hmm. Piercarlo has just gone to great lengths to restate the underlying engineering principles of the network. Well, it probably needed saying. I notice that his email domain is a '.edu'.

We have a famous remark, quoted here, from a frustrated person who sent back some PC networking software after utterly failing to get it to work (we have sold tens of thousands of the same):

"I am not entirely stupid: I have a degree in computer science..."

I have a degree in engineering, and I must confess that I sometimes have a personal bias against academics who endlessly obfuscate simple issues. Like the man said: don't send data down a wire if you can help it! (Prove me wrong - you ARE a computer scientist.)

Likewise, all power to the guys at SUN for taking a poorly specified (but practical) networking standard and adapting it for a usage (LAN file serving) for which it was not designed. THAT'S engineering. In the real world.