[comp.arch] Network file service bogosities

pcg@cs.aber.ac.uk (Piercarlo Grandi) (03/01/91)

On 28 Feb 91 19:53:47 GMT, pcg@cs.aber.ac.uk (Piercarlo Grandi) said:

On the subject of 'Re: how many nfsd's should I run?' I have written my
usual fundamental and world-enlightening encyclical OOPS... contribution:

pcg> I have crossposted to comp.arch, because this is really a
pcg> system/network architecture question. NFS is almost incidental :-).

But there is a wider discussion that deserves some space.

I think that now is the time to post the long-threatened discussion of
network file system bogosities. I hope it is entertaining and amusing
speculation, especially for all the thousands of sysadmins who have
configured, for timesharing applications, Ethernets with hundreds of
diskless workstations hanging off a handful of servers.

So, I offer to the general public some truly basic thoughts on file
service, which I regrettably find are often not very familiar to a large
number of fashion-conscious people. File service is not a win
everywhere; actually, it is only a sensible proposition under certain
quite specific assumptions, and only if careful tuning is done.

I name no names, and give no specific examples, but some not too veiled
references to SUN, its networking technology, and its applicability to
the usually abhorrent practices of their customers are obvious. Note
that this is not a discussion of the bogosities of any specific network
file service architecture or implementation; it is a discussion of
network file service itself.

		REFLECTIONS ON FILE SERVICE

With networking technology it is possible to make a computer act as a
surrogate disc controller for several other computers over a network. It
is therefore possible to physically cluster the disc storage of several
machines on just one of them.

There are a number of reasons for which this may be desirable over
giving each machine its own discs and just letting each access the
others' if needed:

    1) simplified physical administration, such as backups, even if
    simplified software administration does not follow, as the number of
    (virtual) discs to manage does not really change (modulo the use of
    non-traditional technology).

    2) economies of scale in buying large discs, which have a lower cost
    per byte, and in using fewer, larger-capacity backup devices.

    3) reduced space usage for files, because many OS and library
    components need only be stored once.

A relatively small drawback is potentially reduced availability, which
can be cured by replicating the centralized facilities (which tends to
counteract at least in part the advantages above); the network itself,
even though it becomes a single point of failure, is not usually a
problem.

    We are saying here essentially that fully replicated autonomous systems
    provide a level of redundancy whose costs are greater than the potential
    benefits to many users, so sharing reduces costs without significantly
    impairing the value of benefits.

There is one fundamental limitation, and it is that in the *best* of
circumstances (no other traffic, point-to-point channel between two
unloaded machines on a *bus* network) a typical net only supports
about 700-800KB per second of bandwidth (in truly exceptional
circumstances you might get up to 1000KB per second).

This bandwidth is quite small, and the effective bandwidth could well be
half or a third of that under less than optimal conditions (especially
with contention networks, sharing effectively reduces it, although not
nearly as badly if the network, like Ethernet, has collision detection
and exponential backoff).
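
To put some rough numbers on this, here is a back-of-the-envelope sketch
in a few lines of C; the 10 Mbit/s raw rate is standard Ethernet, the
750KB/s usable figure is the one quoted above, and the client counts are
simply invented for illustration:

    /* Back-of-the-envelope Ethernet bandwidth sketch; all figures illustrative. */
    #include <stdio.h>

    int main(void)
    {
        double raw_kb_s    = 10.0e6 / 8.0 / 1024.0; /* 10 Mbit/s raw wire rate     */
        double usable_kb_s = 750.0;                 /* ~700-800KB/s point to point */
        int clients;

        printf("raw wire rate:       %.0f KB/s\n", raw_kb_s);
        printf("realistic best case: %.0f KB/s (about %.0f%% of raw)\n",
               usable_kb_s, 100.0 * usable_kb_s / raw_kb_s);

        /* a shared bus divides what is usable among the active talkers */
        for (clients = 1; clients <= 32; clients *= 2)
            printf("%2d active clients -> roughly %.0f KB/s each\n",
                   clients, usable_kb_s / clients);
        return 0;
    }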

    Notice that remote disc access latency in the best of conditions can be
    in the 1 to 5 ms. range; this is usually small compared to typical
    rotational latency of 8-9 ms. and seek times of 15-25 ms.; this has led
    some innocent souls to conclude that a 15 ms. avg. seek time remote
    disc (total delay very roughly 3+8+15=26 ms.) will offer better
    performance than a local 25 ms. avg. seek time disc (total delay
    8+25=33ms.), which is of course entirely true if nobody else is using
    the wire and the server. If the wire and the server are shared, this may
    become less true (euphemism).
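
To make that 'less true' a little more quantitative, here is a crude
model of the same comparison; the unloaded figures are the ones above,
and the load scaling is a simple 1/(1-utilization) queueing
approximation applied only to the wire+server part, so the numbers are
indicative at best:

    /* Remote vs. local disc access time under load; a crude, illustrative model. */
    #include <stdio.h>

    int main(void)
    {
        double net_ms   = 3.0;   /* wire + server latency when unloaded   */
        double rot_ms   = 8.0;   /* rotational latency, either disc       */
        double seek_rem = 15.0;  /* avg. seek of the "fast" remote disc   */
        double seek_loc = 25.0;  /* avg. seek of the "slow" local disc    */
        double util;

        printf("local disc, always:                %.0f ms per access\n",
               rot_ms + seek_loc);

        /* scale the network/server part by 1/(1-u): a very crude queue model */
        for (util = 0.0; util < 0.9; util += 0.2)
            printf("remote disc, wire+server %2.0f%% busy: %.0f ms per access\n",
                   100.0 * util, net_ms / (1.0 - util) + rot_ms + seek_rem);
        return 0;
    }

Even this charitable model, in which the remote disc itself never
queues, has the remote access losing to the slower local disc once the
wire and server are busy most of the time.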

There are other potential bottlenecks, but they are far less important, as
remote disc access is neither CPU nor memory bound, nor, given the small
bandwidth available on the network, really disc IO bound. In case of
misdesign some artificial bottlenecks may become evident, such as poor IO
bandwidth. In many cases such bottlenecks by misdesign can be obviated by
having multiple server machines, so that a single computer's bottlenecks
are not reached. On the other hand, the network bandwidth is a datum, and
using more than one wire is the only way out, but not necessarily (even if
often) the most cost-effective one.

Remote access to disc space therefore only makes sense if it is
*guaranteed* that it happens infrequently, and that it is bursty (as
effective point to point bandwidth is inversely proportional to the
number of users in a *bus* network) *on the network* overall (it is well
known that each single machine will usually be bursty).

For this to be true, the working set of disc data necessary to a machine
*must* be local to it, either in memory or on a local disc. If this does
not happen, and the working set cannot be held in the machine, remote
disc access will exhibit thrashing, and the shared network will be used
as if it were a private IO bus, which it is not.
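
A quick way to check whether a given configuration stays in that
'infrequent and bursty' regime is to add up the working-set misses of
all the clients and compare them with what the wire can carry; the miss
rate and transfer size below are invented, only the method matters:

    /* Does the sum of the clients' working-set misses fit on one shared wire? */
    #include <stdio.h>

    int main(void)
    {
        double wire_kb_s   = 750.0;  /* usable Ethernet bandwidth, best case     */
        double miss_per_s  = 2.0;    /* working-set misses per client per second */
        double kb_per_miss = 8.0;    /* average remote transfer per miss, in KB  */
        int clients;

        for (clients = 10; clients <= 80; clients += 10) {
            double offered = clients * miss_per_s * kb_per_miss;
            printf("%2d clients offer %4.0f KB/s -> %s\n",
                   clients, offered,
                   offered < 0.5 * wire_kb_s
                       ? "bursty, the wire copes"
                       : "the wire becomes a saturated IO bus");
        }
        return 0;
    }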

To keep the working set local, a machine should have local discs for high
traffic, low latency IO, like paging, swapping, and temporary files, and
an effective in-core buffering scheme for other files, one that makes
the most use of memory. For example, under (some versions of) Unix it is
convenient to have a local paging and swapping disc, so that frequently
used executables can be made 'sticky' and linger on in the local
swapping area, thus saving references to the remote discs when they are
repeatedly invoked. A local paging disk is even more important
considering that swapins/swapouts may involve large numbers of IO
transactions, and that swap latency critically influences response
times.
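
For instance, consider swapping a 1MB process image back in; the disc
and wire rates below are guesses of the right order of magnitude for
current hardware, nothing more:

    /* Swap-in time for a 1MB image: local disc vs. a shared wire (illustrative). */
    #include <stdio.h>

    int main(void)
    {
        double image_kb   = 1024.0;  /* the process image being swapped in */
        double local_kb_s = 1500.0;  /* sustained local disc transfer rate */
        double wire_kb_s  = 750.0;   /* best-case Ethernet, before sharing */
        int sharers;

        printf("local swap disc:            %4.1f s\n", image_kb / local_kb_s);
        for (sharers = 1; sharers <= 8; sharers *= 2)
            printf("remote, wire shared %d ways: %4.1f s\n",
                   sharers, image_kb / (wire_kb_s / sharers));
        return 0;
    }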

    The original Ethernet environment consisted of workstations that each
    had a local hard disc with a removable cartridge. Each user had the OS
    and his files on a cartridge to be loaded on any available machine.
    Remote file access was only for shared services and libraries. It is no
    wonder that it could be said that 'the nice thing about the Alto is that
    it does not run faster at night'.  This, experimentally, does not apply
    to the 'not responding - still trying' style of networked environments
    so popular nowadays, I'm afraid.

Notice that if caching in the clients is effective, virtually no caching
will be needed in the server, as the server will only see either
infrequent changes in the working set, or rare requests for data outside
the working sets, which are by definition infrequently referenced.

    Remember that caches are only effective when data is *repeatedly*
    accessed in a short time; they *cannot* speed up the first access to
    data, they can only amortize its cost over repeated accesses. If the
    data are not within a working set, caches don't help. Buffers, which
    are a different thing from caches, only help with smoothing
    *variations* in access rates, and only if data access patterns can be
    predicted, e.g. they are FIFO.
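
The amortization point in arithmetic form; the hit and miss times below
are invented, the shape of the result is what matters:

    /* Caches amortize cost over repeated accesses, they do not speed up a miss. */
    #include <stdio.h>

    int main(void)
    {
        double hit_ms  = 0.1;   /* data already in the client's memory */
        double miss_ms = 30.0;  /* a full trip to the remote disc      */
        double hit_ratio;

        for (hit_ratio = 0.0; hit_ratio <= 1.0; hit_ratio += 0.25)
            printf("hit ratio %.2f -> %5.2f ms average; a miss still costs %.0f ms\n",
                   hit_ratio,
                   hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms,
                   miss_ms);
        return 0;
    }

If the access pattern has no reuse, the hit ratio stays near zero and
the average stays pinned at the miss cost, which is the whole point.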

It is absolutely *pointless* to have a 32 MB, 15 MIPS, 8 MB/s IO machine
as a server for remote disc access. The "bottleneck is the network",
unless, because of (possibly intentional) misdesign, "the computer is
the bottleneck".

    The same delusion applies to those who talk of server accelerators;
    they can at best obviate some of the in-built bottlenecks of a
    misdesigned/misapplied server. It is arguably better to spend the
    money improving the caching of working sets on the clients (more
    memory or a local disk), as even the fastest of server accelerators
    cannot accelerate the network bandwidth. Consider how many more
    megabytes at $50 each, or how many 40MB local disks at $300 each, you
    can buy for the cost of a network accelerator.
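
The arithmetic, with the memory and disk prices quoted above and a
guessed figure for the accelerator (I have no real price list in front
of me):

    /* What client-side resources the price of one accelerator buys (illustrative). */
    #include <stdio.h>

    int main(void)
    {
        double accel_price = 10000.0;  /* guessed price of a server accelerator */
        double mb_price    = 50.0;     /* $ per megabyte of client memory       */
        double disk_price  = 300.0;    /* $ per 40MB local disk                 */

        printf("for $%.0f you could instead buy:\n", accel_price);
        printf("  %.0f MB of client memory spread over the clients, or\n",
               accel_price / mb_price);
        printf("  %.0f local 40MB disks (%.0f MB of local paging/swap space)\n",
               accel_price / disk_price, 40.0 * accel_price / disk_price);
        return 0;
    }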

Such a big server machine only makes sense if it is an *application*
server, i.e. if it is used to run applications, e.g. a database system,
which can take advantage of the locally available abundant resources.
But in this case we don't really have distributed processing at all, as
the machines that interrogate the remote applications are really working
as glorified terminals, i.e. we have just a disguised form of
centralized timeshared computing, in which we use the network as a
terminal concentrator (which is a very bad idea if the network is a bus,
less so if it is a ring).

    Note that having a file server with fast, multiple IO devices and
    multiple ethernets may also be a good idea. You get a star of ethernets.
    This again is a variation on the centralized computing theme, but is a
    defensible one.

It also makes sense, for cases where there are CPU and/or memory bound
applications, to have fast, large machines, with no terminals or discs,
as power servers. If the applications are CPU and/or memory bound, IO
interactions, be they with terminals or discs, are by definition scarce,
and can quite profitably take place over the network, and the greater
latency of remote execution is not important. Also, it is usually not
economical to give each and every machine a fast CPU and a large memory.
So, most machines, especially those for individual use, can be devoted
to the (ever-rising, again because of interesting misdesigns) costs of
running the user environment, such as a GUI, or small compiles, edits,
whatever.

On the other hand, if applications are IO bound, using your industry
standard slow shared network as an IO (or cache access) bus is a pretty
bad idea, and if they are CPU, memory, and IO bound, a supercomputer is
needed, not a network.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

mo@messy.bellcore.com (Michael O'Dell) (03/01/91)

	Don't move data if you don't need to.

or 

	Don't move data over a thin wire if you can move it over a thick one.

Seems pretty obvious to me.

pcg@cs.aber.ac.uk (Piercarlo Grandi) (03/02/91)

On 1 Mar 91 14:54:31 GMT, mo@messy.bellcore.com (Michael O'Dell) said:


mo> 	Don't move data if you don't need to.
mo> or 
mo> 	Don't move data over a thin wire if you can move it over a thick one.

mo> Seems pretty obvious to me.

Cannot argue with your comment, I am afraid. I am sorry for restating
the obvious in so many words, but I have spent so much time explaining
it to various people in the past, and observing that a large number of
LANs are configured as if the obvious were deep mysteries of lost
Atlantis, that I wanted to put things on record.

Also, it does not seem so obvious to all those who buy into things like
NFS accelerators instead of more memory for the clients, or instead of
NFS engines with several Ethernet interfaces and Ethernet wires (Auspex,
for example, to make Guy Harris happy), things that do make some sense.

Also, I want to initiate a move back to timesharing machines, my first
love (ahhh, a nice 1108 with DEMAND and a FASTRAND; ahhhh, a nice
370/168 with CP/CMS and 2435s; ahhh, a nice 6080 with Multics and IOPs;
ahhh, a nice 11/70 with 2.9BSD and 2 80MB removables; ...).

Most sysadmins are hapless, and as of now they, not the wire, are the
greatest performance bottleneck. LANs are much more difficult to
administer than timesharing systems, and much more difficult to tune. I
have tried to give some qualitative idea of the enormity of the
problems...
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

leo@unipalm.uucp (E.J. Leoni-Smith) (03/05/91)

Hmm. Piercarlo has just gone to great lengths to restate the underlying
engineering principles of the network.

Well it probably needed saying. I notice that his email domain is a '.edu'

We have a famous remark that is quoted here, from a frustrated person
who sent back some PC networking software after utterly failing to get
it to work (we have sold tens of thousands of the same).

"I am not entirely stupid: I have a degree in computer science..."

I have a degree in engineering, and sometimes I must confess that I have
a personal bias against academics who endlessly obfuscate simple issues.


Like the man said - don't send data down a wire if you can help it!
(prove me wrong - you ARE a computer scientist)

Likewise all power to the guys at SUN for taking a poorly specified
(but practical) networking standard, and adapting it for a usage
(LAN file serving) for which it was not designed.

THAT'S engineering.
In the real world.