ntm1569@dsacg3.UUCP (Jeff Roth) (08/24/89)
The mount server appears to be becoming a bottleneck for an application in which we have a large number of PC clients accessing data on a minicomputer server. On occasion we can have quite a few users issuing multiple mount requests simultaneously. When this happens we see some of the requests time out, while users accessing already-mounted files continue to receive good service.

To be precise, the server is a Gould PowerNode 9050 (uniprocessor) with a full complement of disk and ethernet controllers (four of each), and little running on it other than the various network services (next to no interactive users). PCs start up by mounting and unmounting a file system to download the application binaries, then mount data(base) files. With around forty clients accessing the server, additional clients' mount attempts begin to fail (though retries may succeed). (At this point we might have a dozen or so new clients trying to mount.)

The mount server has to read /etc/exports; to do the host name to IP address translation it would also have to access /etc/hosts (or the name server); and it writes /etc/rmtab. So we thought mountd might be having trouble getting to /etc. But ps "snapshots" showed mountd rarely waiting on disk. The mount server obviously also needs CPU cycles, and must compete for them, mostly with the many NFS server daemons we run. At peaks we see mid-20s load averages, and with mountd reniced to increase its scheduling priority we are able to get seventy clients "on" before we again begin to see the mount requests time out.

Our conclusion has been that we have a CPU bottleneck, with mountd getting the worst of it. We're a little surprised, though, by the extreme insensitivity of the already-mounted clients to the bottleneck (remember, they continue to see good response time even at peak loads). I'd be interested in hearing if anyone else has run into this particular wall in building an NFS application, and what, if anything, you've done about it.
Or if anyone has any other thoughts on what might be happening.
--
jroth@dsac.dla.mil
U.S. Defense Logistics Agency
(614) 238-9421
chuq@Apple.COM (Chuq Von Rospach) (08/25/89)
>The mount server appears to be becoming a bottleneck for an application in
>which we've a large number of PC clients accessing data on a minicomputer
>server. On occasion we can have quite a few users issuing multiple mount
>requests simultaneously. When this happens we see some of the requests time
>out, while users accessing already mounted files continue to receive good
>service.

Definitely. For a good time, set up a machine exporting USENET to three or four hundred machines and then have it crash for 24 hours. All of the NFS servers jump on it as soon as it comes back up, and I've seen mount requests sit two hours waiting to happen.

>The mount server has to read /etc/exports, and to do the host name to IP
>address translation would also have to access /etc/hosts (or the name
>server), and it writes /etc/rmtab. So we thought mountd might be having
>trouble getting to /etc. But ps "snapshots" showed mountd rarely waiting
>on disk.

The disk activity of mountd is fairly trivial. Hostname lookups via Yellow Pages clear out a good bit, since you aren't sequentially searching the host table. Imagine, though, what's happening at the network layer. 50-100 (or more) machines are all trying to create connections to the mountd at once. It's spinning away, dealing with them as fast as it can, but the ethernet buffers are all clogged with incoming packets, the mbuf pool is wedged full of pending requests that are already in the queue (making it tough, sometimes, for the mountd to get the memory it needs to return an fhandle to the client so it can finish a given request), packets are being dropped on the floor, clients are timing out and sending repeat requests -- it gets *really* nasty. You end up, essentially, thrashing at a couple of layers in the kernel and sending lots and lots of ethernet packets all over everywhere. It isn't, really, a CPU bottleneck, although a faster CPU will help somewhat.
The problem, from what I've seen, is that the statelessness of NFS makes it impossible for the client to tell whether the server has never seen its request (as opposed to knowing about it and not acting on it yet). So it has to assume the request disappeared and send it out again when it times out. This is correct most of the time, but not in this kind of worst-case scenario.

One way to minimize it under the current scheme would be to make the "mount request timeout" a sliding scale similar to ethernet packet collision delays -- every time it times out, the client waits a little longer (with a randomizing factor tossed in) before sending the request again. That isn't reducing the mounting load, but simply spreading it out further in time. Doesn't hurt the normal case, and would reduce some of the clogging in the worst-case scenario.

chuq

Chuq Von Rospach =|= Editor, OtherRealms =|= Member SFWA/ASFA
chuq@apple.com =|= CI$: 73317,635 =|= AppleLink: CHUQ
[This is myself speaking. No company can control my thoughts.]
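[Moderator's aside: the sliding-scale retry Chuq describes is essentially exponential backoff with a randomizing factor. A minimal sketch, in Python for brevity; `base` and `cap` are made-up tunables, not parameters of any real mount(8) implementation:]

```python
import random

def retry_delay(attempt, base=5.0, cap=300.0):
    """Delay before the Nth mount retry (attempt 0 = first retry).

    The window doubles on every timeout, capped at `cap` seconds, and
    the actual wait is drawn uniformly from [0, window] so that a lab
    full of clients rebooting together do not all retry in lockstep --
    the same idea as ethernet collision backoff.
    """
    window = min(cap, base * (2 ** attempt))
    return random.uniform(0, window)
```

After three timeouts a client picks a wait somewhere in [0, 40) seconds rather than hammering the mountd at a fixed interval; the total mounting load is unchanged, just spread further out in time.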
liam@cs.qmc.ac.uk (William Roberts) (09/01/89)
>>On occasion we can have quite a few users issuing multiple mount
>>requests simultaneously. When this happens we see some of the requests time
>>out, while users accessing already mounted files continue to receive good
>>service.

This is a difference between user-level RPC and kernel-level RPC. The kernel level *knows* that its NFS RPC requests are idempotent, and so it doesn't change the xid when it sends a retransmission. This means that the first reply is acceptable no matter how many retransmissions have occurred. The user level makes no such guarantee, so there is a new xid for each retransmission. In particular, this means that the mount program's RPC requests to the mount daemon *have* to be answered before the timeout period is up, otherwise the reply is discarded as out of date. Ultimately this becomes a race condition, especially as the mount requests are small and the machine can buffer lots of them.

We had an NFS server with 40 clients that was a 0.5 MIP Whitechapel MG1 - when all 40 clients rebooted after a power failure it was taking about 3 minutes from a client sending a request to the mountd sending the reply, by which time a lot of 25 second timeouts had gone by. Funny thing is, every mountd response is identical, so the first one would do and the rest could be discarded.... You are just lucky that your server occasionally gets in there quick enough!

>>The mount server has to read /etc/exports, and to do the host name to IP
>>address translation would also have to access /etc/hosts (or the name
>>server), and ***it writes /etc/rmtab*** [ my emphasis ]. So we thought
>>mountd might be having trouble getting to /etc. But ps "snapshots" showed
>>mountd rarely waiting on disk.

To be more specific, it does a linear scan through rmtab looking to see if this mount request is already there, and adds onto the end if it isn't. On my main machine /etc/rmtab is 978 lines long.
The reason it is so long is that most clients unmount their disks by crashing, so the rmtab file never gets cleared by unmount requests. On our MG1 servers we reniced the mountd to -15 and removed all the /etc/rmtab nonsense.

I'm sorry Chuq, but all that stuff about relentless mashing of mbufs just doesn't sound at all plausible, especially since the lucky clients who have already mounted are getting good service. (If it hadn't been from someone who ought to know I would have loudly decried it as complete *@*!%*, but perhaps I'm not so certain of my ground...)

The Bottom Line:

1) Change mount to use a TCP connection to the mountd, or otherwise provide an idempotent RPC.
2) Change mountd to use a dbm file or some other means of speeding up the search through rmtab.
3) Encourage people to remove rmtab as part of the boot sequence!

Actually, idempotent RPC is an easy and valuable thing to do, especially as you just say "Buyer beware" and take "idempotent RPC" to mean "don't increment the xid for each retransmission".
--
William Roberts          ARPA: liam@cs.qmc.ac.uk
Queen Mary College       UUCP: liam@qmc-cs.UUCP
190 Mile End Road        AppleLink: UK0087
LONDON, E1 4NS, UK       Tel: 01-975 5250   Fax: 01-980 6533
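[Moderator's aside: point 2 above -- replacing the linear scan of rmtab with a keyed lookup -- can be sketched as follows. This is Python standing in for what would be C with dbm in a real mountd, and the `Rmtab` class and its method names are invented for illustration:]

```python
class Rmtab:
    """In-core index of remote mounts, keyed by (host, path).

    Checking "is this mount already recorded?" becomes a hash lookup
    instead of a linear scan of a 978-line /etc/rmtab.  The flat file
    would still be appended to for crash recovery; only the duplicate
    check moves into the index.
    """

    def __init__(self):
        self._entries = {}          # (host, path) -> True

    def record(self, host, path):
        """Return True if this is a new entry (and so needs writing)."""
        key = (host, path)
        if key in self._entries:    # duplicate mount: nothing to write
            return False
        self._entries[key] = True   # new: would append "host:path" to file
        return True

    def forget(self, host, path):
        """Handle an unmount request (rare, per the point above)."""
        self._entries.pop((host, path), None)
```

The same keyed structure also makes point 3 cheap: clearing stale entries at boot is just rebuilding an empty index.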
brent%terra@Sun.COM (Brent Callaghan) (09/06/89)
I made some improvements to the mountd performance for the SunOS 4.0.3 release. They were oriented to speeding up mounts from a client using the automounter's "-hosts" map. The special situation here is that you can have a large number of mount requests coming in from the same client in a short period of time, but the changes should make the mountd a bit faster for all mount requests.

- Exports caching. Previously the mountd had to open the /etc/exports file and do a linear search for the exported filesystem for each mount request from a client. I had the file cached as a linked list. The list is valid as long as a stat() of /etc/exports shows that it hasn't been updated (by exportfs).

- Asynchronous /etc/rmtab updating. The mountd was changed to update /etc/rmtab *after* it had sent the response containing the filehandle back to the client. There's no reason why the client should have to wait for this file to be updated. BTW: the /etc/rmtab is already cached as a linked list. The disk file is read only when the mountd starts up (presumably after a crash).

- Changed netgroup/hostname checking. Given a list of hostnames/netgroups, there's no easy way to tell whether a name represents a hostname or a netgroup. The old mountd used to take each name one at a time, first checking it as a netgroup, then as a hostname. This is about the most inefficient way to do this checking, and it could take a huge amount of time if the list was big (I've seen exports with 100 or so names in the list). The new code first checks the whole list as if it were hostnames. This is just a bunch of strcmp's, so it's relatively fast. If there's no match, then it assumes that the list is netgroups and checks them with innetgr calls. This is a whole lot faster if the list is just hostnames.

FYI.

Brent
Made in New Zealand -->   Brent Callaghan @ Sun Microsystems
uucp: sun!bcallaghan      phone: (415) 336 1051
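[Moderator's aside: the stat()-validated exports cache in the first point can be sketched like this. Python rather than the actual SunOS C, the `ExportsCache` class is invented for illustration, and the exports file format is simplified to "path host host ..." per line:]

```python
import os

class ExportsCache:
    """Cache of a parsed exports file, revalidated by stat().

    The file is re-read only when its mtime changes (i.e. exportfs has
    rewritten it); otherwise every mount request is answered from the
    in-core table instead of re-opening and re-scanning the file.
    """

    def __init__(self, path):
        self.path = path
        self._mtime = None          # mtime the cache was built from
        self._table = {}            # filesystem -> list of allowed hosts

    def _revalidate(self):
        mtime = os.stat(self.path).st_mtime
        if mtime != self._mtime:    # file changed since last parse
            table = {}
            with open(self.path) as f:
                for line in f:
                    fields = line.split()
                    if fields:
                        table[fields[0]] = fields[1:]
            self._table, self._mtime = table, mtime

    def lookup(self, filesystem):
        """Return the host list for an export, or None if not exported."""
        self._revalidate()
        return self._table.get(filesystem)
```

The hostname-first check in the third point would then be a membership test against `lookup()`'s result before falling back to the (much slower) innetgr-style netgroup calls.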