reg@lti.com (Rick Genter x18) (04/22/89)
I have a bizarre situation. We have a Sun-3/180 server hooked up to an Isolan multiport repeater. We have four network legs connected to the repeater (including the leg containing the server). We have ~20 diskless and dataless Suns hooked up to the network; some are on the same leg as the server, while some are on the other legs.

The Suns on the server leg work fine. The Suns on the other legs work marginally, requiring us to cut the buffer sizes for NFS reads and writes to 1024. Anything larger and we get NFS Server not responding/still trying.

For a long time we suspected the repeater, but then we got an Ethernet test box (handy dandy little piece of equipment) and blasted packets across the repeater in great volumes during times of great load, and never lost a single packet.

What's even more bizarre is that the Suns on the other legs go through tftpboot and arp and everything just fine, until they start to "heavily" use NFS. Thus I suspect there is some parameter (other than rsize/wsize) in NFS that needs tweaking. The rsize/wsize tweak is not an acceptable option; the non-server-leg Suns run *SIGNIFICANTLY* slower than the server-leg Suns, to the point of almost being unusable. I tried fiddling with timeo, but with 8K buffers even a timeo of 50 (5.0 seconds) doesn't work.

We have a variation of the problem which is that some machines on the non-server leg will work with rsize/wsize=1024 when configured as dataless nodes, but if you try to make them diskless nodes it doesn't matter what size you give, they hang during /etc/rc. They always hang executing "ps -U"; if you comment that line out, they hang at the next command in rc that does "significant" I/O. (ps -U builds the file /etc/psdatabase -- it's not clear what ps does with this file, but it doesn't seem to work without it.) "Hanging" means "NFS Server not responding/still trying".

We've been around and around with Sun on this; I have yet to talk to someone who struck me as halfway competent. I've been dealing with Suns since '82 and am no novice at this stuff, but I can't seem to convince Sun of that (I get bullsh*t suggestions like "are your cables making good contact").

Has anyone out there in 'Spots-Land seen anything like this? Frankly, we're stumped, and we're tired of dealing with the incompetent bozos at Sun.

Thanks for hearing my gripe.

- reg
--
Rick Genter ...!{buita,bbn}!lti!reg
Language Technology, Inc. reg%lti.uucp@bu-it.bu.edu
27 Congress St., Salem, MA 01970 (508) 741-1507

[[ This was added later: --wnl ]]

A followup to my last message: all machines are Sun-3s, and all are running 4.0.1 with tty patch and itrunc patch.

-reg
sater@cs.vu.nl (Hans van Staveren) (05/05/89)
We had exactly the same problem, I think. If it only happens to 3/60's and not to 3/50's it must be the same. Our ISOLAN repeaters were swapped to get it fixed.

It seems that there is a crystal in early models of the repeater that is just a teeny bit out of spec. The Sun 3/60 seems to be marginal but just in spec, so the combination fails. Every large packet sent from a 3/60 through a repeater has a pretty high chance of being munged. Since an 8K NFS write is 6 Ethernet packets, and all six are retransmitted when any one of them fails, the chance of all six getting through gets close to zero, giving the effect you describe.

Get your repeaters replaced.

Hans van Staveren
Vrije Universiteit
Amsterdam, Holland
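To put rough numbers on the six-packet argument above, here is a small sketch. It assumes a 1500-byte Ethernet MTU, a 20-byte IP header with no options, an 8-byte UDP header, and a guessed 150 bytes of RPC/NFS header overhead; the per-frame survival rates are made-up illustrative values, not measurements from either site, and frames are assumed to fail independently.

```python
import math

ETHERNET_MTU = 1500   # bytes of IP datagram carried per Ethernet frame
IP_HEADER = 20        # assumes no IP options
UDP_HEADER = 8
RPC_OVERHEAD = 150    # assumption: rough size of the RPC + NFS write headers


def fragments_for(nfs_size):
    """IP fragments needed for one NFS request carrying nfs_size data bytes."""
    udp_datagram = UDP_HEADER + RPC_OVERHEAD + nfs_size
    # Fragment payloads (except the last) must be a multiple of 8 bytes.
    per_fragment = (ETHERNET_MTU - IP_HEADER) // 8 * 8
    return math.ceil(udp_datagram / per_fragment)


def request_success(p_frame, nfs_size):
    """Chance that every fragment of one request survives, given that each
    frame independently survives with probability p_frame."""
    return p_frame ** fragments_for(nfs_size)


# Hypothetical per-frame survival rates, for illustration only.
for p in (0.9, 0.7, 0.5):
    print(p,
          round(request_success(p, 1024), 4),
          round(request_success(p, 8192), 4))
```

Under these assumptions a 1024-byte request fits in a single frame while an 8192-byte request needs six, so any per-frame loss is raised to the sixth power for the larger size. That is consistent with rsize/wsize=1024 limping along while 8K transfers never complete.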
joe@uunet.uu.net (Joe Michel-Angelo) (05/05/89)
| We've been around and around with Sun on this; I have yet to talk to
| someone who struck me as halfway competent. I've been dealing with Suns
| since '82 and am no novice at this stuff, but I can't seem to convince Sun
| of that (I get bullsh*t suggestions like "are your cables making good
| contact").

but are your cables making good contact?

whenever nd or nfs fails but the network looks good to test hardware, it's because of the following:

- thin/thick net segment bad
- bad bnc, barrel, or t connector
- bad vamp connection
- bad drop cable
- bad drop cable connector/connection
- drop cable too long ("they" say 50 feet is max; i say it's 15 or 25 feet)
- giant network! (the length of every segment adds to total length)
- too many neighboring repeaters (you should never have more than 2 repeaters next to each other)

in bizarre cases:

- isolan repeater power source in 220 and not 110 volts position (repeater works but only for small packets) (hey-- don't try this ....)
- ethernet segment ground problem ... ie: grounded to an isolated ground ups and not mother earth.
- xcvrs set with ENCODER on ... ENCODER option should be disabled.

--
"The Network        Joe Angelo, VP/Technical Support - Support Group Division
 Administrator      Teknekron Software Systems, Palo Alto, CA 415-325-1025
 Is the Computer"   joe@tss.com - uunet!tekbspa!joe - tekbspa!joe@uunet.uu.net