dyer@arktouros.MIT.EDU (Steve Dyer) (09/29/88)
I am running A/UX 1.0 and find that I have to reboot the machine regularly because the network dies with the messages

ae0: overflow NIC reset failed
ae6_intr: Receive overflow warning

Following this, the network is effectively dead. Some utilities whose names escape me actually say so ("network down"), but this is not reflected in the output of "netstat -i", nor does running ifconfig reset things. Usually I see many copies of this message on my screen when I first come in each day, but I can also make it happen on command by doing an rcp with files going to the A/UX box.

Very occasionally, a message of the form "ae0 transmitter frozen, resetting" appears within a few minutes and the system is usable again, but more often I never see this message and rebooting is the only way to clear things.

What's the matter here? Is the ethernet interface bad? A mVAX-II, an Apollo, a Bell Tech V.3 box and an RT/PC are all on the same DELNI and exhibit no problems whatsoever. Any and all clues would be appreciated.

--- Steve Dyer dyer@arktouros.MIT.EDU dyer@spdcc.COM aka {harvard,husc6,linus,ima,bbn,m2c,mipseast}!spdcc!dyer
dixon@control.steinmetz (walt dixon) (09/30/88)
We too have experienced this problem. I've talked to several people at Apple, including A/UX support, who could not believe that things like this really happen. (Welcome to the real world.) This problem has cost Apple sales at our site, and the loss of sales has finally got their attention (at least on a local level). I've promised to send a real-time trace of network activity to the developers so they can try to duplicate the problem. Will it be fixed in 1.1? I don't know.

I suspect that the problem originates with either bursts of broadcast traffic or bad packets. Hopefully the trace will isolate the problem. A/UX should definitely recover from this condition; no other devices on this ethernet segment have shown similar behavior.

You can recover from this condition using ifconfig. "ifconfig ae0 up" will bring the network back up; one can also write a program to turn the network back on. The problem gets more interesting when you have an NFS hard mount. I tried to write a program which would run in the background, catch a signal that the network was down, and turn the network on again. This approach seems reasonable, but time constraints and a lack of Unix knowledge have prevented its completion. I'm willing to give the code I've got to anyone who wants to get it working. The only condition is that, if you get it to work, you post it so others can use it.

Walt Dixon {arpa: dixon@ge-crd.arpa } {us mail: ge corp r&d } { po box 8 } { schenectady, ny 12345 } {phone: 518-387-5798 }
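Walt's program was never finished, so the following is only a hypothetical sketch of the same idea, not his code. Instead of catching a signal, it polls: it periodically runs a probe command and, if the probe fails, performs the "ifconfig ae0 down; ifconfig ae0 up" sequence that later posts in this thread report as working. Assumptions: the probe command ("ping -c 1 gateway", modern ping syntax with a hypothetical host named gateway; A/UX's ping may take different arguments), ifconfig being on the PATH, and the script running as root.

```sh
#!/bin/sh
# Hypothetical polling watchdog (not Walt Dixon's program).
# Assumptions: interface ae0, a probe command that fails while the
# network is wedged, ifconfig on PATH, and root privileges.

IFACE=${IFACE:-ae0}
INTERVAL=${INTERVAL:-60}             # seconds between checks (assumed value)
PROBE=${PROBE:-"ping -c 1 gateway"}  # hypothetical probe; adjust for your ping

# Succeeds if the probe command succeeds. $PROBE is left unquoted on
# purpose so it splits into a command and its arguments.
net_ok() {
    $PROBE >/dev/null 2>&1
}

# Take the interface down and back up, the sequence reported to clear the wedge.
reset_iface() {
    ifconfig "$IFACE" down
    ifconfig "$IFACE" up
}

# One check: reset only when the probe fails; print what was done.
check_once() {
    if net_ok; then
        echo "ok"
    else
        reset_iface
        echo "reset $IFACE"
    fi
}

# Loop forever only when invoked as "watchdog.sh run", so the functions
# can be loaded and exercised without side effects.
if [ "${1:-}" = "run" ]; then
    while :; do
        check_once
        sleep "$INTERVAL"
    done
fi
```

Started in the background (for example "sh watchdog.sh run &" from a root shell), it would reset the interface at most once per interval, and only when the probe fails.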
dyer@arktouros.MIT.EDU (Steve Dyer) (09/30/88)
In article <12275@steinmetz.ge.com> dixon@control.steinmetz.ge.com (walt dixon) writes:
>You can recover from this condition using ifconfig. "ifconfig ae0 up" will
>bring the network back up; one can also write a program to turn the network
>back on.

In my experience, "ifconfig ae0 up" was a no-op. Programs complained "network down" anyway. Someone else placed the line

ifconfig ae0 down; ifconfig ae0 up

for cron to execute every 5 minutes or so. I haven't tried this yet; perhaps explicitly turning off the software state of the interface toggles some bit which allows the interface to be reset.

If this problem is as widespread as the comments I see here and the letters I've received in the past few days suggest, this is a major lose. How about a comment from the Apple folk who are otherwise so diligent in fending off meta-rumors? Prevalence, workarounds, ideas of when this will be fixed?

--- "The network IS the computer..." Steve Dyer
dyer@arktouros.MIT.EDU (Steve Dyer) (10/02/88)
In article <7247@bloom-beacon.MIT.EDU> dyer@arktouros.MIT.EDU (Steve Dyer) writes:
>In my experience, "ifconfig ae0 up" was a no-op. Programs complained
>"network down" anyway. Someone else placed the line
>ifconfig ae0 down; ifconfig ae0 up
>for cron to execute every 5 minutes or so. I haven't tried this yet; perhaps
>explicitly turning off the software state of the interface toggles some
>bit which allows the interface to be reset.

Having discovered the network wedged again this morning, I can say that typing:

ifconfig ae0 down; ifconfig ae0 up

does work.

--- Steve Dyer dyer@arktouros.MIT.EDU dyer@spdcc.COM aka {harvard,husc6,ima,bbn,m2c,mipseast}!spdcc!dyer
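For reference, the cron workaround mentioned above might look something like the following root crontab entry. This is a sketch, not a tested A/UX configuration: the path /etc/ifconfig is an assumption (check where ifconfig actually lives on your system), and an explicit list of minutes is used rather than a "*/5" step, which older cron implementations may not accept. The entry has to be in root's crontab because ifconfig needs superuser privileges.

```
# Hypothetical root crontab entry: bounce ae0 every 5 minutes.
# /etc/ifconfig is an assumed path; adjust to your system.
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /etc/ifconfig ae0 down; /etc/ifconfig ae0 up
```

Note that this bounces the interface unconditionally, so a transfer in progress at that moment may be disrupted; the program Walt Dixon describes would reset the interface only when the network is actually wedged.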
pane@cat.cmu.edu (John Pane) (10/03/88)
This is a repost of a message I made in July, for the benefit of those who are experiencing this problem with the ethernet...

---begin forwarded message---
I started having this problem when our network was rearranged here, and it was so bad that I couldn't do any networking. The new configuration had placed me on a very busy portion of the network at CMU. Some of the problem was tracked down to broadcasts that my A/UX machine was making, which were being answered by hundreds of machines on campus. Although this doesn't completely solve the problem, here are the steps I took, which resulted in a big improvement.

1) In /etc/inittab I turned off nfs0 (the release notes tell you to turn this on even if you're not running NFS). I haven't noticed any loss of functionality after turning it off.

2) I created a file /etc/resolv.conf listing three domain name servers, so my machine doesn't broadcast domain name resolution requests. See the manual entry for resolver(4).

3) I changed my broadcast address from 128.2.0.0 to 128.2.255.255 (most of the machines in the CS department here still use 128.2.0.0, but the plan is to move to 128.2.255.255). This is a temporary fix, relying on the fact that fewer machines currently respond to broadcasts on the new address.

So now my machine does less broadcasting and, because of the change of broadcast address, receives fewer replies when it does broadcast. The only remaining problem, which happens much less frequently, is that 100+ other machines on the network don't know about the 255.255 broadcast address, and when they receive such a broadcast (from my machine or others) they respond by ARPing. This flood of ARPs still takes my networking down.

The fact remains that the hardware/low-level software should be able to handle this level of traffic. Does anybody know if the acknowledged "defect" in the EtherTalk boards could manifest itself in this way?
---end forwarded message---

John Pane Department of Computer Science Carnegie Mellon University (412)268-5884 pane@cs.cmu.edu
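Step 3 in the forwarded message is really a change of convention for the host part of the broadcast address: 128.2.0.0 has the host bits all zeros (the form 4.2BSD used), while 128.2.255.255 has them all ones (the form RFC 919 standardizes). The sketch below computes both forms from a network address and netmask; the function names are illustrative, the netmask 255.255.0.0 for network 128.2 is an assumption, and the arithmetic assumes a shell with 64-bit integers (as in bash, dash or ksh93).

```sh
#!/bin/sh
# Illustrative sketch: the two broadcast-address conventions.
# Assumes 64-bit shell arithmetic so values above 2^31 don't overflow.

# Convert a dotted quad a.b.c.d to a 32-bit integer.
ip2int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split "a.b.c.d" into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Convert a 32-bit integer back to dotted-quad form.
int2ip() {
    n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Host part all ones, e.g. 128.2.255.255 (RFC 919 style).
bcast_ones() {
    net=$(ip2int "$1"); mask=$(ip2int "$2")
    int2ip $(( (net & mask) | (~mask & 0xFFFFFFFF) ))
}

# Host part all zeros, e.g. 128.2.0.0 (older 4.2BSD style).
bcast_zeros() {
    net=$(ip2int "$1"); mask=$(ip2int "$2")
    int2ip $(( net & mask ))
}
```

For example, "bcast_ones 128.2.0.0 255.255.0.0" prints 128.2.255.255 and "bcast_zeros 128.2.0.0 255.255.0.0" prints 128.2.0.0. On a BSD-derived ifconfig the new address would then be set with something like "ifconfig ae0 <your-address> broadcast 128.2.255.255"; whether A/UX 1.0's ifconfig accepts exactly that form is an assumption.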
news@steinmetz.ge.com (news) (10/03/88)
an ifconfig ae0 down. I thought that this was common knowledge; apparently not. This combination does indeed bring the network back up. From: dixon@control.steinmetz (walt dixon) Path: control!dixon Walt Dixon {ARPA: Dixon@ge-crd.arpa } {US Mail: GE Corp R&D } { PO Box 8 } { Schenectady, NY 12345 } {Phone: 518-387-5798 }
ragge@nada.kth.se (Ragnar Sundblad) (10/05/88)
In article <7247@bloom-beacon.MIT.EDU> dyer@arktouros.MIT.EDU (Steve Dyer) writes:
>In my experience, "ifconfig ae0 up" was a no-op. Programs complained
>"network down" anyway. Someone else placed the line
>ifconfig ae0 down; ifconfig ae0 up
....
>Steve Dyer

That's probably one of the bugs in the National DP8390 (described in DP8390 Tech Update, problem #3).

"
  Problem 3
  Suspended Operation After Transmission: If Collision (COL) is
  asserted during the transmission of the last byte, the NIC will
  suspend all operations. This problem is manifested when the
  Command Register continually reads 26H.
  The NIC must be hardware reset to resume operation.

  NOTE: In a properly operating IEEE 802.3 network, a collision will
  never occur during the last byte of transmission.
"

If you would like to check this, the command register is the byte at address 0xFSSE003C, where S is the slot number (9 <= S <= E on a Mac II), assuming you can somehow manage to read that address. If this is the problem, you'd better check your ethernet, since a properly operating network should never produce a collision during the last byte.

Note: I don't THINK that the EtherTalk card exchange some months ago solved this problem.
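To make the "26H" reading concrete, here is a small sketch that decodes a DP8390 Command Register value into its bit names, using the register layout from National's DP8390 data sheet (bits 7 to 0: PS1, PS0, RD2, RD1, RD0, TXP, STA, STP). The function name decode_cr is illustrative. For 26H it reports STA (NIC started), TXP (transmit packet still pending) and RD2 (abort/complete remote DMA), with both page-select bits clear; a TXP bit that never clears is consistent with the "suspended after transmission" symptom quoted above.

```sh
#!/bin/sh
# Illustrative sketch: decode a DP8390 Command Register byte.
# Bit layout per the DP8390 data sheet:
#   7 PS1  6 PS0  5 RD2  4 RD1  3 RD0  2 TXP  1 STA  0 STP
decode_cr() {
    v=$(( $1 ))                 # accepts decimal or 0x-prefixed hex
    out=""
    for pair in 7:PS1 6:PS0 5:RD2 4:RD1 3:RD0 2:TXP 1:STA 0:STP; do
        bit=${pair%%:*}         # bit number before the colon
        name=${pair#*:}         # bit name after the colon
        if [ $(( (v >> bit) & 1 )) -eq 1 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"             # drop the leading space
}
```

"decode_cr 0x26" prints "RD2 TXP STA", listing set bits from most to least significant.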
nghiem@ut-emx.UUCP (Alex Nghiem) (10/06/88)
In article <12275@steinmetz.ge.com>, dixon@control.steinmetz (walt dixon) writes:
> We too have experienced this problem. I've talked to several people at
> Apple including A/UX support who could not believe that things like this
> really happen. (Welcome to the real world). This problem has cost Apple

I believe I read a Computerworld article that mentioned that Apple's Ethernet board is manufactured by 3Com and was temporarily withdrawn from the market because of bugs. I don't know if the board in question is related to this problem, but it might be worth investigating. I read the article sometime this summer; a corrected board should have been introduced by now.
magorian@umd5.umd.edu (Dan Magorian) (10/08/88)
In article <586@draken.nada.kth.se> ragge@nada.kth.se (Ragnar Sundblad) writes:
>In article <7247@bloom-beacon.MIT.EDU> dyer@arktouros.MIT.EDU (Steve Dyer) writes:
>>In my experience, "ifconfig ae0 up" was a no-op. Programs complained
>>"network down" anyway. Someone else placed the line
>>ifconfig ae0 down; ifconfig ae0 up
>....
>>Steve Dyer
>
>That's probably one of the bugs in the National DP8390 (described in
>DP8390 Tech Update, problem #3).
>
>"
> Problem 3
> Suspended Operation After Transmission: If Collision (COL) is
> asserted during the transmission of the last byte, the NIC will
> suspend all operations. This problem is manifested when the
> Command Register continually reads 26H.
> The NIC must be hardware reset to resume operation.
>
> NOTE: In a properly operating IEEE 802.3 network, a collision will
> never occur during the last byte of transmission.
>"
>Note: I don't THINK that the EtherTalk card exchange some months ago
>solved this problem.

Does anyone have details on what the swapped Rev I or J cards patched? We had the earlier Rev E cards, and were experiencing the problems people are describing (there was even one Rev C card with the earlier version of the NIC). Swapping them out reduced lockups considerably. Basically, it's the same card reworked with 4 additional jumpers (there were already 2). On the MacOS side, the reworked cards were shipped with the same 1.1 driver, but a 2.0 version later appeared. Comments?

Dan Magorian Computer Science Center University of Maryland