XPMAINT@VENUS.TAMU.EDU (Shane Davis -- CSC XPrint Manager) (04/21/88)
This is a strange and tough one...

This routine, written in VAX C (V2.3), is called when an earlier TCP connection attempt to a remote TCP server fails. It is supposed to determine the cause of the problem: whether the server is down or the network is unreachable. It attempts to bounce a UDP datagram off the remote host's ECHO server. If it does not receive its echoed datagram within the timeout period of 5 seconds, it will print out that the network is unreachable. If it does receive its datagram, it will print out that the server needs restarting.

This works fine when the server is down but the network is up. The EX__RECEIVE $QIO call completes asynchronously before the timer set with $SETIMR expires, and the program prints "network servers not responding" and exits properly with SS$_NOLISTENER.

However, when the timeout period expires and the timer's event flag is set rather than the QIO's event flag, it successfully prints out that the network is down and displays the exit status message "%SYSTEM-F-UNREACHABLE, remote node is not currently reachable". Then the cursor is placed over the "%" in the exit message and the process is placed in RWAST state.

Here's the code. What am I doing that I shouldn't be doing, other than not checking status values for LIB$GET_EF, SYS$SETIMR, and all those system services that really will not fail without an access violation (even if they did, I don't know how that could possibly cause an RWAST)?
(No flames for this assumption, please... I can say for certain that I *know* these services are not failing.)

test_tamvm1()
{
    int status,timrval[2],lefn[2],lefmask;
    static char echosend[] = "Are you up?", echorecv[12];
    $DESCRIPTOR(deltat,"0 00:00:05.00");

    sock.sa = sinme;
    status = sys$qiow (0,channel,EX__CLOSE,iosb,0,0,0,0,&sock,0,0,0);
    if (status != SS$_NORMAL || iosb[0] != SS$_NORMAL)
        localnetprob();
    sinme.sin_port = htons(0);
    sock.type = SOCK_DGRAM;
    sock.sa = sinme;
    status = sys$qiow (0,channel,EX__SOCKET,iosb,0,0,0,0,&sock,0,0,0);
    if (status != SS$_NORMAL || iosb[0] != SS$_NORMAL)
        localnetprob();
    sinhim.sin_port = htons(IPPORT_ECHO);
    sock.sa = sinhim;
    status = sys$qiow (0,channel,EX__SEND,iosb,0,0,echosend,11,&sock,0,0,0);
    sock.sa = sinme;
    lib$get_ef (&lefn[0]);
    lib$get_ef (&lefn[1]);
    status = sys$qio (lefn[0],channel,EX__RECEIVE,iosb,0,0,echorecv,11,&sock,0,
                      0,0);
    putchar('.');
    sys$bintim (&deltat,&timrval);
    sys$setimr (lefn[1],&timrval,0,0);
    lefmask = 3 << ((lefn[1] > 32) ? lefn[1]-32 : lefn[1]);
    sys$wflor (lefn[0],lefmask);
    sys$readef (lefn[0],&status);
    if (!(status & (1 << ((lefn[0] > 32) ? lefn[0]-32 : lefn[0]))))
    {
        test_local();
        puts("\n** TAMVM1's TCP/IP network is down.");
        puts("** Notify machine room (845-4219) to autolog TCPIP.");
        puts(try_later);
        exit (SS$_UNREACHABLE);
    }
    else
    {
        puts("\n** TAMVM1 network servers not responding.");
        puts("** Notify machine room (845-4219) to autolog XPrint servers.");
        puts(try_later);
        exit (SS$_NOLISTENER);
    }
}

test_local()
{
    char *localhost="localhost";
    int status;

    sinme.sin_port = htons(0);
    sock.sa = sinme;
    status = sys$qiow (0,channel,EX__SOCKET,iosb,0,0,0,0,&sock,0,0,0);
    if (status != SS$_NORMAL || iosb[0] != SS$_NORMAL)
        localnetprob();
    sinhim.sin_addr.s_addr = rhost (&localhost);
    sinhim.sin_port = htons (IPPORT_TELNET);
    sock.sa = sinhim;
    status=sys$qiow(0,channel,EX__CONNECT,iosb,0,0,0,0,&sock,0,0,0);
    putchar('.');
    status = sys$qiow (0,channel,EX__CLOSE,iosb,0,0,0,0,&sock,0,0,0);
    if (status != SS$_NORMAL || iosb[0] != SS$_NORMAL)
    {
        localnetprob();
    }
    return;
}

EX__RECEIVE is equivalent to IO$_CONINTREAD, if that matters. The device to which the channel is assigned is a pseudo-mailbox-type device used with Excelan's EXOS TCP/IP software. The event flags used every time are 63 and 62, as returned by LIB$GET_EF.

As you can see, the test_local() routine that is called when the timer expires uses a new TCP socket and event flag 0, so it should not be interfering with its caller's event flags. If the EX__CONNECT $QIO call in test_local() were to fail (or not complete), control would be passed to routine localnetprob(), which would issue a different exit message and would never return to test_tamvm1().

Does anyone have any idea what on earth I could be doing to cause RWAST? How can I cancel the yet-to-complete-and-never-to-timeout EX__RECEIVE $QIO if that is causing this?

By the time I get a response to this, as slow as the net has been, I'll probably have filled up the process entry slots on the system and long since have run out of process names :-)... please help ASAP.
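For anyone checking the event-flag arithmetic in the code above, here is a minimal sketch in portable C of how the cluster-relative bit masks work out. $WFLOR and $READEF operate on one 32-flag cluster at a time (flags 0-31 are cluster 0, flags 32-63 are cluster 1), so flags 63 and 62 map to bits 31 and 30. The helper names (ef_bit_index, ef_bit_mask, ef_same_cluster, ef_wait_mask) are made up for illustration and are not part of VAX C or the VMS run-time library. This is only a way to verify the masks, not an explanation of the RWAST.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of any VMS library.
   Event flags 0-31 form cluster 0 and flags 32-63 form cluster 1;
   each cluster is read and waited on as a single 32-bit mask. */
#define EF_PER_CLUSTER 32u

/* Bit position (0..31) of event flag `efn` inside its own cluster. */
unsigned ef_bit_index(unsigned efn)
{
    return efn % EF_PER_CLUSTER;
}

/* Single-bit mask for event flag `efn` inside its own cluster. */
uint32_t ef_bit_mask(unsigned efn)
{
    return (uint32_t)1 << ef_bit_index(efn);
}

/* Nonzero when both flags belong to the same 32-flag cluster. */
int ef_same_cluster(unsigned a, unsigned b)
{
    return a / EF_PER_CLUSTER == b / EF_PER_CLUSTER;
}

/* Mask suitable for waiting on either of two flags. Both flags must
   be in the same cluster, because one wait covers only one cluster. */
uint32_t ef_wait_mask(unsigned a, unsigned b)
{
    assert(ef_same_cluster(a, b));
    return ef_bit_mask(a) | ef_bit_mask(b);
}
```

With flags 63 and 62 this gives a wait mask of 0xC0000000, the same bits the posted expression 3 << (lefn[1]-32) selects. The posted expression relies on the two flags being adjacent with lefn[1] the lower one, and its "> 32" test would compute a shift of 32 for flag 32 (the first flag of cluster 1), neither of which matters for 63 and 62.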
--Shane Davis
  Systems Programming Assistant
  Texas A&M Univ. Computing Services Ctr.
  Software Systems Group
********************************************************************************
BITnet              THEnet           Internet
XPMAINT@TAMVENUS    THOR::XPMAINT    xpmaint@venus.tamu.edu
RSD1901@TAMSIGMA    ZAC::RSD1901     --------
X233SD@TAMVM1       ------           x233sd@tamvm1.tamu.edu
SPAN: UTSPAN::UTADNX::(THEnet addr)
********************************************************************************
-------