[comp.os.vms] Routine hangs process in RWAST upon exit

XPMAINT@VENUS.TAMU.EDU (Shane Davis -- CSC XPrint Manager) (04/21/88)

This is a strange and tough one...

This routine, written in VAX C (V2.3), is called when an earlier TCP
connection attempt to a remote TCP server fails. It is supposed to determine
the cause of the problem: whether the server is down or the network is
unreachable. It attempts to bounce a UDP datagram off the remote host's ECHO
server. If it does not receive its echoed datagram within the timeout period
of 5 seconds, it prints out that the network is unreachable. If it does
receive its datagram, it prints out that the server needs restarting.
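
For comparison, here is a minimal sketch of the same probe written against BSD
sockets and select() rather than the EXOS $QIO interface. The function name
udp_echo_probe and its parameters are made up for illustration and are not part
of the routine in this post. It only checks that some datagram comes back
before the timeout, not that the contents match what was sent.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, not part of the EXOS interface.
 * Returns 1 if a datagram came back before the timeout,
 * 0 if the timeout expired first, -1 on a local error. */
int udp_echo_probe(const char *ip, unsigned short port, int timeout_ms)
{
    static const char msg[] = "Are you up?";
    char reply[sizeof msg];
    struct sockaddr_in peer;
    struct timeval tv;
    fd_set readable;
    int fd, ready, result;

    /* Validate the dotted-quad address before creating any socket. */
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Send the probe datagram (without the trailing NUL). */
    if (sendto(fd, msg, sizeof msg - 1, 0,
               (struct sockaddr *)&peer, sizeof peer) < 0) {
        close(fd);
        return -1;
    }

    /* Wait for either a reply or the timeout, whichever comes first. */
    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    ready = select(fd + 1, &readable, NULL, NULL, &tv);

    if (ready > 0)
        result = (recv(fd, reply, sizeof reply, 0) >= 0) ? 1 : -1;
    else
        result = (ready == 0) ? 0 : -1;

    /* Close the socket on every path so no I/O is left outstanding. */
    close(fd);
    return result;
}
```

A call such as udp_echo_probe("192.0.2.1", 7, 5000) would correspond to the
5-second wait described above; port 7 is the standard echo service, and the
address here is only a placeholder.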

This works fine when the server is down but the network is up. The EX__RECEIVE
$QIO call completes asynchronously before the timer set with $SETIMR expires,
and the program prints "network servers not responding" and exits properly with
SS$_NOLISTENER. However, when the timeout period expires and the timer's event
flag is set rather than the QIO's event flag, it successfully prints out that
the network is down and displays the exit status message "%SYSTEM-F-UNREACHABLE,
remote node is not currently reachable". Then the cursor is placed over the "%"
in the exit message and the process is placed in RWAST state.

Here's the code. What am I doing that I shouldn't be doing, other than not
checking status values for LIB$GET_EF, SYS$SETIMR, and all those system services
that really will not fail without an access violation? (Even if they did, I
don't know how that could possibly cause an RWAST.) No flames for this
assumption, please...I can say for certain that I *know* these services are not
failing.

test_tamvm1()
{
     int status,timrval[2],lefn[2],lefmask;
     static char echosend[] = "Are you up?", echorecv[12];
     $DESCRIPTOR(deltat,"0 00:00:05.00");

     sock.sa = sinme;
     status = sys$qiow (0,channel,EX__CLOSE,iosb,0,0,0,0,&sock,0,0,0);
     if (status != SS$_NORMAL || iosb[0] != SS$_NORMAL)
	localnetprob();
     sinme.sin_port = htons(0);
     sock.type = SOCK_DGRAM;
     sock.sa = sinme;
     status = sys$qiow (0,channel,EX__SOCKET,iosb,0,0,0,0,&sock,0,0,0);
     if (status != SS$_NORMAL || iosb[0] != SS$_NORMAL)
	localnetprob();
     sinhim.sin_port = htons(IPPORT_ECHO);
     sock.sa = sinhim;
     status = sys$qiow (0,channel,EX__SEND,iosb,0,0,echosend,11,&sock,0,0,0);
     sock.sa = sinme;
     lib$get_ef (&lefn[0]);
     lib$get_ef (&lefn[1]);
     status = sys$qio (lefn[0],channel,EX__RECEIVE,iosb,0,0,echorecv,11,&sock,0,
       0,0);
     putchar('.');
     sys$bintim (&deltat,&timrval);
     sys$setimr (lefn[1],&timrval,0,0);
     lefmask = 3 << ((lefn[1] > 32) ? lefn[1]-32 : lefn[1]);
     sys$wflor (lefn[0],lefmask);
     sys$readef (lefn[0],&status);
     if (!(status & (1 << ((lefn[0] > 32) ? lefn[0]-32 : lefn[0]))))
	{
	test_local();
	puts("\n** TAMVM1's TCP/IP network is down.");
	puts("** Notify machine room (845-4219) to autolog TCPIP.");
	puts(try_later);
	exit (SS$_UNREACHABLE);
	}
     else
	{
	puts("\n** TAMVM1 network servers not responding.");
	puts("** Notify machine room (845-4219) to autolog XPrint servers.");
	puts(try_later);
	exit (SS$_NOLISTENER);
	}
}

test_local()
{
     char *localhost="localhost";
     int status;

     sinme.sin_port = htons(0);
     sock.sa = sinme;
     status = sys$qiow (0,channel,EX__SOCKET,iosb,0,0,0,0,&sock,0,0,0);
     if (status != SS$_NORMAL || iosb[0] != SS$_NORMAL)
	localnetprob();
     sinhim.sin_addr.s_addr = rhost (&localhost);
     sinhim.sin_port = htons (IPPORT_TELNET);
     sock.sa = sinhim;
     status=sys$qiow(0,channel,EX__CONNECT,iosb,0,0,0,0,&sock,0,0,0);
     putchar('.');
     status = sys$qiow (0,channel,EX__CLOSE,iosb,0,0,0,0,&sock,0,0,0);
     if (status != SS$_NORMAL || iosb[0] != SS$_NORMAL) {
	localnetprob(); }
     return;
}

EX__RECEIVE is equivalent to IO$_CONINTREAD, if that matters. The device to
which the channel is assigned is a pseudo-mailbox-type device used with
Excelan's EXOS TCP/IP software. The event flags used every time are 63 and 62,
as returned by LIB$GET_EF. As you can see, the test_local() routine that is
called when the timer expires uses a new TCP socket and event flag 0, so it
should not be interfering with its caller's event flags. If the EX__CONNECT
$QIO call in test_local() were to fail (or not complete), then control would
be passed to routine localnetprob(), which would issue a different exit message
and would never return to test_tamvm1().
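
The event-flag arithmetic in test_tamvm1() can be checked in isolation. What
follows is a minimal sketch in portable C (not VAX C; it assumes <stdint.h>),
and the helper names ef_cluster, ef_bit, and ef_mask2 are hypothetical. Flags
0-31 live in cluster 0 and flags 32-63 in cluster 1, so the bit for a flag
within its cluster is the flag number modulo 32. Note that the "> 32"
comparison in the posted code would ask for a shift by 32 if flag 32 itself
were ever returned; the modulo form avoids that case and does not depend on the
two flags being adjacent.

```c
#include <assert.h>
#include <stdint.h>

/* Cluster number of a local event flag: 0 for flags 0-31, 1 for 32-63. */
int ef_cluster(unsigned efn)
{
    return (int)(efn / 32);
}

/* Bit that represents one event flag inside its 32-bit cluster. */
uint32_t ef_bit(unsigned efn)
{
    return (uint32_t)1 << (efn % 32);
}

/* Mask covering two flags for a wait-for-logical-OR style call.
 * Both flags must be in the same cluster, because such a wait
 * operates on a single cluster at a time. */
uint32_t ef_mask2(unsigned a, unsigned b)
{
    assert(ef_cluster(a) == ef_cluster(b));
    return ef_bit(a) | ef_bit(b);
}
```

For flags 63 and 62, ef_mask2(63, 62) gives 0xC0000000, the same value the
posted expression 3 << 30 produces, and ef_bit(63) is the bit tested after the
wait to see whether the receive completed.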

Does anyone have any idea what on earth I could be doing to cause RWAST? How
can I cancel the yet-to-complete-and-never-to-timeout EX__RECEIVE $QIO if that
is causing this?

By the time I get a response to this, as slow as the net has been, I'll
probably have filled up the process entry slots on the system and long since
have run out of process names :-)...please help ASAP.

--Shane Davis
  Systems Programming Assistant
  Texas A&M Univ. Computing Services Ctr. Software Systems Group

********************************************************************************
	BITnet		   THEnet			Internet

  XPMAINT@TAMVENUS	THOR::XPMAINT		xpmaint@venus.tamu.edu
  RSD1901@TAMSIGMA	 ZAC::RSD1901			--------
   X233SD@TAMVM1	   ------		x233sd@tamvm1.tamu.edu

	SPAN: UTSPAN::UTADNX::(THEnet addr)
********************************************************************************

-------