[comp.unix.sysv386] Problems with the rexd and rshd daemons in Interactive 386/ix 2.0.2

niklas@appli.se (Niklas Hallqvist) (09/18/90)

	Hello, out there. 

	This is a repost of an article posted to comp.unix.i386 at the time
of the comp.unix.* reorganisation.  I didn't get any answer then, so this
time I'm broadening the distribution to include comp.protocols.{nfs,tcp-ip}
in addition to the new, correct newsgroup, comp.unix.sysv386!

	We have an Ethernet network with three nodes, all of them running
NFS.  One of the most useful commands is on(1), which runs commands on another
node but retains the environment (including the current directory).  Very
neat!  My problem is: I can't use this facility to run programs on our
386/ix (2.0.2 core, 1.1.2 TCP/IP, 2.0 NFS).  Sometimes I get this error
message: "on: af clnt_call..RPC: Unable to receive", and sometimes I get
no error message at all!  The logfile /tmp/rexd.log looks something like
this:

Sep  6 09:00 (Rpchild/10444): Child #10444 processing RPC for request 
        REXD INFO: errno=22, msg="Invalid argument" 
Sep  6 09:00 (Rpchild/10444): About to fork execution child; cmd='ls' 
        REXD INFO: errno=9, msg="Bad file number" 
Sep  6 09:00 (Rpchild/10444): [RPC Child: svc_fds == 0, shutting down] 
        REXD INFO: errno=9, msg="Bad file number" 

or like this:

Sep  6 09:02 (Rpchild/10446): Child #10446 processing RPC for request
	REXD INFO: errno=22, msg="Invalid argument"
Sep  6 09:02 (Rpchild/10446): About to fork execution child; cmd='ls'
	REXD INFO: errno=9, msg="Bad file number"
Sep  6 09:02 (Rpchild/10446): [RPC Child: svc_fds == 0, shutting down]
	REXD INFO: errno=4, msg="Interrupted system call"

Running in the other direction works as expected (e.g. running
a command on our NCR Tower using the 386/ix on(1) command).  Even
local use, like "on localhost ls", fails!  What have I done wrong?
Is there a magic kernel parameter that is set wrongly?  Please help!

	And then there's the "remote shell" handled by /etc/rshd on the
386/ix.  Very often (not always, though) my client "remsh" on another node
hangs after sending its standard input to the remote shell.  Very
annoying indeed!  After I kill the client, the daemon continues as if nothing
had happened.  It seems as if the EOF gets lost on the way, but reappears
when I kill the client.

Another possibly related weirdness of our 386/ix system is the presence
of all these strange TIME_WAIT, CLOSE_WAIT, FIN_WAIT_2 & CLOSED TCP connections
that never go away from our netstat output:

Active Internet connections 
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state) 
tcp        0      0  ix.1224                ix.111                 TIME_WAIT 
tcp        0      0  ix.1182                nix.1181               CLOSE_WAIT 
tcp        0      0  ix.1181                nix.1039               CLOSE_WAIT 
tcp        0      0  ix.shell               appli.1023             CLOSED 

The corresponding lines from our node "nix": (only the ones concerning "ix")
tcp        0      0  nix.1181               ix.1182                FIN_WAIT_2 
tcp        0      0  nix.1039               ix.1181                FIN_WAIT_2 

There is no corresponding line for the CLOSED connection in the output from
netstat on node "appli".  What's going on here?  Most things, like X11, NFS,
rlogin and rcp, do work.  It's just "rexd" & "rshd" that fail!  Any ideas?

					Niklas
---
Niklas Hallqvist	Phone: +46-(0)31-19 14 85
Applitron Datasystem	Fax:   +46-(0)31-19 80 89
N. Gubberogatan 30	Email: niklas@appli.se
S-416 63  GOTEBORG	       sunic!chalmers!appli!niklas
Sweden

als@bohra.cpg.oz (Anthony Shipman) (09/19/90)

In article <1112@appli.se>, niklas@appli.se (Niklas Hallqvist) writes:
> 
> 	Hello, out there. 
.......
> 	And then there's this "remote shell" handle by /etc/rshd on the
> 386/ix.  Very often (not always, though) my client "remsh" on another node
> gets hung after sending the standard input to the foreign shell.  Very
> annoying indeed!  After I kill the client the daemon continues as if nothing
> has happenned.  It seems like the EOF gets lost on the way, but reappears if
> I kill the client.

This used to happen to me when trying to pipe a file from a non-386/ix system
to 386/ix 2.0.1 rshd.  It would happen 100% of the time.  However, when the
sender was killed, the missing EOF would go through and the command would
complete properly.

I count this as a bug in 386/ix TCP/IP.  Some other programs would not talk to
foreign TCP/IP nodes properly, working only intermittently.  There was
even incompatibility between different versions of 386/ix.  It appears that
386/ix was only tested against itself (at least at Rev 2.0.?).
-- 
Anthony Shipman                               ACSnet: als@bohra.cpg.oz.au
Computer Power Group
9th Flr, 616 St. Kilda Rd.,
St. Kilda, Melbourne, Australia