niklas@appli.se (Niklas Hallqvist) (09/06/90)
Hello, out there.

We have an Ethernet network with three nodes, all of them running NFS. One of the most useful commands is on(1), which runs a command on another node but retains the environment (including the current directory). Very neat!

My problem is: I can't use this facility to run programs on our 386/ix (2.0.2 core, 1.1.2 TCP/IP, 2.0 NFS). Sometimes I get this error message:

    "on: af clnt_call..RPC: Unable to receive"

and sometimes I don't get an error message at all! The logfile /tmp/rexd.log looks something like this:

    Sep 6 09:00 (Rpchild/10444): Child #10444 processing RPC for request
    REXD INFO: errno=22, msg="Invalid argument"
    Sep 6 09:00 (Rpchild/10444): About to fork execution child; cmd='ls'
    REXD INFO: errno=9, msg="Bad file number"
    Sep 6 09:00 (Rpchild/10444): [RPC Child: svc_fds == 0, shutting down]
    REXD INFO: errno=9, msg="Bad file number"

or like this:

    Sep 6 09:02 (Rpchild/10446): Child #10446 processing RPC for request
    REXD INFO: errno=22, msg="Invalid argument"
    Sep 6 09:02 (Rpchild/10446): About to fork execution child; cmd='ls'
    REXD INFO: errno=9, msg="Bad file number"
    Sep 6 09:02 (Rpchild/10446): [RPC Child: svc_fds == 0, shutting down]
    REXD INFO: errno=4, msg="Interrupted system call"

In the other direction everything works as expected (e.g. running a command on our NCR Tower using the 386/ix on(1) command). Even local use, like "on localhost ls", fails on the 386/ix! What have I done wrong? Is there a magical kernel parameter which is wrongly set? Please help!

And then there's the "remote shell" handled by /etc/rshd on the 386/ix. Very often (not always, though) my client "remsh" on another node hangs after sending its standard input to the foreign shell. Very annoying indeed! After I kill the client, the daemon continues as if nothing had happened. It seems like the EOF gets lost on the way, but reappears when I kill the client.
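To make the remsh symptom concrete: as far as I understand it, the client is supposed to signal end of input by half-closing the TCP connection (shutting it down for writing), and the daemon should then see a zero-length read while still being able to send its output back. The sketch below only illustrates that mechanism over loopback, in Python. The names serve_once and run_demo are made up for the example, and I am assuming remsh and rshd work this way rather than quoting their source.

```python
import socket
import threading


def serve_once(listener, result):
    """Accept one connection, read until EOF, then reply with the byte count."""
    conn, _ = listener.accept()
    with conn:
        received = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # zero-length read: the peer shut down its write side
                break
            received += chunk
        result["data"] = received
        # The connection is still open in our direction, so we can answer.
        conn.sendall(b"got %d bytes" % len(received))


def run_demo(payload=b"ls\n"):
    """Send payload, half-close, and collect the server's reply (hypothetical demo)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(1)
    result = {}
    server = threading.Thread(target=serve_once, args=(listener, result))
    server.start()

    client = socket.create_connection(listener.getsockname(), timeout=5)
    client.sendall(payload)
    client.shutdown(socket.SHUT_WR)  # sends FIN: "no more input", still readable
    reply = b""
    while True:
        chunk = client.recv(4096)
        if not chunk:  # server closed its side after replying
            break
        reply += chunk
    client.close()
    server.join()
    listener.close()
    return result["data"], reply


if __name__ == "__main__":
    data, reply = run_demo()
    print(data, reply)
```

If the FIN from the shutdown call never reached the reading side, the server loop above would block in recv() exactly the way my remsh session appears to hang, and it would only wake up once the client was killed and the connection torn down.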
Another possibly related weirdness of our 386/ix system is the presence of all these strange TIME_WAIT, CLOSE_WAIT, FIN_WAIT_2 and CLOSED connections that never go away from our netstat output:

    Active Internet connections
    Proto Recv-Q Send-Q  Local Address   Foreign Address   (state)
    tcp        0      0  ix.1224         ix.111            TIME_WAIT
    tcp        0      0  ix.1182         nix.1181          CLOSE_WAIT
    tcp        0      0  ix.1181         nix.1039          CLOSE_WAIT
    tcp        0      0  ix.shell        appli.1023        CLOSED

The corresponding lines from our node "nix" (only the ones concerning "ix"):

    tcp        0      0  nix.1181        ix.1182           FIN_WAIT_2
    tcp        0      0  nix.1039        ix.1181           FIN_WAIT_2

There is no corresponding line for the CLOSED connection to node "appli" in the netstat output on that node. What's going on here? Most things do work, like X11, NFS, rlogin, rcp etc. It's just "rexd" and "rshd" that fail! Any ideas?

Niklas
---
Niklas Hallqvist       Phone: +46-(0)31-19 14 85
Applitron Datasystem   Fax:   +46-(0)31-19 80 89
N. Gubberogatan 30     Email: niklas@appli.se
S-416 63 GOTEBORG      sunic!chalmers!appli!niklas
Sweden