[comp.sys.ncr] Finding out if TCP/IP is up

TEMNGT23@ysub.ysu.edu (Lou Anschuetz) (05/10/91)

I have an NCR Tower 700 with 16MB of memory.  All my users come in
through the Ethernet board via a terminal server.  It is possible
for certain dialup users (Commodore 64 in fact) to reboot their
machines while still on line.  Since this "information" is just
passed down the pipeline to the tower, it attempts to handle this
by allocating thousands of 2k streams.  I am already set at maximum
for 2k streams, so there is nothing further I can do there.  The
problem is that WIN TCP/IP on seeing this error quietly dies.
In fact, WIN TCP/IP *NEVER* generates any errors, so it is
impossible to use any automated system to determine if TCP/IP
is up or not based on any kind of error logging.

What I need, therefore, is a way to determine if TCP/IP is up
from cron.  If it is not I can do a win restart (a little
undocumented feature....).  But, how do I do the test?  If it
is down, netstat NEVER returns so I can't use that as a test.
Any help would be really appreciated (we have been down twice
this week for over 8 hours each time - making my users an
unhappy lot.....).  Thanks in advance!

Lou Anschuetz
temngt23@ysub.ysu.edu

jeffl@NCoast.ORG (Jeff Leyser) (05/14/91)

In post <91130.081040TEMNGT23@ysub.ysu.edu>, Lou Anschuetz <TEMNGT23@ysub.ysu.edu> says:
!!I have an NCR Tower 700 with 16MB of memory.  All my users come in
!!through the Ethernet board via a terminal server.  It is possible
!!for certain dialup users (Commodore 64 in fact) to reboot their
!!machines while still on line.  Since this "information" is just
!!passed down the pipeline to the tower, it attempts to handle this
!!by allocating thousands of 2k streams.  I am already set at maximum

First off, I don't understand this at all.  What "information" is being
sent down what "line?"
!!What I need, therefore, is a way to determine if TCP/IP is up
!!from cron.  If it is not I can do a win restart (a little

Several things, depending on how WIN crashed:

A) Grep through a ps -aef and look for tcplisten, inetinit, and listen
(if you have listen configured to run at all)

B) Look at the output of an ifconfig.  Maybe WIN is marking the
interface as down.

C) Run a netstat in the background, with its output going to a file.
Check the file for data.  If there is no data after a reasonable time,
kill the netstat, and assume WIN is hosed.
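
Option C can be sketched roughly as follows.  This is only an illustration, not something tested on a Tower: the function name probe_with_timeout is made up here, and the $(( )) arithmetic assumes a POSIX-style shell (an older Bourne shell would need expr instead).  It runs any command in the background, gives it a time limit, kills it if it is still running, and succeeds only if the output file contains data.

```shell
# probe_with_timeout SECONDS OUTFILE COMMAND [ARGS...]
# (hypothetical helper name, not part of WIN-TCP)
# Runs COMMAND in the background with its output going to OUTFILE,
# waits up to SECONDS, kills it if it is still running, and returns
# success only if OUTFILE ended up non-empty.
probe_with_timeout() {
    limit=$1
    out=$2
    shift 2

    "$@" > "$out" 2>/dev/null &
    pid=$!

    # Poll once a second until the command exits or the limit is reached.
    waited=0
    while [ "$waited" -lt "$limit" ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        waited=$((waited + 1))
    done

    # Still there after the limit: assume it is hung and kill it.
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid" 2>/dev/null
    fi
    wait "$pid" 2>/dev/null || true

    # A non-empty file means the command produced data.
    [ -s "$out" ]
}

# Possible use from cron (the netstat path and the 30 second limit
# are assumptions):
#   probe_with_timeout 30 /tmp/netstat.out netstat || echo "WIN looks hosed"
```

Testing for a non-empty file rather than for the file's existence matters, because the shell creates the output file as soon as the redirection is set up, whether or not any data arrives.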
-- 
Jeff Leyser                                     jeffl@NCoast.ORG

TEMNGT23@ysub.ysu.edu (Lou Anschuetz) (05/14/91)

In article <1991May13.174301.1651@NCoast.ORG>, jeffl@NCoast.ORG (Jeff Leyser)
says:
>
>In post <91130.081040TEMNGT23@ysub.ysu.edu>, Lou Anschuetz
><TEMNGT23@ysub.ysu.edu> says:
>!!I have an NCR Tower 700 with 16MB of memory.  All my users come in
>!!through the Ethernet board via a terminal server.  It is possible
>!!for certain dialup users (Commodore 64 in fact) to reboot their
>!!machines while still on line.  Since this "information" is just
>!!passed down the pipeline to the tower, it attempts to handle this
>!!by allocating thousands of 2k streams.  I am already set at maximum
>
>First off, I don't understand this at all.  What "information" is being
>sent down what "line?"
Basically, every bit of machine code that the machine is executing
is sent out through their modem, comes in through my modem, passes
silently through the terminal server and into the ethernet board on
the tower.  So, the tower sees every bit of machine code the PC on
the other end executes.  Neat, huh?  :-(
>!!What I need, therefore, is a way to determine if TCP/IP is up
>!!from cron.  If it is not I can do a win restart (a little
>
>Several things, depending on how WIN crashed:
>
>A) Grep through a ps -aef and look for tcplisten, inetinit, and listen
>(if you have listen configured to run at all)
tcplisten and inetinit are still listed as active PIDS when this
occurs.  What is listen (I didn't see that anywhere in the manual????)?

>
>B) Look at the output of an ifconfig.  Maybe WIN is marking the
>interface as down.
I guess I don't know how to do that.  Any help would be appreciated.
This may well be the key since the PIDS are still there, it is
just that it doesn't serve....  :-(  :-(

>
>C) Run a netstat in the background, with its output going to a file.
>Check the file for data.  If there is no data after a reasonable time,
>kill the netstat, and assume WIN is hosed.
I think I found a slightly easier way since netstat always creates
/tmp/netstatdata, even if it isn't collecting any data :-(
My approach is to ping myself and check for "100% packet loss"
(yep, I can't get to me either through the loopback - why -
I don't know).  If I get this condition then my batch file
executes a "win stop" followed by a "win restart".  Unfortunately,
the restart seemed not to work, but may have just gotten bumped
by line noise (I tried this during an electrical storm  :-O).
Any revisions to this technique would be appreciated, and any
ideas on how to avoid it (hardware or software) would also be
appreciated.  It is now happening about twice a day, and
is rather a nuisance....  Of course, it is impossible to
easily have CODAR take a look, since it is not at all predictable.
But, shouldn't WIN recognize its failure and do some internal
diagnostics, generate an error (anything)?

>Jeff Leyser                                     jeffl@NCoast.ORG
Thanks in advance!

Lou Anschuetz
a very tired system administrator
temngt23@ysu.edu

jeffl@NCoast.ORG (Jeff Leyser) (05/15/91)

In post <91134.081431TEMNGT23@ysub.ysu.edu>, Lou Anschuetz <TEMNGT23@ysub.ysu.edu> says:
!!In article <1991May13.174301.1651@NCoast.ORG>, jeffl@NCoast.ORG (Jeff Leyser)
!!says:
!!>
!!>In post <91130.081040TEMNGT23@ysub.ysu.edu>, Lou Anschuetz
!!><TEMNGT23@ysub.ysu.edu> says:
!!> [Weird stuff causing WIN-TCP to crash]
!!>First off, I don't understand this at all.  What "information" is being
!!>sent down what "line?"
!!Basically, every bit of machine code that the machine is executing
!!is sent out through their modem, comes in through my modem, passes
!!silently through the terminal server and into the ethernet board on
!!the tower.  So, the tower sees every bit of machine code the PC on
!!the other end executes.  Neat, huh?  :-(

Sounds like the real long-term solution is to shoot the people using
these machines! ;^)

!!>!! [ How to tell if WIN-TCP is down ]
!!>
!!>Several things, depending on how WIN crashed:
!!>
!!>A) Grep through a ps -aef and look for tcplisten, inetinit, and listen
!!>(if you have listen configured to run at all)
!!tcplisten and inetinit are still listed as active PIDS when this
!!occurs.  What is listen (I didn't see that anywhere in the manual????)?

Listen is used to, ummm, listen for services not known to tcplisten.
Two popular services that use listen are UUCP over TCP, and RFS.

!!>B) Look at the output of an ifconfig.  Maybe WIN is marking the
!!>interface as down.
!!I guess I don't know how to do that.  Any help would be appreciated.

/usr/etc/ifconfig [en0|lo0]

Where en0 is the first ethernet board, and lo0 is the loopback.
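
A check on that output might look something like the sketch below.  The thread does not show what ifconfig prints on the Tower, so this assumes the BSD-style layout where the flags appear as "flags=43<UP,BROADCAST,RUNNING>"; if the real output differs, the pattern has to change.  The function name iface_is_up is made up for illustration.

```shell
# iface_is_up (hypothetical helper name)
# Reads the ifconfig output for one interface on stdin and succeeds
# if the flag list contains UP.
# ASSUMPTION: BSD-style output such as "flags=43<UP,BROADCAST,RUNNING>".
iface_is_up() {
    # UP must be a whole flag: preceded by "<" or "," and followed by "," or ">".
    grep '[<,]UP[,>]' >/dev/null
}

# Possible use from cron:
#   /usr/etc/ifconfig en0 | iface_is_up || echo "en0 is marked down"
#   /usr/etc/ifconfig lo0 | iface_is_up || echo "lo0 is marked down"
```

Matching UP only between the flag delimiters keeps a flag whose name merely ends in UP from counting as a match.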

!!>C) Run a netstat in the background, with its output going to a file.
!!>Check the file for data.  If there is no data after a reasonable time,
!!>kill the netstat, and assume WIN is hosed.
!!I think I found a slightly easier way since netstat always creates
!!/tmp/netstatdata, even if it isn't collecting any data :-(
!!My approach is to ping myself and check for "100% packet loss"
!!(yep, I can't get to me either through the loopback - why -
!!I don't know).

If all the processes are still up, but you can't ping through loopback or
through ethernet, it's a good bet your STREAMS drivers are screwed up.  This
would also jibe with your remark about WIN-TCP requesting a whole bunch of
STREAMS buffers.  Two things you can check -- see if the process inetinit is
alive and well.  This process maintains the STREAMS drivers.  However, it is
possible to have the processes running, and the drivers messed up.  DO NOT
SIMPLY RESTART THE PROCESS.  You _MUST_ stop and start all of WIN-TCP.  If
inetinit is up, and things still don't work, do what you are already doing --
ping the loopback.  (Ping sends ICMP packets requesting an echo.  If you can't
echo to yourself, either the "send" isn't going down to the loop, or the loop
can't give ping the "receive."  In either case, the drivers are screwed up.)
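
The two checks described above could be combined along these lines.  This is a sketch under assumptions: the function name classify_win and the three result words are invented here, and it works on saved copies of the ps and ping output so that the grep does not find itself in a live ps listing.

```shell
# classify_win PS_FILE PING_FILE   (hypothetical helper name)
# PS_FILE   holds saved output of "ps -aef"
# PING_FILE holds saved output of a ping to the loopback
# Prints one of: processes-missing, drivers-wedged, ok
classify_win() {
    ps_out=$1
    ping_out=$2

    if ! grep 'inetinit' "$ps_out" >/dev/null; then
        # inetinit maintains the STREAMS drivers; without it WIN is down.
        echo "processes-missing"
    elif grep '100% packet loss' "$ping_out" >/dev/null; then
        # Processes are present but nothing echoes back on the loopback,
        # which points at the STREAMS drivers.
        echo "drivers-wedged"
    else
        echo "ok"
    fi
}

# Possible use (file names are assumptions; the ping arguments and the
# name of the loopback host depend on the local setup):
#   ps -aef > /tmp/win.ps
#   /usr/etc/ping <loopback host> ... > /tmp/win.ping
#   [ "$(classify_win /tmp/win.ps /tmp/win.ping)" = ok ] || echo "stop and start WIN"
```

Either result other than "ok" calls for the same action, a full stop and start of WIN-TCP; the distinction is only useful for the log.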


However, all of this is nothing more than a patch.  You really need to
_educate_ the users who are causing the crash.  Teach them what not to
do, and tell them all the problems they are causing.  Otherwise, you'll
be chasing your tail on this one for a long, long time.
-- 
Jeff Leyser                                     jeffl@NCoast.ORG

ken@uctcs.uucp (Ken McGregor) (05/16/91)

In article <91130.081040TEMNGT23@ysub.ysu.edu> TEMNGT23@ysub.ysu.edu (Lou Anschuetz) writes:
>I have an NCR Tower 700 with 16MB of memory.  All my users come in
>through the Ethernet board via a terminal server.  It is possible

>allocating thousands of 2k streams.  I am already set at maximum
>for 2k streams, so there is nothing further I can do there.  The
>problem is that WIN TCP/IP on seeing this error quietly dies.
>
>What I need, therefore, is a way to determine if TCP/IP is up
>from cron.  If it is not I can do a win restart (a little
>undocumented feature....).

Are you sure that this will work?  We have a similar problem with WIN-TCP on
our Tower 600s.  Some resource jams and the system fills up the 2k stream
blocks.  We are only using terminals on the terminal servers - not PCs.
When we restart TCP it restarts fine, but the streams are still full, so it
never comes up.  What we do to solve the problem is a total reboot.

We can identify the problem by running crash with input from a file that
makes it run strstat.  Pipe the output into grep 2048, cut on the free field,
then test for 0.  If so, reboot.  We run this via cron every 15 minutes.
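
A rough sketch of that test follows.  The layout of the strstat output is not shown in this thread, so the column number of the free field is left as a parameter to be checked against real output, and awk is used here in place of the grep and cut pipeline described above.  The function names free_blocks and streams_exhausted are made up for illustration.

```shell
# free_blocks SIZE COL   (hypothetical helper name)
# Reads strstat-style output on stdin, finds the first line that has
# SIZE as one of its fields, and prints column COL of that line.
# ASSUMPTION: whitespace-separated columns; COL must be taken from the
# real strstat output on the machine.
free_blocks() {
    awk -v size="$1" -v col="$2" '
        {
            for (i = 1; i <= NF; i++)
                if ($i == size) { print $col; exit }
        }'
}

# streams_exhausted COL   (hypothetical helper name)
# Succeeds if the free count for the 2048-byte blocks is exactly 0.
streams_exhausted() {
    [ "$(free_blocks 2048 "$1")" = "0" ]
}

# Possible use from cron (the command file name and column 7 are assumptions):
#   crash < /usr/local/lib/strstat.cmd | streams_exhausted 7 && echo "2k streams full"
```

If no line mentions 2048 at all, nothing is printed and the test fails, so a change in the output format shows up as "not exhausted" rather than as a spurious reboot.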

I wish however that there was a solution available to WIN.
Ken MacGregor                                             ken@cs.uct.ac.za
Computer Science                             ken%uctcs%quagga@uunet.uu.net
University of Cape Town   ...uunet!{m2xenix!quagga,ddsw1!olsa99}!uctcs!ken

TEMNGT23@ysub.ysu.edu (Lou Anschuetz) (05/16/91)

In article <1991May15.142242.3193@NCoast.ORG>, jeffl@NCoast.ORG (Jeff Leyser)
says:
>If all the processes are still up, but you can't ping through loopback or
>through ethernet, it's a good bet your STREAMS drivers are screwed up.  This
>would also jibe with your remark about WIN-TCP requesting a whole bunch of
>STREAMS buffers.  Two things you can check -- see if the process inetinit is
>alive and well.  This process maintains the STREAMS drivers.  However, it is
>possible to have the processes running, and the drivers messed up.  DO NOT
>SIMPLY RESTART THE PROCESS.  You _MUST_ stop and start all of WIN-TCP.  If
>inetinit is up, and things still don't work, do what you are already doing --
>ping the loopback.  (Ping sends ICMP packets requesting an echo.  If you can't
>echo to yourself, either the "send" isn't going down to the loop, or the loop
>can't give ping the "receive."  In either case, the drivers are screwed up.)
Following is the file I would like to execute from cron.  Looks like it
should work based on your comments.  Let me know if you see any problem.

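# Ping yfnserv; grep succeeds only if every packet was lost.
/usr/etc/ping yfnserv 2 2 | grep "100% packet loss" >/dev/null
if [ "$?" -eq 0 ]
then
   # Stop and start all of WIN, not just one process.
   /etc/init.d/win stop
   /etc/init.d/win start
   # Keep a record of each restart.
   echo "WIN restarted" >> /tmp/neterror
fi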
>
>
>However, all of this is nothing more than a patch.  You really need to
>_educate_ the users who are causing the crash.  Teach them what not to
>do, and tell them all the problems they are causing.  Otherwise, you'll
>be chasing your tail on this one for a long, long time.
Aye, there's the rub.  We have 3,000 users and add about 30 more per
week.  It's tough to make them all understand.  Changing the subject
just slightly - I understand SysV.4 does dynamic streams allocation,
thereby making this problem less likely to occur.  Any word on
SysV.4 for the Tower line (anybody can respond here)?
>--
Lou Anschuetz
temngt23@ysub.ysu.edu