[bit.listserv.ibmtcp-l] TCPIP machine abends after starting other service machine

SYSHERM@UKCC.BITNET (Collins, Herman) (02/09/90)

Well, I'm having an odd problem with the TCPIP machine abending just
after startup, and I'm not sure to whom to turn.  If anybody has any
suggestions, I'd love to hear them.

I'm running the FAL 1.2.1 software on a 3090 under VM/XA-SP2, and using
David Lippke's driver to work a BTI interface.  This had been working
great for about eight months when David sent a new version of the
driver (2.15).  I decided to replace our very old version (0.6) with the
nice new code, but when I brought it up, the TCPIP machine abended with
an operation exception.  [This was a particularly nasty situation because
the machine takes a VMDUMP after the abend, re-IPLs, tries to start up
again, abends again, ... well, you get the picture.]

Oh my, I thought, either I've screwed up the installation or something's
wrong with the driver.  I put back the old version, which had been
running minutes earlier, and it also abended in the same way.  I double
checked (triple checked, even) everything to make sure I REALLY had the
old version back in (I'm sure).  Let me note that we had put on a whole
bunch of CP fixes that morning (8904 SUP), but the TCPIP machine had
started up OK when we booted, and I checked it before I started fooling
around with it.

I got desperate (even superstitious) and started doing things like
giving the TCPIP machine another meg of memory "just in case", restoring
the old 191 disk from backup, rechecking the 592 disk, etc.  What
finally helped was commenting out the autolog statements for the two
service machines that TCPIP starts up (SMTP and NAMESRV).  If TCPIP
doesn't start those machines, it comes up OK, with either the old or new
version of the driver.  I can then XAUTOLOG the two machines (either
from TCPIP or another id) and everything works fine.

I thought maybe I could reproduce the problem with straight IBM code (we
also run FAL through a 7170 on a 3084), but if I even comment out the
DEVICE/LINK/START statements, the TCPIP machine starts up fine (but
doesn't do anything, of course).  We've since put the same CP fixes
(more or less) on the 3084, and the TCPIP seems to start up fine.

I called the support center to see if there are any outstanding problems
that might apply, but I'm waiting for a call back.  I don't have a lot
of hope.  I looked at the dump, and the location where the operation
exception occurred is mostly zeros.  It looks like the code took a flying
leap into a data buffer, or something clobbered some code, or both.  I
don't know if this is a CP problem, a FAL problem, or a Lippke driver
problem.  Any ideas?

                                    Herman Collins

128 McVey Hall                      Bitnet: SYSHERM@UKCC
University of Kentucky              Internet: sysherm@ukcc.uky.edu
Lexington, Ky.  40506-0045          Phone: 606-257-2256

RIEHL@VM.USC.EDU (02/10/90)

I don't know if you're on this list.  Is this the version of the driver
that you're working with?
jr