[comp.os.vms] KERMIT brings system to its knees

campbell@maynard.BSW.COM (Larry Campbell) (08/03/87)

We run KERMIT in two modes on our VAX.  (There are names -- local and
remote -- for the modes but I can never remember which is which.)
Anyway, one mode is where you log in to the VAX from a PC and then run
KERMIT, typically in server mode.  This works fine.  But when a local
(dumb terminal, say) VAX user uses KERMIT to connect to a modem line
and dial out from the VAX, it brings the system to its knees.

MONITOR PROCESS/TOPCPU shows nothing unusual.  But users only get at
best 50% of wall clock time;  it seems like something is eating the
system at some interrupt level.

We're on a 750 running VMS 4.1 (I know, I know), with KERMIT 3.1.066.

Has anyone else observed this behavior?  Is it a VMS bug or a KERMIT bug?
-- 
Larry Campbell                                The Boston Software Works, Inc.
Internet: campbell@BSW.COM                  120 Fulton Street, Boston MA 02109
uucp: {husc6,mirror,think}!maynard!campbell         +1 617 367 6846

tencati@VLSI.JPL.NASA.GOV (08/04/87)

Larry,

Is KERMIT installed?  I have no such problem on my 780.   I would tend to 
believe that KERMIT should NOT be installed.  I was just thinking..

Normal user, right?  Non-elevated privs?  What does MONITOR SYSTEM look like?
-------

jmleonar@CRDEC-VAX2.ARPA (jle) (08/05/87)

Rather than use MONITOR PROCESS/TOPCPU, use MONITOR MODES to display the
percentage of time the system is spending in each processor mode.  If
the I/O rate of KERMIT/VAXNET is killing the VAX, the INTERRUPT STACK and
KERNEL percentages will be very high (and USER MODE will be low)...
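
As a rough illustration of the check described above, here is a minimal Python
sketch that takes one set of mode percentages and flags the pattern Joe
describes. The sample numbers, the dictionary keys, and the 50% threshold are
all assumptions made up for this example; they are not measurements from any
real system or actual MONITOR output.

```python
# Hypothetical MONITOR MODES reading, in percent of total CPU time.
# These numbers are invented for illustration, not measured anywhere.
sample = {
    "interrupt_stack": 35.0,
    "kernel": 30.0,
    "executive": 3.0,
    "supervisor": 2.0,
    "user": 20.0,
    "idle": 10.0,
}


def system_overhead(modes):
    """Sum the share spent on the interrupt stack and in kernel mode."""
    return modes["interrupt_stack"] + modes["kernel"]


def looks_io_bound(modes, threshold=50.0):
    """Return True when interrupt + kernel time exceeds the (assumed)
    threshold and is also larger than the user-mode share."""
    overhead = system_overhead(modes)
    return overhead > threshold and overhead > modes["user"]


if __name__ == "__main__":
    print("interrupt + kernel:", system_overhead(sample), "%")
    print("looks I/O bound:", looks_io_bound(sample))
```

A reading like the sample would point at terminal I/O rather than at any one
process, which would be consistent with MONITOR PROCESS/TOPCPU showing nothing
unusual.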

                                             Joe Leonard
                                        <jmleonar@crdec.arpa>

Disclaimer: The views of my employer do not conform to my views, or to any
            accepted standard of logic that the Greeks thought up anyway...

RALPH@UHHEPG.BITNET (08/06/87)

Date:  4-AUG-1987 17:51:51.03
From: Ralph Becker-Szendy RALPH AT UHHEPG
To:   B_INFOVAX,RALPH
Subj: Re: KERMIT brings system to its knees
Concerning KERMIT eating the system:

Look at how much time is spent in kernel/supervisor mode or servicing interrupts.
MONITOR PROCESS/TOPCPU doesn't give you the whole picture; I think it only
accounts for USER MODE TIME!

Some of my old experience with this kind of stuff:

At the place I worked before, we used VAXNET and later KERMIT to connect a
750 to a 780 (we had ordered DECNET with DEUNAs? and ETHERNET, but being in
Europe it took 3 or 4 months to deliver it).

Most of the time we used it as follows: a user sits in a room with a (dumb)
terminal connected to the 750. He has to work on the 780, so he connects
to the 780 via a 9600 baud terminal cable, using VAXNET or KERMIT. For a
typical heavy editing session, VAXNET would use MOST OF THE 750! It never
really showed up in MONITOR, but the machine would be virtually dead for
lower priority batch jobs, which means that somehow the user was able to
get most of the resources. After getting KERMIT, it would just use around
HALF THE 750 (which we considered much better than before). It is just amazing
how such a simple task as logically connecting one port to another port
can use most of the resources of a $100,000 machine (if you use inefficient
software), so we ended up doing the connection in hardware: just swapping the
cables on the terminal whenever someone had to go to the 780.
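
To see why a port-to-port connection can plausibly cost this much, here is a
back-of-the-envelope sketch in Python. It assumes 10 bits on the wire per
character (8 data bits plus start and stop) and a purely hypothetical fixed
CPU cost per character passed through; neither figure comes from the post,
and the real per-character cost of VAXNET or KERMIT on a 750 is not known
here.

```python
def chars_per_second(baud, bits_per_char=10):
    """Characters per second on an async line.

    bits_per_char=10 assumes 8 data bits plus one start and one stop bit.
    """
    return baud / bits_per_char


def cpu_fraction(baud, cost_per_char_us, bits_per_char=10):
    """Fraction of one CPU consumed when the line runs flat out.

    cost_per_char_us is a hypothetical per-character cost in microseconds
    (interrupt service plus the read and write through the relay program).
    The result is capped at 1.0, i.e. the whole machine.
    """
    rate = chars_per_second(baud, bits_per_char)
    return min(1.0, rate * cost_per_char_us / 1_000_000)


if __name__ == "__main__":
    rate = chars_per_second(9600)
    print(f"9600 baud is about {rate:.0f} characters per second")
    # Try a few made-up per-character costs to see how quickly it adds up.
    for cost_us in (100, 500, 1000):
        share = cpu_fraction(9600, cost_us)
        print(f"{cost_us:5d} us/char -> {share:.0%} of the CPU")
```

Under these assumptions, a relay that handles one character per I/O request
only needs a cost of about half a millisecond per character to use roughly
half the machine while a screen is being repainted at full line speed.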

By the way, after we got DECNET, I tried SET HOST/DTE once, and it worked
really nicely, using even less CPU power than a SET HOST via ETHERNET.
The DECNET link via ETHERNET always seemed to be very reliable (and extremely
user-friendly), but INEFFICIENT.

My typical use was quite perverted: we always had too little disk space
on the 780 (using RP06 drives!), but my Monte Carlo program couldn't run
on the 750 (it would take too long with such a slow CPU). So I ended up using
the DECNET connection as a "remote disk drive", and it was amazingly slow.
The following figures are for the case of only one batch job (and no heavy
interactive users) on the machine; the program reads a record from one file,
spends something like 30 CPU-seconds grinding it down, and writes a record
to the output file. Running my job with local disks on the 780, I typically
got 70% of the CPU time (the rest went to NULL); using two tape drives
(MT45?, the old 1600 bpi ones) it was around 40%; and using the disks on
the 750 via DECNET the figure was pretty much the same as with tape drives.
In this case, the NETSERVER on the 750 would typically munch around 20% of
the 750 CPU time.
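
The figures above can be turned into elapsed time per record with a little
arithmetic. This Python sketch assumes that "percent of the CPU time" means
the job's CPU time divided by wall-clock time, and that each record costs the
quoted 30 CPU-seconds; both are readings of the post, not confirmed details.

```python
CPU_SECONDS_PER_RECORD = 30.0  # "something like 30 CPU-seconds" per record


def elapsed_per_record(cpu_share, cpu_seconds=CPU_SECONDS_PER_RECORD):
    """Wall-clock seconds per record when the job gets cpu_share of the CPU.

    cpu_share is a fraction between 0 and 1 (0.70 means 70%).
    """
    if not 0.0 < cpu_share <= 1.0:
        raise ValueError("cpu_share must be in the range (0, 1]")
    return cpu_seconds / cpu_share


if __name__ == "__main__":
    # CPU shares as quoted in the post; DECNET is taken to equal tape.
    cases = {
        "local disks on the 780": 0.70,
        "two tape drives": 0.40,
        "750 disks via DECNET": 0.40,
    }
    for name, share in cases.items():
        seconds = elapsed_per_record(share)
        print(f"{name:24s} {seconds:6.1f} s elapsed per record")
```

On that reading, going from local disks to the DECNET "remote disk" stretches
each record from roughly 43 seconds to 75 seconds of elapsed time, a slowdown
of about 1.75x.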

Long live DIGITAL! I am looking forward to being connected to a campus-wide
network via TCP/IP; that should be fun to watch.

Ralph Becker-Szendy
University of Hawaii / High Energy Physics Group

Disclaimer: The views expressed here are probably not endorsed by my
employer. I hardly ever actually speak to my employer. Even our system manager
stops smiling when I come by.