[comp.unix.ultrix] hanging jobs

saus@media-lab.media.mit.edu (Mark Sausville) (12/07/89)

I haven't seen anyone griping about this but it's getting bad enough
that I thought I'd ask:

Ultrix 3.1 on VAX 6320

Certain processes seem to hang around doing something long after the
users who had initiated them have gone home.  Emacs (gnu 18.54) and
mail (/usr/ucb/mail - ultrix)  are two of the more prominent offending
programs.

One notices these processes by running top(1).  They typically show up
as having used minutes of cpu time which serves to distinguish them
from the live programs which typically utilize seconds of cpu time.

Often, these jobs sit there eating lots of CPU doing who knows what.
My guess is that they are polling hard for input.

When users are queried about these processes they usually say, "Huh,
what emacs (mail) job?"

Anybody else seeing something like this?

						Mark.

Mark Sausville                           MIT Media Laboratory
Computer Systems Administrator           Room E15-354
617-253-0325                             20 Ames Street
saus@media-lab.media.mit.edu             Cambridge, MA 02139

barmar@Think.COM (12/07/89)

In article <SAUS.89Dec6220626@media-lab.media.mit.edu> saus@media-lab.media.mit.edu (Mark Sausville) writes:
>Certain processes seem to hang around doing something long after the
>users who had initiated them have gone home.  Emacs (gnu 18.54) and
>mail (/usr/ucb/mail - ultrix)  are two of the more prominent offending
>programs.

We've seen runaway Emacses, too.  I believe we determined that it happens
when the user hangs up the phone or closes a telnet or rlogin connection
while in Emacs.  The system call that Emacs uses to read from the terminal
just returns no characters in this situation, rather than blocking or
returning an error code, so Emacs goes into a tight loop trying to read
input.  I don't remember any more details, though.
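
The failure mode described above can be illustrated with a minimal sketch. This is not Emacs code: drain_input() is a hypothetical helper, and the only assumption is the usual read(2) convention that a return of 0 means end of input. A loop that treats 0 as "nothing yet, try again" spins forever once the terminal is gone; treating it as EOF ends the loop.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from Emacs): read from fd until the
 * input ends.  Returns the total number of bytes read, or -1 on a
 * real error.
 */
long drain_input(int fd)
{
    char buf[256];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);

        if (n > 0) {            /* got data: keep going */
            total += n;
            continue;
        }
        if (n == 0)             /* zero bytes: end of input, stop */
            return total;
        if (errno == EINTR)     /* interrupted by a signal: retry */
            continue;
        return -1;              /* any other error: give up */
    }
}
```

The important branch is n == 0: retrying there, instead of returning, is what turns a closed connection into a tight loop.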


Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

grr@cbmvax.UUCP (George Robbins) (12/08/89)

In article <SAUS.89Dec6220626@media-lab.media.mit.edu> saus@media-lab.media.mit.edu (Mark Sausville) writes:
> 
> I haven't seen anyone griping about this but it's getting bad enough
> that I thought I'd ask:
> 
> Ultrix 3.1 on VAX 6320
> 
> Certain processes seem to hang around doing something long after the
> users who had initiated them have gone home.  Emacs (gnu 18.54) and
> mail (/usr/ucb/mail - ultrix)  are two of the more prominent offending
> programs.
> 
> One notices these processes by running top(1).  They typically show up
> as having used minutes of cpu time which serves to distinguish them
> from the live programs which typically utilize seconds of cpu time.
> 
> Often, these jobs sit there eating lots of CPU doing who knows what.
> My guess is that they are polling hard for input.

Both of these guys try to catch/process the hangup signal or other
signals sent to them when the user logs out or is disconnected.

Mail seems to always do this, though perhaps only if you are in the
message entry mode.  Gnu-emacs allegedly gets stuck in a loop if you
have modified a buffer and it asks if you want it saved.

I don't know if these are simply application problems or whether Ultrix
is doing something a teensy bit different than other BSD systems that
causes the problem.

Note that the gnu-emacs thing can also apparently be a disk intensive
loop, with unpleasant effects if the disk it is trying to access
happens to be NFS mounted...
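
For reference, here is a rough sketch of catching the hangup signal in a way that avoids this kind of loop. It is an illustration only, not what Mail or Gnu-emacs actually do: the function names are hypothetical, and the idea is simply that the handler sets a flag and the main loop checks it and exits instead of prompting on a terminal that no longer exists.

```c
#include <signal.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_hangup = 0;

/* Hypothetical handler: only record that SIGHUP arrived. */
static void on_hangup(int signo)
{
    (void)signo;
    got_hangup = 1;
}

/* Install the handler.  Returns 0 on success, -1 on failure. */
int install_hangup_handler(void)
{
    return signal(SIGHUP, on_hangup) == SIG_ERR ? -1 : 0;
}

/* Nonzero once a hangup has been seen; the main loop should poll
 * this and save/exit without asking the user anything. */
int hangup_pending(void)
{
    return got_hangup != 0;
}
```

A program using this would test hangup_pending() before each read from the terminal and quit (after any non-interactive cleanup) as soon as it returns nonzero.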

-- 
George Robbins - now working for,	uucp: {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing	arpa: cbmvax!grr@uunet.uu.net
Commodore, Engineering Department	fone: 215-431-9255 (only by moonlite)

bin@primate.wisc.edu (Brain in Neutral) (12/08/89)

> We've seen runaway Emacses, too.  I believe we determined that it happens
> when the user hangs up the phone or closes a telnet or rlogin connection
> while in Emacs.  The system call that Emacs uses to read from the terminal
> just returns no characters in this situation, rather than blocking or
> returning an error code, so Emacs goes into a tight loop trying to read
> input.  I don't remember any more details, though.

That's my guess on the hung mailers, too.  I see sendmail processes every
now and then that chew up our whole machine and have to be killed.  Usually
it can be traced to something odd happening while the mailer was collecting
a message.  I figured it was due to not detecting EOF properly.

Paul DuBois
dubois@primate.wisc.edu

bin@primate.wisc.edu (Brain in Neutral) (12/08/89)

From article <8876@cbmvax.UUCP>, by grr@cbmvax.UUCP (George Robbins):
> Mail seems to always do this, though perhaps only if you are in the
> message entry mode.  Gnu-emacs allegedly gets stuck in a loop if you
> have modified a buffer and it asks if you want it saved.
> 
> I don't know if these are simply application problems or whether Ultrix
> is doing something a teensy bit different than other BSD systems that
> causes the problem.

I've seen the mailer loop on a 4.2BSD system as well, so it isn't just
Ultrix-specific; perhaps inherited from BSD.

Paul DuBois
dubois@primate.wisc.edu

hans@umd5.umd.edu (Hans Breitenlohner) (12/13/89)

In article <SAUS.89Dec6220626@media-lab.media.mit.edu> saus@media-lab.media.mit.edu (Mark Sausville) writes:
>
>I haven't seen anyone griping about this but it's getting bad enough
>that I thought I'd ask:
>
>Ultrix 3.1 on VAX 6320
>
>Certain processes seem to hang around doing something long after the
>users who had initiated them have gone home.  Emacs (gnu 18.54) and
>mail (/usr/ucb/mail - ultrix)  are two of the more prominent offending
>programs.
>
>One notices these processes by running top(1).  They typically show up
>as having used minutes of cpu time which serves to distinguish them
>from the live programs which typically utilize seconds of cpu time.
>
>Often, these jobs sit there eating lots of CPU doing who knows what.
>My guess is that they are polling hard for input.
>
>When users are queried about these processes they usually say, "Huh,
>what emacs (mail) job?"
>
>Anybody else seeing something like this?
>
>						Mark.
>
>Mark Sausville                           MIT Media Laboratory
>Computer Systems Administrator           Room E15-354
>617-253-0325                             20 Ames Street
>saus@media-lab.media.mit.edu             Cambridge, MA 02139


Yes, I have seen several cases of this.

1. /bin/sh -- if you close a telnet connection while it hangs on a read,
   it will loop.  Our solution was to have the offending shell script
   use /bin/sh5 instead.
2. /usr/ucb/mail -- if you close a telnet connection while it is doing
   something other than waiting for a command, it will spin in a loop
   of the following form:
      do {
          ...
          getc(ibuf);
          ...
      } while (ferror(ibuf) && ibuf == stdin);

   on the mistaken assumption that errors on stdin are transient, and that
   eventually an EOF will be returned.

   This bug is particularly insidious, as it is most likely to show up when
   your system gets very busy.

   Berkeley has fixed the problem, but the /usr/ucb/mail in Ultrix is based
   on six-year-old Berkeley sources.

   Below are changes (to Ultrix 3.0 sources) which fixed this problem for
   us.  Your mileage may vary.

   If you don't have sources you will have to invoke adb to fix this one,
   which will be a challenge since the executable is stripped.

3. I have also seen looping emacs processes, but never pursued that problem.
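
For item 2, a self-contained sketch of a line reader without the retry loop looks roughly like this. It is modelled on the loop shown above but is not the Ultrix or Berkeley source: read_line() is a hypothetical name and the LINESIZE value is assumed. A read error is handled exactly like EOF, so a dead stdin ends the read instead of being retried.

```c
#include <stdio.h>

#define LINESIZE 1024   /* assumed value, for illustration only */

/*
 * Hypothetical stand-in for the line reader: copy one line from ibuf
 * into linebuf (which must hold LINESIZE bytes).  Returns the number
 * of characters stored plus one, or 0 if EOF or an error is hit at
 * the start of a line.  There is no retry on ferror().
 */
int read_line(FILE *ibuf, char *linebuf)
{
    char *cp = linebuf;
    int c;

    while ((c = getc(ibuf)) != '\n' && c != EOF) {
        if (cp - linebuf < LINESIZE - 2)    /* drop overlong input */
            *cp++ = (char)c;
    }
    *cp = '\0';                             /* terminate the line */

    if (c == EOF && cp == linebuf)          /* nothing read at all */
        return 0;
    return (int)(cp - linebuf) + 1;
}
```

The caller stops reading when it gets 0 back, whether that came from a real end of file or from an error on a closed connection.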

Hope this helps.

Hans




*** fio.c.old	Wed Oct  4 13:19:48 1989
--- fio.c	Thu Oct  5 18:59:13 1989
***************
*** 175,180
  	return(c+1);
  }
  
  /*
   * Quickly read a line from the specified input into the line
   * buffer; return characters read.

--- 175,181 -----
  	return(c+1);
  }
  
+ #if 0
  /*
   * Quickly read a line from the specified input into the line
   * buffer; return characters read.
***************
*** 206,211
  	*cp = 0;
  	return(cp - linebuf + 1);
  }
  
  /*
   * Read up a line from the specified input into the line

--- 207,213 -----
  	*cp = 0;
  	return(cp - linebuf + 1);
  }
+ #endif
  
  /*
   * Read up a line from the specified input into the line
***************
*** 220,226
  	register char *cp;
  	register int c;
  
- 	do {				/*read while no errs & stdin ==file*/
  		clearerr(ibuf);		/*reset err indication on input*/
  		c = getc(ibuf);
  		for (cp = linebuf; c != '\n' && c != EOF; c = getc(ibuf)) {

--- 222,227 -----
  	register char *cp;
  	register int c;
  
  		clearerr(ibuf);		/*reset err indication on input*/
  		c = getc(ibuf);
  		for (cp = linebuf; c != '\n' && c != EOF; c = getc(ibuf)) {
***************
*** 229,235
  			if (cp - linebuf < LINESIZE-2)
  				*cp++ = c;
  		}
- 	} while (ferror(ibuf) && ibuf == stdin);
  	*cp = 0;			/*terminates line*/
  	if (c == EOF && cp == linebuf)	/*if @ beginning of line & char=EOF*/
  		return(0);		/* then return 0*/

--- 230,235 -----
  			if (cp - linebuf < LINESIZE-2)
  				*cp++ = c;
  		}
  	*cp = 0;			/*terminates line*/
  	if (c == EOF && cp == linebuf)	/*if @ beginning of line & char=EOF*/
  		return(0);		/* then return 0*/