[comp.unix.xenix] system lockup

fnf@estinc.UUCP (Fred Fish) (02/14/89)

I had a case of a program running on a Bell Technologies MPE386,
under SCO 386 Xenix Version 2.3.1, which was causing the system
to lock up (no response to any typing on any terminal).  I finally
tracked it down.  This simple program demonstrates the problem
on my system:

	static char buf [1024 * 1024];

	main ()
	{
		write (0, buf, sizeof (buf));
	}

Can anyone else confirm similar symptoms?

-Fred
-- 
# Fred Fish, 1835 E. Belmont Drive, Tempe, AZ 85284,  USA
# 1-602-491-0048           asuvax!{nud,mcdphx}!estinc!fnf

daveh@marob.MASA.COM (Dave Hammond) (02/15/89)

In article <60@estinc.UUCP> fnf@estinc.UUCP (Fred Fish) writes:
>[...]
>under SCO 386 Xenix Version 2.3.1, which was causing the system
>to lock up (no response to any typing on any terminal).  I finally
>tracked it down.  This simple program demonstrates the problem
>on my system:
>
>	static char buf [1024 * 1024];
>	main ()
>	{
>		write (0, buf, sizeof (buf));
>	}

I'm not offering confirmation, but I am quite curious why you
are writing to file descriptor 0, normally associated with your
standard input -- a *read* descriptor.   Perhaps this is contributing
to your problem?

--
Dave Hammond
daveh@marob.masa.com

jim@tiamat.fsc.com (Jim O'Connor) (02/15/89)

In article <60@estinc.UUCP>, fnf@estinc.UUCP (Fred Fish) writes:
> 
> I had a case of a program running on a Bell Technologies MPE386,
> under SCO 386 Xenix Version 2.3.1, which was causing the system
> to lock up (no response to any typing on any terminal).  I finally
> tracked it down.  This simple program demonstrates the problem
> on my system:
> 
> 	static char buf [1024 * 1024];
> 
> 	main ()
> 	{
> 		write (0, buf, sizeof (buf));
> 	}

Something doesn't look right here.  Isn't this an attempt to write to
"stdin", which is probably not open for writing?  It seems that a return
value of -1, with errno set to EBADF, should be expected.  As to why it
hangs the system, I don't know, but it really shouldn't matter, since I
don't think too many programs would intentionally write 1024 * 1024 NUL
bytes to a file descriptor that they have made no attempt to open for
writing (which in this case would mean first closing the one that is
already open for reading).

What program do you have that needs to do this?
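
A minimal sketch of the check described above, in C for a POSIX-style
system.  The helper names write_status and readonly_fd_rejects_write are
hypothetical, and /dev/null merely stands in for any descriptor that was
opened read-only.  One assumption worth stating: a login terminal is
commonly opened read/write and duplicated onto descriptors 0, 1 and 2, in
which case a write to descriptor 0 may succeed rather than fail with EBADF.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: attempt a write and report the outcome.
 * Returns 0 if write() accepted some or all of the bytes,
 * otherwise the errno value it failed with. */
int write_status(int fd, const void *buf, size_t len)
{
	if (write(fd, buf, len) < 0)
		return errno;
	return 0;
}

/* Hypothetical check: a descriptor opened read-only should reject a
 * write with EBADF.  Returns 1 if it does, 0 if it does not, and -1
 * if the open itself fails. */
int readonly_fd_rejects_write(void)
{
	int fd = open("/dev/null", O_RDONLY);
	int status;

	if (fd < 0)
		return -1;
	status = write_status(fd, "x", 1);
	close(fd);
	return status == EBADF;
}
```

Calling write_status(0, buf, sizeof (buf)) from a shell on a terminal would
show which of the two outcomes a given system produces.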


------------- 
James B. O'Connor			jim@tiamat.fsc.com
Filtration Sciences Corporation		615/821-4022 x. 651

*** Altos users unite! mail to "info-altos-request@tiamat.fsc.com" ***

karl@ddsw1.MCS.COM (Karl Denninger) (02/16/89)

In article <60@estinc.UUCP> fnf@estinc.UUCP (Fred Fish) writes:
$
$I had a case of a program running on a Bell Technologies MPE386,
$under SCO 386 Xenix Version 2.3.1, which was causing the system
$to lock up (no response to any typing on any terminal).  I finally
$tracked it down.  This simple program demonstrates the problem
$on my system:
$
$	static char buf [1024 * 1024];
$
$	main ()
$	{
$		write (0, buf, sizeof (buf));
$	}
$
$Can anyone else confirm similar symptoms?

One "yeah, it happens here".

I think I'd call that a pretty major bug.  I've forwarded this to SCO.

--
Karl Denninger (karl@ddsw1.MCS.COM, ddsw1!karl)
Data: [+1 312 566-8912], Voice: [+1 312 566-8910]
Macro Computer Solutions, Inc.    	"Quality solutions at a fair price"

fnf@estinc.UUCP (Fred Fish) (02/16/89)

In article <555@marob.MASA.COM> daveh@marob.masa.com (Dave Hammond) writes:
<In article <60@estinc.UUCP> fnf@estinc.UUCP (Fred Fish) writes:
<<[...]
<<This simple program demonstrates the problem on my system:
<<
<<	static char buf [1024 * 1024];
<<	main ()
<<	{
<<		write (0, buf, sizeof (buf));
<<	}
<
<I'm not offering confirmation, but I am quite curious why you
<are writing to file descriptor 0, normally associated with your
<standard input -- a *read* descriptor.   Perhaps this is contributing
<to your problem?

This is a contrived example which demonstrates the system lockup.  The
application was attempting to write to a file that it had opened, but
the file descriptor was getting lost due to a program bug, causing it
to attempt to write to descriptor 0.  No, I would not normally attempt
to write 1 MB of null bytes to my standard input stream.  However, I
wouldn't expect such an attempt to bring the system to its knees.
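
The post does not say how the descriptor was lost.  As one hypothetical
illustration in C: a descriptor variable with static storage that is never
assigned holds 0, so a stray write through it lands on descriptor 0.
Initializing such a variable to -1 (an assumption about a possible guard,
not the application's actual code) makes the same mistake fail with EBADF
instead.  All names below are invented for the sketch.

```c
#include <errno.h>
#include <unistd.h>

static int lost_fd;          /* never assigned: static storage starts at 0 */
static int guarded_fd = -1;  /* -1 is never a valid descriptor */

/* The descriptor a buggy caller would end up writing to. */
int descriptor_after_bug(void)
{
	return lost_fd;
}

/* Write through the guarded variable before it has been set.
 * Returns 0 if the write succeeds, otherwise the errno value. */
int guarded_write_errno(void)
{
	if (write(guarded_fd, "x", 1) < 0)
		return errno;
	return 0;
}
```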

-Fred
-- 
# Fred Fish, 1835 E. Belmont Drive, Tempe, AZ 85284,  USA
# 1-602-491-0048           asuvax!{nud,mcdphx}!estinc!fnf

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (02/17/89)

In article <60@estinc.UUCP> fnf@estinc.UUCP (Fred Fish) writes:

| 	static char buf [1024 * 1024];
| 
| 	main ()
| 	{
| 		write (0, buf, sizeof (buf));
| 	}
| 
| Can anyone else confirm similar symptoms?

  You're right! All of my programs which write huge arrays to the
keyboard don't work under Xenix! Seriously, I think there are two
problems here, one of which is that you're trying to write to an input
file.  If that hangs your system, it's a bug.

Begin unsupported hypothesis
____________________________

  When you are writing a huge, uninitialized array, I *believe* that it
is not paged from the disk but created and zeroed on the fly on demand.
This could get the CPU very busy in the kernel and hang the system until
done. If you don't have a lot of memory you will probably start paging
some of the newly created pages out. There may well be a problem in
doing a write which gets a page fault while setting up the i/o.
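
What the C language itself guarantees is only that an object with static
storage and no initializer starts out as all zero bytes.  How the kernel
supplies those pages (zero-fill on demand or otherwise) is the hypothesis
above, and this small sketch cannot confirm it.  The helper name
nonzero_bytes is hypothetical.

```c
#include <stddef.h>

static char buf[1024 * 1024];   /* no initializer: starts as all zero bytes */

/* Hypothetical helper: count the bytes in buf that are not NUL.
 * Reading the array touches every page of it. */
size_t nonzero_bytes(void)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(buf); i++)
		if (buf[i] != 0)
			n++;
	return n;
}
```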

  I borrowed a system at work to try this, and it did not hang anything.
I was able to change to other virtual screens, etc. I'm not sure what
writing to the keyboard does, but it didn't hurt anything. Xenix/386
v2.3.1 on a Dell325.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

djones@jcc-one.UUCP (Dan Jones) (02/17/89)

I can confirm Fred's problem.  However I got a little suspicious, so logged in
via a terminal line and tried it -- all was fine.

It seems to be just the console driver that cannot accept the big write.
Sending the info to a serial port or over ethernet to a telnet session works
just fine.

Anyone for an input buffer overflow problem in the console driver? {*grin*}

	Dan Jones

P.S. Anyone gotten ambitious and seen what happens if instead of 1MB of NULs,
one sends, for example, 1MB of 'A's????
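
A minimal sketch of that variation in C, assuming a POSIX-style system.
The helper send_fill_to and its path argument are hypothetical; the path
lets the same buffer be aimed at the console, a serial line, or a harmless
target such as /dev/null.  Pointing it at the console of an affected system
may well reproduce the lockup.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static char buf[1024 * 1024];

/* Hypothetical helper: fill buf with `ch` and hand it to `path` in a
 * single write().  Returns the byte count write() reports, or -1 if
 * the open or the write fails. */
long send_fill_to(const char *path, int ch)
{
	int fd = open(path, O_WRONLY);
	long n;

	if (fd < 0)
		return -1;
	memset(buf, ch, sizeof(buf));
	n = (long) write(fd, buf, sizeof(buf));
	close(fd);
	return n;
}
```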

how@milhow1.uunet.UU.NET (Mike Howard) (02/18/89)

In article <107@jcc-one.UUCP> djones@jcc-one.UUCP (Dan Jones) writes:
>I can confirm Fred's problem.  However I got a little suspicious, so logged in
>via a terminal line and tried it -- all was fine.

My system locked for about 2 minutes (1:53 according to time(CP)) and
then came back.  This was with nothing else going on.

A second test, with a couple of things going on (a few shells and
a cu): `date ; time tst ; date' - resulted in:
day of month 00:13:17

real  18:12.2
user      0.0
sys    1:49.4

day-of-month 00:31.29

The actual elapsed time was about 18 minutes, whereas the clock
only advanced 18 seconds.

As a guess: the `write()' is done as a more or less atomic operation
and the kernel fires the entire 1 Meg of zeros out at spl7 so that
the system `hangs'.  [ If this is true, then it also implies that
the kernel is smart enough not to page in memory which is initialized
to zero and is being read, because my disk is not read. ]  This is
supported by the clock not working correctly during the test: the
timer interrupt is at spl6 and so is masked out during the spl7
time.
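
If the guess above is right, one thing an application could try is to hand
the driver the data in small pieces, so that no single write() covers the
whole megabyte.  The following C sketch shows the idea for a POSIX-style
system.  Whether it would actually avoid the lockup on Xenix is untested
and purely an assumption; the 4096-byte CHUNK and the function names are
invented for the illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK 4096   /* assumed chunk size, chosen arbitrarily */

static char buf[1024 * 1024];

/* Write len bytes in pieces of at most CHUNK bytes.
 * Returns the number of successful write() calls, or -1 on error. */
long write_chunked(int fd, const char *p, size_t len)
{
	long calls = 0;

	while (len > 0) {
		size_t want = len < CHUNK ? len : CHUNK;
		ssize_t n = write(fd, p, want);

		if (n < 0 && errno == EINTR)
			continue;       /* interrupted before any data moved: retry */
		if (n <= 0)
			return -1;      /* real error, or no progress at all */
		calls++;
		p += n;
		len -= (size_t) n;
	}
	return calls;
}

/* Hypothetical driver: send the 1 MB zero buffer to `path` in chunks. */
long send_zeros_chunked(const char *path)
{
	int fd = open(path, O_WRONLY);
	long calls;

	if (fd < 0)
		return -1;
	calls = write_chunked(fd, buf, sizeof(buf));
	close(fd);
	return calls;
}
```

The loop also copes with short writes, which terminal drivers are allowed
to return, by advancing only by the count that write() reports.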

If there is a bug, it is in SCO's handling of tty interrupts.
Anybody competent out there want to comment on tty drivers?

-- 
Mike Howard
uunet!milhow1!how