[news.software.b] Anne Jones dumps core

mills@ccu.umanitoba.ca (Gary Mills) (03/16/90)

Has anyone else seen this problem?  Users complain that they get the
following error when attempting to post an article with Pnews:

/usr/local/lib/newsbin/inject/anne.jones: 5115 Memory fault - core dumped

It seems to be intermittent.  I haven't seen the problem myself.  This
is C-news, on a Sun-4/280 under 4.0.3.  I have seen a memory fault from
`ps' a few times, once when I had just restarted `cron', and a couple
of times when I had started a `catman' in the background.  Nothing
unusual showed up on the system log at the time.
-- 
-Gary Mills-             -University of Manitoba-             -Winnipeg-

moraes@cs.toronto.edu (Mark Moraes) (03/16/90)

mills@ccu.umanitoba.ca (Gary Mills) writes:
>/usr/local/lib/newsbin/inject/anne.jones: 5115 Memory fault - core dumped

It means some program invoked by the anne.jones shell script is dumping
core.  The most likely culprit is awk, which does this when confronted
by lines longer than some constant above 2K.  (New awk is slightly
more graceful and prints an error message; GNU awk, it seems, reallocates
the buffer until it runs out of memory.)  sed is the next most likely
culprit, though it usually just silently truncates the line and keeps
going, which is often worse.

If you have a core file around, run `file' on it to see which program
dumped it.
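
To test the long-line theory against a saved copy of the article, something
like the following sketch could help.  The function name long_lines and the
default limit of 2048 are assumptions for illustration (the post only says
"some constant above 2K"), and the sketch itself relies on a modern awk that
copes with long lines, so it is not something to run under the awk suspected
of crashing.

```shell
# long_lines FILE [LIMIT]
# Print "line-number length" for every line of FILE that is longer than
# LIMIT characters.  LIMIT defaults to 2048, an assumed stand-in for the
# unspecified "constant above 2K" mentioned in the post.
long_lines() {
    awk -v limit="${2:-2048}" \
        'length($0) > limit { print NR, length($0) }' "$1"
}
```

Running it as `long_lines article.txt' prints nothing when no line exceeds
the limit; any output points at lines worth folding before posting.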

henry@utzoo.uucp (Henry Spencer) (03/17/90)

In article <1990Mar15.185725.27695@ccu.umanitoba.ca> mills@ccu.umanitoba.ca (Gary Mills) writes:
>Has anyone else seen this problem?  Users complain that they get the
>following error when attempting to post an article with Pnews:
>
>/usr/local/lib/newsbin/inject/anne.jones: 5115 Memory fault - core dumped

Given that anne.jones is a shell file, it can't be dumping core itself.
The most obvious possibility is a core dump from awk:  in particular, old
awks tend to react badly to very long lines.  A nastier one is a core
dump from the shell -- we've seen reports that the Sun4 4.0 shell can dump
core on some of our shell programs in ill-defined circumstances.  (This is
all the more disgusting because the problem that is most likely to be to
blame was understood years ago, and Geoff's paper "A Partial Tour Through
the UNIX Shell" at San Diego Usenix explained it and how to fix it.)
-- 
MSDOS, abbrev:  Maybe SomeDay |     Henry Spencer at U of Toronto Zoology
an Operating System.          | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

guy@auspex.auspex.com (Guy Harris) (03/20/90)

>(This is all the more disgusting because the problem that is most
>likely to be to blame was understood years ago, and Geoff's paper
>"A Partial Tour Through the UNIX Shell" at San Diego Usenix explained
>it and how to fix it.)

Well, it depends on which problem you're thinking of.

The SunOS 3.2 Bourne shell was scoured for the "catch SIGSEGV and grow
the 'stack'" hack.  (Actually, the BRL Bourne shell - based, like the
SunOS 3.x one, on the S5R2 Bourne shell - was the one that was scoured,
and the changes were later applied to the 3.x one, but I digress....)

Unfortunately, not all of the problems were caught (no, not all of the
places where you have to check use "pushstak()", and we didn't rewhack
the handling of the "stack" as much as you did).  Some are still present
in the 4.0 shell, but I think they are fixed in the 4.1 shell.

BTW, comments on a couple of statements in the paper:

	1) "The Ninth Edition (and presumably System V Release 2) shell
	   used some of the directory(3) routines from the C library..."

	   Nope.  The S5R2 shell couldn't use the directory routines
	   from the C library since they weren't in the S5R2 C library.

	2) "The performance impact [of checking manually rather than
	   having the MMU do it] has not been measured, but appears to
	   be insignificant."

	   At one point, I compiled two versions of the shell, identical
	   except that one did the checks and one depended on the
	   "SIGSEGV hack", on a 3B2/400 running S5R3 (it happened to be the
	   nearest handy system that 1) had an OS that could
	   conveniently compile an S5-based shell like the SunOS 3.x one
	   and 2) made the "SIGSEGV hack" work) and tried doing

		echo `cat *.c`

	   in the hopes it'd really stress the code where the checks
	   appeared.

	   The performance impact was, in fact, insignificant, at least
	   for that test.

	   I'm curious what the performance impact was on the original
	   PDP-11(s) on which the work was done, given that John Mashey
	   has claimed, as I remember, that Bourne put the SIGSEGV hack
	   in at his urging in order to speed up the shell.

mills@ccu.umanitoba.ca (Gary Mills) (07/20/90)

About a month ago, I posted an article about one user having problems
posting news.  This is Cnews on SunOS 4.0.3, and the core file said it
was /bin/sh.  Now, I have a clue from that user:

}Remember a couple weeks ago when I mentioned I had a problem with posting news
}articles?  I kept getting a core dump and you saw that sh was killed off
}somehow.  Well I got fed up enough to actually do something, and I found that
}the problem was in the environment variable CONNECT.  I was using that
}variable name for a script of mine, and inews evidently didn't like it.  I
}changed the variable name and voila!  I am now a net.poster once again!

This sounds like an environment problem involving Sun's /bin/sh.  The
particular variable he used is not used by Cnews.  Perhaps /bin/sh trashes
the environment when it does an export and the environment must expand?
Does anyone have any ideas on this?
-- 
-Gary Mills-             -University of Manitoba-             -Winnipeg-
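
The user found the offending variable by hand.  A rough sketch of automating
that search is below: it reruns a failing command once per exported
variable, each time with that one variable removed, and reports the names
whose removal makes the command succeed.  The function names without_var and
suspects are hypothetical, and the code assumes a modern POSIX shell rather
than the SunOS 4.0 /bin/sh under discussion.

```shell
# without_var NAME COMMAND [ARGS...]
# Run COMMAND with NAME removed from the environment.  The subshell keeps
# the caller's own environment untouched.
without_var() {
    name=$1
    shift
    ( unset "$name" && "$@" )
}

# suspects COMMAND [ARGS...]
# Print each environment variable whose removal lets COMMAND exit with
# status 0.  Names are taken from the output of env; multi-line values may
# produce a few bogus names, which are harmless to unset.
suspects() {
    env | sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' |
    while read -r var; do
        # </dev/null stops COMMAND from swallowing the list of names.
        if without_var "$var" "$@" </dev/null >/dev/null 2>&1; then
            echo "$var"
        fi
    done
}
```

This only narrows things down when the command fails in the full
environment and a single variable is responsible; if the total size of the
environment is what matters, several unrelated names may be reported.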

flee@guardian.cs.psu.edu (Felix Lee) (07/21/90)

>This sounds like an environment problem involving Sun's /bin/sh.  The
>particular variable he used is not used by Cnews.

There is a problem in SunOS 4.0, and maybe still in 4.1, where
various things will dump core if your environment is exactly the wrong
length.  I've seen this happen with sh, vi, and sendmail.  I've never
found out why.
--
Felix Lee	flee@cs.psu.edu
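
If the failure really depends on the environment being "exactly the wrong
length", one way to hunt for it is to pad the environment one byte at a time
and note which sizes make a command fail.  The sketch below does that; the
names pad, sweep_env, and the PAD variable are made up for illustration, and
it assumes a modern POSIX shell (it uses $((...)) arithmetic, which the
shells of that era lacked).

```shell
# pad N
# Print a string of N 'x' characters with no trailing newline.
pad() {
    n=$1
    s=
    while [ "$n" -gt 0 ]; do
        s="${s}x"
        n=$((n - 1))
    done
    printf '%s' "$s"
}

# sweep_env LO HI COMMAND [ARGS...]
# Run COMMAND once for each padding length from LO to HI, with an extra
# environment variable PAD of that length, and print every length at
# which COMMAND exits with a non-zero status.
sweep_env() {
    lo=$1
    hi=$2
    shift 2
    i=$lo
    while [ "$i" -le "$hi" ]; do
        if ! PAD=$(pad "$i") "$@" >/dev/null 2>&1; then
            echo "$i"
        fi
        i=$((i + 1))
    done
}
```

For example, `sweep_env 0 64 /bin/sh -c :' would try 65 environment sizes;
on a system without the bug it should print nothing.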

guy@auspex.auspex.com (Guy Harris) (07/22/90)

>This sounds like an environment problem involving Sun's /bin/sh.

Yeah, there are some cases where the 4.0.3 "/bin/sh" will drop core;
they may be environment-related.  I expect they're fixed in 4.1....