mills@ccu.umanitoba.ca (Gary Mills) (03/16/90)
Has anyone else seen this problem? Users complain that they get the following error when attempting to post an article with Pnews:

/usr/local/lib/newsbin/inject/anne.jones: 5115 Memory fault - core dumped

It seems to be intermittent. I haven't seen the problem myself. This is C-news, on a Sun-4/280 under 4.0.3. I have seen a memory fault from `ps' a few times, once when I had just restarted `cron', and a couple of times when I had started a `catman' in the background. Nothing unusual showed up on the system log at the time.
--
-Gary Mills- -University of Manitoba- -Winnipeg-
moraes@cs.toronto.edu (Mark Moraes) (03/16/90)
mills@ccu.umanitoba.ca (Gary Mills) writes:
>/usr/local/lib/newsbin/inject/anne.jones: 5115 Memory fault - core dumped

It means some program invoked by the anne.jones shell script is dumping core. The most likely culprit is awk, which does this when confronted by lines longer than some constant above 2K. (New awk is slightly more graceful and prints an error message; GNU awk, it seems, reallocates the buffer until it runs out of memory.) sed is the next most likely culprit, though it usually just silently truncates the line and keeps going, which is often worse.

If you have a core file around, use 'file' to check what caused it.
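One way to test the long-line theory is to scan the article being posted for lines over a size threshold. The following is a minimal sketch, not part of C News: the function name find_long_lines is hypothetical, and the 2048-byte default is an assumption, since the limit is only described above as "some constant above 2K".

```python
# Hypothetical helper for spotting over-long lines in an article.
# ASSUMED_LIMIT is a guess: the real awk limit is only known to be
# "some constant above 2K".
ASSUMED_LIMIT = 2048


def find_long_lines(lines, limit=ASSUMED_LIMIT):
    """Return (line_number, length_in_bytes) for each line longer than limit.

    `lines` is any iterable of bytes objects, such as a file opened in
    binary mode. The trailing newline is not counted.
    """
    hits = []
    for number, raw in enumerate(lines, start=1):
        length = len(raw.rstrip(b"\r\n"))
        if length > limit:
            hits.append((number, length))
    return hits
```

To use it, open the suspect article in binary mode and pass the file object to find_long_lines(); an empty result would point away from awk and sed and toward the shell itself.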
henry@utzoo.uucp (Henry Spencer) (03/17/90)
In article <1990Mar15.185725.27695@ccu.umanitoba.ca> mills@ccu.umanitoba.ca (Gary Mills) writes:
>Has anyone else seen this problem? Users complain that they get the
>following error when attempting to post an article with Pnews:
>
>/usr/local/lib/newsbin/inject/anne.jones: 5115 Memory fault - core dumped

Given that anne.jones is a shell file, it can't be dumping core itself. The most obvious possibility is a core dump from awk: in particular, old awks tend to react badly to very long lines. A nastier one is a core dump from the shell -- we've seen reports that the Sun4 4.0 shell can dump core on some of our shell programs in ill-defined circumstances. (This is all the more disgusting because the problem that is most likely to be to blame was understood years ago, and Geoff's paper "A Partial Tour Through the UNIX Shell" at San Diego Usenix explained it and how to fix it.)
--
MSDOS, abbrev: Maybe SomeDay | Henry Spencer at U of Toronto Zoology
an Operating System.         | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
guy@auspex.auspex.com (Guy Harris) (03/20/90)
>(This is all the more disgusting because the problem that is most
>likely to be to blame was understood years ago, and Geoff's paper
>"A Partial Tour Through the UNIX Shell" at San Diego Usenix explained
>it and how to fix it.)

Well, it depends on which problem you're thinking of. The SunOS 3.2 Bourne shell was scoured for the "catch SIGSEGV and grow the 'stack'" hack. (Actually, the BRL Bourne shell - based, like the SunOS 3.x one, on the S5R2 Bourne shell - was the one that was scoured, and the changes were later applied to the 3.x one, but I digress....) Unfortunately, not all of the problems were caught (no, not all of the places where you have to check use "pushstak()", and we didn't rewhack the handling of the "stack" as much as you did). Some are still present in the 4.0 shell, but I think are fixed in the 4.1 shell.

BTW, comments on a couple of statements in the paper:

1) "The Ninth Edition (and presumably System V Release 2) shell used some of the directory(3) routines from the C library..."

Nope. The S5R2 shell couldn't use the directory routines from the C library since they weren't in the S5R2 C library.

2) "The performance impact [of checking manually rather than having the MMU do it] has not been measured, but appears to be insignificant."

At one point, I compiled two versions of the shell, identical except that one did the checks and one depended on the "SIGSEGV hack", on a 3B2/400 running S5R3 (it happened to be the nearest handy system that 1) had an OS that could conveniently compile an S5-based shell like the SunOS 3.x one and 2) made the "SIGSEGV hack" work) and tried doing

	echo `cat *.c`

in the hopes it'd really stress the code where the checks appeared. The performance impact was, in fact, insignificant, at least for that test.
I'm curious what the performance impact was on the original PDP-11(s) on which the work was done, given that John Mashey has claimed, as I remember, that Bourne put the SIGSEGV hack in at his urging in order to speed up the shell.
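For readers unfamiliar with the two approaches being compared, the "check manually" alternative amounts to testing for room before every store and growing the buffer when it is full, rather than letting the store fault and growing inside a SIGSEGV handler. The sketch below illustrates only that idea. It is not the Bourne shell's code (which is C), the class and method names are hypothetical, and Python cannot reproduce the fault-driven variant at all.

```python
class CheckedStack:
    """Growable byte buffer that checks capacity before every push.

    Illustrative only: the explicit test in push() stands in for the
    manual checks discussed above.
    """

    def __init__(self, capacity=16):
        self.buf = bytearray(capacity)
        self.top = 0      # index of the next free slot
        self.grows = 0    # how many times the buffer had to grow

    def push(self, byte):
        # The explicit check: one comparison per store.
        if self.top == len(self.buf):
            # Double the buffer (grow by at least one byte).
            self.buf.extend(bytes(max(1, len(self.buf))))
            self.grows += 1
        self.buf[self.top] = byte
        self.top += 1

    def contents(self):
        return bytes(self.buf[:self.top])
```

The cost in question is that one comparison per push; the fault-driven version skips it and pays only when a store actually runs off the end of the buffer.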
mills@ccu.umanitoba.ca (Gary Mills) (07/20/90)
About a month ago, I posted an article about one user having problems posting news. This is Cnews on SunOS 4.0.3, and the core file said it was /bin/sh. Now, I have a clue from that user:

}Remember a couple weeks ago when I mentioned I had a problem with posting news
}articles? I kept getting a core dump and you saw that sh was killed off
}somehow. Well I got fed up enough to actually do something, and I found that
}the problem was in the environment variable CONNECT. I was using that
}variable name for a script of mine, and inews evidently didn't like it. I
}changed the variable name and voila! I am now a net.poster once again!

This sounds like an environment problem involving Sun's /bin/sh. The particular variable he used is not used by Cnews. Perhaps /bin/sh trashes the environment when it does an export and the environment must expand? Does anyone have any ideas on this?
--
-Gary Mills- -University of Manitoba- -Winnipeg-
flee@guardian.cs.psu.edu (Felix Lee) (07/21/90)
>This sounds like an environment problem involving Sun's /bin/sh. The
>particular variable he used is not used by Cnews.

There is a problem with SunOS 4.0, and maybe still in 4.1, where various things will dump core if your environment is exactly the wrong length. I've seen this happen with sh, vi, and sendmail. I've never found out why.
--
Felix Lee flee@cs.psu.edu
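If the trigger really is the total length of the environment, comparing that length between a failing and a working login is a cheap first check. The following is a rough sketch with a hypothetical function name. It counts only the "NAME=value" strings and their terminating NUL bytes, not the pointer array or the argument list, and nothing in this thread says which length is the bad one.

```python
import os


def environment_size(env=None):
    """Return the total bytes used by the "NAME=value" strings in env.

    Each entry is counted with one extra byte for its terminating NUL.
    Defaults to the current process environment.
    """
    if env is None:
        env = os.environ
    total = 0
    for name, value in env.items():
        entry = os.fsencode(f"{name}={value}")
        total += len(entry) + 1  # +1 for the NUL terminator
    return total
```

Renaming a variable, as the user above did with CONNECT, changes this total, which would be consistent with a length-dependent bug rather than anything special about the name.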
guy@auspex.auspex.com (Guy Harris) (07/22/90)
>This sounds like an environment problem involving Sun's /bin/sh.
Yeah, there are some cases where the 4.0.3 "/bin/sh" will drop core;
they may be environment-related. I expect they're fixed in 4.1....