[comp.sources.bugs] Sun4os4 /bin/sh dumps core.

schwartz@shire.cs.psu.edu (Scott Schwartz) (09/08/89)

In article <550@caldwr.UUCP> rfinch@caldwr.UUCP (Ralph Finch) writes:

   I'm running SunOS 4.0.1 on a Sun 4/260.  Got perl 3.0 (beta),
   ran configure, make depend, make, then make test.  On the
   latter it gives the message:

   sh: 24615 Memory fault - core dumped

A common problem on Sun 4's, unfortunately.  The only known workaround
is to change the size of your environment.  Removing some of your
environment variables might do the trick.  With some patience you can
get vi to do this too, by the way.

-- Scott



--
Scott Schwartz		<schwartz@shire.cs.psu.edu>
"APAR's?  We don' neeed no steeenking APARS!"

flee@shire.cs.psu.edu (Felix Lee) (09/08/89)

Actually, a problem with SunOS 4.0 in general--I suspect dynamic
libraries.  The following may or may not cause a coredump:
	(in csh)
	% set x=x
	% while (1)
	? sh -c 'a='$x'`pwd` echo -n .'		# note the quoting!
	? set x=x$x
	? end

After a couple hundred dots, you may or may not get a series of
	sh: Memory fault - core dumped
(The loop will eventually stop with csh complaining "Word too long".)
The number of dots before the coredump seems to depend on the user, but
is otherwise consistent.
--
Felix Lee	flee@shire.cs.psu.edu	*!psuvax1!flee

kutvonen@cs.Helsinki.FI (Petri Kutvonen) (09/08/89)

In article <FLEE.89Sep7225002@shire.cs.psu.edu> flee@shire.cs.psu.edu (Felix Lee) writes:
>Actually, a problem with SunOS 4.0 in general--I suspect dynamic
>libraries. [...]

Probably not so.  I have perl 3.0-beta running on Sun 3's (under SunOS 4.0.3)
but it (or actually sh) dumps core on my Sun 4/260 (also SunOS 4.0.3)
when I try 'make test'.  Either this is a specific Sun 4 problem or I have
made some mistake while running Configure. 

Petri Kutvonen   University of Helsinki           kutvonen@cs.Helsinki.FI
                 Department of Computer Science   +358 0 7084216

guy@auspex.auspex.com (Guy Harris) (09/09/89)

>Actually, a problem with SunOS 4.0 in general--I suspect dynamic
>libraries.

Not too good an idea to suspect that, considering:

	Script started on Fri Sep  8 18:00:49 1989
	auspex% file /bin/sh
	/bin/sh:	sparc demand paged executable not stripped
	script done on Fri Sep  8 18:00:57 1989

The reader will note the significant absence of the phrase "dynamically
linked" in the above output from "file", demonstrating that "sh" doesn't
*use* any dynamic libraries....

The problem with "perl" had nothing to do with dynamic linking, either;
see my posting indicating the fix.

guy@auspex.auspex.com (Guy Harris) (09/10/89)

>Probably not so.  I have perl 3.0-beta running on Sun 3's (under SunOS 4.0.3)
>but it (or actually sh) dumps core

Uhh, excuse me, but please do NOT assume that because the string "sh:"
appears in the "core dumped" message that it's the shell that's dumping
core.  In fact, in the case I saw when running "make test" on a Sun-4,
it's the shell that's telling you that something *else*, namely Perl,
dumped core - yes, even if your shell is the C shell, since "make" is
running Perl from the Makefile and it uses the Bourne shell for
everything on SunOS. 

If you want to know what program generated a core file on SunOS 3.x or
4.x, do "file core".

The problem is, as Larry Wall and I noted, a "vfork"-related problem
(and merely including <vfork.h> doesn't seem to fix it). 

asgard@omni.com (Jay R. Stoner) (09/11/89)

In article <2256@hydra.Helsinki.FI> kutvonen@cs.Helsinki.FI (Petri Kutvonen) writes:
>In article <FLEE.89Sep7225002@shire.cs.psu.edu> flee@shire.cs.psu.edu (Felix Lee) writes:
>>Actually, a problem with SunOS 4.0 in general--I suspect dynamic
>>libraries. [...]

This is probably a silly question, but could it have anything to do with
struct alignment rules wrt SPARC as opposed to 680x0?
-- 
			| J.R. (Use the Source, Luke) Stoner
  "Dying is easy,	| asgard@cpro.uucp
    comedy		| asgard@pagopago.omni.com
      is hard."		| ...{hplabs,apple,pyramid}!pacbell!cpro!asgard

jmd@ursa.UUCP (Josh Diamond) (09/12/89)

In article <2431@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>Probably not so.  I have perl 3.0-beta running on Sun 3's (under SunOS 4.0.3)
>>but it (or actually sh) dumps core
>
>Uhh, excuse me, but please do NOT assume that because the string "sh:"
>appears in the "core dumped" message that it's the shell that's dumping
>core.  In fact, in the case I saw when running "make test" on a Sun-4,
>it's the shell that's telling you that something *else*, namely Perl,
>dumped core - yes, even if your shell is the C shell, since "make" is
>running Perl from the Makefile and it uses the Bourne shell for
>everything on SunOS. 
>
>If you want to know what program generated a core file on SunOS 3.x or
>4.x, do "file core".

Ah, but I _have_ managed to get /bin/sh to core dump with a segmentation
fault under SunOS 4.0*.  "file core" tells me that the core file is indeed
from /bin/sh.  It occurs when either the command line is very long or the
environment is very large.   It has bitten me several times in "make".  
Sun sent us a patched version of /bin/sh to fix it, and I believe it has
fixed the problem.  If you are running into this, you should probably get
in touch with Sun about it...


						Josh Diamond
						AKA Spidey!!!



-- 
 /\ \  / /\  Josh Diamond       {philabs.phillips.com, sun.com}!gotham!ursa!jmd
//\\ .. //\\ AKA Spidey!!!    ...!{sun, pwcmrd, philabs, pyrnj}!gotham!ursa!jmd
//\((  ))/\\
/  < `' >  \         Beauty is the purgation of superfluities. -- Michelangelo

guy@auspex.auspex.com (Guy Harris) (09/12/89)

>This is probably a silly question, but could it have anything to do with
>struct alignment rules wrt SPARC as opposed to 680x0?

One last time:

	1) The bug that causes "/bin/sh" to drop core is not the same as
	   the bug that causes Perl 3.0, as distributed, to drop core on
	   a SPARC-based machine when you run "make test".

	2) Neither bug is due to dynamic linking; the Perl bug is due to
	   a problem with "vfork", and is fixed by compiling "util.c"
	   with "vfork" #defined as "fork".  (The "/bin/sh" bug is quite
	   unlikely to be due to "vfork" problems, since "/bin/sh" in
	   SunOS doesn't *do* any "vfork"s....)

There may be some alignment bugs in Perl; Theodore Ts'o's posting
indicates that there seemed to be such bugs when he brought it up on a
DECStation, and such bugs would be likely to show up on SPARC-based
machines as well.  The bug that started this thread ain't one of them,
though.  The cause of said bug has been identified, so please, no more
speculation on its cause....
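
[Editorial sketch] Point 2 above gives the Perl fix as compiling "util.c"
with "vfork" #defined as "fork".  A minimal illustration of that kind of
substitution, assuming a POSIX system, might look like the following.  The
helper run_child() is hypothetical and invented for this example; it is not
the code in Perl's "util.c".

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The workaround: from here on, every vfork() call in this file is
 * compiled as a call to fork(). */
#define vfork fork

/* Hypothetical helper (NOT taken from Perl's util.c): run `cmd` through
 * /bin/sh and return its exit status, or -1 on failure or abnormal exit. */
int run_child(const char *cmd)
{
    int status;
    pid_t pid;

    assert(cmd != NULL);

    pid = vfork();              /* expands to fork() because of the macro */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);             /* reached only if the exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The call site still reads vfork(), so the source needs no other change; only
the macro decides which system call is actually made.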

roy@sts.sts.COM (09/13/89)

/* Written 11:23 am  Sep  9, 1989 by guy@auspex.auspex.com in sts:comp.sources.bugs */
/* ---------- "Re: Sun4os4 /bin/sh dumps core. (wa" ---------- */

Uhh, excuse me, but please do NOT assume that because the string "sh:"
appears in the "core dumped" message that it's the shell that's dumping
core.  ...

/* End of text from sts:comp.sources.bugs */

/* Written  9:19 am  Sep 12, 1989 by guy@auspex.auspex.com in sts:comp.sources.bugs */
/* ---------- "Re: Sun4os4 /bin/sh dumps core. (wa" ---------- */

	1) The bug that causes "/bin/sh" to drop core is not the same as
	   the bug that causes Perl 3.0, as distributed, to drop core on
	   a SPARC-based machine when you run "make test".

/* End of text from sts:comp.sources.bugs */

What bug that causes "/bin/sh" to drop core?  In the first quoted
article by you, you said that "/bin/sh" wasn't dropping core at all.
Have you found out that it is now?


Roy

tytso@athena.mit.edu (Theodore Y. Ts'o) (09/13/89)

In article <2439@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>There may be some alignment bugs in Perl; Theodore Ts'o's posting
>indicates that there seemed to be such bugs when he brought it up on a
>DECStation, and such bugs would be likely to show up on SPARC-based
>machines as well.  The bug that started this thread ain't one of them,
>though.  The cause of said bug has been identified, so please, no more
>speculation on its cause....

In fact, once the patches I've posted are applied (the most important
of which fixed an array overrun which trashed malloc control blocks),
there's only one problem that I know of which might possibly be related
to alignment restrictions:

<tytso@hodge>   {~/watchmaker/mipssrc/perl/t}
101% op.sort
1..3
Fixed up unaligned data access for pid 2332 (perl) at pc 0x418b0c
Fixed up unaligned data access for pid 2332 (perl) at pc 0x418b24
Fixed up unaligned data access for pid 2332 (perl) at pc 0x418b2c
not ok 1
ok 2
Segmentation violation (core dumped)

(the "Fixed up unaligned...." message is printed by the Ultrix Kernel)

All of the other regression tests pass on the PMAX, so I doubt most of
the reported bugs are alignment-based, given that everything else
works on a MIPS architecture.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Theodore Ts'o				bloom-beacon!mit-athena!tytso
3 Ames St., Cambridge, MA 02139		tytso@athena.mit.edu
   Everybody's playing the game, but nobody's rules are the same!

guy@auspex.auspex.com (Guy Harris) (09/13/89)

>What bug that causes "/bin/sh" to drop core?  In the first quoted
>article by you, you said that "/bin/sh" wasn't dropping core at all.
>Have you found out that it is now?

No.  The article that started this off, <550@caldwr.UUCP>, noted a bug
that caused "perl" to drop core:

	...

	I'm running SunOS 4.0.1 on a Sun 4/260.  Got perl 3.0 (beta),
	ran configure, make depend, make, then make test.  On the
	latter it gives the message:

	sh: 24615 Memory fault - core dumped

	...

A followup article, <FLEE.89Sep7225002@shire.cs.psu.edu>, claimed,
incorrectly, that this was "actually a problem with SunOS 4.0 in
general", and cited an unrelated bug that caused the shell to drop core:

	Actually, a problem with SunOS 4.0 in general--I suspect dynamic
	libraries.  The following may or may not cause a coredump:
		(in csh)
		% set x=x
		% while (1)
		? sh -c 'a='$x'`pwd` echo -n .'		# note the quoting!
		? set x=x$x
		? end

	After a couple hundred dots, you may or may not get a series of
		sh: Memory fault - core dumped

In the first quoted article by me, <2431@auspex.auspex.com>, I was
responding to article <2256@hydra.Helsinki.FI>, in which the author
claimed that

	Probably not so.  I have perl 3.0-beta running on Sun 3's (under
	SunOS 4.0.3) but it (or actually sh) dumps core on my Sun 4/260
	(also SunOS 4.0.3) when I try 'make test'.  Either this is a
	specific Sun 4 problem or I have made some mistake while running
	Configure. 

The error message may have said "sh: ... - core dumped", but this
definitely does not mean that it was necessarily the shell that dumped
core (I just tried running "sh -c a.out" where "a.out" was a program
that called "abort()" and thus dumped core, and 1) the error message had
"sh:" in front of it and 2) it was "a.out", not the shell, that dumped
core).  When I ran "make test" on a 4/280 here running 4.0.1, "perl"
dropped core - NOT the shell.
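
[Editorial sketch] The experiment described above can be approximated in C,
assuming a POSIX system.  The function name report_aborting_child() and the
wording of the message are made up for this example.  The parent plays the
role of the shell: it is the one that prints the message, but it is the
child that was killed.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that calls abort(), then report its fate much as a shell
 * would.  Returns the signal that killed the child, 0 if the child was
 * not killed by a signal, or -1 on error. */
int report_aborting_child(void)
{
    int status;
    pid_t pid, done;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        abort();                /* the child dies here, not the parent */

    done = waitpid(pid, &status, 0);
    if (done < 0)
        return -1;
    assert(done == pid);        /* we only ever started one child */

    if (!WIFSIGNALED(status))
        return 0;

    /* The message carries the parent's name even though the parent is
     * alive and well; it merely reports what happened to the child. */
    fprintf(stderr, "parent: %ld killed by signal %d\n",
            (long)pid, WTERMSIG(status));
    return WTERMSIG(status);
}
```

Whether a core file is actually written depends on local resource limits, so
the sketch reports only the terminating signal.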

In the article to which you're replying, <2439@auspex.auspex.com>, "The
bug that causes '/bin/sh' to drop core" is the one described in
<FLEE.89Sep7225002@shire.cs.psu.edu>, the article mentioned above.

In short, in the first quoted article, I said that "/bin/sh" wasn't
dropping core when "make test" is run for "perl" on a Sun-4.  I didn't
say it wasn't dropping core when you ran the sequence described in
<FLEE.89Sep7225002@shire.cs.psu.edu>, but that's a different sequence
from the one that causes "perl" to drop core.

guy@auspex.auspex.com (Guy Harris) (09/14/89)

>Ah, but I _have_ managed to get /bin/sh to core dump with a segmentation
>fault under SunOS 4.0*.

So have I, but the problem is that people are confusing the bug that
causes *perl* to drop core with another bug that causes "sh" to drop
core.  If people want to discuss the latter bug, I suggest they:

	1) remove the "was: perl 3.0 dumps core" from the subject line,
	   since the bugs are almost certainly unrelated (the "perl" bug
	   is due to a problem with "vfork", and "/bin/sh" doesn't use
	   "vfork")

and

	2) take it to a different group, e.g. "comp.sys.sun", since Sun
	   hasn't decided to risk the wrath of AT&T's legal department
	   and post the sources to its version of the S5R3.1 Bourne
	   shell to any of the "{alt,comp}.sources.*" newsgroups.

In other words, remove said discussion from *this* thread.

tml@hemuli.atk.vtt.fi (Tor Lillqvist) (09/17/89)

Various people write about the perl test suite failing with a
memory fault and dumped core on Sun 4 machines, a few even
thinking that it is /bin/sh that dumps core.

The problem with perl 3.0 on SPARC machines probably lies in
malloc.c, where the overhead union needs more strict alignment.
Adding a double field in the union takes care of that.  At least
this was the case on the HP9000 Series 800 (HP-PA, aka Spectrum),
which also is a strict-alignment RISC machine.
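
[Editorial sketch] The union change described above can be illustrated as
follows.  The field names and helper functions are hypothetical and are not
copied from Perl's "malloc.c"; the sketch shows only the alignment technique
and makes no claim about which bug it does or does not fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator header, loosely in the style of a bucket-based
 * malloc.  Names are illustrative only. */
union overhead {
    union overhead *ov_next;    /* free-list link while the block is free */
    struct {
        unsigned char magic;    /* marks an allocated block */
        unsigned char index;    /* size bucket the block came from */
    } ovu;
    double strict;              /* never read: forces double alignment */
};

/* Probe used to measure the alignment the compiler gives a double. */
struct align_probe {
    char c;
    double d;
};

/* Alignment, in bytes, required for a double inside a struct. */
size_t double_alignment(void)
{
    return offsetof(struct align_probe, d);
}

/* Address of the user data that follows a block header. */
void *payload_of(union overhead *hdr)
{
    assert(hdr != NULL);
    return (void *)(hdr + 1);
}
```

Because the union contains a double, its size is a multiple of the double
alignment, so data placed directly after a properly aligned header is itself
aligned for a double.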
-- 
Tor Lillqvist
Technical Research Centre of Finland, Computing Services (VTT/ATK)
tml@hemuli.atk.vtt.fi [130.188.52.2]

guy@auspex.auspex.com (Guy Harris) (09/20/89)

 >Various people write about the perl test suite failing with a
 >memory fault and dumped core on Sun 4 machines, a few even
 >thinking that it is /bin/sh that dumps core.
 >
 >The problem with perl 3.0 on SPARC machines probably lies in
 >malloc.c, where the overhead union needs more strict alignment.
 >Adding a double field in the union takes care of that.  At least
 >this was the case on the HP9000 Series 800 (HP-PA, aka Spectrum),
 >which also is a strict-alignment RISC machine.

And again....

The problem with perl 3.0 beta on SPARC machines that causes it to drop
core in the perl test suite absolutely, positively, certainly lies in
"util.c", not in "malloc.c"; changing "popen" in "util.c" not to use
"vfork" fixes it in beta, and the change in 3.0 gamma that has it
include <vfork.h> on SPARC machines fixes it in gamma (although I think
I tried that in beta and it wasn't sufficient there).

A version compiled with the "malloc" distributed with "perl" worked in
both the beta and gamma versions on a SPARC machine.  There may be other
sequences that cause it to drop core on a SPARC machine due to alignment
problems in "malloc", but the one generated by "make test" on the Sun-4
here is *not* one of them.