[unix-pc.general] Large programs core dumping upon execution

sysop@comhex.UUCP (Joe E. Powell) (07/26/89)

Has anyone else ever noticed that very large (over 300K) files
sometimes tend to core dump when they are invoked?  They usually
work fine, but every now and again, the program will just refuse
to start up.  Is it just me or have other people had this happen?

I've noticed this occasionally on nethack and moria, but more
often with gcc (esp gcc 1.35).

I'm running 3.51a, with a 40 MB drive and 2.5 MB of RAM.

--
Joe E. Powell
unf7!comhex!sysop@bikini.cis.ufl.edu

lenny@icus.islp.ny.us (Lenny Tropiano) (07/27/89)

In article <211@comhex.UUCP> sysop@comhex.UUCP (Joe E. Powell) writes:
|>Has anyone else ever noticed that very large (over 300K) files
|>sometimes tend to core dump when they are invoked?  They usually
|>work fine, but every now and again, the program will just refuse
|>to start up.  Is it just me or have other people had this happen?
|>
|>I've noticed this occasionally on nethack and moria, but more
|>often with gcc (esp gcc 1.35).
|>
|>I'm running 3.51a, with a 40 MB drive and 2.5 MB of RAM.

I would check the output (if any) in your /usr/adm/unix.log file.  It
is possible you are getting NMI parity errors (if your program core dumps
with a "Memory Fault", that kinda sounds like the symptoms).  This usually
signifies bad memory (sorry to say) ... Run the memory diagnostics, although
they don't always find the problem.  I'd also suggest pulling the .5MB
of RAM out and seeing if the problem goes away ... (possibly the bad
memory is on the expansion board).

Good luck!

-Lenny
-- 
Lenny Tropiano             ICUS Software Systems         [w] +1 (516) 589-7930
lenny@icus.islp.ny.us      Telex: 154232428 ICUS         [h] +1 (516) 968-8576
{ames,talcott,decuac,hombre,pacbell,sbcs}!icus!lenny     attmail!icus!lenny
        ICUS Software Systems -- PO Box 1; Islip Terrace, NY  11752

jcm@mtunb.ATT.COM (was-John McMillan) (07/28/89)

In article <929@icus.islp.ny.us> lenny@icus.islp.ny.us (Lenny Tropiano) writes:
>In article <211@comhex.UUCP> sysop@comhex.UUCP (Joe E. Powell) writes:
>|>Has anyone else ever noticed that very large (over 300K) files
>|>sometimes tend to core dump when they are invoked?  They usually
>|>work fine, but every now and again, the program will just refuse
>|>to start up.  Is it just me or have other people had this happen?
>|>
>|>I've noticed this occasionally on nethack and moria, but more
>|>often with gcc (esp gcc 1.35).
>|>
>|>I'm running 3.51a, with a 40 MB drive and 2.5 MB of RAM.

Uhhhhh.  Finally, a chance to disagree with Lenny!?-)

Sounds to me like our once-a-week visit to SWAP land.

If you've exhausted SWAP space, it is presented to the program as
an ENOMEM error in some phase of forking/execing/malloc-ing/stack-extending.

And programs often presume there's lotza mem, so why check return values!

	***** NOTE WELL:  _MY_ code never does this ]8-)  *****

If so: run less, or allocate more.
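
For illustration -- a minimal C sketch (an editor's example, not jcm's
code) of the return-value checking he means.  When swap is exhausted,
malloc() returns NULL (and on most Unixes sets errno to ENOMEM) rather
than the memory just materializing:

	/*
	 * Sketch: check the return values that big programs often don't.
	 * Failing loudly here beats a mysterious fault later.
	 */
	#include <errno.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	int main(void)
	{
		char *buf;

		buf = malloc(300 * 1024L);	/* roughly "large program" sized */
		if (buf == NULL) {
			fprintf(stderr, "malloc failed: %s\n", strerror(errno));
			return 1;		/* fail loudly, don't fault later */
		}
		memset(buf, 0, 300 * 1024L);	/* actually touch the pages */
		free(buf);
		return 0;
	}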

john mcmillan	-- att!mtunb!jcm -- "What NEVER? ... Hardly EVER ..."
					Gilbert & Sullivan (Pinafore)

sysop@comhex.UUCP (Joe E. Powell) (07/28/89)

In article <929@icus.islp.ny.us>, lenny@icus.islp.ny.us (Lenny Tropiano) writes:
> In article <211@comhex.UUCP> sysop@comhex.UUCP (Joe E. Powell) writes:
> |>Has anyone else ever noticed that very large (over 300K) files
> |>sometimes tend to core dump when they are invoked?  They usually
> |>work fine, but every now and again, the program will just refuse
> |>to start up.  Is it just me or have other people had this happen?
> |>
> |>I've noticed this occasionally on nethack and moria, but more
> |>often with gcc (esp gcc 1.35).
> |>
> |>I'm running 3.51a, with a 40 MB drive and 2.5 MB of RAM.
> 
> I would check the output (if any) in your /usr/adm/unix.log file.  It 

I mail myself a copy of the unix.log file every night, so I know
nothing like that is showing up.

> is possible you are getting NMI Parity errors (if your program core dumps
> with a "Memory Fault" this kinda sounds like the symptoms)   This usually

Hmmm...but doesn't it say "Memory Fault" every time a program dumps core?
No matter what?

> signifies bad memory (sorry to say) ... Run memory diagnostics, although
> that doesn't always find the problem.   I'd also suggest pulling the .5MB
> of RAM out, and see if it goes away ... (possibly the bad memory is on
> the expansion board).

This happens on two different machines.  One with 1MB on the motherboard
w/1.5MB combo card, and another with 2MB on the motherboard w/.5MB ram 
card.

Let me clarify what happens:

$ nethack
Memory fault - core dumped
$ nethack
Memory fault - core dumped
$ nethack
Memory fault - core dumped
$ nethack

the nethack screen starts up.

Do you still think I'm having memory problems?

--
Joe E. Powell
unf7!comhex!sysop@bikini.cis.ufl.edu

res@cbnews.ATT.COM (Robert E. Stampfli) (07/30/89)

In article <211@comhex.UUCP> sysop@comhex.UUCP (Joe E. Powell) writes:
>Has anyone else ever noticed that very large (over 300K) files
>sometimes tend to core dump when they are invoked?  They usually
>work fine, but every now and again, the program will just refuse
>to start up.  Is it just me or have other people had this happen?
>
>I've noticed this occasionally on nethack and moria, but more
>often with gcc (esp gcc 1.35).
>
>I'm running 3.51a, with a 40 MB drive and 2.5 MB of RAM.
>
>--
>Joe E. Powell
>unf7!comhex!sysop@bikini.cis.ufl.edu

Yes, I have noticed this also with gcc 1.35.  

We recently added more memory to one of our machines (a 2-meg expansion
card), making our configuration remarkably similar to yours: 2.5 meg ram
(.5 on motherboard, 2.0 on expansion), 40 meg disk, 3.51 (not "a" revision).
We have a serial card, also.  Now, all of a sudden, I notice that
gcc-1.35 dumps core more than half of the time, but when it doesn't, it
works fine.  This is when I run it from the tty002 line connected to my
terminal.  Now the kicker:  When I run gcc from the console, it *always*
works.  My hypothesis up to this point, without looking at the problem
in detail, was that there is some interaction with the number of bytes of
exported environment variables placed on the stack -- perhaps a bug in gcc
that causes it to use more stack than was allocated.  That would explain it
working sometimes and not others, but after your posting I am less convinced
it is a gcc problem.
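
(To test that hypothesis, here is a minimal C sketch -- an editor's
illustration, not Rob's code -- that totals the bytes of exported
variables, which exec() copies onto the new process's stack.  Running it
from the console and from a tty session would show whether the two
environments really differ in size:

	#include <stdio.h>
	#include <string.h>

	extern char **environ;

	int main(void)
	{
		char **ep;
		unsigned long total = 0;
		int count = 0;

		for (ep = environ; *ep != NULL; ep++) {
			total += strlen(*ep) + 1;	/* +1 for the trailing NUL */
			count++;
		}
		printf("%d exported variables, %lu bytes\n", count, total);
		return 0;
	}
)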

I am curious: Do you run gcc from the console or a tty line?  If so, which
tty?  (Before the upgrade, my tty was on tty001, and it worked fine from
there, although I have not had the opportunity to recable and try it since.)
Also, is your machine a .5/2.0 configuration?  BTW, the memory card is an
upgraded .5 meg card which was run thru numerous passes of the diagnostics
without a glitch.  I forget the signal number, but gcc dies with a
segmentation fault, which is unlikely to be due to a hardware problem.

Rob Stampfli
att!cbnews!res (work)
osu-cis!n8emr!kd8wk!res (home)

bob@rush.cts.com (Bob Ames) (08/03/89)

In article <211@comhex.UUCP> sysop@comhex.UUCP (Joe E. Powell) writes:
>Has anyone else ever noticed that very large (over 300K) files
>sometimes tend to core dump when they are invoked?  They usually
>work fine, but every now and again, the program will just refuse
>to start up.  Is it just me or have other people had this happen?
>
>I've noticed this occasionally on nethack and moria, but more
>often with gcc (esp gcc 1.35).
>
>I'm running 3.51a, with a 40 MB drive and 2.5 MB of RAM.

I don't have gcc, but I have the same thing happen on moria.  I have
never seen nethack2.2 have a problem.

On a slightly related subject, I've been trying, for about 1 YEAR, to get
a version of nethack higher than 2.2.  I've never seen kitchen
sinks...

Does *anybody* have a working 2.3x version that wouldn't be too hard
to port to the unix-pc?  I can now FTP, I think, if provided exact
instructions after the 'ftp site.name.xxf.fgr' command...  Or I could call
you...  Or I could send disks with SASE...  Or I could ....  Or I could ....

Also, what's happened to killer?  I finally got a DOS card for this
beast and now can't get to killer to get ibm-pc archives |-(

Sorry for all these subjects; I only get unix-pc.all currently.

Bob Ames

Bob Ames   The National Organization for  the Reform of Marijuana Laws, NORML 
"Pot is the world's best source of complete protein, alcohol fuel, and paper,
is the best fire de-erosion seed, and is america's largest cash crop." - USDA
bob@rush.cts.com or ncr-sd!rush!bob@nosc.mil  or rutgers!ucsd!ncr-sd!rush!bob
619-741-UN2X "We each pay a fabulous price for our visions of paradise," Rush

jcm@mtunb.ATT.COM (was-John McMillan) (08/03/89)

In article <211@comhex.UUCP> sysop@comhex.UUCP (Joe E. Powell) writes:
>Has anyone else ever noticed that very large (over 300K) files
>sometimes tend to core dump when they are invoked?  They usually
>work fine, but every now and again, the program will just refuse
>to start up.  Is it just me or have other people had this happen?

A final (HAhahaha...) muttering from me on this:

1) While I posted (or E-mailed) earlier that this sounds like classic
	outta-SWAP trouble, another possibility occurred to me.

2) Until approx. the 3.51C kernel, there was an error in the
	'getcontext()' code.  Despite mis-leading COMMENTS, the code
	failed to properly change context between two processes.

	Specifically, if the OUTGOING process had SHARED MEMORY [SHM],
	it was left mapped-in in the 'MMU'.  This caused gross pain
	when that SHM was at a low enough address that another
	process attempted to use the same Virtual Address [VA] space.

	Since the 'MMU' indicated the page was PRESENT, the new
	process didn't fault-in its own page on 1st access -- i.e.,
	at start-up, usually.  Since it's typically difficult to
	execute another program's shared data, death by illegal
	instruction was common if the new program had a LARGE TEXT
	image.

	The error remained as long as it did because:
	a) the coincidence of low-VA SHM & concurrent large TEXT process
		is rare;
	b) the kernel code was 'correct' and the comments were
		mis-leading: the CONCEPT was flawed because the
		stated process was not 'in-context' when the code
		was executing;
	c) the entire, creaking/ancient VM base for the 3B1 -- based
		on some Berkeley model -- is obscure and barely
		even patchable due to its anomalies.

In brief, there are SOME cases where program start-up failures may
	reflect the above problem.  I presume IPCS(1) would indicate
	if you've SHM in use, but WHERE it's mapped is another Q.
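
	As a rough probe of WHERE -- a minimal C sketch (an editor's
	illustration, not jcm's code) that creates a scratch segment,
	lets the kernel pick the attach address, and prints it, giving
	some idea whether SHM on your machine lands low enough in the
	virtual address space to collide with a large-text program:

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/ipc.h>
	#include <sys/shm.h>

	int main(void)
	{
		int id;
		char *addr;

		id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
		if (id < 0) {
			perror("shmget");
			return 1;
		}
		addr = (char *) shmat(id, NULL, 0);	/* kernel picks the address */
		if (addr == (char *) -1) {
			perror("shmat");
		} else {
			printf("segment %d attached at %p\n", id, (void *) addr);
			shmdt(addr);
		}
		shmctl(id, IPC_RMID, NULL);		/* remove the scratch segment */
		return 0;
	}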

john mcmillan	-- att!mtunb!jcm

wolfer@cbnewse.ATT.COM (paul.d.wolfson) (08/05/89)

In article <211@comhex.UUCP>, sysop@comhex.UUCP (Joe E. Powell) writes:
> Has anyone else ever noticed that very large (over 300K) files
> sometimes tend to core dump when they are invoked?  They usually
> work fine, but every now and again, the program will just refuse
> to start up.  Is it just me or have other people had this happen?
> 
> I've noticed this occasionally on nethack and moria, but more
> often with gcc (esp gcc 1.35).
> 
> I'm running 3.51a, with a 40 MB drive and 2.5 MB of RAM.
> 
> --
> Joe E. Powell
> unf7!comhex!sysop@bikini.cis.ufl.edu

I've been running umoria (4.85, all patches applied) for about two
years.  I have just the basic system (20M HD, 3.5, 1Meg RAM).
Umoria has a few panic save routines to prevent you from losing
your character, but I haven't tried the latest version of Nethack.
With umoria, the few times it crashed (very few) I tracked the problem
down to overindexing arrays, of all things.  Some larger Unix machines
seem to let you get away with this to a certain degree, but not the
unixpc.  Umoria is compiled with the -g option in the makefile, so just
run sdb to find where in the code it's blowing up.  With Nethack, check
to see if the -g option is used, and check the code for the overindexing
problem.  I have found no indications of core dumps being caused by the
unixpc itself, for any of the large games I've run.
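
(A minimal C sketch of the overindexing bug described above -- an
editor's illustration, with made-up names.  The <= comparison writes one
element past the end of the array; whether that crashes depends on what
happens to sit past the array, which is why forgiving machines tolerate
it and the unixpc may fault:

	#include <stdio.h>

	#define NMONSTERS 10

	int hitpoints[NMONSTERS];

	int main(void)
	{
		int i;

		for (i = 0; i <= NMONSTERS; i++)	/* BUG: should be i < NMONSTERS */
			hitpoints[i] = 0;
		printf("survived (this time)\n");
		return 0;
	}
)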

Also, it's always a good idea to run lint on these big games.  They
are usually written for and run on the big mainframes on college campuses.
Most of them are written for BSD with hooks for SYSV.  I'm not sure where
the unixpc fits into these unix versions.  It seems to be a mongrel
composed of a little of both.  Anyway, I've found some very strange things
in the lint output for some of these games, and lint output may give you
some leads as to where to search for your core dump problems.

________________________________________________________________________
P. Wolfson

student@unf7.UUCP (student account) (08/07/89)

In article <1586@mtunb.ATT.COM>, jcm@mtunb.ATT.COM (was-John McMillan) writes:
>> In article <211@comhex.UUCP> I write:
>> [ description about large programs dumping core on startup -- deleted ]
> [ description of shared memory page conflict deleted ]

That was it!  I took out all the programs that were using shared memory
and everything is working fine now.

You say the problem has been fixed?  I assume we'll see the fixes in the
upcoming fix disk?

--
Joe E. Powell
unf7!comhex!sysop@bikini.cis.ufl.edu

jcm@mtunb.ATT.COM (was-John McMillan) (08/08/89)

In article <212@unf7.UUCP> unf7!comhex!sysop@bikini.cis.ufl.edu writes:
>
>That was it!  I took out all the programs that were using shared memory
>and everything is working fine now.
>
>You say the problem has been fixed?  I assume we'll see the fixes in the
>upcoming fix disk?

	I say: the problem is fixed in the currently available fix-disk --
	to the best of MY recollection.  Regardless, the fixed source
	has been submitted & accepted, so far as I know.  If this is
	a problem, and you will accept a *3.51* experimental kernel,
	submit an E-mail address to moi:	^^^^^^^^^^^^

		att!mtunb!jcm

john mcmillan	-- as above -- Growling on forever....