[comp.unix.questions] Help-Bus Errors

bogatko@lzga.ATT.COM (George Bogatko) (02/09/90)

Help.  We have a program that we do not have source for that is dumping
core with Bus Error.  Does anybody have, or can anybody point me to, a
list of what causes the major core dumps, e.g. Bus, EMT, Memory Fault, etc.?

Specifically, what kinds of C errors will cause a Bus error as opposed to
an EMT trap or a Memory Fault?

I don't read this group often, so email will be faster.

Please, no flames.

Yours, in anticipation

George Bogatko.

aryeh@eddie.mit.edu (Aryeh M. Weiss) (02/11/90)

In article <1810@lzga.ATT.COM> bogatko@lzga.ATT.COM (George Bogatko) writes:
>Help.  We have a program that we do not have source for that is dumping
>core with Bus Error.  Does anybody have, or can anybody point me to, a
>list of what causes the major core dumps, e.g. Bus, EMT, Memory Fault, etc.?

A program dumps core when it receives a signal that is not being caught or
ignored and whose default action is to dump core.  (Signals that cause core
dumps are (SIG) QUIT, ILL, TRAP, IOT, EMT, FPE, BUS, SEGV, and SYS.)  Any of
these signals can be sent to a process via kill(2S).  SIGQUIT is usually
caused by the quit key (^\).  SIGILL is caused by execution of an illegal
instruction (this may indicate a trashed stack causing a procedure to return
to a random location in the code).  SIGTRAP, IOT, and EMT are caused by
executing special machine instructions.  The names are throwbacks to the
PDP-11 days and come from instructions in the PDP-11 instruction set.  These
are obviously machine dependent, but seem to have equivalents on various
popular hardware platforms (Vaxes, 68000, 80x86).  Trap instructions are used
by debuggers to set breakpoints in the code of a process being traced, but I
don't know how they interact with SIGTRAP when being used for this purpose.
SIGFPE is caused by floating point errors, such as divide by 0, overflow, and
(on Intel x86/x87 systems) FPU stack overflows (Xenix 386 users may be
familiar with this last one).

Now the tricky ones: SIGSEGV is caused when a process addresses a location
outside of its (code or data) address space.  This is typically caused by
overrunning an array, incrementing (and dereferencing) a pointer beyond the
end of process memory, and, most familiar to all programmers of non-Vax
Unix systems, dereferencing the dreaded NULL pointer.  SIGBUS errors are
quite machine dependent, but in my experience can be caused by two
circumstances: (1) reference to an impossible machine address (this would
occur on 68000 systems if you went beyond address 2^24 and may occur on
386/286 systems if you load a segment register with an absurd segment
number) and (2) reference to an odd address with a word-oriented instruction
(this is a no-no on Vaxes and 68000's, but 80x86 systems don't mind).
SIGSYS is for bad arguments to a system call, but this has never happened
to me and I do not know how bad the argument has to be.  Illegal addresses
passed to system calls generally get returned to the calling process with
an error code, so I don't know how exactly to get one of those (this may be
another throwback to the olden days of yore).

>Please, no flames.

This question certainly comes under the heading of things your mother
(and the manuals) never told you.


hue@netcom.UUCP (Jonathan Hue) (02/11/90)

In article <1990Feb10.192028.16025@eddie.mit.edu> aryeh@eddie.MIT.EDU (Aryeh M. Weiss) writes:
>Unix systems, dereferencing the dreaded NULL pointer.  SIGBUS errors are
>quite machine dependent, but in my experience can be caused by two
>circumstances

Another thing that can cause bus errors is accessing an address which
is a valid address within your process, but the thing at that location
doesn't respond to a read or write.  An example of this would be a
frame buffer that you mapped into your process's address space, but which
was flaky and for some reason didn't generate DTACKs (the 68000's data
transfer acknowledge signal).

-Jonathan

meissner@osf.org (Michael Meissner) (02/13/90)

In article <1990Feb10.192028.16025@eddie.mit.edu> aryeh@eddie.mit.edu
(Aryeh M. Weiss) writes:

	...

| SIGSYS is for bad arguments to a system call, but this has never happened
| to me and I do not know how bad the argument has to be.  Illegal addresses
| passed to system calls generally get returned to the calling process with
| an error code, so I don't know how exactly to get one of those (this may be
| another throwback to the olden days of yore).

When I was at Data General, we once grep'ed the current version of
System V that we had at the time (probably V.1), and the only place
that ever generated SIGSYS was if you passed something other than 0,
1, or 2 as the whence argument to lseek.  Given that Version 6 PDP-11
UNIX had only a 'seek' call, which took 3, 4, or 5 in addition to
lseek's values (to multiply the offset by 512), it may be that SIGSYS
was a portability guide that has long since become unneeded.

--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so

dyer@spdcc.COM (Steve Dyer) (02/13/90)

In article <MEISSNER.90Feb12175641@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
>When I was at Data General, we once grep'ed the current version of
>System V that we had at the time (probably V.1), and the only place
>that ever generated SIGSYS was if you passed something other than 0,
>1, or 2 as the whence argument to lseek.  Given that Version 6 PDP-11
>UNIX had only a 'seek' call, which took 3, 4, or 5 in addition to
>lseek's values (to multiply the offset by 512), it may be that SIGSYS
>was a portability guide that has long since become unneeded.

Another instance in which SIGSYS was raised was the INDIR system call
on PDP-11s.  The read and write system calls had an inline calling sequence
like this:

mov	fd, r0	/ fildes in R0
sys	READ	/ sys is the PDP-11 trap instruction, READ the syscall index 
.word	bufaddr
.word	count
/next instruction...

similarly for write.

You can see that this doesn't lend itself easily to C language calls
like read(fd, bufaddr, count), especially for pure text programs.
INDIR was used to implement the system call libraries, accommodating
"pure text" programs which could not modify inline system call arguments.

.text
read:	...
/ get fd from stack, place in R0
/ move bufaddr and count to dataarea[1] and dataarea[2]
sys	INDIR	/ INDIR == 0
.word	dataarea
/next instruction

.data
dataarea:	sys	READ
	.word 0
	.word 0

If dataarea[0] wasn't a trap instruction, you'd get a SIGSYS.


-- 
Steve Dyer
dyer@ursa-major.spdcc.com aka {ima,harvard,rayssd,linus,m2c}!spdcc!dyer
dyer@arktouros.mit.edu, dyer@hstbme.mit.edu

bogatko@lzga.ATT.COM (George Bogatko) (02/14/90)

Now, SIGSYS I know about.  I got it when I tried to execute code that
used message queues on a 3B400 on which the IPC package had not been loaded.

Horrible death.

GB

tim@ohday.sybase.com (Tim Wood) (02/15/90)

In article <1990Feb10.192028.16025@eddie.mit.edu> aryeh@eddie.MIT.EDU (Aryeh M. Weiss) writes:
>SIGBUS errors are
>quite machine dependent, but in my experience can be caused by ...
>[referencing] an odd address with a word oriented instruction
>(this is a no-no on Vaxes and 68000's, but 80x86 systems don't mind).
		     ^^^^^
The VAX does not have alignment restrictions; that is, one may read or
write a multi-byte operand at any byte address.  Doing this incurs some
performance penalty on the VAX, as well as making your program less
portable.  The trend these days, especially with RISC, seems to be toward
alignment restriction.  Nice explanation of coredump signals, BTW.
-TW
---
Sybase, Inc. / 6475 Christie Ave. / Emeryville, CA / 94608	  415-596-3500
tim@sybase.com          {pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
		This message is solely my personal opinion.
		It is not a representation of Sybase, Inc.  OK.

chris@mimsy.umd.edu (Chris Torek) (02/18/90)

In article <1990Feb10.192028.16025@eddie.mit.edu> aryeh@eddie.mit.edu
(Aryeh M. Weiss) writes:
[lots of good stuff]
>SIGBUS errors are quite machine dependent ... [but include, e.g.,]
>reference an odd address with a word oriented instruction ... on Vaxes
>and 68000's ....

As someone else already mentioned, VAXen do not care about address
alignment except for speed (aligned operands are somewhat faster).
68000 and 68010 CPUs do; 68020 and 68030 CPUs do not; many RISC chips
do.  On the VAX, a SIGBUS (bus error) is caused by exactly one condition:
an address in the range 0x80000000..0xffffffff.  (Half of these are legal
kernel space addresses; from the kernel, the fault occurs for addresses
in 0xc0000000..0xffffffff.  Bus timeouts appear as machine checks rather
than faults.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

aryeh@eddie.mit.edu (Aryeh M. Weiss) (02/19/90)

In article <22598@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>In article <1990Feb10.192028.16025@eddie.mit.edu> aryeh@eddie.mit.edu
>(Aryeh M. Weiss) writes:
>>reference an odd address with a word oriented instruction ... on Vaxes
>>and 68000's ....
>
>As someone else already mentioned, VAXen do not care about address
>alignment except for speed (aligned operands are somewhat faster).

Sorry, my mistake.  I was recalling some of this from some old experiences.
I have always seen word-aligned Vax code, so I just assumed ...

jmm@eci386.uucp (John Macdonald) (02/20/90)

Wow, a chance to pick nits on Chris Torek...

In article <22598@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
| In article <1990Feb10.192028.16025@eddie.mit.edu> aryeh@eddie.mit.edu
| (Aryeh M. Weiss) writes:
| [lots of good stuff]
| >SIGBUS errors are quite machine dependent ... [but include, e.g.,]
| >reference an odd address with a word oriented instruction ... on Vaxes
| >and 68000's ....
| 
| As someone else already mentioned, VAXen do not care about address
| alignment except for speed (aligned operands are somewhat faster).
| 68000 and 68010 CPUs do; 68020 and 68030 CPUs do not;  [ ... ]
                           ^^^^^

On the 68020, only data references can be unaligned (and slow); code words
must have even alignment or the fetch fails.  I would guess that the 68030
is the same, but I've never checked.
-- 
Algol 60 was an improvement on most          | John Macdonald
of its successors - C.A.R. Hoare             |   jmm@eci386