[comp.os.minix] Receiving General Protection panic from 386 Minix

ghelmer@dsuvax.uucp (Guy Helmer) (09/22/90)

I just built the 386 version of MINIX, and I've been searching for
the source of this error for several hours now.  I receive a
general protection from process number 1, pc = 0x0007:0x00000385
and the friendly "Kernel panic: exception in kernel, mm, or fs"
message immediately after pressing the '=' key at the boot menu.
I'm trying to run this without shoelace or db.

I've tried this on two very different 386 boxes with identical
results from both, so I must have done something wrong while
building the system.  I've re-built the kernel several times,
as well as the various tools (build, init, bootblok).  bootblok
has the patches to copy itself very high in memory before loading
the rest of the o/s, so I don't believe it's related to the tools.

For the gurus that have built 386 kernels that work, am I right
in believing that the code segment of the above address (0x0007)
is the segment descriptor for the BIOS code segment that the
BIOS uses in INT 0x15 function 0x89?  Right now I think perhaps
a GDT entry isn't being set up correctly, but that's an
uneducated guess :-(

Thanks for any help!
-- 
Guy Helmer
work: DSU Computing Services, Business & Education Institute    (605) 256-5315
play: MidIX System Support Services                             (605) 256-2788
helmer@sdnet.bitnet, ghelmer@dsuvax.uucp, uunet!loft386!dsuvax!ghelmer

awb@almond.ed.ac.uk (Alan W Black) (09/23/90)

In article <1990Sep22.055445.15470@dsuvax.uucp> ghelmer@dsuvax.uucp (Guy Helmer) writes:
>I just built the 386 version of MINIX, and I've been searching for
>the source of this error for several hours now.  I receive a
>general protection from process number 1, pc = 0x0007:0x00000385
>and the friendly "Kernel panic: exception in kernel, mm, or fs"
>message immediately after pressing the '=' key at the boot menu.
>I'm trying to run this without shoelace or db.
>
>  [details deleted] 
>
>Guy Helmer
>work: DSU Computing Services, Business & Education Institute    (605) 256-5315
>play: MidIX System Support Services                             (605) 256-2788
>helmer@sdnet.bitnet, ghelmer@dsuvax.uucp, uunet!loft386!dsuvax!ghelmer

I recently did this on a 386 machine.  After a lot of searching and
debugging we discovered that the system corrupts itself when booting. 
Basically we had two problems: one of our own making, and one caused by
a bug in bootblok.s (as distributed -- I got mine from plains.nodak.edu).
If the size of the kernel is one sector bigger than the
number of tracks being loaded, it misses the last sector. 

The fix is something like (sorry I don't have the actual bootblok.s
here) at the end of the loop where it is loading sectors

        mov     ax,disksec      | see if we are done loading
        cmp     ax,final        | ditto
        jb      load            | jump if there is more to load

You should (I believe) decrement disksec before doing the test.  I
can't remember if this is exactly right, but the bug is definitely in
this area, and after we changed this it worked.

Now it happens that if you build a kernel with the small number of
disk buffers (in minix/config.h), i.e. not the INTEL_32BITS default
but the old default, it will work without the above change.

The other problem (which I actually think is the one you are
hitting first) is that there were no instructions on how to build
the default db (x386_1.1/tools/db.s).  I originally simply bcc'd it,
but it seems that you should build the .o file and then
ld it on its own without the library.

(thanks to Richard Tobin for doing most of this debugging)

Hope this doesn't just confuse the issue

Alan

Alan W Black                          80 South Bridge, Edinburgh, UK
Dept of Artificial Intelligence       tel: (+44) -31 225 7774 x228 or x223
University of Edinburgh               email: awb@ed.ac.uk

ghelmer@dsuvax.uucp (Guy Helmer) (09/26/90)

In <1990Sep22.055445.15470@dsuvax.uucp> ghelmer@dsuvax.uucp (Guy Helmer) writes:
>I just built the 386 version of MINIX, and I've been searching for
>the source of this error for several hours now.  I receive a
>general protection from process number 1, pc = 0x0007:0x00000385
>and the friendly "Kernel panic: exception in kernel, mm, or fs"
>message immediately after pressing the '=' key at the boot menu.
>I'm trying to run this without shoelace or db.

I quit trying to run without db.  Last night I tried to debug 386 Minix
for a few hours and only decided that the general protection exception
was happening after the TTY proc was initialized but before MM was
initialized.  I didn't have time to track it down farther.

This morning, though, I rebuilt everything after changing the number
of buffers in the cache to 30, and 386 Minix started without any complaints.
I'd like to find the source of the trouble when using large numbers
of buffers and the plain Minix bootstrap.

I guess I'll have to figure out how to use shoelace to boot Minix
so I can have lots of cache until I get this problem figured out.

>Thanks for any help!

Thanks to everyone who responded!
-- 
Guy Helmer
work: DSU Computing Services, Business & Education Institute    (605) 256-5315
play: MidIX System Support Services                             (605) 256-2788
helmer@sdnet.bitnet, ghelmer@dsuvax.uucp, uunet!loft386!dsuvax!ghelmer

wkt@csadfa.cs.adfa.oz.au (09/27/90)

In article <1990Sep25.221053.23430@dsuvax.uucp>, Guy Helmer writes:
>This morning, though, I rebuilt everything after changing the number
>of buffers in the cache to 30, and 386 Minix started without any complaints.
>I'd like to find the source of the trouble when using large numbers
>of buffers and the plain Minix bootstrap.

Bruce Evans wrote to me a few weeks ago explaining the problem. It seems
that when build is compiled as a 16-bit binary it can't cope with sizes
bigger than 64K. Specifying a buffer size bigger than this gives the
problem you've seen.

Solution: Build a '386 kernel with 30 buffers. Use it to make a 32-bit
build, and use _it_ to build a kernel with, say, 300 buffers. This works!

BTW, you get an awfully large image, around 500K, with this. This is mostly
empty space. Has anybody thought of a method of removing the empty
(uninitialised data) part of the image, and to create this at boot time?!

Cheers,
	Warren Toomey	wkt@csadfa.cs.adfa.oz.au

evans@syd.dit.CSIRO.AU (Bruce.Evans) (10/03/90)

In article <1990Sep22.055445.15470@dsuvax.uucp> ghelmer@dsuvax.uucp (Guy Helmer) writes:
>I just built the 386 version of MINIX, and I've been searching for
>the source of this error for several hours now.  I receive a
>general protection from process number 1, pc = 0x0007:0x00000385
>and the friendly "Kernel panic: exception in kernel, mm, or fs"
>message immediately after pressing the '=' key at the boot menu.
>I'm trying to run this without shoelace or db.

Warren Toomey has already answered this. This reply has been delayed
by news timewarp.

The problem is that the 16-bit "build" messes up the 32-bit image with
the default number of buffers, because it uses unsigned to hold various
sizes, and silently truncates the long bss size in fs's exec header.
The 32-bit build seems to work OK but I still recommend shoelace.
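
To illustrate the truncation (a sketch only, not code taken from build):
the helper names below are hypothetical, and uint16_t stands in for
"unsigned" as a 16-bit compiler sees it. The buffer arithmetic in the
note afterwards assumes 1K buffers and ignores the rest of fs's bss.

```c
#include <stdint.h>

/*
 * Hypothetical helper: what a 32-bit size becomes when it is stored in
 * a 16-bit unsigned variable, as happens in a build compiled for 16 bits.
 * Only the low 16 bits survive; no warning or error is produced.
 */
uint16_t size_seen_by_16bit_build(uint32_t bss_size)
{
    return (uint16_t)bss_size;
}

/* Hypothetical helper: how many bytes of bss silently disappear. */
uint32_t bss_bytes_lost(uint32_t bss_size)
{
    return bss_size - size_seen_by_16bit_build(bss_size);
}
```

With 30 buffers (30 * 1024 = 30720 bytes) nothing is lost, which fits the
report that a 30-buffer kernel boots. With 300 buffers (307200 = 0x4B000
bytes) only 0xB000 = 45056 bytes are recorded, so 262144 bytes go missing.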

The other main problem with building Minix-386 (again using the new
build) was that the makefile didn't cover "db". Add this:
---
AS86		=as -0 -a
LD86		=ld -0

/etc/db: db.s
	$(AS86) -o db.o db.s
	$(LD86) -o /etc/db db.o
	rm db.o
---

>For the gurus that have built 386 kernels that work, am I right
>in believing that the code segment of the above address (0x0007)
>is the segment descriptor for the BIOS code segment that the

No. 0x0007 = 0000 0000 0000 0 1 11 binary.
             ssss ssss ssss s t pp
Where 's' = descriptor table index (= 0), 't' = LDT/GDT table selector
(= 1 for LDT), 'p' = privilege level (= 3 for servers and users/0).
The index is never 0 for GDT entries so we can tell this is an LDT
selector without looking at the table selector bit. For servers and
users, the CS selector is always 0x0007 and the DS selector is always
0x000F so very little can be decided from the selector alone.
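
The same decoding can be written as a small C sketch. The struct and
function names are made up for illustration; only the bit layout
(13-bit index, 1 table bit, 2 privilege bits) comes from the 386
selector format described above.

```c
#include <stdint.h>

/* Fields of a 386 protected-mode segment selector. */
struct selector_fields {
    unsigned index;  /* bits 15..3: index into the descriptor table */
    unsigned ti;     /* bit 2: table indicator, 0 = GDT, 1 = LDT */
    unsigned rpl;    /* bits 1..0: requested privilege level */
};

/* Hypothetical helper: split a 16-bit selector into its three fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    return f;
}
```

For 0x0007 this gives index 0, LDT, privilege level 3; for 0x000F it
gives index 1, LDT, privilege level 3.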

The next step in debugging is to look at the process number:

>general protection from process number 1, pc = 0x0007:0x00000385

Assuming the kernel itself is working, this says that the faulty
task is FS and something bad happens at address 0x385 in the code
segment. The next step is to disassemble around address 0x385 using
db after stopping the boot at a convenient breakpoint, or using mdb 
on the fs binary on a working system. Either the instruction or the
data it references could be wrong. In this case I think the trap
is caused by a data reference beyond the end of the data segment.
Build has caused the data segment to be too small.
-- 
Bruce Evans		evans@syd.dit.csiro.au