bobm%pixel@uunet.UU.NET (12/30/90)
Help! I can't boot minix.
My hardware is 4 Mb of RAM, with parity, without no-slot-clock,
with the canonical Miniscribe disk. I'm attached to a Sun 3/50
via serial line.
I've been able to download Jordan's pi-digit program and run it
with no problems. I've been able to download minix and do many
interesting things with it, but not to boot it.
I'm trying to boot the original distribution of minix, as I haven't
seen any more full releases since then. Specifically, the EPROM says
this:
Version: Sat Jul 14 19:25:31 PDT 1990
And minix says this:
Minix 1.3 kernel version: Sat Jul 14 18:47:33 PDT 1990
When I follow the instructions for booting RAM disk, or for booting
from hard disk (I've got the miniroot loaded onto the disk), minix
seems to start up, then partway through the exec of /etc/init, it
blows up with a bpt instruction at address 0x40. Poking around leads
me to believe that idle_task() has called zero_page() with an address
in the 0xff000000's and that somehow the stack has been overscribbled.
Is this a hardware bug? It must be, since what I'm doing is identical
to what other people have done. But it is a subtle one: console and
SCSI interrupts are working, and I've used the debugger for a couple
of days now with no apparent problems.
I've appended a typescript of a typical session below. Oh, and reply
to bobm@convex.com, as convex's mail configuration is broken this week
and this message's return address won't work.
Note that 0xfd04 is the address of _dirty_pages, the head of the dirty
page list. It contains a number that definitely isn't a page address.
The word before, 0xfd00, has had the test pattern written into it.
That wasn't there when ram_size() finished running.
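The observation above can be checked by hand against the dump at 0xfd00 in the typescript. What follows is a minimal sketch, not anything taken from the Minix sources. It assumes the words are stored little-endian and that pages are 4 KB; the page size is an inference from the boot messages (0x8D000 plus 883 pages comes to exactly 4 Mb), and the helper names are hypothetical.
```python
# Sketch only: decode the first two words of the dump at 0xfd00.
PAGE_SIZE = 4096             # assumption: 4 KB pages (inferred, see lead-in)
RAM_SIZE = 4 * 1024 * 1024   # 4 Mb of RAM, as stated in the post


def le32(data):
    """Interpret four bytes as a little-endian 32-bit word."""
    return int.from_bytes(bytes(data), "little")


def looks_like_page_address(addr, page_size=PAGE_SIZE, ram_size=RAM_SIZE):
    """True if addr is page-aligned and lies inside installed RAM."""
    return addr % page_size == 0 and 0 <= addr < ram_size


# First eight bytes of the line "0000FD00" in the dump.
dump_fd00 = [0xA5, 0xA5, 0xA5, 0xA5, 0xFF, 0x1F, 0x1F, 0x5F]

test_word = le32(dump_fd00[0:4])     # word at 0xfd00
dirty_pages = le32(dump_fd00[4:8])   # word at 0xfd04 (_dirty_pages)

# Sanity check on the page-size assumption, using the boot messages.
assert 0x8D000 + 883 * PAGE_SIZE == RAM_SIZE

print(hex(test_word))                        # 0xa5a5a5a5
print(hex(dirty_pages))                      # 0x5f1f1fff
print(looks_like_page_address(dirty_pages))  # False
```
Under those assumptions the word at 0xfd04 is neither page-aligned nor inside 4 Mb of RAM, which matches the description of it as "definitely not a page address".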
Thanks for any insight you can offer.
K<bob>
------------------------------------------------------------------------------
Script started on Sun Dec 30 01:51:41 1990
jogger-egg> tip hardwire
connected
NS32000 ROM Debugger
Version: Sat Jul 14 19:25:31 PDT 1990
RAM free above 0x1554
Command (? for help): download 2000
~
[EOT]
jogger-egg> dnld root/sys/image
Length=50566 CRC=58777
jogger-egg> tip hardwire
connected
CRC ok, length = 50566
Command (? for help): ed 2024 d
2024(202): 100
2028(0):
Command (? for help): ed 2028 d
2028(0): d'360
202c(0):
Command (? for help): set bkpt0 7d0c
Command (? for help): dis r'bkpt0
7d0c enter [r3,r4],0
Command (? for help): run 2000
Minix 1.3 kernel version: Sat Jul 14 18:47:33 PDT 1990
Pages of user memory: 883
Start user pages: 0x8D000
Start of RAM disk: 0x32800
7d0c enter [r3,r4],0 (breakpoint)
Command (? for help): download 32800
~
[EOT]
jogger-egg> dnld disk1/min532
Length=368790 CRC=51953
51.120u 10.100s 6:26.46 15.8% 0+44k 49+1io 47pf+0w
jogger-egg> tip hardwire
connected
CRC ok, length = 368790
Command (? for help): run
Please enter date: MMDDYYhhmmss. Then hit RETURN.
40 bpt (breakpoint)
Command (? for help): show
r0=ff15e900 r1=ff15f9ff r2=0 r3=ff15e9ff
r4=0 r5=0 r6=0 r7=0
f0=0 f1=0 f2=0 f3=0
f4=0 f5=0 f6=0 f7=0
l1l=0 l1h=0 l3l=0 l3h=0
l5l=0 l5h=0 l7l=0 l7h=0
pc=40 usp=e470 isp=e254 fp=e470
sb=0 intbase=40 mod=20c6 psr=ipsuNzfvltc
dcr=0 dsr=0 car=0 bpc=0
cfg=bf7 ptb0=0 ptb1=3e8000 tear=7024
mcr=5 msr=af fsr=10000 .=7d0f
v1=0 v2=0 bkpt0=7d0c bkpt1=0
bkpt2=0 bkpt3=0 bkpt4=0 bkpt5=0
bkpt6=0 bkpt7=0 radix=d'16 debug=d'0
scrlen=d'24 scsi_adr=1 scsi_lun=0
Command (? for help): trace
0x3b1e
Command (? for help): disassemble 3af0 16
3af0 enter [r3],0
3af3 bsr -5863
3af8 movd 49676(pc),r3
3afe cmpqd 0,r3
3b00 beq 9
3b02 movd 0(r3),49666(pc)
3b09 movqd 1,tos
3b0b bsr -5867
3b10 adjspb -4
3b13 cmpqd 0,r3
3b15 beq 33
3b17 movd r3,tos
3b19 bsr 8609
3b1e bsr -5906
3b23 movd 49629(pc),0(r3)
3b2a movd r3,49622(pc)
3b30 adjspb -4
3b33 br -59
3b36 bsr -5893
3b3b br -72
3b3e exit [r3]
3b40 ret 0
Command (? for help): dump fd00 100
0000FD00 A5 A5 A5 A5 | FF 1F 1F 5F | FA FF FF FF | 00 00 00 00 ......._........
0000FD10 04 00 00 00 | FA FF FF FF | 44 00 00 00 | 04 00 00 00 ........D.......
0000FD20 1A FC FF FF | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD30 00 00 1C C7 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD40 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD50 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD60 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD70 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD80 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD90 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDA0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDB0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDC0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDD0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDE0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDF0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
Command (? for help): ~
[EOT]
jogger-egg> .
exit
script done on Sun Dec 30 02:10:34 1990
ANTSU@kontu.utu.fi (12/31/90)
Hi!
I think you should "set bkpt0 0" to get rid of the breakpoint (??)
I think there is a "bug" in Bruce's instructions in that part...
The breakpoint is set but not reset. I am not sure about this, but
I remember having had the same surprise myself...
Wishes,
Antti-Pekka
antsu@kontu.utu.fi
phil@Shiva.COM (Phil Budne) (12/31/90)
I used to see this exactly... I could get around it by setting a break in
the null process and single stepping it until the rc script was running.
I always chalked it up to the fact I was running the CPU at only 10 MHz
(never have had time to debug it), but the problem went away after I rebuilt
the kernel from source (a scary proposition given how hard it was to
boot up under the best circumstances)!
-Phil
........o
Philip Budne : o---+----o Shiva Corporation
FastPath Project Leader : o | 1 Cambridge Center
: Shiva | Cambridge, Ma 02142
Internet: phil@Shiva.COM : . | Tel (617) 252-6300
: o---o Fax (617) 252-6852
bobm%pixel@uunet.UU.NET (01/02/91)
Well, it was indeed a hardware problem. I managed to produce an
anomaly using the ROM monitor's "fill", "dump" and "crc" commands, and
quickly narrowed it down to "sometimes writes to address N modify
address N xor 0x100000." It turned out that signal A20 wasn't making
it to the page comparator, U36.
Now that's fixed, and Minix comes up partway. When I boot, I get this.
NS32000 ROM Debugger
Version: Sat Jul 14 19:25:31 PDT 1990
RAM free above 0x1554
Command (? for help): read 0 2000 80
Command (? for help): run 2000
Minix 1.3 kernel version: Sat Sep 29 14:08:32 1990
Pages of user memory: 973
Start user pages: 0x33000
Tue Nov 30 00:00:01 1999 (hmmm.. better reinstall the no-slot clock)
*****************************************************************
*                                                               *
*          W E L C O M E   T O   M I N I X   3 2 0 0 0          *
*                                                               *
*               (c) Copyright 1987 Prentice-Hall                *
*                                                               *
*****************************************************************
/dev/hd3 mounted
login:
But it doesn't listen to anything I type. Would anyone care to
confirm or deny that this is the first time the machine has tried to
do interrupt-driven input from the console? I'm guessing that the ICU
works, because I booted the hard drive, and that requires SCSI
interrupts. Either DUART 0 is scrozzled or there's a bad connection
on signal /DUAR0 (*).
Thanks to the several people who sent suggestions.
K<bob>
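The symptom reported in this message (a write to address N also modifying address N xor 0x100000) can be reproduced in a toy model. The sketch below is not the ROM monitor's "fill", "dump" or "crc" commands; it only imitates the fill-then-scan idea in software. The class and function names are hypothetical, and the model aliases on every write, whereas the real fault only showed up sometimes.
```python
# Sketch only: a toy RAM in which one address line is not decoded properly.
class FaultyRam:
    """RAM model where each write may also land at addr ^ alias_mask."""

    def __init__(self, size, alias_mask, faulty=True):
        self.size = size
        self.cells = bytearray(size)   # starts out zeroed
        self.alias_mask = alias_mask
        self.faulty = faulty

    def fill(self, value):
        """Set every cell to value (stands in for a monitor-style fill)."""
        self.cells[:] = bytes([value]) * self.size

    def write(self, addr, value):
        self.cells[addr] = value
        if self.faulty:
            # Model assumption: the aliased cell is hit on every write.
            self.cells[addr ^ self.alias_mask] = value

    def read(self, addr):
        return self.cells[addr]


def find_alias_masks(ram, probe=0x2000, background=0x00, pattern=0xA5):
    """Fill, write one probe byte, then scan for copies of it elsewhere.

    Returns the sorted list of (address xor probe) values for every other
    cell that picked up the pattern; an empty list means no aliasing seen.
    """
    ram.fill(background)
    ram.write(probe, pattern)
    return sorted(
        addr ^ probe
        for addr in range(ram.size)
        if addr != probe and ram.read(addr) == pattern
    )


bad = FaultyRam(size=0x200000, alias_mask=0x100000)
good = FaultyRam(size=0x200000, alias_mask=0x100000, faulty=False)

print([hex(m) for m in find_alias_masks(bad)])    # ['0x100000']
print([hex(m) for m in find_alias_masks(good)])   # []
```
A single set bit in the result points at one address line, here bit 20, which is the line the message identifies as A20.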
news@daver.bungi.com (01/12/91)
Hm, I didn't mean to send this or the other message to the mailing
list. Everyone ignore the message except the originator of this thread.
Anyhow, there is a missing step in Bruce's instructions, just after
"cross your fingers" but before typing "run", and that is:
set bkpt0 0
I had the same exact (bizarre) results until I realized that the
breakpoint had gotten hit again. (O.K. so you might want to type
the command before crossing your fingers)