bobm%pixel@uunet.UU.NET (12/30/90)
Help! I can't boot minix.

My hardware is 4 Mb of RAM, with parity, without no-slot-clock, with the canonical Miniscribe disk. I'm attached to a Sun 3/50 via serial line. I've been able to download Jordan's pi-digit program and run it with no problems. I've been able to download minix and do many interesting things with it, but not to boot it.

I'm trying to boot the original distribution of minix, as I haven't seen any more full releases since then. Specifically, the EPROM says this:

Version: Sat Jul 14 19:25:31 PDT 1990

And minix says this:

Minix 1.3 kernel version: Sat Jul 14 18:47:33 PDT 1990

When I follow the instructions for booting RAM disk, or for booting from hard disk (I've got the miniroot loaded onto the disk), minix seems to start up, then partway through the exec of /etc/init, it blows up with a bpt instruction at address 0x40. Poking around leads me to believe that idle_task() has called zero_page() with an address in the 0xff000000's and that somehow the stack has been overscribbled.

Is this a hardware bug? It must be, since what I'm doing is identical to what other people have done. This is a subtle hardware bug. Console and SCSI interrupts are working. I've used the debugger for a couple of days now with no apparent problems.

I've appended a typescript of a typical session below. Oh, and reply to bobm@convex.com, as convex's mail configuration is broken this week and this message's return address won't work.

Note that 0xfd04 is the address of _dirty_pages, the head of the dirty page list. It contains a number that definitely isn't a page address. The word before, 0xfd00, has had the test pattern written into it. That wasn't there when ram_size() finished running.

Thanks for any insight you can offer.
K<bob>
------------------------------------------------------------------------------
Script started on Sun Dec 30 01:51:41 1990
jogger-egg> tip hardwire
connected
NS32000 ROM Debugger
Version: Sat Jul 14 19:25:31 PDT 1990
RAM free above 0x1554
Command (? for help): download 2000
~ [EOT]
jogger-egg> dnld root/sys/image
Length=50566 CRC=58777
jogger-egg> tip hardwire
connected
CRC ok, length = 50566
Command (? for help): ed 2024 d
2024(202): 100
2028(0):
Command (? for help): ed 2028 d
2028(0): d'360
202c(0):
Command (? for help): set bkpt0 7d0c
Command (? for help): dis r'bkpt0
7d0c enter [r3,r4],0
Command (? for help): run 2000
Minix 1.3 kernel version: Sat Jul 14 18:47:33 PDT 1990
Pages of user memory: 883
Start user pages: 0x8D000
Start of RAM disk: 0x32800
7d0c enter [r3,r4],0 (breakpoint)
Command (? for help): download 32800
~ [EOT]
jogger-egg> dnld disk1/min532
Length=368790 CRC=51953
51.120u 10.100s 6:26.46 15.8% 0+44k 49+1io 47pf+0w
jogger-egg> tip hardwire
connected
CRC ok, length = 368790
Command (? for help): run
Please enter date: MMDDYYhhmmss. Then hit RETURN.
40 bpt (breakpoint)
Command (? for help): show
r0=ff15e900 r1=ff15f9ff r2=0 r3=ff15e9ff r4=0 r5=0 r6=0 r7=0
f0=0 f1=0 f2=0 f3=0 f4=0 f5=0 f6=0 f7=0
l1l=0 l1h=0 l3l=0 l3h=0 l5l=0 l5h=0 l7l=0 l7h=0
pc=40 usp=e470 isp=e254 fp=e470 sb=0 intbase=40 mod=20c6 psr=ipsuNzfvltc
dcr=0 dsr=0 car=0 bpc=0 cfg=bf7
ptb0=0 ptb1=3e8000 tear=7024 mcr=5 msr=af fsr=10000
.=7d0f v1=0 v2=0
bkpt0=7d0c bkpt1=0 bkpt2=0 bkpt3=0 bkpt4=0 bkpt5=0 bkpt6=0 bkpt7=0
radix=d'16 debug=d'0 scrlen=d'24 scsi_adr=1 scsi_lun=0
Command (? for help): trace 0x3b1e
Command (? for help): disassemble 3af0 16
3af0 enter [r3],0
3af3 bsr -5863
3af8 movd 49676(pc),r3
3afe cmpqd 0,r3
3b00 beq 9
3b02 movd 0(r3),49666(pc)
3b09 movqd 1,tos
3b0b bsr -5867
3b10 adjspb -4
3b13 cmpqd 0,r3
3b15 beq 33
3b17 movd r3,tos
3b19 bsr 8609
3b1e bsr -5906
3b23 movd 49629(pc),0(r3)
3b2a movd r3,49622(pc)
3b30 adjspb -4
3b33 br -59
3b36 bsr -5893
3b3b br -72
3b3e exit [r3]
3b40 ret 0
Command (? for help): dump fd00 100
0000FD00 A5 A5 A5 A5 | FF 1F 1F 5F | FA FF FF FF | 00 00 00 00 ......._........
0000FD10 04 00 00 00 | FA FF FF FF | 44 00 00 00 | 04 00 00 00 ........D.......
0000FD20 1A FC FF FF | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD30 00 00 1C C7 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD40 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD50 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD60 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD70 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD80 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD90 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDA0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDB0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDC0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDD0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDE0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDF0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
Command (? for help): ~ [EOT]
jogger-egg> . exit
script done on Sun Dec 30 02:10:34 1990
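
[Editorial sketch, not part of the original post.] The claim that 0xfd04 "definitely isn't a page address" can be checked by decoding the first row of the dump as little-endian 32-bit words, which is the byte order of the NS32000 family. The 4 KB page size is an assumption, though it agrees with the boot messages: 883 pages starting at 0x8D000 end exactly at 4 MB. The names `PAGE_SIZE`, `decode_row` and `looks_like_page` are hypothetical helpers, not Minix symbols.

```python
import struct

PAGE_SIZE = 4096           # assumption: 4 KB pages (883 * 4096 + 0x8D000 == 4 MB)
RAM_SIZE = 4 * 1024 * 1024  # 4 MB of RAM, as stated in the post

# First row of "dump fd00 100", copied from the typescript.
ROW = "A5 A5 A5 A5 | FF 1F 1F 5F | FA FF FF FF | 00 00 00 00"
BASE = 0xFD00


def decode_row(row):
    """Turn one dump row into four little-endian 32-bit words."""
    raw = bytes(int(b, 16) for b in row.replace("|", " ").split())
    return struct.unpack("<4I", raw)


def looks_like_page(addr):
    """A plausible page address is page-aligned and inside RAM."""
    return addr % PAGE_SIZE == 0 and addr < RAM_SIZE


words = decode_row(ROW)
for i, w in enumerate(words):
    print(f"{BASE + 4 * i:#06x}: {w:#010x}  plausible page: {looks_like_page(w)}")

dirty_pages = words[1]  # the word at 0xfd04, i.e. _dirty_pages
```

Under these assumptions the word at 0xfd00 decodes to the 0xA5A5A5A5 test pattern, and the word at 0xfd04 is neither page-aligned nor inside 4 MB of RAM, which matches the description in the post.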
ANTSU@kontu.utu.fi (12/31/90)
Hi! I think you should "set bkpt0 0" to get rid of the breakpoint (??) I think there is a "bug" in Bruce's instructions in that part... The breakpoint is set but not reset. I am not sure about this but I remember having had the same surprise myself... Wishes, Antti-Pekka antsu@kontu.utu.fi
phil@Shiva.COM (Phil Budne) (12/31/90)
I used to see this exactly... I could get around it by setting a break in the null process and single stepping it until the rc script was running. I always chalked it up to the fact I was running the CPU at only 10 MHz (never have had time to debug it), but the problem went away after I rebuilt the kernel from source (a scary proposition given how hard it was to boot up under the best circumstances)! -Phil ........o Philip Budne : o---+----o Shiva Corporation FastPath Project Leader : o | 1 Cambridge Center : Shiva | Cambridge, Ma 02142 Internet: phil@Shiva.COM : . | Tel (617) 252-6300 : o---o Fax (617) 252-6852
bobm%pixel@uunet.UU.NET (01/02/91)
Well, it was indeed a hardware problem. I managed to produce an anomaly using the ROM monitor's "fill", "dump" and "crc" commands, and quickly narrowed it down to "sometimes writes to address N modify address N xor 0x100000." It turned out that signal A20 wasn't making it to the page comparator, U36.

Now that's fixed, and Minix comes up partway. When I boot, I get this.

NS32000 ROM Debugger
Version: Sat Jul 14 19:25:31 PDT 1990
RAM free above 0x1554
Command (? for help): read 0 2000 80
Command (? for help): run 2000
Minix 1.3 kernel version: Sat Sep 29 14:08:32 1990
Pages of user memory: 973
Start user pages: 0x33000
Tue Nov 30 00:00:01 1999
(hmmm.. better reinstall the no-slot clock)
*****************************************************************
*                                                               *
*          W E L C O M E   T O   M I N I X   3 2 0 0 0          *
*                                                               *
*               (c) Copyright 1987 Prentice-Hall                *
*                                                               *
*****************************************************************
/dev/hd3 mounted
login:

But it doesn't listen to anything I type. Would anyone care to confirm or deny that this is the first time the machine has tried to do interrupt-driven input from the console? I'm guessing that the ICU works, because I booted the hard drive, and that requires SCSI interrupts. Either DUART 0 is scrozzled or there's a bad connection on signal /DUAR0 (*).

Thanks to the several people who sent suggestions.

K<bob>
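
[Editorial sketch, not part of the original post.] The symptom "writes to address N modify address N xor 0x100000" is what a memory looks like when address bit 20 is not decoded. The toy model below simulates that and shows one way a per-bit alias test could expose it. `FaultyRAM` and `find_aliased_bits` are hypothetical names; the real fault was intermittent and sat in the path to the page comparator, so this deterministic model is a simplification, and it is not the procedure used with the ROM monitor's "fill", "dump" and "crc" commands.

```python
class FaultyRAM:
    """Toy byte-addressed memory in which one address bit is ignored (assumed model)."""

    def __init__(self, dead_bit=None):
        self.cells = {}
        self.dead_mask = 0 if dead_bit is None else (1 << dead_bit)

    def _decode(self, addr):
        # A line that never reaches the decoder behaves as if it were always 0.
        return addr & ~self.dead_mask

    def write(self, addr, value):
        self.cells[self._decode(addr)] = value

    def read(self, addr):
        return self.cells.get(self._decode(addr), 0)


def find_aliased_bits(ram, base=0, nbits=22):
    """Return the address bits whose toggling lands on the same cell as `base`.

    nbits=22 covers a 4 MB address range.
    """
    bad = []
    for bit in range(nbits):
        partner = base ^ (1 << bit)
        ram.write(base, 0x55)
        ram.write(partner, 0xAA)      # must not disturb `base` on healthy RAM
        if ram.read(base) != 0x55:
            bad.append(bit)
    return bad


healthy = find_aliased_bits(FaultyRAM())
broken = find_aliased_bits(FaultyRAM(dead_bit=20))
print("healthy:", healthy)
print("A20 missing:", broken, "alias distance:", hex(1 << broken[0]))
```

In this model the alias partner of 0xfd00 is 0x10fd00, so a pattern written there would also show up at 0xfd00. That may be how the test pattern ended up next to _dirty_pages in the earlier dump, though the thread does not confirm it.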
news@daver.bungi.com (01/12/91)
HM, I didn't mean to send this or the other message to the mailing list. Everyone ignore the message except the originator of this thread.

Anyhow, there is a missing step in Bruce's instructions, just after "cross your fingers" but before typing "run", and that is:

    set bkpt0 0

I had the same exact (bizarre) results until I realized that the breakpoint had gotten hit again. (O.K. so you might want to type the command before crossing your fingers)