[comp.sys.nsc.32k] I can't boot minix.

bobm%pixel@uunet.UU.NET (12/30/90)

Help!  I can't boot minix.

My hardware is 4 Mb of RAM, with parity, without no-slot-clock,
with the canonical Miniscribe disk.  I'm attached to a Sun 3/50
via serial line.

I've been able to download Jordan's pi-digit program and run it
with no problems.  I've been able to download minix and do many
interesting things with it, but not to boot it.

I'm trying to boot the original distribution of minix, as I haven't
seen any more full releases since then.  Specifically, the EPROM says
this:

	Version: Sat Jul 14 19:25:31 PDT 1990

And minix says this:

	Minix 1.3 kernel version: Sat Jul 14 18:47:33 PDT 1990

When I follow the instructions for booting RAM disk, or for booting
from hard disk (I've got the miniroot loaded onto the disk), minix
seems to start up, then partway through the exec of /etc/init, it
blows up with a bpt instruction at address 0x40.  Poking around leads
me to believe that idle_task() has called zero_page() with an address
in the 0xff000000's and that somehow the stack has been overscribbled.

Is this a hardware bug?  It must be, since what I'm doing is identical
to what other people have done.

If so, it is a subtle one.  Console and SCSI interrupts are
working, and I've used the debugger for a couple of days now with no
apparent problems.

I've appended a typescript of a typical session below.  Oh, and reply
to bobm@convex.com, as convex's mail configuration is broken this week
and this message's return address won't work.

Note that 0xfd04 is the address of _dirty_pages, the head of the dirty
page list.  It contains a number that definitely isn't a page address.
The word before, 0xfd00, has had the test pattern written into it.
That wasn't there when ram_size() finished running.
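
To make the two words discussed above easier to read, here is a minimal
sketch that decodes the first line of the "dump fd00 100" output from the
typescript below into 32-bit words.  It assumes the NS32000's little-endian
byte order; the helper name decode_dump_line is purely illustrative, and
only the hex columns of the dump line are copied (the ASCII column is
left out).

```python
import struct

# Hex part of the first dump line, copied from the typescript.
DUMP_LINE = "0000FD00 A5 A5 A5 A5 | FF 1F 1F 5F | FA FF FF FF | 00 00 00 00"


def decode_dump_line(line):
    """Return {address: 32-bit word} for one line of the ROM monitor dump."""
    fields = line.split()
    base = int(fields[0], 16)
    # Drop the "|" group separators and keep the 16 data bytes.
    data = bytes(int(f, 16) for f in fields[1:] if f != "|")
    # "<4I" = four little-endian unsigned 32-bit words
    # (assumption: NS32000 byte order).
    words = struct.unpack("<4I", data)
    return {base + 4 * i: w for i, w in enumerate(words)}


words = decode_dump_line(DUMP_LINE)
for addr, word in words.items():
    print(f"{addr:#06x}: {word:#010x}")
```

Read this way, 0xfd00 holds 0xa5a5a5a5 (the test pattern) and 0xfd04
holds 0x5f1f1fff, which lies far above the top of 4 Mb of RAM (0x400000).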

Thanks for any insight you can offer.

					K<bob>

------------------------------------------------------------------------------
Script started on Sun Dec 30 01:51:41 1990
jogger-egg> tip hardwire
connected

NS32000 ROM Debugger
Version: Sat Jul 14 19:25:31 PDT 1990
RAM free above 0x1554

Command (? for help): download 2000
~
[EOT]
jogger-egg> dnld root/sys/image
Length=50566 CRC=58777
jogger-egg> tip hardwire
connected
CRC ok, length = 50566
Command (? for help): ed 2024 d
2024(202): 100
2028(0): 
Command (? for help): ed 2028 d
2028(0): d'360
202c(0): 
Command (? for help): set bkpt0 7d0c
Command (? for help): dis r'bkpt0
    7d0c	enter	[r3,r4],0
Command (? for help): run 2000
Minix 1.3 kernel version: Sat Jul 14 18:47:33 PDT 1990
Pages of user memory: 883
    Start user pages: 0x8D000
   Start of RAM disk: 0x32800
    7d0c 	enter	[r3,r4],0	(breakpoint)
Command (? for help): download 32800
~
[EOT]
jogger-egg> dnld disk1/min532
Length=368790 CRC=51953
51.120u 10.100s 6:26.46 15.8% 0+44k 49+1io 47pf+0w
jogger-egg> tip hardwire
connected
CRC ok, length = 368790
Command (? for help): run

Please enter date: MMDDYYhhmmss. Then hit RETURN.
      40 	bpt		(breakpoint)
Command (? for help): show
      r0=ff15e900       r1=ff15f9ff       r2=0              r3=ff15e9ff 
      r4=0              r5=0              r6=0              r7=0        
      f0=0              f1=0              f2=0              f3=0        
      f4=0              f5=0              f6=0              f7=0        
     l1l=0             l1h=0             l3l=0             l3h=0        
     l5l=0             l5h=0             l7l=0             l7h=0        
      pc=40            usp=e470          isp=e254           fp=e470     
      sb=0         intbase=40            mod=20c6          psr=ipsuNzfvltc
     dcr=0             dsr=0             car=0             bpc=0        
     cfg=bf7          ptb0=0            ptb1=3e8000       tear=7024     
     mcr=5             msr=af            fsr=10000           .=7d0f     
      v1=0              v2=0           bkpt0=7d0c        bkpt1=0        
   bkpt2=0           bkpt3=0           bkpt4=0           bkpt5=0        
   bkpt6=0           bkpt7=0           radix=d'16        debug=d'0      
  scrlen=d'24     scsi_adr=1        scsi_lun=0        
Command (? for help): trace
0x3b1e
Command (? for help): disassemble 3af0 16
    3af0        enter   [r3],0
    3af3        bsr     -5863
    3af8        movd    49676(pc),r3
    3afe        cmpqd   0,r3
    3b00        beq     9
    3b02        movd    0(r3),49666(pc)
    3b09        movqd   1,tos
    3b0b        bsr     -5867
    3b10        adjspb  -4
    3b13        cmpqd   0,r3
    3b15        beq     33
    3b17        movd    r3,tos
    3b19        bsr     8609
    3b1e        bsr     -5906
    3b23        movd    49629(pc),0(r3)
    3b2a        movd    r3,49622(pc)
    3b30        adjspb  -4
    3b33        br      -59
    3b36        bsr     -5893
    3b3b        br      -72
    3b3e        exit    [r3]
    3b40        ret     0
Command (? for help): dump fd00 100
0000FD00 A5 A5 A5 A5 | FF 1F 1F 5F | FA FF FF FF | 00 00 00 00 ......._........
0000FD10 04 00 00 00 | FA FF FF FF | 44 00 00 00 | 04 00 00 00 ........D.......
0000FD20 1A FC FF FF | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD30 00 00 1C C7 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD40 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD50 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD60 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD70 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD80 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FD90 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDA0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDB0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDC0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDD0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDE0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
0000FDF0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ................
Command (? for help): ~
[EOT]
jogger-egg> .
exit

script done on Sun Dec 30 02:10:34 1990
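
The disassembly in the typescript above appears to print displacements in
decimal, relative to the address of the instruction itself.  That is an
inference, not something the ROM debugger states: it is the reading under
which "br -59" and "br -72" land on instruction boundaries.  Under that
assumption, a short sketch can resolve the targets; the function name
target is illustrative.

```python
def target(insn_addr, displacement):
    """Address referenced by a PC-relative displacement.

    Assumes displacements are decimal and relative to the start of the
    instruction, as inferred from the listing.
    """
    return insn_addr + displacement


# (instruction address, displacement) pairs copied from the disassembly.
references = {
    "3b33  br   -59":               (0x3B33, -59),
    "3b3b  br   -72":               (0x3B3B, -72),
    "3b15  beq  33":                (0x3B15, 33),
    "3b19  bsr  8609":              (0x3B19, 8609),
    "3af8  movd 49676(pc),r3":      (0x3AF8, 49676),
    "3b23  movd 49629(pc),0(r3)":   (0x3B23, 49629),
}

resolved = {text: target(*pair) for text, pair in references.items()}
for text, addr in resolved.items():
    print(f"{text:30s} -> {addr:#x}")
```

With this reading, the two movd operands resolve to 0xfd04 (_dirty_pages)
and 0xfd00, the two words discussed in the post.  The trace address 0x3b1e
is the instruction after the "bsr 8609" at 0x3b19, so the call in progress
went to 0x5cba.  That is presumably the zero_page() call, though the
typescript does not name it.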

ANTSU@kontu.utu.fi (12/31/90)

Hi!

I think you should "set bkpt0 0" to get rid of the breakpoint (??)
I think there is a "bug" in Bruce's instructions in that
part... The breakpoint is set but not reset.
I am not sure about this, but I remember having had the same surprise myself...

Wishes, Antti-Pekka antsu@kontu.utu.fi

phil@Shiva.COM (Phil Budne) (12/31/90)

I used to see exactly this... I could get around it by setting a break in
the null process and single stepping it until the rc script was running.

I always chalked it up to the fact I was running the CPU at only 10 MHz
(never had time to debug it), but the problem went away after I rebuilt
the kernel from source (a scary proposition given how hard it was to
boot up under the best circumstances)!

-Phil
                                    ........o
Philip Budne                        :   o---+----o      Shiva Corporation
FastPath Project Leader             :       o    |      1 Cambridge Center
                                    :     Shiva  |      Cambridge, Ma 02142
Internet: phil@Shiva.COM            :        .   |      Tel (617) 252-6300
                                    :        o---o      Fax (617) 252-6852

bobm%pixel@uunet.UU.NET (01/02/91)

Well, it was indeed a hardware problem.  I managed to produce an
anomaly using the ROM monitor's "fill", "dump" and "crc" commands,
and quickly narrowed it down to "sometimes writes to address N modify
address N xor 0x100000."  It turned out that signal A20 wasn't making
it to the page comparator, U36.  Now that's fixed, and Minix comes up
partway.  When I boot, I get this.

	NS32000 ROM Debugger
	Version: Sat Jul 14 19:25:31 PDT 1990
	RAM free above 0x1554

	Command (? for help): read 0 2000 80
	Command (? for help): run 2000
	Minix 1.3 kernel version: Sat Sep 29 14:08:32 1990
	Pages of user memory: 973
	    Start user pages: 0x33000
	Tue Nov 30 00:00:01 1999   (hmmm.. better reinstall the no-slot clock)
	*****************************************************************
	*								*
	*	 W E L C O M E   T O   M I N I X   3 2 0 0 0		*
	*								*
	*	       (c) Copyright 1987 Prentice-Hall			*
	*								*
	*****************************************************************
	/dev/hd3 mounted
	login: 

But it doesn't listen to anything I type.  Would anyone care to
confirm or deny that this is the first time the machine has tried to
do interrupt-driven input from the console?  I'm guessing that the ICU
works, because I booted the hard drive, and that requires SCSI
interrupts.  Either DUART 0 is scrozzled or there's a bad connection
on signal /DUAR0 (*).
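
The address fault described above (writes to address N also modifying
address N xor 0x100000) is the signature of an address line that is not
being decoded.  The following is a toy simulation of it, not the ROM
monitor's fill/dump/crc procedure.  The class and function names are
hypothetical, and the model ignores A20 on every access, whereas the real
fault appeared only "sometimes".

```python
ALIAS_BIT = 0x100000  # address bit 20 (A20), per the post


class GoodRam:
    """Toy RAM in which every address selects its own cell."""

    def __init__(self):
        self.cells = {}

    def _decode(self, addr):
        return addr

    def write(self, addr, value):
        self.cells[self._decode(addr)] = value

    def read(self, addr):
        return self.cells.get(self._decode(addr), 0)


class FaultyRam(GoodRam):
    """Toy RAM in which A20 never reaches the decoder (simplification)."""

    def _decode(self, addr):
        return addr & ~ALIAS_BIT


def find_aliased_bits(ram, address_bits=22):
    """Return the address bits whose toggling fails to select a new cell.

    22 bits covers 4 Mb.  For each bit, write a marker at (1 << bit) and
    check whether it clobbered address 0.
    """
    bad = []
    for bit in range(address_bits):
        ram.write(0, 0x00)
        ram.write(1 << bit, 0xA5)   # should land in a different cell
        if ram.read(0) == 0xA5:     # ... but address 0 was overwritten
            bad.append(bit)
    return bad


print(find_aliased_bits(GoodRam()))    # []
print(find_aliased_bits(FaultyRam()))  # [20]
```

A walking-bit probe like this only catches a line that is ignored
consistently; an intermittent fault like the one in the post would need
repeated passes.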

Thanks to the several people who sent suggestions.

					K<bob>

news@daver.bungi.com (01/12/91)

Hm, I didn't mean to send this or the other message to the mailing list.

Everyone except the originator of this thread, please ignore it.

Anyhow, there is a missing step in Bruce's instructions, just after
"cross your fingers" but before typing "run", and that is:

set bkpt0 0

I had the same exact (bizarre) results until I realized that the breakpoint
had gotten hit again.

(O.K. so you might want to type the command before crossing your fingers)