van@triton.unm.edu (Van Rauch) (10/17/90)
Well, I just caught up on about 130 articles in unix.ultrix and didn't see any allusion to the problem we have, so I will keep this short and ask for suggestions, no matter how ambiguous.

We have a DEC System 5000/200 running 4.0. We first tried to run 3.1d on it, but one could not build a kernel under 3.1d. So what the heck, let's go with 4.0.

Besides 4.0's inability to function as a YP server, we have a very strange problem with our 5000. About every 3 to 7 days, the system will HANG. No messages on the console, nothing in syserr.hostname.# (uerf), no core file in /usr/adm/crash (savecore is turned on). - nuthin! Each reboot is complaint free, excepting of course all the open files that fsck has to deal with.

DEC has since replaced the system board twice, originally we were running 4.0 of the firmware, now we have 5.3. (When is DEC going to start shipping updated Hardware Operator Guides with new rev.s of firmware??). Console commands are very different.

I do not believe the problem lies in the E'net interface, as when the system hangs the console hangs along with it. None of our other systems (all running 3.1x) are exhibiting any sympathetic behavior.

Suggestions (obvious/ambiguous) on troubleshooting procedures would be gratefully accepted.

Thanks.

Van Rauch               van@triton.unm.edu
Application/Systems     University of NM, CIRT
carter@ferrari.mst6.lanl.gov (Dave Carter) (10/17/90)
In article <1990Oct16.211229.18767@ariel.unm.edu> van@triton.unm.edu (Van Rauch) writes:
>
>DEC has since replaced the system board twice, originally
>we were running 4.0 of the firmware, now we have
>5.3. (When is DEC going to start shipping updated
>Hardware Operator Guides with new rev.s of firmware??).
>Console commands are very different.

oh please! this is dec we're dealing with - you actually expect to receive documentation that resembles the software they ship, or even software that works on the machine it is shipped with? :-)

i had to beg for a hardware operator guide. (i also got an old hardware manual, but new firmware.) my sales rep told me he'd "get back to me with a quote" for this documentation. by the time i got the manual, i had already made some phone calls, and managed to at least boot the thing up.

i don't mean to put down dec (we use dec stuff exclusively) but it does seem strange to ship new workstations (the 5000s) along with an operating system (3.x) which doesn't even support the 5000! :-)

- dave
jack@cscdec.cs.com (Jack Hudler) (10/18/90)
In article <1258@mustang.mst6.lanl.gov> carter@ferrari.mst6.lanl.gov (Dave Carter) writes:
>
>i had to beg for a hardware operator guide. (i also got an old
>hardware manual, but new firmware.) my sales rep told me he'd "get
>back to me with a quote" for this documentation. by the time i got
>the manual, i had already made some phone calls, and managed to at
>least boot the thing up.

I had the same experience, but I still don't have any documentation on what goodies I can attach to this thing.

--
Jack           Computer Support Corporation     Dallas,Texas
Hudler         Internet: jack@cscdec.cs.com
mellon@fenris.pa.dec.com (Ted Lemon) (10/18/90)
Just to reiterate Ed Santiago's statement of about a month ago, if you look on gatekeeper:pub/DEC, you will find the following useful files:

    DS5000_boot_commands.ps
    DS5000_console_command_comparison.ps
    DS5000_newboot_stuff.tar

There's some other stuff there, too, but I don't have Ed's original message, so I don't know if it's useful.

_MelloN_
van@triton.unm.edu (Van Rauch) (10/19/90)
In article <1990Oct16.211229.18767@ariel.unm.edu> van@triton.unm.edu (Van Rauch) writes:
>
>very strange problem with our 5000. About every 3 to 7 days,
>the system will HANG. No messages on the console, nothing in
>syserr.hostname.# (uerf), no core file in /usr/adm/crash (savecore is
>turned on). - nuthin!
>

I should have rtfm'd before I posted. There exists a doc called "Starting the Crash Dump Routine Manually on RISC Processors" in volume 3 of System and Network Management.

As far as I can tell there is a bug in the 4.0 kernel that innocuous user and system processes are tripping over. After spending a few hours with

    crash vmcore.# vmunix.#

a trace on the processes that were runnable at the time of the crash shows different processes that all end up executing panic and boot, for example:

    > proc -r
    SLT S PID PPID PGRP UID PY CPU SIGS EVENT FLAGS
    ...
    80 r 4324 3999 4324 7341 113 255 0 in trace pagi
    ...
    > trace 80
    Stack trace -- last called first
    0 boot (paniced = 0, arghowto = 0) [../../machine/mips/machdep.c: ,545 0x8010 9ea8]
    1 panic (s = 80159828) [../../sys/subr_prf.c: ,1159 0x800a3c18]
    2 kn02trap_error (ep = ffffdcf8, code = 80112fcc, sr = 0008, signo = ffffdcd4
    ...
    > ps 80
    SLOT PID UID COMMAND
    80 4324 7341 (sml)
    >

where "sml" is a program made available to students for a cs class.

The $60,000 question is, how does one get the text string for the argument to PANIC eg. panic (s = 80159828)? Or more plainly, where do I go from here?

The consensus here is that without adb, one can't get it. Does anyone know differently?

Each time our 5000 has hung, a different process leads to the panic and boot, i.e. there is no consistency at the csh level in which command is tripping the ?kernel? bug. Without more help from /bin/crash I'm at a loss for how to find the instruction that does the damage.

---
And now for something completely different...

cmp different under 4.0

Given two files, foo1 and foo2; foo1 is NONempty and foo2 is empty. And the script, "cmp.csh":

    #! /bin/csh
    set x = `cmp foo1 foo2`
    echo $x
    echo $x[1]

--- under 3.x: ----

    fornax.unm.edu:van -> cmp.csh
    cmp: EOF on foo2
    cmp:

--- under 4.0 ----

    triton.unm.edu:van -> cmp.csh
    cmp: EOF on foo2
    Subscript out of range.

This happens because cmp under 4.0 was changed to write EOF diagnostics to stderr instead of stdout. Under 3.x EOF diags are written to stdout, so the backquotes capture them; under 4.0 the backquotes capture nothing, x is empty, and $x[1] fails.

Yes I'm splitting hairs here, but when your favorite prof comes to you pulling his/her hair out because their homegrown script breaks on the "new" system, it makes you appreciate consistency ;-)

---
Van Rauch               van@triton.unm.edu
Application/Systems     University of NM, CIRT
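[Editorial sketch, not part of Van's post.] One way to keep a script like cmp.csh from depending on which stream cmp writes its diagnostics to is to test cmp's exit status instead of parsing its message. The sketch below is in Bourne shell rather than csh, and it assumes cmp -s and its exit codes (0 for identical, 1 for different, greater than 1 for an error) behave as POSIX describes; that has not been checked on either Ultrix release. The function name same_file and the scratch directory are made up for illustration.

```shell
#!/bin/sh
# Sketch: decide whether two files differ from cmp's exit status,
# not from its message text (which the post reports moved from
# stdout to stderr in 4.0).
# same_file is a hypothetical helper name, not an Ultrix tool.
same_file() {
    # -s suppresses all output; only the exit status is used.
    cmp -s "$1" "$2"
}

# Demo with throwaway files: foo1 is nonempty, foo2 is empty,
# matching the setup described in the post.
tmp=${TMPDIR:-/tmp}/cmpdemo.$$
mkdir -p "$tmp"
printf 'some text\n' > "$tmp/foo1"
: > "$tmp/foo2"

if same_file "$tmp/foo1" "$tmp/foo2"; then
    result=same
else
    result=different
fi
echo "$result"

rm -rf "$tmp"
```

If the message text itself is wanted, the other option is to capture both streams in the command substitution, at the cost of still depending on the exact wording.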
pavlov@canisius.UUCP (Greg Pavlov) (10/20/90)
In article <1258@mustang.mst6.lanl.gov>, carter@ferrari.mst6.lanl.gov (Dave Carter) writes:
> i don't mean to put down dec (we use dec stuff exclusively) but it
> does seem strange to ship new workstations (the 5000s) along with an
> operating system (3.x) which doesn't even support the 5000! :-)
>

Gee, I guess those weren't 5000s we were running on 2 months ago, after all..........

greg pavlov, fstrf, amherst, ny
pavlov@stewart.fstrf.org
gringort@wsl.dec.com (Joel Gringorten) (10/23/90)
In article <1258@mustang.mst6.lanl.gov>, carter@ferrari.mst6.lanl.gov (Dave Carter) writes:
|> i don't mean to put down dec (we use dec stuff exclusively) but it
|> does seem strange to ship new workstations (the 5000s) along with an
|> operating system (3.x) which doesn't even support the 5000! :-)

Sorry but this just isn't true. The first version of Ultrix to support the DS5000/200 was 3.1D.

-joel
aem@aber-cs.UUCP (Alec D.E. Muffett) (10/24/90)
In article <1990Oct18.213749.29975@ariel.unm.edu> van@triton.unm.edu (Van Rauch) writes:
>In article <1990Oct16.211229.18767@ariel.unm.edu> van@triton.unm.edu (Van Rauch) writes:
>>
>>very strange problem with our 5000. About every 3 to 7 days,
>The $60,000 question is, how does one get the text string for
>the argument to PANIC eg. panic (s = 80159828)? Or more
>plainly, where do I go from here?
>
>The consensus here is that without adb, one can't get it.
>Does anyone know differently?

Try 'where' under

    dbx -k vmunix.# vmcore.#

alec

JANET aem@uk.ac.aber
INET: aem@cs.aber.ac.uk or aem@aber.ac.uk
UUCP: ...!mcsun!ukc!aber-cs!aem
ARPA: aem%uk.ac.aber.cs@nsfnet-relay.ac.uk,aem%uk.ac.aber@nsfnet-relay.ac.uk
BITNET: <play around with aem%aber@ukacrl, ok?>
SNAIL: Alec Muffett, Computer Unit, Llandinam Building, UCW Campus, Aberystwyth, UK, SY23 3DB