heselton@admin.okanagan.bcc.CDN (Mike Heselton) (09/29/87)
Having seen very few complaints about LAVC clusters, I am beginning to believe
that we are the only site in the world with this problem. Can anyone make
us feel a little less alone?

We are running the following (tiny) cluster:

    VAX-11/750 (boot node)      MicroVax II
    - 8mb Memory                - 9mb Memory
    - UDA-50                    - Emulex QD32 disk Controller
    - 2 RA81s                   - 2 Fujitsu M2361As
    - DELUA                     - DEQNA
    - 1 DMF32                   - Emulex TC03 tape Coupler
    - 2 DZ-11s                  - Fujitsu M244X tape drive

We are running LAVC under VMS V4.5A and have been experiencing the following
crashes on our MVII. As far as we can tell, they occur at times when the disk
traffic from the 750 to the MVII gets high.

VAX/VMS System dump analyzer

Dump taken on 23-SEP-1987 10:59:45.23
INVEXCEPTN, Exception while above ASTDEL or on interrupt stack

System crash information
------------------------
Time of system crash: 23-SEP-1987 10:59:45.23
Version of system: VAX/VMS VERSION V4.5
VAXcluster node name: OKMV01
Reason for BUGCHECK exception: INVEXCEPTN, Exception while above ASTDEL or on interrupt stack
Process currently executing: GOODALL
Current image file: OKCADM$DUA0:[SYS2.SYSCOMMON.][RAF]RAFPC.EXE
Current IPL: 20 (decimal)

General registers:

    R0  = 00000014   R1  = 00140000   R2  = 80036320   R3  = 801D8950
    R4  = 8032F600   R5  = 801CDA40   R6  = 8032F83E   R7  = 00000001
    R8  = 0000A000   R9  = 801DBD50   R10 = 8035E2E0   R11 = 801C9E00
    AP  = 00000000   FP  = 000001CC   SP  = 7FFE7C0C   PC  = 80004862
    PSL = 00140009

Processor registers: MicroVAX II

    P0BR = 80714600   SBR     = 008EA800   ASTLVL = 00000004
    P0LR = 000006DF   SLR     = 00005280   SISR   = 00000100
    P1BR = 7FF2B600   PCBB    = 00666878   ICCS   = 00000040
    P1LR = 001FFA13   SCBB    = 008E7200   SID    = 08000000
    TODR = 98B4EE16   SYSTYPE = 01010000

    ISP = 80465A00
    KSP = 7FFE7C0C
    ESP = 7FFE9E00
    SSP = 7FFED032
    USP = 7FF443FC

We first noticed these crashes when we installed the cluster and performed
backups of our disks to tape. On an unloaded system we cannot reliably perform
a BACKUP/BUFFER=5/NOCRC of the RA81's to the tape drive at 6250 bpi; the system
crashes as above.
We can perform the same backup at 1600 bpi with no problems. If we remove
the BUFFER=5 we can, for the most part, make it through the backup without a
crash as long as the system is not completely idle (i.e., I read my morning
deluge of INFO-VAX mail; thank god for the volume of mail).

We have also noticed these crashes at other times, on and off. We have 3 MVIIs
and all 3 behave the same. We also have an 11/780 that we have tried as a boot
node for a diskless MVII, but as soon as we get a few users (students) trying
to access the 780's disks we get the identical crash.

We don't believe it could be the FUJI disk drives or controller, and we have a
hard time thinking it could be the tape drive or controller, as we have tried
it with a diskless MVII with a TK50 as its only tape drive and it still crashes
with a few students running.

Has anyone out there seen or heard of any similar problems, or even solutions?

We are planning to SPR the problem, as we have had it since spring when we
first set up our little cluster, but we thought we would first see if anyone
else had seen the problem.

Thanks for any help you can give us.

Mike Heselton
Programmer/Analyst
Okanagan College
1000 K.L.O. Road
Kelowna, B.C., Canada V1Y 4X8
HESELTON@ADMIN.OKANAGAN.BCC.CDN
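
A reading aid for the dump quoted in the post above: the PSL value (00140009) can be unpacked by hand, and doing so reproduces the "Current IPL: 20 (decimal)" line that SDA prints. The sketch below is Python rather than anything that ran on the systems in question, and the function name decode_psl is hypothetical. The bit positions are an assumption taken from the VAX architecture's PSL layout (condition codes in bits 0-3, IPL in bits 16-20, previous mode in bits 22-23, current mode in bits 24-25, interrupt-stack flag in bit 26); check them against the architecture manual before relying on them.

```python
# Minimal sketch: unpack a few fields of a VAX Processor Status Longword.
# Bit positions are assumed from the VAX architecture PSL layout;
# decode_psl is a hypothetical helper, not part of SDA or VMS.

ACCESS_MODES = ("kernel", "executive", "supervisor", "user")


def decode_psl(psl: int) -> dict:
    """Return selected PSL fields as a dictionary."""
    # Condition codes live in the low four bits: N=3, Z=2, V=1, C=0.
    flags = "".join(
        name for bit, name in ((3, "N"), (2, "Z"), (1, "V"), (0, "C"))
        if (psl >> bit) & 1
    )
    return {
        "ipl": (psl >> 16) & 0x1F,                      # bits 16-20
        "previous_mode": ACCESS_MODES[(psl >> 22) & 0x3],
        "current_mode": ACCESS_MODES[(psl >> 24) & 0x3],
        "interrupt_stack": bool((psl >> 26) & 1),       # IS flag
        "condition_codes": flags,
    }


# PSL value copied from the crash dump in the post.
fields = decode_psl(0x00140009)
for name, value in fields.items():
    print(f"{name:16} {value}")
```

Under those assumptions the PSL decodes to IPL 20 in kernel mode with the interrupt-stack flag clear. That would fit the "above ASTDEL" half of the bugcheck message rather than the "on interrupt stack" half, and it agrees with SP and KSP holding the same value (7FFE7C0C) in the dump. It says nothing about which driver or code path raised the exception.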
russell@CINCOM.UMD.EDU ("CHRIS RUSSELL") (10/09/87)
(My apologies for posting, but I haven't gotten the knack of sending to
CSNET yet.)

>Having seen very few complaints about LAVC clusters, I am beginning to believe
>that we are the only site in the world with this problem. Can anyone make
>us feel a little less alone.
>
>We are running the following (tiny) cluster:
>
> [Description of Cluster]
>
>We are running LAVC under VMS V4.5A and have been experiencing the following
>crashes on our MVII. As far as we can tell, at times when the disk traffic from
>the 750 to the MVII gets high.

Mike,

First of all, "NO", you are not the only person struggling with an LAVC. I
went through Hell and back again to get our 7-satellite LAVC up and running,
especially since our boot node is an 11/750, which is only supposed to handle
5 satellites... :-)

I don't know if the problems we encountered are causing your difficulties,
but here are a couple of things to look out for that we ran into:

1) The Rev Level on your MicroVAX DEQNA must be Rev E1 or later. We were
   running on C2 DEQNAs, and things appeared to be working. However, every
   once in a while, every satellite would lose touch with every other
   satellite, causing about 5 or 6 pages of console printout. This happened
   several times an hour. Once we upgraded the DEQNAs, the problem went away.

2) The other thing I would suggest is that you get VMS 4.5C from DEC. That's
   what we're running. I don't know the exact differences between the
   versions, but I do know that our cluster is running very smoothly on 4.5C
   now.

Hope this helps.

~chris

-----------------------------------------------------------------------
INFO-VAX: Love it or Leave it - But No More Meta-Discussions! Please!
-----------------------------------------------------------------------
Christopher Russell            ARPA: SYSMGR@KING.EE.UMD.EDU
Operations Manager             JNET: RUSSELL@UMCINCOM
Computer Aided Design Lab      UUCP: ...!seismo!umcp-cs!eneevax!russell
University of Maryland         FONE: (301)454-8886/454-8950

"If growing up were fun, I'd have done it already."
-----------------------------------------------------------------------
------