steve@cdp.UUCP (12/12/89)
SUMMARY -- README
-------
We have been experiencing regular crashes running under Interactive 2.0.2 with 3 SCSI disks on an aha1542a. Later in this message is a script which crashes our machine. The purpose of this message is to find other people who are willing to try to replicate these crashes on various machines. I encourage folks to try out the script, even if they do not have our exact hardware configuration. This will help us to better understand whether the problem lies in hardware or in 2.0.2.

DETAILS
-------
The symptom of the crashes is that all processes continue to run, but any process that goes for the disk hangs. So, getty prints the login prompt, and accepts a name at login:, but when it goes to spawn login, the exec hangs the system. Switch to a different virtual console, and repeat the same thing. emacs works fine until it tries to auto-save, open a file, etc...

The crashes occur intermittently -- about once a day on our machine that averages 20 users at a time and is up 24 hours a day.

On an alternate machine, I have developed a little program that can reliably crash the machine with 3 or more disks within about 2 seconds of invocation. The program looks like this:

-----------------------------------------------------------
:
# crashix.sh
#
# hangs i/386 2.0.2 SCSI driver within 2 seconds.
# Assume root partition on /dev/dsk/0s1, other
# partitions on /dev/dsk/1s3 and /dev/dsk/2s3.
#
sync
sync
dd if=/dev/dsk/0s1 bs=128k >/dev/null &
dd if=/dev/dsk/1s3 bs=128k >/dev/null &
dd if=/dev/dsk/2s3 bs=128k >/dev/null &
----------------------------------------------------------

Notes:

o It is important to have all of the dd's in the background, so that they all have the same priority. The occurrence of crashes is related to disk use intensity; having processes with lower priority reduces the disk use intensity and crash frequency.

o You may have different partitioning on your disks. Change the device names to suit your configuration. Try it with 2, 3, or 4 disks.

o Our hardware configuration is:
    - Mylex MI-386/20 motherboard, 8MB RAM on board.
    - aha1542a SCSI host adapter
        4 CDC 94161 150MB SCSI disks
        1 Archive 2150s "VIPER" cartridge drive
    - a hercules card.

o I have tried crashix.sh with three other motherboards (two based on recent Chips & Technologies chip sets), and the behaviour is the same (crash within 2 seconds).

Support at the Interactive OEM division has returned calls, but is skeptical that this is software-related.

Thanks for your help. If people that test their machines keep me informed, I will post the results to the net.

Steve Fram
Chief Programmer
Community Data Processing (CdP)
{hplabs, pyramid, ...}!cdp!steve
(415)-322-9069
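
For anyone adapting the test to 2, 3, or 4 disks, the same idea can be written so the device names are arguments rather than edited into the file. This is only a sketch under assumptions: read_all_parallel is a hypothetical helper name, not part of crashix.sh, and it waits for the readers to finish, which the original script does not do.

```shell
#!/bin/sh
# Sketch only: read_all_parallel is a hypothetical helper, not part of
# the original crashix.sh.
# Usage: read_all_parallel BLOCKSIZE DEVICE...
read_all_parallel() {
    bs=$1
    shift
    pids=""
    # Start one background dd per device so all readers run at the
    # same priority and hit the disks at the same time.
    for dev in "$@"
    do
        dd if="$dev" of=/dev/null bs="$bs" 2>/dev/null &
        pids="$pids $!"
    done
    # Wait for every reader and remember whether any of them failed.
    rc=0
    for pid in $pids
    do
        wait "$pid" || rc=1
    done
    return $rc
}
```

With the partitions named in the post, a run might look like `sync; sync; read_all_parallel 128k /dev/dsk/0s1 /dev/dsk/1s3 /dev/dsk/2s3`. If the driver hangs as described, the function simply never returns.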
jgd@rsiatl.UUCP (John G. De Armond) (12/13/89)
In article <654400003@cdp> steve@cdp.UUCP writes:
>SUMMARY -- README
>-------
>We have been experiencing regular crashes running under
>Interactive 2.0.2 with 3 SCSI disks on an aha1542a. Later in
>this message is a script which crashes our machine. The
>purpose of this message is to find other people who are
>willing to try to replicate these crashes on various
>machines. I encourage folks to try out the script, even if
>they do not have our exact hardware configuration. This will
>help us to better understand whether the problem lies in
>hardware or in 2.0.2.
>
>DETAILS
>-------
>The symptom of the crashes is that all processes continue to
>run, but any process that goes for the disk hangs. So, getty
>prints the login prompt, and accepts a name at login:, but
>when it goes to spawn login, the exec hangs the system.
>Switch to a different virtual console, and repeat the same
>thing. emacs works fine until it tries to auto-save, open
>a file, etc...

Steve,

We have had the same failure here under similar conditions. Configuration here is an Adaptec host adaptor and 2 380 mb Newbury data drives.

Our problem seemed to manifest itself mostly under pathological conditions, such as when a bad block is discovered. I've also seen it when I've been running a script similar to yours designed to hammer a new hard disk before putting it into service.

The external symptoms are as you note PLUS I notice that the activity LED on the Adaptec board is stuck on AND the activity LED on one of the drives is on continuously.

We now have a bit more data in that it occurs on two totally different drive types.

Without any investigation other than external observation, I suspect that the problem has to do with either a buffer getting overrun or a problem with a task releasing the SCSI bus to another one.

The fact that the problem only occurs either when 2 drives are heavily loaded or when an error condition happens - which appears from the LED activity to tie the bus up for a spell - should be a major clue. I absolutely cannot cause this failure by any combination of loading on one drive.

John

--
John De Armond, WD4OQC              | The Fano Factor -
Radiation Systems, Inc. Atlanta, GA | Where Theory meets Reality.
emory!rsiatl!jgd  **I am the NRA**  |
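
The observation that one loaded drive never hangs suggests a control run: read the same devices one at a time, so only one drive is ever busy. The sketch below is an assumption about how such a control might look; read_all_serial is a hypothetical name and nobody in this thread has reported running it.

```shell
#!/bin/sh
# Sketch only: read_all_serial is a hypothetical control test, not a
# script posted in this thread.
# Usage: read_all_serial BLOCKSIZE DEVICE...
read_all_serial() {
    bs=$1
    shift
    rc=0
    # Read each device to completion before touching the next one, so
    # at most one drive is loaded at any moment.
    for dev in "$@"
    do
        dd if="$dev" of=/dev/null bs="$bs" 2>/dev/null || rc=1
    done
    return $rc
}
```

If a serial pass over every drive completes while simultaneous reads of the same drives hang, that would point at overlapped access rather than at any single drive.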
steve@cdp.UUCP (12/14/89)
This is a followup on a posting I made a couple days ago, about being able to easily crash Interactive 2.0.2 by reading from 3 SCSI disks simultaneously (running an aha1542a controller).

I have replicated the crash on a Compaq 386/20e. This was sufficient for Interactive to "validate" the bug report -- i.e., they consider it a driver bug. There is no commitment on their part to fix it, but the L.A. support person continues to be communicative and sympathetic. Hollis doesn't return my phone calls.

We have now come up with a program that will crash 2.0.2 with just 2 SCSI disks and 1 SCSI tape. I suspect that this is a more standard configuration. The tape drive is an Archive 2150s (with ROM revisions as recommended in the Interactive 1.0.6 release notes). The disks are CDC 94161. Alas, this program takes between 10 minutes and 1 hour to hang the disk driver (as opposed to the 3 disk version, which hung the driver in < 2 seconds).

Does anyone have experience running the Future Domain controller with 3 or more SCSI disks? We are considering replacing our SCSI disks with 2 high capacity ESDI disks (> 600 MB), running the Western Digital wd1007v-se2 (replacement for WD1007). Does anyone have experience (good or bad) with such a configuration?

Steve Fram
Chief Programmer
Community Data Processing (CdP)
{pyramid, hplabs, ...}!cdp!steve

-------------------------------- cut here --------------------------------
:
# crashix2.sh
#
# crash interactive 2.0.2 running just 2 SCSI disks and 1
# SCSI tape.
#

# disk parameters
dd_dev1=/dev/rdsk/0s1   # root
dd_dev2=/dev/rdsk/1s3   # one partition on disk
dd_count=1000
dd_bs=128k

# tape parameters
tape_dev=/dev/ct
tape_bs=32k

while :
do
    echo "New dd loop..."
    dd if=$dd_dev1 of=/dev/null bs=$dd_bs count=$dd_count 2>/dev/null &
    dd_p1=$!
    dd if=$dd_dev2 of=/dev/null bs=$dd_bs count=$dd_count 2>/dev/null &
    dd_p2=$!
    # poll until both disk readers have exited, then start a new pass
    while kill -0 $dd_p1 || kill -0 $dd_p2
    do
        sleep 10
    done 2>/dev/null
done &

while :
do
    echo "New tape loop..."
    dd if=$tape_dev of=/dev/null bs=$tape_bs
done &
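
The wait loop in crashix2.sh depends on `kill -0`, which delivers no signal and only reports whether the process can still be signalled. A minimal sketch of that check in isolation, assuming a hypothetical helper called any_alive that is not part of the posted script:

```shell
#!/bin/sh
# Sketch only: any_alive is a hypothetical helper, not part of
# crashix2.sh. It succeeds if at least one of the given PIDs exists.
any_alive() {
    for pid in "$@"
    do
        # Signal 0 sends nothing; kill just reports whether the PID
        # is still there to be signalled.
        if kill -0 "$pid" 2>/dev/null
        then
            return 0
        fi
    done
    return 1
}
```

A polling loop can then be written as `while any_alive "$pid1" "$pid2"; do sleep 10; done`. Quoting the variables means an empty or misspelled PID variable makes the check fail visibly instead of silently falling through.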
kmoore@shiloh.UUCP (kirk moore) (12/15/89)
I have been running a WD7000fasst card with a 280 meg Newbury (scsi).

No problems at all.

--
Kirk Moore --- Bellevue, WA --- uunet!pilchuck!dataio!-------\
                                uw-beaver!uw-entropy!dataio!-----shiloh!kmoore
shiloh --- Bellevue, WA --- (206) 562-1561(board) - (206) 747-5709(voice)
neese@adaptex.UUCP (12/15/89)
>We have had the same failure here under similar conditions. Configuration
>here is an Adaptec host adaptor and 2 380 mb Newbury data drives.
>
>Our problem seemed to manifest itself mostly under pathological conditions,
>such as when a bad block is discovered. I've also seen it when I've been
>running a script similar to yours designed to hammer a new hard disk before
>putting it into service.
>
>The external symptoms are as you note PLUS I notice that the activity LED
>on the Adaptec board is stuck on AND the activity LED on one of the drives
>is on continuously.
>
>We now have a bit more data in that it occurs on two totally different
>drive types.
>
>Without any investigation other than external observation, I suspect that the
>problem has to do with either a buffer getting overrun or a problem with
>a task releasing the scsi bus to another one.
>
>The fact that the problem only occurs either when 2 drives are heavily loaded
>or when an error condition happens - which appears from the LED activity
>to tie the bus up for a spell - should be a major clue. I absolutely
>cannot cause this failure by any combination of loading on one drive.

The problem, in this instance anyway, is the Newbury drives. Newbury SCSI hard drives do not correctly support SCSI bus arbitration. That is what causes the hang condition when there is more than one drive in the system. I had another customer that had the same problem and Newbury finally admitted the problem.

Roy Neese
Adaptec Central Field Applications Engineer
UUCP @ {texbell,attctc}!cpe!adaptex!neese
       merch!adaptex!neese
       uunet!swbatl!texbell!merch!adaptex!neese
neese@adaptex.UUCP (12/15/89)
>This is a followup on a posting I made a couple days ago, about
>being able to easily crash interactive 2.0.2, by reading from
>3 SCSI disks simultaneously (running an aha1542a controller).
>
>I have replicated the crash on a compaq 386/20e. This was
>sufficient for interactive to "validate" the bug report --
>i.e., they consider it a driver bug. There is no commitment
>on their part to fix it, but the L.A. support person continues
>to be communicative and sympathetic. Hollis doesn't return
>my phone calls.
>
>We have now come up with a program that will crash 2.0.2 with
>just 2 SCSI disks and 1 SCSI tape. I suspect that this is a
>more standard configuration. The tape drive is an archive
>2150s (with ROM revisions as recommended in the interactive
>1.0.6 release notes). The disks are CDC 94161. Alas, this
>program takes between 10 minutes and 1 hour to hang the disk
>driver (as opposed to the 3 disk version, which hung the driver
>in < 2 seconds).
>
>Does anyone have experience running the future domain
>controller with 3 or more SCSI disks ? We are considering
>replacing our SCSI disks with 2 high capacity ESDI disks (> 600
>MB), running the Western Digital wd1007v-se2 (replacement for
>WD1007). Does anyone have experience (good or bad) with such a
>configuration ?

Just to alleviate any concerns: the 154x host adapters support up to the maximum number of SCSI devices you can have (7 targets * 8 LUNs). This has been verified in many ways. I have had as many as 6 hard drives on my SCO 2.3GT system, and ran the test you supplied with no problems. I let it run for 2 days. I expanded it to hit all 6 drives and still no problems. Good test though.

Roy Neese
Adaptec Central Field Applications Engineer
UUCP @ {texbell,attctc}!cpe!adaptex!neese
       merch!adaptex!neese
       uunet!swbatl!texbell!merch!adaptex!neese
steve@corpane.UUCP (Steve Snow) (12/15/89)
>In article <654400003@cdp> steve@cdp.UUCP writes:
>>SUMMARY -- README
>>-------
>>We have been experiencing regular crashes running under
>>Interactive 2.0.2 with 3 SCSI disks on an aha1542a. Later in
>>this message is a script which crashes our machine. The
>>-------
>Steve,
>We have had the same failure here under similar conditions. Configuration
>here is an Adaptec host adaptor and 2 380 mb Newbury data drives.

We are running ISC 2.0.2 here on an Acer System 15 with two 600meg Micropolis drives. We have, however, replaced the ISC SCSI driver with the Chantel SCSI driver. I ran the test script for well over 6 minutes with no problems at all. Sounds to me like your problem is in the ISC driver. I highly recommend the Chantel driver since it supports 8mm tape drives and optical disks, which we use for backups. The driver has been very solid and installation was easy.

Steve Snow
--
Steve Snow| Corpane Industries  | DISK Inc.                | DISK 300-1200bd
          | 10100 Bluegrass Pkwy| 5716 Outer Loop          | (502)968-5401
          | Louisville, KY 40299| Louisville, KY 40219     | thru
          | ..ukma!corpane!steve| ..ukma!corpane!disk!steve| (502)968-5406
neese@adaptex.UUCP (12/17/89)
>I have been running a WD7000fasst card with a 280 meg Newbury (scsi)
>
>No problems at all.

You won't have any problems until you add another SCSI drive. It doesn't matter whose drive you add, if there is a Newbury drive in the system, it will have problems, regardless of whose controller (adapter) you use, unless the adapter is only capable of doing single-threaded I/O.

Roy Neese
Adaptec Central Field Applications Engineer
UUCP @ {texbell,attctc}!cpe!adaptex!neese
       merch!adaptex!neese
       uunet!swbatl!texbell!merch!adaptex!neese
kmoore@shiloh.UUCP (kirk moore) (12/20/89)
Try running the WD7000fasst card with the custom drivers from Columbia Data Products. If you are interested I will repost the number and address for CDP...

--
Kirk Moore --- Bellevue, WA --- uunet!pilchuck!dataio!-------\
                                uw-beaver!uw-entropy!dataio!-----shiloh!kmoore
shiloh --- Bellevue, WA --- (206) 562-1561(board) - (206) 747-5709(voice)