romwa@gpu.utcs.toronto.edu (Mark Dornfeld) (04/29/88)
Can anyone help with this Xenix error message?

error on fixed disk (minor 40), block=16544
Error Type 0, Code 3, Unit 0
Write/Drive Fault

The message started appearing with different blocks identified
about a month after installation. About a dozen are now
listed.

I have run 'badtrk' already suspecting a flaw on the disk, but
no new bad tracks appeared.

Is there a way to find out what Cylinder/Head contains the
suspect tracks and put them in the bad track table?

Mark T. Dornfeld
Royal Ontario Museum
100 Queens Park
Toronto, Ontario, CANADA
M5S 2C6
mark@utgpu!rom - or - romwa@utgpu
stu@jpusa1.UUCP (Stu Heiss) (05/02/88)
In article <1988Apr29.151753.3956@gpu.utcs.toronto.edu> romwa@gpu.utcs.toronto.edu (Mark Dornfeld) writes:
-
-Can anyone help with this Xenix error message?
-
-error on fixed disk (minor 40), block=16544
-Error Type 0, Code 3, Unit 0
-Write/Drive Fault
-
-The message started appearing with different blocks identified
-about a month after installation. About a dozen are now
-listed.
-
-I have run 'badtrk' already suspecting a flaw on the disk, but
-no new bad tracks appeared.
-
You can use badtrk to map out a block once you know the cylinder/head/sector.
The non-destructive test is not good enough to catch much. The destructive
one is pretty good but can miss too.
-
-Is there a way to find out what Cylinder/Head contains the
-suspect tracks and put them in the bad track table?
-
Look at /usr/adm/messages unless you have the misfortune of a bad block
associated with that file - it happened here once. If this does
happen, do the following:
$ mv /usr/adm/messages /usr/adm/messages.bad
$ touch /usr/adm/messages
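If the messages file is readable, a short script can pull out every block the driver has complained about, which makes a growing list like the one in the original post easier to track. A minimal sketch in Python, assuming the log holds lines in the same form as the console message quoted above (`error on fixed disk (minor 40), block=16544`); the function name `failing_blocks` is hypothetical, and if your messages file also records cylinder/head, the pattern would need extending.

```python
import re
from collections import Counter

# Matches the console message quoted in this thread, e.g.
#   "error on fixed disk (minor 40), block=16544"
# Assumption: /usr/adm/messages records the same wording; other Xenix
# releases may phrase it differently.
DISK_ERROR = re.compile(r"error on fixed disk \(minor (\d+)\), block=(\d+)")


def failing_blocks(lines):
    """Count how often each (minor, block) pair appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = DISK_ERROR.search(line)
        if match:
            minor, block = (int(g) for g in match.groups())
            counts[(minor, block)] += 1
    return counts
```

For example, `failing_blocks(open("/usr/adm/messages", errors="replace"))` would give a count per (minor, block) pair, so repeat offenders stand out from one-off errors.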
When you start having disk troubles, *CHECK THE CABLES!!!*. This is
*so* obvious that I never do it first, and it has been the problem on
three different machines I'm responsible for. I'm really going to look
there first next time it happens :-). In particular, look for
connector pins that have lost their springiness or are bent, corrosion on
the edge connector (remove with a pencil eraser), and, if the cable is
bent right at the connector, a possible wire break. If this doesn't
turn up anything, get ready for some hair pulling. I usually do a low
level format, mkfs, and start trying to dd the raw device a number of
times to see if I can isolate a bad block or get some confidence that
the problem was cured with the format. You may want to try swapping
cables and disk controller if you have access to some spares.
stephm@sco.COM (Stephen P. Marr) (05/02/88)
romwa@gpu.utcs.toronto.edu (Mark Dornfeld) writes:
>
><...>Can anyone help with this Xenix error message?
>
>error on fixed disk (minor 40), block=16544
>Error Type 0, Code 3, Unit 0
>Write/Drive Fault
>
><...>
>
>Mark T. Dornfeld
>
><...>
>
>mark@utgpu!rom - or - romwa@utgpu

A Write/Drive Fault means that the controller went bye-bye.
I'm responsible for running some 35+ machines here at SCO,
and I've seen this error on two machines in the last 2.5 years.

On the first occasion, I went through the same grief as you trying to
figure out what the bejeezus was wrong; I tried badtrk'ing just about
anything that seemed anywhere near the error location. (I figured the
location by calculating the start of the filesystem and, knowing my
drive parameters, figured out where an offset of XXX blocks was, then
badtrk'd the track before it, the track after, and the offending track.)
I got a similar error in an unrelated region within two days. "GARFLE"
says I, as I proceeded to do it all over again, all the time thinking,
"If this keeps up, there won't be much of a disk left." Within two days
it happened again, so I replaced the drive. That still didn't fix the
problem, so I replaced the controller. I've since had the controller
tested by the manufacturer, and it did indeed turn up faulty; the
original drive has worked perfectly in another machine ever since.

So, my advice to you is to replace the controller.

Best of luck to you,
--
Steph Marr, The Santa Cruz Operation Inc., ...!{uunet,ihnp4,ucscc}!sco!stephm
Internet: (MX Handlers) stephm@sco.COM
          (Others) @ucscc.ucsc.edu:stephm@sco.COM
"There was coffee. Life would go on." --William Gibson
jack@turnkey.TCC.COM (Jack F. Vogel) (05/02/88)
In article <1988Apr29.151753.3956@gpu.utcs.toronto.edu> romwa@gpu.utcs.toronto.edu (Mark Dornfeld) writes:
>
>Can anyone help with this Xenix error message?
>
>error on fixed disk (minor 40), block=16544
>Error Type 0, Code 3, Unit 0
>Write/Drive Fault
>
>[...]
>I have run 'badtrk' already suspecting a flaw on the disk, but
>no new bad tracks appeared.

What type of drive is this? The reason I ask is that I had similar
behavior with an Atasi drive here. I would get intermittent errors, but
even a destructive drive scan would not find the bad sectors. I hate to
tell you this, but eventually the drive really gave up the ghost. It
sounds like you may be experiencing a similar problem. I would suggest
you get ready to purchase a new drive for the system or, if it is a new
drive, get it replaced. The clue is that you have drive errors without
definite bad sectors being found; this indicates failing drive mechanics
rather than bad media. One final possibility is a failing controller.
In our case the bad drive was the second one, so a controller fault was
extremely unlikely.

>Is there a way to find out what Cylinder/Head contains the
>suspect tracks and put them in the bad track table?
>
Yes. Remember: 17 blocks (sectors) per track times x heads per cylinder.
I believe (somebody correct me if I'm wrong) that the blocks are numbered
by cylinder, meaning head 0 through X, each with its 17 sectors, gives the
block numbers before moving on to the next track; in other words, track 1,
head 0 will be blocks 0-16; track 1, head 1 will be blocks 17-33, etc.

However, as I indicated above, I suspect you have a creeping drive death
here, and that marking the indicated blocks bad will not solve your
problem. Hate to be the bearer of bad news.

Best regards,
--
Jack F. Vogel
Turnkey Computer Consultants, Costa Mesa, CA
UUCP: ...{nosc|uunet}!turnkey!jack
Internet: jack@turnkey.TCC.COM
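Jack's layout, combined with Steph's approach of adding the start of the filesystem and marking the track before, the offending track, and the track after, comes down to a couple of integer divisions. A minimal Python sketch, assuming one block equals one sector, 17 sectors per track, and blocks numbered head by head within a cylinder as Jack describes. `heads` and `partition_start` (the absolute sector where the reporting partition begins) are placeholders to be taken from your own drive parameters and partition table, and the function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class CHS(NamedTuple):
    cylinder: int
    head: int
    sector: int  # 0-based here; controllers often number sectors from 1


def block_to_chs(block: int, heads: int, sectors_per_track: int = 17,
                 partition_start: int = 0) -> CHS:
    """Map a block number to (cylinder, head, sector).

    Assumes one block == one sector and that blocks run through heads
    0..heads-1 of a cylinder before moving to the next cylinder.
    `partition_start` is the absolute sector where the partition that
    reported the error begins (hypothetical parameter).
    """
    absolute = partition_start + block
    track, sector = divmod(absolute, sectors_per_track)
    cylinder, head = divmod(track, heads)
    return CHS(cylinder, head, sector)


def tracks_to_map_out(block: int, heads: int, sectors_per_track: int = 17,
                      partition_start: int = 0) -> List[Tuple[int, int]]:
    """Return (cylinder, head) for the track before, the offending track,
    and the track after, skipping any track number below zero."""
    track = (partition_start + block) // sectors_per_track
    return [divmod(t, heads) for t in (track - 1, track, track + 1) if t >= 0]
```

With a hypothetical 4-head drive and the partition starting at sector 0, `block_to_chs(16544, heads=4)` gives cylinder 243, head 1, sector 3 (0-based), and `tracks_to_map_out(16544, heads=4)` gives heads 0, 1, and 2 of cylinder 243. Whether badtrk expects sectors numbered from 0 or 1 isn't covered in this thread, so check that before entering anything.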
romwa@gpu.utcs.toronto.edu (Mark Dornfeld) (05/08/88)
In article <497@scovert> stephm@sco.COM (Stephen P. Marr) writes:
>romwa@gpu.utcs.toronto.edu (Mark Dornfeld) writes:
>>
>><...>Can anyone help with this Xenix error message?
>>
>>error on fixed disk (minor 40), block=16544
>>Error Type 0, Code 3, Unit 0
>>Write/Drive Fault
>>
>><...>
>>
>>Mark T. Dornfeld
>>
>><...>
>>
>>mark@utgpu!rom - or - romwa@utgpu
>
>A Write/Drive Fault means that the controller went bye-bye.
>I'm responsible for running some 35+ machines here at SCO,
>and I've seen this error on two machines in the last 2.5 years.
>

I've gotten some good advice on this problem, and in general everybody's
experience is with a bad controller or disk. But here's an update:

I noticed a pattern in the times of the bad writes: all except two of
them occur between 9 and 10 AM or between 5 and 7 PM. This machine is in
a high-security collection room in the Museum, and there is some type of
security device with motion scanners and whatnot very near the computer.
(I learned this after the trouble started happening.) When the doors are
either unlocked in the AM or locked in the PM, the security company sends
some signals into the security system for verification. The clustering
of these bad writes leads me to suspect some high-frequency interference
triggering an error message or, in fact, a bad write to disk.

This is only the current theory, and I will suspect everything until I
find the problem, but the plot of the times sure isn't anywhere near
random. There are no cron processes scheduled during these times, so
nothing should be writing to disk anyway.

Any more help/ideas are welcome.

Mark T. Dornfeld
Royal Ontario Museum
100 Queens Park
Toronto, Ontario, CANADA
M5S 2C6
mark@utgpu!rom - or - romwa@utgpu
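The clustering argument can be checked with a little arithmetic: the windows named here (9-10 AM and 5-7 PM) cover 3 of the 24 hours, so if the errors were spread evenly over the day only about 1 in 8 would be expected to land in them. The Python sketch below, with hypothetical function names, compares the observed fraction against that uniform baseline. It assumes the error times have already been extracted as `datetime` values, since the log's timestamp format isn't shown in the thread.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Sequence, Tuple

Window = Tuple[int, int]  # [start_hour, end_hour) on a 24-hour clock


def errors_by_hour(timestamps: Iterable[datetime]) -> Counter:
    """Count error timestamps per hour of the day (0-23)."""
    return Counter(ts.hour for ts in timestamps)


def fraction_in_windows(timestamps: Sequence[datetime],
                        windows: List[Window]) -> float:
    """Fraction of timestamps whose hour falls inside any window."""
    if not timestamps:
        return 0.0
    hits = sum(
        1 for ts in timestamps
        if any(start <= ts.hour < end for start, end in windows)
    )
    return hits / len(timestamps)


def uniform_baseline(windows: List[Window]) -> float:
    """Fraction expected if errors were spread evenly over the day."""
    return sum(end - start for start, end in windows) / 24
```

With the windows from this post, `uniform_baseline([(9, 10), (17, 19)])` is 0.125; an observed `fraction_in_windows` close to 1 (all but two of about a dozen errors) is far above that.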