marzusch@odiehh.hanse.de (Ralph-Diether Marzusch) (06/13/91)
This is a summary of a discussion regarding certain floppy and hard disk
problems under ISC UNIX (2.0.2 and 2.2.1). This topic has recently been
discussed in a local German newsgroup. Since nobody came up with a solution
to this problem, I'm summarizing and repeating it here:

Thanks to additional hints and experiences from gemini@geminix.in-berlin.de
(Uwe Doering) and tik@abqhh.hanse.de (Michael Havemester), I found two problems
with ISC's hard disk and floppy drivers (possibly not related to each other):

1st problem:
  When using certain motherboards and ET4000 VGA controllers together,
  write accesses to the floppy disk(s) may fail; however, these failures
  are not reported at all (invalid data will be written without notice).
  This happens frequently when mounting floppy file systems read/write
  (causing the whole file system to be destroyed) and less frequently when
  writing to floppy disks sequentially (e.g. copying disks).
  Since the `install' disk is a `mounted' file system and the install
  procedure writes quite a lot, installation of the OS fails 9 out of 10
  times because the install floppy gets destroyed (a *copy* of the install
  disk, of course ...).

2nd problem:
  If you connect two (!) hard disk drives to one (or even two) `standard'
  AT type hard disk controller (i.e. MFM, RLL or ESDI drives) you may
  experience a `hanging' disk controller (making any further disk accesses
  impossible, which possibly destroys one or more file systems) when both
  disks are accessed concurrently (to reproduce this problem, try to enable
  an additional swap partition on the second volume; this will cause lots
  of disk accesses on both disks when the system starts paging or swapping).

There seem to be no solutions to these problems, just some workarounds:

workarounds for 1st problem:
  a) adjust `setup' parameters (DMA rate or bus clock may help a little bit
     but will probably slow down the machine)
  b) Michael told me this problem disappeared when he installed the ET4000 in
     an 8 bit slot; however, this doesn't help much since Thomas Roell's
     X11 server requires a 16 bit path to the VGA controller
  c) don't *write* to floppy disks any more until a fix is available -
     reading floppies seems to work without problems
     (or, at least, verify *every* floppy you write - especially backup
     floppies!)
  d) try another motherboard (if you can afford buying new hardware each time
     some software fails ...)

workarounds for 2nd problem:
  a) connect only *one* disk to your system
  b) throw away your `standard' disk controller (and the disks connected to
     it) and get a SCSI controller and SCSI disk [a good decision anyway, but
     quite expensive if you already have two other disks ...]
  c) try another motherboard ...

[Personally, after having had to re-install the whole system at least 3 times
I finally decided to get a SCSI disk - which seems to have solved the 2nd
problem.]

ISC seems to know about problem 1, since even my distributor in Germany told
me it might have to do with the ET4000 controller; however, there seems to be
no fix.

Both problems already appeared under ISC 2.0.2, but the situation became
worse when I tried (and finally succeeded after several retries :-( ) to
install ISC 2.2.1.

Is there anybody out there who is or was stuck with the same problems
(and possibly knows how to solve them other than buying new hardware)?

Anybody at ISC listening?

-- Ralph-Diether

--
 .--------------------------------------------------------------------.
 | Ralph-Diether Marzusch, Rehwinkel 2, W-2070 Grosshansdorf, Germany |
 | E-mail: marzusch@odiehh.hanse.de             Voice: +49 4102 64193 |
 `--------------------------------------------------------------------'
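Workaround c) for the 1st problem (verify every floppy you write) amounts to reading the data back and comparing it with the source. The following is only a minimal sketch of that idea, assuming an image was written raw to the floppy so that the start of the disk should be byte-for-byte identical to the source file. The function names are hypothetical, nothing here comes from ISC, and the device path is whatever raw floppy node your system provides.

```python
import hashlib
import os

CHUNK = 64 * 1024  # read size in bytes; an arbitrary choice for this sketch


def digest_prefix(path, length):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`."""
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            block = f.read(min(CHUNK, remaining))
            if not block:
                break  # file or device is shorter than expected
            h.update(block)
            remaining -= len(block)
    return h.hexdigest()


def written_copy_matches(source_path, device_path):
    """True if the device starts with exactly the bytes of the source file."""
    size = os.path.getsize(source_path)
    return digest_prefix(source_path, size) == digest_prefix(device_path, size)
```

A comparison like this only says something about the media if the read-back really comes from the disk, so it is probably wise to re-insert the floppy (or otherwise make sure no cached copy is used) before checking.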
karln@uunet.uu.net (06/14/91)
In article <113@odiehh.hanse.de> marzusch@odiehh.hanse.de writes:
>
>1st problem:
>  When using certain motherboards and ET4000 VGA controllers together
>  write accesses to the floppy disk(s) may fail, however these failures
>
>  b) Michael told me this problem disappeared when he installed the ET4000 in
>     an 8 bit slot; however this doesn't help much since Thomas Roell's
>     X11 server requires a 16 bit path to the VGA controller

This brings to mind a problem I was having with _some_ of these video
cards. My SCSI would not boot at all. The problem turned out to be that
the video card was treating the 16-bit access line of the bus very
poorly. There was a jumper, labeled JP-4, that, although undocumented,
turned out to force the VGA card BIOS into 8 bit mode, the same (almost)
as plugging the card into an 8 bit slot. The difference is that the
card chipset itself was still accessible through the 16 bit slot. Only
the card's BIOS ran in 8 bit mode.

Well, this solved the problem I was having, at a considerable loss in
x11perf results. Further investigation, however, revealed that I could
get the motherboard to shadow-RAM the VGA BIOS area (C000:0000 -
C000:7FFF, or segments C000 up to C800, or at least 32k starting at
C000, whatever you understand). After that I got all my performance
back and everybody is still happy.

PS: This also sort of answers a question I posted yesterday (6/13)
about 16 bit VGA compatibility with hard disk controllers.

I have seen an ET4000 card that did not have the JP-4 jumper (located
on mine in the middle of the board, near the bus connector) but seems
to work fine with my exact same set of hardware. I suppose that means
that someone worked it out properly, but it could mean that marginal
problems such as yours just have not shown up yet. This person tends to
install from a tape backup.

Anyway, I hope this helps. These kinds of problems really hurt, I know.

Karl Nicholas
--
***********************************************************************
| Karl Nicholas            | A recent Gallup Poll showed that 1 in 6  |
| karln!karln@uunet.uu.net | Americans have spoken to a dead person.  |
***********************************************************************
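The shadowed range mentioned above (32k starting at segment C000) is easier to pin down as linear addresses. The short sketch below applies the standard real-mode rule, linear address = segment * 16 + offset. The constant and function names are made up for this illustration and are not taken from any BIOS setup screen.

```python
def linear_address(segment, offset):
    """Real-mode 8086 addressing: segment * 16 + offset."""
    return (segment << 4) + offset


VGA_BIOS_SEGMENT = 0xC000   # segment where the VGA BIOS ROM starts
SHADOW_SIZE = 32 * 1024     # "at least 32k starting at C000"

start = linear_address(VGA_BIOS_SEGMENT, 0x0000)
end = start + SHADOW_SIZE - 1  # last byte inside the shadowed range

print(hex(start), hex(end))  # prints: 0xc0000 0xc7fff
```

Segment C800 (linear 0xC8000) is therefore the first address past the range, which is consistent with the "C000 - C800" way of writing it.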
davidg%aegis.or.jp@kyoto-u.ac.jp (Dave McLane) (06/15/91)
marzusch@odiehh.hanse.de (Ralph-Diether Marzusch) writes:

> 2nd problem:
>   If you connect two (!) hard disk drives to one (or even two) `standard'
>   AT type hard disk controller (i.e. MFM, RLL or ESDI drives) you may
>   experience a `hanging' disk controller (making any further disk accesses
>   impossible which possibly destroys one or more file systems) when both
>   disks are accessed concurrently (to reproduce this problem try to enable
>   an additional swap partition on the second volume, this will cause lots
>   of disk accesses on both disks when the system starts paging or swapping

Sorry, I came in late on this one, but did anyone report problems
with one ESDI and one IDE? Or one ESDI and one SCSI?

I have a 320MB ESDI running and want to add another disk to a Dell
System 333D with ISC 2.2.1....

Dave
--
Dave McLane <davidg%aegis.or.jp@kyoto-u.ac.jp>
randy@chinet.chi.il.us (Randy Suess) (06/15/91)
In article <113@odiehh.hanse.de> marzusch@odiehh.hanse.de writes:
>  b) Michael told me this problem disappeared when he installed the ET4000 in
>     an 8 bit slot; however this doesn't help much since Thomas Roell's
>     X11 server requires a 16 bit path to the VGA controller

	Not true. Software couldn't care less whether a VGA card is in
	an 8 bit or a 16 bit slot. And you will not notice any
	difference whichever slot it is in. The VGA spec is an 8 bit
	one. The 16 bit nonsense has to do with the on-board ROM,
	which is not used by any software that wants performance.
	Under DOS, Windows bypasses the BIOS and goes directly to the
	(8 bit) hardware, as does X11. You run into more problems
	using the 16 bit mode than it is worth because of the AT bus's
	problem with a 16 bit memory access grabbing a whole 256k
	chunk of memory. Any 8 bit memory addressing in this space
	(such as an ethernet card) loses.

--
Randy Suess
randy@chinet.chi.il.us
john@jwt.UUCP (John Temples) (06/15/91)
In article <1991Jun14.153823.23334@uunet.uu.net> karln@karln.UUCP () writes:
[ Putting the VGA BIOS in 8-bit mode hurt X performance; shadowing the BIOS
brought performance back up. ]
I don't see how anything to do with the VGA BIOS or shadow RAM could affect
your UNIX performance.
--
John W. Temples -- john@jwt.UUCP (uunet!jwt!john)
karln@uunet.uu.net (06/17/91)
In article <1991Jun15.161431.395@jwt.UUCP> john@jwt.UUCP (John Temples) writes:
>In article <1991Jun14.153823.23334@uunet.uu.net> karln@karln.UUCP () writes:
>[ Putting the VGA BIOS in 8-bit mode hurt X performance; shadowing the BIOS
>  brought performance back up. ]
>
>I don't see how anything to do with the VGA BIOS or shadow RAM could affect
>your UNIX performance.
>--

I do not see how it would make any _noticeable_ difference for UNIX
either. The statement was about 'X' performance, as in x11perf.

With the VGA BIOS not shadowed and in 8 bit mode, I was getting
~20,000 dots per second on the x11perf -dot test.
With the VGA BIOS shadowed or in 16 bit mode, I am getting ~50,000
dots per second on the same test.

Hope this helps :-)

Karl.
--
***********************************************************************
| Karl Nicholas            | A million monkeys in a million years     |
| karln!karln@uunet.uu.net | did write Shakespeare, we evolved ...    |
***********************************************************************
karln@uunet.uu.net (06/17/91)
In article <1991Jun15.003544.4596@chinet.chi.il.us> randy@chinet.chi.il.us (Randy Suess) writes:
>	by any software that wants performance. Under DOS, Windows bypasses
>	the bios and goes directly to the (8 bit) hardware, as does
>	X11. You run into more problems using the 16 bit mode than it

Is this true for Roell's X11R4 implementation?

If it is, how come when I "SHADOW RAM" the VGA BIOS, I get better
than double the x11perf test results?

Just Curious,

Karl.
--
***********************************************************************
| Karl Nicholas            | A million monkeys in a million years     |
| karln!karln@uunet.uu.net | did write Shakespeare, we evolved ...    |
***********************************************************************
rcbarn@rwc.urc.tue.nl (Raymond Nijssen) (06/17/91)
marzusch@odiehh.hanse.de (Ralph-Diether Marzusch) writes:

> This is a summary of a discussion regarding certain floppy and hard disk
> problems under ISC UNIX (2.0.2 and 2.2.1).
>
>[...] I found two problems
>with ISC's hard disk and floppy drivers (possibly not related to each other):

>1st problem:
>  When using certain motherboards and ET4000 VGA controllers together
>  write accesses to the floppy disk(s) may fail, however these failures
>  are not reported at all (invalid data will be written without notice).

You omitted detailed info about your floppy controller, the I/O bus speed,
wait-states, motherboards etc. If you want detailed answers, you'll have to
supply all the details.....

>2nd problem:
>  If you connect two (!) hard disk drives to one (or even two) `standard'
>  AT type hard disk controller (i.e. MFM, RLL or ESDI drives) you may

This statement is *very* inaccurate!! See below.

>  experience a `hanging' disk controller (making any further disk accesses
>  impossible which possibly destroys one or more file systems) when both
>  disks are accessed concurrently
>
>There seem to be no solutions to these problems, just some workarounds:

This statement is even more inaccurate!! See below.

>workarounds for 2nd problem:
>  a) connect only *one* disk to your system

You gotta be kiddin'....

>  b) throw away your `standard' disk controller (and the disks connected to
>     it) and get a SCSI controller and SCSI disk [a good decision anyway, but
>     quite expensive if you already have two other disks ...]

Please don't advise people to do something silly like this ....

>  c) try another motherboard ...

That does it! Oh Connor, if you're reading this, please add this issue to the
FAQ list!!

Here we go again:

The second problem, the one with the HD controller locking up under ISC,
is known to occur in combination with Western Digital's WD1006Vsr2 RLL
controller; it contains a bug which only occurs if 2 drives are
attached to it and if the HD device driver has the 'overlapped-seeks'
feature enabled, as the one in ISC/ix has by default. It does not
happen with ESDI.

>Anybody at ISC listening?

Yes they are, but I reckon they got kind of fed up with people blaming ISC
time and time again because of a bug in a product which is not theirs.
However, some time ago, they repeatedly made the effort to help people out
with this one, see below:

:Article 7943 of comp.unix.i386:
:Path: al.ele.tue.nl!svin02!hp4nl!mcsun!uunet!isis!ico!dougp
:From: dougp@ico.isc.com (Doug Pintar)
:Newsgroups: comp.unix.i386
:Subject: Re: ISC 2.0.2: "PANIC athd_recvdata: LOGIC ERROR missing MEMBREAK"
:Message-ID: <1990Aug27.161420.11723@ico.isc.com>
:Date: 27 Aug 90 16:14:20 GMT
:References: <9008171007.aa06635@PARIS.ICS.UCI.EDU> <483@comcon.UUCP>
:Reply-To: dougp@ico.ISC.COM (Doug Pintar)
:Organization: Interactive Systems Corp., Boulder CO
:Lines: 27
:
:In article <483@comcon.UUCP> tim@comcon.UUCP (Tim Brown) writes:
:>The WD1006SVR2 has a known history of locking up with ISC. Something
:>to do with the card being unable to recover from errored seek aheads.
:>As I understand it the 1006 does multiple seeks and if one fails it
:>*should* go back and do single seeks, which it doesn't do correctly
:>thus it locks up. BTW, ISC *said* they would have a work around in
:>the 2.2 kernel. They don't. I have seen the same thing with 2.2.
:>The only fix I know of is to get another controller.
:
:Well, this is kinda sorta true. The HPDD, by default, WILL do overlapped
:seeks on multi-drive AT-controller systems. To pull this stunt off, it counts
:on the controller doing retries if a data transfer request gets a 'drive busy'
:error when it's still performing a previously-requested seek operation. The
:WD series of controllers had been able to do this since the 1001, 'way back
:when. Somehow it got broken in some revs of the 1006. The fix, which has
:been around as long as the HPDD and *requires* no change in the product, is
:to change the file /etc/conf/pack.d/dsk/space.c AFTER configuring the HPDD
:for your system. In the 'disk_config_tbl' entry for either a primary or
:secondary AT hard disk is a line that looks like:
:	(CCAP_RETRY | CCAP_ERRCOR),	/* capabilities */
:change this to be:
:	(CCAP_RETRY | CCAP_ERRCOR | CCAP_NOSEEK),	/* capabilities */
:to cripple the overlapped seek stuff. This should fix the problem if it's
:really a multi-drive seek condition that's doing you in.
:
:Good luck,
:DLP

>Is there anybody out there who is or was stuck with the same problems
>(and possibly knows how to solve them other than buying new hardware)?

Hey, of course! Have faith in the net! The patch mentioned above _does_
circumvent the bug in your hardware. It worked for me.

Have a look at your HD controller. If it says 'PROTO' on one of the big
chips, it's very likely that you have a buggy version. This was at least the
case when I gathered lots of reactions about this topic: all systems which
had such a controller suffered from lockups; all people who had a controller
without 'PROTO' stated that they had never noticed any problem like that.

-Raymond
--
| Raymond X.T. Nijssen  | Eindhoven Univ. of Technology                       |
| raymond@es.ele.tue.nl | EH 7.13, PO 513, 5600 MB Eindhoven, The Netherlands |
| "Don't put that on the wall in a tax-payer supported museum!" Pat Buchanan |
gemini@geminix.in-berlin.de (Uwe Doering) (06/18/91)
rcbarn@rwc.urc.tue.nl (Raymond Nijssen) writes:

>Here we go again:
>
>The second problem, the one with the HD controller locking up under ISC
>is known to occur in combination with Western Digital's WD1006Vsr2 RLL
>controller; it contains a bug which only occurs if 2 drives are
>attached to it and if the HD device driver has the 'overlapped-seeks'
>feature enabled, like the one in ISC/ix has by default. It does not
>happen with ESDI.

Not so fast, please. That CCAP_NOSEEK fix may be a cure for some problems
people have with ISC's HD driver, but not for all. I had problems for
months with the following HD/controller combination:

Adaptec 2322D ESDI controller
Fujitsu 2249E ESDI HD drive (320 MB formatted)

I had two of those drives. My news partition was on the second one.
Every one or two days the second disk wasn't readable any more. That is,
it was still accessible by the controller and was at least seeking and
recalibrating while the HD driver tried to read a sector. But no sector
on the whole HD was readable/writable any more. Note that the UNIX
kernel wasn't hung or the like. It only couldn't read or write the
second drive, but was otherwise healthy. After shutting down UNIX,
the computer had to get a hardware reset. A soft boot wasn't enough
to cure this problem.

In the months during which I tried to fix this problem I swapped
everything. I changed the second HD drive to another model (Fujitsu
2263E, 620 MB). Same problem. I then used two controllers, one for every
HD drive. Didn't help. I got a new cable set. I even exchanged the power
supply and for some time had each HD drive on its own power supply.
Nothing helped. The only thing I didn't replace was the main board. But
how could a main board decide that it only wants to disable the second
HD drive? Remember, the problem was the same with both drives on the
same controller _and_ with each drive on its own controller. And of
course I tried the CCAP_NOSEEK trick. But it didn't help either.

So what's left? It can't be an incompatibility between the Adaptec
controller and the Fujitsu HD drives, because the first drive was the
same model as (originally) was the second drive, but this malfunction
never happened with the first drive.

After experiencing all that, I don't have any other choice but to
suspect that this is a bug in ISC's HD driver. I recently switched
to SCSI HD drives, and the problem was gone, as was to be expected.

     Uwe
--
Uwe Doering  |  INET : gemini@geminix.in-berlin.de
Berlin       |----------------------------------------------------------------
Germany      |  UUCP : ...!unido!fub!geminix.in-berlin.de!gemini
edhall@rand.org (Ed Hall) (06/19/91)
In article <rcbarn.677172100@rwc.urc.tue.nl> rcbarn@urc.tue.nl writes:
>marzusch@odiehh.hanse.de (Ralph-Diether Marzusch) writes:
>>2nd problem:
>>  If you connect two (!) hard disk drives to one (or even two) `standard'
>>  AT type hard disk controller (i.e. MFM, RLL or ESDI drives) you may
>
>This statement is *very* inaccurate!! see below.
>
>>  experience a `hanging' disk controller (making any further disk accesses
>>  impossible which possibly destroys one or more file systems) when both
>>  disks are accessed concurrently
>>
>>There seem to be no solutions to these problems, just some workarounds:
>
>This statement is even more inaccurate!! see below.

It might be. I've had infrequent but persistent lockup problems with
ISC and a WD1006SR2 and a *SINGLE* disk.

>That does it! Oh Connor, if you're reading this, please add this issue to the
>FAQ list!!
>
>Here we go again:
>
>The second problem, the one with the HD controller locking up under ISC
>is known to occur in combination with Western Digital's WD1006Vsr2 RLL
>controller; it contains a bug which only occurs if 2 drives are
>attached to it and if the HD device driver has the 'overlapped-seeks'
>feature enabled, like the one in ISC/ix has by default. It does not
>happen with ESDI.

Wrong, at least in my case and in the case of some other folks who have
written that they have the lockup problem. THERE ARE TWO DIFFERENT
PROBLEMS WITH THE WD1006SR2 AND UNIX. The second problem doesn't even
seem to be limited to ISC; there was a fellow running ESIX who posted
a complaint about it a few months back. I exchanged messages with
him and verified that it had nothing to do with the overlapped-seek
problem.

The overlapped-seek problem bites much faster, and tends to cause bad
data to be read and/or written to one or both disks. The lockup problem
generally happens after hours of heavy use, especially when a lot of
expansion swaps are taking place. (Adding memory almost completely
cured my problem--from a few lockups a week to one every few months.)
The symptom is that the disk light comes on and stays on (this is
similar to what can happen with the overlapped-seek problem); after
a reset, damage to filesystems is generally limited to unflushed
buffers (which differs from the overlapped-seek problem, which
frequently trashes one or more filesystems).

The clincher is that the lockup problem can happen with just a single disk.

As you state, there is a relatively simple fix for the overlapped-seek
problem--simply disable overlapped seeks. I wish that the
solution for the lockup problem were so simple.

		-Ed Hall
		edhall@rand.org
rcbarn@rwa.urc.tue.nl (Raymond Nijssen) (06/19/91)
edhall@rand.org (Ed Hall) writes:

>In article <rcbarn.677172100@rwc.urc.tue.nl> rcbarn@urc.tue.nl writes:
>>marzusch@odiehh.hanse.de (Ralph-Diether Marzusch) writes:
>>>  If you connect two (!) hard disk drives to one (or even two) `standard'
>>>  AT type hard disk controller (i.e. MFM, RLL or ESDI drives) you may
>>>  experience a `hanging' disk controller (making any further disk accesses
>>>  impossible which possibly destroys one or more file systems) when both
>>>  disks are accessed concurrently

>It might be. I've had infrequent but persistent lockup problems with
>ISC and a WD1006SR2 and a *SINGLE* disk.

Although this does not sound like the problem the original poster
addressed, I'm very interested in whether your controller has a chip with
'PROTO' on it, as I've never heard of this problem. Could you please have
a look?

>The clincher is that the lockup problem can happen with just a single disk.

>As you state, there is a relatively simple fix for the overlapping
>seek problem--simply disable overlapping seeks. I wish that the
>solution for the lockup problem were so simple.

Well, it worked perfectly for me. I never experienced the second kind of
lockups, even when I was running X on my system with just 4 MB, swap
partitions on both disks and heavy, almost continuous swapping. This is
why I still believe that the solution for the lockup problem is that
simple. (For your info: without this fix, the system would crash within
two or three minutes after having started X.)

I have now had another WD1006Svr2 (without the infamous 'PROTO' message)
for some months, with overlapped seeks re-enabled, and the system has
never panicked since.

-Raymond
--
| Raymond X.T. Nijssen  | Eindhoven Univ. of Technology                       |
| raymond@es.ele.tue.nl | EH 7.13, PO 513, 5600 MB Eindhoven, The Netherlands |
| "Don't put that on the wall in a tax-payer supported museum!" Pat Buchanan |
rcbarn@rwa.urc.tue.nl (Raymond Nijssen) (06/19/91)
gemini@geminix.in-berlin.de (Uwe Doering) writes:

>rcbarn@rwc.urc.tue.nl (Raymond Nijssen) writes:
>>
>>The second problem, the one with the HD controller locking up under ISC
>>is known to occur in combination with Western Digital's WD1006Vsr2 RLL
>>controller; it contains a bug which only occurs if 2 drives are
>>attached to it and if the HD device driver has the 'overlapped-seeks'
>>feature enabled, like the one in ISC/ix has by default. It does not
>>happen with ESDI.

>Not so fast, please. That CCAP_NOSEEK fix may be a cure for some problems
>people have with ISC's HD driver, but not for all.

Sure, but the original posting was about lockups occurring when two drives
were accessed simultaneously. As far as I know, this fix is only useful to
circumvent a bug in some WD1006Svr2 controllers.

>I had problems for months with the following HD/controller combination:
>[...]
>Adaptec 2322D ESDI controller
>Fujitsu 2249E ESDI HD drive (320 MB formatted)
>[ .... lots of detailed info omitted .... ]

Hear! Hear! This is the degree of detail useful to all of us.

> After experiencing all that, I don't have any other choice but to
> suspect that this is a bug in ISC's HD driver.

Maybe, maybe not: when I had trouble with my WD1006, I decided to go
for Adaptec's 2372 rev. C. I tried 2 (well, actually 3, but one was DOA) of
them, and neither worked as expected, each failing in a different way,
however. I was quite mad at Adaptec at that time, especially when I got
piles of reactions from other people on the net who had similar experiences
with this revision (rev. B seemed to work fine).

Some months thereafter, it turned out that there was a bug in my C&T AMI
clone motherboard, as my newly purchased Wangtek tape streamer controller
caused the system to hang (even without a panic message) under ISC, though
not under AT&T UNIX. It might very well be that the Adaptec controllers were
not to blame for the problems I had with them.

-Raymond
--
| Raymond X.T. Nijssen  | Eindhoven Univ. of Technology                       |
| raymond@es.ele.tue.nl | EH 7.13, PO 513, 5600 MB Eindhoven, The Netherlands |
| "Don't put that on the wall in a tax-payer supported museum!" Pat Buchanan |
jdeitch@jadpc.cts.com (Jim Deitch) (06/20/91)
In article <113@odiehh.hanse.de> marzusch@odiehh.hanse.de writes:
>
> This is a summary of a discussion regarding certain floppy and hard disk
> problems under ISC UNIX (2.0.2 and 2.2.1). This topic has recently been
> discussed in a local german newsgroup. Since nobody came up with a solution
> to this problem I'm summarizing and repeating it here:
>
>
>Due to additional hints and experiences from gemini@geminix.in-berlin.de
>(Uwe Doering) and tik@abqhh.hanse.de (Michael Havemester) I found two problems
>with ISC's hard disk and floppy drivers (possibly not related to each other):
>
>2nd problem:
>  If you connect two (!) hard disk drives to one (or even two) `standard'
>  AT type hard disk controller (i.e. MFM, RLL or ESDI drives) you may
>  experience a `hanging' disk controller (making any further disk accesses
>  impossible which possibly destroys one or more file systems) when both
>  disks are accessed concurrently (to reproduce this problem try to enable
>  an additional swap partition on the second volume, this will cause lots
>  of disk accesses on both disks when the system starts paging or swapping).
>
>There seem to be no solutions to these problems, just some workarounds:
>
>workarounds for 2nd problem:
>  a) connect only *one* disk to your system
>  b) throw away your `standard' disk controller (and the disks connected to
>     it) and get a SCSI controller and SCSI disk [a good decision anyway, but
>     quite expensive if you already have two other disks ...]
>  c) try another motherboard ...
>

I must have someone watching over me then. For the last year I have
been running a system with 2 MFM controllers with 2 drives each and
haven't had any of the problems you describe show up. Both
controllers are WD1006, 1 with floppy (MM2) and 1 without (MM1).

In the last 6 months I moved over to a bigger system. I now have 1
ESDI (WD1007-SV2), 1 MFM (WD1006-MM1), and an Adaptec 1542A in the
system. All 3 controllers have 2 drives, with the Adaptec also having
a tape drive. Still no problem.

I run Cnews on the system and have the spool and bin directories on
different drives on the same controller (MFM) and haven't had any
problems. I have 2 swap areas defined, on ESDI drives 1 and 2, and
haven't had a problem.

What type of controllers and motherboard are you using?

Jim
--
ARPANET:    jadpc!jdeitch@nosc.mil
INTERNET:   jdeitch@jadpc.cts.com
UUCP:	    nosc!jadpc!jdeitch