zeeff@b-tech.UUCP (Jon Zeeff) (04/02/88)
I recently spent lots of time trying to get a new drive and controller
installed on my Everex '386 machine running Microport '386 unix 2.2. I had
lots of problems and learned the following.

Dcopy doesn't seem to work. A dcopy from one 4096 drive to another seemed to
work ok, but fsck found many errors (too many to fix).

When using a WD1006-WAH controller, the system will hang if it encounters a
drive error.

Drive error messages don't list the drive #. It would also be nicer if they
listed the block number for use with mkpart -A.

In the install process, the -V and -v options don't work. You must enter
all the bad sectors by hand and hope that the list supplied with the drive
is complete. Mkpart will find bad sectors, but it won't mark them as bad.

Uport unix doesn't seem to reset the disk and try again when it encounters a
disk error.

Uport promises that people still waiting for the non beta version of Merge,
which they were promised 30 days ago, will be shipped a copy in two weeks.
We'll see.

When splitting a drive between dos and unix, there doesn't seem to be a way
to have the first portion of the drive used for unix and the second used for
dos. I wanted to do this because the first portion of the drive had fewer
bad sectors.

Hopefully I'm wrong on some of these things.

On a 4096 drive, a WD1006 controller does about 235k/sec with 1:1
interleave. A normal controller with 1:3 does about 125k/sec. Both
tests were done with "/bin/time cp /dev/dsk/0s1 /dev/null" and using
real time on an unloaded machine.

Here is the bfi<->sector chart I came up with for 1:3 interleave. I have no
idea if it is correct.

Sector   Bfi
   1      100+
   2     1900+
   3     3700+
   4     5500+
   5     7300+
   6     9100+
   7      700+
   8     2500+
   9     4300+
  10     6100+
  11     7900+
  12     9700+
  13     1300+
  14     3100+
  15     4900+
  16     6700+
  17     8500+

--
Jon Zeeff               Branch Technology,
uunet!umix!b-tech!zeeff zeeff%b-tech.uucp@umix.cc.umich.edu
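The numbers in the chart above follow a simple pattern: with 17 sectors per track and 1:3 interleave, logical sector n appears to sit in physical slot 3(n-1) mod 17, with slots spaced 600 bytes apart and the first one starting about 100 bytes from index. The Python sketch below reproduces the chart under those assumptions. The 600-byte spacing and 100-byte offset are inferred from the posted values only, not taken from any Microport or drive documentation, and the poster himself was not sure the chart is right, so treat this as an illustration of the pattern and not as a reference table.

```python
SECTORS_PER_TRACK = 17   # the chart lists sectors 1..17
SLOT_BYTES = 600         # ASSUMPTION: spacing between physical slots, inferred from the chart
FIRST_SLOT_BFI = 100     # ASSUMPTION: offset of the first slot from index, inferred from the chart


def bfi_for_sector(sector, interleave=3):
    """Approximate starting BFI (bytes from index) of a 1-based logical sector."""
    # Consecutive logical sectors are placed `interleave` physical slots apart,
    # wrapping around the track.
    slot = ((sector - 1) * interleave) % SECTORS_PER_TRACK
    return FIRST_SLOT_BFI + SLOT_BYTES * slot


def bfi_chart(interleave=3):
    """Return [(sector, bfi), ...] for every sector on the track."""
    return [(s, bfi_for_sector(s, interleave))
            for s in range(1, SECTORS_PER_TRACK + 1)]


if __name__ == "__main__":
    print("Sector   Bfi")
    for sector, bfi in bfi_chart():
        print(f"{sector:5d}   {bfi:5d}+")
```

Calling `bfi_chart(1)` gives the same layout for a 1:1 format, where sector n simply occupies slot n-1.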
james@bigtex.uucp (James Van Artsdalen) (04/04/88)
In article <4387@b-tech.UUCP>, zeeff@b-tech.UUCP (Jon Zeeff) wrote:
> Dcopy doesn't seem to work. A dcopy from one 4096 drive to another seemed to
> work ok, but fsck found many errors (too many to fix).

Bet you were bit by the dual-drive-failure bug. In my experience, that bug
is still with us on the 386: it just doesn't print the error message any more.
I had trouble with the WD1003 and WD1006: don't have a second drive to test
the WD1007 with.

> When using a WD1006-WAH controller, the system will hang if it encounters a
> drive error.

I had this problem too. Pretty much prevents you from using any drive that
does not have the manufacturer's bad sector list. I did not determine whether
the fault was with the WD1006 or uPort's hd driver (but guess which I suspect).
The problems went away once I corrected the bad sector table for 1:1 interleave
(see last paragraph below: INSTALL makes dumb assumptions).

> In the install process, the -V and -v options don't work. You must enter
> all the bad sectors by hand and hope that the list supplied with the drive
> is complete. Mkpart will find bad sectors, but it won't mark them as bad.

In my experience, the manufacturer-supplied bad track list is complete, as
their analog equipment will find anything that a simple write-read test might
hope to find. As an aside, I seriously question the accuracy of testing any
drive via mkpart or any post-manufacture test: it might be all you can do, but
it may also give you a false sense of security.

A more serious related problem is that uPort does not appear to permit more
than 62 bad sectors per drive. On a big disk where the manufacturer only gives
the bad track numbers (or if you run third party test programs that return
only track numbers), you can quickly hit this number at 17 sec/trk. I
understand the desire to limit the size of the alternates table, but not at
the cost of being unable to use a drive (perhaps a binary, not linear, search
of the alternates table is indicated?).

> Uport unix doesn't seem to reset the disk and try again when it encounters a
> disk error.

Is this related to the WD1006 problem reported above? I assume so.

> On a 4096 drive, a WD1006 controller does about 235k/sec with 1:1
> interleave. A normal controller with 1:3 does about 125k/sec. Both
> test were done with "/bin/time cp /dev/dsk/0s1 /dev/null" and using
> real time on a unloaded machine.

The same command gave me 38.3 real, 0.1 user and 23.8 sys with a WD1007/WA2
and a Compaq-damaged CDC Wren III. Didn't bother to kill cron or anything, so
it was "unloaded" only in that no one was doing anything. That comes out to
327K/sec. Don't know how much time it takes to switch heads, so don't know
what the theoretical maximum rate is, though it's probably less than three
times that value (for an ESDI drive that's really 34 sec/trk - WD1007 emulates
17 sec/trk).

> Here is the bfi<->sector chart I came up with for 1:3 interleave. I have no
> idea if it is correct.
>
> Sector Bfi
> [ table deleted ]

The table shown did not match the table on page 12 of the "Installation Notes
for Runtime System" that came with my documentation. I'll mail the correct
table to anyone who sends mail (to jva@astro.as.utexas.edu: killer's situation
is probably eating my mail).

Be aware that the INSTALL script on the Build disk assumes that if you don't
have a Televideo, you're using 3:1 interleave. Dumb assumption with the
WD1006 or WD1007 (ie, Compaq 386/20 with the 150meg hard disk or PC's Ltd
with the 300meg drive). You have to modify the build disk to use 1:1
interleave and have the bad sectors marked correctly. Send to address in
above paragraph for details...

--
James R. Van Artsdalen   jva@astro.as.utexas.edu   "Live Free or Die"
Home: 512-346-2444 Work: 328-0282; 110 Wild Basin Rd. Ste #230, Austin TX 78746
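On the 62-entry limit mentioned above: at 17 sec/trk, 62 entries cover only three whole bad tracks (3 x 17 = 51; a fourth would need 68). The post suggests a binary instead of a linear search of the alternates table as a way to afford a larger one. The Python sketch below shows what such a lookup could look like. The table layout (a sorted list of bad sector numbers with a parallel list of replacements) and the names `find_alternate`, `bad_sectors` and `alternates` are hypothetical illustrations, not a description of uPort's hd driver.

```python
from bisect import bisect_left


def find_alternate(bad_sectors, alternates, sector):
    """Return the replacement for `sector`, or `sector` itself if it is good.

    ASSUMED layout (not uPort's actual table format):
      bad_sectors  sorted ascending list of absolute bad sector numbers
      alternates   alternates[i] is the spare sector standing in for bad_sectors[i]
    """
    # Binary search: O(log n) comparisons instead of walking the whole table.
    i = bisect_left(bad_sectors, sector)
    if i < len(bad_sectors) and bad_sectors[i] == sector:
        return alternates[i]
    return sector


if __name__ == "__main__":
    bad = [40, 1017, 52000]            # made-up example defects
    alt = [900000, 900001, 900002]     # made-up spare sectors
    print(find_alternate(bad, alt, 1017))   # remapped to its spare
    print(find_alternate(bad, alt, 41))     # good sector, returned unchanged
```

With a sorted table, each lookup needs roughly log2(n) comparisons, so growing the table from 62 entries to several hundred adds only a few comparisons per I/O request.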
karl@ddsw1.UUCP (Karl Denninger) (04/05/88)
In article <1446@bigtex.uucp> james@bigtex.UUCP (James Van Artsdalen) writes:
>IN article <4387@b-tech.UUCP>, zeeff@b-tech.UUCP (Jon Zeeff) wrote:
>> Dcopy doesn't seem to work. A dcopy from one 4096 drive to another seemed to
>> work ok, but fsck found many errors (too many to fix).
>
>Bet you were bit by the dual-drive-failure bug. To my experience, that bug
>is still with us on the 386: it just doesn't print the error message any more.
>I had trouble with the WD1003 and WD1006: don't have a second drive to test
>the WD1007 with.

Yep; this I have seen on everything from the Televideo systems to a Generic
WA2 to whatever.. The strange thing is that it's not consistent; on one
system it will occur, on another nearly *identical* one it will not. Strange.

Xenix works great on both, by the way...

>> When using a WD1006-WAH controller, the system will hang if it encounters a
>> drive error.
>
>> Uport unix doesn't seem to reset the disk and try again when it encounters a
>> disk error.
>
>Is this related to the WD1006 problem reported above? I assume so.

Not necessarily. Tatung WA2 "compatible" controllers blow up in the same
manner; the system just goes to sleep.

Uport has also done something even worse to me once or twice; after the first
disk error, EVERY WRITE after that point was junked. Guess how much of my
disk was left by the time I figured that one out and hit <reset>?

>Be aware that the INSTALL script on the Build disk assumes that if you don't
>have a Televideo, you're using 3:1 interleave. Dumb assumption with the
>WD1006 or WD1007 (ie, Compaq 386/20 with the 150meg hard disk or PC's Ltd
>with the 300meg drive). You have to modify the build disk to use 1:1
>interleave and have the bad sectors marked correctly. Send to address in
>above paragraph for details...

There's more.... From what I can see if you DO say you have a Televideo the
system does some strange things as well.

CORETEST reports 450K/second transfer when the Televideo system has been
formatted at 1:1 under MSDOS. You can't prep the disk low-level under DOS if
you're going to use it with UNIX; seems as though you *MUST* low-level format
to get the bad-track table on there (so says their tech support... why?). In
any event, the formatter goes ahead and uses 2:1 interleave, with NO CHOICE!
AARRGGHHH!!!! Only 240K/second transfer rate results, 1/2 what the system is
capable of.

It would be nice if we could use the niceties of the hardware....

----
Karl Denninger                 |  Data: +1 312 566-8912
Macro Computer Solutions, Inc. | Voice: +1 312 566-8910
...ihnp4!ddsw1!karl            | "Quality solutions for work or play"
james@bigtex.uucp (James Van Artsdalen) (04/07/88)
In article <924@ddsw1.UUCP>, karl@ddsw1.UUCP (Karl Denninger) wrote:
> [...] You can't prep the
> disk low-level under DOS if you're going to use it with UNIX; seems as
> though you *MUST* low-level format to get the bad-track table on there
> (so says their tech support... why?). In any event, the formatter goes
> ahead and uses 2:1 interleave, with NO CHOICE! AARRGGHHH!!!! Only
> 240K/second transfer rate results, 1/2 what the system is capable of.

For the record: I low-level format my disks with Western Digital's programs,
not with uPort's. This can be inconvenient. To accomplish this, you must
modify the INSTALL script on a copy of the build disk. Change the interleave
to the correct value. If you don't, the bad block table is created with the
wrong values and your system is about to hang...

Karl, are you sure it formatted at 2:1 interleave? The INSTALL script on my
disks assumes 1:1 for Televideo and 3:1 for all others. I don't know of any
way to determine the actual interleave after formatting.

If all else fails, it is possible to manually build the /etc/partitions file
on the build disk and then use "mkpart -i disk0" to initialize the VTOC. This
is a pain, but can be done. For my own use I now have a modified INSTALL
script that uses the /etc/partitions file on the build disk, and have put "ed"
on the build disk so I can edit the partitions file on floppy.

--
James R. Van Artsdalen   jva@astro.as.utexas.edu   "Live Free or Die"
Home: 512-346-2444 Work: 328-0282; 110 Wild Basin Rd. Ste #230, Austin TX 78746
karl@ddsw1.UUCP (Karl Denninger) (04/09/88)
In article <1469@bigtex.uucp> james@bigtex.UUCP (James Van Artsdalen) writes:
>IN article <924@ddsw1.UUCP>, karl@ddsw1.UUCP (Karl Denninger) wrote:
>> [...] You can't prep the
>> disk low-level under DOS if you're going to use it with UNIX; seems as
>> though you *MUST* low-level format to get the bad-track table on there
>> (so says their tech support... why?). In any event, the formatter goes
>> ahead and uses 2:1 interleave, with NO CHOICE! AARRGGHHH!!!! Only
>> 240K/second transfer rate results, 1/2 what the system is capable of.
>
>For the record: I low-level format my disks with Western Digital's programs,
>not with uPort's. This can be inconvenient. To accomplish this, you must
>modify the INSTALL script on a copy of the build disk. Change the interleave
>to the correct value. If you don't, the bad block table is created with the
>wrong values and your system is about to hang...
>
>Karl, are you sure it formatted at 2:1 interleave? The INSTALL script on my
>disks assumes 1:1 for Televideo and 3:1 for all others. I don't know of any
>way to determine the actual interleave after formatting.

Actually it's quite trivial; run CORETEST, which measures the transfer rate,
and compute from there. It's given reliable numbers for us so far; with a
2:1 interleave (the Uport/386 drive) you see about 240k/second through the
drive system, while the OTHER drive, which was formatted with DOS, reports
the (correct) 490K/second transfer rate.

Your comment above about the "wrong" bad track entries is interesting, and
may lead to the FINAL reason for this discrepancy.... When I tried to
install on a HD prepped from Speedstor (my favorite) at 1:1, installation
began ok, then blew up with what was an obvious R/W error on the drive.
Now, I DID specify where the bad spots were during installation; is it
possible that Uport has blown it w/regards to the mapping of bad regions
on the disk (ie: BFI <> sector number translation) and is using 2:1
tables for the Televideo? If this is the case, then 1:1 using their
installation script is not achievable (although cheating might do it).

>If all else fails, it is possible to manually build the /etc/partitions file
>on the build disk and then use "mkpart -i disk0" to initialize the VTOC. This
>is a pain, but can be done. For my own use I now have a modified INSTALL
>script that uses the /etc/partitions file on the build disk, and have put "ed"
>on the build disk so I can edit the partitions file on floppy.

Does this ALSO take care of badtracking correctly? Technical support at
implied that it had something to do with the formatting process (which
doesn't seem right; I've looked at that partitions file).

Can I then assume that if the file '/etc/partitions' is created on the
floppy I can use 'mkpart -i disk0' to init a low-level formatted HD with the
info in that file? This actually works?

I take it you need to hand-code the defect locations for this as well...

This is a major mess, Microport!

-----
Karl Denninger                 |  Data: +1 312 566-8912
Macro Computer Solutions, Inc. | Voice: +1 312 566-8910
...ihnp4!ddsw1!karl            | "Quality solutions for work or play"
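The "measure the transfer rate and compute from there" approach described above can be sketched in a few lines of Python. The sketch assumes 512-byte sectors, 17 sectors per track and a 3600 rpm spindle; the rpm figure in particular is an assumption typical of drives of this class and is not stated anywhere in the thread. Under those assumptions a 1:1 format can deliver at most 17 x 512 x 60 bytes per second, and an n:1 format about 1/n of that, so the nearest nominal rate suggests the interleave.

```python
SECTOR_BYTES = 512  # ASSUMPTION: standard 512-byte sectors


def nominal_rate_kb(sectors_per_track=17, rpm=3600, interleave=1):
    """Best-case sequential rate in KB/s for one track, ignoring head switches and seeks.

    ASSUMPTION: rpm=3600 is a typical value, not one given in the thread.
    """
    revs_per_sec = rpm / 60
    # With n:1 interleave it takes n revolutions to read one full track.
    return sectors_per_track * SECTOR_BYTES * revs_per_sec / interleave / 1024


def estimate_interleave(measured_kb, sectors_per_track=17, rpm=3600, max_interleave=6):
    """Pick the interleave whose nominal rate is closest to the measured rate."""
    return min(
        range(1, max_interleave + 1),
        key=lambda n: abs(nominal_rate_kb(sectors_per_track, rpm, n) - measured_kb),
    )


if __name__ == "__main__":
    for rate in (490, 240):
        print(rate, "KB/s -> about", f"{estimate_interleave(rate)}:1")
```

This only makes sense for a raw benchmark figure such as CORETEST's. A number that includes filesystem or `cp` overhead, like the 235k/sec measured at 1:1 in the first post, would be misread as 2:1.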
james@bigtex.uucp (James Van Artsdalen) (04/13/88)
In article <946@ddsw1.UUCP>, karl@ddsw1.UUCP (Karl Denninger) wrote:
> >Karl, are you sure it formatted at 2:1 interleave? The INSTALL script on my
> >disks assumes 1:1 for Televideo and 3:1 for all others. I don't know of any
> >way to determine the actual interleave after formatting.

> Actually it's quite trivial; run CORETEST, which measures the transfer rate,
> and compute from there. [...]

Hmmm. I had hoped that the various buffering controllers always read the
track in one rotation by simply reading whatever came under the head, instead
of reading the sectors in order. That would have cut the rotational latency
in half, and made the interleave irrelevant. Oh well, maybe the next
generation of controllers...

> When I tried to
> install on a HD prepped from Speedstor (my favorite) at 1:1, installation
> began ok, then blew up with what was an obvious R/W error on the drive.
> Now, I DID specify where the bad spots were during installation; is it
> possible that Uport has blown it w/regards to the mapping of bad regions
> on the disk (ie: BFI <> sector number translation) and is using 2:1
> tables for the Televideo? If this is the case, then 1:1 using their
> installation script is not achievable (although cheating might do it).

Well, what I did that worked is as follows:

1) back up old disk to tape
2) make copy of build disk.
3) put stripped /unix with tape driver on the floppy along with ed, tar & ls.
4) edit /INSTALL on floppy so that "intlv" is always 1, and disable patches
   to kernel (because I stripped the kernel in #3 and it isn't going to work).
5) format disk under DOS.
6) boot the floppy & go like it says until it boots the hard disk.
7) boot the floppy again, mount hard disk partitions & restore from tape.

> Does this ALSO take care of badtracking correctly? Technical support at
> implied that it had something to do with the formatting process (which
> doesn't seem right; I've looked at that partitions file).

Setting intlv correctly appears to take care of badtracking. I've hedged my
bets here as I'm not sure of the cause of Karl's problem. But in my case it
worked. I was told by John Sully that the disksetup program does use the
intlv value to calculate the mapping from defect BFI to sector number, so if
intlv doesn't match the interleave, you will have trouble. I have found no
reason so far to use their formatting process, and have not done so at any
time with unix/386.

> Can I then assume that if the file '/etc/partitions' is created on the
> floppy I can use 'mkpart -i disk0' to init a low-level formatted HD with the
> info in that file? This actually works?

If you do this by hand, the order can be important. The correct order is:

1. mkpart -i disk0
2. fdisk /dev/rdsk/0s0 <fdisk.data
3. mkpart -P rootus -P swap -P reserved -P alts disk0
4. mkpart -P usr -P usr2 -P dos (whichever apply)

Order isn't important here:

5. mkfs & labelit on rootus, usr and usr2 (whichever apply)
   The mkfs.data file has the constants to use.
6. Create & enlarge the lost+found directories.

At this point I always restore from tape, but you probably can:

7. mount /dev/dsk/0s0 /mnt
8. find /dev /bin /etc /shlib /unix /tmp -print | cpio -pdmau /mnt
9. >/mnt/etc/mnttab
10. mkdir /mnt/mnt /mnt/usr2

The permissions on files need inspection at this point, but it is entirely
possible to set up a disk without using the INSTALL script, using steps 1-6.

One thing though that can't be overemphasized: the /etc/partitions file on a
disk MUST match exactly the partitions on both hard disks. If not, then next
time mkpart is run it apparently tries to reset the VTOC (both internal and
on disk) with predictable results (ie, have your backup handy). Never let a
partitions file wander into /etc unless it's the same file that did the
mkpart -i to your disk...

> I take it you need to hand-code the defect locations for this as well...

No, the disksetup program *appears* to work just fine so long as the intlv
value matches reality. I let it calculate the values when I was using the
WD1006. Fortunately, the WD1007 ESDI controller remaps things around and ESDI
drives appear to have no defects to the software (there's a spare sector every
34 or so for this purpose I gather).

--
James R. Van Artsdalen   ...!ut-sally!uastro!bigtex!james   "Live Free or Die"
Home: 512-346-2444 Work: 328-0282; 110 Wild Basin Rd. Ste #230, Austin TX 78746
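To see why a wrong intlv value leads to trouble, here is a small Python sketch of a BFI-to-sector mapping. It is not the disksetup program's actual algorithm, which the thread does not show. It assumes 17 sectors per track and borrows a 600-byte slot spacing and 100-byte first-slot offset from the bfi<->sector chart posted at the start of this thread, a chart that its author was unsure of and that reportedly did not match Microport's printed table. The function name `sector_for_bfi` and all constants are hypothetical.

```python
def sector_for_bfi(bfi, intlv, sectors_per_track=17, slot_bytes=600, first_slot_bfi=100):
    """Map a defect's bytes-from-index value to a 1-based logical sector number.

    ASSUMPTIONS: slot_bytes and first_slot_bfi are inferred from a chart posted
    in this thread, not from Microport documentation.
    """
    # Which physical slot on the track does the defect fall in?
    slot = max(0, bfi - first_slot_bfi) // slot_bytes
    slot = min(slot, sectors_per_track - 1)
    # Logical sector s occupies physical slot ((s - 1) * intlv) mod sectors_per_track.
    for sector in range(1, sectors_per_track + 1):
        if ((sector - 1) * intlv) % sectors_per_track == slot:
            return sector
    raise ValueError("interleave does not visit every slot on the track")


if __name__ == "__main__":
    defect_bfi = 2000  # made-up defect position
    for intlv in (1, 3):
        print(f"intlv={intlv}: BFI {defect_bfi} -> sector {sector_for_bfi(defect_bfi, intlv)}")
```

Under these assumptions the same physical defect lands on sector 4 of a 1:1 track but on sector 2 of a 3:1 track, so a table built with the wrong intlv marks a good sector as bad and leaves the real defect in service.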
neighorn@catlabs.UUCP (Steven C. Neighorn) (04/17/88)
In article <1446@bigtex.uucp> james@bigtex.UUCP (James Van Artsdalen) writes:
>Bet you were bit by the dual-drive-failure bug. To my experience, that bug
>is still with us on the 386: it just doesn't print the error message any more.
>I had trouble with the WD1003 and WD1006: don't have a second drive to test
>the WD1007 with.

I set up a system running dual Toshiba MK56-B drives with an Everex HD
controller (WD compatible) on an Intel 386 motherboard with V/386. This
system has encountered no HD errors in about 8 months of uptime. I might add
that this system supports 16 terminals using two 8-port Digiboards. Normally
there are between 8-10 users on the system during the workday. I am a bit
worried by the reports of dual-HD problems on V/386, even though I have not
encountered any of them yet. Am I lucky? Am I unixing on thin ice? Just what
is going on here?

>I had this problem too. Pretty much prevents you from using any drive that
>does not have the manufacturer's bad sector list. I did not determine whether
>the fault was with the WD1006 or uPort's hd driver (but guess which I suspect).
>The problems went away once I corrected the bad sector table for 1:1 interleave
>(see last paragraph below: INSTALL makes dumb assumptions).

I have had good luck using 3rd (4th?) party hard disk analyzers to find bad
sectors, and then using this information for Microport's bad sector input
table. Manufacturer's tests appear much more demanding than anything user
disk analyzers find. Once in a great while though, these user programs *do*
find legitimate errors the manufacturer's tests do not find.

--
Steven C. Neighorn      ...!tektronix!{psu-cs,reed,ogcvax}!qiclab!catlabs!neighorn
Portland Public Schools "Where we train young Star Fighters to defend the
(503) 249-2000 ext 337   frontier against Xur and the Ko-dan Armada"