[comp.unix.microport] Misc uport bugs and observations

zeeff@b-tech.UUCP (Jon Zeeff) (04/02/88)

I recently spent lots of time trying to get a new drive and
controller installed on my Everex '386 machine running Microport '386 unix
2.2.  I had lots of problems and learned the following.


Dcopy doesn't seem to work.  A dcopy from one 4096 drive to another seemed to
work ok, but fsck found many errors (too many to fix).

When using a WD1006-WAH controller, the system will hang if it encounters a
drive error.

Drive error messages don't list the drive #.  It would also be nicer if they
listed the block number for use with mkpart -A.

In the install process, the -V and -v options don't work.  You must enter
all the bad sectors by hand and hope that the list supplied with the drive
is complete.  Mkpart will find bad sectors, but it won't mark them as bad.

Uport unix doesn't seem to reset the disk and try again when it encounters a
disk error.

Uport promises that people waiting for the non-beta version of Merge,
which they were promised 30 days ago, will be shipped a copy in two
weeks.  We'll see.

When splitting a drive between dos and unix, there doesn't seem to be a way
to have the first portion of the drive used for unix and the second used for
dos.  I wanted to do this because the first portion of the drive had fewer
bad sectors.

Hopefully I'm wrong on some of these things.

On a 4096 drive, a WD1006 controller does about 235k/sec with 1:1
interleave.  A normal controller with 1:3 does about 125k/sec.  Both
tests were done with "/bin/time cp /dev/dsk/0s1 /dev/null" and using
real time on an unloaded machine.

Here is the bfi<->sector chart I came up with for 1:3 interleave.  I have no
idea if it is correct.

Sector		Bfi
1		100+
2		1900+
3		3700+
4		5500+
5		7300+
6		9100+
7		700+
8		2500+
9		4300+
10		6100+
11		7900+
12		9700+
13		1300+
14		3100+
15		4900+
16		6700+
17		8500+
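
The chart above follows a regular pattern, which a short Python sketch can
reproduce.  The constants are inferred from the numbers in the chart itself
and are assumptions, not values from Microport documentation: 17 slots per
track, slots 600 bytes apart, the first slot 100 bytes after the index mark,
and logical sector n placed in slot 3*(n-1) mod 17.  Reproducing the chart
says nothing about whether the chart is what the installer expects.

```python
SECTORS_PER_TRACK = 17   # sectors per track used in the chart
SLOT_BYTES = 600         # assumed spacing between slots, inferred from the chart
FIRST_SLOT_BFI = 100     # assumed offset of the first slot, inferred from the chart


def bfi_chart(interleave):
    """Return {logical sector: lowest BFI of its slot} for one track."""
    chart = {}
    for sector in range(1, SECTORS_PER_TRACK + 1):
        # Consecutive logical sectors sit `interleave` slots apart, wrapping
        # around the track.
        slot = (interleave * (sector - 1)) % SECTORS_PER_TRACK
        chart[sector] = FIRST_SLOT_BFI + SLOT_BYTES * slot
    return chart


if __name__ == "__main__":
    print("Sector\t\tBfi")
    for sector, bfi in bfi_chart(3).items():
        print(f"{sector}\t\t{bfi}+")
```

Calling bfi_chart(1) gives the same layout for a 1:1 format under the same
assumptions.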

-- 
Jon Zeeff           		Branch Technology,
uunet!umix!b-tech!zeeff  	zeeff%b-tech.uucp@umix.cc.umich.edu

james@bigtex.uucp (James Van Artsdalen) (04/04/88)

In article <4387@b-tech.UUCP>, zeeff@b-tech.UUCP (Jon Zeeff) wrote:
> Dcopy doesn't seem to work.  A dcopy from one 4096 drive to another seemed to
> work ok, but fsck found many errors (too many to fix).

Bet you were bit by the dual-drive-failure bug.  To my experience, that bug
is still with us on the 386: it just doesn't print the error message any more.
I had trouble with the WD1003 and WD1006: don't have a second drive to test
the WD1007 with.

> When using a WD1006-WAH controller, the system will hang if it encounters a
> drive error.

I had this problem too.  Pretty much prevents you from using any drive that
does not have the manufacturer's bad sector list.  I did not determine whether
the fault was with the WD1006 or uPort's hd driver (but guess which I suspect).
The problems went away once I corrected the bad sector table for 1:1 interleave
(see last paragraph below: INSTALL makes dumb assumptions).

> In the install process, the -V and -v options don't work.  You must enter
> all the bad sectors by hand and hope that the list supplied with the drive
> is complete.  Mkpart will find bad sectors, but it won't mark them as bad.

To my experience, the manufacturer-supplied bad track list is complete, as
their analog equipment will find anything that a simple write-read test might
hope to find.  As an aside, I seriously question the accuracy of testing any
drive via mkpart or any post-manufacture test: it might be all you can do,
but it may also give you a false sense of security.

A more serious related problem is that uPort does not appear to permit more
than 62 bad sectors per drive.  On a big disk where the manufacturer only gives
the bad track numbers (or if you run third party test programs that return only
track numbers), you can quickly hit this number at 17 sec/trk.  I understand
the desire to limit the size of the alternates table, but not at the cost of
being unable to use a drive (perhaps a binary, not linear, search of the
alternates table is indicated?).
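
To put numbers on the limit and on the search suggestion, here is a rough
Python sketch.  The table layout (a sorted list of bad absolute sector
numbers with a parallel list of alternates) is hypothetical; nothing here
describes how uPort's hd driver actually stores or searches its table.

```python
from bisect import bisect_left

MAX_BAD_SECTORS = 62      # limit reported above
SECTORS_PER_TRACK = 17

# Whole bad tracks that fit before the table is full.
full_tracks = MAX_BAD_SECTORS // SECTORS_PER_TRACK


def remap(sector, bad_sectors, alternates):
    """Return the alternate for `sector`, or `sector` itself if it is good.

    bad_sectors must be sorted ascending; alternates[i] replaces
    bad_sectors[i].  The lookup is a binary search, not a linear scan.
    """
    i = bisect_left(bad_sectors, sector)
    if i < len(bad_sectors) and bad_sectors[i] == sector:
        return alternates[i]
    return sector
```

At 17 sec/trk, only three whole bad tracks fit under a 62-entry limit.  A
binary search needs about six comparisons for 62 entries, and only a few more
if the table were allowed to grow much larger.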

> Uport unix doesn't seem to reset the disk and try again when it encounters a
> disk error.

Is this related to the WD1006 problem reported above?  I assume so.

> On a 4096 drive, a WD1006 controller does about 235k/sec with 1:1
> interleave.  A normal controller with 1:3 does about 125k/sec.  Both
> tests were done with "/bin/time cp /dev/dsk/0s1 /dev/null" and using
> real time on an unloaded machine.

The same command gave me 38.3:real, 0.1:user and 23.8:sys with a WD1007/WA2
and a Compaq-damaged CDC Wren III.  Didn't bother to kill cron or anything,
so it was "unloaded" only in that no one was doing anything.  That comes out
to 327K/sec.  Don't know how much time it takes to switch heads, so don't
know what the theoretical maximum rate is, though it's probably less than
three times that value (for an ESDI drive that's really 34 sec/trk - WD1007
emulates 17 sec/trk).
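
For reference, the rate arithmetic can be written out in Python.  This sketch
takes K to mean 1024 bytes, which the posts do not state; the partition size
is not given either, so it is backed out of the quoted time and rate.

```python
def rate_kps(total_bytes, real_seconds):
    """Transfer rate in K/sec (K = 1024 bytes, an assumption)."""
    return total_bytes / 1024 / real_seconds


def implied_kbytes(rate_k_per_sec, real_seconds):
    """Amount of data, in K, implied by a rate and an elapsed time."""
    return rate_k_per_sec * real_seconds


# 327 K/sec sustained for 38.3 s of real time.
size_k = implied_kbytes(327, 38.3)

# Ratios against the two figures quoted above.
vs_wd1006 = 327 / 235    # WD1006 at 1:1
vs_1to3 = 327 / 125      # normal controller at 1:3
```

The quoted figures imply roughly 12,500 K copied, and a rate about 1.4 times
the WD1006 figure and about 2.6 times the 1:3 figure.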

> Here is the bfi<->sector chart I came up with for 1:3 interleave.  I have no
> idea if it is correct.
> 
> Sector		Bfi
> [ table deleted ]

The table shown did not match the table on page 12 of the "Installation Notes
for Runtime System" that came with my documentation.  I'll mail the correct
table to anyone who sends mail (to jva@astro.as.utexas.edu: killer's situation
is probably eating my mail).

Be aware that the INSTALL script on the Build disk assumes that if you don't
have a Televideo, you're using 3:1 interleave.  Dumb assumption with the
WD1006 or WD1007 (ie, Compaq 386/20 with the 150meg hard disk or PC's Ltd
with the 300meg drive).  You have to modify the build disk to use 1:1
interleave and have the bad sectors marked correctly.  Send to address in
above paragraph for details...
-- 
James R. Van Artsdalen       jva@astro.as.utexas.edu         "Live Free or Die"
Home: 512-346-2444 Work: 328-0282; 110 Wild Basin Rd. Ste #230, Austin TX 78746

karl@ddsw1.UUCP (Karl Denninger) (04/05/88)

In article <1446@bigtex.uucp> james@bigtex.UUCP (James Van Artsdalen) writes:
>In article <4387@b-tech.UUCP>, zeeff@b-tech.UUCP (Jon Zeeff) wrote:
>> Dcopy doesn't seem to work.  A dcopy from one 4096 drive to another seemed to
>> work ok, but fsck found many errors (too many to fix).
>
>Bet you were bit by the dual-drive-failure bug.  To my experience, that bug
>is still with us on the 386: it just doesn't print the error message any more.
>I had trouble with the WD1003 and WD1006: don't have a second drive to test
>the WD1007 with.

Yep; this I have seen on everything from the Televideo systems to a Generic
WA2 to whatever..  The strange thing is that it's not consistent; on one
system it will occur, on another nearly *identical* one it will not.  Strange.
Xenix works great on both, by the way...

>> When using a WD1006-WAH controller, the system will hang if it encounters a
>> drive error.
>
>> Uport unix doesn't seem to reset the disk and try again when it encounters a
>> disk error.
>
>Is this related to the WD1006 problem reported above?  I assume so.

Not necessarily.  Tatung WA2 "compatible" controllers blow up in the same
manner; the system just goes to sleep.  Uport has also done something even
worse to me once or twice; after the first disk error, EVERY WRITE after
that point was junked.  Guess how much of my disk was left by the time I
figured that one out and hit <reset>?

>Be aware that the INSTALL script on the Build disk assumes that if you don't
>have a Televideo, you're using 3:1 interleave.  Dumb assumption with the
>WD1006 or WD1007 (ie, Compaq 386/20 with the 150meg hard disk or PC's Ltd
>with the 300meg drive).  You have to modify the build disk to use 1:1
>interleave and have the bad sectors marked correctly.  Send to address in
>above paragraph for details...

There's more....

From what I can see if you DO say you have a Televideo the system does some
strange things as well.  CORETEST reports 450K/second transfer when the
Televideo system has been formatted at 1:1 under MSDOS.  You can't prep the
disk low-level under DOS if you're going to use it with UNIX; seems as
though you *MUST* low-level format to get the bad-track table on there
(so says their tech support...  why?).  In any event, the formatter goes 
ahead and uses 2:1 interleave, with NO CHOICE! AARRGGHHH!!!!  Only 
240K/second transfer rate results, 1/2 what the system is capable of.

It would be nice if we could use the niceties of the hardware....

----
Karl Denninger                 |  Data: +1 312 566-8912
Macro Computer Solutions, Inc. | Voice: +1 312 566-8910
...ihnp4!ddsw1!karl            | "Quality solutions for work or play"

james@bigtex.uucp (James Van Artsdalen) (04/07/88)

In article <924@ddsw1.UUCP>, karl@ddsw1.UUCP (Karl Denninger) wrote:
> [...]  You can't prep the
> disk low-level under DOS if you're going to use it with UNIX; seems as
> though you *MUST* low-level format to get the bad-track table on there
> (so says their tech support...  why?).  In any event, the formatter goes 
> ahead and uses 2:1 interleave, with NO CHOICE! AARRGGHHH!!!!  Only 
> 240K/second transfer rate results, 1/2 what the system is capable of.

For the record: I low-level format my disks with Western Digital's programs,
not with uPort's.  This can be inconvenient.  To accomplish this, you must
modify the INSTALL script on a copy of the build disk.  Change the interleave
to the correct value.  If you don't, the bad block table is created with the
wrong values and your system is about to hang...

Karl, are you sure it formatted at 2:1 interleave?  The INSTALL script on my
disks assumes 1:1 for Televideo and 3:1 for all others.  I don't know of any
way to determine the actual interleave after formatting.

If all else fails, it is possible to manually build the /etc/partitions file
on the build disk and then use "mkpart -i disk0" to initialize the VTOC.  This
is a pain, but can be done.  For my own use I now have a modified INSTALL
script that uses the /etc/partitions file on the build disk, and have put "ed"
on the build disk so I can edit the partitions file on floppy.
-- 
James R. Van Artsdalen       jva@astro.as.utexas.edu         "Live Free or Die"
Home: 512-346-2444 Work: 328-0282; 110 Wild Basin Rd. Ste #230, Austin TX 78746

karl@ddsw1.UUCP (Karl Denninger) (04/09/88)

In article <1469@bigtex.uucp> james@bigtex.UUCP (James Van Artsdalen) writes:
>In article <924@ddsw1.UUCP>, karl@ddsw1.UUCP (Karl Denninger) wrote:
>> [...]  You can't prep the
>> disk low-level under DOS if you're going to use it with UNIX; seems as
>> though you *MUST* low-level format to get the bad-track table on there
>> (so says their tech support...  why?).  In any event, the formatter goes 
>> ahead and uses 2:1 interleave, with NO CHOICE! AARRGGHHH!!!!  Only 
>> 240K/second transfer rate results, 1/2 what the system is capable of.
>
>For the record: I low-level format my disks with Western Digital's programs,
>not with uPort's.  This can be inconvenient.  To accomplish this, you must
>modify the INSTALL script on a copy of the build disk.  Change the interleave
>to the correct value.  If you don't, the bad block table is created with the
>wrong values and your system is about to hang...
>
>Karl, are you sure it formatted at 2:1 interleave?  The INSTALL script on my
>disks assumes 1:1 for Televideo and 3:1 for all others.  I don't know of any
>way to determine the actual interleave after formatting.

Actually it's quite trivial; run CORETEST, which measures the transfer rate,
and compute from there.  It's given reliable numbers for us so far; with a
2:1 interleave (the Uport/386 drive) you see about 240k/second through the 
drive system, the OTHER drive which was formatted with DOS reports the 
(correct) 490K/second transfer rate.
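
The "compute from there" step can be sketched in Python.  It assumes 17
sectors of 512 bytes per track and a 3600 RPM spindle; the sector size and
spindle speed are not stated in this thread, so treat both as assumptions.
With an N:1 interleave a full track takes N revolutions to read, so the
best-case rate is the 1:1 rate divided by N.

```python
SECTORS_PER_TRACK = 17
BYTES_PER_SECTOR = 512        # assumed
REVS_PER_SECOND = 3600 / 60   # assumed 3600 RPM spindle


def peak_rate_k(interleave):
    """Best-case K/sec (K = 1024) when a track needs `interleave` revolutions."""
    bytes_per_rev = SECTORS_PER_TRACK * BYTES_PER_SECTOR
    return bytes_per_rev * REVS_PER_SECOND / interleave / 1024


def guess_interleave(measured_k, candidates=(1, 2, 3, 4, 5, 6)):
    """Pick the candidate whose peak rate is closest to the measured rate."""
    return min(candidates, key=lambda n: abs(peak_rate_k(n) - measured_k))
```

Under these assumptions the peaks are 510, 255 and 170 K/sec for 1:1, 2:1 and
3:1, so 490K/second points at 1:1 and 240k/second at 2:1.  Head switches,
seeks and operating system overhead are ignored, so a slow system can look
like a higher interleave than it really has.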

Your comment above about the "wrong" bad track entries is interesting, and
may lead to the FINAL reason for this discrepancy.... When I tried to
install on a HD prepped from Speedstor (my favorite) at 1:1, installation
began ok, then blew up with what was an obvious R/W error on the drive.
Now, I DID specify where the bad spots were during installation; is it 
possible that Uport has blown it w/regards to the mapping of bad regions 
on the disk (ie: BFI <> sector number translation) and is using 2:1
tables for the Televideo?  If this is the case, then 1:1 using their
installation script is not achievable (although cheating might do it).

>If all else fails, it is possible to manually build the /etc/partitions file
>on the build disk and then use "mkpart -i disk0" to initialize the VTOC.  This
>is a pain, but can be done.  For my own use I now have a modified INSTALL
>script that uses the /etc/partitions file on the build disk, and have put "ed"
>on the build disk so I can edit the partitions file on floppy.

Does this ALSO take care of badtracking correctly?  Technical support
implied that it had something to do with the formatting process (which
doesn't seem right; I've looked at that partitions file).

Can I then assume that if the file '/etc/partitions' is created on the
floppy I can use 'mkpart -i disk0' to init a low-level formatted HD with the
info in that file?  This actually works?

I take it you need to hand-code the defect locations for this as well...

This is a major mess, Microport!

-----
Karl Denninger                 |  Data: +1 312 566-8912
Macro Computer Solutions, Inc. | Voice: +1 312 566-8910
...ihnp4!ddsw1!karl            | "Quality solutions for work or play"

james@bigtex.uucp (James Van Artsdalen) (04/13/88)

In article <946@ddsw1.UUCP>, karl@ddsw1.UUCP (Karl Denninger) wrote:
> >Karl, are you sure it formatted at 2:1 interleave?  The INSTALL script on my
> >disks assumes 1:1 for Televideo and 3:1 for all others.  I don't know of any
> >way to determine the actual interleave after formatting.

> Actually it's quite trivial; run CORETEST, which measures the transfer rate,
> and compute from there.  [...]

Hmmm.  I had hoped that the various buffering controllers always read the
track in one rotation by simply reading whatever came under the head, instead
of reading the sectors in order.  That would have cut the rotational latency
in half, and made the interleave irrelevant.  Oh well, maybe the next
generation of controllers...

> When I tried to
> install on a HD prepped from Speedstor (my favorite) at 1:1, installation
> began ok, then blew up with what was an obvious R/W error on the drive.
> Now, I DID specify where the bad spots were during installation; is it 
> possible that Uport has blown it w/regards to the mapping of bad regions 
> on the disk (ie: BFI <> sector number translation) and is using 2:1
> tables for the Televideo?  If this is the case, then 1:1 using their
> installation script is not achievable (although cheating might do it).

Well, what I did that worked is as follows:

1) back up old disk to tape
2) make copy of build disk.
3) put stripped /unix with tape driver on the floppy along with ed, tar & ls.
4) edit /INSTALL on floppy so that "intlv" is always 1, and
   disable patches to kernel (because I stripped the kernel in #3 and it
   isn't going to work).
5) format disk under DOS.
6) boot the floppy & go like it says until it boots the hard disk.
7) boot the floppy again, mount hard disk partitions & restore from tape.

> Does this ALSO take care of badtracking correctly?  Technical support
> implied that it had something to do with the formatting process (which
> doesn't seem right; I've looked at that partitions file).

Setting intlv correctly appears to take care of badtracking.  I've hedged my
bets here as I'm not sure of the cause of Karl's problem.  But in my case it
worked.

I was told by John Sully that the disksetup program does use the intlv
value to calculate the mapping from defect BFI to sector number, so if
intlv doesn't match the interleave, you will have trouble.  I have found
no reason so far to use their formatting process, and have not done so at
any time with unix/386.
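
A small Python sketch shows why a wrong intlv misplaces defects.  It is not
the disksetup algorithm, which I have not seen; it reuses the pattern visible
in the 1:3 chart posted earlier in this thread (17 slots per track, 600 bytes
apart, first slot at BFI 100), and all three constants are assumptions.

```python
SECTORS_PER_TRACK = 17
SLOT_BYTES = 600        # assumed, inferred from the chart earlier in the thread
FIRST_SLOT_BFI = 100    # assumed, inferred from the same chart


def sector_for_bfi(bfi, intlv):
    """Map a defect's bytes-from-index to a logical sector number (1-based)."""
    slot = max(0, bfi - FIRST_SLOT_BFI) // SLOT_BYTES
    slot = min(slot, SECTORS_PER_TRACK - 1)
    # Find the logical sector that this interleave places in that slot.
    for sector in range(1, SECTORS_PER_TRACK + 1):
        if (intlv * (sector - 1)) % SECTORS_PER_TRACK == slot:
            return sector
    raise ValueError("no sector maps to this slot for this intlv")
```

With these assumptions a defect at BFI 2000 lands in sector 2 when intlv is 3
but in sector 4 when intlv is 1, so a table built with the wrong value marks
a good sector as bad and leaves the bad one in use.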

> Can I then assume that if the file '/etc/partitions' is created on the
> floppy I can use 'mkpart -i disk0' to init a low-level formatted HD with the
> info in that file?  This actually works?

If you do this by hand, the order can be important.  The correct order is:

1. mkpart -i disk0
2. fdisk /dev/rdsk/0s0 <fdisk.data
3. mkpart -P rootus -P swap -P reserved -P alts disk0
4. mkpart -P usr -P usr2 -P dos				(whichever apply)

Order isn't important here:

5. mkfs  & labelit on rootus, usr and usr2		(whichever apply)
   The mkfs.data file has the constants to use.
6. Create & enlarge the lost+found directories.

At this point I always restore from tape, but you probably can:

7.  mount /dev/dsk/0s0 /mnt
8.  find /dev /bin /etc /shlib /unix /tmp -print | cpio -pdmau /mnt
9.  >/mnt/etc/mnttab
10. mkdir /mnt/mnt /mnt/usr2

The permissions on files need inspection at this point, but it is entirely
possible to set up a disk without using the INSTALL script, using steps 1-6.

One thing though that can't be overemphasized:  the /etc/partitions file on a
disk MUST match exactly the partitions on both hard disks.  If not, then next
time mkpart is run it apparently tries to reset the VTOC (both internal and
on disk) with predictable results (ie, have your backup handy).  Never let
a partitions file wander into /etc unless it's the same file that did the
mkpart -i to your disk...
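
One way to guard against a stray partitions file is to keep a copy of the
file that was used for mkpart -i and compare against it before mkpart is run
again.  A minimal Python sketch of that check follows; the saved-copy path is
hypothetical, and Python is used only to illustrate the comparison.

```python
import hashlib


def file_digest(path):
    """SHA-256 of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def partitions_match(current="/etc/partitions",
                     saved="/etc/partitions.mkpart-i"):  # hypothetical saved copy
    """True only if the live file is byte-identical to the saved copy."""
    return file_digest(current) == file_digest(saved)
```

If partitions_match() returns False, the safe course is to restore the saved
copy before running mkpart.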

> I take it you need to hand-code the defect locations for this as well...

No, the disksetup program *appears* to work just fine so long as the intlv
value matches reality.  I let it calculate the values when I was using the
WD1006.  Fortunately, the WD1007 ESDI controller remaps things around and
ESDI drives appear to have no defects to the software (there's a spare
sector every 34 or so for this purpose I gather).
-- 
James R. Van Artsdalen   ...!ut-sally!uastro!bigtex!james    "Live Free or Die"
Home: 512-346-2444 Work: 328-0282; 110 Wild Basin Rd. Ste #230, Austin TX 78746

neighorn@catlabs.UUCP (Steven C. Neighorn) (04/17/88)

In article <1446@bigtex.uucp> james@bigtex.UUCP (James Van Artsdalen) writes:
>Bet you were bit by the dual-drive-failure bug.  To my experience, that bug
>is still with us on the 386: it just doesn't print the error message any more.
>I had trouble with the WD1003 and WD1006: don't have a second drive to test
>the WD1007 with.

I set up a system running dual Toshiba MK56-B drives with an Everex HD
controller (WD compatible) on an Intel 386 motherboard with V/386. This
system has encountered no HD errors in about 8 months of uptime. I might add
that this system supports 16 terminals using two 8-port Digiboards. Normally
there are between 8-10 users on the system during the workday. I am a bit
worried by the reports of dual-HD problems on V/386, even though I have
not encountered any of them yet. Am I lucky? Am I unixing on thin ice?
Just what is going on here?

>I had this problem too.  Pretty much prevents you from using any drive that
>does not have the manufacturer's bad sector list.  I did not determine whether
>the fault was with the WD1006 or uPort's hd driver (but guess which I suspect).
>The problems went away once I corrected the bad sector table for 1:1 interleave
>(see last paragraph below: INSTALL makes dumb assumptions).

I have had good luck using 3rd (4th?) party hard disk analyzers to find bad
sectors, and then using this information for Microport's bad sector input
table. Manufacturer's tests appear much more demanding than anything user
disk analyzers find. Once in a great while though, these user programs *do*
find legitimate errors the manufacturer's tests do not find.
-- 
Steven C. Neighorn    ...!tektronix!{psu-cs,reed,ogcvax}!qiclab!catlabs!neighorn
Portland Public Schools     "Where we train young Star Fighters to defend the
(503) 249-2000 ext 337          frontier against Xur and the Ko-dan Armada"