[comp.unix.microport] How does Microport System V/AT handle bad blocks?

rd@tarpit.UUCP (Bob Thrush) (12/18/88)

About 3 months ago, the 2nd drive on this System V/AT 2.3.1 system died.
It was replaced, the entire drive was formatted, one partition was
created with /etc/fdisk and 2 file systems were made.  In the past 
month, I have been noticing intermittent "HD I/O Errors ..." often 
followed by serious file system problems on the replacement drive.  
I have searched the printed manuals and man pages and have not 
found any documentation of this error.

Here are a few samples:

HD I/O Error Fun: 30 Cyl: 329 Hd: 5 Sec:  9 Status: 51 Estat: 10 Drstat: A5
HD I/O Error Fun: 20 Cyl: 346 Hd: 3 Sec: 12 Status: 59 Estat: 10 Drstat: B3
HD I/O Error Fun: 30 Cyl: 197 Hd: 4 Sec: 11 Status: 51 Estat: 10 Drstat: A4
HD I/O Error Fun: 30 Cyl: 346 Hd: 2 Sec:  7 Status: 51 Estat: 10 Drstat: B2
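
Nobody in this thread actually decodes these fields. Bob later mentions a rundown by Marc de Groot in <358@uport.UUCP>, but it is not reproduced here. One hedged reading assumes that Fun, Status, Estat and Drstat are the raw hex command, status, error and drive/head register values of a standard IBM AT (WD1003-style) controller. Under that reading, Fun 30 and 20 are write-sectors and read-sectors commands. Status 51 means ready + seek complete + error. Estat 10 means "ID not found", i.e. the controller could not locate the sector header. The low digit of Drstat matches the Hd field in all four samples, which is at least consistent with this reading. The drive-select bit, however, would be 0 in two samples and 1 in the other two, and nothing here explains that. The sketch below decodes a line under these assumptions. The bit names are the usual AT ones, not anything Microport documents in this thread.

```python
import re

# Assumption: the fields are raw hex values of the standard AT (WD1003-style)
# command, status, error and drive/head registers.
LINE = re.compile(
    r"Fun:\s*(?P<fun>[0-9A-Fa-f]+)\s+Cyl:\s*(?P<cyl>\d+)\s+Hd:\s*(?P<hd>\d+)\s+"
    r"Sec:\s*(?P<sec>\d+)\s+Status:\s*(?P<status>[0-9A-Fa-f]+)\s+"
    r"Estat:\s*(?P<estat>[0-9A-Fa-f]+)\s+Drstat:\s*(?P<drstat>[0-9A-Fa-f]+)"
)

COMMANDS = {0x20: "read sectors", 0x30: "write sectors"}
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DWF", 4: "DSC", 3: "DRQ", 2: "CORR", 1: "IDX", 0: "ERR"}
ERROR_BITS = {7: "BBK", 6: "UNC", 4: "IDNF", 2: "ABRT", 1: "TK0NF", 0: "AMNF"}


def bit_names(value, names):
    """Names of the set bits, most significant first."""
    return [names[b] for b in sorted(names, reverse=True) if value & (1 << b)]


def decode(line):
    m = LINE.search(line)
    if m is None:
        raise ValueError("not an HD I/O Error line")
    fun, drstat = int(m["fun"], 16), int(m["drstat"], 16)
    return {
        "command": COMMANDS.get(fun, f"unknown 0x{fun:02X}"),
        "chs": (int(m["cyl"]), int(m["hd"]), int(m["sec"])),
        "status": bit_names(int(m["status"], 16), STATUS_BITS),
        "error": bit_names(int(m["estat"], 16), ERROR_BITS),
        "drive_bit": (drstat >> 4) & 1,   # bit 4: drive select
        "drstat_head": drstat & 0x0F,     # bits 0-3: head number
    }


print(decode("HD I/O Error Fun: 30 Cyl: 329 Hd: 5 Sec:  9 Status: 51 Estat: 10 Drstat: A5"))
```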

Exactly what do these messages mean?  Furthermore, is there a way
to have the messages logged to a file?  If (as I expect) they indicate
disk errors, does System V/AT gracefully switch to alternate areas
in the face of disk write errors?  

How does the bad block mechanism work?  If bad block mapping is not 
done automatically, how do I translate the above into a badblock update?
How many bad blocks are allowed?  If I have multiple System5 partitions,
how do I enter the initial bad block information for the 2nd and
subsequent partitions?

I would appreciate any help regarding this problem.  Especially
in understanding the bad block mechanism and the meaning of the
HD I/O Errors.  If any information is dependent on a particular
release of System V/AT, please be specific.  I will summarize all
email responses.

For those who wish to read on, I have attached relevant info from
/etc/fdisk, /etc/divvy, and /etc/showbad.

**********************************************************
/etc/fdisk 1 yields:

Drive parameters from fixed disk unit 1
Cylinders       Tracks/Cylinder    Landing Zone         Write Precomp
  982              7                    982                -1

Display Partition Information

Partition       Status  Type            Start   End     Size    Blocks
   4            N       unknown         0       0       0       0
   3            N       unknown         0       0       0       0
   2            N       unknown         0       0       0       0
   1            A       System5         1       981     981     116739

**********************************************************
/etc/divvy -d 1 yields:

 CONTENTS OF PARTITION END RECORD FOR UNIT #1

                 Drive Table
                 ----- -----
         Number of cylinders:  982
         Number of heads/cylinder:  7
         Landing zone:  982
         Write precomp:  -1
         Sectors/track:  17
         Sector size:  512
         Number of alternate cylinders:  0
         Actual sectors/cylinder:  119
         DOS disk control byte:  0
         DOS compatible null 0:  0
         DOS compatible null 1:  0
         DOS compatible null 2:  0
         DOS compatible null 3:  0
         DOS compatible null 4:  0
         DOS compatible null 5:  0
         Slice table pointer:  0

                 Slice Table
                 ----- -----
Slice 0 ROOT -- first sector:  119, number of sectors:  40000
Slice 1 SWAP -- first sector:  40119, number of sectors:  0
Slice 2 USR -- first sector:  115838, number of sectors:  0
Slice 3 TMP -- first sector:  40119, number of sectors:  75719
Slice 4 Reserved -- first sector:  26000, number of sectors:  0
Slice 5 DOS partition -- first sector: 0, number of sectors: 0
Slice 6 UNIX partition #1 -- first sector: 119, number of sectors: 116739
Slice 7 UNIX partition #2 -- first sector: 0, number of sectors: 0
Slice 8 UNIX partition #3 -- first sector: 0, number of sectors: 0
Slice 9 UNIX partition #4 -- first sector: 0, number of sectors: 0
Slice 10 Entire disk -- first sector: 0, number of sectors: 116858
Slice 11 Last track active pt -- first sector: 116841, number of sectors: 17

             Minor Device Table
             ----- ------ -----
  Note that the Winchester driver ONLY uses the information
  stored in the minor device table of the partition end
  record of the primary drive (unit 0).

    i1010minor[0] (unit 0, slice 0):  0
    i1010minor[1] (unit 0, slice 1):  1
    i1010minor[2] (unit 0, slice 2):  2
    i1010minor[3] (unit 0, slice 3):  3
    i1010minor[4] (unit 0, slice 4):  4
    i1010minor[5] (unit 0, slice 5):  5
    i1010minor[6] (unit 0, slice 6):  6
    i1010minor[7] (unit 0, slice 7):  7
    i1010minor[8] (unit 0, slice 8):  8
    i1010minor[9] (unit 0, slice 9):  9
    i1010minor[10] (unit 0, slice 10):  10
    i1010minor[11] (unit 0, slice 11):  11
    i1010minor[12] (reserved):  0
    i1010minor[13] (reserved):  0
    i1010minor[14] (reserved):  0
    i1010minor[15] (reserved):  0
    i1010minor[16] (reserved):  0
    i1010minor[17] (reserved):  0
    i1010minor[18] (reserved):  0
    i1010minor[19] (reserved):  0
    i1010minor[20] (unit 1, slice 0):  1040
    i1010minor[21] (unit 1, slice 1):  1041
    i1010minor[22] (unit 1, slice 2):  1042
    i1010minor[23] (unit 1, slice 3):  1043
    i1010minor[24] (unit 1, slice 4):  1044
    i1010minor[25] (unit 1, slice 5):  1045
    i1010minor[26] (unit 1, slice 6):  1046
    i1010minor[27] (unit 1, slice 7):  1047
    i1010minor[28] (unit 1, slice 8):  1048
    i1010minor[29] (unit 1, slice 9):  1049
    i1010minor[30] (unit 1, slice 10):  1050
    i1010minor[31] (unit 1, slice 11):  1051
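
The divvy numbers above are internally consistent, which is worth confirming before blaming the slice table. 7 heads x 17 sectors gives 119 sectors/cylinder. 982 x 119 = 116858 (slice 10, entire disk), and 981 x 119 = 116739 (the fdisk partition). TMP ends exactly where the empty USR slice starts (40119 + 75719 = 115838). The sketch below repeats that arithmetic and maps the four error addresses to slices. It rests on three assumptions, none of them confirmed in the thread: the CHS values in the error messages are physical, sector numbers are 1-based as on standard AT controllers, and "first sector" values count from the start of the drive.

```python
# Cross-check of the divvy numbers for unit 1. Assumptions (not from the
# thread): CHS sector numbers are 1-based, and "first sector" counts from
# sector 0 of the drive.

CYLS, HEADS, SPT = 982, 7, 17
SECTORS_PER_CYL = HEADS * SPT  # 119, as divvy reports

# (name, first sector, number of sectors) for the non-empty slices
SLICES = [
    ("ROOT", 119, 40000),
    ("TMP", 40119, 75719),
    ("UNIX partition #1", 119, 116739),
    ("Entire disk", 0, 116858),
    ("Last track active pt", 116841, 17),
]


def chs_to_sector(cyl, head, sec):
    """Absolute sector number of a 1-based CHS address."""
    return (cyl * HEADS + head) * SPT + (sec - 1)


def slices_containing(sector):
    return [name for name, first, count in SLICES if first <= sector < first + count]


assert CYLS * SECTORS_PER_CYL == 116858         # slice 10, entire disk
assert (CYLS - 1) * SECTORS_PER_CYL == 116739   # fdisk partition, cyls 1-981
assert 40119 + 75719 == 115838                  # TMP ends where USR begins

for cyl, head, sec in [(329, 5, 9), (346, 3, 12), (197, 4, 11), (346, 2, 7)]:
    s = chs_to_sector(cyl, head, sec)
    print(f"Cyl {cyl} Hd {head} Sec {sec} -> sector {s}: {slices_containing(s)}")
```

Under those assumptions, Cyl 329/Hd 5/Sec 9 is absolute sector 39244 and Cyl 197/Hd 4/Sec 11 is 23521, so both fall in ROOT. The two cylinder-346 errors (41236 and 41214) fall in TMP.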

**********************************************************
/etc/showbad 1 yields (a lot of bad blocks):

                Bad Track Table - Unit 1 
    Bad Cylinder    Bad Head     Alt. Cylinder      Alt. Head

        28              3               974             0
        33              1               974             1
        40              1               974             2
        41              1               974             3
        63              1               974             4
        77              1               974             5
        119             0               974             6
        122             1               975             0
        123             1               975             1
        124             1               975             2
        141             0               975             3
        211             1               975             4
        230             1               975             5
        474             4               975             6
        643             4               976             0
        700             3               976             1
        719             3               976             2
        735             3               976             3
        736             3               976             4
        740             3               976             5
        792             4               976             6
        794             4               978             1
        795             4               977             0
        800             1               977             1
        831             3               977             2
        843             3               977             3
        849             3               977             4
        859             3               977             5
        874             3               977             6
        968             3               978             0
**********************************************************
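
The showbad table can be checked the same way. The sketch below loads it as a (cylinder, head) to alternate-track map, presumably the way the driver consults it. It then compares that map with the four error samples and with the slice layout (119 sectors/cylinder, TMP ending before sector 115838). Two things fall out. First, none of the four error tracks is in the table, so the driver presumably had no alternate to switch to for them. Second, every alternate sits in cylinders 974-978, past the end of TMP but inside the fdisk partition, while divvy reports 0 alternate cylinders. This sketch does not establish whether that mismatch has anything to do with the two-drive divvy problem Mark Zenier describes later. The remap() helper is a hypothetical illustration, not the V/AT driver's logic.

```python
# Bad-track table from "/etc/showbad 1", checked against the posted errors.
SECTORS_PER_CYL = 7 * 17  # 119, from the divvy drive table

# (bad cylinder, bad head, alternate cylinder, alternate head)
ROWS = [
    (28, 3, 974, 0), (33, 1, 974, 1), (40, 1, 974, 2), (41, 1, 974, 3), (63, 1, 974, 4),
    (77, 1, 974, 5), (119, 0, 974, 6), (122, 1, 975, 0), (123, 1, 975, 1), (124, 1, 975, 2),
    (141, 0, 975, 3), (211, 1, 975, 4), (230, 1, 975, 5), (474, 4, 975, 6), (643, 4, 976, 0),
    (700, 3, 976, 1), (719, 3, 976, 2), (735, 3, 976, 3), (736, 3, 976, 4), (740, 3, 976, 5),
    (792, 4, 976, 6), (794, 4, 978, 1), (795, 4, 977, 0), (800, 1, 977, 1), (831, 3, 977, 2),
    (843, 3, 977, 3), (849, 3, 977, 4), (859, 3, 977, 5), (874, 3, 977, 6), (968, 3, 978, 0),
]
BAD_TRACKS = {(c, h): (ac, ah) for c, h, ac, ah in ROWS}


def remap(cyl, head):
    """Hypothetical: the track a driver would use instead of (cyl, head)."""
    return BAD_TRACKS.get((cyl, head), (cyl, head))


# Tracks named in the four "HD I/O Error" samples
ERROR_TRACKS = [(329, 5), (346, 3), (197, 4), (346, 2)]
unlisted = [t for t in ERROR_TRACKS if t not in BAD_TRACKS]

alt_cyls = sorted({ac for ac, _ in BAD_TRACKS.values()})
alt_first_sector = alt_cyls[0] * SECTORS_PER_CYL
tmp_end = 40119 + 75719  # first sector after the TMP slice

print("error tracks not in the table:", unlisted)
print("alternate cylinders:", alt_cyls)
print("alternates start at sector", alt_first_sector, "; TMP ends before", tmp_end)
```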

Thanks, 
-- 
Bob Thrush                 UUCP: {rtmvax,ucf-cs}!tarpit!rd
Automation Intelligence,   1200 W. Colonial Drive, Orlando, Florida 32804

larry@focsys.UUCP (Larry Williamson) (12/19/88)

In article <460@tarpit.UUCP> rd@tarpit.UUCP (Bob Thrush) writes:
>About 3 months ago, the 2nd drive on this System V/AT 2.3.1 system died.
>It was replaced, the entire drive was formatted [...]
> [...] I have been noticing intermittent "HD I/O Errors ..." often 
>followed by serious file system problems on the replacement drive.  
>
>Here are a few samples:
>
>HD I/O Error Fun: 30 Cyl: 329 Hd: 5 Sec:  9 Status: 51 Estat: 10 Drstat: A5
>
>Exactly what do these messages mean?

This means, you've got trouble. We had been running with two hard disks
for a few months (maybe 6 or 7??) with no troubles. Then I started to
see the occasional hard disk error. Then, near the end, there were many
errors. 

It got to the point where we could not even backup this second disk. The
errors caused cpio (and tar) to die. (The first disk continued to work
just fine; there was an occasional error but never any trouble).

We upgraded to 2.4 and errors have disappeared completely. We also replaced
the disk, I couldn't bring myself to trust it.

I'm not sure why, but it seemed that the disk errors grew at an exponential
rate. I would therefore suggest that you *very quickly*, get your 2.4 upgrade
and install it. I would also suggest that you verify your backups, you might
be surprised by what is on (or not on) those tapes!

Good Luck,
    Larry

-- 
Larry Williamson  -- Focus Systems -- Waterloo, Ontario
                  watmath!focsys!larry  (519) 746-4918

pcg@aber-cs.UUCP (Piercarlo Grandi) (12/21/88)

In article <326@focsys.UUCP> larry@focsys.UUCP (Larry Williamson) writes:

    In article <460@tarpit.UUCP> rd@tarpit.UUCP (Bob Thrush) writes:

	[ .... io errors on two drive system .... ]
    
    [ .... io errors as well .... ]
    
    We upgraded to 2.4 and errors have disappeared completely. We also replaced
    the disk, I couldn't bring myself to trust it.

The bad block handling code in 2.3 was horribly braindamaged. It did not
recover from soft errors, and then wrote random trash in random blocks.  The
disk instead you could have truested; it was clearly a case of environmental
(dis)adaptation of the format.

    I'm not sure why, but it seemed that the disk errors grew at an
    exponential rate.

A folksy description of a common problem follows.

Winchester disks are very delicate things. If operating temperature changes,
etc..., they suffer contraction/expansion of the surfaces, or of the heads
etc..., and what was previously recorded may become gibberish.  This does not
imply that the surface has become damaged though, simply that it has become
difficult to read back the recorded format.

The symptoms are an increase in the number of soft errors, and then of hard
errors.  The cure is to reformat the disk. By the way, never trust a
preformatted disk; always reformat it on site, in the place where the machine
will be used, in its typical operating conditions.

    I would therefore suggest that you *very quickly*, get your 2.4 upgrade
    and install it.

The advantage of 2.4 is that bad block handling is now said to be OK.
Previously, if a read from a disk failed, it was not retried at all (even
though most errors are soft), and the buffer cache slot that was assigned to the
block to be read was not marked invalid. If and when written back to disk,
the previous contents of that slot would overwrite the contents of the disk
block, with astonishing results.
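
A toy model makes Grandi's description concrete. The classes and functions below (Disk, Buffer, bread, flush, scenario) are hypothetical and are not Microport's buffer cache. They only reproduce the sequence he describes. A buffer slot holds block 1. A read of block 2 into the same slot fails, the slot stays marked valid, and the write-back copies block 1's old contents over block 2.

```python
# Toy model (hypothetical names) of the 2.3 failure mode described above.
# Not Microport's code: it only shows why a failed read must be retried or
# the buffer invalidated before anything writes it back.

class ReadError(Exception):
    """Raised by the toy disk when a read fails."""


class Disk:
    def __init__(self, blocks, soft_failures):
        self.blocks = dict(blocks)                # block number -> contents
        self.soft_failures = dict(soft_failures)  # block number -> failing reads left

    def read(self, blkno):
        if self.soft_failures.get(blkno, 0) > 0:
            self.soft_failures[blkno] -= 1        # soft error: a retry may succeed
            raise ReadError(blkno)
        return self.blocks[blkno]

    def write(self, blkno, data):
        self.blocks[blkno] = data


class Buffer:
    """One buffer-cache slot, reused for different blocks over time."""
    def __init__(self):
        self.blkno, self.data, self.valid = None, None, False


def bread(disk, buf, blkno, retries, invalidate_on_error):
    """Load blkno into buf; return True only if the data came from disk."""
    buf.blkno = blkno
    for _ in range(retries + 1):
        try:
            buf.data = disk.read(blkno)
            buf.valid = True
            return True
        except ReadError:
            pass
    # All attempts failed. The buggy path leaves the slot marked valid, so it
    # still holds the previous block's bytes but is now labelled as blkno.
    buf.valid = not invalidate_on_error
    return False


def flush(disk, buf):
    """Write the slot back, but only if it is marked valid."""
    if buf.valid:
        disk.write(buf.blkno, buf.data)


def scenario(retries, invalidate_on_error):
    """Read block 1, then block 2 (one soft error) into the same slot, flush."""
    disk = Disk({1: "inode table", 2: "directory"}, {2: 1})
    buf = Buffer()
    bread(disk, buf, 1, retries, invalidate_on_error)
    bread(disk, buf, 2, retries, invalidate_on_error)
    flush(disk, buf)
    return disk.blocks[2]
```

scenario(0, False) follows the 2.3-like path and leaves block 2 holding "inode table". scenario(1, True), with one retry and invalidation on failure, leaves it as "directory". Invalidating without retrying, scenario(0, True), also preserves block 2, at the cost of a failed read that the caller must handle.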

    I would also suggest that you verify your backups, you might be surprised
    by what is on (or not on) those tapes!

I would also suggest not to trust the current contents of your disks, unless
you check them. Note that I said *contents*, not just *structure*, i.e.  some
of your files contents may have been corrupted.
-- 
Piercarlo "Peter" Grandi			INET: pcg@cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science	UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)

rd@tarpit.UUCP (Bob Thrush) (12/21/88)

In article <326@focsys.UUCP> larry@focsys.UUCP (Larry Williamson) writes:
>In article <460@tarpit.UUCP> rd@tarpit.UUCP (Bob Thrush) writes:
>>About 3 months ago, the 2nd drive on this System V/AT 2.3.1 system died.
>> [...] I have been noticing intermittent "HD I/O Errors ..." 
>>[...] Exactly what do these messages mean?
>
>This means, you've got trouble. [...]
>We upgraded to 2.4 and errors have disappeared completely. We also replaced
>the disk, I couldn't bring myself to trust it.
>
>I'm not sure why, but it seemed that the disk errors grew at an exponential
>rate. I would therefore suggest that you *very quickly*, get your 2.4 upgrade
>and install it. I would also suggest that you verify your backups, you might
>be surprised by what is on (or not on) those tapes!

Larry, thanks for the advice.  I've had 2.4 since it was announced. 
However, I have heard (in this newsgroup) so many reports of problems 
with 2.4, i.e. keyboard lockups, different curses problems (that I 
found workarounds to in 2.3.1) that I have been reluctant to trade in
the devil I (sort of) know for 2.4.  Are these problem reports regarding
2.4 not as serious as a casual reader would assume?  Has Microport
made any comment regarding the 2.4 problem reports?

The 2nd disk (that I'm having trouble with) is mostly used as the news
spool directory, so it is definitely getting a whole lot different 
activity than it did before the onset of the problems.  Each time the
problem shows up, I find that each subsequent fsck finds more problems,
usually associated with duplicates in the free list.  I wind up
mkfs'ing the news file system to correct(?) the problem.  I am usually
able to restore most of the news spool directory from a backup tape
made when I first notice a problem (I don't backup news routinely).
I have noticed that one cpio was hosed part way in.  When restoring,
cpio reported something like "the archive is not in cpio format".  I
investigated this further on a Tektronix workstation that was able
to read the "cpio -ocv" format and found 2 places where the cpio
header contained (probably) the correct file size but the following
data was short by exactly 8192 bytes.  I edited the headers (subtracted
8192 from the size) and was able to successfully restore from the
tape.  Fortunately, the two truncated articles were not in newsgroups
that our site regularly reads.
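
Bob's repair was to subtract 8192 from the size field so that the header matches the data actually on the tape. That is easy to script. Note that 8192 bytes is exactly 16 sectors of 512 bytes. The sketch below assumes the "cpio -c" header is the 76-byte ASCII octal layout that POSIX later standardised as "odc". That layout is magic 070707, then dev, ino, mode, uid, gid, nlink and rdev as 6 octal digits each, mtime as 11, namesize as 6 and filesize as 11. Older implementations may differ, so compare against a known-good header first. The sketch only rewrites a header held in memory; finding the damaged headers in the archive is left to the reader.

```python
# Assumption: "cpio -c" writes the 76-byte ASCII octal ("odc") header.
FIELDS = [("magic", 6), ("dev", 6), ("ino", 6), ("mode", 6), ("uid", 6),
          ("gid", 6), ("nlink", 6), ("rdev", 6), ("mtime", 11),
          ("namesize", 6), ("filesize", 11)]
HEADER_LEN = sum(width for _, width in FIELDS)  # 76


def parse_header(raw: bytes) -> dict:
    """Split an ASCII cpio header into fields; numbers are octal text."""
    if len(raw) < HEADER_LEN or raw[:6] != b"070707":
        raise ValueError("not an ASCII cpio header")
    out, pos = {}, 0
    for name, width in FIELDS:
        text = raw[pos:pos + width].decode("ascii")
        out[name] = text if name == "magic" else int(text, 8)
        pos += width
    return out


def field_offset(field: str) -> int:
    """Byte offset of a named field within the header."""
    pos = 0
    for name, width in FIELDS:
        if name == field:
            return pos
        pos += width
    raise KeyError(field)


def shrink_filesize(raw: bytes, missing: int = 8192) -> bytes:
    """Return a copy of the header with filesize reduced by `missing` bytes."""
    size = parse_header(raw)["filesize"]
    if missing > size:
        raise ValueError("cannot shrink below zero")
    off = field_offset("filesize")
    new = b"%011o" % (size - missing)   # 11 octal digits, zero-padded
    return raw[:off] + new + raw[off + 11:]
```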

I'm tempted to rebuild my 2.3.1 kernel with the hard disk driver
from 2.4 to narrow down the problem.  Any comments from the net
or Microport regarding this possibility? Since I'm leaving ASAP
for Xmas holiday, I won't be responding soon to this group; however,
I will followup when I return. 

BTW, I got a complete rundown of the meaning of the hard disk i/o
errors from Randy Jarrett who copied a posting <358@uport.UUCP>
by Marc de Groot (then of Microport).  When I return from the
holidays, I'll repost that if there is interest.  Thanks, Randy
(and Marc).

I'm still interested in knowing how Microport System V/AT 
handles bad blocks.

>
>Good Luck,
>    Larry
>
>-- 
>Larry Williamson  -- Focus Systems -- Waterloo, Ontario
>                  watmath!focsys!larry  (519) 746-4918
-- 
Bob Thrush                 UUCP: {rtmvax,ucf-cs}!tarpit!rd
Automation Intelligence,   1200 W. Colonial Drive, Orlando, Florida 32804

markz@ssc.UUCP (Mark Zenier) (12/24/88)

In article <464@tarpit.UUCP>, rd@tarpit.UUCP (Bob Thrush) writes:
> Are these problem reports regarding
> 2.4 not as serious as a casual reader would assume?  

2.4 is a great improvement over 2.3.  The screen seems a bit faster and
the floppy disk driver doesn't crash the system if the wrong device is used
for doscp.  And my keyboard locked up just as much with 2.3, until I bought 
a decent one.

> The 2nd disk (that I'm having trouble with) is mostly used as the news
> spool directory, so it is definitely getting a whole lot different 
> activity than it did before the onset of the problems.  Each time the
> problem shows up, I find that each subsequent fsck finds more problems,
> usually associated with duplicates in the free list.  

Your problem sounds like the canonical Two Drive bug that was the topic of
much discussion here a couple of months ago.  The problem was (microport
correct me if I'm wrong) that divvy didn't set up the bad track areas on
the second drive correctly.  This is fixed either with 2.4 or by getting
a fixed 2.3 divvy utility from the uport bulletin board.


Mark Zenier    uunet!nwnexus!pilchuck!ssc!markz    markz@ssc.uucp
                      uw-beaver!tikal!

steve@nuchat.UUCP (Steve Nuchia) (12/25/88)

In article <464@tarpit.UUCP> rd@tarpit.UUCP (Bob Thrush) writes:
[concerning microbug phantom disk errors on second drive]

>The 2nd disk (that I'm having trouble with) is mostly used as the news
>spool directory, so it is definitely getting a whole lot different 
>activity than it did before the onset of the problems.  Each time the

From my extensive experience with this problem, if it gets you, it
gets you in proportion to the frequency of write access.  News
spool is about the worst thing to put out there but I kept mine
there because I didn't want the errors eating anything I wanted
to keep.  Now I'm using Interactive on Bell Tech.  Still have
some problems but nothing like Microport.  I spent a year and
a half of my life working with those clowns.  Boy am I a sucker.

>problem shows up, I find that each subsequent fsck finds more problems,
>usually associated with duplicates in the free list.  I wind up
>mkfs'ing the news file system to correct(?) the problem.  I am usually

The problem here is a BUG in FSCK.  There is a workaround.  I know
of at least two people in Microport who have been assigned to fix
it, I don't know if either of them made any more progress than I did.

The bug is that, for large filesystems, fsck's free block bitmap
gets corrupted.  The bitmap is built in phase 1, corrupted in phase 2
by an as-yet undiscovered mechanism, and used to rebuild a bad freelist
in phase 5/6.  Note that it will report a bad freelist on a perfectly
good filesystem, then proceed to trash it, if you let it.  When it
rebuilds a random freelist it uses some blocks assigned to files
as freelist chain block, corrupting the files.  When some of those
blocks fall in directories you really get filesystem hash.
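
The mechanism Steve describes fits in a few lines. This is a simplified model, not the System V fsck source. build_bitmap stands in for phase 1 and rebuild_freelist for the phase 5/6 salvage. free_but_owned reports the kind of "duplicate" blocks Bob saw: blocks that are on the rebuilt free list and still referenced by a file. Clearing two bits in the bitmap, as the suspected phase 2 corruption would, is enough to put a file block and a directory block on the free list. File names and block numbers are hypothetical.

```python
# Simplified model of the fsck free-list problem (not the System V fsck
# source). Block numbers, file names and sizes are hypothetical.

def build_bitmap(files, fs_size):
    """Phase 1 stand-in: mark every block that some file references."""
    in_use = [False] * fs_size
    for blocks in files.values():
        for b in blocks:
            in_use[b] = True
    return in_use


def rebuild_freelist(in_use, first_data_block):
    """Phase 5/6 stand-in: every unmarked data block is declared free."""
    return [b for b in range(first_data_block, len(in_use)) if not in_use[b]]


def free_but_owned(files, freelist):
    """Blocks that are on the free list yet still belong to a file."""
    free = set(freelist)
    overlap = {}
    for name, blocks in files.items():
        dup = sorted(free.intersection(blocks))
        if dup:
            overlap[name] = dup
    return overlap


files = {"article": [10, 11, 12], "dir": [13]}
good = build_bitmap(files, fs_size=20)
print(rebuild_freelist(good, 10), free_but_owned(files, rebuild_freelist(good, 10)))

bad = list(good)
bad[12] = bad[13] = False   # bits lost between phase 1 and phase 5/6
salvaged = rebuild_freelist(bad, 10)
print(salvaged, free_but_owned(files, salvaged))
```

With the bitmap intact, the rebuilt list is blocks 14-19 and nothing overlaps. With bits 12 and 13 cleared, blocks 12 and 13 are listed as free while "article" and "dir" still own them. Once such blocks are handed out again, for new data or as free-list chain blocks, the owning files are overwritten.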

The workaround is to run fsck on your filesystem but NOT ALLOW it
to REBUILD THE FREELIST.  Then run fsck -f on it.  The -f option
says to just run phase 1 and 5/6, and it can be allowed to rebuild
the freelist since it didn't scribble on its bitmap in phase 2.

My analysis of the code says that this is a compiler bug, but
there is the possibility that it is a subtle architecture
dependency in fsck itself.  In any case the mechanism appears
to involve aliasing of one or more blocks in fsck's "virtual memory"
code -- it manages a file-backed buffer pool using some of the
most twisted code I've ever laid eyes on.  The problem is not
sensitive to optimization when compiling fsck.  It is extremely
sensitive to the size and contents of your filesystem.  In my
experience filesystems that are small enough to not require a
temporary file are safe.

>BTW, I got a complete rundown of the meaning of the hard disk i/o
>errors from Randy Jarrett who copied a posting <358@uport.UUCP>
>by Marc de Groot (then of Microport).  When I return from the
>holidays, I'll repost that if there is interest.  Thanks, Randy
>(and Marc).

Please do.
-- 
Steve Nuchia	      South Coast Computing Services
uunet!nuchat!steve    POB 890952  Houston, Texas  77289
(713) 964 2462	      Consultation & Systems, Support for PD Software.

trevor@trevan.UUCP (trevor) (01/01/89)

In article <2689@nuchat.UUCP>, steve@nuchat.UUCP (Steve Nuchia) writes:
> 
> The problem here is a BUG in FSCK.  There is a workaround.  I know
> of at least two people in Microport who have been assigned to fix
> it, I don't know if either of them made any more progress than I did.
> 
> The bug is that, for large filesystems, fsck's free block bitmap
> gets corrupted.  The bitmap is built in phase 1, corrupted in phase 2
>.....
> 
> The workaround is to run fsck on your filesystem but NOT ALLOW it
> to REBUILD THE FREELIST.  Then run fsck -f on it.  The -f option
> says to just run phase 1 and 5/6, and it can be allowed to rebuild
> the freelist since it didn't scribble on its bitmap in phase 2.
> 

Well, well. I spent a whole week trying to sort my disks out and now it
turns out that FSCK is at fault. Microport does admit to there being
a problem, but it says only with file systems greater than 130000 blocks.
All my file systems are less than 100,000 blocks and I still get this
problem.

I must thank Steve for a workaround which will help but there is still
the problem of the file system check at boot up. I guess we will have to make
it interactive in order to stop this self-destruction. This means that
unattended reboots after power cuts, etc., will not be possible unless
someone can tell us how to prevent fsck from rebuilding the free list
first time round. I guess it might be possible to create some sort of
shell program to interact with fsck and answer all the questions.

This must be the worst bug in Microport's system and is worse than most
viruses. Why didn't Microport warn us of this problem? If they knew
about it I think it was totally negligent of them not to have told us.

I think that Microport should make the fixing of this bug their top priority.

" Maynard) (01/02/89)

In article <211@trevan.UUCP> trevor@trevan.UUCP (trevor) writes:
>Well, well. I spent a whole week trying to sort my disks out and now it
>turns out that FSCK is at fault. Microport does admit to there being
>a problem, but it says only with file systems greater than 130000 blocks.
>All my file systems are less than 100,000 blocks and I still get this
>problem.

I first encountered this problem on an 84K block filesystem. I spent a
week with fsdb and fsck, using fsdb to straighten out the worst
problems, and then using fsck to (I thought) straighten out the
filesystem. ARGH!! I finally gave up when Steve told me about the bug.

>I must thank Steve for a workaround which will help but there is still
>the problem of the file system check at boot up. I guess we will have to make
>it interactive in order to stop this self-destruction. This means that
>unattended reboots after power cuts, etc., will not be possible unless
>someone can tell us how to prevent fsck from rebuilding the free list
>first time round. I guess it might be possible to create some sort of
>shell program to interact with fsck and answer all the questions.

I've already done this in response to this problem.
To turn off the automatic fscks at boot time, edit /etc/bcheckrc and
/etc/mountall and remove the -y switch from the fsck command.
I now leave a boot floppy in drive 0 with the door closed, so that in
the event of an automatic reboot, it doesn't even attempt to reboot the
full system; I then manually fsck things from the boot floppy, doing it
twice if the first time claims that the free list needs rebuilding.
This makes it even more important to have at least one partition small
enough to be checked without the need of a work file; fsck that one
first, then mount it on /mnt and use /mnt/foo as the work file for the
rest of them.
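
Jay's manual procedure can be written down as an explicit plan. The sketch below only builds the command lines and does not run them. The device names, sizes and the small_limit threshold are hypothetical placeholders, because the thread doesn't say how large a filesystem can get before fsck needs a work file. The sketch relies on two System V fsck options. -t names the scratch file fsck uses when its tables don't fit in memory. -f is the fast check (phases 1 and 5, rebuilding the free list in phase 6 if needed) that Steve recommends for the second pass.

```python
# Plan (not execute) the check order described above. Device names, sizes
# and small_limit are hypothetical placeholders.

def check_plan(filesystems, small_limit, scratch_dir="/mnt", scratch_name="foo"):
    """filesystems: list of (device, size_in_blocks). Returns a list of argv lists."""
    small = [fs for fs in filesystems if fs[1] <= small_limit]
    if not small:
        raise ValueError("need one filesystem small enough to check without a work file")
    first_dev = min(small, key=lambda fs: fs[1])[0]
    scratch = f"{scratch_dir}/{scratch_name}"
    plan = [
        ["fsck", first_dev],                # small: no work file needed
        ["mount", first_dev, scratch_dir],  # holds the work file for the rest
    ]
    for dev, _size in filesystems:
        if dev == first_dev:
            continue
        # Pass 1, interactive: answer "no" if asked to rebuild the free list.
        plan.append(["fsck", "-t", scratch, dev])
        # Pass 2, only if pass 1 reported a bad free list: fast check, which
        # may rebuild the free list.
        plan.append(["fsck", "-f", "-t", scratch, dev])
    return plan


if __name__ == "__main__":
    disks = [("/dev/dsk/0s3", 60000), ("/dev/dsk/1s0", 40000), ("/dev/dsk/1s3", 75719)]
    for argv in check_plan(disks, small_limit=50000):
        print(" ".join(argv))
```

Run the first pass on each large filesystem interactively and answer no when it offers to rebuild the free list. The -f pass is only needed if that first pass reported a bad free list.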

>This must be the worst bug in Microport's system and is worse than most
>viruses. Why didn't Microport warn us of this problem? If they knew
>about it I think it was totally negligent of them not to have told us.

They didn't know it was fsck causing the problem until Steve took one of
their service techs through crashing a large file system and showed him
how fsck would corrupt it. This only happened a couple of months ago.

As for telling us about known bugs, they only do that for holders of
their misnamed support contracts. I agree that it's negligent for them
not to periodically mail out lists of known bugs. Maybe they're afraid
it'll make their software look buggy.

Actually, it's not that bad of a bug; if you know about the workaround,
it's easy (though time-consuming) to deal with. It'd not be a nuisance
at all if the system didn't repeatedly crash.

>I think that Microport should make the fixing of this bug their top priority.

What? Service their customer base? Radical concept, that.

-- 
Jay Maynard, EMT-P, K5ZC, PP-ASEL   | Never ascribe to malice that which can
uucp:       uunet!nuchat!    (eieio)| adequately be explained by stupidity.
   hoptoad!academ!uhnix1!splut!jay  +----------------------------------------
{killer,bellcore}!tness1!           | Free Texas from its chains: SECEDE!!

steve@nuchat.UUCP (Steve Nuchia) (01/03/89)

In article <798@splut.UUCP> Jay Maynard writes:
>In article <211@trevan.UUCP> trevor@trevan.UUCP (trevor) writes:
>>Well well I spent a whole week trying to sort my disks out and now it
>>turns out to be FSCK to be at fault. Microport does admit to there being
>>a problem but it says only with file systems greater tan 130000 blocks.
>>All my file systems are less than 100,000 blocks and I still get this
>>problem.

>I first encountered this problem on an 84K block filesystem. I spent a
>week with fsdb and fsck, using fsdb to straighten out the worst

With the default number of inodes the problem is rare or nonexistent
under 130000 blocks.  Mkfs will give you something like 13000 inodes
in that case, which is a little light for storing a full news feed.
If you run a 120000 block filesystem with say 20000 inodes it will
definitely trigger the bug, at least when sufficiently full.


[filler line, sorry.]
-- 
Steve Nuchia	      South Coast Computing Services
uunet!nuchat!steve    POB 890952  Houston, Texas  77289
(713) 964 2462	      Consultation & Systems, Support for PD Software.

dougm@uport.UUCP (Doug Moran) (01/06/89)

In article <211@trevan.UUCP> trevor@trevan.UUCP (trevor) writes:
>This must be the worst bug in Microport's system and is worse than most
>viruses. Why didn't Microport warn us of this problem? If they knew
>about it I think it was totally negligent of them not to have told us.

In the Release Notes for Release 2.4 of System V/AT, on page R-21,
is the following:

"File systems greater than approx. 130000 blocks experience corruption
over time that fsck can't repair.  fsck may report negative numbers
and corrupt the file system further (#605)."

There *is* a bug in fsck, we *are* aware of it, and we *are*
trying to fix it.  And we did try and warn you.  How can we
warn you better (no sarcasm intended; I am trying to make
the Release Notes etc. more user-friendly)?

Doug Moran,
Tech. Pubs.

gk@kksys.mn.org (Greg Kemnitz) (01/28/89)

In article <798@splut.UUCP> jay@splut.UUCP (Jay Maynard) writes:
>In article <211@trevan.UUCP> trevor@trevan.UUCP (trevor) writes:
        [ comments about Microport fsck trashing disks...]
>>This must be the worst bug in Microport's system and is worse than most
>>viruses. Why didn't Microport warn us of this problem? If they knew
>>about it I think it was totally negligent of them not to have told us.
>
>They didn't know it was fsck causing the problem until Steve took one of
>their service techs through crashing a large file system and showed him
>how fsck would corrupt it. This only happened a couple of months ago.

Actually, they have been aware of it for much longer than that... Well
over a year ago we were experiencing the same problem and had MANY
long discussions with them regarding it.  They informed us that there
was a known problem with fsck, and that "someone is working on it".
This was with the 1.3.6 release.  As of the 2.2 release it still was
not fixed.

We did discover a workaround though...  Replace the system with a '386
running Interactive 386/ix.  Works great!  Also fixes all the other
problems that Microport is "working on".

Of course, this "solution" IS a bit expensive....

Greg Kemnitz / K and K Systems / PO Box 41804 / Plymouth, MN 55441-0804
Domain:  gk@kksys.mn.org  /  UUCP:  ...!rutgers!bungia!kksys!gk
Voice:   (612)475-1527    /  Fax:   (612)475-1979

bill@ssbn.WLK.COM (Bill Kennedy) (01/29/89)

In article <920@kksys.mn.org> gk@kksys.UUCP (Greg Kemnitz) writes:
>In article <798@splut.UUCP> jay@splut.UUCP (Jay Maynard) writes:
>>In article <211@trevan.UUCP> trevor@trevan.UUCP (trevor) writes:
>        [ comments about Microport fsck trashing disks...]
>>>This must be the worst bug in Microport's system and is worse than most
>>>viruses. Why didn't Microport warn us of this problem? If they knew
>>>about it I think it was totally negligent of them not to have told us.

[ Greg points out that they did tell us ]

>long discussions with them regarding it.  They informed us that there
>was a known problem with fsck, and that "someone is working on it".
>This was with the 1.3.6 release.  As of the 2.2 release it still was
>not fixed.

It's still documented (and, unfortunately, confirmed) in 2.4.  I'm unsure
of the need for file systems > 130,000 blocks on a '286.  I encountered
it because I needed a half height drive and the one I got was 122Mb, so
I juggled things until the drive and 2.4 were happy with each other (no
help from the install instructions!).  The 72Mb drive was plenty for
what I wanted but it was physically too large.

>We did discover a workaround though...  Replace the system with a '386
>running Interactive 386/ix.  Works great!  Also fixes all the other
>problems that Microport is "working on".
>
>Of course, this "solution" IS a bit expensive....

Here I disagree with Greg but only partially.  He's right on target with
the overall premise, i.e. don't buy Microport.  I disagree that it's
expensive.  If you place any value on your system's reliability, user
satisfaction, or your own time, avoiding Microport is quite cost effective.

I view Microport's "offerings" (no, I will still not dignify them by
calling them "products") as experimental.  What *IS* expensive is what
they charge for experimental works alleged to be products.  I have a
'286 that runs V/AT but it's my luggable that accompanies me when I'm on
the road.  As such, the quirks, bugs, and anomalies are 100% my responsibility
and I am the only one victimized by them.  I expect no support and get
none, so I am never disappointed.  If you are going to run a System V
on an AT/clone, I'm not aware of anything else.  AT&T had a very nice
System V for the PC 6300 PLUS.  I think it will help your blood pressure
if you can accept V/AT as an experiment by experimenters, it does mine.

Changing to a '386 makes a lot of sense if you have to have decent reliability
and user satisfaction (even if you're the only user :-).  Avoiding Microport
makes even more sense.  I tried V/386 and pitched it (and the $$) into the
street when I saw what it was going to do to my uucp neighbors and users
who have come to think of this system as "available, usable, and reliable".
The money hurt because it was a lot of it and it was mine, personally.  I
concluded that I would have spent far more on the telephone and chasing
alleged "problems" and would never achieve what I set out to do.  It was
amazing how my "hardware problems" vanished when I installed AT&T 386 UNIX.
It's ironic how many of those "hardware problems" are documented as bug
fixes in 3.06e and disappointing how many of them would still be wrong
with my equipment if I used 3.06e.

I think that what we have here is a perceptual problem.  I think that the
average '286/'386 user came from one of two camps, down from minis or up
from PC's.  There may be a few who dove in from nowhere but probably not
many.  Those who came down from minis are appalled that fundamental things
(fsck, device drivers, etc.) don't work right.  Those coming up from PC's
are puzzled because their hardware doesn't work right with this new stuff.

The perceptual problem is compounded because we are probably mostly
individuals buying with our own money.  We expect a certain minimum
functionality and we don't get it.  If it was a car or a microwave oven
there's a manufacturer's warranty, statutory relief; with Microport
there's an arrogant snort.  That pisses us off (just like a lemon car)
because it was our own money and our expectations, the reasonable ones,
were neither met nor are they likely to be.  The arrogant snort I refer
to is not from the technically inclined and conscientious personnel at
Microport.  I think that they are as outraged and upset as those of us
whose money pays their salaries.  Management either doesn't care or
won't listen.

So who is the winner and who is the loser?  As long as we, in the
marketplace, keep approving their effort by continuing to spend money
on it, we will lose and management will win.  The situation can not
and will not change until we make it change.  We, the customers,
constructed the (in my opinion) fraud, and it is our responsibility to
make it stop.  Greg made it stop, he changed equipment and vendors.
Now he has achieved the expected minimum functionality and probably more.
Until a clear signal is sent to Microport management, in a language they
understand, we are wasting time and blood pressure being outraged.  For
all of John Plocher's efforts (I believe them to be considerable), have
we seen a significant change?  I haven't.  Can we expect John to make
management apply the resources to produce a respectable product?  I
think not, but you and I can.

Am I a hypocrite for buying, using, and upgrading V/AT?  For my equipment
it's the only game in town and bad breath is better than no breath at all.
Sorry for the length, but I hadn't seen this said before and I thought
it needed saying.
-- 
Bill Kennedy  usenet      {killer,att,cs.utexas.edu,sun!daver}!ssbn!bill
              internet    bill@ssbn.WLK.COM

mrm@sceard.UUCP (M.R.Murphy) (02/01/89)

In article <1131@ssbn.WLK.COM> bill@ssbn.WLK.COM (Bill Kennedy) writes:
>In article <920@kksys.mn.org> gk@kksys.UUCP (Greg Kemnitz) writes:
>>In article <798@splut.UUCP> jay@splut.UUCP (Jay Maynard) writes:
>>>In article <211@trevan.UUCP> trevor@trevan.UUCP (trevor) writes:
[many comments about Microport not fixing bugs deleted...]
>
>Here I disagree with Greg but only partially.  He's right on target with
>the overall premise, i.e. don't buy Microport.  I disagree that it's
>expensive.  If you place any value on your system's reliability, user
>satisfaction, or your own time, avoiding Microport is quite cost effective.

I disagree, see below.

>
>I view Microport's "offerings" (no, I will still not dignify them by
>calling them "products") as experimental.  What *IS* expensive is what
>they charge for experimental works alleged to be products.  I have a
>'286 that runs V/AT but it's my luggable that accompanies me when I'm on
>the road.  As such, the quirks, bugs, and anomalies are 100% my responsibility
>and I am the only one victimized by them.  I expect no support and get
>none, so I am never disappointed.  If you are going to run a System V
>on an AT/clone, I'm not aware of anything else.  AT&T had a very nice
>System V for the PC 6300 PLUS.  I think it will help your blood pressure
>if you can accept V/AT as an experiment by experimenters, it does mine.

Expense is relative. Anyone care to cite the cost history of a non-academic
UNIX(tm) license over the years?

>
>Changing to a '386 makes a lot of sense if you have to have decent reliability
>and user satisfaction (even if you're the only user :-).  Avoiding Microport
>makes even more sense.  I tried V/386 and pitched it (and the $$) into the
>street when I saw what it was going to do to my uucp neighbors and users
>who have come to think of this system as "available, usable, and reliable".

ruptime on our network yields:

acim      up  3+14:43,     0 users # AT&T 6386WGS 20MHZ 4MB,140MB, SVR3.2
getnf8.s  up 20+03:55,     0 users # Clone 286    10MHZ 3MB,60MB, SVAT 2.4
getnfd.s  up 17+14:35,     4 users # Clone 286    10MHZ 5MB,160MB, SVAT 2.2
getnfe.s  up 53+15:13,     3 users # Clone 286    12MHZ 5MB,160MB, SVAT 2.2

Note that getnfe.s has been up over 53 days. It is our news and mail gateway.
This could probably be described as "available, usable, and reliable".
A great deal of care was exercised in the choice of hardware and in the
configuration of the software for the system to make it so.

A lot of the problems with Microport systems stem from the great variety
of hardware that is "real close to just almost like" stuff from IBM(tm). That,
and the fact that UNIX(tm) has had its fair share of timing problems, coding
oversights, and design flaws through all of its releases. These flaws are
carried over from release to release and port to port by folks who are human
and who get to add their own bugs to the system (2.8bsd,2.9bsd,4.1bsd,4.2bsd,
4.3bsd,...:-). Generally, these people are doing the best they can at the time.
I believe that is also true of Microport. That UNIX(tm) works as well as it
does in the great range of environments in which it has found itself (IBM
mainframes running UNIX over a CTS base, Univac(tm) 1100 series mainframes,
on down to 8086's running hacked 22-bit memory management) is amazing. 

>The money hurt because it was a lot of it and it was mine, personally.  I
>concluded that I would have spent far more on the telephone and chasing
>alleged "problems" and would never achieve what I set out to do.  It was
>amazing how my "hardware problems" vanished when I installed AT&T 386 UNIX.

The machine "acim", a stock 386 from AT&T with AT&T SVR3.2, does not appear to
have fewer problems than the clones. The hardware that didn't work in the
clones with the flavor of UNIX that we are using, we dumped. As in, oh, well,
this disk controller doesn't work with the OS, let's just put it in that DOS
machine and get one that does work. Yes, this takes time and effort, but
the resulting system performance is worth the effort (I hope :-).
It is also interesting to note that the prices of the pieces are quite low
when compared with pieces of similar functionality from vendors such as
SUN(tm), DEC(tm), Data General(tm), ...

>It's ironic how many of those "hardware problems" are documented as bug
>fixes in 3.06e and disappointing how many of them would still be wrong
>with my equipment if I used 3.06e.

It may take some experimenting to get a system that is reliable and
that works as a whole. More experimenting than I like, but it can be done.
It is also possible to call one of the vendors mentioned above, find a salesperson
who is willing to sell a system, pay vast amounts of money, and have a system
installed and running, without ever touching a keyboard, let alone a
screwdriver.
>
>I think that what we have here is a perceptual problem.  I think that the
>average '286/'386 user came from one of two camps, down from minis or up
>from PC's.  There may be a few who dove in from nowhere but probably not
>many.  Those who came down from minis are appalled that fundamental things
>(fsck, device drivers, etc.) don't work right.  Those coming up from PC's
>are puzzled because their hardware doesn't work right with this new stuff.

I am not appalled that the drivers don't work right. I am disappointed,
sometimes a bit dismayed, but understanding of the people who tried to get it
right but goofed up some. If I can, I work around the problem. If I can't,
then I violate my license agreement (just a little:-) and disassemble the
offending code, and see if I can fix it. So far, so good.

>
>The perceptual problem is compounded because we are probably mostly
>individuals buying with our own money.  We expect a certain minimum
>functionality and we don't get it.  If it was a car or a microwave oven
>there's a manufacturer's warranty, statutory relief; with Microport
>there's an arrogant snort.  That pisses us off (just like a lemon car)
>because it was our own money and our expectations, the reasonable ones,
>were neither met nor are they likely to be.  The arrogant snort I refer
>to is not from the technically inclined and conscientious personnel at
>Microport.  I think that they are as outraged and upset as those of us
>whose money pays their salaries.  Management either doesn't care or
>won't listen.

I am not competent to speculate on Microport's management or on the feelings
of the Microport staff. I do feel, however, that the system configuration
and system management problems encountered in setting up and using a
2-user UNIX system on a 286 with 1MB of RAM and a 40MB disk may be as difficult
as the problems encountered in setting up a VAX(tm) with 24MB of memory and a
1.2GB disk running BSD. The problem
as I see it is that the little (physically) machine that sits on the desk
may be mentally larger than the mainframe and mini-computer systems that were
required just a few years ago to support multi-programming and multi-tasking
operating systems. The machines have shrunk in size, the support problems
haven't. Individuals and small companies, like ours, couldn't afford the
hardware for UNIX (or the license:-( just a few years ago. Now we all can.
We may not, however, be able to accept the individual burden of support
that systems of this complexity currently demand.

>
>So who is the winner and who is the loser?  As long as we, in the
>marketplace, keep approving their effort by continuing to spend money
>on it, we will lose and management will win.  The situation can not
>and will not change until we make it change.  We, the customers,
>constructed the (in my opinion) fraud, and it is our responsibility to
>make it stop.  Greg made it stop, he changed equipment and vendors.
>Now he has achieved the expected minimum functionality and probably more.
>Until a clear signal is sent to Microport management, in a language they
>understand, we are wasting time and blood pressure being outraged.  For
>all of John Plocher's efforts (I believe them to be considerable), have
>we seen a significant change?  I haven't.  Can we expect John to make
>management apply the resources to produce a respectable product?  I
>think not, but you and I can.

I disagree. I think that, taking into consideration the problems of support
in a widely varying hardware and user expertise environment, all of the
UNIX vendors, not just Microport, have done a rather amazing job. I also
think that the products are more than respectable. Certainly they have bugs.
Freedom from bugs is a necessary and sufficient condition for triviality in
a program :-).

>
>Am I a hypocrite for buying, using, and upgrading V/AT?  For my equipment
>it's the only game in town and bad breath is better than no breath at all.

No, you're not a hypocrite.

>Sorry for the length, but I hadn't seen this said before and I thought
>it needed saying.

Ditto.

>-- 
>Bill Kennedy  usenet      {killer,att,cs.utexas.edu,sun!daver}!ssbn!bill
>              internet    bill@ssbn.WLK.COM

---
Mike Murphy  Sceard Systems, Inc.  544 South Pacific St. San Marcos, CA  92069
mrm@sceard.UUCP       {hp-sdd,nosc,ucsd}!sceard!mrm            +1 619 471 0655

mike@cimcor.mn.org (Michael Grenier) (02/01/89)

>>They didn't know it was fsck causing the problem until Steve took one of
>>their service techs through crashing a large file system and showed him
>>how fsck would corrupt it. This only happened a couple of months ago.
> 
> Actually, they have been aware of it for much longer than that... Well
> over a year ago we were experiencing the same problem and had MANY
> long discussions with them regarding it.  They informed us that there
> was a known problem with fsck, and that "someone is working on it".
> This was with the 1.3.6 release.  As of the 2.2 release it still was
> not fixed.

True, however Microport DOES have a version (probably beta only) that
works fine up to file partitions in the 1/2 gigabyte region (i.e., 1024K
blocks).  I know because mine is working fine on this 180K block
partition.  It now runs in large model and no longer needs a temp file
and thus doesn't corrupt file systems by using it. 

I don't know when it will be released officially but you could probably
get the beta version with a call to John Plocher at Microport.

    -Mike Grenier
     mike@cimcor.mn.org

learn@igloo.UUCP (william vajk) (02/01/89)

In article <871@sceard.UUCP>, mrm@sceard.UUCP (M.R.Murphy) writes:

Flame mode on...what else would one expect....

> Expense is relative. Anyone care to cite the cost history of a non-academic
> UNIX(tm) license over the years?

Who gives a shit what the relative costs are. They promised something not
yet delivered, a WORKING system. There are hidden costs in running this
crapola compared to something that works out of the box, add them in and what 
have you now ????
 
> A lot of the problems with Microport systems stem from the great variety
> of hardware that is "real close to just almost like" stuff from IBM(tm). 

We've been over this absolute bullshit nonsense time and again. I was told
that microport would run on ANY 286 AT clone, that they found NO
incompatibility problems, and that was the 1.3.6 that I bought. Have they
improved in the past 2+ years ? Certainly they have. But they still scribble
the disks in fsck, even in 2.4. Coupled with your elseif below, it is obvious
that hardware isn't a large part of the solution, especially considering
those who switched to xenix and cut their losses earlier realized little
or no hardware problems. The problems are in the code, get it ?

> That UNIX(tm) works as well as it
> does in the great range of environments in which it has found itself (IBM
> mainframes running UNIX over a CTS base, Univac(tm) 1100 series mainframes,
> on down to 8086's running hacked 22-bit memory management) is amazing. 

Let's keep one thing straight here. We're discussing one vendor with one
product. I could care less about products I didn't buy. It doesn't work
correctly here, and in a lot of other places. Most of us that are in this
newsgroup are unix buffs. Many of us have developed a certain unfavorable
passion for this one vendor based on their failure to make timely repairs.
Spouting about the wonders of the base product from which this one was
derived does nothing good, and is simply a waste of bandwidth.
 
> It is also interesting to note that the prices of the pieces are quite low
> when compared with pieces of similar functionality from vendors such as
> SUN(tm), DEC(tm), Data General(tm), ...

Why do you insist it is ok to steal from purchasers 'because it is cheap?'

> I am not appalled that the drivers don't work right. I am disappointed,
> sometimes a bit dismayed, but understanding of the people who tried to get it
> right but goofed up some. If I can, I work around the problem. If I can't,
> then I violate my license agreement (just a little:-) and disassemble the
> offending code, and see if I can fix it. So far, so good.
 
And this is the best bit of all. Here we have a gentleman who purports to
have fixed what microport couldn't or wouldn't in over two years, and he
keeps it to himself. I rather think that this explains a lot more about the
author than he thinks. None of the conclusions are very *nice*.

> We may not, however, be able to accept the individual burden of support
> that systems of this complexity currently demand.
 
Compared to what unix or xenix system ?

Do you think reasonable 'support' for a system includes a sysadmin rehacking 
the code that microport screwed up ?
 
> I disagree. I think that, taking into consideration the problems of support
> in a widely varying hardware and user expertise environment, all of the
> UNIX vendors, not just Microport, have done a rather amazing job.

Nice of you to redefine 'support' again. Support in this context is handholding,
a form of training a user. It is NOT fixing of bugs. It is NOT selling repairs 
that should be free. Essentially the only thing microport has done really well
is to sell defective code and string the users along through several
upgrades while not fixing some of the original problems.

If it were offered, would you as someone apparently favorably inclined towards 
microport sink your life's savings into their stock ? How about Sun or some
of the other vendors you mentioned. I see. And it has nothing to do with
price. It must have to do with performance. If Henry Ford had made as bad
an automobile as uport has a unix release, we'd all still be riding horses.



Bill Vajk		| A hypocrite is a gilded pill, composed of two
learn@igloo		| natural ingredients, natural dishonesty, and
			| artificial dissimulation.     -Overbury-

plocher@uport.UUCP (John Plocher) (02/04/89)

In article <1095@igloo.UUCP> learn@igloo.UUCP (william vajk) writes:
>Flame mode on...what else would one expect....
>
>If it were offered, would you as someone apparently favorably inclined towards 
>microport sink your life's savings into their stock ?
>
>Bill Vajk		| A hypocrite is a gilded pill, composed of two
>learn@igloo		| natural ingredients, natural dishonesty, and
>			| artificial dissimulation.     -Overbury-

Bill,
    A year ago I was a Microport "user" in good old Wisconsin.  I was offered
a chance to give up my University job and move out to California and join the
staff here at Microport.  Not only did I commit my "life savings" (such as
it was...), I committed my financial future and my professional reputation
to the company, in the hope that I could do something to improve the product.
Sure, I could have sat home and bitched at "those guys" over at microport 
every chance I got, but I didn't.  I moved out here and DID something about it.

    In my book, that is more of a risk than any stock could ever be.

    -John Plocher
     Microport Systems

learn@igloo.Scum.COM (william vajk) (02/06/89)

In article <301@uport.UUCP>, plocher@uport.UUCP (John Plocher) writes:

> I moved out here and DID something about it.

I commend your efforts, John, and wish you well. What you say makes
a lot of sense yet also indicates that some two years later there are
still serious problems.