rd@tarpit.UUCP (Bob Thrush) (12/18/88)
About 3 months ago, the 2nd drive on this System V/AT 2.3.1 system died.
It was replaced, the entire drive was formatted, one partition was created
with /etc/fdisk and 2 file systems were made.

In the past month, I have been noticing intermittent "HD I/O Errors ..." often
followed by serious file system problems on the replacement drive. I have
searched the printed manuals and man pages and have not found any
documentation of this error.

Here are a few samples:

HD I/O Error Fun: 30 Cyl: 329 Hd: 5 Sec: 9 Status: 51 Estat: 10 Drstat: A5
HD I/O Error Fun: 20 Cyl: 346 Hd: 3 Sec: 12 Status: 59 Estat: 10 Drstat: B3
HD I/O Error Fun: 30 Cyl: 197 Hd: 4 Sec: 11 Status: 51 Estat: 10 Drstat: A4
HD I/O Error Fun: 30 Cyl: 346 Hd: 2 Sec: 7 Status: 51 Estat: 10 Drstat: B2

Exactly what do these messages mean? Furthermore, is there a way to have the
messages logged to a file?

If (as I expect) they indicate disk errors, does System V/AT gracefully switch
to alternate areas in the face of disk write errors? How does the bad block
mechanism work? If bad block mapping is not done automatically, how do I
translate the above into a badblock update? How many bad blocks are allowed?
If I have multiple System5 partitions, how do I enter the initial bad block
information for the 2nd and subsequent partitions?

I would appreciate any help regarding this problem. Especially in
understanding the bad block mechanism and the meaning of the HD I/O Errors.
If any information is dependent on a particular release of System V/AT,
please be specific. I will summarize all email responses.

For those who wish to read on, I have attached relevant info from /etc/fdisk,
/etc/divvy, and /etc/showbad.
**********************************************************
/etc/fdisk 1 yields:

Drive parameters from fixed disk unit 1

Cylinders   Tracks/Cylinder   Landing Zone   Write Precomp
   982             7              982             -1

Display Partition Information

Partition   Status   Type      Start   End   Size   Blocks
    4         N      unknown       0     0      0        0
    3         N      unknown       0     0      0        0
    2         N      unknown       0     0      0        0
    1         A      System5       1   981    981   116739

**********************************************************
/etc/divvy -d 1 yields:

CONTENTS OF PARTITION END RECORD FOR UNIT #1

Drive Table
----- -----
Number of cylinders:            982
Number of heads/cylinder:       7
Landing zone:                   982
Write precomp:                  -1
Sectors/track:                  17
Sector size:                    512
Number of alternate cylinders:  0
Actual sectors/cylinder:        119
DOS disk control byte:          0
DOS compatible null 0:          0
DOS compatible null 1:          0
DOS compatible null 2:          0
DOS compatible null 3:          0
DOS compatible null 4:          0
DOS compatible null 5:          0
Slice table pointer:            0

Slice Table
----- -----
Slice 0   ROOT                 -- first sector: 119,    number of sectors: 40000
Slice 1   SWAP                 -- first sector: 40119,  number of sectors: 0
Slice 2   USR                  -- first sector: 115838, number of sectors: 0
Slice 3   TMP                  -- first sector: 40119,  number of sectors: 75719
Slice 4   Reserved             -- first sector: 26000,  number of sectors: 0
Slice 5   DOS partition        -- first sector: 0,      number of sectors: 0
Slice 6   UNIX partition #1    -- first sector: 119,    number of sectors: 116739
Slice 7   UNIX partition #2    -- first sector: 0,      number of sectors: 0
Slice 8   UNIX partition #3    -- first sector: 0,      number of sectors: 0
Slice 9   UNIX partition #4    -- first sector: 0,      number of sectors: 0
Slice 10  Entire disk          -- first sector: 0,      number of sectors: 116858
Slice 11  Last track active pt -- first sector: 116841, number of sectors: 17

Minor Device Table
----- ------ -----
Note that the Winchester driver ONLY uses the information stored in the
minor device table of the partition end record of the primary drive (unit 0).
i1010minor[0]  (unit 0, slice 0):   0
i1010minor[1]  (unit 0, slice 1):   1
i1010minor[2]  (unit 0, slice 2):   2
i1010minor[3]  (unit 0, slice 3):   3
i1010minor[4]  (unit 0, slice 4):   4
i1010minor[5]  (unit 0, slice 5):   5
i1010minor[6]  (unit 0, slice 6):   6
i1010minor[7]  (unit 0, slice 7):   7
i1010minor[8]  (unit 0, slice 8):   8
i1010minor[9]  (unit 0, slice 9):   9
i1010minor[10] (unit 0, slice 10):  10
i1010minor[11] (unit 0, slice 11):  11
i1010minor[12] (reserved):          0
i1010minor[13] (reserved):          0
i1010minor[14] (reserved):          0
i1010minor[15] (reserved):          0
i1010minor[16] (reserved):          0
i1010minor[17] (reserved):          0
i1010minor[18] (reserved):          0
i1010minor[19] (reserved):          0
i1010minor[20] (unit 1, slice 0):   1040
i1010minor[21] (unit 1, slice 1):   1041
i1010minor[22] (unit 1, slice 2):   1042
i1010minor[23] (unit 1, slice 3):   1043
i1010minor[24] (unit 1, slice 4):   1044
i1010minor[25] (unit 1, slice 5):   1045
i1010minor[26] (unit 1, slice 6):   1046
i1010minor[27] (unit 1, slice 7):   1047
i1010minor[28] (unit 1, slice 8):   1048
i1010minor[29] (unit 1, slice 9):   1049
i1010minor[30] (unit 1, slice 10):  1050
i1010minor[31] (unit 1, slice 11):  1051

**********************************************************
/etc/showbad 1 yields (a lot of bad blocks):

Bad Track Table - Unit 1

Bad Cylinder   Bad Head   Alt. Cylinder   Alt. Head
      28           3           974            0
      33           1           974            1
      40           1           974            2
      41           1           974            3
      63           1           974            4
      77           1           974            5
     119           0           974            6
     122           1           975            0
     123           1           975            1
     124           1           975            2
     141           0           975            3
     211           1           975            4
     230           1           975            5
     474           4           975            6
     643           4           976            0
     700           3           976            1
     719           3           976            2
     735           3           976            3
     736           3           976            4
     740           3           976            5
     792           4           976            6
     794           4           978            1
     795           4           977            0
     800           1           977            1
     831           3           977            2
     843           3           977            3
     849           3           977            4
     859           3           977            5
     874           3           977            6
     968           3           978            0

**********************************************************

Thanks,
--
Bob Thrush                     UUCP: {rtmvax,ucf-cs}!tarpit!rd
Automation Intelligence, 1200 W. Colonial Drive, Orlando, Florida 32804
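
The sample messages and the /etc/showbad table in the post above can be cross-checked mechanically. What follows is only a rough sketch in Python, not anything shipped with System V/AT. It assumes the Cyl/Hd/Sec fields are decimal, and it makes no attempt to interpret Fun, Status, Estat or Drstat, since the post does not say what they mean. The names `parse_error` and `is_remapped` are made up for the illustration.

```python
import re

# Layout of the console message as quoted in the post. Treating Cyl/Hd/Sec as
# decimal is an assumption; Fun/Status/Estat/Drstat are kept as raw strings
# because their meaning is not given in the post.
ERROR_RE = re.compile(
    r"HD I/O Error Fun: (\w+) Cyl: (\d+) Hd: (\d+) Sec: (\d+) "
    r"Status: (\w+) Estat: (\w+) Drstat: (\w+)"
)

# (bad cylinder, bad head) pairs copied from the /etc/showbad output.
BAD_TRACKS = {
    (28, 3), (33, 1), (40, 1), (41, 1), (63, 1), (77, 1), (119, 0),
    (122, 1), (123, 1), (124, 1), (141, 0), (211, 1), (230, 1), (474, 4),
    (643, 4), (700, 3), (719, 3), (735, 3), (736, 3), (740, 3), (792, 4),
    (794, 4), (795, 4), (800, 1), (831, 3), (843, 3), (849, 3), (859, 3),
    (874, 3), (968, 3),
}

# The four sample messages from the post.
SAMPLES = [
    "HD I/O Error Fun: 30 Cyl: 329 Hd: 5 Sec: 9 Status: 51 Estat: 10 Drstat: A5",
    "HD I/O Error Fun: 20 Cyl: 346 Hd: 3 Sec: 12 Status: 59 Estat: 10 Drstat: B3",
    "HD I/O Error Fun: 30 Cyl: 197 Hd: 4 Sec: 11 Status: 51 Estat: 10 Drstat: A4",
    "HD I/O Error Fun: 30 Cyl: 346 Hd: 2 Sec: 7 Status: 51 Estat: 10 Drstat: B2",
]


def parse_error(line):
    """Split one console message into its fields, or return None if it does not match."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    fun, cyl, hd, sec, status, estat, drstat = m.groups()
    return {
        "fun": fun, "cyl": int(cyl), "hd": int(hd), "sec": int(sec),
        "status": status, "estat": estat, "drstat": drstat,
    }


def is_remapped(err):
    """True if the failing track already has an entry in the bad track table."""
    return (err["cyl"], err["hd"]) in BAD_TRACKS


if __name__ == "__main__":
    for line in SAMPLES:
        err = parse_error(line)
        state = "in bad track table" if is_remapped(err) else "not in bad track table"
        print(err["cyl"], err["hd"], err["sec"], state)
```

Run against the four samples, none of the failing (cylinder, head) pairs has an entry in the bad track table shown by /etc/showbad.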
larry@focsys.UUCP (Larry Williamson) (12/19/88)
In article <460@tarpit.UUCP> rd@tarpit.UUCP (Bob Thrush) writes:
>About 3 months ago, the 2nd drive on this System V/AT 2.3.1 system died.
>It was replaced, the entire drive was formatted [...]
> [...] I have been noticing intermittent "HD I/O Errors ..." often
>followed by serious file system problems on the replacement drive.
>
>Here are a few samples:
>
>HD I/O Error Fun: 30 Cyl: 329 Hd: 5 Sec: 9 Status: 51 Estat: 10 Drstat: A5
>
>Exactly what do these messages mean?

This means, you've got trouble.

We had been running with two hard disks for a few months (maybe 6 or 7??)
with no troubles. Then I started to see the occasional hard disk error.
Then near the end, there were many errors. It got to the point where we
could not even back up this second disk. The errors caused cpio (and tar)
to die. (The first disk continued to work just fine; there was an occasional
error but never any trouble.)

We upgraded to 2.4 and errors have disappeared completely. We also replaced
the disk, I couldn't bring myself to trust it.

I'm not sure why, but it seemed that the disk errors grew at an exponential
rate. I would therefore suggest that you *very quickly* get your 2.4 upgrade
and install it. I would also suggest that you verify your backups, you might
be surprised by what is on (or not on) those tapes!

Good Luck,
   Larry

--
Larry Williamson -- Focus Systems -- Waterloo, Ontario
                  watmath!focsys!larry  (519) 746-4918
pcg@aber-cs.UUCP (Piercarlo Grandi) (12/21/88)
In article <326@focsys.UUCP> larry@focsys.UUCP (Larry Williamson) writes:

    In article <460@tarpit.UUCP> rd@tarpit.UUCP (Bob Thrush) writes:

        [ .... io errors on two drive system .... ]

    [ .... io errors as well .... ]

    We upgraded to 2.4 and errors have disappeared completely. We also
    replaced the disk, I couldn't bring myself to trust it.

The bad block handling code in 2.3 was horribly braindamaged. It did not
recover from soft errors, and then wrote random trash in random blocks. The
disk, on the other hand, you could have trusted; it was clearly a case of
environmental (dis)adaptation of the format.

    I'm not sure why, but it seemed that the disk errors grew at an
    exponential rate.

A folksy description of a common problem follows. Winchester disks are very
delicate things. If operating temperature changes, etc..., they suffer
contraction/expansion of the surfaces, or of the heads etc..., and what was
previously recorded may become gibberish. This does not imply that the
surface has become damaged though, simply that it has become difficult to
read back the recorded format.

The symptoms are an increase in the number of soft errors, and then of hard
errors. The cure is to reformat the disk. By the way, never trust a
preformatted disk; always reformat it on site, in the place where the machine
will be used, in its typical operating conditions.

    I would therefore suggest that you *very quickly*, get your 2.4 upgrade
    and install it.

The advantage of 2.4 is that bad block handling now is said to be ok.
Previously, if a read from a disk failed, it was not retried at all (even
though most errors are soft), and the buffer cache slot that was assigned to
the block to be read was not marked invalid. If and when written back to
disk, the previous contents of that slot would overwrite the contents of the
disk block, with astonishing results.

    I would also suggest that you verify your backups, you might be
    surprised by what is on (or not on) those tapes!
I would also suggest not to trust the current contents of your disks, unless you check them. Note that I said *contents*, not just *structure*, i.e. some of your files contents may have been corrupted. -- Piercarlo "Peter" Grandi INET: pcg@cs.aber.ac.uk Sw.Eng. Group, Dept. of Computer Science UUCP: ...!mcvax!ukc!aber-cs!pcg UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)
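
The failure mode described in the post above (a failed read leaves the cache slot marked valid, and a later write-back puts the slot's old contents onto the wrong block) can be illustrated with a toy model. This is a deliberately simplified sketch in Python, not the Microport driver or kernel code. The one-slot `ToyCache` class and the `invalidate_on_error` switch are invented for the illustration, and the real buffer cache is certainly more involved.

```python
class ToyCache:
    """A one-slot stand-in for a buffer cache (purely illustrative)."""

    def __init__(self, disk, invalidate_on_error):
        self.disk = disk                          # block number -> contents
        self.invalidate_on_error = invalidate_on_error
        self.slot_block = None                    # block the slot is assigned to
        self.slot_data = None                     # contents currently held
        self.valid = False

    def read(self, block, fail=False):
        # The slot is reassigned to the requested block before the transfer.
        self.slot_block = block
        if fail:
            if self.invalidate_on_error:
                # Behaviour one would expect: a failed read leaves no valid data.
                self.valid = False
                self.slot_data = None
            # Otherwise the slot keeps its old contents and stays "valid".
            return None
        self.slot_data = self.disk[block]
        self.valid = True
        return self.slot_data

    def flush(self):
        # Write the slot back to whatever block it is currently assigned to.
        if self.valid and self.slot_block is not None:
            self.disk[self.slot_block] = self.slot_data


def run_scenario(invalidate_on_error):
    """Read block 1, suffer a failed read of block 2, then flush the cache."""
    disk = {1: b"news article", 2: b"mail message"}
    cache = ToyCache(disk, invalidate_on_error)
    cache.read(1)
    cache.read(2, fail=True)
    cache.flush()
    return disk


if __name__ == "__main__":
    print("slot left valid after error:", run_scenario(False))
    print("slot invalidated on error:  ", run_scenario(True))
```

In the first scenario block 2 ends up holding the contents of block 1, which is the kind of silent content corruption the post warns about. In the second scenario both blocks are unchanged.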
rd@tarpit.UUCP (Bob Thrush) (12/21/88)
In article <326@focsys.UUCP> larry@focsys.UUCP (Larry Williamson) writes:
>In article <460@tarpit.UUCP> rd@tarpit.UUCP (Bob Thrush) writes:
>>About 3 months ago, the 2nd drive on this System V/AT 2.3.1 system died.
>> [...] I have been noticing intermittent "HD I/O Errors ..."
>>[...] Exactly what do these messages mean?
>
>This means, you've got trouble. [...]
>We upgraded to 2.4 and errors have disappeared completely. We also replaced
>the disk, I couldn't bring myself to trust it.
>
>I'm not sure why, but it seemed that the disk errors grew at an exponential
>rate. I would therefore suggest that you *very quickly*, get your 2.4 upgrade
>and install it. I would also suggest that you verify your backups, you might
>be surprised by what is on (or not on) those tapes!

Larry, thanks for the advice. I've had 2.4 since it was announced.
However, I have heard (in this newsgroup) so many reports of problems with
2.4, i.e. keyboard lockups, different curses problems (that I found
workarounds to in 2.3.1) that I have been reluctant to trade in the devil
I (sort of) know for 2.4. Are these problem reports regarding 2.4 not as
serious as a casual reader would assume? Has Microport made any comment
regarding the 2.4 problem reports?

The 2nd disk (that I'm having trouble with) is mostly used as the news
spool directory, so it is definitely getting a whole lot different
activity than it did before the onset of the problems. Each time the
problem shows up, I find that each subsequent fsck finds more problems,
usually associated with duplicates in the free list. I wind up
mkfs'ing the news file system to correct(?) the problem. I am usually
able to restore most of the news spool directory from a backup tape made
when I first notice a problem (I don't back up news routinely).

I have noticed that one cpio was hosed part way in. When restoring, cpio
reported something like "the archive is not in cpio format".
I investigated this further on a Tektronix workstation that was able to read the "cpio -ocv" format and found 2 places where the cpio header contained (probably) the correct file size but the following data was short by exactly 8192 bytes. I edited the headers (subtracted 8192 from the size) and was able to successfully restore from the tape. Fortunately, the two truncated articles were not in newsgroups that our site regularly reads. I'm tempted to rebuild my 2.3.1 kernel with the hard disk driver from 2.4 to narrow down the problem. Any comments from the net or Microport regarding this possibility? Since I'm leaving ASAP for Xmas holiday, I won't be responding soon to this group; however, I will followup when I return. BTW, I got a complete rundown of the meaning of the hard disk i/o errors from Randy Jarrett who copied a posting <358@uport.UUCP> by Marc de Groot (then of Microport). When I return from the holidays, I'll repost that if there is interest. Thanks, Randy (and Marc). I'm still interested in knowing how Microport System V/AT handles bad blocks. > >Good Luck, > Larry > >-- >Larry Williamson -- Focus Systems -- Waterloo, Ontario > watmath!focsys!larry (519) 746-4918 -- Bob Thrush UUCP: {rtmvax,ucf-cs}!tarpit!rd Automation Intelligence, 1200 W. Colonial Drive, Orlando, Florida 32804
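
The header edit described in the post above (subtracting 8192 from the recorded file size so that the next header is found where it really is) can be sketched as follows. This is an illustration only, not the procedure actually used on the Tektronix workstation. It assumes that the "cpio -ocv" archive uses the portable ASCII header: the magic string 070707, octal digit fields, 76 bytes in total, with the file size in the last 11 digits. The function name `shrink_filesize` is made up for the example.

```python
# Assumed layout of the portable ASCII cpio header (all fields octal digits):
# magic(6) dev(6) ino(6) mode(6) uid(6) gid(6) nlink(6) rdev(6)
# mtime(11) namesize(6) filesize(11), 76 bytes in total.
HEADER_LEN = 76
MAGIC = b"070707"
SIZE_OFFSET = 65      # 6 + 7 * 6 + 11 + 6
SIZE_DIGITS = 11


def shrink_filesize(header, missing):
    """Return a copy of the header with its file size reduced by `missing` bytes."""
    if len(header) != HEADER_LEN or header[:6] != MAGIC:
        raise ValueError("not a portable ASCII cpio header")
    size = int(header[SIZE_OFFSET:SIZE_OFFSET + SIZE_DIGITS], 8)
    if missing > size:
        raise ValueError("cannot remove more bytes than the file holds")
    new_size = format(size - missing, "011o").encode("ascii")
    return header[:SIZE_OFFSET] + new_size


if __name__ == "__main__":
    # A made-up header for a 20000-byte file with a 6-byte name field.
    demo = (MAGIC + b"000000" * 7 + b"00000000000" + b"000006"
            + format(20000, "011o").encode("ascii"))
    patched = shrink_filesize(demo, 8192)
    print(int(patched[SIZE_OFFSET:], 8))
```

The patched header still has to be written back at the same offset on a copy of the archive. Working on a copy rather than on the only tape image seems prudent.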
markz@ssc.UUCP (Mark Zenier) (12/24/88)
In article <464@tarpit.UUCP>, rd@tarpit.UUCP (Bob Thrush) writes:
> Are these problem reports regarding
> 2.4 not as serious as a casual reader would assume?

2.4 is a great improvement over 2.3. The screen seems a bit faster
and the floppy disk driver doesn't crash the system if the wrong
device is used for doscp. And my keyboard locked up just as much
with 2.3, until I bought a decent one.

> The 2nd disk (that I'm having trouble with) is mostly used as the news
> spool directory, so it is definitely getting a whole lot different
> activity than it did before the onset of the problems. Each time the
> problem shows up, I find that each subsequent fsck finds more problems,
> usually associated with duplicates in the free list.

Your problem sounds like the canonical Two Drive bug that was the topic
of much discussion here a couple of months ago. The problem was
(Microport, correct me if I'm wrong) that divvy didn't set up the bad track
areas on the second drive correctly. This is fixed either with 2.4 or
by getting a fixed 2.3 divvy utility from the uport bulletin board.

Mark Zenier    uunet!nwnexus!pilchuck!ssc!markz    markz@ssc.uucp
               uw-beaver!tikal!
steve@nuchat.UUCP (Steve Nuchia) (12/25/88)
In article <464@tarpit.UUCP> rd@tarpit.UUCP (Bob Thrush) writes:
[concerning microbug phantom disk errors on second drive]
>The 2nd disk (that I'm having trouble with) is mostly used as the news
>spool directory, so it is definitely getting a whole lot different
>activity than it did before the onset of the problems. Each time the

From my extensive experience with this problem, if it gets you it
gets you in proportion to the frequency of write access. News spool
is about the worst thing to put out there, but I kept mine there because
I didn't want the errors eating anything I wanted to keep.

Now I'm using Interactive on Bell Tech. Still have some problems
but nothing like Microport. I spent a year and a half of my life
working with those clowns. Boy am I a sucker.

>problem shows up, I find that each subsequent fsck finds more problems,
>usually associated with duplicates in the free list. I wind up
>mkfs'ing the news file system to correct(?) the problem. I am usually

The problem here is a BUG in FSCK. There is a workaround. I know
of at least two people in Microport who have been assigned to fix
it; I don't know if either of them made any more progress than I did.

The bug is that, for large filesystems, fsck's free block bitmap
gets corrupted. The bitmap is built in phase 1, corrupted in phase 2
by an as-yet undiscovered mechanism, and used to rebuild a bad
freelist in phase 5/6. Note that it will report a bad freelist on
a perfectly good filesystem, then proceed to trash it, if you let it.
When it rebuilds a random freelist it uses some blocks assigned to
files as freelist chain blocks, corrupting the files. When some of
those blocks fall in directories you really get filesystem hash.

The workaround is to run fsck on your filesystem but NOT ALLOW it
to REBUILD THE FREELIST. Then run fsck -f on it. The -f option
says to just run phase 1 and 5/6, and it can be allowed to rebuild
the freelist since it didn't scribble on its bitmap in phase 2.
My analysis of the code says that this is a compiler bug, but there is the possibility that it is a subtle architecture dependency in fsck itself. In any case the mechanism appears to involve aliasing of one or more blocks in fsck's "virtual memory" code -- it manages a file-backed buffer pool using some of the most twisted code I've ever laid eyes on. The problem is not sensitive to optimization when compiling fsck. It is extremely sensitive to the size and contents of your filesystem. In my experience filesystems that are small enough to not require a temporary file are safe. >BTW, I got a complete rundown of the meaning of the hard disk i/o >errors from Randy Jarrett who copied a posting <358@uport.UUCP> >by Marc de Groot (then of Microport). When I return from the >holidays, I'll repost that if there is interest. Thanks, Randy >(and Marc). Please do. -- Steve Nuchia South Coast Computing Services uunet!nuchat!steve POB 890952 Houston, Texas 77289 (713) 964 2462 Consultation & Systems, Support for PD Software.
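
The two-pass workaround described in the post above can be written down as a decision rule plus a command plan. This is only a sketch in Python under stated assumptions, not a tested recovery script. The only fsck option taken from the post is -f. The idea that the free list question can be recognised by the word SALVAGE in the prompt is an assumption about the prompt wording, and `answer_prompt` and `two_pass_plan` are hypothetical names.

```python
def answer_prompt(prompt, allow_salvage):
    """Pick a reply to one fsck question.

    Assumption: the prompt that offers to rebuild the free list contains the
    word SALVAGE. Check the actual wording on your release before relying on it.
    """
    if "SALVAGE" in prompt.upper() and not allow_salvage:
        return "n"
    return "y"


def two_pass_plan(device):
    """Describe the two fsck runs from the post as (command, allow_salvage) pairs."""
    return [
        # Pass 1: full check, every repair allowed EXCEPT the free list rebuild.
        (["fsck", device], False),
        # Pass 2: fsck -f runs only phase 1 and 5/6, so the rebuild is allowed.
        (["fsck", "-f", device], True),
    ]


if __name__ == "__main__":
    # /dev/dsk/example is a placeholder device name, not one from the thread.
    for command, allow_salvage in two_pass_plan("/dev/dsk/example"):
        print(" ".join(command), "| free list rebuild allowed:", allow_salvage)
```

Nothing here actually drives fsck. Feeding the answers to a live fsck would need something that reads its prompts one at a time, and that part is deliberately left out.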
trevor@trevan.UUCP (trevor) (01/01/89)
In article <2689@nuchat.UUCP>, steve@nuchat.UUCP (Steve Nuchia) writes:
>
> The problem here is a BUG in FSCK. There is a workaround. I know
> of at least two people in Microport who have been assigned to fix
> it, I don't know if either of them made any more progress than I did.
>
> The bug is that, for large filesystems, fsck's free block bitmap
> gets corrupted. The bitmap is built in phase 1, corrupted in phase 2
>.....
>
> The workaround is to run fsck on your filesystem but NOT ALLOW it
> to REBUILD THE FREELIST. Then run fsck -f on it. The -f option
> says to just run phase 1 and 5/6, and it can be allowed to rebuild
> the freelist since it didn't scribble on its bitmap in phase 2.
>

Well well, I spent a whole week trying to sort my disks out and now it
turns out that FSCK is at fault. Microport does admit to there being
a problem, but it says only with file systems greater than 130000 blocks.
All my file systems are less than 100,000 blocks and I still get this
problem.

I must thank Steve for a workaround which will help, but there is still
the problem of the file system check at boot up. I guess we will have to make
it interactive in order to stop this self destruction. This means that
unattended reboots after power cuts etc. will not be possible unless
someone can tell us how to prevent fsck from rebuilding the free list
first time round. I guess it might be possible to create some sort of
shell program to interact with fsck and answer all the questions.

This must be the worst bug in Microport's system and is worse than most
viruses. Why didn't Microport warn us of this problem? If they knew
about it I think it was totally negligent of them not to have told us.

I think that Microport should make the fixing of this bug their top priority.
jay@splut.UUCP (Jay Maynard) (01/02/89)
In article <211@trevan.UUCP> trevor@trevan.UUCP (trevor) writes: >Well well I spent a whole week trying to sort my disks out and now it >turns out to be FSCK to be at fault. Microport does admit to there being >a problem but it says only with file systems greater tan 130000 blocks. >All my file systems are less than 100,000 blocks and I still get this >problem. I first encountered this problem on an 84K block filesystem. I spent a week with fsdb and fsck, using fsdb to straighten out the worst problems, and then using fsck to (I thought) straighten out the filesystem. ARGH!! I finally gave up when Steve told me about the bug. >I must thank Steve for a workaround which will help but there is still >the problem of the file system check at boot up. I guess we will have to make >it interactive inorder to stop this self destruction. This means that >unattended reboots after powercuts etc, will not be possible unless >someone can tell us how to prevent fsck from rebuilding the free list >first time round. I guess it might be possible to create some sort of >shell programm to interact with fsck and answer all the questions. I've already done this in response to this problem. To turn off the automatic fscks at boot time, edit /etc/bcheckrc and /etc/mountall and remove the -y switch from the fsck command. I now leave a boot floppy in drive 0 with the door closed, so that in the event of an automatic reboot, it doesn't even attempt to reboot the full system; I then manually fsck things from the boot floppy, doing it twice if the first time claims that the free list needs rebuilding. This makes it even more important to have at least one partition small enough to be checked without the need of a work file; fsck that one first, then mount it on /mnt and use /mnt/foo as the work file for the rest of them. >This must be the worst bug in Microports system and is worse than most >viruses. Why didnt Microport warn us of this problem? 
If they knew >about it I think it was totally negligent of them not to have told us. They didn't know it was fsck causing the problem until Steve took one of their service techs through crashing a large file system and showed him how fsck would corrupt it. This only happened a couple of months ago. As for telling us about known bugs, they only do that for holders of their misnamed support contracts. I agree that it's negligent for them not to periodically mail out lists of known bugs. Maybe they're afraid it'll make their software look buggy. Actually, it's not that bad of a bug; if you know about the workaround, it's easy (though time-consuming) to deal with. It'd not be a nuisance at all if the system didn't repeatedly crash. >I think that Microport should make the fixing of this bug their top priority. What? Service their customer base? Radical concept, that. -- Jay Maynard, EMT-P, K5ZC, PP-ASEL | Never ascribe to malice that which can uucp: uunet!nuchat! (eieio)| adequately be explained by stupidity. hoptoad!academ!uhnix1!splut!jay +---------------------------------------- {killer,bellcore}!tness1! | Free Texas from its chains: SECEDE!!
steve@nuchat.UUCP (Steve Nuchia) (01/03/89)
In article <798@splut.UUCP> Jay Maynard writes: >In article <211@trevan.UUCP> trevor@trevan.UUCP (trevor) writes: >>Well well I spent a whole week trying to sort my disks out and now it >>turns out to be FSCK to be at fault. Microport does admit to there being >>a problem but it says only with file systems greater tan 130000 blocks. >>All my file systems are less than 100,000 blocks and I still get this >>problem. >I first encountered this problem on an 84K block filesystem. I spent a >week with fsdb and fsck, using fsdb to straighten out the worst With the default number of inodes the problem is rare or nonexistent under 130000 blocks. Mkfs will give you something like 13000 inodes in that case, which is a little light for storing a full news feed. If you run a 120000 block filesystem with say 20000 inodes it will definitely trigger the bug, at least when sufficiently full. [filler line, sorry.] -- Steve Nuchia South Coast Computing Services uunet!nuchat!steve POB 890952 Houston, Texas 77289 (713) 964 2462 Consultation & Systems, Support for PD Software.
dougm@uport.UUCP (Doug Moran) (01/06/89)
In article <211@trevan.UUCP> trevor@trevan.UUCP (trevor) writes:
>This must be the worst bug in Microports system and is worse than most
>viruses. Why didnt Microport warn us of this problem? If they knew
>about it I think it was totally negligent of them not to have told us.

In the Release Notes for Release 2.4 of System V/AT, on page R-21,
is the following:

"File systems greater than approx. 130000 blocks experience corruption
over time that fsck can't repair. fsck may report negative numbers
and corrupt the file system further (#605)."

There *is* a bug in fsck, we *are* aware of it, and we *are* trying
to fix it. And we did try and warn you. How can we warn you
better (no sarcasm intended; I am trying to make the Release Notes
etc. more user-friendly)?

Doug Moran, Tech. Pubs.
gk@kksys.mn.org (Greg Kemnitz) (01/28/89)
In article <798@splut.UUCP> jay@splut.UUCP (Jay Maynard) writes: >In article <211@trevan.UUCP> trevor@trevan.UUCP (trevor) writes: [ comments about Microport fsck trashing disks...] >>This must be the worst bug in Microports system and is worse than most >>viruses. Why didnt Microport warn us of this problem? If they knew >>about it I think it was totally negligent of them not to have told us. > >They didn't know it was fsck causing the problem until Steve took one of >their service techs through crashing a large file system and showed him >how fsck would corrupt it. This only happened a couple of months ago. Actually, they have been aware of it for much longer than that... Well over a year ago we were experiencing the same problem and had MANY long discussions with them regarding it. They informed us that there was a known problem with fsck, and that "someone is working on it". This was with the 1.3.6 release. As of the 2.2 release it still was not fixed. We did discover a workaround though... Replace the system with a '386 running Interactive 386/ix. Works great! Also fixes all the other problems that Microport is "working on". Of course, this "solution" IS a bit expensive.... Greg Kemnitz / K and K Systems / PO Box 41804 / Plymouth, MN 55441-0804 Domain: gk@kksys.mn.org / UUCP: ...!rutgers!bungia!kksys!gk Voice: (612)475-1527 / Fax: (612)475-1979
bill@ssbn.WLK.COM (Bill Kennedy) (01/29/89)
In article <920@kksys.mn.org> gk@kksys.UUCP (Greg Kemnitz) writes: >In article <798@splut.UUCP> jay@splut.UUCP (Jay Maynard) writes: >>In article <211@trevan.UUCP> trevor@trevan.UUCP (trevor) writes: > [ comments about Microport fsck trashing disks...] >>>This must be the worst bug in Microports system and is worse than most >>>viruses. Why didnt Microport warn us of this problem? If they knew >>>about it I think it was totally negligent of them not to have told us. [ Greg points out that they did tell us ] >long discussions with them regarding it. They informed us that there >was a known problem with fsck, and that "someone is working on it". >This was with the 1.3.6 release. As of the 2.2 release it still was >not fixed. It's still documented (and, unfortunately, confirmed) in 2.4. I'm unsure of the need for file systems > 130,000 blocks on a '286. I encountered it because I needed a half height drive and the one I got was 122Mb, so I juggled things until the drive and 2.4 were happy with each other (no help from the install instructions!). The 72Mb drive was plenty for what I wanted but it was physically too large. >We did discover a workaround though... Replace the system with a '386 >running Interactive 386/ix. Works great! Also fixes all the other >problems that Microport is "working on". > >Of course, this "solution" IS a bit expensive.... Here I disagree with Greg but only partially. He's right on target with the overall premise, i.e. don't buy Microport. I disagree that it's expensive. If you place any value on your system's reliability, user satisfaction, or your own time, avoiding Microport is quite cost effective. I view Microport's "offerings" (no, I will still not dignify them by calling them "products") as experimental. What *IS* expensive is what they charge for experimental works alleged to be products. I have a '286 that runs V/AT but it's my luggable that accompanies me when I'm on the road. 
As such, the quirks, bugs, and anomalies are 100% my responsibility
and I am the only one victimized by them. I expect no support and get
none, so I am never disappointed. If you are going to run a System V
on an AT/clone, I'm not aware of anything else. AT&T had a very nice
System V for the PC 6300 PLUS. I think it will help your blood pressure
if you can accept V/AT as an experiment by experimenters, it does mine.

Changing to a '386 makes a lot of sense if you have to have decent reliability
and user satisfaction (even if you're the only user :-). Avoiding Microport
makes even more sense. I tried V/386 and pitched it (and the $$) into the
street when I saw what it was going to do to my uucp neighbors and users
who have come to think of this system as "available, usable, and reliable".
The money hurt because it was a lot of it and it was mine, personally. I
concluded that I would have spent far more on the telephone and chasing
alleged "problems" and would never achieve what I set out to do. It was
amazing how my "hardware problems" vanished when I installed AT&T 386 UNIX.
It's ironic how many of those "hardware problems" are documented as bug
fixes in 3.06e and disappointing how many of them would still be wrong
with my equipment if I used 3.06e.

I think that what we have here is a perceptual problem. I think that the
average '286/'386 user came from one of two camps, down from minis or up
from PC's. There may be a few who dove in from nowhere but probably not
many. Those who came down from minis are appalled that fundamental things
(fsck, device drivers, etc.) don't work right. Those coming up from PC's
are puzzled because their hardware doesn't work right with this new stuff.

The perceptual problem is compounded because we are probably mostly
individuals buying with our own money. We expect a certain minimum
functionality and we don't get it.
If it was a car or a microwave oven
there's a manufacturer's warranty, statutory relief; with Microport
there's an arrogant snort. That pisses us off (just like a lemon car)
because it was our own money and our expectations, the reasonable ones,
were neither met nor are they likely to be.

The arrogant snort I refer to is not from the technically inclined and
conscientious personnel at Microport. I think that they are as outraged
and upset as those of us whose money pays their salaries. Management
either doesn't care or won't listen. So who is the winner and who is
the loser? As long as we, in the marketplace, keep approving their effort
by continuing to spend money on it, we will lose and management will win.
The situation can not and will not change until we make it change. We,
the customers, constructed the (in my opinion) fraud, and it is our
responsibility to make it stop. Greg made it stop, he changed equipment
and vendors. Now he has achieved the expected minimum functionality and
probably more.

Until a clear signal is sent to Microport management, in a language they
understand, we are wasting time and blood pressure being outraged. For
all of John Plocher's efforts (I believe them to be considerable), have
we seen a significant change? I haven't. Can we expect John to make
management apply the resources to produce a respectable product? I think
not, but you and I can. Am I a hypocrite for buying, using, and upgrading
V/AT? For my equipment it's the only game in town and bad breath is
better than no breath at all.

Sorry for the length, but I hadn't seen this said before and I thought
it needed saying.
--
Bill Kennedy  usenet    {killer,att,cs.utexas.edu,sun!daver}!ssbn!bill
              internet  bill@ssbn.WLK.COM
mrm@sceard.UUCP (M.R.Murphy) (02/01/89)
In article <1131@ssbn.WLK.COM> bill@ssbn.WLK.COM (Bill Kennedy) writes: >In article <920@kksys.mn.org> gk@kksys.UUCP (Greg Kemnitz) writes: >>In article <798@splut.UUCP> jay@splut.UUCP (Jay Maynard) writes: >>>In article <211@trevan.UUCP> trevor@trevan.UUCP (trevor) writes: [many comments about Microport not fixing bugs deleted...] > >Here I disagree with Greg but only partially. He's right on target with >the overall premise, i.e. don't buy Microport. I disagree that it's >expensive. If you place any value on your system's reliability, user >satisfaction, or your own time, avoiding Microport is quite cost effective. I disagree, see below. > >I view Microport's "offerings" (no, I will still not dignify them by >calling them "products") as experimental. What *IS* expensive is what >they charge for experimental works alleged to be products. I have a >'286 that runs V/AT but it's my luggable that accompanies me when I'm on >the road. As such, the quirks, bugs, and anomalies are 100% my responsibility >and I am the only one victimized by them. I expect no support and get >none, so I am never disappointed. If you are going to run a System V >on an AT/clone, I'm not aware of anything else. AT&T had a very nice >System V for the PC 6300 PLUS. I think it will help your blood pressure >if you can accept V/AT as an experiment by experimenters, it does mine. Expense is relative. Anyone care to cite the cost history of a non-academic UNIX(tm) license over the years? > >Changing to a '386 makes a lot of sense if you have to have decent reliability >and user satisfaction (even if you're the only user :-). Avoiding Microport >makes even more sense. I tried V/386 and pitched it (and the $$) into the >street when I saw what it was going to do to my uucp neighbors and users >who have come to think of this system as "available, usable, and reliable". 
ruptime on our network yields:

    acim      up  3+14:43, 0 users   # AT&T 6386WGS 20MHZ 4MB,140MB, SVR3.2
    getnf8.s  up 20+03:55, 0 users   # Clone 286 10MHZ 3MB,60MB, SVAT 2.4
    getnfd.s  up 17+14:35, 4 users   # Clone 286 10MHZ 5MB,160MB, SVAT 2.2
    getnfe.s  up 53+15:13, 3 users   # Clone 286 12MHZ 5MB,160MB, SVAT 2.2

Note that getnfe.s has been up over 53 days. It is our news and mail gateway. This could probably be described as "available, usable, and reliable". A great deal of care was exercised in the choice of hardware and in the configuration of the software for the system to make it so.

A lot of the problems with Microport systems stem from the great variety of hardware that is "real close to just almost like" stuff from IBM(tm). That, and the fact that UNIX(tm) has had its fair share of timing problems, coding oversights, and design flaws through all of its releases. These flaws are carried over from release to release and port to port by folks who are human and who get to add their own bugs to the system (2.8bsd,2.9bsd,4.1bsd,4.2bsd, 4.3bsd,...:-). Generally, these people are doing the best they can at the time. I believe that is also true of Microport. That UNIX(tm) works as well as it does in the great range of environments in which it has found itself (IBM mainframes running UNIX over a CTS base, Univac(tm) 1100 series mainframes, on down to 8086's running hacked 22-bit memory management) is amazing.

>The money hurt because it was a lot of it and it was mine, personally. I
>concluded that I would have spent far more on the telephone and chasing
>alleged "problems" and would never achieve what I set out to do. It was
>amazing how my "hardware problems" vanished when I installed AT&T 386 UNIX.

The machine "acim", a stock 386 from AT&T with AT&T SVR3.2, does not appear to have fewer problems than the clones. The hardware that didn't work in the clones with the flavor of UNIX that we are using, we dumped.
As in, oh, well, this disk controller doesn't work with the OS, let's just put it in that DOS machine and get one that does work. Yes, this takes time and effort, but the resulting system performance is worth the effort (I hope :-). It is also interesting to note that the prices of the pieces are quite low when compared with pieces of similar functionality from vendors such as SUN(tm), DEC(tm), Data General(tm), ...

>It's ironic how many of those "hardware problems" are documented as bug
>fixes in 3.06e and disappointing how many of them would still be wrong
>with my equipment if I used 3.06e.

It may take some experimenting to get a system that is reliable and that works as a whole. More experimenting than I like, but it can be done. It is also possible to call one of the vendors mentioned above, find a salesperson who is willing to sell a system, pay vast amounts of money, and have a system installed and running, without ever touching a keyboard, let alone a screwdriver.

>
>I think that what we have here is a perceptual problem. I think that the
>average '286/'386 user came from one of two camps, down from minis or up
>from PC's. There may be a few who dove in from nowhere but probably not
>many. Those who came down from minis are appalled that fundamental things
>(fsck, device drivers, etc.) don't work right. Those coming up from PC's
>are puzzled because their hardware doesn't work right with this new stuff.

I am not appalled that the drivers don't work right. I am disappointed, sometimes a bit dismayed, but understanding of the people who tried to get it right but goofed up some. If I can, I work around the problem. If I can't, then I violate my license agreement (just a little:-) and disassemble the offending code, and see if I can fix it. So far, so good.

>
>The perceptual problem is compounded because we are probably mostly
>individuals buying with our own money. We expect a certain minimum
>functionality and we don't get it.
>If it was a car or a microwave oven
>there's a manufacturer's warranty, statutory relief; with Microport
>there's an arrogant snort. That pisses us off (just like a lemon car)
>because it was our own money and our expectations, the reasonable ones,
>were neither met nor are they likely to be. The arrogant snort I refer
>to is not from the technically inclined and conscientious personnel at
>Microport. I think that they are as outraged and upset as those of us
>whose money pays their salaries. Management either doesn't care or
>won't listen.

I am not competent to speculate on Microport's management or on the feelings of the Microport staff. I do feel, however, that the system configuration and system management problems encountered in setting up and using a 286, 1MB RAM, 40MB disk, 2 user UNIX system may be as difficult as the problems encountered in setting up a 24MB VAX(tm), 1.2GB disk running BSD.

The problem as I see it is that the little (physically) machine that sits on the desk may be mentally larger than the mainframe and mini-computer systems that were required just a few years ago to support multi-programming and multi-tasking operating systems. The machines have shrunk in size, the support problems haven't. Individuals and small companies, like ours, couldn't afford the hardware for UNIX (or the license:-( just a few years ago. Now we all can. We may not, however, be able to accept the individual burden of support that systems of this complexity currently demand.

>
>So who is the winner and who is the loser? As long as we, in the
>marketplace, keep approving their effort by continuing to spend money
>on it, we will lose and management will win. The situation can not
>and will not change until we make it change. We, the customers,
>constructed the (in my opinion) fraud, and it is our responsibility to
>make it stop. Greg made it stop, he changed equipment and vendors.
>Now he has achieved the expected minimum functionality and probably more.
>Until a clear signal is sent to Microport management, in a language they
>understand, we are wasting time and blood pressure being outraged. For
>all of John Plocher's efforts (I believe them to be considerable), have
>we seen a significant change? I haven't. Can we expect John to make
>management apply the resources to produce a respectable product? I
>think not, but you and I can.

I disagree. I think that, taking into consideration the problems of support in a widely varying hardware and user expertise environment, all of the UNIX vendors, not just Microport, have done a rather amazing job. I also think that the products are more than respectable. Certainly they have bugs. Freedom from bugs is a necessary and sufficient condition for triviality in a program :-).

>
>Am I a hypocrite for buying, using, and upgrading V/AT? For my equipment
>it's the only game in town and bad breath is better than no breath at all.

No, you're not a hypocrite.

>Sorry for the length, but I hadn't seen this said before and I thought
>it needed saying.

Ditto.

>--
>Bill Kennedy usenet {killer,att,cs.utexas.edu,sun!daver}!ssbn!bill
> internet bill@ssbn.WLK.COM

---
Mike Murphy  Sceard Systems, Inc.  544 South Pacific St.  San Marcos, CA 92069
mrm@sceard.UUCP  {hp-sdd,nosc,ucsd}!sceard!mrm  +1 619 471 0655
mike@cimcor.mn.org (Michael Grenier) (02/01/89)
>>They didn't know it was fsck causing the problem until Steve took one of
>>their service techs through crashing a large file system and showed him
>>how fsck would corrupt it. This only happened a couple of months ago.
>
> Actually, they have been aware of it for much longer than that... Well
> over a year ago we were experiencing the same problem and had MANY
> long discussions with them regarding it. They informed us that there
> was a known problem with fsck, and that "someone is working on it".
> This was with the 1.3.6 release. As of the 2.2 release it still was
> not fixed.

True, however Microport DOES have a version (probably beta only) that works fine up to file partitions in the 1/2 gigabyte region (i.e. 1024K blocks). I know because mine is working fine on this 180K block partition. It now runs in large model and no longer needs a temp file, and thus doesn't corrupt file systems by using one. I don't know when it will be released officially, but you could probably get the beta version with a call to John Plocher at Microport.

-Mike Grenier
mike@cimcor.mn.org
learn@igloo.UUCP (william vajk) (02/01/89)
In article <871@sceard.UUCP>, mrm@sceard.UUCP (M.R.Murphy) writes:

Flame mode on...what else would one expect....

> Expense is relative. Anyone care to cite the cost history of a non-academic
> UNIX(tm) license over the years?

Who gives a shit what the relative costs are. They promised something not yet delivered, a WORKING system. There are hidden costs in running this crapola compared to something that works out of the box, add them in and what have you now ????

> A lot of the problems with Microport systems stem from the great variety
> of hardware that is "real close to just almost like" stuff from IBM(tm).

We've been over this absolute bullshit nonsense time and again. I was told that microport would run on ANY 286 AT clone, that they found NO incompatibility problems, and that was the 1.3.6 that I bought. Have they improved in the past 2+ years ? Certainly they have. But they still scribble the disks in fsck, even in 2.4. Coupled with your elseif below, it is obvious that hardware isn't a large part of the solution, especially considering those who switched to xenix and cut their losses earlier realized little or no hardware problems. The problems are in the code, get it ?

> That UNIX(tm) works as well as it
> does in the great range of environments in which it has found itself (IBM
> mainframes running UNIX over a CTS base, Univac(tm) 1100 series mainframes,
> on down to 8086's running hacked 22-bit memory management) is amazing.

Let's keep one thing straight here. We're discussing one vendor with one product. I could care less about products I didn't buy. It doesn't work correctly here, and in a lot of other places. Most of us that are in this newsgroup are unix buffs. Many of us have developed a certain unfavorable passion for this one vendor based on their failure to make timely repairs. Spouting about the wonders of the base product from which this one was derived does nothing good, and is simply a waste of bandwidth.
> It is also interesting to note that the prices of the pieces are quite low
> when compared with pieces of similar functionality from vendors such as
> SUN(tm), DEC(tm), Data General(tm), ...

Why do you insist it is ok to steal from purchasers 'because it is cheap?'

> I am not appalled that the drivers don't work right. I am disappointed, some-
> times a bit dismayed, but understanding of the people who tried to get it
> right but goofed up some. If I can, I work around the problem. If I can't,
> then I violate my license agreement (just a little:-) and disassemble the
> offending code, and see if I can fix it. So far, so good.

And this is the best bit of all. Here we have a gentleman who purports to have fixed what microport couldn't or wouldn't in over two years, and he keeps it to himself. I rather think that this explains a lot more about the author than he thinks. None of the conclusions are very *nice*.

> We may not, however, be able to accept the individual burden of support
> that systems of this complexity currently demand.

Compared to what unix or xenix system ? Do you think reasonable 'support' for a system includes a sysadmin rehacking the code that microport screwed up ?

> I disagree. I think that, taking into consideration the problems of support
> in a widely varying hardware and user expertise environment, all of the
> UNIX vendors, not just Microport, have done a rather amazing job.

Nice of you to redefine 'support' again. Support in this context is handholding, a form of training a user. It is NOT fixing of bugs. It is NOT selling repairs that should be free. Essentially the only thing microport has done really well is to sell defective code and string the users along through several upgrades while not fixing some of the original problems.

If it were offered, would you as someone apparently favorably inclined towards microport sink your life's savings into their stock ? How about Sun or some of the other vendors you mentioned. I see.
And it has nothing to do with price. It must have to do with performance.

If Henry Ford had made as bad an automobile as uport has a unix release, we'd all still be riding horses.

Bill Vajk   | A hypocrite is a gilded pill, composed of two
learn@igloo | natural ingredients, natural dishonesty, and
            | artificial dissimulation. -Overbury-
plocher@uport.UUCP (John Plocher) (02/04/89)
In article <1095@igloo.UUCP> learn@igloo.UUCP (william vajk) writes:
>Flame mode on...what else would one expect....
>
>If it were offered, would you as someone apparently favorably inclined towards
>microport sink your life's savings into their stock ?
>
>Bill Vajk   | A hypocrite is a gilded pill, composed of two
>learn@igloo | natural ingredients, natural dishonesty, and
>            | artificial dissimulation. -Overbury-

Bill,

A year ago I was a Microport "user" in good old Wisconsin. I was offered a chance to give up my University job and move out to California and join the staff here at Microport. Not only did I commit my "life savings" (such as it was...), I committed my financial future and my professional reputation to the company, in the hope that I could do something to improve the product.

Sure, I could have sat home and bitched at "those guys" over at microport every chance I got, but I didn't. I moved out here and DID something about it.

In my book, that is more of a risk than any stock could ever be.

-John Plocher
Microport Systems
learn@igloo.Scum.COM (william vajk) (02/06/89)
In article <301@uport.UUCP>, plocher@uport.UUCP (John Plocher) writes:
> I moved out here and DID something about it.

I commend your efforts, John, and wish you well. What you say makes a lot of sense yet also indicates that some two years later there are still serious problems.