akermanis@trco01.dec.com (11/07/88)
>Finally let me say that for a while I thought I might be infected
>several months ago. Seems that L2 COBBLER has a bug whereby it insists
>on padding its output bootfiles to exact multiples of 256 bytes
>(sorta like XModem downloads). The extra garbage doesn't stop
>booting, but does mess up BootMod utilities and shows up on IDENTs.
>Of course this would have been a pretty crude virus if it were one.
>Anyone else notice this Cobbler bug?

I have not experienced your problem, but have seen another CC3DISK.DR problem from time to time.

Just recently, I added a 20Meg HD using the B&B interface. This is plugged into the CC Bus that I have. Because of this change, I OS9GEN'd a new boot disk. The HD booted just fine and all seemed well. I tried to write to the floppies, and kept getting Write Verification errors or you could not create a new boot volume.

All I did was reshuffle CC3DISK around and BINGO, everything works fine now. This bug I still do not understand. Even when all I had was just the Floppy controller plugged in many moons ago, I would come across this problem.

Any thoughts on this problem or cure ?

John Akermanis
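
On the Cobbler padding described in the quoted text at the top of this post: the arithmetic of rounding a bootfile up to a multiple of 256 bytes can be sketched as below. This is only an illustration of the rounding, not Cobbler's actual code; the function names and the sample size in the usage note are made up for the example.

```python
def padded_length(size: int, block: int = 256) -> int:
    """Round size up to the next multiple of block (no change if already aligned)."""
    if size < 0 or block <= 0:
        raise ValueError("size must be >= 0 and block must be > 0")
    # Ceiling division written with integer arithmetic only.
    return -(-size // block) * block


def slack_bytes(size: int, block: int = 256) -> int:
    """Number of extra (garbage) bytes appended by the padding."""
    return padded_length(size, block) - size
```

For a hypothetical 1000-byte bootfile, `padded_length(1000)` gives 1024 and `slack_bytes(1000)` gives 24, which would be the trailing garbage a utility walking the file module by module would trip over.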
koonce@brahms.berkeley.edu (tim koonce) (11/08/88)
In article <8811071544.AA29817@decwrl.dec.com> akermanis@trco01.dec.com writes:
>
>Because of this change, I OS9GEN'd a new boot disk. The HD booted just fine
>and all seemed well. I tried to write to the floppies, and kept getting
>Write Verification errors or you could not create a new boot volume.
>
>All I did was reshuffle CC3DISK around and BINGO, everything works fine
>now.
>
>Any thoughts on this problem or cure ?
>
>John Akermanis

You've encountered the now-famous Boot List Order Bug (BLOB). It has been partly diagnosed as hardware timing problems. Basically, most coco peripherals weren't really well designed for the 1.8 meg clock speed, so there's marginal bus timing for many things. Floppy controllers seem especially susceptible for some reason.

Nobody really knows of a good fix, although some hardware vendors, specifically Burke & Burke and Bruce Isted, have been experimenting with ways to lessen the likelihood of problems.

Welcome to a large crowd of people with boot-list problems. The only known workaround is to re-order your boot modules. A number of articles came down comp.sys.m6809, comp.sys.os9, and the coco mailing list recently about this.

Note: One floppy problem that seems to be especially prevalent is failures when formatting. Whenever you put together a new boot, try formatting a floppy disk. If that works, then there's a good chance you've found a good order. Otherwise, try a new boot list order.

- Tim Koonce
pete@wlbr.EATON.COM (Pete Lyall) (11/10/88)
In article <8811071544.AA29817@decwrl.dec.com> akermanis@trco01.dec.com writes:
>All I did was reshuffle CC3DISK around and BINGO, everything works fine
>now. This bug I still do not understand. Even when all I had was just the
>Floppy controller plugged in many moons ago, I would come across this problem.
>Any thoughts on this problem or cure ?

This is the now infamous 'boot list order bug'. It's alleged to be caused by some clocking/gating problems combined with the positioning of some modules (notably RBF drivers) in memory. You have found the only current workaround. Lots of the OS9 heavies have been studying this one for over a year.

Pete
--
Pete Lyall (OS9 Users Group VP)| DELPHI: OS9UGVP | Eaton Corp.(818)-706-5693
Compuserve: 76703,4230 (OS9 Sysop) OS9 (home): (805)-985-0632 (24hr./1200 baud)
Internet: pete@wlbr.eaton.com UUCP: {hacgate,jplgodo,voder}!wlbr!pete
neals@tekigm2.TEK.COM (Neal Sedell) (11/11/88)
In article <16634@agate.BERKELEY.EDU> koonce@brahms.berkeley.edu (tim koonce) writes:
>You've encountered the now-famous Boot List Order Bug (BLOB). It has
>been partly diagnosed as hardware timing problems. Basically, most
>coco peripherals weren't really well designed for the 1.8 meg clock
>speed, so there's marginal bus timing for many things.

Excuse me if I'm wrong, but hasn't the BLOB been around as long as OS9??? I had BLOB problems with my trusty old COCO II w/Hard Drive Spec. floppy controller (unless it was all a bad dream :-) years ago.

I don't eat and sleep OS9, but I have messed with several floppy and SASI (that's no typo, unfortunately) interfaces on several different systems. Having said that, I find it EXTREMELY unlikely that it is a hardware problem....

The odds of reading bad data from a floppy (or hard) disk sector without getting a CRC error are so low that it is virtually impossible. And why oh why would the order the modules are read in matter???? They still contain the exact same data. I can imagine no possible "memory" mechanism that would be sensitive to the order of the boot file, unless it is sensitive to the destination address in RAM the data is written to, and since the 6809 doesn't MUX the address and data together it seems not too likely.

Consider also that for a disk read operation the sloppy E timing would contribute to an over-long read of the controller data register. That would only help the data hold timing, and not contribute to an error between the controller and CPU.

Now, maybe it's the write operations to the controller. Perhaps the wrong register gets a short write glitch on the way to addressing the right register - and maybe the wrong sector gets read.... But then the OS9 CRC would be off. What does OS9 do when it tries to boot and the necessary system modules are corrupt??? Surely it doesn't try to execute them anyway.

So we have a situation where we can't get bad data off of a disk sector to start with, and even if we did OS9 won't try to run it unless it passes another even more rigorous validity check. Sure sounds like a software bug to me.

Then again, until the BLOB is unmasked there is always the chance that I am wrong.

-- Neal Sedell
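
To make the "validity check" argument above concrete, here is a minimal sketch of one such check: the header parity of an OS-9 memory module. It assumes the 6809 module header layout as commonly documented (sync bytes $87CD, then size, name offset, type/language and attributes/revision, followed by a parity byte that is the one's complement of the XOR of those first eight bytes); treat that layout as an assumption here. The full three-byte module CRC is deliberately not reproduced, and `make_header` plus the sample field values are hypothetical helpers for the illustration only.

```python
SYNC = b"\x87\xCD"  # assumed OS-9/6809 module sync bytes


def header_parity(first8: bytes) -> int:
    """One's complement of the XOR of the first eight header bytes."""
    if len(first8) != 8:
        raise ValueError("expected exactly 8 header bytes")
    acc = 0
    for byte in first8:
        acc ^= byte
    return acc ^ 0xFF


def make_header(size: int, name_offset: int, type_lang: int, attr_rev: int) -> bytes:
    """Build a 9-byte header (hypothetical helper, for illustration only)."""
    first8 = (SYNC + size.to_bytes(2, "big") + name_offset.to_bytes(2, "big")
              + bytes([type_lang, attr_rev]))
    return first8 + bytes([header_parity(first8)])


def header_ok(module: bytes) -> bool:
    """True if the sync bytes and the header parity byte are consistent."""
    return (len(module) >= 9
            and module[:2] == SYNC
            and header_parity(module[:8]) == module[8])
```

A single flipped bit anywhere in the first eight bytes changes the XOR and so fails `header_ok`, which is the sense in which corrupt data from a bad read would be expected to be rejected before anything tried to execute it.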
jejones@mcrware.UUCP (James Jones) (11/11/88)
In article <3783@tekigm2.TEK.COM>, neals@tekigm2.TEK.COM (Neal Sedell) writes:
> Excuse me if I'm wrong, but hasn't the BLOB been around as long as OS9???

No, it hasn't. OS-9 predates the Color Computer (honest! :-), and the BLOB only shows up on CoCo systems.

James Jones (what, an organization have *my* opinions? <hysterical laughter>)
akermanis@trco01.dec.com (11/15/88)
The BLOB problem with OS-9 Level II sure has generated a lot of theories and known facts about causes and fixes. To understand what is happening both hardware and software wise is a tall chore to say the least.

A comment that the BLOB problem has possibly been around since the Level I days is a surprise to me. I do not dispute this information; it is just that I have never experienced the problem with any of the Level I versions that were released. I have been using OS-9 since it was first released for the COCO years ago and have always obtained the latest and greatest versions to keep up to date.

What I am really getting at is to add some information to the record about the BLOB problem.

The BLOB problem first surfaced after my third OS9GEN'd disk. At that time, all I had on the expansion port was my Floppy Disk Controller (old style, with external +12 for it), and I started getting 'Write Verification Errors'. After replacing some chips on the controller and corrupting a few disks, I went out and purchased the latest version of the COCO Floppy Controller because of known problems with the old one and the COCO 3 running at 1.8meg. The new controller, once installed, did the same thing as the old. This is when the BLOB problem first caught my attention. By going back to a copy of the original disk, the problem disappeared.

To make the story a little shorter, as I installed new hardware from time to time, I would make duplicates of my working disks, add the hardware and run extensive tests to make sure I had created a bug free system disk. I have found that copying large files (ie Basic09, OS9Boot) from dir to dir or disk to disk, or just formatting a few diskettes, would fail consistently if the 'Gen' was flaky.

The last thing I have done is install a B&B interface, WD1002A-WX1 controller and a ST225 hard disk system. (The B&B interface was modified to gate SCS and E to produce SCSE) This all sits on a CC Bus along with the RS floppy controller and the RS232 pak.
The only device that ever seems to complain about errors is the floppy in any Gen that is bad. The hard disk always runs clean even when the floppy is acting up.

The module order that seems to work 95% of the time has always been REL, BOOT, OS9P1, RBF, CC3DISK, F0, F1, F2 and after this, it does not seem to matter. (f0 etc is for the floppy. d0 etc is the hard disk) I also ensure that CC3Disk, RBF and descriptors are within the same 8k block.

I agree with the hardware theory previously posted to some extent; it can cause some of the boot type problems. But if the system boots and you then run into floppy problems, this I feel is some sort of software problem. Since a lone COCO with only the Floppy Controller plugged into the expansion port also exhibited the symptoms, I feel this backs up the last statement somewhat. On a bad 'GEN', if you continue to push it, you will eventually crash the system. One could also look at the RS Floppy Controller and say that under certain conditions the controller causes the problem.

1) I am curious to ask others who use the SDISK driver or equivalent: have they experienced similar symptoms as with CC3DISK.dr ?
2) If so, then would you think the problem goes beyond CC3Disk ?
3) Has anyone experienced problems with their Hard Disks after a Gen ?

From various pieces of information that have been posted here or on other systems such as CIS, I seem to sense that CC3Disk.dr may be the other piece of the puzzle besides the hardware angle. I have ruled out RBF and other associated modules, since in my experience they work well with BBFHDISK.dr from B&B and to date have never exhibited the above symptoms like the Floppies.

I know that this may have been hashed out here before, but I have recently joined this news group and do not wish to bore anyone with old topics. The topic however is quite interesting and a challenge to all of us.

John Akermanis
"If the shoe fits, It's ugly"
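
For anyone who wants to check the "same 8k block" condition on paper before running OS9GEN, the bookkeeping can be sketched as follows. This is a rough illustration under loud assumptions: the module sizes and the load address are invented for the example (real values would have to come from IDENT and from wherever the boot file actually lands in memory), and the modules are assumed to be packed back to back in boot list order.

```python
BLOCK = 0x2000  # 8K block size


def layout(modules, base):
    """Return (name, first_block, last_block) for modules packed back to back from base."""
    result = []
    addr = base
    for name, size in modules:
        end = addr + size - 1
        result.append((name, addr // BLOCK, end // BLOCK))
        addr += size
    return result


def same_block(modules, base, names):
    """True if every named module lies entirely inside one and the same 8K block."""
    blocks = set()
    for name, first, last in layout(modules, base):
        if name in names:
            blocks.update(range(first, last + 1))
    return len(blocks) == 1


# Hypothetical sizes in bytes, NOT measured from a real OS9Boot.
BOOT_LIST = [("RBF", 0x0D00), ("CC3DISK", 0x0500),
             ("F0", 0x30), ("F1", 0x30), ("F2", 0x30)]
GROUP = {"RBF", "CC3DISK", "F0", "F1", "F2"}
```

With these made-up numbers, `same_block(BOOT_LIST, 0x2000, GROUP)` holds, while shifting the same list to a base of 0x3000 makes CC3DISK straddle a block boundary and the check fails. That is the kind of difference a reshuffled boot list would produce.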
knudsen@ihlpl.ATT.COM (Knudsen) (11/16/88)
I haven't had BLOB problems, but I figured that was because I am very conservative (read lazy and chicken) about making new bootfiles.

However, sometime before I got my B&B I got a Sardis no-halt controller. Whatever else may be said about bugs in its driver re interrupt handling, the fact is I never had trouble with it, even when running for a while with the B&B, which is supposed to be very dangerous.

When I heard about the Sardis SDISK3 incompatibility with B&B, I just made a new boot with good old CC3DISK, and gave up the no-halt feature (only bothers me when DSAVEing to a floppy; otherwise I never use floppies anyway). Yes, the Sardis works fine as a vanilla controller.

The bottom line is that with plain old CC3DISK (OK, a couple little patches here and there) I get no problems with the Sardis controller. This makes me wonder about the RS controllers; I think someone mentioned them as one of the things that don't gate SCS- properly. And since a FD controller actually does weirder things with the Coco bus than an HD controller (!), maybe that's the real weak spot.

How about it, folks? Any correlation between BLOB grief and RS -vs- non-RS FD controller?

--
Mike Knudsen Bell Labs(AT&T) att!ihlpl!knudsen
"Lawyers are like nuclear bombs and PClones. Nobody likes them, but the other guy's got one, so I better get one too."