[comp.sys.m6809] ARTICLE

akermanis@trco01.dec.com (11/07/88)

>Finally let me say that for a while I thought I might be infected
>several months ago.  Seems that L2 COBBLER has a bug whereby it insists
>on padding its output bootfiles to exact multiples of 256 bytes
>(sorta like XModem downloads).  The extra garbage doesn't stop
>booting, but does mess up BootMod utilities and shows up on IDENTs.
>Of course this would have been a pretty crude virus if it were one.
>Anyone else notice this Cobbler bug?
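The padding described above can be checked mechanically. Walk the bootfile one module at a time using the size field in each module header; anything left after the last module is padding. The Python below is a minimal sketch, assuming the 6809 OS-9 header layout (sync bytes $87 $CD followed by a big-endian 16-bit module size that covers the whole module). The names walk_bootfile and fake_module and the sample data are hypothetical, for illustration only.

```python
from __future__ import annotations

# Minimal sketch: walk an OS-9 (6809) bootfile and report trailing padding.
# Assumption: each module starts with sync bytes $87 $CD followed by a
# big-endian 16-bit module size that covers the entire module.
SYNC = b"\x87\xcd"


def walk_bootfile(data: bytes) -> tuple[list[tuple[int, int]], int]:
    """Return ([(offset, size), ...], trailing_byte_count) for a bootfile image."""
    modules = []
    pos = 0
    while pos + 4 <= len(data) and data[pos:pos + 2] == SYNC:
        size = int.from_bytes(data[pos + 2:pos + 4], "big")
        if size < 4 or pos + size > len(data):
            break  # implausible size: treat everything from here on as trailing
        modules.append((pos, size))
        pos += size
    return modules, len(data) - pos


def fake_module(size: int) -> bytes:
    """Hypothetical test module: sync bytes, size field, then zero filler."""
    return SYNC + size.to_bytes(2, "big") + bytes(size - 4)


if __name__ == "__main__":
    boot = fake_module(300) + fake_module(100)  # 400 bytes of modules
    padded = -(-len(boot) // 256) * 256          # round up to a 256-byte multiple
    boot += bytes(padded - len(boot))            # zero stand-in for the padding
    print(walk_bootfile(boot))                   # ([(0, 300), (300, 100)], 112)
```

With a 400-byte bootfile rounded up to 512 bytes, the walk finds two modules and 112 trailing bytes.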

I have not experienced your problem, but have seen another CC3DISK.DR problem
from time to time.

Just recently, I added a 20Meg HD using the B&B interface. It is plugged
into my CC Bus.

Because of this change, I OS9GEN'd a new boot disk. The HD booted just fine
and all seemed well. When I tried to write to the floppies, I kept getting
Write Verification errors, or could not create a new boot volume.

All I did was reshuffle CC3DISK around and BINGO, everything works fine
now. This bug I still do not understand. Even when all I had was just the
Floppy controller plugged in many moons ago, I would come across this problem.

Any thoughts on this problem or cure ?

John Akermanis

koonce@brahms.berkeley.edu (tim koonce) (11/08/88)

In article <8811071544.AA29817@decwrl.dec.com> akermanis@trco01.dec.com writes:
>
>Because of this change, I OS9GEN'd a new boot disk. The HD booted just fine
>and all seemed well. When I tried to write to the floppies, I kept getting
>Write Verification errors, or could not create a new boot volume.
>
>All I did was reshuffle CC3DISK around and BINGO, everything works fine
>now.
>
>Any thoughts on this problem or cure ?
>
>John Akermanis

You've encountered the now-famous Boot List Order Bug (BLOB).  It has
been partly diagnosed as hardware timing problems.  Basically, most
coco peripherals weren't really well designed for the 1.8 meg clock
speed, so there's marginal bus timing for many things.  Floppy
controllers seem especially susceptible for some reason.  Nobody
really knows of a good fix, although some hardware vendors,
notably Burke&Burke and Bruce Isted, have been experimenting
with ways to lessen the likelihood of problems.

Welcome to a large crowd of people with boot-list problems.  The only
known fix is to re-order your boot modules.  A number of articles about
this were posted recently to comp.sys.m6809, comp.sys.os9, and the coco
mailing list.

Note:  One floppy problem that seems to be especially prevalent is
failures when formatting.  Whenever you put together a new boot, try
formatting a floppy disk.  If that works, then there's a good chance
you've found a good order.  Otherwise, try a new boot list order.


					- Tim Koonce

pete@wlbr.EATON.COM (Pete Lyall) (11/10/88)

In article <8811071544.AA29817@decwrl.dec.com> akermanis@trco01.dec.com writes:
>All I did was reshuffle CC3DISK around and BINGO, everything works fine
>now. This bug I still do not understand. Even when all I had was just the
>Floppy controller plugged in many moons ago, I would come across this problem.
>Any thoughts on this problem or cure ?

This is the now infamous 'boot list order bug'. It's alleged to
be caused by some clocking/gating problems combined with the
positioning of some modules (notably RBF drivers) in memory. You have
found the only current workaround. Lots of the OS9 heavies have been
studying this one for over a year.

Pete

-- 
Pete Lyall (OS9 Users Group VP)|  DELPHI: OS9UGVP  |  Eaton Corp.(818)-706-5693
Compuserve: 76703,4230 (OS9 Sysop) OS9 (home): (805)-985-0632 (24hr./1200 baud)
Internet: pete@wlbr.eaton.com            UUCP: {hacgate,jplgodo,voder}!wlbr!pete 

neals@tekigm2.TEK.COM (Neal Sedell) (11/11/88)

In article <16634@agate.BERKELEY.EDU> koonce@brahms.berkeley.edu (tim koonce) writes:
>You've encountered the now-famous Boot List Order Bug (BLOB).  It has
>been partly diagnosed as hardware timing problems.  Basically, most
>coco peripherals weren't really well designed for the 1.8 meg clock
>speed, so there's marginal bus timing for many things.

Excuse me if I'm wrong, but hasn't the BLOB been around as long as OS9???
I've had BLOB problems with my trusty old COCO II w/Hard Drive Spec. floppy
controller (unless it was all a bad dream :-) years ago.  I don't eat and
sleep OS9, but I have messed with several floppy and SASI (that's no typo,
unfortunately) interfaces on several different systems.

Having said that, I find it EXTREMELY unlikely that it is a hardware
problem....  The odds of reading a corrupted sector from a floppy (or hard)
disk without getting a CRC error are so low it virtually isn't possible.  And why
oh why would the order the modules are read in matter????  They still
contain the exact same data.  I can imagine no possible "memory" mechanism
that would be sensitive to the order of the boot file, unless it is sensitive
to the destination address in RAM the data is written to, and since the 6809
doesn't MUX the address and data together it seems not too likely.  Consider
also that for a disk read operation the sloppy E timing would contribute
to an over-long read of the controller data register.  That would only help the
data hold timing, and not contribute to an error between the controller and
CPU.  Now, maybe it's the write operations to the controller.  Perhaps the
wrong register gets a short write glitch on the way to addressing the right
register - and maybe the wrong sector gets read....  But then the OS9
CRC would be off.  What does OS9 do when it tries to boot and the necessary
system modules are corrupt???  Surely it doesn't try to execute them anyway.
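The disk CRC argument above can be made concrete. Western Digital floppy controller chips such as the WD1793 compute a CRC-16 with polynomial x^16 + x^12 + x^5 + 1, with the register preset to all ones. The sketch below is a simplification that runs only a 256-byte sector payload through the CRC (the real controller also includes the address mark). It flips every bit of a hypothetical sector one at a time and counts how many corruptions would go unnoticed. The helper names are hypothetical.

```python
# Minimal sketch: CRC-16 as used by WD179x-style floppy controllers
# (polynomial x^16 + x^12 + x^5 + 1 = 0x1021, register preset to 0xFFFF).
# Simplification: only the sector payload is fed through the CRC here; the
# real controller also includes the data address mark bytes.


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16, most significant bit first."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating a read error."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(buf)


if __name__ == "__main__":
    sector = bytes(range(256))  # hypothetical 256-byte sector contents
    good = crc16_ccitt(sector)
    missed = sum(
        crc16_ccitt(flip_bit(sector, i)) == good
        for i in range(len(sector) * 8)
    )
    print(missed)  # 0: every single-bit error changes the CRC
```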

So we have a situation where we can't get bad data off of a disk sector
to start with, and even if we did OS9 won't try to run it unless it passes
another even more rigorous validity check.  Sure sounds like a software bug
to me.
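The second check mentioned above can also be sketched. Each OS-9 module header ends with a parity byte equal to the one's complement of the exclusive-OR of the eight header bytes before it. OS-9 normally also verifies a 24-bit CRC over the whole module, which this sketch does not reproduce. This is a minimal Python sketch, assuming the 6809 OS-9 header layout; make_header, header_ok and the field values are hypothetical.

```python
# Minimal sketch of the OS-9 (6809) module header parity check.
# Assumed header layout: sync $87 $CD, module size (2 bytes), name offset
# (2 bytes), type/language, attributes/revision, then the parity byte.
# The 24-bit module CRC that OS-9 also checks is not reproduced here.
SYNC = b"\x87\xcd"
HEADER_LEN = 9


def header_parity(first8: bytes) -> int:
    """One's complement of the XOR of the first eight header bytes."""
    x = 0
    for b in first8:
        x ^= b
    return x ^ 0xFF


def make_header(size: int, name_off: int, type_lang: int, attr_rev: int) -> bytes:
    """Build a hypothetical 9-byte header with a correct parity byte."""
    first8 = (SYNC + size.to_bytes(2, "big") + name_off.to_bytes(2, "big")
              + bytes([type_lang, attr_rev]))
    return first8 + bytes([header_parity(first8)])


def header_ok(module: bytes) -> bool:
    """True if the sync bytes are present and the parity byte matches."""
    if len(module) < HEADER_LEN or module[:2] != SYNC:
        return False
    return header_parity(module[:8]) == module[8]


if __name__ == "__main__":
    hdr = make_header(0x0100, 0x000D, 0xE1, 0x81)      # hypothetical field values
    bad = hdr[:3] + bytes([hdr[3] ^ 0x04]) + hdr[4:]   # one corrupted size bit
    print(header_ok(hdr), header_ok(bad))              # True False
```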

Then again, until the BLOB is unmasked there is always the chance that
I am wrong.
-- 
Neal Sedell

jejones@mcrware.UUCP (James Jones) (11/11/88)

In article <3783@tekigm2.TEK.COM>, neals@tekigm2.TEK.COM (Neal Sedell) writes:
> Excuse me if I'm wrong, but hasn't the BLOB been around as long as OS9???

No, it hasn't.  OS-9 predates the Color Computer (honest! :-), and the BLOB
only shows up on CoCo systems.

		James Jones

(what, an organization have *my* opinions? <hysterical laughter>)

akermanis@trco01.dec.com (11/15/88)

The BLOB problem with OS-9 Level II sure has generated a lot of theories, and 
some known facts, about causes and fixes.

To understand what is happening, both hardware- and software-wise, is a tall chore 
to say the least. The comment that the BLOB problem has possibly been around since 
the Level I days is a surprise to me. I cannot disprove it; I can only say 
that I have never experienced that problem with any of the Level I versions that 
were released. I have been using OS-9 since it was first released for the COCO 
years ago and have always obtained the latest and greatest versions to keep up 
to date.

What I am really getting at is putting some additional information on record 
about the BLOB problem. The BLOB problem first surfaced after my third OS9GEN'd 
disk. At this time, all I had on the expansion port was my Floppy Disk 
Controller (old style, with an external +12 supply) and I started getting 'Write 
Verification Errors'. After replacing some chips on the controller and 
corrupting a few disks, I went out and purchased the latest version of the COCO 
Floppy Controller because of known problems with the old one and the COCO 3 
running at 1.8meg. Once installed, the new controller did the same thing as the 
old one. This is when the BLOB problem first caught my attention. By going back to 
a copy of the original disk, the problem disappeared.

To make the story a little shorter: as I installed new hardware from time 
to time, I would make duplicates of my working disks, add the hardware, and run 
extensive tests to make sure I had created a bug-free system disk. I have found 
that copying large files (e.g. Basic09, OS9Boot) from dir to dir or disk to disk, 
or just formatting a few diskettes, would fail consistently if the 'Gen' was flaky.

The last thing I have done is install a B&B interface, WD1002A-WX1 controller 
and a ST225 hard disk system. (The B&B interface was modified to gate SCS and E 
to produce SCSE) This all sits on a CC Bus along with the RS floppy controller 
and the RS232 pak. The only device that ever seems to complain about errors is 
the floppy in any Gen that is bad. The hard disk always runs clean even when the 
floppy is acting up.

The module order that seems to work 95% of the time has always been REL, BOOT, 
OS9P1, RBF, CC3DISK, F0, F1, F2; after this, the order does not seem to matter. 
(F0 etc. are for the floppy; D0 etc. are for the hard disk.) I also ensure that 
CC3Disk, RBF and the descriptors are within the same 8K block.
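To check the "same 8K block" rule of thumb for a given boot order, one can add up module sizes and see which 8K block (the CoCo 3 MMU block size) each module starts and ends in. This is a minimal sketch under stated assumptions: the module sizes and the "OTHER" placeholder are hypothetical, and load_offset (where the bootfile starts relative to an 8K boundary) is an assumption, since where the bootfile actually lands in memory depends on the system. It only illustrates the rule; it does not show that block placement causes the BLOB.

```python
from __future__ import annotations

# Minimal sketch: which 8K block does each module land in for a given boot order?
# Assumptions (hypothetical): module sizes are illustrative, and load_offset is
# where the bootfile starts relative to an 8K boundary; real placement depends
# on how the system loads the bootfile.
BLOCK = 0x2000  # CoCo 3 MMU block size: 8K


def layout(modules: list[tuple[str, int]], load_offset: int = 0) -> dict[str, tuple[int, int]]:
    """Map each module name to (first_block, last_block) in bootfile order."""
    placement = {}
    addr = load_offset
    for name, size in modules:
        placement[name] = (addr // BLOCK, (addr + size - 1) // BLOCK)
        addr += size
    return placement


def same_block(placement: dict[str, tuple[int, int]], names: list[str]) -> bool:
    """True if every named module starts and ends in one common 8K block."""
    blocks = {b for name in names for b in placement[name]}
    return len(blocks) == 1


if __name__ == "__main__":
    order_a = [("OTHER", 0x1800), ("RBF", 0x0A00), ("CC3DISK", 0x0300),
               ("F0", 0x30), ("F1", 0x30), ("F2", 0x30)]
    order_b = order_a[1:] + order_a[:1]  # same modules, OTHER moved last
    group = ["RBF", "CC3DISK", "F0", "F1", "F2"]
    for order in (order_a, order_b):
        print(same_block(layout(order), group))  # False, then True
```

In the first order RBF straddles the boundary between blocks 0 and 1; moving the hypothetical OTHER module to the end keeps RBF, CC3DISK and the descriptors together in block 0.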

I agree to some extent with the hardware theory previously posted; it can 
cause some of the boot-type problems. But if the system boots and you then run into 
floppy problems, I feel this is some sort of software problem. Since a lone 
COCO with just the Floppy Controller plugged into the expansion port also exhibited 
the symptoms, I feel this backs up that statement somewhat. On a bad 
'GEN', if you continue to push it, you will eventually crash the system.

One could also look at the RS Floppy Controller and say that under certain 
conditions the controller causes the problem.

1) For others who use the SDISK driver or an equivalent: have you 
   experienced symptoms similar to those with CC3DISK.dr?

2) If so, do you think the problem goes beyond CC3Disk?

3) Has anyone experienced problems with their Hard Disks after a Gen?

From various pieces of information that have been posted here or on other systems 
such as CIS, I sense that CC3Disk.dr may be the other piece of the puzzle 
besides the hardware angle. I have ruled out RBF and other associated modules 
since, in my experience, they work well with BBFHDISK.dr from B&B and to date 
have never exhibited the symptoms the floppies show.

I know that this may have been hashed out here before, but I have recently 
joined this news group and do not wish to bore anyone with old topics. The topic 
however, is quite interesting and a challenge to all of us.

John Akermanis

"If the shoe fits, It's ugly"

knudsen@ihlpl.ATT.COM (Knudsen) (11/16/88)

I haven't had BLOB problems, but I figured that was because I am
very conservative (read lazy and chicken) about making new
bootfiles.  However, sometime before I got my B&B I got a Sardis
no-halt controller.  Whatever else may be said about bugs
in its driver re interrupt handling, the fact is I never had trouble
with it, even when running for a while with the B&B, which
is supposed to be very dangerous.

When I heard about the Sardis SDISK3 incompatibility with B&B,
I just made a new boot with good old CC3DISK, and gave
up the no-halt feature (only bothers me when DSAVEing to
a floppy; otherwise I never use floppies anyway).
Yes, the Sardis works fine as a vanilla controller.

The bottom line is that with plain old CC3DISK (OK, a couple
little patches here and there) I get no problems with the
Sardis controller.  This makes me wonder about the RS controllers;
I think someone mentioned them as one of the things that don't
gate SCS- properly.  And since a FD controller actually does
weirder things with the Coco bus than an HD controller (!),
maybe that's the real weak spot.

How about it, folks?  Any correlation between BLOB grief and
RS -vs- non-RS FD controller?
-- 
Mike Knudsen  Bell Labs(AT&T)   att!ihlpl!knudsen
"Lawyers are like nuclear bombs and PClones.  Nobody likes them,
but the other guy's got one, so I better get one too."