[comp.sys.tandy] Bootlist Order Bug

knudsen@ihlpl.ATT.COM (Knudsen) (10/28/88)

What follows is a technical explanation of how others have solved
the OS9 L2 bootlist order bug and traced it to faulty hardware
design outside of the Coco-III.

It's good to hear that the BLOB may have been solved.
(And scary to know it's hiding under my B&B interface!).
"It creeps, and leaps, and glides, and slides across the floor...."

Since I used to design microcomputer boards and peripherals,
let me explain how I think this hardware bug has been masquerading as
a software (boot list ordering) bug:

As I understand it, some Coco hardware add-ons have NOT been
gating (AND-ing) their address-selected signal (generated by
a comparison of their I/O address with the bus address leads'
current value) with the E-Clock to generate their read-write strobe
pulses.  This omission is strictly a no-no.  Every microprocessor
first puts out an address, and data (if writing to memory or I/O),
and then later asserts some kind of "OK let's do it" strobe pulse.
This delay lets the address and data leads settle, and lets
the address comparator logic in all the other devices make up their
mind whether or not they are selected.

The final strobe gives the selected device the go-ahead to read
or write data.  In the 6809 this strobe is the E clock.
An 8088 or 8086 (PClone) micro uses separate Read and Write
strobes, but the idea is the same.
[Frankly I don't know how Chris Burke screwed this up, since
his HD interface ANDs the E clock with the 6809's R/-W lead
to generate simulated 8086 Read and Write strobes.]
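
To make the gating idea concrete, here is a minimal Python sketch of
the logic, not anybody's actual circuit.  The I/O address 0xFF40 and
the names address_selected and write_strobe are made up for
illustration.  The only point is that the write strobe is the
comparator output ANDed with E (and the write direction):

```python
# Hypothetical I/O address for the peripheral's register (illustration only).
DEVICE_ADDR = 0xFF40

def address_selected(bus_addr):
    """Address comparator output: true when the bus matches our address."""
    return bus_addr == DEVICE_ADDR

def write_strobe(bus_addr, e_clock, is_write):
    """Gated strobe: comparator output ANDed with E and the write direction."""
    return address_selected(bus_addr) and e_clock and is_write

# One write cycle sampled at three instants: (address, E, is_write).
cycle = [
    (0xFF40, False, True),  # address is out, E still low: leads settling
    (0xFF40, True,  True),  # E high: the "OK let's do it" strobe
    (0xFF40, False, True),  # E low again: cycle over
]
strobes = [write_strobe(a, e, w) for a, e, w in cycle]
print(strobes)  # [False, True, False]
```

The device is selected for the whole cycle, but it only acts while
E is high.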

If a peripheral does not wait for the strobe pulse, but just
feeds its address comparator straight to whatever register
is latching data (on a write), it can get fooled by "glitches"
out of the address comparator.  These glitches occur
when the new address differs from the previous cycle's address
by having not only some 1's where there were 0's but also the
reverse.  Actually even that isn't necessary to cause a glitch.
The fact is whenever a binary number like an address changes
value in several bit positions, it makes temporary transitions
thru lots of intermediate values.
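
A small Python sketch of that last point, under a simplifying
assumption of mine: each changing address bit may switch at its own
instant, in any order.  The function name phantom_values and the
example addresses are hypothetical.  It lists every value the bus
could briefly show on the way from the old address to the new one:

```python
from itertools import combinations

def phantom_values(old, new, width=16):
    """Values the bus may briefly show while moving from old to new,
    assuming every differing bit can flip at its own instant."""
    diff = old ^ new
    bits = [1 << i for i in range(width) if diff & (1 << i)]
    seen = set()
    # Flip every proper, non-empty subset of the changing bits.
    for k in range(1, len(bits)):
        for subset in combinations(bits, k):
            value = old
            for b in subset:
                value ^= b
            seen.add(value)
    return seen

# Only two bits change here, both from 0 to 1, yet 0xFF40 shows up.
print([hex(v) for v in sorted(phantom_values(0xFF00, 0xFFC0))])
# ['0xff40', '0xff80']
```

A device decoding 0xFF40 with no E gating could see itself
"selected" for an instant during that transition.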

These other phantom values, and their time durations, depend
on capacitive loading of the bus leads as well as the previous
address.  So using an MPI, or a Y-cable, or just having Backgammon
in slot 2 can all affect the results.  So can temperature,
humidity, ... you get the idea.  No wonder this whole thing was
getting into the realm of Satanism and voodoo.

And since the address previous to any I/O operation is so
important in determining the glitches, you can see why it
matters where OS9's various modules are located in memory.
Some address combinations would glitch your hardware, others would
not.  Note that an ungated I/O device can also respond to glitches
during cycles intended only for RAM.
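
Here is a rough Python check of why the previous address matters,
again assuming (my simplification) that changing bits can flip in any
order.  DEVICE_ADDR and can_glitch are hypothetical names.  The device
address can appear as a phantom only if it agrees with both addresses
on every bit that does NOT change:

```python
DEVICE_ADDR = 0xFF40  # hypothetical ungated peripheral address

def can_glitch(prev_addr, next_addr, device=DEVICE_ADDR):
    """True if the device address could appear momentarily while the
    bus moves from prev_addr to next_addr (16-bit addresses)."""
    changing = prev_addr ^ next_addr
    # Bits that hold still must already match the device address.
    stable_mismatch = (device ^ prev_addr) & ~changing & 0xFFFF
    return stable_mismatch == 0 and device not in (prev_addr, next_addr)

# Same destination address, two different previous addresses.
print(can_glitch(0xFE40, 0x7F40))  # True
print(can_glitch(0x2000, 0x7F40))  # False
```

Neither cycle addresses the device at all, yet one of them can
select it for an instant and the other cannot.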

I'm offended that any hardware got on the market with such sloppy
design.  However, I remember hacking my DISTO RAMdisk
to work with the Coco III's 2 MHz clock, and the fix consisted
of gating the address select with E-clock.  (Marty Goodman's fix).

So peripherals CAN get away with it at the slow clock,
but not the fast.  That is also consistent with many users'
experiences aside from the BLOB.
So now you can see why the problem depends on your boot order,
your brands of hardware add-ons, your MPI, and clock speed.
-- 
Mike Knudsen  Bell Labs(AT&T)   att!ihlpl!knudsen
"Lawyers are like nuclear bombs.  Nobody likes them,
but the other guy's got one, so I better get one too."