[net.micro.6809] Newdisk bugs all fixed

ingoldsby@calgary.UUCP (Terry Ingoldsby) (05/26/86)

I am happy (you can't imagine how) to announce that I think I've got
all the minor bugs out of Dave Lewis' Newdisk driver and will be
posting the repaired source code to the net either tonight or tomorrow.
In case anyone else out there is working on a disk driver, I will explain
the cause of the bugs, since the error is very subtle and other people
could easily fall into the same trap.

In Dave Lewis' Newdisk, the code to read and write to the disk goes something
like this:

        STA     >COMDREG     Send command to command register

        LDA     #2
READY   BITA    >STATREG     Check data ready bit in status register
        BEQ     READY

READLOOP LDA    >DATAREG     Get the data from the WD1793
         STA    ,X+          Put it in buffer
         STB    >DPORT       Reset the Halt function
         BRA    READLOOP

         or

WRITELOOP LDA   ,X+
          STA   >DATAREG
          STB   >DPORT
          BRA   WRITELOOP

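Neither loop has an exit test of its own.  The interrupt that the WD1793
raises when the command completes is what gets you out of the READ or
WRITE loop.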
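
For anyone who hasn't seen this trick, here is a rough sketch of how an
interrupt handler can break out of a loop like that.  It is only an
illustration, not the handler from Newdisk: it assumes the WD1793's
interrupt line is wired to the 6809's NMI, and the labels NMISVC and DONE
are made up for the example.

NMISVC  LDX     #DONE        Address to continue at after the transfer
        STX     10,S         Overwrite the stacked PC (NMI stacks all the
*                            registers, 12 bytes, with PC at offset 10)
        RTI                  "Return" to DONE instead of into the loop

DONE    LDA     >STATREG     Command is finished, pick up final status

Since RTI pulls X back off the stack, the buffer pointer comes back just
as the loop left it, even though the handler used X itself.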

Unfortunately, there are two things wrong with this scheme.  First of all,
Western Digital doesn't guarantee that the status register is meaningful
until as much as 28 usec after the command is given.  This means that the
READY loop may be testing a status that isn't valid yet, and so may
not function as planned.  The second error is almost invisible but is
really catastrophic.

It turns out that WD meant for the 1793 to be used either in a polled mode
or in an interrupt driven mode, but not both (i.e. polled XOR interrupt).
According to their application notes, using both can result in the
cancellation of the interrupt.  Apparently, reading the status register
at *about* the same time as the interrupt pin is going to toggle can
clear the interrupt before it ever has a chance to occur.  This means
that the driver will get stuck in one of the loops forever, waiting for
an interrupt which will never occur.
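
As for the first problem, a driver that does want to look at the status
register could burn some cycles first so that the status has time to
settle.  A minimal sketch, assuming a CPU clock of roughly 0.9 MHz (a bit
over 1 usec per cycle); the label DELAY and the count of 6 are my
assumptions and are not taken from Newdisk:

        STA     >COMDREG     Send command to command register
        LDA     #6           2 cycles
DELAY   DECA                 2 cycles
        BNE     DELAY        3 cycles, so 6 passes = 30 cycles
        LDA     #2           About 32 cycles gone, now test status
READY   BITA    >STATREG
        BEQ     READY

At a faster clock the count would have to go up to match.  Note that this
only deals with the invalid status.  It does nothing about the second
problem, since the loop still reads the status register.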

Several people wrote with good ideas as to what was happening.  The closest
was Kent Myers who reported that:
>Terry, your explanation of the bug is incorrect. The NMI is not enabled
>until the third instruction of the WRTLOOP code is executed. Where the
>thing is hanging up is in the WAITWDRQ code. The write protect status
>is available in the controller immediately and there will never be a
>DRQ from the controller to break the loop. CCDisk uses the exact same
>code for the write loop, but they use the Y register as a timeout on
>the DRQ.
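
To make the timeout idea concrete, a wait loop of the kind Kent describes
might look roughly like this.  It is a guess at the general shape only,
not a listing of the Tandy code; WAITDRQ is the name Kent uses, while
TIMEOUT is a made-up label for an error exit.

        LDY     #0           Timeout counter, wraps after 65536 passes
        LDA     #2           Data ready bit in status register
WAITDRQ BITA    >STATREG     Data request yet?
        BNE     READLOOP     Yes, go transfer the data
        LEAY    -1,Y         No, count down (LEAY sets the Z flag)
        BNE     WAITDRQ      Keep waiting until the counter runs out
        BRA     TIMEOUT      Gave up, report an error

Note that a loop like this still polls the status register while waiting
for an interrupt to end the transfer.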

He then lists the code that Tandy uses.  If his disassembly is correct,
I think they fell into the same trap and put a timeout in to get around
it.  I haven't actually disassembled RS's code (not wanting to break my
licensing agreement) so I can only pass on what Kent said.  Fortunately
the correction is simple.  Loosely, it goes something like this:

        STA     >COMDREG
        STB     >DPORT   Halt and wait for DRQ.  If it never comes, that's
*                        OK since the WD1793 will interrupt us out anyway.
        NOP              Needed on READ and VERIFY to give Halt one
*                        instruction to take effect before accessing the
*                        WD1793
READLOOP LDA    >DATAREG
         STA    ,X+
         STB    >DPORT
         BRA    READLOOP

I hope that this has not bored anyone too much and that it may help
someone in the future.

					Terry Ingoldsby