[net.micro.6809] Newdisk bug fixed

ingoldsby@calgary.UUCP (Terry Ingoldsby) (05/11/86)

After some late night hacking, I think I have been able to locate and
correct the bug in Dave Lewis' `Newdisk' device driver for OS9.  I
hope the following will allow other people to correct their versions.

In Dave's code, he uses the NMI generated by the WD1793 disk controller
as an asynchronous RTS.  The idea is sound but he had a little, itty, bitty
teeny, weeny bug.  But then, smallpox germs aren't too large either.  The
good news is that the vaccination for this bug is very easy.  Dave's code
looks something like this:

NMI.SVC   fix stack so it looks like the NMI never happened
          figure out error code
          effectively do an RTS with the error code in B


WRITE2   LDA #$A2 `Write sector' command
         BSR RWCMDX Execute command
WAITWDRQ BITA >STATREG Wait until controller is
         BEQ WAITWDRQ   ready to transfer data
*
WRTLOOP  LDA ,X+ Get byte from data buffer
         STA >DATAREG Put it in data register
         STB >DPORT Activate DRQ halt function
         BRA WRTLOOP Loop until interrupted
*


RWCMDX   LDX PD.BUF,Y Point to sector buffer
         LDB DPRT.IMG,U Do a side verify using the
         BITB #$40   DPORT image byte as a side
         BEQ WTKCMDX   select indicator
         ORA #8 Compare for side 1
WTKCMDX  STA >COMDREG Issue command to controller
         LDB #$A8 Set up DRQ halt function
         ORB DPRT.IMG,U OR in select bits
         LDA #2 DRQ bit in status register
         RTS


The idea is that someone calls WRITE2 (or a similar READ or VERIFY routine).
This routine then calls the appropriate CMDX routine which tells the disk
controller to execute the command, does some other stuff, and then RTS's
to the LOOP, where it loops until NMI provides the asynchronous RTS out of
WRITE2.  This is usually what happens.   Most successful commands take at
least a few milliseconds to execute and all is well.  It turns out that
unsuccessful commands return much quicker.  Moreover, the delay before
returning depends on many almost random factors (angular position of disk,
clock relations, etc.).  Some error conditions can be detected almost
instantaneously.  For example, a request to write to a write protected
disk can be determined to be an error in a matter of nano or microseconds.

Many of you are now shaking your heads and can see what the problem is.
For those who are still a bit bleary from late night hacking I will
complete the explanation.  If the NMI occurs very quickly then it will
occur before RWCMDX has had a chance to RTS.  This means that the NMI RTS
will take us back to the WRTLOOP, where we will loop forever waiting for
an event that has already occurred.  To fix this bug, simply take the RWCMDX
subroutine and insert it (or the WTKCMDX part of it) where previously there
was a call to the subroutine.

If others have trouble with this, I may (if Dave Lewis grants permission)
post the corrected code at a later date.

Keep hacking!


				Terry Ingoldsby

dml@loral.UUCP (05/16/86)

In article <111@vaxb.calgary.UUCP> ingoldsby@calgary.UUCP (Terry Ingoldsby)
writes:
>After some late night hacking, I think I have been able to locate and
>correct the bug in Dave Lewis' `Newdisk' device driver for OS9.  I
>hope the following will allow other people to correct their versions.
>

  You have not only my permission, but my blessing and any help you need.
Your analysis of the problem looks right and the fix is pretty simple. The
only change needed is to take the command activation out of the subroutine.


RWCMDX   LDX PD.BUF,Y Point to sector buffer
         LDB DPRT.IMG,U Do a side verify using the
         BITB #$40   DPORT image byte as a side
         BEQ WTKCMDX   select indicator
         ORA #8 Compare for side 1
WTKCMDX  LDB #$A8 Set up DRQ halt function
         ORB DPRT.IMG,U OR in select bits
         RTS

  And the calling routines should be modified to: (in four places I believe)
    -->  ADDED LINES

WRITE2   LDA #$A2 `Write sector' command
         BSR RWCMDX Set up for disk command
 -->     STA >COMDREG Issue command to controller
 -->     LDA #2 Identify DRQ status bit
WAITWDRQ BITA >STATREG Wait until controller is
         BEQ WAITWDRQ   ready to transfer data
*
WRTLOOP  LDA ,X+ Get byte from data buffer
         STA >DATAREG Put it in data register
         STB >DPORT Activate DRQ halt function
         BRA WRTLOOP Loop until interrupted

  The whole idea being to get that command store instruction out of the
(insert weird punctuation here) SUBROUTINE!

>In Dave's code, he uses the NMI generated by the WD1793 disk controller
>as an asynchronous RTS.  The idea is sound but he had a little, itty, bitty

  Don't be bashful -- just say "*!BUG!!!*" With hair and nippy claws!
The NMI/RTS wasn't my idea; the standard Radio Shack CCDisk module does it
that way, and so does the [GGAAAKKKK!] Disk Basic driver. The hardware's set
up that way you see. The 6809/6883 in normal speed mode is not fast enough
to run a polling loop in the time it takes the disk controller chip to read
a byte. The super-tight infinite loop in there now takes 17 machine cycles
per iteration. That's more than 19 microseconds of the ~25 microsec available.
So, the 6809 is synchronized to the 1793 by tying DRQ (data ready) to HALT!
The NMI is used to break it out of this loop.

  I will change the driver on my system this week or next and see how it works.
Many thanx to Terry for finding and exterminating this lil ba???rd.

>>>>>>>             Terry Ingoldsby

-------------------------------
		Dave Lewis    Loral Instrumentation   San Diego

    sdcsvax--\     gould9 --\
    ihnp4 ---->-->!sdcc3 ---->--->!loral!dml  (uucp)
    sdcrdcf -/     crash ---/

 "...got the most in you and use the least. Got a million in you and spend
pennies. Got a genius in you and think crazies. Got a heart in you and feel
empties."

-------------------------------

ingoldsby@calgary.UUCP (05/22/86)

In article <1132@loral.UUCP>, dml@loral.UUCP writes:
> In article <111@vaxb.calgary.UUCP> ingoldsby@calgary.UUCP (Terry Ingoldsby)
> writes:
> >After some late night hacking, I think I have been able to locate and
> >correct the bug in Dave Lewis' `Newdisk' device driver for OS9.  I
> >hope the following will allow other people to correct their versions.
> >
> 
>   You have not only my permission, but my blessing and any help you need.
> Your analysis of the problem looks right and the fix is pretty simple. The
> only change needed is to take the command activation out of the subroutine.
> 
> 
> RWCMDX   LDX PD.BUF,Y Point to sector buffer
>          LDB DPRT.IMG,U Do a side verify using the
>          BITB #$40   DPORT image byte as a side
>          BEQ WTKCMDX   select indicator
>          ORA #8 Compare for side 1
> WTKCMDX  LDB #$A8 Set up DRQ halt function
>          ORB DPRT.IMG,U OR in select bits
>          RTS
> 
>   And the calling routines should be modified to: (in four places I believe)
>     -->  ADDED LINES
> 
> WRITE2   LDA #$A2 `Write sector' command
>          BSR RWCMDX Set up for disk command
>  -->     STA >COMDREG Issue command to controller
>  -->     LDA #2 Identify DRQ status bit
> WAITWDRQ BITA >STATREG Wait until controller is
>          BEQ WAITWDRQ   ready to transfer data
> *
> WRTLOOP  LDA ,X+ Get byte from data buffer
>          STA >DATAREG Put it in data register
>          STB >DPORT Activate DRQ halt function
>          BRA WRTLOOP Loop until interrupted
> 
>   The whole idea being to get that command store instruction out of the
> (insert weird punctuation here) SUBROUTINE!
> 

Alas and alack my fix added a teeny weeny (or was it big and hairy) bug.
It is quite subtle and it causes the driver to become flakey.  It turns out
that Western Digital cannot guarantee that status is valid for `up to' 28usec
after the command is given.  I went to bed last night before I could exhaust-
ively verify that it works, but I believe that a slight delay is needed
before the `BITA >STATREG', say  PSHS CC,A,B,X,Y  PULS CC,A,B,X,Y.  Newdisk
is like a high spirited horse.  Very powerful, and must be given firm but
gentle guidance.

I will post anymore hints I come up with as I become convinced of them.

					Terry Ingoldsby