ingoldsby@calgary.UUCP (Terry Ingoldsby) (05/11/86)
After some late night hacking, I think I have been able to locate and
correct the bug in Dave Lewis' `Newdisk' device driver for OS9. I
hope the following will allow other people to correct their versions.
In Dave's code, he uses the NMI generated by the WD1793 disk controller
as an asynchronous RTS. The idea is sound but he had a little, itty, bitty
teeny, weeny bug. But then, smallpox germs aren't too large either. The
good news is that the vaccination for this bug is very easy. Dave's code
looks something like this:

NMI.SVC  fix stack so it looks like the NMI never happened
         figure out error code
         effectively do an RTS with the error code in B

WRITE2   LDA   #$A2         `Write sector' command
         BSR   RWCMDX       Execute command
WAITWDRQ BITA  >STATREG     Wait until controller is
         BEQ   WAITWDRQ      ready to transfer data
*
WRTLOOP  LDA   ,X+          Get byte from data buffer
         STA   >DATAREG     Put it in data register
         STB   >DPORT       Activate DRQ halt function
         BRA   WRTLOOP      Loop until interrupted
*
RWCMDX   LDX   PD.BUF,Y     Point to sector buffer
         LDB   DPRT.IMG,U   Do a side verify using the
         BITB  #$40          DPORT image byte as a side
         BEQ   WTKCMDX       select indicator
         ORA   #8           Compare for side 1
WTKCMDX  STA   >COMDREG     Issue command to controller
         LDB   #$A8         Set up DRQ halt function
         ORB   DPRT.IMG,U   OR in select bits
         LDA   #2           DRQ bit in status register
         RTS

The idea is that someone calls WRITE2 (or a similar READ or VERIFY routine).
This routine then calls the appropriate CMDX routine which tells the disk
controller to execute the command, does some other stuff, and then RTS's
to the LOOP, where it loops until NMI provides the asynchronous RTS out of
WRITE2. This is usually what happens. Most successful commands take at
least a few milliseconds to execute and all is well. It turns out that
unsuccessful commands return much quicker. Moreover, the delay before
returning depends on many almost random factors (angular position of disk,
clock relations, etc.). Some error conditions can be detected almost
instantaneously. For example, a request to write to a write protected
disk can be determined to be an error in a matter of nano or microseconds.
Many of you are now shaking your heads and can see what the problem is.
For those who are still a bit bleary from late night hacking I will
complete the explanation. If the NMI occurs very quickly then it will
occur before RWCMDX has had a chance to RTS. At that moment the return
address on top of the stack is the one pushed by BSR RWCMDX, not the one
for WRITE2's caller. The NMI RTS therefore takes us back into WRITE2, just
ahead of the WRTLOOP, where we will loop forever waiting for an event that
has already occurred. To fix this bug, simply take the RWCMDX subroutine
and insert it inline (or the WTKCMDX part of it) where previously there
was a call to the subroutine.
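
A minimal sketch of that inlining for the write path, untested and built only from the instructions in the listing above. The label WRITE3 is a hypothetical name standing in for WTKCMDX, since an inlined copy in each caller would need its own label. With no BSR outstanding when the command is issued, the return address on top of the stack should be the one for WRITE2's caller, so an early NMI returns there.

```asm
* Sketch only: RWCMDX inlined into WRITE2. WRITE3 is a made-up label.
WRITE2   LDA   #$A2         `Write sector' command
         LDX   PD.BUF,Y     Point to sector buffer
         LDB   DPRT.IMG,U   Do a side verify using the
         BITB  #$40          DPORT image byte as a side
         BEQ   WRITE3        select indicator
         ORA   #8           Compare for side 1
WRITE3   STA   >COMDREG     Issue command (no BSR pending here)
         LDB   #$A8         Set up DRQ halt function
         ORB   DPRT.IMG,U   OR in select bits
         LDA   #2           DRQ bit in status register
WAITWDRQ BITA  >STATREG     Wait until controller is
         BEQ   WAITWDRQ      ready to transfer data
*
WRTLOOP  LDA   ,X+          Get byte from data buffer
         STA   >DATAREG     Put it in data register
         STB   >DPORT       Activate DRQ halt function
         BRA   WRTLOOP      Loop until interrupted
```

The same substitution would apply to the READ and VERIFY callers, each with its own local label.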
If others have trouble with this, I may (if Dave Lewis grants permission)
post the corrected code at a later date.
Keep hacking!
Terry Ingoldsby

dml@loral.UUCP (05/16/86)
In article <111@vaxb.calgary.UUCP> ingoldsby@calgary.UUCP (Terry Ingoldsby) writes:
>After some late night hacking, I think I have been able to locate and
>correct the bug in Dave Lewis' `Newdisk' device driver for OS9. I
>hope the following will allow other people to correct their versions.
>

You have not only my permission, but my blessing and any help you need.
Your analysis of the problem looks right and the fix is pretty simple. The
only change needed is to take the command activation out of the subroutine.

    RWCMDX   LDX   PD.BUF,Y     Point to sector buffer
             LDB   DPRT.IMG,U   Do a side verify using the
             BITB  #$40          DPORT image byte as a side
             BEQ   WTKCMDX       select indicator
             ORA   #8           Compare for side 1
    WTKCMDX  LDB   #$A8         Set up DRQ halt function
             ORB   DPRT.IMG,U   OR in select bits
             RTS

And the calling routines should be modified to: (in four places I believe)
--> ADDED LINES

    WRITE2   LDA   #$A2         `Write sector' command
             BSR   RWCMDX       Set up for disk command
-->          STA   >COMDREG     Issue command to controller
-->          LDA   #2           Identify DRQ status bit
    WAITWDRQ BITA  >STATREG     Wait until controller is
             BEQ   WAITWDRQ      ready to transfer data
    *
    WRTLOOP  LDA   ,X+          Get byte from data buffer
             STA   >DATAREG     Put it in data register
             STB   >DPORT       Activate DRQ halt function
             BRA   WRTLOOP      Loop until interrupted

The whole idea being to get that command store instruction out of the
(insert weird punctuation here) SUBROUTINE!

>In Dave's code, he uses the NMI generated by the WD1793 disk controller
>as an asynchronous RTS. The idea is sound but he had a little, itty, bitty

Don't be bashful -- just say "*!BUG!!!*" With hair and nippy claws!

The NMI/RTS wasn't my idea; the standard Radio Shack CCDisk module does it
that way, and so does the [GGAAAKKKK!] Disk Basic driver. The hardware's set
up that way you see. The 6809/6883 in normal speed mode is not fast enough
to run a polling loop in the time it takes the disk controller chip to read
a byte. The super-tight infinite loop in there now takes 17 machine cycles
per iteration. That's more than 19 microseconds of the ~25 microsec
available. So, the 6809 is synchronized to the 1793 by tying DRQ (data
ready) to HALT! The NMI is used to break it out of this loop.

I will change the driver on my system this week or next and see how it
works. Many thanx to Terry for finding and exterminating this lil ba???rd.

>>>>>>> Terry Ingoldsby

-------------------------------
Dave Lewis    Loral Instrumentation    San Diego

sdcsvax--\      gould9 --\
ihnp4 ---->-->!sdcc3 ---->--->!loral!dml  (uucp)
sdcrdcf -/      crash ---/

"...got the most in you and use the least. Got a million in you and spend
pennies. Got a genius in you and think crazies. Got a heart in you and feel
empties."
-------------------------------

ingoldsby@calgary.UUCP (05/22/86)
In article <1132@loral.UUCP>, dml@loral.UUCP writes:
> In article <111@vaxb.calgary.UUCP> ingoldsby@calgary.UUCP (Terry Ingoldsby)
> writes:
> >After some late night hacking, I think I have been able to locate and
> >correct the bug in Dave Lewis' `Newdisk' device driver for OS9. I
> >hope the following will allow other people to correct their versions.
> >
>
> You have not only my permission, but my blessing and any help you need.
> Your analysis of the problem looks right and the fix is pretty simple. The
> only change needed is to take the command activation out of the subroutine.
>
>
>     RWCMDX   LDX   PD.BUF,Y     Point to sector buffer
>              LDB   DPRT.IMG,U   Do a side verify using the
>              BITB  #$40          DPORT image byte as a side
>              BEQ   WTKCMDX       select indicator
>              ORA   #8           Compare for side 1
>     WTKCMDX  LDB   #$A8         Set up DRQ halt function
>              ORB   DPRT.IMG,U   OR in select bits
>              RTS
>
> And the calling routines should be modified to: (in four places I believe)
> --> ADDED LINES
>
>     WRITE2   LDA   #$A2         `Write sector' command
>              BSR   RWCMDX       Set up for disk command
> -->          STA   >COMDREG     Issue command to controller
> -->          LDA   #2           Identify DRQ status bit
>     WAITWDRQ BITA  >STATREG     Wait until controller is
>              BEQ   WAITWDRQ      ready to transfer data
>     *
>     WRTLOOP  LDA   ,X+          Get byte from data buffer
>              STA   >DATAREG     Put it in data register
>              STB   >DPORT       Activate DRQ halt function
>              BRA   WRTLOOP      Loop until interrupted
>
> The whole idea being to get that command store instruction out of the
> (insert weird punctuation here) SUBROUTINE!
>

Alas and alack my fix added a teeny weeny (or was it big and hairy) bug. It
is quite subtle and it causes the driver to become flaky. It turns out that
Western Digital cannot guarantee that the status is valid until `up to'
28usec after the command is given. I went to bed last night before I could
exhaustively verify that it works, but I believe that a slight delay is
needed before the `BITA >STATREG', say

         PSHS  CC,A,B,X,Y
         PULS  CC,A,B,X,Y

Newdisk is like a high spirited horse. Very powerful, and must be given firm
but gentle guidance. I will post any more hints I come up with as I become
convinced of them.

Terry Ingoldsby
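
A sketch of where that delay might sit in the WRITE2 path from Dave's post, untested and assuming the CoCo's normal-speed clock of about 0.89 MHz (roughly 1.12 usec per cycle, which matches Dave's 17 cycles coming to about 19 usec). PSHS and PULS each cost 5 cycles plus one per byte moved, and CC,A,B,X,Y is 7 bytes, so the pair comes to 24 cycles, or close to 27 usec. The BITA that follows spends a few more cycles before it actually reads the register, which should put the first status read just past the 28usec mark. Placing the delay ahead of the WAITWDRQ label is an assumption here; it keeps the delay out of the polling loop itself.

```asm
* Sketch only: delay placement is an assumption, not tested code.
WRITE2   LDA   #$A2         `Write sector' command
         BSR   RWCMDX       Set up for disk command
         STA   >COMDREG     Issue command to controller
         LDA   #2           Identify DRQ status bit
         PSHS  CC,A,B,X,Y   12 cycles, registers saved
         PULS  CC,A,B,X,Y   12 cycles, registers restored unchanged
WAITWDRQ BITA  >STATREG     Wait until controller is
         BEQ   WAITWDRQ      ready to transfer data
*
WRTLOOP  LDA   ,X+          Get byte from data buffer
         STA   >DATAREG     Put it in data register
         STB   >DPORT       Activate DRQ halt function
         BRA   WRTLOOP      Loop until interrupted
```

Because the pair pushes and pulls the same register list, A, B and X come out of the delay holding the values RWCMDX and the LDA #2 set up.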