ingoldsby@calgary.UUCP (Terry Ingoldsby) (05/11/86)
After some late night hacking, I think I have been able to locate and correct the bug in Dave Lewis' `Newdisk' device driver for OS9. I hope the following will allow other people to correct their versions. In Dave's code, he uses the NMI generated by the WD1793 disk controller as an asynchronous RTS. The idea is sound but he had a little, itty, bitty teeny, weeny bug. But then, smallpox germs aren't too large either. The good news is that the vaccination for this bug is very easy. Dave's code looks something like this: NMI.SVC fix stack so it looks like the NMI never happened figure out error code effectively do an RTS with the error code in B WRITE2 LDA #$A2 `Write sector' command BSR RWCMDX Execute command WAITWDRQ BITA >STATREG Wait until controller is BEQ WAITWDRQ ready to transfer data * WRTLOOP LDA ,X+ Get byte from data buffer STA >DATAREG Put it in data register STB >DPORT Activate DRQ halt function BRA WRTLOOP Loop until interrupted * RWCMDX LDX PD.BUF,Y Point to sector buffer LDB DPRT.IMG,U Do a side verify using the BITB #$40 DPORT image byte as a side BEQ WTKCMDX select indicator ORA #8 Compare for side 1 WTKCMDX STA >COMDREG Issue command to controller LDB #$A8 Set up DRQ halt function ORB DPRT.IMG,U OR in select bits LDA #2 DRQ bit in status register RTS The idea is that someone calls WRITE2 (or a similar READ or VERIFY routine). This routine then calls the appropriate CMDX routine which tells the disk controller to execute the command, does some other stuff, and then RTS's to the LOOP, where it loops until NMI provides the asynchronous RTS out of WRITE2. This is usually what happens. Most successful commands take at least a few milliseconds to execute and all is well. It turns out that unsuccessful commands return much quicker. Moreover, the delay before returning depends on many almost random factors (angular position of disk, clock relations, etc.). Some error conditions can be detected almost instantaneously. For example, a request to write to a write protected disk can be determined to be an error in a matter of nano or microseconds. Many of you are now shaking your heads and can see what the problem is. For those who are still a bit bleary from late night hacking I will complete the explanation. If the NMI occurs very quickly then it will occur before RWCMDX has had a chance to RTS. This means that the NMI RTS will take us back to the WRTLOOP, where we will loop forever waiting for an event that has already occurred. To fix this bug, simply take the RWCMDX subroutine and insert it (or the WTKCMDX part of it) where previously there was a call to the subroutine. If others have trouble with this, I may (if Dave Lewis grants permission) post the corrected code at a later date. Keep hacking! Terry Ingoldsby
dml@loral.UUCP (05/16/86)
In article <111@vaxb.calgary.UUCP> ingoldsby@calgary.UUCP (Terry Ingoldsby) writes: >After some late night hacking, I think I have been able to locate and >correct the bug in Dave Lewis' `Newdisk' device driver for OS9. I >hope the following will allow other people to correct their versions. > You have not only my permission, but my blessing and any help you need. Your analysis of the problem looks right and the fix is pretty simple. The only change needed is to take the command activation out of the subroutine. RWCMDX LDX PD.BUF,Y Point to sector buffer LDB DPRT.IMG,U Do a side verify using the BITB #$40 DPORT image byte as a side BEQ WTKCMDX select indicator ORA #8 Compare for side 1 WTKCMDX LDB #$A8 Set up DRQ halt function ORB DPRT.IMG,U OR in select bits RTS And the calling routines should be modified to: (in four places I believe) --> ADDED LINES WRITE2 LDA #$A2 `Write sector' command BSR RWCMDX Set up for disk command --> STA >COMDREG Issue command to controller --> LDA #2 Identify DRQ status bit WAITWDRQ BITA >STATREG Wait until controller is BEQ WAITWDRQ ready to transfer data * WRTLOOP LDA ,X+ Get byte from data buffer STA >DATAREG Put it in data register STB >DPORT Activate DRQ halt function BRA WRTLOOP Loop until interrupted The whole idea being to get that command store instruction out of the (insert weird punctuation here) SUBROUTINE! >In Dave's code, he uses the NMI generated by the WD1793 disk controller >as an asynchronous RTS. The idea is sound but he had a little, itty, bitty Don't be bashful -- just say "*!BUG!!!*" With hair and nippy claws! The NMI/RTS wasn't my idea; the standard Radio Shack CCDisk module does it that way, and so does the [GGAAAKKKK!] Disk Basic driver. The hardware's set up that way you see. The 6809/6883 in normal speed mode is not fast enough to run a polling loop in the time it takes the disk controller chip to read a byte. The super-tight infinite loop in there now takes 17 machine cycles per iteration. That's more than 19 microseconds of the ~25 microsec available. So, the 6809 is synchronized to the 1793 by tying DRQ (data ready) to HALT! The NMI is used to break it out of this loop. I will change the driver on my system this week or next and see how it works. Many thanx to Terry for finding and exterminating this lil ba???rd. >>>>>>> Terry Ingoldsby ------------------------------- Dave Lewis Loral Instrumentation San Diego sdcsvax--\ gould9 --\ ihnp4 ---->-->!sdcc3 ---->--->!loral!dml (uucp) sdcrdcf -/ crash ---/ "...got the most in you and use the least. Got a million in you and spend pennies. Got a genius in you and think crazies. Got a heart in you and feel empties." -------------------------------
ingoldsby@calgary.UUCP (05/22/86)
In article <1132@loral.UUCP>, dml@loral.UUCP writes: > In article <111@vaxb.calgary.UUCP> ingoldsby@calgary.UUCP (Terry Ingoldsby) > writes: > >After some late night hacking, I think I have been able to locate and > >correct the bug in Dave Lewis' `Newdisk' device driver for OS9. I > >hope the following will allow other people to correct their versions. > > > > You have not only my permission, but my blessing and any help you need. > Your analysis of the problem looks right and the fix is pretty simple. The > only change needed is to take the command activation out of the subroutine. > > > RWCMDX LDX PD.BUF,Y Point to sector buffer > LDB DPRT.IMG,U Do a side verify using the > BITB #$40 DPORT image byte as a side > BEQ WTKCMDX select indicator > ORA #8 Compare for side 1 > WTKCMDX LDB #$A8 Set up DRQ halt function > ORB DPRT.IMG,U OR in select bits > RTS > > And the calling routines should be modified to: (in four places I believe) > --> ADDED LINES > > WRITE2 LDA #$A2 `Write sector' command > BSR RWCMDX Set up for disk command > --> STA >COMDREG Issue command to controller > --> LDA #2 Identify DRQ status bit > WAITWDRQ BITA >STATREG Wait until controller is > BEQ WAITWDRQ ready to transfer data > * > WRTLOOP LDA ,X+ Get byte from data buffer > STA >DATAREG Put it in data register > STB >DPORT Activate DRQ halt function > BRA WRTLOOP Loop until interrupted > > The whole idea being to get that command store instruction out of the > (insert weird punctuation here) SUBROUTINE! > Alas and alack my fix added a teeny weeny (or was it big and hairy) bug. It is quite subtle and it causes the driver to become flakey. It turns out that Western Digital cannot guarantee that status is valid for `up to' 28usec after the command is given. I went to bed last night before I could exhaust- ively verify that it works, but I believe that a slight delay is needed before the `BITA >STATREG', say PSHS CC,A,B,X,Y PULS CC,A,B,X,Y. Newdisk is like a high spirited horse. Very powerful, and must be given firm but gentle guidance. I will post anymore hints I come up with as I become convinced of them. Terry Ingoldsby