mrr@amanpt1.UUCP (Mark Rinfret) (12/09/87)
Hi. I need help with an I/O programming problem. Some time ago I released a hard disk backup program (MRBackup), with good intentions, but have since become painfully aware of its flaws. I'm doing penance for that right now and trying to clean things up.

One of the things I've discovered is that intermittently, during the formatting of a diskette (using the FormatDisk routine that I wrote), the program will hang. As I read the symptoms, the system is not getting an I/O completion interrupt for the formatting of the current track. Since this may or may not be a hardware problem on my system, I thought I'd handle the problem by doing asynchronous I/O to the disk drive and using an async timer, then Wait for either to complete. This is where things fall apart.

I wrote a timer package and, as I usually do, added a conditionally compiled main routine to test it. The test runs 10 iterations of two parallel timers, 1 at 5 seconds, 1 at 10 seconds and works perfectly. The modified version of my FormatDisk routine, using SendIO, works fine without the timer. BUT! When I add the timer stuff to FormatDisk, everything falls apart. I usually get two timeouts in quick succession, then the program goes south.

Pseudo code for what I'm doing looks like this:

    ...bunch of initialization...

    diskBit = disk signal bit from IOExTD request
    timerBit = timer signal bit from timerequest request
    (note: these bits checked out OK, values 0x8000000 and 0x4000000)

    SendIO(disk request)
    SendIO(timer request)

    signals = Wait(diskBit | timerBit)

    if (signals & timerBit) {
        We had a timeout...set timeout flag
    }
    else
        Abort timer request (AbortIO)

    GetMsg(timer request)  /* pull timer reply from reply port */

    if ( ! (signals & diskBit) ) {
        Disk I/O didn't complete - abort it.
    }

    GetMsg(disk request)   /* pull disk reply from reply port */

    ...rest of routine...

Does anyone see anything wrong with the above "logic" or should I be aware of some incompatibility between the trackdisk device and the timer device?
If someone is willing to help me with this further, I'll be happy to email you the actual source (Aztec C) that I've written. I've spent far too much time on this problem and would like to get on with improvements to MRBackup.

By the way - MRBackup is undergoing a major facelift. I've added a lot more error recovery stuff. If a floppy disk media error is detected, MRBackup will now let you retry with a new floppy. It does this by saving the "context" of the file system at the time each floppy is initialized. I've also redone the Intuition stuff, having redesigned all windows and gadgets using PowerWindows 2.0 (nice program!). MRBackup now has its own screen, nicer menus and a "fuel gauge" for the backup floppy.

Many thanks to all who called (everywhere is long-distance from here) to report bugs or suggest changes.

Mark

--
< Mark R. Rinfret, mrr@amanpt1 | ...rayssd!galaxia!amanpt1!mrr >
< Aquidneck Management Associates Home: 401-846-7639 >
< 6 John Clark Road Work: 401-849-8900 x56 >
< Middletown, RI 02840 "The name has changed but I'm still guilty." >
dillon@CORY.BERKELEY.EDU (Matt Dillon) (12/12/87)
>Since this may or may not be a hardware problem on my system, I thought I'd
>handle the problem by doing asynchronous I/O to the disk drive and using
>an async timer, then Wait for either to complete. This is where things

If other formats work, then I doubt it is a hardware problem. I suggest a bug in your code. The proper way to fix it is to do more testing. Using a 'timeout and assume it worked' method is *very bad* programming technique.

:SendIO(disk request)
:SendIO(timer request)
:
:signals = Wait(diskBit | timerBit)
:
:if (signals & timerBit) {
:    We had a timeout...set timeout flag
:}
:else
:    Abort timer request (AbortIO)
:
:GetMsg(timer request)  /* pull timer reply from reply port */
:
:if ( ! (signals & diskBit) ) {
:    Disk I/O didn't complete - abort it.
:}
:
:GetMsg(disk request)   /* pull disk reply from reply port */

>Does anyone see anything wrong with the above "logic" or should I be aware

The code is all wrong! (unless you didn't include something I should know about!).

(1) If the disk request completes first, the only signal you get is diskBit... BUT the timer CAN complete just after that but BEFORE you AbortIO() the timer request... meaning the signal gets set and the NEXT Wait() returns immediately even if the next timer request isn't through.

(2) You GetMsg() the request? WHAT? The request is NOT a message port... you CANNOT GetMsg() IT!!!!... Properly, you WaitPort() on the port the message will be replied to, and then GetMsg() from the port or simply Remove() the message.

(3) The same illegal signal procedure might also be happening with your disk request... and you are also GetMsg()ing that, which is wrong.

The fix is to use SetSignal() to CLEAR THE TWO SIGNALS BEFORE DISPATCHING THE IOREQUESTS.

Hope this helps,

-Matt