[comp.sys.amiga] Need help with disk, timer I/O.

mrr@amanpt1.UUCP (Mark Rinfret) (12/09/87)

Hi.  I need help with an I/O programming problem.  Some time ago I released
a hard disk backup program (MRBackup), with good intentions, but have since
become painfully aware of its flaws.  I'm doing penance for that right now
and trying to clean things up.  One of the things I've discovered is that
intermittently, during the formatting of a diskette (using the FormatDisk
routine that I wrote), the program will hang.  As I read the symptoms, the
system is not getting an I/O completion interrupt for the formatting of the
current track.

Since this may or may not be a hardware problem on my system, I thought I'd
handle the problem by doing asynchronous I/O to the disk drive and using
an async timer, then Wait for either to complete.  This is where things
fall apart.  I wrote a timer package and, as I usually do, added a
conditionally compiled main routine to test it.  The test runs 10 iterations
of two parallel timers, one at 5 seconds and one at 10 seconds, and works perfectly.
The modified version of my FormatDisk routine, using SendIO, works fine
without the timer.  BUT!  When I add the timer stuff to FormatDisk, everything
falls apart.  I usually get two timeouts in quick succession, then the
program goes south.  Pseudo code for what I'm doing looks like this:

...bunch of initialization...

diskBit = disk signal bit from IOExTD request
timerBit = timer signal bit from timerequest request

(note: these bits checked out OK, values 0x8000000 and 0x4000000)

SendIO(disk request)
SendIO(timer request)

signals = Wait(diskBit | timerBit)

if (signals & timerBit) {
	We had a timeout...set timeout flag
}
else
	Abort timer request (AbortIO)

GetMsg(timer request)	/* pull timer reply from reply port */

if ( ! (signals & diskBit) ) {
	Disk I/O didn't complete - abort it.
}

GetMsg(disk request)	/* pull disk reply from reply port */

...rest of routine...

Does anyone see anything wrong with the above "logic" or should I be aware
of some incompatibility between the trackdisk device and the timer device?
If someone is willing to help me with this further, I'll be happy to email
you the actual source (Aztec C) that I've written.  I've spent far too
much time on this problem and would like to get on with improvements to
MRBackup.

By the way - MRBackup is undergoing a major facelift.  I've added a lot more
error recovery stuff.  If a floppy disk media error is detected, MRBackup
will now let you retry with a new floppy.  It does this by saving the
"context" of the file system at the time each floppy is initialized.  I've
also redone the Intuition stuff, having redesigned all windows and gadgets
using PowerWindows 2.0 (nice program!).  MRBackup now has its own screen,
nicer menus and a "fuel gauge" for the backup floppy.  Many thanks to all
who called (everywhere is long-distance from here) to report bugs or 
suggest changes.

Mark

-- 
< Mark R. Rinfret,        mrr@amanpt1 | ...rayssd!galaxia!amanpt1!mrr        >
< Aquidneck Management Associates       Home: 401-846-7639                   >
< 6 John Clark Road                     Work: 401-849-8900 x56               >
< Middletown, RI 02840          "The name has changed but I'm still guilty." >

dillon@CORY.BERKELEY.EDU (Matt Dillon) (12/12/87)

>Since this may or may not be a hardware problem on my system, I thought I'd
>handle the problem by doing asynchronous I/O to the disk drive and using
>an async timer, then Wait for either to complete.  This is where things

	If other formats work, then I doubt it is a hardware problem.  I
suspect a bug in your code.  The proper way to fix it is to do more testing.
Using a 'timeout and assume it worked' method is *very bad* programming
technique.

:SendIO(disk request)
:SendIO(timer request)
:
:signals = Wait(diskBit | timerBit)
:
:if (signals & timerBit) {
:	We had a timeout...set timeout flag
:}
:else
:	Abort timer request (AbortIO)
:
:GetMsg(timer request)	/* pull timer reply from reply port */
:
:if ( ! (signals & diskBit) ) {
:	Disk I/O didn't complete - abort it.
:}
:
:GetMsg(disk request)	/* pull disk reply from reply port */

>Does anyone see anything wrong with the above "logic" or should I be aware

	The code is all wrong! (unless you didn't include something I
should know about!).

	(1) If the disk request completes first, the only signal you get is
	    diskBit... BUT the timer CAN complete just after that, BEFORE
	    you AbortIO() the timer request... meaning the signal gets set
	    and the NEXT Wait() returns immediately even if the next timer
	    request isn't through.

	(2) You GetMsg() the request?  WHAT?  The request is NOT a message
	    port... you CANNOT GetMsg() IT!!!!...  Properly, you WaitPort()
	    on the port the message will be replied to, and then GetMsg() from
	    the port or simply Remove() the message.

	(3) The same stale-signal problem might also be happening with
	    your disk request... and you are also GetMsg()ing that request,
	    which is wrong.  The fix is to use SetSignal() to CLEAR THE TWO
	    SIGNALS BEFORE DISPATCHING THE IOREQUESTS.


					Hope this helps,

					-Matt