[comp.os.minix] Disk Scheduling

matt@inuxf.UUCP (02/20/87)

I am planning on adding true disk scheduling to MINIX as part of an overall
study of operating system efficiencies.  The existing method in MINIX is a
first-come-first-served pipeline with no disk i/o queue.  I would like to
throw out a straw man proposal for a method so that the alert among you
folks can poke holes, make suggestions, etc.

My first shot goes something like this:

	1) Alter floppy_task() on line 2540 of 'the book' so that all it does 
is receive a message, check for panic and queue up the request in a table
of pending process-message pairs.  It then loops back to receive another
message.

	2) Create a new procedure, say floppy_sched(), that examines the
queued requests, finds the one with the shortest seek time, and issues a
do_rdwt() for that request.

	3) floppy_sched() then will prepare a return message for the proper
process and send() a reply to that process.

	4) floppy_sched() re-examines the queue and the process continues.
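
To make steps 1 and 2 concrete, here is a rough sketch of what the pending
table and the shortest-seek selection might look like (the struct and all
of the names are invented for illustration; 'message' is the MINIX message
type):

	/* Hypothetical pending-request table; names invented, not
	 * from the book.
	 */
	#define NR_PENDING 16

	struct request {
	  int in_use;		/* is this slot occupied? */
	  int caller;		/* process to reply to */
	  int cylinder;		/* target cylinder, for seek distance */
	  message mess;		/* the original request message */
	};

	struct request pending[NR_PENDING];

	/* Pick the queued request whose cylinder is closest to the
	 * current head position (shortest seek first).
	 */
	struct request *pick_ssf(curr_cyl)
	int curr_cyl;
	{
	  struct request *rp, *best = NULL;
	  int d, best_d = 32767;

	  for (rp = &pending[0]; rp < &pending[NR_PENDING]; rp++) {
	      if (!rp->in_use) continue;
	      d = rp->cylinder - curr_cyl;
	      if (d < 0) d = -d;	/* absolute seek distance */
	      if (d < best_d) { best_d = d; best = rp; }
	  }
	  return best;		/* NULL if the queue is empty */
	}

floppy_sched() would then hand the winning entry to do_rdwt(), reply to the
caller, mark the slot free, and go around again.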

A couple of questions come to mind.  How do I get floppy_sched() up and
running in the first place?  Make it PUBLIC (external or global) and allow
init() to call it as the system is coming up?  This seems to violate the
whole task concept where the user processes (i.e., init()) are not supposed to
have access to task calls.  There must be a clean method for doing this
but I have not run across it (I've only had the book three days now).

This seems to be fairly straightforward, but I am not sure of the possible
race conditions or other hard-to-debug situations I may be headed for.
Is it not true that floppy_task() can only receive from one process at a time?
If so, then I don't think the queue could be screwed up by simultaneous calls
to floppy_task().

Any comments?
---------------------------------------------------------
Dr Tanenbaum:
	I have tried three different paths to mail my thanks to you with
no success.  I did reach Mr. Fagen at PH and he gave his OK.  I have talked
to Prof Ganesh (don't remember his last name :-) and I should receive his
package shortly.  Again thanks for the info - my whole class appreciates it.


 Matt Verner   				UUCP:  ...ihnp4!inuxc!matt
 AT&T Graphics Software Labs		AT&T:  (317) 844-4364
 Indianapolis,  IN

           "The whole point of this sentence is to clearly
             explain the point this sentence is making."

hays@apollo.UUCP (02/24/87)

I had almost given up on this newsgroup and then someone comes
up with something intelligent to say!

Pax!

marcus@illusion.UUCP (03/02/87)

In article <236@inuxf.UUCP> matt@inuxf.UUCP (Matt Verner) writes:
>I am planning on adding true disk scheduling to MINIX as part of an overall
>study of operating system efficiencies.  The existing method in MINIX is a
>first-come-first-served pipeline with no disk i/o queue.  I would like to
>throw out a straw man proposal for a method so that the alert among you
>folks can poke holes, make suggestions, etc.
...
>	2) Create a new procedure, say floppy_sched(), that examines the
>queued requests, finds the one with the shortest seek time, and issues a
>do_rdwt() for that request.
>	3) floppy_sched() then will prepare a return message for the proper
>process and send() a reply to that process.
>	4) floppy_sched() re-examines the queue and the process continues.

I would think that there is a potential performance problem here.  If two
processes are making requests for blocks at the same end of the disk and
another is requesting blocks at the other end, the scheduler will service one
process, then find the other process at the same end and work on it.  In the
meantime, the process that was just serviced requests its next block, which
will be serviced next, etc.

The upshot of this is that the process that had requested a block at the
distant end of the disk will not be serviced until the processes that are
requesting `closer' blocks stop making requests.  This does optimize total
system throughput, but it puts a significant penalty on the one process
that had the misfortune to want a block that was too far from where the
head was currently working.

Unix implements a so-called `elevator algorithm' which attempts to order
requests so that the closest block IN THE CURRENT MOVEMENT DIRECTION is
serviced next; if there are none, then the direction is reversed.  This
tends to sweep the head across the disk more than is absolutely necessary,
and so it is sub-optimal in a global sense, but it does ensure that all
requests will be serviced without undue delay.
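
In code, the selection step of such a two-way elevator might look roughly
like this (a sketch only; it assumes the queue is kept as a linked list,
and all the names are invented):

	/* Two-way elevator selection.  'dir' is +1 when the head is
	 * moving toward higher cylinders, -1 otherwise.
	 */
	struct request *pick_elevator(list, curr_cyl, dir)
	struct request *list;
	int curr_cyl;
	int *dir;
	{
	  struct request *rp, *best = NULL;
	  int d, best_d = 32767;

	  if (list == NULL) return NULL;

	  /* Find the closest request at or beyond the head in the
	   * current direction of travel.
	   */
	  for (rp = list; rp != NULL; rp = rp->next) {
	      d = (rp->cylinder - curr_cyl) * (*dir);
	      if (d >= 0 && d < best_d) { best_d = d; best = rp; }
	  }
	  if (best == NULL) {		/* nothing left ahead of the head */
	      *dir = -(*dir);		/* reverse and look again */
	      return pick_elevator(list, curr_cyl, dir);
	  }
	  return best;
	}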

marcus hall
..!ihnp4!illusion!marcus

jbs@mit-eddie.UUCP (03/02/87)

In article <108@illusion.UUCP> marcus@illusion.UUCP (Marcus Hall) writes:
>Unix implements a so-called `elevator algorithm' which attempts to order
>requests so that the closest block IN THE CURRENT MOVEMENT DIRECTION is
>serviced next; if there are none, then the direction is reversed.  This
>tends to sweep the head across the disk more than is absolutely necessary,
>and so it is sub-optimal in a global sense, but it does ensure that all
>requests will be serviced without undue delay.

A disadvantage of this is that disk blocks in the middle of the disk
end up being serviced twice as frequently as blocks at either
edge.  An alternative here (again, sub-optimal globally in the
interests of "equality") is to service all the requests while moving
in one direction, and then sweep back to the first outstanding request
and service outstanding requests in the same direction.  The
usefulness of this depends on the seek time of the device as compared
to the amount of time spent waiting in the I/O queue.
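
A sketch of that one-way (circular) sweep, with invented names as before:

	/* One-way elevator: always move toward higher cylinders; when
	 * nothing is left ahead of the head, wrap around to the lowest
	 * outstanding request.
	 */
	struct request *pick_oneway(list, curr_cyl)
	struct request *list;
	int curr_cyl;
	{
	  struct request *rp, *ahead = NULL, *lowest = NULL;

	  for (rp = list; rp != NULL; rp = rp->next) {
	      if (lowest == NULL || rp->cylinder < lowest->cylinder)
	          lowest = rp;
	      if (rp->cylinder >= curr_cyl &&
	          (ahead == NULL || rp->cylinder < ahead->cylinder))
	          ahead = rp;
	  }
	  return ahead != NULL ? ahead : lowest;	/* wrap around */
	}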

Another alternative is a "fairness" count.  This is the maximum number
of requests allowed to be processed after a given request is received
but before that request is serviced.  You then use minimal head
movement to select requests until the fairness count expires, at which
point that request is serviced regardless of head position.
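
Grafted onto shortest-seek selection, the fairness count might look like
this (again only a sketch; the age field and all the names are invented):

	/* SSF bounded by a fairness count: each time a request is
	 * passed over its age goes up; once it reaches FAIR_MAX it
	 * is served next regardless of head position.
	 */
	#define FAIR_MAX 8

	struct request *pick_fair(list, curr_cyl)
	struct request *list;
	int curr_cyl;
	{
	  struct request *rp, *best = NULL;
	  int d, best_d = 32767;

	  for (rp = list; rp != NULL; rp = rp->next) {
	      if (rp->age >= FAIR_MAX)
	          return rp;		/* starved; serve it now */
	      d = rp->cylinder - curr_cyl;
	      if (d < 0) d = -d;
	      if (d < best_d) { best_d = d; best = rp; }
	  }
	  for (rp = list; rp != NULL; rp = rp->next)
	      if (rp != best) rp->age++;	/* age the passed-over ones */
	  return best;
	}

(The age would be reset, or the entry removed, when the request is
actually serviced.)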

Jeff Siegal

matt@inuxf.UUCP (03/03/87)

> In article <236@inuxf.UUCP> matt@inuxf.UUCP (Matt Verner) writes:
> >I am planning on adding true disk scheduling to MINIX as part of an overall
> >study of operating system efficiencies.  The existing method in MINIX is a
> >first-come-first-served pipeline with no disk i/o queue.  I would like to
> >throw out a straw man proposal for a method so that the alert among you
> >folks can poke holes, make suggestions, etc.
> ...
> >	2) Create a new procedure, say floppy_sched(), that examines the
> >queued requests, finds the one with the shortest seek time, and issues a
> >do_rdwt() for that request.
> >	3) floppy_sched() then will prepare a return message for the proper
> >process and send() a reply to that process.
> >	4) floppy_sched() re-examines the queue and the process continues.
> 
> I would think that there is a potential performance problem here... 
> 
> Unix implements a so-called `elevator algorithm' which attempts to order
> requests so that the closest block IN THE CURRENT MOVEMENT DIRECTION is
> serviced next; if there are none, then the direction is reversed.  This
> tends to sweep the head across the disk more than is absolutely necessary,
> and so it is sub-optimal in a global sense, but it does ensure that all
> requests will be serviced without undue delay.
> 
> marcus hall
> ..!ihnp4!illusion!marcus

I am well aware of the shortcomings of the Shortest-Seek-First algorithm.  I
am also well aware of other algorithms (SCAN, LOOK, C-SCAN, C-LOOK, FCFS, etc.)
and their respective advantages/disadvantages.  What I am truly looking for is
a discussion of how to add the queueing mechanism to the floppy (and/or hard
disk) driver so that true scheduling can occur.  I appreciate the mail and this
posting that point out SSF shortcomings, but that is not truly the issue at
hand.  In short, how does one go about adding a queue and a separate task that
will service that queue independent of other processes?  With the existing
message passing scheme the driver must receive() and then send() before going
into another receive().  This means the driver cannot receive() and queue,
receive() and queue, etc.  Or am I missing something entirely?

 Matt Verner   				UUCP:  ...ihnp4!inuxc!matt
 AT&T Graphics Software Labs		AT&T:  (317) 844-4364
 Indianapolis,  IN

           "The whole point of this sentence is to clearly
             explain the point this sentence is making."

henry@utzoo.UUCP (Henry Spencer) (03/04/87)

> > Unix implements a so-called `elevator algorithm'...
>
> A disadvantage of this is that disk blocks in the middle of the disk
> end up being serviced twice as fast (frequently) as blocks at either
> edge.  An alternative here (again, sub-optimal globally in the
> interests of "equality") is to service all the requests while moving
> in one direction, and then sweep back to the first outstanding request
> and service outstanding requests in the same direction...

In fact, this algorithm -- the "one-way elevator" algorithm -- is the one
used by all modern variants of Unix that I'm familiar with.  I think V6 was
the last version that used a two-way elevator.  V7 used a one-way elevator
with the added complication of giving writes priority over reads (except
that the code as shipped had a bug that ended up reversing the priority,
which had various annoying effects under heavy load...).  The more recent
variants that I've looked at have all used a simple one-way elevator.

(As a side note, I vaguely recall that Berkeley came up with a clever
justification for putting reads before writes, apparently never considering
that it might simply be a buggy implementation of a writes-before-reads
strategy.  I asked DMR, he confirmed that it was a bug.)
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

philip@axis.UUCP (03/07/87)

> How does one go about adding a queue and a separate task that will service
> that queue independent of other processes?  With the existing message passing
> scheme the driver must receive() and then send() before going into another
> receive().  This means the driver cannot receive() and queue, receive() and
> queue, etc.  Or am I missing something entirely?

The existing driver is nice and simple - ideal for its intended purpose
of being part of an operating systems course, where it is essential that
the functionality is not obscured by complex algorithms.

The existing structure (simplified) is:

	while (read message)
	do
		set up disc controller
		set GO bit

		read message from hardware (interrupt)

		send reply to the awaiting process

	done

On a heavily loaded system (loaded with disc activity, that is),
the driver will spend most of its time waiting for the message
from the hardware (an interrupt).

It should be possible to modify this structure slightly to allow the
maintenance of a sorted disc access list.

	while (read message)
	do
		switch on type of message:

		case read:
		case write:
				insert call into list (sorted)
				if pointer to currently executing access
				is NULL, set up controller and write GO bit.

		case hardware:

				if pointer is NULL then error(Random interrupt)
				else
					test controller status
					re-issue read/write if need be
					else send message to waiting process

				if list is not empty, set up and initiate
				next read/write
	done

Not having studied the sources in the book VERY closely, I think that
the above scheme would work. Obviously, the structure is more complex
and would probably not be suitable for an introductory os course.
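
For the `insert call into list (sorted)' step, something like the following
would do (a sketch; the names are mine, not anything from the MINIX sources):

	/* Keep the access list ordered by cylinder so the sweep can
	 * walk it in a single pass.
	 */
	void insert_sorted(head, new)
	struct request **head;
	struct request *new;
	{
	  struct request **pp;

	  for (pp = head; *pp != NULL; pp = &(*pp)->next)
	      if ((*pp)->cylinder > new->cylinder)
	          break;
	  new->next = *pp;	/* splice in just before the first */
	  *pp = new;		/* higher-numbered cylinder */
	}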

When (and if !) my sources arrive, I may try re-writing the driver as
described above (unless some kind person saves me the trouble).

egisin@watmath.UUCP (03/11/87)

In article <236@inuxf.UUCP>, matt@inuxf.UUCP (Matt Verner) writes:
> 
> I am planning on adding true disk scheduling to MINIX as part of an over-all
> study of operating system efficiencies.  The existing method in MINIX is a 
> First-come-first-served pipeline with no disk i/o queue.  I would like to
> through out a straw man proposal for a method so that the alert among you
> folks can poke holes, make suggestions, etc.

A disk IO queue would improve performance only if the FS task
were modified to do asynchronous IO to the device tasks.

As-is, FS only does synchronous IO by using sendrec in fs/device:rw_dev().
Every system call that goes through the FS task is completed
before another system call is accepted, with the exception of blocking
reads from a tty or a pipe.  These are handled with lots of special-case code.
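
The synchronous shape in question is roughly this (paraphrased from memory,
so the message field names are approximate rather than verbatim MINIX):

	/* FS builds one message and blocks in sendrec() until the
	 * driver task replies, so at most one device request is
	 * ever in flight.
	 */
	int rw_dev_sketch(op, task_nr, minor_dev, pos, buf, nbytes)
	int op, task_nr, minor_dev, nbytes;
	long pos;
	char *buf;
	{
	  message m;

	  m.m_type   = op;		/* DISK_READ or DISK_WRITE */
	  m.DEVICE   = minor_dev;	/* which device */
	  m.POSITION = pos;		/* byte offset on the device */
	  m.ADDRESS  = buf;		/* where the data goes */
	  m.COUNT    = nbytes;		/* how much to transfer */
	  sendrec(task_nr, &m);		/* FS can do nothing else here */
	  return m.REP_STATUS;		/* bytes transferred or error */
	}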

One effect of this is that only one disk IO operation is pending
at a time, and only one disk device is active at a time.
If you have a floppy and a hard disk being accessed by two independent
processes, only one device will be active at a time (the floppy will
slow down the hard disk considerably).

What you would like to do when FS blocks for device IO
is to suspend the current system call in FS and the calling process,
accept another system call, and start processing it.
This would allow multiple IO requests to be queued to the device tasks.
This is possible in Unix and probably Tunis (a monitor-based Unix-like OS)
because file system calls are procedure based, and each process
in the file system has its own context (stack) that can be suspended
and resumed.  Message passing systems make this difficult.
There is no way to suspend FS while waiting for the device to finish IO
without an ugly rewrite of FS.  For example, a write()
could do a number of device reads of indirect blocks,
then reads of partial blocks being written, then writes of modified blocks.
You would have to be able to suspend and restart an FS operation
at any of these device IO operations, which could only be done
with a messy state-driven FS task.

However, there are a couple of places in FS where you could easily do
asynchronous IO.  One is do_sync(), the other is read_ahead().  I don't know
how much this would improve performance when coupled with queues in the
device tasks.

	Eric Gisin, egisin@math.waterloo.edu, watmath!egisin.

jgh@gec-mi-at.co.uk (Jeremy Harris) (03/18/87)

Re: one-way and two-way elevator algorithms; has anybody any performance
results for these two (actual or simulated)?

-- 
Jeremy Harris	jgh@gec-mi-at.co.uk	...!mcvax!ukc!hrc63!miduet!jgh

mjl@tropix.UUCP (Mike Lutz) (03/22/87)

In article <520@gec-mi-at.co.uk> jgh@gec-mi-at.co.uk (Jeremy Harris) writes:
>Re: one-way and two-way elevator algorithms; has anybody any performance
>results for these two (actual or simulated)?

The only paper I know that did a performance study was:

Teorey, T. J., and Pinkerton, T. B., "A Comparative Analysis of Disk
Scheduling Policies," CACM, vol. 15, no. 3 (Mar 1972), pp. 177-184.