[comp.os.minix] FS

inst182@tuvie (Inst.f.Techn.Informatik) (07/17/90)

In article <628@philica.ica.philips.nl> adrie@beitel.ica.philips.nl (Adrie Koolen) writes:
>                The best solution to this problem would be to re-implement
>the FS as a 'stated' FS which returns to the main FS loop to send or
>receive messages. This approach requires quite complicated changes in the
>FS.

It would change the FS a lot, but I don't think it would be very complicated.
FS would look like this:

message_list	waiting_requests, requests_being_serviced;
message		messagein, messageout, messageaux;

for (;;) {
        receive (ANY, messagein)
        if (from_hardware (messagein)) {

                /* must be answer to previous request */
		messageaux = find_by_device (requests_being_serviced,
					     messagein.from);
		messageout = hardware2userprocess (messagein);
		send (messageaux.from, messageout);
		dequeue (requests_being_serviced, messageaux);

		/* device is free now, so send next request */
                if ((messageaux = find_by_device (
					waiting_requests,
					messagein.from)) != NOTFOUND) {
			messageout = userprocess2hardware (messageaux);
			send (messagein.from, messageout);
			dequeue (waiting_requests, messageaux);
			queue (requests_being_serviced, messageaux);
		}

	} else {
                if ((messageaux = find_by_device (
					requests_being_serviced,
					messagein.to)) != NOTFOUND) {
			/* device busy */
			queue (waiting_requests, messagein);
		} else {
			messageout = userprocess2hardware (messagein);
			send (messageout.to, messageout);
			queue (requests_being_serviced, messagein);
		}
	}
}

>FS. An easier method would be to create e.g. 3 FS threads. A FS, which
>waits in the main loop for a message is said to be free. User processes
>send messages to the FS, not knowing that there are 3 instances of the FS.
>A free FS is selected, which handles the request. When it sends a message
>to a task, the task replies to EXACTLY that FS thread. A FS thread cannot
>be interrupted by another FS thread, only by tasks. This feature can be
>used to garantee the consistency of the FS tables. This way, several
>processes can use the FS to access different tasks concurrently.

Yes that's possible, but I think one stated FS with queues is more elegant
than a fixed number of FS threads. And you could schedule the requests to the
same device (via elevator-algorithm, or something similar) more easily.
>
>Adrie Koolen (adrie@ica.philips.nl),
>Philips Innovation Centre Aachen

Peter J. Holzer (hp@vmars.tuwien.ac.at)

DISCLAIMER: It has been a while since I looked at the Minix sources, so
the above code segment is only a rough sketch, not something you could use.
But when I manage to get 1.5 I will look at the problem more closely.

adrie@philica.ica.philips.nl (Adrie Koolen) (07/18/90)

In article <1685@tuvie> inst182@tuvie.UUCP (Inst.f.Techn.Informatik) writes:
>In article <628@philica.ica.philips.nl> adrie@beitel.ica.philips.nl (Adrie Koolen) writes:
>>                The best solution to this problem would be to re-implement
>>the FS as a 'stated' FS which returns to the main FS loop to send or
>>receive messages. This approach requires quite complicated changes in the
>>FS.
>
>It would change the FS a lot, but I don't think it would be very complicated.
>...
>Peter J. Holzer (hp@vmars.tuwien.ac.at)

I agree with you in that the stated FS is more elegant than a threaded FS.
But both solutions have to solve two main problems. The first is that many
messages, sent from the FS to a task, are hidden in a deeply nested function.
For instance, when a process issues an ioctl(), it arrives at do_ioctl() in
fs/device.c, where a message is prepared, sent to the task via rw_dev() and
the results are stored in the reply message for the user process. After the
message has bee sent, the FS has to return to the main loop and when the task
replies, the FS has to finish the request. I think it's best that every time
a message is sent to a task, the address of a function to call when the reply
message is received, must be stored in some table. That way, the FS can
properly handle reply messages, received from tasks.

The second problem is that the FS also uses the tasks for its own purposes.
When reading a file, the FS has to read an i-node and possibly some indirect
blocks to access the data in the file. Each time the FS calls a task, you
must decide where to continue after the reply is received. And believe me,
this requires a lot of rewriting. When Minix was young, the FS contained
quite a lot of small bugs. They didn't occur often, so they were very hard
to find. When you rewrite the FS, be it into a stated or a threaded FS, you
will again create (mostly synchronisation) bugs, that are hard to find. I
say we should go for it, but I think that a restructurization of the FS
will be quite a big job and Andy will not like this (but he will have to,
because Minix will continue to grow and with faster hardware, the
deficiencies of the FS will become more and more apparent).

Adrie Koolen (adrie@ica.philips.nl)
Philips Innovation Centre Aachen

steve@cs.su.oz (Stephen Russell) (07/19/90)

In article <628@philica.ica.philips.nl> adrie@beitel.ica.philips.nl (Adrie Koolen) writes:
>                The best solution to this problem would be to re-implement
>the FS as a 'stated' FS which returns to the main FS loop to send or
>receive messages. [...] An easier method would be to create e.g. 3 FS threads.

In article <1685@tuvie> inst182@tuvie.UUCP (Inst.f.Techn.Informatik) replied:
>Yes that's possible, but I think one stated FS with queues is more elegant
>than a fixed number of FS threads.

The main loop of the stated FS is elegant, but the rest of the code is
rather messy. The problem is that a single user transaction may result
in multiple transactions with the device task, and these transactions
often occur in `inconvenient' places.

For example, consider scanning a directory for a matching file name
(the Unix namei() function). This contains a loop that reads blocks.
Somehow we must break out of this loop to get back to the main state
machine, and return to the search loop with the context (the dir
offset, the search string, etc) restored after the IO is completed. I'm
sure the Minix FS contains many other examples of deeply nested IO.

Making this work with a single-thread FS is complicated. Basically, the
_whole_ FS must be rewritten to `flatten' it out, and the context of
each transaction is kept in (static) structures. This is a major
rewrite.

The multi-threaded approach keeps almost all of the existing code, and
lets the OS look after keeping track of the context of each
transaction.  The cost is extra stack space for each thread.

In article <628@philica.ica.philips.nl> adrie@beitel.ica.philips.nl (Adrie Koolen) also wrote:
>A FS thread cannot
>be interrupted by another FS thread, only by tasks. This feature can be
>used to guarantee the consistency of the FS tables.

I'm not sure this is always true, though. It is possible that while one
thread is blocked, the FS state may be changed by another thread in a
way that causes problems.

A new question: can the Minix IO tasks support a multi-thread FS? The
major requirement is the ability to accept all pending requests
messages, pick the `best' one to service, and reply to that request
when it is finished. If not, the whole issue is moot.

Cheers

brucee@runxtsa.runx.oz.au (Bruce Evans) (07/21/90)

In article <633@philica.ica.philips.nl> adrie@beitel.ica.philips.nl (Adrie Koolen) writes:
>I agree with you in that the stated FS is more elegant than a threaded FS.
>But both solutions have to solve two main problems. The first is that many
>messages, sent from the FS to a task, are hidden in a deeply nested function.

I prefer threads. They neatly solve this first problem. The nested context
is automatically preserved with a separate stack for each thread, at least
after many global variables are made local or put in the fproc table. (The
defines in fs/param.h hide an alarming number of globals).

Switching *from* a thread can easily be handled in rw_dev (note all physical
i/o goes through there):

	while (waiting_for_reply_from(task_nr)
	    || (r = send(task_nr, mess_ptr)) == E_LOCKED)
		/* Don't send if FS is sure to block. Try later if the send
		 * failed due to deadlock prevention.
		 */
		unready_current_thread_and_run_another();
	if (r != OK) panic("blah", r);
	unready_current_thread_and_run_another();

This assumes a dispatcher thread that is always ready to run. It just loops
waiting for replies. When one arrives, it readies the thread that sent the
corresponding request and any others in the wait loop with task_nr ==
replying_task_nr, and picks one to run.

>The second problem is that the FS also uses the tasks for its own purposes.
>When reading a file, the FS has to read an i-node and possibly some indirect
>blocks to access the data in the file. Each time the FS calls a task, you
>must decide where to continue after the reply is received. And believe me,

Deciding where to continue is automatic with the above approach. Note that the
restriction to at most one thread sending to a driver task simplifies many
things. It is necessary because the device tasks are now single-threaded like
FS. (This is not so harmful as for FS. It penalizes mainly devices with
independent minors, and makes it impractical to put anything but short-term
strategies in the drivers.) It probably makes competition for resources like
disk buffers less of a problem.

graham@sce.carleton.ca (Doug Graham) (07/23/90)

In article <2013@runxtsa.runx.oz.au> brucee@runxtsa.runx.oz.au (Bruce Evans) writes:
>
>Switching *from* a thread can easily be handled in rw_dev (note all physical
>i/o goes through there):
>
>	while (waiting_for_reply_from(task_nr)
>	    || (r = send(task_nr, mess_ptr)) == E_LOCKED)
>		/* Don't send if FS is sure to block. Try later if the send
>		 * failed due to deadlock prevention.
>		 */
>		unready_current_thread_and_run_another();
>	if (r != OK) panic("blah", r);
>	unready_current_thread_and_run_another();
>
>This assumes a dispatcher thread that is always ready to run. It just loops
>waiting for replies. When one arrives, it readies the thread that sent the
>corresponding request and any others in the wait loop with task_nr ==
>replying_task_nr, and picks one to run.

If FS must be left as a task, or pool of tasks, this is probably the best way
to go. But I think you mispelled the name of the call that switches between
threads. Rather than "unready_current_task_and_run_another()", it should be
"sleep(task_nr)". The call that the dispatcher thread makes is
"wakeup(task_nr)". :-) In any case, it sounds like you're proposing a
synchronization mechanism that goes beyond message passing. Is this the case,
or would you implement these new "primitives" via message passing in some way?

I don't think it's this simple though. I mentioned in an earlier
message, that it's quite possible that two user processes, and thus two
FS threads, may be trying to read the same disk block. In this case,
I don't think the second request should get as far as rw_dev. Higher
level software would have to notice that the disk block was already
being read in, and block the second thread until the disk block became
available. Inodes and things would need to be locked as well. If this is
not done, an FS thread that blocked for one reason or another, would
not be able to make any assumptions about the state of the system after
returning from the blocked state. It might be possible to write FS
like that, but it sure would be difficult.

----
Doug.

brucee@runxtsa.runx.oz.au (Bruce Evans) (07/31/90)

In article <883@sce.carleton.ca> graham@sce.carleton.ca (Doug Graham) writes:
>"wakeup(task_nr)". :-) In any case, it sounds like you're proposing a
>synchronization mechanism that goes beyond message passing. Is this the case,
>or would you implement these new "primitives" via message passing in some way?

I was hoping to use a minor variant of a method I use in some real-time code.
The only primitive is suspend(). That wakes up the next active thread on a
circular queue. The thread either does some real work or tests a flag to
see if it can proceed.

>I don't think it's this simple though. I mentioned in an earlier
>message, that it's quite possible that two user processes, and thus two
>FS threads, may be trying to read the same disk block. In this case,
>I don't think the second request should get as far as rw_dev. Higher

You are right, there would have to be synchronization loops in a lot of
places. They might as well be made into library routines with names like
sleep() or wait() :-). The cache routines might end up looking like the
ones described in Bach's book instead of the current simple ones :-(.

What is there to worry about besides blocks and inodes? Probably everything
that eventually calls rw_dev, i.e. everything.
-- 
Bruce Evans
Internet: brucee@runxtsa.runx.oz.au    UUCP: uunet!runxtsa.runx.oz.au!brucee
(My other address (evans@ditsyda.oz.au) no longer works)