[comp.arch] The merits of FCFS for file servers

pcg@cs.aber.ac.uk (Piercarlo Antonio Grandi) (03/05/91)

[ this article may have already appeared; I repost it because probably
it did not get out of the local machine; apologies if you see it more
than once ]

I had written, on whether it pays to have enough nfsds that there is
a chance that several of them may queue on a disk and thus exercise the
elevator sorting algorithm of the Unix kernel:

pcg> The latter argument is somewhat doubtful as there is contradictory
pcg> evidence about the relative merits of FCFS and of elevator style
pcg> sorting as used by the Unix kernel.

lm> Ahem.  References, please?  I've looked at this one in detail, this
lm> should be interesting.

Apparently the gist of the argument is that elevator sorting imposes
excessive latency. There are several variations on elevator sorting that
try to break the input queue into segments in order to limit the maximum
latency to which a request may be subjected. Some argue that having a
segment length of 1 (FCFS) is not that bad, actually, depending on
conditions. I was myself surprised by this, but on second thoughts it is
not that incredible; if the filesystem is careful in laying out blocks
in a physically clustered way (the SunOS for one is fanatical about
this, as you well know :->), and requests for sections of the same file
are closely clustered in time (not an uncommon circumstance), the arm
will tend to move optimally anyhow (there is some paper, possibly in
TOCS, about the DTSS filesystem in which they find this to be the case).
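
To make the segment idea concrete, here is a toy sketch (my own
illustration, nobody's kernel code) of a segmented elevator: cut the
queue into fixed-size segments, sort each segment by cylinder, and
serve it in one sweep before touching the next. A real disksort()
would also track the current head position and sweep direction, which
I omit here; the point is only that SEGLEN bounds how long a request
can be bypassed, and SEGLEN == 1 degenerates into plain FCFS.

    /* segmented elevator sketch; SEGLEN == 1 gives FCFS */
    #include <stdio.h>
    #include <stdlib.h>

    #define SEGLEN 4            /* requests per frozen segment */

    static int cmp_cyl(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    static void serve(int *queue, int n)
    {
        int i, j, seg;

        for (i = 0; i < n; i += SEGLEN) {
            seg = (n - i < SEGLEN) ? n - i : SEGLEN;
            qsort(queue + i, seg, sizeof(int), cmp_cyl);
            for (j = 0; j < seg; j++)
                printf("seek to cylinder %d\n", queue[i + j]);
        }
    }

    int main(void)
    {
        int q[] = { 98, 3, 40, 77, 5, 60, 12, 80 };

        serve(q, sizeof q / sizeof q[0]);
        return 0;
    }

With a large SEGLEN you get the classic elevator; in between, you
trade some throughput for a bound on the worst-case latency.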

This is one side of the contradictory evidence; the other side is that
theoretical analysis shows that FCFS should be pretty bad, and then
there is the article (in the BSTJ, I seem to remember) in which it is
reported that the addition of disk sorting to V7 made three disks across
which load was balanced perform as a single disk with one fixed arm,
which is a big win. The assumption there is that throughput is important,
not latency, so maybe there is little contradiction with the above
analysis. Also, the V7 filesystem was prone to dispersing file blocks,
thus making the arm move far more randomly than with designs that use a
bitmap for the free list.
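
As an aside on the layout point, the difference can be shown with a
toy allocator (all names and numbers invented): a V7-style free list
simply pops whatever block happens to be on top, so after some aging
consecutive allocations land all over the disk, while a bitmap
allocator can scan outward from a hint (say, the file's previous
block) and keep a file physically clustered.

    /* toy contrast: V7-style free list vs. bitmap allocation;
       the two allocators run over independent toy state here */
    #include <stdio.h>

    #define NBLOCKS 64

    static int freelist[NBLOCKS];         /* stack of free blocks */
    static int nfree;
    static unsigned char bitmap[NBLOCKS]; /* 1 = free */

    static int alloc_v7(void)    /* top of stack, wherever it is */
    {
        return nfree > 0 ? freelist[--nfree] : -1;
    }

    static int alloc_bitmap(int hint)  /* nearest free block to hint */
    {
        int d;

        for (d = 0; d < NBLOCKS; d++) {
            int lo = hint - d, hi = hint + d;

            if (hi < NBLOCKS && bitmap[hi]) { bitmap[hi] = 0; return hi; }
            if (lo >= 0 && bitmap[lo])      { bitmap[lo] = 0; return lo; }
        }
        return -1;
    }

    int main(void)
    {
        int i, prev;

        /* a well-aged free list: blocks come back scrambled */
        for (i = 0; i < NBLOCKS; i++) {
            freelist[nfree++] = (i * 37) % NBLOCKS;
            bitmap[i] = 1;
        }
        printf("v7-style:");
        for (i = 0; i < 6; i++)
            printf(" %d", alloc_v7());
        printf("\nbitmap:  ");
        for (i = 0, prev = 20; i < 6; i++) {
            prev = alloc_bitmap(prev);
            printf(" %d", prev);
        }
        printf("\n");
        return 0;
    }

The first line prints "27 54 17 44 7 34", the second "20 21 22 23 24
25"; FCFS on the former keeps the arm hopping, on the latter it
barely moves.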

I am unfortunately not at home, so I cannot give you many article or
book references right now. Ah, I actually do have around here the
following:

    %A Robert Geist
    %A Stephen Daniel
    %T A continuum for disk scheduling algorithms
    %J TOCS
    %I ACM
    %V 5
    %N 1
    %D FEB 1987

It is a bit old; the first paragraph is

    Scheduling algorithms for moving-head disks have been studied for
    many years, but which algorithm is "best" is still an open question.

The first section continues on the relative merits of FCFS, SSTF, and
variations on SCAN. The rest of the article shows that even if their
favourite algorithm is superior to FCFS, the superiority is not
massive.  The contradictory evidence is neatly summed up in the
conclusions:

    Certainly the efficient implementation of V(R) presented in Section
    4 is no more complex than SSTF (or SCAN), so the question is easily
    reduced to one of V(R) versus FCFS. Our simulation results and those
    of other studies [4, 8, 9] continue to indicate horrible performance
    for FCFS; our tests from a real system indicate that FCFS is merely
    inferior and far short of catastrophic.
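
For the curious, my reading of the V(R) rule is roughly the following
(a paraphrase of the idea, not the paper's code): serve the pending
request with the smallest effective seek distance, where a request
that would reverse the current sweep direction has R times the
cylinder span added to its distance.  R = 0 then gives SSTF and R = 1
behaves like SCAN, with the continuum in between.

    /* V(R) sketch: SSTF with a penalty of R * span for reversing */
    #include <stdio.h>

    #define NCYL 1000.0

    static int pick_vr(const int *req, int n, int head, int dir, double r)
    {
        int i, best = -1;
        double bestd = 0;

        for (i = 0; i < n; i++) {
            double d = req[i] > head ? req[i] - head : head - req[i];
            int newdir = req[i] >= head ? 1 : -1;

            if (newdir != dir)
                d += r * NCYL;       /* cost of changing direction */
            if (best < 0 || d < bestd) {
                best = i;
                bestd = d;
            }
        }
        return best;
    }

    int main(void)
    {
        int req[] = { 183, 37, 122, 14, 124, 65, 67 };
        int n = sizeof req / sizeof req[0];
        int head = 53, dir = 1, i;

        while (n > 0) {
            i = pick_vr(req, n, head, dir, 0.2);  /* R = 0.2 */
            dir = req[i] >= head ? 1 : -1;
            head = req[i];
            printf("seek to cylinder %d\n", head);
            req[i] = req[--n];       /* drop the served request */
        }
        return 0;
    }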

This paper was published in a journal not yet subject to Top Secret
classification, its subversive contents notwithstanding.  I will add
that I have seen other (equally unclassified :->) papers in which FCFS
was shown to be actually on a par with (in admittedly particular cases
even better than) more sophisticated policies under some fairly
reasonable assumptions. Did I see them in some IEEE transactions? I
will check... I also seem to dimly remember that Wiederhold's book,
"Database Design" has a relevant discussion on arm scheduling.

IMNHO a nice short-segment variation on elevator sorting (but not the
historical V7 Unix one) is probably preferable to FCFS, even if it is
somewhat more complex, because it can be expected either to win or at
least not to lose.

Going back to an NFS environment, allowing nfsds to queue outstanding
requests may therefore be a win, or it may not; it may even be a loss,
because in certain environments the cost of having many nfsds can be
greater than the advantage of elevator sorting.

Finally, as I am fond of repeating, architecture is a difficult
balancing of contradictory requirements and evidence. Flexibility is the
key, and reducing the cost of certain mechanisms makes flexibility
affordable...
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@aber.ac.uk

jqj@duff.uoregon.edu (JQ Johnson) (03/06/91)

pcg> The latter argument is somewhat doubtful as there is contradictory
pcg> evidence about the relative merits of FCFS and of elevator style
pcg> sorting as used by the Unix kernel.
lm> Ahem.  References, please? 

Interested readers should see the most recent TOCS, specifically King,
Richard P. (1990) Disk arm movement in anticipation of future requests.
_ACM TOCS_ 8, no. 3 (August 1990), pp. 230-250.  King argues that sorting is
irrelevant since in most practical systems the queue length is seldom more
than 1, hence the important thing is to have an algorithm that tends to
leave the actuator in a reasonable (read "average" or "near the center"
for many distributions of requests) place.
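
A back-of-the-envelope simulation (mine, with a made-up uniform
request distribution) shows why the parking position matters when the
queue is empty: leaving the arm wherever the last request happened to
be costs a mean seek of about a third of the disk, while parking it
mid-disk costs about a quarter.

    /* expected seek: arm left at a random spot vs. parked mid-disk */
    #include <stdio.h>
    #include <stdlib.h>

    #define NCYL  1000
    #define TRIES 100000

    int main(void)
    {
        long stay = 0, park = 0;
        int i;

        srand(1);
        for (i = 0; i < TRIES; i++) {
            int next = rand() % NCYL;  /* next request, uniform */
            int head = rand() % NCYL;  /* arm left at old position */

            stay += abs(next - head);
            park += abs(next - NCYL / 2);
        }
        printf("arm left in place: mean seek %ld cylinders\n",
               stay / TRIES);
        printf("arm parked center: mean seek %ld cylinders\n",
               park / TRIES);
        return 0;
    }

King's point is subtler (he anticipates where requests will arrive,
which need not be the center), but the toy numbers, roughly 333
versus 250 cylinders, show the size of the effect.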

-- 
JQ Johnson
Director of Network Services		Internet: jqj@oregon.uoregon.edu
University of Oregon			voice:	(503) 346-4394
250E Computing Center			BITNET: jqj@oregon
Eugene, OR  97403-1212			fax: (503) 346-4397

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (03/06/91)

In article <PCG.91Mar4210505@aberdb.cs.aber.ac.uk>
	pcg@cs.aber.ac.uk (Piercarlo Antonio Grandi) writes:

>Some argue that having a
>segment length of 1 (FCFS) is not that bad, actually, depending on
>conditions. I was myself surprised by this, but on second thoughts it is
>not that incredible; if the filesystem is careful in laying out blocks
>in a physically clustered way

I'm afraid you are correct only 10% of the time.

What if there are two uncorrelated processes writing to different
locations on a disk?

As you are talking about a network file server, such a situation is
very common.
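
A toy computation (my numbers) makes the point: two sequential
writers, one near cylinder 100 and one near cylinder 900, interleaved
at the server.  Served FCFS, the arm crosses the disk on almost every
request; one sorted sweep over the same requests seeks a small
fraction of that (5697 versus 903 cylinders here).

    /* seek distance: interleaved FCFS vs. one sorted sweep */
    #include <stdio.h>
    #include <stdlib.h>

    static long total_seek(const int *cyl, int n, int head)
    {
        long sum = 0;
        int i;

        for (i = 0; i < n; i++) {
            sum += labs(cyl[i] - head);
            head = cyl[i];
        }
        return sum;
    }

    static int cmp(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    int main(void)
    {
        /* arrival order: writers A (100..) and B (900..) interleaved */
        int fcfs[]  = { 100, 900, 101, 901, 102, 902, 103, 903 };
        int swept[8];
        int i;

        for (i = 0; i < 8; i++)
            swept[i] = fcfs[i];
        qsort(swept, 8, sizeof(int), cmp);

        printf("FCFS:  %ld cylinders\n", total_seek(fcfs, 8, 0));
        printf("swept: %ld cylinders\n", total_seek(swept, 8, 0));
        return 0;
    }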

						Masataka Ohta

PS

Disk partitioning makes the situation even worse.

lm@slovax.Eng.Sun.COM (Larry McVoy) (03/08/91)

In article <PCG.91Mar4210505@aberdb.cs.aber.ac.uk>, pcg@cs.aber.ac.uk (Piercarlo Antonio Grandi) writes:
|> I had written, on whether it pays to have enough nfsds that there is
|> a chance that several of them may queue on a disk and thus exercise the
|> elevator sorting algorithm of the Unix kernel:
|> 
|> pcg> The latter argument is somewhat doubtful as there is contradictory
|> pcg> evidence about the relative merits of FCFS and of elevator style
|> pcg> sorting as used by the Unix kernel.
|> 
|> lm> Ahem.  References, please?  I've looked at this one in detail, this
|> lm> should be interesting.
|> 
|> Apparently the gist of the argument is that elevator sorting imposes
|> excessive latency. There are several variations on elevator sorting that
|> try to break the input queue into segments in order to limit the maximum
|> latency to which a request may be subjected. Some argue that having a
|> segment length of 1 (FCFS) is not that bad, actually, depending on
|> conditions. I was myself surprised by this, but on second thoughts it is
|> not that incredible; if the filesystem is careful in laying out blocks
|> in a physically clustered way (the SunOS for one is fanatical about
|> this, as you well know :->), and requests for sections of the same file
|> are closely clustered in time (not an uncommon circumstance), the arm
|> will tend to move optimally anyhow (there is some paper, possibly in
|> TOCS, about the DTSS filesystem in which they find this to be the case).

I hate to break an otherwise perfect record, but for once I have to
agree. :-)  And for the same reasons.  I think it is a rite of passage
for all new engineers (well, fs hackers at least) to think they can
improve disksort().  I was no different.  I did do a nice job of
actually measuring a system, building a simulator, and running the data
through all the algs I could dream up (inorder, elevator, hacksaw,
sstf).  By the way, Unix (at least SunOS) doesn't use an elevator; it
uses what I call hacksaw, which serves requests in increasing block
order, then takes one long seek back.
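
For concreteness, hacksaw amounts to something like this (my
reconstruction from the description above, not SunOS source):

    /* hacksaw: upward sweep in block order, then one long seek back */
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    static void hacksaw(int *req, int n, int head)
    {
        int i, start = 0;

        qsort(req, n, sizeof(int), cmp);
        while (start < n && req[start] < head)
            start++;                  /* first request ahead of head */
        for (i = start; i < n; i++)
            printf("block %d\n", req[i]);  /* upward sweep */
        for (i = 0; i < start; i++)
            printf("block %d\n", req[i]);  /* after the long seek back */
    }

    int main(void)
    {
        int req[] = { 98, 183, 37, 122, 14, 124, 65, 67 };

        hacksaw(req, sizeof req / sizeof req[0], 53);
        return 0;
    }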

Anyway, I found that the BSD FFS is doing a nice job of clustering
related requests (the whole point of cylinder groups, remember them?).
My conclusion was that FIFO would work just fine.  My data showed about
a 5% difference between optimal and the worst of them all (I don't
remember which that was).

I am *not* sure that this scales up.  The server that I measured (a
local Sun server dishing up home directories and diskless client gunk)
was not that busy.  I wouldn't call it idle, but the I/O patterns were
definitely bursts, and the bursts were typically related I/Os (inode,
data, dir stuff).  I predict, off the cuff, that lightly to medium
loaded servers will typically work just fine with FCFS.  I'm not sure
how bad things will get when the server's arm is saturated, but I
suspect that disksort() helps a lot in that case.
---
Larry McVoy, Sun Microsystems     (415) 336-7627       ...!sun!lm or lm@sun.com