[comp.unix.internals] Shareable, networked, swap device?

aglew@crhc.uiuc.edu (Andy Glew) (10/26/90)

Does anyone have a shareable, networked, swap device?

Anyone = commercial or academic / public domain, any flavour of UNIX.

Shareable, networked, swap device:
    NOT mkfile/swapon !
    What I seek is a swap device that multiple workstations share
dynamically, not a disk that can support several statically sized swap
partitions (well, not quite static, but more swap space isn't
allocated automagically).
    In any typical network of n machines, each machine is allocated,
say, 32M of swap space, for a total of 32*n megabytes.  But, in our
network, typically half of the machines are not being used at any time,
so they typically have around 30M of swap-space free.  At the same
time, a much smaller fraction of our machines are running very large
jobs, and really need swapspace around 64-128M - but these aren't
always the same machines (not designated compute servers).  
    It really would be nice if the unused swap memory of some machines
could be used to temporarily, transparently, expand the swap memory of
others, on demand.  Ie. it would be nice if swap space was a
centralized resource pool, rather than fragmented.
    As I've said above, management via mkfile/swapon is possible, but
klugey.  You want the swap space to be transparently added, without
human intervention, and without causing processes to die with the
message "too big".  Things aren't helped by SUN and friends not
supporting the swapon -l and swapon -d (list and delete) commands
found on System V.
    
How hard would this be to do?
    Modest.  The typical /dev/drum interface, where the kernel assumes
that it has exclusive control over a large space, could be modified.
All that is really needed is an information call to indicate when a
swap page has been freed, so that it can be physically removed from
under one machine's /dev/drum and given to another. And a trap so that
an attempt to access a /dev/drum page that has been removed can be
handled by requesting over the network.  Safety properties, of course,
are a bit harder, and it really would be nicer to have a "give me NNN
pages of swap space" call made to the shareable, networked, swap
device.
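
A minimal sketch of the "give me NNN pages" call and the matching
"page freed" information call, modelled as a toy in-memory pool.  The
names SwapPool, request_pages and release_pages are hypothetical, made
up for illustration; no network transport, trap handling, or safety
checking is attempted here.

```python
class SwapPool:
    """Toy central pool of swap pages leased to hosts on demand.

    Hypothetical illustration only: names and policy are assumptions,
    not an existing kernel or driver interface.
    """

    def __init__(self, total_pages):
        self.free_pages = total_pages   # pages nobody currently holds
        self.leased = {}                # host name -> pages currently held

    def request_pages(self, host, n):
        """The "give me NNN pages" call: grant all n pages or none."""
        if n < 0:
            raise ValueError("n must be non-negative")
        if n > self.free_pages:
            return False                # caller must cope, e.g. retry later
        self.free_pages -= n
        self.leased[host] = self.leased.get(host, 0) + n
        return True

    def release_pages(self, host, n):
        """The information call: host reports that n of its pages are free.

        Returns how many pages actually went back to the pool, which is
        never more than the host was holding.
        """
        if n < 0:
            raise ValueError("n must be non-negative")
        returned = min(n, self.leased.get(host, 0))
        self.leased[host] = self.leased.get(host, 0) - returned
        self.free_pages += returned
        return returned
```

With four machines' worth of 32M pooled (128 units), one host can be
granted 100 units while the others are idle, which a static 32M
partition per machine could not do.  A second host asking for 64 units
is refused until the first host releases enough.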

Has anyone seen something like this?


This is posted to comp.unix.internals, because any such device is
probably a driver, a server, an interface to /dev/drum, or all three;
to comp.unix.large, because large systems with lots of workstations
are likely to be playing the swap allocation game; and to
comp.unix.admin because such a device would make system administration
on large systems easier.  Followups to comp.unix.internals.
--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]

lm@slovax.Sun.COM (Larry McVoy) (10/27/90)

In article <AGLEW.90Oct25235828@cobra.crhc.uiuc.edu> aglew@crhc.uiuc.edu (Andy Glew) writes:
>Does anyone have a shareable, networked, swap device?
>
>    In any typical network of n machines, each machine is allocated,
>say, 32M of swap space, for a total of 32*n megabytes.  But, in our
>network, typically half of the machines are not being used at any time,
>so they typically have around 30M of swap-space free.  At the same
>time, a much smaller fraction of our machines are running very large
>jobs, and really need swapspace around 64-128M - but these aren't
>always the same machines (not designated compute servers).  
>    It really would be nice if the unused swap memory of some machines
>could be used to temporarily, transparently, expand the swap memory of
>others, on demand.  Ie. it would be nice if swap space was a
>centralized resource pool, rather than fragmented.

Do you want any fairness?  Should hostA be able to use up all the swapspace
to the exclusion of b, c, d, and e?  Should the OS provide hooks to allow
you to tune this?  How would you tune it?  What hooks do you want?
---
Larry McVoy, Sun Microsystems     (415) 336-7627       ...!sun!lm or lm@sun.com

bzs@world.std.com (Barry Shein) (10/28/90)

I have to admit it is an interesting idea. Policy needs to be designed,
obviously, but in the end what you really want is a bibop (big bag o'
pages, name stolen from the Lisp culture) shareable area; instead of
typed pages, the "types" are host IDs (perhaps IP addresses).

Couldn't NFS *almost* do this right now. You'd create this big file
and the clients would keep their own seek pointers (allocated by the
server, but otherwise stateless since each request to pagein would
include the seek pointer and perhaps the page size, or maybe size is
fixed, seek pointer becomes a funny kind of file handle.)

Sounds like most of the work is on the client side (not unusual for
these types of things.)

So basically the operations are "store this page somewhere in file X",
which would return a magic cookie used later to get that page back.

Maybe a better name would be a "cloakroom swap discipline", you
check-in your page and get your ticket to retrieve it later.
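
A rough sketch of the cloakroom idea, assuming a fixed page size and
using an in-memory io.BytesIO as a stand-in for "file X" on the server.
The class name Cloakroom, the methods check_in and reclaim, and the
4096-byte PAGE_SIZE are all hypothetical; the ticket here is simply
the seek offset of the slot.

```python
import io

PAGE_SIZE = 4096  # assumed fixed page size


class Cloakroom:
    """Check in a page, get a ticket; hand the ticket back to get the page."""

    def __init__(self):
        self.backing = io.BytesIO()  # stand-in for the big file on the server
        self.end = 0                 # offset just past the last slot ever used
        self.free_slots = []         # offsets of slots whose pages were reclaimed
        self.outstanding = set()     # tickets currently handed out

    def check_in(self, page):
        """Store one page somewhere in the file and return its ticket."""
        if len(page) != PAGE_SIZE:
            raise ValueError("page must be exactly PAGE_SIZE bytes")
        if self.free_slots:
            offset = self.free_slots.pop()   # reuse a reclaimed slot
        else:
            offset = self.end                # otherwise extend the file
            self.end += PAGE_SIZE
        self.backing.seek(offset)
        self.backing.write(page)
        self.outstanding.add(offset)
        return offset

    def reclaim(self, ticket):
        """Return the page for this ticket and free its slot for reuse."""
        self.outstanding.remove(ticket)      # KeyError for an unknown ticket
        self.backing.seek(ticket)
        page = self.backing.read(PAGE_SIZE)
        self.free_slots.append(ticket)
        return page
```

In this sketch retrieving a page also frees its slot, so the "file"
only grows when no reclaimed slot is available.  A real client would
probably want to read a page back without giving up the slot, which
would need a separate free operation.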

There is still the whole pre-allocation problem (then again this might
be a nice opportunity to splice in some long-overdue subterfuges...)
-- 
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

buck@siswat.UUCP (A. Lester Buck) (10/28/90)

In article <AGLEW.90Oct25235828@cobra.crhc.uiuc.edu> aglew@crhc.uiuc.edu (Andy Glew) writes:
>Does anyone have a shareable, networked, swap device?

Even better, how about remote virtual memory?  Check out the Usenix paper
from summer 1990 by Comer and one of his students about their
implementation of a networked virtual memory server.  They show how
their implementation is quite competitive with other forms of virtual
memory.

-- 
A. Lester Buck    buck@siswat.lonestar.org  ...!uhnix1!lobster!siswat!buck

richard@aiai.ed.ac.uk (Richard Tobin) (10/29/90)

In article <BZS.90Oct27161143@world.std.com> bzs@world.std.com (Barry Shein) writes:
>Couldn't NFS *almost* do this right now.

I considered this.  The way I looked at it was that the problem is
that there's no way for the client to tell the server when a page is
freed.  Apart from this, it could work - the server wouldn't even have
to give the client a pointer into the file, it could just map (client,
client's-offset) to (server's-offset).
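
Roughly, the server-side translation could look like the sketch
below.  OffsetMapper, server_offset and the 4096-byte PAGE_SIZE are
hypothetical names and values chosen for illustration.  Note that
nothing ever removes an entry, which is exactly the missing "page
freed" notification.

```python
PAGE_SIZE = 4096  # assumed fixed page size


class OffsetMapper:
    """Map (client, client's-offset) to (server's-offset) on first use."""

    def __init__(self):
        self.table = {}        # (client, client_offset) -> server_offset
        self.next_offset = 0   # next unused offset in the server's file

    def server_offset(self, client, client_offset):
        """Return the server offset for this client page, allocating if new."""
        key = (client, client_offset)
        if key not in self.table:
            self.table[key] = self.next_offset
            self.next_offset += PAGE_SIZE
        return self.table[key]
```

Two clients that both use their own offset 0 land at different server
offsets, and a repeated lookup returns the same place, so the client
never needs to hold a pointer into the server's file.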

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

bzs@world.std.com (Barry Shein) (10/30/90)

From: richard@aiai.ed.ac.uk (Richard Tobin) [responding to me]

>>Couldn't NFS *almost* do this right now.
>
>I considered this.  The way I looked at it was that the problem is
>that there's no way for the client to tell the server when a page is
>freed.  Apart from this, it could work - the server wouldn't even have
>to give the client a pointer into the file, it could just map (client,
>client's-offset) to (server's-offset).

Hmm, basically a distributed scatter-gather MMU device. The client
believes it has X MB of swap and the server just manages the address
mappings through typical associative memory maps.

I suppose the easiest way to free pages would be by use of a tag (the
process id on the client would be a good candidate, the server doesn't
much care so long as it's client-unique.)

Then all you need is a free_tag() operation (I assume that once a page
is allocated to a process it's not freed until the process is
finished, something more flexible can be left as an exercise for the
reader.)

So you have a three-tuple to identify any page:

	f(host_address,page_offset,tag) -> server_page_location

for each page in the page server (again, assuming pages are fixed in
size, otherwise throw in size, bother.)

Interestingly, the tag allows each process to have its own virtual
page-address space w/o the client needing to manage that at all.
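
A small sketch of the three-tuple lookup and free_tag(), assuming
fixed-size pages and an in-memory table.  Only free_tag and the
(host_address, page_offset, tag) key come from the description above;
PageServer, page_out, page_in and the slot bookkeeping are
hypothetical names invented for this example.

```python
class PageServer:
    """Toy page server keyed by (host_address, page_offset, tag)."""

    def __init__(self, total_slots):
        self.free_slots = list(range(total_slots))  # unused server page slots
        self.table = {}   # (host_address, page_offset, tag) -> slot number
        self.store = {}   # slot number -> page contents

    def page_out(self, host_address, page_offset, tag, data):
        """Store a page, allocating a server slot the first time it is seen."""
        key = (host_address, page_offset, tag)
        slot = self.table.get(key)
        if slot is None:
            if not self.free_slots:
                raise MemoryError("page server is full")
            slot = self.free_slots.pop()
            self.table[key] = slot
        self.store[slot] = data

    def page_in(self, host_address, page_offset, tag):
        """Fetch a previously stored page; KeyError if it was never stored."""
        return self.store[self.table[(host_address, page_offset, tag)]]

    def free_tag(self, host_address, tag):
        """Release every page held under this host's tag; return the count."""
        doomed = [k for k in self.table
                  if k[0] == host_address and k[2] == tag]
        for key in doomed:
            slot = self.table.pop(key)
            del self.store[slot]
            self.free_slots.append(slot)
        return len(doomed)
```

Two processes on the same host can both page out their own offset 0
under different tags without colliding, and a single free_tag() call
when a process exits returns all of its slots to the server.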

It could work. It's even stateless enough to survive server crashes,
and doesn't much interfere with the current model in a Unix client
of how swap is allocated.
-- 
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD