[comp.os.mach] Efficient copying from task to task

roland@sics.se (Roland Karlsson) (05/17/91)

I have two tasks with the same virtual memory layout.  Now I want to
(very efficiently) copy a block from one task to another task.  The
block shall be at the same addresses in both tasks.  The block does not
have to start or end at page boundaries.  The block can consist of any
number of integers.  It can be just some bytes or several megabytes.
This copying is done very frequently during the execution of several
one-thread tasks (=processes).

Proposal one:
------------
The solution in UNIX was to map the memory to a file with mmap.  Every
UNIX process can then map the other processes' memory at other
addresses, and an ordinary block copy can be used.

    File:
    -------------
    | 0 | 1 | 2 |
    -------------

    Map for process 0:   Map for process 1:   Map for process 2:
    -------------        -------------        -------------
    | 0 | 1 | 2 |        | 1 | 2 | 0 |        | 2 | 0 | 1 |
    -------------        -------------        -------------
    x                    x                    x

I have tried to use vm_map to implement this behavior, but without
success.  Every time I use vm_map (with the memory_object
MEMORY_OBJECT_NULL) at the same offset in the memory object, it looks
like I get a new piece of memory, not an alias for the same one.  Can
it be done???  Someone mentioned the magic words "external pager", but
I do not know how to use one.
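For reference, the UNIX arrangement in the diagram can be sketched with POSIX mmap().  This is a minimal sketch assuming a POSIX system; make_shared_file, map_rotated and copy_from_peer are hypothetical helpers, not Mach calls, and a single process builds two "views" to stand in for processes 0 and 1.  Each view maps the same file in rotated chunk order, so both alias the same pages and a peer's block can be copied with a plain memcpy at any byte granularity:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose MAP_ANONYMOUS, mkstemp, ftruncate on glibc */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an unnamed temporary file of `size` bytes: the "File" above.
   Returns a file descriptor, or -1 on error. */
static int make_shared_file(size_t size)
{
    char path[] = "/tmp/rotmapXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                     /* only the descriptor is needed */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map all nchunks chunks of the file into one contiguous view, rotated
   so that chunk `self` comes first (address x in the diagram).
   Returns NULL on error. */
static unsigned char *map_rotated(int fd, size_t chunk, int nchunks, int self)
{
    assert(chunk % (size_t)sysconf(_SC_PAGESIZE) == 0);
    size_t total = chunk * (size_t)nchunks;
    /* Reserve the whole range first, then overlay each slot. */
    unsigned char *base = mmap(NULL, total, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    for (int i = 0; i < nchunks; i++) {
        off_t file_off = (off_t)((self + i) % nchunks) * (off_t)chunk;
        if (mmap(base + (size_t)i * chunk, chunk, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fd, file_off) == MAP_FAILED) {
            munmap(base, total);
            return NULL;
        }
    }
    return base;
}

/* Copy `len` bytes at offset `off` in task `peer`'s chunk to the same
   offset in our own chunk (slot 0): an ordinary block copy. */
static void copy_from_peer(unsigned char *view, size_t chunk, int nchunks,
                           int self, int peer, size_t off, size_t len)
{
    size_t slot = (size_t)((peer - self + nchunks) % nchunks);
    memcpy(view + off, view + slot * chunk + off, len);
}
```

With a page-sized chunk, a byte stored at v1[5] by "process 1" shows up at v0[chunk + 5] in "process 0", and copy_from_peer(v0, chunk, 3, 0, 1, 5, 1) brings it to v0[5].  Reserving the whole range with PROT_NONE first and overlaying the chunks with MAP_FIXED keeps each rotated view contiguous.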


Proposal two:
------------
You could ignore the above mapping method and use a combination of
vm_write (for integral pages) and copying via a buffer (for parts of
pages).  This should be straightforward, I think.  But I do not want
to copy via a buffer (that means copying twice).  I would also like
to use vm_read (which also means copying twice), since the task that
wants the memory would otherwise be idle during the copy, when the two
tasks could each copy half of the memory.
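The bookkeeping behind proposal two is splitting the block at page boundaries: the whole pages could go through a page-granular call such as vm_write, and the ragged ends through a buffer.  A minimal sketch of just that split; struct block_split and split_block are hypothetical names, and the page size is a parameter (4096 in the usage note is only an example):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three pieces of a block [addr, addr + len). */
struct block_split {
    uintptr_t head_addr; size_t head_len; /* partial first page: buffered */
    uintptr_t body_addr; size_t body_len; /* whole pages: page-granular */
    uintptr_t tail_addr; size_t tail_len; /* partial last page: buffered */
};

/* Split a block at page boundaries.  `page` must be a power of two. */
static struct block_split split_block(uintptr_t addr, size_t len, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    struct block_split s = {addr, 0, 0, 0, 0, 0};
    uintptr_t mask = ~(uintptr_t)(page - 1);
    uintptr_t end = addr + len;
    uintptr_t first = (addr + page - 1) & mask; /* round start up */
    uintptr_t last = end & mask;                /* round end down */

    if (first >= last) {        /* no whole page inside the block */
        s.head_len = len;
        return s;
    }
    s.head_len = first - addr;
    s.body_addr = first;
    s.body_len = last - first;
    s.tail_addr = last;
    s.tail_len = end - last;
    return s;
}
```

For example, split_block(100, 10000, 4096) gives a 3996-byte head, one whole page at 4096, and a 1908-byte tail at 8192; a block that contains no whole page is returned entirely as the head.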


Refinement one to proposal two:
-------------------------------
It would be very nice if I could use a "copy on write" technique to
copy the integral pages from one task to another task.  How can that
be done???

Refinement two to proposal two:
-------------------------------
Can I write/read parts of pages to/from another task's memory without
mapping???  Then I do not have to copy twice for small blocks.


Proposal three:
--------------
Something completely different.  But what???


--
Roland Karlsson
SICS, PO Box 1263, S-164 28 KISTA, SWEDEN	Internet: roland@sics.se
Tel: +46 8 752 15 40	Ttx: 812 61 54 SICS S	Fax: +46 8 751 72 30

goykhman_a@apollo.HP.COM (Alex Goykhman) (05/18/91)

In article <1991May17.074526.19128@sics.se> roland@sics.se (Roland Karlsson) writes:
>
>I have two tasks with the same virtual memory layout.  Now I want to
>(very efficiently) copy a block from one task to another task.  The
>block shall be at the same addresses in both tasks.  The block does not
>have to start or end at page boundaries.  The block can consist of any
>number of integers.  It can be just some bytes or several megabytes.
>This copying is done very frequently during the execution of several
>one-thread tasks (=processes).

    How about vm_copy(target_task, source_address, count, dest_address) ?

    Vm_copy can only transfer whole pages, though.

>
>Proposal one:
>------------
>The solution in UNIX was to map the memory to a file with mmap.  Every
>UNIX process can then map the other processes' memory at other
>addresses, and an ordinary block copy can be used.
>
>    File:
>    -------------
>    | 0 | 1 | 2 |
>    -------------
>
>    Map for process 0:   Map for process 1:   Map for process 2:
>    -------------        -------------        -------------
>    | 0 | 1 | 2 |        | 1 | 2 | 0 |        | 2 | 0 | 1 |
>    -------------        -------------        -------------
>    x                    x                    x
>
>I have tried to use vm_map to implement this behavior, but without
>success.  Every time I use vm_map (with the memory_object
>MEMORY_OBJECT_NULL) at the same offset in the memory object, it looks
>like I get a new piece of memory, not an alias for the same one.  Can
>it be done???  Someone mentioned the magic words "external pager", but
>I do not know how to use one.

    Speaking of mmap, MEMORY_OBJECT_NULL is the same as:

   #define  MAP_ANON         0x0002  /* allocated from memory, swap space */

    Since there is no underlying memory object (file), vm_map()'s behavior
    with MEMORY_OBJECT_NULL is quite understandable: each call gets fresh
    anonymous memory rather than an alias.
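The point can be checked with POSIX mmap() standing in for vm_map().  A minimal sketch, assuming a POSIX system; maps_alias is a hypothetical helper.  Two anonymous mappings (the analogue of MEMORY_OBJECT_NULL) come back as separate zero-filled memory, which is what Roland observed, while two mappings of the same file alias each other:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose MAP_ANONYMOUS, mkstemp, ftruncate on glibc */
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `size` bytes twice and report whether a write through the first
   mapping is visible through the second.  use_file == 0: both maps are
   anonymous (no object behind them); otherwise both map one temp file.
   Returns 1 (aliased), 0 (distinct memory) or -1 (error). */
static int maps_alias(int use_file, size_t size)
{
    assert(size > 0);
    int fd = -1;
    int flags = MAP_SHARED | MAP_ANONYMOUS;
    if (use_file) {
        char path[] = "/tmp/aliasXXXXXX";
        fd = mkstemp(path);
        if (fd < 0)
            return -1;
        unlink(path);
        if (ftruncate(fd, (off_t)size) != 0) {
            close(fd);
            return -1;
        }
        flags = MAP_SHARED;
    }
    unsigned char *a = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, fd, 0);
    unsigned char *b = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, fd, 0);
    int result = -1;
    if (a != MAP_FAILED && b != MAP_FAILED) {
        a[0] = 0x5a;
        result = (b[0] == 0x5a);   /* anonymous: b is still zero-filled */
    }
    if (a != MAP_FAILED)
        munmap(a, size);
    if (b != MAP_FAILED)
        munmap(b, size);
    if (fd >= 0)
        close(fd);
    return result;
}
```

For a page-sized region, maps_alias(0, page) returns 0 and maps_alias(1, page) returns 1.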
>
>Proposal two:
>------------

...
>
>Refinement one to proposal two:
>-------------------------------
>It would be very nice if I could use a "copy on write" technique to
>copy the integral pages from one task to another task.  How can that
>be done???

    For that, the source task must be the parent of the destination task.
    I.e., the source task issues a

        vm_map (..,MEMORY_OBJECT_NULL,..., VM_INHERIT_COPY),

    fills the region with data, and calls task_create() to create the
    destination task.  The latter would get a "snapshot" of the region.
    Trouble is, you'd need to issue task_create() every time you want to
    copy a block.

    Perhaps what you need is to use VM_INHERIT_SHARE instead of
    VM_INHERIT_COPY, so the tasks can share the region in some ordered fashion.
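The snapshot-versus-sharing distinction also shows up with POSIX fork(), where a MAP_PRIVATE mapping behaves roughly like VM_INHERIT_COPY and a MAP_SHARED one roughly like VM_INHERIT_SHARE.  This is an analogy under POSIX assumptions, not Mach's inheritance interface, and inherit_demo is a hypothetical helper:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stores 99 into both regions, then report what the
   parent sees afterwards.  Returns 0 on success, -1 on error. */
static int inherit_demo(int *copy_after, int *share_after)
{
    int *copy = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);  /* ~ INHERIT_COPY */
    int *share = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);  /* ~ INHERIT_SHARE */
    if (copy == MAP_FAILED || share == MAP_FAILED)
        return -1;
    *copy = 1;
    *share = 1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {            /* child: writes go to its own copy of `copy` */
        *copy = 99;
        *share = 99;
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    *copy_after = *copy;       /* still 1: the child changed only its copy */
    *share_after = *share;     /* 99: the page really is shared */
    munmap(copy, sizeof(int));
    munmap(share, sizeof(int));
    return 0;
}
```

After the child stores 99 into both regions, the parent still reads 1 from the private one and 99 from the shared one.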

>
>Refinement two to proposal two:
>-------------------------------
...
>
>Proposal three:
>--------------
>Something completely different.  But what???

    Sending data via MACH messages is another possibility.


Alex Goykhman                    speaking for myself  
Chelmsford System Software Lab   mit-eddie!apollo!goykhman_a
Hewlett-Packard, Company         goykhman_a@apollo.hp.com

francis@cs.ua.oz.au (Francis Vaughan) (05/19/91)

In article <1991May17.074526.19128@sics.se>, roland@sics.se (Roland
Karlsson) writes:
|> 
|> I have two tasks with the same virtual memory layout.  Now I want to
|> (very efficiently) copy a block from one task to another task.  The
|> block shall be at the same addresses in both tasks.  The block does not
|> have to start or end at page boundaries.  The block can consist of any
|> number of integers.  It can be just some bytes or several megabytes.
|> This copying is done very frequently during the execution of several
|> one-thread tasks (=processes).


UH? What do you mean by "thread tasks (=processes)"?  Do you really mean
that you must have two separate virtual address spaces (= task = process),
or can you actually do it with multiple threads within the one task?

What do you use for synchronisation at the moment?  I would first of all
think very hard about building your problem as a multithreaded task.

If you cannot, you have a problem, since all virtual memory is handled
on page boundaries.  Unless you have some pressing problem, however,
this should not be too bad: just keep your other data out of those pages.
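Keeping other data out of the pages amounts to giving the block pages of its own.  A minimal sketch, assuming POSIX posix_memalign() and sysconf(); alloc_page_aligned and is_page_aligned are hypothetical helpers.  The block starts on a page boundary and its length is rounded up to whole pages, so nothing unrelated can share a page with it:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose posix_memalign on glibc */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocate at least `len` bytes starting on a page boundary and covering
   whole pages.  The rounded size is stored in *alloc_len.  Free with
   free().  Returns NULL on error. */
static void *alloc_page_aligned(size_t len, size_t *alloc_len)
{
    assert(alloc_len != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t rounded = (len + page - 1) / page * page;
    if (rounded == 0)
        rounded = page;
    void *p = NULL;
    if (posix_memalign(&p, page, rounded) != 0)
        return NULL;
    memset(p, 0, rounded);   /* the slack at the end is ours alone */
    *alloc_len = rounded;
    return p;
}

/* True if `p` lies on a page boundary. */
static int is_page_aligned(const void *p)
{
    return (uintptr_t)p % (uintptr_t)sysconf(_SC_PAGESIZE) == 0;
}
```

A 100-byte request gets one full page; a request of one page plus one byte gets two.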


|> Proposal one:
|> ------------
|> The solution in UNIX was to map the memory to a file with mmap.  Every
|> UNIX process can then map the other processes' memory at other
|> addresses, and an ordinary block copy can be used.
|> 
|> I have tried to use vm_map to implement this behavior, but without
|> success.  Every time I use vm_map (with the memory_object
|> MEMORY_OBJECT_NULL) at the same offset in the memory object, it looks
|> like I get a new piece of memory, not an alias for the same one.  Can
|> it be done???  Someone mentioned the magic words "external pager", but
|> I do not know how to use one.

vm_map is not the correct call.  You need map_fd(); this will get you
a pretty good approximation to what you are used to (I think).
To use vm_map you would need to provide your own external pager.  However,
map_fd does what you need with the inode pager (which is actually an
external pager).  You only need an external pager if you want something
really fancy in the way of virtual memory behavior.

Be aware of a bug in Mach: map_fd() is a read-only map.  Alterations in
the memory object are not reflected in the file.  The only way to get
the alterations back is to write() them.  If you write the page back to
the mapped file with write() and the page is not resident in physical
memory, the kernel will crash (a lock conflict on the inode, I believe).
You need to touch the page first to be sure.
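The workaround can be sketched as follows, assuming a POSIX system; touch_pages and write_back are hypothetical helpers.  Reading a byte from each page faults it in just before write() is called.  Whether that is sufficient depends on the kernel, since nothing stops a page from being paged out again before the copy:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read one byte from every page of [buf, buf + len) so that each page
   is faulted in before the kernel is asked to copy from it. */
static void touch_pages(const unsigned char *buf, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile unsigned char sink = 0;
    for (size_t off = 0; off < len; off += page)
        sink ^= buf[off];
    if (len > 0)
        sink ^= buf[len - 1];   /* the last, possibly partial, page */
    (void)sink;
}

/* Write a mapped region back with write(), touching its pages first.
   Returns 0 if all `len` bytes were written, -1 otherwise. */
static int write_back(int fd, const unsigned char *buf, size_t len)
{
    touch_pages(buf, len);
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}
```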


If anyone really needs to write an external pager, I have a useful little
example external pager that makes a good stub for writing real ones.


|> Proposal two:
|> ------------
|> You could ignore the above mapping method and use a combination of
|> vm_write (for integral pages) and copying via a buffer (for parts of
|> pages).  This should be straightforward, I think.  But I do not want
|> to copy via a buffer (that means copying twice).  I would also like
|> to use vm_read (which also means copying twice), since the task that
|> wants the memory would otherwise be idle during the copy, when the two
|> tasks could each copy half of the memory.

Just use IPC; at least then you only need to do the copy once.  It may
still be sub-optimal.  The kernel handles all the messing about with
copying and remapping for IPC transfers.  However, it won't put
out-of-line data where you want it (which is a pity), so you need to
copy it out of the IPC buffer.  Have a look at MIG too: neat, and
relatively easy to use once you are used to it.


|> Refinement one to proposal two:
|> -------------------------------
|> It would be very nice if I could use a "copy on write" technique to
|> copy the integral pages from one task to another task.  How can that
|> be done???

Do you really mean copy-on-write?  Your previous description conflicts
with this (sort of).

To do what I think you mean, just set the memory you want to share to
VM_INHERIT_SHARE (or VM_INHERIT_COPY) with the vm_inherit() call
(page-aligned memory again) and fork() the sharing tasks.  However,
copy-on-write means you get a copy the first time you write, and then
that's it.  I don't think that's what you meant.  SHARE means that it
really is shared, with no synchronisation.  You need to build your own.
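Building your own synchronisation can be as small as a flag in the shared page.  A minimal sketch using C11 atomics with POSIX fork() and mmap() rather than Mach calls (struct mailbox and handoff_demo are hypothetical), assuming atomic_int is lock-free so that it works between processes:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The shared region: a payload plus a flag saying it is complete. */
struct mailbox {
    atomic_int ready;   /* 0 = empty, 1 = payload published */
    int payload[64];
};

/* The child fills the payload, then publishes it with a release store;
   the parent waits with acquire loads before reading it.  Returns the
   sum of the payload as seen by the parent, or -1 on error. */
static long handoff_demo(void)
{
    struct mailbox *mb = mmap(NULL, sizeof *mb, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mb == MAP_FAILED)
        return -1;
    atomic_init(&mb->ready, 0);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mb, sizeof *mb);
        return -1;
    }
    if (pid == 0) {                                    /* producer */
        for (int i = 0; i < 64; i++)
            mb->payload[i] = i;
        atomic_store_explicit(&mb->ready, 1, memory_order_release);
        _exit(0);
    }
    while (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0)
        sched_yield();                                 /* consumer waits */
    long sum = 0;
    for (int i = 0; i < 64; i++)
        sum += mb->payload[i];
    waitpid(pid, NULL, 0);
    munmap(mb, sizeof *mb);
    return sum;
}
```

The release/acquire pair guarantees that once the parent sees ready == 1 it also sees the whole payload, so it reads 0 + 1 + ... + 63 = 2016.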


|> Refinement two to proposal two:
|> -------------------------------
|> Can I write/read parts of pages to/from another task's memory without
|> mapping???  Then I do not have to copy twice for small blocks.

Again maybe use IPC. 


|> Proposal three:
|> --------------
|> Something completely different.  But what???

Multi-thread the problem.  Use cthreads and its locking primitives.

If you are building data in place and then making it available once
it is ready, you will never get away without copying it at least once.

When it comes down to it, most of the above are just different variants
on using IPC to shuffle memory around.

There is not quite enough information about your problem to really
give definitive advice.

Hope these ramblings are of some use.

						Francis Vaughan.

veron@ecrc.de (Andre Veron) (05/21/91)

Copying data from task to task, with the data mapped at the same
virtual address in the two tasks, is a problem we encountered when
trying to design a parallel Prolog-like system.

We wanted a mechanism for dynamically creating trees of processes where:
1/ Every process which has forked off children is suspended until its
   children terminate.
2/ The children are created with a copy of a part of the parent's data
   space.
3/ The size of the data to be copied is not fixed and can easily be
   (tens of) megabytes.
4/ The data must be mapped at the same virtual address in both the
   parent and the children.
5/ Not all the data copied into a child's space is read or written
   by the child.  Locality in the accesses or updates is expected.
6/ The processes in the tree do not need file descriptors, I/O channels
   or communication ports.

Because of 3/ and 5/ the eager copying of UNIX fork put UNIX out of the game.
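For comparison, requirements 1/, 2/ and 4/ map directly onto POSIX fork() and waitpid(); whether 3/ and 5/ are met depends on whether the system's fork() copies lazily.  A minimal sketch of one level of the tree, where fork_tree_level and region are hypothetical names:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the data each child gets a copy of.  After fork() it sits
   at the same virtual address in the parent and every child (4/). */
static int region[1 << 16];

/* Fork `n` children.  Child i writes to its own copy of the region (2/)
   and exits with 0 if it sees only its own write, not its siblings'.
   The parent is suspended in waitpid() until all children finish (1/).
   Returns how many children saw only their own write, or -1 on error. */
static int fork_tree_level(int n)
{
    assert(n > 0 && n <= 100);
    pid_t pids[100];
    for (int i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] < 0)
            return -1;
        if (pids[i] == 0) {                 /* child i */
            region[i] = i + 1;
            int seen = 0;
            for (int j = 0; j < n; j++)
                seen += (region[j] != 0);
            _exit(seen == 1 ? 0 : 1);
        }
    }
    int ok = 0;
    for (int i = 0; i < n; i++) {           /* parent waits for all */
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    for (int j = 0; j < n; j++)
        if (region[j] != 0)
            return -1;                      /* parent's copy must be untouched */
    return ok;
}
```

fork_tree_level(4) returns 4: each child saw only its own write, and the parent's copy is unchanged.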

We then put some hope in the lazy copying of MACH.  The problem then appeared
to be that MACH, like UNIX, is not designed for applications which
need intensive forking of "threads" of computation that have their own
separate address spaces.  The task in MACH is a coarse-grain entity which
has a whole bunch of facilities, like files and communication ports,
which are not always needed but are always there.  A task is consequently
costly to create, to schedule and to terminate.

What we ended up with is a proposal for a new kind of thread/process
which we believe is implementable in an operating system and fits our
needs better.  These threads/processes execute in the same address space
except in some precise regions which they personally own and which are
copied (lazily) from the parent at creation time.  The virtual space
then appears to be "locally layered":

                             ------------- : Owned by Thread1
                             ------------- : Owned by Thread2
                             -------------
     |-----Shared space----| Private space |--------Shared space -------|
                             ------------- : ....
                             ------------- : ....
                             ------------- : Owned by ThreadN...
Within a quantum of time allocated
to a global "task" (not a MACH task any more), context switching between
these threads/processes is cheap: the cost is that of unmapping
the pages of one thread/process and mapping those of the next one.
Since the concept of a private region is hardwired into the paradigm,
all virtual memory handling can be done at forking/context-switching time,
when the system is in kernel mode.  No additional, inelegant system calls
are needed to set up the execution environment of a newly created/scheduled
thread/process.  Moreover, the resources (physical pages) allocated to a
terminated thread can be kept by the "task", ready to be allocated to the
next created thread/process.

If this kind of thread/process is of interest for something other than
a parallel Prolog system, we would like to hear from the people interested
in them.  We could then put some pressure on operating system designers  :->..


Andre Veron
ECRC GmbH (European Computer Research Centre)
Arabellastrasse 17
D-8000 Munich 81 FRG
email: veron@ecrc.de

francis@cs.ua.oz.au (Francis Vaughan) (05/21/91)

In article <1991May20.185351.17114@ecrc.de>, veron@ecrc.de (Andre Veron)
writes:
|> 

|> We then put some hope in the lazy copying of MACH.  The problem then appeared
|> to be that MACH, like UNIX, is not designed for applications which
|> need intensive forking of "threads" of computation that have their own
|> separate address spaces.  The task in MACH is a coarse-grain entity which
|> has a whole bunch of facilities, like files and communication ports,
|> which are not always needed but are always there.  A task is consequently
|> costly to create, to schedule and to terminate.

It is probably unreasonable to saddle Mach with the blame for the
cost of fork etc. It is a bit sad that the call you really need
(task_create()) is not implemented in 2.5 and derived systems.
However this is more a reflection of Mach as a development/research
system than anything else.

Many of the overheads you accuse tasks of are not necessary for the
task, but rather part of the stuff added to make a task look like a
Unix process.

........

|> 
|> What we ended up with is a proposal for a new kind of thread/process
|> which we believe is implementable in an operating system and fits our
|> needs better.  These threads/processes execute in the same address space
|> except in some precise regions which they personally own and which are
|> copied (lazily) from the parent at creation time.  The virtual space
|> then appears to be "locally layered":

I guess a lot of us have wished for just a little local address
protection for a thread within a task. Your suggestion has merit.
Intuitively the cost would be somewhere between raw threads in the
same address space and full tasks. See later.


|> Within a quantum of time allocated
|> to a global "task" (not a MACH task any more), context switching between
|> these threads/processes is cheap: the cost is that of unmapping
|> the pages of one thread/process and mapping those of the next one.
|> Since the concept of a private region is hardwired into the paradigm,
|> all virtual memory handling can be done at forking/context-switching time,
|> when the system is in kernel mode.

|> No additional, inelegant system calls
|> are needed to set up the execution environment of a newly created/scheduled
|> thread/process.

Well, no more than there are already.  Someone must define those
memory areas and the appropriate attributes.  Not a lot different from
vm_inherit().

A few thoughts come to mind. 

The whole thing could be more powerful than you suggest.

The cost of conventional context switching is not high in terms of
work directly done to bring the context switch about. Rather it is
the invalidation of caches and PTLBs that cause pain as the new
process gets started again.  Your suggestion would actually involve
a lot more code in the context switch than is currently needed, but
you would hope to gain with little or no PTLB and cache wrecking.  

Luckily, on multis there would be no need for PTLB shootdown (Mach's
current answer to brain-dead MMUs that don't have coherent PTLBs),
as the changes to the memory map are per thread and hence
cannot be of consequence to other processors.

Your proposal would be very heavily dependent upon the underlying
architecture for efficiency.  You would need an MMU capable of
selective PTLB invalidations, otherwise you would need to kill the
whole PTLB which would make the context switch just as expensive as
a full task switch. Most MMUs with PTLBs have this facility.  

The same goes for data cache. If it caches physical addresses there
would be little problem. If it was a virtual address cache life
would be very much harder. You could never invalidate all the
appropriate entries in anything like the time that a full cache
refill would take, so a complete invalidation would be the cheapest
way out. Again no gain over conventional tasks.


Life might get unbelievably interesting if you wanted to use a
machine with an inverted page table and still keep your caches
alive. (However no experience with these so I won't opine further.)

|> Moreover, the resources (physical pages) allocated to a terminated
|> thread can be kept by the "task", ready to be allocated to the
|> next created thread/process.

I think you would get into an argument with the kernel over who gets
first pick of the physical pages. Nobody gets allocated physical
pages like this, you get the use of them for as long as you have a
good claim, and often not even that long. You most certainly never
get the option of hanging on for a rainy day. It's simply not part
of the abstraction.



Personally I would use such a facility, and I suspect many others
could make use of it too.  However, I don't believe it is a goer,
because a lot of existing hardware would make it uneconomical.  It
would never work on any current Suns, for instance.  Other
architectures would be fine; the Multimax would be no problem.  As
virtually addressed caches die out, it may well catch on.


More lies and drivel from.....
					Francis Vaughan

goykhman_a@apollo.HP.COM (Alex Goykhman) (05/22/91)

In article <1991May20.185351.17114@ecrc.de> veron@ecrc.de (Andre Veron) writes:
>
>Copying data from task to task, with the data mapped at the same
>virtual address in the two tasks, is a problem we encountered when
>trying to design a parallel Prolog-like system.
>
>We wanted a mechanism for dynamically creating trees of processes where:
>1/ Every process which has forked off children is suspended until its
>   children terminate.
>2/ The children are created with a copy of a part of the parent's data
>   space.
>3/ The size of the data to be copied is not fixed and can easily be
>   (tens of) megabytes.
>4/ The data must be mapped at the same virtual address in both the
>   parent and the children.
>5/ Not all the data copied into a child's space is read or written
>   by the child.  Locality in the accesses or updates is expected.
>6/ The processes in the tree do not need file descriptors, I/O channels
>   or communication ports.
>
>Because of 3/ and 5/ the eager copying of UNIX fork put UNIX out of the game.
>
>We then put some hope in the lazy copying of MACH.  The problem then appeared
>to be that MACH, like UNIX, is not designed for applications which
>need intensive forking of "threads" of computation that have their own
>separate address spaces. ....................................................

    Why do you think so?


>....................... The task in MACH is a coarse-grain entity which
>has a whole bunch of facilities, like files and communication ports,
>which are not always needed but are always there.  A task is consequently
>costly to create, to schedule and to terminate.

    Files???  Contextwise, a MACH task is (mostly) a collection of VM regions. 
    Considering the way MACH manages memory (page aliasing), creating a new
    task/context should be relatively cheap.
>
>What we ended up with is a proposal for a new kind of thread/process
>which we believe is implementable in an operating system and fits our
>needs better.  These threads/processes execute in the same address space
>except in some precise regions which they personally own and which are
>copied (lazily) from the parent at creation time.  The virtual space
>then appears to be "locally layered":
>
>                             ------------- : Owned by Thread1
>                             ------------- : Owned by Thread2
>                             -------------
>     |-----Shared space----| Private space |--------Shared space -------|
>                             ------------- : ....
>                             ------------- : ....
>                             ------------- : Owned by ThreadN...

    I am not sure what you mean by "private", since you also indicate that a 
    "private" region will be read-shared with the parent task.

    What you are really looking for is a VM_INHERIT_MOVE inheritance attribute
    in addition to VM_INHERIT_SHARE, VM_INHERIT_COPY and VM_INHERIT_NONE, already
    provided by MACH.  While it would be nice to have one supported by MACH,
    you should be able to achieve similar results with VM_INHERIT_COPY and
    vm_deallocate(parent_task).

>Within a quantum of time allocated
>to a global "task" (not a MACH task any more), context switching between
>these threads/processes is cheap: the cost is that of unmapping
>the pages of one thread/process and mapping those of the next one.
>Since the concept of a private region is hardwired into the paradigm,
>all virtual memory handling can be done at forking/context-switching time,
>when the system is in kernel mode.  No additional, inelegant system calls
>are needed to set up the execution environment of a newly created/scheduled
>thread/process.  Moreover, the resources (physical pages) allocated to a
>terminated thread can be kept by the "task", ready to be allocated to the
>next created thread/process.

    Looks like you just reinvented the MACH :)
>
>If this kind of thread/process is of interest for something other than
>a parallel Prolog system, we would like to hear from the people interested
>in them.  We could then put some pressure on operating system designers  :->..

veron@ecrc.de (Andre Veron) (06/03/91)

In article <51b7861a.20b6d@apollo.HP.COM>, goykhman_a@apollo.HP.COM
(Alex Goykhman) writes:
|>>Because of 3/ and 5/ the eager copying of UNIX fork put UNIX out of the game.
|>>
|>>We then put some hope in the lazy copying of MACH.  The problem then appeared
|>>to be that MACH, like UNIX, is not designed for applications which
|>>need intensive forking of "threads" of computation that have their own
|>>separate address spaces. ....................................................
|>
|>    Why do you think so?
|>
|>
|>>....................... The task in MACH is a coarse-grain entity which
|>>has a whole bunch of facilities, like files and communication ports,
|>>which are not always needed but are always there.  A task is consequently
|>>costly to create, to schedule and to terminate.
|>
|>    Files???  Contextwise, a MACH task is (mostly) a collection of VM regions.
|>    Considering the way MACH manages memory (page aliasing), creating a new
|>    task/context should be relatively cheap.
|>>


It is claimed that Mach is able to fork off tasks in constant time
(not counting the cost of future page faults due to copy-on-write
copying).  It simply cannot be true.

All the pages inherited by the children which have the copy-on-write property
have to be set to read-only before forking in order to trigger
the copy-on-write.  This implies a scan of the region to modify the pages'
properties and to invalidate the corresponding TLB entries for these pages.
This has a LINEAR cost.
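The cost argument can be made concrete with a toy model (not Mach's pmap code; struct toy_pte and cow_protect are hypothetical): represent the inherited region as one entry per page and count the entries that must be write-protected at fork time.  The loop visits every page, so its cost grows with the size of the region, even though pages that are already read-only need no change:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page-table entry: only the writable bit matters here. */
struct toy_pte {
    bool writable;
};

/* Model of the fork-time work for a copy-on-write region: make every
   writable page read-only (its TLB entry would also need invalidating).
   Returns the number of entries actually changed. */
static size_t cow_protect(struct toy_pte *pt, size_t npages)
{
    size_t changed = 0;
    for (size_t i = 0; i < npages; i++) {   /* the scan is O(npages) */
        if (pt[i].writable) {
            pt[i].writable = false;
            changed++;
        }
    }
    return changed;
}
```

For 256 freshly writable pages the first call changes 256 entries and an immediate second call changes none, though it still scans all 256.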

When it is claimed that the cost is constant, I do not conclude
that the designers are liars, but that this linear cost is simply hidden
by some other CONSTANT costs which are usually much bigger.

Since all I am interested in when forking a task is the virtual memory
handling, I conclude that MACH does not fulfill my needs.

|>>                             ------------- : Owned by Thread1
|>>                             ------------- : Owned by Thread2
|>>                             -------------
|>>     |-----Shared space----| Private space |--------Shared space -------|
|>>                             ------------- : ....
|>>                             ------------- : ....
|>>                             ------------- : Owned by ThreadN...
|>
|>    I am not sure what you mean by "private", since you also indicate that a 
|>    "private" region will be read-shared with the parent task.
|>

In the scheme I want, parent threads/processes are suspended until their
offspring terminate.  Private means that sibling threads/processes
do not see each other's private regions.

|>    What you are really looking for is a VM_INHERIT_MOVE inheritance attribute
|>    in addition to VM_INHERIT_SHARE, VM_INHERIT_COPY and VM_INHERIT_NONE, already
|>    provided by MACH.  While it would be nice to have one supported by MACH,
|>    you should be able to achieve similar results with VM_INHERIT_COPY and
|>    vm_deallocate(parent_task).
|>
|>>Within a quantum of time allocated
|>>to a global "task" (not a MACH task any more), context switching between
|>>these threads/processes is cheap: the cost is that of unmapping
|>>the pages of one thread/process and mapping those of the next one.
|>>Since the concept of a private region is hardwired into the paradigm,
|>>all virtual memory handling can be done at forking/context-switching time,
|>>when the system is in kernel mode.  No additional, inelegant system calls
|>>are needed to set up the execution environment of a newly created/scheduled
|>>thread/process.  Moreover, the resources (physical pages) allocated to a
|>>terminated thread can be kept by the "task", ready to be allocated to the
|>>next created thread/process.
|>
|>    Looks like you just reinvented the MACH :)


You are hard on me!! :->.
If such memory handling is mimicked in MACH with tasks and VM_INHERIT_COPY,
it does not work, because tasks are considered to be independent entities.
Hence during a task context switch all the pages of the scheduled-out task
are removed from the TLB (OK, it may not be that bad if your MMU has context
information in its TLB lines, but still....).

My point is that since I know these threads/processes are tightly
connected, I do not want to remove any TLB entries other than those
of the private regions.

You can't do that in MACH.

BTW: I would need a forking time of less than 1 ms with a private region
(or VM_INHERIT_COPY region) as big as possible.

goykhman_a@apollo.hp.com (Alex Goykhman) (06/07/91)

In article <1991Jun3.085647.25107@ecrc.de> veron@ecrc.de (Andre Veron) writes:
>In article <51b7861a.20b6d@apollo.HP.COM>, goykhman_a@apollo.HP.COM
>(Alex Goykhman) writes:
>|>>Because of 3/ and 5/ the eager copying of UNIX fork put UNIX out of the game.
>|>>
>|>>We then put some hope in the lazy copying of MACH.  The problem then appeared
>|>>to be that MACH, like UNIX, is not designed for applications which
>|>>need intensive forking of "threads" of computation that have their own
>|>>separate address spaces. ....................................................
>|>
>|>    Why do you think so?
>|>
>|>
>|>>....................... The task in MACH is a coarse-grain entity which
>|>>has a whole bunch of facilities, like files and communication ports,
>|>>which are not always needed but are always there.  A task is consequently
>|>>costly to create, to schedule and to terminate.
>|>
>|>    Files???  Contextwise, a MACH task is (mostly) a collection of VM regions.
>|>    Considering the way MACH manages memory (page aliasing), creating a new
>|>    task/context should be relatively cheap.
>|>>
>
>
>It is claimed that Mach is able to fork off tasks in constant time
>(not counting the cost of future page faults due to copy-on-write
>copying).  It simply cannot be true.
>
>All the pages inherited by the children which have the copy-on-write property
>have to be set to read-only before forking in order to trigger
>the copy-on-write.  This implies a scan of the region to modify the pages'
>properties and to invalidate the corresponding TLB entries for these pages.
>This has a LINEAR cost.

    Mach's copy-on-write property is based on the hardware's ability to
    write-protect a physical page, and that has to be done only once
    (and not necessarily during a context switch), regardless of how many
    forks are issued involving the page.

    If you really need to invalidate TLBs for VM_INHERIT_COPY, you can't blame
    that on Mach but rather on the MMU that you are using.  What you really
    need is a "global" bit associated with every TLB entry and set to '1' only
    for VM_INHERIT_SHARE and VM_INHERIT_COPY pages.  This way, the "private"
    TLB entries could be cheaply purged during a context switch.
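A toy model of that selective purge, as a sketch only: real TLBs are maintained by the hardware and the machine-dependent layer, and struct tlb_entry and purge_private are hypothetical.  Each entry carries the global bit, and a switch between related tasks invalidates only the entries without it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TLB_SIZE 8

/* Toy TLB entry with a "global" bit. */
struct tlb_entry {
    bool valid;
    bool global;          /* set for pages common to all related tasks */
    unsigned long vpage;  /* virtual page number */
};

/* Context switch between related tasks: invalidate only the non-global
   ("private") entries.  Returns how many entries were purged. */
static size_t purge_private(struct tlb_entry tlb[TLB_SIZE])
{
    size_t purged = 0;
    for (size_t i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && !tlb[i].global) {
            tlb[i].valid = false;
            purged++;
        }
    }
    return purged;
}
```

With two global and two private entries loaded, purge_private() drops the two private ones and the global translations survive the switch.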
>
>When it is claimed that the cost is constant, I do not conclude
>that the designers are liars, but that this linear cost is simply hidden
>by some other CONSTANT costs which are usually much bigger.
>
>Since all I am interested in when forking a task is the virtual memory
>handling, I conclude that MACH does not fulfill my needs.
>
>|>>                             ------------- : Owned by Thread1
>|>>                             ------------- : Owned by Thread2
>|>>                             -------------
>|>>     |-----Shared space----| Private space |--------Shared space -------|
>|>>                             ------------- : ....
>|>>                             ------------- : ....
>|>>                             ------------- : Owned by ThreadN...
>|>
>|>    I am not sure what you mean by "private", since you also indicate that a 
>|>    "private" region will be read-shared with the parent task.
>|>
>
>In the scheme I want, parent threads/processes are suspended until their
>offspring terminate.  Private means that sibling threads/processes
>do not see each other's private regions.

    If the parent always gets suspended till a child terminates, then only one
    process (the "youngest" child) could be running at any given time.  ???
>
>|>    What you are really looking for is a VM_INHERIT_MOVE inheritance attribute
>|>    in addition to VM_INHERIT_SHARE, VM_INHERIT_COPY and VM_INHERIT_NONE, already
>|>    provided by MACH.  While it would be nice to have one supported by MACH,
>|>    you should be able to achieve similar results with VM_INHERIT_COPY and
>|>    vm_deallocate(parent_task).
>|>
>|>>Within a quantum of time allocated
>|>>to a global "task" (not a MACH task any more), context switching between
>|>>these threads/processes is cheap: the cost is that of unmapping
>|>>the pages of one thread/process and mapping those of the next one.
>|>>Since the concept of a private region is hardwired into the paradigm,
>|>>all virtual memory handling can be done at forking/context-switching time,
>|>>when the system is in kernel mode.  No additional, inelegant system calls
>|>>are needed to set up the execution environment of a newly created/scheduled
>|>>thread/process.  Moreover, the resources (physical pages) allocated to a
>|>>terminated thread can be kept by the "task", ready to be allocated to the
>|>>next created thread/process.
>|>
>|>    Looks like you just reinvented the MACH :)
>
>
>You are hard on me!! :->.
>If such memory handling is mimicked in MACH with tasks and VM_INHERIT_COPY,
>it does not work, because tasks are considered to be independent entities.
>Hence during a task context switch all the pages of the scheduled-out task
>are removed from the TLB (OK, it may not be that bad if your MMU has context
>information in its TLB lines, but still....).
>
>My point is that since I know these threads/processes are tightly
>connected, I do not want to remove any TLB entries other than those
>of the private regions.
>
>You can't do that in MACH.

    Mach cannot prevent you from selectively purging TLB entries, because
    it knows nothing about the TLBs.  Maintaining them is the job of the
    machine-dependent (pmap) code, which you can always modify to suit your
    needs, assuming that the underlying MMU hardware lets you do it.

>
>BTW: I would need a forking time of less than 1 ms with a private region
>(or VM_INHERIT_COPY region) as big as possible.
>

Rick.Rashid@cs.cmu.edu (06/07/91)

I replied to Andre in more detail since he seemed to be
unfamiliar with some of the details of Mach.  Still, I
think it is worth noting to the group as a whole that
the cost of task_create for a task with 1.6MB of memory
(including a 1MB array stored at program startup time)
is 1ms on a DECstation 3100 under Mach 3.0 (with all
memory inherited copy-on-write).
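A rough way to take a comparable measurement on a present-day POSIX system is to use fork() as a stand-in for task_create(), which also includes the Unix process machinery and the child's exit.  This is a sketch, not the benchmark quoted above; time_fork_ms is hypothetical, and the numbers will depend heavily on the machine and kernel:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose clock_gettime on glibc */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fill a `bytes`-sized array, then time `rounds` iterations of
   fork() + child exit + waitpid(), with the array inherited
   copy-on-write.  Returns the mean milliseconds per round, or -1.0. */
static double time_fork_ms(size_t bytes, int rounds)
{
    assert(bytes > 0 && rounds > 0);
    char *array = malloc(bytes);
    if (array == NULL)
        return -1.0;
    memset(array, 1, bytes);              /* data stored before forking */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rounds; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            free(array);
            return -1.0;
        }
        if (pid == 0)                     /* child only reads the array */
            _exit(array[bytes - 1] == 1 ? 0 : 1);
        if (waitpid(pid, NULL, 0) != pid) {
            free(array);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(array);
    double ms = (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
                (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
    return ms / rounds;
}
```

For a 1 MB array, time_fork_ms((size_t)1 << 20, 100) reports the average cost of one round; since the child never writes, no pages are actually copied.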