[comp.unix.questions] Implement a Remote Fork facility

anthony@alberta.UUCP (Anthony Mutiso) (11/02/88)

I have a feeling that someone out there has tried to
implement a _remote fork_ system call.  This is
necessary for process migration etc.

MY PROBLEM: How does one copy an active process
execution image and restart it elsewhere,
jumping to the same location the parent process is at?

REQUIREMENTS: Open file descriptors and the offsets that
were in the parent process are available in the so-called
child (parent-child relationship slightly altered).
All variables hold the same values as they did in the parent
just prior to the _remote fork_ call.
The child continues its existence from the point the parent
forked at.
== all the above are the results we have all come to love
== in the fork(2) system call.
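
For reference, a minimal sketch of the local fork(2) behaviour that a
remote fork would have to reproduce, assuming a POSIX system. The helper
name offset_after_child_read and the temporary file are made up for
illustration. The child reads through the inherited descriptor, and the
parent then sees the offset moved, because both share one open file entry.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that reads `nread` bytes through an
 * inherited descriptor, then report the offset the PARENT sees afterwards.
 * Returns -1 on any error.
 */
long offset_after_child_read(size_t nread)
{
    char path[] = "/tmp/forkdemoXXXXXX";
    char buf[16];
    int marker = 42;            /* ordinary variable, copied into the child */
    int status;
    pid_t pid;
    off_t off;
    int fd;

    assert(nread <= sizeof buf);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* the file lives on until fd is closed */

    if (write(fd, "0123456789abcdef", 16) != 16 ||
        lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: same variable values, same descriptor, same offset (0). */
        ssize_t n = read(fd, buf, nread);
        _exit(marker == 42 && n == (ssize_t)nread ? 0 : 1);
    }

    /* Parent: the child's read moved the offset we share with it. */
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }
    off = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return (long)off;
}
```

Calling offset_after_child_read(4) in the parent should report an offset
of 4, even though the parent itself never read from the file. A remote
child on another machine cannot share the offset this way without extra
machinery.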

I have looked at all sorts of things with very poor results:
copying the parent's file descriptor table (but where does that
leave me? At best I will end up with inode numbers that are
rather difficult to map back to file pathnames), and
generating a core of the running process (stopped, of course)
and finding a way to transform the core(5) to a.out(5) format
with the program entry point somewhere else. (How does one do
that?)
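
To make the descriptor problem concrete, here is a rough sketch of what a
process can learn about one of its own open descriptors through standard
POSIX calls. The struct fd_snapshot and the function snapshot_fd are
hypothetical names for illustration. Note what is missing: fstat(2) yields
a device and inode number, but nothing here gives back a pathname.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record of what can be recovered from a descriptor. */
struct fd_snapshot {
    dev_t dev;      /* device holding the file */
    ino_t ino;      /* inode number: not a pathname */
    off_t offset;   /* current offset, or -1 if not seekable */
    int   flags;    /* open(2) status flags and access mode */
};

/* Fill *out for descriptor fd.  Returns 0 on success, -1 on error. */
int snapshot_fd(int fd, struct fd_snapshot *out)
{
    struct stat st;
    int flags;
    off_t off;

    assert(out != NULL);

    if (fstat(fd, &st) < 0)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;

    /* Pipes, sockets and terminals have no offset: lseek gives ESPIPE. */
    off = lseek(fd, 0, SEEK_CUR);
    if (off == (off_t)-1 && errno != ESPIPE)
        return -1;

    out->dev = st.st_dev;
    out->ino = st.st_ino;
    out->offset = off;
    out->flags = flags;
    return 0;
}
```

A migration scheme built on this alone would still need some way to get
from (dev, ino) to a name that means something on the destination
machine, which is exactly the difficulty described above.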

I need ideas, clues, insight, and general all-round help.
If anyone has looked at this issue, please fill me in (mail).

Thanks for any hints

Anthony Mutiso            anthony@alberta.uucp
                      or  {watmath, ubc-vision}!alberta!anthony

dlm@cuuxb.ATT.COM (Dennis L. Mumaugh) (11/03/88)

In article <1777@pembina.UUCP> anthony@alberta.UUCP (Anthony Mutiso) writes:

> I have  a  feeling  that  someone  out  there  has  tried  to
> implement a _remote fork_ system call.  This is necessary for
> process migration etc.

> MY PROBLEM:  How does one copy  an  active process  execution
> image, and restart it elsewhere, jumping to the same location
> the parent process is at?

> REQUIREMENTS:  Open file  descriptors  and  the  offsets that
> were  in  the  parent  process are available in the so-called
> child  (parent-child  relationship  slightly  altered).   All
> variables hold the same values as they did in the parent just
> prior to the _remote fork_ call.  The  child  continues   its
> existence  from  the  point  the parent forked at. == all the
> above are the results we have all come  to  love  ==  in  the
> fork(2) system call.

> I have looked at all sorts of things with very poor  results.
> Copying  the  parent's file  descriptor table, but where does
> that leave me, at best I will end up with inode numbers  that
> are   rather   difficult  to  map  back  to  file  pathnames.
> Generating a core of the running process (stopped of course),
> and finding a way to transform the core(5) to a.out(5) format
> with the program entry point somewhere else. (How does one do
> that).

> I need ideas, clues, insight,  and  general  all-round  help.
> If  anyone  has  looked  at  this issue, please fill  me  in
> (mail).

Ordinarily I would respond by email BUT people haven't  heard  of
the  following,  so I will append the necessary pointers.  By the
way, it includes the scheme for mapping from file  descriptor  to
file name, etc.

[ This is refer format ].

%A D. H. Lawrie
%A J. M. Randal
%A R. R. Barton
%T Experiments with Automatic File Migration
%J COMP
%I University of Illinois
%D 1982
%P 45-55

%A David Maier
%R UIUCDCS-R-86-1240
%I Department of Computer Science, University of Illinois
%C Urbana, Illinois 61801
%A R. P. Cagel
%T Process Suspension and Resumption in The UNIX System V Operating System
%D January 1986
%K process migration
%R M.S. Thesis
%X Process suspension and resumption features were added to the UNIX
kernel.  This will allow a user to reboot the operating system without
having to kill long running processes.  The process images are
extracted from the kernel and saved in disk storage before the system
halts.  Each process may be restarted from the point where it left off
or even moved to another machine to be resumed.  This thesis describes
the kernel changes to accomplish this.

%T An unix 4.2 BSD implementation of Process suspension and resumption
%A A.Y. Chen 
%D June 86  
%R UIUCDCS-R-86-1286
%I Department of Computer Science, University of Illinois
%C Urbana, Illinois 61801
%K process migration
%R M.S. Thesis
%X Process suspension and resumption features were added to the UNIX
kernel.  This will allow a user to reboot the operating system without
having to kill long running processes.  The process images are
extracted from the kernel and saved in disk storage before the system
halts.  Each process may be restarted from the point where it left off
or even moved to another machine to be resumed.  This thesis describes
the kernel changes to accomplish this.

-- 
=Dennis L. Mumaugh
 Lisle, IL       ...!{att,lll-crg}!cuuxb!dlm  OR cuuxb!dlm@arpa.att.com

ag@elgar.UUCP (Keith Gabryelski) (11/03/88)

In article <1777@pembina.UUCP> anthony@alberta.UUCP (Anthony Mutiso) writes:
>MY PROBLEM: How does one copy an active process
>execution image and restart it elsewhere,
>jumping to the same location the parent process is at?

And on a related topic...

How does one stop a process in a way that it can be restarted after a
cold boot?  It would seem to me that restarting a core image would be
the best way.

I remember some discussion on restarting core dumps a few months back.
Does anyone have a copy of the thread?

Pax, Keith
-- 
ag@elgar.CTS.COM         Keith Gabryelski          ...!{ucsd, jack}!elgar!ag

anthony@alberta.UUCP (Anthony Mutiso) (11/06/88)

In article <2163@cuuxb.ATT.COM>, dlm@cuuxb.ATT.COM (Dennis L. Mumaugh) writes:
> In article <1777@pembina.UUCP> anthony@alberta.UUCP (Anthony Mutiso) writes:
> 
> > MY PROBLEM:  How does one copy  an  active process  execution
> > image, and restart it elsewhere, jumping to the same location
> > the parent process is at?
> 
> [ This is refer format ].
> 
> %A D. H. Lawrie
> %A J. M. Randal
> %A R. R. Barton
> %T Experiments with Automatic File Migration
> %J COMP
> %I University of Illinois
> %D 1982
> %P 45-55

(1) How would one go about converting a core image to an a.out object?

(2) All the data in the new a.out must be initialized to the values present
in the core image at the time it was made.

(3) The program entry point must be somewhere other than the main
function, so that "the process continues as if it had always existed".
Of course, some type of init function will have to reopen all of the former
process's files and wind them up to the correct locations.
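
The init function in (3) might look roughly like the sketch below,
assuming the pathnames, open flags, and offsets were somehow recorded
before the core was taken. The struct saved_file and restore_files are
hypothetical names for illustration, and only regular files are handled.
Each file is reopened, moved onto its original descriptor number with
dup2(2), and positioned with lseek(2).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record saved by the old process before it was stopped. */
struct saved_file {
    int         fd;      /* descriptor number the program expects */
    const char *path;    /* pathname, assumed valid on this machine */
    int         flags;   /* flags the file was originally opened with */
    off_t       offset;  /* file offset at the time of the snapshot */
};

/*
 * Reopen every saved file on its old descriptor number and seek to the
 * saved offset.  Returns 0 on success, -1 on the first failure.
 */
int restore_files(const struct saved_file *tab, size_t n)
{
    size_t i;

    assert(tab != NULL || n == 0);

    for (i = 0; i < n; i++) {
        /* Never create or truncate on restore: the file must exist. */
        int flags = tab[i].flags & ~(O_CREAT | O_TRUNC | O_EXCL);
        int fd = open(tab[i].path, flags);

        if (fd < 0)
            return -1;
        if (fd != tab[i].fd) {
            /* Move the new descriptor onto the number the code uses. */
            if (dup2(fd, tab[i].fd) < 0) {
                close(fd);
                return -1;
            }
            close(fd);
        }
        if (lseek(tab[i].fd, tab[i].offset, SEEK_SET) == (off_t)-1)
            return -1;
    }
    return 0;
}
```

This restores names and positions only. It does nothing about files that
were renamed, removed, or changed in the meantime, nor about pipes,
sockets, or terminals, which have no pathname to reopen.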

Hints, ideas anything.

Anthony Mutiso			anthony@alberta.UUCP

jbn@glacier.STANFORD.EDU (John B. Nagle) (11/07/88)

      It's been done.  See "The LOCUS Distributed System Architecture",
by Popek and Walker, MIT Press, 1985.  ISBN 0-262-16102-8
LOCUS is a distributed UNIX kernel developed at UCLA.  It's 4.2BSD
compatible, yet allows full distribution over a network of heterogeneous
machines.  Processes can be migrated from one similar machine to another while
running, using the migrate(II) system call.  Open files, pipes, signals, and
sockets survive migration.  Even shared file position works; the mechanism
for doing this efficiently is very clever.  A user can migrate his own tasks, 
or a background scheduler may force task migration.  Across heterogeneous 
CPUs, one can perform "exec"; an "exec" of an object program that needs a 
different kind of machine results in execution on a suitable machine elsewhere
in the network.  Very impressive.  Not clear why it never caught on.

					John Nagle

ekrell@hector.UUCP (Eduardo Krell) (11/07/88)

In article <17819@glacier.STANFORD.EDU> jbn@glacier.UUCP (John B. Nagle) writes:

(about LOCUS)
>Not clear why it never caught on.

But it did. It is licensed by IBM as part of AIX. They call it "Transparent
Computing Facility", I think.
    
Eduardo Krell                   AT&T Bell Laboratories, Murray Hill, NJ

UUCP: {att,decvax,ucbvax}!ulysses!ekrell  Internet: ekrell@ulysses.att.com

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/09/88)

In article <16@elgar.UUCP> ag@elgar.UUCP (Keith Gabryelski) writes:
>How does one stop a process in a way that it can be restarted after a
>cold boot?

You obviously can't, in general.

geoff@eagle_snax.UUCP ( R.H. coast near the top) (11/09/88)

In article <17819@glacier.STANFORD.EDU> jbn@glacier.UUCP (John B. Nagle) writes:
>
>      It's been done.  See "The LOCUS Distributed System Architecture",
>by Popek and Walker, MIT Press, 1985.  ISBN 0-262-16102-8
>LOCUS is a distributed UNIX kernel developed at UCLA.  It's 4.2BSD
>compatible, yet allows full distribution over a network of heterogeneous
>machines. [...] Very impressive.  Not clear why it never caught on.

Well, the early versions were pretty s-l-o-o-o-w at doing the
niftier things, but I think a lot of that got fixed. However
I understand that quite a number of companies entered into
licensing negotiations with Locus, including at least one which
bet the - software - future of the company on being able to get
hold of Locus and use it to compete with Apollo and Sun. Unbeknownst
to these hopefuls, IBM had funded the Locus startup, and eventually
decided to exercise their option to an exclusive license, thus causing
Locus to pull the plug on all of the other suitors. A number of the
elements of Locus are now beginning to trickle out in the form of
AIX features.

-- 
Geoff Arnold, Sun Microsystems Inc.   +------------------------------------+ 
PC Distributed Systems(home of PC-NFS)|Someone, somewhere, wants an RFC822 |
UUCP: {hplabs,decwrl...}!sun!garnold  |message from YOU.                   |
ARPA: garnold@sun.com                 +------------------------------------+

fred@oravax.UUCP (Charles Mills) (11/10/88)

In article <8831@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <16@elgar.UUCP> ag@elgar.UUCP (Keith Gabryelski) writes:
>>How does one stop a process in a way that it can be restarted after a
>>cold boot?
>
>You obviously can't, in general.

Wouldn't it be appropriate to explain why this is obvious?  It clearly wasn't
obvious, for example, to ag@elgar.UUCP (Keith Gabryelski).

Certainly there's no way of ensuring that the process's open file descriptors
can be meaningfully assigned when it's restarted, and perhaps it's this
to which you allude.  Except for that, though, I see no particular
problem in principle, though all the solutions I've heard about have
defects or failures of generality.  If you are aware of other reasons
why it can't be done, I'm sure I'm not the only person who'd be curious to
see them.
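
As an illustration of the descriptor problem, here is a small sketch,
assuming a POSIX system, that sorts open descriptors by what they refer
to. The names fd_kind, classify_fd, and count_reopenable are made up for
this example. Only regular files have any hope of being reopened by name
after a reboot; the other end of a pipe or socket is gone for good.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical classification of what a descriptor refers to. */
enum fd_kind {
    FD_CLOSED,   /* not an open descriptor */
    FD_REGULAR,  /* regular file: could be reopened by pathname */
    FD_PIPE,     /* pipe or FIFO: peer process is lost on reboot */
    FD_SOCKET,   /* socket: connection is lost on reboot */
    FD_DEVICE,   /* terminal or other device */
    FD_OTHER     /* directory or anything else */
};

enum fd_kind classify_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return FD_CLOSED;
    if (S_ISREG(st.st_mode))
        return FD_REGULAR;
    if (S_ISFIFO(st.st_mode))
        return FD_PIPE;
    if (S_ISSOCK(st.st_mode))
        return FD_SOCKET;
    if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode))
        return FD_DEVICE;
    return FD_OTHER;
}

/* Count the descriptors in fds[0..n-1] that refer to regular files. */
size_t count_reopenable(const int *fds, size_t n)
{
    size_t i, count = 0;

    assert(fds != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (classify_fd(fds[i]) == FD_REGULAR)
            count++;
    return count;
}
```

Even for the regular files, nothing guarantees that the file still exists
or has the same contents when the process is restarted.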

fred

jc@minya.UUCP (John Chambers) (11/14/88)

In article <529@oravax.UUCP>, fred@oravax.UUCP (Charles Mills) writes:
> In article <8831@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
> >In article <16@elgar.UUCP> ag@elgar.UUCP (Keith Gabryelski) writes:
> >>How does one stop a process in a way that it can be restarted after a
> >>cold boot?
> >
> >You obviously can't, in general.
> 
> Wouldn't it be appropriate to explain why this is obvious?  It clearly wasn't
> obvious, for example, to ag@elgar.UUCP (Keith Gabryelski).
> 
I suspect he was using the standard mathematical definition (of both 
"obvious" and "trivial"), to wit:
	anything I understand.















[OK, let's up the line count and re-post it ;-]
-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)

[Any errors in the above are due to failures in the logic of the keyboard,
not in the fingers that did the typing.]