[comp.unix.wizards] "Check pointing" on BSD Unix

moss@takahe.cs.umass.edu (Eliot &) (11/07/89)

I would think it would work just to type the quit character (normally ^\) to
force the process to dump core. That's the *easy* part. The *hard* part is
re-establishing the process's open files, network connections, children, etc.
Unless the process does not deal with files, I would not expect restarting
from a core dump to work.					Eliot
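
For illustration only, here is a minimal sketch of the "easy part": sending
a process the signal that the quit character generates (SIGQUIT) and checking
how it died. The function name quit_child is made up for this example, and
whether a core file is actually written depends on the core size limit and
system configuration, so the sketch only reports the flag.

```python
import os
import signal


def quit_child():
    """Fork an idle child, send it SIGQUIT, and report how it terminated."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: make sure SIGQUIT has its default action (terminate, and
        # dump core if limits allow), tell the parent we are ready, then idle.
        os.close(ready_r)
        signal.signal(signal.SIGQUIT, signal.SIG_DFL)
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGQUIT})
        os.write(ready_w, b"x")
        while True:
            signal.pause()
    # Parent: wait until the child is ready, then do what typing the
    # quit character at the child's terminal would do.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)
    os.kill(pid, signal.SIGQUIT)
    _, status = os.waitpid(pid, 0)
    return {
        "signaled": os.WIFSIGNALED(status),
        "signal": os.WTERMSIG(status),
        "core_dumped": os.WCOREDUMP(status),
    }


if __name__ == "__main__":
    print(quit_child())
```

Note that nothing here helps with the hard part: the core image says nothing
useful about how to get the open files and connections back.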
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206; Moss@cs.umass.edu

smw@maxwell.Concordia.CA ( Steven Winikoff ) (11/07/89)

In article <1326@utkcs2.cs.utk.edu> battle@alphard.cs.utk.edu (David Battle) writes:
>Does anyone out there know of any way to "check point" a process
>under BSD unix?  In particular I am interested in a way to do it on
>an Ultrix VAX and possibly on a DECstation 3100.
>
>What I would like to be able to do would be to stop a running process,
>save its core image to a file (similar to what gcore(1) does), kill
>the original process, and then later (like, say, after rebooting)
>restart the process from the saved core image.

I'm interested too!  Please post if you have any information that may 
help.  

On a related topic, is it necessary to enter single-user mode to do
full backups using cpio?
------------------------------------------------------------------------
Steven Winikoff                                 smw@maxwell.concordia.ca
Software Analyst
Concordia University Computer Centre            voice: (514) 848-7619
Montreal, Quebec, Canada                               (10:00-18:00 EST)  

pcg@aber-cs.UUCP (Piercarlo Grandi) (11/11/89)

In article <1326@utkcs2.cs.utk.edu> battle@alphard.cs.utk.edu (David Battle) writes:
    Does anyone out there know of any way to "check point" a process
    under BSD unix?  In particular I am interested in a way to do it on
    an Ultrix VAX and possibly on a DECstation 3100.
    
    What I would like to be able to do would be to stop a running process,
    save its core image to a file (similar to what gcore(1) does), kill
    the original process, and then later (like, say, after rebooting)
    restart the process from the saved core image.

You have two choices. One is the famous undump program that comes with TeX,
which is used to produce restartable, preloaded images of programs that take
a long time to initialize. What you get with undump is the ability to restart
a program with memory as it was when it was stopped. Every other resource
must be reinitialized by the program itself (e.g., files must be reopened and
repositioned).
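
As a rough sketch of what "reopened and repositioned" means for a plain file:
the program records the path and the current offset before it stops, and
after the restart opens the path again and seeks back. The names
save_file_state and restore_file_state are invented for this example; it
assumes the file still exists under the same path and has not changed in
between.

```python
import os
import tempfile


def save_file_state(fd, path):
    """Record what is needed to reopen fd later: its path and offset."""
    return {"path": path, "offset": os.lseek(fd, 0, os.SEEK_CUR)}


def restore_file_state(state, flags=os.O_RDONLY):
    """Reopen the file and seek back to the recorded offset."""
    fd = os.open(state["path"], flags)
    os.lseek(fd, state["offset"], os.SEEK_SET)
    return fd


def demo():
    """Read part of a file, 'lose' the descriptor, restore it, read the rest."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello world")
        path = f.name
    try:
        fd = os.open(path, os.O_RDONLY)
        os.read(fd, 6)                     # consume "hello "
        state = save_file_state(fd, path)
        os.close(fd)                       # the descriptor is gone, as after a restart
        fd = restore_file_state(state)
        rest = os.read(fd, 100)            # continue where we left off
        os.close(fd)
        return rest
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

The saved memory image cannot do this on its own, since the descriptor table
and file offsets live in the kernel, not in the process's address space.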

The other, more complete approach is described in an issue of the EUUG
Bulletin. It was done by people at Olivetti (?), and is a virtually full
checkpoint/restart facility.

If you want more information, I can dig up the exact reference, but I am
pretty sure it is not a product, just a demonstration of concept, and you
will have some difficulty laying your hands on it.

A full checkpoint/restart facility is difficult to do under Unix, because a
process has a lot of relationships to other entities, and checkpointing them
all is hard, not to speak of restarting them (open files, pipes, socket
connections, child processes, etc.).
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

michaud@decvax.dec.com (Jeff Michaud) (11/16/89)

> What I would like to be able to do would be to stop a running process,
> save its core image to a file (similar to what gcore(1) does), kill
> the original process, and then later (like, say, after rebooting)
> restart the process from the saved core image.

	It would be easiest if the application knew how to checkpoint itself.
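
A minimal sketch of an application that checkpoints itself, under the
assumption that its whole state fits in a small record: it writes the state
to a file after each step and, when started again, picks up from whatever
the file says. The file format, the function names and the stop_after
parameter (which only simulates the process being killed) are all made up
for this example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temporary file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # readers see either the old or the new checkpoint


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next": 0, "total": 0}


def run(path, limit, stop_after=None):
    """Sum 0..limit-1, checkpointing after every step."""
    state = load_checkpoint(path)
    steps = 0
    while state["next"] < limit:
        if stop_after is not None and steps == stop_after:
            return None  # simulate the process being killed here
        state["total"] += state["next"]
        state["next"] += 1
        save_checkpoint(path, state)
        steps += 1
    return state["total"]


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ckpt.json")
        first = run(path, 10, stop_after=4)  # interrupted part way through
        second = run(path, 10)               # resumes from the checkpoint
        return first, second


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the application decides what its state is,
so nothing has to be recovered from a core image.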

	One of the big problems you are going to have trying to checkpoint and then
	restart as a new process without cooperation from the application is restoring
	open file descriptors.  You may be able to reopen normal files, but if the
	application had sockets open talking to someone else, it would be impossible
	to restore that state without the restarted application knowing about it.
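
To make the distinction concrete, here is a small sketch that sorts open
descriptors into the kinds discussed above, using fstat on each one. The
function name classify_fd and the labels are invented for this example; a
checkpointing tool could plausibly reopen the "file" entries by path, while
the "socket" and "pipe" entries have a peer whose state lives elsewhere.

```python
import os
import socket
import stat
import tempfile


def classify_fd(fd):
    """Say what kind of object an open descriptor refers to."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "file"      # can usually be reopened by path and repositioned
    if stat.S_ISSOCK(mode):
        return "socket"    # the other end must cooperate to reconnect
    if stat.S_ISFIFO(mode):
        return "pipe"      # buffered data and the peer are outside the process
    return "other"


def demo():
    """Open one descriptor of each kind and classify it."""
    a, b = socket.socketpair()
    r, w = os.pipe()
    try:
        with tempfile.TemporaryFile() as f:
            return [classify_fd(f.fileno()),
                    classify_fd(a.fileno()),
                    classify_fd(r)]
    finally:
        a.close()
        b.close()
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(demo())
```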

	You may want to look into the TeX/LaTeX distribution.  TeX can dump itself,
	and the dump can be massaged into a new executable that has the style/macro
	files built in.

/--------------------------------------------------------------\
|Jeff Michaud    michaud@decwrl.dec.com  michaud@decvax.dec.com|
|DECnet-ULTRIX   #include <standard/disclaimer.h>              |
\--------------------------------------------------------------/