[comp.archives] Sun Checkpoint procedure

seth@sirius.ctr.columbia.edu (Seth Robertson) (04/13/90)

Archive-name: chkpnt/11-Apr-90
Original-posting-by: Seth Robertson <seth@sirius.ctr.columbia.edu>
Archive-site: sol.ctr.columbia.edu [128.59.64.40]
Archive-directory: pub
Archive-files: chkpnt.shar
Reposted-by: emv@math.lsa.umich.edu (Edward Vielmetti)


Here is something that you might want to x-ref in comp.archives:

Newsgroups: alt.sys.sun,alt.sources
From: seth@ctr.columbia.edu (Seth Robertson)
Subject: Sun Checkpoint procedure
Message-ID: <1990Apr8.214615.9283@ctr.columbia.edu>
Organization: Columbia University Center for Telecommunications Research
Date: Sun, 8 Apr 90 21:46:15 GMT


Greetings:

Below I have included a beta-test checkpointing program for Sun 3s.

Checkpointing, for those of you who do not know, consists of saving the
program state every so often so that if the program crashes you can
restart it from the last checkpoint.
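For readers new to the idea, here is a minimal application-level sketch of checkpoint/restart in C. It uses a different technique from the package announced here. It saves only an explicit, hypothetical `struct state` that the program chooses to write, not the whole process image. The names `save_checkpoint()`, `load_checkpoint()` and `compute()` are made up for illustration. The sketch writes to a temporary file and then renames it, so a crash during the write leaves the previous checkpoint intact. That step assumes POSIX rename(), which replaces an existing target.

```c
#include <stdio.h>

/* Hypothetical state of a long-running computation. */
struct state {
    long   iteration;   /* next iteration to run */
    double sum;         /* running result */
};

/* Save the state: write a temporary file, then rename() it over the old
   checkpoint so a crash in mid-write leaves the previous one intact.
   (Assumes POSIX rename(), which replaces an existing target.) */
int save_checkpoint(const char *path, const struct state *s)
{
    char tmp[512];
    FILE *fp;

    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
        return -1;
    if ((fp = fopen(tmp, "wb")) == NULL)
        return -1;
    if (fwrite(s, sizeof *s, 1, fp) != 1) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;
    return rename(tmp, path);
}

/* Load the last checkpoint. Returns 0 on success, -1 if there is none. */
int load_checkpoint(const char *path, struct state *s)
{
    FILE *fp = fopen(path, "rb");
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fread(s, sizeof *s, 1, fp) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}

/* Run iterations up to `end`, checkpointing every `every` iterations. */
void compute(const char *path, struct state *s, long end, long every)
{
    while (s->iteration < end) {
        s->sum += (double)s->iteration;   /* stand-in for real work */
        s->iteration++;
        if (s->iteration % every == 0)
            save_checkpoint(path, s);
    }
}
```

On startup, a program would call load_checkpoint(). If it returns 0, the program continues from the loaded iteration instead of starting over. Any work done after the last checkpoint is simply redone.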

Basically, what you have to do is insert a couple of lines in your main()
and then select points in your program at which to take checkpoints (it
should be possible to set up an alarm to do it every hour or so, but I
have not tried this).  You need to select these points carefully because
checkpointing has a lot of overhead, so it is important not to do it too
frequently.
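The alarm idea is untested, so what follows is only one hedged sketch of how it might be wired up on a POSIX system. The SIGALRM handler just sets a flag. The program checks that flag at points where you have decided a checkpoint is safe, and the timer is re-armed there. Taking the checkpoint inside the handler itself would mean calling functions that are not async-signal-safe. All names are hypothetical. The actual checkpoint call is left to the caller because the package's own entry point is not shown in this posting.

```c
#define _XOPEN_SOURCE 700   /* for sigaction() and SA_RESTART */
#include <signal.h>
#include <unistd.h>

/* Set by the SIGALRM handler, read and cleared by the main program. */
static volatile sig_atomic_t checkpoint_due = 0;
static unsigned checkpoint_interval = 3600;   /* seconds (one hour) */

/* Keep the handler trivial: just note that a checkpoint is due. */
static void on_alarm(int sig)
{
    (void)sig;
    checkpoint_due = 1;
}

/* Install the handler and start the timer. Returns 0 on success. */
int start_checkpoint_timer(unsigned seconds)
{
    struct sigaction sa;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls where possible */
    checkpoint_interval = seconds;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;
    alarm(seconds);
    return 0;
}

/* Call at points where a checkpoint is safe. Returns 1 (and re-arms
   the timer) if the alarm has gone off since the last call, else 0. */
int checkpoint_pending(void)
{
    if (!checkpoint_due)
        return 0;
    checkpoint_due = 0;
    alarm(checkpoint_interval);
    return 1;
}
```

In the main loop you would write something like `if (checkpoint_pending()) take_checkpoint();`. Here `take_checkpoint()` stands for whatever routine the package actually provides. Because the handler now exists, the caveat about programs that use signals applies to this sketch too.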

This checkpointing program is very good for compute-bound programs.
Programs that do I/O have problems because my checkpointing routines
_DO_NOT_DO_ANYTHING_ABOUT_FILES_.  If you have open files, you MUST
reopen them and lseek() back to the right position after a restart.
Also, programs that use signals need to be careful.  Basically, if your
program reads in data, thinks about it for a few days, then spits it
back out, this is for you.
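Because nothing about files is saved, one way to cope is to record each open file's path, flags and offset just before checkpointing, then restore them after a restart. The sketch below uses hypothetical names and POSIX calls only. It reopens the file on the same descriptor number with dup2(). That step assumes the restored program still holds the old descriptor value in its own variables. The sketch handles only raw descriptors, not unflushed stdio buffers.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* What to remember about one open file across a checkpoint (hypothetical). */
struct saved_file {
    char  path[256];
    int   fd;       /* descriptor number the program uses */
    int   flags;    /* open() flags, minus ones that would clobber the file */
    off_t offset;   /* current position, from lseek(fd, 0, SEEK_CUR) */
};

/* Call just before checkpointing. Returns 0 on success. */
int remember_file(struct saved_file *sf, const char *path, int fd, int flags)
{
    if (strlen(path) >= sizeof sf->path)
        return -1;
    strcpy(sf->path, path);
    sf->fd = fd;
    sf->flags = flags & ~(O_CREAT | O_TRUNC | O_EXCL);  /* don't truncate on reopen */
    sf->offset = lseek(fd, 0, SEEK_CUR);
    return sf->offset == (off_t)-1 ? -1 : 0;
}

/* Call after restarting: reopen the file on the same descriptor
   number and seek back to the saved position. Returns 0 on success. */
int restore_file(const struct saved_file *sf)
{
    int fd = open(sf->path, sf->flags);

    if (fd < 0)
        return -1;
    if (fd != sf->fd) {
        if (dup2(fd, sf->fd) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return lseek(sf->fd, sf->offset, SEEK_SET) == sf->offset ? 0 : -1;
}
```

A program would keep one saved_file per open file, update it before each checkpoint, and call restore_file() for each one right after restarting.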

Now for some restrictions.  It currently works only on Sun 3s.  It
compiles fine on Sun 4s and (I believe) on any Vax (Ultrix or BSD), but
it does not work there because of the broken setjmp() routines on Sun 4s
and Vaxen.  On Sun 4s, someone needs to write an assembler routine to
save all of the registers, especially the stack pointer.  I haven't done
much work on the Vaxen, but the problem is pretty much the same.
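Some background on why setjmp() matters here. setjmp() saves the caller's register context, including the stack pointer, in a jmp_buf. longjmp() later restores that context, so setjmp() appears to return a second time, now with a nonzero value. The small example below shows only that in-process behavior. It is not the checkpointing code. A real checkpointer also has to save the process's memory to disk and restore it before jumping back. The contents of a jmp_buf are implementation-defined. That is one reason this approach needs per-machine work.

```c
#include <setjmp.h>

static jmp_buf resume_point;

/* Stand-in for the "restart" path: jump back into the saved context. */
static void resume(int code)
{
    longjmp(resume_point, code);
}

/* Returns how many times setjmp() returned: once directly (value 0),
   once more after longjmp() restored the saved registers (nonzero). */
int setjmp_demo(void)
{
    volatile int returns = 0;    /* volatile so its value survives longjmp() */

    switch (setjmp(resume_point)) {
    case 0:                      /* first return: context just saved */
        returns++;
        resume(1);               /* does not return */
        break;
    default:                     /* second return: context restored */
        returns++;
        break;
    }
    return returns;
}
```

Calling setjmp_demo() yields 2. Execution passes through the setjmp() call twice, but the code runs straight through only once.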

I'm setting up a mailing list (which I envision to be very low volume,
but what do I know?) for future enhancements and the like.  The address
is "chkpnt-request@ctr.columbia.edu".

I strongly encourage people to join the mailing list and report their
experiences.  I would especially like to hear from anyone who gets it
working on Sun 4s.

<code deleted>

					-Seth Robertson
					 seth@ctr.columbia.edu