rside@uvicctr.UUCP (Robert Side) (08/29/88)
First of all, I would like to *thank* all the people that responded to my
problem. I tried to reply to everyone, but I guess I have not mastered the
mailing program on our system yet. Second, I have another question related
to this problem, which I will post in a separate article.

I originally wrote about checkpointing and the rollback of processes:

> I have a problem I hope somebody can help me with.
>
> Long Summary:
> I would like to be able to *checkpoint* a running process
> so that the process, which is under user control, can be rolled back to a
> given checkpoint and restarted.
>
> My idea to solve the problem:
> The way I have been thinking to solve the problem is to save
> the process's data, stack and registers when a checkpoint
> occurs, and when the user rolls back the process, the saved
> data, stack, and registers are copied into the process's memory
> image and hopefully the process will think it is back at the time
> the checkpoint was taken.
>
> Caveats:
> Sun-3 workstations running Sun UNIX 4.2 Release 3.3. There will
> be open files as well as open sockets. The ptrace system call can
> be used.
>
> What I need help with:
> I would like to know if the problem can be solved,
> what literature (if any) has been written on the above problem,
> what problems will arise, and, MOST OF ALL, how to do it.
>
> Please email responses (but I do read these groups) and I will
> summarize.
>
> *MANY* thanks in advance and any help will be greatly appreciated.
>
> Rob Side
>
> Robert Side <rside@uvunix.uvic.cdn>
> UUCP: ...!{ubc-vision,uw-beaver,ssc-vax}!uvicctr!rside
> BITNET: rside@uvunix.bitnet
--------------------
Jeff Woolsey <uw-beaver!ames!ucbcad!nsc.NSC.COM!woolsey> writes:

You've neglected one biggie: open files, and their positions. Another,
not quite so biggie: process environment (particularly the current
working directory, if the process has written out files it will later
want to read).
Of course, if the checkpoint is handled by the program itself, it can make
sure that it happens at a good time (no open files, etc.). If the
checkpoint is handled by something external, so that you could use it to
checkpoint ANYTHING (except programs running with privilege), you'll have
to worry about all this stuff. Good luck.

Jeff Woolsey  woolsey@nsc.NSC.COM -or- woolsey@umn-cs.cs.umn.EDU
--------------------
uunet!jetson.UPMA.MD.US!john (John Owens) writes:

Check out the undump mechanism used in GNU Emacs. It writes an executable
image of the current process. It's used to turn certain pre-loaded data
into shared read-only text, but you could adapt it to your uses. The only
problem is knowing what your open files are. If you are able to, you could
set a flag in the dumped image that your program will read on start, and
it will reopen the files, fix the stack, and do a longjmp to a setjmp that
you've stored before the undump. You can also do an ftell on all the files
during the checkpoint and lseek during the restore....

Good luck!
---
John Owens                  john@jetson.UPMA.MD.US
SMART HOUSE L.P.            uunet!jetson!john (old uucp)
+1 301 249 6000             john%jetson.uucp@uunet.uu.net (old internet)
--------------------
uunet!unisoft!cander (Charles Anderson) writes:

I will assume that you don't care about files being changed. Rolling them
back (without just copying them) could be a problem without some help
from the O.S.

Here's a simple solution that the 4.2 dump program uses: fork and let the
child do the work/transaction. If you need to roll back, just have the
child exit. The parent is then in exactly the same state as when the
"checkpoint" happened. Dump uses this to deal with potential tape
problems. You could do any number of forks (up to the per-user process
limit) to maintain any number of current checkpoints. To roll forward or
"commit the transaction", you could signal the parent(s) and have
him/her/them exit.
I realize it's kind of quick and dirty, and it may be expensive if the
process is big, but it will work.

Otherwise, you could try to write the whole data segment out to disk to
checkpoint and do a setjmp(). Then to roll back, you could read the data
segment back in and longjmp(). I don't know if it would work, but it
sounds good.

Let me know what you decide on. It sounds like an interesting problem.

Charles.  {sun,uunet,ucbvax,pyrmaid}!unisoft!cander
--------------------
uunet!dalsqnt!vector!chip (Chip Rosenthal) writes:

>The way I have been thinking to solve the problem is to save
>the process's data, stack and registers when a checkpoint
>occurs

Setjmp/longjmp does this for the stack and registers.
---
Chip Rosenthal  chip@vector.UUCP    | I've been a wizard since my childhood.
Dallas Semiconductor  214-450-0486  | And I've earned some respect for my art.
--------------------
der Mouse <mcgill-vision!uunet!Larry.McRCIM.McGill.EDU!mouse> writes:

I implemented something similar once. What I did was to checkpoint a
process into a file for later resumption, but the constraints were
somewhat different. In particular, the whole point was to be able to
restore a simulator run after a crash, which makes restoring open files
and so on effectively impossible.

This is the difficult part of this: open files. My "solution" was to
force the program to close all files before checkpointing; this was
feasible in our case.

Have you considered forking and letting one process run on, with the
"resumption" consisting of switching to the other process? Depending on
what you want, this might be good enough.

Doing a full checkpoint in the kernel would involve just adding two
syscalls, one to dump a process and one to restore it. Yes, it's
possible. I wouldn't attempt it without kernel source, but then I get
very dogmatic about having source.

I'd be glad to send you the code I have for dumping a process and
restoring it later in another process, though it won't be directly
useful.
der Mouse

        old: mcgill-vision!mouse
        new: mouse@larry.mcrcim.mcgill.edu
----------------
Again, thanks to those that replied.

Rob Side
--
Robert Side <rside@uvunix.uvic.cdn>
UUCP: ...!{ubc-vision,uw-beaver,ssc-vax}!uvicctr!rside
BITNET: rside@uvunix.bitnet
laman@ivory.SanDiego.NCR.COM (Mike Laman) (08/31/88)
In article <484@uvicctr.UUCP> rside@uvicctr.UUCP (Robert Side) writes:
>First of all, I would like to *thank* all the people that responded
>to my problem. I tried to reply to everyone but I guess I have
>not mastered the mailing program on our system yet.
:
:
:
>I originally wrote on checkpointing and the rollback of processes
>
>> I have a problem I hope somebody can help me with.
>>
>> Long Summary:
>> I would like to be able to *checkpoint* a running process
>> so that the process, which is under user control, can be rolled back to a
>> given checkpoint and restarted.
>>
:
[ Deleted the rest of his "original" message ]
[ Deleted a couple messages Robert included ]
:
>uunet!jetson.UPMA.MD.US!john (John Owens) writes
>
:
[ Deleted one suggestion from John's message to Robert ]
:
>Otherwise, you could try to write the whole data segment out to disk to
>checkpoint and do a setjmp(). Then to roll back, you could read the
>data segment back in and longjmp(). I don't know if it would work, but
>it sounds good.
>
:
[ Deleted a couple messages Robert included ]
:

I just wanted to add my two cents' worth on the subject of writing out an
arbitrary area of data in one process and reading it back in, in another
process, at some later time. It is possible. After all, that's how
"rogue" saves a game. I just wanted to warn you of a nonobvious problem
you can encounter.

If the area you are saving contains various stdio library data and you
use the stdio library for writing out the data, you will have a problem.
When you write out the ``_iob'' table, it will show that a slot (among
others, of course) is in use: namely, the one you're using to write out
the data. Eventually you'll finish writing out the data and (as a good
programmer :-)) ``fclose()'' the file. Well, that frees the stdio
``_iob[]'' slot and closes the file descriptor, but be careful when you
read the data in later (probably in some other process). The ``_iob[]''
slot was "open" at the time the data was saved.
After you have restored all the data, you need to ``fclose()'' the stream
that was used to write out the data (it is recorded as open, but it is
really closed), so you can free up the slot. Otherwise, each time you
restore from a saved image, you'll eat up another stdio ``_iob[]'' slot.
On many systems you'll get to save the image about 17 times (a 20-slot
table minus stdin, stdout and stderr). Then your ``fopen()'' calls will
fail because your ``_iob[]'' table is full.

And don't worry, I'm not even going to mention the lack of portability
for systems with noncontiguous data space. Hmmm. I guess I did.

Mike Laman

P.S. When you think about this, you really start to worry about the guts
of various libraries with their static (initialized only once) data.
You'd better hope they are initialized properly. Example: terminfo
curses. Don't play a restored game of rogue on a different terminal! The
internal static data is for the original terminal type! Generally
speaking, you're getting into a headache with this approach.