[comp.unix.wizards] File names from file descriptors and Checkpoints

rside@uvicctr.UUCP (Robert Side) (09/10/88)
Here, finally, is the summary on how to get a file name from
a file descriptor.

First, I would like to thank all the people who responded
to my problem on checkpointing processes as well as how to get a file
name from a file descriptor. I tried to respond to everyone
who sent me mail, and I think I was more successful this time,
but if a reply did not reach you, let me now say *thank you* for your
reply.

Second, I would like to thank two people, Dave Curry and der Mouse,
for sending me the source to their solutions to my problem. As an
aside, I will *NOT* send their source to anyone, since I do not
have permission from them to do so. If you feel you need the source,
I suggest you mail these two people directly.

Finally, in short, I believe the problem of checkpointing a process with
open files has been solved, at least to my satisfaction. The specific
question of an easy way of finding the file name from a file descriptor
is not solved. There may not even be a solution to this problem,
as discussed below.

-----------------

From: Amos Shapir <taux01!taux02.taux01.UUCP!amos@nsc>
>Summary: it's impossible. All a process  has is a file descriptor, which
>may be  connected to a  pipe (and in modern  systems, to a  socket whose
>other end  is in Timbuktu). Even  if it is  a regular file, it  may have
>been inherited from a great-grandparent, so changing fopen to keep track
>of file names is not sufficient.
>-- 
>	Amos Shapir				amos@nsc.com
>National Semiconductor (Israel)
>6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel  Tel. +972 52 522261
>34 48 E / 32 10 N			(My other cpu is a NS32532)

-----------------

From: uunet!dalsqnt!vector!chip (Chip Rosenthal)
>Not easily.
>
>You could do it by calling fstat() with the filedes, which will give you
>the inode of the file and device it resides on.  Then you have to search
>through that device for all directory entries which reference this inode.
>This is what the SysV ncheck(1) does -- or at least it's what the XENIX
>V ncheck(C) does.  In both cases you need superuser privileges.  Also,
>this is not real clean -- possible problems:  the filedes is a pipe, the
>file contains multiple links, the file has been rm'ed by another process,
>etc.
>---
>Chip Rosenthal     chip@vector.UUCP | I've been a wizard since my childhood.
>Dallas Semiconductor   214-450-0486 | And I've earned some respect for my art.

-----------------

[  I lost the first message received from Dave Curry (shame on me);
   however, I will try to state approximately what he said ]

From: davy@relay.ubc.ca (Dave Curry) (Message 1)
> [I (Dave Curry) have written a set of library routines that will]
> [checkpoint and recover processes. They were written on a VAX for]
> [BSD 4.2. I do not remember if they handle sockets, but they do]
> [handle open files and pipes. If you like I can mail you a copy.]
> [The only request that I make is that if you use my code that you]
> [send me the diffs]
>

[ I (Rob speaking now) sent Dave mail asking him if he could dig up the
  source and send it to me.  His next response (along with a transcript
  of my message) follows ]

From: davy@relay.ubc.ca (Dave Curry) (Message 2)
>     [ This is Rob speaking in the indented stuff ]
>
>     [ Some stuff deleted ]
>
>     I would like to take a look at the code, from what you have said
>     it is pretty close to meeting my specs. There are a few things
>     I am worried about. There will be open sockets. I guess I never
>     said this in the article but when a rollback occurs it must
>     overwrite the current memory image to keep the same process id.
>
>     [ Some Stuff deleted ]
>
>Keeping the same pid is easy enough, I guess.  The library writes the
>executable to the file "chkpt.dat" (user-settable), so assuming you
>have a process with the correct pid running, all you need to do is
>execl() "chkpt.dat", and you're all set.
>
>I'm still not sure how you'd go about creating sockets.  It's easy enough
>to "reopen" them I guess, and you could probably even save all the connect
>info and reconnect them to their servers.  But unless your servers and
>clients are all stateless, you're going to have a hard time putting the
>whole mess back into the same state.
>
>     It sounds like your library can be modified to meet my needs. I
>     have written routines to checkpoint and rollback processes that
>     do not have open files, so if I could see how you restore the
>     files this would be a great help.
>
>     It would be *much* appreciated if you could dig up the code and
>     mail it to me.  If I make any changes I CERTAINLY will mail you
>     the diffs or the complete source of the changes if it is deemed
>     necessary.
>
>I'll probably have to pull it off tape.  I'll see if I can get to it
>today or tomorrow, if at all possible.
>
>     [ My signature Deleted ]
>
>--Dave
>
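
[ Sketch, not from the original mail: Dave's remark that exec'ing the
  checkpoint file keeps the pid rests on the fact that exec replaces
  the process image but not the process.  The names
  resume_from_image() and pid_survives_exec() are hypothetical, and
  /bin/sh stands in for "chkpt.dat", which is not available here. ]

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Overlay the calling process with the program at `image`.
 * Returns only if the exec fails; on success the new image
 * starts running under the caller's pid. */
int resume_from_image(const char *image, char *const argv[])
{
    assert(image != NULL && argv != NULL);
    execv(image, argv);
    return -1;
}

/* Demonstration: fork a child, have it exec /bin/sh (standing in for
 * the checkpoint image), and check that the shell reports the pid the
 * child had before the exec.  Returns 1 if the pids match, else 0. */
int pid_survives_exec(void)
{
    int fds[2];
    char buf[32] = "";
    pid_t child;
    ssize_t n;

    if (pipe(fds) == -1)
        return 0;
    child = fork();
    if (child == -1)
        return 0;
    if (child == 0) {
        char *args[] = { "sh", "-c", "echo $$", NULL };

        dup2(fds[1], STDOUT_FILENO);    /* shell's stdout goes to the pipe */
        close(fds[0]);
        close(fds[1]);
        resume_from_image("/bin/sh", args);
        _exit(127);                     /* reached only if the exec failed */
    }
    close(fds[1]);
    n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    waitpid(child, NULL, 0);
    return n > 0 && atol(buf) == (long)child;
}
```

Descriptors without the close-on-exec flag also survive the exec, which
is why the pipe set up before the exec still works afterwards.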

[ In Dave's last correspondence I received his code and, lo and behold,
  it also handles sockets (almost) ]

From: davy@relay.ubc.ca (Dave Curry) [ Message 3 ]
>Here it comes... I looked through it, and it seems that it already does
>catch some of the socket system calls (the ones that allocate file
>descriptors), but there's also code that checks to see if the
>descriptor is a socket in chkpt.c and restore.c, so you'll need to fix
>that.  Also check the two #ifdef vax sections, which will require a
>few lines of assembler if you're not on a Vax.
>
>Finally, check the Makefile - it probably doesn't install things where
>you will be wanting them...
>
>--Dave
>
>  [ The actual code is deleted ]

[ I believe Dave's code will work, and I was in the process of getting
  it compiled when our Suns went down.  They will be up this weekend,
  I hope, and early next week I should be able to test it ]
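
[ Sketch only: this is NOT Dave's code, which is not reproduced in
  this summary.  It shows one plausible way a checkpoint library could
  put a regular file back, assuming the library recorded the path when
  the file was opened.  struct saved_file, save_file() and
  restore_file() are hypothetical names. ]

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* What would have to be remembered about one open regular file. */
struct saved_file {
    int   fd;           /* descriptor number the program is using */
    char  path[256];    /* name used at open time (tracked by the library) */
    int   flags;        /* access mode and status flags from F_GETFL */
    off_t offset;       /* current position in the file */
};

/* Record the state of `fd`, which was opened as `path`.  Returns 0 or -1. */
int save_file(int fd, const char *path, struct saved_file *sf)
{
    assert(path != NULL && sf != NULL);
    sf->fd = fd;
    sf->flags = fcntl(fd, F_GETFL);
    sf->offset = lseek(fd, 0, SEEK_CUR);
    if (sf->flags == -1 || sf->offset == (off_t)-1)
        return -1;              /* lseek fails on pipes and sockets */
    strncpy(sf->path, path, sizeof sf->path - 1);
    sf->path[sizeof sf->path - 1] = '\0';
    return 0;
}

/* Reopen the file on its old descriptor number and seek back to the
 * saved position.  Returns the descriptor, or -1 on failure. */
int restore_file(const struct saved_file *sf)
{
    int fd;

    assert(sf != NULL);
    fd = open(sf->path, sf->flags);
    if (fd == -1)
        return -1;              /* file renamed or removed since the save */
    if (fd != sf->fd) {
        if (dup2(fd, sf->fd) == -1) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    if (lseek(sf->fd, sf->offset, SEEK_SET) == (off_t)-1)
        return -1;
    return sf->fd;
}
```

This restores the position but not the contents: anything written to
the file after the checkpoint is still there after the rollback.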

-----------------

From: uunet!hao.UCAR.EDU!pag (Peter Gross)
>One problem:  file descriptors do not always refer to files.  Depending
>on which version of Unix you are running, they could be pipes, sockets,
>fifo's, etc.  Thus your solution of redoing the stdio lib to trap
>file names would leave some holes.
>
>--peter gross
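
[ Sketch, not from the original mail: a name-trapping wrapper of the
  kind Peter refers to, written around open(2) rather than stdio for
  brevity.  tracked_open(), tracked_close() and tracked_name() are
  hypothetical names.  The holes show up as NULL results for any
  descriptor that did not come from the wrapper. ]

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_TRACKED 256
#define NAME_LEN    256

static char fd_names[MAX_TRACKED][NAME_LEN];    /* "" means "name unknown" */

/* open(2) wrapper that remembers the path used for each descriptor. */
int tracked_open(const char *path, int flags, mode_t mode)
{
    int fd = open(path, flags, mode);

    if (fd >= 0 && fd < MAX_TRACKED) {
        strncpy(fd_names[fd], path, NAME_LEN - 1);
        fd_names[fd][NAME_LEN - 1] = '\0';
    }
    return fd;
}

/* close(2) wrapper that forgets the name again. */
int tracked_close(int fd)
{
    if (fd >= 0 && fd < MAX_TRACKED)
        fd_names[fd][0] = '\0';
    return close(fd);
}

/* Returns the recorded name, or NULL for descriptors the wrapper never
 * saw opened: inherited descriptors, pipes, sockets, results of dup. */
const char *tracked_name(int fd)
{
    assert(fd >= 0);
    if (fd >= MAX_TRACKED || fd_names[fd][0] == '\0')
        return NULL;
    return fd_names[fd];
}
```

The recorded name is whatever string was passed in, so a relative path
becomes wrong after a chdir, and a descriptor closed with plain close()
leaves a stale entry behind.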

-----------------

From: alberta!edm!steve
>stat(2) gives both an inode and a device #. I'm not exactly sure about the
>mapping from device # to device name/mount point but, as a worst case, you could
>always stat /<mountpoint>/.  for each mounted device and then stop when you
>get a correct value.
> 
>  One point: from an inode #, the best that I can figure out how to get is
>A file name.  If a file has multiple links, then you can sometimes find
>multiple names for the file but, in most cases, this should not be a problem
>for you.
>
>btw: the way ncheck (probably) gets file names from inode #s is to stat every
>file in the appropriate mounted filesystem.  To speed things up, it might be
>worthwhile to assume that most of the files are in (or below) the current
>directory, and start by spanning that tree before you go thru the rest of the
>file system.
>	Sorry for being so verbose.
>-------------
>Stephen Samuel 	  (userzxcv@ualtamts.bitnet   or  alberta!edm!steve)
>MS-DOS : CPM impersonating UNIX  **   OS/2 : IBM impersonating APPLE
>
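
[ Sketch, not from the original mail: Stephen's worst case of stat'ing
  each mount point until the device number matches.  The function name
  mount_point_of() is hypothetical, and the list of mount points is
  assumed to be supplied by the caller; how to obtain it differs from
  system to system and is not shown. ]

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Return the first entry of `mounts` whose device number equals `dev`,
 * or NULL if none matches.  Entries that cannot be stat'ed are skipped. */
const char *mount_point_of(dev_t dev, const char *const mounts[], size_t n)
{
    struct stat st;
    size_t i;

    assert(mounts != NULL);
    for (i = 0; i < n; i++) {
        if (stat(mounts[i], &st) == 0 && st.st_dev == dev)
            return mounts[i];
    }
    return NULL;
}
```

The device number to look up would be the st_dev field that fstat()
returns for the descriptor.  Every directory on a file system shares
its device number, so the list must contain real mount points only.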

-----------------

From: uunet!gatech.gatech.edu!emory!vss (V.S.Sunderam)

>	I just read your recent postings regarding checkpointing & wanted
>	to let you know of our attempts in this regard. Our main
>	interest is process migration, but checkpoint restarts are a
>	special case & we do have some software that does this for Suns.
>	However, we do not (yet) handle processes that use sockets; the
>	only other limitation is that the process use only NFS files.
>	
>	The Winter 88 Usenix proceedings (pp 357) has our paper that
>	describes the mechanisms & the software. If you are interested
>	I would be happy to give you more info  and/or source code.
>	
>	V.S.Sunderam
>	Dept.of Math & CS
>	Emory University
>	Atlanta, GA 30322
>	vss@mathcs.emory.edu
>	...!gatech!emory!vss

-----------------

From: der Mouse  <mcgill-vision!uunet!Larry.McRCIM.McGill.EDU!mouse> [Message 1]
>I implemented something similar once.  What I did was to checkpoint a
>process into a file for later resumption, but the constraints were
>somewhat different.  In particular, the whole point was to be able to
>restore a simulator run after a crash, which makes restoring open
>files and so on effectively impossible.  This is the difficult part of
>this: open files.  My "solution" was to force the program to close all
>files before checkpointing; this was feasible in our case.
>
>Have you considered forking and letting one process run on, with the
>"resumption" consisting of switching to the other process?  Depending
>on what you want, this might be good enough.
>
>Doing this would involve just adding two syscalls, one to dump a
>process and one to restore it.  Yes, it's possible.  I wouldn't attempt
>it without kernel source, but then I get very dogmatic about having
>source.  I'd be glad to send you the code I have for dumping and
>restoring later, in another process, though it won't be directly useful.
>
>					der Mouse
>
>			old: mcgill-vision!mouse
>			new: mouse@larry.mcrcim.mcgill.edu
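
[ Sketch, not from the original mail: the fork suggestion in der
  Mouse's second paragraph, with the forked copy stopping itself until
  it is needed.  checkpoint() and rollback() are hypothetical names.
  Unlike the exec approach, the process that continues after a
  rollback has a different pid. ]

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t snapshot = -1;     /* pid of the stopped copy, if any */

/* Take a snapshot by forking a copy that immediately stops itself.
 * Returns 0 in the process that carries on, 1 in the copy when it is
 * later resumed by rollback(), and -1 on error. */
int checkpoint(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);         /* sleep here until rollback() wakes us */
        snapshot = -1;          /* the resumed copy has no snapshot yet */
        return 1;
    }
    /* Wait until the copy has really stopped, so that a later SIGCONT
     * cannot arrive before the SIGSTOP. */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
        return -1;
    if (snapshot > 0) {         /* discard the previous snapshot */
        kill(snapshot, SIGKILL);
        waitpid(snapshot, NULL, 0);
    }
    snapshot = pid;
    return 0;
}

/* Abandon the current process and let the snapshot take over. */
void rollback(void)
{
    assert(snapshot > 0);
    kill(snapshot, SIGCONT);
    _exit(0);
}
```

Memory is rolled back, but open files are not: the copy shares file
offsets with the process that ran on, so anything read or written
since the checkpoint stays read or written.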

[ I wrote the >> parts ]

From: der Mouse  <mcgill-vision!uunet!Larry.McRCIM.McGill.EDU!mouse>[Message 2]
>> 1) If it is not too much trouble could you please send the code.  I
>>    have implemented two routines to save and restore a process and it
>>    does seem to work on small test programs and these program must
>>    not have open files.  I am currently working on the problem with
>>    open files.
>
>> 3) One of the limitations thrust upon me is NO KERNEL CHANGES
>
>I will be astonished if you get it to work with no kernel changes,
>unless you always use OMAGIC executables, and even then I would expect
>it to be quite a can of worms.
>
>My code consists of two syscalls, one to dump a process and the other
>to restore it.  The kernel code is in the following shar as snapshot.c;
>the only other tricky part is that the user-level code surrounding the
>snapshot syscall is special.  Everything but the stack pointer is saved
>on the stack to make life easier for the kernel.  This code follows
>after the shar.
>
>The kernel code here is for a mtXinu 4.3+NFS system; for real 4.3 all
>that needs changing is to scrap the silly vnode code and put back the
>real inode stuff.
>
[ Actual code deleted ]
>
>Since you are forbidden kernel changes, this probably won't be much use
>to you.  If you'd like to talk about this some more, feel free to send
>me mail.
>
>der Mouse
>
>old: mcgill-vision!mouse
>new: mouse@larry.mcrcim.mcgill.edu
-- 
Robert Side <rside@uvunix.uvic.cdn>
UUCP:	...!{ubc-vision,uw-beaver,ssc-vax}!uvicctr!rside
BITNET:	rside@uvunix.bitnet