[comp.unix.questions] Core files ... still trying

warner@scubed.UUCP (Ken Warner) (06/25/88)

In article <11954@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <790@scubed.UUCP> warner@scubed.UUCP (Ken Warner) writes:
>>Is there a way to run a core file?
>
>It cannot be done portably, but with certain restrictions, it can
>almost always be done.
[stuff deleted]
>-- 
>In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
>Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

Well, I've looked at unexec() from emacs and then tried to write my own version
of a function that will create a new executable, saving the current data space.
It does work (sort of), but when the new executable is run, a segmentation
violation occurs in the clean up on exit.  I dunno how to deal with this.
Anyone care to comment?  Below is a stack trace from dbx showing the error. 
Below that is the function I've been working with.  

Basically, the function (my)unexec() copies the text segment from the 
executable file, since that won't be changing, then copies the data segment 
from memory, and then copies the symbol table and string table from the
executable file.  The result is an executable with a snapshot of the data
segment.  Like I said, it runs but dies a horrible death on exit.  ut.c is
the source file containing my main().

-------------- dbx stack dump -----------------------------------

signal SEGV (segmentation violation) in cfree at 0x4fe8
cfree+0x2c:             cmpl    a3@(8),a5
(dbx) where
cfree(0x23458, 0x0) at 0x4fe8
free(0x2345c) at 0x53eb
fclose(0x206f4) at 0x48b3
_fwalk(0x4848) at 0x4f35
_cleanup(0x1, 0x0) at 0x4841
exit(0x1) at 0x4ca9
main(argc = 1, argv = 0xefffb8c, 0xefffb94), line 78 in "ut.c"
(dbx)  

--------------- myunexec.c -------------------------------------

#include <a.out.h>
#include <sys/file.h>
#include <sys/types.h>
#include <stdio.h>
#include <sys/stat.h>
#include <errno.h>

extern int first(); /* this was the name of a little stub in main() */
extern char *sbrk ();
extern int end,edata,etext;
static struct exec hdr;
static int make_a_out();	/* forward declaration: defined static below */

/* ****************************************************************
 * unexec(new_name,a_name) where new_name is the name of the new executable
 * and a_name is the name of the file containing the currently executing
 * program
 *
 */
unexec (new_name, a_name)
char *new_name, *a_name;
{

    int new, a_out = -1;

    if (a_name && (a_out = open (a_name, O_RDONLY)) < 0)
    {
	perror (a_name);
	return -1;
    }
    if ((new = open (new_name, O_WRONLY|O_CREAT|O_TRUNC,0755)) < 0)
    {
	perror (new_name);
	if (a_out >= 0)
	    close (a_out);
	return -1;
    }

    if (make_a_out(new, a_out,new_name,a_name) < 0)
    {
	close (new);
	if (a_out >= 0)
	    close (a_out);
	/* unlink (new_name); */	/* Failed, unlink new a.out */
	return -1;
    }

    close (new);
    if (a_out >= 0)
	close (a_out);
    return 0;
}

/* ****************************************************************
 * make_a_out
 */
static int make_a_out (new, a_out,new_name,a_name)
int new, a_out;
char *new_name,*a_name;
{
	char buff[PAGSIZ];
	unsigned int i,numrd,pos,bss_end;

    bzero(buff,PAGSIZ);

/* read the header */
    if (read (a_out, &hdr, sizeof(hdr)) != sizeof hdr)
    {
      perror (a_name);	/* perror() wants a string, not the descriptor */
      return -1;
    }
    if (N_BADMAG (hdr))
    {
      printf("invalid magic number in %s\n", a_name);
      fflush(stdout);
      return -1;
    }

/* rewind a.out */
    lseek (a_out,0,0);

/*snarf and plop text from old a.out to new */
    for(i=0;i<hdr.a_text;i+=PAGSIZ)
    {
	bzero(buff,PAGSIZ);
	if (read (a_out,buff,PAGSIZ) != PAGSIZ)
	{
	  perror ("make_hdr:#1 read(a.out)");
	  exit(-1);
	}
	if(write(new,buff,PAGSIZ) != PAGSIZ)
	{
		perror("make_hdr:#2 write(a.out)");
		exit(-1);
	}
    }

/*read and write the runtime data space */
    for(i=N_DATADDR(hdr);i<N_DATADDR(hdr)+ hdr.a_data;i++) 
    {
	if(write(new,(char *)i,1) != 1)
	{
	    printf("i = %d\n",i);
	    perror("make_hdr:#3 write(new)");
	    fflush(stdout);
	    exit(-1);
	}
    }
/* jump over data in a_out and align to page boundary in new */
    lseek(a_out,N_SYMOFF(hdr),L_SET);
    pos = lseek(new,0,L_INCR);
    lseek(new,(PAGSIZ + ((pos - 1) & ~(PAGSIZ - 1))),L_SET);


#ifdef DEBUG
    printf("(PAGSIZ + ((pos - 1) & ~(PAGSIZ - 1))) = %x\n", PAGSIZ + ((pos - 1) & ~(PAGSIZ - 1)));
#endif
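/* copy the symbol table and string table */ 
    while(1)
    {
	bzero(buff,PAGSIZ);
	if ((numrd = read (a_out,buff,PAGSIZ)) != PAGSIZ)
	{
	    /* short read: the last, partial page, or an error */
	    if ((int)numrd < 0)
	    {
		perror ("make_hdr:#4 read(a.out)");
		exit(-1);
	    }
	    /* write only the bytes actually read (buff, not buff[i]) */
	    if (numrd > 0 && write(new,buff,numrd) != numrd)
	    {
		perror ("make_hdr:#5 write(new)");
		exit(-1);
	    }
	    break;
	}
	if(write(new,buff,PAGSIZ) != PAGSIZ)
	{
	    perror ("make_hdr:#5 write(new)");
	    exit(-1);
	}
    }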
    bss_end = (unsigned)sbrk(0);

/*if you want to diddle with the break this will round to next page boundary */
/*
	if((brk(PAGSIZ + ((bss_end - 1) & ~(PAGSIZ - 1)))) == -1)
		perror("brk");
	bss_end = (unsigned)sbrk(0);
	printf("bss_end = %d\n",bss_end);

    hdr.a_bss = (int)bss_end - (int)N_BSSADDR(hdr);
	printf("hdr.a_bss = %d\n",hdr.a_bss);
*/
/*rewind to beginning of file */
    lseek (new,0,0);
    /* you can set the entry point to anything you want in the new executable 
    * ... but you will get a bus error on return since main is not on the stack
    */
    /*hdr.a_entry = first; */

/*write out the header if you changed anything */
    if (write (new, &hdr, sizeof hdr) != sizeof hdr)
    {
	perror ("make_hdr:#6 write(new)");
	return -1;
    }

    return 0;

}
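The lseek() that aligns the symbol table in make_a_out() rounds the current offset up to the next page boundary with PAGSIZ + ((pos - 1) & ~(PAGSIZ - 1)). As a standalone sketch of that arithmetic (round_up_page() and round_up_page_alt() are hypothetical helpers, and both assume the page size is a power of two), it agrees with the more common (pos + pagesize - 1) & ~(pagesize - 1) spelling:

```c
/* Round pos up to a multiple of pagesize, which must be a power of two.
 * Same expression make_a_out() uses before writing the symbol table.
 * For pos == 0 the subtraction wraps around, but the result still comes
 * out as 0 because unsigned arithmetic is modular. */
static unsigned long round_up_page(unsigned long pos, unsigned long pagesize)
{
    return pagesize + ((pos - 1) & ~(pagesize - 1));
}

/* The more common spelling of the same rounding, for comparison. */
static unsigned long round_up_page_alt(unsigned long pos, unsigned long pagesize)
{
    return (pos + pagesize - 1) & ~(pagesize - 1);
}
```

With 8192-byte pages (a hypothetical value; PAGSIZ is machine dependent), an offset of 0x2345 rounds up to 0x4000, while an offset already on a boundary, such as 0x4000, is left alone.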

leo@philmds.UUCP (Leo de Wit) (06/29/88)

In article <796@scubed.UUCP> warner@scubed.UUCP (Ken Warner) writes:
|In article <11954@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
||In article <790@scubed.UUCP> warner@scubed.UUCP (Ken Warner) writes:
|||Is there a way to run a core file?
||
||It cannot be done portably, but with certain restrictions, it can
||almost always be done.
|[stuff deleted]
||-- 
||In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
||Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
|
|Well, I've looked at unexec() from emacs and then tried to write my own version
|of a function that will create a new executable, saving the current data space.
|It does work (sort of), but when the new executable is run, a segmentation
|violation occurs in the clean up on exit.  I dunno how to deal with this.
|Anyone care to comment?  Below is a stack trace from dbx showing the error. 
|Below that is the function I've been working with.  
|
|Basically, the function (my)unexec() copies the text segment from the 
|executable file, since that won't be changing, then copies the data segment 
|from memory, and then copies the symbol table and string table from the
|executable file.  The result is an executable with a snapshot of the data
|segment.  Like I said, it runs but dies a horrible death on exit.  ut.c is
|the source file containing my main().

You do not copy the BSS space nor any extension of the data space
(caused by sbrk() calls). This may well be your problem. If any dynamic
allocation has been done (using malloc, calloc, sbrk etc.), for instance
for FILE buffers or even for FILE structs (BSD, e.g., dynamically
allocates all FILE structs except for the 3 standard static ones),
problems are to be expected.
The trouble could be that when _fwalk closes files, it tries to free
buffers residing in a space you should have included in your copy.
But of course you cannot include a BSS in your executable; so possibly
the best tactic is to avoid using stdio before the executable is written,
so that no buffers get created. This can be more tricky than you think;
even functions such as getlogin() use it.
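The gap Leo describes can be measured. A minimal sketch, assuming a system that still provides the traditional linker symbols etext, edata and end (documented in end(3) on Linux and the BSDs) plus sbrk(): the data loop in make_a_out() saves roughly up to &edata, while the BSS runs on to &end and memory obtained through sbrk() (including malloc's arena, where malloc is built on sbrk) runs on to sbrk(0). The helper names bss_bytes() and heap_bytes() are hypothetical.

```c
#define _DEFAULT_SOURCE         /* glibc: declare sbrk() */
#include <stdint.h>
#include <unistd.h>

/* Linker-defined symbols (see end(3)): etext is just past the text,
 * edata just past the initialized data, end just past the BSS.
 * Only their addresses are meaningful. */
extern char etext, edata, end;

/* Size of the BSS: zero-filled at exec time, never stored in the
 * a.out, and not covered by a copy that stops at edata. */
static uintptr_t bss_bytes(void)
{
    return (uintptr_t)&end - (uintptr_t)&edata;
}

/* Distance from the end of the BSS to the current break: memory
 * obtained with sbrk(), plus any randomization gap a modern kernel
 * may leave there. Unsigned arithmetic, so the difference between
 * two calls stays exact. */
static uintptr_t heap_bytes(void)
{
    return (uintptr_t)sbrk(0) - (uintptr_t)&end;
}
```

Everything counted by these two helpers is missing from the new executable, which is where the stdio buffers that free() trips over would live.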

I'm also wondering why you use one-char writes; a write() system call per byte is pretty expensive.
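One way to avoid a system call per byte is to hand write() the whole region at once, for example write_all(new, (char *)N_DATADDR(hdr), hdr.a_data) in place of the byte-at-a-time loop. write_all() is a hypothetical helper, sketched here for a POSIX system; it keeps going after short writes and EINTR:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes at buf to fd with as few write() calls as the
 * kernel allows, continuing after short writes and EINTR.
 * Returns 0 on success, -1 on error with errno set by write(). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                 /* short write: advance past what went out */
        len -= (size_t)n;
    }
    return 0;
}
```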

And I also would like to know what use this scheme has? Is it for
speedup of programs that do a lot of processing, table creation and the
like before actually taking off? Then I've got an alternative: write
out the complete data space, and when you start up again, sbrk to the
correct position (derivable from the datafile's size and the current
breakpoint) and read it in. I've used it with a rapid prototyping
system that always created parse tables, even if the source program
text hadn't changed. The creation of tables that first took several
minutes now only takes about 4 seconds.  The only problem I had was with
some variables that point into a different space, for instance environ
in the BSS, which points to the stack. Just saving them before the read
of the datafile and restoring them afterwards solved this.

    Leo.

chris@mimsy.UUCP (Chris Torek) (06/30/88)

>In article <796@scubed.UUCP> warner@scubed.UUCP (Ken Warner) writes:
>>[an example unexec()]
>>It does work (sort of), but when the new executable is run, a segmentation
>>violation occurs in the clean up on exit. ...

In article <537@philmds.UUCP> leo@philmds.UUCP (Leo de Wit) writes:
>You do not copy the BSS space nor any extension of the data space ...

Leo is correct here; but:

>But of course you cannot include a BSS in your executable;

while you can provide BSS in the new a.out, that is the wrong tack.
Instead, what was originally BSS, and any new data allocated via
sbrk(), must be saved in a new (larger) data segment in the new a.out.
This tends to make enormous a.out files; it sometimes helps not to
write blocks of zeroes, instead allowing them to be holes.  (Paging
from a hole should work, and does under 4BSD.)
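A minimal sketch of the hole trick, assuming POSIX lseek()/write() and a file system that supports sparse files: instead of writing a page that is entirely zero, seek past it. write_with_holes() is a hypothetical name, and pagesize stands in for PAGSIZ. One detail matters: seeking past the end of a file does not change its length until something is written there, so a trailing run of zero pages needs one explicit byte at the end.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes of buf to fd at the current offset, but lseek() over
 * every chunk of pagesize bytes that is entirely zero instead of writing
 * it, so the file system can leave a hole. Returns 0 ok, -1 on error. */
static int write_with_holes(int fd, const char *buf, size_t len, size_t pagesize)
{
    size_t off = 0;
    int ended_in_hole = 0;

    while (off < len) {
        size_t n = len - off < pagesize ? len - off : pagesize;
        size_t i = 0;

        while (i < n && buf[off + i] == 0)      /* is this chunk all zero? */
            i++;
        if (i == n) {
            if (lseek(fd, (off_t)n, SEEK_CUR) == (off_t)-1)
                return -1;
            ended_in_hole = 1;
        } else {
            if (write(fd, buf + off, n) != (ssize_t)n)
                return -1;
            ended_in_hole = 0;
        }
        off += n;
    }
    /* A trailing hole does not extend the file by itself; write the
     * final zero byte so the file reaches its full length. */
    if (ended_in_hole) {
        char zero = 0;

        if (lseek(fd, -1, SEEK_CUR) == (off_t)-1 || write(fd, &zero, 1) != 1)
            return -1;
    }
    return 0;
}
```

Reading the file back yields zeroes for the skipped pages, so the saved data segment looks the same to the loader; only the disk usage shrinks.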

>And I also would like to know what use this scheme has?

That depends.  A general restart mechanism can be handy.  Unexec is
not so general, but sometimes works.

>Is it for speedup of programs that do a lot of processing, table
>creation and the like before actually taking off? Then I've got an
>alternative: write out the complete data space, and when you start
>up again, sbrk to the correct position ... and read it in.

Sendmail does this, and I hate it.  Every time you recompile the
binary you have to do this again.  It also has some problems:

>... The only problem I had was with some variables that point into a
>different space, for instance environ in the BSS, which points to
>the stack. Just saving them before the read of the datafile and
>restoring them afterwards solved this.

I prefer systems that save the data they want saved, rather than
blindly saving everything.  It is easy to imagine a library routine
(malloc or curses) that does something once on startup to get some
important information which may change from run to run (available
memory space or $TERM).  If the call has occurred once already, the
wrong data will be used next time.
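Saving only the data a program actually wants might look something like the sketch below. The names (save_table(), load_table(), TABLE_MAGIC, TABLE_VERSION) and the int-array table are hypothetical. The header lets a program notice a file written by an incompatible build and rebuild the table instead of trusting it. It is written in native byte order and word size, so the file is not portable between machine types. Anything that depends on the run, such as $TERM, is simply not saved and is recomputed every time.

```c
#include <stdio.h>

#define TABLE_MAGIC   0x54424C31UL  /* hypothetical file tag */
#define TABLE_VERSION 1UL           /* bump whenever the table layout changes */

/* Written in front of the table so a stale or foreign file is detected. */
struct table_header {
    unsigned long magic;
    unsigned long version;
    unsigned long count;
};

/* Save exactly the data that is expensive to rebuild. 0 ok, -1 error. */
static int save_table(FILE *f, const int *table, unsigned long count)
{
    struct table_header h;

    h.magic = TABLE_MAGIC;
    h.version = TABLE_VERSION;
    h.count = count;
    if (fwrite(&h, sizeof h, 1, f) != 1
        || fwrite(table, sizeof *table, count, f) != count)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Load the table if the header matches and it fits in max entries.
 * Returns 0 on success, -1 if the caller should rebuild from scratch. */
static int load_table(FILE *f, int *table, unsigned long max)
{
    struct table_header h;

    if (fread(&h, sizeof h, 1, f) != 1
        || h.magic != TABLE_MAGIC || h.version != TABLE_VERSION
        || h.count > max
        || fread(table, sizeof *table, h.count, f) != h.count)
        return -1;
    return 0;
}
```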

Ever wonder why rogue used to refuse to work right on a different
kind of terminal when restoring a saved game? . . .
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris