[comp.unix.wizards] Getting path to executed program file

tgl@zog.cs.cmu.edu (Tom Lane) (07/25/90)

Is there any way for a program to discover the path name of the file
from which it was executed?

I would like to be able to access auxiliary files stored in the same
directory as the main executable file.  However, after reading the
man page for exec(2), it doesn't seem like the program gets enough
information to reliably determine what file/directory it was loaded
from.  (For one thing, you don't know whether PATH was used, and for
another, you don't know whether argv[0] is the same as the supplied
path or is just the last component.)  If the info is in fact
squirrelled away somewhere, please tell me where!

-- 
				tom lane
Internet: tgl@cs.cmu.edu
UUCP: <your favorite internet/arpanet gateway>!cs.cmu.edu!tgl
BITNET: tgl%cs.cmu.edu@cmuccvma
CompuServe: >internet:tgl@cs.cmu.edu

jik@athena.mit.edu (Jonathan I. Kamens) (07/25/90)

In article <9995@pt.cs.cmu.edu>, tgl@zog.cs.cmu.edu (Tom Lane) writes:
|> Is there any way for a program to discover the path name of the file
|> from which it was executed?

  Appended to this message is a message posted to comp.unix.wizards by
Greg Limes the nth time this was asked (your message was the mth time,
where m is about three or four more than n if I recall correctly, and
n is nonnegligible :-).  It says just about all there is to say about
this question.  I haven't had occasion to use his source code yet, so
I don't know whether or not it has bugs, but I doubt it...

Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8495			      Home: 617-782-0710

Article 19066 of comp.unix.wizards:
Path: bloom-beacon!usc!apple!sun-barr!newstop!sun!limes
From: limes@sun.com (Greg Limes)
Newsgroups: comp.unix.wizards
Subject: Re: Reading the symbol table of the currently running executable
Message-ID: <LIMES.89Sep7153103@ouroborous.wseng.sun.com>
Date: 7 Sep 89 22:31:03 GMT
References: <9104@june.cs.washington.edu> <6131@lynx.UUCP>
Sender: news@sun.Eng.Sun.COM
Organization: Sun Microsystems, Inc.
Lines: 243
In-reply-to: mitch@lynx.uucp's message of 5 Sep 89 17:17:36 GMT

In article <6131@lynx.UUCP> mitch@lynx.uucp (Mitch Bunnell) writes:

> In article <9104@june.cs.washington.edu> bcn@cs.washington.edu (Clifford Neuman) writes:
> >  2) Obtaining the full path name of the presently running executable.

> 2 - Not possible.

Back before I knew this was impossible, I wrote the following piece of
support code. It has been doing the impossible for me for quite some
time (geez, has it been that long?) with limitations as stated.

/*
 * findx package		25may88 limes@sun.com
 * 
 * Over the last few days (weeks?) there has been some traffic about how to
 * tell where a running program came from. Well, there is a way to find
 * out without changing the shell, the kernel, C language startup
 * conventions, or whatever.
 * 
 * Anyway, here is the basic idea, presented as a package that should compile
 * and run without too many problems.
 * 
 * WHAT IT DOES First, it locates the path to the executable that was used by
 * the exec() that started this process. If the command name starts with a
 * "/", it must be taken literally; if it contains a "/", then it is
 * always relative to the current working directory at the start of the
 * program; otherwise, we have to chase across the PATH value in the
 * environment. If there is no PATH, or the PATH is empty, check the
 * current working directory.
 * 
 * On systems with symbolic links, we are not through yet. The purpose is to
 * locate the directory it is in, so we can get at any related data files.
 * So, we chase symbolic links until we have the real path name of the
 * final resolution file.
 * 
 * SECURITY This can be spoofed easily by making a hard link to, or a copy
 * of, the executable. If you want your program to be sure that it has
 * found the one true installation location, you will have to verify that
 * for yourself. findx() just locates the most likely candidate.
 * 
 * PORTABILITY This was developed on a Sun3 running SunOS 4.0, but I think I
 * at least made the algorithm portable. You may need to mess with include
 * files and such. Symbolic link searching is turned on if your errno.h
 * supplies ELOOP, and off otherwise; I assume that all systems with
 * symlinks have a readlink() call.
 * 
 * HOW TO USE IT Here is the definition of the various parameters. Further
 * down you will find an example main, so fear not ...
 * 
 * findx (cmd, cwd, dir, pgm, run, path)
 * 
 * cmd	pass the command name (argv[0]) here.  findx() knows how to handle
 *	just about anything. If it starts with /, then we use the absolute
 *	name, and ignore the path. If it contains a /, then use the relative
 *	name and ignore the path. Otherwise, look for the file in each
 *	directory named in the path for the file; if there is no path, pretend
 *	it's "." as execvp does.
 *	
 * cwd	pass a big buffer here. if this begins with a slash, I will assume
 *	it is filled in with the current working directory; otherwise, I will
 *	fill it in using getcwd(). Should be at least MAXPATHLEN bytes, if you
 *	do not fill it in yourself.
 *	
 * dir	pass a big buffer here. this gets the full path name of the
 *	directory that the executable was read from. Should be at least
 *	MAXPATHLEN bytes.
 *	
 * pgm	pass THE ADDRESS of a pointer variable here. findx() will fill the
 *	pointer variable with a pointer to the final component of the string
 *	passed as cmd above. Send a (char **)0 if you don't care about this.
 *	
 * run	pass THE ADDRESS of a pointer variable here. findx() will fill the
 *	pointer variable with a pointer to the final component of the name of
 *	the running program. Send a (char **)0 if you don't care about this.
 *	
 * path	pass the user's PATH variable here. I made it a parameter so you
 *	can fiddle with the path first. If you do not want to fiddle, pass
 *	getenv("PATH").
 * 
 * RETURN VALUES: Normally, findx() will return zero if all is well. If
 * something goes wrong, it will return -1 with the global variable
 * "errno" set to a corresponding error number.
 */

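#include <stdio.h>		/* printf, sprintf, perror */
#include <string.h>		/* strlen, strcpy */
#include <strings.h>		/* rindex */
#include <unistd.h>		/* getcwd, access, readlink */
#include <errno.h>
#include <sys/param.h>

#ifndef	X_OK
#define	X_OK	1
#endif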

#ifndef	MAXPATHLEN
#define	MAXPATHLEN	1024
#endif

#ifndef	ENAMETOOLONG
#define	ENAMETOOLONG	EINVAL
#endif

int             findx ();	/* get location of directory */
int             resolve ();	/* get link resolution name */

#ifdef	TESTMAIN
extern char    *getenv ();	/* read value from environment */
char           *pn = (char *) 0;/* program name */
char           *rn = (char *) 0;/* run name */
char            rd[MAXPATHLEN];	/* run directory */
char            wd[MAXPATHLEN] = ".";	/* working directory */

int
main (argc, argv)
    int             argc;
    char          **argv;
{
    if (findx (*argv, wd, rd, &pn, &rn, getenv ("PATH")) < 0) {
	perror (*argv);		/* rn is still null on failure */
	return 1;
    }
    printf ("%s: %s running in %s from %s\n", pn, rn, wd, rd);
    return 0;
}

#endif

/*-
 * findx - find executable file in PATH
 * PARAMETERS:
 *	cmd	filename as typed by user
 *	cwd	where to return working directory
 *	dir	where to return program's directory
 *	pgm	where to return what user called it
 *	run	where to return final resolution name
 *	path	user's path from environment
 * RETURNS: returns zero for success, -1 for error (with errno set properly).
 */
int
findx (cmd, cwd, dir, pgm, run, path)
    char           *cmd;
    char           *cwd;
    char           *dir;
    char          **pgm;
    char          **run;
    char           *path;
{
    int             rv = 0;
    char           *f, *s;

    if (!cmd || !*cmd || !cwd || !dir) {
	errno = EINVAL;		/* stupid arguments! */
	return -1;
    }
    if (!path || !*path)	/* missing or null path */
	path = ".";		/* assume sanity */

    if (*cwd != '/')
	if (!(getcwd (cwd, MAXPATHLEN)))
	    return -1;		/* can't get working directory */

    f = rindex (cmd, '/');
    if (pgm)			/* user wants program name */
	*pgm = f ? f + 1 : cmd;

    if (dir) {			/* user wants program directory */
	rv = -1;
	if (*cmd == '/')	/* absname given */
	    rv = resolve ("", cmd + 1, dir, run);
	else if (f)		/* relname given */
	    rv = resolve (cwd, cmd, dir, run);
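	else if ((f = path) != 0) {	/* from searchpath */
	    char            abuf[MAXPATHLEN];
	    int             n;

	    rv = -1;
	    errno = ENOENT;	/* errno gets this if path empty */
	    while (*f && (rv < 0)) {
		s = f;
		while (*f && (*f != ':'))
		    ++f;
		n = f - s;	/* length of this path element */
		if (*f)
		    ++f;	/* step past ':'; never write into caller's PATH */
		if ((*s == '/' ? 0 : strlen (cwd) + 1) + n + 1 > MAXPATHLEN) {
		    errno = ENAMETOOLONG;
		    continue;	/* element would overflow abuf */
		}
		if (*s == '/')
		    sprintf (abuf, "%.*s", n, s);
		else
		    sprintf (abuf, "%s/%.*s", cwd, n, s);
		rv = resolve (abuf, cmd, dir, run);
	    }
	}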
    }
    return rv;
}

/*
 * resolve - check for specified file in specified directory; sets up
 * dir, following symlinks. returns zero for success, or -1 for error
 * (with errno set properly)
 */
int
resolve (indir, cmd, dir, run)
    char           *indir;	/* search directory */
    char           *cmd;	/* search for name */
    char           *dir;	/* directory buffer */
    char          **run;	/* resolution name ptr ptr */
{
    char           *p;
    int             rv = -1;

#ifdef	ELOOP
    int             lcc = 0;
    int             sll;
    char            symlink[MAXPATHLEN + 1];

#endif

    do {
	errno = ENAMETOOLONG;
	if (strlen (indir) + strlen (cmd) + 2 > MAXPATHLEN)
	    break;

	sprintf (dir, "%s/%s", indir, cmd);
	if (access (dir, X_OK) < 0)
	    break;		/* not an executable program */

#ifdef	ELOOP
	while ((sll = readlink (dir, symlink, MAXPATHLEN)) >= 0) {
	    symlink[sll] = 0;
	    if (++lcc > 32) {	/* arbitrary cap on chain length (an assumption) */
		errno = ELOOP;
		break;
	    }
	    if (*symlink == '/')
		strcpy (dir, symlink);
	    else
		sprintf (rindex (dir, '/'), "/%s", symlink);
	}
	if (errno != EINVAL)
	    break;
#endif

	p = rindex (dir, '/');
	*p++ = 0;
	if (run)		/* user wants resolution name */
	    *run = p;
	rv = 0;			/* complete, with success! */

    } while (0);

    return rv;
}

--
-- Greg Limes	limes@sun.com	...!sun!limes	73327,2473	[choose one]
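
For the use case that started this thread (auxiliary files kept next to
the executable), the directory that findx() returns in dir still has to
be joined with a file name.  A minimal sketch of that step follows; it
is not part of Greg's package, and aux_path() and its parameters are
hypothetical names chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * aux_path - hypothetical helper, not part of the findx package.
 * Joins the directory reported by findx() with an auxiliary file name.
 * Returns 0 on success, -1 if an argument is missing or the result
 * would not fit in out.
 */
int
aux_path(const char *dir, const char *name, char *out, size_t outsize)
{
    int n;

    if (!dir || !name || !out || outsize == 0)
        return -1;
    n = snprintf(out, outsize, "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= outsize)
        return -1;              /* result truncated: treat as failure */
    return 0;
}
```

With the TESTMAIN variables above, a call such as
aux_path(rd, "prog.dat", buf, sizeof buf) would name a data file in the
run directory, subject to the same limitations as findx() itself.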

tgl@zog.cs.cmu.edu (Tom Lane) (07/25/90)

In article <1990Jul25.064956.22757@mintaka.lcs.mit.edu>, jik@athena.mit.edu (Jonathan I. Kamens) writes:
> In article <9995@pt.cs.cmu.edu>, tgl@zog.cs.cmu.edu (Tom Lane) writes:
> |> Is there any way for a program to discover the path name of the file
> |> from which it was executed?
> 
> [Jonathan provides a chunk of code written by Greg Limes (limes@sun.com),
>  which uses argv[0] and the PATH environment string to try to determine
>  where the current executable file came from.]

Thanks for posting this code; I had been planning to write the same thing,
and this saves me from reinventing the wheel.  The business about following
a symbolic link to the real executable is a nice refinement that I hadn't
thought of.

HOWEVER, this doesn't really answer my question.  There are a couple of
assumptions implicit in this method, which Greg didn't document:

  1. It has to assume that argv[0] is identical to the path parameter
     given to exec.  The manuals I've checked say "by convention, argv[0]
     must be supplied and must point to a string identical to path
     *or path's last component*" (emphasis added).
     If the invoking program follows that last clause, then we'll fail
     when the user does something like
	$ ../otherdir/progname  parameters
     The shells I've tried around here seem to make argv[0] be the whole
     string, but who knows whether they all do?

  2. It has to assume that the exec call was execlp() or execvp(),
     and not one of the other forms of exec.  With the other forms,
     a simple name will always be found in the current directory.
     With execlp/execvp, this is true only if PATH contains "." as its
     first element.

In practice these problems probably don't materialize often, so Greg's
code probably gets the right answer 99% of the time.  Still, I would
like to know if it is possible to avoid these assumptions.
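
Both assumptions follow from the fact that findx() picks its strategy
from the shape of argv[0] alone.  A minimal sketch of that decision is
below; classify_cmd() and the enum are hypothetical names, not taken
from Greg's code.

```c
#include <string.h>

/* Hypothetical names for the three cases findx() distinguishes. */
enum cmd_kind { CMD_ABSOLUTE, CMD_RELATIVE, CMD_SEARCH_PATH };

/*
 * classify_cmd - mirrors the branch findx() takes on argv[0].
 * Only the string is examined; nothing here can tell whether the
 * caller really searched PATH, or really passed this string to exec.
 */
enum cmd_kind
classify_cmd(const char *cmd)
{
    if (cmd[0] == '/')
        return CMD_ABSOLUTE;        /* taken literally */
    if (strchr(cmd, '/') != NULL)
        return CMD_RELATIVE;        /* relative to the starting cwd */
    return CMD_SEARCH_PATH;         /* bare name: findx walks PATH */
}
```

If the parent ran ../otherdir/progname but passed only "progname" as
argv[0], the result is CMD_SEARCH_PATH, and the PATH walk may then find
a different file of the same name, or none at all.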

-- 
				tom lane
Internet: tgl@cs.cmu.edu
UUCP: <your favorite internet/arpanet gateway>!cs.cmu.edu!tgl
BITNET: tgl%cs.cmu.edu@cmuccvma
CompuServe: >internet:tgl@cs.cmu.edu

jik@athena.mit.edu (Jonathan I. Kamens) (07/25/90)

In article <10004@pt.cs.cmu.edu>, tgl@zog.cs.cmu.edu (Tom Lane) writes:
|> [lists the assumptions that the code I posted makes]
|> 
|> In practice these problems probably don't materialize often, so Greg's
|> code probably gets the right answer 99% of the time.  Still, I would
|> like to know if it is possible to avoid these assumptions.

  As I said in my original message, the code I posted "says just about
all there is to say about this question."  In other words, no, there is
no portable way to avoid these assumptions, and indeed most systems
don't even have a non-portable way to avoid them, unless you consider
making your executable setuid root or setgid kmem and grovelling through
kernel memory to figure out how the process was started to be a
reasonable way to do things :-).
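
As an illustration of what a non-portable route can look like, Linux
systems with /proc mounted expose the running executable as the
symbolic link /proc/self/exe.  The sketch below assumes such a system;
it is not something the systems discussed in this thread provide, and
self_exe_path() is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for readlink() under strict -std= modes */
#endif
#include <unistd.h>

/*
 * self_exe_path - hypothetical helper; Linux-specific.
 * Reads the /proc/self/exe symlink, which the Linux kernel points at
 * the file the current process was executed from.
 * Returns 0 with a NUL-terminated path in buf, or -1 on failure
 * (no /proc, not Linux, or buf too small).
 */
int
self_exe_path(char *buf, size_t bufsize)
{
    ssize_t n;

    if (!buf || bufsize < 2)
        return -1;
    n = readlink("/proc/self/exe", buf, bufsize - 1);
    if (n < 0 || (size_t) n >= bufsize - 1)
        return -1;              /* error, or result possibly truncated */
    buf[n] = '\0';              /* readlink does not terminate the string */
    return 0;
}
```

Where the link is unavailable the function simply fails, and a caller
would fall back to the argv[0] and PATH approach shown earlier.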

Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8495			      Home: 617-782-0710