[comp.unix.questions] How to get the pathname of the current executable?

bethge@wums.wustl.edu (02/08/90)

I am a 10+ year veteran of VMS programming who is trying to learn to
like Unix.  (Really!)  It would help if I could find out how to port
some of my favorite VMS tricks.

I like to write programs that users can use without having  to  know
details  of  their  inner  workings.   Suppose  a program needs some
standard data which the user doesn't need to be concerned with,  and
which  for  various reasons needs to be read from a file rather than
compiled in.  The question is, how does the program find the file?

My VMS solution is to keep the file in the  same  directory  as  the
program  executable,  and  use  the system service which returns the
full pathname of the  currently  running  executable,  and  get  the
disk,  directory, etc.  from that.  But I don't know of a comparable
system routine in Unix.

I have looked at Unix programs which deal  with  this  problem,  and
found  that  the  pathname  for the data file is hard-coded into the
program.  This of course means that the program has to be edited and
recompiled if it becomes necessary to move the file.

Environment variables are a better solution, but  they  require  the
user  to define the environment variable before running the program.
I could define the program as  a  shell  script  which  defines  the
environment  variable  and  then fires up the executable, but that's
one more file to maintain.
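
A common middle ground between the hard-coded path and the mandatory
environment variable is to treat the variable as an optional override
and fall back to a compiled-in default when it is unset.  A minimal
sketch follows; the variable name PROG_DATA and the default pathname
are invented for illustration and do not come from any real program.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names, chosen only for this sketch. */
#define DATA_ENV     "PROG_DATA"
#define DATA_DEFAULT "/usr/local/lib/prog/data"

/*
 * Return the data file pathname: the value of envname if it is set and
 * non-empty, otherwise the compiled-in fallback.
 */
const char *data_file_path(const char *envname, const char *fallback)
{
    const char *v = getenv(envname);

    return (v && *v) ? v : fallback;
}

#ifdef TESTMAIN
int main(void)
{
    printf("data file: %s\n", data_file_path(DATA_ENV, DATA_DEFAULT));
    return 0;
}
#endif
```

With this arrangement the ordinary user defines nothing, and only
someone who has moved the file needs to set the variable.  The default
still has to be recompiled if the standard location changes.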

Is there a better (more transparent) way?

____________________________________________________________________
Paul H. Bethge                                 bethge@wums.wustl.edu
Washington University, St. Louis                  bethge@wums.bitnet

jik@athena.mit.edu (Jonathan I. Kamens) (02/08/90)

  The subject of finding the full path name of the currently running
process was discussed in comp.unix.wizards fairly recently (in September
of last year, to be precise), and pretty much beaten until dead.

  The best posting on the subject was the one I've tacked onto the end
of this article; it is a routine to do what you want whenever it's
possible, with a whole bunch of comments explaining when (and why) it's not.

  I hope it helps.

  BTW, I didn't write it, I'm just reposting it.

Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8495			      Home: 617-782-0710

------------------------------

Article 19066 of comp.unix.wizards:
Path: bloom-beacon!usc!apple!sun-barr!newstop!sun!limes
From: limes@sun.com (Greg Limes)
Newsgroups: comp.unix.wizards
Subject: Re: Reading the symbol table of the currently running executable
Date: 7 Sep 89 22:31:03 GMT
References: <9104@june.cs.washington.edu> <6131@lynx.UUCP>
Sender: news@sun.Eng.Sun.COM
Organization: Sun Microsystems, Inc.
Lines: 243
In-reply-to: mitch@lynx.uucp's message of 5 Sep 89 17:17:36 GMT

In article <6131@lynx.UUCP> mitch@lynx.uucp (Mitch Bunnell) writes:

> In article <9104@june.cs.washington.edu> bcn@cs.washington.edu (Clifford Neuman) writes:
> >  2) Obtaining the full path name of the presently running executable.

> 2 - Not possible.

Back before I knew this was impossible, I wrote the following piece of
support code. It has been doing the impossible for me for quite some
time (geez, has it been that long?) with limitations as stated.

/*
 * findx package		25may88 limes@sun.com
 * 
 * Over the last few days (weeks?) there has been some traffic about how to
 * tell where a running program came from. Well, there is a way to find
 * out without changing the shell, the kernel, C language startup
 * conventions, or whatever.
 * 
 * Anyway, here is the basic idea, presented as a package that should compile
 * and run without too many problems.
 * 
 * WHAT IT DOES: First, it locates the path to the executable that was used by
 * the exec() that started this process. If the command name starts with a
 * "/", it must be taken literally; if it contains a "/", then it is
 * always relative to the current working directory at the start of the
 * program; otherwise, we have to chase across the PATH value in the
 * environment. If there is no PATH, or the PATH is empty, check the
 * current working directory.
 * 
 * On systems with symbolic links, we are not through yet. The purpose is to
 * locate the directory it is in, so we can get at any related data files.
 * So, we chase symbolic links until we have the real path name of the
 * final resolution file.
 * 
 * SECURITY: This can be spoofed easily by making a hard link to, or a copy
 * of, the executable. If you want your program to be sure that it has
 * found the one true installation location, you will have to verify that
 * for yourself. findx() just locates the most likely candidate.
 * 
 * PORTABILITY: This was developed on a Sun3 running SunOS 4.0, but I think I
 * at least made the algorithm portable. You may need to mess with include
 * files and such. Symbolic link searching is turned on if your errno.h
 * supplies ELOOP, and off otherwise; I assume that all systems with
 * symlinks have a readlink() call.
 * 
 * HOW TO USE IT: Here is the definition of the various parameters. Further
 * down you will find an example main, so fear not ...
 * 
 * findx (cmd, cwd, dir, pgm, run, path)
 * 
 * cmd	pass the command name (argv[0]) here.  findx() knows how to handle
 *	just about anything. If it starts with /, then we use the absolute
 *	name, and ignore the path. If it contains a /, then use the relative
 *	name and ignore the path. Otherwise, look for the file in each
 *	directory named in the path for the file; if there is no path, pretend
 *	it's "." as execvp() does.
 *	
 * cwd	pass a big buffer here. if this begins with a slash, I will assume
 *	it is filled in with the current working directory; otherwise, I will
 *	fill it in using getcwd(). Should be at least MAXPATHLEN bytes, if you
 *	do not fill it in yourself.
 *	
 * dir	pass a big buffer here. this gets the full path name of the
 *	directory that the executable was read from. Should be at least
 *	MAXPATHLEN bytes.
 *	
 * pgm	pass THE ADDRESS of a pointer variable here. findx() will fill the
 *	pointer variable with a pointer to the final component of the string
 *	passed as cmd above. Send a (char **)0 if you don't care about this.
 *	
 * run	pass THE ADDRESS of a pointer variable here. findx() will fill the
 *	pointer variable with a pointer to the final component of the name of
 *	the running program. Send a (char **)0 if you don't care about this.
 *	
 * path	pass the user's PATH variable here. I made it a parameter so you
 *	can fiddle with the path first. If you do not want to fiddle, pass
 *	getenv("PATH").
 * 
 * RETURN VALUES: Normally, findx() will return zero if all is well. If
 * something goes wrong, it will return -1 with the global variable
 * "errno" set to a corresponding error number.
 */

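#include <stdio.h>		/* sprintf(), printf(), perror() */
#include <string.h>		/* strlen(), strcpy() */
#include <strings.h>		/* rindex() */
#include <errno.h>
#include <unistd.h>		/* access(), getcwd(), readlink() */
#include <sys/param.h>		/* MAXPATHLEN, where available */

#ifndef	X_OK
#define	X_OK	1
#endif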

#ifndef	MAXPATHLEN
#define	MAXPATHLEN	1024
#endif

#ifndef	ENAMETOOLONG
#define	ENAMETOOLONG	EINVAL
#endif

int             findx ();	/* get location of directory */
int             resolve ();	/* get link resolution name */

#ifdef	TESTMAIN
extern char    *getenv ();	/* read value from environment */
char           *pn = (char *) 0;/* program name */
char           *rn = (char *) 0;/* run name */
char            rd[MAXPATHLEN];	/* run directory */
char            wd[MAXPATHLEN] = ".";	/* working directory */

int
main (argc, argv)
    int             argc;
    char          **argv;
{
    if (findx (*argv, wd, rd, &pn, &rn, getenv ("PATH")) < 0) {
	perror (*argv);		/* pn and rn may still be null here */
	return 1;
    }
    printf ("%s: %s running in %s from %s\n", pn, rn, wd, rd);
    return 0;
}

#endif

/*-
 * findx - find executable file in PATH
 * PARAMETERS:
 *	cmd	filename as typed by user
 *	cwd	where to return working directory
 *	dir	where to return program's directory
 *	pgm	where to return what user called it
 *	run	where to return final resolution name
 *	path	user's path from environment
 * RETURNS: returns zero for success, -1 for error (with errno set properly).
 */
int
findx (cmd, cwd, dir, pgm, run, path)
    char           *cmd;
    char           *cwd;
    char           *dir;
    char          **pgm;
    char          **run;
    char           *path;
{
    int             rv = 0;
    char           *f, *s;

    if (!cmd || !*cmd || !cwd || !dir) {
	errno = EINVAL;		/* stupid arguments! */
	return -1;
    }
    if (!path || !*path)	/* missing or null path */
	path = ".";		/* assume sanity */

    if (*cwd != '/')
	if (!(getcwd (cwd, MAXPATHLEN)))
	    return -1;		/* can't get working directory */

    f = rindex (cmd, '/');
    if (pgm)			/* user wants program name */
	*pgm = f ? f + 1 : cmd;

    if (dir) {			/* user wants program directory */
	rv = -1;
	if (*cmd == '/')	/* absname given */
	    rv = resolve ("", cmd + 1, dir, run);
	else if (f)		/* relname given */
	    rv = resolve (cwd, cmd, dir, run);
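	else if ((f = path) != 0) {	/* from searchpath */
	    rv = -1;
	    errno = ENOENT;	/* errno gets this if path empty */
	    while (*f && (rv < 0)) {
		char            abuf[MAXPATHLEN];
		int             n;

		s = f;
		while (*f && (*f != ':'))
		    ++f;
		n = f - s;	/* length of this PATH entry */
		if (*f)
		    ++f;	/* skip ':'; never write into the caller's PATH */
		if ((*s == '/' ? 0 : strlen (cwd) + 1) + n + 1 > MAXPATHLEN) {
		    errno = ENAMETOOLONG;
		    continue;	/* entry too long to build a name from */
		}
		if (*s == '/')
		    sprintf (abuf, "%.*s", n, s);
		else		/* relative or empty entry: anchor at cwd */
		    sprintf (abuf, "%s/%.*s", cwd, n, s);
		rv = resolve (abuf, cmd, dir, run);
	    }
	}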
    }
    return rv;
}

/*
 * resolve - check for specified file in specified directory sets up
 * dir, following symlinks. returns zero for success, or -1 for error
 * (with errno set properly)
 */
int
resolve (indir, cmd, dir, run)
    char           *indir;	/* search directory */
    char           *cmd;	/* search for name */
    char           *dir;	/* directory buffer */
    char          **run;	/* resolution name ptr ptr */
{
    char           *p;
    int             rv = -1;

#ifdef	ELOOP
    int             lcc = 0;
    int             sll;
    char            symlink[MAXPATHLEN + 1];

#endif

    do {
	errno = ENAMETOOLONG;
	if (strlen (indir) + strlen (cmd) + 2 > MAXPATHLEN)
	    break;

	sprintf (dir, "%s/%s", indir, cmd);
	if (access (dir, X_OK) < 0)
	    break;		/* not an executable program */

#ifdef	ELOOP
	while ((sll = readlink (dir, symlink, MAXPATHLEN)) >= 0) {
	    symlink[sll] = 0;
	    if (*symlink == '/')
		strcpy (dir, symlink);
	    else
		sprintf (rindex (dir, '/'), "/%s", symlink);
	}
	if (errno != EINVAL)
	    break;
#endif

	p = rindex (dir, '/');
	*p++ = 0;
	if (run)		/* user wants resolution name */
	    *run = p;
	rv = 0;			/* complete, with success! */

    } while (0);

    return rv;
}

--
-- Greg Limes	limes@sun.com	...!sun!limes	73327,2473	[choose one]

merlyn@iwarp.intel.com (Randal Schwartz) (02/08/90)

In article <1610.25d028a3@wums.wustl.edu>, bethge@wums writes:
[wants to know how to find the name of the current executable]
| Is there a better (more transparent) way?

No.  This was hashed out about a year ago in either c.u.q or c.u.w.
Basically, it boils down to the fact that you have no idea where you
came from, and the closest you could come is to count on the shells to
*mostly* give you the right answer *most* of the time in argv[0].
However, programs that do execv() are free to provide *whatever* they
want.  So, you're out of luck, and subject to spoofing.
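
The point about execv() can be seen directly.  The sketch below runs
/bin/sh with an argv[0] of the caller's choosing and reads back what
the shell reports as $0.  It assumes a POSIX-style system with /bin/sh
present; the function name spoofed_name() is made up for this example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run /bin/sh with an argv[0] of our choosing and capture what the
 * shell believes its own name ($0) to be.  Returns 0 on success.
 */
int spoofed_name(const char *fake, char *out, size_t outlen)
{
    int fd[2];
    pid_t pid;
    ssize_t n;
    size_t used = 0;
    int status;

    if (outlen == 0 || pipe(fd) < 0)
        return -1;
    if ((pid = fork()) < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                     /* child */
        char *args[4];

        args[0] = (char *) fake;        /* anything at all */
        args[1] = "-c";
        args[2] = "echo \"$0\"";
        args[3] = NULL;
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]);
        close(fd[1]);
        execv("/bin/sh", args);         /* the file run is still /bin/sh */
        _exit(127);
    }
    close(fd[1]);
    while (used < outlen - 1 &&
           (n = read(fd[0], out + used, outlen - 1 - used)) > 0)
        used += (size_t) n;
    close(fd[0]);
    out[used] = '\0';
    out[strcspn(out, "\n")] = '\0';     /* drop the trailing newline */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

#ifdef TESTMAIN
int main(void)
{
    char name[256];

    if (spoofed_name("not-the-real-name", name, sizeof name) == 0)
        printf("the shell thinks it is called: %s\n", name);
    return 0;
}
#endif
```

The executable is /bin/sh in every case, yet the program sees only
whatever string its parent chose to hand it.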

Just another UNIX hacker,
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/

les@chinet.chi.il.us (Leslie Mikesell) (02/09/90)

In article <1610.25d028a3@wums.wustl.edu> bethge@wums.wustl.edu writes:

>I like to write programs that users can use without having  to  know
>details  of  their  inner  workings.   Suppose  a program needs some
>standard data which the user doesn't need to be concerned with,  and
>which  for  various reasons needs to be read from a file rather than
>compiled in.  The question is, how does the program find the file?

The normal unix choices would be:
 1) connect the file to one of the stdio streams before execution.
    This has the advantage of allowing pipes to work and can be
    hidden from the user by a shell wrapper.
 2) have a "start-up" configuration file in a standard place that
    sets all the other options.  You might also look for a second
    set up file in the user's HOME directory.
 3) (my favorite) Put all the options on the command line with
    reasonable defaults compiled in.  Then if the desired options
    become clumsy to type in, just add a shell wrapper for the
    common variations.  As long as you are calling getopt() you
    might as well anticipate every option anyone might want.

Les Mikesell
  les@chinet.chi.il.us

bph@buengc.BU.EDU (Blair P. Houghton) (02/13/90)

In article <1990Feb7.211538.3894@iwarp.intel.com> merlyn@iwarp.intel.com (Randal Schwartz) writes:
>In article <1610.25d028a3@wums.wustl.edu>, bethge@wums writes:
>[wants to know how to find the name of the current executable]
>| Is there a better (more transparent) way?
>
>No.  This was hashed out about a year ago in either c.u.q or c.u.w.
>Basically, it boils down to the fact that you have no idea where you
>came from, and the closest you could come is to count on the shells to
>*mostly* give you the right answer *most* of the time in argv[0].
>However, programs that do execv() are free to provide *whatever* they
>want.  So, you're out of luck, and subject to spoofing.
>
>Just another UNIX hacker,

Sum hecker.

Take argv[0], if it doesn't have the path, or the full path,
cut it up to get the command name, say "prog", then strcat(3)
it onto "/usr/ucb/which" and call system(3):

    char *foo = "/usr/ucb/which prog";
    system(foo);

As long as you're still in the directory from which the
program was run, and as long as your path was the same
as the one set in your .cshrc (someone please tell me
why which(1) reads the .cshrc...) then you'll come up
with /usr/foo/bar/bletch/prog, barring surreptition.

As we saw last week, you can use any of a number
of rather machine-specific exec*() commands to get
which(1) to run, but only system(3) shows up in ANSI C.

Getting the output of which(1) back to the program can
take a number of routes, by a temporary file
("/usr/ucb/which prog > file"), or a socket, or dup'ping
streams, or...

				--Blair
				  "...but that's another
				   question..."

kohli@gemed (Jim Kohli) (02/13/90)

In article <5378@buengc.BU.EDU>, bph@buengc.bu.edu (Blair P. Houghton) writes:
<In article <1990Feb7.211538.3894@iwarp.intel.com> merlyn@iwarp.intel.com (Randal Schwartz) writes:
<>In article <1610.25d028a3@wums.wustl.edu>, bethge@wums writes:
<>[wants to know how to find the name of the current executable]
<>| Is there a better (more transparent) way?
<>
<>No.  This was hashed out about a year ago in either c.u.q or c.u.w.
<>Basically, it boils down to the fact that you have no idea where you
<>came from, and the closest you could come is to count on the shells to
<>*mostly* give you the right answer *most* of the time in argv[0].
<>However, programs that do execv() are free to provide *whatever* they
<>want.  So, you're out of luck, and subject to spoofing.
<>
<>Just another UNIX hacker,
<
<Sum hecker.
<
<Take argv[0], if it doesn't have the path, or the full path,
<cut it up to get the command name, say "prog", then strcat(3)
<it onto "/usr/ucb/which" and call system(3):
<
<    char *foo = "/usr/ucb/which prog";
<    system(foo);
<
<As long as you're still in the directory from which the
<program was run, and as long as your path was the same
<as the one set in your .cshrc (someone please tell me
                                ^^^^^^^^^^^^^^^^^^^^^^
<why which(1) reads the .cshrc...) then you'll come up
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
<with /usr/foo/bar/bletch/prog, barring surreptition.
<
<[...the rest of Blair's fine response excised...]
<
reading .cshrc may be somewhat helpful in resolving
aliases, eh?  although this doesn't help, as you said,
for cases where the path gets munged at the same time.

jim kohli
ge medical systems

scott@csusac.csus.edu (L. Scott Emmons) (02/13/90)

In article <5378@buengc.BU.EDU> bph@buengc.bu.edu (Blair P. Houghton) writes:
>(someone please tell me why which(1) reads the .cshrc...)
Because 'which' is a C-shell script itself...and when a C-shell script is called
a new csh is forked, and .cshrc is read whenever a csh is started up.
-- 
			L. Scott Emmons
			---------------
		...[!ucbvax]!ucdavis!csusac!scott
		ucdavis!csusac!scott@ucbvax.berkeley.edu

wisner@hayes.fai.alaska.edu (Bill Wisner) (02/13/90)

In article <5378@buengc.BU.EDU>, bph@buengc (Blair P. Houghton) writes:
>			       (someone please tell me
>why which(1) reads the .cshrc...)

Because which(1) is a csh script.

Bill Wisner <wisner@hayes.fai.alaska.edu> Gryphon Gang Fairbanks AK 99775

merlyn@iwarp.intel.com (Randal Schwartz) (02/14/90)

In article <5378@buengc.BU.EDU>, bph@buengc (Blair P. Houghton) writes:
| Sum hecker.
| 
| Take argv[0], if it doesn't have the path, or the full path,
| cut it up to get the command name, say "prog", then strcat(3)
| it onto "/usr/ucb/which" and call system(3):
| 
|     char *foo = "/usr/ucb/which prog";
|     system(foo);
| 
| As long as you're still in the directory from which the
| program was run, and as long as your path was the same
| as the one set in your .cshrc (someone please tell me
| why which(1) reads the .cshrc...) then you'll come up
| with /usr/foo/bar/bletch/prog, barring surreptition.

But, this is exactly what I said was subject to spoofing and failure!

There is no general solution that works in all cases, although you can
get a useful answer under *many* typical circumstances.

In case this isn't *very* obvious... remember: argv[0] is ARBITRARY!
Just because the shells typically pass the name of the command (with
or without a leading path, depending on the shell) in argv[0]
*doesn't* mean you can depend on it!

Try reading a little closer next time, please.

Just another UNIX hacker,
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/

dce@smsc.sony.com (David Elliott) (02/14/90)

In article <1990Feb13.095913.29040@hayes.fai.alaska.edu> wisner@hayes.fai.alaska.edu (Bill Wisner) writes:
>In article <5378@buengc.BU.EDU>, bph@buengc (Blair P. Houghton) writes:
>>			       (someone please tell me
>>why which(1) reads the .cshrc...)
>
>Because which(1) is a csh script.

Chuckle.  Pretty good.

I don't think that's what Blair was asking.  Obviously he knows it's a
csh script, since it would be hard for him to know that it reads
.cshrc otherwise.

The question is: Why does which run without the -f option, which would
cause it *not* to read .cshrc?

I think that the answer is that it wants to handle aliases.

The problem is that this can also cause the path to be changed (people
who use remote shells know to define the path in .cshrc).  One
possibility to "fix" which would be to have it run with -f, check the
path for all possibilities, source the .cshrc, and then check for
aliases.  Of course, this has problems, too.

Personally, I prefer builtin commands for doing this job, like type in
sh and Tony Birnseth's builtin which for csh.

-- 
David Elliott
dce@smsc.sony.com | ...!{uunet,mips}!sonyusa!dce
(408)944-4073

richard@aiai.ed.ac.uk (Richard Tobin) (02/14/90)

In article <5378@buengc.BU.EDU> bph@buengc.bu.edu (Blair P. Houghton) writes:
> someone please tell me why which(1) reads the .cshrc...

If you're asking "why does which(1) assume I use csh", then:

Different shells potentially interpret commands in completely different
ways.  A command like which *has* to depend on your shell.  It seems clear
to me that which should be built-in to csh and sh - that way it would
always be right.

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

barnett@crdgw1.crd.ge.com (Bruce Barnett) (02/14/90)

>>>			       (someone please tell me
>>>why which(1) reads the .cshrc...)
>>
>>Because which(1) is a csh script.

>The question is: Why does which run without the -f option, which would
>cause it *not* to read .cshrc?

The question is why people post without spending a couple of minutes
reading the script file.

It DOES use the -f option on start-up.
It DOES explicitly source the .cshrc file.

Use the source, Luke!

--
Bruce G. Barnett	<barnett@crd.ge.com>   uunet!crdgw1!barnett

gwyn@smoke.BRL.MIL (Doug Gwyn) (02/14/90)

In article <1747@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>Different shells potentially interpret commands in completely different
>ways.  A command like which *has* to depend on your shell.  It seems clear
>to me that which should be built-in to csh and sh - that way it would
>always be right.

Well, without a precise definition of what it is that we expect "which"
to do, the issue cannot be settled.

My own view is that "which" and "every" should report ONLY on $PATH-based
commands (assuming standard UNIX, i.e., not multiple mounts on /bin), and
that "whatis" should be a builtin that produces a definition suitable for
feeding back to the shell (unlike System V's "type" builtin).

Here are some typical examples:

$ echo $PATH
/usr/lbin:/usr/5bin:/bin:/usr/bin:~/bin:/usr/ucb:.
$ whatis which
/usr/lbin/which
$ whatis every
/usr/lbin/every
$ whatis whatis		# 8th or 9th Edition UNIX or BRL Bourne shell
builtin whatis
$ whatis cd
builtin cd
$ whatis builtin	# 8th or 9th Edition UNIX or BRL Bourne shell
builtin builtin
$ whatis l
l () { ( set +u ; exec ls -bCF $* ) }
$ whatis sh
/usr/lbin/sh
$ whatis xyzzy
# xyzzy not found
$ which which
/usr/lbin/which
$ which every
/usr/lbin/every
$ which whatis
/usr/ucb/whatis
$ which cd
which: cd: not found
$ which builtin
which: builtin: not found
$ which l
which: l: not found
$ which sh
/usr/lbin/sh
$ which xyzzy
which: xyzzy: not found
$ every which
/usr/lbin/which
/usr/ucb/which
$ every every
/usr/lbin/every
$ every whatis
/usr/ucb/whatis
$ every cd
every: cd: not found
$ every builtin
every: builtin: not found
$ every l
every: l: not found
$ every xyzzy
every: xyzzy: not found
$ every sh
/usr/lbin/sh
/usr/5bin/sh
/bin/sh

(Actually, for interactive use I normally redefine "cd" and "which" using
shell functions, but the example is clearer if I show the default behavior.)

jc@minya.UUCP (John Chambers) (02/15/90)

> As long as you're still in the directory from which the
> program was run, and as long as your path was the same
> as the one set in your .cshrc (someone please tell me
> why which(1) reads the .cshrc...) then you'll come up
> with /usr/foo/bar/bletch/prog, barring surreptition.

I've been mystified about this on some Ultrix machines at work,
especially since this causes it to give the wrong result most
of the time.  When I got ahold of this machine (an ESIX system), 
I was further surprised to find that which didn't even exist.  
And here I'd thought it was a universal csh builtin.  Just shows 
how naive I was.

So I decided to try my hand at implementing it.  Half an hour later,
I had it working.  It works in the obvious way, using the PATH from
its environment, and gives the right result.  Something even more
surprising:  You know how the csh builtin has this several-second
delay before it answers?  Well, my little program answers with no
discernable delay.

How could they have all gotten it all so wrong?  I feel like posting
my program, but I'd feel a bit silly to do so, because it's such
a piece of trivia.  I mean, talk about a Programming 101 assignment.
At least, I think I'll take it to work, so I can find things on 
the Ultrix systems.
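
For readers who want to see roughly what such a program involves,
here is a minimal sketch of a PATH search in C.  It is not the program
described above, which was never posted; the function name
which_path() and its interface are assumptions made for this example.
It knows nothing about aliases, shell functions or builtins.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Search the colon-separated list "path" for an executable regular file
 * called "name".  On success the full pathname is stored in out and 0 is
 * returned; otherwise -1 is returned and the contents of out are
 * unspecified.  An empty component means the current directory.
 */
int which_path(const char *name, const char *path, char *out, size_t outlen)
{
    const char *p = path;
    struct stat st;

    if (!name || !*name || !path || strchr(name, '/'))
        return -1;
    for (;;) {
        size_t n = strcspn(p, ":");     /* length of this component */
        int len;

        if (n == 0)
            len = snprintf(out, outlen, "./%s", name);
        else
            len = snprintf(out, outlen, "%.*s/%s", (int) n, p, name);
        if (len > 0 && (size_t) len < outlen &&
            stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
            access(out, X_OK) == 0)
            return 0;
        if (p[n] == '\0')
            break;
        p += n + 1;                     /* step past the ':' */
    }
    return -1;
}

#ifdef TESTMAIN
#include <stdlib.h>

int main(int argc, char **argv)
{
    char full[1024];
    const char *path = getenv("PATH");
    int i, rc = 0;

    for (i = 1; i < argc; i++) {
        if (path && which_path(argv[i], path, full, sizeof full) == 0)
            puts(full);
        else {
            fprintf(stderr, "which: %s: not found\n", argv[i]);
            rc = 1;
        }
    }
    return rc;
}
#endif
```

Because it reads PATH from its own environment and starts no shell,
there is no .cshrc to source and nothing to wait for.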

(Random sounds of disgust and exasperation.)

-- 
John Chambers ...!{harvard,ima,mit-eddie}!minya!jc

[Sorry, no clever saying today.]