[comp.sources.d] Finding where an executable was run from -- a proposal.

gnu@hoptoad.uucp (John Gilmore) (05/11/88)

tif@cpe.UUCP wrote:
>               ...it sounds like perl should have a special variable
> that is like $0 only contains a full path.

I have often wanted exactly this for C utilities.  Wouldn't it be nice
if you didn't have to build in the names of your control files -- if
the executable could derive the names of its config files from its own name?

The problem is that Unix doesn't provide a way to tell your own name.
What is passed in argv[0] need not bear any relation to the name of
the program (and often doesn't, if the shell has searched PATH to find
the executable).  On the other hand, the first argument to exec() is
always the correct path of the executable (either an absolute or
relative path).  But it's not available to the executed program.

If exec() would pass this value to the executed program, say as
argv[-1], then a program could reliably know its own name, and apply a
simple transformation to it to find its data files (e.g. for program
"XXXXXX/foo", its data files are found in "XXXXXX/lib/foo/whatever").
This works for all values of XXXXXX, whether absolute or relative.
For a subsystem like uucp, you would turn e.g. XXXXXX/uucico into
XXXXXX/lib/uucp/whatever (replace program name with subsystem name).
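
A minimal sketch of this transformation in C. The function name data_path and its parameters are hypothetical, and execname stands in for the proposed argv[AV_EXECNAME], which no current exec() supplies.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<dir>/lib/<subsystem>/<file>" from the pathname a program was
 * exec'd with.  execname is a stand-in for the proposed
 * argv[AV_EXECNAME]; nothing provides it today.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
int data_path(const char *execname, const char *subsystem,
              const char *file, char *buf, size_t buflen)
{
    const char *slash = strrchr(execname, '/');
    int n;

    if (slash == NULL)      /* bare name: treat as relative to "." */
        n = snprintf(buf, buflen, "lib/%s/%s", subsystem, file);
    else                    /* keep everything before the last '/' */
        n = snprintf(buf, buflen, "%.*s/lib/%s/%s",
                     (int)(slash - execname), execname, subsystem, file);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Called with "XXXXXX/uucico" and subsystem "uucp", this yields a path under XXXXXX/lib/uucp. A relative result is only usable until the program changes directory.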

This would make lots of application programs easier to install; you just
copy it into somewhere on your PATH and it will run.  For all those "shrink
wrap applications" that ABI is likely to provide, this would be a major win.
It would also reduce the volume of arcane knowledge required to run
a Unix system (e.g. where are the netnews control files kept?  How about
crontab?  How about sendmail configs?  How about inet daemon config?)

If anyone implements this, I recommend providing a #define AV_EXECNAME -1
and documenting that argv[AV_EXECNAME] is the pathname given to exec().
No sense embedding another magic number (-1) into programs...
-- 
John Gilmore  {sun,pacbell,uunet,pyramid,ihnp4}!hoptoad!gnu        gnu@toad.com
"Use the Source, Luke...."

limes@sun.uucp (Greg Limes) (05/12/88)

In article <4527@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>If exec() would pass this value to the executed program, say as
>argv[-1], then a program could reliably know its own name, and apply a
>simple transformation to it to find its data files (e.g. for program
>"XXXXXX/foo", its data files are found in "XXXXXX/lib/foo/whatever").
>This works for all values of XXXXXX, whether absolute or relative.
>For a subsystem like uucp, you would turn e.g. XXXXXX/uucico into
>XXXXXX/lib/uucp/whatever (replace program name with subsystem name).

In Turbo-C, Borland passes the complete path name of the program
executed as argv[0].  This may be specific to Turbo-C, or may be
general across MS-DOS.  Are there any programs that this would break?

>If anyone implements this, I recommend providing a #define AV_EXECNAME -1
>and documenting that argv[AV_EXECNAME] is the pathname given to exec().
>No sense embedding another magic number (-1) into programs...

... and defining AV_EXECNAME as (0) would make this work for Turbo-C and
any other environments that do as I described above. As things stand
now, if argv[0][0] is '/' then that string is *usually* the name of the
executing program.
-- 
   Greg Limes
#include <std-disclaimer.h>

root@cca.ucsf.edu (Computer Center) (05/12/88)

In article <4527@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:

> If exec() would pass this value to the executed program, say as
> argv[-1], then a program could reliably know its own name, and apply a
> simple transformation to it to find its data files (e.g. for program
> "XXXXXX/foo", its data files are found in "XXXXXX/lib/foo/whatever").
> This works for all values of XXXXXX, whether absolute or relative.
> For a subsystem like uucp, you would turn e.g. XXXXXX/uucico into
> XXXXXX/lib/uucp/whatever (replace program name with subsystem name).

Noooooooooo!

If the program is in XXXXXX/bin/foo its support should be reachable
                            ^^^
via XXXXXX/lib/foo.

Thos

Thos Sumner       (thos@cca.ucsf.edu)   BITNET:  thos@ucsfcca
(The I.G.)        (...ucbvax!ucsfcgl!cca.ucsf!thos)

OS|2 -- an Operating System for puppets.

#include <disclaimer.std>

andrew@frip.gwd.tek.com (Andrew Klossner) (05/12/88)

[]

	"This would make lots of application programs easier to
	install; you just copy it into somewhere on your PATH and it
	will run."

If an application uses this scheme to find its associated files, some
useful Unix idioms cease to work.  For example, say that "rn" lives in
/usr/news, but I don't want /usr/news in my PATH (too many nasty
commands are also there).  At present I can put a link to /usr/news/rn
in a directory that is in my path (e.g., my local bin).  With the
proposed scheme, that would cause rn to look in my_local_bin/lib/* for
its data files instead of in /usr/news/lib/*.

  -=- Andrew Klossner   (decvax!tektronix!tekecs!andrew)       [UUCP]
                        (andrew%tekecs.tek.com@relay.cs.net)   [ARPA]

jv@mhres.mh.nl (Johan Vromans) (05/13/88)

From article <4527@hoptoad.uucp>, by gnu@hoptoad.uucp (John Gilmore):
> If anyone implements this, I recommend providing a #define AV_EXECNAME -1
> and documenting that argv[AV_EXECNAME] is the pathname given to exec().

I'm already using the convention that library/data files belonging to a
program are located in a path relative to the name of the program. So I
strongly second this suggestion. Until this is adopted by the next C
standard, we'll need to have a library routine which does the job, based on
argv[0] and the PATH variable (despite the possible problems - there's
no better way).
-- 
Johan Vromans                              | jv@mh.nl via European backbone
Multihouse N.V., Gouda, the Netherlands    | uucp: ..{uunet!}mcvax!mh.nl!jv
"It is better to light a candle than to curse the darkness"

wesommer@athena.mit.edu (William Sommerfeld) (05/13/88)

In article <9987@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew Klossner) writes:
>[]
>
>	"This would make lots of application programs easier to
>	install; you just copy it into somewhere on your PATH and it
>	will run."
>
>If an application uses this scheme to find its associated files, some
>useful Unix idioms cease to work.  For example, say that "rn" lives in
>/usr/news, but I don't want /usr/news in my PATH (too many nasty
>commands are also there).  At present I can put a link to /usr/news/rn
>in a directory that is in my path (e.g., my local bin).  With the
>proposed scheme, that would cause rn to look in my_local_bin/lib/* for
>its data files instead of in /usr/news/lib/*.

I remarked (in private mail) to John Gilmore that what he described
was very similar to the Multics referencing_dir mechanism. If it's
done right, the application gets passed its _real_ absolute pathname,
after all the symlinks have been chased.

While I'm here, I might as well lobby for support for a library
function/system call which canonicalizes a pathname, chasing all the
links and turning it into an absolute pathname.  abs_path(".", buf) should
be equivalent to getwd(buf).  It was useful on Multics.  It would be
very useful in some cases on UNIX.

While normally I am opposed to creating new system calls, a reasonable
implementation of abs_path in user code (assuming it didn't "cheat"
and use chdir()) would most likely be O(n**2) in terms of directory
lookups, whereas a version inside the kernel would be O(n).  (Of
course, you could always do what was done in both Amber and AEGIS:
move namei() or its equivalent out of the kernel and into a shared
library..)
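
A minimal sketch of such a routine, assuming a system that provides the POSIX realpath() library function. The name abs_path comes from the text above. Delegating to realpath() is an assumption of this sketch, not the kernel implementation being lobbied for.

```c
#define _XOPEN_SOURCE 700   /* ask for the realpath() declaration */
#include <limits.h>
#include <stdlib.h>

/*
 * Canonicalize path: resolve symbolic links, "." and "..", and produce
 * an absolute pathname in buf, which must hold at least PATH_MAX bytes.
 * This version simply delegates to realpath().
 * Returns buf on success, NULL on failure (errno set by realpath()).
 */
char *abs_path(const char *path, char *buf)
{
    return realpath(path, buf);
}
```

With this definition abs_path(".", buf) gives the same string as getcwd(), the modern counterpart of getwd(buf).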

				Bill Sommerfeld
				wesommer@athena.mit.edu

dmcanzi@watdcsu.waterloo.edu (David Canzi) (05/13/88)

How about if binary software was routinely distributed as (1) a library
containing most of the compiled code, (2) a short C source program in
which configurable information is compiled as external variables, and
(3) a makefile which can be edited to define the configurable options
by compiling the C source file with suitable "-D" options.  (Or perhaps
it would be simpler to edit the C source file directly.)

This way, if the program as distributed searches for config and data
files under /usr/lib/thingumbob, and you would rather install these
files under /usr/local/lib/thingumbob, you'd have that option.

And there will be no need to add another feature to the kernel.

-- 
David Canzi

dg@lakart.UUCP (David Goodenough) (05/13/88)

In article <4527@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>If exec() would pass this value to the executed program, say as
>argv[-1], then a program could reliably know its own name, and apply a
>simple transformation to it to find its data files (e.g. for program
>"XXXXXX/foo", its data files are found in "XXXXXX/lib/foo/whatever").
>This works for all values of XXXXXX, whether absolute or relative.
>For a subsystem like uucp, you would turn e.g. XXXXXX/uucico into
>XXXXXX/lib/uucp/whatever (replace program name with subsystem name).

Wait just a minute.

If the information is REALLY important, argv[0] is the FULL PATH NAME
that the program was invoked with:

Script started on Fri May 13 10:32:21 1988
lakart!dg(~)[1]-> cat eco.c
#include <stdio.h>

int main(int argc, char **argv)
 {
    printf("%s\n", argv[0]);
    return 0;
 }
lakart!dg(~)[2]-> eco
eco
lakart!dg(~)[3]-> ./eco
./eco
lakart!dg(~)[4]-> cd ..
lakart!dg(/u2)[5]-> dg/eco
dg/eco
lakart!dg(/u2)[6]-> cd dg/src
lakart!dg(src)[7]-> ../eco
../eco
lakart!dg(src)[8]-> echo ~dg/eco
/u2/dg/eco
lakart!dg(src)[9]-> ~dg/eco
/u2/dg/eco
lakart!dg(src)[10]-> ^D
script done on Fri May 13 10:33:05 1988

Now, if argv[0][0] is a '/' everything is OK, else just do a

popen("pwd", "r");

suck it all up, and prepend it to argv[0], with an intervening '/'. You
may not have an optimal path, BUT IT WILL BE CORRECT, and ABSOLUTE.
Now you can go to work.
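
A minimal sketch of the prepend step, using getcwd() in place of popen("pwd", "r"); that substitution is a choice made for this sketch. The function name absolute_from_arg0 is hypothetical. It only helps when argv[0] contains a '/', and it must run before any chdir().

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Turn the name a program was invoked with into an absolute pathname by
 * prepending the current directory when the name is relative.
 * The result is not canonical: "../eco" stays "<cwd>/../eco".
 * Returns 0 on success, -1 on failure.
 */
int absolute_from_arg0(const char *arg0, char *buf, size_t buflen)
{
    char cwd[PATH_MAX];
    int n;

    if (arg0[0] == '/') {
        n = snprintf(buf, buflen, "%s", arg0);      /* already absolute */
    } else {
        if (getcwd(cwd, sizeof cwd) == NULL)
            return -1;
        n = snprintf(buf, buflen, "%s/%s", cwd, arg0);
    }
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```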
-- 
	dg@lakart.UUCP - David Goodenough		+---+
							| +-+-+
	....... !harvard!adelie!cfisun!lakart!dg	+-+-+ |
						  	  +---+

daveb@geac.UUCP (David Collier-Brown) (05/16/88)

In article <5307@bloom-beacon.MIT.EDU> wesommer@athena.mit.edu (William Sommerfeld) writes:
| I remarked (in private mail) to John Gilmore that what he described
| was very similar to the Multics referencing_dir mechanism. If it's
| done right, the application gets passed its _real_ absolute pathname,
| after all the symlinks have been chased.
| 
| While I'm here, I might as well lobby for support for a library
| function/system call which canonicalizes a pathname, chasing all the
| links and turning it into an absolute pathname.  abs_path(".", buf) should
| be equivalent to getwd(buf).  It was useful on Multics.  It would be
| very useful in some cases on UNIX.
|...
| 				Bill Sommerfeld
| 				wesommer@athena.mit.edu

I have a copy of a program called "name" which appears to do just
that (the O(n**2) variant), whose origin is unknown.  Would the
author care to (re)post it?  Shall I?
-- 
 David Collier-Brown.  {mnetor yunexus utgpu}!geac!daveb
 Geac Computers Ltd.,  | "His Majesty made you a major 
 350 Steelcase Road,   |  because he believed you would 
 Markham, Ontario.     |  know when not to obey his orders"

swh@hpsmtc1.HP.COM (Steve Harrold) (05/18/88)

Re: the "name()" function

Please post it

---------------------
Steve Harrold			...hplabs!hpsmtc1!swh
				HPG200/13
				(408) 447-5580
---------------------

mer6g@uvaarpa.virginia.edu (Marc E. Rouleau) (05/18/88)

In article <107@lakart.UUCP> dg@lakart.UUCP (David Goodenough) writes:
> [ some examples pointing out that (in some cases) what ends up in
>   argv[0] can be turned into a full pathname by prepending `pwd` to it ]

This technique works only if the program being executed is invoked by
specifying a relative or absolute path for it.  If it is found by your
shell as specified by your $PATH variable, all bets are off ...

The whole point of John Gilmore's proposal was to address this problem
of executables being found by path-search and therefore having no
invocation-time knowledge of where they reside.

	-- Marc Rouleau

allbery@ncoast.UUCP (Brandon S. Allbery) (05/19/88)

As quoted from <4527@hoptoad.uucp> by gnu@hoptoad.uucp (John Gilmore):
+---------------
| If exec() would pass this value to the executed program, say as
| argv[-1], then a program could reliably know its own name, and apply a
| simple transformation to it to find its data files (e.g. for program
| "XXXXXX/foo", its data files are found in "XXXXXX/lib/foo/whatever").
| This works for all values of XXXXXX, whether absolute or relative.
+---------------

...until the program does a chdir(), at which point the program must have
resolved a relative pathname into an absolute one or it won't be able to use
the path any more.

Actually, the biggest problem with this is that by the time the kernel has
the executable, the pathname has been changed to a (dev, ino) pair.  This is
less than useful.  And as far as I know, the kernel doesn't keep the pathname
around any longer than necessary (that being namei()).

And what happens if I "ln /usr/lib/uucp/uucico ~/etc/poll"?  (Not that I
advocate doing so, but....)
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
	{well!hoptoad,uunet!marque,cbosgd,sun!mandrill}!ncoast!allbery
Delphi: ALLBERY						     MCI Mail: BALLBERY

dgk@ulysses.homer.nj.att.com (David Korn[eww]) (05/21/88)

ksh passes the full pathname of the executable as the first environment
variable and names it _.  Thus, if the program is run by ksh,
getenv("_"); returns a pathname for the executable.  Now if everyone
would follow this convention the problem would be solved.
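
A minimal sketch of reading that variable. The function name exec_path_hint is hypothetical, and rejecting values that are not absolute paths is an extra precaution assumed here. The value is inherited across exec() and can be set by any caller, so it is only a hint.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>

/*
 * Return the pathname the invoking shell recorded in "_", or NULL if
 * the variable is absent or does not hold an absolute path.
 * The pointer refers to the environment; do not modify or free it.
 */
const char *exec_path_hint(void)
{
    const char *p = getenv("_");

    return (p != NULL && p[0] == '/') ? p : NULL;
}
```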

David Korn
ulysses!dgk

gnu@hoptoad.uucp (John Gilmore) (05/23/88)

mer6g@uvaarpa.virginia.edu (Marc E. Rouleau) wrote:
> The whole point of John Gilmore's proposal was to address this problem
> of executables being found by path-search and therefore having no
> invocation-time knowledge of where they reside.

Actually, that's not the point (you could always write a subroutine
that searched the path to find argv[0]).  The point is that I want a
mechanism that cannot be spoofed.  Mystery variables in the
environment, library routines that look at argv[0], etc, can all be
spoofed by a 3-line program (that changes the environment then calls
exec(), or that passes different things as the filename to execute
versus argv[0] to exec()).  If real applications are going to use this,
it's critical that they are able to depend upon the pointer they find.
Imagine if you could invoke uucp with your own private set of spool
directories -- all the security built into it would be pretty useless.
You could steal mail, forge things, etc.

Several people pointed out that hard links to a program would foul up
my proposed mechanism -- they are right.  You can prevent people from
copying the application to another spot and running it (by making the
program unreadable by mortals, even though executable by them); the
kernel could resolve all the symlinks before giving you the path; but
you can hard-link to anything you can name, even if it has '0'
permissions.

Perhaps a solution is to only provide this facility to programs with a
link count of 1?  In other words, if you hard-link to a program, it
would no longer be provided its name, and could exit with an error
message.  This is probably even worse -- by simply creating a hard link
to /usr/lib/uucico, you could make it unable to find its directories,
even when run from the right place.

Perhaps a variation on one of these themes would work.  If the kernel
was to output pathnames to user programs (presumably so that the
programs can trust that the pathnames are uncorrupted), it might
be reasonable to put some kinds of access control on hard links.
Maybe if/when Unix file protection ever gets revised, this can be
considered.

Someone else pointed out that Multics can reliably tell you the
pathname of your running program.  Maybe Multics didn't have hard
links, so it was trustable?
-- 
John Gilmore  {sun,pacbell,uunet,pyramid,ihnp4}!hoptoad!gnu        gnu@toad.com
"Use the Source, Luke...."

dlm@cuuxb.ATT.COM (Dennis L. Mumaugh) (05/25/88)

In article <4626@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>Actually, that's not the point (you could always write a subroutine
>that searched the path to find argv[0]).  The point is that I want a
>mechanism that cannot be spoofed.  Mystery variables in the
>environment, library routines that look at argv[0], etc, can all be
>spoofed by a 3-line program (that changes the environment then calls
>exec(), or that passes different things as the filename to execute
>versus argv[0] to exec()).  If real applications are going to use this,
>it's critical that they are able to depend upon the pointer they find.

There is such a facility that originated in Version 8  that  will
see  light  of  day in NEW ATT releases of UNIX.  This is part of
the /proc file system.  It  is  an  ioctl  that  returns  a  file
descriptor of the text of the process.
    PIOCOPENT -- provides a read-only  file  descriptor  for  the
    executable  file  associated with the "traced" process.  This
    allows a debugger to find the symbol table without having  to
    know any path names.

Once you have the file descriptor, fstat it and get  the  device,
inode  pair,  and then execute ncheck -i on the correct device to
get a path name.  Of course  this  is  modulo  links  (  hard  or
symbolic).

What?  You must be root to run ncheck?  True, but why  would  you
be so concerned about being lied to otherwise?

Actually, the PIOCOPENT would have been very useful  for  the  V6
Adventure  game that Jim Gillogly wrote.  He put the messages for
the game at the end of the a.out following the "meaningful"  part
of  the  a.out.  His program did all sorts of contortions to find
the name of the file so it could be opened and read. [It  had  to
be installation and user independent/proof.]

With the ioctl from /proc, it would be a four-line code section:

	sprintf(procname,"/proc/%05d",getpid());
	procfd = open(procname,O_RDONLY);
	textfd = ioctl(procfd,PIOCOPENT);
	lseek(textfd, (long)offset,0);

So, if that is what you are really intending to do, use Version 8
or wait a year -- it is already available for System V Release 3.1 for
the 3B4000.
-- 
=Dennis L. Mumaugh
 Lisle, IL       ...!{ihnp4,cbosgd,lll-crg}!cuuxb!dlm

pjh@mccc.UUCP (Pete Holsberg) (05/25/88)

In article <10310@ulysses.homer.nj.att.com> dgk@ulysses.homer.nj.att.com (David Korn[eww]) writes:
...
...ksh passes the full pathname of the executable as the first environment
...variable and names it _.  Thus, if the program is run by ksh,
...getenv("_"); returns a pathname for the executable.  Now if everyone
...would follow this convention the problem would be solved.

Aspen Technology's implementation of ksh sets _ to just the name of the
executable.  (At least, that is what is stored in _ in the environment space.)

aburt@isis.UUCP (Andrew Burt) (05/26/88)

In article <4626@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>mer6g@uvaarpa.virginia.edu (Marc E. Rouleau) wrote:
>> The whole point of John Gilmore's proposal was to address this problem
>> of executables being found by path-search and therefore having no
>> invocation-time knowledge of where they reside.
>
>Actually, that's not the point (you could always write a subroutine
>that searched the path to find argv[0]).  The point is that I want a
>mechanism that cannot be spoofed.  Mystery variables in the
>environment, library routines that look at argv[0], etc, can all be
>spoofed by a 3-line program (that changes the environment then calls
>exec(), or that passes different things as the filename to execute
>versus argv[0] to exec()).  If real applications are going to use this,
>it's critical that they are able to depend upon the pointer they find.
>Imagine if you could invoke uucp with your own private set of spool
>directories -- all the security built into it would be pretty useless.
>You could steal mail, forge things, etc.

If all you need is a secure method of obtaining a single pathname (e.g., for
the lib dir of an application) why not use [**kludge alert**] a dummy
entry in /etc/passwd with home dir set to the path desired (actual login
disabled, of course)?  Program wanting to know its lib dir just does
getpwnam(compiled_in_application_id) and off it goes.
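
A minimal sketch of that lookup. The function name app_lib_dir is hypothetical, and the application id is whatever name the installer placed in /etc/passwd.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <pwd.h>
#include <stddef.h>

/*
 * Return the home directory of the pseudo-user that stands for an
 * application, or NULL if there is no such passwd entry.
 * The pointer refers to static storage owned by getpwnam() and is
 * overwritten by later passwd lookups.
 */
const char *app_lib_dir(const char *app_id)
{
    struct passwd *pw = getpwnam(app_id);

    return pw != NULL ? pw->pw_dir : NULL;
}
```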

Now I hate to junk up /etc/passwd with this sort of thing (and have an
alternative suggestion below) but this is easily done with current tools.
I'd like to see a convention on usernames chosen for application "users",
maybe prepending an underscore to a simple name for the application (_uucp,
_news, _nethack, etc.).  Admittedly this isn't needed but it makes it
more obvious the entry isn't for a real user.

What would even be more useful (and what this approximates in an ugly way)
is a global environment (that isn't user changeable).  A true global env.
could be implemented by a lib func (getsysenv(var_name)) that looks for
"var_name=..." in a file, /etc/environment say.  (Granted, we could make
this a system call and store the environment in core all the time, but it
strikes me programs wouldn't look up definitions so often that much time
would be saved.)  Makes the system admin job easier too.  I always feel
a little uneasy editing passwd (I dial in on often noisy phone lines), I'd
feel better editing a less crucial file.
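
A minimal sketch of such a library function. The name getsysenv is the one proposed above. The file format (one NAME=value per line) and the explicit path argument are assumptions; a real version would presumably hard-wire /etc/environment.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up "name=value" in the file at path and copy value into buf.
 * Lines longer than the internal buffer are not handled specially.
 * Returns 0 if found, -1 if the file is unreadable, the name is
 * absent, or the value does not fit in buf.
 */
int getsysenv(const char *path, const char *name, char *buf, size_t buflen)
{
    char line[1024];
    size_t namelen = strlen(name);
    FILE *fp = fopen(path, "r");
    int rc = -1;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';           /* strip the newline */
        if (strncmp(line, name, namelen) == 0 && line[namelen] == '=') {
            const char *value = line + namelen + 1;
            if (strlen(value) < buflen) {
                strcpy(buf, value);
                rc = 0;
            }
            break;                                  /* first match wins */
        }
    }
    fclose(fp);
    return rc;
}
```

The lookup is only as trustworthy as the permissions on the file: it should be writable by the system administrator alone.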

Implementation-wise there's not much to this -- anyone who wants a copy
let me know -- but there is the problem of getting it universally adopted.
The passwd approach has the advantage that the file exists and people know
what it does.  (On the other hand, the /etc/environment would be more
adoptable for non-unix systems where there is no passwd...)

Thoughts anyone?
-- 

Andrew Burt 				   			isis!aburt

              Fight Denver's pollution:  Don't Breathe and Drive.

jv@mhres.mh.nl (Johan Vromans) (05/28/88)

From article <2272@isis.UUCP>, by aburt@isis.UUCP (Andrew Burt):
> What would even be more useful (and what this approximates in an ugly way)
> is a global environment (that isn't user changeable).  A true global env.
> could be implemented by a lib func (getsysenv(var_name)) that looks for
> "var_name=..." in a file, /etc/environment say.  (Granted, we could make
> this a system call and store the environment in core all the time, but it
> strikes me programs wouldn't look up definitions so often that much time
> would be saved.)  Makes the system admin job easier too.  I always feel
> a little uneasy editing passwd (I dial in on often noisy phone lines), I'd
> feel better editing a less crucial file.
> 
> Thoughts anyone?

I like the idea. It seems to me that it is also useful for tailorable
system constants like hostname, domain-name, number-of-lines for the system 
printer and (default) timezone.
The main thing is, that although everyone can use it, only the system
administrator can change the settings. This allows for security and 
reliability.

To get it accepted: make a solid, good-looking implementation and post it
to comp.sources.unix. And then start posting programs which use it.

-- 
Johan Vromans                              | jv@mh.nl via European backbone
Multihouse N.V., Gouda, the Netherlands    | uucp: ..{uunet!}mcvax!mh.nl!jv
"It is better to light a candle than to curse the darkness"

boyd@basser.oz (Boyd Roberts) (05/31/88)

You people are sick.  The best you're going to get is to change the
command interpreter to make argv[0] == the pathname passed to exec.
Then it's a convention.  Security, ``..'' & symbolic links be buggered.

I don't believe this discussion.  This is why Roche made diazepam (Valium).
With 5mg of diazepam comp.unix.wizards would become bearable (laughable).


Disclaimer:  what wizards?



Boyd Roberts			boyd@basser.cs.su.oz

``When the going gets weird, the weird turn pro...''

limes@ouroborous (Greg Limes) (06/10/88)

GENERAL COMMENTS

   First off, thanks in advance for not wiring the base directory into the
   program anywhere; your application will fit nicely into a networked
   workstation environment where the users may mount your installed directory
   tree anywhere.

IGNORE THE ENVIRONMENT

   Fancy environment variables are fine, but these fail in unexpected ways;
   remember that the variable is blindly inherited across exec() calls. Thus, if
   your program was started by a "make" (or similar utility), you may get
   pointed to the wrong guy. Also, you may find that a large number of
   installations will not support this special new environment variable in any
   case. 

FORGET MODIFYING THE KERNEL

   Can you imagine trying to get all the Unix vendors together on this? Can you
   imagine trying to get all the customers to upgrade? I know of at least one
   major installation of Sun workstations that is still running SunOS 3.2 Beta!

DUPLICATE exec()'s WORK

   The only thing we can really count on (and even this not always) is that, if
   we do the same kind of search that exec() does, we should come up with the
   same destination. So, it looks like we will need to scan the $PATH variable,
   looking for an executable called (argv[0]).
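
   A minimal sketch of that scan, modelled on the search a shell or execvp()
   performs. The function name find_in_path is hypothetical. It trusts argv[0]
   and $PATH as given, so it inherits every spoofing problem raised earlier in
   the thread.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>

/*
 * Locate name the way a PATH search would.  A name containing '/' is
 * copied as is, without checking that it exists.  Otherwise each
 * element of path is tried in order and the first regular file we may
 * execute wins; an empty element means the current directory.
 * Returns 0 and fills buf on success, -1 if nothing was found.
 */
int find_in_path(const char *name, const char *path, char *buf, size_t buflen)
{
    struct stat st;
    const char *p = path;
    int n;

    if (strchr(name, '/') != NULL) {
        n = snprintf(buf, buflen, "%s", name);
        return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
    }
    if (path == NULL)
        return -1;
    for (;;) {
        size_t len = strcspn(p, ":");       /* length of this element */

        if (len == 0)
            n = snprintf(buf, buflen, "./%s", name);
        else
            n = snprintf(buf, buflen, "%.*s/%s", (int)len, p, name);
        if (n >= 0 && (size_t)n < buflen &&
            stat(buf, &st) == 0 && S_ISREG(st.st_mode) &&
            access(buf, X_OK) == 0)
            return 0;
        if (p[len] == '\0')                 /* that was the last element */
            return -1;
        p += len + 1;
    }
}
```

   A typical call would be find_in_path(argv[0], getenv("PATH"), buf, sizeof buf).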

REMEMBER SYMBOLIC LINKS

   Now, we probably want to find the directory, so toss in a readlink() and you
   are there. Add error checking to taste, season well with lint.

FINGERPRINT THE DIRECTORY

   To make this secure, fingerprint your directory. Make a read-only file that
   is set-uid to a user id number that your EXECUTABLE knows about, and put some
   data in the file so you are sure this is the right fingerprint. If I were
   worried about making, say, GnuEmacs "absolutely sure" of its start point, I
   would set up a "message of the day", owned by (say) daemon, setuid, and read
   only. Make all your critical files owned by and writable only by the same
   user.  Joe Hacker who duplicates the installation with the intention of
   changing things around will be unable to duplicate the key file, and the
   application will know that it has found an improper installation directory.
   You may want to fingerprint each directory in the tree, just in case someone
   gets fancy with mount points.
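
   A minimal sketch of the metadata half of that check. The function name
   fingerprint_ok and the expected_uid parameter are hypothetical, and
   verifying the data stored in the file is left out.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Return 1 if path looks like the expected fingerprint file: a regular
 * file owned by expected_uid, with the set-uid bit on and no write
 * permission for anyone.  Return 0 otherwise, including when the file
 * is missing.  lstat() is used so that a symbolic link is not accepted.
 */
int fingerprint_ok(const char *path, uid_t expected_uid)
{
    struct stat st;

    if (lstat(path, &st) == -1)
        return 0;
    return S_ISREG(st.st_mode)
        && st.st_uid == expected_uid
        && (st.st_mode & S_ISUID) != 0
        && (st.st_mode & (S_IWUSR | S_IWGRP | S_IWOTH)) == 0;
}
```

   The check relies on an ordinary user being unable to create a set-uid file
   owned by someone else. It says nothing about a hard link to the genuine
   file or about a directory the superuser has tampered with.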

Anybody see any big holes here? (yea, a stupid question, I know...)

-- Greg Limes [limes@sun.com]				frames to /dev/fb