[comp.unix.questions] A couple questions

marcp@beryl.berkeley.edu.UUCP (04/14/87)

Hello!  I've a couple of questions in UNIX 4.2.

1.  How do programs like "more" distinguish between text files and
	executable files?  Hopefully, there's something surer than
	just taking a sample of a file and testing it.  (This question
	came up when a bunch of people started accidentally sending
	executables to a line printer, and I was trying to figure
	out a way to filter out the execs from the texts).

2.  Is it possible to, while in a C program, call another program and
	put it into the background?  Actually, I know it's possible,
	'cause I can do it with a line like:

		system("cat textfile &");

	This won't work, however, if I try to "more" the file instead.
	What determines what can be put in the background and what can't?
	Is there some way to run a program from within a program, and have
 	it return upon completion to the original program besides "system"?
	(The execl series, of course, doesn't return.)

Thanks for your time,

							Marc M. Pack

avolio@gildor.dec.com (Frederick M. Avolio) (04/14/87)

In article <3164@jade.BERKELEY.EDU> marcp@beryl.berkeley.edu (Marc M. Pack) writes:
>Hello!  I've a couple of questions in UNIX 4.2.
>
>1.  How do programs like "more" distinguish between text files and
>	executable files?  
More 1) checks the file type to make sure it is not a directory.
     2) looks for a magic number at the head of the file, for
	example 0407 (executable), 0410 (pure executable), 0413 (demand
	paged), or 0177545 (old archive).  If it finds one, it knows that
	it should not display the file and gives an appropriate message.
	As far as I can tell, it does not know about data files, and more
	will go ahead and show them to you if you want.  Not too good for
	a printer either...  The file program samples the data.  If it
	finds characters greater than 127 (high bit set), it decides it
	is a "data" file.  A filter like this is probably your best bet
	if you have a "simple" line printer...
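The magic-number test can be sketched in C. This is a minimal sketch, not more's actual source: it assumes the old a.out layout, where the first 16-bit word of the file holds the magic number in the machine's native byte order, and it uses only the values listed above. The helper name `looks_like_aout` is hypothetical, and current systems use other executable formats, so treat it as an illustration of the idea.

```c
#include <stdio.h>

/* Magic numbers listed above (old a.out and archive formats). */
static const unsigned short old_magics[] = {
    0407,    /* executable */
    0410,    /* pure (shared text) executable */
    0413,    /* demand-paged executable */
    0177545  /* old-style archive */
};

/*
 * Hypothetical helper: returns 1 if the first 16-bit word of the file
 * matches one of the magic numbers, 0 if not, -1 if it can't be opened.
 * Assumes the magic is stored in native byte order.
 */
int looks_like_aout(const char *path)
{
    FILE *fp = fopen(path, "rb");
    unsigned short word;
    size_t i;

    if (fp == NULL)
        return -1;
    if (fread(&word, sizeof word, 1, fp) != 1) {
        fclose(fp);
        return 0;                /* too short to carry a magic number */
    }
    fclose(fp);
    for (i = 0; i < sizeof old_magics / sizeof old_magics[0]; i++)
        if (word == old_magics[i])
            return 1;
    return 0;
}
```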

>2.  Is it possible to, while in a C program, call another program and
>	put it into the background?  Actually, I know it's possible,
>	'cause I can do it with a line like:
>		system("cat textfile &");
>
>	This won't work, however, if I try to "more" the file instead.
>	What determines what can be put in the background and what can't?
>	Is there some way to run a program from within a program, and have
> 	it return upon completion to the original program besides "system"?
>	(The execl series, of course, doesn't return.)

You're on the right track...  execl et al. don't return; they overlay
the calling process with the new program, so what you do is a fork
followed by an execl.  The parent can wait for the child to finish or
not.  Basically this is what system does.  It ...

	FORK
		EXEC SHELL WITH COMMAND LINE
	PARENT WAITS

As in:

/* FORK RETURNS THE PID OF THE CHILD TO THE PARENT AND 0 TO THE CHILD */

       if ((pid = fork()) == 0)		/* IF CHILD */
		{	
                execl("/bin/sh", "sh", "-c", arg, (char *)0);
		printf("HELP! SOMEONE STOLE THE SHELL!\n");
		exit(1);
	        }
	wait(0);	/* WAIT SATISFIED WHEN CHILD FINISHES */

So this kind of thing will do what you want.  See the manual page for fork
(section 2) for more details.  (If you are on a 4.*BSD or Ultrix system, use
cfork instead...  Why?  I don't know; the same reason you type 'sync'
three times before halting :-).)

BTW, "more" doesn't work too well in the background the way you did it,
because at that point it is no longer associated with the tty.  Cat
doesn't care what it cats to.

Fred

josh@hi.UUCP (04/14/87)

In article <3164@jade.BERKELEY.EDU> marcp@beryl.berkeley.edu (Marc M. Pack) writes:
>Hello!  I've a couple of questions in UNIX 4.2.
>
>1.  How do programs like "more" distinguish between text files and
>	executable files?  Hopefully, there's something surer than
>	just taking a sample of a file and testing it. 

I think this is the way it does it.  First it stat()s the file to see
whether it is a directory, then it reads in some of it to
see whether it is ASCII.

>
>2.  Is it possible to, while in a C program, call another program and
>	put it into the background?  Actually, I know it's possible,
>	'cause I can do it with a line like:
>
>		system("cat textfile &");
>
>	This won't work, however, if I try to "more" the file instead.
>	What determines what can be put in the background and what can't?
>	Is there some way to run a program from within a program, and have
> 	it return upon completion to the original program besides "system"?
>	(The execl series, of course, doesn't return.)

This is a loaded question and can become far more complicated than
you know...

The command more(1) checks to see whether it is running on
a terminal (a tty).  This is why "more" will not run properly in the
background from a system() call.  You CAN put almost anything in the
background; it is just harder if it needs to run on a terminal
(talk(1), more(1), etc.).
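A minimal sketch of that check, assuming a POSIX system with isatty(3); `should_page` and `report_terminals` are hypothetical helpers, not code taken from more(1):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Paging only makes sense when output goes to a terminal.  Returns 1 to
 * page interactively, 0 to fall back to copying the input through, the
 * way cat would.
 */
int should_page(int out_fd)
{
    return isatty(out_fd) == 1;
}

/* Report which of descriptors 0, 1 and 2 are attached to a terminal. */
void report_terminals(FILE *out)
{
    int fd;

    for (fd = 0; fd <= 2; fd++)
        fprintf(out, "fd %d: %s\n", fd,
                isatty(fd) ? "terminal" : "not a terminal");
}
```

For example, when output goes down a pipe (`prog | cat`), `should_page(1)` is 0, so a pager-like program would simply copy its input.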

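A short source for system might look like (Hmm...):

int
system(string)
char *string;
{
	int pid, w, status;

	if ((pid = vfork()) == 0) {
		/* child: let the shell parse the string */
		execl("/bin/sh", "sh", "-c", string, (char *)0);
		_exit(127);		/* the exec failed */
	}
	if (pid < 0)
		return (-1);		/* vfork failed */
	while ((w = wait(&status)) != pid && w != -1)
		;			/* parent: wait for this child */
	return (w == -1 ? -1 : status);
}
/* This is not a complete system() (no signal handling), but I am writing it from memory. */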

If you wish to run more(1), talk(1), or rogue(6) that way, your
program has to open a pty for it.  This makes the program (more) think
it is on a terminal when it really is not.  This is how
rlogin works.  Try the following:

	1) Set up a .rhosts file in your home directory
	2) rsh to that account using the following command:

	  % rsh {machine} /bin/csh -i

	3) Try doing a "more" of a file.  Gets sick eh?
	   This is one reason why pty's were invented.

I have a whole library of routines for networking, ptys, etc.
that I can post if there are any requests for it...

>
>Thanks for your time,
>
>							Marc M. Pack


	Hope I was helpful
			--Josh Siegel
-- 
Josh Siegel		(siegel@hc.dspo.gov)
                        (505) 277-2497  (Home)
		I'm a mathematician, not a programmer!

dce@mips.UUCP (04/15/87)

In article <3164@jade.BERKELEY.EDU> marcp@beryl.berkeley.edu (Marc M. Pack) writes:
>Hello!  I've a couple of questions in UNIX 4.2.
>
>1.  How do programs like "more" distinguish between text files and
>	executable files?  Hopefully, there's something surer than
>	just taking a sample of a file and testing it.  (This question
>	came up when a bunch of people started accidentally sending
>	executables to a line printer, and I was trying to figure
>	out a way to filter out the execs from the texts).

In standard 4.2/4.3 systems, more, vi, and file all have "magic numbers"
(numbers found at the beginning of special files like object files)
coded in them. In System V-based systems (and some BSD-based commercial
systems), the file command uses a file (/etc/magic) that contains
magic number information. Tektronix's UTek even supplies a libc subroutine
that interfaces to /etc/magic.

In any event, you probably want to make your filter check for nulls and
high bits ((x & 0200) != 0), since not all "garbage files" are known
to vi and more.
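A sketch of such a filter in C, assuming the job arrives on a stdio stream (so it also works on standard input) and that checking the first 512 bytes is enough; the names `has_garbage` and `print_filter` and the sample size are assumptions for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Returns 1 if buf holds a NUL byte or any byte with the high bit set
 * ((c & 0200) != 0); such data is probably not meant for a plain
 * ASCII printer.  Returns 0 otherwise.
 */
int has_garbage(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (buf[i] == '\0' || (buf[i] & 0200) != 0)
            return 1;
    return 0;
}

/*
 * Printer-filter sketch: refuse the job (return 1) if the first 512
 * bytes look like garbage, otherwise copy in to out and return 0.
 * Only the first block is checked.  The bytes read for the check are
 * written out too, so this works on pipes, where you cannot seek back.
 */
int print_filter(FILE *in, FILE *out)
{
    unsigned char buf[512];
    size_t n = fread(buf, 1, sizeof buf, in);
    int c;

    if (has_garbage(buf, n))
        return 1;
    fwrite(buf, 1, n, out);
    while ((c = getc(in)) != EOF)
        putc(c, out);
    return 0;
}
```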

>2.  Is it possible to, while in a C program, call another program and
>	put it into the background?  Actually, I know it's possible,
>	'cause I can do it with a line like:
>
>		system("cat textfile &");
>
>	This won't work, however, if I try to "more" the file instead.
>	What determines what can be put in the background and what can't?
>	Is there some way to run a program from within a program, and have
> 	it return upon completion to the original program besides "system"?
>	(The execl series, of course, doesn't return.)

The more command works funny if you try to put it in the background because
it works very closely with the tty.

As for there being a way to execute subprograms other than with system(),
there is a way (remember: the shell is just another command as far as
Unix is concerned). The idea is something like:

	pid = fork();
	if (pid < 0) {
		perror("fork");
		return;
	}
	if (pid == 0) { 	/* child */
		execlp(command, command, arg1, arg2, ..., 0);
		perror(command);
		_exit(127);
	}
	while ((wpid = wait(0)) != pid) {
		if (wpid < 0) {
			break;
		}
	}

Note that the last loop is important (this is due to the way pipes
work, and the lack of this loop used to cause a bug to show up in the
crypt command).

The above code is effectively what system() does, but if you don't
need any of the special shell features (redirection, pipes, variables,
etc.) or want to do things like redirection yourself (see freopen(3s)),
this saves you from forking a shell.
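A runnable version of the outline above, as a sketch assuming a POSIX system. The elided `arg1, arg2, ...` list becomes an argv array passed to execvp(), and the parent uses POSIX waitpid() to wait for that particular child, which does the job of the wait() loop. The name `run_program` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with arguments argv[1..] (argv must end with NULL),
 * without a shell, and wait for it.  Returns the child's exit status,
 * 127 if the exec failed, or -1 if fork/wait failed or the child was
 * killed by a signal.
 */
int run_program(char *const argv[])
{
    pid_t pid;
    int status;

    fflush(NULL);               /* don't copy buffered output into the child */
    pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {             /* child */
        execvp(argv[0], argv);
        perror(argv[0]);        /* reached only if the exec failed */
        _exit(127);
    }
    /* parent: wait for this particular child, not just any child */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```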
-- 
David Elliott		{decvax,ucbvax,ihnp4}!decwrl!mips!dce

jon@eps2.UUCP (04/15/87)

In article <1296@decuac.DEC.COM>, avolio@gildor.dec.com (Frederick M. Avolio) writes:
>        if ((pid = fork()) == 0)		/* IF CHILD */
> 		{	
>                 execl("/bin/sh", "sh", "-c", arg, 0);
> 		printf("HELP! SOMEONE STOLE THE SHELL!\n");
> 		exit(1);

Use _exit() in this instance instead of exit() because there is the
possibility of flushing the stdio buffers twice, once in the child and
once in the parent, which wouldn't really be right.  You might see some
output twice.  I would probably use an fputs to stderr here, to avoid
buffering.  This was probably just an oversight in Frederick's followup.
To really confuse people running ps, replace "sh" with a clever saying in
double quotes.


Jonathan Hue	DuPont Design Technologies/Via Visuals		leadsv!eps2!jon

gwyn@brl-smoke.UUCP (04/16/87)

In article <80@eps2.UUCP> jon@eps2.UUCP (Jonathan Hue) writes:
-In article <1296@decuac.DEC.COM>, avolio@gildor.dec.com (Frederick M. Avolio) writes:
->        if ((pid = fork()) == 0)		/* IF CHILD */
-> 		{	
->                 execl("/bin/sh", "sh", "-c", arg, 0);
-> 		printf("HELP! SOMEONE STOLE THE SHELL!\n");
-> 		exit(1);
-
-Use _exit() in this instance instead of exit() because there is the
-possibility of flushing the stdio buffers twice, once in the child and
-once in the parent, which wouldn't really be right.  You might see some
-output twice.  I would probably use an fputs to stderr here, to avoid
-buffering.  This was probably just an oversight on Frederick's followup.
-To really confuse people running ps, replace "sh" with a clever saying in
-double quotes.

The above advice, which is good as far as it goes, is insufficient.
Using _exit() will mean that the printf may not be seen (although
with default line-buffering to the terminal it probably would be),
since stdio buffer flushing would be skipped.  Before the fork(),
there should be an fflush(stdout) to clear out any buffered parent
data.  Error messages should probably be written to stderr rather
than stdout (stdout is often directed down a pipe).  Finally, it is
much better style to return an error indication to a higher level
of program control and let the higher level determine strategy (such
as whether to print an error message).

The fact that it is hard to get this stuff exactly right is why one
should use the library routines such as system() instead, whenever
possible.  (If the library routine is broken, get it fixed!)
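Following that advice, a sketch that lets system() do the fork, exec, and wait, and hands the result back to the caller instead of printing anything. It assumes a POSIX system, where system() returns a wait()-style status (or -1 if the shell could not be started); the wrapper name `run_command` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run a shell command line through system().  Returns the command's
 * exit status, or -1 if it could not be run or was killed by a signal.
 * Reporting the error is left to the caller.
 */
int run_command(const char *cmdline)
{
    int status;

    fflush(NULL);                /* flush our own buffered output first */
    status = system(cmdline);
    if (status == -1)
        return -1;               /* the shell could not be started */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                   /* terminated by a signal */
}
```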

baccala@USNA.MIL (Brent W Baccala) (04/17/87)

"Frederick M. Avolio" <avolio@gildor.dec.COM> writes:

>...(If on a 4.*BSD or Ultrix system use
>cfork instead...  Why?  I don't know, the same reason you type 'sync'
>three times before halting :-).)

I've never heard of cfork, and I can't find a manual page for it on
our 4.3 BRL system.  I don't know much (anything) about Ultrix, but do
you mean vfork?  vfork does a "virtual" fork - most of the parent's
memory space is not copied.  Instead, the parent is suspended while
the child uses some of its memory.  If you're going to do an exec of
some flavor, you don't want to change the parent's memory anyway, so this
is very memory-efficient.  There are (of course) strings attached to
what you can and can't do in a vfork - read the man page.  In
particular, you can't return from the function that called vfork
because that would screw up the stack frame.  You also can't use exit
on an error (use _exit) because exit will close stdio structures in
the parent.  It's even wrong (as some people have suggested) to use
exit from a fork, because even though you have a separate set of file
descriptors, data buffered before the fork will get flushed twice.
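The double flush is easy to demonstrate. A sketch assuming a POSIX system (it uses fork(), not vfork()): "x" is left in a fully buffered stdio stream, the process forks, and the child leaves with either exit(), which flushes the child's copy of the buffer, or _exit(), which does not. The helper `bytes_after_fork` and the scratch path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Leave "x" in a stdio buffer, fork, and let the child finish with
 * exit() or _exit().  Returns the resulting file size: 2 if both
 * processes flushed their copy of the buffer, 1 if only the parent
 * did, or -1 on error.  path is a scratch file that gets overwritten.
 */
long bytes_after_fork(const char *path, int child_uses_exit)
{
    FILE *fp;
    pid_t pid;
    long size;

    fflush(stdout);                      /* keep unrelated output from doubling */
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);   /* fully buffered: "x" stays in memory */
    fputs("x", fp);
    pid = fork();
    if (pid < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {                      /* child has its own copy of the buffer */
        if (child_uses_exit)
            exit(0);                     /* flushes the child's copy: "x" written */
        _exit(0);                        /* skips stdio cleanup entirely */
    }
    waitpid(pid, NULL, 0);
    fclose(fp);                          /* parent flushes its own copy */

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    fseek(fp, 0L, SEEK_END);
    size = ftell(fp);
    fclose(fp);
    return size;
}
```

With `child_uses_exit` set, the file ends up holding "xx"; with `_exit()` it holds a single "x".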

P.S.  "_exit" is a fast exit - it terminates the process without
doing any of the housekeeping that "exit" does (by calling
"_cleanup").

			- BRENT W. BACCALA -
			Computer Aided Design/Interactive Graphics
			U.S. Naval Academy
			Annapolis, MD

			<decvax!brl-smoke!usna!baccala>
			<seismo!usna!baccala>
			<baccala@usna.arpa>

robertd@ncoast.UUCP (Rob DeMarco) (04/18/87)

In article <3164@jade.BERKELEY.EDU> marcp@beryl.berkeley.edu (Marc M. Pack) writes:
>Hello!  I've a couple of questions in UNIX 4.2.
>
>1.  How do programs like "more" distinguish between text files and
>	executable files?  Hopefully, there's something surer than
>	just taking a sample of a file and testing it.  (This question
>	came up when a bunch of people started accidentally sending
>	executables to a line printer, and I was trying to figure
>	out a way to filter out the execs from the texts).

   I would believe that a pretty sure
method would be to test the file
permissions: if an "x" occurs in the
file permissions, then it is executable;
otherwise it's text.

>2.  Is it possible to, while in a C program, call another program and
>	put it into the background?  Actually, I know it's possible,
>	'cause I can do it with a line like:
>
>		system("cat textfile &");
>
>	This won't work, however, if I try to "more" the file instead.
>	What determines what can be put in the background and what can't?

   My guess is that more accepts input,
since you have to press <SPACE> to go
on. Since it is in the background, it
doesn't wait for it to complete before
going on; therefore, getting input is
impossible, because it doesn't check for
input.

>Thanks for your time,
   You're welcome!   :-)
>							Marc M. Pack


-- 
[=====================================]
[             Rob DeMarco             ]
[ UUCP:decvax!cwruecmp!ncoast!robertd ]
[                                     ]
[ "bus error - passengers dumped"     ]
[=====================================]

dsg@mitre-bedford.arpa (Dave Goldberg) (04/21/87)

>   I would believe that a pretty sure
>method would be to test the file
>permissions: if an "x" occurs in the
>file permissions, then it is executable;
>otherwise it's text.

If only it were that easy.  However, I can make any text file have permission
of 755 (rwxrwxrwx) and still be a pure ascii file.  This is even useful in the
case of shell scripts.

Dave Goldberg
dsg@mitre-bedford.arpa

Disclaimer: for this you want a disclaimer!?!

dsg@mitre-bedford.arpa (Dave Goldberg) (04/21/87)

>of 755 (rwxrwxrwx) and still be a pure ascii file.  This is even useful in the
Before anyone gets a chance to flame me, 755 was a typo, I meant 777.

Dave Goldberg
dsg@mitre-bedford.arpa

Disclaimer: for this you want a disclaimer!?!

chuckles@aoa.UUCP (Charles Stern) (04/22/87)

In article <2382@ncoast.UUCP> robertd@ncoast.UUCP (Rob DeMarco) writes:

>>Hello!  I've a couple of questions in UNIX 4.2.
>>
>>1.  How do programs like "more" distinguish between text files and
>>	executable files?  Hopefully, there's something surer than
>>	just taking a sample of a file and testing it.  (This question
>>	came up when a bunch of people started accidentally sending
>>	executables to a line printer, and I was trying to figure
>>	out a way to filter out the execs from the texts).
>
>   I would believe that a pretty sure
>method would be to test the file
>permissions: if an "x" occurs in the
>file permissions, then it is executable;
>otherwise it's text.

This is not truly accurate.  Consider the case of an executable shell script...
more SURELY works on that! (thank G-d ;-))


>>Thanks for your time,
>   You're welcome!   :-)
>>							Marc M. Pack
>
>
>-- 
>[=====================================]
>[             Rob DeMarco             ]
>[ UUCP:decvax!cwruecmp!ncoast!robertd ]
>[                                     ]
>[ "bus error - passengers dumped"     ]
>[=====================================]

-- 

	Charles Stern
	...!{decvax,linus,ima,ihnp4}!bbncca!aoa!chuckles
	...!{wjh12,mit-vax}!biomed!aoa!chuckles

 "What's black and dangerous and sits in a tree?"
 "A crow with a machine gun."
  -- "Star Smashers of the Galaxy Rangers"
         Harry Harrison

goudreau@dg_rtp.UUCP (Bob Goudreau) (04/23/87)

In article <2382@ncoast.UUCP> robertd@ncoast.UUCP (Rob DeMarco) writes:
>In article <3164@jade.BERKELEY.EDU> marcp@beryl.berkeley.edu (Marc M. Pack) writes:
>>Hello!  I've a couple of questions in UNIX 4.2.
>>
>>1.  How do programs like "more" distinguish between text files and
>>	executable files?  Hopefully, there's something surer than
>>	just taking a sample of a file and testing it.  (This question
>>	came up when a bunch of people started accidentally sending
>>	executables to a line printer, and I was trying to figure
>>	out a way to filter out the execs from the texts).
>
>   I would believe that a pretty sure
>method would be to test the file
>permissions: if an "x" occurs in the
>file permissions, then it is executable;
>otherwise it's text.
>

This isn't such a good idea, for three reasons:
1)	Shell scripts are executable files, but they are also printable
	ASCII files.  It might make some of your users a little mad to find
	some print jobs refused.
2)	On the other side of the coin, there exist files which are not
	executable and which are also unprintable;  a common example is
	any ".o" file.
3)	Finally, pr can accept redirected input.  How is it supposed to do
	a stat() on stdin?

A better filter would be a program that looks for indications that the
file is an object or a.out (program) file.  This is in fact what more
does;  it checks for a "magic number" at the beginning of a file
indicating that the file is a program or object file.


>>2.  Is it possible to, while in a C program, call another program and
>>	put it into the background?  Actually, I know it's possible,
>>	'cause I can do it with a line like:
>>
>>		system("cat textfile &");
>>
>>	This won't work, however, if I try to "more" the file instead.
>>	What determines what can be put in the background and what can't?
>
>   My guess is that more accepts input,
>since you have to press <SPACE> to go
>on. Since it is in the background, it
>doesn't wait for it to complete before
>going on; therefore, getting input is
>impossible, because it doesn't check for
>input.

This is along the right lines, but not correct.
Consider the following program:

main()
{
	system ("more /etc/termcap &");
	for (;;)  ;
}

This will work (try it), because only one process (the more) is trying to read
its stdin. (The system() does a fork() and the child process inherits identical
copies of its parent's file descriptors, including stdin).

The place where you will run into trouble is when your main program is
also trying to read stdin at the same time;  the two processes will
fight over the input.  The same is true of stdout and stderr --  you will get
jumbled-together output.

The moral is, Be careful of what child processes do with file descriptors
inherited from the parent.  If the child is going to open its own files,
go right ahead and use system(foo &) to put it in the background.
You may want to become familiar with the fork() system call, since it
allows you to bypass some of the overhead of the system() lib function.

-- 
Bob Goudreau
Data General Corp.
62 Alexander Drive
Research Triangle Park, NC  27709
(919) 248-6231
...!mcnc!rti-sel!dg_rtp!goudreau

jfh@killer.UUCP (John Haugh) (04/29/87)

I love people who don't know what they are talking about, which is why
I always say - 'Read the documentation whether you know it or not'.

Disinformation correction in progress ...  MUNCH ... MUNCH ... MUNCH

In article <1752@dg_rtp.UUCP>, goudreau@dg_rtp.UUCP (Bob Goudreau) writes:
> In article <2382@ncoast.UUCP> robertd@ncoast.UUCP (Rob DeMarco) writes:
> >In article <3164@jade.BERKELEY.EDU> marcp@beryl.berkeley.edu (Marc M. Pack) writes:
> >>Hello!  I've a couple of questions in UNIX 4.2.
> >>
> >>1.  How do programs like "more" distinguish between text files and
> >>	executable files?  Hopefully, there's something surer than
> >>	just taking a sample of a file and testing it.  [ More stuff ]
> >
> >   I would believe that a pretty sure
> >method would be to test the file
> >permissions: if an "x" occurs in the
> >file permissions, then it is executable;
> >otherwise it's text.
> >
> 
> This isn't such a good idea, for three reasons:
[ Dumb comment about shell scripts being executable and printable ... ]
[ Dumb comment about .o's not being executable.  Of course, the question
sent this guy in that direction. ]
[ Misinformation Alert  ]
> 3)	Finally, pr can accept redirected input.  How is it supposed to do
> 	a stat() on stdin?

Try the originally suggested idea.  It does work; see what file(1) says:

	" ... If an argument appears to be ASCII, _file_ examines the
	first 512 bytes and tries to guess its language.  ... "

		- Quoted from "Plexus Sys5 UNIX User's Reference Manual".

Read 512 bytes (or 1024 if you want to be surer) and check to see whether all
of the characters are printable.  How about using something like ctype(3)?  The
two macros isspace() and isprint() should do the trick.  Then, to make
things really robust (remember that word from Comp-Sci 101 :-), print any
character that is neither isspace() nor isprint() using some special convention.
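A sketch of that suggestion using <ctype.h>, assuming a 512-byte sample is read first; `sample_is_text` and `show_char` are hypothetical names, and printing unprintable bytes as a backslash plus three octal digits is just one possible "special convention".

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if every byte in buf is printable or white space, else 0. */
int sample_is_text(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!(isprint(buf[i]) || isspace(buf[i])))
            return 0;
    return 1;
}

/*
 * Copy one character to out, showing anything that is neither printable
 * nor white space as a backslash and three octal digits (ESC -> \033).
 * c must be an unsigned char value, as returned by getc(), not EOF.
 */
void show_char(int c, FILE *out)
{
    if (isprint(c) || isspace(c))
        putc(c, out);
    else
        fprintf(out, "\\%03o", c);
}
```

A printer filter could call sample_is_text() on the first block it reads, then pass every character through show_char().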

Now for a little disinformation correction.  (I knew the manual would get
a big workout today.)  From stat(2), I read

	"Similaryly, _fstat_ obtains information about an open file
	 known by the file descriptor _filedes_, ..."
		- Quoted from "Plexus Sys5 UNIX Programmer's Reference Manual".

Any file descriptor can be stat(2)'d, including 0, 1, and 2 which were
opened long, long ago.  If you wanted to, you could even find out the name
of the file that was connected to the descriptor.  (It is _not_ easy :-( )
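For example, a filter can find out what its standard input really is with fstat(2). A sketch assuming the POSIX `S_IS*` macros; the helper `input_kind` is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <unistd.h>

/* Describe what an open descriptor refers to, using fstat(2). */
const char *input_kind(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return "unknown (fstat failed)";
    if (S_ISREG(st.st_mode))
        return "regular file";
    if (S_ISDIR(st.st_mode))
        return "directory";
    if (S_ISFIFO(st.st_mode))
        return "pipe";
    if (S_ISCHR(st.st_mode))
        return isatty(fd) ? "terminal" : "character device";
    return "other";
}
```

Calling `input_kind(0)` works the same whether standard input was redirected from a file, piped, or left on the terminal.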
> 
> A better filter would be a program that looks for indications that the
> file is an object or a.out (program) file.  This is in fact what more
> does;  it checks for a "magic number" at the beginning of a file
> indicating that the file is a program or object file.
> 
No, this is a stupid idea.  The problem with printouts screwing up printers
is not because they are _object_ files, it is because they contain characters
that screw up the printer.  Look for those characters.  What happens if
your users decide to print core dumps, directories, /etc/wtmp and the like?
A well thought out approach, or even the one I suggested (it took me about
12 seconds to come up with it.) will find out if the file can be printed.
> 
> >>2.  Is it possible to, while in a C program, call another program and
> >>	put it into the background?  Actually, I know it's possible,
[ And he tells us why (system ("command &");) ]
> >
> >   My guess is that more accepts input,
> >since you have to press <SPACE> to go
> >on. Since it is in the background, it
> >doesn't wait for it to complete before
> >going on; therefore, getting input is
> >impossible, because it doesn't check for
> >input.
[ I can't even follow what this poster wants to say ... ]
> 
> This is along the right lines, but not correct.
> Consider the following program:
> 
> main()
> {
> 	system ("more /etc/termcap &");
> 	for (;;)  ;
> }
> 
> This will work (try it), because only one process (the more) is trying to read
> its stdin. (The system() does a fork() and the child process inherits identical
> copies of its parent's file descriptors, including stdin).
> 
No, once again you are wrong, wrong, wrong.  The system may do a fork(2), but
the child does an exec(2) of the sh(1), which says

	"If a command is followed by & the default standard input for the
	 command is the empty file _/dev/null_. ..."
		-Quoted from "Plexus Sys5 UNIX User's Reference Manual".

The shell closes file descriptor 0 and open(2)'s /dev/null, with the
consequence that the file descriptor that is returned is 0.  So it doesn't
get the same file descriptor.  And besides, _stdin_ is NOT a file descriptor.
Try using it in a place where one is needed.
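A sketch of what the shell does for a command followed by &, assuming a POSIX system: in the child, close descriptor 0 and open /dev/null, which comes back as descriptor 0 because open() returns the lowest free descriptor; the parent does not wait. The name `spawn_background` is hypothetical, and the caller is expected to reap the child later (for example with waitpid()).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start argv[0] in the background with standard input taken from
 * /dev/null, the way sh(1) treats "command &".  Returns the child's pid
 * without waiting for it, or -1 if fork() fails.
 */
pid_t spawn_background(char *const argv[])
{
    pid_t pid;

    fflush(NULL);                        /* don't hand pending output to the child */
    pid = fork();
    if (pid != 0)
        return pid;                      /* parent, or -1 on error: no wait */

    /* child: make descriptor 0 refer to /dev/null */
    close(0);
    if (open("/dev/null", O_RDONLY) != 0)    /* lowest free descriptor is 0 */
        _exit(126);
    execvp(argv[0], argv);
    _exit(127);                          /* exec failed */
}
```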

> The place where you will run into trouble is when your main program is
> also trying to read stdin at the same time;  the two processes will
> fight over the input.  The same is true of stdout and stderr --  you will get
> jumbled-together output.
> 
Different problem with standard error and standard output.  If you look
at what the book's got to say, it tells you that with two processes reading
from a terminal, the system flips a coin to decide which process gets it.
It actually lets the two processes beat each other in the head for it.
(I have seen the code for coinflip() and clobber() in the kernel with my
own two eyes :-) :-) :-).
> The moral is, Be careful of what child processes do with file descriptors
> inherited from the parent.  If the child is going to open its own files,
> go right ahead and use system(foo &) to put it in the background.
> You may want to become familiar with the fork() system call, since it
> allows you to bypass some of the overhead of the system() lib function.
> 
You could always close them and not worry about it if that is such a big
deal.  More(1) will still grab you because it doesn't use stdin to get the
commands from the keyboard!!!  Try 'cat /etc/passwd | more'.  This works
pretty much the same as 'more /etc/passwd'.  If it didn't, the man(1)
command wouldn't work for those of us that have, or added, a more(1) in the
output pipeline.  Since the 'cat /etc/passwd |' is the standard input,
more can't be reading its commands from there.  I can't find strings(1) on
this machine, so I can't tell you that more(1) is opening /dev/tty, BUT the
last more(1) clone I wrote did just that (unless its output was a file, a
pipe, or something else other than a tty; remember isatty(3)?).
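A sketch of the /dev/tty approach, assuming a POSIX system; it shows the technique, not the code of any particular more(1). The names `open_keyboard` and `read_command` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Open the controlling terminal for reading commands.  /dev/tty names
 * the process's terminal no matter where descriptor 0 points, so this
 * still works for 'cat file | more'.  Returns -1 if the process has no
 * controlling terminal (for example a job started by cron).
 */
int open_keyboard(void)
{
    return open("/dev/tty", O_RDONLY);
}

/*
 * Read one command character from kbd; returns it, or -1 at EOF or on
 * error.  In the terminal's default (canonical) mode the read returns
 * only after Return is pressed; a real pager first switches the
 * terminal to character-at-a-time input.
 */
int read_command(int kbd)
{
    unsigned char c;

    return read(kbd, &c, 1) == 1 ? c : -1;
}
```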

You of course, might want to become familiar with the manuals.  Of course,
you can always get had and say stupid things about system calls they just
added in the newest release.  But that is a different brand of stupidity.

The moral of the story is - don't just say 'This is a nice article, I think
I'll reply' unless you want to contribute some real information.  Also, not
everyone has the time to research or the knowledge to contribute a worthwhile
reply, so don't feel bad if you can't.  (But you still ought to read
the manuals anyway.)

- John.		(jfh@killer.UUCP)

Disclaimer:
	No disclaimer.  Whatcha gonna do, sue me?

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/06/87)

In article <816@killer.UUCP> jfh@killer.UUCP (John Haugh) writes:
>If you wanted to, you could even find out the name
>of the file that was connected to the descriptor.  (It is _not_ easy :-( )

I would think it's actually impossible.  The kernel doesn't remember
the name of the path you used to open an inode, and some descriptors
(e.g. pipes) have no associated names.

cdash@boulder.UUCP (05/07/87)

In article <5835@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <816@killer.UUCP> jfh@killer.UUCP (John Haugh) writes:
>>If you wanted to, you could even find out the name
>>of the file that was connected to the descriptor.  (It is _not_ easy :-( )
>
>I would think it's actually impossible.  The kernel doesn't remember
>the name of the path you used to open an inode, and some descriptors
>(e.g. pipes) have no associated names.

actually, it is possible. you know the inode associated with the descriptor
start at "/" and just keep looking for that i# keeping track of where you are.
Like the man said, it ain't easy, but it CAN be done.
-- 


cdash   aka cdash@boulder.colorado.edu    aka ...hao!boulder!cdash
	aka ...nbires!boulder!cdash

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/07/87)

In article <634@boulder.Colorado.EDU> cdash@boulder.Colorado.EDU (Charles Shub) writes:
>actually, it is possible. you know the inode associated with the descriptor
>start at "/" and just keep looking for that i# keeping track of where you are.

This can find A name for the inode (assuming that there IS one and
that you avoid the many pitfalls that are possible), but not THE
name that was used to open the file.  (Even in the absence of more
than one link, a variety of names could have been used.)

You wouldn't want to wait for this anyway on some of the large
filesystems we have around here.
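A sketch of the search under discussion, assuming a POSIX system: take the <st_dev, st_ino> pair from fstat() and walk a directory tree with opendir()/readdir()/lstat(), stopping at the first entry with the same pair. Comparing the device as well as the inode number matters because inode numbers are only unique within one filesystem. As pointed out above, this finds *a* name at best: it finds nothing for a pipe or an unlinked file, skips directories it cannot read, and can take a very long time when started at /. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Depth-first search under dir (a non-empty path) for an entry whose
 * (device, inode) pair matches *want.  Copies the first matching path
 * into out and returns 1, or returns 0 if nothing matched.  Symbolic
 * links are not followed; unreadable directories are skipped.
 */
static int search(const char *dir, const struct stat *want,
                  char *out, size_t outlen)
{
    DIR *dp = opendir(dir);
    struct dirent *de;
    const char *sep;
    int found = 0;

    if (dp == NULL)
        return 0;
    sep = (dir[strlen(dir) - 1] == '/') ? "" : "/";
    while (!found && (de = readdir(dp)) != NULL) {
        char path[4096];
        struct stat st;
        int len;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        len = snprintf(path, sizeof path, "%s%s%s", dir, sep, de->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;                    /* name too long: skip it */
        if (lstat(path, &st) < 0)
            continue;
        if (st.st_dev == want->st_dev && st.st_ino == want->st_ino) {
            snprintf(out, outlen, "%s", path);
            found = 1;
        } else if (S_ISDIR(st.st_mode)) {
            found = search(path, want, out, outlen);
        }
    }
    closedir(dp);
    return found;
}

/* Find *a* name for the file open on fd, searching the tree under root. */
int find_a_name(int fd, const char *root, char *out, size_t outlen)
{
    struct stat want;

    if (fstat(fd, &want) < 0)
        return 0;
    return search(root, &want, out, outlen);
}
```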

chris@mimsy.UUCP (05/07/87)

>>In article <816@killer.UUCP> jfh@killer.UUCP (John Haugh) writes:
>>>If you wanted to, you could even find out the name
>>>of the file that was connected to the descriptor.

>In article <5835@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn) writes:
>>I would think it's actually impossible.

In article <634@boulder.Colorado.EDU> cdash@boulder.Colorado.EDU
(Charles Shub) writes:
>actually, it is possible. you know the inode associated with the descriptor
>start at "/" and just keep looking for that i# keeping track of where you are.
>Like the man said, it ain't easy, but it CAN be done.

`It' can be done:  What can be done?  The problem is ill-defined
in the first place.  `Find *the* name of the file.'  Who says there
is only one?

	% ln file ../other/file

`Find *a* name of the file.'  That can be done iff the file has at
least one name.

`Find the internal name of the file.'  Easy: this is just the <dev,
ino> pair.  One problem is that few people wish to deal with this
form of the name.

Incidentally, `find / -inum <n>' takes a *long* time on a big system.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

guy@gorodish.UUCP (05/07/87)

> actually, it is possible. you know the inode associated with the descriptor
> start at "/" and just keep looking for that i# keeping track of where you
> are.

No, Doug is correct.  In the general case, it is *not* possible -
there may not *be* a directory entry that refers to the inode in
question.  The inode may be an unnamed pipe, or may be a file that
was unlinked.  Even if there is a directory entry that refers to the
inode in question, it is not possible if you lack read permission on
any of the directories leading up to that file.

Furthermore, given the procedure you suggest, it may be technically
possible in many cases but it is not practical in many, probably
most, of them.  Doing a top-down search for a given inode, starting
at "/", can take a *very* long time unless you're very near the root.

shz@desoto.UUCP (05/07/87)

In article <634@boulder.Colorado.EDU> cdash@boulder.Colorado.EDU (Charles Shub) writes:
>In article <5835@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>>In article <816@killer.UUCP> jfh@killer.UUCP (John Haugh) writes:
>>>If you wanted to, you could even find out the name
>>>of the file that was connected to the descriptor.  (It is _not_ easy :-( )
>>
>>I would think it's actually impossible.  The kernel doesn't remember
>>the name of the path you used to open an inode, and some descriptors
>>(e.g. pipes) have no associated names.
>
>actually, it is possible. you know the inode associated with the descriptor
>start at "/" and just keep looking for that i# keeping track of where you are.
>Like the man said, it ain't easy, but it CAN be done.
>-- 

Actually, it is even more difficult.  I-numbers are only unique within
a filesystem.  You would first have to determine which filesystem your FD 
pointed to, and then start the search from the root of that filesystem.  Even 
then it may not be possible to find the name of the file that was opened.
Assuming the FD did reference a file (as opposed to an unnamed pipe or other 
'magic' device), the last remaining name could have been unlinked.  Which
brings us to the last point: there may be multiple links to the file.  The 
best you could hope for is to find the first name or all names, not 
necessarily the correct name.

Check out the [usually undocumented] '-inum' option of FIND(1).

Seth
desoto!shz

schwartz@swatsun (Scott Schwartz) (05/07/87)

In article <6582@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> Incidentally, `find / -inum <n>' takes a *long* time on a big system.

But if you had to do it, wouldn't you use ncheck(8)? (Assuming you have
the required permissions)

-- 
# Scott Schwartz
# UUCP: ...{{seismo,ihnp4}!bpa, cbmvax!vu-vlsi, sun!liberty}!swatsun!schwartz
# AT&T: (215)-328-8610	/* lab phone */

rbj@icst-cmr.arpa (Root Boy Jim) (05/07/87)

   In article <634@boulder.Colorado.EDU> cdash@boulder.Colorado.EDU
	(Charles Shub) writes:
   >actually, it is possible. you know the inode associated with the descriptor
   >start at "/" and keep looking for that i# keeping track of where you are.

Doug & Chris have already pointed out the possibility that a file may have
more than one name. I wish to point out that it may have none at all.
In addition to the obvious case of a pipe (or FIFO, socket, or any other
abstraction that uses an inode internally), consider typing:

		rm -f foo; yes > foo & rm foo

Don't try this at home, kids!
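A tamer way to watch a file lose its last name is a small C program. This is a hypothetical sketch using modern POSIX mkstemp/unlink/fstat; the function name and the /tmp template are invented for illustration. The descriptor keeps working after the unlink, and fstat reports a link count of 0, so no search of any directory tree can turn up a name for it.

```c
#define _XOPEN_SOURCE 700          /* for mkstemp() */
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * C analogue of `yes > foo & rm foo': create a scratch file, remove its
 * only name, then keep using the descriptor.  Returns the link count that
 * fstat() reports after the unlink (0: the file has no name at all) and
 * copies what was read back into buf; returns -1 on any error.
 */
int nameless_file_demo(char *buf, size_t buflen)
{
    char tmpl[] = "/tmp/fooXXXXXX";
    const char msg[] = "y\ny\ny\n";
    struct stat sb;
    ssize_t n;
    int fd;

    if (buflen == 0 || (fd = mkstemp(tmpl)) < 0)
        return -1;
    if (unlink(tmpl) != 0                                  /* `rm foo' */
        || write(fd, msg, sizeof msg - 1) != (ssize_t)(sizeof msg - 1)
        || fstat(fd, &sb) != 0
        || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }
    n = read(fd, buf, buflen - 1); /* the data is still all there */
    close(fd);                     /* last reference: the inode is freed now */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return (int)sb.st_nlink;
}
```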

	(Root Boy) Jim "Just Say Yes" Cottrell	<rbj@icst-cmr.arpa>

jgy@hropus.UUCP (05/08/87)

>>>In article <816@killer.UUCP> jfh@killer.UUCP (John Haugh) writes:
>>>>If you wanted to, you could even find out the name
>>>>of the file that was connected to the descriptor.
>
>>In article <5835@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn) writes:
>>>I would think it's actually impossible.
>
>In article <634@boulder.Colorado.EDU> cdash@boulder.Colorado.EDU
>(Charles Shub) writes:
>>actually, it is possible. you know the inode associated with the descriptor
>>start at "/" and just keep looking for that i# keeping track of where you are.
>>Like the man said, it ain't easy, but it CAN be done.
>
>`It' can be done:  What can be done?  The problem is ill-defined
>in the first place.  `Find *the* name of the file.'  Who says there
>is only one?
>
>	% ln file ../other/file
>
>`Find *a* name of the file.'  That can be done iff the file has at
>least one name.
>
>`Find the internal name of the file.'  Easy: this is just the <dev,
>ino> pair.  One problem is that few people wish to deal with this
>form of the name.
>
>Incidentally, `find / -inum <n>' takes a *long* time on a big system.
>-- 

Try finding the name if it was unlinked (rm'd) after opening!
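The <dev, ino> "internal name" is what fstat(2) returns for any descriptor, and the same call tells you when a search is pointless. The sketch below is hypothetical (the enum and function names are invented), assumes modern POSIX, and can only rule names out: a nonzero link count says a name exists somewhere, not where. On some systems even an unnamed pipe reports a link count of 1, so the pipe case is decided by file type rather than by st_nlink.

```c
#define _XOPEN_SOURCE 700          /* for S_ISSOCK() */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical classification of what a descriptor refers to. */
enum fd_kind { FD_ERROR, FD_PIPE_OR_SOCKET, FD_UNLINKED, FD_NAMED };

/*
 * Fetch the internal name <st_dev, st_ino> of descriptor fd and say
 * whether a path name can exist for it.  Finding the name, if any,
 * still takes a directory walk.
 */
enum fd_kind internal_name(int fd, dev_t *dev, ino_t *ino)
{
    struct stat sb;

    if (fstat(fd, &sb) != 0)
        return FD_ERROR;
    *dev = sb.st_dev;
    *ino = sb.st_ino;
    if (S_ISFIFO(sb.st_mode) || S_ISSOCK(sb.st_mode))
        return FD_PIPE_OR_SOCKET;  /* pipe(2) pipes have no name; mkfifo FIFOs do */
    if (sb.st_nlink == 0)
        return FD_UNLINKED;        /* opened, then rm'd: no name left */
    return FD_NAMED;               /* at least one directory entry exists */
}

/* Print the pair in the <dev, ino> form used above. */
void print_internal_name(int fd)
{
    dev_t dev;
    ino_t ino;
    enum fd_kind k = internal_name(fd, &dev, &ino);

    if (k == FD_ERROR) {
        perror("fstat");
        return;
    }
    printf("fd %d: <%lu, %lu>%s\n", fd, (unsigned long)dev, (unsigned long)ino,
           k == FD_UNLINKED ? " (no names left)" :
           k == FD_PIPE_OR_SOCKET ? " (pipe or socket: may have no name)" : "");
}
```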

peter@citcom.UUCP (Peter Klosky) (05/08/87)

In article <6582@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> Incidentally, `find / -inum <n>' takes a *long* time on a big system.

It's true that scanning the whole file system to find a given
inum would take a long time.  This approach is like scanning a whole
document for a given word by examining each word.  A better approach is
to have a sorted list of words with pointers to occurrences.  Then a
word can be located by binary search.  The same approach can be used
with inode numbers by preparing a sorted list of inode numbers and file
names.

Given a list of (file system id, inode number, file name) records, it is
possible to locate candidate names for a file open by a process.  In many
cases, this will let the enhanced "ofiles" recently posted to 
net.sources reveal the names of the files open by a given process.
It will have trouble when the list is out of date, since
the system does not update the inum list.  For this reason, the
program can be fine-tuned to scan directories where changes occur
often, such as /tmp or other directories heavily used by the application.
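The sorted-index idea can be sketched with the standard qsort and bsearch. The record layout and function names are hypothetical, and building the index (for instance from a walk of each filesystem or from ncheck output) is left out. Because the index goes stale, each candidate name is re-checked with stat before it is trusted, and since hard links produce several records with the same key, the lookup tries all of them.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One record of a hypothetical prebuilt index: <filesystem, inode, name>. */
struct inum_rec {
    dev_t dev;
    ino_t ino;
    const char *name;
};

/* Order records by (dev, ino); hard links give equal keys, kept adjacent. */
int cmp_rec(const void *a, const void *b)
{
    const struct inum_rec *x = a, *y = b;

    if (x->dev != y->dev)
        return x->dev < y->dev ? -1 : 1;
    if (x->ino != y->ino)
        return x->ino < y->ino ? -1 : 1;
    return 0;
}

void sort_index(struct inum_rec *idx, size_t n)
{
    qsort(idx, n, sizeof *idx, cmp_rec);
}

/*
 * Binary-search the sorted index for <dev, ino> and return a name that
 * still refers to that inode, or NULL.  Stale entries fail the stat()
 * re-check and are skipped.
 */
const char *lookup_name(const struct inum_rec *idx, size_t n,
                        dev_t dev, ino_t ino)
{
    struct inum_rec key = { dev, ino, NULL };
    const struct inum_rec *hit = bsearch(&key, idx, n, sizeof *idx, cmp_rec);
    struct stat sb;

    if (hit == NULL)
        return NULL;
    while (hit > idx && cmp_rec(hit - 1, &key) == 0)
        hit--;                     /* back up to the first record for this key */
    for (; hit < idx + n && cmp_rec(hit, &key) == 0; hit++)
        if (stat(hit->name, &sb) == 0 && sb.st_dev == dev && sb.st_ino == ino)
            return hit->name;
    return NULL;
}
```

Each lookup costs O(log n) comparisons plus one stat per candidate link, instead of a stat for every file on the disk.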

If the file table of the process has a tcp/ip deal going, "ofiles"
knows about that, too, and will report if the process is waiting
to receive datagrams concerning "rwho" or whatever.

"ofiles" will also cat an unreferenced file, so even with

	yes >foo& rm foo

it is possible to see all the exciting data.
n.b. This program is a security hole, so only use it on systems
where the users are trusted.
-- 
Peter Klosky, Citcom Systems (materiel de telecommunications)
seismo!vrdxhq!baskin!citcom!peter (703) 689-2800 x 235

chris@mimsy.UUCP (Chris Torek) (05/09/87)

>In article <6582@mimsy.UUCP> I wrote:
>>`find / -inum <n>' takes a *long* time on a big system.

In article <1116@pompeii.UUCP> schwartz@swatsun (Scott Schwartz) writes:
>But if you had to do it, wouldn't you use ncheck(8)? (Assuming you have
>the required permissions)

A dangerous assumption:

	% df /usr
	Filesystem    kbytes    used   avail capacity  Mounted on
	/dev/hp4a     182123  138994   24916    85%    /usr
	% ls -lg /dev/*hp4a
	brw-r-----  1 root     operator   0,  32 Oct 17  1986 /dev/hp4a
	crw-r-----  1 root     operator   4,  32 Apr 27 08:52 /dev/rhp4a
	% groups
	staff wheel daemon sys kmem operator uucp internet emacs zmob
	tex bridge mcmob info speech um-software
	%

Anyone can run `find /usr', but only root and group operator can
read the drive directly.  Had *I* to do it, I would indeed use
ncheck; but the average program seeking to tie a name to, e.g.,
stdin could not assume that ncheck would succeed.
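A program in that position can simply try the raw device and fall back to a walk. A minimal sketch, assuming modern POSIX open(2); the enum and function names are hypothetical, and /dev/rhp4a is only the device from the listing above. Trying the open is simpler than re-deriving the owner/group/other permission check (root, group operator) by hand.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical: which way a name-finding program will go. */
enum strategy { USE_RAW_DEVICE, USE_DIRECTORY_WALK };

/*
 * Probe whether an ncheck-style scan of the raw disk is possible for this
 * user by actually opening the device.  Any failure, EACCES or otherwise,
 * means falling back to a find-style directory walk.
 */
enum strategy pick_strategy(const char *rawdev)
{
    int fd = open(rawdev, O_RDONLY);

    if (fd < 0)
        return USE_DIRECTORY_WALK;
    close(fd);
    return USE_RAW_DEVICE;
}
```

On the system shown above, pick_strategy("/dev/rhp4a") would presumably return USE_DIRECTORY_WALK for an ordinary user and USE_RAW_DEVICE for root or a member of group operator.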
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

brandon@tdi2.UUCP (05/12/87)

In article <5835@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
+---------------
| In article <816@killer.UUCP> jfh@killer.UUCP (John Haugh) writes:
| >If you wanted to, you could even find out the name
| >of the file that was connected to the descriptor.  (It is _not_ easy :-( )
| 
| I would think it's actually impossible.  The kernel doesn't remember
| the name of the path you used to open an inode, and some descriptors
| (e.g. pipes) have no associated names.
+---------------

Well, it's possible for non-pipes; unfortunately, you have to essentially
recode ncheck(1m) to do it.

++Brando
-- 
Brandon S. Allbery	           UUCP: cbatt!cwruecmp!ncoast!tdi2!brandon
Tridelta Industries, Inc.         CSNET: ncoast!allbery@Case
7350 Corporate Blvd.	       INTERNET: ncoast!allbery%Case.CSNET@relay.CS.NET
Mentor, Ohio 44060		  PHONE: +1 216 255 1080 (home +1 216 974 9210)

jfh@killer.UUCP (John Haugh) (05/13/87)

In article <314@desoto.UUCP>, shz@desoto.UUCP (S. Zirin) writes:
> In article <634@boulder.Colorado.EDU> cdash@boulder.Colorado.EDU (Charles Shub) writes:
> >In article <5835@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
> >>In article <816@killer.UUCP> jfh@killer.UUCP (John Haugh) writes:
> >>>If you wanted to, you could even find out the name
> >>>of the file that was connected to the descriptor.  (It is _not_ easy :-( )
> >>
> >>I would think it's actually impossible.  The kernel doesn't remember
> >>the name of the path you used to open an inode, and some descriptors
> >>(e.g. pipes) have no associated names.
> >
> >actually, it is possible. you know the inode associated with the descriptor
> >start at "/" and just keep looking for that i# keeping track of where you are.
> >Like the man said, it ain't easy, but it CAN be done.
> >-- 
> 
I wanted to remove Doug's remarks at this point and include Guy
Harris's instead, as Guy had a much better disagreement with me than
Doug did.  This next guy does not understand the meaning of the phrase

	It is _not_ easy :-(	<-- frown face ...

> Actually, it is even more difficult.
Like I said, 

				X     X
                                   |
                                  ___
                                 /   \

>	... I-numbers are only unique within
> a filesystem.  You would first have to determine which filesystem your FD 
> pointed to, and then start the search from the root of that filesystem.
To be more exact, a file is described by a device/i-number pair.  Pipes
have the pipe device for their device number, or so I recall, unless they
are named pipes.
> 	... Even 
> then it may not be possible to find the name of the file that was opened.
Did I say I want THE name or just a name?  I forget, and I don't want to scroll
up in the editor, so I'll say now what I meant: A NAME.  Multiple links to
a file are equivalent in USG Unix; only BSD Unix has 'symbolic links'
(last time I checked).  Those might be different, in which case all bets
are off.
> Assuming the FD did reference a file (as opposed to an unnamed pipe or other 
> 'magic' device), the last remaining name could have been unlinked.
Then I guess the file doesn't have a name, in which case no one could find
out the name.  Sorry.  :-(
>	... Which
> brings us to the last point: there may be multiple links to the file.  The 
> best you could hope for is to find the first name or all names, not 
> necessarily the correct name.
> 
I think I handled this one in the paragraph a ways back.  If all paths lead
to the same inode, why should I care which one I get?  And if I do care, I can
go look for the rest of the names.  Once again, it ain't easy.
> Check out the [usually undocumented] '-inum' option of FIND(1).
You work for AT&T, right?  Well, what is the deal with the '-depth' option
of find(1)?
> 
> Seth
> desoto!shz

- John.

Disclaimer -
	No disclaimer.  Whatcha gonna do, sue me?

jfh@killer.UUCP (John Haugh) (05/15/87)

Everyone writes:
	yes - no - yes - no - yes - no.

Everyone reads:
	confusion - confusion - confusion.

Some files have no name[s].  If the file descriptor is for one of those,
you ain't gonna get a name.  Files in this category that come to mind are
unnamed pipes and files that have been removed.  You will never find a
name for these guys.

Some files have names that you can't access.  If you aren't root you can't
look everywheres for the directory names, unless your system permits it.
You may never find a name for these guys unless you are root.

Some files have more than one name.  You really only need one unless you
want to do something special (like unlink it), since all links are equal
(except in BSD land).  You CAN find at least one name for these files.

Did I cover everything?  Let's put this one to rest.  Respond by E-Mail
rather than posting.  Also, somewheres I have a PD copy of ftw(3); I
think someone posted one a while ago as well.
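Since ftw(3) came up, the "more than one name" case can be sketched with nftw, the POSIX successor to ftw. This assumes a system that provides nftw with its FTW_PHYS and FTW_MOUNT flags; all_names and the fixed MAX_NAMES limit are hypothetical. It uses the link count from fstat to stop as soon as every name has been found, and it returns 0 for the nameless cases listed above. Names in directories the caller cannot read are simply missed.

```c
#define _XOPEN_SOURCE 700          /* for nftw(), FTW_PHYS and FTW_MOUNT */
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_NAMES 16

/* nftw() has no user-data argument, so the search state lives in globals. */
static dev_t want_dev;
static ino_t want_ino;
static nlink_t want_links;
static char names[MAX_NAMES][4096];
static size_t nnames;

static int visit(const char *path, const struct stat *sb, int type,
                 struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (type == FTW_NS)            /* stat failed: sb is not valid */
        return 0;
    if (sb->st_dev == want_dev && sb->st_ino == want_ino) {
        snprintf(names[nnames], sizeof names[nnames], "%s", path);
        nnames++;
        if (nnames >= (size_t)want_links || nnames == MAX_NAMES)
            return 1;              /* nonzero stops the walk early */
    }
    return 0;
}

/*
 * Collect up to MAX_NAMES names for the inode open on fd by walking the
 * tree at `root`.  Returns how many names were found (0 if the file has
 * been removed or no name is reachable from root).
 */
size_t all_names(int fd, const char *root)
{
    struct stat sb;

    nnames = 0;
    if (fstat(fd, &sb) != 0 || sb.st_nlink == 0)
        return 0;                  /* error, or every name already removed */
    want_dev = sb.st_dev;
    want_ino = sb.st_ino;
    want_links = sb.st_nlink;
    /* FTW_PHYS: don't follow symlinks; FTW_MOUNT: stay on one filesystem. */
    (void)nftw(root, visit, 16, FTW_PHYS | FTW_MOUNT);
    return nnames;
}
```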

- John.		(jfh@killer.UUCP)

Disclaimer -
	If my boss knew what USENET was, he'd want one of his own.

Favorite Borrowed Saying -
	"It's never too late for a happy childhood."