[comp.os.misc] tracing system calls

brent%terra@Sun.COM (Brent Callaghan) (09/02/88)

In article <2040@cuuxb.ATT.COM>, dlm@cuuxb.ATT.COM (Dennis L. Mumaugh) writes:
> 1).  We have a system call trace program that reports on each and
> every system call a process makes -- useful for support to figure
> out what a program  REALLY  is  doing.

And how!  Actually any SunOS 4.0 user can do that now with
the trace(1) command.

	$ trace date
	open ("/usr/lib/ld.so", 0, 021004) = 3
	read (3, "".., 32) = 32
	mmap (0, 139264, 5, 0x80000002, 3, 0) = 0xedda000
	mmap (0xedfa000, 8192, 7, 0x80000012, 3, 16384) = 0xedfa000
	munmap (0xedde000, 114688) = 0
	open ("/dev/zero", 0, 021112) = 6
	close (3) = 0
	getrlimit (3, 0xefffbc0) = 0
	mmap (0xee00000, 8192, 3, 0x80000012, 6, 0) = 0xee00000
	getuid () = 3497
	getgid () = 10
	open ("/etc/ld.so.cache", 0, 01670000000) = 3
	fstat (3, 0xefffb18) = 0
	mmap (0, 8192, 1, 0x80000001, 3, 0) = 0xedf4000
	close (3) = 0
	open ("/usr/lib/libc.so.0.10", 0, 01667720000) = 3
	read (3, "".., 32) = 32
	mmap (0, 409600, 5, 0x80000002, 3, 0) = 0xed72000
	mmap (0xedd2000, 16384, 7, 0x80000012, 3, 286720) = 0xedd2000
	munmap (0xedb8000, 106496) = 0
	close (3) = 0
	close (6) = 0
	getpagesize () = 8192
	brk (0x225a0) = 0
	brk (0x245a4) = 0
	gettimeofday (0x20558, 0) = 0
	gettimeofday (0x20558, 0) = 0
	open ("/usr/share/lib/zoneinfo/localtim".., 0, 01) = 3
	read (3, "".., 8192) = 754
	close (3) = 0
	ioctl (1, 0x40125401, 0xefff26c) = 0
	write (1, "Thu Sep  1 12:48:29 PDT 1988\n", 29)
	Thu Sep  1 12:48:29 PDT 1988
	= 29
	close (0) = 0
	close (1) = 0
	close (2) = 0
	exit (0) = ?
	$

Made in New Zealand -->  Brent Callaghan  @ Sun Microsystems
			 uucp: sun!bcallaghan
			 phone: (415) 336 6188

james@bigtex.uucp (James Van Artsdalen) (09/03/88)

In article <66624@sun.uucp>, brent%terra@Sun.COM (Brent Callaghan) wrote:

> And how!  Actually any SunOS 4.0 user can do that now with
> the trace(1) command.

> 	$ trace date

I count 8 closes, 6 mmaps, 5 opens, 3 reads, 2 munmaps, 1 write...

What on earth does all of this have to do with printing the date and
time???  getgid?  ioctl?  That's 36 system calls.
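
For anyone who wants to check the tally mechanically, here is a minimal
sketch in Python.  It assumes the trace format shown in the article
quoted above (a call name, a space, then an open parenthesis at the start
of each call line); the names TRACE, CALL and tally are made up for this
illustration and are not part of trace(1).

```python
import re
from collections import Counter

# The `trace date` output as posted.  The "Thu Sep ..." line is date's own
# output interleaved with the trace, and "= 29" is write's return value.
TRACE = r'''
open ("/usr/lib/ld.so", 0, 021004) = 3
read (3, "".., 32) = 32
mmap (0, 139264, 5, 0x80000002, 3, 0) = 0xedda000
mmap (0xedfa000, 8192, 7, 0x80000012, 3, 16384) = 0xedfa000
munmap (0xedde000, 114688) = 0
open ("/dev/zero", 0, 021112) = 6
close (3) = 0
getrlimit (3, 0xefffbc0) = 0
mmap (0xee00000, 8192, 3, 0x80000012, 6, 0) = 0xee00000
getuid () = 3497
getgid () = 10
open ("/etc/ld.so.cache", 0, 01670000000) = 3
fstat (3, 0xefffb18) = 0
mmap (0, 8192, 1, 0x80000001, 3, 0) = 0xedf4000
close (3) = 0
open ("/usr/lib/libc.so.0.10", 0, 01667720000) = 3
read (3, "".., 32) = 32
mmap (0, 409600, 5, 0x80000002, 3, 0) = 0xed72000
mmap (0xedd2000, 16384, 7, 0x80000012, 3, 286720) = 0xedd2000
munmap (0xedb8000, 106496) = 0
close (3) = 0
close (6) = 0
getpagesize () = 8192
brk (0x225a0) = 0
brk (0x245a4) = 0
gettimeofday (0x20558, 0) = 0
gettimeofday (0x20558, 0) = 0
open ("/usr/share/lib/zoneinfo/localtim".., 0, 01) = 3
read (3, "".., 8192) = 754
close (3) = 0
ioctl (1, 0x40125401, 0xefff26c) = 0
write (1, "Thu Sep  1 12:48:29 PDT 1988\n", 29)
Thu Sep  1 12:48:29 PDT 1988
= 29
close (0) = 0
close (1) = 0
close (2) = 0
exit (0) = ?
'''

# Assumed line shape: a lowercase call name, one space, then "(".
CALL = re.compile(r"^([a-z_]\w*) \(", re.MULTILINE)

def tally(trace_text):
    """Count how many times each system call name starts a line."""
    return Counter(CALL.findall(trace_text))

counts = tally(TRACE)
for name, n in counts.most_common():
    print(f"{n:3d} {name}")
print("total:", sum(counts.values()))
```

The pattern skips the interleaved program output and the bare "= 29"
line, since neither starts with a call name followed by " (".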

I don't want to flame Sun over trace though: that is incredibly
useful.  I am curious about implementation though: if it will display
the data for write(2) it would seem a security hole unless disabled
for suid processes.  Is there any possible way to write a similar
program under SysVr3 without kernel modifications?
-- 
James R. Van Artsdalen    ...!uunet!utastro!bigtex!james     "Live Free or Die"
Home: 512-346-2444 Work: 328-0282; 110 Wild Basin Rd. Ste #230, Austin TX 78746

ekrell@hector.UUCP (Eduardo Krell) (09/04/88)

In article <7460@bigtex.uucp> james@bigtex.UUCP (James Van Artsdalen) writes:

>Is there any possible way to write a similar
>program under SysVr3 without kernel modifications?

You could recompile the C library to intercept every system call
and then recompile the world. I suggest you don't do that.

It would be easier to upgrade to SVR3.2 which includes the /proc driver
and the truss program. (I don't remember if the SVR3.1 release had it,
but I don't think so).

Eduardo Krell                   AT&T Bell Laboratories, Murray Hill, NJ

UUCP: {att,decvax,ucbvax}!ulysses!ekrell  Internet: ekrell@ulysses.att.com

raf@andante.UUCP (Roger A. Faulkner) (09/04/88)

In article <66624@sun.uucp> brent%terra@Sun.COM (Brent Callaghan) writes:
>In article <2040@cuuxb.ATT.COM>, dlm@cuuxb.ATT.COM (Dennis L. Mumaugh) writes:
>> 1).  We have a system call trace program that reports on each and
>> every system call a process makes -- useful for support to figure
>> out what a program  REALLY  is  doing.

[ Dennis was referring to AT&T's as yet unreleased truss(1) command. ]

>And how!  Actually any SunOS 4.0 user can do that now with
>the trace(1) command.

[ output of 'trace date' omitted ]

Great minds run in the same paths, with some variations.
AT&T's truss(1) command was developed without any knowledge of Sun's trace(1)
command's actual or planned existence.  I presume the reverse is also true.

I wish to point out some of the ways in which AT&T's truss(1) is superior to
Sun's trace(1).  I do not especially want to put Sun down (though some could
read it that way), only to indicate some shortcomings in what they have done
and to impress on your minds some of the delicacy of debugger interfaces.

Sun does get credit for being first; trace(1) is already available on SunOS
4.0 while truss(1) is planned for a future release of System V from AT&T.
And no, it is not available in SVR3.2 as of now; it does exist as an add-on
package for SVR3.1 and SVR3.2 on the 3B2, but not yet to the outside world.
Complain to AT&T, not to me.

First and foremost, it must be observed that trace(1) is based on Sun's
enhanced ptrace(2) system call while truss(1) is based on AT&T's proc(4)
process filesystem, invented by Tom Killian of Bell Labs research and
extended and implemented for System V by Ron Gomes, with significant
input from me.  The deficiencies in trace(1) are largely due to the
deficiencies in ptrace(2) as compared to proc(4).

Ron Gomes did proc(4), I did truss(1); the credit (or blame) goes to us.

1. truss(1) can follow children created by fork(2).  You can trace a shell
   script of arbitrary complexity.  My favorite is spell(1), which runs
   an 8-member pipeline.  trace(1) can't do this because the ptrace(2)ed
   condition is not inherited; proc(4) tracing flags can be inherited.

2. Both trace(1) and truss(1) can grab existing processes.  However,
   truss(1) will grab an arbitrary number while trace(1) will grab only
   one.  Also, there is a bug in ptrace(2):  If a process terminates
   while being traced, its termination status is delivered (via wait(2))
   to the controlling process, not to the process's parent.  If a process
   is grabbed by trace(1) and then dies on a signal, the process's parent
   is not informed of the termination; to it, the process just vanished.
   (Terminating via exit(2) works OK because trace(1) lets go in time.)

3. truss(1) allows you to specify which system calls you wish to trace
   or exclude.  trace(1) traces all syscalls regardless.  proc(4) accepts
   a bit-mask to specify which syscalls to stop on; ptrace(2) stops on
   all syscalls.  Untraced syscalls incur no overhead with proc(4).

4. truss(1) does symbolic interpretation of syscall arguments, using
   #define names from relevant system header files.  trace(1) shows
   arguments only in decimal, octal, or hexadecimal.  truss(1) has
   an option to turn off symbolic interpretation, for unredeemed
   hackers like me who must see the raw bits to be happy.

5. truss(1) (verbose option) shows the contents of structures passed by
   address to specified system calls.  The contents are shown on output;
   values passed back from the operating system (like the stat structure
   from stat(2)) are displayed properly.  trace(1) doesn't do this.

6. truss(1) shows all characters of any filename argument; trace(1)
   shows only the first 32.  This is related to the next item.

7. trace(1) uses a heuristic based on the number of printable characters
   in the first 32 bytes of the I/O buffer for a read(2) or write(2) to
   decide whether or not to print the first 32 bytes of the buffer as a
   string (ambiguously, since '\' may or may not be an actual character
   in the I/O buffer).  truss(1) always prints the first 16 bytes in an
   unambiguous format.  Also, truss(1) accepts an option to print the
   entire contents of the I/O buffer for read()s or write()s on specified
   file descriptors.  This feature came only after I had an opportunity
   to play directly with trace(1); kudos to Sun, this is very useful.

8. truss(1) optionally prints the argument and environment strings passed
   in each exec(2) system call.  trace(1) could do this too, but it is
   useful mostly when following children, which trace(1) can't do.

9. Both truss(1) and trace(1) accept an option to count system calls rather
   than showing them line-by-line.  truss(1) only counts those syscalls which
   are being traced; child process syscalls may be included in the counts.

10.truss(1) reports sleeping system calls as "sleeping ..." if they remain
   asleep for more than 2 seconds.  trace(1) can't do this because of the
   ptrace(2) interface.

11.Both truss(1) and trace(1) report the receipt of signals.  Neither
   reports a signal before it is received (sent but blocked).  truss(1),
   by virtue of the proc(4) interface, reports any machine fault which
   the process incurs when it is incurred, even if the associated signal
   is blocked; trace(1) cannot do this with ptrace(2).

12.truss(1) accepts options to trace or exclude specified signals or
   machine faults.  proc(4) accepts a bit-mask of signals or faults
   to stop upon; ptrace(2) stops on all signals but no faults.

13.When truss(1) encounters an exec(2) of a set-uid or set-gid object
   (a process tracing security violation), proc(4) forces it to give up
   and allow the process to continue unmolested.  When trace(1) encounters
   such an exec(2), ptrace(2) silently disables the setting of the set-uid
   or set-gid and trace(1) continues to trace the process.  The process
   will eventually fail because it doesn't have correct permissions.
   The proc(4) interface does a proper job of enforcing security without
   changing process behavior; ptrace(2) just botches it (and always has).
   If truss(1) is run as super-user, set-uid and set-gid processes can
   be traced with no problem.  Running trace(1) as super-user helps some
   but it still has the same problem for non-super-user grabbed processes.

14.The ptrace(2) mechanism is intimately intertwined with the signal
   mechanism.  In particular, stopping on syscalls involves sending
   SIGTRAP.  If a process uses SIGTRAP for interprocess communication
   (I would call such a process terminally brain-damaged, but nothing in
   the system prevents such things), it will fail when trace(1) is applied
   to it.  The proc(4) mechanism is independent of the signal mechanism and
   does not suffer from this sort of problem.  A program using proc(4) can
   choose to trace signals or not; a signal is just one of the events a
   process can stop on, others are machine faults and syscalls.  A process
   can be stopped without sending SIGSTOP.  Provisions exist for cooperating
   with job-control stop/start signals and ptrace(2) as well.

15.ptrace(2) causes a traced process to die when its controlling process
   dies.  If a process is grabbed by trace(1) and trace(1) is killed with
   'kill -9', then the traced process also dies.  trace(1) catches all
   other signals in order to let go of the traced process before exiting.
   truss(1) doesn't have this problem; when it is killed with 'kill -9',
   the traced process continues unmolested.

16.There is a serious bug in SunOS 4.0 involving the interaction of
   job-control stop signals (SIGSTOP and its relatives) and ptrace(2).
   If a process is stopped by sending it a job-control stop signal and
   trace(1) is applied to it while it is so stopped, then trace(1) hangs
   and becomes unkillable, even with 'kill -9'.  The whole ptrace(2)
   mechanism is then locked out and any instance of dbx also becomes
   hung and unkillable.  The only recourse is a reboot.
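
The bit-mask idea in items 3 and 12 can be sketched as follows.  This is
only an illustration of the technique in Python, not the proc(4)
interface itself; the names make_mask and should_stop and the call
numbers in the table are assumptions chosen for the example.

```python
# Illustrative call numbers only; a real system defines its own table.
SYSCALL_NUMBERS = {"read": 3, "write": 4, "open": 5, "close": 6}

def make_mask(names):
    """Build a bit-mask with one bit set per system call to stop on."""
    mask = 0
    for name in names:
        mask |= 1 << SYSCALL_NUMBERS[name]
    return mask

def should_stop(mask, number):
    """Return True if the call with this number is selected in the mask."""
    return bool(mask & (1 << number))

# Trace only open and close; every other call runs without stopping.
mask = make_mask(["open", "close"])
for name, number in SYSCALL_NUMBERS.items():
    print(name, "stop" if should_stop(mask, number) else "run")
```

The point of the design is that the test is a single AND per call, so a
call that is not selected never has to stop the process at all.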

				Roger A. Faulkner
				allegra!raf

mike@turing.unm.edu (Michael I. Bushnell) (09/04/88)

In article <7460@bigtex.uucp> james@bigtex.UUCP (James Van Artsdalen) writes:
>
>I don't want to flame Sun over trace though: that is incredibly
>useful.  I am curious about implementation though: if it will display
>the data for write(2) it would seem a security hole unless disabled
>for suid processes.  Is there any possible way to write a similar
>program under SysVr3 without kernel modifications?

Trace(1) is undoubtedly done using ptrace(2) in combination with an
option added by Sun that stops the process on entry to and on return
from system calls.   If you don't modify your kernel to have this feature,
then trace(1) becomes a matter of tracing entry points to the C library...
that will find system calls executed the "normal" way, but not freaky things
like people writing code (on the fly) into their data segment and then
executing it.
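
A rough sketch of the library-entry-point approach, in Python rather
than in the C library, and only as an analogy: the decorator name traced
and the LOG list are hypothetical, and the output format merely imitates
the trace(1) lines shown earlier.

```python
import functools
import os

LOG = []  # trace lines collected by the wrappers

def traced(func):
    """Wrap a library entry point so each call and its result are logged."""
    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        arglist = ", ".join(repr(a) for a in args)
        LOG.append(f"{func.__name__} ({arglist}) = {result!r}")
        return result
    return wrapper

# Only calls made through the wrapped name are seen.
getpid = traced(os.getpid)
getpid()

# A call that goes around the wrapper is invisible to this kind of tracer.
os.getppid()

print("\n".join(LOG))
```

The unlogged os.getppid() call shows the limitation: anything that does
not go through the wrapped entry point is never recorded.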

And, since it probably uses ptrace(2), setuid is ignored for the process.

-- 
                N u m q u a m   G l o r i a   D e o 

			Michael I. Bushnell
			HASA - "A" division
			mike@turing.unm.edu
	    {ucbvax,gatech}!unmvax!turing.unm.edu!mike

ado@elsie.UUCP (Arthur David Olson) (09/05/88)

> 1. truss(1) can follow children created by fork(2)...
> ...
> 16.There is a serious bug in SunOS 4.0 involving the interaction of
>    job-control stop signals (SIGSTOP and its relatives) and ptrace(2)...

All of which is rendered moot by
17.trace(1) is available.  truss(1) isn't.

Horn tooting time will come when truss(1) hits the street.
-- 
	ado@ncifcrf.gov			ADO is a trademark of Ampex.

brent%terra@Sun.COM (Brent Callaghan) (09/05/88)

In article <7460@bigtex.uucp>, james@bigtex.uucp (James Van Artsdalen) writes:
> 
> > 	$ trace date
> 
> I count 8 closes, 6 mmaps, 5 opens, 3 reads, 2 munmaps, 1 write...
> 
> What on earth does all of this have to do with printing the date and
> time???  getgid?  ioctl?  That's 36 system calls.

Most of the system calls in this example are taking care of the
mapping in of the shared libraries.  The overhead is the same
for any dynamically linked process - it just looks top-heavy for
a program like "date".  It's the classical time/space tradeoff:
it takes a little longer to get to the gettimeofday() but date
has a much smaller executable, uses the latest version of the
C library, and shares the same C library code as all the other
resident processes.

> I am curious about implementation though: if it will display
> the data for write(2) it would seem a security hole unless disabled
> for suid processes.

You're right.  You can't trace suid processes unless you are
also superuser.  The trace command is bound by the restrictions
already on the ptrace() system call.  A suid program will run but
it won't have any of its suid-ness.  Neither can you trace other
people's processes - they must be yours.

brent%terra@Sun.COM (Brent Callaghan) (09/06/88)

In article <11966@andante.UUCP>, raf@andante.UUCP (Roger A. Faulkner) writes:
> Great minds run in the same paths, with some variations.
> AT&T's truss(1) command was developed without any knowledge of Sun's trace(1)
> command's actual or planned existence.  I presume the reverse is also true.
> 

Yes indeed.  Except for the name there are incredible similarities:
both use a -p flag to trace a pid, a -c flag for system call counting,
and a -o flag for trace redirection to a file.

> First and foremost, it must be observed that trace(1) is based on Sun's
> enhanced ptrace(2) system call while truss(1) is based on AT&T's proc(4)
> process filesystem, invented by Tom Killian of Bell Labs research and
> extended and implemented for System V by Ron Gomes, with significant
> input from me.  The deficiencies in trace(1) are largely due to the
> deficiencies in ptrace(2) as compared to proc(4).

I agree, the /proc interface is a much better way to do this sort of thing.

> 1. truss(1) can follow children created by fork(2).  You can trace a shell
>    script of arbitrary complexity.  My favorite is spell(1), which runs
>    an 8-member pipeline.  trace(1) can't do this because the ptrace(2)ed
>    condition is not inherited; proc(4) tracing flags can be inherited.

Yes, this is a nice feature.  We had a "trace through fork" version
running internally but couldn't get it into the release in time.  The
price of being the first... :-)

> 10.truss(1) reports sleeping system calls as "sleeping ..." if they remain
>    asleep for more than 2 seconds.  trace(1) can't do this because of the
>    ptrace(2) interface.

A trace command user can usually assume a sleep if the cursor is sitting
after an "=" waiting for the return value to come back, e.g.

	select (256, 0xdfffc24, 0xdfffc04, 0xdfffbe4, 0) = 
                                                           ^
Thanks for the description of truss and its comparison with trace.
There's no doubt that truss is a better implementation of a system
call tracer.  I look forward to using it in sVr4.

james@bigtex.uucp (James Van Artsdalen) (09/08/88)

In article <66892@sun.uucp>, brent%terra@Sun.COM (Brent Callaghan) wrote:
> In article <7460@bigtex.uucp>, james@bigtex.uucp (me) writes:

> > > 	$ trace date

> > What on earth does all of this have to do with printing the date and
> > time???  getgid?  ioctl?  That's 36 system calls.

> Most of the system calls in this example are taking care of the
> mapping in of the shared libraries.

How does this implementation compare to the SysVr3 shared libraries?
I had been under the impression there that shared libraries were
mapped into the address space by the kernel, not by the user process.
Is there any particular advantage between the two schemes, or does
SysVr3 do it like Sun OS-4?

guy@gorodish.Sun.COM (Guy Harris) (09/09/88)

> How does this implementation compare to the SysVr3 shared libraries?
> I had been under the impression there that shared libraries were
> mapped into the address space by the kernel, not by the user process.
> Is there any particular advantage between the two schemes, or does
> SysVr3 do it like Sun OS-4?

S5R3 does it in the kernel.  SunOS 4.0 does it in user mode.  The advantage of
the latter is that the code that does it isn't wired-down code in the kernel,
which makes it easier to debug, replace, etc.

It also means it's easier to make it more powerful; the SunOS shared libraries
are not tied to specific locations, and relocation is done when the library is
mapped in.  Process A can map the library in at address A, while process B can
map it in at address B.  This obviates the need to "register" shared libraries
at particular addresses.
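
A minimal sketch of relocation at map time, in Python.  The symbol names
and offsets are invented for illustration, as is the second base address;
the first base is the address at which libc was mapped in the "trace
date" output earlier in this thread.  This is not the SunOS dynamic
linker's actual data structure.

```python
# Hypothetical offsets of two routines within the library image.
LIB_SYMBOLS = {"printf": 0x1A40, "gettimeofday": 0x5C10}

def relocate(symbols, base):
    """Turn library-relative offsets into addresses for one process."""
    return {name: base + offset for name, offset in symbols.items()}

# Each process maps the same library wherever it has room.
process_a = relocate(LIB_SYMBOLS, 0xED72000)
process_b = relocate(LIB_SYMBOLS, 0xF000000)

for name in LIB_SYMBOLS:
    print(name, hex(process_a[name]), hex(process_b[name]))
```

Because only offsets are stored in the library, nothing has to be agreed
on ahead of time about where any given library will live.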

In addition, you can specify a "search path" for shared libraries, so that you
can provide a "private" shared library that, for example, could wrap all system
calls that take pathnames with code that knows how to expand "~username", so
that all dynamically-linked programs will understand "~username" (with one
exception; set-UID and set-GID programs ignore this path, for obvious reasons).
Not all of the mechanism to make this convenient is present in the current
release, but it will probably appear in future releases.

You could also provide a run-time interface to the dynamic linker, so you can
write code that, for example, reads the name of a shared library file from
another file, and then gets a pointer to the routine named "foobar" in that
file and calls it.  This could be convenient for structured document editors;
if you wanted to define a new kind of frame within a document, say a frame for
editing PERT charts, you could make the implementation of that frame into a
shared library file and tell the editor that the code to implement this kind of
frame is in "/usr/local/docedit/lib/pert_frame.so".  Again, this mechanism is
not currently present, but will probably appear eventually.
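
To make the idea concrete, here is a minimal sketch in Python using its
ctypes module.  It is not the SunOS interface, which as noted does not
exist yet.  It assumes a POSIX system on which passing None to
ctypes.CDLL opens the running program's own symbols, C library included;
the helper name lookup is hypothetical, and strlen stands in for
"foobar" so that the example can actually run.

```python
import ctypes

def lookup(library_path, routine_name):
    """Load a shared library by file name and return the named routine."""
    # Assumption: None means "the running program itself" on POSIX systems.
    lib = ctypes.CDLL(library_path)
    # Raises AttributeError if the library has no such routine.
    return getattr(lib, routine_name)

# Both strings could just as well have been read from a file at run time.
strlen = lookup(None, "strlen")
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

print(strlen(b"foobar"))
```

An editor could call lookup() with a path such as the pert_frame.so
example above and a routine name taken from its own configuration.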