[comp.lang.c] Argument validity checking

ggw@wolves.uucp (Gregory G. Woodbury) (01/19/90)

While playing around with yet another subroutine library to perform interactive
editing of fields under curses(3x) I came face-to-face with a missing feature
of UN*X (and probably most C language environments).

When a subroutine depends on the user to pass addresses (strings, structures,
or functions) that the subroutine is going to use, and the subroutine wants
to be robust about not killing the process if the user makes a mistake,
validity checking the arguments passed is one of the front line defenses.

The problem, however, is that UN*X environments (at least Sys5 and related
ones) do not provide a general means of determining if a given address is
going to generate a memory fault of some kind.  By this I mean that before
using the address (to call a function for example) there is no way to discern
that the address is not available to the process.

Some programs can use signals or other exception trapping mechanisms to
catch bad references after the fact and attempt to fix up -- but this is
not a general method.
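
A minimal sketch of that after-the-fact approach, assuming a POSIX system
with sigaction() and sigsetjmp().  The names probe_read, probe_fault and
probe_env are made up for illustration.  Jumping out of a SIGSEGV handler is
not guaranteed to work by the C standard, so treat this as an experiment
rather than a general method:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for sigaction() and sigsetjmp() */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf probe_env;      /* where to resume after a fault */

/* Signal handler: abandon the faulting read and return to probe_read(). */
static void probe_fault(int sig)
{
    (void) sig;
    siglongjmp(probe_env, 1);
}

/* Returns 1 if one byte at addr could be read, 0 if the read faulted. */
int probe_read(const volatile char *addr)
{
    struct sigaction sa, old_segv, old_bus;
    volatile int ok = 0;

    sa.sa_handler = probe_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* Second argument 1: the signal mask is restored on siglongjmp(). */
    if (sigsetjmp(probe_env, 1) == 0) {
        (void) *addr;             /* the read that may fault */
        ok = 1;
    }

    /* Put the caller's handlers back, whatever happened. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return ok;
}
```

probe_read(p) only says whether a single byte at p could be read at that
moment.  It is neither reentrant nor thread-safe, and it says nothing about
the bytes that follow or about whether p is the address the caller meant.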

Some architectures provide a machine instruction to "probe" an address
to determine access, but generally such instructions or facilities are
not available at the C interface.

I can easily see some of the complications of implementing such a
facility (handling paged out memory, dealing with shared memory
libraries and such like) but began wondering if other variants of UN*X
have provided such a facility, or how other programmers deal with the
desire to be robust in a non-robust environment like most UN*Xes?

-- 
Gregory G. Woodbury
Sysop/owner Wolves Den UNIX BBS, Durham NC
UUCP: ...dukcds!wolves!ggw   ...dukeac!wolves!ggw           [use the maps!]
Domain: ggw@cds.duke.edu  ggw@ac.duke.edu  ggw%wolves@ac.duke.edu
Phone: +1 919 493 1998 (Home)  +1 919 684 6126 (Work)
[The line eater is a boojum snark! ]           <standard disclaimers apply>

wittig@gmdzi.UUCP (Georg Wittig) (01/22/90)

ggw@wolves.uucp (Gregory G. Woodbury) writes:
>When a subroutine depends on the user to pass addresses (strings, structures,
>or functions) that the subroutine is going to use, and the subroutine wants
>to be robust about not killing the process if the user makes a mistake,
>validity checking the arguments passed is one of the front line defenses.

>The problem, however, is that UN*X environments (at least Sys5 and related
>ones) do not provide a general means of determining if a given address is
>going to generate a memory fault of some kind.

My solution is the following one:

	#define MIN_NON_NIL_PTR ((unsigned long) 1L)
	#define MAX_NON_NIL_PTR ((unsigned long) 0x00ffffffL)

	if ( ! ( ((unsigned long) ptr_in_question) >= MIN_NON_NIL_PTR   &&
		 ((unsigned long) ptr_in_question) <= MAX_NON_NIL_PTR ) )
	{	... get_angry_or_whatever () ...
	}
or, if you allow a nil ptr:

	if (ptr_in_question != 0   &&   (...see above...))

I know, that's not a perfect solution. The values MIN_NON_NIL_PTR and
MAX_NON_NIL_PTR may vary from machine to machine. You know how to use #ifdef :-)
The condition ``MIN <= ptr <= MAX'' may be more complicated, and so on, and so
on ...

BUT it works on a surprising number of machines.

Does someone know if there exists a portable ANSI C conforming solution for that
problem?
-- 
Georg Wittig   GMD-Z1.BI   P.O. Box 1240   D-5205 St. Augustin 1 (West Germany)
email: wittig@gmdzi.uucp   phone: (+49 2241) 14-2294
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Freedom's just another word for nothing left to lose" (Kris Kristofferson)

michael@stb.uucp (Michael Gersten) (02/02/90)

<Sigh>. Guys, to find out if an address is valid or not, pass it to
access as a filename. It has to check for that being valid and in your
address, and you can see if it gives you EACCESS or not.

So there is a use for access() after all.

		Michael
-- 
		Michael
denwa!stb!michael anes.ucla.edu!stb!michael 
"The 80's: Ten years that came in a row."

ggw@wolves.uucp (Gregory G. Woodbury) (02/03/90)

In article <10542@june.cs.washington.edu>
machaffi@fred.cs.washington.edu.cs.washington.edu (Scott MacHaffie) writes:
>
>ggw@wolves.uucp (Gregory G. Woodbury) writes:
>>When a subroutine depends on the user to pass addresses (strings, structures,
>>or functions) that the subroutine is going to use, and the subroutine wants
>>to be robust about not killing the process if the user makes a mistake,
>>validity checking the arguments passed is one of the front line defenses.
>
>I must be missing something here, but what can a function do when
>it finds an invalid address besides printing an error message and
>exiting?  If the function can't do anything, then you might as well
>let the operating system catch it with an illegal memory access.

	True, the function can only complain to the calling program
that the service request is invalid.  If the calling program wishes to
perform a fix-up and/or display an error message, that is up to the
programmer using the function.  After all, the operating system tells the
program why a system call failed (usually - that's why there's the extern
int errno ;-).
	What I don't want to happen is for the program to come to a screeching
crash and dump core (or whatever) all over the machine's disks.  The original
complaint arises from the fact that there is no *portable* method to catch
invalid addresses so that the function can report to the caller that the
service request is incorrect.

>
>Even if the various checks could determine that the address is within
>the process' address space, there is no way to determine if it contains
>legal data without accessing it.  Thus, there is no way to tell
>beforehand if the address is valid.

	There are two kinds of validity confused here.  One does have to
depend on the caller to provide some things; on the other hand, the
function should try to be robust in preventing the caller from shooting
itself in the foot.  The validity I was trying to insure was that the
subroutine can pull bytes (or whatever) from the given address, or that
the given address can be called as a function without *immediately* killing
the process.

	This kind of robustness is important (mainly) in two kinds of
systems: fault-tolerant systems and secure systems.  In a fault-tolerant
system, the function should report that a fault has occurred and let the
operating system retry however it chooses.  In a security-conscious system,
the *probe* should determine if the caller has sufficient privilege to
access the data.

	As I commented to someone somewhere, the complaint perhaps should
be directed to the POSIX standards people to see if there might be some
way to have the OS provide the service in some standard way.

arielf@taux01.UUCP (Ariel Faigon) (02/04/90)

In <1990Jan26.003654.6080@NCoast.ORG> Brandon S. Allbery writes:
| As quoted from <1891@gmdzi.UUCP> by wittig@gmdzi.UUCP (Georg Wittig):
| +---------------
| | My solution is the following one:
| | 
| | 	#define MIN_NON_NIL_PTR ((unsigned long) 1L)
| | 	#define MAX_NON_NIL_PTR ((unsigned long) 0x00ffffffL)
| +---------------
|
I liked Brandon's original suggestion (to pass the address to some
system-call which checks for EFAULT).
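
A rough sketch of that idea, assuming a POSIX system whose write() fails
with EFAULT when handed an unreadable buffer (many kernels do this, but it
is not guaranteed everywhere).  The function name kernel_can_read is
hypothetical.  A pipe is used as the target because a device such as
/dev/null may accept the write without ever looking at the buffer:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Ask the kernel to copy one byte from addr into a pipe.
 * Returns 1 if the copy worked, 0 if write() reported EFAULT,
 * and -1 if the probe itself could not be carried out.
 */
int kernel_can_read(const void *addr)
{
    int fds[2];
    int result;

    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], addr, 1) == 1)
        result = 1;               /* the kernel could read the byte */
    else if (errno == EFAULT)
        result = 0;               /* address is not readable */
    else
        result = -1;              /* some unrelated failure */

    /* Closing both ends discards the byte that was written. */
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Each probe costs several system calls, so this is something to do at the
edge of a library and not in an inner loop.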

Anyway, without claiming that the following solution is portable, general,
or whatever, I'll post my contribution to this thread,
just because on some systems it may be a bit better than Georg's solution
(although it is basically the same idea).

Quoted from some derivative of a 4.x BSD manual on end(3):

NAME
     end, etext, edata - last locations in program

SYNOPSIS
     extern end;
     extern etext;
     extern edata;

So (I add 'start' which may be defined in your C startup module):

#define IN_MY_TEXT(addr) ((void *) &start <= (addr) < (void *) &etext)
#define IN_MY_DATA(addr) (!IN_MY_TEXT(addr) && (addr) < (void *) &end)
#define IN_MY_HEAP(addr) ((void *) &end <= (addr) < (void *) sbrk(0))
#define IN_MY_ADDRESS_SPACE(addr) \
	(IN_MY_TEXT(addr) || IN_MY_DATA(addr) || IN_MY_HEAP(addr))

(disclaimer: this code wasn't tested).

This still doesn't handle gaps, shared memory segments, or stack space.
You can check (again, not bullet-proof) for an address near the top of
your stack by comparing 'addr' to the address of some local variable.

Just another approximation for the truth :-)
-- 
Ariel Faigon, CTP group, NSTA
National Semiconductor (Israel)
6 Maskit st.  P.O.B. 3007, Herzlia 46104, Israel   Tel. (972)52-522312
arielf%taux01@nsc.com   @{hplabs,pyramid,sun,decwrl} 34 48 E / 32 10 N

arielf@taux01.UUCP (Ariel Faigon) (02/04/90)

Ooops, I just wrote:
#define IN_MY_TEXT(addr) ((void *) &start <= (addr) < (void *) &etext)
                                          ^^^^^^^^^^^
#define IN_MY_HEAP(addr) ((void *) &end <= (addr) < (void *) sbrk(0))
					^^^^^^^^^^^

You need of course separate comparisons here
like in:
	((void *) &start <= (addr) && (addr) < (void *) &etext)

As I said, the code wasn't tested, or even reviewed enough. Sorry.
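
With that correction applied, the same idea might look like the sketch
below.  It assumes a Unix-like linker that defines etext and end (see
end(3)) and a working sbrk(); 'start' is left out because it is not defined
everywhere.  The helper names are invented for illustration, and like the
macros this is only an approximation:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1         /* ask glibc to declare sbrk() */
#endif
#include <stdint.h>
#include <unistd.h>

extern char etext, end;           /* provided by the linker, see end(3) */

/*
 * True if lo <= p < hi.  The addresses are compared as integers
 * because the three pointers do not point into one object.
 */
static int addr_between(const void *p, const void *lo, const void *hi)
{
    uintptr_t a = (uintptr_t) p;

    return (uintptr_t) lo <= a && a < (uintptr_t) hi;
}

/* Static data: from the end of the text segment up to end. */
int in_my_data(const void *addr)
{
    return addr_between(addr, &etext, &end);
}

/* Heap grown with brk/sbrk: from end up to the current break. */
int in_my_heap(const void *addr)
{
    return addr_between(addr, &end, sbrk(0));
}
```

Memory that malloc() obtained some other way than sbrk() (for example by
mapping pages) falls outside both ranges, so a "no" from these checks does
not prove the address is bad.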

lehners@uniol.UUCP (Joerg Lehners) (02/05/90)

Hello !

michael@stb.uucp (Michael Gersten) writes:
><Sigh>. Guys, to find out if an address is valid or not, pass it to
>access as a filename. It has to check for that being valid and in your
>address, and you can see if it gives you EACCESS or not.

But that would cause tons of useless disk I/O.
And that routine would be really slow if the buffer (interpreted
as a path by access()) is a valid path to a file on, e.g., a mounted floppy
disk.

>So there is a use for access() after all.

I hope Michael is just joking ...

  Joerg
--
/ UUCP:    lehners@uniol              | Joerg Lehners                  \
|       ...!uunet!unido!uniol!lehners | Fachbereich 10 Informatik ARBI |
| BITNET:  066065 AT DOLUNI1          | Universitaet Oldenburg         |
\ Inhouse: aragorn!joerg              | D-2900 Oldenburg               /

machaffi@fred.cs.washington.edu (Scott MacHaffie) (02/05/90)

In article <1990Feb3.052307.12524@wolves.uucp> ggw@wolves.UUCP (Gregory G. Woodbury) writes:
>The validity I was trying to insure was that the
>subroutine can pull bytes (or whatever) from the given address, or that
>the given address can be called as a function without *immediately* killing
>the process.

If you are expecting a function pointer and are passed a character pointer,
for example, then the process will probably die as soon as you call the
"function".  As an example, suppose that you are passed a pointer to
a character string which just happens to be the same as a function prolog
-- except that the first "instruction" is an illegal memory access.
Thus, the same problem returns for testing for valid function pointers.

			Scott MacHaffie

lca@spodv4.UUCP (Lars H Carlsson) (02/05/90)

In article <1990Feb2.070437.2695@stb.uucp>, michael@stb.uucp (Michael Gersten) writes:
> <Sigh>. Guys, to find out if an address is valid or not, pass it to
> access as a filename. It has to check for that being valid and in your
> address, and you can see if it gives you EACCESS or not.
> 
> So there is a use for access() after all.
> 
> 		Michael
> -- 
> 		Michael
> denwa!stb!michael anes.ucla.edu!stb!michael 
> "The 80's: Ten years that came in a row."


X/Open page ACCESS(2).1
"
...
	[EACCES]	Permission bits of the file mode do not permit
			the requested access.
...
"

	(there are access and access ;-)

LH

chris@mimsy.umd.edu (Chris Torek) (02/05/90)

This whole discussion has been rather amazing.  In most cases, there is
little difference between a program that, when run, says

	% compute 2 + 2
	Segmentation fault (core dumped)
	% 

and one that says

	% compute 2 + 2
	!*797tKG
	%

where the former used an invalid address, and the latter used a valid but
incorrect address.  Testing whether an address can be read or written does
not tell whether that address *should* be read or written.  Much better
would be, for instance, a program that says:

	% compute 2 + 2
	compute: panic: add_integers: invalid data type code 47!
	compute: This program has discovered itself to be buggy.
		Please notify the vendor, including what you did
		and the exact output from the program.
	Segmentation fault (core dumped)
	% 

Address validity checking is at best a minor part of real validity
checking.  The core dump provides enough information to locate the bad
address, which is as much as the program could have done anyway (since
it must assume, once something has gone wrong, that *anything* could go
wrong).

There are a few exceptions to this rule, but they are fairly rare.
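
A small sketch of the kind of check meant above, with made-up names (struct
value, VAL_INT, panic): the routine checks the type code it was handed and,
if it is wrong, says what it found before calling abort(), so a core dump
is still produced.  This is an illustration only, not code from any real
"compute" program:

```c
#include <stdio.h>
#include <stdlib.h>

enum { VAL_INT = 1, VAL_REAL = 2 };   /* hypothetical type codes */

struct value {
    int type;                         /* one of the VAL_ codes above */
    union { long i; double r; } u;
};

/* Report an internal inconsistency, then abort() to leave a core dump. */
static void panic(const char *func, const char *what, int code)
{
    fprintf(stderr, "compute: panic: %s: %s %d!\n", func, what, code);
    fprintf(stderr,
            "compute: This program has discovered itself to be buggy.\n");
    abort();
}

/* Add two values, but only after checking that both really are integers. */
long add_integers(const struct value *a, const struct value *b)
{
    if (a->type != VAL_INT)
        panic("add_integers", "invalid data type code", a->type);
    if (b->type != VAL_INT)
        panic("add_integers", "invalid data type code", b->type);
    return a->u.i + b->u.i;
}
```

The check costs two integer comparisons and catches a valid but incorrect
address just as well as an invalid one, which no address probe can do.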
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris