[comp.os.minix] #! in MM -- take 2

klamer@mi.eltn.utwente.nl (Klamer Schutte) (05/24/91)

In <10033@star.cs.vu.nl> kjb@cs.vu.nl (Kees J. Bot) writes:

>I'm posting my comments to Klamer's second try at an #! implementation in
>MM to remind you about my implementation of #! that I posted on May 13.

>klamer@mi.eltn.utwente.nl (Klamer Schutte) writes:
|>Here is the second version of my #!interpreter patch for mm/exec.c.
|>This version has all known bugs fixed.

>Except for not doing setuid and this other "feature".

I said i had bugs fixed. Setuid in your way is a bug.

|>One feature (bug ???) remains: i keep alignment from the data argv[] and
|>envp[] point to intact. There (migh ???) be a tradition of having this
|>data in the form of strings with only 1 \0 in between.
|>Where is the manual page for execve(2) ???? Or does POSIX(*) say anything
|>about this?

>I know of three places to look for the proper format of the initial stack:
>- The old V7 manuals under exec(2), written when users were not considered
>  too stupid to know such things.
>- The source code of execve(2).
>- The source code of ps(1).
>The ps(1) source contains this interesting comment:

How nice, to talk about self-documenting code. But i don't model system
call behaviour to assumptions made by any program. So ps(1), look at the
POSIX description of execve(). And yes, i have looked at the source code
of execve(2). This one is mm/exec.c right? The other one (lib/posix/execve.c)
is just an attempt to get the POSIX specification mapped to minix system.

>I decided to go over Klamer's patch with a fine comb this time.  (I wish
>someone would do that with my patch, with a mental -pedantic flag on.)

Good initiative. Any Minix'er is invited to do so.

>- ALIGN align to a multiple of 2, execve to a multiple of sizeof(char *).

Yep. The comment near ALIGN says this is the correct alingment for an Atari
ST. You know, that 16 bits machine ...

>- The interpreter is found relative to '/'.  (Move the first
>  tell_fs(CHDIR, ...) inside the do loop.)

This is a bug, yes. Not to bad -- the interpreter should be specified with
an absolute path in traditional (;-) #! implemetations. (Like SunOs).

>- Setuid bits on the script are still ignored.  (Wouldn't it be nice to
>  allow people to explore the security risks of a setuid script?)

They can. They only have to change mm/exec.c. Any user not capable of doing
that should not explore security risks ;-)

>- Change 'know' to 'now' in patch_stack.  (-pedantic)
>- The old argv[0][] is not removed from the initial stack.

See the earlier comments about the stack format.

>- The ALIGN(len) is still at the wrong place.  Try moving only the strings
>  by disp bytes, then move the pointers by argc*sizeof(char *) bytes.  Do
>  an ALIGN(disp) just before the return.

This will destroy alignment of the strings pointed by by argv[] and envp[].

>- If stk_bytes is close to ARG_MAX then the last few environment variables
>  may be truncated.

There are three choices overhere:
1) truncate some of the strings if stk_bytes is close to ARG_MAX
2) expand stk_bytes to bigger than ARG_MAX if ...
3) fail the execution if ...

I choose 1). You choose 3). 2) is IMHO out of the question.
Rethinking it, 3) might be the best choice. What is the proper error code?
ENOMEM specifies also that there is not enough core ...
(Don't tell me lib/posix/execve.c uses ...)

>- Read_header returns 0 when there is nothing behind #!.
Bug.
>- The size_ok function may return something other than -100.
Can't check now. Don't have full minix online.


I must say this kind of bug finding is a good way to get rid of them.
But must this be on news? Private mail should be appropriate.

Can anybody with a POSIX copy at hand clearify what the correct stack format 
is? My old V7 manual (for a m68k system) only says something about pdp11 and
interdata stack layouts...

Klamer
-- 
Klamer Schutte
Faculty of electrical engineering -- University of Twente, The Netherlands
klamer@mi.eltn.utwente.nl	{backbone}!mcsun!mi.eltn.utwente.nl!klamer

philip@cs.vu.nl (Philip Homburg) (05/24/91)

In article <klamer.675068096@mi.eltn.utwente.nl> klamer@mi.eltn.utwente.nl (Klamer Schutte) writes:
%In <10033@star.cs.vu.nl> kjb@cs.vu.nl (Kees J. Bot) writes:
%>Except for not doing setuid and this other "feature".
%
%I said i had bugs fixed. Setuid in your way is a bug.

What bug, if you say something is a bug please explain why, and how it is 
triggered.

%>I know of three places to look for the proper format of the initial stack:
%>- The old V7 manuals under exec(2), written when users were not considered
%>  too stupid to know such things.
%>- The source code of execve(2).
%>- The source code of ps(1).
%>The ps(1) source contains this interesting comment:
%
%How nice, to talk about self-documenting code. But i don't model system
%call behaviour to assumptions made by any program. So ps(1), look at the
%POSIX description of execve(). And yes, i have looked at the source code
%of execve(2). This one is mm/exec.c right? The other one (lib/posix/execve.c)
%is just an attempt to get the POSIX specification mapped to minix system.

First of all POSIX doesn't specify bit patterns but the C-functions to access
certain features of the operating system. I can't find an initial stack layout
in IEEE Std 1003.1-1988. But I might have overlooked it.

mm/exec.c doesn't specify the exec format either, it just copies some data from
the old process' image to the new ones. Why isn't the source of execve.c, 
crtso.s, and ps.c enough for you.

%This will destroy alignment of the strings pointed by by argv[] and envp[].

Without ps on a decent machine like a 386 there is no need for alignment
specifications execept maybe for speed. But on sparcs and mc68000s some objects
need to be aligned. According to the C standard arrays should be aligned on 
the alignment required by their basetypes. Since the base type of string is
char and (1 byte)chars need no alignment, there is no need to align strings
and the alignment of strings can't be destroyed.

%I must say this kind of bug finding is a good way to get rid of them.
%But must this be on news? Private mail should be appropriate.

Oh, why are public discussions not appropriate for shell scripts, or bugs?
I was glad that Kees posted his version of #! and I installed it immediately.
Your version is still somewhere in one of my mailboxes.

%Can anybody with a POSIX copy at hand clearify what the correct stack format 
%is? My old V7 manual (for a m68k system) only says something about pdp11 and
%interdata stack layouts...

First you state that the POSIX specification is mapped to the minix system, and
now you ask for "anybody with a POSIX copy". Yes of course, my V7 manual also
mentions only pdp11 and interdata, may be they didn't run V7 on mc68000s
back in 1979? You forgot to mention that that manual says nothing about 
padding. But the the strings should follow the the null after the environment
pointers. (Minix doesn't have the 0 word at the top of the memory either, has
it?)




						Philip

ast@cs.vu.nl (Andy Tanenbaum) (05/24/91)

In article <klamer.675068096@mi.eltn.utwente.nl> klamer@mi.eltn.utwente.nl (Klamer Schutte) writes:
>Can anybody with a POSIX copy at hand clearify what the correct stack format 
>is? My old V7 manual (for a m68k system) only says something about pdp11 and
>interdata stack layouts...

POSIX does not specify any particular stack format.  Each implementation
is free to do what it thinks best.  The execution of shell scripts by exec
is also not in there. 

Andy Tanenbaum (ast@cs.vu.nl)

kjb@cs.vu.nl (Kees J. Bot) (05/24/91)

klamer@mi.eltn.utwente.nl (Klamer Schutte) writes:
>In <10033@star.cs.vu.nl> kjb@cs.vu.nl (Kees J. Bot) writes:
>>klamer@mi.eltn.utwente.nl (Klamer Schutte) writes:

>>Except for not doing setuid and this other "feature".

>I said i had bugs fixed. Setuid in your way is a bug.

It might be possible for an application to use a setuid script in a way
that is not a security hole.  Someone could place a a shell in /bin that
is setuid root, but that is no reason to dissallow setuid for executables.
If you want to protect users then add a wastebasket to /bin/rm.

>How nice, to talk about self-documenting code. But i don't model system
>call behaviour to assumptions made by any program. So ps(1), look at the
>POSIX description of execve(). And yes, i have looked at the source code
>of execve(2). This one is mm/exec.c right? The other one (lib/posix/execve.c)
>is just an attempt to get the POSIX specification mapped to minix system.

If the source of execve() is the best description you have for the initial
stack then use it.  Make sure that your modified stack looks like something
execve() could have produced.  There is no reason to make a mess of it just
because there is no description in one of the many standards to choose from.
BTW. there is no Berlin wall between execve() and mm/exec.c.  Most UNIX
systems pass the addresses of the argv and envp array up to the kernel and
let the kernel collect the strings from user space.  In Minix it is more
convenient to build the initial stack in execve() and let MM fetch that.

>>- The ALIGN(len) is still at the wrong place.  Try moving only the strings
>>  by disp bytes, then move the pointers by argc*sizeof(char *) bytes.  Do
>>  an ALIGN(disp) just before the return.

>This will destroy alignment of the strings pointed by by argv[] and envp[].

Strings are made of bytes, and bytes have an alignment of 1, you can't
destroy that!  The argv and envp arrays are made of char *'s, so they
need an alignment of sizeof(char *) in the worst case.

>>- If stk_bytes is close to ARG_MAX then the last few environment variables
>>  may be truncated.

>There are three choices overhere:
>1) truncate some of the strings if stk_bytes is close to ARG_MAX
>2) expand stk_bytes to bigger than ARG_MAX if ...
>3) fail the execution if ...

>I choose 1). You choose 3). 2) is IMHO out of the question.
>Rethinking it, 3) might be the best choice. What is the proper error code?
>ENOMEM specifies also that there is not enough core ...
>(Don't tell me lib/posix/execve.c uses ...)

Execve() uses E2BIG, mm/do_exec() uses ENOMEM for its first and normally
useless check.  I used ENOMEM in case 3) above, which I now see to be wrong
thing to do.  (Bug #1 in my version, finally.)

>I must say this kind of bug finding is a good way to get rid of them.
>But must this be on news? Private mail should be appropriate.

You post it in public, so I comment on it in public.  Why shouldn't other
people know what's wrong with it?
--
	                        Kees J. Bot  (kjb@cs.vu.nl)
	              Systems Programmer, Vrije Universiteit Amsterdam

dprrhb@inetg1.ARCO.COM (Reginald H. Beardsley) (05/24/91)

In article <10055@star.cs.vu.nl>, kjb@cs.vu.nl (Kees J. Bot) writes:
> klamer@mi.eltn.utwente.nl (Klamer Schutte) writes, etc.
> 
> >>- The ALIGN(len) is still at the wrong place.  Try moving only the strings
> >>  by disp bytes, then move the pointers by argc*sizeof(char *) bytes.  Do
> >>  an ALIGN(disp) just before the return.
> 
> >This will destroy alignment of the strings pointed by by argv[] and envp[].
> 
> Strings are made of bytes, and bytes have an alignment of 1, you can't
> destroy that!  The argv and envp arrays are made of char *'s, so they
> need an alignment of sizeof(char *) in the worst case.
> 

  It is not the alignment of the characters that matters, it is the 
alignment of the adjacent full word values that matters.  This means that
in addition to starting the string on a word boundary, to needs to be 
padded to a multiple of the word length.  For SPARC this will need to be
4 bytes.  Even on systems that will tolerate the misalignment, there are 
very significant performance penalties for breaking the alignment.

-- 
Reginald H. Beardsley       
ARCO Information Services
Plano, TX 75075           
Phone: (214)-754-6785
Internet: dprrhb@arco.com 

nall@sun8.scri.fsu.edu (John Nall) (05/25/91)

In article <klamer.675068096@mi.eltn.utwente.nl> klamer@mi.eltn.utwente.nl (Klamer Schutte) writes:
>In <10033@star.cs.vu.nl> kjb@cs.vu.nl (Kees J. Bot) writes:
>
>  [...STUFF DELETED...]
>
>I must say this kind of bug finding is a good way to get rid of them.
>But must this be on news? Private mail should be appropriate.
>

     Seems to me, also, that private mail would be more appropriate.
     Perhaps there is some personal animosity existing that the rest
     of us are not aware of, but in general I would think that this
     sort of public fault-finding is not in good taste.



--
John W. Nall		| Supercomputer Computations Research Institute
nall@sun8.scri.fsu.edu  | Florida State University, Tallahassee, FL 32306
  "Complete solitude is a torment which not even Hell threatens."

klamer@mi.eltn.utwente.nl (Klamer Schutte) (05/27/91)

In <10050@star.cs.vu.nl> philip@cs.vu.nl (Philip Homburg) writes:

>In article <klamer.675068096@mi.eltn.utwente.nl> klamer@mi.eltn.utwente.nl (Klamer Schutte) writes:

>First of all POSIX doesn't specify bit patterns but the C-functions to access
>certain features of the operating system. I can't find an initial stack layout
>in IEEE Std 1003.1-1988. But I might have overlooked it.

I checked IEEE 1003.1-1986. Didn't mention it either.

>mm/exec.c doesn't specify the exec format either, it just copies some data from
>the old process' image to the new ones. Why isn't the source of execve.c, 
>crtso.s, and ps.c enough for you.

I checked crtso.s (implicit, by testing whether it worked. It did.)
Execve.c is just an implementation. And since i am changing the implementation,
i don't mind changing its produced output (because that was what i was doing).
I haven't checked ps.c, and i am not going to do it. I don't want system call
behaviour to be modelled after some assumptions that a utily program has made,
as long as these assumtions aren't supported by a manual or standard.

What i did check was the POSIX standard (OK, an old one) for execve(). This
one states that the main() function gets a pointer to an array of pointers
which point to the data. And my implementation does that...

The reason i wanted the alignment to be saved was that a char * is often used
were a void * is meant. And some people use void * to point to anything.
(See for instance the SunOs 4.1.1 manual page on malloc(), and you understand
 what i mean.)
In this way one can let argv[1] point to a double in execve(), and let
main() read this double.

>%This will destroy alignment of the strings pointed by by argv[] and envp[].

>%I must say this kind of bug finding is a good way to get rid of them.
>%But must this be on news? Private mail should be appropriate.

>Oh, why are public discussions not appropriate for shell scripts, or bugs?

Hmmmm. Should have missed the shell script.
Bugs should be send. -pedantic searches not IMHO.

Klamer
-- 
Klamer Schutte			Tel: +31-53-892786	Fax: +31-53-340045
Faculty of electrical engineering -- University of Twente, The Netherlands
preferred: klamer@mi.eltn.utwente.nl   SMTP: klamer@utelmi01.el.utwente.nl

philip@cs.vu.nl (Philip Homburg) (05/27/91)

In article <klamer.675334474@mi.eltn.utwente.nl> klamer@mi.eltn.utwente.nl (Klamer Schutte) writes:
%I checked crtso.s (implicit, by testing whether it worked. It did.)
%Execve.c is just an implementation. And since i am changing the implementation,
%i don't mind changing its produced output (because that was what i was doing).
%I haven't checked ps.c, and i am not going to do it. I don't want system call
%behaviour to be modelled after some assumptions that a utily program has made,
%as long as these assumptions aren't supported by a manual or standard.

Ok, there is no exact specification the initial stack layout other than in
ps.c or implicit in the v7 manual.  But this is not a permit to break
existing code and history (look at older or other versions of ps) just
because there was no committee to standardize it. It is very easy to change
the implementation of Kees to supply alignment without breaking ps. The
implementation is left as an exercise for the reader. I am not going to do it
because string are NOT aligned.

%The reason i wanted the alignment to be saved was that a char * is often used
%were a void * is meant. And some people use void * to point to anything.
%(See for instance the SunOs 4.1.1 manual page on malloc(), and you understand
% what i mean.)
%In this way one can let argv[1] point to a double in execve(), and let
%main() read this double.

This is absolutely ridiculous. Are you really talking about *(double *)argv[1] ?
Are you sure that your doubles don't contain zero bytes? How do you exec
your programs? a.out \x12\x3\0\4 ?

In minix malloc returns a void *. Is the text you are referring to:
	malloc() returns a pointer to  a  block  of  at  least  size
	bytes, which is appropriately aligned.

This is nothing special, ansi requires the same. But the strings of the 
arguments are really strings and not doubles, therefor require no alignment.

%>Oh, why are public discussions not appropriate for shell scripts, or bugs?
%
%Hmmmm. Should have missed the shell script.
%Bugs should be send. -pedantic searches not IMHO.

I thought that #! was to implement some features of executing shell scripts
(what do You call script starting with #!). And you call breaking ps
pedantic?




						Philip

adrie@philica.ica.philips.nl (Adrie Koolen) (05/28/91)

In article <1991May24.164952.22295@Arco.COM> dprrhb@inetg1.ARCO.COM (Reginald H. Beardsley) writes:
>  It is not the alignment of the characters that matters, it is the 
>alignment of the adjacent full word values that matters.  This means that
>in addition to starting the string on a word boundary, to needs to be 
>padded to a multiple of the word length.  For SPARC this will need to be
>4 bytes.  Even on systems that will tolerate the misalignment, there are 
>very significant performance penalties for breaking the alignment.

I compiled a program containing:

	char s1[] = "";
	char s2[] = "a";
	char s3[] = "ab";
	char s4[] = "abc";
	char s5[] = "abcd";
	char s6[] = "abcde";
	...
	printf("%08x, %08x, %08x, %08x, %08x, %08x.\n", s1, s2, s3, s4, s5, s6);

on a SparcStation. The Sun C compiler aligned all strings to word addresses:

	000040a8, 000040ac, 000040b0, 000040b4, 000040b8, 000040c0.

while the GNU C compiler put all strings one after the other without aligning
anything:

	000040a8, 000040a9, 000040ab, 000040ae, 000040b2, 000040b7.

It looks like string alignment is not necessary in ANSI C. But then again, maybe
I missed your point. Anyway, Minix-Sparc generates with the following program:

	main(int argc, char *argv[])
	  { int i;

	    for (i=0; i<argc; i++) printf("%08x: %s\n", argv[i], argv[i]);
	  }

and the invocation:

	proef d is een test

this output:

	00011F74: proef
	00011F7A: d
	00011F7C: is
	00011F7F: een
	00011F83: test

Adrie Koolen (adrie@ica.philips.nl)
Philips Innovation Centre Aachen

kjb@cs.vu.nl (Kees J. Bot) (05/28/91)

adrie@philica.ica.philips.nl (Adrie Koolen) writes:

>I compiled a program containing:

>	char s1[] = "";
>	char s2[] = "a";
>	char s3[] = "ab";
>	char s4[] = "abc";
>	char s5[] = "abcd";
>	char s6[] = "abcde";
>	...
>	printf("%08x, %08x, %08x, %08x, %08x, %08x.\n", s1, s2, s3, s4, s5, s6);

>on a SparcStation. The Sun C compiler aligned all strings to word addresses:

>	000040a8, 000040ac, 000040b0, 000040b4, 000040b8, 000040c0.

This is not exactly true, if you change the declarations to be like:

	char *s3 = "ab";

Then you will find that the strings now have an alignment of 1.  The Sun
compiler has the habit of giving all global objects an alignment of 4 in
the "data" segment.  The unnamed strings are put in the "data1" segment
with an alignment of 1.  The string "ab" in the s3[] declaration above is
seen by the compiler as an { 'a', 'b', '\0' } initializer and not as a
string that goes in the data1 segment.
The gcc compiler is a bit smarter by noticing that the s3 array doesn't
need an alignment of 4.

Compile the program with 'cc -S', 'gcc -S', and 'gcc -S -fwritable-strings'
for both the [] and the * versions and study the '.s' output.

For people who do not understand the alignment business yet, I will
try to explain...

Most machines these days look at their memory as an array of words at the
hardware level.  All accesses to memory are word accesses.  If one wants
to read a word at address 32, then the processor puts memory address 8
on the address bus (assuming 4 byte words).  Reading a word from address
33 will lead to a bus error if you are lucky, or the processor will fetch
both words 8 and 9 from memory to find the proper bytes.  Needless to say,
writing a misaligned word is even more expensive.  Reading halfwords at
an even address or reading a byte at *any* address makes the processor read
the word that contains that halfword or byte.  To write a byte the processor
may need to read a word first before writing it back with the new byte
in it.  I think the old PDP-11 did it this way, but I think most modern
processors have a way of specifying which bytes in a word need to be
written.  (Pins on a CPU used to be expensive, maybe they still are.)

(Things are always different in reality.  The Sun 4/330's here at the VU
like their memory best if served as 8 times 8 bit wide SIMM-modules, which
seems to indicate that they do memory transfers in 64 bit chunks.)

One note on the ANSI C standard:  Our local C guru informed me that the
only thing the standard says about alignment is that malloc returns an
address that is suitably aligned for any datatype, and nothing more.
--
	                        Kees J. Bot  (kjb@cs.vu.nl)
	              Systems Programmer, Vrije Universiteit Amsterdam

dprrhb@inetg1.ARCO.COM (Reginald H. Beardsley) (05/28/91)

The problem usually arises in data structure declarations which are not 
aligned and padded properly. Sun warns of this in their documentation. 
Certainly the compiler should pad the strings out correctly.  I had to 
write a long routine which did byte copies out of a several hundred entry 
data structure because the original design was implemented on an IBM
360/370
series machine and didn't care. (An odd number of shorts were declared in
the
structure which caused accesses to the full words following to core dump
:-( )
I'd be curious to know what would happen to the gcc example if you inserted
a declaration of a fullword value (int or float) in your list of string
declarations, say after "char string[]="a";".  BTW the data structure
referred
to previously was done in FORTRAN with equivalences.

-- 
Reginald H. Beardsley       
ARCO Information Services
Plano, TX 75075           
Phone: (214)-754-6785
Internet: dprrhb@arco.com