klamer@mi.eltn.utwente.nl (Klamer Schutte) (05/24/91)
In <10033@star.cs.vu.nl> kjb@cs.vu.nl (Kees J. Bot) writes: >I'm posting my comments to Klamer's second try at an #! implementation in >MM to remind you about my implementation of #! that I posted on May 13. >klamer@mi.eltn.utwente.nl (Klamer Schutte) writes: |>Here is the second version of my #!interpreter patch for mm/exec.c. |>This version has all known bugs fixed. >Except for not doing setuid and this other "feature". I said i had bugs fixed. Setuid in your way is a bug. |>One feature (bug ???) remains: i keep alignment from the data argv[] and |>envp[] point to intact. There (migh ???) be a tradition of having this |>data in the form of strings with only 1 \0 in between. |>Where is the manual page for execve(2) ???? Or does POSIX(*) say anything |>about this? >I know of three places to look for the proper format of the initial stack: >- The old V7 manuals under exec(2), written when users were not considered > too stupid to know such things. >- The source code of execve(2). >- The source code of ps(1). >The ps(1) source contains this interesting comment: How nice, to talk about self-documenting code. But i don't model system call behaviour to assumptions made by any program. So ps(1), look at the POSIX description of execve(). And yes, i have looked at the source code of execve(2). This one is mm/exec.c right? The other one (lib/posix/execve.c) is just an attempt to get the POSIX specification mapped to minix system. >I decided to go over Klamer's patch with a fine comb this time. (I wish >someone would do that with my patch, with a mental -pedantic flag on.) Good initiative. Any Minix'er is invited to do so. >- ALIGN align to a multiple of 2, execve to a multiple of sizeof(char *). Yep. The comment near ALIGN says this is the correct alingment for an Atari ST. You know, that 16 bits machine ... >- The interpreter is found relative to '/'. (Move the first > tell_fs(CHDIR, ...) inside the do loop.) This is a bug, yes. Not to bad -- the interpreter should be specified with an absolute path in traditional (;-) #! implemetations. (Like SunOs). >- Setuid bits on the script are still ignored. (Wouldn't it be nice to > allow people to explore the security risks of a setuid script?) They can. They only have to change mm/exec.c. Any user not capable of doing that should not explore security risks ;-) >- Change 'know' to 'now' in patch_stack. (-pedantic) >- The old argv[0][] is not removed from the initial stack. See the earlier comments about the stack format. >- The ALIGN(len) is still at the wrong place. Try moving only the strings > by disp bytes, then move the pointers by argc*sizeof(char *) bytes. Do > an ALIGN(disp) just before the return. This will destroy alignment of the strings pointed by by argv[] and envp[]. >- If stk_bytes is close to ARG_MAX then the last few environment variables > may be truncated. There are three choices overhere: 1) truncate some of the strings if stk_bytes is close to ARG_MAX 2) expand stk_bytes to bigger than ARG_MAX if ... 3) fail the execution if ... I choose 1). You choose 3). 2) is IMHO out of the question. Rethinking it, 3) might be the best choice. What is the proper error code? ENOMEM specifies also that there is not enough core ... (Don't tell me lib/posix/execve.c uses ...) >- Read_header returns 0 when there is nothing behind #!. Bug. >- The size_ok function may return something other than -100. Can't check now. Don't have full minix online. I must say this kind of bug finding is a good way to get rid of them. But must this be on news? Private mail should be appropriate. Can anybody with a POSIX copy at hand clearify what the correct stack format is? My old V7 manual (for a m68k system) only says something about pdp11 and interdata stack layouts... Klamer -- Klamer Schutte Faculty of electrical engineering -- University of Twente, The Netherlands klamer@mi.eltn.utwente.nl {backbone}!mcsun!mi.eltn.utwente.nl!klamer
philip@cs.vu.nl (Philip Homburg) (05/24/91)
In article <klamer.675068096@mi.eltn.utwente.nl> klamer@mi.eltn.utwente.nl (Klamer Schutte) writes: %In <10033@star.cs.vu.nl> kjb@cs.vu.nl (Kees J. Bot) writes: %>Except for not doing setuid and this other "feature". % %I said i had bugs fixed. Setuid in your way is a bug. What bug, if you say something is a bug please explain why, and how it is triggered. %>I know of three places to look for the proper format of the initial stack: %>- The old V7 manuals under exec(2), written when users were not considered %> too stupid to know such things. %>- The source code of execve(2). %>- The source code of ps(1). %>The ps(1) source contains this interesting comment: % %How nice, to talk about self-documenting code. But i don't model system %call behaviour to assumptions made by any program. So ps(1), look at the %POSIX description of execve(). And yes, i have looked at the source code %of execve(2). This one is mm/exec.c right? The other one (lib/posix/execve.c) %is just an attempt to get the POSIX specification mapped to minix system. First of all POSIX doesn't specify bit patterns but the C-functions to access certain features of the operating system. I can't find an initial stack layout in IEEE Std 1003.1-1988. But I might have overlooked it. mm/exec.c doesn't specify the exec format either, it just copies some data from the old process' image to the new ones. Why isn't the source of execve.c, crtso.s, and ps.c enough for you. %This will destroy alignment of the strings pointed by by argv[] and envp[]. Without ps on a decent machine like a 386 there is no need for alignment specifications execept maybe for speed. But on sparcs and mc68000s some objects need to be aligned. According to the C standard arrays should be aligned on the alignment required by their basetypes. Since the base type of string is char and (1 byte)chars need no alignment, there is no need to align strings and the alignment of strings can't be destroyed. %I must say this kind of bug finding is a good way to get rid of them. %But must this be on news? Private mail should be appropriate. Oh, why are public discussions not appropriate for shell scripts, or bugs? I was glad that Kees posted his version of #! and I installed it immediately. Your version is still somewhere in one of my mailboxes. %Can anybody with a POSIX copy at hand clearify what the correct stack format %is? My old V7 manual (for a m68k system) only says something about pdp11 and %interdata stack layouts... First you state that the POSIX specification is mapped to the minix system, and now you ask for "anybody with a POSIX copy". Yes of course, my V7 manual also mentions only pdp11 and interdata, may be they didn't run V7 on mc68000s back in 1979? You forgot to mention that that manual says nothing about padding. But the the strings should follow the the null after the environment pointers. (Minix doesn't have the 0 word at the top of the memory either, has it?) Philip
ast@cs.vu.nl (Andy Tanenbaum) (05/24/91)
In article <klamer.675068096@mi.eltn.utwente.nl> klamer@mi.eltn.utwente.nl (Klamer Schutte) writes: >Can anybody with a POSIX copy at hand clearify what the correct stack format >is? My old V7 manual (for a m68k system) only says something about pdp11 and >interdata stack layouts... POSIX does not specify any particular stack format. Each implementation is free to do what it thinks best. The execution of shell scripts by exec is also not in there. Andy Tanenbaum (ast@cs.vu.nl)
kjb@cs.vu.nl (Kees J. Bot) (05/24/91)
klamer@mi.eltn.utwente.nl (Klamer Schutte) writes: >In <10033@star.cs.vu.nl> kjb@cs.vu.nl (Kees J. Bot) writes: >>klamer@mi.eltn.utwente.nl (Klamer Schutte) writes: >>Except for not doing setuid and this other "feature". >I said i had bugs fixed. Setuid in your way is a bug. It might be possible for an application to use a setuid script in a way that is not a security hole. Someone could place a a shell in /bin that is setuid root, but that is no reason to dissallow setuid for executables. If you want to protect users then add a wastebasket to /bin/rm. >How nice, to talk about self-documenting code. But i don't model system >call behaviour to assumptions made by any program. So ps(1), look at the >POSIX description of execve(). And yes, i have looked at the source code >of execve(2). This one is mm/exec.c right? The other one (lib/posix/execve.c) >is just an attempt to get the POSIX specification mapped to minix system. If the source of execve() is the best description you have for the initial stack then use it. Make sure that your modified stack looks like something execve() could have produced. There is no reason to make a mess of it just because there is no description in one of the many standards to choose from. BTW. there is no Berlin wall between execve() and mm/exec.c. Most UNIX systems pass the addresses of the argv and envp array up to the kernel and let the kernel collect the strings from user space. In Minix it is more convenient to build the initial stack in execve() and let MM fetch that. >>- The ALIGN(len) is still at the wrong place. Try moving only the strings >> by disp bytes, then move the pointers by argc*sizeof(char *) bytes. Do >> an ALIGN(disp) just before the return. >This will destroy alignment of the strings pointed by by argv[] and envp[]. Strings are made of bytes, and bytes have an alignment of 1, you can't destroy that! The argv and envp arrays are made of char *'s, so they need an alignment of sizeof(char *) in the worst case. >>- If stk_bytes is close to ARG_MAX then the last few environment variables >> may be truncated. >There are three choices overhere: >1) truncate some of the strings if stk_bytes is close to ARG_MAX >2) expand stk_bytes to bigger than ARG_MAX if ... >3) fail the execution if ... >I choose 1). You choose 3). 2) is IMHO out of the question. >Rethinking it, 3) might be the best choice. What is the proper error code? >ENOMEM specifies also that there is not enough core ... >(Don't tell me lib/posix/execve.c uses ...) Execve() uses E2BIG, mm/do_exec() uses ENOMEM for its first and normally useless check. I used ENOMEM in case 3) above, which I now see to be wrong thing to do. (Bug #1 in my version, finally.) >I must say this kind of bug finding is a good way to get rid of them. >But must this be on news? Private mail should be appropriate. You post it in public, so I comment on it in public. Why shouldn't other people know what's wrong with it? -- Kees J. Bot (kjb@cs.vu.nl) Systems Programmer, Vrije Universiteit Amsterdam
dprrhb@inetg1.ARCO.COM (Reginald H. Beardsley) (05/24/91)
In article <10055@star.cs.vu.nl>, kjb@cs.vu.nl (Kees J. Bot) writes: > klamer@mi.eltn.utwente.nl (Klamer Schutte) writes, etc. > > >>- The ALIGN(len) is still at the wrong place. Try moving only the strings > >> by disp bytes, then move the pointers by argc*sizeof(char *) bytes. Do > >> an ALIGN(disp) just before the return. > > >This will destroy alignment of the strings pointed by by argv[] and envp[]. > > Strings are made of bytes, and bytes have an alignment of 1, you can't > destroy that! The argv and envp arrays are made of char *'s, so they > need an alignment of sizeof(char *) in the worst case. > It is not the alignment of the characters that matters, it is the alignment of the adjacent full word values that matters. This means that in addition to starting the string on a word boundary, to needs to be padded to a multiple of the word length. For SPARC this will need to be 4 bytes. Even on systems that will tolerate the misalignment, there are very significant performance penalties for breaking the alignment. -- Reginald H. Beardsley ARCO Information Services Plano, TX 75075 Phone: (214)-754-6785 Internet: dprrhb@arco.com
nall@sun8.scri.fsu.edu (John Nall) (05/25/91)
In article <klamer.675068096@mi.eltn.utwente.nl> klamer@mi.eltn.utwente.nl (Klamer Schutte) writes: >In <10033@star.cs.vu.nl> kjb@cs.vu.nl (Kees J. Bot) writes: > > [...STUFF DELETED...] > >I must say this kind of bug finding is a good way to get rid of them. >But must this be on news? Private mail should be appropriate. > Seems to me, also, that private mail would be more appropriate. Perhaps there is some personal animosity existing that the rest of us are not aware of, but in general I would think that this sort of public fault-finding is not in good taste. -- John W. Nall | Supercomputer Computations Research Institute nall@sun8.scri.fsu.edu | Florida State University, Tallahassee, FL 32306 "Complete solitude is a torment which not even Hell threatens."
klamer@mi.eltn.utwente.nl (Klamer Schutte) (05/27/91)
In <10050@star.cs.vu.nl> philip@cs.vu.nl (Philip Homburg) writes: >In article <klamer.675068096@mi.eltn.utwente.nl> klamer@mi.eltn.utwente.nl (Klamer Schutte) writes: >First of all POSIX doesn't specify bit patterns but the C-functions to access >certain features of the operating system. I can't find an initial stack layout >in IEEE Std 1003.1-1988. But I might have overlooked it. I checked IEEE 1003.1-1986. Didn't mention it either. >mm/exec.c doesn't specify the exec format either, it just copies some data from >the old process' image to the new ones. Why isn't the source of execve.c, >crtso.s, and ps.c enough for you. I checked crtso.s (implicit, by testing whether it worked. It did.) Execve.c is just an implementation. And since i am changing the implementation, i don't mind changing its produced output (because that was what i was doing). I haven't checked ps.c, and i am not going to do it. I don't want system call behaviour to be modelled after some assumptions that a utily program has made, as long as these assumtions aren't supported by a manual or standard. What i did check was the POSIX standard (OK, an old one) for execve(). This one states that the main() function gets a pointer to an array of pointers which point to the data. And my implementation does that... The reason i wanted the alignment to be saved was that a char * is often used were a void * is meant. And some people use void * to point to anything. (See for instance the SunOs 4.1.1 manual page on malloc(), and you understand what i mean.) In this way one can let argv[1] point to a double in execve(), and let main() read this double. >%This will destroy alignment of the strings pointed by by argv[] and envp[]. >%I must say this kind of bug finding is a good way to get rid of them. >%But must this be on news? Private mail should be appropriate. >Oh, why are public discussions not appropriate for shell scripts, or bugs? Hmmmm. Should have missed the shell script. Bugs should be send. -pedantic searches not IMHO. Klamer -- Klamer Schutte Tel: +31-53-892786 Fax: +31-53-340045 Faculty of electrical engineering -- University of Twente, The Netherlands preferred: klamer@mi.eltn.utwente.nl SMTP: klamer@utelmi01.el.utwente.nl
philip@cs.vu.nl (Philip Homburg) (05/27/91)
In article <klamer.675334474@mi.eltn.utwente.nl> klamer@mi.eltn.utwente.nl (Klamer Schutte) writes:
%I checked crtso.s (implicit, by testing whether it worked. It did.)
%Execve.c is just an implementation. And since i am changing the implementation,
%i don't mind changing its produced output (because that was what i was doing).
%I haven't checked ps.c, and i am not going to do it. I don't want system call
%behaviour to be modelled after some assumptions that a utily program has made,
%as long as these assumptions aren't supported by a manual or standard.
Ok, there is no exact specification the initial stack layout other than in
ps.c or implicit in the v7 manual. But this is not a permit to break
existing code and history (look at older or other versions of ps) just
because there was no committee to standardize it. It is very easy to change
the implementation of Kees to supply alignment without breaking ps. The
implementation is left as an exercise for the reader. I am not going to do it
because string are NOT aligned.
%The reason i wanted the alignment to be saved was that a char * is often used
%were a void * is meant. And some people use void * to point to anything.
%(See for instance the SunOs 4.1.1 manual page on malloc(), and you understand
% what i mean.)
%In this way one can let argv[1] point to a double in execve(), and let
%main() read this double.
This is absolutely ridiculous. Are you really talking about *(double *)argv[1] ?
Are you sure that your doubles don't contain zero bytes? How do you exec
your programs? a.out \x12\x3\0\4 ?
In minix malloc returns a void *. Is the text you are referring to:
malloc() returns a pointer to a block of at least size
bytes, which is appropriately aligned.
This is nothing special, ansi requires the same. But the strings of the
arguments are really strings and not doubles, therefor require no alignment.
%>Oh, why are public discussions not appropriate for shell scripts, or bugs?
%
%Hmmmm. Should have missed the shell script.
%Bugs should be send. -pedantic searches not IMHO.
I thought that #! was to implement some features of executing shell scripts
(what do You call script starting with #!). And you call breaking ps
pedantic?
Philip
adrie@philica.ica.philips.nl (Adrie Koolen) (05/28/91)
In article <1991May24.164952.22295@Arco.COM> dprrhb@inetg1.ARCO.COM (Reginald H. Beardsley) writes: > It is not the alignment of the characters that matters, it is the >alignment of the adjacent full word values that matters. This means that >in addition to starting the string on a word boundary, to needs to be >padded to a multiple of the word length. For SPARC this will need to be >4 bytes. Even on systems that will tolerate the misalignment, there are >very significant performance penalties for breaking the alignment. I compiled a program containing: char s1[] = ""; char s2[] = "a"; char s3[] = "ab"; char s4[] = "abc"; char s5[] = "abcd"; char s6[] = "abcde"; ... printf("%08x, %08x, %08x, %08x, %08x, %08x.\n", s1, s2, s3, s4, s5, s6); on a SparcStation. The Sun C compiler aligned all strings to word addresses: 000040a8, 000040ac, 000040b0, 000040b4, 000040b8, 000040c0. while the GNU C compiler put all strings one after the other without aligning anything: 000040a8, 000040a9, 000040ab, 000040ae, 000040b2, 000040b7. It looks like string alignment is not necessary in ANSI C. But then again, maybe I missed your point. Anyway, Minix-Sparc generates with the following program: main(int argc, char *argv[]) { int i; for (i=0; i<argc; i++) printf("%08x: %s\n", argv[i], argv[i]); } and the invocation: proef d is een test this output: 00011F74: proef 00011F7A: d 00011F7C: is 00011F7F: een 00011F83: test Adrie Koolen (adrie@ica.philips.nl) Philips Innovation Centre Aachen
kjb@cs.vu.nl (Kees J. Bot) (05/28/91)
adrie@philica.ica.philips.nl (Adrie Koolen) writes: >I compiled a program containing: > char s1[] = ""; > char s2[] = "a"; > char s3[] = "ab"; > char s4[] = "abc"; > char s5[] = "abcd"; > char s6[] = "abcde"; > ... > printf("%08x, %08x, %08x, %08x, %08x, %08x.\n", s1, s2, s3, s4, s5, s6); >on a SparcStation. The Sun C compiler aligned all strings to word addresses: > 000040a8, 000040ac, 000040b0, 000040b4, 000040b8, 000040c0. This is not exactly true, if you change the declarations to be like: char *s3 = "ab"; Then you will find that the strings now have an alignment of 1. The Sun compiler has the habit of giving all global objects an alignment of 4 in the "data" segment. The unnamed strings are put in the "data1" segment with an alignment of 1. The string "ab" in the s3[] declaration above is seen by the compiler as an { 'a', 'b', '\0' } initializer and not as a string that goes in the data1 segment. The gcc compiler is a bit smarter by noticing that the s3 array doesn't need an alignment of 4. Compile the program with 'cc -S', 'gcc -S', and 'gcc -S -fwritable-strings' for both the [] and the * versions and study the '.s' output. For people who do not understand the alignment business yet, I will try to explain... Most machines these days look at their memory as an array of words at the hardware level. All accesses to memory are word accesses. If one wants to read a word at address 32, then the processor puts memory address 8 on the address bus (assuming 4 byte words). Reading a word from address 33 will lead to a bus error if you are lucky, or the processor will fetch both words 8 and 9 from memory to find the proper bytes. Needless to say, writing a misaligned word is even more expensive. Reading halfwords at an even address or reading a byte at *any* address makes the processor read the word that contains that halfword or byte. To write a byte the processor may need to read a word first before writing it back with the new byte in it. I think the old PDP-11 did it this way, but I think most modern processors have a way of specifying which bytes in a word need to be written. (Pins on a CPU used to be expensive, maybe they still are.) (Things are always different in reality. The Sun 4/330's here at the VU like their memory best if served as 8 times 8 bit wide SIMM-modules, which seems to indicate that they do memory transfers in 64 bit chunks.) One note on the ANSI C standard: Our local C guru informed me that the only thing the standard says about alignment is that malloc returns an address that is suitably aligned for any datatype, and nothing more. -- Kees J. Bot (kjb@cs.vu.nl) Systems Programmer, Vrije Universiteit Amsterdam
dprrhb@inetg1.ARCO.COM (Reginald H. Beardsley) (05/28/91)
The problem usually arises in data structure declarations which are not aligned and padded properly. Sun warns of this in their documentation. Certainly the compiler should pad the strings out correctly. I had to write a long routine which did byte copies out of a several hundred entry data structure because the original design was implemented on an IBM 360/370 series machine and didn't care. (An odd number of shorts were declared in the structure which caused accesses to the full words following to core dump :-( ) I'd be curious to know what would happen to the gcc example if you inserted a declaration of a fullword value (int or float) in your list of string declarations, say after "char string[]="a";". BTW the data structure referred to previously was done in FORTRAN with equivalences. -- Reginald H. Beardsley ARCO Information Services Plano, TX 75075 Phone: (214)-754-6785 Internet: dprrhb@arco.com