rob@alice.UUCP (03/05/84)
opus!rcd complains that the kernel has no business looking for the #!. apparently, he doesn't understand that #! is a magic number, just like 0407 or 0413. the string that is exec'ed is inside the header that the kernel must read (note: must read; so much for 'no business opening' files) to determine that the file is a binary and exec it.
trb@masscomp.UUCP (03/06/84)
I just think that it was really nice that whoever put in the #! feature used the magic number 20443 for editable executables. They used the old 407 format for binary executables because it would have been hard to type and noisy to print out (407 is ^G^A). Seriously, 407 was a PDP-11 "branch .+7" instruction, which was executed to jump over the a.out header.

Andy Tannenbaum
Masscomp Inc
Westford MA
(617) 692-6200 x274
tll@druxu.UUCP (LaidigTL) (03/06/84)
One little annoyance that appears with using "#!" as a magic number is the old byte-ordering problem. The magic number for an executable file is defined to be a short int (maybe unsigned, I forget), so that the old octal 407 is ^G^A on a PDP-11 or a VAX, but is ^A^G on some other machines. Similarly, depending on your machine, #! can have either of two values. You can get around this in several (semi-)portable ways, for instance:

1) Have the kernel do a strncmp to test against #!, and integer tests
   for the magic numbers of binary executable files.  This is less
   efficient than is nice.

2) Test for the #! with an integer test for equality with '#!', if you
   believe in the portability of this.

Tom Laidig
AT&T Information Systems Laboratories, Denver
...!ihnp4!druxu!tll
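The two orderings Tom describes are easy to demonstrate. A minimal sketch (Python standing in for the kernel's C; function name mine) showing the two short-int values the same "#!" bytes yield, plus the byte-order-independent comparison of option 1:

```python
import struct

MAGIC = b"#!"  # '#' is 0x23, '!' is 0x21

# The same two bytes, read as a short int, give different values
# depending on the machine's byte order:
le_value = struct.unpack("<H", MAGIC)[0]  # PDP-11/VAX-style ordering
be_value = struct.unpack(">H", MAGIC)[0]  # the opposite ordering

print(oct(le_value))  # 0o20443 -- the value Andy mentions above
print(oct(be_value))  # 0o21441

def is_script(header):
    """Option 1: compare bytes, not integers -- works on any machine."""
    return header[:2] == MAGIC
```

Option 2 amounts to comparing `le_value` (or `be_value`) against a native short, which is why it only works if the multi-character constant '#!' happens to match your machine's ordering.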
rcd@opus.UUCP (03/07/84)
> opus!rcd complains that the kernel has no business looking for the #!.
> apparently, he doesn't understand that #! is a magic number, just like
> 0407 or 0413. the string that is exec'ed is inside the header that the
> kernel must read (note: must read; so much for 'no business opening' files)
> to determine that the file is a binary and exec it.

Yes, the kernel has to open the file to look at the magic number and get the header, etc. However, it need not recognize a 020443 ("#!") magic number as valid - that was really my point. In the "old way", the kernel would just fail an attempt to exec a shell script and the exec library code in the process would pick up from there. (This is just why the ENOEXEC error code has the meaning it does.) Exec(2) could be made to look for a "#!" line and handle it appropriately. This seems a little clunky perhaps, but remember that the kernel just opened the file and read the start of it, so the library code probably isn't going to generate any disk accesses when it reopens the file to look again.

There's a lot of code in the Berkeley kernels to handle shell scripts - take a look at it. Since (1) the number of applications which need to do exec's of arbitrary programs is relatively small, (2) the cost of handling scripts in user code is small compared to the rest of the fork/exec/start-up-a-shell overhead, and (3) the shell-script handling can be reasonably put in the library code where it is pageable, it seems that it would be better off there, particularly if you're at all tight on memory.

I will also (re)state my opinion that the kernel is in a bad place (at least awkward) for giving a good interpretation of errors in being unable to "exec" a shell script - it would take STILL MORE precious kernel space for the code to generate good error values, as well as yet more expansion of the set of errno values.
--
{hao,ucbvax,allegra}!nbires!rcd
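The "old way" rcd describes - exec fails with ENOEXEC and the library code picks up from there - can be sketched as follows (an illustrative Python model of the execvp-style C library behavior; the function name is mine):

```python
import errno
import os

def exec_with_script_fallback(path, argv):
    """Try a direct exec; if the kernel rejects the file with ENOEXEC,
    assume it is a shell script and hand it to /bin/sh instead.
    Models the pre-#! library behavior described above; it never
    returns unless the exec fails."""
    try:
        os.execv(path, argv)
    except OSError as e:
        if e.errno != errno.ENOEXEC:
            raise
    # The kernel refused it as a binary: re-exec via the shell,
    # passing the script's path as the shell's first argument.
    os.execv("/bin/sh", ["/bin/sh", path] + list(argv[1:]))
```

Note that the second execv is exactly the part that cannot preserve setuid semantics or read execute-only scripts, which is the argument made by later posters for doing this in the kernel.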
kre@mulga.SUN (Robert Elz) (03/08/84)
The point of doing '#!' stuff inside the kernel is that it allows setuid interpreted programs (including 'sh' scripts as a special case). That can't be accomplished in any library routine, no matter how hard you try. With that, shell scripts become as versatile as compiled (a.out format) executables; you can ALWAYS use whichever is most appropriate, without being stopped by implementation restrictions - which is just as it should be. Another effect is that the name of the script (interpreted program) goes in /usr/adm/acct instead of the ubiquitous 'sh'.

I might add that the original idea & code to do this were by Dennis Ritchie (if my sources are correct, & they are fairly good sources I think), and I added it to 4.1bsd.

Robert Elz
decvax!mulga!kre

ps: the code is reasonably portable; the "magic number" is the string "#!", and it works, as is, whichever way your bytes are arranged. And yes, it's fractionally slower than treating the magic number as some horrible octal constant!
thomas@utah-gr.UUCP (Spencer W. Thomas) (03/09/84)
The ONE thing that putting #! into the kernel gets you over having exec(2) do it (besides being a probably more obvious place for it) is setuid shell scripts. This gives shell scripts an equal footing with all other programs - you don't have to explain to users that if they write a C program and chmod u+s it, it works, but if they write a shell script and chmod u+s it, it doesn't. =Spencer
merlyn@sequent.UUCP (03/09/84)
[from nbires!rcd...]
[[ There's a lot of code in the Berkeley kernels to handle shell scripts -
[[ take a look at it. Since (1) the number of applications which need to do
[[ exec's of arbitrary programs is relatively small, (2) the cost of handling
[[ scripts in user code is small compared to the rest of the
[[ fork/exec/start-up-a-shell overhead, and (3) the shell-script handling
[[ can be reasonably put in the library code where it is pageable, it seems
[[ that it would be better off there, particularly if you're at all tight on
[[ memory. I will also (re)state my opinion that the kernel is in a bad
[[ place (at least awkward) for giving a good interpretation of errors in
[[ being unable to "exec" a shell script - it would take STILL MORE precious
[[ kernel space for the code to generate good error values, as well as yet
[[ more expansion of the set of errno values.

The shell script code CANNOT be put into a library for certain cases. The one I think of is that you cannot then have execute-only shell scripts. The kernel MUST open the file and make it stdin to the designated program. Arbitrary interpreters (a la /bin/sh) would not have enough power to open the file (if read-denied) and start up the desired other interpreter. Setuid interpreted scripts become available as well (such as an execute-only, read-protected, setuid cshell script). This, too, is something that couldn't be done at the application level, only at the kernel level.

I thank Berkeley for putting in this feature, and appreciate it (and use it) quite regularly.

Randal L. Schwartz, esq.
Sequent Computer Systems, Inc.
UUCP: ...!tektronix!ogcvax!sequent!merlyn
BELL: (503)626-5700

P.S. It's true... the infamous UUCP-breaker from last year is back! After 10 months of VMS, I get to play with UNIX again! What a deal.
henry@utzoo.UUCP (Henry Spencer) (03/11/84)
Yup, #! in the kernel permits setuid shell scripts. I'm not sure that this is a virtue, considering that people seem to be unaware of the simply appalling number of security holes this opens up. If you think about the consequences of feeding a setuid shell file a non-standard value of the IFS variable, with some suitably-named programs lying around ready and waiting, you will have some idea of the sort of things I'm referring to. Shell files simply are not in a good position to handle things like this; the interpretation process for them is too complex and there is too little control over it.

This does not mean that I'm opposed to #! in the kernel, just that setuid shell scripts seem a very weak justification for it, given that they are grossly unsafe.
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
george@idis.UUCP (George Rosenberg) (03/11/84)
I think that giving the kernel the ability to exec files
that can be interpreted is a good feature.
I don't use a system with this feature.
Since there is a discussion about it, I will bring up some
points that I am curious about.
The programs that perform the most execs, sh and csh,
will also be the interpreters that most scripts employ.
If an exec fails, the Bourne shell will interpret
the script via a longjmp.
I assume that a longjmp is much cheaper than completing an exec.
Has anyone compared the two?
Is there a way that an informed program such as a shell interpreter
can tell the kernel to return without completing the exec,
if the old program and new interpreter are the same file and
the setuid and setgid bits of the script are not set?
From what I understand, execing a file that begins with "#!"
will examine the next two tokens on the first line.
The first token must be a path name of an interpreter.
The tokens are inserted before the old argument list.
The execute bit and the setuid and setgid bits of the script are honored.
The interpreter is responsible for opening the script.
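The argument-list rewriting just described can be sketched as follows (an illustrative Python model; the two-token limit and path-name requirement follow the description above, and real kernels differ in detail):

```python
def rewrite_argv(first_line, old_argv):
    """Model of the '#!' handling described above: take at most two
    tokens from the rest of the first line (an interpreter path name
    plus one optional argument) and insert them before the old
    argument list.  Illustrative only."""
    if not first_line.startswith("#!"):
        raise ValueError("not an interpreter script")
    tokens = first_line[2:].split()[:2]  # at most two tokens examined
    if not tokens or not tokens[0].startswith("/"):
        raise ValueError("first token must be an interpreter path name")
    return tokens + list(old_argv)
```

For example, a script starting "#!/usr/bin/awk -f" invoked as `script` would be re-execed with the argument list ["/usr/bin/awk", "-f", "script"].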
Can the interpreter also have "#!" as its magic number?
It seems to me that a design that also accommodated access modes
in which a file containing a script was executable but not readable
would have been desirable.
This could have been done by simulating an open for reading (and perhaps
also a deferred FIOCLEX) on a particular file descriptor (such as 19).
I think that simulating an lseek to the beginning of the second line
would also be desirable.
Does anyone know why nothing like this was added at the same time?
George Rosenberg
duke!mcnc!idis!george
decvax!idis!george
aegl@edee.UUCP (Tony Luck) (03/19/84)
One really useful effect of having the kernel look for '#!' is that shell scripts are then really just like other executables (there should be no way to tell by running it whether a program is a shell file or a binary) i.e. you can make them setuid if you need to, or use them as login shells without having to fix login/newgrp/su to know about ENOEXEC. Tony Luck { ... UK !ukc!edcaad!edee!aegl }
knop@dutesta.UUCP (Peter Knoppers) (04/08/84)
We have a PWB/UNIX system running on some PDP11/45's. Our shell is altered to operate on set-uid scripts. The shell is set-uid to root. On entry it checks the mode of the script. If the s-bit is set, the shell does a setuid to the user id of the file; if not, the shell changes its privileges back to those of the real user.

Peter Knoppers, Delft Univ. of Technology
..!{decvax,philabs}!mcvax!dutesta!knop
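The decision Peter describes can be modeled like this (an illustrative Python sketch of the PWB scheme; a real shell would do this in C with stat(2) and setuid(2), and would need to be far more careful about the security issues Henry raises above):

```python
import os
import stat

def target_uid(mode, owner_uid, real_uid):
    """Whose privileges should the set-uid-root shell adopt?  The
    script owner's if the script's s-bit is set, else the real user's."""
    return owner_uid if (mode & stat.S_ISUID) else real_uid

def adopt_script_privileges(script_path):
    """On entry, the altered shell inspects the script's mode and
    switches to the appropriate uid (requires root, as in PWB)."""
    st = os.stat(script_path)
    os.setuid(target_uid(st.st_mode, st.st_uid, os.getuid()))
```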
allbery@ncoast.UUCP (Brandon Allbery) (12/05/85)
Quoted from <416@ihdev.UUCP> ["Re: magic numbers? (teach me, please)"], by pdg@ihdev.UUCP (P. D. Guthrie)...
+---------------
| In article <124@rexago1.UUCP> rich@rexago1.UUCP (K. Richard Magill) writes:
| > 1) How does the shell (exec?) know whether the command I just typed
| >    is a shell script or one of several possible types of executable?
|
| The shell doesn't know.  The shell merely tells the kernel to exec the
| file, after doing a fork.  The kernel determines if a file is a binary
| executable by the magic number, which is obtained by reading an a.out.h
| structure (4.1,4.2) or filehdr.h (sys 5) and comparing it against
| hardcoded numbers in the kernel.  In 4.1 for instance only 407, 413 and
| 410 are legal.  This also tells the kernel the specific type of
| executable, and in some cases can set emulation modes.  The kernel also
| recognizes
|	#! /your/shellname
| at the beginning of a file and execs off the appropriate shell instead.
+---------------

In 4.2, the #! is recognized. In all other Unices, the exec will fail, and the shell will decide that the file must be a shell script; it proceeds to fork off a copy of itself to run the script. (Csh on non-4.2 systems checks for a # as the first character of the file, and forks itself if it sees it; if not, it forks a /bin/sh.)

+---------------
| > 2) Presuming the answer to #1 above has something to do with magic
| >    numbers, who issues them?  Is there a common (definitive) base of
| >    them or does each manufacturer/environment make up their own set?
|
| The magic number is issued by the linker/loader.  Pretty much the magic
| number is decided by the manufacturer, but from what I have seen, is
| kept constant over machines.  Forgive me if this is wrong, but I do not
| have any method of checking, but the magic numbers for say plain
| executable 4.x Vax and plain executable SysV.x Vax are the same, but
| SysV.x Vax and SysV.x 3B20 are different.  Could someone confirm this?
+---------------

Executables using ``standard'' binary formats, i.e. a.out (PDP-11, Z8000) and b.out (MC68000), use the standard magic numbers 0405, 0407, 0410, 0411. Non-standard formats, like Xenix x.out (0x0206) and COFF (flames to /dev/null; most systems are [ab].out), use distinctive magic numbers.

There are other magic numbers. Old-style archives (ar) have 0177545 as a magic number; again, the loader knows about this, since a library is an archive. System V archives begin with the magic ``number'' "!<arch>\n". Cpio archives also have magic numbers in them, but at the archive-member level.
--
Lord Charteris (thurb)
ncoast!allbery@Case.CSNet (ncoast!allbery%Case.CSNet@CSNet-Relay.ARPA)
..decvax!cwruecmp!ncoast!allbery (..ncoast!tdi2!root for business)
6615 Center St., Mentor, OH 44060 (I moved) --Phone: +01 216 974 9210
CIS 74106,1032 -- MCI MAIL BALLBERY (WARNING: I am only a part-time denizen...)
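The kernel dispatch described in the quoted article can be sketched as follows (illustrative Python; the magic values are the historical ones named above for 4.1, and the kernel of course does this in C on native shorts):

```python
import struct

# The a.out magic numbers the quoted article says 4.1 accepts.
A_OUT_MAGICS = {
    0o407: "plain executable",
    0o410: "read-only (shared) text",
    0o413: "demand paged",
}

def classify_executable(header):
    """Decide what kind of file exec was handed, from its first bytes.
    Little-endian (VAX-style) byte order is assumed here."""
    if header[:2] == b"#!":
        # The kernel execs the interpreter named on the first line.
        return "interpreter script"
    (magic,) = struct.unpack("<H", header[:2])
    return A_OUT_MAGICS.get(magic, "ENOEXEC")
```

Anything that matches neither case is what produces the ENOEXEC error that pre-4.2 shells trap in order to run the file as a script themselves.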
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/06/85)
> Csh on non-4.2 systems checks for a # as > the first character of the file, and forks itself if it sees it; if not, it > forks a /bin/sh. Yes, some of them do, but if they do that's a bug. Virtually all of my Bourne shell scripts (including SVR2 system utility scripts) start with "#".
guy@sun.uucp (Guy Harris) (12/07/85)
> Executables using ``standard'' binary formats, i.e. a.out (PDP-11, Z8000)
> and b.out (MC68000) use the standard magic numbers 0405, 0407, 0410, 0411.
> Non-standard formats, like Xenix x.out (0x0206) and COFF (flames to
> /dev/null; most systems are [ab].out) use distinctive magic numbers.

Well, VAX UNIX (32V, 4.xBSD, System III, Version 8?) also uses those magic numbers (with 413 added for demand paged executables on 4.xBSD), and probably lots of other 4.xBSD systems (Sun's does). Does "most" mean "most UNIX implementations" or "most boxes running UNIX"? If the latter, I think Xenix is running on a lot of systems, possibly most.

Then again, *my* copy of "Xenix(TM) Standard Object File Format (January 1983)" implies that that "0x0206" is the "magic number" and is *not* distinctive; the "x_cpu" field indicates what CPU it's intended for. (This is sort of like the new Sun UNIX 3.0 object file format, where the "a_machtype" field indicates whether it's intended for a 68010 or 68020.) COFF seems to invert this, since the "file header" indicates what machine it's intended for (and tons of other glop) and the "UNIX header" (which is basically the old a.out header) has the 0405, 0407, 0410, 0411, and 0413 (yes, that's what they use for paged executables, surprise surprise) which indicates the format of the image but is machine-independent (modulo byte ordering). Then again, the "file header" magic number seems to indicate something about the format of the executable, but see a previous posting of mine for some dyspepsia caused by the proliferation of multiple file header magic numbers.

> There are other magic numbers. Old-style archives (ar) have 0177545 as a
> magic number; again, the loader knows about this, since a library is an
> archive. System V archives begin with the magic ``number'' "!<arch>\n".
System V, Release 2 archives, anyway; System V Release 1 had a portable archive format which was different from the 4.xBSD one, which was the first one to use the "!<arch>\n" magic "number". I'm told they came to their senses because Version 8, being 4.1BSD-based, used that format.

> Cpio archives also have magic numbers in them, but at the archive-member
> level.

No, it has a magic number at the beginning - 070707 (either as a "short" or a string, depending on whether it's an old cruddy "cpio" archive or a nice new "gee, we've finally caught up with 'tar' when it comes to portability" "cpio -c" archive).

S3 had "-c", but it had a bug so it wasn't really portable. S5 fixed this bug. S5 also broke the byte-swapping garbage: S3 had an option to swap the bytes within 2-byte quantities. Presumably, this was because running the tape through "dd" to byte-swap *everything*, and then byte-swapping the data and pathnames inside "cpio", thus swapping the binary portion of the header once and everything else twice, is obviously more efficient than just swapping the binary portion of the header once. ("cpio" already has hacks to deal with 4-byte quantities - namely, file size and modified time - automagically, by shoving "1L" into a "long" and seeing whether the 0th byte of that "long" is 0 or not, so PDP-11s and VAXes don't have problems.) It is also obvious that forcing the user to specify a byte-swapping option is better than just looking at the magic number and seeing whether it's 070707 or a byte-swapped 070707 and deciding whether to swap or not based on that.

Whoever worked on "cpio" for S5 obviously figured that the purpose of this byte-swapping crap was to make it possible to move binary data between machines with different byte orders (as everybody knows, most files with binary data are continuous streams of 2-byte or 4-byte quantities), not to provide a gross and kludgy way of byte-swapping the binary portion of a "cpio" header, so they added an option to swap the 2-byte portions of 4-byte quantities ("stupid FP-11", to quote - if I remember correctly - the VAX System III linker, that particular piece of DEC hardware being responsible for some PDP-11 software, including but *NOT* limited to UNIX, having a different format for 32-bit integers than the VAX's hardware supports) and an option to swap both bytes and 2-byte quantities. They also "fixed" it not to swap the bytes of the pathnames. This "fix" means that running the "cpio" archive through "dd" to swap the bytes, and then doing a byte swap again in "cpio", results in path names with their bytes swapped! ("/nuxi", anyone?)

In effect, you are now screwed if you have a "cpio" tape, not made with "-c", which was produced on a machine with a different byte order. You can't read it in conveniently. (This has been experimentally verified. I had to whip up a version of "cpio" which does what "cpio" should have done in the first place - namely, just byte swap the damn "short"s in the header - to read a tape made on a System V VAX using the System V "cpio" on a Sun.)

There are a number of quite intelligent and talented people working on UNIX development at AT&T Information Systems. It looks like the people in charge of keeping track of COFF magic numbers, and in charge of "cpio", are in need of some supervision by the aforementioned people. (Fortunately, it looks like the IEEE P1003 committee is looking at a "tar"-based format, with fixes to support storing information about directories and special files, for tapes.)
I'm told that the European UNIX vendor consortium, X/OPEN, chose a "cpio" format because of the "cpio" *program*'s byte-swapping "capabilities". Aside from the basic stupidity (and incorrectness, in the case of the S5 "cpio") of these "capabilities", they are irrelevant to the choice of tape *format* because:

1) "tar" doesn't need byte-swapping options because the control
   information is in printable ASCII string format (any tape controller
   which is good as anything other than a target for skeet-shooting will
   write character strings in memory out to the tape in character-string
   order);

2) "cpio" has the "-c" option which does the same thing, so it doesn't
   need those options except for reading old tapes (any reasonable
   "cpio"-format-based standard would be based on "cpio -c" format, not
   "cpio" format); and

3) a *good* program which handles "cpio" format can figure out the byte
   order it needs for reading pre-"cpio -c" tapes by looking at the
   magic number anyway!

(Flame off, until next time a collection of stupidities this gross comes to light.)

Guy Harris
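Point (3) - detecting a foreign byte order from the magic number alone - amounts to something like the following sketch (illustrative Python; the function names are mine, modeling what a binary-format "cpio" reader could do):

```python
import struct

CPIO_MAGIC = 0o070707  # magic of a binary-format (pre "-c") cpio archive

def header_needs_swap(first_two_bytes):
    """Look at the magic to decide byte order: if it reads as 070707
    natively, no swapping is needed; if only the byte-swapped value
    matches, every short in the header needs its bytes swapped
    (the pathnames, being character strings, are left untouched)."""
    (native,) = struct.unpack("=H", first_two_bytes)
    if native == CPIO_MAGIC:
        return False
    (swapped,) = struct.unpack("=H", bytes(reversed(first_two_bytes)))
    if swapped == CPIO_MAGIC:
        return True
    raise ValueError("not a binary cpio archive")

def swap_header_shorts(header):
    """Byte-swap each 16-bit word of the binary header -- just swapping
    the damn shorts, as the posting above puts it."""
    n = len(header) // 2
    return struct.pack(f"<{n}H", *struct.unpack(f">{n}H", header))
```

No user-specified option is needed: the reader decides from the magic itself, which is the whole point being argued.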