kbad@atari.UUCP (Ken Badertscher) (03/27/89)
Please excuse the length of this message (nearly a hundred lines - sheesh!) but I consider Atari's role in setting software standards fairly important... Since I'm taking over the mantle (or is it a yoke? <yuk yuk>) of maintaining GEMDOS, I will be intimately involved in the pending (admittedly belated) adoption of a standard for extended argument passing in TOS. It is a major concern of mine, for a couple of reasons: - Two (or more) de facto standards exist - The proponents of these standards seem to have some emotional involvement in whose standard is best: "Since they (MWC) are bigger than I am..." Mr. Beckemeyer ;-) "Oh, I wish I had known about this conclave!" Mr. Schumacher ;-) "<expletive deleted>!" David Parsons ;-) ;-) - GEMDOS enhancements are coming that will require a robust means of passing lots of information between parent and child processes In fact, we have discussed this recently at Atari. It came up because of a bug in Gulam, which does something strange with the command line length byte. The MWC extended argument scheme not only puts extended arguments in the ARGV environment variable; it also puts a length byte of 127 into the command line. Normally, this is no big deal, because a command line string is null terminated anyways. Even if a Pexec caller passes a command tail that is not null terminated, GEMDOS makes sure that the trailing null is there in the command line part of the basepage of any process that it creates. If, however, a program uses a command line length byte that is unreasonable (like 127), that program is going to have problems. GEMDOS does nothing at all with the length byte passed in a Pexec command tail except copy it along with the rest of the command tail into the child's basepage. To quote the Pexec Cookbook (yes Virginia, there is a Pexec Cookbook), the command tail string is a "null-terminated Pascal-style string." 
This means that the maximum theoretical length of a GEMDOS command line is 126 bytes (128 bytes in the basepage minus length byte minus null terminator). In fact, the *actual* maximum length of a GEMDOS command line is 125 bytes, because that's the maximum number of bytes that GEMDOS will copy into a command line when it creates a process during Pexec. Unfortunately, I didn't get that little tidbit into the Pexec Cookbook before it went out.

The upshot of all this is that it *is* possible for a process to know if the ARGV in its environment is meant for it and not left over from one of its MWC/Manx/MT-C Shell spawned ancestors. If its command line length is something unreasonable, a program can take that as a clue that it should go sniffing around in its environment for more information. One can only imagine the multitude of meanings that could be assigned to bytewise-negative command line lengths...

HOWEVER... I still have problems with both ARGV and xArg command line extenders. This whole iovector thing bothers me. David Parsons came up with a nice, clean isatty() that works just fine without cluttering up one's environment with ???????'s. As far as I can see, the iovector is not necessary, nor does it belong in a scheme for passing arguments. And as Dale et al. said in the xArg proposal, it "makes a mess of the already confused environment string."

XARG also bothers me, because it means that a child has to muck with its parent's address space. Can you say "virtual nightmare"? Not only that, but XARG.xparent (pointer to parent's basepage) is a needless duplication of information that a process can get from its own basepage (although I guess it could be useful for an extra level of verification). And again, needless support for the iovector string.

And both methods *require* the startup code to search the environment for extra commands that may not even be there.
Currently, I'm leaning towards the ARGV method with the iovector yanked, a negative byte value for the command line length to flag it, and a different name for the environment variable. This saves applications from needlessly hunting through their environments (if the length byte isn't negative), and applications that don't understand the significance of the negative length byte will get what they can from the 125 bytes that GEMDOS gives them. Since the dLibs don't use the iovector anyways, it only entails a small startup code change for them. If the name of the environment variable is different, library code which uses the iovector from ARGV can still do it, but I STRONGLY recommend that people who are using iovector strings fix their libraries to use a safer isatty(). I am not formulating a standard on a Sunday afternoon, though. I'll let this percolate through the net for a week or three, and see what bounces back. From here and other sources, we'll come up with something that everyone can at least live with. And to those of you who have been bemoaning the lack of "strong Atari involvement" in matters like this, I have but one thing to say: Be careful what you wish for, you just might get it! -- Ken Badertscher | #include <disclaimer> Atari R&D Software Engine | GEMDOS LIVES! ...or is that Frodo? {portal,ames,imagen}!atari!kbad | I can never remember these things...
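The length-byte test discussed in the post above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the names `TAIL_SIZE`, `MAX_COPIED`, `tail_needs_env_lookup`, and `tail_text_length` are invented here, and a plain 128-byte buffer stands in for the command-line area of the basepage.

```c
#include <assert.h>
#include <stddef.h>

#define TAIL_SIZE  128  /* command-line area: length byte + text + NUL */
#define MAX_COPIED 125  /* most bytes GEMDOS copies, according to the post */

/* Hypothetical helper: returns 1 when the length byte is "unreasonable",
 * i.e. larger than anything GEMDOS itself would have copied, so startup
 * code may want to look in the environment for more arguments.
 * The byte is read as unsigned, so 0x80..0xFF ("negative" lengths)
 * count as unreasonable too. */
int tail_needs_env_lookup(const unsigned char *tail)
{
    assert(tail != NULL);
    return tail[0] > MAX_COPIED;
}

/* Hypothetical helper: number of command-line characters that are safe
 * to read, trusting the trailing NUL rather than the length byte. */
size_t tail_text_length(const unsigned char *tail)
{
    size_t n = 0;

    assert(tail != NULL);
    while (n < TAIL_SIZE - 2 && tail[1 + n] != '\0')
        n++;
    return n;
}
```

A program that does not understand any extended scheme could call `tail_text_length()` alone and still get whatever text GEMDOS copied, regardless of what the length byte claims.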
root@TSfR.UUCP (usenet) (03/29/89)
In article <1405@atari.UUCP> kbad@atari.UUCP (Ken Badertscher) writes:
> - GEMDOS enhancements are coming that will require a robust means of
>   passing lots of information between parent and child processes
> [discussion on pascal strings & the command line tail]
> The upshot of all this is that it *is* possible for a process to know
>if the ARGV in its environment is meant for it and not left over from
>one of its MWC/Manx/MT-C Shell spawned ancestors. If its command line
>length is something unreasonable, a program can take that as a clue...

What if the program uses the length byte to figure out length of command line tail like so:

    for (length = *(tail)++; length-- > && .... ????

It would be nice to be able to have the extended arguments code also be able to handle talking to processes that don't grok extended arguments and which don't clear sign-extension on the length byte (which is not an unreasonable thing to ignore, because the tail can't be >127 chars.)

>XARG also bothers me, because it
>means that a child has to muck with its parent's address space.

No kidding. But xargs was designed under the reluctant assumption that the OS would never change to accommodate it. Dale & I initially suggested that we use some of the `reserved' fields in the basepage for argc/argv (after all, the basepage is for process information, and _real_ argc/argv are most certainly process information) but Allan Pratt strongly objected to that, so we had to hack the environment and do the relatively byzantine bit of `environment-pointer-to-magic-verify-record-to-arguments-we-hope.' But at least xArgs doesn't carry the argument list to all the descendants of the process, which is what the Mark Williams `standard' does.
Dale and I felt that hacking the parent's data-space was less objectionable than filling the environment with crud (my word) / extraneous information (his words :-)

> Currently, I'm leaning towards the ARGV method with the iovector
>yanked, a negative byte value for the command line length to flag it,
>and a different name for the environment variable.

This is with the argument list in the environment, right? The big problem with this is that you will get those strings propagated all over the universe, anyway, and if a process modifies the environment with putenv(), the ARGV= stuff will be scrambled around. Say, for example, that I call the teeny-shell from a program that uses MWC arguments; Teeny-shell will get ARGV=<0>ARG0<0>ARG1<0>...<0><0>, which will parse to at least one valid environment variable - ARGV=ARG0. If any of the arguments I passed into the teeny-shell have `=' in them, they, too, will get converted into valid environment variables. (if I do `tsh -i PATH=g:,c:\bin,c:\games,. some-initial-command', for example.) And if this child (the one who doesn't grok the MWC `standard') does a putenv() or two, it will end up (most likely) putting new environment variables at the end of the arglist, after the corpse of the argument-list.

The diddles to the length of the cmdline tail will prevent children from incorrectly reading the environment, but there will still be this collection of garbage left orphaned in the environment. And if the teeny-shell calls another process that groks MWC args, which then forks off another process, well, then that orphan will see the first ARGV= value, which is now being treated as a valid environment variable, and get stuck with the fun prospect of trying to figure out if it is, indeed, the start of the argument list or if it's something else.

> I am not formulating a standard on a Sunday afternoon, though.
That's a relief :-)

If you're planning to do enhancements to TOS for extended argument passing, perhaps it's time to consider the basepage & additional gemdos traps for extended argument pexecing? I don't like corrupting the environment with argument *stuff*, because of the transient nature of arguments. When you start diddling with the environment, you end up with having to make both the caller and the callee grok the format, otherwise you start to see strange little artifacts pop up all over the place. If you use one of those reserved fields, well, if the child doesn't know what the extended arguments are, they die right there, rather than being carried all over God's creation by its children.

-david parsons
-orc@pell.uucp
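The leakage described in this post can be reproduced with a few lines of C. The sketch below is an illustration only: `naive_getenv` and `sample_env` are invented names, and the sample block is a simplified MWC-style layout (the iovector string that would normally follow `ARGV=` is left out).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified sample: the parent ran "tsh -i PATH=g:,c:\bin" and left
 * MWC-style arguments after ARGV=. The block is a run of NUL-terminated
 * strings; the literal's own final NUL supplies the empty terminator. */
const char sample_env[] = "HOME=c:\\\0ARGV=\0tsh\0-i\0PATH=g:,c:\\bin\0";

/* Hypothetical getenv that knows nothing about ARGV: every string of the
 * form NAME=value counts as a variable, wherever it sits in the block. */
const char *naive_getenv(const char *env, const char *name)
{
    size_t len;

    assert(env != NULL && name != NULL);
    len = strlen(name);
    while (*env != '\0') {
        if (strncmp(env, name, len) == 0 && env[len] == '=')
            return env + len + 1;
        env += strlen(env) + 1;   /* step over this string and its NUL */
    }
    return NULL;
}
```

With this block, `naive_getenv(sample_env, "PATH")` returns the text of what was meant to be a command-line argument, which is the kind of accidental variable the post warns about.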
dbsuther@PacBell.COM (Daniel B. Suthers) (03/30/89)
In article <512@TSfR.UUCP> orc@pell.UUCP (David L. Parsons) writes: >In article <1405@atari.UUCP> kbad@atari.UUCP (Ken Badertscher) writes: >> - GEMDOS enhancements are coming that will require a robust means of >> passing lots of information between parent and child processes > >> [discussion on pascal strings & the command line tail] > >> The upshot of all this is that it is* possible for a process to know >>if the ARGV in its environment is meant for it and not left over from >>one of its MWC/Manx/MT-C Shell spawned ancestors. If its command line >>length is something unreasonable, a program can take that as a clue... I am very happy that this aspect of the OS is being standardized. It is always necessary for a "working" OS to handle inter program communications. I have but ONE favor to ask; When the standard is created, could you please publish not only the definition, but the Startup Code for the current (and past) Compilers? It is possible for MWC, Laser, Lattice, Aztec, etc to each do their own, but it would be much cleaner for us, the users, if we could obtain the trivial code up-front from a reputable source (ATARI). It would be a public relations coup if your company actually created the necessary code to enable any defunct compilers to continue to be useful. My offer: If you will provide the definition, I will provide the startup module for Lattice C. Daniel Suthers, Systems Analyst UUCP: {ihnp4}!pacbell!pbnon!dbsuther
sarnila@tukki.jyu.fi (Pekka Sarnila) (03/30/89)
Would it be totally unthinkable to make GEMDOS Pexec support some more intelligent parameter passing? What I would propose: if the length is, say, FF, the last long (after the null) would contain a pointer to a Unix-like parameter structure. In that case Pexec would count the length of that structure, allocate that much more space, and copy it (relocating the pointers) into the new process's memory area. Or programs that expect Unix-like parameters would have a different magic number, from which Pexec would know to treat them differently. Startup would then just pass the pointer to main(). (This is what I do in my GEMDOS enhancer.)
--
-------------------------------------------------------------------
Pekka Sarnila, University of Jyvaskyla Finland
sarnila@tukki.jyu.fi
-------------------------------------------------------------------
kbad@atari.UUCP (Ken Badertscher) (04/02/89)
Here is a summary of some of the mail & messages I've seen in the past week regarding the proposed GEMDOS Extended Argument Standard: David Beckemeyer writes: ------------------------ What about the transition period? There are a whole bunch of programs that use the current MWC style ARGV. MWC programs, and anything compiled with their libraries use iovector and they don't always work right when it's not there. Also I bet even if MWC does implement the new standard, it will take a while and many users won't change over for a long time. This leaves an ever increasing number of programs that won't understand the new format (and probably won't work right). The next issue is the idea of the "negative" command tail count. I wonder if some startup code might just take the value as 8-bit unsigned and attempt to grab, say 254 bytes from the tail without checking and maybe bomb out in attempting it (I've seen programs that do worse). I don't see how [leaving just the iovector after ARGV] can work, since the old MWC ARGV is *always* followed by the list of arguments. If the the new name has the arguments, you can't put them after ARGV too. John Dunning writes: -------------------- I think the scheme you outlined (use the magic env var "ARGV=..." as a sentinel, and use a negative byte length in the command string) is sort of ok, but but I'd advocate taking it one step further. The thing I'd worry about is old programs (or new programs using startup code that's not quite 'right' etc) getting their caller's args, if they were spawned with Pexec(0, foo, bar, 0). I propose the following extension to your scheme: When Pexec'ing, and there's no environment passed to Pexec, instead of simply cloning the current env, clone selectively, ie copy only up til the magic sentinel. Do everything else the same, and I think you get all the same benefits, with less risk of confusing programs that, for whatever reason, aren't adhering to your new protocol. 
You also don't even NEED to assign meanings for oddball values of the command line length byte if you do that. Part of why I suggest this scheme is that I just re-wrote the system() function for the GNU C lib to do just what I described. David Parsons writes: --------------------- What if the program uses the length byte to figure out length of command line tail like so: for (length = *(tail)++; length-- > && .... ???? It would be nice to be able to have the extended arguments code also be able to handle talking to processes that don't grok extended arguments and which don't clear sign-extension on the length byte (which is not an unreasonable thing to ignore, because the tail can't be >127 chars.) [...] at least xArgs doesn't carry the argument list to all the descendants of the process, which is what the Mark Williams `standard' does. [...] The big problem with this is that you will get those strings propagated all over the universe, anyway, and if a process modifies the environment with putenv(), the ARGV= stuff will be scrambled around. [and he proceeds to describe just how ugly the argument propagation gets] If you're planning to do enhancements to TOS for extended argument passing, perhaps it's time to consider the basepage & additional gemdos traps for extended argument pexecing? Pekka Sarnila writes: --------------------- if length is say FF the last long (after null) would contain the pointer to unix like parameter structure. In that case Pexec would count the length of that structure and allocate that much more space and copy (and reloc pointers) it to the new processes memory area. ================================================================ The transition period is going to be ugly. No two ways about it. No matter what new standard we adopt, we're going to break some software. 
If the new scheme is sufficiently different from the old ones, however, software which uses an older method will continue to work with other software which agrees with that old method. The negative command tail count has been received negatively. That's OK, because there's still an "unreasonable" positive command tail length that isn't in use: 126 bytes. Remember that GEMDOS only ever copies 125 bytes from the command tail passed to it in Pexec. The various suggestions that deal with changing the behavior of Pexec are unacceptable for two reasons: 1) Folks with an older version of TOS would lose big. 2) We have just finished a TOS revision, and the next one won't be complete for quite some time. I would prefer for this standard to be in place before that, so that the Desktop, et. al. can take advantage of it. It's just going to have to be the responsibility of library startup code to comply with the new standard. Period. Using the basepage, or additional GEMDOS traps, won't fly either, for the reasons stated above. Finally, let's consider the sticky problem of environment-based argument strings being propagated all over creation. There are two things that GEMDOS copies from a parent when it creates a process. The basepage (process descriptor) information, and the environment. GEMDOS allocates memory for these, then fills them in. It would be nice if it allocated an argv[] area as well, but it doesn't. The only clean way that I see to get around that problem is to put the extra args in the environment. This causes problems if the args take the form of a standard argv[] array. BUT... If the extended argument string is not itself a series of null terminated strings, the problem solves itself neatly. getenv() and putenv() will not confuse the environment, and can in fact be easily used to pass a lot of args to a child. The environment arguments should consist of a regular command-tail style string, with arguments separated by white space. 
That way, arguments can even contain `='!

So... here's GEMDOS Extended Argument Standard proposal #2:

----------------------------------------------------------------
If a program wants to pass more than 125 bytes of arguments to a child, it should create a command tail of (at most) the first 125 bytes of the "real" arguments, and give it a length byte of 126. This will satisfy children which don't understand the extended argument standard.

The program should also place an environment variable called ARGS in the environment it intends to pass to the child (its own, or a specially created environment, per the Pexec spec). The ARGS variable should consist of the name of the child (argv[0]) followed by a space, followed by the rest of the arguments (argv[1..n]), separated by spaces.

The startup code of ARGS-aware programs is responsible for checking the command line for the magic length byte, then searching the environment for the ARGS environment variable command line. It must copy the ARGS command line somewhere so that it can be parsed into an argv[] string. IT MUST NOT BE PARSED IN PLACE, OR THE ENVIRONMENT WILL BE CORRUPTED. The startup code can then set the argument count and argv[] array pointers, and do whatever else it needs to do before starting up the mainline code.
----------------------------------------------------------------

Hopefully, some time this week, I will be able to get in touch with most of the compiler vendors to discuss this plan with them. In the mean time, please keep the e-mail coming!
--
Ken Badertscher  | #include <disclaimer>
Atari R&D        | No pith, just a path:
Software Engine  | {portal,ames,imagen}!atari!kbad
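The child-side half of proposal #2 could look something like the C sketch below. It is not actual startup code from any compiler: `env_lookup`, `args_from_env`, and `ARGS_MAGIC_LEN` are names made up for this illustration, the command tail and environment are passed in as ordinary buffers, and `malloc` stands in for whatever allocation a real startup module would use.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define ARGS_MAGIC_LEN 126  /* length byte that flags "look for ARGS" */

/* Find NAME in an environment block (NUL-terminated strings ended by an
 * empty string). Returns a pointer to the value, or NULL. */
const char *env_lookup(const char *env, const char *name)
{
    size_t len = strlen(name);

    while (*env != '\0') {
        if (strncmp(env, name, len) == 0 && env[len] == '=')
            return env + len + 1;
        env += strlen(env) + 1;
    }
    return NULL;
}

/* Hypothetical startup helper. If the tail carries the magic length byte
 * and ARGS exists, copy ARGS (never parse it in place) and split the copy
 * on white space. Returns argc, or -1 when the caller should fall back to
 * the ordinary command tail or when allocation fails. On success the
 * caller owns *argv_out and *copy_out. */
int args_from_env(const unsigned char *tail, const char *env,
                  char ***argv_out, char **copy_out)
{
    const char *value;
    char *copy, *p, **argv;
    size_t len;
    int argc = 0;

    assert(tail && env && argv_out && copy_out);
    if (tail[0] != ARGS_MAGIC_LEN)
        return -1;
    value = env_lookup(env, "ARGS");
    if (value == NULL)
        return -1;

    len = strlen(value);
    copy = malloc(len + 1);
    /* at most one argument per two characters, plus the NULL slot */
    argv = malloc((len / 2 + 2) * sizeof *argv);
    if (copy == NULL || argv == NULL) {
        free(copy);
        free(argv);
        return -1;
    }
    memcpy(copy, value, len + 1);

    p = copy;
    while (*p != '\0') {
        while (isspace((unsigned char)*p))
            p++;                       /* skip separators */
        if (*p == '\0')
            break;
        argv[argc++] = p;              /* start of one argument */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (*p != '\0')
            *p++ = '\0';               /* terminate it inside the copy */
    }
    argv[argc] = NULL;
    *argv_out = argv;
    *copy_out = copy;
    return argc;
}
```

Because the split happens in the private copy, the environment block itself still holds one intact `ARGS=...` string afterwards, which is the point of the "must not be parsed in place" rule.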
7103_300@uwovax.uwo.ca (Eric Smith) (04/03/89)
In article <1424@atari.UUCP>, kbad@atari.UUCP (Ken Badertscher) writes:
> So... here's GEMDOS Extended Argument Standard proposal #2:
< description of proposal omitted >

That doesn't sound too bad, but I liked the MWC proposal better, mostly because it's in use already. Creating an entirely new proposal means that *nothing* currently in existence will work with the new standard.

One potential problem I see with the 2nd proposal is the suggestion that programs stick a "126" in for the argument length if they have set up a valid ARGV= vector. This could still break old programs that are expecting a valid length count. On the other hand, if the extended argument passing scheme is only used for very long command lines, then argv[0] is unusable most of the time.

Finally, separating the arguments with white space makes it impossible to pass arguments containing blanks, a minus for programs like "grep".
--
Eric R. Smith                     email:
Dept. of Mathematics              7103_300@uwovax.uwo.ca
University of Western Ontario     7103_300@uwovax.bitnet
London, Ont. Canada N6A 5B7       (a shared mailbox: put my name on
ph: (519) 661-3638                the Subj: line, please!)
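The objection about blanks can be made concrete with a small C sketch. Both helpers are hypothetical (`join_args` and `count_words` are names chosen here), and they assume the simplest reading of proposal #2: arguments joined by single spaces and split again on white space, with no quoting.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Join argv[0..argc-1] with single spaces into out, the way an ARGS
 * value might be built. Returns 0 on success, -1 if out is too small. */
int join_args(int argc, const char *const *argv, char *out, size_t outsz)
{
    size_t used = 0;
    int i;

    assert(argv != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    for (i = 0; i < argc; i++) {
        size_t n = strlen(argv[i]);

        if (used + n + 2 > outsz)   /* room for a space and the NUL */
            return -1;
        if (i > 0)
            out[used++] = ' ';
        memcpy(out + used, argv[i], n);
        used += n;
        out[used] = '\0';
    }
    return 0;
}

/* Count the white-space separated words the receiving side would see. */
int count_words(const char *s)
{
    int n = 0;

    assert(s != NULL);
    while (*s != '\0') {
        while (isspace((unsigned char)*s))
            s++;
        if (*s == '\0')
            break;
        n++;
        while (*s != '\0' && !isspace((unsigned char)*s))
            s++;
    }
    return n;
}
```

Joining the three arguments `grep`, `hello world`, and `notes.txt` and counting the words in the result gives four, so the pattern containing a blank no longer arrives as a single argument.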
david@bdt.UUCP (David Beckemeyer) (04/04/89)
Most of my comments have been sent directly to Atari, but just to make my position known to the rest of the world, I'll make a brief statement.

I think it's great that Atari wants to get involved. I agree with most of the problems stated by others regarding both the MWC ARGV= format and the new format lightly proposed by Atari. I think the best thing I've heard is the idea of GEMDOS Pexec enhancements as opposed to some more convoluted environment hacking. This could be done with complete downward compatibility. The details must be worked out very carefully.

One rough idea I have is along the lines of a new Pexec option [e.g. Pexec(10, ...)] or possibly, as proposed by others, a special interpretation of the tail Pexec argument (I think this is a little more dangerous, personally). I propose a new GEMDOS trap function code which requests the extended arguments. This way new programs must explicitly ask for the extended arguments, so there will be no confusion about whether the program knows what it's doing or not. I prefer this to a new definition of the basepage header; let GEMDOS decide where to keep the new arguments. This has the most chance for downward compatibility and also it will be easier to support in the future because it is all handled internally to GEMDOS, any way it wants (no future worries about protected memory and children accessing the parent's memory space).

Programs that don't know anything about the new GEMDOS extended arguments will not be affected directly. If they want to pass around ARGV= or Xarg formats, fine. If they don't, the children will just look for the "old-style" command tail, which will contain valid info. Programs that don't understand the new GEMDOS standard will not get extended arguments from shells that don't use any of the "old" formats but they will at least run. They can be updated as needed and perhaps special utilities provided with new shells will allow passing arguments in some "old" style.
Whatever is finally "officially" decided, you can be sure that new versions of MT C-Shell and other BDT programs will support it. -- David Beckemeyer (david@bdt.UUCP) | "Adios amigos. And, as they say when Beckemeyer Development Tools | the boys are scratching the bad ones, 478 Santa Clara Ave. Oakland, CA 94610 | 'Stay a long time, Cowboy!'" UUCP: {uunet,ucbvax}!unisoft!bdt!david | - Jo Mora
t19@np1.hep.nl (Geert J v Oldenborgh) (04/04/89)
I wonder why we could not use the present 'standard' with one small diff: let the startup code of a program kill the extra part of the environment by putting a 0L on "_P(B=...)". This unclutters the environment and avoids misunderstandings of its children, the main arguments against this scheme. The only requirement now is that the _PBP, ARGV and args must be at the END of the environment now, but they are already there in all implementations I have seen.

G.J. van Oldenborgh
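A minimal C sketch of the truncation idea in this post is given below. The function names are invented for the illustration, and the marker name is a parameter because the post's own marker string is partly garbled; the test data assumes a simplified block in which the arguments follow an `ARGV=` entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical startup helper: end the environment block at the first
 * entry named `marker`. The block is a run of NUL-terminated strings
 * ended by an empty string, so writing a NUL at the start of the marker
 * entry turns it into the new end of the list.
 * Returns 1 if something was cut off, 0 if the marker was not found. */
int truncate_env_at(char *env, const char *marker)
{
    size_t len;

    assert(env != NULL && marker != NULL);
    len = strlen(marker);
    while (*env != '\0') {
        if (strncmp(env, marker, len) == 0 && env[len] == '=') {
            *env = '\0';
            return 1;
        }
        env += strlen(env) + 1;
    }
    return 0;
}

/* Count the entries still visible in the block. */
int env_count(const char *env)
{
    int n = 0;

    assert(env != NULL);
    while (*env != '\0') {
        n++;
        env += strlen(env) + 1;
    }
    return n;
}
```

Note that everything after the marker disappears, including any variable that some intermediate program appended behind the argument list, so the scheme only behaves well when the marker and arguments really are the last things in the block.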
apratt@atari.UUCP (Allan Pratt) (04/06/89)
In article <156@np1.hep.nl> t19@np1.hep.nl (Geert J v Oldenborgh) writes: > I wonder why we could not use the present 'standard' with one small diff: > let the sartup code of a program kill the extra part of the environment by > putting a 0L on "_P(B=...)". This unclutters the environment and avoids > misunderstandings of its children, the main arguments against this scheme. The only thing that fixes is a "smart" program finding surprising things in its environment. (That is, an argument FOO=3 will also look like an environment variable FOO with value 3). This doesn't help dumb programs at all. A dumb program will still get this funny thing in its environment, and will still pass that along to its children. Worse, if the dumb program actually tries to change its environment, it might put a new (and important) variable at the end, after the ARGV and the rest, and a "smart" program will lose this when it nulls out ARGV. So ARGV is bad for dumb programs. We need something which does not violate the format of an environment variable (i.e. no embedded nulls allowed) and therefore doesn't confuse either dumb programs which change their environments or smart ones which use getenv(). We also need to address validation: finding ARGV in your environment does not mean that the args were meant for you. MWC kindly puts a strange number in the command-line length byte ($7f), but, alas, doesn't check it. If it did check this would solve the problem, because a dumb program might blindly pass along the ARGV in the environment, but it's not likely to put the same number in the command-line length byte. Another validation technique is to use the process' PID (basepage address) to identify who put the args in the environment: eight bytes can encode the PID of the culprit, and this number could be checked against the "parent's basepage" field in the child's basepage; a match means these args really came from your parent. 
This removes the problem of putting a funny number in the command-line length byte, which might just happen accidentally.

============================================
Opinions expressed above do not necessarily    -- Allan Pratt, Atari Corp.
reflect those of Atari Corp. or anyone else.      ...ames!atari!apratt
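The basepage-address check suggested in this post might be sketched in C as follows. The encoding is an assumption made for the example (eight uppercase hex digits), as are the names `encode_parent_id` and `args_are_from_parent`; the post only says that eight bytes can encode the value.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Write a 32-bit basepage address as exactly eight hex digits.
 * out must have room for 9 bytes (digits plus NUL). */
void encode_parent_id(unsigned long basepage, char out[9])
{
    assert(out != NULL);
    sprintf(out, "%08lX", basepage & 0xFFFFFFFFUL);
}

/* Returns 1 when text is exactly eight hex digits equal to the value
 * found in the child's "parent's basepage" field, else 0. */
int args_are_from_parent(const char *text, unsigned long parent_basepage)
{
    int i;

    assert(text != NULL);
    for (i = 0; i < 8; i++) {
        if (!isxdigit((unsigned char)text[i]))
            return 0;               /* too short or not hex */
    }
    if (text[8] != '\0')
        return 0;                   /* too long */
    return strtoul(text, NULL, 16) == (parent_basepage & 0xFFFFFFFFUL);
}
```

A parent would store the encoded string alongside the arguments, and the child's startup code would compare it against its own parent pointer before trusting anything it finds in the environment.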
apratt@atari.UUCP (Allan Pratt) (04/08/89)
In article <536@bdt.UUCP> david@bdt.UUCP (David Beckemeyer) writes:
> ... I propose a new GEMDOS trap function code
> which requests the extended arguments. This way new programs must
> explicitly ask for the extended arguments, so there will be no confusion
> about whether the program knows what it's doing or not.

Unfortunately, it's too late to put something like this in TOS 1.4, and we'd like to establish a new standard sooner than we can get another TOS release out. To require an OS upgrade for people to get the benefit of extended arguments is really more trouble than we'd hoped for; almost ANYTHING (including ARGV= and xArgs) is preferable to that.

============================================
Opinions expressed above do not necessarily    -- Allan Pratt, Atari Corp.
reflect those of Atari Corp. or anyone else.      ...ames!atari!apratt
usenet@TSfR.UUCP (usenet) (04/08/89)
In article <1424@atari.UUCP> kbad@atari.UUCP (Ken Badertscher) writes: > The negative command tail count has been received negatively. That's >OK, because there's still an "unreasonable" positive command tail >length that isn't in use: 126 bytes. Remember that GEMDOS only ever >copies 125 bytes from the command tail passed to it in Pexec. Unless you run into mixed-mode conflicts with processes that don't know about this limit and assume that 127 bytes of command tail means 127 bytes of command tail. The Pexec cookbook has not made it into the hands of all of the developers out there. > The various suggestions that deal with changing the behavior of Pexec >are unacceptable for two reasons: > 1) Folks with an older version of TOS would lose big. Whyso? It should be possible for Atari, of all people, to figure out code that will correctly insert the values into old-style TOS basepages. And if that turns out not to be possible, it provides incentive for people to upgrade to the newer and hopefully better working ROMS. > 2) We have just finished a TOS revision, and the next one won't be > complete for quite some time. I would prefer for this standard > to be in place before that, so that the Desktop, et. al. can take > advantage of it. If the Desktop is wired into the ROMS, I fail to see how the desktop can take advantage of the new argument passing scheme before the rest of the OS does. > It's just going to have to be the responsibility of library startup >code to comply with the new standard. Period. And if you have object-code-only modules that you can't get recompiled under the new libraries, you're out of luck? (What's the chance, for example, that Atari will recompile the Alcyon C compiler to take advantage of the new scheme?) > Using the basepage, or additional GEMDOS traps, won't fly either, for >the reasons stated above. You are Atari. You can create a TSR a'la FOLDRXXX or CACHEXXX or ... to handle that. 
I fail to see the moral difficulty of writing patch code to do the dirty work of loading in a real argc/argv space.

>
> So... here's GEMDOS Extended Argument Standard proposal #2:
> [omitted]

And you're still cluttering up the environment with stuff that's not supposed to be there, but this time it's one *huge* string (which would look *awful* from a printenv or any other command to print out the environment) and it relies on a "can't happen" value in the command tail that *does* happen (it happened to me today, twice, while recompiling STadel - if I'd have been running the proposed standard I'd be getting _really_ lovely failures as cc tried to eat the command-line fed to make. Lovely, and almost impossible to track down, because cc gives a usage message if the arguments are incorrect, but does not echo the arguments back at you) as well as using up an environment value that may be used from inside other programs for other usages.

If you're insistent on installing an environment kludge, at least give the damn thing some improbable name ("cLuTtEr=" comes to mind) and have it contain the address of its parent's basepage for validation. You're still filling the environment with cr*p that doesn't belong there, but then you eliminate the possibility of having the wrong child eat the environment and you drastically cut down on the chances of overwriting valid environment variable names.

Xargs, at least, had the advantage of a minimal load on the environment, as well as a name (xArg) that will probably not conflict with anything that a user wants to stuff into the environment.

-david parsons
-orc@pell.uucp